Sort files according to size recursively

I need to find the largest files in a folder.
How do I scan a folder recursively and sort the contents by size?

I have tried using ls -R -S, but this lists the directories as well.
I also tried using find.

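Do you want to list the files in each subdirectory separately, or do you want to find all files in all subdirectories and list them by size regardless of which subdirectory they are in? Also, what do you mean by "directory" and "folder"? You seem to be using them to describe different things.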

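Are you saying you just want to list the files in a given directory and the files in its subdirectories, without showing the subdirectories themselves? Please try to clean up your question; it's not very clear.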

Related: unix.stackexchange.com/questions/158289/…

Answer #1

You can also do this with just du. To be on the safe side, I'm using this version of du:

$ du --version
du (GNU coreutils) 8.5

The approach:

$ du -ah ..DIR.. | grep -v "/$" | sort -rh

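Breakdown of the approach
The command du -ah DIR will produce a list of all the files and directories in a given directory DIR. The -h produces human-readable sizes, which I prefer; if you don't want them, drop that switch. I'm using head -6 just to limit the amount of output!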

$ du -ah ~/Downloads/ | head -6
4.4M    /home/saml/Downloads/kodak_W820_wireless_frame/W820_W1020_WirelessFrames_exUG_GLB_en.pdf
624K    /home/saml/Downloads/kodak_W820_wireless_frame/easyshare_w820.pdf
4.9M    /home/saml/Downloads/kodak_W820_wireless_frame/W820_W1020WirelessFrameExUG_GLB_en.pdf
9.8M    /home/saml/Downloads/kodak_W820_wireless_frame
8.0K    /home/saml/Downloads/bugs.xls
604K    /home/saml/Downloads/netgear_gs724t/GS7xxT_HIG_5Jan10.pdf

It's easy enough to sort it smallest to largest:

$ du -ah ~/Downloads/ | sort -h | head -6
0   /home/saml/Downloads/apps_archive/monitoring/nagios/nagios-check_sip-1.3/usr/lib64/nagios/plugins/check_ldaps
0   /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/0/index/write.lock
0   /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/0/translog/translog-1365292480753
0   /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/1/index/write.lock
0   /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/1/translog/translog-1365292480946
0   /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/2/index/write.lock
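sort -h can order this output because it understands the K/M/G suffixes that du -h prints. The Python sketch below illustrates the idea (it is not how GNU sort is implemented): it converts a leading size such as 4.4M or 624K into a byte count and sorts du-style lines on that value. It assumes 1024-based suffixes and treats a bare number as bytes; human_to_bytes and sort_du_lines are names made up for this example.

```python
import re

# Multipliers for the single-letter suffixes printed by du -h (assumed 1024-based).
_SUFFIXES = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}
# A leading number, optionally with a fractional part, followed by an optional suffix.
_SIZE_RE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)([KMGTP]?)")


def human_to_bytes(text):
    """Convert the leading size of text (e.g. '4.4M' or a whole du line) to bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"no human-readable size at the start of {text!r}")
    number, suffix = match.groups()
    return float(number) * _SUFFIXES[suffix]


def sort_du_lines(lines, reverse=False):
    """Sort 'SIZE PATH' lines by size, like sort -h (or sort -rh with reverse=True)."""
    return sorted(lines, key=human_to_bytes, reverse=reverse)
```

Calling sort_du_lines(lines, reverse=True) gives the largest-first order of sort -rh.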

Sorting it largest to smallest:

$ du -ah ~/Downloads/ | sort -rh | head -6
10G /home/saml/Downloads/
3.8G    /home/saml/Downloads/audible/audio_books
3.8G    /home/saml/Downloads/audible
2.3G    /home/saml/Downloads/apps_archive
1.5G    /home/saml/Downloads/digital_blasphemy/db1440ppng.zip
1.5G    /home/saml/Downloads/digital_blasphemy

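If you want to exclude all directories from the output, you can use a trick based on the dot character. This assumes your directory names contain no dots and the files you are looking for do; you can then filter out the directories with grep -v '\s/[^.]*$'. If you want the list sorted smallest to largest but still want to see the 6 largest entries, drop the -r and use tail -6 instead of head -6.

The grep -v "/$" filter only removes lines that end in a slash, which in this output is just the directory given on the command line (it was passed with a trailing slash), so the subdirectories are still listed: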

$ du -ah ~/Downloads/ | grep -v "/$" | sort -rh | head -6 
3.8G    /home/saml/Downloads/audible/audio_books
3.8G    /home/saml/Downloads/audible
2.3G    /home/saml/Downloads/apps_archive
1.5G    /home/saml/Downloads/digital_blasphemy/db1440ppng.zip
1.5G    /home/saml/Downloads/digital_blasphemy
835M    /home/saml/Downloads/apps_archive/cad_cam_cae/salome/Salome-V6_5_0-LGPL-x86_64.run
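To see what the dot trick keeps and drops, the Python sketch below applies the same regular expression as grep -v '\s/[^.]*$' (Python's re also understands \s) to a few du-style lines. The paths are invented for illustration, and the sample assumes lib.d is a directory and README a regular file. It makes the heuristic's blind spots visible: a file without an extension is dropped, while a directory whose name contains a dot is kept.

```python
import re

# Same pattern as the grep command: whitespace, a slash, then no '.' up to the end of the line.
NO_DOT_PATH = re.compile(r"\s/[^.]*$")


def drop_dotless_paths(du_lines):
    """Mimic grep -v: drop every line whose path part contains no dot."""
    return [line for line in du_lines if not NO_DOT_PATH.search(line)]


# Invented sample data.
sample = [
    "3.8G\t/home/u/Downloads/audible",
    "1.5G\t/home/u/Downloads/db1440ppng.zip",
    "12K\t/home/u/Downloads/README",
    "40K\t/home/u/Downloads/lib.d",
]
for line in drop_dotless_paths(sample):
    print(line)
```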

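The grep -v "/$" part doesn't seem to do what you expect, since directories don't get a trailing slash appended. Does anyone know how to exclude directories from the results?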

– JanWarchoł
Feb 16, 2015 at 10:14

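@JanekWarchol - Incidentally, to omit directories you'd have to change tactics and use find to generate a list of files only, and then have du calculate their sizes.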

– slm♦
Feb 16, 2015 at 14:50

This also finds dirs

– ekerner
Aug 17, 2017 at 22:29

This lists not only files but also directories :(

– Roman Gaufman
Feb 26, 2018 at 21:09

Based on this solution and the one given in this post: unix.stackexchange.com/questions/22432/…, I was able to get results for files only with: find . -type f -exec du -ah {} + | grep -v "/$" | sort -rh

– flochtililoch
Apr 16, 2020 at 1:12
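slm's suggestion (list only regular files, then size them) is what flochtililoch's find . -type f -exec du -ah {} + pipeline does. A rough Python counterpart is sketched below, under a few assumptions: it reports apparent sizes (st_size) rather than du's disk usage, it does not descend into symlinked directories, and it silently skips symlinks and entries it cannot stat. files_by_size is a hypothetical helper name.

```python
import os
import stat


def files_by_size(root, largest_first=True):
    """Return (size_in_bytes, path) for every regular file below root, sorted by size."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: look at the entry itself, not a symlink target
            except OSError:
                continue  # the entry vanished or cannot be read; skip it
            if stat.S_ISREG(st.st_mode):  # regular files only, like find -type f
                results.append((st.st_size, path))
    results.sort(reverse=largest_first)
    return results
```

For example, files_by_size(".")[:6] roughly corresponds to the head -6 views in this answer, but without any directories.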

Answer #2

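If you want to use GNU find to find all files in the current directory and its subdirectories and list them by size (regardless of their path), and assuming none of the file names contain newline characters, you can do this: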

find . -type f -printf "%s\t%p\n" | sort -n

From man find on a GNU system:

   -printf format
          True; print format  on  the  standard  output,
          interpreting  `\'  escapes and `%' directives.
          Field widths and precisions can  be  specified
          as  with the `printf' C function.  Please note
          that many of the  fields  are  printed  as  %s
          rather  than  %d, and this may mean that flags
          don't work as you  might  expect.   This  also
          means  that  the `-' flag does work (it forces
          fields to be  left-aligned).   Unlike  -print,
          -printf  does  not add a newline at the end of
          the string.  The escapes and directives are:

          %p     File's name.
          %s     File's size in bytes.

From man sort:

   -n, --numeric-sort
          compare according to string numerical value

Unfortunately it doesn't work on Mac; it shows: find: -printf: unknown primary or operator

– Roman Gaufman
Feb 26, 2018 at 21:09

@RomanGaufman Yes, that's why the answer specifies GNU find. If you install the GNU tools on your Mac, it will work there too.

– terdon♦
Feb 26, 2018 at 21:33
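The assumption in answer #2 that no file name contains a newline matters because every record written by -printf "%s\t%p\n" ends in a newline. The Python sketch below writes the same size, tab, path records and reads them back to show the difference: with a newline terminator, a name containing a newline splits into a broken record, while a NUL terminator (GNU find's -printf also understands the \0 escape) round-trips. format_records and parse_records are hypothetical names, and the sample paths are made up.

```python
def format_records(entries, terminator):
    """Render (size, path) pairs as the size, a tab, the path, then the terminator."""
    return "".join(f"{size}\t{path}{terminator}" for size, path in entries)


def parse_records(text, terminator):
    """Split text on terminator and return the (size, path) pairs sorted by size."""
    records = []
    for record in text.split(terminator):
        if not record:
            continue  # the final terminator leaves an empty piece at the end
        # A record without a tab (e.g. the tail of a name containing "\n") raises ValueError.
        size, path = record.split("\t", 1)
        records.append((int(size), path))
    return sorted(records)  # ascending by size, like sort -n


entries = [(10, "./a.txt"), (3, "./odd\nname.txt")]
print(parse_records(format_records(entries, "\0"), "\0"))
```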

Answer #3

Try the following command:

ls -1Rhs | sed -e "s/^ *//" | grep "^[0-9]" | sort -hr | head -n20

It will recursively list the 20 largest files in the current directory.

Note: the -h option of sort is not available on OSX/BSD, so you have to install sort from coreutils (e.g. via brew) and add the local bin path to PATH, e.g.:

export PATH="/usr/local/opt/coreutils/libexec/gnubin:$PATH" # Add a "gnubin" for coreutils.

For the largest directories, use du instead. Alternatively, without the -h options:

ls -1Rs | sed -e "s/^ *//" | grep "^[0-9]" | sort -nr | head -n20

Perfect, this is the first solution that works on Mac and doesn't show directories :) - thanks!

– Roman Gaufman
Feb 26, 2018 at 21:20

How can I filter this to show only files with a row count >= X (e.g. X = 0)?

– Matrix
May 1, 2019 at 15:24
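The head -n20 at the end of the pipelines in answer #3 keeps the 20 biggest entries after a full reverse sort. In Python the same top-N selection can be done with heapq.nlargest, which does not need to sort the whole list; the sketch below applies it to (size, path) pairs. top_n is a hypothetical helper, the sample data is invented, and it simply compares whatever size numbers you give it (ls -s, by contrast, reports allocated blocks).

```python
import heapq


def top_n(entries, n=20):
    """Return the n largest (size, path) pairs, biggest first, like sort -rn | head -n N."""
    return heapq.nlargest(n, entries, key=lambda entry: entry[0])


# Invented sample sizes.
sample = [(5, "./a"), (50, "./b"), (1, "./c"), (20, "./d")]
print(top_n(sample, 2))  # [(50, './b'), (20, './d')]
```

Any list of (size, path) pairs works as input, for example one built with os.walk and os.path.getsize.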

Answer #4

This will recursively find all files and sort them by size. It prints all file sizes in KB, rounded, so you may see 0 KB files, but it was close enough for my use and it works on OSX.

Works on Ubuntu 14.04 too!

–林大伟
Jul 22, 2016 at 22:12

This lists directories, not just files :(

– Roman Gaufman
Feb 26, 2018 at 21:11

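@RomanGaufman - thanks for the feedback! From my testing, find . -type f finds files... it does work recursively, you're right, but it lists all the files it finds, not the directories themselves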

– Brad Parks
Feb 27, 2018 at 12:44

xargs dates back to the 1980s, and it has been a bad idea since David Korn introduced execplus (find's -exec ... {} +) in 1989.

– schily
Jun 15, 2018 at 8:21
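The command for answer #4 is missing from the text above, but its description (sizes shown in KB and rounded, so small files appear as 0 KB) can be illustrated. The Python sketch below assumes 1 KB = 1024 bytes and rounds halves up using integer arithmetic; the original command may use a different divisor or rounding rule, and size_in_kb and kb_report are hypothetical names.

```python
def size_in_kb(num_bytes):
    """Round a byte count to whole KB, assuming 1 KB = 1024 bytes and rounding halves up."""
    return (num_bytes + 512) // 1024


def kb_report(entries):
    """Format (size, path) pairs as 'N KB<tab>path' lines, largest first."""
    ordered = sorted(entries, key=lambda entry: entry[0], reverse=True)
    return [f"{size_in_kb(size)} KB\t{path}" for size, path in ordered]


# Invented sample: the 300-byte file shows up as 0 KB.
for line in kb_report([(300, "./tiny.txt"), (4096, "./big.log")]):
    print(line)
```

Sorting on the exact byte count rather than on the rounded KB value keeps files that round to the same KB figure in a meaningful order.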

Answer #5

With zsh, you can find the largest file (in terms of apparent size, like the size column in ls -l output, not disk usage) with:

ls -ld -- **/*(DOL[1])

The 6 largest files:

ls -ld -- **/*(DOL[1,6])

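To sort those by file size, you can use ls's -S option. Some ls implementations also have a -U option to not sort the list at all (since zsh has already sorted it by size here).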
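The apparent size versus disk usage distinction made in answer #5 corresponds to two fields of a stat result. The Python sketch below reads both for one file. It assumes a POSIX system, where os.stat_result has st_blocks counting 512-byte units (the field is not available on Windows); the two values differ for sparse files and because space is allocated in whole blocks. apparent_and_disk_usage is a hypothetical name.

```python
import os


def apparent_and_disk_usage(path):
    """Return (apparent size, allocated size) in bytes for path, without following symlinks."""
    st = os.lstat(path)
    apparent = st.st_size           # the size column of ls -l
    allocated = st.st_blocks * 512  # st_blocks counts 512-byte units; closer to what du reports
    return apparent, allocated
```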

Answer #6

A simple solution for Mac/Linux that skips directories:

find . -type f -exec du -h {} \; | sort -h

Answer #7

The equivalent in BSD or OSX is

$ du -ah simpl | sort -dr | head -6

Answer #8

Try the following command, piped through sort, to get the folder sizes in ascending order:

du -sh * | sort -sh

The -s option to sort isn't needed. Or is it?

– Mircealungu
Mar 13, 2020 at 22:37
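du -sh * in answer #8 prints one total per entry in the current directory, which sort then puts in ascending order. A rough Python counterpart is sketched below. It assumes apparent sizes (the sum of st_size over regular files) are good enough, whereas du counts allocated disk space and also includes the directories themselves; like the shell's *, it leaves out names starting with a dot. tree_size and entries_by_total_size are hypothetical names.

```python
import os
import stat


def tree_size(path):
    """Apparent size in bytes of a regular file, or the sum over all regular files below a directory."""
    st = os.lstat(path)
    if not stat.S_ISDIR(st.st_mode):
        return st.st_size if stat.S_ISREG(st.st_mode) else 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                entry = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip entries that disappear or cannot be read
            if stat.S_ISREG(entry.st_mode):
                total += entry.st_size
    return total


def entries_by_total_size(directory="."):
    """(total size, name) for each non-hidden entry of directory, smallest first."""
    names = [name for name in os.listdir(directory) if not name.startswith(".")]
    return sorted((tree_size(os.path.join(directory, name)), name) for name in names)
```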

Answer #9

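This is an incredibly common need, for a variety of reasons (I like finding the most recent backup in a directory), and a pretty simple task.
I'll provide a Linux solution that uses the find, xargs, stat, tail, awk, and sort utilities.
Most people have provided some unique answers, but I prefer mine because it handles filenames properly and the use case is easy to change (modify the stat and sort arguments).
I'll also provide a Python solution that should let you use this functionality even on Windows.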

find . -type f -print0 | xargs -0 -I{} stat -c '%s %n' {} | sort -n

Same as before, but this time returning only the largest file.

# Each utility is split on a new line to help 
# visualize the concept of transforming our data in a stream
find . -type f -print0 | 
xargs -0 -I{} stat -c '%s %n' {} | 
sort -n | 
tail -n 1 |
awk '{print $2}'

Same pattern, but now selecting the newest file instead of the largest.

# (Notice only the first argument of stat changed for new functionality!)
find . -type f -print0 | xargs -0 -I{} stat -c '%Y %n' {} | 
sort -n | tail -n 1 | awk '{print $2}'

Explanation:

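find: recursively find all files starting from the current directory and print them separated by null characters
xargs: a utility to execute commands with arguments supplied on standard input. For each null-separated file name it receives, we want to run the stat utility on that file
stat: stat is a really great command with lots of use cases. I'm printing two columns: the first is the size in bytes (%s), the second is the file name (%n)
sort: sort the results with the numeric switch. Since the first field is an integer, our results will be sorted correctly
tail: select only the last line of the output (since the list is sorted, this is the largest file!)
awk: select the second column, which contains the file name of the largest file in the directory tree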

Python solution

#!/usr/bin/env python
import os, sys
# Collect the path of every file below the directory given as the first argument
files = list()
for dirpath, dirnames, filenames in os.walk(sys.argv[1]):
    for filename in filenames:
        realpath = os.path.join(dirpath, filename)
        files.append(realpath)
# Sort by size in bytes (os.stat follows symlinks); the last entry is the largest
files_sorted_by_size = sorted(files, key=lambda x: os.stat(x).st_size)
largest_file = files_sorted_by_size[-1]
print(largest_file)

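This script takes a bit longer to explain, but basically, if you save it as a script, it searches the directory given as the first command-line argument and prints the largest file in that directory tree. The script does no error checking, but it should give you an idea of how to approach this in Python, which gives you a nice platform-independent way to solve the problem.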
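The shell version in answer #9 switches from largest to newest just by changing the stat format from %s to %Y. The Python script can be adapted the same way by swapping the key from st_size to st_mtime. The sketch below does that; using a single pass instead of sorting the whole list, skipping files that cannot be stat'ed, and returning None for a directory with no files are additions for illustration rather than part of the original answer, and pick_file is a hypothetical name.

```python
import os


def pick_file(root, key="size"):
    """Return the largest (key="size") or most recently modified (key="mtime") file under root.

    Returns None when root contains no files at all.
    """
    attribute = {"size": "st_size", "mtime": "st_mtime"}[key]
    best_path, best_value = None, None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                value = getattr(os.stat(path), attribute)  # os.stat follows symlinks
            except OSError:
                continue  # e.g. a dangling symlink, which would make the script above crash
            if best_value is None or value > best_value:
                best_path, best_value = path, value
    return best_path
```

For the backup use case mentioned at the start of this answer, pick_file(some_dir, "mtime") would return the most recently modified file.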

Answer #10

This answer comes from a similar question:

find . -type f -exec du -ah {} + | sort -rh | more

Answer #11

Something that works on any platform except AIX and HP-UX is:

find . -ls | sort +6 | tail
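sort +6 is the obsolete form of sort -k 7: the sort key starts at the seventh whitespace-separated field, which in find -ls output is the size in bytes, and tail then prints the last lines of the sorted list. Without -n, sort compares that field as text rather than as a number. The Python sketch below sorts find -ls style lines numerically on that field instead; the sample lines are invented, the field position assumes the usual find -ls layout, and sort_find_ls is a hypothetical name.

```python
def sort_find_ls(lines, size_field=6):
    """Sort find -ls style lines numerically by the size column (0-based field index 6)."""
    return sorted(lines, key=lambda line: int(line.split()[size_field]))


# Invented find -ls style lines: inode, blocks, mode, links, owner, group, size, date, name.
sample = [
    "1234  8 -rw-r--r--  1 user group  7000 Jan  1 10:00 ./b.txt",
    "1235  4 -rw-r--r--  1 user group   900 Jan  1 10:00 ./a.txt",
    "1236 12 -rw-r--r--  1 user group 10000 Jan  1 10:00 ./c.txt",
]
for line in sort_find_ls(sample):
    print(line)
```

Compared as plain strings, "10000" would sort before "900", which is why the key converts the field with int().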
