Count files by extension

TL;DR:

find . -type f | sed 's/.*\///; s/^\.//; /\./! s/$/./; s/.*\.//' | sort | uniq -c | sort -nr

The most popular command you can google appears to be this:

find . -type f | sed 's/.*\.//' | sort | uniq -c

Here for example.

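The problem is that it miscounts files without an extension. For those, the pattern matches up to the last dot anywhere in the path (for example the one in .git), so part of the path ends up counted as an extension: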

1 css
1 csv
1 git/COMMIT_EDITMSG
1 git/config
1 git/description
1 git/HEAD
4 gitignore
...
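To see where those odd entries come from, it helps to feed a few paths through the naive expression by hand. This is only a sketch: the sample paths are made up, and naive_ext is a hypothetical helper name used for illustration.

```shell
# Hypothetical helper: the naive expression from the command above.
naive_ext() {
  sed 's/.*\.//'
}

# Made-up sample paths, shaped like the output of `find . -type f`.
printf '%s\n' './style.css' './.git/config' './Makefile' | naive_ext
# ./style.css   -> css
# ./.git/config -> git/config  (the last dot is in the directory name)
# ./Makefile    -> /Makefile   (the last dot is the leading "./")
```

The greedy .* always runs to the last dot on the line, wherever that dot happens to be.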

So it needs an improvement. The first step is to cut the path and keep only the base name:

find . -type f | sed 's/.*\///; s/.*\.//' | sort | uniq -c

Now we no longer have paths:

1 COMMIT_EDITMSG
1 config
1 css
1 csv
...

The next step is to append a dot to names that contain no dot, so that extension-less files are all counted together under an empty extension:

find . -type f | sed 's/.*\///; /\./! s/$/./; s/.*\.//' | sort | uniq -c

Now we have a useful output:

263
  1 bash
  7 coffee
  1 css
  1 csv
  1 exe
  1 fish
  4 gitignore
  1 iml
...
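A small sketch of what the added expression does, using made-up paths. The /\./! address restricts the substitution to lines that contain no dot, so only extension-less names get the trailing dot. The helper name add_dot_ext is hypothetical.

```shell
# Hypothetical helper: strip the path, append a dot to dot-less names,
# then keep whatever follows the last dot.
add_dot_ext() {
  sed 's/.*\///; /\./! s/$/./; s/.*\.//'
}

# Made-up sample paths. "config" and "Makefile" both become an empty
# line, so uniq -c reports them together: a count of 2 for the empty
# extension and a count of 1 for coffee.
printf '%s\n' './.git/config' './src/app.coffee' './Makefile' \
  | add_dot_ext | sort | uniq -c
```

That is why the first line of the output above is a bare number: it is the count for the empty extension.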

The next step is to fix hidden files without an extension (like .gitignore), whose leading dot would otherwise be treated as the start of an extension:

find . -type f | sed 's/.*\///; s/^\.//; /\./! s/$/./; s/.*\.//' | sort | uniq -c

Now we have a clean list sorted by extension. To sort by count instead, append sort -n for ascending order or sort -nr for descending:

find . -type f | sed 's/.*\///; s/^\.//; /\./! s/$/./; s/.*\.//' | sort | uniq -c | sort -nr

1058 php
 268
  47 md
  36 json
...
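If you run this often, the pipeline can be wrapped in a shell function. This is just one way to package it: count_ext is a hypothetical name, and the optional directory argument is my own addition.

```shell
# Hypothetical wrapper around the final pipeline.
# Usage: count_ext [DIR]   (DIR defaults to the current directory)
count_ext() {
  find "${1:-.}" -type f \
    | sed 's/.*\///; s/^\.//; /\./! s/$/./; s/.*\.//' \
    | sort | uniq -c | sort -nr
}
```

For example, count_ext src would count only the files under src. Dots in the directory name do not matter, because the first sed expression removes the whole path before anything else runs.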

The final explanation:

  • find . -type f lists every regular file under the current directory, recursively

  • sed:

    • 's/.*\///' cuts the path, keeping only the base name

    • 's/^\.//' removes a leading dot, because it does not indicate an extension

    • '/\./! s/$/./' appends a dot if the name contains no dot at all

    • 's/.*\.//' removes the file name, keeping only the extension

  • sort groups identical extensions together, because uniq only compares adjacent lines and so needs sorted input

  • uniq -c collapses repeated lines and prefixes each with its count

  • sort -nr is optional and sorts the extensions by number of files, descending
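The four sed expressions can also be traced one at a time. The sketch below applies them in sequence to a single path and prints every intermediate result. trace_ext is a hypothetical helper and the sample paths are made up.

```shell
# Hypothetical helper: show what each sed expression does to one path.
trace_ext() {
  s1=$(printf '%s\n' "$1"  | sed 's/.*\///')      # cut the path
  s2=$(printf '%s\n' "$s1" | sed 's/^\.//')       # drop a leading dot
  s3=$(printf '%s\n' "$s2" | sed '/\./! s/$/./')  # add a dot if none
  s4=$(printf '%s\n' "$s3" | sed 's/.*\.//')      # keep the extension
  printf '%s -> %s -> %s -> %s -> "%s"\n' "$1" "$s1" "$s2" "$s3" "$s4"
}

trace_ext './.git/config'
# ./.git/config -> config -> config -> config. -> ""
trace_ext './.gitignore'
# ./.gitignore -> .gitignore -> gitignore -> gitignore. -> ""
trace_ext './src/app.coffee'
# ./src/app.coffee -> app.coffee -> app.coffee -> app.coffee -> "coffee"
```

Both extension-less cases end up as an empty string, which is what makes them land in the same bucket.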

It does not handle common double extensions like .tar.gz (such a file is counted under gz), but it has always been enough for me.
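If double extensions do matter, one possible extension is to special-case names ending in .tar.something. This is a sketch of my own and not part of the command above: it covers only the tar family, and ext_with_tar is a hypothetical name.

```shell
# Hypothetical variant: keep "tar.<ext>" together, otherwise behave
# like the original expression.
#   /\.tar\.[^.]*$/! s/.*\.//      ordinary names: keep the last extension
#   s/.*\.\(tar\.[^.]*\)$/\1/      tar names: keep "tar.<ext>"
ext_with_tar() {
  sed 's/.*\///; s/^\.//; /\./! s/$/./; /\.tar\.[^.]*$/! s/.*\.//; s/.*\.\(tar\.[^.]*\)$/\1/'
}

# Made-up sample paths.
printf '%s\n' './backup.tar.gz' './Makefile' './a/b.tar.bz2' './notes.txt' \
  | ext_with_tar
# ./backup.tar.gz -> tar.gz
# ./Makefile      -> (empty)
# ./a/b.tar.bz2   -> tar.bz2
# ./notes.txt     -> txt
```

The second expression cannot fire on the output of the first, because once everything up to the last dot has been removed, no dot is left on the line.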
