One billion files on Linux
Posted Aug 19, 2010 17:08 UTC (Thu) by bcopeland (subscriber, #51750)
Parent article: One billion files on Linux
When trying to look at that many files, you need to avoid running stat() on every one of them or trying to sort the whole list.
Underlying this issue is that today's directories (on ext4, at least) do not return entries in inode order. The consequence is that if you walk the files in the order they are stored in the directory, and the inodes aren't in the cache, you have to seek all over the disk to get to the inode information. I remember reading once that the htree designers planned at some point to group the files in htree leaves into buckets based on inode; I wonder if anything ever came of that?
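You can see the mismatch directly from userspace: readdir() reports each entry's inode number in d_ino, so a tiny loop makes the non-monotonic order visible. A minimal POSIX sketch (nothing ext4-specific; on a large htree directory the inode column jumps around, which is exactly the seek pattern a naive stat() walk would follow):

    /* Print entries in readdir() order with their inode numbers. */
    #include <dirent.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        DIR *dir = opendir(argc > 1 ? argv[1] : ".");
        struct dirent *de;

        if (!dir) {
            perror("opendir");
            return 1;
        }
        while ((de = readdir(dir)))
            printf("%10lu  %s\n", (unsigned long)de->d_ino, de->d_name);
        closedir(dir);
        return 0;
    }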
Posted Aug 19, 2010 20:16 UTC (Thu) by ricwheeler (subscriber, #4980)
One thing you can do (and upstream, tools like rm do this now) is to get a batch of entries back from readdir() and then sort them by inode number. That removes the random, seeky nature of the list for file systems that suffer from this (ext3/4, reiserfs, others?).
For more advanced layouts, you should look to btrfs.
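For illustration, here is a minimal sketch of the sort-by-inode approach described above, not the actual coreutils implementation (which, as noted, works on batches of entries to bound memory rather than holding the whole directory at once). It collects names and d_ino values in one readdir() pass, sorts by inode number, then does the stat() pass in that order:

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    struct entry {
        ino_t ino;
        char  name[256];
    };

    /* Order entries by inode number so the stat() pass reads the
     * inode table in roughly one sweep instead of seeking randomly. */
    static int by_inode(const void *a, const void *b)
    {
        ino_t ia = ((const struct entry *)a)->ino;
        ino_t ib = ((const struct entry *)b)->ino;
        return (ia > ib) - (ia < ib);
    }

    int main(int argc, char **argv)
    {
        DIR *dir = opendir(argc > 1 ? argv[1] : ".");
        struct dirent *de;
        struct entry *list = NULL;
        size_t n = 0, cap = 0;

        if (!dir) {
            perror("opendir");
            return 1;
        }

        /* Pass 1: collect names and inode numbers -- readdir()
         * already supplies d_ino, so no stat() is needed yet. */
        while ((de = readdir(dir))) {
            if (n == cap) {
                cap = cap ? cap * 2 : 1024;
                list = realloc(list, cap * sizeof(*list));
                if (!list)
                    return 1;
            }
            list[n].ino = de->d_ino;
            snprintf(list[n].name, sizeof(list[n].name), "%s", de->d_name);
            n++;
        }

        qsort(list, n, sizeof(*list), by_inode);

        /* Pass 2: stat() in inode order. */
        for (size_t i = 0; i < n; i++) {
            struct stat st;

            if (fstatat(dirfd(dir), list[i].name, &st,
                        AT_SYMLINK_NOFOLLOW) == 0)
                printf("%10ju  %jd bytes  %s\n",
                       (uintmax_t)list[i].ino,
                       (intmax_t)st.st_size, list[i].name);
        }

        free(list);
        closedir(dir);
        return 0;
    }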