One billion files on Linux

Posted Aug 19, 2010 10:57 UTC (Thu) by cesarb (subscriber, #6266)
In reply to: One billion files on Linux by liljencrantz
Parent article: One billion files on Linux

> But in what situations will it make more sense to not group a billion of file items into logical groups?

Things like squid cache directories, git object directories, ccache cache directories, that hidden thumbnails directory in your $HOME... They all have in common that the files are named by a hash or something similar. There is no logical grouping at all here; it is a completely flat namespace.

Most of these work around the resulting large number of files in a single directory by extracting some bits (usually 4 or 8) of the hash and using them as the name of a subdirectory (which works because the hashes used have an almost perfectly uniform distribution). Sometimes more than one level is used. If the filesystem can easily deal with a huge number of files in a single directory, this extra complexity is not needed.
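To make the subdirectory trick concrete, here is a minimal sketch in Python. The function name sharded_path, its parameters, and the choice of SHA-1 are my own assumptions for illustration; the idea is just to peel the leading hex digits off the hash (two hex digits = 8 bits per level) and use them as directory names, which is roughly the layout git uses for loose objects.

```python
import hashlib
from pathlib import Path


def sharded_path(root, key, levels=1, chars_per_level=2):
    """Map a key to root/<prefix>/.../<rest of hash>.

    Hypothetical helper: each level consumes `chars_per_level` hex digits
    of the SHA-1 digest (2 hex digits = 8 bits = 256 subdirectories).
    """
    digest = hashlib.sha1(key).hexdigest()
    cut = levels * chars_per_level
    # Leading hex digits become the nested directory names.
    prefixes = [digest[i:i + chars_per_level]
                for i in range(0, cut, chars_per_level)]
    # The remaining digits become the file name.
    return Path(root, *prefixes, digest[cut:])


if __name__ == "__main__":
    print(sharded_path("objects", b"hello"))
    print(sharded_path("cache", b"hello", levels=2, chars_per_level=1))
```

With one level of 8 bits, a million files end up spread over 256 directories of roughly 4000 entries each, which is the extra bookkeeping a filesystem that copes with huge directories would make unnecessary.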

There are also Maildir directories, which use one file per message, and where the only logical grouping is a "folder" or similar. If you have a million messages in a single "folder" (for instance one named "linux-kernel-mailing-list" which has all the messages you collected since 1999), you need a filesystem which can deal with a million files in a single directory. And here the names are not hashes, so the scheme above fails (and even if it worked, it would not be a Maildir anymore).

One billion files on Linux

Posted Aug 19, 2010 18:34 UTC (Thu) by liljencrantz (guest, #28458)

The advantage of putting all files in the same directory is that it's slightly easier to code it that way. The disadvantage is that you end up with directories whose content effectively can't be listed using ls; you likely can't even count the number of files in them. Basically some kind of storage tar pit. I think I'll stick to using subfolders. And once mailing lists with more than, say, 10 million messages in them become common, I'll start worrying about a subfoldered replacement for Maildir. :-)
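For what it's worth, part of the listing pain comes from ls reading and sorting the whole directory before it prints anything (ls -f skips the sort). A rough sketch of counting entries without building or sorting a full list, using Python's os.scandir, might look like the following; count_entries is a hypothetical helper name, and how well this holds up on a really huge directory still depends on the filesystem.

```python
import os


def count_entries(path):
    """Count directory entries one at a time.

    Hypothetical helper: os.scandir yields entries lazily, so nothing
    is sorted and no complete list of names is held in memory.
    """
    count = 0
    with os.scandir(path) as entries:
        for _ in entries:
            count += 1
    return count


if __name__ == "__main__":
    print(count_entries("."))
```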