
Why kernel.org is slow


Posted Jan 13, 2007 0:17 UTC (Sat) by brouhaha (subscriber, #1698)
Parent article: Why kernel.org is slow

How difficult would it be to revise the ext3 code so that it tries to keep directory blocks contiguous, as it already tries to do for file data? The on-disk structures shouldn't need to change, so it wouldn't break compatibility.

If that's hard to do, how about a tunable parameter that controls how much space is initially allocated to each directory, with the allocator trying to keep that initial allocation contiguous?



Why kernel.org is slow

Posted Jan 18, 2007 10:21 UTC (Thu) by forthy (guest, #1525)

It's a bit of a mystery to me why nobody attacked this problem a long time ago. Directory reads have always been a pain in the neck; you can see how slow they are by comparing locate with find (and by how big the impact of rebuilding the locate database is).

From a more abstract point of view, the directory is a database of file names, with an n:1 relation between file names and parent directories. The ratio of directory size to overall file system size is quite favorable, i.e. the directory data is a small percentage of the total. On a larger file server here with about 1TB of space used, the locatedb (which contains simply everything) is only ~64MB. Even if you use a larger, less space-efficient directory structure, 128MB/TB should be completely sufficient. A modern RAID array can read 128MB in a fraction of a second, and the memory is there to keep it all, so a find / -name '*' can, if well implemented, print a result within a second or less.
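To make the arithmetic above concrete, here is a rough back-of-envelope sketch. The 64MB, 128MB and 1TB figures come from the comment; the sequential throughput of 500MB/s is purely an assumed, hypothetical number for "a modern RAID array", and the function names are made up for illustration.

```python
MB = 1024 ** 2
TB = 1024 ** 4

def metadata_fraction(index_bytes, used_bytes):
    """Share of the used space taken up by the directory index."""
    return index_bytes / used_bytes

def scan_seconds(index_bytes, throughput_bytes_per_s):
    """Time to read the whole index sequentially at a given throughput."""
    return index_bytes / throughput_bytes_per_s

locatedb = 64 * MB               # observed locatedb size (from the comment)
budget = 128 * MB                # more generous per-TB budget (from the comment)
used = 1 * TB                    # space used on the file server (from the comment)
ASSUMED_THROUGHPUT = 500 * MB    # hypothetical sequential read rate, bytes/s

print(f"locatedb share of used space: {metadata_fraction(locatedb, used):.4%}")
print(f"128MB budget share:           {metadata_fraction(budget, used):.4%}")
print(f"full scan of 128MB:           {scan_seconds(budget, ASSUMED_THROUGHPUT):.3f} s")
```

Under that assumed throughput the whole index reads in roughly a quarter of a second, but only if it is laid out contiguously; a fragmented index would be dominated by seeks instead.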

I'd suggest the following to the file system implementors: Forget everything you've read about Unix directories. Start from scratch. Get a decent knowledge of how databases work; the directory is a database, and an extremely simple one at that. Create a single directory file for the directory database, and make sure that it won't fragment much over time (if the directory grows beyond the previously allocated space, allocate a larger space and copy the whole directory over). Do read-ahead and all the other caching for the directory database just as for any other file. Keep the file names easy to access by using a large hash table (stored on disk, not computed on the fly!). The hash key is computed as usual, from the directory id and the file name.
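As a rough illustration of the layout suggested above, the following in-memory sketch keeps all entries in one contiguous table, keys them by directory id plus file name, and grows by allocating a larger table and copying everything over. It is only a toy model under stated assumptions: the class name `DirectoryTable`, the CRC32 hash, linear probing and the 3/4 load limit are all hypothetical choices, deletion is left out, and nothing here reflects how ext3 or any real file system stores directories.

```python
import zlib

class DirectoryTable:
    """Toy open-addressing hash table keyed by (directory id, file name)."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity   # stands in for one contiguous region
        self.used = 0

    @staticmethod
    def _hash(dir_id, name):
        # Hash key built from the directory id and the file name together.
        return zlib.crc32(dir_id.to_bytes(8, "little") + name.encode("utf-8"))

    def _probe(self, slots, dir_id, name):
        # Linear probing: stop at the matching entry or the first free slot.
        i = self._hash(dir_id, name) % len(slots)
        while slots[i] is not None and slots[i][:2] != (dir_id, name):
            i = (i + 1) % len(slots)
        return i

    def insert(self, dir_id, name, inode):
        # Keep the load at or below 3/4 so probing always finds a free slot.
        if (self.used + 1) * 4 > len(self.slots) * 3:
            self._grow()
        i = self._probe(self.slots, dir_id, name)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (dir_id, name, inode)

    def lookup(self, dir_id, name):
        entry = self.slots[self._probe(self.slots, dir_id, name)]
        return entry[2] if entry is not None else None

    def _grow(self):
        # Allocate a larger region and copy the whole table over,
        # rather than chaining on scattered extension blocks.
        bigger = [None] * (len(self.slots) * 2)
        for entry in self.slots:
            if entry is not None:
                bigger[self._probe(bigger, entry[0], entry[1])] = entry
        self.slots = bigger
```

Because the directory id is part of the key, the same file name in two different directories lands in independent slots, and a lookup touches only a few neighbouring entries of one region instead of walking per-directory block lists.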

And for the locking: Make sure that readers never have to lock a directory. They may get stale content when a writer adds or removes files from a directory, but that's OK. You can never rely on getdents() entries still being valid when you open() them later. Writers should use an RCU mechanism for updating directories.
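The reader/writer split described above can be sketched as a copy-and-publish scheme. This is a loose user-space analogy, not the kernel's RCU API: the class `SnapshotDirectory` and its methods are hypothetical, writers are serialized with an ordinary lock, and Python's garbage collector stands in for the grace-period reclamation that real RCU has to do explicitly. It also assumes that rebinding a single attribute is observed atomically by readers, which holds in CPython.

```python
import threading

class SnapshotDirectory:
    """Readers never lock; writers copy, modify, then publish a new snapshot."""

    def __init__(self):
        self._entries = {}                  # never mutated once published
        self._write_lock = threading.Lock() # serializes writers only

    def snapshot(self):
        # Reader side: grab one reference to the current snapshot, no lock.
        return self._entries

    def readdir(self):
        return sorted(self.snapshot())

    def lookup(self, name):
        return self.snapshot().get(name)

    def add(self, name, inode):
        with self._write_lock:
            updated = dict(self._entries)   # copy the current version
            updated[name] = inode           # update the private copy
            self._entries = updated         # publish with one reference swap

    def remove(self, name):
        with self._write_lock:
            updated = dict(self._entries)
            updated.pop(name, None)
            self._entries = updated
```

A reader that took its snapshot before a writer published keeps seeing the old, internally consistent listing. That is the same kind of staleness the comment accepts for getdents(): an entry may be gone by the time it is opened.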

