LWN.net Logo

The value of negative dentries

A "directory entry" (dentry) is an internal data structure used to hold the results of looking up a file in the filesystem. The Linux "dentry cache" keeps a number of recently used dentries around; they tend to be useful, since files are often accessed more than once over a short period of time. Finding a file in the dentry cache can save a lot of time by avoiding a full filesystem lookup.

The kernel also hangs on to "negative dentries," which indicate that the given file does not exist. Andrea Arcangeli recently noted that these negative dentries can take up quite a bit of memory, and wondered what possible use they could be. His message included a patch to force negative dentries out of memory quickly.

It turns out, though, that "this file does not exist" can be useful information. A quick strace run on a GNOME application, for example, turns up dozens of lookups on nonexistent files as the application gropes around looking for the unbelievable number of libraries it needs. Similarly, apache is continually looking for .htaccess files, shells look for executables, etc. It is more than worthwhile to be able to determine that a file doesn't exist without an expensive filesystem call - especially for file names that are often looked up. So negative dentries will stay.

There is one optimization that can be made, though. In Andrea's case, the negative dentries were created by deleting a large directory full of files. When a file is deleted, it is relatively unlikely that it will be looked up again soon, and keeping a negative dentry around is less useful. In this case, perhaps, it is better to just forget about the file name altogether.


(Log in to post comments)

The value of negative dentries

Posted Jun 6, 2002 20:24 UTC (Thu) by miketa (guest, #1660) [Link]

A rather too long time ago, when I was doing my PhD on file systems, I developed a system which used what amounts to a dentry cache, and found the same thing: negative entries take up a lot of space but are greatly improved the cache hit rate. I was able to improve things considerable by keeping a per-directory hash table, where each hash entry was a single bit, hashed on the entry name, setting the bits corresponding to each entry (this was quite easy in my case as the underlying structure kept directories entries on hash chains). On lookup, the first action was to hash the entry name (which would be needed anyway) then then look up the bit entry - if it was zero then the entry definitely *did not* exist. Since the hash tables were generally sparse, this was a real win. The costliest operation would be delete, since if the underlying structure is not a hash table you'd need to rehash all entries, unless you made each hash entry a count, limited to a few bits, eg., with two bits the four possible values would mean zero-one-two-many (so delete would decrement the count unless it was "many").

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds