Dealing with negative dentries
Posted May 10, 2022 6:20 UTC (Tue) by nickodell (subscriber, #125165)
Parent article: Dealing with negative dentries
Here's a short summary of why it happened:
1. In the linked bug, curl is being used as a Docker healthcheck for Elasticsearch. It's getting run every second and creating ~20,000 negative dentries each time.
2. NSS is a library for loading and validating SSL certificates. It's used in curl and Firefox. It can load a set of certificate authorities from the filesystem.
3. But loading these is slow, so it builds an indexed SQLite database of SSL certs to make cert lookup fast.
4. The directory the database is built in could be on a network filesystem. In that case, for speed, the database should be built up in memory, *then* written to disk in one go.
5. But how can it tell whether it's on a slow network FS or a fast local FS? It measures using stat(). But if it stats a file that exists, the file could already be in the dentry cache, which would throw off the measurement. So it measures the stat() time of a nonexistent file with a random number in the filename.
6. It repeats that measurement for 10,000 files or for 33ms, whichever comes first. It also measures both /tmp and the directory it loads the certificates from.
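Points 5 and 6 boil down to a small timing loop. Here's a rough Python sketch of the idea, assuming nothing about NSS's real code (which is C): `measure_lookup_rate` and the `probe-` filename prefix are made up for illustration. It stats distinct, randomly named files that shouldn't exist until it hits either the file cap or the time budget:

```python
import os
import random
import time

def measure_lookup_rate(directory, max_files=10_000, budget_ms=33):
    """Hypothetical sketch: stat nonexistent files in `directory` until
    `max_files` lookups are done or `budget_ms` milliseconds have passed.
    Returns (number_of_lookups, elapsed_seconds)."""
    # Random prefix plus a counter, so every name is distinct and every
    # lookup misses the dentry cache instead of hitting an earlier miss.
    prefix = os.path.join(directory, "probe-%d-" % random.randrange(1 << 30))
    start = time.monotonic()
    deadline = start + budget_ms / 1000.0
    lookups = 0
    while lookups < max_files:
        try:
            os.stat(prefix + str(lookups))
        except FileNotFoundError:
            pass  # the expected case; on Linux each miss can leave a negative dentry
        lookups += 1
        if time.monotonic() >= deadline:
            break
    return lookups, time.monotonic() - start
```

On a fast local disk, the 33ms budget is rarely the limit, so the cap is what stops the loop. Run it once for /tmp and once for the cert directory and you get up to 2 × 10,000 lookups, which lines up with the ~20,000 negative dentries per curl run in point 1.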
There are a couple of contributing factors to the problem:
1. The developers of NSS, Mozilla, mostly care about performance for Firefox. In Firefox, NSS is loaded only once per process. curl also loads NSS once per process, but each curl process is much shorter-lived.
2. Firefox is used on desktop systems, which get rebooted more often.
3. AFAIK, there's no good cross-platform way to determine if a path is on a network filesystem.
4. Elasticsearch gets run on systems with gobs and gobs of memory, and it's not rebooted for a really long time.
The story does have a happy ending, though. These days, NSS makes a temporary dir, looks up its 10,000 nonexistent files in that directory, and deletes the temporary dir. As I understand it, that cleans up all but one negative dentry.
https://github.com/nss-dev/nss/blob/18668d2e34500a6f14b68...
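Here's a rough sketch of that fixed approach, again in Python and again only an illustration rather than the actual NSS change (`measure_in_scratch_dir` is a made-up name). The only difference is that the misses all happen inside a fresh directory that gets removed afterwards:

```python
import os
import tempfile
import time

def measure_in_scratch_dir(parent, max_files=10_000, budget_ms=33):
    """Hypothetical sketch: time stat() misses inside a throwaway directory
    under `parent`, then remove that directory.
    Returns (number_of_lookups, elapsed_seconds)."""
    scratch = tempfile.mkdtemp(prefix="probe-", dir=parent)
    try:
        start = time.monotonic()
        deadline = start + budget_ms / 1000.0
        lookups = 0
        while lookups < max_files:
            try:
                # The directory is brand new, so plain counter names never exist.
                os.stat(os.path.join(scratch, str(lookups)))
            except FileNotFoundError:
                pass
            lookups += 1
            if time.monotonic() >= deadline:
                break
        return lookups, time.monotonic() - start
    finally:
        # Removing the (still empty) directory should let the kernel drop the
        # negative dentries cached beneath it, leaving roughly one behind
        # for the directory itself (as described above, not verified here).
        os.rmdir(scratch)
```

The measurement is the same as before. The try/finally just makes sure the scratch directory goes away even if the loop raises.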
But it's easy to see why the NSS developers started looking up nonexistent files - it solved a pressing performance problem in a cross-platform way that looked to be nearly free. It seems like negative dentries are something of an attractive nuisance - easy to misuse without knowing the performance costs.
