
Widening ext4's readdir() cookie


Posted Mar 29, 2013 2:56 UTC (Fri) by neilbrown (subscriber, #359)
In reply to: Widening ext4's readdir() cookie by paulj
Parent article: Widening ext4's readdir() cookie

"jfs" has a fairly simple second index. Each directory entry contains a stable index number, and there is a table indexed by this number that leads to the name in the btree. Look for "modify_index" in jfs_dtree.c and explore uses of "struct dir_table_slot".

Some problems with this are:
- readdir will not return names in natural B-tree order, so you will get lots of seeking. This is not a problem when the whole directory fits easily in cache, but may be a problem for very large directories.
- every B-tree modification (page-split, page-delete) requires updating random entries in the index table.

i.e. there is a genuine IO cost here.
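
To make the second-index idea concrete, here is a toy model in Python. It is only a sketch under stated assumptions: a sorted list stands in for the B-tree, a dict stands in for the index table, and the names `ToyDir`, `create`, `unlink` and `readdir` are made up for illustration. It does not reflect the real jfs on-disk layout or the code in jfs_dtree.c.

```python
import bisect


class ToyDir:
    """Toy model of a directory with a second index (not the real jfs format)."""

    def __init__(self):
        self.btree = []      # names in sorted order, standing in for the B-tree
        self.table = {}      # stable index number -> name
        self.next_index = 1  # hypothetical starting value for index numbers

    def create(self, name):
        """Add a name and return its stable index number."""
        bisect.insort(self.btree, name)
        idx = self.next_index
        self.next_index += 1
        self.table[idx] = name
        return idx

    def unlink(self, name):
        """Remove a name from both the tree and the index table."""
        self.btree.remove(name)
        idx = next(i for i, n in self.table.items() if n == name)
        del self.table[idx]

    def readdir(self, cookie=0):
        """Yield (cookie, name) for entries with index > cookie, in index order."""
        for idx in sorted(self.table):
            if idx > cookie:
                yield idx, self.table[idx]
```

In this model, creating "zebra", "apple" and "mango" in that order makes readdir return them in creation order, while the tree holds them sorted. That mismatch is where the seeking comes from: each name readdir returns may live in a different tree block from the previous one.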

Having readdir return entries in natural B-Tree order is clearly appealing, and a separate table doesn't really allow this.

By "Internal chaining" I mean that if a particular hash value is in use, you add 1 and try again. When you find a hash value to use, you store it (or the difference between the hash of the name and the chosen value, which can probably be stored with fewer bits) with the name. When you delete entries you need to be careful to leave place-holders if any subsequent entries are part of an internal chain leading from here.
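
A minimal sketch of that scheme, again in Python and again with made-up names (`ChainedDir`, `hash_fn`, `TOMBSTONE`). A dict keyed by the chosen hash value stands in for the on-disk tree. The delete rule is deliberately conservative: it leaves a place-holder whenever the next hash value is in use, without checking whether that neighbour really belongs to a chain passing through this slot.

```python
TOMBSTONE = None  # place-holder that keeps longer chains reachable


class ChainedDir:
    """Toy directory where colliding names take the next free hash value."""

    def __init__(self, hash_fn, hash_bits=32):
        self.hash_fn = hash_fn
        self.mask = (1 << hash_bits) - 1
        self.slots = {}  # chosen hash value -> (name, delta) or TOMBSTONE

    def _home(self, name):
        return self.hash_fn(name) & self.mask

    def lookup(self, name):
        """Return the chosen hash value for name, or None if absent."""
        v = self._home(name)
        while v in self.slots:  # walk the chain, stepping over place-holders
            entry = self.slots[v]
            if entry is not TOMBSTONE and entry[0] == name:
                return v
            v = (v + 1) & self.mask
        return None

    def insert(self, name):
        """Store name at the first free value at or after its hash."""
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        home = self._home(name)
        v = home
        while v in self.slots and self.slots[v] is not TOMBSTONE:
            v = (v + 1) & self.mask
        # Keep the (usually tiny) difference from the real hash with the name.
        self.slots[v] = (name, (v - home) & self.mask)
        return v

    def delete(self, name):
        v = self.lookup(name)
        if v is None:
            raise FileNotFoundError(name)
        if ((v + 1) & self.mask) in self.slots:
            self.slots[v] = TOMBSTONE  # a chain may continue past this slot
            return
        del self.slots[v]
        # Place-holders directly before a free slot no longer protect anything.
        v = (v - 1) & self.mask
        while v in self.slots and self.slots[v] is TOMBSTONE:
            del self.slots[v]
            v = (v - 1) & self.mask

    def readdir(self, after=-1):
        """Yield (cookie, name) in hash order; the cookie is the chosen value."""
        for v in sorted(self.slots):
            if v > after and self.slots[v] is not TOMBSTONE:
                yield v, self.slots[v][0]
```

Because every stored value is unique, it can serve directly as the readdir cookie, and iteration follows the same order as the tree keys.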

In the common case of no collisions there is zero IO overhead (whereas with the jfs approach there is always IO overhead). In the rarer case where there are collisions, there is an overhead that probably gets substantially worse as the chain length increases. As long as any internal chain is within one block there is a processing overhead but no IO overhead. Once it crosses a block boundary you get a real overhead, but probably comparable with the JFS overhead. Once you have a chain touching three blocks it probably gets a lot worse. But that should be extremely unlikely.

So it seems like a solution with excellent common-case performance, which degrades only in rare circumstances. Worst case performance (where all names hash to the same value) is probably not much worse than a linear search.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds