User: Password:
Subscribe / Log in / New account

Widening ext4's readdir() cookie

Widening ext4's readdir() cookie

Posted Mar 28, 2013 9:07 UTC (Thu) by paulj (subscriber, #341)
In reply to: Widening ext4's readdir() cookie by paulj
Parent article: Widening ext4's readdir() cookie

And further, this bit of the article made me scratch my head and wonder which of my understanding, the description of the issues involved in the article, and/or the code is suboptimal:

"Although there is no notion of an offset inside a B-tree, the implementers of modern filesystems must still support the readdir API (albeit reluctantly); indeed, support for the API is a POSIX requirement. Therefore, it is necessary to find some means of supporting "directory position" semantics. This is generally done by fudging the returned offset value, instead returning an internally understood "cookie" value. The idea is that the kernel computes a hash value that encodes some notion of the current position in a directory (tree) and returns that value (the cookie) to user space.

Encoding the position in the tree in the cookie through a hash? But the tree is indexing something with an absolute order - the very node in the true MUST be associated with a unique value already (e.g. the name), surely? Why would you then hash this?

(I'm assuming I'm having a pre-morning-caffeine stupid moment :) ).

(Log in to post comments)

Widening ext4's readdir() cookie

Posted Mar 28, 2013 9:13 UTC (Thu) by paulj (subscriber, #341) [Link]

I guess the issue must be the order the B-tree is indexing is ephemeral (e.g. it's of the index in the number of entries). But then, is there no way to make that order more stable?

Widening ext4's readdir() cookie

Posted Mar 28, 2013 9:23 UTC (Thu) by dark (guest, #8483) [Link]

I think the problem here is that the underlying ordered data entries are simply too big (they're whole filenames) and telldir() returns a long. No matter how you plan to map the names to long offsets, you'll find some combination of mutations to the directory that makes the offset invalid.

Widening ext4's readdir() cookie

Posted Mar 28, 2013 9:32 UTC (Thu) by paulj (subscriber, #341) [Link]

Yes, names would be too big. Cpurse, you don't need an index into the space of all possible names. You need an index into the entries (not necessarily names) that existed and have existed in this directory. I wonder is there not a way to get a stable order, with a reliable "next" operation, out of that?

Widening ext4's readdir() cookie

Posted Mar 28, 2013 9:57 UTC (Thu) by dlang (subscriber, #313) [Link]

remember, the OS has no idea when (or even if) the application is going to make it's next request. The spec allows the application to read one filename a week and still be guaranteed to see all files that existed when it started the read with no duplicates.

If there were no resource constraints, the system could use RCU type strategies to make a copy of the directory and just use the token as a numeric offset into this copy, but since the system has no way of ever knowing when it could release this copy, it would have to keep it forever.

remember that with NFS, a server is required to be consistent even across reboots.

Widening ext4's readdir() cookie

Posted Mar 28, 2013 10:24 UTC (Thu) by paulj (subscriber, #341) [Link]

E.g. why not make the order a ring? Store last created ID with the directory. Use an index into that order that is easily capable of answering "What is the next value in the order?" - tree indices typically are capable of this.

To create an entry:

- locate the ID in your ordered index (B-tree, whatever)
- Determine the next free ID by walking the index to the next existing ID
- Create your entry with that free ID, update your index
- Update the directory's last created ID

Insertions have to be done atomically wrt directory and index and any readers. Removals do not need to modify the last-created ID in the directory though. But it does not matter a jot how the internal structure of the index changes.

No collisions, no hashes, no encoding the structure of indices into cookies. No nasty hacks. Just work with the grain by making the underlying order you're indexing have the properties you want.

You are limited to # of directory entries being the size of the ring, but it seems NFS compatibility would allow that ring to be (2^64)-1.

Why wouldn't that work?

Widening ext4's readdir() cookie

Posted Mar 28, 2013 11:31 UTC (Thu) by paulj (subscriber, #341) [Link]

Actually the ring isn't necessary. Any way to pick a free ID will do.

Widening ext4's readdir() cookie

Posted Apr 4, 2013 1:56 UTC (Thu) by vasi (guest, #83946) [Link]

HFS+ could do this. Each directory entry has a unique ID (the "catalog node ID"), which fits in a long. You can lookup the CNID to get the file name, and then use that name as your position in the directory.

This is still imperfect. Because each CNID is associated with a file name, hard links require an extra level of indirection. And obviously if the file at the current position is deleted, or renamed, things won't work as expected.

This is all moot, since Linux's HFS+ driver just uses the offset as f_pos, I'm not sure why. Maybe the f_pos values are required to be increasing? CNIDs are not.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds