Ostensibly, the Linux Storage, Filesystem, and Memory Management (LSFMM) Summit is broken up into three tracks, but for the most part there is enough overlap between the Storage and Filesystem parts that joint sessions between them are the norm. The only session where that wasn't the case was a discussion led by Chuck Lever and Jim Lieb that spanned two slots. It covered user-space file servers and, in particular, FedFS and user-space NFS. As I was otherwise occupied in the Storage-only track, Elena Zannoni sat in on the discussion and provided some detailed notes on what went on.
Lever kicked things off by describing FedFS (short for "Federated Filesystem") as a way for administrators to create a single root namespace from disparate portions of multiple remote filesystems from multiple file servers. For that to work, there are filesystem entities known as "referrals" that contain the necessary information to track down the real file or directory entry on another filesystem. By building up referrals into the hierarchy the administrator wants to present, a coherent "filesystem" made up of files from all over can be created.
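As a rough illustration of how referrals stitch a namespace together, here is a minimal in-memory sketch. It is not the FedFS on-disk format or any FedFS API; the `REFERRALS` table, the `resolve()` helper, and the server names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical referral table: a path in the unified namespace maps to one or
# more (server, export_path) locations. All names here are made up.
REFERRALS: Dict[str, List[Tuple[str, str]]] = {
    "/global/projects": [("fs1.example.net", "/export/projects")],
    "/global/home": [("fs2.example.net", "/srv/home"),
                     ("fs3.example.net", "/srv/home")],
}


def resolve(path: str) -> List[Tuple[str, str]]:
    """Return the (server, remote_path) candidates for a path.

    Walk up from the full path to find the deepest referral that is a
    prefix of it, then append the remainder to each referral target.
    An empty list means no referral was crossed (the path is local).
    """
    parts = [p for p in path.split("/") if p]
    for depth in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:depth])
        if prefix in REFERRALS:
            rest = "/".join(parts[depth:])
            return [(server, export + ("/" + rest if rest else ""))
                    for server, export in REFERRALS[prefix]]
    return []
```

In this model, a lookup of `/global/projects/kernel/README` is redirected to `/export/projects/kernel/README` on `fs1.example.net`, while a path that crosses no referral resolves to an empty list.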
Referrals are physically stored in objects known as "junctions". Samba uses symbolic links to store the metadata needed to access the referred-to data, while FedFS uses the extended attributes (xattrs) on an empty directory for that purpose. At last year's meeting, it was agreed to converge on a single format for junctions. Symbolic links would be used, though Samba would still use the linked-to name, while FedFS would use xattrs attached to the link.
Since then, that decision was vetoed by the NFS developers, Lever said. So FedFS will stay with the empty directory to represent junctions. These empty directories are similar to the Windows concept of "reparse points", according to Steve French. It is an empty directory with a special attribute to distinguish it from a normal empty directory.
It would be nice to be able to add a new type of inode (or mode bit) to the virtual filesystem (VFS) to support that, French said. But Ted Ts'o noted that doing so would require all filesystems to change to support it.
Lever also explained that a single administrative interface that could manage junctions for both Samba and NFS is needed. In answer to a question from Jeff Layton, Lever said that FedFS was looking for help from the kernel on reparse points (or junctions), as well as performance help to reduce lookups for discovering these referrals and where they go. Lieb noted that the latter would also help the Ganesha user-space NFS file server.
Layton went on to explore why the symlink scheme could not be used, but Trond Myklebust was clear that using symbolic links was too ugly and hacky. It also limited the referral information to a single page, so supporting multiple protocols for a single referral was difficult. It is a "nasty hack" that Samba uses, but it is not sensible to spread it further, Myklebust said. It was agreed that more discussion was needed before any kind of proposal could be made.
The session switched over to Lieb at that point. He had a number of topics where user-space file servers (like Ganesha) needed kernel help. The first is for file-private locks, which may have been solved with Layton's work on that feature. Lieb hopes to see that get merged in the 3.15 merge window. The next step will be to get GNU C library (glibc) patches to support the new style of locks merged.
Another problem for Ganesha is with filtering inotify events. It would like to be able to get events for anything that some other filesystem does to the exported directories, while not getting notified for events generated by its own activities. The inotify events are used to invalidate caches in Ganesha and it is getting swamped by events from its own actions, Lieb said. Patches have been posted, but he would like to see the feature get added.
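To make the problem concrete, here is a sketch of the kind of user-space bookkeeping a server might use to ignore its own events. It is an assumption-laden model, not Ganesha's code and not what the posted patches do; the `SelfEventFilter` class and its method names are hypothetical, and the event names are plain strings rather than real inotify masks.

```python
from collections import Counter


class SelfEventFilter:
    """Hypothetical helper: remember the server's own pending changes so the
    matching notifications can be dropped instead of invalidating a cache."""

    def __init__(self) -> None:
        self._expected: Counter = Counter()

    def about_to_touch(self, path: str, event: str) -> None:
        # Called by the server just before it modifies an exported file.
        self._expected[(path, event)] += 1

    def should_invalidate(self, path: str, event: str) -> bool:
        # Called for every delivered event; returns False for events that
        # the server itself is known to have caused.
        key = (path, event)
        if self._expected[key] > 0:
            self._expected[key] -= 1
            return False
        return True
```

Note that a filter like this still has to receive and examine every event, so it does nothing to reduce the volume of notifications; that is why filtering in the kernel is the feature being asked for.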
Dealing with user credentials is another area where Ganesha could use some kernel help. Right now, for many operations, Ganesha must make seven system calls to perform what is (conceptually) one operation. It must call setfsuid(), setfsgid(), and setgroups(), then perform the operation, then unwind the three credentials calls. Lieb would like a simpler way, with fewer system calls.
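The seven-call pattern can be sketched as a trace. The code below does not issue any real system calls; it only records the sequence, assuming the filesystem-credential calls setfsuid() and setfsgid() plus setgroups(). The `serve_request()` helper and its parameters are hypothetical.

```python
from typing import List, Sequence


def serve_request(trace: List[str], uid: int, gid: int,
                  groups: Sequence[int], operation: str,
                  server_uid: int = 0, server_gid: int = 0) -> None:
    """Append the calls needed for one client operation to `trace`.

    This is a model of the call sequence only; nothing is executed.
    """
    # Switch to the client's identity (three calls).
    trace.append(f"setfsuid({uid})")
    trace.append(f"setfsgid({gid})")
    trace.append(f"setgroups({list(groups)})")
    # The one call the client actually asked for.
    trace.append(operation)
    # Unwind back to the server's own identity (three more calls).
    trace.append("setgroups([])")
    trace.append(f"setfsgid({server_gid})")
    trace.append(f"setfsuid({server_uid})")
```

Calling `serve_request()` once with an empty list leaves seven entries in it, only one of which is the operation itself.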
After a long discussion, it became clear that what Lieb was looking for was a way to cache a particular set of credentials in the kernel that could be recalled whenever Ganesha needed to do an operation as that user. Currently, the constant setup and teardown of the credentials information is time-consuming. Ts'o thought he had a solution for that problem and he suggested that he and Lieb finish the discussion outside of the meeting.
The never-ending battle for some sort of enhanced version of readdir() was next up. There is a need to get much more information than what readdir() provides. A number of proposals have been made over the years, including David Howells's xstat() system call. Those proposals have all languished for lack of someone driving the effort. The older patches need to be resurrected, refreshed, and reposted, but it will require finding someone to push them before they will be merged.
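The cost being complained about shows up whenever a program wants names and attributes together. The sketch below uses Python's `os.listdir()` and `os.lstat()` to mirror the readdir()-then-stat pattern; the `list_with_attributes()` and `demo()` helpers are made up for illustration.

```python
import os
import tempfile
from typing import Dict, Tuple


def list_with_attributes(directory: str) -> Dict[str, Tuple[int, int]]:
    """Return {name: (size, mode)} using one stat call per entry."""
    result = {}
    for name in os.listdir(directory):      # names only, like readdir()
        # One additional call per name just to learn its attributes.
        info = os.lstat(os.path.join(directory, name))
        result[name] = (info.st_size, info.st_mode)
    return result


def demo() -> Dict[str, Tuple[int, int]]:
    # Build a throwaway directory with two small files and list it.
    with tempfile.TemporaryDirectory() as d:
        for name, data in (("a.txt", b"hello"), ("b.txt", b"")):
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        return list_with_attributes(d)
```

For a directory with N entries, this costs N extra calls on top of the directory read; an enhanced readdir() would aim to return that information in one pass.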
The last problem discussed was support for access-control lists (ACLs). There are two kinds of ACLs in use today: POSIX and rich (NFSv4) ACLs. They have different semantics, and the question is how the kernel can support both. Currently, the kernel has support for POSIX ACLs, but Samba, NFSv4, and others use the rich ACLs.
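One way to see the difference in semantics is to compare simplified versions of the two access checks. The sketch below is a model under stated assumptions, not kernel code: the dictionary layout, the `"user:1000"` and `"everyone@"` strings, and all function names are hypothetical, and many details (inheritance, the full NFSv4 permission set, special identifiers) are left out.

```python
from typing import Dict, List, Set, Tuple

Perms = Set[str]


def posix_acl_allows(acl: Dict, uid: int, gids: Set[int], want: Perms) -> bool:
    """Simplified POSIX check: pick ONE class of entry, most specific first."""
    mask = acl.get("mask", {"r", "w", "x"})
    if uid == acl["owner"]:
        return want <= acl["user_obj"]             # owner entry, not masked
    if uid in acl.get("users", {}):
        return want <= (acl["users"][uid] & mask)  # named user entry, masked
    group_entries = []
    if acl["group"] in gids:
        group_entries.append(acl["group_obj"])
    group_entries += [p for g, p in acl.get("groups", {}).items() if g in gids]
    if group_entries:
        # A single matching group entry must grant everything requested.
        return any(want <= (p & mask) for p in group_entries)
    return want <= acl["other"]


def _matches(who: str, uid: int, gids: Set[int]) -> bool:
    if who == "everyone@":
        return True
    kind, _, ident = who.partition(":")
    if kind == "user":
        return int(ident) == uid
    if kind == "group":
        return int(ident) in gids
    return False


def rich_acl_allows(aces: List[Tuple[str, str, Perms]], uid: int,
                    gids: Set[int], want: Perms) -> bool:
    """Simplified NFSv4-style check: walk the entries in order and
    accumulate allowed bits until all are granted or one is denied."""
    remaining = set(want)
    for ace_type, who, perms in aces:
        if not remaining:
            break
        if not _matches(who, uid, gids):
            continue
        if ace_type == "deny" and remaining & perms:
            return False
        if ace_type == "allow":
            remaining -= perms
    return not remaining


def demo() -> Tuple[bool, bool]:
    # User 1000 has a read-only user entry but belongs to group 50,
    # which has read and write access.
    posix = {"owner": 0, "group": 0, "user_obj": {"r", "w"},
             "group_obj": {"r"}, "other": set(), "mask": {"r", "w"},
             "users": {1000: {"r"}}, "groups": {50: {"r", "w"}}}
    rich = [("allow", "user:1000", {"r"}), ("allow", "group:50", {"r", "w"})]
    want = {"w"}
    return (posix_acl_allows(posix, 1000, {50}, want),
            rich_acl_allows(rich, 1000, {50}, want))
```

In this model the same request for write access is refused by the POSIX check, because the named-user entry is the only one consulted, and granted by the rich check, because permissions accumulate across every matching entry.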
One possible solution would be to just add rich ACLs to Linux, essentially sitting parallel to the existing POSIX ACLs. But Al Viro believes it is too complicated to have two similar features with slightly different semantics both in the kernel. There is also some thought that perhaps POSIX ACLs could be emulated by rich ACLs, but it is unclear if that is true. In the end, the kernel needs to do the ACL checking, since races and other problems are present if user space tries to do that job. The slot ended before much in the way of conclusions could be drawn.
[ Thanks to Elena Zannoni for her extensive notes. Thanks, also, to the Linux Foundation for travel support to attend LSFMM. ]
Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license