
Network filesystem topics

By Jake Edge
May 21, 2018


At the 2018 Linux Storage, Filesystem, and Memory-Management Summit (LSFMM), Steve French led a discussion of various problem areas for network filesystems. Unlike previous sessions (in 2016 and 2017), there was some good news to report because the long-awaited statx() system call was released in Linux 4.11. But there is still plenty of work to be done to better support network filesystems in Linux.

French said that statx() was a great addition that would help multiple filesystems that do not use local block devices for their storage; that includes Samba using SMB 3.1.1 and NFS 4.2. The "birth time" (or creation time) attribute is "super important" for Samba, he said. The next step is to get more of the Windows attribute bits supported in statx() and also in the FS_IOC_[GS]ETFLAGS ioctl() commands.

[Steve French]

There are numerous features that Windows provides, but Linux does not, which makes life more difficult for network filesystems. There is no way to do safe caching of file and directory data because leases and delegations are not supported on Linux servers. Also, there still is no support for rich access-control lists (RichACLs) despite lots of work and testing that went on over the years. There has not been much patch activity lately, he said, but Andreas Gruenbacher has posted 28 versions of the patch set over time. The problems that have cropped up are generally due to trying to map user IDs and the like between three separate domains (perhaps server, client, and on-disk, though French did not say).

Broader support for the variants of the fast copy operation is badly needed, he said. The cp --reflink command uses the FICLONERANGE ioctl() command, but not copy_file_range(); in fact, no utilities use copy_file_range(), though it should be the default. The system call will fall back to other forms of copying, if needed, but can make the copy operation complete thousands of times faster in many cases. French said he got an email from a user asking about a copy operation in the cloud that was taking an hour or so. He suggested using a different command, which was faster, but the customer asked why cp (and other tools such as rsync) did not simply use the faster operation.
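As a rough illustration of what "copy_file_range() by default, with a fallback" could look like in a copy utility, here is a minimal Python sketch. It is not taken from the session or from cp or rsync: the function name fast_copy(), the 8MB default chunk, and the set of errno values treated as "fall back to an ordinary copy" are all assumptions made for the example. os.copy_file_range() is Python's wrapper around the system call (Linux only, Python 3.8 and later).

```python
import errno
import os
import shutil

# Assumption: errors for which retrying with a plain read/write copy is reasonable
# (call unsupported, cross-filesystem copy refused, or blocked by a sandbox).
_FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EINVAL,
                    errno.EOPNOTSUPP, errno.EPERM}


def fast_copy(src_path, dst_path, chunk=8 * 1024 * 1024):
    """Copy src_path to dst_path and return the name of the method used.

    Hypothetical helper: tries copy_file_range() first so the kernel (or the
    server, for a network filesystem) can do the work, then falls back to a
    user-space read/write loop.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if hasattr(os, "copy_file_range"):
            try:
                remaining = os.fstat(src.fileno()).st_size
                while remaining > 0:
                    copied = os.copy_file_range(
                        src.fileno(), dst.fileno(), min(chunk, remaining))
                    if copied == 0:  # source shrank underneath us
                        break
                    remaining -= copied
                return "copy_file_range"
            except OSError as exc:
                if exc.errno not in _FALLBACK_ERRNOS:
                    raise
                # Start over from the beginning with the slow path.
                src.seek(0)
                dst.seek(0)
                dst.truncate()
        shutil.copyfileobj(src, dst, chunk)
        return "read/write"
```

Whether the fast path actually avoids moving data through the client depends on the filesystem and kernel in use; the sketch only shows the calling pattern.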

Case-insensitive lookups are another problem area; Samba emulates them, but it is expensive to do so. Ric Wheeler noted that XFS supports doing case-insensitive lookups while preserving the case of the filenames on disk; he suggested perhaps doing the same in user space for Samba. French said that might make sense as this problem has been around for a long time.
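To see why emulating case-insensitive lookups in user space is expensive, consider the following minimal Python sketch. It is not Samba's code: the function name lookup_ci() is hypothetical, and Python's str.casefold() stands in for whatever case-folding rules a real server would apply. The point is only that a lookup that misses on the exact name has no choice but to read the whole directory.

```python
import os


def lookup_ci(directory, name):
    """Return a name in `directory` matching `name` without regard to case,
    or None if there is no such entry. Hypothetical helper for illustration.
    """
    # Fast path: an exact-case match costs a single lookup.
    if os.path.lexists(os.path.join(directory, name)):
        return name

    # Slow path: every entry must be read and compared, so the cost grows
    # with the size of the directory, even when the file does not exist.
    wanted = name.casefold()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.casefold() == wanted:
                return entry.name
    return None
```

A filesystem that handles case folding itself can answer the same question from its directory index without the full scan.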

In general, macOS and Windows are both SMB friendly, but Linux is not, he said. He did, however, describe a demo at a recent storage conference where different clients on a "bad hotel network" were all able to edit the same file using SMB. It was rather eye-opening, especially when compared to ten years ago, to see Linux, macOS, Windows, Android, and iOS all interoperating that way.

Many of the standard utilities are not transferring data in large enough chunks. For example, rsync defaults to 4KB and the largest it will use is 128KB, but NFS is able to handle much larger transfers and SMB can handle larger ones still. For the network filesystems, transferring 8MB chunks would make much more sense.
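Some back-of-the-envelope arithmetic shows the scale of the difference. The short Python sketch below counts how many chunk-sized transfers it takes to move a 1GB file at each of the sizes mentioned above; the helper name requests_needed() and the 1GB file size are assumptions for illustration, and the count is only a rough proxy, since the kernel and the protocol may split or coalesce requests.

```python
def requests_needed(file_size, chunk_size):
    """Number of chunk-sized transfers needed to move file_size bytes."""
    return -(-file_size // chunk_size)  # ceiling division


KiB, MiB, GiB = 1024, 1024 ** 2, 1024 ** 3

for label, chunk in (("4KB", 4 * KiB), ("128KB", 128 * KiB), ("8MB", 8 * MiB)):
    print(f"{label:>6}: {requests_needed(GiB, chunk)} transfers for a 1GB file")
```

That works out to 262144 transfers at 4KB, 8192 at 128KB, and 128 at 8MB, each of which may cost a network round trip.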

He mentioned a double handful of other features that would make things easier for Samba, NFS, and others, but it was not clear who was working on those features or planning to do so—something that is also true for some of the features mentioned earlier. For example, Dave Chinner said that someone needs to update cp to bring it into the copy_file_range() world. French said that he had sent some patches to the rsync maintainers (who may well be easier to find than cp maintainers), but that there was no response. The upshot was that network filesystems, especially those that are meant to interoperate with Windows, are not getting the attention that they need from the Linux world.





Network filesystem topics

Posted May 21, 2018 21:43 UTC (Mon) by jra (subscriber, #55261) [Link] (7 responses)

Just a comment:

"Ric Wheeler noted that XFS supports doing case-insensitive lookups while preserving the case of the filenames on disk; he suggested perhaps doing the same in user space for Samba."

Samba has done this for many years already of course.

Network filesystem topics

Posted May 22, 2018 1:02 UTC (Tue) by dgc (subscriber, #6611) [Link] (6 responses)

> Samba has done this for many years already of course.

Yes, it has.

Just to set the record straight, the XFS ascii-ci implementation was done (>10 years ago, IIRC) specifically for avoiding the Samba CI code for performance reasons. On non-trivial directory sizes, the XFS implementation is thousands of times faster than the Samba CI code because it doesn't have to read the entire directory contents on each lookup to search for CI matches.

There are patches for kernel filesystem UTF-8 CI support (which has the same benefits but for UTF-8 encoded names), but there have been issues with them that are being worked through at the moment.

-Dave.

Network filesystem topics

Posted May 22, 2018 3:39 UTC (Tue) by jra (subscriber, #55261) [Link] (2 responses)

Yeah, I wasn't trying to imply we did it elegantly (you simply *can't* avoid reading the entire directory when you get a cache miss on case when you're in userspace), only that we already did it :-).

Network filesystem topics

Posted May 22, 2018 13:06 UTC (Tue) by epa (subscriber, #39769) [Link] (1 responses)

I suppose you could do some hack like testing for this, THIS and This, before you fall back to reading the whole directory.

Network filesystem topics

Posted May 22, 2018 15:39 UTC (Tue) by jra (subscriber, #55261) [Link]

That sounds like a good idea, but in fact just about all of the clients already do case-preserving correctly, so when we get a miss it's *extremely* likely that the file doesn't exist (and the client is just making sure that's so).

These code paths are already some of the most complex in Samba (getting the *absolutely* correct error message returns here is *essential* to make real applications work) so I'm loath to add any more complexity here.

Network filesystem topics

Posted May 23, 2018 13:03 UTC (Wed) by trondmy (subscriber, #28934) [Link] (2 responses)

The problem with networked filesystems is that typically the server owns the case folding algorithm, and the client has no a priori understanding of how it works. When looking at this problem for NFSv4, we found it basically means that we either have to accept a certain amount of dentry aliasing, or we have to impose artificial limitations in order to avoid that behaviour (e.g. by limiting the number of cached dentries to 1 per file per directory).

The easiest solution is to allow for dentry aliasing, but that then leads to interesting corner cases when creating, linking, renaming or unlinking files. For instance, you end up no longer being able to assume that cached dentries are still valid after an unlink or rename operation, and you can no longer perform negative dentry caching when files are created or linked to...

Network filesystem topics

Posted May 24, 2018 12:56 UTC (Thu) by JFlorian (guest, #49650) [Link] (1 responses)

I'd argue that the true "easiest" solution is a case-sensitive filesystem, but obviously that ship sailed with its disaster in tow. What an unfortunate decision.

Network filesystem topics

Posted May 25, 2018 2:12 UTC (Fri) by zlynx (guest, #2285) [Link]

I'm not sure which you're claiming as the disaster.

I would claim that anything other than raw bytes is the disaster. Especially after my experience with OSX and HFS where I once had to completely reformat a laptop to remove a few files after Apple decided to change their Unicode normalization rules.

There was simply no way to get to the old filenames and no way to access the files.

Let the GUI deal with case sensitivity if it must. Let the OS API access a file by an unambiguous stream of bytes terminated by a null.


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds