
Poettering: Revisiting how we put together Linux systems


Posted Sep 1, 2014 14:13 UTC (Mon) by ken (subscriber, #625)
Parent article: Poettering: Revisiting how we put together Linux systems

I wonder how well this will work with dependencies that are not filesystem based. Any IPC that relies on one server running on the system is still going to be problematic.

What do you do when the base OS runs a version that is incompatible with what the app is expecting? Somehow force two versions to run anyway?



Poettering: Revisiting how we put together Linux systems

Posted Sep 1, 2014 16:00 UTC (Mon) by mezcalero (subscriber, #45103) [Link] (26 responses)

Yes, the design reduces the API compatibility question a lot, but does not remove it. I mean, there needs to be an interface somewhere, if we want components to talk to each other.

In our concept we put a lot of emphasis on the kernel/userspace interfaces, on the bus interfaces and on file system formats. So far the kernel API has probably been the most stable API of all we have ever had on Linux, so we should be good on that. File formats tend to stay pretty compatible too over a long period of time. Currently we suck, though, at keeping bus APIs stable. If we ever want to get to a healthy app ecosystem, this is where we need to improve things.

Poettering: Revisiting how we put together Linux systems

Posted Sep 1, 2014 16:21 UTC (Mon) by ken (subscriber, #625) [Link] (25 responses)

Well, I'm not so sure about the file formats. At least not when it comes to configuration data.

I once had home directories on NFS and mounted them from different computers. But that has not worked well in years, as the "desktop" people apparently can't handle having different versions of the same program.

I won't repeat what came out of my mouth when I tested a new version of a distro but used my old home directory: evolution converted the on-disk storage format to a new one but failed to understand its own config, so nothing worked in the new version, and it obviously totally broke everything for the old version. I don't run evolution anymore, or try to use the same homedir from different distro versions.

In the 90s I used the same NFS home dir for Solaris and different Linux versions; now doing it with only Linux and different versions of the same distro is just asking for trouble.

Poettering: Revisiting how we put together Linux systems

Posted Sep 1, 2014 22:52 UTC (Mon) by droundy (subscriber, #4559) [Link]

I am also frustrated with the unwillingness of desktop developers to handle the shared-home-directory configuration. At least browsers refuse to run, but my experience has been that Bad Things happen if I run gnome on multiple computers using the same account with a NFS shared home directory.

Interestingly, if this plan were to take off, it might force desktop developers to be more considerate in what they do with your home directory, since at its essence the scheme uses a single home directory for multiple running operating systems.

Poettering: Revisiting how we put together Linux systems

Posted Sep 2, 2014 0:16 UTC (Tue) by mezcalero (subscriber, #45103) [Link] (23 responses)

Well, file formats are one thing. Concurrent random access to the same files is another.

At parsing old file formats our software has generally been pretty good. Doing concurrent access on the same files is a much, much harder problem. And quite frankly I don't think it is really a worthy goal, and it's something people seldom test. In particular, since NFS systems are usually utter crap, you probably find more NFS setups where locking says it works but actually doesn't and is a NOP.

If it were up to me I'd change GNOME to try to lock the home directory as soon as you logged in, so that you can only have a single GNOME session at a time on the same home directory. It's the only honest thing to do, since we don't test against this kind of parallel use. However, we can't actually do such a singleton lock, since NFS is a pile of crap, and as mentioned locking more often doesn't work than it does. And you cannot really emulate locking with renames and the like, because then you get no automatic cleanup of the locks when the GNOME session terminates abnormally.

Or in other words: concurrent graphical sessions on the same $HOME are fucked...

Poettering: Revisiting how we put together Linux systems

Posted Sep 2, 2014 9:34 UTC (Tue) by nix (subscriber, #2304) [Link] (17 responses)

> And quite frankly I don't think it is really a worthy goal, and it's something people seldom test. In particular, since NFS systems are usually utter crap, you probably find more NFS setups where locking says it works but actually doesn't and is a NOP.

This might be true in some interoperable situations, but it hasn't been true of Linux-only deployments for a very long time indeed (well over a decade, more like a decade and a half).

I've had $HOME on NFS for eighteen years now, with mail delivery and MUAs *both* on different clients (neither on the server) for the vast majority of that time. Number of locking-related problems in all that time? Zero -- I can remember because I too expected locking hell, and was surprised when it didn't happen. NFS locking works perfectly well, or at least well enough for long-term practical use.

Really, the only problem I have with NFS is interfaces like inotify, which totally punt on the problem of doing file notification on networked filesystems -- and desktop environments that then assume inotify is sufficient and don't need to find a way to ship inotify events on servers over to interested clients. For people with $HOME on NFS, single-machine inotify is utterly useless.

That's the only problem I have observed -- and because people like me exist, you can be reasonably sure that major NFS regressions will get reported fairly promptly, so there won't be many other problems either.

Oh yeah -- there is one other problem: developers saying 'NFS sucks, we don't support it' and proceeding to design their software in the expectation that all the world is their development laptop. Not so.

Poettering: Revisiting how we put together Linux systems

Posted Sep 2, 2014 20:23 UTC (Tue) by mezcalero (subscriber, #45103) [Link] (16 responses)

NFS locking on Linux is a complete disaster. For example, the Linux NFS client implicitly forwards BSD locks made on an NFS share to the server, where the kernel picks them up as POSIX locks. Hence: if you lock a file with BSD locks as well as POSIX locks on a local file system, that works fine; they don't conflict. If you do the same on NFS, you get a deadlock.

And yeah, this happens, because BSD locks are per-fd and hence actually usable. POSIX locks are per-process, which makes them very hard to use (especially as *any* close() invoked by the process on the file drops the lock implicitly), but then again they support byte-range locking. Hence people end up using both inter-mixed quite frequently, maybe not on purpose, but certainly in real life.

So yeah, file locking is awful on Linux anyway, and it's particularly bad on NFS.

Poettering: Revisiting how we put together Linux systems

Posted Sep 2, 2014 21:36 UTC (Tue) by bfields (subscriber, #19510) [Link] (9 responses)

> And yeah, this happens, because BSD locks are per-fd and hence actually usable.

For what it's worth, note Jeff Layton's File-private POSIX locks have been merged now.

Poettering: Revisiting how we put together Linux systems

Posted Sep 3, 2014 19:30 UTC (Wed) by ermo (subscriber, #86690) [Link] (8 responses)

I was going to naïvely ask why POSIX hadn't adopted the superior (in the context of e.g. NFS) BSD-style locking and then you posted that link, which contains this little gem at the very top:

"File-private POSIX locks are an attempt to take elements of both BSD-style and POSIX locks and combine them into a more threading-friendly file locking API."

Sounds like the above is just what the doctor ordered?

Poettering: Revisiting how we put together Linux systems

Posted Sep 3, 2014 23:07 UTC (Wed) by nix (subscriber, #2304) [Link] (7 responses)

It doesn't really help. The problem is that local fses have multiple locks which do not conflict with each other, but the NFS protocol has only one way to signal a lock to the remote end. So there's a trilemma: either you use that for all lock types (and suddenly they conflict remotely where they did not locally), or you don't signal one lock type at all (and suddenly you have things not locking at all remotely where they did locally), or you use a protocol extension, which has horrible compatibility problems.

I don't see a way to solve this without a new protocol revision :(

Poettering: Revisiting how we put together Linux systems

Posted Sep 4, 2014 14:03 UTC (Thu) by foom (subscriber, #14868) [Link]

It does help, because it removes any reason to use the BSD lock API (at least when running on Linux with a new enough kernel). Before that addition, the POSIX lock programming model was so broken that nobody sane would ever *want* to use it.

Moreover, on Linux, local POSIX locks interoperate properly with POSIX locks taken via NFS, so if software all switches to using POSIX locks, it'll work properly when used both locally and remotely at the same time.

Of course, very often nothing runs on the NFS server that touches the exported data (or at least, nothing that needs to lock it); the NFS server is *just* a file server. In such an environment, using BSD locks over NFS on Linux works properly too.

Poettering: Revisiting how we put together Linux systems

Posted Sep 5, 2014 0:44 UTC (Fri) by mezcalero (subscriber, #45103) [Link] (5 responses)

I think a big step forward would actually be if the NFS implementations were honest and returned a clean error if they cannot actually provide correct locking. But that's not what happens; you have no way to figure out what is going on with a file system...

Just pretending that locking works, even if it doesn't, and returning success to apps is really the worst thing to do...

Poettering: Revisiting how we put together Linux systems

Posted Sep 8, 2014 16:12 UTC (Mon) by nix (subscriber, #2304) [Link] (4 responses)

You're suggesting erroring out if a lock of one type is held on a file when an attempt is made to take out a lock of the other type? I suspect this is the only possible fix, if you can call it a fix. Now we just have to hope that programs check for errors from the locking functions! But of course they will, everyone checks for errors religiously :P

Poettering: Revisiting how we put together Linux systems

Posted Sep 8, 2014 18:56 UTC (Mon) by bfields (subscriber, #19510) [Link] (1 responses)

That wouldn't help. I think he's suggesting just returning -ENOLCK to BSD locks unconditionally. I agree that that's cleanest, but in practice I suspect it would break a lot of existing setups.

I suppose you could make it yet another mount option and then advocate making it the default. Or just add NFS protocol support for BSD locks if it's really a priority; it doesn't seem like it should be that hard.

Poettering: Revisiting how we put together Linux systems

Posted Sep 9, 2014 13:56 UTC (Tue) by nix (subscriber, #2304) [Link]

> That wouldn't help. I think he's suggesting just returning -ENOLCK to BSD locks unconditionally. I agree that that's cleanest, but in practice I suspect it would break a lot of existing setups.
Given how awful POSIX locks are (until you have a very recent kernel and glibc 2.20), and how sane people therefore avoided using the bloody things, I'd say it would break almost every setup relying on locking over NFS at all. A very bad idea.

Poettering: Revisiting how we put together Linux systems

Posted Sep 9, 2014 14:43 UTC (Tue) by foom (subscriber, #14868) [Link] (1 responses)

I don't think he was suggesting that, but that's actually what BSD does with BSD/POSIX locks:
a BSD lock will block a POSIX lock, and vice versa. (At least that's what happens locally; no idea what the BSDs' NFS clients do.)

Linux also had that behavior a long time ago, IIRC. Not sure why it changed; that was before I paid attention.

Poettering: Revisiting how we put together Linux systems

Posted Sep 9, 2014 15:27 UTC (Tue) by bfields (subscriber, #19510) [Link]

Huh. A freebsd man page agrees with you:
https://www.freebsd.org/cgi/man.cgi?query=flock&sektion=2

> If a file is locked by a process through flock(), any record within the file will be seen as locked from the viewpoint of another process using fcntl(2) or lockf(3), and vice versa.

Recent Linux flock(2) man pages suggest the Linux behavior was an attempt to match BSD behavior that has since changed?:

http://man7.org/linux/man-pages/man2/flock.2.html

> Since kernel 2.0, flock() is implemented as a system call in its own right rather than being emulated in the GNU C library as a call to fcntl(2). This yields classical BSD semantics: there is no interaction between the types of lock placed by flock() and fcntl(2), and flock() does not detect deadlock. (Note, however, that on some modern BSDs, flock() and fcntl(2) locks do interact with one another.)

Strange. In any case, changing the local Linux behavior is probably out of the question at this point.

Poettering: Revisiting how we put together Linux systems

Posted Sep 3, 2014 23:05 UTC (Wed) by nix (subscriber, #2304) [Link] (5 responses)

Oh yes, I can see how that's problematic -- though TBH it sounds like a bug (the server should export BSD locks as BSD locks, though I can understand the protocol difficulties in doing so). The fact remains that it can't be that common: it has never happened to me at all, and everything I do, I do over NFS.

What I really want -- and it still seems not to exist -- is something that gives you the POSIXness of local filesystems (and of things like Ceph, IIRC) while retaining NFS's property of 'just take a local filesystem tree, possibly constituting one or many local filesystems, or parts of them, and export it to other machines': i.e., not needing to make a new filesystem or move things around madly on the local machine just in order to export the fs. I know this property is really hard to retain, due to the need to make unique inums on the remote machine without exhausting local state, and NFS doesn't quite get it right -- but it would be very nice if it could be done.

Poettering: Revisiting how we put together Linux systems

Posted Sep 4, 2014 15:09 UTC (Thu) by bfields (subscriber, #19510) [Link] (4 responses)

"What I really want -- and still seems not to exist -- is something that gives you the POSIXness of local filesystems"

What exactly are you missing?

"not needing to make a new filesystem or move things around madly on the local machine just in order to export the fs. I know, this property is really hard to retain due to the need to make unique inums on the remote machine without exhausting local state"

I'm not sure I understand that description of the problem. The problem I'm aware of is just that it's difficult to determine given a filehandle whether the object pointed to by that filehandle is exported or not.

"NFS doesn't quite get it right"

Specifically, if you export a subtree of a filesystem then it's possible for someone with a custom NFS client and access to the network to access things outside that subtree by guessing filehandles.

Poettering: Revisiting how we put together Linux systems

Posted Sep 8, 2014 15:55 UTC (Mon) by nix (subscriber, #2304) [Link] (3 responses)

On the POSIXness side of things, I'd like the atomicity guarantees you get from a local fs, rather than having just rename() be atomic; I'd like not to have to deal with silly-rename leaving spew all over my disks, with no good way to tell when it is safe to clean it up; I'd like the same ACL system on the local and the remote filesystems, rather than its being mapped through a crazy system designed to be interoperable with Windows... oh, and decent performance would be nice (like NFSv4 allegedly has, though I haven't yet managed to get NFSv4 to work -- I haven't tried hard enough; I think its requirements for strong authentication are getting in my way).

Clearly NFS can't do all this: silly-rename and the rest are intrinsic to (the way NFS has chosen to do) statelessness. So I guess we need something else.

As for the not-quite-rightness of NFS's lovely ability to just ad-hoc export things, I have seen spurious but persistent -ESTALEs from nested exports and exports crossing host filesystems in the last year or two, and am still carrying round a horrific patch to make them go away (I was going to submit it, but it's a) horrific and b) I have to retest and make sure it's actually still needed: the underlying bug may have been fixed).

Poettering: Revisiting how we put together Linux systems

Posted Sep 8, 2014 16:30 UTC (Mon) by rleigh (guest, #14622) [Link] (1 responses)

At least with NFSv4 and ZFS, ACLs are propagated to client systems just fine (it's storing NFSv4 ACLs natively in ZFS on disk). For a combination of FreeBSD server and client at least. With a FreeBSD server and Linux client, NFSv4 ACL support isn't working for me, though the standard ownership and perms work correctly. I put this down to the Linux NFS client being less sophisticated and/or buggy, but I can't rule out some configuration issue.

Poettering: Revisiting how we put together Linux systems

Posted Sep 8, 2014 18:49 UTC (Mon) by bfields (subscriber, #19510) [Link]

> With a FreeBSD server and Linux client, NFSv4 ACL support isn't working for me, though the standard ownership and perms work correctly. I put this down to the Linux NFS client being less sophisticated and/or buggy, but I can't rule out some configuration issue.

The actual kernel client code is pretty trivial, so the bug's probably either in the FreeBSD server or the client-side nfs4-acl-tools. Please report the problem.

Poettering: Revisiting how we put together Linux systems

Posted Sep 8, 2014 18:46 UTC (Mon) by bfields (subscriber, #19510) [Link]

> I think its requirements for strong authentication are getting in my way

The spec does require that it be implemented, but you're not required to use it. If you're using NFS between two recent Linux boxes then you're likely already using NFSv4. (It's been the default since RHEL 6, for example.)

> silly-rename and the rest are intrinsic

See the discussion of OPEN4_RESULT_PRESERVE_UNLINKED in RFC 5661. It hasn't been implemented. I don't expect it's hard, so it will probably get done some time, depending on the priority, at which point you'll no longer see silly-renames between updated 4.1 clients and servers.

> spurious but persistent -ESTALEs from nested exports and exports crossing host filesystems

Do let us know what you figure out (linux-nfs@vger.kernel.org, or your distro).

Poettering: Revisiting how we put together Linux systems

Posted Sep 2, 2014 11:01 UTC (Tue) by helge.bahmann (subscriber, #56804) [Link]

> If it was about me I'd change GNOME to try to lock the home directory as soon as you logged in, so that you can only have a single GNOME session at a time on the same home directory

How about vnc/nx & friends?

Poettering: Revisiting how we put together Linux systems

Posted Sep 2, 2014 18:11 UTC (Tue) by paulj (subscriber, #341) [Link] (2 responses)

Concurrent access wasn't the issue in the case you're responding to. It was access to the same $HOME with different versions of software, non-concurrently.

Note, "version" here doesn't just mean the release version of the software concerned, but ABI issues like 64- vs. 32-bit. You might have some software where one ABI's version can read and upgrade files from the other, but not the other way around.

Does this mean $HOME may need to have dependencies on apps?

Poettering: Revisiting how we put together Linux systems

Posted Sep 3, 2014 23:09 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

It does anyway, even on a single machine. A classic example is KDE, which has a subsystem which automatically upgrades configuration files (and has a little language to specify the modifications required). Update your system without quitting KDE, and it is at least theoretically possible that a newly launched application can trigger an upgrade, and then an already-running instance of the same application can try to read the upgraded configuration file and barf.

This is a really hard problem to solve as long as you permit more than one instance of an application (not a desktop!) requiring configuration to run at once :( which is clearly a desirable property!

Poettering: Revisiting how we put together Linux systems

Posted Sep 4, 2014 21:54 UTC (Thu) by Wol (subscriber, #4433) [Link]

Why is it hard? The config file is best as text :-) and should contain a revision number.

And more to the point, old config versions shouldn't be wiped as a matter of course, they should exist in parallel. Of course, this then has the side effect that when the second, older version gets upgraded it doesn't upgrade the old config but will spot and use the pre-existing newer config. Is that good or bad? I don't know.

Cheers,
Wol

Poettering: Revisiting how we put together Linux systems

Posted Sep 8, 2014 14:51 UTC (Mon) by Arker (guest, #14205) [Link]

"Or in other words: concurrent graphical sessions on the same $HOME are fucked..."

And yet they worked just fine until GNOME came along...


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds