Poettering: Revisiting how we put together Linux systems
Posted Sep 2, 2014 20:23 UTC (Tue) by mezcalero (subscriber, #45103) In reply to: Poettering: Revisiting how we put together Linux systems by nix
Parent article: Poettering: Revisiting how we put together Linux systems
And yeah, this happens, because BSD locks are per-fd and hence actually usable. And POSIX locks are per-process, which makes them very hard to use (especially as *any* close() by the process on *any* fd referring to the file drops the lock implicitly), but then again they support byte-range locking. Hence people end up using both intermixed quite frequently, maybe not on purpose, but certainly in real life.
So yeah, file locking is awful on Linux anyway, and it's particularly bad on NFS.
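The close() pitfall described above can be illustrated with a minimal sketch. It assumes Linux and a local filesystem; `child_can_lock` and `demo` are hypothetical helper names, `fcntl.flock` takes the BSD lock, and `fcntl.lockf` (Python's wrapper around fcntl() locking) stands in for POSIX locks.

```python
import fcntl
import os
import tempfile


def child_can_lock(path, use_flock):
    """Fork a child that tries a non-blocking exclusive lock on path.

    Returns True if the child got the lock, False if it was refused.
    """
    pid = os.fork()
    if pid == 0:
        # Child: open its own descriptor and try to lock without blocking.
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            try:
                if use_flock:
                    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                else:
                    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                status = 0  # lock acquired
            except OSError:
                status = 1  # lock is held by the parent
        finally:
            os._exit(status)  # skip the parent's cleanup handlers
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo(use_flock):
    """Return (child_got_lock_before, child_got_lock_after) an unrelated close()."""
    with tempfile.NamedTemporaryFile() as tmp:
        path = tmp.name
        fd1 = os.open(path, os.O_RDWR)
        if use_flock:
            fcntl.flock(fd1, fcntl.LOCK_EX)
        else:
            fcntl.lockf(fd1, fcntl.LOCK_EX)
        before = child_can_lock(path, use_flock)
        # Open and close a second, unrelated descriptor for the same file.
        fd2 = os.open(path, os.O_RDONLY)
        os.close(fd2)
        after = child_can_lock(path, use_flock)
        os.close(fd1)
        return before, after


if __name__ == "__main__":
    print("BSD flock:  ", demo(use_flock=True))
    print("POSIX lockf:", demo(use_flock=False))
```

On Linux one would expect the flock case to report that the child is refused both times, while in the POSIX case the child is refused before the stray close() and succeeds after it, because that close() silently released the parent's lock.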
Posted Sep 2, 2014 21:36 UTC (Tue)
by bfields (subscriber, #19510)
[Link] (9 responses)
For what it's worth, note Jeff Layton's File-private POSIX locks have been merged now.
Posted Sep 3, 2014 19:30 UTC (Wed)
by ermo (subscriber, #86690)
[Link] (8 responses)
"File-private POSIX locks are an attempt to take elements of both BSD-style and POSIX locks and combine them into a more threading-friendly file locking API."
Sounds like the above is just what the doctor ordered?
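For illustration, here is a rough sketch of how file-private (open file description) locks behave, assuming 64-bit Linux with kernel 3.15 or later. `try_ofd_write_lock` and `demo` are hypothetical names, the `struct flock` layout in the pack format is an assumption about the 64-bit Linux ABI, and the fallback value 37 for `F_OFD_SETLK` is the Linux constant, needed only when the interpreter does not expose it.

```python
import fcntl
import os
import struct
import tempfile

# Python 3.9+ exposes F_OFD_SETLK on Linux; 37 is the Linux value, used
# here only as a fallback for older interpreters (assumption: Linux).
F_OFD_SETLK = getattr(fcntl, "F_OFD_SETLK", 37)


def try_ofd_write_lock(fd):
    """Try a non-blocking whole-file OFD write lock; return True on success."""
    # Assumed struct flock layout on 64-bit Linux:
    #   short l_type; short l_whence; off_t l_start; off_t l_len; pid_t l_pid
    # l_pid must be 0 for OFD locks; l_len == 0 means "to end of file".
    request = struct.pack("hhqqi4x", fcntl.F_WRLCK, os.SEEK_SET, 0, 0, 0)
    try:
        fcntl.fcntl(fd, F_OFD_SETLK, request)
        return True
    except OSError:
        return False


def demo():
    """Return the outcome of three lock attempts made within one process."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd1 = os.open(tmp.name, os.O_RDWR)
        fd2 = os.open(tmp.name, os.O_RDWR)
        try:
            # The lock belongs to fd1's open file description.
            first = try_ofd_write_lock(fd1)
            # A separate open() in the same process conflicts with it,
            # unlike classic POSIX locks, which are owned by the process.
            second = try_ofd_write_lock(fd2)
            # Opening and closing yet another descriptor does not drop it.
            os.close(os.open(tmp.name, os.O_RDONLY))
            third = try_ofd_write_lock(fd2)
            return first, second, third
        finally:
            os.close(fd1)
            os.close(fd2)


if __name__ == "__main__":
    print(demo())
```

Under those assumptions the first attempt should succeed and the other two should be refused, which is the BSD-like ownership model combined with the POSIX byte-range interface.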
Posted Sep 3, 2014 23:07 UTC (Wed)
by nix (subscriber, #2304)
[Link] (7 responses)
I don't see a way to solve this without a new protocol revision :(
Posted Sep 4, 2014 14:03 UTC (Thu)
by foom (subscriber, #14868)
[Link]
Yet, on Linux, local POSIX locks interoperate properly with POSIX locks via NFS, so, if software all switches to using POSIX locks, it'll work properly when used both locally and remotely at the same time.
Of course, very often, nothing is ever running on the NFS server that touches the exported data (or at least, nothing that needs to lock it) -- the NFS server is *just* a fileserver. In such an environment, using BSD locks over NFS on linux works properly too.
Posted Sep 5, 2014 0:44 UTC (Fri)
by mezcalero (subscriber, #45103)
[Link] (5 responses)
Just pretending that locking works, even if it doesn't, and returning success to apps is really the worst thing to do...
Posted Sep 8, 2014 16:12 UTC (Mon)
by nix (subscriber, #2304)
[Link] (4 responses)
Posted Sep 8, 2014 18:56 UTC (Mon)
by bfields (subscriber, #19510)
[Link] (1 responses)
That wouldn't help. I think he's suggesting just returning -ENOLCK to BSD locks unconditionally. I agree that that's cleanest but in practice I suspect it would break a lot of existing setups.
I suppose you could make it yet another mount option and then advocate making it the default. Or just add NFS protocol support for BSD locks if it's really a priority; it doesn't seem like it should be that hard.
Posted Sep 9, 2014 13:56 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Posted Sep 9, 2014 14:43 UTC (Tue)
by foom (subscriber, #14868)
[Link] (1 responses)
Linux also had that behavior a long time ago IIRC. Not sure why it changed, that was before I paid attention.
Posted Sep 9, 2014 15:27 UTC (Tue)
by bfields (subscriber, #19510)
[Link]
"If a file is locked by a process through flock(), any record within the file will be seen as locked from the viewpoint of another process using fcntl(2) or lockf(3), and vice versa."
Recent Linux flock(2) suggests the Linux behavior was an attempt to match BSD behavior that has since changed?:
"Since kernel 2.0, flock() is implemented as a system call in its own right rather than being emulated in the GNU C library as a call to fcntl(2). This yields classical BSD semantics: there is no interaction between the types of lock placed by flock() and fcntl(2), and flock() does not detect deadlock. (Note, however, that on some modern BSDs, flock() and fcntl(2) locks do interact with one another.)"
Strange. In any case, changing the local Linux behavior is probably out of the question at this point.
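Which of the two documented behaviors a given system has can be probed with a short sketch along these lines. It assumes a local filesystem and a platform with fork(); `flock_blocks_posix_lock` is a hypothetical name, and `fcntl.lockf` is used as the POSIX-lock side.

```python
import fcntl
import os
import tempfile


def flock_blocks_posix_lock():
    """Hold a BSD flock(), then check whether another process is refused
    a POSIX lock on the same file.

    Returns True if the two lock types interact, False if they are
    independent of each other.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        fcntl.flock(fd, fcntl.LOCK_EX)
        pid = os.fork()
        if pid == 0:
            # Child: try a non-blocking POSIX lock on its own descriptor.
            status = 2  # 2 = unexpected failure before the lock attempt
            try:
                child_fd = os.open(tmp.name, os.O_RDWR)
                try:
                    fcntl.lockf(child_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    status = 0  # POSIX lock granted despite the flock()
                except OSError:
                    status = 1  # POSIX lock refused
            finally:
                os._exit(status)  # skip the parent's cleanup handlers
        _, wait_status = os.waitpid(pid, 0)
        os.close(fd)
        code = os.WEXITSTATUS(wait_status)
        if code == 2:
            raise RuntimeError("child could not attempt the lock")
        return code == 1


if __name__ == "__main__":
    print("flock() blocks POSIX locks:", flock_blocks_posix_lock())
```

Going by the Linux man page text quoted above, this should report no interaction on a local Linux filesystem, and going by the FreeBSD text it should report the opposite there; the result over NFS depends on how the client maps flock().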
Posted Sep 3, 2014 23:05 UTC (Wed)
by nix (subscriber, #2304)
[Link] (5 responses)
What I really want -- and still seems not to exist -- is something that gives you the POSIXness of local filesystems (and things like ceph, IIRC) while retaining the 'just take a local filesystem tree, possibly constituting one or many or parts of local filesystems, and export them to other machines' property of NFS: i.e., not needing to make a new filesystem or move things around madly on the local machine just in order to export the fs. I know, this property is really hard to retain due to the need to make unique inums on the remote machine without exhausting local state, and NFS doesn't quite get it right -- but it would be very nice if it could be done.
Posted Sep 4, 2014 15:09 UTC (Thu)
by bfields (subscriber, #19510)
[Link] (4 responses)
What exactly are you missing?
"not needing to make a new filesystem or move things around madly on the local machine just in order to export the fs. I know, this property is really hard to retain due to the need to make unique inums on the remote machine without exhausting local state"
I'm not sure I understand that description of the problem. The problem I'm aware of is just that it's difficult to determine given a filehandle whether the object pointed to by that filehandle is exported or not.
"NFS doesn't quite get it right"
Specifically, if you export a subtree of a filesystem then it's possible for someone with a custom NFS client and access to the network to access things outside that subtree by guessing filehandles.
Posted Sep 8, 2014 15:55 UTC (Mon)
by nix (subscriber, #2304)
[Link] (3 responses)
Clearly NFS can't do all this: silly-rename and the rest are intrinsic to (the way NFS has chosen to do) statelessness. So I guess we need something else.
As for the not-quite-rightness of NFS's lovely ability to just ad-hoc export things, I have seen spurious but persistent -ESTALEs from nested exports and exports crossing host filesystems in the last year or two, and am still carrying round a horrific patch to make them go away (I was going to submit it, but it's a) horrific and b) I have to retest and make sure it's actually still needed: the underlying bug may have been fixed).
Posted Sep 8, 2014 16:30 UTC (Mon)
by rleigh (guest, #14622)
[Link] (1 responses)
Posted Sep 8, 2014 18:49 UTC (Mon)
by bfields (subscriber, #19510)
[Link]
The actual kernel client code is pretty trivial, so the bug's probably either in the FreeBSD server or the client-side nfs4-acl-tools. Please report the problem.
Posted Sep 8, 2014 18:46 UTC (Mon)
by bfields (subscriber, #19510)
[Link]
The spec does require that it be implemented, but you're not required to use it. If you're using NFS between two recent Linux boxes then you're likely already using NFSv4. (It's been the default since RHEL6, for example.)
See the discussion of OPEN4_RESULT_PRESERVE_UNLINKED in RFC 5661. It hasn't been implemented. I don't expect it's hard, so will probably get done some time depending on the priority, at which point you'll no longer see sillyrenames between updated 4.1 clients and servers.
Do let us know what you figure out (linux-nfs@vger.kernel.org, or your distro).
That wouldn't help. I think he's suggesting just returning -ENOLCK to BSD locks unconditionally. I agree that that's cleanest but in practice I suspect it would break a lot of existing setups.
Given how awful POSIX locks are (until you have a very recent kernel and glibc 2.20), and how sane people therefore avoided using the bloody things, I'd say it would break almost every setup relying on locking over NFS at all. A very bad idea.
A BSD lock will block a POSIX lock, and v.v.. (At least that's what happens locally; no idea what the BSD's NFS clients do.)
Huh. A freebsd man page agrees with you:
https://www.freebsd.org/cgi/man.cgi?query=flock&sektion=2
http://man7.org/linux/man-pages/man2/flock.2.html
With a FreeBSD server and Linux client, NFSv4 ACL support isn't working for me, though the standard ownership and perms work correctly. I put this down to the Linux NFS client being less sophisticated and/or buggy, but I can't rule out some configuration issue.
I think its requirements for strong authentication are getting in my way