Quotes of the week

File locking on Linux is just broken. The broken semantics of POSIX locking show that the designers of this API apparently never have tried to actually use it in real software. It smells a lot like an interface that kernel people thought makes sense but in reality doesn't when you try to use it from userspace.
-- Lennart Poettering

Yeah, yeah, maybe you're waiting for flower power and free sex. Good for you. But if you are, don't ask the Linux kernel to wait with you. Ok?
-- Linus Torvalds (see below)

Now that I've had a look at the whole series, I'll make an overall comment: I suspect that the locking is sufficiently complex that we can count the number of people that will be able to debug it on one hand. This patch set didn't just fall off the locking cliff, it fell into a bottomless pit...
-- Dave Chinner

This is a disaster. I can't see for the life of me why we haven't had 100,000 bug reports.
-- Joel Becker (OCFS2 users might want to be careful with 2.6.35-rc for now)

Quotes of the week

Posted Jul 1, 2010 7:42 UTC (Thu) by michaeljt (subscriber, #39183) [Link]

Re the first quote:

1) What does TBFKAYIBYNYAAYB stand for?
2) Doesn't file locking over the network also depend on the protocol and the server at the other end supporting it properly? I am reluctant to blame the OS on your own machine for that.

Quotes of the week

Posted Jul 1, 2010 11:45 UTC (Thu) by Darkmere (subscriber, #53695) [Link]

The problem is really that the local machine will say "yes, here have a lock" while it isn't locked in reality.

Typical example: two machines at home. One of them has /home on ext3, the other NFS-mounts the first machine's /home, so they share profiles, settings, mail and such.

This will then cause interesting situations with file locking due to how the networked filesystem/clients manage the locks.

Quotes of the week

Posted Jul 1, 2010 13:55 UTC (Thu) by epa (subscriber, #39769) [Link]

It's astonishing that anyone ever wrote a file locking implementation (over NFS or anything else) which returns 'success' without having taken the lock. The only possible behaviour is to return failure in this case. Perhaps locking doesn't really matter, and most applications run fine without it; but that's for the application code to decide, not a reason for the kernel to lie to userspace.

Quotes of the week

Posted Jul 1, 2010 15:40 UTC (Thu) by k8to (subscriber, #15413) [Link]

Yeah, partly why this is so fucked is that unix has a pretty sane file namespace with files owned by competing services/systems/users pretty well separated. The result is most programs don't need working locks, even if they think they do. Low pressure to improve.

POSIX/NFS file locking madness

Posted Jul 3, 2010 17:36 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

> Perhaps locking doesn't really matter, and most applications run fine without it; but that's for the application code to decide, not a reason for the kernel to lie to userspace.

You seem to have missed the point of this hack, which is that the application already exists and was not designed to make this decision. It was designed with the assumption that under normal conditions, you can get a lock. So lying to userspace is the only option for getting the most desirable system-level end result.

It's the same reason a disk drive may claim to have written data persistently when it hasn't and a web browser (in the old days) would claim to be Internet Explorer when it wasn't.

All of these complaints are valid criticisms of system-level locking design, but not of the design of one little piece of the system when all the other pieces are already designed.

Quotes of the week

Posted Jul 1, 2010 16:18 UTC (Thu) by nix (subscriber, #2304) [Link]

Also (as Lennart points out in his followup post), the local machine may say 'yes, here's a BSD lock', and then you run it on a machine importing the same filesystem over NFS and it will say 'yes, here's a BSD lock' too. You've got locks both times, but the server silently 'upgraded' the BSD lock to a POSIX lock for you, so you've now locked the same file twice and you can't tell.

You know your locking system is broken when a program can't even interoperate with other instances of *itself*.
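
The local half of this can be checked with a small C sketch. It assumes Linux and a local (non-NFS) /tmp; the function name and the temp-file path are made up for illustration. On such a setup the flock() lock and the fcntl() lock on the same file are both granted, because the kernel tracks the two kinds separately.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a BSD flock() lock and a POSIX fcntl()
 * write lock could both be taken on the same file at once, 0 if the second
 * one was refused, and -1 on any other error. */
int bsd_and_posix_locks_coexist(void)
{
    char path[] = "/tmp/lockdemo-XXXXXX";   /* illustrative path */
    int fd_bsd = mkstemp(path);
    if (fd_bsd < 0)
        return -1;

    /* A second, independent descriptor for the POSIX lock. */
    int fd_posix = open(path, O_RDWR);
    unlink(path);
    if (fd_posix < 0) {
        close(fd_bsd);
        return -1;
    }

    int result = -1;
    if (flock(fd_bsd, LOCK_EX | LOCK_NB) == 0) {
        struct flock fl = {
            .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0
        };
        if (fcntl(fd_posix, F_SETLK, &fl) == 0)
            result = 1;                      /* both "exclusive" locks granted */
        else if (errno == EACCES || errno == EAGAIN)
            result = 0;                      /* the two lock types conflicted */
    }

    close(fd_posix);
    close(fd_bsd);
    return result;
}
```

A result of 1 on a local filesystem, next to a refusal for the same sequence on an NFS mount that maps one lock type onto the other, is the mismatch described above.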

Quotes of the week

Posted Jul 1, 2010 19:42 UTC (Thu) by clugstj (subscriber, #4020) [Link]

Re the first quote:

He says "File locking on Linux is just broken", but then complains about:
1) Other OS's NFS Server implementations
2) Old Linux versions that had bugs/misfeatures
3) POSIX file locking semantics
4) Actual problems in Linux

In order to correct 1-3, "Linux" would have to:
1) Fix bugs in other people's kernels
2) Build a time machine
3) Build a time machine and use it to fix an old spec

Seems to me he is just padding his complaint with extraneous stuff that also pisses him off.

Quotes of the week

Posted Jul 1, 2010 22:44 UTC (Thu) by mezcalero (subscriber, #45103) [Link]

Well, it would be fine if things used to be broken and then got fixed. But that is not the case on Linux. I don't care too much about POSIX; I'd be happy to use a Linux-specific locking API that fixes the problems I pointed out. But that does not exist, and things have gotten worse than they used to be. For one, it is still not sanely detectable via fpathconf() or a similar API whether locks are supposed to work, and which kind of locks work. Then, the Linux-NFS-style "upgrading" of BSD to POSIX locks is a complete disaster and should never have been added to Linux (and that is a "feature" very recently added). It creates more problems than it solves. And this goes on and on...

So regarding the three points you raised:

1) make this detectable for apps via fpathconf() or so.
2) make this also detectable for apps via fpathconf() or so.
3) fix the APIs. POSIX is not the holy grail. Where it is broken we can always introduce new and fixed APIs. That's what we have been doing on Linux all the time.

Lennart (who wrote the original blog story)

Quotes of the week

Posted Jul 2, 2010 8:49 UTC (Fri) by nix (subscriber, #2304) [Link]

> I don't care too much about POSIX, I'd be happy to use a Linux-specific locking API that fixes the problems I pointed out

It is very rare for people to have this luxury, in any case, so we're stuck with the bloody awful POSIX file locking API for now.

I wonder how many programs would be broken if we changed said API to work in a sane fashion (i.e. POSIX locks follow fds, POSIX and BSD locks can block each other rather than being two competing systems, locks are broken when the fd on which the lock was taken is closed?). I can't imagine too terribly many programs actually depend on the current disastrously broken semantics of POSIX locks.

However, even there it is interesting to try to figure out what sane semantics for lock sharing versus fd duplication/sharing actually are. We want locks to follow file descriptors in that if we dup() an fd with a lock on it, both of them share the lock; but do we want the locks to be shared across fork()? If dup()/fork() should not lead to the child holding any locks on its inherited fds, then locks cannot follow file descriptors (as those are shared over fork()) nor can they follow what POSIX calls 'file descriptions' (as these are duplicated by dup(), which we probably want to share locks). We'd need a new kernel-internal entity just for this. Ew. I suspect it makes more sense to have locks strictly be an fd entity, so they are shared by dup() and fork() both.
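
For comparison, BSD flock() locks on Linux already attach to the open file rather than to the process: a dup()ed descriptor shares the lock, while a second open() of the same file is refused, even inside the same process. A minimal sketch, assuming Linux and a writable /tmp (the function name and path are illustrative only):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a dup()ed descriptor shares the flock()
 * lock while an independently opened descriptor is refused, 0 if either
 * expectation fails, and -1 on setup errors. */
int flock_follows_open_file_description(void)
{
    char path[] = "/tmp/flockdemo-XXXXXX";  /* illustrative path */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int fd_dup = dup(fd);                   /* same open file as fd */
    int fd_new = open(path, O_RDWR);        /* separate open of the same file */
    unlink(path);

    int result = -1;
    if (fd_dup >= 0 && fd_new >= 0 && flock(fd, LOCK_EX | LOCK_NB) == 0) {
        /* Re-locking through the duplicate succeeds: it is the same lock. */
        int dup_ok = (flock(fd_dup, LOCK_EX | LOCK_NB) == 0);
        /* The independent descriptor conflicts with our own lock. */
        int new_refused = (flock(fd_new, LOCK_EX | LOCK_NB) < 0
                           && errno == EWOULDBLOCK);
        result = dup_ok && new_refused;
    }

    if (fd_new >= 0)
        close(fd_new);
    if (fd_dup >= 0)
        close(fd_dup);
    close(fd);
    return result;
}
```

Descriptors inherited over fork() refer to the same open file as well, so this is roughly the "shared by dup() and fork() both" behaviour, just not for POSIX locks.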

This is all kind of academic though because locks probably won't be changing :((

Quotes of the week

Posted Jul 4, 2010 9:13 UTC (Sun) by neilbrown (subscriber, #359) [Link]

Any program that would break if we changed locking so that POSIX and BSD locks could block each other would already break when used on NFS, as NFS maps one to the other. I have seen this with some mail client which tried to take all possible locks just to be sure it was safe. So it would take a POSIX lock and a BSD lock on the mail file (and probably create a .lock file as well). It didn't work over NFS.

Fortunately this program could be compile-time configured to just use one sort of lock, so that was done.

If we changed POSIX locks to follow fds it is very likely that nothing would break. However I wouldn't recommend it. Rather we should add new fcntl commands, F_SETLK_FD and F_SETLKW_FD (or similar) which set a lock and tie it to the FD rather than to the process. This just needs someone to write (and test) the code. I suspect there wouldn't be a big battle getting it merged....

My understanding of the origin of POSIX locking is that the BSD/Sun VNODE interface (which is the interface to specific filesystem implementations) didn't support anything better. The concept of an 'fd' doesn't make it down below the VNODE. The filesystem knows when an fd is closed, but it doesn't know which one. vn_close() is given the 'vnode' (which is per-file), the open flags (read/write) and the credentials (uid/gid).
It cannot use any of these to differentiate between a close of an fd which holds the lock, and a close of an fd which doesn't. Given that interface the best you can do is tie the lock to some global datum - e.g. the pid. That at least ensures that when a process exits, any locks it held are released.
If we needed an example of why stable APIs are nonsense, this would be it. It appears that it was the need for a stable VNODE interface that gave us POSIX locks!
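
The userspace consequence of that design can be observed directly: closing any descriptor for the file drops the process's POSIX lock, even one that never took it. The following is only a sketch, assuming a POSIX system with a writable /tmp; both function names are hypothetical. A forked child is used as the observer because a process never conflicts with its own POSIX locks.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask a child process whether `path` is write-locked by another process.
 * Returns 1 if locked, 0 if not, -1 on error. */
static int locked_as_seen_by_child(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            _exit(2);
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        if (fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        /* F_GETLK sets l_type to F_UNLCK when nothing would block us. */
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}

/* Returns 1 if closing a second, unrelated descriptor released the lock
 * taken through the first one, 0 if the lock survived, -1 on error. */
int close_of_other_fd_drops_posix_lock(void)
{
    char path[] = "/tmp/posixlock-XXXXXX";  /* illustrative path */
    int fd1 = mkstemp(path);
    if (fd1 < 0)
        return -1;

    int result = -1;
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    if (fcntl(fd1, F_SETLK, &fl) == 0 && locked_as_seen_by_child(path) == 1) {
        int fd2 = open(path, O_RDONLY);     /* never used for locking */
        if (fd2 >= 0) {
            close(fd2);                     /* fd1 is still open here */
            int after = locked_as_seen_by_child(path);
            if (after >= 0)
                result = (after == 0);
        }
    }

    close(fd1);
    unlink(path);
    return result;
}
```

In a real program, the open()/close() pair on fd2 could just as well happen inside an unrelated library call, which is what makes this behaviour so awkward.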

[[ to read about vnodes, search for "Vnodes: An Architecture for Multiple File System Types in Sun UNIX" by S.R Kleiman. ]]

Quotes of the week

Posted Jul 5, 2010 15:38 UTC (Mon) by nix (subscriber, #2304) [Link]

> Any program that would break if we changed locking so that POSIX and BSD locks could block each other would already break when used on NFS, as NFS maps one to the other. I have seen this with some mail client which tried to take all possible locks just to be sure it was safe. So it would take a POSIX lock and a BSD lock on the mail file (and probably create a .lock file as well). It didn't work over NFS.

One of procmail's four hundred million locking variants does this, as well. Thankfully it has an extensive set of tests at configure time so it won't choose that option if you let it test locking over NFS.

> If we changed POSIX locks to follow fds it is very likely that nothing would break. However I wouldn't recommend it. Rather we should add new fcntl commands, F_SETLK_FD and F_SETLKW_FD (or similar) which set a lock and tie it to the FD rather than to the process.

Hm, we'd have to wait for userspace to pick it up, but thankfully adding new fcntl() commands is fairly easy even for programs with no configure step: they can just #ifdef the new constant.

> The filesystem knows when an fd is closed, but it doesn't know which one.

... and it presumably also doesn't know if there are many fds still open to a single file. Sigh, nice internal API design, guys.

Quotes of the week

Posted Jul 5, 2010 23:53 UTC (Mon) by mb (subscriber, #50428) [Link]

> but thankfully adding new fcntl() commands is fairly easy even for programs with no configure step: they can just #ifdef the new constant.

Well, that doesn't tell you anything about whether the kernel actually supports the new fcntl. You'll have to call the fcntl and see whether it fails (with some specific error code? ENOSYS?) and retry with the older fallback.
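
A sketch of that probe-and-fall-back pattern in C follows. Since F_SETLK_FD is only a proposal in this thread, the example uses F_OFD_SETLK as a stand-in: that is a per-open-file lock command added to Linux later (3.15), not anything that existed when this was discussed. Linux reports an fcntl() command it does not recognise as EINVAL rather than ENOSYS. The enum and function names are hypothetical.

```c
#define _GNU_SOURCE             /* exposes F_OFD_SETLK in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result type: which kind of lock was actually obtained. */
enum lock_kind { LOCK_FAILED = -1, LOCK_PER_PROCESS = 0, LOCK_PER_FD = 1 };

/* Try the newer per-open-file lock first; fall back to the classic
 * per-process POSIX lock only if the running kernel rejects the command. */
enum lock_kind lock_whole_file(int fd)
{
    struct flock fl = {
        .l_type = F_WRLCK, .l_whence = SEEK_SET,
        .l_start = 0, .l_len = 0,
        .l_pid = 0              /* must be zero for F_OFD_SETLK */
    };

#ifdef F_OFD_SETLK              /* compile-time check: do the headers know it? */
    if (fcntl(fd, F_OFD_SETLK, &fl) == 0)
        return LOCK_PER_FD;
    if (errno != EINVAL)        /* a real failure, e.g. the file is locked */
        return LOCK_FAILED;
    /* EINVAL: assume the kernel predates the command and fall through. */
#endif

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return LOCK_PER_PROCESS;
    return LOCK_FAILED;
}
```

The caller still has to care which kind it got, since the two have different close() and fork() semantics. EINVAL can also indicate a malformed struct flock, so treating it as "command unsupported" is an assumption that only holds when the arguments are known to be valid.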

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds