A kernel change breaks GlusterFS
Posted Mar 28, 2013 10:55 UTC (Thu) by mkerrisk (editor, #1978)
In reply to: A kernel change breaks GlusterFS by zlynx
Parent article: A kernel change breaks GlusterFS
> But there *was* no ABI change. A 64-bit value continued to hold 64 bits.
The question is where you consider the definition of the ABI to be. Is it some documented standard ("this is a 64-bit field"), or is it "the behavior as (it appears to be) implemented" ("only 32 bits are ever used in this field")? The GlusterFS folks clearly took it to be the latter. One can argue that this was a questionable decision, but given the problem they were trying to solve and the constraints on how much information they could pass in the cookie sent over NFSv3, it wasn't a completely insane thing to do in light of the observed kernel behavior.
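To make the failure mode concrete, here is a minimal C sketch of the kind of bit-packing at issue. It assumes a hypothetical layout, not GlusterFS's actual encoding, in which a distributing server stores a backend index in the upper 32 bits of a 64-bit directory cookie; the names pack_cookie(), cookie_server(), cookie_offset(), and cookie_round_trips() are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only: the upper 32 bits of the
 * 64-bit cookie carry a backend index, the lower 32 bits carry the
 * backend's own cookie. This only works while the backend never uses
 * the upper 32 bits itself. */
#define SERVER_SHIFT 32
#define COOKIE_MASK  0xFFFFFFFFULL

/* Combine a backend index and a backend cookie into one 64-bit value. */
static uint64_t pack_cookie(uint32_t server, uint64_t backend_off)
{
    return ((uint64_t)server << SERVER_SHIFT) | (backend_off & COOKIE_MASK);
}

/* Recover the backend index from a packed cookie. */
static uint32_t cookie_server(uint64_t cookie)
{
    return (uint32_t)(cookie >> SERVER_SHIFT);
}

/* Recover the backend's cookie; any bits it had above 32 are gone. */
static uint64_t cookie_offset(uint64_t cookie)
{
    return cookie & COOKIE_MASK;
}

/* Returns 1 if a backend cookie survives a pack/unpack round trip,
 * which is true only if it actually fit in 32 bits. */
static int cookie_round_trips(uint32_t server, uint64_t backend_off)
{
    uint64_t c = pack_cookie(server, backend_off);
    return cookie_server(c) == server && cookie_offset(c) == backend_off;
}
```

The round trip holds for every backend cookie below 2^32 and silently fails once the backend hands out a value that uses the upper bits, even though the field was "64 bits" all along.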
> This is exactly like C programmers who complain when their undefined behavior changes.
The analogy doesn't really hold. For C, there is a very carefully defined standard that thoroughly specifies behavior and notes the cases where behavior is undefined. For much of the kernel API, there is nothing like such precise documentation or specification. This leaves user-space programmers guessing about what is or is not permissible, and that is exactly the hole that the GlusterFS folk fell into. As another commenter noted, the Samba folk fell into the same hole. The fact that two independent groups fell into the same hole is quite telling, in my view.
> Back to kernel examples: if some user-space program begins
> relying on the number of columns when parsing the /proc
> files, is that the kernel's problem? No. That is just
> a badly written program.
I think that's a weak example to support your argument, because the advice to parse /proc defensively is reasonably well known. Don't get me wrong: your argument is reasonable, but I think it's far from definitive.
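One way to read "parse /proc defensively" is to look fields up by name instead of relying on line positions or column counts. The following minimal C sketch assumes text in the "Key: value" layout used by files such as /proc/meminfo; proc_field() is a hypothetical helper, and it works on an in-memory string so the example does not depend on any particular file's contents.

```c
#include <stdio.h>
#include <string.h>

/* Look up a "Key:   value ..." line in text formatted like /proc/meminfo.
 * Matching by key name, rather than by line number or column count, means
 * added lines, reordered lines, and extra trailing fields (such as a "kB"
 * unit) do not break the parse. Returns 1 and stores the first number
 * after the key on success, or 0 if the key is absent or has no number. */
static int proc_field(const char *text, const char *key, long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Match the key only at the start of a line and only if it is
         * followed directly by ':', so "Mem" does not match "MemFree". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%ld", out) == 1;
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next line */
    }
    return 0;
}
```

A caller would read the file into a buffer (for example with fopen() and fread()) and then call proc_field(buf, "MemFree", &kb); a new field appearing in a later kernel is simply skipped.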
Returning to my point about documented standards versus "the behavior as (it appears to be) implemented": the Linux kernel violates standards in a number of places, and when it comes down to a contradiction between documented behavior (man pages and standards) and the existing implementation, Linus always firmly plumps for the latter (unless the existing behavior is causing actual pain to user space).
And take a look at the EPOLLWAKEUP example referred to in the article. In that case, the problem was that a program was setting random bits in the epoll API that formerly had no meaning. The application had *no* good reason to set those bits, because they had absolutely no effect (and unfortunately there was no kernel check that returned an error when that was done). When someone tried to give those bits a meaning, the application broke. The response was not to say "user space, go fix your stupid application"; that would just inflict pain on thousands of users as their binaries broke. Instead, the response was: we'll need to modify this kernel patch so that it does not break user space (and that change _decreased_ the usability of the kernel feature that was added). The argument in that case that the kernel should change was much weaker than the argument for accommodating GlusterFS would have been, had the GlusterFS problem actually been detected in time.
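As an illustration of the kind of check that was missing, here is a minimal C sketch in the kernel's return-a-negative-errno style. The function name check_event_flags() and the KNOWN_EVENT_BITS mask are hypothetical placeholders, not the real epoll constants or kernel code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mask of the event bits an interface currently defines.
 * The value is a placeholder for illustration, not the real epoll set. */
#define KNOWN_EVENT_BITS 0x0000001FU

/* Reject any request that sets bits the interface does not (yet) define.
 * With such a check in place from the start, a program that sets
 * meaningless bits fails immediately, and those bits remain free to be
 * given a meaning later without breaking existing binaries. */
static int check_event_flags(uint32_t events)
{
    if (events & ~KNOWN_EVENT_BITS)
        return -EINVAL;
    return 0;
}
```

The cost of such a check is that programs passing junk bits fail on day one, which is precisely when the failure is cheapest to fix.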
You can't have it both ways. Linus pretty consistently goes one way, and I can see his point (though I've disagreed with some specific cases in the past).