Revisiting "too small to fail"
Posted May 23, 2017 10:02 UTC (Tue) by nix (subscriber, #2304). In reply to: Revisiting "too small to fail" by vbabka
Parent article: Revisiting "too small to fail"
Posted May 23, 2017 12:13 UTC (Tue)
by epa (subscriber, #39769)
[Link] (5 responses)
Posted May 26, 2017 23:39 UTC (Fri)
by nix (subscriber, #2304)
[Link] (4 responses)
I asked them what exactly it's meant to do if the server goes down, and they looked at me, puzzled, as if this were impossible to conceive of.
(These are grizzled-enough veterans that they probably consider Unix to be what BSD did -- POSIX? Who reads that?)
Posted May 30, 2017 1:37 UTC (Tue)
by neilbrown (subscriber, #359)
[Link] (3 responses)
This must be pre-4.3BSD (or thereabouts). They clearly don't know about EDQUOT.
> I asked them what exactly it's meant to do if the server goes down
We are talking about "disk I/O" here. There is no server. NFS just pretends, fakes a lot of stuff, glosses over the differences, mostly works but sometimes doesn't do quite what you want.
If you want a new contract between the application and the storage backend, you need to write one. You cannot just assume that an old contract can magically work in a new marketplace.
We could invent O_REMOTE which the application uses to acknowledge that the data might be stored in a remote location, and that it is prepared to handle the errors that might be associated with that - e.g. ETIMEDOUT?
It really doesn't help to just whine because something doesn't magically match your perceived use-case. The only sensible way forward is to provide a clear design of a set of semantics that you would like to be available. Then we can have a meaningful discussion.
Posted May 31, 2017 13:45 UTC (Wed)
by nix (subscriber, #2304)
[Link] (2 responses)
I suspect they were just shocked to get -EINTR from an NFS disk and were trying to argue that you should never need to check for short reads ever. Running out of quota, rather than disk space, is a sufficiently obscure edge case that even I'd forgotten about it. (They were doubly sure that you shouldn't need to check for -ENOSPC when writing inside files, and were surprised when I mentioned sparse files... sigh.)
Note: I said these people were grizzled, not that they were skilled. This was just a common belief among at least some of the "grunts on the ground" in the Unix parts of the City of London in the late 90s, is all... if they were skilled they would not have been working where they were, but somewhere else in the City that paid a lot more!
> We are talking about "disk I/O" here. There is no server. NFS just pretends, fakes a lot of stuff, glosses over the differences, mostly works but sometimes doesn't do quite what you want.
Given the ubiquity of NFS and Samba, this seems unfortunate, but it is true that the vast majority of applications are not remotely ready to deal with simple network failures, let alone a split-brain situation in a distributed filesystem! (Given the number of errors that even distributed consensus stores make in this area, I'm not sure *anyone* is truly competent to write code that runs atop something like that.)
> It really doesn't help to just whine because something doesn't magically match your perceived use-case. The only sensible way forward is to provide a clear design of a set of semantics that you would like to be available. Then we can have a meaningful discussion.
Agreed, but I'd also like to find one that doesn't break every application out there, nor suddenly stop them working over NFS: that's harder! (My $HOME has been on NFSv3, remote from my desktop, for my entire Unix-using life, so I have a very large and angry dog in this race: without NFS I can't do anything at all. Last week I flipped to NFSv4, and unlike last time, a couple of years back, it worked perfectly.)
Something like your proposed *_REMOTE option would seem like a good idea, but even that has problems: one that springs instantly to mind is libraries that open an fd on behalf of others. That library might be ready to handle errors, but what about the other things that fd gets passed off to? (The converse also exists: maybe the library doesn't use that flag because almost all it does with it is hand the fd back, so it never gets updated, even though the whole of the rest of the application is ready to handle -ESPLITBRAIN or whatever.)
Frankly I'm wondering if we need something better than errno and the ferociously annoying short reads thing in this area: a new set of rules that allows you to guarantee no short reads / writes but comes with extras wrapped around that, perhaps that the writes might later fail and throw an error back at you over netlink or something.
That all seems very asynchronous, but frankly we need that for normal writes too: if a writeback fails it's almost impossible for an application to tell *what* failed without fsync()ing everywhere and spending ages waiting... but this probably requires proper AIO so you can submit IOs, get an fd or other handle to them, then query for the state of that handle later. And we know how good Linux is in *that* area :( Just because network I/Os are even more asynchronous by nature than other I/Os, and more likely to have bus faults and the like affecting them, doesn't mean that the same isn't true of disk I/O too. Disks are on little networks, after all, and always have been, and with SANs they're on not-so-little networks too.
(This is not even getting into how much more horrible this all gets when RAID or other 1:N or N:1 stuff gets into the picture. At least RAID split across disks on different machines is relatively rare, though it has saved my bacon in the past to be able to run md partially across NBD for a while during disaster recovery!)
Posted Jun 2, 2017 18:11 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
My moan at the moment is people who think that just because raid CAN detect integrity errors, then it SHOULDN'T. Never mind. I ought to use it as an exercise to learn kernel programming.
But it does appear that a lot of what the kernel does is still stuck in the POSIX mindset. It would be nice if people could sit down and say "POSIX is so last century, what should Linux do today?". I think the problem is, though, as Linus said, it's like herding cats ...
Cheers,
Wol
Posted Jun 3, 2017 21:55 UTC (Sat)
by nix (subscriber, #2304)
[Link]
What would it mean to memory-map a file opened with O_REMOTE? That you are happy to receive SIGBUS?
What does it mean to execveat() a file, passing AT_REMOTE? Should it download the whole file (and libraries?) and cache them locally before succeeding?
