Shared pain
Posted Feb 3, 2012 4:42 UTC (Fri) by raven667 (subscriber, #5198)
In reply to: Shared pain by dgc
Parent article: XFS: the filesystem of the future?
I'm going to go out on a limb and say that more people are familiar with the expected ext3 behavior than have ever run UNIX, so I do think that ext3-like behavior is what programmers in general expect these days.
Posted Feb 3, 2012 5:01 UTC (Fri) by neilbrown (subscriber, #359)
Yes, there is room for improvement - there always is. Copying a mistake because it has some good features is not a wise move.
As Dave said - if there is a problem, let's fix it properly.
(and yes, my beard is gray (or heading that way)).
Posted Feb 3, 2012 5:16 UTC (Fri) by raven667 (subscriber, #5198)
Posted Feb 3, 2012 5:25 UTC (Fri) by dlang (guest, #313)
if you could get the advantages without the drawbacks, of course it would be nice, but the same flaw in the ext3 logic that gives you one also gives you the other.
Posted Feb 3, 2012 5:49 UTC (Fri) by raven667 (subscriber, #5198)
It's not even about coding carefully; doing the "correct" thing is not even possible in many of the use cases which are protected by the default ext3 behavior, such as atomically updating a file from a program which is not in C, such as a shell script. I learned, along with many admins, to use the atomic rename behavior to implement "safe" updates, which may have been a misunderstanding at the time but can now be considered the new requirement.
At the time this issue was discovered with ext4 there was a frank exchange of ideas and the realization that the expected rename behavior is beneficial to overall reliability and we should make it work properly. I'd be interested in seeing this kind of thing handled at the VFS layer so that the behavior is consistent across all filesystems; that sounds like a great idea.
Posted Feb 6, 2012 23:33 UTC (Mon) by dlang (guest, #313)
yes, we can look at changing the standard, but the way to do that is to talk about changing the standard, not to insist that the behavior of one filesystem is the only 'correct' way to do things and that all filesystem developers don't care about your data.
Posted Feb 7, 2012 23:47 UTC (Tue) by Wol (subscriber, #4433)
And IT IS LOGICALLY IMPOSSIBLE if the computer actually does what the programmer asked it to. THAT is the problem - the computer ends up in an "impossible" state.
And if it is logically impossible to end up there, at least in the programmer's mind, it is also logically impossible to make allowances for it and fix the system!
The state, as per the program's world view, is
(a) old file exists
(b) new file is written
(c) new file replaces old file
If the computer crashes in the middle of this we "magically" end up in state (d): old file is full of zeroes.
How do you program to fix a state that it is not logically possible to get to? In such a way as the program is actually guaranteed to work properly and portably?
Cheers,
Wol
Posted Feb 7, 2012 23:53 UTC (Tue) by neilbrown (subscriber, #359)
Writing to a file has never made the data safe in the event of a crash. fsync is needed for that.
If the programmer did not issue 'fsync' but still expected the data to be safe after a crash, then the programmer made a programming error. It really is that simple.
Incorrectly written programs often produce pathological behaviour - it shouldn't surprise you.
Posted Feb 8, 2012 3:41 UTC (Wed) by mjg59 (subscriber, #23239)
Posted Feb 8, 2012 4:08 UTC (Wed) by neilbrown (subscriber, #359)
That's the heart of the matter to me.... but now XFS - a filesystem that didn't cripple correct applications - is getting a hard time because it doesn't follow the lead of a filesystem that did.
And yes, I know, technical excellence doesn't determine market success, and even the best contender must adapt or die when faced with an ill-informed market. So maybe XFS should adopt the extX model for rename even though it hurts performance in some cases - because if it doesn't, people might choose not to use it - and who wants to be the best filesystem that nobody uses (though XFS is a long way from that fate)?
So I'm just being a lone voice trying to teach the history and show people that the feature they like so much was originally a mistake, that the programs that use it are actually incorrect (or at least not portable), and that maybe there are hidden costs in the thing they keep asking for.
I don't expect to be particularly successful, but that is no justification for being silent.
Posted Feb 8, 2012 12:38 UTC (Wed) by mjg59 (subscriber, #23239)
(Abstract you throughout)
Posted Feb 8, 2012 13:24 UTC (Wed) by nye (subscriber, #51576)
Then you have misunderstood the nature of the problem.
The problem is that there are cases when atomicity is required but durability is not so important. With ext3 (et al.) it is possible to get one without the other, but with XFS (et al.) atomicity can only be gained as a side-effect of durability, which is more expensive.
Thus, ext3 provides a feature which XFS does not - one which filesystem developers, as a rule, don't seem to care about, but application developers, as a rule, do. The characterisation of anyone who actually cares for that feature as 'ill-informed' is grating, even offensive to many.
General addendum, not targeted at you specifically: falling back to the observation that XFS's behaviour is POSIX-compliant is pointless because - though true - it is vacuous. In fact POSIX doesn't specify anything in the case of power loss or system crashes, hence it would be perfectly legal for a POSIX-compliant filesystem to fill your hard drive with pictures of LOLcats.
Posted Feb 8, 2012 22:29 UTC (Wed) by dlang (guest, #313)
with any filesystem you have atomic renames IF THE SYSTEM DOESN'T CRASH before the data is written out, that's what the POSIX standard provides.
ext3 gains its 'atomic renames' as a side effect of a bug: it can't figure out what data belongs to what, so if it's trying to make sure something gets written out it must write out ALL pending data, no matter what that data is part of. That made it so that if you are journaling the rename, all the writes prior to it had to get written out first (making the rename 'safe'), but the side effect is that all other pending writes anywhere in the filesystem also had to be written out, and that could cause tens of seconds of delay.
for the casual user you can argue that this is "good enough", but anyone who actually wants durability, not merely atomicity, in the face of a crash has serious problems.
ext4 has a different enough design that it can order the rename after the write of the contents of THAT ONE file, so it can provide some added safety at relatively little cost.
you also need to be aware that without durability you can still have corrupted files in ext3 after a crash; all it takes is any application that modifies a file in place, including just appending to the end of the file.
Posted Feb 8, 2012 19:48 UTC (Wed) by Wol (subscriber, #4433)
And a filesystem that throws away user data IS unfit for purpose. After all, what was the point of journalling? To improve boot times after a crash and get the system back into production quicker. If you need to do data integrity checks on top of your filesystem check, you've just made your reboot times far WORSE - a day or two would not be atypical after a crash!
Cheers,
Wol
Posted Feb 8, 2012 20:51 UTC (Wed) by raven667 (subscriber, #5198)
Posted Feb 8, 2012 15:13 UTC (Wed) by Wol (subscriber, #4433)
And what is the poor programmer to do if he doesn't have access to fsync?
Or what are the poor lusers supposed to do as their system grinds to a halt with all the disk io as programs hang waiting for the disk?
Following the spec is not an end in itself. Getting the work done is the end. And if the spec HINDERS people getting the work done, then it's the spec that needs to change, not the people.
THAT is why Linux is so successful. Linus is an engineer. He understands that. "DO NOT UNDER ANY CIRCUMSTANCES WHATSOEVER break userspace" is the mantra he lives by. And filesystems eating your data while making everything *appear* okay is one of the most appalling breaches of faith by the computer that it could commit!
Cheers,
Wol
Posted Feb 9, 2012 1:26 UTC (Thu) by dlang (guest, #313)
use a language that gives them access to data integrity tools like fsync.
for shell scripts, either write a fsync wrapper, or use the sync command (which does exactly the same as fsync on ext3)
> Or what are the poor lusers supposed to do as their system grinds to a halt with all the disk io as programs hang waiting for the disk?
use a better filesystem that doesn't have such horrible performance problems with applications that try and be careful about their data.
> Following the spec is not an end in itself.
True, but what you are asking for is for the spec to be changed, no matter how much it harms people who do follow the spec (application programmers and users who care about durability)
There is no filesystem that you can choose to use that will not lose data if the system crashes. If you are expecting something different, you need to change your expectations.
Posted Feb 9, 2012 7:39 UTC (Thu) by khim (subscriber, #9252)
Somehow you've forgotten about the most sane alternative: Remove XFS from all the computers and use sane filesystems (extX, btrfs when it'll be more stable) exclusively.
In a battle between applications and filesystems, applications win 10 times out of 10, because without applications filesystems are pointless (and applications are pointless without the user's data). The whole discussion just highlights that XFS is categorically, absolutely, totally unsuitable for use as a general-purpose FS. And when you don't care about data integrity, ext4 without journalling is actually faster (see Google's datacenters, for example).
Yes, applications don't follow the spec. When they do, they are punished and fixed. Thus users who care about durability need to use filesystems which work correctly given the existing applications. Is it fair? No. It's a classic vicious cycle. But said cycle is a fact of life. Ignore it at your peril. I, for one, have a strict policy never to use XFS and not even to consider bugs which cannot be reproduced with other filesystems, exactly because XFS developers think specs trump reality for some reason.
That's irrelevant. True, the loss of data in the case of a system crash is unavoidable. I don't care if the window I opened right before the crash in Firefox is reopened or not. I understand that spinning rust is slow and can lose such info. But if the windows which were opened an hour before that are lost because XFS replaced the saved state file with zeros, then such a filesystem is useless in practice.
Long ago XFS was prone to such data loss even if fsync was used and the data was "saved" to disk days before the crash. After a lot of work it looks like the XFS developers have fixed this issue, but now they are stuck with the next step: atomic rename. It must be implemented for the FS to be suitable for real-world applications. There are even some hints that XFS has implemented it, but as long as XFS developers exhibit this "specs are important, real applications aren't" pathological thinking, it's way too dangerous to even try to use XFS.
Posted Feb 9, 2012 9:12 UTC (Thu) by dlang (guest, #313)
what you seem to be saying is that these classes of programs should be forced to use filesystems that give them huge performance penalties to accommodate other programs that are more careless, so that those careless programs lose less data (not no data loss, just less)
Posted Feb 9, 2012 9:19 UTC (Thu) by dlang (guest, #313)
Posted Feb 9, 2012 17:37 UTC (Thu) by khim (subscriber, #9252)
Exactly. This is part of the very simple proof sequence.
Fact 1: any application which calls fsync is very slow on ext3. You've just observed it.
Conclusion: most applications don't call fsync.
Fact 2: most systems out there are either "small" (where a lot of applications share one partition) or huge (where reliability of the filesystem does not matter because there are other ways to keep data around, like GFS).
Conclusion: any real-world filesystem needs to support all the applications which are "wrong" and don't call fsync, too.
Fact 3: XFS does not provide these guarantees (and tries to cover it with POSIX, etc).
Conclusion: XFS? Fuhgeddaboudit.
Yes, it's not fair to XFS. No, I don't think being fair is guaranteed in the real world.
Posted Feb 9, 2012 19:26 UTC (Thu) by dlang (guest, #313)
and I think that advocating that you have the right to make this choice for everyone else is going _way_ too far.
when I have applications that lose config data after a problem happens (which isn't always a system crash; apps that have this sort of problem usually have it after the application crashes as well), my solution is backups of the config (ideally into something efficient like git), not crippling the rest of the system to band-aid the bad app.
Posted Feb 9, 2012 20:44 UTC (Thu) by Wol (subscriber, #4433)
I know changing things around for the sake of it doesn't matter when everything goes right, but if I tell the computer "do this, *then* that, *followed* by the other", well, if I told an employee to do it and they did things in the wrong order and screwed things up as a *direct* *result* of messing with the order, they'd get the sack.
The only reason we're in this mess, is because the computer is NOT doing what the programmer asked. It thinks it knows better. And it screws up as a result.
And the fix isn't that hard - just make sure you flush the data before the metadata (or journal the data too), which is pretty much (a) sensible, and (b) what every user would want if they knew enough to care.
Cheers,
Wol
Posted Feb 9, 2012 20:52 UTC (Thu) by dlang (guest, #313)
If the system does not crash, the view of the filesystem presented to the user is absolutely consistent, and the rename is atomic.
The problem is that there are a lot of 'odd' situations that you can have where data is written to a file while it is being renamed that make it non-trivial to "do the right thing" because the system is having to guess at what the "right thing" is for this situation.
try running a system with every filesystem mounted with the sync option; that will force the computer to do exactly what the application programmers told it to do, writing all data exactly when they tell it to, even if this means writing the same disk sector hundreds of times as small writes happen. The result will be unusable.
so you don't _really_ want the computer doing exactly what the programmer tells it to, you only want it to do so some of the time, not the rest of the time.
Posted Feb 9, 2012 21:13 UTC (Thu) by khim (subscriber, #9252)
Sure, YMMV, as I've already noted. A good filesystem for USB sticks must flush on the close(2) call. A good general-purpose filesystem must guarantee rename(2) atomicity in the face of a system crash. You can use whatever you want for your own system - it's your choice. But when the question is about a replacement for extX… it's another thing entirely. To recommend a filesystem which likes to eat users' data is simply irresponsible.
Posted Feb 14, 2012 16:16 UTC (Tue) by nye (subscriber, #51576)
That can't possibly be the case. You must be talking about applications which do something like truncate+rewrite, which is entirely orthogonal to the discussion (and is pretty clearly a bug).
I suspect you haven't understood the issue at hand.
Posted Feb 9, 2012 17:25 UTC (Thu) by khim (subscriber, #9252)
In a word: yes. Always and forever. No matter what filesystem you are using, your data is toast in the case of a RAID failure or a lightning strike. This means that we always talk about probabilities.
This leads us to a detailed explanation of the aforementioned phenomenon: in most cases you cannot afford dedicated partitions for your database or mailserver, and in this world a filesystem without suitable reliability guarantees (like atomic rename in a crash case without fsync) is pointless. When your system grows it becomes a good idea to dedicate a server just to be a mailserver or just to be a database server. But the window of opportunity is quite small, because when you go beyond a handful of servers you need to develop plans which will keep your business alive in the face of a hard crash (HDD failure, etc). And if you've designed your system for such a case, then all these journalling efforts in a filesystem are just useless overhead (see Google, which switched from ext2 to ext4 without a journal).
I'm not saying XFS is always useless. No, there exist cases where you can use it effectively. But these are rare cases, thus XFS will always be undertested. And this, in turn, usually means you should stick with extX/btrfs.
Posted Feb 3, 2012 10:19 UTC (Fri) by khim (subscriber, #9252)
This depends on your goal, actually. If your goal is something theoretically sound, then no, it's not a wise move. If your goal is the creation of something which will actually be used by real users, then it's the only possible move.
My beard is not yet gray, but I was around long enough to see where the guys who made the "wise move" ended up. I must admit that they created really TRULY nice exhibits in the Computer History Museum. Meanwhile the creations of the "unwise" guys are used for real work.
If your implementation unintentionally introduced some property and people started depending on it, that's the end of the story: you are doomed to support said property forever - if you want to keep these people, obviously. If your goal is just to create something nice for the sake of art or science, then the situation is different, of course.
This is a basic fact of life and it's truly sad to see that so many Linux developers (especially the desktop guys) don't understand that.
