Shared pain

Posted Feb 9, 2012 1:26 UTC (Thu) by dlang (✭ supporter ✭, #313)
In reply to: Shared pain by Wol
Parent article: XFS: the filesystem of the future?

> And what is the poor programmer to do if he doesn't have access to fsync?

use a language that gives them access to data integrity tools like fsync.

for shell scripts, either write a fsync wrapper, or use the sync command (which does exactly the same as fsync on ext3)
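A minimal sketch of what such an fsync wrapper might look like, assuming Linux, where fsync(2) is accepted on a descriptor opened read-only. The names `fsync_path` and `main` are illustrative and do not come from any existing tool.

```python
import os
import sys


def fsync_path(path):
    """Ask the kernel to flush one file's data and metadata to stable storage."""
    # Assumption: the platform allows fsync on a read-only descriptor (Linux does).
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def main(argv):
    """fsync every path given on the command line; return 0 on success."""
    for name in argv:
        fsync_path(name)
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

A shell script could then run something like `python fsync_wrapper.py output.dat` after writing `output.dat` (the script name is hypothetical). Unlike sync, this only asks for the named files to be flushed.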

> Or what are the poor lusers supposed to do as their system grinds to a halt with all the disk io as programs hang waiting for the disk?

use a better filesystem that doesn't have such horrible performance problems with applications that try and be careful about their data.

> Following the spec is not an end in itself.

True, but what you are asking for is for the spec to be changed, no matter how much it harms people who do follow the spec (application programmers and users who care about durability)

There is no filesystem that you can choose to use that will not lose data if the system crashes. If you are expecting something different, you need to change your expectation.


Shared pain

Posted Feb 9, 2012 7:39 UTC (Thu) by khim (subscriber, #9252) [Link]

Somehow you've forgotten about the most sane alternative:
Remove XFS from all the computers and use sane filesystems (extX, btrfs once it is more stable) exclusively.

In a battle between applications and filesystems applications win 10 times out of 10 because without applications filesystems are pointless (and applications are pointless without the user's data).

The whole discussion just highlights that XFS is categorically, absolutely, totally unsuitable for use as a general-purpose FS. And when you don't care about data integrity, ext4 without journalling is actually faster (see Google datacenters, for example).

> True, but what you are asking for is for the spec to be changed, no matter how much it harms people who do follow the spec

Yes.

> application programmers and users who care about durability

Applications don't follow the spec. When they do they are punished and fixed. Thus users who care about durability need to use filesystems which work correctly given the existing applications.

Is it fair? No. It's a classic vicious cycle. But said cycle is a fact of life. Ignore it at your peril.

I, for one, have a strict policy never to use XFS and not even to consider bugs which cannot be reproduced on other filesystems. Exactly because XFS developers think specs trump reality for some reason.

> There is no filesystem that you can choose to use that will not lose data if the system crashes. If you are expecting something different, you need to change your expectation.

That's irrelevant. True, loss of data in a system crash is unavoidable. I don't care whether the window I opened in Firefox right before the crash is reopened or not. I understand that spinning rust is slow and can lose such info. But if the windows which were opened an hour before that are lost because XFS replaced the saved-state file with zeros, then such a filesystem is useless in practice. A long time ago XFS was prone to such data loss even if fsync was used and the data was "saved" to disk days before the crash. After a lot of work it looks like the XFS developers fixed this issue, but now they are stuck on the next step: atomic rename. It must be implemented for the FS to be suitable for real-world applications. There are even some hints that XFS has implemented it, but as long as the XFS developers exhibit this pathological "specs are important, real applications aren't" thinking, it's way too dangerous to even try XFS.

Shared pain

Posted Feb 9, 2012 9:12 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

if you use applications that follow the specs (for example, just about every database, or mailserver), then XFS/ext4/btrfs/etc are very reliable.

what you seem to be saying is that these classes of programs should be forced to use filesystems that give them huge performance penalties to accommodate other programs that are more careless, so that those careless programs lose less data (not no data loss, just less)

Shared pain

Posted Feb 9, 2012 9:19 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

by the way, I've done benchmarks on applications that do the proper fsync dance needed for the data to actually be safe (durable, not just atomic filesystem renames that may or may not get written to disk), and even on an otherwise idle system ext3 was at least 2x slower, and if you have other disk activity going on at the same time, the problem only gets worse (if you have another process writing large amounts of data, your critical app can easily be 40x slower on ext3)
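For readers unfamiliar with the "fsync dance", here is a hedged sketch of one common form of it on a POSIX system: write a temporary file, fsync it, rename it over the target, then fsync the containing directory. The function name `durable_replace` and the `.tmp` suffix are illustrative assumptions, and this is not necessarily the exact sequence used in the benchmarks above.

```python
import os


def durable_replace(path, data):
    """Replace the file at path with data (bytes), aiming for durability on return."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical name; must be on the same filesystem

    # 1. Write the new contents to a temporary file and force them to disk.
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()               # push Python's buffer into the kernel
        os.fsync(f.fileno())    # ask the kernel to push it to the device

    # 2. Atomically swap the temporary file into place (rename(2) on POSIX).
    os.replace(tmp_path, path)

    # 3. fsync the directory so the rename itself is recorded durably.
    #    Assumption: the platform allows opening a directory read-only (Linux does).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Each fsync call can block until the device acknowledges the write, which is where the cost being discussed comes from.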

Shared pain

Posted Feb 9, 2012 17:37 UTC (Thu) by khim (subscriber, #9252) [Link]

Exactly. This is part of the very simple proof sequence.

Fact 1: any application which calls fsync is very slow in ext3. You've just observed it.
Conclusion: most applications don't call fsync.
Fact 2: most systems out there are either "small" (where a lot of applications share one partition) or huge (where reliability of filesystem does not matter because there are other ways to keep data around like GFS).
Conclusion: any real-world filesystem needs to support all the applications which are "wrong" and don't call fsync, too.
Fact 3: XFS does not provide these guarantees (and tries to cover it with POSIX, etc).
Conclusion: XFS? Fuhgeddaboudit.

Yes, it's not fair to XFS. No, I don't think being fair is guaranteed in the real world.

Shared pain

Posted Feb 9, 2012 19:26 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

sorry, on my systems I'm not willing to tolerate a 50x slowdown just to make badly written apps be a little less likely to be confused after a power outage.

and I think that advocating that you have the right to make this choice for everyone else is going _way_ too far.

when I have applications that lose config data after a problem happens (which isn't always a system crash, apps that have this sort of problem usually have it after the application crashes as well), my solution is backups of the config (ideally into something efficient like git), not crippling the rest of the system to band-aid the bad app.

Shared pain

Posted Feb 9, 2012 20:44 UTC (Thu) by Wol (guest, #4433) [Link]

And what is "badly written" about an app that expects the computer to do what was asked of it?

I know changing things around for the sake of it doesn't matter when everything goes right, but if I tell the computer "do this, *then* that, *followed* by the other", well, if I told an employee to do it and they did things in the wrong order and screwed things up as a *direct* *result* of messing with the order, they'd get the sack.

The only reason we're in this mess, is because the computer is NOT doing what the programmer asked. It thinks it knows better. And it screws up as a result.

And the fix isn't that hard - just make sure you flush the data before the metadata (or journal the data too), which is pretty much (a) sensible, and (b) what every user would want if they knew enough to care.

Cheers,
Wol

Shared pain

Posted Feb 9, 2012 20:52 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

it is badly written because you did not tell the computer that you wanted to make sure that the data was written to the drive in a particular order.

If the system does not crash, the view of the filesystem presented to the user is absolutely consistent, and the rename is atomic.

The problem is that there are a lot of 'odd' situations that you can have where data is written to a file while it is being renamed that make it non-trivial to "do the right thing" because the system is having to guess at what the "right thing" is for this situation.

try running a system with every filesystem mounted with the sync option. that will force the computer to do exactly what the application programmers told it to do, writing all data exactly when they tell it to, even if this means writing the same disk sector hundreds of times as small writes happen. The result will be unusable.

so you don't _really_ want the computer doing exactly what the programmer tells it to, you only want it to do so some of the time, not the rest of the time.

Shared pain

Posted Feb 9, 2012 21:13 UTC (Thu) by khim (subscriber, #9252) [Link]

> so you don't _really_ want the computer doing exactly what the programmer tells it to, you only want it to do so some of the time, not the rest of the time.

Sure. YMMV as I've already noted. A good filesystem for USB sticks must flush on the close(2) call. A good general-purpose filesystem must guarantee rename(2) atomicity in the face of a system crash.

You can use whatever you want for your own system - it's your choice. But when the question is about a replacement for extX… it's another thing entirely. To recommend a filesystem which likes to eat users' data is simply irresponsible.

Shared pain

Posted Feb 14, 2012 16:16 UTC (Tue) by nye (guest, #51576) [Link]

> when I have applications that lose config data after a problem happens (which isn't always a system crash, apps that have this sort of problem usually have it after the application crashes as well)

That can't possibly be the case. You must be talking about applications which do something like truncate+rewrite, which is entirely orthogonal to the discussion (and is pretty clearly a bug).
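To make the distinction concrete, here is an illustrative sketch of the two save patterns being contrasted. The function names and the `.new` suffix are hypothetical. The first pattern can lose the old contents if only the application crashes; the second is the rename pattern (without fsync) that this thread is actually about.

```python
import os


def save_truncate(path, data):
    """Truncate+rewrite: overwrite the file in place."""
    # Opening with "wb" truncates the existing file to zero length first.
    # If the application dies before the write completes, the old contents
    # are already gone, regardless of which filesystem is in use.
    with open(path, "wb") as f:
        f.write(data)


def save_rename(path, data):
    """Write a new file, then rename it over the old one."""
    tmp_path = path + ".new"  # hypothetical temporary name in the same directory
    with open(tmp_path, "wb") as f:
        f.write(data)
    # No fsync here on purpose: this is the pattern under discussion.
    # An application crash leaves either the old or the new file intact;
    # what a system crash leaves behind depends on the filesystem.
    os.replace(tmp_path, path)
```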

I suspect you haven't understood the issue at hand.

Shared pain

Posted Feb 9, 2012 17:25 UTC (Thu) by khim (subscriber, #9252) [Link]

> what you seem to be saying is that these classes of programs should be forced to use filesystems that give them huge performance penalties to accommodate other programs that are more careless, so that those careless programs lose less data

In a word: yes.

> not no data loss, just less

Always and forever. No matter what filesystem you are using, your data is toast in the case of a RAID failure or a lightning strike. This means that we are always talking about probabilities.

This leads us to a detailed explanation of the aforementioned phenomenon: in most cases you cannot afford dedicated partitions for your database or mailserver, and in this world a filesystem without suitable reliability guarantees (like atomic rename across a crash without fsync) is pointless. When your system grows it becomes a good idea to dedicate a server to being just a mailserver or just a database server. But the window of opportunity is quite small, because once you go beyond a handful of servers you need to develop plans which will keep your business alive in the face of a hard crash (HDD failure, etc.). And if you've designed your system for such a case, then all these journalling efforts in the filesystem are just useless overhead (see Google, which switched from ext2 to ext4 without a journal).

I'm not saying XFS is always useless. No, there are cases where you can use it effectively. But these cases are rare, thus XFS will always be undertested. And this, in turn, usually means you should stick with extX/btrfs.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds