I didn't want to read past that, sorry. I can also imagine that things should be some way or the other. They are, however, not.
Posted Mar 16, 2009 22:27 UTC (Mon) by quotemstr (subscriber, #45331)
Posted Mar 16, 2009 22:36 UTC (Mon) by bojan (subscriber, #14302)
Actually it does. If you _don't_ allow for what POSIX specifies in your applications (which is where the problem is), then there will be consequences (i.e. the applications will lose files).
This can be properly fixed in two ways:
1. By calling fsync() from the application when required.
2. By introducing something new that does what you keep talking about.
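Option 1 amounts to the open()/write()/fsync()/close()/rename() sequence discussed throughout this thread. Below is a minimal C sketch of it, assuming a POSIX system. The helper names `replace_file_durably` and `cleanup_fail` and the "<path>.tmp" temp-file naming are hypothetical, not part of any library. The final fsync() of the containing directory goes beyond what the thread discusses. It is a commonly recommended Linux practice for making the rename itself durable, and whether it is needed elsewhere is left open here.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Close fd (if still open), remove the temp file, and return -1 with
   the original errno preserved. */
static int cleanup_fail(int fd, const char *tmp)
{
    int saved = errno;
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    errno = saved;
    return -1;
}

/* Hypothetical helper (not from any library): replace `path` with `len`
   bytes of `data`. Returns 0 on success, -1 with errno set on failure.
   The "<path>.tmp" temp-file name is an assumption of this sketch. */
int replace_file_durably(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {                 /* write() may write fewer bytes */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return cleanup_fail(fd, tmp);
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) < 0)                /* new data on disk before rename */
        return cleanup_fail(fd, tmp);
    if (close(fd) < 0)                /* close() can report write errors */
        return cleanup_fail(-1, tmp);
    if (rename(tmp, path) < 0)        /* atomically swap the name */
        return cleanup_fail(-1, tmp);

    /* Linux practice (assumption beyond the thread): fsync the directory
       so the rename is on disk too. dirname() may modify its argument,
       so work on a copy. */
    char dir[4096];
    snprintf(dir, sizeof dir, "%s", path);
    int dfd = open(dirname(dir), O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

A caller would write something like `replace_file_durably("settings.conf", buf, n)` and treat a nonzero return as "the old file is still in place". The extra code compared with the "sloppy" sequence is the fsync() and the error checks.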
Overloading specified behaviour with unspecified things is dangerous, because it encourages application writers to do the wrong thing. We've seen that before with XFS and wrong people got blamed that time too.
Sure, Ted is a practical person, so he doesn't want to break things unnecessarily. I admire him for keeping his cool.
Posted Mar 16, 2009 23:32 UTC (Mon) by jamesh (guest, #1159)
In cases where POSIX does not specify behaviour, it is left up to the implementation. If the choice is between trying to provide the runtime atomic rename guarantee over a crash or slightly higher performance, I'd pick the first option. After all, that's why I am running a crash-resistant file system in the first place.
Are you seriously saying you can't understand the benefits of delaying IO but preserving the order of certain operations over a "do it now" fsync() call?
Posted Mar 16, 2009 23:42 UTC (Mon) by bojan (subscriber, #14302)
> Are you seriously saying you can't understand the benefits of delaying IO but preserving the order of certain operations over a "do it now" fsync() call?
1. Yes, I can understand it.
2. No, this is not what rename() specifies.
So, when an application writer thinks that it will be like that everywhere, he/she is wrong and the application may lose data. That is bad.
Hence, I'm suggesting that for the cases where ordered rename is warranted, we should have a new API.
PS. As I explained elsewhere, unordered rename has its use as well, so one cannot just assume that everyone should drop that and do ordered. It is also not practical to demand that, because too many systems would have to be audited and changed to achieve it. And before you say "but don't we have to fix more apps already" - well, the applications are buggy right now according to specification - not the other way around.
Posted Mar 17, 2009 0:18 UTC (Tue) by nix (subscriber, #2304)
Ten years ago, would you have been arguing that programs that relied on
symlinks were broken because POSIX did not require them?
Posted Mar 17, 2009 0:31 UTC (Tue) by bojan (subscriber, #14302)
If the programs correctly tested to see if the support is there and then refused to work if symlinks were not there, there would be nothing wrong with them. So, by all means, if you write an application that tests that the underlying FS has ordered renames and refuses to work otherwise with a sloppy open()/write()/close()/rename() sequence, that's perfectly OK. You just need to write even _more_ code to do this than if you just used fsync(). Up to you.
Posted Mar 17, 2009 1:24 UTC (Tue) by nix (subscriber, #2304)
This is actually worse. If you get the open()/write()/fsync()/close()/
rename() sequence wrong, by leaving out the fsync(), the visible effect
during development is *nil*, even on filesystems like pre-patch ext4,
because this is a change which only has an effect when something goes
really wrong and the OS crashes or you lose power at the wrong instant,
and if that happens, any data loss will be written off to the power
failure, like as not.
Expecting any but the most skilled developers to remember that fsync()
when omitting it has *no visible negative consequence* in normal operation
is a complete and total pipe-dream. You can wish all you will, but only a
few percent will ever conform.
It is much better to arrange to do the right thing in the filesystem,
which *does* have especially skilled people hacking at it, than in the
vast mass of wildly-varying-in-quality code out there in the real world.
Posted Mar 17, 2009 2:17 UTC (Tue) by bojan (subscriber, #14302)
WOW! Programs have bugs. Imagine that ;-)
> Expecting any but the most skilled developers to remember that fsync() when omitting it has *no visible negative consequence* in normal operation is a complete and total pipe-dream.
The no negative visible consequence applies to one file system in one mode _only_ (and according to some, not even on it all the time). The rest - it depends.
If you ever tried to debug a race condition, you'd know that it can be really hard to do, because the system doesn't get into such conditions all the time. Did someone guarantee to you that programming was going to be easy? I must have missed that lesson ;-)
Oh, and for all the forgetful unskilled developers: man 2 close. I sure needed it :-(
> You can wish all you will, but only a few percent will ever conform.
And their applications will still suck and they will still rely on hacks in file systems to work. And of course, people doing this will be the ones loudest complaining that "file system is broken" when they encounter problems on another platform. Not even my six-year-old is this childish. But, hey - that's life.
> It is much better to arrange to do the right thing in the filesystem, which *does* have especially skilled people hacking at it, than in the vast mass of wildly-varying-in-quality code out there in the real world.
All you need to do is this:
1. Convince all FS writers to only use new semantics.
2. Convince POSIX folks to change the spec.
Good luck doing that.
PS. The vast majority of people do not program using APIs we are talking about here. They are using libraries that wrap all this up, other programming languages that have calls that wrap all this up etc. These will be written by people familiar with lower level POSIX APIs we are talking about here. For a good example, see: http://mail.gnome.org/archives/gtk-devel-list/2009-March/...
Posted Mar 17, 2009 2:23 UTC (Tue) by bojan (subscriber, #14302)
Of course, I mean your supposed vast majority that won't do the fsync here.
Posted Mar 17, 2009 2:26 UTC (Tue) by quotemstr (subscriber, #45331)
And as for getting filesystems to change -- that's going to be the case. Any widely-used filesystem will encounter the same problem that started this mess, and will either implement the same fix or suffer the fate of XFS.
Posted Mar 17, 2009 2:35 UTC (Tue) by bojan (subscriber, #14302)
BTW, people already started fixing the code. Or didn't you read that GTK thread?
PS. Even Ted's workarounds in ext4 do not do full ordered rename in all cases. These are only for the cases of the most widely known application breakage. But, if you keep insisting, he may do the lockup-on-fsync for you, ext3 style, just so that you can get that nice UI feeling in properly written apps ;-)
Posted Mar 17, 2009 2:37 UTC (Tue) by quotemstr (subscriber, #45331)
Posted Mar 17, 2009 2:44 UTC (Tue) by bojan (subscriber, #14302)
Posted Mar 17, 2009 20:37 UTC (Tue) by nix (subscriber, #2304)
I repeat: omitting fsync() has no negative visible consequence *in normal
operation* on *any* POSIX-compliant system. Turning off the power or
locking up the box is *not* 'normal operation'.
I know of no developers of anything other than full-blown databases who do
anything like that to test their programs. Thus, for nearly all programs,
omitting fsync() is harmless during the development and testing phase.
Thus, it will regularly be omitted, *no matter what* you might wish.
... and, um, changing POSIX really isn't that hard. Make a good case that
some behaviour is common enough and POSIX will bend. The Austin Group is
populated with normal human beings^W^Wraging pedants like you or I, not
gods. (There are some demigods there, though.)
It is quite possible to convince them that a change is needed, and POSIX
regularly changes semantics in new releases.
Posted Mar 17, 2009 0:33 UTC (Tue) by bojan (subscriber, #14302)
Posted Mar 17, 2009 8:35 UTC (Tue) by jamesh (guest, #1159)
We are talking about a case that POSIX leaves undefined here. An OS can wipe the disk on system crash and still be POSIX compliant.
We are in the realm of implementation defined behaviour, so talking of "applications doing what POSIX requires" doesn't really make sense. Claiming that the applications are buggy in a case where the specification offers no guidance doesn't help anyone.
Ext4's crash resistance is a desirable feature that exceeds the minimum requirements needed for POSIX conformance. Preserving atomic renames over a crash also exceeds those minimum requirements.
I'd be willing to pay the performance penalty from providing this behaviour in the same way I am willing to pay the performance penalty from metadata journaling.
A filesystem's job is not to punish users for application developers' oversights.
Posted Mar 18, 2009 0:52 UTC (Wed) by xoddam (subscriber, #2322)
The *only* behaviour under discussion is recoverability across system failures. That's what POSIX doesn't (can't) guarantee, and it's what a journaling filesystem is supposed to provide *in addition* to the POSIX guarantees.
System administrators and users choose to run journaling filesystems so they don't waste time cleaning up after a crash. A journaling filesystem that makes it more, not less, likely for users to lose data isn't doing its job.
POSIX guarantees atomicity of rename -- while the system is running. Application developers code to that guarantee, without particular reference to what happens when the power is cut or some video driver scribbles on the kernel heap. If the system crashes, there is no POSIXLY_CORRECT guarantee that anything will be recoverable at all. Whether you use fsync or not.
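The running-system guarantee described above can be illustrated with a small C sketch, assuming a POSIX system. rename() switches which file the name refers to in one step. A reader that opened the old file keeps seeing the old contents, and a fresh open() sees the new ones. At no point is the name missing. The helpers `put`, `get`, and `rename_demo` and the paths used below are hypothetical. The sketch deliberately says nothing about what survives a crash, which is the case the thread is arguing about.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create/truncate `path` and write string `s` to it; 0 on success.
   A single write() is assumed to suffice for these tiny demo strings. */
static int put(const char *path, const char *s)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, s, strlen(s));
    int rc = close(fd);
    return (n == (ssize_t)strlen(s) && rc == 0) ? 0 : -1;
}

/* Read up to cap-1 bytes from the start of fd into buf, NUL-terminated. */
static ssize_t get(int fd, char *buf, size_t cap)
{
    ssize_t n = pread(fd, buf, cap - 1, 0);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}

/* Hypothetical demo: `reader` opens the old file, rename() replaces it,
   and `fresh` opens the name again. Fills old_view/new_view with what
   each descriptor sees. Returns 0 on success. */
int rename_demo(const char *path, const char *tmp,
                char *old_view, char *new_view, size_t cap)
{
    if (put(path, "old") < 0 || put(tmp, "new") < 0)
        return -1;
    int reader = open(path, O_RDONLY);
    if (reader < 0)
        return -1;
    if (rename(tmp, path) < 0) {      /* atomic while the system runs */
        close(reader);
        return -1;
    }
    int fresh = open(path, O_RDONLY);
    if (fresh < 0) {
        close(reader);
        return -1;
    }
    int ok = get(reader, old_view, cap) >= 0 && get(fresh, new_view, cap) >= 0;
    close(reader);
    close(fresh);
    return ok ? 0 : -1;
}
```

Here `old_view` ends up holding "old" and `new_view` holds "new". That is all the atomicity POSIX promises for a running system. Whether the "new" data is on disk if the power fails right after the rename is exactly what POSIX leaves to the implementation.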
A journaling filesystem is supposed to provide more reasonable behaviour FOR USERS. Its job is not to punish users for the corner cases that application developers didn't consider.
Posted Mar 18, 2009 0:14 UTC (Wed) by dvdeug (subscriber, #10998)
Posted Mar 17, 2009 20:44 UTC (Tue) by man_ls (subscriber, #15091)
Now you're just trolling.
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds