
Wishful thinking


Posted Mar 16, 2009 22:36 UTC (Mon) by bojan (subscriber, #14302)
In reply to: Wishful thinking by quotemstr
Parent article: Garrett: ext4, application expectations and power management

> It doesn't logically follow.

Actually it does. If you _don't_ allow for what POSIX specifies in your applications (which is where the problem is), then there will be consequences (i.e. the applications will lose files).

This can be properly fixed in two ways:

1. By calling fsync() from the application when required.
2. By introducing something new that does what you keep talking about.
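For illustration, option 1 might look like the following minimal Python sketch on a POSIX system. The function name `replace_file_durably` and the `.tmp-` prefix are hypothetical, and the final directory fsync() is an extra assumption not discussed in this thread: it is the usual way on Linux to ask that the rename itself reach disk.

```python
import os
import tempfile


def replace_file_durably(path: str, data: bytes) -> None:
    """Replace path with data, syncing the new contents before the rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temporary file in the same directory so rename() never
    # has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # move Python's buffer into the kernel
            os.fsync(tmp.fileno())  # ask the kernel to write the data out
        os.rename(tmp_path, path)   # atomic replacement on a running system
    except BaseException:
        # Best-effort cleanup so a failed attempt leaves no stray temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption: syncing the directory makes the rename itself durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that `tempfile.mkstemp` creates the file with owner-only permissions, so a real application would probably also want to copy the mode of the file it replaces.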

Overloading specified behaviour with unspecified things is dangerous, because it encourages application writers to do the wrong thing. We've seen that before with XFS, and the wrong people got blamed that time too.

Sure, Ted is a practical person, so he doesn't want to break things unnecessarily. I admire him for keeping his cool.



Wishful thinking

Posted Mar 16, 2009 23:32 UTC (Mon) by jamesh (guest, #1159) [Link] (14 responses)

This whole issue is about what should happen in a case that POSIX does not specify, so I don't know why you keep on bringing this up.

In cases where POSIX does not specify behaviour, it is left up to the implementation. If the choice is between trying to provide the runtime atomic rename guarantee over a crash or slightly higher performance, I'd pick the first option. After all, that's why I am running a crash-resistant file system in the first place.

Are you seriously saying you can't understand the benefits of delaying IO but preserving the order of certain operations over a "do it now" fsync() call?

Wishful thinking

Posted Mar 16, 2009 23:42 UTC (Mon) by bojan (subscriber, #14302) [Link] (13 responses)

The problem is with portability. If you write your applications to _not_ do what POSIX requires, they will be broken when they go to a different system which happens to have an FS that doesn't order renames on disk.

> Are you seriously saying you can't understand the benefits of delaying IO but preserving the order of certain operations over a "do it now" fsync() call?

1. Yes, I can understand it.
2. No, this is not what rename() specifies.

So, when an application writer thinks that it will be like that everywhere, he/she is wrong and the application may lose data. That is bad.

Hence, I'm suggesting that for the cases where ordered rename is warranted, we should have a new API.

PS. As I explained elsewhere, unordered rename has its use as well, so one cannot just assume that everyone should drop that and do ordered. It is also not practical to demand that, because too many systems would have to be audited and changed to achieve it. And before you say "but don't we have to fix more apps already" - well, the applications are buggy right now according to specification - not the other way around.

Wishful thinking

Posted Mar 17, 2009 0:18 UTC (Tue) by nix (subscriber, #2304) [Link] (10 responses)

You're acting as if POSIX is set in stone and can never change to account
for new de-facto standards, when in reality that is the *only* way it ever
changes (and often Linux is the source of such changes).

Ten years ago, would you have been arguing that programs that relied on
symlinks were broken because POSIX did not require them?

Wishful thinking

Posted Mar 17, 2009 0:31 UTC (Tue) by bojan (subscriber, #14302) [Link] (8 responses)

> Ten years ago, would you have been arguing that programs that relied on symlinks were broken because POSIX did not require them?

If the programs correctly tested to see if the support is there and then refused to work if symlinks were not there, there would be nothing wrong with them. So, by all means, if you write an application that tests that the underlying FS has ordered renames and refuses to work otherwise with a sloppy open()/write()/close()/rename() sequence, that's perfectly OK. You just need to write even _more_ code to do this than if you just used fsync(). Up to you.
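A runtime test of the symlink kind could be sketched roughly as below. This is only an illustration: `symlinks_supported` and the probe file names are hypothetical, and the sketch covers the symlink half only. It does not attempt the ordered-rename test, which is the harder part.

```python
import os
import tempfile


def symlinks_supported(directory: str) -> bool:
    """Return True if a symbolic link can be created inside directory."""
    target = os.path.join(directory, "probe-target")  # need not exist
    link = os.path.join(directory, "probe-link")
    try:
        os.symlink(target, link)  # a dangling link is enough for a probe
    except (OSError, NotImplementedError, AttributeError):
        # The filesystem or platform refused, so report "not supported".
        return False
    os.unlink(link)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        if not symlinks_supported(scratch):
            raise SystemExit("symlinks are required but not available here")
```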

Wishful thinking

Posted Mar 17, 2009 1:24 UTC (Tue) by nix (subscriber, #2304) [Link] (7 responses)

The vast majority of programs, even when symlinks were optional, assumed
their presence, because the enormous majority of the installed base had
them.

This is actually worse. If you get the open()/write()/fsync()/close()/
rename() sequence wrong, by leaving out the fsync(), the visible effect
during development is *nil*, even on filesystems like pre-patch ext4,
because this is a change which only has an effect when something goes
really wrong and the OS crashes or you lose power at the wrong instant,
and if that happens, any data loss will be written off to the power
failure, like as not.
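To make that concrete, here is a small Python sketch of the two
variants side by side. The names `save` and `read_back` and the
".tmp" suffix are made up for the illustration.

```python
import os


def save(path: str, data: bytes, use_fsync: bool) -> None:
    """Write data to a temporary file, then rename it over path."""
    tmp_path = path + ".tmp"  # hypothetical naming scheme for this sketch
    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
        if use_fsync:
            tmp.flush()             # hand Python's buffer to the kernel
            os.fsync(tmp.fileno())  # the step that is so easy to leave out
    os.rename(tmp_path, path)


def read_back(path: str) -> bytes:
    """Read the file the way any ordinary test would."""
    with open(path, "rb") as f:
        return f.read()
```

On a running system read_back() returns the same bytes after either
call, so a test suite that never crashes the machine cannot tell the
two variants apart.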

Expecting any but the most skilled developers to remember that fsync()
when omitting it has *no visible negative consequence* in normal operation
is a complete and total pipe-dream. You can wish all you will, but only a
few percent will ever conform.

It is much better to arrange to do the right thing in the filesystem,
which *does* have especially skilled people hacking at it, than in the
vast mass of wildly-varying-in-quality code out there in the real world.

Wishful thinking

Posted Mar 17, 2009 2:17 UTC (Tue) by bojan (subscriber, #14302) [Link] (6 responses)

> The vast majority of programs, even when symlinks were optional, assumed their presence, because the enormous majority of the installed base had them.

WOW! Programs have bugs. Imagine that ;-)

> Expecting any but the most skilled developers to remember that fsync() when omitting it has *no visible negative consequence* in normal operation is a complete and total pipe-dream.

The no negative visible consequence applies to one file system in one mode _only_ (and according to some, not even on it all the time). The rest - it depends.

If you ever tried to debug a race condition, you'd know that it can be really hard to do, because the system doesn't get into such conditions all the time. Did someone guarantee to you that programming was going to be easy? I must have missed that lesson ;-)

Oh, and for all the forgetful unskilled developers: man 2 close. I sure needed it :-(
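For reference, the Linux close(2) manual page warns that a successful close() does not guarantee the data has reached the disk, and that errors from earlier writes may only be reported at close(). A minimal Python sketch of taking that seriously follows; `write_and_close` is a hypothetical helper and the 0o644 mode is an arbitrary choice.

```python
import os


def write_and_close(path: str, data: bytes) -> None:
    """Write data to path, surfacing errors from write, fsync and close."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # close() on its own does not force data to disk
    except BaseException:
        os.close(fd)
        raise
    os.close(fd)  # raises OSError if close() itself reports a failure
```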

> You can wish all you will, but only a few percent will ever conform.

And their applications will still suck and they will still rely on hacks in file systems to work. And of course, people doing this will be the ones complaining loudest that "file system is broken" when they encounter problems on another platform. Not even my six-year-old is this childish. But, hey - that's life.

> It is much better to arrange to do the right thing in the filesystem, which *does* have especially skilled people hacking at it, than in the vast mass of wildly-varying-in-quality code out there in the real world.

All you need to do is this:

1. Convince all FS writers to only use new semantics.
2. Convince POSIX folks to change the spec.

Good luck doing that.

PS. The vast majority of people do not program using APIs we are talking about here. They are using libraries that wrap all this up, other programming languages that have calls that wrap all this up etc. These will be written by people familiar with lower level POSIX APIs we are talking about here. For a good example, see: http://mail.gnome.org/archives/gtk-devel-list/2009-March/...

Wishful thinking

Posted Mar 17, 2009 2:23 UTC (Tue) by bojan (subscriber, #14302) [Link]

> people doing this

Of course, I mean your supposed vast majority that won't do the fsync here.

Wishful thinking

Posted Mar 17, 2009 2:26 UTC (Tue) by quotemstr (subscriber, #45331) [Link] (3 responses)

The POSIX spec doesn't need to change one bit. Both behaviors entirely conform to POSIX.

And as for getting filesystems to change -- that's going to be the case. Any widely-used filesystem will encounter the same problem that started this mess, and will either implement the same fix or suffer the fate of XFS.

Wishful thinking

Posted Mar 17, 2009 2:35 UTC (Tue) by bojan (subscriber, #14302) [Link] (2 responses)

I see FS implementers shaking in their boots :-)

BTW, people already started fixing the code. Or didn't you read that GTK thread?

PS. Even Ted's workarounds in ext4 do not do full ordered rename in all cases. These are only for the cases of the most widely known application breakage. But, if you keep insisting, he may do the lockup-on-fsync for you, ext3 style, just so that you can get that nice UI feeling in properly written apps ;-)

Wishful thinking

Posted Mar 17, 2009 2:37 UTC (Tue) by quotemstr (subscriber, #45331) [Link] (1 responses)

Care to link to this thread?

Wishful thinking

Posted Mar 17, 2009 2:44 UTC (Tue) by bojan (subscriber, #14302) [Link]

Already have. You have to go a few posts up.

Wishful thinking

Posted Mar 17, 2009 20:37 UTC (Tue) by nix (subscriber, #2304) [Link]

>> Expecting any but the most skilled developers to remember that fsync()
>> when omitting it has *no visible negative consequence* in normal
>> operation is a complete and total pipe-dream.
>
> The no negative visible consequence applies to one file system in one
> mode _only_ (and according to some, not even on it all the time). The
> rest - it depends.

I repeat: omitting fsync() has no negative visible consequence *in normal
operation* on *any* POSIX-compliant system. Turning off the power or
locking up the box is *not* 'normal operation'.

I know of no developers of anything other than full-blown databases who do
anything like that to test their programs. Thus, for nearly all programs,
omitting fsync() is harmless during the development and testing phase.
Thus, it will regularly be omitted, *no matter what* you might wish.

... and, um, changing POSIX really isn't that hard. Make a good case that
some behaviour is common enough and POSIX will bend. The Austin Group is
populated with normal human beings^W^Wraging pedants like you or I, not
gods. (There are some demigods there, though.)

It is quite possible to convince them that a change is needed, and POSIX
regularly changes semantics in new releases.

Wishful thinking

Posted Mar 17, 2009 0:33 UTC (Tue) by bojan (subscriber, #14302) [Link]

Oh, and if you want to change POSIX, please do so. I have no objection. As if my opinion mattered here ;-)

Wishful thinking

Posted Mar 17, 2009 8:35 UTC (Tue) by jamesh (guest, #1159) [Link]

> If you write your applications to _not_ do what POSIX requires, they will
> be broken when they go to a different system which happens to have an FS
> that doesn't order renames on disk.

We are talking about a case that POSIX leaves undefined here. An OS can wipe the disk on system crash and still be POSIX compliant.

We are in the realm of implementation defined behaviour, so talking of "applications doing what POSIX requires" doesn't really make sense. Claiming that the applications are buggy in a case where the specification offers no guidance doesn't help anyone.

Ext4's crash resistance is a desirable feature that exceeds the minimum requirements needed for POSIX conformance. Preserving atomic renames over a crash also exceeds those minimum requirements.

I'd be willing to pay the performance penalty from providing this behaviour in the same way I am willing to pay the performance penalty from metadata journaling.

A filesystem's job is not to punish users for application developers' oversights.

Posted Mar 18, 2009 0:52 UTC (Wed) by xoddam (subscriber, #2322) [Link]

This is *so* not about application developers or POSIX!

The *only* behaviour under discussion is recoverability across system failures. That's what POSIX doesn't (can't) guarantee, and it's what a journaling filesystem is supposed to provide *in addition* to the POSIX guarantees.

System administrators and users choose to run journaling filesystems so they don't waste time cleaning up after a crash. A journaling filesystem that makes it more, not less, likely for users to lose data isn't doing its job.

POSIX guarantees atomicity of rename -- while the system is running. Application developers code to that guarantee, without particular reference to what happens when the power is cut or some video driver scribbles on the kernel heap. If the system crashes, there is no POSIXLY_CORRECT guarantee that anything will be recoverable at all. Whether you use fsync or not.

A journaling filesystem is supposed to provide more reasonable behaviour FOR USERS. Its job is not to punish users for the corner cases that application developers didn't consider.

Wishful thinking

Posted Mar 18, 2009 0:14 UTC (Wed) by dvdeug (subscriber, #10998) [Link]

What POSIX specifies is that a compliant system, upon a system crash, can hunt down all hard copies that have been made and burn them, after overwriting the data on the disk seven times with zeros, ones, and random data. I'm not sure how an application is supposed to allow for that.

