
Levels of reality

Posted Mar 16, 2009 13:59 UTC (Mon) by itvirta (guest, #49997)
Parent article: Garrett: ext4, application expectations and power management

It seems to me that there are two different levels of reality regarding filesystem updates. And in all of this discussion, the two levels seem to be quite confused.

A) There's the OS level which is in effect while the system is running. This is the domain for which POSIX gives all those nice guarantees about the ordering of operations and rename being atomic etc. The OS might buffer things, but it's also quite capable of reading back things from the buffer.

B) There's the storage level, the one which you see if you look at the data written to the disk. This level only becomes apparent if the OS crashes because then (and only then) the buffers are lost. Also, apparently POSIX doesn't give any guarantees in this domain.

Because of this difference, it's completely moot to say things like "POSIX allows the fs to show a renamed but empty file after a crash". Of course it does; it also allows the complete fs to get trashed after a crash. I think everyone agrees that the latter isn't a good idea. But the former isn't either.

So POSIX semantics do not matter in case of a crash. Something else is needed.

Actually, I think all this makes fsync() quite odd all in all. If POSIX doesn't guarantee anything after a crash, then who cares about fsync()? Ok, fsync() might commit the data to the disk, but POSIX still allows the whole fs to be destroyed after the crash. So, if an fs developer says something like "call fsync, because POSIX allows things to go wrong otherwise", he is already giving guarantees above those given by POSIX. And that, I think, is slightly contradictory.

What application developers seem to want is for the ordering of operations (writes and renames) to be consistent within a file(*) even in case of a crash. They don't care when something happens; they care about it happening in order. It's not required by POSIX, but neither is any way of preserving data across a crash, such as journaling. Journaling already exists only for the convenience of users (rather than for compliance with standards).
(*) I mean "file" from the application's point of view, with the directory entry included.

Doing the data-metadata commits in order doesn't rule out delayed allocation or any such. The fs can still delay as it likes. It also doesn't mean that everything should happen in order, just that things relevant to the same file need to. (And if the file gets deleted before anything is committed, well, good, no need to write.) But committing something that was called later (in the famous example, this would be the rename) before something that was called earlier (this would be the write) seems a bit silly, and counter-intuitive.
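
For reference, the "famous example" and the fsync() workaround that fs developers recommend can be sketched roughly as below. This is only an illustration in Python: the function name replace_file_durably and the ".tmp-" prefix are made up for this sketch, and, as argued above, the fsync() call only asks the OS to do what it reasonably can; nothing here is a guarantee from POSIX about what survives a crash.

```python
import os
import tempfile


def replace_file_durably(path: str, data: bytes) -> None:
    """Write data to a temporary file, flush it, then rename it over path.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory as the target,
    # so the rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)          # the "write" of the famous example
            tmp.flush()              # push Python's own buffer to the OS
            os.fsync(tmp.fileno())   # ask the OS to commit the data first
        os.replace(tmp_path, path)   # the "rename": atomic at the OS level
    except BaseException:
        # If anything failed before the rename, remove the leftover file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Without the os.fsync() line this is exactly the write-then-rename sequence the discussion is about: correct at level A while the system runs, and dependent on the fs committing things in order at level B.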

Ok, now yell at me for being horribly wrong.


Levels of reality

Posted Mar 17, 2009 0:21 UTC (Tue) by jlokier (guest, #52227)

The *precise* meaning of fsync has some wiggle-room, as you noticed.

It's physically impossible to be absolutely sure of retrieving your data after any kind of crash. No storage device is immune to weird failures. So it would be pointless to define fsync to mean that - it could never be implemented.

fsync is just a way of asking "please do what you reasonably can - flush delayed writes, etc., so I can expect the file to be in its current state after a crash following the fsync provided no really weird mega-corruption failure happened".

It's obviously very useful.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds