Guarantees and the belt-and-braces of journaling
Posted Mar 18, 2009 0:16 UTC (Wed) by xoddam
In reply to: ordered(tm) brand
Parent article: Garrett: ext4, application expectations and power management
The use-case of writing a new file and renaming it to replace an existing one is documented and recommended by expert practitioners as the best, nay the only, way to achieve atomic operations on a POSIX system.
Specifically, whenever a rename replaces an old file with a recently-written one, the application developer's intention is to achieve atomic replacement of the file's contents. Invariably. No exceptions. Even if you "kill -9 1", this will not cause corruption truncation of the target file.
POSIX *does* guarantee this atomicity, with precisely *one* exception -- if the system crashes, behaviour is undefined.
A journaling filesystem exists for one reason only to provide reasonable behaviour in the event of a system crash, ie. to extend the guarantees POSIX provides and reduce the need to recover data.
In a couple of instances, users have observed that particular up-and-coming journaling filesystems make it more (not less!) likely for them to need to recover files than the status quo. It is only sane that they should report this as a bug. It has *nothing* to do with application developers, who are using the recommended pattern and generally don't have much influence over what happens when their users' computers crash.
It is wonderful news then, that the developers of both filesystems (to my knowledge) that did exhibit such behaviour have listened to the requests of their users and let the journal extend the POSIX guarantee of atomic replacement on rename across system failures.
Hurrah and thankyou.
Discussion of fsync is a complete red herring. On older POSIX-conforming filesystems there is NO GUARANTEE AT ALL that the filesystem will be accessible after a system crash, fsync or no. On some implementations, fsync can indeed make this particular kind of data loss less likely (and application developers in-the-know have used it for this purpose). There is still no POSIXLY_CORRECT guarantee that data will not be lost, so for a filesystem developer to say that his users don't really deserve to benefit from the safety that journaling can afford until application developers have jumped through an extra latency-imposing hoop is a bit rude. Not to say, putting the cart before the horse.
By the way, Ted T'so is 90% correct to say application developers shouldn't fear fsync, and 100% wrong to say that fsync is the correct way to achieve atomic replacement with rename. Rename alone is supposed to achieve this; if a filesystem is technically capable of preserving this guarantee even across system failures then it should do so.
to post comments)