ordered(tm) brand
Posted Mar 16, 2009 18:06 UTC (Mon) by nybble41 (subscriber, #55106)
In reply to: ordered(tm) brand by nye
Parent article: Garrett: ext4, application expectations and power management
The thing is, this *isn't* known at the filesystem level. The application knows that the rename() is useless until the write() has been committed, but there is no API to communicate this information to the filesystem. Perhaps there should be, but the lack of appropriately fine-grained userspace APIs is not the fault of the filesystem authors. All existing filesystems, ext3 included, assume that the rename() and write() operations are independent; the cases where the ordering happens to be correct from the application's point of view are purely accidental.
The more general issue is that application writers are depending on filesystems to provide full data journaling, which is a major performance killer and was never actually guaranteed. Metadata journaling, as used by ext3 and ext4 by default, is only a replacement for the fsck process; as with all asynchronous, non-journaled filesystems, the state of the recovered filesystem after fsck or journal playback will be internally consistent, but may not match any state which actually existed in RAM before the crash.
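To make that concrete, here is a rough sketch of what an application can do today, given that the filesystem cannot infer the dependency: force the data to disk with fsync() before the rename(). The helper name atomic_replace and the temp-file naming are my own inventions for illustration, not anything from the article, and the directory fsync at the end assumes a POSIX system where a directory can be opened read-only and synced.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes), syncing the data before the rename.

    Hypothetical helper for illustration only; assumes a POSIX system.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's userspace buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to commit the data to disk
        # Only now is the rename meaningful: the new contents have been
        # synced before the name is switched over.
        os.replace(tmp_path, path)  # rename(2) on POSIX
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the containing directory so the rename itself is persisted.
    dir_fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling atomic_replace("settings.conf", b"key=value\n") should leave either the old file or the complete new one after a crash, assuming the storage stack honors fsync(). The cost is exactly the synchronous flush that applications relying on the accidental ordering were avoiding.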
