Wishful thinking
Posted Mar 16, 2009 7:04 UTC (Mon) by njs (subscriber, #40338)
In reply to: Wishful thinking by alonz
Parent article: Garrett: ext4, application expectations and power management
There are a lot of misunderstandings about how filesystems actually work in these threads... hopefully I won't add to them :-)
But I think it's more like: POSIX doesn't actually define any ordering relation between operations, whether on file contents or on file metadata or both. Filesystem developers tend to create a linear ordering on file metadata changes because that makes it easier to implement filesystems that can survive a crash without destroying your whole partition, but they prefer not to impose any other ordering guarantees, because when they do, the users whine about how unbearably slow the filesystem is. (Also, they've never made those guarantees before, and somehow computers have worked.)
In particular, note that when it comes to crash recovery, unless you use data=journal, there is no "transaction space" for data writes at all. You may find that any arbitrary subset of your writes has completed, and some may have completed only partially (say, only the middle of your write buffer has made it to disk), and so on. That's just how it works.
What we're seeing here is some very limited ordering guarantees being added in for particular heuristically defined sequences of operations, where it turns out they don't hurt performance much. But apps that rely on those guarantees will still be broken when running on any other filesystem. And that's going to bite the folks who develop the next round of filesystems, because they won't know what random non-standard guarantees apps will expect them to provide.
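For what it's worth, an app that doesn't want to depend on those heuristics has to ask for the ordering explicitly. Here's a rough sketch of the usual write-to-temp, fsync, rename pattern, assuming a Linux/POSIX system where you can open and fsync a directory. The name `atomic_replace` is just something I made up for illustration, and this is not a claim about what any particular app or filesystem does.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the file at `path` with `data` (bytes).

    Hypothetical helper: it forces the data to disk before the rename
    instead of hoping the filesystem orders the two for us.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename
    # stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to push it to disk
        os.replace(tmp, path)      # rename over the old file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Assumption: fsync on a directory fd persists the rename itself.
    # This works on Linux; other platforms may differ.
    dirfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

The cost is exactly the one the filesystem developers get complaints about: every call waits for the disk. That's the trade-off, and it's why people are tempted to skip the fsync and lean on whatever the filesystem happens to do.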
