btrfs fscked up, too?
Posted Mar 16, 2009 13:19 UTC (Mon) by masoncl (subscriber, #47138) In reply to: btrfs fscked up, too? by forthy
Parent article: Garrett: ext4, application expectations and power management
Filesystems tend to break operations up into relatively large transactions. These include all the metadata changes to the FS over a 5 or 30 second interval. A big part of controlling the latencies of FS operations is controlling the latencies of transaction commits. Regardless of how much of the commit you try to do in the background, there are always corner cases that break down to: wait for commit X to finish. Every FS has these, including ext3/4 (such as when the ext log wraps).
In the ext3 data=ordered model, the commit waits for all of the data writes during that transaction. If we assume the worst case of applications doing random data writes on slow spinning media, writing out all the data can take a very long time. This is what people noticed in the now famous firefox-fsync bug.
To limit transaction latencies, btrfs only updates file metadata after the file data IO is complete. This allows us to make atomic extent replacements in the file without having to flush all the data writes for a transaction before the commit can complete. xfs does something similar, but it only needs to make sure i_size updates are done after the extent is on disk.
This works well for single file overwrites. The rename case is different because the operations are between two different files.
I agree with Ted that fsync is the right answer, not just because it is what the standard says to do, but because skipping the fsync is explicitly what the standard says won't work. Adding these kinds of undocumented tricks to the filesystems today is sure to cause many problems for the next set of filesystem developers, who probably won't remember the famous firefox fsync bug or its evil twin the ubuntu gamer data loss on crash.
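For anyone wondering what "use fsync" looks like on the application side, here is a minimal sketch of the write-temp-file, fsync, rename pattern in Python. The function name atomic_replace and the ".tmp-" prefix are made up for illustration, and the final directory fsync is an extra step I am assuming you want for durability of the rename itself; it is not something the discussion above requires.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of `path` with `data` (bytes).

    Hypothetical helper: writes a temp file in the same directory,
    fsyncs it, then renames it over the target.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push the data to disk
        os.replace(tmp, path)     # rename(2): atomically swap in the new file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Assumption: also fsync the directory so the rename itself is durable.
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Calling atomic_replace("settings.conf", b"key=value\n") leaves either the old file or the complete new one under that name after a crash. Skipping the os.fsync call before the rename is exactly the shortcut being discussed here.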
With all of that said, btrfs can give the ext3 behavior with little practical performance impact. fsyncs in btrfs almost always use a dedicated logging mechanism and don't have to wait for the full transaction commit.
So, I'll have patches in 2.6.30 that fix things in btrfs. This way we as a Linux community can either document the new rename requirements or change the applications, and btrfs can move on to other problems ;)
