You still don't get my point, though. I'd agree that when all the writes
going on are just the system chewing to itself, all you need is
consistency across crashes.
But when the system has just written out my magnum opus, by damn I want
that to hit persistent storage right now! The fsync() should bypass all
other disk I/O as much as possible and hit the disk as fast as it can.
Slowing to disk speeds is fine; we're talking human reaction time here,
which is much slower. I don't care if writing out my tax records takes
five seconds, 'cos I just spent three hours slaving over them: five
seconds is nothing. But waiting behind a vast number of unimportant
writes (which were all asynchronous until our fsync() forced them out,
thanks to filesystem infelicities) is not fine: if we have to wait
minutes for our stuff to get out, we may as well have done an
asynchronous write.
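
To be concrete about what I mean by "hit persistent storage": write()
alone only dirties the page cache; it's the fsync() that forces the data
out. A minimal sketch in C (the filename is invented for illustration,
and error handling is kept terse):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *text = "my magnum opus\n";
        int fd = open("opus.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* write() only hands the data to the kernel's page cache... */
        if (write(fd, text, strlen(text)) != (ssize_t)strlen(text)) {
            perror("write"); return 1;
        }

        /* ...fsync() is what actually forces it down to the disk, and
         * is where all the latency under discussion gets paid. */
        if (fsync(fd) < 0) { perror("fsync"); return 1; }

        close(fd);
        return 0;
    }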
With btrfs, this golden vision of fast fsync() even under heavy disk
write load is possible. With ext*, it mostly isn't (the filesystem has
to force earlier stuff to the disk even if I don't give a damn about it
and nobody ever fsync()ed it), and on ext3 without data=writeback,
fsync() is so slow when contending with write loads that app developers
were tempted to drop this whole requirement and leave my magnum opus
hanging about in transient storage for many seconds. With ext4, at
least fsync() doesn't stall my apps merely because bloody firefox
decided to drop another 500MB hairball.
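
You can see the contention for yourself with something as crude as the
sketch below: time one small fsync() on an idle disk, then again while a
background write load is running (say, dd spewing a couple of gigabytes:
dd if=/dev/zero of=bigfile bs=1M count=2000). The filename and the dd
invocation are just examples:

    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        struct timespec t0, t1;
        int fd = open("small.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, "x", 1) != 1) { perror("write"); return 1; }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        fsync(fd);                     /* this is the call that stalls */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("fsync took %.3f seconds\n",
               (t1.tv_sec - t0.tv_sec) +
               (t1.tv_nsec - t0.tv_nsec) / 1e9);
        close(fd);
        return 0;
    }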
Again: I'm not interested in fsync() to prevent filesystem corruption
(that mostly doesn't happen, thanks to the journal, even if the power
suddenly goes out). I'm interested in saving *the contents of particular
files* that I just saved. If you're writing a book and you save a
chapter, you care much more about preserving that chapter in case of a
power failure than you care about some random FS corruption making off
with /usr/bin; fixing the latter is one reinstall away, but there's
nothing you can reinstall to get your data back.
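
For what it's worth, the usual belt-and-braces way an editor saves "that
chapter" is the write-to-temp-then-rename dance, so a crash leaves you
with either the whole old file or the whole new one. A sketch, with
every name invented for illustration:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Crash-safe save: write the new contents to a temp file, fsync
     * it, then rename() it over the old copy.  rename() within one
     * filesystem is atomic, so after a crash you have either the whole
     * old chapter or the whole new one, never a torn half-written
     * file. */
    int save_chapter(const char *path, const char *text)
    {
        char tmp[4096];
        snprintf(tmp, sizeof(tmp), "%s.tmp", path);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;

        if (write(fd, text, strlen(text)) != (ssize_t)strlen(text)
            || fsync(fd) < 0) {      /* data must be durable first... */
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);

        if (rename(tmp, path) < 0) { /* ...then swap it into place */
            unlink(tmp);
            return -1;
        }
        return 0;
    }

For full paranoia you'd also fsync() the containing directory
afterwards, so the rename itself survives a power cut; that costs yet
another trip through the very fsync() path whose latency this whole
argument is about.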