btrfs fscked up, too?
Posted Mar 16, 2009 17:10 UTC (Mon) by forthy
In reply to: btrfs fscked up, too?
Parent article: Garrett: ext4, application expectations and power management
do you need an fsync on the directory?
If you have synchronous metadata updates, you don't "need" to fsync
the directory: it is updated synchronously anyway (same with sync-mounted
devices: no fsync needed there, either ;-). I can understand the "Linux
is evil" #ifdef when you consider how ext2 works: it gathers all data and
metadata updates for some seconds, and then flushes them out in random
order. If you don't sync anything, you have a good chance that
atomicity is maintained (unless, of course, the crash happens during the
short write period). If you sync data and directory, you have a very good
chance that durability is maintained (unless the whole ext2 file system
exploded, and now half of the files are in lost+found, and the others are
BTW, mail server: a mail server needs to fsync, because durability is
required. If you receive a mail, you write it to the inbox (or indir in
case of an mdir storage system), fsync, and then reply to the SMTP client
that the message has been accepted. The SMTP client now considers the
message as passed, and can remove it from its spool; if it doesn't get
an OK, it has to retry later.
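The write-fsync-then-acknowledge sequence can be sketched roughly as follows. This is only an illustration in Python, assuming a single inbox file; the function name accept_message and the exact reply strings are my own choices, not taken from any particular mail server.

```python
import os

def accept_message(inbox_path, message):
    """Append message (bytes) to the inbox and return an SMTP-style reply.

    Hypothetical helper: the "250" reply is only produced after fsync
    has returned, so the client never drops a message we could still lose.
    """
    try:
        fd = os.open(inbox_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            view = memoryview(message)
            while len(view) > 0:          # os.write may write only part of the data
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)                  # make the data durable before answering
        finally:
            os.close(fd)
    except OSError:
        # Temporary failure: the client keeps the message spooled and retries.
        return "451 Requested action aborted: local error in processing"
    return "250 OK"
```

The point of the design is just the ordering: anything that fails before fsync returns turns into a temporary error, so the sender's spool stays the authoritative copy.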
The question now is: should you sync the directory? In the mdir case
(that's where the directory matters; mboxes keep their name), you create
a new entry in inbox/new. fsync only writes out the data, thus it
reorders create-write-close on disk into write-close-...-create. Only the
inode-related metadata is flushed with fsync. From the man page it looks
like POSIX cares only about durability in fsync, not about atomicity.
Therefore, fsync is allowed to reorder operations (fsynced files end up
earlier on the disk). To maintain atomicity in the operations on the mbox,
first fsync the directory, then the file. If you are a mail reader, and
e.g. take files out of new/ and move them to cur/, first fsync cur, then
new. Otherwise, your mails may be orphaned (duplicates may be annoying,
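The two orderings described in this paragraph (directory before file on delivery, cur before new when moving) might look like this in Python on a POSIX system. It is a sketch under assumptions: the helper names fsync_dir, deliver and move_to_cur are made up for illustration, the layout is a mail directory with new/ and cur/ subdirectories, and whether this exact ordering is needed depends on the filesystem.

```python
import os

def fsync_dir(path):
    """fsync a directory by opening it read-only (works on POSIX systems)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def deliver(maildir, msgid, data):
    """Create new/<msgid>, then sync the directory entry before the file."""
    new_dir = os.path.join(maildir, "new")
    path = os.path.join(new_dir, msgid)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:              # handle partial writes
            written = os.write(fd, view)
            view = view[written:]
        fsync_dir(new_dir)                # first the directory ...
        os.fsync(fd)                      # ... then the file itself
    finally:
        os.close(fd)
    return path

def move_to_cur(maildir, msgid):
    """Move new/<msgid> to cur/<msgid>, syncing cur before new."""
    new_dir = os.path.join(maildir, "new")
    cur_dir = os.path.join(maildir, "cur")
    dst = os.path.join(cur_dir, msgid)
    os.rename(os.path.join(new_dir, msgid), dst)
    fsync_dir(cur_dir)                    # the new name becomes durable first
    fsync_dir(new_dir)                    # only then the removal of the old name
    return dst
```

Syncing cur before new means a crash in between can at worst leave the message visible in both places, never in neither.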
It would all be a lot easier if the filesystem had a transaction monitor
behind it. You would say "new transaction, create new/msgid, write data,
close, make durable, close transaction" for the delivery and "new
transaction, rename new/msgid -> cur/msgid, add line to index file, make
durable, close transaction" for the IMAP client. If the transaction
succeeds, tell the client OK; if it fails or is incomplete after a crash,
it will be rolled back. Note that a transaction monitor only needs to
maintain ordering within a transaction, and can reorder transactions as
it sees fit (and it can even abort transactions).
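To make the shape of such an interface concrete, here is a toy user-space sketch in Python. Everything in it (the Transaction class, its create and rename methods, the transaction() context manager) is hypothetical; no filesystem offers this API, and the sketch only rolls back on an exception inside the running process. It does not give the crash-time rollback a real transaction monitor would provide.

```python
import os
from contextlib import contextmanager

class Transaction:
    """Toy transaction: records an undo action for every step performed."""

    def __init__(self):
        self._undo = []

    def create(self, path, data):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        self._undo.append(lambda: os.unlink(path))
        try:
            view = memoryview(data)
            while len(view) > 0:          # handle partial writes
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)                  # "make durable"
        finally:
            os.close(fd)

    def rename(self, src, dst):
        os.rename(src, dst)
        self._undo.append(lambda: os.rename(dst, src))

    def rollback(self):
        # Undo the recorded steps in reverse order.
        while self._undo:
            self._undo.pop()()

@contextmanager
def transaction():
    tx = Transaction()
    try:
        yield tx
    except BaseException:
        tx.rollback()                     # failed transaction: undo everything
        raise
```

Delivery would then read `with transaction() as tx: tx.create("new/msgid", data)`, and the IMAP side `with transaction() as tx: tx.rename("new/msgid", "cur/msgid")`, with the OK sent to the client only after the with block exits normally.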