btrfs fscked up, too?
Posted Mar 16, 2009 17:10 UTC (Mon) by forthy (guest, #1525)
In reply to: btrfs fscked up, too? by masoncl
Parent article: Garrett: ext4, application expectations and power management
do you need an fsync on the directory?
If you have synchronous metadata updates, you don't "need" to fsync the directory: it is updated synchronously anyway (same with sync-mounted devices: no fsync needed there, either ;-). I can understand the "linux is evil" #ifdef when you consider how ext2 works: it gathers all data and metadata updates for some seconds, and then flushes them out in random order. If you don't sync anything, you have a good chance that atomicity is maintained (unless, of course, the crash happens during the short write period). If you sync data and directory, you have a very good chance that durability is maintained (unless the whole ext2 file system exploded, and now half of the files are in lost+found, and the others are completely missing).
BTW, mail server: a mail server needs to fsync, because durability is required. If you receive a mail, you write it to the inbox (or indir in case of an mdir storage system), fsync, and then reply to the SMTP client that the message has been accepted. The SMTP client now considers the message as passed on, and can remove it from its spool; if it doesn't get an ok, it has to retry later.
The question now is: should you sync the directory? In the mdir case (that's where the directory matters; mboxes keep their name), you create a new entry in inbox/new. fsync only writes out the data, thus it reorders create-write-close on disk into write-close-...-create. Only the inode-related metadata is flushed with fsync. From the man page it looks like POSIX cares only about durability in fsync, not about atomicity. Therefore, fsync is allowed to reorder operations (fsynced files end up on the disk earlier). To maintain atomicity in the operations on the mailbox, first fsync the directory, then the file. If you are a mail reader and, e.g., take files out of new/ and move them to cur/, first fsync cur, then new. Otherwise, your mails may be orphaned (duplicates may be annoying, but they are harmless).
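To make the ordering concrete, here is a minimal Python sketch of the two cases described above. The function names (fsync_dir, deliver, mark_seen) and the new/ and cur/ layout are assumptions made for illustration; this is not code from any real mail server. It assumes a POSIX system where a directory can be opened read-only and passed to fsync, and whether these calls actually give you the ordering on disk still depends on the filesystem.

```python
import os

def fsync_dir(path):
    # Flush the directory's own entries; assumes a POSIX system where a
    # directory can be opened read-only and handed to fsync().
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def deliver(maildir, msgid, data):
    # Hypothetical delivery step: create new/<msgid>, write the message,
    # then sync the directory first and the file second, as suggested above.
    new_dir = os.path.join(maildir, "new")
    path = os.path.join(new_dir, msgid)
    with open(path, "xb") as f:      # "x": fail if the entry already exists
        f.write(data)
        f.flush()                    # push Python's buffer down to the kernel
        fsync_dir(new_dir)           # directory entry first ...
        os.fsync(f.fileno())         # ... then file data and inode
    return path                      # only now tell the SMTP client "ok"

def mark_seen(maildir, msgid):
    # Hypothetical reader step: move new/<msgid> to cur/<msgid>, then sync
    # cur before new, so a crash leaves a duplicate rather than an orphan.
    src = os.path.join(maildir, "new", msgid)
    dst = os.path.join(maildir, "cur", msgid)
    os.rename(src, dst)
    fsync_dir(os.path.join(maildir, "cur"))
    fsync_dir(os.path.join(maildir, "new"))
    return dst
```

The server would call deliver() before answering the SMTP client, and a mail reader would call mark_seen() once the message has been looked at.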
It would all be a lot easier if the filesystem had a transaction monitor behind it. You would say "new transaction, create new/msgid, write data, close, make durable, close transaction" for the delivery and "new transaction, rename new/msgid -> cur/msgid, add line to index file, make durable, close transaction" for the IMAP client. If the transaction succeeds, you tell the client ok; if it fails or is incomplete after a crash, it will be rolled back. Note that a transaction monitor only needs to maintain ordering within a transaction, and can reorder transactions as it sees fit (and it can even abort transactions).
