
btrfs fscked up, too?

Posted Mar 16, 2009 17:10 UTC (Mon) by forthy (guest, #1525)
In reply to: btrfs fscked up, too? by masoncl
Parent article: Garrett: ext4, application expectations and power management

do you need an fsync on the directory?

If you have synchronous metadata updates, you don't "need" to fsync the directory: it is updated synchronously anyway (the same goes for devices mounted with the sync option: no fsync needed there either ;-). I can understand the "linux is evil" #ifdef when you consider how ext2 works: it gathers all data and metadata updates for a few seconds and then flushes them out in random order. If you don't sync anything, you have a good chance that atomicity is preserved (unless, of course, the crash happens during the short write period). If you sync both the data and the directory, you have a very good chance that durability is preserved (unless the whole ext2 file system exploded, and now half of the files are in lost+found and the others are completely missing).

BTW, mail servers: a mail server needs to fsync, because durability is required. When you receive a mail, you write it to the inbox (or to the inbox directory, in the case of a maildir storage system), fsync, and only then reply to the SMTP client that the message has been accepted. The SMTP client now considers the message delivered and can remove it from its spool; if it doesn't get an OK, it has to retry later.
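A minimal sketch of that delivery step in C, assuming a hypothetical `deliver_message()` helper (not taken from any real mail server) whose caller sends the SMTP reply only after it returns 0. It also shows a common pitfall with stdio: `fflush()` only hands the buffered data to the kernel, so `fsync()` on the underlying descriptor is still needed before acknowledging. Whether the new directory entry is durable as well depends on the file system, which is what the rest of this thread is about.

```c
#define _XOPEN_SOURCE 700   /* expose fsync() and fileno() */
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical delivery step: store one message in `path` and return 0
 * only once fsync() has succeeded. The caller should send the SMTP
 * "accepted" reply only after this returns 0; on -1 it should answer
 * with a temporary failure so the client keeps the message and retries.
 */
int deliver_message(const char *path, const char *data, size_t len)
{
    /* "wx" (C11): fail instead of overwriting an existing message. */
    FILE *f = fopen(path, "wx");
    if (f == NULL)
        return -1;

    int ok = fwrite(data, 1, len, f) == len   /* copy into the stdio buffer */
          && fflush(f) == 0                   /* hand it to the kernel */
          && fsync(fileno(f)) == 0;           /* ask for stable storage */

    if (fclose(f) != 0)
        ok = 0;
    if (!ok)
        remove(path);   /* don't leave a partial message behind */
    return ok ? 0 : -1;
}
```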

The question now is: should you sync the directory? In the maildir case (that's where the directory matters; mboxes keep their name), you create a new entry in inbox/new. fsync only writes out the data, so on disk it reorders create-write-close into write-close-...-create: only the inode-related metadata is flushed by fsync. From the man page it looks like POSIX cares only about durability in fsync, not about atomicity. Therefore, fsync is allowed to reorder operations (fsynced files end up on disk earlier). To maintain atomicity of the operations on the mailbox, first fsync the directory, then the file. If you are a mail reader and, e.g., take files out of new/ and move them to cur/, first fsync cur, then new. Otherwise, your mails may be orphaned (duplicates may be annoying, but are harmless).
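A sketch of the mail-reader case, assuming POSIX `openat()`/`renameat()` and two hypothetical helpers, `maildir_create()` and `maildir_mark_seen()`. It moves a message from new/ to cur/ and then fsyncs cur/ before new/, following the ordering argued above. Two simplifications: real Maildir readers usually append an info suffix such as `:2,S` to the name when moving it into cur/, and the kernel may still write either directory back on its own before the `fsync()` calls, so the ordering narrows the crash window rather than guaranteeing it.

```c
#define _XOPEN_SOURCE 700   /* expose openat(), renameat(), fsync() */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create <root>, <root>/tmp, <root>/new and <root>/cur if missing. */
int maildir_create(const char *root)
{
    static const char *const sub[] = { "", "/tmp", "/new", "/cur" };
    char path[4096];
    for (size_t i = 0; i < sizeof sub / sizeof sub[0]; i++) {
        int n = snprintf(path, sizeof path, "%s%s", root, sub[i]);
        if (n < 0 || (size_t)n >= sizeof path)
            return -1;
        if (mkdir(path, 0700) < 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}

/*
 * Move <root>/new/<name> to <root>/cur/<name>, then fsync cur/ before new/:
 * if the crash lands between the two fsyncs, the intent is that the worst
 * case is a duplicate entry rather than an orphaned message.
 */
int maildir_mark_seen(const char *root, const char *name)
{
    int rootfd = open(root, O_RDONLY | O_DIRECTORY);
    if (rootfd < 0)
        return -1;
    int newfd = openat(rootfd, "new", O_RDONLY | O_DIRECTORY);
    int curfd = openat(rootfd, "cur", O_RDONLY | O_DIRECTORY);
    close(rootfd);

    int rc = -1;
    if (newfd >= 0 && curfd >= 0
        && renameat(newfd, name, curfd, name) == 0
        && fsync(curfd) == 0     /* first: the new entry in cur/ */
        && fsync(newfd) == 0)    /* then: the removal from new/ */
        rc = 0;

    if (newfd >= 0)
        close(newfd);
    if (curfd >= 0)
        close(curfd);
    return rc;
}
```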

It would all be a lot easier if the file system had a transaction monitor behind it. You would say "new transaction, create new/msgid, write data, close, make durable, close transaction" for the delivery, and "new transaction, rename new/msgid -> cur/msgid, add line to index file, make durable, close transaction" for the IMAP client. If the transaction succeeds, tell the client OK; if it fails, or is incomplete after a crash, it is rolled back. Note that a transaction monitor only needs to maintain ordering within a transaction; it can reorder transactions as it sees fit (and can even abort them).

btrfs fscked up, too?

Posted Mar 16, 2009 18:07 UTC (Mon) by masoncl (subscriber, #47138)

We're mixing up a bunch of concepts here, but for a mail server workload on ext2, you have to fsync both the directory and the file to make sure a newly created file is on disk.
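For the ext2 case, a minimal sketch of a durable create in C: write and fsync the file, then fsync the directory that holds the new entry. `create_durable()` is a hypothetical name; the directory is opened with `O_DIRECTORY` so the same descriptor serves both `openat()` and the directory `fsync()`. On the journaling file systems mentioned in the next paragraph the second `fsync()` should be redundant, but it is harmless and keeps the code correct on ext2.

```c
#define _XOPEN_SOURCE 700   /* expose openat(), fsync() */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: create <dir>/<name> with the given contents so that
 * both the data and the directory entry are flushed, even on ext2.
 * Returns 0 on success, -1 on any failure.
 */
int create_durable(const char *dir, const char *name,
                   const char *data, size_t len)
{
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0) {
        close(dfd);
        return -1;
    }

    int ok = write_all(fd, data, len) == 0
          && fsync(fd) == 0;      /* 1. file data and inode */
    if (close(fd) < 0)
        ok = 0;
    if (ok && fsync(dfd) < 0)     /* 2. the directory entry (needed on ext2) */
        ok = 0;

    close(dfd);
    return ok ? 0 : -1;
}
```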

In ext3, ext4, reiserfs, xfs, and btrfs (and probably jfs), you only need to fsync the file. The journals include the directory data because the directory modifications happened along with the file creation, and it actually isn't possible to get one without the other.

The btrfs log is a little different, but it explicitly goes out and finds the directory changes to make sure they are logged with the file during the fsync.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds