
JLS2009: A Btrfs update

Posted Nov 8, 2009 21:53 UTC (Sun) by anton (subscriber, #25547)
In reply to: JLS2009: A Btrfs update by nix
Parent article: JLS2009: A Btrfs update

Sure, if the only thing you care about in a file system is that fsync()s complete quickly and still hit the disk, use a file system that gives you that.

OTOH, I care more about data consistency. If we want to combine these two concerns, we get to some interesting design choices:

Committing the fsync()ed file before earlier writes to other files would break the ordering guarantee that makes a file system good (of course, we would only see this in the case of a crash between the time of the fsync() and the next regular commit). If the file system wants to preserve the write order, then fsync() pretty much becomes sync(), i.e., the performance behaviour that you do not want.

One can argue that an application that uses fsync() knows what it is doing, so it will do the fsync()s in an order that guarantees data consistency for its data anyway.

Counterarguments: 1) The crash case probably has not been tested extensively for this application, so it may have gotten the order of fsync()s wrong and doing the fsync()s right away may compromise the data consistency after all. 2) This application may interact with others in a way that makes the ordering of its writes relative to the others important; committing these writes in a different order opens a data inconsistency window.
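To make "doing the fsync()s in the right order" concrete, here is a minimal sketch of the usual write-then-rename pattern, assuming a POSIX system. The function name durable_replace and the ".tmp" suffix are made up for illustration. Swapping steps 1 and 2 is exactly the kind of ordering mistake that only shows up after a crash.

```python
import os

def durable_replace(path, data):
    """Replace `path` with `data` (bytes) so that a crash leaves either the
    old or the new contents, never a truncated file. Illustrative sketch."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp = path + ".tmp"  # hypothetical temp-file naming, not a convention

    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write only part of the buffer
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # 1) new contents reach the disk first
    finally:
        os.close(fd)

    os.replace(tmp, path)                # 2) only then switch the name over

    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)                    # 3) make the rename itself durable
    finally:
        os.close(dfd)
```

Note that this only orders the writes of one application against each other; it says nothing about counterargument 2, the ordering relative to other applications' writes.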

Depending on the write volume of the applications on the machine, on the trust in the correctness of the fsync()s in all the applications, and on the way the applications interact with the users, the following are reasonable choices: 1) fsync() as sync (slowest); 2) fsync() as out-of-order commit; 3) fsync() as noop.
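The difference between the three choices can be shown with a toy model. This is a sketch under strong assumptions, not how any real file system works: ToyFS, its policy names, and crash_state are invented for illustration, and "on disk" is just a Python list of the writes that would survive a crash.

```python
from dataclasses import dataclass, field

@dataclass
class ToyFS:
    policy: str                                   # "sync", "out_of_order" or "noop"
    pending: list = field(default_factory=list)   # writes in issue order, not yet durable
    on_disk: list = field(default_factory=list)   # writes that would survive a crash

    def write(self, name, data):
        self.pending.append((name, data))

    def commit(self):
        """Regular commit: make all pending writes durable, in issue order."""
        self.on_disk.extend(self.pending)
        self.pending = []

    def fsync(self, name):
        if self.policy == "sync":
            self.commit()                         # fsync() behaves like sync()
        elif self.policy == "out_of_order":
            # Commit only this file's writes, jumping ahead of earlier ones.
            self.on_disk.extend(w for w in self.pending if w[0] == name)
            self.pending = [w for w in self.pending if w[0] != name]
        elif self.policy == "noop":
            pass                                  # wait for the next regular commit
        else:
            raise ValueError("unknown policy: " + self.policy)

def crash_state(policy):
    """Write a, then b, fsync only b, then crash before the next regular commit."""
    fs = ToyFS(policy)
    fs.write("a", "1")
    fs.write("b", "2")
    fs.fsync("b")
    return fs.on_disk
```

In this model, crash_state("sync") keeps both writes in order, crash_state("noop") keeps neither, and crash_state("out_of_order") keeps the later write to b without the earlier write to a, which is the inconsistency window described above.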

BTW, I still find your motivating example unconvincing: If you edit your magnum opus or your tax records, wouldn't you use an editor that autosaves regularly? OK, your editor does not fsync() the autosaves, so with a bad file system you will lose the work, but on a good file system you won't, so you will also use a good file system for that, won't you? So it does not really matter how long you slaved away on the file; a crash will only lose very little data. Or if you work in a way that can lose everything, why were the tax records after 2h59' not important enough to merit more precautions, but after 3h a fast fsync() is more important than anything else?

An example where a synchronous commit is really needed is a remote "cvs commit" (and maybe similar operations in other version control systems): Once a file is committed on the remote machine, the file's version number is updated on the local machine, so the remote commit had better stay committed, even if the remote machine crashes in the meantime. Of course, the problem here is that a cvs commit can easily commit hundreds of files; if it fsync()s every one of them separately, the cumulative waiting for the disk may be quite noticeable. Doing the equivalent for all the files at once could be faster, but we have no good way to tell that to the file system (AFAIK CVS works a file at a time, so it wouldn't matter for CVS, but there may be other applications where it does). Hmm, if there are few writes by other applications at the same time, and all the fsync()s were done at the end, then fsync()-as-sync could be faster than out-of-order fsync()s: The first fsync() would commit all the files, and the other fsync()s would just return immediately.
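The "all the fsync()s at the end" variant could look roughly like the following sketch. The function name write_then_fsync_all is hypothetical, and this is not what CVS does; whether it actually beats per-file fsync()s depends on the file system and on what else is writing at the time.

```python
import os

def write_then_fsync_all(files):
    """Write every file first, then fsync() them all at the end.

    `files` maps path -> bytes. Illustrative sketch only.
    """
    fds = []
    try:
        # Phase 1: issue all the writes without waiting for the disk.
        for path, data in files.items():
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            fds.append(fd)
            view = memoryview(data)
            while view:                   # os.write may write only part of the buffer
                view = view[os.write(fd, view):]
        # Phase 2: only now wait for durability, one file after the other.
        for fd in fds:
            os.fsync(fd)
    finally:
        for fd in fds:
            os.close(fd)
```

With fsync()-as-sync, the first os.fsync() in phase 2 would commit everything and the remaining calls should have nothing left to do. One caveat of this sketch: it keeps every file open until the end, so for hundreds of files it can run into the per-process descriptor limit.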


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds