
Ext4 filesystem hits Android, no need to fear data loss (ars technica)


Posted Dec 29, 2010 8:59 UTC (Wed) by neilbrown (subscriber, #359)
In reply to: Ext4 filesystem hits Android, no need to fear data loss (ars technica) by iabervon
Parent article: Ext4 filesystem hits Android, no need to fear data loss (ars technica)

fsync is only needed if your computer crashes. Not using it works perfectly when your computer doesn't crash.

By your argument, fsync will not be used consistently, and that appears to be true to some extent.

The implied conclusion seems to be that all filesystems should be mounted "-o sync" so that fsync is not needed. Strangely, I have not heard that conclusion being proposed explicitly....

I certainly agree that interfaces should be hard to misuse (the Rusty principle), but that must be balanced against the dictum against making things foolproof, as then only a fool would use them. In this case, while it is quite possible to misuse fsync (e.g. by not using it), it really is an appropriate interface.

And while regression test suites are incredibly valuable and there should be more of them etc etc, one hopes that developers don't depend on them as the sole means of ensuring correctness, but also read documentation, try to understand the systems they work with, and write code accordingly. (One also hopes for a pony.... but no, only a piece of coal in my stocking again)



Syncing has significant performance penalties.

Posted Dec 29, 2010 14:26 UTC (Wed) by gmatht (guest, #58961)

The applications that really need durability of data written in the last few seconds don't seem to be that common... one might hope that the people who wrote the database your bank uses would know about fsync. Even on ext4 a trivial fsync takes 50ms[1], which is forever in computer time, far more than the ~0.1ms needed to create a new file, but losing everything because the filesystem decided to write out the metadata before the data (violating atomicity) isn't that great either.

Apparently xsyncfs provides a system that is synchronous from the user's point of view with an overhead of only 3%-7%. That might be an acceptable default. Xsyncfs doesn't seem very well documented, but it is just one avenue for providing good reliability properties without the massive performance penalties of fsync/-o sync.

[1] www.ucc.asn.au/~mccabedj/fsync_benchmark.c
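
For anyone who wants to check the order of magnitude on their own machine, a rough measurement can be sketched as follows. This is not the benchmark referenced in [1]; the function name time_fsync and the rounds parameter are made up for illustration, and the result depends heavily on the filesystem and device holding the temporary file (on tmpfs, for instance, fsync costs almost nothing).

```python
import os
import tempfile
import time


def time_fsync(rounds=5):
    """Return the average seconds taken by one small write followed by fsync.

    Hypothetical helper for illustration only; the temporary file lives
    wherever tempfile puts it, which may not be the filesystem you care about.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            os.write(fd, b"x")   # dirty one small piece of data
            os.fsync(fd)         # block until the kernel reports it flushed
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / rounds


if __name__ == "__main__":
    print("average fsync time: %.3f ms" % (time_fsync() * 1000.0))
```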

Ext4 filesystem hits Android, no need to fear data loss (ars technica)

Posted Dec 29, 2010 17:42 UTC (Wed) by iabervon (subscriber, #722)

Using fsync only really makes sense if you're trying to get stuff written to disk before sending a message out of the system; otherwise, it won't be possible to tell whether the fsync didn't actually do anything or the system crashed before it returned. So you need to use fsync after writing a received email message to disk and before telling the remote server that you've got it.

I believe that the model that most people have of filesystems is that what's recovered after a system crash is like a snapshot of the filesystem that you would see in a running system if you were taking the snapshot with ordinary system calls and could therefore see all the race conditions you can see between programs; however, there is arbitrary random damage because the system crashed, and the latest snapshot may not be particularly recent.

With this model, fsync is easy to (know to) use in cases where you want to make sure that the snapshot is sufficiently recent, but not for cases where it is necessary to avoid the recovered state being something that couldn't have been a snapshot.

Ext4 filesystem hits Android, no need to fear data loss (ars technica)

Posted Dec 30, 2010 0:14 UTC (Thu) by neilbrown (subscriber, #359)

> I believe that the model that most people have of filesystems is that ...

That is an overly naive model of a filesystem. It assumes almost complete linearisation of operations on their way to storage. Any re-ordering in the page cache before writeback, or in the device queue via an IO scheduler, will invalidate that model, and as you can imagine such re-ordering happens a lot.

The correct model is "nothing is safe until you call sync or fsync or some other variant", with the understanding that 'sync' is effectively called every 30 seconds or so.

I'm glad it is obvious that you need to call fsync (on both the file and the directory you created the file in) before acknowledging the receipt of a file (e.g. an email) over a network connection.
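
A minimal sketch of that sequence, assuming a POSIX system where a directory can be opened read-only and passed to fsync (as on Linux). The name durable_write is hypothetical, and the acknowledgement itself is left to the caller:

```python
import os


def durable_write(path, data):
    """Write data to path, then fsync the file and its parent directory.

    Hypothetical helper: only after this returns would the caller
    acknowledge receipt to the remote side.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                 # push Python's buffer to the kernel
        os.fsync(f.fileno())      # ask the kernel to flush the file itself

    # The new directory entry is separate from the file contents,
    # so the containing directory is flushed as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A mail receiver would call durable_write(spool_path, message) and only then send its "received" reply, where spool_path and message stand for whatever the receiver uses.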

However, exactly the same is true when moving a file by copying it. If you copy a file (possibly transforming it on the way) and then remove the original, you really must fsync the new copy before unlinking the old. You should also fsync the directory, though if you rename the new (after fsyncing it) to replace the old, then the fsync of the directory is not required.
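
The ordering for the copy-then-unlink case can be sketched like this, again assuming a POSIX system where fsync on a directory descriptor works. The name move_by_copy is invented for illustration and is not something "mv" provides:

```python
import os
import shutil


def move_by_copy(src, dst):
    """Copy src to dst, flush the copy and its directory, then unlink src.

    Hypothetical helper showing the ordering only; it does not preserve
    permissions, ownership, or timestamps.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())   # new copy's data is flushed first

    # Flush the directory that now names the new copy.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

    os.unlink(src)                # only now is the original removed
```

If the unlink came before the two fsync calls, a crash in between could leave neither copy on disk, which is the failure described for a cross-filesystem move.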

Note that "mv" doesn't do the fsync when moving a file between filesystems (which requires a copy/unlink). So if you use mv and then crash you could quite possibly lose both copies. And mv doesn't even have an option to request the fsync.

Now you might suggest that this should "just work" without mv needing to call fsync. But I think you would find it quite difficult to design the filesystem semantics that would allow this to always be safe, especially as you need interaction between two separate filesystems (unlink in one must not commit until writes in the other have committed). ... other than mounting everything with '-o sync' of course.

Ext4 filesystem hits Android, no need to fear data loss (ars technica)

Posted Dec 30, 2010 1:33 UTC (Thu) by iabervon (subscriber, #722)

My model can't really fail to be accurate, since it includes the possibility of arbitrary deviations from the predicted outcome. And, actually, nothing is safe at all; your storage medium might fail, your video driver might scribble over your disk or your dirty pages, your hard drive might read garbage out of memory that is losing power and write it with the power left in its capacitors. I actually suspect that, based on the model I stated, a more common and more extensive source of differences from some potential snapshot is things that syncing couldn't have helped with than things that syncing could have helped with (with the exception of ext4 having a particularly common and obvious divergence).

It also hasn't been that long in the UNIX tradition that you could be reasonably confident that a power failure shortly after you changed something in a directory wouldn't trash other things in the directory, which made it kind of irrelevant whether you'd called fsync on the directory to make sure that the disk was correct before it got corrupted.

In general, there's a tradeoff among filesystem complexity, slowness, and deviation from non-crash state. None of these go to zero without making the others terrible, even if you call sync all the time.

(In fact, my model does require an fsync when moving a file by copying it, at least across directories; the snapshotting process could read the destination directory before you write the file and the source directory after you unlink it.)

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds