Where the the correctness go?

Posted Mar 14, 2009 21:18 UTC (Sat) by bojan (subscriber, #14302)
In reply to: Where the the correctness go? by endecotp
Parent article: Ts'o: Delayed allocation and the zero-length file problem

> No; POSIX requires that the effects of one process' actions, as observed by another process, occur in order. So if you do the write() before the rename(), it is guaranteed that the file will be there with the expected data in it.

We are talking about data _on_ _disk_ here, not what the process may see (which may be buffers just written, as presented by the kernel). What is on disk is _durable_, which is what we are discussing here. For durable, you need fsync.

So rename() says nothing about which data will be on disk, or when.


Where the the correctness go?

Posted Mar 15, 2009 12:44 UTC (Sun) by endecotp (guest, #36428) [Link]

But __NOTHING__ specifies what data you'll find left on the disk after a crash (and after a crash is the only time when the difference between "on disk" and "in memory buffers" makes any difference). fsync() does NOT guarantee durability - it can be a no-op.

So what this all boils down to is how close each filesystem implementation comes to "non-crash" behaviour after a crash, which is a quality-of-implementation choice for the filesystems.

As far as I can see, for portable code the best bet is to stick with the write-close-rename pattern. This is sufficient for atomic changes in the non-crash case. Adding fsync in there makes it safe in the crash case for some filesystems, but not all, and there are others where it was safe without it, and others where it has a performance penalty: it's far from a clear winner at the moment.
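A minimal sketch of the write-close-rename pattern described above, assuming a temporary file on the same filesystem as the target. The function name replace_file and the tmp_path parameter are hypothetical, chosen only for illustration. The fsync() step is left as a comment because, as discussed, whether it helps depends on the filesystem.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Replace `path` with `len` bytes from `data` using write-close-rename.
 * `tmp_path` is a scratch name that must live on the same filesystem as
 * `path`, since rename() cannot cross filesystems.
 * Returns 0 on success, -1 on error. */
int replace_file(const char *path, const char *tmp_path,
                 const void *data, size_t len)
{
    assert(path != NULL && tmp_path != NULL && data != NULL);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may write fewer bytes than requested, so loop. */
    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Optional, filesystem-dependent: call fsync(fd) here and check its
     * return value if the new contents should reach storage before the
     * rename becomes visible after a crash. */

    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Other processes see either the old file or the complete new one. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A caller would use it as replace_file("config", "config.tmp", buf, len); in the non-crash case no reader ever sees a partially written file.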

Where the the correctness go?

Posted Mar 15, 2009 21:24 UTC (Sun) by bojan (subscriber, #14302) [Link]

> fsync() does NOT guarantee durability - it can be a no-op.

Hence, you need various #ifs and if()s to figure out what works on your platform. See Mac OS X. fsync is just an example here. The point is that you must use _something_ to commit. Without that, POSIX does not guarantee anything beyond currently running processes seeing the same picture.
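As a rough illustration of the kind of #if this implies, here is a sketch of a commit helper. The name commit_fd is hypothetical. It assumes that on Mac OS X the F_FULLFSYNC fcntl() command is the stronger flush (it asks the drive to empty its own cache as well), and that plain fsync() is the best portable fallback everywhere else.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Ask the platform for the strongest flush of `fd` that we know about.
 * Returns 0 on success, -1 on error. */
int commit_fd(int fd)
{
    assert(fd >= 0);

#if defined(F_FULLFSYNC)
    /* Mac OS X: request a flush of the drive's cache as well. Not every
     * filesystem supports this, so fall through to fsync() on failure. */
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
#endif

    /* Portable fallback; how much this guarantees is up to the platform. */
    return fsync(fd);
}
```

The return value should always be checked: a failed commit means the data may not have been handed to storage at all.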

Where the the correctness go?

Posted Mar 16, 2009 4:49 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

Even doing an fsync doesn't mean that you won't have this corruption. The two writes could go to the disk drive's buffer, and the drive could write the metadata out before it writes the data blocks. If it loses power between these two steps, you have the same problem.

Where the the correctness go?

Posted Mar 16, 2009 13:28 UTC (Mon) by jamesh (guest, #1159) [Link]

Of course, if the drive supports barriers in its command queueing implementation it should be possible to prevent it reordering those writes.

That is likely to also restrict reorderings that wouldn't break any correctness guarantees, though.

Where the the correctness go?

Posted Mar 16, 2009 3:19 UTC (Mon) by k8to (subscriber, #15413) [Link]

A no-op fsync is not compliant. You've taken it quite a bit too far.

fsync explicitly says that when it returns success, the data has been handed to the storage system successfully.

It doesn't guarantee that that storage system has committed it in a durable way for all scenarios. That's another issue.

fsync does guarantee that the data has been handed to the storage medium, but makes no guarantees about the implementation of that storage medium.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds