Atomicity vs durability
Posted Mar 15, 2009 9:44 UTC (Sun) by alexl (guest, #19068)
In reply to: Atomicity vs durability by bojan
Parent article: Ts'o: Delayed allocation and the zero-length file problem
POSIX guarantees an atomic replacement of the file, and this means *both* the data and the filenames[1]. However, POSIX doesn't specify anything about system crashes. So, this guarantee is only valid for the non-crashing case.
For the crashing case POSIX doesn't guarantee anything. In fact, many POSIX filesystems such as ext2 can (correctly, by the spec) result in a total loss of all filesystem data in the case of a system crash. And in fact, this is allowed even if the application fsync()ed some data before the crash.
Now, to offer some way of getting better guarantees than this, POSIX also supplies fsync(), which guarantees that the file's data has been written to disk. However, nowhere in the spec for fsync() does it say that the written data will survive a system crash. Of course, if it *does* survive, the fsync() guarantee is nice to have, because it means that if the metadata change survived we're more likely to get the whole new file.
But your argument that the "atomic" part only refers to the filenames is bullshit. POSIX does give full guarantees for both filename and content in the case it specifies. Everything else is up to the implementation. This is why it's a good idea for a robust filesystem to give the write-data-before-metadata-on-rename guarantee, since it turns a non-crash POSIX guarantee into a post-crash guarantee. (Of course, this is by no means necessary; even ext2, with full data loss on crash, is POSIX compliant. It's just what a *good* implementation does.)
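For concreteness, the replace-via-rename pattern being argued about looks roughly like the sketch below. The helper name replace_file and its sync flag are made up for illustration. With sync=False it relies only on the rename() guarantee (always consistent for other processes while the system is up, post-crash result up to the filesystem); with sync=True it also calls fsync() before the rename, which is the extra step applications are being told to add.

```python
import os
import tempfile


def replace_file(path, data, sync=True):
    """Replace `path` with `data` by writing a temp file and rename()ing it.

    Hypothetical helper for illustration only. Other processes see either
    the old or the new file, never a mix. What is left after a crash
    depends on the filesystem, not on POSIX.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise rename() cannot be atomic (it fails with EXDEV).
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            if sync:
                f.flush()             # push Python's buffer to the kernel
                os.fsync(f.fileno())  # ask the kernel to write the data out
        os.rename(tmp, path)          # atomically swap the name over
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

Note that mkstemp() creates the file with mode 0600, so a real implementation would probably also want to copy the old file's permissions before the rename.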
[1] From the POSIX spec:
If the link named by the new argument exists, it shall be removed and old renamed to new. In this case, a link named new shall remain visible to other processes throughout the renaming operation and refer either to the file referred to by new or old before the operation began.
(Notice how this makes no distinction between "filenames" and "data".)
