Ted does claim in the comments that rename() without fsync() is unsafe, but that applications should be granted some sort of leeway in that case. I think he'll have to move away from this position if he wants to have his filesystem used in Linus's kernel, because Linus thinks that rename() without fsync() is safe and is the correct way to do things.
The case where you want fsync() is when you do something like: file A names a file, B, which exists and has valid data. You create file C, put valid data in it, and atomically change file A to name file C instead of B, and you want to be sure that file A always names a file which exists and has valid data. You can't be sure, without an fsync(), that the disk won't see the change to file A without seeing some operation on file C.
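The A/B/C scenario above can be sketched roughly as follows. This is a minimal illustration, not anything from the thread: the helper names `write_synced` and `repoint` and the `.tmp` suffix are hypothetical, and it assumes a POSIX system where rename() atomically replaces the destination.

```python
import os

def write_synced(path, data):
    """Write bytes to path and fsync() before closing (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # push Python's buffer to the kernel
        os.fsync(f.fileno())    # ask the kernel to push the data to disk

def repoint(pointer_path, new_target_path, new_data):
    """Make the pointer file (A) name a new, fully written file (C)."""
    # 1. File C must hold valid data on disk before anything names it.
    write_synced(new_target_path, new_data)
    # 2. Build the new contents of A under a temporary name (assumed suffix).
    tmp = pointer_path + ".tmp"
    write_synced(tmp, os.fsencode(new_target_path))
    # 3. Atomically switch A from naming B to naming C.
    os.rename(tmp, pointer_path)
```

The fsync() in step 1 is the one the comment argues for: without it, the disk could see the change to A before it sees the data in C.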
Ts'o: Delayed allocation and the zero-length file problem
Posted Mar 13, 2009 20:41 UTC (Fri) by pr1268 (subscriber, #24648)
Linus thinks that rename() without fsync() is safe
"is safe" or _SHOULD_BE_ safe? Far be it from me to suggest that Linus is naïve about these things; I'm certain he's been following this discussion closely for the past few days.
I do know that Linus' big soap box is about programming abstraction, and he'd certainly take the side that open/write/close/rename (in that order) should do exactly that, without any mysterious data loss.
My own "off-the-cuff" perception is that fsync(2)/fdatasync(2) are "band-aids" to address POSIX's lack of specification in this matter. One suggestion is that close(2) should implicitly include an fsync() call, and that programmers should be taught that open() and close() are expensive and best used judiciously.
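The "close() implies fsync()" suggestion can be approximated in user space. The sketch below is only an illustration under assumptions: `open_synced` is a hypothetical name, and it models the idea with a Python context manager rather than changing what close(2) itself does.

```python
import os
from contextlib import contextmanager

@contextmanager
def open_synced(path, mode="wb"):
    """Open a file that is fsync()ed just before it is closed (hypothetical)."""
    f = open(path, mode)
    try:
        yield f
        f.flush()               # flush Python's buffer to the kernel
        os.fsync(f.fileno())    # then ask the kernel to flush to disk
    finally:
        f.close()               # always close, even if the body raised

def replace_contents(path, data):
    """open/write/close/rename, with the sync folded into the close step."""
    tmp = path + ".tmp"         # assumed temporary name
    with open_synced(tmp) as f:
        f.write(data)
    os.rename(tmp, path)
```

Every close now pays the cost of a disk flush, which is the expense the comment says programmers would need to be taught about.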
Ts'o: Delayed allocation and the zero-length file problem
Posted Mar 13, 2009 21:51 UTC (Fri) by iabervon (subscriber, #722)
Linus thinks rename() without fsync() "is safe", in the sense that if he were writing an application that was intended to safely maintain important data (like, for example, the Linux source code), he would use rename() on files without using fsync() on them first. I'm not entirely sure that he reads any forum where this has been discussed so far.
I think fsync() makes sense to have. If the system stops running, there are some things that would have happened, except that the system stopped running first. When the whole system stops running, it also becomes difficult to know which things had happened and which had not. And it's too inefficient to serialize everything, particularly for a multi-process system. Falling back on the concurrency model, you can say that the filesystem after an emergency restart should be in some state that could have been seen by a process that was running before the restart. But there needs to be a further restriction, so that you know that the system won't go back to the blank filesystem that you had before installing anything; so fsync() makes sense as a requirement that the filesystem after a restart will be in some state that could have been seen after the last fsync() that returned successfully.
(Of course, any time the system crashes, you might lose some arbitrary data, since the system has crashed; but a better system will lose less or be less likely to lose things. This is qualitatively different from the perfectly reasonable habit of ext4 of deciding that the post-restart state is right after every truncate.)
Ts'o: Delayed allocation and the zero-length file problem
Posted Mar 14, 2009 0:47 UTC (Sat) by njs (guest, #40338)
But now you've just redefined fsync(2) to mean sync(2), and that has unacceptable overhead for many real uses. (Durably spooling a 1k email message should not force that multi-gigabyte rsync to flush to disk!)
Ts'o: Delayed allocation and the zero-length file problem
Posted Mar 14, 2009 12:40 UTC (Sat) by nix (subscriber, #2304)
Um, GNU coreutils 7.1's mv doesn't do it.
If even *coreutils*, written by some of the most insanely portability-minded Unix hackers on the planet, doesn't do this fsync()-source-and-target-directories thing, it's safe to say that, to a first approximation, nobody ever does it.
The standard here is outvoted by reality.
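For reference, the fsync()-source-and-target-directories step that almost nobody performs might look roughly like this. It is a sketch under assumptions: `fsync_dir` and `durable_rename` are hypothetical names, and it assumes a POSIX system where a directory can be opened read-only and passed to fsync().

```python
import os

def fsync_dir(dirpath):
    """fsync() a directory so that changes to its entries reach the disk."""
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def durable_rename(src, dst):
    """rename(), then sync the directories whose entries changed."""
    os.rename(src, dst)
    src_dir = os.path.dirname(os.path.abspath(src))
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fsync_dir(dst_dir)          # the new name lives here
    if src_dir != dst_dir:
        fsync_dir(src_dir)      # the old name was removed from here
```

This only syncs the directory entries; the file's own data would still need its own fsync() before the rename.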
Ts'o: Delayed allocation and the zero-length file problem
Posted Mar 14, 2009 14:20 UTC (Sat) by endecotp (guest, #36428)
That's interesting, but there are two ways to look at it:
- Since mv doesn't fsync and mv is expected to leave things in a sane state after a crash, the kernel must be expected to "do the right thing" wrt rename().
OR
- Since mv doesn't fsync, mv is not guaranteed to leave things in a sane state after a crash; if you thought that it was guaranteed to do so you were wrong.
Both
Posted Mar 14, 2009 15:25 UTC (Sat) by man_ls (subscriber, #15091)
What I read from this very interesting discussion is that both assumptions are right depending on the circumstances. An inherently unsafe fs like ext2 is not expected to guarantee anything, and mv on ext2 may leave things in an unstable state after a crash (including zero-length files). Coreutils developers probably did not see fit to fsync since it would not increase the robustness significantly in these cases: the system might crash in the middle of the fsync anyway.
But on a journalled fs like ext3 users will expect their system to be robust in the event of a crash -- and as the XFS debacle shows, not only for metadata. Both are POSIX-compliant, only ext3 is held to higher standards than ext2. What this means for ext4 is obvious.