Yeah, I think you have hit this exactly on the head. Reading through Ts'o's comments on the blog, I think he even confirmed that not preserving ordering is a change in behavior from ext3.
This whole discussion has really not focused much on what sane behaviors for a FS actually are across an unclean shutdown. Most of the writing by FS designers that I've read to date seems to focus entirely on avoiding fsck and FS metadata inconsistencies. Very few people seem to talk about things like what the application sees or wants.
One of the commenters on the blog made the best point: insisting on adding fsync before every close/rename sequence (either implicitly in the kernel, as has been done, or explicitly in all apps) is going to badly harm performance. 99% of these cases do not need the data on the disk, just the write/close/rename order preserved.
Getting great performance by providing weak guarantees is one thing, but then insisting that everyone who cares about their data use explicit calls that provide a much stronger and slower guarantee is kinda crazy. Just because POSIX is silent on this matter doesn't mean FS designers should get a free pass on transactional behaviors that are so weak they are useless.
For instance, under the same POSIX arguments Ted is making, it would be perfectly legitimate for a write/fsync/close/rename sequence to still erase both files because you didn't fsync the directory! Down this path lies madness: at some point the FS has to preserve order!
I wonder how badly performance-sensitive apps like rsync will be hit by the flush-on-rename patches?
Ts'o: Delayed allocation and the zero-length file problem
Posted Mar 13, 2009 19:16 UTC (Fri) by endecotp (guest, #36428)
> you didn't do a fsync on the directory!
Yes, I was just thinking the same thing! Come on Ted, what exactly do you want us to write to be portably safe? I have just added an fsync() to my write() close() rename() code, but I checked man fsync first and it tells me that I need to fsync the directory. So is it:
? Or some re-ordering of that? Is there more? Do I have to fsync() the directories up to the root? Can I avoid all this if I call sync()?
Posted Mar 13, 2009 20:17 UTC (Fri) by alexl (subscriber, #19068)
I don't think that is quite necessary for durability.
If the metadata is not written out but the data is, and then things crash, you will just have the old file as it was, plus either a written file+inode with no name (moved to lost+found) or the written file under the temporary name.
As far as I can see, syncing the directory is not needed (unless you want to guarantee that the file is on disk, rather than just not breaking the atomic file-replace behaviour).
Posted Mar 13, 2009 20:41 UTC (Fri) by masoncl (subscriber, #47138)
The directory fsync requirements came from ext2. For the journaled filesystems, an fsync on the file will get you the dir as well.
Posted Mar 13, 2009 21:03 UTC (Fri) by endecotp (guest, #36428)
OK. So if I want code that's portable to ext2, I need to fsync the directory. Maybe there aren't many people using ext2 these days, but I would like code that's genuinely portable; I do personally care about the various flash filesystems, and when I break things for BSD users they complain. So I guess the directory fsync is needed.
Thinking a bit more about this from the "application requirements" point of view, I can see three cases:
1- The change needs to be atomic wrt other processes running concurrently.
2- The change needs to be atomic if this process terminates (ctrl-C, OOM).
3- The change needs to be atomic if the system crashes.
I can't think of a scenario where the application author would reasonably say, "I need this data to be safe in cases 1 and 2 but I don't care about 3." Can anyone else?
Posted Mar 13, 2009 22:45 UTC (Fri) by jgg (guest, #55211)
> I can't think of a scenario where the application author would reasonably say, "I need this data to be safe in cases 1 and 2 but I don't care about 3." Can anyone else?
It isn't that uncommon, really. Any time you want to protect against a failing program messing up its output file, rename is the best way. For instance, programs that are used with make should be careful not to leave garbage around if they crash. rsync and other downloading software do the rename trick too, for the same basic reasons. None of these uses requires fsync or other performance-sucking things.
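To make the rename trick concrete, here is a minimal C sketch of write-to-a-temporary-then-rename with no fsync at all. It assumes a POSIX system; the helper name `replace_file_no_sync` and the caller-supplied temporary name are hypothetical. It covers cases 1 and 2 above (concurrent readers, and this process dying before the rename), but on a filesystem that does not order the data writes before the rename, a system crash can still leave an empty or truncated file under the final name, which is exactly the behavior being argued about here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: write `data` to `tmp`, then rename it over `path`.
 * Other processes see either the old or the new contents, and if this
 * process dies before rename(), `path` is left untouched.
 * No fsync(): nothing here forces the data to disk before the rename,
 * so this does NOT protect against a system crash (case 3). */
int replace_file_no_sync(const char *path, const char *tmp, const char *data)
{
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    if (fputs(data, f) == EOF) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    /* fclose() flushes stdio buffers to the kernel, not to the disk. */
    if (fclose(f) == EOF) {
        remove(tmp);
        return -1;
    }
    /* rename() atomically replaces the name as seen by other processes. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

Leaving out fsync is the point of the design: the cost is only the extra rename, which is why tools like rsync can afford it on every file.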
The reason things like emacs and vim are so careful is because they are almost always handling critical data. I don't think anyone would advocate that rsync should use fsync.
The considerable variations in what FSs do are also why, as an admin, I have a habit of knocking off a quick 'sync' command after finishing some adminy task, just to be certain :)
Posted Mar 16, 2009 10:37 UTC (Mon) by endecotp (guest, #36428)
> Come on Ted, what exactly do you want us to write to be portably safe?
Ted seems to have answered this in his second blog post: YES you DO need to fsync the directory if you want to be certain that the metadata has been saved.
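Putting the thread together, a minimal C sketch of the fully crash-safe replace (case 3) might look like the following. It assumes a POSIX system where fsync() on a directory opened read-only is permitted (it is on Linux; other systems may differ). The helper names `write_all` and `replace_file_durably`, and the caller-supplied directory and temporary names, are hypothetical. It syncs only the immediate parent directory; whether anything further up the tree needs syncing is the question raised earlier, and this sketch does not attempt it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace `path` (which lives in `dir`) with `data`.
 * Order: write tmp -> fsync tmp -> close -> rename -> fsync dir. */
int replace_file_durably(const char *dir, const char *path,
                         const char *tmp, const char *data)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* Data must reach the disk before the new name can point at it. */
    if (write_all(fd, data, strlen(data)) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    /* Flush the directory entry so the rename itself survives a crash
     * (the ext2-style requirement; reportedly redundant on journaled FSes). */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

The two fsync() calls are what make this much slower than the rename-only version: each one waits for the disk, which is the performance cost the earlier comments object to paying for every file.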