Ts'o: Delayed allocation and the zero-length file problem
Posted Mar 13, 2009 18:06 UTC (Fri) by jgg (guest, #55211)
This whole discussion has really not focused much on what sane behaviors for a FS across an unclean shutdown actually are. To date, most writings by FS designers I've read seem to focus entirely on avoiding FSCK and avoiding FS metadata inconsistencies. Very few people seem to talk about things like what the application sees/wants.
One of the commenters on the blog had the best point: insisting on adding fsync before every close/rename sequence (either implicitly in the kernel, as has been done, or explicitly in all apps) is going to badly harm performance. 99% of these cases do not need the data on the disk, just the write/close/rename order preserved.
Getting great performance by providing weak guarantees is one thing, but then insisting that everyone who cares about their data use explicit calls that provide a much stronger and slower guarantee is kinda crazy. Just because POSIX is silent on this matter doesn't mean FS designers should get a free pass on transactional behaviors that are so weak they are useless.
For instance, under the same POSIX arguments Ted is making, it would be perfectly legitimate for a write/fsync/close/rename to still erase both files because you didn't do an fsync on the directory! Down this path lies madness; at some point the FS has to preserve order!
I wonder how bad a hit performance sensitive apps like rsync will get due to the flushing on rename patches?
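The order-preserving pattern being argued for here (write to a temporary name, close, rename over the target, no fsync anywhere) can be sketched roughly like this in Python using the raw syscall wrappers; the function and file names are hypothetical, not from any particular application:

```python
import os

def atomic_replace(path, data):
    """Replace `path` with `data` via the write/close/rename pattern.

    Deliberately no fsync: this relies on the filesystem preserving
    the write -> close -> rename order across a crash, so an unclean
    shutdown leaves either the complete old file or the complete new
    one, never a zero-length file.
    """
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    os.rename(tmp, path)  # atomic replace on POSIX filesystems
```

Whether a crash can leave the target pointing at an empty file with this sequence is exactly the behavior under dispute between the application and filesystem developers in this thread.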
Posted Mar 13, 2009 19:16 UTC (Fri) by endecotp (guest, #36428)
Yes, I was just thinking the same thing! Come on Ted, what exactly do you want us to write to be portably safe? I have just added an fsync() to my write() close() rename() code, but I checked man fsync first and it tells me that I need to fsync the directory as well. So is it write(), fsync() the file, close(), rename(), then fsync() the directory? Or some re-ordering of that? Is there more? Do I have to fsync() the directories up to the root? Can I avoid all this if I call sync()?
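One plausible reading of the fully-synced sequence the man pages imply is sketched below; the trailing directory fsync is the step in question here, and the helper name is hypothetical:

```python
import os

def durable_replace(path, data):
    """write -> fsync(file) -> close -> rename -> fsync(directory).

    The final fsync on the containing directory is what the fsync
    man page says is needed before the renamed directory entry
    itself is known to be on disk.
    """
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)      # file data (and its metadata) on disk
    finally:
        os.close(fd)
    os.rename(tmp, path)  # atomically swap in the new file
    dirfd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
    try:
        os.fsync(dirfd)   # make the rename itself durable
    finally:
        os.close(dirfd)
```

Note this gives durability as well as atomicity, at the cost of the extra disk flushes the earlier comments are complaining about.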
Posted Mar 13, 2009 20:17 UTC (Fri) by alexl (subscriber, #19068)
If the metadata is not written out but the data is, and then things crash, you will just have the old file as it was, plus either a written file+inode with no name (moved to lost+found) or the written file with the temporary name.
As far as I can see, syncing the directory is not needed. (Unless you want to guarantee the file being on disk, rather than just not breaking the atomic file-replace behaviour.)
Posted Mar 13, 2009 20:41 UTC (Fri) by masoncl (subscriber, #47138)
Posted Mar 13, 2009 21:03 UTC (Fri) by endecotp (guest, #36428)
Thinking a bit more about this from the "application requirements" point of view, I can see three cases:
1- The change needs to be atomic wrt other processes running concurrently.
2- The change needs to be atomic if this process terminates (ctrl-C, OOM).
3- The change needs to be atomic if the system crashes.
I can't think of a scenario where the application author would reasonably say, "I need this data to be safe in cases 1 and 2 but I don't care about 3." Can anyone else?
Posted Mar 13, 2009 22:45 UTC (Fri) by jgg (guest, #55211)
It isn't that uncommon, really: any time you want to protect against a failing program without messing up its output file, rename is the best way. For instance, programs that are used with make should be careful not to leave garbage around if they crash. rsync and other downloading software do the rename trick too, for the same basic reasons. None of these uses require fsync or other performance-sucking things.
The reason things like emacs and vim are so careful is that they are almost always handling critical data. I don't think anyone would advocate that rsync should use fsync.
The considerable variations in what FSs do is also why, as an admin, I have a habit of knocking off a quick 'sync' command after finishing some adminy task just to be certain :)
Posted Mar 16, 2009 10:37 UTC (Mon) by endecotp (guest, #36428)
Ted seems to have answered this in his second blog post: YES you DO need to fsync the directory if you want to be certain that the metadata has been saved.
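Concretely, "fsync the directory" means opening the directory itself and calling fsync() on that descriptor; a minimal sketch, with a hypothetical helper name (Linux-specific, since O_DIRECTORY and fsync-on-a-directory are not guaranteed everywhere):

```python
import os

def fsync_dir(dirpath):
    """Flush a directory's entries (e.g. a just-renamed name) to disk."""
    fd = os.open(dirpath, os.O_DIRECTORY)  # O_RDONLY (0) is implied
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
```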
Posted Mar 13, 2009 20:28 UTC (Fri) by amk (subscriber, #19)
Why is this behaviour broken? It's perfectly normal behaviour...
Posted Mar 14, 2009 11:10 UTC (Sat) by khim (subscriber, #9252)
Take a P2P client. A good P2P client will keep information about peers for each file; this way, if the system is rebooted, the lengthy process of finding peers can be avoided. Since there are hundreds (sometimes thousands) of peers, this means hundreds of files are rewritten every minute or so. If a filesystem cannot provide guarantees without fsync, I just refuse to use it. XFS went this way: the XFS developers long argued for their right to destroy files on a crash, we've all agreed that they can do this, and I can answer the question "What do you think about XFS?" with just "Don't use it. Ever." And everyone was happy.
Looks like tytso actually fixed the problem in ext4 (even if the actual words were akin to "application developers are crazy and this is incorrect usage, but we cannot replace all of them"), so at least I can conclude he's more sane than the XFS developers...
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds