Ts'o: Delayed allocation and the zero-length file problem
Posted Mar 13, 2009 20:34 UTC (Fri) by masoncl (subscriber, #47138)
In reply to: Ts'o: Delayed allocation and the zero-length file problem by alexl
We have two basic choices to accomplish this:
1) Put the new file into a list of things that must be written before the commit is done. This is pretty much what the proposed ext4 changes do.
2) Write the data before the rename is complete.
The problem with #1 is that it reintroduces the famous ext3 fsync behavior that caused so many problems with Firefox. It does so in a more limited scope, only for files that have been renamed, but the side effects get very large when you rename big files, because the commit then has to wait for all of that data to be written.
The problem with #2 is that it is basically fsync-on-rename.
The btrfs fsync log would allow me to get away with #1 without too much pain, because fsyncs don't force full btrfs commits and so they won't actually wait for the renamed file data to hit disk.
But the important discussion isn't whether I can sneak in a good implementation for popular but incorrect API usage. The important discussion is: what is the API today, and what should it really be?
Applications have known how to get consistent data on disk for a looong time. Mail servers do it, databases do it. Changing rename to include significant performance penalties when it isn't documented or expected to work this way seems like a very bad idea to me.
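For anyone who hasn't seen it spelled out, the pattern those applications use looks roughly like the sketch below. It is only an illustration, assuming a POSIX system where rename atomically replaces the target; the helper name atomic_write and the ".tmp-" prefix are made up for the example and are not taken from any particular mail server or database.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) so that a crash leaves either
    the complete old contents or the complete new contents on disk.

    Hypothetical helper for illustration; assumes POSIX rename semantics.
    """
    dirname = os.path.dirname(os.path.abspath(path))

    # Create the temporary file in the same directory as the target so
    # the rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to put the data on disk
        # Only after the data is durable do we swap the name over.
        os.rename(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Sync the directory so the rename itself survives a crash.
    dir_fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The explicit fsync before the rename is the whole point: the application, not the filesystem, decides when to pay for the sync. Note that mkstemp creates the file with mode 0600, so a real implementation would likely also carry over the old file's permissions and ownership.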
I'd much rather make a new system call or flag for open that explicitly documents the extra syncing, and give application developers the choice.
