Ts'o: Delayed allocation and the zero-length file problem

Posted Mar 13, 2009 15:03 UTC (Fri) by rsidd (subscriber, #2582)
In reply to: Ts'o: Delayed allocation and the zero-length file problem by welinder
Parent article: Ts'o: Delayed allocation and the zero-length file problem

It reminds me of Richard Gabriel's famous "Worse is better" screed.

From Gabriel's article:

The New Jersey guy said that the Unix folks were aware of the [PC-losering] problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again.
Which, to me, sounds very much the same as saying, like Ted Ts'o, that a correct user program has to fsync() its data and not rely on fclose() actually flushing anything to disk. Also, there are lots of people (like scientific programmers) who write their own short file-handling code without being fanatically "correct" C programmers; buffer overflows and other such bugs are probably OK for them, since their systems are trusted, but data loss really is not OK.

Still, as Gabriel says, Unix won against Lisp systems. And Windows (which was even worse up until Windows ME) won against Unix. So there's food for thought there.


Ts'o: Delayed allocation and the zero-length file problem

Posted Mar 13, 2009 16:19 UTC (Fri) by ajross (subscriber, #4563)

Agreed. What's the point of putting all these fancy journaling and reliability features into a file system if they don't work by default? I mean, hell, we could lose data after a system crash with ufs in 1983. Why bother with ext4?

Hiding behind POSIX here is just ridiculous. POSIX allows this absurd lack of reliability not because it's a good idea, but because the filesystems available when the standard was drafted couldn't support anything better.

Worse is better

Posted Mar 13, 2009 23:03 UTC (Fri) by pboddie (subscriber, #50784)

It's very apt to bring up "worse is better" because that particular rant is all about the applications programmer having to jump through hoops so that the systems programmer can save some effort.

Although people can argue that UNIX "got things about right" in comparison to competing (and presumably discontinued) operating systems which were more clever in their implementation, there's a lot to be said for not pestering application programmers with, for example, the tedious details of fsync and friends at the expense of common-sense idioms that just work, like those which assume that closed files can safely have filesystem operations performed on them. Those tedious details include, of course, figuring out which sync-related function actually does what the developer might anticipate from one platform to the next.

Sometimes worse really is worse.
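To make the platform point concrete, here is a minimal sketch of the kind of wrapper application code ends up carrying. It assumes only one known difference: on macOS, fsync() is documented as not necessarily flushing the drive's own write cache, and fcntl() with F_FULLFSYNC is the stronger request. The name flush_to_disk is made up for this illustration, and other platforms may need their own cases.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask for the strongest flush this sketch knows about.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int flush_to_disk(int fd)
{
#if defined(__APPLE__) && defined(F_FULLFSYNC)
    /* macOS: F_FULLFSYNC also asks the drive to flush its write cache. */
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* Not every filesystem supports F_FULLFSYNC; fall back to fsync(). */
#endif
    return fsync(fd);
}
```

Whether the data really reaches the platter still depends on the hardware, as the close(2) man page itself admits.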

Ts'o: Delayed allocation and the zero-length file problem

Posted Mar 14, 2009 3:37 UTC (Sat) by bojan (subscriber, #14302)

> Which, to me, sounds very much the same as saying, like Ted Ts'o, that a correct user program has to fsync() its data and not rely on fclose() actually flushing anything to disk.

It's not Ted saying this. It is how it works. From man 2 close:

> A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)

OK?
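For anyone writing short file-handling code with stdio, the man page's advice might look roughly like the sketch below. The function name save_durably is invented for this example, and it assumes a POSIX system where fileno() and fsync() are available.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fsync() under strict -std modes */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and ask the kernel to push
 * them to the storage device before returning.
 * Returns 0 on success, -1 on failure. */
int save_durably(const char *path, const void *buf, size_t len)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    /* fwrite() may only fill the stdio buffer in user space. */
    if (fwrite(buf, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    /* fflush() hands the stdio buffer to the kernel, not to the disk. */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    /* fsync() asks the kernel to write the file's data to the device. */
    if (fsync(fileno(fp)) != 0) {
        fclose(fp);
        return -1;
    }
    /* fclose() can report an error too, so its result is checked. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

The order matters: fflush() has to come before fsync(), because fsync() only knows about data the kernel has already been given.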

Ts'o: Delayed allocation and the zero-length file problem

Posted Mar 16, 2009 6:13 UTC (Mon) by jamesh (guest, #1159)

People aren't asking for sync on close. Rather they'd like the rename() operation to only occur if the new file data has been written to disk.

Conversely, if the file data hasn't been written to disk then they expect that the rename over the old data won't occur.

There is no expectation that either of these operations will occur immediately, which is why they don't request that happen via fsync().

If the method applications currently use when expecting this behaviour is incorrect, then it'd be nice to define an API that does provide the desired semantics. That said, I can't think of any cases where you wouldn't want the new data blocks written out before renaming over an existing file.
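For reference, the write-then-rename idiom under discussion, with the explicit fsync() that the man page asks for, might be sketched as follows. Note that this is stronger than the ordering described above: it forces the data out immediately rather than merely before the rename. The name replace_file and the ".XXXXXX" temporary-name suffix are choices made for this example only.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() and fsync() under strict -std modes */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: replace path with len bytes from buf so that a crash
 * should leave either the old contents or the new ones, not an empty file.
 * Returns 0 on success, -1 on failure. */
int replace_file(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* The temporary file lives next to path, so rename() stays within one
     * filesystem. mkstemp() creates it with mode 0600; copying the old
     * file's permissions is left out of this sketch. */
    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    int ok = 1;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            ok = 0;
            break;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Force the new data to the device before it can replace the old file. */
    if (ok && fsync(fd) != 0)
        ok = 0;
    if (close(fd) != 0)
        ok = 0;
    /* rename() atomically switches the name over to the new file. */
    if (ok && rename(tmp, path) != 0)
        ok = 0;

    if (!ok) {
        unlink(tmp);               /* do not leave the temporary file behind */
        return -1;
    }
    return 0;
}
```

A call such as replace_file("results.dat", data, size) covers the file's contents. Making the rename itself durable would additionally need an fsync() on the containing directory, which is omitted here.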

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds