ext4 and data loss

ext4 and data loss

Posted Mar 13, 2009 8:29 UTC (Fri) by flewellyn (subscriber, #5047)
In reply to: ext4 and data loss by nix
Parent article: ext4 and data loss

If you just rename(), then the file will continue to exist at either the old or the new location, even if there's a crash. That's guaranteed by rename() semantics. You can't cross filesystems with it, either, so there's no I/O of the actual data.
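A minimal C sketch of the point about rename(): the call only rewrites directory entries, so no file data is copied, and when the two paths are on different filesystems it fails with EXDEV instead of copying anything. The helper name move_file is hypothetical; rename() and errno are the only real APIs involved. The sketch says nothing about what survives a crash, which is the question the rest of the thread argues over.

```c
#define _XOPEN_SOURCE 700 /* request POSIX.1-2008/XSI declarations */
#include <errno.h>
#include <stdio.h>

/* Move a file within one filesystem. rename() only changes directory
 * entries; if from and to are on different filesystems it fails with
 * EXDEV rather than copying the data. */
int move_file(const char *from, const char *to)
{
    if (rename(from, to) == 0)
        return 0;
    if (errno == EXDEV)
        fprintf(stderr, "%s and %s are on different filesystems\n", from, to);
    else
        perror("rename");
    return -1;
}
```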



ext4 and data loss

Posted Mar 13, 2009 14:50 UTC (Fri) by foom (subscriber, #14868)

> If you just rename(), then the file will continue to exist at either the old or the new location, even
> if there's a crash. That's guaranteed by rename() semantics.

Is it? If you rename from /A/file to /B/file (both on the same filesystem), what happens if the OS
decides to write out the new directory metadata for /A immediately, but delay writing /B until an
hour from now? (for performance, don't-cha-know) And then the machine crashes. So now you're
left with no file at all.

While I admit not having looked, I'll bet three cookies that's perfectly allowed by POSIX.
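One common application-side answer to this scenario, sketched in C on the assumption of a POSIX system where fsync() on a directory descriptor is supported (it is on Linux): after renaming a file from one directory to another, fsync() both directories. This does not make the rename crash-proof while it is in flight; it only ensures that once the function returns successfully, both directory updates have been flushed. The names durable_move, fsync_dir and join are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* request POSIX fsync() and O_DIRECTORY */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build "dir/name" into buf; fail if it does not fit. */
static int join(char *buf, size_t len, const char *dir, const char *name)
{
    int n = snprintf(buf, len, "%s/%s", dir, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* fsync() a directory so that changes to its entries reach the disk. */
static int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* Move from_dir/name to to_dir/name, then flush both directories.
 * rename() alone promises atomicity, not durability; after both
 * fsync_dir() calls succeed, both directory updates are on disk. */
int durable_move(const char *from_dir, const char *to_dir, const char *name)
{
    char from[4096], to[4096];
    if (join(from, sizeof from, from_dir, name) != 0 ||
        join(to, sizeof to, to_dir, name) != 0)
        return -1;
    if (rename(from, to) != 0)
        return -1;
    if (fsync_dir(to_dir) != 0)
        return -1;
    return fsync_dir(from_dir);
}
```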

ext4 and data loss

Posted Mar 13, 2009 15:06 UTC (Fri) by quotemstr (subscriber, #45331)

> While I admit not having looked, I'll bet three cookies that's perfectly allowed by POSIX.
You know what else is also allowed by POSIX?
  • Rejecting filenames longer than 14 characters, or that begin with a hyphen
  • Refusing to create more than 8 hard links to a file
  • Not having job control
  • Copying a process's entire address space on fork
  • Making all IO synchronous
Come on. Adhering to POSIX is no excuse for a poor implementation! Even Windows adheres to POSIX, and you'd have to be loony to claim it's a good Unix. Look: the bare minimum durability requirements that POSIX specifies are just not sufficient for a good and reliable system. rename must introduce a write barrier with respect to the data blocks for the file involved or we will lose. Not only will you not get every programmer and his dog to insert a gratuitous fsync in the write sequence, but doing so would actually be harmful to system performance.
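The pattern being defended here, sketched in C: write the new contents to a temporary file, close it, and rename() it over the original with no fsync(). Readers never see a half-written file, but what survives a crash depends on whether the filesystem writes the data blocks before committing the rename, an ordering POSIX does not require. The names replace_without_fsync and write_all are hypothetical, and the caller is assumed to choose a temporary path on the same filesystem as the target.

```c
#define _XOPEN_SOURCE 700 /* request POSIX.1-2008/XSI declarations */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all len bytes, looping over short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1; /* a production version would retry on EINTR */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Atomic replace without durability: no fsync() anywhere. After a crash
 * the result depends on the filesystem's write ordering; without a
 * barrier before rename() the file may come back empty. */
int replace_without_fsync(const char *path, const char *tmp,
                          const char *data, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, data, len) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    return rename(tmp, path);
}
```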

ext4 and data loss

Posted Mar 13, 2009 18:05 UTC (Fri) by nix (subscriber, #2304)

> rename must introduce a write barrier with respect to the data blocks for the file involved or we will lose.
But this is exactly the behaviour that ext4 isn't currently implementing (although it will be, by default).

Perhaps we're in vociferous agreement; I don't know.

ext4 and data loss

Posted Mar 13, 2009 22:54 UTC (Fri) by bojan (subscriber, #14302)

> Not only will you not get every programmer and his dog to insert a gratuitous fsync in the write sequence, but doing so would actually be harmful to system performance.

fsync is not gratuitous. It is the D in ACID. As you mentioned yourself, rename requires only the A from ACID - and that is exactly what you get.

But Ted, being a pragmatic man, reverted this to the old behaviour, simply because he knows there is a lot of broken software out there.

The fact that good applications that never lose data are already using the correct behaviour is a case in point that this is how all applications should do it.
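For contrast, a C sketch of the sequence argued for here: the same write-then-rename, but with fsync() on the temporary file before the rename, so the new data is on stable storage before the name can point at it (assuming the storage honours the flush). Per the Linux fsync(2) man page, the directory entry itself is only guaranteed to be on disk after an additional fsync() of the parent directory, which this sketch omits. The function names are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* request POSIX fsync() */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all len bytes, looping over short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1; /* a production version would retry on EINTR */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Atomic and durable replace: the data is flushed before rename(), so a
 * crash leaves either the old file or the complete new one. */
int replace_with_fsync(const char *path, const char *tmp,
                       const char *data, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, data, len) != 0 || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    return rename(tmp, path);
}
```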

The performance implications of this approach are different from those of the old approach from ext3. In some cases ext4 will be faster. In others, it won't. But the main performance problem is bad applications that gratuitously write hundreds of small files to the file system. This is what is causing the real performance problem and should be fixed.

XFS received a lot of criticism for what seem to be application problems. I wonder how many people lost files they were editing in emacs on that file system. I would venture a guess: not many.

ext4 and data loss

Posted Mar 13, 2009 23:10 UTC (Fri) by quotemstr (subscriber, #45331)

> It is the D in ACID. As you mentioned yourself, rename requires only the A from ACID - and that is exactly what you get.
That's my whole point: sometimes you want atomicity without durability. rename without fsync is how you express that. Except on certain recent filesystems, it's always worked that way. ext4 not putting a write barrier before rename is a regression.
> But the main performance problem is bad applications that gratuitously write hundreds of small files to the file system.
And why, pray tell, is writing files to a filesystem a bad thing? Writing plenty of small files is a perfectly legitimate use of the filesystem. If a filesystem buckles in that scenario, it's the fault of the filesystem, not the application. Blaming the application is blaming the victim.

ext4 and data loss

Posted Mar 13, 2009 23:46 UTC (Fri) by bojan (subscriber, #14302)

> That's my whole point: sometimes you want atomicity without durability. rename without fsync is how you express that. Except on certain recent filesystems, it's always worked that way. ext4 not putting a write barrier before rename is a regression.

Just because something worked one way in one mode of one file system, doesn't mean it is the only way it can work, nor that applications should rely on it. If you want atomicity without durability, you get it on ext4, even without Ted's most recent patches (i.e. you get the empty file). If you want durability as well, you call fsync.

> And why, pray tell, is writing files to a filesystem a bad thing?

Writing out files that have _not_ changed is a bad thing. Or are you telling me that KDE changes all of its configuration files every few minutes?
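One way an application can avoid rewriting files that have not changed, sketched in C: compare the bytes it is about to save with what is already on disk and skip the write when they match. contents_equal is a hypothetical helper; the caller would only run its usual replace routine when it returns 0.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the file at path holds exactly data[0..len), 0 if it
 * differs or cannot be read. */
int contents_equal(const char *path, const char *data, size_t len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    char buf[4096];
    size_t off = 0, n;
    int same = 1;
    /* Stop at the first mismatch or when the file is longer than data. */
    while (same && (n = fread(buf, 1, sizeof buf, f)) > 0) {
        if (off + n > len || memcmp(buf, data + off, n) != 0)
            same = 0;
        off += n;
    }
    /* A read error or a file shorter than data also counts as different. */
    if (same && (ferror(f) || off != len))
        same = 0;
    fclose(f);
    return same;
}
```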

BTW, the only reason fsync is slow on ext3 is that it effectively syncs all files, not just the one you asked for. That's something that must be fixed, because it's nonsense.

ext4 and data loss

Posted Mar 14, 2009 1:58 UTC (Sat) by quotemstr (subscriber, #45331)

> Just because something worked one way in one mode of one file system...
There's plenty of precedent. The original Unix filesystem worked that way. UFS works that way with soft-updates. ZFS works that way. There are plenty of decent filesystems that will provide atomic replace with rename.
> ...you get it on ext4, even without Ted's most recent patches (i.e. you get the empty file).
Not from the perspective of the whole operation you don't. You set out trying to replace the contents of the file called /foo/bar, atomically. If /foo/bar ends up being a zero-length file, the intended operation wasn't atomic. That's like saying you don't need any synchronization for a linked list because the individual pointer modifications are atomic. Atomic replacement of a file without forcing an immediate disk sync is something a decent filesystem should provide. Creating a write barrier on rename is an elegant way to do that.

ext4 and data loss

Posted Mar 15, 2009 6:01 UTC (Sun) by bojan (subscriber, #14302)

> Creating a write barrier on rename is an elegant way to do that.

Except that rename(s), as specified, never actually guarantees that.

ext4 and data loss

Posted Mar 15, 2009 6:04 UTC (Sun) by bojan (subscriber, #14302)

That should have been rename(2), of course.

ext4 and data loss

Posted Mar 14, 2009 12:53 UTC (Sat) by nix (subscriber, #2304)

Sure. Fixing fsync() being sync() on ext3 is easy, as long as you don't
mind someone else's data showing up in your partially-synced files after
reboot. Oh, wait, that's a security hole.

ext4 and data loss

Posted Mar 15, 2009 6:03 UTC (Sun) by bojan (subscriber, #14302)

Actually, in ordered mode it should be made a no-op by default. The fact that it locks the machine up is a major regression.

ext4 and data loss

Posted Mar 14, 2009 1:23 UTC (Sat) by flewellyn (subscriber, #5047)

No, because rename() is only changing the metadata. The data of the file itself has not been changed by that call.

If you were to write new data to the file and THEN call rename, a crash right afterwards might mean that the updates were not saved. But the only way you could lose the file's original data here is if you opened it with O_TRUNC, which is really stupid if you don't fsync() before closing it.
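The O_TRUNC case, sketched in C with the flush done the only way it can be: fsync() needs the open descriptor, so it comes before close(). Even so, a crash between open() and the end of fsync() can leave the file empty or partially written, which is why the write-then-rename sequence is generally preferred for replacing a file. overwrite_in_place is a hypothetical name.

```c
#define _XOPEN_SOURCE 700 /* request POSIX fsync() */
#include <fcntl.h>
#include <unistd.h>

/* Overwrite path in place. The old contents are discarded as soon as
 * open() returns; fsync() then forces the new contents to disk. */
int overwrite_in_place(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        data += n;
        len -= (size_t)n;
    }
    /* fsync() operates on the open descriptor, so it must precede close(). */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```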


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds