ext4 and data loss

Posted Mar 12, 2009 2:00 UTC (Thu) by qg6te2 (guest, #52587)
Parent article: ext4 and data loss

"... POSIX never really made any such guarantee"

Perhaps the POSIX standard should be rewritten then. The overall philosophy of Unix is to abstract away mundane things such as how data is stored on disk. The user has a moral right to expect that as little data as possible is wiped out when a machine crashes (especially if the crash is caused by an OS fault and not hardware).

Applications which want to be sure that their files have been committed to the media can use the fsync() or fdatasync() system calls

I vehemently disagree with this. It will simply cause everybody to use fsync() all the time as a blunt but simple solution to the "state of disk" problem. That in turn will lead to lower performance, until it is taken as "common knowledge" that calls to fsync() are hints rather than real requests, which would of course make fsync() useless.

/proc/sys/vm/dirty_expire_centisecs
/proc/sys/vm/dirty_writeback_centisecs


Perhaps the above two settings can be managed automatically as a way of going around the fsync() issue. For example, the more data there is waiting to be dumped to disk, the higher the risk of loss, and hence the shorter the disk commit intervals should be. This will of course reduce the effectiveness of delayed allocation, but performance without safety is not performance at all, especially if the user has to regenerate the lost data.
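
To see what those two knobs are currently set to, something like the following might do. It is only a sketch: the kernel exposes the files as dirty_expire_centisecs and dirty_writeback_centisecs under /proc/sys/vm, while the helper name read_centisecs and its base parameter are made up here for illustration (the parameter exists so the function can be pointed at a scratch directory).

```python
import os

def read_centisecs(name, base="/proc/sys/vm"):
    """Read a vm setting stored in centiseconds and return it in seconds.

    `read_centisecs` and `base` are hypothetical names for this sketch;
    only the file names under /proc/sys/vm come from the kernel.
    """
    with open(os.path.join(base, name)) as f:
        # The file holds a single integer, in hundredths of a second.
        return int(f.read().strip()) / 100.0
```

On a Linux machine, read_centisecs("dirty_expire_centisecs") would report how old dirty data may get before it becomes eligible for writeback, and read_centisecs("dirty_writeback_centisecs") how often the flusher wakes up. Adjusting them automatically, as suggested above, would mean writing new values to the same files as root.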


ext4 and data loss

Posted Mar 12, 2009 5:53 UTC (Thu) by quotemstr (subscriber, #45331)

The fundamental problem is that there are two similar but different operations an application developer can request:
  1. open(A)-write(A,data)-close(A)-rename(A,B): replace the contents of B with data, atomically. I don't care when or even if you make the change, but whenever you get around to it, make sure either the old or the new version is in place.
  2. open(A)-write(A,data)-fsync(A)-close(A)-rename(A,B): replace the contents of B with data, and do it now.
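
For concreteness, the two sequences might be sketched in Python roughly as follows. This is an illustration under assumptions, not anyone's actual implementation: replace_file, the durable flag, and the ".tmp" suffix are hypothetical names chosen here. With durable=False it issues the calls of operation 1; with durable=True it adds the fsync() of operation 2.

```python
import os

def replace_file(path, data, durable=False):
    """Replace the contents of `path` with `data` via write-then-rename.

    Hypothetical helper: durable=False is operation 1 (atomicity only),
    durable=True is operation 2 (atomicity plus fsync before the rename).
    """
    tmp = path + ".tmp"  # assumed temporary name, same directory as `path`
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        while offset < len(data):
            # os.write may write fewer bytes than requested, so loop.
            offset += os.write(fd, data[offset:])
        if durable:
            os.fsync(fd)  # ask for the new contents to reach the disk now
    finally:
        os.close(fd)
    # rename() swaps the name atomically; whether the new data is on disk
    # after a crash in the durable=False case is exactly what is at issue.
    os.rename(tmp, path)
```

A caller saving a configuration file would use replace_file(path, data), while an editor or mail server would pass durable=True. A stricter version of operation 2 would typically also fsync() the containing directory after the rename; that step is left out here to keep the sketch close to the sequences listed above.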

In practice, operation 1 has worked as described on ext2, ext3, and UFS with soft-updates, but fails on XFS and unpatched ext4. Operation 1 is perfectly sane: it's asking for atomicity without durability. KDE's configuration is a perfect candidate. Browser history is another. For a mail server or an interactive editor, of course, you'd want operation 2.

Some people suggest simply replacing operation 1 with operation 2. That's stupid. While operation 2 satisfies all the constraints of operation 1, it incurs a drastic and unnecessary performance penalty. By claiming operation 1 is simply operation 2 spelled incorrectly, you remove an important word from an application programmer's vocabulary. How else is he supposed to request atomicity without durability?

(And using a "real database" isn't a good enough answer: then you've just punted the same problem to a far heavier system, and for no good reason.)

The last patch mentioned in the article seems to make operation 1 work correctly, and that's good enough for me. Still, people need to realize that the filesystem is a database, albeit not a relational one, and that we can use database terminology to describe it.

ext4 and data loss

Posted Mar 12, 2009 19:19 UTC (Thu) by SLi (subscriber, #53131)

Well, you can always mount your filesystem with the "sync" option if you
want the behavior you describe.

The problem is, then you cannot talk about performance. Disks are slow,
slower than you think because your system has been caching for years.

While it's in a sense unfortunate that in ext4 this happening is more
likely than in ext3 (and it's exactly that, it's still very possible in
ext3), applications relying on that not happening are broken even in
ext3-land, because it does happen (if your system crashes, which shouldn't
happen very often - get a UPS and hardware that does not need binary
drivers).

The solution of applications fsync()ing their critical data is not only
the best solution - it's virtually also the only solution, if you want to
combine any guarantee about data integrity with any performance that isn't
from 1995.

ext4 and data loss

Posted Mar 13, 2009 5:19 UTC (Fri) by qg6te2 (guest, #52587)

While it's in a sense unfortunate that in ext4 this happening is more likely than in ext3 (and it's exactly that, it's still very possible in ext3), applications relying on that not happening are broken even in ext3-land

An appeal can be made to have better written applications, or more practically, an acceptance can be made that in the real world apps are never perfect. A file system needs to deal with that (no matter what is guaranteed by POSIX) and provide a reasonable trade-off between speed and safety.

In the case of ext3, whether by side effect or design, this trade-off is at a good point. Mounting with the "sync" option sacrifices too much speed, while in the current version of ext4 the trade-off is too aggressively in the direction of speed. Not everybody can afford a UPS, nor should a UPS be required to have a disk with sane contents after a crash.

ext4 and data loss

Posted Mar 13, 2009 13:17 UTC (Fri) by jwarnica (subscriber, #27492)

Not everyone can afford a computer, either. List price for the smallest APC UPS that includes software is $59.99, which is pretty cheap. Given the other benefits of a UPS, such as some surge and brownout protection, not having one is just stupid.

General purpose distros assume that you have what, a gig or two of memory. Not everyone can afford memory, either. And there are special case systems which would never have that kind of memory. So if you have a shitty computer, you run either older versions, or specially targeted distros. And if you are building an embedded system, you make choices appropriately.

In 2009, if you choose to have a crippled system that doesn't have a UPS, then choose your filesystem carefully.

ext4 and data loss

Posted Mar 13, 2009 16:43 UTC (Fri) by SLi (subscriber, #53131)

I think ext4's tradeoff is a very sane one. I don't expect my machine to
crash all the time (in fact I can't remember when it last did, must have
been in something like 2005). If it gives a speedup measured in tens of
percent, it's the only sane thing to do.

And for the case when it's not sane, there's f(data)sync().
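
As a rough illustration of that case, here is how an application might append a critical record and ask for it to be on disk before continuing. It is a sketch only: append_durably is a hypothetical name, and the fallback to os.fsync is an assumption added for platforms where Python does not provide os.fdatasync.

```python
import os

# Prefer fdatasync(), which may skip flushing metadata that is not needed
# to read the data back; fall back to fsync() where it is unavailable.
_sync = getattr(os, "fdatasync", os.fsync)

def append_durably(path, record):
    """Append `record` (bytes) to `path` and sync it before returning.

    Hypothetical helper for this sketch, not an API from the article.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        offset = 0
        while offset < len(record):
            # os.write may write fewer bytes than requested, so loop.
            offset += os.write(fd, record[offset:])
        _sync(fd)  # block until the kernel reports the data as written out
    finally:
        os.close(fd)
```

The cost is one synchronous flush per call, which is the price being argued about in this thread; batching several records per sync is the usual way to soften it.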

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds