ext4 and data consistency

Posted May 14, 2010 17:50 UTC (Fri) by njs (subscriber, #40338)
In reply to: ext4 and data consistency by anton
Parent article: The Next3 filesystem

> We have dozens of Debian systems running on ext3 (presumably without paranoid mode), and we have not had a single problem with a dpkg database corrupted by the file system.

No filesystem goes out and corrupts the dpkg database, but dpkg failing to properly ensure on-disk consistency might make it possible for an untimely power failure (or whatever) to trash its database. How often do you pull the plug while dpkg is running?
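(For concreteness, "properly ensuring on-disk consistency" means something like the pattern below -- a minimal Python sketch of the usual write-to-temp/fsync/rename discipline. The path handling is simplified, and whether the trailing directory fsync is actually required varies by filesystem.)

    import os

    def atomic_replace(path, data):
        # Write the new contents to a temporary file in the same directory,
        # so a crash leaves either the old file or the new one, never a mix.
        tmp = path + ".tmp"
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, data)
            os.fsync(fd)  # force the new contents to stable storage first
        finally:
            os.close(fd)
        os.rename(tmp, path)  # atomically swap old for new
        # On some filesystems the rename itself is only durable once the
        # containing directory has been fsynced as well.
        dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)

Skip a step, and a power failure at just the wrong moment can leave an empty or truncated file -- which is exactly the failure mode being argued about.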

That's why robustness is so hard -- it's almost impossible to test. That doesn't mean it isn't important: all it takes is one power failure with just the right timing to trash a datastore. Which is, of course, the whole problem here -- as users we have to rely on external signals. For instance, I still don't really trust MySQL: sure, I know they have transactions now, but do I *really* trust a group that once argued transactions were useless to have later developed the mind-numbing paranoia needed to catch every edge case? Whereas over here there's Postgres, whose developers clearly *are* absurdly paranoid -- excellent...

Or consider how you don't trust ext4, even though you have no statistics on it either, because of how Ted Ts'o's messages came across. It's a mystery to me how his basically sensible posts gave you (and others) this image of him as some kind of data-eating monster.



ext4 and data consistency

Posted May 14, 2010 19:15 UTC (Fri) by nix (subscriber, #2304) (2 responses)

> That's why robustness is so hard -- it's almost impossible to test. That doesn't mean it isn't important: all it takes is one power failure with just the right timing to trash a datastore.

Virtualization and CoW should have made this much, much easier to test in a fine-grained fashion: halt the VM you're using to do the testing; CoW its disk image; start a new VM from the CoWed copy and mount the filesystem; note whether it is damaged and, if so, how; kill that VM; remove the CoWed copy; and let the original VM run for another few milliseconds (or, if you're being completely pedantic, another instruction!).

ext4 and data consistency

Posted May 14, 2010 19:36 UTC (Fri) by njs (subscriber, #40338) (1 response)

That's a neat idea. I don't think we have cycle-accurate VMs in FOSS yet, but that doesn't matter here: you can do the halt/check after every disk write rather than after every instruction. It still doesn't solve a major part of the problem -- you also need to exercise all the weird corner cases that only arise under certain sorts of memory pressure, or when the disk is fragmented in *this* way with *this* write queue depth, tempting the elevator algorithm to reorder writes in an unfortunate way, and so on -- but it'd be really useful!
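Concretely, the check could look something like this -- a hypothetical Python sketch, assuming you have somehow captured the guest's block writes as a list of (offset, bytes) pairs and that fsck.ext3 is installed; the capture mechanism itself is the part that needs VM support:

    import shutil
    import subprocess

    def find_bad_crash_points(base_image, writes, fsck="fsck.ext3"):
        # For every prefix of the recorded write sequence, reconstruct the
        # disk image as it would look if power had failed right there,
        # then ask fsck whether the filesystem survived.
        failures = []
        for n in range(len(writes) + 1):
            shutil.copyfile(base_image, "crash.img")
            with open("crash.img", "r+b") as img:
                for offset, data in writes[:n]:  # the writes that hit disk
                    img.seek(offset)
                    img.write(data)
            # e2fsck exit codes 0 and 1 mean clean or trivially fixed;
            # anything higher means real damage at this crash point.
            rc = subprocess.call([fsck, "-n", "crash.img"])
            if rc > 1:
                failures.append(n)
        return failures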

ext4 and data consistency

Posted May 14, 2010 20:41 UTC (Fri) by nix (subscriber, #2304)

> I don't think we have cycle-accurate VMs in FOSS yet

They just need to be accurate enough that stuff works. We're not trying to make Second Reality run here. I can't think of anything that runs on a Core 2 but not an AMD Phenom because of differing instruction timings!

> all the weird corner cases that only arise under certain sorts of memory pressure

Seems to me the balloon driver is what we want: it can add memory to the guest on command, so can't it also take it away? I don't see why we couldn't do an analogue of what SQLite does in its testing procedures (use a customized allocator that forces specific allocations to fail). The disk-fragmentation cases would take a lot more work, probably a custom block allocator -- which is a bit tough, since the block allocator is one of the things we're trying to test!
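The allocator trick, at least, is easy to sketch -- here in toy Python form (the names are invented for illustration; SQLite does this in C with a custom malloc):

    class FailAt:
        # Wrap a function so that exactly the Nth call fails; a harness can
        # then sweep N across every call site and verify that the code under
        # test either succeeds or fails cleanly, never corrupting its state.
        def __init__(self, func, fail_on):
            self.func = func
            self.fail_on = fail_on
            self.calls = 0

        def __call__(self, *args, **kwargs):
            self.calls += 1
            if self.calls == self.fail_on:
                raise MemoryError("injected allocation failure")
            return self.func(*args, **kwargs)

    # e.g. make the third "allocation" in the code under test fail:
    alloc = FailAt(bytearray, fail_on=3)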

ext4 and data consistency

Posted May 15, 2010 9:57 UTC (Sat) by anton (subscriber, #25547)

> No filesystem goes out and corrupts the dpkg database, but dpkg failing to properly ensure on-disk consistency might make it possible for an untimely power failure (or whatever) to trash its database.

The file system does not have to go out to do it: it was entrusted with that data, so it can fail to keep it consistent while staying at home. A good file system will ensure on-disk consistency without extra help from applications (beyond the applications keeping their files consistent as seen by other processes).

> How often do you pull the plug while dpkg is running?

Never. And I doubt it happens in a significant number of cases for Ubuntu users, either; and the subset of those cases where ext3 actually corrupts the database is smaller still. That's why I questioned drag's claim.

> That's why robustness is so hard -- it's almost impossible to test.

And that's why I find it absurd to make applications, rather than the file system, responsible for data consistency across an OS crash or power outage: instead of testing one or a few file systems, thousands of applications would have to be tested.

