Ext3 and write caching by drives are the data killers...

Posted Sep 1, 2009 17:31 UTC (Tue) by Cato (guest, #7643)
In reply to: Ext3 and write caching by drives are the data killers... by ncm
Parent article: Ext3 and RAID: silent data killers?

This PC is already on battery backup (UPS), but that didn't stop the corruption. It's a home PC, and in any case it really shouldn't be necessary to use a UPS just to avoid filesystem/LVM corruption.

Since the rebuild, I have realised that the user of the PC has been accidentally turning it off at the power switch. That may have corrupted whatever was pending in the disk(s)' write cache, and it is a fairly severe test. Despite several sudden power-offs from this cause, the new setup has shown no corruption yet. It seems unlikely that writes would sit in the disk's write cache for so long that they couldn't be written out while power was still on, but the fact is that both ext3 and LVM data structures got corrupted.

It's acknowledged that ext3's lack of journal checksumming can cause corruption when combined with disk write caching (whereas XFS does have such checksums I think). The only question is whether the time between power starting to drop and the power going completely is enough to flush pending writes (possibly reordered), while also not having any RAM contents get corrupted. Betting the data integrity of a remotely administered system on this time window is not something I want to do.

Ext3 and write caching by drives are the data killers...

Posted Sep 1, 2009 18:06 UTC (Tue) by ncm (guest, #165)

The only question is whether the time between power starting to drop and the power going completely is enough to flush pending writes (possibly reordered), while also not having any RAM contents get corrupted

That's easy: no. When power starts to drop, everything is gone at that moment. If the disk is writing at that moment, the unfinished sector gets corrupted, and maybe others. A UPS for the computer and disk together helps only a little against corruption, unless power drops are almost always shorter than the battery time or the UPS automatically shuts down the computer before the battery is used up. You may be better off if the computer loses power immediately and only the disk gets the UPS treatment.

it really shouldn't be necessary to use a UPS just to avoid filesystem/LVM corruption.

Perhaps, but it is. (What is this "should"?) The file system doesn't matter nearly as much as you would like. File systems can be indefinitely bad, but can be no more than fairly good. The good news is that the UPS only needs to support the disk, and it only needs to keep power up for a few seconds; with that, many file systems are excellent, although the bad ones remain bad.

Ext3 and write caching by drives are the data killers...

Posted Sep 9, 2009 20:35 UTC (Wed) by BackSeat (guest, #1886)

It's acknowledged that ext3's lack of journal checksumming can cause corruption

It may only be semantics, but it's unlikely that the lack of journal checksumming causes corruption, although it may make it difficult to detect.
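The distinction between causing corruption and detecting it can be illustrated with a minimal sketch, assuming one CRC32 per journal record computed with Python's `zlib.crc32`. The function names `seal` and `verify` and the record contents are hypothetical, and this is not the JBD/JBD2 on-disk format: the checksum flags a damaged record, but it cannot say what the original bytes were.

```python
import zlib


def seal(payload: bytes):
    """Return the payload together with the CRC32 recorded when it is journalled."""
    return payload, zlib.crc32(payload)


def verify(payload: bytes, stored_crc: int) -> bool:
    """True if the payload still matches the checksum recorded at write time."""
    return zlib.crc32(payload) == stored_crc


block, crc = seal(b"inode 12: size=4096")
damaged = b"inode 12: size=4\x00\x00\x00"  # last three bytes lost in a bad write

print(verify(block, crc))    # True
print(verify(damaged, crc))  # False: the corruption is detected...
# ...but nothing here can reconstruct the original bytes; detection is not repair.
```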

As for LVM, I've never seen the point. Just another layer of ones and zeros between the data and the processor. I never use it, and I'm very surprised some distros seem to use it by default.

Ext3 and write caching by drives are the data killers...

Posted Sep 10, 2009 20:50 UTC (Thu) by Cato (guest, #7643)

One interesting scenario, mentioned I think elsewhere in the comments to this article: a single 'misplaced write' (i.e. the disk doesn't seek to the new position and writes to the old one) means that a data block goes into the ext3 journal.

In the absence of ext3 journal checksumming, if a crash then requires replay of this journal block, horrible things will happen: presumably garbage from the 'journal' entry is written to various places on disk. One symptom may be log entries saying 'write beyond end of partition', which I've seen a few times with ext3 corruption and which I think is a clear indicator of corrupt filesystem metadata.

This is one reason why JBD2 added journal checksumming for use with ext4; I hope ext3 also gets to use it. In my view, it would be a lot better to make that change to ext3 than to make data=writeback the default, which will speed up some workloads and most likely corrupt some additional data (though I guess not metadata).
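The misplaced-write scenario can be modelled as a toy replay, assuming a descriptor that is just a list of little-endian 32-bit target block numbers and a checksum kept in a separate commit record that the misplaced write did not touch. `DISK_BLOCKS`, `make_descriptor`, `parse_targets` and `replay` are hypothetical names, and the layout is not the real JBD/JBD2 format. Without the checksum, replay happily treats data bytes as block numbers: some land inside the partition (silent overwrites), others fall beyond its end (the 'write beyond end of partition' symptom).

```python
import struct
import zlib

DISK_BLOCKS = 1000  # hypothetical partition size, in filesystem blocks


def make_descriptor(targets):
    """Pack target block numbers into a toy descriptor and return it with its CRC32."""
    body = struct.pack(f"<{len(targets)}I", *targets)
    return body, zlib.crc32(body)


def parse_targets(body):
    """Interpret a descriptor body as little-endian 32-bit block numbers."""
    count = len(body) // 4
    return list(struct.unpack(f"<{count}I", body[: count * 4]))


def replay(body, crc, use_checksum):
    """Return (in-range targets, out-of-range targets), or None if replay is refused."""
    if use_checksum and zlib.crc32(body) != crc:
        return None  # checksum mismatch: skip this transaction rather than replay garbage
    targets = parse_targets(body)
    in_range = [t for t in targets if t < DISK_BLOCKS]
    beyond = [t for t in targets if t >= DISK_BLOCKS]
    return in_range, beyond


good_body, good_crc = make_descriptor([10, 11, 42])
# A misplaced write puts an ordinary data block where the descriptor should be.
misplaced = bytes([5, 0, 0, 0]) + b"junk" + bytes([200, 1, 0, 0])

print(replay(good_body, good_crc, use_checksum=True))   # ([10, 11, 42], [])
print(replay(misplaced, good_crc, use_checksum=False))  # ([5, 456], [1802401130])
print(replay(misplaced, good_crc, use_checksum=True))   # None
```

In this model the checksum does not recover the lost transaction; it only stops replay from spraying the stray block's contents over blocks 5 and 456.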

Ext3 and write caching by drives are the data killers...

Posted Sep 10, 2009 21:05 UTC (Thu) by Cato (guest, #7643)

Actually the comment about a single incorrect block in a journal 'spraying garbage' over the disk is here: http://lwn.net/Articles/284313/

Ext3 and write caching by drives are the data killers...

Posted Sep 11, 2009 16:33 UTC (Fri) by nix (subscriber, #2304)

Note that you won't get a whole blockful of garbage: ext3 will generally notice that 'hey, this doesn't look like a journal' once the record that spanned the block boundary is complete. But that's a bit late...
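One way replay can notice that something "doesn't look like a journal" is a header check. As far as I know, JBD journal header blocks start with the big-endian magic number 0xc03b3998, followed by a block type and a transaction sequence number. The sketch below is a simplified, hypothetical model of such a check (one header block per transaction; `looks_like_journal_header`, `replayable_prefix` and `header` are invented names), not the kernel's recovery code. In this toy model only blocks in header positions are checked, so a stray block sitting in a data slot would still be copied without any test.

```python
import struct

JBD_MAGIC = 0xC03B3998  # magic value at the start of JBD journal block headers


def looks_like_journal_header(block: bytes, expected_seq: int) -> bool:
    """Check the magic number and sequence in a 12-byte big-endian header."""
    if len(block) < 12:
        return False
    magic, _blocktype, sequence = struct.unpack(">III", block[:12])
    return magic == JBD_MAGIC and sequence == expected_seq


def replayable_prefix(blocks, first_seq):
    """Count consecutive header blocks that look valid; replay stops at the first bad one."""
    seq = first_seq
    count = 0
    for block in blocks:
        if not looks_like_journal_header(block, seq):
            break
        count += 1
        seq += 1
    return count


def header(blocktype, seq):
    """Build a toy header block: magic, block type, sequence, then padding."""
    return struct.pack(">III", JBD_MAGIC, blocktype, seq) + bytes(20)


journal = [header(1, 100), header(1, 101), b"stray user data " * 2, header(1, 103)]
print(replayable_prefix(journal, 100))  # 2: replay stops at the stray block
```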

(This is all supposition from postmortems of shagged systems. Thankfully we no longer use hardware prone to this!)