One interesting scenario, mentioned I think elsehwere in the comments to this article: a single 'misplaced write' (i.e. disk doesn't do the seek to new position, writing to old position) means that a data block goes into the ext3 journal.
In the absence of ext3 journal checksumming, and if there is a crash requiring replay of this journal block, horrible things will happen - presumably garbage is written to various places on disk from the 'journal' entry. One symptom may be log entries saying 'write beyond end of partition', which I've seen a few times with ext3 corruption and I think is a clear indicator of corrupt filesystem metadata.
This is one reason why JBD2 added journal checksumming for use with ext4 - I hope this also gets used by ext3. In my view, it would be a lot better to make that change to ext3 than to make data=writeback the default, which will speed up some workloads and most likely corrupt some additional data (though I guess not metadata).
Ext3 and write caching by drives are the data killers...
Posted Sep 10, 2009 21:05 UTC (Thu) by Cato (subscriber, #7643)
[Link]
Actually the comment about a single incorrect block in a journal 'spraying garbage' over the disk is here: http://lwn.net/Articles/284313/
Ext3 and write caching by drives are the data killers...
Posted Sep 11, 2009 16:33 UTC (Fri) by nix (subscriber, #2304)
[Link]
Note that you won't get a whole blockfull of garbage: ext3 will generally
notice that 'hey, this doesn't look like a journal' once the record that
spanned the block boundary is complete. But that's a bit late...
(this is all supposition from postmortems of shagged systems. Thankfully
we no longer use hardware prone to this!)