User: Password:
Subscribe / Log in / New account

Ext3 and write caching by drives are the data killers...

Ext3 and write caching by drives are the data killers...

Posted Sep 3, 2009 23:18 UTC (Thu) by landley (subscriber, #6789)
In reply to: Ext3 and write caching by drives are the data killers... by Cato
Parent article: Ext3 and RAID: silent data killers?

The paragraph starting "But what if more than one drive fails?" is misleading. You don't need another drive to fail to experience this problem, all you need is an unclean shutdown of an array that's both dirty and degraded. (Two words: "atime updates".) The second failure can be entirely a _software_ problem (which can invalidate stripes on other drives without changing them, because the parity information needed to use them is gone). Software problem != hardware problem, it's not the same _kind_ of failure.

People expect RAID to protect against permanent hardware failures, and people expect journaling to protect against data loss from transient failures which may be entirely due to software (ala kernel panic, hang, watchdog, heartbeat...). In the first kind of failure you need to replace a component, in the second kind of failure the hardware is still good as new afterwards. (Heck, you can experience unclean shutdowns if your load balancer kills xen shares impolitely. There's _lots_ of ways to do this. I've had shutdown scripts hang failing to umount a network share, leaving The Button as the only option.)

Another problematic paragraph starts with "RAID arrays can increase data reliability, but an array which is not running with its full complement of working, populated drives has lost the redundancy which provides that reliability."

That's misleading because redundancy isn't what provides this reliability, at least in other contexts. When you lose the redundancy, you open yourself to an _unrelated_ issue of update granularity/atomicity. A single disk doesn't have this "incomplete writes can cause collateral damage to unrelated data" problem. (It might start to if physical block size grows bigger than filesystem sector size, but even 4k shouldn't do that on a properly aligned modern ext3 filesystem.) Nor does RAID 0 have an update granularity issue, and that has no redundancy in the first place.

I.E. a degraded RAID 5/6 that has to reconstruct data using parity information from multiple stripes that can't be updated atomically is _more_ vulnerable to data loss from interrupted writes than RAID 0 is, and the data loss is of a "collateral damage" form that journaling silently fails to detect. This issue is subtle, and fiddly, and although people keep complaining that it's not worth documenting because everybody should already know it, the people trying to explain it keep either getting it _wrong_ or glossing over important points.

Another point that was sort of glossed over is that journaling isn't exactly a culprit here, it's an accessory at best. This is a block device problem which would still cause data loss on a non-journaled filesystem, and it's a kind of data loss that a fsck won't necessarily detect. (Properly allocated file data elsewhere on the disk, which the filesystem may not have attempted to write to in years, may be collateral damage. And since you have no warning you could rsync the damaged stuff over your backups if you don't notice.) If it's seldom-used data it may be long gone before you notice.

The problem is that journaling gives you a false sense of security, since it doesn't protect against these issues (which exist at the block device level, not the filesystem level), and can hide even the subset of problems fsck would detect, by reporting a successful journal replay when the metadata still contains collateral damage in areas the journal hadn't logged any recent changes to.

I look forward to btrfs checksumming all extents. That should at least make this stuff easier to detect, so you can know when you _haven't_ experienced this problem.


(Log in to post comments)

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds