
Journal-guided RAID resync

Posted Nov 26, 2009 20:54 UTC (Thu) by neilbrown (subscriber, #359)
In reply to: Journal-guided RAID resync by nix
Parent article: Journal-guided RAID resync

> Hang on. If you crash after a RAID-5 stripe has been written to one of the disks but not the other, you can tell the stripe is inconsistent, but not what the valid contents are (at least, not programmatically.)
Between the moment when a write is requested and the moment when its success is reported (and possibly until a subsequent barrier request has been acknowledged), both the 'old' data and the 'new' data are valid. The correct thing to do in this case is to treat all of the data blocks as "valid" (because they are) and recompute the parity block so that it is consistent with them.

What do you think "valid" means in this context?
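A minimal sketch of that resync, assuming a toy in-memory model in which a stripe is a list of equal-length `bytes` data blocks plus one XOR parity block. `xor_blocks` and `resync_stripe` are hypothetical helpers for illustration, not code from the md driver; the point is only that the data blocks are left alone and the parity is recomputed from them.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, as RAID-5 parity does."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def resync_stripe(data_blocks, parity_block):
    """Return a parity block consistent with data_blocks.

    Each data block is taken as valid (after a crash, old and new contents
    are both acceptable), so only the parity is ever rewritten.
    """
    expected = xor_blocks(data_blocks)
    if expected != parity_block:
        return expected  # dirty stripe: rewrite the parity
    return parity_block  # already consistent: nothing to do


# Hypothetical 4-disk RAID-5 stripe: three data blocks plus parity.
old = [b"\x01\x02", b"\x10\x20", b"\xa0\xb0"]
parity = xor_blocks(old)

# Crash after disk 1 received its new data but before parity was updated.
after_crash = [old[0], b"\xff\xee", old[2]]
new_parity = resync_stripe(after_crash, parity)
assert new_parity == xor_blocks(after_crash)
```

Whichever blocks did or did not reach the disk, the stripe ends up consistent with whatever data is actually there.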

Journal-guided RAID resync

Posted Nov 26, 2009 21:34 UTC (Thu) by nix (subscriber, #2304)

My worry is that between the time when the new data is written, and the
time when the parity block is updated (or vice versa if the parity write
gets to the disk surface first), if you have a crash, bingo, you have
instant corruption of that stripe. There isn't any way to make those two
writes happen in sync, after all.

(I *know* you know this, so am quite mystified that you're apparently
claiming that it isn't a problem. I don't see how a workaround is even
*possible*: it's why battery-backing of RAID-5 arrays is done at all...)
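In a toy model of the same kind (a list of `bytes` data blocks plus an XOR parity block; `xor_blocks` and `reconstruct` are hypothetical helpers, for illustration only), the half-finished write by itself loses nothing. Wrong data appears only when a block that was never touched has to be rebuilt from the stale parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID-5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_data, parity):
    """Rebuild the single missing data block from the survivors and parity."""
    return xor_blocks(list(surviving_data) + [parity])


# Hypothetical stripe: three data disks plus parity, parity matching old data.
old = [b"\x01\x02", b"\x10\x20", b"\xa0\xb0"]
stale_parity = xor_blocks(old)

# Crash window: disk 0 has its new block, the parity update never landed.
after_crash = [b"\x55\x66", old[1], old[2]]

# With every disk present, recomputing parity makes the stripe consistent again.
assert reconstruct(after_crash[:2], xor_blocks(after_crash)) == old[2]

# If disk 2 is lost before that happens, rebuilding it from the stale
# parity returns wrong data, even though disk 2 was never written.
rebuilt = reconstruct(after_crash[:2], stale_parity)
assert rebuilt != old[2]
```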

Journal-guided RAID resync

Posted Nov 27, 2009 1:14 UTC (Fri) by neilbrown (subscriber, #359)

You only have "instant corruption" if the array becomes degraded before the parity gets corrected. For this reason mdadm will not assemble an array which is both dirty and degraded. I thought I had mentioned this in my original comment, but apparently not. Maybe this is the case implied in the original article, though I don't think the text really matches reality: Either there is a correct fix that is trivial, or no fix is possible.
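The assembly rule can be summarised as a small decision table. This is a hypothetical sketch of the policy as described in the comment, not mdadm's actual code or options; `ArrayState` and `assembly_decision` are invented names.

```python
from dataclasses import dataclass


@dataclass
class ArrayState:
    """Hypothetical summary of what an assembler knows at startup."""
    dirty: bool     # not shut down cleanly: some stripes may have stale parity
    degraded: bool  # at least one member device is missing


def assembly_decision(state: ArrayState) -> str:
    """Toy policy mirroring the rule described above (not mdadm's code)."""
    if state.dirty and state.degraded:
        # Stale parity cannot be corrected without the missing device, and
        # any block rebuilt from it could silently come back wrong.
        return "refuse"
    if state.dirty:
        # All devices present: recompute parity stripe by stripe.
        return "assemble-and-resync"
    if state.degraded:
        # Clean shutdown, so parity is trustworthy and reconstruction is safe.
        return "assemble-degraded"
    return "assemble"


print(assembly_decision(ArrayState(dirty=True, degraded=True)))
```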

(The only two ways to avoid corruption when a crash happens on a degraded array are 1/ to journal updates at the RAID level, or 2/ to use a copy-on-write filesystem that knows about the stripe size and only ever writes into a stripe that does not contain any important information.)
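Option 1/ can be sketched as a write-ahead journal that holds each complete new stripe (data plus parity) before any member device is touched, so recovery can replay the stripe without reading anything from a possibly missing device. This is a toy in-memory model under stated assumptions: journal writes are taken to be atomic and durable (a real implementation would need checksums and cache flushes), and `JournaledStripeStore` is a hypothetical name, not an md interface.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID-5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class JournaledStripeStore:
    """Toy RAID-5 array with a RAID-level write-ahead journal.

    Hypothetical model: self.disks maps stripe number -> list of blocks,
    data blocks first and the parity block last.
    """

    def __init__(self):
        self.disks = {}
        # Assumption: anything stored here survives a crash intact.
        self.journal = {}

    def write_stripe(self, stripe_no, data_blocks, crash_before=None):
        """Write a full stripe; crash_before=i simulates a crash before device i."""
        new_stripe = list(data_blocks) + [xor_blocks(data_blocks)]
        # 1. Commit the complete new stripe to the journal first.
        self.journal[stripe_no] = new_stripe
        # 2. Update the member devices one at a time.
        on_disk = self.disks.setdefault(stripe_no, [None] * len(new_stripe))
        for i, block in enumerate(new_stripe):
            if i == crash_before:
                return False  # simulated crash: journal entry is left behind
            on_disk[i] = block
        # 3. The stripe is consistent on disk, so retire the journal entry.
        del self.journal[stripe_no]
        return True

    def recover(self):
        """Replay every journaled stripe; nothing is read from the members."""
        for stripe_no, stripe in self.journal.items():
            self.disks[stripe_no] = list(stripe)
        self.journal.clear()


store = JournaledStripeStore()
store.write_stripe(0, [b"\x01", b"\x02", b"\x03"])
# Crash after only device 0 got its new block; parity on disk is now stale.
store.write_stripe(0, [b"\x0a", b"\x0b", b"\x0c"], crash_before=1)
store.recover()
assert store.disks[0] == [b"\x0a", b"\x0b", b"\x0c", xor_blocks([b"\x0a", b"\x0b", b"\x0c"])]
```

The cost of this design is that every stripe is written twice, once to the journal and once in place.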

Journal-guided RAID resync

Posted Nov 27, 2009 6:56 UTC (Fri) by nix (subscriber, #2304)

Aaah, I see (actually whenever you mention this I get it for a few minutes
and then it blurs out of memory again). Yes, that makes sense: if you
don't lose a disk, you have two intact blocks and one block holding
stale, not-yet-updated contents: whether that's the parity block or not
is immaterial.

So the thing to be worried about here is that RAID-5 only really protects
you from a disk failure that coincides with a crash if the array was not
being written to at the time (or is battery-backed).

And I suspect this is the case the article was discussing.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds