
Journal-guided RAID resync


Posted Nov 26, 2009 21:00 UTC (Thu) by neilbrown (subscriber, #359)
In reply to: Journal-guided RAID resync by nye
Parent article: Journal-guided RAID resync

>Surely this is not the case in the (fairly common) case of out-of-order writes? This sounds like exactly the sort of situation that caused the ext4 kerfuffle earlier this year.

I don't see how out-of-order writes complicate the question ... maybe I'm missing something. A key question that must be understood is "what data is valid", and in the case of multiple parallel writes pending when a crash happens, there are lots of combinations that are all equally valid.
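
To make "lots of combinations that are all equally valid" concrete, here is a small Python model. It is not md code, and the function names are hypothetical. It treats each in-flight write as independently landed or not landed, then shows that a mirror resync that simply copies one half over the other always ends in one of those acceptable states.

```python
from itertools import product


def possible_post_crash_states(old: dict[str, str], pending: dict[str, str]) -> list[dict[str, str]]:
    """List every on-disk image consistent with a crash while `pending` was in flight.

    old:     block -> contents before the writes were issued
    pending: block -> new contents of each write that had not completed
    Each pending write may or may not have reached the media, independently,
    so there are 2 ** len(pending) equally acceptable outcomes.
    """
    blocks = sorted(pending)
    states = []
    for landed in product((False, True), repeat=len(blocks)):
        state = dict(old)
        for block, done in zip(blocks, landed):
            if done:
                state[block] = pending[block]
        states.append(state)
    return states


def resync_mirror(primary: dict[str, str], secondary: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Hypothetical RAID-1 resync: the primary's contents win and are copied over."""
    del secondary  # ignored: whatever the secondary holds gets overwritten
    return dict(primary), dict(primary)


old = {"A": "a0", "B": "b0", "C": "c0"}
pending = {"A": "a1", "B": "b1"}
states = possible_post_crash_states(old, pending)

# Each mirror half may have caught a different subset of the pending writes.
disk0, disk1 = states[1], states[2]
after0, after1 = resync_mirror(disk0, disk1)
print(len(states), after0 == after1, after0 in states)
```

The resync does not need to know which combination is "right"; it only has to make the copies agree on one of them.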

>On a different note, can anyone explain to me why running a check or (god forbid) rebuild on a RAID array takes like a dozen times longer than reading/writing the contents of all the disks?

This is not my experience. If you post specifics to linux-raid@vger.kernel.org you will probably get a useful reply.

The only explanation for what you describe that immediately occurs to me is the fact that the check/repair code deliberately slows down when any other IO is active, so as not to inconvenience that IO, but you probably know that already.
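
A simplified Python model of that back-off rule follows. It is loosely based on how Linux md uses its speed_limit_min and speed_limit_max tunables (in KB/s), but the function name is hypothetical and the default values are assumptions for illustration, not quoted from the kernel source.

```python
# Assumed defaults, cf. /proc/sys/dev/raid/speed_limit_min and speed_limit_max (KB/s).
SPEED_LIMIT_MIN_KB = 1000
SPEED_LIMIT_MAX_KB = 200000


def should_pause_resync(current_kb_per_s: float, array_idle: bool,
                        speed_min: float = SPEED_LIMIT_MIN_KB,
                        speed_max: float = SPEED_LIMIT_MAX_KB) -> bool:
    """Decide whether a background check/resync should back off briefly."""
    if current_kb_per_s <= speed_min:
        # At or below the guaranteed floor: keep going even if other IO is active.
        return False
    if current_kb_per_s > speed_max:
        # Above the ceiling: always throttle.
        return True
    # Between the limits: yield only when something else is using the array.
    return not array_idle


print(should_pause_resync(50000, array_idle=False))
```

Under this model a busy array keeps the check near the floor, which is how a check can take many times longer than a raw read of every disk. On Linux md, raising /proc/sys/dev/raid/speed_limit_min is the usual way to trade foreground latency for a faster check.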



Journal-guided RAID resync

Posted Dec 2, 2009 14:03 UTC (Wed) by nye (guest, #51576)

>I don't see how out-of-order writes complicate the question ... maybe I'm missing something. A key question that must be understood is "what data is valid", and in the case of multiple parallel writes pending when a crash happens, there are lots of combinations that are all equally valid.

I think I may have misunderstood the nature of the problem in question, but IIRC I was thinking about the case where one version is 'correct', in the sense that it is an old but consistent version of the data, while the other version has been partially updated and ended up in an invalid state. This might happen, for example, if a metadata change is written before the corresponding data change.
I suppose that from the RAID layer's point of view, though, that data is valid, in that it's an accurate representation of what the filesystem asked to be on the disk at the given moment, so I see your point.

>This is not my experience. If you post specifics to linux-raid@vger.kernel.org you will probably get a useful reply.

Well, the specific incident that prompted the question involved a hardware RAID controller (my experiences with Linux software RAID have, thus far, been unproblematic). Knowing next to nothing about how RAID is implemented, I wondered whether the procedure is expected to involve, say, an O(n^2) number of reads or writes, or something else that would make this slowness generally expected. Obviously not :).
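
For a sense of why no O(n^2) step is needed, here is a simplified single-parity (RAID-5 style) consistency check in Python. Each stripe is read exactly once and its data blocks are XORed and compared with the parity block, so the work grows linearly with the size of the array. The function names are hypothetical, and the model keeps parity in a fixed last position, whereas real RAID-5 layouts rotate it across disks; how a given hardware controller actually schedules its check is not covered here.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_raid5(stripes: list[list[bytes]]) -> tuple[list[int], int]:
    """Return (indices of inconsistent stripes, total number of block reads)."""
    mismatches = []
    reads = 0
    for index, stripe in enumerate(stripes):
        *data, parity = stripe  # simplified layout: parity is always the last block
        reads += len(stripe)    # every block in the stripe is read once
        if xor_blocks(data) != parity:
            mismatches.append(index)
    return mismatches, reads


stripes = [
    [b"\x01\x02", b"\x03\x04", b"\x02\x06"],  # 01^03=02, 02^04=06: consistent
    [b"\xff\x00", b"\x0f\x0f", b"\x00\x00"],  # expected parity f0 0f: mismatch
]
print(check_raid5(stripes))  # ([1], 6)
```

A rebuild is similarly linear: each missing block is recomputed as the XOR of the surviving blocks in its stripe.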

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds