
RAID 5/6 code merged into Btrfs

Posted Feb 4, 2013 19:10 UTC (Mon) by Jonno (subscriber, #49613)
In reply to: RAID 5/6 code merged into Btrfs by butlerm
Parent article: RAID 5/6 code merged into Btrfs

Due to the btrfs design, a write-intent bitmap isn't necessary. Checksums make it possible to figure out which drive is at fault without one; you just have to run a scrub after a crash.

Additionally, btrfs already keeps track of the last five transactions it committed, so it should be possible to automatically scrub just those, but I don't know if that is planned, or if the devs have something even smarter in mind.
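
As a rough illustration of the checksum-driven scrub described above, here is a minimal Python sketch for one single-parity stripe. It is not btrfs code: `scrub_stripe` and `xor_blocks` are hypothetical helpers, `zlib.crc32` stands in for whatever checksum the filesystem actually stores, and the checksums are assumed to live somewhere trustworthy outside the stripe.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def scrub_stripe(data, parity, checksums):
    """Check one single-parity stripe and repair what can be repaired.

    data      -- data blocks as read from each drive
    parity    -- parity block as read from disk
    checksums -- trusted per-block checksums (assumed stored in metadata)
    Returns (data, parity, report).
    """
    data = list(data)
    # The checksums tell us *which* block is wrong, not just that one is.
    bad = [i for i, blk in enumerate(data) if zlib.crc32(blk) != checksums[i]]
    if len(bad) > 1:
        raise ValueError("more than one bad block; single parity cannot repair")
    if bad:
        i = bad[0]
        others = [blk for j, blk in enumerate(data) if j != i]
        candidate = xor_blocks(others + [parity])
        # Verify the rebuilt block; stale parity would fail this check.
        if zlib.crc32(candidate) != checksums[i]:
            raise ValueError("parity is stale too; block cannot be rebuilt")
        data[i] = candidate
        return data, parity, f"data block {i} repaired"
    # All data blocks verified, so parity can be recomputed from them.
    good_parity = xor_blocks(data)
    if good_parity != parity:
        return data, good_parity, "parity repaired"
    return data, parity, "clean"
```

Note that the sketch only works while every drive is still readable; it says nothing about the case where a drive is lost before the scrub runs.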



RAID 5/6 code merged into Btrfs

Posted Feb 4, 2013 19:59 UTC (Mon) by masoncl (subscriber, #47138)

It's true that crcs allow us to figure out if the data on the drives is correct. But, if you crash while updating the parity and you lose one of the drives (not unusual in a power failure), you need to be able to rebuild the data from parity.

If the parity isn't consistent with the rest of the stripe, the rebuild isn't possible.

-chris
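
The failure case above can be shown with a few lines of Python. This is a toy model with made-up four-byte blocks and a hypothetical `xor_blocks` helper, not anything from the btrfs code: one data block is updated, the crash happens before the parity write lands, and then a different drive is lost.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# A three-data-drive stripe with consistent parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
old_parity = xor_blocks([d0, d1, d2])

# Block d0 is rewritten, but the crash hits before parity is updated.
d0_new = b"ZZZZ"

# The drive holding d1 (which was never being written) is now lost.
# Rebuilding it from the surviving blocks and the stale parity gives garbage.
rebuilt_stale = xor_blocks([d0_new, d2, old_parity])

# Had the parity write completed, the same rebuild would have worked.
new_parity = xor_blocks([d0_new, d1, d2])
rebuilt_ok = xor_blocks([d0_new, d2, new_parity])

print("rebuild with stale parity matches d1:", rebuilt_stale == d1)
print("rebuild with fresh parity matches d1:", rebuilt_ok == d1)
```

A checksum would reveal that `rebuilt_stale` is wrong, but it cannot supply the correct contents.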

RAID 5/6 code merged into Btrfs

Posted Feb 6, 2013 15:27 UTC (Wed) by Jonno (subscriber, #49613)

> If the parity isn't consistent with the rest of the stripe, the rebuild isn't possible.
True, but a write-intent bitmap wouldn't help with that either. All it does is tell you which drives, if any, are out of date and need to be rebuilt, and that information won't help if you have lost a drive (or two for RAID 6) and can't rebuild anything.

RAID 5/6 code merged into Btrfs

Posted Feb 6, 2013 18:26 UTC (Wed) by butlerm (subscriber, #13312)

The purpose of a write intent bitmap is not to recover a failed drive; it is to recover from a lost write. In the event of a power failure or system crash, one or more of the writes may be lost (or partially completed), leaving the stripe parity in an inconsistent state.

Correct parity (sufficient to recover from a subsequent drive failure) can be trivially regenerated from the data blocks of the stripes flagged in the write intent bitmap. The data on the blocks actually being written to may still be incomplete, of course, but that doesn't matter for the purpose of protecting the data on the other blocks in the same stripe.

If a drive fails and the system crashes at the same time a stripe update is in progress, it is of course entirely possible that unrelated parts of the stripe being updated become unrecoverable for lack of consistent parity information. You can see the attraction of the ZFS full stripe minimum block size policy.
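
The parity regeneration described above can be sketched in Python as follows. This is a deliberately simplified model, not how md or any real array implements its bitmap: `ToyArray`, its methods, and the `crash_before_parity` flag are all hypothetical, the "bitmap" is a plain set of stripe numbers, and a real implementation would persist the bit before touching the stripe and would typically track larger regions than single stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class ToyArray:
    """Single-parity array: stripes[i] holds data blocks, parity[i] their XOR."""

    def __init__(self, stripes):
        self.stripes = [list(s) for s in stripes]
        self.parity = [xor_blocks(s) for s in self.stripes]
        self.dirty = set()  # stand-in for the write intent bitmap

    def write(self, stripe, index, block, crash_before_parity=False):
        self.dirty.add(stripe)               # 1. record the intent first
        self.stripes[stripe][index] = block  # 2. write the data block
        if crash_before_parity:
            return                           # simulated crash: bit stays set
        self.parity[stripe] = xor_blocks(self.stripes[stripe])  # 3. parity
        self.dirty.discard(stripe)           # 4. clear the bit

    def resync(self):
        """After a crash, recompute parity only for the flagged stripes."""
        fixed = sorted(self.dirty)
        for s in fixed:
            self.parity[s] = xor_blocks(self.stripes[s])
        self.dirty.clear()
        return fixed
```

For example, after `arr.write(1, 0, b"ZZZZ", crash_before_parity=True)`, calling `arr.resync()` touches only stripe 1 and leaves it able to survive a later single-drive failure. If a drive is lost before `resync()` runs, the data blocks needed to recompute parity are no longer all available, which is the case the last paragraph describes.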

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds