An md/raid6 data corruption bug

Posted Aug 24, 2014 19:42 UTC (Sun) by Wol (subscriber, #4433)
In reply to: An md/raid6 data corruption bug by dlang
Parent article: An md/raid6 data corruption bug

:-)

I'm an underemployed ex-programmer struggling to make a living while caring for my wife :-)

I've just upgraded my PC to two mirrored 3TB drives. Next step: add another drive and make it raid-5. If prices have fallen enough, I'll then make it raid-6. So yes, I was talking about the smallest number of drives for each raid setup.

Adding further drives to raid-5 or -6 reduces the proportion of space allocated to safety, presumably increasing the risk, but I guess that's moderately negligible. It might even reduce the risk, by reducing the load on each disk. Like many things, I find my understanding increases with discussion - I might *know* the facts, but I don't always *understand* them - and this discussion has certainly deepened my understanding of raid.

Cheers,
Wol



An md/raid6 data corruption bug

Posted Aug 24, 2014 20:55 UTC (Sun) by dlang (guest, #313)

growing to larger arrays does mean the chance of any single drive failing in a given time period is higher, but when you need multiple drive failures, the chances are still _really_ low, and it scales linearly with the number of drives in the array.

showing my math (in case I have it wrong :-)

If drives have a 12% chance of dying each year (a rough figure from the big studies several years back), that's 1% a month, or ~0.2% per week per drive

if you have two drives, the chance of one of them failing each week is .4% (probabilities added), but the chance of them both failing in the same week is 0.0004% (0.2%*0.2%)

if you have three drives in RAID5, the chance of one of them failing each week is 0.6% (3*0.2%), while the chance of two of them failing in the same week is 0.0024% ((3*0.2%)*(2*0.2%)), higher chances of loss, but twice the storage

if you have four drives in RAID6, the chance of one of them failing each week is 0.8% (4*0.2%), the chance of two of them failing in the same week is 0.0048% ((4*0.2%)*(3*0.2%)), while the chance of three failing in the same week is 0.0000002% ((4*0.2%)*(3*0.2%)*(2*0.2%))

if you have 10 drives in RAID6, the chance of one of them failing each week is 2% (10*0.2%), the chance of two of them failing in the same week is 0.036% ((10*0.2%)*(9*0.2%)), while the chance of three failing in the same week is 0.0006% ((10*0.2%)*(9*0.2%)*(8*0.2%))
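These per-week figures all come from the same product-of-terms shortcut, so a small sketch of the arithmetic can reproduce them (Python is my own choice here, using the same 0.2%-per-week figure):

```python
# Approximate chance of k near-simultaneous failures among n drives,
# using the product-of-terms shortcut from the comment above:
# (n*p) * ((n-1)*p) * ... for k factors.
def fail_chance(n, k, p_week=0.002):
    chance = 1.0
    for i in range(k):
        chance *= (n - i) * p_week
    return chance

print(f"{fail_chance(3, 2):.4%}")   # 3-drive RAID5, two failures: 0.0024%
print(f"{fail_chance(10, 2):.4%}")  # 10-drive RAID6, two failures: 0.0360%
print(f"{fail_chance(10, 3):.4%}")  # 10-drive RAID6, three failures: 0.0006%
```

This is an approximation, not an exact binomial calculation, but at these small per-week probabilities the difference is negligible.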

now, if it takes you less than a week to recover (or the drives are more reliable), these numbers get better fast.

say that instead of a 12% annual failure and one week rebuild time, you have a 3.5% annual failure and 1 day rebuild time. this translates to a ~0.01% chance of failure per drive within the rebuild time.

with this

RAID1 single disk 0.02%, two disk 0.000001% (1e-6)
3 disk RAID5 single disk 0.03%, two disk 0.000006% (6e-6)
4 disk RAID6 single disk 0.04%, two disk 0.000012% (1.2e-5), three disk 0.0000000024% (2.4e-9)
10 disk RAID6 single disk 0.1%, two disk 0.00009% (9e-5), three disk 0.00000007% (7.2e-8)
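The same shortcut generates this table; a quick sketch using the rounded ~0.01% per-rebuild-window figure (again Python, my own choice):

```python
# Per-drive failure chance within a 1-day rebuild window: 3.5%/year,
# rounded to ~0.01% as in the comment above.
p = 0.0001

def window_chance(n, k):
    # Same product-of-terms approximation as the weekly numbers.
    chance = 1.0
    for i in range(k):
        chance *= (n - i) * p
    return chance

# 10-disk RAID6 row (as fractions; multiply by 100 for the % figures above):
print(f"{window_chance(10, 1):.1e}")  # 1.0e-03 -> 0.1%
print(f"{window_chance(10, 2):.1e}")  # 9.0e-07 -> 0.00009%
print(f"{window_chance(10, 3):.1e}")  # 7.2e-10 -> 0.00000007%
```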

meanwhile the cost of the redundancy is fixed, so it becomes much cheaper (as a percentage) as the array grows.

Then there is the performance question. If you are doing largely sequential I/O (backups, large media files), the performance hit is fairly small. (In theory it can be reduced to basically nothing, but that would require the OS to know how to do raid-stripe-aligned writes and the raid subsystem to notice them, neither of which is available in the kernel today.) But if you are doing a lot of small, random I/O (databases), the performance hit can be very large due to the read-modify-write cycle needed to keep the parity up to date.

I have hopes that as flash drives and shingled rotating drives become more popular, the kernel will learn that it can save a lot of time by writing an entire eraseblock/RAID stripe/shingle stripe at once and start to prefer doing so.

An md/raid6 data corruption bug

Posted Aug 25, 2014 20:39 UTC (Mon) by mathstuf (subscriber, #69389)

> If drives have a 12% chance of dying each year (a rough figure from the big studies several years back), that's 1% a month, or ~0.2% per week per drive

Is that 12% for each drive, or 12% of drives expected to die (on second thought… is there a difference?)? If the latter, did you get 1% per month from 12% / 12, or from (1 - (1 - .12) ^ (1 / 12)) == 1.06%? The latter seems more accurate, but that's mainly my gut feeling here. As an example, a 50% annual failure rate becomes 5.6% per month instead of 4.17%, because you expect ~94.4% to survive each month until you're left with 50% still around. Then again, drives are independent (…ish), so maybe straight division is better there. Anyway, leaving it here for a second thought on it.
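For what it's worth, the two conversions being compared here can be checked in a couple of lines (Python is my own choice):

```python
# Two ways to turn a 12% annual failure rate into a monthly one:
annual = 0.12
linear = annual / 12                      # straight division
compound = 1 - (1 - annual) ** (1 / 12)   # survival-based compounding
print(f"{linear:.2%} vs {compound:.2%}")  # prints "1.00% vs 1.06%"

# At higher rates the gap widens, as in the 50% example:
print(f"{1 - 0.5 ** (1 / 12):.2%}")       # prints "5.61%"
```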

An md/raid6 data corruption bug

Posted Aug 25, 2014 20:45 UTC (Mon) by dlang (guest, #313)

and this is why I show the math :-)

It was intended to be 12% of drives die in any given year, so 1% of drives die in any given month, or .2% of drives die in any given week (assuming that the failures are really independent, not a latent manufacturing defect that will affect all drives of a class)

An md/raid6 data corruption bug

Posted Feb 13, 2015 16:46 UTC (Fri) by ragaar (guest, #101043)

Your decimal places seem to be slightly off in the RAID6 scenario [3 drive failure]

fail_week = 0.002
# n = number of drives

def fail_each_week(n):
    return n * fail_week
# NOTE: you can multiply a failure probability by 100 to see % failure

y = 1.0
for x in (4, 3, 2):
    y *= fail_each_week(x)

# y = (4*0.002) * (3*0.002) * (2*0.002) = 1.92e-07 ≈ 2e-07
# y * 100 ≈ 2e-05%

We're amongst friends, so rounding to 2 is fine, but [as of Feb 2015] there seem to be a couple of extra decimal places in your post.
RAID6 scenario [3 drive failure] shows 2e-07% as opposed to 2e-05%.

Side comment:
Thanks for consolidating this information. This was the only post that I've found combining HDD failure rates on a yearly/monthly/weekly interval, laid out the basic statistical math, and provided description as to the intent applied during each step.

It is well thought out posts like this that help make the internet better. You get a +1 (gold star) vote from me :)


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds