An md/raid6 data corruption bug
Posted Aug 24, 2014 20:55 UTC (Sun) by dlang (guest, #313)
In reply to: An md/raid6 data corruption bug by Wol
Parent article: An md/raid6 data corruption bug
showing my math (in case I have it wrong :-)
If drives have a 12% chance of dying each year (a rough figure from the big studies of several years back), that's 1% a month, or ~0.2% per week per drive
if you have two drives, the chance of one of them failing each week is roughly 0.4% (the probabilities add, to a first approximation), but the chance of both of them failing in the same week is 0.0004% (0.2%*0.2%)
if you have three drives in RAID5, the chance of one of them failing each week is 0.6% (3*0.2%), while the chance of two of them failing the same week is 0.0024% ((3*0.2%)*(2*0.2%)), higher chances of loss, but twice the storage
if you have four drives in RAID6, the chance of one of them failing each week is 0.8% (4*0.2%), the chance of two of them failing in the same week is 0.0048% ((4*0.2%)*(3*0.2%)), while the chance of three failing in the same week is about 0.0000192% ((4*0.2%)*(3*0.2%)*(2*0.2%))
if you have 10 drives in RAID6, the chance of one of them failing each week is 2% (10*0.2%), the chance of two of them failing in the same week is 0.036% ((10*0.2%)*(9*0.2%)), while the chance of three failing in the same week is 0.0006% ((10*0.2%)*(9*0.2%)*(8*0.2%))
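The estimates above can be sketched as a short Python helper (a rough approximation, assuming independent failures and using the same product as the figures above, which ignores ordering and so slightly overstates the true binomial probability):

```python
def approx_failures(n, p, k):
    """Rough chance that k of n drives fail within one rebuild window,
    given per-drive failure probability p in that window. Mirrors the
    back-of-the-envelope product used above: n*p * (n-1)*p * ..."""
    prob = 1.0
    for i in range(k):
        prob *= (n - i) * p
    return prob

p = 0.002  # ~0.2% per drive per week (12% per year)
print(f"{approx_failures(10, p, 2):.4%}")  # two of ten drives -> 0.0360%
print(f"{approx_failures(10, p, 3):.6%}")  # three of ten drives -> 0.000576%
```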
now, if it takes you less than a week to recover (or the drives are more reliable), these numbers get better fast.
say that instead of a 12% annual failure rate and a one-week rebuild time, you have a 3.5% annual failure rate and a one-day rebuild time. That translates to a ~0.01% chance of failure per drive within the rebuild window.
with this:
- 2-disk RAID1: single disk 0.02%, two disks 0.000001% (1e-6)
- 3-disk RAID5: single disk 0.03%, two disks 0.000006% (6e-6)
- 4-disk RAID6: single disk 0.04%, two disks 0.000012% (1.2e-5), three disks 0.0000000024% (2.4e-9)
- 10-disk RAID6: single disk 0.1%, two disks 0.00009% (9e-5), three disks 0.000000072% (7.2e-8)
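Plugging the second scenario's numbers into the same rough product reproduces the table above (a sketch; p is rounded to 0.01% per drive per day, as in the text):

```python
def window_failures(n, p, k):
    """Same rough estimate as before: chance that k of n drives fail
    within one rebuild window (product n*p * (n-1)*p * ...)."""
    prob = 1.0
    for i in range(k):
        prob *= (n - i) * p
    return prob

p = 0.0001  # ~0.01% per drive per day (3.5% per year, one-day rebuild)
for name, n in [("3-disk RAID5", 3), ("4-disk RAID6", 4), ("10-disk RAID6", 10)]:
    two = window_failures(n, p, 2)
    three = window_failures(n, p, 3)
    print(f"{name}: two-disk {two:.2e}, three-disk {three:.2e}")
```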
meanwhile the cost of the redundancy is fixed, so it becomes much cheaper (as a percentage) as the array grows.
Then there is the performance question. If you are doing largely sequential I/O (backups, large media files), the performance hit is fairly small; in theory it could be reduced to basically nothing, but that would require the OS to issue RAID-stripe-aligned writes and the RAID subsystem to notice them, neither of which is available in the kernel today. If you are doing a lot of small, random I/O (databases), the performance hit can be very large because of the read-modify-write cycle needed to keep the parity up to date.
I have hopes that, as flash drives and shingled rotating drives become more popular, the kernel will learn that it can save a lot of time by writing an entire erase block/RAID stripe/shingle band at once, and will start to prefer doing so.
