An md/raid6 data corruption bug
Posted Aug 19, 2014 23:02 UTC (Tue) by Richard_J_Neill (subscriber, #23093)
Parent article: An md/raid6 data corruption bug
I suffered from exactly this. Indeed, with modern huge hard drives, there's about a 10% chance of a (detected, unrecoverable) read error occurring for a single bit when copying the entire drive. Chances are that this bit error is unimportant (statistically, it's most likely to create a pixel error somewhere in a video file), but it's catastrophic when the entire RAID array refuses to be rebuilt.
As far as I can see, Linux RAID5 has no support for "I am mildly annoyed about a 1-bit error, but I do still want to keep the other 3.999999999999 TB of my data".
RAID5 rests on the (now completely wrong) assumption that a complete start-to-end read of a healthy drive should never ever experience uncorrected errors. This is the drive manufacturers' fault because the error-correction algorithm hasn't been strengthened in line with increasing disk sizes - but it means RAID5 is now rather dangerous.
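As a rough sanity check on the "about 10%" figure above, here is a minimal sketch of the arithmetic. It assumes bit errors are independent and uses two unrecoverable-read-error rates commonly quoted on drive datasheets (1 per 10^14 bits and 1 per 10^15 bits). Those rates, the 4 TB capacity, and the function name are assumptions for illustration, not figures from any particular drive.

```python
import math


def p_at_least_one_ure(capacity_bytes: float, ure_per_bit: float) -> float:
    """Probability of at least one unrecoverable read error in a full read.

    Assumes each bit fails independently with probability ure_per_bit,
    which is a simplification: real errors tend to cluster by sector.
    """
    bits = capacity_bytes * 8
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    capacity = 4e12  # 4 TB drive (assumed, decimal terabytes)
    for rate in (1e-14, 1e-15):  # assumed datasheet-style URE rates
        p = p_at_least_one_ure(capacity, rate)
        print(f"URE rate {rate:g}/bit: P(at least one error) = {p:.1%}")
```

Under these assumptions a full read of a 4 TB drive fails with probability of roughly 27% at the 10^-14 rate and roughly 3% at the 10^-15 rate, so a figure around 10% is in the plausible range.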
