Posted Jun 20, 2009 21:29 UTC (Sat) by anton (guest, #25547)
With magnetic disk drives it's not uncommon that a read
error (marking a disk as faulty) will go away after a write (due to
sector reallocation).
That's the theory, and it's quite plausible, if there are spare blocks
on the disk, but I have seen several drives (from different
manufacturers) with read errors that were also write errors, and none
where the error went away by writing. And it's not that these drives
had run out of spare blocks or something; the errors apparently were
caused by the head running amok in unusual power supply conditions.
RAID rebuild
Posted Jun 22, 2009 7:05 UTC (Mon) by neilbrown (subscriber, #359)
It sounds to me like you need to discover write-intent bitmaps.
Such a bitmap is effectively a set of 'dirty' bits, one for each
chunk of the array (and you can choose the chunk size).
So if you set the chunk size to 50GB (I would probably set it a bit
smaller) you get the same functionality as you describe, only with
much less hassle.
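
To make the idea concrete, here is a toy model of such a bitmap in Python. It is only a sketch and not how md implements it: the class name, the method names, and the 500GB array size are all invented for illustration. It shows one dirty bit per chunk and how a write maps to the chunk or chunks it touches.

```python
class WriteIntentBitmap:
    """Toy model (hypothetical, not md's code): one dirty bit per chunk."""

    def __init__(self, array_size, chunk_size):
        self.chunk_size = chunk_size
        # Round up so a partial final chunk still gets its own bit.
        self.nchunks = (array_size + chunk_size - 1) // chunk_size
        self.dirty = [False] * self.nchunks

    def mark_write(self, offset, length):
        """Set the dirty bit of every chunk the write touches."""
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        for chunk in range(first, last + 1):
            self.dirty[chunk] = True

    def clear(self, chunk):
        """Clear a bit once all mirrors are known to agree on that chunk."""
        self.dirty[chunk] = False

    def chunks_to_resync(self):
        """Only these chunks need copying after a drive is re-added."""
        return [c for c, is_dirty in enumerate(self.dirty) if is_dirty]


GB = 10**9
bitmap = WriteIntentBitmap(array_size=500 * GB, chunk_size=50 * GB)
bitmap.mark_write(offset=120 * GB, length=4096)   # lands in chunk 2
bitmap.mark_write(offset=200 * GB - 1, length=2)  # straddles chunks 3 and 4
print(bitmap.chunks_to_resync())
```

With a 50GB chunk size, a resync after those two writes would copy three chunks rather than the whole array, which is the trade-off behind choosing the chunk size.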
So just create a raid1 or - if you have more than 2 drives - raid10,
and
Posted Jun 23, 2009 9:51 UTC (Tue) by rbuchmann (subscriber, #52862)
What happens if a drive is marked faulty during a read? As I understand it, the write-intent bit is not set in that case, so the broken chunk would not be rewritten?
RAID rebuild
Posted Jun 23, 2009 11:08 UTC (Tue) by neilbrown (subscriber, #359)
A drive is not marked faulty due to a read error (unless the array is degraded ... and even then it probably shouldn't be.... I should fix that).
If md gets a read error when the array is not degraded, it generates the data from elsewhere and tries to write it out. If the write fails, then the drive is marked faulty.
It has not always been that way, but it has for a couple of years.
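
The read-error path described above can be sketched roughly as follows. This is a simplified Python model, not md's code: the `Disk` class, the `mirror_read` function, and the assumption that a successful rewrite makes the block readable again are all hypothetical, and it assumes a second healthy mirror always has the data.

```python
class Disk:
    """In-memory stand-in for a mirror member (hypothetical)."""

    def __init__(self, blocks, bad_reads=(), bad_writes=()):
        self.blocks = list(blocks)
        self.bad_reads = set(bad_reads)    # blocks that fail on read
        self.bad_writes = set(bad_writes)  # blocks that fail on write
        self.faulty = False

    def read(self, n):
        if n in self.bad_reads:
            raise OSError(f"read error at block {n}")
        return self.blocks[n]

    def write(self, n, data):
        if n in self.bad_writes:
            raise OSError(f"write error at block {n}")
        self.blocks[n] = data
        # Assumption: a successful rewrite makes the block readable again.
        self.bad_reads.discard(n)


def mirror_read(primary, others, n):
    """Read block n; on a read error, recover it and try to rewrite it."""
    try:
        return primary.read(n)
    except OSError:
        pass
    # The read error alone does not mark the drive faulty.
    # Generate the data from elsewhere (here: another healthy mirror).
    data = next(d for d in others if not d.faulty).read(n)
    try:
        primary.write(n, data)
    except OSError:
        # Only a failed write gets the drive marked faulty.
        primary.faulty = True
    return data


good = Disk(["x", "y"])
flaky = Disk(["x", "?"], bad_reads={1})
dead = Disk(["x", "?"], bad_reads={1}, bad_writes={1})
print(mirror_read(flaky, [good], 1), flaky.faulty)
print(mirror_read(dead, [good], 1), dead.faulty)
```

In this model the first drive stays in the array with the block repaired, while the second is marked faulty because the rewrite failed as well.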
Now that I think about it, there is probably still room for improvement. If a drive is kicked due to a write error, and it was a cable error, it would be nice if we could re-add the device and have it recover based on the bitmap. I'll add that to my list....
(sorry, I didn't read the first part of your comment properly before - I only read the second half and was responding to that. I should learn to read better ;-)