
RAID rebuild

Posted Jun 18, 2009 7:30 UTC (Thu) by rbuchmann (subscriber, #52862)
Parent article: What ever happened to chunkfs?

With magnetic disk drives it's not uncommon that a read error (marking a disk as faulty) will go away after a write (due to sector reallocation).

So RAID rebuilds happen from time to time.

A similar solution to chunkfs for fast RAID rebuild is this:

- take two or more disk drives
- partition them in smaller chunks (say 50GB or less)
- build RAID(1+) across the chunks of different drives

This will make RAID rebuilds necessary only for the "damaged" chunks. And it already helped me a few times.
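The manual chunking scheme above can be sketched with `parted` and `mdadm`. This is a hedged illustration, not a recommendation from the comment itself: the device names `/dev/sda` and `/dev/sdb`, the GPT label, and the four-chunk layout are all assumptions for the example.

```shell
# Hypothetical blank drives /dev/sda and /dev/sdb (names are assumptions).
# Carve each drive into 50 GB partitions, then mirror matching partitions.
parted -s /dev/sda mklabel gpt
parted -s /dev/sdb mklabel gpt
for i in 0 1 2 3; do
    start=$((i * 50))
    end=$(((i + 1) * 50))
    parted -s /dev/sda mkpart "chunk$i" "${start}GB" "${end}GB"
    parted -s /dev/sdb mkpart "chunk$i" "${start}GB" "${end}GB"
done

# One RAID1 array per partition pair; a bad sector then forces a
# rebuild of only the affected 50 GB chunk, not the whole drive.
for i in 0 1 2 3; do
    n=$((i + 1))
    mdadm --create "/dev/md$i" --level=1 --raid-devices=2 \
        "/dev/sda$n" "/dev/sdb$n"
done
```

The arrays can then be concatenated (e.g. with LVM) into one usable volume.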


RAID rebuild

Posted Jun 20, 2009 21:29 UTC (Sat) by anton (subscriber, #25547) [Link]

> With magnetic disk drives it's not uncommon that a read error (marking a disk as faulty) will go away after a write (due to sector reallocation).

That's the theory, and it's quite plausible if there are spare blocks on the disk. But I have seen several drives (from different manufacturers) with read errors that were also write errors, and none where the error went away after a write. And it's not that these drives had run out of spare blocks or anything like that; the errors apparently were caused by the head running amok under unusual power-supply conditions.

RAID rebuild

Posted Jun 22, 2009 7:05 UTC (Mon) by neilbrown (subscriber, #359) [Link]

It sounds to me like you need to discover write-intent bitmaps.

Such a bitmap is effectively a set of 'dirty' bits, one for each chunk of the array (and you can choose the chunk size).

So if you set the chunk size to 50GB (I would probably set it a bit smaller) you get the same functionality as you describe, only with much less hassle.

So just create a raid1 or - if you have more than 2 drives - raid10, and

  mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=1000000
and you will be much happier.
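A sketch of putting this into practice, assuming an existing RAID1 array at `/dev/md0` with a member `/dev/sda1` (both names hypothetical). The `--bitmap-chunk` value is interpreted in KiB by default, so 1000000 corresponds to roughly 1 GiB of array space per bitmap bit:

```shell
# Add an internal write-intent bitmap to a running array.
mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=1000000

# Confirm the bitmap is active; mdstat shows a "bitmap:" line
# with the number of dirty chunks.
cat /proc/mdstat

# Inspect the on-disk bitmap stored on a member device.
mdadm --examine-bitmap /dev/sda1
```

After a crash or a transient device failure, only the chunks whose bits are dirty need to be resynced, which is what makes rebuilds fast.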

RAID rebuild

Posted Jun 23, 2009 9:51 UTC (Tue) by rbuchmann (subscriber, #52862) [Link]

What happens if a drive is marked faulty during a read? To my understanding the write-intent bit is not set in that case, so the broken chunk would not be rewritten?

RAID rebuild

Posted Jun 23, 2009 11:08 UTC (Tue) by neilbrown (subscriber, #359) [Link]

A drive is not marked faulty due to a read error (unless the array is degraded ... and even then it probably shouldn't be.... I should fix that).

If md gets a read error when the array is not degraded, it generates the data from elsewhere and tries to write it out. If the write fails, then the drive is marked faulty.

It has not always been that way, but it has for a couple of years.

Now that I think about it, there is probably still room for improvement. If a device is kicked due to a write error, and it was really a cable error, it would be nice if we could re-add the device and have it recover based on the bitmap. I'll add that to my list....
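For a device that was cleanly failed and removed (rather than kicked by a write error), bitmap-based re-adding already works along these lines. A hedged sketch, again assuming the hypothetical names `/dev/md0` and `/dev/sdb1`:

```shell
# Fail and remove a member (e.g. to reseat a suspect cable).
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1

# Re-add it: with a write-intent bitmap present, md resyncs only
# the chunks written while the device was out, not the whole array.
mdadm /dev/md0 --re-add /dev/sdb1

# Watch the (short) recovery.
cat /proc/mdstat
```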

(sorry, I didn't read the first part of your comment properly before - I only read the second half and was responding to that. I should learn to read better ;-)

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds