RAID 5 is completely undependable. Anyone who uses RAID 5 and expects reliability is ignorant or incompetent. It also has terrible performance.
RAID 6 is slightly less bad. If you want to avoid problems with crashes and outages, you should have multiple hot standbys. If you want performance, you should use RAID 10.
Either way you should use a backup as your data loss reduction strategy.
Posted Sep 1, 2009 7:46 UTC (Tue) by job (guest, #670)
I would expect the same problem to affect RAID10, as a double fault can kill them too if you're very unlucky.
Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 8:05 UTC (Tue) by drag (subscriber, #31333)
Well, a single fault can destroy any data if you want to look at it that way... But generally, with one drive gone, either RAID 6 or RAID 10 should still be adequate.
With RAID 5, the amount of time it takes to recover is so long nowadays that the chances of having a double fault are pretty good. It was one thing to have 20GB with 30MB/s performance, but it's quite another to have 1000GB with 50MB/s performance...
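To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It assumes the rebuild is limited only by the sequential throughput of a single drive and that the array carries no other load; the function name `rebuild_hours` is made up for illustration.

```python
def rebuild_hours(capacity_gb: float, throughput_mb_s: float) -> float:
    """Hours to read or write one whole drive at a sustained sequential rate.

    Assumes decimal units (1 GB = 1000 MB) and no competing I/O.
    """
    seconds = capacity_gb * 1000 / throughput_mb_s
    return seconds / 3600


old = rebuild_hours(20, 30)      # the 20GB / 30MB/s case
new = rebuild_hours(1000, 50)    # the 1000GB / 50MB/s case

print(f"20GB at 30MB/s:   {old * 60:.0f} minutes")
print(f"1000GB at 50MB/s: {new:.1f} hours")
print(f"exposure window is about {new / old:.0f}x longer")
```

Under those assumptions the window in which a second fault is fatal grows from minutes to hours; a busy array would take longer still.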
Ext3 and RAID: silent data killers?
Posted Sep 11, 2009 1:18 UTC (Fri) by Pc5Y9sbv (guest, #41328)
I agree you cannot blindly use RAID5 without considering the sizing, but what do you consider an acceptable recovery time?
My cheap MD RAID5 with three 500 GB SATA drives allows me to have 1TB and approximately 100 MB/s per drive throughput, which implies a full scan to re-add a replacement drive might take 2 hours or so (reading all 500 GB from 2 drives and writing 500 GB to the third at 75% of full speed). I have never been in a position where this I/O time was worrisome as far as a double fault hazard. Having a commodity box running degraded for several days until replacement parts are delivered is a more common consumer-level concern, which has not changed with drive sizes.
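The "2 hours or so" figure can be checked with a small sketch. It assumes the reads from the two surviving drives and the write to the replacement overlap, so the resync is bounded by one drive's worth of data; the three-day wait for parts is a hypothetical value standing in for "several days".

```python
def resync_hours(drive_gb: float, full_speed_mb_s: float, efficiency: float) -> float:
    """Hours to rewrite one drive when the resync runs at a fraction of full speed."""
    effective_mb_s = full_speed_mb_s * efficiency
    return drive_gb * 1000 / effective_mb_s / 3600


# 500 GB drive, 100 MB/s per drive, resync at 75% of full speed.
resync = resync_hours(500, 100, 0.75)

# Hypothetical wait for a replacement drive ("several days" in the comment).
wait_days = 3
degraded_hours = wait_days * 24 + resync

print(f"resync: {resync:.2f} h")
print(f"total degraded window: {degraded_hours:.1f} h")
print(f"resync share of window: {resync / degraded_hours:.1%}")
```

With these inputs the resync itself is a small fraction of the time spent degraded, which is the point being made about waiting for parts.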
Ext3 and RAID: silent data killers?
Posted Sep 3, 2009 5:05 UTC (Thu) by k8to (subscriber, #15413)
A double fault can kill RAID 10 also, but you're much less likely to have the fault propagate as discussed in the article, and the downtime for bringing in a standby is much smaller, so standby drives are more effective.
Meanwhile, you also get vastly better performance, and higher reliability of implementation.
It's really a no brainer unless you're poor.
Ext3 and RAID: silent data killers?
Posted Sep 3, 2009 5:26 UTC (Thu) by dlang (✭ supporter ✭, #313)
actually, if you have a read-mostly workload raid 5/6 can end up being as fast as raid 10. I couldn't believe this myself when I first ran into it, but I have a large (multiple TB) database used for archiving log data and discovered that read/search performance was the same with raid 6 as with raid 10.
in digging further I discovered that the key to performance was to have enough queries in flight to keep all disk heads fully occupied (one outstanding query per drive spindle), and you can do this with both raid 6 and raid 10.
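One way to keep that many queries in flight from the client side is a worker pool sized to the spindle count. This is only a sketch: `run_with_queue_depth` and `fake_query` are made-up names, the pool caps how many queries are outstanding but cannot guarantee each one lands on a different spindle, and a real workload would issue database searches rather than squaring numbers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

Q = TypeVar("Q")
R = TypeVar("R")


def run_with_queue_depth(
    queries: Iterable[Q], run_query: Callable[[Q], R], spindles: int
) -> List[R]:
    """Run queries with at most `spindles` of them outstanding at once.

    Results come back in the same order as `queries`.
    """
    with ThreadPoolExecutor(max_workers=spindles) as pool:
        return list(pool.map(run_query, queries))


def fake_query(n: int) -> int:
    """Stand-in for a real, I/O-bound database search."""
    return n * n


results = run_with_queue_depth(range(8), fake_query, spindles=4)
print(results)
```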
Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 8:11 UTC (Tue) by drag (subscriber, #31333)
The way I look at it is like this:
RAID = availability/performance
BACKUPS = data protection.
Any other way of looking at it is pretty much doomed to be flawed.
Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 15:43 UTC (Tue) by Cato (subscriber, #7643)
This is a good way to look at it. Starting with a near-CDP tool such as rsnapshot is a good approach to snapshots, backing up data as frequently as every hour with low overhead, through rsync with multi-version support through hard links between snapshots. Then if the overhead of a scan every hour is too much, or you need very fast recovery from a disk fault, add RAID as well.
Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 16:05 UTC (Tue) by jonabbey (subscriber, #2736)
Thanks for the reference to rsnapshot!
Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 16:47 UTC (Tue) by martinfick (subscriber, #4455)
BACKUPS are poor, version control is the only sane backup. Backups are horrible to recover from. Backups provide no sane automatable mechanism for pruning older data (backups) that doesn't suffer from the same corruption/accidental deletion problem that originals have, but worse, amplified since they don't even have a good mechanism for sanity checking (usage)! Backups tend to backup corrupted data without complaining.
Backups are good for certain limited chores such as backing up your version control system! :) But ONLY if you have a mechanism to verify the sanity of your previous backup and the original before making the next backup. Else, you are back to backing up corrupted data.
A good version control system protects you from corruption and accidental deletion since you can always go to an older version. And the backup system with checksums (often built into VCS) should protect the version control system.
If you don't have space for VCing your data you don't likely really have space for backing it up either, so do not accept this as an excuse to not vcs your data instead of backing it up.
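The "verify before making the next backup" step could look roughly like the following. It is a minimal sketch, not how any particular VCS or backup tool does it: the function names are invented here, and a mismatch alone cannot tell corruption from a legitimate edit, so the check is most meaningful against a previous backup that is supposed to be immutable.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_or_missing(root: Path, manifest: Dict[str, str]) -> List[str]:
    """List manifest entries whose file is gone or no longer matches its digest."""
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

A backup job would store the manifest alongside each backup and refuse to rotate out older copies while `changed_or_missing` reports anything for the most recent one.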
Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 17:44 UTC (Tue) by Cato (subscriber, #7643)
Since I've researched this a lot recently, here are some rsync/librsync based tools that work somewhat like version control systems but are intended for system backups. They qualify as 'near-CDP' since rsync is efficient at scanning for changes.
rsnapshot is pretty good as a 'sort of' version control system for any type of file including binaries. It doesn't do any compression, just rsync plus hard links, but works very well within its design limits. It can back up filesystems including the hard links (use rsync -avH in the config file), and is focused on 'pull' backups i.e. backup server ssh's into the server to be backed up. It's used by some web hosting providers who back up tens of millions of files every few hours, with scans taking a surprisingly short time due to the efficiency of rsync. Generally rsnapshot is best if you have a lot of disk space available, and not much time to run the backups in.
rdiff-backup may be closer to what you are thinking of - unlike rsnapshot it only stores the deltas between versions of a file, and stores permissions etc as metadata (so you don't have to have root on the box being backed up to rsync arbitrary files). It's a bit slower than rsnapshot but a lot of people like it. It does include checksums which is a very attractive feature.
duplicity is somewhat like rsnapshot, but can also do encryption, so it's more suitable for backup to a system you don't control.
There are a lot of these tools around, based on Mike Rubel's original ideas, but these ones seem the most actively discussed.
For a non-rsync backup, dar is excellent but not widely mentioned - it includes per-block encryption and compression, and per-file checksums, and is generally much faster for recovery than tar, where you must read through the whole archive to recover.
rdiff-backup, like VCS tools, will have difficulty with files of 500 MB or more - it's been reported that such files don't get backed up, or are not delta'ed. Very large files that change frequently (databases, VM images, etc) are a problem for all these tools.
Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 17:55 UTC (Tue) by dlang (✭ supporter ✭, #313)
unless your version control stores your data somewhere other than on your computer, it's a poor substitute for a backup.
there are lots of things that can happen to your computer (including your house burning down) that will destroy everything on it.
no matter how much protection you put into your storage system, you still need backups.
Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 18:05 UTC (Tue) by martinfick (subscriber, #4455)
Local backups suffer from the same problem as local version control.
Thus, locality is unrelated to whether you are using backups or version control. Yes, it is better to put it on another computer, or, at least another physical device. But, this is in no way an argument for using backups instead of version control.
Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 18:05 UTC (Tue) by joey (subscriber, #328)
> If you don't have space for VCing your data you don't likely really have
> space for backing it up either, so do not accept this as an excuse to not
> vcs your data instead of backing it up.
I'd agree, but you may not have memory to VCS your data. Git, in particular, scales memory usage badly with large data files.
Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 18:16 UTC (Tue) by martinfick (subscriber, #4455)
If you have disk space, you have memory: it's called swap. Use it appropriately. With ~$60 TB disks, there is no excuse for either not having enough memory or enough space to VC your data.
Ext3 and RAID: silent data killers?
Posted Sep 2, 2009 0:39 UTC (Wed) by drag (subscriber, #31333)
> BACKUPS are poor, version control is the only sane backup.
If you're using version control for backups then that is your backup. Your
sentence does not really make a whole lot of sense.
There is no difference.
My favorite form of backup is to use Git to sync data on geographically
disparate machines. But this is only suitable for text data. If I have to
back up photographs then source code management systems are utter shit.
> Backups are horrible to recover from.
They are only horrible to recover with if the backup was done poorly. If
you (or anybody else) does a shitty job of setting them up then it's your
(or their) fault they are difficult.
Backing up is a concept.
Anyways, it's much more horrible to recover data that has ceased to
exist.
> Backups provide no sane automatable mechanism for pruning older data
> (backups) that doesn't suffer from the same corruption/accidental deletion
> problem that originals have, but worse, amplified since they don't even
> have a good mechanism for sanity checking (usage)! Backups tend to backup
> corrupted data without complaining.
You're doing it wrong.
The best form of backup is to make full backups to multiple places. Ideally they
should be in a different region. You don't go back and prune data or clean
them up. That's WRONG. Incremental backups are only useful to reduce the
amount of dataloss between full backups. A full copy of _EVERYTHING_ is a
requirement. And you save it for as long as that data is valuable. Usually
5 years.
It depends on what you're doing, but an ideal setup would be like this:
* On-site backups every weekend. Full backups. Stored for a few months.
* Incremental backups twice a day, and resets at the weekend with the full
backup.
* Every month 2 full backups are stored for 2-3 years.
* Off-site backups once a month, stored for 5 years.
etc. etc.
That would probably be a good idea for most small/medium businesses.
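The retention side of a schedule like that can be written down as a small table. This is a sketch only: the comment gives ranges ("a few months", "2-3 years"), so the exact day counts below are assumptions, as are the kind names and the `is_expired` helper.

```python
from datetime import date, timedelta

# Retention per backup kind, in days. The exact values are assumptions
# chosen from within the ranges given in the schedule above.
RETENTION_DAYS = {
    "incremental": 7,         # reset by the next weekend full backup
    "weekly_full": 90,        # "a few months"
    "monthly_full": 3 * 365,  # upper end of "2-3 years"
    "offsite_full": 5 * 365,  # "5 years"
}


def is_expired(kind: str, created: date, today: date) -> bool:
    """True once a backup of the given kind has outlived its retention period."""
    return today - created > timedelta(days=RETENTION_DAYS[kind])


print(is_expired("incremental", date(2009, 9, 1), date(2009, 9, 10)))
print(is_expired("offsite_full", date(2009, 9, 1), date(2014, 8, 1)))
```

Expiry here applies to whole backups that have aged out, not to pruning data inside a backup.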
If you're relying on a server or a single datacenter to store your data
reliably then you're a fool. I don't give a shit about how high quality your
server hardware is or file system or anything. A single fire, vandalism,
hardware failure, disaster, sabotage, or any number of things can utterly
destroy _everything_.
Ext3 and RAID: silent data killers?
Posted Sep 3, 2009 7:51 UTC (Thu) by Cato (subscriber, #7643)
On full backups: one of the nice things about rsnapshot and similar rsync-based tools is that every backup is both a full backup and an incremental backup. Full in that previous backups can be deleted without any effect on this backup (thanks to hard links), and incremental in that the data transfer required is proportional to the specific data blocks that have changed (thanks to rsync).
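The hard-link half of that idea can be sketched in a few lines. This is not rsnapshot's implementation: the `snapshot` function is invented for illustration, it handles only regular files and directories, and it compares whole files byte by byte, whereas rsync has its own change detection and only transfers the changed blocks.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy `source` into `dest`, hard-linking files unchanged since `previous`."""
    dest.mkdir(parents=True, exist_ok=True)
    for src in sorted(source.rglob("*")):
        rel = src.relative_to(source)
        target = dest / rel
        if src.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, target)       # unchanged: share the existing data blocks
        else:
            shutil.copy2(src, target)  # new or changed: store a fresh copy
```

Each snapshot directory is a complete tree, so deleting an older one only drops link counts; data shared with newer snapshots stays on disk until the last link goes away.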
Ext3 and RAID: silent data killers?
Posted Sep 3, 2009 5:06 UTC (Thu) by k8to (subscriber, #15413)
Yes agreed, RAID is for availability and performance. RAID 5 doesn't offer performance, and the availability story isn't great either. So don't use it.
Ext3 and RAID: silent data killers?
Posted Sep 4, 2009 10:38 UTC (Fri) by nix (subscriber, #2304)
'Terrible performance' is in the eye of the beholder. So is reliability. Software RAID is constrained by bus bandwidth, so RAID 10 writes may well be slower than RAID 5 if you're bus-limited: and even RAID 5 writes are no slower than writes to a single drive. TBH, 89MB/s writes and 250MB/s reads (which my Areca card can manage with a four-drive RAID 5 array) don't seem too 'terrible' to me.
Furthermore, reliability is fine *if* you can be sure that once RAID parity computations have happened the stripe will always hit the disk, even if there is a power failure. With battery-backed RAID, this is going to be true (modulo RAID controller card failure or a failure of the drive you're writing to). Obviously if the array is sufficiently degraded reliability isn't going to be good anymore, but doesn't everyone know that?
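Why the stripe has to hit the disk as a unit is easier to see with the parity arithmetic written out. The sketch below uses plain XOR parity over one toy stripe; the block contents and names are illustrative, and real arrays use much larger chunks and rotate the parity position across drives.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a four-drive array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Losing any single block is recoverable from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Interrupted write: the new block 0 reaches the disk, the parity update does not.
torn = [b"ZZZZ", data[1], data[2]]
stale_parity = parity

# If drive 1 fails now, rebuilding it from the stale parity gives wrong bytes.
garbage = xor_blocks([torn[0], torn[2], stale_parity])
print("rebuilt block matches original:", garbage == data[1])
```

The block that comes back wrong was never being written, which is what makes this failure silent; a battery-backed cache avoids it by ensuring the data and parity writes both complete.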