Ext3 and RAID: silent data killers?
Posted Sep 1, 2009 8:11 UTC (Tue) by drag (guest, #31333)
In reply to: Ext3 and RAID: silent data killers? by k8to
Parent article: Ext3 and RAID: silent data killers?
RAID = availability/performance
BACKUPS = data protection.
Any other way of looking at it is pretty much doomed to be flawed.
Posted Sep 1, 2009 15:43 UTC (Tue)
by Cato (guest, #7643)
[Link] (1 responses)
Posted Sep 1, 2009 16:05 UTC (Tue)
by jonabbey (guest, #2736)
[Link]
Posted Sep 1, 2009 16:47 UTC (Tue)
by martinfick (subscriber, #4455)
[Link] (7 responses)
Backups are good for certain limited chores such as backing up your version control system! :) But ONLY if you have a mechanism to verify the sanity of your previous backup and the original before making the next backup. Else, you are back to backing up corrupted data.
A good version control system protects you from corruption and accidental deletion since you can always go to an older version. And the backup system with checksums (often built into VCS) should protect the version control system.
If you don't have space for VCing your data, you likely don't really have space for backing it up either, so do not accept this as an excuse to back your data up instead of putting it under version control.
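As a concrete illustration of the "checksums built into the VCS" point: git names every object by its SHA-1 hash, so silent corruption in the object store is detectable after the fact. A minimal sketch (the directory name is made up):

```shell
# Put a data directory under version control.
git init ~/important-data
cd ~/important-data
git add -A
git commit -m "nightly snapshot"

# Later: re-hash every object and compare against its recorded SHA-1.
# A flipped bit under .git/objects is reported as a corrupt object.
git fsck --full
```

Running git fsck before taking the next backup gives you exactly the "verify the sanity of the previous backup and the original" step described above.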
Posted Sep 1, 2009 17:44 UTC (Tue)
by Cato (guest, #7643)
[Link]
rsnapshot is pretty good as a 'sort of' version control system for any type of file, including binaries. It doesn't do any compression, just rsync plus hard links, but works very well within its design limits. It can back up filesystems including their hard links (use rsync -avH in the config file), and is focused on 'pull' backups, i.e. the backup server SSHes into the server to be backed up. It's used by some web hosting providers who back up tens of millions of files every few hours, with scans taking a surprisingly short time thanks to the efficiency of rsync. Generally rsnapshot is best if you have a lot of disk space available and not much time to run the backups in.
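For reference, the setup described boils down to a few lines of rsnapshot.conf — a sketch only, with made-up host and paths (note that rsnapshot requires TAB-separated fields, and older versions spell `retain` as `interval`):

```
snapshot_root	/backup/snapshots/

retain	hourly	6
retain	daily	7
retain	weekly	4

# -avH preserves hard links, as suggested above
rsync_long_args	--delete --numeric-ids -avH

# 'pull' backup: the backup server SSHes into web1
backup	root@web1:/home/	web1/
```

cron then invokes `rsnapshot hourly`, `rsnapshot daily`, and so on; each run hard-links unchanged files against the previous snapshot, so only changed files consume new space.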
rdiff-backup may be closer to what you are thinking of - unlike rsnapshot, it stores only the deltas between versions of a file, and stores permissions etc. as metadata (so you don't need root on the box being backed up to rsync arbitrary files). It's a bit slower than rsnapshot, but a lot of people like it. It does include checksums, which is a very attractive feature.
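In command form, a typical rdiff-backup round trip looks roughly like this (classic 1.x-style syntax; host and paths are hypothetical):

```shell
# Mirror /home to a remote repository; past versions are kept as
# reverse deltas under rdiff-backup-data/ next to a current mirror.
rdiff-backup /home backuphost::/srv/backups/home

# Restore a file as it was three days ago.
rdiff-backup -r 3D backuphost::/srv/backups/home/alice/notes.txt notes.txt

# Check the stored checksums, then expire old increments.
rdiff-backup --verify backuphost::/srv/backups/home
rdiff-backup --remove-older-than 1Y backuphost::/srv/backups/home
```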
duplicity covers similar ground, but adds encryption on top, so it's more suitable for backup to a system you don't control.
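duplicity's command line is similar, with GPG encryption applied before anything leaves the machine — a sketch with an illustrative URL and passphrase handling:

```shell
# Encrypted incremental backup to an untrusted host.
export PASSPHRASE=secret
duplicity /home sftp://backup@remotehost/home-backup

# Restore, and prune chains older than six months.
duplicity restore sftp://backup@remotehost/home-backup /tmp/home-restored
duplicity remove-older-than 6M --force sftp://backup@remotehost/home-backup
```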
There are a lot of these tools around, based on Mike Rubel's original ideas, but these ones seem the most actively discussed.
For a non-rsync backup, dar is excellent but not widely mentioned - it includes per-block encryption and compression, plus per-file checksums, and recovery is generally much faster than with tar, where you must read through the whole archive to recover a file.
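A dar session for that use case might look like the following (archive names, paths, and the passphrase are illustrative; dar appends slice suffixes like .1.dar itself):

```shell
# Full backup of /home, gzip-compressed, AES-encrypted.
dar -c /backup/home_full -R /home -z -K aes:mypassword

# List contents, then pull out one file without reading the whole
# archive (dar keeps a catalogue, unlike tar's sequential format).
dar -l /backup/home_full
dar -x /backup/home_full -R /tmp/restore -g alice/notes.txt
```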
rdiff-backup, like VCS tools, can have difficulty with files of 500 MB or more - it has been reported that such files don't get backed up, or are not delta'ed. Very large files that change frequently (databases, VM images, etc.) are a problem for all these tools.
Posted Sep 1, 2009 17:55 UTC (Tue)
by dlang (guest, #313)
[Link] (1 responses)
there are lots of things that can happen to your computer (including your house burning down) that will destroy everything on it.
no matter how much protection you put into your storage system, you still need backups.
Posted Sep 1, 2009 18:05 UTC (Tue)
by martinfick (subscriber, #4455)
[Link]
Thus, locality is unrelated to whether you are using backups or version control. Yes, it is better to put it on another computer, or at least another physical device. But this is in no way an argument for using backups instead of version control.
Posted Sep 1, 2009 18:05 UTC (Tue)
by joey (guest, #328)
[Link] (1 responses)
> If you don't have space for VCing your data you don't likely really have
> space for backing it up either, so do not accept this as an excuse to not
> vcs your data instead of backing it up.

I'd agree, but you may not have the memory to VCS your data. Git, in particular, scales memory usage badly with large data files.
Posted Sep 1, 2009 18:16 UTC (Tue)
by martinfick (subscriber, #4455)
[Link]
Posted Sep 2, 2009 0:39 UTC (Wed)
by drag (guest, #31333)
[Link] (1 responses)
If you're using version control for backups then that is your backup. Your sentence does not really make a whole lot of sense. There is no difference.

My favorite form of backup is to use Git to sync data on geographically disparate machines. But this is only suitable for text data. If I have to back up photographs then source code management systems are utter shit.

> Backups are horrible to recover from.

They are only horrible to recover with if the backup was done poorly. If you (or anybody else) does a shitty job of setting them up then it's your (or their) fault they are difficult. Backing up is a concept.

Anyway, it's much more horrible to recover data that has ceased to exist.

> Backups provide no sane automatable mechanism for pruning older data
> (backups) that doesn't suffer from the same corruption/accidental deletion
> problem that originals have, but worse, amplified since they don't even
> have a good mechanism for sanity checking (usage)! Backups tend to backup
> corrupted data without complaining.

You're doing it wrong. The best form of backup is full backups to multiple places. Ideally they should be in different regions. You don't go back and prune data or clean it up. That's WRONG. Incremental backups are only useful to reduce the amount of data loss between full backups. A full copy of _EVERYTHING_ is a requirement. And you save it for as long as that data is valuable. Usually 5 years.

It depends on what you're doing, but an ideal setup would be something like this:

* On-site full backups every weekend. Stored for a few months.
* Incremental backups twice a day, reset at the weekend by the full backup.
* Every month, 2 full backups are stored for 2-3 years.
* Off-site backups once a month, stored for 5 years.

etc. etc.

That would probably be a good idea for most small/medium businesses.

If you're relying on a single server or a single datacenter to store your data reliably then you're a fool. I don't give a shit how high quality your server hardware or file system is. A single fire, vandalism, hardware failure, disaster, sabotage, or any number of things can utterly destroy _everything_.

Posted Sep 3, 2009 7:51 UTC (Thu)
by Cato (guest, #7643)
[Link]

Posted Sep 3, 2009 5:06 UTC (Thu)
by k8to (guest, #15413)
[Link]