Posted Jul 9, 2006 19:03 UTC (Sun) by hein.zelle (guest, #33324)
In reply to: Backups by addw
Parent article: The 2006 Linux File Systems Workshop

I'll heartily agree that making sufficient backups of large filesystems can become a serious problem, but perhaps you should also consider what kind of data will fill up a drive that is so large it actually becomes a problem.

If a disk becomes so large that writing it out in full takes too long for a regular backup to a similar disk, then I think you can assume that the data is not very volatile either: writing the changes would take too long as well. I think in many cases, huge databases could in principle be split into a small volatile part and a large not-so-volatile part. This makes it possible to back up the non-volatile part at a low frequency, while the volatile part gets backed up more often.
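To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The figures are assumptions for illustration only, not measurements: a 2 TB data set, a 500 GB volatile part, and a sustained 30 MB/s to an external disk. The function name `copy_hours` is made up for this example.

```python
def copy_hours(size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream size_gb gigabytes at throughput_mb_s MB/s."""
    seconds = (size_gb * 1000) / throughput_mb_s  # decimal units: 1 GB = 1000 MB
    return seconds / 3600


# Assumed figures: 30 MB/s is a guess at a sustained rate for an external disk.
full = copy_hours(2000, 30)     # copying the whole (assumed) 2 TB data set
volatile = copy_hours(500, 30)  # copying only the 500 GB that changes often

print(f"full copy: {full:.1f} h, volatile part only: {volatile:.1f} h")
```

With these assumed numbers the full copy takes most of a day while the volatile part fits in an evening, which is the whole point of treating the two parts differently.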

We do something like this at work, where we have several terabytes (I suppose a relatively small dataset compared to others), of which about 500 GB changes often; the rest is relatively or completely static. We use external USB disks to back up the static or slowly changing part about once a month. The volatile part is backed up more often, in this case also to external USB disks, but with a mix of incremental and full backups.
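For anyone wanting to try a similar split, a minimal sketch of one way to separate the two parts is below. It is not the tooling we actually use; it simply assumes that "volatile" can be approximated by "modified within the last N days", and the name `split_by_age` and the 30-day threshold are made up for the example.

```python
import os
import time


def split_by_age(root, max_age_days, now=None):
    """Return (volatile, static) lists of file paths found under root.

    Assumption: a file counts as volatile if its modification time falls
    within the last max_age_days; everything else is treated as static.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    volatile, static = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            (volatile if mtime >= cutoff else static).append(path)
    return sorted(volatile), sorted(static)


if __name__ == "__main__":
    # Hypothetical usage: classify the current directory with a 30-day threshold.
    recent, old = split_by_age(".", 30)
    print(f"{len(recent)} volatile files, {len(old)} static files")
```

Modification time is only a heuristic. If the data already lives in separate directories or volumes by kind, splitting along those lines is simpler and more predictable.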

In our case we've chosen a RAID 5 main storage system with a hot-spare drive, so that the storage itself provides some reliability independent of the backups. We have not had to fall back on the backups yet, but everything appears to work well and the backup times are not bothersome at all.

I suppose the problem will indeed get worse as drive sizes increase, and alternatives like tape may become impractical at some point. However, using a spare drive (in a USB enclosure or similar) should remain a viable backup option, I think. If not, then I would seriously wonder whether the owner of the disk shouldn't be considering one or more expensive RAID systems with redundancy to deal with the problem. There will obviously be exceptions where people really do store lots and lots of volatile information that must be backed up, but I expect those people would be considering the more expensive redundant options anyway.

