By Forrest Cook
November 28, 2007
Backing up data stored on computers is one of the most important
jobs of a systems administrator. A regular backup routine can save
large amounts of heartache and frustration when a disk drive or
system fails. Disk failure should be treated as something that is
guaranteed to happen in the life (and death) of every disk drive.
And disk failures always seem to happen at the worst possible
moment. Typical failures happen on Friday afternoon before a vacation
is about to start or when the boss comes into your office
demanding that critical report that lives on the machine with the
smoke curling out of the power supply.
Over the years, your author has lived through many backup technologies.
In the early days of home computing and CP/M systems,
floppy to floppy transfer was the only method to save data.
Floppies were unreliable and multiple copies were important.
When hard drives became normal hardware on DOS-based microprocessor
systems, backups were performed on piles of floppy disks or short-lived
tape technologies. It was a bad day when floppy disk 29 of a batch of 30
encountered a read error during the restoration of a disk.
Mainframe systems in the early 1980s required copying the contents
of washing machine sized disk drives to piles of 9 track open reel tapes.
As drives were added, the piles of tapes became larger. Large storage
areas were required for storing racks of tapes.
The 1990s brought larger disk drives and the capabilities of the
mainframes and PCs were converging. Single filesystems could be
copied to DC100 serpentine tape cartridges, if one had a lot of patience.
Helical scan SCSI tape drives such as Exabyte 8mm and DDS 4mm were able to
store the contents of multiple filesystems on one tape.
For a brief while, tape capacity surpassed filesystem size.
Robotic tape library machines could be programmed to automate
the backup process and allow large numbers of filesystems to
be copied to stacks of tapes.
Disk capacities continued to expand rapidly. AIT tapes were
good for larger backups, but the media was pricey.
RAID arrays became a good way to increase storage
capacity and improve reliability, but downtime could be long in the event
of a controller failure.
Backing up RAID arrays is still critically important.
Disk drive prices continued to fall.
At some point after the year 2000,
the price/performance of disks versus tapes made it more economical
to buy another disk drive to copy data to.
For the moment, it appears that the disk/tape competition
is over and disks won.
With a removable drive sled or a USB drive, a hard drive can now be
treated as a high speed random access data cartridge.
With multiple online machines, it is possible to
use one machine as a backup repository for another's data.
Today, it is possible to buy a 300GB disk drive for just over
$100. Larger drives can be had for a slightly higher cost per byte.
On a Linux platform, two of the oldest and most common backup
utilities are dump and tar. Both work with tape and disk-based
archives. Dump has the advantage of being able to dig through
the contents of an archive and pick individual files to restore
before reading the entire media. Unfortunately, the dump archive
format has gone through a lot of changes. This means, for example,
that a dump archive that was created on a Red Hat 7.3 system may be
unreadable on an Ubuntu 7.04 system. Reading old tar files is more
likely to result in success across systems of different vintages.
Your author decided to standardize on tar-based backups.
Now for some current real-world examples for performing
disk-to-disk backups:
Here's how to use dump to copy the local / filesystem to a compressed
and datestamped file on the same machine's /backup filesystem:
cd /backup
/sbin/dump 0ufa - / | bzip2 > ./localslash`date +%Y%m%d`.bz2
Here's how to use tar to do the same type of local to local backup:
cd /backup
/bin/tar cf - / | bzip2 > ./localslash`date +%Y%m%d`.tar.bz2
Here's how dump is used to back up the / filesystem on a machine
called remote to the local machine's /backup partition:
cd /backup
ssh remote '/sbin/dump 0ufa - / | bzip2' > ./remoteslash`date +%Y%m%d`.bz2
Here's how to use tar to do the remote to local backup:
cd /backup
ssh remote '/bin/tar cf - / | bzip2' > ./remoteslash`date +%Y%m%d`.tar.bz2
The above commands should be run from the root account and
the remote backups can work without passwords if ssh is set up
correctly. Ssh and the ssh server should be installed and configured
on the machines. The dump and/or tar manual pages should be consulted
for more information on the various command options.
Restoration of the filesystems involves using
bunzip2 to uncompress the archive, then restore (for dump) or tar
(for tar) to split out the contents to a local disk.
Restoration across the network is possible with the use of ssh.
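The restoration step described above can be sketched roughly as follows. This is only an illustration for the tar case: the function name restore_tar_backup and the example file names are hypothetical and do not come from the original scripts.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): unpack a bzip2-compressed
# tar archive into a target directory.
restore_tar_backup() {
    archive=$1   # e.g. /backup/localslash20071128.tar.bz2 (illustrative name)
    target=$2    # directory to restore into; created if missing

    mkdir -p "$target" || return 1
    # bunzip2 -c decompresses to stdout and leaves the archive untouched;
    # tar reads the stream from stdin (f -), extracts (x) and keeps
    # permissions (p) under the target directory (-C).
    bunzip2 -c "$archive" | tar xpf - -C "$target"
}

# For a dump archive, the equivalent would be to cd into the freshly made
# target filesystem and run something like:
#   bunzip2 -c /backup/localslash20071128.bz2 | /sbin/restore rf -
```

A restore across the network could follow the same pattern by feeding the pipeline from ssh, for example `ssh remote 'cat /backup/file.tar.bz2' | bunzip2 -c | tar xpf - -C /mnt/restore`, where the host and paths are again placeholders.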
A good backup scheme should be devised. Your author has a dedicated
machine with a large disk drive and an old DDS3 tape drive that is used
to back up all of his other machines. Variations on the
above examples are used in several machine-specific scripts to
back up one machine at a time. The backups are performed several times
a month.
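A per-machine script of this kind might look something like the sketch below. It is not the author's actual script: the function names backup_filename and backup_host, the /backup default and the host names in the comment are assumptions made for illustration, built around the remote tar command shown earlier.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the article).

# Build the datestamped archive name used in the earlier examples.
# $1 = machine name, $2 = optional date string (defaults to today).
backup_filename() {
    printf '%sslash%s.tar.bz2\n' "$1" "${2:-$(date +%Y%m%d)}"
}

# Back up the / filesystem of one remote machine into a local directory.
# $1 = machine name, $2 = optional destination (defaults to /backup).
backup_host() {
    host=$1
    dest=${2:-/backup}
    out="$dest/$(backup_filename "$host")"

    # Write to a temporary name and rename only when ssh exits cleanly,
    # so an interrupted transfer is not mistaken for a finished archive.
    ssh "$host" '/bin/tar cf - / | bzip2' > "$out.partial" &&
        mv "$out.partial" "$out"
}

# Example: back up several machines one at a time (hypothetical host names).
# for h in alpha beta gamma; do
#     backup_host "$h" || echo "backup of $h failed" >&2
# done
```

Running the machines one after another, as in the commented loop, keeps the load on the backup host's disk and network predictable.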
Backups can be copied from the backup machine's disk to tape for
offsite storage. The entire backup set is occasionally copied
to another machine's large disk for redundancy.
Datasets can simply be copied with cp to removable media.
A 100GB+ audio archive is managed differently than standard filesystems:
the rsync command is used to clone the data from one machine to another.
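The article does not give the rsync invocation, so the following is only a plausible sketch of such a clone. The function name clone_archive, the paths and the choice of the --delete option are assumptions, not the author's settings.

```shell
#!/bin/sh
# Hypothetical helper: mirror one directory tree into another with rsync.
clone_archive() {
    src=$1
    dest=$2
    # -a recurses and preserves permissions, times and symlinks.
    # --delete removes files from dest that no longer exist in src, which
    # makes dest a clone rather than an accumulation (an assumption about
    # the desired behaviour; drop it to keep deleted files).
    # The trailing slash on src copies its contents, not the directory itself.
    rsync -a --delete "$src/" "$dest/"
}

# Local example:   clone_archive /audio /mnt/usbdisk/audio
# Remote example:  clone_archive /audio remote:/audio
```

Because rsync only transfers files that have changed, repeat runs over a large archive are typically much faster than recreating a full tar file each time.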
In the early 1990s, your author couldn't imagine ever getting
close to filling up a 9GB disk drive. Then came audio archives,
digital cameras with movie modes and other large data sources.
Several hard drive failures and machine meltdowns have occurred,
but no data has been lost. With a little planning, your data can
be kept safe.