By Nathan Willis
June 6, 2012
Lars Wirzenius's new backup tool Obnam was just declared 1.0. There
is no shortage of backup options these days, and in some ways
Wirzenius's decision to scratch his own itch with the project is par
for the course. But the program does offer a different feature set
than many of its competitors.
For starters, Obnam makes only "snapshot" backups — that is,
every backup looks like a complete snapshot of the system: there are
not separate "full" and "incremental" backup options. That obviates
the need to separately configure full and incremental backups on
different schedules, and it similarly simplifies the restoration
process. Any snapshot can be restored, without "walking" a chain of
deltas from a full backup starting position. In his 1.0 release
announcement, Wirzenius argues that full-plus-incremental backups make
sense for tape drives, where sequential access favors adding deltas
with incremental changes after an initial full backup, but that
hard-disk backups make the incremental delta approach pointless.
But the sneaky part is that under the hood, Obnam's snapshots are all
incremental, at least in the sense that each snapshot only records
changes since the last. The difference is that they are stored in copy-on-write (COW)
b-trees like those Btrfs uses
for filesystems. Any snapshot can be reconstructed from the b-tree,
and individual snapshots can be removed by deleting their node and
re-attaching the sub-trees. To make the COW b-tree approach
space-efficient, it uses pervasive automatic data de-duplication. The
same chunk of data on disk is re-used — both across multiple
files and over multiple snapshot generations. In addition to saving
space by not duplicating files that have not changed between
snapshots, moving or renaming large files does not result in duplicate
copies of the bits. By default, Obnam uses one-megabyte chunks,
although this setting is adjustable in Obnam's configuration file.
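To make the de-duplication idea concrete, here is a minimal Python sketch of a content-addressed chunk store. It is not Obnam's code: the ChunkStore class, the snapshot() helper, the use of SHA-256 as the chunk identifier, and the file names are all assumptions made for illustration. Only the one-megabyte chunk size comes from the description above.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # one-megabyte chunks, the default cited above


class ChunkStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self):
        self.chunks = {}  # chunk id (hex digest) -> chunk bytes

    def put_file(self, data, chunk_size=CHUNK_SIZE):
        """Split data into fixed-size chunks; return the list of chunk ids."""
        ids = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            # Assumption: SHA-256 of the content serves as the chunk id.
            chunk_id = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(chunk_id, chunk)  # store only if unseen
            ids.append(chunk_id)
        return ids

    def get_file(self, ids):
        """Reassemble a file from its chunk ids."""
        return b"".join(self.chunks[i] for i in ids)


def snapshot(store, files):
    """A snapshot maps each path to the chunk ids that make up its content."""
    return {path: store.put_file(data) for path, data in files.items()}


store = ChunkStore()

# 3 MiB of sample data made of three distinct one-megabyte chunks.
big = b"".join(bytes([i]) * CHUNK_SIZE for i in range(3))

gen1 = snapshot(store, {"projectfoo/video.bin": big})
count_after_first = len(store.chunks)

# Second generation: the large file was renamed and a small file was added.
gen2 = snapshot(store, {"projectfoo/renamed.bin": big, "notes.txt": b"hello"})
count_after_second = len(store.chunks)

print(count_after_first, count_after_second)
```

In this sketch the second snapshot adds only one new chunk (for the small file); the renamed file points at the three chunks that were already stored, and either generation can be restored on its own from its list of chunk ids.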
Obnam sports other features of practical value, such as built-in GnuPG
encryption, which Wirzenius cited as a weakness in most rsync-based
backup tools. It also works with local disks or over the network,
including NFS, SMB, and SFTP. Wirzenius admits that SFTP is slow, but
notes that SCP (which should be faster) lacks support for tracking
information like file removals, which Obnam depends on.
In network backup setups, Obnam supports both push (client-initiated)
and pull (server-initiated) backup sessions.
Storing and retrieving
Installation requires several of Wirzenius's other code projects,
including his B-tree library larch and
terminal status-update library ttystatus, plus paramiko, a third-party SSH2
library. Most are packaged for Debian (Wirzenius packages his own
projects for Debian), but not all of them are available in downstream
derivatives like Ubuntu. He provides an Apt repository for the
necessary packages; instructions and a link to the repository's
signing key are provided on his Obnam tutorial page.
The tutorial goes into further detail about Obnam's data
de-duplication with practical examples. You can create a new backup
with
obnam backup ~/projectfoo
and subsequently back up a parent directory with
obnam backup ~
Rather than re-save the files from projectfoo, the new backup will
point to the copy already on disk. Each backup created with Obnam is
specific to a directory; you can exclude specific subdirectories with
the --exclude= flag, but you cannot back up several directories in a
single command.
The tutorial also explains that Obnam automatically saves checkpoints
every 100MB while creating a new backup. This is valuable because the
initial snapshot is always akin to a full backup in other tools, and
can be large enough to introduce failures. Checkpoints are
not guaranteed to preserve the entire data set as are regular
snapshots; they only allow an interrupted backup to resume without
starting over from scratch.
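The checkpoint behavior can be sketched roughly as follows. This is an illustration only, not how Obnam implements it: the backup_with_checkpoints() function, its arguments, and the file list are hypothetical, and "100MB" is taken here to mean 100 * 1024 * 1024 bytes.

```python
MB = 1024 * 1024
CHECKPOINT_BYTES = 100 * MB  # the 100MB interval cited in the tutorial


def backup_with_checkpoints(files, already_saved=(),
                            checkpoint_bytes=CHECKPOINT_BYTES):
    """Walk (path, size) pairs and record a checkpoint each time at least
    checkpoint_bytes of new data has been written since the last one.

    Returns (saved_paths, checkpoints); each checkpoint is the number of
    files safely stored at that point.
    """
    saved = list(already_saved)
    done = set(saved)
    checkpoints = []
    since_checkpoint = 0
    for path, size in files:
        if path in done:
            continue  # stored before the interruption; no need to redo it
        saved.append(path)
        done.add(path)
        since_checkpoint += size
        if since_checkpoint >= checkpoint_bytes:
            checkpoints.append(len(saved))
            since_checkpoint = 0
    return saved, checkpoints


files = [("a.iso", 60 * MB), ("b.iso", 50 * MB),
         ("c.iso", 120 * MB), ("d.txt", 10 * MB)]

# An uninterrupted first run.
saved, checkpoints = backup_with_checkpoints(files)

# A run that resumes from a checkpoint taken after the first two files.
resumed, resumed_checkpoints = backup_with_checkpoints(
    files, already_saved=["a.iso", "b.iso"])

print(checkpoints, resumed_checkpoints)
```

The resumed run skips the two files that were covered by the earlier checkpoint and only writes the remainder, which is the practical benefit for a large initial snapshot.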
Obnam's basic usage is straightforward; the same
obnam backup ~ command that is used to start a
new backup in the above example is used verbatim to perform the
subsequent snapshots. You store snapshots on a remote repository by
appending --repository=URL, specify a filesystem storage
location with --output=PATH, and specify a GnuPG encryption
key with --encrypt-with=KEYID.
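For anyone scripting these invocations (from a scheduled job, say), a small wrapper can assemble the command line. The sketch below uses only the --repository, --encrypt-with, and --exclude flags as they are described in this article; it has not been checked against any particular Obnam release. The obnam_backup_command() helper, the repository URL, and the key ID are hypothetical.

```python
def obnam_backup_command(directory, repository=None, encrypt_with=None,
                         excludes=()):
    """Build the argument list for an 'obnam backup' run.

    Only flags mentioned in the article are used; their exact behavior
    should be confirmed against the installed version.
    """
    cmd = ["obnam", "backup"]
    if repository:
        cmd.append("--repository=" + repository)
    if encrypt_with:
        cmd.append("--encrypt-with=" + encrypt_with)
    for pattern in excludes:
        cmd.append("--exclude=" + pattern)
    cmd.append(directory)  # one directory per backup command
    return cmd


cmd = obnam_backup_command(
    "/home/user",
    repository="sftp://backup.example.com/srv/obnam",  # hypothetical URL
    encrypt_with="0xDEADBEEF",                         # hypothetical key ID
    excludes=["/home/user/.cache"],
)
print(" ".join(cmd))

# To actually run it (requires Obnam to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when a path contains spaces.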
You can restore a directory from a snapshot with
obnam restore --to=/mnt/recovery-volume ~
(which will restore the most recent snapshot of your home directory to
/mnt/recovery-volume). You can optionally restore just a
file or a subdirectory from the snapshot with
obnam restore ~/importantfiles --to=/mnt/recovery-volume ~
You can also select a specific intermediate snapshot by appending a
--generation=N flag to the restore command; you can get a list of the
available snapshots by running obnam generations. The obnam verify
command checks snapshot data against the files on disk, and obnam fsck
checks the internal consistency of the b-tree.
Forgetfulness
The only really confusing part of working with Obnam is the snapshot
retention process. You can tell the program to immediately delete
older snapshots by running
obnam forget --keep=7d
(which will keep the most recent seven days' worth of snapshots), or
some variation. The wrinkle is that the 7d attribute will keep only
one backup per day for those seven days, even if you run Obnam hourly.
To keep seven days' worth of hourly snapshots, you would need to
specify --keep=168h.
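The difference between 7d and 168h is easy to see in a small simulation. The sketch below is not Obnam's code; the keep_by_rule() function is hypothetical, and it assumes that a rule keeps the newest snapshot in each of the most recent N periods that contain a snapshot. Which snapshot within a day survives is part of that assumption.

```python
from datetime import datetime, timedelta

# strftime patterns that identify the period a snapshot falls into.
PERIOD_FORMATS = {"h": "%Y-%m-%d %H", "d": "%Y-%m-%d"}


def keep_by_rule(snapshots, count, unit):
    """Keep the newest snapshot from each of the `count` most recent
    periods (hours or days) that contain at least one snapshot."""
    newest_in_period = {}
    for stamp in sorted(snapshots):
        # Later snapshots overwrite earlier ones in the same period.
        newest_in_period[stamp.strftime(PERIOD_FORMATS[unit])] = stamp
    recent_periods = sorted(newest_in_period)[-count:]
    return [newest_in_period[period] for period in recent_periods]


# Ten days of hourly snapshots, ending at 23:00 on June 6, 2012.
start = datetime(2012, 5, 28, 0, 0)
hourly = [start + timedelta(hours=i) for i in range(240)]

kept_7d = keep_by_rule(hourly, 7, "d")
kept_168h = keep_by_rule(hourly, 168, "h")

print(len(kept_7d), len(kept_168h))
```

Under these assumptions, 7d retains seven snapshots (the last one of each day), while 168h retains all 168 hourly snapshots from the same seven-day window.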
You can set a snapshot retention policy in your configuration file
that uses these rules in combination. You can retain hourly, daily,
weekly, monthly, and yearly snapshots by providing a comma-separated
list. For example, 12h,7d,3m will keep the last 12
hourly snapshots, the last seven daily snapshots, and the last three
monthly snapshots. The potential for miscounting sets in when the
periods start to overlap (such as the last 48 hourly snapshots and the
last two daily snapshots); Wirzenius recommends that you try your
retention policy on the command line with the --pretend option to
simulate the results before deploying it in the real world.
In an email, Wirzenius elaborated a bit on those tricky
multi-factor retention policies. Each retention rule (e.g., hour,
day, or month) is examined separately by Obnam, he said, and a
snapshot is kept if it matches any of the rules. So a 48h,2d
policy would match 48 hourly snapshots, then match two additional
daily snapshots, for 50 total.
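A combined policy can be explored with a dry-run sketch in the spirit of --pretend. Again, this is not Obnam's implementation. The parse_policy() and plan_forget() functions are hypothetical, the w and y period letters are assumed by analogy with h, d, and m, and each rule is assumed to keep the newest snapshot in each of its most recent N periods. A snapshot matched by more than one rule is kept once here, so in this sketch the total can be smaller than the sum of the rule counts.

```python
from datetime import datetime

# How to identify the period a snapshot belongs to, per rule letter.
# "w" and "y" are assumed letters for weekly and yearly rules.
PERIOD_KEYS = {
    "h": lambda t: (t.year, t.month, t.day, t.hour),
    "d": lambda t: (t.year, t.month, t.day),
    "w": lambda t: tuple(t.isocalendar())[:2],  # (ISO year, ISO week)
    "m": lambda t: (t.year, t.month),
    "y": lambda t: (t.year,),
}


def parse_policy(policy):
    """Turn a string like '12h,7d,3m' into [(12, 'h'), (7, 'd'), (3, 'm')]."""
    rules = []
    for item in policy.split(","):
        item = item.strip()
        count, unit = int(item[:-1]), item[-1]
        if unit not in PERIOD_KEYS:
            raise ValueError("unknown period in rule: " + item)
        rules.append((count, unit))
    return rules


def plan_forget(snapshots, policy):
    """Dry run: return (keep, remove) without deleting anything.

    Each rule is examined separately; a snapshot is kept if any rule
    matches it.
    """
    keep = set()
    for count, unit in parse_policy(policy):
        newest = {}
        for stamp in sorted(snapshots):
            newest[PERIOD_KEYS[unit](stamp)] = stamp  # later one wins
        for key in sorted(newest)[-count:]:
            keep.add(newest[key])
    remove = [s for s in sorted(snapshots) if s not in keep]
    return sorted(keep), remove


snapshots = [
    datetime(2012, 6, 4, 9), datetime(2012, 6, 4, 18),
    datetime(2012, 6, 5, 9), datetime(2012, 6, 5, 18),
    datetime(2012, 6, 6, 9), datetime(2012, 6, 6, 10),
]

keep, remove = plan_forget(snapshots, "2h,2d")
for stamp in keep:
    print("keep  ", stamp)
for stamp in remove:
    print("remove", stamp)
```

Running a simulation like this against a list of real snapshot times is a cheap way to check that a policy retains what you expect before any data is removed.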
As of the 1.0 release, there are a few areas that need improvement,
such as managing multiple clients storing snapshots on one repository;
Wirzenius says that further thought is required before implementing a
real "server mode." For example, two or more machines can run Obnam
and push their backups to the same remote repository, and they will be
tagged with the hostname of origin. Obnam can also be run from the
repository machine to "pull" backups from the two remote sources, but
in that case each run needs to specify a client name with the
--client-name= flag in order for Obnam to keep their metadata
separate.
In practice, my interest in backup utilities stems largely from how
rarely I make good backups on a regular basis (i.e., paranoia). I may
be atypical in that way, but the primary reasons I have abandoned most
of the backup utilities I have test-driven in the past are the overhead
in keeping track of full and incremental backup schedules and the lack
of good tools for rotating old backups out without manual
intervention. Obnam scores well on both of those metrics. If you have a
complicated setup with multiple machines, you may find quirks (such as
the client name issue or the speed of SFTP) working against you, but
Wirzenius is still at work on the code — and he seems quite
happy to take bug reports and questions.