While Git is aimed at distributed version control for developers, it has
also inspired more than a few people to apply Git to backing up all sorts
of data. It's been the basis for several backup projects like the outdated
eigenclass
for general backups, and more specialized hacks to keep track of the etc
directory (etckeeper), and a
user's home directory (git-home-history). Another
noteworthy backup application has popped up recently called bup.
Short for "backup," bup is a fledgling Git-based, or at least Git-inspired, backup solution written in Python and C. The first 0.01 release of bup was announced by Avery Pennarun on January 4, and development has been moving at a pretty good clip since. It is newly licensed under the LGPLv2, and is gathering an active community of developers.
Getting bup and its dependencies
Bup is available via a GitHub repository, and isn't currently packaged for any of the major distributions. The build instructions on the GitHub project page address building on Debian/Ubuntu, though users on Ubuntu 9.10 should substitute python2.6-dev for the development libraries, and make sure to install the python-fuse package to mount bup backups via FUSE.
Users will also want to install the par2 package, which is used by bup's
fsck tool to create and read the Par2
format. Par2 allows bup to verify files and to recover damaged files, so
if par2 isn't installed, bup's recovery features are not
available.
When using the bup fsck command, bup creates Par2 files to
allow recovery of damaged blocks in the bup index and pack files.
Using Par2, bup can recover up to 5% of damaged files.
Users who want to test this can use the bup damage command to randomly destroy blocks and then attempt to recover the file using bup fsck.
Pandoc is required
to generate bup's documentation, so users who would appreciate man
pages and HTML documentation should install the pandoc package as
well. Note that bup has no "make install" target at the moment, so the bup
documentation and commands need to be moved into the appropriate locations manually.
Making backups with bup
It's important to understand what bup does and doesn't do. Bup is a back-end tool meant to handle large files (like VM images) and incremental backups quickly with as little space as possible. The focus of development is on speed, taking up less space with backups, error recovery, and not so much on being a front-end for performing backups.
This means that bup is not well-suited yet as a standalone solution for creating and managing backups. It's also without a GUI, so bup is best-suited for users who are comfortable writing their own backup scripts and with at least a passing familiarity with Git usage.
Bup is actually a suite of scripts/commands that manage creating
backups, indexing files, listing files in a backup, etc. The data is stored
in a Git-formatted repository, but bup writes its own packfiles and indexes
— it doesn't use the git command directly, it only uses a few of Git's helper programs. The documentation that comes with bup is actually pretty good for a relatively new project, with a man page for each of the commands. It's a bit short on examples and a user guide would be nice, but given the project has only been around since the beginning of the year, it's hard to find fault with the amount of documentation already available.
To create a new backup, a user can either feed a file to bup's split command or use bup index to create an index of files and then use the bup save command to create a new backup. When using split, bup takes input and breaks it into chunks about 8K in size, saving the resulting files in a bup repository.
That's useful, but doesn't actually automate much. Bup index will create
or update a cache of files and directories in the filesystem, along with their hashes, which can be used by bup save to track files that have been updated since the last backup. Then bup save can work from the index to create a repository or update it with the files that have changed. Bup supports local and remote backups, bi-directionally. That is to say, bup allows local backups, backing up your local computer to a remote server, or pulling backups to the local machine from a remote server.
Bup is relatively speedy and does a pretty good job of compressing files using Git's packfile format. Bup particularly shines on incremental backups, because it uses a "rolling checksum" to compare the file chunks and only save the parts that have changed. Files are split and then checked into Git separately, and bup creates a index file that lists the filenames of those chunks (from a SH1 hash of the file) in the order that they're created. The files that match don't need to be re-saved. For more detail on the way bup works, see Pennarun's more detailed post about version control of large files that preceded bup's creation.
Restoring from bup backups
It's easier to create backups using bup, at the moment, than actually restoring from backups made with bup. That's not to say it's too challenging to get files, just that the process for restoring files is not as smooth as creating them in the first place. Bup has a save command that can be used to create a backup set, but lacks a restore command. So for the time being, it's best to use bup's split command and use its join utility to retrieve files.
The other problem with trying to use bup save is that it doesn't preserve file data like ownership, links, creation/modification times, etc. The upshot is that files backed up with bup won't have some of the requisite metadata that most users want when restoring from backups.
While bup's incremental backups take up less space than full backups, they still take up space. At the moment, bup has no way to delete older backups or manage the backups in any real way. This means that after a while bup stops being particularly effective at saving space after all.
Users can browse the backups in a number of ways. Bup provides a fuse command for mounting the backups as a directory, and an ftp command for browsing the backups as one would a remote directory via FTP. However, the views do not entirely match up with the actual files. Larger files that have been split are viewed as a top-level directory that has the name of the original file and then sub-directories under that that contain the actual data. Unfortunately, even though it uses Git, bup doesn't actually create a standard Git repository from the backups, so it's not possible to use one of the many GUI tools for Git to browse the backups.
At the moment, bup is relatively primitive but looks to be maturing and gaining interest fairly quickly. The project already has a handful of contributors in addition to Pennarun, and the mailing list seems fairly active for such a new program.
The project doesn't have a roadmap, per se, but discussions on the
mailing list indicate that a
bup restore command should be a reality soon, as well as
handling
file metadata so restored files retain their dates, ownership, and so on. While bup isn't yet a full-featured backup system, if the project maintains its current momentum, it should be quite useful by the end of 2010.
(
Log in to post comments)