While Git is aimed at distributed version control for developers, it has also inspired more than a few people to apply Git to backing up all sorts of data. It's been the basis for several backup projects like the outdated eigenclass for general backups, and more specialized hacks to keep track of the etc directory (etckeeper), and a user's home directory (git-home-history). Another noteworthy backup application has popped up recently called bup.
Short for "backup," bup is a fledgling Git-based, or at least Git-inspired, backup solution written in Python and C. The first 0.01 release of bup was announced by Avery Pennarun on January 4, and development has been moving at a pretty good clip since. It is newly licensed under the LGPLv2, and is gathering an active community of developers.
Bup is available via a GitHub repository, and isn't currently packaged for any of the major distributions. The build instructions on the GitHub project page address building on Debian/Ubuntu, though users on Ubuntu 9.10 should substitute python2.6-dev for the development libraries, and make sure to install the python-fuse package to mount bup backups via FUSE.
Users will also want to install the par2 package, which is used by bup's fsck tool to create and read the Par2 format. Par2 allows bup to verify files and to recover damaged files, so if par2 isn't installed, bup's recovery features are not available. When using the bup fsck command, bup creates Par2 files to allow recovery of damaged blocks in the bup index and pack files. Using Par2, bup can recover up to 5% of damaged files. Users who want to test this can use the bup damage command to randomly destroy blocks and then attempt to recover the file using bup fsck.
Pandoc is required to generate bup's documentation, so users who would appreciate man pages and HTML documentation should install the pandoc package as well. Note that bup has no "make install" target at the moment, so the bup documentation and commands need to be moved into the appropriate locations manually.
It's important to understand what bup does and doesn't do. Bup is a back-end tool meant to handle large files (like VM images) and incremental backups quickly with as little space as possible. The focus of development is on speed, taking up less space with backups, error recovery, and not so much on being a front-end for performing backups.
This means that bup is not well-suited yet as a standalone solution for creating and managing backups. It's also without a GUI, so bup is best-suited for users who are comfortable writing their own backup scripts and with at least a passing familiarity with Git usage.
Bup is actually a suite of scripts/commands that manage creating backups, indexing files, listing files in a backup, etc. The data is stored in a Git-formatted repository, but bup writes its own packfiles and indexes — it doesn't use the git command directly, it only uses a few of Git's helper programs. The documentation that comes with bup is actually pretty good for a relatively new project, with a man page for each of the commands. It's a bit short on examples and a user guide would be nice, but given the project has only been around since the beginning of the year, it's hard to find fault with the amount of documentation already available.
To create a new backup, a user can either feed a file to bup's split command or use bup index to create an index of files and then use the bup save command to create a new backup. When using split, bup takes input and breaks it into chunks about 8K in size, saving the resulting files in a bup repository.
That's useful, but doesn't actually automate much. Bup index will create or update a cache of files and directories in the filesystem, along with their hashes, which can be used by bup save to track files that have been updated since the last backup. Then bup save can work from the index to create a repository or update it with the files that have changed. Bup supports local and remote backups, bi-directionally. That is to say, bup allows local backups, backing up your local computer to a remote server, or pulling backups to the local machine from a remote server.
Bup is relatively speedy and does a pretty good job of compressing files using Git's packfile format. Bup particularly shines on incremental backups, because it uses a "rolling checksum" to compare the file chunks and only save the parts that have changed. Files are split and then checked into Git separately, and bup creates a index file that lists the filenames of those chunks (from a SH1 hash of the file) in the order that they're created. The files that match don't need to be re-saved. For more detail on the way bup works, see Pennarun's more detailed post about version control of large files that preceded bup's creation.
It's easier to create backups using bup, at the moment, than actually restoring from backups made with bup. That's not to say it's too challenging to get files, just that the process for restoring files is not as smooth as creating them in the first place. Bup has a save command that can be used to create a backup set, but lacks a restore command. So for the time being, it's best to use bup's split command and use its join utility to retrieve files.
The other problem with trying to use bup save is that it doesn't preserve file data like ownership, links, creation/modification times, etc. The upshot is that files backed up with bup won't have some of the requisite metadata that most users want when restoring from backups.
While bup's incremental backups take up less space than full backups, they still take up space. At the moment, bup has no way to delete older backups or manage the backups in any real way. This means that after a while bup stops being particularly effective at saving space after all.
Users can browse the backups in a number of ways. Bup provides a fuse command for mounting the backups as a directory, and an ftp command for browsing the backups as one would a remote directory via FTP. However, the views do not entirely match up with the actual files. Larger files that have been split are viewed as a top-level directory that has the name of the original file and then sub-directories under that that contain the actual data. Unfortunately, even though it uses Git, bup doesn't actually create a standard Git repository from the backups, so it's not possible to use one of the many GUI tools for Git to browse the backups.
At the moment, bup is relatively primitive but looks to be maturing and gaining interest fairly quickly. The project already has a handful of contributors in addition to Pennarun, and the mailing list seems fairly active for such a new program.
The project doesn't have a roadmap, per se, but discussions on the mailing list indicate that a bup restore command should be a reality soon, as well as handling file metadata so restored files retain their dates, ownership, and so on. While bup isn't yet a full-featured backup system, if the project maintains its current momentum, it should be quite useful by the end of 2010.
Brief itemson schedule, to the day" after the usual six-month development cycle. As the release notes describe, there are lots of new features for users in 2.30, including a default split view mode for Nautilus, background syncing for Tomboy, easier user management, a time-tracker applet, and much more. There are also new features and bug fixes for developers and in the accessibility framework. "To celebrate this release, a GNOME Store has been created. A selection of t-shirts and mugs is available on this store, that is powered by Zazzle: http://www.zazzle.com/gnome". Click below for the full announcement. Version 1.2.0 of the digiKam image editor is out. The biggest change appears to be the multi-threading of many image editing tools and the addition of zoomable preview widgets. Djigzo is a mail transfer agent whose main purpose in life is to encrypt all email in transit. The 1.3.2 release adds some new configuration options, improved virtual appliance, Debian packages, and more. discussing a proposal from Pablo Quirós for a new user interface element to put in the upper right corner of windows which has been recently vacated on Ubuntu systems. The "Esfera" is a large circle which is used to implement a number of gesture-oriented features; details can be found in this PDF file. "Moved in a semicircle from right to left: the window is turned, and the back of it is shown to the user. The back of a window is a new UI concept... The idea is that we have a 'front' side of a window, which is what we normally see, and a 'back' side, which offers some possibilities that there is no space to display in the front side." FusionForge is the project hosting and management system formerly known as GForge, formerly known as SourceForge. The 5.0 release is the result of a determined effort to "upstream" improvements found in various instances. These improvements include rewritten version control integration with support for most version control systems, better security, a multi-host search facility, and more. This month's edition of KDE SC is a bugfix and translation update to KDE SC 4.4. KDE SC 4.4.2 is a recommended upgrade for everyone running KDE SC 4.4.1 or earlier versions. As the release only contains bugfixes and translation updates, it will be a safe and pleasant update for everyone. Users around the world will appreciate that KDE SC 4.4.2 multi-language support is more complete. KDE SC 4 is already translated into more than 50 languages, with more to come." MongoDB is "a scalable, high-performance, open source, dynamic-schema, document-oriented database." The 1.4 release has been announced; it features a number of performance improvements, better replication support, geospatial search support, a number of query language improvements, and more. Firefox 3.0.19 and 3.5.9 (3.0.19 being the final planned 3.0.x update), Thunderbird 3.0.4, and SeaMonkey 2.0.4. Our advice to owners of NVIDIA GPUs running Linux is to use the VESA X driver from the time of Linux distribution installation until they can download and install the NVIDIA Linux driver from their distribution repositories or from nvidia.com." No mention of Nouveau, needless to say. The OpenSSL project team is pleased to announce the release of version 1.0.0 of our open source toolkit for SSL/TLS. This new OpenSSL version is a major release and incorporates many new features as well as major fixes compared to 0.9.8n." The actual changes listed in the announcement are a real alphabet soup ("Streaming ASN1 encode support for PKCS#7 and CMS"), but it's undoubtedly stuff we all really need to have. reported that the mess of PostgreSQL adapters for Python might finally be clearing up. So it's with mixed feelings that we note the recent releases of:
Newsletters and articles
Page editor: Jonathan Corbet
Next page: Announcements>>
Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds