Development

Git-based backup with bup

March 31, 2010

This article was contributed by Joe 'Zonker' Brockmeier.

While Git is aimed at distributed version control for developers, it has also inspired more than a few people to apply Git to backing up all sorts of data. It has been the basis for several backup projects, such as the outdated eigenclass for general backups and more specialized hacks that keep track of the /etc directory (etckeeper) or a user's home directory (git-home-history). Another noteworthy backup application, called bup, has popped up recently.

Short for "backup," bup is a fledgling Git-based, or at least Git-inspired, backup solution written in Python and C. The first 0.01 release of bup was announced by Avery Pennarun on January 4, and development has been moving at a pretty good clip since. It is newly licensed under the LGPLv2, and is gathering an active community of developers.

Getting bup and its dependencies

Bup is available via a GitHub repository, and isn't currently packaged for any of the major distributions. The build instructions on the GitHub project page address building on Debian/Ubuntu, though users on Ubuntu 9.10 should substitute python2.6-dev for the development libraries, and make sure to install the python-fuse package to mount bup backups via FUSE.

Users will also want to install the par2 package, which is used by bup's fsck tool to create and read the Par2 format. Par2 allows bup to verify files and to recover damaged files, so if par2 isn't installed, bup's recovery features are not available. When using the bup fsck command, bup creates Par2 files to allow recovery of damaged blocks in the bup index and pack files. Using Par2, bup can recover up to 5% of damaged files. Users who want to test this can use the bup damage command to randomly destroy blocks and then attempt to recover the file using bup fsck.
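
The idea behind this kind of recovery is redundancy: extra data computed from the original blocks lets a damaged block be rebuilt. Par2 itself uses Reed-Solomon codes, which can repair several damaged blocks. The sketch below swaps in single-block XOR parity, a much simpler scheme, purely to illustrate the principle. The function names are hypothetical and are not taken from bup or par2.

```python
def xor_parity(blocks):
    """Return a parity block: the byte-wise XOR of equal-sized blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_block(blocks, parity, damaged):
    """Rebuild the block at index `damaged` from the survivors plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != damaged]
    return xor_parity(survivors + [parity])


blocks = [b"pack", b"file", b"data"]
parity = xor_parity(blocks)
# Pretend block 1 was destroyed; the other blocks and the parity restore it.
restored = rebuild_block(blocks, parity, damaged=1)
```

A single XOR parity block can repair only one lost block of known position. Tolerating damage scattered across a larger share of the data is what the heavier Reed-Solomon machinery in Par2 is for.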

Pandoc is required to generate bup's documentation, so users who would appreciate man pages and HTML documentation should install the pandoc package as well. Note that bup has no "make install" target at the moment, so the bup documentation and commands need to be moved into the appropriate locations manually.

Making backups with bup

It's important to understand what bup does and doesn't do. Bup is a back-end tool meant to handle large files (like VM images) and incremental backups quickly with as little space as possible. The focus of development is on speed, space efficiency, and error recovery rather than on being a front-end for performing backups.

This means that bup is not yet well-suited as a standalone solution for creating and managing backups. It also lacks a GUI, so bup is best-suited for users who are comfortable writing their own backup scripts and have at least a passing familiarity with Git.

Bup is actually a suite of scripts/commands that manage creating backups, indexing files, listing files in a backup, etc. The data is stored in a Git-formatted repository, but bup writes its own packfiles and indexes; it doesn't use the git command directly, relying only on a few of Git's helper programs. The documentation that comes with bup is actually pretty good for a relatively new project, with a man page for each of the commands. It's a bit short on examples and a user guide would be nice, but given the project has only been around since the beginning of the year, it's hard to find fault with the amount of documentation already available.

To create a new backup, a user can either feed a file to bup's split command or use bup index to create an index of files and then use the bup save command to create a new backup. When using split, bup takes input and breaks it into chunks about 8K in size, saving the resulting files in a bup repository.
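
To make the splitting step concrete, here is a minimal Python sketch of content-defined chunking with a rolling checksum. It is not bup's code: the 64-byte window and the 13-bit boundary test are assumptions picked to give chunks averaging about 8K, and split_chunks is a hypothetical name. A boundary is declared wherever the low bits of the checksum over the last few bytes are all ones, so boundaries depend on nearby content rather than on absolute position in the file.

```python
import random

WINDOW = 64                      # bytes covered by the rolling checksum (assumed)
SPLIT_BITS = 13                  # 2**13 = 8192, so chunks average about 8K (assumed)
SPLIT_MASK = (1 << SPLIT_BITS) - 1


def split_chunks(data):
    """Split data wherever the rolling checksum's low bits are all ones."""
    chunks = []
    s1 = s2 = 0                  # plain sum and position-weighted sum of the window
    start = 0
    for i, new in enumerate(data):
        old = data[i - WINDOW] if i >= WINDOW else 0
        s1 += new - old          # slide the window forward by one byte
        s2 += s1 - WINDOW * old
        if (s2 & SPLIT_MASK) == SPLIT_MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])   # trailing partial chunk
    return chunks


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(400_000))
# Insert a single byte in the middle; only the chunk(s) around it should change.
edited = original[:200_000] + b"!" + original[200_000:]
before = split_chunks(original)
after = split_chunks(edited)
unchanged = len(set(before) & set(after))
print(len(before), len(after), unchanged)
```

Because each boundary depends only on the preceding window of bytes, the one-byte insertion disturbs the chunks near the edit and leaves the rest identical. Fixed-size splitting would instead shift every later chunk.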

That's useful, but doesn't actually automate much. Bup index will create or update a cache of files and directories in the filesystem, along with their hashes, which can be used by bup save to track files that have been updated since the last backup. Then bup save can work from the index to create a repository or update it with the files that have changed. Bup supports local and remote backups, bi-directionally. That is to say, bup allows local backups, backing up your local computer to a remote server, or pulling backups to the local machine from a remote server.
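
The index-then-save workflow can be sketched in a few lines of Python. This is an illustration only, not bup's index format: it assumes that modification time and size are enough of a signature to spot changed files, and build_index and changed_since are hypothetical names.

```python
import os
import tempfile


def build_index(root):
    """Map each file under root to a cheap change signature (mtime, size)."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            index[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return index


def changed_since(old_index, new_index):
    """Return paths that are new or whose signature differs from last time."""
    return sorted(p for p, sig in new_index.items() if old_index.get(p) != sig)


with tempfile.TemporaryDirectory() as root:
    for name, text in (("a.txt", "alpha"), ("b.txt", "beta")):
        with open(os.path.join(root, name), "w") as f:
            f.write(text)
    first = build_index(root)
    with open(os.path.join(root, "b.txt"), "w") as f:
        f.write("beta, but longer")        # size changes, so the signature does too
    with open(os.path.join(root, "c.txt"), "w") as f:
        f.write("gamma")
    to_save = changed_since(first, build_index(root))
```

Here only b.txt and c.txt would be handed to the save step. The untouched a.txt is skipped without its contents being read again.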

Bup is relatively speedy and does a pretty good job of compressing files using Git's packfile format. Bup particularly shines on incremental backups, because it uses a "rolling checksum" to compare the file chunks and only save the parts that have changed. Files are split and then checked into Git separately, and bup creates an index file that lists the filenames of those chunks (from a SHA-1 hash of the file) in the order that they're created. The files that match don't need to be re-saved. For more on the way bup works, see Pennarun's detailed post about version control of large files that preceded bup's creation.
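
The reason matching chunks need not be re-saved is that each chunk is stored under a name derived from its content. The Python sketch below illustrates that with an in-memory dictionary standing in for the repository. The blob naming follows Git's rule (SHA-1 over a "blob <length>" header, a NUL byte, and the content), but everything else is an assumption for illustration: the save and join helpers are hypothetical, and fixed-size pieces stand in for rolling-checksum splitting.

```python
import hashlib

CHUNK = 8192   # fixed-size pieces, a stand-in for rolling-checksum splitting


def git_blob_id(data):
    """SHA-1 that Git assigns to a blob with this content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def save(store, data):
    """Store unseen chunks; return (ordered chunk ids, number newly stored)."""
    ids, added = [], 0
    for off in range(0, len(data), CHUNK):
        piece = data[off:off + CHUNK]
        oid = git_blob_id(piece)
        if oid not in store:
            store[oid] = piece
            added += 1
        ids.append(oid)
    return ids, added


def join(store, ids):
    """Reassemble a file from its ordered list of chunk ids."""
    return b"".join(store[oid] for oid in ids)


store = {}
v1 = b"".join(bytes([n]) * CHUNK for n in range(10))     # ten distinct chunks
v2 = v1[:3 * CHUNK] + b"\xff" * CHUNK + v1[4 * CHUNK:]   # one chunk rewritten
ids1, added1 = save(store, v1)
ids2, added2 = save(store, v2)
```

Saving the second version adds a single chunk to the store, yet either version can be rebuilt in full from its ordered list of chunk ids.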

Restoring from bup backups

At the moment, it's easier to create backups with bup than to restore from them. That's not to say it's too challenging to get files back, just that the process for restoring files is not as smooth as backing them up in the first place. Bup has a save command that can be used to create a backup set, but lacks a restore command. So, for the time being, it's best to create backups with bup's split command and retrieve files with its join utility.

The other problem with trying to use bup save is that it doesn't preserve file metadata like ownership, links, creation/modification times, etc. The upshot is that files backed up with bup won't have some of the requisite metadata that most users want when restoring from backups.

While bup's incremental backups take up less space than full backups, they still take up space. At the moment, bup has no way to delete older backups or manage the backups in any real way. This means that after a while bup stops being particularly effective at saving space after all.

Users can browse the backups in a number of ways. Bup provides a fuse command for mounting the backups as a directory, and an ftp command for browsing the backups as one would a remote directory via FTP. However, the views do not entirely match up with the actual files. Larger files that have been split are viewed as a top-level directory that has the name of the original file, with sub-directories beneath it that contain the actual data. Unfortunately, even though it uses Git, bup doesn't actually create a standard Git repository from the backups, so it's not possible to use one of the many GUI tools for Git to browse the backups.

At the moment, bup is relatively primitive but looks to be maturing and gaining interest fairly quickly. The project already has a handful of contributors in addition to Pennarun, and the mailing list seems fairly active for such a new program.

The project doesn't have a roadmap, per se, but discussions on the mailing list indicate that a bup restore command should be a reality soon, as well as handling file metadata so restored files retain their dates, ownership, and so on. While bup isn't yet a full-featured backup system, if the project maintains its current momentum, it should be quite useful by the end of 2010.

Comments (13 posted)

Brief items

GNOME 2.30 has been released

GNOME 2.30 has been released, "on schedule, to the day" after the usual six-month development cycle. As the release notes describe, there are lots of new features for users in 2.30, including a default split view mode for Nautilus, background syncing for Tomboy, easier user management, a time-tracker applet, and much more. There are also new features and bug fixes for developers and in the accessibility framework. "To celebrate this release, a GNOME Store has been created. A selection of t-shirts and mugs is available on this store, that is powered by Zazzle: http://www.zazzle.com/gnome". Click below for the full announcement.

Full Story (comments: 7)

digiKam 1.2.0 released

Version 1.2.0 of the digiKam image editor is out. The biggest change appears to be the multi-threading of many image editing tools and the addition of zoomable preview widgets.

Comments (none posted)

Djigzo 1.3.2

Djigzo is a mail transfer agent whose main purpose in life is to encrypt all email in transit. The 1.3.2 release adds some new configuration options, improved virtual appliance, Debian packages, and more.

Comments (none posted)

What to do with the top-right window space: Esfera

The Ubuntu "Ayatana" mailing list is discussing a proposal from Pablo Quirós for a new user interface element to put in the upper right corner of windows which has been recently vacated on Ubuntu systems. The "Esfera" is a large circle which is used to implement a number of gesture-oriented features; details can be found in this PDF file. "Moved in a semicircle from right to left: the window is turned, and the back of it is shown to the user. The back of a window is a new UI concept... The idea is that we have a 'front' side of a window, which is what we normally see, and a 'back' side, which offers some possibilities that there is no space to display in the front side."

Comments (40 posted)

FusionForge 5.0 released

FusionForge is the project hosting and management system formerly known as GForge, formerly known as SourceForge. The 5.0 release is the result of a determined effort to "upstream" improvements found in various instances. These improvements include rewritten version control integration with support for most version control systems, better security, a multi-host search facility, and more.

Full Story (comments: none)

KDE SC 4.4.2 Released

KDE has released a new version of the KDE Software Compilation (KDE SC). "This month's edition of KDE SC is a bugfix and translation update to KDE SC 4.4. KDE SC 4.4.2 is a recommended upgrade for everyone running KDE SC 4.4.1 or earlier versions. As the release only contains bugfixes and translation updates, it will be a safe and pleasant update for everyone. Users around the world will appreciate that KDE SC 4.4.2 multi-language support is more complete. KDE SC 4 is already translated into more than 50 languages, with more to come."

Full Story (comments: 9)

MongoDB 1.4 released

MongoDB is "a scalable, high-performance, open source, dynamic-schema, document-oriented database." The 1.4 release has been announced; it features a number of performance improvements, better replication support, geospatial search support, a number of query language improvements, and more.

Comments (18 posted)

A pile of Mozilla software releases

Mozilla has released new versions of its applications; as usual, they fix a pile of scary security issues. The releases are: Firefox 3.0.19 and 3.5.9 (3.0.19 being the final planned 3.0.x update), Thunderbird 3.0.4, and SeaMonkey 2.0.4.

Comments (1 posted)

NVIDIA deprecates the xf86-video-nv driver

NVIDIA has posted an announcement to the effect that they are no longer interested in working on the "nv" graphics driver. "Our advice to owners of NVIDIA GPUs running Linux is to use the VESA X driver from the time of Linux distribution installation until they can download and install the NVIDIA Linux driver from their distribution repositories or from nvidia.com." No mention of Nouveau, needless to say.

Full Story (comments: 53)

OpenSSL 1.0.0 released

OpenSSL - an encryption toolkit that many of us have been using for years - has finally announced its 1.0 release. "The OpenSSL project team is pleased to announce the release of version 1.0.0 of our open source toolkit for SSL/TLS. This new OpenSSL version is a major release and incorporates many new features as well as major fixes compared to 0.9.8n." The actual changes listed in the announcement are a real alphabet soup ("Streaming ASN1 encode support for PKCS#7 and CMS"), but it's undoubtedly stuff we all really need to have.

Full Story (comments: 53)

Some new Python database adapters

Back in February, LWN reported that the mess of PostgreSQL adapters for Python might finally be clearing up. So it's with mixed feelings that we note the recent releases of:

  • GSQL 0.2.2, an adapter which is focused on "native DBMS access" without using an ODBC layer and "databased objects organised into a tree."

  • ceODBC 2.0, a multi-database ODBC adapter with Python 3 support and a number of DBAPI extensions.

  • py-postgresql 1.0, a pure Python 3 adapter which, happily, has lost its previous name (py_proboscis).

Comments (none posted)

Silva 2.2 released

Silva is a BSD-licensed content management system based on Zope; its advertised features include "versioning, workflow system, integral visual editor, content reuse, sophisticated access control, multi-site management, extensive import/export facilities, fine-grained templating, and hi-res image storage and manipulation." The 2.2 release is now available; enhancements include improved table of contents management, reworked image and link tools, and quite a bit more.

Full Story (comments: none)

Newsletters and articles

Development newsletters

The following newsletters have been received over the last week:

Comments (none posted)

Five questions about building community with Chris Blizzard of Mozilla (opensource.com)

Over at opensource.com, Chris Grams interviews Mozilla's Chris Blizzard about various topics. In five questions, they cover things like how to get involved in a project, what Mozilla's strengths are as a project and community, the role that the project's mission plays, and so on. "If you're interested in helping a project, this is the best thing you can do. Play the part of the generalist, listen a lot, drive change where it's important, and make the biggest difference you can. And always remember: these projects are made up of people, not code, and how you treat others is the most important thing."

Comments (none posted)

Install Multiple 'Bleeding Edge' Firefox Versions in Linux (LXer)

Over at LXer, H. Kwint looks at installing multiple Firefox versions on Linux. He also looks at some features in Firefox 3.7 alpha. "As a last part of my journey into 'bleeding edge' Firefox I wanted to install the 'highly experimental' JaegerMonkey (JM) javascript (js) engine, slated to replace SpiderMonkey in the near future. The story is a bit complex, so here's my short version of it: Firefox comes with a js interpreter called SpiderMonkey - which is slow, and a highly optimizing engine called 'TraceMonkey' which is 'super awesome fast'. However, when something cannot be optimized by 'tracing' by TraceMonkey, Firefox falls back to interpreting using SpiderMonkey, and that's why javascript in Firefox is pretty slow sometimes. The people behind JM hope to solve this by means of 'replacing' the SpiderMonkey interpreter by using Apple Webkits 'Nitro' JIT (Just In Time compiler) instead of interpreting. So, when it makes sense, be 'super awesome fast' by means of tracing, and if not, fall back to 'still really fast' using Nitro."

Comments (none posted)

Walker: GNOME Accessibility Hackfest

Willie Walker has posted a detailed report from the GNOME Accessibility Hackfest. "Some people have suggested that it will be OK if GNOME 3 goes out the door inaccessible, using the analogy that it took GNOME 2 a few releases before it became accessible. I disagree. In my opinion, going out the door inaccessible is a regression and violates the position that accessibility is a core value of GNOME. Having said that, there is a lot of work to do to make GNOME 3 accessible."

Comments (none posted)

Mark Shuttleworth: Less is more. But still less.

In his blog, Mark Shuttleworth writes about removing design elements to reduce clutter on the Ubuntu desktop. "One of the driving mantras for us is 'less is more'. I want us to 'clean up, simplify, streamline, focus' the user experience work that we lead. The idea is to recognize the cost of every bit of chrome, every gradient or animation or line or detail or option or gconf setting. It turns out that all of those extras add some value, but they also add clutter. There's a real cost to them – in attention, in space, in code, in QA. So we're looking for things to strip out, as much (or more) as things to put in."

Comments (31 posted)

Page editor: Jonathan Corbet


Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds