September 14, 2005
This article was contributed by Jake Edge.
One of the more visible outcomes of the BitKeeper fiasco earlier this year
was the development of
git
to replace the use of BitKeeper for kernel development. A less prominent,
but equally capable alternative began development at roughly the same time.
Matt Mackall started work on
Mercurial just a few
days after git, and since then it has made great strides
as a distributed source code management system. It has matured to the
point where at least one large project, the virtual machine monitor
Xen, is using it to manage its code.
Mercurial, like BitKeeper, git, and others, is targeted at projects where
the developers are spread out geographically and need to be able to
perform source code management functions without the bottleneck of a
central repository. Matt adopted the design goals that Linus
used
for git (speed, distributed operation, and trustability) and added the
constraints that it should be CPU, storage, and bandwidth
efficient. Mercurial is written in Python, with some C extensions for the
CPU-intensive pieces, and is fairly small, weighing in at around 7,500 lines
of code.
Disk-based storage of Mercurial revisions is done using delta-compressed
revision logs (revlogs) that are laid out with disk access optimization in
mind. The revlogs are stored in a directory structure that mirrors the
structure of the project, and filesystems are generally optimized for this
kind of access. Over time, fragmentation of revlogs will occur, but a
tar or copy of the directory will have the side effect of defragmentation.
Other SCMs that use filenames based on the SHA1 hash of the contents (git,
for example) tend to require more disk seeking because file locality is
a function of the hash rather than the filename.
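The locality argument can be illustrated with a small sketch. The two layouts below are hypothetical simplifications invented for this example (the `.log` suffix, the store names, and the two-character fan-out directory are all assumptions), not the actual on-disk format of Mercurial or git. The point is only that a path derived from the filename keeps neighbouring source files next to each other, while a path derived from a content hash places them wherever the hash happens to fall.

```python
import hashlib
from pathlib import PurePosixPath


def mirrored_path(store, filename):
    # Hypothetical layout: one log per tracked file, stored at the same
    # relative path the file has inside the project.
    return PurePosixPath(store) / (filename + ".log")


def hashed_path(store, content):
    # Hypothetical layout: the object is named after the SHA-1 of its
    # contents, so its location says nothing about the source filename.
    digest = hashlib.sha1(content).hexdigest()
    return PurePosixPath(store) / digest[:2] / digest[2:]


# Two files that sit side by side in the project tree.
files = {
    "drivers/net/a.c": b"int a;\n",
    "drivers/net/b.c": b"int b;\n",
}

for name, content in files.items():
    print(mirrored_path("store", name), hashed_path("objects", content))
```

In the mirrored layout both logs land in the same directory; in the hashed layout the directory of each object depends only on its contents.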
Because revlogs take less space than storing each individual revision of a
file as a separate object, Mercurial also uses less bandwidth when syncing
repositories.
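To make the idea of a delta-compressed revision log concrete, here is a toy sketch. It is not Mercurial's revlog format: the `ToyRevlog` class, its method names, and the line-based deltas computed with Python's `difflib` are all assumptions chosen for illustration. It stores the first revision in full and every later revision only as the regions that changed, which is why such a log tends to be smaller than one full copy per revision.

```python
import difflib


class ToyRevlog:
    """Toy append-only log: full text for revision 0, deltas afterwards."""

    def __init__(self):
        # entries[0] is a list of lines; later entries are delta lists.
        self.entries = []

    def _make_delta(self, old, new):
        # Keep only the changed regions of `old` and their replacement lines.
        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
        return [(i1, i2, new[j1:j2])
                for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                if tag != "equal"]

    def _apply_delta(self, old, delta):
        # Rebuild the newer revision by splicing replacements into `old`.
        out, pos = [], 0
        for start, end, replacement in delta:
            out.extend(old[pos:start])
            out.extend(replacement)
            pos = end
        out.extend(old[pos:])
        return out

    def _lines(self, rev):
        # Start from the full first revision and replay deltas up to `rev`.
        lines = self.entries[0]
        for delta in self.entries[1:rev + 1]:
            lines = self._apply_delta(lines, delta)
        return lines

    def add(self, text):
        lines = text.splitlines(keepends=True)
        if not self.entries:
            self.entries.append(lines)
        else:
            previous = self._lines(len(self.entries) - 1)
            self.entries.append(self._make_delta(previous, lines))
        return len(self.entries) - 1

    def read(self, rev):
        return "".join(self._lines(rev))


log = ToyRevlog()
r0 = log.add("a\nb\nc\n")
r1 = log.add("a\nB\nc\nd\n")
r2 = log.add("a\nc\nd\n")
print(log.read(r1), end="")
```

Reading a revision replays every delta from the beginning, so this sketch trades read time for space; a real system would need some way to bound that cost.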
A single command, called 'hg' after the chemical symbol for mercury, is the
command line interface to Mercurial and provides a consistent set of
switches used for various source code management tasks. Users of CVS or
Subversion will find it immediately familiar to type commands like 'hg commit'
or 'hg update'. The 'hg help' command gives a quick
overview of the available commands, with a summary line for each of the
individual commands.
The framework that Mercurial provides will be familiar to anyone who has used
a distributed SCM. The push/pull style of development, where tree maintainers
pull changes from contributors' feature branches and merge them into their
current working tree, is the model best supported by Mercurial. Both HTTP and
SSH are supported for network syncing, and the hg command itself can be run
as a server to export a repository for pulling via hg and for browsing
via the web.
Various extensions and other tools have been created for Mercurial, or, in
some cases, ported from git. Visualization tools for examining repositories
are available, as are utilities to convert repositories from
other SCM systems. Chris Mason's
Mercurial
Queues extension adds patch management features, similar to
quilt, to hg.
Interoperability with git is clearly a feature desired by Matt and the other
developers. Matt's intent with Mercurial was to create a tool that he
could use for kernel development and, since the various official kernel
trees are kept in git repositories, tools to import information from git
into Mercurial have been created. There is a
repository that tracks
Linus' git repository for the 2.6 kernel and there are plans to add a git
export feature to Mercurial.
Mercurial has an active development community, a
wiki with a great deal
of information for new users, and a very responsive
mailing list.
It is a fast, scalable, easy to use, and generally well thought out
system that is being used for kernel and other development. It currently
lacks a few features that developers might want (a way to compare
repositories, for example), but the pace of development has been rapid
and these holes are likely to be filled quickly. For anyone who is
thinking about using a distributed SCM, Mercurial is definitely worth
a look.