
Mercurial: an alternative to git

September 14, 2005

This article was contributed by Jake Edge.

One of the more visible outcomes of the BitKeeper fiasco earlier this year was the development of git to replace BitKeeper for kernel development. A less prominent but equally capable alternative began development at roughly the same time. Matt Mackall started work on Mercurial just a few days after git was begun, and since then it has made great strides as a distributed source code management system. It has matured to the point where at least one large project, the virtual machine monitor Xen, is using it to manage its code.

Mercurial, like BitKeeper, git, and others, is targeted at projects whose developers are spread out geographically and need to perform source code management functions without the bottleneck of a central repository. Matt adopted the design goals that Linus used for git (speed, distributed operation, and trustability) and added the constraints that it should be CPU, storage, and bandwidth efficient. Mercurial is written in Python, with some C extensions for CPU-intensive pieces, and is fairly small, weighing in at around 7500 lines of code.

Disk-based storage of Mercurial revisions is done using delta-compressed revision logs (revlogs) that are laid out with disk access optimization in mind. The revlogs are stored in a directory structure that mirrors the structure of the project, and filesystems are generally optimized for this kind of access. Over time, fragmentation of revlogs will occur, but a tar or copy of the directory will have the side effect of defragmenting them. Other SCMs that use filenames based on the SHA1 hash of the contents (git, for example) tend to require more disk seeking because file locality is a function of the hash rather than the filename. Because a revlog is smaller than a set of separate objects holding each individual revision of a file, Mercurial uses less bandwidth when syncing repositories as well.
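To make the idea of a delta-compressed revision log concrete, here is a minimal sketch in Python. It is not Mercurial's actual revlog format or code: the class name ToyRevlog, its methods, and the line-based delta encoding (built with the standard difflib module) are all assumptions made for illustration. The point is only that one file's history can live in a single append-only log holding a full first revision followed by deltas.

```python
import difflib


class ToyRevlog:
    """Hypothetical toy log: one full snapshot, then line-based deltas.

    Illustrative only; this is not Mercurial's on-disk revlog format.
    """

    def __init__(self):
        # entries[0] is ("full", lines); later entries are ("delta", ops).
        self.entries = []

    def add(self, text):
        """Append a new revision and return its revision number."""
        lines = text.splitlines(keepends=True)
        if not self.entries:
            self.entries.append(("full", lines))
        else:
            prev = self._lines(len(self.entries) - 1)
            matcher = difflib.SequenceMatcher(None, prev, lines, autojunk=False)
            ops = []
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "equal":
                    # Unchanged run: remember only where it sits in the parent.
                    ops.append(("copy", i1, i2))
                else:
                    # Changed run: store the new lines (empty for a deletion).
                    ops.append(("data", lines[j1:j2]))
            self.entries.append(("delta", ops))
        return len(self.entries) - 1

    def _lines(self, rev):
        """Rebuild revision `rev` by replaying deltas on the first snapshot."""
        lines = self.entries[0][1]
        for _kind, ops in self.entries[1:rev + 1]:
            rebuilt = []
            for op in ops:
                if op[0] == "copy":
                    rebuilt.extend(lines[op[1]:op[2]])
                else:
                    rebuilt.extend(op[1])
            lines = rebuilt
        return lines

    def read(self, rev):
        """Return the full text of revision `rev`."""
        return "".join(self._lines(rev))
```

Reading a revision in this sketch replays every delta since the first snapshot, so the cost grows with the length of the history; a real system has to bound that cost somehow, which the sketch makes no attempt to do.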

A single command, called 'hg' after the chemical symbol for mercury, is the command line interface to Mercurial and provides a consistent set of switches for the various source code management tasks. Users of CVS or Subversion will find it immediately familiar to type commands like 'hg commit' or 'hg update'. There is also the 'hg help' command, which gives a quick overview of the available commands and a summary line for each.

The framework that Mercurial provides will be familiar to anyone who has used a distributed SCM. The push/pull style of development, where tree maintainers pull changes from contributors' feature branches and merge them into their current working tree, is the model best supported by Mercurial. Both HTTP and SSH are supported for network syncing, and the hg command itself can be run as a server to export a repository for pulling via hg and for browsing via the web.

Various extensions and other tools have been created for Mercurial, or, in some cases, ported from git. Visualization tools for examining repositories are available as well as conversion utilities to convert repositories from other SCM systems. Chris Mason's Mercurial Queues extension adds patch management features, similar to quilt, to hg.

Interoperability with git is clearly a feature desired by Matt and the other developers. Matt's intent with Mercurial was to create a tool that he could use for kernel development and, since the various official kernel trees are using git repositories, tools to extract information from git and import it into Mercurial have been created. There is a repository that tracks Linus' git repository for the 2.6 kernel, and there are plans to add a git export feature to Mercurial.

Mercurial has an active development community, a wiki with a great deal of information for new users, and a very responsive mailing list. It is a fast, scalable, easy to use, and generally well thought out system that is being used for kernel and other development. It currently lacks a few features that developers might want (a way to compare repositories for example), but the pace of development has been rapid and these holes are likely to be filled quickly. For anyone who is thinking about using a distributed SCM, Mercurial is definitely worth a look.


Index entries for this article
GuestArticles: Edge, Jake



Mercurial: an alternative to git

Posted Sep 15, 2005 2:39 UTC (Thu) by dlang (guest, #313) [Link] (3 responses)

Matt's posts about the bandwidth comparisons prompted the git team to develop the pack object. from what I have seen about its internals I wouldn't be surprised if the git pack object ends up being smaller than Mercurial in many cases (and is very close in the other cases)

the pack object also goes a long way to address the seek problem that's described.

this should be looked at more closely if either of these factors is significant to you.

Mercurial: an alternative to git

Posted Sep 15, 2005 14:43 UTC (Thu) by bos (guest, #6154) [Link] (2 responses)

Caveat lector: I work on Mercurial a lot.

Yes, git pack objects are often smaller than what Mercurial's storage uses.

The reason is that, as I recall, git packs data for multiple files into a single pack object, which Mercurial does not do.

I believe, from hearsay, that Mercurial's approach has some practical advantages over git's pack files, but as I don't know enough about how git works in this specific area, I won't bother trying to make stuff up :-)

Mercurial: an alternative to git

Posted Sep 15, 2005 21:53 UTC (Thu) by dlang (guest, #313) [Link] (1 responses)

yes git packs multiple files into one object.

it's hard to say which is better, each has its advantages

the git method keeps everything in the pack self-contained so that you don't have to worry about the file becoming worthless because the file it is a diff of gets removed.

while the Mercurial method does everything transparently so the user doesn't have to do anything about it.

the git network interface supports effectively creating a custom pack file for the user and then downloading it, which is better for network bandwidth, at the cost of a little more CPU on the server.

one thing that hasn't been done with git yet (but is being looked at) is the possibility of a pack to include 'unrelated' things. An example of this would be to have pack files that span distros (bash is very close to being the same on all distros for example), which for large archives could have some interesting implications.

Caveat lector: I've been reading the git list since it started, what I know of mercurial is what's been posted there.

Mercurial: an alternative to git

Posted Sep 16, 2005 2:56 UTC (Fri) by bos (guest, #6154) [Link]

Mercurial supports the equivalent of custom pack files, called bundles.

Mercurial: an alternative to git

Posted Sep 15, 2005 13:13 UTC (Thu) by arcticwolf (guest, #8341) [Link] (9 responses)

Interesting article, but I wish there'd been more comparison of git's and mercurial's actual features, not to mention a list of "cons" (which I have no doubt exist, too).

Right now, it sounds like mercurial is the best thing since sliced bread and everyone should convert to it immediately since it's superior to every other SCM out there; a more critical look would've been nice.

Mercurial: an alternative to git

Posted Sep 15, 2005 14:39 UTC (Thu) by bos (guest, #6154) [Link]

Caveat lector: I work on Mercurial a lot.

The sets of features are quite similar, particularly with core features. There are a few differences in things you can do with non-core commands, but nothing terribly significant.

The bigger difference between the two is in how the user interfaces "feel". This doesn't affect the functionality of either system, but I think that Mercurial has a more complete, consistent command line UI than git, and certainly one that's more familiar to CVS or SVN users.

Mercurial: an alternative to git

Posted Sep 15, 2005 19:29 UTC (Thu) by rickmoen (subscriber, #6943) [Link]

IMVAO, git's pack format and repack command are more than a bit kludgey and inelegant. Why should SCM users have to fiddle with such things? Mercurial sidesteps that mess entirely, by being built on deltas in the first place.

Also, because Mercurial runs anywhere Python does, it has an inherent portability advantage over git/Cogito (Bourne shell and other Unixisms). That design limitation of git/Cogito doesn't bother me, but does limit deployment for others.

Mercurial's progress in a very short time has been impressive: I'm keeping an eye on that and Bazaar 2/bzr (né Bazaar-NG) as the best hopes. Others will very defensibly favour darcs, Codeville, Monotone, ArX.

Rick Moen
rick@linuxmafia.com

... and the others?

Posted Sep 16, 2005 8:11 UTC (Fri) by ncm (guest, #165) [Link] (6 responses)

Several other projects with similar goals started well before the BitKeeper fiasco came to a head, among them Arch/Bazaar, Monotone, Codeville, and Darcs, and ought to be a lot more mature than Git or Mercurial. Monotone, in particular, got a lot of attention about the time Git and Hg started, mainly for being taken seriously but found to be too slow. I know it has since been sped up enormously, and got lots of important user-interface features.

Have the earlier projects turned out to be prototypes for a more modern, radically simpler generation of production-ready systems? Or should we consider them on even footing with Hg, expecting the latter to complexify to match as it matures and gains important features? Or, are Git and Hg interesting mainly to kernel maintainers, while those of us with more typical needs will be better off with one of the more mature products?

It seems clear there are too many of these projects, and some will stall as everyone's respective itches get scratched. How many should we expect will still be vigorous in, say, three years' time, after adopting the important features of the others and winning over current CVS holdouts? I.e., how many ecological niches are there, really?

... and the others?

Posted Sep 17, 2005 1:35 UTC (Sat) by rickmoen (subscriber, #6943) [Link]

Nathan:

Monotone: On the plus side, it's in a halfway reasonable language (C++, with "hooks" in Lua), and the gripe about performance seems unfair, as that was a temporary glitch that is long gone. On the minus side, when last I heard, it was still considered a bit beta-ish; requires a dedicated server component rather than being reachable over commodity http; and identifies changesets by their SHA1 checksums, which is a bit cluttered.

GNU Arch ("tla"): Seemingly moribund, given first the departure of many developers to Bazaar 1.x ("baz"), and then more recently the resignation of Tom Lord. Tom prototyped his own GNU Arch 2.0 redesign, dubbed "revc" (to fix some of tla's more hideous misfeatures, IMVAO), but nobody's yet adopted it in the wake of Tom's departure. Maybe someone from the tla-user community will comment, but all I'm seeing is unhappiness and a slow exodus to elsewhere (especially baz/bzr and ArX), among this camp.

Bazaar 1.x ("baz"): Recently back-burnered by Canonical in favour of Martin Pool's more-ambitious Bazaar 2 "bzr" (formerly Bazaar-NG) project as the intended successor. ("baz" has been declared to be in "maintenance mode", as opposed to being actively developed.)

Codeville: Close to mature, but still has a to-do list. E.g., last I heard, non-ASCII files and some file metadata still weren't handled. Last I heard, didn't have much docs. Intriguing, advanced merge algorithm that should be studied more widely. Uses SRP as network transport; I'm going to have to read Bram and Ross's reasons before I decide what I think about that. (Cannot be accessed over commodity http, anyway.)

darcs: Drawbacks: How many people can hack Haskell? Stores repository metadata within the checked-out working area. Nagging performance problems (sometimes). Advantages: Good all-around system. Tracks inter-patch dependencies well. Patches from others can retain their separate identity even after integration (are not collapsed/rolled up). Implementation of "cherry-picking" is exemplary.

It's difficult to judge how broad the appeal is, of most of these things: However, I do know that Hg is in commercial, production usage at one non-kernel software house of my acquaintance (Xenworks).

Anyhow, I try to keep up, as best I can. (If I'm judged misinformed and in desperate need of reeducation, I wouldn't be the least bit surprised. It might even be true.)

Rick Moen
rick@linuxmafia.com

... and the others?

Posted Sep 17, 2005 3:30 UTC (Sat) by kevinbsmith (guest, #4778) [Link] (4 responses)

I think Rick Moen's post is a pretty fair summary of many of the other systems.

In my mind, the big distinction between the first generation and the second is simplicity. Simplicity of UI, and simplicity of the underlying model. GNU Arch (tla) was extremely large and complex. ArX and Baz forked largely to simplify it, but they are both still somewhat on the heavy side. Monotone and Codeville have simpler UIs, but they require a server daemon, which makes them feel non-minimal.

Git (and cogito) popularized the "lightweight" tool, helping increase interest in mercurial and bzr. Of course, darcs had a great (simple) UI long before it became cool to have one, and it remains the most mature of the lightweight options.

There is a revctrl list/wiki/irc for mostly-technical cross-system SCM discussions:
http://revctrl.org/
A recent thread discussed which of these apps are likely to survive, but it mostly just emphasized the uncertainty.

Personally, I'm happy to see all these projects advancing so rapidly. None of them quite have all the features I need yet, but I'm optimistic that at least one will within the next few months. I doubt we will see a single dominant distributed SCM tool emerge for at least a year or two, due to different projects having distinctly different requirements.

... and the others?

Posted Sep 22, 2005 18:35 UTC (Thu) by bos (guest, #6154) [Link] (2 responses)

Kevin, if you need features, just ask :-)

Missing features

Posted Sep 23, 2005 12:54 UTC (Fri) by kevinbsmith (guest, #4778) [Link] (1 responses)

Actually, I have asked. I have been on the ArX, mercurial, and bazaar-ng lists for quite a while now. I was on the git/cogito list until it became clear to me that nobody was working on a user-friendly, command-line front end (such as cogito) that would be cross-platform.

One feature request for all systems is to have an Eclipse plugin. As far as I know, only darcs has one, and it is still very preliminary.

I have requested that ArX get a simpler UI. The maintainer agrees that's a good idea, but it will take some serious reworking to achieve.

I have requested that mercurial support cheap branching on systems that don't have hardlinks. My three posts to the mercurial list asking about the feasibility of adding this feature have all, surprisingly, gone unanswered.

The bazaar2 folks have explained their plans to solve that cheap branching problem by adding "centralized storage". I believe they even have some prototypes working, but it looks like it's still a couple months away from being an official part of the product.

There is an experimental monotone add-on that supposedly allows you to serve a readonly repo on a cheap (http-only) web server. If that becomes mature, it might make monotone workable for me. Something similar could presumably be written for codeville.

Missing features

Posted Sep 23, 2005 16:28 UTC (Fri) by bos (guest, #6154) [Link]

We'd all love to have Eclipse plugins, I'm sure, but the unfortunate fact is that the current user communities of the various tools have not contributed any. Whether this is due to lack of interest, time, or experience I cannot say.

If you wanted to write one yourself, I am sure it would be very welcome.

... and the others?

Posted Sep 22, 2005 18:51 UTC (Thu) by Omnifarious (guest, #19508) [Link]

I've looked briefly at these other systems, and all of them seemed too complex to be worth using. The only one I don't have any experience with is Monotone.

In contrast, Mercurial was simple to set up and easy to use. I'm a Subversion user from the early days of Subversion, and it was much easier than Subversion to set up for the first time.

Then, as I started playing more with it, it became quite obvious and clear how I could solve the "I have a work machine, a home machine, and a laptop, and I work on all of them and don't always remember to sync." problem. After that, I was hooked.

None of the other systems I've used have even come close to the external elegance and simplicity of Mercurial. And as I look deeper into its design, it's clear that its external coherency is a reflection of a set of well-thought-out design principles. So, I guess I'm a convert and can be put into the "It's the greatest thing since sliced bread!" category.

I can understand caution though. It's quite possible there is some fundamental design problem that I'll encounter after I understand it enough. I felt similarly about Java in the mid 90s, and it took me a few years to realize what was wrong with it.

Mercurial: an alternative to git

Posted Sep 15, 2005 13:21 UTC (Thu) by job (guest, #670) [Link]

This looks very interesting, but wrapping my head around the whole distributed-VC idea is a bit too much for a Thursday afternoon. Perhaps it would be a good idea for a future LWN article to cover the basics of how to work distributed and why it is different from centralized VCS. Mercurial looks like a very good starting point.

Mercurial: an alternative to git

Posted Sep 15, 2005 14:54 UTC (Thu) by bos (guest, #6154) [Link] (1 responses)

A small addendum to Jake's review: Mercurial has commands called incoming and outgoing to see what changes are available to propagate from one repository to another.

These aren't complete replacements for a "give me one diff that shows how this repository differs from that one" kind of command, but they do give you roughly the same information.

Mercurial: an alternative to git

Posted Oct 3, 2005 12:10 UTC (Mon) by arafel (subscriber, #18557) [Link]

While this is true, there are limitations. :-)
paul@nova:~/src/mutt$ hg incoming
abort: incoming doesn't work for remote repositories yet

Use cases?

Posted Sep 16, 2005 18:51 UTC (Fri) by erich (guest, #7127) [Link]

Yeah, I'd like to see an article on different use cases for SCM (including /etc management) and explain the differences on these use cases.
E.g. how to do feature branches in them, how to merge them etc.
One of the things I am concerned about with distributed SCM is that well, you don't have a central repository. ;-)
I.e. you don't have a storage where you could possibly see all revisions of a file. That you can backup, where you have a directory for all versions...
In a FLOSS environment this is unrealistic, but in a corporate?

wish: better access control

Posted Sep 16, 2005 22:24 UTC (Fri) by dann (guest, #11621) [Link] (8 responses)

One thing that seems to be missing in all the new VCS is access control.
(Well, openCM has it but it does not seem to be developed anymore).

It would be nice to be able to specify fine grain permissions to a VCS.

Here are some example use cases:
- some random people decide to work together on a project, it would be nice to be able to access the server without having to create a UNIX account, set up a webserver...
- restricting access to some files/directories - for example people doing just translations don't need to have access to the code (just to minimize accidents).
- restricting access to a branch - for security related branches, when a problem is discovered and before it is fixed, just a few select people need to know about it
- enforce a freeze before a release: the release manager can just make the branch read-only

wish: better access control

Posted Sep 17, 2005 23:03 UTC (Sat) by kevinbsmith (guest, #4778) [Link] (5 responses)

Distributed VCS automatically gives developers the ability to work effectively without accounts on a central server. That means that contributors can work on whatever they want, with no risk of damaging the official tree.

In several distributed VCS systems, a branch is a directory. In that case, it's pretty easy to control access, including marking the whole branch as readonly.

Distributed VCS is really a whole new paradigm, and it takes a while for most people to even start to understand how to use this new tool effectively. It's not appropriate for all projects, but personally I think it is a big improvement for most FLOSS projects (where potential contributors may not be trusted yet), and for many/most projects of any kind where the developers are geographically distributed.

wish: better access control

Posted Sep 18, 2005 6:23 UTC (Sun) by dann (guest, #11621) [Link] (4 responses)

The fact is that when multiple people work on the same project their work
must be somehow merged together in a single place. If more than one person
has write access to that place, then it would be nice to have a way to control access.

wish: better access control

Posted Sep 18, 2005 14:26 UTC (Sun) by man_ls (guest, #15091) [Link] (2 responses)

I think the way to go in that case would be to have a "team leader" that picks the relevant changes from everyone and applies them to her own tree. So no need to have access control there. But maybe you are thinking of a different scenario.

wish: better access control

Posted Sep 18, 2005 15:40 UTC (Sun) by dann (guest, #11621) [Link] (1 responses)

Having a single leader that applies all the changes is simply NOT the way
a lot of (probably most) software is developed. Allowing more than one person to make changes is very important.

wish: better access control

Posted Sep 18, 2005 16:40 UTC (Sun) by man_ls (guest, #15091) [Link]

It is not how software is developed nowadays, but simply because we use the old paradigm of "single repository -- multiple branches". If we switched to a new distributed paradigm, where a developer publishes her changes and a leader imports them in order to make a version, these new tools would become indispensable. But don't you think that applying the old way of thinking to the new source management model would just add needless complexity?

wish: better access control

Posted Sep 22, 2005 18:39 UTC (Thu) by bos (guest, #6154) [Link]

Merging work and write access are orthogonal concepts.

Here's how merging works in a distributed SCM.

You publish your changes somewhere, and tell me. I pull them over, and merge my changes in. I publish the merged result, and tell you. You pull the results of the merge.

Now we both have your changes and mine, but at no point did either of us have write access to the other's storage.

Another way of approaching the issue: we both have write access to a shared server. However, in many systems, we can't push changes to the server without merging first, so the server cannot get into a messy unmerged state.
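The publish-and-pull exchange described above can be sketched in a few lines of Python. This is a deliberately simplified model, not how Mercurial or any other tool is implemented: the Repo class, its methods, and the changeset names are all hypothetical, and a repository is reduced to a plain set of changeset names. What it illustrates is that a pull only reads from the other party's repository and only writes to one's own.

```python
class Repo:
    """Hypothetical model of a repository as a set of changeset names."""

    def __init__(self):
        self.changesets = set()

    def commit(self, name):
        """Record a new changeset in this repository only."""
        self.changesets.add(name)

    def pull(self, source):
        """Copy changesets we lack from `source`; `source` is never modified."""
        new = source.changesets - self.changesets
        self.changesets |= new
        return new


yours = Repo()
mine = Repo()
yours.commit("your-change")   # you publish your changes
mine.commit("my-change")

mine.pull(yours)              # I pull them over...
mine.commit("merge")          # ...and record the merge in my own repository
yours.pull(mine)              # you pull the results of the merge

# Both sides now hold the same history, yet neither wrote to the other's storage.
assert yours.changesets == mine.changesets
```

A real merge records its parent changesets and may need conflicts resolved by hand; the sketch leaves all of that out and tracks only who holds which changesets.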

wish: better access control

Posted Sep 21, 2005 11:08 UTC (Wed) by droundy (subscriber, #4559) [Link] (1 responses)

I consider this an advantage of the new VCS (looking from a different perspective). By avoiding
integration of "user accounts" into the VCS itself, you can choose between any of the existing
systems that you already use for user authentication---unix groups, sudo, ssh public key
authentication, gpg signatures. This doesn't give the fine-grained control that you'd like, but I
don't really see a pressing need for that.

David

wish: better access control

Posted Sep 22, 2005 18:59 UTC (Thu) by Omnifarious (guest, #19508) [Link]

I've come to the conclusion that in a distributed SCM, fine grained access control and permissions management shouldn't be a design goal. There are better and easier ways of achieving the same results with a distributed SCM.

What should be a design goal is clear ownership of a patch or changeset, and that can easily be accomplished in most such systems with digital signatures.

Mercurial: an alternative to git

Posted Sep 30, 2005 23:48 UTC (Fri) by rickmoen (subscriber, #6943) [Link]

I just wanted to revisit this thread to note Bryan O'Sullivan's comment just posted to the Mercurial mailing list (http://www.selenic.com/pipermail/mercurial/2005-September/004745.html):

As I mentioned the other day, I will not be contributing to Mercurial development for a while. Several people have asked me why.

At my workplace, we use a commercial SCM tool called BitKeeper to manage a number of source trees. Last week, Larry McVoy (the CEO of BitMover, which produces BitKeeper) contacted my company's management.

Larry expressed concern that I might be moving BitKeeper technology into Mercurial. In a phone conversation that followed, I told Larry that of course I hadn't done so.

However, Larry conveyed his very legitimate worry that a fast, stable open source project such as Mercurial poses a threat to his business, and that he considered it "unacceptable" that an employee of a customer should work on a free project that he sees as competing.

To avoid any possible perception of conflict, I have volunteered to Larry that as long as I continue to use the commercial version of BitKeeper, I will not contribute to the development of Mercurial.

As such, Mercurial can stand entirely on its own merits in comparison to BitKeeper. This, I am sure, is a situation that we would all prefer.

The implications for commercial customers' relationship with BitMover are left as an exercise for the reader.

Rick Moen
rick@linuxmafia.com


Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds