
Looking Past CVS: The Future is Distributed

November 30, 2004

This article was contributed by Mark Stosberg

The field of alternatives to CVS has exploded. Alternatives have been documented and compared, but the trends deserve further analysis.

It's truly a critical moment, as the winds of change sweep over the landscape of source control. Major projects such as PostgreSQL, KDE, and Emacs are discussing dumping CVS for an alternative. Smaller projects, such as wxRuby and Rhythmbox, have already switched.

A Source Control Management (SCM) system is important because this software choice affects a whole group of developers, and changing systems can be very disruptive to a project. The larger the project, the greater the inertia, and the higher the cost to switch.

Here's my analysis of trends that will emerge:

What won't happen: No "CVS replacement" will emerge, at least not with the dominance that CVS has had. Instead, the field will follow the pattern set by scripting languages. Although Perl long dominated that category, other languages dared to challenge the heavyweight, and they have prospered. Python, PHP, and Ruby are all doing well, with growing communities building up around them.

Don't expect to see one clear SCM leader, with the rest hopelessly out of sight in terms of popularity and usability. Many sufficiently capable alternatives are emerging. The diverse environment we will see will play a part in determining which projects stand out. Those projects that grasp the importance of playing well with other SCMs will see increased popularity.

The young svk project seems to understand this issue. They integrate with VCP, a framework designed for interchanging formats of various SCMs. Svk is being designed so that at maturity, you will be able to use it as a client for several other SCMs.

Consider the following situation for a typical open-source programmer: The programmer would like to contribute to one project that uses CVS, another which uses Subversion, and a third which uses Arch. Rather than learning all three, she can use svk, reduce her overhead time, and improve her overall efficiency. Currently, svk can mirror a CVS archive, but not perform a "commit through" on your changes.

As people contribute to this 'glue' project, it will be easier for participating SCMs to update their own offerings to allow better interoperability.

One important trend is the removal of the "single central server" limitation of CVS. New distributed systems allow developers to share changes in a peer-to-peer mode without going through a central server. This feature will gain prominence for two reasons. Most importantly, the centralized model is a subset of what a distributed system can do, so users don't have to pick an "either/or" solution. Also, a distributed design maps extremely well onto the organic global network of open source software development.

Developers who do not have "commit access" benefit from distributed systems because they are given a much expanded toolkit, giving them access to the same command set that the core developers have. With better tools for more developers, more time can be spent writing code instead of managing it.

Distributed SCMs should be equally beneficial to corporations, with their increasingly distributed structures. More activity can happen locally to the developers, making a fast link to a distant central server less critical for developer productivity.

I have followed two distributed SCMs in particular, Arch and Darcs. Arch currently has a larger user base, and Arch repositories exist for popular projects such as the Emacs and Vim editors. Arch is also noticeably more complex to set up and use.

Darcs, which just turned 1.0, shines because of its ease of use, clear documentation, and powerful underlying unique "theory of patches". Svk is working on emulating the Darcs interface, while Arch would like to support the Darcs patch handling features.

It's not all roses for Darcs, though. While it receives praise for use on small projects, it is known to hang for hours on large trees like the Linux kernel as well as when large scale conflicts occur.

Colin Walters, an Arch hacker, shares my vision of a distributed future. He concluded recently: "The contender for the future of free software revision control is still very much up in the air."

This much is clear: If you are still using CVS, it's time to evaluate the alternatives, and think distributed.



Disappointing article by LWN standards

Posted Dec 2, 2004 4:32 UTC (Thu) by louie (subscriber, #3285) [Link]

This is, frankly, the least meaty article I've ever read on LWN. There would be real value in a 'Grumpy Editor' style revision control article, which is what I expected here, but it just came across as a poorly researched, poorly written opinion piece. The blog post by Colin Walters linked in the article ('The State Of Free Software Revision Control') is briefer and yet more informative on the big picture of revision control in general, and his 'The Future is Distributed' (also linked) is again more concise and yet more explanatory on the subject of distributed repositories and why that is so important. Go read those if you want to have a good grip on the state of things.

[Havoc Pennington also has some interesting thoughts on the subject, well worth reading if you're interested in the bigger revision control picture and some interesting thinking about 'where next.']

[Oh, and someone drew up a great chart comparing features of the big systems, but I can't seem to find it right now.]

monotone, too

Posted Dec 2, 2004 5:14 UTC (Thu) by ncm (subscriber, #165) [Link]

Let us not neglect Graydon Hoare's admirable Monotone.

Also, don't miss Zooko's list of alternatives, or the more inclusive Linuxmafia list.

Looking Past CVS: The Future is Distributed

Posted Dec 2, 2004 8:59 UTC (Thu) by ekj (subscriber, #1524) [Link]

There used to be CVS -- now there's a bunch of competitors, most of which are distributed, all of which have different feature-sets. In the future these projects will compete.

This article doesn't really tell anyone anything they didn't already know. At least, I read it and emerged with no new information, as well as no clear idea what the author is trying to say.

Looking Past CVS: The Future is Distributed

Posted Dec 2, 2004 10:41 UTC (Thu) by njd27 (subscriber, #5770) [Link]

This is an interesting article - the comments contain links that are far more interesting than the article itself.

Perhaps what we need is WikiLWN, so we can migrate the improvements into the article itself?

Looking Past CVS: The Future is Distributed

Posted Dec 2, 2004 12:46 UTC (Thu) by zezaz (guest, #5465) [Link]

I liked this article.
- it is synthetic (so if you already know this topic, you learn nothing, but it is the way it should be),
- it lets you go deeper with good links.

I enjoy more technical articles sometimes too, but there are already some in LWN. And if two or three SCMs emerge, I am sure that LWN will cover them more deeply.

Keep doing the good work!

Looking Past CVS: The Future is Distributed

Posted Dec 2, 2004 13:07 UTC (Thu) by aleXXX (subscriber, #2742) [Link]

Well, why should the future of SCM's be distributed? I know more or less nothing about distributed SCM's, but I think it is a good idea to have one central server where the current official version of the sources lives. If you need the current version, get it from this server. Which advantages do distributed systems offer?

One thing about CVS: it has some problems, and these problems are known. It's damn easy to set up and to run its daemon, and all files exist as plain text files on the server. If something goes wrong, you can just go and delete/move/rename/whatever the files on the server directly. That's currently my main "fear" of svn: I read about the installation, and found something about Apache, LDAP (or was it WebDAV? I'm no expert in these topics) and that the sources are kept in a database. So if I switched to svn, I would switch to an SCM which does a lot of things I don't understand and where I can't "debug" stuff. So I'll stay with CVS for the near future.

Alex

Fear of the future

Posted Dec 2, 2004 13:35 UTC (Thu) by ncm (subscriber, #165) [Link]

The cure for fear is learning.

We have a lot of experience with CVS, and know what problems it has, and some people have good ideas of how to fix them. A distributed system is easy to make work centralized, but the reverse has not been demonstrated. CVS's flat files are reassuring, but Arch's are no less so. What's different is that if you mv a CVS file you have made it impossible to reconstitute, or branch, old versions.

Monotone puts its stuff in a database file, but that's easily extracted as a text file and operated upon. It doesn't depend on setting up http servers or anything. Subversion puts its data in a BDB file, though, which makes me a bit queasy too. It can use a DAV server attached to Apache, but it doesn't need it.

Certainly these are not mature systems. There are lots of things to tune or add. The only way to know what they need is to use them, and discover it, and then code it. Some may take the wrong path, but at least one will emerge as industrially ready. They will keep one another "honest", because anybody who cares enough can compare them, and the laggard will either stagnate or adapt. CVS may be familiar, but it's also practically impervious to improvements. The code has just got too crufty, and the design is not compatible with improvements; it stagnated long ago.

Someday soon you'll be as comfortable with one of the new ones as with CVS, and the thought of going back to CVS will make you queasy instead.

Looking Past CVS: The Future is Distributed

Posted Dec 2, 2004 16:37 UTC (Thu) by gdt (guest, #6284) [Link]

Well, why should the future of SCM's be distributed?

Because it's an easy feature to add. The fundamental change isn't the move from centralised to distributed source management, but the change in emphasis from files to patchsets. Once you've got a patchset-oriented configuration control system then you get the distributed feature for near-free.

And the future of CCS's is patchsets; it's simply a better paradigm.

...a good idea to have one central server where the current official version of the sources lives

And a distributed CCS gives you that. But what you also get is the ability to trivially take a copy of that "official" source and modify it. When you are done you can provide a set of changes back to the official source, a set of changes which the official maintainer finds simple to integrate. And it's this trick that CVS doesn't do well. As a small experiment, take a copy of a CVS tree of a big project, spend a month working on a change, and see how hard it is to patch back to the official-but-improved CVS.
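To make the patchset idea concrete, here is a minimal sketch in Python. It is a toy model under loud assumptions, not how any of the systems named here actually store changes: a tree is a plain dict from path to file contents, a changeset works at whole-file granularity, and `make_changeset` and `apply_changeset` are hypothetical helper names invented for this illustration.

```python
def make_changeset(base, modified):
    """Files added or changed in `modified` relative to `base`.

    Toy model: whole-file granularity, deletions are ignored for brevity.
    """
    return {path: text for path, text in modified.items() if base.get(path) != text}


def apply_changeset(base, current, changeset):
    """Apply a changeset made against `base` to `current`, which may have moved on."""
    result = dict(current)
    conflicts = []
    for path, text in changeset.items():
        if current.get(path) == base.get(path):
            # Upstream has not touched this file since the copy was taken.
            result[path] = text
        elif current.get(path) != text:
            # Both sides changed the same file in different ways.
            conflicts.append(path)
    return result, conflicts


# The "official" tree at the moment the private copy was taken.
base = {"main.c": "v1", "util.c": "v1", "README": "v1"}

# A month of private work: one file changed, one file added.
mine = {"main.c": "v1", "util.c": "v2-mine", "README": "v1", "new.c": "v1"}

# Meanwhile the official tree changed a different file.
upstream = {"main.c": "v2-upstream", "util.c": "v1", "README": "v1"}

changeset = make_changeset(base, mine)
merged, conflicts = apply_changeset(base, upstream, changeset)
```

Because the changeset records only what differs from the common starting point, the maintainer's side of the merge touches only those files, and a conflict is reported only where both sides changed the same file.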

I really have trouble with your notion that CVS is easy to set up. I've found it near-impossible to securely configure an anonymous CVS which allows a few maintainers to update the repository.

In a large company SVN leveraging off Apache is a good thing. Sysadmins already have authentication and authorisation configured for Apache and extending this to SVN is trivial. So you get to use your "real" userid and password. And it runs over HTTPS so there's no fiddling about with SSH tunnels and craziness.

The opaqueness of the SVN repository is a practical problem, especially if you want to invisibly repair a stuff-up (such as creating a directory tree in the wrong place, something SVN's syntax makes easy to do). Once you've got a big repository the only choice is to pretend the error was a project change. I imagine you can check out a revision of /branch rather than /${project}/branch out of most SVN repos.

To my mind the problem with the new batch of systems is that they don't play nicely together. When CVS was the only free game in town, all the interesting tools which add value to the configuration control system worked with CVS. But now there's no single interface for tool writers to target. The IETFly-correct might target DeltaV, but there's no real support for that protocol outside of SVN.

Configuration control brings some basic programmer productivity gains. But to improve process productivity we need tools which work on top of the CCS. Stuff that answers questions like "which module costs the most to maintain", "how long did it take to make change request Blah", "who is responsible for the change that broke the nightly regression tests". At the moment there's no viable target for authors of those interesting tools to aim at, and there seems little interest by the authors of this generation of CCS in interoperation or standardised interfaces.

Looking Past CVS: The Future is Distributed

Posted Dec 9, 2004 22:23 UTC (Thu) by anton (guest, #25547) [Link]

I really have trouble with your notion that CVS is easy to set up. I've found it near-impossible to securely configure an anonymous CVS which allows a few maintainers to update the repository.

If you find that hard, don't do it. Give accounts to your maintainers, and let them use it with ssh.

In a large company SVN leveraging off Apache is a good thing. Sysadmins already have authentication and authorisation configured for Apache and extending this to SVN is trivial. So you get to use your "real" userid and password. And it runs over HTTPS so there's no fiddling about with SSH tunnels and craziness.

It runs over what? Our sysadmin wouldn't know how to make Apache authentication use our "real" userid, and I don't know it, either.

We have sshd running on every server, and Apache on exactly one, and that's a pretty crufty machine, where Subversion is guaranteed not to install (it's even too picky for my Fedora Core 1 AMD64 box, and my bug report about that to users@subversion.tigris.org vanished into the void).

In contrast, using CVS over ssh is easy and trouble-free, and I don't even have to use a password (thanks to .ssh/authorized_keys). No tunneling necessary, just set

    export CVS_RSH=ssh

and use a command like the following for checkout:

    cvs -d :ext:user@host:root-path checkout directory

Any chance that svn will ever be as convenient and simple?

Looking Past CVS: The Future is Distributed

Posted Jan 24, 2005 19:11 UTC (Mon) by jhohm (guest, #7225) [Link]

    export CVS_RSH=ssh
    cvs -d :ext:user@host:root-path checkout directory

Any chance that svn will ever be as convenient and simple?

How about this:

    svn checkout svn+ssh://user@host/root-path directory

I'd say that's just as convenient and simple.

Looking Past CVS: The Future is Distributed

Posted Dec 2, 2004 17:13 UTC (Thu) by Wummel (subscriber, #7591) [Link]

Regarding your fear of putting the data in a database: since version 1.1, Subversion has had a plain file backend.

Looking Past CVS: The Future is Distributed

Posted Dec 2, 2004 17:55 UTC (Thu) by vmole (guest, #111) [Link]

But the files are hardly transparent. Which isn't to imply that the FSFS backend isn't a useful thing, but it's not equivalent to the CVS flatfiles.

Looking Past CVS: The Future is Distributed

Posted Dec 2, 2004 22:38 UTC (Thu) by iabervon (subscriber, #722) [Link]

IMHO, the biggest advantage of a distributed SCM is the ability to version control things that don't work yet. If you're working on a big project with a lot of people, and you need to make a change which will take three days and not build successfully while you're working on it, and you're using CVS, you can't check in each of the steps or check in each day's work without breaking the tree for everyone else. This means that you don't end up with a good commit message (set), because you've got changes in there that you haven't thought about for days, and you have to make sure you don't mess things up. If you accidentally delete something in the evening that you wrote in the morning (assuming overnight backups), you have to redo it.

With a distributed SCM, you can check in each logical change to your own subrepository, and then check your complete set into the official repository when it works. A distributed SCM doesn't mean that there isn't a central server; it means that there isn't *only* a central server.
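The workflow described above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the `Repo` class, its `commit`, `clone`, and `pull` methods, and the change descriptions are hypothetical and do not correspond to the commands of Arch, Darcs, or any other real tool.

```python
class Repo:
    """A toy repository: an ordered list of named changes (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.changes = []

    def commit(self, change):
        # Record a change locally; no other repository sees it yet.
        self.changes.append(change)

    def clone(self, name):
        # A private copy that starts with the same history.
        copy = Repo(name)
        copy.changes = list(self.changes)
        return copy

    def pull(self, other):
        # Take every change from `other` that we do not already have, in order.
        new = [c for c in other.changes if c not in self.changes]
        self.changes.extend(new)
        return new


official = Repo("official")
official.commit("initial import")

# Three days of work, each step version-controlled in a private repository.
mine = official.clone("mine")
mine.commit("day 1: start refactoring (does not build)")
mine.commit("day 2: move callers over (does not build)")
mine.commit("day 3: finish refactoring (builds again)")

# Up to this point the official tree is untouched.
untouched = list(official.changes)

# Once the work is complete, the whole series lands at once.
landed = official.pull(mine)
```

In this model the official repository is just another `Repo` that everyone agrees to treat as authoritative, which is the sense in which a central server is still there, only no longer the sole place where history can be recorded.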

Looking Past CVS: The Future is Distributed

Posted Jun 26, 2005 16:34 UTC (Sun) by HR (guest, #30682) [Link]

=====
and you're using CVS, you can't check in each of the steps or check in each day's work without breaking the tree for everyone else
=====

The way we handle this situation in CVS (committing changes that span days to complete) is to create a private branch for the intermediate work. When you're done you just merge that branch or, if it doesn't work out, you can simply abandon the branch. Either way you can source control your work along the way.

wxruby correction, and a plug for ArX

Posted Dec 2, 2004 20:08 UTC (Thu) by kevinbsmith (guest, #4778) [Link]

As the author of the wxruby posting linked from the article, I have to share the bad news that wxruby no longer uses darcs. Not because of any specific problems with darcs, but because of resistance to using an unproven tool (and one that didn't yet work well under MS Windows). The project went back to CVS, although I personally remain committed to distributed version control.

Since then, I have found a few things about darcs that I don't like so much, which has led me to explore other systems. Currently I am experimenting with ArX (http://savannah.nongnu.org/projects/arx/), and am optimistic that it will meet my needs. If not, I'll try monotone, or perhaps even codeville. Maybe I'll even end up using darcs again, and that would be ok.

ArX started as a fork of arch/tla, but with the 2.x series it has evolved into something quite different (in a good way). It is very easy to learn and use, and hopefully will be easy to port to MS Windows. If you are evaluating distributed RCS's, it's definitely worth a look.

Kevin

When do we get good articles on SCM - this ain't one

Posted Dec 5, 2004 18:57 UTC (Sun) by jschrod (subscriber, #1646) [Link]

The author was also unable to present arguments for and against distributed SCM systems, and didn't describe situations (a.k.a. use cases) where patchset-centric approaches are sensible and where they aren't. Not to mention the kinds of development processes that influence the choice of SCM processes and associated tools. The world ain't black and white, folks. I've been using patchsets (formerly named changesets) since Aide-de-Camp, and I've been using version-oriented tools since RCS came out. Both have their place.

The author came across as a fanboy who wanted to spout about shiny new tools that he had discovered. This is a bad article by LWN standards; we have already had several articles on SCMs in the past, and no new information or insight was brought forward.

Btw, a nice reading list about SCM, although not updated for some time, was collected by Brad Appleton, at http://www.cmcrossroads.com/bradapp/acme/scm-readings.html

Cheers, Joachim

Importance of client-server protocols

Posted Dec 5, 2004 20:32 UTC (Sun) by robla (subscriber, #424) [Link]

A lot of version control authors (the creator of arch being among them, IIRC) don't understand the importance of dedicated client-server protocols in creating version control systems. Thus, many "next generation" version control systems make it architecturally impossible to provide secure commit access.

I think distributed development is a cool concept, and can see the allure. However, in large teams of professional developers, a central repository is a fundamental requirement. Here are the biggest reasons:

  • Central backups
  • Official repository for nightly/incremental builds
  • Automated change notification/auditing

All developers who are building code that is to be backed up and built on a regular basis must be given commit access to add new revisions for the nightly builds, while at the same time being restricted from the ability to "rewrite history", i.e. to corrupt or intentionally remove older revisions or other audit information.

I've yet to see a next generation, open source solution provide this, other than Subversion, nor has anyone ever provided a credible explanation of how this requirement can be met without a client-server architecture. I'll admit that I haven't yet checked out many of the other systems listed, but I have been conducting an earnest investigation, and I'm fearful that they are like arch in relying on file-level access control, and using the "distributed" mantra as cover for dodging these important requirements.

The sort of power that auditable write access provides is most vividly demonstrated by Wikipedia. As a result of having rigorous auditing of write access, they've proven that it's possible to produce a quality product by giving the whole world write access. There are plenty of vandals, but by having an audit trail and good tools for backing out changes (and blocking vandals), it's possible for the good guys to outpace the bad guys. While I wouldn't recommend the wide open approach for most projects, it provides a good example of why auditable write access isn't just for control freaks.

Does anyone here know which (if any) of the systems mentioned in the story or the comments provide auditable write access, other than Subversion?

Rob

Importance of client-server protocols

Posted Dec 6, 2004 21:26 UTC (Mon) by kevinbsmith (guest, #4778) [Link]

Your first two requirements do not require a central repository. They merely require that one repository be designated as "official". It would be backed up rigorously, and would be the source for official builds.

Distributed RCS systems offer at least two approaches to solve your write/audit concern. Some packages offer one or the other, and some offer both:

1. Use of a "patch manager" queue process. Developers submit their patches into the queue, and the patches are verified before actually being committed into that repository. The patch manager might require that all patches be signed by an authorized developer, for example.

2. Use of a "pull" model instead of a "push" model. Every developer can publish their own repository, and the official repository can pull changes directly from those sources. Systems like arch, ArX and darcs that allow any http server to host a read-only repository make this especially easy.

Also, with some systems, every patch is signed by the author, so you have a full audit trail available. Note that this audit information actually follows the patch around, rather than merely providing protection to a single server.
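As a rough illustration of the "patch manager" queue and of signatures that travel with the patch, here is a minimal Python sketch. It rests on stated assumptions: an HMAC with a per-developer shared secret stands in for the public-key signatures a real system would more plausibly use, and the names `AUTHORIZED_KEYS`, `sign`, `verify`, and `process_queue` are hypothetical, not part of arch, ArX, or darcs.

```python
import hashlib
import hmac

# Hypothetical per-developer secrets. This is a stand-in only: a real
# deployment would use public-key signatures rather than shared secrets.
AUTHORIZED_KEYS = {"alice": b"alice-secret", "bob": b"bob-secret"}


def sign(author, key, patch_text):
    """Bundle a patch with its author and a signature that travels with it."""
    digest = hmac.new(key, patch_text.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"author": author, "patch": patch_text, "signature": digest}


def verify(submission):
    """Accept only patches signed with the key registered for the claimed author."""
    key = AUTHORIZED_KEYS.get(submission["author"])
    if key is None:
        return False
    expected = hmac.new(key, submission["patch"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, submission["signature"])


def process_queue(queue, repository):
    """Commit verified submissions to `repository`; return the rejected ones."""
    rejected = []
    for submission in queue:
        if verify(submission):
            repository.append(submission)
        else:
            rejected.append(submission)
    return rejected


official = []
queue = [
    sign("alice", b"alice-secret", "fix typo in README"),
    sign("mallory", b"guessed-key", "add backdoor"),   # not an authorized developer
    sign("bob", b"wrong-key", "rename module"),        # authorized name, wrong key
]
rejected = process_queue(queue, official)
```

Since each accepted entry keeps its author and signature, anyone holding a copy of the patch can re-run `verify` on it later, which is the sense in which the audit information follows the patch rather than living on one server.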

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds