
Version control for Linux (developerWorks)

developerWorks has put up a survey of source code management systems which run on Linux. "Arch is a specification for a decentralized SCM that offers many different implementations. These include ArX, Bazaar, GNU arch, and Larch. Arch not only operates as a decentralized SCM, but also uses the changeset model. The Arch SCM is a popular method for open source development because developers can develop on separate repositories with full source control. This is because the distributed repositories are actual repositories complete with revision control. You can create a patch from changes in the local repository to be used by an upstream developer. This is the real power of the decentralized model." (Thanks to Jake Edge).

Version control for Linux (developerWorks)

Posted Oct 11, 2006 18:40 UTC (Wed) by JoeF (subscriber, #4486) [Link]

The author doesn't seem to know some of the version control systems he surveys...
He says:
"In the snapshot model, complete files are stored for the entire repository for each revision (with optimizations to reduce the size of the tree). In the changeset model, only the deltas are stored between revisions, creating a compact repository (see Figure 3)."
He then goes on to say
"Concurrent Versions System (CVS) is one of the most common SCMs around today. It's a centralized solution using the snapshot model"
and
"Like CVS, Subversion is a centralized solution and uses the snapshot model."

This is of course wrong. Both CVS and Subversion store deltas, i.e., what the author calls the "changeset model."
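The storage distinction quoted above can be illustrated with a minimal sketch. Everything here (`SnapshotStore`, `DeltaStore`, `make_delta`, `apply_delta`) is a hypothetical name made up for illustration and is not how CVS, Subversion, or any other real tool lays out its repository. One store keeps every revision in full, the other keeps a base revision plus copy/insert instructions, and both hand back the same content on checkout.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` as copy/insert instructions against `old` (lists of lines)."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse lines old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))  # store only the new lines
        # "delete" emits nothing: the old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the newer revision from the older one plus a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class SnapshotStore:
    """Hypothetical store that keeps a full copy of every revision."""

    def __init__(self):
        self._revs = []

    def commit(self, lines):
        self._revs.append(list(lines))
        return len(self._revs) - 1

    def checkout(self, rev):
        return list(self._revs[rev])


class DeltaStore:
    """Hypothetical store that keeps revision 0 in full, then forward deltas."""

    def __init__(self):
        self._base = None
        self._deltas = []

    def commit(self, lines):
        if self._base is None:
            self._base = list(lines)
            return 0
        previous = self.checkout(len(self._deltas))
        self._deltas.append(make_delta(previous, list(lines)))
        return len(self._deltas)

    def checkout(self, rev):
        lines = list(self._base)
        for delta in self._deltas[:rev]:
            lines = apply_delta(lines, delta)
        return lines
```

Committing the same sequence of revisions to both stores and checking each one out gives identical results, so under these assumptions the storage format is invisible to the user.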

Version control for Linux (developerWorks)

Posted Oct 11, 2006 19:40 UTC (Wed) by felixfix (subscriber, #242) [Link]

Noticed that. I wonder if there are, or ever have been, any version control systems which did not store deltas. Even old SCCS stored deltas. It seems a pretty pointless distinction to me.

Version control for Linux (developerWorks)

Posted Oct 11, 2006 19:52 UTC (Wed) by niner (subscriber, #26151) [Link]

AFAIK git stores full versions

Version control for Linux (developerWorks)

Posted Oct 11, 2006 19:56 UTC (Wed) by proski (subscriber, #104) [Link]

git doesn't store deltas by default. You have to run git-repack to create deltas.

This further indicates that the distinction between "snapshot model" and "changeset model" is artificial and useless. It also places CVS in the "snapshot model" category based on the server storage alone, while in fact CVS is unique among the currently used systems in NOT having any support for changesets spanning more than one file.

Version control for Linux (developerWorks)

Posted Oct 11, 2006 20:01 UTC (Wed) by proski (subscriber, #104) [Link]

I mean, it also places CVS in the "changeset model" category. The author had trouble applying his own definitions to CVS, and so did I :)

Version control for Linux (developerWorks)

Posted Oct 11, 2006 21:38 UTC (Wed) by JoeF (subscriber, #4486) [Link]

The author's definition of a changeset, based on whether the server stores deltas, is of course not how the SCM community defines the term.
A changeset has nothing to do with how the servers store the data. It is an abstraction that specifies a collection of changes. CVS doesn't really use changesets, but Subversion does. I don't know enough about Arch to comment on its changeset use.
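As a rough sketch of the "collection of changes" idea, assuming a toy in-memory repository: the `Repository` class and its `commit` method below are hypothetical and do not model Subversion's or any other tool's actual API. A changeset here is a mapping from paths to new contents (or `None` for a deletion), and it is applied to the whole tree as a single unit or not at all.

```python
class Repository:
    """Hypothetical in-memory repository with whole-tree atomic commits."""

    def __init__(self):
        self.tree = {}   # path -> file content
        self.log = []    # one (message, changes) entry per changeset

    def commit(self, message, changes):
        """Apply a changeset: {path: new_content, or None to delete}.

        All changes are staged on a copy of the tree first, so a failure
        partway through leaves the repository untouched.
        """
        new_tree = dict(self.tree)
        for path, content in changes.items():
            if content is None:
                if path not in new_tree:
                    raise KeyError(f"cannot delete missing file: {path}")
                del new_tree[path]
            else:
                new_tree[path] = content
        # Only reached if every change in the set was valid.
        self.tree = new_tree
        self.log.append((message, dict(changes)))
        return len(self.log)  # revision number of the whole tree
```

Nothing in this sketch says how the revisions are stored on disk. The changeset is purely the unit of change that the user sees, which is the point made above.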

Version control for Linux (developerWorks)

Posted Oct 12, 2006 20:03 UTC (Thu) by njs (guest, #40338) [Link]

To make it worse, most of the people _I_ know who actually work on SCMs prefer yet another definition. The thing is, all the SCMs that anyone actually works on have whole-tree atomic commit as a matter of course, so it's pointless to talk about it. Also, traditional SCMs that use that terminology (perforce and bitkeeper in particular, I believe?) actually take both the "change" and "set" parts seriously -- because they are based on older systems that did commits per-file, they still have a concept of a single-file change, and then a changeset is a secondary structure that groups several such changes together. In modern free SCMs, the whole concept of a single-file commit is just gone -- you have only one sort of change, and it is a change to the tree. So the term is misleading anyway.

OTOH, probably the single deepest divide between modern designs is between darcs on the one hand, and everyone else on the other. Darcs represents history as a set of equivalence-classes of tree changes, has basic operations like "commute", and is good at cherrypicking but not at referring to a particular tree state ("yesterday's build"). Other systems represent history as a DAG of tree changes (so still changes, but each change has a particular immutable location as well), have basic operations like "merge", and are bad at cherrypicking but excellent at referring to a particular tree state. Which of these works better, the relations between them, and so on, are active research topics, so it's useful to have a term to refer to the distinction, and people end up re-using the familiar words "changeset" vs. "snapshot"...
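The DAG side of that divide can be sketched in a few lines. This is an illustration under stated assumptions, not the data model of any particular SCM: the `History` class, its methods, and the use of a SHA-1 over a JSON payload as the node id are all made up for the example. Each node pins down one immutable tree state plus its parents, a merge is a node with two parents, and a "change" is whatever differs along an edge.

```python
import hashlib
import json


class History:
    """Hypothetical history DAG: nodes are tree states, edges are changes."""

    def __init__(self):
        self.nodes = {}  # node id -> {"tree": {...}, "parents": (...)}

    def commit(self, tree, parents=()):
        """Record a tree state; the id depends on the content and the parents."""
        for parent in parents:
            if parent not in self.nodes:
                raise KeyError(f"unknown parent: {parent}")
        payload = json.dumps({"tree": tree, "parents": list(parents)},
                             sort_keys=True)
        node_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        self.nodes[node_id] = {"tree": dict(tree), "parents": tuple(parents)}
        return node_id

    def tree_at(self, node_id):
        """Refer to one exact tree state, e.g. "yesterday's build"."""
        return dict(self.nodes[node_id]["tree"])

    def change_along_edge(self, parent_id, child_id):
        """Derive the change on an edge as {path: (old, new)}."""
        old = self.nodes[parent_id]["tree"]
        new = self.nodes[child_id]["tree"]
        return {path: (old.get(path), new.get(path))
                for path in sorted(set(old) | set(new))
                if old.get(path) != new.get(path)}
```

In this sketch, naming a tree state is a dictionary lookup, while moving a single change to another branch would mean re-deriving it from an edge and re-applying it elsewhere. The sketch makes no attempt to model darcs-style commutation.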

Life would be easier if this overloaded "changeset" term would just go away.

Version control for Linux (developerWorks)

Posted Oct 11, 2006 21:36 UTC (Wed) by bronson (subscriber, #4806) [Link]

This is one of those awful articles that muddies the subject more than it clears it. I pity anyone who tries to read it as an introduction to SCMs. Just look at Fig. 2! I use svn, svk, git, and p4 daily and *I* find that diagram impenetrable.

Also, as other people have stated, the author totally misunderstands the difference between the snapshot and the changeset models. It has nothing to do with the on-disk storage format. In the snapshot model, you browse your project's history by file revision (svn co -r 441), and in the changeset model you browse its history by changeset number. In other words, if the history were represented by a DAG, are you specifying the graph's nodes or its edges? This distinction is mostly irrelevant anyway... snapshot is slightly easier to understand and explain, and the changeset model is slightly easier to navigate in a tree that suffers lots of branching. Git, in particular, is both and neither.

Also, how relevant is Arch these days? I tried it out in 2004 and loathed it. Tom Lord's oddball opinions on how projects should be managed just got in my way at every turn. Though I've never tried it, I hear bzr solves most of Arch's difficulties. I'm very surprised to see Arch used in this article instead of the (imo) far more vital Git, Mercurial, Bzr, Monotone, or Darcs projects.

arch's role

Posted Oct 11, 2006 21:47 UTC (Wed) by dark (subscriber, #8483) [Link]

I prefer to think of it this way: arch was very experimental and Tom Lord tried out a lot of new concepts. Some of these worked very well, others did not. Arch-inspired spinoff and successor projects took the parts that worked very well and did great things with them. Overall, the experiment was a great benefit.

arch's role

Posted Oct 14, 2006 16:13 UTC (Sat) by kevinbsmith (guest, #4778) [Link]

I don't think the poster intended any insult of Tom or arch. Yes, arch was an extremely valuable project. Tom Lord is/was visionary, and his work was the critical building block that allowed the rest of the FLOSS distributed RCS tools to get started. But very few people creating a new RCS repository today would (or should) choose arch.

I completely agree that bzr, darcs, mercurial, monotone, or perhaps even svk would have been a far better choice for that third slot in the review.

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds