
Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

At his blog, Jelmer Vernooij has written a detailed retrospective on the history of the Bazaar version control system, including a lot of analysis of the project's ups and downs over the years. "We just made these changes to the file format as they came along, rather than accumulating them. This meant that at one point there was a new format every couple of months. Later on, we did slow down on format changes and no new format has been introduced since 2009. Unfortunately we have been unable to shake the image that we introduce a new file format every fortnight."




Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 4:01 UTC (Thu) by yarikoptic (guest, #36795) [Link] (1 responses)

My summary quote is "I think it's time to move on. There are still some things I don't like about it, but Git is a decent source code management system. Bazaar isn't going anywhere; no doubt there will be users for a few years to come, and people contributing fixes, but it hasn't been adopted to the level I was hoping.".

On a related note -- had Canonical not kept Launchpad development closed-source until "nobody cared anymore", both its engine and bzr could have been different now. They could have been approached by contributors/developers wanting VCS backends in LP other than bzr. That could not only have made Launchpad a platform that others could deploy privately, but probably could have helped bzr as well.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 14:17 UTC (Thu) by cmorgan (guest, #71980) [Link]

+1

It's useful to see the explosion of different ideas around a similar concept (git, arch, hg, bzr), but it's also nice when there is some consolidation around a smaller set of tools that are most suitable, most popular etc., so we can all have something we know how to use.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 7:15 UTC (Thu) by Lukehasnoname (guest, #65152) [Link] (18 responses)

I can't comment on the details listed in the article. I do see he thinks that Bazaar is a stale project, and perhaps that's true.

Here's an obligatory anecdote: I have tried to learn git several times, and I find its command names and tools somewhat overwhelming, and inconsistent in naming.

I tried bzr once. I followed the tutorials on the project site, and its presentation of DVCS concepts made sense to me immediately, and I really enjoyed using it for the week or so that I got back into programming :)

If I get the itch to make my site again, it will be controlled with bzr.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 8:16 UTC (Thu) by djc (subscriber, #56880) [Link] (1 responses)

You should try Mercurial. Much better UI than git, with performance almost as good.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 21, 2012 23:02 UTC (Fri) by kleptog (subscriber, #1183) [Link]

Actually, I find the UI of git easier than Mercurial's, but I think that's mostly because I try to make Mercurial work with the workflow I use for git and it just won't do it.

Since I grasped the idea of the index in git, I find it difficult to use any other VCS. The way git allows you to make changes to your source tree and then selectively commit them (git add -p) is incredibly addictive. The MQ extension in Mercurial tries, but ultimately fails because the underlying model doesn't support it.

Git has a really simple mental model, and all its commands are part of a toolkit to work on that model. Mercurial keeps telling me I can't do what I want. Rewriting history is nice for perfectionists, and git's toolkit approach makes it possible for tools like Gerrit to integrate the review process seamlessly with the revision control.

IMHO Mercurial really needs to be able to support a Gerrit-like workflow before I'd seriously consider using it in another project.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 8:41 UTC (Thu) by dgm (subscriber, #49227) [Link] (15 responses)

Many people complain that git naming and commands are inconsistent, and that it's somehow difficult to learn. Yet it's very successful, more so than the more consistent and "easier" DVCSes. How is that possible?

I think there are some factors. One, of course, is the Torvalds factor. But there are more. The most important one is that, in fact, you only need a dozen commands and four or five concepts to make good use of git. From there on, you can grow organically.

But the most important factor is that git has reached the critical mass of help around faster. Should I have any problem doing anything with git, 99.99% of the time a bit of google-fu leads to the solution, and usually leads you to discover something new you didn't know was possible. That's very rewarding. It's very possible that this mass of help and workflows floating around in blogs and forums is caused by git's own "imperfections". Another case of worse-is-better, maybe?

Finally, there is always "easy git" (and the rest of the alternative "porcelain") for those who like more consistent names and commands. It's nice and delivers as promised, but I personally gave up and went with the default. The added value of the forums and blogs is just too big to be ignored.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 10:32 UTC (Thu) by jengelh (guest, #33263) [Link] (10 responses)

>How is that possible?

Maybe people started to realize it is not the UI "bling" (or lack thereof) that counts, but having the implementation Done Right, and all that _while_ retaining or increasing productivity (measurable; PHBs counting commits per time in the SCM).

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 14:34 UTC (Thu) by yarikoptic (guest, #36795) [Link] (9 responses)

It is like a Bible -- there was GIT and its UI was crap and then HG and Bzr came with a nice UI... People keep repeating that, but either I was too early an adopter of GIT or it is just a matter of different personal preferences. Sure thing, I do not remember my initial feelings about the GIT command-line UI, but I had no real difficulty starting to use it efficiently once I understood what the staging area is, what a remote is and then what a branch is (former CVS experience helped to comprehend that); and the 4 so "difficult" to remember commands, 'clone', 'commit', 'push', 'pull', were sufficient for basic use. But with adopting the GIT mentality I got crippled in HG and Bzr -- any time I need to use them, I feel frustration since some things do not make sense to me, or things are just way too slow, and as a result I do not get work done. Then I come back again to git-hg/bzr helpers to just get things going.

So, for anyone who asks me whether it is indeed true that HG/Bzr is easier on mortals' brains -- I answer "Maybe, but not", give them a 5-minute "tutorial", explain to them the meaning of RTFM, refer them to Git foundations by M.Brett (http://matthew-brett.github.com/pydagogue/foundation.html) for a night of fun reading, and people usually become happy GIT users down the road.

Is GIT perfect? nope -- indeed I wish GIT had handling of remotes similar to SVN and Bzr, without needing local clones, but those use cases come up rarely enough. And hopefully it will land in some form in GIT in the future.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 15:52 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)

>Is GIT perfect? nope -- indeed I wish GIT had handling of remotes similar to SVN and Bzr, without needing local clones, but those use cases come up rarely enough. And hopefully it will land in some form in GIT in the future.
Shallow clones work just fine.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 16:11 UTC (Thu) by yarikoptic (guest, #36795) [Link] (6 responses)

really?
maybe I have missed the bus -- how would I accomplish with a shallow clone something similar to

"svn log REMOTE/trunk" to get recent activity on the trunk (or a particular file of interest )
"svn ls REMOTE/trunk/dir" to list all files under dir
"svn cat REMOTE/trunk/dir/file" to actually see the interesting one for me

without fetching e.g. 100MB of the current content of the REMOTE's master treeish?

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 16:59 UTC (Thu) by hummassa (subscriber, #307) [Link] (5 responses)

git clone --depth 20 server:git/repo

will not get the whole tree, but just the info and files for the last 20 commits

git log # will show the last 20 commits
ls dir
cat dir/file

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 17:09 UTC (Thu) by yarikoptic (guest, #36795) [Link] (4 responses)

exactly -- I need some intrinsic knowledge of how many commits I need to fetch and then fetch all files touched by those, but the point is that I do not know how many commits, and I do not want to fetch anything irrelevant. I just need to

1. see what files are there NOW (after all GIT is a content tracking, right? ;-) )
2. what is the content of the file NOW, regardless when was it modified -- 1 commit back, or 1000 commits back

svn (and probably bzr, according to the blog -- I never used those features myself) provide such functionality. I did need it a few times too. That is why I made my comment.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 17:20 UTC (Thu) by hummassa (subscriber, #307) [Link]

You are right, but I usually set up gitweb for those needs.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 18:13 UTC (Thu) by tuna (guest, #44480) [Link] (1 responses)

I thought that git does not have files, only changes?

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 19:59 UTC (Thu) by jengelh (guest, #33263) [Link]

The other way around. Basically, Git stores complete objects rather than some delta thing. Creating some commit whose diff reads

diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c
index 4bf3dc4..5508113 100644
[...]

will get you a .git/objects/55/0811365ac655c3b2d4f9183112e15ad0ae17ba. It is compressed, but it is still standalone/complete/non-deltified. You can nuke all other objects, and git show 550811365 should still succeed.

Deltified packs are a strap-on option (one that's enabled by default because of its usefulness) to the base design.
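
The object naming described above can be sketched in a few lines of Python. This is only an illustration, assuming the classic SHA-1 object format: a blob's ID is the SHA-1 of the header `blob <size>\0` followed by the file's bytes, and a loose object is the zlib-compressed form of that same data, stored under `.git/objects/<first two hex digits>/<remaining digits>`. The function names are invented for the example and are not part of git.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Path of a loose object, relative to the top of the working tree."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(content: bytes) -> bytes:
    """Bytes of a loose blob on disk: zlib-compressed header plus content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content)


if __name__ == "__main__":
    data = b"hello world\n"
    oid = git_blob_id(data)
    print(oid)
    print(loose_object_path(oid))
    # The stored object is complete: decompressing it yields the whole file,
    # with no reference to any other object.
    assert zlib.decompress(loose_object_bytes(data)).endswith(data)
```

If the assumptions hold, `git_blob_id` should agree with what `git hash-object` reports for the same bytes.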

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 19:30 UTC (Thu) by dlang (guest, #313) [Link]

umm, One of us is misunderstanding something

doing a

git clone --depth 20 server:git/repo

doesn't give you only the files that have changed in the last 20 commits, it gives you ALL the files in the project, as they existed (changed or not) for the last 20 commits

so if you only care about what the files are NOW you can do "--depth 1", no need to know when it changed.

If you want the last 20 changes of a file, and that file hasn't changed for the last 1000 commits, then you really do want the full history, because the file may not have changed 20 times since it was created
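
The distinction drawn here can be illustrated with a toy model. The sketch below is not git; it is a hypothetical in-memory history in which, as in git, every commit refers to a complete snapshot of the project. Cutting the history down to the newest N commits (roughly what `--depth N` does to the commit chain) still leaves every file present at the tip, whether or not it changed recently. `Commit`, `build_history`, `shallow` and the file names are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Commit:
    message: str
    snapshot: Dict[str, str]  # full tree: path -> content, not a delta


def build_history() -> List[Commit]:
    """Oldest-first history; README is written once and never touched again."""
    history = [Commit("add README", {"README": "docs"})]
    tree = dict(history[0].snapshot)
    for i in range(1, 1001):
        tree = dict(tree)  # each commit gets its own complete snapshot
        tree["main.c"] = f"revision {i}"
        history.append(Commit(f"edit main.c #{i}", tree))
    return history


def shallow(history: List[Commit], depth: int) -> List[Commit]:
    """Keep only the newest `depth` commits, as a shallow clone would."""
    return history[-depth:]


if __name__ == "__main__":
    clone = shallow(build_history(), depth=1)
    tip = clone[-1]
    # README was last modified 1000 commits ago, yet it is still in the tip.
    print(len(clone), sorted(tip.snapshot))
```

What the truncated history cannot answer is when README last changed; for that, the model (like a real shallow clone) would need the older commits.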

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 21, 2012 3:13 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Some more anecdata (I'm a heavy git user):

I was using Mercurial for < 1 hour and hit a (what I consider to be) data-loss bug[1]. Granted, I didn't really read a proper hg tutorial, but the bug is still basically untouched even with a testcase and is marked as a bug because I used hg "wrong". My biggest problems with it are[2]:

- no index (this is the real killer for me using it day-to-day); and
- there wasn't anything that did what git rebase -i does nearly as easily last time I tried (which I use often, but could probably get used to missing if there was painless merge resolution (see git mergetool) and something like git rerere).

I understand git being a little steep at first (but so are Vim, emacs, etc.). Understanding how git views things is a core part of grokking it enough to be proficient with it. Basically, I don't care as much about the learning curve of tools I use every day, I just require them to let me get my work done. "Intuitiveness" is useless to me. I'd rather it work and work well. If I used Mercurial every day, with the way I've gotten accustomed to working, I'd probably hit that bug at least once a week.

[1]http://bz.selenic.com/show_bug.cgi?id=3423
[2]Technically, "were" since I'll be using something to convert any Mercurial repository I use over to git first. The verbiage change just doesn't work well with me.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 15:19 UTC (Thu) by nix (subscriber, #2304) [Link]

The irony here is that for me at least I found git nice and simple because it had a clear core model in which it was easy to build up -- yourself -- whatever workflow you preferred. Bazaar tried to make things simple by adding explicit support for lots of workflows, but unfortunately that means you need to understand them all before you can figure out how the tool works and how to use it.

The number of such workflows, many closely similar, and all accompanied by deep changes in the behaviour of lots of bzr commands, is frankly overwhelming. I still can't keep them separate in my head despite much trying. And I am not the only one (there was a comment in the recent bzr-development-stopped thread which said much the same, and that thread was mostly populated by longtime users and developers of bazaar).

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 21, 2012 5:28 UTC (Fri) by dpotapov (guest, #46495) [Link] (1 responses)

> Many people complain that git naming and commands are inconsistent, and that it's somehow difficult to learn.

Some complaints about Git being difficult to learn come from its early days. (Before Git 1.5.0, Git did not have any UI suitable for normal folk, and the user documentation was somewhat scarce or confusing as it often referred to low-level "plumbing" commands, but many issues got fixed over time.)

Now I think most complaints come from people who try to use Git in a centralized workflow as if it were CVS or SVN. Git appears unnecessarily complex for that workflow and does not follow the CVS naming convention. While Mercurial and Bazaar have tried to make some things more CVS-like, Git has focused more on better support of distributed workflows and general flexibility. (Linus Torvalds is known for his disdain for CVS: "I'm trying my best to be a humanitarian and rid the world of the scourge that is CVS, but I'm not sure I can undo the untold mental damage wrought by it over decades of quiet suffering.")

"Easy to use" is rather subjective, it often depends on past experience and your workflow. In simple cases, usually you can see one to one match between Git, Mercurial or Bazaar commands, though those names may be different. For example, "git pull" = "bzr update" = "hg pull --update".

Apparently, Bazaar uses "update" due to a tradition going back to CVS, where developers periodically use "cvs update" to merge their own (not yet committed) changes with the upstream. In the Linux kernel workflow, developers are discouraged from unnecessarily merging their topic branch with the current 'master' branch. Typically, it is the upstream maintainer who pulls changes and resolves possible conflicts. Thus 'pull' makes more sense in this context: Linus Torvalds does not update his master branch, he pulls new changes from his lieutenants.

Another thing is that Mercurial and Bazaar try to give you a nice set of basic commands, but for anything else you need plugins (or extensions in Mercurial). If you install Git, you get a lot of power out-of-the-box, so naturally it comes with more commands, and Git does not try to hide useful things from you (like the staging area). So some people find it more difficult in the beginning, but once you understand the core model of Git, most things immediately start to make sense, and Git is more transparent in what it does.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 23, 2012 12:09 UTC (Sun) by juliank (guest, #45896) [Link]

Please note that bzr pull exists as well, and bzr merge. The equivalent to git's pull is bzr merge, as far as I can tell.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 27, 2012 13:10 UTC (Thu) by cstanhop (subscriber, #4740) [Link]

> How is that possible?

Github.

Still working and useful

Posted Dec 20, 2012 11:46 UTC (Thu) by coriordan (guest, #7544) [Link] (4 responses)

It's a pity that this developer is pessimistic about BZR, but it's still a documented, debugged, working piece of free software that a lot of people know how to use.

"Move on" sounds like a nice way to say "abandon", but I don't see a reason for people to abandon BZR.

Still working and useful

Posted Dec 20, 2012 23:21 UTC (Thu) by nickbp (guest, #63605) [Link]

I don't know if it's at this point yet, but there is some value in a given project using something that potential contributors already know by default.

Still working and useful

Posted Dec 25, 2012 8:13 UTC (Tue) by cmccabe (guest, #60281) [Link] (2 responses)

First of all, kudos to Jelmer Vernooij for writing a great article and also for contributing to bzr and other projects.

To be honest, however, I think it would be a good thing for bzr to ride off into the sunset. I think most developers don't have time to master all of bazaar, mercurial, darcs, BitKeeper, arch, monotone, and git. You kind of have to pick one or two to focus on, and it looks like the community has made its choice.

I think everyone pretty much agrees that git has a lot of abilities that the other systems don't-- for example, the ability to handle big repositories without grinding to a halt. In contrast, the only thing that's ever been pointed out as an advantage of the other systems is a "better UI." I have the feeling that in a lot of cases "it has a better UI" is code for "I was exposed to it first."

With regard to the a-word, continuing to use a legacy system can mean "abandoning" the wider developer community, and that's far more important than any individual project.

Still working and useful

Posted Dec 25, 2012 17:15 UTC (Tue) by bronson (subscriber, #4806) [Link] (1 responses)

Have no fear, four of those packages have already ridden off into the sunset.

Of the remaining three, I agree, bzr is the least engaged. No need to hurry its departure, it'll happen soon enough.

Just imagine how many paid man-hours Canonical dumped into bzr...

not to mention it's a GNU project

Posted Jan 7, 2013 14:06 UTC (Mon) by alex (subscriber, #1355) [Link]

The Emacs VCS is bzr because FSF favours GNU projects over non-GNU if they are functionally equivalent (and I believe an even split in preference amongst the devs).

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 12:50 UTC (Thu) by rleigh (guest, #14622) [Link]

A very interesting review of the history. I do wonder why it didn't take off to a greater extent than it ultimately did. Is hiring full-time people ultimately detrimental to fostering a development community by giving the impression of being an internal project, or was this mainly reducing existing developer time as they got involved with other projects within Canonical? The other factors appear to be the copyright assignment (but this came in relatively recently in its history). Or was it really just a case of being unlucky, and not being in the right place at the right time?

I should have been a bzr user, given its origins with tla/Arch. Back when tla was in its infancy, I converted all my code to Arch repos, and started all my new work with tla, and used it fairly exclusively for several years. tla had many faults, but it was unique at that point, and was the only realistic alternative to CVS (before SVN was ready). baz/bzr would have been a natural upgrade path, but /at the time/ it always looked like the repo format changed so often that it wasn't ready/safe for serious use (at least, that was the impression I received). All these repos were eventually imported to git, some with an intermediate reversion back to CVS/SVN. But I do wonder whether, had it not been for the initial perceptions of the project, they would otherwise have been in baz/bzr for the last 6-8 years, and whether this applied to other projects as well.

With hindsight, I think ending up using git was the best choice technically, even if the "UI" is not as friendly, but things could have ended up quite differently 6-8 years ago if bzr had been the obvious choice.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 16:45 UTC (Thu) by iabervon (subscriber, #722) [Link] (1 responses)

I'm amused that he thinks that git's design is intimately tied to its low-level file format, and that it would be impossible to change how it stores things. The truth is that git's "on-disk representation" is actually an abstraction which the library is clever enough to convince the world is reality. For storage, it mostly uses an entirely different pack format, developed much later, which is quite complex and has gone through several revisions. Part of the reason that git manages to have such a strong abstraction layer is that the abstraction layer is presented as a simple low-level file format that can't be changed; nobody is tempted to violate the abstraction because they don't realize it's an abstraction. This leaves the actual on-disk format easy to improve, because nothing depends on it, which is how git manages to be so space-efficient. It is pleasing to see someone fail to notice that "simple file format that novices can understand" and "requires less space than Bazaar's current efficient format" can't really both be true.
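
One way to picture the abstraction described here: an object's name is derived from its logical content, not from the bytes that end up on disk, so the storage encoding can change without anything above it noticing. The sketch below assumes the SHA-1 naming scheme and uses two zlib compression levels as stand-ins for "two different on-disk formats". Real packfiles use delta compression and an index, which is not modelled here, and all function names are invented for the example.

```python
import hashlib
import zlib


def object_id(kind: str, payload: bytes) -> str:
    """Name of an object: SHA-1 over '<kind> <size>', a NUL byte, then payload."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def store(payload: bytes, level: int) -> bytes:
    """Stand-in for an on-disk encoding; only the compression level varies."""
    return zlib.compress(payload, level)


def load(stored: bytes) -> bytes:
    """Recover the logical content, whatever encoding was used to store it."""
    return zlib.decompress(stored)


if __name__ == "__main__":
    payload = b"int main(void) { return 0; }\n" * 200
    fast, small = store(payload, 1), store(payload, 9)
    # The stored sizes may differ between encodings...
    print(len(fast), len(small))
    # ...but the object name seen by everything above the storage layer does not.
    print(object_id("blob", load(fast)) == object_id("blob", load(small)))
```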

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 22, 2012 17:13 UTC (Sat) by Wol (subscriber, #4433) [Link]

But isn't that ALSO the way linux interacts with hardware?

The main core of linux enforces a strict abstraction of the hardware, and drivers have to convert the vagaries of real hardware into the abstraction that linux expects.

I remember an article by Linus where he said taking that approach actually results in cleaner, more efficient code. If the theoretical best way to do something is X, it is better for the kernel to assume X and have a shim compensate for defective hardware, than to actually make the kernel aware of what is really going on.

Cheers,
Wol

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 19:15 UTC (Thu) by joey (guest, #328) [Link] (2 responses)

This strikes me as a very hard thing to write about a project with which one is intimately involved, and which Jelmer manages admirably well. I only hope that if I'm ever in such a position I can create that much value on the way out.

There are bzr features mentioned in this that I didn't know about, as someone who only uses bzr very casually. One is handling of the left/right sides of history, where one is simple (and presumably, good for rebasing) and the other contains all the detail of intermediate commits. It's apparently well ahead of git in this area.

I think it's important to keep in mind that, as the VCS area contracts and consolidates (which it clearly has been, for years), there's danger of losing innovative stuff like that. Another example is that I think bzr or other VCS developers would claim to have merge algorithms that are often better than git's for text files. It would probably be fruitful to find these ideas and bring them into git as add-ons.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 20, 2012 19:50 UTC (Thu) by dlang (guest, #313) [Link]

remember that in git you can configure it to use arbitrary programs for diff/merge so if anyone has a better one (either for general text files, or for specific formats like XML or JSON) it's pretty easy to start using them.

Vernooij: Bazaar-NG: 7 years of hacking on a distributed version control system

Posted Dec 21, 2012 3:22 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

> as the VCS area contracts and consolidates (which it clearly has been, for years)

On the whole, I agree. There was veracity[1] which showed up a year or two back (that's when I first heard of it at least). It's got some stuff from fossil (the "all-in-one") and also does formal rename tracking (which I think git got right here). The developers also mentioned that they don't like history rewriting, but I'm not so convinced that the history being an accurate representation of how the code was written is anywhere near as important as minimizing merge conflicts (via rebase) or making changes easy to revert (not easy if a change has 5 "fix typo" commits following it later down the branch).

[1]http://veracity-scm.com/


Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds