Git != BitKeeper

Posted Nov 17, 2013 0:35 UTC (Sun) by Tara_Li (guest, #26706)
In reply to: Git != BitKeeper by david.a.wheeler
Parent article: Four years of Go

Indeed. Git & BitKeeper are very different, and Git is no more just a reimplementation of BitKeeper than BitKeeper is a reimplementation of CVS or Mercurial (in fact, Mercurial *is* an open-source clone of BitKeeper). One of the signs of the power of Git is the growth in free online Git repository sites for easy collaboration with many people.

Git != BitKeeper

Posted Nov 17, 2013 2:08 UTC (Sun) by ledow (guest, #11753) [Link] (21 responses)

Is anyone else not scared by the fact that one person, who is far from perfect, managed to implement a "better", more popular source code repository system than all of the big companies in the world combined... as an afterthought... after he was annoyed at the removal of a tool he was using... within no time at all... and then went back to his day job and forgot about it?

I think it says more about what things like SVN, git, BitKeeper etc. actually provide than what a genius he might be, but still... a pretty shocking indictment of the capabilities of professional software developers (who should have been watching, and who should all have been able to jump in with "Well, why not try our tool... here, we'll make a free version just for Linux developers...")

Git != BitKeeper

Posted Nov 17, 2013 6:30 UTC (Sun) by Tara_Li (guest, #26706) [Link]

That thought has actually occurred to me, as well. Then again, the GIT that Linus released was the bare bones - from there, a lot of work has gone into improving it, both internally and in its external interfaces. So it's not completely fair to say that he did it overnight.

As for a free version just for Linux developers, well - that was pretty much precisely the issue that triggered Linus' Coding Of Rage spell that created GIT - BitKeeper had been doing pretty much exactly that, and pulled the license suddenly, so Linus attacked the problem at its very root.

Necessity mother of invention

Posted Nov 17, 2013 14:25 UTC (Sun) by david.a.wheeler (subscriber, #72896) [Link] (17 responses)

When Bitkeeper's license got revoked, Linus quickly read some of the literature and emailed the various authors. I know this because I had written stuff like SCM security; he contacted me and we had a great email discussion about SCMs. Linus was not coming into this cold, either; Linus had had many discussions with people over the years about version control, and of course was an expert in managing a really big project.

But Linus wanted an SCM system that did certain things significantly better than all existing systems (including BK). He wanted to process patches orders of magnitude faster than even BK, for example. Necessity is often the mother of invention.

And sometimes what you need for innovation is a different perspective. Linus came at this problem as a filesystem designer. The early versions of git were more like a filesystem than like a version control system. Obviously, that turned out to work really well. It shouldn't be surprising that every once in a while, a big change in perspective produces a really useful result. After Linus got started, of course, a lot of other people got involved.

Necessity mother of invention

Posted Nov 17, 2013 16:41 UTC (Sun) by kleptog (subscriber, #1183) [Link] (16 responses)

The most amazing thing I find about git is the simplicity of its data model. After just a few slides of explanation any competent programmer could reproduce the basics of git in a week or two.
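To make the "few slides" claim a little more concrete, here is a minimal Python sketch of the core idea: every object is stored under the SHA-1 of a short header plus its content. The function names are made up for illustration; only the header layout ("type, space, length, NUL byte") is git's.

```python
import hashlib


def git_object_id(kind: str, payload: bytes) -> str:
    """Return the SHA-1 ID for an object: header 'kind length\\0' plus payload."""
    header = f"{kind} {len(payload)}\0".encode("ascii")
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(content: bytes) -> str:
    """ID of a file's content; the file name is not part of a blob."""
    return git_object_id("blob", content)


# Identical bytes always hash to the same ID, so identical files are stored once.
empty = blob_id(b"")
hello = blob_id(b"hello world\n")
same_again = blob_id(b"hello world\n")
```

If this is right, the values should agree with what `git hash-object` reports for the same bytes. Trees and commits are hashed the same way, just with different payloads.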

git is also trivial to script, because the underlying plumbing is available, which makes it easy to integrate into other workflows where previously no VCS would have made sense.

After using git I find using any other VCS feels like beating my head against a brick wall.

Anyway, I think what made Linus succeed is that he had no stake in any of the existing version control systems, which meant he could go his own way. The man page for git still says "stupid content tracker", which is what it is; it just also happens to be very useful for revision control.

Necessity mother of invention

Posted Nov 18, 2013 12:57 UTC (Mon) by ms (subscriber, #41272) [Link] (15 responses)

> After using git I find using any other VCS feels like beating my head against a brick wall.

FWIW, I find the exact opposite. The degenerate branching for one is an utter PITA for the workflows I'm used to with tools like mercurial and monotone. The fact that a branch is just a pointer to one revision rather than an arbitrary set of revisions is a major pain point. In fact I would suggest that the deficiencies of branching in Git are why so many people rebase and (often) squash rather than merging from master into their own branch. Personally, I just don't like that - from a code archaeology POV squashing is an awful idea, and rebasing is utterly unnecessary if you have decent integration with merge tools and are used to merging and dealing with concurrent changes going on. That said, I'm also a big fan of one branch per bug, which also seems to be a rarer position these days.

However, it's undeniable that git has pretty much won so maybe the fault is with me.

No real branches in Git

Posted Nov 18, 2013 13:46 UTC (Mon) by rvfh (guest, #31018) [Link]

> The fact that a branch is just a pointer to one revision rather than an arbitrary set of revisions is a major pain point.

Yes, it makes more sense to talk about 'heads' than 'branches'. A branch to me implies a clear starting point. And no, I don't think 'git merge-base' is the answer :-)

Necessity mother of invention

Posted Nov 18, 2013 14:26 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

I like rebasing branches because sometimes the code just moves too much and merging a month of work all at once just isn't a good use of time when conflicts can be done on a per-commit basis. I'm also not so worried about archaeology ("oh, you made a typo in X and fixed it 3 commits later" is *useless* in the final, merged history) as bisectability. Maybe some of it comes from git making merge conflict diffs hard to see, but I'd still rebase if it were easy.

Necessity mother of invention

Posted Nov 18, 2013 16:58 UTC (Mon) by khim (subscriber, #9252) [Link] (12 responses)

Git and mercurial just have opposite philosophies WRT history.

Mercurial's POV: history of your project is important and thus should be preserved as much as possible. Everything should be kept including all the marginal details.

Git's POV: history of your project is so important that you should not leave it to chance. A skilled chronicler must craft it instead.

Everything else comes from this difference: mercurial needs (and gets) tools which help you to drop the insignificant details, git needs (and gets) helpers which assist the aforementioned chronicler (sometimes called maintainer).

Rebase and/or squash in git are natural consequences of this approach: why would you need to keep the useless history of your experimentation with the problem space around? Better to squash the change and split it into logical chunks instead of using chronological chunks!

Note that, properly done, these actions may actually help “code archaeology”. If your goal is not to find something to incriminate someone but to find the root of the problem, I mean. It only becomes ugly when people rebase and/or squash without bothering to check that the history they produce still makes sense.

One-branch-per-bug makes perfect sense in your local repo when you are developing stuff, but it makes absolutely no sense in the central repo: you are not delivering a bazillion binaries, with one bug fixed in each binary, to your QA team, right? Why would you need to see in your history some phantoms which were never actually used or tested?

Necessity mother of invention

Posted Nov 18, 2013 17:09 UTC (Mon) by ms (subscriber, #41272) [Link] (11 responses)

Thank you - that's a very helpful explanation of the consequences of the differences involved.

It certainly seems to me that git gives you way more rope to hang yourself with, and skilled users can do everything and more that you can do in other DVCSs, but it requires a more hands-on approach. To me, tools like mercurial seem more prescriptive of workflow, and it just happens to be the one I learnt and know.

> One-branch-per-bug makes perfect sense in your local repo when you are developing stuff, but it makes absolutely no sense in the central repo: you are not delivering a bazillion binaries, with one bug fixed in each binary, to your QA team, right? Why would you need to see in your history some phantoms which were never actually used or tested?

The fact that I genuinely shudder at the thought of only wanting to capture actual release candidates shows how weird I find this approach to development. I genuinely see no good reason why local and central should differ at all. More data is always better than less, and if you need to have tools to filter down and ignore the "noise" then so be it, but let's not be throwing data away.

Anyway, we're way off topic here, and I don't really have a bone to pick. I do though think it would be helpful for git to grow some features to better cope with branches as arbitrary sets of changesets rather than a feature, but I'm sure that'll come in time.

Necessity mother of invention

Posted Nov 18, 2013 17:17 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

One-branch-per-bug also makes sense as a third-party contributor since then branches are targeted and easier to get back upstream. If a project has code review for everyone (even core contributors), then one-branch-per-bug works there as well. If, on the other hand, committing directly to master is allowed, then there's not much gain in making branches for typos or (simple) bugfixes.

Necessity mother of invention

Posted Nov 18, 2013 17:35 UTC (Mon) by dlang (guest, #313) [Link] (9 responses)

> I do though think it would be helpful for git to grow some features to better cope with branches as arbitrary sets of changesets rather than a feature.

can you explain a bit more about what you mean by this?

git doesn't care what changes are in the branch, it's convention to use it for a feature/bugfix

Necessity mother of invention

Posted Nov 18, 2013 22:30 UTC (Mon) by kleptog (subscriber, #1183) [Link] (8 responses)

I think what is meant is the distinction mentioned elsewhere in this thread: git tracks heads, which refer to the *end* of a sequence of commits, whereas branches refer to the *beginning* of a sequence of commits.

So, in something like SVN, you can point to a commit and say "this commit is in branch X", whereas in git you have to consider the branch to be the commits matching master..head, but that doesn't work anymore after merges. In SVN, a merged branch can still exist in the history, whereas in git a merged branch can be deleted and no trace of its name would remain.
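A small Python model may help show why master..head stops being useful after a merge. It treats history as a plain parent map and the range as "reachable from the tip but not from the base"; the commit names and helper functions are invented for the example.

```python
def ancestors(parents, tip):
    """All commits reachable from tip (inclusive) by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def commit_range(parents, base, tip):
    """Model of `base..tip`: commits reachable from tip but not from base."""
    return ancestors(parents, tip) - ancestors(parents, base)


# Hypothetical history: master is A-B, topic adds X-Y on top of A.
parents = {"A": [], "B": ["A"], "X": ["A"], "Y": ["X"]}
before_merge = commit_range(parents, "B", "Y")

# Merge topic into master: M has parents B and Y, and master now points at M.
parents["M"] = ["B", "Y"]
after_merge = commit_range(parents, "M", "Y")
```

Before the merge the range holds X and Y; afterwards it is empty, because everything on the topic side is now also reachable from master.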

I'm not bothered by branch names vanishing, since I don't consider the "branch" a commit is in to be at all interesting information. The commit message, that describes the purpose.

I suppose you could make a commit hook that detected when you were making an off-trunk commit and added the branch name to the commit message?
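Something along these lines might do it. This is only a sketch of a script that could be installed as a commit-msg hook (git passes the path of the message file as the first argument); the "Branch:" line format and the assumption that the trunk is called "master" are mine.

```python
#!/usr/bin/env python3
import subprocess
import sys

TRUNK = "master"  # assumption: the name this repository uses for its trunk


def add_branch_line(message: str, branch: str, trunk: str = TRUNK) -> str:
    """Append 'Branch: <name>' unless on trunk, detached, or already present."""
    if not branch or branch == trunk:
        return message
    line = f"Branch: {branch}"
    if line in message.splitlines():
        return message
    return message.rstrip("\n") + "\n\n" + line + "\n"


def current_branch() -> str:
    """Short name of the checked-out branch, or '' when HEAD is detached."""
    result = subprocess.run(
        ["git", "symbolic-ref", "--short", "-q", "HEAD"],
        capture_output=True, text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Rewrite the commit message file in place.
    path = sys.argv[1]
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(add_branch_line(original, current_branch()))
```

Note that this only records whatever the local branch happened to be called at commit time, and a later rebase or cherry-pick would carry the line along unchanged.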

Necessity mother of invention

Posted Nov 18, 2013 22:59 UTC (Mon) by dlang (guest, #313) [Link] (7 responses)

First off, one item that you are not understanding:

in git there is no technical difference between a branch and a trunk, which one is the trunk and which a branch is purely convention.

While you can delete a branch from a repository, if it has been merged into another branch, all the history is going to be in that other branch. This includes the HEAD commit as of the point where it was merged, and that commit id (the SHA1) will always be uniquely identifiable in the merged version. So a branch can only really be completely deleted if it has never been merged.

so if you have

A->B->C->D
 \E->F/

This is exactly identical to

A->E->F->C->D
 \---B---/
(hopefully this shows properly with A->B->C being the lower loop, I don't think LWN has the equivalent of pre tags)

as for referring to the start of the branch vs its end, does it really matter if you call a branch E vs F?

the _name_ of a branch (featurefoo) is a purely local convenience and has only two uses. Think of it as a symlink into the commits. it points at the most recent commit on that branch as it's changing. The two uses are:

1. the local developer uses it to switch between branches when working

2. people pulling updates from a developer use it to say 'I want the most recent work on this branch'

it would be theoretically possible to work without named branches, all you would need to do is have some other mechanism to track what the most recent commit ID for a branch is, and then people pulling could ask for that commit ID (instead of a branch name), and the developer could just checkout that commit ID, and then add more commits to it (just be sure you save the newest commit ID when you are done :-)
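As a rough illustration of "a branch name is just a pointer", here is a toy Python model. It is not how git stores anything on disk; the class, the shortened IDs and the commit messages are all invented for the example.

```python
import hashlib


class Repo:
    """Toy model: commits are immutable, branch names are movable pointers."""

    def __init__(self):
        self.commits = {}  # commit id -> (parent ids, message)
        self.refs = {}     # branch name -> commit id

    def commit(self, message, parents=()):
        # Toy ID derived from parents and message (not git's real format).
        raw = repr((tuple(parents), message)).encode("utf-8")
        commit_id = hashlib.sha1(raw).hexdigest()[:8]
        self.commits[commit_id] = (tuple(parents), message)
        return commit_id

    def commit_on(self, branch, message):
        tip = self.refs.get(branch)
        new_id = self.commit(message, (tip,) if tip else ())
        self.refs[branch] = new_id  # only the pointer moves
        return new_id


repo = Repo()
first = repo.commit_on("featurefoo", "start feature")
second = repo.commit_on("featurefoo", "finish feature")

# Working without the name: remember the newest ID yourself.
remembered = second
del repo.refs["featurefoo"]
third = repo.commit("follow-up", parents=(remembered,))
```

Deleting the name changes nothing in `repo.commits`; the only thing lost is the convenient way to find the newest commit.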

the text name of the branch may not have any long-term meaning. There are thousands, if not millions of _different_ branches named for_linus (or equivalent) that all have different content.

all of this means that branch names are temporary disposable items in git, not long-term permanent identities the way you are thinking of them. This is a very different way of thinking that does take some time to wrap your head around, but once you accept that from a technical point of view, every branch is equal to every other branch a lot of interesting things open up.

Note that 'all branches are equal' is not only true about the branches in one repository, it's also true about branches in different repositories owned and managed by different people.

If Linus were to go crazy and start committing bad stuff to the 'mainline kernel trunk repository', the only thing that would have to happen to cut him out entirely is for people to decide to pull from a branch in someone else's repository (and it doesn't have to be named "master", "trunk" or any other specific name)

This is similar to how some people started running the -ac kernel back when Linus was in his burnout stage, but even simpler.

Necessity mother of invention

Posted Nov 19, 2013 12:02 UTC (Tue) by ms (subscriber, #41272) [Link] (6 responses)

I think there's some confusion over terminology going on here.

In particular around "deleting branches". My understanding is that in git this is removing the branch pointer. But this is a confusing concept in general to those of us who cut our teeth on "history is immutable" DVCSs.

Whilst the equivalences you point out are true, I would suggest that names are very important. To suggest a daft strawman, they're as important as variable names. For one thing, integration with bug tracking systems: we have all sorts of hooks which look at the branch name and pass commit comments into the relevant bug in bugzilla (this is much nicer than adding specific tags to commit comments which are easy to forget to do).

Also, if the bug (and by bug I mean bug/feature/story/whatever/unit-of-work) gets reopened even after merging, it still makes sense to want to group all these commits together even if there end up being multiple merges from bug-branch to master/trunk/stable/ready-for-qa/whatever. In even more extreme cases, you could actually need multiple branches of the same name - say if I have a "dev" and "stable" branch and I'm told to implement feature X documented in bug18723, I might well create a branch off "dev" called "bug18723" which implements it one way, but it turns out that so much has changed in the meantime and that management says we must backport this feature, that I create an entirely separate branch off "stable" also called "bug18723". Now these two branches are entirely unrelated in terms of the DAG of changesets, but they conceptually relate to the same feature and so should have the same name. It's basically impossible to deal with this situation in git (yes, you could do a lot of reset --hard bug18723 and track the commit shas manually, but this is basically equivalent to working without named branches, which as you point out, could be done).

What you need to be able to deal with this sort of issue is for a branch to not be a pointer to a single changeset, but an arbitrary set of changesets.

There's another example where greater flexibility with branches is a good idea. Say that someone has wrongly merged a branch to default. You want to back that out but unfortunately there are now downstream changesets already. You might think to commit a revert, but the problem there is that due to die-die-die, when the branch eventually gets merged correctly, the revert will likely win and you'll have to tidy up all the pieces yourself. So a better idea is to go back to the changeset on master before the erroneous merge, and fork from there but on master. So:

master: A -> B -> C -> D -> ...
         \       /
branch:   X -> Y

C is the bad merge. We now make some arbitrary (normally whitespace) change to B, creating a C', which is still on master:

                C'
               /
master: A -> B -> C -> D -> ...
         \       /
branch:   X -> Y

And we now merge the descendants of C into our "junk" branch:

                C'
               /
master: A -> B -> C -> D -> ...
         \       /              \
branch:   X -> Y                 \
                        \    \    \
junk:                    J -> K -> L

and then reapply those changesets onto C':

                C'-> D'-> ...
               /
master: A -> B -> C -> D -> ...
         \       /              \
branch:   X -> Y                 \
                        \    \    \
junk:                    J -> K -> L

Now we can continue to develop our branch as a child of Y, we have master at some descendant of D', we have no erroneous dangling branches, and we've never had to commit a revert, so die-die-die won't bite us.

Whilst all of this is possible in git, because it can't do a branch with multiple heads (and here, at many points we have master with multiple heads), it makes it much more painful than it ought to be.

I think your later points really get back to the idea that local can differ from remote. If you have integration with bug trackers and so forth, it really doesn't make sense to name your local branches differently from your remote branches, and I would suggest that it is the minority case where this level of flexibility is an important feature. Consequently, if you do have unified branch names and integration with other tooling, branch names are both long lived and frequently very important. IME, the way kernel dev is done is the exception rather than the norm. Now I'm not so daft as to suggest the kernel development process is being done "wrongly" at all - quite the opposite - and git is clearly a tool that fits the requirements there well. However, and this is quite possibly just my own biased opinion, it seems to me that it's incorrect to assume that all projects would "do better" if they used the same toolset as the kernel. Obviously, no one is forcing me to use git (well, actually, some projects at work are), but the popularity of github and the prevailing monoculture is a touch alarming.

Necessity mother of invention

Posted Nov 19, 2013 12:36 UTC (Tue) by dlang (guest, #313) [Link] (2 responses)

I'll post more later (after I get some sleep), but a couple details.

when you are talking about deleting branches, you are not actually changing the history. Git also has immutable history (you can go back and create a new branch/history, but it's not going to result in the same ID, because part of what goes into the SHA1 id is the parents of the commit).
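To illustrate the point about parents feeding into the ID, here is a short Python sketch that hashes a commit-shaped text the way git hashes objects (a "commit <length>" header, a NUL byte, then the body). The author identity and messages are placeholders, and the tree is just the well-known empty tree.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree ID
WHO = "A U Thor <author@example.com> 1384819200 +0000"    # placeholder identity


def commit_id(tree, parents, who, message):
    """SHA-1 over 'commit <len>\\0' plus the tree, parent, author lines and message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    body = "\n".join(lines).encode("utf-8")
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


root = commit_id(EMPTY_TREE, [], WHO, "root\n")
other_root = commit_id(EMPTY_TREE, [], WHO, "another root\n")

# Same tree, author and message; only the parent differs.
child_a = commit_id(EMPTY_TREE, [root], WHO, "same change\n")
child_b = commit_id(EMPTY_TREE, [other_root], WHO, "same change\n")
```

Since `child_a` and `child_b` differ only in their parent line yet get different IDs, "rewriting" an old commit necessarily gives new IDs to everything built on top of it.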

For most of the rest of what you are talking about, it requires that you have an 'authoritative' repository that can control the naming for all other repositories. With git there is no such central control, all repositories are technically equal (including the one on my laptop and the one Linus pushes to on kernel.org), so it's impossible to levy any requirements on naming across all copies of the repository.

I'll read the rest of your post and reply in more detail later.

Necessity mother of invention

Posted Nov 21, 2013 13:29 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

I think you can delete a branch - it's called "pruning". But that is only for ABANDONED code. You delete the reference to the head, and then a repository tidy-up finds all these "unreferenced commits" and gets rid of them.
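Roughly speaking, that tidy-up is a reachability sweep starting from whatever heads are left. The Python sketch below models only that part, with a made-up history; as far as I know real git also looks at reflogs and an expiry grace period before discarding anything, which this ignores.

```python
def reachable(parents, tips):
    """All commits reachable from any of the given tips via parent links."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def prune(parents, refs):
    """Keep only commits that some remaining ref can still reach."""
    keep = reachable(parents, refs.values())
    return {commit: ps for commit, ps in parents.items() if commit in keep}


# Hypothetical history: E-F was merged into master at M; X-Y was never merged.
parents = {
    "A": [], "B": ["A"], "E": ["A"], "F": ["E"],
    "M": ["B", "F"], "X": ["A"], "Y": ["X"],
}
refs = {"master": "M", "merged-fix": "F", "abandoned": "Y"}

del refs["merged-fix"]  # merged: commits stay reachable through M
del refs["abandoned"]   # never merged: X and Y become unreferenced
after = prune(parents, refs)
```

Only the abandoned X and Y disappear; E and F survive because master still reaches them through the merge.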

But as others have said, if you want to keep the code, you merge your bug/feature branch back into mainline, and your branch has become a loop in the main tree.

btw, I've seen it commented elsewhere that people tend to like either mercurial or git - they "get" one approach and find the other difficult. That doesn't mean one is better than the other :-)

Cheers,
Wol

Necessity mother of invention

Posted Nov 21, 2013 18:58 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

When I played around with hg for a while, I ended up getting mixed up with the different verbiage between the two. This started the "ugh…things are *different*" annoyances which just snowballed from there. As for features, I really miss the index when working with non-git repos. It's just too ingrained into the way I work now to try and go back to svn-style commit preparation.

Necessity mother of invention

Posted Nov 19, 2013 16:13 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (2 responses)

> say if I have a "dev" and "stable" branch and I'm told to implement feature X documented in bug18723, I might well create a branch off "dev" called "bug18723" which implements it one way, but it turns out that so much has changed in the mean time and that management says we must backport this feature, that I create an entirely separate branch off "stable" also called "bug18723". Now these two branches are entirely unrelated in terms of the DAG of changesets, but they conceptually relate to the same feature and so should have the same name. It's basically impossible to deal with this situation in git (yes, you could do a lot of reset --hard bug18723 and track the commit shas manually, but this is basically equivalent to working without named branches, which as you point out, could be done).

Namespace your branches. All feature branches I work on start with 'dev/', if work is abandoned for some long amount of time, I rename it to 'wip/'. Here, I would probably have named the original 'devel/bug18723' and the other 'devel/stable/bug18723'.

> <branch diagrams>

What I'd do here is 'git revert -m1 C' to undo the branch merge (making commit R). 'branch' continues from Y and when it comes time to merge, revert R and then merge 'branch' in again.

I don't think I see why 'junk' is necessary at all here in either case. Is the "C -> D" line a dead head of 'master'? Does 'junk' need to live around forever? I don't like the sound of it, but I may have missed something.

Another solution in git is to branch off of the oldest release branch which the feature may apply to (and I'd rebase it back if more information came in later, but that's me). With conflicts on newer releases, branch again, merge the newer release branch in, then merge back.

Necessity mother of invention

Posted Nov 20, 2013 10:54 UTC (Wed) by ms (subscriber, #41272) [Link] (1 responses)

> What I'd do here is 'git revert -m1 C' to undo the branch merge (making commit R). 'branch' continues from Y and when it comes time to merge, revert R and then merge 'branch' in again.

That's an interesting idea there - reverting the revert. The only problem there is if other conflicting changesets land on master in the meantime then you'll suffer some merging pain - which, fair enough, you'd also suffer with the merge even if you didn't have the erroneous merge, but I wonder if the revert-the-revert would get in the way if you also regularly merge from master into branch to bring in other changesets you want and to attempt to resolve conflicts ahead of the final merge.

> I don't think I see why 'junk' is necessary at all here in either case. Is the "C -> D" line a dead head of 'master'? Does 'junk' need to live around forever? I don't like the sound of it, but I may have missed something.

Right, so this sort of strategy has been figured out for history-is-immutable DVCSs. Yes, D is the head of a dead fork of master. Generally, having dangling branches is unpleasant - it's nice to have a perfect correlation between dangling branches and open bugs. So in such setups, a "junk" branch is useful as something where you can merge such dead forks into to just tidy up this sort of problem.

Necessity mother of invention

Posted Nov 20, 2013 13:26 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

You'd be fine even in that case. Basically, git undoes the *change* the merge did, but does not mark that the *commits* were undone (reverts are *not* special in git; they're just inverted diffs pre-generated from old commits). When you merge again, git will ignore everything older than merge-base between the branches, which means that commits already merged-then-reverted aren't considered. This is why you need to revert-the-revert: to get the changes the old merge did back from the grave.
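Here is a small Python model of why the re-merge skips the old commits. It uses the commit names from the diagrams above plus two invented ones (R for the revert, Z for new work on 'branch'), and a naive merge-base search rather than anything git actually runs.

```python
def ancestors(parents, tip):
    """All commits reachable from tip (inclusive) by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors that are not themselves ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# master: A - B - C - R   (C is the bad merge of Y, R reverts C's changes)
# branch: A - X - Y - Z   (Z is new work done after the bad merge)
parents = {
    "A": [], "B": ["A"], "X": ["A"], "Y": ["X"],
    "C": ["B", "Y"], "R": ["C"], "Z": ["Y"],
}
bases = merge_bases(parents, "R", "Z")
still_to_merge = ancestors(parents, "Z") - ancestors(parents, "R")
```

The merge base comes out as Y, so only Z is new from master's point of view; the changes from X and Y were undone by R and only come back if R itself is reverted.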

If you're unwinding a branch with many merges into master, make a new branch (or head on the afflicted branch, but that breaks down for octopus merges), revert the merges in reverse topological order, then merge it back. In fact, this stuff is *the* reason to *never* do fast forward merges when merging branches in. You get one nice commit to undo rather than a string of N commits.

So any head of a branch which is an ancestor of a head of junk is "dead"? I'd call it "Hercules" :) .

Amazingly productive programmers

Posted Nov 17, 2013 15:37 UTC (Sun) by dskoll (subscriber, #1630) [Link] (1 responses)

No, not scared and not surprised. In a couple of decades in the software industry, I've discovered that some amazingly talented programmers are orders of magnitude better and more productive than the average programmer. I can think of a few examples in the free software world: Linus, obviously, but also Fabrice Bellard, John Ousterhout, Larry Wall, and Andrew Tridgell. I'm sure LWN readers can think of others.

I run a small software business and I was lucky enough that the first developer I hired apart from myself was one of these amazingly productive programmers, and furthermore he had excellent communication skills and was extremely diligent when it came to documentation and testing. I think that one person pretty much set our company on the road to rigorous, systematic software engineering.

Never underestimate the power of one person.

Amazingly productive programmers

Posted Nov 24, 2013 9:02 UTC (Sun) by thoeme (subscriber, #2871) [Link]

> I've discovered that some amazingly talented programmers are orders of magnitude better and more productive than the average programmer

Yes, (luckily) I discovered that fact myself, being a *not* talented programmer in comparison to my classmates when I was in university. Just in time to switch the main topic ;-)