Necessity mother of invention

Posted Nov 19, 2013 12:02 UTC (Tue) by ms (subscriber, #41272)
In reply to: Necessity mother of invention by dlang
Parent article: Four years of Go

I think there's some confusion over terminology going on here.

In particular, around "deleting branches": my understanding is that in git this just removes the branch pointer. But that is a confusing concept in general to those of us who cut our teeth on "history is immutable" DVCSs.

Whilst the equivalences you point out are true, I would suggest that names are very important. To suggest a daft strawman, they're as important as variable names. For one thing, there's integration with bug tracking systems: we have all sorts of hooks which look at the branch name and pass commit comments into the relevant bug in bugzilla (this is much nicer than adding specific tags to commit comments, which is easy to forget to do).

Also, if the bug (and by bug I mean bug/feature/story/whatever/unit-of-work) gets reopened even after merging, it still makes sense to want to group all these commits together even if there end up being multiple merges from bug-branch to master/trunk/stable/ready-for-qa/whatever. In even more extreme cases, you could actually need multiple branches of the same name - say if I have a "dev" and "stable" branch and I'm told to implement feature X documented in bug18723, I might well create a branch off "dev" called "bug18723" which implements it one way, but it turns out that so much has changed in the mean time and that management says we must backport this feature, that I create an entirely separate branch off "stable" also called "bug18723". Now these two branches are entirely unrelated in terms of the DAG of changesets, but they conceptually relate to the same feature and so should have the same name. It's basically impossible to deal with this situation in git (yes, you could do a lot of reset --hard bug18723 and track the commit shas manually, but this is basically equivalent to working without named branches, which as you point out, could be done).

What you need to be able to deal with this sort of issue is for a branch to not be a pointer to a single changeset, but an arbitrary set of changesets.
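A minimal sketch of that distinction, assuming a toy history in which every changeset records the name of the branch it was committed on. The changeset ids, the `HISTORY` table and the helper names are all hypothetical; this illustrates the data model only, not any tool's actual storage format.

```python
from collections import defaultdict

# Hypothetical toy history: changeset id -> (parent ids, branch name recorded in the changeset).
HISTORY = {
    "r0": ((), "stable"),
    "s1": (("r0",), "stable"),
    "d1": (("r0",), "dev"),
    "d2": (("d1",), "dev"),
    "f1": (("d2",), "bug18723"),  # first implementation, forked from dev
    "f2": (("f1",), "bug18723"),
    "b1": (("s1",), "bug18723"),  # separate backport, forked from stable
}

# Pointer-style branch: one name maps to exactly one changeset.
POINTER_REFS = {"dev": "d2", "stable": "s1", "bug18723": "f2"}  # no room for b1


def branch_members(history):
    """Group changesets by the branch name stored in each of them."""
    members = defaultdict(set)
    for cid, (_parents, name) in history.items():
        members[name].add(cid)
    return dict(members)


def branch_heads(history, name):
    """Changesets on `name` that have no child on the same named branch."""
    members = {cid for cid, (_parents, n) in history.items() if n == name}
    has_child_on_branch = {
        parent
        for cid in members
        for parent in history[cid][0]
        if parent in members
    }
    return members - has_child_on_branch


if __name__ == "__main__":
    print(sorted(branch_members(HISTORY)["bug18723"]))
    print(sorted(branch_heads(HISTORY, "bug18723")))
```

With the branch treated as a set, "bug18723" covers both unrelated lines of work and simply has two heads; the pointer-style table can name only one of them at a time.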

There's another example where greater flexibility with branches is a good idea. Say that someone has wrongly merged a branch to default (called master in the diagrams below). You want to back that out but unfortunately there are now downstream changesets already. You might think to commit a revert, but the problem there is that due to die-die-die, when the branch eventually gets merged correctly, the revert will likely win and you'll have to tidy up all the pieces yourself. So a better idea is to go back to the changeset on master before the erroneous merge, and fork from there but on master. So:

master: A -> B -> C -> D -> ...
         \       /
branch:   X -> Y

C is the bad merge. We now commit some arbitrary (normally whitespace) change on top of B, creating a C', which is still on master:

                C'
               /
master: A -> B -> C -> D -> ...
         \       /
branch:   X -> Y

And we now merge the descendants of C into our "junk" branch:

                C'
               /
master: A -> B -> C -> D -> ...
         \       /              \
branch:   X -> Y                 \
                        \    \    \
junk:                    J -> K -> L

and then reapply those changesets onto C':

                C'-> D'-> ...
               /
master: A -> B -> C -> D -> ...
         \       /              \
branch:   X -> Y                 \
                        \    \    \
junk:                    J -> K -> L

Now we can continue to develop our branch as a child of Y, we have master at some descendant of D', we have no erroneous dangling branches, and we've never had to commit a revert, so die-die-die won't bite us.

Whilst all of this is possible in git, because it can't do a branch with multiple heads (and here, at many points we have master with multiple heads), it makes it much more painful than it ought to be.

I think your later points really get back to the idea that local can differ from remote. If you have integration with bug trackers and so forth, it really doesn't make sense to name your local branches differently from your remote branches, and I would suggest that it is the minority case where this level of flexibility is an important feature. Consequently, if you do have unified branch names and integration with other tooling, branch names are both long lived and frequently very important. IME, the way kernel dev is done is the exception rather than the norm. Now I'm not so daft as to suggest the kernel development process is being done "wrongly" at all - quite the opposite - and git is clearly a tool that fits the requirements there well. However, and this is quite possibly just my own biased opinion, it seems to me that it's incorrect to assume that all projects would "do better" if they used the same toolset as the kernel. Obviously, no one is forcing me to use git (well, actually, some projects at work are), but the popularity of github and the prevailing monoculture is a touch alarming.

Necessity mother of invention

Posted Nov 19, 2013 12:36 UTC (Tue) by dlang (guest, #313)

I'll post more later (after I get some sleep), but a couple details.

When you are talking about deleting branches, you are not actually changing the history. Git also has immutable history (you can go back and create a new branch/history, but it's not going to result in the same ID, because part of what goes into the SHA1 id is the parents of the commit).
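A rough sketch of why that holds, assuming a deliberately simplified commit format. The `commit_id` helper is hypothetical, and real git hashes a full commit object (tree, author, dates and so on) rather than this payload, but the parent ids are part of the hashed data in both cases.

```python
import hashlib


def commit_id(parents, message):
    """Hash a simplified commit: the parent ids followed by the message."""
    payload = "".join(f"parent {p}\n" for p in parents) + "\n" + message
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Original history: A <- B <- C
a = commit_id([], "A")
b = commit_id([a], "B")
c = commit_id([b], "C")

# "Rewriting" B yields a different id, and so does every descendant,
# even though C's own message is unchanged.
b2 = commit_id([a], "B, reworded")
c2 = commit_id([b2], "C")

if __name__ == "__main__":
    print(c)
    print(c2)
```

The old chain is not modified by any of this; the rewritten commits are simply new objects with new ids sitting beside it.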

For most of the rest of what you are talking about, it requires that you have an 'authoritative' repository that can control the naming for all other repositories. With git there is no such central control, all repositories are technically equal (including the one on my laptop and the one Linus pushes to on kernel.org), so it's impossible to levy any requirements on naming across all copies of the repository.

I'll read the rest of your post and reply in more detail later.

Necessity mother of invention

Posted Nov 21, 2013 13:29 UTC (Thu) by Wol (subscriber, #4433)

I think you can delete a branch - it's called "pruning". But that is only for ABANDONED code. You delete the reference to the head, and then a repository tidy-up finds all these "unreferenced commits" and gets rid of them.

But as others have said, if you want to keep the code, you merge your bug/feature branch back into mainline, and your branch has become a loop in the main tree.
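A small sketch of the two cases, assuming the tidy-up is a plain reachability walk from the remaining branch pointers. The commit names and helpers are hypothetical, and a real repository tidy-up is more conservative than this (git, for example, also consults the reflog and an expiry period before discarding anything).

```python
def reachable(parents, refs):
    """Every commit that can be reached from some branch pointer."""
    seen = set()
    stack = list(refs.values())
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(parents[cid])
    return seen


def prune(parents, refs):
    """Drop commits that no branch pointer can reach any more."""
    keep = reachable(parents, refs)
    return {cid: ps for cid, ps in parents.items() if cid in keep}


# Abandoned work: X and Y are only reachable through 'experiment'.
PARENTS = {"A": [], "B": ["A"], "X": ["A"], "Y": ["X"]}
refs = {"master": "B", "experiment": "Y"}
del refs["experiment"]  # deleting the branch removes only the pointer
abandoned = prune(PARENTS, refs)

# Merged work: M on master has Y as a second parent, so X and Y survive.
MERGED = {**PARENTS, "M": ["B", "Y"]}
merged = prune(MERGED, {"master": "M"})

if __name__ == "__main__":
    print(sorted(abandoned))
    print(sorted(merged))
```

In the merged case the branch name is gone but the commits remain as the "loop" in the main tree.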

btw, I've seen it commented elsewhere that people tend to like either mercurial or git - they "get" one approach and find the other difficult. That doesn't mean one is better than the other :-)

Cheers,
Wol

Necessity mother of invention

Posted Nov 21, 2013 18:58 UTC (Thu) by mathstuf (subscriber, #69389)

When I played around with hg for a while, I ended up getting mixed up with the different verbiage between the two. This started the "ugh…things are *different*" annoyances which just snowballed from there. As for features, I really miss the index when working with non-git repos. It's just too ingrained into the way I work now to try and go back to svn-style commit preparation.

Necessity mother of invention

Posted Nov 19, 2013 16:13 UTC (Tue) by mathstuf (subscriber, #69389)

> say if I have a "dev" and "stable" branch and I'm told to implement feature X documented in bug18723, I might well create a branch off "dev" called "bug18723" which implements it one way, but it turns out that so much has changed in the mean time and that management says we must backport this feature, that I create an entirely separate branch off "stable" also called "bug18723". Now these two branches are entirely unrelated in terms of the DAG of changesets, but they conceptually relate to the same feature and so should have the same name. It's basically impossible to deal with this situation in git (yes, you could do a lot of reset --hard bug18723 and track the commit shas manually, but this is basically equivalent to working without named branches, which as you point out, could be done).

Namespace your branches. All feature branches I work on start with 'dev/', if work is abandoned for some long amount of time, I rename it to 'wip/'. Here, I would probably have named the original 'devel/bug18723' and the other 'devel/stable/bug18723'.

> <branch diagrams>

What I'd do here is 'git revert -m1 C' to undo the branch merge (making commit R). 'branch' continues from Y and when it comes time to merge, revert R and then merge 'branch' in again.

I don't think I see why 'junk' is necessary at all here in either case. Is the "C -> D" line a dead head of 'master'? Does 'junk' need to live around forever? I don't like the sound of it, but I may have missed something.

Another solution in git is to branch off of the oldest release branch which the feature may apply to (and I'd rebase it back if more information came in later, but that's me). With conflicts on newer releases, branch again, merge the newer release branch in, then merge back.

Necessity mother of invention

Posted Nov 20, 2013 10:54 UTC (Wed) by ms (subscriber, #41272)

> What I'd do here is 'git revert -m1 C' to undo the branch merge (making commit R). 'branch' continues from Y and when it comes time to merge, revert R and then merge 'branch' in again.

That's an interesting idea there - reverting the revert. The only problem there is if other conflicting changesets land on master in the mean time then you'll suffer some merging pain - which, fair enough, you'd also suffer with the merge even if you didn't have the erroneous merge, but I wonder if the revert-the-revert would get in the way if you also regularly merge from master into branch to bring in other changesets you want and to attempt to resolve conflicts ahead of the final merge.

> I don't think I see why 'junk' is necessary at all here in either case. Is the "C -> D" line a dead head of 'master'? Does 'junk' need to live around forever? I don't like the sound of it, but I may have missed something.

Right, so this sort of strategy has been figured out for history-is-immutable DVCSs. Yes, D is the head of a dead fork of master. Generally, having dangling branches is unpleasant - it's nice to have a perfect correlation between dangling branches and open bugs. So in such setups, a "junk" branch is useful as somewhere to merge such dead forks into, just to tidy up this sort of problem.

Necessity mother of invention

Posted Nov 20, 2013 13:26 UTC (Wed) by mathstuf (subscriber, #69389)

You'd be fine even in that case. Basically, git undoes the *change* the merge did, but does not mark that the *commits* were undone (reverts are *not* special in git; they're just inverted diffs pre-generated from old commits). When you merge again, git will ignore everything older than merge-base between the branches, which means that commits already merged-then-reverted aren't considered. This is why you need to revert-the-revert: to get the changes the old merge did back from the grave.
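A sketch of the ancestry argument, using the commit names from the diagrams earlier in the thread plus two assumed ones: R (the revert of C, committed after D) and Z (new work on 'branch' after Y). The helpers are hypothetical and model only which commits a merge considers, not the content merge itself.

```python
def ancestors(parents, cid):
    """The commit itself plus everything reachable through its parents."""
    seen = set()
    stack = [cid]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(parents[cur])
    return seen


def incoming(parents, ours, theirs):
    """Commits that merging `theirs` into `ours` would newly bring in."""
    return ancestors(parents, theirs) - ancestors(parents, ours)


# C is the bad merge of Y into master; R reverts C; Z continues 'branch'.
PARENTS = {
    "A": [], "B": ["A"], "X": ["A"], "Y": ["X"],
    "C": ["B", "Y"], "D": ["C"], "R": ["D"],
    "Z": ["Y"],
}

if __name__ == "__main__":
    # Before the bad merge, merging the branch would have brought in X, Y and Z.
    print(sorted(incoming(PARENTS, "B", "Z")))
    # After merge-then-revert, X and Y are already ancestors of master.
    print(sorted(incoming(PARENTS, "R", "Z")))
```

The revert R changes the content on master but not the ancestry, so a later merge from master at R picks up only Z; the changes from X and Y stay undone until R itself is reverted.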

If you're unwinding a branch with many merges into master, make a new branch (or head on the afflicted branch, but that breaks down for octopus merges), revert the merges in reverse topological order, then merge it back. In fact, this stuff is *the* reason to *never* do fast forward merges when merging branches in. You get one nice commit to undo rather than a string of N commits.

So any head of a branch which is an ancestor of a head of junk is "dead"? I'd call it "Hercules" :) .