|
|
Subscribe / Log in / New account

Rebasing and merging in kernel repositories

By Jonathan Corbet
June 18, 2019
What follows is a kernel document I have been working on for the last month in the hope of reducing the number of subsystem maintainers who run into trouble during the merge window. If all goes according to plan, this text will show up in 5.3 as Documentation/maintainer/rebasing-and-merging.txt. On the off chance that some potentially interested readers might not be monitoring additions to the nascent kernel maintainer's handbook, I'm publishing the text here as well.

Maintaining a subsystem, as a general rule, requires a familiarity with the Git source-code management system. Git is a powerful tool with a lot of features; as is often the case with such tools, there are right and wrong ways to use those features. This document looks in particular at the use of rebasing and merging. Maintainers often get in trouble when they use those tools incorrectly, but avoiding problems is not actually all that hard.

One thing to be aware of in general is that, unlike many other projects, the kernel community is not scared by seeing merge commits in its development history. Indeed, given the scale of the project, avoiding merges would be nearly impossible. Some problems encountered by maintainers result from a desire to avoid merges, while others come from merging a little too often.

Rebasing

"Rebasing" is the process of changing the history of a series of commits within a repository. There are two different types of operations that are referred to as "rebasing" since both are done with the git rebase command, but there are significant differences between them:

  • Changing the parent (starting) commit upon which a series of patches is built. For example, a rebase operation could take a patch set built on the previous kernel release and base it, instead, on the current release. We'll call this operation "reparenting" in the discussion below.

  • Changing the history of a set of patches by fixing (or deleting) broken commits, adding patches, adding tags to commit changelogs, or changing the order in which commits are applied. In the following text, this type of operation will be referred to as "history modification"

The term "rebasing" will be used to refer to both of the above operations. Used properly, rebasing can yield a cleaner and clearer development history; used improperly, it can obscure that history and introduce bugs.

There are a few rules of thumb that can help developers to avoid the worst perils of rebasing:

  • History that has been exposed to the world beyond your private system should usually not be changed. Others may have pulled a copy of your tree and built on it; modifying your tree will create pain for them. If work is in need of rebasing, that is usually a sign that it is not yet ready to be committed to a public repository.

    That said, there are always exceptions. Some trees (linux-next being a significant example) are frequently rebased by their nature, and developers know not to base work on them. Developers will sometimes expose an unstable branch for others to test with or for automated testing services. If you do expose a branch that may be unstable in this way, be sure that prospective users know not to base work on it.

  • Do not rebase a branch that contains history created by others. If you have pulled changes from another developer's repository, you are now a custodian of their history. You should not change it. With few exceptions, for example, a broken commit in a tree like this should be explicitly reverted rather than disappeared via history modification.
  • Do not reparent a tree without a good reason to do so. Just being on a newer base or avoiding a merge with an upstream repository is not generally a good reason.
  • If you must reparent a repository, do not pick some random kernel commit as the new base. The kernel is often in a relatively unstable state between release points; basing development on one of those points increases the chances of running into surprising bugs. When a patch series must move to a new base, pick a stable point (such as one of the -rc releases) to move to.
  • Realize that reparenting a patch series (or making significant history modifications) changes the environment in which it was developed and, likely, invalidates much of the testing that was done. A reparented patch series should, as a general rule, be treated like new code and retested from the beginning.

A frequent cause of merge-window trouble is when Linus Torvalds is presented with a patch series that has clearly been reparented, often to a random commit, shortly before the pull request was sent. The chances of such a series having been adequately tested are relatively low — as are the chances of the pull request being acted upon.

If, instead, rebasing is limited to private trees, commits are based on a well-known starting point, and they are well tested, the potential for trouble is low.

Merging

Merging is a common operation in the kernel development process; the 5.1 development cycle included 1,126 merge commits — nearly 9% of the total. Kernel work is accumulated in over 100 different subsystem trees, each of which may contain multiple topic branches; each branch is usually developed independently of the others. So naturally, at least one merge will be required before any given branch finds its way into an upstream repository.

Many projects require that branches in pull requests be based on the current trunk so that no merge commits appear in the history. The kernel is not such a project; any rebasing of branches to avoid merges will, most likely, lead to trouble.

Subsystem maintainers find themselves having to do two types of merges: from lower-level subsystem trees and from others, either sibling trees or the mainline. The best practices to follow differ in those two situations.

Merging from lower-level trees

Larger subsystems tend to have multiple levels of maintainers, with the lower-level maintainers sending pull requests to the higher levels. Acting on such a pull request will almost certainly generate a merge commit; that is as it should be. In fact, subsystem maintainers may want to use the ‑‑no‑ff Git flag to force the addition of a merge commit in the rare cases where one would not normally be created so that the reasons for the merge can be recorded. The changelog for the merge should, for any kind of merge, say why the merge is being done. For a lower-level tree, "why" is usually a summary of the changes that will come with that pull.

Maintainers at all levels should be using signed tags on their pull requests, and upstream maintainers should verify the tags when pulling branches. Failure to do so threatens the security of the development process as a whole.

As per the rules outlined above, once you have merged somebody else's history into your tree, you cannot rebase that branch, even if you otherwise would be able to.

Merging from sibling or upstream trees

While merges from downstream are common and unremarkable, merges from other trees tend to be a red flag when it comes time to push a branch upstream. Such merges need to be carefully thought about and well justified, or there's a good chance that a subsequent pull request will be rejected.

It is natural to want to merge the master branch into a repository; this type of merge is often called a "back merge". Back merges can help to make sure that there are no conflicts with parallel development and generally gives a warm, fuzzy feeling of being up-to-date. But this temptation should be avoided almost all of the time.

Why is that? Back merges will muddy the development history of your own branch. They will significantly increase your chances of encountering bugs from elsewhere in the community and make it hard to ensure that the work you are managing is stable and ready for upstream. Frequent merges can also obscure problems with the development process in your tree; they can hide interactions with other trees that should not be happening (often) in a well-managed branch.

That said, back merges are occasionally required; when that happens, be sure to document why it was required in the commit message. As always, merge to a well-known stable point, rather than to some random commit. Even then, you should not back merge a tree above your immediate upstream tree; if a higher-level back merge is really required, the upstream tree should do it first.

One of the most frequent causes of merge-related trouble is when a maintainer merges with the upstream in order to resolve merge conflicts before sending a pull request. Again, this temptation is easy enough to understand, but it should absolutely be avoided. This is especially true for the final pull request: Torvalds is adamant that he would much rather see merge conflicts than unnecessary back merges. Seeing the conflicts lets him know where potential problem areas are. He does a lot of merges (382 in the 5.1 development cycle) and has gotten quite good at conflict resolution - often better than the developers involved.

So what should a maintainer do when there is a conflict between their subsystem branch and the mainline? The most important step is to warn Torvalds in the pull request that the conflict will happen; if nothing else, that demonstrates an awareness of how your branch fits into the whole. For especially difficult conflicts, create and push a separate branch to show how you would resolve things. Mention that branch in your pull request, but the pull request itself should be for the unmerged branch.

Even in the absence of known conflicts, doing a test merge before sending a pull request is a good idea. It may alert you to problems that you somehow didn't see from linux-next and helps to understand exactly what you are asking upstream to do.

Another reason for doing merges of upstream or another subsystem tree is to resolve dependencies. These dependency issues do happen at times, and sometimes a cross-merge with another tree is the best way to resolve them; as always, in such situations, the merge commit should explain why the merge has been done. Take a moment to do it right; people will read those changelogs.

Often, though, dependency issues indicate that a change of approach is needed. Merging another subsystem tree to resolve a dependency risks bringing in other bugs and should almost never be done. If that subsystem tree fails to be pulled upstream, whatever problems it had will block the merging of your tree as well. Preferable alternatives include agreeing with the maintainer to carry both sets of changes in one of the trees or creating a topic branch dedicated to the prerequisite commits that can be merged into both trees. If the dependency is related to major infrastructural changes, the right solution might be to hold the dependent commits for one development cycle so that those changes have time to stabilize in the mainline.

Finally

It is relatively common to merge with the mainline toward the beginning of the development cycle in order to pick up changes and fixes done elsewhere in the tree. As always, such a merge should pick a well-known release point rather than some random spot. If your upstream-bound branch has emptied entirely into the mainline during the merge window, you can pull it forward with a command like::

    git merge v5.2-rc1^0

The ^0 will cause Git to do a fast-forward merge (which should be possible in this situation), thus avoiding the addition of a spurious merge commit.

The guidelines laid out above are just that: guidelines. There will always be situations that call out for a different solution, and these guidelines should not prevent developers from doing the right thing when the need arises. But one should always think about whether the need has truly arisen and be prepared to explain why something abnormal needs to be done.

Index entries for this article
KernelDevelopment tools/Git
KernelGit


to post comments

Merging one commit at a time

Posted Jun 18, 2019 16:14 UTC (Tue) by epa (subscriber, #39769) [Link] (15 responses)

I sometimes rebase instead of merging because it is less error-prone (for me and my workflow). I can review each commit as it is applied, and resolve conflicts at the smallest level of granularity. But then, as the article notes, you have an incompatible history with anything that has been previously published.

Is there a way to get the best of both worlds? Can I ask to merge a branch into mine a single commit at a time? That is, for each commit X on the other branch that's not already an ancestor of my branch, individually 'git merge X', resolving conflicts if necessary. Then go on to the next commit in the other branch, and so on to the end. The revision graph will become spaghetti, but you have individually reviewed each change rather than having to resolve conflicts as a big lump, and at the same time preserved the history of your branch. (If there were long sequences of commits that applied without conflicts, they could be replaced with a single merge commit merging in the last one of the sequence.)

Is there a git tool that will do this?

Merging one commit at a time

Posted Jun 18, 2019 17:21 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (2 responses)

Maybe this tool is what you're looking for? https://github.com/mhagger/git-imerge

Merging one commit at a time

Posted Jun 18, 2019 18:27 UTC (Tue) by epa (subscriber, #39769) [Link] (1 responses)

Yes, the 'rebase-with-history' goal sounds like what I am talking about. Thanks.

Merging one commit at a time

Posted Jun 21, 2019 15:43 UTC (Fri) by andrewsh (subscriber, #71043) [Link]

Mercurial implements rebase-with-history in a very interesting way I would love Git to adopt at some point.

https://www.mercurial-scm.org/doc/evolution/user-guide.html

Merging one commit at a time

Posted Jun 19, 2019 9:45 UTC (Wed) by nilsmeyer (guest, #122604) [Link]

Interesting question, off the top of my head I would think you could start another branch and then cherry-pick in the order you need.

Merging one commit at a time

Posted Jun 19, 2019 16:51 UTC (Wed) by iabervon (subscriber, #722) [Link] (1 responses)

Remember that git is able to use local branches however you'd like to come up with the information you need. You can reparent a local copy of the public branch in order to figure out the correct merge resolution, and then, when you do the merge that's appropriate for the project history, you can compare against the right answer that you worked out separately.

There's nothing saying you can't do the error-prone merge, and, before pushing it, amend it to the error-free result you figured out some other way.

Merging one commit at a time

Posted Jun 19, 2019 17:14 UTC (Wed) by epa (subscriber, #39769) [Link]

That's right. (What I sometimes do is to merge first, then make a rebased version of the same changes, and check they are the same.) But this does still lose some information in that you have a monster merge commit whose exact origins aren't clear. Sometimes it may be better to "show your working" by resolving the conflicts with the first part of your patch series (leaving a revision which is meaningful in itself, can be built and tested), then apply the second part of your patch series, etc. Rebase achieves this, but at the expense of losing history and annoying anyone who is referring to the branch (possibly including myself). The tool suggested above looks like the best of both worlds.

Merging one commit at a time

Posted Jun 19, 2019 19:19 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (8 responses)

It's pretty simple, actually. Assuming you have two somehow identified commits, eg, tags, one representing and old version of something which is already incorporated in your code and one denoting a new version, the command
git rev-list --reverse "$old".."$new"
can be used to get a list of all commits between old and new in chronological order. These can then be applied one-by-one via
git cherry-pick "$commit"
I have a fairly simple (77 lines) script for this which aborts in case of a merge conflict after putting a tag with a magic name on the last 'clean' single commit merge. After resolving the conflict, the script can just be rerun. It'll find the last successful cherry-pick with the help of the tag, delete the tag and continue merging the remaining commits in the sequence.

Merging one commit at a time

Posted Jun 19, 2019 20:40 UTC (Wed) by mbunkus (subscriber, #87248) [Link] (3 responses)

Isn't that exactly what "git rebase … --onto …" with "git rebase --continue" after fixing conflicts does? With slightly different branch setups, of course.

Merging one commit at a time

Posted Jun 19, 2019 21:13 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (2 responses)

It's more or less what cvs -j... -j... update does in order to incorporate changes from a vendor branch. git rebase saves a set of local modifications and reapplies these on top of a new upstream commit. The outcome (after fixing conflicts) as a branch whose history has been changed as if the local modifications had been applied on top of the new commit. In the other case, it a modified branch with upstream updates merged into it, ie, no history change.

Merging one commit at a time

Posted Jun 20, 2019 12:19 UTC (Thu) by mbunkus (subscriber, #87248) [Link] (1 responses)

I was thinking along these lines: using your nomenclature of $old..$new as the range of commits you want to cherry-pick on top of your current branch (let's call it "work"), you could do:

git checkout -b $new work-with-old-to-new
git rebase --onto work ${old}~

The second command will

1. reset the current branch ("work-with-old-to-new") to the revision given with "--onto", meaning "work" and "work-with-old-to-new" will point to the same commit,
2. take the commits in range $old..$new and cherry-pick each of them on top of the current branch (still "work-with-old-to-new")
3. If any of the cherry-pick attempts fails, git will pause rebasing, let you fix the situation (resolve conflichts, git add, git commit…) and then you run "git rebase --continue" and git will continue cherry-picking where it was interrupted.

The result is that your new branch "work-with-old-to-new" will now contain all commits from the range $old..$new on top of what's already present in "work". And that sounds pretty much like what you described.

This procedure has nothing to do with "upstream" or with which branch "work" tracks.

Merging one commit at a time

Posted Jun 20, 2019 14:16 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link]

You can obvioulsly invert the inverted operation to end up with a more-or-less similar outcome. But the result will then be a trashed upstream branch as that has been rebased onto a work branch. Now, please don't inform me that one could branch the upstream branch solely for the reason to have a branch to trash. I understand that as well. But it's not the intended outcome which is integrate changes from the upstream branch into the work branch in a manageable fashion.

Merging one commit at a time

Posted Jun 19, 2019 21:18 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

You suggest cherry-picking commits. But that loses the history, at least in a form that git can later use automatically. A new commit is created with the same changes but the original cherry-picked commit does not become an ancestor. Indeed, ‘git rebase’ can be considered a way of cherry-picking a series of commits.

That’s why I asked about a tool which, instead of cherry-picking, does a merge at each step, leaving a revision that shows what happened and can in turn be merged automatically with others that might have done a similar thing.

Merging one commit at a time

Posted Jun 20, 2019 14:21 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link]

I was assuming git could keep track of this as subversion can. If it can't, that's something which will need to be investigated and eventually fixed.

Merging one commit at a time

Posted Jun 20, 2019 9:48 UTC (Thu) by geert (subscriber, #98403) [Link]

Instead of rev-list, you can also use cherry:
git cherry -v $old $new
This will show all new commits, prefixed with a - or +, indicating if they are or are not yet present in the old branch. Then you can cherry-pick the ones prefixes with +. (yes, this is the same as what git rebase --onto does).

Merging one commit at a time

Posted Jun 23, 2019 17:21 UTC (Sun) by nix (subscriber, #2304) [Link]

This seems like a more verbose and error-prone way of doing what git cherry-pick $a..$b already does. git cherry-pick has supported ranged operations for many years now: you should probably think about discarding tools that reimplement that feature.

(You can tell if the merge failed via the exitcode of the git cherry-pick: git cherry-pick --abort then gets you back to a consistent state.)

Rebasing and merging in kernel repositories

Posted Jun 18, 2019 17:58 UTC (Tue) by pm215 (subscriber, #98099) [Link] (2 responses)

One git feature I think it might be nice to have and which could perhaps avoid some inadvertent issues would be if you could mark git branches as "this branch rebases" vs not; and then have 'git merge' complain about attempts to merge rebasing branches. Often I know that a branch I have is going to rebase; but there's no way for me to advertise that fact to people who might fetch it.

Rebasing and merging in kernel repositories

Posted Jun 18, 2019 18:32 UTC (Tue) by epa (subscriber, #39769) [Link]

I'd like to be able to add a terminator commit to a branch; a kind of waxen seal which is the last commit and which stops git from allowing any further commits (unless you give a --force flag). That way, you can make a new branch name when you rebase, as foo_1, foo_2, and so on, while at the same time 'terminating' the old branch so that nobody will be tempted to start work on it. (The log message for the terminator commit could mention the new rebased branch to work on instead.) That is better than deleting the branch altogether. By default web interfaces could hide branches marked as terminated.

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 7:57 UTC (Wed) by marcH (subscriber, #57642) [Link]

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 10:41 UTC (Wed) by sandsmark (guest, #62172) [Link] (1 responses)

> Realize that reparenting a patch series (or making significant history modifications) changes the environment in which it was developed and, likely, invalidates much of the testing that was done. A reparented patch series should, as a general rule, be treated like new code and retested from the beginning.
> [...]
> A frequent cause of merge-window trouble is when Linus Torvalds is presented with a patch series that has clearly been reparented, often to a random commit, shortly before the pull request was sent. The chances of such a series having been adequately tested are relatively low — as are the chances of the pull request being acted upon.

Wouldn't this be true for merges done at "random" points as well?

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 12:37 UTC (Wed) by timokokk (subscriber, #52029) [Link]

If you mean that when Linux pulls your well tested branch, he is now creating yet another "random commit" that is no longer tested and guaranteed to work as well as the tip of your branch. That is absolutely correct. There can be some interaction with the rest of the commits merged and your branch that make them somehow incompatible to each other. But the merge commit itself is the discontinuity point that indicates the unknown incompatibility with your branch and the rest of the stuff that was merged elsewhere. It is then obvious, for example via "git bisect", that the issue was introduced between creating the topic branch and getting it merged.

If you were to rebase your commits before sending them for merging, then git bisect would indicate the commits itself as broken, but it wouldn't be obvious at all that they used to work before rebasing.

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 12:03 UTC (Wed) by randomguy3 (subscriber, #71063) [Link]

> Do not rebase a branch that contains history created by others. If you have pulled changes from another developer's repository, you are now a custodian of their history. You should not change it. With few exceptions, for example, a broken commit in a tree like this should be explicitly reverted rather than disappeared via history modification.

The last sentence of this paragraph is hard to parse - it's taken me about three re-reads to figure out what you meant by it. I think something like "for example, a broken commit in a tree like this should (with few exceptions) be explicitly reverted rather than disappeared via history modification." is easier to read.

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 13:36 UTC (Wed) by imMute (guest, #96323) [Link] (2 responses)

>If your upstream-bound branch has emptied entirely into the mainline during the merge window, you can pull it forward with a command like:
> git merge v5.2-rc1^0
>The ^0 will cause Git to do a fast-forward merge (which should be possible in this situation), thus avoiding the addition of a spurious merge commit.

This is surprising to me! I would have used "git merge --ff-only v5.2-rc1" so that if a fast-forward isn't possible, I'm alerted to that immediately.

Rebasing and merging in kernel repositories

Posted Jun 20, 2019 2:15 UTC (Thu) by draco (subscriber, #1792) [Link] (1 responses)

You're right to be surprised -- it's incorrect. From gitrevisions(7):

> As a special rule, <rev>^0 means the commit
> itself and is used when <rev> is the object name of a tag object
> that refers to a commit object.

Nothing there says anything about fast forward vs non-fast forward merges. Double-checking locally (just in case git's behavior changed on me...), it makes no difference whether I merge an annotated tag or the underlying commit hash -- I get fast forwards either way -- so the ^0 does nothing useful.

The text should say this instead:
>If your upstream-bound branch has emptied entirely into the mainline during the merge window, you can pull it forward with a command like:
> git merge --ff-only v5.2-rc1
>The --ff-only will tell Git to do a fast-forward merge (which should be possible in this situation), thus avoiding the addition of a spurious merge commit.
> If the command fails, there's a disagreement between your branch and the mainline.

Rebasing and merging in kernel repositories

Posted Jun 20, 2019 12:48 UTC (Thu) by corbet (editor, #1) [Link]

FWIW, the "^0" advice comes straight from Linus. Otherwise you can get a merge commit when fast-forwarding to a signed tag. --ff-only can also do the trick, but it's more typing...

Gerrit => squash, rebase and commit --amend ?

Posted Jun 19, 2019 14:17 UTC (Wed) by Tov (subscriber, #61080) [Link] (4 responses)

Apparently a lot of different work-flows are possible with git. As an almost exact opposite to the Linux workflow, I have tried to contribute to a project using Gerrit, which required the developers to submit one commit at a time! The project instead encourages liberal use of squash, rebase and use of commit --amend to groom these individual commits before merge.

I will hopefully not use Gerrit again, if I can avoid it...

Gerrit => squash, rebase and commit --amend ?

Posted Jun 19, 2019 16:14 UTC (Wed) by cpitrat (subscriber, #116459) [Link] (2 responses)

Are you sure this was a constraint coming from Gerrit? I never used it but it seems quite popular and I'm surprised so many people would accept such an annoying restriction.

Gerrit => squash, rebase and commit --amend ?

Posted Jun 19, 2019 16:29 UTC (Wed) by pizza (subscriber, #46) [Link] (1 responses)

Like a lot of other things with gerrit, it's configurable and largely up to the policies the admins set up.

Gerrit => squash, rebase and commit --amend ?

Posted Jun 19, 2019 19:03 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

AFAIK, there's still no branch overview and review is very much a commit-by-commit basis. Yes, it helps reduce the "fix whitespace" or "typo fix" junk commits by being more commit-focused, but, IMO, it goes too far and ends up being awful (e.g., Change-Id makes a nightmare for branches with shared history to both be in-flight at the same time) for more complicated changesets.

Gerrit => squash, rebase and commit --amend ?

Posted Jun 20, 2019 9:19 UTC (Thu) by jezuch (subscriber, #52988) [Link]

You can post a merge commit for review. A merge commit is a commit after all, though it needs to be treated specially and it's a little awkward in Gerrit. But I've done it and it seems to work.

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 17:37 UTC (Wed) by k8to (guest, #15413) [Link] (6 responses)

I find this whole topic a bit sad.

If git had a greater understanding of branches beyond "this is a hash", and/or could wrap commits into commit groups, then so much of this pain could just go away.

Not all of it,sometimes sophistication is necessary, but git makes everyone make manual decisions all the time without any clarity on the decisions being made, and forces many workflows to be destructive towards the commit history, which creates so many of these problems.

Git was a brilliant hack, but I feel over a decade in that it's time to enhance the data model a bit to provide better support for the common cases.

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 18:30 UTC (Wed) by pizza (subscriber, #46) [Link] (1 responses)

> forces many workflows to be destructive towards the commit history, which creates so many of these problems.

I find it peculiar this is somehow a flaw of git, when prior to git, it was generally impossible to implement those workflows within your revision control system.

Meanwhile, git doesn't force anyone to use history-destructive workflows; indeed, any sort of "lossy" operation requires explicit instructions from the user.

Every team that chooses to use history-destructive workflows that git enables does so because they found that it *improves* their overall productivity or code quality/debugability (eg each commit is more logically self-contained and can be individually regression-tested).

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 19:03 UTC (Wed) by k8to (guest, #15413) [Link]

Running a large project without squashing commits is really a mess. I don't think it's practicable to do otherwise, so yes rewritting history is largely mandated. Mercurial for example leaves this as pretty optional.

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 19:01 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (3 responses)

Are you proposing that branch names be stored within commit objects (like Mercurial)? If so, I really don't want to see that because there are millions of branches named "patch-1" or PRs coming from "master" that really deserve better names. Or for long-lived branches, splitting out smaller stepping stones into its own branch for merging on a faster timeline is very valuable.

If you're proposing a new type of object, I could get behind that as long as the branch surgery operations are possible (but, e.g., then require a new name for the set of commits).

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 19:08 UTC (Wed) by k8to (guest, #15413) [Link] (1 responses)

I would expect some kind of branch object to exist with a many-to-many relationship with commits which is used infrequently.

I would hope that this could be leveraged for goals like "What work has happened in this branch since it was created?" as well as the ability to set useful per-branch state on something smarter than the name.

As an aside, it's interesting that you refer to many poorly named branches. In my world I never create those. They're typically always something like <defectID>-<description> or <requirementID>-<description> Yeah, over the lifetime of the project they are, as an aggregate, noise, but I wouldn't ever "push" a branch with a stupid name. But it isn't hard to imagine that other projects are different.

Rebasing and merging in kernel repositories

Posted Jun 19, 2019 19:25 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

That sounds like a useful object. Not sure how often I'd use it, but it would be nice to associate stashes with branches as well instead of my standard "wipwipwip" commit message.

I work on projects mainly via with gitlab and github. Edits made via the web get those "patch-n" names and many people just commit directly to their fork's "master" and then open an MR. Once that's made, the branch name is locked in without closing it and opening a new one. We use a robot to do our merges (it does backporting to release branches and Acked-by/Reviewed-by trailer scraping for the merge commit message) and since we have that, we can tell it to merge it with a better branch name in the merge commit message too.

Rebasing and merging in kernel repositories

Posted Jun 22, 2019 6:02 UTC (Sat) by marcH (subscriber, #57642) [Link]

> Are you proposing that branch names be stored within commit objects (like Mercurial)? If so, I really don't want to see that because there are millions of branches named "patch-1" or PRs coming from "master" that really deserve better names.

There could be a distinction between private branch names versus branches names meant to published. In fact git tags already miss that.

Rebasing and merging in kernel repositories

Posted Jun 20, 2019 1:00 UTC (Thu) by jgg (subscriber, #55211) [Link] (1 responses)

On the topic of merges it would be useful to explain when Signed-off-by should be applied to the merge commit. I think it should absolutely be present if the merge was not done totally automatically, otherwise I've seen it randomly placed in merge commits..

Linus also gave me some advice on the diffstat - if you have a PR that contains other trees, already merged, the diffstat produced automatically by git is misleading. I now build the diffstats off the branch that contains the merge conflict resolution, so it looks closer to reality - ie Linus does not see a diff stat with a wack of drivers/net changes in RDMA's tree because he already merged that stuff via netdev's tree.

It would also be nice to have some words on merging downstream pull requests. For instance, if someone sends their work on rc5 because their work has dependencies on rc5, should a maintainer merge that into a tree based on rc1? Or should they first merge rc5 and then merge the pull request (I've been doing the former, mostly, as it results in less questionable merge commits)

The 'Changing the parent commit' is also a bit weird. I fully agree with the concept that we should merge the patch as it was tested - however in many trees patches come from the mailing list via 'git am', so they have 'changed the parent commit'. People regularly re-parent and respin mailing list applied patches so they apply cleanly. Standard practice.

But people working with a PR work flow should generally not reparent this, except I've seen some maintainers ask for PRs to apply without conflicts. Maybe this documentation will help make things more uniform.

Rebasing and merging in kernel repositories

Posted Jun 20, 2019 12:17 UTC (Thu) by jnareb (subscriber, #46500) [Link]

Note that thanks to `git rebase --exec` you can run at least "does it compile" test for each rebased commit, automatically (note however that it makes Git automatically use "interactive rebase" machinery).

Rebasing and merging in kernel repositories

Posted Jun 22, 2019 6:14 UTC (Sat) by marcH (subscriber, #57642) [Link]

Very useful document, thanks!

There are basically two classes of problems I've met in my unofficial and too frequent git support role:
1. The users' lack of computer science knowledge or lack of interest for graphs and other concepts required by git. As clearly explained in "10 things I hate about git", git makes the complex possible (love it) and the simple... not easier. Asking git questions in interviews is great because it's like git was designed to block rote learning.
2. The lack of thought that was put in the lack of user interface. See for instance "git reset demystified" or "git help checkout"

2. is exactly why words like "reparenting" (nice!) show up in mitigation, sorry I mean in documentation like this. From a plain English perspective there should be no conceptual difference between "rebasing" versus "reparenting", there's a difference only because "git rebase" is yet another misnamed command. It should have been called "git replay". Or split into the two commands "git rewrite [-i]" and "git reparent" that you described. Yes these two commands would be just variants sharing most of their implementations - who cares.

"replay" and "rewrite" aside, the word I recommend you use is "volatile" (vs final) for a branch or SHA1. I haven't seen yet a better word to qualify what you refer to in large parts of this draft.

Another useful thing to know and maybe worth mentioning: git cherry-pick accept ranges (on --stdin even) which *for some cases* provides an alternative more convenient than "git reparent". It's also a better way to "test merge" because it divides and conquers conflicts.

> The chances of such a series having been adequately tested are relatively low

a.k.a. "rebasing is lying". You could say that amending commits and rewriting history in private is OK (and recommended) because you're only lying to yourself. But don't lie publicly.

Rebasing and merging in kernel repositories

Posted Jun 26, 2019 17:08 UTC (Wed) by marcH (subscriber, #57642) [Link]

Just FYI, github tries to avoid these issues by discouraging volatile commits, having very poor support for force pushes and favoring a "fixup + squash" workflow instead. More details in https://github.com/zephyrproject-rtos/zephyr/pull/14444

(Just a datapoint; not an "endorsement")


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds