Git evolve: tracking changes to changes

By Jonathan Corbet
November 11, 2022
The Git source-code management system exists to track changes to a set of files; the stream of commits in a Git repository reflects the change history of those files. What is seen in Git, though, is the final form of those commits; the changes that the patches themselves went through on their way toward acceptance are not shown there. That history can have value, especially while changes are still under consideration. The proposed git evolve subcommand is a recognition that changes themselves go through changes and that this process might benefit from tooling support.

Some patches are applied to a project's repository soon after being written, but others take more work. Consider, for example, support for stackable security modules, which has been through (at least) 38 revisions over many years. If and when this work lands in the Linux kernel mainline, it will bear little resemblance to what was initially posted years ago. Each revision will have undergone changes that will have rippled through much of the 39-part patch set. Git can support iteration on a series like that, but it can be a bit awkward, leading many developers to use other tools (such as Quilt) to manage in-progress work.

Commits, meta-commits, and changes

The proposed evolve functionality for Git adds a new level of tracking for "meta-commits", which can be thought of as versions of a specific commit. The meta-commits describing the history of a given commit are stored in a special branch that is called a "change". The documentation tosses the "meta-commit" and "change" terms around almost as if they were interchangeable and, for the most part, they can be thought of as the same. Meta-commits simply hold the history of a change — the evolution that a given commit has gone through over time.

Consider an extended example: if a developer does some work and commits it, the result will be the new commit itself (identified by its hash — we'll call it A in this case) and a meta-commit stored in a new change branch with a name like metas/mc1 (the naming of changes is a subject of its own, with the obligatory hook so that users can add scripts to generate their own names). The result is a structure that, given the limits of your editor's diagramming skills, can be represented as:

[The first commit]

Here we see the new commit A on the trunk branch in the local repository; the change branch metas/mc1 contains a meta-commit with reference to the hash of that commit.

Now imagine that this commit, like many, is not perfect in its initial form; it will need to be improved. If, later on, this developer uses a command like git commit --amend to change this commit, Git will update the metas/mc1 change to refer to the hash of the amended commit (B here), but also to note that this commit "obsoletes" commit A:

[The first commit, amended]

The old commit A will remain in the repository and can be consulted if, later, somebody wants to see what changed between A and B.
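
For readers who think better in code, the bookkeeping described so far can be sketched as a toy model. This is not taken from the patches; the MetaCommit and Repo classes and their fields are hypothetical names invented for illustration, and plain strings stand in for commit hashes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetaCommit:
    """One version of a change (hypothetical model, not Git's format)."""
    content: str                      # hash of the commit this version refers to
    obsoletes: tuple[str, ...] = ()   # hashes of the commits it replaced


@dataclass
class Repo:
    # change name (such as "metas/mc1") -> its meta-commits, oldest first
    changes: dict[str, list[MetaCommit]] = field(default_factory=dict)

    def commit(self, change: str, new_hash: str) -> None:
        """Record a brand-new commit under a new change name."""
        self.changes[change] = [MetaCommit(content=new_hash)]

    def amend(self, change: str, new_hash: str) -> None:
        """Record an amended commit; the previous one becomes obsolete."""
        old = self.changes[change][-1].content
        self.changes[change].append(MetaCommit(new_hash, (old,)))


repo = Repo()
repo.commit("metas/mc1", "A")   # the first commit
repo.amend("metas/mc1", "B")    # the amended commit obsoletes A
```

The change name stays fixed while the commit it refers to moves from A to B, and the older version remains reachable through the obsoletes field.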

If our developer adds another commit C (without --amend) to the same branch, the result will be another change branch, call it metas/mc2, referring to this new commit:

[A second commit]

A large patch series will thus have a number of active change branches, one for each commit in the series. Notably, the mechanism described above ensures that the change name for each commit in the series remains stable, even as the commits themselves are changed. The first commit in the series is metas/mc1, even as that commit itself evolves over time and its hash changes. There is a set of commands to list the known changes, and a simple git reset or git checkout command can be used to reset the branch to a given change.

Now suppose that commit B needs further changes; our developer has inexplicably forgotten to use reverse Christmas-tree ordering for their variable declarations and has been called out on it. They can use git reset to go back to that commit — the one described by metas/mc1 — to fix this unacceptable state of affairs. A bit of editing and a new git commit with --amend will yield a new commit D, and metas/mc1 will be updated to reflect the fact that commit B has been obsoleted.

[Amending the first commit]

The first commit in the series has been updated, but now our second commit in the series (C), the one described by metas/mc2, still has the old commit B as its parent, so the sequence has been split. If the developer now runs git evolve, though, all changes that were based on metas/mc1 (in any version) will be rebased, recreating the full change history.

[After git evolve]

The commit formerly known as C has been rebased on top of D, restoring the full patch series. The git evolve command can also be used to update a set of changes to a new base in the repository — rebasing all of the changes to reflect changes merged elsewhere.
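
The rebasing step can be illustrated with a small sketch, assuming a drastically simplified model in which each commit has one parent and obsolescence is a simple old-to-new mapping. The evolve() and latest() functions here are hypothetical; they show the idea, not the algorithm in the proposed patches, and a trailing apostrophe stands in for the new hash that a real rebase would produce.

```python
from typing import Optional


def latest(commit: str, successor: dict[str, str]) -> str:
    """Follow obsolescence links until reaching a commit nothing has replaced."""
    while commit in successor:
        commit = successor[commit]
    return commit


def evolve(parents: dict[str, Optional[str]], successor: dict[str, str]) -> None:
    """Rebase every live commit whose parent has been obsoleted (in place)."""
    changed = True
    while changed:
        changed = False
        for commit, parent in list(parents.items()):
            if commit in successor or parent not in successor:
                continue  # commit is itself obsolete, or its parent is current
            rebased = commit + "'"                     # stand-in for the new hash
            parents[rebased] = latest(parent, successor)
            successor[commit] = rebased                # the old commit is now obsolete
            changed = True


# The state from the example: A was amended to B, B to D; C still sits on B.
parents = {"A": "base", "B": "base", "D": "base", "C": "B"}
successor = {"A": "B", "B": "D"}
evolve(parents, successor)
```

After the call, the rebased copy of C has D as its parent, and the original C is recorded as obsolete in the same way that A and B were.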

More than rebase

Thus, git evolve can be used somewhat like git rebase, but there are some differences. Perhaps most significant is that commits can be modified in various places in the stream, then all evolved together at the end. A developer can, for example, make changes to patches 3, 7, and 9 of a 12-part series, each isolated from the other, then use git evolve to stitch the sequence back together at some future time.

Another difference is that the change history might not be strictly linear. As a simple example, imagine a repository with a single commit; the developer could amend that commit to create a new change, like the commit B shown above. If, then, our developer uses git reset to get back to the pre-amend commit (A) and amends it again, there will now be two changes, each of which obsoletes commit A. The documentation calls this "divergence"; the change history for a patch series can contain any number of divergences and changes built upon them. A divergence could be caused by trying alternative fixes for a problem, for example.

Git will be able to track that divergence indefinitely, but there will come a point when things need to be resolved. For example, if the developer runs git evolve, Git will need to know how to resolve the divergence so that it can rebase the rest of the series. The usual resolution at that point is to do a merge of the diverging changes, but it is also possible to simply pick one side.
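
Spotting a divergence amounts to finding a commit that has been obsoleted by more than one successor. A minimal sketch, assuming the obsolescence records are available as (new, old) pairs; that representation and the find_divergences() name are inventions for this example rather than anything in the proposal:

```python
from collections import defaultdict


def find_divergences(obsolescence: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return each commit that has been replaced by more than one successor."""
    successors: dict[str, list[str]] = defaultdict(list)
    for new, old in obsolescence:
        successors[old].append(new)
    return {old: news for old, news in successors.items() if len(news) > 1}


# A was amended twice (yielding B and B2); B was later amended into D.
records = [("B", "A"), ("B2", "A"), ("D", "B")]
print(find_divergences(records))  # {'A': ['B', 'B2']}
```

Resolving the divergence would then mean either merging the competing successors or declaring one of them the winner, as described above.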

Since changes are Git branches in their own right, they can be pushed and pulled between repositories. So developers can share the current state of their work — and how it got to that state — with other developers or with some sort of change-tracking system. Anybody who can access a change can review the various versions of the patch and see the direction in which the work is heading.

Finally, changes are ephemeral, in that they are really only relevant until the work they describe is finalized and committed to a trunk branch. At that point, the change is presumably perfect and the story of how it got to its current state is no longer of interest. So, whenever a git evolve command sees that the commit described by a change has been merged, it will automatically delete the change itself. As a result, a developer's set of active changes will normally reflect the work that is actually in progress at any given time.
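
The cleanup rule is simple enough to state in a few lines. The sketch below assumes that "merged" can be tested by checking whether a change's current commit appears in the set of commits on the trunk; the real implementation may well decide this differently, and prune_merged() is a hypothetical name.

```python
def prune_merged(changes: dict[str, str], trunk: set[str]) -> dict[str, str]:
    """Keep only the changes whose current commit is not yet on the trunk."""
    return {name: commit for name, commit in changes.items() if commit not in trunk}


# metas/mc1 (commit D) has been merged; metas/mc2 (commit C') is still in progress.
changes = {"metas/mc1": "D", "metas/mc2": "C'"}
trunk = {"base", "D"}
active = prune_merged(changes, trunk)
```

Here only metas/mc2 survives, since the commit behind metas/mc1 is already on the trunk.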

An evolving story

The above description was mostly taken from this document describing the proposed feature. The document is thorough and detailed, but a bit challenging. Your editor only had to read it a dozen times or so, though, to get a superficial understanding of what is going on.

The git evolve patches are not new; indeed, they have been through a fair amount of evolution themselves. An initial design for the feature was posted by Stefan Xenos in late 2018, and the first implementation patches came out in January 2019. The most recent version of these patches, as of this writing, was posted by Christophe Poucet in early October. There has been interest in the patches over the years, but the complexity of the feature also arguably makes it hard for others to properly review.

As a result, it is still not clear whether git evolve will find its way into the Git mainline or not. There are some clear use cases for git evolve, and each version of the patch set has evoked active discussion. Whether the benefits of the feature justify the added complexity will be something for the Git maintainers to evaluate, though. If the evolve functionality can clear the bar, it could enhance Git with features that developers currently must seek in other tools. Some more complexity evolved into Git here might thus simplify life overall.



Git evolve: tracking changes to changes

Posted Nov 11, 2022 16:27 UTC (Fri) by magfr (subscriber, #16052) [Link] (1 responses)

What is the difference between this and feature branches?

Differences from feature branches

Posted Nov 11, 2022 16:31 UTC (Fri) by corbet (editor, #1) [Link]

The new functionality will track the changes in a feature branch over time, not just the state of the branch itself. It is also intended to make it easier to change commits in the series that makes up the feature branch then stitch the results back together into a coherent series again.

Git evolve: tracking changes to changes

Posted Nov 11, 2022 16:51 UTC (Fri) by willy (subscriber, #9762) [Link] (3 responses)

Hard to git send-email (I imagine?) Maybe this will finally be the impetus to move away from antediluvian patch review processes.

Git evolve: tracking changes to changes

Posted Nov 13, 2022 8:49 UTC (Sun) by milesrout (subscriber, #126894) [Link] (1 responses)

Typical unproductive Reddit/HN-level comment. "Old = bad, me very smart". Just don't.

Git evolve: tracking changes to changes

Posted Nov 14, 2022 0:28 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

I'm pretty sure "willy" is intimately familiar with how the Linux kernel patch submission process works. Hell, his patchsets have had numerous whole LWN articles written about them. His opinions on it are well-informed at least by experience in my book.

Git evolve: tracking changes to changes

Posted Nov 15, 2022 10:25 UTC (Tue) by geert (subscriber, #98403) [Link]

Do you want to send all of this using git send-email?
For submission (assuming the Linux kernel), you only want to send the latest version, with all the metas/mcX metadata converted into human-readable form, and inserted below the "---" lines of the corresponding patch.

Git evolve: tracking changes to changes

Posted Nov 11, 2022 16:58 UTC (Fri) by josh (subscriber, #17465) [Link] (1 responses)

I'm surprised this tracks each *individual* commit separately, rather than tracking the history of a whole series of patches. It's not too hard a problem to look at a 9-patch PATCHv1 series, and a 11-patch PATCHv2 series, and figure out which patches in v2 correspond to which ones in v1, including across reordering. There are several versions of that code, including in upstream git.

Git evolve: tracking changes to changes

Posted Nov 11, 2022 17:48 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

I'm not sure if they did it that way because "that's how Mercurial does it"* or because it's meant to replace the "squash before pushing" pattern (i.e. instead of building a stack of patches that you're just going to squash, you just build one patch and repeatedly amend it - the meta-history is all preserved, so you don't lose anything by doing this).

* The system described in this story is nearly identical to Mercurial's changeset obsolescence feature, except that Mercurial doesn't bother with change branches, and just draws arrows directly between A and B instead.

Git evolve: tracking changes to changes

Posted Nov 11, 2022 18:00 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (1 responses)

At Google, we (well, the more adventurous of us, anyway) use the Mercurial equivalent of this functionality. It generally works quite well, as long as you understand what it is doing and how obsolescence works, and/or stay entirely on the beaten trail and avoid doing complicated things. However, there is a bit of an impedance mismatch between typical Git workflows (i.e. "I just have to get my branch to point at a commit containing the right data, and then I'm done") and typical changeset obsolescence workflows (i.e. "I really would like to preserve the meta-history"). This is usually more of a problem for advanced Git users than for beginners, as the latter are mostly just using the amend command. But if your workflow is rife with weird git reset commands and such, then you might have to think about changeset obsolescence and preserving meta-history, which some Git users probably won't like.

On the plus side, it's really not *that* complicated, and if you already understand Git's data model, then you're most of the way to understanding meta-history (it's just another DAG).

Git evolve: tracking changes to changes

Posted Nov 12, 2022 9:12 UTC (Sat) by Sesse (subscriber, #53779) [Link]

I remember using quilt to do this in Google around… 2008? =) With q4. (Which now is long gone, I believe.)

The most exciting part of this for me is probably the fact that you don't need git rebase to delve into a patch stack anymore. It always felt like a hack (I'm not rebasing anything, I just want to go down here to make a change!), and is too dependent on the current state of the repository not to mess things up. (Not to mention, if you get conflicts during rebase, you have to be careful _not_ to use --amend, because then you suddenly collapsed two patches.) --fixup is possible (and for some strange reason not default) but only if your top patches don't make large differences.

Git evolve: tracking changes to changes

Posted Nov 11, 2022 19:25 UTC (Fri) by flussence (guest, #85566) [Link] (3 responses)

This sounds like a *huge* improvement to the status quo.

I often have to deal with (other people's) long-term divergent forks of still-living upstream codebases, and usually the worst part is this issue of rebasing one forest on top of another — whereupon they either put off the work until there's an unavoidable flag day somewhere down the road, or they give up trying entirely at which point it either becomes a hard fork or risks sinking the entire effort for good. Hopefully things like that are about to become much easier.

Git evolve: tracking changes to changes

Posted Nov 11, 2022 20:43 UTC (Fri) by mb (subscriber, #50428) [Link] (2 responses)

I don't see how this improves the worst part of this process:
Merge conflicts.

Resolving conflicts is really 95% of the work.
Whether I have to enter 5 git commands or 1 doesn't really matter.

Git evolve: tracking changes to changes

Posted Nov 12, 2022 3:18 UTC (Sat) by pabs (subscriber, #43278) [Link]

I rebase long-running feature branches incrementally, dealing with conflicts at the commits where they occur, using git-imerge. Unfortunately this requires a lot of CPU even though it uses bisection to reduce that. There are also mergify and git-mergify-rebase for doing this, but I haven't tried those yet.

https://github.com/mhagger/git-imerge
https://github.com/brooksdavis/mergify
https://github.com/CTSRD-CHERI/git-mergify-rebase

Git evolve: tracking changes to changes

Posted Nov 12, 2022 19:13 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

Have you ever done git push --force and then regretted it? One major goal of changeset obsolescence is to provide a "safe" (or at least safer) alternative to that workflow, and also to make it reasonably practical to overwrite branches that other people are actively working on (git evolve reads the obsolescence information out of the meta-history, and automatically figures out which things need to be rebased or merged to fix up everybody's history). That means you can have multiple developers working on a series of patches, with all the tooling of Git, without having to coördinate their force-pushes or recreate the history in a "clean" state at the end.

Git evolve: tracking changes to changes

Posted Nov 11, 2022 21:49 UTC (Fri) by JamesGuthrie (guest, #161591) [Link] (2 responses)

This reminds me of gerrit, which I became fond of using, once I grokked it. Only being superficially familiar with how gerrit works, it would be interesting to know how similar/different this is.

Git evolve: tracking changes to changes

Posted Nov 14, 2022 10:13 UTC (Mon) by kleptog (subscriber, #1183) [Link] (1 responses)

One of the nice things about the Gerrit approach is that it can also track patches across multiple released versions. So for any particular patch you can immediately see to which releases/branches it has been cherry-picked. I don't know if git evolve solves that problem. It works transparently through cherry-picks and rebases without any explicit support in Git itself. Questions like "to which releases has the following patch been cherry-picked" can easily and reliably be answered from the command line.

Unfortunately, the Gerrit approach has been vehemently rejected by kernel developers. I hope this git evolve can eventually offer the same functionality. Although the documentation accompanying the commit suggests it could replace the Gerrit change-id footer, since it also says the change information is thrown away after merge, I don't think that will happen.

Git evolve: tracking changes to changes

Posted Nov 14, 2022 18:30 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

The way we do this is that, when backporting, you base your topic off of the *oldest* target branch. When merging, we then merge into all relevant branches. Ideally, this allows one to use a single MR and keep things correlated that way. If there are conflicts because branches change over time, you can resolve them on that MR and say "HEAD^2 goes to backport-target" and then that commit goes there while the HEAD always goes to the "newest" branch. If backporting is noticed after merging (or conflicts prove to be too complicated to juggle), cross-referenced MRs with `git cherry-pick -x` to incorporate the in-history link are preferred.

Git evolve: tracking changes to changes

Posted Nov 12, 2022 5:04 UTC (Sat) by eean (subscriber, #50420) [Link] (2 responses)

It's hard to imagine this being useful without tools using it being built on top. The linked document describes some of those. Like having a history of a Gerrit change over time merged into the actual Git repo sounds neat. Otherwise I wonder how much mental load these features will take to execute day-to-day.

Git evolve: tracking changes to changes

Posted Dec 12, 2022 8:02 UTC (Mon) by luismbo (guest, #162653) [Link] (1 responses)

FWIW, Gerrit already stores change histories within the Git repo itself. Git doesn’t download those refs by default and it doesn’t provide a convenient way to browse that history, but Gerrit’s UI displays git commands for fetching specific versions of a change.

Git evolve: tracking changes to changes

Posted Dec 12, 2022 11:21 UTC (Mon) by kleptog (subscriber, #1183) [Link]

In recent versions, Gerrit stores *everything* in Git. Changes, reviews, buildbot verifications, user information, comments (draft and published). There's an external Lucene index for searching but that's it. Actually kinda neat, since you can extract all the activity in a project with just Git commands. It does result in tens of thousands of refs for large projects but git can handle that just fine.

Stupid content tracker indeed.

Orientation of arrows

Posted Nov 12, 2022 9:01 UTC (Sat) by Jandar (subscriber, #85683) [Link]

I find the diagrams hard to parse. I assume the arrow metas/mc1 -> A is in the correct orientation but the arrow B -> C seems wrong. A new commit points to the old commit, not the other way around. So some arrows need to be mentally flipped while reading, and some do not.

Git evolve: tracking changes to changes

Posted Nov 12, 2022 17:16 UTC (Sat) by cyperpunks (subscriber, #39406) [Link] (7 responses)

I am not sure, but this workflow seems to mirror the current workflow with a merge request and "squash commits on push to trunk" frequently used in GitLab.

In a GitLab merge request, all commits during a review process will be available in the merge request forever, while on trunk, only a single squashed commit is visible.

Having such workflow available in native git would be very valuable.

Git evolve: tracking changes to changes

Posted Nov 12, 2022 18:22 UTC (Sat) by Sesse (subscriber, #53779) [Link] (3 responses)

git merge --squash? Not that it sounds similar to me at all; the very point of such a workflow is _avoiding_ to have to squash.

Git evolve: tracking changes to changes

Posted Nov 12, 2022 23:53 UTC (Sat) by eean (subscriber, #50420) [Link]

it allows you to do a squash or repeated --amends while retaining the change history. Because such tooling is pretty common it's easy to imagine that part being straightforward. but a year later and you want to dig into the history of a line of code, how useful or available is all this history.

Git evolve: tracking changes to changes

Posted Nov 13, 2022 8:49 UTC (Sun) by cyperpunks (subscriber, #39406) [Link] (1 responses)

It's similar because you get two sets of history lines. On trunk you have one large commit; this is good, as there are several consumers of a git repository and most of them are read-only: qa, release eng, docs team, support etc. For developers the full history with all commits is of interest; in gitlab this history is available in the merge request, which is very useful. Without "git evolve" I don't see how to make that happen in native git?

Git evolve: tracking changes to changes

Posted Nov 14, 2022 0:30 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

Why is `git log --first-parent` not sufficient for the teams that you listed there and developers shunted off into "figure out what MRs matter" and fetching them manually?

Git evolve: tracking changes to changes

Posted Dec 9, 2022 10:04 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (2 responses)

Are you sure gitlab forces users to do this?

I've used it (not so often) but I don't recall anything of sorts, just the possibility to do this.

Git evolve: tracking changes to changes

Posted Dec 9, 2022 13:30 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

There are project-level options to force squashing, but it is by no means a "GitLab must do this" kind of thing.

Git evolve: tracking changes to changes

Posted Dec 9, 2022 13:32 UTC (Fri) by anselm (subscriber, #2796) [Link]

In Gitlab, squash-on-merge is entirely optional.

Git evolve: tracking changes to changes

Posted Nov 14, 2022 11:55 UTC (Mon) by make (subscriber, #62794) [Link]

Is this similar to stgit (https://stacked-git.github.io/)?
stgit is one of those tools I cannot imagine living without.

Git evolve: tracking changes to changes

Posted Nov 14, 2022 17:32 UTC (Mon) by NTmatter (subscriber, #118709) [Link] (2 responses)

This seems a bit reminiscent of Fossil's approach to recording history. It tries to retain all of the "work" that went in to your current state, even if it isn't included in the final product and was ultimately discarded or erroneous.

A bit of the relevant discussion is available here: https://www2.fossil-scm.org/home/doc/trunk/www/fossil-v-g...

Git evolve: tracking changes to changes

Posted Nov 14, 2022 22:43 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

At least in Mercurial, obsolete changesets that have only obsolete descendants are considered "hidden" and (mostly) excluded from pushing and pulling. The effect of this is that new clones never receive hidden changesets, and only get the final, "clean" history. If there is no centralized or otherwise long-lived repository, then old hidden commits may be lost entirely. To my understanding, Fossil does not work that way: Old commits are considered an integral part of the repository, and you can't just pretend they don't exist. However, I am far from an expert on Fossil, so I could be entirely wrong about that.

Git evolve: tracking changes to changes

Posted Nov 15, 2022 15:22 UTC (Tue) by gracinet (guest, #89400) [Link]

Hi, Mercurial developer here

> At least in Mercurial, obsolete changesets that have only obsolete descendants are considered "hidden" and (mostly) excluded from pushing and pulling. The effect of this is that new clones never receive hidden changesets, and only get the final, "clean" history.

Yes, not exchanging obsolete changesets is an important design feature. On the other hand, assuming standard use of the Evolve extension, the obsolescence information (obsmarkers) is still exchanged. This helps with keeping consistency in situations where there is no obvious central authority.

That being said, I'm not familiar with your setup at Google. Mercurial being highly configurable, it may be quite different to what I'm used to.

> If there is no centralized or otherwise long-lived repository, then old hidden commits may be lost entirely.

Assuming you mean if the only repository that has them disappears or gets forgotten, this is certainly true. I don't know much about git-evolve (first heard of it through this article), the natural thing to compare would be with plain Git where obsolete commits (not ancestor of any ref) could disappear *from* the original repository (unless GC is disabled).

> To my understanding, Fossil does not work that way: Old commits are considered an integral part of the repository, and you can't just pretend they don't exist. However, I am far from an expert on Fossil, so I could be entirely wrong about that.

Same here

Git evolve: tracking changes to changes

Posted Nov 15, 2022 8:48 UTC (Tue) by taladar (subscriber, #68407) [Link] (1 responses)

This seems very commit focussed. Wouldn't it make more sense to track the evolution of a topic branch relative to the branch it is based on. Having to keep the number of commits as well as the responsibility of each commit the same feels very limiting.

Git evolve: tracking changes to changes

Posted Nov 15, 2022 22:42 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

> Having to keep the number of commits as well as the responsibility of each commit the same feels very limiting.

Mercurial's version of this functionality has no such limitation - you can change the description of a commit, and a commit can have more than one successor or more than one precursor. This is used to represent squash ("fold") and un-squash ("split") operations. You can also change the description as part of an amend, which I can only assume that Git will support, just because it's so straightforward.


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds