Taking just the diff

Posted Jan 17, 2025 9:04 UTC (Fri) by epa (subscriber, #39769)
In reply to: Issue IDs by ewen
Parent article: The many names of commit 55039832f98c

Maybe git needs the concept of a ‘disembodied commit’ that stores a diff. Its SHA would be the SHA of the text diff, as you say, but such a commit has no ancestor. If the diff applies cleanly, you can ‘merge’ the disembodied commit into your branch, creating a merge commit whose parents are the previous state of your branch and the disembodied commit. Unlike cherry-picking, this would keep the same SHA, making it easier to see which branches have received a fix.
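To make the difference concrete, here is a toy Python sketch of why cherry-picks get new IDs while a disembodied commit would not. It is a simplification under stated assumptions: real commit objects also contain author and committer lines, the tree and parent values below are placeholder strings rather than real hashes, and the "disembodied" object type is hypothetical (git has no such type).

```python
import hashlib

def git_style_hash(kind: str, body: str) -> str:
    # Git hashes "<type> <size>\0<content>", where size is the byte length of content.
    data = body.encode("utf-8")
    return hashlib.sha1(f"{kind} {len(data)}\0".encode("utf-8") + data).hexdigest()

def commit_id(tree: str, parents: list, message: str) -> str:
    # Simplified commit object: real commits also carry author and committer lines.
    body = f"tree {tree}\n" + "".join(f"parent {p}\n" for p in parents) + f"\n{message}\n"
    return git_style_hash("commit", body)

def disembodied_id(diff_text: str) -> str:
    # Hypothetical object type: no tree and no parents, only the diff text.
    return git_style_hash("disembodied", diff_text)

fix = "--- a/foo.c\n+++ b/foo.c\n@@ -1 +1 @@\n-x = 1;\n+x = 2;\n"

# The same fix cherry-picked onto two branches. Even with an identical
# resulting tree, the different parent alone changes the commit ID.
on_main = commit_id("TREE", ["MAIN_TIP"], "Fix x")
on_stable = commit_id("TREE", ["STABLE_TIP"], "Fix x")

print(on_main == on_stable)  # False
print(disembodied_id(fix))   # one ID, whichever branch the fix lands on
```

Because the parent ID is part of what gets hashed, any commit recreated on a different base necessarily gets a new ID; an object with no parents avoids that by construction.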

Taking just the diff

Posted Jan 17, 2025 11:58 UTC (Fri) by TomH (subscriber, #56149)

Git already has git patch-id, which does essentially that: it computes an ID for a patch. I believe its primary use at present is internal, identifying when two commits are essentially the same to help avoid unnecessary conflicts.
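For illustration, a rough Python approximation of the idea behind git patch-id: hash the diff with line numbers and whitespace removed, so the same change gets the same ID wherever it lands. This is an assumption-laden sketch, not git's algorithm, and its output will not match git's; for real use, feed a diff to the tool itself, e.g. git show <commit> | git patch-id --stable.

```python
import hashlib
import re

# Matches the numeric part of a unified-diff hunk header, e.g. "@@ -10,2 +10,2 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")

def approx_patch_id(diff_text: str) -> str:
    """Rough imitation of git patch-id; does NOT reproduce git's output."""
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("index "):
            continue  # blob hashes depend on the surrounding file, so skip them
        if HUNK_HEADER.match(line):
            line = "@@"  # drop line numbers so a hunk that moved still matches
        h.update("".join(line.split()).encode("utf-8"))  # ignore all whitespace
        h.update(b"\n")
    return h.hexdigest()

original = """\
--- a/foo.c
+++ b/foo.c
@@ -10,2 +10,2 @@
 int x;
-x = 1;
+x = 2;
"""
# Same change, but the hunk sits 32 lines lower on another branch.
moved = original.replace("@@ -10,2 +10,2 @@", "@@ -42,2 +42,2 @@")

print(approx_patch_id(original) == approx_patch_id(moved))  # True
```

Any change to the actual content lines (as opposed to their position or spacing) produces a different ID, which is why adjusted cherry-picks stop matching.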

Taking just the diff

Posted Jan 17, 2025 13:46 UTC (Fri) by epa (subscriber, #39769)

So I guess it needs better tooling, so that you can ask "has this patch gone into my branch?" just as easily as "is this commit an ancestor of my branch?".

Taking just the diff

Posted Jan 17, 2025 13:54 UTC (Fri) by geert (subscriber, #98403)

The former is much more resource-intensive than the latter.
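One way to see the asymmetry is a toy Python model. The ancestry question only follows parent links, roughly what git merge-base --is-ancestor answers; the patch question needs a diff hashed for every commit it visits, roughly what git cherry and git log --cherry-pick do with patch IDs. The history, the helper names and the crude whitespace-insensitive hash are all hypothetical; in real git the diffs are not stored and would first have to be computed by comparing trees, which is where most of the extra cost would come from.

```python
import hashlib
from collections import deque

# Hypothetical toy history: commit ID -> (parent IDs, diff text).
# Real git stores snapshots, not diffs.
HISTORY = {
    "c1": ([], "+int x;\n"),
    "c2": (["c1"], "-int x;\n+long x;\n"),
    "c3": (["c2"], "+x = 2;\n"),
}

def walk(history, tip):
    """Yield every commit reachable from tip, following parent links."""
    seen, queue = set(), deque([tip])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        yield commit
        queue.extend(history[commit][0])

def is_ancestor(history, candidate, tip):
    # Only parent pointers are read; no file contents are touched.
    return any(c == candidate for c in walk(history, tip))

def patch_key(diff_text):
    # Whitespace-insensitive hash of a diff (a crude stand-in for git patch-id).
    return hashlib.sha1("".join(diff_text.split()).encode("utf-8")).hexdigest()

def patch_in_branch(history, diff_text, tip):
    # Every commit visited needs its diff (in git: a tree comparison) hashed.
    wanted = patch_key(diff_text)
    diffs_hashed = 0
    for c in walk(history, tip):
        diffs_hashed += 1
        if patch_key(history[c][1]) == wanted:
            return True, diffs_hashed
    return False, diffs_hashed

print(is_ancestor(HISTORY, "c1", "c3"))               # True
print(patch_in_branch(HISTORY, "+x = 2;\n", "c3"))    # (True, 1)
print(patch_in_branch(HISTORY, "+x = 3;\n", "c3"))    # (False, 3)
```

A miss is the expensive case: it has to hash a diff for every reachable commit before it can answer.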

Taking just the diff

Posted Jan 17, 2025 18:16 UTC (Fri) by k3ninho (subscriber, #50375)

Am I missing something? "Has this patch gone into my branch?" is "Are the changes in this patch in place?"

The diff records the resulting line numbers and the expected text, so you can test "does this line match this patch?" pretty cheaply. If the line number doesn't match up, the unified diff gives you three lines of context to look for, so you can search before and after the expected position for them. Maybe the lines get split up, but in that case no tool can work out the intent.

K3n.
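The content check described above can be sketched in Python as follows, under simplifying assumptions: a single hunk, file contents already split into lines, and a hypothetical search window of 50 lines around the position recorded in the hunk header. It looks for the hunk's post-image (context plus added lines) at the expected position first, then progressively further away.

```python
def post_image(hunk_lines):
    # Lines that should exist after the patch: context (" ") and additions ("+").
    return [line[1:] for line in hunk_lines if line[:1] in (" ", "+")]

def hunk_present(file_lines, hunk_lines, expected_start, window=50):
    """expected_start: 1-based start of the hunk's "+" range in its @@ header."""
    want = post_image(hunk_lines)
    n = len(want)
    # Try the recorded position first, then widen the search in both directions.
    for offset in range(window + 1):
        for start in (expected_start - 1 - offset, expected_start - 1 + offset):
            if 0 <= start <= len(file_lines) - n and file_lines[start:start + n] == want:
                return True
    return False

hunk = [" int x;", "-x = 1;", "+x = 2;", " return x;"]
patched = ["#include <stdio.h>", "int x;", "x = 2;", "return x;"]
shifted = ["// comment"] * 5 + patched  # same code, five lines further down
unpatched = ["#include <stdio.h>", "int x;", "x = 1;", "return x;"]

print(hunk_present(patched, hunk, 2))    # True
print(hunk_present(shifted, hunk, 2))    # True
print(hunk_present(unpatched, hunk, 2))  # False
```

This answers "is the change present right now?", so a later revert or an edit to one of the matched lines makes it return False even if the patch was once applied.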

Taking just the diff

Posted Jan 17, 2025 18:47 UTC (Fri) by mathstuf (subscriber, #69389)

How do you deal with reverts or subsequent edits to the line?

Taking just the diff

Posted Jan 18, 2025 8:41 UTC (Sat) by epa (subscriber, #39769)

I was thinking that if you applied a given patch, but then made later changes that reverted it or rearranged or even deleted the code, it would still count as having “gone into” that branch. This is similar to checking whether a commit is an ancestor of your branch — it either is or isn’t, for all that later changes might have modified the effect of that commit.

Taking just the diff

Posted Jan 19, 2025 1:59 UTC (Sun) by Heretic_Blacksheep (guest, #169992)

I'd rather have a family tree that takes a little time to (automatically) trace across all possible branches than no family tree and no easy way to track changes at all. Computers are supposed to do the work for humans with useful and accurate results, not to increase the manual cognitive load and the time needed to sort through disparate, not-quite-linked identifiers.

A single unique ancestral ID, assigned strictly to committed content, to which all child and cousin IDs (including changes to the ancestor) are hard-linked, is preferable to systems that can generate confusion about when, if, and why content was integrated into the target work. Effectively, what the kernel needs is a system in which every change has such a unique content ID that can be referenced regardless of the subproject. The current system doesn't appear to work in all cases at the scale the kernel operates at. And the DRM groups need to stop thinking they're independent of the "downstream" kernel, when that's obviously a fiction: they're as dependent on kernel features as the kernel is on theirs.

Taking just the diff

Posted Jan 18, 2025 11:00 UTC (Sat) by em (subscriber, #91304)

It would still not work. When cherry-picking, some adjustments are often needed, which changes the hash of the changes.

Taking just the diff

Posted Jan 18, 2025 21:17 UTC (Sat) by epa (subscriber, #39769)

But that’s exactly what I am talking about. When you merge a branch and resolve conflicts, you can make any changes you want as part of the merge. You could even end up not applying any of the change ostensibly being merged in (perhaps because the affected feature and its source code no longer exist in your branch). But still git will show that you performed a merge and the other branch is now an ancestor of yours. And this is by design.

(Another way to see whether a branch has been merged would be to look at the code and check the changes in the other branch are present in yours. This is what people sometimes had to resort to in older version control systems without change tracking, or which were not distributed, or indeed if they had no VCS at all. Both approaches have their advantages, but often we prefer the cleaner one based on explicitly tracking commit history.)

Now moving from whole branches to individual changes. You could try to work out whether a cherry-pick has been applied by looking at the current state of the code, or by going back over the branch history to see whether a similar-looking diff has been applied in the past. But, as you say, that falls down if there were conflicts or if for other reasons the patch wasn’t applied exactly.

Instead, I suggest a metadata-based approach where you can create a “disembodied commit” which has a patch to apply but no ancestor commits. Merging this commit into your branch is usually simpler than merging a whole other branch, which may have unrelated changes (unless your programmers are very disciplined about always creating “daggy fixes”, where the commit fixing a bug is an immediate child of the one that introduced it). Indeed, as far as the code goes, merging this disembodied commit is the same as cherry-picking it. But unlike cherry-picking, the disembodied commit keeps the same SHA and becomes an ancestor of your branch. That would make it easy to ask “has this fix been applied?” without having to match the file contents, even when the fix is only given as a text diff against some slightly different version of the codebase.

The SHA of the disembodied commit could be the hash of the original set of changes (or textual diff), but that same SHA would still be used even if you had to resolve conflicts when merging. Again, this is just what happens for regular branches and regular merges.
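A toy Python model of the proposal, assuming a hypothetical object store that tracks only IDs and parent links (git has no disembodied commit type, and real commits hash authors, committers and messages too). The fix's ID is derived from the original diff alone; each branch merges it with whatever tree its conflict resolution produced, and the ancestry check finds the same ID on both.

```python
import hashlib

def sha1(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

class ToyRepo:
    """Hypothetical object store; it only models IDs and parent links."""

    def __init__(self):
        self.parents = {}  # commit ID -> tuple of parent IDs

    def _store(self, payload, parents=()):
        cid = sha1(payload + "\n" + "\n".join(parents))
        self.parents[cid] = tuple(parents)
        return cid

    def commit(self, tree, parent=None):
        return self._store("tree " + tree, (parent,) if parent else ())

    def disembodied(self, original_diff):
        # No parents: the ID depends only on the original diff text.
        return self._store("disembodied\n" + original_diff)

    def merge(self, tip, other, resolved_tree):
        # resolved_tree may differ from a clean application of the diff
        # (conflicts, adjustments); the parent link still records the merge.
        return self._store("tree " + resolved_tree, (tip, other))

    def is_ancestor(self, candidate, tip):
        stack, seen = [tip], set()
        while stack:
            c = stack.pop()
            if c == candidate:
                return True
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return False

repo = ToyRepo()
fix = repo.disembodied("--- a/foo.c\n+++ b/foo.c\n@@ -1 +1 @@\n-x = 1;\n+x = 2;\n")
main = repo.commit("main-v1")
stable = repo.commit("stable-v1")

main2 = repo.merge(main, fix, "main-v2")                 # applied cleanly
stable2 = repo.merge(stable, fix, "stable-v2-adjusted")  # conflict resolved by hand

print(repo.is_ancestor(fix, main2), repo.is_ancestor(fix, stable2))  # True True
print(repo.is_ancestor(fix, stable))                                   # False
```

The question "has this fix been applied?" then becomes the same cheap parent-link walk used for ordinary branches, independent of what the merged code ended up looking like.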