Issue IDs
Issue IDs
Posted Jan 16, 2025 23:57 UTC (Thu) by ewen (subscriber, #4772)Parent article: The many names of commit 55039832f98c
Most other large projects long ago figured out they need (one or more) issue trackers, or some other “tracking ID” to be the link between issues discovered (and the cause of the issues), and their fixes. That gives a stable ID that can be cross referenced, and annotated with more information, without changing the unique ID number.
Here even a (sha256) hash of just the patch diff (not the tree) would be a better choice than an arbitrary merkle tree hash. If the kernel won’t follow modern industry best practice.
Ewen
Posted Jan 17, 2025 8:57 UTC (Fri)
by taladar (subscriber, #68407)
[Link]
Posted Jan 17, 2025 9:04 UTC (Fri)
by epa (subscriber, #39769)
[Link] (9 responses)
Posted Jan 17, 2025 11:58 UTC (Fri)
by TomH (subscriber, #56149)
[Link] (6 responses)
Posted Jan 17, 2025 13:46 UTC (Fri)
by epa (subscriber, #39769)
[Link] (5 responses)
Posted Jan 17, 2025 13:54 UTC (Fri)
by geert (subscriber, #98403)
[Link] (4 responses)
Posted Jan 17, 2025 18:16 UTC (Fri)
by k3ninho (subscriber, #50375)
[Link] (2 responses)
The diff has end-result line numbers and expected text, you can test "does this line match this patch?" pretty cheaply. If the line number doesn't match up, the unified diff gives you three lines to find so you can check after and before the three lines following. Maybe the lines get split up, but that's a case that a tool can't work out intent.
K3n.
Posted Jan 17, 2025 18:47 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Jan 18, 2025 8:41 UTC (Sat)
by epa (subscriber, #39769)
[Link]
Posted Jan 19, 2025 1:59 UTC (Sun)
by Heretic_Blacksheep (guest, #169992)
[Link]
A single unique ancestral id assigned strictly to committed content in which all children and cousin ids including changes to the ancestor are hard linked is preferable to systems where there can be confusion generated as to when, if, and why content may or may not have been integrated into the target work. Effectively what's needed in the kernel is a system in which all changes have such a unique content ID that can be referenced regardless of the sub project. The current system really doesn't appear to work in all cases at the scale the kernel is working at. And the DRM groups need to stop thinking they're independent of "downstream" kernel, when that's obviously a fiction. They're as dependent on kernel features just as the kernel is dependent on their features.
Posted Jan 18, 2025 11:00 UTC (Sat)
by em (subscriber, #91304)
[Link] (1 responses)
Posted Jan 18, 2025 21:17 UTC (Sat)
by epa (subscriber, #39769)
[Link]
(Another way to see whether a branch has been merged would be to look at the code and check the changes in the other branch are present in yours. This is what people sometimes had to resort to in older version control systems without change tracking, or which were not distributed, or indeed if they had no VCS at all. Both approaches have their advantages, but often we prefer the cleaner one based on explicitly tracking commit history.)
Now moving from whole branches to individual changes. You could try to work out whether a cherry-pick has been applied by looking at the current state of the code, or going back over the branch history to see if a similar-looking diff has been applied in the past. But as you say that falls down if there were conflicts or for other reasons the patch wasn’t applied exactly.
Instead, I suggest a metadata-based approach where you can create a “disembodied commit” which has a patch to apply but no ancestor commits. Merging this commit into your branch is usually simpler than merging a whole other branch, which may have unrelated changes (unless your programmers are very disciplined about always creating “daggy fixes” where the commit fixing a bug is an immediate child of the one that introduced it). Indeed merging this disembodied commit is the same as cherry-picking it, as far as the code goes. But unlike cherry-picking, the disembodied commit keeps the same SHA, and is now an ancestor of your branch. That would make it easy to ask “has this fix been applied?” without having to match the file contents, and even when the fix is only given as a text diff against some slightly different version of the codebase.
The SHA of the disembodied commit could be the hash of the original set of changes (or textual diff) but that same SHA would still be used even if you had to resolve conflicts on merging. Again, just as happens for regular branches and regular merges.
Posted Jan 17, 2025 10:59 UTC (Fri)
by sima (subscriber, #160698)
[Link] (4 responses)
It still sucks for the teams that have large internal trees (like amd's display code is shared with windows and firmware), so we still have some discontinuity in tracking changes and bugs that's not reflected in the upstream git log accurately, so ideally the kernel would need to accept an opaque original commit identifier.
I don't think a bug tracker issue would necessary work, because if your initial bugfix is broken most teams just reopen the original issue. Whereas for CVE tracking upstream wants separate IDs, if that broken fix has shown up in any release tag already. So for that purpose tracking commits instead of issue IDs has some benefits too.
Posted Jan 17, 2025 20:36 UTC (Fri)
by ewen (subscriber, #4772)
[Link] (3 responses)
But I do agree with your general point that the “stable” kernel accepting an “opaque ID that can be searched for” is the key change required. Rather than a “merkle tree commit hash in exactly their upstream tree” (Linus release), which cannot be known in advance when a fix is made before (the thing being fixed) is “merged to the central repo”.
Ewen
Posted Jan 17, 2025 22:50 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link]
Edge cases:
* If a commit has yet to make it into the central repo, then it has no ID, or perhaps only an unstable ID that might change or become invalid in the future. This does not matter, because these commits are considered "unfinished" and should not be used for any serious purpose (other than asking people to review them so that they can become final).
Obviously this would not work very well for Linux's use case, but it is the sort of thing that the MITRE people were probably expecting when they set some of these rules around stable IDs.
Posted Jan 17, 2025 23:16 UTC (Fri)
by kleptog (subscriber, #1183)
[Link] (1 responses)
Of course, such a string without context is not very useful so we should prefix it with something, perhaps the capital letter I. That would make it easily recognisable and searchable.
Finally, it should not be bare, but it could be added to the commit message under Change-ID or some such.
Oops, now it looks like Gerrit changelog ID and we couldn't possibly do that...
All this talk of diff algorithms and automatic patch matching, the amount of cycles and discussion going in this, all to solve a problem that is trivially solved with a git hook. I don't understand it at all.
We can replace the I with a K and call it Kernel-Change-Id, maybe then people would accept it?
Posted Jan 24, 2025 22:57 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
Of course not, because such a dead simple solution comes from lesser people who do not recognize the superiority of an email-based workflow. What were you thinking?
Also, Gerrit genuinely sucks in many ways, so absolutely nothing good can come from it and it must be entirely dismissed.
> All this talk of diff algorithms and automatic patch matching, the amount of cycles and discussion going in this, all to solve a problem that is trivially solved with a git hook. I don't understand it at all.
No one can understand it because it's not rational.
For more bad faith, don't forget to watch the seminal talk "Patches carved in stone tablets". It's a bit old (2016) but already well into the social media age where the universe is split between the "good guys" who do everything right and the "bad guys" who do everything wrong. A complex world made simple at last.
To be fairer and less bitter: DRM seems to _already_ use such a random ID! This random ID just happens to coincide with the SHA of a mutable git commit, with a "cherry-picked from:" prefix pretending that the ID is the SHA of an immutable git commit.
But this ID disguise and attempt to appease the stone tablets gods is backfiring; they have not been fooled...
Posted Jan 17, 2025 17:01 UTC (Fri)
by twmoure (subscriber, #58924)
[Link]
Posted Jan 17, 2025 18:49 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Hmm. I'm not so sure that would work either. There are different diff algorithms in git itself and one might have higher context settings for various reason (e.g., I believe Phabricator wanted "infinite" context for uploaded patches).
Posted Jan 17, 2025 20:47 UTC (Fri)
by ewen (subscriber, #4772)
[Link]
TomH pointed out elsewhere in the thread that there is an internal “git patch-id” already, in git, that might form the basis of a “hash of just the patch diff”, if the “stable” kernel could be persuaded to accept that as the ID. It’d probably require extra tooling / metadata support to store enough information to enable efficient lookup of commits by the stable “change ID hash” (eg each git tree, or user, storing a cache of “change ID hash” to “merkle tree hash” mapping for direct lookup).
Ewen
Posted Jan 17, 2025 19:42 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Jan 22, 2025 21:34 UTC (Wed)
by siddh (subscriber, #169663)
[Link]
It already exists, it's called git-patch-id.
The following commits have the same patch ID:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...
Issue IDs
Taking just the diff
Taking just the diff
Taking just the diff
Taking just the diff
Taking just the diff
Taking just the diff
Taking just the diff
Taking just the diff
Taking just the diff
Taking just the diff
Issue IDs
Issue IDs
Issue IDs
* Once a commit is in the central repo, history rewrites are forbidden (allowing for a few exceptional cases when some sensitive item gets improperly committed). The need to support that exceptional use case is one of the stronger arguments for using a non-DVCS system (which can simply assign numeric commit IDs by fiat rather than having to do this whole Merkle tree business, so you can rewrite history without changing subsequent IDs).
* When something is cherry picked, there is some notion of a "primary" or "original" commit, which will at the very least be mentioned in the commit message of the cherry pick. Good tooling can use this to resolve the original commit when given a cherry pick ID (not every org has good tooling).
* Linear history is usually enforced. When a commit becomes final, it is rebased on top of the intended branch. For some systems like Perforce, this is usually a trivial operation (Perforce's data model is that each individual file is a series of snapshots with associated numeric commit IDs, so all history is inherently linear, and "rebasing" just means "check to make sure that nobody else has edited any of the same files as us").
Issue IDs
Issue IDs
Issue IDs
Issue IDs
Issue IDs
Issue IDs
Issue IDs