Issue IDs

Posted Jan 17, 2025 10:59 UTC (Fri) by sima (subscriber, #160698)
In reply to: Issue IDs by ewen
Parent article: The many names of commit 55039832f98c

Yup, the "cherry picked from" annotations we add to our cherry-picks in drm essentially recreate such a stable ID out of thin air. It's not perfect, but it's definitely better than trying to guess by looking at the patch title and diff, since the diff is rarely an exact match once you start cherry-picking fixes around.

It still sucks for the teams that have large internal trees (for example, AMD's display code is shared with Windows and firmware): there is still some discontinuity in tracking changes and bugs that the upstream git log doesn't reflect accurately. Ideally the kernel would accept an opaque original commit identifier.

I don't think a bug tracker issue would necessarily work, because if your initial bugfix is broken, most teams just reopen the original issue, whereas for CVE tracking upstream wants separate IDs if that broken fix has already shown up in any release tag. So for that purpose, tracking commits instead of issue IDs has some benefits too.

Issue IDs

Posted Jan 17, 2025 20:36 UTC (Fri) by ewen (subscriber, #4772)

Certainly, if one were using an issue tracker to get a “stable ID for fix commits” that needed to be unique, there’d need to be some conventions around how to use the bug tracker to derive those stable IDs: either “you must open a new issue, to get a new stable ID, and have it link back to the closed issue”, or some kind of “fix sequence number” (e.g. ISSUEID-SEQ, with SEQ starting at 0 and increased for each attempt at “commits fixing problem”).
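A rough sketch of the second convention, assuming fix IDs are plain strings of the form ISSUEID-SEQ. The function name `next_fix_id` and the sample issue ID are made up for illustration and do not come from any real tracker:

```python
import re

# Hypothetical format: "<ISSUEID>-<SEQ>", e.g. "BUG1234-0" for the first fix attempt.
FIX_ID_RE = re.compile(r"^(?P<issue>.+)-(?P<seq>\d+)$")


def next_fix_id(issue_id, existing_fix_ids):
    """Return the next unused fix ID for issue_id, with SEQ starting at 0."""
    used = []
    for fix_id in existing_fix_ids:
        match = FIX_ID_RE.match(fix_id)
        # Only count sequence numbers that belong to this issue.
        if match and match.group("issue") == issue_id:
            used.append(int(match.group("seq")))
    next_seq = max(used) + 1 if used else 0
    return f"{issue_id}-{next_seq}"
```

Each new attempt at fixing the same issue then gets its own searchable ID, even if the tracker issue itself is simply reopened.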

But I do agree with your general point that the “stable” kernel accepting an “opaque ID that can be searched for” is the key change required, rather than a “Merkle tree commit hash in exactly their upstream tree” (Linus release), which cannot be known in advance when a fix is made before (the thing being fixed) is “merged to the central repo”.

Ewen

Issue IDs

Posted Jan 17, 2025 22:50 UTC (Fri) by NYKevin (subscriber, #129325)

I think the average large org has a much more straightforward posture here: They have a single central repository (which might or might not be a DVCS like Git), and whatever identifier that centralized repo assigns is the identifier you use for everything.

Edge cases:

* If a commit has yet to make it into the central repo, then it has no ID, or perhaps only an unstable ID that might change or become invalid in the future. This does not matter, because these commits are considered "unfinished" and should not be used for any serious purpose (other than asking people to review them so that they can become final).
* Once a commit is in the central repo, history rewrites are forbidden (allowing for a few exceptional cases when some sensitive item gets improperly committed). The need to support that exceptional use case is one of the stronger arguments for using a non-DVCS system (which can simply assign numeric commit IDs by fiat rather than having to do this whole Merkle tree business, so you can rewrite history without changing subsequent IDs).
* When something is cherry picked, there is some notion of a "primary" or "original" commit, which will at the very least be mentioned in the commit message of the cherry pick. Good tooling can use this to resolve the original commit when given a cherry pick ID (not every org has good tooling).
* Linear history is usually enforced. When a commit becomes final, it is rebased on top of the intended branch. For some systems like Perforce, this is usually a trivial operation (Perforce's data model is that each individual file is a series of snapshots with associated numeric commit IDs, so all history is inherently linear, and "rebasing" just means "check to make sure that nobody else has edited any of the same files as us").
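
As a rough illustration of the cherry-pick case in the list above: `git cherry-pick -x` appends a "(cherry picked from commit ...)" line to the commit message, and tooling can follow that line back to the original. The function name `original_commit` and the `messages` dictionary (a stand-in for a real repository lookup) are assumptions made for this sketch:

```python
import re

# `git cherry-pick -x` appends a line of this form to the commit message.
CHERRY_PICK_RE = re.compile(r"\(cherry picked from commit ([0-9a-f]{7,64})\)")


def original_commit(commit_id, messages):
    """Follow cherry-pick annotations back to the original commit.

    `messages` maps commit ID -> commit message; it stands in for a
    lookup against a real repository.
    """
    seen = set()
    current = commit_id
    while current not in seen:
        seen.add(current)
        sources = CHERRY_PICK_RE.findall(messages.get(current, ""))
        if not sources:
            return current  # no annotation (or unknown commit): treat as original
        # With repeated cherry-picks the last annotation names the immediate source.
        current = sources[-1]
    return current  # annotations formed a cycle; stop rather than loop forever
```

This only works as long as the referenced commit still exists under that ID, which is exactly the assumption that breaks down in a tree whose history gets rewritten.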

Obviously this would not work very well for Linux's use case, but it is the sort of thing that the MITRE people were probably expecting when they set some of these rules around stable IDs.

Issue IDs

Posted Jan 17, 2025 23:16 UTC (Fri) by kleptog (subscriber, #1183)

The patch tracking ID doesn't need to be anything complicated. You could, for example, just generate a random 32-character hex string and store it in the commit message. Then it would automatically survive cherry-picks and everything. You can make a git hook that automatically generates such an ID when the commit is created.

Of course, such a string without context is not very useful so we should prefix it with something, perhaps the capital letter I. That would make it easily recognisable and searchable.

Finally, it should not be bare, but it could be added to the commit message under Change-ID or some such.

Oops, now it looks like Gerrit changelog ID and we couldn't possibly do that...

All this talk of diff algorithms and automatic patch matching, the amount of cycles and discussion going in this, all to solve a problem that is trivially solved with a git hook. I don't understand it at all.

We can replace the I with a K and call it Kernel-Change-Id, maybe then people would accept it?
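
For what it's worth, a minimal sketch of such a hook, assuming it is installed as a `commit-msg` hook (git passes the path of the message file as the first argument). The trailer name and the helper names `add_change_id` and `run_hook` are only illustrative, and this is not Gerrit's actual hook:

```python
import re
import secrets
import sys

TRAILER = "Change-ID"  # name suggested above; not an existing kernel convention


def add_change_id(message, trailer=TRAILER):
    """Append '<trailer>: I<32 hex chars>' unless the message already has one."""
    pattern = rf"^{re.escape(trailer)}: I[0-9a-f]{{32}}$"
    if re.search(pattern, message, re.MULTILINE):
        return message  # keep the existing ID so it survives amends and cherry-picks
    new_id = "I" + secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    return f"{message.rstrip()}\n\n{trailer}: {new_id}\n"


def run_hook(path):
    """Rewrite the commit message file in place, adding an ID if it lacks one."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(add_change_id(original))


if __name__ == "__main__" and len(sys.argv) == 2:
    run_hook(sys.argv[1])
```

A real hook would want to place the line inside the existing trailer block (next to Signed-off-by and friends, for instance via `git interpret-trailers`) and ignore comment lines in the message file; this sketch just appends a new paragraph.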

Issue IDs

Posted Jan 24, 2025 22:57 UTC (Fri) by marcH (subscriber, #57642)

> Oops, now it looks like Gerrit changelog ID and we couldn't possibly do that...

Of course not, because such a dead simple solution comes from lesser people who do not recognize the superiority of an email-based workflow. What were you thinking?

Also, Gerrit genuinely sucks in many ways, so absolutely nothing good can come from it and it must be entirely dismissed.

> All this talk of diff algorithms and automatic patch matching, the amount of cycles and discussion going in this, all to solve a problem that is trivially solved with a git hook. I don't understand it at all.

No one can understand it because it's not rational.

For more bad faith, don't forget to watch the seminal talk "Patches carved in stone tablets". It's a bit old (2016) but already well into the social media age where the universe is split between the "good guys" who do everything right and the "bad guys" who do everything wrong. A complex world made simple at last.

To be fairer and less bitter: DRM seems to _already_ use such a random ID! This random ID just happens to coincide with the SHA of a mutable git commit, with a "cherry-picked from:" prefix pretending that the ID is the SHA of an immutable git commit.

But this ID disguise and attempt to appease the stone tablets gods is backfiring; they have not been fooled...