
Cook: Colliding with the SHA prefix of Linux's initial Git commit

Kees Cook describes his work resulting in a kernel documentation commit whose ID shares the same first 12 characters as the initial commit in the kernel's repository.

This is not yet in the upstream Linux tree, for fear of breaking countless other tools out in the wild. But it can serve as a test commit for those that want to get this fixed ahead of any future collisions (or this commit actually landing).
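Some background on what "sharing the first 12 characters" involves. A Git commit ID, in the default SHA-1 object format, is the SHA-1 of a short header (`commit <size>\0`) followed by the raw commit object text. The minimal Python sketch below reproduces that hashing for any object type and checks whether two IDs share a 12-character prefix. It only illustrates the hashing; it is not the tooling Kees used. Matching a given 12-hex-digit prefix means matching 48 bits, far fewer than the 160 bits of a full SHA-1 collision.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object ID as computed by Git in its default (SHA-1) object format.

    Git hashes the header "<type> <size>\\0" followed by the raw object body;
    for a commit, the body is the text printed by `git cat-file commit <id>`.
    """
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def shares_prefix(a: str, b: str, length: int = 12) -> bool:
    """True if two hex object IDs agree on their first `length` characters."""
    return a[:length].lower() == b[:length].lower()


empty_blob = git_object_id("blob", b"")
empty_tree = git_object_id("tree", b"")
print(empty_blob)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(shares_prefix(empty_blob, empty_tree))  # False
```

The two printed IDs are Git's well-known IDs for the empty blob and the empty tree, so you can compare the sketch directly against `git hash-object`.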

LWN looked at commit-ID collisions a few weeks back.



Totally something Case would do

Posted Dec 30, 2024 22:39 UTC (Mon) by mricon (subscriber, #59252) [Link] (8 responses)

When I first met Case, it was at a conference in San Diego, where he was plugging a WiFi router he had brought from home into an outlet in the hallway. His stated reason was that he hoped the SSID would be added to the Apple/Google tracking databases, so that the next time someone drove by his house in Portland and registered his home WiFi signal, the tracker would be confused.

Anyway, all of this is to say that this story 100% checks out and this is totally something Case would do.

Totally something Case would do

Posted Dec 30, 2024 23:08 UTC (Mon) by mricon (subscriber, #59252) [Link] (4 responses)

Lol, this is what happens when I try to dictate and don't check that Kees got turned into Case. Sorry, Kees!

Totally something Case would do

Posted Dec 31, 2024 0:50 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Another case of namespace collision.

Totally something Case would do

Posted Dec 31, 2024 10:15 UTC (Tue) by lkundrak (subscriber, #43452) [Link]

and possible new nickname

Totally something Case would do

Posted Jan 3, 2025 22:16 UTC (Fri) by Lennie (subscriber, #49641) [Link]

I guess we could say: Kees in point. :-)

Totally something Case would do

Posted Jan 1, 2025 0:29 UTC (Wed) by ssmith32 (subscriber, #72404) [Link]

Oof, I started reading that and thought it was going to be a sly allusion to Neuromancer ... it isn't actually a bad start to a short bit of fan fic, imho 😄

Totally something Case would do

Posted Dec 31, 2024 12:15 UTC (Tue) by bof (subscriber, #110741) [Link] (2 responses)

That reminds me of a funny guy I met on vacation in Berlin. On a calm little street, he was trudging along the sidewalk with a handcart full of switched-on mobile phones. I asked him what he was doing. He happily explained that he was teaching Google Maps that the street was congested so often that it would be totally silly to send traffic through it.

I laughed so hard...

Totally something Case would do

Posted Dec 31, 2024 13:38 UTC (Tue) by Wol (subscriber, #4433) [Link]

I remember Google Maps messing up badly, but that was just Maps being stupid.

Embankment was closed just west of Charing Cross. Seeing that there was not much traffic on a major thoroughfare, Google Maps was trying to send all the traffic down Embankment ...

I was wondering whether to trust Google Maps, went with it, and wish I hadn't: once committed, it took me well over half an hour, quite possibly an hour, to extricate myself just to get back to where I was ... Like *MOST* AI, when you need it most is exactly when it is most likely to screw up. If there's a traffic jam, trust the official diversion or your instincts, as AI is quite likely to mess up completely. I think I completely lost faith in it when it tried to divert a closed motorway down a single-track road ...

Cheers,
Wol

Totally something Case would do

Posted Jan 1, 2025 20:39 UTC (Wed) by deptrai (guest, #70612) [Link]

SHAs in commit messages

Posted Dec 31, 2024 10:08 UTC (Tue) by epa (subscriber, #39769) [Link] (7 responses)

It’s always seemed a bit of a gap in git’s functionality that there is no official way to mark an SHA in a commit message. If a web interface wants to hyperlink it, for example, it has to sniff for a likely-looking sequence of 0-9a-f characters. Whereas if the SHA were marked it could be automatically linked and abbreviated too for human readers. When rebasing, messages like “this reverts commit abc” or “see also bcd” could automatically be updated to the new commit in the rebased series.
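To make the "sniffing" concrete, here is a minimal Python sketch of the kind of heuristic a web interface might use. The 7-to-64-digit range is an assumption (7 is Git's long-standing default abbreviation length, and 64 covers SHA-256 IDs), not any particular forge's rule. Note that it also picks up an ordinary hex-looking word, which is exactly the ambiguity an explicit marker would remove.

```python
import re

# Illustrative heuristic only: a standalone run of 7 to 64 lowercase hex digits.
# This is a guess at how a forge might sniff IDs, not any forge's actual rule.
HEX_RUN = re.compile(r"\b[0-9a-f]{7,64}\b")


def sniff_commit_ids(message: str) -> list[str]:
    """Return every token in `message` that merely *looks* like a commit ID."""
    return HEX_RUN.findall(message)


msg = "This reverts commit 70fd1966c93b; see also deadbeef and facade."
print(sniff_commit_ids(msg))  # ['70fd1966c93b', 'deadbeef']
```

"deadbeef" is a false positive. "facade" is skipped only because it is shorter than the assumed minimum length.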

(I know worse is better and you can build anything you want on top of an unstructured field, but there is value in having a single defined way which you can reasonably expect most people to follow.)

SHAs in commit messages

Posted Dec 31, 2024 10:33 UTC (Tue) by pbonzini (subscriber, #60935) [Link] (4 responses)

You could just have automatic shortening of SHA1 commit IDs in "git log", though that probably would require wrapping as well. It's essentially what the various git forges do when you paste a full commit ID in a comment or a commit message.
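On the display side, shortening amounts to picking the shortest prefix that is still unique among the repository's IDs. The following is a minimal Python sketch that assumes the full list of IDs is at hand. The function name and the fixed `min_len` of 7 are illustrative; recent Git versions also scale the default abbreviation length with repository size, which this ignores.

```python
def abbreviate(full_id: str, all_ids: list[str], min_len: int = 7) -> str:
    """Shortest prefix of `full_id` (at least `min_len` chars) that no other ID shares."""
    others = [i for i in all_ids if i != full_id]
    for n in range(min_len, len(full_id) + 1):
        prefix = full_id[:n]
        if not any(o.startswith(prefix) for o in others):
            return prefix
    # Fall back to the full ID if even that is not unique (duplicate entries).
    return full_id


ids = ["70fd1966c93b0000", "70fd1966c93b1111", "1234567aaaa"]
print(abbreviate("1234567aaaa", ids))       # 1234567
print(abbreviate("70fd1966c93b0000", ids))  # 70fd1966c93b0
```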

SHAs in commit messages

Posted Dec 31, 2024 11:32 UTC (Tue) by epa (subscriber, #39769) [Link] (3 responses)

You could shorten in 'git log' but that doesn't do much to encourage use of the full SHA in the original message. If only eight characters are going to be printed, why bother pasting in all of it? What's missing is the input side: when a commit message contains something that looks like an SHA, abbreviated or not, look it up in the current repository and store the full hex string. Better to do this guessing and magic at the point when the message is written, rather than later when it's displayed, by which time the shortened string may have become ambiguous. And then rebasing and cherry-picking can repoint to a new commit, prompting the user if needed.
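The input-side idea fits in a few lines of Python: expand a hex token only when exactly one known commit starts with it, and leave ambiguous or unknown tokens untouched. This is a sketch, not an existing Git feature. `known_ids` stands in for a lookup in the current repository, and both the function name and the regex are illustrative assumptions.

```python
import re

# Same illustrative heuristic for "looks like an ID": 7 to 64 lowercase hex digits.
HEX_RUN = re.compile(r"\b[0-9a-f]{7,64}\b")


def expand_abbrevs(message: str, known_ids: list[str]) -> str:
    """Replace each unambiguous abbreviated ID in `message` with the full ID."""

    def expand(match: re.Match) -> str:
        token = match.group(0)
        hits = [i for i in known_ids if i.startswith(token)]
        # Rewrite only when exactly one commit matches; otherwise keep the text as typed.
        return hits[0] if len(hits) == 1 else token

    return HEX_RUN.sub(expand, message)
```

In practice, a commit hook or editor integration would run this when the message is written, while the abbreviations can still be checked against the repository as it is at that moment.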

SHAs in commit messages

Posted Dec 31, 2024 15:33 UTC (Tue) by jorgegv (subscriber, #60484) [Link] (2 responses)

> You could shorten in 'git log' but that doesn't do much to encourage use of the full SHA in the original message. If only eight characters are going to be printed, why bother pasting in all of it?

Because when you paste a commit ID, you have most probably copied it from the terminal (or somewhere else), and double-clicking on the full commit ID selects all of it? I mean: I believe nobody types in commit IDs letter by letter; instead, everyone copies and pastes them (or uses some kind of autocomplete), and the copy operation is the same whether you copy the full ID or an abbreviation.

My view is that commit IDs should be used at full length whenever a machine might consume them, and only shortened when shown to a human. As someone already said, that's what forges already do, and I think it's the correct way.

SHAs in commit messages

Posted Dec 31, 2024 17:03 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

People are not going to copy the full ID unless you print the full ID for them to copy. So if you want people to copy the full ID, you have to display the full ID for them to copy, and it has to be the default behavior of git log etc. (because people are lazy and will grab the abbreviation if it is more convenient).

But that's obviously a bad user experience. The better option would be one or more of the following:

* As suggested, auto-expand unambiguous abbreviations to full hashes at commit time (with a flag to disable this behavior, and possibly also a warning message on stderr to let you know that an expansion has occurred). Auto-abbreviate them in git log (with a flag to disable).
* Provide a reasonably terse syntax that performs auto-expansion at commit time (and can be escaped or disabled if necessary). You can probably do this now by configuring your editor to perform the expansion, but it would also require everyone to agree that expansion is preferred.
* When resolving commit IDs that appear in a given commit message, first look for matching ancestors of the given commit, then for matching non-descendant commits that are older than the given commit according to the date field, then for matching non-descendant commits that are less than 24 hours younger (to allow for wrong timezone shenanigans), and finally for all matching non-descendant commits in the repo. This requires you to know where the commit ID came from in the first place, but forges do in fact have that information when they are linkifying a commit message. Of course, they might already be doing something like this.
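The third suggestion, tiered resolution relative to the commit whose message contains the reference, might look like the Python sketch below. The repository is modeled as a hypothetical `parents` dict (every commit ID mapped to its list of parent IDs) plus a `dates` dict of Unix timestamps, and the tiers follow the order given in the bullet. Descendants are dropped entirely: a descendant's ID depends on the referring commit, so it could not have been known when the message was written.

```python
from collections import deque

DAY = 24 * 60 * 60


def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """All commits reachable by following parent links from `commit` (excluding it)."""
    seen: set[str] = set()
    queue = deque(parents.get(commit, []))
    while queue:
        c = queue.popleft()
        if c not in seen:
            seen.add(c)
            queue.extend(parents.get(c, []))
    return seen


def resolve(prefix: str, context: str,
            parents: dict[str, list[str]], dates: dict[str, int]) -> list[str]:
    """Candidates for `prefix` as referenced from the message of commit `context`.

    Returns the matches from the first non-empty tier; more than one entry
    means the reference is still ambiguous, and [] means nothing plausible matched.
    """
    candidates = [c for c in parents if c.startswith(prefix)]
    # Keep only non-descendants of the referring commit (and not the commit itself).
    non_desc = [c for c in candidates
                if c != context and context not in ancestors(c, parents)]
    anc = ancestors(context, parents)
    t = dates[context]
    tiers = [
        [c for c in non_desc if c in anc],                                  # ancestors
        [c for c in non_desc if c not in anc and dates[c] <= t],            # older
        [c for c in non_desc if c not in anc and t < dates[c] <= t + DAY],  # clock skew
        non_desc,                                                           # anything else
    ]
    for tier in tiers:
        if tier:
            return tier
    return []
```

As the comment notes, this needs to know which commit the message came from. Forges have that context when they linkify a message; a bare `git rev-parse` does not.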

SHAs in commit messages

Posted Jan 1, 2025 0:24 UTC (Wed) by epa (subscriber, #39769) [Link]

Not git log, but many other git commands show an abbreviated SHA by default.

SHAs in commit messages

Posted Dec 31, 2024 14:21 UTC (Tue) by adobriyan (subscriber, #30858) [Link] (1 responses)

Markup like <sha1>...</sha1> wouldn't have been that bad, but it is too late for that. A "sha1:..." prefix is probably the way to go.
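If an explicit marker like `sha1:...` were adopted, extracting IDs would no longer require guessing. A minimal sketch, assuming a hypothetical `sha1:<40 hex>` / `sha256:<64 hex>` syntax (Git defines no such convention). Requiring the full digest length is one possible design choice, and it rejects abbreviations outright.

```python
import re

# Hypothetical explicit marker syntax; Git itself defines no such convention.
MARKED_ID = re.compile(r"\b(sha1|sha256):([0-9a-f]+)\b")
EXPECTED_LEN = {"sha1": 40, "sha256": 64}


def find_marked_ids(message: str) -> list[tuple[str, str]]:
    """Return (algorithm, hex ID) pairs; markers with the wrong digest length are skipped."""
    return [(algo, hexid)
            for algo, hexid in MARKED_ID.findall(message)
            if len(hexid) == EXPECTED_LEN[algo]]
```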

SHAs in commit messages

Posted Jan 1, 2025 18:26 UTC (Wed) by ceplm (subscriber, #41334) [Link]

Yes, and Git (not GitHub) supports sha256-based repositories (e.g., https://src.opensuse.org/mcepl/dictd … it is being used so far in development by openSUSE for packaging repos; and yes, of course, Gitea supports it as well).

See also https://lwn.net/Articles/898522/

Just describe it

Posted Dec 31, 2024 15:26 UTC (Tue) by jengelh (guest, #33263) [Link] (3 responses)

Two readily available options (other than the obvious way of using the full hash):

1. `git describe`: "v6.11-7343-g70fd1966c93b". The chance that there are two commits with prefix 70fd19 *and* with distance 7343 from v6.11 is a lot lower than the chance that there are two commits with prefix 70fd19 anywhere in the entire history.

2. `git describe --contains` (if it yields a result): "v6.12~355^2~3" is unambiguous, at least as long as the tree has no grafts (or the modern equivalent thereof).

Just describe it

Posted Dec 31, 2024 16:21 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (2 responses)

This assumes that annotated tags have some source of "truth" to them and are never used outside of that within a repository. There are also multiple names for a commit based on which tags are available locally. There's also this bug[1] that still exists which can get the wrong tag if you have automated merges in the history that share a (1-second resolution) timestamp.

I don't think I can recommend `git describe` for this purpose.

[1] https://lore.kernel.org/git/ZNffWAgldUZdpQcr@farprobe/T/#u

Just describe it

Posted Jan 9, 2025 16:22 UTC (Thu) by smurf (subscriber, #17840) [Link] (1 responses)

> I don't think I can recommend `git describe` for this purpose.

Maybe not currently, but fixing these problems (maybe only consider signed tags?) shouldn't be too difficult.

Just describe it

Posted Jan 9, 2025 18:24 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Eh, I think limiting it to tags present on some remote would also suffice and probably be more robust (in case you also sign tags for your internal development). The traversal bug just needs some elbow grease (assuming the performance hit is acceptable… maybe only run the extra step when there are commits with equal commit times, to break the tie appropriately?).

I wish for a longer prefix

Posted Jan 2, 2025 15:48 UTC (Thu) by mfranc (subscriber, #164119) [Link] (2 responses)

As somebody who has written two tools for parsing kernel git commits (with libgit2) and is in the process of writing a third, I find the ideas of comparing the Subject, or any other additional metadata, alongside the hash prefix terrible. It is so much easier to simply call rev-parse on a hash prefix (no matter how long) than any proposed alternative.
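In spirit, resolving a prefix is cheap. When object IDs are kept in sorted order (as Git's pack index files store them), every ID sharing a prefix sits in one contiguous run, so a binary search finds it. The Python sketch below illustrates that idea and reports ambiguity the way any prefix lookup must. It is not Git's or libgit2's actual code, and the exception name is made up.

```python
import bisect


class AmbiguousPrefix(Exception):
    """Hypothetical error: more than one object ID starts with the given prefix."""


def lookup_prefix(prefix: str, sorted_ids: list[str]) -> str:
    """Resolve a hex prefix against a sorted list of full object IDs."""
    prefix = prefix.lower()
    # Every ID starting with `prefix` sorts at or after `prefix` itself,
    # and all such IDs are adjacent, so scan forward from the insertion point.
    i = bisect.bisect_left(sorted_ids, prefix)
    matches = []
    while i < len(sorted_ids) and sorted_ids[i].startswith(prefix):
        matches.append(sorted_ids[i])
        i += 1
    if not matches:
        raise KeyError(f"no object starts with {prefix}")
    if len(matches) > 1:
        raise AmbiguousPrefix(f"{prefix} matches {len(matches)} objects")
    return matches[0]
```

A longer prefix never makes this harder. It only shrinks the run that the scan walks over.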

I wish for a longer prefix

Posted Jan 3, 2025 7:50 UTC (Fri) by jirislaby (subscriber, #129812) [Link] (1 responses)

Perhaps git should have been taught to do the inverse of oneline/abbrev first? That way we would all have the decoding in one central place. It has to be written (or taken from the git web parsers) now anyway.

I wish for a longer prefix

Posted Jan 3, 2025 12:47 UTC (Fri) by mfranc (subscriber, #164119) [Link]

Unlikely with libgit2. The biggest problem of git itself is that you cannot use it as a library.


Copyright © 2024, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds