Updating the Git protocol for SHA-256

Posted Jun 20, 2020 15:36 UTC (Sat) by hmh (subscriber, #3838)
In reply to: Updating the Git protocol for SHA-256 by ms-tg
Parent article: Updating the Git protocol for SHA-256

I am not well versed in multihash, but a first look failed to find a canonical, immutable registry of the algorithms already in use and their mapping to IDs (the numerical ones, which are ABI since they end up embedded in the base# representations), along with the procedures for interacting with such a registry. I mean something like what IANA does.

At that point it becomes app specific, and other than the obvious protocol best practice that you should explicitly encode the protocol version (in this case what hash and hash parameters if not implied), there is little to be gained.

Prefixing (hidden by base# or explicitly) the hash type in git has already been covered by other replies and posts, and yes, imho it really should be done if at all possible.



Updating the Git protocol for SHA-256

Posted Jun 20, 2020 17:11 UTC (Sat) by cyphar (subscriber, #110703) [Link] (5 responses)

Multihash defines exactly two things, an extensible format and a table of hash functions. So it definitely does what you say it doesn't (in fairness, the link @ms-tg gave you isn't as useful as the project's page[1]).

Now, there isn't an IANA-like procedure; everything is done via PRs on GitHub, but that's just a difference in administrative structure.

[1]: https://multiformats.io/multihash/
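For readers who have not seen the format, here is a minimal sketch of the two pieces mentioned above: a table of hash functions and a self-describing encoding. It assumes the sha1 (0x11) and sha2-256 (0x12) codes from the multihash table and handles only codes and lengths below 0x80, where the varint encoding is a single byte. The names `TABLE`, `multihash_encode` and `multihash_decode` are made up for illustration and are not the API of any multihash library.

```python
import hashlib

# Tiny excerpt of the multihash table: name -> (function code, digest length in bytes).
# Assumption: both values are below 0x80, so each fits in one varint byte.
TABLE = {"sha1": (0x11, 20), "sha2-256": (0x12, 32)}

# Mapping from multihash names to hashlib algorithm names (illustrative only).
HASHLIB_NAMES = {"sha1": "sha1", "sha2-256": "sha256"}


def multihash_encode(name: str, data: bytes) -> bytes:
    """Return <code><length><digest> for one of the functions in TABLE."""
    code, length = TABLE[name]
    digest = hashlib.new(HASHLIB_NAMES[name], data).digest()
    return bytes([code, length]) + digest


def multihash_decode(mh: bytes) -> tuple[str, bytes]:
    """Split a multihash into (function name, raw digest), checking the length."""
    code, length = mh[0], mh[1]
    digest = mh[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length byte")
    for name, (known_code, _) in TABLE.items():
        if known_code == code:
            return name, digest
    raise ValueError(f"unknown hash function code 0x{code:02x}")


if __name__ == "__main__":
    mh = multihash_encode("sha2-256", b"hello")
    print(mh.hex())            # hex form begins with the code and length bytes
    print(multihash_decode(mh)[0])
```

The point of the sketch is only that the reader of the value never has to guess which function produced it; a real implementation would need full varint handling and the complete table.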

Updating the Git protocol for SHA-256

Posted Jun 20, 2020 18:42 UTC (Sat) by hmh (subscriber, #3838) [Link] (1 responses)

A procedure to add new hashes is a procedure; PRs on GitHub are fine.

This link you sent is much better, the other one lacks essential information...

I am quite sure git would severely restrict the allowed hashes, but at least the design of multihash seems sane and safely extensible, including when one makes the short-sighted error of enshrining short prefixes of the hash anywhere that is not a throwaway command-line call... A bad practice that is very common among git users.

Updating the Git protocol for SHA-256

Posted Jun 20, 2020 23:17 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

> including when one makes the short-sighted error of enshrining short prefixes of the hash anywhere that is not a throwaway command-line call... A bad practice that is very common among git users.

"Best practice" for short usage in more permanent places includes the date (or tag description) and summary of the commit in question (which both greatly ease conflict resolution when it occurs and give some idea of what's going on without having to copy/paste the hash yourself).

IANA

Posted Jun 22, 2020 15:37 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

IANA offers a _lot_ of different procedures. They range from Private Use and Experimental (chunks of namespace carved off entirely for users to do with as they please without talking to IANA at all) through to Standards Action (you must publish an IETF Standards Track document, e.g. a Best Common Practice or an RFC explicitly designated Internet Standard). Where the namespace is hierarchically infinite or near infinite (e.g. OIDs, DNS), IANA just delegates one layer of the namespace and more or less lets the hierarchy sort it out. Technically these OIDs don't even belong to IANA (it hijacked the ones used for the Internet many years ago), but it delegates them this way anyway, and it's too late for the standards organisations that minted them to say "No".

RFC 8126 lists 10 such procedures for general use in new namespaces.

So what Multihash is doing here sounds like a typical new IANA namespace which has an Experimental/Private Use region (self-assigned) and then Specification Required for the rest of the namespace. You must document what you're doing, maybe with a Standards Organisation, maybe you write a white paper, maybe you even just spin up a web site with a technical rant, but you need to document it and then you get reviewed and maybe get in.

Apparently Multihash is writing up some sort of formal document to maybe go to the IETF, but given they started in 2016 and it's not that hard, they may not ever get it polished up and standardised anywhere. That's not a problem.

Updating the Git protocol for SHA-256

Posted Jun 24, 2020 4:03 UTC (Wed) by nevyn (guest, #33129) [Link] (1 responses)

Hmm, as someone who has done a bunch of work with hashes over the last couple of years, I'd not heard of multihash before, and looking at https://multiformats.io/#projects-using-multiformats it seems the main user is still just ipfs. This wouldn't necessarily be bad if it was new and gaining usage, but it's more worrying given it's been around over half a decade and is supposed to be established.

Another similar point is the table itself: hashes are added ad hoc when someone uses them and wants to use multihash ... again, fine if the project is very new and gaining traction, but much less good if the project is established and you go see that none of https://github.com/dgryski/dgohash are there. I understand it's volunteer-based contributions, but if you want people to actually use your std. it's going to be much easier if they can use it without having to self-register well known/used decade-old types.

Then there's the format itself. I understand that hashes are variable length, but showing abbreviated hashes is very well known at this point. A new git repo. shows 7 characters for the --abbrev hash, ansible with over 50k commits only shows 10 (and even then github only shows 7), and they want to add "1220" to the front of that? And they really want you to show it to the user all the time? Even if abbreviated hashes weren't a thing, most users are going to think it's a bit weird if literally all the hashes they see start with the same 4 hex characters (at a minimum -- using blake2b will eat 6, I think). I also doubt many developers would want to store the hashes natively, because it doesn't take many instances before storing the exact same byte sequence with each piece of actual data becomes more than trivial waste.
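To make the abbreviation concern concrete, here is a rough sketch of what happens if a 7-character abbreviation is taken naively from the hex form of a sha2-256 multihash. This is only an assumption about how a tool might abbreviate; neither git nor the multihash project prescribes it, and the `abbreviated` helper is hypothetical.

```python
import hashlib

# Multihash prefix for sha2-256: function code 0x12, digest length 0x20 (32 bytes).
PREFIX = bytes([0x12, 0x20])


def abbreviated(data: bytes, n: int = 7) -> tuple[str, str]:
    """Return the first n hex characters of the bare digest and of the multihash."""
    digest = hashlib.sha256(data).digest()
    bare = digest.hex()[:n]
    prefixed = (PREFIX + digest).hex()[:n]
    return bare, prefixed


if __name__ == "__main__":
    bare, prefixed = abbreviated(b"example object")
    # The prefixed form spends 4 of its 7 characters on "1220",
    # leaving only 3 characters that actually distinguish objects.
    print(bare, prefixed)
```

Under that assumption every abbreviation begins with "1220", so a tool would either have to show 11 characters to keep the same discriminating power or strip the prefix for display.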

Updating the Git protocol for SHA-256

Posted Jun 25, 2020 17:02 UTC (Thu) by pj (subscriber, #4506) [Link]

...all valid criticisms, but I've yet to see an alternative with equivalent functionality and more widespread support. If you know of one, I'd love to hear about it! Though as you say, multihash is still fairly young so would likely welcome feedback that would help adoption/functionality/usability.

