
Whatever happened to SHA-256 support in Git?

By Jonathan Corbet
June 23, 2022
The news has been proclaimed loudly and often: the SHA-1 hash algorithm is terminally broken and should not be used in any situation where security matters. Among other things, this news gave some impetus to the longstanding effort to support a more robust hash algorithm in the Git source-code management system. As time has passed, though, that work seems to have slowed to a stop, leaving some users wondering when, if ever, Git will support a hash algorithm other than SHA-1.

Hash functions are, of course, at the core of how Git works. Every object in its data store — every version of every file, among other things — is hashed, with the resulting value serving as the key under which that object is stored. Commits, too, are represented by a hash of the current state of the tree, the commit message, and the hash(es) of the parent commit(s). The security of the hash function is a key part of the integrity of a repository as a whole. If an attacker could replace a commit with another having the same hash value, they could perhaps inject malicious code into a repository without risking detection. That prospect is worrisome to anybody who depends on the security of code stored in Git repositories — everybody, in other words.
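To make the "hash as storage key" idea concrete, here is a minimal sketch of how a file's object ID can be computed. It assumes the well-known blob encoding (the word "blob", a space, the content length, a NUL byte, then the content); the function name git_blob_id is made up for this illustration and is not part of Git.

```python
import hashlib


def git_blob_id(content: bytes, algo: str = "sha1") -> str:
    """Return the hex object ID for a blob holding `content`.

    `algo` is a hashlib algorithm name; "sha1" mirrors the classic
    object format and "sha256" mirrors the newer one.
    """
    # Header: object type, a space, the decimal length, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


if __name__ == "__main__":
    data = b"hello\n"
    print("sha1  :", git_blob_id(data))
    print("sha256:", git_blob_id(data, "sha256"))
```

The same content yields a 40-character ID under SHA-1 and a 64-character ID under SHA-256, which is why the choice of hash is visible to every tool that handles object IDs.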

The Git project has long since chosen SHA-256 as the replacement for SHA-1. Git was originally written with SHA-1 deeply wired into the code, but all of that code has since been refactored and can handle multiple hash types, with SHA-256 being the second supported type. It is now possible to create a Git repository using SHA-256 (just use the --object-format=sha256 flag) and most local operations will work just fine. The foundation for support of alternative hash algorithms in Git was part of the 2.29 release in 2020 and appears to be solid.

That 2.29 release, though, is the last one that features alternative-hash work in any significant way; there has been no mention of this work in the project's release notes since a fix showed up in 2.31, released in March 2021. The 2.29 work marked SHA-256 as experimental and warned that "there is no interoperability between SHA-1 and SHA-256 repositories yet". There was some work toward interoperability posted in 2020, but those patches do not appear to have ever been merged into the Git mainline.

In other words, work on supporting the use of a hash algorithm other than SHA-1 in Git appears to have ground to a halt. That recently led Stephen Smith to post a query about its status to the development list. This response from Ævar Arnfjörð Bjarmason is illuminating and, for those looking forward to full SHA-256 support, potentially discouraging:

I wouldn't recommend that anyone use it for anything serious at the moment, as far as I can tell the only users (if any) are currently (some) people work on git itself.

Bjarmason pointed out that there is still no interoperability between SHA-1 and SHA-256 repositories, and that none of the Git hosting providers appear to be supporting SHA-256. That support (or the lack thereof) matters; a repository that cannot be pushed to a Git forge will be essentially useless to many people. There is also the risk (which cannot really be made to go away) that the longer hashes used with SHA-256 may break tools developed outside of the Git project. The overall picture is one of a feature that is not yet ready for real-world use.

That said, it is worth noting that brian m. carlson, who has done the bulk of the hash-transition work so far, disagrees with Bjarmason's assessment. In his view, the only "defensible" reason to use SHA-1 at this point is interoperability with the Git forge providers. Otherwise, he said, SHA-1 is obsolete, and performance with SHA-256 can be "substantially faster". But he agrees that the needed interoperability does not exist, and nobody has said that it is coming anytime soon.

What has happened here looks, to an extent at least, like a story that has played out numerous times over the course of free-software history. A problem has been identified, and a great deal of core foundational work has been done to solve it. That solution appears to be well considered and solidly implemented. In a sense, the job is 90% done. All that is left is the hard work of making the transition to a new hash easy for users — what could be thought of as "the other 90%" of the job.

This sort of interface and compatibility development is hard and developers often do not find it particularly rewarding, so it tends to be neglected by our community. The Git project, one might argue, is especially prone to user-interface challenges, but the problem is wider than that. There are certain sorts of tasks that volunteers are often uninclined to pick up, and that companies may not feel the need to fund.

Given the threat that the SHA-1 hash poses, one might think that there would be a stronger incentive for somebody to support this work. But, as Bjarmason continued, that incentive is not actually all that strong. The project adopted the SHA-1DC variant of SHA-1 for the 2.13 release in 2017, which makes the project more robust against the known SHA-1 collision attacks, so there does not appear to be any sort of imminent threat of this type of attack against Git. Even if creating a collision were feasible for an attacker, Bjarmason pointed out, that is only the first step in the development of a successful attack. Finding a collision of any type is hard; finding one that is still working code, that has the functionality the attacker is after, and that looks reasonable to both humans and compilers is quite a bit harder — if it is possible at all.

So few people are losing sleep over the possibility that a Git repository could be deliberately corrupted by way of an SHA-1 hash collision anytime soon. The combination of a lack of urgency and little apparent interest in doing the work has seemingly brought the SHA-256 transition to a halt. Perhaps that is how the situation will remain until another SHA-1 weakness turns up and brings attention back to the situation. But, as Randall Becker pointed out, there is a cost to this inaction:

Adding my own 0.02, what some of us are facing is resistance to adopting git in our or client organizations because of the presence of SHA-1. There are organizations where SHA-1 is blanket banned across the board - regardless of its use. [...] Getting around this blanket ban is a serious amount of work and I have very recently seen customers move to older much less functional (or useful) VCS platforms just because of SHA-1.

It is a bit of a stretch to imagine that remaining with SHA-1 will threaten Git's dominance in the near future. But it could, perhaps, give a toehold to a competitor that would lead to trouble in the longer term, especially if the security of SHA-1 crumbles further.

Given that, one might think that companies that are dependent on Git would see some value in solving this particular problem. Many companies use Git, but some have based their entire business model around it. The latter companies have benefited greatly from the community's investment in Git, and they have a lot to lose if Git loses its prominence. It would seem to make sense for one or more of these companies to make the relatively small investment needed to push this transition to completion; that would be good for the community — and for their own future as well.



Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 14:19 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (18 responses)

Wouldn't the people that care about security just sign the commits?

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 14:35 UTC (Thu) by dtlin (subscriber, #36537) [Link] (16 responses)

Git signs the commit or tag object, not the whole file tree. So if you don't trust SHA-1, GPG doesn't add any security - the file content under a signed commit or tag could still be replaced.

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 19:10 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (2 responses)

Furthermore, all currently feasible SHA-1 attacks are collision attacks - i.e. attacks in which the same person creates both the "good" commit and the "bad" commit. Signatures are primarily designed to deal with the case where the "good" and "bad" commits are created by different people (i.e. they are used to prove that a given commit was authored by the person identified in its metadata, and not an imposter). You can also use signatures to prove that some third party has reviewed the commit and believes it to be non-malicious, but to my understanding, that is not the typical use case (and, as you say, it is defeated by the collision attack anyway).

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 0:07 UTC (Fri) by wahern (subscriber, #37304) [Link] (1 responses)

Chosen-prefix attacks have already been demonstrated.[1] SHA-1 is as dead as MD-5. Which is to say, not entirely, but nobody who cares about their reputation wants to be in the company of those quibbling about how it can still be used for this or that.

[1] https://eprint.iacr.org/2020/014.pdf

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 1:27 UTC (Fri) by bartoc (guest, #124262) [Link]

Well, md5 has a (somewhat computationally theoretical) full preimage attack, and its chosen-prefix attacks and general collision attack algorithms are somewhat faster than those for SHA-1. I kinda suspect it's cheaper to social engineer code into most git repos than it is to try and find a SHA-1 collision (not to mention simply finding and inserting the colliding object may not be enough to cause the modified code to spread).

It is about time to start switching over to sha-256, but of all the overdue migrations away from harmful stuff in the world SHA-1 git repos are .... not that harmful.

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 20:04 UTC (Thu) by walters (subscriber, #7396) [Link]

I started https://github.com/cgwalters/git-evtag before the sha1 breakage, I think it still makes sense. May try at some point to get it into git again.

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 3:18 UTC (Fri) by brasic (subscriber, #159230) [Link] (11 responses)

> Git signs the commit or tag object, not the whole file tree. […] the file content under a signed commit or tag could still be replaced.

Fortunately this is incorrect! Well, the second part is incorrect; the first is quite right but doesn’t imply what you think it does.

Every git commit is the root of a merkle tree, or as the kids call it, a “blockchain”. A git commit object id is the hash of a string which includes among other things the commit’s immediate parent object ids, and the commit’s root tree object id. Here is the canonical serialization of a commit, the input bytes passed to the hash function: https://github.com/git/git/blob/39c15e485575089eb77c769f6...

The tree oid is also the root of a separate merkle tree which recursively hashes the contents, file names and permissions of every file in the repo.

Since the input string which is hashed to produce the commit oid includes the tree oid, the contents of every file in that commit and all prior ones are part of the id and any change to any file will produce an entirely different object id.

The actual string which is signed is exactly the canonical serialization above. Then the sig is added to a header and the object id is computed (now including the signature as a hash component).

So you’re quite right that only the commit is signed. But because of the magic of git, signing a commit is equivalent to signing the whole tree and all of history!
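The chaining described above can be sketched in a few lines. This is a simplified model, not Git's implementation: it handles only a flat directory of regular files (mode 100644), uses SHA-1, and the author line is a placeholder invented for the example. The helper names are hypothetical.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> bytes:
    """Hash an object as '<kind> <length>\\0<body>' and return the raw digest."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(files: dict) -> bytes:
    """ID of a flat tree mapping file name (str) to content (bytes)."""
    entries = sorted((name.encode("utf-8"), data) for name, data in files.items())
    # Each entry: mode, space, name, NUL, then the blob's raw (binary) ID.
    body = b"".join(
        b"100644 " + name + b"\0" + object_id(b"blob", data)
        for name, data in entries
    )
    return object_id(b"tree", body)


def commit_id(files: dict, parents: list, message: str) -> str:
    """Hex ID of a commit whose body names its tree and parent commits."""
    who = "A U Thor <author@example.com> 0 +0000"  # placeholder identity
    lines = ["tree " + tree_id(files).hex()]
    lines += ["parent " + p for p in parents]
    lines += ["author " + who, "committer " + who, "", message]
    return object_id(b"commit", ("\n".join(lines) + "\n").encode("utf-8")).hex()


if __name__ == "__main__":
    first = commit_id({"README": b"hi\n"}, [], "initial")
    second = commit_id({"README": b"hi there\n"}, [first], "edit README")
    print(first, second, sep="\n")
```

Changing a single byte in any file changes the blob ID, hence the tree ID, hence the commit ID, and every descendant commit that names it as a parent. The same structure also shows the limit of the argument: everything rests on the hash function, so a colliding blob would leave every ID above it unchanged.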

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 7:22 UTC (Fri) by azumanga (subscriber, #90158) [Link]

I think there is (to me) a misunderstanding of "whole tree".

The problem is, while yes the hash "represents" the whole tree, if SHA1 is broken then signing a hash, whether for a single commit or for the whole tree, is in practice useless.

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 14:17 UTC (Fri) by angelsl (subscriber, #144646) [Link]

Yes, so you could replace the contents of a file (blob object), or the tree object, or the commit object itself, by finding a hash collision.

Whatever happened to SHA-256 support in Git?

Posted Jun 25, 2022 4:46 UTC (Sat) by alison (subscriber, #63752) [Link]

> Every git commit is the root of a merkle tree, or as the kids call it, a “blockchain”. A git
> commit object id is the hash of a string which includes among other things the commit’s
> immediate parent object ids, and the commit’s root tree object id.

Is the algorithm used by TPMs also a Merkle tree?

Whatever happened to SHA-256 support in Git?

Posted Jun 27, 2022 14:02 UTC (Mon) by KaiRo (subscriber, #1987) [Link] (7 responses)

Please do all of us a favor and don't use the word "blockchain" when you obviously don't know what makes something one. While blockchains usually use Merkle trees, that doesn't mean a Merkle tree is a blockchain; it's (usually) part of one. There is enough FUD and scamming around this area, it's neither useful to you to join into that muddy crowd nor is it useful to those of us who are trying to make decent and honest use out of the social engineering and technology combination that actual blockchains represent.

Whatever happened to SHA-256 support in Git?

Posted Jul 5, 2022 22:17 UTC (Tue) by koh (subscriber, #101482) [Link] (6 responses)

Please do all of us a favour and enlighten us as to what technically (usually) are those fundamental differences you're hinting at.

Whatever happened to SHA-256 support in Git?

Posted Jul 6, 2022 3:18 UTC (Wed) by nybble41 (subscriber, #55106) [Link] (5 responses)

The key element which sets a blockchain apart from an arbitrary Merkle tree (or DAG) is the Byzantine consensus system which ensures that there is only *one* dominant chain in the distributed system. Git repos are organized into one or more Merkle trees, but it's a federated system where each node is a silo with its own data, not a distributed one where all the nodes (eventually) come to share a single Merkle root with new "blocks" being added to a common "chain".

Whatever happened to SHA-256 support in Git?

Posted Jul 6, 2022 14:04 UTC (Wed) by geert (subscriber, #98403) [Link] (3 responses)

Sounds like Linux kernel development, where (ideally) all forks end up being merged into Linus' tree, eventually...

See James Bottomley's closing keynote at OLS2007 (https://www.linux.com/news/ols-closes-keynote/).

Whatever happened to SHA-256 support in Git?

Posted Jul 6, 2022 14:34 UTC (Wed) by farnz (subscriber, #17727) [Link] (2 responses)

The distinction is that in the Linux development model, Linus is a single point of failure - the consensus algorithm in the federated git tree world is "we trust Linus". In a blockchain, the consensus algorithm will choose a tree from the set in the federation such that no individual tree in the federated set is "more trusted" than others - if Linus were to go rogue or go on vacation, a blockchain development model would choose someone else's tree as "mainline Linux" automatically.

This is the key to the blockchain's difference from other Merkle trees - in a blockchain, consensus is formed automatically and does not depend on humans choosing trusted people, while in most Merkle trees, the consensus decision depends on humans making trust decisions.

It's mathematically neat that we can have consensus without needing trust, but it's not necessarily a practical result.

Whatever happened to SHA-256 support in Git?

Posted Jul 6, 2022 15:21 UTC (Wed) by excors (subscriber, #95769) [Link] (1 responses)

> in a blockchain, consensus is formed automatically and does not depend on humans choosing trusted people

...except when, say, the core developers can't agree on a technical change for the project and so they fork the blockchain and now you've got two versions that both claim to be authoritative, and they have to fight it out on social media to convince users/miners/exchanges/etc to support their side. Maybe the mathematical model is trustless but that's because it's modeling an unrealistically abstract version of the problem - the practical implementation is never trustless, it's just obscuring who you're having to trust. (And as demonstrated over and over again, users often end up having to trust people who really don't deserve that trust.)

Whatever happened to SHA-256 support in Git?

Posted Jul 6, 2022 15:46 UTC (Wed) by farnz (subscriber, #17727) [Link]

To be fair, that's an issue because you're choosing between two different blockchains, each of which does the trust thing automatically.

And that sort of problem is what I meant by saying that it's mathematically neat, but not necessarily practical - being able to form a consensus without trust is cool, but there are other dimensions involved beyond simply forming a consensus, such as which blockchain to trust.

Whatever happened to SHA-256 support in Git?

Posted Jul 7, 2022 11:06 UTC (Thu) by koh (subscriber, #101482) [Link]

If I understand correctly: Merkle DAG + automated choice of the "mainline" branch to add nodes to + automated distribution of all nodes/commits in a network.

Not sure about the "all" in the last part, but that helped, many thanks!

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 14:42 UTC (Thu) by smoogen (subscriber, #97) [Link]

Then you need to build into your checkout tooling to check that the signatures are actually valid. That means knowing what this 3rd party person's key is, how valid it is to that project, etc. I expect that like most of the usage of 'signatures'... it would be dictated but turned off in any build system because it is so hard to keep working. Most developers would rather have someone giving them hacked code than deal with GPG signature problems on a checkout.

Which is why saying 'you can't use SHA-1' is an easier dictum from a security group's compliance perspective. You know that signatures etc. would be better, but you also know that within 2 minutes of saying that using it would be ok, they would be turned off in the name of 'get that build out the door'.

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 14:43 UTC (Thu) by martin.langhoff (guest, #61417) [Link] (18 responses)

The git ecosystem is vast. This is both needed, and something that'll break all sorts of stuff.

Which reminds me... how is that IPv6 transition going? :-)

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 19:03 UTC (Thu) by Sesse (subscriber, #53779) [Link] (17 responses)

Around 40% of end users (desktop or mobile) support IPv6. https://www.google.com/intl/en/ipv6/statistics.html

If you are an ISP and enable IPv6 in your network, you can expect to see more IPv6 traffic than IPv4 traffic on average.

IPv6

Posted Jun 23, 2022 19:18 UTC (Thu) by corbet (editor, #1) [Link] (12 responses)

As a highly precise and rigorous experiment that surely generalizes to the net as a whole, I did a couple of greps out of the LWN server log and found that just under 20% of our hits come from IPv6 addresses.

IPv6

Posted Jun 23, 2022 19:21 UTC (Thu) by Sesse (subscriber, #53779) [Link]

Note that if your IPv6 connectivity is significantly slower than your IPv4 connectivity (on average), clients will generally prefer IPv4 (they send SYN packets for both, and let them race, with a slight preference for IPv6).

IPv6

Posted Jun 23, 2022 20:32 UTC (Thu) by jem (subscriber, #24231) [Link] (8 responses)

I suspect IPv6 net traffic is skewed towards connections over the mobile network, with the traditional DSL connections still being IPv4-only. Maybe lwn.net readers are predominantly using computers in a traditional (home) office setting? Company networks also typically don't support IPv6, which can be seen in the graphs published by Google as spikes during the weekends.

IPv6

Posted Jun 24, 2022 1:41 UTC (Fri) by bartoc (guest, #124262) [Link] (5 responses)

IME that's not so true anymore. You will probably see more usage of stuff like NAT64 and 464xlat in mobile clients though (honestly, such schemes are not all that useful in the real world). Many, if not most wired network connections in the states support v6 native, and I don't really see why that wouldn't keep growing, nobody actually likes having to deal with cgnat, including the ISPs.

Traditional DSL connections may still use it, because most DSL infrastructure is somewhat old and not well maintained, but cable and fiber ISPs have been OK about upgrading people.

The big cloud hosting providers don't tend to support v6 for internal routing yet, which is a bit unfortunate because "just using native ipv6" would meet a lot of the container networking requirements without having to administer a BGP server (I kinda can't believe that some of these container runtimes have caught on with that requirement, it's quite heroic in a way).

IPv6

Posted Jun 24, 2022 7:28 UTC (Fri) by jem (subscriber, #24231) [Link] (4 responses)

>IME that's not so true anymore. You will probably see more usage of stuff like NAT64 and 464xlat in mobile clients though (honestly, such schemes are not all that useful in the real world).

Mobile phones are dual stack, too, just like desktop/laptop computers. An operator can choose to use NAT64 to provide IPv4 connectivity from an IPv6-only handset, but that's their choice.

Mobile technology is newer and faster moving, old landlines are in the category "they work, so don't fix them".

>Many, if not most wired network connections in the states support v6 native, and I don't really see why that wouldn't keep growing, nobody actually likes having to deal with cgnat, including the ISPs.

The ISPs will still have to support IPv4 some way or another for a long time. In practice, this means some sort of NAT.

I guess most of LWN's subscribers are from the States, so it's fair to look at the numbers from a US perspective. The percentage Google reports for the US (51%) is above the average (~40%). The top three countries are France (70%), India (64%), and Germany (64%). Then there are countries with huge populations, and even a whole continent (Africa) which are seriously lagging behind.

IPv6

Posted Jun 25, 2022 18:51 UTC (Sat) by Wol (subscriber, #4433) [Link] (3 responses)

> Mobile technology is newer and faster moving, old landlines are in the category "they work, so don't fix them".

In the UK, "old landlines" will soon be history. Our POTS here has already been upgraded to VOIP - my old POTS phone is now plugged into my broadband router and works fine (for a somewhat jaded definition of "fine" :-(

Dunno about other countries in Europe, though ...

Cheers,
Wol

IPv6

Posted Jun 25, 2022 20:05 UTC (Sat) by Sesse (subscriber, #53779) [Link] (2 responses)

Here (Norway), the copper network (POTS, ISDN, DSL) is simply left to die; a little of it is still left, but if it breaks, it won't be fixed. Nearly all voice is 2G/4G/5G (3G has largely been turned down). Data is DOCSIS (cable), FTTH or 4G/5G.

IPv6

Posted Jun 26, 2022 11:27 UTC (Sun) by jem (subscriber, #24231) [Link] (1 responses)

Deployment of IPv6 in Norway is lagging six years behind the global average, though. (Based on the aforementioned Google stats: https://www.google.com/intl/en/ipv6/statistics.html#tab=p...)

IPv6

Posted Jun 26, 2022 11:29 UTC (Sun) by Sesse (subscriber, #53779) [Link]

This is very much true, and it is largely due to the incumbent ISPs lagging. Most countries' status is usually very much driven by what key people in a few select ISPs choose to care about. :-/

IPv6

Posted Jun 24, 2022 17:04 UTC (Fri) by Lennie (subscriber, #49641) [Link] (1 responses)

Maybe lwn.net has a certain regional bias when it comes to IPv6 traffic.

IPv6

Posted Jun 27, 2022 19:25 UTC (Mon) by ceplm (subscriber, #41334) [Link]

I think the bias is that more readers are IT professionals sitting on old IPv4 networks who use IPv6 only as a back-stop.

IPv6

Posted Jun 25, 2022 4:41 UTC (Sat) by alison (subscriber, #63752) [Link]

LWN is the funniest website that I read regularly, mostly intentionally. Keep up the good work!

IPv6

Posted Jul 9, 2022 5:53 UTC (Sat) by oldtomas (guest, #72579) [Link]

Inspired by yours, I tried an equally precise and rigorous experiment. Context: bog standard (Debian Gnu-)Linux box. I moved a couple of weeks ago. In my old flat, I gave up on IPv6 (your bog standard DSL, one of the Big Providers around here). In my new flat (same setup, the other of the Big Providers, yes, we have more than one)... surprise:

tomas@trotzki:~$ ping lwn.net
PING lwn.net(prod3.lwn.net (2600:3c03::f03c:91ff:fe82:68b2)) 56 data bytes
64 bytes from prod3.lwn.net (2600:3c03::f03c:91ff:fe82:68b2): icmp_seq=2 ttl=56 time=102 ms
64 bytes from prod3.lwn.net (2600:3c03::f03c:91ff:fe82:68b2): icmp_seq=3 ttl=56 time=102 ms
...

So it seems it's slowly coming, not just for smartphones

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 19:24 UTC (Fri) by dvdeug (guest, #10998) [Link] (2 responses)

My boss insists on turning off IPv6 on all computers we install. I'm not a fan, but he's the boss, and he's had problems with it historically.

Whatever happened to SHA-256 support in Git?

Posted Jun 25, 2022 17:27 UTC (Sat) by jezuch (subscriber, #52988) [Link] (1 responses)

> problems with it historically

...and people still recommend against XFS because of a data-eating bug that was fixed in 2005 *sigh*

Whatever happened to SHA-256 support in Git?

Posted Jun 27, 2022 8:47 UTC (Mon) by taladar (subscriber, #68407) [Link]

And spout nonsense like "never change a running system" because that is what some old person told them in the 70s.

Whatever happened to SHA-256 support in Git?

Posted Jun 25, 2022 16:20 UTC (Sat) by farnz (subscriber, #17727) [Link]

One fun thing about that is that you can expect to see more IPv6 traffic by byte volume or packet count than IPv4, but not necessarily by connection count.

A thing that drives IPv6 adoption in mobile is that data-intensive services like Netflix and YouTube are IPv6-enabled - so by enabling IPv6 for your customers, you can use stateless routing to get that traffic off your backbone and onto the video provider network nearer the cell site, whereas for CGNAT (including NAT64 and 464XLAT here), you have the complexity of maintaining distributed state to handle.

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 14:55 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (6 responses)

Was there discussion of prefixing sha256 hashes with some constant prefix to know that they're not sha1? For example, all sha256 hashes are prefixed with `h`, so commit `h000000` is known to not be a sha1. I skip over `g` because `gdeadbeef` is already common to demarcate hashes in snapshot tarballs in various places.

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 18:21 UTC (Thu) by klossner (subscriber, #30046) [Link] (5 responses)

Isn't the length of the hash all you need to distinguish? SHA1 hashes are 40 characters long while SHA256 hashes are 64. (Which breaks any home-brew software that operates on git trees and hard-codes the 40-character width.)
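As a rough sketch of the length-based check suggested here, the following helper classifies a full hexadecimal object ID by its width. The function name is invented for the example; anything that is not exactly 40 or 64 hex digits (an abbreviated ID, for instance) is reported as undetermined rather than guessed at.

```python
from typing import Optional

HEX_DIGITS = set("0123456789abcdef")


def hash_algorithm(oid: str) -> Optional[str]:
    """Guess the hash algorithm from the length of a full hex object ID.

    Returns "sha1" for 40 digits, "sha256" for 64, and None when the
    length alone cannot decide (for example, an abbreviated ID).
    """
    oid = oid.lower()
    if not oid or not set(oid) <= HEX_DIGITS:
        raise ValueError("not a hexadecimal object ID: %r" % oid)
    return {40: "sha1", 64: "sha256"}.get(len(oid))


if __name__ == "__main__":
    print(hash_algorithm("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"))
    print(hash_algorithm("e69de29"))
```

The check only works on full-length IDs, and a tool that hard-codes a 40-character buffer or regular expression will still need changes to accept the longer form.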

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 18:25 UTC (Thu) by bluss (guest, #47454) [Link] (1 responses)

A lot of the tools (for example git log --graph) use abbreviated hashes.

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 20:26 UTC (Thu) by wtarreau (subscriber, #51152) [Link]

There's no problem with that at all, not more than there is any with abbreviated commits nowadays. Git would just need to try to resolve an abbreviated commit to both SHA1 and SHA2 and complain in case of multiple matches. Then for the 40-char ones (SHA1) it would just have to do the same. In practice you won't design SHA2 hashes that purposely collide with SHA1, and the probability that it happens by accident is as low as having two identical SHA1 commits by accident, i.e. so close to zero that it practically is for our entire civilization.
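The lookup rule proposed in this comment could look roughly like the sketch below. It is a toy model over in-memory lists of hex IDs, not how Git's object store works, and the function and exception names are hypothetical.

```python
class AmbiguousPrefix(LookupError):
    """Raised when an abbreviated ID matches more than one object."""


def resolve(prefix: str, sha1_ids: list, sha256_ids: list) -> str:
    """Resolve an abbreviated hex ID against both ID namespaces.

    Returns the single full ID starting with `prefix`, raises
    AmbiguousPrefix on multiple matches and KeyError on none.
    """
    prefix = prefix.lower()
    matches = [oid for oid in (*sha1_ids, *sha256_ids) if oid.startswith(prefix)]
    if not matches:
        raise KeyError("no object matches %r" % prefix)
    if len(matches) > 1:
        raise AmbiguousPrefix("%r matches %d objects" % (prefix, len(matches)))
    return matches[0]


if __name__ == "__main__":
    sha1_ids = ["ab12" + "0" * 36, "cd34" + "0" * 36]
    sha256_ids = ["ab99" + "0" * 60]
    print(resolve("ab1", sha1_ids, sha256_ids))   # unique match
    try:
        resolve("ab", sha1_ids, sha256_ids)       # matches one ID of each kind
    except AmbiguousPrefix as err:
        print("ambiguous:", err)
```

Under this rule a short prefix is treated the same way whichever hash produced the full ID; the caller simply has to supply more digits when the prefix is ambiguous.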

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 16:18 UTC (Fri) by smammy (subscriber, #120874) [Link] (2 responses)

There's also Multihash, for what it's worth.

Whatever happened to SHA-256 support in Git?

Posted Jun 25, 2022 17:46 UTC (Sat) by ms-tg (subscriber, #89231) [Link] (1 responses)

> There's also Multihash, for what it's worth.

How can we get this amplified? From my understanding, adopting multihash would go a long way to future-proofing git, as there would be a single “before multihash” case to account for, and then all future iterations would be signaling the encoding in-band with the ability to add future options cleanly? Wouldn’t it?

Whatever happened to SHA-256 support in Git?

Posted Jun 27, 2022 15:24 UTC (Mon) by smammy (subscriber, #120874) [Link]

Git people are so into shortened hashes that I doubt they'd go for a format that requires a four-digit prefix. Multihash has been discussed but obviously that never went anywhere.
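For readers who have not met Multihash, here is a minimal sketch of the self-describing format being discussed: a function code, a digest length, then the digest. It assumes the multicodec table values 0x11 for SHA-1 and 0x12 for SHA-256, and it relies on both the code and the length fitting into a single byte each, which sidesteps the format's variable-length integer encoding. The helper names are invented for the example.

```python
import hashlib

# Assumed multicodec codes for the two hash functions of interest.
CODES = {"sha1": 0x11, "sha256": 0x12}
NAMES = {code: name for name, code in CODES.items()}


def multihash_hex(data: bytes, algo: str) -> str:
    """Return hex of <code><digest length><digest> for `data`."""
    digest = hashlib.new(algo, data).digest()
    return (bytes([CODES[algo], len(digest)]) + digest).hex()


def parse_multihash(mh_hex: str) -> tuple:
    """Split a hex multihash into (algorithm name, hex digest)."""
    raw = bytes.fromhex(mh_hex)
    code, length, digest = raw[0], raw[1], raw[2:]
    if code not in NAMES or len(digest) != length:
        raise ValueError("unrecognized or malformed multihash")
    return NAMES[code], digest.hex()


if __name__ == "__main__":
    mh = multihash_hex(b"hello\n", "sha256")
    print(mh[:4], len(mh))        # the four-digit prefix and total length
    print(parse_multihash(mh)[0])
```

The four leading hex digits ("1114" for SHA-1, "1220" for SHA-256 under these assumptions) are the prefix referred to above: every ID, however abbreviated, would start with them.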

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 15:28 UTC (Thu) by zblaxell (subscriber, #26385) [Link] (5 responses)

> seen customers move to older much less functional (or useful) VCS platforms

Which VCS platforms both 1) don't use SHA1 and 2) don't introduce a ton of additional vulnerabilities compared to git?

Do these customers prefer to do the relentless auditing tasks to ensure the integrity of a centralized VCS? If the customers are doing it anyway, wouldn't that auditing also detect a successful SHA1 collision attack against a git repo?

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 16:02 UTC (Thu) by dullfire (guest, #111432) [Link] (1 responses)

While I don't know the details of any such corporation, it wouldn't surprise me if some organizations don't care about the security aspect, just that whatever solution they adopt isn't banned by a policy (presumably set by people who know better... but more likely people just blacklisting technical terms that they see bad reputations for).

Whatever happened to SHA-256 support in Git?

Posted Jun 26, 2022 3:16 UTC (Sun) by gdt (subscriber, #6284) [Link]

As a worked example, Australia's Information Security Manual states

Only hashing algorithms from the SHA-2 family are approved for use. When using SHA-2 for hashing, an output size of at least 224 bits is used, preferably SHA-384.

To use Git you cannot make any claim that hashing in Git contributes to addressing your organisation's threat model: say the threat of subversion of the repository. Then apply for an exception, arguing that Git's use of SHA-1 is out-of-scope as it is not implicated in any threat model. You may then be asked to show how the threat of subversion of the repository is countered, which could be GPG-signing each commit from a key only held on a trusted processor (eg, a Yubikey).

Of course this application for an exception may not be successful: not every organisation's security policy makers may have a deep technical understanding; not every application for an exception may fully address the threat model; and there may be an overarching policy of limiting exceptions in fields with large and widespread consequences, such as the supply chain threat from subversion of software builds.

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 17:13 UTC (Thu) by tlater (guest, #116684) [Link] (1 responses)

It's a regulation thing: https://csrc.nist.gov/Projects/Hash-Functions/NIST-Policy...

Various industries in the US require that you comply with those standards, and given how many companies at least work with US companies that means it tears through a lot of the world.

While it's a bit ridiculously broad to state you simply are not allowed to generate sha-1 hashes ever (one of the addendum documents makes this explicit), it does make sense from a policy perspective. Otherwise there's just no incentive to ever change, and some deeply rooted uses of sha-1 will be passed over and eventually found to be problematic.

If git can't adapt, in theory competitors should step up and through all that newfound industrial funding eventually become less of a mess. The policy makes sense, even if it in practice results in some pretty silly trade-offs in the short term.

Of course, companies should just spend the money to get that 10% of the work done, but well, not everybody lives in the open source world, and I imagine a lot of the people who decide where budget goes just understand git as yet another product, not a community project that they have the power to modify. I imagine they also look at the competitors and don't see the problems, especially given they likely migrated to git at some point in the past, so it's just regressing back to the state of 10 years ago, which isn't that long in the kind of industry that cares about regulation like this.

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 5:35 UTC (Fri) by wtarreau (subscriber, #51152) [Link]

The rule is not *that* drastic, it says:

"Federal agencies should stop using SHA-1 for generating digital signatures, generating time stamps and for other applications that require collision resistance. Federal agencies may use SHA-1 for the following applications: verifying old digital signatures and time stamps, generating and verifying hash-based message authentication codes (HMACs), key derivation functions (KDFs), and random bit/number generation. Further guidance on the use of SHA-1 is provided in SP 800-131A."

i.e. don't use it if you need security, but its other properties remain useful.

Whatever happened to SHA-256 support in Git?

Posted Jun 25, 2022 23:13 UTC (Sat) by salimma (subscriber, #34460) [Link]

I'm curious too. Mercurial also hasn't moved

https://www.mercurial-scm.org/wiki/SHA1TransitionPlan

Whatever happened to SHA-256 support in Git?

Posted Jun 23, 2022 19:48 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

The last time SHA-256 in git came up, I almost vomited from the clumsy format for hashes that they'd chosen. Hashes in SHA-1 are in binhex, why not instead just use a different alphabet to encode SHA-256 hashes?

And it's not hard to do. For example, instead of 0-9a-f use g-v. Or abuse the first letter of the hash as the version.
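A minimal sketch of the alternate-alphabet idea, assuming a plain one-to-one substitution of the sixteen hex digits with the letters g through v. The function names here are hypothetical, and this is not how Git actually encodes SHA-256 object names.

```python
import hashlib

HEX = "0123456789abcdef"
ALT = "ghijklmnopqrstuv"  # 16 letters, disjoint from the hex alphabet

TO_ALT = str.maketrans(HEX, ALT)
TO_HEX = str.maketrans(ALT, HEX)


def encode_sha256(data: bytes) -> str:
    """Hash data with SHA-256 and spell the digest in the g-v alphabet."""
    return hashlib.sha256(data).hexdigest().translate(TO_ALT)


def detect_algorithm(object_id: str) -> str:
    """Guess the hash algorithm purely from the alphabet the id uses."""
    if not object_id:
        raise ValueError("empty object id")
    if all(c in HEX for c in object_id):
        return "sha1"
    if all(c in ALT for c in object_id):
        return "sha256"
    raise ValueError("mixed or unknown alphabet: " + object_id)
```

Because the two alphabets do not overlap, even a heavily abbreviated id would reveal which algorithm produced it.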

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 17:54 UTC (Fri) by khim (subscriber, #9252) [Link]

You can even do both at the same time: “h” would mean “normal SHA256-bit hash in 65 letters” and “stuvwxyz” would start “40-letters long SHA256-bit hash” (if you take ASCII and exclude 37 “really bad” symbols, e.g. the first 32, 127, “ ”, “%”, “$” and “.”, then you can encode 13 bits in two characters).
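The arithmetic behind the "13 bits in two characters" figure can be checked with a short sketch. The symbol counts come from the comment; the helper name is hypothetical.

```python
import math

ASCII_SYMBOLS = 128
EXCLUDED = 37  # count quoted in the comment: 32 controls, DEL, and four others


def bits_per_group(alphabet_size: int, group_len: int) -> int:
    """Whole bits that fit in group_len characters drawn from the alphabet."""
    return (alphabet_size ** group_len).bit_length() - 1


usable = ASCII_SYMBOLS - EXCLUDED          # 91 usable symbols
pair_bits = bits_per_group(usable, 2)      # 91**2 = 8281 >= 2**13 = 8192
pairs_needed = math.ceil(256 / pair_bits)  # pairs needed to cover 256 bits
chars_needed = 2 * pairs_needed            # characters for the whole digest
```

Under these assumptions a 256-bit digest fits in 40 characters, which matches the 40-letter space the comment refers to.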

This way you would even have an option for these old tools where 40-letters space is reserved for hash.

This should not be a default because at some point even longer hash would be needed, most likely.

Whatever happened to SHA-256 support in Git?

Posted Jun 30, 2022 10:35 UTC (Thu) by Karellen (subscriber, #67644) [Link]

My first thought was to prefix hashes with the hash type and a separator, e.g. "sha256:0123...". That way you could add new hashes whenever the devs wanted, by just adding the algorithm and assigning a new prefix. Any hash already stored inside a git repo that is not preceded by a type is automatically assumed to be sha1. A commit with an sha256 hash could have parents with either sha256 or sha1 hashes, or even both!
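As a rough sketch of how such a prefixed scheme could be parsed, assuming the "algorithm:digest" syntax suggested above and treating any bare id as SHA-1. The names are hypothetical; this is not Git's actual format.

```python
from typing import Tuple

# Hex digest length for each algorithm the parser knows about.
KNOWN_ALGORITHMS = {"sha1": 40, "sha256": 64}
HEX_DIGITS = set("0123456789abcdef")


def parse_object_id(text: str) -> Tuple[str, str]:
    """Split 'algo:digest' into its parts; a bare digest is taken as sha1."""
    algo, sep, digest = text.partition(":")
    if not sep:
        algo, digest = "sha1", text
    if algo not in KNOWN_ALGORITHMS:
        raise ValueError("unknown hash algorithm: " + algo)
    if len(digest) != KNOWN_ALGORITHMS[algo] or not set(digest) <= HEX_DIGITS:
        raise ValueError("malformed digest for " + algo)
    return algo, digest
```

Supporting a new algorithm would then only mean adding one entry to the table, which is the flexibility the comment is after.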

The way they decided to go about things (last I checked) did seem a bit constraining.

Compliance nonsense

Posted Jun 24, 2022 2:31 UTC (Fri) by roguelazer (subscriber, #101286) [Link] (4 responses)

I know that I've had to have arduous discussions with several external auditors over why it's "okay" that git uses SHA-1 and why we're not switching to some commercial VCS that doesn't (obviously) use SHA-1; I imagine this is happening to corporations all over the world which are subject to various regulatory regimes with a "no SHA1 or MD5" checklist entry. I'm surprised no big corporation has dumped a bunch of funding into this project to satisfy such auditors.

Compliance nonsense

Posted Jun 24, 2022 7:51 UTC (Fri) by epa (subscriber, #39769) [Link] (3 responses)

Or make a quick and dirty fork of git which only uses SHA-256 and isn't backwards compatible or interoperable.

Compliance nonsense

Posted Jun 26, 2022 14:49 UTC (Sun) by jthill (subscriber, #56558) [Link] (2 responses)

Except that exists already, it's just Git. `git init --object-format=sha256` and your repo uses sha256 only and can't talk to sha1 repos. I'd be curious how easily the web frontends' private-server options can be made to use the new object format if they don't have to talk to any poor left-behind sha1 repos either.

As a side note, afaik all known or suspected collision-generating methods require some place to hide gobs of carefully-chosen noise bits in both colliding texts. PDF is a binary format and can hide arbitrary noise. Source code cannot. There is no possibility that anyone could get an engineered source file past even the most cursory code review. The garbage would appear the first time anyone so much as glanced at the diffs.

Compliance nonsense

Posted Jun 28, 2022 11:49 UTC (Tue) by cortana (subscriber, #24596) [Link] (1 responses)

Eagerly awaiting a tool that produces collisions by adding commented ascii art to source code... :)

Compliance nonsense

Posted Jul 15, 2022 15:31 UTC (Fri) by epa (subscriber, #39769) [Link]

That's an interesting point. If the hash function is known to be weak (or you want to hedge against it becoming broken in future) then you could add an extra defence with a 'normalized hash'. If the file looks like C source code then strip out the comments, normalize the whitespace, and perhaps rename all the variables that aren't visible from outside the compilation unit. Then both the original content and the normalized one are hashed separately and both of these go into the final commit id.
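A very rough sketch of that 'normalized hash' idea, assuming SHA-256 for every step and a naive regex that strips C comments. It ignores comment markers inside string literals and skips the variable-renaming step entirely; all names are hypothetical.

```python
import hashlib
import re

# Naive matcher for /* block */ and // line comments (ignores string literals).
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def normalize_c_source(source: str) -> str:
    """Drop comments and collapse every run of whitespace to one space."""
    without_comments = _COMMENT.sub(" ", source)
    return " ".join(without_comments.split())


def dual_hash(source: str) -> str:
    """Combine the hash of the raw text with the hash of its normalized form."""
    raw = hashlib.sha256(source.encode("utf-8")).digest()
    norm = hashlib.sha256(normalize_c_source(source).encode("utf-8")).digest()
    return hashlib.sha256(raw + norm).hexdigest()
```

An attacker hiding collision noise in a comment would then have to collide the raw hash without any help from the normalized one, since the comment never reaches it.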

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 10:20 UTC (Fri) by k3ninho (subscriber, #50375) [Link] (1 responses)

>What has happened here looks, to an extent at least, like a story that has played out numerous times over the course of free-software history. A problem has been identified, and a great deal of core foundational work has been done to solve it. That solution appears to be well considered and solidly implemented. In a sense, the job is 90% done. All that is left is the hard work of making the transition to a new hash easy for users — what could be thought of as "the other 90%" of the job.

It used to be 80:20, and I thought we learned to improve estimates based on past evidence, not get worse at them.

I've found myself saying 'changing a computer system changes how people work' quite a bit recently, and it doesn't seem unfair to note that here there's a change to the system that doesn't have corresponding effort to change the way users use git.

Part of the 'scratch your own itch' of the free software and open source community is that people adapt as part of adopting updated editions of the software they're involved in; part of the 'being in community' involves support and training to help other users out. While a code change might be in place and deemed 'done', the adoption and migration phases are not.

Notably with git, is there a need or any benefit to recomputing the history of a tree with SHA256 hashes, like some kind of Export-Transform-Load (ETL) task? Who would you trust to publish the first trees or to attest they've replicated the work?

K3n.

Whatever happened to SHA-256 support in Git?

Posted Jun 24, 2022 12:15 UTC (Fri) by dbnichol (subscriber, #39622) [Link]

This part of the article is what struck me, too. Git has been around for nearly 20 years now. There are vast amounts of existing git repos with sha1 identifiers in them.

I'd say the project is at best 50% done if there's no interoperability with sha1 repos. Even if you switched git to default to sha256 on new repos and convinced all the major hosting providers to rewrite the history on all their repos to sha256 today, it would be years of pain before that trickled down through all the repos in the wild.

Unless there's compatibility with sha1 repos and a nearly automatic way to rewrite existing repos to sha256 in a compatible way, then it's essentially unusable. That seems like just as big a problem if not bigger than making git capable of using a different hashing algorithm.

At this point...

Posted Jun 26, 2022 18:34 UTC (Sun) by jd (guest, #26381) [Link] (3 responses)

...Switch to SHA-3. It won't, apparently, slow adoption at all and, at least, will still be secure by the time everyone uses it.

At this point...

Posted Jun 27, 2022 8:56 UTC (Mon) by kilobyte (subscriber, #108024) [Link] (1 responses)

BLAKE3 instead? Much faster, an earlier version of it was a SHA-3 finalist, there's no risk the NSA picked an algorithm they know how to break (there were some irregularities during the competition), can be arbitrarily parallelized.

At this point...

Posted Feb 14, 2024 7:57 UTC (Wed) by JeffBai (guest, #103577) [Link]

(Very late reply, but!) https://git-scm.com/docs/hash-function-transition/ mentions that blake2sp-256 was a contender. I have no interest in looking through the big mail thread to find out what happened to it. Could be something about OpenMP.

Plan is to make it easy to transition.

Posted Jun 30, 2022 13:53 UTC (Thu) by gmatht (subscriber, #58961) [Link]

As jthill already mentioned, we can already do `git init --object-format=sha256`. The difficulty is making it easy to transition to a new hash. Once it is easy to transition to a new hash, there is less need to pick a future-proof hash, and in the meanwhile sha256 has more widespread hardware acceleration. See: https://stackoverflow.com/questions/60087759/git-is-movin...

Whatever happened to SHA-256 support in Git?

Posted Dec 31, 2022 7:29 UTC (Sat) by luto (guest, #39314) [Link]

I don’t understand why interoperability can’t exist. Imagine a hybrid repository: objects can be hashed with any algorithm (SHA-1 or SHA-256). Objects hashed with SHA-256 can refer to any object, but objects hashed with SHA-1 can only refer to other objects hashed with SHA-1. You can also add a rule, per-repo, that commits claiming to be after a certain date or (transitively) referring to commits past a certain date can’t refer to trees that (transitively) refer to SHA-1 objects.

This means that conversion to SHA-256 is more or less an all-in affair. You start adding any SHA-256 objects, and you very quickly can’t add any new SHA-1 objects. But you could convert.
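The reference rule in this proposal can be sketched as a simple check over a toy object store. This is only an illustration of the idea, not anything Git implements; the `Obj` type and the store layout are hypothetical, and the date-based rule is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Obj:
    algo: str                                      # "sha1" or "sha256"
    refs: List[str] = field(default_factory=list)  # ids of referenced objects


def find_violations(store: Dict[str, Obj]) -> List[str]:
    """List every SHA-1 object that points at a non-SHA-1 object."""
    violations = []
    for oid, obj in store.items():
        if obj.algo != "sha1":
            continue  # SHA-256 objects may refer to anything
        for ref in obj.refs:
            if store[ref].algo != "sha1":
                violations.append(oid + " -> " + ref)
    return violations
```

Once a branch tip is a SHA-256 commit, any new SHA-1 commit built on top of it would fail this check, which is why the comment calls the conversion more or less all-in.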

Whatever happened to SHA-256 support in Git?

Posted Oct 4, 2023 2:37 UTC (Wed) by xnox (subscriber, #63320) [Link] (1 responses)

Is conversion from sha1 to sha256 git format reproducible?

Whatever happened to SHA-256 support in Git?

Posted Oct 4, 2023 13:44 UTC (Wed) by geert (subscriber, #98403) [Link]

Yes it is. Both hashes are (eventually[*]) made from the same input.

[*] For blobs (files), this is obvious, as the hash is calculated from the file contents plus some file metadata.
For all other objects (trees, commits), the hash is calculated from tree or commit info plus hashes calculated before.
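For blobs this is easy to demonstrate. The sketch below computes a blob's object id from a "blob <length>" header, a NUL byte, and the file contents, under either hash; the function name is hypothetical. Trees and commits embed other object ids, so converting them also means rewriting those embedded ids first, which this sketch does not attempt.

```python
import hashlib


def git_blob_id(content: bytes, algo: str = "sha1") -> str:
    """Hash a blob the way Git does: 'blob <length>\\0' followed by the contents."""
    header = b"blob %d\x00" % len(content)
    return hashlib.new(algo, header + content).hexdigest()


# The empty blob has a well-known SHA-1 object id.
empty_sha1 = git_blob_id(b"")  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
empty_sha256 = git_blob_id(b"", "sha256")
```

Since both ids are derived from the same bytes, anyone repeating the conversion should arrive at the same SHA-256 id.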


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds