Announcing the first SHA-1 collision [LWN.net]

Announcing the first SHA1 collision

Posted Feb 23, 2017 15:26 UTC (Thu) by xav (guest, #18536) [Link] (15 responses)

Is there a consequence for git (which uses SHA-1 for its objects) ? I seem to remember it's not so harmful, but still ...

Announcing the first SHA1 collision

Posted Feb 23, 2017 16:23 UTC (Thu) by lamawithonel (subscriber, #86149) [Link] (7 responses)

shattered.io says this:

How is GIT affected?

GIT strongly relies on SHA-1 for the identification and integrity checking of all file objects and commits. It is essentially possible to create two GIT repositories with the same head commit hash and different contents, say a benign source code and a backdoored one. An attacker could potentially selectively serve either repository to targeted users. This will require attackers to compute their own collision.

Adding their two files to a repo in separate commits seemed to work OK, but running `git diff HEAD~1` has some interesting results.

diff --git a/shattered-2.pdf b/shattered-2.pdf
new file mode 100644
index 0000000..b621eec
Binary files /dev/null and b/shattered-2.pdf differ
lines 1-4/4 (END)

Announcing the first SHA1 collision

Posted Feb 23, 2017 16:33 UTC (Thu) by lamawithonel (subscriber, #86149) [Link]

Slash that last bit. Misread the output. Seems individual file collisions are probably OK in most cases.

Announcing the first SHA1 collision

Posted Feb 23, 2017 16:36 UTC (Thu) by georgm (subscriber, #19574) [Link] (1 responses)

The thing is that the files are not simply stored by their file's own SHA1, but are prefixed with blob and size, but this is fakable with this commit.

If I understand it correctly, you could craft two commits with same SHA-1, but manipulate the tree this commit represents with totally different content.

Announcing the first SHA1 collision

Posted Feb 23, 2017 16:43 UTC (Thu) by georgm (subscriber, #19574) [Link]

simple examle:

1. add a file "foo" with content "test" to a new git repo

2. git cat-file -p HEAD
tree 831915b1b2a9cc94f7ea7e14f4de9e00950ccc28
...

3. git cat-file -p 831915b1b2a9cc94f7ea7e14f4de9e00950ccc28
100644 blob 9daeafb9864cf43055ae93beb0afd6c7d144bfa4 foo

4. echo -e "blob 5\0test" | sha1sum
9daeafb9864cf43055ae93beb0afd6c7d144bfa4

Announcing the first SHA1 collision

Posted Feb 24, 2017 7:19 UTC (Fri) by alexl (subscriber, #19068) [Link] (3 responses)

It should be noted that git doesn't even verify the sha1 on pull by default.
See e.g.:
https://groups.google.com/forum/#!msg/binary-transparency...

So, unless you enable transfer.fsckObject or manually run git fsck certain of the attacks described in these threads are already possible.

Announcing the first SHA1 collision

Posted Feb 24, 2017 18:04 UTC (Fri) by hkario (subscriber, #94864) [Link] (1 responses)

what if the commit is signed?

Announcing the first SHA1 collision

Posted Feb 24, 2017 20:54 UTC (Fri) by flussence (guest, #85566) [Link]

That's another thing git doesn't check by default, so it won't help.

Malicilously replacing git objects

Posted Feb 25, 2017 13:10 UTC (Sat) by DigitalBrains (subscriber, #60188) [Link]

> It should be noted that git doesn't even verify the sha1 on pull by default.

That article links to Debian bug 813157, where the last message seems to claim the contrary if I read it right:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=813157#39

IIUC, SHA-1 checksums for objects aren't even transferred with a pull, let alone that they can be falsified.

Announcing the first SHA1 collision

Posted Feb 23, 2017 16:24 UTC (Thu) by anselm (subscriber, #2796) [Link] (5 responses)

With these collision attacks, the attacker comes up with two separate new documents that are constructed such that they have the same hash value. This is a lot easier than finding a new document that hashes to the specific given hash value of another (existing) document.

As far as git is concerned, an attacker would have to make two separate repositories that contain a collision (people would presumably then keep using the “benign” version but the attacker would be able to replace it with the “malicious” version even in the face of later changes). It is much more difficult to use this attack to introduce a collision into a pre-existing repository.

The lesson to take away from this as far as documents are concerned is to never sign anything without first tweaking it a bit (e.g., by adding a few random spaces here and there). That should mess up an attacker who has just spent an inordinate amount of CPU time coming up with another version of the document that has the same hash value as the one they gave you to sign.

Announcing the first SHA1 collision

Posted Feb 23, 2017 17:18 UTC (Thu) by georgm (subscriber, #19574) [Link]

Well. If you add binaries (like the PDFs), e.g. via pull request, you could add a forged PDF to a repository and then replace its contents with the forged version. The two PDFs would't have the same SHA1, but the same "git blob SHA1".

Noise must be early

Posted Feb 23, 2017 21:50 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (3 responses)

Note that _appending_ noise at the end doesn't help you, the bad guys will get the same results by adding the same noise to their imposter document because of how the MD-style hashing works, your noise must appear as early as possible in the document, before a collision has occurred. This (adding noise before signing) is the reason why you'll notice any SSL certificates you've bought in the last few years have gibberish as a "serial number". The serial number is one of the few signed elements of an SSL certificate which appears before the elements the subscriber will control like their name. Practical collision attacks for MD-style hashes (and SHA-1 is such a hash) revolve around inputs that force the internal state of the hash to have desirable properties, which you can't do if somebody gets to shove gibberish through the hash before you can get there.

This is intended as defence in depth against attacks in which the certificate applied for is one half of a collision (the other half being a target that the CA would not willingly sign, such as a CA:TRUE certificate or a certificate for a valuable name the attacker doesn't control)

Last time this was done (as proof of concept, not by actual bad guys) for MD5, the serial numbers from at least one major public CA were fairly predictable in sequence, allowing the attack team to conclude they should be able to purchase certificate #643015 in a week's time. So they could use that week to design a document for certificate #643015, produce a collision imposter for that document, and then as the appointed moment approached, "run up the score" by purchasing certificates #643012, #643013, #643014 and then purchase #643015, snip the signature off and apply it to their imposter document.

(A natural reaction from the CAs was that they could just revoke certificate #643015 after detecting this. Unfortunately, revocation in the X.509 PKI design applies to a serial number, so that doesn't revoke the imposter, which has a different serial number ...)

Noise must be early

Posted Feb 24, 2017 0:42 UTC (Fri) by anselm (subscriber, #2796) [Link] (2 responses)

Practical collision attacks for MD-style hashes (and SHA-1 is such a hash) revolve around inputs that force the internal state of the hash to have desirable properties, which you can't do if somebody gets to shove gibberish through the hash before you can get there.

And this is why we now have SHA-3, which is not susceptible to the “append stuff to a collision and still have a collision” artefact.

Noise must be early

Posted Feb 27, 2017 17:36 UTC (Mon) by nevyn (guest, #33129) [Link] (1 responses)

How is this possible, my understanding is that these attacks trick the hash function to be in the same state for both inputs at a certain point. Same state + same input = same result, no?

Noise must be early

Posted Feb 27, 2017 18:34 UTC (Mon) by excors (subscriber, #95769) [Link]

As far as I can see, it doesn't quite have the property that anselm claimed.

With SHA-1, SHA-2, etc, the output of the hash function is a copy of the state. Knowing the hash means you know the state and can do a length extension attack - given H(X), but unknown X, you can trivially compute H(pad(X) || Y) for any Y. (That's quite bad if X is a secret and you're using prepend-X-then-hash as a signature. You need something like HMAC to prevent that attack.)

If you have two different inputs with the same hash, you know they have the same state, so you can append the same blocks onto both inputs and they will continue colliding.

With SHA-3 the state is 1600 bits, and the output is some function that maps the state onto a smaller number of bits. Knowing the hash doesn't let you derive the whole state, so you can't do the length extension attack, and there's no need for HMAC. That's a useful property of SHA-3.

Two different inputs with the same 256-bit hash may not have the same 1600-bit internal state, so they may no longer collide after you append blocks to both of them. But if you construct two different inputs that lead to an identical internal state then you could append blocks and continue colliding, so it's still susceptible to that problem.

Announcing the first SHA1 collision

Posted Feb 23, 2017 21:25 UTC (Thu) by joey (guest, #328) [Link]

Here's a thread on the git mailing list about it http://www.spinics.net/lists/git/msg296649.html

Summary: It doesn't immediately break git, because the attack does not allow generating two git commits with the same sha1 that point to two different trees of files. This is a single-prefix attack; if a chosen-prefix attack later becomes available then that would be possible, which would break git rather badly.

This attack may result in some mitigations being added to git. Or even, in my wildest dreams, git getting the hash parameterized so repositories can be upgraded to sha3..

Announcing the first SHA1 collision

Posted Feb 23, 2017 16:22 UTC (Thu) by giggls (subscriber, #48434) [Link] (6 responses)

Would it be possible to combine say SHA-1 and MD5?

I seriously doubt that it is very likely (maybe it is even possible to compute this probability) to have matching SHA-1 and MD5 sums.

Announcing the first SHA1 collision

Posted Feb 23, 2017 17:33 UTC (Thu) by hkario (subscriber, #94864) [Link] (1 responses)

Combining MD5 with SHA-1 gives you the security of the stronger function only, not the sum or multiplication of the functions:
https://www.iacr.org/cryptodb/archive/2004/CRYPTO/1472/14...

IOW, security of MD5+SHA1 signatures is as strong (for practical purpouses) as SHA-1 alone

Announcing the first SHA1 collision

Posted Feb 24, 2017 9:07 UTC (Fri) by At0mik (guest, #90781) [Link]

I just rapidly went through the paper you suggested. Very interested reading.

However, if I understood correctly, the proof constructs on the fact that the 2 hash functions don't have any other attacks than the "birthday paradox".
In practice, SHA-1 and MD5, are vulnerable because researchers found sophisticated ways to construct a collision faster than the bruteforce.

In my limited understanding of the theory, SHA-1 + MD5, MAY still be as secure as a "completely secure" 160-bit hash function.

Seems like the google sha1 collision can be constructed in 2^50 time, which is much faster than 2^80 of a secure 160bit hash function.

Announcing the first SHA1 collision

Posted Feb 23, 2017 18:03 UTC (Thu) by flussence (guest, #85566) [Link] (3 responses)

Combining as in XOR'ing them together somehow would make matters worse.
Combining by concatenating would make almost zero difference (2⁶⁴ + 2⁸⁰ ≠ 2¹⁴⁴), and you'd just end up with a longer, slower output than using sha256 in the first place.

Announcing the first SHA1 collision

Posted Feb 27, 2017 13:33 UTC (Mon) by alankila (guest, #47141) [Link] (2 responses)

Isn't that a little bit shortsighted? Birthday paradox attacks are not practical at 160 bits, so nobody should care about them. But using 2 hash functions concatenated together places severe constraints for the manufactured hash collision, namely that it must succeed in colliding two hash functions simultaneously, which are completely different.

I'd imagine this would make searching for collisions that much harder, even if the other hash is much weaker, e.g. it may take 2^50 effort to discover a collision with SHA-1, but then it takes additional 2^x effort to make that collide with MD5 as well. I do not know what the literature says for the weakness of MD5 but if we for sake of argument assume that collision can be manufactured in 2^40 effort then together you need to do 2^90 worth of effort to collide the concatenation of SHA-1 and MD5, which can be considered secure again.

Announcing the first SHA1 collision

Posted Feb 27, 2017 14:02 UTC (Mon) by anselm (subscriber, #2796) [Link] (1 responses)

According to Joux, Multicollisions in Iterated Hash Functions. Application to Cascaded Constructions, using several MD5/SHA-1/2-style hash functions at the same time doesn't really buy you much as far as collision resistance is concerned. For example, using a single SHA-256 hash would be much more secure against collisions than concatenating an MD-5 hash and 128 bits of a SHA-1 hash to yield a “combined” 256-bit hash.

In addition, concatenating several hashes makes you much less secure against first-preimage attacks (e.g., hashed-password cracking).

Announcing the first SHA1 collision

Posted Feb 28, 2017 11:51 UTC (Tue) by alankila (guest, #47141) [Link]

I see. The attack described is basically stating that you can attack the weaker algorithm, e.g. MD5 in such a way that you can construct a long chain of collisions independently of each other, e.g. in this case you'd build 2^80 collisions of the hash by taking a message that is at least 80*128 bits long, and then one collision for each 128-bit block, yielding grand total of 80 pairs of blocks, all which collide to the same hash. Using those 80 pairs, you can generate 2^80 distinct messages, and have a high likelihood of generating a birthday paradox collision in the other hash as well.

However, my point was that the md5(M) || SHA1(M) combination would still be secure against this specific attack against SHA1, because of the necessity of finding collisions at specific position. You'd have to actually run the MD5 collision machine to generate alternatives at some specific location of the message, thus raising the cost back to exponential level.

Again, birthday paradoxes don't matter. The fact that combining md5 with sha1 together doesn't improve birthday paradox attack resistance much is not relevant, it's already hopelessly impractical to attack the hash by birthday paradox construction alone.

Announcing the first SHA1 collision

Posted Feb 23, 2017 23:22 UTC (Thu) by joey (guest, #328) [Link] (2 responses)

The repository this makes me worry about most is git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

An attacker could take a working firmware file, disassemble it enough to add a conditional jump to a backdoor, spend $100k or so to generate two colliding versions of it, and submit the one that does not run the backdoor. Then they can provide a mirror of that repository (perhaps on github, people like to use mirrors of repositories on github for some reason) that contains the other version.

The modified firmware file would be 128 bytes longer, plus the size of the backdoor. So not a lot larger.

Question is, would enough people use the mirror long enough before someone noticed the difference, to make it worth their while?
And, would it be cheaper to spend $100k on backdooring a lot of different firmware files in obfuscated ways, and rely on the obfuscation to prevent the backdoors being noticed?

Announcing the first SHA1 collision

Posted Feb 24, 2017 1:10 UTC (Fri) by excors (subscriber, #95769) [Link]

Why not just upload the malicious version directly to git.kernel.org? It's not like anybody is going to notice the backdoor in your binary patch anyway.

If they do review it at all, the only thing they could reasonably do is compare against a vendor-supplied version of the firmware (under the necessary assumption that the vendor is trusted), but then they'd notice your collision-friendly non-backdoored version doesn't match the original, and it would be hard to justify why you added random bytes onto the end, so you still wouldn't get any real benefit from your colliding files.

Announcing the first SHA1 collision

Posted Feb 24, 2017 7:04 UTC (Fri) by nhippi (subscriber, #34640) [Link]

This collision is not useful for "attack random people" attack. There are so many simpler and cheaper ways to hack a bunch of random boxes. Even if you are being personally targeted, it's easier and cheaper to just find out what software you use and find a vulnerability in it. At least for now - of course the price and time for this attack will only get cheaper over time.

Announcing the first SHA-1 collision

Posted Mar 3, 2017 21:16 UTC (Fri) by mina86 (guest, #68442) [Link] (1 responses)

But is this enough for TOR to admit that 80-bit truncated SHA-1 is a terrible idea?

Announcing the first SHA-1 collision

Posted Mar 6, 2017 14:44 UTC (Mon) by robbe (guest, #16131) [Link]

I don’t think anybody at Tor had the hots for SHA1 in the last 5+ years.

You conveniently forgot to be specific, but I’ll assume you are talking about hostnames of onion services. Proposition 224 will fix that, but is not making progress at stellar pace. I don’t have insight whether qualified reviewers or implementors are lacking.

I guess anybody could help change that by sponsoring. Most institutional sponsors of Tor have little interest in onion services.