Announcing the first SHA1 collision

Posted Feb 23, 2017 16:24 UTC (Thu) by anselm (subscriber, #2796)
In reply to: Announcing the first SHA1 collision by xav
Parent article: Announcing the first SHA-1 collision

With these collision attacks, the attacker comes up with two separate new documents that are constructed such that they have the same hash value. This is a lot easier than finding a new document that hashes to the specific given hash value of another (existing) document.
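
The gap between those two tasks can be sketched in Python with a deliberately weakened hash. This assumes SHA-1 truncated to 16 bits, chosen only so the searches finish quickly (real SHA-1 output is 160 bits). The helper names are hypothetical. Finding *any* colliding pair takes roughly 2^8 tries because of the birthday bound. Matching the hash of one *given* document takes roughly 2^16 tries. At full size the generic costs are about 2^80 versus 2^160.

```python
import hashlib
from itertools import count

BITS = 16  # toy truncation so the demo runs fast; real SHA-1 output is 160 bits


def h(msg: bytes) -> int:
    """SHA-1 truncated to its top BITS bits: a deliberately weak toy hash."""
    return int.from_bytes(hashlib.sha1(msg).digest(), "big") >> (160 - BITS)


def find_collision():
    """Birthday search for ANY two distinct messages with equal h(); ~2**(BITS/2) tries."""
    seen = {}
    for i in count(1):
        msg = b"doc-%d" % i
        d = h(msg)
        if d in seen:
            return seen[d], msg, i
        seen[d] = msg


def find_second_preimage(target: bytes):
    """Search for a new message matching one GIVEN document's h(); ~2**BITS tries."""
    goal = h(target)
    for i in count(1):
        msg = b"forgery-%d" % i
        if h(msg) == goal:
            return msg, i


a, b, tries = find_collision()
print(f"collision after {tries} tries: {a!r} / {b!r}")
m, tries = find_second_preimage(b"existing document")
print(f"second preimage after {tries} tries: {m!r}")
```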

As far as git is concerned, an attacker would have to make two separate repositories that contain a collision (people would presumably then keep using the “benign” version but the attacker would be able to replace it with the “malicious” version even in the face of later changes). It is much more difficult to use this attack to introduce a collision into a pre-existing repository.

The lesson to take away from this as far as documents are concerned is to never sign anything without first tweaking it a bit (e.g., by adding a few random spaces here and there). That should mess up an attacker who has just spent an inordinate amount of CPU time coming up with another version of the document that has the same hash value as the one they gave you to sign.

Announcing the first SHA1 collision

Posted Feb 23, 2017 17:18 UTC (Thu) by georgm (subscriber, #19574)

Well, if you can add binaries (like the PDFs), e.g. via a pull request, you could add a harmless-looking PDF that is one half of a crafted collision to a repository and later replace its contents with the forged other half. The two PDFs wouldn't have the same plain SHA-1, but they would have the same "git blob SHA-1".
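
The "git blob SHA-1" is SHA-1 computed over a short header ("blob", a space, the decimal byte length, a NUL byte) followed by the file contents. Because the header is hashed first, files that collide as raw SHA-1 (such as the published PDFs) do not automatically collide as git blobs. An attacker would have to build the collision with that header included. Here is a minimal Python sketch of the blob hash. The function name is my own, and its output should match what `git hash-object` prints for the same bytes.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Object ID git assigns to a blob: SHA-1 over "blob <len>" + NUL, then the contents."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# The raw-file SHA-1 and the blob ID differ for the same bytes,
# because git's header is hashed in front of the contents.
data = b"hello\n"
print(hashlib.sha1(data).hexdigest())
print(git_blob_sha1(data))  # compare: printf 'hello\n' | git hash-object --stdin
```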

Noise must be early

Posted Feb 23, 2017 21:50 UTC (Thu) by tialaramex (subscriber, #21167)

Note that _appending_ noise at the end doesn't help you. The bad guys will get the same result by adding the same noise to their imposter document, because of how Merkle-Damgård (MD-style) hashing works. Your noise must appear as early as possible in the document, before the point where the collision occurs. This (adding noise before signing) is why any SSL certificates you've bought in the last few years have gibberish as a "serial number". The serial number is one of the few signed elements of an SSL certificate that appears before the elements the subscriber controls, such as their name. Practical collision attacks for MD-style hashes (and SHA-1 is such a hash) revolve around inputs that force the internal state of the hash to have desirable properties, which you can't do if somebody gets to shove gibberish through the hash before you can get there.

This is intended as defence in depth against attacks in which the certificate applied for is one half of a collision (the other half being a target that the CA would not willingly sign, such as a CA:TRUE certificate or a certificate for a valuable name the attacker doesn't control).
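
The argument that appended noise is useless but prepended noise works can be shown with a toy Python model. It assumes a made-up Merkle-Damgård hash with a 16-bit chaining value (real SHA-1 carries 160 bits). The `compress` function is built from truncated SHA-256 only for convenience and is not SHA-1's. A birthday search finds two one-block messages that leave the chaining value in the same state. After that, any shared suffix keeps them colliding. One unpredictable block placed in front changes the starting state, so the precomputed collision almost certainly stops working.

```python
import hashlib
import secrets

BLOCK = 8        # bytes per message block (toy size)
STATE_BYTES = 2  # 16-bit chaining value: tiny on purpose so collisions are cheap
IV = bytes(STATE_BYTES)


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function (NOT SHA-1's): truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]


def md_hash(msg: bytes) -> bytes:
    """Merkle-Damgard iteration with 0x80 / zero bytes / 64-bit length padding."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    padded += (8 * len(msg)).to_bytes(8, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_block_collision():
    """Birthday search for two distinct blocks that reach the same state from IV."""
    seen = {}
    i = 0
    while True:
        block = i.to_bytes(BLOCK, "big")
        s = compress(IV, block)
        if s in seen:
            return seen[s], block
        seen[s] = block
        i += 1


m1, m2 = find_block_collision()
suffix = b"...and the same appended noise"
print(md_hash(m1 + suffix) == md_hash(m2 + suffix))  # shared suffix: still collides

noise = secrets.token_bytes(BLOCK)  # one unpredictable block placed FIRST
print(md_hash(noise + m1) == md_hash(noise + m2))    # almost certainly False
```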

Last time this was done (as proof of concept, not by actual bad guys) for MD5, the serial numbers from at least one major public CA were fairly predictable in sequence, allowing the attack team to conclude they should be able to purchase certificate #643015 in a week's time. So they could use that week to design a document for certificate #643015, produce a collision imposter for that document, and then as the appointed moment approached, "run up the score" by purchasing certificates #643012, #643013, #643014 and then purchase #643015, snip the signature off and apply it to their imposter document.

(A natural reaction from the CAs was that they could just revoke certificate #643015 after detecting this. Unfortunately, revocation in the X.509 PKI design applies to a serial number, so that doesn't revoke the imposter, which has a different serial number ...)
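
The countermeasure amounts to making the serial number unpredictable instead of sequential. Below is a minimal Python sketch that assumes a 16-byte serial. RFC 5280 caps serials at 20 octets, and the CA/Browser Forum Baseline Requirements have since required at least 64 bits of CSPRNG output. The serial is drawn from the `secrets` module. The function name is hypothetical, and this is not any particular CA's code.

```python
import secrets

SERIAL_BYTES = 16  # assumption: 16-byte serials (RFC 5280 allows at most 20 octets)


def random_serial(nbytes: int = SERIAL_BYTES) -> int:
    """Positive, unpredictable certificate serial with 8*nbytes - 1 random bits.

    The top bit is left clear so the DER INTEGER encoding stays positive and
    fits in nbytes octets without a leading 0x00 pad byte.
    """
    while True:
        serial = secrets.randbits(8 * nbytes - 1)
        if serial > 0:  # serial numbers must be positive, so reject zero
            return serial


# Unlike a counter, knowing today's serial tells you nothing about next week's.
print(hex(random_serial()))
```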

Noise must be early

Posted Feb 24, 2017 0:42 UTC (Fri) by anselm (subscriber, #2796)

"Practical collision attacks for MD-style hashes (and SHA-1 is such a hash) revolve around inputs that force the internal state of the hash to have desirable properties, which you can't do if somebody gets to shove gibberish through the hash before you can get there."

And this is why we now have SHA-3, which is not susceptible to the “append stuff to a collision and still have a collision” artefact.

Noise must be early

Posted Feb 27, 2017 17:36 UTC (Mon) by nevyn (guest, #33129)

How is this possible? My understanding is that these attacks trick the hash function into being in the same state for both inputs at a certain point. Same state + same input = same result, no?

Noise must be early

Posted Feb 27, 2017 18:34 UTC (Mon) by excors (subscriber, #95769)

As far as I can see, it doesn't quite have the property that anselm claimed.

With SHA-1, SHA-2 (at least its untruncated variants such as SHA-256 and SHA-512), etc., the output of the hash function is a copy of the internal state. Knowing the hash means you know the state and can do a length extension attack: given H(X) and the length of X, but not X itself, you can trivially compute H(pad(X) || Y) for any Y. (That's quite bad if X is a secret and you're using prepend-X-then-hash as a MAC. You need something like HMAC to prevent that attack.)
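
The following self-contained Python sketch shows why knowing the state is enough. It implements plain SHA-1 in a form that can resume from an arbitrary state, then uses it to extend a secret-prefixed hash without knowing the secret. It assumes the attacker knows, or can guess, the total length of X. The helpers `sha1_continue` and `length_extend` are hypothetical names, and the result can be checked against `hashlib`.

```python
import hashlib
import struct

IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
MASK = 0xFFFFFFFF


def _rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def _compress(state, block: bytes):
    """One SHA-1 compression over a 64-byte block; returns the new 5-word state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(_rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        a, b, c, d, e = (_rotl(a, 5) + f + e + k + w[t]) & MASK, a, _rotl(b, 30), c, d
    return tuple((x + y) & MASK for x, y in zip(state, (a, b, c, d, e)))


def sha1_padding(msg_len: int) -> bytes:
    """pad(X) for a message of msg_len bytes: 0x80, zero bytes, 64-bit bit length."""
    return b"\x80" + b"\x00" * ((55 - msg_len) % 64) + struct.pack(">Q", 8 * msg_len)


def sha1_continue(state, prior_len: int, data: bytes) -> str:
    """Resume SHA-1 from `state` after prior_len bytes (a multiple of 64) were absorbed."""
    padded = data + sha1_padding(prior_len + len(data))
    for i in range(0, len(padded), 64):
        state = _compress(state, padded[i:i + 64])
    return b"".join(struct.pack(">I", x) for x in state).hex()


def length_extend(known_digest: str, x_len: int, suffix: bytes):
    """Given only H(X) and len(X), return (pad(X), H(X || pad(X) || suffix))."""
    state = struct.unpack(">5I", bytes.fromhex(known_digest))
    glue = sha1_padding(x_len)
    return glue, sha1_continue(state, x_len + len(glue), suffix)


secret = b"server-side key"  # unknown to the attacker; only its length is assumed known
msg = b"amount=10"
mac = hashlib.sha1(secret + msg).hexdigest()
glue, forged = length_extend(mac, len(secret) + len(msg), b"&amount=1000000")
# The server, recomputing H(secret || message) on the extended message, agrees:
print(forged == hashlib.sha1(secret + msg + glue + b"&amount=1000000").hexdigest())
```

With HMAC (for example Python's `hmac.new(key, msg, hashlib.sha256)`), the exposed digest comes from an outer keyed hash. Extending it this way does not produce a valid tag for the longer message.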

If you have two different inputs of the same length with the same hash, you know they have the same state, so you can append the same blocks onto both inputs and they will continue colliding.

With SHA-3 the state is 1600 bits, and the output is some function that maps the state onto a smaller number of bits. Knowing the hash doesn't let you derive the whole state, so you can't do the length extension attack, and there's no need for HMAC. That's a useful property of SHA-3.

Two different inputs with the same 256-bit hash may not have the same 1600-bit internal state, so they may no longer collide after you append blocks to both of them. But if you construct two different inputs that lead to an identical internal state then you could append blocks and continue colliding, so it's still susceptible to that problem.
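
The distinction can be illustrated with a toy Python model. It assumes a made-up iterated hash whose 24-bit internal state is truncated to an 8-bit output, standing in for SHA-3's 1600-bit state and 256-bit output. This is not Keccak. The update function `f` is built from truncated SHA-256 only for convenience, and all names are hypothetical. Two inputs that merely share an output usually stop colliding once a block is appended. Two inputs that share the full internal state keep colliding.

```python
import hashlib
from itertools import count

STATE_BYTES = 3  # toy 24-bit internal state (SHA-3: 1600 bits)
OUT_BYTES = 1    # toy 8-bit output, a truncation of the state (SHA3-256: 256 bits)


def f(state: bytes, block: bytes) -> bytes:
    """Toy state-update function standing in for absorbing one block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]


def absorb(blocks) -> bytes:
    state = bytes(STATE_BYTES)
    for block in blocks:
        state = f(state, block)
    return state


def digest(blocks) -> bytes:
    """The output reveals only part of the final state."""
    return absorb(blocks)[:OUT_BYTES]


def output_collision():
    """Two one-block inputs with equal output but different internal state."""
    seen = {}
    for i in count():
        block = i.to_bytes(4, "big")
        state = absorb([block])
        out = state[:OUT_BYTES]
        if out in seen and seen[out][1] != state:
            return [seen[out][0]], [block]
        seen.setdefault(out, (block, state))


def state_collision():
    """Two one-block inputs with identical full internal state (birthday on 24 bits)."""
    seen = {}
    for i in count():
        block = i.to_bytes(4, "big")
        state = absorb([block])
        if state in seen:
            return [seen[state]], [block]
        seen[state] = block


tail = [b"more", b"data"]
a1, a2 = output_collision()
b1, b2 = state_collision()
print(digest(a1 + tail) == digest(a2 + tail))  # equal only by luck (about 1 in 256)
print(digest(b1 + tail) == digest(b2 + tail))  # always equal: same full state
```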