LWN.net Logo

Hash collisions

Hash collisions

Posted Feb 1, 2010 23:53 UTC (Mon) by otaylor (subscriber, #4190)
In reply to: Hash collisions by epa
Parent article: Security in the 20-teens

Any particular patch is only a tiny change away from generating a hash collision with any other patch. The only question is knowing *what* change to make. You could do it by inserting 128 bits of data in any only place, but you could also do it by distributing changes around in smaller places, like added spaces, or alternate wording in a comment. Given the ability to generate hash collisions, it may be only a small step to generating innocuous looking hash collisions based on a desired end-goal. But yes, introducing evil code through collisions is the hard way around. (Linus had a mail analyzing this at one point.)


(Log in to post comments)

Hash collisions

Posted Feb 2, 2010 0:54 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

in theory, two patches could be one character away from being a collision.

the thing is that none of the methods that are known to generate collisions can do so to this extent. they all depend on creating a large, randomish blob in one or both files.

please educate me here.

I was under the impression that what people had succeeded in doing was to create two files with the same hash, not take a file that someone else generated and create a new file with the same hash as the original. I was further under the impression that to make this match, both files end up with large chunks of randomish data in them.

I think that everyone is in agreement that if a hash is broken to the point where someone can take an existing file and create a new file that looks reasonable with the same hash there is a serious problem.

where there is disagreement is if the current state of affairs, where someone looking at one or both of the files will see that something is weird here (they do not look like normal C source code), is there a serious problem.

I know there are a lot of people doing research on hash collisions. If someone can dig up two source code files that have the same hash, from any source it would go a long way towards making your case.

Even then there is the question of if one could plausibly be a replacement for the other, but just finding two source files would be a better start then all the theoretical arguing that you have been doing.

Hash collisions

Posted Feb 2, 2010 10:27 UTC (Tue) by copsewood (subscriber, #199) [Link]

MD5 was broken totally when it became possible to have 2 SSL certs that hashed to the same value.

Not sure how much more or less difficult doing something similar with 'C' source code would be. But I doubt that SHA1 is anywhere close to this level of brokenness.

Hash collisions

Posted Feb 2, 2010 14:09 UTC (Tue) by otaylor (subscriber, #4190) [Link]

It's hard for me to see how my few sentence comment could possibly considered as "all the theoretical arguing that you have been doing." My point was not that I know of any way of generating dangerous collisions, or that I am losing a single second of sleep over the security of my GIT repositories, but rather that I found the argument "It would be quite a task to generate a hash collision that also compiles as valid C code" weak. The current collision generating attacks I'm aware of (not specifically talking about SHA1) don't require generating a new file from scratch, but rather inserting random-looking data into a padding section of a file format. It doesn't seem a huge step from there to inserting "steganographered" random data. But even restricting to the simplest case of random-looking data at the end of the file, one out of every 65536 random-looking data blocks ends with '*/'... Anyways, I'm not an academic or even amateur cryptographer, and have no intention of becoming such, so while I try to avoid talking total nonsense, if you find posts based on general considerations offensive, please feel free to ignore what I write.

Hash collisions

Posted Feb 2, 2010 15:18 UTC (Tue) by jsatchell (guest, #6236) [Link]

Important point about hashes: finding a collision is not the same as solving the pre-image problem.

The acknowledged weakness in SHA1 means that it is possible to find a pair of texts, given control over both texts, that hash to the same value much faster than the expected 2^80 steps.

But most attacks on a VCS involve taking a known text, and finding another text that hashes to the same value; often this is just solving the pre-image problem, which is much harder. There is no suggestion in the open literature that the collision weakness of SHA1 is matched by a comparable pre-image one.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds