In theory, two patches could be one character away from being a collision.
The thing is that none of the methods known to generate collisions can produce one that subtle: they all depend on creating a large, randomish blob in one or both files.
Please educate me here.
I was under the impression that what people had succeeded in doing was to create two files with the same hash, not to take a file that someone else generated and create a new file with the same hash as the original. I was further under the impression that, to make the hashes match, both files end up with large chunks of randomish data in them.
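The distinction drawn above (two freshly made files that collide vs. matching a hash someone else already published) can be illustrated with a toy example. This is a minimal sketch, assuming a hypothetical `toy_hash` that keeps only the top 24 bits of SHA-1 so the search finishes quickly; it is not an attack on real SHA-1. The collision search appends random blobs to a fixed prefix and, by the birthday bound, needs only about 2**12 tries, while matching one fixed target hash (a second preimage) needs about 2**24. Note that both colliding messages end in random-looking bytes.

```python
import hashlib
import os

BITS = 24  # hypothetical toy digest size; real SHA-1 digests are 160 bits


def toy_hash(data: bytes) -> int:
    """Hypothetical toy hash: the top BITS bits of SHA-1."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (160 - BITS)


def find_collision(prefix: bytes, max_tries: int = 1 << 20):
    """Birthday search: both messages are ours, each prefix + random blob.
    Expected cost is roughly 2**(BITS / 2) hash evaluations."""
    seen = {}
    for tries in range(1, max_tries + 1):
        msg = prefix + os.urandom(8)
        h = toy_hash(msg)
        other = seen.get(h)
        if other is not None and other != msg:
            return other, msg, tries
        seen[h] = msg
    return None


def find_second_preimage(target: bytes, max_tries: int):
    """Second preimage: match the hash of a file someone else already made.
    Expected cost is roughly 2**BITS hash evaluations."""
    goal = toy_hash(target)
    for tries in range(1, max_tries + 1):
        msg = target + os.urandom(8)
        if toy_hash(msg) == goal:
            return msg, tries
    return None


if __name__ == "__main__":
    a, b, tries = find_collision(b"/* patch */\n")
    print(f"collision after {tries} tries")
    print(a.hex())
    print(b.hex())
    # With the same small budget, a second preimage almost always fails.
    print(find_second_preimage(b"/* patch */\n", tries))
```

Each extra digest bit doubles the second-preimage cost but only multiplies the collision cost by about 1.41, which is why collisions are the first thing to fall when a hash weakens.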
I think that everyone is in agreement that if a hash is broken to the point where someone can take an existing file and create a new, reasonable-looking file with the same hash, there is a serious problem.
Where there is disagreement is whether the current state of affairs, where someone looking at one or both of the files will see that something is weird (they do not look like normal C source code), constitutes a serious problem.
I know there are a lot of people doing research on hash collisions. If someone can dig up two source code files, from any source, that have the same hash, it would go a long way towards making your case.
Even then, there is the question of whether one could plausibly be a replacement for the other, but just finding two such source files would be a better start than all the theoretical arguing that you have been doing.
I'm not sure how much more or less difficult doing something similar with C source code would be, but I doubt that SHA1 is anywhere close to this level of brokenness.
Hash collisions
Posted Feb 2, 2010 14:09 UTC (Tue) by otaylor (subscriber, #4190)
It's hard for me to see how my few-sentence comment could possibly be considered "all the theoretical arguing that you have been doing." My point was not that I know of any way of generating dangerous collisions, or that I am losing a single second of sleep over the security of my GIT repositories, but rather that I found the argument "It would be quite a task to generate a hash collision that also compiles as valid C code" weak.
The current collision-generating attacks I'm aware of (not specifically talking about SHA1) don't require generating a new file from scratch; instead, they insert random-looking data into a padding section of a file format. It doesn't seem a huge step from there to inserting "steganographered" random data. But even restricting ourselves to the simplest case of random-looking data at the end of the file, one out of every 65536 random-looking data blocks ends with '*/'...
Anyways, I'm not an academic or even amateur cryptographer, and have no intention of becoming one, so while I try to avoid talking total nonsense, if you find posts based on general considerations offensive, please feel free to ignore what I write.
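The 1-in-65536 figure is simple arithmetic: a random block ends in the two specific bytes `*/` with probability (1/256) × (1/256) = 1/65536. The sketch below checks that by simulation. It assumes a hypothetical 16-byte block size and, like the comment, only looks at the last two bytes; it ignores a `*/` appearing earlier in the block, which would close a surrounding comment too soon. `random.Random.randbytes` needs Python 3.9 or later.

```python
import random

BLOCK_SIZE = 16  # hypothetical size of one random-looking block, in bytes


def exact_probability() -> float:
    """The last two bytes must be exactly b'*/': (1/256) * (1/256)."""
    return (1 / 256) ** 2


def estimate_probability(trials: int, seed: int = 2010) -> float:
    """Monte Carlo estimate: fraction of random blocks ending in b'*/'."""
    rng = random.Random(seed)
    hits = sum(rng.randbytes(BLOCK_SIZE).endswith(b"*/") for _ in range(trials))
    return hits / trials


def first_block_ending_comment(seed: int = 2010, limit: int = 1 << 22):
    """Return (index, block) for the first random block ending in b'*/'."""
    rng = random.Random(seed)
    for i in range(limit):
        block = rng.randbytes(BLOCK_SIZE)
        if block.endswith(b"*/"):
            return i, block
    return None


if __name__ == "__main__":
    print(1 / exact_probability())  # 65536.0
    print(estimate_probability(1 << 20))
    print(first_block_ending_comment())
```

The estimate fluctuates around 1/65536 (about 16 hits per 2**20 blocks), and the first suitable block typically turns up within on the order of 65536 draws.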