LWN.net Logo

Advertisement

GStreamer, Embedded Linux, Android, VoD, Smooth Streaming, DRM, RTSP, HEVC, PulseAudio, OpenGL. Register now to attend.

Advertise here

MD5 collisions

June 15, 2005

This article was contributed by Joe 'Zonker' Brockmeier.

It may be time to retire MD5. The MD5 Message-Digest Algorithm RFC says that "It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given prespecified target message digest." At the time, this may have been true -- the RFC was written in 1992 -- but a number of researchers are finding that MD5 hashes aren't as unique as one might like.

Within the last year several researchers have come forward with results that show it's possible to create meaningful collisions of MD5 hashes. Dan Kaminsky published "MD5 to be considered harmful someday" (PDF) in December 2004; this paper describes the creation of two executables with the same MD5 hash using a tool called Stripwire (available here). Kaminsky writes that this would be an "excellent vector for malicious developers to get unsafe code past a group of auditors, perhaps to acquire a required third party signature."

Alternatively, build tools themselves could be compromised to embed safe versions of dangerous payloads in each build. At some later point, the embedded payload could be safely "activated", without the MD5 changing. This has implications for Tripwire, DRM, and several package management architectures.

Kaminksy isn't the only one to find ways around MD5. Vlastimil Klima published Finding MD5 Collisions - a Toy For a Notebook in March of this year, where he describes finding MD5 collisions in 8 hours on a notebook PC with a 1.6 GHz Pentium. Arjen Lenstra, Xiaoyun Wang and Benne de Weger published "a method for constructing pairs of X.509 certificates where the "to be signed" parts of the certificates form a collision for MD5. Xiaoyun Wang and Hongbo Yu published a paper this year on how to break MD5 (PDF) and other hash functions.

Now Stefan Lucks and Magnus Daum have come up with a method for creating two documents with the same digital signature. Lucks and Daum describe creating two postscript documents, using Wang and Yu's attack, that have meaningful content and the same MD5 hash. They describe a scenario between "Alice and her boss" where Alice creates two postscript documents with the same MD5 hash. One, which is presented for a digital signature, is a letter of recommendation - the other is a document granting "Alice" access to confidential information.

The files are available for download from the Institute for Cryptology and IT-Security website. If one opens the files with a text editor, the content for both the letter of recommendation and the order are present, but manipulated so that only one letter is displayed in a normal postscript viewer. Lucks and Daum demonstrate that the MD5 hash collision attacks are not just hypothetical attacks with no practical applications.

Given the number of practical attacks on MD5, it may be time to move to a Federal Information Processing Standards (FIPS) approved hash algorithm, such as SHA-256, or SHA-512. Note that vulnerabilities have recently been found in SHA-1, however, and NIST is already planning to phase it out by 2010.


(Log in to post comments)

MD5 collisions

Posted Jun 16, 2005 5:47 UTC (Thu) by iabervon (subscriber, #722) [Link]

Note, however, that the Lucks and Daum paper is really a social engineering and judgement error; it involves getting the victem to sign a malicious document based on accepting an innocent-looking rendering. Then the attacker can replace one block of non-rendered garbage with a different block of non-rendered garbage, in a way that the weak collision attack on MD5 allows, and thereby tell the scripting engine to render entirely different text.

Of course, the document the victem is given actually contains the malicious text, which may be a sufficient attack even without the block swap; the attacker could just get a document with malicious comments signed, and then claim that the signer had signed off on the stuff in the comments. Or, as someone pointed out on slashdot, the document could display differently depending on the size of the output device or any number of other factors. These days, PDFs can even make HTTP requests after the signature check, and render the result.

Essentially, the class of documents that the Lucks and Daum attack works on is really a class which shouldn't be signed without a detailed analysis, even if this attack were not available.

MD5 collisions

Posted Jun 16, 2005 6:48 UTC (Thu) by beejaybee (guest, #1581) [Link]

"Given the number of practical attacks on MD5, it may be time to move to a Federal Information Processing Standards (FIPS) approved hash algorithm, such as SHA-256, or SHA-512."

Assuming we can be persuaded that these are fundamentally more secure. My guess is that rather less effort has been directed at breaking these than MD5 and SHA-1 given that there is less practical value in breaking them.

"Note that vulnerabilities have recently been found in SHA-1"

Sure.

But isn't there a fundamental point here - if we sign a document with both MD5 and SHA-1, it becomes at least several orders of magnitude harder to fabricate a forgery, and several orders of magnitude more than 8 hours is reasonably safe - maybe more so than a relatively unresearched, untried algorithm.

Fact of the matter is, whatever algorithm is used, the possibility of hash collisions can _never_ be discounted.

MD5 collisions

Posted Jun 24, 2005 20:29 UTC (Fri) by xorbe (subscriber, #3165) [Link]

That's what I said -- sign with 2 orthogonal hashes (vs doubling one algorithm's output).

MD5 collisions

Posted Jun 16, 2005 9:54 UTC (Thu) by copsewood (subscriber, #199) [Link]

I think the moral of this story is that the person signing something should preferably have a hand in authoring it in a way that will change its hash, e.g. by modifying whitespace in an unpredictable manner, or by adding a few random characters extraneous to the message being signed. Also, given the possibility of document rendering compromise, the least bad signing device is one which is only used for cryptographic applications, is under the physical control of the person signing the document, and can display what is being signed using a simple format prior to signing.

Not that ink on paper signatures are very secure either.

MD5 collisions

Posted Jun 16, 2005 12:32 UTC (Thu) by ekj (subscriber, #1524) [Link]

well-designed digital signature applications (i.e. gpg) already do this.

A random (program-selected) number is prepended to the document before hash is computed and signed. So in essence the gpg signature says:

"The hash of 35fc1270fb13d260 plus the file is: [so and so]"

The published attacks don't work against this. Though offcourse it's possible that other attacks will.

MD5 collisions

Posted Jun 16, 2005 12:42 UTC (Thu) by pdc (guest, #1353) [Link]

This is one reason why people should be trained to understand the difference between plain text (in the programmer's sense of the word) and more complicated formats like PostScript and Microsoft Word, where a lot of the bytes in the file are invisible in the rendered version. If you look at the PostScript documents used in the paper, the inclusion of invisible binary data is quite obvious; a plain text document would not be vulnerable to this particular attack. Programs for signing documents could be designed to refuse to sign anything except the actual words.

The trick is we want to use signing of binaries to help us avoid trojan-horse device-drivers being put on company's web sites, and in this case, the possibility of adding a binary lump to the executable in order to fiddle the MD5 hash is unavoidable. We could instead insist on drivers being supplied as signed source code to be compiled by the installer (Gentoo style) but in practice source code is a difficult read so it is not likely to be scrutinized in detail by the person who has downloaded it. I can just about imagine a code-signing protocol where a canonical version of the source code is automatically extracted (stripping comments) so that hiding binary junk in the code to fiddle the digest is harder because it has to be compilable.

MD5 collisions

Posted Jun 16, 2005 15:55 UTC (Thu) by Ross (guest, #4065) [Link]

One way to fix this problem is for the signer to add some text to the
document describing what he or she thinks they are signing.

Another way to minimize the problem is to use different keys for different
roles. The "secret security clearance" key should probably be used only to
sign those types of documents.

Only practical for temporary subterfuges

Posted Jun 17, 2005 5:56 UTC (Fri) by skybrian (subscriber, #365) [Link]

The method used to "forge" a Postscript document might work for a while so long as nobody
suspects a forgery. But if the person who allegedly signed the document ever sees the new
document and says "I never signed that" and brings it to the attention of any reasonably clueful
investigator, it should be pretty trivial to "view source" and see that a forgery was attempted -
you see *both* the orginal and the attempted forgery. This is so incredibly suspicious that it
ought to be enough to repudiate the signature and maybe even get the forger convicted. So at
most it's going to cause some temporary confusion that very quickly unravels. The forger
has to assume that the forgery will eventually be detected, which limits the sort of things this
attack could be used for.

Only practical for temporary subterfuges

Posted Jun 17, 2005 11:38 UTC (Fri) by kleptog (subscriber, #1183) [Link]

Another thing to remember is that a digital signiture has only the same force as a normal signiture, ie not terribly much legally.

If you can demonstrate that your were under duress or that you wern't signing what you thought you were signing then your signiture is invalid. Just because your signiture appears on something doesn't prove anything, if you say you didn't willingly and knowingly sign it, they have to provide reasonable evidence that you did.

Ofcourse, with digital signitures the problems are larger because you can grant access to many things automatically without some person checking if it makes sense.

Only practical for temporary subterfuges

Posted Jun 19, 2005 0:45 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

The overwhelming presumption when your signature appears on something is that you meant to agree to whatever the document says. Even where we don't think it's true, we presume it in order to be fair to the other parties. Except in specific areas where signers are presumed to be idiots, you would have a heavy burden of proof to show that you didn't mean to agree.

But note that this issue of repudiation doesn't even apply to the example given -- someone releases confidential information based on seeing the boss' signature. It doesn't matter if the boss later proves he didn't sign it.

Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds