Comprehensive integrity verification with md5deep (Linux.com)

Mayank Sharma looks at md5deep on Linux.com. "Most of the ISO images and other software you grab off the Internet come with a message digest -- a cryptographic hash value that you can use to verify their integrity. While almost all Linux distributions come with utilities to read and generate digests using MD5 and SHA1 hash functions, the md5deep utilities can do that and more. md5deep computes MD5, SHA-1, SHA-256, Tiger, and Whirlpool digests across Linux, Windows, Mac OS X, *BSD, Solaris, and other operating systems. It can recursively traverse directories, computing sums for files under subdirectories as well."

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 23, 2007 15:11 UTC (Thu) by jengelh (subscriber, #33263) [Link]

-rwxr-xr-x 1 root root  27368 Mar 11 23:53 md5sum
-rwxr-xr-x 1 root root 396980 Feb  7  2007 openssl
-rwxr-xr-x 1 root root  31464 Mar 11 23:53 sha1sum
-rwxr-xr-x 1 root root  39656 Mar 11 23:53 sha224sum
-rwxr-xr-x 1 root root  39656 Mar 11 23:53 sha256sum
-rwxr-xr-x 1 root root  84712 Mar 11 23:53 sha384sum
-rwxr-xr-x 1 root root  84712 Mar 11 23:53 sha512sum

That does SHA1, 128(*), 224, 256, 384, 512, MD5, RMD160 and some other exotics - I really don't need yet another do-it-all tool to do what is already available - and installed by default (on a Linux system, mind ye) I might add. Are there even Tiger/Whirlpool users? If there were, I am sure someone would have already posted a TIGERSUMS file next to their ISOs.

(*) No idea what `openssl -sha` calculates, but its output is different from sha1sum.

Googling around.
MD5SUMS: 1,540,000.
SHA1SUMS: 134,000.
TIGERSUMS: 0.
WHIRLPOOLSUMS: 3. (Whoops!)

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 23, 2007 16:21 UTC (Thu) by Los__D (subscriber, #15263) [Link]

I don't really think the many algorithms are the major point of md5deep, but besides that, you are quite right. The *sum programs and a bit of shell can do exactly the same.

"openssl dgst -sha" is sha-0 AFAIK, you'd use "openssl dgst -sha1" for the same result as sha1sum.
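
For a cross-check that depends on neither command-line tool, a small Python sketch can compute the SHA-1 digest directly. It assumes Python's standard hashlib module; the helper name sha1_line is made up for illustration, and the returned string only imitates the "digest, two spaces, filename" layout that sha1sum prints.

```python
import hashlib


def sha1_line(path, chunk_size=65536):
    """Return a line in the 'digest  filename' layout used by sha1sum."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large ISO images do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {path}"
```

If the digest in that line differs from what a tool reports for the same file, the tool is most likely computing a different algorithm.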

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 0:35 UTC (Fri) by jengelh (subscriber, #33263) [Link]

Ah, pity these openssl manpages and help (openssl dgst -invalidoption), they don't show all options :-(

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 1:17 UTC (Fri) by Los__D (subscriber, #15263) [Link]

On mine, sha, sha1, sha256 and sha512 are in the help.

Both sha and sha1 are in the manpage here, but, strangely enough, neither sha256 nor sha512 is.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 1:19 UTC (Fri) by jengelh (subscriber, #33263) [Link]

I was especially trying "dgst -sha224". It works, but it is nowhere near as well documented as 256 or 512.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 2:26 UTC (Fri) by Oddscurity (subscriber, #46851) [Link]

Given that OpenBSD prides itself on their documentation, or so I gather, I don't think they'd mind if you filed a bug against their OpenSSL documentation regarding these missing options.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 2:29 UTC (Fri) by Los__D (subscriber, #15263) [Link]

Of course, you're right. On my way! :)

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 2:53 UTC (Fri) by Los__D (subscriber, #15263) [Link]

Sent, thanks for the nudge.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 2:33 UTC (Fri) by Los__D (subscriber, #15263) [Link]

Wait, OpenSSL has nothing to do with OpenBSD, that's OpenSSH.

Doesn't change that it should be reported, though.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 2:35 UTC (Fri) by Oddscurity (subscriber, #46851) [Link]

Ah yes! My bad there :)

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 25, 2007 0:12 UTC (Sat) by sitaram (subscriber, #5959) [Link]

> The *sum programs and a bit of shell can do exactly the same.

True, but md5deep is blazingly fast compared to md5sum or sha1sum -- approximately 5 times as fast in CPU time, in my crude benchmarks with my own data.

Actually, "openssl dgst" is a tad faster than even md5deep, but (AFAIK) it doesn't recurse.

I often use it to figure out what kind of stuff has changed between 2 directories.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 28, 2007 6:22 UTC (Tue) by ekj (subscriber, #1524) [Link]

There's no point to that if they are on the same computer. Computing the sums requires reading the entire content of both directories, in which case you could just as well compare the content directly (say with diff) instead of taking the roundabout route of creating the sums, comparing them only to find out whether there are differences at all, and then, if so, reading the files again to figure out *what* the difference is.

If it's on two different computers it makes sense though, since transferring the sums will be much faster than transferring the complete files if the files are large.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 28, 2007 10:00 UTC (Tue) by sitaram (subscriber, #5959) [Link]

Not true. md5deep catches stuff that has been renamed but whose content is identical. A plain directory compare has no hopes of doing that.

Think of the hash as a pseudo-filename that is directly linked to the actual content.
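
A rough Python sketch of that idea, assuming two directory trees on local disk: files are indexed by content digest instead of by name, so a file that moved shows up as the same digest under a different path. The function names hash_tree and find_renames are invented for this illustration, and MD5 is used only because the thread is about md5deep.

```python
import hashlib
import os
from collections import defaultdict


def hash_tree(root):
    """Map each content digest to the set of relative paths that carry it."""
    by_digest = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                # Whole-file read keeps the sketch short; chunk it for big files.
                digest = hashlib.md5(handle.read()).hexdigest()
            by_digest[digest].add(os.path.relpath(full, root))
    return by_digest


def find_renames(old_root, new_root):
    """Return (old_path, new_path) pairs with identical content but a new name."""
    old, new = hash_tree(old_root), hash_tree(new_root)
    renames = []
    for digest in old.keys() & new.keys():
        gone = old[digest] - new[digest]      # paths only in the old tree
        appeared = new[digest] - old[digest]  # paths only in the new tree
        renames.extend((o, n) for o in sorted(gone) for n in sorted(appeared))
    return sorted(renames)
```

When several identical copies exist, every old/new combination is reported, so the result is a list of candidates, not a definitive rename log.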

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 28, 2007 10:30 UTC (Tue) by jengelh (subscriber, #33263) [Link]

> A plain directory compare has no hopes of doing that.

find . -type f -print0 | xargs -0 md5sum | sort

But that is no plainer than md5deep.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 23, 2007 17:36 UTC (Thu) by smoogen (subscriber, #97) [Link]

The main reason for using md5deep is forensics, or other images where I want high confidence that nothing has been tampered with.
In these cases, I want to say that:

file X had size X1, md5sum X2, sha1sum X3, sha2sum X4, tigersum X5, etc
file Y had size X1, md5sum Y2, sha1sum Y3, sha2sum Y4, tigersum Y5, etc

While there is some set of files that have size X1 and may have the same md5sum, there is a smaller set that have the same md5sum && sha1sum, and a still smaller set that have the same md5sum && sha1sum && sha2sum, etc.
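
A record of that kind can be produced in a single pass over the file. The Python sketch below is an illustration only: the function name fingerprint and the default algorithm list are assumptions, and Tiger and Whirlpool are left out because Python's hashlib does not guarantee them.

```python
import hashlib
import os


def fingerprint(path, algorithms=("md5", "sha1", "sha256"), chunk_size=65536):
    """Return the file size plus one hex digest per requested algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            # Feed every hasher from the same chunk: one pass over the file.
            for hasher in hashers.values():
                hasher.update(chunk)
    record = {"size": os.path.getsize(path)}
    record.update({name: h.hexdigest() for name, h in hashers.items()})
    return record
```

Whether the extra digests add any real strength is a separate question; the sketch only shows that recording them costs one read of the data.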

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 23, 2007 23:29 UTC (Thu) by djao (subscriber, #4263) [Link]

What a lot of people don't understand about hash functions is that comparing multiple hash values from multiple hash functions is basically useless.

The Wikipedia article on hash functions has a brief paragraph explaining why multiple hash functions do not result in any better security than simply using the best hash function out of the bunch by itself. Although the article cites redundancy as a legitimate benefit of multiple hashes, this benefit is meaningless in a forensics context, since forensics pertains to the study of information from the past, and we know exactly which hash functions have been broken in the past at any particular point in time.

The only situation in which a user might benefit from hash function redundancy is when some resource needs to be secured in a forward-looking manner against possible future breakthroughs. This is almost the exact opposite of forensics.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 3:54 UTC (Fri) by gouyou (subscriber, #30290) [Link]

> The only situation in which a user might benefit from hash function redundancy is when some resource needs to be secured in a forward-looking manner against possible future breakthroughs. This is almost the exact opposite of forensics.

Taking a look at auditing logs is 100% part of forensics. So it can be part of the strategy to permit future forensics: keeping a couple of files with hashes from your system over time is easily doable and should not consume too much space, whereas keeping a complete backup over time is often not possible. This would give you a chance to do the forensics, especially if your system might have been compromised several months ago.
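
As a sketch of how such saved hash files could be put to use later, the Python below compares two snapshots taken at different times. It assumes each snapshot is text in the "digest, separator, path" layout that md5sum writes; parse_manifest and diff_manifests are names invented here.

```python
def parse_manifest(text):
    """Turn 'digest  path' lines into a {path: digest} dictionary."""
    manifest = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, rest = line.partition(" ")
        # rest starts with the mode character (' ' or '*'); skip it.
        manifest[rest[1:]] = digest
    return manifest


def diff_manifests(old, new):
    """Compare two {path: digest} snapshots and classify every difference."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

The comparison is only as trustworthy as the stored snapshots themselves, so they would need to live somewhere an intruder could not rewrite them.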

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 9:14 UTC (Fri) by smoogen (subscriber, #97) [Link]

Thank you for pointing that out. I hadn't thought of what I was saying as the concatenation argument, but it does seem to be.

I would argue that it might not be the exact opposite of forensics. You have a time frame in which you have taken evidence from a box, and you do not know when that evidence will be presented in the future. Therefore you would want to ensure that your evidence is robust enough to stand up to challenges that md5sum or sha1sum or blawsum is weak, because you could say that sha2sum and foosum aren't. I guess I am saying redundancy of evidence might be important in an evidence chain in this case.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 10:39 UTC (Fri) by etrusco (subscriber, #4227) [Link]

Wrong. The wikipedia article clearly is talking about "chaining" hash functions.
Of course storing the hashes of different functions is safer; it's most certainly impossible (and probably mathematically provable/ed) to forge two packets of data which match hashes for both e.g. MD5 and SHA1.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 26, 2007 10:49 UTC (Sun) by djao (subscriber, #4263) [Link]

If you think about what you said for about five seconds, you'll realize that concatenating hash values is mathematically equivalent to storing both hash values.

Your second sentence is nonsensical in light of the fact that the Joux multicollision attack is an attack to accomplish exactly that which you claim is "most certainly impossible."

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 28, 2007 6:28 UTC (Tue) by ekj (subscriber, #1524) [Link]

> it's most certainly impossible (and probably mathematically provable/ed) to forge two packets of data which match hashes for both e.g. MD5 and SHA1.

Complete nonsense. As long as the input is larger than the output there *HAVE* to be collisions. So, if your input is larger than sizeof(md5hash)+sizeof(sha1hash) then it is a mathematical certainty that collisions will exist. If the inputs are "much" larger (as is mostly the case) then there exist gazillions of collisions.
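
The counting argument can be watched in action by shrinking the hash. The Python sketch below is only an illustration: it truncates MD5 to its first byte (an assumption made so the search finishes instantly), which leaves 256 possible values, so any 257 distinct inputs must contain a collision. The function name first_collision is made up, and the sketch says nothing about how hard collisions are to find for a full-length digest.

```python
import hashlib
from itertools import count


def first_collision(digest_bytes=1):
    """Find two distinct inputs whose MD5 digests share the first digest_bytes bytes."""
    seen = {}
    for i in count():
        message = str(i).encode()
        tag = hashlib.md5(message).digest()[:digest_bytes]
        if tag in seen:
            # Two different messages map to the same truncated digest.
            return seen[tag], message
        seen[tag] = message
```

With digest_bytes=1 the loop is guaranteed to return by the 257th input; each extra byte multiplies the number of possible values by 256.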

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 29, 2007 19:30 UTC (Wed) by etrusco (subscriber, #4227) [Link]

And it just takes gazillions of years to calculate them...

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 30, 2007 0:41 UTC (Thu) by ekj (subscriber, #1524) [Link]

Assuming the hash is good, it takes a lot of time, or an implausible amount of luck, yes.

Thing is, we don't actually have even a single hash that we *know* is good. We've got several that we *think* are good, but we did in the past too, and more than once we had unwelcome surprises. (md5 is called md_*5* for a reason)

In practice though, if the weakest link in your security-chain is the strength of sha-256 or any of the other modern hash-functions (hell even md5 is decent in practice), then your security is better than 99.9% of the stuff out there.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 5:14 UTC (Fri) by MKallas (guest, #38539) [Link]

I'd really appreciate it if browsers would autocheck for md5sum and .sig files and show whether the files had been validated.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 5:45 UTC (Fri) by Kluge (subscriber, #2881) [Link]

Looks like there was a project to implement this for Firefox in the Google SoC: http://lwn.net/Articles/245785/

Unfortunately, neither the Firefox devs nor the IETF liked the way it was implemented.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 24, 2007 5:43 UTC (Fri) by drag (subscriber, #31333) [Link]

Unfortunately the guy failed to mention that storing hashes of system files for security is almost completely worthless unless you can store them in a read-only format or on an inaccessible volume. If the hashes, or the program you're using to check the hashes, are accessible by root then they can be tampered with and can't be trusted in case of an attack. You have a similar problem with the rpm checksum features. Those can be edited, making them unreliable in case of a break-in.

This is why these sorts of intrusion detection systems are not used more often. They tend to be very expensive to run properly. Otherwise they are useful for trying to combat file corruption, but not against malicious attack.

Also, how does md5deep compare to AIDE or Tripwire?

Now, if somebody was to create a similar sort of checksum intrusion detection that used something like Mosref http://swik.net/mosref then that would be very cool.

Comprehensive integrity verification with md5deep (Linux.com)

Posted Aug 28, 2007 6:30 UTC (Tue) by ekj (subscriber, #1524) [Link]

True. To make hashes truly useful for security, you need to digitally sign them with a key NOT available where the files are distributed from.

For example, if Linus creates a linux.tar.bz2 and a SUMS-file, and signs the latter using his private key which is never stored online, then you can, assuming you've got a good copy of his public key, be certain that you've got the real linux-kernel, even if malicious people may have broken into the ftp-server you use.
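
A sketch of the checking half of that scheme, in Python. It assumes the signature on the SUMS file has already been verified by other means (for example with GnuPG) and only compares each listed digest with the file on disk. The function name check_sums, the sha256 default and the "digest, separator, filename" line layout are assumptions for illustration.

```python
import hashlib
import os


def check_sums(sums_text, base_dir, algorithm="sha256"):
    """Return {filename: True/False} for each 'digest  filename' line."""
    results = {}
    for line in sums_text.splitlines():
        if not line.strip():
            continue
        expected, _, name = line.partition(" ")
        name = name[1:]  # drop the mode character (' ' or '*') before the name
        with open(os.path.join(base_dir, name), "rb") as handle:
            actual = hashlib.new(algorithm, handle.read()).hexdigest()
        results[name] = actual == expected.lower()
    return results
```

A True result means only that the file matches the SUMS file; the trust still comes from the signature check on the SUMS file itself.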

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds