LWN.net Logo

An attempt to backdoor the kernel

An attempt to backdoor the kernel

Posted Nov 7, 2003 7:14 UTC (Fri) by hingo (subscriber, #14792)
In reply to: An attempt to backdoor the kernel by lm
Parent article: An attempt to backdoor the kernel

In a general sense, the above is right, but...

A prime design criteria for a good hash function is that the resulting checksums should be evenly spread
out over the output space, even if the inputs are very similar. Thus, assuming an even distribution is
correct, if the hash function is "good".

henrik


(Log in to post comments)

Hashes and collisions

Posted Nov 7, 2003 10:54 UTC (Fri) by nowster (subscriber, #67) [Link]

Even with a good hash function, you need to build in the ability to handle collisions. Consider the following problem:

How many people do you have to have in a room before you have a better than 50% chance that two of them share the same birthday (ignoring the year)?

The answer is surprisingly low (23). This is known as the Birthday Paradox. The probability of a collision approaches "evens" when (very) approximately sqrt(total_number_of_slots) hash slots have been used. Collisions are possible (but very rare) even with just two hash entries.

Hashes and collisions

Posted Nov 10, 2003 16:49 UTC (Mon) by dd9jn (subscriber, #4459) [Link]

You are mixing up simple hash fucntions (e.g. CRC-32) with cryptographic hash functions (e.g. SHA-1). The latter have a couple of important design criteria and thus you won't ever see a duplicated hash value. If you ever find or can construct a different second file hashing to the same value you have broken that hash function with huge consequences for about all cryptographic protocols. Even for the old MD5 algorithms, no collision has ever been found (the often reported weakness found by Hans Dobbertin is on a reduced MD5 variant). Using MD5 or SHA-1 is a perfectly good choice to identify a text.

Hashes and collisions

Posted Nov 13, 2003 10:47 UTC (Thu) by rjw (guest, #10415) [Link]

Erm.

A hash function produces a small fixed number of bits from a large, potentially limitless number of bits.

Just think for one second about that.
Lets take a file on a crappy 32-bit operating system as a limit. And hash it to 160 bits.

How the hell do you think that every one of 2^(2^32) combinations can map to 2^160 combinations uniquely?

Cryptographic hash functions merely make it computationally hard to find a match, and even harder to find a match which contains what you want it to.

Hashes and collisions

Posted Nov 13, 2003 9:01 UTC (Thu) by hingo (subscriber, #14792) [Link]

The Birthday Paradox is a different problem. In that, you are asking for the probability of any two persons "to collide". The cracker who wants to inject code into a BK repository, faces the task of constructing a collision for a given file (somewhat equivalent to finding a person with the same birthday as mine). The probability of finding a collision there is much lower (practically impossible with current hashes, with birthdays of course, you only have 365 days to pick from).

henrik

Hashes and collisions

Posted Feb 16, 2004 14:37 UTC (Mon) by Omnifarious (guest, #19508) [Link]

The birthday paradox, applied to SHA-1, means that you must have approximately 1/2**75 separately hashed entities before you start seeing a significant chance of collision.

This is still pretty tiny. So, in theory, hash by value should actually be pretty good. The paper is making the point that, while hash functions are designed to have certain properties, there is no guarantee that they do, and any lapse is catastrophic for algorithms that rely on them that heavily.

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds