LWN.net Logo

An attempt to backdoor the kernel

An attempt to backdoor the kernel

Posted Nov 7, 2003 3:55 UTC (Fri) by lm (guest, #6402)
In reply to: An attempt to backdoor the kernel by jonabbey
Parent article: An attempt to backdoor the kernel

> in theory, the odds against a single pair-wise collision should be a
> perfectly even 1 in 2^160 (or 1.46xe48), but that's still not zero,

As I've said in the past "In theory, practice and theory are the same, but in practice they are different" :)

Your 1 in 2^160 assumes a perfectly random distribution of inputs. Math people love to assume Gaussian (or some other equally pleasant to math) distribution but those nasty programmers have a bad habit of not being that random.

What does that mean? It means that all your math is meaningless. You are doing the math over the wrong set of data, data which doesn't exist, so you get a slanted (and incorrect) view of the world.

If a file system or a SCM system is using hashes for their names of objects, be afraid. 99% of the time or more it will work but it will not work all of the time.


(Log in to post comments)

An attempt to backdoor the kernel

Posted Nov 7, 2003 7:14 UTC (Fri) by hingo (subscriber, #14792) [Link]

In a general sense, the above is right, but...

A prime design criteria for a good hash function is that the resulting checksums should be evenly spread
out over the output space, even if the inputs are very similar. Thus, assuming an even distribution is
correct, if the hash function is "good".

henrik

Hashes and collisions

Posted Nov 7, 2003 10:54 UTC (Fri) by nowster (subscriber, #67) [Link]

Even with a good hash function, you need to build in the ability to handle collisions. Consider the following problem:

How many people do you have to have in a room before you have a better than 50% chance that two of them share the same birthday (ignoring the year)?

The answer is surprisingly low (23). This is known as the Birthday Paradox. The probability of a collision approaches "evens" when (very) approximately sqrt(total_number_of_slots) hash slots have been used. Collisions are possible (but very rare) even with just two hash entries.

Hashes and collisions

Posted Nov 10, 2003 16:49 UTC (Mon) by dd9jn (subscriber, #4459) [Link]

You are mixing up simple hash fucntions (e.g. CRC-32) with cryptographic hash functions (e.g. SHA-1). The latter have a couple of important design criteria and thus you won't ever see a duplicated hash value. If you ever find or can construct a different second file hashing to the same value you have broken that hash function with huge consequences for about all cryptographic protocols. Even for the old MD5 algorithms, no collision has ever been found (the often reported weakness found by Hans Dobbertin is on a reduced MD5 variant). Using MD5 or SHA-1 is a perfectly good choice to identify a text.

Hashes and collisions

Posted Nov 13, 2003 10:47 UTC (Thu) by rjw (guest, #10415) [Link]

Erm.

A hash function produces a small fixed number of bits from a large, potentially limitless number of bits.

Just think for one second about that.
Lets take a file on a crappy 32-bit operating system as a limit. And hash it to 160 bits.

How the hell do you think that every one of 2^(2^32) combinations can map to 2^160 combinations uniquely?

Cryptographic hash functions merely make it computationally hard to find a match, and even harder to find a match which contains what you want it to.

Hashes and collisions

Posted Nov 13, 2003 9:01 UTC (Thu) by hingo (subscriber, #14792) [Link]

The Birthday Paradox is a different problem. In that, you are asking for the probability of any two persons "to collide". The cracker who wants to inject code into a BK repository, faces the task of constructing a collision for a given file (somewhat equivalent to finding a person with the same birthday as mine). The probability of finding a collision there is much lower (practically impossible with current hashes, with birthdays of course, you only have 365 days to pick from).

henrik

Hashes and collisions

Posted Feb 16, 2004 14:37 UTC (Mon) by Omnifarious (guest, #19508) [Link]

The birthday paradox, applied to SHA-1, means that you must have approximately 1/2**75 separately hashed entities before you start seeing a significant chance of collision.

This is still pretty tiny. So, in theory, hash by value should actually be pretty good. The paper is making the point that, while hash functions are designed to have certain properties, there is no guarantee that they do, and any lapse is catastrophic for algorithms that rely on them that heavily.

An attempt to backdoor the kernel

Posted Nov 13, 2003 9:43 UTC (Thu) by ekj (subscriber, #1524) [Link]

Actually no. You're wrong.

A Cryptographically strong hash-function (such as sha1) does *not* assume that the inputs are random. On the contrary, it is made under the assumption not only that the inputs are non-random (as changesets are), but even that the inputs may be deliberately choosen so as to provoke a collision.

A hash-function is cryptographically strong even if in this scenario, the chanse of collisions still is no bigger than the mathemathical minimum 1 in 2^num_bits. That is, there is no (known) way of generating different strings such that the probability that the strings have identical sha1sum is higher than 2^num_bits.

It's still true that two changesets (or files or whatever) migth be identical trough pure dumb luck, but if I where you, I'd find something else to worry about, the chanse that a comsic ray will flip a bit in your ram and cause the program to give the wrong result is probably much much higher.

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds