Proof

Posted Feb 26, 2017 23:17 UTC (Sun) by excors (subscriber, #95769)
In reply to: Proof by moltonel
Parent article: Linus on Git and SHA-1

> Meh, that still relies on being lucky

DRAM error rates are apparently somewhere in the region of 2^-35 errors per bit per hour. If you have two files that differ by a single bit, and you leave them in memory for a millisecond before memcmping them, that's about a 1 in 2^56 chance of one of the differing bits flipping and giving a false match (a millisecond is about 2^-22 hours, and a flip of either copy of the differing bit would do it). With a non-broken 256-bit hash function, there's only a 1 in 2^256 chance of two different files giving a false match. So by checking the actual bytes you're massively *increasing* the risk of an incorrect result caused by bad luck.
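
To make the arithmetic concrete, here's a back-of-the-envelope check (a sketch only; the error rate is the rough figure quoted above, not a measurement):

    import math

    error_rate = 2.0 ** -35      # assumed DRAM errors per bit per hour
    exposure = 1e-3 / 3600.0     # one millisecond, expressed in hours
    bits_at_risk = 2             # the differing bit exists in both copies

    p_false_match = bits_at_risk * error_rate * exposure
    print(math.log2(p_false_match))  # about -55.8, i.e. roughly 1 in 2^56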



Proof

Posted Feb 27, 2017 15:56 UTC (Mon) by hmh (subscriber, #3838)

Not with ECC. If you don't have it, well... hopefully you're using something that detects corruption of data at rest in memory.

Also, as far as the crypto goes, it has been proved that a combination of several hashes is approximately as strong as the strongest one, at least against brute force.

It would make sense to use very different hashes of the same strength as a hedge against a weakness being found in one of them. But that would mean, e.g., using both SHA2-256 and SHA3-256 (SHA-3 is a very different construction from SHA-2 precisely for that reason; otherwise BLAKE might have been selected as SHA-3). And the total strength would be 256 bits, not more.
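
As a minimal sketch of what that hedge could look like in practice (the function is illustrative, not from any real system):

    import hashlib

    def combined_digest(data: bytes) -> bytes:
        # Concatenate two structurally different 256-bit hashes: a false
        # match here would require colliding SHA-256 and SHA3-256 on the
        # same inputs simultaneously.
        return hashlib.sha256(data).digest() + hashlib.sha3_256(data).digest()

The combined digest is 512 bits long, but per the argument above its effective strength is still about 256 bits; the payoff is robustness against one function breaking, not extra bits.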

Proof

Posted Feb 27, 2017 17:45 UTC (Mon) by excors (subscriber, #95769)

I used the numbers for ECC-detected correctable errors (which should be harmless) from https://research.google.com/pubs/pub35162.html, but that seems to indicate detectable uncorrectable errors are only ~40 times rarer. Presumably some undetectable multi-bit errors are possible with ECC too, though very much rarer again. But still astronomically more likely than an accidental hash collision.

The point is that you can't practically divide algorithms into "definitely correct" vs "probably correct but relies on being lucky" - everything running on hardware has some non-zero chance of giving the wrong result. Instead you need to look at how close to zero those chances are, and divide them into "absurdly unlikely to ever fail even if every computer in the world runs this algorithm for a million years" vs "might actually fail in practice", and a non-broken hash function fits into the first category. (memcmp on a system with ECC RAM probably goes in the first category too, but memcmp without ECC probably doesn't.)
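
A rough Fermi estimate makes the two categories vivid (all inputs are assumptions for illustration, not measurements):

    computers = 1e10          # assumed number of computers in the world
    rate = 1e9                # assumed comparisons per second, per computer
    seconds = 1e6 * 3.15e7    # one million years, in seconds
    trials = computers * rate * seconds

    p_hash = 2.0 ** -256      # accidental collision, non-broken 256-bit hash
    p_flip = 2.0 ** -56       # bit flip during a 1 ms non-ECC memcmp (above)

    print(trials * p_hash)    # ~3e-45 expected failures: the first category
    print(trials * p_flip)    # ~4e+15 expected failures: the second category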

(This is separate from the issue that non-broken hash functions have a tendency to become broken hash functions after a few decades. Any system that uses a secure hash ought to be designed to migrate cleanly to a new hash function in the future.)
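
One simple way to leave that migration path open is to store digests tagged with the algorithm that produced them. A hypothetical sketch (the labels and helper names are illustrative, not any real system's format):

    import hashlib

    ALGORITHMS = {"sha256": hashlib.sha256, "sha3_256": hashlib.sha3_256}

    def tagged_digest(data: bytes, algo: str = "sha256") -> str:
        # Prefix the hex digest with an algorithm label, so objects hashed
        # today can coexist with objects hashed by a successor function.
        return algo + ":" + ALGORITHMS[algo](data).hexdigest()

    def verify(data: bytes, tag: str) -> bool:
        algo, _, expected = tag.partition(":")
        return ALGORITHMS[algo](data).hexdigest() == expected

Old tags keep verifying after a new algorithm is added to the table, so a system can re-hash its objects gradually rather than in one flag day.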

