Often, like when using hashes in a dictionary-implementation in a programming-language, we care most about average-performance, and not at all about collisions as such (except if they're common enough to impact average performance).
Other times, like when making a hash to digitally sign, we care *very* much that it be *exceedingly* difficult to find another input that hashes to the same thing, but we care relatively little about the performance of the hash-function.
What this points to is that programming-languages should probably care somewhat more about worst-case performance, even to the detriment of average-case performance, because pathological worst-case performance can frequently lead to denial-of-service problems when things are implemented using these techniques.
A sort that takes (100 n log n) in the average case, with worst-case performance of (200 n log n), can thus be preferable to a sort that takes (50 n log n) in the average case but degrades to (50 n^2) in the worst case.
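To put numbers on that trade-off, here is a quick sketch comparing the two worst cases using the hypothetical constants from the example above (arbitrary "step" units, not measurements):

```python
import math

# Hypothetical cost models from the example above, in arbitrary "steps".
def worst_fast(n):
    # The sort that is 50 n log n on average but quadratic in the worst case.
    return 50 * n * n

def worst_safe(n):
    # The sort that is 100 n log n on average, 200 n log n in the worst case.
    return 200 * n * math.log2(n)

for n in (1_000, 1_000_000):
    ratio = worst_fast(n) / worst_safe(n)
    print(f"n={n}: quadratic worst case costs {ratio:.0f}x more")
```

The gap widens with n: the quadratic sort is about 25x worse at a thousand elements, and over ten thousand times worse at a million, which is exactly the kind of degradation an attacker feeding crafted input can exploit.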
The thing that makes security different from engineering is that engineering is programming Murphy's computer - everything that can go wrong will go wrong every now and then. But security, in the face of a deliberate attacker, is programming Satan's computer: everything that can go wrong will be *deliberately* forced to go wrong, again and again.
So yes, collision resistance is not part of the definition of "hash function"; indeed, a hash-function is 'any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum'. (Some of these may be *poor* hash-functions for many purposes, but even poor hash-functions are hash-functions.)
There are many different sub-families of hash-functions. Collision-resistance *is* part of the definition of "cryptographically strong hash-function", which is one such sub-family.
It's usually a trade-off: the functions with good collision-avoidance even for pathological inputs are usually much more expensive to compute. And *any* hash-function has an upper bound on collision-avoidance limited by the size of the hash. If you're using 32-bit hashes, the birthday paradox means you can *always* generate collisions in ~2^16 attempts. Someone with a bit of CPU time can thus produce (for example) a list of email-addresses to enter into a form that contains 100 email-addresses that *all* hash to the same value, possibly giving pathological performance for whatever you're trying to do with them after collecting them.
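The birthday search itself is only a few lines. This sketch truncates SHA-256 to 16 bits so it finds a collision almost instantly; the identical loop against a 32-bit hash needs on the order of 2^16 attempts (the h16 name and the @example.com addresses are made up for illustration):

```python
import hashlib
from itertools import count

def h16(data: bytes) -> int:
    """Toy 16-bit hash: the first two bytes of SHA-256.
    A stand-in for any small hash an application might use."""
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def find_collision():
    """Birthday attack: remember every digest seen until one repeats.
    Expected cost is roughly sqrt(2^16), i.e. a few hundred hashes."""
    seen = {}
    for i in count():
        msg = f"user{i}@example.com".encode()
        d = h16(msg)
        if d in seen:
            return seen[d], msg
        seen[d] = msg

a, b = find_collision()
assert a != b and h16(a) == h16(b)
print(f"{a.decode()} and {b.decode()} share hash {h16(a):#06x}")
```

Collecting 100 inputs into the *same* bucket just means continuing past the first repeat; for a 16-bit hash that's on the order of 100 * 2^16 hash calls, well within reach of any laptop.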
Even 64-bit hashes aren't immune to this, and using hashes substantially larger than the word-size of the architecture you're running on tends to put substantial dents in average-case lookup-times. But perhaps this is a cost we have to accept to prevent worst-case scenarios that lead to DoS.
Upgrading a simple 32-bit hash to a cryptographically strong hash of at least 128 bits means taking a *substantial* performance-hit, though.
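A rough way to see the size of that hit on your own machine is to time one fast 32-bit checksum against one cryptographic hash (absolute numbers vary by machine and input length; this only shows the relative gap):

```python
import timeit
import zlib
import hashlib

data = b"someone@example.com"

# 100k iterations each of a 32-bit checksum and a 256-bit cryptographic hash.
t_crc = timeit.timeit(lambda: zlib.crc32(data), number=100_000)
t_sha = timeit.timeit(lambda: hashlib.sha256(data).digest(), number=100_000)

print(f"crc32: {t_crc:.3f}s  sha256: {t_sha:.3f}s  (~{t_sha / t_crc:.0f}x slower)")
```

(CRC32 is used here only as a representative cheap 32-bit function; it is not what any particular language's dictionary uses, and it has no collision-resistance at all.)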
Copyright © 2017, Eklektix, Inc.