Worst case performance and "impossibly unlikely" conditions
Posted Jan 11, 2012 23:56 UTC (Wed) by nix
In reply to: Worst case performance and "impossibly unlikely" conditions
Parent article: Denial of service via hash collisions
The slower ordered-tree based dictionary implementations
... aren't necessarily slower. The last hash I implemented was a modified-splay-tree-based system where each get of a given hash item rotated it 1/Nth of the way towards the top of the tree (N being a customizable value). Finding a random item scales as O(lg n), which is admittedly slower than the O(1) of a perfect hash or sufficiently empty bucket hash -- but repeated searches on the same set of keys end up O(1) or O(m) where m is the size of the searched set, and additions are always O(lg n) as well. Bucket hashes can't promise that kind of per-operation bound, as they have to be periodically expanded and perhaps contracted for efficiency, leading to nasty spikes in the time profile. Having O(1) operations suddenly take on the order of O(n)-with-a-large-constant time while an expansion happens is... not good. I know it amortizes to O(1), but still it's a jagged and nasty way to do it.
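To make the "rotate it 1/Nth of the way towards the top" idea concrete, here is a minimal Python sketch. It is not the original implementation, which isn't public. The class name PartialSplayMap, the fraction_n parameter and the depth_of helper are all made up for illustration, keys are compared directly rather than by hash value, and plain single rotations stand in for real splay steps, so none of the usual splay-tree bounds should be assumed for it.

```python
import math


class _Node:
    __slots__ = ("key", "value", "left", "right", "parent")

    def __init__(self, key, value, parent=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.parent = parent


class PartialSplayMap:
    """Unbalanced BST whose get() rotates the hit node part of the way up.

    Hypothetical illustration only: keys must be mutually comparable.
    """

    def __init__(self, fraction_n=2):
        self.root = None
        self.fraction_n = fraction_n  # the "N" in "1/Nth of the way to the top"

    def _rotate_up(self, x):
        """Single rotation: x swaps places with its parent, key order is kept."""
        p = x.parent
        g = p.parent
        if p.left is x:
            p.left = x.right
            if x.right is not None:
                x.right.parent = p
            x.right = p
        else:
            p.right = x.left
            if x.left is not None:
                x.left.parent = p
            x.left = p
        p.parent = x
        x.parent = g
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x

    def _find(self, key):
        """Return (node, depth); node is None when the key is absent."""
        node, depth = self.root, 0
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
            depth += 1
        return node, depth

    def put(self, key, value):
        if self.root is None:
            self.root = _Node(key, value)
            return
        node = self.root
        while True:
            if key == node.key:
                node.value = value
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, _Node(key, value, parent=node))
                return
            node = child

    def get(self, key, default=None):
        node, depth = self._find(key)
        if node is None:
            return default
        # Each rotation lifts the node one level, so this covers
        # roughly 1/N of the distance to the root.
        for _ in range(math.ceil(depth / self.fraction_n)):
            self._rotate_up(node)
        return node.value

    def depth_of(self, key):
        return self._find(key)[1]
```

Inserting 1 through 8 in order gives a degenerate chain with key 8 at depth 7. With fraction_n=2, one get(8) lifts it to depth 3 and two more bring it to the root, which is the "hot keys drift upwards" behaviour described above. Note that there is no resize path anywhere in the sketch.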
That splay-tree-based hash had other nice properties, such as an iteration order stable under non-colliding modification, and even stable under collision with certain limitations (if you deleted the currently-iterated value, and earlier-iterated values had colliding hash values, you would see such values more than once). It was also really easy to lock for multithreaded use, and copying and freeing it was pleasantly easy.
But its primary attraction to me was the limited conditional count. Because of the absence of expansion code, the only conditional in the whole thing other than the odd-or-even part of the tree rotation code was the trivial five-line code to handle hash collisions, which could have been omitted had I chosen to record full 64-bit hash values. There was no rarely-executed expansion code, no rehashing, nothing rarely executed to harbour bugs. Or there wasn't until I added extra code to efficiently pack small-enough data items into the item pointers, anyway...
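For a sense of how small the collision handling can be when the tree is ordered by hash value, here is a rough Python sketch. Again it is not the original code: HashNode, short_hash, insert and lookup are hypothetical names, and the hash is deliberately cut down to 8 bits so that collisions are easy to provoke. Rotation is left out to keep the collision branch visible.

```python
HASH_BITS = 8  # deliberately tiny (an assumption) so collisions are easy to hit


def short_hash(key):
    """Truncate Python's built-in hash to HASH_BITS bits."""
    return hash(key) & ((1 << HASH_BITS) - 1)


class HashNode:
    """Tree node ordered by hash value; colliding keys share one node."""

    def __init__(self, h, key, value):
        self.h = h
        self.entries = [(key, value)]
        self.left = None
        self.right = None


def insert(root, key, value, hash_fn=short_hash):
    """Insert or update key; returns the (possibly new) root."""
    h = hash_fn(key)
    if root is None:
        return HashNode(h, key, value)
    node = root
    while True:
        if h == node.h:
            # The whole collision branch: update in place or append.
            for i, (k, _) in enumerate(node.entries):
                if k == key:
                    node.entries[i] = (key, value)
                    return root
            node.entries.append((key, value))
            return root
        side = "left" if h < node.h else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, HashNode(h, key, value))
            return root
        node = child


def lookup(root, key, default=None, hash_fn=short_hash):
    h = hash_fn(key)
    node = root
    while node is not None and node.h != h:
        node = node.left if h < node.h else node.right
    if node is None:
        return default
    for k, v in node.entries:
        if k == key:
            return v
    return default
```

With 8-bit hashes the integer keys 1 and 257 land in the same node, and the short loop over entries is all that tells them apart. The wider the stored hash, the less often that loop has more than one entry to look at.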
(The thing was under a proprietary license thanks to my old free-software-phobic employer, but I will probably be reimplementing it and a number of related data structures under the BSD license sometime soonish. I'm sick of not having them available. :)