|
|
Subscribe / Log in / New account

Bucket expansion

Bucket expansion

Posted Jul 19, 2024 14:08 UTC (Fri) by excors (subscriber, #95769)
In reply to: Bucket expansion by paulj
Parent article: A hash table by any other name

I believe it's more normal to say hash table insert has O(1) amortized cost. Specifically that means a sequence of n inserts takes O(n) time (in the strict upper-bound sense of 'O'), which is counted as an average of O(n)/n = O(1) per insert. That's still the worst case performance (in contrast to e.g. a probabilistic average-case performance analysis), it's just using the worst case over a sequence rather than a single operation. And that's usually a better way to look at it because a single operation is usually so fast that nobody cares about its performance, they care about long sequences.

(As far as I can tell from some searching, the equivalent amortized analysis of a B-tree says an insert still costs O(log n), unless you're starting from an empty tree in which case any sequence of n insert/delete operations takes O(n) time and therefore each operation has O(1) amortized cost. Hash table operations are O(1) amortized cost regardless of the initial state, so the O(1) is more widely applicable there.)


to post comments

Bucket expansion

Posted Jul 19, 2024 15:53 UTC (Fri) by paulj (subscriber, #341) [Link]

O(f(n)) is by definition a bound over the worst-case case input to a function.

Perhaps from an engineering perspective it is interesting to consider amortised costs (call that f(n) \in A(g(n), say), but if you have a function whose behaviour can vary greatly over different inputs, then what does "amortised" even mean? One work load could produce A(1) behaviour, in terms of time complexity, another A(n^2), another A(n!) perhaps. If that work load can be influenced by others, then really you must consider only the worst-case, or those others can trigger a DoS potentially. So the more formal definition of O(g(n)) is actually the more useful one, for anything with security consequences.

Hash tables have terrible security properties, because they are O(n) - not O(1). You can take mitigations (cryptographically secure hash function [wait, you were worried about cycles?], oversized table for expected loads), but it's not that hard for people to miss those or get it wrong or... simply over time for typical work-loads to grow much larger than originally anticipated (more pages, more packets, more hosts, etc.).


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds