One thing I miss (but that maybe comes later) is the effect of all this on
common-sense optimizations such as binary search. E.g., it would
be interesting for what kind of working sets binary search (having to look
random to even current prefetchers) actually starts to perform better than
linear search.
It would be a nice example of how a O(N) algorithm can dramatically
outperform a O(log N) algorithm, due to memory effects. I'm sure there's
lots more of common wisdom that turns into a lie on modern processors.
Posted Oct 24, 2007 16:21 UTC (Wed) by JoeBuck (subscriber, #2330)
[Link]
If the array all fits in cache, binary search wins: both linear and binary search, at most, load the data into cache once. If the array is extremely large, binary search still wins despite the wasted reads of entire cache lines, because at some point N/log(N) exceeds the miss penalty. There may be intermediate values where binary search loses, when the random access causes the cache miss rate to soar and this isn't compensated for adequately by the reduced expected number of element accesses.
On very small arrays binary search loses - did so for years
Posted Oct 24, 2007 20:29 UTC (Wed) by khim (subscriber, #9252)
[Link]
If you just have 4-5 elements in the array then initial complexity of binary search will dominate - with or without memory access effects. At some point binary search will win - but it'll be good exercise to find this point today...
Actually it's true for most algorithms: bubble sort also wins for small arrays, for example. Usually trivial algorithms work well for small sizes but not-so-well for medium-to-large sizes. Usually it's not worth the effort to optimize for small sizes: it's fast not matter what algorithm you are using. If your program is spending 90% of time searchive 5-element arrays - it's different story, of course, but it's rare in practice...
Missing: conventional wisdom turned into lie
Posted Oct 25, 2007 10:48 UTC (Thu) by mmutz (subscriber, #5642)
[Link]
> If the array all fits in cache, binary search wins: both linear and
> binary search, at most, load the data into cache once. If the array is
> extremely large, binary search still wins despite the wasted reads of
> entire cache lines, because at some point N/log(N) exceeds the miss
> penalty.
This is exactly the conventional wisdom I think isn't true anymore for
some realistic examples. You're missing the effect of the hardware
prefetcher, which will, for some sizes, and esp. for cold caches lead to a
performance increase for the linear search, as opposed to binary search,
which looks like random access to the prefetcher, and therefore incurs the
RAM->cache latency cost typically hidden by the prefetcher.