|
|

# Heapsort is useless in real world...

## Heapsort is useless in real world...

Posted Jan 20, 2012 1:45 UTC (Fri) by Wol (guest, #4433)
In reply to: Heapsort is useless in real world... by khim
Parent article: Denial of service via hash collisions

I'd say heapsort is also a very good choice if you want to process data in sorted order. Ie "get first item, process, get next item, process, rinse, repeat".

But there you don't care too much about time and "pathological" cases, because sorting is a minor part of your costs.

Cheers,
Wol

Heapsort is useless in real world...

Posted Jan 20, 2012 7:55 UTC (Fri) by ekj (guest, #1524) [Link]

If the "process data" step is constant time - i.e. the work performed to process one item is independent of how many items exist, then that's O(n).

If the pathological sort-case is O(n^2), then no matter how large the constant separating these two cases is, for some amounts of items, the sort will dominate.

You might have a "process data" step that takes 50ms, and a sort that you expect to take 0.2ms * n * log2(n) and conclude that processing a million items, should take ~21 minutes. (about 20 minutes for processing, 1 minute for sorting)

If pathological data causes that sort to instead take 0.2ms * n * n time, then you're looking at 20 minutes of processing, and 6 years of sorting, which is clearly likely to be a DOS.

Forcing n^2 where n*log2(n) is expected is a huge deal for largish data-sets (a million items is large, but not gargantuan), it ups the workload by a factor of 50000. Thus even if your sort *was* a small fraction of the expected work, it's now dominant.

Heapsort is useless in real world...

Posted Jan 20, 2012 22:45 UTC (Fri) by Wol (guest, #4433) [Link]

Except that, is there a pathological case for heap sort? It's actually a very well behaved sort. Setting up the heap is the most unpredictable bit, and the most pathological case (an already-sorted array) isn't that much worse than a random array.

And once the heap is created, it's completely predictable. For a heap of 2^x items, it will require x comparisons and x swaps to get the next value. (Give or take an off-by-one error on my part :-)

Cheers,
Wol

Posted Jan 22, 2012 0:54 UTC (Sun) by khim (subscriber, #9252) [Link]

Except that, is there a pathological case for heap sort?

Sometimes it's good idea to actually look up further upthread.

It's actually a very well behaved sort.

No, it's absolutely not. It's quite good WRT to comparisons and swaps, but it's absolute disaster WRT to memory access. If you data fit in L1 cache (16K-32K) it works just fine, if you hit L2 and/or L3 cache you start doing worse then merge sort or quick sort, but difference is tolerable (typically less then 2x). The real disaster happens once you hit main memory: to read one number from random address in memory you need about 250-300 ticks (and this is what happens with heapsort because prefetch can not cope with it's memory access patterns), while sustained sequential access gives you next 32bit number in 2-3 ticks. Since some part of heap will be in cache even for large arrays heapsort does not suddenly become 50-100 times slower then mergesort when you pass L3 border, to get to this level of difference you need hundreds of gigabytes of data.