What happened?
Posted Jan 14, 2012 21:45 UTC (Sat) by
cmccabe (guest, #60281)
In reply to:
What happened? by khim
Parent article:
Denial of service via hash collisions
Hmm. I didn't expect this to be so controversial.
Here is a quick example of median of medians.
#!/usr/bin/python
def find_median(list):
n = len(list)
list2 = sorted(list)
if ((n % 2) == 0):
return (list2[int(n/2)] + list2[int(n/2) + 1]) / 2
else:
return list2[int(n/2)]
def median_of_medians(k, list):
n = len(list)
if (n == k):
return find_median(list)
m = []
for i in range(0, n, k):
m.append(median_of_medians(k, list[i:i+k]))
return find_median(m)
list = [3, 1, 4, 4, 5, 9, 9, 8, 2]
print find_median(list)
print median_of_medians(3, list)
You can see that the median of our example list is 4, but the "median of medians" (with k=3) is 5. The median of medians is not the same as the median.
However, you can use median-of-medians as the pivot selection heuristic for quickselect. If you do this, you are guaranteed O(n) worst case running time for quickselect.
So yes, in theory, you could use quicksort with quickselect as the pivot selection algorithm, and median-of-medians as the pivot selection algorithm for quickselect. And it would all be N log N. However, the constant factors would be horrendous. You'd basically be doing 3x as much work as the guy who just used merge sort. That's why the Java guys went with merge sort.
If you want to know even more, check out:
http://martinsjava.blogspot.com/2009/03/test-this-is-code-more-code-end-test.html.
TL;DR: it's easy to code an in-place quicksort, and it's still useful for some things. However, it can't be made adversary-proof without losing a lot of its real-world performance advantage over other N log N sorts.
(
Log in to post comments)