Are you moron or just play one on TV?
Posted Jan 14, 2012 23:39 UTC (Sat) by khim
In reply to: What happened?
Parent article: Denial of service via hash collisions
Hmm. I didn't expect this to be so controversial.
There are no controversy. Just pure obstinacy.
Here is a quick example of median of medians.
list2 = sorted(list)...
Brilliant idea. Not. This single line guarantees that your median_of_medians function will need at least O(N log N) operations in some cases. This, in turn, will probably mean that your “outer sort algorithm” will have at least O(N log²N) complexity, not desired O(N log N). If you'll use it recursively then situation can be even worse (not sure if you can push it all the way to O(N²): for that to happen you'll need to hit “worst case” often enough and I'm not 100% sure it's possible to organize).
Let me remind you the story of whole so called “controversy”:
HelloWorld: there are algorithms that can chose the median (and hence the optimal pivot element) from a list in O(n) ⇨ TRUTH
cmccabe: You're thinking of quickselect ⇨ NONSENSE¹)
HelloWorld: There are other algorithms (such as the "median of medians" algorithm) that have a worst-case performance of O(n) ⇨ TRUTH
cmccabe: The median of medians, though possibly a useful thing, is not the same thing as the median ⇨ IRRELEVANT²)
khim: The first result gives you link to Median of Medians algorithm - and yes, this algorithm can be used to find median in linear time ⇨ TRUTH
nybble41: The result of the algorithm is, as the name implies, the median of a list of subgroup medians ⇨ NONSENSE³)
khim: The end result is true median ⇨ TRUTH⁴)
cmccabe: However, you can use median-of-medians as the pivot selection heuristic for quickselect. If you do this, you are guaranteed O(n) worst case running time for quickselect ⇨ NONSENSE⁵)
As you can see the whole controversy is kind of trivial: one side postulates true (and relevant!) facts while other side either says incorrect assertions or correct yet entirely irrelevant facts.
I'm not sure why it's so hard for you to accept the fact that Median of Medians algorithm's goal is not to find median of medians (it uses median of medians as intermediate step) - but, well, that's the root of the whole controversy. If you'll stop feeling indignant for just a moment and spent five minutes to actually take a look on Median of Medians algorithm (Wikipedia does pretty good job explaining it) then suddenly the while discussion will stop being controversial.
If you want to know even more, check out: http://martinsjava.blogspot.com/2009/03/test-this-is-code-more-code-end-test.html
AFAICS this article tells nothing about median selection - it just points out that “median of three” can not save quicksort. Which was kind of expected from the beginning but it's still nice to have formal proof.
That's why the Java guys went with merge sort.
Another fail. Java guys have not “went with merge sort”. They actually use both tuned quicksort and merge sort. Apparently they felt it'll be good idea to spice life of a programmer. Need to sort Int ? Sure, be my guest - O(N log N) is guaranteed. Managed to switch int to save on allocations? Have a nice surprise of O(N²) sort! Otherwise life will be dull, you know...
¹) Obviously here HelloWorld does not talk about quickselect because quickselect does not guarantee O(N) complexity.
²) Here Median of Medians algorithm (100% relevant because it can be used to find median in O(N) time) somehow transformed to median of medians (which is of course entirely different thing).
³) The same story again: Median of Medians algorithm phrase can fit in twitterbrain, linear algorithm for the general case of selecting the kth largest element was published (which is explanation if what the said algorithm does) overflows it. We are Twitter generation! 100 character, 14 words, 1 click? NOOOOOO! That's too long! That's too hard!
⁴) Here I've tried to simplify algorithm's goal description to fit them in twitterbrain - apparently without any success
⁵) This particular “find median of medians” algorithm requires at least O(N log N) operations which gives us at least O(N log²N) complexity. Close, but no cigar.
to post comments)