* The clever pivot selection algorithm is still recursive. Yes, it's a recursive algorithm within another recursive algorithm. We are very clever, aren't we.
* When choosing a pivot for quickselect with the method I described, you need to have k=5 rather than k=3; otherwise the quickselect can still go n^2.
* Your prose reminds me "time cube." But that's probably because I was "educated stupid and evil."
Yet another small correction...
Posted Jan 19, 2012 8:42 UTC (Thu) by khim (subscriber, #9252) [Link]
When choosing a pivot for quickselect with the method I described, you need to have k=5 rather than k=3; otherwise the quickselect can still go n^2.
Sadly this is not true. Your argorithm will still introduce disbalance on each step - even with "k == 5". Disbalance will be smaller (2+0.3N/3+0.7N instead of 1+⅓N/2+⅔N), but recursive calls will still compound it thus the whole algorithm will have larger then O(N log N) complexity (most probably O(N²) with "evil source").
Median of Medians algorithm uses two recursive calls to battle this phenomenon: it finds "more-or-less Ok median" (30%/70%) using first recursive call with 0.2N elements and then it "fixes it" using another recursive call with no more then 0.7N elements. Disbalance is fixed at each step thus it does not grow beyond certain point (30%/70%) no matter how many steps there are - and the whole thing needs O(N) operations: T(N) ≤ c*N*(1 + (9/10) + (9/10)² + …) = O(N). If you'll use "k == 3" then your first pass will use ⅓N elements and second pass will use ⅔N elements and this will mean T(N) ≤ c*N*(1 + 1 + …) ≄ O(N).
Another note...
Posted Jan 19, 2012 9:23 UTC (Thu) by khim (subscriber, #9252) [Link]
There are another interesting fact related to Median of Medians algorithm: k must be at least 5 only because algorithm looks for kth largest element. In this case “k == 3” just does not work (as I've noted above). However if want to find median then not only "k == 3" works, it usually works better then “k == 5”. This is true because in the "find median" task you can throw away not just “top left” xor “bottom right” elements (as pictured in Wikipedia's illustration) but you can throw away both “top left” and “bottom right” elements. This will lead to complexity of T(N) ≤ C₁*N*(1 + (⅗) + (⅗)² + …) = O(N) for “k == 5” and to complexity of T(N) ≤ C₂*N*(1 + (⅔) + (⅔)² + …) = O(N) for “k == 3”, but ⅗ (for “k == 5”) comes from “⅕ + ⅖” (for two recursive calls) while ⅔ (for “k == 3”) comes from “⅓ + ⅓” (for two recursive calls). Thus in most cases version with “k == 3” is faster (because recursion depth is smaller), but difference is small and you must correctly handle case of N=3M+1...
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds