Title: Order Statistics
1 Order Statistics
2 Order Statistic
- ith order statistic: the ith smallest element of a set of n elements.
- Minimum: first order statistic.
- Maximum: nth order statistic.
- Median: "half-way point" of the set.
  - Unique when n is odd: occurs at i = (n + 1)/2.
  - Two medians when n is even:
    - Lower median, at i = n/2.
    - Upper median, at i = n/2 + 1.
  - For consistency, "median" will refer to the lower median.
3 Selection Problem
- Selection problem:
  - Input: A set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
  - Output: the element x ∈ A that is larger than exactly i - 1 other elements of A.
- Can be solved in O(n lg n) time. How?
- We will study faster linear-time algorithms.
  - For the special cases when i = 1 and i = n.
  - For the general problem.
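One way to answer the "How?" above is to sort first and then index. The sketch below assumes Python's built-in `sorted` as the O(n lg n) sort; the function name `select_by_sorting` is illustrative and not from the slides.

```python
def select_by_sorting(values, i):
    """Return the i-th smallest element (1-indexed) of a collection of distinct numbers."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    # Sorting dominates the cost: O(n lg n). Indexing afterwards is O(1).
    return sorted(values)[i - 1]


data = [7, 2, 9, 4, 1]
print(select_by_sorting(data, 1))  # minimum (first order statistic)
print(select_by_sorting(data, 3))  # median for n = 5, i = (n + 1)/2
print(select_by_sorting(data, 5))  # maximum (nth order statistic)
```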
4 Minimum (Maximum)
- Minimum(A)
- 1. min ← A[1]
- 2. for i ← 2 to length[A]
- 3.     do if min > A[i]
- 4.         then min ← A[i]
- 5. return min
Maximum can be determined similarly.
- T(n) = Θ(n).
- No. of comparisons: n - 1.
- Can we do better? Why not?
- Minimum(A) uses the worst-case optimal number of comparisons: every element except the minimum must lose at least one comparison, so n - 1 comparisons are necessary.
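A minimal Python sketch of Minimum(A) that also counts comparisons, to make the n - 1 figure concrete. It uses 0-based indexing instead of the slide's 1-based indexing, and the returned counter is an addition for illustration.

```python
def minimum(a):
    """Return (smallest element, number of element comparisons) for a non-empty list."""
    if not a:
        raise ValueError("input must be non-empty")
    smallest = a[0]
    comparisons = 0
    for x in a[1:]:
        comparisons += 1      # one comparison per remaining element
        if smallest > x:
            smallest = x
    return smallest, comparisons


value, count = minimum([5, 3, 8, 1, 9])
print(value, count)
```

For an input of length n the counter always ends at n - 1, regardless of the order of the elements.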
5 Problem
Minimum(A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.     do if min > A[i]
4.         then min ← A[i]
5. return min
- Average for random input: How many times do we expect line 4 to be executed?
- X = RV for # of executions of line 4.
- X_i = Indicator RV for the event that line 4 is executed on the ith iteration.
- X = Σ_{i=2..n} X_i
- E[X_i] = 1/i. How?
- Hence, E[X] = Σ_{i=2..n} 1/i = H_n - 1 = ln n + O(1) = Θ(lg n), where H_n is the nth harmonic number.
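The claim E[X_i] = 1/i follows because, for a uniformly random ordering of distinct elements, A[i] is the smallest of the first i elements with probability 1/i. The sketch below checks the resulting total, the sum of 1/i for i = 2..n, exactly for small n by averaging over all permutations. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations


def count_updates(a):
    """Count how often line 4 (min <- A[i]) would execute on the sequence a."""
    smallest = a[0]
    updates = 0
    for x in a[1:]:
        if smallest > x:
            smallest = x
            updates += 1
    return updates


def average_updates(n):
    """Exact average of count_updates over all n! orderings of n distinct values."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_updates(p) for p in perms), len(perms))


def harmonic_minus_one(n):
    """Sum of 1/i for i = 2..n, computed exactly."""
    return sum((Fraction(1, i) for i in range(2, n + 1)), Fraction(0))


for n in range(1, 7):
    print(n, average_updates(n), harmonic_minus_one(n))
```

The two printed columns agree for every n tried, which matches the indicator-variable argument.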
6 Simultaneous Minimum and Maximum
- Some applications need to determine both the maximum and minimum of a set of elements.
  - Example: Graphics program trying to fit a set of points onto a rectangular display.
- Independent determination of maximum and minimum requires 2n - 2 comparisons.
- Can we reduce this number?
  - Yes.
7 Simultaneous Minimum and Maximum
- Maintain minimum and maximum elements seen so far.
- Process elements in pairs.
  - Compare the smaller to the current minimum and the larger to the current maximum.
  - Update current minimum and maximum based on the outcomes.
- No. of comparisons per pair = 3. How?
- No. of pairs ≤ ⌊n/2⌋.
  - For odd n: initialize min and max to A[1]. Pair the remaining elements. So, no. of pairs = ⌊n/2⌋.
  - For even n: initialize min to the smaller of the first pair and max to the larger. So, remaining no. of pairs = (n - 2)/2 < ⌊n/2⌋.
8 Simultaneous Minimum and Maximum
- Total no. of comparisons, C ≤ 3⌊n/2⌋.
  - For odd n: C = 3⌊n/2⌋.
  - For even n: C = 3(n - 2)/2 + 1 (for the initial comparison) = 3n/2 - 2 < 3⌊n/2⌋.
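The pairing scheme can be sketched in Python as follows. Each pair costs three comparisons: one between the two elements, one against the current minimum, and one against the current maximum. The function name and the returned comparison counter are assumptions added for illustration; indexing is 0-based.

```python
def min_and_max(a):
    """Return (minimum, maximum, comparisons) of a non-empty list, processing pairs."""
    n = len(a)
    if n == 0:
        raise ValueError("input must be non-empty")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element initializes both extremes.
        lo = hi = a[0]
        start = 1
    else:
        # Even n: one comparison orders the first pair.
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for j in range(start, n, 2):
        x, y = a[j], a[j + 1]
        comparisons += 3      # within the pair, against lo, against hi
        small, large = (x, y) if x < y else (y, x)
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons


print(min_and_max([3, 1, 4, 15, 5, 9, 2]))     # odd n = 7
print(min_and_max([3, 1, 4, 15, 5, 9, 2, 6]))  # even n = 8
```

For n = 7 the count is 3⌊7/2⌋ = 9, and for n = 8 it is 3·8/2 - 2 = 10, compared with 2n - 2 = 12 and 14 for two independent scans.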
9 General Selection Problem
- Seems more difficult than Minimum or Maximum.
  - Yet, has solutions with the same asymptotic complexity as Minimum and Maximum.
- We will study 2 algorithms for the general problem.
  - One with expected linear-time complexity.
  - A second, whose worst-case complexity is linear.
10 Selection in Expected Linear Time
- Modeled after randomized quicksort.
- Exploits the abilities of Randomized-Partition (RP).
  - RP returns the index k in the sorted order of a randomly chosen element (pivot).
    - If the order statistic we are interested in, i, equals k, then we are done.
  - Else, reduce the problem size using its other ability.
    - RP rearranges the other elements around the random pivot.
    - If i < k, selection can be narrowed down to A[1..k - 1].
    - Else, select the (i - k)th element from A[k + 1..n].
  - (Assuming RP operates on A[1..n]. For A[p..r], change k appropriately.)
11 Randomized Quicksort review
Rnd-Partition(A, p, r)
    i := Random(p, r)
    A[r] ↔ A[i]
    x, i := A[r], p - 1
    for j := p to r - 1 do
        if A[j] ≤ x then
            i := i + 1
            A[i] ↔ A[j]
        fi
    od
    A[i + 1] ↔ A[r]
    return i + 1

Quicksort(A, p, r)
    if p < r then
        q := Rnd-Partition(A, p, r)
        Quicksort(A, p, q - 1)
        Quicksort(A, q + 1, r)
    fi
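[Figure: Partition rearranges A[p..r] around the pivot (5 in the example) into A[p..q - 1] with elements ≤ 5, the pivot itself, and A[q + 1..r] with elements ≥ 5.]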
12 Randomized-Select
- Randomized-Select(A, p, r, i) // select ith order statistic.
- 1. if p = r
- 2.     then return A[p]
- 3. q ← Randomized-Partition(A, p, r)
- 4. k ← q - p + 1
- 5. if i = k
- 6.     then return A[q]
- 7. elseif i < k
- 8.     then return Randomized-Select(A, p, q - 1, i)
- 9. else return Randomized-Select(A, q + 1, r, i - k)
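A runnable Python sketch of Randomized-Select together with the partition routine it relies on. It assumes 0-based, inclusive bounds p and r (the slides use 1-based arrays), keeps i as a 1-based rank, and rearranges the list in place. It is a direct transcription for illustration, not a tuned implementation.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] (inclusive) around a random pivot; return the pivot's final index."""
    k = random.randint(p, r)          # random pivot position in [p, r]
    a[r], a[k] = a[k], a[r]           # move the pivot to the end
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # place the pivot between the two parts
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-based rank) element of a[p..r]."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                     # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    elif i < k:
        return randomized_select(a, p, q - 1, i)
    else:
        return randomized_select(a, q + 1, r, i - k)


data = [13, 7, 21, 3, 18, 9, 1]
# Pass a copy because the list is rearranged in place.
print(randomized_select(list(data), 0, len(data) - 1, 4))  # median of 7 elements
```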
13 Analysis
- Worst-case Complexity:
  - Θ(n²): As we could get unlucky and always recurse on a subarray that is only one element smaller than the previous subarray.
- Average-case Complexity:
  - Θ(n): Intuition: Because the pivot is chosen at random, we expect that we get rid of half of the list each time we choose a random pivot q.
  - Why Θ(n) and not Θ(n lg n)?
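A heuristic way to see the answer, under the idealized assumption that every pivot discards exactly half of the current subarray: selection recurses on one side only, so the partition costs form a geometric series, whereas quicksort recurses on both sides and pays Θ(n) at each of about lg n levels. This is an intuition only, not the formal average-case analysis.

```latex
\[
  T_{\text{select}}(n) \approx n + \frac{n}{2} + \frac{n}{4} + \cdots
  \le n \sum_{j=0}^{\infty} 2^{-j} = 2n = \Theta(n),
  \qquad
  T_{\text{quicksort}}(n) \approx 2\,T_{\text{quicksort}}(n/2) + n = \Theta(n \lg n).
\]
```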
14 Average-case Analysis
- Define Indicator RVs X_k, for 1 ≤ k ≤ n.
  - X_k = I{subarray A[p..q] has exactly k elements}.
  - Pr{subarray A[p..q] has exactly k elements} = 1/n for all k = 1..n.
  - Hence, E[X_k] = 1/n.    (9.1)
- Let T(n) be the RV for the time required by Randomized-Select (RS) on A[p..r] of n elements.
  - Determine an upper bound on E[T(n)].
15 Average-case Analysis
- A call to RS may
  - Terminate immediately with the correct answer,
  - Recurse on A[p..q - 1], or
  - Recurse on A[q + 1..r].
- To obtain an upper bound, assume that the ith smallest element that we want is always in the larger subarray.
- RP takes O(n) time on a problem of size n.
- Hence, recurrence for T(n) is
\[
T(n) \le \sum_{k=1}^{n} X_k \cdot \bigl( T(\max(k-1,\, n-k)) + O(n) \bigr)
\]
- For a given call of RS, X_k = 1 for exactly one value of k, and X_k = 0 for all other k.
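16 Average-case Analysis
\[
\begin{aligned}
E[T(n)] &\le E\Bigl[ \sum_{k=1}^{n} X_k \cdot \bigl( T(\max(k-1,\, n-k)) + O(n) \bigr) \Bigr] \\
        &= \sum_{k=1}^{n} E\bigl[ X_k \cdot T(\max(k-1,\, n-k)) \bigr] + O(n)
           && \text{(by linearity of expectation)} \\
        &= \sum_{k=1}^{n} E[X_k] \cdot E\bigl[ T(\max(k-1,\, n-k)) \bigr] + O(n)
           && \text{(by Eq. (C.23))} \\
        &= \sum_{k=1}^{n} \frac{1}{n} \cdot E\bigl[ T(\max(k-1,\, n-k)) \bigr] + O(n)
           && \text{(by Eq. (9.1))}
\end{aligned}
\]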
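17 Average-case Analysis (Contd.)
The summation over k = 1..n of E[T(max(k - 1, n - k))] is expanded, using
\[
\max(k-1,\, n-k) =
\begin{cases}
k-1 & \text{if } k > \lceil n/2 \rceil, \\
n-k & \text{if } k \le \lceil n/2 \rceil.
\end{cases}
\]
- If n is odd, T(n - 1) thru T(⌈n/2⌉) occur twice and T(⌊n/2⌋) occurs once.
- If n is even, T(n - 1) thru T(⌈n/2⌉) occur twice.
Hence,
\[
E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n).
\]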
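18 Average-case Analysis (Contd.)
- We solve the recurrence E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋..n-1} E[T(k)] + O(n) by substitution.
- Guess E[T(n)] = O(n), i.e., E[T(n)] ≤ cn for some constant c, and let an bound the O(n) term.
\[
\begin{aligned}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an \\
        &\le \frac{3cn}{4} + \frac{c}{2} + an \\
        &= cn - \Bigl( \frac{cn}{4} - \frac{c}{2} - an \Bigr) \\
        &\le cn \quad \text{if } \frac{cn}{4} - \frac{c}{2} - an \ge 0,
\end{aligned}
\]
which holds for c > 4a and n ≥ 2c/(c - 4a).
Thus, if we assume T(n) = O(1) for n < 2c/(c - 4a), we have E[T(n)] = O(n).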
19 Selection in Worst-Case Linear Time
- Algorithm Select:
  - Like Randomized-Select, finds the desired element by recursively partitioning the input array.
  - Unlike Randomized-Select, is deterministic.
    - Uses a variant of the deterministic Partition routine.
    - Partition is told which element to use as the pivot.
  - Achieves linear-time complexity in the worst case by
    - Guaranteeing that the split is always "good" at each Partition.
  - How can a good split be guaranteed?
20 Guaranteeing a Good Split
- We will have a good split if we can ensure that the pivot is the median element or an element close to the median.
- Hence, determining a reasonable pivot is the first step.
21 Choosing a Pivot
- Median-of-Medians:
  - Divide the n elements into ⌈n/5⌉ groups.
    - ⌊n/5⌋ groups contain 5 elements each. The remaining group (if any) contains n mod 5 < 5 elements.
  - Determine the median of each of the groups.
    - Sort each group using Insertion Sort. Pick the median from the sorted list of group elements.
  - Recursively find the median x of the ⌈n/5⌉ medians.
- Recurrence for running time (of median-of-medians):
  - T(n) = O(n) + T(⌈n/5⌉).
22 Algorithm Select
- Determine the median-of-medians x (using the procedure on the previous slide).
- Partition the input array around x using the variant of Partition.
- Let k be the index of x that Partition returns.
- If k = i, then return x.
- Else if i < k, then apply Select recursively to A[1..k - 1] to find the ith smallest element.
- Else if i > k, then apply Select recursively to A[k + 1..n] to find the (i - k)th smallest element.
- (Assumption: Select operates on A[1..n]. For subarrays A[p..r], suitably change k.)
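The steps of Select can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the slides' in-place routine: it assumes distinct elements, partitions by building two new lists rather than using the Partition variant, uses the built-in `sorted` on each group of at most 5 elements in place of Insertion Sort, and takes the lower median throughout.

```python
def select(a, i):
    """Return the i-th smallest (1-based rank) element of a list of distinct numbers."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    if len(a) <= 5:
        return sorted(a)[i - 1]       # constant-size base case

    # Split into groups of 5 (the last group may be smaller) and take
    # the lower median of each group.
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]

    # Recursively find the (lower) median x of the group medians.
    x = select(medians, (len(medians) + 1) // 2)

    # Partition around x (list-based stand-in for the Partition variant).
    smaller = [v for v in a if v < x]
    larger = [v for v in a if v > x]
    k = len(smaller) + 1              # rank of x in a

    if i == k:
        return x
    if i < k:
        return select(smaller, i)
    return select(larger, i - k)


data = [29, 3, 17, 41, 8, 23, 11, 37, 5, 31, 13, 19, 2]
print(select(data, (len(data) + 1) // 2))  # median of 13 distinct values
```

Building new lists costs extra memory but keeps the recursion easy to follow; an in-place version would pass p and r bounds as in Randomized-Select.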
23 Worst-case Split
[Figure: the n elements arranged as ⌊n/5⌋ groups of 5 elements each plus a ⌈n/5⌉th group of n mod 5 elements, with the median-of-medians x marked. Arrows point from larger to smaller elements. The regions of elements < x and elements > x are highlighted.]
24 Worst-case Split
- Assumption: Elements are distinct. Why?
- At least half of the ⌈n/5⌉ medians are greater than or equal to x.
- Thus, at least half of the ⌈n/5⌉ groups contribute 3 elements that are greater than x.
  - The last group and the group containing x may contribute fewer than 3 elements. Exclude these groups.
- Hence, the no. of elements > x is at least
\[
3 \Bigl( \Bigl\lceil \tfrac{1}{2} \lceil n/5 \rceil \Bigr\rceil - 2 \Bigr) \ge \frac{3n}{10} - 6.
\]
- Analogously, the no. of elements < x is at least 3n/10 - 6.
- Thus, in the worst case, Select is called recursively on at most 7n/10 + 6 elements.
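The counting argument can be checked numerically. The sketch below assumes the standard count 3(⌈⌈n/5⌉/2⌉ - 2) for the elements guaranteed to be greater than x, and verifies both the 3n/10 - 6 lower bound and the resulting 7n/10 + 6 upper bound on the recursive call for a range of n. The function name is illustrative.

```python
from fractions import Fraction


def guaranteed_larger(n):
    """Lower bound on the number of elements greater than the median-of-medians."""
    groups = -(-n // 5)              # ceil(n / 5) groups
    half = -(-groups // 2)           # ceil(groups / 2) groups with median >= x
    return 3 * (half - 2)            # exclude x's group and the last group


for n in range(1, 2001):
    bound = Fraction(3 * n, 10) - 6
    assert guaranteed_larger(n) >= bound
    # The recursive call therefore sees at most 7n/10 + 6 elements.
    assert n - guaranteed_larger(n) <= Fraction(7 * n, 10) + 6

print(guaranteed_larger(140), 140 - guaranteed_larger(140))
```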
25 Recurrence for worst-case running time
- T(Select) ≤ T(Median-of-medians) + T(Partition) + T(recursive call to Select)
- T(n) ≤ O(n) + T(⌈n/5⌉) + O(n) + T(7n/10 + 6)
       = T(⌈n/5⌉) + T(7n/10 + 6) + O(n)
  - O(n) + T(⌈n/5⌉) is T(Median-of-medians).
  - The second O(n) is T(Partition).
  - T(7n/10 + 6) is T(recursive call).
- Assume T(n) ≤ Θ(1), for n ≤ 140.
26 Solving the recurrence
- To show: T(n) = O(n), i.e., T(n) ≤ cn for suitable c and all n > 0.
- Assume: T(n) ≤ cn for suitable c and all n ≤ 140.
- Substituting the inductive hypothesis into the recurrence, with an bounding the O(n) term,
  - T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
         ≤ cn/5 + c + 7cn/10 + 6c + an
         = 9cn/10 + 7c + an
         = cn + (-cn/10 + 7c + an)
         ≤ cn, if -cn/10 + 7c + an ≤ 0.
- Condition: -cn/10 + 7c + an ≤ 0 ⇔ c ≥ 10a(n/(n - 70)), when n > 70.
- n/(n - 70) is a decreasing function of n. Verify.
  - For n ≥ 140, c ≥ 20a suffices.
- Hence, c can be chosen for any n = n0 > 70, provided it can be assumed that T(n) = O(1) for n ≤ n0.
- Thus, Select has linear-time complexity in the worst case.
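A small numeric check of the last step, using exact rational arithmetic. It assumes a = 1 purely for illustration (the argument scales linearly in a) and confirms that c = 20a makes -cn/10 + 7c + an non-positive for n ≥ 140, and that n/(n - 70) decreases for n > 70.

```python
from fractions import Fraction


def excess(c, a, n):
    """Return -cn/10 + 7c + an; the inductive step T(n) <= cn needs this to be <= 0."""
    return Fraction(-c * n, 10) + 7 * c + a * n


a = 1            # illustrative constant for the O(n) term
c = 20 * a       # the choice suggested for n >= 140

# The condition holds for every n >= 140 in the tested range ...
assert all(excess(c, a, n) <= 0 for n in range(140, 10_000))
# ... and fails just below the threshold.
assert excess(c, a, 139) > 0

# n/(n - 70) is strictly decreasing for n > 70.
ratios = [Fraction(n, n - 70) for n in range(71, 500)]
assert all(x > y for x, y in zip(ratios, ratios[1:]))

print(excess(c, a, 140), Fraction(140, 140 - 70))
```

With c = 20a the expression simplifies to a(140 - n), which makes the threshold n = 140 visible directly.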