Order Statistics - PowerPoint PPT Presentation

About This Presentation
Title:

Order Statistics

Description:

Maximum: nth order statistic. Median: 'half-way point' of the set. ... Input: A set A of n distinct numbers and a number i, with 1 ≤ i ≤ n. ...

Slides: 27
Provided by: GLAB5
Learn more at: http://www.cs.unc.edu

Transcript and Presenter's Notes

Title: Order Statistics


1
Order Statistics
2
Order Statistic
  • ith order statistic: the ith smallest element of a
    set of n elements.
  • Minimum: first order statistic.
  • Maximum: nth order statistic.
  • Median: "half-way point" of the set.
  • Unique when n is odd: occurs at i = (n+1)/2.
  • Two medians when n is even:
  • Lower median, at i = n/2.
  • Upper median, at i = n/2 + 1.
  • For consistency, "median" will refer to the lower
    median.

3
Selection Problem
  • Selection problem
  • Input: A set A of n distinct numbers and a
    number i, with 1 ≤ i ≤ n.
  • Output: the element x ∈ A that is larger than
    exactly i − 1 other elements of A.
  • Can be solved in O(n lg n) time. How?
  • We will study faster linear-time algorithms:
  • For the special cases when i = 1 and i = n.
  • For the general problem.
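
One way to answer the "How?" above is to sort and index. The following is a minimal Python sketch, not part of the original slides; the function names `select_by_sorting` and `lower_median` are illustrative, and the input is assumed to hold distinct numbers.

```python
def select_by_sorting(values, i):
    """Return the i-th smallest element (1-indexed) of distinct values."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    # Sorting costs O(n lg n); indexing into the sorted list is O(1).
    return sorted(values)[i - 1]


def lower_median(values):
    """Lower median: rank (n+1)/2 for odd n, rank n/2 for even n."""
    n = len(values)
    return select_by_sorting(values, (n + 1) // 2)
```

For example, `select_by_sorting([7, 3, 9, 1, 5], 2)` picks the element that is larger than exactly one other element.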

4
Minimum (Maximum)
  • Minimum(A)
  • 1. min ← A[1]
  • 2. for i ← 2 to length[A]
  • 3.     do if min > A[i]
  • 4.         then min ← A[i]
  • 5. return min

Maximum can be determined similarly.
  • T(n) = Θ(n).
  • No. of comparisons: n − 1.
  • Can we do better? Why not?
  • Minimum(A) has worst-case optimal # of
    comparisons.
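
The pseudocode translates almost line for line into Python. This sketch (the name `minimum` and the returned comparison counter are additions for illustration) uses 0-based indexing and counts element comparisons so the n − 1 figure can be checked.

```python
def minimum(a):
    """Return (smallest element, number of element comparisons)."""
    if not a:
        raise ValueError("minimum() needs at least one element")
    smallest = a[0]              # line 1: min <- A[1]
    comparisons = 0
    for value in a[1:]:          # line 2: for i <- 2 to length[A]
        comparisons += 1
        if smallest > value:     # line 3
            smallest = value     # line 4
    return smallest, comparisons
```

The n − 1 count cannot be improved because every element other than the minimum must lose at least one comparison.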

5
Problem
Minimum(A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.     do if min > A[i]
4.         then min ← A[i]
5. return min
  • Average for random input: How many times
    do we expect line 4 to be executed?
  • X = RV for the # of executions of line 4.
  • X_i = indicator RV for the event that line 4 is
    executed on the ith iteration.
  • X = Σ_{i=2..n} X_i
  • E[X_i] = 1/i. How?
  • Hence, E[X] = Σ_{i=2..n} 1/i = ln(n) − O(1) = Θ(lg n).
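
Line 4 runs on iteration i exactly when A[i] is the smallest of the first i elements, which happens with probability 1/i when all orderings are equally likely. The sketch below checks the resulting sum by brute force for small n. It is an illustration only: the helper names are made up here, and enumerating all n! orderings is feasible only for tiny inputs.

```python
from fractions import Fraction
from itertools import permutations


def count_updates(a):
    """Count how many times line 4 (min <- A[i]) runs on input a."""
    smallest = a[0]
    updates = 0
    for value in a[1:]:
        if smallest > value:
            smallest = value
            updates += 1
    return updates


def expected_updates(n):
    """Exact average of count_updates over all n! orderings of n distinct keys."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_updates(p) for p in perms), len(perms))


def harmonic_tail(n):
    """Sum of 1/i for i = 2..n, the value predicted by E[X_i] = 1/i."""
    return sum((Fraction(1, i) for i in range(2, n + 1)), Fraction(0))
```

Comparing `expected_updates(n)` with `harmonic_tail(n)` for a few small n confirms that the two agree exactly.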

6
Simultaneous Minimum and Maximum
  • Some applications need to determine both the
    maximum and minimum of a set of elements.
  • Example: Graphics program trying to fit a set of
    points onto a rectangular display.
  • Independent determination of maximum and minimum
    requires 2n − 2 comparisons.
  • Can we reduce this number?
  • Yes.

7
Simultaneous Minimum and Maximum
  • Maintain minimum and maximum elements seen so
    far.
  • Process elements in pairs.
  • Compare the smaller to the current minimum and
    the larger to the current maximum.
  • Update current minimum and maximum based on the
    outcomes.
  • No. of comparisons per pair = 3. How?
  • No. of pairs ≤ ⌊n/2⌋.
  • For odd n: initialize min and max to A[1]. Pair
    the remaining elements. So, no. of pairs = ⌊n/2⌋.
  • For even n: initialize min to the smaller of the
    first pair and max to the larger. So, remaining
    no. of pairs = (n − 2)/2 < ⌊n/2⌋.

8
Simultaneous Minimum and Maximum
  • Total no. of comparisons, C ≤ 3⌊n/2⌋.
  • For odd n: C = 3⌊n/2⌋.
  • For even n: C = 3(n − 2)/2 + 1 (for the initial
    comparison)
  •   = 3n/2 − 2 < 3⌊n/2⌋.
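
The pairing scheme can be sketched in Python as follows. The function name `min_and_max` and the comparison counter are assumptions made for illustration; the counter is there only so the totals above can be verified.

```python
def min_and_max(a):
    """Return (min, max, comparisons), processing elements in pairs."""
    n = len(a)
    if n == 0:
        raise ValueError("min_and_max() needs at least one element")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element starts as both min and max.
        lo = hi = a[0]
        start = 1
    else:
        # Even n: one comparison orders the first pair.
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for j in range(start, n, 2):
        # 1 comparison inside the pair, then 1 against min and 1 against max.
        comparisons += 3
        small, large = (a[j], a[j + 1]) if a[j] < a[j + 1] else (a[j + 1], a[j])
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

On 7 elements this uses 3·⌊7/2⌋ = 9 comparisons, and on 8 elements it uses 3·8/2 − 2 = 10.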

9
General Selection Problem
  • Seems more difficult than Minimum or Maximum.
  • Yet, has solutions with same asymptotic
    complexity as Minimum and Maximum.
  • We will study 2 algorithms for the general
    problem.
  • One with expected linear-time complexity.
  • A second, whose worst-case complexity is linear.

10
Selection in Expected Linear Time
  • Modeled after randomized quicksort.
  • Exploits the abilities of Randomized-Partition
    (RP).
  • RP returns the index k in the sorted order of a
    randomly chosen element (pivot).
  • If the order statistic we are interested in, i,
    equals k, then we are done.
  • Else, reduce the problem size using its other
    ability.
  • RP rearranges the other elements around the
    random pivot.
  • If i < k, selection can be narrowed down to
    A[1..k − 1].
  • Else, select the (i − k)th element from
    A[k+1..n].
  • (Assuming RP operates on A[1..n]. For A[p..r],
    change k appropriately.)

11
Randomized Quicksort: review
Rnd-Partition(A, p, r)
    i := Random(p, r)
    A[r] ↔ A[i]
    x, i := A[r], p − 1
    for j := p to r − 1 do
        if A[j] ≤ x then
            i := i + 1
            A[i] ↔ A[j]
        fi
    od
    A[i + 1] ↔ A[r]
    return i + 1

Quicksort(A, p, r)
    if p < r then
        q := Rnd-Partition(A, p, r)
        Quicksort(A, p, q − 1)
        Quicksort(A, q + 1, r)
    fi

[Figure: Partition rearranges A[p..r] around the pivot 5 into A[p..q − 1] (elements ≤ 5), the pivot 5, and A[q+1..r] (elements ≥ 5).]
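
A possible Python rendering of the two review routines is sketched below. It assumes 0-based inclusive bounds p and r; the `rng` parameter is an addition (not in the slides) that makes the random choice reproducible in tests.

```python
import random


def rnd_partition(a, p, r, rng=random):
    """Partition a[p..r] (inclusive) around a random pivot; return its final index."""
    i = rng.randint(p, r)            # pivot position, chosen uniformly
    a[r], a[i] = a[i], a[r]          # move the pivot to the end
    x = a[r]
    i = p - 1                        # a[p..i] holds elements <= x
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # place the pivot between the two parts
    return i + 1


def quicksort(a, p=0, r=None, rng=random):
    """Sort a[p..r] in place using randomized partitioning."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = rnd_partition(a, p, r, rng)
        quicksort(a, p, q - 1, rng)
        quicksort(a, q + 1, r, rng)
```

Calling `quicksort(data)` sorts the list `data` in place; passing `rng=random.Random(0)` fixes the pivot sequence.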
12
Randomized-Select
  • Randomized-Select(A, p, r, i) // select ith
    order statistic.
  • 1. if p = r
  • 2.     then return A[p]
  • 3. q ← Randomized-Partition(A, p, r)
  • 4. k ← q − p + 1
  • 5. if i = k
  • 6.     then return A[q]
  • 7. elseif i < k
  • 8.     then return Randomized-Select(A, p, q − 1, i)
  • 9. else return Randomized-Select(A, q+1, r,
    i − k)
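
The procedure above can be sketched in Python as follows. This is an illustration under a few assumptions: bounds p and r are 0-based and inclusive while the rank i stays 1-based as in the slides, the list is reordered in place, and the `rng` parameter is an addition for reproducibility.

```python
import random


def randomized_partition(a, p, r, rng):
    """Partition a[p..r] (inclusive) around a random pivot; return its final index."""
    i = rng.randint(p, r)
    a[r], a[i] = a[i], a[r]
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i, rng=random):
    """Return the i-th smallest (1-indexed) element of a[p..r]; reorders a."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r, rng)
    k = q - p + 1                    # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    elif i < k:
        return randomized_select(a, p, q - 1, i, rng)
    else:
        return randomized_select(a, q + 1, r, i - k, rng)
```

A typical call for a list `data` of n distinct numbers is `randomized_select(data, 0, len(data) - 1, i)`.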

13
Analysis
  • Worst-case Complexity:
  • Θ(n²): As we could get unlucky and always
    recurse on a subarray that is only one element
    smaller than the previous subarray.
  • Average-case Complexity:
  • Θ(n): Intuition: Because the pivot is chosen at
    random, we expect that we get rid of half of the
    list each time we choose a random pivot q.
  • Why Θ(n) and not Θ(n lg n)?

14
Average-case Analysis
  • Define indicator RVs X_k, for 1 ≤ k ≤ n.
  • X_k = I{subarray A[p..q] has exactly k elements}.
  • Pr{subarray A[p..q] has exactly k elements} = 1/n
    for all k = 1..n.
  • Hence, E[X_k] = 1/n.    (9.1)
  • Let T(n) be the RV for the time required by
    Randomized-Select (RS) on A[p..r] of n elements.
  • Determine an upper bound on E[T(n)].
15
Average-case Analysis
  • A call to RS may
  • Terminate immediately with the correct answer,
  • Recurse on A[p..q − 1], or
  • Recurse on A[q+1..r].
  • To obtain an upper bound, assume that the ith
    smallest element that we want is always in the
    larger subarray.
  • RP takes O(n) time on a problem of size n.
  • Hence, recurrence for T(n) is
  • For a given call of RS, X_k = 1 for exactly one
    value of k, and X_k = 0 for all other k.
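
The recurrence itself did not survive in the transcript. Assuming the slide follows the standard textbook treatment (pivot of rank k leaves subarrays of sizes k − 1 and n − k, and we charge for the larger one), it would read roughly:

```latex
\begin{align*}
T(n) &\le \sum_{k=1}^{n} X_k \cdot \bigl( T(\max(k-1,\, n-k)) + O(n) \bigr) \\
     &= \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n)
\end{align*}
```

The second line uses the fact that exactly one X_k equals 1, so the O(n) partitioning cost is paid once.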

16
Average-case Analysis
(by linearity of expectation)
(by Eq. (C.23))
(by Eq. (9.1))
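
Only the justifications of this slide survive. As a reconstruction under the assumption that the slide follows the standard textbook derivation, and that Eq. (C.23) is the product rule E[XY] = E[X]E[Y] for independent random variables, the chain they annotate would be:

```latex
\begin{align*}
E[T(n)]
  &\le E\!\left[ \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n) \right] \\
  &= \sum_{k=1}^{n} E\bigl[ X_k \cdot T(\max(k-1,\, n-k)) \bigr] + O(n)
     && \text{(by linearity of expectation)} \\
  &= \sum_{k=1}^{n} E[X_k] \cdot E\bigl[ T(\max(k-1,\, n-k)) \bigr] + O(n)
     && \text{(by Eq. (C.23))} \\
  &= \sum_{k=1}^{n} \frac{1}{n} \cdot E\bigl[ T(\max(k-1,\, n-k)) \bigr] + O(n)
     && \text{(by Eq. (9.1))}
\end{align*}
```

The third step relies on X_k being independent of the running time on the chosen subarray.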
17
Average-case Analysis (Contd.)
The summation is expanded
  • If n is odd, T(n − 1) thru T(⌈n/2⌉) occur twice
    and T(⌊n/2⌋) occurs once.
  • If n is even, T(n − 1) thru T(⌈n/2⌉) occur twice.
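
The expanded summation is missing from the transcript. Under the assumption that it matches the usual argument, the two cases above combine into a single bound:

```latex
\begin{align*}
\max(k-1,\, n-k) &=
  \begin{cases}
    k-1 & \text{if } k > \lceil n/2 \rceil, \\
    n-k & \text{if } k \le \lceil n/2 \rceil,
  \end{cases} \\
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n).
\end{align*}
```

Counting every term from T(⌊n/2⌋) to T(n − 1) twice can only overestimate the sum, so the inequality holds for both odd and even n.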

18
Average-case Analysis (Contd.)
  • We solve the recurrence by substitution.
  • Guess E[T(n)] = O(n).

Thus, if we assume T(n) = O(1) for n < 2c/(c −
4a), we have E[T(n)] = O(n).
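
The intermediate steps of the substitution are not in the transcript. A sketch of how they would go, assuming the inductive hypothesis E[T(k)] ≤ ck and writing the O(n) term as an (the constants a and c are the ones in the threshold 2c/(c − 4a) above):

```latex
\begin{align*}
E[T(n)]
  &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an \\
  &= \frac{2c}{n} \left( \frac{(n-1)n}{2}
       - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2} \right) + an \\
  &\le \frac{2c}{n} \left( \frac{(n-1)n}{2}
       - \frac{(n/2 - 2)(n/2 - 1)}{2} \right) + an \\
  &= c \left( \frac{3n}{4} + \frac{1}{2} - \frac{2}{n} \right) + an \\
  &\le \frac{3cn}{4} + \frac{c}{2} + an \\
  &= cn - \left( \frac{cn}{4} - \frac{c}{2} - an \right).
\end{align*}
```

The last expression is at most cn when cn/4 − c/2 − an ≥ 0, that is, when n ≥ 2c/(c − 4a) with c > 4a, which is where the threshold on the slide comes from.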
19
Selection in Worst-Case Linear Time
  • Algorithm Select
  • Like Randomized-Select, finds the desired element
    by recursively partitioning the input array.
  • Unlike Randomized-Select, is deterministic.
  • Uses a variant of the deterministic Partition
    routine.
  • Partition is told which element to use as the
    pivot.
  • Achieves linear-time complexity in the worst case
    by
  • Guaranteeing that the split is always good at
    each Partition.
  • How can a good split be guaranteed?

20
Guaranteeing a Good Split
  • We will have a good split if we can ensure that
    the pivot is the median element or an element
    close to the median.
  • Hence, determining a reasonable pivot is the
    first step.

21
Choosing a Pivot
  • Median-of-Medians
  • Divide the n elements into ⌈n/5⌉ groups.
  • ⌊n/5⌋ groups contain 5 elements each. At most 1 group
    contains the remaining n mod 5 < 5 elements.
  • Determine the median of each of the groups.
  • Sort each group using Insertion Sort. Pick the
    median from the sorted list of group elements.
  • Recursively find the median x of the ⌈n/5⌉
    medians.
  • Recurrence for running time (of
    median-of-medians):
  • T(n) = O(n) + T(⌈n/5⌉).

22
Algorithm Select
  • Determine the median-of-medians x (using the
    procedure on the previous slide.)
  • Partition the input array around x using the
    variant of Partition.
  • Let k be the index of x that Partition returns.
  • If k = i, then return x.
  • Else if i < k, then apply Select recursively to
    A[1..k−1] to find the ith smallest element.
  • Else if i > k, then apply Select recursively to
    A[k+1..n] to find the (i − k)th smallest element.
  • (Assumption: Select operates on A[1..n]. For
    subarrays A[p..r], suitably change k.)
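
The steps above can be sketched in Python. This is a simplified illustration rather than the slides' exact procedure: it assumes distinct elements, builds new lists instead of partitioning in place, uses the built-in `sorted` where the slides use Insertion Sort, and handles inputs of at most 5 elements by sorting directly. The names `select` and `median_of_medians` are chosen here for illustration.

```python
def median_of_medians(a):
    """Pivot choice: lower median of the lower medians of groups of up to 5."""
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Recursive call to select on the ceil(n/5) group medians.
    return select(medians, (len(medians) + 1) // 2)


def select(a, i):
    """Return the i-th smallest (1-indexed) element of the distinct values in a."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    if len(a) <= 5:
        return sorted(a)[i - 1]          # constant-size base case
    x = median_of_medians(a)
    smaller = [v for v in a if v < x]    # plays the role of A[1..k-1]
    larger = [v for v in a if v > x]     # plays the role of A[k+1..n]
    k = len(smaller) + 1                 # rank of the pivot x
    if i == k:
        return x
    if i < k:
        return select(smaller, i)
    return select(larger, i - k)
```

Because the pivot is guaranteed to be near the middle, neither `smaller` nor `larger` can hold more than roughly 7n/10 of the elements.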

23
Worst-case Split
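[Figure: the n elements arranged in groups: ⌊n/5⌋ groups of 5 elements each, plus a ⌈n/5⌉th group of n mod 5 elements. Arrows point from larger to smaller elements. The median-of-medians x is marked, together with the regions labeled "Elements < x" and "Elements > x".]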
24
Worst-case Split
  • Assumption: Elements are distinct. Why?
  • At least half of the ⌈n/5⌉ medians are greater
    than or equal to x.
  • Thus, at least half of the ⌈n/5⌉ groups
    contribute 3 elements that are greater than x.
  • The last group and the group containing x may
    contribute fewer than 3 elements. Exclude these
    groups.
  • Hence, the no. of elements > x is at least
    3n/10 − 6.
  • Analogously, the no. of elements < x is at least
    3n/10 − 6.
  • Thus, in the worst case, Select is called
    recursively on at most 7n/10 + 6 elements.
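
The counting argument behind these bounds can be written out as below. The formula on the slide is missing from the transcript, so this is a reconstruction from the surrounding bullets: half of the groups, minus the two excluded ones, each contribute 3 elements.

```latex
\begin{align*}
\#\{\text{elements} > x\}
  &\ge 3 \left( \left\lceil \frac{1}{2} \left\lceil \frac{n}{5} \right\rceil \right\rceil - 2 \right)
   \ge 3 \left( \frac{n}{10} - 2 \right)
   = \frac{3n}{10} - 6, \\
\#\{\text{elements in the recursive call}\}
  &\le n - \left( \frac{3n}{10} - 6 \right)
   = \frac{7n}{10} + 6 .
\end{align*}
```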

25
Recurrence for worst-case running time
  • T(Select) ≤ T(Median-of-medians) + T(Partition) +
    T(recursive call to Select)
  • T(n) ≤ O(n) + T(⌈n/5⌉) + O(n) + T(7n/10 + 6)
  •      = T(⌈n/5⌉) + T(7n/10 + 6) + O(n)
  • Here O(n) + T(⌈n/5⌉) is T(Median-of-medians), the
    second O(n) is T(Partition), and T(7n/10 + 6) is
    T(recursive call).
  • Assume T(n) ≤ Θ(1), for n ≤ 140.
26
Solving the recurrence
  • To show: T(n) = O(n), i.e., T(n) ≤ cn for suitable c
    and all n > 0.
  • Assume T(n) ≤ cn for suitable c and all n ≤ 140.
  • Substituting the inductive hypothesis into the
    recurrence,
  • T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
  •      ≤ cn/5 + c + 7cn/10 + 6c + an
  •      = 9cn/10 + 7c + an
  •      = cn + (−cn/10 + 7c + an)
  •      ≤ cn, if −cn/10 + 7c + an ≤ 0.
  • −cn/10 + 7c + an ≤ 0 ⇔ c ≥ 10a(n/(n − 70)),
    when n > 70.
  • n/(n − 70) is a decreasing function of n. Verify.
  • For n ≥ 140, n/(n − 70) ≤ 2, so c ≥ 20a suffices.
  • Hence, a suitable c can be chosen for any cutoff
    n0 > 70, provided it can be assumed that T(n) = O(1)
    for n ≤ n0.
  • Thus, Select has linear-time complexity in the
    worst case.