Title: Randomized Algorithms for Selection and Sorting
1 Randomized Algorithms for Selection and Sorting
Analysis of Algorithms
- Prepared by
- John Reif, Ph.D.
2 Randomized Algorithms for Selection and Sorting
- Randomized Sampling
- Selection by Randomized Sampling
- Sorting by Randomized Splitting: Quicksort and Multisample Sorts
3 Readings
- Main Reading Selections
- CLR, Chapters 9
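4 Comparison Problems
- Input: set X of N distinct keys, with a total ordering < over X
- Problems
- 1. For each key $x \in X$: $\mathrm{rank}(x,X) = |\{x' \in X \mid x' < x\}| + 1$
- 2. For each index $i \in \{1, \dots, N\}$: $\mathrm{select}(i,X)$ = the key $x \in X$ where $i = \mathrm{rank}(x,X)$
- 3. $\mathrm{Sort}(X) = (x_1, x_2, \dots, x_N)$ where $x_i = \mathrm{select}(i,X)$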
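The three definitions above can be written out directly. The following is a minimal brute-force sketch in Python, meant only to pin down the 1-based indexing used on these slides; the function names mirror the slide notation, and `sort_keys` is a name chosen here for Sort(X).

```python
def rank(x, X):
    """1-based rank of x in X: one more than the number of keys smaller than x."""
    return sum(1 for y in X if y < x) + 1


def select(i, X):
    """Return the key of X whose rank is i (1 <= i <= len(X)), by brute force."""
    for x in X:
        if rank(x, X) == i:
            return x
    raise IndexError("i must lie in 1..len(X)")


def sort_keys(X):
    """Sort(X) as the tuple (select(1, X), ..., select(N, X))."""
    return tuple(select(i, X) for i in range(1, len(X) + 1))


X = [31, 4, 15, 92, 65]
print(rank(15, X))    # 2, since only the key 4 is smaller than 15
print(select(4, X))   # 65
print(sort_keys(X))   # (4, 15, 31, 65, 92)
```

This version costs a quadratic or cubic number of comparisons; it serves as a reference against which the randomized algorithms can be checked.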
5 Randomized Comparison Tree Model
6 Algorithm Samplerank_s(x,X)
- begin
- Let S be a random sample of $X - \{x\}$ of size s
- output $1 + \frac{N}{s}\,[\mathrm{rank}(x,S) - 1]$
- end
7 Algorithm Samplerank_s(x,X) (contd)
- Lemma 1
- The expected value of samplerank_s(x,X) is rank(x,X)
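A small experiment can illustrate Lemma 1. The sketch below assumes the sample is drawn without replacement and uses the multiplier N/s as written on the slide; with that multiplier the exact mean works out to 1 + N(rank(x,X) - 1)/(N - 1), which is within one of rank(x,X) and converges to it as N grows. The parameter `rng` and the trial counts are choices made for this illustration.

```python
import random


def rank(x, X):
    """1-based rank: number of keys in X smaller than x, plus one."""
    return sum(1 for y in X if y < x) + 1


def samplerank(x, X, s, rng=random):
    """Estimate rank(x, X) from a random sample S of X - {x} of size s."""
    others = [y for y in X if y != x]
    S = rng.sample(others, s)           # sample without replacement
    N = len(X)
    return 1 + (N / s) * (rank(x, S) - 1)


rng = random.Random(0)
X = list(range(1, 1001))
x = 400
estimates = [samplerank(x, X, 50, rng) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
print(rank(x, X), mean_estimate)        # true rank and the empirical mean
```

Each estimate costs only s comparisons, compared with N - 1 for the exact rank; individual estimates are noisy, and only their average settles near the true rank.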
8 Algorithm Samplerank_s(x,X) (contd)
9 S is Random Sample of X of Size s
10 More Precise Bounds on Randomized Sampling (contd)
11 More Precise Bounds on Randomized Sampling
- Let S be a random sample of X
- Let $r_i = \mathrm{rank}(\mathrm{select}(i,S), X)$
12 More Precise Bounds on Randomized Sampling (contd)
- Proof: We can bound $r_i$ by a Beta distribution, implying
- Weak bounds follow from the Chebyshev inequality
- The tighter bounds follow from Chernoff bounds
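The quantity r_i can also be explored empirically. The sketch below draws repeated samples without replacement and averages the rank in X of the i-th smallest sample key. The comparison value i(N+1)/(s+1) is the standard mean of the i-th order statistic of a uniform random s-subset of {1, ..., N}; it is quoted here as a known fact, not as the bound stated on the original slide. The helper name `sample_ranks` and all parameters are chosen for this illustration.

```python
import random


def sample_ranks(X, s, rng=random):
    """Return [r_1, ..., r_s] with r_i = rank(select(i, S), X) for a random sample S."""
    order = sorted(X)
    position = {x: j + 1 for j, x in enumerate(order)}   # key -> 1-based rank in X
    S = sorted(rng.sample(order, s))
    return [position[k] for k in S]


rng = random.Random(0)
N, s, trials = 1000, 9, 4000
X = list(range(N))
totals = [0] * s
for _ in range(trials):
    for j, r in enumerate(sample_ranks(X, s, rng)):
        totals[j] += r
means = [t / trials for t in totals]
expected = [i * (N + 1) / (s + 1) for i in range(1, s + 1)]
for i in range(s):
    print(i + 1, round(means[i], 1), expected[i])
```

The spread of a single r_i around its mean is what the Chebyshev and Chernoff arguments quantify.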
13 Subdivision by Random Sampling
- Let S be a random sample of X of size s
- Let $k_1, k_2, \dots, k_s$ be the elements of S in sorted order
- These elements subdivide X into s+1 subsets $X_1, X_2, \dots, X_{s+1}$
14 Subdivision by Random Sampling (contd)
- How even are these subdivisions?
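One way to get a feel for this question is to subdivide a set by a random sample and look at the block sizes. The sketch below is an illustration under one assumption: each sample key is placed at the top of the block below it, since the slide's exact block definitions did not survive. The name `subdivide` and the sizes used are hypothetical.

```python
import bisect
import random


def subdivide(X, S):
    """Split X into len(S) + 1 blocks using the sorted sample keys as boundaries."""
    keys = sorted(S)
    blocks = [[] for _ in range(len(keys) + 1)]
    for x in X:
        # bisect_left counts the sample keys strictly smaller than x,
        # so a sample key lands at the top of the block below it.
        blocks[bisect.bisect_left(keys, x)].append(x)
    return blocks


rng = random.Random(1)
X = list(range(10_000))
S = rng.sample(X, 99)
blocks = subdivide(X, S)
sizes = [len(b) for b in blocks]
print(len(blocks), max(sizes), len(X) / len(blocks))  # block count, largest block, average
```

The largest block is typically several times the average size N/(s+1), which is the kind of gap that a lemma on subdivision quantifies.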
15 Subdivision by Random Sampling (contd)
- Lemma 3: If the random sample S of X is of size s and X is of size N, then S divides X into subsets each of
16 Subdivision by Random Sampling (contd)
- Proof
- The number of (s+1)-partitions of X is
- The number of partitions of X with one block of size $\ge v$ is
17 Subdivision by Random Sampling (contd)
- So the probability of a random (s+1)-partition having a block of size $\ge v$ is
18 Randomized Algorithms for Selection
- Canonical selection algorithm
- Algorithm canselect(i,X)
- input: set X of N keys, index $i \in \{1, \dots, N\}$
- [0] if N = 1 then output X
- [1] select a bracket B of X, so that $\mathrm{select}(i,X) \in B$ with high probability
- [2] let $i_1$ be the number of keys less than any element of B
- [3] output canselect($i - i_1$, B)
- B is found by random sampling
19 Hoare's Selection Algorithm
- Algorithm Hselect(i,X) where $1 \le i \le N$
- begin
- if $X = \{x\}$ then output x
- else choose a random splitter $k \in X$
- let $B = \{x \in X \mid x < k\}$
- if $|B| \ge i$ then output Hselect(i, B)
- else output Hselect($i - |B|$, $X - B$)
- end
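The Hselect recursion can be sketched in Python as a loop. This is a minimal illustration that follows the slide's case split and assumes distinct keys; the range check on i and the `rng` parameter are additions made here. If the splitter happens to be the minimum, B is empty and the round makes no progress, so the loop simply draws again.

```python
import random


def hselect(i, X, rng=random):
    """Return the i-th smallest key of X (1 <= i <= len(X)); keys must be distinct."""
    X = list(X)
    if not 1 <= i <= len(X):
        raise ValueError("i must lie in 1..len(X)")
    while len(X) > 1:
        k = rng.choice(X)                  # random splitter
        B = [x for x in X if x < k]        # keys below the splitter
        if len(B) >= i:
            X = B                          # answer lies in B, index unchanged
        else:
            i -= len(B)                    # skip the |B| smaller keys
            X = [x for x in X if x >= k]   # X - B, which still contains k
    return X[0]


rng = random.Random(2)
keys = rng.sample(range(1000), 50)
print(hselect(25, keys, rng) == sorted(keys)[24])
```

Each round compares every remaining key with the splitter, which is the cost the following slides set out to reduce.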
20 Hoare's Selection Algorithm (contd)
- Sequential time bound T(i,N) has mean
21 Hoare's Selection Algorithm (contd)
- Random splitter $k \in X$: Hselect(i,X) has two cases
22 Hoare's Selection Algorithm (contd)
- Inefficient: each recursive call requires N comparisons, but only reduces the average problem size to ½ N
23 Improved Randomized Selection
- By Floyd and Rivest
- Algorithm FRselect(i,X)
- begin
- if $X = \{x\}$ then output x
- else choose $k_1, k_2 \in X$ such that $k_1 < k_2$
- let $r_1 = \mathrm{rank}(k_1, X)$, $r_2 = \mathrm{rank}(k_2, X)$
- if $r_1 > i$ then output FRselect(i, $\{x \in X \mid x < k_1\}$)
- else if $r_2 \ge i$ then output FRselect($i - r_1 + 1$, $\{x \in X \mid k_1 \le x \le k_2\}$)
- else output FRselect($i - r_2$, $\{x \in X \mid x > k_2\}$)
- end
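The three-way bracket step can be sketched as follows. The rule for picking k1 and k2 in `choose_bracket` is a hypothetical stand-in (a sample of about sqrt(N) keys and a margin of sqrt(s) sample positions), not the rule from the slides, and the cutoff of 16 keys for sorting directly is also an assumption. The index arithmetic uses 1-based ranks: k1 has rank r1 and stays in the middle bracket, so the new index there is i - r1 + 1.

```python
import math
import random


def rank(x, X):
    """1-based rank: number of keys in X smaller than x, plus one."""
    return sum(1 for y in X if y < x) + 1


def choose_bracket(i, X, rng):
    """Hypothetical rule: two sample keys around position i*s/N, widened by sqrt(s)."""
    N = len(X)
    s = max(2, math.isqrt(N))
    S = sorted(rng.sample(X, s))
    center, margin = i * s / N, math.sqrt(s)
    lo = max(0, min(s - 2, math.floor(center - margin)))
    hi = max(lo + 1, min(s - 1, math.ceil(center + margin)))
    return S[lo], S[hi]                    # k1 < k2 because keys are distinct


def frselect(i, X, rng=random):
    """Return the i-th smallest key of X (1 <= i <= len(X)); keys must be distinct."""
    X = list(X)
    if not 1 <= i <= len(X):
        raise ValueError("i must lie in 1..len(X)")
    while len(X) > 16:                     # small inputs are sorted directly (assumed cutoff)
        k1, k2 = choose_bracket(i, X, rng)
        r1, r2 = rank(k1, X), rank(k2, X)
        if i < r1:                         # answer is below k1
            X = [x for x in X if x < k1]
        elif i <= r2:                      # answer is inside the bracket [k1, k2]
            X = [x for x in X if k1 <= x <= k2]
            i = i - r1 + 1
        else:                              # answer is above k2
            X = [x for x in X if x > k2]
            i = i - r2
    return sorted(X)[i - 1]


rng = random.Random(7)
keys = rng.sample(range(10_000), 500)
print(frselect(250, keys, rng) == sorted(keys)[249])
```

The result is correct whichever bracket is drawn; the quality of the bracket only affects how quickly the problem shrinks.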
24 Choosing k1, k2
- We must choose $k_1, k_2$ so that, with high likelihood, $k_1 \le \mathrm{select}(i,X) \le k_2$
25 Choosing k1, k2 (contd)
- Choose a random sample $S \subseteq X$ of size s
- Define
26 Choosing k1, k2 (contd)
27 Expected Time Bound
28 Small Cost of Recursions
- Note
- With prob $\ge 1 - 2N^{-\alpha}$, each recursive call costs only O(s) = o(N), rather than N as in the previous algorithm
29 Randomized Sorting Algorithms
- Canonical sorting algorithm
- Algorithm cansort(X)
- begin
- if $X = \{x\}$ then output X
- else choose a random sample S of X of size s
- Sort S
- S subdivides X into s+1 subsets $X_1, X_2, \dots, X_{s+1}$
- output cansort($X_1$) cansort($X_2$) ... cansort($X_{s+1}$)
- end
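The canonical scheme can be sketched in Python. Several details here are assumptions rather than part of the slide: the sample is sorted with the built-in `sorted`, sample keys are left out of the blocks and emitted between them so that every recursive call is strictly smaller, the base case also covers the empty set, and the default sample size s = 3 is arbitrary.

```python
import bisect
import random


def cansort(X, s=3, rng=random):
    """Sort distinct keys by splitting X around a sorted random sample of size s."""
    X = list(X)
    if len(X) <= 1:
        return X
    S = sorted(rng.sample(X, min(s, len(X))))    # "Sort S"
    blocks = [[] for _ in range(len(S) + 1)]     # X_1, ..., X_{s+1}
    for x in X:
        j = bisect.bisect_left(S, x)             # number of sample keys below x
        if j < len(S) and S[j] == x:
            continue                             # sample key, emitted between blocks
        blocks[j].append(x)
    out = []
    for j, block in enumerate(blocks):
        out.extend(cansort(block, s, rng))
        if j < len(S):
            out.append(S[j])
    return out


rng = random.Random(3)
keys = rng.sample(range(5000), 400)
print(cansort(keys, 3, rng) == sorted(keys))
```

With s = 1 this reduces to quicksort with a random splitter; larger samples give the multisample sorts mentioned in the outline.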
30 Using Random Sampling to Aid Sorting
- Problem
- Must subdivide X into subsets of nearly equal size to minimize the number of comparisons
- Solution: random sampling!
31 Hoare's Randomized Sorting Algorithm
- Uses sample size s = 1
- Algorithm quicksort(X)
- begin
- if $|X| \le 1$ then output X
- else choose a random splitter $k \in X$
- output quicksort($\{x \in X \mid x < k\}$) (k) quicksort($\{x \in X \mid x > k\}$)
- end
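A direct Python transcription of this algorithm might look like the sketch below. It assumes distinct keys and treats the empty set as a base case alongside singletons, since a splitter equal to the minimum or maximum leaves one side empty; the `rng` parameter is an addition for reproducibility.

```python
import random


def quicksort(X, rng=random):
    """Sort distinct keys by recursive splitting around a random splitter."""
    X = list(X)
    if len(X) <= 1:
        return X
    k = rng.choice(X)                                # random splitter
    smaller = [x for x in X if x < k]
    larger = [x for x in X if x > k]
    return quicksort(smaller, rng) + [k] + quicksort(larger, rng)


rng = random.Random(4)
keys = rng.sample(range(2000), 300)
print(quicksort(keys, rng) == sorted(keys))
```

The list comprehensions make two passes over X per call; an in-place partition would need only one, at the cost of a longer listing.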
32 Expected Time Cost of Hoare's Sort
- Inefficient
- Better to halve the problem size with high likelihood!
33 Improved Sort using
- Better choice of splitter is $k = \mathrm{sampleselect}_s(\lceil N/2 \rceil, X)$
- Algorithm samplesort_s(X)
- begin
- if $|X| \le 1$ then output X
- choose a random subset S of X of size $s = N/\log N$
- $k \leftarrow \mathrm{select}(\lceil |S|/2 \rceil, S)$, at cost time o(N)
- output samplesort_s($\{x \in X \mid x < k\}$) (k) samplesort_s($\{x \in X \mid x > k\}$)
- end
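The sample-median splitter can be sketched as below. Assumptions made for this illustration: the logarithm is taken base 2, the garbled brackets around |S|/2 are read as a ceiling so that the index is valid for 1-based selection, and the sample median is found by sorting the sample with `sorted`. Sorting is a shortcut for brevity; a linear-time selection on S would be needed to keep that step at o(N).

```python
import math
import random


def samplesort(X, rng=random):
    """Sort distinct keys, splitting around the median of a random sample."""
    X = list(X)
    N = len(X)
    if N <= 1:
        return X
    s = max(1, int(N / math.log2(N)))      # sample size s = N / log N (base 2 assumed)
    S = sorted(rng.sample(X, s))
    k = S[math.ceil(s / 2) - 1]            # select(ceil(|S|/2), S) with 1-based indexing
    smaller = [x for x in X if x < k]
    larger = [x for x in X if x > k]
    return samplesort(smaller, rng) + [k] + samplesort(larger, rng)


rng = random.Random(5)
keys = rng.sample(range(5000), 600)
print(samplesort(keys, rng) == sorted(keys))
```

Because the splitter is the median of a large sample, both recursive calls tend to receive close to half of the keys, unlike the single random splitter of quicksort.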
34 Randomized Approximant of Mean
- By Lemma 2, rank(k,X) is very nearly the mean
35 Improved Expected Time Bounds
- Is optimal for comparison trees!
36 Open Problems in Selection and Sorting
- Improve randomized algorithms to exactly match lower bounds on the number of comparisons
- Can we derandomize these algorithms, i.e., give deterministic algorithms with the same bounds?