Title: CSE621 : Parallel Algorithms
1. CSE621 Parallel Algorithms, Lecture 3: Sorting
September 13, 1999
2. Overview
- Review of the previous lecture
- Sorting on 2-D: n-step algorithm
- Sorting on 2-D: 0-1 sorting lemma
- Sorting on 2-D: $\sqrt{n}(\log n + 1)$-step algorithm
- Sorting on 2-D: $3\sqrt{n} + o(\sqrt{n})$ algorithm
- Sorting: matching lower bound
- Sorting on 2-D: word-model vs. bit-model
- Summary
3. Review of the previous lecture
- Sorting on the CRCW and CREW PRAMs
- Odd-Even Merge Sort on the EREW PRAM
- Sorting on the One-Dimensional Mesh
  - Insertion sort
  - Transposition sort
- Sorting on the Two-Dimensional Mesh
  - Snake-order row-column sort
- Sorting Networks
  - Odd-even merge sort network
  - Bitonic sort network
4. Sorting on 1-D: n-step algorithm
- Insertion sort and odd-even transposition sort, shown previously, are n-step algorithms.
- How can we prove that the sorting finishes within O(n) steps?
- => 0-1 Sorting Lemma
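As a minimal sketch of the n-step behaviour, the following sequential Python code simulates odd-even transposition sort on a linear array. The function name and the convention that step 0 compares pairs (0,1), (2,3), ... are assumptions made for illustration; on a real mesh, all comparisons within one step happen in parallel.

```python
def odd_even_transposition_sort(values):
    """Simulate n steps of odd-even transposition sort on a 1-D array of n cells."""
    cells = list(values)
    n = len(cells)
    for step in range(n):
        # Even steps compare pairs (0,1), (2,3), ...; odd steps compare (1,2), (3,4), ...
        start = step % 2
        for i in range(start, n - 1, 2):
            if cells[i] > cells[i + 1]:
                cells[i], cells[i + 1] = cells[i + 1], cells[i]
    return cells
```

The comparison pattern does not depend on the data, so the algorithm is oblivious and the 0-1 Sorting Lemma applies to it.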
5. Sorting: The 0-1 Sorting Lemma
- Lemma (The 0-1 Sorting Lemma)
  - If an oblivious comparison-exchange algorithm sorts all input sets consisting solely of 0s and 1s, then it sorts all input sets with arbitrary values.
- Proof (by contradiction)
  1. Assume that an oblivious comparison-exchange algorithm fails to correctly sort some set of input values $x_1, x_2, \ldots, x_n$.
  2. Let p be a permutation such that $x_{p(1)} \le x_{p(2)} \le \cdots \le x_{p(n)}$ and let s be a permutation such that the output of the sorting algorithm is $x_{s(1)}, x_{s(2)}, \ldots, x_{s(n)}$.
  3. Let k be the smallest value such that $x_{s(k)} \ne x_{p(k)}$ (by 1).
  4. By definition, this means that $x_{s(i)} = x_{p(i)}$ for $0 < i < k$ and $x_{s(k)} > x_{p(k)}$. Hence, there must be a value of $r > k$ such that $x_{s(r)} = x_{p(k)}$.
  5. Define $x_i^* = 0$ if $x_i \le x_{p(k)}$ and $x_i^* = 1$ if $x_i > x_{p(k)}$, and examine the actions of the algorithm on the input set obtained by replacing $x_i$ with $x_i^*$ for $0 < i \le n$.
  6. Since $x_i \ge x_j \Rightarrow x_i^* \ge x_j^*$ for every i and j, the algorithm performs the same comparison-exchange operations on the $x^*$ inputs as it did on the original inputs.
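6. Sorting: The 0-1 Sorting Lemma (proof, contd.)
- Proof (by contradiction)
  7. Hence the output on the 0-1 values will be $x^*_{s(1)}, x^*_{s(2)}, \ldots, x^*_{s(n)} = 0, 0, \ldots, 0, 1, \ldots, 0, \ldots$, where $x^*_i$ is the 0-1 value that replaced $x_i$. This output is incorrect: the 1 in position k precedes the 0 in position r > k.
  8. This contradicts the assumption that the algorithm sorts all 0-1 inputs.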
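The lemma can be checked by brute force on a small case. The sketch below models an oblivious comparison-exchange algorithm as a fixed list of comparator pairs; the 4-input networks `GOOD` and `BAD` are hypothetical examples chosen for illustration, not networks from the lecture.

```python
from itertools import permutations, product

def apply_network(network, values):
    """Run a fixed comparator list; comparator (i, j) leaves the smaller value at i."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out

def sorts_all_zero_one_inputs(network, n):
    """Check the 2^n inputs made only of 0s and 1s."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))

def sorts_all_permutations(network, n):
    """Check all n! orderings of n distinct values."""
    return all(apply_network(network, p) == sorted(p)
               for p in permutations(range(n)))

# Four rounds of odd-even transposition on 4 inputs (hypothetical example).
GOOD = [(0, 1), (2, 3), (1, 2), (0, 1), (2, 3), (1, 2)]
# The same network with its last comparator removed.
BAD = GOOD[:-1]
```

Only 16 zero-one inputs need to be checked for `GOOD`, instead of all 24 permutations; the gap grows quickly with n. `BAD` already fails on the 0-1 input (1, 1, 0, 0).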
7. Sorting on 2-D: Snake-order row-column sort
- Also known as shear sort
- How to prove that shear sort completes the sorting in $\log n + 1$ phases?
  - Use the 0-1 Sorting Lemma
- After applying two phases, the number of unsorted (dirty) rows drops to at most half.
- After the sorting of rows in the 1st phase, pair up adjacent rows:
    00000111      001111      0..01..1
    1.100         1.100       1..10..0
    more 0s       more 1s     equal number
- After the column exchange:
    00000000      001101      0..00..0
    1.10..01.1    1.111       1..11..1
- After the sorting of columns, all 0-rows move to the upper region and all 1-rows move to the lower region.
- Since at least one row in each pair becomes all-0 or all-1 and is moved out of the middle region after sorting the rows and columns, the middle (dirty) region shrinks by at least 1/2 for each pair of phases.
- Total number of phases => $2 \log \sqrt{n} = \log n$, plus one final row sort, for $\log n + 1$ phases in all
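A minimal sequential sketch of shear sort follows, assuming a square mesh whose side is a power of two. Python's built-in sort stands in for the transposition sort that each row or column would run on a real mesh, and the helper names are assumptions made for illustration. The phase counter shows the $\log n + 1$ bound, where n is the number of cells.

```python
import math

def shear_sort(grid):
    """Sort a square grid into snake order; returns (sorted grid, phases used)."""
    rows = [list(r) for r in grid]
    side = len(rows)
    row_phases = math.ceil(math.log2(side)) + 1 if side > 1 else 1
    phases = 0
    for t in range(row_phases):
        # Row phase: even rows ascending, odd rows descending (snake order).
        for r in range(side):
            rows[r].sort(reverse=(r % 2 == 1))
        phases += 1
        if t == row_phases - 1:
            break  # the last phase is a row phase
        # Column phase: every column ascending from top to bottom.
        for c in range(side):
            column = sorted(rows[r][c] for r in range(side))
            for r in range(side):
                rows[r][c] = column[r]
        phases += 1
    return rows, phases

def snake(rows):
    """Read the grid along the snake: left to right on even rows, right to left on odd rows."""
    out = []
    for r, row in enumerate(rows):
        out.extend(row if r % 2 == 0 else row[::-1])
    return out
```

For a 4 x 4 mesh (n = 16) this uses 3 row phases and 2 column phases, that is, $\log 16 + 1 = 5$ phases.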
8. Sort on 2-D: Quadrant sorting
- Algorithm
  1. Recursively sort each quadrant in snake order
  2. Sort the rows in snake order
  3. Sort the columns
  4. Do $4\sqrt{n}$ steps of snake-order bubble sort
[Figure: 0-1 picture of the mesh during quadrant sorting. Labels: regions of 0s, 50/50 rows and 1s around the border lines; result: All 0 / 4 dirty rows => transposition sort / All 1.]
9. Sort on 2-D: Quadrant sorting (contd.)
- Timing Analysis
  - Phase 1: $O(n^{1/4} \times \tfrac{1}{2} \log n) = O(\tfrac{1}{2} n^{1/4} \log n)$
  - Phase 2: $O(\sqrt{n})$
  - Phase 3: $O(\sqrt{n})$
  - Phase 4: $4\sqrt{n}$
  - Total: $O(n^{1/4} \log n + 6\sqrt{n}) = O(\sqrt{n})$
- Extension of the idea
  - Sort a mesh of size $2^i \times 2^i$
  - After the sort of the $2^{i-1} \times 2^{i-1}$ submeshes, the algorithm requires $6 \times 2^i$ additional steps
  - $6(1 + 2 + 4 + \cdots + \sqrt{n}/2 + \sqrt{n}) = O(\sqrt{n})$
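The geometric sum in the last bullet can be evaluated in closed form. The derivation below assumes that $\sqrt{n}$ is a power of two, so that the recursion bottoms out at a $1 \times 1$ mesh.

```latex
\[
6\sum_{i=0}^{\log_2\sqrt{n}} 2^{i}
  = 6\left(2^{\log_2\sqrt{n}+1} - 1\right)
  = 6\left(2\sqrt{n} - 1\right)
  = 12\sqrt{n} - 6
  = O(\sqrt{n})
\]
```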
10. Sort on 2-D: $3\sqrt{n} + o(\sqrt{n})$-step algorithm
- Algorithm
  1. Divide the mesh into $n^{1/4}$ blocks of size $n^{3/8} \times n^{3/8}$ and simultaneously sort each block in snake order.
  2. Perform an $n^{1/8}$-way unshuffle of the columns. In particular, permute the columns so that the $n^{3/8}$ columns in each block are distributed evenly among the $n^{1/8}$ vertical slices.
  3. Sort each block into snake order.
  4. Sort each column in linear order.
  5. Collectively sort blocks 1 and 2, blocks 3 and 4, etc. of each vertical slice into snake order.
  6. Collectively sort blocks 2 and 3, blocks 4 and 5, etc. of each vertical slice into snake order.
  7. Sort each row in linear order according to the direction of the overall n-cell snake.
  8. Perform $2n^{3/8}$ steps of odd-even transposition sort on the overall n-cell snake.
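The column unshuffle of step 2 can be sketched as an index permutation. The code below is an illustration under stated assumptions: columns are numbered from 0, `slices` plays the role of $n^{1/8}$, `block_width` plays the role of $n^{3/8}$, and the mesh side is `slices * block_width`. The function names are hypothetical.

```python
def unshuffle_destination(col, slices, block_width):
    """Destination of column `col` under a `slices`-way unshuffle (0-indexed)."""
    # Consecutive columns are dealt out to the vertical slices in round-robin order.
    return (col % slices) * block_width + col // slices

def unshuffle_columns(grid, slices):
    """Apply the unshuffle to every row of a grid whose width is a multiple of `slices`."""
    side = len(grid[0])
    block_width = side // slices
    out = [[None] * side for _ in grid]
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            out[r][unshuffle_destination(c, slices, block_width)] = value
    return out
```

With n = 256 (side 16, 2 slices, block width 8), the even-numbered columns land in the first vertical slice and the odd-numbered columns in the second, so each block contributes 4 columns to each slice.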
11. Blocks and Slices
- After phase 3: at most 2 dirty rows in each horizontal slice
- After phase 1: at most 1 dirty row in each block
12. After phase 6, each vertical slice contains at most one dirty row
13. Sort on 2-D: $3\sqrt{n} + o(\sqrt{n})$-step algorithm
- Timing Analysis
  - Phase 1: $O(n^{3/8} \log n)$
  - Phase 2: $\sqrt{n} + o(n^{3/8})$
  - Phase 3: $O(n^{3/8} \log n)$
  - Phase 4: $\sqrt{n}$
  - Phase 5: $O(n^{3/8} \log n)$
  - Phase 6: $O(n^{3/8} \log n)$
  - Phase 7: $\sqrt{n}$
  - Phase 8: $2n^{3/8}$
  - TOTAL: $3\sqrt{n} + O(n^{3/8} \log n) \le 3\sqrt{n} + o(\sqrt{n})$
14. Sorting on 2-D: Matching lower bound
- Claim: The lower bound is $3\sqrt{n} - o(\sqrt{n})$ steps to sort n items on the 2-D mesh.
- Reason 1: Any sorting algorithm on the 2-D mesh must take at least $2\sqrt{n} - 2$ steps to move an item from position (1,1) to position $(\sqrt{n}, \sqrt{n})$.
- Reason 2 (stronger lower bound)
  - Consider the numbers in the upper-left triangle of size $2n^{1/4} \times 2n^{1/4}$ (unknown values).
  - The numbers between 1 and $n - 2\sqrt{n}$ are stored arbitrarily in the remainder of the mesh.
15. Sorting on 2-D: Matching lower bound
- Reason 2 (stronger lower bound), contd.
  - Let x denote the number in cell $(\sqrt{n}, \sqrt{n})$ after $2\sqrt{n} - 2n^{1/4} - 3$ steps.
  - Then x is independent of the numbers in the triangle.
  - Let C(m, x) denote the correct column for x when precisely m of the unknown values are set to 0 and $2\sqrt{n} - m$ values are set to n.
  - As m varies between 0 and $2\sqrt{n}$, C(m, x) varies between 1 and $\sqrt{n}$, achieving each possible value at least twice.
  - Pick m so that C(m, x) = 1. Then x will have to move from cell $(\sqrt{n}, \sqrt{n})$ to a cell in the first column.
  - This will take at least $\sqrt{n} - 1$ additional steps.
  - Thus the algorithm takes at least $3\sqrt{n} - 2n^{1/4} - 4$ steps, that is, $3\sqrt{n} - o(\sqrt{n})$ steps.
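The final count is the sum of the two delays in the argument, written out here as a worked equation. The labels under the braces paraphrase the bullets above; the limit on the right shows why the correction term is $o(\sqrt{n})$.

```latex
\[
\underbrace{\left(2\sqrt{n} - 2n^{1/4} - 3\right)}_{\text{steps during which } x \text{ is independent of the triangle}}
+ \underbrace{\left(\sqrt{n} - 1\right)}_{\text{moving } x \text{ to column 1}}
= 3\sqrt{n} - 2n^{1/4} - 4,
\qquad
\frac{2n^{1/4} + 4}{\sqrt{n}} \to 0 \ \text{ as } n \to \infty .
\]
```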
16. Sorting: Word-model vs. Bit-model
- Previous sorting algorithms
  - Algorithms and analysis are based on the word-model.
- Bit-model: a more precise model
  - Used to analyze the number of gates or components actually needed to build the device
  - Close to the low-level machine
- Sorting algorithm in the word-model
  - Key functions: comparison / store / send / receive
  - How to change these functions to the bit-model?
17. Sorting: Word-model vs. Bit-model (contd.)
- Comparison in the bit-model
  - Method 1: Use a linear array to compare the numbers bit by bit, starting with the MSB.
  - Method 2: Use a complete binary tree network. The result is condensed and propagated.
  - Method 2 is superior to the linear array method in two respects:
    - it uses $\log k + 1$ bit steps
    - it uses the tree to tell each leaf simultaneously which number to pass
- Can we do better?
  - If there are many numbers to compare (consider insertion sort), pipelined execution in a linear array is better.
  - Total complexity: 2nk-2 bit steps
- Factors that decide the complexity: interconnection parameters such as bandwidth, diameter, and bisection width.
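The two comparison methods can be sketched in sequential Python. The step counts are an assumed cost model for illustration: the linear array is charged one bit step per bit position, and the tree is charged one step for the leaf comparisons plus one step per level of condensing, which gives $\log k + 1$ when k is a power of two. Function names are hypothetical.

```python
def compare_msb_first(a, b, k):
    """Linear-array style: scan the k bits from the MSB; returns (result, bit steps)."""
    result = 0
    for bit in range(k - 1, -1, -1):
        if result == 0:
            # The first position where the bits differ decides the comparison.
            result = ((a >> bit) & 1) - ((b >> bit) & 1)
    return result, k

def compare_tree(a, b, k):
    """Tree style: leaves compare one bit each, internal nodes condense the results."""
    # Leaf results, MSB first: -1, 0 or 1 per bit position.
    level = [((a >> bit) & 1) - ((b >> bit) & 1) for bit in range(k - 1, -1, -1)]
    steps = 1  # all leaf comparisons happen in one parallel step
    while len(level) > 1:
        # The more significant (left) child wins unless it reports equality.
        level = [level[i] if level[i] != 0 or i + 1 == len(level) else level[i + 1]
                 for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps
```

Both functions return -1, 0 or 1 for a < b, a = b or a > b. For k = 4 the tree version reports 3 steps against 4 for the linear scan.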
20. Sorting: Non-comparison based sorting
- Sorting n k-bit numbers on a binary tree
  - Assume that each leaf consists of a k-cell linear array of bit processors.
  - Root: $\log n$-cell linear array of bit processors
  - Each leaf initially contains one of the k-bit binary numbers to be sorted.
  - The sorting completes when the ith leaf contains the ith smallest number.
  - Analysis based on interconnection parameters indicates that the time complexity is $\Omega(nk)$ bit steps.
  - The argument is correct if k is bigger than $(1 + \epsilon) \log n$ for some constant $\epsilon$. But for smaller k, there are $O(\log n)$ time algorithms.
- Sorting n 1-bit numbers on a binary tree
  - Change it to a counting problem.
  - After counting the number of 1s in the leaves (say m), set the values in the rightmost m leaves to 1 and the values in the leftmost n-m leaves to 0.
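The counting idea for 1-bit numbers can be sketched as follows. The level-by-level summation imitates how a binary tree would add up its leaves; the function names and the returned level count are assumptions made for illustration, and the input is assumed to be non-empty.

```python
def count_ones_on_tree(leaves):
    """Sum the leaf bits level by level, as a binary tree would; returns (m, levels)."""
    level = list(leaves)
    levels = 0
    while len(level) > 1:
        # Each parent adds the counts of its (at most two) children.
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        levels += 1
    return level[0], levels

def sort_one_bit_numbers(leaves):
    """Sort n 1-bit values by counting: n - m zeros on the left, m ones on the right."""
    n = len(leaves)
    m, _ = count_ones_on_tree(leaves)
    return [0] * (n - m) + [1] * m
```

No comparisons between leaves are needed: once m is known at the root, every leaf can decide its final value from its own position.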
23. Other Issues on Sorting
- Other sorting algorithms
  - Quick Sort
  - Radix Sort
- Extending to other topologies
  - Hypercube
  - Tree
24. Other sorting algorithms: Quick sort
- Quick sort
  - Sequential version
    - Choose a pivot
    - Divide a list into two sub-lists: elements smaller than or equal to the pivot, and elements larger than the pivot
    - Recursive
    - The choice of pivot is very important to avoid the worst case
  - Parallel version on 2-D Mesh
    - Assume the elements to be sorted are in processors $p_m, p_{m+1}, p_{m+2}, \ldots, p_{m+k}$.
    - Choose a random pivot and broadcast this pivot to all k processors using an embedded tree.
    - Each processor propagates two values (the number of elements larger than the pivot and the number of elements smaller than the pivot) to its parent.
    - Information is propagated down the tree to enable each element to be moved to its proper position.
  - Parallel version on Hypercube
    - Split each dimension by a newly chosen pivot value.
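One way to picture the count-based split is as a prefix-count computation. In the sketch below, sequential prefix sums stand in for the counts that would travel up and down the embedded tree; that substitution, the function names, and the choice to send elements equal to the pivot to the low side are assumptions made for illustration.

```python
from itertools import accumulate

def partition_destinations(values, pivot):
    """Destination index of each element after one stable split around the pivot."""
    is_small = [1 if v <= pivot else 0 for v in values]
    # Exclusive prefix counts: how many small elements precede position i.
    small_before = [0] + list(accumulate(is_small))[:-1]
    # Every preceding element that is not small is large.
    large_before = [i - s for i, s in enumerate(small_before)]
    total_small = sum(is_small)
    return [small_before[i] if is_small[i] else total_small + large_before[i]
            for i in range(len(values))]

def split_by_pivot(values, pivot):
    """Move every element to its destination in one step."""
    out = [None] * len(values)
    for v, d in zip(values, partition_destinations(values, pivot)):
        out[d] = v
    return out
```

For example, splitting [5, 1, 7, 3, 9, 2] around pivot 4 gives destinations [3, 0, 4, 1, 5, 2], so each element knows its target position without looking at the other elements' values.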
25. Other sorting algorithms: Radix sort
- Radix sort algorithm
  - Relies on the binary representation of the elements to be sorted
  - Examines the elements to be sorted r bits at a time, where r < b and b is the number of bits per element
  - Radix sort requires b/r iterations
- Parallel radix sort
  - Load balance
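A minimal sequential sketch of radix sort follows, assuming non-negative b-bit integer keys and a least-significant-digit-first order of passes (the slide does not say which end the passes start from). The pass count is rounded up when r does not divide b.

```python
def radix_sort(keys, b, r):
    """LSD radix sort of non-negative b-bit integers, r bits per pass.

    Returns (sorted keys, number of passes).
    """
    passes = -(-b // r)  # ceil(b / r)
    mask = (1 << r) - 1
    out = list(keys)
    for p in range(passes):
        shift = p * r
        buckets = [[] for _ in range(1 << r)]
        for key in out:
            # Appending keeps keys with equal digits in their current order (stable).
            buckets[(key >> shift) & mask].append(key)
        out = [key for bucket in buckets for key in bucket]
    return out, passes
```

With b = 4 and r = 2, the keys are sorted in b/r = 2 passes over 4 buckets each.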
26. Summary
- Sorting on 2-D: n-step algorithm
- Sorting on 2-D: 0-1 sorting lemma
  - Proof of correctness and time complexity
- Sorting on 2-D: $\sqrt{n}(\log n + 1)$-step algorithm
  - Shear sort
- Sorting on 2-D: $3\sqrt{n} + o(\sqrt{n})$ algorithm
  - Reducing dirty region
- Sorting: matching lower bound
  - $3\sqrt{n} - o(\sqrt{n})$
- Sorting on 2-D: word-model vs. bit-model