Title: Algorithms and Applications
1. Algorithms and Applications
Some Theory Stuff / Sorting Algorithms / Numerical (Matrix) Algorithms / Graph Algorithms / Searching and Optimization
2. Theory reminder
- Big O, Θ, and Ω notation
- Definition of work (cost)
  - work (cost) = parallel time × number of processors
- Cost-optimal algorithm
  - parallel time × number of processors = O(sequential time)
- Optimal parallel time
  - optimal parallel time = sequential time / number of processors
3. Algorithms and Applications
Some Theory Stuff / Sorting Algorithms / Numerical (Matrix) Algorithms / Graph Algorithms / Searching and Optimization
4. Rank Sort using n² processors
Processor i,j:
  if (A[j] < A[i]) res = 1 else res = 0
  Reduce(sendBuf = res, recvBuf = pos, source = Processor i,0,
         group = Processors i,*, operation = +)
Processor i,0:
  B[pos] = A[i]   // send A[i] to Processor pos,0
Time O(log n), Processors n², Cost O(n² log n)
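The scheme above can be sketched sequentially in Python; the double loop stands in for the n² processors, and the inner count plays the role of the Reduce over row i (function and variable names are my own):

```python
def rank_sort(a):
    """Rank sort: the final position of a[i] is the number of elements
    smaller than it (ties broken by index, which keeps the sort stable).
    Processor (i, j) computes one comparison; summing over j is the
    Reduce step, and writing b[pos] is the final send to row pos."""
    n = len(a)
    b = [None] * n
    for i in range(n):
        pos = sum(1 for j in range(n)
                  if a[j] < a[i] or (a[j] == a[i] and j < i))
        b[pos] = a[i]
    return b
```

All n² comparisons are independent, so with n² processors they take one step; the Reduce over each row of n results costs the O(log n) in the time bound.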
5. Complexity of the odd-even sort
- Using n processors
  - n phases, n-1 compare-swap operations in a phase
  - time O(n), cost O(n²)
- Using p processors
  - each processor gets n/p values and sorts them internally in Θ((n/p) log(n/p))
  - after that, p odd-even sorting phases are needed
  - Θ(n/p) operations spent on merging two blocks, and Θ(n/p) for communicating a block
  - p phases: Θ(p × n/p) = Θ(n) merging, Θ(p × n/p) = Θ(n) communication
  - overall: Θ((n/p) log(n/p)) + Θ(n) + Θ(n)
    (local sorting + merging + communication)
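For reference, the odd-even transposition sort analysed above fits in a few lines as a sequential simulation; each inner loop corresponds to one parallel phase:

```python
def odd_even_sort(a):
    """Odd-even transposition sort: n phases over n elements.
    Even phases compare-swap the pairs (0,1), (2,3), ...; odd phases
    the pairs (1,2), (3,4), ...  All swaps inside one phase are
    independent, so with n processors a phase is one parallel step."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

n phases always suffice, which gives the O(n) time and O(n²) cost with n processors.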
6. 0-1 Sorting Lemma
Lemma: If an oblivious comparison-exchange algorithm sorts all input sets consisting solely of 0s and 1s, then it sorts all input sets with arbitrary values.
Oblivious algorithm: the compare-exchange operations are prespecified, i.e. the comparisons performed do not depend on the outcome of the previous comparisons. Examples: odd-even transposition sort, shearsort.
The lemma allows relatively simple proofs of correctness of oblivious compare-exchange sorting algorithms. The 0-1 Sorting Lemma and the proof of Shearsort complexity are from Leighton's book.
7. 0-1 Sorting Lemma (Proof)
Proof: By contradiction. We show that if an algorithm A fails to sort arbitrary values, then it does not sort all 0-1 sequences. Let the algorithm A fail to sort an input sequence (x_i). Let (a_i) be the correct output sequence and let (b_i) be the output of A. Let k be the smallest index such that b_k differs from a_k, and let l be the position of the value a_k in the output of A. Clearly l > k and a_k < b_k. Now consider the input sequence (x_i) replaced by the sequence (y_i), where y_i = 0 if x_i ≤ a_k, otherwise y_i = 1. Since x_i ≤ x_j implies y_i ≤ y_j for every i and j, the algorithm performs the same compare-exchange operations on the input y as it did on x. In particular, at position k there will be a 1 and at position l there will be a 0. This contradicts the fact that A sorts all 0-1 sequences.

  a_1, a_2, …, a_k, …, a_l, …, a_n    correct output on (x_i)
  b_1, b_2, …, b_k, …, b_l, …, b_n    output of A on (x_i)
  0,   0,   …, 1,   …, 0,   …         output of A on (y_i)
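The lemma can be checked empirically for a small oblivious network: fix the comparator sequence in advance, verify that it sorts every 0-1 input, and observe that it then sorts every permutation as well (the network builder below uses the odd-even transposition pattern from the earlier slides):

```python
from itertools import product, permutations

def oblivious_network(n):
    """Comparator sequence of odd-even transposition sort on n wires.
    The (i, j) pairs are fixed in advance: the algorithm is oblivious."""
    return [(i, i + 1)
            for phase in range(n)
            for i in range(phase % 2, n - 1, 2)]

def apply_network(net, a):
    """Run a prespecified compare-exchange sequence on a list."""
    a = list(a)
    for i, j in net:
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a

n = 5
net = oblivious_network(n)
# The network sorts every 0-1 input ...
assert all(apply_network(net, bits) == sorted(bits)
           for bits in product([0, 1], repeat=n))
# ... and, as the lemma promises, every arbitrary input as well.
assert all(apply_network(net, p) == sorted(p)
           for p in permutations(range(n)))
```

Checking 2^n 0-1 inputs instead of n! arbitrary ones is exactly the saving the lemma buys in correctness proofs.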
8. Shearsort correctness and complexity
Using the 0-1 Sorting Lemma, we show that Shearsort sorts any 0-1 sequence in log n iterations of the row-and-column sort pair. Consider the situation after the columns have been sorted:
[Figure: a 0-1 matrix whose rows fall into an upper region of all-0 rows, a middle region of dirty rows (mixed 0s and 1s), and a lower region of all-1 rows.]
At the beginning we assume all rows are dirty. We show that after a pair of row and column sorts, the number of dirty rows is at least halved.
9. Shearsort correctness and complexity II
- We show that from every pair of consecutive dirty rows, at least one row becomes clean in the column sort.
- We show this for a specific way to perform the column sort. Since the outcome of the column sort does not depend on the way it is done, the result holds for any column sort algorithm.
- The column sort strategy:
  - compare-exchange odd rows with the consecutive even rows
  - move the resulting clean rows out
  - sort somehow the remaining dirty rows
10. Shearsort correctness and complexity III
[Figure: consecutive pairs of dirty rows after the rows have been sorted in snake order; in each pair one row has more 0s, or more 1s, or the two have equal numbers. After the first step of the column sort (compare-exchange of odd rows with the consecutive even rows), every pair has produced at least one clean row: an all-0 row if the pair had more 0s, an all-1 row if it had more 1s, and one of each if the numbers were equal.]
11. Shearsort correctness and complexity IV
[Example figure: a 0-1 mesh before the first step of the column sort, after the first step, and after moving the newly clean rows out; clean, dirty, and newly cleaned rows are marked.]
Summing together: O(√n log n)   (time to sort a row/column × number of phases)
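The whole procedure can be simulated sequentially in a few lines (a sketch with names of my choosing; the snake order, in which even rows are sorted ascending and odd rows descending, is the order in which Shearsort leaves the mesh sorted):

```python
import math

def shearsort(grid):
    """Shearsort on a k x k mesh: alternate row sorts (even rows
    ascending, odd rows descending, i.e. snake order) with column
    sorts, for ceil(log2 k) + 1 iterations, ending on a row sort."""
    k = len(grid)
    grid = [row[:] for row in grid]
    for _ in range(math.ceil(math.log2(k)) + 1):
        for r in range(k):                      # row phase
            grid[r].sort(reverse=(r % 2 == 1))
        for c in range(k):                      # column phase
            col = sorted(grid[r][c] for r in range(k))
            for r in range(k):
                grid[r][c] = col[r]
    for r in range(k):                          # final row phase
        grid[r].sort(reverse=(r % 2 == 1))
    return grid

def snake(grid):
    """Read the mesh in snake order (Shearsort's sorted order)."""
    return [x for r, row in enumerate(grid)
            for x in (row if r % 2 == 0 else reversed(row))]
```

Each row or column is sorted here with Python's `sort`; on the mesh each such sort costs O(√n) compare-exchange steps, giving the O(√n log n) total above.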
12. Merge sort complexity
- The divide phase
  - at best, the data are already where they are needed
- The merging phase
  - merging two subsequences of size k costs O(k) computation and communication
  - step i of the merging phase merges subsequences of size 2^i, until n is reached
  - total complexity O(2 + 4 + 8 + … + n) = O(n)
- not very parallel: only log n processors can be efficiently utilized
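The merging phase can be pictured as a tree of pairwise merges; merges within one level are independent and could run on distinct processors (my own illustration, not code from the slides):

```python
def merge(left, right):
    """Standard O(k) two-way merge of sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def tree_merge_sort(a):
    """Bottom-up merge sort: level i merges runs of size 2^(i-1) in
    pairs.  The last level is a single O(n) merge, which is why only
    about log n processors can be kept busy."""
    runs = [[x] for x in a]
    while len(runs) > 1:
        nxt = [merge(runs[i], runs[i + 1])
               for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            nxt.append(runs[-1])
        runs = nxt
    return runs[0] if runs else []
```

The level sizes 2, 4, 8, …, n are exactly the terms of the O(n) sum above.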
13. Sorting Networks
Comparators:
- increasing comparator: (x, y) → (min(x, y), max(x, y))
- decreasing comparator: (x, y) → (max(x, y), min(x, y))
A sorting network consists of input wires, columns of comparators joined by an interconnection network, and output wires.
The rest of the sorting slides follow Kumar's book.
14. Bitonic sort I
- Bitonic sequence
  - A sequence (a_0, a_1, …, a_{n-1}) is bitonic if
    1. there exists i such that (a_0, a_1, …, a_i) is monotonically increasing and (a_i, a_{i+1}, …, a_{n-1}) is monotonically decreasing, or
    2. there exists a cyclic shift of indices such that 1. is satisfied.
- Which of these are bitonic? [examples shown in the slide figure]
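The two-case definition translates directly into a checker (my helper; either the increasing or the decreasing part may be empty):

```python
def is_bitonic(seq):
    """True if some cyclic shift of seq first increases monotonically,
    then decreases monotonically, per the definition above."""
    n = len(seq)
    if n <= 1:
        return True

    def inc_then_dec(s):
        i = 1
        while i < n and s[i - 1] <= s[i]:   # climb the increasing part
            i += 1
        while i < n and s[i - 1] >= s[i]:   # descend the decreasing part
            i += 1
        return i == n                        # consumed the whole sequence?

    return any(inc_then_dec(seq[k:] + seq[:k]) for k in range(n))
```

For example, (1, 2, 4, 7, 6, 0) satisfies case 1 directly, while (8, 9, 2, 1, 0, 4) only becomes increasing-then-decreasing after a cyclic shift.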
15. Bitonic Sort II
- Bitonic split
  - s1 = (min(a_0, a_{n/2}), min(a_1, a_{n/2+1}), …, min(a_{n/2-1}, a_{n-1}))
  - s2 = (max(a_0, a_{n/2}), max(a_1, a_{n/2+1}), …, max(a_{n/2-1}, a_{n-1}))
- Properties of the bitonic split
  - s1 and s2 are bitonic
  - every element of s1 is smaller than every element of s2
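The split and its two properties can be demonstrated on a concrete 16-element bitonic sequence:

```python
def bitonic_split(a):
    """Bitonic split of a sequence of even length n: s1 takes the
    pairwise minima of a[i] and a[i + n/2], s2 the pairwise maxima.
    For bitonic input, s1 and s2 are bitonic and max(s1) <= min(s2)."""
    n = len(a)
    s1 = [min(a[i], a[i + n // 2]) for i in range(n // 2)]
    s2 = [max(a[i], a[i + n // 2]) for i in range(n // 2)]
    return s1, s2

s1, s2 = bitonic_split([3, 5, 8, 9, 10, 12, 14, 20,
                        95, 90, 60, 40, 35, 23, 18, 0])
# Every element of s1 is at most every element of s2:
assert max(s1) <= min(s2)
```

Here s1 = (3, 5, 8, 9, 10, 12, 14, 0) and s2 = (95, 90, 60, 40, 35, 23, 18, 20): both halves are again bitonic, so the split can be applied recursively.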
16. Bitonic Sort III
- Bitonic merge
  - sorts a bitonic sequence in log n steps
[Figure: a bitonic merge of the 16-element bitonic sequence (3, 5, 8, 9, 10, 12, 14, 20, 95, 90, 60, 40, 35, 23, 18, 0), shown wire by wire through the log n = 4 recursive split steps down to the sorted output.]
17. Bitonic Sort IV
[Figure: a bitonic sorting network for 16 inputs, built from bitonic merging networks of increasing size: a column of BM2 modules, then BM4, then BM8, and finally a single BM16. The intermediate merges alternate increasing and decreasing order so that each stage feeds bitonic sequences to the next.]
18. Bitonic Sort V
[Example figure: an unsorted 16-element input passing through the BM2, BM4, and BM8 stages of the network, shown wire by wire after each stage until the final sorted output.]
19. Bitonic Sort VI
- Implementation
  - simulate the sorting network
  - a column of comparators can be simulated in parallel
  - how to map processors to comparators so that communication is minimized?
- Complexity (comparisons, ignoring communication)
  - depth(BM2) + depth(BM4) + … + depth(BMn) = sum_{i=1}^{log n} i = O(log² n)
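Putting the pieces together, the full network can be simulated recursively (a sketch assuming a power-of-two input length; function names are mine):

```python
def bitonic_merge(a, ascending=True):
    """Bitonic merge: log n levels of bitonic splits sort a bitonic
    sequence.  The compare-exchanges inside one level are independent,
    so each level is one column of the network (one parallel step)."""
    n = len(a)
    if n == 1:
        return a
    half = n // 2
    for i in range(half):                      # one split level
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    return (bitonic_merge(a[:half], ascending)
            + bitonic_merge(a[half:], ascending))

def bitonic_sort(a, ascending=True):
    """Sort by building a bitonic sequence (ascending first half,
    descending second half), then bitonic-merging it.  The network
    depth is sum_{i=1}^{log n} i = O(log^2 n)."""
    n = len(a)
    if n <= 1:
        return list(a)
    half = n // 2
    left = bitonic_sort(a[:half], True)
    right = bitonic_sort(a[half:], False)
    return bitonic_merge(left + right, ascending)
```

Each recursion level of `bitonic_merge` corresponds to one comparator column, which is what makes the network depth, and hence the parallel time, O(log² n).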
20. Bitonic Sort VII
Mapping Bitonic Sort to a Hypercube (n processors)
[Figure: the 16 wires are mapped to hypercube nodes 0000-1111 by their binary labels; the sequence of numbers at each node (e.g. "3, 2, 1" at node 0011, "4, 3, 2, 1" at node 0111) lists the hypercube dimensions along which that node compare-exchanges during the merge steps.]
21. Bitonic Sort VIII
BM16 in a hypercube in detail
[Figure: the four steps of BM16 on a 16-node hypercube; steps 1 through 4 perform compare-exchanges along hypercube dimensions 4, 3, 2, and 1, respectively.]
22. Bitonic Sort VIII
Overall complexity in a hypercube, n processors:
  Tp = O(log² n) + O(log² n)
       (comparisons + communication)
- p processors
  - n/p comparators per process
  - use compare-and-swap operations
  Tp = O((n/p) log(n/p)) + O((n/p) log² p) + O((n/p) log² p)
       (local sort + comparisons + communication)
23. Bitonic Sort IX
BM16 in a mesh in detail
[Figure: the 16 wires 0000-1111 are mapped to a 4×4 mesh in row-major order; steps 1 through 4 of BM16 appear as compare-exchanges between mesh rows and within mesh columns.]
24. Bitonic Sort X
Overall complexity in a mesh, n processors:
  Tp = O(log² n) + O(√n)
       (comparisons + communication)
- p processors
  - n/p comparators per process
  - use compare-and-swap operations
  Tp = O((n/p) log(n/p)) + O((n/p) log² p) + O(n/√p)
       (local sort + comparisons + communication)
25. Parallel Quicksort
- Sequential Quicksort
  - choose a pivot
  - split the sequence into L (< pivot) and R (> pivot)
  - recursively sort L and R
- Naïve Parallel Quicksort
  - parallelise only the last step
- Complexity
  - O(n + n/2 + n/4 + …) = O(n) (average)
  - dominated by the sequential splitting
  - only log n processors can be efficiently used
26. Parallel Quicksort II
- Parallel Quicksort for a shared memory computer
  - every processor gets n/p elements
  - repeat
    - choose a pivot and broadcast it
    - each processor i splits its sequence into L_i (< pivot) and R_i (> pivot)
    - collect all L_i's and R_i's into global L and R
    - split the processors into left and right in the ratio |L| : |R|
    - the left processors recursively sort L, the right processors R
  - until a single processor is left for the whole (reduced) range
  - sort your range sequentially
27. Parallel Quicksort III
[Example figure, first step: 20 elements distributed over processors P0-P4, four per processor. The pivot 7 is selected and broadcast; each processor rearranges its block locally around the pivot, and the blocks are then rearranged globally so that all elements up to 7 precede the rest.]
28. Parallel Quicksort IV
[Example figure, second step: the left processor group recurses on the elements below the first pivot with pivot 5, while the right group recurses on the remaining elements with pivot 17; again each processor rearranges locally, followed by a global rearrangement within each group.]
29. Parallel Quicksort V
[Example figure, third step: the middle range is split once more around pivot 11, with local rearrangement on each processor followed by a global rearrangement.]
30. Parallel Quicksort VI
[Example figure, fourth step: each remaining processor (P2, P3) now owns a single range and sorts it locally. Solution: the concatenation of the ranges of P0 through P4 is the fully sorted sequence 1, 2, …, 20.]
31. Parallel Quicksort VII
- Complexity analysis
  - selecting the pivot: O(1)
  - broadcasting the pivot: O(log p)
  - local rearrangement: O(n/p)
  - global rearrangement: O(log p + n/p)
  - multiply by log p iterations
  - final local sequential sort: O((n/p) log(n/p))
- Overall complexity
  - O((n/p) log(n/p)) + O((n/p) log p) + O(log² p)
    (local sort + local rearrangement and global moving + broadcasting and Scan())
- Global rearrangement via prefix sums:
    lDest = Scan(|L_i|, +, …)
    rDest = Scan(|R_i|, +, …)
    copyElements(A[lDest], L_i, |L_i|)
    copyElements(A[lSize + rDest], R_i, |R_i|)
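The Scan-based addressing can be illustrated concretely: an exclusive prefix sum over the block sizes gives every processor the offset at which to write its piece, so all copies can proceed in parallel without conflicts (`global_rearrange` and the list-of-blocks interface are my own):

```python
from itertools import accumulate

def global_rearrange(L_blocks, R_blocks):
    """Global rearrangement step: exclusive prefix sums (Scan) over the
    |L_i| and |R_i| give each processor the offsets lDest and rDest at
    which it copies its pieces into the shared array A; every R_i piece
    starts at lSize = sum of all |L_i|."""
    lsizes = [len(b) for b in L_blocks]
    rsizes = [len(b) for b in R_blocks]
    ldest = [0] + list(accumulate(lsizes))[:-1]   # exclusive Scan
    rdest = [0] + list(accumulate(rsizes))[:-1]
    lsize = sum(lsizes)
    A = [None] * (lsize + sum(rsizes))
    for i, blk in enumerate(L_blocks):            # copyElements(A[lDest_i], L_i)
        A[ldest[i]:ldest[i] + len(blk)] = blk
    for i, blk in enumerate(R_blocks):            # copyElements(A[lSize + rDest_i], R_i)
        A[lsize + rdest[i]:lsize + rdest[i] + len(blk)] = blk
    return A
```

Because the offsets are disjoint by construction, the per-processor copies are independent, which is what keeps the rearrangement at O(log p) for the Scan plus O(n/p) for the copying.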
32. Parallel Quicksort VIII
- Message passing implementation
  - more complications with explicitly moving the data around
  - the main complication is in the global rearrangement phase
    - each process may need to send its L_i and R_i to several other processes
    - each process may receive its new L_i and R_i from several other processes
    - the destination of the pieces of L_i and R_i (where to send them in the global rearrangement) consists of a destination process and an address within that process
    - all-to-all communication may be necessary
  - asymptotic complexity remains the same
33. Sorting Conclusions
- Parallel rank sort: the only non-compare-exchange algorithm covered
- Odd-even transposition sort: O(n) time with n processors
- Shearsort: 2D mesh, O(√n log n) time with n processors; proved via the 0-1 Sorting Lemma
- Naïve parallel merge sort: O(n) time with n processors
- Sorting networks; Bitonic sort: O(log² n) time with n processors, hypercube and mesh implementations
- Quicksort: O(log² n) time with n processors, shared memory and message-passing implementations