Algorithms and Applications - PowerPoint PPT Presentation

Slides: 34
Provided by: Pao3
Transcript and Presenter's Notes

1
Algorithms and Applications
  • Some Theory Stuff
  • Sorting Algorithms
  • Numerical (Matrix) Algorithms
  • Graph Algorithms
  • Searching and Optimization
2
Theory reminder
  • Big O, Θ and Ω notation
  • Definition of work (cost)
  • work (cost) = parallel time × number of
    processors
  • Cost-optimal algorithm
  • parallel time × number of processors =
    O(sequential time)
  • Optimal parallel time
  • optimal parallel time = sequential
    time / number of processors

4
Rank Sort using n² processors

Processor(i,j):
  if (A[j] < A[i]) res = 1; else res = 0;
  Reduce(sendBuf = res, recvBuf = pos,
         source = Processor(i,0), group = Processors(i,*),
         operation = +)
Processor(i,0):
  B[pos] = A[i]   // send A[i] to Processor(pos,0)

Time: O(log n)   Processors: n²   Cost: O(n² log n)
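The rank computation can be simulated sequentially. The following is a minimal sketch, not the slide author's code: the function name `rank_sort` is hypothetical, and it assumes all values are distinct, so that every element receives a unique rank.

```python
def rank_sort(a):
    """Place each element at the index equal to the number of smaller elements.

    Assumption of this sketch: all values in `a` are distinct.
    """
    n = len(a)
    b = [None] * n
    for i in range(n):  # row i of the n x n processor grid
        # Each processor (i, j) contributes one comparison result (0 or 1);
        # the row's reduction with + yields the rank of a[i].
        pos = sum(1 for j in range(n) if a[j] < a[i])
        b[pos] = a[i]   # processor (i, 0) stores a[i] at its rank
    return b
```

On a parallel machine the inner sum is the O(log n) reduction, which is where the O(log n) running time on n² processors comes from.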
5
Complexity of the odd-even sort
  • Using n processors
  • n phases, n-1 compare-swap operations in a phase
  • time O(n), cost O(n²)
  • Using p processors
  • each processor gets n/p values and sorts them
    internally in time Θ(n/p log(n/p))
  • after that, p odd-even sorting phases are needed
  • Θ(n/p) operations are spent on merging two blocks,
    and Θ(n/p) on communicating a block
  • p phases: Θ(p × n/p) merging + Θ(p × n/p)
    communication = Θ(n)
  • overall: Θ(n/p log(n/p)) [local sorting] +
    Θ(n) [merging] + Θ(n) [communication]
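The p-processor variant can be sketched as a sequential simulation. This is an illustrative sketch under stated assumptions: `block_odd_even_sort` is a hypothetical name, blocks are plain Python lists, and `sorted` on two concatenated blocks stands in for the linear-time merge a real implementation would use.

```python
def block_odd_even_sort(values, p):
    """Simulate odd-even transposition sort with p processors holding blocks."""
    n = len(values)
    size = -(-n // p)  # ceil(n / p) values per processor
    # Local sort: Theta(n/p log(n/p)) per processor.
    blocks = [sorted(values[k * size:(k + 1) * size]) for k in range(p)]
    for phase in range(p):  # p odd-even phases
        for k in range(phase % 2, p - 1, 2):
            # Neighbours exchange blocks and merge; the left one keeps the
            # smaller part, the right one the larger part.
            merged = sorted(blocks[k] + blocks[k + 1])
            cut = len(blocks[k])
            blocks[k], blocks[k + 1] = merged[:cut], merged[cut:]
    return [x for block in blocks for x in block]
```

All pairs inside one phase are independent, so a parallel machine executes each phase in Θ(n/p) time.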

6
0-1 Sorting Lemma
Lemma: If an oblivious compare-exchange algorithm
sorts all input sets consisting solely of 0s and
1s, then it sorts all input sets with arbitrary
values.
Oblivious algorithm: the compare-exchange
operations are prespecified, i.e. the comparisons
performed do not depend on the outcome of the
previous comparisons.
Examples: odd-even transposition sort, shearsort.
The lemma allows relatively simple proofs of
correctness of oblivious compare-exchange sorting
algorithms.
The 0-1 Sorting Lemma and the proof of Shearsort
complexity are from Leighton's book.
7
0-1 Sorting Lemma (Proof)
Proof: By contradiction. We show that if an
algorithm A fails to sort some input of arbitrary
values, then it does not sort all 0-1 sequences.
Suppose A fails to sort an input sequence {x_i}.
Let {a_i} be the correct (sorted) output sequence
and let {b_i} be the output of A. Let k be the
smallest index such that b_k differs from a_k, and
let l be the position of a_k in the output of A.
Clearly, l > k and b_l = a_k < b_k.
Replace the input sequence {x_i} by the sequence
{y_i}, where y_i = 0 if x_i ≤ a_k, otherwise
y_i = 1. Since x_i ≤ x_j ⇒ y_i ≤ y_j for every i
and j, the algorithm performs the same
compare-exchange operations on the input y as it
did on x. In particular, at position k there will
be a 1 and at position l there will be a 0. This
contradicts the fact that A sorts all 0-1
sequences.

a_1, a_2, …, a_k, …, a_l, …, a_n    correct output on {x_i}
b_1, b_2, …, b_k, …, b_l, …, b_n    output of A on {x_i}
0,   0,   …, 1,   …, 0,   …         output of A on {y_i}
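The lemma can be tried out on a small comparator network. The sketch below is illustrative only: the helper names are hypothetical, a network is assumed to be a list of wire pairs (i, j) with i < j, and each comparator routes the minimum to wire i. It checks a network against all 0-1 inputs and, for comparison, against all permutations.

```python
from itertools import permutations, product

def apply_network(network, values):
    """Run an oblivious compare-exchange network on a sequence."""
    v = list(values)
    for i, j in network:  # comparator on wires i < j: min to i, max to j
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def odd_even_network(n):
    """Comparators of n phases of odd-even transposition sort."""
    return [(i, i + 1) for phase in range(n) for i in range(phase % 2, n - 1, 2)]

def sorts_all_01(network, n):
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))

def sorts_all_permutations(network, n):
    return all(apply_network(network, perm) == list(range(n))
               for perm in permutations(range(n)))
```

The 0-1 check inspects 2^n inputs instead of n! permutations, which is what makes the lemma convenient for correctness proofs.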
8
Shearsort correctness and complexity
Using the 0-1 Sorting Lemma: we show that
Shearsort sorts any 0-1 sequence in log n
iterations of the row and column sort pair.
Consider the situation after the columns have
been sorted:

0 0 0 0 0 0 0 0    upper region of all-0 rows
0 0 0 1 0 1 0 0    middle region of dirty rows
0 0 0 1 0 1 0 0
1 0 1 1 0 1 0 0
1 0 1 1 0 1 0 0
1 1 1 1 1 1 1 1    lower region of all-1 rows
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

At the beginning we assume all rows are dirty. We
show that after a pair of row and column sorts,
the number of dirty rows is at least halved.
9
Shearsort correctness and complexity II
  • We show that from every pair of consecutive dirty
    rows, at least one row becomes clean in the
    column sort.
  • We show this for a specific way to perform the
    column sort. Since the outcome of the column sort
    does not depend on the way it is done, the result
    holds for any column sort algorithm.
  • The column sort strategy
  • compare-exchange odd rows with the consecutive
    even rows
  • move the resulting clean rows out
  • sort the remaining dirty rows in any way

10
Shearsort correctness and complexity III
Consecutive pairs of dirty rows after the rows are
sorted:

(more 0s)       (more 1s)       (equal number)
0.....01...1    0...01.....1    0....01....1
1...10.....0    1.....10...0    1....10....0

Consecutive pairs of dirty rows after performing
the first step of the column sort:

(more 0s)       (more 1s)       (equal number)
0..........0    0..01..10..0    0..........0
1..10..01..1    1..........1    1..........1
11
Shearsort correctness and complexity IV
Example

Before the first     After the first      After moving out
step                 step                 newly clean rows
0 0 0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 0 0 0  1 1 1 1 1 0 0 0 0 0  0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 1 1  0 0 0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0  1 1 1 0 1 1 1 1 1 1  1 1 1 1 1 0 0 0 0 0
0 0 0 1 1 1 1 1 1 1  0 0 0 1 1 0 0 0 0 0  1 1 1 0 1 1 1 1 1 1
1 1 1 1 1 0 0 0 0 0  1 1 1 1 1 1 1 1 1 1  0 0 0 1 1 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1  0 0 0 0 0 0 0 0 0 0  1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0 0 0  1 1 1 1 1 1 1 1 1 1  1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1  1 1 1 1 1 1 1 1 1 1  1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1  1 1 1 1 1 1 1 1 1 1  1 1 1 1 1 1 1 1 1 1

Summing together: O(√n log n) = (time to sort a
row/column) × (number of phases)
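The whole procedure can be sketched as a sequential simulation on a square mesh. This is a minimal sketch with hypothetical names (`shearsort`, `snake_order`); it assumes a non-empty input whose length is a perfect square, and it uses Python's built-in sort for each row and column where a mesh would use odd-even transposition sort.

```python
import math

def shearsort(values):
    """Sort m*m values on an m x m grid; the result is in snake order."""
    m = math.isqrt(len(values))
    assert m >= 1 and m * m == len(values), "length must be a non-zero perfect square"
    grid = [list(values[r * m:(r + 1) * m]) for r in range(m)]

    def sort_rows():
        # Even-indexed rows ascending, odd-indexed rows descending (snake order).
        for r in range(m):
            grid[r].sort(reverse=(r % 2 == 1))

    # ceil(log2 m) + 1 row/column pairs: one more than the halving
    # argument requires, which is harmless.
    for _ in range(math.ceil(math.log2(m)) + 1):
        sort_rows()
        columns = [sorted(col) for col in zip(*grid)]
        grid[:] = [list(row) for row in zip(*columns)]
    sort_rows()  # final row sort cleans the last dirty row
    return grid

def snake_order(grid):
    """Read the grid row by row, reversing every odd-indexed row."""
    return [x for r, row in enumerate(grid)
            for x in (row if r % 2 == 0 else row[::-1])]
```

Reading the returned grid with `snake_order` gives the sorted sequence.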
12
Merge sort complexity
  • The divide phase
  • at best, the data are already where they are
    needed
  • The merging phase
  • merging two subsequences of size k costs O(k)
    computation and communication
  • step i of the merging phase merges subsequences
    of size 2^i, until n is reached
  • total complexity O(2 + 4 + 8 + … + n) = O(n)
  • not very parallel, only log n processors can be
    efficiently utilized

13
Sorting Networks
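Comparators
  • inputs x, y; outputs x' = min(x, y), y' = max(x, y)
  • inputs x, y; outputs x' = max(x, y), y' = min(x, y)
[Figure: a sorting network with input wires,
columns of comparators joined by an
interconnection network, and output wires]
The rest of the sorting slides follows Kumar's
book.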
14
Bitonic sort I
  • Bitonic sequence
  • A sequence (a_0, a_1, …, a_{n-1}) is bitonic if
  • 1. there exists i such that (a_0, a_1, …, a_i) is
    monotonically increasing and (a_i, a_{i+1}, …, a_{n-1})
    is monotonically decreasing, or
  • 2. there exists a cyclic shift of indices such that
    1. is satisfied
  • Which of these are bitonic?

15
Bitonic Sort II
  • Bitonic split
  • s1 = (min(a_0, a_{n/2}), min(a_1, a_{n/2+1}), …,
    min(a_{n/2-1}, a_{n-1}))
  • s2 = (max(a_0, a_{n/2}), max(a_1, a_{n/2+1}), …,
    max(a_{n/2-1}, a_{n-1}))
  • Properties of bitonic split
  • s1 and s2 are bitonic
  • every element of s1 is smaller than every
    element of s2
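The split is short enough to write out directly. A minimal sketch, with the hypothetical name `bitonic_split`, assuming the input is a bitonic sequence of even length:

```python
def bitonic_split(a):
    """Split a bitonic sequence of even length into (s1, s2)."""
    half = len(a) // 2
    s1 = [min(a[i], a[i + half]) for i in range(half)]
    s2 = [max(a[i], a[i + half]) for i in range(half)]
    return s1, s2

# A sample bitonic sequence: increasing, then decreasing.
sample = [3, 5, 8, 9, 10, 12, 14, 20, 95, 90, 60, 40, 35, 23, 18, 0]
s1, s2 = bitonic_split(sample)
# s1 == [3, 5, 8, 9, 10, 12, 14, 0]
# s2 == [95, 90, 60, 40, 35, 23, 18, 20]
```

Both halves are again bitonic and no element of `s1` exceeds any element of `s2`, so the split can be applied recursively to each half.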

16
Bitonic Sort III
  • Bitonic merge
  • sorts a bitonic sequence in log n steps

original sequence   3  5  8  9 10 12 14 20 95 90 60 40 35 23 18  0
after step 1        3  5  8  9 10 12 14  0 95 90 60 40 35 23 18 20
after step 2        3  5  8  0 10 12 14  9 35 23 18 20 95 90 60 40
after step 3        3  0  8  5 10  9 14 12 18 20 35 23 60 40 95 90
after step 4        0  3  5  8  9 10 12 14 18 20 23 35 40 60 90 95
17
Bitonic Sort IV
[Figure: a 16-input bitonic sorting network built
from eight BM2, four BM4, two BM8 and one BM16
bitonic merging networks]
18
Bitonic Sort V
input         10 20  5  9  3  8 12 14 90  0 60 40 23 35 95 18
after BM2s    10 20  9  5  3  8 14 12  0 90 60 40 23 35 95 18
after BM4s     5  9 10 20 14 12  8  3  0 40 60 90 95 35 23 18
after BM8s     3  5  8  9 10 12 14 20 95 90 60 40 35 23 18  0
19
Bitonic Sort VI
  • Implementation
  • simulate the sorting network
  • a column of comparators can be simulated in
    parallel
  • how to map processors to comparators so that
    communication is minimized?
  • Complexity (comparisons, ignoring communication)
  • BM2 + BM4 + … + BMn
  • Σ_{i=1}^{log n} i = O(log² n)
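Simulating the network column by column makes the comparator-column count visible. The sketch below is an assumption-laden illustration rather than the slides' implementation: `bitonic_sort` is a hypothetical name, the input length must be a power of two, and each inner `for` loop corresponds to one column of comparators that a parallel machine would execute in a single step.

```python
def bitonic_sort(values):
    """Return (sorted list, number of comparator columns used)."""
    a = list(values)
    n = len(a)
    assert n > 0 and (n & (n - 1)) == 0, "length must be a power of two"
    columns = 0
    k = 2
    while k <= n:          # stage of BM_k networks
        j = k // 2
        while j >= 1:      # one comparator column at distance j
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = a[i] > a[partner] if ascending else a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            columns += 1
            j //= 2
        k *= 2
    return a, columns
```

For n = 16 the stages contribute 1 + 2 + 3 + 4 = 10 columns, matching the sum of i from 1 to log n.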
20
Bitonic Sort VII
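Mapping Bitonic Sort to a Hypercube (n
processors)
[Figure: wires 0000 to 1111 are mapped to the
hypercube nodes with the same labels. The bitonic
merging networks communicate along the listed
hypercube dimensions: BM2 along 1; BM4 along 2, 1;
BM8 along 3, 2, 1; BM16 along 4, 3, 2, 1.]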
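The dimension lists in the figure (1; 2, 1; 3, 2, 1; 4, 3, 2, 1) follow a simple pattern, and the communication partner of a node along a dimension differs from it in exactly one bit. A small sketch, with hypothetical helper names and dimensions numbered from 1 at the least significant bit (an assumption of this sketch):

```python
def hypercube_schedule(d):
    """Dimensions used in each stage of bitonic sort on a d-dimensional hypercube."""
    return [list(range(stage, 0, -1)) for stage in range(1, d + 1)]

def partner(node, dimension):
    """Neighbour of `node` along `dimension` (1 = least significant bit)."""
    return node ^ (1 << (dimension - 1))

schedule = hypercube_schedule(4)
# schedule == [[1], [2, 1], [3, 2, 1], [4, 3, 2, 1]]
```

Every compare-exchange therefore happens between direct hypercube neighbours, so each step costs one nearest-neighbour communication.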
21
Bitonic Sort VIII
BM16 in hypercube in detail
[Figure: steps 1 to 4 of BM16 on a 4-dimensional
hypercube]
22
Bitonic Sort VIII
Overall complexity in a hypercube
  • n processors
  • Tp = O(log² n) [comparisons] + O(log² n)
    [communication]
  • p processors
  • n/p comparators per process
  • use compare-and-swap operations
  • Tp = O(n/p log(n/p)) [local sort] + O(n/p log² p)
    [comparisons] + O(n/p log² p) [communication]
23
Bitonic Sort IX
BM16 in mesh in detail
[Figure: mapping of the wires 0000 to 1111 onto a
4 × 4 mesh, followed by steps 1 to 4 of BM16 on
the mesh]
24
Bitonic Sort X
Overall complexity in a mesh
  • n processors
  • Tp = O(log² n) [comparisons] + O(√n)
    [communication]
  • p processors
  • n/p comparators per process
  • use compare-and-swap operations
  • Tp = O(n/p log(n/p)) [local sort] + O(n/p log² p)
    [comparisons] + O(n/√p) [communication]
25
Parallel Quicksort
  • Sequential Quicksort
  • choose pivot
  • split the sequence into L (≤ pivot) and R (>
    pivot)
  • recursively sort L and R
  • Naïve Parallel Quicksort
  • parallelise only the last step
  • Complexity
  • O(n + n/2 + n/4 + …) = O(n) (average)
  • dominated by the sequential splitting
  • only log n processors can be efficiently used

26
Parallel Quicksort II
  • Parallel Quicksort for shared memory computer
  • every processor gets n/p elements
  • repeat
  • choose pivot and broadcast it
  • each processor i splits its sequence into L_i
    (≤ pivot) and R_i (> pivot)
  • collect all L_i's and R_i's into global L and R
  • split the processors into left and right in the
    ratio |L| / |R|
  • the left processors recursively sort L, the
    right processors R
  • until a single processor is left for the whole
    (reduced) range
  • sort your range sequentially
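The steps above can be sketched as a sequential simulation. Everything here is an assumption made for illustration: the names `deal` and `parallel_quicksort` are hypothetical, the pivot is taken as the first element of the first non-empty block, and each processor group always keeps at least one processor so that the recursion terminates.

```python
def deal(values, k):
    """Split a list into k contiguous blocks of nearly equal size."""
    q, r = divmod(len(values), k)
    blocks, start = [], 0
    for i in range(k):
        end = start + q + (1 if i < r else 0)
        blocks.append(values[start:end])
        start = end
    return blocks

def parallel_quicksort(blocks):
    """Simulate the scheme; `blocks[i]` is the data held by processor i."""
    p = len(blocks)
    if p == 1:
        return [sorted(blocks[0])]          # sort your range sequentially
    pivot = next((b[0] for b in blocks if b), None)
    if pivot is None:                       # no data left in this group
        return [[] for _ in blocks]
    # Local rearrangement on every processor, then global collection.
    left = [x for b in blocks for x in b if x <= pivot]
    right = [x for b in blocks for x in b if x > pivot]
    # Split the processors roughly in the ratio |L| / |R|.
    p_left = round(p * len(left) / (len(left) + len(right)))
    p_left = min(max(p_left, 1), p - 1)
    return (parallel_quicksort(deal(left, p_left))
            + parallel_quicksort(deal(right, p - p_left)))
```

Concatenating the returned blocks gives the sorted sequence, because every element of a left group is at most the pivot and every element of the right group is larger.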

27
Parallel Quicksort III
First step
                            P0            P1            P2            P3            P4
initial data                7 13 18  2 | 17  1 14 20 |  6 10 15  9 |  3 16 19  4 | 11 12  5  8
pivot selection             pivot = 7
after local rearrangement   7  2 18 13 |  1 17 14 20 |  6 10 15  9 |  3  4 19 16 |  5 12 11  8
after global rearrangement  L: 7 2 1 6 3 4 5    R: 18 13 17 14 20 10 15 9 19 16 12 11 8
28
Parallel Quicksort IV
Second step
pivot selection             7 2 1 6 3 4 5: pivot = 5
                            18 13 17 14 20 10 15 9 19 16 12 11 8: pivot = 17
after local rearrangement   1 2 7 6 | 3 4 5 | 14 13 17 18 20 | 10 15 9 19 | 16 12 11 8
after global rearrangement  left group:  L: 1 2 3 4 5    R: 7 6
                            right group: L: 14 13 17 10 15 9 16 12 11 8    R: 18 20 19
29
Parallel Quicksort V
Third step
[Figure: data on P0 to P4 at pivot selection, after
local rearrangement and after global rearrangement.
Pivot 11 is selected for the ten elements 8 to 17
that are still shared by a processor group.]
30
Parallel Quicksort VI
Fourth step
[Figure: P2 and P3 after local rearrangement of the
elements 8 to 17.]
Solution
[Figure: P0 to P4 together hold the sorted sequence
1, 2, …, 20.]
31
Parallel Quicksort VII
  • Complexity analysis
  • selecting pivot: O(1)
  • broadcasting pivot: O(log p)
  • local rearrangement: O(n/p)
  • global rearrangement: O(log p + n/p)
  • multiply by log p iterations
  • local sequential sort: O(n/p log(n/p))
  • Overall complexity
  • O(n/p log(n/p)) [local sort] + O(n/p log p)
    [local rearrangement, global moving] + O(log² p)
    [broadcasting and scan()]

Global rearrangement:
lDest = Scan(|L_i|, +)
rDest = Scan(|R_i|, +)
copyElements(A[lDest], L_i, |L_i|)
copyElements(A[lSize + rDest], R_i, |R_i|)
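The scan-based global rearrangement can be sketched with exclusive prefix sums. This is a sequential illustration under stated assumptions: `global_rearrangement` is a hypothetical name, `itertools.accumulate` stands in for the parallel scan, and slice assignment stands in for copyElements.

```python
from itertools import accumulate

def global_rearrangement(local_left, local_right):
    """Copy every L_i and R_i to its final place in the shared array A."""
    l_sizes = [len(part) for part in local_left]
    r_sizes = [len(part) for part in local_right]
    # Exclusive prefix sums: where each processor's piece starts.
    l_dest = [0] + list(accumulate(l_sizes))[:-1]
    r_dest = [0] + list(accumulate(r_sizes))[:-1]
    l_size = sum(l_sizes)
    a = [None] * (l_size + sum(r_sizes))
    for i in range(len(local_left)):
        a[l_dest[i]:l_dest[i] + l_sizes[i]] = local_left[i]
        start = l_size + r_dest[i]
        a[start:start + r_sizes[i]] = local_right[i]
    return a, l_size

# Local pieces after rearranging the first-step example around pivot 7.
lefts = [[7, 2], [1], [6], [3, 4], [5]]
rights = [[18, 13], [17, 14, 20], [10, 15, 9], [19, 16], [12, 11, 8]]
array, l_size = global_rearrangement(lefts, rights)
# array == [7, 2, 1, 6, 3, 4, 5, 18, 13, 17, 14, 20, 10, 15, 9, 19, 16, 12, 11, 8]
```

Each processor needs only its own two scan results to know where to write, which is why the scan (O(log p)) plus the copy (O(n/p)) bounds the global rearrangement.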
32
Parallel Quicksort VIII
  • Message passing implementation
  • more complications with explicitly moving the data
    around
  • the main complication is in the global
    rearrangement phase
  • each process may need to send its L_i and R_i to
    several other processes
  • each process may receive its new L_i and R_i from
    several other processes
  • the destination of the pieces of L_i and R_i
    (where to send them in the global rearrangement)
    consists of a destination process and an address
    within that process
  • all-to-all communication may be necessary
  • asymptotic complexity remains the same

33
Sorting Conclusions
  • Parallel rank sort: the only one not based on
    compare-exchange
  • Odd-even transposition sort: O(n) time with n
    processors
  • Shearsort: 2D mesh, O(√n log n) time with n
    processors
  • 0-1 Sorting Lemma
  • Naïve parallel merge sort: O(n) time with n
    processors
  • Sorting networks
  • Bitonic sort: O(log² n) time with n processors,
    hypercube and mesh implementations
  • Quicksort: O(log² n) time with n processors,
    shared memory and message-passing implementations