Title: Parallel Prefix and Data Parallel Operations
1Parallel Prefix and Data Parallel Operations
- Motivation: basic parallel operations that occur repeatedly.
- Let ⊕ be an associative operation:
- (a_1 ⊕ a_2) ⊕ a_3 = a_1 ⊕ (a_2 ⊕ a_3)
- How to compute
- (a_1 ⊕ a_2 ⊕ ... ⊕ a_n) in parallel in O(log n) time?
2Approach 1
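Figure: prefix computation on 8 elements, where Σij denotes the combination of ai through aj.
input   a0   a1   a2   a3   a4   a5   a6   a7
d = 1   Σ00  Σ01  Σ12  Σ23  Σ34  Σ45  Σ56  Σ67
d = 2   Σ00  Σ01  Σ02  Σ03  Σ14  Σ25  Σ36  Σ47
d = 4   Σ00  Σ01  Σ02  Σ03  Σ04  Σ05  Σ06  Σ07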
Assume that n = 2^k.
for i = 0 to k-1
    for j = 0 to n-1-2^i do in parallel
        x[j + 2^i] = x[j] ⊕ x[j + 2^i]
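The doubling loop above can be simulated sequentially. The following is a minimal sketch, assuming Python lists stand in for the processor array; the name `prefix_scan` and the snapshot `old` (used to imitate synchronous parallel updates) are illustrative and not part of the slides.

```python
from operator import add

def prefix_scan(values, op=add):
    """Inclusive scan by recursive doubling (sequential simulation)."""
    x = list(values)
    n = len(x)
    d = 1                      # d plays the role of 2^i
    while d < n:
        # All updates of one step read the values from before the step,
        # so a snapshot imitates processors working in lockstep.
        old = x[:]
        for j in range(n - d):             # j = 0 .. n-1-d
            x[j + d] = op(old[j], old[j + d])
        d *= 2
    return x

print(prefix_scan([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

The loop runs about log2(n) times, and each pass corresponds to one row (d = 1, 2, 4) of the figure.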
3How to do on Tree Architecture?
for each node:
    if there is a signal from the left and right children: S_t <- S_l ⊕ S_r
    if there is a signal R: send R to both its children
    if the node is a leaf and there is a signal R: X <- X ⊕ R
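A possible sequential simulation of the tree rules is sketched below. Two points are assumptions rather than statements from the slide: each internal node forwards its left sum S_l as the signal R to its right subtree, and a leaf combines R on the left (R ⊕ X) so that non-commutative operations keep their order. The function name `tree_prefix` is hypothetical, and the input is assumed to be non-empty with a power-of-two length.

```python
from operator import add

def tree_prefix(leaves, op=add):
    """Inclusive prefix over the leaves of a complete binary tree."""
    x = list(leaves)

    def down(lo, hi, r):
        # A node that receives R passes it to both children; a leaf applies it.
        if hi - lo == 1:
            x[lo] = op(r, x[lo])
            return
        mid = (lo + hi) // 2
        down(lo, mid, r)
        down(mid, hi, r)

    def up(lo, hi):
        # Returns S_t, the combination of the original leaves in x[lo:hi].
        if hi - lo == 1:
            return x[lo]
        mid = (lo + hi) // 2
        s_left = up(lo, mid)
        s_right = up(mid, hi)
        # Assumption: the left sum is sent as signal R into the right subtree.
        down(mid, hi, s_left)
        return op(s_left, s_right)

    up(0, len(x))
    return x

print(tree_prefix([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```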
4How to do on a Hypercube
A complete binary tree can be embedded into a hypercube.
Simpler solution: each node computes both its prefix and the total sum.
for i = 0 to k-1
    for j = 0 to n-1 do in parallel
        if i-th bit of j = 1: x_j = x_j ⊕ sum_{j^(i)}
        sum_j = sum_j ⊕ sum_{j^(i)},
where j^(i) and j have the same binary representation except for their i-th bit: the i-th bit of j^(i) is the complement of the i-th bit of j.
5Prefix on Hypercube
for i = 0 to k-1
    for j = 0 to n-1 do in parallel
        if i-th bit of j = 1: x_j = x_j ⊕ sum_{j^(i)}
        sum_j = sum_j ⊕ sum_{j^(i)}
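The hypercube loop can be tried out with a small sequential simulation. This is a sketch under assumptions: node j's neighbour across dimension i is computed as `j ^ (1 << i)`, the array `total` plays the role of sum_j, and the partner's sum is combined on the left when the partner has the lower address, which matters only for non-commutative operations. The name `hypercube_prefix` is illustrative.

```python
from operator import add

def hypercube_prefix(values, op=add):
    """Inclusive prefix on a simulated hypercube with n = 2**k nodes."""
    n = len(values)
    k = n.bit_length() - 1
    x = list(values)        # running prefix held by node j
    total = list(values)    # running sum of the subcube that contains node j
    for i in range(k):
        old_total = total[:]            # values exchanged at the start of the step
        for j in range(n):
            partner = j ^ (1 << i)      # j with its i-th bit complemented
            if j & (1 << i):
                # The partner's subcube holds the lower-numbered nodes.
                x[j] = op(old_total[partner], x[j])
                total[j] = op(old_total[partner], old_total[j])
            else:
                total[j] = op(old_total[j], old_total[partner])
    return x

print(hypercube_prefix([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

Each of the k steps uses only one hypercube dimension, so every exchange is between direct neighbours.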
6Applications of Data Parallel Operations
- Any associative operations
- Examples
- min, max, add
- adding two binary numbers
- finite state automata
- radix sort
- segmented prefix sum
- routing
- packing
- unpacking
- broadcast (copy-scan)
- solving recurrence equations
- straight line computation (parallel arithmetic
evaluation)
7Adding two n bit numbers as parallel prefix
- a = a_{n-1} ... a_0
- b = b_{n-1} ... b_0
- s = a + b
- note that s_i = a_i XOR b_i XOR c_{i-1}
- to compute c_i, define g and p as
- g_i = a_i AND b_i, p_i = a_i XOR b_i
- define ∘ as (g, p) ∘ (g', p') = (g OR (p AND g'), p AND p')
- Then carry bit c_i can be computed by
- (G_i, P_i) = (g_i, p_i) ∘ (g_{i-1}, p_{i-1}) ∘ ... ∘ (g_0, p_0)
- and G_i = c_i
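The carry computation can be checked with a short sketch. It assumes the generate bit is AND and the propagate bit is XOR (OR would also give correct carries), takes c_{-1} = 0, and uses a recursive-doubling scan in which the more significant pair is the left operand. The names `carry_op`, `scan_high_left` and `add_bits` are hypothetical.

```python
def carry_op(hi, lo):
    """(g, p) o (g', p') = (g | (p & g'), p & p'); hi is the more significant pair."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)

def scan_high_left(items, op):
    """Inclusive scan where the higher-index operand goes on the left."""
    x = list(items)
    d = 1
    while d < len(x):
        old = x[:]
        for j in range(len(x) - d):
            x[j + d] = op(old[j + d], old[j])
        d *= 2
    return x

def add_bits(a, b, n):
    """Add two n-bit non-negative integers (n >= 1) via a prefix over (g, p)."""
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    gp = [(x & y, x ^ y) for x, y in zip(abits, bbits)]
    carries = [g for g, _ in scan_high_left(gp, carry_op)]   # c_i = G_i
    total = 0
    for i in range(n):
        c_in = carries[i - 1] if i > 0 else 0                # c_{-1} = 0
        total |= (abits[i] ^ bbits[i] ^ c_in) << i           # s_i
    return total | (carries[n - 1] << n)                     # carry-out bit

print(add_bits(11, 6, 4))  # 17
```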
8Hardware circuit of recursive look-ahead adder
9Parsing a regular language
δ(q0, b) = q2, δ(q0, c) = q1, δ(q1, b) = q0,
δ(q1, c) = qr, δ(q2, b) = qr, δ(q2, c) = q0
qr: reject state
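Running the automaton fits the prefix pattern because each input symbol induces a function from states to states, and function composition is associative. The sketch below uses the transition table from the slide; the start state q0 and the absorbing behaviour of qr are assumptions, and the helper names are illustrative.

```python
STATES = ["q0", "q1", "q2", "qr"]
DELTA = {
    ("q0", "b"): "q2", ("q0", "c"): "q1",
    ("q1", "b"): "q0", ("q1", "c"): "qr",
    ("q2", "b"): "qr", ("q2", "c"): "q0",
    # Assumption: once in the reject state, the automaton stays there.
    ("qr", "b"): "qr", ("qr", "c"): "qr",
}

def as_function(symbol):
    """State-to-state map induced by one symbol, stored as a tuple of indices."""
    return tuple(STATES.index(DELTA[(q, symbol)]) for q in STATES)

def compose(f, g):
    """Apply f first, then g."""
    return tuple(g[s] for s in f)

def run(word, start="q0"):
    """State reached after every prefix of word, via a recursive-doubling scan."""
    fs = [as_function(ch) for ch in word]
    d = 1
    while d < len(fs):
        old = fs[:]
        for j in range(len(fs) - d):
            fs[j + d] = compose(old[j], old[j + d])
        d *= 2
    s0 = STATES.index(start)        # assumption: q0 is the start state
    return [STATES[f[s0]] for f in fs]

print(run("bccb"))  # ['q2', 'q0', 'q1', 'q0']
```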
10Segmented Prefix operation
11Segmented Prefix computation
Let ⊕ be any associative operation. For the
segmented operation of ⊕, define ⊗ on pairs of
a value and a segment flag.
Then ⊗ is associative, and we can compute the
segmented operation in O(log n) time.
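One common way to build such a segmented operator is sketched here. The exact definition on the original slide is not reproduced, so the form below is an assumption: elements are (flag, value) pairs, and a set flag on the right operand restarts the running result. The names `segmented` and `segmented_scan` are illustrative.

```python
from operator import add

def segmented(op):
    """Lift an associative op to (flag, value) pairs; the result is associative too."""
    def seg_op(left, right):
        f1, v1 = left
        f2, v2 = right
        # A flag on the right operand starts a new segment, so the left value is dropped.
        return (f1 | f2, v2 if f2 else op(v1, v2))
    return seg_op

def segmented_scan(values, flags, op=add):
    """Inclusive scan that restarts wherever flags[i] == 1."""
    seg_op = segmented(op)
    x = list(zip(flags, values))
    d = 1
    while d < len(x):
        old = x[:]
        for j in range(len(x) - d):
            x[j + d] = seg_op(old[j], old[j + d])
        d *= 2
    return [v for _, v in x]

values = [5, 6, 3, 1, 8, 3, 7, 5, 9, 2]
flags  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
print(segmented_scan(values, flags))  # [5, 11, 3, 1, 9, 12, 7, 12, 9, 11]
```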
12Enumerating
data          5 6 3 1 8 3 7 5 9 2
active procs  1 0 1 1 0 0 1 0 1 0
enumerated    0 x 1 2 x x 3 x 4 x
13packing
- data 5 6 3 1 8 3 7 5 9 2
- active procs 1 0 1 1 0 0 1 0 1 0
- enumerated 0 x 1 2 x x 3 x 4 x
- packed data 5 3 1 7 9 x x x x x
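Enumerating and packing can be written as a prefix sum followed by a scatter. The sketch below assumes `None` stands for the "x" entries and uses `itertools.accumulate` in place of a parallel scan; the function names are illustrative.

```python
from itertools import accumulate

def enumerate_active(active):
    """Rank of each active processor among the active ones (None if inactive)."""
    inclusive = list(accumulate(active))
    return [inclusive[i] - 1 if active[i] else None for i in range(len(active))]

def pack(data, active):
    """Move the data of active processors to the lowest addresses."""
    ranks = enumerate_active(active)
    packed = [None] * len(data)
    for i, r in enumerate(ranks):       # each active processor writes one slot
        if r is not None:
            packed[r] = data[i]
    return packed

data   = [5, 6, 3, 1, 8, 3, 7, 5, 9, 2]
active = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
print(enumerate_active(active))  # [0, None, 1, 2, None, None, 3, None, 4, None]
print(pack(data, active))        # [5, 3, 1, 7, 9, None, None, None, None, None]
```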
14Packing and Unpacking on Hypercube
- Packing
- adjust bit 0
- adjust bit 1
- adjust bit 2
- ...
- adjust bit k-1
- Unpacking
- adjust bit k-1
- adjust bit k-2
- ...
- adjust bit 1
- adjust bit 0
- How about in the order of adjust bit 0, 1, ...,
k-1 for packing?
15Unpacking
address        0 1 2 3 4 5 6 7 8 9
data           6 2 3 5 9 x x x x x
active procs   1 0 1 1 0 0 1 0 1 0
enumerated     0 x 1 2 x x 3 x 4 x
destination    0 2 3 6 8 x x x x x
unpacked data  6 x 2 3 x x 5 x 9 x
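The unpacking table can be reproduced with the following sketch. It assumes the destination row is obtained by packing the addresses of the active processors, and it uses `None` for "x"; the name `unpack` is illustrative.

```python
from itertools import accumulate

def unpack(packed, active):
    """Scatter packed data back to the active processors; returns (destination, data)."""
    n = len(active)
    rank = [s - 1 for s in accumulate(active)]   # meaningful only where active[i] == 1
    dest = [None] * n
    for i in range(n):
        if active[i]:
            dest[rank[i]] = i       # pack the addresses of the active processors
    out = [None] * n
    for r in range(n):
        if dest[r] is not None:
            out[dest[r]] = packed[r]
    return dest, out

packed = [6, 2, 3, 5, 9, None, None, None, None, None]
active = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
dest, out = unpack(packed, active)
print(dest)  # [0, 2, 3, 6, 8, None, None, None, None, None]
print(out)   # [6, None, 2, 3, None, None, 5, None, 9, None]
```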
16Copy Scan (broadcast)
address        0 1 2 3 4 5 6 7 8 9
data           6 2 3 5 9 4 1 7 8 10
segmented bit  1 0 1 1 0 0 1 0 1 0
result         6 6 3 5 5 5 1 1 8 8
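Copy-scan can be viewed as a segmented scan whose underlying operation keeps its left operand. The sketch below takes that view, which is an interpretation rather than something stated on the slide; `copy_scan` is a hypothetical name.

```python
def copy_scan(data, seg):
    """Broadcast the first value of every segment to the rest of the segment."""
    x = list(zip(seg, data))
    d = 1
    while d < len(x):
        old = x[:]
        for j in range(len(x) - d):
            (f1, v1), (f2, v2) = old[j], old[j + d]
            # If the right element starts a segment it keeps its own value,
            # otherwise it copies the value arriving from the left.
            x[j + d] = (f1 | f2, v2 if f2 else v1)
        d *= 2
    return [v for _, v in x]

data = [6, 2, 3, 5, 9, 4, 1, 7, 8, 10]
seg  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
print(copy_scan(data, seg))  # [6, 6, 3, 5, 5, 5, 1, 1, 8, 8]
```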
17Radix Sort
for j = 0 to k-1                      // x has k bits; bit 0 is the least significant
    for all i in 0 .. n-1 do parallel
        if j-th bit of x_i is 0
            y_i = enumerate
            c = count
        if j-th bit of x_i is 1
            y_i = enumerate + c
        x[y_i] = x_i
Radix sort: another code
for j = 0 to k-1                      // x has k bits
    for all i in 0 .. n-1 do parallel
        pack left x_i if j-th bit of x_i is 0
        pack right x_i if j-th bit of x_i is 1
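A sequential sketch of the enumerate-based radix sort follows. It assumes bit 0 is the least significant bit and that the passes run from bit 0 upward, which a stable split needs in order to produce a sorted result; `split` and `radix_sort` are illustrative names, and `itertools.accumulate` stands in for the parallel enumerate.

```python
from itertools import accumulate

def split(x, bit):
    """Stable split: keys whose given bit is 0 come first, then keys whose bit is 1."""
    n = len(x)
    one = [(v >> bit) & 1 for v in x]
    zero = [1 - b for b in one]
    rank0 = [s - 1 for s in accumulate(zero)]   # enumerate among the 0-keys
    rank1 = [s - 1 for s in accumulate(one)]    # enumerate among the 1-keys
    c = sum(zero)                               # count of 0-keys
    y = [rank0[i] if zero[i] else c + rank1[i] for i in range(n)]
    out = [None] * n
    for i in range(n):                          # x[y_i] = x_i, all i in parallel
        out[y[i]] = x[i]
    return out

def radix_sort(x, k):
    """Sort non-negative integers that fit in k bits."""
    for j in range(k):                          # least significant bit first
        x = split(x, j)
    return x

print(radix_sort([5, 6, 3, 1, 8, 3, 7, 5, 9, 2], 4))  # [1, 2, 3, 3, 5, 5, 6, 7, 8, 9]
```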
18Quick Sort
- 1. Pick a pivot p
- 2. Broadcast p
- 3. For all PE i, compare A_i with p
- if A_i < p, pack left A_i in the segment
- if A_i > p, pack right A_i in the segment
- 4. Mark the segment boundary
- 5. Each segment, quick sort recursively
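The five steps can be simulated round by round, with every segment processed in the same round. This is only a sketch: the pivot is taken to be the first element of each segment, and elements equal to the pivot are kept in their own middle group, neither of which is specified on the slide. The name `quicksort_segments` is hypothetical.

```python
def quicksort_segments(a):
    """Segment-by-segment quicksort; each while-iteration is one parallel round."""
    a = list(a)
    starts = [0] if a else []               # marked segment boundaries
    while True:
        ends = starts[1:] + [len(a)]
        out, new_starts, split_any = [], [], False
        for lo, hi in zip(starts, ends):    # all segments act in the same round
            seg = a[lo:hi]
            p = seg[0]                                  # steps 1-2: pick and broadcast p
            left = [v for v in seg if v < p]            # step 3: pack left
            same = [v for v in seg if v == p]           # assumption: equal keys stay together
            right = [v for v in seg if v > p]           # step 3: pack right
            split_any = split_any or bool(left or right)
            for part in (left, same, right):            # step 4: mark boundaries
                if part:
                    new_starts.append(len(out))
                    out.extend(part)
        a, starts = out, new_starts
        if not split_any:                   # step 5 ends once no segment splits
            return a

print(quicksort_segments([5, 6, 3, 1, 8, 3, 7, 5, 9, 2]))  # [1, 2, 3, 3, 5, 5, 6, 7, 8, 9]
```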
19Solving Linear Recurrence Equations
- f_n = a_{n-1} f_{n-1} + a_{n-2} f_{n-2}
- In matrix form: (f_n, f_{n-1})^T = [[a_{n-1}, a_{n-2}], [1, 0]] (f_{n-1}, f_{n-2})^T
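Writing the recurrence as a product of 2x2 matrices turns it into a prefix computation, since matrix multiplication is associative. The sketch below assumes the matrix M_m = [[a_{m-1}, a_{m-2}], [1, 0]] maps (f_{m-1}, f_{m-2}) to (f_m, f_{m-1}), and that f_0, f_1 and the coefficients a_0 .. a_{n-1} are given; `solve_recurrence` is an illustrative name.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def solve_recurrence(a, f0, f1, n):
    """f[m] = a[m-1]*f[m-1] + a[m-2]*f[m-2] for 2 <= m <= n (requires len(a) >= n)."""
    mats = [((a[m - 1], a[m - 2]), (1, 0)) for m in range(2, n + 1)]
    # Prefix products P_m = M_m M_{m-1} ... M_2 by recursive doubling;
    # the newer (higher-index) factor is the left operand.
    d = 1
    while d < len(mats):
        old = mats[:]
        for j in range(len(mats) - d):
            mats[j + d] = matmul(old[j + d], old[j])
        d *= 2
    f = [f0, f1]
    for P in mats:
        f.append(P[0][0] * f1 + P[0][1] * f0)   # first row of P_m applied to (f1, f0)
    return f

print(solve_recurrence([1] * 10, 0, 1, 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```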
20Pointer Jumping and Tree Computation
How to compute a prefix on a linked list?
If NEXT[i] != NIL then
    X[i] <- X[i] ⊕ X[NEXT[i]]
    NEXT[i] <- NEXT[NEXT[i]]
How to obtain the order 1 3 6 10 15 21 28?
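Pointer jumping can be simulated as below, with addition as the operation and `None` standing for NIL. Following forward links yields suffix sums; one possible answer to the ordering question, offered here as an assumption rather than as the slide's intended answer, is to run the same procedure over predecessor links. The name `pointer_jump` is hypothetical.

```python
def pointer_jump(x, nxt):
    """Repeat X[i] <- X[i] + X[NEXT[i]], NEXT[i] <- NEXT[NEXT[i]] until all NEXT are NIL."""
    x = list(x)
    nxt = list(nxt)
    while any(p is not None for p in nxt):
        old_x, old_nxt = x[:], nxt[:]       # snapshot: all nodes update in lockstep
        for i in range(len(x)):
            if old_nxt[i] is not None:
                x[i] = old_x[i] + old_x[old_nxt[i]]
                nxt[i] = old_nxt[old_nxt[i]]
    return x

values = [1, 2, 3, 4, 5, 6, 7]
forward  = [1, 2, 3, 4, 5, 6, None]     # node i points to node i + 1
backward = [None, 0, 1, 2, 3, 4, 5]     # node i points to node i - 1
print(pointer_jump(values, forward))    # [28, 27, 25, 22, 18, 13, 7]
print(pointer_jump(values, backward))   # [1, 3, 6, 10, 15, 21, 28]
```

The number of rounds is about log2 of the list length, because every round doubles the distance each pointer spans.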
21Application Tree computation
Pre-order numbering
Can also be applied to in-order and post-order numbering, number of
children, depth, biconnected components, etc.
22Recurrence Equation
- Example LU decomposition on a triangular matrix