Title: Parallel Algorithms and Parallel Computers ii IN4026 Lecture 1
1Parallel Algorithms and Parallel Computers (ii)
IN4026 Lecture 1
- Cees Witteveen
- Parallel and Distributed Systems Group
Electrical Engineering, Mathematics and
Computer Science
2Part 2 Lectures
- Topics
- algorithmic design of parallel algorithms: a general discussion about the top-level design of parallel algorithms and the specialization to different parallel architectures. Teaching material: slides, copies of Jaja, Parallel Algorithms
- basic techniques: some basic and general methods to design efficient parallel algorithms. Teaching material: slides, copies of Jaja, Parallel Algorithms
- linear recurrences and matrix algebra: some basic techniques to solve linear recurrences and recurrences involving matrices. Teaching material: slides, Grama et al., ch. 8, copies of Jaja, Parallel Algorithms
3Part 2 Lectures
- Topics (ctd)
- sorting techniques: an overview of parallel sorting techniques. Teaching material: slides, Grama et al., chapter 9
- graph algorithms (if time permits): an overview of graph algorithms for dense and sparse graphs. Teaching material: slides, Grama et al., chapter 10
- Examination
- written exam, about 5 open exercises
4Lab course
- Assignments: you have to do them on your own
- User's guide: see the IN4026 Blackboard site
- First assignment: implement a basic parallel algorithmic method using PVM (Parallel Virtual Machine)
- Second assignment: (matrix) operations with HPF (High Performance Fortran)
- Assistant (evaluation of assignments, supervisor): Ana Verbanescu, email in4026_at_st.ewi.tudelft.nl
5Algorithmic Design for Parallel Algorithms
- based on J. Jaja, An Introduction to Parallel Algorithms (chapter 1)
6Parallel algorithmics
- goal: to present a general framework to design, present and analyze parallel algorithms.
- development: how to find a suitable parallel algorithm?
- presentation: how to present the main ideas?
- analysis: how to evaluate the quality of the algorithm at a general, machine-independent level and at a more specific, lower (machine- or architecture-dependent) level?
7Subjects
- presentation of algorithms
- top-level versus low(er)-level presentation
- optimality of algorithms
- strong and weak notions of optimality
- algorithmic methods (next week)
- basic methods: pointer jumping, divide and conquer, partitioning, pipelining, cascading
- special topics and low-level details: matrix algorithms, sorting networks
8Underlying general architecture: PRAM
(cf. Grama, p. 31)
[Figure: processors p1, p2, p3, ..., pn-1, pn, each with its own local memory, all connected to one global memory]
- uniform RAM program
- varying number of processors
- architecture is MIMD type
- synchronous operation mode
- no direct communication, only via global
shared memory
9Presentation of algorithms
- WT (work-time) paradigm (1)
- Presentation of algorithms based on PRAM.
- memory model: synchronous shared memory
- algorithms: mostly single instruction, multiple data (SIMD)
- varieties:
- EREW: exclusive read, exclusive write
- CREW: concurrent read, exclusive write
- CRCW: concurrent read, concurrent write, with subclasses
- common: write only if the values are identical
- arbitrary: arbitrary selection of the writing proc
- priority: the proc with minimum index writes
(see also Grama, page 31)
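The three CRCW subclasses differ only in how a write conflict on a single memory cell is resolved. The following is a minimal Python sketch of that rule, not part of the original slides; the function name `resolve_crcw_write` and the `(processor_index, value)` request format are assumptions made for illustration.

```python
import random

def resolve_crcw_write(requests, mode, rng=None):
    """Return the value stored in one cell after a concurrent write.

    requests: non-empty list of (processor_index, value) pairs that all
              target the same memory cell in the same time step.
    mode:     "common", "arbitrary" or "priority" (the CRCW subclasses).
    """
    if not requests:
        raise ValueError("no write requests")
    if mode == "common":
        # The write is only legal if every processor writes the same value.
        values = {value for _, value in requests}
        if len(values) != 1:
            raise ValueError("common CRCW requires identical values")
        return requests[0][1]
    if mode == "arbitrary":
        # Any one of the writers may succeed; the algorithm must not care which.
        rng = rng or random.Random()
        return rng.choice(requests)[1]
    if mode == "priority":
        # The processor with the minimum index wins.
        return min(requests, key=lambda request: request[0])[1]
    raise ValueError("unknown CRCW mode: " + mode)
```

An algorithm that is correct under the common rule is also correct under the other two, since with identical values it does not matter which writer succeeds.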
10Presentation of Algorithms
- WT (work-time) paradigm
- describes PRAM algorithms at 2 levels
- top level: WT (work-time) presentation. The program is a sequence of sets of concurrent operations, abstracting from
- processors
- allocation issues
- lower level: scheduling for a p-PRAM
- specialize the general algorithm to an algorithm using a given number of processors (p-processor PRAM)
- specify processor allocation and possibly also communication details
- details of memory-access operations
11WT-presentation: top level
- present the algorithm as a sequence of time steps; within each time step a number of concurrent operations is specified.
- time T(n): number of time steps needed ($T_P$ in Grama)
- work W(n): number of operations performed ($T_S$ in Grama)
- informal presentation of concurrency:
- for i ≤ j ≤ k pardo <statement>: the statements corresponding to values of j between i and k are executed concurrently
12Example: find max of array
- max(A): determine the maximum of $n = 2^k$ elements in array A
- input: array A[1..n] of int
- output: max = max { A[i] | i = 1, ..., n }
13WT-presentation: example
- max(A): determine the maximum of $n = 2^k$ elements in array A
- input: array A[1..n] of int
- output: max = max { A[i] | i = 1, ..., n }
- begin
- 1. for 1 ≤ i ≤ n pardo B[i] := A[i]
- 2. for h = 1 to log n do
-      for 1 ≤ j ≤ n/2^h pardo
-        if B[2j] ≥ B[2j-1] then B[j] := B[2j]
-        else B[j] := B[2j-1]
- 3. max := B[1]
- end
Step 1: T = O(1), W = O(n)
Step 2: T = O(log n), W = O(n), since $W = \sum_{h=1}^{\log n} n/2^h = n \sum_{h=1}^{\log n} 1/2^h = O(n)$
Step 3: T = O(1), W = O(1)
Total: W = O(n), T = O(log n)
14WT-analysis: max
- top level
- no indication of processors
- no indication of allocation of operations to processors
- time: number of time steps, T(n) = cost(step 1) + cost(step 2) + cost(step 3) = c + c log n + c = O(log n)
- work: number of operations performed, $W(n) = n + 2 \sum_{h=1}^{\log n} (n/2^h) + 1 \le n + 2n + 1 = O(n)$
- result of the analysis: an (O(n), O(log n))-algorithm
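The top-level max algorithm and its work-time count can be checked with a small sequential simulation. This is only a sketch under stated assumptions: the name `wt_max` is hypothetical, each `pardo` is simulated by computing all new values before storing any of them, and every compare-and-assign is counted as one operation (the analysis above counts two, which changes only the constant).

```python
def wt_max(a):
    """Simulate the balanced-tree maximum; return (max, time_steps, work)."""
    n = len(a)
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of two"

    # Step 1: B[i] := A[i] for all i at once (index 0 is unused padding).
    b = [None] + list(a)
    time_steps, work = 1, n

    # Step 2: log n rounds; round h performs n / 2^h concurrent operations.
    size = n
    while size > 1:
        size //= 2
        # Compute every new value first, then write: reads precede writes
        # within one synchronous PRAM step.
        new = [b[2 * j] if b[2 * j] >= b[2 * j - 1] else b[2 * j - 1]
               for j in range(1, size + 1)]
        b[1:size + 1] = new
        time_steps += 1
        work += size

    # Step 3: max := B[1].
    time_steps += 1
    work += 1
    return b[1], time_steps, work
```

For n = 8 the simulation uses 1 + 3 + 1 = 5 time steps and 8 + (4 + 2 + 1) + 1 = 16 operations, in line with T = O(log n) and W = O(n).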
15WT-presentation: lower level
- Suppose A is a (W(n), T(n)) WT-algorithm, with $W(n) = \sum_{i=1}^{T(n)} W_i(n)$, where $W_i(n)$ is the number of operations executed at time step i
- Take a p-PRAM: the number p of processors is fixed. Every time step i can be simulated in at most $\lceil W_i(n)/p \rceil$ time steps on a p-PRAM. The total time $T_p(n)$ therefore satisfies
- $T_p(n) \le \sum_{i=1}^{T(n)} \lceil W_i(n)/p \rceil \le \sum_{i=1}^{T(n)} (\lfloor W_i(n)/p \rfloor + 1) \le \left( \sum_{i=1}^{T(n)} \lfloor W_i(n)/p \rfloor \right) + T(n) \le \lfloor W(n)/p \rfloor + T(n)$
- (allocation of processors to tasks)
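The scheduling bound can be checked numerically. The sketch below is an illustration, not part of the slides; the function names and the per-step work list are assumptions. It compares the simulated time, the sum of the ceilings of W_i/p, with the bound floor(W/p) + T.

```python
def scheduled_time(work_per_step, p):
    """Time on a p-PRAM when step i (W_i operations) takes ceil(W_i / p) rounds."""
    # -(-w // p) is integer ceiling division.
    return sum(-(-w // p) for w in work_per_step)

def wt_bound(work_per_step, p):
    """The bound floor(W / p) + T from the work-time scheduling argument."""
    return sum(work_per_step) // p + len(work_per_step)

# Work per time step of the max algorithm for n = 16:
# step 1 (copy), four rounds of step 2, step 3.
steps = [16, 8, 4, 2, 1, 1]
for p in (1, 2, 4, 8, 16):
    assert scheduled_time(steps, p) <= wt_bound(steps, p)
```

With p = 4 the simulated time is 4 + 2 + 1 + 1 + 1 + 1 = 10, and the bound is floor(32/4) + 6 = 14.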
16Generalisation: Brent's principle
- Brent's scheduling principle
- A parallel algorithm consisting of a sequence of k concurrent time steps, step i taking $t_i$ time and needing $a_i$ processors, can be executed in $O(a/p + t)$ time with p processors, where $t = \sum_{i=1}^{k} t_i$ and $a = \sum_{i=1}^{k} a_i \times t_i$
- Prove this principle using the previous slide!
17Lower level presentation (example)
- MAX(A) for p-PRAM with $p = 2^m \le 2^k = n$.
- processor allocation: let $r = 2^{k-m}$. Processor $P_j$, for j = 1..p, takes care of the operations on subarray A[r·(j-1)+1], ..., A[r·j] and thereafter on the similar subarray B[r·(j-1)+1], ..., B[r·j] of B.
18Lower level presentation (example)
- specialisation of MAX(A) for p-PRAM with $p = 2^m \le 2^k = n$.
- processor allocation: let $r = 2^{k-m}$. Processor $P_j$, for j = 1..p, takes care of the operations on subarray A[r·(j-1)+1], ..., A[r·j] and thereafter on the similar subarray B[r·(j-1)+1], ..., B[r·j] of B.
- time cost $T_p$: $T_p(n)$ is completely determined by the time cost of processor $P_1$:
$T_p(n) = \frac{n}{p} + \lceil \frac{n}{2^1 p} \rceil + \dots + \lceil \frac{n}{2^{\log n} p} \rceil \le \frac{n}{p} + (\frac{n}{2^1 p} + 1) + \dots + (\frac{n}{2^{\log n} p} + 1) = \frac{n}{p} (1 + \frac{1}{2^1} + \dots + \frac{1}{2^{\log n}}) + \log n = O(n/p + \log n)$
- work cost: $W_p = W$
Note that by the WT scheduling principle we cannot do better
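The p-processor specialisation can be sketched as a simulation that counts rounds the way the time-cost formula does: n/p rounds for the copy, then the ceiling of n/(2^h p) rounds for each level h. The function name `max_on_p_pram` is hypothetical, n and p are assumed to be powers of two with p ≤ n, and the final assignment max := B[1] is not counted, as in the formula.

```python
def max_on_p_pram(a, p):
    """Simulate MAX(A) on p processors; return (max, rounds)."""
    n = len(a)
    assert n & (n - 1) == 0 and p & (p - 1) == 0 and 1 <= p <= n

    # Step 1: each processor copies its block of r = n / p elements.
    b = [None] + list(a)          # index 0 is unused padding
    rounds = n // p

    # Step 2: at level h there are size = n / 2^h operations, shared by
    # the p processors, so the level costs ceil(size / p) rounds.
    size = n
    while size > 1:
        size //= 2
        new = [max(b[2 * j], b[2 * j - 1]) for j in range(1, size + 1)]
        b[1:size + 1] = new
        rounds += -(-size // p)   # integer ceiling division
    return b[1], rounds
```

For n = 16 and p = 4 this gives 4 + 2 + 1 + 1 + 1 = 9 rounds; once a level has fewer operations than processors, each further level still costs one round, which is the log n term in O(n/p + log n).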
19Work versus Cost
- $T_p$: a (W(n), T(n))-WT-algorithm costs on a p-PRAM $T_p(n) = O(W(n)/p + T(n))$
- $C_p$: we define the cost of an algorithm on a p-PRAM as $C_p(n) = p \times T_p(n) = O(W(n) + p \times T(n))$
- Hence, for $p = O(W(n)/T(n))$ we have $W(n) = \Theta(C_p(n))$
- Definition: if W(n) equals the time a best sequential algorithm needs, then a p-PRAM algorithm is cost-optimal if $p = O(W(n)/T(n))$.
- Example: algorithm max(A) is cost-optimal if p ≤ n / log n (cf. Grama, page 204)
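As a worked instance of these definitions, take the max algorithm with W(n) = O(n) and T(n) = O(log n), and choose p = n / log n processors. The derivation below only substitutes these values into the formulas on this slide.

```latex
\begin{align*}
T_p(n) &= O\!\left(\frac{W(n)}{p} + T(n)\right)
        = O\!\left(\frac{n}{n/\log n} + \log n\right)
        = O(\log n) \\
C_p(n) &= p \times T_p(n)
        = \frac{n}{\log n} \cdot O(\log n)
        = O(n) = O(W(n))
\end{align*}
```

With p = n processors instead, the same substitution gives a cost of n × O(log n) = O(n log n), which exceeds W(n), so that choice is not cost-optimal.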
20Optimality: weak and strong
- weak optimality: a (W(n), T(n)) parallel algorithm is weakly optimal if $W(n) = \Theta(T_s(n))$, where $T_s(n)$ is the time needed by a best sequential algorithm
- strong optimality: a (W(n), T(n)) parallel algorithm is strongly optimal if $W(n) = \Theta(T_s(n))$ (weak optimality) and T(n) is minimal among all weakly optimal parallel algorithms for the problem.
21Max: an (O(n^2), O(1))-algorithm!
PRAM model: common CRCW p-PRAM
- findmax(A)
- input: array A[1..n]
- output: max, the value of a maximal element
- var: array B[1..n, 1..n] of boolean; array M[1..n] of boolean
- begin
- 1. for 1 ≤ i, j ≤ n pardo if A[i] ≥ A[j] then B[i,j] := 1 else B[i,j] := 0
- 2. for 1 ≤ i ≤ n pardo
-      M[i] := 1
-      for 1 ≤ j ≤ n pardo if B[i,j] = 0 then M[i] := 0
-      if M[i] then max := A[i]
- end
(common write: the write takes place if all writing processors agree on the value)
Step 1: T = O(1), W = O(n^2)
Step 2: T = O(1), W = O(n^2)
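The constant-time algorithm can be sketched sequentially as follows. The nested loops stand in for the n^2 processors that would all act in the same time step; the name `findmax_crcw` and the internal check that all writers of `max` agree are illustrative assumptions, and the input is assumed to be non-empty.

```python
def findmax_crcw(a):
    """Sequential sketch of the common-CRCW maximum with O(n^2) work."""
    n = len(a)

    # Step 1: n^2 independent comparisons, one per (i, j) pair.
    b = [[a[i] >= a[j] for j in range(n)] for i in range(n)]

    # Step 2: M[i] survives only if A[i] >= A[j] for every j. All writers
    # of M[i] write the same value (0), so the common rule is respected.
    m = [True] * n
    for i in range(n):
        for j in range(n):
            if not b[i][j]:
                m[i] = False

    # Every surviving i writes A[i] to max; ties hold the same value, so
    # the concurrent write is again legal under the common rule.
    written = {a[i] for i in range(n) if m[i]}
    assert len(written) == 1, "common CRCW: all writers must agree"
    return written.pop()
```

Duplicated maxima are harmless here: with input [3, 7, 7, 1] both positions holding 7 survive and write the same value.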
22Analysis of the (O(n^2), O(1))-algorithm
- Note that W(n) = O(n^2) > O(n), the time of the best sequential algorithm. So the algorithm is not weakly optimal.
- Can we do better? We will present an $(O(n^{1+c}), O(1))$ algorithm for computing the maximum, where c is an arbitrarily small constant > 0. (Next week)
23Application to other problems
- Problem: given a boolean array B[1..n], find the index of the first 1 in B in (O(n), O(1))-time using a CRCW PRAM
- Solution (sketch!)
- Divide the array into √n subarrays of size √n.
- Determine for each subarray in (O(√n), O(1)) whether the subarray contains a 1 or only zeros.
- Find the first subarray containing a 1 in (O(√n), O(1)).
- Find the first 1 in this subarray in (O(√n), O(1)).
Hint: finding whether an array C of size √n contains a 1:
answ := 0
for 1 ≤ j ≤ √n pardo if C[j] = 1 then answ := 1
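One possible way to fill in the solution sketch is shown below. It is an assumption-laden illustration, not the intended solution: n is assumed to be a perfect square, all helper names are hypothetical, and the "first 1 among √n elements" step uses a pairwise elimination that costs O((√n)^2) = O(n) operations, which stays within the overall O(n) work budget but is not necessarily the per-step method the slide has in mind.

```python
from math import isqrt

def contains_one(c):
    """The hint: every processor holding a 1 writes answ := 1 concurrently."""
    answ = 0
    for x in c:                      # conceptually a single pardo step
        if x == 1:
            answ = 1
    return answ

def first_one_pairwise(c):
    """0-based index of the first 1 in c, or None, using all pairs (i, j)."""
    m = len(c)
    is_first = [x == 1 for x in c]
    for i in range(m):               # conceptually all pairs in one step
        for j in range(i + 1, m):
            if c[i] == 1:
                is_first[j] = False  # a 1 to the left rules out position j
    survivors = [j for j in range(m) if is_first[j]]
    return survivors[0] if survivors else None

def first_one(b):
    """1-based index of the first 1 in b, or None if b holds only zeros."""
    n = len(b)
    s = isqrt(n)
    assert s * s == n, "sketch assumes n is a perfect square"
    blocks = [b[k * s:(k + 1) * s] for k in range(s)]
    flags = [contains_one(block) for block in blocks]
    k = first_one_pairwise(flags)    # first subarray containing a 1
    if k is None:
        return None
    return k * s + first_one_pairwise(blocks[k]) + 1
```

For B = [0, 0, 0, 0, 0, 1, 0, 1, 1] the subarray flags are [0, 1, 1], the first flagged subarray is the second one, and the first 1 sits at index 6.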
24Next time
- methods to design parallel algorithms
- methods to mix algorithms to obtain algorithms
with a better (time/work) profile
See you next week !