Parallel Algorithms and Parallel Computers (ii), IN4026 Lecture 1


1
Parallel Algorithms and Parallel Computers (ii)
IN4026 Lecture 1
  • Cees Witteveen
  • Parallel and Distributed Systems Group
    Electrical Engineering, Mathematics and
    Computer Science

2
Part 2 Lectures
  • Topics
  • algorithmic design of parallel algorithms: a
    general discussion of the top-level design of
    parallel algorithms and their specialization to
    different parallel architectures.
    Teaching material: slides; copies of Jaja,
    Parallel Algorithms
  • basic techniques: some basic and general methods
    to design efficient parallel algorithms.
    Teaching material: slides; copies of Jaja,
    Parallel Algorithms
  • linear recurrences and matrix algebra: some basic
    techniques to solve linear recurrences and
    recurrences involving matrices.
    Teaching material: slides; Grama et al., ch. 8;
    copies of Jaja, Parallel Algorithms

3
Part 2 Lectures
  • Topics (ctd)
  • sorting techniques: an overview of parallel
    sorting techniques.
    Teaching material: slides; Grama et al.,
    chapter 9
  • graph algorithms (if time permits): an overview
    of graph algorithms for dense and sparse graphs.
    Teaching material: slides; Grama et al.,
    chapter 10
  • Examination
  • written exam, about 5 open exercises

4
Lab course
  • Assignments: you have to do them on your own
  • User's guide: see the IN4026 Blackboard site
  • First assignment: implement a basic parallel
    algorithmic method using PVM (Parallel Virtual
    Machine)
  • Second assignment: (matrix) operations with HPF
    (High Performance Fortran)
  • Assistant (evaluation of assignments,
    supervision): Ana Verbanescu, email
    in4026_at_st.ewi.tudelft.nl

5
Algorithmic Design for Parallel Algorithms
  • based on J. Jaja, An Introduction to Parallel
    Algorithms (chapter 1)

6
Parallel algorithmics
  • goal: to present a general framework to
    design, present and analyze parallel algorithms.
  • development: how to find a suitable parallel
    algorithm?
  • presentation: how to present the main ideas?
  • analysis: how to evaluate the quality of the
    algorithm at a general, machine-independent level
    and at a more specific, lower (machine- or
    architecture-dependent) level?

7
Subjects
  • presentation of algorithms
      • top-level versus low(er)-level presentation
  • optimality of algorithms
      • strong and weak notions of optimality
  • algorithmic methods (next week)
      • basic methods: pointer jumping, divide and
        conquer, partitioning, pipelining, cascading
      • special topics and low-level details: matrix
        algorithms, sorting networks

8
Underlying general architecture: PRAM
(cf. Grama, p. 31)
[Figure: processors p1, p2, p3, ..., pn-1, pn, each with its own local memory, all connected to one global memory]
  • uniform RAM program
  • varying number of processors
  • architecture is of MIMD type
  • synchronous operation mode
  • no direct communication, only via the global
    shared memory

9
Presentation of algorithms
  • WT (work-time) paradigm (1)
  • Presentation of algorithms is based on the PRAM.
  • memory model: synchronous shared memory
  • algorithms: mostly single instruction, multiple
    data (SIMD)
  • varieties:
      • EREW: exclusive read, exclusive write
      • CREW: concurrent read, exclusive write
      • CRCW: concurrent read, concurrent write,
        with subclasses
          • common: write only if the values are
            identical
          • arbitrary: arbitrary selection of the
            writing processor
          • priority: the processor with minimum
            index writes

(see also Grama, page 31)
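The three CRCW subclasses differ only in how they settle several writes to the same cell in one time step. The following is a minimal sketch of that rule for a single memory cell; the function name `resolve_crcw_write` and the `(processor_index, value)` request format are illustrative assumptions, and raising an error when common-CRCW writers disagree is one possible reading of "write only if the values are identical".

```python
import random


def resolve_crcw_write(requests, policy, rng=None):
    """Resolve concurrent writes to ONE memory cell in one time step.

    requests: list of (processor_index, value) pairs (hypothetical format).
    Returns the value that ends up in the cell, or None if nobody writes.
    """
    if not requests:
        return None
    if policy == "common":
        # All writers must agree; disagreement is treated as an error here
        # (an assumption, the slides only say the write needs identical values).
        values = {value for _, value in requests}
        if len(values) != 1:
            raise ValueError("common CRCW: writers disagree on the value")
        return values.pop()
    if policy == "arbitrary":
        # Any single writer may win; algorithms must not depend on which one.
        rng = rng or random.Random()
        return rng.choice(requests)[1]
    if policy == "priority":
        # The processor with the minimum index wins.
        return min(requests, key=lambda request: request[0])[1]
    raise ValueError(f"unknown policy: {policy}")
```

For example, `resolve_crcw_write([(3, "a"), (1, "b")], "priority")` stores the value written by processor 1.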
10
Presentation of Algorithms
  • WT (work-time) paradigm
  • describes PRAM algorithms at 2 levels
  • top level: WT (work-time) presentation. The
    program is a sequence of sets of concurrent
    operations, abstracting from
      • processors
      • allocation issues
  • lower level: scheduling for a p-PRAM
      • specialize the general algorithm to an
        algorithm using a given number of processors
        (p-processor PRAM)
      • specify processor allocation and possibly
        also communication details
      • details of memory-access operations
11
WT-presentation: top level
  • present the algorithm as a sequence of time
    steps; within each time step a number of
    concurrent operations is specified.
  • time T(n): number of time steps needed
    (T_P in Grama)
  • work W(n): number of operations performed
    (T_S in Grama)
  • informal presentation of concurrency:
  • for i ≤ j ≤ k pardo <statement>
    The statements corresponding to the values of j
    between i and k are executed concurrently.

12
Example: find the max of an array
  • max(A): determine the maximum of $n = 2^k$
    elements in array A
  • input: array A[1..n] of int
  • output: max = max{ A[i] : i = 1, ..., n }

13
WT-presentation: example
  • max(A): determine the maximum of $n = 2^k$
    elements in array A
  • input: array A[1..n] of int
  • output: max = max{ A[i] : i = 1, ..., n }
  • begin
  • 1. for 1 ≤ i ≤ n pardo B[i] := A[i]
  • 2. for h = 1 to log n do
  •      for 1 ≤ j ≤ n / 2^h pardo
  •        if B[2j] ≥ B[2j-1] then B[j] := B[2j]
  •        else B[j] := B[2j-1]
  • 3. max := B[1]
  • end

Step 1: T = O(1), W = O(n)
Step 2: T = O(log n), W = O(n), since
$W = \sum_{h=1}^{\log n} n/2^h = n \sum_{h=1}^{\log n} 1/2^h = O(n)$
Step 3: T = O(1), W = O(1)
Total: W = O(n), T = O(log n)
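The three steps of max(A) can be replayed sequentially while counting time steps and operations. This is a minimal sketch, not a PRAM implementation: the name `wt_max` is illustrative, each `pardo` iteration is counted as one operation (an assumption about the unit of work), and every parallel step reads from the old array contents before any value is written, to mimic synchronous execution.

```python
def wt_max(a):
    """Simulate the WT-presentation of max(A) for n = 2**k elements.

    Returns (maximum, time_steps, work), counting one operation per
    pardo iteration (an assumption about the unit of work).
    """
    n = len(a)
    k = n.bit_length() - 1
    if n < 1 or n != 1 << k:
        raise ValueError("n must be a power of two")

    b = [None] + list(a)          # step 1: B[i] := A[i], 1-based indexing
    time, work = 1, n

    for h in range(1, k + 1):     # step 2: log n rounds
        active = n >> h           # n / 2^h concurrent operations
        # Compute all new values from the old B first (synchronous step).
        new = [max(b[2 * j], b[2 * j - 1]) for j in range(1, active + 1)]
        b[1:active + 1] = new
        time += 1
        work += active

    time += 1                     # step 3: max := B[1]
    work += 1
    return b[1], time, work
```

With n = 8 the sketch reports log n + 2 = 5 time steps and 8 + 7 + 1 = 16 operations, in line with T = O(log n) and W = O(n).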
14
WT-analysis: max
  • top level
  • no indication of processors
  • no indication of allocation of operations to
    processors
  • time (number of time steps):
    $T(n) = \text{cost}(\text{step 1}) + \text{cost}(\text{step 2}) + \text{cost}(\text{step 3}) = c + c \log n + c = O(\log n)$
  • work (number of operations performed):
    $W(n) = n + 2 \sum_{h=1}^{\log n} (n/2^h) + 1 \le n + 2n + 1 = O(n)$
  • result of the analysis: an (O(n), O(log n))-
    algorithm

15
WT-presentation: lower level
  • Suppose A is a (W(n), T(n)) WT-algorithm with
    $W(n) = \sum_{i=1}^{T(n)} W_i(n)$, where $W_i(n)$ is
    the number of operations executed at time step i.
  • Take a p-PRAM: the number p of processors is
    fixed. Every time step i can be simulated in
    at most $\lceil W_i(n)/p \rceil$ time steps on a
    p-PRAM. The total time $T_p(n)$ satisfies
  • $T_p(n) \le \sum_{i=1}^{T(n)} \lceil W_i(n)/p \rceil \le \sum_{i=1}^{T(n)} (\lfloor W_i(n)/p \rfloor + 1) = \left( \sum_{i=1}^{T(n)} \lfloor W_i(n)/p \rfloor \right) + T(n) \le \lfloor W(n)/p \rfloor + T(n)$
  • (allocation of processors to tasks)
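The scheduling bound can be checked numerically for any profile of per-step work. The sketch below is an illustration under stated assumptions: `simulated_time` and `schedule_bound` are hypothetical helper names, and the example profile is the max algorithm for n = 16 with one operation counted per `pardo` iteration.

```python
def simulated_time(work_per_step, p):
    """Time on a p-PRAM when step i, with W_i operations, takes ceil(W_i / p)."""
    return sum(-(-w // p) for w in work_per_step)   # -(-w // p) is ceil(w / p)


def schedule_bound(work_per_step, p):
    """The bound floor(W(n) / p) + T(n) from the WT scheduling argument."""
    return sum(work_per_step) // p + len(work_per_step)


# Work profile of max(A) for n = 16: copy, four halving rounds, final read.
profile = [16, 8, 4, 2, 1, 1]
for p in (1, 2, 4, 8, 16):
    assert simulated_time(profile, p) <= schedule_bound(profile, p)
```

For p = 4 the simulated time is 4 + 2 + 1 + 1 + 1 + 1 = 10 steps, against a bound of 32/4 + 6 = 14.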

16
Generalisation: Brent's principle
  • Brent's scheduling principle
  • A parallel algorithm consisting of a sequence of
    k concurrent time steps, step i taking $t_i$ time
    and needing $a_i$ processors, can be executed in
    $O(a/p + t)$ time with p processors, where
    $t = \sum_{i=1}^{k} t_i$ and $a = \sum_{i=1}^{k} a_i \times t_i$
  • Prove this principle using the previous slide!

17
Lower level presentation (example)
  • MAX(A) for a p-PRAM with $p = 2^m \le 2^k = n$.
  • processor allocation: let $r = 2^{k-m}$. Processor
    $P_j$, for j = 1, ..., p, takes care of the
    operations on subarray $A[r(j-1)+1, \dots, rj]$ and
    thereafter on the similar subarray
    $B[r(j-1)+1, \dots, rj]$ of B.

18
Lower level presentation (example)
  • specialisation of MAX(A) for a p-PRAM with
    $p = 2^m \le 2^k = n$.
  • processor allocation: let $r = 2^{k-m}$. Processor
    $P_j$, for j = 1, ..., p, takes care of the
    operations on subarray $A[r(j-1)+1, \dots, rj]$ and
    thereafter on the similar subarray
    $B[r(j-1)+1, \dots, rj]$ of B.
  • time cost $T_p$: $T_p(n)$ is completely determined
    by the time cost of processor $P_1$:
    $T_p(n) = n/p + \lceil n/(2^1 p) \rceil + \dots + \lceil n/(2^{\log n} p) \rceil \le n/p + (n/(2^1 p) + 1) + \dots + (n/(2^{\log n} p) + 1) = (n/p)(1 + 1/2^1 + \dots + 1/2^{\log n}) + \log n = O(n/p + \log n)$
  • work cost $W_p$: $W_p = W$

Note that by the WT scheduling principle we
cannot do better.
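The time bound for processor $P_1$ can be reproduced by a small simulation. This is a sketch under an explicit assumption: in round h the n/2^h operations are dealt out in contiguous blocks of ⌈n/(2^h p)⌉ per processor, which is what the bound above suggests but is not spelled out on the slide. The name `p_pram_max` is illustrative, and the constant-time final step max := B[1] is left out of the count, as in the bound.

```python
def p_pram_max(a, p):
    """Simulate MAX(A) on a p-processor PRAM with n = 2**k and p = 2**m <= n.

    Returns (maximum, parallel_time). Assumes round h hands each processor
    a contiguous block of ceil((n / 2^h) / p) operations.
    """
    n = len(a)
    if n < 1 or n & (n - 1) or p < 1 or p & (p - 1) or p > n:
        raise ValueError("need n = 2**k and p = 2**m with p <= n")

    b = [None] + list(a)      # 1-based copy of A
    time = n // p             # step 1: each processor copies r = n/p elements

    ops = n // 2              # operations in round h = 1
    while ops >= 1:
        block = -(-ops // p)  # ceil(ops / p) operations per processor
        new = {}
        for s in range(p):    # processor P_(s+1) works through its block
            lo = s * block + 1
            hi = min((s + 1) * block, ops)
            for j in range(lo, hi + 1):
                new[j] = max(b[2 * j - 1], b[2 * j])
        for j, value in new.items():   # writes land after all reads
            b[j] = value
        time += block         # the round lasts as long as the busiest processor
        ops //= 2
    return b[1], time
```

For n = 8 and p = 2 this gives 4 + 2 + 1 + 1 = 8 time steps, matching n/p + ⌈n/(2p)⌉ + ⌈n/(4p)⌉ + ⌈n/(8p)⌉.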
19
Work versus Cost
  • $T_p$: a (W(n), T(n))-WT-algorithm costs on a
    p-PRAM $T_p(n) = O(W(n)/p + T(n))$
  • $C_p$: we define the cost of an algorithm on a
    p-PRAM as $C_p(n) = p \times T_p(n) = O(W(n) + p \times T(n))$
  • Hence, for $p = O(W(n)/T(n))$ we have
    $W(n) = \Theta(C_p(n))$
  • Definition: if W(n) equals the time a best
    sequential algorithm needs, then
  • a p-PRAM algorithm is cost-optimal if
    $p = O(W(n)/T(n))$.
  • Example: algorithm max(A), if $p \le n/\log n$

(cf. Grama, page 204)
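As a worked instance of the definition, take max(A) with W(n) = O(n) and T(n) = O(log n), and choose p = n / log n processors (the largest value the example allows). Substituting into the two formulas gives:

```latex
\begin{align*}
T_p(n) &= O\!\left(\frac{W(n)}{p} + T(n)\right)
        = O\!\left(\frac{n}{n/\log n} + \log n\right)
        = O(\log n),\\
C_p(n) &= p \times T_p(n)
        = \frac{n}{\log n} \times O(\log n)
        = O(n) = O(W(n)).
\end{align*}
```

The cost stays within a constant factor of the O(n) sequential time, so the algorithm is cost-optimal for this choice of p.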
20
Optimality: weak and strong
  • weak optimality: a (W(n), T(n)) parallel algorithm
    is weakly optimal if $W(n) = \Theta(T_s(n))$, where
    $T_s(n)$ is the time needed by a best sequential
    algorithm
  • strong optimality: a (W(n), T(n)) parallel
    algorithm is strongly optimal if
    $W(n) = \Theta(T_s(n))$ (weak optimality) and T(n)
    is minimal among all weakly optimal parallel
    algorithms for the problem.

21
Max: an $(O(n^2), O(1))$-algorithm!
PRAM model: common CRCW p-PRAM
(write if all processors agree on the value)
  • findmax(A)
  • input: array A[1..n]
  • output: max = value of the maximal element
  • var: array B[1..n, 1..n] of boolean;
    array M[1..n] of boolean
  • begin
  • 1. for 1 ≤ i, j ≤ n pardo
  •      if A[i] ≥ A[j] then B[i,j] := 1 else B[i,j] := 0
  • 2. for 1 ≤ i ≤ n pardo
  •      M[i] := 1
  •      for 1 ≤ j ≤ n pardo if B[i,j] = 0 then M[i] := 0
  •      if M[i] then max := A[i]
  • end

Step 1: T = O(1), W = $O(n^2)$
Step 2: T = O(1), W = $O(n^2)$
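A sequential replay of findmax makes the role of the common-CRCW rule visible: every concurrent write to one cell carries the same value. This is a minimal sketch; the name `crcw_max` and the exact operation count (one operation per comparison, initialisation, check and final test) are assumptions made for illustration.

```python
def crcw_max(a):
    """Replay the constant-time common-CRCW maximum algorithm sequentially.

    Returns (maximum, work), where work counts one operation per
    pardo iteration (an assumption about the unit of work).
    """
    n = len(a)
    if n == 0:
        raise ValueError("array must not be empty")

    # Step 1: n*n comparisons in one parallel step.
    b = [[a[i] >= a[j] for j in range(n)] for i in range(n)]
    work = n * n

    # Step 2: M[i] := 1, then every (i, j) with B[i][j] = 0 writes 0 to M[i].
    # All writers to the same M[i] write the same value 0, so the write is
    # legal on a common CRCW PRAM.
    m = [True] * n
    work += n
    for i in range(n):
        for j in range(n):
            if not b[i][j]:
                m[i] = False
    work += n * n

    # Every i with M[i] = 1 holds a maximal element, so all writers of
    # `max` agree on the value as well.
    winners = [a[i] for i in range(n) if m[i]]
    work += n
    return winners[0], work
```

For an input with duplicated maxima such as `[2, 7, 7]`, two processors write `max` concurrently, but both write 7.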

22
Analysis of the $(O(n^2), O(1))$-algorithm
  • Note that $W(n) = O(n^2)$ exceeds the O(n) time of
    a best sequential algorithm. So the algorithm is
    not weakly optimal.
  • Can we do better? We will present an
    $(O(n^{1+c}), O(1))$-algorithm for computing the
    maximum, where c is an arbitrarily small
    constant > 0. (Next week)

23
Application to other problems
  • Problem: given a boolean array B[1..n], find the
    index of the first 1 in B in (O(n), O(1)) work
    and time using a CRCW PRAM
  • Solution (sketch!)
  • Divide the array into $\sqrt{n}$ subarrays of
    size $\sqrt{n}$.
  • Determine in each subarray, in $(O(\sqrt{n}), O(1))$,
    whether the subarray contains a 1 or only zeros.
  • Find the first subarray containing a 1 in
    $(O(\sqrt{n}), O(1))$.
  • Find the first 1 in this subarray in
    $(O(\sqrt{n}), O(1))$.

Hint: finding whether an array C of size $\sqrt{n}$
contains a 1:
answ := 0
for 1 ≤ j ≤ $\sqrt{n}$ pardo
  if C[j] = 1 then answ := 1
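The solution sketch can be replayed sequentially as follows. Note the assumptions: the slide gives no method for locating the first 1 among $\sqrt{n}$ items, so this sketch swaps in pairwise elimination (position j is ruled out if some earlier position holds a 1), which uses O(m^2) operations on m items and therefore O(n) for m = $\sqrt{n}$, keeping the total work at O(n). All function names are illustrative.

```python
import math


def contains_one(c):
    """The hint: answ := 0; every j with C[j] = 1 writes 1 concurrently."""
    answ = 0
    for x in c:                   # pardo on a CRCW PRAM
        if x == 1:
            answ = 1              # all writers write the same value
    return answ


def first_one_small(c):
    """0-based index of the first 1 in c, or None, by pairwise elimination.

    O(m^2) operations and O(1) parallel time for m = len(c); this technique
    is an assumption, the slide does not name a method.
    """
    m = len(c)
    is_first = [x == 1 for x in c]
    for i in range(m):            # pardo over all pairs i < j
        for j in range(i + 1, m):
            if c[i] == 1:
                is_first[j] = False
    for j in range(m):            # the single surviving j reports its index
        if is_first[j]:
            return j
    return None


def first_one(b):
    """1-based index of the first 1 in b, or 0 if b holds only zeros."""
    n = len(b)
    if n == 0:
        return 0
    size = math.isqrt(n - 1) + 1  # ceil(sqrt(n))
    blocks = [b[s:s + size] for s in range(0, n, size)]
    flags = [contains_one(block) for block in blocks]   # one flag per block
    k = first_one_small(flags)                          # first non-zero block
    if k is None:
        return 0
    return k * size + first_one_small(blocks[k]) + 1
```

For `[0, 0, 0, 0, 0, 1, 0, 1, 0]` the blocks have size 3, the second block is the first one containing a 1, and the result is index 6.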
24
Next time
  • methods to design parallel algorithms
  • methods to mix algorithms to obtain algorithms
    with a better (time/work) profile

See you next week !