Transcript and Presenter's Notes

Title: HPC 01


1
HPC 01
  • Communication Models, Speedup and Scalability
  • Schoenauer, sec. 8.2 and 8.4

2
Message Passing Time
  • To send l bytes:
  • t_comm(l) = t_startup + (h-1)·t_start-hop + (l + l_0)·t_send + t_block
  • t_startup: total time spent setting up the communication
  • t_start-hop: time for switching each hop in wormhole routing
  • h: number of hops; l: number of bytes to transfer
  • l_0: extra header bytes that are also moved
  • t_send: time to actually transfer 1 byte
  • t_block: time spent in blocked messages en route
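As a quick illustration of this cost model (a sketch only; the timing constants below are invented placeholders, not values from the slides):

    /* Evaluate the message-passing cost model t_comm(l). */
    #include <stdio.h>

    double t_comm(double l,           /* bytes to transfer       */
                  double h,           /* number of hops          */
                  double t_startup,   /* setup time              */
                  double t_start_hop, /* per-hop switching time  */
                  double l0,          /* extra header bytes      */
                  double t_send,      /* time per byte           */
                  double t_block)     /* time lost to blocking   */
    {
        return t_startup + (h - 1.0) * t_start_hop
             + (l + l0) * t_send + t_block;
    }

    int main(void)
    {
        /* 1 KB message over 3 hops with illustrative constants */
        double t = t_comm(1024.0, 3.0, 50e-6, 1e-6, 16.0, 5e-9, 0.0);
        printf("t_comm = %g s, effective speed = %g MB/s\n",
               t, 1024.0 / t / 1e6);
        return 0;
    }

Even with these optimistic made-up numbers, the startup term dominates a 1 KB message, which is exactly the "actual speed well below the hardware limit" effect of the next slide.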

3
Communication Model
  • Speed = l / t_comm; the actual speed is far below (<<) the
    advertised theoretical hardware limit
  • Consequences:
  • Send messages in blocks -- avoid many small single messages
    (see the sketch below)
  • Arrange data distributions to get nearest-neighbor
    communication, e.g. use a ring shift with direct neighbors
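A minimal sketch of the first consequence, assuming an MPI setting in C; the message count and function names are illustrative. Packing N values into one send pays t_startup once, while N single-element sends pay it N times:

    /* Many small sends vs. one blocked send (illustration only). */
    #include <mpi.h>

    #define N 1000   /* illustrative message count */

    void send_small(double *data, int dest)
    {
        for (int i = 0; i < N; i++)   /* N messages: N * t_startup overhead */
            MPI_Send(&data[i], 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    }

    void send_blocked(double *data, int dest)
    {
        /* one message: t_startup paid once for the same payload */
        MPI_Send(data, N, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    }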

4
Communication Model
  • Program with logical processor numbers rather than physical
    ones, so the mapping onto the hardware can be chosen freely
    (a sketch follows below)
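A minimal sketch of this idea in MPI: a periodic 1-D Cartesian communicator gives each process logical neighbor ranks for a ring shift, and reorder = 1 lets the library map the logical ranks onto nearby physical processors. The communicator name and payload are made up for illustration:

    /* Ring shift with logical processor numbers via a Cartesian topology. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int nprocs, dims[1] = {0}, periods[1] = {1};
        MPI_Comm ring;
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Dims_create(nprocs, 1, dims);
        MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1 /* reorder */, &ring);

        /* logical left/right neighbors for a nearest-neighbor ring shift */
        int left, right;
        MPI_Cart_shift(ring, 0, 1, &left, &right);

        double out = 1.0, in;
        MPI_Sendrecv(&out, 1, MPI_DOUBLE, right, 0,
                     &in,  1, MPI_DOUBLE, left,  0, ring, MPI_STATUS_IGNORE);

        MPI_Comm_free(&ring);
        MPI_Finalize();
        return 0;
    }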

5
Communication Model
  • Latency hiding: use asynchronous messaging to overlap
    communication and computation (MPI_ISEND, MPI_IRECV)
  • Domain decomposition for grid problems: compute the boundary
    values first, start communicating them, and work on the rest
    while the messages are in flight (see the sketch below)
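A minimal sketch of latency hiding with non-blocking MPI for a 1-D halo exchange; compute_interior and compute_boundary are hypothetical placeholders, and the interior update must not touch the halo data for the overlap to be valid:

    /* Overlap communication and computation with MPI_Isend / MPI_Irecv. */
    #include <mpi.h>

    static void compute_interior(void)            { /* needs no halo data */ }
    static void compute_boundary(const double *h) { (void)h; /* uses halo */ }

    void exchange_halo_and_compute(double *halo_out, double *halo_in, int n,
                                   int neighbor, MPI_Comm comm)
    {
        MPI_Request req[2];

        /* start the halo exchange asynchronously */
        MPI_Irecv(halo_in,  n, MPI_DOUBLE, neighbor, 0, comm, &req[0]);
        MPI_Isend(halo_out, n, MPI_DOUBLE, neighbor, 0, comm, &req[1]);

        compute_interior();                        /* overlaps with transfer */

        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);  /* halo has now arrived   */
        compute_boundary(halo_in);
    }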

6
Amdahl's Law
  • Consider the execution of a program on p processors, and let
    the fraction q (0 < q < 1) of the operations be
    parallelizable. The maximum speedup is
  • s_p,false = t_1 / t_p = 1 / (q/p + (1 - q))
  • Indicates the rapid loss of speedup as p increases if the
    parallel fraction is not high enough
  • To get 50% efficiency, i.e. a speedup of 256 on 512
    processors, requires q ≈ 0.998 (see the check below)
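A quick numeric check of the last bullet (a sketch, not from the slides): solving 1 / (q/p + 1 - q) = 256 for p = 512 gives q = (1 - 1/256) · 512/511 ≈ 0.998.

    /* Verify that q ~ 0.998 gives 50% efficiency on 512 processors. */
    #include <stdio.h>

    double amdahl(double q, double p) { return 1.0 / (q / p + (1.0 - q)); }

    int main(void)
    {
        double p = 512.0;
        double q = (1.0 - 1.0 / 256.0) * p / (p - 1.0);   /* ~0.998044 */
        printf("q = %.6f  speedup = %.1f  efficiency = %.1f%%\n",
               q, amdahl(q, p), 100.0 * amdahl(q, p) / p);
        return 0;
    }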

7
Amdahl's Law
8
Amdahl's Law
  • Why "false" in the speedup?
  • It assumed that the number of operations is the same for the
    sequential and the parallel version -- usually the algorithms
    and data structures differ
  • It did not account for the cost of parallelization --
    communication and synchronization costs!
  • It assumed that per-processor performance does not change
    between sequential and parallel code (different vector
    lengths, ...)

9
Speedup (honest)
  • s_p,hon = t_1 of the best sequential algorithm / t_p of the
    real parallel algorithm
  • In closed form, s_p,hon = t_1 / (t_1/p + h_bas + h_p)
    (a complex form, difficult to use)
  • h_p: communication time that depends on p
  • As p → ∞, t_1/p vanishes but the overhead h_p keeps growing,
    so s_p,hon → 0

10
Scalability
  • There is an optimal number of processors for each problem
    (see the sketch below)
  • Keeping the problem size fixed while increasing the number of
    processors is a poor use of a parallel machine
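A sketch of this fixed-size behavior, using the honest-speedup form from slide 9 with an assumed overhead h_p = c·p; all constants are invented for illustration. The speedup rises, peaks at a finite p, and then decays:

    /* Honest speedup for a fixed problem size under an assumed overhead model. */
    #include <stdio.h>

    int main(void)
    {
        const double t1    = 100.0;   /* time of the best sequential algorithm */
        const double h_bas = 0.05;    /* p-independent overhead                */
        const double c     = 0.002;   /* per-processor communication overhead  */

        for (int p = 1; p <= 4096; p *= 2) {
            double tp = t1 / p + h_bas + c * p;   /* parallel time model */
            printf("p = %4d  s_hon = %6.1f\n", p, t1 / tp);
        }
        return 0;   /* the speedup peaks at a finite p, then drops toward zero */
    }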

11
Scalability
  • Increasing the problem size along with the number of
    processors leads to better use of the parallel machine

12
Scalability
  • Now let the problem size m → ∞ as p → ∞

13
Scalability
  • Thus scalability, not speedup, is the desired measure of a
    parallel algorithm/code!
  • Scalability is achieved if the quantity
  • h_p · p / m is constant or increases only very slowly as p
    increases (see the sketch below)
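A closing sketch (with invented constants) of this condition: if m grows with p so that h_p·p/m stays constant, the efficiency t_1/(p·t_p) settles at a constant value instead of decaying, using the honest time model from slide 9 with an assumed h_p ∝ p:

    /* Scalability check: keep h_p*p/m constant by growing the problem size. */
    #include <stdio.h>

    static double h(double p) { return 0.2 * p; }   /* assumed overhead h_p */

    int main(void)
    {
        const double work_per_item = 1e-3;   /* t1 = work_per_item * m */
        const double h_bas = 0.05;           /* p-independent overhead */
        const double m0 = 1000.0;

        for (int p = 1; p <= 4096; p *= 4) {
            double m  = m0 * (double)p * p;  /* chosen so h_p*p/m stays constant */
            double t1 = work_per_item * m;
            double tp = t1 / p + h_bas + h(p);
            printf("p = %4d  h_p*p/m = %.1e  efficiency = %.3f\n",
                   p, h(p) * p / m, t1 / (p * tp));
        }
        return 0;   /* efficiency approaches a constant (~0.83 here) */
    }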