Parallel Architecture Models (slide transcript)

1
Parallel Architecture Models
  • Shared Memory
  • Dual/Quad Pentium, Cray T90, IBM Power3 Node
  • Distributed Memory
  • Cray T3E, IBM SP2, Network of Workstations
  • Distributed-Shared Memory
  • SGI Origin 2000, Convex Exemplar

2
Shared Memory Systems (SMP)
  • Any processor can access any memory location at
    equal cost (Symmetric Multi-Processor)
  • Tasks communicate by writing/reading common
    locations (see the sketch after this slide)
  • Easier to program
  • Cannot scale beyond around 30 PEs (bus bottleneck)
  • Most workstation vendors make SMPs today
    (SGI, Sun, HP, Digital, Pentium)
  • Cray Y-MP, C90, T90 (cross-bar between PEs and memory)
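
As a concrete illustration of tasks communicating through common
memory, here is a minimal OpenMP sketch (not from the original
slides; the thread count, variable name, and value are illustrative):

program smp_demo
  use omp_lib
  implicit none
  real :: shared_val
  shared_val = 0.0
  !$omp parallel num_threads(2) shared(shared_val)
  ! thread 0 writes the common location
  if (omp_get_thread_num() == 0) shared_val = 42.0
  ! the barrier orders the accesses and makes the write visible
  !$omp barrier
  if (omp_get_thread_num() == 1) print *, 'thread 1 read ', shared_val
  !$omp end parallel
end program smp_demo
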
3
Cache Coherence in SMPs
  • Each proc's cache holds the most recently accessed values
  • If a multiply-cached word is modified, all copies
    must be made consistent
  • Bus-based SMPs use an efficient mechanism: the snoopy bus
  • The snoopy bus monitors all writes and marks other
    copies invalid
  • When a proc finds an invalid cache word, it fetches
    a fresh copy from shared memory
4
Distributed Memory Systems
[Diagram: nodes, each with a Processor (P), Cache (C), Memory (M),
and Network Interface Card (NIC), connected by an Interconnection Network]
  • Each processor can only access its own memory
  • Explicit communication by sending and receiving messages
  • More tedious to program
  • Can scale to hundreds/thousands of processors
  • Cache coherence is not needed
  • Examples: IBM SP-2, Cray T3E, Workstation Clusters
5
Distributed Shared Memory
  • Each processor can directly access any memory location
  • Physically distributed memory allows many
    simultaneous accesses
  • Non-uniform memory access costs
  • Examples: Convex Exemplar, SGI Origin 2000
  • Complex hardware and high cost for cache coherence
  • Software DSM systems (e.g. TreadMarks) implement a
    shared-memory abstraction on top of distributed-memory
    systems
6
Parallel Programming Models
  • Shared-Address Space Models
  • BSP (Bulk Synchronous Parallel model)
  • HPF (High Performance Fortran)
  • OpenMP
  • Message Passing
  • Partitioned address space: PVM, MPI (Ch. 8 of
    I. Foster's book Designing and Building Parallel
    Programs, available online)
  • Higher Level Programming Environments
  • PETSc: Portable, Extensible Toolkit for Scientific
    Computation
  • POOMA: Parallel Object-Oriented Methods and
    Applications

7
OpenMP
  • Standard sequential Fortran/C model
  • Single global view of data
  • Automatic parallelization by compiler
  • User can provide loop-level directives
  • Easy to program
  • Only available on Shared-Memory Machines

8
High Performance Fortran
  • Global shared address space, similar to
    sequential programming model
  • User provides data mapping directives
  • User can provide information on loop-level
    parallelism
  • Portable: available on all three types of
    architectures
  • Compiler automatically synthesizes
    message-passing code if needed
  • Restricted to dense arrays and regular
    distributions
  • Performance is not consistently good

9
Message Passing
  • Program is a collection of tasks
  • Each task can only read/write its own data
  • Tasks communicate data by explicitly
    sending/receiving messages (see the MPI sketch
    after this slide)
  • Need to translate from the global shared view to a
    local partitioned view when porting a sequential
    program
  • Tedious to program/debug
  • Very good performance
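
As a concrete illustration (not part of the original slides), a
minimal MPI program in which task 0 explicitly sends a value that
task 1 explicitly receives; the value and message tag are arbitrary:

program msg_demo
  use mpi
  implicit none
  integer :: rank, ierr, status(MPI_STATUS_SIZE)
  real(8) :: x
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  if (rank == 0) then
    x = 3.14d0
    ! explicit send of one double to task 1
    call MPI_Send(x, 1, MPI_DOUBLE_PRECISION, 1, 0, MPI_COMM_WORLD, ierr)
  else if (rank == 1) then
    ! explicit receive from task 0
    call MPI_Recv(x, 1, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, status, ierr)
    print *, 'task 1 received ', x
  end if
  call MPI_Finalize(ierr)
end program msg_demo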

10
Illustrative Example
Real a(n,n), b(n,n)
Do k = 1, NumIter
  Do i = 2, n-1
    Do j = 2, n-1
      a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1)) / 4
    End Do
  End Do
  Do i = 2, n-1
    Do j = 2, n-1
      b(i,j) = a(i,j)
    End Do
  End Do
End Do
[Figure: the arrays a(20,20) and b(20,20)]
11
Example OpenMP
Real a(n,n), b(n,n)
c$omp parallel shared(a,b) private(i,j,k)
Do k = 1, NumIter
c$omp do
  Do i = 2, n-1
    Do j = 2, n-1
      a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1)) / 4
    End Do
  End Do
c$omp do
  Do i = 2, n-1
    Do j = 2, n-1
      b(i,j) = a(i,j)
    End Do
  End Do
End Do
c$omp end parallel
[Figure: global shared view of data, arrays a(20,20) and b(20,20)]
12
Example HPF (1D partition)
Real a(n,n), b(n,n)
chpf$ Distribute a(block,*), b(block,*)
Do k = 1, NumIter
chpf$ independent, new(i)
  Do i = 2, n-1
    Do j = 2, n-1
      a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1)) / 4
    End Do
  End Do
chpf$ independent, new(i)
  Do i = 2, n-1
    Do j = 2, n-1
      b(i,j) = a(i,j)
    End Do
  End Do
End Do
[Figure: global shared view of data, a(20,20) and b(20,20)
distributed by row blocks across processors P0-P3]
13
Example HPF (2D partition)
Real a(n,n), b(n,n)
chpf$ Distribute a(block,block)
chpf$ Distribute b(block,block)
Do k = 1, NumIter
chpf$ independent, new(i)
  Do i = 2, n-1
    Do j = 2, n-1
      a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1)) / 4
    End Do
  End Do
chpf$ independent, new(i)
  Do i = 2, n-1
    Do j = 2, n-1
      b(i,j) = a(i,j)
    End Do
  End Do
End Do
[Figure: global shared view of data, a(20,20) and b(20,20)
distributed in 2D blocks across processors]
14
Message Passing Local View
[Figure: global shared view vs. local partitioned view, with local
blocks bl(5,20); rows at the block boundaries require communication]
15
Example Message Passing
Real al(NdivP,n), bl(0:NdivP+1,n)
me = get_my_procnum()
Do k = 1, NumIter
  if (me /= P-1) send(me+1, bl(NdivP,1:n))
  if (me /= 0)   recv(me-1, bl(0,1:n))
  if (me /= 0)   send(me-1, bl(1,1:n))
  if (me /= P-1) recv(me+1, bl(NdivP+1,1:n))
  if (me == 0)   then i1 = 2 else i1 = 1
  if (me == P-1) then i2 = NdivP-1 else i2 = NdivP
  Do i = i1, i2
    Do j = 2, n-1
      al(i,j) = (bl(i-1,j) + bl(i,j-1) + bl(i+1,j) + bl(i,j+1)) / 4
    End Do
  End Do
...
[Figure: local partitioned view with ghost cells, al(5,20);
ghost cells are communicated by message-passing]
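
For comparison, a hedged sketch of the same ghost-cell exchange
written with explicit MPI calls (assuming a row-block partition with
NdivP = n/P rows per rank and n divisible by P; MPI_PROC_NULL makes
the edge ranks skip the missing neighbor):

program halo_demo
  use mpi
  implicit none
  integer, parameter :: n = 20
  integer :: me, P, NdivP, up, down, ierr
  real(8), allocatable :: al(:,:), bl(:,:)
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, me, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, P, ierr)
  NdivP = n / P
  allocate(al(NdivP,n), bl(0:NdivP+1,n))
  bl = 0.0d0
  up   = me - 1                          ! rank holding the rows above mine
  down = me + 1                          ! rank holding the rows below mine
  if (me == 0)   up   = MPI_PROC_NULL
  if (me == P-1) down = MPI_PROC_NULL
  ! send my last owned row down, receive my upper ghost row from above
  call MPI_Sendrecv(bl(NdivP,:), n, MPI_DOUBLE_PRECISION, down, 0, &
                    bl(0,:),     n, MPI_DOUBLE_PRECISION, up,   0, &
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
  ! send my first owned row up, receive my lower ghost row from below
  call MPI_Sendrecv(bl(1,:),       n, MPI_DOUBLE_PRECISION, up,   1, &
                    bl(NdivP+1,:), n, MPI_DOUBLE_PRECISION, down, 1, &
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
  ! ... local relaxation on al/bl as on the slide ...
  call MPI_Finalize(ierr)
end program halo_demo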
16
Comparison of Models
  • Program Porting/Development Effort
  • OpenMP, HPF << MPI
  • Portability across systems
  • HPF, MPI >> OpenMP (only shared-memory)
  • Applicability
  • MPI, OpenMP >> HPF (only dense arrays)
  • Performance
  • MPI > OpenMP >> HPF

17
PETSc
  • Higher level parallel programming model
  • Aims to provide both ease of use and high
    performance for numerical PDE solution
  • Uses an efficient message-passing implementation
    underneath, but:
  • Provides global view of data arrays
  • System takes care of needed message-passing
  • Portable across shared- and distributed-memory
    systems