Title: CSL718: Multiprocessors
1. CSL718 Multiprocessors
- Introduction
- 13th April, 2006
2. Parallel Architectures
Flynn's Classification (1966)
Architecture Categories
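3. MIMD
[Figure: MIMD organization. Two units are shown, each with labels C, P and M, two instruction streams (IS) and one data stream (DS).]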
4. Parallel Architectures
Sima's Classification
Parallel architectures (PAs)
5. Function Parallel Architectures
Function-parallel architectures
Built using general purpose processors
6. Issues from the user's perspective
- Specification / Program design
- explicit parallelism or
- implicit parallelism (parallelizing compiler)
- Partitioning / mapping to processors
- Scheduling / mapping to time instants
- static or dynamic
- Communication and Synchronization
7. Parallelizing example
- m = 0;
- for (i = 0; i < n; i++) {
-     m = m + 3;
-     a[i] = (a[m] + a[m+1] + a[m+2]) / 3;
- }
- Can all iterations be done in parallel?
- Dependence 1: m = m + 3
- Dependence 2:
- a[1] = (a[3] + a[4] + a[5]) / 3
- a[4] = (a[12] + a[13] + a[14]) / 3
8. Parallelizing example - contd.
- Eliminate dependence based on induction variable
- for (i = 0; i < n; i++) {
-     m = i * 3;
-     a[i] = (a[m] + a[m+1] + a[m+2]) / 3;
- }
9. Parallelizing example - contd.
- Eliminate forward dependency using double buffer
- for (i = 0; i < n; i++) {
-     m = i * 3;
-     aa[i] = (a[m] + a[m+1] + a[m+2]) / 3;
- }
- barrier();
- for (i = 0; i < n; i++)
-     a[i] = aa[i];
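The double-buffer transformation can be checked with a small C sketch. It is written sequentially, with a comment marking where barrier() would sit in a parallel version. The function names smooth_sequential and smooth_double_buffer are hypothetical, and the sketch assumes a[] holds at least 3*n elements and aa[] at least n.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop with m = 3*i, executed in order i = 0, 1, 2, ...
   Assumes a[] holds at least 3*n elements. */
void smooth_sequential(int *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t m = 3 * i;
        a[i] = (a[m] + a[m + 1] + a[m + 2]) / 3;
    }
}

/* Double-buffered version. Within each loop, no iteration reads a
   location that another iteration of the same loop writes, so each
   loop could be split across processors. */
void smooth_double_buffer(int *a, int *aa, size_t n)
{
    assert(a != NULL && aa != NULL);
    for (size_t i = 0; i < n; i++) {   /* phase 1: read a[], write aa[] */
        size_t m = 3 * i;
        aa[i] = (a[m] + a[m + 1] + a[m + 2]) / 3;
    }
    /* barrier() would go here: all reads of a[] finish before any write */
    for (size_t i = 0; i < n; i++) {   /* phase 2: copy results back */
        a[i] = aa[i];
    }
}
```

Both functions leave the same values in a[], because the sequential loop only ever reads elements that have not yet been overwritten.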
10. Parallelizing example - contd.
- Parallelization using dynamic thread creation and scheduling
- schedule(0);
- for (i = 0; i < n; i++) {
-     wait_till_scheduled(i);
-     m = i * 3;
-     a[i] = (a[m] + a[m+1] + a[m+2]) / 3;
-     if (i != 0) schedule(3*i);
-     schedule(3*i + 1);
-     schedule(3*i + 2);
- }
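The scheduling idea can be illustrated with a single-threaded C simulation. This is only a sketch under stated assumptions: the function name smooth_dynamic and the explicit stack of runnable iterations are hypothetical stand-ins for schedule() and wait_till_scheduled(), a[] is assumed to hold at least 3*n elements, and stack[] at least n entries.

```c
#include <assert.h>
#include <stddef.h>

/* Iteration j may run only after iteration j/3 has read a[j].
   Runnable iterations are kept on a stack, so they execute in an
   order that differs from 0, 1, 2, ... */
void smooth_dynamic(int *a, size_t n, size_t *stack)
{
    assert(a != NULL && stack != NULL);
    size_t top = 0;
    if (n > 0) {
        stack[top++] = 0;                  /* schedule(0) */
    }
    while (top > 0) {
        size_t i = stack[--top];           /* pick a scheduled iteration */
        size_t m = 3 * i;
        a[i] = (a[m] + a[m + 1] + a[m + 2]) / 3;
        /* a[m..m+2] have now been read, so their writers may run */
        if (i != 0 && m < n) stack[top++] = m;
        if (m + 1 < n)       stack[top++] = m + 1;
        if (m + 2 < n)       stack[top++] = m + 2;
    }
}
```

Every iteration j >= 1 is scheduled exactly once, by iteration j/3, so the stack never holds more than n entries. The bounds checks against n are an addition of this sketch; the slide code does not show them.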
11. Grain size and performance
[Figure: speed-up versus grain size. Fine grain: overhead limited. Coarse grain: load imbalance and parallelism limited. Speed-up peaks at the optimal grain size.]
12. Speed-up and efficiency
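As a small illustration, assuming the usual definitions (speed-up is the time on one processor divided by the time on p processors, and efficiency is speed-up divided by p), the two quantities can be computed as below. The function names are hypothetical.

```c
#include <assert.h>

/* Speed-up: time on one processor divided by time on p processors. */
double speedup(double t1, double tp)
{
    assert(tp > 0.0);
    return t1 / tp;
}

/* Efficiency: speed-up achieved per processor. */
double efficiency(double t1, double tp, int p)
{
    assert(p > 0);
    return speedup(t1, tp) / p;
}
```

For example, a program that takes 100 s on one processor and 25 s on 8 processors has a speed-up of 4 and an efficiency of 0.5.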
13. Amdahl's Law
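A minimal sketch of Amdahl's law in its common form, speed-up = 1 / (f + (1 - f)/p), where f is the fraction of execution time that cannot be parallelized and p is the number of processors. The function name and parameter names are assumptions made for illustration.

```c
#include <assert.h>

/* Amdahl's law: upper bound on speed-up with p processors when a
   fraction serial_fraction of the work must run sequentially. */
double amdahl_speedup(double serial_fraction, int p)
{
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0);
    assert(p > 0);
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p);
}
```

With a serial fraction of 0.1 the speed-up stays below 10 no matter how many processors are added, which is the point the law makes.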
14. Generalization
15. Shared Memory Architecture
16. Design Space
- Design Space of Shared Memory Architectures
- Extent of address space sharing
- Location of memory modules
- Uniformity of memory access
17. Address Space
Each processor sees an exclusive address space
Each processor sees a partly exclusive and partly shared address space
Each processor sees the same shared address space
18. Location of Memory
19. Clustered Architecture
[Figure: clustered architecture. Groups of processors (P) and memory modules (M) are each joined by an interconnection network; a global interconnection network links the groups, with further memory modules (M) attached.]
20. Uniformity of Access
- UMA (Uniform Memory Access)
- Uniformity across memory address space
- Uniformity across processors
- NUMA (Non-Uniform Memory Access)
- CC-NUMA (Cache Coherent NUMA)
- COMA (Cache Only Memory Architecture)
- UMA
- Symmetrical Shared Memory Multiprocessor (SMP)
- NUMA
- Distributed Shared Memory Multiprocessor
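21. Location and Sharing
[Figure: design-space grid. SHARING axis: full, partial, none. LOCATION axis: centralized, mixed, distributed. UMA is marked alongside centralized, NUMA alongside distributed.]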
22. Shared Memory with Caches
- Multiple copies of data may exist
- => Problem of cache coherence
- Cache coherence protocols
- What action is taken?
- Which processors/caches communicate?
- Status of each block?
23. What action is taken?
- Invalidate other caches and/or memory
- send a signal/message immediately, copy information only when unavoidable (similar to write-back policy)
- Update other caches and/or memory
- write simultaneously at all places, i.e. send modifications immediately (similar to write-through policy)
24. Which processors/caches communicate?
- Snoopy protocol
- broadcast invalidate or update messages
- all processors snoop on the bus
- Directory based protocol
- maintain directory - list of copies
- communicate selectively
- directory - centralized (memory) or distributed (caches)
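The difference between broadcasting and communicating selectively can be sketched in C. Everything here is an illustrative assumption: the machine size NUM_PROCS, the dir_entry layout (one presence bit per cache) and the function names are hypothetical, not taken from any particular machine.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PROCS 8   /* hypothetical machine size */

/* Directory entry for one memory block: bit i set means cache i
   currently holds a copy (the "list of copies"). */
typedef struct {
    uint32_t sharers;
} dir_entry;

/* Snoopy scheme: an invalidate is broadcast on the bus, so every
   other cache has to snoop it whether or not it holds a copy. */
int snoopy_caches_disturbed(void)
{
    return NUM_PROCS - 1;
}

/* Directory scheme: invalidations go only to caches recorded as
   sharers. Returns the number of messages sent. */
int directory_invalidate(dir_entry *e, int writer)
{
    assert(e != 0);
    assert(writer >= 0 && writer < NUM_PROCS);
    int messages = 0;
    for (int i = 0; i < NUM_PROCS; i++) {
        if (i != writer && (e->sharers & (UINT32_C(1) << i))) {
            messages++;                     /* invalidate sent to cache i */
        }
    }
    e->sharers = UINT32_C(1) << writer;     /* writer is now the only holder */
    return messages;
}
```

If caches 1, 4 and 6 hold a block and processor 4 writes it, the directory sends 2 invalidations, whereas a broadcast is seen by all 7 other caches.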
25. Status of each cache block?
- valid/invalid, private/shared, clean/dirty
- Simplest protocol (3 states)
- Invalid, (shared) clean, private dirty
- Berkeley protocol (4 states)
- Invalid, (shared) clean, private dirty, shared dirty
- Illinois, Firefly protocols (4 states)
- Invalid, shared clean, private clean, private dirty
- Dragon protocols (5 states)
- Invalid, shared clean/dirty, private clean/dirty
26. Simplest invalidation protocol
- Use 3 states: invalid, shared clean, private dirty
[Figure: state diagram with three states: invalid, clean (shared), dirty.]
27. Simplest invalidation protocol
[Figure: state diagram (invalid, clean (shared), dirty) with transitions on CPU events, labelled RD miss, WR, RD miss, WR miss. Legend: CPU event, BUS event.]
28. Simplest invalidation protocol
[Figure: state diagram (invalid, clean (shared), dirty) with transitions on BUS events, labelled WR miss/INV, RD miss, WR miss/INV. Legend: CPU event, BUS event.]
29. Simplest invalidation protocol
[Figure: complete state diagram (invalid, clean (shared), dirty). CPU-event transitions: RD miss, WR, RD miss, WR miss. BUS-event transitions: WR miss/INV, RD miss, WR miss/INV. Legend: CPU event, BUS event.]
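A three-state invalidation protocol of this kind can be sketched as a pair of transition functions in C. The arrows of the slide's diagram are not reproduced here, so the transitions below follow the common textbook write-back invalidate protocol, which is an assumption. All type and function names are hypothetical, and the sketch tracks only the state of one block, not its data.

```c
#include <assert.h>

typedef enum { INVALID, SHARED_CLEAN, PRIVATE_DIRTY } block_state;
typedef enum { CPU_READ, CPU_WRITE } cpu_op;
typedef enum { BUS_RD_MISS, BUS_WR_MISS_INV } bus_event;

/* State change in the cache whose own processor issues the access. */
block_state on_cpu(block_state s, cpu_op op)
{
    assert(s == INVALID || s == SHARED_CLEAN || s == PRIVATE_DIRTY);
    if (op == CPU_READ) {
        /* RD miss from invalid fetches a clean copy; hits keep the state */
        return (s == INVALID) ? SHARED_CLEAN : s;
    }
    /* A write (WR miss from invalid, WR on a clean copy) puts
       WR miss / INV on the bus and leaves this cache as sole owner */
    return PRIVATE_DIRTY;
}

/* State change in a cache that observes another cache's bus event. */
block_state on_bus(block_state s, bus_event ev)
{
    switch (ev) {
    case BUS_RD_MISS:
        /* a dirty owner writes the block back and keeps a clean copy */
        return (s == PRIVATE_DIRTY) ? SHARED_CLEAN : s;
    case BUS_WR_MISS_INV:
        /* another cache is writing: drop our copy (after write-back if dirty) */
        return INVALID;
    }
    return s;
}
```

A typical sequence with two caches: both read the block (both end up shared clean), one writes it (writer becomes private dirty, the other invalid), then the other reads it again (both return to shared clean).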