CSL718: Multiprocessors

Slides: 30
Provided by: anshul8
1
CSL718 Multiprocessors
  • Introduction
  • 13th April, 2006

2
Parallel Architectures
Flynn's Classification (1966)
Architecture Categories
3
MIMD
[Figure: MIMD block diagram. Two units, each drawn with blocks C and P and streams IS and DS, attached to M]
4
Parallel Architectures
Sima's Classification
Parallel architectures (PAs)
5
Function Parallel Architectures
Function-parallel architectures
Built using general purpose processors
6
Issues from the user's perspective
  • Specification / Program design
      • explicit parallelism or
      • implicit parallelism (parallelizing compiler)
  • Partitioning / mapping to processors
  • Scheduling / mapping to time instants
      • static or dynamic
  • Communication and Synchronization

7
Parallelizing example
  • m = 0;
  • for (i = 0; i < n; i++) {
  •     m = m + 3;
  •     a[i] = (a[m] + a[m+1] + a[m+2]) / 3;
  • }
  • Can all iterations be done in parallel?
  • Dependence 1: m = m + 3
  • Dependence 2:
      • a[1] = (a[3] + a[4] + a[5]) / 3
      • a[4] = (a[12] + a[13] + a[14]) / 3
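
The second dependence above (iteration 1 reads a[4], iteration 4 later overwrites it) can be made concrete with a small sketch. It assumes m = 3*i, as in the two expansions listed on the slide, and uses a hypothetical helper `average_in_place` with a `reverse` flag that only changes the order in which iterations run. If the iterations were truly independent, both orders would give the same array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the slides: runs the loop body with
   m = 3*i, visiting iterations forward (reverse == 0) or backward
   (reverse != 0). `a` must hold at least 3*n elements. */
void average_in_place(double *a, size_t n, int reverse) {
    assert(a != NULL);
    for (size_t k = 0; k < n; k++) {
        size_t i = reverse ? n - 1 - k : k;   /* iteration to execute */
        size_t m = 3 * i;                     /* no running counter   */
        a[i] = (a[m] + a[m + 1] + a[m + 2]) / 3;
    }
}
```

With n = 2 and the array {3, 0, 0, 9, 0, 0}, the forward order leaves a[0] = 1, while the backward order writes a[1] first and then computes a[0] = 2. The result depends on the execution order, so the iterations cannot simply be run in parallel.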

8
Parallelizing example - contd.
  • Eliminate dependence based on induction variable
  • for (i = 0; i < n; i++) {
  •     m = i * 3;
  •     a[i] = (a[m] + a[m+1] + a[m+2]) / 3;
  • }

9
Parallelizing example - contd.
  • Eliminate forward dependency using double buffer
  • for (i = 0; i < n; i++) {
  •     m = i * 3;
  •     aa[i] = (a[m] + a[m+1] + a[m+2]) / 3;
  • }
  • barrier();
  • for (i = 0; i < n; i++)
  •     a[i] = aa[i];
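
A minimal sequential sketch of the double-buffer idea, assuming a hypothetical helper `average_double_buffered` whose `reverse` flag stands in for an arbitrary parallel execution order. Because the first loop only reads `a` and only writes `aa`, any iteration order produces the same result; the comment marks where the slide's `barrier()` would sit in a real parallel program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the slides. `a` must hold at least 3*n
   elements and `aa` at least n elements. */
void average_double_buffered(double *a, double *aa, size_t n, int reverse) {
    assert(a != NULL && aa != NULL);
    for (size_t k = 0; k < n; k++) {
        size_t i = reverse ? n - 1 - k : k;   /* any order is safe here */
        size_t m = 3 * i;
        aa[i] = (a[m] + a[m + 1] + a[m + 2]) / 3;
    }
    /* barrier() in the slide: all averages are computed before copying */
    for (size_t i = 0; i < n; i++)
        a[i] = aa[i];
}
```

The price of removing the dependence is an extra buffer of n elements and a second pass over the data.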

10
Parallelizing example - contd.
  • Parallelization using dynamic thread creation and scheduling
  • schedule(0);
  • for (i = 0; i < n; i++) {
  •     wait_till_scheduled(i);
  •     m = i * 3;
  •     a[i] = (a[m] + a[m+1] + a[m+2]) / 3;
  •     if (i != 0) schedule(3*i);
  •     schedule(3*i + 1);
  •     schedule(3*i + 2);
  • }
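
The scheduling rule above can be simulated sequentially. This is only a sketch under stated assumptions: a LIFO worklist stands in for threads, `average_scheduled` and `MAX_N` are hypothetical names, and the check `j < n` is added here because the slide does not show how out-of-range iterations are handled. Iteration j becomes ready only after iteration j/3 (integer division) has read a[j], so no value is overwritten before it is used.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 64   /* assumed upper bound for this sketch */

/* Hypothetical helper, not from the slides. `a` must hold at least 3*n
   elements. Iterations run in the order they are popped from the
   worklist, which is generally not 0, 1, 2, ... */
void average_scheduled(double *a, size_t n) {
    size_t ready[MAX_N];
    size_t top = 0;
    assert(a != NULL && n <= MAX_N);
    if (n == 0)
        return;
    ready[top++] = 0;                        /* schedule(0) */
    while (top > 0) {
        size_t i = ready[--top];             /* an already scheduled iteration */
        size_t m = 3 * i;
        a[i] = (a[m] + a[m + 1] + a[m + 2]) / 3;
        for (size_t j = m; j <= m + 2; j++)  /* schedule(3i), (3i+1), (3i+2) */
            if (j != 0 && j < n)             /* iteration 0 is scheduled once */
                ready[top++] = j;
    }
}
```

Each iteration is pushed exactly once (by its unique predecessor j/3), so the worklist never holds more than n entries.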

11
Grain size and performance
[Figure: speedup versus grain size. Fine grain is overhead limited; coarse grain is load imbalance and parallelism limited; speedup peaks at the optimal grain size]
12
Speedup and efficiency
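
The formulas on this slide did not survive in the transcript. As a hedged reminder, the usual textbook definitions are given below; the symbols T(1) (execution time on one processor), T(p) (execution time on p processors), S and E are assumptions and may differ from the notation used in the lecture.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Ideal (linear) speedup corresponds to S(p) = p, that is, E(p) = 1.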
13
Amdahl's Law
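
The slide body is missing from the transcript, so the following is a sketch of the standard statement of Amdahl's law rather than the lecture's own formulation. The names `amdahl_speedup`, `efficiency` and `serial_fraction` are hypothetical: if a fraction f of the work is inherently serial, the speedup on p processors is 1 / (f + (1 - f) / p).

```c
#include <assert.h>

/* Speedup predicted by Amdahl's law for a program whose serial
   fraction is `serial_fraction` (0..1) on `processors` processors. */
double amdahl_speedup(double serial_fraction, int processors) {
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0);
    assert(processors >= 1);
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors);
}

/* Efficiency: speedup divided by the number of processors used. */
double efficiency(double speedup, int processors) {
    assert(processors >= 1);
    return speedup / processors;
}
```

For example, with a serial fraction of 0.1 the speedup stays below 10 no matter how many processors are added.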
14
Generalization
15
Shared Memory Architecture
16
Design Space
  • Design Space of Shared Memory Architectures
  • Extent of address space sharing
  • Location of memory modules
  • Uniformity of memory access

17
Address Space
  • Each processor sees an exclusive address space
  • Each processor sees partly exclusive and partly shared address space
  • Each processor sees same shared address space
18
Location of Memory
19
Clustered Architecture
[Figure: clustered architecture. Groups of processors (P) and memory modules (M), each group joined by an Interconnection Network; a Global Interconnection Network connects the groups and further memory modules]
20
Uniformity of Access
  • UMA (Uniform Memory Access)
      • Uniformity across memory address space
      • Uniformity across processors
  • NUMA (Non-Uniform Memory Access)
  • CC-NUMA (Cache Coherent NUMA)
  • COMA (Cache Only Memory Architecture)
  • UMA: Symmetrical Shared Memory Multiprocessor (SMP)
  • NUMA: Distributed Shared Memory Multiprocessor

21
Location and Sharing
[Figure: design-space chart with axes SHARING (full, partial, none) and LOCATION (centralized, mixed, distributed), with regions labeled UMA and NUMA]
22
Shared Memory with Caches
  • Multiple copies of data may exist
  • ⇒ Problem of cache coherence
  • Cache coherence protocols
      • What action is taken?
      • Which processors/caches communicate?
      • Status of each block?

23
What action is taken?
  • Invalidate other caches and/or memory
      • send a signal/message immediately, copy information only when unavoidable
      • similar to write-back policy
  • Update other caches and/or memory
      • write simultaneously at all places (send modifications immediately)
      • similar to write-through policy
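
The two actions can be contrasted on a toy model of a single memory block. This is a sketch under simplifying assumptions, not a real protocol: `Block`, `write_invalidate`, `write_update` and `NCACHES` are hypothetical names, and write-back of the dirty copy is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NCACHES 4   /* assumed number of caches in this toy model */

typedef struct {
    bool valid[NCACHES];  /* does cache i hold a copy?        */
    int  value[NCACHES];  /* the copy held by cache i         */
    int  memory;          /* the value stored in main memory  */
} Block;

/* Invalidate: only a signal goes out; data is copied later, on demand. */
void write_invalidate(Block *b, int writer, int v) {
    assert(writer >= 0 && writer < NCACHES);
    for (int i = 0; i < NCACHES; i++)
        if (i != writer)
            b->valid[i] = false;
    b->valid[writer] = true;
    b->value[writer] = v;          /* memory stays stale for now */
}

/* Update: the new value is written at all places immediately. */
void write_update(Block *b, int writer, int v) {
    assert(writer >= 0 && writer < NCACHES);
    b->valid[writer] = true;
    for (int i = 0; i < NCACHES; i++)
        if (b->valid[i])
            b->value[i] = v;
    b->memory = v;
}
```

After `write_invalidate` only the writer holds a valid copy and memory is out of date, which mirrors write-back; after `write_update` every existing copy and memory agree, which mirrors write-through.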

24
Which procs/caches communicate?
  • Snoopy protocol
      • broadcast invalidate or update messages
      • all processors snoop on the bus
  • Directory based protocol
      • maintain directory - list of copies
      • communicate selectively
      • directory - centralized (memory) or distributed (caches)
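
A directory's "list of copies" is often kept as a bit vector with one bit per cache. The sketch below assumes that representation; `DirEntry`, `dir_add_sharer`, `dir_write` and `MAX_CACHES` are hypothetical names, and message delivery is reduced to a counter. It shows the selective communication: a write sends invalidations only to caches recorded as sharers, where a snoopy protocol would broadcast to all.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CACHES 32   /* one bit per cache in a 32-bit vector */

typedef struct {
    uint32_t sharers;   /* bit i set: cache i holds a copy      */
    int      dirty;     /* 1 if a single cache owns a dirty copy */
} DirEntry;

/* Record that `cache` has fetched a copy of the block. */
void dir_add_sharer(DirEntry *e, int cache) {
    assert(cache >= 0 && cache < MAX_CACHES);
    e->sharers |= (uint32_t)1 << cache;
}

/* Handle a write by `writer`; returns the number of invalidation
   messages sent (one per other sharer, none to non-sharers). */
int dir_write(DirEntry *e, int writer) {
    assert(writer >= 0 && writer < MAX_CACHES);
    uint32_t targets = e->sharers & ~((uint32_t)1 << writer);
    int messages = 0;
    for (int i = 0; i < MAX_CACHES; i++)
        if (targets & ((uint32_t)1 << i))
            messages++;
    e->sharers = (uint32_t)1 << writer;   /* writer is now the only holder */
    e->dirty = 1;
    return messages;
}
```

With copies in caches 0, 2 and 5, a write by cache 2 sends two messages instead of a broadcast to every cache.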

25
Status of each cache block?
  • valid/invalid, private/shared, clean/dirty
  • Simplest protocol (3 states)
      • Invalid, (shared) clean, private dirty
  • Berkeley protocol (4 states)
      • Invalid, (shared) clean, private dirty, shared dirty
  • Illinois, Firefly protocols (4 states)
      • Invalid, shared clean, private clean, private dirty
  • Dragon protocols (5 states)
      • Invalid, shared clean/dirty, private clean/dirty

26
Simplest invalidation protocol
  • Use 3 states: Invalid, shared clean, private dirty

[Figure: state diagram with three states: invalid, clean (shared?), dirty]
27
Simplest invalidation protocol
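  • Use 3 states: Invalid, shared clean, private dirty

[Figure: state diagram with states invalid, clean (shared?), dirty, showing the CPU-event transitions. Edge labels: RD miss, WR, RD miss, WR miss. Legend: CPU event, BUS event]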
28
Simplest invalidation protocol
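  • Use 3 states: Invalid, shared clean, private dirty

[Figure: state diagram with states invalid, clean (shared?), dirty, showing the BUS-event transitions. Edge labels: WR miss, INV; RD miss; WR miss, INV. Legend: CPU event, BUS event]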
29
Simplest invalidation protocol
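  • Use 3 states: Invalid, shared clean, private dirty

[Figure: complete state diagram with states invalid, clean (shared?), dirty. CPU-event edge labels: RD miss, WR, RD miss, WR miss. BUS-event edge labels: WR miss, INV; RD miss; WR miss, INV. Legend: CPU event, BUS event]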
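
The diagram's arrows are lost in the transcript, so the sketch below is an assumption: it follows the common textbook form of a three-state write-back invalidation protocol, using the event labels that do survive (RD miss, WR, WR miss, INV). All type and function names are hypothetical, and replacement misses and the actual data transfer are left out.

```c
#include <assert.h>

typedef enum { INVALID, CLEAN_SHARED, DIRTY } BlockState;
typedef enum { CPU_READ, CPU_WRITE } CpuEvent;
typedef enum { BUS_RD_MISS, BUS_WR_MISS, BUS_INV } BusEvent;

/* Transition caused by the local processor's own access. */
BlockState on_cpu_event(BlockState s, CpuEvent e) {
    switch (e) {
    case CPU_READ:  return s == INVALID ? CLEAN_SHARED : s; /* RD miss fills */
    case CPU_WRITE: return DIRTY;            /* WR or WR miss makes it dirty */
    }
    return s;
}

/* Transition caused by another cache's request seen on the bus. */
BlockState on_bus_event(BlockState s, BusEvent e) {
    switch (e) {
    case BUS_RD_MISS: return s == DIRTY ? CLEAN_SHARED : s; /* supply data  */
    case BUS_WR_MISS:
    case BUS_INV:     return INVALID;                       /* give up copy */
    }
    return s;
}

/* Cache `r` of `n` reads the block; the others snoop its RD miss. */
void cpu_read(BlockState st[], int n, int r) {
    assert(r >= 0 && r < n);
    if (st[r] == INVALID)
        for (int i = 0; i < n; i++)
            if (i != r) st[i] = on_bus_event(st[i], BUS_RD_MISS);
    st[r] = on_cpu_event(st[r], CPU_READ);
}

/* Cache `w` of `n` writes the block; the others snoop WR miss or INV. */
void cpu_write(BlockState st[], int n, int w) {
    assert(w >= 0 && w < n);
    if (st[w] != DIRTY) {
        BusEvent e = (st[w] == INVALID) ? BUS_WR_MISS : BUS_INV;
        for (int i = 0; i < n; i++)
            if (i != w) st[i] = on_bus_event(st[i], e);
    }
    st[w] = on_cpu_event(st[w], CPU_WRITE);
}
```

Under these assumed transitions at most one cache is ever in the dirty state, and a dirty copy never coexists with clean copies elsewhere.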