Title: Chapter 6: Multiprocessors Part I
1. Chapter 6: Multiprocessors Part I
- Introduction (Section 6.1)
- What is a parallel or multiprocessor system?
- Why parallel architecture?
- Performance potential
- Flynn classification
- Communication models (Section 6.1)
- Architectures (Section 6.1)
- Centralized shared-memory (Section 6.3)
- Distributed shared-memory (Section 6.5)
- More in Part II
2. What is a parallel or multiprocessor system?
- Multiple processor units working together to solve the same problem
- Key architectural issue: the communication model
3. Why parallel architectures?
- Absolute performance
  - Scientific computing
  - General-purpose computing
- Technology and architecture trends in high-performance computing
  - # of transistors on chip growing rapidly
  - Clock rates expected to go up, but slowly
  - Instruction-level parallelism valuable but limited
  - Complex architectures
  - ⇒ Coarser-level parallelism, as in MPs
- Trend seen in products from AMD, Compaq, HP, IBM, Intel, SGI, SUN, ...
4. Why parallel architectures? (Cont.)
- Cost-performance
  - µPs have made massive gains in performance: clock rates, ILP, caches
  - These commodity µPs are cheap; many more are sold than supercomputers
  - ⇒ Multiprocessors built from µPs are replacing traditional supercomputers
5. Performance Potential
- Amdahl's Law is pessimistic
  - Let s be the serial fraction
  - Let p be the fraction that can be parallelized n ways (s + p = 1)
  - Serial: SSPPPPPP (8 time units)
  - 6 processors (the six P's run concurrently, 3 time units):
      P1: S S P
      P2:     P
      P3:     P
      P4:     P
      P5:     P
      P6:     P
  - Speedup = 8/3 ≈ 2.67
  - In general, T(n) = s + p/n, so Speedup(n) = 1 / (s + p/n)
  - As n → ∞, T(n) → s and Speedup(n) → 1/s
  - Pessimistic: speedup is bounded by the serial fraction, no matter how many processors
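The arithmetic above can be sketched in a few lines of Python (the function name and the normalization s + p = 1 are mine, not from the slides):

```python
def amdahl_speedup(s, n):
    """Speedup with serial fraction s (0 <= s <= 1) on n processors."""
    p = 1.0 - s                  # parallelizable fraction
    return 1.0 / (s + p / n)

# The slide's example: 2 of 8 time units are serial, so s = 2/8
print(amdahl_speedup(2 / 8, 6))       # 8/3 ≈ 2.67
# As n grows, the speedup is capped at 1/s = 4
print(amdahl_speedup(2 / 8, 10**9))   # approaches 4.0
```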
6. Performance Potential (Cont.)
- Gustafson's Corollary
  - Amdahl's law holds if we run the same problem size on larger machines
  - But in practice, people run larger problems and "wait" the same time
7. Performance Potential (Cont.)
- Gustafson's Corollary (Cont.)
  - Assume, for larger problem sizes:
    - Serial time fixed (at s)
    - Parallel time proportional to problem size (the truth is more complicated)
  - Old serial: SSPPPPPP
  - 6 processors, scaled problem (still 8 time units):
      P1: S S P P P P P P
      P2:     P P P P P P
      P3:     P P P P P P
      P4:     P P P P P P
      P5:     P P P P P P
      P6:     P P P P P P
  - Hypothetical serial: SSPPPPPP PPPPPP PPPPPP PPPPPP PPPPPP PPPPPP
  - Speedup = (8 + 5·6)/8 = 38/8 = 4.75
  - Scaled serial time T'(n) = s + n·p, so T'(n) → ∞ as n → ∞: the scaled speedup grows without bound!
  - How does your algorithm "scale up"?
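The scaled-speedup arithmetic can be sketched the same way (s is the serial fraction of the *parallel* run time, normalized so s + p = 1; names are mine):

```python
def gustafson_speedup(s, n):
    """Scaled speedup: serial time fixed, parallel work grows with n."""
    p = 1.0 - s            # parallel fraction of the n-processor run
    return s + n * p       # hypothetical serial time / parallel time

# Slide example: on 6 processors the run is 2 serial + 6 parallel units,
# so s = 2/8 of the parallel run time
print(gustafson_speedup(2 / 8, 6))   # (8 + 5*6)/8 = 4.75
```

Unlike Amdahl's bound of 1/s, this grows linearly in n, which is why it is the optimistic reading.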
8. Flynn classification
- Single-Instruction Single-Data (SISD)
- Single-Instruction Multiple-Data (SIMD)
- Multiple-Instruction Single-Data (MISD)
- Multiple-Instruction Multiple-Data (MIMD)
9. Communication models
- Shared-memory
- Message passing
- Data parallel
10. Communication Models: Shared-Memory

  P        P        P
  ----interconnect----
  M  M  M  M  M  M  M

- Each node: a processor that runs a process
- One shared memory, accessible by any processor
  - The same address on two different processors refers to the same datum
- Therefore, write and read memory to:
  - Store and recall data
  - Communicate, synchronize (coordinate)
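A minimal sketch of this model using threads over one address space (the counter and lock are illustrative, not from the slides):

```python
import threading

counter = [0]              # one shared datum: the same "address" for every thread
lock = threading.Lock()    # synchronization (coordination) through shared memory

def worker():
    for _ in range(10_000):
        with lock:         # coordinate so increments don't interleave
            counter[0] += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0])          # 40000: all threads wrote through the same location
```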
11. Communication Models: Message Passing

  P-M      P-M      P-M
  ----interconnect----

- Each node: a computer
  - Processor runs its own program (as in shared memory)
  - Memory is local to that node, unrelated to memory on other nodes
- Add messages for inter-node communication: send and receive, like mail
12. Communication Models: Data Parallel

  P-M      P-M      P-M
  ----interconnect----

- Virtual processor per datum
- Write sequential programs with a "conceptual PC" and let the parallelism be within the data (e.g., matrices)
  - C = A + B
- Typically a SIMD architecture, but MIMD can be as effective
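A minimal sketch of the C = A + B example with plain Python lists; in a real data-parallel system each per-element addition would be done by its own virtual processor, concurrently:

```python
A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]

# C = A + B: written as one conceptual step over the whole matrices;
# the parallelism is across the elements, not across instructions
C = [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]
print(C)   # [[11, 22], [33, 44]]
```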
13. Architectures
- All mechanisms can usually be synthesized by all hardware
- Key question: which communication model does the hardware support best?
- All small-scale systems are shared-memory
14. Which is the Best Communication Model to Support?
- Shared-memory
  - Used in small-scale systems
  - Easier to program for dynamic data structures
  - Lower-overhead communication for small data
  - Implicit movement of data with caching
  - Hard to build?
- Message-passing
  - Communication is explicit: harder to program?
  - Larger overheads in communication (OS intervention?)
  - Easier to build?
15. Shared-Memory Architecture
The model:

  PROC     PROC     PROC
  ----INTERCONNECT----
        MEMORY

- For now, assume the interconnect is a bus: a centralized architecture
16. Centralized Shared-Memory Architecture

  PROC     PROC     PROC
  --------BUS--------
        MEMORY
17. Centralized Shared-Memory Architecture (Cont.)
- For higher bandwidth (throughput)
- For lower latency
- Problem?
18. Centralized Shared-Memory Architecture (Cont.)
- For higher bandwidth (throughput)
- For lower latency
- Problem?

  PROC     PROC     PROC
  --------BUS--------
  MEMORY   MEMORY   MEMORY
19. Centralized Shared-Memory Architecture (Cont.)
- For higher bandwidth (throughput)
- For lower latency
- Problem?

  PROC     PROC     PROC
  --------BUS--------
  MEMORY   MEMORY   MEMORY

  PROC     PROC     PROC
  CACHE    CACHE    CACHE
  --------BUS--------
  MEMORY   MEMORY   MEMORY
20. Cache Coherence Problem

  PROC 1   PROC 2   ...   PROC n
  CACHE    CACHE          CACHE
   [A]
  --------BUS--------
  MEMORY        MEMORY
   [A]

- Location A is cached by one processor while memory also holds a copy; a write to either copy leaves the other stale
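A toy illustration of the problem (not any real protocol): two private caches over one memory, with no coherence mechanism, so a write by one processor leaves a stale copy in the other cache:

```python
class Cache:
    """Private cache with no coherence support."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                  # address -> cached value

    def read(self, addr):
        if addr not in self.lines:       # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]          # hit: use the (possibly stale) copy

    def write(self, addr, value):        # write-through, but no invalidation
        self.lines[addr] = value
        self.memory[addr] = value

memory = {"A": 0}
c1, c2 = Cache(memory), Cache(memory)

c1.read("A"); c2.read("A")   # both caches now hold A = 0
c1.write("A", 7)             # P1 updates A; memory sees 7
print(c2.read("A"))          # 0 -- P2 still reads its stale copy
```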
21. Cache Coherence Solutions
- Snooping
- Problem with a centralized architecture

  PROC 1   PROC 2   ...   PROC n
  CACHE    CACHE          CACHE
   [A]
  --------BUS--------
  MEMORY        MEMORY
   [A]
22. Distributed Shared-Memory (DSM) Architecture
- Use a higher-bandwidth interconnection network
- Uniform Memory Access architecture (UMA)

  PROC 1   PROC 2   ...   PROC n
  CACHE    CACHE          CACHE
  --GENERAL INTERCONNECT--
  MEMORY   MEMORY   MEMORY
23. Distributed Shared-Memory (DSM) - Cont.
- For lower latency: Non-Uniform Memory Access architecture (NUMA)
24. Distributed Shared-Memory (DSM) - Cont.
- For lower latency: Non-Uniform Memory Access architecture (NUMA)

  PROC     PROC     PROC
  CACHE    CACHE    CACHE
  MEM      MEM      MEM
  ---SWITCH/NETWORK---
25. Non-Bus Interconnection Networks
- Example interconnection networks
26. Distributed Shared-Memory - Coherence Problem
- Directory scheme
- Level of indirection!

  PROC     PROC     PROC
  CACHE    CACHE    CACHE
  MEM      MEM      MEM
  ---SWITCH/NETWORK---
27. Distributed Shared-Memory - Coherence Problem
- Directory scheme
- Level of indirection!

  PROC     PROC     PROC
  CACHE    CACHE    CACHE
  MEM      MEM      MEM
  DIR      DIR      DIR
  ---SWITCH/NETWORK---
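A toy sketch of the level of indirection a directory adds (illustrative only, not a real protocol): each line's home directory tracks which caches hold a copy and invalidates them point-to-point on a write, so no bus broadcast is needed:

```python
class Directory:
    """Home-node bookkeeping: which caches share each address."""
    def __init__(self):
        self.sharers = {}                    # addr -> set of sharing caches

    def record_read(self, addr, cache):
        self.sharers.setdefault(addr, set()).add(cache)

    def invalidate(self, addr, writer):
        # The indirection: consult the directory, then invalidate only sharers
        for cache in self.sharers.get(addr, set()) - {writer}:
            cache.lines.pop(addr, None)
        self.sharers[addr] = {writer}

class Cache:
    def __init__(self, memory, directory):
        self.memory, self.directory = memory, directory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch and register as sharer
            self.lines[addr] = self.memory[addr]
            self.directory.record_read(addr, self)
        return self.lines[addr]

    def write(self, addr, value):
        self.directory.invalidate(addr, self)
        self.lines[addr] = value
        self.memory[addr] = value

memory, directory = {"A": 0}, Directory()
c1, c2 = Cache(memory, directory), Cache(memory, directory)
c1.read("A"); c2.read("A")   # both caches hold A = 0; directory records both
c1.write("A", 7)             # directory invalidates c2's copy
print(c2.read("A"))          # 7 -- the stale copy was removed, so c2 refetches
```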