Chapter 6: Multiprocessors Part I - PowerPoint PPT Presentation
1
Chapter 6 Multiprocessors Part I
  • Introduction (Section 6.1)
  • What is a parallel or multiprocessor system?
  • Why parallel architecture?
  • Performance potential
  • Flynn classification
  • Communication models (Section 6.1)
  • Architectures (Section 6.1)
  • Centralized shared-memory (Section 6.3)
  • Distributed shared-memory (Section 6.5)
  • More in Part II

2
What is a parallel or multiprocessor system?
  • Multiple processor units working together to
    solve the same problem
  • Key architectural issue: communication model

3
Why parallel architectures?
  • Absolute performance
  • Scientific computing
  • General-purpose computing
  • Technology and architecture trends in
    high-performance computing
  • # of transistors on chip growing rapidly
  • Clock rates expected to go up, but slowly
  • Instruction-level parallelism valuable but
    limited
  • Complex architectures
  • ⇒ Coarser-level parallelism, as in MPs
  • Trend seen in products from AMD, Compaq, HP, IBM,
    Intel, SGI, SUN, ...

4
Why parallel architectures (Cont.)?
  • Cost-performance
  • µPs have made massive gains in performance
    (clock rates, ILP, caches)
  • These commodity µPs are cheap; many more are sold
    than supercomputers
  • ⇒ Multiprocessors made from µPs are replacing
    traditional supercomputers

5
Performance Potential
  • Amdahl's Law is pessimistic
  • Let s be the serial part
  • Let p be the part that can be parallelized n ways
    (normalized so s + p = 1)
  • Serial: SSPPPPPP (8 time units)
  • 6 processors (3 time units):
        SSP
          P
          P
          P
          P
          P
  • Speedup = 8/3 ≈ 2.67
  • T(n) = s + p/n, so Speedup(n) = 1/(s + p/n)
  • As n → ∞, Speedup(n) → 1/s
  • Pessimistic
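The slide's arithmetic can be checked with a short helper; `amdahl_speedup` is an illustrative name, not something from the slides.

```python
def amdahl_speedup(s, n):
    """Speedup on n processors when a fraction s of the work is serial.

    With total work normalized to 1: T(n) = s + p/n and
    Speedup(n) = T(1)/T(n) = 1/(s + p/n).
    """
    p = 1.0 - s
    return 1.0 / (s + p / n)

# The slide's example: 2 of 8 time units are serial, so s = 0.25, n = 6.
print(round(amdahl_speedup(0.25, 6), 2))        # 2.67, i.e. 8/3
# As n grows, speedup saturates at 1/s = 4 here, however many processors.
print(round(amdahl_speedup(0.25, 10**9), 2))    # 4.0
```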
6
Performance Potential (Cont.)
  • Gustafson's Corollary
  • Amdahl's law holds if the same problem size is
    run on larger machines
  • But, in practice, people run larger problems and
    "wait" the same time

7
Performance Potential (Cont.)
  • Gustafson's Corollary (Cont.)
  • Assume for larger problem sizes:
  • Serial time fixed (at s)
  • Parallel time proportional to problem size (truth
    more complicated)
  • Old serial: SSPPPPPP
  • 6 processors (8 time units):
        SSPPPPPP
          PPPPPP
          PPPPPP
          PPPPPP
          PPPPPP
          PPPPPP
  • Hypothetical serial:
    SSPPPPPP PPPPPP PPPPPP PPPPPP PPPPPP PPPPPP
  • Speedup = (8 + 5×6)/8 = 38/8 = 4.75
  • T'(n) = s + np, so T'(∞) → ∞: the scaled speedup
    grows without bound!
  • How does your algorithm "scale up"?
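The scaled-speedup arithmetic can be checked the same way; `scaled_speedup` is an illustrative name for the ratio of hypothetical serial time to parallel time.

```python
def scaled_speedup(s, p, n):
    """Gustafson: serial time s is fixed, parallel work np grows with n.

    Scaled speedup = hypothetical serial time / parallel time
                   = (s + n*p) / (s + p).
    """
    return (s + n * p) / (s + p)

# Slide example: s = 2, p = 6 time units, n = 6 processors.
print(scaled_speedup(2, 6, 6))   # (2 + 36) / 8 = 4.75
```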

8
Flynn classification
  • Single-Instruction, Single-Data (SISD)
  • Single-Instruction, Multiple-Data (SIMD)
  • Multiple-Instruction, Single-Data (MISD)
  • Multiple-Instruction, Multiple-Data (MIMD)

9
Communication models
  • Shared-memory
  • Message passing
  • Data parallel

10
Communication Models: Shared-Memory
[Diagram: processors (P) connected through an interconnect to a set of shared memory modules (M)]
  • Each node a processor that runs a process
  • One shared memory
  • Accessible by any processor
  • The same address on two different processors
    refers to the same datum
  • Therefore, write and read memory to
  • Store and recall data
  • Communicate, synchronize (coordinate)
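A minimal sketch of the shared-memory model, with Python threads standing in for the processors (an assumption for illustration; the slide describes hardware): every worker reads and writes the same location, and a lock provides the synchronization the slide mentions.

```python
import threading

counter = 0                      # one shared datum, visible to every thread
lock = threading.Lock()          # synchronization (coordination)

def worker(n):
    global counter
    for _ in range(n):
        with lock:               # communicate through shared memory safely
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                   # 4000: all threads updated the same location
```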

11
Communication Models: Message Passing
[Diagram: nodes, each a processor (P) with its own local memory (M), connected by an interconnect]
  • Each node a computer
  • Processor runs its own program (like SM)
  • Memory local to that node, unrelated to other
    memory
  • Add messages for inter-node communication: send
    and receive, like mail
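A minimal sketch of send/receive semantics, with a Python thread standing in for a node (an assumption for illustration; real message passing, e.g. MPI, crosses separate address spaces): data moves only through explicit messages.

```python
import queue
import threading

def node(inbox, outbox):
    # Each node keeps its state local and communicates only by messages.
    local = inbox.get()          # receive, like reading mail (blocks)
    outbox.put(local * 2)        # send the reply, like posting mail

to_node, from_node = queue.Queue(), queue.Queue()
t = threading.Thread(target=node, args=(to_node, from_node))
t.start()
to_node.put(21)                  # explicit send: the only way data moves
reply = from_node.get()          # explicit receive
t.join()
print(reply)                     # 42
```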

12
Communication Models: Data Parallel
[Diagram: nodes, each a processor (P) with local memory (M), connected by an interconnect]
  • Virtual processor per datum
  • Write sequential programs with a "conceptual PC"
    and let parallelism be within the data (e.g.,
    matrices)
  • C = A + B
  • Typically SIMD architecture, but MIMD can be as
    effective
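The C = A + B example can be sketched as a sequential program whose parallelism lives in the data: conceptually, the i-th virtual processor computes C[i].

```python
# One virtual processor per datum: the i-th "processor" computes C[i].
A = [1, 2, 3, 4]
B = [10, 20, 30, 40]

# A sequential program with one conceptual PC; the elementwise adds are
# independent of one another and could all happen at once on SIMD hardware.
C = [a + b for a, b in zip(A, B)]
print(C)                         # [11, 22, 33, 44]
```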

13
Architectures
  • All mechanisms can usually be synthesized by all
    hardware
  • Key: which communication model does hardware
    support best?
  • All small-scale systems are shared-memory

14
Which is the Best Communication Model to Support?
  • Shared-memory
  • Used in small-scale systems
  • Easier to program for dynamic data structures
  • Lower overhead communication for small data
  • Implicit movement of data with caching
  • Hard to build?
  • Message-passing
  • Communication explicit: harder to program?
  • Larger overheads in communication (OS
    intervention?)
  • Easier to build?

15
Shared-Memory Architecture
The model:
[Diagram: processors (PROC) connected by an interconnect to a single shared memory (MEMORY)]
  • For now, assume the interconnect is a bus: a
    centralized architecture

16
Centralized Shared-Memory Architecture
[Diagram: processors (PROC) sharing a single bus to one memory (MEMORY)]
17
Centralized Shared-Memory Architecture (Cont.)
  • For higher bandwidth (throughput)
  • For lower latency
  • Problem?

18
Centralized Shared-Memory Architecture (Cont.)
  • For higher bandwidth (throughput)
  • For lower latency
  • Problem?

[Diagram: processors (PROC) on a bus, now with multiple memory modules (MEMORY)]
19
Centralized Shared-Memory Architecture (Cont.)
  • For higher bandwidth (throughput)
  • For lower latency
  • Problem?

[Diagram: as before, plus a second version in which each processor (PROC) has a private cache (CACHE) between it and the bus]
20
Cache Coherence Problem
[Diagram: PROC 1, PROC 2, ..., PROC n on a bus; a cached copy of location A and the memory copy of A can differ]
21
Cache Coherence Solutions
  • Snooping
  • Problem with centralized architecture

[Diagram: the same bus-based system; each cache snoops the bus traffic for location A]
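A toy model of write-invalidate snooping (class and method names are hypothetical, and real protocols such as MESI track per-block states): every cache watches the bus, and a write invalidates all other cached copies of the block.

```python
class SnoopingCache:
    """Toy write-invalidate cache: every cache watches (snoops) the bus."""
    def __init__(self, bus):
        self.data = {}           # address -> value, valid copies only
        bus.append(self)         # attach to the shared bus

    def read(self, memory, addr):
        if addr not in self.data:
            self.data[addr] = memory[addr]   # miss: fetch from memory
        return self.data[addr]

    def write(self, bus, memory, addr, value):
        for cache in bus:                    # the write appears on the bus;
            if cache is not self:
                cache.data.pop(addr, None)   # snoopers invalidate their copy
        self.data[addr] = value
        memory[addr] = value                 # write-through, for simplicity

bus, memory = [], {"A": 1}
c1, c2 = SnoopingCache(bus), SnoopingCache(bus)
c1.read(memory, "A"); c2.read(memory, "A")   # both cache A = 1
c1.write(bus, memory, "A", 5)                # invalidates c2's copy
print(c2.read(memory, "A"))                  # 5, not the stale 1
```

The slide's caveat still applies: the broadcast to every cache is what ties snooping to a centralized (bus-based) architecture.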
22
Distributed Shared-Memory (DSM) Architecture
  • Use a higher-bandwidth interconnection network
  • Uniform Memory Access architecture (UMA)

[Diagram: processors (PROC), each with a cache (CACHE), connected through a general interconnect to multiple memory modules (MEMORY)]
23
Distributed Shared-Memory (DSM) - Cont.
  • For lower latency: Non-Uniform Memory Access
    architecture (NUMA)

24
Distributed Shared-Memory (DSM) -- Cont.
  • For lower latency: Non-Uniform Memory Access
    architecture (NUMA)

[Diagram: each processor (PROC) paired with a local memory (MEM) and cache (CACHE), all connected by a switch/network]
25
Non-Bus Interconnection Networks
  • Example interconnection networks

26
Distributed Shared-Memory - Coherence Problem
  • Directory scheme
  • Level of indirection!

[Diagram: the NUMA organization as before: PROC + MEM + CACHE nodes on a switch/network]
27
Distributed Shared-Memory - Coherence Problem
  • Directory scheme
  • Level of indirection!

[Diagram: as before, with a directory (DIR) added alongside each memory module (MEM)]
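A toy model of the directory scheme (all names are hypothetical): the directory records which caches share each block, so invalidations go point-to-point to the recorded sharers instead of being broadcast on a bus. That bookkeeping is the level of indirection the slide mentions.

```python
class Cache:
    def __init__(self):
        self.data = {}           # addr -> value, valid copies only

class Directory:
    """Toy directory scheme: per block, track which caches hold a copy."""
    def __init__(self, memory):
        self.memory = memory
        self.sharers = {}        # addr -> set of caches with a valid copy

    def read(self, cache, addr):
        self.sharers.setdefault(addr, set()).add(cache)
        cache.data[addr] = self.memory[addr]
        return cache.data[addr]

    def write(self, cache, addr, value):
        # The level of indirection: invalidations are sent point-to-point,
        # only to recorded sharers -- no bus broadcast required.
        for other in self.sharers.get(addr, set()) - {cache}:
            other.data.pop(addr, None)
        self.sharers[addr] = {cache}
        cache.data[addr] = value
        self.memory[addr] = value

directory = Directory({"A": 1})
c1, c2 = Cache(), Cache()
directory.read(c1, "A"); directory.read(c2, "A")  # both cache A = 1
directory.write(c1, "A", 5)                       # only c2 is invalidated
print("A" in c2.data)                             # False: stale copy gone
print(directory.read(c2, "A"))                    # 5: refetched on demand
```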