CS184c: Computer Architecture [Parallel and Multithreaded]

Transcript and Presenter's Notes

1
CS184c: Computer Architecture [Parallel and Multithreaded]
  • Day 1: April 3, 2001
  • Overview and Message Passing

2
Today
  • This Class
  • Why/Overview
  • Message Passing

3
CS184 Sequence
  • A - structure and organization
  • raw components, building blocks
  • design space
  • B - single-threaded architecture
  • emphasis on abstractions and optimizations
    including quantification
  • C - multithreaded architecture

4
Architecture
CS184b
  • attributes of a system as seen by the
    programmer
  • conceptual structure and functional behavior
  • Defines the visible interface between the
    hardware and software
  • Defines the semantics of the program (machine
    code)

5
Conventional, Single-Threaded Abstraction
CS184b
  • Single, large, flat memory
  • sequential, control-flow execution
  • instruction-by-instruction sequential execution
  • atomic instructions
  • single-thread owns entire machine
  • byte addressability
  • unbounded memory, call depth

6
This Term
  • Different models of computation
  • different microarchitectures
  • Big Difference: Parallelism
  • previously model was sequential
  • Mostly
  • Multiple Program Counters
  • threads of control

7
Architecture Instruction Taxonomy
CS184a
8
Why?
  • Why do we need a different model?
  • Different architecture?

9
Why?
  • Density
  • Superscalars scale super-linearly with increasing
    instructions/cycle
  • cost from maintaining sequential model
  • dependence analysis
  • renaming/reordering
  • single memory/RF access
  • VLIW: lack of model/scalability problem
  • Maybe there's a better way?

10
Consider
CS184a
  • Two network data ports
  • states: idle, first-datum, receiving, closing
  • data arrival uncorrelated between ports

11
Instruction Control
CS184a
  • If FSMs advance orthogonally
  • (really independent control)
  • context depth → product of states
  • for full partition
  • I.e. w/ single controller (PC)
  • must create product FSM
  • which may lead to state explosion
  • N FSMs, with S states → S^N product states
  • This example:
  • 4 states, 2 FSMs → 4² = 16-state composite FSM
    (see the sketch below)
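
To make the arithmetic concrete, here is a minimal C sketch; only the four state names come from the slide, while the enum encoding and transition choices are invented for illustration. Two 4-state port FSMs are trivial to step with independent controllers, but a single program counter must track their cross product: 4 × 4 = 16 composite states.

    #include <stdio.h>

    /* The four per-port states named on the slide. */
    typedef enum { IDLE, FIRST_DATUM, RECEIVING, CLOSING } PortState;

    /* With independent controllers, each port FSM steps on its own.
     * (Transition choices here are illustrative, not from the course.) */
    PortState step(PortState s, int datum) {
        switch (s) {
        case IDLE:        return datum ? FIRST_DATUM : IDLE;
        case FIRST_DATUM: return RECEIVING;
        case RECEIVING:   return datum ? RECEIVING : CLOSING;
        default:          return IDLE;   /* CLOSING -> IDLE */
        }
    }

    /* With one controller (one PC), the pair must be one product FSM:
     * N FSMs of S states each need S^N composite states (4^2 = 16). */
    int product_state(PortState a, PortState b) { return 4 * a + b; }

    int main(void) {
        PortState a = IDLE, b = RECEIVING;
        a = step(a, 1);                  /* ports advance independently */
        printf("product state %d of 16\n", product_state(a, b));
        return 0;
    }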

12
Why?
  • Scalability
  • compose more capable machine from building
    blocks
  • compose from modular building blocks
  • multiple chips

13
Why?
  • Expose/exploit parallelism better
  • saw non-local parallelism when looking at IPC
  • saw need for large memory to exploit

14
Models?
  • Message Passing (week 1)
  • Dataflow (week 2)
  • Shared Memory (week 3)
  • Data Parallel (week 4)
  • Multithreaded (week 5)
  • Interface: Special and Heterogeneous functional
    units (week 6)

15
Additional Key Issues
  • How Interconnect? (week 7-8)
  • Cope with defects and Faults? (week 9)

16
Message Passing
17
Message Passing
  • Simple extension to Models
  • Compute Model
  • Programming Model
  • Architecture
  • Low-level

18
Message Passing Model
  • Collection of sequential processes
  • Processes may communicate with each other
    (messages)
  • send
  • receive
  • Each process runs sequentially
  • has own address space
  • Abstraction: each process gets its own processor

19
Programming for MP
  • Have a sequential language
  • C, C++, Fortran, Lisp
  • Add primitives (system calls)
  • send
  • receive
  • spawn
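
As a concrete example: the slides name only abstract primitives, but MPI's C binding is one real system that provides them (an assumption of this sketch; the course does not prescribe MPI). Note that MPI-1 has no explicit spawn; process creation comes from the launcher, e.g. mpirun -np 2.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);                /* join the parallel job */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I?   */

        if (rank == 0) {
            value = 184;
            /* "send": explicit message to process 1 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* "receive": blocks until the message arrives */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("process %d received %d\n", rank, value);
        }
        MPI_Finalize();
        return 0;
    }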

20
Architecture for MP
  • Sequential Architecture for processing node
  • add network interfaces
  • processes have own address space
  • Add network connecting the nodes
  • minimally sufficient...

21
MP Architecture Virtualization
  • Processes virtualize nodes
  • size independent/scalable
  • Virtual connections between processes
  • placement independent communication

22
MP Example and Performance Issues
23
N-Body Problem
  • Compute pairwise gravitational forces
  • Integrate positions

24
Coding
  • // params: position, mass.
  • F = 0
  • For i = 1 to N
  •   send my params to p[body[i]]
  •   get params from p[body[i]]
  •   F += force(my params, params)
  • Update pos, velocity
  • Repeat
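
A hedged C/MPI rendering of the loop above, assuming one body per MPI rank; force() and update_position_velocity() are placeholder kernels, not from the reading. MPI_Sendrecv pairs each send with its matching receive, so symmetric exchanges cannot deadlock.

    #include <mpi.h>

    /* Placeholder kernels (assumptions for this sketch). */
    double force(const double *mine, const double *theirs);
    void   update_position_velocity(double F);

    /* One simulation step for the body owned by this rank.
     * params = {x, y, z, mass}, standing in for the slide's "params". */
    void body_step(double my_params[4]) {
        double F = 0.0, params[4];
        int i, rank, N;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &N);

        for (i = 0; i < N; i++) {
            if (i == rank) continue;              /* skip self */
            /* paired exchange with body i's process */
            MPI_Sendrecv(my_params, 4, MPI_DOUBLE, i, 0,
                         params,    4, MPI_DOUBLE, i, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            F += force(my_params, params);
        }
        update_position_velocity(F);
    }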

25
Performance
  • Body work = cN
  • Cycle work = cN²
  • Ideal, Np processors: cN²/Np

26
Performance Sequential
  • Body work
  • read N values
  • compute N force updates
  • compute pos/vel from F and params
  • c = t(read value) + t(compute force)

27
Performance MP
  • Body work
  • send N messages
  • receive N messages
  • compute N force updates
  • compute pos/vel from F and params
  • c = t(send message) + t(receive message)
    + t(compute force)

28
Send/receive
  • t(receive)
  • wait on message delivery
  • swap to kernel
  • copy data
  • return to process
  • t(send)
  • similar
  • t(send), t(receive) >> t(read value)
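
One standard way to see this gap is a ping-pong microbenchmark. The MPI fragment below (assuming MPI_Init has run and rank holds this process's rank, as in the earlier sketches) times round trips, so (t1 - t0)/iters approximates t(send) + t(receive); on typical systems it comes out orders of magnitude above a local memory read.

    /* Ping-pong sketch: ranks 0 and 1 bounce one int back and forth. */
    double t0, t1;
    int dummy = 0, iters = 10000, k;

    t0 = MPI_Wtime();
    for (k = 0; k < iters; k++) {
        if (rank == 0) {
            MPI_Send(&dummy, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&dummy, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&dummy, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&dummy, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();
    /* (t1 - t0) / iters ~ one round trip = t(send) + t(receive) */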

29
Sequential vs. MP
  • Tseq = cseq × N²
  • Tmp = cmp × N²/Np
  • Speedup = Tseq/Tmp = (cseq × Np) / cmp
  • Assuming no waiting:
  • cseq/cmp ≈ t(read value) / (t(send) + t(rcv))
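
A worked example with assumed numbers (not from the slides): if t(read value) = 10 cycles and t(send) + t(rcv) = 10,000 cycles, then cseq/cmp ≈ 1/1000, so Speedup ≈ Np/1000. The message-passing version only breaks even at Np ≈ 1000 processors, which is why reducing communication cost (cmp) matters as much as adding processors.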

30
Waiting?
  • Shared bus interconnect
  • wait O(N) time for N sends (receives) across the
    machine
  • Non-blocking interconnect
  • wait L(net) time after message send to receive
  • if insufficient parallelism
  • latency dominates performance

31
Dertouzos Latency Bound
  • Speedup upper bound = processes / latency

32
Waiting: data availability
  • Also wait for data to be sent

33
Coding/Waiting
  • For i = 1 to N
  •   send my params to p[body[i]]
  •   get params from p[body[i]]
  •   F += force(my params, params)
  • How long does processor i wait for the first datum?
  • Parallelism profile?

34
More Parallelism
  • For i = 1 to N
  •   send my params to p[body[i]]
  • For i = 1 to N
  •   get params from p[body[i]]
  •   F += force(my params, params)
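
In MPI terms (an assumption; the course reading uses its own primitives), this split is what non-blocking operations express: post every send and receive first, wait once, and no single slow partner serializes the loop. A fragment, assuming rank, N, my_params, all_params[i][4], force(), and a bound MAX_BODIES >= N are set up as in the earlier sketches:

    /* Post all communication before waiting on any of it. */
    MPI_Request reqs[2 * MAX_BODIES];
    int i, n = 0;
    double F = 0.0;

    for (i = 0; i < N; i++)              /* all sends first */
        if (i != rank)
            MPI_Isend(my_params, 4, MPI_DOUBLE, i, 0,
                      MPI_COMM_WORLD, &reqs[n++]);

    for (i = 0; i < N; i++)              /* then all receives */
        if (i != rank)
            MPI_Irecv(all_params[i], 4, MPI_DOUBLE, i, 0,
                      MPI_COMM_WORLD, &reqs[n++]);

    MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);  /* one sync point */

    for (i = 0; i < N; i++)
        if (i != rank)
            F += force(my_params, all_params[i]);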

35
Queuing?
  • For i = 1 to N
  •   send my params to p[body[i]]
  •   get params from p[body[i]]
  •   F += force(my params, params)
  • No queuing?
  • Queuing?
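
To unpack the question: with no queuing (rendezvous semantics), a send blocks until the matching receive has been posted, so each iteration of the loop above serializes on its partner before the next send can go out. With queuing, sends deposit their data into buffers and proceed, recovering the overlap of the split-loop version; but buffer space is finite, so the system must still block or otherwise manage senders when buffers fill.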

36
Dispatching
  • Multiple processes on node
  • Who to run?
  • Can a receive block while waiting?

37
Dispatching
  • Abstraction: each process gets its own processor
  • If receive blocks (holds processor)
  • may prevent another process on which it depends
    from running
  • Consider 2-body problem on 1 node
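
Concrete scenario: the two bodies are processes A and B sharing one processor. A runs first and blocks in its receive, waiting for B's parameters. If the blocked receive holds the processor, B never gets to run, B's send never executes, and A waits forever. The dispatcher must deschedule a blocked process so the peer it depends on can make progress.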

38
Seitz Coding
  • see reading

39
MP Issues
40
Expensive Communication
  • Process to process communication goes through
    operating system
  • system call, process switch
  • exit processor, network, enter processor
  • system call, process switch
  • Milliseconds?
  • Thousands of cycles...

41
Why OS involved?
  • Protection/Isolation
  • can this process send/receive with this other
    process?
  • Translation
  • where does this message need to go?
  • Scheduling
  • who can/should run now?

42
Issues
  • Process Placement
  • locality
  • load balancing
  • Cost for excessive parallelism
  • E.g., N-body on Np < N processors?
  • Message hygiene
  • ordering, single delivery, buffering
  • Deadlock
  • user-introduced, system-introduced

43
Low-Level Model
  • Places too much burden on the user
  • decompose problem explicitly
  • sequential chunk size not abstract
  • scale weakness in architecture
  • guarantee correctness in face of non-determinism
  • placement/load-balancing
  • in some systems
  • Gives considerable explicit control

44
Low-Level Primitives
  • Has the necessary primitives for multiprocessor
    cooperation
  • Maybe an appropriate compiler target?
  • Architecture model, but not programming/compute
    model?

45
Announcements
  • Note CS25 next Monday/Tuesday
  • Seitz speaking on Tuesday
  • Dally speaking on Monday
  • (also Mead)
  • even DeHon :-)
  • Changing schedule (already)
  • Network Interface bumped up to next Mon.
  • von Eicken et al., Active Messages
  • Henry and Joerg, A Tightly-Coupled
    Processor-Network Interface

46
Big Ideas
  • Value of Architectural Abstraction
  • Sequential abstraction
  • limits implementation freedom
  • requires large cost to support
  • semantic mismatch between model and execution
  • Parallel models expose more opportunities

47
Big Ideas
  • MP has minimal primitives
  • appropriate low-level model
  • too raw/primitive for user model
  • Communication essential component
  • can be expensive
  • doing well is necessary to get good performance
    (come out ahead)
  • watch OS cost...