COMP 206: Computer Architecture and Implementation - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
COMP 206: Computer Architecture and Implementation
  • Montek Singh
  • Mon, Dec 5, 2005
  • Topic: Intro to Multiprocessors and Thread-Level Parallelism

2
Outline
  • Motivation
  • Multiprocessors
  • SISD, SIMD, MIMD, and MISD
  • Memory organization
  • Communication mechanisms
  • Multithreading
  • Reading: HP3 6.1, 6.3 (snooping), and 6.9

3
Motivation
  • Instruction-Level Parallelism (ILP): everything we have covered so far
  • simple pipelining
  • dynamic scheduling: scoreboarding and Tomasulo's algorithm
  • dynamic branch prediction
  • multiple-issue architectures: superscalar, VLIW
  • hardware-based speculation
  • compiler techniques and software approaches
  • Bottom line: there just aren't enough instructions that can actually
    be executed in parallel!
  • instruction issue: limit on maximum issue count
  • branch prediction: imperfect
  • registers: finite in number
  • functional units: limited in number
  • data dependencies: hard to detect dependencies via memory

4
So, What do we do?
  • Key Idea: Increase the number of running processes
  • multiple processes at a given point in time
  • i.e., at the granularity of one (or a few) clock cycles
  • not sufficient to have multiple processes at the OS level!
  • Two Approaches
  • multiple CPUs, each executing a distinct process
  • Multiprocessors or Parallel Architectures
  • single CPU executing multiple processes (threads)
  • Multithreading or Thread-Level Parallelism (see the sketch below)
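
A minimal sketch (not from the slides) of the second approach: one process whose threads all run the same code and share global data. POSIX threads are assumed; the thread count, array names, and the summation workload are illustrative choices.

```c
/* One process, several threads sharing code and data.
 * Compile with: gcc -O2 -pthread sum.c                                  */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N        1000000L

static long long partial[NTHREADS];     /* shared data: one slot per thread */

static void *worker(void *arg)          /* shared code: every thread runs this */
{
    long id = (long)arg;
    long long sum = 0;
    for (long i = id; i < N; i += NTHREADS)   /* this thread's share of the work */
        sum += i;
    partial[id] = sum;
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    long long total = 0;

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);           /* wait for all threads */

    for (int i = 0; i < NTHREADS; i++)
        total += partial[i];
    printf("total = %lld\n", total);          /* 0 + 1 + ... + (N-1) */
    return 0;
}
```

On a multiprocessor these threads can run on different CPUs; on a single multithreaded CPU they are interleaved, which is the topic of the later slides.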

5
Taxonomy of Parallel Architectures
  • Flynn's Classification
  • SISD: Single instruction stream, single data stream
  • uniprocessor
  • SIMD: Single instruction stream, multiple data streams
  • same instruction executed by multiple processors
  • each has its own data memory
  • Ex: multimedia processors, vector architectures (see the SSE sketch
    after this slide)
  • MISD: Multiple instruction streams, single data stream
  • successive functional units operate on the same stream of data
  • rarely found in general-purpose commercial designs
  • special-purpose stream processors (digital filters, etc.)
  • MIMD: Multiple instruction streams, multiple data streams
  • each processor has its own instruction and data streams
  • most popular form of parallel processing
  • single-user: high performance for one application
  • multiprogrammed: running many tasks simultaneously (e.g., servers)
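
As an illustration of the SIMD style (an x86 CPU with SSE is assumed here, which goes beyond the slides), the sketch below uses the intrinsic _mm_add_ps: a single instruction adds four pairs of floats at once, whereas a scalar SISD loop would need one add per element.

```c
/* SIMD illustration: one instruction (_mm_add_ps) operates on four
 * data elements at a time. Requires an x86 CPU with SSE.             */
#include <xmmintrin.h>
#include <stdio.h>

int main(void)
{
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    float c[8];

    for (int i = 0; i < 8; i += 4) {           /* 4 lanes per instruction */
        __m128 va = _mm_loadu_ps(&a[i]);       /* load 4 floats           */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);        /* single instruction,     */
        _mm_storeu_ps(&c[i], vc);              /* multiple data elements  */
    }

    for (int i = 0; i < 8; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}
```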

6
Multiprocessor Memory Organization
  • Centralized shared-memory multiprocessor
  • usually a small number of processors
  • share a single memory bus
  • use large caches

7
Multiprocessor Memory Organization
  • Distributed-memory multiprocessor
  • can support large processor counts
  • cost-effective way to scale memory bandwidth
  • works well if most accesses are to the local memory node
  • requires an interconnection network
  • communication between processors becomes more complicated and slower

8
Multiprocessor Hybrid Organization
  • Use a distributed-memory organization at the top level
  • Each node itself may be a shared-memory multiprocessor (2-8
    processors)

9
Communication Mechanisms
  • Shared-Memory Communication
  • around for a long time, so well understood and standardized
  • memory-mapped
  • ease of programming when communication patterns are complex or
    dynamically varying
  • better use of bandwidth when items are small
  • Problem: cache coherence is harder
  • use snooping and other protocols
  • Message-Passing Communication
  • simpler hardware, because keeping caches coherent is easier
  • communication is explicit, so it is simpler to understand
  • focuses programmer attention on communication
  • synchronization is naturally associated with communication
  • fewer errors due to incorrect synchronization
  • (both mechanisms are sketched in code below)
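
A minimal sketch contrasting the two mechanisms, assuming a POSIX environment (the producer/pipe structure is an illustrative choice, not an example from the slides). In the shared-memory half, communication is just a store and a load on a variable both threads can see, with synchronization supplied separately; in the message-passing half, the send and receive are explicit, and the receive itself provides the synchronization.

```c
/* Shared-memory vs. message-passing communication (POSIX assumed).   */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

/* --- shared-memory: threads communicate through a shared variable --- */
static int shared_value;                        /* visible to both threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);                  /* synchronization is explicit, */
    shared_value = 42;                          /* communication is implicit:   */
    pthread_mutex_unlock(&lock);                /* just a store to memory       */
    return NULL;
}

/* --- message-passing: processes exchange explicit messages ----------- */
static void message_passing_demo(void)
{
    int fd[2];
    char buf[16] = "";
    if (pipe(fd) != 0)
        return;
    if (fork() == 0) {                          /* child process: the sender */
        const char *msg = "hello";
        write(fd[1], msg, strlen(msg) + 1);     /* explicit send              */
        _exit(0);
    }
    read(fd[0], buf, sizeof buf);               /* explicit receive; doubles  */
    wait(NULL);                                 /* as the synchronization     */
    printf("message-passing: received \"%s\"\n", buf);
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, NULL);                      /* join before reading        */
    printf("shared-memory: read %d\n", shared_value);

    message_passing_demo();
    return 0;
}
```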

10
Multithreading
  • Threads: multiple processes that share code and data (and much of
    their address space)
  • recently, the term has come to include processes that may run on
    different processors and even have disjoint address spaces, as long
    as they share the code
  • Multithreading: exploit thread-level parallelism within a processor
  • fine-grain multithreading
  • switch between threads on each instruction!
  • coarse-grain multithreading
  • switch to a different thread only if the current thread has a costly
    stall
  • e.g., switch only on a level-2 cache miss

11
Multithreading
  • Fine-grain multithreading
  • switch between threads on each instruction!
  • multiple threads executed in an interleaved manner
  • interleaving is usually round-robin
  • CPU must be capable of switching threads on every cycle!
  • fast, frequent switches
  • main disadvantage
  • slows down the execution of individual threads
  • that is, latency is traded off for better throughput

12
Multithreading
  • Coarse-grain multithreading
  • switch only if the current thread has a costly stall
  • e.g., a level-2 cache miss
  • can accommodate slightly costlier switches
  • less likely to slow down an individual thread
  • a thread is switched out only when it has a costly stall
  • main disadvantage
  • limited ability to overcome throughput losses
  • shorter stalls are ignored, and there may be plenty of those
  • instructions are issued from only a single thread at a time
  • every switch involves emptying and restarting the instruction
    pipeline
  • (a toy model contrasting the two switching policies follows below)
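
The toy model below is an illustration only, not a cycle-accurate simulator; the two instruction strings and the 3-cycle miss penalty are invented values. Each thread is a string in which 'c' is a 1-cycle compute operation and 'M' is a load that misses and stalls its thread. The fine-grain scheduler rotates between ready threads every cycle, while the coarse-grain scheduler stays with one thread until it issues an instruction that misses.

```c
/* Toy comparison of fine-grain vs. coarse-grain thread switching.    */
#include <stdio.h>
#include <string.h>

#define MISS_PENALTY 3          /* cycles a thread stalls after an 'M' */
#define NTHREADS     2

static const char *prog[NTHREADS] = { "ccMcc", "cccMc" };

static void run(int fine_grain)
{
    int pc[NTHREADS]    = {0};  /* next instruction per thread         */
    int stall[NTHREADS] = {0};  /* cycles until thread is ready again  */
    int cur = 0, cycle = 0;

    printf("%6s-grain: ", fine_grain ? "fine" : "coarse");
    while (pc[0] < (int)strlen(prog[0]) || pc[1] < (int)strlen(prog[1])) {
        if (fine_grain)
            cur = (cur + 1) % NTHREADS;              /* round-robin switch    */
        if (stall[cur] > 0 || pc[cur] >= (int)strlen(prog[cur]))
            cur = (cur + 1) % NTHREADS;              /* try the other thread  */

        if (stall[cur] == 0 && pc[cur] < (int)strlen(prog[cur])) {
            printf("T%d ", cur);                     /* issue one instruction */
            if (prog[cur][pc[cur]] == 'M')
                stall[cur] = MISS_PENALTY;           /* costly stall begins   */
            pc[cur]++;
        } else {
            printf("-- ");                           /* nothing ready to issue */
        }
        for (int t = 0; t < NTHREADS; t++)           /* outstanding misses     */
            if (stall[t] > 0) stall[t]--;            /* make progress          */
        cycle++;
    }
    printf(" (%d cycles)\n", cycle);
}

int main(void)
{
    run(1);     /* fine-grain: switch every cycle               */
    run(0);     /* coarse-grain: switch only on a costly stall  */
    return 0;
}
```

The printed traces show the difference in switching behavior; the cycle counts depend entirely on the made-up instruction mix.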

13
Simultaneous Multithreading (SMT)
  • Example: the new Pentium with Hyper-Threading
  • Key Idea: Exploit ILP across multiple threads!
  • i.e., convert thread-level parallelism into more ILP
  • exploit the following features of modern processors
  • multiple functional units
  • modern processors typically have more functional units available
    than a single thread can utilize
  • register renaming and dynamic scheduling
  • multiple instructions from independent threads can co-exist and
    co-execute! (see the issue-slot sketch below)
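
The sketch below is a toy illustration of that key idea; the issue width of 4 and the per-cycle counts of ready instructions are invented numbers. Each cycle the issue slots are filled greedily from whichever threads have ready instructions, so thread-level parallelism is converted into instruction-level parallelism within a single cycle.

```c
/* Toy SMT issue model: fill the issue slots of one cycle from several
 * threads. Not a real core; all numbers are made up for illustration. */
#include <stdio.h>

#define ISSUE_WIDTH 4
#define NTHREADS    2
#define NCYCLES     5

int main(void)
{
    /* ready[t][c] = independent instructions thread t could issue in cycle c */
    int ready[NTHREADS][NCYCLES] = { {3, 1, 4, 2, 3},      /* thread 0 */
                                     {2, 3, 1, 3, 2} };    /* thread 1 */

    for (int cycle = 0; cycle < NCYCLES; cycle++) {
        int slots = ISSUE_WIDTH;
        int issued[NTHREADS] = {0};
        for (int t = 0; t < NTHREADS && slots > 0; t++) {  /* fill slots greedily */
            issued[t] = ready[t][cycle] < slots ? ready[t][cycle] : slots;
            slots -= issued[t];
        }
        printf("cycle %d: issued %d from T0 and %d from T1 (width %d)\n",
               cycle, issued[0], issued[1], ISSUE_WIDTH);
    }
    return 0;
}
```

A single-threaded superscalar would leave the unused slots empty; SMT fills them with instructions from the other thread.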

14
SMT Illustration (Fig. 6.44 of HP3)
  • A superscalar processor with no multithreading
  • A superscalar processor with coarse-grain
    multithreading
  • A superscalar processor with fine-grain
    multithreading
  • A superscalar processor with simultaneous
    multithreading (SMT)

15
SMT Design Challenges
  • Dealing with a large register file
  • needed to hold multiple contexts
  • Maintaining low overhead on the clock cycle
  • fast instruction issue: choosing what to issue
  • instruction commit: choosing what to commit
  • keeping cache conflicts within acceptable bounds