COMP 206: Computer Architecture and Implementation - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
COMP 206: Computer Architecture and Implementation
  • Montek Singh
  • Mon, Dec 5, 2005
  • Topic: Intro to Multiprocessors and Thread-Level Parallelism

2
Outline
  • Motivation
  • Multiprocessors
  • SISD, SIMD, MIMD, and MISD
  • Memory organization
  • Communication mechanisms
  • Multithreading
  • Reading: HP3 6.1, 6.3 (snooping), and 6.9

3
Motivation
  • Instruction-Level Parallelism (ILP): everything we have covered so far
  • simple pipelining
  • dynamic scheduling: scoreboarding and Tomasulo's algorithm
  • dynamic branch prediction
  • multiple-issue architectures: superscalar, VLIW
  • hardware-based speculation
  • compiler techniques and software approaches
  • Bottom line: there just aren't enough instructions that can actually
    be executed in parallel!
  • instruction issue: limit on maximum issue count
  • branch prediction: imperfect
  • registers: finite in number
  • functional units: limited in number
  • data dependencies: hard to detect dependencies via memory

4
So, What do we do?
  • Key Idea: Increase the number of running processes
  • multiple processes at a given point in time
  • i.e., at the granularity of one (or a few) clock cycles
  • not sufficient to have multiple processes at the OS level!
  • Two Approaches
  • multiple CPUs, each executing a distinct process
  • Multiprocessors or Parallel Architectures
  • single CPU executing multiple processes (threads)
  • Multithreading or Thread-Level Parallelism (see the sketch below)
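
A minimal sketch (not from the slides) of the second approach: one process whose threads all run the same code and share global data. POSIX threads are assumed; the thread count, array names, and the summation workload are illustrative choices.

```c
/* One process, several threads sharing code and data.
 * Compile with: gcc -O2 -pthread sum.c                                  */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N        1000000L

static long long partial[NTHREADS];     /* shared data: one slot per thread */

static void *worker(void *arg)          /* shared code: every thread runs this */
{
    long id = (long)arg;
    long long sum = 0;
    for (long i = id; i < N; i += NTHREADS)   /* this thread's share of the work */
        sum += i;
    partial[id] = sum;
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    long long total = 0;

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);           /* wait for all threads */

    for (int i = 0; i < NTHREADS; i++)
        total += partial[i];
    printf("total = %lld\n", total);          /* 0 + 1 + ... + (N-1) */
    return 0;
}
```

On a multiprocessor these threads can run on different CPUs; on a single multithreaded CPU they are interleaved, which is the topic of the later slides.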

5
Taxonomy of Parallel Architectures
  • Flynn's Classification
  • SISD: Single instruction stream, single data stream
  • uniprocessor
  • SIMD: Single instruction stream, multiple data streams
  • same instruction executed by multiple processors
  • each has its own data memory
  • Ex: multimedia processors, vector architectures (see the SSE sketch
    after this slide)
  • MISD: Multiple instruction streams, single data stream
  • successive functional units operate on the same stream of data
  • rarely found in general-purpose commercial designs
  • special-purpose stream processors (digital filters, etc.)
  • MIMD: Multiple instruction streams, multiple data streams
  • each processor has its own instruction and data streams
  • most popular form of parallel processing
  • single-user: high performance for one application
  • multiprogrammed: running many tasks simultaneously (e.g., servers)
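
As an illustration of the SIMD style (an x86 CPU with SSE is assumed here, which goes beyond the slides), the sketch below uses the intrinsic _mm_add_ps: a single instruction adds four pairs of floats at once, whereas a scalar SISD loop would need one add per element.

```c
/* SIMD illustration: one instruction (_mm_add_ps) operates on four
 * data elements at a time. Requires an x86 CPU with SSE.             */
#include <xmmintrin.h>
#include <stdio.h>

int main(void)
{
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    float c[8];

    for (int i = 0; i < 8; i += 4) {           /* 4 lanes per instruction */
        __m128 va = _mm_loadu_ps(&a[i]);       /* load 4 floats           */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);        /* single instruction,     */
        _mm_storeu_ps(&c[i], vc);              /* multiple data elements  */
    }

    for (int i = 0; i < 8; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}
```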

6
Multiprocessor Memory Organization
  • Centralized shared-memory multiprocessor
  • usually a small number of processors
  • share a single memory bus
  • use large caches

7
Multiprocessor Memory Organization
  • Distributed-memory multiprocessor
  • can support large processor counts
  • cost-effective way to scale memory bandwidth
  • works well if most accesses are to the local memory node
  • requires an interconnection network
  • communication between processors becomes more complicated and slower

8
Multiprocessor Hybrid Organization
  • Use a distributed-memory organization at the top level
  • Each node itself may be a shared-memory multiprocessor (2-8
    processors)

9
Communication Mechanisms
  • Shared-Memory Communication
  • around for a long time, so well understood and standardized
  • memory-mapped
  • ease of programming when communication patterns are complex or
    dynamically varying
  • better use of bandwidth when items are small
  • Problem: cache coherence is harder
  • use snooping and other protocols
  • Message-Passing Communication
  • simpler hardware, because keeping caches coherent is easier
  • communication is explicit, so it is simpler to understand
  • focuses programmer attention on communication
  • synchronization is naturally associated with communication
  • fewer errors due to incorrect synchronization
  • (both mechanisms are sketched in code below)
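
A minimal sketch contrasting the two mechanisms, assuming a POSIX environment (the producer/pipe structure is an illustrative choice, not an example from the slides). In the shared-memory half, communication is just a store and a load on a variable both threads can see, with synchronization supplied separately; in the message-passing half, the send and receive are explicit, and the receive itself provides the synchronization.

```c
/* Shared-memory vs. message-passing communication (POSIX assumed).   */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

/* --- shared-memory: threads communicate through a shared variable --- */
static int shared_value;                        /* visible to both threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);                  /* synchronization is explicit, */
    shared_value = 42;                          /* communication is implicit:   */
    pthread_mutex_unlock(&lock);                /* just a store to memory       */
    return NULL;
}

/* --- message-passing: processes exchange explicit messages ----------- */
static void message_passing_demo(void)
{
    int fd[2];
    char buf[16] = "";
    if (pipe(fd) != 0)
        return;
    if (fork() == 0) {                          /* child process: the sender */
        const char *msg = "hello";
        write(fd[1], msg, strlen(msg) + 1);     /* explicit send              */
        _exit(0);
    }
    read(fd[0], buf, sizeof buf);               /* explicit receive; doubles  */
    wait(NULL);                                 /* as the synchronization     */
    printf("message-passing: received \"%s\"\n", buf);
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, NULL);                      /* join before reading        */
    printf("shared-memory: read %d\n", shared_value);

    message_passing_demo();
    return 0;
}
```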

10
Multithreading
  • Threads: multiple processes that share code and data (and much of
    their address space)
  • recently, the term has come to include processes that may run on
    different processors and even have disjoint address spaces, as long
    as they share the code
  • Multithreading: exploit thread-level parallelism within a processor
  • fine-grain multithreading
  • switch between threads on each instruction!
  • coarse-grain multithreading
  • switch to a different thread only if the current thread has a costly
    stall
  • e.g., switch only on a level-2 cache miss

11
Multithreading
  • Fine-grain multithreading
  • switch between threads on each instruction!
  • multiple threads executed in an interleaved manner
  • interleaving is usually round-robin
  • CPU must be capable of switching threads on every cycle!
  • fast, frequent switches
  • main disadvantage
  • slows down the execution of individual threads
  • that is, latency is traded off for better throughput

12
Multithreading
  • Coarse-grain multithreading
  • switch only if the current thread has a costly stall
  • e.g., a level-2 cache miss
  • can accommodate slightly costlier switches
  • less likely to slow down an individual thread
  • a thread is switched out only when it has a costly stall
  • main disadvantage
  • limited ability to overcome throughput losses
  • shorter stalls are ignored, and there may be plenty of those
  • instructions are issued from only a single thread at a time
  • every switch involves emptying and restarting the instruction
    pipeline
  • (a toy model contrasting the two switching policies follows below)
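
The toy model below is an illustration only, not a cycle-accurate simulator; the two instruction strings and the 3-cycle miss penalty are invented values. Each thread is a string in which 'c' is a 1-cycle compute operation and 'M' is a load that misses and stalls its thread. The fine-grain scheduler rotates between ready threads every cycle, while the coarse-grain scheduler stays with one thread until it issues an instruction that misses.

```c
/* Toy comparison of fine-grain vs. coarse-grain thread switching.    */
#include <stdio.h>
#include <string.h>

#define MISS_PENALTY 3          /* cycles a thread stalls after an 'M' */
#define NTHREADS     2

static const char *prog[NTHREADS] = { "ccMcc", "cccMc" };

static void run(int fine_grain)
{
    int pc[NTHREADS]    = {0};  /* next instruction per thread         */
    int stall[NTHREADS] = {0};  /* cycles until thread is ready again  */
    int cur = 0, cycle = 0;

    printf("%6s-grain: ", fine_grain ? "fine" : "coarse");
    while (pc[0] < (int)strlen(prog[0]) || pc[1] < (int)strlen(prog[1])) {
        if (fine_grain)
            cur = (cur + 1) % NTHREADS;              /* round-robin switch    */
        if (stall[cur] > 0 || pc[cur] >= (int)strlen(prog[cur]))
            cur = (cur + 1) % NTHREADS;              /* try the other thread  */

        if (stall[cur] == 0 && pc[cur] < (int)strlen(prog[cur])) {
            printf("T%d ", cur);                     /* issue one instruction */
            if (prog[cur][pc[cur]] == 'M')
                stall[cur] = MISS_PENALTY;           /* costly stall begins   */
            pc[cur]++;
        } else {
            printf("-- ");                           /* nothing ready to issue */
        }
        for (int t = 0; t < NTHREADS; t++)           /* outstanding misses     */
            if (stall[t] > 0) stall[t]--;            /* make progress          */
        cycle++;
    }
    printf(" (%d cycles)\n", cycle);
}

int main(void)
{
    run(1);     /* fine-grain: switch every cycle               */
    run(0);     /* coarse-grain: switch only on a costly stall  */
    return 0;
}
```

The printed traces show the difference in switching behavior; the cycle counts depend entirely on the made-up instruction mix.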

13
Simultaneous Multithreading (SMT)
  • Example: the new Pentium with Hyper-Threading
  • Key Idea: Exploit ILP across multiple threads!
  • i.e., convert thread-level parallelism into more ILP
  • exploit the following features of modern processors
  • multiple functional units
  • modern processors typically have more functional units available
    than a single thread can utilize
  • register renaming and dynamic scheduling
  • multiple instructions from independent threads can co-exist and
    co-execute! (see the issue-slot sketch below)
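
The sketch below is a toy illustration of that key idea; the issue width of 4 and the per-cycle counts of ready instructions are invented numbers. Each cycle the issue slots are filled greedily from whichever threads have ready instructions, so thread-level parallelism is converted into instruction-level parallelism within a single cycle.

```c
/* Toy SMT issue model: fill the issue slots of one cycle from several
 * threads. Not a real core; all numbers are made up for illustration. */
#include <stdio.h>

#define ISSUE_WIDTH 4
#define NTHREADS    2
#define NCYCLES     5

int main(void)
{
    /* ready[t][c] = independent instructions thread t could issue in cycle c */
    int ready[NTHREADS][NCYCLES] = { {3, 1, 4, 2, 3},      /* thread 0 */
                                     {2, 3, 1, 3, 2} };    /* thread 1 */

    for (int cycle = 0; cycle < NCYCLES; cycle++) {
        int slots = ISSUE_WIDTH;
        int issued[NTHREADS] = {0};
        for (int t = 0; t < NTHREADS && slots > 0; t++) {  /* fill slots greedily */
            issued[t] = ready[t][cycle] < slots ? ready[t][cycle] : slots;
            slots -= issued[t];
        }
        printf("cycle %d: issued %d from T0 and %d from T1 (width %d)\n",
               cycle, issued[0], issued[1], ISSUE_WIDTH);
    }
    return 0;
}
```

A single-threaded superscalar would leave the unused slots empty; SMT fills them with instructions from the other thread.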

14
SMT Illustration (Fig. 6.44 of HP3)
  • A superscalar processor with no multithreading
  • A superscalar processor with coarse-grain
    multithreading
  • A superscalar processor with fine-grain
    multithreading
  • A superscalar processor with simultaneous
    multithreading (SMT)

15
SMT Design Challenges
  • Dealing with a large register file
  • needed to hold multiple contexts
  • Maintaining low overhead on the clock cycle
  • fast instruction issue: choosing what to issue
  • instruction commit: choosing what to commit
  • keeping cache conflicts within acceptable bounds