1
Multithreaded Microprocessors and Multiprocessor SoCs
  • Sam Sandbote
  • CSE 8383 Advanced Computer Architecture
  • February 23, 2006

2
Topics
  1. Technical Drivers
  2. Simultaneous Multithreading
  3. Alternative Perspectives
  4. What is an SoC, Anyway?

3
Topics
  1. Technical Drivers
  2. Simultaneous Multithreading
  3. Alternative Perspectives
  4. What is an SoC, Anyway?

4
Why Consider MP on Chip?
  • The industry does not fundamentally change unless
    it is forced against a wall
  • We would prefer to scale as we always have, if we
    could
  • Most programmers are not skilled in the art of
    parallel programming
  • Confluence of 3 trends has forced the industry to
    go MP:
  • Architectural tricks to speed up single programs
    have limits:
  • Locality of reference (cache size)
  • ILP (superscalar issue width, window size)
  • Building faster clocked logic is getting
    exponentially harder
  • Process tech still shrinking: designs must use
    that area!

5
Topics
  1. Technical Drivers
  2. Simultaneous Multithreading
  3. Alternative Perspectives
  4. What is an SoC, Anyway?

6
Simultaneous MT
  • Concept: multiplex the execution of 2 or more
    threads
  • Each maintains its own architectural register
    state
  • PC, R0-Rn, SP, CC, etc.: these are maintained
    per-thread
  • What happens when we mix two instruction streams?
  • They are guaranteed not to have any data
    dependencies between them
  • Even for memory addresses!
  • Only register dependencies are considered by
    out-of-order machines
  • Conceptually, available ILP is doubled
  • We have enough unrelated instructions from a
    second thread to fill in pipeline bubbles left by
    the first
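
The bubble-filling idea above can be sketched with a toy issue model. This is only an illustration under assumptions that are not in the slides: the function name `cycles_to_finish`, the stall pattern, and the rule that each thread issues at most one instruction per cycle are all invented for the example.

```python
def cycles_to_finish(threads, width=1):
    """Toy issue model (hypothetical, not taken from the slides).

    threads: list of instruction streams; each entry is the number of
             bubble cycles that thread must wait after issuing that
             instruction (0 = no stall).
    width:   instructions that may issue per cycle, across all threads.
    """
    next_instr = [0] * len(threads)   # per-thread program position
    ready_at = [0] * len(threads)     # first cycle each thread may issue again
    remaining = sum(len(t) for t in threads)
    cycle = 0
    while remaining:
        slots = width
        for tid, stream in enumerate(threads):
            if slots == 0:
                break
            if next_instr[tid] < len(stream) and ready_at[tid] <= cycle:
                stall = stream[next_instr[tid]]
                next_instr[tid] += 1
                ready_at[tid] = cycle + 1 + stall  # thread sits out its bubbles
                remaining -= 1
                slots -= 1
        cycle += 1
    return cycle


stream = [0, 2, 0, 2]                         # invented stall pattern
alone = cycles_to_finish([stream])            # one thread on its own
shared = cycles_to_finish([stream, stream])   # two threads, one issue slot
print(alone, shared)
```

In this toy model one thread needs 6 cycles for its 4 instructions, while two such threads finish all 8 instructions in 8 cycles, because the second thread issues during the first thread's bubbles. The register state kept per thread in the model (`next_instr`, `ready_at`) stands in for the per-thread PC and registers on the slide.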

7
Multithreaded Usage Models
  • Coarse: Application-Level Parallelism
  • Each context corresponds to a process under OS
    control
  • Make the OS believe two processors exist
  • Still hard to implement: Intel took 2 years to
    get the bugs out of Pentium 4 HT
  • Fine: Native Multithreaded ISA
  • Constructs: fork, join, quit are machine
    instructions
  • What happens when we fork more threads than
    hardware supports?
  • Ultra-Fine: Well, basically the same as ILP

8
What Can Be Shared, at What Cost?
Resource                Impact on Single-Thread Performance  Notes
Fetch Bandwidth         High
Instruction Cache       Medium                               Must support hit-under-miss
Branch Predictor State  Medium
Exec Units              None                                 Small and cheap to replicate
Data Cache              Very High                            Must support hit-under-miss
9
Athlon64 Die Photo
10
Proliferation of Context Arbitration
  • Sharing implies:
  • Programmer declares a QoS for a thread upon its
    startup
  • This QoS must be distributed
  • Arbitration must exist for:
  • Fetch bandwidth
  • Dispatch and/or Issue
  • Cache and/or Branch Predictor Utilization
  • This is a very good area for research
  • External Access BW/latency
  • Additional Pipeline Cycles for Arbitration
    Introduced
  • This is BAD!
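
As a rough illustration of QoS-driven arbitration for a shared resource such as fetch bandwidth, here is a minimal sketch of a weighted round-robin arbiter. The slides do not name a policy, so the scheme, the function name `arbitrate`, and the integer weights are all assumptions made for the example.

```python
def arbitrate(qos, cycles):
    """Grant one shared slot per cycle in proportion to each thread's QoS.

    qos:    dict mapping thread name -> positive integer weight
            (a hypothetical encoding of the QoS declared at thread startup).
    cycles: number of grant decisions to make.
    Returns the list of winning thread names, one per cycle.
    """
    total = sum(qos.values())
    credit = {thread: 0 for thread in qos}
    grants = []
    for _ in range(cycles):
        for thread, weight in qos.items():
            credit[thread] += weight          # every thread earns its weight
        winner = max(qos, key=lambda t: credit[t])
        credit[winner] -= total               # the winner pays for the slot
        grants.append(winner)
    return grants


grants = arbitrate({"A": 3, "B": 1}, cycles=8)
print(grants, grants.count("A"), grants.count("B"))
```

With weights 3 and 1, thread A receives 6 of the 8 slots and thread B receives 2. In hardware, a decision like this has to be made in every stage that is shared, which is where the extra pipeline cycles mentioned above come from.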

11
Why Simultaneous MT, Then?
  • Most efficient in terms of aggregate IPC
  • Consider 4 threads each with a typical
    instruction mix:
  • 20% loads, 10% stores
  • 20% branching
  • 50% in-CPU instructions: ADD, MOV, etc.
  • Using 4 superscalar speculative processors:
  • 4 processors, each IPC around 0.8
  • Using a 4-way multithreaded processor:
  • 1 (larger) processor with IPC 1.4 or better
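
The arithmetic behind this comparison can be worked through in a few lines. This is a sketch using only the IPC figures quoted above. The slide gives no die-area numbers, so the "break-even area" below is a derived quantity under the assumption that efficiency means IPC per unit of silicon, not a figure from the presentation.

```python
from fractions import Fraction

# Instruction mix from the slide, read as percentages.
mix = {"loads": 20, "stores": 10, "branches": 20, "in_cpu": 50}
assert sum(mix.values()) == 100

ipc_simple = Fraction(8, 10)    # each of the 4 superscalar processors
ipc_smt = Fraction(14, 10)      # the single 4-way multithreaded processor

aggregate_four = 4 * ipc_simple             # total IPC of the 4-processor design
per_processor_gain = ipc_smt / ipc_simple   # SMT processor vs one simple processor

# Assumption: efficiency is IPC per unit area. The multithreaded processor
# then wins only if it is smaller than this many simple-processor areas.
break_even_area = ipc_smt / ipc_simple

print(float(aggregate_four), float(per_processor_gain), float(break_even_area))
```

Read this way, the four separate processors deliver more total IPC (3.2), while the multithreaded processor delivers 1.75 times the IPC of one simple processor, so it is the more efficient choice as long as the "larger" processor costs less than 1.75 times the area of a simple one.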

12
Topics
  1. Technical Drivers
  2. Simultaneous Multithreading
  3. Alternative Perspectives
  4. What is an SoC, Anyway?

13
Alternative: The Fast Context Switch
  • Argument: Arbitration does not add value and
    detracts from performance
  • Only fill in the giant bubbles when a cache
    misses or when a process is swapped out.
  • Support register state for two or more processes
  • OS may see that 2 processes may be running at the
    same time
  • No arbitration at all - only executing one at a
    time
  • These are the first commercial attempts at
    multithreading, because verification is much
    easier
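
The switch-on-miss behaviour described above can be sketched as a small simulation. The trace encoding ('c' for a compute cycle, 'm' for a load that misses), the miss penalty, and the function name `switch_on_miss` are assumptions made for illustration. The model runs exactly one context at a time and switches only when the running context misses or finishes.

```python
def switch_on_miss(traces, miss_penalty):
    """Cycles needed to run all traces with one context executing at a time.

    traces:       list of strings, one per hardware context;
                  'c' = one compute cycle, 'm' = a load that misses.
    miss_penalty: bubble cycles a context waits after a miss (assumed value).
    """
    n = len(traces)
    pos = [0] * n        # next operation per context
    ready_at = [0] * n   # cycle at which each context can run again
    current = 0
    cycle = 0
    while any(pos[i] < len(traces[i]) for i in range(n)):
        runnable = [i for i in range(n)
                    if pos[i] < len(traces[i]) and ready_at[i] <= cycle]
        if not runnable:
            cycle += 1               # giant bubble: every context is waiting
            continue
        if current not in runnable:
            current = runnable[0]    # fast context switch, no arbitration
        op = traces[current][pos[current]]
        pos[current] += 1
        if op == "m":
            ready_at[current] = cycle + 1 + miss_penalty
        cycle += 1
    return cycle


trace = "ccmccmcc"                               # invented trace
one_context = switch_on_miss([trace], 4)
two_contexts = switch_on_miss([trace, trace], 4)
print(one_context, two_contexts)
```

With a miss penalty of 4, one context takes 16 cycles for its 8 operations, and two contexts finish 16 operations in 19 cycles, since most of each miss is hidden behind the other context's work.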

14
Alternative: Multiprocessor-on-Chip
  • Argument: Benefit of resource sharing does not
    outweigh cost of performance degradation on a
    single thread
  • CTO of Intel plans to step-and-repeat a smaller,
    simpler core such as the Pentium M used in Centrino
  • Each processor will have independent L1 D and I
  • May or may not share very large central L2
  • DRAM controller has long since been integrated
  • For the foreseeable future the model will be
    tight SMP
  • Processors connected to DRAM controller via their
    old front side bus, which is morphed into a
    cache-coherent switch fabric.

15
Mainstream CPU of 2008/2009 (45nm)
16
Alternative: Heterogeneous MP
  • Argument: Most systems can benefit from having
    several different types of processors.
  • TI wireless OMAP chips are necessarily
    heterogeneous. Multiple tasks are very
    different:
  • QoS requirements
  • MIPS requirements
  • Memory bandwidth and access patterns
  • Word width (some custom hardware for deframing)
  • and some analog sugar sprinkles, too

17
Topics
  1. Technical Drivers
  2. Simultaneous Multithreading
  3. Alternative Perspectives
  4. What is an SoC, Anyway?

18
BYOD: Bring Your Own Definition
  • Embedded memory?
  • Embedded processor?
  • Just a big ASIC?
  • Just another buzz-word
  • SSI
  • MSI
  • LSI
  • VLSI
  • ULSI: we're tired here. Let's just call them
    SoCs.