Title: Multithreaded Microprocessors and Multiprocessor SoCs
1 Multithreaded Microprocessors and Multiprocessor SoCs
- Sam Sandbote
- CSE 8383 Advanced Computer Architecture
- February 23, 2006
2 Topics
- Technical Drivers
- Simultaneous Multithreading
- Alternative Perspectives
- What is an SoC, Anyway?
3 Topics
- Technical Drivers
- Simultaneous Multithreading
- Alternative Perspectives
- What is an SoC, Anyway?
4 Why Consider MP on Chip?
- The industry does not fundamentally change unless it is forced up against a wall
- We would prefer to keep scaling as we always have, if we could
- Most programmers are not skilled in the art of parallel programming
- A confluence of 3 trends has forced the industry to go MP:
  - Architectural tricks to speed up single programs have limits
    - Locality of reference (cache size)
    - ILP (superscalar issue width, window size)
  - Building faster-clocked logic is getting exponentially harder
  - Process technology is still shrinking, and designs must use that area!
5 Topics
- Technical Drivers
- Simultaneous Multithreading
- Alternative Perspectives
- What is an SoC, Anyway?
6 Simultaneous MT
- Concept: multiplex the execution of 2 or more threads
- Each thread maintains its own architectural register state
  - PC, R0-Rn, SP, CC, etc. are all maintained per thread
- What happens when we mix two instruction streams?
  - There are no register dependencies between them, since each thread has its own registers
  - Memory dependencies exist only where the threads deliberately share data, and the program must synchronize those accesses itself
  - The out-of-order scheduler tracks register dependencies, and since the threads share none, it sees the two streams as independent
- Conceptually, available ILP is doubled
  - We have enough unrelated instructions from a second thread to fill in the pipeline bubbles left by the first
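A toy model of the bubble-filling idea, under stated assumptions: a hypothetical 4-wide issue stage and made-up counts of how many independent instructions each thread has ready per cycle. Each `ThreadContext` carries its own PC and registers, as the slide requires; the names `ThreadContext`, `issue_slots_used`, and `ISSUE_WIDTH` are illustrative, not taken from any real simulator.

```python
from dataclasses import dataclass, field

ISSUE_WIDTH = 4  # hypothetical issue width, not taken from the slides


@dataclass
class ThreadContext:
    """Per-thread architectural state: every hardware thread owns its own PC and registers."""
    tid: int
    pc: int = 0
    regs: list[int] = field(default_factory=lambda: [0] * 16)  # R0-R15; size is an assumption
    ready_per_cycle: list[int] = field(default_factory=list)   # independent instrs ready each cycle


def issue_slots_used(threads: list[ThreadContext], cycles: int) -> int:
    """Count issue slots filled when `threads` share one issue stage for `cycles` cycles.

    Thread 0 issues first each cycle; leftover slots (bubbles) go to the next thread.
    Simplification: ready instructions that do not issue are not carried to the next cycle.
    """
    used = 0
    for c in range(cycles):
        free = ISSUE_WIDTH
        for t in threads:
            ready = t.ready_per_cycle[c] if c < len(t.ready_per_cycle) else 0
            n = min(free, ready)
            t.pc += n  # only this thread's PC advances
            free -= n
            used += n
    return used


alone = issue_slots_used([ThreadContext(0, ready_per_cycle=[2, 0, 1, 3])], 4)
t0 = ThreadContext(0, ready_per_cycle=[2, 0, 1, 3])
t1 = ThreadContext(1, ready_per_cycle=[3, 2, 2, 1])
together = issue_slots_used([t0, t1], 4)
print(f"one thread: {alone}/16 slots, two threads: {together}/16 slots")
```

With these invented numbers, one thread alone fills 6 of the 16 issue slots over four cycles; adding the second thread raises that to 13, because its independent instructions slot into the first thread's bubbles.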
7 Multithreaded Usage Models
- Coarse: Application-Level Parallelism
  - Each context corresponds to a process under OS control
  - Make the OS believe two processors exist
  - Still hard to implement: Intel took 2 years to get the bugs out of Pentium 4 Hyper-Threading (HT)
- Fine: Native Multithreaded ISA
  - Constructs such as fork, join, and quit are machine instructions
  - What happens when we fork more threads than the hardware supports?
- Ultra-Fine: Well, basically the same as ILP
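The slides leave the fork-overflow question open. One possible answer, sketched here purely as an assumption: queue the excess software threads and bind each one to a hardware context when a `quit` frees one. `HardwareContexts`, `fork`, and `quit` are hypothetical names modeling the ISA constructs, not a real instruction set.

```python
from __future__ import annotations

from collections import deque


class HardwareContexts:
    """Toy model of a native multithreaded ISA with a fixed number of hardware contexts.

    fork/quit stand in for the slide's machine instructions (join is not modeled).
    Queueing excess forks is a hypothetical policy, not something the slides specify.
    """

    def __init__(self, n_contexts: int):
        self.free = list(range(n_contexts))  # idle hardware context IDs
        self.running: dict[int, str] = {}    # context ID -> software thread name
        self.waiting: deque[str] = deque()   # forked threads still waiting for a context

    def fork(self, thread: str) -> int | None:
        """Start `thread` on a free context and return its ID, or queue it and return None."""
        if self.free:
            ctx = self.free.pop(0)
            self.running[ctx] = thread
            return ctx
        self.waiting.append(thread)
        return None

    def quit(self, ctx: int) -> None:
        """Release `ctx`; the oldest waiting thread, if any, takes it over immediately."""
        del self.running[ctx]
        if self.waiting:
            self.running[ctx] = self.waiting.popleft()
        else:
            self.free.append(ctx)


hw = HardwareContexts(2)
print(hw.fork("A"), hw.fork("B"), hw.fork("C"))  # the third fork finds no free context
hw.quit(0)  # "A" finishes; the queued "C" inherits context 0
print(hw.running)
```

A real design might instead trap to the OS on overflow; the queue simply keeps the sketch self-contained.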
8 What Can Be Shared, at What Cost?

| Resource | Impact on Single-Thread Performance | Notes |
|---|---|---|
| Fetch bandwidth | High | |
| Instruction cache | Medium | Must support hit-under-miss |
| Branch predictor state | Medium | |
| Execution units | None | Small and cheap to replicate |
| Data cache | Very high | Must support hit-under-miss |
9 Athlon 64 Die Photo
10 Proliferation of Context Arbitration
- Sharing implies:
  - The programmer declares a QoS for a thread upon its startup
  - This QoS must be distributed to each shared resource's arbiter
- Arbitration must exist for:
  - Fetch bandwidth
  - Dispatch and/or issue
  - Cache and/or branch predictor utilization
    - This is a very good area for research
  - External access bandwidth/latency
- Additional pipeline cycles are introduced for arbitration
  - This is BAD!
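The slide does not name an arbitration policy. As an illustration only, the sketch below swaps in smooth weighted round-robin to divide fetch bandwidth according to each thread's declared QoS weight; `fetch_schedule` and the weights are hypothetical.

```python
def fetch_schedule(qos_weights: dict[str, int], cycles: int) -> list[str]:
    """Grant the fetch port to one thread per cycle, in proportion to its QoS weight.

    Smooth weighted round-robin: each cycle every thread earns credit equal to its
    weight; the thread with the most credit wins and pays back the sum of all weights.
    """
    total = sum(qos_weights.values())
    credit = {t: 0 for t in qos_weights}
    grants = []
    for _ in range(cycles):
        for t, w in qos_weights.items():
            credit[t] += w
        winner = max(credit, key=credit.get)  # ties go to the thread listed first
        credit[winner] -= total
        grants.append(winner)
    return grants


# Hypothetical QoS: thread "A" was declared 3x as important as thread "B" at startup.
print(fetch_schedule({"A": 3, "B": 1}, 8))
```

Thread "A" receives 6 of the 8 fetch cycles and "B" receives 2, with "B" interleaved rather than starved. The arbiter itself is extra state and comparison logic on the fetch path, which is the kind of added pipeline cost the last bullet warns about.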
11 Why Simultaneous MT, Then?
- Most efficient in terms of aggregate IPC per processor
- Consider 4 threads, each with a typical instruction mix:
  - 20% loads, 10% stores
  - 20% branches
  - 50% in-CPU instructions (ADD, MOV, etc.)
- Using 4 superscalar speculative processors:
  - 4 processors, each with IPC around 0.8
- Using a 4-way multithreaded processor:
  - 1 (larger) processor with IPC of 1.4 or better
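The slide's numbers are easy to misread: four processors deliver more total IPC than the single SMT processor, so the efficiency claim has to be per processor. The short calculation below works through the figures given; the per-processor normalization and the break-even remark that follows are this sketch's interpretation, not the author's.

```python
# Figures from the slide; normalizing per processor is this sketch's reading of "efficient".
mix = {"loads": 0.20, "stores": 0.10, "branches": 0.20, "in_cpu": 0.50}
assert abs(sum(mix.values()) - 1.0) < 1e-9  # the mix accounts for every instruction

cmp_processors, cmp_ipc_each = 4, 0.8  # four superscalar speculative processors
smt_processors, smt_ipc = 1, 1.4       # one larger 4-way multithreaded processor

cmp_total = cmp_processors * cmp_ipc_each     # total throughput of the 4-processor option
smt_total = smt_processors * smt_ipc          # total throughput of the SMT option
per_processor_gain = smt_ipc / cmp_ipc_each   # IPC per processor, SMT vs. one superscalar

print(f"4 processors: total IPC {cmp_total:.1f}, {cmp_ipc_each:.1f} per processor")
print(f"1 SMT processor: total IPC {smt_total:.1f}, {per_processor_gain:.2f}x per processor")
```

If IPC per unit of area is the measure, the SMT processor comes out ahead only when its area A_smt is less than 1.75 times the area A_ss of one superscalar processor, since 1.4 / A_smt > 3.2 / (4 * A_ss) exactly when A_smt < 1.75 * A_ss.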
12 Topics
- Technical Drivers
- Simultaneous Multithreading
- Alternative Perspectives
- What is an SoC, Anyway?
13 Alternative: The Fast Context Switch
- Argument: arbitration does not add value and detracts from performance
- Only fill in the giant bubbles: on a cache miss or when a process is swapped out
- Support register state for two or more processes
  - The OS may see 2 processes running at the same time
- No arbitration at all: only one thread executes at a time
- These were the first commercial attempts at multithreading, because verification is much easier
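A minimal timeline sketch of the fast-context-switch approach, assuming single-cycle instructions, a fixed miss penalty, and a zero-cycle switch; `switch_on_miss` and the `"op"`/`"miss"` trace format are hypothetical. Only one thread runs at a time and there is no per-cycle arbitration: the core leaves the current thread only when it misses.

```python
def switch_on_miss(traces: list[list[str]], miss_penalty: int) -> list[str]:
    """Cycle-by-cycle timeline for coarse-grained ("fast context switch") multithreading.

    Each trace is a list of "op" (a 1-cycle instruction) and "miss" (a load that misses).
    The core keeps running the current thread until it misses, then switches to the
    next thread that is ready. Switching is assumed to cost zero cycles.
    """
    n = len(traces)
    pos = [0] * n       # per-thread position in its trace (its PC)
    ready_at = [0] * n  # cycle at which each thread's outstanding miss is serviced
    timeline = []
    cycle, cur = 0, 0
    while any(pos[i] < len(traces[i]) for i in range(n)):
        # Stay on the current thread if it can run; otherwise try the others in order.
        for k in range(n):
            cand = (cur + k) % n
            if pos[cand] < len(traces[cand]) and ready_at[cand] <= cycle:
                cur = cand
                break
        else:
            timeline.append("stall")  # every unfinished thread is waiting on memory
            cycle += 1
            continue
        op = traces[cur][pos[cur]]
        pos[cur] += 1
        timeline.append(f"T{cur}:{op}")
        if op == "miss":
            ready_at[cur] = cycle + 1 + miss_penalty  # thread sleeps for the penalty
        cycle += 1
    return timeline


print(switch_on_miss([["op", "miss", "op"]], 3))
print(switch_on_miss([["op", "miss", "op"], ["op", "op"]], 3))
```

With a 3-cycle penalty, the single thread stalls for three cycles after its miss; with a second thread loaded, two of those cycles do useful work from the other thread and only one stall remains.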
14 Alternative: Multiprocessor-on-Chip
- Argument: the benefit of resource sharing does not outweigh the cost of performance degradation on a single thread
- Intel's CTO plans to step-and-repeat a smaller, simpler core, such as the Pentium M core used in Centrino
  - Each processor will have independent L1 D and I caches
  - They may or may not share a very large central L2
  - The DRAM controller has long since been integrated
- For the foreseeable future, the model will be tight SMP
  - Processors connect to the DRAM controller via their old front-side bus, which is morphed into a cache-coherent switch fabric
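The slide does not say which protocol the cache-coherent fabric would run. For illustration only, here is the textbook MSI protocol for one cache line in one core's private cache; production designs typically use richer variants such as MESI or MOESI, and the event names follow common textbook usage rather than any specific vendor's bus.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


def next_state(state: State, event: str) -> State:
    """MSI transition for one cache line in one core's private cache.

    "PrRd"/"PrWr" are this core's own reads/writes; "BusRd"/"BusRdX" are another
    core's read / read-for-ownership snooped on the shared fabric.
    """
    if event == "PrRd":
        return State.S if state is State.I else state  # a miss fetches a shared copy
    if event == "PrWr":
        return State.M  # gain ownership; other copies get invalidated via BusRdX
    if event == "BusRd":
        return State.S if state is State.M else state  # supply dirty data, downgrade
    if event == "BusRdX":
        return State.I  # another core is about to write: drop our copy
    raise ValueError(f"unknown event: {event}")


# Both cores read a line, then core 1 writes it.
c0 = c1 = State.I
c0 = next_state(c0, "PrRd")    # core 0 read miss issues BusRd
c1 = next_state(c1, "PrRd")    # core 1 read miss issues BusRd...
c0 = next_state(c0, "BusRd")   # ...which core 0 snoops (stays Shared)
c1 = next_state(c1, "PrWr")    # core 1 write issues BusRdX...
c0 = next_state(c0, "BusRdX")  # ...which invalidates core 0's copy
print(c0.name, c1.name)
```

At the end, core 1 holds the line Modified and core 0's copy is Invalid, so a later read by core 0 must go back over the fabric.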
15 Mainstream CPU of 2008/2009 (45 nm)
16 Alternative: Heterogeneous MP
- Argument: most systems can benefit from having several different types of processors
- TI wireless OMAP chips are necessarily heterogeneous, because their multiple tasks differ greatly in:
  - QoS requirements
  - MIPS requirements
  - Memory bandwidth and access patterns
  - Word width (some custom hardware for deframing)
- Plus some analog sugar sprinkles, too
17 Topics
- Technical Drivers
- Simultaneous Multithreading
- Alternative Perspectives
- What is an SoC, Anyway?
18 BYOD: Bring Your Own Definition
- Embedded memory?
- Embedded processor?
- Just a big ASIC?
- Just another buzzword?
  - SSI
  - MSI
  - LSI
  - VLSI
  - ULSI
  - We're tired here. Let's just call them SoCs.