Title: Future of the Microprocessors
1 Future of the Microprocessors
- Billion-Transistor Architectures
- IEEE Computer, September 1997
2 Billion-Transistor Architectures
- Future Trends
- Hardware trends and physical limits
- In its 1994 road map, the Semiconductor Industry Association predicted that by 2010 chips would have 800 million transistors, thousands of pins, a 1000-bit bus, clock speeds over 2 GHz, and 180 W of power dissipation
- On-chip wires are becoming much slower relative to logic gates
- It becomes impossible to maintain one global clock over the entire chip
- Sending a signal across a billion transistors may take as many as 20 cycles
- System software
- Will HW alone continue to extract parallelism?
- Compatibility with legacy software
3 Future Trends (continued)
- Future workloads
- Architectural design is driven by the dominant anticipated workload
- Multimedia workloads
- Design, verification, and testing
- Complex: hundreds of engineers
- Validation and testing account for 40 to 50% of an Intel chip's design cost and 6% of its transistors
- Economies of scale
- Fabrication plants cost $2 billion (a factor of ten more than a decade ago)
- Need larger markets: mass marketing of computer chips
4 Future Architectures
- Advanced superscalar processors
- Simultaneous multithreaded processors
- Vector IRAM processors
- Raw (configurable) processors
- Superspeculative processors
- Trace (multiscalar) processors
- Chip multiprocessors
- Trends
- Wire delays become dominant, forcing HW to be more distributed
- System software (compilers) becomes better at exploiting parallelism
- Workloads come to contain more exploitable parallelism
- Design and validation costs become more limiting
5 Advanced Superscalar
- One Billion Transistors, One Uniprocessor, One Chip
- U of Michigan
- Billion-transistor processors will be much as they are today
- Bigger, faster, and wider
- Out-of-order fetching, multi-hybrid branch predictors, and trace caches
- Large, out-of-order-issue instruction window (2,000 instructions), clustered banks of functional units
- The current uniprocessor model can provide sufficient performance and use a billion transistors effectively without changing the programming model or discarding software compatibility.
6 One Billion Transistors, One Uniprocessor, One Chip
- 60 M for execution core
- 240 M for trace cache
- 48 M for branch predictor
- 32 M for data cache
- 640 M for L2 cache
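The budget above can be checked with a little arithmetic. The sketch below simply adds up the five figures from the slide and reports each component's share; the dictionary and function names are illustrative, not taken from the original paper.

```python
# Transistor budget as listed on the slide, in millions of transistors.
# Names (budget_m, share) are illustrative assumptions.
budget_m = {
    "execution core": 60,
    "trace cache": 240,
    "branch predictor": 48,
    "data cache": 32,
    "L2 cache": 640,
}

# Sum of all listed components, in millions.
total_m = sum(budget_m.values())


def share(part: str) -> float:
    """Fraction of the listed budget used by one component."""
    return budget_m[part] / total_m


if __name__ == "__main__":
    print(f"total: {total_m} M transistors")
    for name, count in budget_m.items():
        print(f"{name:>16}: {count:4d} M ({share(name):.1%})")
```

The listed figures add up to 1,020 M, that is, roughly one billion transistors, with the execution core taking under 6% and the caches taking most of the rest.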
7 Superspeculative
- Superspeculative Microarchitecture for Beyond AD 2000
- CMU
- Billion-transistor uniprocessor
- Massive speculation at all levels to improve performance
- Trace caches and advanced branch prediction
- Without this much speculation, future processors will be limited by true dependences
- Their investigations discovered large speedups on code that has traditionally not been amenable to finding ILP
8 Superspeculative Microarchitecture for Beyond AD 2000
9 Simultaneous Multithreading
- Simultaneous Multithreading (SMT) Processor
- Wide-issue superscalar multithreaded processor
- Multiple issues per cycle
- HW support for multiple threads (registers, PC, and so on)
- Exploit all types of parallelism
- Within a thread
- Among threads
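A toy model may help show why drawing on parallelism both within a thread and among threads fills more issue slots. The sketch below is an assumption-laden illustration, not the SMT design itself: the issue width of 8, the per-cycle counts of ready instructions, and the function names are all made up, and the model ignores dependences, fetch policy, and resource conflicts.

```python
from typing import List


def issued_single_thread(ready: List[int], width: int) -> int:
    """Issue slots filled over all cycles when only one thread can issue."""
    return sum(min(r, width) for r in ready)


def issued_smt(ready_per_thread: List[List[int]], width: int) -> int:
    """Issue slots filled when any thread may use any free slot in a cycle."""
    total = 0
    for cycle in zip(*ready_per_thread):
        # Ready instructions from all threads compete for the same slots.
        total += min(sum(cycle), width)
    return total


WIDTH = 8  # assumed issue width (hypothetical)

# Hypothetical counts of ready instructions per cycle for two threads.
thread_a = [2, 1, 6, 0, 3]
thread_b = [4, 5, 1, 3, 2]

single = issued_single_thread(thread_a, WIDTH)
smt = issued_smt([thread_a, thread_b], WIDTH)

if __name__ == "__main__":
    slots = WIDTH * len(thread_a)
    print(f"single thread: {single}/{slots} slots used")
    print(f"two threads (SMT): {smt}/{slots} slots used")
```

With these made-up numbers a single thread fills 12 of 40 slots, while two threads sharing the same slots fill 27 of 40.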
10 Trace
- Trace Processors: Moving to Fourth-Generation Microarchitectures
- U of Wisconsin-Madison
- Multiple, distributed on-chip processor cores
- Each of the cores simultaneously executes a different trace
- All but one core execute their traces speculatively, having used branch prediction to select traces that follow the one executing
- Does not require explicit compiler support
- Relies heavily on replication, hierarchy, and prediction
12 Vector IRAM
- Vector IRAM
- U of California, Berkeley
- Intelligent RAM (IRAM)
- Increases the on-chip memory capacity by using DRAM instead of SRAM
- The resultant on-chip memory capacity, together with high memory bandwidth, makes vector processing cost-effective
14 A Single-Chip Multiprocessor
- Single-Chip Multiprocessor
- Stanford University
- Multiple (four to 16) simple, fast processors on one chip
- Each processor is tightly coupled to a small, fast, level-one cache
- All processors share a larger level-two cache
- Runs a parallel job or independent tasks
- Simpler design, faster validation, cleaner functional partitioning, and higher theoretical peak performance
- Compilers will have to make code explicitly parallel
- Old ISAs will be incompatible with this architecture
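To illustrate what "explicitly parallel" code means for a chip with a handful of cores, the sketch below splits one job into four independent partial sums and combines them. It is only an illustration under stated assumptions: the helper names `split` and `parallel_sum` and the worker count of four are hypothetical, and Python threads show the partitioning pattern rather than real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Cut data into `parts` contiguous chunks of near-equal size."""
    step, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element.
        end = start + step + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data: Sequence[int], workers: int = 4) -> int:
    """Sum data by giving each worker its own chunk, then combining."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, split(data, workers)))
    return sum(partials)


if __name__ == "__main__":
    values = list(range(1000))
    print(parallel_sum(values))  # same result as sum(values)
```

The point of the pattern is that the partitioning and the final combine step are written out in the program (or generated by the compiler) rather than discovered by the hardware at run time.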
16 Raw Processor
- Baring It All to Software: Raw Machines
- MIT
- The most radical architecture
- Highly parallel architecture with hundreds of very simple processors, each coupled to a small portion of the on-chip memory
- Each processor, or tile, has a small bank of configurable logic, allowing synthesis of complex operations directly in configurable HW
- Compiler efficacy
- Does not use a traditional instruction set architecture
- All units are told explicitly what to do by the compiler
- The compiler even schedules most of the intertile communication