Future of the Microprocessors - PowerPoint PPT Presentation


Provided by: mae74



1
Future of the Microprocessors

Billion-Transistor Architectures
IEEE Computer, September 1997

2
Billion-Transistor Architectures

Future Trends
Hardware trends and physical limits
In its 1994 road map, the Semiconductor Industry Association predicted that by 2010 chips would have 800 million transistors, thousands of pins, a 1,000-bit bus, clock speeds over 2 GHz, and 180 W power dissipation
On-chip wires are becoming much slower relative to logic gates
It will be impossible to maintain one global clock over the entire chip
Sending a signal across a billion-transistor chip may take as many as 20 cycles
System software
Will hardware alone continue to extract parallelism?
Compatibility with legacy software
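
The clock and wire-delay figures above can be put into absolute terms with a little arithmetic. The sketch below is only an illustration: it assumes the 2 GHz clock and the 20-cycle crossing apply to the same chip, which the slide does not state, and the function names are hypothetical.

```python
def cycle_time_ns(clock_ghz):
    """Length of one clock cycle in nanoseconds for a clock given in GHz."""
    return 1.0 / clock_ghz


def cross_chip_latency_ns(clock_ghz, cycles):
    """Time for a signal that needs `cycles` clock cycles to cross the chip."""
    return cycles * cycle_time_ns(clock_ghz)


if __name__ == "__main__":
    clock_ghz = 2.0   # "clock speeds over 2 GHz" (lower bound from the slide)
    cycles = 20       # "as many as 20 cycles" (upper bound from the slide)
    print(f"cycle time: {cycle_time_ns(clock_ghz)} ns")
    print(f"cross-chip latency: {cross_chip_latency_ns(clock_ghz, cycles)} ns")
```

Under these assumptions a cycle lasts half a nanosecond and a cross-chip signal takes about 10 ns, which is why a single global clock becomes impractical.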

3
Future workloads
Architectural design is driven by the dominant anticipated workload
Multimedia workloads
Design, verification, and testing
Design is complex, requiring hundreds of engineers
Validation and testing account for 40 to 50% of an Intel chip's design cost and 6% of its transistors
Economies of scale
Fabrication plants cost about $2 billion (a factor of ten more than a decade ago)
Chip makers need larger markets: mass marketing of computer chips

4
Future Architectures
Advanced superscalar processors
Simultaneous multithreaded processors
Vector IRAM processors
Raw(configurable) processors
Superspeculative processors
Trace(multiscalar) processors
Chip multiprocessors
Trends
Wire delays become dominant, forcing HW to be more distributed
System software (compilers) becomes better at exploiting parallelism
Workloads come to contain more exploitable parallelism
Design and validation costs become more limiting
5
Advanced Superscalar

One Billion Transistors, One Uniprocessor, One
Chip
U of Michigan
Billion-transistor processors will be much as they are today, only bigger, faster, and wider
Out-of-order fetching, multi-hybrid branch predictors, and trace caches
Large out-of-order-issue instruction window (2,000 instructions) and clustered banks of functional units
The current uniprocessor model can provide sufficient performance and use a billion transistors effectively without changing the programming model or discarding software compatibility.
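
To make the role of branch prediction concrete, here is a minimal sketch of a table of two-bit saturating counters. This is a classic textbook predictor, swapped in for illustration. It is not the multi-hybrid predictor named on the slide, and the class name, table size, and example trace are all assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address.

    Counter values 0-1 predict not taken, 2-3 predict taken.
    """

    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [1] * entries  # start at "weakly not taken"

    def predict(self, pc):
        return self.counters[pc % self.entries] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of (pc, taken) pairs predicted correctly, updating as we go."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


if __name__ == "__main__":
    # Hypothetical loop branch: taken nine times, then not taken, repeated.
    trace = [(0x40, i % 10 != 9) for i in range(100)]
    print(f"accuracy: {accuracy(TwoBitPredictor(), trace):.2f}")
```

A wider, deeper machine depends on predictors far more accurate than this one, since every misprediction discards the work of a large instruction window.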

6
One Billion Transistors, One Uniprocessor, One
Chip
60 M for execution core
240 M for trace cache
48 M for branch predictor
32 M for data cache
640 M for L2 cache
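
The budget above can be totaled to see where the transistors go. The sketch below uses only the figures from the slide; the dictionary and function names are hypothetical.

```python
# Transistor budget from the slide, in millions of transistors.
BUDGET_M = {
    "execution core": 60,
    "trace cache": 240,
    "branch predictor": 48,
    "data cache": 32,
    "L2 cache": 640,
}


def budget_shares(budget):
    """Return (total, {component: fraction of total})."""
    total = sum(budget.values())
    return total, {name: count / total for name, count in budget.items()}


if __name__ == "__main__":
    total, shares = budget_shares(BUDGET_M)
    print(f"total: {total} M transistors")
    for name, share in shares.items():
        print(f"{name:>16}: {BUDGET_M[name]:4d} M ({share:.1%})")
```

The components sum to 1,020 M transistors, and the execution core accounts for under 6% of them. Most of the budget goes to the caches and the predictor.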
7
Superspeculative

Superspeculative Microarchitecture for Beyond AD
2000
CMU
Billion-transistor uniprocessor
Massive speculation at all levels to improve performance
Trace caches and advanced branch prediction
Without this much speculation, future processors will be limited by true dependences
Their investigations found large speedups on code that has traditionally not been amenable to finding ILP

8
Superspeculative Microarchitecture for Beyond AD
2000
9
Simultaneous Multithreading

Simultaneous Multithreading (SMT) Processor
Wide-issue superscalar combined with a multithreaded processor
Multiple instructions issued per cycle
Hardware state for multiple threads (registers, PC, and so on)
Exploits all types of parallelism
Within a thread
Among threads
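
A toy model can show why sharing issue slots among threads helps. The sketch below is a deliberate simplification, not a description of any real SMT design. It assumes each thread offers a fixed number of ready instructions per cycle, that unissued instructions do not carry over, and that threads are served in a fixed priority order. All names and numbers are hypothetical.

```python
def issue_single_thread(ready_per_cycle, width):
    """Slots used each cycle when only one thread may issue."""
    return [min(width, ready) for ready in ready_per_cycle]


def issue_smt(ready_per_thread, width):
    """Slots used each cycle when all threads share the issue slots.

    ready_per_thread holds one list per thread, one entry per cycle.
    Threads are served in list order until the slots run out.
    """
    used = []
    for ready_now in zip(*ready_per_thread):
        free = width
        for ready in ready_now:
            free -= min(free, ready)
        used.append(width - free)
    return used


def utilization(used, width):
    """Fraction of all issue slots that were filled."""
    return sum(used) / (width * len(used))


if __name__ == "__main__":
    width = 8
    a = [2, 1, 0, 3]  # ready instructions per cycle, thread A
    b = [1, 4, 2, 0]  # thread B
    c = [3, 0, 5, 2]  # thread C
    print("single thread:", utilization(issue_single_thread(a, width), width))
    print("SMT, 3 threads:", utilization(issue_smt([a, b, c], width), width))
```

In this model a cycle in which one thread has nothing ready can still be filled by the others, which is the sense in which SMT exploits parallelism both within and among threads.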

10
Trace

Trace Processors: Moving to Fourth-Generation Microarchitectures
U of Wisconsin-Madison
Multiple, distributed on-chip processor cores
Each core simultaneously executes a different trace
All but one core execute their traces speculatively, having used branch prediction to select the traces that follow the one executing
Does not require explicit compiler support
Relies heavily on replication, hierarchy, and prediction

11
(No Transcript)
12
Vector IRAM

Vector IRAM
U of California, Berkeley
Intelligent RAM (IRAM)
Increases on-chip memory capacity by using DRAM instead of SRAM
The resulting on-chip memory capacity and high memory bandwidth make vector processing cost-effective

13
(No Transcript)
14
A Single-Chip Multiprocessor

Single-Chip Multiprocessor
Stanford University
Multiple (four to 16) simple, fast processors on one chip
Each processor is tightly coupled to a small, fast level-one cache
All processors share a larger level-two cache
The processors can run a parallel job or independent tasks
Simpler design, faster validation, cleaner functional partitioning, and higher theoretical peak performance
Compilers will have to make code explicitly parallel
Old ISAs will be incompatible with this architecture
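
What "explicitly parallel" code looks like can be sketched by hand: the work is split into one chunk per processor and the partial results are combined at the end. The example below uses four worker threads from Python's standard library as stand-ins for four on-chip processors. The function names are hypothetical, and in CPython the threads illustrate the program structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, workers=4):
    """Sum `data` by giving each worker its own chunk, then combining."""
    data = list(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, split(data, workers)))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum(range(1000), workers=4))
```

The partitioning and the final combine step are exactly the parts a sequential program never states, and that a compiler for a chip multiprocessor would have to produce.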

15
(No Transcript)
16
Raw Processor

Baring It All to Software: Raw Machines
MIT
The most radical architecture
Highly parallel architecture with hundreds of very simple processors, each coupled to a small portion of the on-chip memory
Each processor, or tile, contains a small bank of configurable logic, allowing synthesis of complex operations directly in configurable HW
Depends on compiler efficacy
Does not use a traditional instruction set architecture
All units are told explicitly what to do by the compiler
The compiler even schedules most of the intertile communication