New Trends in Designing Processors presentation

1
New Trends in Designing Processors

ASIC Seminar
Instructor: Dr. S.M. Fakhraie
Presented by: Amir Naghdinezhad
Spring 2006

This is a class presentation. All data are copyrighted
by the respective authors listed in the references
and are used here for educational purposes only.
2
Outline

Introduction
Raw
Imagine
Smart memories
Trips
Conclusion
References

3
Introduction

In the 1970s
Memory was expensive
So CISC architectures were favored, with
Dense instruction encoding
Variable-length instructions
Small numbers of registers
In the 1980s
An entire RISC processor could fit on a single chip
RISC processors attained high performance despite
their reduced complexity

4
Introduction

Past 20 years
Aggressive pipelining and compiler scheduling
40% per year performance scaling [10]
With only small ISA changes
Future architectures
Pipeline depth limits
Limits on further clock-speed increases and on power
Increasing delays through global on-chip wires
So new designs are required

5
Raw

Developed at MIT
A general-purpose architecture with interrupts,
caches and context switches
Attacks the wire-delay problem
Enables the programmer or compiler to directly
program the wiring resources
Composed of 16 identical programmable tiles
All signals registered at tile boundaries
One clock cycle delay
No global signals

6
Raw Architecture
Figure: The Raw microprocessor [1]
7
Raw Tile Architecture

A tile contains
An 8-stage in-order single-issue MIPS-style
processing pipeline
A 4-stage single-precision pipelined FPU
A 32KB data cache
96KB of instruction caches
Two types of communication routers: static and
dynamic

8
Raw Tile Interconnections

Four 32-bit full-duplex on-chip networks
Two static
To route operands among local and remote ALUs
To route data streams among tiles, DRAM, and I/O
ports
Two dynamic
For cache misses, interrupts, and dynamic messages
Each tile is connected only to its four neighbors
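
Because each tile links only to its four neighbors, a word travelling between distant tiles must pass through the tiles in between. A minimal Python sketch of the resulting hop count follows. It assumes the 16 tiles are arranged as a 4 × 4 grid with row-major numbering. The numbering scheme and the helper names `tile_xy` and `hops` are illustrative assumptions, not taken from the Raw papers.

```python
GRID = 4  # assumption: the 16 tiles form a 4 x 4 grid


def tile_xy(tile):
    """Map a row-major tile index (0..15) to (x, y) grid coordinates."""
    return tile % GRID, tile // GRID


def hops(src, dst):
    """Minimum number of neighbor-to-neighbor hops between two tiles.

    With links only to the four nearest neighbors, the shortest path
    length is the Manhattan distance between the two tiles.
    """
    sx, sy = tile_xy(src)
    dx, dy = tile_xy(dst)
    return abs(sx - dx) + abs(sy - dy)


if __name__ == "__main__":
    print(hops(0, 1))   # adjacent tiles: 1 hop
    print(hops(0, 15))  # opposite corners: 6 hops
```

Since all signals are registered at tile boundaries, the hop count gives a rough lower bound on the number of cycles a word spends in the network.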

Figure: Raw tile architecture [2]
9
Raw Compute Processor
Figure: Raw compute processor pipeline [1]
10
Raw Performance Survey
Figure: Performance [3]
11
Raw Fabrication

180 nm, 6-metal copper ASIC process
3.6 GFLOPS peak
18.23mm x 18.23mm
Clock
420MHz (actual)
Power
10 watts (power save mode)
18 watts typical
35 watts max

Figure: Raw die layout [3]
12
Imagine: A Stream Processor

At Stanford University
A programmable stream processor for media
applications
Imagine is controlled by a host processor
A peak performance of 20 GFLOPS [5]
With
128-Kbyte stream register file
48 floating-point arithmetic units in eight
arithmetic clusters
A streaming memory system with four SDRAM
channels
A microcontroller, a network interface and a
stream controller

13
Imagine Stream Processors

A bridge between inflexible special-purpose and
programmable architectures
Are DSPs targeted at high-performance embedded
applications
Contain clusters of functional units, supporting
hundreds of arithmetic units
Exploit
Instruction Level Parallelism (ILP)
Data Parallelism (DP)
Task parallelism (TP) (kernel execution and
stream data transfers)

14
Imagine Stream Processors

The idea is to organize an application into
streams and kernels
A stream contains a set of elements of the same
type.
Elements can be simple or complex.
A kernel is the computational unit that works on
streams.
Can have one or more input and output streams
Complex calculations ranging from a few to
thousands of operations per input element
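
The stream and kernel model described above can be sketched in a few lines of Python. This is a toy illustration only, not Imagine's actual programming interface. The helper `make_kernel` and the kernels `scale` and `add` are hypothetical names chosen for the example.

```python
def make_kernel(op):
    """Wrap a per-element operation as a kernel.

    The kernel consumes one or more input streams in lockstep and
    produces one output stream, one element at a time.
    """
    def run(*streams):
        for elements in zip(*streams):
            yield op(*elements)
    return run


# Two hypothetical kernels: one input stream, and two input streams.
scale = make_kernel(lambda x: 2 * x)
add = make_kernel(lambda a, b: a + b)


def run_pipeline(a, b):
    """Chain the kernels: the output stream of `scale` feeds `add`."""
    return list(add(scale(a), b))


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3], [10, 20, 30]))  # [12, 24, 36]
```

Chaining kernels through streams keeps intermediate results flowing from producer to consumer instead of returning to global memory between steps.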

15
Imagine Architecture
There are eight VLIW computation clusters
arranged in a SIMD array.
The Imagine chip is controlled by a host
processor.
Streams of data are stored in the Stream Register
File (SRF), which can transfer data to and from
the LRFs.
Operands for arithmetic operations are kept
locally in Local Register Files (LRFs) near the
ALUs.
Global data is stored in off-chip memory.
Each Imagine chip has a network interface to
allow high speed communication among Imagine
chips.
The memory system of Imagine allows multiple
streaming memory accesses to occur simultaneously.
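
As a rough illustration of the SIMD arrangement, the sketch below applies one operation to a stream in batches of eight, one element per cluster. The round-robin assignment of elements to clusters and the names `cluster_of` and `simd_map` are assumptions made for the example, not details taken from the slides.

```python
NUM_CLUSTERS = 8  # from the slide: eight clusters in a SIMD array


def cluster_of(index):
    """Cluster that handles stream element `index` (assumed round-robin)."""
    return index % NUM_CLUSTERS


def simd_map(op, stream):
    """Apply `op` to a stream, NUM_CLUSTERS elements per step.

    Each step models all clusters executing the same operation,
    each on its own element of the stream.
    """
    out = []
    for base in range(0, len(stream), NUM_CLUSTERS):
        batch = stream[base:base + NUM_CLUSTERS]  # one element per cluster
        out.extend(op(x) for x in batch)          # same op on every cluster
    return out


if __name__ == "__main__":
    print(simd_map(lambda x: x * x, list(range(10))))
    print(cluster_of(9))
```
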
Figure: Imagine architecture [5], [7]
16
Imagine Fabrication

150 nm, static CMOS standard-cell technology.
Die size of 1.44 cm²
2.8 and 6.2 billion operations per second
Clock
500 MHz
2.4 Gflops per watt [4]
Pentium 4 achieves a peak performance of 12
Gflops at 80 watts [4]
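
The two efficiency figures above can be put on the same scale with a short calculation. The Python sketch below uses only the numbers quoted on this slide. It compares a peak Pentium 4 figure against the per-watt figure quoted for Imagine, so the result is a rough indication rather than a measured comparison.

```python
# Figures quoted on the slide
imagine_gflops_per_watt = 2.4
p4_peak_gflops = 12.0
p4_watts = 80.0

# Convert the Pentium 4 figure to Gflops per watt
p4_gflops_per_watt = p4_peak_gflops / p4_watts

# Ratio of the two efficiencies
ratio = imagine_gflops_per_watt / p4_gflops_per_watt

print(f"Pentium 4: {p4_gflops_per_watt:.2f} Gflops/W")  # 0.15 Gflops/W
print(f"Imagine advantage: {ratio:.0f}x")               # 16x
```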

Figure: Imagine die layout [4]
17
Smart Memories

At Stanford University
A multiprocessor system
Processing units take the form of tiles
64 tiles on a chip
A group of four tiles forms a quad
Reduces the number of global network interfaces
The memories, the wires, and the computational
model can all be altered to match the
application.

18
Smart Memories Architecture
Figure: Smart Memories chip [8]
19
Smart Memories Tile Architecture

A reconfigurable memory system
16 independent 8 KB (1024 × 64 b) mats
Each 64b word
Has an extra valid bit and a 4-bit configurable
control field
Is dual ported to allow read-modify-write
operations each cycle
Can be flash cleared via special opcodes
Contains logic in the output read path for
comparisons
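
The mat parameters above can be checked and modeled with a small Python sketch. The `Mat` class is a toy model with a hypothetical interface. In particular, treating flash clear as resetting the valid bits, and treating the read-path comparison as an equality check such as a cache tag match, are assumptions made for illustration.

```python
WORDS_PER_MAT = 1024
BITS_PER_WORD = 64
MATS_PER_TILE = 16


def mat_bytes():
    """Capacity of one mat in bytes: 1024 words x 64 bits = 8 KB."""
    return WORDS_PER_MAT * BITS_PER_WORD // 8


class Mat:
    """Toy model of one memory mat (hypothetical interface)."""

    def __init__(self):
        self.data = [0] * WORDS_PER_MAT
        self.valid = [False] * WORDS_PER_MAT  # extra valid bit per word
        self.ctrl = [0] * WORDS_PER_MAT       # 4-bit control field per word

    def write(self, addr, value, ctrl=0):
        """Store a 64-bit word, its control field, and mark it valid."""
        self.data[addr] = value & (2 ** BITS_PER_WORD - 1)
        self.ctrl[addr] = ctrl & 0xF
        self.valid[addr] = True

    def read_compare(self, addr, expected):
        """Read a word and compare it in the read path.

        Returns (value, hit). Modeled as an equality check on a valid
        word, which is an assumption about the comparison logic.
        """
        value = self.data[addr]
        hit = self.valid[addr] and value == expected
        return value, hit

    def flash_clear(self):
        """Assumed behavior: clear every valid bit in one operation."""
        self.valid = [False] * WORDS_PER_MAT


if __name__ == "__main__":
    print(mat_bytes())                  # 8192 bytes = 8 KB per mat
    print(MATS_PER_TILE * mat_bytes())  # 131072 bytes = 128 KB per tile
```

With a valid bit and a comparison in the read path, a mat can act as a cache tag or data array as well as plain memory, which is what makes the memory system reconfigurable.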

20
Smart Memories Tile Architecture

A processor core
A 64-bit processing engine
Two integer clusters
An ALU, register file, and load/store unit for
each
One floating point (FP) cluster
A quad network interface
Connects the different memory mats to the processor
Supports up to eight concurrent references

21
Smart Memories Tile Architecture
Figure: Smart Memories tile [8]
22
Smart Memories Latency and Bandwidth

Peak bandwidth with a 1 GHz clock [9]
To/from tile memories
16GB/s per mat
128GB/s per tile memory system
To/from tile
64GB/s
Quad network bandwidth
64GB/s

23
Trips

Tera-op, Reliable, Intelligently adaptive
Processing System
At the University of Texas at Austin
An EDGE (Explicit Data Graph Execution)
architecture
Conveys the compile-time dependence graph through
the ISA
Direct instruction communication
The hardware delivers a producer instruction's
output directly as an input to a consumer
instruction
Eliminates the majority of a conventional
processor's register writes
More energy-efficient delivery from producing to
consuming instructions
The compiler groups instructions into blocks of
instructions
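
Direct instruction communication can be illustrated with a small dataflow interpreter in Python. This is a sketch of the general idea only, not the TRIPS instruction set or its block format. The block encoding (a dict of operation, arity, and target list) and the names `run_block`, `BLOCK`, and `example` are hypothetical.

```python
import operator


def run_block(instructions, inputs):
    """Execute one block in dataflow order.

    instructions: name -> (op, arity, targets), where targets is a list
                  of (consumer_name, operand_slot) pairs.
    inputs: (consumer_name, operand_slot, value) triples delivered at
            block entry.
    Returns the results of instructions that have no targets.
    """
    operands = {name: {} for name in instructions}
    ready = []
    outputs = {}

    def deliver(name, slot, value):
        # Place a value directly in a consumer's operand slot;
        # the consumer becomes ready once all operands have arrived.
        operands[name][slot] = value
        if len(operands[name]) == instructions[name][1]:
            ready.append(name)

    for name, slot, value in inputs:
        deliver(name, slot, value)

    while ready:
        name = ready.pop()
        op, arity, targets = instructions[name]
        result = op(*(operands[name][i] for i in range(arity)))
        if not targets:
            outputs[name] = result
        for consumer, slot in targets:
            deliver(consumer, slot, result)  # no register write involved
    return outputs


# Hypothetical block computing (a + b) * (a - b)
BLOCK = {
    "add": (operator.add, 2, [("mul", 0)]),
    "sub": (operator.sub, 2, [("mul", 1)]),
    "mul": (operator.mul, 2, []),
}


def example():
    """Run the block with a = 5 and b = 3."""
    return run_block(BLOCK, [("add", 0, 5), ("add", 1, 3),
                             ("sub", 0, 5), ("sub", 1, 3)])


if __name__ == "__main__":
    print(example())  # {'mul': 16}
```

Each instruction names its consumers rather than a destination register, so the intermediate results of `add` and `sub` never pass through a shared register file.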

24
Trips Architecture

Two processing cores
Each is a 16-wide out-of-order issue processor
A 4 × 4 array of ALUs with buffers
Four register file banks
Four instruction cache banks
Four data cache banks
Four ports into the L2 cache network
Up to eight blocks executing concurrently
2 Mbytes of integrated L2 cache
Organized as 32 banks
Connected with a routing network.

Figure: Trips architecture [10]
25
Trips Architecture
Figure: Trips processor core architecture [10]
26
Trips Fabrication

IBM CU-11 process (130nm)
18x18 mm chip area
533MHz clock rate
5 TFLOPS in a 35 nm process, 32 GFLOPS in a 130 nm process

Figure: Trips die layout [11]
27
Conclusion
28
References

[1] M. B. Taylor, et al., "The Raw Microprocessor: A
Computational Fabric for Software Circuits and
General-Purpose Programs," IEEE Micro, Mar 2002,
pp. 25-35.
[2] M. B. Taylor, W. Lee, J. Miller, D. Wentzlaff,
I. Bratt, B. Greenwald, H. Hoffmann, P. Johnson,
J. Kim, J. Psota, A. Saraf, N. Shnidman, V.
Strumpen, M. I. Frank, S. Amarasinghe, and A.
Agarwal, "Evaluation of the Raw Microprocessor: An
Exposed-Wire-Delay Architecture for ILP and
Streams," Proceedings of the International
Symposium on Computer Architecture (ISCA), June
2004.
[3] http://www.cag.csail.mit.edu/raw/
[4] U. J. Kapasi, S. Rixner, W. J. Dally, B.
Khailany, J. H. Ahn, P. Mattson, and J. D. Owens,
"Programmable Stream Processors," IEEE Computer,
vol. 36, no. 8, pp. 54-62, August 2003.
[5] B. Khailany, W. J. Dally, S. Rixner,
U. J. Kapasi, P. Mattson, J. Namkoong,
J. D. Owens, B. Towles, and A. Chang,
"Imagine: Media Processing with Streams," IEEE
Micro, Mar/April 2001.
[6] http://cva.stanford.edu/imagine/

29
References

[7] S. Sardashti, "Designing a Stream Processor,"
seminar report, University of Tehran, June 2005.
[8] K. Mai, et al., "Smart Memories: A Modular
Reconfigurable Architecture," Proc. 27th Int'l
Symp. Computer Architecture (ISCA '00), ACM Press,
2000, pp. 161-171.
[9] http://www-vlsi.stanford.edu/smart_memories/
[10] D. Burger, S. W. Keckler, K. S.
McKinley, M. Dahlin, L. K. John,
C. Lin, C. R. Moore, J. Burrill,
R. G. McDonald, and W. Yoder, "Scaling to the
End of Silicon with EDGE Architectures," IEEE
Computer, vol. 37, no. 7, pp. 44-55, 2004.
[11] http://www.cs.utexas.edu/users/cart/trips/
