voor dia serie SNSUtrecth't Gooi - PowerPoint PPT Presentation

About This Presentation

Title:

voor dia serie SNSUtrecth't Gooi

Description:

But, some ASICs can be pipelined! ... Long wires in ASICs due to poor final placement of modules ... Can ASICs improve floorplanning? Use good ASIC floorplanning tools ... – PowerPoint PPT presentation

Slides: 56

Provided by: carl296

Transcript and Presenter's Notes

1
(No Transcript)
2
Closing the Gap Between ASIC and Custom: An ASIC Perspective

David Chinnery, Kurt Keutzer
EECS, University of California at Berkeley

3
Our questions

How big is the speed gap between ASIC and custom?
Where does the speed go?
How can we close the speed gap?

4
How much is on the table?

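How big is the gap between ASIC and custom circuits?

(Figure: chart of design styles. Layout labels: manual layout, tiling, automated place and route. Circuit labels: RTL synthesis, gate array, standard cell with 2 sizes, standard cell with 6 sizes, arbitrary circuits. The gap is marked "?".)
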
5
0.25 um Design Examples

Very high speed custom designs
Alpha 21264A, 750 MHz
Out-of-order execution of instructions
IBM PowerPC 1.0 GHz integer processor, not commercial
In-order execution
ASIC
Tensilica Xtensa processor, 150 MHz worst case
In-order execution
Average ASIC, estimated 120 to 150 MHz

6
The Gap
How big is the gap between ASIC and custom circuits?

(Figure: design-style chart as on slide 4, marking PowerPC at 1 GHz and average ASIC at 120 MHz.)

7
An interesting data point
How big is the gap between ASIC and custom circuits?

(Figure: design-style chart as on slide 4, marking PowerPC at 1 GHz, Tensilica Xtensa at 150 MHz and average ASIC at 120 MHz.)

8
Observed Gap
How big is the gap between ASIC and custom circuits?
6-8x speed

(Figure: design-style chart as on slide 4, marking PowerPC at 1 GHz and average ASIC at 120 MHz.)

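The 6-8x range can be reproduced from the clock frequencies quoted on the design-example slides. A minimal sketch, assuming the gap is measured simply as the ratio of clock frequencies; the function and variable names are illustrative and not from the slides:

```python
# Clock frequencies quoted on the slides, in MHz.
CUSTOM_MHZ = 1000.0  # IBM PowerPC integer processor
ASIC_MHZ = {"average ASIC": 120.0, "Tensilica Xtensa": 150.0}


def speed_gap(custom_mhz: float, asic_mhz: float) -> float:
    """Return how many times faster the custom design is clocked."""
    return custom_mhz / asic_mhz


gaps = {name: speed_gap(CUSTOM_MHZ, mhz) for name, mhz in ASIC_MHZ.items()}
for name, gap in gaps.items():
    print(f"PowerPC vs {name}: {gap:.1f}x")
```

The two ratios come out near 8.3 and 6.7, which brackets the range shown on the slide.
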
9
Where does all that speed go?
6-8x speed

(Figure: design-style chart as on slide 4, marking PowerPC at 1 GHz and average ASIC at 120 MHz.)

10
Where does all that speed go?
Custom prejudice:
ASIC designers are bad
ASIC CAD tools are worse

(Figure: design-style chart as on slide 4, marking PowerPC at 1 GHz and average ASIC at 120 MHz.)

11
Where does all that speed go?
What's the reality?
Let's take a quick look.

(Figure: design-style chart as on slide 4, marking PowerPC at 1 GHz and average ASIC at 120 MHz.)

12
Where does the speed go?

Maximum contribution: 4.20x from architecture
Reducing critical path length by inserting registers or latches

(Figure: two rows of stages: instruction fetch, instruction decode, ALU, write.)

13
Where does the speed go?

Maximum contribution: 1.20x from logic design and clock skew
Reducing levels of logic through complex functions
Reduces area and sometimes reduces speed
Less overhead due to guard bands and signal wires
Generally worse clock skew in ASICs

(Figure: circuit schematic with VDD and GND rails.)

14
Where does the speed go?

Maximum contribution: 1.25x from good floorplanning and placement
Reduce wire lengths by placing connected modules nearby

15
Where does the speed go?

Maximum contribution: 1.25x from clever sizing of transistors and wires

16
Where does the speed go?

Maximum contribution: 1.50x through use of dynamic logic on critical paths
Avoids slow p-transistor chains, reduces area

17
Where does the speed go?

Maximum contribution: 2.00x due to process variation and accessibility
ASIC libraries may lag technology improvements

(Figure: chips produced versus speed. Labels: ASIC worst case, worst process; fastest custom bin; 2.0.)

18
Full Range from ASIC to Custom

Maximum contribution summary
4.20x architecture
1.20x logic design and clock skew
1.25x good floorplanning and placement
1.25x clever sizing of transistors and wires
1.50x through dynamic logic on critical paths
2.00x due to process variation and accessibility
Good custom might be 23.6x better than bad ASIC.
Your mileage may vary!

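The 23.6 figure is the product of the six maximum contributions listed on this slide. A minimal sketch of that multiplication, assuming the factors are independent and simply compound; the dictionary and variable names are illustrative:

```python
from math import prod

# Maximum contribution of each factor, as listed on the summary slide.
MAX_CONTRIBUTION = {
    "architecture": 4.20,
    "logic design and clock skew": 1.20,
    "floorplanning and placement": 1.25,
    "transistor and wire sizing": 1.25,
    "dynamic logic on critical paths": 1.50,
    "process variation and accessibility": 2.00,
}

# Compound the factors to get the full range from bad ASIC to good custom.
total_gap = prod(MAX_CONTRIBUTION.values())
print(f"Good custom vs bad ASIC: {total_gap:.1f}x")
```
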
19
Full Range from ASIC to Custom

Maximum contribution summary
4.20x architecture
1.20x logic design and clock skew
1.25x good floorplanning and placement
1.25x clever sizing of transistors and wires
1.50x through dynamic logic on critical paths
2.00x due to process variation and accessibility
Good custom might be 23.6x better than bad ASIC.
Let's look at all that more carefully.

20
First the facts: Critical Path Delay

Delay is a function of
Gate and wire delays

(Figure: timing diagram of data, clock, Q1 and Q2 with clock edges Tclock1 and Tclock2; critical path of 5 logic levels.)

21
Critical Path Delay

Delay is a function of
Gate and wire delays
Data stable during
Setup time, before clock

(Figure: timing diagram of data, clock, Q1 and Q2 with clock edges Tclock1 and Tclock2; critical path of 5 logic levels.)

22
Critical Path Delay

Delay is also a function of
Clock skew

(Figure: timing diagram of data, clock, Q1 and Q2 with clock edges Tclock1 and Tclock2; clock skew marked.)

23
Critical Path Delay

Delay is also a function of
Clock skew
Clock-to-Q

(Figure: timing diagram of data, clock, Q1 and Q2 with clock edges Tclock1 and Tclock2.)

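The delay components named on these slides combine into the usual setup constraint for edge-triggered registers: the clock period must be at least clock-to-Q delay, plus gate and wire delay along the critical path, plus setup time, plus clock skew. A minimal sketch of that constraint; the class, its field names and the example numbers are illustrative assumptions, not values from the slides:

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Delay components of one register-to-register path, in ns."""
    clock_to_q_ns: float
    logic_and_wire_ns: float
    setup_ns: float
    clock_skew_ns: float

    def min_cycle_ns(self) -> float:
        """Shortest clock period that still meets setup on this path."""
        return (self.clock_to_q_ns + self.logic_and_wire_ns
                + self.setup_ns + self.clock_skew_ns)

    def max_frequency_mhz(self) -> float:
        """Highest clock frequency the path allows (1000 / period in ns)."""
        return 1000.0 / self.min_cycle_ns()


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
example = PathTiming(clock_to_q_ns=0.3, logic_and_wire_ns=4.0,
                     setup_ns=0.2, clock_skew_ns=0.5)
print(f"{example.min_cycle_ns():.1f} ns, {example.max_frequency_mhz():.0f} MHz")
```

Everything except the gate and wire delay is overhead paid once per cycle, which is why it matters more as pipelining shortens the logic in each stage.
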
24
1. Architecture

Increase speed by reducing the critical path length
Pipeline: add latches between gates
Must balance pipeline stages to maximize gain
If we add 5 stages, why is speed-up 4 and not 5?

(Figure: two rows of stages: instruction fetch, instruction decode, ALU, write.)

25
Pipelining Comparison

Compare in-order execution
Estimate latch and clock skew overheads of 20% (overhead in 0.35 um Alpha 21264)
ASIC, Xtensa
Pipelined: 4.0 + 1.0 ns overhead = 5.0 ns cycle
Unpipelined: 5 x 4.0 + 1.0 (overhead) = 21.0 ns cycle
Creating five pipeline stages in Xtensa gives 4.2x
Speed-up is less due to pipelining overheads
Latch delay
Clock skew
Limited number of pipeline stages
More stages increases cost of branch misprediction, stalls

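The Xtensa numbers on this slide show why five stages give about 4x rather than 5x: the latch and clock skew overhead is paid in every cycle and does not shrink with the logic. A minimal sketch using the slide's 4.0 ns of logic per stage and 1.0 ns of overhead; the function names are illustrative, and the zero-overhead case is a hypothetical added for comparison:

```python
def cycle_time_ns(logic_ns: float, overhead_ns: float) -> float:
    """Clock period: logic delay in one cycle plus latch and skew overhead."""
    return logic_ns + overhead_ns


def pipelining_speedup(stages: int, stage_logic_ns: float,
                       overhead_ns: float) -> float:
    """Ratio of the unpipelined cycle time to the pipelined cycle time."""
    unpipelined = cycle_time_ns(stages * stage_logic_ns, overhead_ns)
    pipelined = cycle_time_ns(stage_logic_ns, overhead_ns)
    return unpipelined / pipelined


# Slide values: 5 stages, 4.0 ns logic per stage, 1.0 ns overhead.
xtensa = pipelining_speedup(stages=5, stage_logic_ns=4.0, overhead_ns=1.0)
# Hypothetical ideal case with no overhead at all.
ideal = pipelining_speedup(stages=5, stage_logic_ns=4.0, overhead_ns=0.0)
print(f"with 1.0 ns overhead: {xtensa:.1f}x, with no overhead: {ideal:.1f}x")
```
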
26
Can we improve the architecture and micro-architecture of ASICs?

Not always: fundamental problem in some applications
PCI bus interface has cycle-to-cycle dependency
No opportunity for pipelining
Bottom line
Unpipelined ASICs lose a factor of 4.20
Compared with custom and pipelined ASICs

27
But, some ASICs can be pipelined!

If we can perform instructions in parallel for the application, then pipeline
Five stages in Xtensa
4.2x speed-up

(Figure: gap chart, labels 23.6 and 4.20.)

28
2. Better Logic Design in Custom

Custom designs can have specially designed structures
Reducing levels of logic through complex functions
Reduces area and sometimes reduces speed
Less overhead, with less guard banding and fewer signal wires
Superior design of regular logic like adders, multipliers
Incorporate logic in latches
Reduce latch overhead

29
Clock Skew, Latch Design Comparison

Greater clock skew in ASICs, contributing 1.10x
Best ASIC: 5%, 250 ps at 250 MHz
Xtensa in 0.25 um (at typical speeds, typical process)
Custom: 5%, 75 ps at 600 MHz
Alpha 21264 in 0.35 um
Better latch design would also impact pipelining
E.g. if custom could have 0.2 ns overhead (1.0 ns for ASIC)
ASIC 5.0 ns cycle -> 4.2 ns cycle
1.20x is due to logic design and clock skew

31
Can we improve ASIC logic design?

Add custom macros to ASIC library
Drawback: takes time to design macros
Reuse, amortizing design time
Limited by design overhead of macros that won't be reused
Designer must ensure ASIC description invokes predefined macros

(Figure: custom ingredients: adder, barrel shifter, register file, MAC. Gap chart, labels 23.6 and 1.20.)

32
3. Floorplanning and Placement

Increase speed by avoiding cross-chip critical path wires
Place interconnected modules nearby
Long wires in ASICs due to poor final placement of modules
Impact of long wires: BACPAC, 0.25 um ASIC, 12 million transistors
With shorter wires, design would be 1.25x faster

34
Can ASICs improve floorplanning?

Use good ASIC floorplanning tools
Improve tool recognition of similar structures that can be abutted and tiled
Do about as well as custom: 1.25x

(Figure: gap chart, labels 23.6 and 1.25.)

35
4. Transistor and Wire Sizing

Reduce gate and wire delays on critical path
Size gate output to drive load of fan-out gates and wires
Size up transistors to drive large loads
Widen wires to decrease resistance
Delay is proportional to resistance x capacitance

36
Impact of poor library design

ASIC standard cell library has discrete gate sizes
Some libraries used with insufficient range of gate drives
One or two sizes per cell
Few inverters, few buffers
Single gate polarities (less compact)
Tools for sizing wires in ASIC designs not available
After layout, resizing transistors knowing layout can give up to 20% improvement (Gavrilov 97)
Custom gains about a factor of 1.25 due to these problems.

38
Can we improve ASIC sizing?
ASIC libraries can be improved

Use library with dual polarities, several (e.g. 6) drive strengths per cell
Custom still about 1.05x better.
Iterative transistor sizing and resynthesis
Can improve speed by up to 20% (Gavrilov ICCAD97)

(Figure: gap chart, labels 23.6 and 1.20.)

39
5. Dynamic Logic

Using dynamic logic on critical paths
Avoids slow p-transistor chains
Higher speed
Reduces area
Only pull-down network, and charging transistors
Dynamic logic increases speed by about 1.50x (Nowka ICCD98)

(Figure: static CMOS gate with slow p-chain and a clocked domino logic gate, each between VDD and GND.)

41
Dynamic Logic in ASICs?

Dynamic logic requires careful design
Glitching causes incorrect result
More susceptible to noise
Precharge power spike
Careful design of power supply for dynamic
Static CMOS is lower power
ASIC tools are unable to support dynamic logic
Dynamic logic libraries not available
Unable to use library-driven static timing analysis
Interface of dynamic and static logic is complicated
Can't improve: custom remains 1.50x better.

43
Dynamic Logic in Custom?

Dynamic logic problems more pronounced in deep submicron
Power dissipation
Power consumption limited by supply
Heat dissipation limited by packaging
More noise
Higher frequencies cause more noise
More cross-talk noise as wires are closer
Longer design times than static CMOS
Prohibitive for progressively larger designs
Dynamic logic likely to lose its advantages by 100 nm.
(Sorry Mark)

(Figure: gap chart, label 15.7.)

44
6. Process Variation and Accessibility

ASIC libraries calculate worst case speeds for process
Speeds off a line may vary by 20% to 40%
Less variation in a mature process
Custom designs can down-bin the slower chips

(Figure: chips produced versus speed. Labels: ASIC worst case, worst process; good yield; fast custom, rest slower; 1.2; 1.4.)

46
Process Variation and Accessibility

ASIC libraries calculate worst case speeds for process
Speeds off a line vary by 20% to 40%
Less variation in a mature process
Custom designs can down-bin the slower chips
Could run ASICs faster than worst case speeds, with high yield
Fabrication plants vary in speed by up to 25% (Tensilica Xtensa modeling)

(Figure: chips produced versus speed. Labels: ASIC worst case; Fab A; Fab B; 1.2.)

47
Process Variation and Accessibility

ASIC libraries may not keep up with process improvements
Technology improvements
Intel 0.25 um 856 process had 18% speed improvement over the life of the process generation
ASIC libraries may lag technology improvements

(Figure: chips produced versus speed. Labels: ASIC worst case; acceptable ASIC yield; Fab A; Fab B; improved process; 1.2 x 1.18; 1.4.)

48
Process Variation and Accessibility

Total difference of 2.00x between
worst case ASIC speeds on worst process, with original library (lagging process improvements)
and fast custom with fully up-to-date technology
ASIC libraries may lag technology improvements

(Figure: chips produced versus speed. Labels: ASIC worst case; acceptable ASIC yield; Fab A; Fab B; fast customs, rest slower; 2.0.)

50
Process Variation and Accessibility

Can run ASICs faster than worst case speeds
Test what speed can run at with high yield
Improve speed by 30% to 40%
Xtensa can run at 250 MHz in 0.25 um
Choose good fabrication company
May be more expensive
20% better than worst processes in technology (Tensilica Xtensa modeling)
Bottom line
ASICs in a slow process, at worst case speeds, lose a factor of 2.00
But ASICs in a good process, running at better than worst case speeds, get within 20% of custom (gain 1.66x relative through best practice)

(Figure: gap chart, labels 23.6 and 1.66.)

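The 1.66x gain can be checked against the other two numbers on this slide. A minimal sketch, assuming "within 20% of custom" means a remaining factor of 1.20 out of the original 2.00; the constant names are illustrative:

```python
# Factor lost by an ASIC in a slow process at worst case speeds (from the slide).
WORST_CASE_LOSS = 2.00
# Remaining gap once the ASIC is "within 20% of custom" (assumed to mean 1.20x).
RESIDUAL_GAP = 1.20

# Share of the 2.00x that best practice wins back.
recovered = WORST_CASE_LOSS / RESIDUAL_GAP
print(f"Recovered through best practice: {recovered:.2f}x")
```

The quotient is about 1.67, which the slide quotes as 1.66.
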
51
Custom advantages over best ASIC practice

1.20x logic design and clock skew
1.05x clever sizing of transistors and wires
1.50x (today) dynamic logic on critical paths
1.20x process variation and accessibility
Custom still 2.3x better!

52
Custom advantages over best ASIC practice

Custom advantages relative to best ASIC methods
1.20x logic design and clock skew
1.05x clever sizing of transistors and wires
1.50x (today) dynamic logic on critical paths
1.20x process variation and accessibility
Custom is 2.3x better today, but only 1.5x faster at 100 nm if dynamic logic is not viable.

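Both the 2.3x and the 1.5x follow from compounding the four residual factors, with and without dynamic logic. A minimal sketch, assuming the factors multiply independently; the helper `remaining_gap` and the dictionary keys are illustrative names:

```python
from math import prod

# Custom advantages over best ASIC practice, as listed on the slide.
RESIDUAL = {
    "logic design and clock skew": 1.20,
    "transistor and wire sizing": 1.05,
    "dynamic logic": 1.50,
    "process variation and accessibility": 1.20,
}


def remaining_gap(factors: dict, exclude: tuple = ()) -> float:
    """Compound all factors except those named in `exclude`."""
    return prod(value for name, value in factors.items() if name not in exclude)


today = remaining_gap(RESIDUAL)
without_dynamic = remaining_gap(RESIDUAL, exclude=("dynamic logic",))
print(f"today: {today:.1f}x, without dynamic logic: {without_dynamic:.1f}x")
```
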
53
Another look (NB big is BAD!)
(Figure: chart of contributions, labels 4.20, 2.00, 1.50, 1.25.)

54
Punch Line

ASIC performance lags custom by 8x
Attention typically focused on detailed circuit design and layout as primary reason
Our work indicates that
Architecture, logic design and clock skew: 1.20x to 5.00x,
And processing: 1.20x to 2.00x,
play a much larger role,
and custom circuit design and layout offer only about 1.30x
Dynamic logic is one other significant factor in why custom designs can do better: 1.50x
ASIC/custom gap will narrow further (1.20x to 1.50x) if custom loses dynamic logic advantage in small geometries
Response to Bill: Yes, it's easy to make really slow ASICs if you have a critical path with long, unbuffered wires, even if you have a good architecture manufactured in a fast process.