voor dia serie SNSUtrecth't Gooi - PowerPoint PPT Presentation

Slides: 56
Provided by: carl296
2
Closing the Gap Between ASIC and Custom: An ASIC Perspective
  • David Chinnery, Kurt Keutzer
  • EECS, University of California at Berkeley

3
Our questions
  • How big is the speed gap between ASIC and custom?
  • Where does the speed go?
  • How can we close the speed gap?

4
How much is on the table?

[Figure: design-style chart. Labels: manual layout, tiling, automated place and route; RTL synthesis, gate array, standard cell (2 sizes), standard cell (6 sizes), arbitrary circuits. Caption: How big is the gap between ASIC and custom circuits?]

5
0.25 um Design Examples
  • Very high speed custom designs
  • Alpha 21264A, 750 MHz
  • Out-of-order execution of instructions
  • IBM PowerPC 1.0 GHz integer processor, not commercial
  • In-order execution
  • ASIC
  • Tensilica Xtensa processor, 150 MHz worst case
  • In-order execution
  • Average ASIC, estimated 120 to 150 MHz

6
The Gap
[Figure: design-style chart marking PowerPC 1 GHz and average ASIC 120 MHz. Caption: How big is the gap between ASIC and custom circuits?]

7
An interesting data point
[Figure: design-style chart marking PowerPC 1 GHz, Tensilica Xtensa 150 MHz and average ASIC 120 MHz. Caption: How big is the gap between ASIC and custom circuits?]

8
Observed Gap
[Figure: design-style chart marking PowerPC 1 GHz and average ASIC 120 MHz, gap labeled 6 to 8x speed. Caption: How big is the gap between ASIC and custom circuits?]

9
Where does all that speed go?
[Figure: design-style chart marking PowerPC 1 GHz and average ASIC 120 MHz, gap labeled 6 to 8x speed]

10
Where does all that speed go?
  • Custom prejudice
  • ASIC designers are bad
  • ASIC CAD tools are worse

[Figure: design-style chart marking PowerPC 1 GHz and average ASIC 120 MHz]

11
Where does all that speed go?
  • What's the reality?
  • Let's take a quick look

[Figure: design-style chart marking PowerPC 1 GHz and average ASIC 120 MHz]

12
Where does the speed go?
  • Maximum contribution
  • 4.20x: architecture
  • Architecture
  • Reducing critical path length by inserting registers or latches

[Figure: processor stages (instruction fetch, instruction decode, ALU, write), drawn twice]

13
Where does the speed go?
  • Maximum contribution
  • 1.20x: logic design and clock skew
  • Reducing levels of logic through complex functions
  • Reduces area and sometimes reduces speed
  • Less overhead due to guard bands and signal wires
  • Generally worse clock skew in ASICs

[Figure: circuit schematic with VDD and GND rails]

14
Where does the speed go?
  • Maximum contribution
  • 1.25x: good floorplanning and placement
  • Reduce wire lengths by placing connected modules nearby

15
Where does the speed go?
  • Maximum contribution
  • 1.25x: clever sizing of transistors and wires

16
Where does the speed go?
  • Maximum contribution
  • 1.50x through use of dynamic logic on critical paths
  • Avoid slow p-transistor chains, reduced area

17
Where does the speed go?
  • Maximum contribution
  • 2.00x due to process variation and accessibility

[Figure: chips produced versus speed. Labels: ASIC worst case, worst process; fastest custom bin; 2.0. Note: ASIC libraries may lag technology improvements]

18
Full Range from ASIC to Custom
  • Maximum contribution summary
  • 4.20x: architecture
  • 1.20x: logic design and clock skew
  • 1.25x: good floorplanning and placement
  • 1.25x: clever sizing of transistors and wires
  • 1.50x through dynamic logic on critical paths
  • 2.00x due to process variation and accessibility
  • Good custom might be 23.6x better than bad ASIC.
  • Your mileage may vary!

19
Full Range from ASIC to Custom
  • Maximum contribution summary
  • 4.20x: architecture
  • 1.20x: logic design and clock skew
  • 1.25x: good floorplanning and placement
  • 1.25x: clever sizing of transistors and wires
  • 1.50x through dynamic logic on critical paths
  • 2.00x due to process variation and accessibility
  • Good custom might be 23.6x better than bad ASIC.
  • Let's look at all that more carefully

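A quick check of the 23.6 figure, as a minimal sketch. It assumes the six maximum contributions are independent and simply multiply, which the summary implies but does not state; the dictionary keys are shorthand for the slide bullets.

```python
from math import prod

# Maximum contribution factors as listed on the summary slide.
FACTORS = {
    "architecture": 4.20,
    "logic design and clock skew": 1.20,
    "floorplanning and placement": 1.25,
    "transistor and wire sizing": 1.25,
    "dynamic logic on critical paths": 1.50,
    "process variation and accessibility": 2.00,
}


def combined_gap(factors):
    """Multiply the individual speed factors into one overall ratio.

    Assumption: the factors are independent, so they compound.
    """
    return prod(factors.values())


if __name__ == "__main__":
    print(f"combined gap: {combined_gap(FACTORS):.1f}x")
```

The product is about 23.6, matching the slide. Because the factors compound, removing any single one divides the total by that factor.
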
20
First, the facts: Critical Path Delay
  • Delay is a function of
  • Gate and wire delays

[Figure: timing diagram. Labels: clock, data, Q1, Q2, Tclock1, Tclock2; critical path, 5 logic levels]

21
Critical Path Delay
  • Delay is a function of
  • Gate and wire delays
  • Data stable during
  • Setup time, before clock

[Figure: timing diagram. Labels: clock, data, Q1, Q2, Tclock1, Tclock2; critical path, 5 logic levels]

22
Critical Path Delay
  • Delay is also a function of
  • Clock skew

[Figure: timing diagram. Labels: clock, data, Q1, Q2, Tclock1, Tclock2; clock skew]

23
Critical Path Delay
  • Delay is also a function of
  • Clock skew
  • Clock-to-Q

[Figure: timing diagram. Labels: clock, data, Q1, Q2, Tclock1, Tclock2]

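The terms named on the critical path slides combine into the usual register-to-register timing constraint. The symbol names below are assumptions for illustration and do not appear in the slides: the clock period must cover clock-to-Q delay, the gate and wire delay of the critical path, setup time, and clock skew.

```latex
T_{\text{cycle}} \;\ge\; t_{\text{clk-to-Q}} + t_{\text{logic}} + t_{\text{setup}} + t_{\text{skew}}
```

Only the logic term shrinks when a path is split by pipelining; the other three are paid again in every stage.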
24
1. Architecture
  • Increase speed by reducing the critical path length
  • Pipeline: add latches between gates
  • Must balance pipeline stages to maximize gain

[Figure: processor stages (instruction fetch, instruction decode, ALU, write), drawn twice]
If we add 5 stages, why is speed-up 4 and not 5?

25
Pipelining Comparison
  • Compare in-order execution
  • Estimate latch and clock skew overheads of 20% (overhead in 0.35 um Alpha 21264)
  • ASIC, Xtensa
  • Pipelined: 4.0 ns + 1.0 ns overhead = 5.0 ns cycle
  • Unpipelined: 5 x 4.0 ns + 1.0 ns (overhead) = 21.0 ns cycle
  • Creating five pipeline stages in Xtensa gives 4.2x
  • Speed-up is less due to pipelining overheads
  • Latch delay
  • Clock skew
  • Limited number of pipeline stages
  • More stages increases cost of branch misprediction, stalls

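The 4.2x figure follows from the cycle times above. A minimal sketch, assuming perfectly balanced stages and a fixed per-cycle overhead; the function names are illustrative only.

```python
def cycle_time(logic_delay_ns, stages, overhead_ns):
    """Cycle time when the logic delay is split evenly over `stages` stages.

    Each cycle also pays a fixed latch and clock-skew overhead.
    """
    return logic_delay_ns / stages + overhead_ns


def pipeline_speedup(logic_delay_ns, stages, overhead_ns):
    """Ratio of unpipelined to pipelined cycle time."""
    unpipelined = cycle_time(logic_delay_ns, 1, overhead_ns)
    pipelined = cycle_time(logic_delay_ns, stages, overhead_ns)
    return unpipelined / pipelined


if __name__ == "__main__":
    # Slide figures: 5 x 4.0 ns of logic, 1.0 ns overhead per cycle.
    speedup = pipeline_speedup(logic_delay_ns=20.0, stages=5, overhead_ns=1.0)
    print(f"speed-up: {speedup:.1f}x")
```

With zero overhead the speed-up would equal the stage count; the fixed 1.0 ns per cycle is what pulls five stages down to about 4.2x.
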
26
Can we improve the architecture and micro-architecture of ASICs?
  • Not always: fundamental problem in some applications
  • PCI Bus interface has cycle-to-cycle dependency
  • No opportunity for pipelining
  • Bottom line
  • Unpipelined ASICs lose factor of 4.20
  • Compared with custom and pipelined ASICs

[Figure: gap chart labeled 23.6]

27
But, some ASICs can be pipelined!
  • If we can perform instructions in parallel for application, then pipeline
  • Five stages in Xtensa
  • 4.2x speed-up

[Figure: gap chart labeled 23.6 and 4.20]

28
2. Better Logic Design in Custom
  • Custom designs can have specially designed
    structures
  • Reducing levels of logic through complex
    functions
  • Reduces area and sometimes reduces speed
  • Less overhead as less guard banding and signal
    wires
  • Superior design of regular logic like adders,
    multipliers
  • Incorporate logic in latches
  • Reduce latch overhead

29
Clock Skew, Latch Design Comparison
  • Greater clock skew in ASICs, contributing 1.10x
  • Best ASIC 5, 250 ps at 250 MHz
  • Xtensa in 0.25 um (at typical speeds, typical process)
  • Custom 5, 75 ps at 600 MHz
  • Alpha 21264 in 0.35 um
  • Better latch design would also impact pipelining
  • E.g. if we could have 0.2 ns custom overhead (1.0 ns for ASIC)
  • ASIC 5.0 ns cycle → 4.2 ns cycle
  • 1.20x is due to logic design and clock skew

[Figure: clock skew]

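The roughly 1.20x figure can be reproduced from the latch overhead example. A minimal sketch, assuming the 4.0 ns of logic per stage from the pipelining comparison stays fixed and only the per-cycle overhead changes; the variable names are illustrative.

```python
def cycle_ns(stage_logic_ns, overhead_ns):
    """Cycle time of one pipeline stage: logic delay plus latch/skew overhead."""
    return stage_logic_ns + overhead_ns


# Assumption: 4.0 ns of logic per stage, as in the pipelining comparison.
asic_cycle = cycle_ns(4.0, 1.0)         # ASIC overhead of 1.0 ns
custom_like_cycle = cycle_ns(4.0, 0.2)  # custom-style overhead of 0.2 ns
ratio = asic_cycle / custom_like_cycle

if __name__ == "__main__":
    print(f"{asic_cycle:.1f} ns vs {custom_like_cycle:.1f} ns: {ratio:.2f}x")
```

The ratio comes out at about 1.19, which the slide rounds to 1.20.
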
30
Can we improve ASIC logic design?
  • Add custom macros to ASIC library
  • Drawback: takes time to design macros
  • Reuse, amortizing design time
  • Limited by design overhead of macros that won't be reused
  • Designer must ensure ASIC description invokes predefined macros

[Figure: custom ingredients (adder, barrel shifter, register file, MAC); gap chart labeled 23.6]

31
Can we improve ASIC logic design?
  • Add custom macros to ASIC library
  • Drawback: takes time to design macros
  • Reuse, amortizing design time
  • Limited by design overhead of macros that won't be reused
  • Designer must ensure ASIC description invokes predefined macros

[Figure: custom ingredients (adder, barrel shifter, register file, MAC); gap chart labeled 23.6 and 1.20]

32
3. Floorplanning and Placement
  • Increase speed by avoiding cross-chip critical
    path wires
  • Place interconnected modules nearby
  • Long wires in ASICs due to poor final placement
    of modules
  • Impact of long wires (BACPAC): 0.25 um ASIC, 12 million transistors
  • With shorter wires design would be 1.25x faster

33
Can ASICs improve floorplanning?
  • Use good ASIC floorplanning tools
  • Improve tool recognition of similar structures
    that can be abutted and tiled
  • Do about as well as custom in this regard

[Figure: gap chart labeled 23.6]

34
Can ASICs improve floorplanning?
  • Use good ASIC floorplanning tools
  • Improve tool recognition of similar structures
    that can be abutted and tiled
  • Do about as well as custom (1.25x)

[Figure: gap chart labeled 23.6 and 1.25]

35
4. Transistor and Wire Sizing
  • Reduce gate and wire delays on critical path
  • Size gate output to drive load of fan-out gates
    and wires
  • Size up transistors to drive large loads
  • Widen wires to decrease resistance
  • Delay is proportional to resistance x capacitance

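To make the resistance x capacitance point concrete, here is a minimal sketch using the standard Elmore delay of a driver, one lumped wire segment (pi model) and a load. All component values are hypothetical, chosen only for illustration. The 1.3x capacitance penalty for a wire of twice the width is also an assumption, and the sketch ignores the extra input capacitance a larger driver presents to the previous stage.

```python
def elmore_delay(r_driver, r_wire, c_wire, c_load):
    """Elmore delay in seconds for driver -> lumped wire (pi model) -> load.

    Resistances in ohms, capacitances in farads.
    """
    return r_driver * (c_wire + c_load) + r_wire * (c_wire / 2 + c_load)


# Hypothetical weak driver: sizing up the transistors halves its resistance.
baseline = elmore_delay(1000.0, 200.0, 100e-15, 50e-15)
upsized = elmore_delay(500.0, 200.0, 100e-15, 50e-15)

# Hypothetical strong driver: widening the wire halves its resistance but
# is assumed to raise its capacitance by 1.3x.
narrow_wire = elmore_delay(100.0, 200.0, 100e-15, 50e-15)
wide_wire = elmore_delay(100.0, 100.0, 130e-15, 50e-15)

if __name__ == "__main__":
    for name, t in [("baseline", baseline), ("upsized driver", upsized),
                    ("narrow wire", narrow_wire), ("wide wire", wide_wire)]:
        print(f"{name}: {t * 1e12:.1f} ps")
```

In this toy model, widening the wire pays off only when wire resistance is a large share of the total; with a weak driver the added capacitance can make the path slower.
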
36
Impact of poor library design
  • ASIC standard cell library has discrete gate
    sizes
  • Some libraries used with insufficient range of
    gate drives
  • One or two sizes per cell
  • Few inverters, few buffers
  • Single gate polarities (less compact)
  • Tools for sizing wires in ASIC designs not
    available
  • After layout, resizing transistors knowing layout can give up to 20% improvement [Gavrilov 97]
  • Custom gains about a factor of 1.25 due to these problems.

37
Can we improve ASIC sizing?
ASIC libraries can be improved
  • Use library with dual polarities, several (e.g. 6) drive strengths per cell

[Figure: gap chart labeled 23.6]

38
Can we improve ASIC sizing?
ASIC libraries can be improved
  • Use library with dual polarities, several (e.g. 6) drive strengths per cell
  • Custom still about 1.05x better.
  • Iterative transistor sizing and resynthesis
  • Can improve speed by up to 20% [Gavrilov ICCAD97]

[Figure: gap chart labeled 23.6 and 1.20]

39
5. Dynamic Logic
  • Using dynamic logic on critical paths
  • Avoids slow p-transistor chains
  • Higher speed
  • Reduces area
  • Only pull down network, and charging transistors
  • Dynamic logic increases speed by about 1.50x [Nowka ICCD98]

[Figure: static CMOS gate with slow p-chain next to clocked domino logic gate; VDD and GND rails]

40
Dynamic Logic in ASICs?
  • Dynamic logic requires careful design
  • Glitching causes incorrect result
  • More susceptible to noise
  • Precharge power spike
  • Careful design of power supply for dynamic
  • Static CMOS is lower power
  • ASIC tools are unable to support dynamic logic
  • Dynamic logic libraries not available
  • Unable to use library driven static timing
    analysis
  • Interface of dynamic and static logic is
    complicated

[Figure: gap chart labeled 23.6]
Custom remains 1.50x better.

41
Dynamic Logic in ASICs?
  • Dynamic logic requires careful design
  • Glitching causes incorrect result
  • More susceptible to noise
  • Precharge power spike
  • Careful design of power supply for dynamic
  • Static CMOS is lower power
  • ASIC tools are unable to support dynamic logic
  • Dynamic logic libraries not available
  • Unable to use library driven static timing
    analysis
  • Interface of dynamic and static logic is
    complicated

[Figure: gap chart labeled 23.6, annotated "can't improve"]
Custom remains 1.50x better.

42
But Dynamic Logic in Custom?
  • Dynamic logic problems more pronounced in deep
    submicron
  • Power dissipation
  • Power consumption limited by supply
  • Heat dissipation limited by packaging
  • More noise
  • Higher frequencies cause more noise
  • More cross-talk noise as wires are closer
  • Longer design times than static CMOS
  • Prohibitive for progressively larger designs
  • Dynamic logic likely to lose its advantages by
    100 nm.
  • (Sorry, Mark)

[Figure: gap chart labeled 23.6]

43
Dynamic Logic in Custom?
  • Dynamic logic problems more pronounced in deep
    submicron
  • Power dissipation
  • Power consumption limited by supply
  • Heat dissipation limited by packaging
  • More noise
  • Higher frequencies cause more noise
  • More cross-talk noise as wires are closer
  • Longer design times than static CMOS
  • Prohibitive for progressively larger designs
  • Dynamic logic likely to lose its advantages by
    100 nm.
  • (Sorry, Mark)

[Figure: gap chart labeled 15.7]

44
6. Process Variation and Accessibility
  • ASIC libraries calculate worst case speeds for
    process
  • Speeds off a line may vary by 20% to 40%
  • Less variation in a mature process
  • Custom designs can down-bin the slower chips

[Figure: chips produced versus speed. Labels: ASIC worst case, worst process; good yield; fast custom, rest slower; 1.2; 1.4]

45
Process Variation and Accessibility
  • ASIC libraries calculate worst case speeds for
    process
  • Speeds off a line vary by 20% to 40%
  • Less variation in a mature process
  • Custom designs can down-bin the slower chips
  • Could run ASICs faster than worst case speeds, with high yield

[Figure: chips produced versus speed. Labels: ASIC worst case, worst process; acceptable ASIC yield; 1.2]

46
Process Variation and Accessibility
  • ASIC libraries calculate worst case speeds for
    process
  • Speeds off a line vary by 20% to 40%
  • Less variation in a mature process
  • Custom designs can down-bin the slower chips
  • Could run ASICs faster than worst case speeds, with high yield
  • Fabrication plants vary in speed by up to 25% (Tensilica Xtensa modeling)

[Figure: chips produced versus speed. Labels: ASIC worst case; Fab A; Fab B; 1.2]

47
Process Variation and Accessibility
  • ASIC libraries may not keep up with process
    improvements
  • Technology improvements
  • Intel 0.25 um 856 process had 18% speed improvement over the life of the process generation

[Figure: chips produced versus speed. Labels: ASIC worst case; acceptable ASIC yield; Fab A; Fab B; improved process; 1.2 x 1.18; 1.4. Note: ASIC libraries may lag technology improvements]

48
Process Variation and Accessibility
  • Total difference of 2.00x between
  • worst case ASIC speeds on worst process, with original library (lagging process improvements)
  • and fast custom with fully up-to-date technology

[Figure: chips produced versus speed. Labels: ASIC worst case; acceptable ASIC yield; Fab A; Fab B; fast customs, rest slower; 2.0. Note: ASIC libraries may lag technology improvements]

49
Process Variation and Accessibility
  • Can run ASICs faster than worst case speeds
  • Test what speed can run at with high yield
  • Improve speed by 30% to 40%
  • Xtensa can run at 250 MHz in 0.25 um
  • Choose good fabrication company
  • May be more expensive
  • 20% better than worst processes in technology (Tensilica Xtensa modeling)
  • Bottom line
  • ASICs in a slow process, at worst case speeds, lose factor of 2.00

[Figure: gap chart labeled 23.6]

50
Process Variation and Accessibility
  • Can run ASICs faster than worst case speeds
  • Test what speed can run at with high yield
  • Improve speed by 30% to 40%
  • Xtensa can run at 250 MHz in 0.25 um
  • Choose good fabrication company
  • May be more expensive
  • 20% better than worst processes in technology (Tensilica Xtensa modeling)
  • Bottom line
  • ASICs in a slow process, at worst case speeds, lose factor of 2.00
  • But ASICs in a good process, running at better than worst case speeds, get within 20% of custom (gain 1.66x relative through best practice)

[Figure: gap chart labeled 23.6 and 1.66]

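The process numbers can be checked with a little arithmetic. The slides do not spell out how the 2.00 total is composed, so the first product below is only one reading that is consistent with the figure labels (1.2, 1.18, 1.4); the variable names are illustrative. The second calculation is the ratio between the full 2.00 penalty and the 20% that remains under best practice.

```python
# Assumption: the three figure labels compound multiplicatively.
label_a = 1.20   # "1.2" on the speed axis
label_b = 1.18   # 18% improvement over the life of the process
label_c = 1.40   # "1.4" on the speed axis
total_process_gap = label_a * label_b * label_c

# Best practice leaves ASICs within 20% of custom, so the recoverable
# part of the 2.00 factor is 2.00 / 1.20.
worst_case_penalty = 2.00
remaining_gap = 1.20
recoverable_gain = worst_case_penalty / remaining_gap

if __name__ == "__main__":
    print(f"process gap from labels: {total_process_gap:.2f}x")
    print(f"recoverable by best practice: {recoverable_gain:.2f}x")
```

The first product is about 1.98, close to the quoted 2.00. The second is about 1.67, which the slide gives as 1.66.
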
51
Custom advantages over best ASIC practice
  • 1.20x: logic design and clock skew
  • 1.05x: clever sizing of transistors and wires
  • 1.50x (today): dynamic logic on critical paths
  • 1.20x: process variation and accessibility
  • Custom still 2.3x better!

52
Custom advantages over best ASIC practice
  • Custom advantages relative to best ASIC methods
  • 1.20x: logic design and clock skew
  • 1.05x: clever sizing of transistors and wires
  • 1.50x (today): dynamic logic on critical paths
  • 1.20x: process variation and accessibility

[Figure: gap chart labeled 2.3 and 1.5]
But custom only 1.5x faster at 100 nm if dynamic logic not viable.

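The 2.3 and 1.5 figures follow from the four remaining factors. A minimal sketch, again assuming the factors compound multiplicatively; the key names are shorthand for the slide bullets.

```python
from math import prod

# Custom advantages over best ASIC practice, as listed on the slide.
REMAINING = {
    "logic design and clock skew": 1.20,
    "transistor and wire sizing": 1.05,
    "dynamic logic": 1.50,
    "process variation and accessibility": 1.20,
}

with_dynamic = prod(REMAINING.values())
# Drop the dynamic logic factor to model the 100 nm case.
without_dynamic = prod(
    value for name, value in REMAINING.items() if name != "dynamic logic"
)

if __name__ == "__main__":
    print(f"custom vs best ASIC today: {with_dynamic:.1f}x")
    print(f"without dynamic logic:     {without_dynamic:.1f}x")
```

The products are about 2.27 and 1.51, which round to the 2.3 and 1.5 on the slide.
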
53
Another look (NB big is BAD!)
[Figure: chart of contributing factors labeled 4.20, 2.00, 1.50, 1.25]

54
Punch Line
  • ASIC performance lags custom by 8x
  • Attention typically focused on detailed circuit design and layout as primary reason
  • Our work indicates that
  • Architecture, logic design and clock skew: 1.20x to 5.00x,
  • And processing: 1.20x to 2.00x
  • play a much larger role
  • and custom circuit design and layout offer only about 1.30x
  • Dynamic logic is one other significant factor in why custom designs can do better: 1.50x
  • ASIC/custom gap will narrow further (x 1.20 1.50) if custom loses dynamic logic advantage in small geometries
  • Response to Bill: Yes, it's easy to make really slow ASICs if you have a critical path with long, unbuffered wires, even if you have a good architecture manufactured in a fast process.
