1
A Comprehensive Look at System Level Modeling

Ken Rose, Bibiche Geuskens, Ramon Mangaser, Christopher Mark
Center for Integrated Electronics and Electronics Manufacturing
Department of Electrical, Computer and Systems Engineering
Rensselaer Polytechnic Institute, Troy, NY 12180-3590
rosek@rpi.edu, 518.276.2981
2
RIPE
Rensselaer Interconnect Performance Estimator
RIPE 3.0 models are described in Modeling
Microprocessor Performance by B. Geuskens and K.
Rose, Kluwer, 1998. RIPE is available for use
online at http://latte.cie.rpi.edu/ripe.html.
RIPE was developed with partial support from IBM
and SRC.
3
Co-Authors
  • Bibiche Geuskens (RIPE 1.0, 2.0, 3.0), Ph.D. June 1997,
    Intel Corporation, Hillsboro, Oregon
  • Ramon Mangaser (RIPE 3.1, 4.0, 4.1), Ph.D. Nov. 1999,
    Sun Microsystems, Chelmsford, Massachusetts
  • Christopher Mark (RIPE 4.2), Ph.D. Sep. 2000,
    Intel Corporation, Hillsboro, Oregon
4
RIPE Genesis
  • H.B. Bakoglu, Circuits, Interconnections, and Packaging for
    VLSI, Addison-Wesley, 1990.
  • SUSPENS model coded in RIPE 1.0
  • G. A. Sai-Halasz, Proc. IEEE, 83/1, p. 20, 1995.
  • Basis for RIPE 2.0

5
RIPE 3.0 Inputs and Outputs
6
RIPE 3.0 Sample Benchmark (DEC Alpha 21164)
RIPE INPUTS
System Parameters
  Chip Area (cm2)               2.99
  Number of Transistors (M)     9.3
  SRAM (KBytes)                 112
  Signal I/O                    294
  (Logic Depth 14, 15)
Technology Parameters
  Feature Size (µm)             0.5
  Number of Wire Levels         4
  Power Supply (V)              3.3
Data: W.J. Bowhill et al., Dig. Tech. Journal, ISSCC 1996
7
Cycle Time Estimation Model (Ch. 7)
Sai-Halasz (1995) Sakurai (1993)
8
RC Interconnect Parameters (Ch. 3)
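Interconnect Resistance (3.1)
$R = \rho_{eff} \, l_{int} / (A \, w_{int}^2)$, where $A$ is the aspect ratio
Interconnect Capacitance (3.2)
$C = 2 (C_V + C_L) \approx 2 \, \varepsilon_{eff} \, \varepsilon_0 \, l_{int} \, w_{int} \, (1/T_{ILD} + A/S_{wire})$
$T_{ILD}$: thickness of interlevel dielectric; $S_{wire}$: spacing between wires
Yang (1998)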
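As a rough illustration of equations (3.1) and (3.2), the sketch below evaluates both per centimetre of wire. The resistivity, permittivity, and geometry values are assumptions chosen only for the example, and the function names are hypothetical (they are not RIPE identifiers).

```python
EPS0 = 8.854e-14  # vacuum permittivity in F/cm


def wire_resistance(rho_eff, l_int, w_int, aspect):
    """Eq. (3.1): R = rho_eff * l_int / (A * w_int^2).

    The wire thickness is taken as A * w_int, so the cross section is A * w_int^2.
    """
    return rho_eff * l_int / (aspect * w_int ** 2)


def wire_capacitance(eps_eff, l_int, w_int, aspect, t_ild, s_wire):
    """Eq. (3.2): C = 2 * eps_eff * eps0 * l_int * w_int * (1/T_ILD + A/S_wire)."""
    return 2.0 * eps_eff * EPS0 * l_int * w_int * (1.0 / t_ild + aspect / s_wire)


# Assumed example: 0.32 um wide wire, aspect ratio 1, spacing and ILD
# thickness equal to the width, rho_eff = 3.3e-6 ohm*cm, eps_eff = 3.9.
w = 0.32e-4  # cm
r_per_cm = wire_resistance(rho_eff=3.3e-6, l_int=1.0, w_int=w, aspect=1.0)
c_per_cm = wire_capacitance(eps_eff=3.9, l_int=1.0, w_int=w, aspect=1.0,
                            t_ild=w, s_wire=w)
print(f"r = {r_per_cm:.0f} ohm/cm, c = {c_per_cm * 1e12:.2f} pF/cm")
```

Because resistance scales with 1/w² at fixed aspect ratio while the parallel-plate capacitance per unit length stays roughly constant under uniform scaling, the sketch makes it easy to see why narrow lower-level wires dominate RC delay.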
9
Transistor Count and Area Models (Ch. 4)
Processor Logic, Memory, and I/O Buffers are
treated separately
                Transistors   Area
Alpha 21164     9.3 M         299 mm2
Memory          6.7 M         101 mm2
I/O             --------      17 mm2
Random Logic    2.6 M         181 mm2
Model quantities: Gates, Transistors, Average Logic Gate Size, Logic Area
10
Logic Wireability (Ch. 5)
$R(N_g, p)$: average interconnect length in gate pitches,
based on Rent's rule for the number of pins, $N_p = K_p N_g^p$
$l_w$: long wire length, $l_w = 2 (A_{logic})^{1/2}$
$N_w$: number of long wires, $N_w = \frac{f_g}{f_g + 1} N_{p,total}$,
where $N_{p,total}$ is the total number of pins for functional blocks
and $f_g$ is the average logic gate fanout.
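The three wireability relations can be sketched as small functions. The Rent coefficient, Rent exponent, fanout, and pin count below are assumed values for illustration only; the logic area of 1.81 cm2 is the random-logic area quoted on the previous slide.

```python
import math


def rent_pins(k_p, n_gates, p):
    """Rent's rule: N_p = K_p * N_g^p."""
    return k_p * n_gates ** p


def long_wire_length(a_logic):
    """Long wire length: l_w = 2 * sqrt(A_logic), i.e. two logic-block edges."""
    return 2.0 * math.sqrt(a_logic)


def num_long_wires(f_g, n_p_total):
    """Number of long wires: N_w = f_g / (f_g + 1) * N_p,total."""
    return f_g / (f_g + 1.0) * n_p_total


# Assumed values: K_p = 2, p = 0.5, fanout 3, 400 block pins in total.
pins = rent_pins(k_p=2.0, n_gates=10_000, p=0.5)
l_w = long_wire_length(a_logic=1.81)        # cm, for A_logic in cm2
n_w = num_long_wires(f_g=3.0, n_p_total=400.0)
print(pins, round(l_w, 3), n_w)
```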
11
Device Parameters (Ch. 6)
We need values for the transistor resistance and
capacitance, Rdr and Cdr. These models have been
superseded in RIPE 4.0.
Cycle Time Estimation Model (Ch. 7)
$T_{cycle} = (f_{ld} - 1) T_{g,avg} + 2 T_{g,inv} + \text{time\_of\_flight}$,
where $f_{ld}$ is the logic depth
12
Power Dissipation (Ch. 8)
  • $P_{tot} = f_d C_{tot} V_{dd} V_{swing} f_c + I_{sc} V_{dd} + I_{leak} V_{dd}$
  • $\approx \sum_i (f_{d,i} C_{sw,i}) V_{dd}^2 f_c$
  • where $f_d$ is the activity factor.
  • 1. random logic: $f_d C_{sw,rl}$
  • 2. clock distribution: $f_{d,clk} C_{sw,clk}$
  • 3. memory: $f_d C_{sw,mem}$
  • 4. interconnections: $f_d C_{sw,int}$
  • 5. off-chip drivers: $f_d C_{sw,dr}$
  • For the Alpha 21164, $f_{d,clk} = 0.75$ and $f_d = 0.15$,
    based on published details.
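A minimal sketch of the approximate power sum follows. The activity factors, supply voltage (3.3 V), and clock frequency (300 MHz) are the Alpha 21164 values quoted in these slides; the switched capacitances are made-up placeholders, so the resulting wattage is illustrative and not a RIPE result.

```python
def dynamic_power(components, vdd, f_clock):
    """P ~= sum_i(f_d,i * C_sw,i) * Vdd^2 * f_c.

    components maps a name to (activity factor, switched capacitance in F).
    """
    switched = sum(f_d * c_sw for f_d, c_sw in components.values())
    return switched * vdd ** 2 * f_clock


# Capacitances below are assumed placeholders, not published data.
components = {
    "random logic":       (0.15, 4.0e-9),
    "clock distribution": (0.75, 4.0e-9),
    "memory":             (0.15, 2.0e-9),
    "interconnections":   (0.15, 6.0e-9),
    "off-chip drivers":   (0.15, 2.0e-9),
}

p_total = dynamic_power(components, vdd=3.3, f_clock=300e6)
p_clock = dynamic_power({"clk": components["clock distribution"]},
                        vdd=3.3, f_clock=300e6)
print(f"total {p_total:.1f} W, clock share {p_clock / p_total:.0%}")
```

Even with equal capacitance, the high clock activity factor makes clock distribution the largest single term in this toy breakdown.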

13
RIPE 3.0 Sample Benchmark (DEC Alpha 21164)
                            RIPE Results    Actual     RIPE Results
                            Al/SiO2                    Cu/SiO2
Memory transistors          6.73 M          7.2 M      6.73 M
Memory area                 1.01 cm2        1.02 cm2   1.01 cm2
Pad ring area               0.16 cm2        0.17 cm2   0.16 cm2
Clock frequency             291 MHz         300 MHz    373 MHz
Power dissipation           52 W            50 W       66 W
Power, clock distribution   21 W            20 W       27 W
14
RIPE 3.0 Benchmark Results
Processor                          Chip Parameter            Actual   RIPE
Alpha 21164 (0.5 µm CMOS)          Clock frequency (MHz)     300      290
Alpha 21164 (0.5 µm CMOS)          Power dissipation (W)     50       52
Alpha 21164 (0.5 µm CMOS)          Number of metal levels    4        4
Pentium (0.6 µm BiCMOS)            Clock frequency (MHz)     150      152
Pentium (0.6 µm BiCMOS)            Power dissipation (W)     15-20    19
Pentium (0.6 µm BiCMOS)            Number of metal levels    4        4
PowerPC 604 (0.5 µm static CMOS)   Clock frequency (MHz)     150      150
PowerPC 604 (0.5 µm static CMOS)   Power dissipation (W)     18       18
PowerPC 604 (0.5 µm static CMOS)   Number of metal levels    4        4
15
RIPE Simulation Modes RIPE 3.0 to RIPE 4.0
Figure: RIPE simulation modes.
Performance Estimator (RIPE 3.0, -n and -d modes): Wiring Strategy in; Clock Frequency, Power, Wireability out.
Wiring Allocator (RIPE 4.0, -aw mode): Clock Frequency in; Wiring Strategy out.
16
Intel Wiring Distribution Model
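  • $\Delta \mathrm{Nets} / \Delta L_{nets} \approx B \, L_{nets}^{b}$, $b = -1.65$
  • $\mathrm{Nets} \approx A \cdot (\mathrm{Transistors})$, $A \approx 0.25$
  • S. Yang, MRS Symposium on Advanced Interconnects,
    April 1998.
  • $\mathrm{Nets} = \frac{B}{b+1} (L_{max}^{b+1} - L_{min}^{b+1})$
  • $\mathrm{Demand} = \frac{B}{b+2} (L_{max}^{b+2} - L_{min}^{b+2})$
  • We have taken $L_{max} = 2 (\mathrm{Logic\_Area})^{1/2}$
  • and solved the above equations for $B$ and $L_{min}$.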
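One possible way to solve the Nets and Demand equations for B and Lmin is sketched below. It is not taken from RIPE: the bisection approach, the function names, and the example numbers are assumptions. The idea is that Demand/Nets (the mean net length) does not depend on B, so Lmin can be found first by bisection and B recovered afterwards.

```python
def nets_and_demand(B, l_min, l_max, b=-1.65):
    """Integrals of B*L^b and B*L^(b+1) between l_min and l_max."""
    nets = B / (b + 1) * (l_max ** (b + 1) - l_min ** (b + 1))
    demand = B / (b + 2) * (l_max ** (b + 2) - l_min ** (b + 2))
    return nets, demand


def solve_B_lmin(nets, demand, l_max, b=-1.65, iterations=200):
    """Find (B, L_min) reproducing the given net count and wiring demand."""
    target_mean = demand / nets  # mean net length, independent of B

    def mean_length(l_min):
        n, d = nets_and_demand(1.0, l_min, l_max, b)
        return d / n

    lo, hi = l_max * 1e-12, l_max * (1.0 - 1e-9)
    for _ in range(iterations):
        mid = (lo * hi) ** 0.5            # bisect on a logarithmic scale
        if mean_length(mid) < target_mean:
            lo = mid                      # mean grows with l_min: move up
        else:
            hi = mid
    l_min = (lo * hi) ** 0.5
    unit_nets, _ = nets_and_demand(1.0, l_min, l_max, b)
    return nets / unit_nets, l_min


# Round-trip check with assumed values (lengths in cm).
true_B, true_l_min, l_max = 1.0e6, 1.0e-4, 2.0
n, d = nets_and_demand(true_B, true_l_min, l_max)
B_fit, l_min_fit = solve_B_lmin(n, d, l_max)
print(B_fit, l_min_fit)
```

In practice Nets would come from the transistor count (Nets ≈ 0.25 × Transistors) and Lmax from the logic area, leaving Demand as the remaining input.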

17
Algorithm for RIPE 4.0 Cycle-Time Based Wiring
Allocation
  1. Set the input clock frequency and logic depth.
  2. Use RIPE's critical path model to estimate total
    average delay, including gate and wire delay.
  3. Determine the maximum allowable long wire delay
    by subtracting the total average delay from the
    target cycle time.
  4. Allocate wires using this maximum total long wire
    delay as a constraint, but allowing a maximum
    number of repeaters.
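Steps 3 and 4 can be sketched as follows. This is an illustrative outline under stated assumptions, not RIPE's allocator: the repeated-wire delay model (`toy_delay`), the 50 ps repeater delay, the 1.9 ns average delay, and the wire length are all hypothetical, while the 450 MHz clock and the level-4 r/c values are taken from tables elsewhere in these slides.

```python
def long_wire_delay_budget(f_clock_hz, total_average_delay_s):
    """Step 3: target cycle time minus the total average delay."""
    budget = 1.0 / f_clock_hz - total_average_delay_s
    if budget <= 0.0:
        raise ValueError("average delay already exceeds the target cycle time")
    return budget


def fewest_repeaters(delay_with_repeaters, budget_s, max_repeaters):
    """Step 4 (sketch): smallest repeater count that meets the budget."""
    for n in range(max_repeaters + 1):
        if delay_with_repeaters(n) <= budget_s:
            return n
    return None  # this wire cannot be placed on this level


def toy_delay(n, r=365.0, c=2.4e-12, length=1.23, t_rep=50e-12):
    """Assumed model: n repeaters split the wire into n + 1 equal segments,
    each with distributed RC delay 0.377*r*c*(length/(n+1))^2, plus t_rep
    per repeater. Units: ohm/cm, F/cm, cm, s."""
    return 0.377 * r * c * length ** 2 / (n + 1) + n * t_rep


budget = long_wire_delay_budget(450e6, 1.9e-9)
n_rep = fewest_repeaters(toy_delay, budget, max_repeaters=3)
print(f"budget {budget * 1e12:.0f} ps, repeaters needed: {n_rep}")
```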

18
Modifying the Cycle-Time Model for RIPE 4.0
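$T_{cycle} = f_{ld} T_{avg} + T_{long} + \text{time\_of\_flight}$
$T_{avg} = 0.377 (r_{int} c_{int} l_{int}^2) + 0.693 [ R_{gout} (C_{gout} + f_g C_{gin}) + R_{gout} \frac{f_g + 1}{2} c_{int} l_{int} + r_{int} \frac{f_g + 1}{2} l_{int} C_{gin} ]$
$T_{long} = 0.377 (r_{int} c_{int} l_{long}^2) + 0.693 [ R_{gout} (C_{gout} + C_{gin}) + R_{gout} c_{int} l_{long} + r_{int} l_{long} C_{gin} ]$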
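The delay expressions can be written out as below. The grouping of the three lumped terms under the single 0.693 factor follows the usual Sakurai form and is an assumption about the slide's original layout; the device and wire values in the example are placeholders.

```python
def t_avg(r_int, c_int, l_int, R_gout, C_gout, C_gin, f_g):
    """Average gate-plus-wire delay with fanout f_g."""
    k = (f_g + 1.0) / 2.0
    distributed = 0.377 * r_int * c_int * l_int ** 2
    lumped = (R_gout * (C_gout + f_g * C_gin)
              + R_gout * k * c_int * l_int
              + r_int * k * l_int * C_gin)
    return distributed + 0.693 * lumped


def t_long(r_int, c_int, l_long, R_gout, C_gout, C_gin):
    """Delay of a gate driving one long wire into a single gate input."""
    distributed = 0.377 * r_int * c_int * l_long ** 2
    lumped = (R_gout * (C_gout + C_gin)
              + R_gout * c_int * l_long
              + r_int * l_long * C_gin)
    return distributed + 0.693 * lumped


def t_cycle(f_ld, tavg, tlong, time_of_flight):
    """T_cycle = f_ld * T_avg + T_long + time_of_flight."""
    return f_ld * tavg + tlong + time_of_flight


# Assumed device values: R_gout = 1 kohm, C_gout = C_gin = 2 fF.
dev = dict(R_gout=1.0e3, C_gout=2.0e-15, C_gin=2.0e-15)
ta = t_avg(891.0, 2.6e-12, 0.01, f_g=3.0, **dev)   # 0.1 mm average wire
tl = t_long(158.0, 2.3e-12, 1.0, **dev)            # 1 cm long wire
print(f"T_cycle = {t_cycle(14, ta, tl, 0.0) * 1e9:.2f} ns")
```

Note that `t_long` is the special case of `t_avg` with a fanout of one, which gives a quick consistency check on the two formulas.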
19
RIPE 4.0 Benchmark Results
Processor                          Chip Parameter            Actual   RIPE
Alpha 21164 (0.5 µm CMOS)          Clock frequency (MHz)     300      278
Alpha 21164 (0.5 µm CMOS)          Power dissipation (W)     50       57
Alpha 21164 (0.5 µm CMOS)          Number of metal levels    4        4
Pentium (0.6 µm BiCMOS)            Clock frequency (MHz)     100      113
Pentium (0.6 µm BiCMOS)            Power dissipation (W)     15-20    19
Pentium (0.6 µm BiCMOS)            Number of metal levels    4        4
PowerPC 604 (0.5 µm static CMOS)   Clock frequency (MHz)     133      134
PowerPC 604 (0.5 µm static CMOS)   Power dissipation (W)     18       20
PowerPC 604 (0.5 µm static CMOS)   Number of metal levels    4        4
20
Katmai Wiring Strategy Calculated by RIPE 4.0
Level   Pitch (x 0.64 µm)   rint (Ω/cm)   cint (pF/cm)   Lmax (mm)
1       1.0                 3451          2.37           0.006
2-3     1.45                891           2.61           4.4
4       2.5                 365           2.40           12.3
5       4.0                 158           2.34           20.5

Level   Repeaters for Lmax   Level Wiring Efficiency   Total Wiring Efficiency
1       0                    0.02                      0.02
2-3     0                    0.30                      0.18
4       2                    0.50                      0.23
5       3                    0.52                      0.25
21
RIPE Inclusions
  • BEOL Yield
  • Signal Integrity
  • Electromigration
  • Cache Memory Performance
  • Repeater Insertion
  • Interconnect Inductance
  • Accurate MOSFET Models

22
BEOL Yield in RIPE
  • Critical Area
  • Cube law distribution of defect sizes
  • Poisson distribution of faults
  • $Y_{total} = e^{-\lambda_{open}} e^{-\lambda_{short}}$
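The Poisson yield expression is simple to evaluate. In the sketch below the fault counts per level are invented numbers, and multiplying per-level yields to get a chip-level figure is an assumption (it holds if faults on different levels are independent).

```python
import math


def beol_yield(lambda_open, lambda_short):
    """Y = exp(-lambda_open) * exp(-lambda_short) for one wiring level."""
    return math.exp(-lambda_open) * math.exp(-lambda_short)


def total_yield(levels):
    """Product of per-level yields; levels is a list of (opens, shorts)."""
    result = 1.0
    for lambda_open, lambda_short in levels:
        result *= beol_yield(lambda_open, lambda_short)
    return result


# Assumed average fault counts for three metal levels.
levels = [(0.02, 0.03), (0.01, 0.02), (0.005, 0.005)]
y = total_yield(levels)
print(f"BEOL yield: {y:.3f}")
```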

23
Katmai (250 nm Pentium III) Transition to 180nm
Technology
  • Katmai Shrink (Katmai-180)
    - number of transistors: 9.5M
    - chip size: 1.23 → 0.62 cm2
    - clock frequency: 600 → 850 MHz
    - metal layers: 5 → 6
    - 4 wiring domains
  • Katmai Shrink and Doubling (Katmai2)
    - number of transistors: 19M
    - chip size: 1.24 cm2
    - clock frequency: 850 MHz
    - metal layers: 10
    - 9 wiring domains

24
Contributions of Different Metal Levels to Random
Defect Yields for Katmai and Katmai2
25
Signal Integrity Limits
Sakurai (1993)
26
Vp Comparison between SPICE, Sakurai Model, and
the Modified HP Model for Deschutes (250 nm
Pentium II)
Metal Levels   Line Length (mm)   SPICE (mV)   Sakurai (mV)   Error (%)   Modified HP (mV)   Error (%)
M2-M3          0.01               1.3          643            Big         1.25               4
M2-M3          6                  403          643            60          436                8
M4             6                  300          578            93          314                5
M4             10                 369          578            57          399                8
M5             12                 321          532            66          335                4
M5             21                 382          532            39          411                8
27
Cache Memory Performance
We assume that the cycle time is defined by the
logic subsystem. Calculated cache access times
greater than this cycle time will be flagged and
reported by RIPE. RIPE will then assume that the
cache requires multiple clock cycles for proper
operation. RIPE 4.1 implements the model of Wada
et al. (1992) IEEE JSSC, 27, p. 1147. It can be
linked to the more accurate CACTI model of Wilton
and Jouppi (1996) IEEE JSSC, 31, p. 677.
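The flagging rule described above amounts to a ceiling division. The sketch below is only an illustration of that rule; the function name and the example timings are hypothetical.

```python
import math


def cache_cycles(access_time_s, cycle_time_s):
    """Return (cycles needed, flagged) for a cache access.

    The cache is flagged when its access time exceeds the logic cycle
    time, in which case it is assumed to take multiple clock cycles.
    """
    cycles = max(1, math.ceil(access_time_s / cycle_time_s))
    return cycles, cycles > 1


slow = cache_cycles(3.0e-9, 2.2e-9)   # access slower than one cycle
fast = cache_cycles(1.5e-9, 2.2e-9)   # access fits in one cycle
print(slow, fast)
```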
28
Inductance in RIPE 4.2
  • RIPE has good estimates of wire capacitance (per
    unit length) [Geuskens and Rose, 1998; Mangaser
    (Ph.D. thesis), 1999]
  • Estimate wire inductance from wire capacitance
    - Assume homogeneous medium and TEM mode
      propagation
  • Inductance analysis performed in two steps
  • Identification of wiring levels with significant
    inductance effects
    - Incorporate Ismail's formulas for an inductance
      figure of merit (FOM) to define upper and lower
      bounds for wire lengths that are susceptible to
      inductance effects on each wiring level
    - Use constant RC values to estimate rise times
      needed in FOM
  • Optimization of inductance-susceptible levels
    - Revert to wire pitch from the last previous
      wiring level without inductance effects
    - Given long-wire delay constraint, use Ismail's
      RLC-based formulas to determine maximum wire
      length (per level)
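A sketch of the first step is given below. For TEM propagation in a homogeneous medium, the per-unit-length inductance and capacitance satisfy l·c = ε_r/c0², so l follows directly from c. The length window uses the commonly cited form of the Ismail, Friedman, and Neves figure of merit, t_r/(2√(lc)) < length < (2/r)√(l/c); whether RIPE 4.2 uses exactly this form is an assumption, as are the rise times and ε_r = 3.9. The r and c values are the level-5 numbers from the Katmai table, converted to SI units.

```python
import math

C0 = 2.998e8  # speed of light in vacuum, m/s


def inductance_per_length(c_per_m, eps_r):
    """TEM, homogeneous dielectric: l * c = eps_r / c0^2 (H/m from F/m)."""
    return eps_r / (C0 ** 2 * c_per_m)


def inductance_length_window(r, l, c, t_rise):
    """Wire lengths (m) for which inductance matters, or None if no such range.

    r, l, c are per-metre values; t_rise is the driving-signal rise time.
    """
    lower = t_rise / (2.0 * math.sqrt(l * c))
    upper = (2.0 / r) * math.sqrt(l / c)
    return (lower, upper) if lower < upper else None


c = 2.34e-10   # F/m  (2.34 pF/cm)
r = 1.58e4     # ohm/m (158 ohm/cm)
l = inductance_per_length(c, eps_r=3.9)
slow_edge = inductance_length_window(r, l, c, t_rise=100e-12)
fast_edge = inductance_length_window(r, l, c, t_rise=20e-12)
print(f"l = {l * 1e9 / 100:.2f} nH/cm", slow_edge, fast_edge)
```

With the assumed 100 ps rise time the window is empty, so inductance could be ignored on that level; a faster edge opens a range of susceptible lengths.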

29
RIPE 4.2 wire level projections using
Cu/low-K(2)
  • Using ITRS99 scaling trends
  • Using RPI and Bohr scaling trends with ITRS99
    clock frequencies
  • ITRS99 scaling trends for MOSFETs, chip size
    and transistor counts are overly aggressive !!

30
A Constant RC Input-Signal-Transition-Inherent
(CRISTI) gate delay model
Figure: constant RC model of an inverter chain, with supply Vdd, pull-up resistances Rpu1, Rpu2, Rpu3, pull-down resistances Rpd1, Rpd2, Rpd3, and node capacitances Cnode1, Cnode2, Cnode3.
For Inverter 2
(assuming ?r??f??)
31
Previous approaches to estimating constant RC
values
  • Resistance

Equations (1) and (2)
32
Two general methods of determining constant RC
values
  • Method 1
    - Given a full set of SPICE parameters, determine
      R and C from SPICE simulations of inverter chains
    - Use actual gates, not step or ramp inputs, to
      drive inverters under investigation → better
      characterization of RC values
    - Use a constant RC input-signal-transition-inherent
      gate delay model for inverters
  • Method 2
    - Given limited MOSFET information, determine R
      and C from the CV/I metric
    - Use this method to project RC values for deep
      sub-micron CMOS technologies
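Method 2 can be illustrated with the CV/I metric, where the delay figure is τ = C·Vdd/Idsat. Reading this as a constant RC product with R = Vdd/Idsat is an assumption made for this sketch, and the capacitance and drive current below are placeholder values.

```python
def cv_over_i(c_load, vdd, i_dsat):
    """CV/I delay metric in seconds."""
    return c_load * vdd / i_dsat


def effective_resistance(vdd, i_dsat):
    """Assumed constant resistance implied by CV/I: R = Vdd / Idsat."""
    return vdd / i_dsat


# Placeholder device: 2 fF load, 1.8 V supply, 600 uA saturation current.
c_load, vdd, i_dsat = 2.0e-15, 1.8, 600e-6
tau = cv_over_i(c_load, vdd, i_dsat)
r_eff = effective_resistance(vdd, i_dsat)
print(f"R = {r_eff:.0f} ohm, tau = {tau * 1e12:.1f} ps")
```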
33
C-IRSIM
  • CRISTI model for inverters was extended to
    multi-transistor (>2) logic gates
  • 3-input NAND gates used initially
  • Focus placed on transistors in series stacks
    - Relative topological position and relative
      turn-on order
  • These combined features determine the appropriate
    R and C value for each transistor in a series
    stack
  • Ignoring these features leads to significant
    errors in delay estimation relative to SPICE
  • Elmore delay terms included with ?RC term to
    account for distributed RC effects in complex
    gates
  • CRISTI incorporated into IRSIM → C-IRSIM

34
C-IRSIM simulation examples
  • 1056-transistor, 6-bit Dadda multiplier circuit
    in 0.18 µm technology

35
Significance of good device models
  • Selected cycle-time components from RIPE 4.2
  • Fraction of cycle time consumed by total logic
    delay can be relatively large (0.5-0.66) →
    devices cannot be neglected altogether
  • Small change in device delay → potentially big
    change in total wiring levels

36
Conclusions
  • Reasonable estimates can be made of
    microprocessor performance on the basis of
    limited information.
  • Models should be robust with a limited
    number of arbitrary fitting parameters.
  • Interconnect limitations constrain design
    and manufacture.

37
RIPE 4.0 Sample Benchmark
Intel's Deschutes (Pentium II) processor
RIPE INPUTS
System Parameters
  Circuit Area (cm2)            1.31
  Number of Transistors (M)     7.5
  SRAM cells (µm2)              10.26
  SRAM (Kbytes)                 32
  Signal I/O                    242
Technology Parameters
  Technology Generation (µm)    0.25
  LGATE (µm)                    0.18
  Num. of wire levels           5 (Aluminum)
  Core Supply (V)               1.8
Wire Parameters
  Pitch (µm)                    0.64, 0.93, 0.93, 1.60, 2.56
  rint (Ω/cm)                   3451, 891, 891, 365, 158
  cint (pF/cm)                  2.4, 2.6, 2.6, 2.4, 2.3
                                RIPE RESULTS   ACTUAL
  Clock Frequency (MHz)         459            450
  Power Dissipation (W)         18.7           18.9
38
Wiring strategy results from RIPE 4.1 for a
100nm, Cu/low-K(2) technology using
RPI/Bohr/ITRS99 scaling
  • No inductance analysis
  • Repeaters chosen to maximize chip wireability

39
Wiring strategy results from RIPE 4.2 for a
100nm, Cu/low-K(2) technology using
RPI/Bohr/ITRS99 scaling
  • Inductance analysis performed
  • Repeaters again chosen to maximize chip
    wireability
  • Compromise between maximizing chip wireability
    and minimizing RLC delay
  • Wire inductance reduces the effect of wire
    resistance
  • Smaller wire pitches but longer wire lengths
  • Reduction in total number of wire levels