Prediction of High-Performance On-Chip Global Interconnection


1
Prediction of High-Performance On-Chip Global
Interconnection
  • Yulei Zhang¹, Xiang Hu¹, Alina Deutsch², A. Ege Engin³
  • James F. Buckwalter¹, and Chung-Kuan Cheng¹
  • ¹Dept. of ECE, UC San Diego, La Jolla, CA
  • ²IBM T. J. Watson Research Center, Yorktown Heights, NY
  • ³Dept. of ECE, San Diego State Univ., San Diego, CA

2
Outline
  • Introduction
      • Technology trend
      • Current approaches
  • On-Chip Global Interconnection
      • Overview: structures, tradeoffs
      • Interconnect schemes
      • Global wire modeling
      • Performance analysis
      • Design methodologies for T-line schemes
  • Prediction of Performance Metrics
      • Experimental settings
      • Performance metrics comparison and scaling trend
          • Latency
          • Energy per bit
          • Throughput
          • Signal integrity
  • Conclusion

3
Introduction: Performance Impact
  • Interconnect delay determines system performance [ITRS08]
  • 542ps for a 1mm minimum-pitch Cu global wire without repeaters @ 45nm
  • 150ps for 10 levels of FO4 delay @ 45nm

[Ho2001] The Future of Wires
4
Introduction: Power Dissipation
  • Interconnects consume a significant portion of power
  • 1-2 orders of magnitude larger than that of gates
  • Half of the dynamic power is dissipated on repeaters inserted to minimize latency [Zhang07]
  • Wires consume 50% of the total dynamic power in a 0.13um microprocessor [Magen04]
  • About 1/3 of this is burned on the global wires.

5
Introduction: Different Approaches and Our Contributions
  • Different Approaches
  • Repeater Insertion Approach
  • Pros: high throughput density.
  • Cons: overhead in terms of power consumption and wiring complexity.
  • T-line Approach [Zhang09]
  • Pros: low latency.
  • Cons: low throughput density due to low bandwidth and large wire dimensions.
  • Equalized T-line Approach [Zhang08]
  • Pros: low power, low noise, higher throughput than single-ended T-lines.
  • Cons: area overhead of the passive components.
  • We explore different global interconnect structures and compare their performance metrics across multiple technology nodes.
  • Contributions
  • A simple linear model
  • A general design framework
  • A complete prediction and comparison

6
Organization of On-Chip Global Interconnections
7
Multi-Dimensional Design Consideration
  • Preliminary analysis results assuming a 65nm CMOS process.
  • Application-oriented choice
  • Low Latency: T-TL or UT-TL -> single-ended T-lines
  • High Throughput: R-RC
  • Low Power: PE-TL or UE-TL -> differential T-lines
  • Low Noise: PE-TL or UE-TL -> differential T-lines
  • Low Area/Cost: R-RC

For each architecture, the larger the area its pentagon covers, the better its overall performance.
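The pentagon comparison can be made quantitative by computing the area a radar chart encloses. The sketch below assumes five equally spaced axes (latency, throughput, power, noise, area/cost, following the application list above); the scores are hypothetical placeholders, not data from this work. Note that the area also depends on the axis ordering, so it is only a rough aggregate.

```python
import math

def radar_area(scores):
    """Area of the polygon traced by scores on equally spaced radar-chart axes."""
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three axes")
    wedge = math.sin(2 * math.pi / n)  # sine of the angle between adjacent axes
    # Sum of the n triangles between adjacent axes: 0.5 * r_i * r_(i+1) * sin(theta)
    return 0.5 * wedge * sum(scores[i] * scores[(i + 1) % n] for i in range(n))

# Hypothetical 1-5 scores on (latency, throughput, power, noise, area/cost)
example = {"structure_a": [5, 1, 3, 1, 4], "structure_b": [3, 3, 5, 5, 2]}
for name, s in example.items():
    print(name, round(radar_area(s), 2))
```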
8
On-Chip Global Interconnect Schemes (1)
  • R-RC structure
  • Repeater size / segment length
  • Adopt previous design methodology [Zhang07]
  • UT-TL structure
  • Full swing at wire-end
  • Tapered inverter chain as TX
  • T-TL structure
  • Optimize eye-height at wire-end
  • Non-Tapered inverter chain as TX

Repeated RC wires (R-RC)
Un-Terminated and Terminated T-Line (UT-TL and T-TL)
9
On-Chip Global Interconnect Schemes (2)
Un-Equalized and Passive-Equalized T-Line (UE-TL and PE-TL)
  • Driver side: tapered differential driver
  • Receiver side: termination resistance, sense amplifier (SA), inverter chain
  • Passive equalizer: parallel RC network
  • Design constraint: enough eye opening (50mV) needed at the wire end
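As a rough, lumped illustration of why a parallel RC network equalizes: if the T-line itself is ignored and the equalizer (Rd in parallel with Cd) sits in series between a driver of output resistance Rs and a termination RL, the voltage transfer is H(s) = RL / (Rs + RL + Rd/(1 + s·Rd·Cd)). Low frequencies are attenuated by the Rd divider while high frequencies bypass Rd through Cd, which counteracts the line's low-pass loss. The series topology and the neglect of the line are assumptions of this sketch; the component values are the ones quoted for the 50Gbps design point in the backup slides.

```python
import math

def equalizer_gain(f_hz, rs, rd, cd, rl):
    """|H(j*2*pi*f)| for Rs -> (Rd || Cd) -> RL, with the T-line ignored (lumped sketch)."""
    s = 1j * 2 * math.pi * f_hz
    z_eq = rd / (1 + s * rd * cd)  # impedance of the parallel RC equalizer
    return abs(rl / (rs + z_eq + rl))

def zero_pole_hz(rs, rd, cd, rl):
    """Zero set by Rd*Cd; pole set by Cd with Rd in parallel with (Rs + RL)."""
    f_zero = 1 / (2 * math.pi * rd * cd)
    r_par = rd * (rs + rl) / (rd + rs + rl)
    f_pole = 1 / (2 * math.pi * r_par * cd)
    return f_zero, f_pole

params = dict(rs=11.06, rd=350.0, cd=0.38e-12, rl=107.69)
print(f"DC gain: {equalizer_gain(0.0, **params):.3f}")
print(f"Gain at 25 GHz (Nyquist of 50Gbps): {equalizer_gain(25e9, **params):.3f}")
fz, fp = zero_pole_hz(**params)
print(f"zero ~ {fz / 1e9:.2f} GHz, pole ~ {fp / 1e9:.2f} GHz")
```

The gap between the DC gain and the high-frequency gain is the amount of boost the equalizer provides; in the real design the line loss sets how much of that boost is needed.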

10
Global Wire Modeling: Single-Ended and Differential On-Chip T-lines
  • Orthogonal layers are replaced by ground planes -> 2D capacitance extraction, accurate when the loading density is high.
  • Thick top-layer wires are used -> wire dimensions are maintained as technology scales.
  • LC-mode behavior is dominant

Determine the bit rate
  • Smallest wire dimensions that satisfy the eye constraint
  • Note that PE-TL needs narrower wires -> equalization helps increase density.

11
Global Wire Modeling: RC Wires and T-lines
  • RC wire modeling
  • Distributed π model composed of wire resistance and capacitance
  • Closed-form equations [Sim03] to calculate 2D wire capacitance
  • T-line 2D R(f)L(f)C parameter extraction
  • T-line modeling
  • R(f)L(f)C tabular model -> transient simulation to estimate eye height
  • Synthesized compact circuit model [Kopcsay02] -> study signal integrity issues
2D-C Extraction Template
2D-R(f)L(f) Extraction Template
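For the RC wires, the distributed π model can be discretized into N cascaded π sections, each with R/N in series and C/(2N) at each end. The sketch below computes the far-end Elmore delay of such a ladder, assuming an ideal source and no load for simplicity, using the 45nm global-wire values from the backup-slide table. For any N the result equals RC/2, the Elmore delay of the fully distributed line, which is one reason π sections are a convenient discretization.

```python
def pi_ladder_elmore(r_total, c_total, n_sections):
    """Far-end Elmore delay of an N-section pi ladder (ideal source, no load)."""
    r_seg = r_total / n_sections
    # Node capacitances: C/(2N) at both ends, C/N at each internal node
    caps = [c_total / n_sections] * (n_sections + 1)
    caps[0] = caps[-1] = c_total / (2 * n_sections)
    delay = 0.0
    for k, c_node in enumerate(caps):
        # In a ladder, the resistance shared between node k and the far-end
        # path is the full upstream resistance k * R/N
        delay += (k * r_seg) * c_node
    return delay

r_mm = 2.970e3    # global wire @ 45nm: 2.970 kohm/mm
c_mm = 0.179e-12  # global wire @ 45nm: 0.179 pF/mm
for n in (1, 4, 16):
    print(n, round(pi_ladder_elmore(r_mm, c_mm, n) * 1e12, 1), "ps for 1mm")
```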
12
Performance Analysis: Definitions
  • Normalized delay (unit: ps/mm)
  • Propagation delay includes wire delay and gate delay.
  • Normalized energy per bit (unit: pJ/m)
  • Bit rate is assumed to be the inverse of propagation delay for RC wires
  • Normalized throughput (unit: Gbps/um)
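The normalized metrics can be computed as below. The exact formulas did not survive extraction, so this sketch assumes the natural definitions implied by the units: delay divided by wire length, energy per bit divided by wire length, and bit rate divided by wire pitch. Function names and example numbers are hypothetical.

```python
def normalized_delay_ps_per_mm(delay_ps, length_mm):
    return delay_ps / length_mm

def energy_per_bit_pj_per_m(energy_pj, length_mm):
    return energy_pj / (length_mm * 1e-3)  # convert mm to m

def throughput_gbps_per_um(bit_rate_gbps, pitch_um):
    return bit_rate_gbps / pitch_um

def rc_bit_rate_gbps(delay_ps):
    # For RC wires the slide assumes bit rate = 1 / propagation delay;
    # 1/ps = 1e12 bps = 1000 Gbps
    return 1e3 / delay_ps

# Hypothetical 5mm wire: 275ps delay, 0.5pJ per bit, 10Gbps on a 2um pitch
print(normalized_delay_ps_per_mm(275.0, 5.0))   # ps/mm
print(energy_per_bit_pj_per_m(0.5, 5.0))        # pJ/m
print(throughput_gbps_per_um(10.0, 2.0))        # Gbps/um
```

Normalizing by length and pitch is what allows the 5mm experiments to be compared with per-mm figures such as the speed-of-light bound.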

13
Performance Analysis: Latency
  • Variables: technology-defined parameters
  • Supply voltage Vdd (unit: V)
  • Dielectric constant
  • Min-sized inverter FO4 delay (unit: ps)
  • R-RC structure (min-d)
  • is roughly constant
  • FO4 delay scales with scaling factor S
  • T-line structures
  • Sum of wire delay and TX delay
  • Wire delay
  • TX delay improves with FO4 delay

Decreasing w/ technology scaling!
Increasing w/ technology scaling!
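For LC-dominated T-lines, the wire-delay term is close to the time of flight, √εr / c per unit length, which depends only on the dielectric and not on the node. The sketch below evaluates it and adds a TX delay to form the T-line latency. The permittivity is an assumption (εr ≈ 2.25 reproduces the 5ps/mm speed-of-light figure quoted in the delay results), and the TX delay is a hypothetical input.

```python
import math

C0_MM_PER_PS = 299_792_458 * 1e3 / 1e12  # speed of light in mm/ps

def time_of_flight_ps_per_mm(eps_r):
    """LC-mode wire delay per mm: sqrt(eps_r) / c0."""
    return math.sqrt(eps_r) / C0_MM_PER_PS

def tline_delay_ps(length_mm, eps_r, tx_delay_ps):
    """T-line latency as the sum of wire (time-of-flight) delay and TX delay."""
    return length_mm * time_of_flight_ps_per_mm(eps_r) + tx_delay_ps

print(round(time_of_flight_ps_per_mm(1.0), 3))   # vacuum
print(round(time_of_flight_ps_per_mm(2.25), 3))  # assumed eps_r
print(round(tline_delay_ps(5.0, 2.25, 10.0), 1)) # 5mm line, hypothetical 10ps TX delay
```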
14
Performance Analysis: Energy per Bit
  • Same variables as defined before

Constant !
  • R-RC structure (min-d)
  • Vdd reduces as technology scales
  • reduces as technology scales
  • T-line structures
  • Sum of power consumed on wire and TX.
  • Power of T-line
  • Power of TX circuit
  • FO4 delay reduces exponentially

Energy decreases w/ technology scaling!
Energy decreases w/ larger slope!!
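A back-of-envelope check on the pJ/m scale used for R-RC: the dynamic energy to switch the wire capacitance is roughly α·Cw·Vdd² per unit length. The sketch uses the 45nm global-wire Cw from the backup table; Vdd = 1.0V and a switching-activity factor α = 0.5 are assumptions, and repeater capacitance is ignored, so this is a rough lower bound rather than the model on this slide.

```python
def wire_energy_pj_per_m(c_pf_per_mm, vdd_v, activity=0.5):
    """Dynamic switching energy per bit per meter of wire: activity * C * Vdd^2."""
    energy_pj_per_mm = activity * c_pf_per_mm * vdd_v ** 2  # pF * V^2 = pJ
    return energy_pj_per_mm * 1e3  # per mm -> per m

# 45nm global wire (Cw = 0.179 pF/mm), assumed Vdd = 1.0V and activity = 0.5
print(wire_energy_pj_per_m(0.179, 1.0))
```

Because the energy scales with Vdd², the drop in supply voltage with each node is the main lever on R-RC energy per bit.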
15
Performance Analysis: Throughput
  • Same variables defined before
  • R-RC structure (min-d)
  • Assuming wire pitch
  • FO4 delay reduces exponentially
  • T-line structures
  • TX bandwidth
  • Neglect the minor change of wire pitch
  • K1 = 0 for UT-TL
  • FO4 delay reduces exponentially

Throughput increases by 20% per generation!
Throughput increases by 43% per generation!!
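Per-generation rates compound across nodes. Going from 90nm to 22nm spans four generations (90, 65, 45, 32, 22nm), so the short sketch below shows how a 20% versus a 43% per-generation gain accumulates. It is plain arithmetic, not part of the model.

```python
def cumulative_gain(rate_per_gen, generations):
    """Overall improvement factor after compounding a per-generation rate."""
    return (1.0 + rate_per_gen) ** generations

nodes = ["90nm", "65nm", "45nm", "32nm", "22nm"]
gens = len(nodes) - 1
for rate in (0.20, 0.43):
    print(f"{rate:.0%} per generation -> {cumulative_gain(rate, gens):.2f}x over {gens} generations")
```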
16
Design Framework for On-Chip T-line Schemes
  • Proposed framework can be applied to design
    UT-TL/T-TL/UE-TL/PE-TL by changing wire
    configuration and circuit structure.
  • Different optimization routines (LP/ILP/SQP, etc.) can be adopted according to the problem formulation.

17
Experimental Settings
  • Design objective: min-d
  • Technology nodes: 90nm-22nm
  • Five different global interconnect structures
  • Wire length: 5mm
  • Parameter extraction
  • 2D field solver CZ2D from the IBM EIP tool suite
  • Tabular model or synthesized model
  • Transistor models
  • Predictive transistor model from [Uemura06]
  • Synopsys level 3 MOSFET model tuned according to the ITRS roadmap
  • Simulation
  • HSPICE 2005
  • Modeling and optimization
  • Linear or non-linear regression/SQP routine
  • MATLAB 2007

18
Performance Metric: Normalized Delay Results and Comparison
  • Technology trends
  • R-RC: increasing
  • T-line schemes: decreasing
  • T-line structures
  • Outperform R-RC beyond 90nm
  • Single-ended: lowest delay
  • At 22nm node
  • R-RC: 55ps/mm
  • T-lines: 8ps/mm (85% reduction)
  • Speed of light: 5ps/mm
  • Linear model
  • < 6% average percent error
19
Performance Metric: Normalized Energy per Bit Results and Comparison
  • Technology trends
  • R-RC and T-lines: decreasing
  • T-lines decrease more quickly
  • T-line structures
  • Outperform R-RC beyond 45nm
  • Differential: lowest energy
  • Single-ended: similar to R-RC
  • T-TL > UT-TL
  • At 22nm node
  • R-RC: 100pJ/m
  • Single-ended: 60% reduction
  • Differential: 96% reduction
  • Linear model
  • < 12% average percent error
  • Errors for T-TL and PE-TL stem from RL and the passive equalizers

20
Performance Metric: Normalized Throughput Results and Comparison
  • Technology trends
  • R-RC and T-lines: increasing
  • T-lines increase more quickly
  • T-line structures
  • Outperform R-RC beyond 32nm
  • Differential better than single-ended
  • At 22nm node
  • R-RC: 12Gbps/um
  • T-TL: 30% improvement
  • UE-TL: 75% improvement
  • PE-TL: 2X of R-RC
  • Linear model
  • < 7% average percent error

21
Signal Integrity: Single-Ended T-lines
Worst-case switching pattern for peak noise simulation
Using worst-case pattern
Using single or multiple PRBS patterns
  • UT-TL structure
  • 380mV peak noise at 1V supply voltage with 7ps rise time
  • SI could become a big issue as supply voltage drops
  • T-TL is less sensitive to noise
  • At the same rise time, 50% reduction of peak noise
  • Peak noise ? as technology scales
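PRBS stimuli like those above are usually generated with a linear-feedback shift register. The slide does not say which PRBS order was used; this sketch assumes PRBS7 (polynomial x^7 + x^6 + 1, a common choice), whose sequence repeats every 2^7 − 1 = 127 bits and contains 64 ones and 63 zeros per period.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # feedback from stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits

pattern = prbs7(254)
print(pattern[:16])
```

Longer patterns (e.g., PRBS15 or PRBS31) expose longer run lengths and therefore more inter-symbol interference; the same LFSR structure applies with a different tap set.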

22
Signal Integrity: Differential T-lines
Worst-case switching pattern for peak noise simulation
  • More reliable
  • Termination resistance
  • Common-mode noise reduction
  • Peak noise
  • Within 10mV range
  • Eye heights
  • UE-TL
  • Eye shrinks as the bit rate increases
  • Harder to meet the constraint.
  • PE-TL
  • > 70mV eye even at 22nm node
  • Equalization does help!
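Eye heights like these can be measured from a simulated waveform by sampling every unit interval (UI) at a fixed phase, taking the gap between the lowest "1" sample and the highest "0" sample, and keeping the best phase. This is a simplified vertical-eye measure assumed for illustration (no jitter or bathtub analysis); the waveform is assumed to be already aligned so that each UI starts at a multiple of `samples_per_ui`, for example after removing the channel delay from a PRBS-driven transient simulation.

```python
def eye_height(samples, bits, samples_per_ui):
    """Best vertical eye opening over all sampling phases within one UI.

    Returns -inf if the pattern lacks either ones or zeros.
    """
    if len(samples) < len(bits) * samples_per_ui:
        raise ValueError("waveform shorter than the bit pattern")
    best = float("-inf")
    for phase in range(samples_per_ui):
        ones = [samples[i * samples_per_ui + phase] for i, b in enumerate(bits) if b]
        zeros = [samples[i * samples_per_ui + phase] for i, b in enumerate(bits) if not b]
        if ones and zeros:
            best = max(best, min(ones) - max(zeros))
    return best

# Hypothetical differential waveform: +/-40mV NRZ, 4 samples per UI
bits = [1, 0, 1, 1, 0, 0, 1, 0]
wave = [0.04 if b else -0.04 for b in bits for _ in range(4)]
opening = eye_height(wave, bits, 4)
print(f"eye height = {opening * 1e3:.1f} mV, meets 50mV: {opening >= 0.05}")
```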

23
Conclusion
  • Compared five different global interconnects in terms of latency, energy per bit, throughput, and signal integrity from 90nm to 22nm.
  • Provided a simple linear model to link
  • Architecture-level performance metrics
  • Technology-defined parameters
  • Some observations from experimental results
  • T-line structures have the potential to replace R-RC at future nodes
  • Differential T-lines are better than single-ended ones
  • Low power / high throughput / low noise
  • Equalization could be utilized for on-chip global interconnection
  • Higher throughput density, improved signal integrity
  • Even with lower energy dissipation (passive equalization)

24
  • Thank you!
  • Q&A

25
  • Backup Slides

26
Introduction: Technology Trend
  • On-chip interconnect scaling
  • Dimensions shrink
  • Wire resistance increases -> RC delay
  • Increasing capacitive coupling -> delay, power, noise, etc.
  • Performance of global wires decreases with technology scaling.

Wire Category   Parameter       90nm     45nm     22nm
M1 wire         Rw (kohm/mm)    1.914    8.860    34.827
M1 wire         Cw (pF/mm)      0.183    0.157    0.129
Global wire     Rw (kohm/mm)    0.532    2.970    11.000
Global wire     Cw (pF/mm)      0.205    0.179    0.151
Scaling trend of per-unit-length (PUL) wire resistance and capacitance
Copper resistivity versus wire width
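The per-unit-length values in the table quantify why global-wire performance degrades with scaling: the RC product per mm² grows roughly 15x from 90nm to 22nm. The sketch also applies the common 0.38·R·C·L² estimate for the 50% delay of an unrepeated distributed line; it assumes an ideal driver and no load, so the absolute numbers are only indicative.

```python
# Global-wire values from the table: (Rw in kohm/mm, Cw in pF/mm)
GLOBAL_WIRE = {"90nm": (0.532, 0.205), "45nm": (2.970, 0.179), "22nm": (11.000, 0.151)}

def rc_ps_per_mm2(r_kohm_per_mm, c_pf_per_mm):
    return r_kohm_per_mm * c_pf_per_mm * 1e3  # kohm * pF = ns -> ps

def unrepeated_delay_ps(r_kohm_per_mm, c_pf_per_mm, length_mm):
    """50% delay of a distributed RC line (ideal driver, no load): 0.38 * R * C * L^2."""
    return 0.38 * rc_ps_per_mm2(r_kohm_per_mm, c_pf_per_mm) * length_mm ** 2

for node, (r, c) in GLOBAL_WIRE.items():
    print(node, round(rc_ps_per_mm2(r, c), 1), "ps/mm^2,",
          round(unrepeated_delay_ps(r, c, 1.0), 1), "ps for 1mm")
```

The quadratic dependence on length is what repeater insertion breaks up, at the cost of the repeater power discussed in the introduction.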
27
Design methodology: single-ended T-lines
  • Wire model: 2D frequency-dependent tabular model
  • Design variables: inverter size, number of stages, Rload (if any)
  • Circuit: single-ended inverter chains
  • Evaluation: SPICE simulation
  • Optimization routine: (1) optimal cycle time; (2) sweep for the optimal inverter chain
  • Verification: SPICE simulation to check in-plane crosstalk, etc.
28
Design methodology: differential T-lines
  • Wire model: 2D frequency-dependent tabular model
  • Design variables: wire width, driver impedance, RC equalizer (if any), termination resistance
  • Circuit: differential lines with SA-based TX
  • Evaluation: closed-form equation-based model
  • Optimization routine: (1) binary search for wire width; (2) SQP for the other variables
  • Verification: SPICE simulation to check in-plane crosstalk, etc.
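Step (1) of the differential-line routine, the binary search for wire width, can be sketched as follows. It assumes the eye height increases monotonically with width over the search interval; the eye model passed in is a hypothetical stand-in for the closed-form model, and the 50mV threshold is the eye constraint stated for the differential schemes.

```python
import math

def min_width_for_eye(eye_mv, w_lo, w_hi, eye_min_mv=50.0, tol=1e-3):
    """Smallest width in [w_lo, w_hi] with eye_mv(width) >= eye_min_mv (monotone eye assumed)."""
    if eye_mv(w_hi) < eye_min_mv:
        raise ValueError("constraint not met even at the widest allowed wire")
    if eye_mv(w_lo) >= eye_min_mv:
        return w_lo
    while w_hi - w_lo > tol:
        mid = 0.5 * (w_lo + w_hi)
        if eye_mv(mid) >= eye_min_mv:
            w_hi = mid  # feasible: try a narrower wire
        else:
            w_lo = mid  # infeasible: need a wider wire
    return w_hi  # w_hi is always feasible

def toy_eye(width_um):
    """Hypothetical eye height (mV) versus wire width (um), for illustration only."""
    return 120.0 * (1.0 - math.exp(-width_um / 1.5))

print(round(min_width_for_eye(toy_eye, 0.1, 10.0), 3))
```

Step (2) would then optimize the remaining variables at that width with an SQP solver; SciPy's `scipy.optimize.minimize(..., method="SLSQP")` is one option in Python, though the slides report using MATLAB.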
29
Effects of driver impedance and termination resistance
  • Lowering driver impedance improves the eye
  • Eye shrinks as frequency goes up
  • There is an optimal termination resistance

30
Effects of driver impedance and termination resistance on step response
  • Optimal Rload
  • Larger driver impedance leads to a slower rising edge and lower saturation voltage
  • Larger termination resistance causes a sharper rising edge but larger reflection
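Two first-order relations help explain these trends. At the receiver, the reflection coefficient Γ = (RL − Z0)/(RL + Z0) grows as RL rises above the line impedance Z0, producing the larger reflection. The final (DC) value of the step response follows the resistive divider Vs·RL/(Rs + Rseries + RL), so a larger driver impedance Rs lowers the saturation voltage. The Z0 value is hypothetical; Rs, Rd, and RL are taken from the 50Gbps design point in the backup slides, with Rd treated as extra series resistance at DC.

```python
def reflection_coeff(r_load, z0):
    """Voltage reflection coefficient at a resistive termination."""
    return (r_load - z0) / (r_load + z0)

def saturation_voltage(vs, rs, r_load, r_series=0.0):
    """DC end value of the step response; r_series covers line DC resistance or equalizer Rd."""
    return vs * r_load / (rs + r_series + r_load)

z0 = 100.0  # hypothetical line impedance (ohm)
for rl in (50.0, 100.0, 200.0):
    print(rl, round(reflection_coeff(rl, z0), 3))
print(round(saturation_voltage(1.0, 11.06, 107.69), 3))         # no series equalizer
print(round(saturation_voltage(1.0, 11.06, 107.69, 350.0), 3))  # with Rd = 350 ohm in series
```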

31
Crosstalk effects
  • Three different PRBS input patterns, min-ddp solutions
  • T-line Scheme A: delay increased by 9.6%, power increased by 37%
  • T-line Scheme B: delay increased by 2%, power increased by 25.7%

32
Transceiver Design
  • Sense amplifier (SA)
  • Double-tail latch-type [Schinkel07]
  • Optimize sizing to minimize SA delay
  • Inverter chain
  • Number of stages: fixed to 6
  • Sizing of each inverter
  • RS: output resistance of the inverter chain
  • Sweep the 1st inverter size to minimize the total transceiver delay for given Veye and RS
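The inverter-chain sizing can be illustrated with a geometrically tapered chain. With the number of stages fixed to 6 as above, the sketch sizes each stage by a constant ratio between a chosen first-stage size and a last-stage size set by the target RS, assuming RS scales as r_unit / size. The values of r_unit and the target RS are hypothetical, and the actual sweep over the first-stage size (which needs the SA and line delay models) is not reproduced.

```python
def tapered_chain(first_size, last_size, stages=6):
    """Geometrically tapered inverter sizes, in units of a minimum-sized inverter."""
    ratio = (last_size / first_size) ** (1.0 / (stages - 1))
    return [first_size * ratio ** i for i in range(stages)]

def last_size_for_rs(target_rs_ohm, r_unit_ohm):
    """Size whose output resistance r_unit / size equals the target RS (assumed model)."""
    return r_unit_ohm / target_rs_ohm

r_unit = 10e3                          # hypothetical min-inverter output resistance (ohm)
last = last_size_for_rs(50.0, r_unit)  # hypothetical target RS of 50 ohm
for first in (1.0, 2.0, 4.0):          # candidate first-stage sizes to sweep
    print(first, [round(s, 1) for s in tapered_chain(first, last)])
```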

Double-tail latch-type voltage sense amplifier: transistor sizes (W/L) @ 45nm tech node
  M1/M3     45nm/45nm
  M2/M4     250nm/45nm
  M5/M6     180nm/45nm
  M7/M8     280nm/45nm
  M9        495nm/45nm
  M10/M11   200nm/45nm
  M12       1.58um/45nm
33
Transceiver Modeling
  • Driver side
  • Voltage source Vs with output resistance Rs
  • Vs: full-swing pulse signal with rise time Tr = 0.1Tc
  • Rs: output resistance of the last inverter in the chain
  • Receiver side
  • Extract a look-up table for TX delay and power
  • Fit the table using non-linear closed-form formulas
  • The relative error of the fitting models is within 2%

Histogram of fitting errors at 45nm node
Transceiver delay map at 45nm node
Transceiver power map at 45nm node
34
Bit rate = 50Gbps, Rs = 11.06ohm, Rd = 350ohm, Cd = 0.38pF, RL = 107.69ohm
35
Conclusion (cont.)

Low-Latency Application (score/value, ps/mm)
Scheme   90nm    65nm    45nm    32nm    22nm
R-RC     3/35    1/42    1/46    1/55    1/55
UT-TL    5/15    5/13    5/10    5/9     5/8
T-TL     5/15    5/13    5/10    5/9     5/8
UE-TL    1/37    3/25    3/16    3/12    5/8
PE-TL    1/37    3/25    3/16    3/12    5/8

Low-Energy Application (score/value, pJ/m)
Scheme   90nm    65nm    45nm    32nm    22nm
R-RC     2/150   2/140   1/130   1/100   1/100
UT-TL    3/140   3/110   3/70    3/50    2/40
T-TL     1/260   1/200   2/100   2/60    3/40
UE-TL    4/60    4/36    4/20    4/10    5/4
PE-TL    5/26    5/16    5/8     5/5     5/2

High-Throughput Application (score/value, Gbps/um)
Scheme   90nm    65nm    45nm    32nm    22nm
R-RC     5/5     5/6     3/8     3/10    2/12
UT-TL    2/3.3   1/3.3   1/3.3   1/3.3   1/3.3
T-TL     1/3     2/3.4   2/6     2/9     3/16
UE-TL    3/3     3/5     4/9     4/13    4/21
PE-TL    4/4     4/5.3   5/9     5/15    5/24

Low-Noise Application (score)
Scheme   90nm    65nm    45nm    32nm    22nm
R-RC     1       1       1       1       1
UT-TL    1       1       1       1       1
T-TL     3       3       3       3       3
UE-TL    5       5       4       4       4
PE-TL    4       4       5       5       5

Each cell gives score/value. The higher the score, the better the structure on the given metric; the maximum score is 5.
36
Future Works
  • Explore novel global signaling schemes for high
    throughput and low energy dissipation.
  • Design and optimize > 50Gbps on-chip interconnection schemes
  • Architecture-level study to identify trade-offs
  • Wire configuration
  • Dimension optimization, ground plane, etc.
  • Un-interrupted architectures
  • Equalization implementation, TX/RX choice
  • Distributed architectures
  • Active or Passive compensation (RC equalizers,
    other networks, etc)
  • Novel high-speed transceiver circuitry design
  • Develop analysis and optimization capability to
    aid co-design and co-optimization of wire and
    transceiver circuit
  • Fabrication to verify analysis and demonstrate
    feasibility

37
Related Publications
Repeated RC Wire
  • L. Zhang, H. Chen, B. Yao, K. Hamilton, and C. K. Cheng, "Repeated On-Chip Interconnect Analysis and Evaluation of Delay, Power and Bandwidth Metrics under Different Design Goals," IEEE International Symposium on Quality Electronic Design, 2007, pp. 251-256.
Un-Terminated/Terminated T-Line
  • Y. Zhang, L. Zhang, A. Deutsch, G. A. Katopis, D. M. Dreps, J. F. Buckwalter, E. S. Kuh, and C. K. Cheng, "Design Methodology of High Performance On-Chip Global Interconnect Using Terminated Transmission-Line," IEEE International Symposium on Quality Electronic Design, 2009, pp. 451-458.
Passive-Equalized T-Line
  • Y. Zhang, L. Zhang, A. Tsuchiya, M. Hashimoto, and C. K. Cheng, "On-Chip High Performance Signaling Using Passive Compensation," IEEE International Conference on Computer Design, 2008, pp. 182-187.
  • Y. Zhang, L. Zhang, A. Deutsch, G. A. Katopis, D. M. Dreps, J. F. Buckwalter, E. S. Kuh, and C. K. Cheng, "On-Chip Bus Signaling Using Passive Compensation," IEEE Electrical Performance of Electronic Packaging, 2008, pp. 33-36.
  • L. Zhang, Y. Zhang, A. Tsuchiya, M. Hashimoto, E. Kuh, and C. K. Cheng, "High Performance On-Chip Differential Signaling Using Passive Compensation for Global Communication," Asia and South Pacific Design Automation Conference, 2009, pp. 385-390.
Overview and Comparison
  • Y. Zhang, X. Hu, A. Deutsch, A. E. Engin, J. F. Buckwalter, and C. K. Cheng, "Prediction of High-Performance On-Chip Global Interconnection," ACM Workshop on System Level Interconnection Prediction, 2009.