Power Optimal Dual-Vdd Buffered Tree Considering Buffer Stations and Blockages

Slides: 27

Provided by: edaEe

Learn more at: http://eda.ee.ucla.edu

1
Power Optimal Dual-Vdd Buffered Tree Considering
Buffer Stations and Blockages

King Ho Tam and Lei He
Electrical Engineering Department
University of California, Los Angeles
Sponsors: NSF CAREER, UC MICRO (Fujitsu, Intel and Mindspeed), and IBM Faculty Partner Award.

2
Motivation

Increasing interconnect power
35% of cells are buffers at 65nm technology [Saxena, TCAD 04]
Previous work
Power-optimal single-Vdd buffer insertion [Lillis, JSSC 96]
Delay-optimal buffered tree generation [Cong, DAC 00; Alpert, TCAD 02]
No existing algorithms consider dual Vdd for buffer insertion or buffered tree generation

3
Major Contributions

First in-depth study of dual Vdd buffer insertion
and buffered tree generation
Large power saving over single Vdd buffering
Efficient algorithms for power optimality
17x faster than [Lillis, JSSC 96] when single Vdd is considered

4
Outline

Dual Vdd buffer insertion and sizing (DVB)
Problem formulation
Sampling for speedup
Experimental results
Dual Vdd buffered tree generation (D-Tree)
Problem formulation
Improved augmented orthogonal search tree
Experimental results

5
Delay, Slew and Power Modeling

Elmore delay
Wire, buffer
Bakoglu's slew metric (ln 9 × Elmore delay)
Power: energy per switch
Wire
Lumped buffer dynamic/short-circuit power
Can be easily extended to leakage power
Low Vdd (VL) reduces leakage
Need to assume clock rate and switching activity
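
The models named on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the slide's own wire and buffer formulas did not survive in the transcript, so the code uses the textbook Elmore form for a uniform wire and a linear buffer model, and assumes 0.5·C·Vdd² for wire energy per switch. All function and parameter names are hypothetical.

```python
import math


def wire_delay(r_per_um, c_per_um, length_um, c_load):
    """Elmore delay of a uniform wire driving c_load (textbook form, assumed)."""
    r_wire = r_per_um * length_um
    c_wire = c_per_um * length_um
    # The wire's own capacitance is charged through half its resistance on average.
    return r_wire * (c_wire / 2.0 + c_load)


def buffer_delay(d_intrinsic, r_out, c_load):
    """Linear buffer model (assumed): intrinsic delay plus output resistance times load."""
    return d_intrinsic + r_out * c_load


def bakoglu_slew(elmore_delay):
    """Bakoglu's slew metric: ln 9 times the Elmore delay."""
    return math.log(9.0) * elmore_delay


def wire_switch_energy(c_wire, vdd):
    """Energy per switch of a wire, assuming the usual 0.5 * C * Vdd^2."""
    return 0.5 * c_wire * vdd ** 2
```

Because the energy term is quadratic in Vdd, lowering the supply of a wire segment from VH to VL scales its switching energy by (VL/VH)² in this sketch.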

6
Introducing Dual Vdd Buffering

Achieves power saving since power ∝ Vdd²
Suffers no loss of delay optimality
VL → VH (a VL buffer driving a VH buffer) requires a level converter (LC)
Restores voltage level and reduces leakage
Ext-CVS for logic [Srivastava, ISLPED 04]
LC delay and power overhead amortized

7
Key Observation in Dual Vdd Buffering

Disallowing VL → VH (VL buffers driving VH buffers) will not affect optimality
Optimality empirically illustrated (at 65nm)
(a) has LC and VH drives Cl; power (a) > (b)
Delay (b) > (a) only if Cl > 0.5pF (9mm wire)

[Figure: configurations (a) and (b), with buffers labeled VH and VL]

8
DVB Formulation

Dual Vdd Buffer Insertion (DVB)
Given: an interconnect tree
Find: buffer placement, Vdd assignment for buffers, and buffer sizes
Only VH buffers driving VL buffers within the tree
Level converters at VH sinks driven by VL buffers
Minimize power subject to:
Required arrival time (RAT) at the source
Slew rate constraint at buffer inputs and sinks

9
DVB Algorithm

Based on [Lillis, JSSC 96]
Dynamic programming with partial solution (option) pruning
Options must now record downstream Vdd levels for buffering
This prevents VL → VH (a VL buffer driving a VH buffer) and removes unnecessary search of the solution space
Still quite slow for large nets
Challenge
Considering power causes super-linear growth in the number of options (w.r.t. tree size)
Dual Vdd buffers: > 2x options at each node
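
A minimal sketch of option pruning with a recorded downstream Vdd level, assuming each option carries capacitance, power, required arrival time and a flag telling whether a VH buffer exists downstream. The names `Option`, `dominates`, `prune` and `can_insert` are hypothetical, and comparing only options with the same flag is a conservative simplification made here, not something the slide states.

```python
from dataclasses import dataclass

VL, VH = 0, 1  # supply levels of a candidate buffer


@dataclass(frozen=True)
class Option:
    cap: float     # downstream capacitance
    power: float   # accumulated power
    rat: float     # required arrival time (larger is better)
    has_vh: bool   # True if a VH buffer already exists downstream


def dominates(a, b):
    """a makes b redundant: no worse in cap, power and RAT, same Vdd flag (assumed rule)."""
    return (a.has_vh == b.has_vh and a.cap <= b.cap
            and a.power <= b.power and a.rat >= b.rat)


def prune(options):
    """Keep only non-dominated options (simple quadratic filter)."""
    kept = []
    for opt in options:
        if any(dominates(k, opt) for k in kept):
            continue
        kept = [k for k in kept if not dominates(opt, k)]
        kept.append(opt)
    return kept


def can_insert(buffer_vdd, opt):
    """Reject a VL buffer upstream of any VH buffer (no VL driving VH)."""
    return not (buffer_vdd == VL and opt.has_vh)
```

With this check in the dynamic program, a VL buffer is never tried above a subtree that already contains a VH buffer, which is the part of the search space the slide describes as unnecessary.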

10
Speed-up Technique

Approximate by power-delay sampling
Sampling under each distinct cap value
Uniformly pick options from the entire RAT-power trade-off curve
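
One way to picture the sampling step is the sketch below. It assumes options are plain `(cap, power, rat)` tuples and that "uniformly" means evenly spaced positions along each capacitance group's curve once sorted by power; the deck does not spell out the exact rule, so `sample_options` and its parameter `k` are hypothetical.

```python
from collections import defaultdict


def sample_options(options, k):
    """Keep at most k options per distinct cap value, evenly spaced along the curve."""
    if k < 2:
        raise ValueError("k must be at least 2 so both curve ends are kept")
    by_cap = defaultdict(list)
    for opt in options:
        by_cap[opt[0]].append(opt)

    sampled = []
    for cap in sorted(by_cap):
        curve = sorted(by_cap[cap], key=lambda o: o[1])  # order by power
        n = len(curve)
        if n <= k:
            sampled.extend(curve)
            continue
        # Evenly spaced indices from the first to the last point of the curve.
        picks = [round(i * (n - 1) / (k - 1)) for i in range(k)]
        sampled.extend(curve[i] for i in picks)
    return sampled
```

Sampling bounds the number of options kept per capacitance value by `k`, which is what keeps option growth in check at the cost of exactness.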

11
Experimental Settings for DVB

Testcases: randomly generated Steiner trees
20 to 800 terminals in a 1cm x 1cm routing area
Buffer sizes: 16x, 32x, 64x
Sampling grid set to 20x20
Comparison
Exact power-optimal algorithm (PB) [Lillis, JSSC 96]
Our algorithm with single (SVB) and dual (DVB) Vdd buffers

12
Sampling Preserves Optimality

Sampling has little impact on optimality
SVB follows PB closely
Still optimal delay, 1.7% larger power over PB

13
Dual Vdd Reduces Power

Dual Vdd shifts power-delay curve to the left

14
Experimental Results for DVB

DVB saves 23% power over SVB
More power saving in larger nets
Power saving becomes larger with delay slack
e.g. relax delay by 5%, saving becomes 26%

15
Runtime

SVB scales a lot better for larger testcases
Achieved 17x speedup over PB [Lillis, JSSC 96]
DVB takes 2.5x more runtime than SVB

16
Outline

Dual-Vdd Buffer insertion and sizing (DVB)
Problem formulation
Sampling speed-up technique
Experimental results
Dual-Vdd buffered tree generation (D-Tree)
Problem formulation
Improved augmented orthogonal search tree
Experimental results

17
D-Tree Formulation

Dual Vdd Buffered Tree (D-Tree)
Given: locations of terminals, buffer stations and blockages
Find: a rectilinear Steiner tree (RST) and buffer placement/size/Vdd assignment
VH buffers driving VL buffers only
Level converters at VH sinks driven by VL buffers
Minimize power subject to:
Required arrival time (RAT) at the source
Slew rate constraint at buffer inputs and sinks
D-Tree is NP-hard
Finding a minimum RST alone is NP-complete

18
Buffered Tree Construction

Delay optimization only [Cong, DAC 00], by:
Building a Hanan graph with buffer insertion nodes according to locations of buffer stations
Path search on the grid by option propagation
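
The grid construction can be illustrated as follows. This is a rough sketch, assuming the grid lines are taken from the x and y coordinates of both terminals and buffer stations and that blockages are ignored; `hanan_grid` and its return shape are hypothetical and not taken from [Cong, DAC 00].

```python
def hanan_grid(terminals, stations):
    """Build grid nodes and edges from terminal and buffer-station coordinates."""
    points = list(terminals) + list(stations)
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})

    nodes = [(x, y) for x in xs for y in ys]
    edges = []
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if i + 1 < len(xs):               # horizontal neighbour
                edges.append(((x, y), (xs[i + 1], y)))
            if j + 1 < len(ys):               # vertical neighbour
                edges.append(((x, y), (x, ys[j + 1])))

    # Buffers may only be inserted at nodes that coincide with a buffer station.
    buffer_nodes = set(stations)
    return nodes, edges, buffer_nodes
```

A path search would then propagate options along `edges` and consider adding a buffer only when it reaches a node in `buffer_nodes`.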

19
D-Tree Algorithm Overview

Challenges
Growth of options is exponential
An artifact of D-Tree's NP-hardness
Considering power worsens option growth
Solution: sampling and an efficient prune tree

20
Prune Tree in [Lillis, JSSC 96]

Options inserted in sorted capacitance order
Never need to clear options out of the tree if each new option is checked against the tree
Automatically avoids redundant options in the tree
e.g. new option (c = 20, p = 100, q = 600)
Not applicable to D-Tree problem
Order of new options is not known a priori

21
Our Improvement on Prune Tree

Indexing with capacitance results in fewer trees
Fewer distinct capacitance values than power values
Efficient tree cleaning
Enables out-of-order option insertion
Guarantees no redundancy in the tree

22
Tree Cleaning

To add a new option in O(c log(T)) time:
Check whether the new option is dominated by any option in the data structure
If not, remove options in the tree dominated by the new option in two downward tree traversals
e.g. new option (c = 10, p = 70, q = 410, )
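
The check-then-clean behaviour can be sketched with a much simpler structure than the deck's search tree. The code below swaps in a dictionary of sorted lists indexed by capacitance and uses Python's `bisect`; it does not reproduce the augmented orthogonal search tree or its stated complexity. `PruneStore` and its methods are hypothetical names, and options are reduced to `(c, p, q)`.

```python
import bisect


class PruneStore:
    """Non-dominated (power, rat) fronts, one per capacitance value."""

    def __init__(self):
        # cap -> list of (power, rat), sorted by power with rat strictly increasing
        self.buckets = {}

    def _dominated(self, c, p, q):
        for cap, front in self.buckets.items():
            if cap > c:
                continue
            # The entry with the largest power <= p has the best rat among them.
            i = bisect.bisect_right(front, (p, float("inf")))
            if i > 0 and front[i - 1][1] >= q:
                return True
        return False

    def insert(self, c, p, q):
        """Insert out of order; return False if the new option is redundant."""
        if self._dominated(c, p, q):
            return False
        # Cleaning: drop stored options that the new option dominates.
        for cap, front in self.buckets.items():
            if cap < c:
                continue
            i = bisect.bisect_left(front, (p, float("-inf")))
            j = i
            while j < len(front) and front[j][1] <= q:
                j += 1
            del front[i:j]
        bisect.insort(self.buckets.setdefault(c, []), (p, q))
        return True
```

Because every insert first checks for domination and then cleans, the store never holds a redundant option regardless of the order in which options arrive.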

23
Experimental Settings for D-Tree

Random testcases
All based on a random floorplan of 1cm x 1cm
Blockages 30, buffer stations 1mm apart
Comparison
Delay-optimal tree (RMP) [Cong, DAC 00]
Ours with single (S-Tree) and dual (D-Tree) Vdd buffers

24
Experimental Results for D-Tree

Significant power saving over RMP
S-Tree: 7%, D-Tree: 18%
Larger saving for large testcases (e.g. T4)
Handles up to 6-sink nets (T5 takes 23 mins)
Similar capability compared with delay-optimal approaches [Cong, DAC 00; Chen, ASP-DAC 02]

25
Conclusion