Title: Power-Aware Placement
1Power-Aware Placement
- Yongseok Cheon, Pei-Hsin Ho
- Advanced Technology Group, Synopsys, Inc.
- cheon,pho_at_synopsys.com
- Andrew B. Kahng, Sherief Reda and Qinke Wang
- UCSD CSE Department
- abk,sreda,qiwang_at_cs.ucsd.edu
2Outline
- Introduction
- Activity-based register clustering
- Activity-based net weighting
- Experiments
- Conclusions
3IC Power Consumption
- Switching power
- largest source of power dissipation
- usually accounts for 40% to 80% of total power
- switching power of a net is proportional to the
product of net capacitance and signal switching
rate
- Short-circuit power
- power dissipation due to the short-circuit current
that flows briefly during the switching of a CMOS
gate
- Leakage power
- power dissipation due to spurious currents in
the non-conducting state of a transistor
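The proportionality between switching power, net capacitance and switching rate can be illustrated with the standard CMOS dynamic power expression, 0.5 * C * Vdd^2 * (toggles per second). The sketch below is only an illustration: the supply voltage `vdd`, the function name and the three example nets are assumptions, not values from the slides.

```python
def net_switching_power(capacitance_f, switching_rate_hz, vdd=1.0):
    """Average switching power in watts of one net.

    Uses the standard expression 0.5 * C * Vdd^2 * (toggles per second).
    vdd is an assumed supply voltage, not a value from the slides.
    """
    return 0.5 * capacitance_f * vdd ** 2 * switching_rate_hz


# Hypothetical nets: (name, capacitance in farads, toggles per second)
nets = [
    ("clk", 2.0e-12, 1.0e9),
    ("data_bus", 0.5e-12, 2.0e8),
    ("reset", 1.0e-12, 1.0e3),
]

powers = {name: net_switching_power(c, sr) for name, c, sr in nets}

# Nets ordered from most to least switching power.
ranked = sorted(powers, key=powers.get, reverse=True)
```

Because the power is linear in capacitance, shortening a high-activity net saves more power than shortening a low-activity net by the same amount, which is the motivation for both techniques in the talk.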
4Clock Power Consumption
- Clock net
- a major contributor to dynamic power
- much larger capacitances than most signal nets
- highest switching activity
- typically consumes up to 40% of total dynamic
power across a variety of design types
- Traditional placement methodologies treat
registers no differently than combinational cells
- leads to sub-optimal placements in terms of power
5Power-Aware Placement Method
- Activity-based register clustering
- reduces capacitance of clock nets, hence clock
power
- Activity-based net weighting
- reduces capacitance of high-activity signal nets,
hence total net switching power
7Large Weight for Clock Net?
- Not a good idea
- May only affect registers close to boundaries
- Introduce hot spots and highly congested areas
8Distribution of Clock Tree Capacitance
- Observation: most of the clock tree capacitance
(e.g., 80%) is at the leaf level
9Register Clustering
- Goal: reduce the capacitance of a clock net
- Method: clump the registers within the same
leaf cluster of the clock tree into a smaller
area
- Result: reduced leaf-level clock tree capacitance
and potentially reduced clock skew
10Flow of Register Clustering
- Quick CTS algorithm: group registers into
clusters such that each cluster can become a leaf
cluster of the actual clock tree
- Group bounds: constrain the placement of a
cluster of registers within a smaller bounding box
11Quick Clock-Tree Synthesis Algorithm
- Decide a range of target cluster sizes
heuristically, based on
- size of the clock net
- design rule constraints: max fanout and max load
- user configuration
- Perform clustering for each direction (from left,
right, top and bottom) and each target cluster size
- Select the clustering with the best CTS objective
- e.g., minimum clock skew, minimum clock delay,
minimum clock buffers, etc.
12Quick CTS Algorithm (contd)
- Start with the leftmost (rightmost, highest or
lowest) un-clustered clock pin
- Add the clock pin with the shortest Manhattan
distance to the capacitance-weighted centroid of
the current cluster
- Grow until the target cluster size is reached
- Repeat growing clusters until all clock pins are
clustered
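The greedy growth steps above can be sketched as follows. This is a minimal illustration rather than the tool's implementation: the pin representation `(x, y, capacitance)`, the function name `quick_cluster`, and the tie-breaking by pin index are assumptions. Capacitances are assumed to be positive. The outer search over directions and target sizes, and the scoring by a CTS objective, are not shown.

```python
def quick_cluster(pins, target_size, direction="left"):
    """Greedily group clock pins into clusters of at most target_size.

    pins: list of (x, y, capacitance) tuples with capacitance > 0.
    Returns a list of clusters, each a list of pin indices.
    """
    # Seed selection key for each sweep direction (smallest key wins).
    seed_key = {
        "left": lambda i: pins[i][0],
        "right": lambda i: -pins[i][0],
        "bottom": lambda i: pins[i][1],
        "top": lambda i: -pins[i][1],
    }[direction]

    unclustered = set(range(len(pins)))
    clusters = []
    while unclustered:
        # Start with the extreme un-clustered pin in the sweep direction.
        seed = min(unclustered, key=lambda i: (seed_key(i), i))
        cluster = [seed]
        unclustered.remove(seed)
        while len(cluster) < target_size and unclustered:
            # Capacitance-weighted centroid of the current cluster.
            total_cap = sum(pins[i][2] for i in cluster)
            cx = sum(pins[i][0] * pins[i][2] for i in cluster) / total_cap
            cy = sum(pins[i][1] * pins[i][2] for i in cluster) / total_cap
            # Add the pin with the shortest Manhattan distance to it.
            nearest = min(
                unclustered,
                key=lambda i: (abs(pins[i][0] - cx) + abs(pins[i][1] - cy), i),
            )
            cluster.append(nearest)
            unclustered.remove(nearest)
        clusters.append(cluster)
    return clusters


# Two hypothetical groups of three registers each.
example_pins = [(0, 0, 1.0), (1, 0, 1.0), (0, 1, 1.0),
                (10, 10, 1.0), (11, 10, 1.0), (10, 11, 1.0)]
from_left = quick_cluster(example_pins, target_size=3, direction="left")
from_right = quick_cluster(example_pins, target_size=3, direction="right")
```

Running the sketch from different directions can yield different clusterings on less regular inputs, which is why the slides describe trying every direction and keeping the best result.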
13Group Bounds
- Control the bounding box of a cluster and reduce it
while still fitting the registers
- Compute the current bounding box of the registers
- Shrink the bounding box proportionally
- Shrink ratio p depends on
- specified shrinking factor p0
- switching rate of clock net SR and max switching
rate MSR
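A minimal sketch of the bounding-box step follows. The formula that derives p from p0, SR and MSR is not given in the text, so p is taken here as an input. The sketch also assumes that p scales both side lengths about the box center and keeps the aspect ratio unchanged; the check that the registers still fit is omitted. All names are hypothetical.

```python
def bounding_box(points):
    """Axis-aligned bounding box (x0, y0, x1, y1) of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def shrink_box(box, p):
    """Scale both sides of box by p about its center (assumed meaning of p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("shrink ratio p must be in (0, 1]")
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = p * (x1 - x0) / 2.0
    half_h = p * (y1 - y0) / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# Hypothetical register locations of one leaf cluster.
registers = [(0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)]
current = bounding_box(registers)
group_bound = shrink_box(current, p=0.8)
```

A placer would then treat `group_bound` as a region constraint for the cluster's registers, so a smaller p pulls them closer together.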
14Aspect Ratio of Bounding Box
- Close to the original bounding box aspect ratio
ARold when shrink ratio p is close to 1
- without seriously increasing signal net length
- Close to square when shrink ratio p is close
to 0
- reduced clock skew
- Linear function of original aspect ratio ARold
and shrink ratio p
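One linear form that meets both endpoint conditions (ARold at p = 1, square at p = 0) is a straight interpolation between 1 and ARold. The exact function is not given in the text, so the expression below is an assumption. The sketch also assumes the aspect ratio means width divided by height; the helper names are hypothetical.

```python
import math


def new_aspect_ratio(ar_old, p):
    """Assumed linear blend: ar_old at p = 1, square (1.0) at p = 0."""
    return p * ar_old + (1.0 - p) * 1.0


def box_dimensions(area, aspect_ratio):
    """Width and height of a box with the given area and width/height ratio."""
    height = math.sqrt(area / aspect_ratio)
    return aspect_ratio * height, height


# A 2:1 box keeps its shape for p = 1 and becomes square as p approaches 0.
unchanged = new_aspect_ratio(2.0, 1.0)
halfway = new_aspect_ratio(2.0, 0.5)
square = new_aspect_ratio(2.0, 0.0)
width, height = box_dimensions(8.0, 2.0)
```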
16Pros and Cons of Register Clustering
- Effectively reduces capacitance of the leaf-level
clock tree
- Increases the length of some signal nets
- which can cancel out the clock power reduction
17Activity-Based Net Weighting
- Goal: reduce capacitance of signal nets
- Assign larger weights to signal nets with
higher switching rates
- Combining register clustering and activity-based
net weighting further reduces the total net
switching power
18Activity-Based Net Weighting
- Assign larger weights to nets with higher
switching rates
- T: threshold for selecting high-activity nets
- MSSR: maximum signal net switching rate
- W: controls the range of power weights
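The weighting formula itself is not in the text. The sketch below is therefore a hypothetical weight function that only matches the roles of T, MSSR and W described above. It is an assumption, not the authors' formula. Nets at or below the threshold T keep weight 1, and the weight grows linearly up to 1 + W at the maximum rate MSSR.

```python
def power_weight(sr, t, mssr, w):
    """Hypothetical power weight of a signal net.

    sr:   switching rate of the net
    t:    threshold for selecting high-activity nets
    mssr: maximum signal net switching rate
    w:    range of the power weights (the weight lies in [1, 1 + w])
    """
    if sr <= t or mssr <= t:
        return 1.0  # low-activity net: no extra weight
    return 1.0 + w * (min(sr, mssr) - t) / (mssr - t)


# Hypothetical switching rates, normalized so that MSSR = 1.0.
low = power_weight(0.1, t=0.2, mssr=1.0, w=4.0)
mid = power_weight(0.6, t=0.2, mssr=1.0, w=4.0)
high = power_weight(1.0, t=0.2, mssr=1.0, w=4.0)
```

Under this assumed form, raising T restricts the extra weight to fewer nets, and raising W makes the placer shorten those nets more aggressively.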
19Compatibility with Timing Weights
- Linear combination of power and timing net
weights
- Power ratio a in [0, 1]
- controls the share of the power weight
- knob for trade-off between timing and power
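The combination can be sketched as a convex blend of the two weights. The text states only that the combination is linear and that a controls the share of the power weight. Giving the timing weight the remaining share (1 - a) is an assumption, and the function name is hypothetical.

```python
def combined_weight(power_w, timing_w, a):
    """Blend power and timing net weights with power ratio a in [0, 1].

    Assumes the timing weight receives the remaining share (1 - a).
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("power ratio a must be in [0, 1]")
    return a * power_w + (1.0 - a) * timing_w


# a = 0 gives pure timing-driven weights, a = 1 pure power-driven weights.
timing_only = combined_weight(5.0, 2.0, a=0.0)
power_only = combined_weight(5.0, 2.0, a=1.0)
blended = combined_weight(5.0, 2.0, a=0.8)
```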
21Experimental Setup
- Implemented in Synopsys IC Compiler
- Eight industry circuits
- cells: 20k to 186k
- registers: 2.3k to 44.2k
- clock power: 32% of total power
- net switching power: 39% of total power
- Power-aware placement
- shrink ratio and power ratio: around 0.8
22Experimental Flow
- Commercial IC implementation flow
- Power analysis: IC Compiler
- specified switching rates of primary inputs
- net switching rates estimated by probabilistic
simulation
23Clock Net Switching Power
24Total Net Switching Power
25Results
26Summary
- Reduction
- clock net switching power: 11.3% (1.6% to 34.5%)
- total net switching power: 25.3% (10.5% to 47.1%)
- total power: 11.4% (6.5% to 18.8%)
- clock WL: 10.1%
- clock skew: random
- Impact
- WNS (worst negative slack): 2.0%
- total cell area: 1.2%
- runtime: 11.5%
27Power-Timing Trade-Off with Power Ratio
28Power-Timing Trade-Off with Shrink Ratio
29Conclusions
- We have presented a power-aware placement method
that performs activity-based net weighting and
register clustering to reduce the capacitance of
high-activity signal and clock nets
- We have evaluated the method on eight real
designs through a complete industrial physical
design flow
- Our approach achieved average reductions of 25.3%
and 11.4% in net switching power and total power,
with 2.0% timing, 1.2% total cell area and 11.5%
runtime degradation
30Thank You !