Title: RTL Power Optimization with Gate-level Accuracy
1RTL Power Optimization with Gate-level Accuracy
Qi Wang Cadence Design Systems, Inc
Sumit Roy Calypto Design Systems, Inc
2Outline
- Introduction
- Literature Review
- Proposed Approach
- Experimental Results
- Conclusion
- Future work
3RTL Power Optimization Techniques
- Clock Gating
  - Shut off the clock signal when the outputs of the driven registers are not used.
  - Reduces the dynamic power dissipated by the clock tree network and the registers.
- Sleep Mode (Operand Isolation)
  - Shut off combinational blocks when the outputs of the blocks are not used.
  - Useful for designs with many datapath blocks and low average switching activity.
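To make the sleep-mode idea concrete, the following Python sketch counts the bit toggles seen at the input of a datapath block with and without isolation. It is only an illustration: the toggle count as a stand-in for dynamic power, the AND-style gating that forces the operand to zero, the sample data, and the names `toggles` and `isolate` are all assumptions, not taken from the slides.

```python
def toggles(values, width=8):
    """Count bit flips between consecutive values on a bus."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(values, values[1:]))


def isolate(values, enables):
    """AND-style isolation: force the operand to 0 whenever the result is unused."""
    return [v if en else 0 for v, en in zip(values, enables)]


# Hypothetical operand stream and enable signal (1 = block output is used).
operand = [0x00, 0xFF, 0x0F, 0xF0, 0xAA, 0x55, 0x00, 0xFF]
enable = [1, 0, 0, 0, 0, 0, 0, 1]

raw = toggles(operand)                     # activity without isolation
gated = toggles(isolate(operand, enable))  # activity with isolation
print(raw, gated)
```

In this made-up trace the block output is used in only two cycles, so isolation removes most of the input activity. The gating logic itself adds delay and power, which the sketch does not model.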
4Example of Sleep Mode Transformation
5Challenges
- Timing Closure
- Achieving Power Saving
- Identifying Complex Enable Function
[Figure: example datapath with inputs a and b, enable signals en1 and en2, and output out]
6Literature Review
- H. Kapadia et al., Reducing Switching Activity on Datapath Buses with Control-Signal Gating, IEEE Journal of Solid-State Circuits, Vol. 34, No. 3, March 1999, pp. 405-414.
- S. Dey et al., Controller-Based Power Management for Control-Flow Intensive Designs, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 18, No. 10, October 1999, pp. 1496-1508.
- M. Munch et al., Automating RT-Level Operand Isolation to Minimize Power Consumption in Datapaths, Proceedings of Design and Test Automation Conference in Europe, Mar. 2000, pp. 624-631.
7Limitations of Previous Work
- Insert gating logic before timing optimization
- Poor accuracy for power estimation at RTL
- Poor accuracy for timing analysis at RTL
- Problems for timing closure
- Timing constraints may be violated
- Inserted logic may shift critical path
- Simply undoing transformations is not enough
- Possible loops between RTL and timing optimization to achieve timing closure
- Results in long run time and poor quality of results (QoR)
- Power may be increased
8Proposed Approach
- Objective
  - A robust solution that meets the challenges of both timing closure and power requirements for nanometer designs.
- Two-step approach
  - RTL exploration
    - Behavioral-level observability analysis to derive complex enable functions from the CDFG.
    - Mark the netlist with candidates for sleep-mode transformations and their enable functions, but do not commit them.
  - Gate-level committing
    - Perform regular logic and timing optimization as if no sleep-mode logic had been inserted.
    - Commit the candidates after timing optimization.
    - An accurate power and delay trade-off becomes possible.
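The commit decision in the second step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Candidate` record, the `commit` function, the acceptance rule (positive net saving and non-negative slack after gating), and all numbers are hypothetical. The slides state only that committing happens after timing optimization, using incremental power and timing analysis.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    power_saving: float   # assumed net saving if committed, gating overhead included (mW)
    delay_penalty: float  # assumed extra delay added by the gating logic (ns)
    slack: float          # assumed slack of the affected path after timing optimization (ns)


def commit(candidates):
    """Commit only the marked candidates that save power and still meet timing."""
    committed = []
    for c in candidates:
        if c.power_saving > 0 and c.slack - c.delay_penalty >= 0:
            committed.append(c.name)
    return committed


# Candidates marked during RTL exploration (all values invented for illustration).
marked = [
    Candidate("mult0", power_saving=1.2, delay_penalty=0.10, slack=0.30),
    Candidate("add1", power_saving=0.4, delay_penalty=0.10, slack=0.05),   # would violate timing
    Candidate("alu2", power_saving=-0.1, delay_penalty=0.05, slack=0.50),  # gating costs more than it saves
]
print(commit(marked))
```

Because the decision is made on the optimized gate-level netlist, a candidate that would break timing or increase power is simply left uncommitted rather than undone later.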
9Control Data Flow Graph (CDFG)
module m(a, b, en, clk, y);
  input en, clk;
  input [7:0] a, b;
  output [8:0] y;
  reg [8:0] y;
  always @(posedge clk)
    if (en) y <= a + b;
endmodule
10Behavioral Level Observability
- TOC (Token Observable Condition) of an edge: the condition under which the token on that edge can be observed at one or more output nodes of a CDFG.
- NOC (Node Observable Condition) of a node: the condition under which the token on any input edge of the node can be observed at one or more output nodes of a CDFG.
- TO (Token Observability) of an edge: the probability of the TOC of this edge being 1.
- NO (Node Observability) of a node: the probability of the NOC of the node being 1.
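As a small worked example of these definitions, the Python sketch below computes an observability as the probability that an observable condition evaluates to 1. It assumes the control signals are statistically independent and enumerates all assignments, which is practical only for a handful of signals. The function name `observability`, the example condition, and the signal probabilities are hypothetical.

```python
from itertools import product


def observability(cond, probs):
    """Probability that cond(assignment) is true, assuming independent signals."""
    names = list(probs)
    total = 0.0
    for bits in product([0, 1], repeat=len(names)):
        env = dict(zip(names, bits))
        if cond(env):
            weight = 1.0
            for n in names:
                # P(signal = 1) is probs[n]; P(signal = 0) is its complement.
                weight *= probs[n] if env[n] else 1.0 - probs[n]
            total += weight
    return total


# Hypothetical TOC: the token is observed only when both enables are high.
toc = lambda env: env["en1"] and env["en2"]
to = observability(toc, {"en1": 0.5, "en2": 0.2})
print(to)
```

A low TO or NO means the value is rarely observed, which makes the corresponding block a more promising sleep-mode candidate.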
11Computation of TOC/NOC
Output/Register nodes: $\mathrm{NOC}(n_{out}) = 1$, $\mathrm{TOC}(i) = 1$
12Computation of TOC/NOC (cont.)
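Operation nodes: $\mathrm{NOC}(n_{op}) = \mathrm{TOC}(o_0) \lor \mathrm{TOC}(o_1) \lor \dots \lor \mathrm{TOC}(o_{j-1})$
$\mathrm{TOC}(i_p) = \mathrm{NOC}(n_{op}), \quad \forall p \in \{0, 1, \dots, k-1\}$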
13Computation of TOC/NOC (cont.)
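Merge nodes: $\mathrm{NOC}(n_{merge}) = \mathrm{TOC}(o)$
$\mathrm{TOC}(c) = \mathrm{NOC}(n_{merge})$
$\mathrm{TOC}(i_p) = c_p \land \mathrm{TOC}(o)$
where $c_p$, $\forall p \in [0, k-1]$, is a Boolean encoding of the variables for the value of the token at the control port to select input port p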
14Computation of TOC/NOC (cont.)
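Branch nodes: $\mathrm{NOC}(n_{branch}) = \mathrm{TOC}(o_0) \lor \mathrm{TOC}(o_1) \lor \dots \lor \mathrm{TOC}(o_{j-1})$
$\mathrm{TOC}(c) = \mathrm{NOC}(n_{branch})$
$\mathrm{TOC}(i) = c_0 \land \mathrm{TOC}(o_0) \lor c_1 \land \mathrm{TOC}(o_1) \lor \dots \lor c_{j-1} \land \mathrm{TOC}(o_{j-1})$
where $c_p$, $\forall p \in [0, j-1]$, is a Boolean encoding of the variables for the value of the token at the control port to select output port p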
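The node rules can be applied backward from the outputs of a CDFG. The Python sketch below is one possible rendering, not the authors' implementation. Conditions are modeled as plain functions from a signal assignment to a Boolean, the lost operators are read as OR between output-edge conditions and AND between a select condition and an edge condition, and all helper names are hypothetical. The example CDFG assumes a register that loads the sum a + b when en is 1 and holds its value otherwise.

```python
# Conditions are modeled as functions from a signal assignment (dict) to bool.
TRUE = lambda env: True


def any_of(*conds):
    return lambda env: any(c(env) for c in conds)


def all_of(*conds):
    return lambda env: all(c(env) for c in conds)


def register_node():
    """Output/register node: NOC = 1 and TOC(i) = 1."""
    return TRUE, TRUE


def operation_node(out_tocs):
    """Operation node: NOC = OR of output-edge TOCs; each input edge gets TOC = NOC."""
    noc = any_of(*out_tocs)
    return noc, noc


def merge_node(out_toc, selects):
    """Merge node: NOC = TOC(o); TOC(i_p) = c_p AND TOC(o)."""
    return out_toc, [all_of(c, out_toc) for c in selects]


def branch_node(out_tocs, selects):
    """Branch node: NOC = OR of TOC(o_p); TOC(i) = OR of (c_p AND TOC(o_p))."""
    noc = any_of(*out_tocs)
    toc_in = any_of(*[all_of(c, t) for c, t in zip(selects, out_tocs)])
    return noc, toc_in


# Assumed CDFG: register y <- merge(hold y when en = 0, a + b when en = 1).
en_is_0 = lambda env: not env["en"]
en_is_1 = lambda env: bool(env["en"])

_, toc_reg_in = register_node()
_, (toc_hold, toc_sum) = merge_node(toc_reg_in, [en_is_0, en_is_1])
noc_add, toc_operand = operation_node([toc_sum])

for en in (0, 1):
    print(en, noc_add({"en": en}))
```

Under these assumptions the NOC of the adder reduces to en, so the adder is a sleep-mode candidate whose enable function is en.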
15Fast Computation of TOC/NOC
16RTL Exploration
[Figure: RTL exploration example with inputs a and b, enable signals en1 and en2, and output out]
17Gate Level Committing
[Figure: gate-level committing example with inputs a and b, enable signals en1 and en2, and output out]
18Partial Committing
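[Figure: partial committing example. The enable function f, a function of a, b, and c, is only partially committed, giving f_new = a.]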
19Synthesis Flows
20Experiment Setup
- Implemented in Cadence PKS/LPS 5.0
  - A commercial low-power synthesis tool for both logical and physical power optimization.
- The incremental power analysis engine inside PKS/LPS evaluates the power impact during the gate-level committing stage.
- The incremental timing analysis engine inside PKS evaluates the timing impact during the gate-level committing stage.
- The extra delay and power overhead introduced by the gating logic is accurately accounted for.
21Experiment Setup (cont.)
- Six industrial blocks were chosen for experimentation
- All except design 6 have customer-provided simulation testbenches to obtain the switching information for power estimation
22Experimental Results
23Experimental Results (cont.)
- The proposed approach can achieve a wide range of power-delay trade-offs
- Design 6 was chosen to run the flow several times with different timing constraints
24Conclusion
- Robust solution for applying sleep-mode transformation
- Two-step approach toward achieving RTL transformation with gate-level accuracy
- Accurate and full range of power-delay trade-offs
- No impact on normal timing optimization
- Fully automated
- Ideal solution to meet the challenges of both timing closure and power management for modern nanometer designs.