Title: Business as usual in the past:
1. Moore's Law: Blessing or Curse?
From S. Borkar, Intel
- Business as usual in the past
- Design for highest performance
- New technology deployed
- Compaction, more cache
- Bigger fan
Need to consider PERFORMANCE AND POWER TOGETHER
from the very early stages of the design
2. Power-Performance Optimization
[Figure: power vs. cycle time, showing the optimal power-performance tradeoff curve, the initial design, and the design within the power budget]
- Only circuits on the optimal power performance
tradeoff curve make sense
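As a minimal sketch of what being "on the tradeoff curve" means: a design is on the curve if no other design is at least as fast and at least as low-power. The function name `pareto_front` and the sample design points are hypothetical illustrations, not data from the slides.

```python
def pareto_front(designs):
    """Return the (cycle_time, power) points not dominated by any other point.

    Lower is better on both axes. Assumes each design is a (cycle_time, power) tuple.
    """
    front = []
    best_power = float("inf")
    # Sorting by cycle time (then power) lets a single pass find the front:
    # a point survives only if it has lower power than every faster point.
    for cycle_time, power in sorted(set(designs)):
        if power < best_power:
            front.append((cycle_time, power))
            best_power = power
    return front


# Hypothetical design points: (cycle time in ns, power in mW).
candidates = [(1.0, 10.0), (1.2, 7.0), (1.1, 12.0), (1.5, 7.5), (2.0, 4.0), (2.0, 5.0)]
optimal = pareto_front(candidates)
```

Any candidate missing from `optimal` can be replaced by a design that is faster, lower power, or both, which is why only points on the curve are worth considering.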
3. Circuit Optimization Framework
[Diagram: framework blocks. Inputs: models, netlist, plug-ins, variables. Center: optimization core. Outputs: optimal design, results.]
4. The Core Mathematical Optimization
- Very difficult in the general case
- Optimality not guaranteed
- CONVEX OPTIMIZATION
- f, g_i convex, h_j linear
- Key property: every local minimum is a global minimum
- Optimality guaranteed
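For reference, the optimization problem that f, g_i, and h_j refer to is usually written in the following standard form. The design-variable vector x and the index limits m and p are notation assumed here for illustration; the slide names only f, g_i, and h_j.

```latex
\begin{aligned}
\text{minimize}\quad   & f(x) \\
\text{subject to}\quad & g_i(x) \le 0, \quad i = 1, \dots, m \\
                       & h_j(x) = 0,  \quad j = 1, \dots, p
\end{aligned}
```

When f and every g_i are convex and every h_j is linear, the feasible set is convex, so a point that cannot be improved locally cannot be improved anywhere.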
5. Choice of Models
- ANALYTICAL
- Various degrees of complexity
- Fast parameter extraction
- Provide insight in the operation of the circuit
- Can exploit their mathematical properties to help optimization
- Target convex optimization
- Limited accuracy
- TABULATED
- Very accurate
- Slow to generate
- No insight in the operation of the circuit
- Optimization is blind
- If convex models fit the circuit well, the underlying optimization problem is close to convex
6. Optimizing Combinational Circuits
- Minimize DELAY
- subject to
- Maximum ENERGY
- Constraints
- Maximum output slew
- Maximum internal slew
- Maximum input capacitance
- Minimum sizes
Basic result: energy-delay tradeoff curve
[Figure: energy vs. delay tradeoff curve, with the fastest circuit at one end and the lowest-power circuit at the other]
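The formulation on this slide (minimize delay subject to a maximum energy) can be sketched on a toy problem. This is an illustration under stated assumptions, not the tool described in the slides: the circuit is a three-inverter chain, delay is modeled as the sum of stage fanouts, energy as total switched capacitance, and a brute-force grid search stands in for the convex optimizer. All function names and numbers are hypothetical.

```python
import math


def chain_delay(sizes, c_load):
    """Normalized delay of an inverter chain: sum of stage fanouts (parasitics ignored)."""
    caps = list(sizes) + [c_load]
    return sum(caps[i + 1] / caps[i] for i in range(len(sizes)))


def chain_energy(sizes, c_load):
    """Normalized switching energy: total switched capacitance."""
    return sum(sizes) + c_load


def min_delay(c_load, e_max, steps=121, max_size=64.0):
    """Minimize delay subject to energy <= e_max by grid search in log space.

    The first stage is fixed at size 1 (a stand-in for the input capacitance
    constraint); sizes never drop below 1 (minimum-size constraint).
    Returns (delay, sizes), or None if no grid point meets the energy budget.
    """
    grid = [math.exp(math.log(max_size) * k / (steps - 1)) for k in range(steps)]
    best = None
    for x2 in grid:
        for x3 in grid:
            sizes = (1.0, x2, x3)
            if chain_energy(sizes, c_load) > e_max:
                continue  # violates the maximum-energy constraint
            delay = chain_delay(sizes, c_load)
            if best is None or delay < best[0]:
                best = (delay, sizes)
    return best


# Sweeping the energy budget traces out an energy-delay tradeoff curve.
curve = [(e_max, min_delay(64.0, e_max)[0]) for e_max in (67.0, 70.0, 75.0, 100.0)]
```

The tightest budget in the sweep forces minimum-size gates (the lowest-energy end of the curve), while a loose budget recovers the unconstrained fastest sizing.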
7. Optimizing Combinational Circuits
Example: Optimal 64-Bit CLA Adders in the Energy-Delay Space
R. Zlatanovici, B. Nikolic, "Power-Performance Optimal 64-Bit Carry-Lookahead Adders," ESSCIRC 2003
8. 64-Bit Adders: Impact of Stack Height
- Technology
- 130nm 1P 6M
- VDD = 1.2 V
- Setup summary
- Cout = 450 fF
- Cin ≤ 150 fF
- 1 bitslice = 18 M1 pitches
- Fixed datapath height
[Figure: results for 3-stack, 4-stack, and 5-stack designs]
9. Fastest 64-Bit Adder
- Proof of concept implementation
- Radix-4 sparse-2 tree in domino logic
- Technology
- 90 nm 1P 7M
- VDD = 1 V
- Performance
- Delay: 210 ps (post-layout simulation)
- Energy: 9.1 pJ/cycle (optimization tool)
- Core dimensions: 417.3 µm × 75.3 µm
- Chip almost ready for tapeout
- with Sean Kao
10. Optimizing Pipelined Circuits
[Diagram: optimization flow. Inputs: posynomial models, block-level netlist. Objective: minimize TCYCLE subject to maximum ENERGY. Variables: gate sizes, latch positions. Blocks: static timer, optimizer. Output: optimal pipeline configuration.]
- Fix pipeline depth
- Find shortest cycle time for fixed cutset
- Search for optimum cutset
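The three steps above (fix the depth, evaluate each cutset, search for the best one) can be sketched for a linear chain of blocks. This is a simplified illustration, not the authors' optimizer: block delays are treated as fixed numbers (no gate sizing, no energy constraint), the search is exhaustive, and `best_cutset`, `latch_overhead`, and the sample delays are hypothetical.

```python
from itertools import combinations


def best_cutset(block_delays, depth, latch_overhead=0.0):
    """Exhaustively search latch positions for a fixed pipeline depth.

    block_delays: delays of the blocks in a linear chain, in order.
    depth: number of pipeline stages (fixed before the search).
    Returns (cycle_time, cuts), where cuts are the block indices that
    start each new stage, or None if the chain has fewer blocks than stages.
    """
    n = len(block_delays)
    best = None
    for cuts in combinations(range(1, n), depth - 1):
        bounds = (0,) + cuts + (n,)
        # Cycle time for this cutset: the slowest stage plus latch overhead.
        t_cycle = max(
            sum(block_delays[a:b]) for a, b in zip(bounds, bounds[1:])
        ) + latch_overhead
        if best is None or t_cycle < best[0]:
            best = (t_cycle, cuts)
    return best


# Hypothetical block delays (arbitrary units), split into a 3-stage pipeline.
t_cycle, cuts = best_cutset([3, 1, 4, 1, 5, 9, 2, 6], depth=3)
```

In the flow described on the slide, the inner step would also resize gates for each cutset rather than use fixed block delays; the exhaustive outer loop here only illustrates the cutset search.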
11. Optimizing Pipelined Circuits: Example
- Models: posynomials
- Variables: gate sizes, latch positions
- Minimize cycle time subject to maximum energy
- Result: optimal configuration
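As background on why posynomial models suit convex optimization, the standard definition and change of variables are shown below. The symbols x_i (positive design variables such as gate sizes), c_k, a_ik, and y_i are notation assumed here for illustration, not taken from the slides.

```latex
% A posynomial with K terms in positive variables x_1, ..., x_n:
f(x) = \sum_{k=1}^{K} c_k \prod_{i=1}^{n} x_i^{a_{ik}},
\qquad c_k > 0,\; a_{ik} \in \mathbb{R}

% Substituting x_i = e^{y_i} and taking the logarithm:
\log f(e^{y}) = \log \sum_{k=1}^{K} \exp\!\Big( \sum_{i=1}^{n} a_{ik}\, y_i + \log c_k \Big)
```

The second expression is a log-sum-exp of affine functions of y, which is convex. Minimizing a posynomial subject to posynomial upper bounds therefore becomes a convex problem in the variables y.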
12. Summary
Power and performance are two sides of the same coin. The connection: the power-performance tradeoff curve.
- Developed a modular optimization framework to design power-performance optimal circuits
- Optimizer for combinational circuits
- Experimented on 64-bit CLA adders
- Optimizer for pipelined circuits
- In test on an IEEE-compliant Floating Point Unit (FPU)