Title: System-level Power Estimation and Optimization
- 2006.09.03
- Chong-Min Kyung
- KAIST
Contents
- Introduction
- System-level Power Estimation
- System-level Power Optimization
Introduction
- Power classification
- Static power
- Leakage power
- Dynamic power
- Switching power
- Short-circuit power
- Glitch power
Introduction
- Power calculation
- Static power
- $P_{total\_leakage} = \sum P_{cell\_leakage}$
- Dynamic power
- $P_{internal} = \mathrm{Func}(C_{load}, TR_{input}, TR_{output})$
- TR: toggle rate
- $P_{glitch} = V_{DD}^2 \sum (C_{load,net} \cdot f_{glitch} \cdot t)$
- $f_{glitch}$: frequency of glitches
- $t$: factor for the width of a glitch
- $P_{switching} = \frac{1}{2} V_{DD}^2 \sum (C_{load} \cdot TR_{output})$
- Ratio
- Above 0.1 µm: switching power is 70 to 90% of total power
- Below 0.07 µm: leakage power is more than 50% of total power
- Data-intensive applications
- Switching power is the dominant factor.
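The power terms above can be evaluated directly once per-net capacitances and rates are known. The following is a minimal Python sketch. It assumes that TR and f_glitch are given in events per second and that t is a dimensionless glitch-width factor. The `Net` record and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Net:
    c_load: float            # load capacitance of the net (F)
    tr_output: float         # output toggle rate (toggles/s)
    f_glitch: float = 0.0    # glitch frequency (glitches/s)
    t_glitch: float = 0.0    # glitch-width factor t (dimensionless, assumed)


def switching_power(nets, vdd):
    """P_switching = 1/2 * VDD^2 * sum(C_load * TR_output)."""
    return 0.5 * vdd ** 2 * sum(n.c_load * n.tr_output for n in nets)


def glitch_power(nets, vdd):
    """P_glitch = VDD^2 * sum(C_load,net * f_glitch * t)."""
    return vdd ** 2 * sum(n.c_load * n.f_glitch * n.t_glitch for n in nets)


def leakage_power(cell_leakages):
    """P_total_leakage = sum of per-cell leakage powers (W)."""
    return sum(cell_leakages)


# Hypothetical two-net example at VDD = 1.0 V.
nets = [Net(10e-15, 100e6), Net(20e-15, 50e6, f_glitch=10e6, t_glitch=0.5)]
p_sw = switching_power(nets, 1.0)      # about 1.0e-6 W
p_gl = glitch_power(nets, 1.0)         # about 1.0e-7 W
p_lk = leakage_power([1e-7, 2e-7])     # about 3.0e-7 W
```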
Introduction
- Opportunities for power reduction
- For low-power design
- 1. Power model generation
- 2. Power estimation
- 3. Power optimization
[Figure labels: Commercial tool; Academic research]
System-level Power Estimation
Contents
- Power Model Generation
- Analytical Method
- Empirical Method
- System-level Power Estimation
- Hardware Power Estimation
- Software Power Estimation
- Bus Power Estimation
Power Model Generation
- 1. Analytical method
- Uses average values of design parameters, without considering different circuit styles, clocking strategies, or layout techniques
- e.g., average capacitance, equivalent gate count, number of primary inputs
- Mainly used for behavior-level power estimation when no technology library or implementation information is available
- Very low accuracy
- 2. Empirical method
- Uses parameters measured from existing implementations
- 2-1. Fixed-activity model
- 2-2. Activity-sensitive model
Power Model Generation
- 2-1. Fixed-activity model
- Uses the data sheet of a specific hardware block
- $P_{processor} = C_{processor} \times V_{DD}^2 \times freq$
- $C_{processor} = P_{data\_sheet} / (V_{data\_sheet}^2 \times freq_{data\_sheet})$
- Low accuracy
- Mainly used for coarse-grained system-level power estimation
- 2-2. Activity-sensitive model
- Uses signal activity or its statistics, which depend on the testbench
- Transition-sensitive model
- The power model is a look-up table (LUT).
- Very high accuracy
- Statistical activity model
- The power model is a LUT or an equation.
- High accuracy
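The fixed-activity model works in two steps. First, an effective capacitance is derived from one data-sheet operating point. That capacitance is then reused at a new voltage and frequency. Below is a minimal sketch with a hypothetical data-sheet point of 400 mW at 1.5 V and 200 MHz.

```python
def effective_capacitance(p_ds, v_ds, f_ds):
    """C_processor = P_data_sheet / (V_data_sheet^2 * freq_data_sheet)."""
    return p_ds / (v_ds ** 2 * f_ds)


def fixed_activity_power(c_proc, vdd, freq):
    """P_processor = C_processor * VDD^2 * freq (activity is folded into C)."""
    return c_proc * vdd ** 2 * freq


# Hypothetical data-sheet point: 400 mW at 1.5 V and 200 MHz.
c_proc = effective_capacitance(0.4, 1.5, 200e6)     # about 0.89 nF
p_new = fixed_activity_power(c_proc, 1.2, 100e6)    # about 0.128 W
```

The activity is frozen into `c_proc`, so the estimate cannot follow workload changes. This is why the model gives only low accuracy.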
Macro Modeling Method
- Macro modeling method
- Raises the abstraction of the power model by characterizing macro cells
- Mainly used to reduce power model complexity in activity-sensitive power model generation
- Macro cell
- 32-bit adder, multiplier, MUX, etc.
- Reduced computational complexity at the cost of accuracy
- Macro cell characterization
- Synthesize the macro cell with the basic cell library
- Estimate the power of the macro cell with various testbenches
- Generate the power model and reduce its complexity
- This concept can also be used to raise the abstraction of power models in hardware-level or software-level power estimation.
Macro Modeling Method
- Power model of the macro modeling method
- Statistical activity model
- LUT-based model
- For each bus component, build a 3-D LUT (with axes Pin, Din, Dout)
- Fill in the power value at each point (Pin, Din, Dout)
- Requires a lot of memory space
- Equation-based model
- Build a polynomial that approximates power consumption.
- Determine its coefficients by analysis over a large number of input patterns.
- Requires little memory space
Pin: average input signal probability; Din: average input switching activity; Dout: average output zero-delay switching activity
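The sketch below contrasts the two statistical models. The LUT stores one power value per grid point and is read at the nearest grid point. The equation model stores only its coefficients. The grid resolution (5 points per axis), the nearest-point lookup, the first-order polynomial and the characterization function are all hypothetical choices made for illustration.

```python
import itertools

# Assumed grid resolution: 5 points per axis on [0, 1] -> 125 LUT entries.
GRID = [i / 4 for i in range(5)]


def build_lut(characterize):
    """3-D LUT: (Pin, Din, Dout) grid point -> characterized power."""
    return {pt: characterize(*pt) for pt in itertools.product(GRID, repeat=3)}


def snap(x):
    """Nearest grid value for a statistic in [0, 1]."""
    return min(GRID, key=lambda g: abs(g - x))


def lut_power(lut, pin, din, dout):
    return lut[(snap(pin), snap(din), snap(dout))]


def equation_power(coeffs, pin, din, dout):
    """Equation-based model, here a first-order polynomial with 4 coefficients."""
    c0, c1, c2, c3 = coeffs
    return c0 + c1 * pin + c2 * din + c3 * dout


# Hypothetical characterization result (mW) standing in for gate-level runs.
lut = build_lut(lambda pin, din, dout: 1.0 + 2.0 * din + 3.0 * dout)
```

The memory trade-off from the slide is visible here: the LUT holds 125 entries, while the equation holds 4 numbers.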
System-level Power Estimation
- Estimation speed and power model
- Trade-off between estimation speed and power model accuracy
- Abstraction of power estimation
- System-level power estimation
- Software-level power estimation
- Hardware-level power estimation
- Behavior-level, RT-level, gate-level, circuit-level
[Figure labels: Relative power results; Absolute power results]
System-level Power Estimation
- System-level power estimation
- Relative value of power consumption is important.
- Objective
- Power profiling and design exploration
- System-level power estimation is composed of
- 1. Hardware power estimation
- 2. Software power estimation in processor
- 3. Bus power estimation
Hardware Power Estimation
- RT-level power estimation
- Dynamic simulation-based power estimation using a coarse-grained net model built from a power macro-model database and a testbench
Hardware Power Estimation Tool
- Several commercial tools are available for hardware power estimation
- RT-level
- Synopsys Power Compiler™
- Gate-level
- Synopsys PrimePower™, Synopsys Power Compiler™
- Circuit-level
- SPICE, Synopsys PowerMill™, Cadence VoltageStorm™
Software Power Estimation
- Estimation
- A processor is too complex to estimate at RT level.
- Power consumption depends on each instruction and on the instruction sequence.
- Estimation method
- A power model is added to the ISS for instruction-level power profiling.
- $E = \sum_i (B_i \times N_i) + \sum_{i,j} (O_{i,j} \times N_{i,j}) + \sum_k S_k$
- $B_i$: energy consumption of instruction i
- $N_i$: number of executions of instruction i
- $O_{i,j}$: energy consumption when instruction i is followed by instruction j
- $N_{i,j}$: number of occurrences of the pair (instruction i, instruction j)
- $S_k$: energy of other effects, such as cache misses and pipeline stalls
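The instruction-level model can be evaluated over an executed trace, much as an ISS-attached power model would. This is a minimal sketch. The instruction names, the costs in nJ and the single "other effect" (one cache miss) are hypothetical. A pair with no entry in the overhead table contributes nothing.

```python
from collections import Counter


def program_energy(trace, base, overhead, other):
    """E = sum_i B_i*N_i + sum_(i,j) O_ij*N_ij + sum_k S_k."""
    n_i = Counter(trace)                      # N_i: executions of each instruction
    n_ij = Counter(zip(trace, trace[1:]))     # N_ij: adjacent pairs (i followed by j)
    e_base = sum(base[i] * n for i, n in n_i.items())
    e_pair = sum(overhead.get(p, 0.0) * n for p, n in n_ij.items())
    return e_base + e_pair + sum(other)


# Hypothetical per-instruction costs in nJ, as an ISS power model might hold them.
B = {"add": 1.0, "ld": 2.0}
O = {("add", "ld"): 0.5, ("ld", "add"): 0.5}
E = program_energy(["add", "add", "ld", "add"], B, O, other=[3.0])  # 9.0 nJ
```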
Software Power Estimation
- Power model
- Instruction-level power model
- Inter-instruction effects
- Dynamic effects (cache misses, branch prediction, etc.)
- Power modeling method
- 1) White-box approach
- 2) Black-box approach
White-box Approach
- Power model
- Activity-sensitive model
- Characterization
- Use the macro modeling method
- Process
- Run gate-level simulation
- Find the predominant parameters
- Reduce power model complexity
- Simple equation or reduced LUT
- Build an instruction-level power model
- Reducing the power model complexity increases estimation speed at the cost of accuracy.
Black-box Approach
- Characterization flow
- Measurement
- Characterization
[Figure: Measurement (supply current I(t) sensed through a resistor R) → Characterization → Instruction-level Power Model]
Black-box Approach
- Measurement
- Current measurement on a real chip
- Power model
- Activity-sensitive power model
- Statistical activity model
- Characterization process
- Measure the current on the real chip while running a subroutine for many iterations
- Compare the measured values with an ISS that includes dynamic effects
- Find a power equation that matches the measured power graph
- Determine the coefficients of the power equation by experimental iteration
- ⇒ It is important to find the equation closest to the measurement results.
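The slide determines coefficients "by experimental iteration". One common way to find coefficients that fit a measured curve is ordinary least squares, and the minimal sketch below uses it as a substitute technique, with a single-variable linear equation. The actual equation form and fitting procedure in the cited work may differ. The measurement values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical measurements: activity statistic vs. measured current (mA).
activity = [0.0, 1.0, 2.0, 3.0]
current = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(activity, current)   # (1.0, 2.0)
```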
Black-box Approach
- Measurement method
- The program under measurement is isolated using an interrupt signal, NOP instructions, and processor wait states. This locates the exact measurement window and synchronizes the measurement.
R. Muresan and C. Gebotys, "Current dynamics-based macro-model for power simulation in a complex VLIW DSP processor," IEE Proc.-Comput. Digit. Tech., 2002.
Software Power Estimation Tool for Research Purposes
- SimplePower
- Functional simulator
- SimplePower core
- Based on the SimpleScalar ISA
- Power model
- Activity-sensitive power model
- Direct simulation and profiling based on input transitions
- Generates switched-capacitance tables
[Figure labels: Cycle-accurate activation information; Implementation-based signal generation]
Software Power Estimation Tool for Research Purposes
- Wattch
- Architecture-level power estimation
- Functional simulator
- SimpleScalar cycle-level performance simulator
- Power model
- Fixed-activity power model
- Categories
- Array structures
- Fully associative CAMs
- Combinational logic and wires
- Clocking logic
- Example: array structure
- Power C1 C2 A C3 B
- A: number of bit lines, B: number of word lines
- C1: diffusion capacitance, C2: gate capacitance, C3: metal capacitance
Bus Power Estimation
- Power consumed on the bus consists of two parts
- Bus component power
- Power consumed internally in the bus components
- Arbiter, decoder, muxes
- Interconnection power
- Power consumed on the bus wires that connect the master and slave interfaces and the bus components
- Address bus, data bus, control signals
Bus Component Power Estimation
- At system level, only structural information about the bus architecture is available.
- Bus interconnection
- Bus width
- A global bus power model is used for estimation
- The global bus power model contains the characterized power models of the bus components
- Arbiter, decoder, multiplexer
- Behavior, FSM
[Figure: Processor, IP 1, IP 2, and Memory on a shared bus, enclosed by the global bus power model]
Bus Component Characterization
- Macro model
- Pre-calculated power cube
- Useful for system-level power estimation.
- Input parameters of the macro models
- Data and address bus width, or the operating frequency
- Number of masters and slaves
- Input/output data characteristics
- Switching activity, signal probability, or the Hamming distance between two successive data words
Bus Power Analysis
- AMBA AHB bus power analysis
- A standard for on-chip communication
- Power analysis process
- Bus structure decomposition
- Arbiter
- Decoder
- Multiplexer
- Build a macro model of each component
- Decompose the bus behavior and build a power FSM
- IDLE, READ, WRITE, and IDLE with handover
- Monitor bus signal activity
- Analyze power through the power FSM
[Figure: Global bus power model]
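A power FSM can be evaluated by classifying each monitored bus cycle into one of the four states and then adding up per-state energies. In the minimal sketch below, each cycle is reduced to three hypothetical flags (transfer, write, handover) instead of the real AHB signals. The per-state energies are hypothetical placeholders for values that would come from the characterized macro models.

```python
# Hypothetical per-cycle energies (pJ) for each power-FSM state; in a real flow
# they would come from the arbiter, decoder, and MUX macro models.
STATE_ENERGY = {"IDLE": 1.0, "READ": 4.0, "WRITE": 5.0, "IDLE_HO": 2.0}


def fsm_state(transfer, write, handover):
    """Classify one monitored bus cycle into a power-FSM state."""
    if not transfer:
        return "IDLE_HO" if handover else "IDLE"
    return "WRITE" if write else "READ"


def bus_energy(cycles):
    """Sum state energies over monitored cycles of (transfer, write, handover)."""
    return sum(STATE_ENERGY[fsm_state(*c)] for c in cycles)


trace = [(False, False, False), (True, False, False),
         (True, True, False), (False, False, True)]
total = bus_energy(trace)   # 1 + 4 + 5 + 2 = 12.0 pJ
```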
Interconnection Power Estimation
- Power consumption on each wire
- $P = \frac{1}{2} V_{dd}^2 C f \alpha$
- $V_{dd}$: voltage swing between logic levels 1 and 0
- C: capacitance of the wire
- f: clock frequency
- α: switching activity
- $V_{dd}$ and f are given as fixed values.
- We need to find C and α.
- C can be obtained from a wire capacitance model.
- α can be obtained from system-level simulation.
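The per-wire formula gives the interconnect power of a whole bus when it is summed over the bus lines. This is a minimal sketch. The 32-line bus, 1 pF per line, activity of 0.5, 1.8 V and 100 MHz are hypothetical values.

```python
def wire_power(vdd, c, f, alpha):
    """P = 1/2 * Vdd^2 * C * f * alpha for one wire."""
    return 0.5 * vdd ** 2 * c * f * alpha


def bus_wire_power(vdd, f, caps, alphas):
    """Total interconnect power: sum over the bus lines."""
    return sum(wire_power(vdd, c, f, a) for c, a in zip(caps, alphas))


# Hypothetical 32-bit bus: 1 pF per line, alpha = 0.5, 1.8 V, 100 MHz.
p = bus_wire_power(1.8, 100e6, [1e-12] * 32, [0.5] * 32)   # about 2.6 mW
```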
Interconnection Power Estimation
- Wire capacitance model
- $C_{int} = \varepsilon_{ox} \frac{W L}{x_{int}}$
- $\varepsilon_{ox}$: constant, $3.45 \times 10^{-13}$ F/cm, permittivity of SiO2
- $x_{int}$: oxide thickness underneath the interconnect
- W: interconnect width
- L: interconnect length
- W and $x_{int}$ can be obtained from the technology parameters.
- L can be estimated from the area of the chip
- (where A is the area of the chip)
J. P. Uyemura, Circuit Design for CMOS VLSI, Kluwer Academic Publishers, 1992.
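This minimal sketch evaluates the wire capacitance. It assumes the parallel-plate form C = ε_ox · W · L / x_int, which is what the listed parameters suggest, and the ε_ox value given on the slide. The wire dimensions are hypothetical. The length is passed in directly, because the slide's formula for estimating L from the chip area A is not reproduced here.

```python
EPS_OX = 3.45e-13  # permittivity of SiO2 in F/cm (value from the slide)


def wire_capacitance(w_cm, l_cm, x_int_cm):
    """Parallel-plate estimate C = eps_ox * W * L / x_int, all lengths in cm."""
    return EPS_OX * w_cm * l_cm / x_int_cm


# Hypothetical wire: W = 0.5 um, L = 1 mm, x_int = 0.5 um.
c = wire_capacitance(0.5e-4, 0.1, 0.5e-4)   # about 3.45e-14 F (34.5 fF)
```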
Interconnection Power Estimation
- Switching activity model
- Switching activity can be obtained from bus transactions.
- The bus model monitors bus transitions and counts bus switching.
[Figure: in a system-level simulation, CPU, DSP, IP, and memory communicate through a bus model that monitors bus transitions]
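A bus monitor that "counts bus switching" can be sketched in a few lines. For each pair of successive bus values, it XORs the two values and counts the changed bits per line. The sample values are hypothetical.

```python
def switching_activity(words, width):
    """Per-line activity: toggles on each bit line divided by cycle transitions."""
    toggles = [0] * width
    for prev, curr in zip(words, words[1:]):
        diff = prev ^ curr                    # bits that changed this cycle
        for bit in range(width):
            toggles[bit] += (diff >> bit) & 1
    transitions = max(len(words) - 1, 1)
    return [t / transitions for t in toggles]


# Hypothetical monitored values on a 2-bit bus (bit 0 first in the result).
alpha = switching_activity([0b00, 0b11, 0b01], width=2)   # [0.5, 1.0]
```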
Bus Power Estimation
- Power estimation
- An application example is simulated in a system-level simulator.
- The power estimator reports power consumption using the power models of the bus components and the interconnection.
- Values monitored from the bus transitions are used as inputs to the power estimator.
[Figure: system-level simulator in which CPU, DSP, IP, and memory communicate through the bus model, which feeds a power estimator]
System-level Power Optimization
Contents
- Low Power System Implementation Techniques
- Circuit level
- Clock gating
- MTCMOS
- Multiple voltage supply
- Architecture level
- Memory Optimization
- Bus Optimization
- Dynamic Power Management in System Level
- Introduction to DPM
- Structure of DPM
- Component-level DPM scheme
- DPM Policy
- Dynamic Voltage Scaling
Circuit Level Low Power System Implementation Techniques
- Clock gating
- Most popular method for reducing the power of clock signals
- Needs a circuit to generate the enable signal
- Increases the complexity of the control logic
- Timing is critical to avoid clock glitches at the AND gate output
- Adds gate delay to the clock signal
Circuit Level Low Power System Implementation Techniques
- MTCMOS
- Low-VTH devices in the logic maintain performance when active.
- A high-VTH current switch (header or footer) cuts off the leakage path during sleep.
- The scheduling algorithm that controls the sleep signal is important.
[Figure: the logic block (Input → Logic → Output) sits between Virtual VDD and Virtual GND; a header switch connects VDD to Virtual VDD and a footer switch connects Virtual GND to ground, both controlled by the sleep signal]
Circuit Level Low Power System Implementation Techniques
- Multiple Voltage Supply
- Slows down non-critical paths with a lower voltage supply
- Two or more power grids
- Needs high-efficiency voltage converters for dynamic voltage scaling
- The dynamic power scheduling algorithm is important.
[Figure: critical paths, which need high-speed logic, use the high voltage supply; non-critical paths use the low voltage supply]
Architecture Level Low Power System Implementation Techniques
- Memory Optimization
- Code density optimization
- Goal
- Minimize program memory occupation to reduce the bandwidth of processor-memory communication
- Approaches
- Custom instruction sets
- Object code compression
Memory Optimization
- Custom instruction set
- Shorter instructions than the regular instruction set
- Example: ARM Thumb code (16-bit instructions)
- Needs a specific architecture that supports 16-bit instructions
[Figure: five 32-bit instructions (Inst 1 to Inst 5) occupy five 32-bit words; the 16-bit versions pack two per 32-bit word and occupy three words]
In this case, the instruction bandwidth is reduced to 3/5 (three 32-bit fetches instead of five).
Memory Optimization
- Object code compression
- All instructions have the same size, but some or all of them are encoded before being stored in instruction memory.
- A practical solution for embedded processors
- No specific architecture is needed to support a different instruction type.
- Exploits the small subset of instructions used by firmware code
- Approaches
- Full code compression
- Selective code compression
Memory Optimization
- Full code compression
- Replace all instructions with binary patterns of minimum width
- $\lceil \log_2 N \rceil$ bits, where N is the number of distinct instructions
- Advantage
- Memory bandwidth for instructions is decreased.
- Disadvantage
- The IDT may be very large because N is not small.
- $\log_2 N$ may not be a multiple of 8.
[Figure: without compression, the core fetches k-bit instructions directly from memory; with full code compression, memory holds log2N-bit codes that the IDT expands into k-bit instructions for the core]
IDT: Instruction Decompression Table
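Full code compression amounts to building a table of the distinct instructions. Each instruction in the program is then stored as its table index, using ⌈log2 N⌉ bits. This is a minimal sketch. The sorted table order is an arbitrary choice, and the 32-bit instruction words are hypothetical sample values.

```python
import math


def full_compress(program):
    """Replace every instruction by its index in an IDT of distinct instructions."""
    idt = sorted(set(program))                        # N distinct instruction words
    index = {inst: i for i, inst in enumerate(idt)}
    width = max(1, math.ceil(math.log2(len(idt))))    # ceil(log2 N) bits per code
    return [index[inst] for inst in program], idt, width


def decompress(codes, idt):
    """Decompression is a table look-up in the IDT."""
    return [idt[c] for c in codes]


# Hypothetical 32-bit instruction words.
prog = [0xE3A00001, 0xE2800001, 0xE3A00001, 0xE12FFF1E, 0xE2800001]
codes, idt, width = full_compress(prog)   # width == 2, codes == [2, 1, 2, 0, 1]
```

Here 5 × 32 bits shrink to 5 × 2 bits of codes. On a real program N is large, so the IDT itself becomes the cost, as the disadvantage above notes.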
Memory Optimization
- Selective code compression
- Most of a program trace is covered by a small subset of instructions.
- Compress only the subset of instructions that maximizes program coverage
- The program is a mix of compressed and uncompressed instructions.
[Figure: memory holds 8-bit codes and uncompressed k-bit instructions; a controller and buffer pass the 8-bit codes through the IDT to recover k-bit instructions for the core]
Memory Optimization
- Advantage
- The size of the IDT is fixed and limited.
- The instruction fetch/decompression logic is less complex.
- Disadvantage
- Requires a controller to handle instruction fetching
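Selective compression places only the most frequent instructions in a fixed-size IDT, here 256 entries for 8-bit codes. The rest of the instructions are left uncompressed. The minimal sketch below tags each item as compressed or uncompressed. How the hardware controller tells the two apart is not specified on the slide and is not modeled here. The trace is hypothetical.

```python
from collections import Counter


def selective_compress(trace, idt_size=256):
    """Compress only the most frequent instructions (8-bit codes -> 256 IDT entries)."""
    idt = [inst for inst, _ in Counter(trace).most_common(idt_size)]
    index = {inst: i for i, inst in enumerate(idt)}
    # Tag each item ("c", code) or ("u", instruction); a real controller needs
    # some hardware way to tell the two apart, which is not modeled here.
    stream = [("c", index[x]) if x in index else ("u", x) for x in trace]
    coverage = sum(tag == "c" for tag, _ in stream) / len(trace)
    return stream, idt, coverage


stream, idt, cov = selective_compress(["a", "a", "b", "a", "c"], idt_size=1)
# idt == ["a"], cov == 0.6
```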
Memory Optimization
- Data density optimization
- Same principle as code density optimization
- Aims to reduce memory traffic
- The size of the data set is dynamic
- More complex than code compression, because both compression and decompression are required
- A hardware compression/decompression unit is needed
- Design trade-off between speed and power
Architecture Level Low Power System Implementation Techniques
- Bus power optimization
- A large amount of power is dissipated in data communication over heavily loaded on-chip or off-chip buses.
- $P_{bus} = n \times C \times V_{dd}^2 \times freq \times activity$, for an n-bit bus
- Reduce the switching activity on buses through signal encoding to save power
- Approaches
- Bus-invert coding
- Gray code addressing
Bus Optimization
- Bus-invert coding
- Add a redundant line, INV, to the bus
- When INV = 0
- The data equals the remaining bus lines
- When INV = 1
- The data is the complement of the remaining bus lines
- At each cycle, decide whether sending the true or the complemented value leads to fewer toggles
[Figure: source data → polarity decision logic → data bus + INV signal → received data]
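The sketch below implements the encoder and decoder for bus-invert coding. At each cycle it counts the toggles caused by sending the word as-is and by sending it complemented, then picks the polarity with fewer toggles. Two choices here are made for illustration: the INV line's own toggle is counted in the comparison, and the bus is assumed to start at all zeros.

```python
def popcount(x):
    return bin(x).count("1")


def bus_invert_encode(words, width):
    """Return (bus_value, inv) per cycle, choosing the polarity with fewer toggles.
    Toggles are counted on the data lines plus the INV line itself."""
    mask = (1 << width) - 1
    prev_bus, prev_inv = 0, 0                 # assume all lines start low
    encoded = []
    for w in words:
        t_true = popcount((w ^ prev_bus) & mask) + (prev_inv != 0)
        t_comp = popcount((~w ^ prev_bus) & mask) + (prev_inv != 1)
        bus, inv = ((~w & mask), 1) if t_comp < t_true else ((w & mask), 0)
        encoded.append((bus, inv))
        prev_bus, prev_inv = bus, inv
    return encoded


def bus_invert_decode(encoded, width):
    """Receiver side: re-invert whenever INV = 1."""
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]


enc = bus_invert_encode([0x00, 0xFF, 0xFE], width=8)
# [(0, 0), (0, 1), (1, 1)]: 2 line toggles instead of 9 unencoded
```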
Bus Optimization
- Gray code addressing
- Most instruction addresses are consecutive
- Use Gray code for addressing
- Word-oriented machines
- Addresses increment by 4 (32-bit) or by 8 (64-bit)
- Modify the Gray code so that only 1 bit switches per increment
- A Gray code adder is needed for jumps

Dec   Gray (i = 1)   Gray (i = 4)   Gray (i = 8)
0     0000           0000           0000
1     0001           0001           0001
2     0011           0011           0011
3     0010           0010           0010
4     0110           0100           0110
5     0111           0101           0111
6     0101           0111           0101
7     0100           0110           0100
8     1100           1100           1000
i: address increment
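One construction that reproduces the Gray (i = 4) and Gray (i = 8) columns of the table treats the address in two parts. The word index (the address shifted right by k, with 2^k = 4 or 8) and the low k offset bits are each Gray-coded separately. Stepping the address by 2^k then flips exactly one line. This is our reading of the table, not necessarily the exact scheme behind the slide.

```python
def gray(n):
    """Standard binary-reflected Gray code."""
    return n ^ (n >> 1)


def gray_stride(addr, k):
    """Gray-code the word index (addr >> k) and the low k offset bits separately,
    so stepping addr by 2**k flips exactly one bus line."""
    low = addr & ((1 << k) - 1)
    return (gray(addr >> k) << k) | gray(low)


col_i4 = [format(gray_stride(i, 2), "04b") for i in range(9)]
# ['0000', '0001', '0011', '0010', '0100', '0101', '0111', '0110', '1100']
```

With k = 0 the function reduces to the ordinary Gray code, which is the Gray (i = 1) column.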
Introduction to DPM
- Dynamic Power Management (DPM)
- DPM controls the power consumption of components based on their usage.
- Predicting component usage is essential.
- Methods
- Shutdown (clock gating, power gating)
- Slowdown (frequency scaling, voltage scaling, VTH scaling)
[Figure: within a period T, shutdown runs the task at f and VDD until T/2 and then idles; slowdown stretches the task over the whole period T at a reduced supply of 0.6 VDD]
Structure of DPM
- Levels at which DPM is implemented
- Component level
- Circuit, block
- Power mode
- System level
- Policy
- The procedure that controls the power level of each module in a system
[Figure: the system-level policy sends power-mode requests to Block 1 through Block n; each block contains several circuits]
Component Level DPM Scheme
- Circuit level
- Clock off by clock gating
- Power off by the footer/header of MTCMOS
- Multiple voltage supply
- Block level
- Power off by shutting down the power supply to IPs
- When the power-off patterns of two blocks are similar, shut them down together.
[Figure: IP 1 and IP 2 share a switched Virtual VDD and Virtual GND, connected to the VDD source and GND source]
Component Level DPM Scheme
- Power mode
- Each state has a combination of enabled DPM techniques.
- e.g., a system that uses clock gating and block shutdown:

Power mode   Clock gating   Block shutdown
Run          disabled       disabled
Idle         enabled        disabled
Sleep        enabled        enabled

- Transitions between modes of operation have a cost.
[Figure: Power state machine for the StrongARM processor. Run (P = 400 mW); Idle (P = 50 mW, wait for interrupt); Sleep (P = 0.16 mW, wait for wake-up event). Run to Idle and Idle to Run take 10 µs, Run to Sleep and Idle to Sleep take 90 µs, and Sleep to Run takes 160 ms.]
SA-1100 Microprocessor Technical Reference Manual, Intel, 1998.
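Because transitions have a cost, entering a low-power state saves energy only if the idle period is long enough. The break-even time follows from equating p_on · T with p_tr · t_tr + p_off · (T − t_tr). The minimal sketch below applies this to the StrongARM Run and Sleep figures. The transition power is an assumption, taken equal to the Run power, because the figure gives only transition times.

```python
def break_even_time(p_on, p_off, t_tr, p_tr):
    """Idle length at which staying on and going to the low-power state cost the same.
    From p_on*T = p_tr*t_tr + p_off*(T - t_tr); never shorter than t_tr itself."""
    t_be = t_tr * (p_tr - p_off) / (p_on - p_off)
    return max(t_be, t_tr)


# StrongARM Run/Sleep figures from the slide; transition power is an ASSUMPTION
# (taken equal to the Run power), since the slide gives only transition times.
P_RUN, P_SLEEP = 0.400, 0.16e-3          # W
T_TR = 90e-6 + 160e-3                    # Run -> Sleep -> Run round trip (s)
t_be = break_even_time(P_RUN, P_SLEEP, T_TR, p_tr=P_RUN)   # about 0.16 s
```

Under this assumption, an idle period of less than about 0.16 s is better spent in Run or Idle than in Sleep.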
DPM Policy
- Predictive technique
- Uses a regression equation based on the previous on and off times of the component to estimate the next turn-on time.
- Limitation
- It cannot handle components with more than two power modes.
[Figure: predictive power management scheme. A two-state machine goes from Running (R) to Sleep (S) on go-to-sleep and back on wake-up. Timelines compare the predictive shutdown scheme, in which wake-ups add delay, with a pre-wakeup scheme.]
R: Running, I: Idle state, E: Entering state, S: Sleep, W: Waking-up state
C. H. Hwang et al., "A predictive system shutdown method for energy saving of event-driven computation," Proc. Int. Conf. on Computer Aided Design, pp. 28-32, Nov. 1997.
M. Srivastava et al., "Predictive system shutdown and other architectural techniques for energy efficient programmable computation," IEEE TVLSI, Vol. 4, No. 1, 1996.
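The predictive idea can be sketched with an exponential-average predictor, used here as a simple stand-in for the regression equation mentioned above. The component is shut down when the predicted idle period exceeds a break-even time. The weight a = 0.5, the zero initial prediction and the idle history are hypothetical.

```python
def predict_idle(history, a=0.5, initial=0.0):
    """Exponential average: pred[n+1] = a * actual[n] + (1 - a) * pred[n]."""
    pred = initial
    for actual in history:
        pred = a * actual + (1 - a) * pred
    return pred


def should_shut_down(history, t_break_even, a=0.5):
    """Shut down when the predicted idle period exceeds the break-even time."""
    return predict_idle(history, a) > t_break_even


idle_ms = [4.0, 8.0]                  # hypothetical past idle periods
pred = predict_idle(idle_ms)          # 0.5*8 + 0.5*(0.5*4) = 5.0
```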
DPM Policy
- Markov process
- A Markov process uses the current state and pre-characterized transition probabilities to choose the next state.
- Power management optimization has been studied within the framework of Markov processes.
- When the system is modeled as Markov chains
- It can model the uncertainty in system power consumption and response times.
- It can model complex systems with many power states, buffers, and queues.
- It can compute power management policies that are globally optimum.
G. A. Paleologo et al., "Policy optimization for dynamic power management," Proc. DAC, 1998.
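With a fixed policy, the system becomes a Markov chain. The long-run fraction of time spent in each power state, and therefore the average power, follows from the chain's stationary distribution. The minimal sketch below only evaluates a given transition matrix by power iteration. It does not perform the policy optimization of the cited work. The two-state matrix is hypothetical. The power values reuse the StrongARM Run and Sleep numbers in mW.

```python
def stationary(P, iters=1000):
    """Stationary distribution of a row-stochastic matrix P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def expected_power(P, power):
    """Long-run average power under the chain defined by P."""
    return sum(p * w for p, w in zip(stationary(P), power))


# Hypothetical chain over [active, sleep] under some fixed policy.
P = [[0.9, 0.1],
     [0.2, 0.8]]
avg_mw = expected_power(P, [400.0, 0.16])   # about 266.72 mW (2/3 active, 1/3 sleep)
```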
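DPM Policy
- Structure of stochastic DPM
- FSM of each module
[Figure: the power manager observes the Service Requestor, the queue, and the Service Provider, and sends commands to the Service Provider; requests flow from the Service Requestor through the queue to the Service Provider]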
Dynamic Voltage Scaling
- DVS
- Reducing VDD is the single most effective way to reduce power consumption.
- How far VDD can be reduced is limited by the worst-case condition.
- Performance requirements vary with time.
- Solution
- Slowdown: perform the job with just-in-time performance
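Just-in-time slowdown picks the lowest voltage whose frequency still meets the deadline. Energy then drops with VDD², following the same C·VDD² model used for the fixed-activity processor power. This is a minimal sketch. The operating points, the 1 nF effective capacitance and the job size are hypothetical.

```python
def job_energy(cycles, c_eff, vdd):
    """Dynamic energy E = C_eff * VDD^2 * cycles (the P = C*VDD^2*f model times runtime)."""
    return c_eff * vdd ** 2 * cycles


def just_in_time(cycles, deadline, points):
    """Pick the lowest-voltage (freq, VDD) point whose runtime still meets the deadline."""
    feasible = [(vdd, f) for f, vdd in points if cycles / f <= deadline]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    vdd, f = min(feasible)
    return f, vdd


# Hypothetical operating points (Hz, V) and a 1e6-cycle job due in 15 ms.
POINTS = [(200e6, 1.8), (100e6, 1.2), (50e6, 1.0)]
f, vdd = just_in_time(1e6, 0.015, POINTS)           # (100 MHz, 1.2 V)
saving = 1 - job_energy(1e6, 1e-9, vdd) / job_energy(1e6, 1e-9, 1.8)   # about 0.56
```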
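DVS Applied Processor
- Transition overhead
- Max 70 µs for a 5 MHz ↔ 80 MHz transition
- Max 4 µJ for a 5 MHz ↔ 80 MHz transition
[Figure: DVS processor system. The CPU (ARM core, 16 KB cache, write buffer, system co-processor, bus interface) connects over the system bus to 64 KB SRAM, 0.5 MB of memory, and an I/O chip; a regulator fed by VBat supplies VDD for the desired frequency Fdesired, and a VCO generates the clock]
T. D. Burd et al., "A dynamic voltage scaled microprocessor system," IEEE JSSC, Nov. 2000.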
DPM using DVS on SoC
- Divide the SoC into 4 power domains
- Persistent 3.3 V: I/O drivers and receivers
- Persistent 1.0 V: PLL
- Persistent 1.8 V: RTC, sleep management
- DVS: 1.0 V to 1.8 V (10 mV/µs)
K. J. Nowka et al., "A 32-bit PowerPC system-on-a-chip with support for dynamic voltage scaling and dynamic frequency scaling," IEEE JSSC, Nov. 2002.
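Assume the DVS domain ramps at the stated slew rate, held constant over its whole range. A full swing between its two voltage limits then takes roughly:

$$
t_{ramp} = \frac{\Delta V}{\text{slew rate}} = \frac{1.8\,\mathrm{V} - 1.0\,\mathrm{V}}{10\,\mathrm{mV}/\mu\mathrm{s}} = \frac{800\,\mathrm{mV}}{10\,\mathrm{mV}/\mu\mathrm{s}} = 80\,\mu\mathrm{s}
$$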