System-level Power Estimation and Optimization

PowerPoint presentation (file properties: title "Bottom-up Approach"; author: HS; last modified by: jaemoon; created: 6/22/2005 5:56:35 AM)


Slides: 57


1
System-level Power Estimation and Optimization

2006.09.03
Chong-Min Kyung
KAIST

2
Contents

Introduction
System-level Power Estimation
System-level Power Optimization

3
Introduction

Power classification
Static power
leakage power
Dynamic power
Switching power
Short-circuit power
Glitch power

4
Introduction

Power calculation
Static power
$P_{total\_leakage} = \sum P_{cell\_leakage}$
Dynamic power
$P_{internal} = f(C_{load}, TR_{input}, TR_{output})$
TR: toggle rate
$P_{glitch} = V_{DD}^2 \sum (C_{load,net} \cdot f_{glitch} \cdot t)$
$f_{glitch}$: frequency of glitches
$t$: factor of the glitch width
$P_{switching} = \frac{1}{2} V_{DD}^2 \sum (C_{load} \cdot TR_{output})$

Ratio
Above 0.1 µm: switching power accounts for 70-90% of total power
Below 0.07 µm: leakage power exceeds 50%
Data-intensive applications
Switching power is the dominant factor.
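The static, switching, and glitch equations above can be evaluated directly once per-cell leakage and per-net capacitance and activity are known. The sketch below is a minimal illustration under that assumption; the function names and example values are hypothetical, toggle rates are taken in toggles per second, and the internal-power term is omitted because its function is library-specific.

```python
# Minimal sketch of the static, switching and glitch power equations.
# Inputs are hypothetical: per-cell leakage and per-net capacitance and
# activity are assumed to be known already.


def total_leakage(cell_leakage_w):
    """P_total_leakage = sum of P_cell_leakage over all cells (watts)."""
    return sum(cell_leakage_w)


def switching_power(vdd, nets):
    """P_switching = 1/2 * Vdd^2 * sum(C_load * TR_output).

    nets: iterable of (c_load_farads, toggle_rate_per_second) pairs.
    """
    return 0.5 * vdd ** 2 * sum(c * tr for c, tr in nets)


def glitch_power(vdd, nets):
    """P_glitch = Vdd^2 * sum(C_load,net * f_glitch * t).

    nets: iterable of (c_load_farads, f_glitch_hz, width_factor) triples.
    """
    return vdd ** 2 * sum(c * f * t for c, f, t in nets)


# Two hypothetical nets at 1.2 V: 10 fF toggling 50M times/s, 20 fF at 25M/s.
example_nets = [(10e-15, 50e6), (20e-15, 25e6)]
p_sw = switching_power(1.2, example_nets)  # 0.5 * 1.44 * 1e-6 W
```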

5
Introduction

Opportunities for power reduction
For low power design
1. Power model generation
2. Power estimation
3. Power optimization

[Figure: commercial tools vs. academic research]
6
System-level Power Estimation
7
Contents

Power Model Generation
Analytical Method
Empirical Method
System-level Power Estimation
Hardware Power Estimation
Software Power Estimation
Bus Power Estimation

8
Power Model Generation

1. Analytical method
Uses average values of design parameters, without considering different circuit styles, clock strategies, or layout techniques
Average capacitance, equivalent gate count, number of primary inputs, etc.
Mainly used for behavior-level power estimation when no technology library or implementation information is available
Very low accuracy
2. Empirical method
Uses parameters measured from existing implementations
2-1. Fixed-activity model
2-2. Activity-sensitive model

9
Power Model Generation

2-1. Fixed-activity model
Uses the data sheet of a specific hardware block
$P_{processor} = C_{processor} \times V_{DD}^2 \times freq$
$C_{processor} = P_{data\_sheet} / (V_{data\_sheet}^2 \times freq_{data\_sheet})$
Low accuracy
Mainly used for coarse-grained system-level power estimation
2-2. Activity-sensitive model
Uses signal activity, or its statistics, which depend on the testbench
Transition-sensitive model
Power model is a look-up table (LUT).
Very high accuracy
Statistical activity model
Power model is a LUT or an equation.
High accuracy
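The fixed-activity model extracts one effective capacitance from a data-sheet operating point and reuses it at other voltages and frequencies. A minimal sketch, with hypothetical data-sheet numbers:

```python
# Fixed-activity model sketch: derive an effective switched capacitance
# from one data-sheet operating point, then rescale power to a new supply
# voltage and clock frequency. All numbers are hypothetical.


def effective_capacitance(p_ds_w, v_ds, f_ds_hz):
    """C_processor = P_data_sheet / (V_data_sheet^2 * freq_data_sheet)."""
    return p_ds_w / (v_ds ** 2 * f_ds_hz)


def processor_power(c_eff, vdd, f_hz):
    """P_processor = C_processor * VDD^2 * freq."""
    return c_eff * vdd ** 2 * f_hz


# Data sheet (hypothetical): 0.5 W at 1.0 V and 200 MHz.
c_proc = effective_capacitance(0.5, 1.0, 200e6)
# Same block at 0.8 V and 100 MHz: 0.5 * 0.8**2 * 0.5 = 0.16 W.
p_scaled = processor_power(c_proc, 0.8, 100e6)
```

Because the activity is folded into a single constant, the estimate does not change with the workload, which is why the model's accuracy is low.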

10
Macro Modeling Method

Macro modeling method
Raises the abstraction of the power model by characterizing macro cells
Mainly used to reduce power model complexity in activity-sensitive power model generation
Macro cell
32-bit adder, multiplier, MUX, etc.
Reduces computational complexity at the cost of accuracy
Macro cell characterization
Synthesize the macro cell with the basic cell library
Estimate the power of the macro cell with various testbenches
Generate the power model and reduce its complexity
This concept can be used to raise the abstraction of power models in hardware- or software-level power estimation.

11
Macro Modeling Method

Power model of the macro modeling method
Statistical activity model
LUT-based model
For each bus component, build a 3-D LUT with axes P_in, D_in, and D_out
Fill in the power value at each point (P_in, D_in, D_out)
Requires a lot of memory space
Equation-based model
Build a polynomial approximating power consumption.
From a large number of input patterns, perform analysis to determine the coefficients.
Requires little memory space

P_in: average input signal probability; D_in: average input switching activity; D_out: average output zero-delay switching activity
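The two statistical model forms can be contrasted in a few lines. The sketch below assumes one shared grid for all three axes, a nearest-grid-point lookup (no interpolation), and a first-order polynomial; the grid, table contents, and coefficients are all hypothetical stand-ins for real characterization results.

```python
import bisect

# Sketch of the two statistical macro-model forms for one macro cell.
# Grid points, table contents and polynomial coefficients are hypothetical.


def nearest_index(axis, x):
    """Index of the grid point on a sorted axis that is closest to x."""
    i = bisect.bisect_left(axis, x)
    if i == 0:
        return 0
    if i == len(axis):
        return len(axis) - 1
    return i if axis[i] - x < x - axis[i - 1] else i - 1


class LutMacroModel:
    """3-D LUT over (P_in, D_in, D_out) with nearest-grid-point lookup.

    Memory grows as len(axis) ** 3 entries.
    """

    def __init__(self, axis, table):
        self.axis = axis    # one shared grid for all three axes (simplification)
        self.table = table  # table[i][j][k] = characterized power

    def power(self, p_in, d_in, d_out):
        i = nearest_index(self.axis, p_in)
        j = nearest_index(self.axis, d_in)
        k = nearest_index(self.axis, d_out)
        return self.table[i][j][k]


def equation_macro_model(coeffs, p_in, d_in, d_out):
    """First-order polynomial c0 + c1*P_in + c2*D_in + c3*D_out (4 numbers)."""
    c0, c1, c2, c3 = coeffs
    return c0 + c1 * p_in + c2 * d_in + c3 * d_out
```

With a 3-point grid the LUT already stores 27 values, while the polynomial stores 4, which mirrors the memory trade-off listed above.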
12
System-level Power Estimation

Estimation speed and power model
Trade-off between estimation speed and accuracy
of power model
Abstraction of power estimation
System-level power estimation
Software-level power estimation
Hardware-level power estimation
Behavior-level, RT-level, gate-level,
circuit-level

[Figure: relative power results vs. absolute power results]
13
System-level Power Estimation

System-level power estimation
Relative value of power consumption is important.
Objective
Power profiling and design exploration
System-level power estimation is composed of
1. Hardware power estimation
2. Software power estimation in processor
3. Bus power estimation

14
Hardware Power Estimation

RT-level power estimation
Dynamic simulation-based power estimation using a coarse-grained net model from the power macro-model database and a testbench

15
Hardware Power Estimation Tool

Commercial tools for hardware power estimation include:
RT-level
Synopsys Power Compiler™
Gate-level
Synopsys PrimePower™, Synopsys Power Compiler™
Circuit-level
SPICE, Synopsys PowerMill™, Cadence VoltageStorm™

16
Software Power Estimation

Estimation
A processor is too complex to estimate at the RT level.
Power consumption depends on each instruction and on the instruction sequence.
Estimation method
A power model is added to the instruction set simulator (ISS) for instruction-level power profiling.
$E = \sum_i B_i N_i + \sum_{i,j} O_{ij} N_{ij} + \sum_k S_k$
$B_i$: energy consumption of instruction i
$N_i$: number of executions of instruction i
$O_{ij}$: energy consumption when instruction i is followed by instruction j
$N_{ij}$: number of occurrences of the pair (instruction i, instruction j)
$S_k$: other instruction effects, such as cache misses, pipeline stalls, etc.
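Applied to an executed instruction trace, the instruction-level model is a counting exercise. A minimal sketch, assuming the trace is a list of opcode names and that the B_i, O_ij, and S_k values come from an earlier characterization (the opcodes and energies here are hypothetical):

```python
from collections import Counter

# Sketch of E = sum_i B_i*N_i + sum_{i,j} O_ij*N_ij + sum_k S_k over an
# executed trace. Opcode names and energy values are hypothetical.


def program_energy(trace, base, overhead, other_effects=()):
    """trace: opcodes in execution order.
    base: {opcode: B_i}, energy per execution.
    overhead: {(op_i, op_j): O_ij}; pairs not listed cost nothing extra.
    other_effects: S_k energies (cache misses, stalls, ...)."""
    n_i = Counter(trace)                        # N_i
    n_ij = Counter(zip(trace, trace[1:]))       # N_ij for adjacent pairs
    e_base = sum(base[op] * n for op, n in n_i.items())
    e_pair = sum(overhead.get(pair, 0.0) * n for pair, n in n_ij.items())
    return e_base + e_pair + sum(other_effects)
```

For the trace ld, add, add, st with B = {ld: 2, add: 1, st: 2}, O(ld, add) = 0.5, O(add, st) = 0.25, and one extra effect of 3, the total is 6 + 0.75 + 3 = 9.75 energy units.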

17
Software Power Estimation

Power model
Instruction-level power model
Inter-instruction effect consideration
Dynamic effects (cache misses, branch prediction, etc.)
Power modeling method
1) White-box approach
2) Black-box approach

18
White-box Approach

Power model
Activity-sensitive model
Characterization
Use macro modeling method
Process
Run gate-level simulation
Find the predominant parameters
Reduce power model complexity
Simple equation or reduced LUT
Build the instruction-level power model
Reducing the power model complexity increases estimation speed at the cost of accuracy.

19
Black-box Approach

Characterization flow
Measurement → Characterization → Instruction-level power model
[Figure: current-measurement setup labeled I(t), R, V, r]
20
Black-box Approach

Measurement
By measuring the current drawn by the real chip
Power model
Activity-sensitive power model
Statistical activity model
Characterization process
Current is measured on the real chip while a subroutine is executed over multiple iterations
Compare the measured values with an ISS that includes dynamic effects
Find a power equation that matches the measured power graph
Determine the coefficients of the power equation by experimental iteration
→ It is important to find the equation closest to the measurement results.
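Choosing coefficients so that a candidate equation tracks the measured graph can be automated with a least-squares fit. The sketch below substitutes ordinary least squares for the "experimental iteration" described above and assumes a hypothetical one-parameter equation P = a + b·x, where x might be, for example, an operand Hamming distance; the RMS error indicates how close the equation is to the measurements.

```python
# Least-squares sketch for fitting P = a + b*x to measured samples.
# x is a hypothetical activity parameter; least squares stands in for
# the experimental iteration described in the slide.


def fit_linear(xs, ys):
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def rms_error(xs, ys, a, b):
    """How closely the fitted equation follows the measurement."""
    return (sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5
```

Comparing the RMS error of several candidate equations is one way to pick "the closest equation to the measurement results".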

21
Black-box Approach

Measurement method
The program under measurement is isolated using an interrupt signal, NOP instructions, and processor wait states, so that the exact measurement window can be found and synchronized.

R. Muresan and C. Gebotys, "Current dynamics-based macro-model for power simulation in a complex VLIW DSP processor," IEE Proc.-Comput. Digit. Tech., 2002.
22
Software Power Estimation Tool for Research Purposes

SimplePower
Functional simulator
SimplePower core
based on SimpleScalar ISA
Power model
Activity-sensitive power model
Direct simulation and profiling based on input transitions
Generates switched-capacitance tables

Cycle-accurate activation information
Implementation-based signal generation
23
Software Power Estimation Tool for Research Purposes

Wattch
Architecture-level power estimation
Functional simulator
SimpleScalar cycle-level performance simulator
Power model
Fixed-activity power model
Categories
Array structure
Fully associative CAM
Combinational logic and wires
Clocking logic
Example: array structure
Power C1 C2 A C3 B
A: bit-line count, B: word-line count
C1: diffusion capacitance, C2: gate capacitance, C3: metal capacitance

24
Bus Power Estimation

Power consumed on the bus consists of two parts
Bus component power
Power consumed internally in the bus components
Arbiter, decoder, muxes
Interconnection power
Power consumed on the bus wires that connect the
master and slave interfaces and the bus
components
Address bus, data bus, control signals

25
Bus Component Power Estimation

At the system level, only structural information about the bus architecture is available.
Bus interconnection
Bus width
A global bus power model is used for estimation.
The global bus power model contains the characterized power models of the bus components.
Arbiter, decoder, multiplexer
Behavior, FSM

[Figure: processor, IP 1, IP 2, and memory connected by a bus, with the global bus power model]
26
Bus Component Characterization

Macro model
Pre-calculated power cube
Useful for system-level power estimation
Input parameters of the macro models
Data and address bus width, or the operating frequency
The number of masters and slaves
Input/output data characteristics
The switching activity, the signal probability, or the Hamming distance between two successive data words

27
Bus Power Analysis

AMBA AHB bus power analysis
A standard for on-chip communication
Power analysis process
Bus structure decomposition
Arbiter
Decoder
Multiplexer
Build a macro model of each component
Bus behavior decomposition and build power FSM
IDLE, READ, WRITE, and IDLE with handover
Monitor bus signal activity
Power analysis through power FSM

[Figure: global bus power model]
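Power analysis through a power FSM amounts to classifying each monitored bus cycle into one of the FSM states and accumulating a characterized per-cycle energy. A minimal sketch, assuming one constant energy per state (the values are hypothetical; a real model may also scale them by the monitored activity within each state):

```python
from collections import Counter
from enum import Enum

# Power-FSM sketch for an AHB-like bus. Per-cycle energies are hypothetical
# characterization results, one constant per state.


class BusState(Enum):
    IDLE = "IDLE"
    READ = "READ"
    WRITE = "WRITE"
    IDLE_HO = "IDLE with handover"


def bus_energy(state_trace, energy_per_cycle):
    """Sum the characterized energy of every monitored bus cycle."""
    counts = Counter(state_trace)
    return sum(energy_per_cycle[state] * n for state, n in counts.items())


def average_power(state_trace, energy_per_cycle, f_clk_hz):
    """Total energy divided by the simulated time (cycles / f_clk)."""
    sim_time = len(state_trace) / f_clk_hz
    return bus_energy(state_trace, energy_per_cycle) / sim_time
```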
28
Interconnection Power Estimation

Power consumption on each wire
$P = \frac{1}{2} V_{DD}^2 C f a$
VDD: voltage swing between logic levels 1 and 0
C: capacitance of the wire
f: clock frequency
a: switching activity
VDD and f are given as fixed values.
We need to find C and a.
C can be obtained from the wire capacitance model.
a can be obtained from system-level simulation.
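Summed over the lines of a bus, the per-wire formula gives the interconnect power. A short sketch, where the per-wire capacitance and activity values are hypothetical inputs standing in for the wire model and the system-level simulation:

```python
# Sketch of P = 1/2 * Vdd^2 * C * f * a per wire, summed over a bus.
# Per-wire (C, a) values are hypothetical inputs.


def wire_power(vdd, c_farads, f_hz, activity):
    return 0.5 * vdd ** 2 * c_farads * f_hz * activity


def bus_interconnect_power(vdd, f_hz, wires):
    """wires: iterable of (C, a) pairs, one per bus line."""
    return sum(wire_power(vdd, c, f_hz, a) for c, a in wires)
```

For example, a 1 pF wire at 1.0 V and 100 MHz with a = 0.5 dissipates 0.5 × 1 × 1e-12 × 1e8 × 0.5 = 25 µW.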

29
Interconnection Power Estimation

Wire capacitance model
$C = \varepsilon_{ox} W L / x_{int}$
$\varepsilon_{ox}$: constant, 3.45 × 10^-13 F/cm, permittivity of SiO2
$x_{int}$: oxide thickness underneath the interconnect
W: interconnect width
L: interconnect length
W and $x_{int}$ can be obtained from the technology parameters.
L can be estimated from the chip area A.

J. P. Uyemura, Circuit Design for CMOS VLSI, Kluwer Academic Publishers, 1992.
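The wire capacitance model is a parallel-plate estimate and ignores fringing fields. A minimal sketch using the SiO2 permittivity given above; W, L, and x_int in the example are hypothetical, and L is passed in directly because the slide's area-based estimate of L is not reproduced here.

```python
# Parallel-plate sketch C = eps_ox * W * L / x_int (fringing ignored).
# Lengths are in cm to match eps_ox in F/cm; example values are hypothetical.
EPS_OX_F_PER_CM = 3.45e-13


def wire_capacitance(w_cm, l_cm, x_int_cm, eps=EPS_OX_F_PER_CM):
    return eps * w_cm * l_cm / x_int_cm


# 1 um wide, 1 mm long wire over 1 um of oxide: 3.45e-14 F (34.5 fF).
c_example = wire_capacitance(1e-4, 0.1, 1e-4)
```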
30
Interconnection Power Estimation

Switching activity model
Switching activity can be obtained from bus transactions.
The bus model monitors bus transitions and counts bus switching events.

[Figure: CPU, DSP, IP, and memory connected through the bus model, which monitors bus transitions during system-level simulation]
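A bus model that counts switching can be sketched as a monitor that sees every value driven on the bus and records per-line toggles between successive values (their Hamming distance). The class below is a hypothetical illustration, not the API of any particular system-level simulator; it assumes observe() is called once per bus cycle.

```python
# Bus-monitor sketch: observe every value driven on an n-bit bus during
# system-level simulation, count per-line toggles between successive values,
# and report the switching activity a (toggles per line per observed cycle).


def hamming_distance(a, b):
    """Number of bus lines that toggle between two successive values."""
    return bin(a ^ b).count("1")


class BusMonitor:
    def __init__(self, width):
        self.width = width
        self.mask = (1 << width) - 1
        self.prev = None
        self.cycles = 0              # number of observed transitions
        self.toggles = [0] * width   # per-line toggle counts

    def observe(self, value):
        value &= self.mask
        if self.prev is not None:
            diff = self.prev ^ value
            for bit in range(self.width):
                if (diff >> bit) & 1:
                    self.toggles[bit] += 1
            self.cycles += 1
        self.prev = value

    def activity(self):
        """Switching activity a per line, for use in P = 1/2 VDD^2 C f a."""
        if self.cycles == 0:
            return [0.0] * self.width
        return [t / self.cycles for t in self.toggles]
```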
31
Bus Power Estimation

Power estimation
An example application is simulated in a system-level simulator.
The power estimator reports power consumption using the power models of the bus components and the interconnect.
Values monitored from bus transitions are used as the input of the power estimator.

[Figure: system-level simulator in which CPU, DSP, IP, and memory connect through the bus model, which feeds the power estimator]
32
System-level Power Optimization
33
Contents

Low Power System Implementation Techniques
Circuit level
Clock gating
MTCMOS
Multiple voltage supply
Architecture level
Memory Optimization
Bus Optimization
Dynamic Power Management in System Level
Introduction to DPM
Structure of DPM
Component-level DPM scheme
DPM Policy
Dynamic Voltage Scaling

34
Circuit Level Low Power System Implementation
Techniques

Clock gating
The most popular method for reducing the power of clock signals
Needs a circuit to generate the enable signal
Increases the complexity of the control logic
Timing is critical to avoid clock glitches at the AND-gate output
Adds a gate delay to the clock signal

35
Circuit Level Low Power System Implementation
Techniques

MTCMOS
Low-VTH devices in the logic maintain performance when active.
High-VTH current switches (header or footer) cut off the leakage path in sleep mode.
The scheduling algorithm that controls the sleep signal is important.

[Figure: MTCMOS structure; low-VTH logic (input to output) sits between virtual VDD and virtual GND, with a high-VTH header to VDD and a high-VTH footer to GND, both controlled by sleep]
36
Circuit Level Low Power System Implementation
Techniques

Multiple Voltage Supply
Slows down non-critical paths by using a lower supply voltage
Requires two or more power grids
Needs high-efficiency voltage converters for dynamic voltage scaling
The dynamic power scheduling algorithm is important.

[Figure: the critical path needs high-speed logic on the high-voltage supply; other logic uses the low-voltage supply]

37
Architecture Level Low Power System
Implementation Techniques

Memory Optimization
Code density optimization
Goal
Minimize program memory occupation to reduce the
bandwidth of processor-memory communication
Approaches
Custom instruction sets
Object code compression

38
Memory Optimization

Custom instruction set
Instructions shorter than those of the regular instruction set
Example: ARM Thumb code (16-bit instructions)
Needs a specific architecture that supports 16-bit instructions

[Figure: Inst 1 to Inst 5 stored in 32-bit memory words, one instruction per word versus two 16-bit instructions per word]
In this example, the required bandwidth is reduced to 3/5 (three words instead of five).
39
Memory Optimization

Object code compression
All instructions have the same size, but some or all of them are encoded before being stored in instruction memory.
A practical solution for embedded processors
No special architecture is needed to support a different instruction format.
Exploits the small subset of instructions used by firmware code
Approaches
Full code compression
Selective code compression

40
Memory Optimization

Full code compression
Replaces all instructions with binary patterns of minimum width:
⌈log2 N⌉ bits, where N is the number of distinct instructions
Advantage
Memory bandwidth for instructions is decreased.
Disadvantage
The IDT may be very large because N is not small.
log2 N may not be a multiple of 8.

[Figure: without compression, the core fetches k-bit instructions directly from memory; with full code compression, memory holds log2 N-bit codes that the IDT expands into k-bit instructions]
IDT: Instruction Decompression Table
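Full code compression can be sketched as building an IDT of the distinct instruction words and replacing each word with its ⌈log2 N⌉-bit index. The program words below are arbitrary 32-bit values chosen for illustration, and the sketch does not model how codes are packed into memory.

```python
# Full code compression sketch: every distinct instruction word gets an
# index of ceil(log2 N) bits; the IDT maps the index back to the k-bit word.


def full_compress(program):
    """Return (idt, codes, bits) for a list of k-bit instruction words."""
    if not program:
        raise ValueError("empty program")
    idt = sorted(set(program))                   # N distinct instructions
    index = {word: i for i, word in enumerate(idt)}
    bits = max(1, (len(idt) - 1).bit_length())   # ceil(log2 N), at least 1
    codes = [index[word] for word in program]
    return idt, codes, bits


def full_decompress(idt, codes):
    return [idt[c] for c in codes]


def compressed_size_bits(codes, bits):
    return len(codes) * bits
```

Using integer bit_length() avoids floating-point rounding in log2; note that the IDT itself (N entries of k bits) must also be stored, which is the disadvantage listed above.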
41
Memory Optimization

Selective Code Compression
Most of a program trace is covered by a small subset of instructions.
Only the subset of instructions that maximizes program coverage is compressed.
The program is a mix of compressed and uncompressed instructions.

[Figure: memory feeds a buffer and a controller; 8-bit codes are expanded to k-bit instructions through the IDT before reaching the core]
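Selective compression can be sketched by profiling a trace, keeping only the most frequent instruction words (at most 2^8 = 256 for 8-bit codes) in the IDT, and tagging each program word as compressed or uncompressed. How a real controller distinguishes the two kinds of words in memory is not modeled; the tags here are a hypothetical stand-in.

```python
from collections import Counter

# Selective compression sketch. The ("c", index) / ("u", word) tags are a
# hypothetical stand-in for the controller's real encoding.


def build_idt(profile_trace, idt_bits=8):
    """Pick the most frequent instruction words (up to 2**idt_bits)."""
    capacity = 1 << idt_bits
    return [word for word, _ in Counter(profile_trace).most_common(capacity)]


def selective_compress(program, idt):
    """Tag each word: ("c", index) if it is in the IDT, else ("u", word)."""
    index = {word: i for i, word in enumerate(idt)}
    return [("c", index[w]) if w in index else ("u", w) for w in program]


def selective_decompress(stream, idt):
    return [idt[v] if tag == "c" else v for tag, v in stream]


def coverage(trace, idt):
    """Fraction of executed instructions served by the IDT."""
    table = set(idt)
    return sum(1 for w in trace if w in table) / len(trace)
```

Because the IDT size is capped by idt_bits, its size stays fixed and limited regardless of how many distinct instructions the program contains.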
42
Memory Optimization

Advantage
Size of IDT is fixed and limited.
Instruction fetching/decompression logic has
reduced complexity.
Disadvantage
Requires a controller to handle instruction
fetching

43
Memory Optimization

Data density optimization
Same principle as code density optimization
Aims to reduce memory traffic and the dynamic size of the data set
More complex than code compression, because both compression and decompression are required
Requires a hardware compression/decompression unit
Design trade-off between speed and power

44
Architecture Level Low Power System
Implementation Techniques

Bus power optimization
A large amount of power is dissipated in data communication over heavily loaded on-chip or off-chip buses.
Reduce switching activity on buses through signal encoding to save power
Approaches
Bus-invert coding
Gray code addressing

$P_{bus} = n \times C \times V_{DD}^2 \times freq \times activity$, for an n-bit bus
45
Bus Optimization

Bus-invert coding
Add a redundant line INV to the bus
When INV = 0
The data equals the values on the remaining bus lines
When INV = 1
The data is the complement of the remaining bus lines
At each cycle, decide whether sending the true or the complemented signal leads to fewer toggles

[Figure: source data passes through polarity decision logic, which drives the data bus and the INV signal; the receiver uses INV to recover the received data]
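The polarity decision can be sketched directly. This version counts toggles on the data lines plus the INV line itself and picks the cheaper polarity; it assumes the bus starts at all zeros with INV = 0, and the helper names are illustrative.

```python
# Bus-invert coding sketch for an n-bit data bus plus one INV line.
# Each cycle the polarity decision picks whichever of the true or the
# complemented data causes fewer toggles, counting the INV line as well.
# The bus is assumed to start at all zeros with INV = 0.


def hamming(a, b):
    return bin(a ^ b).count("1")


def bus_invert_encode(data_words, width):
    mask = (1 << width) - 1
    bus, inv = 0, 0
    encoded = []
    for d in data_words:
        d &= mask
        comp = ~d & mask
        cost_true = hamming(bus, d) + (inv != 0)
        cost_inv = hamming(bus, comp) + (inv != 1)
        if cost_inv < cost_true:
            bus, inv = comp, 1
        else:
            bus, inv = d, 0
        encoded.append((bus, inv))
    return encoded


def bus_invert_decode(encoded, width):
    mask = (1 << width) - 1
    return [(~b & mask) if inv else b for b, inv in encoded]


def total_toggles(states):
    """Toggles on data lines plus INV, starting from bus = 0, INV = 0."""
    prev_bus, prev_inv = 0, 0
    total = 0
    for b, i in states:
        total += hamming(prev_bus, b) + (prev_inv != i)
        prev_bus, prev_inv = b, i
    return total
```

For the 8-bit sequence 0xFF, 0x00, 0xFE the encoded stream toggles 4 lines in total, versus 23 toggles when the raw data is sent.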
46
Bus Optimization

Gray code addressing
Most instruction addresses are consecutive
Use Gray code to address
Word-oriented machines
Addresses increment by 4 (32-bit words) or by 8 (64-bit words)
Modify the Gray code so that 1 bit switches per increment
A Gray-code adder is needed for jumps

Dec   Gray (i=1)   Gray (i=4)   Gray (i=8)
0     0000         0000         0000
1     0001         0001         0001
2     0011         0011         0011
3     0010         0010         0010
4     0110         0100         0110
5     0111         0101         0111
6     0101         0111         0101
7     0100         0110         0100
8     1100         1100         1000
i: address increment
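The benefit of Gray addressing for sequential fetches can be checked by counting address-bus toggles. This sketch assumes one possible "modified" scheme for a word-oriented machine: Gray-code only the bits above the byte offset (addr >> 2 for 4-byte words) and leave the constant low bits unchanged. It is not necessarily the exact encoding tabulated above.

```python
# Sketch: bus toggles for consecutive word-aligned addresses in plain binary
# versus Gray code. Only the bits above the byte offset are Gray-coded here,
# which is one assumed scheme for word-oriented machines.


def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_address(addr, offset_bits=2):
    low = addr & ((1 << offset_bits) - 1)
    return (gray(addr >> offset_bits) << offset_bits) | low


def toggles(seq):
    """Total number of bit flips between successive bus values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))


addrs = list(range(0, 64, 4))   # 16 consecutive 32-bit instruction addresses
binary_toggles = toggles(addrs)
gray_toggles = toggles([gray_address(a) for a in addrs])
```

Over these 15 increments the Gray-coded bus toggles exactly once per increment (15 in total), while plain binary toggles 26 times.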
47
Introduction to DPM

Dynamic Power Management (DPM)
DPM controls the power consumption of components based on their usage.
Predicting component usage is essential.
Methods
Shutdown (clock gating, power gating)
Slowdown (frequency scaling, voltage scaling, VTH scaling)

[Figure: shutdown vs. slowdown; a task run at f and VDD finishes at T/2 and then idles, while the slowed-down task at 0.6 VDD finishes at T]
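The shutdown-versus-slowdown comparison can be made concrete with dynamic energy per cycle proportional to C·VDD². This sketch assumes the same number of cycles in both cases, ignores leakage and transition costs, and uses the 0.6 VDD value from the figure; the workload size and capacitance are hypothetical.

```python
# Sketch comparing shutdown and slowdown for one task of N cycles.
# Dynamic energy per cycle is taken as C_eff * VDD^2; leakage and
# transition costs are ignored. Workload and capacitance are hypothetical.


def dynamic_energy(n_cycles, c_eff, vdd):
    return n_cycles * c_eff * vdd ** 2


n, c, vdd = 1_000_000, 1e-9, 1.0
e_shutdown = dynamic_energy(n, c, vdd)        # full speed, then idle/off
e_slowdown = dynamic_energy(n, c, 0.6 * vdd)  # slower clock at 0.6 VDD
ratio = e_slowdown / e_shutdown               # 0.6 ** 2
```

Under these assumptions the slowed-down task uses about 36% of the dynamic energy, because the cycle count is unchanged while the energy per cycle scales with VDD².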
48
Structure of DPM

Levels at which DPM is implemented
Component level
Circuit, block
Power mode
System level
Policy
The procedure that controls the power level of each module in a system

[Figure: the system-level policy sends power-mode requests to Block 1 through Block n, each built from circuits]

49
Component Level DPM Scheme

Circuit level
Clock off by clock gating
Power off by the footer/header of MTCMOS
Multiple voltage supply
Block level
Power off by shutting down the power supply to IPs
When two blocks have similar power-off patterns, shut them down together.

[Figure: IP 1 and IP 2 share a virtual VDD and a virtual GND, switched from the VDD and GND sources]
50
Component Level DPM Scheme

Power mode
Each power mode corresponds to a combination of enabled DPM techniques.
Example: a system that uses clock gating and block shutdown

Transitions between modes of operation have a cost.

Power mode   Clock gating   Block shutdown   Power
Run          disabled       disabled         400 mW
Idle         enabled        disabled         50 mW (wait for interrupt)
Sleep        enabled        enabled          0.16 mW (wait for wake-up event)

Transition times: Run → Idle 10 µs, Idle → Run 10 µs, Run → Sleep 90 µs, Idle → Sleep 90 µs, Sleep → Run 160 ms
Power state machine for the StrongARM SA-1100 processor
SA-1100 Microprocessor Technical Reference Manual, Intel, 1998
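Transition costs determine when entering Sleep pays off. The sketch below computes a break-even idle length for the StrongARM numbers above. It assumes, beyond what the table states, that power during a transition equals Run power (400 mW); with a different transition-power assumption the threshold changes accordingly.

```python
# Break-even sketch on the power state machine above: for an idle interval
# of length t, compare staying in Idle with going to Sleep and back.
# Assumption (not from the manual): transition power equals Run power.
P_RUN, P_IDLE, P_SLEEP = 0.400, 0.050, 0.00016   # watts
T_TO_SLEEP, T_WAKE = 90e-6, 160e-3               # seconds


def energy_stay_idle(t):
    return P_IDLE * t


def energy_sleep(t, p_transition=P_RUN):
    t_tr = T_TO_SLEEP + T_WAKE
    if t < t_tr:
        raise ValueError("idle interval shorter than the transition time")
    return p_transition * t_tr + P_SLEEP * (t - t_tr)


def break_even_time(p_transition=P_RUN):
    """Idle length at which both choices cost the same energy."""
    t_tr = T_TO_SLEEP + T_WAKE
    return t_tr * (p_transition - P_SLEEP) / (P_IDLE - P_SLEEP)
```

Under this assumption the break-even idle length is roughly 1.3 s; shorter idle periods are better spent in Idle.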
51
DPM Policy

Predictive technique
Uses a regression equation based on previous On and Off times of the component to estimate the next turn-on time.
Limitation
It cannot handle components with more than two power modes.

[Figure: two-state power model (Running R and Sleep S, with go-to-sleep and wake-up transitions) and timelines comparing the predictive power management scheme with the pre-wakeup scheme, using states R, I, E, S, and W]
C.-H. Hwang et al., "A predictive system shutdown method for energy saving of event-driven computation," Proc. Int. Conf. on Computer-Aided Design, pp. 28-32, Nov. 1997.
M. Srivastava et al., "Predictive system shutdown and other architectural techniques for energy efficient programmable computation," IEEE TVLSI, vol. 4, no. 1, 1996.
I: idle state, E: entering state, W: waking-up state
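A predictive policy needs some estimate of the next idle period from past history. The sketch below uses an exponential average of past idle periods as a simple stand-in for the regression equation; the weight alpha, the initial prediction, and the break-even threshold are assumptions.

```python
# Predictive-shutdown sketch using an exponential average of past idle
# periods: T_pred[n+1] = alpha * t_idle[n] + (1 - alpha) * T_pred[n].
# alpha, the initial prediction and the break-even threshold are assumptions.


class ExpAveragePredictor:
    def __init__(self, alpha=0.5, initial=0.0):
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must be in [0, 1]")
        self.alpha = alpha
        self.prediction = initial

    def update(self, actual_idle):
        """Fold in the idle period just observed; return the next prediction."""
        self.prediction = (self.alpha * actual_idle
                           + (1 - self.alpha) * self.prediction)
        return self.prediction


def should_sleep(predicted_idle, break_even):
    """Shut down only if the predicted idle period exceeds the break-even time."""
    return predicted_idle > break_even
```

A larger alpha reacts faster to recent behavior but is more easily misled by a single unusual idle period.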
52
DPM Policy

Markov process
A Markov process chooses the next state using only the current state and pre-characterized transition probabilities.
Power management optimization has been studied within the framework of Markov processes.
When the system is modeled as Markov chains:
It can model the uncertainty in system power consumption and response times.
It can model complex systems with many power states, buffers, and queues.
It can compute power management policies that are globally optimum.

G. A. Paleologo et al., "Policy optimization for dynamic power management," Proc. DAC, 1998.
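Under a fixed policy, a Markov model of a component gives its long-run power directly: the stationary distribution is the fraction of time spent in each state. The sketch below only evaluates a given policy by power iteration (it does not optimize one, as the cited work does); the states, transition probabilities, and powers are hypothetical, and the chain is assumed irreducible and aperiodic so the iteration converges.

```python
# Sketch: expected power of a component modeled as a discrete-time Markov
# chain over power states under a fixed policy. P[i][j] is the probability
# of moving from state i to state j; all values are hypothetical.


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration pi <- pi * P until it stops changing."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def expected_power(P, power):
    """Long-run average power: sum of stationary probability * state power."""
    pi = stationary_distribution(P)
    return sum(p * w for p, w in zip(pi, power))
```

For a two-state chain with P = [[0.9, 0.1], [0.5, 0.5]], the component spends 5/6 of its time in the first state.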
53
DPM Policy

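Structure of stochastic DPM
[Figure: the power manager observes the service requestor, the request queue, and the service provider (each modeled as an FSM) and sends commands to the service provider; requests flow from the requestor through the queue to the provider]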
54
Dynamic Voltage Scaling

DVS
Reducing VDD is the single most effective way to reduce power consumption, because dynamic power is proportional to VDD squared.
How far VDD can be reduced is limited by the worst-case condition.
Performance requirements vary with time.
Solution
Slowdown: perform the job with just-in-time performance
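Just-in-time performance can be sketched as picking the slowest operating point that still meets the deadline. The (frequency, VDD) table is hypothetical, energy per cycle is taken as proportional to VDD², and the transition overhead discussed next is ignored.

```python
# Just-in-time DVS sketch: choose the slowest operating point that still
# finishes the job's cycles before its deadline. The (frequency, VDD)
# table is hypothetical; transition overhead is ignored.

OPERATING_POINTS = [  # (frequency in Hz, VDD in volts), slowest first
    (100e6, 0.9),
    (200e6, 1.1),
    (400e6, 1.4),
]


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    for f_hz, vdd in points:
        if cycles / f_hz <= deadline_s:
            return f_hz, vdd
    return None  # deadline cannot be met even at the fastest point


def relative_dynamic_energy(vdd, vdd_max):
    """Energy of the job relative to running it at vdd_max."""
    return (vdd / vdd_max) ** 2
```

A job of 30 million cycles with a 200 ms deadline is too slow at 100 MHz (300 ms), so the sketch selects 200 MHz at 1.1 V.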

55
DVS Applied Processor

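Transition overhead
Max. 70 µs for a 5 MHz to 80 MHz transition
Max. 4 µJ for a 5 MHz to 80 MHz transition

[Figure: DVS processor system; the CPU contains an ARM core, 16KB cache, 64KB SRAM, system co-processor, write buffer, and bus interface, connected over the system bus to a 0.5MB block and an I/O chip; a regulator converts VBat to VDD and a VCO sets the clock according to Fdesired]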
T. D. Burd et al., "A dynamic voltage scaled microprocessor system," IEEE JSSC, Nov. 2000.
56
DPM using DVS on SoC