System-level Power Estimation and Optimization - PowerPoint PPT Presentation

About This Presentation
Title:

System-level Power Estimation and Optimization

Description:

Title: Bottom-up Approach Author: HS Last modified by: jaemoon Created Date: 6/22/2005 5:56:35 AM Document presentation format: – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: System-level Power Estimation and Optimization


1
System-level Power Estimation and Optimization
  • 2006.09.03
  • Chong-Min Kyung
  • KAIST

2
Contents
  • Introduction
  • System-level Power Estimation
  • System-level Power Optimization

3
Introduction
  • Power classification
  • Static power
  • leakage power
  • Dynamic power
  • Switching power
  • Short-circuit power
  • Glitch power

4
Introduction
  • Power calculation
  • Static Power
  • $P_{total\_leakage} = \sum P_{cell\_leakage}$
  • Dynamic Power
  • $P_{internal} = \mathrm{Func}(C_{load}, TR_{input,output})$
  • $TR$: toggle rate
  • $P_{glitch} = V_{dd}^2 \sum (C_{loadnet} \cdot f_{glitch} \cdot t)$
  • $f_{glitch}$: the frequency of glitches
  • $t$: the factor of the glitch width
  • $P_{switching} = \tfrac{1}{2} V_{dd}^2 \sum (C_{load} \cdot TR_{output})$
  • Ratio
  • > 0.1 µm: switching power is 70 to 90%
  • < 0.07 µm: leakage power is > 50%
  • Data intensive application
  • Switching power is a dominant factor.

5
Introduction
  • Opportunities for power reduction
  • For low power design
  • 1. Power model generation
  • 2. Power estimation
  • 3. Power optimization

Commercial tool
Academic research
6
System-level Power Estimation
7
Contents
  • Power Model Generation
  • Analytical Method
  • Empirical Method
  • System-level Power Estimation
  • Hardware Power Estimation
  • Software Power Estimation
  • Bus Power Estimation

8
Power Model Generation
  • 1. Analytical method
  • Uses average values of design parameters, without
    considering different circuit styles, clock
    strategies, or layout techniques
  • Average capacitance, equivalent gate count,
    number of primary inputs, etc.
  • Mainly used for behavior-level power estimation,
  • when no technology library or implementation
    information is available
  • Very low accuracy
  • 2. Empirical method
  • Uses parameters measured from existing
    implementations
  • 2-1. Fixed-activity model
  • 2-2. Activity-sensitive model

9
Power Model Generation
  • 2-1. Fixed-activity model
  • Uses the data sheet of a specific hardware block
  • $P_{processor} = C_{processor} \times V_{DD}^2 \times freq$
  • $C_{processor} = P_{data\_sheet} / (V_{data\_sheet}^2 \times freq_{data\_sheet})$
  • Low accuracy
  • Mainly used for coarse-grained system-level power
    estimation
  • 2-2. Activity-sensitive model
  • Uses signal activity or its statistics, which
    depend on the testbench
  • Transition-sensitive model
  • Power model is a Look-Up Table (LUT).
  • Very high accuracy
  • Statistical activity model
  • Power model is a LUT or an equation.
  • High accuracy
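
The fixed-activity model above can be sketched in a few lines. This is a minimal illustration, not a tool flow: the data-sheet point (400 mW at 1.8 V and 200 MHz), the target operating point, and the function names are all hypothetical.

```python
def effective_capacitance(p_datasheet_w, v_datasheet, f_datasheet_hz):
    """C_processor = P_data_sheet / (V_data_sheet^2 * freq_data_sheet)."""
    return p_datasheet_w / (v_datasheet ** 2 * f_datasheet_hz)


def fixed_activity_power(c_eff, vdd, freq_hz):
    """P_processor = C_processor * VDD^2 * freq."""
    return c_eff * vdd ** 2 * freq_hz


# Hypothetical data-sheet point: 400 mW at 1.8 V and 200 MHz.
c_proc = effective_capacitance(0.4, 1.8, 200e6)

# Estimate at a different (also hypothetical) operating point: 1.2 V, 100 MHz.
p_est = fixed_activity_power(c_proc, 1.2, 100e6)
print(f"C_processor = {c_proc:.3e} F, estimated power = {p_est * 1e3:.1f} mW")
```

Because the model folds all activity into one constant, the estimate cannot react to what the block is actually doing, which is why the slide rates it as low accuracy.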

10
Macro Modeling Method
  • Macro modeling method
  • Raise abstraction of power model by
    characterizing macro cell
  • Mainly used to reduce power model complexity in
    activity-sensitive power model generation
  • Macro cell
  • 32-bit adder, multiplier, MUX, etc.
  • Reduced computation complexity at the cost of
    accuracy
  • Macro cell characterization
  • Synthesize macro cell with basic cell library
  • Estimate the power of the macro cell with various
    testbenches
  • Generate power model and reduce its complexity
  • This concept can be used for raising abstraction
    of power model in hardware or software-level
    power estimation.

11
Macro Modeling Method
  • Power model of macro modeling method
  • Statistical activity model
  • LUT-based model
  • For each bus component, build 3-D LUT (with axes
    of Pin, Din, Dout)
  • Fill power value at each point (Pin, Din, Dout)
  • Requires a lot of memory space
  • Equation-based model
  • Build a polynomial approximating power
    consumption.
  • From a large number of input patterns, perform
    analysis to determine the coefficients.
  • Requires little memory space

Pin: average input signal probability
Din: average input switching activity
Dout: average output zero-delay switching activity
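
A minimal sketch of the LUT-based macro model, assuming a uniform grid on each axis and nearest-point lookup. The `characterized_power` function is a made-up stand-in for power values that would come from gate-level characterization runs; its coefficients are illustrative only.

```python
def make_axis(n_points):
    """Evenly spaced grid over [0, 1] for a probability or activity axis."""
    return [i / (n_points - 1) for i in range(n_points)]


def nearest_index(axis, value):
    """Index of the grid point closest to value."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - value))


def build_lut(axis, power_fn):
    """Fill a 3-D table with a power value at each (Pin, Din, Dout) point."""
    return [[[power_fn(p, d_in, d_out) for d_out in axis]
             for d_in in axis] for p in axis]


def lut_power(lut, axis, p_in, d_in, d_out):
    """Look up the power at the grid point nearest to (Pin, Din, Dout)."""
    i, j, k = (nearest_index(axis, v) for v in (p_in, d_in, d_out))
    return lut[i][j][k]


def characterized_power(p_in, d_in, d_out):
    """Hypothetical characterization result in mW (made-up coefficients)."""
    return 1.0 + 0.5 * p_in + 4.0 * d_in + 6.0 * d_out


AXIS = make_axis(5)
LUT = build_lut(AXIS, characterized_power)
print(lut_power(LUT, AXIS, 0.52, 0.24, 0.3), "mW from a table of",
      len(AXIS) ** 3, "entries")
```

With five points per axis the table already holds 125 entries, while an equation-based model of the same macro cell would store only a handful of coefficients. That is the memory trade-off the slide describes.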
12
System-level Power Estimation
  • Estimation speed and power model
  • Trade-off between estimation speed and accuracy
    of power model
  • Abstraction of power estimation
  • System-level power estimation
  • Software-level power estimation
  • Hardware-level power estimation
  • Behavior-level, RT-level, gate-level,
    circuit-level

Relative power results
Absolute power results
13
System-level Power Estimation
  • System-level power estimation
  • Relative value of power consumption is important.
  • Objective
  • Power profiling and design exploration
  • System-level power estimation is composed of
  • 1. Hardware power estimation
  • 2. Software power estimation in processor
  • 3. Bus power estimation

14
Hardware Power Estimation
  • RT-level power estimation
  • Dynamic simulation-based power estimation with
    coarse-grained net model from power macro model
    database and testbench

15
Hardware Power Estimation Tool
  • There are some commercial tools for hardware
    power estimation
  • RT-level
  • Synopsys Power Compiler™
  • Gate-level
  • Synopsys PrimePower™, Synopsys Power Compiler™
  • Circuit-level
  • SPICE, Synopsys PowerMill™, Cadence
    VoltageStorm™

16
Software Power Estimation
  • Estimation
  • A processor is too complex to estimate at RT level.
  • Power consumption is related to each instruction
    and instruction sequence.
  • Estimation method
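  • A power model is added to the ISS (instruction
    set simulator) for instruction-level power
    profiling.
  • $E = \sum_i B_i N_i + \sum_{i,j} O_{ij} N_{ij} + \sum_k S_k$
  • $B_i$: energy consumption of instruction i
  • $N_i$: number of executions of instruction i
  • $O_{ij}$: energy consumption when instruction i is
    followed by instruction j
  • $N_{ij}$: number of occurrences of the pair
    (instruction i, instruction j)
  • $S_k$: other instruction effects such as cache
    misses, pipeline stalls, etc.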
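
A minimal sketch of instruction-level energy profiling, assuming the usual formulation in which total energy is the sum of base costs (B_i times N_i), inter-instruction overheads (O_ij times N_ij), and other effects (S_k). The instruction names and energy values are hypothetical; a real model would be characterized per processor.

```python
from collections import Counter


def program_energy(trace, base, overhead, other_effects=()):
    """Sum B_i*N_i + O_ij*N_ij + S_k over an executed instruction trace."""
    n_i = Counter(trace)                    # N_i: executions of each instruction
    n_ij = Counter(zip(trace, trace[1:]))   # N_ij: consecutive instruction pairs
    e_base = sum(base[inst] * n for inst, n in n_i.items())
    e_pair = sum(overhead.get(pair, 0.0) * n for pair, n in n_ij.items())
    return e_base + e_pair + sum(other_effects)


# Hypothetical energies in nJ (illustrative values only).
BASE = {"add": 1.0, "mul": 3.0, "load": 2.0}
OVERHEAD = {("add", "mul"): 0.5, ("mul", "load"): 0.25}

trace = ["add", "mul", "load", "add", "mul"]
# One cache miss costing 4.0 nJ stands in for the S_k term.
energy = program_energy(trace, BASE, OVERHEAD, other_effects=[4.0])
print(energy, "nJ")
```

In an ISS the trace would be produced cycle by cycle, so the counters can be updated as instructions retire rather than after the run.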

17
Software Power Estimation
  • Power model
  • Instruction-level power model
  • Inter-instruction effect consideration
  • Dynamic effects (cache miss, branch prediction,
    etc.)
  • Power modeling method
  • 1) White-box approach
  • 2) Black-box approach

18
White-box Approach
  • Power model
  • Activity-sensitive model
  • Characterization
  • Use macro modeling method
  • Process
  • Run gate-level simulation
  • Find predominant parameter
  • Reduce power model complexity
  • Simple equation or reduced LUT
  • Make instruction-level power model
  • Accuracy is degraded and estimation speed is
    increased by reducing the power model complexity.

19
Black-box Approach
  • Characterization flow
  • Measurement
  • Characterization

[Figure: characterization flow. Measurement (current I(t), resistor R, voltage V) feeds characterization, which produces the instruction-level power model]
20
Black-box Approach
  • Measurement
  • By current measurement of real chip
  • Power model
  • Activity-sensitive power model
  • Statistical activity model
  • Characterization process
  • Current is measured on a real chip over
    multiple iterations of a subroutine
  • Compare the measured values with ISS results that
    include dynamic effects
  • Find a power equation that resembles the
    measured power graph
  • Determine the coefficients of the power equation
    by experimental iteration
  • → It is important to find the equation closest to
    the measurement results.

21
Black-box Approach
  • Measurement method
  • The program under measurement is isolated using
    an interrupt signal, NOP instructions, and
    processor wait states, both to find the exact
    measurement position and for synchronization.

R. Muresan and C. Gebotys, "Current
dynamics-based macro-model for power simulation
in a complex VLIW DSP processor," IEE
Proc.-Comput. Digit. Tech., 2002
22
Software Power Estimation Tool for Research
Purpose
  • SimplePower
  • Functional simulator
  • SimplePower core
  • based on SimpleScalar ISA
  • Power model
  • Activity sensitive power model
  • Direct simulation and profiling based on input
    transitions
  • Generate switch capacitance tables

Cycle-accurate activation information
Implementation-based signal generation
23
Software Power Estimation Tool for Research
Purpose
  • Wattch
  • Architecture-level power estimation
  • Functional simulator
  • SimpleScalar cycle-level performance simulator
  • Power model
  • Fixed activity power model
  • Categories
  • Array structure
  • Fully associative CAM
  • Combinational logic and wires
  • Clocking logic
  • Example: array structure
  • Power = C1 + C2 × A + C3 × B
  • A: bit line number, B: word line number
  • C1: diffusion cap., C2: gate cap.,
  • C3: metal cap.

24
Bus Power Estimation
  • Power consumed on the bus consists of two parts
  • Bus component power
  • Power consumed internally in the bus components
  • Arbiter, decoder, muxes
  • Interconnection power
  • Power consumed on the bus wires that connect the
    master and slave interfaces and the bus
    components
  • Address bus, data bus, control signals

25
Bus Component Power Estimation
  • At the system level, only structural information
    about the bus architecture can be obtained.
  • Bus interconnection
  • Bus width
  • Global bus power model is used for estimation
  • Characterized power model of bus component is in
    the global bus power model
  • Arbiter, decoder, multiplexer
  • Behavior, FSM

[Figure: processor, IP 1, IP 2, and memory connected to a bus; the bus is covered by the global bus power model]
26
Bus Component Characterization
  • Macro model
  • Pre-calculated power cubic
  • Useful to apply on system level power estimation.
  • Input parameter of the macro models
  • Data and address bus width, or the operating
    frequency
  • The number of masters and slaves
  • Input/output data characteristics
  • The switching activity, the probability of signal
    or the Hamming distance of two successive data

27
Bus Power Analysis
  • AMBA AHB bus power analysis
  • A standard for on-chip communication
  • Power analysis process
  • Bus structure decomposition
  • Arbiter
  • Decoder
  • Multiplexer
  • Build macro model of each component
  • Bus behavior decomposition and build power FSM
  • IDLE, READ, WRITE, and IDLE with handover
  • Monitor bus signal activity
  • Power analysis through power FSM

Global bus power model
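
A minimal sketch of a power FSM for the four bus states named above. It assumes a simplified per-cycle view of the bus (transfer active, write flag, master changed) rather than real AHB signal decoding, and the per-cycle energies are hypothetical placeholders for values that component characterization would supply.

```python
# Hypothetical energy per clock cycle in each bus state, in pJ.
# Real values would come from the arbiter, decoder, and multiplexer macro models.
ENERGY_PER_CYCLE_PJ = {
    "IDLE": 2.0,
    "IDLE_HANDOVER": 5.0,
    "READ": 12.0,
    "WRITE": 14.0,
}


def classify_cycle(transfer_active, is_write, master_changed):
    """Map monitored bus activity in one cycle to a power-FSM state."""
    if transfer_active:
        return "WRITE" if is_write else "READ"
    return "IDLE_HANDOVER" if master_changed else "IDLE"


def bus_energy_pj(cycles):
    """cycles: iterable of (transfer_active, is_write, master_changed)."""
    return sum(ENERGY_PER_CYCLE_PJ[classify_cycle(*c)] for c in cycles)


monitored = [
    (False, False, False),  # IDLE
    (False, False, True),   # IDLE with handover
    (True, False, False),   # READ
    (True, True, False),    # WRITE
]
print(bus_energy_pj(monitored), "pJ")
```

Attaching energy to states keeps the sketch short; a finer model could attach energy to state transitions instead.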
28
Interconnection Power Estimation
  • Power consumption on each wire
  • $P = \tfrac{1}{2} V_{dd}^2 C f \alpha$
  • $V_{dd}$: voltage swing between logic levels 1 and
    0.
  • $C$: capacitance of the wire.
  • $f$: clock frequency.
  • $\alpha$: switching activity.
  • $V_{dd}$ and $f$ are given as fixed values.
  • We need to find $C$ and $\alpha$.
  • $C$ can be obtained from the wire capacitance model.
  • $\alpha$ can be obtained from system-level simulation.
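
The per-wire formula (half of Vdd squared, times wire capacitance, clock frequency, and switching activity) can be applied as in the sketch below. The supply voltage, capacitance, frequency, and activity values are hypothetical.

```python
def wire_power(vdd, cap_f, freq_hz, activity):
    """P = 1/2 * Vdd^2 * C * f * a for a single wire."""
    return 0.5 * vdd ** 2 * cap_f * freq_hz * activity


def bus_wire_power(vdd, freq_hz, caps_f, activities):
    """Sum the per-wire power over all wires of a bus."""
    return sum(wire_power(vdd, c, freq_hz, a)
               for c, a in zip(caps_f, activities))


# Hypothetical wire: 1 pF at 1.8 V and 100 MHz, toggling in 25% of cycles.
p_wire = wire_power(1.8, 1e-12, 100e6, 0.25)
print(f"{p_wire * 1e6:.1f} uW per wire")

# A 4-bit bus with different (hypothetical) activities per wire.
p_bus = bus_wire_power(1.8, 100e6, [1e-12] * 4, [0.25, 0.25, 0.5, 0.0])
print(f"{p_bus * 1e6:.1f} uW for the bus")
```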

29
Interconnection Power Estimation
  • Wire capacitance model

  • $\varepsilon_{ox}$: permittivity of SiO2, a constant,
    $3.45 \times 10^{-13}$ F/cm
  • $x_{int}$: oxide thickness underneath the
    interconnect
  • $W$: interconnect width
  • $L$: interconnect length
  • $W$ and $x_{int}$ can be obtained from the technology
    parameters.
  • $L$ can be estimated from the area of the chip
  • (where $A$ is the area of the chip)

J. P. Uyemura, Circuit Design for CMOS VLSI,
Kluwer Academic Publishers, 1992.
30
Interconnection Power Estimation
  • Switching activity model
  • Switching activity can be obtained from bus
    transactions.
  • Bus model monitors bus transition and counts bus
    switching.
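
A minimal sketch of how a bus model might count switching while monitoring transactions: XOR successive bus values and tally the bits that changed on each wire. The bus width and the sample values are hypothetical.

```python
def count_toggles(words, width):
    """Per-wire toggle counts over a sequence of observed bus values."""
    toggles = [0] * width
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur                       # bits that changed
        for bit in range(width):
            toggles[bit] += (diff >> bit) & 1
    return toggles


def switching_activity(words, width):
    """Average toggles per wire per transfer (0.0 if fewer than two values)."""
    transfers = len(words) - 1
    if transfers <= 0:
        return [0.0] * width
    return [t / transfers for t in count_toggles(words, width)]


observed = [0b0000, 0b1111, 0b1110, 0b0001]    # hypothetical 4-bit bus trace
print(count_toggles(observed, 4))              # index 0 is the least significant wire
print(switching_activity(observed, 4))
```

The resulting per-wire activity is the value that the interconnection power formula needs alongside the wire capacitance.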


[Figure: system-level simulation in which the bus model connects CPU, memory, DSP, and IP and monitors bus transitions]
31
Bus Power Estimation
  • Power estimation
  • An application example is simulated in a
    system-level simulator.
  • The power estimator reports power consumption
    using the power models of the bus components and
    the interconnection.
  • Values monitored on bus transitions are used
    as the input of the power estimator.


[Figure: system-level simulator in which the bus model connects CPU, memory, DSP, and IP and feeds the power estimator]
32
System-level Power Optimization
33
Contents
  • Low Power System Implementation Techniques
  • Circuit level
  • Clock gating
  • MTCMOS
  • Multiple voltage supply
  • Architecture level
  • Memory Optimization
  • Bus Optimization
  • Dynamic Power Management in System Level
  • Introduction to DPM
  • Structure of DPM
  • Component-level DPM scheme
  • DPM Policy
  • Dynamic Voltage Scaling

34
Circuit Level Low Power System Implementation
Techniques
  • Clock gating
  • Most popular method for power reduction of clock
    signals
  • Need circuit to generate enable signal
  • Increases complexity of control logic
  • Timing critical to avoid clock glitches at AND
    gate output
  • Additional gate delay on clock signal

35
Circuit Level Low Power System Implementation
Techniques
  • MTCMOS
  • Low-VTH devices in the logic maintain performance
    when active.
  • A high-VTH current switch (header or footer) cuts
    off the leakage path during sleep.
  • The scheduling algorithm that controls the sleep
    signal is important.

[Figure: MTCMOS circuit. VDD, header switch (sleep), virtual VDD, logic (input, output), virtual GND, footer switch (sleep)]
36
Circuit Level Low Power System Implementation
Techniques
  • Multiple Voltage Supply
  • Slows down non-critical path with lower voltage
    supply
  • Two or more power grids
  • Need high-efficiency voltage converters for
    dynamic voltage scaling
  • Dynamic power scheduling algorithm is important.

[Figure: multiple voltage supply. Non-critical paths use the low voltage supply; the critical path needs high-speed logic on the high voltage supply]

37
Architecture Level Low Power System
Implementation Techniques
  • Memory Optimization
  • Code density optimization
  • Goal
  • Minimize program memory occupation to reduce the
    bandwidth of processor-memory communication
  • Approaches
  • Custom instruction sets
  • Object code compression

38
Memory Optimization
  • Custom instruction set
  • Instructions are shorter than those of the regular
    instruction set
  • Example: ARM Thumb code (16-bit instructions)
  • Needs a specific architecture to support 16-bit
    instructions

[Figure: Inst 1 to Inst 5 stored in 32-bit memory words, regular encoding vs. 16-bit encoding. In this case, 3/5 bandwidth reduction]
39
Memory Optimization
  • Object code compression
  • All instructions have the same size, but some or
    all of them are encoded before being stored in
    instruction memory.
  • An available solution for embedded processors
  • No specific architecture for supporting a
    different type of instruction is needed.
  • Exploits the small subset of instructions used by
    firmware code
  • Approaches
  • Full code compression
  • Selective code compression

40
Memory Optimization
  • Full code compression
  • Replace all instructions with binary patterns of
    minimum width.
  • log2 N, where N is the number of instructions
  • Advantage
  • Memory bandwidth for instruction is decreased.
  • Disadvantage
  • Size of IDT may be very large because N is not
    small.
  • log2 N may not be a multiple of 8.

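[Figure: instruction fetch without and with full code compression. Without: the core sends an address to memory and receives a k-bit instruction. With: memory returns a log2 N-bit code and the IDT expands it to a k-bit instruction]
IDT: Instruction Decompression Table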
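
A minimal sketch of full code compression, assuming the program is a list of instruction words and the IDT is simply the sorted set of distinct words. The 32-bit values below are hypothetical placeholders, not decoded instructions.

```python
def full_compress(program):
    """Replace every instruction word with an index into the IDT."""
    idt = sorted(set(program))                      # Instruction Decompression Table
    index_of = {word: i for i, word in enumerate(idt)}
    # Minimum code width: ceil(log2 N) bits, and at least 1 bit.
    code_bits = max(1, (len(idt) - 1).bit_length())
    return [index_of[word] for word in program], idt, code_bits


def decompress(codes, idt):
    """Expand each code back to the original instruction word."""
    return [idt[c] for c in codes]


# Hypothetical 32-bit instruction words.
PROGRAM = [0xE3A00001, 0xE2800001, 0xE3A00001,
           0xE1A0F00E, 0xE2800001, 0xE3A00001]

codes, idt, code_bits = full_compress(PROGRAM)
print(f"N = {len(idt)}, code width = {code_bits} bits")
print(f"fetched bits: {len(PROGRAM) * 32} before, {len(codes) * code_bits} after")
```

The IDT itself still stores N full-width words, so the table grows with the number of distinct instructions, which is the disadvantage noted on the slide.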
41
Memory Optimization
  • Selective Code Compression
  • Most of a program trace is covered by a small
    subset of instructions.
  • Compress only the subset of instructions that
    maximizes program coverage
  • The program is a mix of compressed and
    uncompressed instructions.

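[Figure: selective code compression datapath. Blocks: memory, buffer, IDT, controller, core. Labels: address, k-bit instruction words, 8-bit compressed codes]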
42
Memory Optimization
  • Advantage
  • Size of IDT is fixed and limited.
  • Instruction fetching/decompression logic has
    reduced complexity.
  • Disadvantage
  • Requires a controller to handle instruction
    fetching
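
A minimal sketch of selective code compression, assuming a fixed-size IDT filled with the most frequent instruction words and an 8-bit index for compressed entries. The tuple-based stream format, the tiny IDT size used in the example, and the word values are all hypothetical.

```python
from collections import Counter


def build_idt(program, idt_entries=256):
    """Pick the most frequent instruction words for a fixed-size IDT."""
    return [word for word, _ in Counter(program).most_common(idt_entries)]


def selective_compress(program, idt):
    """Emit ('C', index) for IDT hits and ('U', word) for everything else."""
    index_of = {word: i for i, word in enumerate(idt)}
    return [("C", index_of[w]) if w in index_of else ("U", w) for w in program]


def selective_decompress(stream, idt):
    """Restore the original instruction words."""
    return [idt[value] if tag == "C" else value for tag, value in stream]


def fetched_bits(stream, index_bits=8, word_bits=32):
    """Bits fetched; ignores the flag that marks an entry as compressed."""
    return sum(index_bits if tag == "C" else word_bits for tag, _ in stream)


program = [10, 10, 10, 20, 20, 30]          # hypothetical instruction words
idt = build_idt(program, idt_entries=2)     # tiny IDT to make the effect visible
stream = selective_compress(program, idt)
print(stream, fetched_bits(stream), "bits vs", len(program) * 32)
```

The controller mentioned on the slide corresponds to the tag check here: it decides per fetch whether the entry goes through the IDT or straight to the core.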

43
Memory Optimization
  • Data density optimization
  • Same principle as code density optimization
  • For the purpose of reducing memory traffic
  • dynamic size of the data-set
  • More complex than code compression, because both
    compression and decompression are required
  • Hardware compression/decompression unit needed
  • Design trade-off between speed and power

44
Architecture Level Low Power System
Implementation Techniques
  • Bus power optimization
  • A large amount of power is dissipated in data
    communication over heavily-loaded on-chip or
    off-chip busses.
  • Reduce switching activity on busses via signal
    encoding for power saving
  • Approaches
  • Bus-invert coding
  • Gray code addressing

$P_{Bus} = n \times C \times V_{dd}^2 \times freq \times activity$, for an n-bit bus
45
Bus Optimization
  • Bus-invert coding
  • Add redundant line INV to bus
  • When INV = 0
  • Data is equal to the remaining bus lines
  • When INV = 1
  • Data is the complement of the remaining bus lines
  • At each cycle, decide whether sending the true or
    the complemented signal leads to fewer toggles

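[Figure: bus-invert coding. Source data passes through polarity decision logic onto the data bus together with the INV signal; the receiver uses INV to recover the received data]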
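
A minimal sketch of bus-invert coding. It assumes the data lines and INV all start at 0, and that the polarity decision compares only the data-line toggles against half the bus width; the 8-bit sample values are hypothetical.

```python
def bus_invert_encode(words, width):
    """Return (bus_value, inv) pairs; the bus is assumed to start at 0."""
    mask = (1 << width) - 1
    prev_bus, encoded = 0, []
    for w in words:
        toggles = bin((w ^ prev_bus) & mask).count("1")
        inv = 1 if toggles > width / 2 else 0       # invert if most lines would flip
        bus = (~w & mask) if inv else (w & mask)
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width):
    """Undo the inversion on the receiver side."""
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]


def data_line_toggles(values, width):
    """Total toggles on the data lines, starting from an all-zero bus."""
    mask = (1 << width) - 1
    total, prev = 0, 0
    for v in values:
        total += bin((v ^ prev) & mask).count("1")
        prev = v
    return total


words = [0x00, 0xFF, 0x0F, 0xF0]
encoded = bus_invert_encode(words, 8)
print(encoded)
print(data_line_toggles(words, 8), "toggles unencoded,",
      data_line_toggles([b for b, _ in encoded], 8), "on the encoded data lines")
```

The INV line toggles too, so the net saving is slightly smaller than the data-line count alone suggests.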
46
Bus Optimization
  • Gray code addressing
  • Most instruction addresses are consecutive
  • Use Gray code to address
  • Word-oriented machines
  • Increments by 4 (32 bit) or by 8 (64bit)
  • Modify Gray code to switch 1 bit per increment
  • Gray code adder needed for jump

Dec  Gray(i=1)  Gray(i=4)  Gray(i=8)
0    0000       0000       0000
1    0001       0001       0001
2    0011       0011       0011
3    0010       0010       0010
4    0110       0100       0110
5    0111       0101       0111
6    0101       0111       0101
7    0100       0110       0100
8    1100       1100       1000
i: increment
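
One construction that reproduces the table above is to Gray-code the address bits above the increment boundary and the bits below it separately. The sketch below assumes the increment is a power of two; the function names are illustrative.

```python
def gray(n):
    """Standard binary-to-Gray conversion."""
    return n ^ (n >> 1)


def modified_gray(addr, increment):
    """Gray-code the bits above and below the increment boundary separately.

    Assumes increment is a power of two (1, 4, 8, ...).
    """
    k = increment.bit_length() - 1          # number of low-order bits
    low = addr & (increment - 1)
    return (gray(addr >> k) << k) | gray(low)


def toggles(a, b):
    """Number of address lines that differ between two codes."""
    return bin(a ^ b).count("1")


# Consecutive word addresses on a 32-bit machine step by 4.
addresses = [0, 4, 8, 12, 16]
codes = [modified_gray(a, 4) for a in addresses]
print([format(c, "05b") for c in codes])
print([toggles(x, y) for x, y in zip(codes, codes[1:])])
```

Each step between consecutive word addresses flips exactly one line, whereas plain binary would flip several lines whenever a carry propagates.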
47
Introduction to DPM
  • Dynamic Power Management (DPM)
  • DPM controls the power consumption of components
    based on their usage.
  • Prediction of component usage is essential.
  • Methods
  • Shutdown (clock gating, power gating)
  • Slowdown (frequency scaling, voltage scaling, VTH
    scaling)

[Figure: shutdown vs. slowdown, plotted as f and VDD over time. Labels: idle, VDD, 0.6 VDD, T/2, T]
48
Structure of DPM
  • Levels of embodiments of DPM
  • Component level
  • Circuit, Block
  • Power mode
  • System level
  • Policy
  • The procedure which controls the power level of
    each module in a system

[Figure: DPM structure. The system-level policy exchanges power modes and requests with Block 1 to Block n, each of which contains circuits]

49
Component Level DPM Scheme
  • Circuit level
  • Clock off by clock gating
  • Power off by footer/header of MTCMOS
  • Multiple voltage supply
  • Block level
  • Power off by shutdown of power supply to IPs
  • When the power-off patterns of two blocks are
    similar, shut them down together.

[Figure: IP 1 and IP 2 between a shared virtual VDD and virtual GND, which are switched from the VDD source and GND source]
50
Component Level DPM Scheme
  • Power mode
  • Each state is a combination of enabled DPM
    techniques.
  • Example: a system that uses clock gating and
    block shutdown
  • Transitions between modes of operation have a
    cost.

Power mode   Clock gating   Block shutdown
Run          disabled       disabled
Idle         enabled        disabled
Sleep        enabled        enabled

[Figure: power state machine for the StrongARM processor. States: Run (P = 400 mW), Idle (P = 50 mW, wait for interrupt), Sleep (P = 0.16 mW, wait for wake-up event). Transition times shown: 10 µs, 10 µs, 90 µs, 90 µs, 160 ms]
SA-1100 Microprocessor Technical Reference Manual,
Intel, 1998
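
Transition cost is what makes the choice of power mode non-trivial. The sketch below estimates how long an idle period must be before entering Sleep beats staying in Idle. The three power values are the ones on the slide; the mapping of 90 µs to the Run-to-Sleep transition and 160 ms to the Sleep-to-Run transition, and the assumption that transitions consume Run power, are assumptions made for illustration.

```python
P_RUN, P_IDLE, P_SLEEP = 0.400, 0.050, 0.00016   # W, from the slide
T_DOWN, T_UP = 90e-6, 160e-3                     # s, assumed Run->Sleep and Sleep->Run
P_TRANSITION = P_RUN                             # assumption: transitions cost Run power


def energy_stay_idle(t_idle):
    """Energy spent waiting in the Idle mode for t_idle seconds."""
    return P_IDLE * t_idle


def energy_sleep(t_idle):
    """Energy spent going to Sleep and back within t_idle seconds."""
    t_tr = T_DOWN + T_UP
    if t_idle < t_tr:
        return float("inf")      # not enough time to go down and come back
    return P_TRANSITION * t_tr + P_SLEEP * (t_idle - t_tr)


def break_even_time():
    """Idle length at which Sleep and Idle cost the same energy."""
    t_tr = T_DOWN + T_UP
    # Solve P_IDLE * T = P_TRANSITION * t_tr + P_SLEEP * (T - t_tr) for T.
    return max(t_tr, (P_TRANSITION - P_SLEEP) * t_tr / (P_IDLE - P_SLEEP))


print(f"break-even idle time: {break_even_time():.2f} s")
```

Under these assumptions the long wake-up time dominates, so Sleep only pays off for idle periods well above a second. This is why a DPM policy has to predict idle length rather than shut down on every idle cycle.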
51
DPM Policy
  • Predictive technique
  • Uses a regression equation based on previous On
    and Off times of the component to estimate the
    next turn on time.
  • Limitation
  • It cannot handle components with more than two
    power modes.

[Figure: predictive power management scheme and pre-wakeup scheme. The component moves between Running (R) and Sleep (S) through go-to-sleep and wake-up transitions; the timelines show sequences of R, I, E, S, and W intervals and the delay caused by waking up]

C.H. Hwang et al., "A predictive system shutdown
method for energy saving of event-driven
computation," Proc. Int. Conf. on Computer Aided
Design, pages 28-32, Nov. 1997
M. Srivastava et al., "Predictive system shutdown
and other architectural techniques for energy
efficient programmable computation," IEEE TVLSI,
Vol. 4, No. 1, 1996
I: idle state; E: entering state; W: waking-up
state
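
A minimal sketch of a predictive shutdown policy. Note the substitution: instead of the regression equation mentioned on the slide, it uses an exponential average of past idle periods as a simpler stand-in predictor. The smoothing factor, break-even time, and idle-period history are hypothetical.

```python
def predict_next_idle(prev_prediction, last_idle, a=0.5):
    """Exponential average: blend the last observed idle time with the old prediction."""
    return a * last_idle + (1 - a) * prev_prediction


def simulate(idle_periods, break_even, a=0.5, initial=0.0):
    """For each idle period, decide up front whether to go to sleep.

    Returns a list of booleans: True means the policy chose to sleep.
    """
    prediction, decisions = initial, []
    for actual in idle_periods:
        decisions.append(prediction >= break_even)
        prediction = predict_next_idle(prediction, actual, a)
    return decisions


history = [4.0, 4.0, 0.5, 4.0]      # hypothetical idle periods in seconds
decisions = simulate(history, break_even=1.5)
print(decisions)
```

The third decision is a misprediction: the policy sleeps, but the idle period lasts only 0.5 s. Such cases cost both energy and wake-up delay.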
52
DPM Policy
  • Markov process
  • A Markov process uses the previous state and
    pre-characterized probabilities to choose the
    next state.
  • Power management optimization has been studied
    within the framework of Markov processes.
  • When the system is modeled as a Markov chain
  • It can model the uncertainty in system power
    consumption and response times.
  • It can model complex systems with many power
    states, buffers, queues.
  • It can compute power management policies that are
    globally optimum.

G.A. Paleologo et al., "Policy optimization for
dynamic power management," Proc. DAC, 1998
53
DPM Policy
  • Structure of stochastic DPM
  • FSM of each module

[Figure: structure of stochastic DPM. Blocks: power manager, service requestor, service provider, queue. Signals: observation, command, request]
54
Dynamic Voltage Scaling
  • DVS
  • Reducing VDD is the single most effective way to
    reduce power consumption.
  • Reducing VDD is limited by the worst-case
    condition.
  • Performance requirement varies with time.
  • Solution
  • Slowdown: perform the job with just-in-time
    performance
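
A minimal sketch of why just-in-time slowdown saves energy. It assumes dynamic energy per cycle is proportional to the square of VDD, ignores leakage and idle power, and assumes half the clock frequency is reachable at 0.6 VDD (the value shown in the shutdown vs. slowdown figure earlier in the deck; the real voltage-frequency relation depends on the process). The capacitance, voltage, and cycle count are hypothetical.

```python
def task_energy(c_eff_f, vdd, cycles):
    """Dynamic energy: C_eff * VDD^2 per cycle; leakage and idle power ignored."""
    return c_eff_f * vdd ** 2 * cycles


C_EFF = 1e-9          # F, hypothetical effective switched capacitance
VDD = 1.8             # V, hypothetical nominal supply
CYCLES = 1_000_000    # cycles the job needs, independent of clock speed

# Shutdown: run at full speed and nominal VDD for T/2, then idle.
e_shutdown = task_energy(C_EFF, VDD, CYCLES)

# Slowdown: run at half speed and 0.6 VDD, finishing just in time at T.
e_slowdown = task_energy(C_EFF, 0.6 * VDD, CYCLES)

ratio = e_slowdown / e_shutdown
print(f"slowdown / shutdown energy = {ratio:.2f}")
```

The cycle count is the same in both cases, so the saving comes entirely from the quadratic dependence on supply voltage.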

55
DVS Applied Processor
  • Transition overhead
  • Max 70 µs for a 5 → 80 MHz transition
  • Max 4 µJ for a 5 → 80 MHz transition

[Figure: DVS processor system block diagram. Labels: CPU (ARM core, 16KB cache, write buffer, system co-processor, bus interface), system bus, 64KB SRAM ... 0.5MB, I/O chip, VCO, regulator, Fdesired, VDD, VBat]
T.D. Burd et al., "A dynamic voltage scaled
microprocessor system," IEEE JSSC, Nov. 2000
56
DPM using DVS on SoC
  • Divide SoC into 4 power domains
  • Persistent 3.3 V: I/O drivers and receivers
  • Persistent 1.0 V: PLL
  • Persistent 1.8 V: RTC, sleep management
  • DVS: 1.0 V to 1.8 V (10 mV/µs)

K.J. Nowka et al., "A 32-bit PowerPC
System-on-a-Chip with support for dynamic voltage
scaling and dynamic frequency scaling," IEEE
JSSC, Nov. 2002