Power Emulation: A New Paradigm for Power Estimation - PowerPoint PPT Presentation

1
Power Emulation: A New Paradigm for Power Estimation
  • Joel Coburn, Srivaths Ravi, Anand Raghunathan

NEC Laboratories America, Inc., 4 Independence Way, Princeton, NJ 08540
2
Power Emulation
Power estimation flow
  • Circuit simulation
  • Component input statistics
  • Power model evaluation for each circuit component
  • Aggregate power consumption of individual components
  • Power profile
3
Outline
  • Conventional power estimation
  • Motivation for power emulation
  • Power emulation techniques
  • Proposed methodology
  • Results

4
How Power Estimation is Addressed
Levels of the design flow, with the power models and analysis used at each
  • System-level design: power models for system-level components,
    system-level power analysis
  • High-level synthesis, RTL optimizations: power models for macroblocks
    and control logic, architecture-level power analysis
  • Logic synthesis: power models for gates, cells, and nets, logic-level
    power analysis
  • Transistor-level/layout synthesis: transistor models, wire models,
    transistor-level power analysis
  • Layout

[Plot: estimation time (seconds to days) vs. accuracy (axis marked 5% and 30%) for the algorithm, system, RTL, logic, and transistor levels; annotation: good speed/accuracy trade-off]
5
RTL Power Estimation
  • Analytical techniques
  • Correlate power consumption to simple measures of
    design complexity
  • Use gate count and user-specified activity
    factors (Glaser-91)
  • Useful for regular structures (memories and clock
    networks) (Liu-94)
  • Better accuracy with information-theoretic
    approaches (Marculescu-95, Nemani-99)
  • Characterization-based macromodeling
  • Characterize a lower-level implementation of
    an RTL block
  • Characterize as a constant power value
    (Powell-90)
  • Characterize as a function of input signal
    statistics (Landman-95, Raghunathan-96, Mehta-96,
    Benini-96, Gupta-00)
  • Address training data bias, fitting data
    errors (Bogliolo-98, Corgnati-99)
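
To make characterization as a function of input signal statistics concrete, here is a minimal sketch. It assumes a single statistic (average switching activity) and a linear model fitted by least squares; the data points, units, and function names are hypothetical and are not taken from the cited work.

```python
def fit_linear_macromodel(activities, powers):
    """Least-squares fit of power = c0 + c1 * activity (assumed model form)."""
    n = len(activities)
    mean_a = sum(activities) / n
    mean_p = sum(powers) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(activities, powers))
    var = sum((a - mean_a) ** 2 for a in activities)
    c1 = cov / var
    c0 = mean_p - c1 * mean_a
    return c0, c1


def estimate_power(model, activity):
    """Evaluate the fitted macromodel for a new activity value."""
    c0, c1 = model
    return c0 + c1 * activity


# Hypothetical characterization data from a lower-level simulation of one block.
activities = [0.1, 0.2, 0.3, 0.4, 0.5]
powers_mw = [1.2, 2.2, 3.2, 4.2, 5.2]

model = fit_linear_macromodel(activities, powers_mw)
estimate = estimate_power(model, 0.25)
```

A constant-power macromodel (Powell-90) corresponds to dropping the activity term and keeping only the mean of the characterization data.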

6
RTL Power Estimation
  • Fast synthesis based power estimation
  • Map design through low-effort synthesis to a netlist for
    power estimation (Llopis-98)
  • Speedup techniques
  • Statistical sampling (Ravi-03)
  • Circuit partitioning for parallel mixed-level
    simulation (Chinosi-99)
  • Commercial/In-house Tools
  • PowerTheater (Sequence Design)
  • PowerChecker (BullDAST)
  • CYBER RTL Power Estimation (NEC)
  • Orinoco (ChipVision)

7
Power Emulation Technology
  • RTL power estimation is too slow for large
    designs
  • New paradigm for power estimation!
  • Use emulation to accelerate RTL power estimation

[Figure: host PC and FPGA platform, with testbench, outputs, and power; chart of time to decode 4 video frames]
2 to 3 orders of magnitude speedup possible!
8
NEC's RTL Power Estimation Flow
[Figure: characterization-based macromodeling, shown as two linked flows]
  • Characterization flow: RTL library, synthesis conditions, synthesis and
    P&R, post-layout netlists, power characterization, power macro-model
    database, power model library generator
  • Simulatable power model libraries: Powerlib.vhd, Powerlib.v, Powerlib.c
  • NEC's C-based design flow (CYBER): behavior, behavioral synthesis, RTL
    netlist, power model inference and estimation code generation, enhanced
    RTL, testbench/stimuli, RTL simulation, power profiles
  • Other labels: input, output, power
9
Enhanced RTL Design
  • Components for power estimation
  • Power models for every component: monitor
    component I/O values and compute power
  • Power strobe generator: trigger power models
    (statistical sampling employed for improved
    efficiency since RTL simulation can also be slow
    for large designs)
  • Power aggregator: compute total power consumption

[Figure: example RTL design with a controller FSM, functional units (labels include 1, -1, >> 1), registers (reg_c0, reg_first, reg_last, reg_out, reg_c1, reg_mid), buses 1 to 3, and signals first, last, value, data, addr, out. A power model is attached to each component; a power strobe generator triggers the power models and a power aggregator produces total power.]
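
The interplay of the power strobe generator and the power aggregator can be sketched in software. This is only an illustration under stated assumptions: each component's power model is reduced to a hypothetical energy-per-toggle coefficient, the strobe fires every `strobe_period` cycles, and the trace values are invented.

```python
def average_power(trace, coeff, strobe_period=1):
    """Average per-cycle power over the cycles on which the strobe fires.

    trace: list of dicts mapping component name -> bit toggles in that cycle.
    coeff: dict mapping component name -> energy per toggle (assumed model).
    """
    total = 0.0
    samples = 0
    for cycle, toggles in enumerate(trace):
        if cycle % strobe_period != 0:
            continue  # power strobe generator: models are idle this cycle
        # Power aggregator: sum the outputs of all component power models.
        total += sum(coeff[name] * count for name, count in toggles.items())
        samples += 1
    return total / samples


# Hypothetical four-cycle trace for two components.
coeff = {"adder": 2.0, "reg": 1.0}
trace = [
    {"adder": 1, "reg": 2},
    {"adder": 3, "reg": 0},
    {"adder": 1, "reg": 2},
    {"adder": 3, "reg": 0},
]

every_cycle = average_power(trace, coeff, strobe_period=1)
every_other = average_power(trace, coeff, strobe_period=2)
```

Strobing every other cycle does half the work but, on this trace, only ever sees the low-activity cycles, which is the kind of sampling error the slide trades against efficiency.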
10
Power Model Architecture
[Figure: component inputs/outputs, queues, transition count function, power summation]
  • What does the power model contain?
  • Queues to store present and past values
  • Transition count function is a simple computation
  • Coefficients aggregated based on output of
    transition count function
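
The three ingredients listed above can be sketched as follows. The sketch assumes the transition count is the Hamming distance between the past and present I/O values and that coefficients are looked up by that count; the class name, the coefficient table, and the values are hypothetical.

```python
from collections import deque


class PowerModel:
    """Software sketch of one power model: queue, transition count, summation."""

    def __init__(self, coefficients, width):
        # coefficients[t] is the energy charged when t bits toggle (assumed form).
        self.coefficients = coefficients
        self.mask = (1 << width) - 1
        # Two-entry queue holding the past and present I/O values.
        self.queue = deque([0], maxlen=2)
        self.energy = 0.0

    def strobe(self, io_value):
        """Record a new I/O value and accumulate the matching coefficient."""
        self.queue.append(io_value & self.mask)
        previous, present = self.queue[0], self.queue[-1]
        transitions = bin(previous ^ present).count("1")
        self.energy += self.coefficients[transitions]
        return transitions


model = PowerModel(coefficients=[0.0, 1.0, 2.5, 4.0, 6.0], width=4)
counts = [model.strobe(value) for value in (0b1010, 0b1001, 0b1001)]
```

Each call to `strobe` corresponds to one trigger from the power strobe generator; `model.energy` is the running sum that would be handed to the aggregator.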

11
Power Emulation Challenges
  • Size of design enhanced with power models is very
    large!
  • Size increases an average of 18.2X for MPEG4
    sub-designs
  • Enhanced version exceeds capacity of largest
    Xilinx Virtex-II FPGA

[Bar chart: area increase of MPEG4 sub-designs (bars labeled 20.4X, 20.6X, 17.7X, 16.3X, 17.5X, 14.7X, 15.0X) against the capacity of the XC2V8000 FPGA]
Need to reduce the area requirements of power models!
12
Power Emulation Challenges
  • Why area increase?
  • Resource-hungry power models used for every RTL
    component in the design
  • How to reduce area?
  • Optimize the number of power models used
  • Make the implementations of power models
    resource-efficient
  • Catch: ensure minimum loss of estimation accuracy
    due to area reduction techniques

13
Area Optimization Techniques
  • Clustering of power models
  • Single power model servicing multiple components
  • Changing component granularity
  • Constructing power models for complex components
    that subsume several smaller components
  • Exploiting correlation
  • Using power correlation between components to
    reduce the number of monitored components
  • Optimizing power model implementations
  • Multi-cycling additions in power model
    computations
  • Using FPGA block memories for efficient storage
    of power model coefficients

14
Clustering
  • Construct a generic power model that is
    responsible for a cluster of (say, M) components
  • Hardware must be added to support multiple
    components
  • Multiplexers for component I/Os and coefficient
    addresses
  • In a given cycle, a generic power model can
    monitor only one component
  • Similar to sampling previously used for power
    estimation
  • Can cause estimation error
  • But the maximum number of I/O pins from serviced
    components determines power model bitwidth
  • Extra bits are wasted for some components
  • Zero padding used for coefficients and I/O bits
    of components with bitwidth < N
  • Clustering saves area because M dedicated power
    models are collapsed into a single power model
  • Queues and adders are shared

[Figure: components Comp_1, Comp_2, ..., Comp_N feed a multiplexer (component selector) whose select signal sel_comp (log2 M bits) changes to monitor a different component. The selected input bits (N = max(I/O) pins among serviced components) enter an N-bit power model triggered by POW_STROBE, which outputs power. A coefficient ROM (addr, dout, clk) holds the coefficients, with hardwired component addresses (e.g., Comp_1 0000, Comp_2 0001, ..., Comp_N 1011) and an ultra-wide memory for maximum coefficient bandwidth. Other labels: 32, M1, K, Ncoeff_width, CLOCK.]
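
The time-multiplexed monitoring behind clustering can be illustrated with a small sketch. It assumes round-robin selection (one component per cycle), a hypothetical energy-per-toggle coefficient per component, and scaling of each component's sampled energy up to the full run; zero padding and the coefficient ROM are not modeled.

```python
def toggles(trace, cycle, name):
    """Bit transitions of one component between the previous and this cycle."""
    return bin(trace[cycle - 1][name] ^ trace[cycle][name]).count("1")


def dedicated_estimate(trace, coeff):
    """Reference: one dedicated power model per component, every cycle."""
    return sum(coeff[n] * toggles(trace, c, n)
               for c in range(1, len(trace)) for n in coeff)


def clustered_estimate(trace, coeff):
    """One generic power model shared by all components, one per cycle."""
    names = sorted(coeff)
    sampled = {n: 0.0 for n in names}
    counts = {n: 0 for n in names}
    for cycle in range(1, len(trace)):
        name = names[cycle % len(names)]  # sel_comp picks the monitored component
        sampled[name] += coeff[name] * toggles(trace, cycle, name)
        counts[name] += 1
    cycles = len(trace) - 1
    # Scale each component's sampled energy to the whole run.
    return sum(sampled[n] * cycles / counts[n] for n in names if counts[n])


# Hypothetical trace of I/O values for two components over five cycles.
coeff = {"a": 1.0, "b": 2.0}
trace = [
    {"a": 0b00, "b": 0b0},
    {"a": 0b11, "b": 0b1},
    {"a": 0b00, "b": 0b0},
    {"a": 0b11, "b": 0b1},
    {"a": 0b00, "b": 0b0},
]

exact = dedicated_estimate(trace, coeff)
shared = clustered_estimate(trace, coeff)
```

On this perfectly regular trace the two estimates agree; with bursty activity the shared model can miss or over-weight bursts, which is the estimation error the slide refers to.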
15
Clustering Area/Accuracy Trade-offs
  • Estimate error increases with larger clusters
  • Area first decreases and then increases with more
    clustering
  • Multiplexer and select logic area dominate for
    large clusters

[Plot: area vs. error trade-offs (area, estimate error)]

16
Changing Component Granularity
  • Hardware overhead for power models depends on
    granularity of RTL components
  • High overhead for small granularity components
    (e.g., logic gates)
  • Increase component granularity for power modeling
  • Estimation error increases, since internal
    signals are not visible to the power model

Error vs. component granularity
17
Exploiting Correlations Between Component Power
Consumptions
  • Given two components x and y, approximate
    Power(y) as f( Power(x) )
  • Power correlations occur due to internal circuit
    structure, e.g., fanout, logic replication
  • No power model needed for y
  • f( ) should be a simple function

[Plots: non-linear correlation, strong linear correlation, weak linear correlation]
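
As a rough illustration of replacing the power model of y with a simple function of Power(x), the sketch below measures the linear correlation between two power profiles and, when it is strong, derives a single scale factor. The choice of f as a pure scaling, the profiles, and the function names are assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two power profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def scale_factor(xs, ys):
    """Least-squares s such that Power(y) is approximated by s * Power(x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical per-interval power profiles of two components.
power_x = [1.0, 2.0, 3.0, 4.0]
power_y = [2.0, 4.0, 6.0, 8.0]

r = pearson(power_x, power_y)
s = scale_factor(power_x, power_y)
approx_y = [s * x for x in power_x]  # no power model needed for y
```

A weak or non-linear correlation would show up as a low `r`, in which case y keeps its own power model.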
18
Power Emulation Flow
[Figure: power emulation flow, with steps numbered 0 to 4. Inputs: RTL design, testbench, power model library. Boxes: resource sharing; optimized power model library; power model inference and estimation code generation; optimize for area and minimize error; FPGA synthesis, P&R; download to FPGA and execute. Output: power profile.]
19
Optimization Methodology
[Flowchart boxes]
  • Inputs: enhanced RTL design, testbench, power model library,
    parameters target_area and k
  • Apply hierarchical clustering with area-based objective function to
    determine component groups to generic power model mappings
    (target_area, k in; k valid solutions out)
  • Short RTL simulation to generate component power profiles
  • Determine optimum sampling rate for each component
  • Compute mean, variance for component power, and inter-component power
    correlation factor
  • Multi-way component swapping to minimize undersampling (for_all_k)
  • Use inter-component power profile correlations to collapse component
    list
  • Choose solution with the lowest undersampling
  • (i) Select component sets suitable for higher granularity power models
    (ii) Update power model library, design
  • Output: power emulation ready RTL
20
Experimental Procedure
[Figure: experimental procedure]
  • Behavioral synthesis (CYBER): design, RTL description
  • Synthesis transformation, area optimizations: power-enhanced RTL,
    power emulation-ready RTL, RTL power library
  • Create ROM init files; build coefficient ROMs (Xilinx CoreGenerator)
  • RTL power estimation: Sequence Design PowerTheater, ModelSim
  • Power emulation: FPGA synthesis (Synplify Pro); place & route, FPGA
    configuration (Xilinx ISE), targeting Xilinx Virtex-II FPGA
21
Experimental Results
  • Evaluation on various designs, each compared with
    CYBER-RTL and PowerTheater

Nearly 500X speedup possible!
  • Up to 500X speedup compared to RTL power
    estimation
  • 3% loss of accuracy on average
  • Area overheads lowered to 3X

22
Conclusions
  • Power Emulation is a promising way to perform
    fast power estimation
  • Extends capabilities of current power estimation
    tools

23
Thank you! Questions?
24
An Example Correlation Distribution
  • Components exhibiting a correlation coefficient
    above a threshold value (say 0.5) can be grouped
    together and replaced with one scaled power model
  • RTL component example from Bubble Sort design

Contributes 1.04% of total estimated power
36 components with correlation coefficient > 0.5
25
Resource Sharing
  • Estimation error decreases with more adders
  • Area first decreases and then increases with more
    adders
  • Scheduling overhead dramatically affects area;
    adders contribute less area because of dedicated
    carry-chains in FPGA architecture

[Plot: area vs. error trade-offs (area, estimate error)]
26
Hierarchical Clustering
  • Inputs: list of components, target area
    constraint
  • Outputs: k valid solutions that meet the target
    area constraint
  • Initially, every component forms a cluster with
    its own power model
  • Look at pairwise cost of combining two clusters
    into a single cluster
  • Choose the pair of clusters that combine to
    result in the best area savings
  • Update the bitwidth of the resultant cluster
  • Repeat the previous steps until k solutions meet
    the target area constraint or all components are
    in a single cluster
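
The steps above can be sketched as a greedy agglomerative loop. The area model in `cluster_area` is a hypothetical stand-in (a cost proportional to the cluster's maximum bitwidth plus a multiplexer term that grows with cluster size); the slides do not give the actual cost function, and the component names and widths are invented.

```python
from itertools import combinations


def cluster_area(widths):
    """Assumed area model: shared N-bit power model plus multiplexer overhead."""
    n = max(widths)   # bitwidth of the resulting generic power model
    m = len(widths)   # number of components serviced
    return 10 * n + 2 * n * (m - 1)


def hierarchical_clustering(widths, target_area, k):
    """Merge clusters greedily; return up to k solutions meeting target_area."""
    def area(cluster):
        return cluster_area([widths[name] for name in cluster])

    def savings(pair):
        return area(pair[0]) + area(pair[1]) - area(pair[0] + pair[1])

    clusters = [(name,) for name in widths]  # every component starts alone
    solutions = []
    while True:
        total = sum(area(c) for c in clusters)
        if total <= target_area:
            solutions.append((total, list(clusters)))
        if len(solutions) == k or len(clusters) == 1:
            return solutions
        # Pick the pair whose merge gives the best area savings.
        a, b = max(combinations(clusters, 2), key=savings)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]


# Hypothetical component I/O bitwidths.
widths = {"add": 8, "sub": 8, "mul": 16, "reg": 4}
solutions = hierarchical_clustering(widths, target_area=300, k=2)
areas = [total for total, _ in solutions]
```

Because the multiplexer term grows with cluster size, later merges save less and can eventually cost area, consistent with the area trend described on the clustering trade-offs slide.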

27
Determining Optimum Component Sampling Rates
  • Observation: components whose power consumption
    characteristics are associated with a higher mean
    or variance must be sampled more frequently
  • Objective: minimize the aggregate error due to
    sampling

[Equations: aggregate error, component weights, minimization constraints]
  • Use solver to get values of N
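
The slide's exact formulation is not reproduced in this transcript, so the sketch below swaps in a different, well-known rule as a stand-in: Neyman allocation, which splits a sampling budget in proportion to each component's standard deviation. It captures the variance part of the observation above but not the mean term or the solver-based constraints; the profiles and names are hypothetical.

```python
import statistics


def neyman_allocation(profiles, total_samples):
    """Split total_samples across components in proportion to their stdev.

    profiles: dict mapping component name -> list of power values from a
    short RTL simulation. Assumes at least one component has nonzero spread.
    """
    spreads = {name: statistics.pstdev(values)
               for name, values in profiles.items()}
    total_spread = sum(spreads.values())
    return {name: total_samples * spread / total_spread
            for name, spread in spreads.items()}


# Hypothetical power profiles: component b varies three times as much as a.
profiles = {"a": [1.0, 3.0], "b": [2.0, 8.0]}
allocation = neyman_allocation(profiles, total_samples=100)
```

The resulting per-component sample counts play the role of the values of N that the slide obtains from a solver.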
28
Minimizing Undersampling
  • Estimate error introduced for a component is
    computed by finding its distance from the optimum
    sampling rate
  • Aggregate undersampling for the present
    clustering solution
  • Minimize by using an iterative improvement
    algorithm based on the Kernighan-Lin heuristic
  • Move components to other clusters to reduce
    undersampling while ensuring that the target area
    constraint is not violated
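
A simplified version of this improvement loop is sketched below. It is a greedy best-single-move search, not the full Kernighan-Lin procedure (no tentative move sequences or locking). It assumes round-robin sampling, so a component in a cluster of size M is sampled at rate 1/M, and it reuses a hypothetical area model; all names and numbers are illustrative.

```python
def cluster_area(widths):
    """Assumed area model: shared power model plus multiplexer overhead."""
    n = max(widths)
    return 10 * n + 2 * n * (len(widths) - 1)


def total_area(clusters, widths):
    return sum(cluster_area([widths[c] for c in cl]) for cl in clusters)


def undersampling(clusters, optimum):
    """Sum of shortfalls between optimum and actual (1/M) sampling rates."""
    return sum(max(0.0, optimum[c] - 1.0 / len(cl))
               for cl in clusters for c in cl)


def best_move(clusters, optimum, widths, target_area):
    """Return the clustering after the best single improving move, or None."""
    current = undersampling(clusters, optimum)
    best = None
    for i, src in enumerate(clusters):
        if len(src) < 2:
            continue  # never empty a cluster
        for comp in src:
            for j in range(len(clusters)):
                if i == j:
                    continue
                trial = [list(cl) for cl in clusters]
                trial[i].remove(comp)
                trial[j].append(comp)
                if total_area(trial, widths) > target_area:
                    continue  # move would violate the area constraint
                cost = undersampling(trial, optimum)
                if cost < current - 1e-12:
                    current, best = cost, trial
    return best


def minimize_undersampling(clusters, optimum, widths, target_area):
    while True:
        improved = best_move(clusters, optimum, widths, target_area)
        if improved is None:
            return clusters
        clusters = improved


widths = {"a": 8, "b": 8, "c": 8, "d": 8}
optimum = {"a": 0.5, "b": 0.2, "c": 0.2, "d": 0.3}
start = [["a", "b", "c"], ["d"]]
result = minimize_undersampling(start, optimum, widths, target_area=200)
```

Here component a needs a sampling rate of 0.5 but only gets 1/3 in its three-member cluster; moving it next to d gives both clusters a rate of 0.5 without exceeding the area target.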