Title: Rapid Estimation of Power Consumption for Hybrid FPGAs
1Rapid Estimation of Power Consumption for Hybrid
FPGAs
- Chun Hok Ho1, Philip Leong2, Wayne Luk1, Steve Wilton3
- 1 Department of Computing, Imperial College London
- 2 Department of Computer Science and Engineering, Chinese University of Hong Kong
- 3 Department of Electrical and Computer Engineering, University of British Columbia
9 September 2008
2Overview
- 1. Motivation
- 2. Contributions
- 3. Related Work
- 4. Rapid Power Estimation Flow
- 5. Technology Mapper
- 6. Evaluation
- 7. Future work and Conclusion
3Motivation
- For a new hybrid FPGA architecture
- How do we assess power dissipation rapidly?
- How do we map applications onto such an architecture effectively?
4Contributions
- High level power estimation flow
- Estimate the power using various vendor toolchains and techniques
- Hybrid FPGA technology mapper
- Produce netlist/bitstream based on dataflow graph (DFG)
5Related work: Hybrid FPGA architecture 1
D9, M4, R3, F3, 2 add, 2 mul: best density over benchmarks
1 C. Ho et al., Domain-Specific Hybrid FPGA Architecture and Floating Point Applications, FPL 2007
6Related work: Virtual Embedded Blocks 1
- Dummy blocks are used to model the area and delay of coarse-grained blocks
- A timing analyzer can be used to determine the hybrid's performance (including fine-to-coarse routing and delays)
1 C. Ho et al., Virtual Embedded Blocks: A Methodology for Evaluating Embedded Elements in FPGAs, FCCM 2006
7Power estimation flow
- Different tool chains involved
- VEB modelling flow
- FPGA power spreadsheet model
- ASIC power compiler flow
- Limitations
- Dynamic power consumption only (power loss due to switching activity)
- Constant activity rate is assumed
- Core only: no I/O power is assessed
- First-order estimation
- For accurate results, a simulation-based model is required
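The slides do not write out the underlying model. As a reminder of what a "constant activity rate" feeds into, the standard first-order expression for dynamic (switching) power is shown below. The symbols are assumptions for illustration: \alpha is the activity rate, C the switched capacitance, V_{dd} the supply voltage and f the clock frequency.

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f
```

Holding \alpha constant across all nets is what makes the estimate fast, and is also why it is only a first-order figure.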
8Power estimation flow
- Pall: total power dissipation
- Pfgu: power dissipated in fine-grained unit (FGU)
- Pcgu: power dissipated in coarse-grained unit (CGU)
- Pr: power dissipated in routing between FGU and CGU
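The equation that ties these four terms together does not survive in the slide text. A plausible reconstruction, assuming the three components are simply additive, is:

```latex
P_{\mathrm{all}} = P_{\mathrm{fgu}} + P_{\mathrm{cgu}} + P_{r}
```

Each term on the right is estimated by a different tool chain, which is why the flow is split into separate steps.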
9Power estimation flow (Pfgu)
- Synthesize the circuit with VEB flow
- Measure the power of the circuit with spreadsheet approach (P)
- Constant activity rate of 12.5% applied
- Measure the power of the VEB with spreadsheet approach (Pveb)
- Pfgu = P - Pveb
10Power estimation flow (Pcgu)
- Synthesize the coarse-grained unit with ASIC flow
- Configure the ASIC netlist with bitstream
- Apply constant activity rate on all the nets
- Estimate the dynamic power with power compiler
tool
11Power estimation flow (Pr)
- Pr can be modeled by providing a suitable output loading when estimating Pcgu
- Output loading can be calibrated by referring to an existing embedded block
- Embedded multiplier blocks in Virtex II are used in calibration
12Power estimation flow (Pr)
- Measure the power of the multiplier in the FPGA using spreadsheet (Pem)
- Implement a multiplier in ASIC flow
- Measure the power of the ASIC multiplier (Pam)
- Adjust loading capacitance (CL) such that Pam = Pem
- Apply CL in estimating Pcgu
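The calibration steps above amount to a one-dimensional search for CL. The sketch below illustrates that idea with bisection. The function `asic_multiplier_power` is a hypothetical stand-in for a power-compiler run, and every number in it (internal power, supply voltage, frequency, output count) is invented for illustration and not taken from the slides. It also assumes power rises monotonically with load.

```python
def asic_multiplier_power(c_load_pf, p_internal_mw=2.0, activity=0.125,
                          vdd=1.5, freq_mhz=100.0, n_outputs=36):
    """Hypothetical model of Pam in mW: internal power plus first-order
    switching power on the loaded outputs (all parameters are assumptions)."""
    # alpha * C * V^2 * f per output; pF * V^2 * MHz gives uW, so /1000 -> mW
    switching_mw = activity * c_load_pf * vdd ** 2 * freq_mhz * n_outputs / 1000.0
    return p_internal_mw + switching_mw


def calibrate_load(p_em_mw, power_fn, lo=0.0, hi=100.0, tol=1e-6):
    """Bisection on CL (in pF) until power_fn(CL) matches p_em_mw (Pam = Pem).

    Assumes power_fn increases monotonically with the load capacitance.
    """
    if not (power_fn(lo) <= p_em_mw <= power_fn(hi)):
        raise ValueError("target power is outside the bracketed range")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if power_fn(mid) < p_em_mw:
            lo = mid   # ASIC estimate too low: increase the load
        else:
            hi = mid   # ASIC estimate too high: decrease the load
    return (lo + hi) / 2.0


# Example with a made-up spreadsheet figure of 10 mW for the embedded multiplier.
c_l = calibrate_load(10.0, asic_multiplier_power)
print(f"calibrated CL = {c_l:.3f} pF")
```

In a real flow each call to `power_fn` would be a tool run, so a search that needs few evaluations is preferable to sweeping CL.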
13Technology mapper
- A tool for producing netlist/bitstream from a high-level description
- Reuse existing C-to-gate compiler
- CHiMPS 1
- Trident 2
- fly 3
- Only the backend is different: the technology mapper
1 A. Putnam et al., CHiMPS: A C-Level Compilation Flow for Hybrid CPU-FPGA Architectures, FPL 2008
2 J. Tripp et al., Trident: An FPGA Compiler Framework for Floating-Point Algorithms, FPL 2005
3 C. Ho et al., Fly - A Modifiable Hardware Compiler, FPL 2002
14Technology mapper
15Technology mapper
- Greedy algorithm
- Not optimal but effective in most cases
- Pack as many operations as possible into a single coarse-grained unit
- No suitable block: use soft core
- Coarse-grained units used up: use soft core
16Mapping example
17Mapping example
- fadd tmp1, a, b
- fadd tmp2, c, d
18Mapping example
19Mapping example
- fsqrt tmp4, tmp3
- No dedicated square-root block, so use the fine-grained unit
20Mapping example
- fmul z, tmp4, g
- Instantiate another coarse-grained unit and connect everything together
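The greedy packing described on the Technology mapper slide, applied to the instructions of this example, can be sketched as follows. This is a minimal illustration and not the authors' tool. The DFG tuple format, the `capacity` table and the `max_units` parameter are hypothetical. The instruction producing `tmp3` is not shown in the slide text, so an `fmul` is assumed. The per-unit capacity (two adders, one multiplier) is chosen only so that the outcome matches the slides, where the final `fmul` lands in a second coarse-grained unit.

```python
def greedy_map(dfg, capacity, max_units):
    """Greedily assign DFG operations to coarse-grained units or the soft core.

    dfg       -- list of (opcode, dest, *sources) in topological order
    capacity  -- opcode -> number of slots in one coarse-grained unit
    max_units -- number of coarse-grained units available on the device
    """
    units = []       # remaining slots of each opened coarse-grained unit
    placement = {}   # dest -> "cguN" or "soft"
    for opcode, dest, *_ in dfg:
        if capacity.get(opcode, 0) <= 0:
            placement[dest] = "soft"   # no suitable block
            continue
        # First fit: fill an already-open unit before opening a new one.
        idx = next((i for i, free in enumerate(units) if free[opcode] > 0), None)
        if idx is None and len(units) < max_units:
            units.append(dict(capacity))
            idx = len(units) - 1
        if idx is None:
            placement[dest] = "soft"   # coarse-grained units used up
            continue
        units[idx][opcode] -= 1
        placement[dest] = f"cgu{idx}"
    return placement


example_dfg = [
    ("fadd", "tmp1", "a", "b"),
    ("fadd", "tmp2", "c", "d"),
    ("fmul", "tmp3", "tmp1", "tmp2"),  # assumed step, not shown in the slides
    ("fsqrt", "tmp4", "tmp3"),
    ("fmul", "z", "tmp4", "g"),
]
print(greedy_map(example_dfg, {"fadd": 2, "fmul": 1}, max_units=2))
```

With these assumptions the two additions and the first multiplication share one unit, the square root falls back to the fine-grained fabric, and the last multiplication opens a second unit. Setting `max_units=1` sends that last multiplication to the soft core instead.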
21Evaluation
- How effective is the technology mapper?
- Compare with optimal mapping
- How much power/energy can be saved by introducing coarse-grained units?
- Compare with existing FPGA devices
22Evaluation
- 8 benchmark circuits
- DSP computation kernels e.g. bfly
- Linear algebra e.g. mm3
- Complete application e.g. bgm
- Synthetic benchmark e.g. syn2
- Circuits are mapped to the hybrid FPGA using the technology mapper
- Synthesized to Xilinx Virtex II devices for comparison
23Evaluation: Technology mapper
24Evaluation: Power reduction
syn7 is implemented on XC2V8000-5
25Evaluation: Energy reduction
Energy reduced by 14 times on average
26Future work
- Integration of the technology mapper into an existing compiler
- Trident, fly
- Simulation-based power estimation flow for more accurate results
- Power estimation comparison with HHVPR 1 flow
- Static power consumption?
1 N. Choy et al., Activity-Based Power Estimation and Characterization of DSP and Multiplier Blocks in FPGAs, FPT 2006
27Conclusion
- Rapid power estimation flow on hybrid FPGA
- VEB flow, FPGA power spreadsheet, ASIC power compiler
- Technology mapper for hybrid FPGA
- Target different coarse-grained units
- DFG input to cope with existing compiler
- Produce netlist and bitstream
- Assess hybrid FPGA power consumption
- Power reduced by 4 times
- Energy reduced by 14 times