Title: Power Driven Architecture Exploration
Overview / Objective / Aim of Project
- Development of a power estimation framework
- Integration of power models for different architectural components in SrijanSim
  - Processor Core
  - Register File
  - Cache
  - Shared Bus
  - Memory
- Case Study
  - Sequential Benchmarks
  - Parallel Benchmarks
Energy Estimation Flow
- Different levels of architecture description
[Figure: energy estimation flow. Blocks: Application C Program, IMPACT, Processor Architecture Description (MDES, SystemC, RTL-level VHDL), Binary Code (Memory Images), SRIJAN Simulator, Instruction Energy Library, Energy / Performance Stats]
Instruction Energy Library Generation
[Figure: instruction energy library generation flow. Processor RTL Model → Synopsys Design Compiler (gate-level Verilog generation) → Cadence SoC Encounter (post-route netlist, SDF, SPEF generation) → ModelSim Simulator with testbench and instruction/data memory (VCD file generation) → PrimePower (power estimation) → Instruction Energy Library]
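Once a power tool reports the average power of a simulation window, the per-instruction base cost follows from E = P_avg × t divided by the number of instructions executed. The sketch below shows only that arithmetic, assuming (hypothetically) that each testbench isolates one instruction type repeated `n_instructions` times over `cycles` clock cycles; the function name and the example numbers are illustrative, not taken from the project.

```python
def energy_per_instruction(avg_power_w: float, cycles: int,
                           clock_period_s: float, n_instructions: int) -> float:
    """Base energy cost (J) of one instruction from an average power figure.

    Assumption: the testbench executes `n_instructions` instances of a single
    instruction within a window of `cycles` clock cycles, and `avg_power_w`
    is the average power reported over exactly that window.
    """
    if cycles <= 0 or n_instructions <= 0:
        raise ValueError("cycles and n_instructions must be positive")
    window_energy = avg_power_w * cycles * clock_period_s  # E = P_avg * t
    return window_energy / n_instructions


# Hypothetical numbers: 50 mW average over 1000 cycles at 100 MHz (10 ns),
# with 1000 instructions executed, gives roughly 0.5 nJ per instruction.
base_cost_j = energy_per_instruction(0.05, 1000, 10e-9, 1000)
```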
Energy Models: VLIW Core
- Instruction-level power analysis
  - Based on instruction set simulation
  - Assign a given power cost (base energy cost) to each instruction of the instruction set
  - During execution, certain inter-instruction effects occur
  - The cost of a pair of instructions is always greater than the base cost of each instruction in the pair
  - Other inter-instruction effects are related to resource constraints that can lead to stalls
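Energy Formula
$$E = \sum_{i} (B_i \times N_i) + \sum_{i,j} (O_{i,j} \times N_{i,j}) + \sum_{k} E_k$$
- $B_i$: base energy cost of instruction $i$; $N_i$: number of times instruction $i$ is executed
- $O_{i,j}$: inter-instruction effect energy cost of the pair $(i, j)$; $N_{i,j}$: number of occurrences of the pair
- $E_k$: additional energy penalties due to resource constraints (e.g., stalls)
- Requires the cost associated with every pair of instructions: $O(N^2)$, where $N$ is the number of instructions in the ISA
- For VLIW processors this explodes to $O(N^{2K})$, where $K$ is the word length (number of operations per VLIW instruction)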
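The formula can be evaluated directly over an executed instruction trace. This is a minimal sketch, assuming the trace is a list of opcode names, that $O_{i,j}$ is order-independent (so pair keys are stored with the two names in sorted order), and that pairs of identical instructions add no overhead because their cost is already in $B_i$. All opcode names and energy values are hypothetical. A second helper reproduces the table-size argument: $N^2$ pair entries for a scalar ISA versus $N^{2K}$ for a $K$-slot VLIW.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def ilpa_energy(trace: Sequence[str],
                base_cost: Dict[str, float],
                pair_cost: Dict[Tuple[str, str], float],
                penalties: Iterable[float] = ()) -> float:
    """E = sum_i B_i*N_i + sum_{i,j} O_ij*N_ij + sum_k E_k."""
    n_i = Counter(trace)                     # N_i: executions of instruction i
    n_ij = Counter(zip(trace, trace[1:]))    # N_ij: adjacent pairs (i, j)
    base = sum(base_cost[i] * n for i, n in n_i.items())
    # Assumed symmetric O_ij: look the pair up with names sorted; pairs
    # missing from the table (e.g. identical instructions) contribute 0.
    inter = sum(pair_cost.get(tuple(sorted(p)), 0.0) * n
                for p, n in n_ij.items())
    return base + inter + sum(penalties)


def pair_table_sizes(n_ops: int, k: int) -> Tuple[int, int]:
    """Pair-table entries for a scalar ISA (N^2) and a K-slot VLIW (N^(2K))."""
    return n_ops ** 2, n_ops ** (2 * k)


# Hypothetical trace: 3*1.0 + 1*3.0 (base) + 2*0.5 (pairs) + 2.0 (stall) = 9.0
trace = ["add", "add", "mul", "add"]
energy = ilpa_energy(trace, {"add": 1.0, "mul": 3.0},
                     {("add", "mul"): 0.5}, penalties=[2.0])
```

With 50 operations and 4 slots, `pair_table_sizes(50, 4)` gives 2,500 entries for the scalar case but about 3.9 × 10^13 for the VLIW case, which is why the modified model below is needed.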
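Modified VLIW Energy Formula
- Slot independence: the energy contributions of the individual lanes (slots) are assumed to be independent
- NOP is taken as the base energy
- Consider a VLIW instruction stream $W = \{w_1, w_2, \ldots, w_m\}$
- $E(W) = \sum_{n=1}^{m} U(w_n \mid w_{n-1})$
- $U(w_n \mid w_{n-1}) = U(0 \mid 0) + \sum_{k=1}^{K} v(w_n^k \mid w_{n-1}^k)$
  - $w_n^k$: operation issued on lane $k$ by instruction $w_n$
  - $U(0 \mid 0)$: base energy of an all-NOP instruction following an all-NOP instruction
- Example
  - $w_n$ = ALU NOP NOP NOP
  - $w_{n-1}$ = LS NOP ALU NOP
  - $U(w_n \mid w_{n-1}) = U(0 \mid 0) + v(\mathrm{ALU} \mid \mathrm{LS}) + v(\mathrm{NOP} \mid \mathrm{ALU})$ (lanes where NOP follows NOP add no term)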
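The slot-independent model reduces the cost of a long-instruction pair to a base term plus one lookup per lane. This sketch follows the slide's notation, with `v` keyed as `(current_op, previous_op)` to match $v(w_n^k \mid w_{n-1}^k)$. Two assumptions are labeled in the code: a NOP following a NOP on a lane contributes 0 (as the slide's example implies), and the stream is preceded by an all-NOP word $w_0$. The operation classes and energy values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

NOP = "NOP"
LaneTable = Dict[Tuple[str, str], float]  # (op on lane in w_n, op in w_{n-1}) -> v


def lane_cost(v: LaneTable, cur: str, prev: str) -> float:
    # Assumption implied by the slide's example: NOP after NOP adds nothing
    # beyond the base term U(0|0).
    if cur == NOP and prev == NOP:
        return 0.0
    return v[(cur, prev)]


def u(w_n: Sequence[str], w_prev: Sequence[str], u00: float, v: LaneTable) -> float:
    """U(w_n | w_{n-1}) = U(0|0) + sum over lanes k of v(w_n^k | w_{n-1}^k)."""
    return u00 + sum(lane_cost(v, c, p) for c, p in zip(w_n, w_prev))


def stream_energy(stream: List[Sequence[str]], u00: float, v: LaneTable) -> float:
    """E(W) = sum_n U(w_n | w_{n-1}); assumes w_0 is an all-NOP word."""
    prev: Sequence[str] = [NOP] * len(stream[0])
    total = 0.0
    for w in stream:
        total += u(w, prev, u00, v)
        prev = w
    return total


# Hypothetical per-lane energies (arbitrary units) and base U(0|0) = 10.0
v = {("ALU", "LS"): 2.0, ("NOP", "ALU"): 0.5,
     ("LS", "NOP"): 1.5, ("ALU", "NOP"): 1.0}
w_prev = ["LS", "NOP", "ALU", "NOP"]
w_n = ["ALU", "NOP", "NOP", "NOP"]
pair = u(w_n, w_prev, 10.0, v)                # 10.0 + v(ALU|LS) + v(NOP|ALU)
total = stream_energy([w_prev, w_n], 10.0, v)
```

Only the per-lane tables need characterizing, instead of one entry per pair of full VLIW words.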
Modified Energy Model for VLIW
- Cluster similar instructions based on their energy cost
  - $T = \{e_1, e_2, \ldots, e_t\}$, where $e_i$ is the energy consumption of instruction $i$
  - Partition $T$ into $M$ clusters $(C_1, C_2, \ldots, C_M)$
- Memory requirement: $O(K M^2)$
- The same instruction with different addressing modes gives very similar energy values, so such variants can be merged into the same cluster
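The slide does not say which clustering method is used, so the sketch below swaps in a simple one: sort the per-instruction energies and split at the $M-1$ largest gaps, which is equivalent to single-linkage clustering on a line. The instruction names (including the addressing-mode variants) and their energies are hypothetical. With clusters, each lane stores an $M \times M$ table, giving the $O(KM^2)$ memory requirement.

```python
from typing import Dict, List


def cluster_energies(energies: Dict[str, float], m: int) -> List[List[str]]:
    """Partition instructions into m clusters by splitting the sorted
    energy list at its m-1 largest gaps (1-D single-linkage)."""
    if not 1 <= m <= len(energies):
        raise ValueError("m must be between 1 and the number of instructions")
    items = sorted(energies.items(), key=lambda kv: kv[1])
    # Indices i where a gap between items[i-1] and items[i] is among the largest.
    gap_idx = sorted(range(1, len(items)),
                     key=lambda i: items[i][1] - items[i - 1][1],
                     reverse=True)[: m - 1]
    clusters, start = [], 0
    for cut in sorted(gap_idx) + [len(items)]:
        clusters.append([name for name, _ in items[start:cut]])
        start = cut
    return clusters


def lane_table_entries(k: int, m: int) -> int:
    """Per-lane cluster-pair tables: K * M^2 entries in total."""
    return k * m * m


# Hypothetical energies: addressing-mode variants sit close to their base op.
energies = {"add": 1.00, "add_imm": 1.02, "sub": 1.05,
            "mul": 3.0, "mul_imm": 3.1, "ld": 5.0, "ld_idx": 5.2}
clusters = cluster_energies(energies, 3)
```

Here the three clusters are `[add, add_imm, sub]`, `[mul, mul_imm]` and `[ld, ld_idx]`, so a 4-slot machine needs `lane_table_entries(4, 3)` = 36 entries rather than one per operation pair.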
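Energy Model: Register File
- Assumption: energy consumption across ports is independent
- $E = \sum_{p} E_p = \sum_{p} (E_r + E_w)$
- $E_r = n_1 \times E_r(1) + n_0 \times E_r(0)$
- $E_w = n_{11} \times E_w(1 \to 1) + n_{10} \times E_w(1 \to 0) + n_{01} \times E_w(0 \to 1) + n_{00} \times E_w(0 \to 0)$
- Where:
  - $E_p$: energy consumption on port $p$
  - $E_r(x)$: energy consumption on reading bit $x$; $n_x$: number of bits read with value $x$
  - $E_w(x_1 \to x_2)$: energy consumption on bit transition $x_1 \to x_2$; $n_{x_1 x_2}$: number of written bits making that transition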
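The register file model only needs bit counts: how many ones and zeros are read, and how many bits of the overwritten word make each of the four transitions on a write. This sketch computes those counts with integer bit operations and sums per port, reflecting the port-independence assumption. The port names, access encoding (`("r", value)` or `("w", old, new)`) and energy values are hypothetical.

```python
from typing import Dict, List, Tuple

Transition = Tuple[int, int]  # (old bit, new bit)


def read_energy(value: int, width: int, er1: float, er0: float) -> float:
    """E_r = n1*E_r(1) + n0*E_r(0) for one read of a width-bit word."""
    n1 = bin(value & ((1 << width) - 1)).count("1")
    return n1 * er1 + (width - n1) * er0


def write_energy(old: int, new: int, width: int,
                 ew: Dict[Transition, float]) -> float:
    """E_w = sum over (x1, x2) of n_{x1x2} * E_w(x1 -> x2)."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for b in range(width):
        counts[((old >> b) & 1, (new >> b) & 1)] += 1
    return sum(n * ew[t] for t, n in counts.items())


def register_file_energy(ports: Dict[str, List[tuple]], width: int,
                         er1: float, er0: float,
                         ew: Dict[Transition, float]) -> float:
    """E = sum_p E_p, each port's accesses accounted for independently."""
    total = 0.0
    for accesses in ports.values():
        for acc in accesses:
            if acc[0] == "r":
                total += read_energy(acc[1], width, er1, er0)
            else:  # ("w", old_value, new_value)
                total += write_energy(acc[1], acc[2], width, ew)
    return total


# Hypothetical 4-bit register file, energies in arbitrary units
ew = {(0, 0): 0.5, (0, 1): 3.0, (1, 0): 2.5, (1, 1): 0.7}
ports = {"R0": [("r", 0b1010)], "W0": [("w", 0b1100, 0b1010)]}
rf_energy = register_file_energy(ports, 4, er1=2.0, er0=1.0, ew=ew)
```

In this example the write `0b1100 → 0b1010` exercises each transition once, so it costs 0.5 + 3.0 + 2.5 + 0.7.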
Energy Model: Cache
- Uses eCACTI (enhanced CACTI)
  - Takes leakage current into account
  - More accurate than CACTI
  - Gives read/write hit/miss energy values
- $E = N_{r,hit} \times E_{r,hit} + N_{r,miss} \times E_{r,miss} + N_{w,hit} \times E_{w,hit} + N_{w,miss} \times E_{w,miss}$
- Where:
  - $N_{r/w}$: number of reads/writes (hits or misses)
  - $E_{r/w}$: energy consumption per read/write (hit or miss)
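Given the per-access energies from an eCACTI run and the hit/miss counts from simulation, the cache energy is a weighted sum of four terms. A minimal sketch; the dataclass names and all counts and energy values are hypothetical placeholders, not eCACTI output.

```python
from dataclasses import dataclass


@dataclass
class CacheCounts:
    """Access counts collected by the simulator (hypothetical structure)."""
    read_hits: int
    read_misses: int
    write_hits: int
    write_misses: int


@dataclass
class CacheAccessEnergy:
    """Per-access energies, e.g. taken from an eCACTI run (values assumed)."""
    read_hit: float
    read_miss: float
    write_hit: float
    write_miss: float


def cache_energy(n: CacheCounts, e: CacheAccessEnergy) -> float:
    """E = Nrhit*Erhit + Nrmiss*Ermiss + Nwhit*Ewhit + Nwmiss*Ewmiss."""
    return (n.read_hits * e.read_hit + n.read_misses * e.read_miss
            + n.write_hits * e.write_hit + n.write_misses * e.write_miss)


# Hypothetical run: energies in nJ
counts = CacheCounts(read_hits=1000, read_misses=50, write_hits=400, write_misses=20)
per_access = CacheAccessEnergy(read_hit=0.2, read_miss=1.5, write_hit=0.25, write_miss=1.8)
total_nj = cache_energy(counts, per_access)  # 200 + 75 + 100 + 36
```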
Energy Model: Bus, Memory
- Simple Transaction Model
- Per Access Energy Assigned
Energy Profiler
- In sync with the execution profiler
- Comprehensive profile of multi-threaded applications
  - Macro profile: thread-level energy information
  - Flat profile: detailed function-level information
  - Call graph: information on all direct children
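The three profile views can be built from the same stream of energy events by aggregating along different keys. This sketch assumes a hypothetical event record `(thread_id, function, caller, self_energy)` emitted alongside the execution profile; the record format and the example data are illustrative, not SrijanSim's actual profiler output.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical event: (thread_id, function, caller or None, self energy)
Record = Tuple[str, str, Optional[str], float]


def build_profiles(records: Iterable[Record]):
    """Return (macro, flat, call_graph) energy profiles."""
    macro: Dict[str, float] = defaultdict(float)   # thread -> total energy
    flat: Dict[str, float] = defaultdict(float)    # function -> self energy
    calls: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for tid, fn, caller, energy in records:
        macro[tid] += energy
        flat[fn] += energy
        if caller is not None:
            calls[caller][fn] += energy            # energy of each direct child
    return dict(macro), dict(flat), {c: dict(ch) for c, ch in calls.items()}


# Hypothetical two-thread run (energies in arbitrary units)
records = [("T0", "main", None, 1.0),
           ("T0", "decode", "main", 4.0),
           ("T0", "idct", "decode", 3.0),
           ("T1", "store", None, 2.0)]
macro, flat, call_graph = build_profiles(records)
```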
Work Done (First Semester)
- Literature survey
- Familiarization with the tool chain
  - SrijanSim and SrijanSoft
  - Synthesis and power estimation tools
- Energy library generation
- Verification of assumptions
  - Slot independence
  - Data independence
Work Done (Second Semester)
- Integration of energy models into SrijanSim
- Verification of SrijanSim results against gate-level estimates
- Case studies over sequential and parallel benchmarks
- Web interface for SrijanSim
Verification
- Verified for DSPStone Benchmarks
Case Study: MPEG-2 Decoder
- Simulated for several video streams
- Test configurations (tasks grouped per processor)
  - Uni-processor
  - 2 processors
    - 2p1: {hdr, slice_dec} {idct, add, pred, store}
    - 2p2: {hdr, add, slice_dec} {idct, pred, store}
    - 2p3: {hdr, pred, slice_dec} {idct, add, store}
    - 2p4: {hdr, add, pred} {idct, slice_dec, store}
  - 3 processors
    - {hdr, slice_dec} {idct, add} {pred, store}
  - 4 processors
    - {hdr} {slice_dec} {idct, add} {pred, store}
Energy vs. Number of Frames
- Energy consumption is linear with respect to the number of frames
[Figure: energy vs. number of frames for the 3-processor and 2-processor configurations]
Energy Profile
Time-Energy Comparison
Energy-Time Tradeoff
2-Processor Configurations
- 10% variation in energy consumption
- Less than 1% variation in performance
Problems Faced
- Problem in the SrijanSim profiler
  - Wrong thread statistics
- Problem in context switch handling
  - A new thread was created on every context switch
  - A blocked thread was never resumed
- What was wrong?
  - An incorrect stack pointer value was saved as the stack pointer of the blocked thread
  - On resume, however, the correct value of the old thread's SP was looked up
  - These values never matched, so a new thread was created
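The failure mode is easy to see if the profiler identifies blocked threads by their saved stack pointer. The sketch below is a hypothetical reconstruction of that bookkeeping, not SrijanSim's code: `ThreadTracker`, its methods and the SP values are illustrative. When `block` stores a wrong SP, the lookup in `resume` misses and a fresh thread is created, which matches both reported symptoms.

```python
from typing import Dict, List


class ThreadTracker:
    """Hypothetical profiler bookkeeping keyed by saved stack pointer."""

    def __init__(self) -> None:
        self.blocked: Dict[int, int] = {}  # saved SP -> thread id
        self.created: List[int] = []       # ids of every thread ever created
        self._next_id = 0

    def new_thread(self) -> int:
        tid = self._next_id
        self._next_id += 1
        self.created.append(tid)
        return tid

    def block(self, tid: int, saved_sp: int) -> None:
        # The bug: a wrong value here means no later lookup can match.
        self.blocked[saved_sp] = tid

    def resume(self, sp: int) -> int:
        # Look the blocked thread up by its real SP; a miss is treated as a
        # brand-new thread, so the blocked one is never resumed.
        tid = self.blocked.pop(sp, None)
        return tid if tid is not None else self.new_thread()


# Correct bookkeeping: the same thread is resumed.
ok = ThreadTracker()
t0 = ok.new_thread()
ok.block(t0, 0x7F00)
resumed_ok = ok.resume(0x7F00)       # thread 0 again

# Buggy bookkeeping: wrong SP saved, so a new thread appears on resume.
bad = ThreadTracker()
t1 = bad.new_thread()
bad.block(t1, 0x7F00 - 16)           # incorrect SP stored
resumed_bad = bad.resume(0x7F00)     # creates thread 1 instead
```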
SrijanSim Web Interface
- Features
  - Support for uni-processor and process network simulations
  - Maintains separate project workspaces
  - Implements a lock to prevent simultaneous simulation calls
  - Complete with documentation, tutorials, and help
SrijanSim Web Interface (contd.)
- Inputs
  - Uni-processor: accepts a C file
  - Process network: process network information
  - Optional energy configuration files
- Outputs
  - Execution statistics
  - Execution profile
  - Energy profile
  - Simulation trace
References
- M. Sami, D. Sciuto, C. Silvano, and V. Zaccaria. An Instruction-Level Energy Model for Embedded VLIW Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, September 2002.
- M. Sami, D. Sciuto, C. Silvano, and V. Zaccaria. Energy Estimation and Optimization of Embedded VLIW Processors Based on Instruction Clustering. In Design Automation Conference, June 2002.
- Mahesh Mamidipaka and Nikil Dutt. eCACTI: An Enhanced Power Estimation Model for On-chip Caches. Technical report, Center for Embedded Computer Systems, University of California, Irvine, October 2004.
- Seongmoo Heo, Kenneth Barr, Mark Hampton, and Krste Asanovic. Dynamic Fine-Grain Leakage Reduction Using Leakage-Biased Bitlines. In Proceedings of the 29th Annual International Symposium on Computer Architecture (ISCA '02).