The Design and Use of SimplePower: A Cycle-Accurate Energy Estimation Tool

Provided by: vijaynaray

1
The Design and Use of SimplePower: A Cycle-Accurate Energy Estimation Tool
  • W. Ye, N. Vijaykrishnan, M. Kandemir, M. J. Irwin
  • Microsystems Design Lab
  • The Pennsylvania State University

2
Why Power Matters
  • Packaging costs, cooling costs
  • Power supply rail design
  • Digital noise immunity
  • Battery life (in portable systems)
  • Environmental concerns
  • Office equipment accounted for 5% of total US
    commercial energy usage in 1993
  • Energy Star compliant systems

3
Motivation
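Abstraction level versus analysis capacity, analysis accuracy, analysis speed, analysis resources, and power savings:
  • Behavior (System): most analysis capacity, worst accuracy,
    fastest speed, least resources, most power savings
  • Architectural (RTL)
  • Logic (Gate)
  • Transistor (Switch): least analysis capacity, best accuracy,
    slowest speed, most resources, least power savings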
4
Architectural Level Analysis Considerations
  • Very computationally efficient
  • Requires predefined analytical and
    transition-sensitive energy characterization
    models
  • Simulation based so can be used to support
    architectural, compiler, operating system, and
    application level experimentation
  • WattWatcher (Sente), DesignPower and
    PowerCompiler (Synopsys), prototype academic
    tools (Wattch - Princeton, Avalanche -
    Princeton/NEC)

5
SimplePower Framework
6
Functional Unit Characterization
  • Transition-sensitive energy models (energy tables):
    adders, register files
  • Transition-aware energy models: system-level
    interconnect
  • Analytical energy models: cache and main memory
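
A transition-sensitive energy table can be pictured as a lookup keyed by the previous and current input vectors of a unit. The sketch below is only an illustration of that idea: `build_table`, `simulate`, and the placeholder characterization function (one energy unit per toggled input bit) are assumptions, standing in for values that would come from circuit-level simulation.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def build_table(characterize, input_bits: int) -> dict:
    """Build one table row per (previous input, current input) pair.

    `characterize` is a hypothetical stand-in for a circuit-level
    characterization run that returns the energy of one transition.
    """
    size = 1 << input_bits
    return {(prev, cur): characterize(prev, cur)
            for prev in range(size) for cur in range(size)}


def simulate(table: dict, inputs: list) -> float:
    """Sum the table energy over each consecutive pair of input vectors."""
    return sum(table[(prev, cur)] for prev, cur in zip(inputs, inputs[1:]))


# Placeholder characterization: one energy unit per toggled input bit.
table = build_table(hamming, input_bits=2)
total = simulate(table, [0b00, 0b11, 0b11, 0b01])
print(len(table), total)
```

During simulation only the lookup is performed per cycle, which is what makes this style of model fast once the table exists.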

7
Switch Capacitance Table
8
Table Compression
  • Problem
  • Results in large uncompressed table
  • (e.g., 16-bit adder → 2^32 rows)
  • Excessive simulation (e.g., 2^32!)
  • Existing Solutions
  • Clustering Algorithm
  • Reference: Huzefa Mehta et al., "Module Energy
    Characterization using Clustering," DAC '96
  • For a 16-bit adder, to keep 12% average error →
  • 1000 simulation points, 97 rows
  • Becomes complex for larger input widths to
    maintain accuracy after clustering

9
Modeling Solutions
  • Partitioning of unit into smaller sub-modules
  • Modeling adders, multipliers, shifters, register
    files
  • Bit dependent: decoder
  • Bit independent: each bit transition can be
    considered independently (pipeline registers,
    logic unit in ALU)
  • Analytical Modeling
  • Used when the structure is too large or complicated
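
For a bit-independent unit, the energy of a transition can be computed from the number of toggled bits rather than looked up in a table. The following is a minimal sketch under the assumption that every toggling bit costs the same energy; the function name and the per-toggle value are hypothetical.

```python
def bit_independent_energy(prev: int, cur: int, width: int,
                           energy_per_toggle: float) -> float:
    """Energy of one transition when each bit contributes independently.

    Assumption: every toggled bit costs the same `energy_per_toggle`,
    a hypothetical per-bit value obtained from characterization.
    """
    mask = (1 << width) - 1                      # keep only the unit's bits
    toggled = bin((prev ^ cur) & mask).count("1")
    return toggled * energy_per_toggle


# Two of the four bits toggle between 1010 and 0110.
print(bit_independent_energy(0b1010, 0b0110, width=4, energy_per_toggle=0.5))
```

Under this assumption a single per-bit number replaces the whole table, regardless of the width of the unit.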

10
Register File
[Figure: register file block diagram with 5:32 write decoders, write data drivers, a 32 x 32 cell array, 5:32 read decoders, word line drivers, and read sense amps; blocks are marked as bit independent or bit dependent]
11
Decoder Characterization
  • A 5:32 decoder → 2^10 row table!
  • Build 5:32 decoder out of smaller decoders
  • A 2:4 decoder (with enable) → 2^6 row table
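
The row counts for the 5-to-32 and 2-to-4 decoders are consistent with one table row per (previous input, current input) pair. The short sketch below works through that arithmetic; the function name is illustrative, and the one-row-per-pair reading is an assumption about how the table is indexed.

```python
def table_rows(input_bits: int) -> int:
    """Rows in an uncompressed table, assuming one row per
    (previous input vector, current input vector) pair."""
    return 2 ** (2 * input_bits)


# 5-to-32 decoder: 5 address bits.
print(table_rows(5))
# 2-to-4 decoder with enable: 2 address bits plus 1 enable bit.
print(table_rows(3))
```

The smaller decoder needs 16 times fewer rows, which is the motivation for characterizing it once and composing the larger decoder from it.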

12
Analytical Energy Model Example
  • On-chip cache [Kamble], off-chip memory [Shiue]
  • Energy = Ebus + Ecell + Epad + Emain
  • Ecell = β × (wl_length) × (bl_length + 4.8) × (Nhit + 2 × Nmiss)
  • wl_length = m × (T + 8L + St)
  • bl_length = C / (m × L)
  • Nhit = number of hits, Nmiss = number of misses,
    C = cache size, L = cache line size in bytes,
    m = set associativity, T = tag size in bits,
    St = number of status bits per line,
    β = 1.44e-14 (technology parameter)
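
The cell-energy term can be evaluated directly once the cache parameters are known. The sketch below reads the slide's expression as the technology parameter times wl_length times (bl_length + 4.8) times (Nhit + 2 Nmiss); the operators are inferred from the transcript, and the function name and the example cache configuration are hypothetical. The slide gives no unit for C, so the example treats it as bytes, and no unit is claimed for the result.

```python
TECH_PARAM = 1.44e-14  # technology parameter quoted on the slide


def cell_energy(C: int, L: int, m: int, T: int, St: int,
                n_hit: int, n_miss: int,
                tech_param: float = TECH_PARAM) -> float:
    """Cache cell-array energy under the analytical model above.

    C: cache size, L: line size in bytes, m: set associativity,
    T: tag size in bits, St: status bits per line.
    """
    wl_length = m * (T + 8 * L + St)   # word-line length term
    bl_length = C / (m * L)            # bit-line length term
    return tech_param * wl_length * (bl_length + 4.8) * (n_hit + 2 * n_miss)


# Hypothetical configuration: 4096-byte direct-mapped cache, 32-byte lines.
e_hits = cell_energy(C=4096, L=32, m=1, T=20, St=2, n_hit=1000, n_miss=0)
e_miss = cell_energy(C=4096, L=32, m=1, T=20, St=2, n_hit=0, n_miss=1000)
print(e_hits, e_miss)
```

In this model a miss is charged twice the cell energy of a hit, so reducing the miss count lowers Ecell even when the total number of accesses is unchanged.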

13
Validation of Energy Model
[Figure: HSPICE power consumption compared with estimated power consumption]
14
SimplePower Design Summary
  • Supports Integer Instruction Set of SimpleScalar
  • Models On-Chip Caches and Off-Chip Memory along
    with buses
  • Provides cycle-accurate energy information across
    different system components
  • Does not account for clock generation and
    distribution circuitry
  • Computationally efficient
  • Register file takes 0.1 second for each input
    sequence as opposed to 9 minutes for the HSPICE
    simulation

15
The Use of SimplePower
  • Can reuse the technology based files to evaluate
    other architectures
  • Number and type of Functional Units
  • Study Architectural Modifications and
    Optimizations
  • Number of pipeline stages
  • Gated-pipelining
  • Study Influence of Software
  • High-level Algorithmic Choices
  • High-level Compiler Optimizations
  • Low-level Compiler Optimizations

16
Compiler Framework
Benchmark source
17
Sample of Benchmark Set
18
Datapath Energy Consumption
19
Selectively Gated Pipeline Regs
  • Pipeline registers consume a large percentage of
    datapath power
  • 40% for 0.35 µm
  • Pipeline registers have large width
  • Pipeline registers are clocked every cycle
  • Not all clockings are necessary
  • use the decoded control signals to selectively
    gate the clock of pipeline register fields
  • only simple extra logic necessary
  • can be built into the clock buffer circuit
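
The effect of selective gating can be illustrated by counting how many register bits are clocked with and without it. This is a rough sketch only: the field names, field widths, and the mapping from instruction class to required fields are hypothetical and are not taken from SimplePower.

```python
# Hypothetical field widths (in bits) of a MEM/WB pipeline register.
FIELD_WIDTHS = {"control": 4, "mem_data": 32, "alu_result": 32, "dest_reg": 5}

# Hypothetical decoded control information: fields each class must latch.
FIELDS_NEEDED = {
    "load": {"control", "mem_data", "dest_reg"},
    "alu": {"control", "alu_result", "dest_reg"},
    "store": {"control"},  # a store writes nothing back to the register file
}


def clocked_bits(instr_classes, gated: bool) -> int:
    """Total register bits clocked while the given instructions pass through."""
    total = 0
    for cls in instr_classes:
        fields = FIELDS_NEEDED[cls] if gated else FIELD_WIDTHS.keys()
        total += sum(FIELD_WIDTHS[f] for f in fields)
    return total


trace = ["load", "alu", "store", "alu"]
ungated = clocked_bits(trace, gated=False)
gated = clocked_bits(trace, gated=True)
print(ungated, gated)
```

The saving depends entirely on the instruction mix and the assumed field widths, so the numbers printed here say nothing about the savings measured in the presentation.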

20
Gated Pipeline Registers
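[Figure: gated pipeline registers for the instruction SW r1, 0(r2), showing the EXE, MEM, and WB stages, the EXE/MEM and MEM/WB registers, the mem/wb_cntl signal, and the MemData, Address, and Data fields]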
21
Switch Capacitance Reduction
22
Compiler Framework
Benchmark source
23
Compiler Optimizations
  • High-level Optimizations
  • Inter and Intra Procedural Dataflow Optimization
  • Loop Transformations
  • Memory-Layout Transformation

24
Data Transformation Effects
25
Compiler Framework
Benchmark source → compiler transformations (source-to-source translation) → GCC → assembly code → GAS → object code
26
Low Level Optimizations
  • Instruction Scheduling
  • Register allocation
  • Operand Swapping
  • Register Relabeling

27
Register Relabeling
  • A post-compilation optimization
  • Exploits corresponding fields in consecutive
    instructions
  • Reduces bit switches on the instruction bus
  • Reduces the energy of the pipeline registers and
    register file decoder

28
Register Relabeling Example
29
Solution Steps
  • Construct a Register Transition Graph
  • Compiler analysis to record all consecutive
    transitions between all possible pairs of
    registers
  • Profiling (use training inputs to create sample
    traces)
  • Determine important paths
  • Paths that contain edges with high transition
    counts
  • Relabel the registers
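
The steps above can be sketched on a toy trace. The code below builds a register transition graph from a sequence of register names and then searches for labels that minimize the transition-weighted Hamming distance. Exhaustive search is used here only because the example is tiny; it is a substitute for the path-based heuristic outlined on the slide, and all names and the trace are hypothetical.

```python
from collections import Counter
from itertools import permutations


def transition_graph(trace):
    """Count consecutive transitions between each unordered register pair."""
    counts = Counter()
    for a, b in zip(trace, trace[1:]):
        if a != b:
            counts[frozenset((a, b))] += 1
    return counts


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two register numbers differ."""
    return bin(a ^ b).count("1")


def switching_cost(graph, labels) -> int:
    """Total bit switches: transition count times label Hamming distance."""
    total = 0
    for pair, count in graph.items():
        a, b = tuple(pair)
        total += count * hamming(labels[a], labels[b])
    return total


def relabel(graph, registers, codes):
    """Brute-force search for the cheapest labeling (tiny inputs only)."""
    best_cost, best_labels = None, None
    for perm in permutations(codes, len(registers)):
        labels = dict(zip(registers, perm))
        cost = switching_cost(graph, labels)
        if best_cost is None or cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels


trace = ["r1", "r2", "r1", "r2", "r3"]          # hypothetical profile trace
graph = transition_graph(trace)
original = {"r1": 1, "r2": 2, "r3": 3}          # hypothetical initial labels
best_cost, best_labels = relabel(graph, ["r1", "r2", "r3"], codes=range(4))
print(switching_cost(graph, original), best_cost, best_labels)
```

With 32 registers the search space is far too large for this approach, which is why a heuristic that concentrates on the heavily weighted paths is needed in practice.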

30
Register Transition Graph
[Figure: register transition graph over registers R1 to R6, with edges weighted by transition counts of 100, 90, 80, 35, 35, 15, and 5]
Registers with large transition counts should be
labeled using minimum Hamming distance
31
Icache Data Bus Reduction
32
Conclusions
  • SimplePower: a cycle-accurate simulator
  • Find energy hotspots in the architecture
  • Study hardware/software interaction
  • Architectural experiments
  • Selectively gated pipeline registers
  • 18-36% energy savings in datapath
  • Compiler optimization experiments
  • Data transformations (memory): 62% reduction
  • Register relabeling (bus): 11.7% reduction