Decomposition of Instruction Decoder for Low Power Design

1
Decomposition of Instruction Decoder for Low Power Design
  • TingTing Hwang
  • Department of Computer Science
  • Tsing Hua University

2
Power Dissipation
  • Static dissipation due to leakage current
  • Short-circuit dissipation
  • Charge and discharge of output load capacitor

4
Dynamic Power Dissipation Model
  • P: power dissipation
  • C: load capacitance
  • E: average transition count of the gate per clock cycle
  • Vdd: supply voltage
  • Tcyc: clock period
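
The formula on this slide was not captured in the transcript. The symbols listed above match the standard CMOS switching-power model, P = 0.5 * C * Vdd^2 * E / Tcyc, so the sketch below assumes that form. The function name and the sample values are illustrative and do not come from the presentation.

```python
def dynamic_power(c_load, e_avg, vdd, t_cyc):
    """Average switching power in watts (assumed standard CMOS model).

    c_load: load capacitance C, in farads
    e_avg:  average transition count E of the gate per clock cycle
    vdd:    supply voltage Vdd, in volts
    t_cyc:  clock period Tcyc, in seconds
    """
    if t_cyc <= 0:
        raise ValueError("clock period must be positive")
    return 0.5 * c_load * vdd ** 2 * e_avg / t_cyc


# Illustrative values only: 10 fF load, 0.2 transitions per cycle,
# 1.8 V supply, 10 ns clock period.
p = dynamic_power(10e-15, 0.2, 1.8, 10e-9)
print(f"{p:.3e} W")
```

Under this model, power is linear in E, which is why reducing switching activity in the decoder is the lever the rest of the presentation pulls on.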

6
Motivation
  • Execution frequency of instructions is uneven
  • Take the MOV class as an example:
  • three instructions
  • 22% execution frequency

(Profiling data from the Powerstone benchmarks)

7
Coupling Sub-decoders
  • Partition an instruction decoder into two
    coupling sub-decoders
  • The smaller decoder decodes only a small number
    of instructions
  • When the smaller decoder is active, the larger
    decoder is turned off
  • The smaller decoder is active frequently

8
Architecture of Coupling Sub-decoders
  • Controls to turn on/off sub-decoders
  • Activate-Control
  • Input AND-OR
  • Output OR
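
A minimal behavioral sketch of the idea, under stated assumptions: the activate control picks one sub-decoder, the input of the other is gated to zero with an AND mask, and the two outputs are merged with an OR. The opcodes, control words and function names are hypothetical, and gating the idle input to zero is only one plausible reading of "Input AND-OR"; the original design may instead hold the idle decoder's previous input.

```python
# Hypothetical 4-bit opcodes and 8-bit control words, for illustration only.
SMALL_TABLE = {0b0001: 0b1000_0001, 0b0010: 0b1000_0010}  # frequent instructions
LARGE_TABLE = {0b0100: 0b0100_0100, 0b1000: 0b0100_1000, 0b1100: 0b0101_0000}


def sub_decoder(table, gated_input):
    # Assumption: an all-zero (gated-off) or unknown input yields an
    # all-zero output, so the OR on the output side is safe.
    return table.get(gated_input, 0)


def decode(instr):
    # Activate control: the small decoder handles only its own opcodes.
    small_active = instr in SMALL_TABLE
    # Input gating: AND the instruction with an all-ones or all-zeros mask.
    in_small = instr & (0b1111 if small_active else 0)
    in_large = instr & (0 if small_active else 0b1111)
    # Output OR: exactly one sub-decoder contributes a non-zero word.
    return sub_decoder(SMALL_TABLE, in_small) | sub_decoder(LARGE_TABLE, in_large)


print(bin(decode(0b0001)))  # handled by the small decoder
print(bin(decode(0b1000)))  # handled by the large decoder
```

In this model the large decoder sees a constant zero input whenever a frequent instruction is decoded, so it does not switch during those cycles.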

[Figure: architecture of coupling sub-decoders. Labels: instruction, FF1 ... FFn, I-Activate Control, I-Control0, I-Control1, I-Decoder0, I-Decoder1, S-Activate Control, S-Control0, S-Control1, S-Decoder0, S-Decoder1, Output bit0 ...]

9
Instruction Grouping Problem
  • How to decompose the decoder so that
  • the smaller sub-decoder is small
  • the smaller sub-decoder is executed frequently
  • the activate logic is small

10
Weighted Graph Model of Execution Sequence
  • Node: instruction type
  • Edge (U, V): instruction U is executed after V, or V after U
  • Weights on nodes and edges: execution frequency
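
A minimal sketch of how such a graph could be built from an execution trace. The trace and the function name are hypothetical; the counts below are raw occurrence counts, which can be normalized into frequencies if needed.

```python
from collections import Counter


def build_transition_graph(trace):
    """Return (node_weight, edge_weight) for a list of instruction types.

    Edges are undirected, matching the slide's definition: (U, V) counts
    both "U after V" and "V after U". A repeated instruction produces a
    self-loop, stored as a one-element frozenset.
    """
    node_weight = Counter(trace)
    edge_weight = Counter()
    for prev, cur in zip(trace, trace[1:]):
        edge_weight[frozenset((prev, cur))] += 1
    return node_weight, edge_weight


# Hypothetical trace, not taken from the Powerstone profiles.
trace = ["ldr", "mov", "mul", "mov", "cmp", "b", "ldr", "mov"]
nodes, edges = build_transition_graph(trace)
print(nodes["mov"], edges[frozenset(("mov", "mul"))])
```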

[Figure: example weighted transition graph with nodes mov, mul, ldr, b and cmp]

11
Power Model
[Figure: the example transition graph (nodes mov, mul, ldr, cmp, b) partitioned into groups Mi and Mj]
  • SFi: transition frequency from Mi to Mi (consecutive instructions both in group Mi)
  • CFij: transition frequency between Mi and Mj
  • Poweri: power of Mi, estimated by Synopsys tools
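
The power equation itself was not captured in the transcript, so the sketch below only computes the two frequency terms defined above from a trace and a grouping. The trace, the grouping and the function name are hypothetical.

```python
from collections import Counter


def group_transition_frequencies(trace, group_of):
    """Return (sf, cf) as fractions of all consecutive instruction pairs.

    sf[i]              : share of transitions that stay inside group i
    cf[frozenset(i,j)] : share of transitions that cross between i and j
    """
    sf, cf = Counter(), Counter()
    for prev, cur in zip(trace, trace[1:]):
        gi, gj = group_of[prev], group_of[cur]
        if gi == gj:
            sf[gi] += 1
        else:
            cf[frozenset((gi, gj))] += 1
    total = max(len(trace) - 1, 1)  # avoid dividing by zero on tiny traces
    return ({g: n / total for g, n in sf.items()},
            {pair: n / total for pair, n in cf.items()})


# Hypothetical trace and two-way grouping (0 = small decoder, 1 = large).
trace = ["ldr", "mov", "mul", "mov", "cmp", "b", "ldr", "mov"]
group_of = {"mov": 0, "ldr": 0, "mul": 1, "cmp": 1, "b": 1}
sf, cf = group_transition_frequencies(trace, group_of)
print(sf, cf)
```

A grouping with a high SF for the small group and a low CF keeps the small decoder active for long runs, which is the behavior the grouping step is trying to achieve.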

12
Instruction Grouping Problem: Graph Partitioning
  • Generation of transition graph
  • Initial clustering by random walk
  • Initial partition of clusters
  • Iterative improvement by moving clusters among
    groups
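
The last step can be sketched as a greedy local search: try moving each cluster to another group and keep the move only if the cost drops. This is a generic sketch, not the presentation's algorithm. The cost function here is a hypothetical stand-in (group size weighted by how often the group is active), since the actual power model is not recoverable from the transcript, and the cluster data is invented.

```python
def improve_partition(assignment, num_groups, cost):
    """Greedy iterative improvement over a cluster -> group assignment."""
    assignment = dict(assignment)
    best = cost(assignment)
    improved = True
    while improved:
        improved = False
        for cluster in list(assignment):
            home = assignment[cluster]
            for target in range(num_groups):
                if target == home:
                    continue
                assignment[cluster] = target
                trial = cost(assignment)
                if trial < best - 1e-12:  # accept strictly better moves only
                    best, home, improved = trial, target, True
            assignment[cluster] = home  # keep the best group found
    return assignment, best


# Hypothetical clusters: name -> (decoder size, execution frequency).
CLUSTERS = {"A": (3, 0.5), "B": (10, 0.2), "C": (20, 0.3)}


def toy_cost(assignment):
    # Stand-in cost: size of each group times the fraction of time it is active.
    size, freq = {}, {}
    for cluster, group in assignment.items():
        s, f = CLUSTERS[cluster]
        size[group] = size.get(group, 0) + s
        freq[group] = freq.get(group, 0.0) + f
    return sum(size[g] * freq[g] for g in size)


final, final_cost = improve_partition({"A": 0, "B": 1, "C": 0}, 2, toy_cost)
print(final, round(final_cost, 3))
```

Because every accepted move strictly lowers the cost, the loop terminates, but it can stop in a local optimum. That is one reason a good initial clustering and initial partition matter.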

13
Experimental Process
  • ARM7TDMI
  • Circuit described in Verilog
  • Circuit synthesized by Synopsys Design Compiler
  • Power estimated by PrimePower; switching
    activities are collected by simulating the
    Powerstone benchmark set

14
Results on Two-way Decomposition

15
Power Consumption Comparisons
  • Lower power consumption

16
Critical Path and Area Comparisons
  • Shorter critical path timing
  • Area overhead

17
Results on Multiple-way Decomposition

18
Power Consumption for Different Multi-way Grouping
  • Two-way decomposition has best power reduction
  • more groups → more overhead

[Figure: bar chart of power (W) for the Original, 2way, 3way and 4way designs, split into Decoder and Overhead]

19
Critical Path Timing for Different Multi-way
Grouping
  • Four-way decomposition has best timing reduction

20
Area Comparisons
  • Area for different multi-way grouping

[Figure: bar chart of area for the Original, 2way, 3way, 4way and 5way designs]

21
Conclusions
  • Two-way partitioning has the best results for
    the 142-instruction set
  • Compared to un-decomposed decoder:
  • 30% reduction in power consumption
  • 13% improvement in critical path timing
  • Compared to un-decomposed control-U:
  • 19% reduction in power consumption
  • 12% improvement in critical path timing

22
  • Thank You