1
Dynamic Removal of Redundant Computations
ICS'99, Rhodes (Greece) - June 20-25, 1999
  • Carlos Molina, Antonio González and Jordi Tubella
  • Universitat Politècnica de Catalunya - Barcelona
    {cmolina,antonio,jordit}@ac.upc.es

2
Motivation
Quasi-invariant
Quasi-common subexpression
for (i = 0; i < N; i++) A[i] = B[i] + C[i];
. . . . . R = S / T . . . .
. X = S / U . . . . .
3
Outline
  • Instruction Reuse
  • Related Work
  • Redundant Computation Buffer
  • Performance Results
  • Conclusions

4
Instruction Reuse
Reuse Mechanism
index
OOO Execution
Fetch
Decode Rename
Commit
5
Related Work
  • Instruction Reuse
      - Value Cache for the Tree Machine (Harbison 82)
      - Result Cache (Richardson 92, Oberman et al. 95)
      - Reuse Buffer (Sodani and Sohi 97)
      - Physical Register Reuse (Jourdan et al. 98)
  • Trace Reuse
      - Basic blocks (Huang and Lilja 99)
      - General traces (González et al. 99)

6
Related Work
  • Result Cache
      - Richardson 92, Oberman and Flynn 95
      - Special purpose (long-latency operations)
      - Indexed by operand values
      - No reuse chaining
      - Can reuse dynamic instances of other static instructions
  • Reuse Buffer
      - Sodani and Sohi 97
      - General purpose
      - Indexed by PC
      - Reuse chaining
      - Only reuses dynamic instances of the same static instruction
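The contrast between the two indexing choices can be sketched in a few lines of Python. This is a simplified model, not either published design: both tables are unbounded dictionaries, reuse chaining is not modeled, and the PCs (0x10, 0x20) and operand values are hypothetical. It shows why an operand-indexed Result Cache can reuse a computation produced by a different static instruction, while a PC-indexed Reuse Buffer cannot.

```python
import operator
from typing import Dict, List, Optional, Tuple

# Hypothetical opcode table for this sketch (integer division stands in for "div").
OPS = {"div": operator.floordiv}

# One dynamic instruction: (pc, opcode, opnd1, opnd2)
Instr = Tuple[int, str, int, int]


class ResultCache:
    """Operand-indexed table: key = (opcode, opnd1, opnd2); the PC plays no role."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, int, int], int] = {}

    def lookup(self, pc: int, op: str, a: int, b: int) -> Optional[int]:
        # Any static instruction with the same opcode and operand values can hit.
        return self.table.get((op, a, b))

    def update(self, pc: int, op: str, a: int, b: int, result: int) -> None:
        self.table[(op, a, b)] = result


class ReuseBuffer:
    """PC-indexed table: each entry remembers operands and result of one static instruction."""

    def __init__(self) -> None:
        self.table: Dict[int, Tuple[int, int, int]] = {}

    def lookup(self, pc: int, op: str, a: int, b: int) -> Optional[int]:
        entry = self.table.get(pc)
        # Reuse test: same PC and same operand values as the stored instance.
        if entry is not None and entry[:2] == (a, b):
            return entry[2]
        return None

    def update(self, pc: int, op: str, a: int, b: int, result: int) -> None:
        self.table[pc] = (a, b, result)


def run(scheme, trace: List[Instr]) -> List[bool]:
    """Return, per dynamic instruction, whether its result was reused."""
    reused = []
    for pc, op, a, b in trace:
        hit = scheme.lookup(pc, op, a, b)
        if hit is None:
            scheme.update(pc, op, a, b, OPS[op](a, b))
        reused.append(hit is not None)
    return reused


# r = s / t at PC 0x10, then x = s / u at PC 0x20 with u == t, then r = s / t again.
trace = [(0x10, "div", 8, 4), (0x20, "div", 8, 4), (0x10, "div", 8, 4)]
print(run(ResultCache(), trace))  # [False, True, True]
print(run(ReuseBuffer(), trace))  # [False, False, True]
```

The second instruction has a different PC but identical opcode and operands, so only the operand-indexed table can reuse it; the third is a repeat of the first static instruction, which both schemes catch.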

7
Redundant Computation Buffer
Figure: RCB organization (labels: Vtable, Atable, Mtable, Atable pointer; entry fields opcode, opnd1, opnd2, result/address and pointer; a Reuse Test whose outputs are Reused Value and Reused Memory Value)
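The Mtable and the "Reused Memory Value" output suggest that loads are handled in two steps: reuse the address computation, then reuse the value held at that address. The Python sketch below models one possible version of this. The class and method names are hypothetical, the tables are unbounded, and invalidating the Mtable entry on every store is an assumption chosen because it guarantees a stale value is never reused.

```python
from typing import Dict, Tuple


class LoadReuseSketch:
    """Hypothetical model: reuse a load's address, then its value via an Mtable."""

    def __init__(self) -> None:
        self.addr_table: Dict[Tuple[int, int], int] = {}  # (base, offset) -> effective address
        self.mtable: Dict[int, int] = {}  # address tag -> value last loaded from it

    def load(self, memory: Dict[int, int], base: int, offset: int) -> Tuple[int, bool]:
        """Return (value, reused)."""
        addr = self.addr_table.get((base, offset))
        if addr is not None and addr in self.mtable:
            # Both the address and the memory value are reused; memory is not read.
            return self.mtable[addr], True
        addr = base + offset
        value = memory[addr]
        self.addr_table[(base, offset)] = addr
        self.mtable[addr] = value
        return value, False

    def store(self, memory: Dict[int, int], addr: int, value: int) -> None:
        memory[addr] = value
        # Invalidate so the next load from this address must read memory again.
        self.mtable.pop(addr, None)


memory = {0x100: 7}
lr = LoadReuseSketch()
print(lr.load(memory, 0xF0, 0x10))  # (7, False)
print(lr.load(memory, 0xF0, 0x10))  # (7, True)
lr.store(memory, 0x100, 9)
print(lr.load(memory, 0xF0, 0x10))  # (9, False)
```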
8
RCB (Working Example)
Figure: Vtable and Atable
while (cond) {
  r = s / t;
  ......
  x = s / u;
}
9
RCB (Working Example)
Figure: Vtable and Atable contents for the same loop (values shown: div, 8, nil, 2, 4, 10)
10
RCB (Working Example)
Figure: Vtable and Atable contents for the same loop (value shown: 10)
11
RCB (Working Example)
Figure: Vtable and Atable contents for the same loop (value shown: 10)
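The working example (r = s / t followed later by x = s / u) can be traced with a small Python model. For illustration only, it assumes the Vtable is searched by opcode and operand values, so any static instruction can hit an entry, and that the Atable simply records, per PC, a pointer to the Vtable entry that instruction used. The PCs are hypothetical, and the values 8, 4 and 2 were picked to match numbers shown in the figure; their mapping onto s, t and r is a guess.

```python
import operator
from typing import Dict, Tuple

Key = Tuple[str, int, int]  # (opcode, opnd1, opnd2)
OPS = {"div": operator.floordiv}


class RCBSketch:
    """Simplified model of the working example, not the paper's exact organization."""

    def __init__(self) -> None:
        # Vtable: computations shared by all static instructions.
        self.vtable: Dict[Key, int] = {}
        # Atable: per-PC pointer to the Vtable entry that instruction last used
        # (only recorded here; the reuse test below does not consult it).
        self.atable: Dict[int, Key] = {}

    def execute(self, pc: int, op: str, a: int, b: int) -> Tuple[int, bool]:
        """Return (result, reused)."""
        key = (op, a, b)
        reused = key in self.vtable  # reuse test: same opcode and operand values
        if not reused:
            self.vtable[key] = OPS[op](a, b)
        self.atable[pc] = key
        return self.vtable[key], reused


PC_R, PC_X = 0x10, 0x20  # hypothetical PCs of r = s / t and x = s / u
s, t, u = 8, 4, 4
rcb = RCBSketch()
print(rcb.execute(PC_R, "div", s, t))  # (2, False): r = s / t computed and stored
print(rcb.execute(PC_X, "div", s, u))  # (2, True): x = s / u reuses it since u == t
print(rcb.atable[PC_R] == rcb.atable[PC_X])  # True: both point to the same entry
```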
12
Enhancements to Other Schemes
  • Enhanced Result Cache (figure labels: Mtable, Atable, Operands)
  • Enhanced Reuse Buffer (figure labels: Mtable, Atable; entry fields opcode, opnd1, opnd2, result/address)
13
Timing Considerations
Pipeline Stages
14
Experimental Framework
  • Simulator
      - Alpha version of the SimpleScalar Toolset
  • Benchmarks
      - SPEC95
  • Maximum optimization level
      - DEC C and F77 compilers with -non_shared -O5
  • Statistics collected for 125 million instructions
      - After skipping initialization

15
Basic Reuse Statistics
  • We evaluate three schemes:
      - Enhanced Result Cache (ERC)
      - Enhanced Reuse Buffer (ERB)
      - Redundant Computation Buffer (RCB)
  • We find the best configuration for each scheme in terms of:
      - Number of entries
      - History depth
  • The best configurations are evaluated in terms of:
      - Percentage of reuse
      - Speedup
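The two metrics are not defined on the slide. A common reading, assumed here, is reuse as a percentage of committed dynamic instructions and speedup as the ratio of execution cycles without and with reuse for the same instruction count. A minimal Python sketch (hypothetical helper names, made-up counts):

```python
def reuse_percentage(reused: int, committed: int) -> float:
    """Percentage of committed dynamic instructions whose result was reused."""
    return 100.0 * reused / committed


def speedup(baseline_cycles: int, reuse_cycles: int) -> float:
    """Cycles of the baseline processor divided by cycles with the reuse scheme."""
    return baseline_cycles / reuse_cycles


# Hypothetical counts, not results from the paper.
print(reuse_percentage(25, 200))  # 12.5
print(speedup(1200, 1000))        # 1.2
```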

16
Quasi-Common Subexpressions
32 KB
17
Study of Reuse (ERB)
Figure: reuse versus table size (x-axis: size in Kbytes, from 8 to 4096)
18
Study of Reuse (RCB)
Figure: reuse versus table size (x-axis: size in Kbytes, from 8 to 4096)
19
Study of Reuse (Comparative)
Figure: reuse of the schemes versus table size (x-axis: size in Kbytes, from 8 to 4096)
20
Performance Evaluation
  • Two capacities are evaluated:
      - 32 KB
      - 200 KB
  • The best configuration has been chosen for each reuse scheme
  • We present a performance evaluation for a superscalar processor:
      - Speedup
      - Percentage of reuse

21
Base Microarchitecture
22
Speedup (32 KB)
23
Speedup (200 KB)
Figure: speedup (y-axis from 1.00 to 1.25)
24
Reuse (32 KB)
Figure legend: Ops ready
25
Reuse (200 KB)
Figure legend: Ops ready
26
Reuse by Instruction Category
Figure legend: Load Value, Memory Address, Arithmetic, Conditional Branch
27
Hybrid Scheme
Figure: hybrid scheme combining PC-indexed and operand-indexed (Opnds) lookups (labels: Atable, PC, Opnds)
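The labels suggest a hybrid that combines a PC-indexed structure (as in the Enhanced Reuse Buffer) with an operand-indexed one (as in the Enhanced Result Cache). The sketch below is one assumed way to combine them, not the scheme evaluated in the talk: it probes the PC-indexed table first and falls back to the operand-indexed table. Both tables use small LRU capacities so that one can hit after the other has evicted the entry. Class names, capacities and PCs are hypothetical, and the Atables and Mtable are omitted.

```python
import operator
from collections import OrderedDict
from typing import Optional, Tuple

OPS = {"div": operator.floordiv, "add": operator.add}


class LRUTable:
    """Fixed-capacity table with least-recently-used replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)
        return self.entries[key]

    def put(self, key, value) -> None:
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class HybridSketch:
    """Try a PC-indexed table first, then an operand-indexed one."""

    def __init__(self, pc_entries: int, opnd_entries: int) -> None:
        self.by_pc = LRUTable(pc_entries)  # ERB-like: pc -> (opnd1, opnd2, result)
        self.by_opnds = LRUTable(opnd_entries)  # ERC-like: (op, opnd1, opnd2) -> result

    def execute(self, pc: int, op: str, a: int, b: int) -> Tuple[int, Optional[str]]:
        """Return (result, table that supplied it, or None if it was executed)."""
        entry = self.by_pc.get(pc)
        if entry is not None and entry[:2] == (a, b):
            result, source = entry[2], "pc"
        else:
            result = self.by_opnds.get((op, a, b))
            source = "opnds" if result is not None else None
            if result is None:
                result = OPS[op](a, b)
        self.by_pc.put(pc, (a, b, result))
        self.by_opnds.put((op, a, b), result)
        return result, source


hy = HybridSketch(pc_entries=4, opnd_entries=1)
trace = [(0x10, "div", 8, 4), (0x20, "div", 8, 4), (0x30, "add", 1, 2), (0x10, "div", 8, 4)]
print([hy.execute(*i)[1] for i in trace])  # [None, 'opnds', None, 'pc']
```

By the last access the one-entry operand-indexed table has evicted the division, but the PC-indexed table still holds it. With unbounded tables the operand-indexed side alone would cover every PC-side hit, so in this model the benefit of combining them comes only from limited capacity.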
28
Speedup (Hybrid Scheme)
Figure: speedup of the hybrid scheme (y-axis from 1.00 to 1.20)
29
Reuse (Hybrid Scheme)
30
Speedup (Perfect Reuse Engine)
Figure: speedup with a perfect reuse engine (y-axis from 1.00 to 2.20)
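As an assumption, a perfect reuse engine can be approximated by an unbounded table that reuses every dynamic instruction whose opcode and operand values have been seen before, regardless of PC and with no capacity or latency limits. The hypothetical helper below computes that upper bound on reuse for a trace. It models coverage only, not timing, so it does not reproduce the speedup figures.

```python
from typing import Iterable, Tuple

Computation = Tuple[str, int, int]  # (opcode, opnd1, opnd2)


def perfect_reuse_fraction(trace: Iterable[Computation]) -> float:
    """Fraction of dynamic instructions whose computation already appeared earlier."""
    seen = set()
    reused = total = 0
    for comp in trace:
        total += 1
        if comp in seen:
            reused += 1
        else:
            seen.add(comp)
    return reused / total if total else 0.0


# Hypothetical trace: two iterations of r = s / t; x = s / u with t == u,
# where s changes from 8 to 9 between iterations.
trace = [("div", 8, 4), ("div", 8, 4), ("div", 9, 4), ("div", 9, 4)]
print(perfect_reuse_fraction(trace))  # 0.5
```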
31
Conclusions
  • Redundant Computation Buffer
      - Quasi-invariants
      - Quasi-common subexpressions
  • High reuse coverage and low latency
      - 30% reuse
      - 10% speedup
  • Outperforms previous schemes