Dynamic Removal of Redundant Computations

1
Dynamic Removal of Redundant Computations
ICS99, Rhodes (Greece) - June 20-25, 1999

Carlos Molina, Antonio González and Jordi Tubella
Universitat Politècnica de Catalunya - Barcelona
{cmolina,antonio,jordit}@ac.upc.es

2
Motivation
Quasi-invariant
Quasi-common subexpression
for (i = 0; i < N; i++) A[i] = B[i] op C[i]
... R = S / T ...
... X = S / U ...
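
The two patterns on this slide can be illustrated by counting how often a dynamic operation repeats an (opcode, operand values) combination that was already computed. This is a minimal sketch, not the authors' mechanism: the helper `count_redundant`, the input arrays, and the choice of `mul` for the loop body are all assumptions made for illustration.

```python
def count_redundant(ops):
    """Count dynamic operations whose (opcode, operand values) were seen before."""
    seen = set()
    redundant = 0
    for opcode, a, b in ops:
        key = (opcode, a, b)
        if key in seen:
            redundant += 1
        else:
            seen.add(key)
    return redundant


# Quasi-invariant: one static instruction in a loop keeps seeing the same
# operand values (hypothetical data; the operator "mul" is assumed).
B = [4, 4, 4, 7, 4]
C = [2, 2, 2, 2, 2]
loop_ops = [("mul", b, c) for b, c in zip(B, C)]
quasi_invariant = count_redundant(loop_ops)

# Quasi-common subexpression: two static instructions (R = S / T, X = S / U)
# perform the same computation whenever T and U happen to hold the same value.
S, T, U = 8, 2, 2
quasi_common = count_redundant([("div", S, T), ("div", S, U)])

print(quasi_invariant, quasi_common)
```

With this made-up input, three of the five loop iterations and the second division repeat earlier work, which is the kind of redundancy a reuse mechanism tries to remove at run time.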
3
Outline

Instruction Reuse
Related Work
Redundant Computation Buffer
Performance Results
Conclusions

4
Instruction Reuse
[Figure: pipeline stages Fetch, Decode/Rename, OOO Execution, Commit, with a Reuse Mechanism accessed by an index]
5
Related Work

Instruction Reuse
Value Cache for the Tree Machine (Harbison 82)
Result Cache (Richardson 92, Oberman et al. 95)
Reuse Buffer (Sodani and Sohi 97)
Physical Register Reuse (Jourdan et al. 98)
Trace Reuse
Basic blocks (Huang and Lilja 99)
General traces (González et al. 99)

6
Related Work

Result Cache
Richardson 92, Oberman and Flynn 95
Special purpose (long latency operations)
Indexed by operand values
No reuse chaining
Can reuse dynamic instances of other static instructions
Reuse Buffer
Sodani and Sohi 97
General purpose
Indexed by PC
Reuse chaining
Only reuse dynamic instances of same static instructions
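
The difference between the two indexing choices can be sketched with two small lookup tables. This is a simplified model under stated assumptions, not either published design: the class names, the unbounded dictionaries (no capacity limit or replacement policy), and the instruction trace are hypothetical, and reuse chaining is left out.

```python
import operator

# Hypothetical opcode set for the sketch.
OPS = {"div": operator.floordiv, "mul": operator.mul, "add": operator.add}


class ResultCache:
    """Indexed by operand values: any static instruction can hit an entry."""

    def __init__(self):
        self.table = {}
        self.hits = 0

    def execute(self, pc, opcode, a, b):
        key = (opcode, a, b)  # the PC plays no role in the lookup
        if key in self.table:
            self.hits += 1
            return self.table[key]
        result = OPS[opcode](a, b)
        self.table[key] = result
        return result


class ReuseBuffer:
    """Indexed by PC: only earlier instances of the same static instruction hit."""

    def __init__(self):
        self.table = {}
        self.hits = 0

    def execute(self, pc, opcode, a, b):
        entry = self.table.get(pc)
        if entry is not None and entry[0] == (a, b):  # reuse test on operand values
            self.hits += 1
            return entry[1]
        result = OPS[opcode](a, b)
        self.table[pc] = ((a, b), result)
        return result


# Two static divisions with equal operand values, each executed twice.
trace = [
    (0x10, "div", 8, 2),
    (0x14, "div", 8, 2),
    (0x10, "div", 8, 2),
    (0x14, "div", 8, 2),
]
rc, rb = ResultCache(), ReuseBuffer()
rc_results = [rc.execute(*t) for t in trace]
rb_results = [rb.execute(*t) for t in trace]
print(rc.hits, rb.hits)
```

On this trace the value-indexed table already hits on the second instruction, because it does not care which static instruction produced the entry, while the PC-indexed table must see each static instruction once before it can reuse anything.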

7
Redundant Computation Buffer
[Figure: RCB organization. Labels: Vtable (Atable pointer); Mtable; Atable (opcode, result/address, opnd1, opnd2, pointer); Reuse Test; outputs Reused Memory Value and Reused Value]
8
RCB (Working Example)
Vtable
Atable
while (cond) r = s / t
...... x = s / u
9
RCB (Working Example)
Vtable
Atable
[Table contents shown: div, 8, nil, 2, 4, 10]
while (cond) r = s / t
...... x = s / u
10
RCB (Working Example)
Vtable
Atable
10
while (cond) r = s / t
...... x = s / u
11
RCB (Working Example)
Vtable
Atable
10
while (cond) r = s / t
...... x = s / u
12
Enhancements to Other Schemes

Enhanced Result Cache

Mtable
Atable
Operands

Enhanced Reuse Buffer

Mtable
Atable
opcode
result/address
opnd1
opnd2
13
Timing Considerations
Pipeline Stages
14
Experimental Framework

Simulator
Alpha version of the SimpleScalar Toolset
Benchmarks
Spec95
Maximum Optimization Level
DEC C and F77 compilers with -non_shared -O5
Statistics Collected for 125 million instructions
Skipping initializations

15
Basic Reuse Statistics

We evaluate different schemes
- Enhanced Result Cache (ERC)
- Enhanced Reuse Buffer (ERB)
- Redundant Computation Buffer (RCB)
We find best configuration for each scheme
- Number of entries
- History depth
Best configurations will be evaluated
- Percentage of reuse
- Speedup
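
The slides name the two metrics without giving formulas. The sketch below uses the conventional definitions (fraction of dynamic instructions reused, and ratio of baseline to improved execution time); both function names and all numbers are made up for illustration and are not results from the presentation.

```python
def reuse_percentage(reused_instructions, total_instructions):
    """Share of dynamic instructions whose result was reused, in percent."""
    return 100.0 * reused_instructions / total_instructions


def speedup(base_cycles, reuse_cycles):
    """Execution time without reuse divided by execution time with reuse."""
    return base_cycles / reuse_cycles


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
pct = reuse_percentage(25, 125)
s = speedup(120, 100)
print(pct, s)
```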

16
Quasi-Common Subexpressions
32 KB
17
Study of Reuse (ERB)

[Plot: x-axis is size in Kbytes: 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
18
Study of Reuse (RCB)

[Plot: x-axis is size in Kbytes: 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
19
Study of Reuse (Comparative)

[Plot: x-axis is size in Kbytes: 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
20
Performance Evaluation

Two different capacities are evaluated
- 32 KB
- 200 KB
Best configuration has been chosen for each reuse scheme
We present a performance evaluation for a superscalar processor
- Speedup
- Percentage of reuse

21
Base Microarchitecture
22
Speedup (32 KB)
23
Speedup (200 KB)
[Chart: speedup axis from 1.00 to 1.25]
24
Reuse (32 KB)
Ops ready
25
Reuse (200 KB)
Ops ready
26
Reuse by Instruction Category
Legend: Load Value, Memory Address, Arithmetic, Cond Branch
27
Hybrid Scheme
[Figure: hybrid scheme. Labels: Atable, PC, Opnds]
28
Speedup (Hybrid Scheme)
[Chart: speedup axis from 1.00 to 1.20]
29
Reuse (Hybrid Scheme)
30
Speedup (Perfect Reuse Engine)
[Chart: speedup axis from 1.00 to 2.20]
31
Conclusions