Title: Load-Reuse Analysis: design and evaluation
1Load-Reuse Analysis: design and evaluation
- Rastislav Bodík, Rajiv Gupta, Mary Lou Soffa
2Partial Redundancy Elimination (PRE)
- Partially redundant: computed on some, but not all, incoming paths
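A minimal sketch of the idea, written in Python rather than compiler IR. The function names and the evaluation counter are illustrative assumptions, not part of the talk. The expression `a + b` is computed on only one incoming path, so PRE inserts it on the other path and reuses the temporary after the merge.

```python
def before_pre(cond, a, b):
    """a + b is partially redundant: it was already computed on the cond path only."""
    evaluations = 0
    if cond:
        t = a + b              # computed on this incoming path
        evaluations += 1
    result = a + b             # recomputed on every path
    evaluations += 1
    return result, evaluations


def after_pre(cond, a, b):
    """PRE inserts a + b on the path that lacked it, then reuses t after the merge."""
    evaluations = 0
    if cond:
        t = a + b
        evaluations += 1
    else:
        t = a + b              # inserted computation
        evaluations += 1
    result = t                 # reuse instead of recomputation
    return result, evaluations


print(before_pre(True, 2, 3), after_pre(True, 2, 3))
```

On the path where the value was already available, the transformed version evaluates the expression once instead of twice; on the other path the count is unchanged.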
5Register promotion: PRE of loads
store a1, x
load a2
store a3
load a4
- Three steps
- load-reuse analysis: find loads that can reuse prior loads/stores
- alias analysis: which stores may kill reuse?
- transformation: remove redundancy with PRE [PLDI 98]
6Load-reuse analysis
- Design goal
- completeness: find all reuse
- To approach completeness, the analysis is
- uniform: analyzes scalar, array, and pointer loads
- path-sensitive: a different source of reuse on each path
- Evaluation goal
- how complete?
- compare with ideal analysis
- Detecting all reuse is undecidable
- no ideal algorithm exists
- instead, use simulation
7Experimental framework
[Diagram: the program and its input feed (1) the load-reuse analysis and (2) the simulator. Remaining labels: data-flow solution, profile, (3) estimator, weighted solution, reuse level, transformation [PLDI 98], (4) comparison.]
81. Load-reuse analysis
- It's a data-flow analysis
- on a reuse-aware representation
- Value Name Graph (VNG) [POPL 98]
- What's new?
- Sparse version of the VNG
- up to 30 times smaller than the non-sparse version
- Analyzing indirect loads/stores
- also, model killing stores
9Naming the value
y = b+c
a = c-1
x = a+b+1
10Names for the value in x
x
a+b+1
b+c
11GEN
[Figure: data-flow bits set to 1 on the names x, b+c, and a+b+1]
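The naming step can be sketched as backward substitution over straight-line code. This is only an illustration under stated assumptions: expressions are restricted to linear forms, the dictionary encoding and the helper names `substitute` and `value_names` are hypothetical, and the actual VNG construction is described in the POPL 98 paper, not here.

```python
CONST = "1"  # key holding the constant term of a linear expression


def substitute(expr, var, definition):
    """Replace var in a linear expression by its linear definition.

    Expressions are dicts mapping variable names to integer coefficients.
    """
    coeff = expr.get(var, 0)
    result = {k: v for k, v in expr.items() if k != var}
    for k, v in definition.items():
        result[k] = result.get(k, 0) + coeff * v
    return {k: v for k, v in result.items() if v != 0}  # drop cancelled terms


def value_names(var, assignments):
    """Walk the code backwards from its end, collecting names for the value of var."""
    name = {var: 1}
    names = [name]
    for target, rhs in reversed(assignments):
        if target in name:
            name = substitute(name, target, rhs)
            names.append(name)
    return names


program = [
    ("y", {"b": 1, "c": 1}),            # y = b+c
    ("a", {"c": 1, CONST: -1}),         # a = c-1
    ("x", {"a": 1, "b": 1, CONST: 1}),  # x = a+b+1
]
names = value_names("x", program)
print(names)
```

The last name collected for x is b+c, which is exactly the expression assigned to y, so the value in x can reuse the value in y.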
12Naming the value across loads
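fields: f at offset 0, next at offset 4
.. = p->f
.. = p->next->f
*r = ...
p = p->next
[Figure: GEN bits (1) attached to the value names p and *(p+4) around the store *r = ... and the assignment p = p->next]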
13KILL?
- the store through r kills the reuse if r = p+4 or r = *(p+4)
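The kill condition can be checked on a concrete memory image. This is a dynamic illustration of a condition the analysis reasons about statically; the memory layout, the addresses, and the helper names are assumptions made for the example, while the field offsets (f at 0, next at 4) come from the slide.

```python
F_OFFSET, NEXT_OFFSET = 0, 4  # field offsets taken from the slide


def addresses_read(memory, p):
    """Addresses dereferenced when evaluating p->next->f."""
    next_field = p + NEXT_OFFSET             # address holding p->next, i.e. p+4
    f_field = memory[next_field] + F_OFFSET  # address holding p->next->f, i.e. *(p+4)
    return {next_field, f_field}


def store_kills(memory, p, r):
    """A store through r kills the reuse of p->next->f if it hits either address."""
    return r in addresses_read(memory, p)


# Hypothetical heap: a node at 100 {f: 7, next: 200} and a node at 200 {f: 9, next: 0}.
memory = {100: 7, 104: 200, 200: 9, 204: 0}
p = 100
print(store_kills(memory, p, 104), store_kills(memory, p, 200), store_kills(memory, p, 100))
```

A store to address 104 overwrites p->next and a store to 200 overwrites p->next->f, so both kill the reuse; a store to 100 (p->f) does not.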
14Sparse representation
for I = 1, N
.. = A[I] + A[I-1]
a1 = &A[I]
load a1
a2 = &A[I-1]
load a2
I = I+1
15[Figure: data-flow bits on the sparse representation (Ø = empty, 1 = set); GEN at load a1, bits carried to load a2]
162. The simulator algorithm
for I = 1, N
.. = A[I] + A[I-1]
- memory access history, initially Ø; history length 1 to 4
- load a1 (A[I]): 103, 102, 101, 100
- load a2 (A[I-1]): 102, 101, 100, 99
- Simulator detects all PRE-exploitable reuse (up to the given history length), but also some noise, e.g. due to hash table accesses
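A minimal sketch of such a simulator, under loud assumptions: each static load keeps a bounded history of the addresses it accessed, a dynamic load counts as reuse when its address appears in any history, and stores and kills are not modeled. The trace format, the base address, and the order of the two loads within an iteration are all hypothetical.

```python
from collections import defaultdict, deque


def simulate(trace, history_len):
    """trace: iterable of (static_load_id, address). Returns (reused, total)."""
    history = defaultdict(lambda: deque(maxlen=history_len))
    reused = total = 0
    for site, addr in trace:
        total += 1
        # Reuse: some static load accessed this address within its recent history.
        if any(addr in past for past in history.values()):
            reused += 1
        history[site].append(addr)
    return reused, total


def loop_trace(n, base=100):
    """Loads issued by the loop: a1 reads A[I], then a2 reads A[I-1]."""
    for i in range(1, n + 1):
        yield ("a1", base + i)
        yield ("a2", base + i - 1)


print(simulate(loop_trace(4), history_len=2))
```

With this assumed access order, load a2 finds its address in the history of a1 from the previous iteration only when the history holds at least two entries; a different order of the two loads would change the history length needed.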
17Ideal amount of load reuse
[Chart: % of all dynamic loads with reuse, for history lengths 1 and 4; benchmarks: go, m88ksim, gcc, compress, li, ijpeg, vortex, tomcatv, swim, su2cor, hydro]
65% of executed loads have reuse exploitable by PRE (intra-procedural reuse, history = 1)
183. How frequent is the reuse?
- Edge profile: cheap and available
- but it cannot reconstruct the frequencies of reuse paths
[Figure: control flow graph annotated with edge-profile counts; it contains load x, load x, kill x, and a final load x]
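Why an edge profile cannot recover reuse-path frequencies can be seen on a small example. The control flow graph below (two consecutive diamonds, nodes A to G) and both path profiles are made up for illustration; assume B and E contain load x and no other node does.

```python
from collections import Counter


def edge_profile(path_profile):
    """Collapse a path profile {path: count} into edge counts."""
    edges = Counter()
    for path, count in path_profile.items():
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += count
    return edges


def reuse_at(path_profile, generator, consumer):
    """Executions of consumer that follow generator on the same path."""
    return sum(
        count
        for path, count in path_profile.items()
        if generator in path
        and consumer in path
        and path.index(generator) < path.index(consumer)
    )


profile_1 = {("A", "B", "D", "E", "G"): 50, ("A", "C", "D", "F", "G"): 50}
profile_2 = {("A", "B", "D", "F", "G"): 50, ("A", "C", "D", "E", "G"): 50}

print(edge_profile(profile_1) == edge_profile(profile_2))
print(reuse_at(profile_1, "B", "E"), reuse_at(profile_2, "B", "E"))
```

Both path profiles produce the same edge profile, yet the load in E has reuse on 50 executions under the first profile and on none under the second.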
19- Path profile
- pro: precise
- con: more expensive
- Instead: use the edge profile, but
- bound its inherent error
- compute lower and upper bounds on reuse
20Hierarchy of estimators
- Estimator: data-flow solution + edge profile give the weighted data-flow solution
- PRE, CMP1, CMPc, CMPr, CMPf: smaller error toward the end of the list (but more complex)
- Hierarchy: a practical approach. A simple estimator not precise enough? Use the next better one!
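The "use the next better one" policy can be sketched as a loop over estimators ordered from cheapest to most precise. The estimator names come from the slide, but the callable interface, the width threshold, and the bounds returned by the placeholder estimators are made-up assumptions for illustration only.

```python
def estimate_with_hierarchy(estimators, max_width):
    """Try estimators in order; stop at the first whose bounds are tight enough.

    estimators: list of (name, fn), cheapest first; fn() returns (lower, upper).
    """
    if not estimators:
        raise ValueError("at least one estimator is required")
    for name, estimator in estimators:
        lower, upper = estimator()
        if upper - lower <= max_width:
            break
    # If none is tight enough, the result of the most precise estimator is returned.
    return name, lower, upper


# Placeholder estimators with invented bounds, standing in for the real ones.
toy_estimators = [
    ("PRE", lambda: (0, 50)),
    ("CMP1", lambda: (20, 40)),
    ("CMPc", lambda: (28, 32)),
]

print(estimate_with_hierarchy(toy_estimators, max_width=5))
```

The more complex estimators run only when the simpler ones leave too wide a gap between the lower and upper bound.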
21The algorithms
- 1. The bounds
- generators: points generating reuse
- stealers: points with no reuse
- upper bound: all reuse consumed
- lower bound: all reuse stolen
[Figure: control flow graph annotated with edge-profile counts; it contains load x, load x, kill x, and a final load x]
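For a single merge point followed by a split, the two bounds can be sketched from flow conservation alone. This is a simplified illustration, not the algorithm from the talk: the function name and parameters are hypothetical, and the sketch assumes one consumer edge and one stealer edge leaving the node.

```python
def reuse_bounds(generated, stealer_freq, consumer_freq):
    """Bound the reuse reaching a consumer, given only edge frequencies.

    generated:     frequency of incoming flow that carries reuse (generators)
    stealer_freq:  frequency of outgoing flow that does not use the value (stealers)
    consumer_freq: frequency of the outgoing edge to the consuming load
    """
    upper = min(generated, consumer_freq)     # all reuse is consumed
    lower = max(0, generated - stealer_freq)  # as much reuse as possible is stolen
    return lower, upper


# 50 executions arrive with reuse, 50 without; 50 leave to the consumer, 50 elsewhere.
print(reuse_bounds(generated=50, stealer_freq=50, consumer_freq=50))
# 90 arrive with reuse, 10 without; 60 leave to the consumer, 40 elsewhere.
print(reuse_bounds(generated=90, stealer_freq=40, consumer_freq=60))
```

In the first case the edge profile alone cannot distinguish between no reuse and full reuse; in the second, at least 50 of the 60 consumer executions must have reuse.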
22- 2. Separating uncertainty
- using the CMP region
- defined for PRE [PLDI 98]
- CMP: code-motion preventing
- all error is contained in the CMP region!
23Improving precision
one region
connected regions
control flow reachability
network flow reachability
24Estimator precision
[Chart: estimator error for PRE, CMP1, CMPc, CMPr, CMPf on the INT and FP benchmarks; error gets smaller toward CMPf]
254. Analysis: how close to ideal?
[Chart: 100% = reuse seen by simulator; reuse killed by: ideal alias info; calls; array and pointer stores and calls; all stores and calls]
26Related Work
- Load-Reuse Analysis
- makes value numbering path-sensitive
- Steffen, Knoop, Rüthing: Value Flow Graph [ESOP 90]
- we show how to analyze indirect loads, via symbolic evaluation
- Simulation-based analysis evaluation
- Diwan, McKinley, Moss [PLDI 98]
- Type-based alias analysis: how powerful does it need to be?
- Estimators
- Ramalingam: Frequency Analysis [PLDI 96]
- returns a single estimate, not its bounds
27Summary
- Load-reuse analysis
- reuse across indirect memory references
- sparse representation
- Estimators: three principles
- confidence: bound the edge-profile error
- separation of uncertainty: inside/outside the CMP region
- hierarchy: increasing precision and complexity
- Evaluation
- about 65% of loads are amenable to PRE
- our analysis can find about 80% of those
28Combine three removal methods [PLDI 98]
- S: control speculation
- M: code motion
- R: restructuring
29Example
[Figure: control flow graph with edge frequencies 10 and 50 and three occurrences of a+b]
30Relative removal power
[Chart: loads removed (dynamic count, normalized) on the INT and FP benchmarks; one series is Global CSE (path-insensitive)]