Title: The PPA Algorithm
1The PPA Algorithm
- Jeff Da Silva
- September 10th, 2004
2The Pointer Alias Analysis Problem
- Statically decide for any pair of pointers, at
any point in the program, whether two pointers
point to the same memory location.
3Pointer Analysis Issues
- Scalability vs. accuracy
- Generally, a difficult tradeoff exists between the amount of computation and memory required and the accuracy of the analysis.
- Precision/efficiency tradeoff: where is the sweet spot?
- Which metric should be used?
- Direct metric
- Report performance applied to an optimization
- Dynamically measure false positives
- Which benchmark suite?
- Are the results reproducible?
4Pointer Analysis Issues
- Complications associated with pointer arithmetic, casting, function pointers, long jumps, and multithreaded applications.
- Can these be ignored?
- Different pointer analysis uses have different needs.
- A universal pointer analysis probably doesn't exist.
5Pointer Analysis Design Choices
- Flow sensitivity
- Context sensitivity
- Heap modeling
- Aggregate modeling
- Alias representation
- Whole program
- Incremental compilation
6Accuracy/Efficiency Tradeoff
- Doubly exponential
- Accurate: very few "maybe" outputs (control deps/runtime)
- Requires entire program info
- Memory required: oodles
- Does not scale well
Chen, et al. (only other PPA)
Address-taken
SPAN
Steensgaard
BDD based
- Linear time complexity
- Inaccurate: many false "maybe" outputs
- Memory required: negligible
7PPA Algorithm Objectives
- An interprocedural, flow-sensitive, context-sensitive/merged approach that uses transfer functions.
- Must be scalable and should require less space and time than any traditional analysis.
- Provide an approximate probability for the "maybe" output.
8Design Choices (tentative)
- Flow sensitivity: flow sensitive
- Context sensitivity: context merged
- Heap modeling: allocation site
- Aggregate modeling: arrays aggregated, structs separated
- Alias representation: points-to
- Whole program: not required
- Incremental compilation: limited support
9How is Probabilistic Pointer Analysis used?
[Figure: diagram relating Source Code, Dynamic Profiling, Probabilistic Pointer Analysis, and Speculative Parallelized Executable]
10The Probabilistic Pointer Analysis (PPA) Problem
- Probabilistic Pointer Analysis (PPA): for any pair of pointers, at any point in the program, statically estimate the probability that the two pointers point to the same memory location.
11The Traditional Points-To Graph
int a, b, c; int *r, *s, *t; int **x, **y, **z;
x = &r; y = &s; z = &t; r = &a; s = &b; *z = &c;
12The Traditional Points-To Graph
int a, b, c; int *r, *s, *t; int **x, **y, **z;
x = &r; y = &s; z = &t; r = &a; s = &b; *z = &c;
if (...) { x = &s; s = &c; }
13The Traditional Points-To Graph
int a, b, c; int *r, *s, *t; int **x, **y, **z;
x = &r; y = &s; z = &t; r = &a; s = &b; *z = &c;
if (...) { x = &s; s = &c; }
r = &b; z = &r;
14The Traditional Points-To Graph
int a, b, c; int *r, *s, *t; int **x, **y, **z;
x = &r; y = &s; z = &t; r = &a; s = &b; *z = &c;
if (...) { x = &s; s = &c; }
r = &b; z = &r;
if (...) { y = x; }
15The Traditional Points-To Graph
int a, b, c; int *r, *s, *t; int **x, **y, **z;
x = &r; y = &s; z = &t; r = &a; s = &b; *z = &c;
if (...) { x = &s; s = &c; }
r = &b; z = &r;
if (...) { y = x; }
*x = &a;
16The Probabilistic Points-To Graph
int a, b, c; int *r, *s, *t; int **x, **y, **z;
x = &r; y = &s; z = &t; r = &a; s = &b; *z = &c;
[Figure: probabilistic points-to graph; all six edges are labeled 1.0]
17The Probabilistic Points-To Graph
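int a, b, c; int *r, *s, *t; int **x, **y, **z;
x = &r; y = &s; z = &t; r = &a; s = &b; *z = &c;
if (...) /* 60% taken */ { x = &s; s = &c; }
[Figure: probabilistic points-to graph; edge labels 0.4, 0.6, 0.6, 0.4]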
18The Probabilistic Points-To Graph
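int a, b, c; int *r, *s, *t; int **x, **y, **z;
x = &r; y = &s; z = &t; r = &a; s = &b; *z = &c;
if (...) /* 60% taken */ { x = &s; s = &c; }
r = &b; z = &r;
[Figure: probabilistic points-to graph; edge labels 0.6, 0.4, 0.6, 0.4]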
19The Probabilistic Points-To Graph
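int a, b, c; int *r, *s, *t; int **x, **y, **z;
x = &r; y = &s; z = &t; r = &a; s = &b; *z = &c;
if (...) /* 60% taken */ { x = &s; s = &c; }
r = &b; z = &r;
if (...) /* 10% taken */ { y = x; }
[Figure: probabilistic points-to graph; edge labels 0.04, 0.96, 0.6, 0.4, 0.6, 0.4]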
20The Probabilistic Points-To Graph
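int a, b, c; int *r, *s, *t; int **x, **y, **z;
x = &r; y = &s; z = &t; r = &a; s = &b; *z = &c;
if (...) /* 60% taken */ { x = &s; s = &c; }
r = &b; z = &r;
if (...) /* 10% taken */ { y = x; }
*x = &a;
[Figure: probabilistic points-to graph over nodes x, y, z, r, s, t, a, b, c; edge labels 0.04, 0.6, 0.96, 0.4, 0.4, 0.16, 0.4, 0.24, 0.6, 0.6, 0.6]
- What is the probability that y points to a?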
21A Probabilistic Points-To Matrix
22My PPA Algorithm
- PPA Algorithm Goal
- For each program point generate a probabilistic
points-to graph that specifies, for each pointer,
the set of probabilities that it points to each
location.
23Definition Probability Analysis
- Let <p,v> denote a points-to relationship from a pointer p to a location v.
- At every static program point s there exists a probability function P(s, <p,v>) that denotes the probability that p points to v during dynamic program execution.
P(s, <p,v>) = D(s, <p,v>) / D(s)
- Where D(s) is the number of times s is (expected to be) dynamically visited and D(s, <p,v>) is the number of times that the points-to relation <p,v> dynamically holds.
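The definition above can be illustrated with a small sketch that estimates P(s, <p,v>) from a recorded trace. The trace format (one dictionary per dynamic visit of s) and the function name are assumptions made for this illustration and do not come from the slides.

```python
from collections import Counter

def points_to_probability(visits):
    """Estimate P(s, <p,v>) = D(s, <p,v>) / D(s) for one program point s.

    visits: one dict {pointer: location} per dynamic visit of s
    (a hypothetical trace format chosen for this sketch).
    """
    d_s = len(visits)  # D(s): number of dynamic visits of s
    if d_s == 0:
        return {}
    # D(s, <p,v>): number of visits in which p points to v
    d_rel = Counter((p, v) for snapshot in visits for p, v in snapshot.items())
    return {rel: count / d_s for rel, count in d_rel.items()}

# p points to a on three visits and to b on one visit.
trace = [{"p": "a"}, {"p": "a"}, {"p": "a"}, {"p": "b"}]
probs = points_to_probability(trace)
print(probs)
```

A static analysis has to estimate these ratios without running the program; the sketch only shows what quantity is being approximated.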
24Conservative Probability
- My algorithm is a conservative may-alias analysis.
- A probability of 0.0, P(s, <p,v>) = 0.0, indicates that the points-to relation <p,v> will never hold.
- The converse is not true.
- A probability of 1.0, P(s, <p,v>) = 1.0, indicates that the points-to relation <p,v> will always hold.
- The converse is not necessarily true: a dynamic points-to relationship <p,v> that always exists may not be reported with a probability of 1.0.
25Location Sets
- Each node in the graph is implemented with a location set, which is a triple of the form <name, offset, stride> consisting of a variable name that describes the memory block, an offset within that block, and a stride that characterizes the recurring structure of data vectors (in bytes).
struct ds { int e, f, g; };
int a;
struct ds b;
int c[100];
struct ds d[100];
Aggregate modeling: arrays aggregated, structs separated
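As a rough illustration of the <name, offset, stride> triple, the sketch below maps the declarations on this slide to location sets. It assumes a 4-byte int and the usual location-set convention (stride 0 for non-array storage, element size as the stride for arrays); the helper names are hypothetical.

```python
from collections import namedtuple

LocationSet = namedtuple("LocationSet", ["name", "offset", "stride"])

INT_SIZE = 4  # assumption: 4-byte int
DS_FIELDS = {"e": 0, "f": INT_SIZE, "g": 2 * INT_SIZE}  # field offsets in struct ds
DS_SIZE = 3 * INT_SIZE

def scalar(name):
    """A scalar variable: offset 0, no recurring structure."""
    return LocationSet(name, 0, 0)

def struct_field(name, field):
    """Structs separated: each field gets its own offset."""
    return LocationSet(name, DS_FIELDS[field], 0)

def array_element(name, elem_size):
    """Arrays aggregated: all elements share one set, stride = element size."""
    return LocationSet(name, 0, elem_size)

def array_struct_field(name, field):
    """A field of any element in an array of struct ds."""
    return LocationSet(name, DS_FIELDS[field], DS_SIZE)

print(scalar("a"))                   # int a
print(struct_field("b", "f"))        # b.f
print(array_element("c", INT_SIZE))  # c[i]
print(array_struct_field("d", "f"))  # d[i].f
```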
26Special Location Sets
- Each dynamic memory allocation site has its own name. E.g., the location set that represents a field f in a structure dynamically allocated at site s is <s, f, 0>.
- Additional location sets
- UND: undefined
- UNK: unknown
- NULL: C null
27Basic Pointer Assignment Transformations
- Ignoring pointer arithmetic and casting for now.
28PPA
- Let X_s represent the probabilistic points-to graph/matrix at a specific program point s.
X_IN -> basic pointer assignment instruction -> X_OUT
- Claim: there exists a transformation function T_i(X) for every instruction i, such that X_OUT = T_i(X_IN).
29Linear Transformations
- A transformation T(X) is linear iff the following relationships hold for all points-to matrices U and V and every scalar c:
- T(U + V) = T(U) + T(V)
- T(cU) = cT(U)
- If T_B and T_A are linear transformations represented by the matrices B and A respectively, then
- T_B(T_A(X)) = BAX
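The two linearity conditions and the composition rule can be checked numerically for transformations of the form T(X) = AX. The matrices below are arbitrary values picked for this sketch, and exact fractions are used so that equality tests are not affected by rounding.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

# Arbitrary example matrices (assumptions for illustration only).
A = [[F(1, 2), F(1, 2)], [F(0), F(1)]]
B = [[F(0), F(1)], [F(0), F(1)]]
U = [[F(1), F(0)], [F(0), F(1)]]
V = [[F(1, 4), F(3, 4)], [F(0), F(1)]]
c = F(1, 3)

def T_A(X):
    return matmul(A, X)

def T_B(X):
    return matmul(B, X)

additive = T_A(add(U, V)) == add(T_A(U), T_A(V))      # T(U+V) = T(U) + T(V)
homogeneous = T_A(scale(c, U)) == scale(c, T_A(U))    # T(cU) = cT(U)
composed = T_B(T_A(U)) == matmul(matmul(B, A), U)     # T_B(T_A(X)) = BAX
print(additive, homogeneous, composed)
```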
30Linear Points-To Representation
- A points-to matrix is used to represent the points-to graph.
- Matrix row/column labeling
- Location sets are denoted with L<id>
- Pointers are denoted with P<id>
- Rules for linearity
- Pointers can only point to location sets
- Location sets always point to themselves with probability 1.0
- All rows sum to 1.0
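The three rules above can be expressed as a small validity check. This is only a sketch: it assumes a square matrix whose rows and columns share one label list, and it treats any label starting with "P" as a pointer and everything else as a location set.

```python
def check_points_to_matrix(X, labels, tol=1e-9):
    """Return True if X obeys the three linearity rules listed above."""
    for i, row_label in enumerate(labels):
        row = X[i]
        # Rule: all rows sum to 1.0
        if abs(sum(row) - 1.0) > tol:
            return False
        if row_label.startswith("P"):
            # Rule: pointers can only point to location sets
            if any(row[j] > tol for j, col in enumerate(labels)
                   if col.startswith("P")):
                return False
        else:
            # Rule: location sets point to themselves with probability 1.0
            if abs(row[i] - 1.0) > tol:
                return False
    return True

labels = ["L3", "L4", "P1"]
good = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.6, 0.4, 0.0]]   # P1 -> L3 (0.6), P1 -> L4 (0.4)
bad = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.5, 0.0, 0.5]]    # P1 points to a pointer column
print(check_points_to_matrix(good, labels), check_points_to_matrix(bad, labels))
```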
31Linear Points-To Representation
int *a;   /* L1, P1 */
int *b;   /* L2, P2 */
int x[N]; /* L3 */
int y[N]; /* L4 */
int *tmp; /* L5, P5 */
(int *)calloc(N, sizeof(int)); /* L6 */
32Linear Points-To Representation
int *a;   /* L1, P1 */
int *b;   /* L2, P2 */
int x[N]; /* L3 */
int y[N]; /* L4 */
int *tmp; /* L5, P5 */
(int *)calloc(N, sizeof(int)); /* L6 */
[Figure: points-to graph with target nodes UND, NULL, UNK, L3, L4, L6]
33Points-To Matrix
int *a;   /* L1, P1 */
int *b;   /* L2, P2 */
int x[N]; /* L3 */
int y[N]; /* L4 */
int *tmp; /* L5, P5 */
(int *)calloc(N, sizeof(int)); /* L6 */
34Points-To Matrix Properties
[Figure: block structure of the points-to matrix, with blocks labeled Ø, I, and Ø]
35The Transformation Matrix
- For every basic pointer assignment there exists a linear transformation matrix T such that
- X_OUT = T X_IN
X_IN -> basic pointer assignment instruction -> X_OUT
36The Pointer Assignment Operation
% MATLAB code
% PPA_ptra: Probabilistic Pointer Analysis pointer assignment function
% Returns the PPA ptr assignment transformation matrix
function T = PPA_ptra(ptr, loc, N)
T = eye(N);
T(ptr,ptr) = 0.0;
T(ptr,loc) = 1.0;
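For readers without MATLAB, the function above can be ported to plain Python as a sketch. Indices are 0-based here, and the ordering of the 12 row/column labels is an assumption made for this illustration, not something the slides specify.

```python
def identity(n):
    """Return an n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def ppa_ptra(ptr, loc, n):
    """Transformation matrix for the assignment ptr -> loc (0-based indices)."""
    T = identity(n)
    T[ptr][ptr] = 0.0  # the pointer drops its previous row
    T[ptr][loc] = 1.0  # and copies the row of the assigned source
    return T

# Hypothetical index layout for the 12 rows/columns (an assumption).
LABELS = ["L1", "L2", "L3", "L4", "L5", "L6",
          "UND", "NULL", "UNK", "P1", "P2", "P5"]
IDX = {name: i for i, name in enumerate(LABELS)}

# tmp = a, i.e. P5 -> P1
T_s1 = ppa_ptra(IDX["P5"], IDX["P1"], len(LABELS))
print(T_s1[IDX["P5"]])
```

Every row other than the assigned pointer's row stays an identity row, so all other pointers and location sets are left unchanged by the instruction.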
37Pointer Assignment Example
int *a;   /* L1, P1 */
int *b;   /* L2, P2 */
int x[N]; /* L3 */
int y[N]; /* L4 */
int *tmp; /* L5, P5 */
tmp = a;
S1: P5 -> P1
T(P5->P1) = T_S1 = eye(12); T_S1(P5,P5) = 0.0; T_S1(P5,P1) = 1.0;
38Pointer Assignment Example
int *a;   /* L1, P1 */
int *b;   /* L2, P2 */
int x[N]; /* L3 */
int y[N]; /* L4 */
int *tmp; /* L5, P5 */
a = x;
S2: P1 -> L3
T(P1->L3) = T_S2 = eye(12); T_S2(P1,P1) = 0.0; T_S2(P1,L3) = 1.0;
39Combining Transformation Matrices
X_IN -> T1 (basic pointer assignment instruction) -> T2 (basic pointer assignment instruction) -> X_OUT
40Combining Pointer Assignment Example
int *a;   /* L1, P1 */
int *b;   /* L2, P2 */
int x[N]; /* L3 */
int y[N]; /* L4 */
int *tmp; /* L5, P5 */
void swap() { tmp = a; a = b; b = tmp; }
S1: P5 -> P1; S2: P1 -> P2; S3: P2 -> P5
T_swap = T_S3 T_S2 T_S1
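The composition T_swap = T_S3 T_S2 T_S1 can be tried out on a concrete input. The sketch below assumes a particular ordering of the 12 labels and assumes that pointers start out pointing to UND; both choices are made only for illustration.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def assign(ptr, src, n):
    """Transformation matrix for 'ptr = src' (same idea as PPA_ptra)."""
    T = identity(n)
    T[ptr][ptr] = 0.0
    T[ptr][src] = 1.0
    return T

# Hypothetical index layout (an assumption, not from the slides).
LABELS = ["L1", "L2", "L3", "L4", "L5", "L6",
          "UND", "NULL", "UNK", "P1", "P2", "P5"]
IDX = {name: i for i, name in enumerate(LABELS)}
N = len(LABELS)

# Start state: location sets point to themselves, pointers to UND.
X0 = identity(N)
for p in ("P1", "P2", "P5"):
    X0[IDX[p]] = [0.0] * N
    X0[IDX[p]][IDX["UND"]] = 1.0

# a = x; b = y  (P1 -> L3, P2 -> L4)
X_in = matmul(assign(IDX["P2"], IDX["L4"], N),
              matmul(assign(IDX["P1"], IDX["L3"], N), X0))

T_s1 = assign(IDX["P5"], IDX["P1"], N)  # tmp = a
T_s2 = assign(IDX["P1"], IDX["P2"], N)  # a = b
T_s3 = assign(IDX["P2"], IDX["P5"], N)  # b = tmp
T_swap = matmul(T_s3, matmul(T_s2, T_s1))

X_out = matmul(T_swap, X_in)
print(X_out[IDX["P1"]][IDX["L4"]], X_out[IDX["P2"]][IDX["L3"]])
```

Note the order of the product: the first instruction executed (S1) is the rightmost factor, because each matrix is applied to the result of the one on its right.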
41Combining Pointer Assignment Example
Tswap
42Combining Pointer Assignment Example
T_swap
[Figure: points-to matrix example for T_swap; extracted entries: 0.9, 0.3, 0.9, 0.1, 0.7, 0.7, 0.1, 0.3, 0.9, 0.1]
43Control flow and loops
- Loops are found and back edges are labeled with their back edge count (assume all loops have a constant trip count for now).
- Denoted with a capital letter
- All other edges are labeled with their basic block fan-in probability, which sums to 1.
- Denoted with a lowercase letter
44The Effect of Control Flow
45The Effect of Control Flow
46The Effect of Control Flow
47Example
int *a;   /* L1, P1 */
int *b;   /* L2, P2 */
int x[N]; /* L3 */
int y[N]; /* L4 */
int *tmp; /* L5, P5 */
void might_alias() { if(!RANDOM(10)) a = b; }
BB1: if() /* 0.1 */
BB2:   S1: P1 -> P2
     fi
BB3:
T_might_alias = T_BB3 (0.1 T_BB2 T_BB1 + 0.9 T_BB1)
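The weighted sum for might_alias can be evaluated on a tiny example. This sketch assumes that BB1 and BB3 contain no pointer assignments (so their matrices are the identity) and uses a reduced four-label layout; both are simplifications for illustration.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def assign(ptr, src, n):
    T = identity(n)
    T[ptr][ptr] = 0.0
    T[ptr][src] = 1.0
    return T

# Reduced, hypothetical layout: two location sets and two pointers.
LABELS = ["L3", "L4", "P1", "P2"]
IDX = {name: i for i, name in enumerate(LABELS)}
N = len(LABELS)

T_bb1 = identity(N)                       # assumption: no pointer assignments
T_bb2 = assign(IDX["P1"], IDX["P2"], N)   # a = b, taken with probability 0.1
T_bb3 = identity(N)                       # assumption: no pointer assignments

T_might_alias = matmul(
    T_bb3,
    add(scale(0.1, matmul(T_bb2, T_bb1)), scale(0.9, T_bb1)))

# Input state: a -> L3, b -> L4.
X_in = identity(N)
X_in[IDX["P1"]] = [1.0, 0.0, 0.0, 0.0]
X_in[IDX["P2"]] = [0.0, 1.0, 0.0, 0.0]

X_out = matmul(T_might_alias, X_in)
print(X_out[IDX["P1"]])
```

After the call, a still points to L3 with probability 0.9 and points to L4 with probability 0.1, while b is unchanged.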
48Loops Constant Trip Count
[Figure: loop with a constant trip count; diagram labels A, N, M, B; extracted formula fragment: TB TAN1 M1]
49Loop Transformation types
- Identity
- Converges
- Periodic
- Converges and Periodic
for(i=0;i<N;i++) swap();
for(i=0;i<N;i++) { if(RANDOM(10)) a = b; swap(); }
for(i=0;i<N;i++) if(!RANDOM(10)) a = b;
If Odd
If Even
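Two of the loop behaviours listed above can be observed by raising a loop-body matrix to a power. The sketch below restricts the matrices to the three pointers (P1, P2, P5), which is an assumption that works here because both loop bodies only copy pointers to pointers.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def assign(ptr, src, n):
    T = identity(n)
    T[ptr][ptr] = 0.0
    T[ptr][src] = 1.0
    return T

def matpow(T, n):
    """T applied n times (n >= 0)."""
    result = identity(len(T))
    for _ in range(n):
        result = matmul(T, result)
    return result

P1, P2, P5 = 0, 1, 2  # hypothetical indices for a, b, tmp

# Periodic: swap() repeated. Odd and even trip counts give different results.
T_swap = matmul(assign(P2, P5, 3), matmul(assign(P1, P2, 3), assign(P5, P1, 3)))
swap_odd = matpow(T_swap, 3)
swap_even = matpow(T_swap, 2)

# Converges: 'if(!RANDOM(10)) a = b' repeated.
T_might = add(scale(0.1, assign(P1, P2, 3)), scale(0.9, identity(3)))
might_50 = matpow(T_might, 50)

print(swap_odd == T_swap, swap_even[P1], might_50[P1])
```

For the swap loop the rows for a and b alternate between two states, while for the conditional assignment the row for a drifts toward the row for b as the trip count grows.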
50Loops Non-Constant Trip Count
Geometric Series Transform: the gstr operation
gstr(T_A, 0, N)
51The Characterization Matrix
X_IN
for(i=0;i<N-1;i++) { if(!RANDOM(10000)) swap(); b = a; a[i] = x[i+1]; }
X_char = T_loop_inner x gstr(T_loop_inner, 0, N-1)
X_OUT
X_OUT = T_loop_inner^(N-1)
52Example Results
int *a;   /* L1, P1 */
int *b;   /* L2, P2 */
int x[N]; /* L3 */
int y[N]; /* L4 */
int *tmp; /* L5, P5 */
PTM
void swap() { tmp = a; a = b; b = tmp; return; }
void might_alias() { if(!RANDOM(10)) /* 10% taken */ a = b; }
53Example Results
int main() {
    a = x; b = y;
    swap();
    might_alias();
    b = (int *)calloc(N, sizeof(int)); /* L6 */
    for(i=0;i<RANDOM(9);i++) {
        if(RANDOM(10)) { /* 90% taken */
            b = x; a = y;
        } else {
            for(j=0;j<3;j++)
                swap();
        }
    }
    /* does a point to x */
    for(i=0;i<N-1;i++) {
        if(!RANDOM(10000)) /* 0.01% taken */
            swap();
        b = a;
        a[i] = x[i+1];
        /* assume lots of other work as well... */
    }
    return 0;
}
54Example Results
int main() {
    a = x; b = y;
    swap();
    might_alias();
    b = (int *)calloc(N, sizeof(int)); /* L6 */
    for(i=0;i<RANDOM(9);i++) {
        if(RANDOM(10)) { /* 90% taken */
            b = x; a = y;
        } else {
            for(j=0;j<3;j++)
                swap();
        }
    }
    /* ... */
}
PTM - Matlab
PTM - Simulation
57Example Results
PTM - Matlab
/* does a point to x */
for(i=0;i<N-1;i++) {
    if(!RANDOM(10000)) /* 0.01% taken */
        swap();
    b = a;
    a[i] = x[i+1];
    /* assume lots of other work as well... */
}
return 0;
PTM - Simulation
58Example Results
PTM-CHA - Matlab
/* does a point to x */
for(i=0;i<N-1;i++) {
    if(!RANDOM(10000)) /* 0.01% taken */
        swap();
    b = a;
    a[i] = x[i+1];
    /* assume lots of other work as well... */
}
return 0;
PTM-CHA - Simulation
59Future Work
- Currently in the early stages of implementing for SUIF.
- What happens when double/multiple pointers are introduced?
- How to deal with casting, irregular control flow, pointer arithmetic, unknown/library functions?
- When to propagate the unknown (UNK) location set?
- Optimizing for time complexity and space.
- How to best utilize PPA for TLS and/or other compiler optimizations?
- Runtime parameters.
- What is the relationship between data and control?
60The End
- Any Comments or Questions?