Title: Probabilistic Pointer Analysis (PPA)
1Probabilistic Pointer Analysis (PPA)
- Presented by Jeff DaSilva
CARG April 12, 2005
2The Pointer Alias Analysis Problem
A B
- Statically decide, for any pair of pointers at
any point in the program, whether the two pointers
point to the same memory location.
- This problem is known to be undecidable [Landi 1992].
3The Compiler Writer vs. The Programming Language
Designer
- Pointers are needed by programmers to realize
complex data structures
- Pointers can make life difficult for optimizing
compilers
4Concluding Remarks (from last time)
- Traditional pointer analysis techniques are
either overly conservative or so complex that
they fail to scale with respect to code size
- Examples include address-taken, Andersen's
analysis, Steensgaard, Emami
- Pointer analysis is a very difficult problem that
may never be adequately solved.
- Does hardware support for data speculation make
the analysis easier for the compiler?
5Support for Data Speculation Exists
- EPIC instruction sets
- Uses explicit load/store speculation
- Thread Level Speculation (TLS)
- Speculative Compiler Optimizations
- Speculative PRE, register promotion and strength
reduction
- Speculative behavioral synthesis
- Why are these techniques not more widely used?
One reason: currently, they rely on data
dependence profiling.
6Example: Speculative Compiler Optimization
7Example: Thread Level Speculation (TLS)
- Can this loop be parallelized?
- To take advantage of data speculation support,
a probability metric and a cost function are
required.
8Probabilistic Pointer Analysis (PPA)
A B
- Statically decide for any pair of pointers, at
any point in the program, whether two pointers
point to the same memory location.
- Statically estimate for any pair of pointers, at
any point in the program, the probability that
two pointers point to the same memory location.
- This problem is known to be undecidable [Landi 1992].
- Isn't this problem even worse?
9Outline
- PPA Objectives
- Probabilistic Pointer Analysis (PPA) Theory
- The Probabilistic Points-To Graph
- The Points-To Matrix
- The Transformation Matrix
- An Example
- Some Preliminary Results
10How is Pointer Analysis used?
[Flow diagram: Source Code -> Pointer Analysis -> Executable]
11Traditional Points-To Graph
int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
12Traditional Points-To Graph
int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
x = &r; y = &s; z = &t;
r = &a; s = &b; *z = &c;
13Traditional Points-To Graph
int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
x = &r; y = &s; z = &t;
r = &a; s = &b; *z = &c;
if (...) { x = &s; s = &c; }
14Traditional Points-To Graph
int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
x = &r; y = &s; z = &t;
r = &a; s = &b; *z = &c;
if (...) { x = &s; s = &c; }
r = &b; z = &r;
15Traditional Points-To Graph
int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
x = &r; y = &s; z = &t;
r = &a; s = &b; *z = &c;
if (...) { x = &s; s = &c; }
r = &b; z = &r;
if (...) y = x;
16Traditional Points-To Graph
int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
x = &r; y = &s; z = &t;
r = &a; s = &b; *z = &c;
if (...) { x = &s; s = &c; }
r = &b; z = &r;
if (...) y = x;
*x = &a;
17How is PPA used?
[Flow diagram: Source Code -> Probabilistic Pointer Analysis -> Speculative Executable with Data Speculation Support; Control Flow Edge Profiling (Optional) is an additional input to the analysis]
18Definition Probability Analysis
- Let <p,v> denote a points-to relationship from a
pointer p to a location v.
- At every static program point s there exists a
probability function P(s, <p,v>) that denotes the
probability that p points to v during dynamic
program execution.
P(s, <p,v>) = D(s, <p,v>) / D(s)
- Where D(s) is the number of times s is (expected
to be) dynamically visited and D(s, <p,v>) is the
number of times that the points-to relation <p,v>
dynamically holds.
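As a rough illustration of the ratio D(s, <p,v>) / D(s), the Python sketch below counts dynamic visits over a recorded trace. The trace format and the function name points_to_probability are hypothetical, chosen only for this example; PPA itself aims to estimate these values statically rather than by tracing.

```python
def points_to_probability(trace, s, p, v):
    """Return D(s, <p,v>) / D(s) for a recorded execution trace.

    trace is a list of (program_point, env) pairs, where env maps each
    pointer name to the location it points to at that visit
    (a hypothetical format, assumed for this sketch).
    """
    visits = 0  # D(s): number of dynamic visits to s
    holds = 0   # D(s, <p,v>): visits where p points to v
    for point, env in trace:
        if point != s:
            continue
        visits += 1
        if env.get(p) == v:
            holds += 1
    if visits == 0:
        return None  # s was never visited, so the ratio is undefined
    return holds / visits


trace = [
    ("s1", {"p": "a"}),
    ("s1", {"p": "b"}),
    ("s2", {"p": "a"}),
    ("s1", {"p": "a"}),
    ("s1", {"p": "a"}),
]
prob = points_to_probability(trace, "s1", "p", "a")
print(prob)  # 0.75: p points to a on 3 of the 4 visits to s1
```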
19Algorithm should output a Safe, Conservative "may
alias" Probability
- A probability of 0.0, P(s, <p,v>) = 0.0, indicates
that a points-to relation <p,v> will never hold.
- The converse is not necessarily true.
- A probability of 1.0, P(s, <p,v>) = 1.0, indicates
that a points-to relation <p,v> will always hold.
- The converse is also not necessarily true: a
dynamic points-to relationship <p,v> that always
exists may not be reported with a probability of
1.0.
20Probabilistic Points-To Graph
int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
x = &r; y = &s; z = &t;
r = &a; s = &b; *z = &c;
[Figure: points-to graph; all six edges labeled 1.0]
21Probabilistic Points-To Graph
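int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
x = &r; y = &s; z = &t;
r = &a; s = &b; *z = &c;
if (...) /* 60% taken */ { x = &s; s = &c; }
[Figure: points-to graph; edge labels as extracted: 0.4, 0.6, 0.6, 0.4]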
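The 0.4 / 0.6 edge labels are consistent with blending the points-to sets on the two sides of the branch by the branch probability. A minimal Python sketch, assuming the branch body assigns x = &s and s = &c and is taken 60% of the time (merge_branch is a hypothetical helper name, not something from the talk):

```python
def merge_branch(taken, not_taken, p_taken):
    """Blend two points-to distributions (target -> probability)
    by the probability that the branch is taken."""
    targets = set(taken) | set(not_taken)
    return {
        t: p_taken * taken.get(t, 0.0) + (1.0 - p_taken) * not_taken.get(t, 0.0)
        for t in targets
    }


# Before the branch: x -> r and s -> b, each with probability 1.0.
# Inside the branch (assumed 60% taken): x = &s; s = &c;
x_after = merge_branch({"s": 1.0}, {"r": 1.0}, 0.6)
s_after = merge_branch({"c": 1.0}, {"b": 1.0}, 0.6)

print(sorted(x_after.items()))  # x -> r about 0.4, x -> s about 0.6
print(sorted(s_after.items()))  # s -> b about 0.4, s -> c about 0.6
```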
22Probabilistic Points-To Graph
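int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
x = &r; y = &s; z = &t;
r = &a; s = &b; *z = &c;
if (...) /* 60% taken */ { x = &s; s = &c; }
r = &b; z = &r;
[Figure: points-to graph; edge labels as extracted: 0.6, 0.4, 0.6, 0.4]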
23Probabilistic Points-To Graph
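int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
x = &r; y = &s; z = &t;
r = &a; s = &b; *z = &c;
if (...) /* 60% taken */ { x = &s; s = &c; }
r = &b; z = &r;
if (...) /* 10% taken */ y = x;
[Figure: points-to graph; edge labels as extracted: 0.04, 0.96, 0.6, 0.4, 0.6, 0.4]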
24Probabilistic Points-To Graph
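int a, b, c;
int *r, *s, *t;
int **x, **y, **z;
x = &r; y = &s; z = &t;
r = &a; s = &b; *z = &c;
if (...) /* 60% taken */ { x = &s; s = &c; }
r = &b; z = &r;
if (...) /* 10% taken */ y = x;
*x = &a;
[Figure: probabilistic points-to graph with nodes x, y, z (top), r, s, t (middle), a, b, c (bottom); edge labels as extracted: 0.04, 0.6, 0.96, 0.4, 0.4, 0.16, 0.4, 0.24, 0.6, 0.6, 0.6]
- What is the probability that y points to a?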
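One way to approach the question is sketched below in Python. It rests on several assumptions that are not stated in the extracted slide text: that the last two statements are a 10%-taken y = x followed by *x = &a, that edge probabilities are treated as independent, and that "y points to a" is read as a two-level dereference (y to r or s, then to a). The helper name mix is hypothetical.

```python
def mix(new, old, p_new):
    """Weighted blend of two points-to distributions (target -> probability)."""
    targets = set(new) | set(old)
    return {
        t: p_new * new.get(t, 0.0) + (1.0 - p_new) * old.get(t, 0.0)
        for t in targets
    }


# Assumed state just before the last two statements:
# x -> r 0.4 / s 0.6, y -> s, r -> b, s -> b 0.4 / c 0.6.
pts = {
    "x": {"r": 0.4, "s": 0.6},
    "y": {"s": 1.0},
    "r": {"b": 1.0},
    "s": {"b": 0.4, "c": 0.6},
}

# if (...) /* 10% taken */ y = x;
pts["y"] = mix(pts["x"], pts["y"], 0.1)

# *x = &a; each possible target of x is overwritten with probability P(x -> target)
for target, p in list(pts["x"].items()):
    pts[target] = mix({"a": 1.0}, pts[target], p)

# Two-level reading of "y points to a": sum over the intermediate pointer.
p_y_a = sum(p * pts[mid].get("a", 0.0) for mid, p in pts["y"].items())
print(round(p_y_a, 3))  # 0.592
```

Under these assumptions the intermediate values reproduce the 0.04 / 0.96 and 0.16 / 0.24 labels in the figure. Because the independence assumption ignores the correlation between y = x and *x = &a, the result is an estimate rather than an exact dynamic frequency.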
25Our PPA Algorithm Objectives
- An Interprocedural, Flow Sensitive, Context
Sensitive approach that uses Linear Transfer
Functions.
- Must be scalable in time and space.
- Provides an approximate probability for the
"maybe" output.
26Accuracy/Efficiency Tradeoff
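[Figure: spectrum of pointer analyses, from most accurate to most efficient]
Accurate end:
- Doubly exponential
- Accurate: very few "maybe" outputs
- Does not scale
Approaches placed along the spectrum:
- BDD based
- Chen, et al. (only other PPA)
- Address-taken
- Andersen
- SPAN
- Steensgaard
- Emami
Efficient end:
- Linear time complexity
- Inaccurate: many false "maybe" outputs
- Memory required: negligible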
27Encoding the Probabilistic Points-To Graph: The
Points-To Matrix
- The Probabilistic Points-To graph is encoded
using a sparse Markov matrix
- All elements are real numbers in the closed
interval [0, 1]
- All rows sum to 1.0
- Each pointer set and location set is given a
unique id representing its matrix row/column
- Rules for linearity
- Pointers can only point to location sets
- Location sets always point to themselves with
probability 1.0
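The encoding rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a dense list of lists rather than a sparse matrix, and the names build_points_to_matrix and is_markov are hypothetical.

```python
def build_points_to_matrix(pointers, locations, edges):
    """Build a points-to matrix with pointer sets first, location sets last.

    edges maps (pointer, location) -> probability.
    """
    names = list(pointers) + list(locations)
    ids = {name: i for i, name in enumerate(names)}  # unique row/column id
    n = len(names)
    X = [[0.0] * n for _ in range(n)]
    for loc in locations:
        X[ids[loc]][ids[loc]] = 1.0  # location sets point to themselves
    for (p, v), prob in edges.items():
        if p not in pointers or v not in locations:
            raise ValueError("pointers may only point to location sets")
        X[ids[p]][ids[v]] = prob
    return X, ids


def is_markov(X, tol=1e-9):
    """Check that every entry lies in [0, 1] and every row sums to 1.0."""
    return all(
        all(0.0 <= e <= 1.0 for e in row) and abs(sum(row) - 1.0) <= tol
        for row in X
    )


pointers = ["p", "q"]
locations = ["a", "b", "c"]
edges = {("p", "a"): 0.4, ("p", "b"): 0.6, ("q", "c"): 1.0}
X, ids = build_points_to_matrix(pointers, locations, edges)
print(is_markov(X))  # True
```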
28Points-To Matrix Structure
Rows and columns are both indexed 1..N, pointer sets first and location sets last:

                Pointer Sets    Location Sets
Pointer Sets         ø          Area Of Interest
Location Sets        ø                 I
29Points-To Matrix Example
30Another Points-To Matrix Example
31What about double pointers?
32Basic Pointer Assignments
33Transforming the points-to matrix
- Let X_s represent the probabilistic points-to
matrix at a specific program point s.
X_IN -> [Basic pointer assignment instruction] -> X_OUT
- Claim: There exists a transformation function
T(X) for every instruction i, such that X_OUT =
T_i(X_IN).
34The fundamental PPA Equation
X_OUT = T × X_IN
(Points-To Matrix Out = Transformation Matrix × Points-To Matrix In)
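A small Python sketch of this equation, assuming the transformation matrix multiplies the incoming points-to matrix from the left and that row/column ids 0-1 are pointer sets p1, p2 and ids 2-4 are location sets L1, L2, L3. The matrices and the matmul helper are illustrative choices, not taken verbatim from the slides.

```python
def matmul(A, B):
    """Plain row-by-column matrix product for lists of lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# ids: 0 = p1, 1 = p2 (pointer sets); 2 = L1, 3 = L2, 4 = L3 (location sets)
X_in = [
    [0.0, 0.0, 0.0, 0.0, 1.0],  # p1 -> L3
    [0.0, 0.0, 0.0, 0.0, 1.0],  # p2 -> L3
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

# Transformation for "p1 = &L1": row p1 selects the L1 row, others unchanged.
T_addr = [
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

# Transformation for "p2 = p1": row p2 selects the p1 row.
T_copy = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

X_mid = matmul(T_addr, X_in)   # p1 now points to L1, p2 still points to L3
X_out = matmul(T_copy, X_mid)  # p2 now points wherever p1 points
print(X_mid[0], X_mid[1])
print(X_out[1])
```

Left multiplication is what keeps each pointer row a probability distribution over location sets here, since each row of T picks a weighted combination of rows of X_IN.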
35Transformation Matrix Structure
Rows and columns are both indexed 1..N, pointer sets first and location sets last:

                Pointer Sets    Location Sets
Pointer Sets          Area of Interest
Location Sets        ø                 I
36Example
37Example
0.0  0.0  0.0  0.0  1.0
0.0  0.0  0.0  0.0  1.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0

0.0  0.0  1.0  0.0  0.0
0.0  1.0  0.0  0.0  0.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0
38Example
0.0  0.0  0.0  0.0  1.0
0.0  0.0  0.0  0.0  1.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0

0.0  0.0  1.0  0.0  0.0
0.0  1.0  0.0  0.0  0.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0
39Combining Transformation Matrices
XIN
T1 Basic pointer assignment instruction
T2 Basic pointer assignment instruction
XOUT
40Combining Transformation Matrices Example
0.0  0.0  1.0  0.0  0.0
0.0  1.0  0.0  0.0  0.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0
41Combining Transformation Matrices Example
0.0  0.0  1.0  0.0  0.0
0.0  1.0  0.0  0.0  0.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0

1.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0
42Combining Transformation Matrices Example
0.0  0.0  1.0  0.0  0.0
0.0  1.0  0.0  0.0  0.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0

1.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0

0.001  0.01   0.989  0.0    0.0
0.0    0.001  0.999  0.0    0.0
0.0    0.0    1.0    0.0    0.0
0.0    0.0    0.0    1.0    0.0
0.0    0.0    0.0    0.0    1.0
43Combining Transformation Matrices Example
0.0  0.0  1.0  0.0  0.0
0.0  1.0  0.0  0.0  0.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0

1.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0

0.001  0.01   0.989  0.0    0.0
0.0    0.001  0.999  0.0    0.0
0.0    0.0    1.0    0.0    0.0
0.0    0.0    0.0    1.0    0.0
0.0    0.0    0.0    0.0    1.0

0.0  0.0  0.99   0.01   0.0
0.0  0.0  0.999  0.001  0.0
0.0  0.0  1.0    0.0    0.0
0.0  0.0  0.0    1.0    0.0
0.0  0.0  0.0    0.0    1.0
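The four matrices on this slide can be checked against each other numerically. The Python sketch below assumes the extracted numbers form 5x5 matrices read row by row, and that the instructions are combined by multiplying later transformations on the left (T3 × T2 × T1); both are assumptions that happen to reproduce the last matrix, not something the extracted text states. The names T1, T2, T3 and matmul are labels chosen for this example.

```python
def matmul(A, B):
    """Plain row-by-column matrix product for lists of lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Identity rows for the three location sets (ids 2, 3, 4).
LOC_ROWS = [
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

T1 = [[0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0]] + LOC_ROWS
T2 = [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 0.0]] + LOC_ROWS
T3 = [[0.001, 0.01, 0.989, 0.0, 0.0], [0.0, 0.001, 0.999, 0.0, 0.0]] + LOC_ROWS

# Fourth matrix on the slide, used here as the expected combined result.
expected = [[0.0, 0.0, 0.99, 0.01, 0.0], [0.0, 0.0, 0.999, 0.001, 0.0]] + LOC_ROWS

combined = matmul(T3, matmul(T2, T1))

matches = all(
    abs(combined[i][j] - expected[i][j]) < 1e-9
    for i in range(5)
    for j in range(5)
)
print(matches)  # True
```

Once combined, a single matrix stands in for the whole instruction sequence, so it can be applied to any incoming points-to matrix with one multiplication.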
44Combining Transformation Matrices Example
0.0  0.0  0.99   0.01   0.0
0.0  0.0  0.999  0.001  0.0
0.0  0.0  1.0    0.0    0.0
0.0  0.0  0.0    1.0    0.0
0.0  0.0  0.0    0.0    1.0
45Combining Transformation Matrices Example
0.0  0.0  0.0  0.0  1.0
0.0  0.0  0.0  0.0  1.0
0.0  0.0  1.0  0.0  0.0
0.0  0.0  0.0  1.0  0.0
0.0  0.0  0.0  0.0  1.0

0.0  0.0  0.99   0.01   0.0
0.0  0.0  0.999  0.001  0.0
0.0  0.0  1.0    0.0    0.0
0.0  0.0  0.0    1.0    0.0
0.0  0.0  0.0    0.0    1.0
46How to handle Control Flow?
[Figure: control-flow graph examples over basic blocks 1 to 4; extracted labels: 10, TF_foo, and edge probabilities 0.5 / 0.5]
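The figure itself did not survive extraction, so the following Python sketch is only one plausible reading of the 0.5 / 0.5 edge labels: the transformation matrices of the two arms of a branch are blended by their edge probabilities. The helper names blend and matmul, and the choice of arms ("p1 = &L1" versus no assignment), are assumptions made for illustration.

```python
def matmul(A, B):
    """Plain row-by-column matrix product for lists of lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def blend(T_a, T_b, p_a):
    """Weighted sum of two transformation matrices: p_a * T_a + (1 - p_a) * T_b."""
    return [
        [p_a * a + (1.0 - p_a) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(T_a, T_b)
    ]


identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

# ids: 0 = p1, 1 = p2 (pointer sets); 2 = L1, 3 = L2, 4 = L3 (location sets)
T_then = [row[:] for row in identity]
T_then[0] = [0.0, 0.0, 1.0, 0.0, 0.0]  # then-arm: p1 = &L1
T_else = identity                       # else-arm: no pointer assignment

T_if = blend(T_then, T_else, 0.5)       # each arm taken with probability 0.5

X_in = [row[:] for row in identity]
X_in[0] = [0.0, 0.0, 0.0, 0.0, 1.0]     # p1 -> L3
X_in[1] = [0.0, 0.0, 0.0, 0.0, 1.0]     # p2 -> L3

X_out = matmul(T_if, X_in)
print(X_out[0])  # p1 -> L1 with 0.5, p1 -> L3 with 0.5
```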
47PPA Infrastructure
[Block diagram; components as extracted:]
- SUIF Infrastructure
- .spd files
- .spx files
- PPA Results
- ICFG
- Points-To Matrix Propagator (TD)
- Edge Profile
- Abstract Memory Model (AMM)
- Transformation Matrix Collector (BU)
- Transfer Function Builder (Suif2TF)
- PPA Matrix Builder
- MATLAB C Library
- MATLAB Debugging Script
48Current Results
49Current Results
50The End
- Any Comments or Questions?
51References
- Peng-Sheng Chen, Ming-Yu Hung, Yuan-Shin Hwang,
Roy Dz-Ching Ju, Jenq Kuen Lee. Compiler support
for speculative multithreading architecture with
probabilistic points-to analysis. PPoPP 2003,
pp. 25-36.
- Only other PPA research group
- Jin Lin, Tong Chen, Wei-Chung Hsu, Peng-Chung
Yew, Roy Dz-Ching Ju, Tin-Fook Ngai and Sun Chan.
A Compiler Framework for Speculative Analysis
and Optimizations. In Proceedings of the ACM
SIGPLAN 2003 Conference on Programming Language
Design and Implementation, San Diego, California,
June 9-11, 2003, pp. 289-299.
- Roy Dz-Ching Ju. Probabilistic Memory
Disambiguation and its Application to Data
Speculation (1996).
- Probabilistic array subscript analysis
- Manel Fernandez and Roger Espasa. Speculative
Alias Analysis for Executable Code (2002).
52BACKUP SLIDES
53Does Probabilistic Aliasing Exist?
54Does Probabilistic Aliasing Exist?
55Does Probabilistic Aliasing Exist?
56Does Probabilistic Aliasing Exist?
57Does Probabilistic Aliasing Exist?
58Does Probabilistic Aliasing Exist?
59Probabilistic Dependence Matrix
[Figure caption: Compress95, ref input set; flow-sensitive
analysis; location-oriented DDA; 85x85 static
load/store pairs]
60Pointer Analysis Issues
- Scalability vs. Accuracy
- Generally, a difficult tradeoff exists between
- the amount of computation and memory required vs.
- the accuracy of the analysis.
- Precision/Efficiency tradeoff: where is the sweet
spot?
- Which metric should be used?
- Direct metric
- Report performance applied to an optimization
- Dynamically measure false positives
- Which benchmark suite?
- Are the results reproducible?
61Pointer Analysis Issues
- Complications associated with pointer arithmetic,
casting, function pointers, long jumps, and
multithreaded applications.
- Can these be ignored?
- Different pointer analysis uses have different
needs.
- A universal pointer analysis probably doesn't
exist.
62Optimizing Compilers 101
- Optimizing compilers must preserve program
correctness
- All compiler algorithms (code transformations)
are developed within this rule
- What if this rule was relaxed?
- If not bound by this rule of program correctness,
every optimization originally created within it
could be reevaluated.
- If given a framework for speculation recovery
(hardware or software), this rule becomes
flexible.
63Compiler Support for Speculation
- Control Speculation
- Executing instructions before determining that
they will execute in the normal flow of
execution.
- Existing compiler frameworks can adequately
incorporate and exploit control speculation using
control flow profiling or simple heuristic rules.
- Data Speculation
- Executing loads before potentially dependent
stores
- Very little has been done to effectively exploit
data speculation.
- Data dependence profiling is expensive
- No effective heuristics exist
64Pointer Analysis
- Pointer analysis is a critical compiler component
used to analyze programs written in C-like
programming languages, which utilize pointers and
pointer-based data structures
- It attempts to disambiguate indirect memory
references, so that subsequent compiler passes
have a more accurate view of program behaviour.
65How is Pointer Analysis used?
Parallelizing Compiler: Can the loop be
parallelized? (TLP)
void foo(int a, int b) for(i1 iltN i)
ai bi / 13 for(i1 iltN
i) a b 1
Guide Compiler Optimizations: load/store
redundancy elimination, register allocation,
CSE, dead code elimination, live variable,
instruction scheduling (e.g. VLIW for ILP), loop
invariant code motion, etc.
Behavioral Synthesis / Data Flow Processors:
Necessary for partitioning and instruction
scheduling.
Error Detection / Program Understanding:
Programmer can detect errors or discover poorly
written code.
66Pointer Analysis Design Choices
- Flow/Path sensitivity
- Context sensitivity
- Heap modeling
- Aggregate modeling
- Alias representation
- Whole Program
- Incremental Compilation
67How is Pointer Analysis used?
Parallelizing Compiler: Can the loop be
parallelized? (TLP)
void foo(int a, int b) for(i1 iltN
i) ai bi-1 for(i1 iltN i)
a b 1
Guide Compiler Optimizations: load/store
redundancy elimination, register allocation,
CSE, dead code elimination, live variable,
instruction scheduling (e.g. VLIW for ILP), loop
invariant code motion, etc.
Behavioral Synthesis / Data Flow Processors:
Necessary for partitioning and instruction
scheduling.
Error Detection / Program Understanding:
Programmer can detect errors or discover poorly
written code.
- Any more?
68Pointer Analysis Published by Year
- Haven't we solved this problem yet?