Title: Program Analysis via Graph Reachability
1. Program Analysis via Graph Reachability
- Thomas Reps
- University of Wisconsin
http://www.cs.wisc.edu/~reps/
See http://www.cs.wisc.edu/wpis/papers/tr1386.ps
2. PLDI '00 Registration Form
- PLDI '00: ____
- Tutorial (morning): ____
- Tutorial (afternoon): ____
- Tutorial (evening): 0
3. (Timeline figure: 1987, 1993, 1994, 1995, 1996, 1997, 1998)
4. Applications
- Program optimization
- Software engineering
- Program understanding
- Reengineering
- Static bug-finding
- Security (information flow)
5. Collaborators
- Susan Horwitz
- Mooly Sagiv
- Genevieve Rosay
- David Melski
- David Binkley
- Michael Benedikt
- Patrice Godefroid
6. Themes
- Harnessing CFL-reachability
- Exhaustive alg. ⇒ Demand alg.
7. Program Slicing
- The backward slice w.r.t. variable v at program point p: the program subset that may influence the value of variable v at point p.
- The forward slice w.r.t. variable v at program point p: the program subset that may be influenced by the value of variable v at point p.
8. Backward Slice
    #include <stdio.h>

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
9. Backward Slice
(Same program as slide 8.)
Backward slice with respect to printf("%d\n", i)
10. Slice Extraction
    #include <stdio.h>

    int main() {
        int i = 1;
        while (i < 11) {
            i = i + 1;
        }
        printf("%d\n", i);
    }
Backward slice with respect to printf("%d\n", i)
11. Forward Slice
(Same program as slide 8.)
12. Forward Slice
(Same program as slide 8.)
Forward slice with respect to sum = 0
13. What Are Slices Useful For?
- Understanding Programs
- What is affected by what?
- Restructuring Programs
- Isolation of separate computational threads
- Program Specialization and Reuse
- Slices = specialized programs
- Only reuse needed slices
- Program Differencing
- Compare slices to identify changes
- Testing
- What new test cases would improve coverage?
- What regression tests must be rerun after a change?
14. Line-Character-Count Program
    void line_char_count(FILE *f)
    {
        int lines = 0;
        int chars;
        BOOL eof_flag = FALSE;
        int n;
        extern void scan_line(FILE *f, BOOL *bptr, int *iptr);

        scan_line(f, &eof_flag, &n);
        chars = n;
        while (eof_flag == FALSE) {
            lines = lines + 1;
            scan_line(f, &eof_flag, &n);
            chars = chars + n;
        }
        printf("lines = %d\n", lines);
        printf("chars = %d\n", chars);
    }
15. Character-Count Program
(Same code as slide 14, with the function renamed char_count.)
16. Line-Character-Count Program
(Same program as slide 14.)
17. Line-Count Program
    void line_count(FILE *f)
    {
        int lines = 0;
        int chars;
        BOOL eof_flag = FALSE;
        int n;
        extern void scan_line2(FILE *f, BOOL *bptr, int *iptr);

        scan_line2(f, &eof_flag, &n);
        chars = n;
        while (eof_flag == FALSE) {
            lines = lines + 1;
            scan_line2(f, &eof_flag, &n);
            chars = chars + n;
        }
        printf("lines = %d\n", lines);
        printf("chars = %d\n", chars);
    }
18. Specialization Via Slicing
wc -lc
19. How are Slices Computed?
- Reachability in a Dependence Graph
- Program Dependence Graph (PDG)
- Dependences within one procedure
- Intraprocedural slicing is reachability in one PDG
- System Dependence Graph (SDG)
- Dependences within entire system
- Interprocedural slicing is reachability in the SDG
20. How is a PDG Created?
- Control Flow Graph (CFG)
- PDG is the union of:
- Control Dependence Graph
- Flow Dependence Graph
- both computed from the CFG
21. Control Flow Graph
(Same program as slide 8.)
(Figure: CFG with nodes Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i), and edges labeled T and F.)
22. Control Dependence Graph
(Same program as slide 8.)
Control dependence: an edge p → q labeled T means q is reached from p if condition p is true, not otherwise. An edge labeled F is similar, for false.
(Figure: control dependence graph over the nodes Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i), with edges labeled T.)
23. Flow Dependence Graph
(Same program as slide 8.)
Flow dependence: an edge p → q means the value of a variable assigned at p may be used at q.
(Figure: flow dependence graph over the nodes Enter, i = 1, sum = 0, printf(sum), printf(i), while (i < 11), sum = sum + i, i = i + 1.)
24. Program Dependence Graph (PDG)
(Same program as slide 8.)
(Figure: the PDG, combining control dependence edges (labeled T) and flow dependence edges over the nodes Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i).)
25. Program Dependence Graph (PDG)
    #include <stdio.h>

    int main() {
        int i = 1;
        int sum = 0;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
(Figure: the same PDG as on slide 24, although int i = 1 and int sum = 0 now appear in the opposite order.)
26. Backward Slice
(Same program and PDG as slide 24.)
27. Backward Slice (2)
(Same program and PDG as slide 24.)
28. Backward Slice (3)
(Same program and PDG as slide 24.)
29. Backward Slice (4)
(Same program and PDG as slide 24.)
30. Slice Extraction
(Same sliced program as slide 10.)
(Figure: the sliced PDG, with nodes Enter, i = 1, while (i < 11), i = i + 1, printf(i) and edges labeled T.)
31. CodeSurfer
33. CodeSurfer
35. Browsing a Dependence Graph
Pretend this is your favorite browser. What does clicking on a link do?
39. Interprocedural Slice
    #include <stdio.h>

    int add(int x, int y);

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }

    int add(int x, int y) {
        return x + y;
    }
40. Interprocedural Slice
(Same program as slide 39.)
Backward slice with respect to printf("%d\n", i)
41. Interprocedural Slice
(Same program as slide 39.)
Superfluous components are included by Weiser's slicing algorithm [TSE 84] but left out by the algorithm of Horwitz, Reps, and Binkley [PLDI 88, TOPLAS 90].
42. How is an SDG Created?
- Each PDG has nodes for
- entry point
- procedure parameters and function result
- Each call site has nodes for
- call
- arguments and function result
- Appropriate edges
- entry node to parameters
- call node to arguments
- call node to entry node
- arguments to parameters
43. System Dependence Graph (SDG)
(Figure: SDG with nodes Enter main, two Call p nodes, and Enter p.)
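44. SDG for the Sum Program
(Figure: the SDG, with these nodes:
- main: Enter main, sum = 0, i = 1, while (i < 11), printf(sum), printf(i)
- first Call add: xin = sum, yin = i, sum = xout
- second Call add: xin = i, yin = 1, i = xout
- add: Enter add, x = xin, y = yin, x = x + y, xout = x)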
45. Interprocedural Backward Slice
(Figure: same SDG as slide 43.)
46. Interprocedural Backward Slice (2)
(Figure: same SDG as slide 43.)
47. Interprocedural Backward Slice (3)
(Figure: same SDG as slide 43.)
48. Interprocedural Backward Slice (4)
(Figure: same SDG as slide 43.)
49. Interprocedural Backward Slice (5)
(Figure: same SDG as slide 43.)
50. Interprocedural Backward Slice (6)
(Figure: same SDG as slide 43.)
51. Matched-Parenthesis Path
52. Interprocedural Backward Slice (6)
(Figure: same SDG as slide 43.)
53. Interprocedural Backward Slice (7)
(Figure: same SDG as slide 43.)
54. Slice Extraction
(Figure: the extracted slice, with nodes Enter main, Call p, and Enter p.)
55. Slice of the Sum Program
(Figure: the sliced SDG, with these nodes:
- main: Enter main, i = 1, while (i < 11), printf(i)
- Call add: xin = i, yin = 1, i = xout
- add: Enter add, x = xin, y = yin, x = x + y, xout = x)
56. CFL-Reachability [Yannakakis 90]
- G: Graph (N nodes, E edges)
- L: A context-free language
- L-path from s to t iff there is a path from s to t whose edge labels, read in order, spell a word in L
- Running time: O(N³)
57. Interprocedural Slicing via CFL-Reachability
- Graph: System dependence graph
- L: L(matched), roughly
- Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n
58. Asymptotic Running Time [Reps, Horwitz, Sagiv, Rosay 94]
- CFL-reachability
- System dependence graph: N nodes, E edges
- Running time: O(N³)
- System dependence graph: special structure
Running time: O(E CallSites MaxParams³)
60. Regular-Language Reachability [Yannakakis 90]
- G: Graph (N nodes, E edges)
- L: A regular language
- L-path from s to t iff there is a path from s to t whose edge labels, read in order, spell a word in L
- Running time: O(NE)
- Ordinary reachability (= transitive closure)
- Label each edge with e
- L is e*
61. CFL-Reachability via Dynamic Programming
(Figure: a graph and a grammar, with edges labeled B and C.)
62. Degenerate Case: CFL-Recognition
exp → id | exp + exp | exp * exp | ( exp )
(a + b) * c ∈ L(exp) ?
63. Degenerate Case: CFL-Recognition
exp → id | exp + exp | exp * exp | ( exp )
a + b) * c ∈ L(exp) ?
64. Program Chopping
Given source S and target T, what program points transmit effects from S to T?
Intersect forward slice from S with backward slice from T, right?
65. Non-Transitivity and Slicing
(Same program as slide 39.)
66. Non-Transitivity and Slicing
(Same program as slide 39.)
Forward slice with respect to sum = 0
67. Non-Transitivity and Slicing
(Same program as slide 39.)
68. Non-Transitivity and Slicing
(Same program as slide 39.)
Backward slice with respect to printf("%d\n", i)
69. Non-Transitivity and Slicing
(Same program as slide 39.)
Forward slice with respect to sum = 0
∩
Backward slice with respect to printf("%d\n", i)
70. Non-Transitivity and Slicing
(Same program as slide 39.)
?
Chop with respect to sum = 0 and printf("%d\n", i)
71. Non-Transitivity and Slicing
(Figure: the SDG for the sum program, as on slide 44.)
72. Program Chopping
Given source S and target T, what program points transmit effects from S to T?
Precise interprocedural chopping [Reps & Rosay, FSE 95]
73. (Timeline figure, repeated from slide 3: 1987, 1993, 1994, 1995, 1996, 1997, 1998)
74. Dataflow Analysis
- Goal: For each point in the program, determine a superset of the facts that could possibly hold during execution
- Examples:
- Constant propagation
- Reaching definitions
- Live variables
- Possibly uninitialized variables
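75. Possibly Uninitialized Variables
(Figure: a flow graph annotated with sets of possibly uninitialized variables: {w, x, y}, {w, y}, {w, y}, {w, y}, {w}, {w, y}.)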
76. Precise Intraprocedural Analysis
$$pf_p = f_k \circ f_{k-1} \circ \cdots \circ f_2 \circ f_1$$
$$\mathit{MOP}_n = \mathop{\sqcap}\limits_{p \in \mathit{PathsTo}_n} pf_p(C)$$
77. (Figure: flow graph with a branch node "if . . .")
78. Precise Interprocedural Analysis
(Figure: flow graph with nodes start, n, and ret.)
$$\mathit{MOMP}_n = \mathop{\sqcap}\limits_{p \in \mathit{MatchedPathsTo}_n} pf_p(C)$$
[Sharir & Pnueli 81]
79. Representing Dataflow Functions
(Figure: representations of the identity function and a constant function over the facts a, b, c.)
80. Representing Dataflow Functions
(Figure: representations of a gen/kill function and a non-gen/kill function over the facts a, b, c.)
81. (Figure: flow graph with a branch node "if . . .")
82. Composing Dataflow Functions
83. (Figure: flow graph with facts x, y, a, b and a branch node "if . . .")
84. matched → matched matched
    | (i matched )i        1 ≤ i ≤ CallSites
    | edge
    | ε
85. unbalLeft → matched unbalLeft
    | (i unbalLeft         1 ≤ i ≤ CallSites
    | ε
86. Interprocedural Dataflow Analysis via CFL-Reachability
- Graph: Exploded control-flow graph
- L: L(unbalLeft)
- Fact d holds at n iff there is an
L(unbalLeft)-path from
87. Asymptotic Running Time [Reps, Horwitz, Sagiv 95]
- CFL-reachability
- Exploded control-flow graph: ND nodes
- Running time: O(N³D³)
- Exploded control-flow graph: special structure
- Running time: O(ED³)
Typically E ≈ N, hence O(ED³) ≈ O(ND³)
Gen/kill problems: O(ED)
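One standard property of gen/kill functions, included here only as illustration, is closure under composition. Write f_{g,k}(S) = (S − k) ∪ g; this notation is assumed for the derivation. The composite of two gen/kill functions is then again gen/kill:

```latex
\begin{aligned}
(f_{g_2,k_2} \circ f_{g_1,k_1})(S)
  &= \bigl(((S \setminus k_1) \cup g_1) \setminus k_2\bigr) \cup g_2 \\
  &= \bigl(S \setminus (k_1 \cup k_2)\bigr) \cup (g_1 \setminus k_2) \cup g_2 \\
  &= f_{\,g_2 \cup (g_1 \setminus k_2),\; k_1 \cup k_2}(S)
\end{aligned}
```

So the composed function of a path can be kept as a single (gen, kill) pair, however long the path is.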
88. Why Bother? "We're only interested in million-line programs."
- Know thy enemy!
- Any algorithm must do these operations
- Avoid pitfalls (e.g., claiming an O(N²) algorithm)
- The essence of context sensitivity
- Special cases
- Gen/kill problems O(ED)
- Compression techniques
- Basic blocks
- SSA form, sparse evaluation graphs
- Demand algorithms
89. Unifying Conceptual Model for Dataflow-Analysis Literature
- Linear-time gen-kill [Hecht 76, Kou 77]
- Path-constrained DFA [Holley & Rosen 81]
- Linear-time GMOD [Cooper & Kennedy 88]
- Flow-sensitive MOD [Callahan 88]
- Linear-time interprocedural gen-kill [Knoop & Steffen 93]
- Linear-time bidirectional gen-kill [Dhamdhere 94]
- Relationship to interprocedural DFA [Sharir & Pnueli 81, Knoop & Steffen 92]
90. Themes
- Harnessing CFL-reachability
- Exhaustive alg. ⇒ Demand alg.
91. Exhaustive Versus Demand Analysis
- Exhaustive analysis: All facts at all points
- Optimization: Concentrate on inner loops
- Program-understanding tools: Only some facts are of interest
92. Exhaustive Versus Demand Analysis
- Demand analysis
- Does a given fact hold at a given point?
- Which facts hold at a given point?
- At which points does a given fact hold?
- Demand analysis via CFL-reachability
- single-source/single-target CFL-reachability
- single-source/multi-target CFL-reachability
- multi-source/single-target CFL-reachability
93. (Figure: flow graph with a branch node "if . . .")
94. Experimental Results [Horwitz, Reps, Sagiv 1995]
- 53 C programs (200-6,700 lines)
- For a single fact of interest:
- demand always better than exhaustive
- All appropriate demands beats exhaustive when percentage of yes answers is high:
- Live variables
- Truly live variables
- Constant predicates
- . . .
95. A Related Result [Sagiv, Reps, Horwitz 1996]
- Uses a generalized analysis technique
- 38 C programs (300-6,000 lines)
- copy-constant propagation
- linear-constant propagation
- All appropriate demands always beats exhaustive
- factor of 1.14 to about 6
96. Exhaustive Versus Demand Analysis
- Demand algorithms for
- Interprocedural dataflow analysis
- Set constraints
- Points-to analysis
97. Most Significant Contributions: 1987-2000
- Asymptotically fastest algorithms
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand algorithms
- Interprocedural dataflow analysis [CC 94, FSE 95]
- All appropriate demands beats exhaustive
- Tool for slicing and browsing ANSI C
- Slices programs as large as 75,000 lines
- University research distribution
- Commercial product: CodeSurfer (GrammaTech, Inc.)
98. References
- Papers by Reps and collaborators:
- http://www.cs.wisc.edu/~reps/
- CFL-reachability:
- Yannakakis, M., Graph-theoretic methods in database theory, PODS 90.
- Reps, T., Program analysis via graph reachability, Inf. and Softw. Tech. 98.
99. References
- Slicing, chopping, etc.
- Horwitz, Reps, Binkley, TOPLAS 90
- Reps, Horwitz, Sagiv, Rosay, FSE 94
- Reps, Rosay, FSE 95
- Dataflow analysis
- Reps, Horwitz, Sagiv, POPL 95
- Horwitz, Reps, Sagiv, FSE 95, TR-1283