Title: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis
1 Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis
- Stephen Adams, Tom Ball, Manuvir Das
- Sorin Lerner, Mark Seigle
- Westley Weimer
Microsoft Research, University of Washington, UC Berkeley
2 Motivation
- Static analysis for program verification
- Complex dataflow analyses are popular
- SLAM, ESP, BLAST, CQual, ...
- Flow-Sensitive
- Interprocedural
- Expensive!
- Cut down on data flow facts
- Without losing anything important
3 General Idea
- If complex analysis is worse than O(N)
- And you have a cheap analysis that
- Is O(N)
- Reduces N
- Then composing them saves time
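The arithmetic behind this claim can be sketched with a toy cost model. Everything here is an assumption made for illustration: the cubic cost function, the shrink factor of 10, and the function names are hypothetical and are not figures from the talk.

```python
def direct_cost(n, complex_cost):
    """Cost of running the complex analysis on all n dataflow facts."""
    return complex_cost(n)


def composed_cost(n, shrink_factor, complex_cost):
    """Cost of an O(n) pre-pass followed by the complex analysis on what is left."""
    cheap = n                         # assumed: the cheap analysis takes about n steps
    remaining = n // shrink_factor    # assumed: the pre-pass keeps 1/shrink_factor of the facts
    return cheap + complex_cost(remaining)


def cubic(n):
    """Hypothetical stand-in for an analysis that is worse than O(N)."""
    return n ** 3


n = 1000
print(direct_cost(n, cubic))        # 1000**3 = 1000000000
print(composed_cost(n, 10, cubic))  # 1000 + 100**3 = 1001000
```

Under these assumptions the pre-pass adds only a linear term, while the super-linear term shrinks by the cube of the reduction.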
4 Value Flow Graph (VFG)
- Variant of a points-to graph
- Encodes the flow of values in the program
- Conservative approximation
- Lightweight, fast to compute and query
- Early queries can safely reduce
- data-flow facts considered
- program points considered
- Like slicing a program wrt. value flow
5 Computing a VFG
- Use a subtyping-based pointer analysis
- We used One-Level Flow [Das]
- Process all assignments
- Not just those involving pointers
- Represent constant values explicitly
- Put them in the graph
- Label graph with source locations
- Encodes program slices
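The bookkeeping described above (process every assignment, keep constants as nodes, label edges with source locations) can be sketched as follows. This is a deliberately simplified illustration: it handles only direct assignments of the form `target = source` and does not implement One-Level Flow or any pointer reasoning. The tuple format and the function name `build_flow_edges` are hypothetical.

```python
from collections import defaultdict


def build_flow_edges(assignments):
    """Map (source, target) -> set of source lines where `target = source` occurs.

    `assignments` is an iterable of (line, target, source) tuples. A constant
    source such as "7" simply becomes a node of its own.
    """
    edges = defaultdict(set)
    for line, target, source in assignments:
        edges[(source, target)].add(line)
    return dict(edges)


# Hypothetical input program, one tuple per assignment.
program = [
    (2, "y", "x"),   # line 2: y = x
    (3, "y", "7"),   # line 3: y = 7  (constant represented explicitly)
    (5, "y", "x"),   # line 5: y = x  (same edge, second label)
]
vfg = build_flow_edges(program)
```

Because every edge carries the set of lines that created it, the lines collected while walking the graph form a slice of the program with respect to value flow.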
6 Example Points-To Graph
[Figure: example points-to graph with nodes x and a. Legend: Points-to Edge, Source Address Node, Expr Node.]
7 One Level Flow Graph
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: one-level flow graph for the code above, with nodes x and a. Legend: Flow Edge, Points-to Edge, Source Address Node, Expr Node.]
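8 Value Flow Graph
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: value flow graph for the code above, with nodes x, a, and the constant 7; edges are labeled with source lines 2, 3, and "2,3". Legend: Flow Edge, Points-to Edge, Source Address Node, Expr Node.]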
9 VFG Properties
- Computed in almost-linear time
- Get points-to sets from VFG in linear time
- Backwards reachability via flow edges
- Gather up all variables
- Get value flow from VFG in linear time
- Backwards reachability via flow edges
- Follow points-to edges up one
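Both queries reduce to a backward walk over flow edges, which visits each node at most once. The sketch below is one possible reading, not the authors' implementation. It assumes the three-line example is `1: int a, *x; 2: x = &a; 3: *x = 7;`, and the dictionary encoding, the node names, and the interpretation of "follow points-to edges up one" as a single step from an address node to its contents node are all hypothetical.

```python
from collections import deque

# Hypothetical VFG encoding for: 1: int a, *x;  2: x = &a;  3: *x = 7;
FLOW = {"x": [("&a", 2)], "a": [("7", 3)]}   # target node -> [(source node, line)]
POINTS_TO = {"&a": "a", "x": "a"}            # node -> the node it points to
ADDRESS_OF = {"&a": "a"}                     # source address node -> variable name


def backward_reach(start):
    """Return every node that reaches `start` by following flow edges backwards."""
    seen, work = set(), deque([start])
    while work:
        node = work.popleft()
        for source, _line in FLOW.get(node, []):
            if source not in seen:
                seen.add(source)
                work.append(source)
    return seen


def points_to(expr):
    """Gather the variables whose address can flow into `expr`."""
    return {ADDRESS_OF[n] for n in backward_reach(expr) if n in ADDRESS_OF}


def value_flow_into(variable):
    """Step from the variable's address node to its contents, then walk backwards."""
    contents = POINTS_TO["&" + variable]
    return backward_reach(contents)
```

With this encoding, `points_to("x")` yields `{"a"}` and `value_flow_into("a")` yields `{"7"}`.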
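10 VFG Query: Points-To of x
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: the value flow graph for the code above, with nodes x, a, and the constant 7; edges are labeled with source lines 2, 3, and "2,3". Legend: Flow Edge, Points-to Edge, Source Address Node, Expr Node.]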
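11 VFG Query: Value Flow into a
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: the value flow graph for the code above, with nodes x, a, and the constant 7; edges are labeled with source lines 2, 3, and "2,3". Legend: Flow Edge, Points-to Edge, Source Address Node, Expr Node.]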
12 VFG Summary
- Computed in almost-linear time
- Queries complete in linear time
- Approximates flow of values in program
- Show two applications that benefit
- ESP
- SLAM
13 Application 1: ESP
- Verification tool for large C programs
- Tracks typestate of values
- Encoded as Finite State Machine
- Special Error state
- Core interprocedural data-flow engine
- Flow sensitive state at every point
- Performed bottom-up on call graph
- Requires function summaries
14 ESP Function Summaries
- Consider stateful memory locations
- Summarize function behavior for each loc
- Reducing number of locs would be good!
- But C has evil casts, so types cannot be used
- Worst case set of locations
- All globals and formal parameters
- Everything transitively reachable from there
15 Reduce Location Set
- Location L needs to be considered in F if
- Some exp E has its state changed in F
- Value held by L at entry to F can flow into E
- Assuming state-changing ops are known
- Query VFG to find values that flow in
16 ESP Example
    FILE *e, *f, *g, *h;
    void foo() {
        FILE *p;
        int a = (int)h;
        if (...) p = e;
        else p = f;
        p = fopen();
    }
Locations to consider for foo() summary: e, *e, f, *f, g, *g, h, *h
17 ESP Example
- (1) Compute VFG
- (2) Query value flow on p
- (3) Reduced locations to consider for foo() summary: e, f
- (4) Reduce lines to consider for dataflow
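Steps (2) and (3) can be sketched as a backward reachability query followed by a filter over the candidate locations. This is a minimal illustration, assuming the flow edges of foo() are the ones listed in `foo_edges` below; the edge list, the function names, and the choice to count an expression as flowing into itself are hypothetical and do not come from ESP.

```python
from collections import deque


def values_flowing_into(flow_edges, target):
    """Return all nodes that reach `target` along (source, dest) flow edges."""
    reverse = {}
    for source, dest in flow_edges:
        reverse.setdefault(dest, []).append(source)
    seen, work = set(), deque([target])
    while work:
        node = work.popleft()
        for source in reverse.get(node, []):
            if source not in seen:
                seen.add(source)
                work.append(source)
    return seen


def reduce_locations(candidates, flow_edges, state_changed_exprs):
    """Keep only candidate locations whose value may flow into a state-changed expression."""
    needed = set()
    for expr in state_changed_exprs:
        needed.add(expr)   # assumed: a location trivially flows into itself
        needed |= values_flowing_into(flow_edges, expr)
    return [loc for loc in candidates if loc in needed]


# Assumed flow edges for foo(): p = e, p = f, p = fopen(), a = (int)h.
foo_edges = [("e", "p"), ("f", "p"), ("fopen()", "p"), ("h", "a")]
candidates = ["e", "*e", "f", "*f", "g", "*g", "h", "*h"]
reduced = reduce_locations(candidates, foo_edges, ["p"])   # ["e", "f"]
```

Only `e` and `f` survive because `h` flows into `a`, whose state never changes, and `g` flows nowhere.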
18 ESP Results
- FILE output in GCC
- 140 KLOC, 2149 functions, 66 files, 1068 globals
- VFG Queries take 200 seconds
- Reduce average number of locations per function summary from 1100 to <1
- Median of 15 for functions with >0
- Verification takes 15 minutes
- Infeasible otherwise
19 Application 2: SLAM
- Validates temporal safety properties
- Boolean abstraction
- Interprocedural dataflow analysis
- Counterexample-driven refinement
- Convert C program to Boolean program
- Exhaustive dataflow analysis
- No errors? Program is safe.
- Real error? Program has a bug.
- False error? Add predicates, repeat.
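The loop described above can be sketched as a small driver. This is only an outline of the control flow under stated assumptions: the four callables (`abstract`, `check`, `is_real`, `new_predicates`) and the `max_rounds` cutoff are hypothetical placeholders and do not correspond to SLAM's actual components or interfaces.

```python
def refine_until_decided(program, predicates, abstract, check, is_real,
                         new_predicates, max_rounds=10):
    """Counterexample-driven refinement loop over caller-supplied steps.

    abstract(program, predicates) -> Boolean program
    check(boolean_program)        -> error trace, or None if no error is reachable
    is_real(program, trace)       -> True if the trace is feasible in the C program
    new_predicates(program, trace)-> set of predicates that rule out a false trace
    """
    predicates = set(predicates)
    for _ in range(max_rounds):
        boolean_program = abstract(program, predicates)
        trace = check(boolean_program)
        if trace is None:
            return "safe", predicates      # no errors: program is safe
        if is_real(program, trace):
            return "bug", predicates       # real error: program has a bug
        predicates |= new_predicates(program, trace)   # false error: refine, repeat
    return "unknown", predicates           # assumed cutoff, not part of the slides
```

Seeding `predicates` with a good initial set reduces the number of times this loop has to go around.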
20 Boolean Programs
C Program:
    int x, y;
    x = 5;
    y = 6;
    x = x * 2;
    y = y * 2;
    assert(x < y);
Boolean Program:
    bool p, q;
    p = 1;
    q = 1;
    p = 0;
    q = 0;
    q = 1;
    assert(q);
Predicates (important!): p means x == 5; q means x < y
21 SLAM Predicates
- Hard to come up with good predicates
- Counterexample-driven refinement
- Picks good predicates
- Is very slow
- Taking all possible predicates
- Is even slower
- Want all the useful predicates
22 Speeding Up SLAM
- For a simple subset of C
- Similar to Copy Constants
- Use VFG to find a sufficient set of predicates
- Provably sufficient for this subset
- If this set fails to prove the real program
- Fall back on counterexample-driven refinement
23 A Simple Language
    s ::= vi = n                // constants
        | vi = vj               // variable copy
        | if (...) s1 else s2   // condition ignored
        | vi = fun(vj, ...)     // function call
        | return(vi)            // function return
        | assert(vi == vj)      // safety property
24 Predicate Discovery
- High-level idea
- Each flow edge in the VFG means values may flow from X to Y
- Add predicates to see if they do
- For each assert(vi == vj)
- Consider the chain of values flowing to vi, vj
- Add an equality predicate for each link
- Use constants to resolve scoping
25 SLAM Example
    int sel(int f) {
        int r;
        if (...) r = f;
        else r = 3;
        return(r);
    }

    void main() {
        int a, b, c;
        a = 1;
        b = sel(a);
        if (...) c = 2;
        else c = 4;
        assert(b > c);
    }
[Figure: VFG for the program above, with nodes a, 1, f, r, 3, b, c, 2, 4.]
26 Predicates For b
[Figure: chain of values flowing into b, with nodes a, 1, f, r, 3, b.]
Predicates: b == r, r == 3, r == f, f == a, a == 1
27 Predicates For b
[Figure: chain of values flowing into b, with nodes a, 1, f, r, 3, b.]
Predicates: b == r, r == 3, r == f, f == a (no scope!), a == 1
28 Predicates For b
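[Figure: chain of values flowing into b, with nodes a, 1, f, r, 3, b.]
Predicates from the chain: b == r, r == 3, r == f, f == a (no scope!), a == 1
Predicates after using constants: b == r, r == 3, r == f, f == 1, f == 3, a == 1, a == 3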
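The final predicate list for b can be reproduced with a short backward walk over the flow edges of the sel/main example. This is a sketch under explicit assumptions: the edge list and its "copy"/"param"/"return" tags are a hand-written, hypothetical encoding, and the rule for an out-of-scope link (compare both ends against every constant in the chain) is inferred from the predicate list on the slide rather than taken from a stated algorithm.

```python
from collections import deque

# Hypothetical flow edges (source, target, kind) for the sel/main example.
FLOW = [
    ("1", "a", "copy"), ("a", "f", "param"), ("f", "r", "copy"),
    ("3", "r", "copy"), ("r", "b", "return"),
    ("2", "c", "copy"), ("4", "c", "copy"),
]


def is_constant(node):
    return node.isdigit()


def chain_links(flow, start):
    """Collect every flow edge that lies on a backward path into `start`."""
    reverse = {}
    for source, target, kind in flow:
        reverse.setdefault(target, []).append((source, kind))
    links, seen, work = [], {start}, deque([start])
    while work:
        target = work.popleft()
        for source, kind in reverse.get(target, []):
            links.append((source, target, kind))
            if source not in seen:
                seen.add(source)
                work.append(source)
    return links


def predicates_for(flow, variable):
    """One equality predicate per link; parameter links are rewritten via constants."""
    links = chain_links(flow, variable)
    constants = sorted({s for s, _t, _k in links if is_constant(s)})
    predicates = set()
    for source, target, kind in links:
        if kind == "param" and not is_constant(source):
            # Assumed rule: formal and actual share no scope, so compare
            # each of them against every constant seen in the chain.
            for k in constants:
                predicates.add(f"{target} == {k}")
                predicates.add(f"{source} == {k}")
        else:
            predicates.add(f"{target} == {source}")
    return predicates
```

Calling `predicates_for(FLOW, "b")` gives the seven predicates listed after using constants; the same call with `"c"` gives `c == 2` and `c == 4`.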
29 Why does this work?
- Simple language
- No arithmetic, etc.
- Just copying around initial values
- Knowing final values of variables
- Completely decides safety condition
- Still related to real life
- Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.
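30 Some SLAM Results
- apmbatt: 2207 LOC, original runtime 229s, improved runtime 22s, 85 generated predicates, 0 missing predicates
- pnpmem: 3849 LOC, original runtime 1132s, improved runtime 125s, 143 generated predicates, 4 missing predicates
- floppy: 7562 LOC, original runtime 1063s, improved runtime 600s, 154 generated predicates, 33 missing predicates
- iscsiprt: 4543 LOC, 729s, 146 generated predicates, 42 missing predicates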
The generated predicates cover between two-thirds and all of the necessary predicates.
However, since SLAM must iterate once for every 3-7 missing predicates, the net performance increase is more than linear.
Predicates can be specialized or simplified if the assert() condition is a common relational operator (e.g., x == y, x < y, x == 5).
31 Conclusions
- Complex interprocedural analyses can benefit from inexpensive value-flow
- VFG encodes value flow
- Constructed and queried quickly
- Prune the set of dataflow facts and program points considered
- Large net performance increase