Title: Precise Interprocedural Analysis using Random Interpretation
1. Precise Interprocedural Analysis using Random Interpretation
- Sumit Gulwani, George Necula
- UC-Berkeley
2. Random Interpretation
- Random Interpretation = Random Testing + Abstract Interpretation
- Almost as simple as random testing, but with better soundness guarantees.
- Almost as sound as abstract interpretation, but more precise, efficient, and simple.
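3. Example
- Random testing needs to execute all 4 paths to verify the assertions.
- Abstract interpretation analyzes each statement once but uses complicated operations.
- Random interpretation simply executes the program once (and captures the effect of all paths).
- Flowchart: two consecutive conditionals, each with a True and a False branch
  - First conditional: a := 0; b := i on one branch, a := i - 2; b := 2 on the other
  - Second conditional: c := b - a; d := i - 2b on one branch, c := 2a + b; d := b - 2i on the other
  - Then: assert(c + d = 0); assert(c = a + i)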
4. Outline
- Framework for intraprocedural random interpretation
  - Advantages
    - Investigate all analyses using one framework
    - Design and proof of new analyses will be simpler
- A generic algorithm for interprocedural analysis
5. Outline
- Framework for intraprocedural random interpretation
  - Affine join function
  - Eval function
  - Example
- A generic algorithm for interprocedural analysis
6. Random Interpretation framework
- Goal: Detect equivalences of expressions.
- Generic Algorithm
  - Choose random values for input variables.
  - Execute assignments.
    - Use the Eval function to evaluate expressions.
  - Execute both branches of conditionals and combine the program states at join points.
    - Use the Affine Join function.
  - Compare values of expressions to decide equality.
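7. Affine Join function
- Used for combining program states at join points.
- $\phi_w : \text{State} \times \text{State} \to \text{State}$
- Let $\rho = \phi_w(\rho_1, \rho_2)$. Then,
  - $\rho(y) \stackrel{\text{def}}{=} w \cdot \rho_1(y) + (1 - w) \cdot \rho_2(y)$
- Example: $\rho_1 = [a = 2,\ b = 3]$, $\rho_2 = [a = 4,\ b = 1]$
  - $\phi_7(\rho_1, \rho_2) = [a = 7 \cdot 2 + (1 - 7) \cdot 4,\ b = 7 \cdot 3 + (1 - 7) \cdot 1]$
  - i.e. $[a = -10,\ b = 15]$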
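8. Properties of Affine Join
- Branches being joined: a := 2; b := 3 and a := 4; b := 1
- $\rho_1 = [a = 2,\ b = 3]$, $\rho_2 = [a = 4,\ b = 1]$
  - $\phi_7(\rho_1, \rho_2) = [a = 7 \cdot 2 + (1 - 7) \cdot 4,\ b = 7 \cdot 3 + (1 - 7) \cdot 1]$
  - i.e. $[a = -10,\ b = 15]$
- Affine join preserves common linear relationships, e.g. $a + b = 5$.
- It does not introduce false relationships with high probability.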
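The join and its preservation property can be checked directly. The following is a small sketch that uses plain Python integers and dictionaries for states; the function name `affine_join` and the dictionary representation are illustrative assumptions, and an actual implementation would normally work over a finite field rather than unbounded integers.

```python
def affine_join(w, s1, s2):
    """Return the state w*s1 + (1 - w)*s2, computed variable by variable."""
    return {y: w * s1[y] + (1 - w) * s2[y] for y in s1}

rho1 = {"a": 2, "b": 3}
rho2 = {"a": 4, "b": 1}

joined = affine_join(7, rho1, rho2)
print(joined)                      # {'a': -10, 'b': 15}

# a + b = 5 holds in both input states, so it holds after the join for any weight.
print(joined["a"] + joined["b"])   # 5

# a = 2 holds only in rho1; it survives the join only for the weight w = 1.
print([w for w in range(-5, 6) if affine_join(w, rho1, rho2)["a"] == 2])   # [1]
```

The last line illustrates why a false relationship is introduced only with low probability: it requires the random weight to hit one specific value.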
9. Eval function
- $\text{Eval} : \text{Expression} \times \text{State} \to \text{Value}$
- Used for executing expressions
- Defined in terms of $\text{Poly} : \text{Expression} \to \text{Polynomial}$
  - Poly is abstraction specific
- $\text{Eval}(e, \rho)$ = evaluation of $\text{Poly}(e)$ using $\rho$ and random choices for non-program variables
- Poly must satisfy
  - Correctness: $\text{Poly}(e_1) = \text{Poly}(e_2)$ iff $e_1 = e_2$
  - Linearity: $\text{Poly}(e)$ is linear in program variables.
10. Example of Poly function
- Linear Arithmetic (POPL 2003)
  - Expression $e ::= y \mid e_1 \pm e_2 \mid c \cdot e$
  - $\text{Poly}(e) = e$
- Uninterpreted Functions (POPL 2004)
  - Expression $e ::= y \mid F(e)$
  - $\text{Poly}(y) = y$
  - $\text{Poly}(F(e)) = a \cdot \text{Poly}(e) + b$
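As an illustration of how Eval follows from Poly in the uninterpreted-functions case, here is a rough sketch. It assumes that the non-program variables a and b are chosen once per function symbol, that arithmetic is done modulo a prime `P`, and that expressions are encoded as a variable name or a pair `(F, subexpression)`. All of these, together with the names `random_coeffs` and `eval_expr`, are assumptions made for the example and are not taken from the slides.

```python
import random

P = 2**31 - 1  # assumed prime modulus (hypothetical choice)

def random_coeffs(function_names, rng):
    """Pick random values (a, b) for every function symbol (assumed per-symbol)."""
    return {f: (rng.randrange(P), rng.randrange(P)) for f in function_names}

def eval_expr(e, state, coeffs):
    """Eval(e, state): e is a variable name or a pair (F, sub-expression)."""
    if isinstance(e, str):
        return state[e] % P                                # Poly(y) = y
    f, arg = e
    a, b = coeffs[f]
    return (a * eval_expr(arg, state, coeffs) + b) % P     # Poly(F(e)) = a*Poly(e) + b

rng = random.Random(0)
coeffs = random_coeffs(["F", "G"], rng)
state = {"x": rng.randrange(P)}
state["z"] = state["x"]   # models the assignment z := x

# Equal expressions always evaluate to equal values.
print(eval_expr(("F", "x"), state, coeffs) == eval_expr(("F", "z"), state, coeffs))  # True

# F(G(x)) and G(F(x)) are different expressions; their values collide only by chance.
print(eval_expr(("F", ("G", "x")), state, coeffs) == eval_expr(("G", ("F", "x")), state, coeffs))
```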
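11. Example: Random Interpretation for Linear Arithmetic
- Input: i = 3
- First conditional
  - a := 0; b := i gives i = 3, a = 0, b = 3
  - a := i - 2; b := 2 gives i = 3, a = 1, b = 2
  - Affine join with w1 = 5: i = 3, a = -4, b = 7
- Second conditional
  - c := b - a; d := i - 2b gives i = 3, a = -4, b = 7, c = 11, d = -11
  - c := 2a + b; d := b - 2i gives i = 3, a = -4, b = 7, c = -1, d = 1
  - Affine join with w2 = 2: i = 3, a = -4, b = 7, c = 23, d = -23
- assert(c + d = 0); assert(c = a + i)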
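The run on this slide can be replayed in a few lines. This is only a sketch of the intraprocedural interpreter specialized to this one program, using plain integers and the fixed values i = 3, w1 = 5, w2 = 2 from the slide. The helper names `affine_join` and `run` are illustrative, and the weight is applied to the first-listed arm of each conditional, which is the convention that matches the slide's numbers.

```python
def affine_join(w, s1, s2):
    """Return the state w*s1 + (1 - w)*s2, computed variable by variable."""
    return {y: w * s1[y] + (1 - w) * s2[y] for y in s1}

def run(i, w1, w2):
    """Execute both arms of each conditional once and join the resulting states."""
    arm1 = {"i": i, "a": 0, "b": i}
    arm2 = {"i": i, "a": i - 2, "b": 2}
    s = affine_join(w1, arm1, arm2)

    arm1 = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    arm2 = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    return affine_join(w2, arm1, arm2)

s = run(3, 5, 2)
print(s)                          # {'i': 3, 'a': -4, 'b': 7, 'c': 23, 'd': -23}
print(s["c"] + s["d"] == 0)       # True:  first assertion is reported as valid
print(s["c"] == s["a"] + s["i"])  # False: second assertion is reported as violated
```

A single execution therefore gives the same verdicts that testing would need all four paths to establish for i = 3.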
12. Outline
- Framework for intraprocedural random interpretation
  - Affine join function
  - Eval function
  - Example
- A generic algorithm for interprocedural analysis
  - Random summary (Idea 1)
  - Issue of freshness (Idea 2)
  - Error probability and complexity
  - Experiments
13. Example
- The second assertion is true in the context i = 2.
- We need two new ideas to make the analysis interprocedural.
- [Figure: the example program and its random run from slide 11 (i = 3, w1 = 5, w2 = 2), ending in a = -4, b = 7, c = 23, d = -23.]
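14. Idea 1: Keep input variables symbolic
- Do not choose random values for input variables (to later instantiate by any context).
- Resulting program state at the end is a random summary.
- First conditional
  - a := 0; b := i gives a = 0, b = i
  - a := i - 2; b := 2 gives a = i - 2, b = 2
  - Affine join with w1 = 5: a = 8 - 4i, b = 5i - 8
- Second conditional
  - c := b - a; d := i - 2b gives a = 8 - 4i, b = 5i - 8, c = 9i - 16, d = 16 - 9i
  - c := 2a + b; d := b - 2i gives a = 8 - 4i, b = 5i - 8, c = 8 - 3i, d = 3i - 8
  - Affine join with w2 = 2: a = 8 - 4i, b = 5i - 8, c = 21i - 40, d = 40 - 21i
- Instantiating the summary with i = 2: a = 0, b = 2, c = 2, d = -2
- assert(c + d = 0); assert(c = a + i)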
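The symbolic run can be sketched by replacing numbers with linear polynomials in the input i. The class `Lin` and the helpers `const`, `join`, and `summary` below are hypothetical names introduced for this illustration. They represent k*i + c as a pair of integers and hard-code the example program with the weights w1 = 5 and w2 = 2 from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lin:
    """The linear polynomial k*i + c in the input variable i."""
    k: int
    c: int

    def __add__(self, o):
        return Lin(self.k + o.k, self.c + o.c)

    def __sub__(self, o):
        return Lin(self.k - o.k, self.c - o.c)

    def scale(self, n):
        return Lin(n * self.k, n * self.c)

    def at(self, i):
        """Instantiate the polynomial for a concrete input value."""
        return self.k * i + self.c

I = Lin(1, 0)  # the symbolic input i

def const(n):
    return Lin(0, n)

def join(w, x, y):
    """Affine join of two symbolic values: w*x + (1 - w)*y."""
    return x.scale(w) + y.scale(1 - w)

def summary(w1, w2):
    a = join(w1, const(0), I - const(2))
    b = join(w1, I, const(2))
    c = join(w2, b - a, a.scale(2) + b)
    d = join(w2, I - b.scale(2), b - I.scale(2))
    return {"a": a, "b": b, "c": c, "d": d}

s = summary(5, 2)
print(s["c"], s["d"])                     # Lin(k=21, c=-40) Lin(k=-21, c=40)
print(s["c"] + s["d"] == const(0))        # True: c + d = 0 in every context
print(s["c"].at(2) == (s["a"] + I).at(2)) # True: c = a + i in the context i = 2
print(s["c"].at(3) == (s["a"] + I).at(3)) # False: not in the context i = 3
```

The summary is computed once and then instantiated per calling context, which is what makes it reusable across call sites.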
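15. Idea 2: Generate fresh summaries
- Procedure P
  - Input: i
  - Conditional with branches x := i + 1 and x := 3, giving x = i + 1 and x = 3
  - Affine join with w = 5: x = 5i - 7
  - return x
- Procedure Q
  - u := P(2); v := P(1); w := P(1)
  - Using the single summary x = 5i - 7: u = 5·2 - 7 = 3, v = 5·1 - 7 = -2, w = 5·1 - 7 = -2
  - Assert(u = 3); Assert(v = w)
- Plugging the same summary twice is unsound: v and w both evaluate to -2, so the false assertion v = w appears to hold, although each call P(1) may return either 2 or 3.
- Fresh summaries can be generated by random affine combination of a few independent summaries!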
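16. Generating 2 random summaries for P
- Procedure P
  - Input: i
  - Conditional with branches x := i + 1 and x := 3, giving x = [i + 1, i + 1] and x = [3, 3]
  - Affine join with w = [5, -2]: x = [5i - 7, 7 - 2i]
  - return x
- Fresh summaries of P, obtained as affine combinations of the two summaries
  - $\phi_7(5i - 7,\ 7 - 2i) = 47i - 91$
  - $\phi_6(5i - 7,\ 7 - 2i) = 40i - 77$
  - $\phi_3(5i - 7,\ 7 - 2i) = 19i - 35$
  - $\phi_0(5i - 7,\ 7 - 2i) = 7 - 2i$
  - $\phi_5(5i - 7,\ 7 - 2i) = 33i - 63$
  - $\phi_1(5i - 7,\ 7 - 2i) = 5i - 7$
- Procedure Q calls P 3 times. Hence, generating 2 random summaries for Q requires 2 × 3 = 6 fresh summaries of P.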
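17. Generating 2 random summaries for Q
- Fresh summaries of P used by Q
  - $\phi_7(5i - 7,\ 7 - 2i) = 47i - 91$ and $\phi_6(5i - 7,\ 7 - 2i) = 40i - 77$
  - $\phi_3(5i - 7,\ 7 - 2i) = 19i - 35$ and $\phi_0(5i - 7,\ 7 - 2i) = 7 - 2i$
  - $\phi_5(5i - 7,\ 7 - 2i) = 33i - 63$ and $\phi_1(5i - 7,\ 7 - 2i) = 5i - 7$
- Procedure Q
  - u := P(2); v := P(1); w := P(1)
  - u = [47·2 - 91, 40·2 - 77] = [3, 3]
  - v = [19·1 - 35, 7 - 2·1] = [-16, 5]
  - w = [33·1 - 63, 5·1 - 7] = [-30, -2]
  - Assert(u = 3); Assert(v = w)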
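The freshness idea can be sketched as follows. Each summary of P is stored as a pair (k, c) standing for k*i + c, and the function names `combine`, `call`, and `runs_of_q` are hypothetical. The weights are fixed to values that reproduce the six polynomials listed on the slides; in the actual analysis they would be chosen at random.

```python
def combine(w, s1, s2):
    """Affine combination w*s1 + (1 - w)*s2 of two summaries given as (k, c) pairs."""
    return tuple(w * p + (1 - w) * q for p, q in zip(s1, s2))

def call(summary, arg):
    """Instantiate a summary k*i + c of P at the actual argument."""
    k, c = summary
    return k * arg + c

base1, base2 = (5, -7), (-2, 7)   # the two summaries 5i - 7 and 7 - 2i of P
fresh = [combine(w, base1, base2) for w in (7, 6, 3, 0, 5, 1)]
print(fresh)  # [(47, -91), (40, -77), (19, -35), (-2, 7), (33, -63), (5, -7)]

def runs_of_q(summaries):
    """Two runs of Q; each of the 3 calls to P consumes 2 summaries, one per run."""
    u = [call(summaries[0], 2), call(summaries[1], 2)]
    v = [call(summaries[2], 1), call(summaries[3], 1)]
    w = [call(summaries[4], 1), call(summaries[5], 1)]
    return u, v, w

print(runs_of_q(fresh))        # ([3, 3], [-16, 5], [-30, -2])
print(runs_of_q([base1] * 6))  # ([3, 3], [-2, -2], [-2, -2])
```

With fresh summaries, u = 3 still holds in both runs while v and w differ, so the false assertion v = w is rejected. Reusing one summary everywhere makes v and w coincide, which is the unsoundness that Idea 2 addresses.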
18. Loops and Fixed point computation
- In the presence of loops (in procedures and call-graphs), fixed point computation is required.
- The number of iterations required to reach the fixed point is $k_v(2k_I + 1) + 1$
  - $k_v$ = # of visible variables
  - $k_I$ = # of input variables
19. Error Probability and Complexity
- Time complexity: $n \cdot k_V \cdot k_I^2 \cdot t$
- Error probability: $(1/q)^{t-m}$
- $n$ = size of program
- $k_V$, $k_I$ = # of visible and input variables
- $t$ = # of random summaries
- $q$ = size of set from which random values are chosen
- m kI kV (generic bound)
- kI kV (for linear arithmetic)
- 4 (for unary uninterpreted functions)
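To get a feel for the error bound, the short sketch below evaluates 1/q^(t-m) exactly. The function name `error_bound` and the sample values of q and t are assumptions chosen for illustration; only m = 4 (unary uninterpreted functions) comes from the slide.

```python
from fractions import Fraction

def error_bound(q, t, m):
    """Evaluate (1/q)**(t - m); the bound is only meaningful when t > m."""
    return Fraction(1, q) ** (t - m)

# Hypothetical setting: random values drawn from a set of size 2**32,
# t = 6 random summaries, m = 4 as for unary uninterpreted functions.
print(error_bound(2**32, 6, 4) == Fraction(1, 2**64))   # True
```

Each additional summary beyond m multiplies the bound by another factor of 1/q, while the time complexity grows only linearly in t.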
20. Related Work
- Intraprocedural random interpretation
  - Linear arithmetic (POPL 03)
  - Uninterpreted functions (POPL 04)
- Interprocedural dataflow analysis (POPL 95, TCS 96)
  - Sagiv, Reps, Horwitz
  - Cons: simpler properties, e.g. liveness, linear constants
  - Pro: better computational complexity
- Interprocedural linear arithmetic (POPL 04)
  - Muller-Olm, Seidl
  - Cons: $O(k^2)$ times slower
  - Pro: works for non-linear relationships too
22. Experiments
- Column groups: Random Inter (this paper): Line, Inp, Var, Time | Random Intra (POPL 2003): Δ(Var), Speedup | Det Inter (TCS 96): Δ(Inp), Speedup
Prog   Line  Inp  Var   Time  | Δ(Var)  Speedup | Δ(Inp)  Speedup
go     29K   63   1700  47    | 170     107     | 17      1.9
ijpeg  28K   31   825   4     | 34      24      | 3       2.3
li     23K   53   392   34    | 160     756     | 20      1.3
gzip   8K    49   525   2     | 200     39      | 6       2.0
- Inp = # of input variables that were constants
- Var = # of local variables that were constants
- Δ(Var) = # of fewer local variable constants discovered
- Random Inter discovers 10-70% more facts.
- Random Intra is faster by 10-500 times.
- Det Inter is faster by 2 times.
23. Conclusion
- Randomization buys efficiency and simplicity at the cost of probabilistic soundness.
- Combining randomized techniques with symbolic techniques is powerful.