Title: Quantified Invariant Generation using an Interpolating Saturation Prover
1 Quantified Invariant Generation using an Interpolating Saturation Prover
- Ken McMillan
- Cadence Research Labs
2 Quantified invariants
- Many systems that we would like to verify formally are effectively infinite-state:
  - Parameterized protocols
  - Programs manipulating unbounded data structures (arrays, heaps, stacks)
  - Programs with unbounded thread creation
- To verify such systems, we must construct a quantified invariant
  - For all processes, array elements, threads, etc.
- Existing fully automated techniques for generating invariants are not strongly relevance-driven:
  - Invisible invariants
  - Indexed predicate abstraction
  - Shape analysis
3 Interpolants and abstraction
- Interpolants derived from proofs can provide an effective relevance heuristic for constructing inductive invariants
  - Provide a way of generalizing proofs about bounded behaviors to the unbounded case
  - Exploit a prover's ability to focus on relevant facts
- Used in various applications, including
  - Hardware verification (propositional case)
  - Predicate abstraction (quantifier-free)
  - Program verification (quantifier-free)
- This talk
  - Moving to the first-order case, including FO(TC)
  - Modifying SPASS to create an interpolating FO prover
  - Applying it to program verification with arrays and linked lists
4 Invariants from unwindings
- Consider this very simple approach:
  - Partially unwind a program into a loop-free, in-line program
  - Construct a Floyd/Hoare proof for the in-line program
  - See if this proof contains an inductive invariant proving the property
- Example program (* denotes a nondeterministic choice):

    x = y = 0;
    while(*) { x++; y++; }
    while(x != 0) { x--; y--; }
    assert(y == 0);
5 Unwind the loops
- Assertions may diverge as we unwind
- A practical method must somehow prevent this kind of divergence!
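The divergence can be seen on the slide 4 program with a small Python sketch (our own illustration; the function names are hypothetical). It computes the exact set of (x, y) states after the first loop has been unwound at most k times. The strongest assertion for that set, $x = y \land 0 \le x \le k$, changes with every k, while the weaker assertion $x = y$ holds at every depth.

```python
def states_after_unwinding(k):
    """Exact (x, y) states after at most k iterations of `while(*) { x++; y++; }`,
    starting from x = y = 0 (the first loop of the slide 4 program)."""
    frontier = {(0, 0)}
    reached = set(frontier)
    for _ in range(k):
        frontier = {(x + 1, y + 1) for x, y in frontier}
        reached |= frontier
    return reached


def strongest_bound(k):
    """Bound in the strongest assertion  x = y and 0 <= x <= bound  for depth k."""
    return max(x for x, _ in states_after_unwinding(k))


for k in range(4):
    states = states_after_unwinding(k)
    # x = y holds at every depth; the bound keeps growing with k.
    print(k, all(x == y for x, y in states), strongest_bound(k))
```

A method that reports the strongest assertion at each depth would produce a new formula for every k, which is exactly the divergence the slide warns about.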
6 Interpolation Lemma [Craig, 57]
- If $A \land B = \mathit{false}$, there exists an interpolant $A'$ for $(A,B)$ such that
  - $A$ implies $A'$
  - $A'$ is inconsistent with $B$
  - $A'$ is expressed over the common vocabulary of $A$ and $B$
A variety of techniques exist for deriving an interpolant from a refutation of $A \land B$ generated by a theorem prover.
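For propositional A and B, one simple way to obtain an interpolant is to project A onto the shared variables: $\exists (\mathrm{vars}(A) \setminus \mathrm{vars}(B)).\, A$ is the strongest interpolant whenever $A \land B$ is unsatisfiable. The sketch below computes it with truth tables rather than from a refutation, so it illustrates the lemma, not the proof-based constructions mentioned above; all names are our own.

```python
from itertools import product


def assignments(names):
    """All truth assignments to the given variable names."""
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))


def strongest_interpolant(A, vars_a, vars_b):
    """Project A onto the shared variables: I(s) = exists local vars. A(s, local)."""
    shared = sorted(set(vars_a) & set(vars_b))
    local = sorted(set(vars_a) - set(vars_b))
    table = {}
    for s in assignments(shared):
        key = tuple(s[v] for v in shared)
        table[key] = any(A({**s, **l}) for l in assignments(local))

    def I(env):
        return table[tuple(env[v] for v in shared)]

    return I, shared


def is_interpolant(A, B, I, all_vars):
    """Check that A implies I and that I and B are jointly unsatisfiable."""
    for env in assignments(all_vars):
        if A(env) and not I(env):
            return False
        if I(env) and B(env):
            return False
    return True


def A(e):  # A = p and q
    return e["p"] and e["q"]


def B(e):  # B = not q and r; shares only q with A
    return (not e["q"]) and e["r"]


I, shared = strongest_interpolant(A, ["p", "q"], ["q", "r"])
print(shared, is_interpolant(A, B, I, ["p", "q", "r"]))  # ['q'] True
```

The result is the formula q: it follows from A, contradicts B, and mentions only the shared variable.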
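7 Interpolants for sequences
- Let $A_1 \ldots A_n$ be a sequence of formulas
- A sequence $\hat{A}_0 \ldots \hat{A}_n$ is an interpolant for $A_1 \ldots A_n$ when
  - $\hat{A}_0 = \mathit{True}$
  - $\hat{A}_{i-1} \land A_i \Rightarrow \hat{A}_i$, for $i = 1 \ldots n$
  - $\hat{A}_n = \mathit{False}$
  - and finally, $\hat{A}_i \in \mathcal{L}(A_1 \ldots A_i) \cap \mathcal{L}(A_{i+1} \ldots A_n)$
In other words, the interpolant is a structured refutation of $A_1 \ldots A_n$.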
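The conditions above can be checked mechanically for small propositional sequences. This sketch (hypothetical helper names, brute-force truth tables) takes the sequence $A_1 = p$, $A_2 = p \Rightarrow q$, $A_3 = \neg q$, which resembles an in-line program whose last step contradicts what the earlier steps establish, and confirms that True, p, q, False is a sequence interpolant for it.

```python
from itertools import product


def assignments(names):
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))


def is_sequence_interpolant(A, I, all_vars):
    """A: list of n (vocabulary, formula) pairs; I: list of n + 1 such pairs.
    Formulas are Python predicates over an assignment dict."""
    n = len(A)
    envs = list(assignments(all_vars))
    if not all(I[0][1](e) for e in envs):          # I_0 = True
        return False
    if any(I[n][1](e) for e in envs):              # I_n = False
        return False
    for i in range(1, n + 1):
        # I_{i-1} and A_i must imply I_i
        if any(I[i - 1][1](e) and A[i - 1][1](e) and not I[i][1](e) for e in envs):
            return False
        prefix = set().union(*(voc for voc, _ in A[:i]))
        suffix = set().union(*(voc for voc, _ in A[i:]))
        if not I[i][0] <= (prefix & suffix):       # I_i over common symbols only
            return False
    return True


A = [({"p"}, lambda e: e["p"]),
     ({"p", "q"}, lambda e: (not e["p"]) or e["q"]),
     ({"q"}, lambda e: not e["q"])]
I = [(set(), lambda e: True),
     ({"p"}, lambda e: e["p"]),
     ({"q"}, lambda e: e["q"]),
     (set(), lambda e: False)]
print(is_sequence_interpolant(A, I, ["p", "q"]))  # True
```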
8 Interpolants as Floyd-Hoare proofs
1. Each assertion, together with the next program step, implies the next assertion
2. Each is over common symbols of prefix and suffix
3. Begins with true, ends with false
9 FOCI: An Interpolating Prover
- Proof-generating decision procedure for quantifier-free FOL
  - Equality with uninterpreted function symbols
  - Theory of arrays
  - Linear rational arithmetic, integer difference bounds
- SAT Modulo Theories approach
  - Boolean reasoning performed by SAT solver
  - Exploits SAT relevance heuristics
- Quantifier-free interpolants from proofs
  - Linear-time construction [TACAS 04]
  - From Q-F interpolants, we can derive atomic predicates for predicate abstraction [Henzinger et al., POPL 04]
  - Allows counterexample-based refinement
- Integrated with software verification tools
  - Berkeley BLAST, Cadence IMPACT
10 Avoiding divergence
- Programs are infinite-state, so convergence to a fixed point is not guaranteed.
  - What would prevent us from computing an infinite sequence of interpolants, say, $x = 0$, $x = 1$, $x = 2$, ... as we unwind the loops further?
- Limited completeness result [TACAS 06]
  - Stratify the logical language L into a hierarchy of finite languages
  - Compute minimal interpolants in this hierarchy
  - If an inductive invariant proving the property exists in L, you must eventually converge to one
Interpolation provides a means of static analysis in abstract domains of infinite height. Though we cannot compute a least fixed point, we can compute a fixed point implying a given property if one exists.
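The stratification argument can be illustrated on the slide 4 program. In the sketch below, $L_k$ (a hypothetical choice for illustration) contains the linear atoms $c_1 x + c_2 y = c_0$ with $|c_i| \le k$, so each level is finite. Searching $L_1, L_2, \ldots$ in order must reach the level that contains an invariant if one exists. Two caveats: the search enumerates candidates instead of computing minimal interpolants, and inductiveness is only tested on a finite box of states, so this is a sanity check, not a proof.

```python
from itertools import product


def passes_on_box(inv, box):
    """Test inv as an invariant of the slide 4 program on states |x|, |y| <= box."""
    if not inv(0, 0):                              # initiation: x = y = 0
        return False
    pts = range(-box, box + 1)
    for x, y in product(pts, pts):
        if not inv(x, y):
            continue
        if not inv(x + 1, y + 1):                  # first loop: x++; y++
            return False
        if x != 0 and not inv(x - 1, y - 1):       # second loop: x--; y--
            return False
        if x == 0 and y != 0:                      # exit: assert(y == 0)
            return False
    return True


def stratified_search(max_k, box=5):
    """Search L_1, L_2, ...; L_k holds atoms c1*x + c2*y == c0 with |c_i| <= k."""
    for k in range(1, max_k + 1):
        for c1, c2, c0 in product(range(-k, k + 1), repeat=3):
            def inv(x, y, c1=c1, c2=c2, c0=c0):
                return c1 * x + c2 * y == c0
            if passes_on_box(inv, box):
                return k, (c1, c2, c0)
    return None


print(stratified_search(3))  # (1, (-1, 1, 0)), i.e. y - x == 0
```

The invariant found at level 1 is $x = y$, which no amount of further unwinding can displace.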
11 Expressiveness hierarchy
Parameterized abstract domain and the corresponding interpolant language, from most to least expressive:
- Canonical Heap Abstractions: $\forall$FO(TC)
- Indexed Predicate Abstraction: $\forall$FO
- Predicate Abstraction: QF
12 Need for quantified interpolants

    for(i = 0; i < N; i++)
      a[i] = i;
    for(j = 0; j < N; j++)
      assert(a[j] == j);

- Existing interpolating provers cannot produce quantified interpolants
- Problem: how to prevent the number of quantifiers from diverging in the same way that constants diverge when we unwind the loops?
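A quick way to see why a quantifier is needed is to execute the first loop, test the candidate invariant $\forall k.\,(0 \le k < i \Rightarrow a[k] = k)$ at every loop head, and compare it with the ground facts that a quantifier-free proof of a fixed unwinding would typically carry. This Python sketch is our own illustration; the function names are hypothetical, and executing the loop for one N is a test, not a proof.

```python
def run_first_loop(N):
    """Execute `for(i = 0; i < N; i++) a[i] = i;`, checking at each loop head the
    candidate quantified invariant  forall k. 0 <= k < i  =>  a[k] == k."""
    a = [None] * N          # initial contents are unknown; None stands for "unknown"
    i = 0
    while True:
        assert all(a[k] == k for k in range(i)), "invariant violated"
        if not i < N:
            break
        a[i] = i
        i += 1
    return a


def ground_facts(unwindings):
    """Quantifier-free facts for a fixed unwinding: one per array cell,
    so the list grows with the unwinding depth."""
    return [f"a[{k}] = {k}" for k in range(unwindings)]


a = run_first_loop(4)
assert all(a[j] == j for j in range(4))     # the second loop's assertions
print(ground_facts(3))  # ['a[0] = 0', 'a[1] = 1', 'a[2] = 2']
```

The single quantified formula covers every N, whereas the ground facts grow by one per unwinding.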
13 Need for Reachability

    ...
    node *a = create_list();
    while(a) {
      assert(alloc(a));
      a = a->next;
    }
    ...

invariant:
$\forall x\, (\mathit{rea}(next,a,x) \land x \neq nil \Rightarrow \mathit{alloc}(x))$
- This condition is needed to prove memory safety (no use after free).
- Cannot be expressed in FO
  - We need some predicate identifying a closed set of nodes that is allocated
- We require a theory of reachability (in effect, transitive closure)
Can we build an interpolating prover for full FOL that handles reachability and avoids divergence?
14 Clausal provers
- A clausal refutation prover takes a set of clauses and returns a proof of unsatisfiability (i.e., a refutation) if possible.
- A prover is based on inference rules of this form:
  $$\frac{P_1 \;\ldots\; P_n}{C}$$
  where $P_1 \ldots P_n$ are the premises and $C$ the conclusion.
- A typical inference rule is resolution, of which this is an instance:
  $$\frac{p(a) \qquad p(U) \Rightarrow q(U)}{q(a)}$$
- This was accomplished by unifying p(a) and p(U), then dropping the complementary literals.
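Resolution with unification can be sketched in a few lines of Python. The representation is our own assumption: constants are lowercase strings, variables are capitalized strings (as in p(U)), applications are tuples, and a clause is a list of (positive, atom) literals, so $p(U) \Rightarrow q(U)$ becomes the clause $\neg p(U) \lor q(U)$. The unifier includes an occurs check; the two clauses are assumed to have been renamed apart.

```python
def is_var(t):
    """Variables are capitalized names, as in p(U); constants are lowercase."""
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])


def unify(x, y, s):
    """Return a most general unifier extending substitution s, or None."""
    x, y = walk(x, s), walk(y, s)
    if x == y:
        return s
    if is_var(x):
        return None if occurs(x, y, s) else {**s, x: y}
    if is_var(y):
        return None if occurs(y, x, s) else {**s, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and x[0] == y[0] and len(x) == len(y):
        for a, b in zip(x[1:], y[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None


def subst(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, s) for a in t[1:])
    return t


def resolvents(c1, c2):
    """All binary resolvents of two variable-disjoint clauses."""
    out = []
    for i, (pos1, a1) in enumerate(c1):
        for j, (pos2, a2) in enumerate(c2):
            if pos1 == pos2:
                continue                      # need complementary literals
            s = unify(a1, a2, {})
            if s is not None:
                rest = c1[:i] + c1[i + 1:] + c2[:j] + c2[j + 1:]
                out.append([(pos, subst(atom, s)) for pos, atom in rest])
    return out


# p(a) and the clause  not p(U) or q(U)
print(resolvents([(True, ("p", "a"))], [(False, ("p", "U")), (True, ("q", "U"))]))
# [[(True, ('q', 'a'))]]
```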
15 Superposition calculus
- Modern FOL provers are based on the superposition calculus
  - Example superposition inference:
    $$\frac{Q(a) \qquad P \Rightarrow (a = c)}{P \Rightarrow Q(c)}$$
  - This is just substitution of equals for equals
- In practice this approach generates a lot of substitutions!
  - Use a reduction order to reduce the number of inferences
16 Reduction orders
- A reduction order $\succ$ is
  - a total, well-founded order on ground terms
  - subterm property: $f(a) \succ a$
  - monotonicity: $a \succ b$ implies $f(a) \succ f(b)$
- Example: Recursive Path Ordering with Status (RPOS)
  - start with a precedence on symbols: $a \succ b \succ c \succ f$
  - this induces a reduction ordering on ground terms:
    $f(f(a)) \succ f(a) \succ a \succ f(b) \succ b \succ c$
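RPO in which every symbol has lexicographic status is the lexicographic path ordering (LPO); for the unary f and the constants used here, lexicographic and multiset status agree. The following sketch implements LPO on ground terms (our own code; the term encoding and function name are assumptions) and checks the chain above under the precedence $a \succ b \succ c \succ f$.

```python
def lpo_greater(s, t, prec):
    """s > t in the lexicographic path ordering induced by `prec`
    (a dict from symbol to integer rank; higher rank = greater symbol).
    Ground terms are constants (str) or applications (symbol, arg1, ...)."""
    def head(u):
        return u[0] if isinstance(u, tuple) else u

    def args(u):
        return u[1:] if isinstance(u, tuple) else ()

    # (1) some argument of s is equal to or greater than t (subterm case)
    if any(si == t or lpo_greater(si, t, prec) for si in args(s)):
        return True
    f, g = head(s), head(t)
    # (2) greater head symbol: s must also dominate every argument of t
    if prec[f] > prec[g]:
        return all(lpo_greater(s, tj, prec) for tj in args(t))
    # (3) same head symbol: compare arguments lexicographically
    if f == g:
        for si, ti in zip(args(s), args(t)):
            if si != ti:
                return lpo_greater(si, ti, prec) and all(
                    lpo_greater(s, tj, prec) for tj in args(t))
    return False


prec = {"a": 4, "b": 3, "c": 2, "f": 1}   # a > b > c > f, as on the slide
chain = [("f", ("f", "a")), ("f", "a"), "a", ("f", "b"), "b", "c"]
print(all(lpo_greater(x, y, prec) for x, y in zip(chain, chain[1:])))  # True
```

Note that because the constant a outranks f, the ordering puts a above $f(b)$, $f(f(b))$, and so on; the order is still well-founded.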
17 Ordering Constraint
- Constrains rewrites to be downward in the reduction order:
  $$\frac{Q(a) \qquad P \Rightarrow (a = c)}{P \Rightarrow Q(c)}$$
  Example: this inference is only possible if $a \succ c$.
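The ordering constraint can be made concrete with a ground superposition step that is only performed when it rewrites downward. In this sketch (our own encoding, hypothetical names), a clause is a list of literals, an equation is ('=', l, r) and a negated atom is ('not', atom), so $P \Rightarrow (a = c)$ is the clause [('not', 'P'), ('=', 'a', 'c')]; the reduction order is passed in as a function.

```python
def replace(term, old, new):
    """Replace every occurrence of subterm `old` in `term` by `new`."""
    if term == old:
        return new
    if isinstance(term, tuple):
        return (term[0],) + tuple(replace(a, old, new) for a in term[1:])
    return term


def superpose(eq_clause, target, greater):
    """Ground superposition of a positive equation from eq_clause into target,
    allowed only when it rewrites downward (lhs > rhs in the reduction order)."""
    results = []
    for k, lit in enumerate(eq_clause):
        if not (isinstance(lit, tuple) and lit[0] == "="):
            continue
        side = eq_clause[:k] + eq_clause[k + 1:]
        _, l, r = lit
        for lhs, rhs in ((l, r), (r, l)):
            if not greater(lhs, rhs):          # the ordering constraint
                continue
            for i, t in enumerate(target):
                new = replace(t, lhs, rhs)
                if new != t:
                    results.append(side + target[:i] + [new] + target[i + 1:])
    return results


RANK = {"a": 2, "c": 1}                        # a > c


def greater(s, t):
    return RANK[s] > RANK[t]


print(superpose([("not", "P"), ("=", "a", "c")], [("Q", "a")], greater))
# [[('not', 'P'), ('Q', 'c')]]   i.e.  P => Q(c)
```

With the ranks reversed ($c \succ a$), the same call returns no conclusions, because the only rewrite that changes Q(a) would go upward.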
18 Local Proofs
- A proof is local for a pair of clause sets (A,B) when every inference step uses only symbols from A or only symbols from B.
- From a local refutation of (A,B), we can derive an interpolant for (A,B) in linear time.
- This interpolant is a Boolean combination of formulas in the proof.
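19 Reduction orders and locality
- A reduction order is oriented for (A,B) when
  - $s \succ t$ for every $s \notin \mathcal{L}(B)$ and $t \in \mathcal{L}(B)$
- Intuition: rewriting eliminates first A variables, then B variables.
Example, with the oriented precedence $x \succ y \succ c \succ d \succ f$:
  $$\frac{x = y \qquad f(x) = c}{f(y) = c} \qquad \frac{f(y) = c \qquad f(y) = d}{c = d} \qquad \frac{c = d \qquad c \neq d}{\bot}$$
Local!!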
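Locality is a purely syntactic check on each inference. The sketch below (our own encoding) replays the three steps shown above, taking A = {x = y, f(x) = c} and B = {f(y) = d, c ≠ d}; that split between A and B is our reading of the figure. Each step stays inside the vocabulary of A or of B, and the only formula passed from the A part to the B part, f(y) = c, is over the shared symbols and is an interpolant for (A, B).

```python
LOGICAL = {"=", "!=", "false"}


def symbols(f):
    """Non-logical symbols of a formula written as nested tuples and strings."""
    if isinstance(f, tuple):
        return set().union(*(symbols(a) for a in f))
    return set() if f in LOGICAL else {f}


def is_local(proof, sym_a, sym_b):
    """Every step (premises, conclusion) must stay inside L(A) or inside L(B)."""
    for premises, conclusion in proof:
        used = set().union(*(symbols(p) for p in premises)) | symbols(conclusion)
        if not (used <= sym_a or used <= sym_b):
            return False
    return True


A = [("=", "x", "y"), ("=", ("f", "x"), "c")]
B = [("=", ("f", "y"), "d"), ("!=", "c", "d")]
sym_a = set().union(*(symbols(f) for f in A))
sym_b = set().union(*(symbols(f) for f in B))
proof = [
    ([A[0], A[1]], ("=", ("f", "y"), "c")),              # only A symbols
    ([("=", ("f", "y"), "c"), B[0]], ("=", "c", "d")),   # only B symbols
    ([("=", "c", "d"), B[1]], "false"),                  # only B symbols
]
print(is_local(proof, sym_a, sym_b))  # True
```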
20 Orientation is not enough
Example (figure): A contains Q(a) and a = c; B contains b = c and Q(b); precedence $Q \succ a \succ b \succ c$.
- Local superposition gives only c = c.
- Solution: replace non-local superposition with two inferences
  - The second inference can be postponed until after resolving with Q(b)
This procrastination step is an example of a reduction rule, and preserves completeness.
21 Completeness of local inference
- Thm: Local superposition with procrastination is complete for refutation of pairs (A,B) such that
  - (A,B) has a universally quantified interpolant
  - The reduction order is oriented for (A,B)
- This gives us a complete method for generating universally quantified interpolants for arbitrary first-order formulas!
- This is easily extensible to interpolants for sequences of formulas, hence we can use the method to generate Floyd/Hoare proofs for in-line programs.
22 Avoiding Divergence
- As argued earlier, we still need to prevent interpolants from diverging as we unwind the program further.
- Idea: stratify the clause language
  Example: Let $L_k$ be the set of clauses with at most k variables and nesting depth at most k. Note that each $L_k$ is a finite language (over a finite vocabulary, up to renaming of variables).
- Stratified saturation prover
  - Initially let k = 1
  - Restrict prover to generate only clauses in $L_k$
  - When prover saturates, increase k by one and continue
The stratified prover is complete, since every proof is contained in some $L_k$.
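The language $L_k$ can be computed directly from a clause. The sketch below uses our own term encoding (capitalized strings are variables, tuples are applications) and takes the nesting depth of a clause to be the largest depth of its argument terms, not counting the predicate symbol; that convention is an assumption. It shows how the constants 0, succ(0), succ(succ(0)), ... climb the hierarchy, so a prover restricted to $L_k$ cannot generate them without bound.

```python
def is_var(t):
    """Variables are capitalized names (U, X, ...); constants such as 0 or nil are not."""
    return isinstance(t, str) and t[:1].isupper()


def term_vars(t):
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(term_vars(a) for a in t[1:]))
    return set()


def term_depth(t):
    """Constants and variables have depth 0; f(t1, ..., tn) adds one level."""
    if isinstance(t, tuple):
        return 1 + max((term_depth(a) for a in t[1:]), default=0)
    return 0


def clause_vars(clause):
    return set().union(*(term_vars(atom) for _, atom in clause))


def clause_depth(clause):
    # The predicate symbol itself is not counted, only its argument terms.
    return max((term_depth(a) for _, atom in clause for a in atom[1:]), default=0)


def in_L(clause, k):
    """Membership in L_k: at most k variables and nesting depth at most k."""
    return len(clause_vars(clause)) <= k and clause_depth(clause) <= k


def restrict(clauses, k):
    """The clauses a stratified prover at level k is allowed to keep."""
    return [c for c in clauses if in_L(c, k)]


def level(clause):
    """Smallest k >= 1 with clause in L_k."""
    return max(1, len(clause_vars(clause)), clause_depth(clause))


c1 = [(True, ("P", "0"))]
c2 = [(True, ("P", ("succ", ("succ", "0"))))]
c3 = [(False, ("p", "U")), (True, ("q", ("f", "U", "V")))]
print([level(c) for c in (c1, c2, c3)])  # [1, 2, 2]
```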
23 Completeness for universal invariants
- Lemma: For every safety program M with a $\forall$ safety invariant, and every stratified saturation prover P, there exists an integer k such that P refutes every unwinding of M in $L_k$, provided
  - The reduction ordering is oriented properly
- This means that as we unwind further, eventually all the interpolants are contained in $L_k$, for some k.
- Theorem: Under the above conditions, there is some unwinding of M for which the interpolants generated by P contain a safety invariant for M.
This means we have a complete procedure for finding universally quantified safety invariants whenever these exist!
24 In practice
- We have proved theoretical convergence. But does the procedure converge in practice in a reasonable time?
- Modify SPASS, an efficient superposition-based saturation prover:
  - Generate oriented precedence orders
  - Add procrastination rule to SPASS's reduction rules
  - Drop all non-local inferences
  - Add stratification (SPASS already has something similar)
- Add axiomatizations of the necessary theories
  - An advantage of a full FOL prover is that we can add axioms!
  - As argued earlier, we need a theory of arrays and reachability (TC)
  - Since this theory is not finitely axiomatizable, we use an incomplete axiomatization that is intended to handle typical operations in list-manipulating programs
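One way to generate precedences that are oriented for every split of a sequence $A_1 \ldots A_n$ is to rank each symbol by the last formula it occurs in: symbols that drop out of the vocabulary earlier are made greater. The sketch below is our own construction for illustration, not necessarily what the modified SPASS does, and includes a checker for the orientation condition at every split.

```python
def oriented_precedence(vocabs):
    """vocabs[i] is the set of symbols of A_{i+1}. Rank symbols by the last formula
    they occur in: the earlier a symbol drops out, the greater it is."""
    last = {}
    for i, vocab in enumerate(vocabs):
        for sym in vocab:
            last[sym] = i
    return sorted(last, key=lambda s: (last[s], s))   # greatest first


def oriented_for_all_splits(prec, vocabs):
    """For each split (A_1..A_i, A_{i+1}..A_n), every symbol outside the suffix
    vocabulary must be greater than every symbol inside it."""
    rank = {s: r for r, s in enumerate(prec)}        # smaller rank = greater symbol
    for i in range(1, len(vocabs)):
        suffix = set().union(*vocabs[i:])
        outside = [rank[s] for s in prec if s not in suffix]
        inside = [rank[s] for s in prec if s in suffix]
        if outside and inside and max(outside) > min(inside):
            return False
    return True


vocabs = [{"x", "y"}, {"y", "z"}, {"z", "w"}]
prec = oriented_precedence(vocabs)
print(prec, oriented_for_all_splits(prec, vocabs))  # ['x', 'y', 'w', 'z'] True
```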
25 Partially Axiomatizing FO(TC)
- Axioms of the theory of arrays (with select and update, i.e., store)
  $\forall (A, I, V)\, (\mathit{select}(\mathit{update}(A,I,V), I) = V)$
  $\forall (A,I,J,V)\, (I \neq J \Rightarrow \mathit{select}(\mathit{update}(A,I,V), J) = \mathit{select}(A,J))$
- Axioms for reachability (rea)
  $\forall (L,E)\, \mathit{rea}(L,E,E)$
  $\forall (L,E,X)\, (\mathit{rea}(L,\mathit{select}(L,E),X) \Rightarrow \mathit{rea}(L,E,X))$
    if e->link reaches x then e reaches x
  $\forall (L,E,X)\, (\mathit{rea}(L,E,X) \Rightarrow E = X \lor \mathit{rea}(L,\mathit{select}(L,E),X))$
    if e reaches x then e = x or e->link reaches x
  etc...
Since FO(TC) has no complete axiomatization, these axioms must be incomplete.
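The axioms can be sanity-checked in small finite models. In the sketch below (our own code; the domain size is arbitrary), arrays and link fields are tuples over a three-element domain, and rea is computed by following links. Finite model checking is not a proof of validity, but it does show the incompleteness concretely: the relation that is true everywhere also satisfies the three rea axioms shown, even though it is not reachability.

```python
from itertools import product

D = range(3)   # a tiny finite domain of indices, values and nodes


def select(a, i):
    return a[i]


def update(a, i, v):
    return a[:i] + (v,) + a[i + 1:]


def rea(l, e, x):
    """x is reachable from e by following links l (zero or more steps)."""
    seen = set()
    while e not in seen:
        if e == x:
            return True
        seen.add(e)
        e = select(l, e)
    return False


def array_axioms_hold():
    for a in product(D, repeat=len(D)):
        for i, j, v in product(D, D, D):
            b = update(a, i, v)
            if select(b, i) != v:
                return False
            if i != j and select(b, j) != select(a, j):
                return False
    return True


def rea_axioms_hold(reach):
    for l in product(D, repeat=len(D)):
        for e, x in product(D, D):
            if not reach(l, e, e):
                return False
            if reach(l, select(l, e), x) and not reach(l, e, x):
                return False
            if reach(l, e, x) and not (e == x or reach(l, select(l, e), x)):
                return False
    return True


def everything(l, e, x):
    """Not reachability, yet it satisfies the three rea axioms above."""
    return True


print(array_axioms_hold(), rea_axioms_hold(rea), rea_axioms_hold(everything))
# True True True
l = (0, 1, 2)                                 # every node links to itself
print(rea(l, 0, 1), everything(l, 0, 1))      # False True
```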
26 Simple example

    for(i = 0; i < N; i++)
      a[i] = i;
    for(j = 0; j < N; j++)
      assert(a[j] == j);

27 Unwinding simple example
Note: stratification prevents constants from diverging as 0, succ(0), succ(succ(0)), ...
28 List deletion example

    a = create_list();
    while(a) {
      tmp = a->next;
      free(a);
      a = tmp;
    }

- Invariant synthesized with 3 unwindings (after some simplification):
  $\mathit{rea}(next,a,nil)$
  $\forall x\, (\mathit{rea}(next,a,x) \Rightarrow x = nil \lor \mathit{alloc}(x))$
- That is, a is acyclic, and every cell reachable from a is allocated
- Note that interpolation can synthesize Boolean structure.
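The synthesized invariant can be tested by running the deletion loop on a concrete list and checking both conjuncts at every loop head. This is a minimal Python sketch under our own heap model (a dictionary for next, a set of allocated cells, None for nil); the helper names are hypothetical, and a run on one list is a test, not a proof.

```python
NIL = None


def create_list(n):
    """Build an n-cell list; returns (head, next map, set of allocated cells)."""
    nxt, alloc, head = {}, set(), NIL
    for cell in range(n):
        nxt[cell] = head
        alloc.add(cell)
        head = cell
    return head, nxt, alloc


def reach(nxt, a):
    """Cells reachable from a, in order; includes NIL if the walk reaches it."""
    seen = []
    while a not in seen:
        seen.append(a)
        if a is NIL:
            break
        a = nxt[a]
    return seen


def invariant(a, nxt, alloc):
    r = reach(nxt, a)
    acyclic = NIL in r                                  # rea(next, a, nil)
    all_alloc = all(x is NIL or x in alloc for x in r)  # reachable => nil or alloc
    return acyclic and all_alloc


def delete_list(n):
    """Run the deletion loop, checking the invariant at every loop head."""
    a, nxt, alloc = create_list(n)
    heads_checked = 0
    while True:
        assert invariant(a, nxt, alloc)
        heads_checked += 1
        if a is NIL:
            break
        tmp = nxt[a]
        alloc.discard(a)   # free(a)
        a = tmp
    return heads_checked, alloc


print(delete_list(3))  # (4, set())
```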
29 More small examples
This shows that divergence can be controlled. But can we scale to large programs?...
30 Canonical abstraction
- Abstraction replaces concrete heaps with abstract symbolic heaps
  - Abstraction parameterized by instrumentation predicates
- Abstract heap represents infinite class of concrete heaps
  - Summary node represents equivalence class of concrete nodes
  - Dotted arcs mean "may point to"
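A simplified version of canonical abstraction can be sketched as follows: concrete cells that agree on every unary predicate are merged into one abstract node, a class with more than one cell becomes a summary node, and an abstract edge is definite (1) when every pair of members is linked and "may" (1/2) when only some are. This is our own illustration, not TVLA or the analysis in the talk; it handles only unary predicates and ignores instrumentation predicates such as reachability.

```python
def canonical_abstraction(nodes, preds, nxt):
    """Merge concrete nodes that agree on every unary predicate.

    nodes: concrete cells; preds: predicate name -> set of cells where it holds;
    nxt: cell -> successor cell (or None). Returns (classes, summary, edges)."""
    names = sorted(preds)

    def key(n):
        return tuple(n in preds[p] for p in names)

    classes = {}
    for n in nodes:
        classes.setdefault(key(n), []).append(n)
    # A class with more than one concrete cell becomes a summary node.
    summary = {k for k, members in classes.items() if len(members) > 1}
    edges = {}
    for k1, m1 in classes.items():
        for k2, m2 in classes.items():
            pairs = [(u, v) for u in m1 for v in m2]
            hits = sum(1 for u, v in pairs if nxt.get(u) == v)
            if hits == len(pairs):
                edges[(k1, k2)] = 1      # definite edge (solid arc)
            elif hits > 0:
                edges[(k1, k2)] = 0.5    # "may point to" (dotted arc)
    return classes, summary, edges


# The list 2 -> 1 -> 0 -> NULL, with a pointing at cell 2.
classes, summary, edges = canonical_abstraction(
    [0, 1, 2], {"alloc": {0, 1, 2}, "pt_a": {2}}, {2: 1, 1: 0, 0: None})
print(classes)   # {(True, False): [0, 1], (True, True): [2]}
print(edges)
```

Cells 0 and 1 become one summary node, and both the edge from a's cell into it and its self-loop are "may" edges, which is how one abstract heap stands for lists of every length.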
31 Example program

    node *create_list() {
      node *l = NULL;
      while(*) {
        node *n = malloc(...);
        n->next = l;
        l = n;
      }
      return l;
    }

    main() {
      node *a = create_list();
      while(a) {
        assert(alloced(a));
        a = a->next;
      }
    }

- Want to prove this program does not access a freed cell.
32 Canonical Abstraction
- Predicates: Pt_a, Rea_a, is_null, alloc
- Relations: next
[Figure: three abstract heaps for a, labelled with Pt_a, Rea_n, is_null and alloc]
All three abstract heaps verify the property!
33 A slightly larger program

    main() {
      node *a = create_list();
      node *b = create_list();
      node *c = create_list();
      node *p = * ? a : * ? b : c;
      while(p) {
        assert(alloced(p));
        p = p->next;
      }
    }

- We have to track a, b and c to prove this property
- Let's look at what happens with canonical heap abstractions...
34 After creating a
- Predicates: Pt_a, Rea_a, is_null, alloced
- Relations: next
35 After creating b
36 After creating c
Picture 27 abstract heaps here.
Problem: abstraction scales exponentially with the number of independent data structures.
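The count of 27 follows from taking every combination of per-list shapes. The sketch below assumes three abstract heaps per list, as in the single-list case above, with hypothetical labels, and compares the joint domain with three separate analyses.

```python
from itertools import product

# Hypothetical labels for the three abstract heaps one list can have.
SHAPES = ("null", "single cell", "cell plus summary node")


def joint_domain_size(num_lists):
    """One combined shape graph per combination of per-list shapes."""
    return len(list(product(SHAPES, repeat=num_lists)))


def independent_size(num_lists):
    """Separate analyses: each list keeps its own three shapes."""
    return num_lists * len(SHAPES)


print(joint_domain_size(3), independent_size(3))  # 27 9
```

The joint count grows as $3^n$ in the number of lists, while the separate analyses grow as $3n$.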
37 Independent analyses
- Suppose we do a Cartesian product of 3 independent analyses for a, b, c.
- How do we know we can decompose the analysis in this way and prove the property?
- What if some correlations are needed between the analyses?
- For non-heap properties, one good answer is to compute interpolants.
38 Abstraction from interpolants

    main() {
      node *a = create_list();
      node *b = create_list();
      node *c = create_list();
      node *p = * ? a : * ? b : c;
      while(p) {
        assert(alloced(p));
        p = p->next;
      }
    }

- Interpolants contain inductive invariants after unrolling loops 3 times.
- Interpolant after creating c:
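39 Shape of the interpolant
$(a \neq 0 \Rightarrow \mathit{alloced}(a)) \land (b \neq 0 \Rightarrow \mathit{alloced}(b)) \land (c \neq 0 \Rightarrow \mathit{alloced}(c))$
$\land\ \forall x.\, (x \neq 0 \land \mathit{alloced}(x) \Rightarrow \mathit{alloced}(\mathit{next}(x)))$
[Figure: lists a, b and c, linked by next, inside the alloced region, ending in null]
- The invariant says that allocated cells are closed under the next relation
- Notice also that the size of this formula is linear in the number of lists, not exponential as is the set of shape graphs.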
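The interpolant can be evaluated on a concrete heap with three lists. In this sketch (our own heap model, hypothetical names), address 0 is null, and the closure conjunct is checked in the guarded form $\mathit{next}(x) = 0 \lor \mathit{alloced}(\mathit{next}(x))$; the guard is our assumption about how the null successor of the last cell is treated. The check uses one conjunct per list plus one shared closure conjunct, so its size grows linearly with the number of lists.

```python
NULL = 0   # address 0 plays the role of the null pointer, as in the formula


def make_lists(lengths):
    """Build disjoint lists of the given lengths; returns (heads, next map, alloced)."""
    nxt, alloced, heads, addr = {}, set(), [], 1
    for n in lengths:
        head = NULL
        for _ in range(n):
            nxt[addr] = head
            alloced.add(addr)
            head = addr
            addr += 1
        heads.append(head)
    return heads, nxt, alloced


def interpolant_holds(heads, nxt, alloced):
    # One conjunct per list: h != 0 => alloced(h) ...
    heads_ok = all(h == NULL or h in alloced for h in heads)
    # ... plus one shared closure conjunct, guarded by next(x) != 0 (our assumption).
    closed = all(nxt[x] == NULL or nxt[x] in alloced for x in nxt if x in alloced)
    return heads_ok and closed


def traverse(p, nxt, alloced):
    """The loop  while(p) { assert(alloced(p)); p = p->next; }"""
    while p != NULL:
        if p not in alloced:
            return False
        p = nxt[p]
    return True


heads, nxt, alloced = make_lists([2, 3, 1])
print(interpolant_holds(heads, nxt, alloced),
      all(traverse(h, nxt, alloced) for h in heads))  # True True
```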
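40 Suggests decomposition
$(a \neq 0 \Rightarrow \mathit{alloced}(a)) \land (b \neq 0 \Rightarrow \mathit{alloced}(b)) \land (c \neq 0 \Rightarrow \mathit{alloced}(c))$
$\land\ \forall x.\, (x \neq 0 \land \mathit{alloced}(x) \Rightarrow \mathit{alloced}(\mathit{next}(x)))$
Canonical abstract domains (predicates; relations):
- a = 0, alloced(n)
- b = 0, alloced(n)
- c = 0, alloced(n)
- n = 0, alloced(n); relations: next
- Each of these analyses proves one conjunct of the invariant.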
41 Conclusion
- Interpolants and invariant generation
  - Computing interpolants from proofs allows us to generalize from special cases such as loop-free unwindings
  - Interpolation can extract relevant facts from proofs of these special cases
  - Must avoid divergence
- Quantified invariants
  - Needed for programs that manipulate arrays or heaps
  - FO equality prover modified to produce local proofs (hence interpolants)
  - Complete for universal invariants
  - Can be used to construct invariants of simple array- and list-manipulating programs, using a partial axiomatization of FO(TC)
  - Language stratification prevents divergence
  - Might be used as a relevance heuristic for shape analysis, IPA
For this approach to work in practice, we need FO provers with strong relevance heuristics as in DPLL...