Title: Applying First-Order Theorem Provers in Formal Software Safety Certification
1 Applying First-Order Theorem Provers in Formal Software Safety Certification
Joint work with E. Denney and J. Schumann, NASA Ames Research Center
2 Disclaimer
Keep in mind that
- we are ATP users, not developers
- we are not looking for proofs but for assurance
- we consider ATPs a necessary evil :-)
3 Code Generator DSL Compiler
Initial model (DSL program)
model landsat as 'Landsat Clustering'.
const nat N as 'number of pixels'.
const nat B as 'number of bands'.
const nat C := 5 as 'number of classes' where C << N.
double phi(1..N) as 'class weights' where 1 = sum(I := 1..C, phi(I)).
double mu(1..C), sig(1..C) where 0 < sig(_).
int c(1..N) as 'class assignments'.
c(_) ~ discrete(phi).
data double x(1..N, 1..B) as 'pixels'.
x(I,_) ~ gauss(mu(c(I)), sig(c(I))).
max pr(x | phi,mu,sig) for phi,mu,sig.
- Ground cover map
- multiple Landsat-bands
- estimate pixel classes
- estimate class parameters
- Implementation problems
- which model?
- which algorithm?
- efficient C/C++ code?
- correctness?
Model refinements
sig(_) ~ invgamma(delta/2+1, sig0*delta/2).
Model changes
x(I,_) ~ cauchy(mu(c(I)), sig(c(I))).
x(I,_) ~ mix(c(I) cases 1 -> gauss(0, error), _ -> cauchy(mu(c(I)), sig(c(I)))).
5 Code Generator DSL Compiler
- Generated program
- 1 sec. generation time
- 600 lines
- 130 leverage
- fully documented
- deeply nested loops
- complex calculations
- correct-by-construction
6 Generator Assurance
- Should you trust a code generator?
- Correctness of the generated code depends on correctness of the generator
- Correctness of the generator is difficult to show
- very large
- very complicated
- very dynamic
- So what do you do?
7 Generator Assurance
- Should you trust a code generator?
- Correctness of the generated code depends on correctness of the generator
- Correctness of the generator is difficult to show
- very large
- very complicated
- very dynamic
- So what???
- Don't care whether the generator is buggy for other people, as long as it works for me now!
- ⇒ Certifiable Code Generation (PCC for code generators)
8 Certifiable Code Generation
- Generator is extended to support post-generation verification
- certify generated programs, not the generator
- minimizes trusted component base
- no need to re-certify generator
- use standard program verification techniques
- annotations (e.g., invariants) give hints only
- proofs are independently verifiable evidence (certificates)
- keeps certification independent from code generation
- focus on specific safety properties
- keeps annotation generation and certification tractable
... // Initialization
for(v44=0; v44<=n-1; v44++)
  for(v45=0; v45<=c-1; v45++)
    q(v44,v45)=0;
for(v46=0; v46<=n-1; v46++)
  q(v46,z(v46))=1;
...
for(v12=0; v12<=n-1; v12++)
  for(v13=0; v13<=c-1; v13++)
    v68 = 0;
    for(v41=0; v41<=c-1; v41++)
      v68 += exp((x(v12)-mu(v41)) * (x(v12)-mu(v41)) / (double)(-2) / ...) ...
model mog as 'Mixture of Gaussians'.
...
% Class probabilities
double rho(1..c).
where 1 = sum(I := 1..c, rho(I)).
% Class parameters
double mu(1..c).
double sigma(1..c).
where 0 < sigma(_).
% Hidden variable
nat z(1..n) ~ discrete(rho).
% Data
data double x(1..n).
x(I) ~ gauss(mu(z(I)), sigma(z(I))).
% Goal
max pr(x | rho,mu,sigma) for rho,mu,sigma.
(Figure labels: Model, Code, Proofs)
9 Hoare-Style Certification Framework
- Safety property: formal characterization of aspects of intuitively safe programs
- introduce shadow variables to record safety information
- extend operational semantics by effects on shadow variables
- define semantic safety judgements on expressions/statements
- Safety policy: proof rules designed to show that safety property holds for program
- extend Hoare-rules by safety predicate and shadow variables
- ⇒ prove soundness and completeness (offline, manual)
12 Certification Framework
- Safety property: formal characterization of aspects of intuitively safe programs
- "All automatic variables shall have been assigned a value before being used" (MISRA 9.1)
- Formal:
- introduce shadow variables to record safety information
- extend operational semantics by effects on shadow variables
- define semantic safety judgements on expressions/statements
- prove safety reduction (i.e., consistency of safety property)
- ⇒ safe programs don't go wrong
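The shadow-variable idea can be illustrated with a small dynamic sketch for the init property. This is only an assumption-laden illustration: the framework in the slides establishes safety statically through Hoare-style proof rules, whereas the hypothetical `State` class below checks it at run time. All names (`State`, `SafetyViolation`, `run_demo`) are invented for this example.

```python
from dataclasses import dataclass, field

INIT, UNINIT = "INIT", "UNINIT"


class SafetyViolation(Exception):
    """Raised when a variable is read before it has been assigned."""


@dataclass
class State:
    values: dict = field(default_factory=dict)  # ordinary program state
    shadow: dict = field(default_factory=dict)  # shadow state: one init flag per variable

    def declare(self, name):
        # A declaration creates the variable but leaves it uninitialized.
        self.values[name] = None
        self.shadow[name] = UNINIT

    def assign(self, name, value):
        # The extended semantics updates the shadow variable on every assignment.
        self.values[name] = value
        self.shadow[name] = INIT

    def read(self, name):
        # Safety judgement for a variable read: the shadow flag must be INIT.
        if self.shadow.get(name) != INIT:
            raise SafetyViolation(f"{name} read before initialization")
        return self.values[name]


def run_demo():
    s = State()
    s.declare("x")
    s.declare("y")
    s.assign("x", 1)
    ok = s.read("x") + 1          # safe: x was assigned
    try:
        s.read("y")               # unsafe: y was only declared
        violated = False
    except SafetyViolation:
        violated = True
    return ok, violated
```

A static certifier would instead generate a proof obligation of the form "the shadow flag of x is INIT" at every read and hand it to the prover.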
13 Certification Framework
- Safety policy: proof rules designed to show that safety property holds for program
- responsible for
- maintenance of shadow variables
- construction of safety obligations
- extend Hoare-rules by safety predicate and shadow variables
14 Safety Properties
- Language-specific properties (similar to PCC)
- array indices within bounds (array): $\forall\, a[i] \in c : a_{lo} \le i \le a_{hi}$
- variable initialization before use (init): $\forall\, \mathrm{rvar}\ x \in c : x_{init} = \mathrm{INIT}$
- nil-pointer dereference, ...
- Domain-specific properties
- matrix symmetry (symm): $\forall\, \mathrm{covar}\ m \in c : \forall i,j : m_{i,j} = m_{j,i}$
- covariance matrices known by code generator
- can insert annotations
- vector norm, coordinate frame safety, ...
15 Certification Architecture
- standard PCC architecture
- organically grown job control
- run scripts based on SystemOnTPTP
- dynamic axiom generation (sed / awk)
- dynamic axiom selection (based on problem names)
17 Lesson 1: Things don't go wrong in the ATP
- Most errors are in the application, axiomatization, or integration
- interface between application and ATP = proof task
- application debugging = task proving / refuting
- application errors difficult to detect
- application must provide full axiomatization (axioms / lemmas)
- no standards ⇒ no reuse
- consistency difficult to ensure manually
- integration needs generally supported job control language
- better than SystemOnTPTP + shell scripts + C pre-processor
- applications need more robust ATPs
- better error checking: free variables, ill-sorted terms, ...
- consistency checking more important than proof checking
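One of the error checks asked for above, detecting free variables in a generated proof task, is simple to do on the application side. The sketch below assumes a hypothetical nested-tuple formula representation (`("var", name)`, `("app", symbol, [args])`, `("forall", var, body)`, and so on); it is not the input format of any particular ATP.

```python
def free_vars(f, bound=frozenset()):
    """Return the set of variable names occurring free in formula or term f."""
    tag = f[0]
    if tag == "var":                      # ("var", "X")
        return set() if f[1] in bound else {f[1]}
    if tag in ("forall", "exists"):       # ("forall", "X", body)
        return free_vars(f[2], bound | {f[1]})
    if tag == "app":                      # ("app", "leq", [arg1, arg2])
        out = set()
        for arg in f[2]:
            out |= free_vars(arg, bound)
        return out
    # Remaining tags are connectives: ("not", f), ("and", f, g), ("implies", f, g), ...
    out = set()
    for sub in f[1:]:
        out |= free_vars(sub, bound)
    return out


# forall I. (leq(zero, I) => leq(I, N)), where N was accidentally left unbound
task = ("forall", "I",
        ("implies",
         ("app", "leq", [("app", "zero", []), ("var", "I")]),
         ("app", "leq", [("var", "I"), ("var", "N")])))
```

Calling `free_vars(task)` flags `N`, which the application can report before the task is ever sent to a prover.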
18 Lesson 2: TPTP ≠ Real World
- Applications and benchmarks have different profiles
- full (typed) FOF vs. CNF, UEQ, HNE, EPR, ...
- clausification is integral part of ATP
- problems in subclasses are rare (almost accidental)
- ATPs need to work on full FOF (⇒ branch to specialized solvers hidden)
- task stream vs. single task: one problem = many tasks
- success only if all tasks proven
- most tasks relatively simple, but often large
- most problems contain hard tasks
- background theory remains stable
- ATPs need to minimize overhead (⇒ batch mode)
- ATPs should be easily tunable to application domain
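The difference between the task-oriented and the problem-oriented view ("success only if all tasks proven") can be made concrete with a short calculation. The data below is invented purely for illustration and does not come from the experiments behind these slides; `success_rates` is a hypothetical helper.

```python
def success_rates(results):
    """results maps a problem name to a list of booleans, one per proof task.

    Returns (task_rate, problem_rate): the fraction of tasks proven, and the
    fraction of problems for which every task was proven.
    """
    all_tasks = [ok for oks in results.values() for ok in oks]
    task_rate = sum(all_tasks) / len(all_tasks)
    problem_rate = sum(all(oks) for oks in results.values()) / len(results)
    return task_rate, problem_rate


# Illustrative data only: each problem has a few easy tasks and maybe one hard one.
example = {
    "p1": [True, True, True, False],
    "p2": [True, True],
    "p3": [True, True, True, False],
}
task_rate, problem_rate = success_rates(example)
```

Here eight of ten tasks are proven, yet only one of three problems is certified, which is why a single hard task per problem dominates the outcome.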
19 Lesson 2: TPTP ≠ Real World
- characteristics (and results) vary with policy
- array: partial orders and ground arithmetic
- significant fraction of tasks solved, but not problems
- most tasks relatively simple, most problems contain hard tasks
- problem-oriented view magnifies ATP differences
(Figure panel: array)
20 Lesson 2: TPTP ≠ Real World
- characteristics (and results) vary with policy
- array: partial orders and ground arithmetic
- init: deeply nested select/update terms
- completely overwhelms ATPs
- response times determined by Tmax (60 secs.)
(Figure panels: array, init)
21 Lesson 2: TPTP ≠ Real World
- characteristics (and results) vary with policy
- array: partial orders and ground arithmetic
- init: deeply nested select/update terms
- symm: deeply nested select/update terms (but trickier context...)
- completely overwhelms ATPs
- ATPs only solve trivial VCs
(Figure panels: array, init, symm)
22 Lesson 2: TPTP ≠ Real World
- characteristics (and results) vary with policy
- array: partial orders and ground arithmetic
- init: deeply nested select/update terms
- symm: deeply nested select/update terms (but trickier context...)
- array and init are inherently simple
- should be solvable by any competitive ATP (low-hanging fruit)
- TPTP overtuning?
(Figure panels: array, init, symm)
23 Lesson 3: Need controlled simplification
- Applications generate large conjectures with redundancies
- propositional: true ? x, x ? true, ...
- arithmetics: 65, minus(6, 1), ...
- Hoare-logic: select(update(update(update(x,0,0),1,1),2,2),0)
- can frequently be evaluated / simplified before proving: rewriting beats resolution
- ATPs should provide (user-controlled) simplification mode
- ground-evaluation of built-in functions / predicates
- orientation of equalities / equivalences
- here: hand-crafted rewrite-based simplifications
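A minimal sketch of such a rewrite-based simplifier is given below. It is not the simplifier used in the work described here: the nested-tuple term representation and the operator names `eq`, `and`, and `implies` are assumptions made for the example, while `minus`, `select`, and `update` follow the terms shown on the slide. Arrays are one-dimensional and only ground (numeric) indices are evaluated.

```python
def is_num(t):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(t, int) and not isinstance(t, bool)


def simplify(t):
    """Bottom-up simplification of a term given as a nested tuple."""
    if not isinstance(t, tuple):
        return t                          # number, truth value, or variable name
    op, *args = t
    args = [simplify(a) for a in args]
    if op == "minus" and all(map(is_num, args)):
        return args[0] - args[1]          # ground arithmetic
    if op == "eq" and all(map(is_num, args)):
        return args[0] == args[1]         # ground equality
    if op == "and":
        a, b = args
        if a is False or b is False:
            return False
        if a is True:
            return b
        if b is True:
            return a
    if op == "implies":
        a, b = args
        if a is False or b is True:
            return True
        if a is True:
            return b
    if op == "select":
        arr, j = args
        # Walk through nested updates while both indices are ground numbers.
        while (isinstance(arr, tuple) and arr[0] == "update"
               and is_num(j) and is_num(arr[2])):
            _, inner, i, v = arr
            if i == j:
                return v
            arr = inner
        return ("select", arr, j)
    return (op, *args)


nested = ("select",
          ("update", ("update", ("update", "x", 0, 0), 1, 1), 2, 2),
          0)
```

With these rules `simplify(nested)` collapses the Hoare-logic example to the constant `0` without any proof search, and `simplify(("minus", 6, 1))` evaluates to `5`.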
24 Lesson 3: Need controlled simplification
- propositional simplification, min-scoping, splitting into VCs
- evaluation of ground arithmetics
- array: good w/ E / Vampire, neutral w/ Spass, mixed w/ Equinox
- init: floods ATPs with hard VCs (other ATPs killed after hours)
- symm: splits hard VCs, no benefits
(Figure panels: array, init, symm)
25 Lesson 3: Need controlled simplification
- select(update(x,i,v), j) → (i=j) ? v : select(x, j)
- rules for _?_:_
- array, symm: no difference to before
- init: dramatic improvement for all ATPs (⇒ eliminates all array accesses)
(Figure panels: array, init, symm)
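The select/update rule can be sketched as a recursive rewrite that turns every read of an updated array into a conditional. The tuple encoding and the `ite` and `eq` constructors are assumptions of this example (the slides write the conditional as `_?_:_`), and arrays are taken to be one-dimensional for brevity.

```python
def push_select(t):
    """Rewrite select(update(x,i,v), j) into ite(eq(i,j), v, select(x,j))."""
    if not isinstance(t, tuple):
        return t                          # variable name or constant
    op, *args = t
    args = [push_select(a) for a in args]
    if (op == "select" and isinstance(args[0], tuple)
            and args[0][0] == "update"):
        _, x, i, v = args[0]
        j = args[1]
        # Apply the rule, then continue on the remaining select(x, j).
        return ("ite", ("eq", i, j), v, push_select(("select", x, j)))
    return (op, *args)


term = ("select", ("update", ("update", "a", "i1", "v1"), "i2", "v2"), "j")
rewritten = push_select(term)
```

After the rewrite no `update` remains under a `select`; the result is a cascade of conditionals that further rules for the conditional operator can split or evaluate.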
26 Lesson 3: Need controlled simplification
- domain-specific simplifications (mostly symmetry): $\mathit{symm}(\mathit{update}(m,k,v)) \rightarrow \forall i,j : \mathit{select}(\mathit{update}(m,k,v),i,j) = \mathit{select}(\mathit{update}(m,k,v),j,i)$
- array, init: no substantial difference to before (but fewer VCs)
- symm: dramatic improvement for all ATPs
(Figure panels: array, init, symm)
27 Lesson 4: Need axiom selection
- Application domain theories are often large but largely disjoint
- core theory formalizes underlying programming language
- array and init essentially part of core theory
- property-specific theories
- introduce symbols only occurring in tasks for the given property
- contain lemmas only relevant to tasks for the given property
- intuitively not required for other properties
- should have no detrimental effect on ATP performance
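28 Lesson 4: Need axiom selection
- Example: init with redundant symm-axioms
- explicit symmetry-predicate: $(\forall i,j : \mathit{select}(m,i,j) = \mathit{select}(m,j,i)) \Rightarrow \mathit{symm}(m)$ and $\mathit{symm}(n) \wedge \mathit{symm}(m) \Rightarrow \mathit{symm}(\mathit{madd}(n,m))$
- implicit symmetry: $(\forall i,j : \mathit{select}(n,i,j) = \mathit{select}(n,j,i) \wedge \mathit{select}(m,i,j) = \mathit{select}(m,j,i)) \Rightarrow \mathit{select}(\mathit{madd}(n,m),i,j) = \mathit{select}(\mathit{madd}(n,m),j,i)$
(Figure panels: init, init with implicit symm, init with explicit symm)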
29 Lesson 4: Need axiom selection
- ATPs shouldn't flatten structure in the domain theory
- similar to distinction conjecture vs. axioms
- coarse selection based on file structuring can be controlled by application
- detailed selection based on signature of conjecture might be beneficial (cf. Reif / Schellhorn 1998)
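Signature-based selection can be approximated by a simple reachability closure over shared symbols. The sketch below is a naive illustration under that assumption, not the method of Reif / Schellhorn; the axiom names and symbol sets are hypothetical.

```python
def select_axioms(axioms, conjecture_symbols, ignore=frozenset()):
    """Select all axioms reachable from the conjecture via shared symbols.

    axioms maps an axiom name to the set of symbols it mentions; symbols in
    `ignore` (e.g. ubiquitous core symbols) do not count as a connection.
    """
    selected = set()
    signature = set(conjecture_symbols) - ignore
    changed = True
    while changed:                        # iterate until a fixpoint is reached
        changed = False
        for name, symbols in axioms.items():
            relevant = symbols - ignore
            if name not in selected and relevant & signature:
                selected.add(name)
                signature |= relevant     # newly reached symbols extend the signature
                changed = True
    return selected


theory = {
    "leq_trans": {"leq"},
    "minus_def": {"minus", "plus"},
    "plus_comm": {"plus"},
    "symm_madd": {"symm", "madd"},
}
chosen = select_axioms(theory, {"leq", "minus"})
```

For a conjecture over `leq` and `minus`, the symmetry lemma `symm_madd` is never reached and is therefore left out of the proof task.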
30 Lesson 5: Need built-in theory support
- Applications use the same domain theories over and over again
- "It's disgraceful that we have to define integers using succ's, and make up our own array syntax"
- significant engineering effort
- no standards ⇒ no reuse
- hand-crafted axiomatizations are
- typically incomplete
- typically sub-optimal for ATP
- typically need generators to handle occurring literals
- can add 3 = succ(succ(succ(0))) but what about 1023?
- shell scripts generate axioms for
- Presburgerization, finite domains (small numbers only)
- ground orders (gt/leq), ground arithmetic (all occurring numbers)
- often inadequate (i.e., wrong)
- often inconsistent
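The kind of axiom generator described above can be sketched in a few lines. The example emits ground order facts in TPTP `fof(name, axiom, formula).` form for every pair of numbers occurring in a task. The constant naming scheme (`n3`, `n1023`) and the restriction to non-negative integers are assumptions of this sketch; `gt` and `leq` are the predicate names mentioned on the slide.

```python
from itertools import combinations


def ground_order_axioms(numbers):
    """Generate gt/leq facts for all pairs of distinct occurring numbers.

    Assumes non-negative integers; each number k is encoded as constant nk.
    """
    nums = sorted(set(numbers))
    axioms = []
    for a, b in combinations(nums, 2):    # sorted input guarantees a < b
        axioms.append(f"fof(gt_{b}_{a}, axiom, gt(n{b}, n{a})).")
        axioms.append(f"fof(leq_{a}_{b}, axiom, leq(n{a}, n{b})).")
    return axioms


facts = ground_order_axioms([3, 0, 1023, 3])
```

The output grows quadratically with the number of distinct literals, which is one reason such generated theories are a poor substitute for built-in arithmetic.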
31 Lesson 5: Need built-in theory support
- FOL ATPs should steal from SMT-LIB and HO systems and provide libraries of standard theories
- TPTP TFF (typed FOL and built-in arithmetics) is a good start
- Your ATP should support that!
- SMT-LIB is a good goal to aim for
- theories can be implemented by
- axiom generation
- decision procedures
- Your ATP should support that!
32 Lesson 6: Need ATP Plug&Play
- No single off-the-shelf ATP is optimal for all problems
- combine ATPs with different preprocessing steps
- clausification
- simplification
- axiom selection
- ATP combinators
- sequential composition
- parallel competition
- ...
- TPTPWorld is a good ATP harness but not a gluing framework
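The combinators listed above can be sketched as higher-order functions. In this illustration a "prover" is just a callable that takes a task and returns True when it finds a proof; the toy provers at the end are hypothetical stand-ins, not interfaces to real ATPs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def sequential(*provers):
    """Try the provers one after another until one succeeds."""
    def run(task):
        for prover in provers:
            if prover(task):
                return True
        return False
    return run


def parallel(*provers):
    """Run all provers concurrently; succeed as soon as any result is positive."""
    def run(task):
        with ThreadPoolExecutor(max_workers=len(provers)) as pool:
            futures = [pool.submit(p, task) for p in provers]
            for future in as_completed(futures):
                if future.result():
                    return True
        return False
    return run


def with_preprocessing(preprocess, prover):
    """Compose a preprocessing step (e.g. simplification) with a prover."""
    return lambda task: prover(preprocess(task))


# Toy stand-ins: each "prover" succeeds on a different class of string tasks.
def proves_short(task):
    return len(task) < 5


def proves_a(task):
    return task.startswith("a")
```

For example, `sequential(proves_short, proves_a)("abcdefg")` succeeds through the second prover. Note that the thread pool still waits for the losing provers to finish when the block exits; a real harness would run ATPs as separate processes and kill the losers.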
33 Conclusions
- (Software engineering) applications are tractable
- no fancy logic, just POFOL and normal theories
- different characteristics than TPTP (except for SWC and SWV :-)
- Support theories!
- theory libraries, built-in symbols
- light-weight support sufficient
- ATP customization
- exploit regularities from application
- user-controlled pre-processing
- grab low-hanging fruits!
- Success = Proof and Integration
- need robust tools
- need Plug&Play framework