Title: Counterexample Guided Abstraction Refinement via Program Execution
1Counterexample Guided Abstraction Refinement via Program Execution
- Edmund Clarke
- Daniel Kroening
- Alex Groce
- Carnegie Mellon University
2Testing
- The program is executed
- Inputs are taken from a test vector
- Advantages
- Scales to large programs
- Applies to all programs
- Problems
- Where to get good test vectors?
- When are you done?
3Getting Inputs from an Abstraction
- The program performs input
- Typically done by calling API functions
- E.g., UNIX API
- read(), getc(), fread(), fgets(), ...
- But also time(), getuid(), ...
- These values are usually taken from
- Random source
- Manually created test vector
- A good test vector makes all the difference!
4Getting Inputs from an Abstraction
- Idea: Ask a model checker what input will get us to a possible bug
- Problem: Original program too big for any model checker
- Idea: Use an abstract model
- Predicate Abstraction has been applied successfully to ANSI-C
5Motivation Predicate Abstraction
- Software has too many state variables
- Result: State Space Explosion
- Graf/Saïdi 97: Predicate Abstraction
- Promoted by success of SLAM project at MSR
- Idea: Only keep track of predicates on data
- Abstraction function
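As a minimal sketch of what such an abstraction function does, assume a concrete state with two variables and two predicates. The struct names, the function name `alpha`, and the predicates are chosen here for illustration only; they do not come from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical concrete state: two program variables. */
struct concrete_state {
    unsigned i;
    int ch;
};

/* Abstract state: one Boolean per predicate, nothing else is kept. */
struct abstract_state {
    bool p1; /* i < 100  */
    bool p2; /* ch == EOF */
};

/* Abstraction function: evaluate every predicate on the concrete state. */
struct abstract_state alpha(struct concrete_state s)
{
    struct abstract_state a;
    a.p1 = s.i < 100;
    a.p2 = s.ch == EOF;
    return a;
}

void demo_alpha(void)
{
    struct concrete_state s1 = {5, 'a'};
    struct concrete_state s2 = {99, 'z'};
    struct concrete_state s3 = {100, EOF};

    /* Different concrete states can collapse to one abstract state. */
    assert(alpha(s1).p1 == alpha(s2).p1 && alpha(s1).p2 == alpha(s2).p2);
    assert(!alpha(s3).p1 && alpha(s3).p2);
}
```

Many concrete states map to the same pair of Booleans, which is what shrinks the state space.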
6Predicate Abstraction
[Figure: concrete states, predicates, abstract transitions?]
7Under- vs. Overapproximation
- How to abstract the transitions?
- Depends on the property we want to show
- Typically done in a conservative manner
- Existential abstraction
- Result: Preserves safety properties
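For reference, the standard textbook definition of existential abstraction can be written as below. The symbols ($S$ for the concrete states, $R$ for the concrete transition relation, $p_1, \ldots, p_n$ for the predicates, $\alpha$ and $\hat{R}$ for the abstraction function and abstract transition relation) are notation introduced here, not taken from the slides.

```latex
\[
  \alpha(s) = \bigl(p_1(s), \ldots, p_n(s)\bigr), \qquad
  \hat{R} = \bigl\{ (\hat{s}, \hat{s}') \;\big|\;
      \exists\, s, s' \in S :\;
      R(s, s') \,\wedge\, \alpha(s) = \hat{s} \,\wedge\, \alpha(s') = \hat{s}' \bigr\}
\]
```

Every concrete transition has an abstract counterpart, so if no abstract path reaches an error state, no concrete path does either. The converse does not hold, which is where spurious error traces come from.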
8Predicate Abstraction
[Figure: abstract transitions checked against the property. Property holds. Ok.]
9Predicate Abstraction for Software
- How do we get the predicates?
- Automatic abstraction refinement! (Kurshan et al. 93; Ball, Rajamani 00; Clarke et al. 00)
10Predicate Abstraction
- Abstract model may contain spurious error traces
- These traces can be removed by refining the abstract model
- Numerous abstraction and refinement methods available
- Implemented in many existing tools
- SLAM
- MAGIC
- BLAST
11Problem with Existing Tools
- Existing tools (BLAST, SLAM, MAGIC) use a theorem prover like Simplify
- Theorem prover works on real or natural numbers, but C uses bit-vectors => false positives
- Most theorem provers support only a few operators (+, -, <, ...), no bitwise operators
- Idea: Use a SAT solver to reason about bit-vectors!
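The gap between integer reasoning and C semantics can be seen in two small functions. This is an illustrative sketch only; the function names are made up here and the examples are not taken from any of the tools named above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Over the mathematical integers x + 1 > x always holds.
   Over C's unsigned bit-vectors it fails for one value. */
bool successor_is_larger(unsigned x)
{
    return x + 1u > x; /* unsigned arithmetic wraps modulo 2^N */
}

/* Always true in C, but only provable if the reasoning engine
   understands the bitwise AND operator. */
bool masked_byte_is_small(unsigned x)
{
    return (x & 0xFFu) < 256u;
}

void demo_bitvector_semantics(void)
{
    assert(successor_is_larger(41u));
    assert(!successor_is_larger(UINT_MAX));   /* UINT_MAX + 1 wraps to 0 */
    assert(masked_byte_is_small(0u));
    assert(masked_byte_is_small(UINT_MAX));   /* 0xFF < 256 */
}
```

A prover that treats `&` as an unknown function cannot rule out a violation of the second property, so a tool built on it may report an error that cannot occur in the real program.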
12Abstraction with SAT
- Successfully used for abstraction of C programs (CMU Techreport available)
- There is now a version of SLAM that uses this
- Found previously unknown bug
- Create a SAT instance which relates the initial values of the predicates, the basic block, and the values of the predicates after the execution of the basic block
- SAT also for simulation and refinement
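The question the SAT instance answers can be sketched on a toy case: which combinations of predicate values before and after a basic block are possible? The sketch below assumes an 8-bit variable, the basic block `x = x + 1`, and the single predicate `x < 100`, all chosen here for illustration. It tries all 256 values instead of calling a SAT solver, which only works because the example is tiny.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Predicate tracked by the abstraction (hypothetical choice): x < 100. */
static bool pred(uint8_t x) { return x < 100; }

/* The basic block being abstracted: x = x + 1 on an 8-bit variable. */
static uint8_t basic_block(uint8_t x) { return (uint8_t)(x + 1); }

/* feasible[b][b2] is set when some concrete x has pred(x) == b before the
   block and pred(x') == b2 after it. A SAT solver would answer the same
   question on a bit-level encoding; here all 256 values are simply tried. */
void abstract_transitions(bool feasible[2][2])
{
    for (int b = 0; b < 2; b++)
        for (int b2 = 0; b2 < 2; b2++)
            feasible[b][b2] = false;

    for (int v = 0; v <= UINT8_MAX; v++) {
        uint8_t x = (uint8_t)v;
        feasible[pred(x)][pred(basic_block(x))] = true;
    }
}

void demo_abstract_transitions(void)
{
    bool t[2][2];
    abstract_transitions(t);
    assert(t[1][1]); /* e.g. x = 0   -> 1   */
    assert(t[1][0]); /* x = 99       -> 100 */
    assert(t[0][0]); /* e.g. x = 100 -> 101 */
    assert(t[0][1]); /* x = 255 wraps to 0; impossible over the integers */
}
```

The last transition exists only under bit-vector semantics, which is the kind of behavior an integer-based prover would miss.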
13Getting Inputs from an Abstraction
- Idea Use the abstract model
- Build a predicate abstraction of the program
- Set the initial state to be the input location
- Obtain an abstract error trace
- Concretize abstract error trace
- Error trace contains input value
14Getting Inputs from an Abstraction
ORIGINAL PROGRAM
  #include <assert.h>
  #include <stdio.h>

  int main() {
    char buffer[100];
    unsigned i = 0;
    int ch;
    while ((ch = getchar()) != EOF) {
      assert(i < 100);
      buffer[i++] = ch;
    }
  }
What input do we need?
15Getting Inputs from an Abstraction
ABSTRACT PROGRAM (no predicates)
  void main() {
    while (*) {
      assert(*);
    }
  }
Obvious error trace: location 1 (the loop head, INITIAL PC), then location 2 (the assertion)
16Getting Inputs from an Abstraction
CONCRETIZATION
  #include <assert.h>
  #include <stdio.h>

  int main() {
    char buffer[100];
    unsigned i = 0;
    int ch;
    while ((ch = getchar()) != EOF) {
      assert(i < 100);
      buffer[i++] = ch;
    }
  }
Constraints
  1: getchar() != EOF
  2: i >= 100
Ask SAT solver for a solution
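To make the concretization step tangible, the sketch below checks candidate input vectors against the two constraints of the slide's program by replaying them. It is a brute-force stand-in written for this example: a SAT solver would solve the constraints directly rather than try candidates, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Replays the slide's program on a candidate input vector without touching
   memory. Returns true when the run reaches the assertion with i >= 100,
   i.e. when the candidate is a concrete witness for the error trace. */
bool violates_assertion(const int *input, size_t n)
{
    unsigned i = 0;
    for (size_t k = 0; k < n; k++) {
        int ch = input[k];       /* stands in for getchar() */
        if (ch == EOF)
            return false;        /* constraint 1 broken: loop exits */
        if (!(i < 100))
            return true;         /* constraint 2 met: assertion fails */
        i++;                     /* effect of buffer[i++] = ch */
    }
    return false;
}

/* Tries input vectors of growing length, all filled with one non-EOF
   character, and returns the first length that reaches the bug
   (0 if none is found up to max_len). */
size_t shortest_witness(size_t max_len)
{
    static int input[4096];
    if (max_len > 4096)
        max_len = 4096;
    for (size_t k = 0; k < max_len; k++)
        input[k] = 'a';
    for (size_t n = 1; n <= max_len; n++)
        if (violates_assertion(input, n))
            return n;
    return 0;
}

void demo_concretization(void)
{
    int one[1] = {'a'};
    assert(!violates_assertion(one, 1));   /* one-iteration trace is spurious */
    assert(shortest_witness(100) == 0);    /* 100 characters are not enough */
    assert(shortest_witness(200) == 101);  /* 100 fill the buffer, 1 overflows */
}
```

The one-iteration trace from the predicate-free abstraction has no concrete witness because `i` is still 0 at the first assertion.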
17Getting Inputs from an Abstraction
- Problems during concretization
- There might not be any solution!
- Then the abstract error trace is spurious
- Use regular abstraction refinement
- This will make the abstract model more and more
detailed over time
18Testing
- The program is executed
- Inputs are taken from a test vector
- Advantages
- Scales to large programs
- Applies to all programs
- Problems
- Where to get good test vectors?
- When are you done?
19Another Benefit
- What if there exists no abstract error trace?
- Possibly after refinement
- We have a conservative abstraction
- Result: There is no way to get from the input location to any bug!
- We can abort this test run (reset)
- A model checking person would say we backtrack
20Backtracking
- Go back to the previous input location
- Done by re-playing the last n-1 inputs
- No more locations left? We are done! There are no bugs!
- This addresses the completeness issue in testing
- Model checking guys: No storing of concrete states needed!
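Backtracking by replay can be sketched as follows, assuming the program under test is deterministic once its inputs are fixed. The `state`, `step`, and `replay` names are invented for this example; the program here just folds its inputs into a checksum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical deterministic program state: a running checksum. */
struct state {
    unsigned sum;
    size_t steps;
};

/* One step of the program under test: consume one input value. */
static void step(struct state *s, int input)
{
    s->sum = s->sum * 31u + (unsigned)input;
    s->steps++;
}

/* Reset the program and re-execute the first n logged inputs. */
struct state replay(const int *log, size_t n)
{
    struct state s = {0u, 0};
    for (size_t k = 0; k < n; k++)
        step(&s, log[k]);
    return s;
}

void demo_backtrack(void)
{
    int log[3] = {'a', 'b', 'c'};

    /* State just before the third input was consumed. */
    struct state before = replay(log, 2);

    /* Backtrack: replaying the first n-1 inputs restores that state. */
    struct state again = replay(log, 2);
    assert(again.sum == before.sum && again.steps == 2);

    /* Continue from there with a different value at the same location. */
    log[2] = 'x';
    struct state other = replay(log, 3);
    assert(other.steps == 3);
    assert(other.sum == before.sum * 31u + (unsigned)'x');
}
```

Only the input log is kept; the concrete state is recomputed on demand, which is the point of the last bullet above.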
21Overview
22CRunner Implementation
- Replace library calls with CRunner calls
- getchar, time, fscanf, etc.
- The new ones call the model checker to get an input value
- Results of previous calls are cached
- Recompile/relink program
- Run the program
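A replacement for `getchar` along these lines might look like the sketch below. This is not the CRunner source: `crunner_getchar`, `crunner_reset`, `crunner_queries`, and the stub `ask_model_checker` are hypothetical names, and the stub returns a fixed character where a real implementation would derive the value from an abstract error trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CACHE_MAX 256

static int    cache[CACHE_MAX]; /* results of previous input calls */
static size_t cache_len;        /* number of cached results */
static size_t cursor;           /* next result to hand out in this run */
static size_t oracle_queries;   /* how often the stub oracle was asked */

/* Stub standing in for the model checker: always proposes 'a'. */
static int ask_model_checker(void)
{
    oracle_queries++;
    return 'a';
}

/* Hypothetical replacement for getchar(). */
int crunner_getchar(void)
{
    if (cursor == cache_len) {
        if (cache_len == CACHE_MAX)
            return EOF;                 /* cache full: end the input */
        cache[cache_len++] = ask_model_checker();
    }
    return cache[cursor++];
}

/* Start a new run; cached inputs are replayed before new ones are requested. */
void crunner_reset(void) { cursor = 0; }

size_t crunner_queries(void) { return oracle_queries; }

void demo_crunner(void)
{
    int a = crunner_getchar();
    int b = crunner_getchar();
    assert(a == 'a' && b == 'a');
    assert(crunner_queries() == 2);

    crunner_reset();
    assert(crunner_getchar() == 'a');
    assert(crunner_queries() == 2);     /* served from the cache */
}
```

The program under test would have its `getchar()` calls redirected to the replacement before it is recompiled and relinked.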
23Experimental Results
- Artificial benchmark with buffer overflow after n iterations
- Comparison against conventional predicate abstraction tool
- Run-times include the compilation time for the execution method.
- A star denotes a time-out.
24Experimental Results
- spamc: Part of SpamAssassin
- Front-end written in C for efficiency reasons
- Reproduced previously known buffer overrun
- Error trace requires filling up buffer with 1024 entries
- Non-trivial conditions on input
25Experimental Results
  char buffer[1024];
  ...
  switch(m->type) {
  ...
  case MESSAGE_BSMTP:
    total = full_write(fd, m->pre, m->pre_len);
    for(i = 0; i < m->out_len; ) {
      jlimit = (off_t) (sizeof(buffer) / sizeof(*buffer) - 4);
      for(j = 0; i < (off_t) m->out_len && j < jlimit; ) {
        if(i + 1 < m->out_len && m->out[i] == '\n' && m->out[i + 1] == '.') {
          if(j > jlimit - 4)
            break; /* avoid overflow */
          buffer[j++] = m->out[i++];
          buffer[j++] = m->out[i++];
          buffer[j++] = '.';
        } else {
          buffer[j++] = m->out[i++];
        ...
26Experimental Results
- sendmail
- Mail gateway for Unix machines
- Reproduced previously known bug
- Requires non-obvious email-message
- Bug too deep for conventional predicate abstraction tools
- Exploits type conversion bug for char(255)
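The slides do not show the sendmail code, so the following is only a generic illustration of why the character value 255 is a classic source of type conversion bugs in C. It assumes a two's complement platform, and the function names are made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Widen a byte to int through a signed char.
   memcpy copies the bit pattern, so no out-of-range conversion is involved. */
int widen_signed(unsigned char byte)
{
    signed char s;
    memcpy(&s, &byte, 1);
    return s;            /* sign-extends: 0xFF becomes -1 (two's complement) */
}

/* Widen the same byte to int through an unsigned char. */
int widen_unsigned(unsigned char byte)
{
    return byte;         /* zero-extends: 0xFF stays 255 */
}

void demo_char_255(void)
{
    assert(widen_unsigned(0xFF) == 255);
    assert(widen_signed(0xFF) == -1);   /* collides with any -1 sentinel */
    assert(widen_signed('A') == 'A');   /* 7-bit values are unaffected */
}
```

On platforms where plain `char` is signed, an input byte of 255 therefore becomes indistinguishable from a special value such as -1 once it is stored in an `int`.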
27Questions?