Title: Discriminative Model Checking
1Discriminative Model Checking
- Peter Niebert
- Doron Peled
- Amir Pnueli
- CAV 2008
2Discriminative Model Checking
Warning: inside this talk hides another talk!
Automatic Generation of Programs Using Model Checking and Genetic Programming
Gal Katz, Doron Peled
3Which logic to use?
- Linear: each execution is an alternating sequence of states/actions.
- Use LTL/Büchi automata.
- Counterexample if property fails.
- Branching: a tree represents all executions, including the points where they branch.
- Allows expressing possibility, e.g., of services.
4Linear Temporal Logic
5Computation Tree Logic
[Figure: computation trees illustrating EG p and AF p.]
6Our point of view
- Linear time is sufficient for specifying most properties.
- A counterexample is often not enough:
- Gives very little clue about the location of the error.
- Does not give information about how good and bad executions are related to each other.
- Thus, for analysis beyond finding the existence of an error, we promote a deeper search.
7Our suggestion
- Primary or base specification ? in LTL, for the base property.
- Analysis specification: quantifies over executions that satisfy or do not satisfy the base specification.
- Syntax: p ?\/? ?? ?? ??? ???? (and others)
- Semantics:
- ???: there exists a continuation satisfying the property ?, where ? holds from the beginning.
- ???: there exists a continuation not satisfying the property ?, where ? holds from the beginning.
8Semantics illustration
- Semantics:
- ???: there exists a continuation satisfying the property ?, where ? holds from the beginning.
- ????: there exists a continuation not satisfying the property ?, where ? holds from the beginning.
[Figure: illustration of the two continuations, each labeled "? holds".]
9Examples for specifications
- Bad executions depend on infinitely many bad choices: ??<>??true
- Before executing a, there are good and bad executions. Once a is executed, things are persistently bad: ?((Execa/\??true)W(Execa/\??false))
- Properties such as "from some point, all continuations are good/bad".
10How to do model checking?
- We need to remember some information about the path so far to verify that, together with the rest of the computation, it is (not) satisfying ?.
- Suppose we ran a Büchi automaton for ?: since it is nondeterministic, it may be running on the wrong branch to be completed.
- Thus, we run a subset construction (determinization) of the Büchi automaton.
- At the point of branching, we continue with a state consistent with one of the Büchi states in the current subset.
- Apply CTL model checking to this structure.
11Complexity
- EXPSPACE-complete, even for AG ??true.
- Reduction shown for the related logic mCTL* [KV, LICS 2006] (this logic has different semantics, where quantification always starts from the initial state).
- But EXPSPACE-complete in the size of the LTL formula, PSPACE-complete in the size of the branching formula.
12Application
- Why do we need such an analysis?
- and now we go to another lecture
13Automatic Generation of Programs Using Model
Checking and Genetic Programming
TACAS 2008
14Agenda
- Introduction and motivation
- Genetic Programming
- Model Checking
- Combined method
- Application to mutual exclusion
- Conclusions and future work
15Introduction
- Genetic programming
- A methodology for automatic programming inspired by Darwinian evolution [Koza 92].
- Used for automatic generation of programs in various fields.
- Mostly used for optimization-related problems.
- Fitness is usually calculated by checking program performance against test cases.
- Less used for problems with a strict specification.
16Introduction (2)
- Model Checking
- An automatic formal verification technique used mainly with finite-state software and hardware systems.
- Can be used to verify communication and concurrent protocols.
- Models are checked against a strict specification. The result is either:
- A confirmation that the model satisfies the specification, or
- A counterexample showing that it does not.
17Introduction (3)
- How to construct a model from the spec.?
- Synthesis
- Transforms the spec. directly into a model that satisfies it.
- Complicated.
- Currently not practical for automatic program generation.
- Brute-force enumeration
- All possible programs of a specific domain and size are generated and model-checked.
- All existing solutions will eventually be found.
- Very time-intensive. Not practical for programs with more than a few lines of code.
18Our Method: Combining GP and Model Checking
[Diagram: data flow between the User, the GP Engine, and the Enhanced Model Checker.]
1. Specification
2. Configuration
3. Initial population
4. Verification results
5. New programs
6. Final Model / Results
19Main Steady-state GP Algorithm
1. Create initial program population.
2. Randomly choose µ programs.
3. Create λ new programs by applying genetic operations to the above µ programs.
4. Calculate the fitness function for the µ + λ programs, and use it to select µ new programs.
5. Replace the old µ programs by the selected ones.
6. Repeat steps 2-5 until either
- a perfect solution is found, or
- the maximum allowed number of iterations is reached.
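The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The `mutate` and `fitness` callbacks, the `perfect` threshold and all parameter names are hypothetical. Selection is simplified here to "keep the µ best", whereas the talk uses fitness-proportional selection with elitism.

```python
import random

def steady_state_gp(population, mutate, fitness, mu=5, lam=150,
                    max_iterations=2000, perfect=100.0, rng=None):
    """(mu + lambda) steady-state loop; callbacks are caller-supplied (hypothetical API)."""
    rng = rng or random.Random()
    population = list(population)
    for _ in range(max_iterations):
        # Step 2: randomly choose mu programs (by index, so they can be replaced).
        idx = rng.sample(range(len(population)), mu)
        parents = [population[i] for i in idx]
        # Step 3: create lambda new programs from the chosen parents.
        offspring = [mutate(rng.choice(parents), rng) for _ in range(lam)]
        # Step 4: score all mu + lambda candidates and keep the mu best
        # (a simplification of the roulette selection used in the talk).
        pool = sorted(parents + offspring, key=fitness, reverse=True)
        survivors = pool[:mu]
        # Step 5: replace the old mu programs by the selected ones.
        for i, prog in zip(idx, survivors):
            population[i] = prog
        # Step 6: stop early once a perfect solution is found.
        if fitness(survivors[0]) >= perfect:
            return survivors[0], population
    return max(population, key=fitness), population
```

In a toy setting where "programs" are integers, `mutate` adds or subtracts 1, and fitness is the negated distance to a target, the loop climbs to the target and stops early.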
20Program Representation
- Programs are represented as trees.
- Internal nodes represent expressions or instructions with parameters (assignment, while, if, block).
- Terminal nodes represent constants or expressions without any parameter (0, 1, 2, me, other).
- Strongly-typed GP is used [Montana 95].
While (A[2] != 0) A[me] = 1
21Initial Population Creation
- Population usually contains 100-1000 programs.
- Programs are created recursively using the grow method [Koza 92].
- The root is randomly selected from instruction nodes.
- Offspring are randomly selected from allowed nodes or terminals as long as rules are preserved.
- If the max allowed tree depth is reached, a terminal must be chosen.
22Genetic Operations
- At each iteration of the GP algorithm, the following genetic operations are applied to the selected programs:
- Reproduction: programs are copied without any change
- Mutation
- Crossover
23Mutation Operation
- The main operation we use.
- Allows performing small modifications to an existing program by the following method:
- Randomly choose a program node (internal or leaf).
- According to the node type, apply one of the following operations with respect to the chosen node (strong typing must be kept).
24Replacement Mutation type (a)
- Replace the sub-tree rooted by the node with a new randomly generated sub-tree.
- Can change a single node or an entire sub-tree.
[Figure: the program as a tree with nodes while, !=, assign, A, 2, 0, A, me, 1.]
Before: While (A[2] != 0) A[me] = 1
After: While (A[2] != 0) A[me] = A[0]
25Insertion Mutation type (b)
- Add an immediate parent to the selected node.
- Randomly create other offspring for the new parent, if needed.
- According to the selected parent type, can cause:
- Insertion of code,
- Wrapping code with a while loop,
- Extending Boolean expressions.
Before: While (A[2] != 0) A[me] = 1
After: While (A[2] != 0) A[2] = other; A[me] = 1
26Reduction Mutation Type (c)
- Replace the selected node by one of its offspring.
- Delete the remaining offspring of the node.
- Has the opposite effect of the previous insertion mutation, and reduces the program size.
27Deletion Mutation Type (d)
- Delete the sub-tree rooted by the node.
- Update ancestors recursively.
[Figure: program tree with nodes while, !=, A, 2, 0.]
While (A[2] != 0) A[me] = 1
28Crossover Operation
- Creates new programs by merging building blocks of two existing programs.
- Crossover steps are:
- Randomly choose a node from the 1st program.
- Randomly choose a node from the 2nd program that has the same type as the 1st node.
- Exchange the sub-trees rooted by the two nodes, and use the two newly created programs.
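The crossover steps can be illustrated with a small tree type. This is a sketch, not the tool's actual data structure. The `Node` class, its `label`/`type` fields and the type names used with it are assumptions made for the example.

```python
import copy
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                      # e.g. "while", "assign", "me", "0"
    type: str                       # hypothetical type tag, e.g. "instr", "bool", "int"
    children: list = field(default_factory=list)

def nodes(tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree.children:
        yield from nodes(child)

def crossover(a, b, rng):
    """Return two new trees made by swapping same-typed sub-trees of a and b."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)   # parents stay untouched
    first = rng.choice(list(nodes(a)))
    matches = [n for n in nodes(b) if n.type == first.type]
    if not matches:
        return a, b                 # no compatible node: return plain copies
    second = rng.choice(matches)
    # Swapping label and children in place swaps the sub-trees the nodes root.
    first.label, second.label = second.label, first.label
    first.children, second.children = second.children, first.children
    return a, b
```

Because only nodes with equal `type` are exchanged, both offspring remain well-typed, which is the constraint strongly-typed GP imposes.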
29Crossover Example
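[Figure: the two parent trees, with nodes block, if, assign, !=, A, me, 1, A, me, 2.]
Parents:
A[2] = me; while (A[me] == other)
If (A[me] != 1) A[0] = other
Offspring:
A[2] = me; A[0] = other
If (A[me] != 1) while (A[me] == other)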
30Crossover (cont.)
- Heavily used by traditional GP [Koza].
- Tries to mimic biological sexual recombination, but:
- Unlike biology (and unlike GA), GP lacks the notion of genes [Banzhaf et al. 01].
- Often acts only as a macro-mutation.
- Various methods were developed in order to turn it into a more fruitful operation (Brood, Intelligent crossover).
- Still, not a significant operation for small programs like those of Mutual Exclusion.
31Selection
- At each iteration, selection is applied to all µ + λ programs (over-production selection).
- Programs are selected using a fitness-proportional (roulette) method [Holland 92].
- Elitism is used to ensure that the best program is always selected.
- Similar to Evolution Strategies [Rechenberg 94] and the Brood Recombination method [Tackett 94]: better protection from harmful operations.
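Fitness-proportional selection with elitism could look roughly like the sketch below. The function name and signature are hypothetical. Scores are assumed to be non-negative, and draws are made with replacement.

```python
import random

def select(programs, scores, mu, rng):
    """Pick mu programs: the best one (elitism) plus mu - 1 roulette draws."""
    best = max(range(len(programs)), key=lambda i: scores[i])
    chosen = [programs[best]]        # elitism: the best program always survives
    total = sum(scores)
    while len(chosen) < mu:
        if total <= 0:               # all scores are zero: fall back to uniform choice
            chosen.append(rng.choice(programs))
            continue
        # Spin the wheel: each program owns a slice proportional to its score.
        spin, running = rng.uniform(0, total), 0.0
        for prog, score in zip(programs, scores):
            running += score
            if spin <= running:
                chosen.append(prog)
                break
    return chosen
```

Programs with higher scores are drawn more often, but low-scoring programs keep a small chance of surviving, which preserves diversity in the population.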
32Model Checking
33ω-automata
- Run on infinite words, and consist of:
- A finite alphabet Σ,
- A finite set of states S,
- A set of initial states S0 ⊆ S,
- A transition relation δ ⊆ S × S,
- A labeling function L : S → Σ,
- An acceptance condition Ω.
- In this version, the labels are on the states instead of on the arcs.
34Acceptance conditions
- For a run π, inf(π) denotes the states appearing infinitely often on π.
- Büchi condition:
- A set of states F ⊆ S,
- A run π over A is accepted if inf(π) ∩ F ≠ Ø.
- Streett condition:
- A set of k pairs (Ei, Fi), 1 ≤ i ≤ k, Ei, Fi ⊆ S,
- A run π over A is accepted if for all pairs:
- inf(π) ∩ Ei ≠ Ø → inf(π) ∩ Fi ≠ Ø.
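Given the set of states a run visits infinitely often, both conditions reduce to simple set tests. The following sketch assumes that set has already been computed. The function names are illustrative only.

```python
def buchi_accepts(inf_states, accepting):
    """Büchi: some state of F is visited infinitely often."""
    return bool(set(inf_states) & set(accepting))

def streett_accepts(inf_states, pairs):
    """Streett: for every pair (E, F), visiting E infinitely often
    implies visiting F infinitely often."""
    inf_states = set(inf_states)
    return all(not (inf_states & set(e)) or bool(inf_states & set(f))
               for e, f in pairs)
```

A pair whose E set is never visited infinitely often is satisfied vacuously. This is what makes Streett pairs convenient for expressing fairness assumptions.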
35ω-automata Closure
- Büchi automata can be converted into Streett automata, and vice versa.
- Both Büchi and Streett automata are closed under intersection and complement.
- Streett automata are less simple to use, but are closed under determinization, while Büchi automata are not.
36Building the Program's State-graph
- Each state consists of values of variables, program counters, buffers, etc.
- Edges represent atomic transitions caused by program instructions.
- Can be built by a DFS algorithm.
- Can be decomposed into SCCs [Tarjan 72].
37Converting Model to ω-automaton
- We use the states, initial state and transitions of the program's state space.
- The acceptance condition can allow all runs, or impose fairness conditions.
- Streett automata can be used in order to define various fairness conditions (weak and strong).
38Safety Properties
- Basic properties can be checked by simply analyzing the state graph:
- Invariants: can be checked on every visited state.
- Deadlocks: states without outgoing edges.
- Unreachable code: instructions that are not represented on any transition.
- Liveness properties require a more complicated process.
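The invariant and deadlock checks above amount to one traversal of the reachable state graph. A minimal sketch follows, assuming the graph is a dict from each state to its successors. The representation and function names are hypothetical.

```python
def reachable(graph, init):
    """Iterative DFS over a dict mapping each state to its successors."""
    seen, stack = {init}, [init]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def check_safety(graph, init, invariant):
    """Report reachable states that break the invariant or have no successor."""
    states = reachable(graph, init)
    return {
        "invariant_violations": sorted(s for s in states if not invariant(s)),
        "deadlocks": sorted(s for s in states if not graph.get(s)),
    }
```

States that cannot be reached from the initial state are never reported. Unreachable-code detection would additionally need a record of which instruction produced each edge.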
39Specification
- We use Linear Temporal Logic (LTL) [Pnueli 77] to define specification properties.
- LTL formulas are interpreted over infinite sequences of states, and consist of:
- Propositional variables,
- Logical connectives, such as ¬, ∧, ∨, →, and
- Temporal operators, such as:
- ◇(p): p will eventually occur.
- □(p): p always occurs.
- A model M satisfies a formula φ (M ⊨ φ) if every (fair) run of M satisfies φ.
40Converting specification to ω-automaton
- Every LTL property can be converted into a Büchi automaton with a size exponential in the LTL formula size [Vardi Wolper 94].
- For deterministic Streett automata, a determinization process is also required [Safra 88].
- May result in a doubly exponential blowup from the LTL property.
41The Model Checking Process [Vardi Wolper 86]
- Both model and specification are converted to ω-automata over the same alphabet.
- The alphabet is 2^AP, where AP denotes a set of atomic propositions that may hold on the system states.
- Every word accepted by M (a fair run) should be accepted by the spec; therefore we have to check whether L(M) ⊆ L(φ).
42Model Checking Results
- It's easier to check whether
- L(M) ∩ complement(L(φ)) = Ø, or
- L(M) ∩ L(¬φ) = Ø.
- Case 1:
- Intersection is empty.
- M satisfies φ.
- Case 2:
- Intersection is not empty.
- Runs contained in the intersection can be used for generating counterexamples.
43Checking for Non-Emptiness
- Easy with Büchi automata:
- Decompose the intersection graph into maximal SCCs reachable from the root.
- Check if an accepting state from F occurs infinitely often inside a reachable SCC.
- More complicated with Streett automata.
- The algorithm can be used for a single SCC or an entire automaton.
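For the Büchi case, the check can be sketched with Tarjan's SCC algorithm. The language is non-empty exactly when some reachable SCC contains a cycle and an accepting state. The dict-based graph and function names below are assumptions. The recursive form is kept for brevity and would hit Python's recursion limit on very deep graphs.

```python
def sccs(graph, root):
    """Tarjan's algorithm, restricted to states reachable from root."""
    index, low, stack, on_stack, out = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:       # v is the root of an SCC: pop it off
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            out.append(comp)

    visit(root)
    return out

def buchi_nonempty(graph, root, accepting):
    """True if some reachable SCC with a cycle contains an accepting state."""
    for comp in sccs(graph, root):
        has_cycle = len(comp) > 1 or any(v in graph.get(v, ()) for v in comp)
        if has_cycle and comp & set(accepting):
            return True
    return False
```

A single-state SCC without a self-loop is skipped, since no run can visit it more than once.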
44Model Checking and GP
- Can standard model checking results be used as a GP fitness function?
- Yes, but so far this was done with limited success [Johnson 07].
- A fitness function with just two values is a poor one.
- We wish to analyze the model checking graph in order to quantify the level of satisfaction.
- When using nondeterministic Büchi automata, a single program computation may have multiple accepting and non-accepting paths, which makes it difficult to analyze.
- Deterministic Streett automata are not more expensive, but ensure symmetry between accepting and non-accepting paths.
45Enhanced Model Checking Algorithm
- The idea:
- We assume that a hostile scheduler (or environment) chooses the execution path.
- For each spec. property, we check the amount of work the scheduler has to do in order to cause a property violation.
- The results are used for setting the fitness level scores.
46Fitness Level 0
- All SCCs are empty (not accepting).
- Property is never satisfied.
- No scheduler choices are needed.
[Figure: SCC graph with nodes A, B, C, D, E.]
47Fitness Level 1
- At least one accepting SCC.
- At least one empty bottom SCC (BSCC).
- A finite number of scheduler choices can lead the execution into the empty BSCC (D in the example).
- The program will stay there forever.
- A BSCC with only 1 node means a deadlock, which gets a worse score.
[Figure: SCC graph with nodes A, B, C, D, E.]
48Fitness Level 2
- All BSCCs are accepting.
- At least one empty SCC.
- Infinitely many scheduler choices are needed for keeping the program inside the empty SCC (B in the example).
[Figure: SCC graph with nodes A, B, C, D, E.]
49Fitness Level 3
- All SCCs are accepting.
- There may still be SCCs that are not universal, and contain violating paths.
- Therefore, the graph universality is checked.
- If the graph is not universal, we are still at level 2.
- Otherwise, level 3 is assigned.
- In this case, even infinitely many scheduler choices cannot cause a violation, since the property is always satisfied.
[Figure: SCC graph with nodes A, B, C, D, E.]
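The four levels can be summarized as one classification over the SCC graph. The sketch below assumes each SCC has already been marked accepting or empty and that the universality check is done elsewhere and passed in as a flag. The input format is hypothetical, and the extra deadlock penalty for single-node BSCCs is not modeled.

```python
def fitness_level(sccs, universal=False):
    """sccs maps an SCC name to (accepting, set of successor SCC names)."""
    accepting = {name for name, (acc, _) in sccs.items() if acc}
    bottoms = {name for name, (_, succ) in sccs.items() if not succ}
    if not accepting:
        return 0                      # property is never satisfied
    if bottoms - accepting:
        return 1                      # finitely many choices reach an empty BSCC
    if len(accepting) < len(sccs):
        return 2                      # infinitely many choices needed to violate
    return 3 if universal else 2      # all SCCs accepting: universality decides
```

For an illustrative graph (edges invented for the example) such as `{"A": (True, {"B", "C"}), "B": (True, {"D"}), "C": (True, {"E"}), "D": (False, set()), "E": (True, set())}`, the empty bottom SCC D puts the program at level 1.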
50Overall Fitness Function
- Fitness level scores are calculated for each specification property.
- How to merge them into a single fitness function?
- Naïve summing can bias the results, since some properties may be trivially satisfied when more basic properties are violated.
- Thus, spec. properties are divided into levels, starting from level 1 for the most basic properties.
- As long as not all properties at level i are satisfied, properties at higher levels get a fitness of 0.
- This algorithm also saves running time by skipping unneeded checks.
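The level-by-level merging can be sketched as follows. The `score` callback, which returns a pair of points and a "fully satisfied" flag for a property, is a hypothetical interface. How points are weighted per property is not specified here.

```python
def overall_fitness(levels, score):
    """levels: list of property groups, most basic first.
    score(prop) -> (points, satisfied) is a caller-supplied, hypothetical callback."""
    total = 0.0
    for group in levels:
        results = [score(prop) for prop in group]
        total += sum(points for points, _ in results)
        if not all(ok for _, ok in results):
            break        # higher levels contribute 0 and are never checked
    return total
```

Stopping at the first unsatisfied level is what saves running time: properties at higher levels are never handed to the model checker.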
51Parsimony
- GP programs tend to grow over time up to the maximal allowed tree size (bloating).
- Large portions of the code become introns (junk DNA).
- To avoid that, we use parsimony as a secondary fitness measure.
- The number of program nodes, multiplied by a small factor, is subtracted from the fitness score.
- The factor should be carefully chosen:
- Should encourage programs to reduce their size, but
- Should not harm the evolutionary process.
- Therefore, programs cannot get a score of 100, but only get close to it. The run can be stopped when all properties are satisfied.
- Programs can be reduced either by mutations, or directly by detecting dead code during the model checking process, and then removing it.
52Vacuity
- Special care is needed for implication properties of the form □(p → ◇q).
- Some (or all) executions may be vacuously satisfied if p never happens.
- We are usually interested only in runs where p eventually occurs.
- Other runs are neither good nor bad. They are irrelevant.
- Thus, in these cases, the program automaton is first intersected with the property ◇p.
- Some SCCs might be marked irrelevant.
[Figure: SCC graph annotated with □(p → ◇q) and ◇p.]
- If all SCCs are irrelevant, fitness level 0 is assigned.
- A similar mechanism is used for excluding unfair runs.
53The Mutual Exclusion Problem
- Originally described by [Dijkstra 65].
- Many variants and solutions exist.
- Modeled using the following program parts:
- Non Critical Section
- Pre Protocol
- Critical Section
- Post Protocol
- We wish to automatically generate correct code
for the pre and post protocol parts.
54Spec. Properties
- The specification includes the following LTL properties:
- The properties are converted into Streett automata.
55Runs Configuration
- 3 different sets of runs
- The following parameters were used:
- Population size: 150
- Max number of iterations: 2000
- µ = 5
- λ = 150
56An Example of a Run (1st variant)
Score 0.0
- Randomly created.
- Does not satisfy mutual exclusion property.
- Higher level properties are set to 0.
57An Example of a Run (1st variant)
Score 66.77
- Randomly created.
- While loop guarantees mutual exclusion.
- Only process 0 can enter the critical section.
58An Example of a Run (1st variant)
Score 75.77
- Last line changed by a mutation.
- The naïve mutual exclusion algorithm.
- Processes use a turn flag, but depend on each other.
- A local maximum point in the search space.
59An Example of a Run (1st variant)
Score 70.17
- An important building block common to many algorithms.
- Each process sets its own flag and waits for the other's flag, but
- The flag is not turned off correctly.
- Might eventually deadlock; thus, properties 4 and 5 get a fitness level of 1.
60An Example of a Run (1st variant)
Score 76.10
- Last line is replaced by a mutation.
- Now, process 0 correctly turns its flag off.
- Property 5 is fully satisfied
61An Example of a Run (1st variant)
Score 92.77
- A single node is changed by a mutation.
- Both processes turn off their flag.
- Properties 4 and 5 are fully satisfied.
- Still, deadlock occurs if both processes enter
simultaneously.
62An Example of a Run (1st variant)
Score 93.20
- A mutation added a line to the empty while loop.
- This turns the deadlock into a livelock, and causes a slight fitness improvement.
63An Example of a Run (1st variant)
Score 94.37
- Another line is added to the while loop.
- No more deadlocks or livelocks, but the property can still be violated by some infinite sequence of scheduler choices.
64An Example of a Run (1st variant)
Score 96.50
- Created by some random mutations.
- All properties are satisfied.
- Still, not the shortest solution.
65An Example of a Run (1st variant)
Score 97.10
- Created by more mutations.
- The shortest algorithm found.
- Identical to the known One-bit protocol [Burns Lynch 93].
66Fitness Graph
- Best fitness is alternately improved by
- Major leaps due to changes in fitness levels.
- Small improvements caused by parsimony pressure.
67More experiments
- Successfully found Dekker's algorithm [Dijkstra 65].
- Successfully found Peterson's algorithm [Peterson Fischer 77].
- Found a shorter algorithm than Dekker's.
68Performance
- First variant was easiest to solve.
- Other variants are much harder to find.
- Still, much better than brute-force methods.
- Less significant on small programs (Peterson).
- Crucial on large programs (Dekker).
69Conclusions and Future Work
- GP and model checking were successfully combined. To achieve that, a specific tool was developed.
- Found solutions are guaranteed to completely satisfy the specification.
- The scoring system can be further refined.
- More information can be extracted from the model checking results, for assisting the evolutionary process.
- A similar method can be used for correcting a given program, or at least showing where the error is.
- Next step: use discriminative model checking properties to refine grading and to find where in the program to make changes.