Applying Model Checking To Large Programs

1
Applying Model Checking To Large Programs
  • Madan Musuvathi
  • Microsoft Research

2
The Model Checking Problem
  • A system model S
  • A property P
  • Check if S satisfies P

3
The Model Checking Problem
  • A system model S
  • An environment E
  • A property P
  • Check if S in E satisfies P

4
In Previous Lectures
  • A system model S
  • An environment E
  • A property P
  • Check if S in E satisfies P

5
When Applied to Large Systems
  • A system model S
  • An environment E
  • A property P
  • Check if S in E satisfies P

6
Model Checking: An Engineer's View
  • Given a system and its environment
  • Expose nondeterminism
  • Environment nondeterminism: inputs, timers,
    events
  • Internal nondeterminism arising from
    abstractions
  • Systematically explore all states of the system
  • Do this exploration intelligently
  • If lucky, you find a bug
  • If luckier, you verify the system

7
Explicit State Model Checking
  • Explicitly generate the individual states
  • Systematically explore the state space
  • State space: a graph that captures all behaviors
  • Model checking: graph search
  • Generate the state space graph "on-the-fly"
  • State space is typically much larger than the
    reachable set of states

8
Guarded Transition System
  • System: state + transitions
  • Readily models event-driven systems

9
The Algorithm
Hashtable states_seen
Queue pending

insert init_state into states_seen
insert init_state into pending
while (pending is not empty)
    current = pending.remove()
    for each enabled transition T
        restore_state(current)
        execute transition T
        successor = save_state()
        if (successor in states_seen) continue
        insert successor into states_seen
        check successor for correctness
        insert successor into pending queue
10
How to write a model checker in an hour
  • Specify the system and the environment as a class
  • State: member fields
  • Transitions: member functions
  • Each member function has a Boolean guard function
  • Capturing state: provide serialization functions
  • GetState() returns the state in a buffer
  • SetState() copies the state from a buffer
  • Implement the search algorithm
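The algorithm and class structure above can be sketched in Python. This is a minimal illustration under assumptions, not CMC's code: the toy system (two processes doing a non-atomic increment), the class name `LostUpdate`, and the methods `get_state`/`set_state`/`transitions`/`ok` are hypothetical stand-ins for the slide's member fields, guarded member functions, GetState()/SetState(), and correctness check.

```python
from collections import deque


class LostUpdate:
    """Hypothetical toy system: two processes increment a shared counter
    non-atomically (read it, then write back the value read + 1)."""

    def __init__(self):
        # State: member fields.
        self.pc = [0, 0]    # per-process program counter: 0 = read, 1 = write, 2 = done
        self.tmp = [0, 0]   # value each process read
        self.counter = 0    # shared variable

    # Capturing state: analogues of GetState()/SetState().
    def get_state(self):
        return (tuple(self.pc), tuple(self.tmp), self.counter)

    def set_state(self, state):
        pc, tmp, self.counter = state
        self.pc, self.tmp = list(pc), list(tmp)

    # Transitions: member functions, each paired with a Boolean guard.
    def can_read(self, i):
        return self.pc[i] == 0

    def read(self, i):
        self.tmp[i] = self.counter
        self.pc[i] = 1

    def can_write(self, i):
        return self.pc[i] == 1

    def write(self, i):
        self.counter = self.tmp[i] + 1
        self.pc[i] = 2

    def transitions(self):
        """(guard, action, process) triples for every process step."""
        return [(g, a, i) for i in (0, 1)
                for g, a in ((self.can_read, self.read), (self.can_write, self.write))]

    def ok(self):
        """Correctness property: once both processes are done, counter == 2."""
        return not (self.pc == [2, 2] and self.counter != 2)


def model_check(system):
    """Breadth-first search over the system's states.
    Returns the first state that violates system.ok(), or None."""
    init = system.get_state()
    states_seen = {init}
    pending = deque([init])
    while pending:
        current = pending.popleft()
        for guard, action, i in system.transitions():
            system.set_state(current)          # restore_state(current)
            if not guard(i):
                continue                       # transition not enabled here
            action(i)                          # execute transition T
            successor = system.get_state()     # save_state()
            if successor in states_seen:
                continue
            states_seen.add(successor)
            if not system.ok():                # check successor for correctness
                return successor
            pending.append(successor)
    return None


print(model_check(LostUpdate()))   # prints ((2, 2), (0, 0), 1)
```

The reported state is the classic lost update: both processes read 0 before either writes. Because each successor enters `states_seen` as soon as it is generated, no state is expanded twice.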

11
State Explosion Problem
  • Simple descriptions result in (very) large state
    spaces
  • State space reduction techniques
  • Identify behaviorally equivalent states
  • Process symmetry reduction
  • Heap symmetry reduction
  • Identify behaviorally equivalent transition
    orderings
  • Partial-order reduction

12
How to write a model checker in a week
  • Specify the system and the environment as a class
  • State: member fields
  • Transitions: member functions
  • Each member function has a Boolean guard function
  • Capturing state: provide serialization functions
  • GetState() returns the state in a buffer
  • SetState() copies the state from a buffer
  • Implement the search algorithm
  • Implement some state space reduction techniques

13
Practical Challenges
  • Reduce manual intervention
  • How to specify the system?
  • What is the environment?
  • Guarantees
  • Soundness
  • If the tool terminates without finding a bug (of
    a certain type), then the program has no bugs
  • Preciseness
  • If the tool reports an error, then it is indeed a
    real error
  • Orthogonal to the difficulty of model checking
    algorithms

14
Specifying the Model
  • Conventional model checkers require an
    intermediate description (or "model")
  • Describes the system at a high level
  • Throws away implementation details
  • Good for checking designs, rather than
    implementations
  • Success stories: hardware, cache-coherence
    protocols
  • Problems
  • Specifying a model is HARD for large systems
  • As the system evolves model has to be updated
  • What you check is not what you run!
  • Manual errors in the model can hide real errors
    or introduce false ones

15
Automatically Extract the Model
  • Statically analyze the code to generate a model
  • Models usually mimic the implementation

Example (numbers mark corresponding lines):

Murphi model:
  Rule "PI Local Get (Put)"
      Cache.State = Invalid & !Cache.Wait &   -- 1
      !DH.Pending &                           -- 2
      !DH.Dirty                               -- 3
  ==>
  Begin
      Assert !DH.Local;                       -- 4
      DH.Local := true;                       -- 5
      CC_Put(Home, Memory);                   -- 6
  EndRule;

FLASH code:
  void PILocalGet(void) {
      // ... Boilerplate setup
      if (!hl.Pending) {                      // 2
          if (!hl.Dirty) {                    // 3
              // ASSERT(!hl.Local);           // 4
              ...
              PI_SEND(F_DATA, F_FREE, F_SWAP, F_NOWAIT, F_DEC, 1);   // 6
              hl.Local = 1;                   // 5
              ...
16
Automatic Extraction
  • FeaVer: C program -> Promela (SPIN) model
  • User-provided patterns to extract features
  • Bandera: Java -> Bandera model
  • Sophisticated property-driven slicing techniques
  • Can throw away unrelated parts, if applicable
  • Problems
  • Not all primitives are available in the modeling
    language
  • Pointers, dynamic object creation, dynamic
    threads, exceptions
  • A precise-enough slice could be as large as the
    program itself

17
Code as the model
  • Directly execute the code
  • Pioneered by Verisoft
  • State-less model checking
  • Explicit model checkers
  • Java PathFinder (Java)
  • CMC (C/C++)
  • State space can be infinite (or very large)
  • Try to explore as many behaviors as possible
  • Focus on precision

18
Model Checking = Testing?
  • Almost!
  • Systematic exploration of nondeterminism
  • Testing: random walks in the state space
  • Model checking: systematic graph search
  • Forces the user to expose more nondeterminism
  • A call to malloc() can fail, a packet can get
    lost
  • State space reduction techniques identify
    redundant tests

19
Specifying the System
  • Similar to building a unit-test framework
  • Extract the code to be checked
  • Provide an environment model
  • Includes entities that the implementation
    interacts with
  • Calls to libraries, network, timers, manual input
  • Code + environment is a closed system
  • An executable that you can run
  • Provide correctness properties

20
Identify the Transitions
  • Transition is a code execution between two
    non-deterministic choices
  • Atomic execution of a thread between two schedule
    points
  • Execution of an event handler
  • Model checker should get control at these choice
    points

21
Capturing the State
  • State of the program is captured by global
    variables, stack, heap, and registers
  • Need a way to capture the state of the
    environment model

22
Backtracking
  • Physically reset the state to an older version
  • Java PathFinder, CMC
  • Go to the initial state and re-execute
  • Fork a separate process at the initial state
    (Verisoft)
  • Some systems have a natural 'reset'
  • Unload and reload a driver
  • Reformat the disk
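A Verisoft-style alternative to saving states is to go back to the initial state and re-execute, replaying a prefix of earlier choices. The sketch below is a hypothetical illustration, not Verisoft's implementation: `choose(n)` stands for the point where the checker gets control at a nondeterministic choice (used here to let malloc() fail), and `handler_under_test` with its leak is invented for the example. It assumes the program is deterministic apart from its calls to `choose`.

```python
def explore(program):
    """Stateless search: enumerate every sequence of nondeterministic choices
    `program` can make, re-executing it from the start for each sequence."""
    prefix = []       # choices to replay on the next execution
    failures = []     # choice sequences that triggered an assertion
    runs = 0
    while True:
        trace = []    # (choice taken, number of options) at each choice point

        def choose(n):
            # The checker gets control here: replay the prefix, then take option 0.
            k = len(trace)
            c = prefix[k] if k < len(prefix) else 0
            trace.append((c, n))
            return c

        runs += 1
        try:
            program(choose)
        except AssertionError:
            failures.append([c for c, _ in trace])
        # Backtrack: drop exhausted choice points, then bump the deepest open one.
        while trace and trace[-1][0] == trace[-1][1] - 1:
            trace.pop()
        if not trace:
            return runs, failures
        last, _ = trace.pop()
        prefix = [c for c, _ in trace] + [last + 1]


def handler_under_test(choose):
    """Hypothetical event handler in which every malloc() may fail."""
    live = set()

    def malloc(name):
        if choose(2) == 1:      # nondeterministic choice: allocation fails
            return None
        live.add(name)
        return name

    def handler():
        a = malloc("a")
        if a is None:
            return
        b = malloc("b")
        if b is None:
            return              # bug: `a` is never freed on this path
        live.discard(b)
        live.discard(a)

    handler()
    if live:
        raise AssertionError(f"leaked {sorted(live)}")


print(explore(handler_under_test))   # prints (3, [[0, 1]])
```

Three executions cover all behaviors; the failing sequence [0, 1] means "first malloc succeeds, second fails". No state is ever copied, at the cost of re-running the prefix each time.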

23
Experience with CMC
  • Three AODV implementations
  • 35 implementation bugs, 1 specification bug
  • Linux TCP
  • 4 bugs, 90% protocol coverage
  • Three Linux filesystems
  • 32 bugs in total
  • 10 serious ones (such as deleting "/")

24
Environment Problem
  • Where to separate the system and the environment
  • Need a faithful abstraction of the environment
  • Enough nondeterminism to trigger interesting
    behaviors in the system
  • Not so much nondeterminism that it triggers false
    behaviors
  • An Example
  • System: Linux TCP implementation
  • Environment: kernel, network (driver + hardware), ...

25
Extracting Linux TCP from the Kernel
  • Conventional wisdom
  • Extract TCP along a minimal, narrow interface
  • Minimizes the model state
  • Provide a kernel library
  • Implements stubs for all kernel functions TCP
    requires
  • Never worked!
  • The narrowest interfaces still had 150 interface
    functions
  • These interfaces are not documented
  • Errors in stubs can cause subtle but false errors
  • Model checkers are good at finding subtle errors!
  • Errors in stubs can also hide real errors

26
Extracting Linux TCP from the Kernel
  • Solution (hard-learned)
  • Extract along well-defined interfaces
  • Minimize errors in stub implementations
  • These interfaces change infrequently
  • Do so even if it stresses model checking
  • Well defined interfaces around TCP
  • The system call interface (kernel <-> user
    processes)
  • The hardware abstraction layer (kernel <->
    hardware)
  • Extracting at these two interfaces
  • Forces CMC to run the entire Linux kernel

27
Running the Entire Kernel in CMC
  • Linux kernel has to run in user space
  • Has been done before (UML: User Mode Linux)
  • CMC needs to handle much larger states
  • Approximately 300 kilobytes
  • Incremental states in effect extract the
    TCP-relevant state
  • A larger state space
  • Restrict the environment to trigger TCP events
    only
  • Compensated by the ease of environment model
    generation
  • Approach not possible when model checking with an
    intermediate description

28
Specifying Properties
  • Assertions in the code
  • Trigger automatically as we are running the code
  • Heap related errors
  • Build your own memory allocator
  • Check for leaks, double-free
  • Purify-style dynamic techniques
  • Reading uninitialized variables, access after
    free
  • Checking for resource leaks
  • Check that you return to the initial state when
    you should
  • Identify idempotent sequences
  • e.g., CreateFile(A) followed by DeleteFile(A)
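A sketch of the allocator-based checks and the idempotent-sequence check, assuming a hypothetical instrumented allocator (`CheckedHeap`) and a toy file system (`ToyFS`); neither is CMC's API. The leak in `ToyFS` is caught because CreateFile(A) followed by DeleteFile(A) does not bring the abstract state back to where it started.

```python
class CheckedHeap:
    """Hypothetical instrumented allocator for catching heap errors during search."""

    def __init__(self):
        self.live = {}       # address -> size of every allocated block
        self.freed = set()   # addresses already freed
        self.next_addr = 0   # addresses are never reused, so `freed` stays accurate

    def malloc(self, size):
        addr = self.next_addr
        self.next_addr += max(size, 1)
        self.live[addr] = size
        return addr

    def free(self, addr):
        if addr in self.freed:
            raise AssertionError(f"double free of address {addr}")
        if addr not in self.live:
            raise AssertionError(f"free of unallocated address {addr}")
        del self.live[addr]
        self.freed.add(addr)

    def access(self, addr):
        if addr in self.freed:
            raise AssertionError(f"access after free of address {addr}")

    def leaks(self):
        return sorted(self.live)


class ToyFS:
    """Hypothetical file system whose delete forgets to release the inode."""

    def __init__(self):
        self.heap = CheckedHeap()
        self.files = {}                  # name -> inode address

    def create_file(self, name):
        self.files[name] = self.heap.malloc(64)

    def delete_file(self, name):
        self.files.pop(name)             # bug: inode is never freed

    def snapshot(self):
        # Abstract state compared by the checker: directory contents + live blocks.
        return (dict(self.files), self.heap.leaks())


def returns_to_initial_state(fs, ops):
    """Run a sequence expected to be idempotent; report whether the state is unchanged."""
    before = fs.snapshot()
    for op in ops:
        op()
    return fs.snapshot() == before


fs = ToyFS()
print(returns_to_initial_state(fs, [lambda: fs.create_file("A"),
                                    lambda: fs.delete_file("A")]))   # prints False
```

The snapshot deliberately leaves out the allocator's `next_addr`, which advances on every allocation; only state that should really be restored is compared.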

29
Some properties are hard to specify
  • Real systems have ambiguous / incomplete
    specifications
  • TCP congestion control should not use up
    "too much" network bandwidth
  • A file system should not lose files
  • Difficult to check in the presence of crashes
  • Identify properties that are easy to check
  • A file system is in a bad state if its own fsck()
    cannot recover from it

30
State Space Reduction Techniques
  • Downscaling
  • Hash Compaction
  • Identifying State Symmetries

31
Downscaling
  • Check smaller versions of the model
  • Example
  • Run with only 3-4 nodes in the network
  • Send just 3 data packets
  • Find bugs involving complex interactions in
    smaller instances
  • Potentially miss bugs present only in larger
    instances

32
Hash Compaction
  • Compact states in the hash table [Stern, 1995]
  • Compute a signature for each state
  • Only store the signature in the hashtable
  • Signature is computed incrementally
  • Partial signature cached at each page
  • Might miss errors due to collisions
  • Orders of magnitude memory savings
  • Compact 100 kilobyte state to 4-8 bytes
  • Possible to search 10 million states
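Hash compaction with incrementally computed signatures might look like the following. Assumptions: the state is a flat byte array split into 4 KB pages, each page's partial signature is a BLAKE2b digest from Python's `hashlib`, and the state signature is a digest of the concatenated page signatures. The slides do not say which hash CMC uses or how partial signatures are combined; `PagedState` is a hypothetical name.

```python
import hashlib

PAGE_SIZE = 4096   # hypothetical page size
SIG_BYTES = 8      # size of the signature stored in the hash table


class PagedState:
    """A state stored as raw bytes, with a cached partial signature per page."""

    def __init__(self, size):
        self.mem = bytearray(size)
        n_pages = (size + PAGE_SIZE - 1) // PAGE_SIZE
        self.page_sigs = [b""] * n_pages
        self.dirty = set(range(n_pages))   # pages whose partial signature is stale
        self.pages_hashed = 0              # instrumentation for the example

    def write(self, offset, data):
        assert offset + len(data) <= len(self.mem)
        self.mem[offset:offset + len(data)] = data
        first, last = offset // PAGE_SIZE, (offset + len(data) - 1) // PAGE_SIZE
        self.dirty.update(range(first, last + 1))

    def signature(self):
        # Rehash only the pages written since the last call, then combine the
        # cached per-page signatures into one short state signature.
        for p in self.dirty:
            page = self.mem[p * PAGE_SIZE:(p + 1) * PAGE_SIZE]
            self.page_sigs[p] = hashlib.blake2b(page, digest_size=SIG_BYTES).digest()
            self.pages_hashed += 1
        self.dirty.clear()
        return hashlib.blake2b(b"".join(self.page_sigs), digest_size=SIG_BYTES).digest()


states_seen = set()            # holds 8-byte signatures, not whole states
s = PagedState(100 * 1024)     # a ~100 KB state, as on the slide (25 pages)
sig0 = s.signature()
states_seen.add(sig0)
s.write(10, b"\x01")
sig1 = s.signature()           # rehashes only the one dirty page
print(s.pages_hashed)          # prints 26
```

Only the 8-byte signature enters the hash table, so two distinct states whose signatures collide are treated as one; that is how hash compaction can miss errors.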

33
State Symmetries
  • Explore one out of a (large) set of equivalent
    states
  • Canonicalize states before hashing

[Figure: Current State, Successor States, Canonical State, Hash Signature, Hash table]
  • State transformations can be approximate
  • But use the original state for further state
    exploration
  • Thus, approximations do not generate false errors!
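The rule on this slide (hash the canonical form, keep exploring from the original state) in a small sketch. It assumes a hypothetical system of identical processes, where process symmetry makes sorting the per-process states a valid canonicalization; `successors`, `canonical`, and `search` are illustrative names.

```python
from collections import deque


def successors(state):
    """Hypothetical system of identical processes: any one process may step
    its local state 0 -> 1 -> 2 -> 0."""
    for i in range(len(state)):
        succ = list(state)
        succ[i] = (succ[i] + 1) % 3
        yield tuple(succ)


def canonical(state):
    # Process symmetry: processes are interchangeable, so sort their states.
    return tuple(sorted(state))


def search(init, canonicalize):
    """BFS that hashes canonical states but keeps exploring original states.
    Returns the number of states expanded."""
    seen = {canonicalize(init)}
    pending = deque([init])
    expanded = 0
    while pending:
        current = pending.popleft()       # an original, non-canonicalized state
        expanded += 1
        for succ in successors(current):
            key = canonicalize(succ)      # only the canonical form is hashed
            if key not in seen:
                seen.add(key)
                pending.append(succ)      # explore from succ itself, not from key
    return expanded


print(search((0, 0, 0), lambda s: s), search((0, 0, 0), canonical))   # prints 27 10
```

With three processes the reduction shrinks 27 states to 10 equivalence classes. Since successors are always computed from real states, a canonicalization that is too coarse can only cause missed states, never a bogus one.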

34
Heap Canonicalization
  • Heap objects can be allocated in different order
  • Depends on the order events happen
  • Relocate heap objects to a unique representation

[Figure: two heap states, state1 and state2, mapped to the same Canonical Representation]
  • Essentially
  • Find a canonical representation for each heap
    graph
  • By abstracting the concrete values of pointers

35
Heap Canonicalization Algorithm
  • Basic algorithm [Iosif 01]
  • Do a deterministic graph traversal of the heap
    (bfs / dfs)
  • Relocate objects in the order visited
  • CMC extensions
  • How to do it incrementally?
  • Should not traverse the entire heap in every
    transition
  • How to do it for C objects?
  • Type information is not available at run time

36
Iosif's Canonicalization Algorithm
  • Do a deterministic graph traversal of the heap
    (bfs / dfs)
  • Relocate objects to a canonical location
  • Determined by the dfs (or bfs) number of the
    object
  • Hash the resulting heap

[Figure: a Heap with root variables r and s, and the Canonical Heap in which its objects are relocated to locations 0, 2, 4, 6]
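A sketch of Iosif's basic algorithm as the slides describe it: a deterministic DFS from the roots, objects relocated in visit order, then the canonical heap hashed. Assumptions: the heap is a Python dict from address to fields, pointer fields are known by name (the missing C type information from the previous slide is ignored), every object occupies 2 cells to match the figure, and SHA-256 stands in for whatever hash a real checker uses. `canonicalize` and `heap_hash` are hypothetical names.

```python
import hashlib

OBJ_SIZE = 2   # hypothetical: every object occupies 2 cells, as in the figure


def canonicalize(heap, roots, ptr_fields=("n",)):
    """Relocate reachable objects to canonical locations in DFS visit order.

    heap:  address -> {field: value}; fields listed in ptr_fields hold an
           address or None, every other field is plain data.
    roots: root variable name -> address or None.
    Unreachable objects are dropped from the canonical heap.
    """
    new_loc, order = {}, []
    stack = [roots[name] for name in sorted(roots, reverse=True)]
    while stack:                                  # deterministic preorder DFS
        addr = stack.pop()
        if addr is None or addr in new_loc:
            continue
        new_loc[addr] = len(order) * OBJ_SIZE     # location = visit number
        order.append(addr)
        for f in reversed(ptr_fields):            # first field is explored first
            stack.append(heap[addr].get(f))
    canon_heap = {}
    for addr in order:
        obj = dict(heap[addr])
        for f in ptr_fields:
            if obj.get(f) is not None:
                obj[f] = new_loc[obj[f]]          # abstract away the concrete pointer
        canon_heap[new_loc[addr]] = obj
    canon_roots = {name: new_loc.get(a) for name, a in roots.items()}
    return canon_roots, canon_heap


def heap_hash(canon):
    roots, heap = canon
    items = (sorted(roots.items()),
             [(loc, sorted(obj.items())) for loc, obj in sorted(heap.items())])
    return hashlib.sha256(repr(items).encode()).hexdigest()


# The same two lists (r and s), allocated at different addresses.
heap1 = {100: {"val": 1, "n": 104}, 104: {"val": 2, "n": None},
         200: {"val": 3, "n": 204}, 204: {"val": 4, "n": None}}
heap2 = {8: {"val": 3, "n": 16}, 16: {"val": 4, "n": None},
         24: {"val": 1, "n": 32}, 32: {"val": 2, "n": None}}
c1 = canonicalize(heap1, {"r": 100, "s": 200})
c2 = canonicalize(heap2, {"r": 24, "s": 8})
print(heap_hash(c1) == heap_hash(c2), c1[0])   # prints True {'r': 0, 's': 4}

# Append one object to r's list: every object of s moves.
heap1[104]["n"] = 108
heap1[108] = {"val": 5, "n": None}
shifted = canonicalize(heap1, {"r": 100, "s": 200})
print(shifted[0])                              # prints {'r': 0, 's': 6}
```

The last lines show the weakness the following slides address: a single insertion in one list changes the canonical location of objects in the other, so the whole heap must be re-traversed and rehashed.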
37
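Two Linked Lists Example
[Figure: Heap and Canonical Heap for two lists rooted at r and s (canonical locations 0, 2, 4, 6), annotated with partial hash values; after the transition "Insert b", the canonical heap occupies locations 0, 2, 4, 6, 8]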
38
A Much Larger Example: Linux Kernel
[Figure: Heap and Canonical Heap of the Linux kernel, with Core OS, Network, and File-system regions and an object p. Callout: an object insertion in one region affects the canonical location of objects in another region.]
39
Incremental Heap Canonicalization
  • Access Chain
  • A path from the root to an object in the heap
  • Bfs Access Chain
  • Shortest of all access paths
  • Break ties lexicographically
  • Note: the Bfs access chain is a shortest path from
    a global variable
  • Canonical location of an object is a function of
    its bfs access chain

[Figure: heap graph with root r and objects a, b, c; edges r -f-> a, r -g-> b, a -f-> b, a -g-> c, b -h-> c]
  • Access chains of c
  • <r,f,g>
  • <r,g,h>
  • <r,f,f,h>
  • Bfs access chain of c
  • <r,f,g>
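Computing Bfs access chains with the lexicographic tie-break takes a single breadth-first search, provided roots and fields are visited in sorted order. A sketch under assumptions: the heap is a dict of objects with named pointer fields, `bfs_access_chains` is a hypothetical name, and the objects R, a, b, c reproduce the slide's graph (R is the object referenced by root r).

```python
from collections import deque


def bfs_access_chains(heap, roots):
    """Return {object: Bfs access chain} for every reachable object.

    heap:  object -> {field name: target object or None}
    roots: root (global) variable name -> object or None
    A chain is a tuple (root, field, field, ...). Visiting roots and fields in
    sorted order means the first chain that reaches an object is a shortest
    one, with ties broken lexicographically.
    """
    chain = {}
    queue = deque()
    for name in sorted(roots):
        obj = roots[name]
        if obj is not None and obj not in chain:
            chain[obj] = (name,)
            queue.append(obj)
    while queue:
        obj = queue.popleft()
        for field in sorted(heap[obj]):
            target = heap[obj][field]
            if target is not None and target not in chain:
                chain[target] = chain[obj] + (field,)
                queue.append(target)
    return chain


# The slide's graph: r.f = a, r.g = b, a.f = b, a.g = c, b.h = c.
heap = {"R": {"g": "b", "f": "a"}, "a": {"f": "b", "g": "c"}, "b": {"h": "c"}, "c": {}}
chains = bfs_access_chains(heap, {"r": "R"})
print(chains["c"])   # prints ('r', 'f', 'g')
```

Both <r,f,g> and <r,g,h> have length 3; f sorts before g, so <r,f,g> wins, and the longer <r,f,f,h> is never considered.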

40
Revisiting Two Linked Lists Example
Relocation Function Table (r, s are root variables; n is the next field):
  <r>      0      <s>      4
  <r,n>    2      <s,n>    6
  <r,n,n>  8
[Figure: Heap and Canonical Heap for the two lists before and after the insertion; the new object <r,n,n> is placed at location 8]
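A sketch of the relocation function table, assuming (as the slides suggest but do not spell out) that the table is kept for the whole search and hands the next free location to each access chain it has not seen before. `RelocationTable` and `canonical_locations` are hypothetical names; objects are assumed to occupy 2 cells so that the output reproduces the table above.

```python
from collections import deque


def bfs_access_chains(heap, roots):
    """{object: shortest access chain from a root, ties broken lexicographically}."""
    chain, queue = {}, deque()
    for name in sorted(roots):
        obj = roots[name]
        if obj is not None and obj not in chain:
            chain[obj] = (name,)
            queue.append(obj)
    while queue:
        obj = queue.popleft()
        for field in sorted(heap[obj]):
            target = heap[obj][field]
            if target is not None and target not in chain:
                chain[target] = chain[obj] + (field,)
                queue.append(target)
    return chain


class RelocationTable:
    """Memoized map from Bfs access chain to canonical location. It persists
    across states, so an object moves only if its access chain changes."""

    def __init__(self, obj_size=2):
        self.obj_size = obj_size
        self.table = {}
        self.next_loc = 0

    def location(self, chain):
        if chain not in self.table:          # first time this chain is seen
            self.table[chain] = self.next_loc
            self.next_loc += self.obj_size
        return self.table[chain]


def canonical_locations(heap, roots, table):
    chains = bfs_access_chains(heap, roots)
    # Look chains up in sorted order so fresh locations are handed out deterministically.
    return {obj: table.location(c) for obj, c in sorted(chains.items(), key=lambda kv: kv[1])}


table = RelocationTable()
# Two lists, r -> r1 -> r2 and s -> s1 -> s2 (n is the next field).
heap = {"r1": {"n": "r2"}, "r2": {"n": None}, "s1": {"n": "s2"}, "s2": {"n": None}}
roots = {"r": "r1", "s": "s1"}
before = canonical_locations(heap, roots, table)
print(before)   # prints {'r1': 0, 'r2': 2, 's1': 4, 's2': 6}
heap["r2"]["n"] = "r3"            # transition: append r3 to the r list
heap["r3"] = {"n": None}
after = canonical_locations(heap, roots, table)
print(after)    # prints {'r1': 0, 'r2': 2, 'r3': 8, 's1': 4, 's2': 6}
```

Unlike relocation in DFS visit order, the objects of s stay at 4 and 6 after the insertion; only the new chain <r,n,n> needs a fresh location.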
41
And on the much larger example
[Figure: Heap and Canonical Heap of the Linux kernel, with Core OS, Network, and File-system regions and an object p. Callout: changes in other regions do not affect the canonical location of p.]
  • Canonical location of p does not change
  • Unless its Bfs Access Chain changes
  • For small changes to the graph
  • Shortest path of most objects remains the same