Title: Runtime Verification and Software Fault Protection with Eagle
1Runtime Verification and Software Fault Protection with Eagle
- Allen Goldberg
- Klaus Havelund
- Kestrel Technology
- Palo Alto, USA
2Overview
- Run-time Monitoring
- About EAGLE
- Software Fault Protection
- Summary
3Motivation for Runtime Verification
- Model checking and Theorem Proving are rigorous, but
  - Not fully automated
  - Model creation is often manual
  - Abstraction is often manual in the case of model checking
  - Lemma discovery is often manual in the case of theorem proving
  - Therefore not very scalable
- Applied Testing is scalable and widely used, but is ad hoc
  - Lack of formal coverage
  - Lack of formal conformance checking
- Combine Formal Methods and Testing
4Run-time Verification
- Combine temporal logic specification and testing
- Specify properties in some temporal logic.
- Instrument program to generate events.
- Monitor properties against a trace of events
emitted by the running program.
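The three steps above can be sketched in Java. This is only an illustration under stated assumptions: the class names AccessMonitor and FileClient, the event names, and the property ("no access while the file is closed") are hypothetical, and the property is hand-coded rather than written in a temporal logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical monitor: checks "no access happens while the file is closed". */
final class AccessMonitor {
    private boolean open = false;
    private final List<String> violations = new ArrayList<>();

    /** Called for every event emitted by the instrumented program. */
    void onEvent(String event) {
        switch (event) {
            case "open":
                open = true;
                break;
            case "close":
                open = false;
                break;
            case "access":
                if (!open) violations.add("access while closed");
                break;
            default:
                break; // events irrelevant to the property are ignored
        }
    }

    List<String> violations() { return violations; }
}

/** Hypothetical system under test; the onEvent calls stand in for inserted instrumentation. */
final class FileClient {
    private final AccessMonitor monitor;

    FileClient(AccessMonitor monitor) { this.monitor = monitor; }

    void open()  { monitor.onEvent("open"); }
    void read()  { monitor.onEvent("access"); }
    void close() { monitor.onEvent("close"); }
}
```

Running the client produces the event trace, and the monitor reports a violation as soon as an offending event arrives.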
5A Model-Based Verification Architecture
[Figure: model-based verification architecture. Labels: Model; test and property generation; test inputs; behavioral properties; implemented system under test; instrumentation; event stream; dispatch; observers (Rqmts, Deadlock, Dataraces); reports.]
6Static and Dynamic Analysis
[Figure: static and dynamic analysis. Labels: Specification; Test case generation; Runtime verification; Program; Program instrumentation; Input; Output.]
7So many logics
- What is the most basic, yet general, specification language suitable for monitoring?
- EAGLE is our answer.
8Logics
- Assertions (Java 1.4)
- Pre-post conditions, invariants (Eiffel, JML)
- Temporal logic (Mac, Temporal Rover)
- Time lines (Smith, Dillon)
- Live Sequence Charts (Harel)
- Automata (Büchi, Alternating, Timed)
- Regular expressions (Rosu, Lee)
- Process algebra (Jass)
- Mathematical modeling/programming languages (Alloy, Prolog, Maude, Haskell, Specware)
9Eagle's Core Concepts
- Finite trace monitoring logic
- Propositional logic + three temporal connectives
  - Next: @F
  - Previous: ⊙F
  - Concatenation: F1 · F2
- Recursive parameterized rules
  - Always(Term t) = t /\ @Always(t) .
- Each state is an environment mapping variables to values
10Introducing EAGLE
- Can encode
- future time temporal logic
- past-time logic
- extended regular expressions
- µ-calculus
- real-time, data-binding, statistics.
- Monitoring formulas are evaluated online over a given input trace, on a state by state basis
  - identify failure as early as reasonably possible
- Evaluation proceeds by checking facts about the past and generating obligations about the future.
11Basic Propositional Example
- // LTL rules
- max Alws(Term t) = t /\ @ Alws(t) .
- min Even(Term t) = t \/ @ Even(t) .
- min Prev(Term t) = t \/ ⊙ Prev(t) .
- // Monitor
- mon M = Alws(y>0 -> (Prev(x>0) /\ Even(z>0))) .
[Timeline figure marking the states where y>0, x>0 and z>0 hold.]
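To make the finite-trace reading of this monitor concrete, here is a minimal offline check in Java. It is a sketch, not EAGLE's algorithm: the class name BasicPropertyCheck and the encoding of a state as an int array {x, y, z} are assumptions, and the whole trace is assumed to be available up front.

```java
/** Offline check of Alws(y>0 -> (Prev(x>0) /\ Even(z>0))) on a finite trace. */
final class BasicPropertyCheck {
    /** Each state is {x, y, z}. Returns true iff the property holds on the whole trace. */
    static boolean holds(int[][] trace) {
        for (int i = 0; i < trace.length; i++) {
            if (trace[i][1] > 0) {                 // y > 0 in state i
                boolean xSeen = false;             // Prev(x>0): some state j <= i has x > 0
                for (int j = 0; j <= i; j++) {
                    xSeen |= trace[j][0] > 0;
                }
                boolean zSeen = false;             // Even(z>0): some state k >= i has z > 0
                for (int k = i; k < trace.length; k++) {
                    zSeen |= trace[k][2] > 0;
                }
                if (!xSeen || !zSeen) return false;
            }
        }
        return true; // Alws is a max rule: nothing is left to violate at the end of the trace
    }
}
```

For example, the trace {1,0,0}, {0,1,0}, {0,0,1} satisfies the monitor, while {0,1,0}, {1,0,1} does not, because x>0 has not yet occurred when y>0 is observed.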
12Data Bindings
- min R(int k) = Prev(x==k) /\ Even(z==k) .
- // Monitor
- mon M = Alws(y>0 -> R(y)) .
- mon Wrong = Alws(y>0 -> (Prev(x==y) /\ Even(z==y))) .
[Timeline figure with labels y>0, z==k, x==k and the binding k = y.]
13Time is Just Data
min Before(long t, Term F) = clock < t /\ (F \/ @ Before(t,F)) .
min Within(long t, Term F) = Before(t+clock, F) .
clock is a variable defined in the state
mon M = Alws(x>0 -> Within(4,y>0)) .
[Timeline figure: two intervals, each labeled "< 4".]
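The deadline computation can be illustrated with a small offline check in Java. This is a sketch under assumptions: the class name WithinCheck and the state encoding {clock, x, y} are hypothetical, clocks are assumed non-decreasing, and the comparison is taken to be strict (clock < deadline).

```java
/** Offline check of Alws(x>0 -> Within(bound, y>0)) on a finite, timestamped trace. */
final class WithinCheck {
    /** Each state is {clock, x, y}; clocks are assumed to be non-decreasing. */
    static boolean holds(long[][] trace, long bound) {
        for (int i = 0; i < trace.length; i++) {
            if (trace[i][1] > 0) {                    // x > 0 starts a timed obligation
                long deadline = trace[i][0] + bound;  // Within(t, F) = Before(t + clock, F)
                boolean met = false;
                // Before(t, F): the clock must stay below t until F is observed
                for (int k = i; k < trace.length && trace[k][0] < deadline; k++) {
                    if (trace[k][2] > 0) {
                        met = true;
                        break;
                    }
                }
                if (!met) return false;               // min rule: unmet at deadline or end of trace
            }
        }
        return true;
    }
}
```

The absolute deadline is fixed once, when the obligation is created, which is why the clock can be treated as ordinary state data.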
14Automata
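[Figure: automaton with states S1 and S2; init loops on S1, open leads from S1 to S2, access loops on S2, close leads from S2 back to S1.]
max S1() = init -> @ S1() /\ open -> @ S2() /\ ¬(access \/ close) .
min S2() = access -> @ S2() /\ close -> @ S1() /\ ¬(init \/ open) .
mon M = S1() .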
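The same automaton can be written directly as a Java state machine. This sketch assumes that every event is one of init, open, access or close, that the last conjunct of each rule forbids the two events not allowed in that state, and that a trace is accepted only if it ends in S1 (S1 is a max rule, S2 a min rule). The class name FileAutomaton and the ERROR state are illustrative.

```java
/** Hand-coded version of the two-state automaton S1/S2. */
final class FileAutomaton {
    enum State { S1, S2, ERROR }

    static State step(State s, String event) {
        switch (s) {
            case S1:
                if (event.equals("init")) return State.S1;
                if (event.equals("open")) return State.S2;
                return State.ERROR;   // access or close is not allowed in S1
            case S2:
                if (event.equals("access")) return State.S2;
                if (event.equals("close")) return State.S1;
                return State.ERROR;   // init or open is not allowed in S2
            default:
                return State.ERROR;
        }
    }

    static boolean accepts(String... events) {
        State s = State.S1;
        for (String e : events) {
            s = step(s, e);
            if (s == State.ERROR) return false;
        }
        return s == State.S1;         // ending in S2 leaves the min rule unsatisfied
    }
}
```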
15Timed Automata
[Figure: the same automaton with a timer x, reset to 0 on open and constrained by x < 10 in S2.]
max S1() = init -> @ S1() /\ open -> @ S2(clock) /\ ¬(access \/ close) .
min S2(long t) = clock < t+10 /\ access -> @ S2(t) /\ close -> @ S1() /\ ¬(init \/ open) .
16Combining Automata and Temporal Logic
[Figure: the automaton with states S1 and S2, where the open transition is guarded by open /\ Prev(init).]
max S1() = init -> @ S1() /\ open -> (Prev(init) /\ @ S2()) /\ ¬(access \/ close) .
min S2() = access -> @ S2() /\ close -> @ S1() /\ ¬(init \/ open) .
mon M = S1() .
17EAGLE by example: statistical logics
- Monitor that state property F holds with at least probability p
- max Empty() = false .
- min Last() = @Empty() .
- min A(Term F, float p, int f, int t) =
-   ( Last() /\ (( F /\ (1 - f/t) >= p) \/
-   (¬F /\ (1 - (f+1)/t) >= p)) )
-   \/
-   (¬Last() /\ (( F -> @A(F, p, f, t+1)) /\
-   (¬F -> @A(F, p, f+1, t+1)))) .
- mon AtLeast(Term F, float p) = A(F, p, 0, 1) .
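Reading f as the number of states in which F failed and t as the number of states seen, the rule amounts to a final check that 1 - f/t is at least p. A minimal Java sketch of that reading follows. The class name AtLeastCheck, the boolean-array encoding of the trace, the non-strict comparison, and the rejection of an empty trace are assumptions, not taken from the slides.

```java
/** Checks that F held in at least a fraction p of the states of a finite trace. */
final class AtLeastCheck {
    /** outcomes[i] is the truth value of F in state i. */
    static boolean atLeast(boolean[] outcomes, double p) {
        if (outcomes.length == 0) return false; // assumption: a min rule fails on an empty trace
        int failures = 0;                       // plays the role of the parameter f
        for (boolean ok : outcomes) {
            if (!ok) failures++;
        }
        int total = outcomes.length;            // plays the role of the parameter t at the last state
        return 1.0 - (double) failures / total >= p;
    }
}
```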
18Regular Expressions
Property: File accesses are always enclosed by open and close operations.
M = (idle* open access* close)*
max S(Term t) = t /\ @ S(t) . // Star
min P(Term t) = t /\ @ P(t) . // Plus
mon M = S(S(idle()) · open() · S(access()) · close()) .
19Grammars
Property: Locks are acquired and released in a nested fashion.
Example trace: lock release lock lock release release
- // Match rule
- max Match(Term l, Term r) =
-   Empty() \/ (l · Match(l,r) · r · Match(l,r)) .
- // Monitor
- mon M = Match(lock(),release()) .
20EAGLE by example: beyond regular
- Monitor a sequence of login and logout events: at no point should there be more logouts than logins, and they should match at the end of the trace.
- min Match(Term F1, Term F2) =
-   Empty() \/
-   F1 · Match(F1, F2) · F2 · Match(F1, F2) .
- mon M1 = Match(login, logout) .
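The property stated in the first bullet is the classic balanced-bracket language, which a single counter recognizes. The following Java sketch checks it directly; the class name LoginLogoutCheck and the string event names are assumptions, and it illustrates the property rather than EAGLE's evaluation of the Match rule.

```java
/** Checks that logouts never outnumber logins and that both match at the end. */
final class LoginLogoutCheck {
    static boolean matched(String... events) {
        int openSessions = 0;
        for (String e : events) {
            if (e.equals("login")) {
                openSessions++;
            } else if (e.equals("logout")) {
                openSessions--;
                if (openSessions < 0) return false; // more logouts than logins so far
            }
        }
        return openSessions == 0;                   // every login was matched by a logout
    }
}
```

No finite automaton can do this for unbounded nesting, which is what "beyond regular" refers to.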
21Syntax
22Semantics
23Some EAGLE facts
- EAGLE-LTL (past and future): a monitoring formula of size m has space complexity bounded by m^2 2^m log m
- EAGLE with data binding has a worst case dependent on the length of the input trace
- EAGLE without data is at least context-free
24EAGLE interface
User defines these classes:

class MyState extends EagleState {
  int x, y;
  void update(Event e) { x = e.x; y = e.y; }
}

class Observer {
  Monitors mons;
  State state;
  void eventHandler(Event e) {
    state.update(e);
    mons.apply(state);
  }
}

class Monitors {
  Formula M1, M2;
  void apply(State s) {
    M1.apply(s);
    M2.apply(s);
  }
}

class Event { int x, y; }

[Figure: event stream e1 e2 e3 delivered to the Observer.]
25Eagle Implementation
- Built an initial prototype implementation and used it for a number of applications
  - test case monitoring scenario
  - monitoring the behavior of a planning system
- High performance algorithm
  - pay for what you use
  - e.g. state machine
  - symbolic manipulation -> automata-like solutions
26Online Algorithm
- Start with the initial formula M to monitor
- See a new state
- Transform M to M' so that M is valid on the whole trace iff M' is valid on the remaining trace
27Basic Algorithm Future Time LTL
- max Alws(Term t) = t /\ @ Alws(t) .
- min Even(Term t) = t \/ @ Even(t) .
- mon M = Alws(y>0 -> Even(z>0)) .
- M0 = Alws(y>0 -> Even(z>0))
- state 1 (y=-1, z=0): M1 = Alws(y>0 -> Even(z>0))
- state 2 (y=2, z=0): M2 = Alws(y>0 -> Even(z>0)) /\ Even(z>0)
- state 3 (y=-1, z=0): M3 = Alws(y>0 -> Even(z>0)) /\ Even(z>0)
- state 4 (y=-1, z=2): M4 = Alws(y>0 -> Even(z>0))
- state 5 (y=2, z=0): M5 = Alws(y>0 -> Even(z>0)) /\ Even(z>0)
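For this particular monitor the rewriting collapses to one bit of state: whether an Even(z>0) obligation is outstanding. The Java sketch below replays the example under that observation. It assumes the five states read (y=-1, z=0), (y=2, z=0), (y=-1, z=0), (y=-1, z=2), (y=2, z=0); the class name AlwsEvenMonitor is hypothetical, and this is a hand specialization, not EAGLE's general term-rewriting algorithm.

```java
/** Online monitor for Alws(y>0 -> Even(z>0)), specialized by hand to one bit of state. */
final class AlwsEvenMonitor {
    private boolean pending = false; // true iff an Even(z>0) obligation is outstanding

    /** Consumes one state and returns the textual form of the rewritten monitor. */
    String step(int y, int z) {
        if (y > 0 || pending) {
            pending = !(z > 0);      // z>0 discharges the obligation, otherwise it carries over
        }
        return pending
                ? "Alws(y>0 -> Even(z>0)) /\\ Even(z>0)"
                : "Alws(y>0 -> Even(z>0))";
    }

    /** At the end of the trace a pending Even (a min rule) counts as false. */
    boolean verdictAtEnd() { return !pending; }
}
```

Because only two distinct monitor formulas ever arise, the rewriting behaves like a two-state machine on this example.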
28Toward an Algorithm
- The previous example suggests how monitoring may be reduced to state machine execution
- For propositional future time LTL, by employing normal forms and simplification, the set of derivable monitors is finite. This is the idea behind generating a Büchi automaton for model checking propositional future time LTL.
- However, Eagle has
  - Past time operators
  - Data parameters
29Past Time
- max Alws(Term t) = t /\ @ Alws(t) .
- min Prev(Term t) = t \/ ⊙ Prev(t) .
- mon M = Alws(y>0 -> Prev(x>0)) .
- Cache C = Prev(x>0)
- C0 = false, M0 = Alws(y>0 -> Prev(x>0))
- C1 = false, M1 = Alws(y>0 -> Prev(x>0))
- C2 = true, M2 = false
- Trace shown alongside: (y=-1, x=0), (y=2, x=2), (y=-1, x=0), (y=-1, x=2), (y=2, x=0)
30Algorithm with Past Time
- Static analysis
  - determine what formulas are needed in the cache
- Evaluate formulas in cache in state 0
- For each state
  - evaluate monitors, referring to the cache to look up values of formulas headed by the previous operator
  - update the cache
- End of trace
  - evaluate monitor for beyond the last state
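The loop above can be sketched in Java for a monitor of the form Alws(y>0 -> Prev(x>0)). The sketch assumes that Prev(t) is true in a state if t holds there or Prev(t) held in the previous state, that the cache starts as false, and that the monitor fails as soon as the Alws body is false. The class name PastTimeMonitor is hypothetical, and the example traces in the note are illustrative, not taken from the slides.

```java
/** Online monitor for Alws(y>0 -> Prev(x>0)) using a one-entry cache for the past-time formula. */
final class PastTimeMonitor {
    private boolean cache = false;    // value of Prev(x>0) in the previous state; false before the trace
    private boolean violated = false;

    void step(int x, int y) {
        boolean prevNow = x > 0 || cache;        // evaluate Prev(x>0) using the cached previous value
        if (y > 0 && !prevNow) violated = true;  // body of Alws fails in this state
        cache = prevNow;                         // update the cache for the next state
    }

    boolean violated() { return violated; }
}
```

Feeding the states (x=0, y=-1), (x=2, y=-1), (x=0, y=2) leaves the monitor unviolated, while a first state (x=0, y=2) violates it immediately.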
31Improvements
- Conjunctive normal form
- Subsumption: (a \/ b) /\ a => a
- Term caching
  - assign a unique id to each normalized term
- Define various caches of the form cache: TermId -> TermId
  - subsumption cache
  - rule application cache
- Normalization folded into evaluation
- Ordering evaluation of conjuncts and disjuncts
  - Terms which do not (hereditarily) contain instances of the @ operator will fully evaluate to true/false.
  - Evaluate such terms first
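The subsumption step on conjunctive normal form can be sketched in Java as follows. A formula is represented as a list of clauses, each clause a set of literal names; this representation and the class name Subsumption are assumptions for illustration, and the quadratic loop stands in for whatever cached lookup a real implementation would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Removes clauses of a CNF formula that are implied by a smaller (or identical, earlier) clause. */
final class Subsumption {
    static List<Set<String>> simplify(List<Set<String>> cnf) {
        List<Set<String>> result = new ArrayList<>();
        for (int i = 0; i < cnf.size(); i++) {
            Set<String> clause = cnf.get(i);
            boolean subsumed = false;
            for (int j = 0; j < cnf.size() && !subsumed; j++) {
                if (i == j) continue;
                Set<String> other = cnf.get(j);
                boolean strictlySmaller = other.size() < clause.size();
                boolean equalButEarlier = other.equals(clause) && j < i;
                // clause is redundant if every literal of 'other' already occurs in it
                subsumed = clause.containsAll(other) && (strictlySmaller || equalButEarlier);
            }
            if (!subsumed) result.add(clause);
        }
        return result;
    }

    /** Convenience constructor for a clause (a disjunction of literals). */
    static Set<String> clause(String... literals) {
        return new HashSet<>(Arrays.asList(literals));
    }
}
```

Applied to the clauses {a, b} and {a}, only {a} remains, matching the rule (a \/ b) /\ a => a.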
32Improvements Automata Construction
- On-the-fly automata construction
  - for terms that are conjunctions, disjunctions or rule applications, dynamically create a state machine (decision tree) for evaluation
  - map: termId -> Automata
[Figure: decision tree whose nodes test state predicates (such as x>0) and whose true/false branches lead to the result terms termID1 and termID2.]
33Improvements Reflection Removal
- Removal of use of reflection for Java expression evaluation
  - Alws(y>0 -> Prev(x>0))
  - static boolean greater(int a, int b)
  - Defined in user-defined State class
- Generate during static analysis an interface class that dispatches on term id of methodCall terms to directly call the user-supplied method
  - all arguments must be available, otherwise it must be symbolically evaluated
  - must treat substitution instances
34On the way to
- An Eagle compiler
  - static analysis of monitor specification to generate an alternating automaton with no term manipulation and no use of reflection
35Instrument Specification
- An instrument specification is a collection of rules lhs -> rhs
  - LHS: conjunction of syntactic conditions on a program point (local conditions only)
  - RHS: set of actions that log reporting events
- Each bytecode statement is examined to see if it satisfies the LHS of a rule. The actions (RHS) of all matching rules are collected and inserted into the bytecode at that point.
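The matching step can be sketched in Java. Everything here is hypothetical scaffolding: ProgramPoint, InstrumentRule and Instrumenter are illustrative names, a program point is reduced to a method name and a statement type, and actions are represented as plain strings instead of inserted bytecode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical description of one program point (one bytecode statement). */
final class ProgramPoint {
    final String method;
    final String statementType;

    ProgramPoint(String method, String statementType) {
        this.method = method;
        this.statementType = statementType;
    }
}

/** One rule: a condition on the program point (LHS) and the actions to insert (RHS). */
final class InstrumentRule {
    final Predicate<ProgramPoint> lhs;
    final List<String> rhs;

    InstrumentRule(Predicate<ProgramPoint> lhs, String... rhs) {
        this.lhs = lhs;
        this.rhs = Arrays.asList(rhs);
    }
}

final class Instrumenter {
    /** Collects the actions of every rule whose LHS matches the given program point. */
    static List<String> actionsFor(ProgramPoint point, List<InstrumentRule> rules) {
        List<String> actions = new ArrayList<>();
        for (InstrumentRule rule : rules) {
            if (rule.lhs.test(point)) {
                actions.addAll(rule.rhs);
            }
        }
        return actions;
    }
}
```

A conjunction of local conditions on the LHS maps naturally onto Predicate.and, so compound conditions need no extra machinery in this sketch.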
36Predicates
Predicate Name | Parameters | Returns true iff current statement
InMethod | String methodNameSpec | is within a method whose name matches methodNameSpec.
AtMethodStart | String methodNameSpec | is the first statement within a method whose name matches methodNameSpec.
AtFieldAccess | String fieldNameSpec, boolean onlyUpdates | accesses a field whose name matches fieldNameSpec. If onlyUpdates is true, only write accesses are matched.
AtSyncStart | (none) | enters a synchronized block, taking a lock.
AtStatementType | String stmtType | is a statement of type stmtType, which ranges over 32 different statement types, such as assign, break, return, try, throw, while, etc.
InStatementRange | int lowerBound, int upperBound | is within the range of line numbers lowerBound to upperBound.
37Reportable Events
Action Name | Parameters | Inserts code that reports
ReportMethodStart | String methodNameSpec | name of the method entered, whose name matches methodNameSpec.
ReportField | String fieldNameSpec, boolean onlyUpdates | name of the field that is accessed, whose name matches fieldNameSpec. If onlyUpdates is true, only write accesses are matched.
ReportLocal | String varNameSpec, boolean onlyUpdates | name of the local variable that is accessed, whose name matches varNameSpec. If onlyUpdates is true, only write accesses are matched.
ReportSyncStart | String classNameSpec | identity and class (name) of the object that is locked, where the class name matches classNameSpec.
38Reportable Events
Action Name | Parameters | Inserts code that reports
ReportTimeStamp | (none) | the current time.
ReportProgramPoint | (none) | the current line number.
ReportExpression | String expressionName, String imports, String expressionBody, byte expressionType, String parameters | the value of expressionBody, and an identifier expressionName. Parameters and expressionType indicate the expression's free parameters and the result type. The expression is evaluated locally.
39Using AOP for instrumentation
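[Grammar slide, only partly recoverable: formulas F are built from true, false, /\, \/, implication, the @ operator, concatenation and event patterns E; an event pattern is a method call such as Nm.Id(Nm, ...) with an optional "returns Nm"; Id is a Java identifier and a name Nm is an Id optionally followed by ?.]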
40Example Using AOP
observer BufferObs {
  max Always(Term t) = t /\ @ Always(t) .
  min Eventually(Term t) = t \/ @ Eventually(t) .
  min Previously(Term t) = t \/ ⊙ Previously(t) .
  var Buffer b
  var Object o
  monitor InOut = Always(put(b?,o?) => Eventually(get(b) returns o)) .
  monitor OutIn = Always(get(b?) returns o? => Previously(put(b,o))) .
}
41Software Fault Protection Motivation
- To obtain higher levels of assurance and quality
for safety and mission critical software.
[Figure: marginal cost to remove the next error versus residual error rate per line of code, with 10 errors/1KLOC and 3 errors/1KLOC marked.]
42Approach
- Instead of climbing the error curve, build mechanisms into the code to protect against inevitable residual software errors.
- Software Fault Protection.
43Three Levels of Fault Protection
- Caution and Warning Systems
  - Warning generated for off-nominal measurements of system state and resources
- Autonomic response
  - Circuit breaker
  - MER: low battery warning stopped the repeated reboot cycle and caused a transition into safe mode.
- Model Based Diagnosis: fault detection, isolation and repair
  - HAL reports a module will go critical in 7 days, requiring EVA
44Differences between System Engineering and Software
- Software does not wear out, but suffers from design and coding errors.
- Software has not traditionally been designed using fault containment concepts
  - E.g. on MER, hardware exceptions generated by out-of-memory faults were not explicitly dealt with.
- A natural concept of component based on physicality exists for hardware. These components have known failure modes
  - E.g. a valve may be stuck on or a pipe may leak.
45Component
- Component is
  - Organizing notion for system composition/decomposition
  - Goal is to identify a failed component and possibly reconfigure components, maximizing functional capability
- Spectrum of component choices
  - Low granularity: a statement is a component.
  - Medium granularity: a procedure/method is a component.
  - High granularity: a component in a modeling framework.
46Model
Software Fault Protection is like V&V in one important respect: consistency check between code and model
- It must be cost effective to formulate the model
- Model must be useful in identifying and isolating faults.
- Model has a delicate relationship to code
  - structural: limited behavioral code synthesis,
  - monitor behavioral properties that are beyond synthesis.
47Approach
- High granularity components
  - difficult to model, diagnose and repair at lower levels of granularity
  - addresses higher-payoff, system-of-systems and system integration issues.
- Model
  - structural: component/connector models
  - properties
    - interface behavior
    - patterns of event interactions over connectors
    - real time response
    - data validity (pre- and post-conditions)
    - resource usage: memory, object creation, disk, communication devices, quality of service, deadlock and other concurrency problems
48Software Fault Protection: Fault Detection, Isolation and Repair (FDIR)
- Detection: instrument and monitor system and identify an error state
- Isolation: identify the faulty component
- Repair: take corrective action
49Detection
- Detection: property violation
- Use Eagle to define and monitor properties
50Isolation (Diagnosis)
- Process of moving from symptom to identification or isolation of faulty component.
- Our simplistic approach
  - Associate a failed component with each property.
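This one-to-one association is simple enough to sketch as a lookup table in Java. The class name Isolation, the fallback value "unknown", and the example pairing of property names with components are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Maps each monitored property to the component blamed when that property is violated. */
final class Isolation {
    private final Map<String, String> componentOf = new HashMap<>();

    void associate(String property, String component) {
        componentOf.put(property, component);
    }

    /** Returns the component blamed for a violated property, or "unknown" if none is registered. */
    String isolate(String violatedProperty) {
        return componentOf.getOrDefault(violatedProperty, "unknown");
    }
}
```

For instance, associating a property named stayAliveQueue with a component named Queue makes a violation of that property point at the queue; properties that span several components are where this simplistic scheme stops being adequate.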
51Repair
- Do no harm
  - generate a problem report
- Micro-reboot: reset or re-initialize a component to repair an inconsistent data state.
- Reconfiguration with reduced functionality
  - Replace module with a less capable backup
  - 1.5 version programming
52Example System
[Figure: example system diagram; surviving labels: Consumer, Producer 2.]
- Every request must be responded to by the consumer within 50 ms.
- Consumer must stay alive. It does not stop processing requests.
- Queue must stay alive. It does not stop receiving and forwarding transactions.
- Any transaction processed by the consumer did indeed originate from one of the producers.
- Queue size remains bounded at size at most 50.
- Requests have priority that the Queue must respect.
53The State
class State extends EagleState {
  static String con;
  static int ident;
  static int clock;
  static void update(String event) { ... }
  static boolean B1() { return con.equals("B1"); }
  static boolean B2() { return con.equals("B2"); }
  static boolean C() { return con.equals("C"); }
  static boolean D() { return con.equals("D"); }
}
54Linear Temporal Logic
max A(Term t) = t /\ @ A(t) .
min E(Term t) = t \/ @ E(t) .
min P(Term t) = t \/ ⊙ P(t) .
min U(Term t1, Term t2) = t2 \/ (t1 /\ @ U(t1,t2)) .
max W(Term t1, Term t2) = A(t1) \/ U(t1,t2) .
55Some Basic Definitions
min B12() = B1() \/ B2() .
min B2(int id) = B2() /\ ident == id .
min C(int id) = C() /\ ident == id .
56Extensions of E and P with Data Constraints
min Et(Term t, int time) = E(t /\ clock < time) .
min Eti(Term t, int time, int id) = E(t /\ clock < time /\ ident == id) .
min Pi(Term t, int id) = P(t /\ ident == id) .
57Requirement 1
- requiredResponse
- Every message from a producer (communicated on either B1 or B2) must be responded to and acknowledged (on D) by the consumer within 50 seconds.
mon requiredResponse = A(B12() -> Eti(D(), clock + 50, ident)) .
58Requirement 2
stayAliveQueue: Whenever a message is sent to the queue from a producer (on either B1 or B2), a message (not necessarily the same) must be consumed by the consumer (on C) within 32 seconds.
mon stayAliveQueue = A(B12() -> Et(C(), clock + 32)) .
59Requirement 3
limitedSize: There are never more than 40 messages in the queue.
max CountSize(int size) =
  B12() -> (size < 40 /\ @ CountSize(size + 1)) /\
  C() -> @ CountSize(size - 1) /\
  D() -> @ CountSize(size) .
mon limitedSize = CountSize(0) .
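The data parameter of CountSize behaves like an ordinary counter, which the following Java sketch makes explicit. It assumes each event is identified by its connector name (B1, B2, C or D), that a producer event is only allowed while the size is strictly below the bound, and that the bound is passed as a parameter rather than fixed at 40. The class name QueueSizeMonitor is hypothetical.

```java
/** Counter-based check that the number of queued messages never exceeds a bound. */
final class QueueSizeMonitor {
    static boolean limitedSize(int bound, String... connectors) {
        int size = 0;                                   // argument of CountSize, starting at 0
        for (String con : connectors) {
            if (con.equals("B1") || con.equals("B2")) {
                if (!(size < bound)) return false;      // producing now would exceed the bound
                size++;
            } else if (con.equals("C")) {
                size--;                                 // the consumer removed a message
            }
            // an acknowledgement on D leaves the size unchanged
        }
        return true;                                    // max rule: holds at the end of the trace
    }
}
```

The check never stores the trace, only the current count, so its cost per event is constant.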
60Requirement 4
C_D_Alternation: The consumer should alternate between consuming messages (on C) and acknowledging messages (on D).
max S1() = (C() -> @S2()) /\ (¬C() -> @S1()) /\ ¬D() .
min S2() = (D() -> @S1()) /\ (¬D() -> @S2()) /\ ¬C() .
mon C_D_Alternation = S1() .
61Requirement 5
stayAliveConsumer: Every message consumed (on C) by the consumer is processed and acknowledged (on D) within 30 seconds.
mon stayAliveConsumer = A(C() -> Et(D(), clock + 30)) .
62Requirement 6
- noJunkConsumed
- Every message consumed by the consumer (on C) has previously been produced (on B1 or B2).
mon noJunkConsumed = A(C() -> Pi(B12(), ident)) .
63Requirement 7
- noJunkAcknowledged
- Every message processed and acknowledged by the consumer (on D) has previously been consumed by the consumer (on C).
mon noJunkAcknowledged = A(D() -> Pi(C(), ident)) .
64Requirement 8
- orderPreserved
- The queue behaves as a FIFO queue wrt. messages produced by producer 1. That is, the consumer consumes (on C) messages produced on B1 in the order in which they were produced.
max R0() = (B1() -> R1(ident,0)) /\ @R0() .
max R1(int id1,int id2) = C(id1) \/ ((B1() -> R2(id1,ident)) /\ (B1() -> @R1(id1,id2))) .
max R2(int id1,int id2) = @R2(id1,id2) .
max R2(int id1,int id2) = W(C(id2),C(id1)) .
mon orderPreserved = R0() .
65Requirement 9
- B1HasPriority
- Messages produced by producer 1 (on B1) have priority over messages produced by producer 2 (on B2). That is, as long as there are pending B1 messages in the queue, no B2 message will be consumed (on C) by the consumer.
min CForB1BeforeCForB2(int id) = U(¬(C() /\ identFromB2(ident)), C(id)) .
min identFromB2(int id) = P(B2(id)) .
mon B1HasPriority = A(B1() -> CForB1BeforeCForB2(ident)) .
66Example Implementation
- Each component is a process
- Components communicate using Java Message Service (JMS)
- Instrumentation by listening to message traffic
- FDIR component can start, reset, terminate, or replace components
67Summary
- EAGLE is a succinct but highly expressive finite trace monitoring logic. It can elegantly encode any monitoring logic we have investigated.
- EAGLE can be efficiently implemented, but users must remain aware of expensive features.
- EAGLE was demonstrated by integration within a formal test environment, showing the benefit of novel combinations of formal methods and test.
- EAGLE has a natural application in fault protection, but this is a very preliminary idea yet to be validated as useful.