Title: Generating Optimal Monitors for Extended Regular Expressions
1 Generating Optimal Monitors for Extended Regular Expressions
- Grigore Rosu
- University of Illinois at Urbana-Champaign, USA
Joint work with Koushik Sen
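2 Why (Extended) Regular Expressions?
- Ordinary programmers and software engineers understand and use regular expressions (Perl, Python, etc.)
- Safety policies are often regular patterns on sequences of states/events, e.g. $(\mathit{idle} \cdot \mathit{open} \cdot (\mathit{read} + \mathit{write})^* \cdot \mathit{close})^*$
- Complementation is needed to say what should not happen, e.g. $\neg(\mathit{any}^* \cdot \mathit{start}_1 \cdot (\neg \mathit{end}_1)^* \cdot \mathit{start}_2 \cdot \mathit{any}^*)$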
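3 Extended Regular Expressions (ERE)
- Regular expressions with complement:
  $R ::= \emptyset \mid \epsilon \mid A \mid R + R \mid R \cdot R \mid R^* \mid \neg R$
- Intersection is derived: $R \cap R' = \neg(\neg R + \neg R')$
- Language of an ERE:
  - $L(\emptyset) = \emptyset$
  - $L(\epsilon) = \{\epsilon\}$
  - $L(A) = \{A\}$
  - $L(R + R') = L(R) \cup L(R')$
  - $L(R \cdot R') = \{ww' \mid w \in L(R),\ w' \in L(R')\}$
  - $L(R^*) = (L(R))^*$
  - $L(\neg R) = \Sigma^* \setminus L(R)$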
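As a reference point, the language definitions above can be executed directly. The Python sketch below is a naive membership test that follows each clause of $L(\cdot)$ by trying every way of splitting the trace; the tuple encoding of EREs and the names `member` and `inter` are our own illustrative assumptions, not part of the talk. It needs the whole trace up front, which is exactly what an incremental monitor must avoid.

```python
from functools import lru_cache

# Hypothetical tuple encoding of EREs (not from the talk):
#   ("empty",)  ("eps",)  ("ev", a)  ("sum", R1, R2)  ("cat", R1, R2)
#   ("star", R)  ("not", R)
EMPTY, EPS = ("empty",), ("eps",)


@lru_cache(maxsize=None)
def member(r, w):
    """Decide whether the tuple of events w is in L(r), clause by clause."""
    tag = r[0]
    if tag == "empty":
        return False
    if tag == "eps":
        return w == ()
    if tag == "ev":
        return w == (r[1],)
    if tag == "sum":
        return member(r[1], w) or member(r[2], w)
    if tag == "cat":  # some split w = u v with u in L(R1) and v in L(R2)
        return any(member(r[1], w[:k]) and member(r[2], w[k:])
                   for k in range(len(w) + 1))
    if tag == "star":  # empty word, or a non-empty L(R) prefix followed by L(R*)
        return w == () or any(member(r[1], w[:k]) and member(r, w[k:])
                              for k in range(1, len(w) + 1))
    if tag == "not":  # complement with respect to all finite words
        return not member(r[1], w)
    raise ValueError(f"unknown ERE node {tag!r}")


def inter(r1, r2):
    """Intersection through the derived form not(not r1 + not r2)."""
    return ("not", ("sum", ("not", r1), ("not", r2)))
```

For example, `member(("not", ("cat", ("ev", "a"), ("ev", "b"))), ("a",))` holds because the one-event trace `a` is not the word `ab`.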
4 ERE Monitoring Problem
- Given $w \in \Sigma^*$ and $R$, is $w \in L(R)$?
- We want to decide it incrementally!
- From now on, $n$ is the length of the word/trace $w$ and $m$ is the size of the ERE $R$
  - $n$ is typically much larger than $m$ ($n \gg m$)
5 What is known (I)
- If $R$ does not contain negations, then
  - Transform $R$ into an NFA of size $O(m)$ (Aho90)
    - Solution in time $O(nm)$ and space $O(m)$
    - Improved by Myers92 (JACM): time/space $O(nm / \log n)$
  - Transform $R$ into a DFA of size $O(2^m)$ (Aho90)
    - Solution in time $O(nm)$ and space $O(2^m)$
    - Note: transitions in a DFA take time logarithmic in the number of states, i.e. $O(m)$ here
- Negations and their nesting make the membership problem highly non-trivial
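The $O(nm)$ time and $O(m)$ space bound for negation-free expressions comes from simulating the NFA on the fly: the monitor keeps only the set of currently active states. A minimal Python sketch, assuming an ε-free NFA given as a dictionary from (state, event) to successor sets (the translation from $R$ to an NFA is omitted); the class name `NfaMonitor` and its fields are illustrative.

```python
class NfaMonitor:
    """Incremental NFA simulation: memory is one set of at most m states."""

    def __init__(self, start, delta, finals):
        self.delta = delta          # dict: (state, event) -> set of successor states
        self.finals = set(finals)
        self.current = {start}      # states reachable on the prefix seen so far

    def consume(self, event):
        # One step costs time proportional to the transitions leaving `current`,
        # i.e. O(m) for automata with bounded out-degree.
        self.current = {q2 for q in self.current
                        for q2 in self.delta.get((q, event), ())}

    def accepts(self):
        # The prefix is in the language iff some active state is final.
        return bool(self.current & self.finals)
```

For instance, the two-state NFA `{(0, "a"): {0, 1}, (0, "b"): {0}}` with final state `1` recognizes $(a + b)^* \cdot a$: after the events `b`, `a` the monitor accepts, and one more `b` makes it reject again.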
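6 Problems with Negation (I)
- How to complement an NFA?
- Just complementing the set of final states is wrong!
(Figure: an NFA $A$ with $L(A) = \{ab\}$ and the automaton $\bar{A}$ obtained by complementing its final states, with $L(\bar{A}) = \{ab, a, \epsilon\}$, which still contains $ab$ and is far from the complement of $L(A)$)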
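The figure's phenomenon can be replayed in a few lines. The concrete NFA below is our own hypothetical choice that reproduces the languages on the slide: reading $a$ moves nondeterministically to two branches that both read $b$, but only one of them ends in a final state, so after flipping the final states the other branch still accepts $ab$.

```python
def nfa_accepts(start, delta, finals, word):
    """Run an epsilon-free NFA by tracking the set of reachable states."""
    current = {start}
    for event in word:
        current = {q2 for q in current for q2 in delta.get((q, event), ())}
    return bool(current & finals)


# Hypothetical NFA A with L(A) = {ab}: on "a", state 0 moves to 1 or 3,
# both branches read "b", and only state 2 is final.
STATES = {0, 1, 2, 3, 4}
DELTA = {(0, "a"): {1, 3}, (1, "b"): {2}, (3, "b"): {4}}
FINALS = {2}

# "Complement" by flipping final states: accepts exactly eps, "a" and "ab".
FLIPPED = STATES - FINALS
```

Here `nfa_accepts(0, DELTA, FLIPPED, "ab")` is true although `ab` is in $L(A)$, and `b`, which a true complement would accept, is rejected.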
7 Problems with Negation (II)
- DFAs can be complemented safely by just complementing the set of final states, but
  - NFA-to-DFA conversion implies an exponential state blowup!
  - For $k$ nested negations, $2^{2^{\cdots^{2^{m}}}}$ states ($k$ nested exponentials)
- This makes the monitoring problem non-elementarily more complex in the presence of (nested) negations
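The safe route (determinize, then flip) can be sketched with the textbook subset construction. This Python sketch assumes an ε-free NFA given as a dictionary from (state, event) to successor sets; the function names are illustrative. Each DFA state is a set of NFA states, so up to $2^m$ of them may appear, and every further nested negation forces another determinization of the result.

```python
def determinize(start, delta, finals, alphabet):
    """Subset construction; DFA states are frozensets of NFA states."""
    s0 = frozenset([start])
    states, todo, ddelta = {s0}, [s0], {}
    while todo:
        s = todo.pop()
        for a in alphabet:
            t = frozenset(q2 for q in s for q2 in delta.get((q, a), ()))
            ddelta[(s, a)] = t  # the empty set acts as a rejecting sink
            if t not in states:
                states.add(t)
                todo.append(t)
    dfinals = {s for s in states if s & finals}
    return s0, ddelta, dfinals, states


def complement(dfa):
    """Safe for a total DFA: swap final and non-final states."""
    s0, ddelta, dfinals, states = dfa
    return s0, ddelta, states - dfinals, states


def dfa_accepts(dfa, word):
    s, ddelta, dfinals, _ = dfa
    for a in word:
        s = ddelta[(s, a)]
    return s in dfinals
```

On the NFA with `{(0, "a"): {1, 3}, (1, "b"): {2}, (3, "b"): {4}}` and final state `2`, determinization yields four subset states, and the complemented DFA rejects `ab` while accepting `""`, `a`, `b`, `aa` and `abb`.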
8 Challenges and Talk Overview
- What is the lower space/time bound of the ERE monitoring problem (to process one event)?
  - $\Omega(2^{c\sqrt{m}})$ for space (RTA03)
- What is a reasonable upper bound for the ERE monitoring problem (to process one event)?
  - Rewriting algorithm in $O(2^{2^{m^2}})$ space/time (RTA03)
  - Not synchronous
- How to generate optimal monitors for ERE?
  - Optimal monitor generation by coinduction
9 Idea of an Event-Consuming Algorithm
- Consume each event as it arrives, generating a new ERE monitoring requirement
  - Use the notion of derivative
  - $R\{a\}$ is the ERE that should hold after seeing event $a$, in order for $R$ to hold now
- Algorithm $\mathcal{A}$ stores an ERE $R$, and when an event $a$ arrives it replaces $R$ by $R\{a\}$; at the end of the trace, $\mathcal{A}$ checks whether $\epsilon \in R$
- How can we generate $R\{a\}$ efficiently?
- How can we store $R\{a\}$ compactly?
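The event-consuming loop of algorithm $\mathcal{A}$ is independent of how derivatives are computed. A minimal Python sketch with the derivative and the $\epsilon \in \_$ test passed in as functions; the class name `DerivativeMonitor` and the toy requirement (a single fixed word, whose derivative simply drops a matching first event, with `None` standing for $\emptyset$) are illustrative assumptions.

```python
class DerivativeMonitor:
    """Keeps one requirement R and replaces it by R{a} on each event a."""

    def __init__(self, requirement, deriv, nullable):
        self.requirement = requirement
        self.deriv = deriv          # (requirement, event) -> requirement after the event
        self.nullable = nullable    # requirement -> does it accept the empty trace?

    def consume(self, event):
        self.requirement = self.deriv(self.requirement, event)

    def accepts(self):
        # At the end of the trace, check whether eps is in R.
        return self.nullable(self.requirement)


# Toy instance (hypothetical): the requirement is one word, a tuple of events;
# None plays the role of the empty ERE.
def word_deriv(word, a):
    if word and word[0] == a:
        return word[1:]
    return None


def word_nullable(word):
    return word == ()
```

Only the current requirement is stored between events, so the memory used is whatever the representation of $R\{a_1\}\cdots\{a_k\}$ needs, which is why compact storage of derivatives matters.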
10 ERE Syntax
- Sorts: Ere and Event; subsort Event < Ere
- Operations
  - ∅ : -> Ere
  - ε : -> Ere
  - _+_ : Ere Ere -> Ere [assoc comm id: ∅]
  - _·_ : Ere Ere -> Ere [assoc id: ε]
  - _* : Ere -> Ere
  - ¬_ : Ere -> Ere
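One way to mimic these operator attributes outside Maude is to keep terms in a canonical form, so that terms equal modulo the attributes become syntactically equal. A hypothetical Python sketch: sums are flattened and sorted (assoc, comm, identity ∅) and concatenations are flattened (assoc, identity ε). Duplicated summands are deliberately kept, since $R + R = R$ is not among the attributes declared here.

```python
# Hypothetical tuple encoding of EREs; an event a is wrapped as ("ev", a).
EMPTY, EPS = ("empty",), ("eps",)


def ev(a):
    return ("ev", a)


def alt(*rs):
    """_+_ modulo assoc, comm and identity EMPTY: a flat, sorted tuple of summands."""
    parts = []
    for r in rs:
        if r[0] == "sum":
            parts.extend(r[1])      # associativity: flatten nested sums
        elif r != EMPTY:
            parts.append(r)         # identity: drop EMPTY
    if not parts:
        return EMPTY
    if len(parts) == 1:
        return parts[0]
    return ("sum", tuple(sorted(parts, key=repr)))  # commutativity: canonical order


def cat(*rs):
    """_._ modulo assoc and identity EPS: a flat tuple of factors (order matters)."""
    parts = []
    for r in rs:
        if r[0] == "cat":
            parts.extend(r[1])      # associativity: flatten nested concatenations
        elif r != EPS:
            parts.append(r)         # identity: drop EPS
    if not parts:
        return EPS
    if len(parts) == 1:
        return parts[0]
    return ("cat", tuple(parts))


def star(r):
    return ("star", r)


def neg(r):
    return ("not", r)
```

With this representation, checking equality modulo the attributes is plain `==` on tuples.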
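11 Derivatives
- Related work: Antimirov and Mosses
- Operations
  - _{_} : Ere Event -> Ere
  - _?_:_ : Bool Ere Ere -> Ere
  - ε∈_ : Ere -> Bool
- Equations
  - $(R_1 + R_2)\{a\} = R_1\{a\} + R_2\{a\}$
  - $(R_1 \cdot R_2)\{a\} = R_1\{a\} \cdot R_2 + (\epsilon \in R_1\ ?\ R_2\{a\} : \emptyset)$
  - $(R^*)\{a\} = R\{a\} \cdot R^*$
  - $(\neg R)\{a\} = \neg(R\{a\})$
  - $\epsilon\{a\} = \emptyset$
  - $\emptyset\{a\} = \emptyset$
  - $b\{a\} = (b == a)\ ?\ \epsilon : \emptyset$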
Obvious!
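The equations translate almost line by line into a recursive function. A Python sketch under a hypothetical tuple encoding (sums modulo assoc, comm and identity; concatenations modulo assoc and identity); `deriv`, `nullable` and `matches` are our names for _{_}, ε∈_ and the resulting membership test. Without further simplification the derivative terms can grow with every event.

```python
EMPTY, EPS = ("empty",), ("eps",)


def alt(*rs):
    # _+_ modulo assoc/comm/identity EMPTY (flat, sorted tuple of summands)
    parts = []
    for r in rs:
        if r[0] == "sum":
            parts.extend(r[1])
        elif r != EMPTY:
            parts.append(r)
    if not parts:
        return EMPTY
    return parts[0] if len(parts) == 1 else ("sum", tuple(sorted(parts, key=repr)))


def cat(*rs):
    # _._ modulo assoc/identity EPS (flat tuple of factors)
    parts = []
    for r in rs:
        if r[0] == "cat":
            parts.extend(r[1])
        elif r != EPS:
            parts.append(r)
    if not parts:
        return EPS
    return parts[0] if len(parts) == 1 else ("cat", tuple(parts))


def nullable(r):
    """eps in R."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "ev"):
        return False
    if tag == "sum":
        return any(nullable(x) for x in r[1])
    if tag == "cat":
        return all(nullable(x) for x in r[1])
    return not nullable(r[1])                      # ("not", R)


def deriv(r, a):
    """R{a}, one clause per equation on the slide."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY                               # eps{a} = empty{a} = empty
    if tag == "ev":
        return EPS if r[1] == a else EMPTY         # b{a} = (b == a) ? eps : empty
    if tag == "sum":
        return alt(*(deriv(x, a) for x in r[1]))  # (R1 + R2){a} = R1{a} + R2{a}
    if tag == "cat":                               # R1 = first factor, R2 = the rest
        head, rest = r[1][0], cat(*r[1][1:])
        first = cat(deriv(head, a), rest)          # R1{a} . R2
        return alt(first, deriv(rest, a)) if nullable(head) else first
    if tag == "star":
        return cat(deriv(r[1], a), r)              # (R*){a} = R{a} . R*
    return ("not", deriv(r[1], a))                 # (not R){a} = not (R{a})


def matches(r, trace):
    """a1...an in L(R) iff eps in R{a1}...{an}."""
    for a in trace:
        r = deriv(r, a)
    return nullable(r)
```

For instance, with $A = a + b + c$ and $R = \neg(A^* \cdot a \cdot b \cdot A^*)$ ("no $b$ immediately after an $a$"), the trace `cba` is accepted and `cab` is not.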
12 Three Important Simplifying Rules
- Without any other rules, $R\{a_1\}\{a_2\}\cdots\{a_n\}$ can grow to unbounded size
- Simplifying rules
  - $\emptyset \cdot R = \emptyset$
  - $R + R = R$
  - $R_1 \cdot R + R_2 \cdot R = (R_1 + R_2) \cdot R$
- Let $\mathcal{R}$ be the rewriting system defined so far
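The effect of the rules can be observed by counting term sizes. In the Python sketch below, `make_ops(simplify)` is a hypothetical helper that returns a derivative function with or without the first two rules ($\emptyset \cdot R = \emptyset$, implemented as "any $\emptyset$ factor makes the product $\emptyset$", and $R + R = R$); the factoring rule $R_1 \cdot R + R_2 \cdot R = (R_1 + R_2) \cdot R$ is omitted for brevity. On $a^* \cdot a^*$, each event $a$ adds one more copy of $a^*$ to the unsimplified sum, while the simplified term keeps a fixed size.

```python
# Hypothetical tuple encoding of EREs; sums/concatenations hold tuples of terms.
EMPTY, EPS = ("empty",), ("eps",)


def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "ev"):
        return False
    if tag == "sum":
        return any(nullable(x) for x in r[1])
    if tag == "cat":
        return all(nullable(x) for x in r[1])
    return not nullable(r[1])  # ("not", R)


def make_ops(simplify):
    """Derivative function; simplify=True adds empty . R = empty and R + R = R."""

    def alt(*rs):
        parts = []
        for r in rs:
            if r[0] == "sum":
                parts.extend(r[1])
            elif r != EMPTY:
                parts.append(r)
        if simplify:
            parts = list(set(parts))  # R + R = R
        if not parts:
            return EMPTY
        return parts[0] if len(parts) == 1 else ("sum", tuple(sorted(parts, key=repr)))

    def cat(*rs):
        parts = []
        for r in rs:
            if simplify and r == EMPTY:
                return EMPTY  # an EMPTY factor annihilates the product
            if r[0] == "cat":
                parts.extend(r[1])
            elif r != EPS:
                parts.append(r)
        if not parts:
            return EPS
        return parts[0] if len(parts) == 1 else ("cat", tuple(parts))

    def deriv(r, a):
        tag = r[0]
        if tag in ("empty", "eps"):
            return EMPTY
        if tag == "ev":
            return EPS if r[1] == a else EMPTY
        if tag == "sum":
            return alt(*(deriv(x, a) for x in r[1]))
        if tag == "cat":
            head, rest = r[1][0], cat(*r[1][1:])
            first = cat(deriv(head, a), rest)
            return alt(first, deriv(rest, a)) if nullable(head) else first
        if tag == "star":
            return cat(deriv(r[1], a), r)
        return ("not", deriv(r[1], a))

    return deriv


def size(r):
    """Number of nodes in an ERE term."""
    if r[0] in ("sum", "cat"):
        return 1 + sum(size(x) for x in r[1])
    if r[0] in ("star", "not"):
        return 1 + size(r[1])
    return 1
```

Starting from $a^* \cdot a^*$, after 10 events $a$ the unsimplified derivative has 26 nodes and grows by 2 with every further event, while the simplified one stays at 8 nodes.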
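13 Theorems (RTA03, jointly with Viswanathan)
- $\mathcal{R}$ is terminating and ground Church-Rosser modulo AC of _+_ and A of _·_
- $L(\mathit{nf}_{AC}(R\{a\})) = \{w \mid aw \in L(R)\}$ for all EREs $R$
- $a_1 a_2 \cdots a_n \in L(R)$ iff $\epsilon \in R\{a_1\}\{a_2\}\cdots\{a_n\}$
- $R\{a_1\}\{a_2\}\cdots\{a_n\}$ requires $O(2^{2^{m^2}})$ space and $O(n \cdot 2^{2^{m^2}})$ time, where $m = |R|$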
14 Problems
- The previous algorithm is not synchronous!
  - Unless we check for emptiness after processing each event, which is very expensive
- How to generate a minimal monitor for ERE, avoiding the highly exponential state explosion?
  - Solution: circular coinduction
  - Related work by Rutten (no negation)
15 Hidden Logic: Behavioral Specification
- Behavioral specification
  - Tuple $(V, H, \Gamma, \Sigma, E)$, or simply $(\Gamma, \Sigma, E)$
  - Sorts $S = V \cup H$
    - $V$: visible sorts (stand for data: integers, reals, chars, etc.)
    - $H$: hidden sorts (stand for states, objects, black boxes, etc.)
  - Operations $\Gamma \subseteq \Sigma$
    - $\Sigma$ is an $S$-signature
    - $\Gamma$ is a subsignature of $\Sigma$ of behavioral operations
  - $E$ is a set of $\Sigma$-equations
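16 Contexts and Experiments
- $\Gamma$-context: a $\Gamma$-term with a hidden slot
- $\Gamma$-experiment: a $\Gamma$-context of visible result
(Figure: a context built from operations in $\Gamma$ around a slot $z$ of hidden sort $h$; it is a $\Gamma$-experiment if its result is visible)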
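17 Behavioral Equivalence
- Models are called hidden $\Sigma$-algebras: $A$, $A'$, ...
- Behavioral equivalence on $A$: $a \equiv a'$
  - Identity on visible carriers
  - $a \equiv_h a'$ iff $A_\gamma(a) = A_\gamma(a')$ for any $\Gamma$-experiment $\gamma$
(Figure: the same visible $\Gamma$-experiment $\gamma$ applied to $a$ and to $a'$, producing the values $A_\gamma(a)$ and $A_\gamma(a')$)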
18 Behavioral Satisfaction
- $e$ a $\Sigma$-equation $(\forall X)\ t =_h t'$, $A$ a hidden $\Sigma$-algebra
- $A$ behaviorally satisfies $e$, written $A \models_\Gamma (\forall X)\ t =_h t'$,
  iff $\theta(t) \equiv_h \theta(t')$ for any map $\theta : X \to A$
19 Proving Behavioral Equivalence
- Behavioral satisfaction is known to be $\Pi^0_2$-complete
  - No way to automatically prove all truths
  - No way to automatically disprove all falsities
  - Hidden logics are incomplete
- Coinduction and context induction are very strong
  - Both require human support
- Circular coinduction is an automatic procedure
  - Tuned and tested on hundreds of examples: streams, protocols (ABP), Peterson's mutual exclusion, etc.
  - Supported by BOBJ, prototyped in Maude
20 Circular Coinduction in a Nutshell
- Derive the original proof goal until ending up in circles
  - Modulo substitutions, special contexts and equational reasoning
- Explanation (1): all possibilities to distinguish the two sides are exhaustively explored
- Explanation (2): any experiment can be consumed bottom-up, ending in a visible node
- Explanation (3): a congruent binary relation $R$ is built, but behavioral equivalence is the largest such relation!
- Explanation (4): context induction; the nodes above form the induction hypothesis
- Moreover, all the behavioral equalities in the proof graph are true (lemma discovery)!
(Figure: proof graph whose derived goals close in circles)
21 Behavioral Specification of EREs
- $\mathcal{B} = (V, H, \Gamma, \Sigma, E)$ where
  - $V$ contains Event and Bool
  - $H$ contains Ere
  - $\Sigma$ contains ∅, ε, _+_, _·_, _*, ¬_
  - $E$ contains all equations defined before
  - $\Gamma$ contains
    - ε∈_ : Ere -> Bool
    - _{_} : Ere Event -> Ere
- Theorem: $\mathcal{B}$ behaviorally satisfies $R = R'$ iff $L(R) = L(R')$
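22 $(a + b)^* = (a^* \cdot b^*)^*$
Moreover, all the equivalences in the proof graph below are true!
Theorem: Circular coinduction is a decision procedure for ERE language equality
(Figure: proof graph rooted at $(a + b)^* = (a^* \cdot b^*)^*$, with derived goals $(a + b)^* = a^* \cdot b^* \cdot (a^* \cdot b^*)^*$ and $(a + b)^* = b^* \cdot (a^* \cdot b^*)^*$; every visible check yields $\mathit{true} = \mathit{true}$)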
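A stripped-down version of the coinductive check for EREs fits in a few lines. The Python sketch below treats $\epsilon \in \_$ as the only visible experiment and the derivatives $\_\{a\}$ as the behavioral operations: it derives the goal $R = R'$ for every event, fails as soon as a visible check differs, and stops on goals already seen (the circularities). It is an illustrative assumption, not the Maude implementation: goals are compared syntactically modulo ACI of $+$, associativity and identity of $\cdot$, and $\emptyset \cdot R = \emptyset$, and termination relies on the finiteness of derivatives modulo these identities, as in Brzozowski's construction.

```python
# Hypothetical tuple encoding: ("empty",) ("eps",) ("ev", a) ("sum", frozenset)
# ("cat", tuple) ("star", R) ("not", R).
EMPTY, EPS = ("empty",), ("eps",)


def alt(*rs):
    """_+_ modulo assoc, comm, idempotence (R + R = R) and identity EMPTY."""
    parts = set()
    for r in rs:
        if r[0] == "sum":
            parts |= r[1]
        elif r != EMPTY:
            parts.add(r)
    if not parts:
        return EMPTY
    return next(iter(parts)) if len(parts) == 1 else ("sum", frozenset(parts))


def cat(*rs):
    """_._ modulo assoc and identity EPS; an EMPTY factor makes the product EMPTY."""
    parts = []
    for r in rs:
        if r == EMPTY:
            return EMPTY
        if r[0] == "cat":
            parts.extend(r[1])
        elif r != EPS:
            parts.append(r)
    if not parts:
        return EPS
    return parts[0] if len(parts) == 1 else ("cat", tuple(parts))


def nullable(r):
    """The visible experiment eps in R."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "ev"):
        return False
    if tag == "sum":
        return any(nullable(x) for x in r[1])
    if tag == "cat":
        return all(nullable(x) for x in r[1])
    return not nullable(r[1])


def deriv(r, a):
    """The derivative R{a}, following the equations for _{_}."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "ev":
        return EPS if r[1] == a else EMPTY
    if tag == "sum":
        return alt(*(deriv(x, a) for x in r[1]))
    if tag == "cat":
        head, rest = r[1][0], cat(*r[1][1:])
        first = cat(deriv(head, a), rest)
        return alt(first, deriv(rest, a)) if nullable(head) else first
    if tag == "star":
        return cat(deriv(r[1], a), r)
    return ("not", deriv(r[1], a))


def equivalent(r1, r2, alphabet):
    """Decide L(r1) = L(r2) by deriving the goal until only circularities remain."""
    goals, seen = [(r1, r2)], set()
    while goals:
        goal = goals.pop()
        if goal in seen:
            continue  # circularity: this goal is already being proved
        left, right = goal
        if nullable(left) != nullable(right):
            return False  # the experiment eps in _ tells the two apart
        seen.add(goal)
        goals.extend((deriv(left, a), deriv(right, a)) for a in alphabet)
    return True
```

On the slide's goal $(a + b)^* = (a^* \cdot b^*)^*$ over the events $a$ and $b$, this explores exactly three distinct goals, the three appearing in the proof graph, whereas $a^* = a \cdot a^*$ is refuted at once by the visible check on $\epsilon$.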
23 Generating Minimal DFAs for EREs
(Figure: DFA whose states are derivatives of $R$, such as $R$, $R\{a\}$ and $R\{b\}$, connected by $a$- and $b$-transitions)
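A DFA can be read off the same kind of procedure: its states are derivatives, and a new derivative is merged into an existing state whenever the two are proven language-equivalent, so all reachable states end up pairwise inequivalent and the DFA is minimal. A hypothetical Python sketch that includes its own copy of the derivative helpers so it runs on its own; the quadratic number of equivalence checks is chosen for clarity, not efficiency, and this is not the Maude tool behind the web server.

```python
# Hypothetical tuple encoding: ("empty",) ("eps",) ("ev", a) ("sum", frozenset)
# ("cat", tuple) ("star", R) ("not", R).
EMPTY, EPS = ("empty",), ("eps",)


def alt(*rs):  # _+_ modulo ACI, identity EMPTY
    parts = set()
    for r in rs:
        if r[0] == "sum":
            parts |= r[1]
        elif r != EMPTY:
            parts.add(r)
    if not parts:
        return EMPTY
    return next(iter(parts)) if len(parts) == 1 else ("sum", frozenset(parts))


def cat(*rs):  # _._ modulo assoc, identity EPS; EMPTY annihilates
    parts = []
    for r in rs:
        if r == EMPTY:
            return EMPTY
        if r[0] == "cat":
            parts.extend(r[1])
        elif r != EPS:
            parts.append(r)
    if not parts:
        return EPS
    return parts[0] if len(parts) == 1 else ("cat", tuple(parts))


def nullable(r):  # eps in R
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "ev"):
        return False
    if tag == "sum":
        return any(nullable(x) for x in r[1])
    if tag == "cat":
        return all(nullable(x) for x in r[1])
    return not nullable(r[1])


def deriv(r, a):  # R{a}
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "ev":
        return EPS if r[1] == a else EMPTY
    if tag == "sum":
        return alt(*(deriv(x, a) for x in r[1]))
    if tag == "cat":
        head, rest = r[1][0], cat(*r[1][1:])
        first = cat(deriv(head, a), rest)
        return alt(first, deriv(rest, a)) if nullable(head) else first
    if tag == "star":
        return cat(deriv(r[1], a), r)
    return ("not", deriv(r[1], a))


def equivalent(r1, r2, alphabet):  # coinductive check of L(r1) = L(r2)
    goals, seen = [(r1, r2)], set()
    while goals:
        goal = goals.pop()
        if goal in seen:
            continue
        if nullable(goal[0]) != nullable(goal[1]):
            return False
        seen.add(goal)
        goals.extend((deriv(goal[0], a), deriv(goal[1], a)) for a in alphabet)
    return True


def minimal_dfa(r, alphabet):
    """States are derivatives of r; equivalent derivatives share one state."""
    states, delta, todo = [r], {}, [0]
    while todo:
        i = todo.pop()
        for a in alphabet:
            d = deriv(states[i], a)
            # reuse the first existing state proven equivalent to d, if any
            j = next((k for k, s in enumerate(states) if equivalent(s, d, alphabet)), None)
            if j is None:
                states.append(d)
                j = len(states) - 1
                todo.append(j)
            delta[(i, a)] = j
    finals = {i for i, s in enumerate(states) if nullable(s)}
    return states, delta, finals
```

For $(a + b)^* \cdot a$ over $\{a, b\}$ this yields the expected two-state DFA, and negating the ERE only swaps which of the two states is final.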
24 Implementation
- BOBJ cannot be used because it does not return the set of circularities
- Implemented a specialized circular coinduction algorithm in Maude
- Web server at http://fsl.cs.uiuc.edu
  - A Perl CGI script which calls Maude
  - Generates JPEG, PS, and DOT versions of the DFA
25 Conclusion and Future Work
- Exponential complexity in monitoring is unavoidable when negation is added to regular expressions (EREs)
- A few rewriting rules provide the best trace membership algorithm known for EREs
  - Not synchronous
- Generation of minimal DFAs for EREs by circular coinduction (CC)
  - To be part of PathExplorer at NASA Ames?
- What is the complexity of the coinductive algorithm?