Title: Model-based Monitoring, Diagnosis and Recovery
1. Model-based Monitoring, Diagnosis and Recovery
- James Kurien
- Autonomy and Robotics Area
- NASA Ames Research Center
- Thesis Committee
  - Pandu Nayak, PurpleYogi, USRA/RIACS
  - Tom Dean, Brown University
  - Leslie Kaelbling, MIT
- Also includes joint work with Dave Smith, NASA ARC
2. An Observation on Machine Evolution
[Figure: images labeled 1965 and 1995, and 1954 and 1995]
- Basic principles are often 50 or 100 years old
- The basic mechanical design often persists for decades
- Yet efficiency, robustness, reliability radically improve
- One difference: on-board sensing, computation and actuation
  - computer controlled, closed loop electronic fuel injection
  - anti-lock brakes, traction control, air bags
3. NASA Research Challenges
- Every machine is unique
- Some machines must survive years without repair
- Failure and redundancy complicate control
- Relatively short down time can destroy a mission
- Development and operation costs must be contained
- Challenge: Easily developed, highly capable control systems
4. Problem Statement
- Given
  - A model of a physical system such as a spacecraft
  - The internal actions taken and observations received thus far
- Determine
  - The most likely internal states of the system
  - The commands needed to move to a desirable state
[Figure: block diagram with labels Configuration Goals, Model, Action Selection, State Estimate, Command, Observations]
5. Typical Domain
- Engineers model the local, qualitative behavior of system components
- Components are things like valves, switches, tanks, engines
- Properties of interest are transmission of flow, voltage, etc.
- Goals are to produce acceleration, maintain pointing ability, etc.
[Figure: Spacecraft Propulsion System Model]
6. Assumptions
- The outcome of actions is non-deterministic
- The world is only partially observable
- Time is discrete
- Finite number of states and actions
- That is, Partially Observable Markov Decision Processes (POMDPs)
7. Belief State
- A belief state is a probability distribution over the possible states of the system
- B(s) is the probability of being in state s given a stream of commands and observations
- For a POMDP, there is a simple procedure for updating B(s) given a command, an observation, and the previous B(s) for all s
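The update procedure mentioned above can be sketched as follows. This is a minimal illustration of the standard POMDP belief update (predict with the command, weight by the observation likelihood, normalize), not the implementation used in this work. The single-valve model, its state names, and the probabilities 0.99 and 0.01 are assumptions chosen only for the example.

```python
def update_belief(belief, command, observation, transition, observe):
    """One belief update: predict with the command, weight by the observation, normalize.

    belief:      dict mapping state -> probability B(s)
    transition:  function (state, command) -> dict of next_state -> probability
    observe:     function (state, observation) -> probability of that observation
    """
    predicted = {}
    for s, p in belief.items():
        for s_next, p_t in transition(s, command).items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * p_t
    weighted = {s: p * observe(s, observation) for s, p in predicted.items()}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in weighted.items() if p > 0.0}


# Hypothetical single-valve model: the valve opens on command but may stick shut.
def valve_transition(state, command):
    if state == "closed" and command == "open":
        return {"open": 0.99, "stuck": 0.01}  # illustrative probabilities
    return {state: 1.0}


def valve_observe(state, observation):
    expected = "high" if state == "open" else "zero"
    return 1.0 if observation == expected else 0.0


belief_no_flow = update_belief({"closed": 1.0}, "open", "zero",
                               valve_transition, valve_observe)
belief_flow = update_belief({"closed": 1.0}, "open", "high",
                            valve_transition, valve_observe)
```

In this toy model, commanding the valve open and observing no flow leaves all probability mass on the stuck state. The loop touches every state with non-zero belief, which is the cost the following slides try to avoid.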
8. The Problem with Belief States
Robot Navigation Model
- Few variables
- Typically 10^4 states
- Smooth
Spacecraft Propulsion System Model
- Hundreds of variables
- Typically 10^150 states
- Not smooth
Can we compute B(s) for a subset of the state space?
9. Valve Driver Example
[Figure: schematic with labels Valve Driver, command, Pump, Valve1, Valve2, Flowv1, Flowv2]
- Valve Driver (VDU) sends commands to valves
- Pump pressurizes the valves
- Flow measured at each valve
- VDU may hang, valves may stick shut
10. Track the Most Likely State?
[Figure: plot of Probability against Time]
- Start with a known state
- Track most likely outcomes given current observations
- Was very fast when used on Deep Space 1
- The most likely state (MLS) may become arbitrarily unlikely or inconsistent
11. The Whole Idea
Incremental Belief State Generation
[Figure: plot of Probability against time, annotated "Observe no flow."]
- Avoid committing to a small number of trajectories
- Build a structure that compactly represents all evolutions
- Generate additional trajectories in likelihood order as needed or as time allows
12. Basic Approach
13. Compositional, Consistency-Based POMDP
- Transitions specified by non-deterministic automata
- Observations specified by propositional logic model
- Computational Leverage
  - Assume transition probabilities of automata are independent
  - States that are logically inconsistent with observations have zero probability
14. Encoding the Valve Automaton
[Figure: valve automaton with states open (Flow depends on Pressure), closed (Flow=zero), and stuck (Flow=zero); transitions labeled Cmdin=open and Cmdin=close]
- State constraints are quite easy to represent propositionally
  - Valve=closed ⇒ Flow=zero
- How can we represent stochastic transitions?
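One way to picture the encoding is sketched below, under stated assumptions. The state constraint becomes a predicate over one time step, and the stochastic choice is factored out into a separate transition variable (called `tau` here) with its own prior, so the remaining transition function is deterministic. The names `TAU_PRIOR`, `next_mode`, and `flow_consistent`, the probabilities, and the reading of the open-state constraint as "flow equals the qualitative pressure value" are all illustrative assumptions, not taken from the slides.

```python
# Hypothetical encoding of the valve automaton; names and numbers are illustrative.
# tau is a transition variable: it selects which outcome the valve takes this step.
TAU_PRIOR = {"nom": 0.99, "stick": 0.01}


def next_mode(mode, cmd_in, tau):
    """Deterministic successor once the transition variable tau is fixed."""
    if mode == "stuck" or tau == "stick":
        return "stuck"  # this sketch has no recovery from stuck
    if cmd_in == "open":
        return "open"
    if cmd_in == "close":
        return "closed"
    return mode  # no command: the valve keeps its mode


def flow_consistent(mode, flow, pressure):
    """State constraint: closed or stuck implies zero flow.

    For the open mode we assume flow equals the qualitative pressure value.
    """
    if mode in ("closed", "stuck"):
        return flow == "zero"
    return flow == pressure
```

With this factoring, a trajectory is fully described by the sequence of values chosen for the transition variables, and its prior depends only on those choices.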
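15. Building a One Step Model
[Figure: propositional constraint system linking Time t to Time t+1. Variables shown with example values: vdu (off), cmdin (open), cmdout (none), v1 (closed), Flowv1 (zero), v2 (closed), Flowv2 (zero); the transition variables tvdu, tv1, tv2 are marked ???]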
16. Trajectory Tracking
[Figure: the one-step model unrolled across time steps t0, t1, t2, showing values of vdu, cmdin, cmdout, v1, Flowv1, v2, Flowv2 at each step, with the transition variables tvdu, tv1, tv2 set to nom]
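- The assignments to the transition variables t (tvdu, tv1, tv2) capture every possible trajectory
- Trajectories can be enumerated in prior probability order
  - P(trajectory) = Π P(t assignment), a product over the transition variables, which are assumed independent
- Each trajectory can be checked for agreement with observations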
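Enumeration in prior probability order can be done lazily, without listing every assignment first. The sketch below is one standard way to do it (best-first search over per-variable rankings with a heap); it assumes the transition variables are independent so that a joint prior is a product, and it is not claimed to be the procedure used in this work. The variable names and probabilities in `PRIORS` are made up for illustration.

```python
import heapq
from math import prod


def assignments_by_prior(priors):
    """Lazily yield (probability, assignment) pairs, most likely first.

    priors maps each transition variable to a dict {value: probability}.
    Variables are assumed independent, so a joint prior is a product.
    """
    names = list(priors)
    # Rank each variable's values from most to least likely.
    ranked = [sorted(priors[n].items(), key=lambda kv: -kv[1]) for n in names]

    def prob(idx):
        return prod(ranked[i][j][1] for i, j in enumerate(idx))

    start = (0,) * len(names)  # every variable at its most likely value
    heap = [(-prob(start), start)]
    seen = {start}
    while heap:
        neg_p, idx = heapq.heappop(heap)
        yield -neg_p, {n: ranked[i][j][0] for i, (n, j) in enumerate(zip(names, idx))}
        # Successors: demote exactly one variable to its next most likely value.
        for i in range(len(idx)):
            if idx[i] + 1 < len(ranked[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-prob(nxt), nxt))


# Illustrative priors for one time step of the valve driver example.
PRIORS = {
    "tvdu": {"nom": 0.99, "hang": 0.01},
    "tv1": {"nom": 0.98, "stick": 0.02},
    "tv2": {"nom": 0.97, "stick": 0.03},
}
ordered = list(assignments_by_prior(PRIORS))
```

Because demoting a variable never raises the product, the heap always holds the next most likely unvisited assignment, so a caller can stop after the first few candidates.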
17. Naive Tracking Solution
- To track the n most likely trajectories:
  - For each time step t = 0, 1, 2, ...
    - Create full copy of the model for the new time step t
    - Assign command and observation variables
    - Loop over assignments to the transition variables t in order of prior probability
      - Install t assignment into model and predict observations
      - Check for consistency with actual observations
      - Throw assignment out if inconsistent
      - If n consistent trajectories found, exit loop
    - Report the t-length assignments
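The generate-and-test loop above can be sketched on a toy version of the valve driver example. Everything concrete here is an assumption for illustration: the state layout `(vdu, v1, v2)`, the rule that a hung VDU passes no command, the always-on pump, and the probabilities in `TAU_PRIOR`. The function `track` performs the work of one pass of the outer loop for the commands and observations seen so far.

```python
from itertools import product
from math import prod

# Illustrative priors for the transition variables at each time step.
TAU_PRIOR = {
    "tvdu": {"nom": 0.99, "hang": 0.01},
    "tv1": {"nom": 0.98, "stick": 0.02},
    "tv2": {"nom": 0.98, "stick": 0.02},
}
NAMES = ("tvdu", "tv1", "tv2")


def step(state, cmd, tau):
    """Advance (vdu, v1, v2) one step given a command and a transition assignment."""
    vdu, v1, v2 = state
    tvdu, tv1, tv2 = tau
    vdu = "hung" if vdu == "hung" or tvdu == "hang" else "on"
    cmd_out = cmd if vdu == "on" else None  # assumption: a hung VDU passes nothing

    def valve(v, t):
        if v == "stuck" or t == "stick":
            return "stuck"
        return {"open": "open", "close": "closed"}.get(cmd_out, v)

    return (vdu, valve(v1, tv1), valve(v2, tv2))


def flows(state):
    """Predicted flow sensors; the pump is assumed to be pressurizing both valves."""
    return tuple("high" if v == "open" else "zero" for v in state[1:])


def track(initial, commands, observations, n):
    """Return up to n consistent trajectories, most likely first (generate and test)."""
    one_step = [(tuple(v for v, _ in combo), prod(p for _, p in combo))
                for combo in product(*(TAU_PRIOR[k].items() for k in NAMES))]
    # Exponential in the number of steps: every full-length assignment is listed.
    candidates = [(prod(p for _, p in seq), [tau for tau, _ in seq])
                  for seq in product(one_step, repeat=len(commands))]
    candidates.sort(key=lambda c: -c[0])
    found = []
    for p, taus in candidates:
        state, consistent = initial, True
        for cmd, obs, tau in zip(commands, observations, taus):
            state = step(state, cmd, tau)
            if flows(state) != obs:
                consistent = False  # throw the assignment out
                break
        if consistent:
            found.append((p, taus))
            if len(found) == n:
                break
    return found


best = track(("on", "closed", "closed"), ["open"], [("zero", "zero")], n=2)
```

In this toy run, commanding both valves open and seeing no flow at either sensor ranks a hung VDU first and two simultaneously stuck valves second. The candidate list has 8^T entries for T steps, which is the time problem the next slide names.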
18. An algorithm with only 2 problems!
- Problem 1: Space
  - Representation grows linearly with t
  - Source of the problem is the algorithm step
    - Create full copy of the model for the new time step t
  - Approach: Develop an approximate representation
- Problem 2: Time
  - Search space grows exponentially with t
  - Checks an exponential number of obviously wrong candidates
  - Source of the problem is the algorithm steps
    - Loop over assignments to t in order of prior probability
    - Check for consistency with actual observations
  - Approach: Conflict-based algorithm rather than generate and test
19. Approximate Representations
20. Minimizing Each Time Step
- Need we distinguish a failure at time t from one at t-1?
- State at t2 is independent of whether V1 stuck at t1 or t0
- State of V1 at t3 depends upon whether VDU hung at t2 or t3
- Intuition: command transmission allows future observations to witness the exact time of a past failure
21. Minimizing Each Time Step
- Static analysis tells us when to avoid introducing variables
- This is proven to be a conservative approximation
- We are dropping observations
- Does not discard any consistent trajectories but allows some imposters
- Imposter trajectories should be knocked out by future observation stream
- Future work: Add back in observations at critical time points
- Future work: Discard observations for search, use them in a fast soundness check
22. History Truncation
[Figure: timeline with labels t7 and t3]
- Many assignments to old t remain untried after many observations
- They represent an exponential number of increasingly unlikely trajectories
- We commit to the n initial trajectories that look likely thus far
23. Complete Representation
[Figure: the full representation along a Time axis ending at Present, with regions labeled Complete Model, Conservative Approximation, and Gross Approximation; rows of values for the Variables vdu, tvdu, cmdin, cmdout, v1, tv1, Flowv1, v2, tv2, Flowv2]
24. In-Situ Propellant Production
25. ISPP Model Growth
26. Circuit Breaker (CB) Cascade
- 15 Circuit Breakers, 8 Lights, 1 Power Source
- Familiar domain, but with many ambiguous diagnoses
27. Circuit Breaker Model Growth
28. Search Algorithms
29. Desired Search Properties
- Assume all failures are equally likely. The algorithm uses infinitesimal probabilities, but equal likelihood is easier to explain.
- Let n be the minimal number of failures in a consistent trajectory
- Track all consistent trajectories with n failures
- Little computation as long as an n-failure trajectory is consistent
- If no trajectories with n failures remain, track all with n+1 failures
- Focused search that avoids obviously wrong trajectories
30. Focusing on Consistent Trajectories
- Suppose we consider the all-nominal trajectory when Flowv1 = Flowv2 = zero
- We discover the following is inconsistent with the model
  - Flowv1=zero ∧ tvdu,0=nom ∧ tv1,1=nom ∧ tv1,0=nom
- No consistent trajectory contains tvdu,0=nom, tv1,1=nom, tv1,0=nom
- {tvdu,0=nom, tv1,1=nom, tv1,0=nom} is a nogood or conflict
31. Conflict Coverage
- Suppose we start with the following nogoods
  - {tvdu,0=nom, tv1,1=nom, tv1,0=nom}
  - {tvdu,0=nom, tv2,1=nom, tv2,0=nom}
- Assign n of the t to failure values, covering all nogoods
  - NP-hard hitting set problem
  - Equivalent to finding all n-failure trajectories
- At each time t, extend trajectories without adding failures
  - Check trajectories for consistency (generating more nogoods)
  - If any n-failure trajectories remain consistent, keep tracking them
  - If all n-failure trajectories become inconsistent, we know we need at least an (n+1)-failure trajectory
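The covering step can be illustrated with a brute-force hitting-set search. This is only a sketch of the problem being solved, assuming each nogood is given as a set of transition variables that cannot all be nominal; it tries every subset of size 0, 1, 2, ... and is exponential, unlike the focused conflict coverage search described here. The function name `minimal_coverings` and the string labels are hypothetical.

```python
from itertools import combinations


def minimal_coverings(nogoods):
    """Return (n, coverings): the smallest n and every n-sized set of transition
    variables that intersects (hits) each nogood. Returns None if some nogood
    is empty and therefore cannot be hit.
    """
    variables = sorted(set().union(*nogoods)) if nogoods else []
    for n in range(len(variables) + 1):
        hits = [set(c) for c in combinations(variables, n)
                if all(set(c) & ng for ng in nogoods)]
        if hits:
            return n, hits
    return None


# The two nogoods from the slide: each lists variables that cannot all be nominal.
NOGOODS = [
    {"tvdu,0", "tv1,1", "tv1,0"},
    {"tvdu,0", "tv2,1", "tv2,0"},
]
n_failures, candidates = minimal_coverings(NOGOODS)

# A further, hypothetical nogood that rules out the single-failure explanation.
more = NOGOODS + [{"tv1,1", "tv1,0", "tv2,1", "tv2,0"}]
n_failures_more, candidates_more = minimal_coverings(more)
```

For the two nogoods on the slide, the only single-variable cover is the VDU transition at time 0, matching the intuition that one VDU failure explains both missing flows. Adding the hypothetical third nogood forces the search up to two failures.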
32. Initial Experiments
- Conflict coverage run on ISPP and circuit breaker models
  - ISPP runs were 30 steps
  - Circuit breaker runs were 620 steps
  - Multiple runs with multiple failure injections per run
- Representation parameters
  - Full model for 1 time step
  - No history truncation
  - Conservative approximation portion allowed to grow unboundedly
33. Single Diagnosis Results
- History has little performance impact for a single diagnosis
- Rank is low, conflict set is small. Hitting set problem is easy.
- Nogood length does not seem to have a big impact in this case
[Figure: CPU time used per time step, for 620 step CB simulation with failure on step 519. Apologies for this graph.]
34. Failure Independence
- Time to diagnose a failure at time t is often independent of any failures diagnosed at previous time steps
- Horrible ISPP valve failure followed by a simple heater failure
35. Failure Interaction
- Time to diagnose a failure at time t can be increased by failures diagnosed at previous time steps
- Determining factor seems fair
- May be mitigated by history cutoff, simple algorithm changes
- A simple, repairable circuit breaker failure, occurring 40 times.
36. Failure Independence
- Imagine diagnosing horrible valve failure (V), then encountering the heater failure (H)
[Figure: nogood set]
0. Rule out nominal trajectory, introducing nogoods
1. Generate hitting set, yielding V candidates
2. Check consistency of V candidates, introducing more nogoods
3. Occurrence of H rules out V trajectories, generating more nogoods
4. Generate hitting set, yielding (V+H) candidates
- Checking consistency of V hitting set introduces more nogoods
- These nogoods memoize the previous hitting set consistency computation
- Recomputing hitting set for V takes near-zero time, returns no inconsistent candidates
- Performance suggests hitting set for H is solved independently
37. Action Selection
38. The Safe Planning Problem
- Desire: Respect uncertainty when acting
[Figure: System Evolution over time]
- A plan is conformant if all states reach the goal
- Safe planning generalizes conformant planning
- A plan is safe if some states reach the goal, and no safety constraints are violated regardless
- If we don't reach the goal, we safely get information
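The two definitions can be made concrete with a small checker. This is a sketch under simplifying assumptions that are not in the slides: actions are deterministic once the start state is fixed, and the belief state is an explicit list of possible start states. The names `run`, `classify`, and the valve example are hypothetical.

```python
def run(plan, state, step):
    """Execute a plan from one possible start state and return every visited state."""
    trace = [state]
    for action in plan:
        state = step(state, action)
        trace.append(state)
    return trace


def classify(plan, start_states, step, is_goal, is_safe):
    """Apply the slide's definitions to a plan.

    conformant: every possible start state ends in the goal.
    safe:       at least one start state ends in the goal, and no visited
                state violates a safety constraint, whichever start state holds.
    """
    traces = [run(plan, s, step) for s in start_states]
    reached = [is_goal(t[-1]) for t in traces]
    never_unsafe = all(is_safe(s) for t in traces for s in t)
    return {"conformant": all(reached), "safe": any(reached) and never_unsafe}


# Hypothetical example: we believe a valve is either closed or stuck shut.
def valve_step(mode, action):
    if mode == "stuck":
        return "stuck"
    return {"open": "open", "close": "closed"}.get(action, mode)


result = classify(["open"], ["closed", "stuck"], valve_step,
                  is_goal=lambda m: m == "open",
                  is_safe=lambda m: True)
```

Here the plan is safe but not conformant: it reaches the goal if the valve was merely closed, and if it was stuck the lack of flow tells us so without violating any constraint.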
39. Conformant Techniques
[Figure: Start state 1, Start state 2, Start state 3, Start state 4]
- Start states represented as explicit disjunction or BDD
  - Conformant graphplan (Smith & Weld, AAAI98)
  - Conformant planning via symbolic model checking (Cimatti & Roveri, ECP99)
- Planner considers the result of an action in all states
- Can we use our bag of tricks to do better?
40. Conflict-based Repair
BlackBox (Kautz & Selman, IJCAI 99)
BlackBox is a fast planner for generating plans from a known initial state
[Figure: pipeline with labels Domain Action Model, Graphplan/SAT translator, SAT Solver, Plan]
41. Summary
42. Progress in State Identification
- Formulation of the trajectory tracking problem
- Conditions that equate trajectory tracking and state tracking
- Identification as specialization of POMDP
- Identification as a generalization of model-based diagnosis
- Formulation of the transition system representation
- Development of scalable approximations
- Conservative (completeness) proof
- Development of conflict coverage search
- Soundness check approximations (in progress)
- Implementation and experimental results
43. Progress in Action Selection
- Formulation of the safe planning problem
- Conflict-based approach to safe or conformant planning
- Safe planning algorithms (in progress)
44. Proposed Future Work
- State Identification
  - Soundness check on diagnoses generated from approximation
  - Re-introduction of most relevant observation variables
  - Finite horizon experiments
  - Interleave coverage generation and consistency checking
- Action Selection
  - Algorithms for safe planning
  - First approach will be mapping to SATPLAN
  - Algorithm implementations
  - Experimental results in safe and conformant planning
45. Related Work
- State identification
  - Livingstone (Williams & Nayak, AAAI96)
  - Belief state approximation (Boyen & Koller, UAI98)
  - Conflict-directed search (de Kleer & Williams, AIJ Vol 32)
- Planning
  - Conformant graphplan (Smith & Weld, AAAI98)
  - Conformant planning via symbolic model checking (Cimatti & Roveri, ECP99)
  - Blackbox (Kautz & Selman, IJCAI 99)
- General
  - Infinitesimals (Goldszmidt & Pearl, KR92)
  - POMDP (Sondik 1971, PhD; Cheng 1988, PhD; Littman, Cassandra, Kaelbling, ML95)
46. Acknowledgements
- U.S. taxpayers fund this work
- NASA supports this line of research and provides flight opportunities
- Brian Williams & Pandu Nayak developed Livingstone (AAAI 96)
- My committee, Dave Smith, Daniel J. Clancy, Shirley Pepke and anonymous reviewers provided valuable input