Title: Structured Models for Decision Making
1Structured Models for Decision Making
- Daphne Koller
- Stanford University
- koller_at_cs.Stanford.edu
MURI Program on Decision Making under Uncertainty, July 18, 2000
2Roadmap
- Static: Bayes Nets, PRMs (encapsulation, reuse)
- Dynamic: DBNs, Dynamic PRMs (encapsulation, approximation)
- Decision problem: Factored MDPs, Relational MDPs (factored policy iteration, efficient PRM inference)
3Outline
- Probabilistic Relational Models
- Representing complex domains
- Structural uncertainty
- Temporal models
- Decision making
4Basic units of knowledge
entities, properties, relations
attributes
5So what?
- Set of entities and relations between them is determined at BN design time
- structure must be known in advance
- hard to adapt to changes
- BNs for complex domains are large and unstructured
- hence very hard to build
- No ability to generalize
- across similar individuals
- across related situations
6Probabilistic Relational Models
- Combine advantages of predicate logic and BNs
- natural domain modeling: objects, properties, relations
- generalization over a variety of situations
- compact, natural probability models
- Integrate uncertainty with relational model
- properties of domain entities can depend on properties of related entities
- uncertainty over relational structure of domain
7Real-World Case Study
Battlefield situation assessment for missile units
- several locations
- many units
- each has detailed model
- Example object classes
- Battalion
- Battery
- Vehicle
- Location
- Weather.
- Example relations
- At-Location
- Has-Weather
- Sub-battery/In-battalion
- Sub-vehicle/In-battery
8Scud Battery: Simplified PRM
[Figure: simplified PRM for a Scud battery; labels include Under Fire, Launcher, (Launcher.status ok), and Next Mission]
9SCUD Battery Model
10Cargo Vehicle Group
11Original BN: SCUD Battery
- Disadvantages
- A lot more complex
- must include relevant attributes of related objects
- Hard to transfer information between different BN models
Built by IET, Inc.
12Situation Models
- Complex situations can be described compactly by specifying objects and relations between them
- Class model is instantiated for each object, with probabilistic dependencies induced by relations
13Example reasoning pattern
[Figure: example reasoning pattern over objects Scud-Battalion-Charlie, Battery1, Group-TLs, TL1, TL2, and Loc; attributes shown include under_fire, hit, damaged, hide-support, rep_damaged, and reported_damaged; values shown include heavy, good, and none, with probabilities 0.06, 0.44, 0.28, and 0.33]
14Inference in PRMs
PRM + situation description induces a BN over attributes
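As a rough illustration of how a class-level model plus a situation description induces a ground BN, here is a minimal Python sketch. The class names, attribute names, relation names, and the `ground` helper are hypothetical and only loosely modeled on the battery example; CPDs are omitted, so the sketch shows the induced structure only.

```python
from collections import namedtuple

# A class-level dependency: attribute `child` of a class depends on attribute
# `parent_attr` of the object reached via relation `via` (None = same object).
Dep = namedtuple("Dep", ["child", "via", "parent_attr"])

# Hypothetical class models (illustrative names, not taken from the slides).
CLASS_MODEL = {
    "Battalion": [],
    "Battery": [Dep("under_fire", "in_battalion", "under_fire"),
                Dep("hit", None, "under_fire")],
    "Launcher": [Dep("damaged", "in_battery", "hit"),
                 Dep("reported_damaged", None, "damaged")],
}

def ground(objects, relations):
    """Return ground-BN edges as ((obj, attr), (obj, attr)) parent->child pairs.

    objects:   dict object name -> class name
    relations: dict (object name, relation name) -> related object name
    """
    edges = []
    for obj, cls in objects.items():
        for dep in CLASS_MODEL[cls]:
            src = obj if dep.via is None else relations[(obj, dep.via)]
            edges.append(((src, dep.parent_attr), (obj, dep.child)))
    return edges

# Situation description: which objects exist and how they are related.
objects = {"Charlie": "Battalion", "Battery1": "Battery",
           "TL1": "Launcher", "TL2": "Launcher"}
relations = {("Battery1", "in_battalion"): "Charlie",
             ("TL1", "in_battery"): "Battery1",
             ("TL2", "in_battery"): "Battery1"}

edges = ground(objects, relations)
for parent, child in edges:
    print(parent, "->", child)
```

Both launchers reuse the same class model, so adding a third launcher to the situation description adds edges without any new modeling effort.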
15Exploit Structure for Inference
- Encapsulation: objects interact in limited ways
- Inference can be encapsulated within objects, with communication limited to interfaces
- Reuse: objects from same class have same model
- Inference from one can be reused for others
16Effects of exploiting structure
[Plot: running time in seconds (0 to 6000) versus vehicles of each type per battery (1 to 10), comparing flat BN, no reuse, and with reuse]
17Extension: Structural Uncertainty
- Uncertainty about model structure
- Set of objects: is that radar signal from a tank?
- Relations between objects: location of SCUD-Battalion-C
- Task 1: Seamless integration with probabilistic model
- structural variables can depend on other variables
- Task 2: Efficient inference
- Use approximate inference to simplify model
- variational methods to summarize multiple potential influences
- MCMC for traversing possible relationships
- Use structured inference (encapsulation/reuse) on simplified model
18Outline
- Probabilistic Relational Models
- Temporal models
- Structured belief-state tracking
- Dynamic PRMs: time, events and actions
- Decision making
19Dynamic Bayesian Nets
[Figure: DBN unrolled over time slices t, t+1, t+2, with nodes Action, Velocity, Position, and Observed_pos in each slice]
- Compact representation of system dynamics
- discrete, continuous, hybrid
- Generalization of Kalman filters
20Tracking System State
Task: Maintain belief state, the distribution over current state given evidence so far
[Figure: DBN over Action, Velocity, and Position at time slices t, t+1, t+2]
- In discrete/hybrid systems, belief state representation is exponential in the number of state variables
- In hybrid systems, the number of distinct hypotheses grows exponentially over time
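The exact tracking step can be sketched for a small discrete system as follows. This is a minimal illustration over an explicit state space, not the implementation behind the slides; the state names and all numbers are made up. With n binary state variables the `belief` dictionary would need 2^n entries, which is the blow-up the slide refers to.

```python
def belief_update(belief, transition, obs_likelihood):
    """One step of exact filtering over an explicit state space.

    belief:         dict state -> P(state | evidence so far)
    transition:     dict state -> dict next_state -> P(next_state | state)
    obs_likelihood: dict next_state -> P(observation | next_state)
    """
    # Predict: push the current belief through the transition model.
    predicted = {s2: 0.0 for s2 in obs_likelihood}
    for s, p in belief.items():
        for s2, t in transition[s].items():
            predicted[s2] += p * t
    # Condition on the new observation and renormalize.
    unnorm = {s2: predicted[s2] * obs_likelihood[s2] for s2 in predicted}
    z = sum(unnorm.values())
    return {s2: v / z for s2, v in unnorm.items()}

# Hypothetical two-state example: a component that may fail and never recovers.
transition = {"ok": {"ok": 0.9, "failed": 0.1},
              "failed": {"failed": 1.0}}
alarm_likelihood = {"ok": 0.2, "failed": 0.9}   # P(alarm observed | state)

belief = {"ok": 1.0, "failed": 0.0}
belief = belief_update(belief, transition, alarm_likelihood)
print(belief)
```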
21Approximate Tracking
- Decompose belief state along subsystem lines
- Maintain belief state as product of marginals
- In hybrid systems, keep mixture of hypotheses for every subsystem
- Merge hypotheses associated with similar density
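The "product of marginals" idea can be illustrated with a small projection step for binary variables. This is a sketch under simplifying assumptions (discrete variables only, disjoint subsystem clusters that cover all variables); the function names are hypothetical, and in a tracker a projection like this would follow each propagation step.

```python
from itertools import product

def project_to_marginals(joint, clusters):
    """Approximate a joint belief state by one marginal per subsystem.

    joint:    dict mapping full assignments (tuples of 0/1) -> probability
    clusters: list of tuples of variable indices, one per subsystem
              (assumed disjoint and covering all variables)
    """
    marginals = []
    for cluster in clusters:
        m = {}
        for assignment, p in joint.items():
            key = tuple(assignment[i] for i in cluster)
            m[key] = m.get(key, 0.0) + p
        marginals.append(m)
    return marginals

def product_belief(marginals, clusters, n_vars):
    """Rebuild the approximate joint implied by the product of marginals."""
    approx = {}
    for assignment in product((0, 1), repeat=n_vars):
        p = 1.0
        for cluster, m in zip(clusters, marginals):
            p *= m.get(tuple(assignment[i] for i in cluster), 0.0)
        approx[assignment] = p
    return approx

# Two correlated binary variables, tracked as two separate subsystems.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
clusters = [(0,), (1,)]
marginals = project_to_marginals(joint, clusters)
approx = product_belief(marginals, clusters, n_vars=2)
print(marginals, approx)
```

The projection keeps each subsystem's marginal exact but drops the correlation between subsystems, which is the price paid for a compact belief state.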
22Case Study: Diagnosis and Tracking for Five-Tank System
[Figure: five-tank system; labels include F1o, F5o, F23, and observables]
- State space per time slice
- eleven-dimensional continuous space
- 227 discrete failure modes
23The doomsday scenario
24Algorithm Performance
Omniscient Kalman Filter
25Dynamic PRMs
- Goal: Model complex structured systems
- that evolve over time
- where agents take compound structured actions
- construct effective scalable inference algorithm
- Easy part: Add time relation to PRMs
- Allows notion of current and previous state
- Maintains notions of structured objects and relations
- Challenges
- Appropriate representation for actions, events
- Modeling changes in domain structure (objects, relations)
- Effective inference that exploits structure
26Dynamic PRMs: Event Models
Events: Discrete points where the system undergoes a discontinuous change
- Events can be triggered by external events
- an agent's action
- or by system dynamics
- e.g., a unit reaches its destination
- Events can influence the system structure
- discrete change in continuous dynamics
- truck velocity goes to 0 when destination is reached
- modification of relational structure
- aircraft taking off is no longer on aircraft carrier
- creation / deletion of objects
- units entering/leaving battlespace
27Dynamic PRMs: Adding Actions
- Use relational / hierarchical action representation
- class hierarchy for Move action
- an instantiation of a particular action is related to object moving, road taken, origin, destination
- Actions can depend on and influence attributes of related objects
- duration of Move action may depend on road condition, influence status of moving objects
- Actions are like events, can change domain structure
- Complex actions can be composed of simpler ones
- Effects of complex action derived from that of subactions
28Inference in Dynamic Systems
- Main tasks
- situation monitoring
- prediction
- Goal: Exploit structure as we did in PRMs
- First step: Encapsulation
- Exploit structure of weakly interacting subsystems
- Applied successfully to Dynamic Bayesian Nets
29Tracking in Dynamic PRMs
- Use relational structure to guide belief state approximation
- direct dependencies only between related objects
- Deal with dynamic structure
- relations and even domain objects change over time
- want to adjust our approximation to context
- structural uncertainty critical
- Event-driven tracking
- no reason to use fine-grained model of boring bits
- but fast forward requires ability to propagate dynamics over variable-length segments
30Outline
- Probabilistic Relational Models
- Temporal models
- Decision making
- Planning in factored MDPs
- Planning in relational MDPs
31What is a Markov Decision Process?
- An MDP is a controlled dynamic process
- Stochastic transition between states
- Actions affect system dynamics
- Rewards or costs are associated with states
- Objective: Drive process to regions of high reward
- MDP solutions are policies
- Policies assign an action to every state
32MDP Policies Value Functions
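Suppose an expert told you the value of each state
V(s1) = 10
V(s2) = 5
[Figure: two actions, Action 1 and Action 2, each leading to s1 or s2; one action has transition probabilities 0.7 / 0.3 and the other 0.5 / 0.5]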
33Greedy Policy Construction
Pick action with highest expected future value
Expectation over next-state values
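The greedy choice can be worked through with the expert's values V(s1) = 10 and V(s2) = 5. In this sketch the assignment of the 0.7 / 0.3 and 0.5 / 0.5 distributions to particular actions is an assumption (the slide text does not make it recoverable), and immediate rewards and discounting are ignored for simplicity.

```python
V = {"s1": 10.0, "s2": 5.0}

# Assumed next-state distributions from the current state; which distribution
# belongs to which action is a guess made for illustration.
P = {
    "action1": {"s1": 0.5, "s2": 0.5},
    "action2": {"s1": 0.7, "s2": 0.3},
}

def expected_value(dist, V):
    """Expectation of V over a next-state distribution."""
    return sum(p * V[s] for s, p in dist.items())

def greedy_action(P, V):
    """Pick the action with the highest expected next-state value."""
    return max(P, key=lambda a: expected_value(P[a], V))

q = {a: expected_value(P[a], V) for a in P}
best = greedy_action(P, V)
print(q, best)
```

Under these assumed numbers the expected values are 7.5 and about 8.5, so the greedy policy prefers the action that reaches s1 with probability 0.7.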
34Bootstrapping Policy Iteration
Idea: Greedy selection is useful even with suboptimal V
Guess V
Repeat until policy doesn't change
π = greedy(V)
V = value of acting on π
Guaranteed to find globally optimal policy if V is defined over explicit states, i.e., if the representation of V is exponential in the number of state variables
Exploit Structure with Factored Policy Iteration
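For reference, the explicit-state loop (guess V, act greedily, re-evaluate, repeat until the policy stops changing) can be sketched as below. This is the plain tabular version, not the factored algorithm; the two-state MDP, its rewards, the discount factor, and the fixed number of evaluation sweeps are all illustrative assumptions.

```python
def evaluate(policy, P, R, gamma, sweeps=500):
    """Approximate the value of a fixed policy by repeated Bellman backups."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: R[s] + gamma * sum(p * V[s2]
                                   for s2, p in P[s][policy[s]].items())
             for s in P}
    return V

def greedy(V, P):
    """In every state, pick the action with the highest expected next value."""
    return {s: max(P[s], key=lambda a: sum(p * V[s2]
                                           for s2, p in P[s][a].items()))
            for s in P}

def policy_iteration(P, R, gamma=0.9):
    V = {s: 0.0 for s in P}                 # guess V
    policy = None
    while True:
        new_policy = greedy(V, P)           # greedy step
        if new_policy == policy:            # policy stopped changing
            return policy, V
        policy = new_policy
        V = evaluate(policy, P, R, gamma)   # value of acting on the policy

# Hypothetical MDP: P[state][action] is a next-state distribution,
# R[state] is a state reward.
P = {"s1": {"stay": {"s1": 0.9, "s2": 0.1},
            "move": {"s1": 0.2, "s2": 0.8}},
     "s2": {"stay": {"s2": 1.0},
            "move": {"s1": 0.7, "s2": 0.3}}}
R = {"s1": 1.0, "s2": 0.0}

policy, V = policy_iteration(P, R)
print(policy, V)
```

Every table here has one entry per explicit state, which is what becomes infeasible when the state is described by many variables.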
35Factored MDPs = DBNs + Rewards
[Figure: DBN over variables X, Y, Z at time slices t and t+1]
Rewards have small sets of parent variables too
Total reward adds sub-rewards: R = R1 + R2
36Linearly Decomposable Value Functions
Note: Overlapping is allowed!
Approximate high-dimensional value function with combination of lower-dimensional functions
Motivation: Multi-attribute utility theory (Keeney & Raiffa)
37Decomposable Value Functions
Linear combination of restricted domain functions
- Each basis function hi is the status of some small part(s) of a complex system
- status of a machine
- inventory of a store
- status of a subgoal
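A linear combination of restricted-domain basis functions can be written down directly. In this minimal sketch the variable names, the three basis functions, and the weights are hypothetical; the point is only that each basis function reads a small subset of the state, and that scopes may overlap.

```python
# Each entry pairs a scope (the few state variables a basis function reads)
# with the function itself. All names and numbers are illustrative.
basis = [
    (("machine_ok",),
     lambda machine_ok: 1.0 if machine_ok else 0.0),
    (("inventory",),
     lambda inventory: float(inventory)),
    (("machine_ok", "subgoal_done"),          # overlaps with the first scope
     lambda machine_ok, subgoal_done: 1.0 if machine_ok and subgoal_done else 0.0),
]
weights = [4.0, 0.5, 3.0]

def value(state, basis, weights):
    """V(state) = sum_i w_i * h_i(state restricted to the scope of h_i)."""
    return sum(w * h(*(state[v] for v in scope))
               for w, (scope, h) in zip(weights, basis))

state = {"machine_ok": True, "inventory": 6, "subgoal_done": False}
v = value(state, basis, weights)
print(v)
```

Storing the value function takes one weight per basis function rather than one number per state.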
38Exploiting Structure
Key operation: backprojection of a basis function through a DBN transition
[Figure: DBN over variables X, Y, Z]
Structure allows us to consider operations over small subsets of variables, not the entire state space.
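Backprojection can be sketched for binary variables as follows: the expected value of a basis function at the next step depends only on the parents of the variables in its scope. The function signature, the parent sets, and the CPD numbers are hypothetical, and the sketch assumes next-step variables are conditionally independent given their time-t parents.

```python
from itertools import product

def backproject(h, scope, cpds, parents):
    """Backproject basis function h through a factored transition model.

    h:       function of the next-step values of the variables in `scope`
    scope:   tuple of variable names h depends on
    cpds:    cpds[var](parent_values) -> P(var' = 1 | parents at time t)
    parents: parents[var] -> tuple of time-t variables that var' depends on
    Returns (g_scope, g), where g maps assignments of g_scope to E[h | parents].
    """
    # The result depends only on the union of the parents of h's scope.
    g_scope = tuple(sorted({p for v in scope for p in parents[v]}))
    g = {}
    for pa_vals in product((0, 1), repeat=len(g_scope)):
        pa = dict(zip(g_scope, pa_vals))
        total = 0.0
        for next_vals in product((0, 1), repeat=len(scope)):
            prob = 1.0
            for v, x in zip(scope, next_vals):
                p1 = cpds[v](tuple(pa[p] for p in parents[v]))
                prob *= p1 if x == 1 else 1.0 - p1
            total += prob * h(*next_vals)
        g[pa_vals] = total
    return g_scope, g

# Hypothetical three-variable DBN: X' <- X, Y' <- (X, Y), Z' <- (Y, Z).
parents = {"X": ("X",), "Y": ("X", "Y"), "Z": ("Y", "Z")}
cpds = {
    "X": lambda pa: 0.8 if pa == (1,) else 0.3,
    "Y": lambda pa: 0.9 if pa == (1, 1) else 0.1,
    "Z": lambda pa: 0.5,
}

# Basis function over Y' only: its backprojection is a function of (X, Y).
g_scope, g = backproject(lambda y: float(y), ("Y",), cpds, parents)
print(g_scope, g)
```

The resulting table has 4 entries (assignments to X and Y) instead of the 8 joint states, which is the kind of saving the slide points to.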
39Policy Format
Factored value functions and compact action effect descriptions
Action 1
Action 2
Sorted result values form a decision list
If [condition] then action 1, else if [condition] then action 2, else if [condition] then action 1
40Factored Policy Iteration Summary
Structure induces decision-list policy
Guess V
π = greedy(V)
V = value of acting on π
Key operations isomorphic to BN inference
- Time per iteration reduced from $O((2^n)^3)$ to $O(C_b k^3)$
- $C_b$: cost of Bayes net inference (function of structure)
- $k$: number of basis functions ($k \ll 2^n$)
41Run Times
[Plot: CPU seconds and number of states (0 to 70000) versus number of state variables (4 to 16); series labeled States, Seconds, and 3n3]
Note: Nearly optimal policy found in all cases (? 6).
42Planning in Relational MDPs
- Replace DBN transition model with dynamic PRM
- Generalize factored policy iteration
- Define basis functions via relational formulas
- Replace BN inference with PRM inference as key step
- Exploit hierarchical structure of complex actions by encapsulating decision making along hierarchy
- Potential benefits
- Tractable approximate planning in relational domains
- Unification of classical and stochastic planning
43Conclusions: Past and Present
- PRMs compactly represent complex systems with multiple interacting objects
- coherent (probabilistic) semantics
- structured representation: modularity and reuse
- Scalable inference that exploits structure
- Tracking algorithms for DBNs that exploit system decomposition
- Planning algorithms in MDPs that exploit structure of system and of value functions
Theme: Representation and inference scale up, if we exploit structure
44Conclusions: Future
- Better inference for densely connected PRMs
- Extending PRMs with time, events, actions
- Exploit structure for inference in dynamic PRMs
- system decomposition into subsystems
- relational context
- varying time granularity
- Planning in dynamic PRMs
- extend factored policy iteration to PRMs
- exploit hierarchical action decomposition
45Acknowledgements
- Students and postdocs
- Nir Friedman (→ Hebrew U.)
- Dirk Ormoneit
- Ron Parr (→ Duke)
- Xavier Boyen
- Urszula Chajewska
- Lise Getoor
- Carlos Guestrin
- Uri Lerner
- Uri Nodelman
- Avi Pfeffer (→ Harvard)
- Eran Segal
- Benjamin Taskar
- Simon Tong
- Brian Milch (→ Berkeley)
- Ken Takusagawa (→ MIT)
- Support
- PECASE Award via ONR YIP
- DARPA's HPKB Program
- MURI Program: Integrated Approach to Intelligent Systems
- Sloan Faculty Fellowship
- DARPA's IA Program under subcontract to SRI International
- DARPA's DMIF Program under subcontract to IET Inc.
- ONR grant
[Legend: postdocs, PhD students, undergraduates]
http://robotics.stanford.edu/~koller/