Title: A Hybridized Planner for Stochastic Domains

1
A Hybridized Planner for Stochastic Domains
  • Mausam and Daniel S. Weld
  • University of Washington, Seattle
  • Piergiorgio Bertoli
  • ITC-IRST, Trento

2
Planning under Uncertainty (ICAPS'03 Workshop)
  • Qualitative (disjunctive) uncertainty
  • Which real problem can you solve?
  • Quantitative (probabilistic) uncertainty
  • Which real problem can you model?

3
The Quantitative View
  • Markov Decision Process
  • models uncertainty with probabilistic outcomes
  • general decision-theoretic framework
  • algorithms are slow
  • do we need the full power of decision theory?
  • is an unconverged partial policy any good?

4
The Qualitative View
  • Conditional Planning
  • Model uncertainty as logical disjunction of
    outcomes
  • exploits classical planning techniques → FAST
  • ignores probabilities → poor solutions
  • how bad are pure qualitative solutions?
  • can we improve the qualitative policies?

5
HybPlan: A Hybridized Planner
  • combines probabilistic and disjunctive planners
  • produces good solutions in intermediate times
  • anytime: makes effective use of resources
  • terminates with a quality guarantee (error bound)
  • Quantitative View
  • completes partial probabilistic policy by using
    qualitative policies in some states
  • Qualitative View
  • improves qualitative policies in more important
    regions

6
Outline
  • Motivation
  • Planning with Probabilistic Uncertainty (RTDP)
  • Planning with Disjunctive Uncertainty (MBP)
  • Hybridizing RTDP and MBP (HybPlan)
  • Experiments
  • Conclusions and Future Work

7
Markov Decision Process
  • ⟨S, A, Pr, C, s0, G⟩
  • S: a set of states
  • A: a set of actions
  • Pr: probabilistic transition model
  • C: cost model
  • s0: start state
  • G: a set of goals
  • Find a policy (S → A) that minimizes the expected cost to reach a goal, for an indefinite horizon, in a fully observable Markov decision process

Optimal cost function J*, optimal policy π* (formalized below)
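For reference, the optimal cost function and policy the slides denote J* and π* satisfy the standard Bellman fixed-point equations; this is the textbook formulation for goal-directed MDPs, not copied from the slides:

```latex
% Optimal cost function for the MDP <S, A, Pr, C, s0, G>:
% goal states cost nothing; elsewhere take the cheapest action in expectation.
J^*(s) =
  \begin{cases}
    0, & s \in G \\[4pt]
    \min_{a \in A}\Bigl[\, C(s,a) + \sum_{s' \in S} \Pr(s' \mid s,a)\, J^*(s') \Bigr], & \text{otherwise}
  \end{cases}
\qquad
\pi^*(s) = \operatorname*{arg\,min}_{a \in A} \Bigl[\, C(s,a) + \sum_{s' \in S} \Pr(s' \mid s,a)\, J^*(s') \Bigr]
```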
8
Example
[Grid figure: a grid world with start state s0 and a Goal; annotations mark a longer path, a region where all states are dead-ends, and a wrong direction from which the goal is still reachable]
9
Optimal State Costs
[Grid figure: the optimal expected cost J* labeled on every state, decreasing toward the Goal (cost 0)]
10
Optimal Policy
[Grid figure: the optimal policy's actions overlaid on the state costs, all routes leading to the Goal]
11
Bellman Backup
  • Create a better approximation to the cost function at s

12
Bellman Backup
Trial: simulate the greedy policy, updating the visited states
  • Create a better approximation to the cost function at s

13
Bellman Backup
Real Time Dynamic Programming (Barto et al. '95; Bonet & Geffner '03)
Repeat trials until the cost function converges
Trial: simulate the greedy policy, updating the visited states
  • Create a better approximation to the cost function at s (a sketch follows below)
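A minimal sketch of a Bellman backup and one RTDP trial. The helpers `actions(s)`, `transitions(s, a)` (a dict of successor → probability), and `cost(s, a)` are illustrative assumptions, not an API from the talk:

```python
import random
from collections import defaultdict

def bellman_backup(s, J, actions, transitions, cost):
    """Set J(s) to the best one-step lookahead value; return the greedy action."""
    q = {a: cost(s, a) + sum(p * J[t] for t, p in transitions(s, a).items())
         for a in actions(s)}
    greedy = min(q, key=q.get)
    J[s] = q[greedy]
    return greedy

def rtdp_trial(s0, goals, J, actions, transitions, cost, max_steps=1000):
    """One trial: follow the greedy policy from s0, backing up visited states."""
    s = s0
    for _ in range(max_steps):
        if s in goals:
            break
        a = bellman_backup(s, J, actions, transitions, cost)
        succ = transitions(s, a)   # sample the next state from Pr(. | s, a)
        s = random.choices(list(succ), weights=list(succ.values()))[0]

# RTDP: initialize J admissibly (here, all zeros) and repeat trials
# until the cost function converges.
J = defaultdict(float)
```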

14
Planning with Disjunctive Uncertainty
  • ⟨S, A, T, s0, G⟩
  • S: a set of states
  • A: a set of actions
  • T: disjunctive transition model
  • s0: the start state
  • G: a set of goals
  • Find a strong-cyclic policy (S → A) that guarantees reaching a goal, for an indefinite horizon, in a fully observable planning problem

15
Model Based Planner (Bertoli et al.)
  • States, transitions, etc. represented logically
  • Uncertainty → multiple possible successor states
  • Planning Algorithm
  • iteratively removes bad states (sketched below)
  • Bad: states that don't reach the goal, or reach only other bad states
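A rough, explicit-state caricature of that pruning loop, for intuition only: the real MBP operates symbolically on logical representations of state sets. `T(s, a)`, returning the set of disjunctive outcomes, is an assumed helper:

```python
def prune_bad_states(states, goals, actions, T):
    """Repeatedly drop actions that may lead to bad states; non-goal states
    left with no actions are bad. Returns a policy on the surviving states."""
    allowed = {s: set(actions(s)) for s in states}
    while True:
        # States that can still reach a goal through some allowed action.
        good = set(goals)
        grew = True
        while grew:
            grew = False
            for s in states:
                if s not in good and any(T(s, a) & good for a in allowed[s]):
                    good.add(s)
                    grew = True
        # Drop any action with an outcome outside the good set.
        pruned = False
        for s in states:
            for a in list(allowed[s]):
                if not T(s, a) <= good:
                    allowed[s].discard(a)
                    pruned = True
        if not pruned:
            return {s: next(iter(allowed[s]))
                    for s in good - set(goals) if allowed[s]}
```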

16
MBP Policy
[Grid figure: MBP's strong-cyclic policy, a sub-optimal route to the Goal]
17
Outline
  • Motivation
  • Planning with Probabilistic Uncertainty (RTDP)
  • Planning with Disjunctive Uncertainty (MBP)
  • Hybridizing RTDP and MBP (HybPlan)
  • Experiments
  • Conclusions and Future Work

18
HybPlan Top Level Code
  • 0. run MBP to find a solution to the goal (πmbp)
  • 1. run RTDP for some time
  • 2. compute the partial greedy policy (πrtdp)
  • 3. compute the hybridized policy (πhyb) by
  •    πhyb(s) = πrtdp(s) if visited(s) > threshold
  •    πhyb(s) = πmbp(s) otherwise
  • 4. clean πhyb by removing
  •    dead-ends
  •    probability-1 cycles
  • 5. evaluate πhyb
  • 6. save the best policy obtained so far

repeat steps 1–6 until 1) resources are exhausted or 2) a satisfactory policy is found
(a sketch of the hybridization step follows below)
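A small sketch of step 3, the hybridization rule, assuming both policies are dictionaries and `visit_count` tracks RTDP's state visits; the names are illustrative, not the paper's API:

```python
def hybridize(pi_rtdp, pi_mbp, visit_count, threshold):
    """Trust RTDP's greedy action where it has enough experience,
    and fall back to MBP's strong-cyclic action everywhere else."""
    pi_hyb = {}
    for s in set(pi_rtdp) | set(pi_mbp):
        if s in pi_rtdp and visit_count.get(s, 0) > threshold:
            pi_hyb[s] = pi_rtdp[s]   # well-explored: keep the RTDP choice
        elif s in pi_mbp:
            pi_hyb[s] = pi_mbp[s]    # otherwise: complete with MBP
        # states covered by neither remain undefined (candidate dead-ends)
    return pi_hyb
```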
19
First RTDP Trial
  • 1. run RTDP for some time

[Grid figure: all state costs initialized to 0; the first trial begins at s0]
20
Bellman Backup
  • 1. run RTDP for some time

[Grid figure: a Bellman backup at the current state s]
Q1(s,N) = 1 + 0.5·0 + 0.5·0 = 1
Q1(s,S) = Q1(s,W) = Q1(s,E) = 1
J1(s) = 1; let the greedy action be North
21
Simulation of Greedy Action
  • 1. run RTDP for some time

[Grid figure: the greedy action (North) is simulated; the backed-up state now has cost 1]
22
Continuing First Trial
  • 1. run RTDP for some time

[Grid figure: the trial continues, backing up each state as it is visited]
23
Continuing First Trial
  • 1. run RTDP for some time

[Grid figure: the trial continues; another visited state is backed up to cost 1]
24
Finishing First Trial
  • 1. run RTDP for some time

[Grid figure: the trial reaches the Goal; all states visited during the trial have been backed up]
25
Cost Function after First Trial
  • 1. run RTDP for some time

[Grid figure: the cost function after the first trial; only states along the trial have non-zero costs]
26
Partial Greedy Policy
  • 2. compute the partial greedy policy (πrtdp)

[Grid figure: the greedy policy, defined only on the states visited so far]
27
Construct Hybridized Policy w/ MBP
  • 3. compute the hybridized policy (πhyb), threshold = 0

[Grid figure: πrtdp on the visited states, completed with πmbp everywhere else]
28
Evaluate Hybridized Policy
  • 5. evaluate πhyb; 6. store πhyb (a sketch of policy evaluation follows below)

[Grid figure: expected costs under πhyb]
After the first trial: J(πhyb)(s0) = 5
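Step 5 can be sketched as iterative policy evaluation. This sketch assumes the cleaned policy is proper (it always reaches the goal) and closed (every non-goal successor is in its domain); `transitions` and `cost` are the same assumed helpers as before:

```python
def evaluate_policy(pi, goals, transitions, cost, sweeps=10000, tol=1e-6):
    """Estimate J(pi) by sweeping Bellman updates for the fixed policy."""
    J = {s: 0.0 for s in pi}
    for _ in range(sweeps):
        delta = 0.0
        for s, a in pi.items():
            v = cost(s, a) + sum(p * (0.0 if t in goals else J[t])
                                 for t, p in transitions(s, a).items())
            delta = max(delta, abs(v - J[s]))
            J[s] = v
        if delta < tol:   # converged to the policy's expected costs
            break
    return J              # J[s0] is the value reported on the slide
```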
29
Second Trial
[Grid figure: the second RTDP trial visits and backs up more states]
30
Partial Greedy Policy
[Grid figure: the partial greedy policy after the second trial]
31
Absence of MBP Policy
[Grid figure: the MBP policy doesn't exist for one of the visited states — there is no path to the goal from it, so πhyb cannot be completed there]
32
Third Trial
[Grid figure: the third RTDP trial updates further state costs]
33
Partial Greedy Policy
[Grid figure: the partial greedy policy after the third trial]
34–38
Probability 1 Cycles
repeat: find a state s in the cycle, set πhyb(s) = πmbp(s)
until the cycle is broken
(a sketch of this cleaning step follows below)

[Animation across slides 34–38: the hybridized policy contains a cycle traversed with probability 1; its states are switched to πmbp one at a time until the policy reaches the Goal]
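A simplified sketch of that cleaning loop. It only detects cycles whose every step is deterministic, which is a sufficient condition for the cycle being traversed with probability 1; the general case (all outcomes of every cycle action staying inside the cycle) needs a fuller analysis, and the paper gives the precise procedure. `transitions(s, a)` is the assumed outcome-distribution helper, and πmbp is assumed to be defined on the cycle states:

```python
def find_certain_cycle(pi_hyb, transitions):
    """Return states forming a cycle followed with certainty, or None."""
    for start in pi_hyb:
        path, index = [], {}
        s = start
        while s in pi_hyb:
            if s in index:
                return path[index[s]:]   # revisited a state: cycle found
            index[s] = len(path)
            path.append(s)
            succ = transitions(s, pi_hyb[s])
            if len(succ) != 1:           # stochastic step: escape is possible
                break
            s = next(iter(succ))
    return None

def break_cycles(pi_hyb, pi_mbp, transitions):
    """While a probability-1 cycle exists, redirect one of its states
    through MBP's action, as on the slides."""
    cycle = find_certain_cycle(pi_hyb, transitions)
    while cycle:
        s = cycle[0]
        pi_hyb[s] = pi_mbp[s]            # assumes MBP's policy covers s
        cycle = find_certain_cycle(pi_hyb, transitions)
```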
39
Error Bound

[Grid figure: costs under πhyb after the first trial]
After the 1st trial: J(πhyb)(s0) = 5, while RTDP's cost function gives J(s0) = 1
⇒ Error(πhyb) ≤ 5 − 1 = 4 (see the derivation below)
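Why the subtraction yields a valid bound: RTDP's cost function starts admissibly (all zeros) and Bellman backups keep it a lower bound on J*, while evaluating the proper policy πhyb gives an upper bound. In the slides' notation:

```latex
% lower and upper bounds bracket the optimal cost at the start state:
J_{\mathrm{rtdp}}(s_0) \;\le\; J^*(s_0) \;\le\; J^{\pi_{\mathrm{hyb}}}(s_0)
% so the policy's suboptimality is bounded by their gap:
\mathrm{Error}(\pi_{\mathrm{hyb}})
  = J^{\pi_{\mathrm{hyb}}}(s_0) - J^*(s_0)
  \;\le\; J^{\pi_{\mathrm{hyb}}}(s_0) - J_{\mathrm{rtdp}}(s_0)
  = 5 - 1 = 4
```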
40
Termination
  • when a policy with the required error bound is found
  • when the planning time is exhausted
  • when the available memory is exhausted

Properties
  • outputs a proper policy
  • anytime algorithm (once MBP terminates)
  • HybPlan = RTDP, if infinite resources are available
  • HybPlan = MBP, if resources are extremely limited
  • HybPlan is better than both, otherwise

41
Outline
  • Motivation
  • Planning with Probabilistic Uncertainty (RTDP)
  • Planning with Disjunctive Uncertainty (MBP)
  • Hybridizing RTDP and MBP (HybPlan)
  • Experiments
  • Anytime Properties
  • Scalability
  • Conclusions and Future Work

42
Domains
  • NASA Rover Domain
  • Factory Domain
  • Elevator Domain
43
Anytime Properties
[Plot: anytime performance, solution quality over planning time, compared with RTDP]
44
Anytime Properties
[Plot: anytime performance compared with RTDP]
45
Scalability
46
Conclusions
  • First algorithm that integrates disjunctive and
    probabilistic planners.
  • Experiments show that HybPlan is
  • anytime
  • scales better than RTDP
  • produces better quality solutions than MBP
  • can interleave planning and execution

47
Hybridized Planning: A General Notion
  • Hybridize other pairs of planners
  • an optimal or close-to-optimal planner
  • a sub-optimal but fast planner
  • to yield a planner that produces
  • a good quality solution in intermediate running
    times
  • Examples
  • POMDP: RTDP/PBVI with POND/MBP/BBSP
  • Oversubscription Planning: A* with greedy solutions
  • Concurrent MDP: Sampled RTDP with single-action RTDP