Title: A Hybridized Planner for Stochastic Domains
1. A Hybridized Planner for Stochastic Domains
- Mausam and Daniel S. Weld
- University of Washington, Seattle
- Piergiorgio Bertoli
- ITC-IRST, Trento
2. Planning under Uncertainty (ICAPS'03 Workshop)
- Qualitative (disjunctive) uncertainty
- Which real problem can you solve?
- Quantitative (probabilistic) uncertainty
- Which real problem can you model?
3. The Quantitative View
- Markov Decision Process
- models uncertainty with probabilistic outcomes
- general decision-theoretic framework
- algorithms are slow
- do we need the full power of decision theory?
- is an unconverged partial policy any good?
4. The Qualitative View
- Conditional Planning
- models uncertainty as a logical disjunction of outcomes
- exploits classical planning techniques → FAST
- ignores probabilities → poor solutions
- how bad are pure qualitative solutions?
- can we improve the qualitative policies?
5. HybPlan: A Hybridized Planner
- combines probabilistic and disjunctive planners
- produces good solutions in intermediate running times
- anytime: makes effective use of available resources
- bounds termination with a quality guarantee
- Quantitative view: completes the partial probabilistic policy by using qualitative policies in some states
- Qualitative view: improves the qualitative policies in the more important regions
6. Outline
- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
- Conclusions and Future Work
7. Markov Decision Process
- ⟨S, A, Pr, C, s0, G⟩
- S: a set of states
- A: a set of actions
- Pr: probabilistic transition model
- C: cost model
- s0: start state
- G: a set of goals
- Find a policy (S → A) that
- minimizes the expected cost to reach a goal
- over an indefinite horizon
- in a fully observable Markov decision process
This yields the optimal cost function J* and the optimal policy π* (in symbols below).
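In symbols, using a standard formulation consistent with the tuple above (not copied verbatim from the slides):

```latex
J^{\pi}(s_0) = \mathbb{E}\left[\, \sum_{t=0}^{T-1} C\bigl(s_t, \pi(s_t)\bigr) \,\middle|\, s_0, \pi \right],
\qquad
\pi^{*} = \operatorname*{arg\,min}_{\pi : S \to A} J^{\pi}(s_0)
```

where T is the first (random) time at which a state in G is reached.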
8. Example
[Gridworld figure: start state s0 and a Goal; annotations mark a longer path, a region where all states are dead-ends, and a wrong direction from which the goal is still reachable.]
9. Optimal State Costs
[Figure: the gridworld annotated with the optimal cost-to-goal of each state; dead-end states have infinite cost.]
10. Optimal Policy
[Figure: the gridworld with the optimal action marked at each state, leading to the Goal.]
11-13. Bellman Backup
- Create a better approximation to the cost function at s
- Trial: simulate the greedy policy and update the states visited along the way
- Real-Time Dynamic Programming (Barto et al. '95; Bonet & Geffner '03): repeat trials until the cost function converges
The update itself is the standard Bellman backup, shown below.
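In the notation of slide 7's tuple (and matching the Q/J notation used on slide 20):

```latex
Q_{n+1}(s,a) = C(s,a) + \sum_{s' \in S} \Pr(s' \mid s, a)\, J_n(s'),
\qquad
J_{n+1}(s) = \min_{a \in A} Q_{n+1}(s,a)
```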
14. Planning with Disjunctive Uncertainty
- ⟨S, A, T, s0, G⟩
- S: a set of states
- A: a set of actions
- T: disjunctive transition model
- s0: the start state
- G: a set of goals
- Find a strong-cyclic policy (S → A) that
- guarantees reaching a goal
- over an indefinite horizon
- in a fully observable planning problem
15. Model Based Planner (Bertoli et al.)
- States, transitions, etc. are represented logically
- Uncertainty → multiple possible successor states
- Planning algorithm: iteratively removes bad states
- Bad = states that don't reach anywhere (dead-ends) or reach only other bad states
An explicit-state sketch of this pruning follows below.
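MBP itself works symbolically; the following explicit-state Python sketch only illustrates the bad-state-removal fixpoint (all names and data structures here are illustrative assumptions, not MBP's actual implementation):

```python
def strong_cyclic_states(states, goals, transitions):
    """Return the set of states from which a strong-cyclic policy exists.
    transitions[s][a]: set of possible successor states (disjunctive outcomes).
    A state survives iff some action keeps every outcome inside the surviving
    set AND a goal remains reachable from it; everything else is 'bad'."""
    good = set(states)
    while True:
        # actions whose every disjunctive outcome stays inside `good`
        safe = {s: [a for a, succ in transitions[s].items() if succ <= good]
                for s in good - set(goals)}
        # backward reachability of the goals through safe actions only
        reach = set(goals) & good
        grew = True
        while grew:
            grew = False
            for s, acts in safe.items():
                if s not in reach and any(transitions[s][a] & reach for a in acts):
                    reach.add(s)
                    grew = True
        if reach == good:
            return good      # fixpoint: no more bad states to remove
        good = reach         # drop newly discovered bad states and repeat
```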
16. MBP Policy
[Figure: the MBP policy on the gridworld; it reaches the Goal but is a sub-optimal solution.]
17. Outline
- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
- Conclusions and Future Work
18. HybPlan Top-Level Code
- 0. run MBP to find a solution to the goal
- 1. run RTDP for some time
- 2. compute the partial greedy policy (πrtdp)
- 3. compute the hybridized policy (πhyb):
  πhyb(s) = πrtdp(s) if visited(s) > threshold
  πhyb(s) = πmbp(s) otherwise
- 4. clean πhyb by removing dead-ends and probability-1 cycles
- 5. evaluate πhyb
- 6. save the best policy obtained so far
- repeat steps 1-6 until (1) resources are exhausted or (2) a satisfactory policy is found (a sketch in code follows below)
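A minimal Python sketch of this loop. The helpers `rtdp_trials`, `greedy_policy`, `clean`, `evaluate`, and `lower_bound` are hypothetical names for the steps above, not the authors' implementation:

```python
import time

def hybplan(mdp, mbp_policy, threshold, max_seconds, target_error):
    """Top-level HybPlan loop, mirroring steps 1-6 above.
    mbp_policy: dict state -> action computed by MBP (step 0)."""
    best_policy, best_cost = None, float('inf')
    start = time.time()
    while time.time() - start < max_seconds:
        rtdp_trials(mdp, seconds=0.5)               # 1. run RTDP for some time
        pi_rtdp, visited = greedy_policy(mdp)       # 2. partial greedy policy
        pi_hyb = {s: pi_rtdp[s] if visited.get(s, 0) > threshold
                     else mbp_policy[s]
                  for s in set(pi_rtdp) | set(mbp_policy)}   # 3. hybridize
        clean(pi_hyb, mdp, mbp_policy)              # 4. remove dead-ends, prob-1 cycles
        cost = evaluate(pi_hyb, mdp)                # 5. evaluate pi_hyb at s0
        if cost < best_cost:                        # 6. keep the best policy so far
            best_policy, best_cost = pi_hyb, cost
        if best_cost - lower_bound(mdp) <= target_error:
            break                                   # satisfactory policy found
    return best_policy
```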
19. First RTDP Trial
[Figure: the gridworld at the start of the first trial; every state's cost estimate is initialized to 0.]
20. Bellman Backup
[Figure: the gridworld during the first trial; all cost estimates are still 0.]
Q1(s,N) = 1 + 0.5·0 + 0.5·0 = 1
Q1(s,S) = Q1(s,W) = Q1(s,E) = 1
J1(s) = min_a Q1(s,a) = 1; let the greedy action be North
21. Simulation of Greedy Action
[Figure: the greedy action (North) is simulated; the backed-up state now has cost 1.]
22. Continuing First Trial
[Figure: the trial continues from the sampled successor state.]
23. Continuing First Trial
[Figure: further backups along the trial; more visited states take cost 1.]
24. Finishing First Trial
[Figure: the trial reaches the Goal; costs along the visited path have been updated.]
25. Cost Function after First Trial
[Figure: the cost function after the first trial; states visited during the trial carry updated costs, all others remain 0.]
26. Partial Greedy Policy
2. compute the partial greedy policy (πrtdp)
[Figure: greedy actions are defined only at the states visited during the trial.]
27. Construct Hybridized Policy with MBP
3. compute the hybridized policy (πhyb) (threshold = 0)
[Figure: πrtdp at visited states is completed with πmbp elsewhere; the combined policy reaches the Goal.]
28. Evaluate Hybridized Policy
5. evaluate πhyb; 6. store πhyb
[Figure: the costs of πhyb computed over the gridworld.]
After the first trial: J(πhyb) = 5
A minimal evaluation sketch follows below.
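Step 5 ("evaluate πhyb") is ordinary policy evaluation; here is an iterative sketch under the same illustrative conventions (the explicit dictionaries `P` and `C` are assumptions):

```python
def evaluate(policy, P, C, goals, sweeps=10000, tol=1e-8):
    """Expected cost to reach a goal under `policy`, by iterative
    evaluation restricted to the policy's actions.
    P[(s, a)]: dict successor -> probability;  C[(s, a)]: action cost."""
    J = {s: 0.0 for s in policy}
    for _ in range(sweeps):
        delta = 0.0
        for s, a in policy.items():
            if s in goals:
                continue                        # goal states cost nothing
            new = C[(s, a)] + sum(p * J.get(s2, 0.0)
                                  for s2, p in P[(s, a)].items())
            delta = max(delta, abs(new - J[s]))
            J[s] = new
        if delta < tol:
            break                               # converged
    return J
```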
29. Second Trial
[Figure: the second RTDP trial and its cost updates.]
30. Partial Greedy Policy
[Figure: the partial greedy policy after the second trial.]
31. Absence of MBP Policy
The MBP policy doesn't exist here: there is no path to the goal.
[Figure: a region of the gridworld from which the Goal is unreachable.]
32. Third Trial
[Figure: the third RTDP trial and its cost updates.]
33. Partial Greedy Policy
[Figure: the partial greedy policy after the third trial.]
34-38. Probability-1 Cycles
repeat: find a state s in the cycle, set πhyb(s) := πmbp(s); until the cycle is broken
[Figure sequence: cycle states are redirected to their MBP actions one at a time until the policy escapes the cycle and reaches the Goal.]
A sketch of this repair loop in code follows.
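A Python sketch of the repair loop on slides 34-38. For brevity it detects any cycle in the policy's successor graph rather than only probability-1 traps, and `successors(s, a)` is an illustrative assumption:

```python
def break_cycles(pi_hyb, pi_mbp, successors, goals):
    """Reassign one in-cycle state at a time to its MBP action
    until the greedy policy graph is cycle-free."""
    while True:
        cycle = find_cycle(pi_hyb, successors, goals)
        if cycle is None:
            return pi_hyb
        s = next(iter(cycle))        # pick any state on the cycle
        pi_hyb[s] = pi_mbp[s]        # MBP's action is goal-directed

def find_cycle(pi, successors, goals):
    """DFS over policy states; returns the states of a cycle, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {s: WHITE for s in pi}

    def dfs(s, stack):
        color[s] = GRAY
        stack.append(s)
        for s2 in successors(s, pi[s]):
            if s2 in goals or s2 not in pi:
                continue             # leaving the policy graph closes no cycle
            if color[s2] == GRAY:    # back edge: cycle found
                return set(stack[stack.index(s2):])
            if color[s2] == WHITE:
                found = dfs(s2, stack)
                if found:
                    return found
        color[s] = BLACK
        stack.pop()
        return None

    for s in pi:
        if color[s] == WHITE:
            found = dfs(s, [])
            if found:
                return found
    return None
```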
39. Error Bound
[Figure: the evaluated costs of πhyb over the gridworld.]
After the first trial: J(πhyb)(s0) = 5, while RTDP's current cost function gives J(s0) = 1
⇒ Error(πhyb) ≤ 5 − 1 = 4
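Why subtracting the two values bounds the error (a standard argument, assuming RTDP's cost function is initialized admissibly and hence stays a lower bound on J*):

```latex
J_n(s_0) \le J^{*}(s_0) \le J^{\pi_{hyb}}(s_0)
\;\Longrightarrow\;
\mathrm{Error}(\pi_{hyb}) = J^{\pi_{hyb}}(s_0) - J^{*}(s_0)
\le J^{\pi_{hyb}}(s_0) - J_n(s_0) = 5 - 1 = 4
```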
40. Termination
- when a policy with the required error bound is found
- when the planning time is exhausted
- when the available memory is exhausted
Properties
- outputs a proper policy
- anytime algorithm (once MBP terminates)
- HybPlan ≈ RTDP, if infinite resources are available
- HybPlan ≈ MBP, if resources are extremely limited
- HybPlan is better than both, otherwise
41. Outline
- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
- Anytime Properties
- Scalability
- Conclusions and Future Work
42. Domains
- NASA Rover domain
- Factory domain
- Elevator domain
43-44. Anytime Properties
[Plots comparing the anytime behavior of HybPlan against RTDP.]
45. Scalability
[Plot: scalability results.]
46. Conclusions
- First algorithm that integrates disjunctive and probabilistic planners
- Experiments show that HybPlan
- is anytime
- scales better than RTDP
- produces better-quality solutions than MBP
- can interleave planning and execution
47. Hybridized Planning: A General Notion
- Hybridize other pairs of planners
- an optimal or close-to-optimal planner
- a sub-optimal but fast planner
- to yield a planner that produces a good-quality solution in intermediate running times
- Examples
- POMDP: RTDP/PBVI with POND/MBP/BBSP
- Oversubscription planning: A* with greedy solutions
- Concurrent MDP: sampled RTDP with single-action RTDP