Title: Classical Situation
Slide 1: Classical Situation
[Figure: a choice between two doors, one leading to heaven, one to hell]
- World deterministic
- State observable
Slide 2: MDP-Style Planning
[Figure: a choice between two doors, one leading to heaven, one to hell]
- World stochastic
- State observable
Slide 3: Stochastic, Partially Observable
[Figure: heaven? hell? The agent cannot tell which door is which; a sign carries the information]
[Sondik 72; Littman, Cassandra & Kaelbling 97]
Slide 4: Stochastic, Partially Observable
[Figure: two possible worlds with heaven and hell swapped; in each, a sign indicates the layout]
Slide 5: Stochastic, Partially Observable
[Figure: heaven/hell layouts with unknown outcomes ('?') and several signs]
Slide 6: Robot Planning Frameworks
Slide 7: MDP-Style Planning
[Figure: a choice between two doors, one leading to heaven, one to hell]
- World stochastic
- State observable
Slide 8: Markov Decision Process (discrete)
[Figure: a five-state MDP over s1..s5 with state rewards r = 1, 0, 20, 0, -10 and stochastic transitions (probabilities such as 0.9/0.1, 0.7/0.3, 0.8/0.2, 0.99)]
[Bellman 57; Howard 60; Sutton/Barto 98]
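A minimal sketch of how a discrete MDP of this shape can be encoded. The transition arcs and the reward-to-state assignment below are illustrative stand-ins for the figure, not a faithful copy of it:

```python
# Sketch: a small discrete MDP in the spirit of the figure above.
# Rewards come from the slide (1, 0, 20, 0, -10), but their assignment
# to states and the transition arcs below are illustrative only.

R = {"s1": 20, "s2": 1, "s3": 0, "s4": 0, "s5": -10}  # reward for entering each state

# T[s][a] = list of (next_state, probability); each action's list sums to 1.
T = {
    "s1": {"a": [("s2", 0.9), ("s3", 0.1)]},
    "s2": {"a": [("s1", 0.7), ("s4", 0.3)]},
    "s3": {"a": [("s4", 0.8), ("s5", 0.2)]},
    "s4": {"a": [("s1", 0.99), ("s5", 0.01)], "b": [("s3", 1.0)]},
    "s5": {"a": [("s5", 1.0)]},
}

def q_value(s, a, V, gamma=0.95):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (R[s_next] + gamma * V[s_next]) for s_next, p in T[s][a])
```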
Slide 9: Value Iteration
- Value function of policy π
- Bellman equation for the optimal value function
- Value iteration: recursively estimate the value function
- Greedy policy
(standard forms of all four are sketched below)
[Bellman 57; Howard 60; Sutton/Barto 98]
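In standard notation (following Sutton/Barto 98), the four bullets read:

```latex
% Value function of policy \pi
V^{\pi}(s) = E\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\Big|\, s_{0}=s,\ \pi\Big]

% Bellman equation for the optimal value function
V^{*}(s) = \max_{a}\Big[\, r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, V^{*}(s') \,\Big]

% Value iteration update (repeat until convergence)
\hat{V}(s) \leftarrow \max_{a}\Big[\, r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, \hat{V}(s') \,\Big]

% Greedy policy with respect to the estimated value function
\pi(s) = \operatorname*{argmax}_{a}\Big[\, r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, \hat{V}(s') \,\Big]
```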
Slide 10: Value Iteration for Motion Planning (assumes knowledge of the robot's location)
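A toy sketch of the computation behind this demo, assuming a 2D occupancy grid, deterministic 4-connected motion, and a known robot cell; `free`, `goal`, and `step_cost` are illustrative names, not the slide's:

```python
import numpy as np

def grid_value_iteration(free, goal, sweeps=500, step_cost=-1.0):
    """Value iteration on a 2D occupancy grid (True = free cell).

    Deterministic 4-connected moves; the goal cell is absorbing with value 0.
    Assumes the robot's own location is known, as the slide notes.
    """
    V = np.full(free.shape, -1e9)              # pessimistic initialization
    V[goal] = 0.0
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    rows, cols = free.shape
    for _ in range(sweeps):
        for (i, j), ok in np.ndenumerate(free):
            if not ok or (i, j) == goal:
                continue
            best = max(
                V[i + di, j + dj]
                for di, dj in moves
                if 0 <= i + di < rows and 0 <= j + dj < cols
            )
            V[i, j] = step_cost + best
    return V  # greedy ascent on V yields shortest collision-free paths
```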
Slide 11: Continuous Environments
From A. Moore and C.G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State Spaces," Machine Learning, 1995.
Slide 12: Approximate Cell Decomposition [Latombe 91]
From A. Moore and C.G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State Spaces," Machine Learning, 1995.
Slide 13: Parti-Game [Moore 96]
From A. Moore and C.G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State Spaces," Machine Learning, 1995.
Slide 14: Robot Planning Frameworks
Slide 15: Stochastic, Partially Observable
Slide 16: A Quiz
How large is the belief space for given actions, states, and sensors?
- 3: s1, s2, s3
- 3: s1, s2, s3
- 2^3 - 1: s1, s2, s3, s12, s13, s23, s123
- 2-dim continuous: p(S=s1), p(S=s2)
- ?-dim continuous
- ?-dim continuous
- aargh!
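The pattern behind the answers: a belief over n discrete states is a point on the (n - 1)-dimensional probability simplex, since the entries must sum to one, while a belief over a continuous state is a whole density, hence the "?-dim" (infinite-dimensional) rows. For n = 3:

```latex
b = \big(p(S{=}s_1),\; p(S{=}s_2),\; p(S{=}s_3)\big),
\qquad p(S{=}s_3) = 1 - p(S{=}s_1) - p(S{=}s_2)
```

So two coordinates suffice: 2-dim continuous.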
Slide 17: Introduction to POMDPs (1 of 3)
[Figure: value function plotted over the belief p(s1)]
[Sondik 72; Littman, Cassandra & Kaelbling 97]
Slide 18: Introduction to POMDPs (2 of 3)
[Figure: expected payoffs of actions a and b as linear functions of the belief p(s1), between the extremes s2 (p(s1) = 0) and s1 (p(s1) = 1); payoff values shown include +100, -100, -40, +80, and 0]
[Sondik 72; Littman, Cassandra & Kaelbling 97]
Slide 19: Introduction to POMDPs (3 of 3)
[Figure: the same payoff lines with a third action c added; additional values shown include +80 and 20]
[Sondik 72; Littman, Cassandra & Kaelbling 97]
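The build-up across these three slides shows the value function over the belief p(s1) as the upper envelope of one payoff line per action. A sketch with illustrative payoffs standing in for the figure's numbers, which are only partly recoverable:

```python
# Each action's expected payoff is linear in the belief p1 = p(s1):
#   E[r | action] = p1 * r(s1, action) + (1 - p1) * r(s2, action)
# Payoff values are illustrative, in the spirit of the figures.
payoffs = {
    "a": (+100, -100),  # (r(s1,a), r(s2,a))
    "b": (-40, +80),
    "c": (+80, 0),      # the third action added in part 3 of 3
}

def value(p1):
    """Optimal expected payoff: the upper envelope of the per-action lines."""
    return max(p1 * r1 + (1.0 - p1) * r2 for r1, r2 in payoffs.values())

def greedy_action(p1):
    """Action whose payoff line is on top at belief p1."""
    return max(payoffs, key=lambda a: p1 * payoffs[a][0] + (1 - p1) * payoffs[a][1])
```

The envelope is piecewise linear and convex in p(s1), which is the property Sondik's exact POMDP value iteration exploits [Sondik 72].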
Slide 20: Value Iteration in POMDPs
Substitute the belief b for the state s:
- Value function of policy π
- Bellman equation for the optimal value function
- Value iteration: recursively estimate the value function
- Greedy policy
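With that substitution, the Bellman equation from Slide 9 takes its standard belief-space form:

```latex
V^{*}(b) = \max_{a}\Big[\, r(b,a) + \gamma \int p(b' \mid b, a)\, V^{*}(b')\, db' \,\Big]
```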
Slide 21: Missing Terms in Belief Space
- Expected reward
- Next state density
Bayes filters! (Dirac distribution)
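In standard form, the two terms and the role the slide assigns to Bayes filters and the Dirac distribution:

```latex
% Expected reward under belief b
r(b,a) = \sum_{s} b(s)\, r(s,a)

% Next-belief density: a mixture of Dirac point masses, one per observation o,
% each centered on the corresponding Bayes-filter posterior
p(b' \mid b, a) = \sum_{o} p(o \mid b, a)\;
  \delta\!\big(b' - \mathrm{BayesFilter}(b, a, o)\big)
```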
Slide 22: Value Iteration in Belief Space
Slide 23: Why Is This So Complex?
- State space planning (no state uncertainty)
- Belief space planning (full state uncertainty)
Slide 24: Augmented MDPs
[Figure: augmented state = conventional state space plus one uncertainty dimension (entropy)]
[Roy et al. 98/99]
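A sketch of the compression the slide describes, assuming a discrete belief; the function and variable names are illustrative:

```python
import numpy as np

def augment(belief):
    """Collapse a full belief to the Augmented-MDP state: the conventional
    state estimate plus a single uncertainty statistic, the belief's entropy.

    `belief` is a normalized array over discrete states.
    """
    p = np.asarray(belief, dtype=float)
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return int(np.argmax(p)), entropy  # (most likely state, entropy)
```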
Slide 25: Path Planning with Augmented MDPs
[Figure: trajectories of a conventional planner vs. the probabilistic planner]
[Roy et al. 98/99]
Slide 26: Robot Planning Frameworks