1
Hierarchical POMDP Solutions
  • Georgios Theocharous

2
Sequential Decision Making Under Uncertainty
What is the optimal policy?
3
Manufacturing Processes (Mahadevan, Theocharous, FLAIRS 98)
  • Reward
  • Reward for consuming
  • Penalize for filling buffers
  • Penalize for machine breakdown
  • Actions
  • Produce
  • Maintenance
  • What is the optimal policy?

4
Foveated Active Vision (Minut)
  • States
  • Objects
  • Observations
  • Local features
  • Reward
  • Reward for finding object
  • Actions
  • Where to saccade next
  • What features to use
  • What is the optimal policy?

5
Many More Partially Observable Problems
  • Assistive technologies
  • Web searching, preference elicitation
  • Sophisticated Computing
  • Distributed file access, Network trouble-shooting
  • Industrial
  • Machine maintenance, manufacturing processes
  • Social
  • Education, medical diagnosis, health care
    policymaking
  • Corporate
  • Marketing, corporate policy

6
Overview
  • Learning models of partially observable problems
    is far from a solved problem
  • Computing policies for partially observable
    domains is intractable
  • We propose hierarchical solutions
  • Learn models using less space and time
  • Compute robust policies that cannot be computed
    by previous approaches

7
How? Spatial and Temporal Abstractions Reduce Uncertainty
[Figure: spatial abstraction (MIT) and temporal abstraction]
8
Outline
  • Sequential decision-making under uncertainty
  • A Hierarchical POMDP model for robot navigation
  • Heuristic macro-action selection in H-POMDPs
  • Near Optimal macro-action selection for arbitrary
    POMDPs
  • Representing H-POMDPs as DBNs
  • Current and Future directions

9
A Real System Robot Navigation
10
Belief States (Probability Distributions over States)
True State
Belief State
11
Belief States (Probability Distributions over States)
True State
Belief State
12
Belief States (Probability Distributions over States)
True State
Belief State
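For reference, these belief states are maintained with the standard Bayes-filter update, where T is the transition model, O is the observation model, a is the action taken, and z is the observation received:

$$ b'(s') \;=\; \frac{O(z \mid s', a)\,\sum_{s} T(s' \mid s, a)\, b(s)}{\Pr(z \mid b, a)} $$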
13
Learning POMDPs
  • Given action and observation sequences (A's and
    Z's), estimate the transition model T and the
    observation model O
  • Estimate the probability distribution over
    hidden states
  • Count the expected number of times each state
    was visited
  • Update T and O, and repeat (see the sketch below)
  • This is an Expectation Maximization (EM) algorithm
  • An iterative procedure for maximum likelihood
    parameter estimation over hidden state variables
  • Converges to a local maximum

[Figure: graphical model with hidden states S1, S2, S3, observations Z1, Z2, Z3, and actions A1, A2; transition parameters T(S1=i, A1=a, S2=j) and observation parameters O(O2=z, S2=i, A1=a)]
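A minimal sketch of one EM (Baum-Welch style) iteration for estimating T and O from a single action/observation sequence; the function and variable names are illustrative, and observations here depend only on the state (the original model also conditions them on the previous action):

```python
import numpy as np

def em_step(T, O, actions, observations):
    """One EM iteration for a POMDP model with known actions.

    T[a, i, j] = P(s'=j | s=i, action=a)
    O[i, z]    = P(z | s=i)
    actions[t]      action taken between steps t and t+1   (length N-1)
    observations[t] observation received at step t         (length N)
    """
    nS = T.shape[1]
    N = len(observations)

    # Forward pass: alpha[t, i] proportional to P(z_1..z_t, s_t = i)
    alpha = np.zeros((N, nS))
    alpha[0] = O[:, observations[0]] / nS
    alpha[0] /= alpha[0].sum()
    for t in range(1, N):
        alpha[t] = (alpha[t - 1] @ T[actions[t - 1]]) * O[:, observations[t]]
        alpha[t] /= alpha[t].sum()

    # Backward pass: beta[t, i] proportional to P(z_{t+1}..z_N | s_t = i)
    beta = np.ones((N, nS))
    for t in range(N - 2, -1, -1):
        beta[t] = T[actions[t]] @ (O[:, observations[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # E-step: expected state occupancies and transition counts
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)       # P(s_t = i | data)
    T_counts = np.zeros_like(T)
    O_counts = np.zeros_like(O)
    for t in range(N - 1):
        a = actions[t]
        xi = alpha[t][:, None] * T[a] * (O[:, observations[t + 1]] * beta[t + 1])[None, :]
        T_counts[a] += xi / xi.sum()
    for t in range(N):
        O_counts[:, observations[t]] += gamma[t]

    # M-step: renormalize expected counts into probabilities
    T_new = (T_counts + 1e-6) / (T_counts + 1e-6).sum(axis=2, keepdims=True)
    O_new = (O_counts + 1e-6) / (O_counts + 1e-6).sum(axis=1, keepdims=True)
    return T_new, O_new
```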
14
Planning in POMDPs
  • Belief states constitute a sufficient statistic
    for making decisions (the Markov property holds;
    Astrom 1965)
  • Bellman equation (see below)

Since the belief space is continuous, the problem is
computationally intractable: PSPACE-hard for the
finite horizon and undecidable for the infinite
horizon.
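The Bellman equation over belief states takes the standard form

$$ V(b) \;=\; \max_{a} \Big[ \sum_{s} b(s)\,R(s,a) \;+\; \gamma \sum_{z} \Pr(z \mid b, a)\, V(b^{a}_{z}) \Big] $$

where $b^{a}_{z}$ is the belief obtained from $b$ after taking action $a$ and observing $z$.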
15
Our Solution: Spatial and Temporal Abstraction
  • Learning
  • A hierarchical Baum-Welch algorithm, which is
    derived from the Baum-Welch algorithm for
    training HHMMs (with Rohanimanesh and Mahadevan,
    ICRA 2001)
  • Structure learning from weak priors (with
    Mahadevan IROS 2002)
  • Inference can be done in linear time by
    representing H-POMDPs as Dynamic Bayesian
    Networks (DBNs) (with Murphy and Kaelbling, ICRA
    2004)
  • Planning
  • Heuristic macro-action selection (with Mahadevan,
    ICRA 2002)
  • Near optimal macro-action selection (with
    Kaelbling, NIPS 2003)
  • Structure Learning and Planning combined
  • Dynamic POMDP abstractions (with Mannor and
    Kaelbling)

16
Outline
  • Sequential decision-making under uncertainty
  • A Hierarchical POMDP model for robot navigation
  • Heuristic macro-action selection in H-POMDPs
  • Near Optimal macro-action selection for arbitrary
    POMDPs
  • Representing H-POMDPs as DBNs
  • Current and Future directions

17
Hierarchical POMDPs
18
Hierarchical POMDPs
[Figure: abstract states and actions; the hierarchical structure follows (Fine, Singer, Tishby, MLJ 98)]
19
Experimental Environments
600 states
1200 states
20
The Robot Navigation Domain
  • The robot Pavlov in the real MSU environment
  • The Nomad 200 simulator

21
Learning Feature Detectors (Mahadevan, Theocharous, Khaleeli, MLJ 98)
  • 736 hand-labeled grids
  • 8-fold cross-validation
  • Classification error (μ = 7.33, σ = 3.7)

22
Learning and Planning in H-POMDPs for Robot
Navigation
[Flowchart: topological map, compilation, initial H-POMDP, EM learning / hand coding, trained H-POMDP, planning, navigation system, execution, environment]
23
Outline
  • Sequential decision-making under uncertainty
  • A Hierarchical POMDP model for robot navigation
  • Heuristic macro-action selection in H-POMDPs
  • Near Optimal macro-action selection for arbitrary
    POMDPs
  • Representing H-POMDPs as DBNs
  • Current and Future directions

24
Planning in H-POMDPs (Theocharous, Mahadevan,
ICRA 2002)
Abstract actions
  • Hierarchical MDP solutions (using the options
    framework of Sutton, Precup, Singh, AIJ)
  • Heuristic POMDP solutions (sketched below)
  • MLS (most likely state)

Primitive actions
[Figure: an example belief b(s) over a few states, the per-state values, and the resulting macro-action values v(go-west) and v(go-east), together with p(b) for go-west]
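As an illustration of the heuristic selection the figure hints at, here is a minimal sketch of two standard POMDP heuristics applied to macro-actions: MLS (act for the most likely state) and a QMDP-style belief-weighted score. The value table and the numbers are illustrative, not taken from the original system:

```python
import numpy as np

def mls_macro(belief, v):
    """MLS: pick the macro that is best for the most likely state."""
    s_star = int(np.argmax(belief))
    return max(v, key=lambda macro: v[macro][s_star])

def qmdp_macro(belief, v):
    """QMDP-style: weight each macro's per-state value by the belief."""
    return max(v, key=lambda macro: float(np.dot(belief, v[macro])))

# Toy example: values of two macro-actions over 5 states
v = {"go-west": np.array([40., 10., 5., 100., 20.]),
     "go-east": np.array([20., 5., 60., 10., 80.])}
belief = np.array([0.35, 0.3, 0.2, 0.1, 0.05])
print(mls_macro(belief, v), qmdp_macro(belief, v))   # both pick go-west here
```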
25
Plan Execution
26
Plan Execution
27
Plan Execution
28
Plan Execution
29
Intuition
  • The probability distribution at the higher level
    evolves more slowly
  • The agent does not have to decide on the best
    macro-action at every time step
  • Long-term actions help the robot localize

30
F-MLS Demo
31
H-MLS Demo
32
Hierarchical is More Successful
[Chart: success rate per environment for the flat and hierarchical versions of the MLS and QMDP algorithms, with unknown initial position]
33
Hierarchical Takes Less Time to Reach Goal
[Chart: average steps to goal per environment for the flat and hierarchical versions of the MLS and QMDP algorithms, with unknown initial position]
34
Hierarchical Plans are Computed Faster
[Chart: planning time per environment and algorithm, for Goal 1 and Goal 2]
35
Outline
  • Sequential decision-making under uncertainty
  • A Hierarchical POMDP model for robot navigation
  • Heuristic macro-action selection in H-POMDPs
  • Near Optimal macro-action selection for arbitrary
    POMDPs
  • Representing H-POMDPs as DBNs
  • Current and Future directions

36
Near Optimal Macro-action Selection (Theocharous,
Kaelbling, NIPS 2003)
  • Agents usually don't require the entire belief
    space
  • Macro-actions can reduce the reachable belief
    space even further
  • Tested in large-scale robot navigation
  • Only a small part of the belief space is required
  • Learn approximate POMDP policies fast
  • High success rate
  • Better policies
  • Performs information gathering

37
Dynamic Grids
38
The Algorithm
[Figure: the true trajectory and the true belief state b; the nearest grid point g to b; simulation trajectories from g for each macro-action (estimating the macro's value at g); the value of b is interpolated from its neighboring grid points; the resulting next true belief state]
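A minimal sketch of the grid-based approximation the figure describes, assuming a fixed set of grid beliefs, a generative simulator for macro-actions, and stored values at the grid points. The function names and the interpolation scheme (inverse-distance over the k nearest neighbours) are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def interpolate_value(b, grid_points, grid_values, k=3):
    """Approximate V(b) from the k nearest grid beliefs (inverse-distance weights)."""
    dists = np.linalg.norm(grid_points - b, axis=1)
    idx = np.argsort(dists)[:k]
    w = 1.0 / (dists[idx] + 1e-9)
    return float(np.dot(w, grid_values[idx]) / w.sum())

def macro_value(g, macro, simulate, grid_points, grid_values,
                n_rollouts=20, gamma=0.95):
    """Estimate the value of running `macro` from grid belief g by Monte Carlo
    simulation; each rollout returns (discounted_reward, steps, next_belief)."""
    total = 0.0
    for _ in range(n_rollouts):
        reward, steps, b_next = simulate(g, macro)
        total += reward + gamma ** steps * interpolate_value(
            b_next, grid_points, grid_values)
    return total / n_rollouts

def best_macro(b, macros, simulate, grid_points, grid_values):
    """Pick the macro with the highest estimated value at the grid point nearest b."""
    g = grid_points[np.argmin(np.linalg.norm(grid_points - b, axis=1))]
    return max(macros, key=lambda m: macro_value(m if False else m, m, simulate,
                                                 grid_points, grid_values)) \
        if False else \
        max(macros, key=lambda m: macro_value(g, m, simulate,
                                              grid_points, grid_values))
```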
39
Experimental Setup
40
Fewer States Needed
41
Fewer Steps to Goal
42
More Successful
43
Information Gathering

44
Information Gathering (scaling up)
45
Dynamic POMDP Abstractions (Theocharous, Mannor, Kaelbling)
[Figure: a start-to-goal trajectory with entropy thresholds and localization macros]
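A minimal sketch of the entropy-threshold idea these labels suggest: when the entropy of the belief exceeds a threshold, switch to a localization macro, otherwise keep heading toward the goal. The threshold value and the function names are illustrative assumptions:

```python
import numpy as np

def belief_entropy(b):
    """Shannon entropy of a belief state (a discrete distribution over states)."""
    b = np.asarray(b, dtype=float)
    nz = b[b > 0]
    return float(-(nz * np.log(nz)).sum())

def choose_macro(b, goal_macro, localization_macro, entropy_threshold=1.5):
    """If the agent is too uncertain about its state, localize first;
    otherwise keep executing the macro that heads toward the goal."""
    if belief_entropy(b) > entropy_threshold:
        return localization_macro
    return goal_macro
```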
46
Fewer Steps to Goal
47
Outline
  • Sequential decision-making under uncertainty
  • A Hierarchical POMDP model for robot navigation
  • Heuristic macro-action selection in H-POMDPs
  • Near Optimal macro-action selection for arbitrary
    POMDPs
  • Representing H-POMDPs as DBNs
  • Current and Future directions

48
Dynamic Bayesian Networks
[Figure: a flat state POMDP vs. a factored DBN POMDP, comparing their number of parameters]
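As an illustrative count (not the slide's own numbers): a flat POMDP transition table has $|A|\,|S|^2$ parameters, whereas a DBN that factors the state into $n$ variables, each with domain size $d$ and at most $p$ parents in the previous slice, needs on the order of $|A| \cdot n \cdot d^{\,p+1}$ parameters, far fewer than $|A|\,|S|^2 = |A|\,d^{\,2n}$ when $p \ll n$.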
49
DBN Inference
50
Representing H-POMDPs as Dynamic Bayesian Networks (Theocharous, Murphy, Kaelbling, ICRA 2004)
[Figure: the state H-POMDP and its factored DBN representation]
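A minimal sketch of what a two-slice DBN for a two-level H-POMDP might look like, written as parent sets per node. The variable names (abstract state, concrete state, exit flag) are illustrative assumptions about the factoring, not the paper's exact network:

```python
# Parents of each time-t node in an illustrative 2-slice DBN for a
# two-level H-POMDP.  A_{t-1} is the previous action; the exit flag is a
# binary variable saying whether the concrete-level sub-POMDP has finished.
dbn_parents = {
    "abstract_state_t": ["abstract_state_{t-1}", "exit_flag_{t-1}", "A_{t-1}"],
    "concrete_state_t": ["concrete_state_{t-1}", "abstract_state_t",
                         "exit_flag_{t-1}", "A_{t-1}"],
    "exit_flag_t":      ["concrete_state_t", "abstract_state_t"],
    "observation_t":    ["concrete_state_t"],
}
```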
55
Complexity of Inference
[Chart: inference cost for the state POMDP, the DBN H-POMDP, the state H-POMDP, and the factored DBN H-POMDP]
56
Hierarchical Localizes Better
[Chart: localization performance before and after training for the original model, the state POMDP, the DBN H-POMDP, the factored DBN H-POMDP, and the factored DBN tied H-POMDP]
57
Hierarchical Fits Data Better
[Chart: model fit to the data before and after training for the original model, the state POMDP, the DBN H-POMDP, the factored DBN H-POMDP, and the factored DBN tied H-POMDP]
58
Directions for Future Research
  • In the future we will explore structure learning
  • Bayesian model selection approaches
  • Methods for learning compositional hierarchies
    (recurrent nets, hierarchical sparse n-grams)
  • Natural language acquisition methods
  • Identifying isomorphic processes
  • Online learning
  • Interactive Learning
  • Application to real world problems

59
Major Contributions
  • The H-POMDP model
  • Requires less training data
  • Provides better state estimation
  • Fast planning
  • Macro-actions in POMDPs reduce uncertainty
  • Information gathering
  • Application of the algorithms to large-scale
    robot navigation
  • Map Learning
  • Planning and execution