Title: Hierarchical Methods for Planning under Uncertainty
1 Hierarchical Methods for Planning under Uncertainty
- Thesis Proposal
- Joelle Pineau
- Thesis Committee
- Sebastian Thrun, Chair
- Matthew Mason
- Andrew Moore
- Craig Boutilier, U. of Toronto
2 Integrating robots in living environments
The robot's role:
- Social interaction
- Mobile manipulation
- Intelligent reminding
- Remote-operation
- Data collection / monitoring
3 A broad perspective
[Diagram: the user, the world, and the robot; the robot receives observations of the underlying state, maintains a belief state, and issues actions.]
GOAL: Selecting appropriate actions
4 Why is this a difficult problem?
UNCERTAINTY
- Cause 1: Non-deterministic effects of actions
- Cause 2: Partial and noisy sensor information
- Cause 3: Inaccurate model of the world and the user
5 Why is this a difficult problem?
UNCERTAINTY
- Cause 1: Non-deterministic effects of actions
- Cause 2: Partial and noisy sensor information
- Cause 3: Inaccurate model of the world and the user
A solution: Partially Observable Markov Decision Processes (POMDPs)
6 The truth about POMDPs
- Bad news
- Finding an optimal POMDP action selection policy
is computationally intractable for complex
problems.
7 The truth about POMDPs
- Bad news
- Finding an optimal POMDP action selection policy is computationally intractable for complex problems.
- Good news
- Many real-world decision-making problems exhibit structure inherent to the problem domain.
- By leveraging structure in the problem domain, I propose an algorithm that makes POMDPs tractable, even for large domains.
8 How is it done?
- Use a divide-and-conquer approach
- We decompose a large monolithic problem into a collection of loosely-related smaller problems.
9 Thesis statement
Decision-making under uncertainty can be made
tractable for complex problems by exploiting
hierarchical structure in the problem domain.
10 Outline
- Problem motivation
- Partially observable Markov decision processes
- The hierarchical POMDP algorithm
- Proposed research
11 POMDPs within the family of Markov models
[Figure: POMDPs within the family of Markov models, alongside HMMs and MDPs.]
12 What are POMDPs?
Components:
- Set of states s ∈ S
- Set of actions a ∈ A
- Set of observations o ∈ O
[Diagram: a three-state example with states S1, S2, S3, actions a1, a2, transition probabilities, and observation probabilities Pr(o1), Pr(o2) attached to each state.]
POMDP parameters:
- Initial belief: b0(s) = Pr(s0 = s)
- Observation probabilities: O(s,a,o) = Pr(o | s,a)
- Transition probabilities: T(s,a,s') = Pr(s' | s,a)
- Rewards: R(s,a)
13 A POMDP example: The tiger problem
Reward function:
- R(a=listen) = -1
- R(a=open-right, s=tiger-left) = +10
- R(a=open-left, s=tiger-left) = -100
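As a concrete illustration, here is a minimal sketch of the tiger problem written as explicit arrays in Python. The rewards follow the slide above; the 0.85 accuracy of the listen observation and the door-opening reset are the standard values from the tiger problem literature and are assumed here rather than taken from this slide.

```python
import numpy as np

# States, actions, observations for the tiger problem.
S = ["tiger-left", "tiger-right"]
A = ["listen", "open-left", "open-right"]
O = ["growl-left", "growl-right"]

# T[a][s, s']: listening leaves the state unchanged; opening a door
# resets the problem (tiger placed behind either door with prob 0.5).
T = {
    "listen":     np.array([[1.0, 0.0], [0.0, 1.0]]),
    "open-left":  np.array([[0.5, 0.5], [0.5, 0.5]]),
    "open-right": np.array([[0.5, 0.5], [0.5, 0.5]]),
}

# Z[a][s', o]: listening yields the correct growl with prob 0.85 (assumed,
# standard value); after opening a door the observation is uninformative.
Z = {
    "listen":     np.array([[0.85, 0.15], [0.15, 0.85]]),
    "open-left":  np.array([[0.5, 0.5], [0.5, 0.5]]),
    "open-right": np.array([[0.5, 0.5], [0.5, 0.5]]),
}

# R[a][s]: rewards from the slide (listening costs 1, opening the wrong
# door costs 100, opening the other door earns 10).
R = {
    "listen":     np.array([-1.0, -1.0]),
    "open-left":  np.array([-100.0, 10.0]),
    "open-right": np.array([10.0, -100.0]),
}

b0 = np.array([0.5, 0.5])  # initial belief: tiger equally likely on either side
```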
14 What can we do with POMDPs?
- 1) State tracking
- After an action, what is the state of the world, st? Not so hard.
- 2) Computing a policy
- Which action, aj, should the controller apply next? Very hard!
[Diagram: the world moves through states st-1, st, ... under actions at-1, ...; the robot's control layer receives observations ot and maintains beliefs bt-1, bt, ...]
15 The tiger problem: State tracking
[Figure: initial belief b0 over the belief vector, S1 = tiger-left, S2 = tiger-right.]
16 The tiger problem: State tracking
[Figure: belief b0 over S1 = tiger-left, S2 = tiger-right, about to be updated with action = listen and observation = growl-left.]
17 The tiger problem: State tracking
[Figure: updated belief b1 alongside b0 over S1 = tiger-left, S2 = tiger-right, after action = listen and observation = growl-left.]
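The belief update shown on these slides is a Bayes filter over the hidden state. Below is a minimal sketch, reusing the T, Z, and b0 arrays defined in the earlier tiger-model sketch; with the assumed 0.85 listening accuracy, hearing growl-left after listening shifts the belief to roughly (0.85, 0.15).

```python
def belief_update(b, a, o, T, Z, obs_names):
    """Bayes filter step: b'(s') ∝ Z(s', a, o) * sum_s T(s, a, s') b(s)."""
    o_idx = obs_names.index(o)
    predicted = T[a].T @ b                     # prediction: sum_s T(s,a,s') b(s)
    unnormalized = Z[a][:, o_idx] * predicted  # correction by observation likelihood
    return unnormalized / unnormalized.sum()

# Reproduces the transition shown on these slides: listening and hearing
# growl-left shifts the uniform initial belief toward tiger-left.
b1 = belief_update(b0, "listen", "growl-left", T, Z, O)
print(b1)   # ~[0.85, 0.15] given the assumed 0.85 listening accuracy
```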
18 Policy Optimization
- Which action, aj, should the controller apply next?
- In MDPs:
- Policy is a mapping from state to action, π: si → aj
- In POMDPs:
- Policy is a mapping from belief to action, π: b → aj
- Recursively calculate the expected long-term reward for each state/belief
- Find the action that maximizes the expected reward
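To make "recursively calculate the expected reward" concrete, here is a sketch of a single Bellman backup on a belief state, again reusing the tiger-model arrays from the sketches above. Exact POMDP solvers represent the value function with sets of alpha-vectors rather than an arbitrary callable V as here; this sketch only illustrates the backup such methods repeat.

```python
def q_value(b, a, T, Z, R, gamma, V):
    """One Bellman backup at belief b: expected immediate reward plus the
    discounted expected value of each possible updated belief."""
    q = float(R[a] @ b)
    predicted = T[a].T @ b                     # Pr(s' | b, a)
    for o_idx in range(Z[a].shape[1]):
        unnormalized = Z[a][:, o_idx] * predicted
        p_o = unnormalized.sum()               # Pr(o | b, a)
        if p_o > 0.0:
            q += gamma * p_o * V(unnormalized / p_o)
    return q

def greedy_action(b, T, Z, R, gamma=0.95, V=lambda b: 0.0):
    """Choose the action maximizing the backup (myopic when V is zero)."""
    return max(R, key=lambda a: q_value(b, a, T, Z, R, gamma, V))

print(greedy_action(b0, T, Z, R))   # -> 'listen' at the uniform initial belief
```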
19 The tiger problem: Optimal policy
[Figure: optimal policy over the belief vector (S1 = tiger-left, S2 = tiger-right), with regions for open-right, listen, and open-left.]
20 Complexity of policy optimization
- Finite-horizon POMDPs are in the worst case doubly exponential
- Infinite-horizon undiscounted stochastic POMDPs are EXPTIME-hard, and may not be decidable
21 The essence of the problem
- How can we find good policies for complex POMDPs?
- Is there a principled way to provide near-optimal policies in reasonable time?
22 Outline
- Problem motivation
- Partially observable Markov decision processes
- The hierarchical POMDP algorithm
- Proposed research
23 A hierarchical approach to POMDP planning
- Key idea: Exploit hierarchical structure in the problem domain to break a problem into many related POMDPs.
- What type of structure?
- Action set partitioning
[Figure: an action set partitioned into subtasks, each invoked through an abstract action.]
24 Assumptions
- Each POMDP controller has a subset of A0.
- Each POMDP controller has the full state set S0 and observation set O0.
- Each controller includes discriminative reward information.
- We are given the action set partitioning graph.
- We are given a full POMDP model of the problem: {S0, A0, O0, M0}.
25 The tiger problem: An action hierarchy
[Figure: action hierarchy — act at the root, with children open-left and investigate; investigate has children listen and open-right.]
P_investigate = {S0, A_investigate, O0, M_investigate}
A_investigate = {listen, open-right}
26 Optimizing the investigate controller
[Figure: locally optimal policy of the investigate controller over the belief vector (S1 = tiger-left, S2 = tiger-right), with regions for open-right and listen.]
27 The tiger problem: An action hierarchy
P_act = {S0, A_act, O0, M_act}
A_act = {open-left, investigate}
But... R(s, a=investigate) is not defined!
[Figure: the same action hierarchy, highlighting the abstract action investigate.]
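One convenient way to hold this structure is a mapping from each subtask to the action set of its controller, matching the partition on slides 25 and 27; the small traversal helper is a hypothetical illustration.

```python
# The tiger action hierarchy from slides 25 and 27: each subtask maps to
# the actions of its controller; names not listed are primitive actions.
action_hierarchy = {
    "act":         ["open-left", "investigate"],   # A_act
    "investigate": ["listen", "open-right"],       # A_investigate
}

def primitive_actions(node, hierarchy):
    """Recursively flatten a subtask into the primitive actions beneath it."""
    if node not in hierarchy:            # a primitive action is its own leaf
        return {node}
    leaves = set()
    for child in hierarchy[node]:
        leaves |= primitive_actions(child, hierarchy)
    return leaves

print(primitive_actions("act", action_hierarchy))
# -> {'open-left', 'listen', 'open-right'}
```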
28 Modeling abstract actions
Insight: Use the local policy of the corresponding low-level controller.
General form: R(si, ak) = R(si, Policy(controller_k, si))
Example: R(s=tiger-left, ak=investigate) = R(s=tiger-left, Policy(investigate, tiger-left)), where Policy(investigate, tiger-left) = open-right

              open-right   listen   open-left
tiger-left        10         -1       -100
tiger-right      -100        -1         10
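A minimal sketch of this construction, assuming per-state (point-belief) sub-controller policies. The tiger-left entry of the investigate policy comes from the slide's example; the tiger-right entry (listen) is an assumption, since open-left is not in A_investigate.

```python
# Per-state rewards of the primitive actions, from the table above.
R_primitive = {
    "open-right": {"tiger-left": 10.0,   "tiger-right": -100.0},
    "listen":     {"tiger-left": -1.0,   "tiger-right": -1.0},
    "open-left":  {"tiger-left": -100.0, "tiger-right": 10.0},
}

# Local policy of the investigate controller at each (point-belief) state.
# The tiger-left entry is from the slide; the tiger-right entry is assumed.
investigate_policy = {"tiger-left": "open-right", "tiger-right": "listen"}

def abstract_reward(state, sub_policy, R_primitive):
    """R(s, a_k) = R(s, Policy(controller_k, s)): the abstract action is
    rewarded as the primitive action its controller would take in s."""
    return R_primitive[sub_policy[state]][state]

print(abstract_reward("tiger-left", investigate_policy, R_primitive))
# -> 10.0, i.e. R(tiger-left, investigate) = R(tiger-left, open-right)
```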
29 Optimizing the act controller
[Figure: locally optimal policy of the act controller over the belief vector (S1 = tiger-left, S2 = tiger-right), with regions for investigate and open-left.]
30 The complete hierarchical policy
[Figure: hierarchical policy over the belief vector (S1 = tiger-left, S2 = tiger-right), with regions for open-left, listen, and open-right.]
31 The complete hierarchical policy
[Figure: the hierarchical policy compared against the optimal policy over the belief vector (S1 = tiger-left, S2 = tiger-right).]
32 Results for larger simulation domains
33 Related work on hierarchical methods
- Hierarchical HMMs
  - Fine et al., 1998.
- Hierarchical MDPs
  - Dayan & Hinton, 1993; Dietterich, 1998; McGovern et al., 1998; Parr & Russell, 1998; Singh, 1992.
- Loosely-coupled MDPs
  - Boutilier et al., 1997; Dean & Lin, 1995; Meuleau et al., 1998; Singh & Cohn, 1998; Wang & Mahadevan, 1999.
- Factored-state POMDPs
  - Boutilier et al., 1999; Boutilier & Poole, 1996; Hansen & Feng, 2000.
- Hierarchical POMDPs
  - Castanon, 1997; Hernandez-Gardiol & Mahadevan, 2001; Theocharous et al., 2001; Wiering & Schmidhuber, 1997.
34 Outline
- Problem motivation
- Partially observable Markov decision processes
- The hierarchical POMDP algorithm
- Proposed research
35 Proposed research
- 1) Algorithmic design
- 2) Algorithmic analysis
- 3) Model learning
- 4) System development and application
36 Research block 1: Algorithmic design
- Goal 1.1: Developing/implementing the hierarchical POMDP algorithm.
- Goal 1.2: Extending H-POMDP to factored state representations.
- Goal 1.3: Using state/observation abstraction.
- Goal 1.4: Planning for controllers with no local reward information.
37 Goal 1.3: State/observation abstraction
- Assumption 2
- Each POMDP controller has the full state set S0 and observation set O0.
- Can we reduce the number of states/observations, |S| and |O|?
38 Goal 1.3: State/observation abstraction
- Assumption 2
- Each POMDP controller has the full state set S0 and observation set O0.
- Can we reduce the number of states/observations, |S| and |O|?
- Yes! Each controller only needs a subset of the state/observation features.
- What is the computational speed-up?
39 Goal 1.4: Local controller reward information
- Assumption 3
- Each controller includes some amount of discriminative reward information.
- Can we relax this assumption?
40 Goal 1.4: Local controller reward information
- Assumption 3
- Each controller includes some amount of discriminative reward information.
- Can we relax this assumption?
- Possibly. Use reward shaping to select a policy-invariant reward function.
- What is the benefit?
- H-POMDP could solve problems with sparse reward functions.
41 Research block 2: Algorithmic analysis
- Goal 2.1: Evaluating the performance of the H-POMDP algorithm.
- Goal 2.2: Quantifying the loss due to the hierarchy.
- Goal 2.3: Comparing different possible decompositions of a problem.
42 Goal 2.1: Performance evaluation
- How does the hierarchical POMDP algorithm compare to:
- Exact value function methods
  - Sondik, 1971; Monahan, 1982; Littman, 1996; Cassandra et al., 1997.
- Policy search methods
  - Hansen, 1998; Kearns et al., 1999; Ng & Jordan, 2000; Baxter & Bartlett, 2000.
- Value approximation methods
  - Parr & Russell, 1995; Thrun, 2000.
- Belief approximation methods
  - Nourbakhsh, 1995; Koenig & Simmons, 1996; Hauskrecht, 2000; Roy & Thrun, 2000.
- Memory-based methods
  - McCallum, 1996.
- Consider problems from the POMDP literature and the dialogue management domain.
43 Goal 2.2: Quantifying the loss
- The hierarchical POMDP planning algorithm provides an approximately-optimal policy.
- How near-optimal is the policy?
- Subject to some (very restrictive) conditions, the value function of the top-level controller is an upper bound on the value of the approximation: Vtop(b) ≥ Vactual(b)
- Can we loosen the restrictions? Tighten the bound?
- Find a lower bound?
44 Goal 2.3: Comparing different decompositions
- Assumption 4
- We are given an action set partitioning graph.
- What makes a good hierarchical action decomposition?
- Comparing decompositions is the first step towards automatic decomposition.
[Figure: two alternative action decompositions over the actions manufacture, examine, inspect, and replace.]
45 Research block 3: Model learning
- Goal 3.1: Automatically generating good action hierarchies.
- Assumption 4: We are given an action set partitioning graph.
- Can we automatically generate a good hierarchical decomposition?
- Maybe. It is being done for hierarchical MDPs.
- Goal 3.2: Including parameter learning.
- Assumption 5: We are given a full POMDP model of the problem.
- Can we introduce parameter learning?
- Yes! Maximum-likelihood parameter optimization (Baum-Welch) can be used for POMDPs.
46 Research block 4: System development and application
- Goal 4.1: Building an extensive dialogue manager
[Diagram: dialogue manager architecture — the Dialogue Manager mediates between the user (touchscreen input and messages, speech utterances) and three modules: a Teleoperation module (remote-control commands, facemail operations), a Reminding module (reminder messages, status information), and a Robot module (sensor readings, motion commands).]
47 An implemented scenario
Problem size: |S| = 288, |A| = 14, |O| = 15
State features: RobotLocation, UserLocation, UserStatus, ReminderGoal, UserMotionGoal, UserSpeechGoal
[Map: environment including the patient room, the robot home, and the physiotherapy area.]
Test subjects: 3 elderly residents in an assisted living facility
48 Contributions
- Algorithmic contribution: A novel POMDP algorithm based on hierarchical structure.
- Enables use of POMDPs for much larger problems.
- Application contribution: Application of POMDPs to dialogue management is novel.
- Allows design of robust robot behavioural managers.
49 Research schedule
- 1) Algorithmic design/implementation: fall 01
- 2) Algorithmic analysis: spring/summer 02
- 3) Model learning: spring/summer/fall 02
- 4) System development and application: ongoing
- 5) Thesis writing: fall 02 / spring 03
50 Questions?
51 A simulated robot navigation example
[Figure: simulated navigation environment.]
Domain size: |S| = 11, |A| = 6, |O| = 6
52 A dialogue management example
[Figure: action hierarchy rooted at Act.]
- Act: SayTime, Greet, CheckHealth, CheckWeather, Move, DoMeds, Phone
- Greet: GreetGeneral, GreetMorning, GreetNight, RespondThanks
- CheckHealth: AskHealth, OfferHelp
- CheckWeather: AskWeatherTime, SayCurrent, SayToday, SayTomorrow
- Move: AskGoWhere, GoToRoom, GoToKitchen, GoToFollow, VerifyRoom, VerifyKitchen, VerifyFollow
- DoMeds: StartMeds, NextMeds, ForceMeds, QuitMeds
- Phone: AskCallWho, Call911, CallNurse, CallRelative, Verify911, VerifyNurse, VerifyRelative
Domain size: |S| = 20, |A| = 30, |O| = 27
53 Action hierarchy for the implemented scenario
[Figure: action hierarchy — Act at the root, with subtasks Remind, Assist, Rest, Move, Contact, and Inform.]
54 Sondik's parts manufacturing problem
[Figure: two example action decompositions (Decomposition 1 and Decomposition 2) over the actions manufacture, examine, inspect, and replace; 5 more decompositions were also considered.]
55 Manufacturing task results
56 Using state/observation abstraction
Controller: CheckHealth
- Action set: AskHealth, OfferHelp
- State set: ReminderGoal = {none, medsX}; CommunicationGoal = {none, personX}; UserHealth = {good, poor, emergency}
Controller: Phone
- Action set: AskCallWho, CallHelp, CallNurse, CallRelative, VerifyHelp, VerifyNurse, VerifyRelative
- State set: CommunicationGoal = {none, nurse, 911, relative}
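A minimal sketch of what this abstraction buys: with a factored state, each controller can track a belief only over the features it needs, as the Phone controller does above. The feature values and probabilities below are purely illustrative and not taken from the implemented system.

```python
# Purely illustrative factored belief over two hypothetical features.
FEATURES = ["ReminderGoal", "CommunicationGoal"]
belief = {("none", "none"): 0.4, ("none", "nurse"): 0.3,
          ("meds", "none"): 0.2, ("meds", "nurse"): 0.1}

def project_belief(belief, feature_names, keep):
    """Marginalize a factored belief onto the features a controller needs,
    e.g. a Phone-like controller that only tracks CommunicationGoal."""
    keep_idx = [feature_names.index(f) for f in keep]
    projected = {}
    for state, p in belief.items():
        key = tuple(state[i] for i in keep_idx)
        projected[key] = projected.get(key, 0.0) + p
    return projected

print(project_belief(belief, FEATURES, ["CommunicationGoal"]))
# -> {('none',): 0.6, ('nurse',): 0.4}
```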
57 Related work on robot planning and control
- Manually-scripted dialogue strategies
  - Denecke & Waibel, 1997; Walker et al., 1997.
- Markov decision processes (MDPs) for dialogue management
  - Levin et al., 1997; Fromer, 1998; Walker et al., 1998; Goddeau & Pineau, 2000; Singh et al., 2000; Walker, 2000.
- Robot interfaces
  - Torrance, 1996; Asoh et al., 1999.
- Classical planning
  - Fikes & Nilsson, 1971; Simmons, 1987; McAllester & Rosenblitt, 1991; Penberthy & Weld, 1992; Kushmerick, 1995; Veloso et al., 1995; Smith & Weld, 1998.
- Execution architectures
  - Firby, 1987; Musliner, 1993; Simmons, 1994; Bonasso & Kortenkamp, 1996.
58 Decision-theoretic planning models
59 The tiger problem: Value function solution
[Figure: optimal value function V over the belief space from s = tiger-left to s = tiger-right, with segments corresponding to open-right, listen, and open-left.]
60 Optimizing the investigate controller
[Figure: value function V of the investigate controller over the belief space from s = tiger-left to s = tiger-right, with segments for listen and open-right.]
61 Optimizing the act controller
[Figure: value function V of the act controller over the belief space from s = tiger-left to s = tiger-right, with segments for investigate and open-left.]