Problem size: |S|=576, |A|=19, |O|=18 presentation

About This Presentation

Transcript and Presenter's Notes

Title: Problem size: |S|=576, |A|=19, |O|=18

1
High-level robot behavior control using POMDPs
Joelle Pineau and Sebastian Thrun Carnegie Mellon
University
I - Background
III - Experimental Setup
Abstract This paper describes a robot controller
which uses probabilistic decision-making
techniques at the highest-level of behavior
control. The POMDP-based robot controller has the
ability to incorporate noisy and partial sensor
information, and can arbitrate between
information-gathering and performance-related
actions. We present a hierarchical variant of the
POMDP model which exploits structure in the
problem domain to accelerate planning. This POMDP
controller is implemented and tested onboard a
mobile robot in the context of an interactive
service task.
What are POMDPs? POMDPs model decision-theoretic
planning problems. The goal is to find an
action-selection strategy that maximizes reward,
even in the presence of state uncertainty.

Problem size S576, A19, O18
Action hierarchy

Introducing Pearl, the nursing assistant robot

State features
Robot location
Person Location
Person Status
Reminder Goal
Motion Goal
Conversation Goal
Observation features
Words from speech recognition
Button presses from touchscreen
Laser readings
Reminder messages

We need a high-level controller that can
select good behaviors or actions
share sensor information between modules
handle uncertainty
negotiate over goals from different specialized
modules
arbitrate between information-gathering and
performance actions

Formally, a POMDP is an n-tuple S,A,?,b,T,O,R
POMDP task 1 Track state After an
action, what is the state of the world? POMDP
task 2 Optimize policy Which action should the
controller apply next?

b(s) Pr(s t0)
T(s,a,s) Pr(s s,a)
O(s,a,o) Pr(o s,a)
R(s,a) ??

S Set of states
A Set of actions
? Set of observations

Top controller

Task domain Robot provides reminders and
guidance to elderly user.
Experimental scenario Robot must go meet
subjects in their apartment and take them to a
physiotherapy appointment, while also engaging
in appropriate social interaction.
Environment Nursing home near Pittsburgh, PA
Test subjects Six elderly residents in assisted
living facility.

People tracking/following
Autominder
Not so hard.
Speech recognitionsynthesis
Autonomous navigation
Very hard!
Our approach High-level robot behavior control
usingPartially Observable Markov Decision
Processes (POMDPs)
Exact policy optimization (task 2) is
computationally intractable for large problems
(20 states), therefore we need approximations.
II - Robot control using Hierarchical POMDPs
IV - Results
Assumptions
Approximating POMDPs for large domains
Performance results for three contrasting users.
Comparing performance of POMDP policy vs MDP
policy. The MDP policy does not consider
uncertainty during planning, and therefore is
unable to ask clarification actions. For all
users, performance is much better using the POMDP
policy.
Figure 1 Example of a successful guidance
experiment.

Each subtask is a separate POMDP.
Each subtask has a subset of all actions.
Primitive actions are placed in leaf nodes,
and are from the original action set.
Abstract actions are introduced in internal
nodes.
Each subtask has a non-trivial reward function
(i.e. R(s,a) is not constant).

Key Idea Exploit hierarchical structure in
the problem domain to break a large problem into
many related POMDPs. What type of structure?
Action set partitioning
(a) Pearl approaching subject
(b) Reminding of appointment
Planning with Hierarchical POMDPs
(c) Guidance through corridor
(d) Entering physiotherapy dept.

Given POMDP model M S, A, ?, b, T, O, R
and subtask hierarchy H
For each subtask h ? H
1) Set components
Ah ? children nodes
Sh ? S
?h ? ?
bh, Th, Oh, Rh
2) Minimize model
Sh ? zh(s0), , zh(sn)
?h ? yh(o0), , yh(op)
3) Solve subtask
?h ? bh, Th, Oh, Rh

Execution with Hierarchical POMDPs

Step 1 - Update belief
Step 2 - Traversing hierarchy top-down, for each
subtask
1) Get local belief
2) Consult local policy
3) If a is leaf node, terminate.
Else, go to that subtask.

(e) Pearl leaves
(e) Asking for weather forecast

Observations from nursing-home experiments
100 task completion rate amongst test subjects
overall high-level of excitement amongst
subjects
adaptive speed control for robot is necessary
improved speech recognition would be great
more verbal interaction during guidance is
recommended

Write a Comment

User Comments (0)

About PowerShow.com

Problem size: |S|=576, |A|=19, |O|=18 PowerPoint PPT Presentation