Title: CMPUT 551: Analyzing abstraction and approximation within the MDP/POMDP environment

1
CMPUT 551: Analyzing abstraction and
approximation within the MDP/POMDP environment
  • Magdalena Jankowska (M.Sc., Algorithms)
  • Ilya Levner (M.Sc., AI/ML)

2
Outline
  • MDP/POMDP environment
  • Research Direction
  • Maze Domain
  • Motivation: iNRIIS
  • Previous Research

3
Introduction
  • Consider a model (system) of an environment and
    an agent, where:
  • The agent receives observations about the current
    system state (inputs)
  • The agent can take actions that affect the system
    state (outputs)
  • The agent receives rewards/penalties for taking
    various actions in various system states.
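
This loop can be made concrete with a toy sketch in Python; ToyEnv and
its two-state dynamics below are invented for illustration, not part of
the slides.

    import random

    # Toy environment: the percept is the exact state (the MDP case below).
    # Action 1 taken in state 1 earns a reward; the state then moves randomly.
    class ToyEnv:
        def __init__(self):
            self.state = 0

        def observe(self):
            return self.state                    # percept about the current state

        def step(self, action):
            reward = 1.0 if (self.state == 1 and action == 1) else 0.0
            self.state = random.choice([0, 1])   # the action/noise moves the state
            return self.observe(), reward

    def run_episode(env, policy, horizon=100):
        percept, total = env.observe(), 0.0
        for _ in range(horizon):
            percept, reward = env.step(policy(percept))  # act, collect reward
            total += reward
        return total

    print(run_episode(ToyEnv(), policy=lambda s: s))  # policy: echo the percept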

4
Formal Definition
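
Assuming the usual textbook definition (with notation matching the
δ and V used on later slides), in LaTeX:

    \mathrm{MDP} = \langle S, A, \delta, r \rangle,
    \qquad \delta(s, a, s') = P(s_{t+1} = s' \mid s_t = s,\; a_t = a)

where S is a finite set of states, A a finite set of actions, δ the
transition model, and r(s, a) the expected immediate reward. A POMDP
additionally has a set of observations Ω and an observation model
O(s', o), the probability of perceiving o in state s'.
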
5
Markov Decision Process (MDP)
  • The agent needs only the percept from its current
    state to calculate the optimal action
  • i.e., the action delivering the maximum expected
    reward
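
In symbols (a standard restatement, with discount factor γ):

    \pi^*(s) = \arg\max_{a \in A}
        \Big( r(s, a) + \gamma \sum_{s' \in S} \delta(s, a, s')\, V^*(s') \Big)

so the current state s alone suffices to choose the action; this is the
Markov property.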

6
Partially Observable Markov Decision Process
(POMDP)
  • The percept does not carry enough information to
    enable the agent to compute the optimal action.
  • However, the whole (or partial) history of
    percepts may allow the agent to calculate the
    optimal action.
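
The standard way to compress this history (used on slide 16 below) is
the belief state, a sufficient statistic for the history:

    b_t(s) = P(s_t = s \mid o_1, a_1, \ldots, a_{t-1}, o_t),
    \qquad b'(s') \propto O(s', o) \sum_{s \in S} \delta(s, a, s')\, b(s)

where the second formula updates the belief after taking action a and
observing o.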

7
Research Goals
8
Maze Domain
9
  • Produces 3-D forest models from aerial images
  • Applies vision operators to data tokens until a
    valid 3-D scene description is produced.

10
iNRIIS
11
Challenges
  • 152 million geographic states, each in one of
    approximately 1000 conditions (seasonal,
    lighting, meteorological).
  • There is no way to acquire a perfect value
    function V.

12
Challenges (cont)
  • Each image is approximately 1000 × 1000 ≈
    1,000,000 pixels
  • To make lookahead feasible, we need to extract
    relevant features
  • The feature extraction process abstracts
    (buckets) several states together, as sketched
    below.
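
As a toy illustration of this bucketing (the feature and bucket count
are invented for illustration): reduce each large image to one coarse
feature and discretize it, so many distinct raw states share one
abstract state.

    import numpy as np

    # Hypothetical feature extractor: a 1000x1000 image is summarized by
    # its mean brightness, then bucketed into one of n_buckets abstract states.
    def abstract_state(image, n_buckets=10):
        brightness = float(image.mean())      # crude "relevant feature" in [0, 1]
        return min(int(brightness * n_buckets), n_buckets - 1)

    image = np.random.rand(1000, 1000)        # stands in for an aerial image
    print(abstract_state(image))              # one of only 10 abstract states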

13
Challenges (cont)
  • The real (stochastic) vision operators take a
    long time to run.
  • We need a quick approximation δ(s, a, s') of the
    operators to make lookahead feasible.

14
Solutions for Markov Decision Process
  • If we know the transition model:
    • value/policy iteration
    • Temporal Difference (TD) learning, e.g.
      TD-Gammon by Tesauro
  • If we do not know the transition model:
    • learning the value of each action in a given
      state (Q-learning; a sketch follows this list)
    • learning the model and the values of states,
      i.e., learning V and δ
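
A minimal tabular Q-learning sketch for the model-free case; the
hyperparameters are illustrative, and the environment is anything
exposing the observe/step interface of the ToyEnv sketch above.

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, horizon=50,
                   alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)                 # Q[(state, action)]; no model needed
        for _ in range(episodes):
            s = env.observe()
            for _ in range(horizon):
                if random.random() < epsilon:  # epsilon-greedy exploration
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda a2: Q[(s, a2)])
                s2, r = env.step(a)
                # TD update toward r + gamma * max_a' Q(s', a')
                best_next = max(Q[(s2, a2)] for a2 in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q

    # Q = q_learning(ToyEnv(), actions=[0, 1])  # using the earlier toy environment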

15
State aggregation in MDP
  • Grouping similar states together (Boutilier and
    Dearden)
  • Different kinds, e.g.:
    • states grouped according to their values
      (exact abstraction)
    • some irrelevant features abstracted away
      (approximate abstraction)
  • Used in robot navigation; a toy sketch of
    value-based grouping follows.
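
A toy sketch of the value-based grouping; the tolerance knob is an
invented way to contrast exact vs. approximate abstraction.

    # V maps each state to its value; tol = 0 merges only states with
    # identical values (exact abstraction), tol > 0 merges near-equal ones.
    def aggregate_by_value(V, tol=0.0):
        clusters = []                          # (cluster value, member states)
        for s, v in sorted(V.items(), key=lambda kv: kv[1]):
            if clusters and abs(v - clusters[-1][0]) <= tol:
                clusters[-1][1].append(s)
            else:
                clusters.append((v, [s]))
        return [members for _, members in clusters]

    print(aggregate_by_value({'s0': 0.0, 's1': 0.02, 's2': 1.0}, tol=0.05))
    # -> [['s0', 's1'], ['s2']]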

16
POMDP
  • Much more difficult
  • Ways to restore the Markov property:
    • decisions based on the history of observations
      and/or actions (used by Mitchell in a
      pole-balancing task)
    • converting the problem into an MDP over belief
      states (a belief-update sketch follows)
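
A sketch of the belief-state update from the formula on slide 6,
assuming the transition model delta, observation model O, and state set
are given explicitly; all names are illustrative.

    def update_belief(b, a, o, delta, O, states):
        # b: dict state -> probability; delta[(s, a, s2)]; O[(s2, o)].
        b2 = {}
        for s2 in states:
            # predict with the transition model, correct with the observation
            b2[s2] = O.get((s2, o), 0.0) * sum(
                delta.get((s, a, s2), 0.0) * b[s] for s in states)
        z = sum(b2.values())
        return {s: p / z for s, p in b2.items()} if z > 0 else b  # normalize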

17
POMDP cont.
  • Use the solution of the underlying MDP as a
    heuristic
  • e.g., GIB, the world's best bridge program, by
    Ginsberg
  • Limited lookahead (can be combined with learning:
    updating the heuristic value of states/belief
    states); a lookahead sketch follows.
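
A sketch of depth-limited lookahead over belief states, using a
heuristic (e.g., the belief-weighted value of the underlying MDP) at
the horizon; update_belief is the sketch above, and every name here is
an illustrative assumption.

    def obs_prob(b, a, o, delta, O, states):
        # P(o | b, a): chance of observing o after taking action a in belief b
        return sum(O.get((s2, o), 0.0) * delta.get((s, a, s2), 0.0) * b[s]
                   for s in states for s2 in states)

    def lookahead(b, depth, actions, observations, states,
                  delta, O, R, heuristic, gamma=0.9):
        if depth == 0:
            return heuristic(b)                # e.g. sum_s b[s] * V_MDP[s]
        best = float('-inf')
        for a in actions:
            # expected immediate reward plus discounted expected future value
            value = sum(b[s] * R.get((s, a), 0.0) for s in states)
            for o in observations:
                p = obs_prob(b, a, o, delta, O, states)
                if p > 0:
                    b2 = update_belief(b, a, o, delta, O, states)
                    value += gamma * p * lookahead(b2, depth - 1, actions,
                                                   observations, states,
                                                   delta, O, R, heuristic, gamma)
            best = max(best, value)
        return best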

18
Research Goals
19
References
  • C. Boutilier, T. Dean, S. Hanks, "Decision-Theoretic
    Planning: Structural Assumptions and Computational
    Leverage," JAIR 11:1-94, 1999.
  • R. Dearden, C. Boutilier, "Abstraction and Approximate
    Decision-Theoretic Planning," Artificial Intelligence
    89(1-2):219-283, 1997.
  • L. P. Kaelbling, M. L. Littman, A. W. Moore,
    "Reinforcement Learning: A Survey," JAIR 4:237-285,
    1996.
  • L. P. Kaelbling, A. R. Cassandra, M. L. Littman,
    "Learning Policies for Partially Observable
    Environments: Scaling Up," Proceedings of the 12th
    International Conference on Machine Learning, 1995.