Title: CMPUT 551: Analyzing abstraction and approximation within the MDP/POMDP environment

1
CMPUT 551: Analyzing abstraction and
approximation within the MDP/POMDP environment
  • Magdalena Jankowska (M.Sc., Algorithms)
  • Ilya Levner (M.Sc., AI/ML)

2
Outline
  • MDP/POMDP environment
  • Research Direction
  • Maze Domain
  • Motivation: iNRIIS
  • Previous Research

3
Introduction
  • Consider a model (system) of an environment and
    an agent, where:
  • The agent receives observations about the current
    system state (inputs)
  • The agent can take actions that affect the system
    state (outputs)
  • The agent receives rewards/penalties for taking
    various actions in various system states.
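
This loop can be made concrete with a toy sketch in Python; ToyEnv and
its two-state dynamics below are invented for illustration, not part of
the slides.

    import random

    # Toy environment: the percept is the exact state (the MDP case below).
    # Action 1 taken in state 1 earns a reward; the state then moves randomly.
    class ToyEnv:
        def __init__(self):
            self.state = 0

        def observe(self):
            return self.state                    # percept about the current state

        def step(self, action):
            reward = 1.0 if (self.state == 1 and action == 1) else 0.0
            self.state = random.choice([0, 1])   # the action/noise moves the state
            return self.observe(), reward

    def run_episode(env, policy, horizon=100):
        percept, total = env.observe(), 0.0
        for _ in range(horizon):
            percept, reward = env.step(policy(percept))  # act, collect reward
            total += reward
        return total

    print(run_episode(ToyEnv(), policy=lambda s: s))  # policy: echo the percept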

4
Formal Definition
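
Assuming the usual textbook definition (with notation matching the
δ and V used on later slides), in LaTeX:

    \mathrm{MDP} = \langle S, A, \delta, r \rangle,
    \qquad \delta(s, a, s') = P(s_{t+1} = s' \mid s_t = s,\; a_t = a)

where S is a finite set of states, A a finite set of actions, δ the
transition model, and r(s, a) the expected immediate reward. A POMDP
additionally has a set of observations Ω and an observation model
O(s', o), the probability of perceiving o in state s'.
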
5
Markov Decision Process (MDP)
  • The agent needs only the percept from its current
    state to calculate the optimal action
  • i.e., the action delivering the maximum expected
    reward
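
In symbols (a standard restatement, with discount factor γ):

    \pi^*(s) = \arg\max_{a \in A}
        \Big( r(s, a) + \gamma \sum_{s' \in S} \delta(s, a, s')\, V^*(s') \Big)

so the current state s alone suffices to choose the action; this is the
Markov property.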

6
Partially Observable Markov Decision Process
(POMDP)
  • The percept does not carry enough information to
    enable the agent to compute the optimal action.
  • However, the whole (or partial) history of
    percepts may allow the agent to calculate the
    optimal action.
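
The standard way to compress this history (used on slide 16 below) is
the belief state, a sufficient statistic for the history:

    b_t(s) = P(s_t = s \mid o_1, a_1, \ldots, a_{t-1}, o_t),
    \qquad b'(s') \propto O(s', o) \sum_{s \in S} \delta(s, a, s')\, b(s)

where the second formula updates the belief after taking action a and
observing o.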

7
Research Goals
8
Maze Domain
9
  • Produces 3-D forest models from aerial images
  • Applies vision operators to data tokens until a
    valid 3-D scene description is produced.

10
iNRIIS
11
Challenges
  • 152 million geographic states, each in one of
    approximately 1000 conditions (seasonal,
    lighting, meteorological).
  • There is no way to acquire a perfect value
    function V.

12
Challenges (cont)
  • Each image is approximately 1000 × 1000 ≈
    1,000,000 pixels
  • To make lookahead feasible, we need to extract
    relevant features
  • The feature extraction process abstracts
    (buckets) several states together, as sketched
    below.
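
As a toy illustration of this bucketing (the feature and bucket count
are invented for illustration): reduce each large image to one coarse
feature and discretize it, so many distinct raw states share one
abstract state.

    import numpy as np

    # Hypothetical feature extractor: a 1000x1000 image is summarized by
    # its mean brightness, then bucketed into one of n_buckets abstract states.
    def abstract_state(image, n_buckets=10):
        brightness = float(image.mean())      # crude "relevant feature" in [0, 1]
        return min(int(brightness * n_buckets), n_buckets - 1)

    image = np.random.rand(1000, 1000)        # stands in for an aerial image
    print(abstract_state(image))              # one of only 10 abstract states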

13
Challenges (cont)
  • The real (stochastic) vision operators take a
    long time to run.
  • We need a quick approximation δ(s, a, s') of the
    operators to make lookahead feasible.

14
Solutions for Markov Decision Process
  • If we know the transition model:
    • value/policy iteration
    • Temporal Difference (TD) learning, e.g.
      TD-Gammon by Tesauro
  • If we do not know the transition model:
    • learning the value of each action in a given
      state (Q-learning; a sketch follows this list)
    • learning the model and the values of states,
      i.e., learning V and δ
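
A minimal tabular Q-learning sketch for the model-free case; the
hyperparameters are illustrative, and the environment is anything
exposing the observe/step interface of the ToyEnv sketch above.

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, horizon=50,
                   alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)                 # Q[(state, action)]; no model needed
        for _ in range(episodes):
            s = env.observe()
            for _ in range(horizon):
                if random.random() < epsilon:  # epsilon-greedy exploration
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda a2: Q[(s, a2)])
                s2, r = env.step(a)
                # TD update toward r + gamma * max_a' Q(s', a')
                best_next = max(Q[(s2, a2)] for a2 in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q

    # Q = q_learning(ToyEnv(), actions=[0, 1])  # using the earlier toy environment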

15
State aggregation in MDP
  • Grouping similar states together (Boutilier and
    Dearden)
  • Different kinds, e.g.:
    • states grouped according to their values
      (exact abstraction)
    • some irrelevant features abstracted away
      (approximate abstraction)
  • Used in robot navigation; a toy sketch of
    value-based grouping follows.
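
A toy sketch of the value-based grouping; the tolerance knob is an
invented way to contrast exact vs. approximate abstraction.

    # V maps each state to its value; tol = 0 merges only states with
    # identical values (exact abstraction), tol > 0 merges near-equal ones.
    def aggregate_by_value(V, tol=0.0):
        clusters = []                          # (cluster value, member states)
        for s, v in sorted(V.items(), key=lambda kv: kv[1]):
            if clusters and abs(v - clusters[-1][0]) <= tol:
                clusters[-1][1].append(s)
            else:
                clusters.append((v, [s]))
        return [members for _, members in clusters]

    print(aggregate_by_value({'s0': 0.0, 's1': 0.02, 's2': 1.0}, tol=0.05))
    # -> [['s0', 's1'], ['s2']]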

16
POMDP
  • Much more difficult
  • Ways to restore the Markov property:
    • decisions based on the history of observations
      and/or actions (used by Mitchell in a
      pole-balancing task)
    • converting the problem into an MDP over belief
      states (a belief-update sketch follows)
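
A sketch of the belief-state update from the formula on slide 6,
assuming the transition model delta, observation model O, and state set
are given explicitly; all names are illustrative.

    def update_belief(b, a, o, delta, O, states):
        # b: dict state -> probability; delta[(s, a, s2)]; O[(s2, o)].
        b2 = {}
        for s2 in states:
            # predict with the transition model, correct with the observation
            b2[s2] = O.get((s2, o), 0.0) * sum(
                delta.get((s, a, s2), 0.0) * b[s] for s in states)
        z = sum(b2.values())
        return {s: p / z for s, p in b2.items()} if z > 0 else b  # normalize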

17
POMDP cont.
  • Use the solution of the underlying MDP as a
    heuristic
  • e.g., GIB, the world's best bridge program, by
    Ginsberg
  • Limited lookahead (can be combined with learning:
    updating the heuristic value of states/belief
    states); a lookahead sketch follows.
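
A sketch of depth-limited lookahead over belief states, using a
heuristic (e.g., the belief-weighted value of the underlying MDP) at
the horizon; update_belief is the sketch above, and every name here is
an illustrative assumption.

    def obs_prob(b, a, o, delta, O, states):
        # P(o | b, a): chance of observing o after taking action a in belief b
        return sum(O.get((s2, o), 0.0) * delta.get((s, a, s2), 0.0) * b[s]
                   for s in states for s2 in states)

    def lookahead(b, depth, actions, observations, states,
                  delta, O, R, heuristic, gamma=0.9):
        if depth == 0:
            return heuristic(b)                # e.g. sum_s b[s] * V_MDP[s]
        best = float('-inf')
        for a in actions:
            # expected immediate reward plus discounted expected future value
            value = sum(b[s] * R.get((s, a), 0.0) for s in states)
            for o in observations:
                p = obs_prob(b, a, o, delta, O, states)
                if p > 0:
                    b2 = update_belief(b, a, o, delta, O, states)
                    value += gamma * p * lookahead(b2, depth - 1, actions,
                                                   observations, states,
                                                   delta, O, R, heuristic, gamma)
            best = max(best, value)
        return best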

18
Research Goals
19
References
  • C. Boutilier, T. Dean, S. Hanks, "Decision-Theoretic
    Planning: Structural Assumptions and Computational
    Leverage," JAIR 11:1-94, 1999.
  • R. Dearden, C. Boutilier, "Abstraction and Approximate
    Decision-Theoretic Planning," Artificial Intelligence
    89(1-2):219-283, 1997.
  • L. P. Kaelbling, M. L. Littman, A. W. Moore,
    "Reinforcement Learning: A Survey," JAIR 4:237-285,
    1996.
  • L. P. Kaelbling, A. R. Cassandra, M. L. Littman,
    "Learning Policies for Partially Observable
    Environments: Scaling Up," Proceedings of the 12th
    International Conference on Machine Learning, 1995.