Learning Linear Predictive State Representations
1
Learning Linear Predictive State Representations
  • Britton Wolfe
  • EECS 592
  • April 19, 2004

2
High-Level Problem
  • Build a model of an environment
  • Discrete-time scenarios
  • Can be used for planning
  • Different model types
  • POMDPs posit hidden states of the system
  • Linear PSRs are fully grounded in observables
  • IO-OOMs are similar to PSRs, but less general

3
Linear PSRs
  • Test: a sequence of actions and observations;
    it succeeds if, given that the actions are
    taken, the observations are seen
  • Core tests: a set of tests whose success
    predictions are sufficient to compute the
    prediction of any test
  • These predictions form the state of the model
  • Update parameters: matrices that update the
    core-test predictions based on the most recent
    action/observation
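The state update described above can be sketched as a small numerical routine. The names `M_ao` and `m_ao` (a per-(action, observation) update matrix and a normalizing weight vector) follow common linear-PSR notation but are assumptions here, not taken from the slides:

```python
import numpy as np

def psr_update(p, M_ao, m_ao):
    """One linear-PSR state update after taking action a and seeing o.

    p    : current prediction vector, p[i] = Pr(core test q_i | history)
    M_ao : matrix whose row i holds the weights of the extended test a,o,q_i
    m_ao : weight vector of the one-step test a,o
    """
    denom = m_ao @ p            # predicted Pr(o | history, a)
    if denom <= 0:
        raise ValueError("observation has zero predicted probability")
    return (M_ao @ p) / denom   # new predictions Pr(q_i | history, a, o)
```

Because the update is a linear map followed by one normalization, the model state never needs to reference a hidden latent variable, only the core-test predictions themselves.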

4
Learning Linear PSRs
  • Problem: How can a set of core tests and the
    corresponding update parameters be found by
    interacting with an environment?
  • Current methods
  • Singh, Littman, Jong, Pardoe, and Stone (ICML
    2003): learning the parameters given the core
    tests
  • Singh and James (forthcoming): learning both the
    core tests and the parameters, using an
    artificial reset

5
Goal of Project
  • Two algorithms, Direct Sampling and Intermediate
    Model, based on Jaeger's algorithm for learning
    IO-OOMs by sampling a training sequence
  • Question: Can these algorithms generate models
    that predict the occurrence of a set of tests T
    with an MSE below 0.001, using a training
    sequence of length kp?
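One way to picture the sampling idea is frequency estimation over a long action/observation sequence: a test's prediction is estimated as the fraction of windows whose actions match the test in which the observations also match. This sketch is an illustrative assumption (function name, sliding-window scheme), not the exact algorithm from the slides:

```python
def estimate_test_prediction(seq, test):
    """Sampling-style estimate of a test's prediction.

    seq  : list of (action, observation) pairs from one long run
    test : list of (action, observation) pairs to predict

    Counts every length-len(test) window whose actions match the
    test's actions, and returns the fraction of those windows whose
    observations also match (None if the actions never occur).
    """
    k = len(test)
    attempts = successes = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if all(a == ta for (a, _), (ta, _) in zip(window, test)):
            attempts += 1
            if all(o == to for (_, o), (_, to) in zip(window, test)):
                successes += 1
    return successes / attempts if attempts else None
```

Estimates of this kind converge slowly for tests whose action sequences occur rarely under the sampling policy, which is why the training budget kp below depends on the problem's smallest probabilities.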

6
Amount of Training Data
  • Depends on the problem at hand
  • kp allows more data for more complex problems
  • Inversely proportional to
  • the lowest transition probability
  • the lowest observation probability
  • Proportional to
  • the square of the number of states
  • the number of actions
  • kp ranged from 320 to 600,000
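The scaling rules above can be collected into a toy formula. The constant `c` and the exact functional form are assumptions for illustration only; the slides give only the proportionalities and the resulting range of 320 to 600,000:

```python
def training_budget(n_states, n_actions, min_trans_p, min_obs_p, c=1.0):
    """Toy training-budget formula following the stated scaling:
    grows with n_states**2 and n_actions, shrinks with the smallest
    transition and observation probabilities. c is an assumed constant.
    """
    return c * n_states ** 2 * n_actions / (min_trans_p * min_obs_p)
```

Under this reading, a problem with many states or with rare transitions/observations demands a far longer training sequence, which matches the wide spread of kp across the benchmark problems.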

7
Evaluation Tests T
  • Test sequence used a random-walk policy
  • Timepoints at which to evaluate the PSR were
    chosen randomly
  • Measure predictions for observing each possible
    observation, given the last action taken
  • Compute the MSE against an accurate model

[Timeline figure: a sequence of action/observation
pairs a1 o1, a6 o2, a3 o1; at the randomly chosen
eval points, evaluate Pr(a1,o1) and Pr(a1,o2), and
later Pr(a3,o1) and Pr(a3,o2)]
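The evaluation metric can be sketched as pooling squared errors over all eval points and observations. The nested-list layout (one row of per-observation predictions per eval point) is an assumed representation:

```python
def eval_mse(model_preds, true_preds):
    """MSE between a learned model's predictions and an accurate
    model's predictions, pooled over eval points and observations.

    Each argument is a list of rows, one row per eval point, holding
    the predicted probability of each possible observation given the
    last action taken.
    """
    total, n = 0.0, 0
    for m_row, t_row in zip(model_preds, true_preds):
        for m, t in zip(m_row, t_row):
            total += (m - t) ** 2
            n += 1
    return total / n
```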
8
Methods
  • Test sequence length: 30,000
  • Expected number of eval points: 30,000/24 = 1,250
  • Results were averaged over all eval points
  • Training sequence lengths: 1,000 to 32 million
  • 20 epochs for each training length on each
    problem
  • Problems: 4x3 maze, cheese maze, shuttle,
    network, tiger, paint, bridge repair (from
    http://www.cs.brown.edu/research/ai/pomdp/examples/index.html)

9
Direct Sampling Results
  • No problem strictly met its kp deadline
  • On six problems, the median MSE eventually crept
    below 0.001
  • On four problems, the mean MSE never fell below
    0.001

10
Intermediate Model Results
  • Method took significantly longer to run than
    Direct Sampling
  • Did not demonstrate a significant improvement in
    prediction accuracy

11
Summary
  • Direct Sampling method can achieve decent
    accuracy given substantial training data
  • Intermediate Model method takes prohibitively
    long to run, does not appear to give better
    precision, and has severe trouble with small
    training sets
  • Future directions
  • Analyze why the methods achieve different
    performance on the different problems