1
Predictive State Representations
Duke University Machine Learning Group
Discussion Leader: Kai Ni
September 09, 2005
2
Outline
  • Predictive State Representations (PSR) Model
  • Constructing a PSR from a POMDP
  • Learning parameters for PSR
  • Conclusions

3
Two Popular Methods
  • There are two dominant approaches in the
    control/AI area.
  • The generative-model approach
  • Typified by POMDPs; more general, with unlimited
    memory
  • Strongly dependent on a good model of the system
  • The history-based approach
  • Typified by k-order Markov methods; simple and
    effective
  • Limited by the length of history it can use

4
The Position of PSR
Figure 1 Data flow in a) POMDPs and other
approaches that recursively update a state
representation, and b) history-based state
representations.
  • The predictive state representation (PSR)
    approach
  • Like the generative-model approach in that it
    updates the state representation recursively
  • Like the history-based approach in that its
    representations are grounded in data

5
What is a PSR
  • A PSR looks to the future and represents what
    will happen.
  • A PSR is a vector of predictions for a specially
    selected set of action-observation sequences,
    called tests.
  • The prediction of a test t = a1o1a2o2 issued
    after a history h of length k is
    p(t|h) = Prob(o_{k+1} = o1, o_{k+2} = o2 | h,
    a_{k+1} = a1, a_{k+2} = a2).
  • A PSR is a set of tests whose predictions provide
    sufficient information to determine the prediction
    for all possible tests (a sufficient statistic).
    A simple way to estimate a single test's
    prediction is sketched below.
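A small illustrative Python sketch (the two-state simulator, the action and observation names, and all numbers are invented here, not taken from the slides): a test is stored as a sequence of action-observation pairs, and its prediction is estimated by running the test's actions many times and counting how often the desired observations occur.

import random

def step(state, action):
    """Toy dynamics (the action is ignored for simplicity): returns (next_state, observation)."""
    next_state = state if random.random() < 0.8 else 1 - state
    observation = "o1" if next_state == 1 else "o0"
    return next_state, observation

def estimate_prediction(test, start_state=0, n_runs=10000):
    """Monte-Carlo estimate of the test's prediction from the start state."""
    successes = 0
    for _ in range(n_runs):
        state, ok = start_state, True
        for action, wanted_obs in test:
            state, obs = step(state, action)
            if obs != wanted_obs:
                ok = False
                break
        successes += ok
    return successes / n_runs

# p(a1o1 a1o1): probability of seeing o1 twice when executing a1 twice.
print(estimate_prediction([("a1", "o1"), ("a1", "o1")]))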

6
The System-Dynamics Vector (1)
  • Given an ordering t1, t2, ... over all possible
    tests, the system's probability distribution over
    all tests defines an infinite system-dynamics
    vector d.
  • The ith element of d is the prediction of the
    ith test, d_i = p(t_i).
  • The predictions in d have some properties; for
    example, for any test t and action a,
    p(t) = sum over o of p(tao), because taking a
    after t must yield some observation.

7
The System-Dynamics Vector (2)
Figure 2 a) Each of d's entries corresponds to
the prediction of a test. b) Properties of the
predictions imply structure in d.
8
System-Dynamics Matrix (1)
  • To make the structure explicit, we consider a
    matrix, D, whose columns correspond to tests and
    whose rows correspond to histories.
  • Each element is a history-conditional prediction
    p(t|h).
  • The first history is the zero-length (empty)
    history; thus the system-dynamics vector d is the
    first row of the matrix D.

9
System-Dynamics Matrix (2)
Figure 3 The rows in the system-dynamics matrix
correspond to all possible histories (pasts),
while the columns correspond to all possible
tests (futures). The entries in the matrix are
the probabilities of futures given pasts.
  • All the entries of matrix D are uniquely
    determined by the vector d, since each entry is a
    conditional probability p(t|h) = p(ht)/p(h), and
    both the numerator and the denominator are
    elements of d (a small sketch follows below).
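A minimal sketch of this relationship (the sequences and probabilities below are invented): the vector d can be stored as a map from action-observation sequences to their probabilities, and any entry of D is then a ratio of two of its elements.

# d maps an action-observation sequence to its probability from the empty history.
d = {
    "": 1.0,          # empty sequence
    "a1o1": 0.6,
    "a1o1a2o1": 0.3,
}

def D_entry(h, t):
    """History-conditional prediction p(t | h) = p(ht) / p(h)."""
    return d[h + t] / d[h]

print(D_entry("a1o1", "a2o1"))  # 0.3 / 0.6 = 0.5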

10
POMDP and D
  • The system-dynamics matrix D is not a model of
    the system but should be viewed as the system
    itself.
  • D can be generated from a POMDP model by
    computing each test's prediction as
    p(t|h) = b(h) T^a1 O^a1,o1 ... T^an O^an,on 1,
    where b(h) is the belief state after history h,
    T^a is the transition matrix for action a, and
    O^a,o is the diagonal matrix of probabilities of
    observing o after action a (see the sketch below).
  • Theorem: A POMDP with k nominal states cannot
    model a dynamical system with dimension greater
    than k.
  • The dimension of a dynamical system equals the
    rank of D.
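A hedged numpy sketch of this construction (the 2-state POMDP below is invented purely for illustration): the belief state is pushed through T^a O^a,o for each action-observation pair of the test, and the result is summed.

import numpy as np

T = {"a1": np.array([[0.9, 0.1], [0.2, 0.8]])}          # transition matrices T^a
O = {("a1", "o1"): np.diag([0.7, 0.4])}                  # diagonal observation matrices O^a,o

def test_prediction(belief, test):
    """p(t | h) for test = [(a1, o1), (a2, o2), ...] given belief b(h)."""
    v = belief.copy()
    for a, o in test:
        v = v @ T[a] @ O[(a, o)]
    return v.sum()                                        # v @ 1

b = np.array([0.5, 0.5])                                  # belief after some history h
print(test_prediction(b, [("a1", "o1"), ("a1", "o1")]))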

11
The Idea of Linear PSR
  • For any D with rank k, there must exist k
    linearly independent columns and rows. We
    consider a set of such columns and let the tests
    corresponding to these columns be
    Q = {q1, q2, ..., qk}, called core tests (a
    column-selection sketch follows below).
  • For any h, the prediction vector
    p(Q|h) = [p(q1|h), ..., p(qk|h)] is a predictive
    state representation. It forms a sufficient
    statistic for the system. All other tests can be
    calculated from the linear dependence
  • p(t|h) = p(Q|h)^T m_t, where m_t is the weight
    vector for test t.
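A rough sketch of the column-selection idea (the finite block of D below is entirely made up): greedily keep the columns whose addition increases the rank; the tests behind the kept columns play the role of the core tests Q.

import numpy as np

D = np.array([[1.0, 0.50, 0.250, 0.50],
              [1.0, 0.40, 0.160, 0.40],
              [1.0, 0.45, 0.205, 0.45]])   # rows: histories, columns: tests (made up, rank 2)

core_columns = []
for j in range(D.shape[1]):
    candidate = core_columns + [j]
    if np.linalg.matrix_rank(D[:, candidate]) == len(candidate):
        core_columns.append(j)

print(core_columns)   # indices of linearly independent columns, here [0, 1]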

12
Update the core tests
  • The prediction vector can be updated recursively
    each time a new action-observation pair is added
    (see the sketch after the figure caption).

Figure 4 An example of a system-dynamics matrix.
The set Q = {t1, t3, t4} forms a set of core
tests. The equations in the ti column show how
any entry in a row can be computed from the
prediction vector of that row.
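A minimal numpy sketch of the recursive update, with invented numbers and using the m_ao / M_ao weight notation that appears on the learning slide: p(Q | hao) = p(Q | h)^T M_ao / p(Q | h)^T m_ao, where the ith column of M_ao is the weight vector of the extended test a o q_i.

import numpy as np

p_Q = np.array([0.5, 0.3, 0.2])       # p(Q | h) for three core tests (made up)
m_ao = np.array([0.4, 0.2, 0.1])      # weight vector for the one-step test ao (made up)
M_ao = np.array([[0.2, 0.1, 0.0],
                 [0.1, 0.3, 0.2],
                 [0.0, 0.2, 0.4]])    # columns are weight vectors for a o q_i (made up)

p_Q_new = (p_Q @ M_ao) / (p_Q @ m_ao)  # p(q_i | hao) = p(aoq_i | h) / p(ao | h)
print(p_Q_new)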
13
Constructing a PSR from a POMDP
  • A POMDP updates its belief state by computing
    b(hao) = b(h) T^a O^a,o / (b(h) T^a O^a,o 1).
  • Define a function u mapping tests to (1 x k)
    vectors by
  • u(ε) = 1 (the all-ones vector) and
    u(aot) = (T^a O^a,o u(t)^T)^T. We call u(t) the
    outcome vector for test t, so that
    p(t|h) = b(h) u(t)^T.
  • A test t is linearly independent of a set of
    tests S if u(t) is linearly independent of the
    set u(S) (see the sketch below).
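A hedged numpy sketch of the outcome-vector recursion (the 2-state POMDP matrices are invented for illustration): u(t) is built back-to-front from u(ε) = 1, and the prediction of a test is then the dot product of the belief state with u(t).

import numpy as np

T = {"a1": np.array([[0.9, 0.1], [0.2, 0.8]])}
O = {("a1", "o1"): np.diag([0.7, 0.4])}

def outcome_vector(test, n_states=2):
    """u(eps) = 1; u(aot) = T^a O^a,o u(t), applied back-to-front."""
    u = np.ones(n_states)
    for a, o in reversed(test):
        u = T[a] @ O[(a, o)] @ u
    return u

u = outcome_vector([("a1", "o1"), ("a1", "o1")])
b = np.array([0.5, 0.5])              # belief after history h
print(b @ u)                          # p(t | h), same value as the direct product on slide 10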

14
Searching Algorithm
Figure 5 Searching algorithm for finding a
linear PSR from a POMDP.
  • The cardinality of Q is bounded by k and no test
    in Q is longer than k action-observation pairs.
  • The prediction of any other test t can then be
    computed as p(t|h) = p(Q|h)^T m_t, where m_t
    expresses u(t) as a linear combination of the
    core tests' outcome vectors (a rough sketch of
    the search follows below).
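A rough Python version of the search (it follows the idea of Figure 5 but is not the authors' exact pseudocode; the tiny POMDP is invented for illustration): breadth-first over one-step extensions aot, keeping a test whenever its outcome vector raises the rank of the collected set.

import numpy as np
from itertools import product

T = {"a1": np.array([[0.9, 0.1], [0.2, 0.8]]),
     "a2": np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {("a1", "o0"): np.diag([0.3, 0.6]), ("a1", "o1"): np.diag([0.7, 0.4]),
     ("a2", "o0"): np.diag([0.8, 0.1]), ("a2", "o1"): np.diag([0.2, 0.9])}
actions, observations, n = ["a1", "a2"], ["o0", "o1"], 2

def u(test):
    """Outcome vector of a test, computed back-to-front."""
    v = np.ones(n)
    for a, o in reversed(test):
        v = T[a] @ O[(a, o)] @ v
    return v

def find_core_tests():
    Q, U, frontier = [], np.zeros((0, n)), [()]
    while frontier:
        t = frontier.pop(0)
        for a, o in product(actions, observations):
            ext = ((a, o),) + t                          # one-step extension a o t
            U_try = np.vstack([U, u(ext)])
            if np.linalg.matrix_rank(U_try) > len(Q):    # u(ext) is linearly independent
                Q.append(ext)
                U = U_try
                frontier.append(ext)
        if len(Q) == n:                                  # rank cannot exceed the state count
            break
    return Q

print(find_core_tests())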

15
An Example of PSR
Figure 6 The float-reset problem
  • Any linear PSR of this system has 5 core tests.
    One such PSR has the core tests and the initial
    predictions
  • Q = {r1, f0r1, f0f0r1, f0f0f0r1, f0f0f0f0r1}
  • p(Q|h) = [1, 0.5, 0.5, 0.375, 0.375]
  • After a float action, the last prediction is
    updated by
  • p(f0f0f0f0r1 | hf0) = p(Q|h) [.0625, -.0625,
    -.75, -.75, 1]^T

16
Learning PSR model
  • The parameters to learn are the weight vectors
    m_ao and the weight matrices M_ao, whose ith
    column equals m_aoqi.
  • Using an Oracle
  • Build a PSR by querying the oracle for p(Q|H),
    p(ao|H) and p(aoq_i|H) over a set of histories H;
    the parameters then follow by solving the linear
    systems p(ao|H) = p(Q|H) m_ao and
    p(aoq_i|H) = p(Q|H) m_aoqi (see the
    least-squares sketch below).
  • Without an Oracle
  • Estimate an entry p(t|h) of D by treating each
    execution of t's action sequence after h as a
    Bernoulli trial.
  • Use the suffix-history algorithm to get around
    the need for a reset.
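A hedged sketch of the oracle-style computation (all numbers are made up): if p(Q|h), p(ao|h) and p(aoq_i|h) can be queried or estimated for a set of histories, each weight vector follows from a linear regression, since p(ao|h) = p(Q|h) m_ao must hold for every history h.

import numpy as np

# Prediction vectors p(Q | h) for 4 sample histories (rows) and 3 core tests.
P_Q = np.array([[1.0, 0.5, 0.4],
                [0.8, 0.6, 0.3],
                [0.9, 0.4, 0.5],
                [0.7, 0.7, 0.2]])
# One-step predictions p(ao | h) for the same histories.
p_ao = np.array([0.55, 0.52, 0.50, 0.49])

# Least-squares solve for the weight vector m_ao; each column of M_ao is found
# the same way from the corresponding p(aoq_i | h) values.
m_ao, *_ = np.linalg.lstsq(P_Q, p_ao, rcond=None)
print(m_ao)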

17
TD (temporal difference) Learning
  • Update the long-term guess based on the next time
    step instead of waiting until the end of the test.
  • Let t = a1o1a2o2a3o3 and let the current estimate
    of p(t|h) be given. After taking action a1 and
    observing o_{k+1}, the TD estimate of p(t|h) is
    the current estimate of p(a2o2a3o3 | h a1 o_{k+1})
    if o_{k+1} = o1, and 0 otherwise; the model
    parameters can then be updated based on the error
    (a rough sketch follows below).
  • Expand Q to include all suffixes of the core
    tests; the expanded set is called Y.
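A very rough sketch of the TD idea for a single test (illustrative only, not the authors' exact algorithm; the numbers, the simple gradient step, and the learning rate are all assumed): the current estimate of the suffix's prediction from the new history serves as the target, and the test's weight vector is nudged toward it.

import numpy as np

p_Q = np.array([0.5, 0.3, 0.2])       # current prediction vector p(Q | h)
m_t = np.array([0.4, 0.1, 0.2])       # weights for a test t = a1 o1 a2 o2 ... (made up)
m_suffix = np.array([0.6, 0.2, 0.1])  # weights for the suffix a2 o2 ... of t (made up)
alpha = 0.1                            # learning rate (assumed)

def td_update(p_Q, p_Q_next, m_t, m_suffix, observed_o1, o1):
    """One TD step for p(t | h) after taking a1 and observing observed_o1."""
    prediction = p_Q @ m_t
    # TD target: 0 if the first observation of t did not occur, otherwise the
    # current estimate of the suffix's prediction from the new history.
    target = (p_Q_next @ m_suffix) if observed_o1 == o1 else 0.0
    return m_t + alpha * (target - prediction) * p_Q   # gradient-style update of m_t

p_Q_next = np.array([0.55, 0.25, 0.2])                  # p(Q | h a1 o1), made up
print(td_update(p_Q, p_Q_next, m_t, m_suffix, "o1", "o1"))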

18
Result (1)
Table 1. Domain and Core Search Statistics. The
Asymp column denotes the approximate asymptote
for the percent of required core tests found
during the trials for suffix-history (with
parameter 0.1). The Training column denotes the
approximate smallest training size at which the
algorithm achieved the asymptote value.
19
Result (2)
  • Average error between prediction and truth.

Figure 7 Comparison of error vs. training length
for the tiger problem.
20
Conclusion
  • The predictive state representation (PSR) is a
    new way to model dynamical systems. It is more
    general than both POMDPs and nth-order Markov
    models, is grounded in data, and is easy to
    learn.
  • The system-dynamics matrix provides an
    interesting way of looking at discrete dynamical
    systems.
  • The authors propose the suffix-history and TD
    algorithms for learning a PSR without reset; both
    achieve small prediction error.

21
Reference
  • M. L. Littman, R. S. Sutton and S. Singh,
    "Predictive Representations of State," NIPS 2002.
  • S. Singh, M. R. James and M. R. Rudary,
    "Predictive State Representations: A New Theory
    for Modeling Dynamical Systems," UAI 2004.
  • B. Wolfe, M. R. James and S. Singh, "Learning
    Predictive State Representations in Dynamical
    Systems Without Reset," ICML 2005.