Kernelized Value Function Approximation for Reinforcement Learning

1
Kernelized Value Function Approximation for
Reinforcement Learning
  • Gavin Taylor and Ronald Parr
  • Duke University

2
Overview
3
Overview - Contributions
  • Construct new model-based VFA
  • Equate novel VFA with previous work
  • Decompose Bellman Error into reward and
    transition error
  • Use decomposition to understand VFA

(Figure: Bellman Error decomposed into reward error and transition error)
4
Outline
  • Motivation, Notation, and Framework
  • Kernel-Based Models
  • Model-Based VFA
  • Interpretation of Previous Work
  • Bellman Error Decomposition
  • Experimental Results and Conclusions

5
Markov Reward Processes
  • M = (S, P, R, γ)
  • Value V(s): expected, discounted sum of rewards
    from state s
  • Bellman equation: V(s) = R(s) + γ Σ_{s′} P(s′ | s) V(s′)
  • Bellman equation in matrix notation: V = R + γPV
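For small problems the matrix form can be solved directly. A minimal sketch, not from the slides (the three-state MRP, its P, R, and γ are illustrative assumptions):

    import numpy as np

    # Illustrative 3-state Markov reward process (transition matrix, rewards,
    # and discount are assumed for this example, not taken from the slides)
    gamma = 0.9
    P = np.array([[0.8, 0.2, 0.0],      # P[i, j] = Pr(s' = j | s = i)
                  [0.1, 0.7, 0.2],
                  [0.0, 0.3, 0.7]])
    R = np.array([0.0, 1.0, 10.0])      # expected immediate reward in each state

    # Bellman equation in matrix notation: V = R + gamma * P V,
    # rearranged to (I - gamma * P) V = R and solved as a linear system
    V = np.linalg.solve(np.eye(3) - gamma * P, R)
    print(V)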

6
Kernels
  • Properties
  • Symmetric function between two points
  • PSD K-matrix
  • Uses
  • Dot-product in high-dimensional space (kernel
    trick)
  • Gain expressiveness
  • Risks
  • Overfitting
  • High computational cost
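As a concrete illustration of these properties, a minimal sketch of a Gaussian (RBF) kernel and its kernel matrix over a few sampled states (the kernel choice, bandwidth, and sample points are assumptions made for the example):

    import numpy as np

    def rbf_kernel(x, y, bandwidth=1.0):
        # Symmetric function of two points: k(x, y) = k(y, x)
        diff = np.asarray(x) - np.asarray(y)
        return np.exp(-np.sum(diff ** 2) / (2.0 * bandwidth ** 2))

    # Kernel matrix over sampled states: K[i, j] = k(s_i, s_j)
    S = np.array([[0.0], [0.5], [2.0]])    # assumed 1-D sample states
    K = np.array([[rbf_kernel(si, sj) for sj in S] for si in S])

    # K is symmetric and positive semidefinite (eigenvalues >= 0 up to round-off)
    print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min())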

7
Outline
  • Motivation, Notation, and Framework
  • Kernel-Based Models
  • Model-Based VFA
  • Interpretation of Previous Work
  • Bellman Error Decomposition
  • Experimental Results and Conclusions

8
Kernelized Regression
  • Apply the kernel trick to least-squares regression
  • t: target values
  • K: kernel matrix, where K_ij = k(x_i, x_j)
  • k(x): column vector, where k(x)_i = k(x, x_i)
  • Regularization matrix
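Putting these pieces together gives the standard kernel ridge regression predictor, ŷ(x) = k(x)ᵀ(K + Λ)⁻¹ t. A minimal sketch, assuming the regularization matrix is simply Λ = λI (the function name and arguments are illustrative, not from the slides):

    import numpy as np

    def kernel_regression_predict(x, samples, t, kernel, lam=1e-2):
        # Kernelized least-squares regression:
        #   y_hat(x) = k(x)^T (K + Lambda)^{-1} t,  with Lambda = lam * I assumed
        K = np.array([[kernel(si, sj) for sj in samples] for si in samples])
        k_x = np.array([kernel(x, si) for si in samples])
        weights = np.linalg.solve(K + lam * np.eye(len(samples)), t)
        return k_x @ weights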

9
Kernel-Based Models
  • Approximate reward model
  • Approximate transition model
  • Want to predict k(s′) (not s′)
  • Construct matrix K′, where K′_ij = k(s′_i, s_j)
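A hedged sketch of both models on sampled transitions (s_i, r_i, s′_i), using the regression above: the reward model fits the sampled rewards, and the transition model predicts the next kernel values k(s′) rather than s′ itself (the K′ construction and all names below are assumptions for illustration):

    import numpy as np

    def fit_kernel_models(S, r, S_next, kernel, lam_r=1e-2, lam_p=1e-2):
        n = len(S)
        # K[i, j] = k(s_i, s_j);  K'[i, j] = k(s'_i, s_j)  (assumed definitions)
        K = np.array([[kernel(si, sj) for sj in S] for si in S])
        K_next = np.array([[kernel(sp, sj) for sj in S] for sp in S_next])

        # Approximate reward model: regularized kernel regression onto sampled rewards
        w_r = np.linalg.solve(K + lam_r * np.eye(n), r)

        # Approximate transition model: predict k(s') from k(s), one regression per column
        W_p = np.linalg.solve(K + lam_p * np.eye(n), K_next)
        return K, K_next, w_r, W_p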

10
Model-based Value Function
11
Model-based Value Function
(Equations on the slide: the unregularized solution, the regularized solution, and the extension to the whole state space)
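As a hedged reconstruction of those equations: assuming a value function of the form V̂(s) = k(s)ᵀw and a shared regularization matrix Λ = λI, composing the approximate reward and transition models self-consistently gives w = (K − γK′ + Λ)⁻¹ r, with the unregularized case recovered at Λ = 0 and evaluation anywhere in the state space via k(s)ᵀw. A sketch under those assumptions:

    import numpy as np

    def model_based_value_weights(K, K_next, r, gamma, lam=0.0):
        # Assumed form: w = (K - gamma * K' + lam * I)^{-1} r
        # lam = 0 gives the unregularized solution; lam > 0 the regularized one
        n = K.shape[0]
        return np.linalg.solve(K - gamma * K_next + lam * np.eye(n), r)

    def value_at(s, S, w, kernel):
        # Whole state space: V_hat(s) = k(s)^T w at any state s
        k_s = np.array([kernel(s, si) for si in S])
        return k_s @ w
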
12
Previous Work
  • Kernel Least-Squares Temporal Difference Learning
    (KLSTD), Xu et al., 2005
  • Rederive LSTD, replacing dot products with
    kernels
  • No regularization
  • Gaussian Process Temporal Difference Learning
    (GPTD), Engel et al., 2005
  • Model value directly with a GP
  • Gaussian Processes in Reinforcement Learning
    (GPRL), Rasmussen and Kuss, 2004
  • Model transitions and value with GPs
  • Deterministic reward

13
Equivalency
(Equations on the slide relate the GPTD noise parameter and the GPRL regularization parameter to the model-based regularizer)
14
Outline
  • Motivation, Notation, and Framework
  • Kernel-Based Models
  • Model-Based VFA
  • Interpretation of Previous Work
  • Bellman Error Decomposition
  • Experimental Results and Conclusions

15
Model Error
  • Error in reward approximation
  • Error in transition approximation

(Equation terms: expected next kernel values vs. approximate next kernel values)
16
Bellman Error
The Bellman Error is a linear combination of the reward and
transition errors
reward error
transition error
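A hedged sketch of the decomposition: with V̂(s) = k(s)ᵀw, the Bellman error at the sampled states splits into the reward error plus γ times the transition error (expected minus approximate next kernel values) applied to w. The function and argument names are illustrative:

    import numpy as np

    def bellman_error_decomposition(R_true, R_hat, K_next_expected, K_next_hat, w, gamma):
        # Reward error: true expected rewards minus the reward model's predictions
        reward_error = R_true - R_hat
        # Transition error: expected next kernel values minus approximate ones
        transition_error = K_next_expected - K_next_hat
        # Bellman error as a linear combination of reward and transition errors
        bellman_error = reward_error + gamma * transition_error @ w
        return bellman_error, reward_error, transition_error
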
17
Outline
  • Motivation, Notation, and Framework
  • Kernel-Based Models
  • Model-Based VFA
  • Interpretation of Previous Work
  • Bellman Error Decomposition
  • Experimental Results and Conclusions

18
Experiments
  • Version of the two-room problem (Mahadevan &
    Maggioni, 2006)
  • Use the Bellman Error decomposition to tune
    regularization parameters

(Figure: reward function of the two-room problem)
19
Experiments
20
Conclusion
  • Novel, model-based view of kernelized RL built
    around kernel regression
  • Previous work differs from model-based view only
    in approach to regularization
  • Bellman Error can be decomposed into transition
    and reward error
  • Transition and reward error can be used to tune
    parameters

21
Thank you!
22
What about policy improvement?
  • Wrap policy iteration around kernelized VFA
  • Example: KLSPI
  • Bellman error decomposition will be policy
    dependent
  • Choice of regularization parameters may be policy
    dependent
  • Our results do not apply to SARSA variants of
    kernelized RL, e.g., GPSARSA

23
What's left?
  • Kernel selection
  • Kernel selection (not just parameter tuning)
  • Varying kernel parameters across states
  • Combining kernels (see Kolter & Ng, 2009)
  • Computation costs in large problems
  • K grows with the number of samples (n × n for n samples)
  • Inverting K is expensive
  • Role of sparsification, interaction
    w/regularization

24
Comparing model-based approaches
  • Transition model
  • GPRL: models s′ as a GP
  • T&P: approximates k(s′) given k(s)
  • Reward model
  • GPRL: deterministic reward
  • T&P: reward approximated with regularized,
    kernelized regression

25
Don't you have to know the model?
  • For our experiments' graphs: reward and transition
    errors were calculated with the true R and K′
  • In practice: cross-validation could be used to
    tune parameters to minimize the reward and transition
    errors (see the sketch below)
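A minimal sketch of that cross-validation idea, here tuning a single regularization parameter against held-out reward error (the split, grid, and names are illustrative assumptions; the transition model would be tuned the same way against held-out next-kernel error):

    import numpy as np

    def tune_regularization(S, r, kernel, lam_grid=(1e-3, 1e-2, 1e-1, 1.0)):
        n = len(S)
        train = np.arange(n) < n // 2          # illustrative hold-out split
        test = ~train
        best_lam, best_err = None, np.inf
        for lam in lam_grid:
            K_tr = np.array([[kernel(si, sj) for sj in S[train]] for si in S[train]])
            w_r = np.linalg.solve(K_tr + lam * np.eye(int(train.sum())), r[train])
            k_te = np.array([[kernel(si, sj) for sj in S[train]] for si in S[test]])
            err = np.mean((r[test] - k_te @ w_r) ** 2)   # held-out reward error
            if err < best_err:
                best_lam, best_err = lam, err
        return best_lam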

26
Why is the GPTD regularization term asymmetric?
  • GPTD is equivalent to T&P for a particular setting of its noise parameter
  • Can be viewed as propagating the regularizer
    through the transition model
  • Is this a good idea?
  • Our contribution: tools to evaluate this question

27
What about Variances?
  • Variances can play an important role in Bayesian
    interpretations of kernelized RL
  • Can guide exploration
  • Can ground regularization parameters
  • Our analysis focuses on the mean
  • Variances are a valid topic for future work

28
Does this apply to the recent work of Farahmand
et al.?
  • Not directly
  • All methods assume (s, r, s′) data
  • Farahmand et al. include next states (s′) in
    their kernel, i.e., k(s, s′) and k(s′, s′)
  • Previous work, and ours, includes only s in the
    kernel, k(s, s)

29
How is This Different from Parr et al. ICML 2008?
  • Parr et al. consider linear fixed-point
    solutions, not kernelized methods
  • Equivalence between linear fixed point methods
    was fairly well understood already
  • Our contribution
  • We provide a unifying view of previous
    kernel-based methods
  • We extend the equivalence between model-based and
    direct methods to the kernelized case