1
9/10/07
  • Tutorial
  • on Gaussian Processes
  • DAGS '07
  • Jonathan Laserson and Ben Packer

2
Outline
  • Linear Regression
  • Bayesian Inference Solution
  • Gaussian Processes
  • Gaussian Process Solution
  • Kernels
  • Implications

3
Linear Regression
  • Task: Predict y given x

4
Linear Regression
  • Predicting Y given X

5
L2-Regularized Linear Regression
  • Predicting Y given X

6
Bayesian Instead of MAP
  • Instead of using w_MAP = argmax_w P(y, w | X) to
    predict y*, why don't we use the entire distribution
    P(y, w | X) to estimate P(y* | X, y, x*)?
  • We have P(y | w, X) and P(w)
  • Combine these to get P(y, w | X)
  • Marginalize to get P(y | X)
  • Same as P(y, y* | X, x*)
  • Conditional Gaussians → Joint, to get P(y* | y, X, x*)

7
Bayesian Inference
  • We have P(y | w, X) and P(w)
  • Combine these to get P(y, w | X)
  • Marginalize to get P(y | X)
  • Same as P(y, y* | X, x*)
  • Joint Gaussian → Conditional Gaussian

Error bars!
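
A minimal NumPy sketch of the Bayesian predictive distribution described on
this slide (variable names are ours: sigma_w is the prior scale on w, sigma_n
the observation noise); it returns the predictive mean and variance, i.e. the
error bars.

```python
import numpy as np

def bayes_linreg_predict(X, y, x_star, sigma_w=1.0, sigma_n=0.1):
    """Posterior predictive P(y* | X, y, x*) for Bayesian linear regression
    with prior w ~ N(0, sigma_w^2 I) and y = w^T x + N(0, sigma_n^2) noise."""
    d = X.shape[1]
    # Posterior over w is Gaussian N(mu_w, S_w) (standard conjugate algebra).
    A = X.T @ X / sigma_n**2 + np.eye(d) / sigma_w**2
    S_w = np.linalg.inv(A)
    mu_w = S_w @ X.T @ y / sigma_n**2
    # Marginalizing w out leaves a Gaussian over y*.
    mean = x_star @ mu_w
    var = x_star @ S_w @ x_star + sigma_n**2
    return mean, var

# Toy usage: a noisy line, predicted at a new input with 2-sigma error bars.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(20)
m, v = bayes_linreg_predict(X, y, np.array([0.5]))
print(f"y* = {m:.2f} +/- {2 * np.sqrt(v):.2f}")
```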
8
Gaussian Process
  • We just saw a distribution over y directly
  • Why not start from here?
  • Instead of choosing a prior over w and defining
    f_w(x), put your prior over f directly
  • Since y = f(x) + noise, this induces a prior over
    y
  • Next: How to put a prior on f(x)

9
What is a random process?
  • It's a prior over functions
  • A stochastic process is a collection of random
    variables, f(x), indexed by x
  • It is specified by giving the joint probability
    of every finite subset of variables f(x1), f(x2),
    …, f(xk)
  • In a consistent way!

10
What is a Gaussian process?
  • It's a prior over functions
  • A stochastic process is a collection of random
    variables, f(x), indexed by x
  • It is specified by giving the joint probability
    of every finite subset of variables f(x1), f(x2),
    …, f(xk)
  • In a consistent way!
  • The joint probability of f(x1), f(x2), …, f(xk)
    is a multivariate Gaussian

11
What is a Gaussian Process?
  • It is specified by giving the joint probability
    of every finite subset of variables f(x1), f(x2),
    …, f(xk)
  • In a consistent way!
  • The joint probability of f(x1), f(x2), …, f(xk)
    is a multivariate Gaussian
  • Enough to specify the mean and covariance functions
  • µ(x) = E[f(x)]
  • C(x, x') = E[(f(x) - µ(x)) (f(x') - µ(x'))]
  • [f(x1), …, f(xk)] ~ N([µ(x1), …, µ(xk)], K),
    where K_ij = C(xi, xj)
  • For simplicity, we'll assume µ(x) = 0.
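
This finite-dimensional definition translates directly into code: choose
inputs, build the covariance matrix, and draw from the multivariate Gaussian.
A minimal sketch, assuming the squared-exponential covariance that appears
later in the deck:

```python
import numpy as np

def gp_prior_sample(xs, cov, n_samples=3, jitter=1e-9):
    """Draw functions from a zero-mean GP prior evaluated at inputs xs.
    Any finite subset [f(x1), ..., f(xk)] is jointly N(0, K) with
    K[i, j] = cov(xs[i], xs[j])."""
    K = np.array([[cov(a, b) for b in xs] for a in xs])
    K += jitter * np.eye(len(xs))  # numerical stability when factorizing K
    rng = np.random.default_rng(0)
    return rng.multivariate_normal(np.zeros(len(xs)), K, size=n_samples)

# Example: C(x, x') = exp(-0.5 (x - x')^2) on a grid of 50 inputs.
xs = np.linspace(-3, 3, 50)
samples = gp_prior_sample(xs, lambda a, b: np.exp(-0.5 * (a - b) ** 2))
print(samples.shape)  # (3, 50): three sampled functions
```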

12
Back to Linear Regression
  • Recall: we want to put a prior directly on f
  • Can use a Gaussian Process to do this
  • How do we choose µ and C?
  • Use knowledge of the prior over w
  • w ~ N(0, σ²I)
  • µ(x) = E[f(x)] = E[wᵀx] = E[w]ᵀx = 0
  • C(x, x') = E[(f(x) - µ(x)) (f(x') - µ(x'))]
  •          = E[f(x) f(x')]
  •          = xᵀ E[wwᵀ] x'
  •          = xᵀ (σ²I) x' = σ² xᵀx'

Can have f(x) = wᵀΦ(x) for a feature map Φ
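
A quick Monte Carlo check of this derivation (a sketch, our variable names):
sample many w ~ N(0, s²I) and compare the empirical E[f(x) f(x')] with s² xᵀx'.

```python
import numpy as np

# For f(x) = w^T x with w ~ N(0, s^2 I), the empirical average of
# f(x) f(x') over many sampled w should match s^2 x^T x'.
rng = np.random.default_rng(0)
s, d, n = 0.7, 3, 200_000
x, xp = rng.standard_normal(d), rng.standard_normal(d)
W = s * rng.standard_normal((n, d))       # n samples of w ~ N(0, s^2 I)
empirical = np.mean((W @ x) * (W @ xp))   # Monte Carlo E[f(x) f(x')]
print(empirical, s**2 * x @ xp)           # the two should nearly agree
```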
13
Back to Linear Regression
  • µ(x) = 0
  • C(x, x') = σ² xᵀx'
  • f ~ GP(µ, C)
  • It follows that
  • [f(x1), f(x2), …, f(xk)] ~ N(0, K)
  • [y1, y2, …, yk] ~ N(0, σ_n²I + K), where σ_n² is the noise variance
  • K = σ² XXᵀ
  • Same as the least-squares solution!
  • If we use a different C, we'll have a different K
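
A sketch of the generic GP predictive equations implied by this slide
(condition the joint Gaussian over training and test outputs on the observed
y; sigma_n is our name for the noise standard deviation). With the linear
covariance it reproduces regularized least squares; swapping in a different
cov gives a different K.

```python
import numpy as np

def gp_predict(X, y, X_star, cov, sigma_n=0.1):
    """GP regression: y ~ N(0, sigma_n^2 I + K), K[i, j] = cov(x_i, x_j).
    Returns predictive mean and variance at the test inputs X_star."""
    K = cov(X, X)
    K_s = cov(X, X_star)              # train-vs-test covariances
    K_ss = cov(X_star, X_star)
    A = K + sigma_n**2 * np.eye(len(y))
    alpha = np.linalg.solve(A, y)
    mean = K_s.T @ alpha
    var = np.diag(K_ss - K_s.T @ np.linalg.solve(A, K_s))
    return mean, var

# Linear covariance C(x, x') = s^2 x^T x' (with s^2 = 1 here).
linear_cov = lambda A, B: A @ B.T
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (15, 1))
y = 1.5 * X[:, 0] + 0.1 * rng.standard_normal(15)
mean, var = gp_predict(X, y, np.array([[0.0], [0.5]]), linear_cov)
print(mean, var)
```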

14
Kernels
  • If we use a different C, we'll have a different K
  • What do these look like?
  • Linear
  • Poly
  • Gaussian

C(x, x') = σ² xᵀx'
15
Kernels
  • If we use a different C, we'll have a different K
  • What do these look like?
  • Linear
  • Poly
  • Gaussian

C(x, x') = (1 + xᵀx')²
16
Kernels
  • If we use a different C, we'll have a different K
  • What do these look like?
  • Linear
  • Poly
  • Gaussian

C(x, x') = exp(-0.5 (x - x')²)
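
The three covariance functions from these slides, written out as code
(scalar inputs for simplicity; the σ² scale on the linear kernel defaults
to 1 here):

```python
import numpy as np

linear   = lambda x, xp, s2=1.0: s2 * x * xp           # C = s^2 x^T x'
poly     = lambda x, xp: (1 + x * xp) ** 2             # C = (1 + x^T x')^2
gaussian = lambda x, xp: np.exp(-0.5 * (x - xp) ** 2)  # C = exp(-0.5 (x - x')^2)

for name, k in [("linear", linear), ("poly", poly), ("gaussian", gaussian)]:
    print(name, k(0.3, 0.7))
```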
17
End
18
Learning a kernel
  • Parameterize a family of kernel functions using θ
  • Learn K using the gradient of the likelihood
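
A sketch of what "gradient of the likelihood" means here: the log marginal
likelihood of a zero-mean GP is log N(y | 0, K_θ + σ_n²I), and we climb its
gradient with respect to θ. The finite-difference gradient below is purely
illustrative (closed-form gradients exist; see Rasmussen's tutorial), and the
single-lengthscale squared-exponential parameterization is our assumption.

```python
import numpy as np

def log_marginal_likelihood(theta, X, y, sigma_n=0.1):
    """log P(y | X, theta) for a zero-mean GP with a squared-exponential
    kernel of lengthscale theta."""
    x = X[:, 0]
    d2 = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-0.5 * d2 / theta**2) + sigma_n**2 * np.eye(len(y))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(y) * np.log(2 * np.pi))

# Gradient ascent on theta via a central finite-difference gradient.
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, (30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
theta, lr, eps = 2.0, 0.05, 1e-5
for _ in range(100):
    g = (log_marginal_likelihood(theta + eps, X, y)
         - log_marginal_likelihood(theta - eps, X, y)) / (2 * eps)
    theta += lr * g
print(f"learned lengthscale: {theta:.3f}")
```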

19
GP Graphical Model
20
Starting point
  • For details, see
  • Rasmussen's NIPS 2006 tutorial
  • http://www.kyb.mpg.de/bs/people/carl/gpnt06.pdf
  • Williams's Gaussian Processes paper
  • http://www.dai.ed.ac.uk/homes/ckiw/postscript/hbtnn.ps.gz
  • GPs for classification (approximation)
  • Sparse methods
  • Connection to SVMs

21
Your thoughts