Title: Arizona State University DMML
1. Kernel Methods: Gaussian Processes
- Presented by Shankar Bhargav
2. Gaussian Processes
- Extending the role of kernels to probabilistic
discriminative models leads to the framework of
Gaussian processes
- Linear regression model: evaluate the posterior
distribution over the weights w
- Gaussian processes: define a probability
distribution over functions directly
3. Linear regression
- Model: y(x) = w^T φ(x)
- x: input vector
- w: M-dimensional weight vector
- The prior distribution over w is Gaussian,
p(w) = N(w | 0, α^{-1} I)
- This prior over w induces a probability
distribution over the function y(x)
4. Linear regression
- y = Φw is a linear combination of Gaussian-distributed
variables (the elements of w), where Φ is the design
matrix with elements Φ_{nk} = φ_k(x_n)
- We need only the mean and covariance to find the
joint distribution of y:
E[y] = 0, cov[y] = (1/α) ΦΦ^T = K,
where K is the Gram matrix with elements
K_{nm} = k(x_n, x_m) = (1/α) φ(x_n)^T φ(x_m)
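The identity on this slide can be checked numerically. The sketch below (the basis functions, centres, and α are illustrative assumptions, not from the slides) builds the design matrix Φ and verifies that (1/α) ΦΦ^T equals the Gram matrix computed element-wise from the kernel:

```python
import numpy as np

alpha = 2.0                       # assumed weight-prior precision
centers = np.linspace(-1, 1, 5)   # assumed basis-function centres

def phi(x):
    """Feature vector of Gaussian basis functions (illustrative choice)."""
    return np.exp(-0.5 * (x - centers) ** 2 / 0.3 ** 2)

x = np.linspace(-1, 1, 8)
Phi = np.vstack([phi(xn) for xn in x])   # design matrix, Phi[n, k] = phi_k(x_n)
K = Phi @ Phi.T / alpha                  # Gram matrix from the design matrix

# Element-wise check against k(x_n, x_m) = (1/alpha) phi(x_n)^T phi(x_m)
K_check = np.array([[phi(a) @ phi(b) / alpha for b in x] for a in x])
assert np.allclose(K, K_check)
```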
5. Gaussian Processes
- Definition: a probability distribution over functions
y(x) such that the set of values of y(x)
evaluated at an arbitrary set of points x_1, ..., x_N
jointly has a Gaussian distribution
- The mean is assumed to be zero
- The covariance of y(x) evaluated at any two values of
x is given by the kernel function:
E[y(x_n) y(x_m)] = k(x_n, x_m)
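The definition above can be made concrete by drawing sample functions from a zero-mean GP prior: pick a finite set of inputs, build the kernel covariance matrix, and sample from the resulting multivariate Gaussian. The RBF kernel and its length scale below are assumed, illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(xa, xb, length_scale=0.5):
    """Illustrative RBF kernel k(x, x')."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / length_scale ** 2)

x = np.linspace(-3, 3, 50)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))   # small jitter for numerical stability

# Each row is one sample function y(x) evaluated at the chosen points
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
```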
6. Gaussian Processes for regression
- To apply Gaussian process models to regression
we need to account for noise on the observed
target values: t_n = y_n + ε_n
- Consider a noise process with a Gaussian
distribution,
p(t_n | y_n) = N(t_n | y_n, β^{-1}),
so that p(t | y) = N(t | y, β^{-1} I_N)
- To find the marginal distribution over t we
integrate over y:
p(t) = ∫ p(t | y) p(y) dy = N(t | 0, C),
where the covariance matrix C
has elements C_{nm} = k(x_n, x_m) + β^{-1} δ_{nm}
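A minimal sketch of the marginal covariance from this slide, assuming an illustrative RBF kernel and noise precision β: C adds the noise variance 1/β to the diagonal of K, and noisy targets can then be sampled directly from N(0, C):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 25.0                      # assumed noise precision
x = np.linspace(0, 1, 6)

def rbf(xa, xb, ell=0.3):
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell ** 2)

K = rbf(x, x)
C = K + np.eye(len(x)) / beta    # C_{nm} = k(x_n, x_m) + (1/beta) delta_{nm}

# Sample noisy targets t ~ N(0, C) directly from the marginal
t = rng.multivariate_normal(np.zeros(len(x)), C)
```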
7. Gaussian Processes for regression
- The joint distribution over t_{N+1} = (t_1, ..., t_{N+1})
is given by p(t_{N+1}) = N(t_{N+1} | 0, C_{N+1}), with
C_{N+1} = [[C_N, k], [k^T, c]]
- The conditional distribution of t_{N+1} given t
is a Gaussian distribution with mean and
covariance given by
m(x_{N+1}) = k^T C_N^{-1} t
σ²(x_{N+1}) = c - k^T C_N^{-1} k,
where k has elements k(x_n, x_{N+1}),
c = k(x_{N+1}, x_{N+1}) + β^{-1}, and C_N is the
N × N covariance matrix
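The predictive mean and variance above are a few lines of linear algebra. This sketch uses assumed data (a noisy sine), an assumed RBF kernel, and an assumed β; only the two highlighted formulas come from the slide:

```python
import numpy as np

rng = np.random.default_rng(2)
beta = 25.0                      # assumed noise precision

def rbf(xa, xb, ell=0.3):
    xa, xb = np.asarray(xa), np.asarray(xb)
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell ** 2)

x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.shape)  # assumed training data

C_N = rbf(x, x) + np.eye(len(x)) / beta
x_star = np.array([0.5])
k = rbf(x, x_star)[:, 0]                 # k_n = k(x_n, x_star)
c = rbf(x_star, x_star)[0, 0] + 1 / beta

mean = k @ np.linalg.solve(C_N, t)       # m(x_star)     = k^T C_N^{-1} t
var = c - k @ np.linalg.solve(C_N, k)    # sigma^2(x_star) = c - k^T C_N^{-1} k
```

Note that the predictive variance never exceeds the prior variance c: conditioning on data can only reduce uncertainty.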
8. Learning the hyperparameters
- Rather than fixing the covariance function, we can
use a parametric family of functions and then
infer the parameter values from the data
- Evaluate the likelihood function p(t | θ), where
θ denotes the hyperparameters of the Gaussian
process model
- The simplest approach is to make a point estimate of
θ by maximizing the log likelihood function
ln p(t | θ) = -½ ln|C_N| - ½ t^T C_N^{-1} t - (N/2) ln(2π)
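The point-estimate approach can be sketched with a simple grid search over one hyperparameter (the kernel length scale); the data, kernel family, β, and candidate grid below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 25.0
x = np.linspace(0, 1, 15)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)  # assumed data

def log_marginal_likelihood(ell):
    """ln p(t | theta) for an RBF kernel with length scale ell."""
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell ** 2)
    C = K + np.eye(len(x)) / beta
    _, logdet = np.linalg.slogdet(C)
    return (-0.5 * logdet
            - 0.5 * t @ np.linalg.solve(C, t)
            - 0.5 * len(x) * np.log(2 * np.pi))

grid = [0.01, 0.05, 0.1, 0.2, 0.5, 1.0]
best = max(grid, key=log_marginal_likelihood)   # point estimate of ell
```

In practice one would maximize with a gradient-based optimizer rather than a grid, since ∂/∂θ ln p(t | θ) is available in closed form.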
9. Gaussian Processes for classification
- We can adapt Gaussian processes to classification
problems by transforming the output using an
appropriate nonlinear activation function
- Define a Gaussian process over a function a(x), and
transform it using the logistic sigmoid function
y = σ(a) = 1 / (1 + e^{-a});
we obtain a non-Gaussian stochastic process over
functions y(x) ∈ (0, 1)
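The construction on this slide is easy to reproduce: draw a sample function a(x) from a GP prior (kernel choice assumed here), then squash it through the logistic sigmoid so every value lies in (0, 1) and can serve as a Bernoulli probability:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 100)

# GP prior over a(x) with an assumed RBF kernel; jitter keeps K positive definite
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 1e-8 * np.eye(len(x))
a = rng.multivariate_normal(np.zeros(len(x)), K)

y = 1 / (1 + np.exp(-a))   # logistic sigmoid: y(x) in (0, 1)
```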
10. The left plot shows a sample from the Gaussian
process prior over functions a(x). The right plot
shows the result of transforming this sample
using a logistic sigmoid function.
The probability distribution over the target
variable t is given by a Bernoulli distribution,
p(t | a) = σ(a)^t (1 - σ(a))^{1-t}, shown here on a
one-dimensional input space.
11. Gaussian Processes for classification
- To determine the predictive distribution
p(t_{N+1} = 1 | t_N),
we introduce a Gaussian process prior over the
vector a_{N+1} = (a_1, ..., a_{N+1}); the Gaussian prior takes
the form p(a_{N+1}) = N(a_{N+1} | 0, C_{N+1})
- The predictive distribution is given by
p(t_{N+1} = 1 | t_N) =
∫ p(t_{N+1} = 1 | a_{N+1}) p(a_{N+1} | t_N) da_{N+1},
where p(t_{N+1} = 1 | a_{N+1}) = σ(a_{N+1})
12. Gaussian Processes for classification
- The integral is analytically intractable, so it may
be approximated using sampling methods
- Alternatively, techniques based on analytical
approximation can be used:
- Variational inference
- Expectation propagation
- Laplace approximation
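The sampling route can be sketched in a few lines: if an analytical approximation (e.g. Laplace) has produced a Gaussian posterior N(μ, s²) over a_{N+1}, the predictive probability is the expectation of σ(a) under it, which Monte Carlo estimates directly. The values of μ and s² below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, s2 = 0.8, 0.5                  # assumed posterior mean/variance of a_{N+1}

# Monte Carlo estimate of p(t=1 | data) = E_{a ~ N(mu, s2)}[sigma(a)]
a = rng.normal(mu, np.sqrt(s2), size=100_000)
p = np.mean(1 / (1 + np.exp(-a)))
```

Because σ is nonlinear, p differs from σ(μ); averaging over the posterior uncertainty pulls the probability toward 0.5.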
13. Illustration of Gaussian processes for
classification. The optimal decision boundary is
shown in green; the decision boundary from the
Gaussian process classifier is shown in black.
14. Connection to Neural Networks
- For a broad class of prior distributions over w,
the distribution of functions generated by a
neural network will tend to a Gaussian process as
M -> infinity, where M is the number of hidden units
- In this Gaussian process limit, the output
variables of the neural network become
independent
15. Thank you