Title: Input Space versus Feature Space in Kernel-Based Methods
Slide 1: Input Space versus Feature Space in Kernel-Based Methods
Schölkopf, Mika, Burges, Knirsch, Müller, Rätsch, Smola
Presented by Joe Drish, Department of Computer Science and Engineering, University of California, San Diego
Slide 2: Goals
Objectives of the paper
- Introduce and illustrate the kernel trick
- Discuss the kernel mapping from input space into feature space F
- Review kernel algorithms: SVMs and kernel PCA
- Discuss how to interpret the return from F to input space after the dot product computation
- Discuss how to construct sparse approximations of feature space expansions
- Evaluate and discuss the performance of SVMs and kernel PCA
Applications of kernel methods discussed in this paper
- Handwritten digit recognition
- Face recognition
- De-noising
Slide 3: Definition
- A reproducing kernel k is a function k : X × X → R.
- The domain of k consists of the data patterns x1, ..., xl ∈ X.
- X is a compact set in which the data lives, typically a subset of R^N.
Computing k is equivalent to mapping the data patterns into a higher-dimensional space F and then taking the dot product there. A feature map Φ : R^N → F is a function that maps the input data patterns into the higher-dimensional space F.
Slide 4: Illustration
- Using a feature map Φ to map the data from input space into a higher-dimensional feature space F
[Figure: crosses (x) and circles (o) in input space, and their images Φ(x) and Φ(o) in the feature space F.]
Slide 5: Kernel Trick
- We would like to compute the dot product in the higher-dimensional space, or
Φ(x) · Φ(y).
To do this we only need to compute
k(x, y),
since
k(x, y) = Φ(x) · Φ(y).
Note that the feature map Φ is never explicitly computed. We avoid this, and therefore avoid a burdensome computational task.
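To make the trick concrete, here is a small numpy sketch (not from the paper) comparing an explicit degree-2 polynomial feature map with the corresponding kernel k(x, y) = (x · y)^2; both routes give the same number, but the kernel route never forms Φ explicitly.

import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D input (x1, x2):
    # phi(v) = (x1^2, x2^2, sqrt(2)*x1*x2), so that phi(x).phi(y) = (x.y)^2
    return np.array([v[0]**2, v[1]**2, np.sqrt(2) * v[0] * v[1]])

def poly_kernel(x, y):
    # Degree-2 polynomial kernel evaluated directly in input space
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(y)))   # dot product in feature space F
print(poly_kernel(x, y))        # same value, without ever computing phi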
Slide 6: Example kernels
Gaussian: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
Polynomial: k(x, y) = (x · y)^d
Sigmoid: k(x, y) = tanh(kappa (x · y) + Theta)
Nonlinear separation can be achieved.
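As an illustration only (the parameter defaults sigma, degree, kappa, theta are my own choices, not values from the slides), the three kernels above can be evaluated on a whole data set as Gram matrices in numpy:

import numpy as np

def gaussian_gram(X, sigma=1.0):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def polynomial_gram(X, degree=3):
    # K[i, j] = (x_i . x_j)^degree
    return (X @ X.T) ** degree

def sigmoid_gram(X, kappa=1.0, theta=0.0):
    # K[i, j] = tanh(kappa * (x_i . x_j) + theta)
    return np.tanh(kappa * (X @ X.T) + theta)

X = np.random.randn(5, 2)      # five 2-D patterns
print(gaussian_gram(X).shape)  # (5, 5) Gram matrix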
Slide 7: Nonlinear Separation
Slide 8: Mercer Theory
Input Space to Feature Space
- Mercer's theorem gives the condition under which the kernel trick is valid
- N_F, the dimensionality of F, is the analogue of the rank of the outer product u_i u_i^T in the matrix case
- ψ_i is the normalized eigenfunction, analogous to a normalized eigenvector
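For reference, the expansion behind this slide, in the notation of the bullets above (λ_i the eigenvalues, ψ_i the normalized eigenfunctions, N_F the dimensionality of F), is Mercer's theorem:

\[
k(x, y) \;=\; \sum_{i=1}^{N_F} \lambda_i\, \psi_i(x)\, \psi_i(y), \qquad \lambda_i > 0,
\]
so that a valid feature map is
\[
\Phi(x) \;=\; \bigl(\sqrt{\lambda_1}\,\psi_1(x),\ \sqrt{\lambda_2}\,\psi_2(x),\ \dots\bigr),
\qquad \Phi(x) \cdot \Phi(y) \;=\; k(x, y).
\]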
Slide 9: Mercer Linear Algebra
Linear algebra analogy:
  Eigenvector problem      Eigenfunction problem
  A                        k(x, y)
  u, λ                     ψ, λ
- x and y are vectors
- u is the normalized eigenvector
- λ is the eigenvalue
- ψ is the normalized eigenfunction
Slide 10: RKHS, Capacity, Metric
- Reproducing kernel Hilbert space (RKHS)
  - Hilbert space of functions f on some set X such that all evaluation functionals are continuous, and the functions can be reproduced by the kernel
- Capacity of the kernel map
  - Bound on how many training examples are required for learning, measured by the VC-dimension h
- Metric of the kernel map
  - Intrinsic shape of the manifold to which the data is mapped
Slide 11: Support Vector Machines
The decision boundary takes the form
f(x) = sgn( Σ_i α_i y_i k(x, x_i) + b )
- Similar to a single-layer perceptron
- Training examples x_i with non-zero coefficients α_i are support vectors
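A minimal example of this machinery in practice (scikit-learn is my choice here, not something the slides prescribe): fit an SVC with a Gaussian (RBF) kernel on a toy nonlinearly separable problem and inspect the support vectors.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Toy nonlinearly separable data: two concentric circles
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# RBF (Gaussian) kernel SVM; gamma plays the role of 1/(2 sigma^2)
clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)

# Training examples with non-zero coefficients alpha_i are the support vectors
print("number of support vectors:", clf.support_vectors_.shape[0])
print("training accuracy:", clf.score(X, y))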
Slide 12: Kernel Principal Component Analysis
KPCA carries out a linear PCA in the feature space F. The extracted features take the nonlinear form
f_k(x) = Σ_{i=1}^{l} α_i^k k(x_i, x).
The α_i^k are the components of the k-th eigenvector of the matrix K_ij = k(x_i, x_j).
Slide 13: KPCA and Dot Products
We wish to find the eigenvectors V and eigenvalues λ of the covariance matrix in feature space,
C = (1/l) Σ_{j=1}^{l} Φ(x_j) Φ(x_j)^T.
Again, replace
Φ(x) · Φ(y)
with
k(x, y).
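A minimal numpy sketch of this procedure, assuming a Gaussian kernel and the usual Gram-matrix centering; the function name kernel_pca and its parameters are illustrative, not from the paper.

import numpy as np

def kernel_pca(X, n_components=2, sigma=1.0):
    # Gaussian Gram matrix K[i, j] = k(x_i, x_j)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma**2))

    # Center the Gram matrix in feature space
    l = K.shape[0]
    one = np.ones((l, l)) / l
    Kc = K - one @ K - K @ one + one @ K @ one

    # Eigendecomposition; alpha[:, k] holds the coefficients alpha_i^k
    lam, alpha = np.linalg.eigh(Kc)
    idx = np.argsort(lam)[::-1][:n_components]
    lam, alpha = lam[idx], alpha[:, idx]

    # Rescale so the corresponding feature-space eigenvectors V_k have unit length
    alpha = alpha / np.sqrt(np.maximum(lam, 1e-12))

    # Nonlinear features: f_k(x_j) = sum_i alpha_i^k k(x_i, x_j)
    return Kc @ alpha

X = np.random.randn(100, 2)
features = kernel_pca(X, n_components=2, sigma=1.0)
print(features.shape)  # (100, 2)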
Slide 14: From Feature Space to Input Space
Pre-image problem
Here, Ψ is not in the image of Φ, so an exact pre-image need not exist.
Slide 15: Projection Distance Illustration
Approximate the vector Ψ ∈ F
Slide 16: Minimizing Projection Distance
z is an approximate pre-image for Ψ if it minimizes the distance between Ψ and the projection of Ψ onto span{Φ(z)}, i.e. if it maximizes
(Ψ · Φ(z))² / (Φ(z) · Φ(z)).
For kernels where k(z, z) = 1 (Gaussian), this reduces to maximizing
Ψ · Φ(z) = Σ_i α_i k(x_i, z).
Slide 17: Fixed-Point Iteration
Assuming a Gaussian kernel:
- α_i come from the eigenvectors of the centered Gram matrix
- x_i are the input space patterns
- σ is the width of the Gaussian
Requiring no step size, we can iterate
z_{t+1} = Σ_i α_i exp(−||z_t − x_i||² / (2σ²)) x_i  /  Σ_i α_i exp(−||z_t − x_i||² / (2σ²)).
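A sketch of this iteration in numpy, under the same Gaussian-kernel assumption; the helper name preimage_fixed_point, the starting point, and the stopping tolerance are my own choices.

import numpy as np

def preimage_fixed_point(alpha, X, sigma=1.0, n_iter=100, tol=1e-8):
    # Approximate pre-image z of Psi = sum_i alpha_i Phi(x_i) for a Gaussian kernel
    z = X[np.argmax(np.abs(alpha))].copy()   # start from a strongly weighted pattern
    for _ in range(n_iter):
        # Gaussian weights alpha_i * exp(-||z - x_i||^2 / (2 sigma^2))
        w = alpha * np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * sigma**2))
        denom = w.sum()
        if abs(denom) < 1e-12:               # guard against a degenerate step
            break
        z_new = (w[:, None] * X).sum(axis=0) / denom
        if np.linalg.norm(z_new - z) < tol:
            z = z_new
            break
        z = z_new
    return z

# Toy usage: coefficients alpha chosen arbitrarily here, just to exercise the iteration
X = np.random.randn(50, 2)
alpha = np.random.randn(50)
print(preimage_fixed_point(alpha, X, sigma=1.0))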
Slide 18: Kernel PCA Toy Example
Generated an artificial data set from three point sources, 100 points each.
Slide 19: De-noising by Reconstruction, Part One
- Reconstruction from projections onto the eigenvectors from the previous example
- Generated 20 new points from each Gaussian
- Represented by their first n = 1, 2, ..., 8 nonlinear principal components
Slide 20: De-noising by Reconstruction, Part Two
- The original points move in the direction of de-noising
Slide 21: De-noising in Two Dimensions
- A half circle and a square in the plane
- The de-noised versions are the solid lines
Slide 22: De-noising USPS Data Patterns
- Patterns: 7291 training, 2007 test
- Size: 16 x 16 pixels
[Figure: de-noising results, comparing linear PCA with kernel PCA.]
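As a rough, illustrative counterpart (the slides use USPS digits and the fixed-point pre-image method, whereas scikit-learn's KernelPCA learns an approximate inverse map when fit_inverse_transform=True), kernel PCA de-noising can be tried on the 8x8 digits data:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

# 8x8 digits as a stand-in for the 16x16 USPS patterns
X, _ = load_digits(return_X_y=True)
X = X / 16.0
rng = np.random.RandomState(0)
X_noisy = X + rng.normal(scale=0.2, size=X.shape)   # add Gaussian noise

# Kernel PCA with a Gaussian (RBF) kernel; inverse_transform approximates the pre-image step
kpca = KernelPCA(n_components=32, kernel="rbf", gamma=0.05,
                 fit_inverse_transform=True, alpha=1e-3)
kpca.fit(X_noisy)
X_denoised = kpca.inverse_transform(kpca.transform(X_noisy))

print("reconstruction MSE:", np.mean((X_denoised - X) ** 2))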
Slide 23: Questions