Title: Input Space versus Feature Space in Kernel-Based Methods
Slide 1: Input Space versus Feature Space in Kernel-Based Methods
Schölkopf, Mika, Burges, Knirsch, Müller, Rätsch, Smola
Presented by Joe Drish, Department of Computer Science and Engineering, University of California, San Diego
Slide 2: Goals
Objectives of the paper
- Introduce and illustrate the kernel trick
- Discuss the kernel mapping from input space into feature space F
- Review kernel algorithms: SVMs and kernel PCA
- Discuss how to interpret the return from F to input space after the dot product computation
- Discuss how to construct sparse approximations of feature space expansions
- Evaluate and discuss the performance of SVMs and kernel PCA
Applications of kernel methods discussed in this paper
- Handwritten digit recognition
- Face recognition
- De-noising
Slide 3: Definition
- A reproducing kernel k is a function k : X × X → R.
- The domain of k consists of the data patterns x1, ..., xl ∈ X.
- X is a compact set in which the data lives, typically a subset of R^N.
Computing k is equivalent to mapping the data patterns into a higher-dimensional space F and then taking the dot product there. A feature map Φ : R^N → F is a function that maps the input data patterns into the higher-dimensional space F.
Slide 4: Illustration
- Using a feature map Φ to map the data from input space into a higher-dimensional feature space F
[Figure: crosses (x) and circles (o) in input space, and their images Φ(x) and Φ(o) in the feature space F.]
Slide 5: Kernel Trick
- We would like to compute the dot product in the higher-dimensional space, or
Φ(x) · Φ(y).
To do this we only need to compute
k(x, y),
since
k(x, y) = Φ(x) · Φ(y).
Note that the feature map Φ is never explicitly computed. We avoid this, and therefore avoid a burdensome computational task.
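To make the trick concrete, here is a small numpy sketch (not from the paper) comparing an explicit degree-2 polynomial feature map with the corresponding kernel k(x, y) = (x · y)^2; both routes give the same number, but the kernel route never forms Φ explicitly.

import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D input (x1, x2):
    # phi(v) = (x1^2, x2^2, sqrt(2)*x1*x2), so that phi(x).phi(y) = (x.y)^2
    return np.array([v[0]**2, v[1]**2, np.sqrt(2) * v[0] * v[1]])

def poly_kernel(x, y):
    # Degree-2 polynomial kernel evaluated directly in input space
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(y)))   # dot product in feature space F
print(poly_kernel(x, y))        # same value, without ever computing phi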
Slide 6: Example kernels
Gaussian: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
Polynomial: k(x, y) = (x · y)^d
Sigmoid: k(x, y) = tanh(kappa (x · y) + Theta)
Nonlinear separation can be achieved.
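As an illustration only (the parameter defaults sigma, degree, kappa, theta are my own choices, not values from the slides), the three kernels above can be evaluated on a whole data set as Gram matrices in numpy:

import numpy as np

def gaussian_gram(X, sigma=1.0):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def polynomial_gram(X, degree=3):
    # K[i, j] = (x_i . x_j)^degree
    return (X @ X.T) ** degree

def sigmoid_gram(X, kappa=1.0, theta=0.0):
    # K[i, j] = tanh(kappa * (x_i . x_j) + theta)
    return np.tanh(kappa * (X @ X.T) + theta)

X = np.random.randn(5, 2)      # five 2-D patterns
print(gaussian_gram(X).shape)  # (5, 5) Gram matrix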
Slide 7: Nonlinear Separation
Slide 8: Mercer Theory
Input Space to Feature Space
- Mercer's theorem gives the condition under which the kernel trick is valid
- N_F, the dimensionality of F, is the analogue of the rank of the outer product u_i u_i^T in the matrix case
- ψ_i is the normalized eigenfunction, analogous to a normalized eigenvector
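For reference, the expansion behind this slide, in the notation of the bullets above (λ_i the eigenvalues, ψ_i the normalized eigenfunctions, N_F the dimensionality of F), is Mercer's theorem:

\[
k(x, y) \;=\; \sum_{i=1}^{N_F} \lambda_i\, \psi_i(x)\, \psi_i(y), \qquad \lambda_i > 0,
\]
so that a valid feature map is
\[
\Phi(x) \;=\; \bigl(\sqrt{\lambda_1}\,\psi_1(x),\ \sqrt{\lambda_2}\,\psi_2(x),\ \dots\bigr),
\qquad \Phi(x) \cdot \Phi(y) \;=\; k(x, y).
\]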
Slide 9: Mercer Linear Algebra
Linear algebra analogy:
  Eigenvector problem      Eigenfunction problem
  A                        k(x, y)
  u, λ                     ψ, λ
- x and y are vectors
- u is the normalized eigenvector
- λ is the eigenvalue
- ψ is the normalized eigenfunction
Slide 10: RKHS, Capacity, Metric
- Reproducing kernel Hilbert space (RKHS)
  - Hilbert space of functions f on some set X such that all evaluation functionals are continuous, and the functions can be reproduced by the kernel
- Capacity of the kernel map
  - Bound on how many training examples are required for learning, measured by the VC-dimension h
- Metric of the kernel map
  - Intrinsic shape of the manifold to which the data is mapped
Slide 11: Support Vector Machines
The decision boundary takes the form
f(x) = sgn( Σ_i α_i y_i k(x, x_i) + b )
- Similar to a single-layer perceptron
- Training examples x_i with non-zero coefficients α_i are support vectors
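A minimal example of this machinery in practice (scikit-learn is my choice here, not something the slides prescribe): fit an SVC with a Gaussian (RBF) kernel on a toy nonlinearly separable problem and inspect the support vectors.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Toy nonlinearly separable data: two concentric circles
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# RBF (Gaussian) kernel SVM; gamma plays the role of 1/(2 sigma^2)
clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)

# Training examples with non-zero coefficients alpha_i are the support vectors
print("number of support vectors:", clf.support_vectors_.shape[0])
print("training accuracy:", clf.score(X, y))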
Slide 12: Kernel Principal Component Analysis
KPCA carries out a linear PCA in the feature space F. The extracted features take the nonlinear form
f_k(x) = Σ_{i=1}^{l} α_i^k k(x_i, x).
The α_i^k are the components of the k-th eigenvector of the matrix K_ij = k(x_i, x_j).
Slide 13: KPCA and Dot Products
We wish to find the eigenvectors V and eigenvalues λ of the covariance matrix in feature space,
C = (1/l) Σ_{j=1}^{l} Φ(x_j) Φ(x_j)^T.
Again, replace
Φ(x) · Φ(y)
with
k(x, y).
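A minimal numpy sketch of this procedure, assuming a Gaussian kernel and the usual Gram-matrix centering; the function name kernel_pca and its parameters are illustrative, not from the paper.

import numpy as np

def kernel_pca(X, n_components=2, sigma=1.0):
    # Gaussian Gram matrix K[i, j] = k(x_i, x_j)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma**2))

    # Center the Gram matrix in feature space
    l = K.shape[0]
    one = np.ones((l, l)) / l
    Kc = K - one @ K - K @ one + one @ K @ one

    # Eigendecomposition; alpha[:, k] holds the coefficients alpha_i^k
    lam, alpha = np.linalg.eigh(Kc)
    idx = np.argsort(lam)[::-1][:n_components]
    lam, alpha = lam[idx], alpha[:, idx]

    # Rescale so the corresponding feature-space eigenvectors V_k have unit length
    alpha = alpha / np.sqrt(np.maximum(lam, 1e-12))

    # Nonlinear features: f_k(x_j) = sum_i alpha_i^k k(x_i, x_j)
    return Kc @ alpha

X = np.random.randn(100, 2)
features = kernel_pca(X, n_components=2, sigma=1.0)
print(features.shape)  # (100, 2)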
Slide 14: From Feature Space to Input Space
Pre-image problem
Here, Ψ is not in the image of Φ, so an exact pre-image need not exist.
Slide 15: Projection Distance Illustration
Approximate the vector Ψ ∈ F
Slide 16: Minimizing Projection Distance
z is an approximate pre-image for Ψ if it minimizes the distance between Ψ and the projection of Ψ onto span{Φ(z)}, i.e. if it maximizes
(Ψ · Φ(z))² / (Φ(z) · Φ(z)).
For kernels where k(z, z) = 1 (Gaussian), this reduces to maximizing
Ψ · Φ(z) = Σ_i α_i k(x_i, z).
Slide 17: Fixed-Point Iteration
Assuming a Gaussian kernel:
- α_i come from the eigenvectors of the centered Gram matrix
- x_i are the input space patterns
- σ is the width of the Gaussian
Requiring no step size, we can iterate
z_{t+1} = Σ_i α_i exp(−||z_t − x_i||² / (2σ²)) x_i  /  Σ_i α_i exp(−||z_t − x_i||² / (2σ²)).
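A sketch of this iteration in numpy, under the same Gaussian-kernel assumption; the helper name preimage_fixed_point, the starting point, and the stopping tolerance are my own choices.

import numpy as np

def preimage_fixed_point(alpha, X, sigma=1.0, n_iter=100, tol=1e-8):
    # Approximate pre-image z of Psi = sum_i alpha_i Phi(x_i) for a Gaussian kernel
    z = X[np.argmax(np.abs(alpha))].copy()   # start from a strongly weighted pattern
    for _ in range(n_iter):
        # Gaussian weights alpha_i * exp(-||z - x_i||^2 / (2 sigma^2))
        w = alpha * np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * sigma**2))
        denom = w.sum()
        if abs(denom) < 1e-12:               # guard against a degenerate step
            break
        z_new = (w[:, None] * X).sum(axis=0) / denom
        if np.linalg.norm(z_new - z) < tol:
            z = z_new
            break
        z = z_new
    return z

# Toy usage: coefficients alpha chosen arbitrarily here, just to exercise the iteration
X = np.random.randn(50, 2)
alpha = np.random.randn(50)
print(preimage_fixed_point(alpha, X, sigma=1.0))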
Slide 18: Kernel PCA Toy Example
Generated an artificial data set from three point sources, 100 points each.
Slide 19: De-noising by Reconstruction, Part One
- Reconstruction from projections onto the eigenvectors from the previous example
- Generated 20 new points from each Gaussian
- Represented by their first n = 1, 2, ..., 8 nonlinear principal components
Slide 20: De-noising by Reconstruction, Part Two
- The original points move in the direction of de-noising
Slide 21: De-noising in Two Dimensions
- A half circle and a square in the plane
- The de-noised versions are the solid lines
Slide 22: De-noising USPS Data Patterns
- Patterns: 7291 training, 2007 test
- Size: 16 x 16 pixels
[Figure: de-noising results, comparing linear PCA with kernel PCA.]
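As a rough, illustrative counterpart (the slides use USPS digits and the fixed-point pre-image method, whereas scikit-learn's KernelPCA learns an approximate inverse map when fit_inverse_transform=True), kernel PCA de-noising can be tried on the 8x8 digits data:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

# 8x8 digits as a stand-in for the 16x16 USPS patterns
X, _ = load_digits(return_X_y=True)
X = X / 16.0
rng = np.random.RandomState(0)
X_noisy = X + rng.normal(scale=0.2, size=X.shape)   # add Gaussian noise

# Kernel PCA with a Gaussian (RBF) kernel; inverse_transform approximates the pre-image step
kpca = KernelPCA(n_components=32, kernel="rbf", gamma=0.05,
                 fit_inverse_transform=True, alpha=1e-3)
kpca.fit(X_noisy)
X_denoised = kpca.inverse_transform(kpca.transform(X_noisy))

print("reconstruction MSE:", np.mean((X_denoised - X) ** 2))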
Slide 23: Questions