Transcript: Input Space versus Feature Space in Kernel-Based Methods
1
Input Space versus Feature Space in Kernel-Based Methods
Schölkopf, Mika, Burges, Knirsch, Müller, Rätsch, Smola
Presented by: Joe Drish, Department of Computer Science and Engineering, University of California, San Diego
2
Goals
Objectives of the paper
  • Introduce and illustrate the kernel trick
  • Discuss the kernel mapping from input space to feature space F
  • Review kernel algorithms: SVMs and kernel PCA
  • Discuss how to interpret the return from F back to input space after the dot product computation
  • Discuss how to construct sparse approximations of feature space expansions
  • Evaluate and discuss the performance of SVMs and kernel PCA

Applications of kernel methods
  • Handwritten digit recognition
  • Face recognition
  • De-noising (this paper)

3
Definition
  • A reproducing kernel k is a function k : X × X → R.
  • The domain of k consists of the data patterns x_1, ..., x_ℓ ∈ X.
  • X is a compact set in which the data lives.
  • X is typically a subset of R^N.

Computing k is equivalent to mapping the data patterns into a higher-dimensional space F and then taking the dot product there. A feature map Φ : R^N → F is a function that maps the input data patterns into the higher-dimensional space F.
4
Illustration
  • Using a feature map Φ to map the data from input space into a higher-dimensional feature space F

[Figure: input patterns (x's and o's) in input space and their images Φ(x), Φ(o) in the feature space F]
5
Kernel Trick
  • We would like to compute the dot product in the higher-dimensional space,

      Φ(x) · Φ(y).

To do this we only need to compute

      k(x, y),

since

      k(x, y) = Φ(x) · Φ(y).

Note that the feature map Φ is never explicitly computed. We avoid this, and therefore avoid a burdensome computational task.
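A minimal numerical check of the trick (not from the slides; the degree-2 polynomial kernel and its explicit feature map are assumptions for illustration): for x, y in R^2, the map Φ(x) = (x_1², √2 x_1 x_2, x_2²) satisfies Φ(x) · Φ(y) = (x · y)².

    import numpy as np

    def phi(x):
        # Explicit degree-2 polynomial feature map on R^2 (illustration only).
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    def poly_kernel(x, y):
        # k(x, y) = (x . y)^2 -- the same dot product, computed in input space.
        return np.dot(x, y) ** 2

    x = np.array([1.0, 2.0])
    y = np.array([3.0, -1.0])

    print(np.dot(phi(x), phi(y)))  # 1.0 -- dot product in feature space
    print(poly_kernel(x, y))       # 1.0 -- same value, phi never computed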
6
Example kernels
Gaussian: k(x, y) = exp(−||x − y||² / (2σ²))
Polynomial: k(x, y) = (x · y)^d
Sigmoid: k(x, y) = tanh(κ(x · y) + θ)
Nonlinear separation can be achieved.
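A sketch of these three kernels as plain NumPy functions; the parameter names sigma, d, kappa, and theta are mine:

    import numpy as np

    def gaussian_kernel(x, y, sigma=1.0):
        # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
        return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

    def polynomial_kernel(x, y, d=3):
        # k(x, y) = (x . y)^d
        return np.dot(x, y) ** d

    def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
        # k(x, y) = tanh(kappa (x . y) + theta); satisfies Mercer's condition
        # only for certain parameter choices.
        return np.tanh(kappa * np.dot(x, y) + theta)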
7
Nonlinear Separation
8
Mercer Theory
Input Space to Feature Space
Necessary condition for the kernel trick (Mercer): the kernel must admit an expansion with non-negative eigenvalues.
  • N_F, the dimensionality of F, plays the role of the rank of the outer-product expansion Σ_i λ_i u_i u_iᵀ
  • ψ is the normalized eigenfunction, analogous to a normalized eigenvector
9
Mercer Linear Algebra
Linear algebra analogy:

  Eigenvector problem        Eigenfunction problem
  A                          k(x, y)
  u, λ                       ψ, λ

  • x and y are vectors
  • u is the normalized eigenvector
  • λ is the eigenvalue
  • ψ is the normalized eigenfunction
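Written out (standard form of Mercer's theorem, not reproduced on the slide), the analogy is:

    % Matrix eigenvalue problem and its spectral expansion
    A u = \lambda u, \qquad A = \sum_i \lambda_i \, u_i u_i^{\top}

    % Kernel eigenfunction problem and the Mercer expansion
    \int k(x, y)\, \psi_i(y)\, dy = \lambda_i\, \psi_i(x), \qquad
    k(x, y) = \sum_{i=1}^{N_F} \lambda_i\, \psi_i(x)\, \psi_i(y), \quad \lambda_i \ge 0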

10
RKHS, Capacity, Metric
  • Reproducing kernel Hilbert space (RKHS): a Hilbert space of functions f on some set X such that all evaluation functionals are continuous, and the functions can be reproduced by the kernel
  • Capacity of the kernel map: a bound on how many training examples are required for learning, measured by the VC-dimension h
  • Metric of the kernel map: the intrinsic shape of the manifold to which the data is mapped

11
Support Vector Machines
The decision function takes the form

      f(x) = sgn( Σ_i α_i y_i k(x, x_i) + b )

  • Similar to a single-layer perceptron
  • Training examples x_i with non-zero coefficients α_i are support vectors
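A minimal usage sketch with scikit-learn's SVC on a synthetic, non-linearly-separable data set (the data and parameter values are illustrative assumptions):

    import numpy as np
    from sklearn.svm import SVC

    # Toy two-class problem: points inside vs. outside the unit circle.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (np.sum(X ** 2, axis=1) > 1.0).astype(int)

    # Gaussian (RBF) kernel SVM; gamma corresponds to 1 / (2 sigma^2).
    clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

    # Support vectors are the training examples with non-zero coefficients.
    print("support vectors per class:", clf.n_support_)
    print("training accuracy:", clf.score(X, y))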

12
Kernel Principal Component Analysis
KPCA carries out a linear PCA in the feature space F. The extracted features take the nonlinear form

      f_k(x) = Σ_i α_i^k k(x_i, x),

where the α_i^k are the components of the k-th eigenvector of the matrix K_ij = k(x_i, x_j).
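A usage sketch with scikit-learn's KernelPCA (parameter values are illustrative; note that its inverse_transform learns an approximate pre-image map by regression rather than using the fixed-point iteration discussed later):

    import numpy as np
    from sklearn.decomposition import KernelPCA

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 2))

    # First 8 nonlinear principal components with a Gaussian (RBF) kernel.
    kpca = KernelPCA(n_components=8, kernel="rbf", gamma=0.5,
                     fit_inverse_transform=True)
    Z = kpca.fit_transform(X)               # nonlinear features f_k(x)
    X_denoised = kpca.inverse_transform(Z)  # approximate pre-images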
13
KPCA and Dot Products
We wish to find eigenvectors V and eigenvalues λ of the covariance matrix in the feature space F,

      C = (1/ℓ) Σ_i Φ(x_i) Φ(x_i)ᵀ.

Again, replace every dot product

      Φ(x) · Φ(y)

with

      k(x, y).
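A bare-bones NumPy sketch of how the eigenproblem is solved through the kernel (Gram) matrix instead of the feature-space covariance matrix; the Gaussian kernel and all names are assumptions for illustration:

    import numpy as np

    def rbf_gram(X, sigma=1.0):
        # K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.exp(-d2 / (2 * sigma ** 2))

    def kernel_pca(X, n_components=2, sigma=1.0):
        K = rbf_gram(X, sigma)
        n = K.shape[0]
        one = np.ones((n, n)) / n
        Kc = K - one @ K - K @ one + one @ K @ one   # center the data in feature space
        eigval, eigvec = np.linalg.eigh(Kc)          # ascending eigenvalues
        eigval, eigvec = eigval[::-1], eigvec[:, ::-1]
        # Normalize eigenvectors so the corresponding V have unit length in F
        # (assumes the leading eigenvalues are strictly positive).
        alphas = eigvec[:, :n_components] / np.sqrt(eigval[:n_components])
        return Kc @ alphas                           # projections of the training points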
14
From Feature Space to Input Space
Pre-image problem: given Ψ ∈ F, find a point z in input space with Φ(z) ≈ Ψ.
Here, Ψ need not lie in the image of Φ, so an exact pre-image may not exist.
15
Projection Distance Illustration
Approximate the vector Ψ ∈ F by the image Φ(z) of an input space point z.
16
Minimizing Projection Distance
z is an approximate pre-image for Ψ if it minimizes the projection distance

      ||Φ(z) − Ψ||².

Equivalently, maximize

      (Ψ · Φ(z))² / (Φ(z) · Φ(z)).

For kernels where k(z, z) = 1 (e.g., the Gaussian), this reduces to maximizing

      Ψ · Φ(z) = Σ_i γ_i k(x_i, z).
17
Fixed-point iteration
So, assuming a Gaussian kernel:
  • the coefficients γ_i come from the eigenvectors of the centered Gram matrix (Ψ = Σ_i γ_i Φ(x_i))
  • x_i are the input space patterns
  • σ is the kernel width

Requiring no step size, we can iterate

      z_{t+1} = Σ_i γ_i exp(−||z_t − x_i||² / (2σ²)) x_i  /  Σ_i γ_i exp(−||z_t − x_i||² / (2σ²)).
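A bare-bones sketch of this fixed-point iteration for the Gaussian kernel; the function and variable names are mine, and gamma holds the expansion coefficients of Ψ over the training patterns:

    import numpy as np

    def preimage_fixed_point(X, gamma, sigma=1.0, z0=None, n_iter=100, tol=1e-8):
        # Approximate pre-image z of Psi = sum_i gamma_i Phi(x_i), Gaussian kernel.
        z = X.mean(axis=0) if z0 is None else np.asarray(z0, dtype=float)
        for _ in range(n_iter):
            # w_i = gamma_i * exp(-||z - x_i||^2 / (2 sigma^2))
            w = gamma * np.exp(-np.sum((z - X) ** 2, axis=1) / (2 * sigma ** 2))
            denom = w.sum()
            if abs(denom) < 1e-12:        # degenerate weights; stop early
                break
            z_new = (w @ X) / denom       # weighted mean of the training patterns
            if np.linalg.norm(z_new - z) < tol:
                return z_new
            z = z_new
        return z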
18
Kernel PCA Toy Example
An artificial data set was generated from three point sources, 100 points each.
19
De-noising by Reconstruction, Part One
  • Reconstruction from projections onto the eigenvectors from the previous example
  • Generated 20 new points from each Gaussian
  • Represented by their first n = 1, 2, ..., 8 nonlinear principal components

20
De-noising by Reconstruction, Part Two
  • The original points move in the direction of de-noising

21
De-noising in Two Dimensions
  • A half circle and a square in the plane
  • De-noised versions are the solid lines

22
De-noising USPS data patterns
Patterns: 7291 training, 2007 test. Size: 16 x 16 pixels.
[Figure: de-noised USPS digits, linear PCA versus kernel PCA]
23
Questions