Title: Kernel Methods and SVMs
1. Kernel Methods and SVMs
2. Predictive Modeling

Goal: learn a mapping $y = f(x; \theta)$. We need:
1. A model structure
2. A score function
3. An optimization strategy
Categorical $y \in \{c_1, \dots, c_m\}$: classification. Real-valued $y$: regression. Note: we usually assume $c_1, \dots, c_m$ are mutually exclusive and exhaustive.
3. Simple Two-Class Perceptron

Initialize the weight vector $w_0 = 0$. Repeat one or more times (indexed by $k$):
  For each training data point $x_i$:
    If $y_i \langle w_k, x_i \rangle \le 0$, then update $w_{k+1} = w_k + \eta\, y_i x_i$. endIf
Each update is a gradient descent step on a misclassified point.
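A minimal NumPy sketch of the primal algorithm above; the learning rate eta, the epoch cap, and the function name are illustrative assumptions, not part of the slides.

    import numpy as np

    def perceptron(X, y, eta=1.0, max_epochs=100):
        """Primal perceptron: X of shape (n, d), labels y in {-1, +1}."""
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * np.dot(w, xi) <= 0:   # misclassified (or on the boundary)
                    w += eta * yi * xi        # gradient-descent-style update
                    mistakes += 1
            if mistakes == 0:                 # converged (separable data)
                break
        return w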
4. Perceptron Dual Form

Notice that $w$ ends up as a linear combination of the $y_j x_j$:
  $w = \sum_j \alpha_j y_j x_j$,
where $\alpha_j$ counts the updates made on example $j$, so $\alpha_j$ is bigger for harder examples. This leads to a dual form of the learning algorithm, shown on the next slide.
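Spelling out what this implies (the slide's own equation was lost in extraction): the classifier can be written entirely without $w$,

\[
  \operatorname{sign}\big(\langle w, x \rangle\big)
  = \operatorname{sign}\Big(\sum_j \alpha_j y_j \langle x_j, x \rangle\Big),
\]

so both training and prediction need only inner products between data points.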
5. Perceptron Dual Form

Initialize $\alpha = 0$. Repeat until no more mistakes:
  For each training data point $x_i$:
    If $y_i \sum_j \alpha_j y_j \langle x_j, x_i \rangle \le 0$, then $\alpha_i \leftarrow \alpha_i + 1$. endIf
Note that the training data enter the algorithm only via the inner products $\langle x_j, x_i \rangle$. This is generally true for linear models (e.g. linear regression, ridge regression).
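A NumPy sketch of this dual algorithm; precomputing the Gram matrix makes explicit that only inner products are used. The function name and epoch cap are my additions.

    import numpy as np

    def dual_perceptron(X, y, max_epochs=100):
        """Dual perceptron: data enter only through inner products <x_j, x_i>."""
        n = X.shape[0]
        G = X @ X.T                        # Gram matrix of pairwise inner products
        alpha = np.zeros(n)
        for _ in range(max_epochs):
            mistakes = 0
            for i in range(n):
                if y[i] * np.sum(alpha * y * G[:, i]) <= 0:
                    alpha[i] += 1          # harder examples accumulate larger alpha_i
                    mistakes += 1
            if mistakes == 0:
                break
        return alpha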
6. Learning in Feature Space

We have already seen the idea of changing the representation of the predictors: map $x = (x_1, \dots, x_d)$ to $\phi(x) = (\phi_1(x), \dots, \phi_N(x))$. The space $F$ of transformed predictors is called the feature space.
7. Linear Feature Space Models

Now consider models of the form $f(x) = \sum_j w_j \phi_j(x)$, or equivalently $f(x) = \langle w, \phi(x) \rangle$.
A kernel is a function $K$ such that for all $x, z \in X$,
  $K(x, z) = \langle \phi(x), \phi(z) \rangle$,
where $\phi$ is a mapping from $X$ to an inner product feature space $F$. To use such models we just need to know $K$, not $\phi$!
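A numerical check of this identity for one concrete pair; the quadratic kernel $K(x,z) = \langle x, z \rangle^2$ and its explicit feature map for 2-D inputs are my illustrative choices, not the slide's.

    import numpy as np

    def K(x, z):
        return np.dot(x, z) ** 2           # quadratic kernel, no phi required

    def phi(x):                            # explicit feature map for d = 2
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    assert np.isclose(K(x, z), np.dot(phi(x), phi(z)))   # <x,z>^2 == <phi(x), phi(z)>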
8. Making Kernels

What properties must $K$ satisfy to be a kernel?
1. Symmetry: $K(x, z) = K(z, x)$
2. Cauchy-Schwarz: $K(x, z)^2 \le K(x, x)\, K(z, z)$
These are necessary but not sufficient; other conditions are required (see Mercer's theorem below).
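The missing condition is that every Gram matrix built from $K$ be positive semi-definite. A sketch checking this numerically for the Gaussian (RBF) kernel; the kernel choice, data, and tolerance are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    # RBF Gram matrix: G_ij = exp(-||x_i - x_j||^2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    G = np.exp(-sq)
    eigvals = np.linalg.eigvalsh(G)        # G is symmetric, so eigvalsh applies
    assert eigvals.min() > -1e-10          # PSD up to floating-point error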
9. Mercer's Theorem

$K$ is positive semi-definite: $\sum_{i,j} a_i a_j K(x_i, x_j) \ge 0$ for every finite set of points $\{x_i\}$ and coefficients $\{a_i\}$.
Mercer's Theorem gives necessary and sufficient conditions for a continuous symmetric function $K$ to admit the representation
  $K(x, z) = \sum_i \gamma_i\, \phi_i(x)\, \phi_i(z)$, with $\gamma_i \ge 0$.
Such "Mercer kernels" define a set of functions $H_K$, elements of which have an expansion $f(x) = \sum_i c_i \phi_i(x)$. So some kernels correspond to infinite numbers of transformed predictor variables.
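For reference, a standard statement of the theorem (the integral-operator form is my addition, not taken from the slide): for $K$ continuous, symmetric, and positive semi-definite on a compact domain,

\[
  (T_K f)(x) = \int_X K(x, z)\, f(z)\, dz, \qquad
  K(x, z) = \sum_{i=1}^{\infty} \gamma_i\, \phi_i(x)\, \phi_i(z),
\]

where the $\gamma_i \ge 0$ and orthonormal $\phi_i$ are the eigenvalues and eigenfunctions of the operator $T_K$, and the series converges absolutely and uniformly.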
10. Reproducing Kernel Hilbert Space

Define an inner product in this function space: for $f = \sum_i c_i \phi_i$ and $g = \sum_i d_i \phi_i$, let
  $\langle f, g \rangle_{H_K} = \sum_i c_i d_i / \gamma_i$.
Note then that $\langle K(\cdot, x), f \rangle_{H_K} = f(x)$. This is the reproducing property of $H_K$. Also note that for a Mercer kernel, $\|f\|_{H_K}^2 = \sum_i c_i^2 / \gamma_i < \infty$ for every $f \in H_K$.
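The one line of algebra behind the reproducing property, implied but not shown on the slide: since $K(\cdot, x) = \sum_i \gamma_i \phi_i(x)\, \phi_i$ by the Mercer expansion,

\[
  \langle K(\cdot, x), f \rangle_{H_K}
  = \sum_i \frac{\gamma_i \phi_i(x)\, c_i}{\gamma_i}
  = \sum_i c_i\, \phi_i(x)
  = f(x).
\]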
11. Regularization and RKHS

A general class of regularization problems has the form
  $\min_{f \in H} \sum_{i=1}^n L(y_i, f(x_i)) + \lambda J(f)$,
where $L$ is some loss function (e.g. squared loss) and $J(f)$ penalizes complex $f$. Suppose $f$ lives in an RKHS with $J(f) = \|f\|_{H_K}^2$, and let
  $f(x) = \sum_{i=1}^n \alpha_i K(x, x_i)$.
Then we only need to solve the easy finite-dimensional problem
  $\min_\alpha L(y, K\alpha) + \lambda\, \alpha^\top K \alpha$.
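Spelling out why the problem reduces to a finite-dimensional one (a step the slide leaves implicit): using the reproducing property $\langle K(\cdot, x_i), K(\cdot, x_j) \rangle_{H_K} = K(x_i, x_j)$,

\[
  f(x_i) = \sum_j \alpha_j K(x_i, x_j) = (K\alpha)_i, \qquad
  J(f) = \Big\langle \sum_i \alpha_i K(\cdot, x_i),\, \sum_j \alpha_j K(\cdot, x_j) \Big\rangle_{H_K}
       = \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) = \alpha^\top K \alpha.
\]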
12. RKHS Examples

For regression with squared error loss, we have
  $\min_\alpha \|y - K\alpha\|^2 + \lambda\, \alpha^\top K \alpha$,
so that $\hat{\alpha} = (K + \lambda I)^{-1} y$. This generalizes smoothing splines. Choosing $K(x, z) = \|x - z\|^2 \log \|x - z\|$ leads to the thin-plate spline models.
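A NumPy sketch of this closed-form fit, $\hat{\alpha} = (K + \lambda I)^{-1} y$; the RBF kernel, bandwidth gamma, and penalty lam are assumed hyperparameters for illustration.

    import numpy as np

    def rbf_gram(X, Z, gamma=1.0):
        sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def kernel_ridge_fit(X, y, lam=0.1, gamma=1.0):
        K = rbf_gram(X, X, gamma)
        return np.linalg.solve(K + lam * np.eye(len(X)), y)   # alpha-hat = (K + lam*I)^{-1} y

    def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
        return rbf_gram(X_new, X_train, gamma) @ alpha        # f(x) = sum_i alpha_i K(x, x_i)

    # Usage on a toy 1-D regression problem:
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(50, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
    alpha = kernel_ridge_fit(X, y)
    y_hat = kernel_ridge_predict(X, alpha, X)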
13. Support Vector Machine

A two-class classifier of the form $G(x) = \operatorname{sign}(f(x))$, with
  $f(x) = \sum_{i=1}^n \alpha_i y_i K(x, x_i) + \beta_0$,
and parameters chosen to minimize a penalized hinge loss,
  $\sum_{i=1}^n [1 - y_i f(x_i)]_+ + \lambda \|f\|_{H_K}^2$.
Many of the fitted $\alpha_i$ are usually zero; the $x_i$ corresponding to the non-zero $\alpha_i$ are the support vectors.
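A short scikit-learn sketch showing this sparsity in practice; the toy dataset, RBF kernel, and C value are my illustrative choices, not the slide's.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)    # nonlinearly separable toy labels

    clf = SVC(kernel="rbf", C=1.0).fit(X, y)
    # Only the support vectors carry non-zero alpha_i; all other points drop out of f(x).
    print("support vectors:", clf.support_vectors_.shape[0], "of", len(X))
    print("non-zero dual coefficients:", clf.dual_coef_.shape)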