Title: Sparse Kernel Machines
1 Sparse Kernel Machines
- Christopher M. Bishop, Pattern Recognition and Machine Learning
2 Outline
- Introduction to kernel methods
- Support vector machines (SVM)
- Relevance vector machines (RVM)
- Applications
- Conclusions
3 Supervised Learning
- In machine learning, applications in which the
training data comprises examples of the input
vectors along with their corresponding target
vectors are called supervised learning
[Figure: training pairs (x, t) such as (1, 60, pass),
(2, 53, fail), (3, 77, pass), (4, 34, fail) are used to
learn a function y(x) whose output predicts the target
for a new input]
4 Classification
[Figure: two-class data in the (x1, x2) plane; the
decision boundary y(x) = 0 separates the region
y(x) > 0 (class t = +1) from y(x) < 0 (class t = -1)]
5 Regression
[Figure: regression example; training points (x, t) on a
sinusoidal curve with t in [-1, 1] and x in [0, 1], and a
prediction made at a new x]
6 Linear Models
- Linear models for regression and classification
take the form $y(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$
- If we apply feature extraction $\phi$, the model
becomes $y(\mathbf{x}) = \mathbf{w}^\top \phi(\mathbf{x}) + b$,
where $\mathbf{w}$ is the model parameter and
$\mathbf{x}$ is the input (a sketch follows below)
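A minimal sketch of such a linear model with Gaussian basis functions, fit by least squares; the basis centers, width, and toy sinusoidal data are illustrative assumptions, not values from the slides:

```python
import numpy as np

def gaussian_basis(x, centers, width=1.0):
    """phi(x): one Gaussian bump per center, plus a bias term."""
    phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))
    return np.hstack([np.ones((len(x), 1)), phi])  # prepend bias column

# Fit w by least squares for y(x) = w^T phi(x)
x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + 0.1 * np.random.randn(10)
centers = np.linspace(0, 1, 5)
Phi = gaussian_basis(x_train, centers)
w, *_ = np.linalg.lstsq(Phi, t_train, rcond=None)

# Predict at a new input
x_new = np.array([0.5])
print(gaussian_basis(x_new, centers) @ w)
```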
7 Problems with Feature Space
- Why feature extraction? Working in high-dimensional
feature spaces solves the problem of expressing
complex functions
- Problems
  - computational cost (working with very large vectors)
  - curse of dimensionality
8 Kernel Methods (1)
- A kernel function is an inner product in some feature
space, $k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^\top \phi(\mathbf{x}')$,
i.e. a nonlinear similarity measure
- Examples (see the sketch below)
  - polynomial: $k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^\top \mathbf{x}' + c)^M$
  - Gaussian: $k(\mathbf{x}, \mathbf{x}') = \exp(-\|\mathbf{x} - \mathbf{x}'\|^2 / 2\sigma^2)$
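A hedged sketch of the two example kernels as plain functions; the degree, offset, and width parameters below are illustrative choices:

```python
import numpy as np

def polynomial_kernel(x, z, degree=2, c=1.0):
    # k(x, z) = (x^T z + c)^M
    return (x @ z + c) ** degree

def gaussian_kernel(x, z, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
print(polynomial_kernel(x, z), gaussian_kernel(x, z))
```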
9 Kernel Methods (2)
- Many linear models can be reformulated using a
dual representation where the kernel functions
arise naturally, so they only require inner products
between data (input) points
- In the dual representation the prediction takes the
form $y(\mathbf{x}) = \sum_n a_n k(\mathbf{x}, \mathbf{x}_n)$
10 Kernel Methods (3)
- We can benefit from the kernel trick
  - choosing a kernel function is equivalent to
choosing $\phi$, so there is no need to specify what
features are being used
  - we can save computation by not explicitly
mapping the data to feature space, but just
working out the inner product directly in the data
space (see the verification sketch below)
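A small check of the kernel trick, assuming the quadratic kernel $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z})^2$ in two dimensions: the kernel evaluated in data space matches the inner product of the explicit feature map $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, so the mapping never has to be computed:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D inputs
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print((x @ z) ** 2)     # kernel computed in data space
print(phi(x) @ phi(z))  # same value via explicit features
```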
11 Kernel Methods (4)
- Kernel methods exploit information about the
inner products between data items
- We can construct kernels indirectly by choosing a
feature space mapping $\phi$, or directly by choosing a
valid kernel function
- If a bad kernel function is chosen, it will map
to a space with many irrelevant features, so we
need some prior knowledge of the target
12 Kernel Methods (5)
- Two basic modules for kernel methods
  - a general purpose learning model
  - a problem specific kernel function
13 Kernel Methods (6)
- Limitation: the kernel function $k(\mathbf{x}_n, \mathbf{x}_m)$ must be
evaluated for all possible pairs $\mathbf{x}_n$ and $\mathbf{x}_m$ of
training points when making predictions for new
data points
- A sparse kernel machine makes predictions using only
a subset of the training data points
14 Outline
- Introduction to kernel methods
- Support vector machines (SVM)
- Relevance vector machines (RVM)
- Applications
- Conclusions
15 Support Vector Machines (1)
- Support vector machines are a system for
efficiently training linear machines in
kernel-induced feature spaces, respecting the
insights provided by generalization theory and
exploiting optimization theory
- Generalization theory describes how to control
the learning machine to prevent it from
overfitting
16 Support Vector Machines (2)
- To avoid overfitting, SVMs modify the error
function to a regularized form
$E(\mathbf{w}) = E_D(\mathbf{w}) + \lambda E_W(\mathbf{w})$
where the hyperparameter $\lambda$ balances the trade-off
- The aim of $E_W$ is to limit the estimated functions
to smooth functions (a minimal sketch follows below)
- As a side effect, SVMs obtain a sparse model
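A minimal sketch of this regularized error for a linear model $y(\mathbf{x}) = \mathbf{w}^\top \phi(\mathbf{x})$; the value of lambda is an illustrative assumption:

```python
import numpy as np

def regularized_error(w, Phi, t, lam=0.1):
    # E_D: sum-of-squares fit to the training targets
    data_term = 0.5 * np.sum((Phi @ w - t) ** 2)
    # E_W: penalizes large weights, favoring smooth functions
    weight_term = 0.5 * w @ w
    return data_term + lam * weight_term
```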
17 Support Vector Machines (3)
Fig. 1 Architecture of an SVM
18 SVM for Classification (1)
- The mechanism to prevent overfitting in
classification is the maximum margin classifier
- The SVM is fundamentally a two-class classifier
19 Maximum Margin Classifiers (1)
- The aim of classification is to find a (D-1)-dimensional
hyperplane that separates the data in a D-dimensional
space
- 2D example: a line $\mathbf{w}^\top \mathbf{x} + b = 0$
separating the plane
20 Maximum Margin Classifiers (2)
[Figure: the margin is the distance between the decision
boundary and the closest data points; the points lying on
the margin are the support vectors]
21 Maximum Margin Classifiers (3)
[Figure: two separating boundaries compared, one with a
small margin and one with a large margin]
22 Maximum Margin Classifiers (4)
- Intuitively, it is a robust solution
  - if we've made a small error in the location of
the boundary, this gives us the least chance of
causing a misclassification
- The concept of max margin is usually justified
using Vapnik's statistical learning theory
- Empirically it works well
23 SVM for Classification (2)
- After the optimization process, we obtain the
prediction model (see the sketch below)
$y(\mathbf{x}) = \sum_{n=1}^{N} a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b$
where $(\mathbf{x}_n, t_n)$ are the N training data points
- We find that $a_n$ is zero except for the support
vectors, hence the model is sparse
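An illustrative sketch using scikit-learn's SVC (not the slides' own implementation) on synthetic data, showing that only a subset of the training points end up as support vectors:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data
X, t = make_blobs(n_samples=200, centers=2, random_state=0)

clf = SVC(kernel="rbf", gamma=1.0).fit(X, t)
print(len(clf.support_), "support vectors out of", len(X), "points")
# clf.dual_coef_ stores a_n * t_n for the support vectors only
```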
24 SVM for Classification (3)
Fig. 2 Data from two classes in two dimensions,
showing contours of constant y(x) obtained from an
SVM with a Gaussian kernel function
25 SVM for Classification (4)
- For overlapping class distributions, SVMs allow
some of the training points to be misclassified,
giving a soft margin: slack variables $\xi_n \ge 0$
enter the objective through a penalty term
$C \sum_n \xi_n$
26 SVM for Classification (5)
- For multiclass problems, there are methods to
combine multiple two-class SVMs (see the sketch below)
  - one versus the rest
  - one versus one (more training time)
Fig. 3 Problems in multiclass classification
using multiple SVMs
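A sketch of the one-versus-the-rest strategy via scikit-learn's wrapper; the iris data set is just a convenient three-class example:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, t = load_iris(return_X_y=True)

# Trains one binary SVM per class, each separating that
# class from the rest
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, t)
print(len(ovr.estimators_))  # 3 binary SVMs for 3 classes
```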
27 SVM for Regression (1)
- For regression problems, the mechanism to prevent
overfitting is the $\epsilon$-insensitive error function
(see the sketch below)
$E_\epsilon(y(\mathbf{x}) - t) = \begin{cases} 0 & \text{if } |y(\mathbf{x}) - t| < \epsilon \\ |y(\mathbf{x}) - t| - \epsilon & \text{otherwise} \end{cases}$
[Figure: the quadratic error function vs. the
$\epsilon$-insensitive error function]
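A short sketch comparing the two error functions; the value of epsilon is an illustrative assumption:

```python
import numpy as np

def eps_insensitive(residual, eps=0.1):
    # Zero inside the tube |y(x) - t| <= eps, linear outside
    return np.maximum(0.0, np.abs(residual) - eps)

def quadratic(residual):
    return residual ** 2

r = np.linspace(-0.3, 0.3, 7)
print(eps_insensitive(r))  # zeros where |r| <= 0.1
print(quadratic(r))        # penalizes every nonzero residual
```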
28 SVM for Regression (2)
Fig. 4 The $\epsilon$-tube: points inside the tube incur no
error; points outside incur error $|y(\mathbf{x}) - t| - \epsilon$
29 SVM for Regression (3)
- After the optimization process, we obtain the
prediction model (see the sketch below)
$y(\mathbf{x}) = \sum_{n=1}^{N} (a_n - \hat{a}_n) k(\mathbf{x}, \mathbf{x}_n) + b$
- We find that $a_n$ and $\hat{a}_n$ are zero except for
the support vectors, hence the model is sparse
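An illustrative sketch with scikit-learn's SVR on toy sinusoidal data, showing that dual coefficients are kept only for the support vectors:

```python
import numpy as np
from sklearn.svm import SVR

x = np.linspace(0, 1, 50)[:, None]
t = np.sin(2 * np.pi * x).ravel() + 0.1 * np.random.randn(50)

reg = SVR(kernel="rbf", epsilon=0.1).fit(x, t)
print(len(reg.support_), "support vectors out of", len(x))
# reg.dual_coef_ holds (a_n - a_hat_n) for support vectors only
```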
30 SVM for Regression (4)
Fig. 5 Regression results. Support vectors lie on
the boundary of the tube or outside the tube
31 Disadvantages
- It is not sparse enough, since the number of
support vectors required typically grows linearly
with the size of the training set
- Predictions are not probabilistic
- The estimation of the error/margin trade-off
parameters must use cross-validation, which
wastes computation
- Kernel functions are limited (they must be valid
Mercer kernels)
- Multiclass classification is handled only
indirectly
32 Outline
- Introduction to kernel methods
- Support vector machines (SVM)
- Relevance vector machines (RVM)
- Applications
- Conclusions
33 Relevance Vector Machines (1)
- The relevance vector machine (RVM) is a Bayesian
sparse kernel technique that shares many of the
characteristics of the SVM whilst avoiding its
principal limitations
- The RVM is based on a Bayesian formulation and
provides posterior probabilistic outputs, as well
as having much sparser solutions than the SVM
34 Relevance Vector Machines (2)
- RVMs are intended to mirror the structure of the
SVM and use a Bayesian treatment to remove the
limitations of the SVM
- The kernel functions are simply treated as basis
functions, rather than as dot products in some
feature space
35 Bayesian Inference
- Bayesian inference allows one to model
uncertainty about the world and outcomes of
interest by combining common-sense knowledge and
observational evidence.
36 Relevance Vector Machines (3)
- In the Bayesian framework, we use a prior
distribution over $\mathbf{w}$ to avoid overfitting
$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_m \mathcal{N}(w_m \mid 0, \alpha_m^{-1})$
where $\boldsymbol{\alpha}$ is a hyperparameter which
controls the model parameter $\mathbf{w}$
37 Relevance Vector Machines (4)
- Goal: find the most probable $\boldsymbol{\alpha}$ and $\beta$
to compute the predictive distribution over $t_{new}$
for a new input $\mathbf{x}_{new}$, i.e.
$p(t_{new} \mid \mathbf{x}_{new}, \mathbf{X}, \mathbf{t}, \boldsymbol{\alpha}, \beta)$
- Maximize the marginal likelihood
$p(\mathbf{t} \mid \mathbf{X}, \boldsymbol{\alpha}, \beta)$
to obtain $\boldsymbol{\alpha}$ and $\beta$, where $\mathbf{X}$ and
$\mathbf{t}$ are the training data and their target values
38 Relevance Vector Machines (5)
- RVMs utilize automatic relevance determination to
achieve sparsity, where $\alpha_m$ represents the
precision of $w_m$
- In the procedure of finding $\alpha_m$, some $\alpha_m$
become infinite, which drives the corresponding $w_m$
to zero; the examples that remain are the relevance
vectors (see the sketch below)
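An RVM-style regression sketch under stated assumptions: scikit-learn has no RVM class, so this uses a kernel design matrix as basis functions with ARDRegression, whose automatic relevance determination prunes weights whose precision grows large; the kernel width and the pruning threshold below are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.metrics.pairwise import rbf_kernel

x = np.linspace(0, 1, 50)[:, None]
t = np.sin(2 * np.pi * x).ravel() + 0.1 * np.random.randn(50)

# One kernel-valued basis function per training point
Phi = rbf_kernel(x, x, gamma=10.0)

# ARD drives most weights toward zero; the survivors play
# the role of relevance vectors
rvm = ARDRegression().fit(Phi, t)
relevant = np.sum(np.abs(rvm.coef_) > 1e-3)
print(relevant, "relevance vectors out of", len(x))
```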
39 Comparisons - Regression
[Figure: SVM regression vs. RVM regression; the RVM panel
shows one standard deviation of the predictive distribution]
40 Comparisons - Regression
[Figure: regression comparison, continued]
41 Comparison - Classification
[Figure: SVM classification vs. RVM classification]
42 Comparison - Classification
[Figure: classification comparison, continued]
43 Comparisons
- RVMs are much sparser and make probabilistic
predictions
- The RVM gives better generalization in regression
- The SVM gives better generalization in classification
- The RVM is computationally demanding during training
44 Outline
- Introduction to kernel methods
- Support vector machines (SVM)
- Relevance vector machines (RVM)
- Applications
- Conclusions
45 Applications (1)
46 Applications (2)
Marti Hearst, "Support Vector Machines", 1998
47 Applications (3)
- In feature-matching based object tracking, SVMs
are used to detect false feature matches
Weiyu Zhu et al., "Tracking of Object with SVM
Regression", 2001
48 Applications (4)
- Recovering 3D human poses with the RVM
A. Agarwal and B. Triggs, "3D Human Pose from
Silhouettes by Relevance Vector Regression", 2004
49 Outline
- Introduction to kernel methods
- Support vector machines (SVM)
- Relevance vector machines (RVM)
- Applications
- Conclusions
50 Conclusions
- The SVM is a learning machine based on kernel
methods and generalization theory which can
perform binary classification and real-valued
function approximation tasks
- The RVM has the same model form as the SVM but
provides probabilistic predictions and sparser
solutions
51 References
- www.support-vector.net
- N. Cristianini and J. Shawe-Taylor, An
Introduction to Support Vector Machines and Other
Kernel-based Learning Methods, Cambridge
University Press, 2000
- M. E. Tipping, Sparse Bayesian Learning and the
Relevance Vector Machine, Journal of Machine
Learning Research, 2001
52 Underfitting and Overfitting
[Figure: an underfitting model (too simple) and an
overfitting model (too complex) evaluated on new data]
Adapted from http://www.dtreg.com/svm.htm