Transcript and Presenter's Notes

Title: Classification: Support Vector Machine


1
Classification: Support Vector Machine
  • 10/10/07

2
What hyperplane (line) can separate the two
classes of data?
3
What hyperplane (line) can separate the two
classes of data?
But there are many other choices! Which one is
the best?
4
[Figure: a separating hyperplane shown with its margin M]
What hyperplane (line) can separate the two
classes of data?
But there are many other choices! Which one is
the best?
5
Optimal separating hyperplane
[Figure: two candidate separating hyperplanes, each shown with its margin M]
The best hyperplane is the one that maximizes the
margin, M.
6
Computing the margin width
  • A hyperplane is x^T b + b0 = 0.
  • Find x+ and x- on the plus plane (x^T b + b0 = 1) and the minus plane (x^T b + b0 = -1) so that x+ - x- is perpendicular to b. Then M = ||x+ - x-||.

[Figure: parallel planes x^T b + b0 = 1, 0, -1, with x+ on the plus plane, x- on the minus plane, and normal vector b]
7
Computing the margin width
A hyperplane is x^T b + b0 = 0.
Find x+ and x- on the plus and minus planes so that x+ - x- is perpendicular to b. Then M = ||x+ - x-||.
Since x+^T b + b0 = 1 and x-^T b + b0 = -1, subtracting gives (x+ - x-)^T b = 2.
Because x+ - x- is parallel to b, ||x+ - x-|| ||b|| = 2, so M = ||x+ - x-|| = 2 / ||b||.

[Figure: same construction as the previous slide]
8
Computing the margin width
The hyperplane separates the two classes if y_i (x_i^T b + b0) > 0 for all i.
The maximization problem is: maximize M over b, b0, subject to y_i (x_i^T b + b0) / ||b|| >= M for all i.

[Figure: optimal separating hyperplane with margin M; the points lying on the margin boundaries are the support vectors]
9
Optimal separating hyperplane
  • Rewrite the problem as: minimize (1/2) ||b||^2 over b, b0,
  • subject to y_i (x_i^T b + b0) >= 1 for all i.
  • Lagrange function: L = (1/2) ||b||^2 - sum_i a_i [ y_i (x_i^T b + b0) - 1 ].
  • To minimize, set the partial derivatives to 0: b = sum_i a_i y_i x_i and sum_i a_i y_i = 0.
  • Can be solved by quadratic programming (a small numerical sketch follows below).
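Below is a minimal numerical sketch of this step (not from the slides): scikit-learn's linear SVC, which solves the same kind of quadratic program internally, is fit with a very large C to approximate the hard-margin case; the toy data and the value of C are assumptions for illustration.

# Minimal sketch (assumption: scikit-learn is available); a very large C
# approximates the hard-margin, separable case on this slide.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 1.0, size=(20, 2)),   # class +1 cloud
               rng.normal(-2.0, 1.0, size=(20, 2))])  # class -1 cloud
y = np.array([+1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # fit the (near) hard-margin SVM
b = clf.coef_[0]               # normal vector b of the hyperplane x^T b + b0 = 0
b0 = clf.intercept_[0]         # offset b0
M = 2.0 / np.linalg.norm(b)    # margin width, M = 2 / ||b||
print("margin width M:", M)
print("number of support vectors:", len(clf.support_))

The same b, b0, and support vectors could also be recovered by handing the dual problem of slide 36 to any quadratic programming solver.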

10
When the two classes are non-separable
What is the best hyperplane?
Idea: allow some points to lie on the wrong side, but not by much.
11
Support vector machine
  • When the two classes are not separable, the problem is slightly modified.
  • Find the minimum over b, b0 of (1/2) ||b||^2 + C sum_i ξ_i,
  • subject to y_i (x_i^T b + b0) >= 1 - ξ_i and ξ_i >= 0 for all i, where the ξ_i are slack variables and C controls the penalty for points on the wrong side of the margin.
  • Can be solved using quadratic programming (a sketch with different values of C follows below).
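As a hedged illustration of the modified problem (not from the slides), the sketch below fits the linear soft-margin SVM on overlapping classes with two values of the cost parameter C, which controls how far points may lie on the wrong side of the margin; the data and the C values are arbitrary.

# Minimal sketch (assumption: scikit-learn): in the soft-margin SVM the
# parameter C penalizes the total slack of points on the wrong side.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(+1.0, 1.5, size=(50, 2)),   # overlapping class +1
               rng.normal(-1.0, 1.5, size=(50, 2))])  # overlapping class -1
y = np.array([+1] * 50 + [-1] * 50)

for C in (0.1, 100.0):   # small C: wider margin, more slack; large C: narrower margin
    clf = SVC(kernel="linear", C=C).fit(X, y)
    M = 2.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C}: margin width={M:.3f}, support vectors={len(clf.support_)}")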

12
Converting a non-separable case into a separable one by a nonlinear transformation
non-separable in 1D
13
Converting a non-separable case into a separable one by a nonlinear transformation
separable in 1D
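A common way to make this concrete (an assumption here, since the slide's figure is not reproduced) is 1D data in which one class sits between two clusters of the other: no single threshold separates the classes, but adding the squared coordinate as a second feature makes them linearly separable.

# Minimal sketch: 1D data that no threshold separates becomes linearly
# separable after the nonlinear transform x -> (x, x^2).
import numpy as np
from sklearn.svm import LinearSVC

x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = np.array([-1, -1, -1, +1, +1, +1, -1, -1, -1])   # +1 class lies between the -1 clusters

X_1d = x.reshape(-1, 1)                # original 1D feature
X_2d = np.column_stack([x, x ** 2])    # transformed features (x, x^2)

print("1D accuracy:", LinearSVC(C=10.0).fit(X_1d, y).score(X_1d, y))   # below 1.0
print("2D accuracy:", LinearSVC(C=10.0).fit(X_2d, y).score(X_2d, y))   # expected 1.0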
14
Kernel function
  • Introduce nonlinear basis (transformation) functions h(x), and work with the transformed features.
  • Then the separating function is f(x) = h(x)^T b + b0.
  • In fact, all you need is the kernel function K(x, x') = <h(x), h(x')>.
  • Common kernels: d-th degree polynomial K(x, x') = (1 + <x, x'>)^d; radial basis K(x, x') = exp(-γ ||x - x'||^2); sigmoid K(x, x') = tanh(κ1 <x, x'> + κ2). (A usage sketch follows below.)
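For context (an assumption, using scikit-learn rather than anything referenced on the slide), the sketch below compares a linear, a polynomial, and a radial-basis kernel on a toy "two rings" problem where only the nonlinear kernels can separate the classes.

# Minimal sketch (assumption: scikit-learn): the kernel trick only needs
# K(x, x'), never the explicit transform h(x).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)
r = np.concatenate([rng.normal(1.0, 0.1, 100), rng.normal(3.0, 0.1, 100)])  # two radii
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])   # concentric rings
y = np.array([-1] * 100 + [+1] * 100)

for kernel, params in [("linear", {}),
                       ("poly", {"degree": 2, "coef0": 1.0}),  # (1 + <x, x'>)^d style kernel
                       ("rbf", {"gamma": 1.0})]:               # exp(-gamma ||x - x'||^2)
    acc = cross_val_score(SVC(kernel=kernel, **params), X, y, cv=5).mean()
    print(f"{kernel:6s} kernel: cross-validated accuracy = {acc:.2f}")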

15
Applications
16
Prediction of central nervous system embryonic tumor outcome
  • 42 patient samples
  • 5 cancer types
  • Array contains 6817 genes
  • Question: are different tumor types distinguishable from their gene expression patterns?

(Pomeroy et al. 2002)
17
(Pomeroy et al. 2002)
18
Gene expression profiles within a cancer type cluster together
(Pomeroy et al. 2002)
19
PCA based on all genes
(Pomeroy et al. 2002)
20
PCA based on a subset of informative genes
(Pomeroy et al. 2002)
21
(No Transcript)
22
(No Transcript)
23
Classification and diagnostic prediction of
cancers using gene expression profiling and
artificial neural networks
  • Four different cancer types.
  • 88 samples
  • 6567 genes
  • Goal: predict cancer types from gene expression data

(Khan et al. 2001)
24
Classification and diagnostic prediction of
cancers using gene expression profiling and
artificial neural networks
(Khan et al. 2001)
25
Procedures
  • Filter out genes that have low expression values (retain 2308 genes)
  • Dimension reduction using PCA --- select the top 10 principal components
  • 3-fold cross-validation (a pipeline sketch follows below)

(Khan et al. 2001)
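The listed procedure maps onto a standard pipeline; the sketch below is a hedged reconstruction (the placeholder data, the variance-based filter standing in for the low-expression filter, and the logistic-regression stand-in for the paper's neural-network committee are all assumptions).

# Hedged sketch of the listed steps (assumption: scikit-learn; random data
# stands in for the 88 x 6567 expression matrix of Khan et al. 2001).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = rng.lognormal(size=(88, 6567))      # placeholder expression values
y = rng.integers(0, 4, size=88)         # placeholder labels for four cancer types

pipe = make_pipeline(
    VarianceThreshold(threshold=0.5),   # stand-in for filtering low-expression genes
    PCA(n_components=10),               # keep the top 10 principal components
    LogisticRegression(max_iter=1000),  # stand-in for the ANN classifier
)
print("3-fold CV accuracy:", cross_val_score(pipe, X, y, cv=3).mean())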
26
Artificial Neural Network
27
(No Transcript)
28
(Khan et al. 2001)
29
Procedures
  • Filter out genes that have low expression values (retain 2308 genes)
  • Dimension reduction using PCA --- select the top 10 principal components
  • 3-fold cross-validation
  • Repeat 1250 times.

(Khan et al. 2001)
30
(Khan et al. 2001)
31
(Khan et al. 2001)
32
Acknowledgement
  • Sources of slides:
  • Cheng Li
  • http://www.cs.cornell.edu/johannes/papers/2001/kdd2001-tutorial-final.pdf
  • www.cse.msu.edu/lawhiu/intro_SVM_new.ppt

33
Aggregating predictors
  • Sometimes aggregating several predictors performs better than any single predictor alone. Aggregation is achieved by a weighted sum of different predictors, which can be the same kind of predictor obtained from slightly perturbed training datasets.
  • The key to the improvement in accuracy is the instability of the individual classifiers, such as classification trees (see the sketch below).
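As a hedged illustration (not from the slides), the sketch below aggregates classification trees fit on bootstrap-perturbed versions of the training data (the bagging instance of the idea above) and compares the aggregate with a single tree.

# Minimal sketch (assumption: scikit-learn): aggregating unstable classifiers
# (trees) trained on bootstrap-perturbed datasets versus a single tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())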

34
AdaBoost
  • Step 1: Initialize the observation weights w_i = 1/N, i = 1, ..., N.
  • Step 2: For m = 1 to M:
  • Fit a classifier G_m(x) to the training data using the weights w_i.
  • Compute the weighted error err_m = sum_i w_i I(y_i != G_m(x_i)) / sum_i w_i.
  • Compute a_m = log((1 - err_m) / err_m).
  • Set w_i <- w_i * exp(a_m * I(y_i != G_m(x_i))), i = 1, ..., N.
  • Step 3: Output G(x) = sign(sum_m a_m G_m(x)).

Misclassified observations are given more weight (a minimal implementation sketch follows below).
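A minimal implementation of these steps might look like the sketch below (using scikit-learn decision stumps as the weak classifiers G_m is an assumption; the slides do not specify the base classifier).

# Minimal AdaBoost.M1 sketch following the steps above (assumption:
# scikit-learn decision stumps as weak learners; labels coded as +1 / -1).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
y = 2 * y - 1                        # recode {0, 1} labels as {-1, +1}
N, M = len(y), 20

w = np.full(N, 1.0 / N)              # Step 1: initialize the observation weights
alphas, learners = [], []
for m in range(M):                   # Step 2
    G = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    miss = (G.predict(X) != y).astype(float)
    err = np.sum(w * miss) / np.sum(w)            # weighted error err_m
    alpha = np.log((1.0 - err) / (err + 1e-12))   # classifier weight a_m
    w = w * np.exp(alpha * miss)                  # up-weight misclassified observations
    w = w / np.sum(w)                             # renormalize for numerical convenience
    alphas.append(alpha)
    learners.append(G)

# Step 3: output G(x) = sign(sum_m a_m G_m(x))
F = sum(a * G.predict(X) for a, G in zip(alphas, learners))
print("training accuracy:", np.mean(np.sign(F) == y))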
35
Boosting
36
Optimal separating hyperplane
  • Substituting, we get the Lagrange (Wolfe) dual function: L_D = sum_i a_i - (1/2) sum_i sum_j a_i a_j y_i y_j x_i^T x_j,
  • subject to a_i >= 0 and sum_i a_i y_i = 0.
  • To complete the steps, see Burges et al.
  • If a_i > 0, then y_i (x_i^T b + b0) = 1, i.e., x_i lies exactly on the margin boundary.
  • These x_i are called the support vectors.
  • The solution b = sum_i a_i y_i x_i is determined only by the support vectors.

37
Support vector machine
  • The Lagrange function is L_P = (1/2) ||b||^2 + C sum_i ξ_i - sum_i a_i [ y_i (x_i^T b + b0) - (1 - ξ_i) ] - sum_i μ_i ξ_i.
  • Setting the partial derivatives to 0 gives b = sum_i a_i y_i x_i, sum_i a_i y_i = 0, and a_i = C - μ_i.
  • Substituting, we get the dual L_D = sum_i a_i - (1/2) sum_i sum_j a_i a_j y_i y_j x_i^T x_j,
  • subject to 0 <= a_i <= C and sum_i a_i y_i = 0.