1
Support Vector Machines
  • H. Clara Pong
  • Julie Horrocks¹, Marianne Van den Heuvel², Francis Tekpetey³, B. Anne Croy⁴
  • ¹ Mathematics & Statistics, University of Guelph
  • ² Biomedical Sciences, University of Guelph
  • ³ Obstetrics and Gynecology, University of Western Ontario
  • ⁴ Anatomy & Cell Biology, Queen's University

2
Outline
  • Background
  • Separating Hyper-plane & Basis Expansion
  • Support Vector Machines
  • Simulations
  • Remarks

3
Background
  • Motivation: the IVF (In-Vitro Fertilization) project
  • 18 infertile women, each undergoing the IVF treatment
  • Outcome (output, Y): binary (pregnancy)
  • Predictor (input, X): longitudinal data (adhesion)

4
Background
  • Classification methods
  • Relatively new method: Support Vector Machines
  • First proposed by V. Vapnik in 1979
  • Maps the input space into a high-dimensional feature space
  • Constructs a linear classifier in the new feature space
  • Traditional method: Discriminant Analysis
  • R.A. Fisher, 1936
  • Classifies according to the values of the discriminant functions
  • Assumption: the predictors X in a given class have a multivariate normal distribution (a minimal fitting sketch for both methods follows below)
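To make the comparison concrete, here is a minimal R sketch fitting both classifiers. The MASS and e1071 packages are the ones cited in the references; the toy data frame and its variable names are illustrative, not from the talk.

```r
## Minimal sketch: fitting both classifiers in R.
library(MASS)    # lda()
library(e1071)   # svm()

set.seed(42)
train <- data.frame(x1 = rnorm(40), x2 = rnorm(40),
                    y  = factor(rep(c("A", "B"), each = 20)))

fit.lda <- lda(y ~ x1 + x2, data = train)  # assumes multivariate-normal X per class
fit.svm <- svm(y ~ x1 + x2, data = train)  # no distributional assumption

head(predict(fit.lda, train)$class)  # classes from the discriminant functions
head(predict(fit.svm, train))        # classes from the fitted classifier
```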

5
Separating Hyper-plane
  • Suppose there are 2 classes (A, B)
  • y = 1 for group A, y = -1 for group B
  • Let a hyper-plane be defined as f(X) = β₀ + βᵀX = 0
  • Then f(X) is the decision boundary that separates the two groups:
  • f(X) = β₀ + βᵀX > 0 for X ∈ A
  • f(X) = β₀ + βᵀX < 0 for X ∈ B

Given X₀ ∈ A, it is misclassified when f(X₀) < 0. Given X₀ ∈ B, it is misclassified when f(X₀) > 0.
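As a small illustration of this decision rule in R; β₀ and β here are placeholder values, not estimates from the talk's data:

```r
## The linear decision rule f(X) = beta0 + beta' X, classified by sign.
f <- function(X, beta0, beta) beta0 + as.vector(X %*% beta)

beta0 <- -1
beta  <- c(2, 0.5)
X <- rbind(c(1.0, 0.5), c(0.1, -0.3))   # two points, one per row
score <- f(X, beta0, beta)
ifelse(score > 0, "A", "B")             # f(X) > 0 -> A, f(X) < 0 -> B
```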
6
Separating Hyper-plane
  • The perceptron learning algorithm searches for a hyper-plane that minimizes the distance of misclassified points to the decision boundary.

However, this does not provide a unique solution: the hyper-plane found depends on the starting values and on the order in which the points are visited.
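A bare-bones version of the perceptron loop may help; the learning rate and iteration cap are illustrative choices, not part of the original algorithm statement.

```r
## Rosenblatt-style perceptron. X: N x p matrix, y: labels in {-1, +1}.
perceptron <- function(X, y, rate = 0.1, max.iter = 1000) {
  beta0 <- 0
  beta  <- rep(0, ncol(X))
  for (iter in 1:max.iter) {
    changed <- FALSE
    for (i in 1:nrow(X)) {
      if (y[i] * (beta0 + sum(beta * X[i, ])) <= 0) {  # point i misclassified
        beta  <- beta  + rate * y[i] * X[i, ]          # nudge the boundary
        beta0 <- beta0 + rate * y[i]                   # toward the point
        changed <- TRUE
      }
    }
    if (!changed) break   # no misclassified points left
  }
  list(beta0 = beta0, beta = beta)
}
```

Running this with different point orderings returns different separating hyper-planes, which is one way to see the non-uniqueness.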
7
Optimal Separating Hyper-plane
  • Let C be the distance from the closest point of the two groups to the hyper-plane.
  • The Optimal Separating hyper-plane is the unique separating hyper-plane f(X) = β₀ + βᵀX = 0 whose (β₀, β) maximizes C.

8
Optimal Separating Hyper-plane
  • Maximization problem: choose (β₀, β) to maximize the margin C subject to yᵢ(xᵢᵀβ + β₀) ≥ C for all i; equivalently, minimize ½‖β‖² subject to yᵢ(xᵢᵀβ + β₀) ≥ 1.

The solution must satisfy:
1. αᵢ [yᵢ(xᵢᵀβ + β₀) − 1] = 0 (the Kuhn-Tucker condition)
2. αᵢ ≥ 0 for all i = 1, …, N
3. β = Σ_{i=1..N} αᵢ yᵢ xᵢ
4. Σ_{i=1..N} αᵢ yᵢ = 0
Together, these conditions imply that f(X) depends only on those xᵢ with αᵢ ≠ 0, the support vectors. (A numerical sketch using a QP solver follows below.)
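One way to compute this hyper-plane numerically is to hand the dual problem to a quadratic-programming solver. Below is a sketch using the quadprog package; the solver choice is mine, not the talk's, and separable data with y ∈ {-1, +1} is assumed.

```r
## Sketch: solving the hard-margin dual with quadprog::solve.QP.
library(quadprog)

svm.dual <- function(X, y) {
  N <- nrow(X)
  ## Dual: max sum(a) - 0.5 a' D a  with D = (y y') * (X X'),
  ## subject to sum(a * y) = 0 and a >= 0. A tiny ridge keeps D
  ## numerically positive definite for the solver.
  D   <- (y %*% t(y)) * (X %*% t(X)) + diag(1e-6, N)
  sol <- solve.QP(Dmat = D, dvec = rep(1, N),
                  Amat = cbind(y, diag(N)),   # first column: equality constraint
                  bvec = rep(0, N + 1), meq = 1)
  a <- sol$solution
  beta  <- colSums(a * y * X)                 # condition 3: beta = sum a_i y_i x_i
  k     <- which.max(a)                       # index of a support vector
  beta0 <- y[k] - sum(X[k, ] * beta)          # from y_k (x_k' beta + beta0) = 1
  list(beta0 = beta0, beta = beta, alpha = a)
}
```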
9
Optimal Separating Hyper-plane
10
Basis Expansion
  • Suppose there are p inputs, X = (x₁, …, x_p)
  • Let hₖ(X) be a transformation that maps X from R^p → R
  • hₖ(X) is called a basis function
  • H = {h₁(X), …, h_m(X)} is the basis of a new feature space (dim = m)

Example: X = (x₁, x₂), H = {h₁(X), h₂(X), h₃(X)} with
h₁(X) = h₁(x₁, x₂) = x₁, h₂(X) = h₂(x₁, x₂) = x₂, h₃(X) = h₃(x₁, x₂) = x₁x₂.
X_new = H(X) = (x₁, x₂, x₁x₂)
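The example's basis expansion, written directly as a function:

```r
## The basis expansion from the example above, R^2 -> R^3.
H <- function(X) {
  x1 <- X[, 1]; x2 <- X[, 2]
  cbind(h1 = x1, h2 = x2, h3 = x1 * x2)   # X_new = (x1, x2, x1*x2)
}

X <- rbind(c(2, 3), c(-1, 4))
H(X)   # each row is a point mapped into the new feature space
```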
11
Support Vector Machines
  • The optimal hyper-plane is {X : f(X) = β₀ + βᵀX = 0}.
  • f(X) = β₀ + βᵀX is called the Support Vector Classifier.

12
Support Vector Machines
Non-separable case: the training data cannot be separated perfectly.
  • Hyper-plane: {X : f(X) = β₀ + βᵀX = 0}

Xᵢ crosses the margin of its group when C − yᵢ f(Xᵢ) > 0.
Sᵢ = C − yᵢ f(Xᵢ) when Xᵢ crosses the margin, and Sᵢ = 0 when Xᵢ is outside the margin.
Let ξᵢ C = Sᵢ, so ξᵢ is the proportion of C by which the prediction has crossed the margin. Misclassification occurs when Sᵢ > C (ξᵢ > 1). (These quantities are computed in the sketch below.)
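A sketch computing the overshoots Sᵢ and proportions ξᵢ; C, β₀, and β are placeholder inputs, not quantities fitted in the talk.

```r
## Margin overshoots S_i and proportions xi_i for a given hyper-plane.
slack <- function(X, y, beta0, beta, C) {
  fX <- beta0 + as.vector(X %*% beta)
  S  <- pmax(0, C - y * fX)   # S_i = C - y_i f(X_i) inside the margin, else 0
  xi <- S / C                 # proportion of C by which the margin is crossed
  data.frame(S = S, xi = xi, misclassified = xi > 1)
}
```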
13
Support Vector Machines
The overall margin violation Σ ξᵢ bounds the number of misclassifications, and it is itself bounded by a constant d.
14
Support Vector Machines
  • SVM searches for an optimal hyper-plane in a new feature space where the data are more separable.

Suppose H = {h₁(X), …, h_m(X)} is the basis for the new feature space F. Every element of the new feature space is a linear basis expansion of X.
15
Support Vector Machines
The kernel and the basis transformation define one another: K(X, X′) = ⟨H(X), H(X′)⟩, so specifying either one determines the other.
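A small numerical check of this correspondence, using the homogeneous polynomial kernel K(u, v) = (u·v)² and its explicit degree-2 basis; this particular kernel is my example, not one from the talk.

```r
## K(u, v) = (u.v)^2 equals the inner product under the explicit map
## h(u) = (u1^2, sqrt(2) u1 u2, u2^2).
K <- function(u, v) sum(u * v)^2
h <- function(u) c(u[1]^2, sqrt(2) * u[1] * u[2], u[2]^2)

u <- c(1, 2); v <- c(3, -1)
K(u, v)             # kernel value computed in the input space
sum(h(u) * h(v))    # same number via the feature-space inner product
```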
16
Support Vector Machines
  • Dual Lagrange function

The transformed inputs enter the dual Lagrange function only through inner products; with a kernel K it is
L_D = Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ).
This shows that the basis transformation in SVM does not need to be defined explicitly.
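Correspondingly, the dual objective can be evaluated from a kernel (Gram) matrix alone; in this sketch alpha, y, and K are placeholders for the Lagrange multipliers, labels, and Gram matrix.

```r
## Dual objective from a kernel matrix -- no explicit basis appears.
dual.objective <- function(alpha, y, K) {
  ay <- alpha * y
  drop(sum(alpha) - 0.5 * t(ay) %*% K %*% ay)
}
```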
17
Simulations
  • 3 cases
  • 100 simulations per case
  • Each simulation consists of 200 points
  • 100 points from each group
  • Input space: 2-dimensional
  • Output: 0 or 1 (2 groups)
  • Half of the points are randomly selected as the training set

X = (x₁, x₂), Y ∈ {0, 1} (one replicate is sketched below)
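A sketch of one case-1 replicate in R. The design (200 points, half used for training) follows the slides, but the particular means and common covariance matrix are illustrative guesses.

```r
## One case-1 replicate: two bivariate normals sharing a covariance matrix.
library(MASS)    # mvrnorm(), lda()
library(e1071)   # svm()

set.seed(1)
Sigma <- diag(2)                                  # assumed common covariance
X <- rbind(mvrnorm(100, mu = c(0, 0),     Sigma = Sigma),
           mvrnorm(100, mu = c(1.5, 1.5), Sigma = Sigma))
dat <- data.frame(x1 = X[, 1], x2 = X[, 2],
                  y = factor(rep(0:1, each = 100)))

train.id <- sample(200, 100)                      # half the points train
train <- dat[train.id, ]
test  <- dat[-train.id, ]

fit.lda <- lda(y ~ x1 + x2, data = train)
fit.svm <- svm(y ~ x1 + x2, data = train)

sum(predict(fit.lda, test)$class != test$y)       # LDA test misclassifications
sum(predict(fit.svm, test) != test$y)             # SVM test misclassifications
```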
18
Simulations
  • Case 1 (Normal with a common covariance matrix)

19
Simulations
  • Case 1

Misclassifications (in 100 simulations):

          Training            Testing
          Mean      Sd        Mean      Sd
  LDA     7.85      2.65      8.07      2.51
  SVM     6.98      2.33      8.48      2.81
20
Simulations
  • Case 2 (Normal with unequal covariance matrices)

21
Simulations
  • Case 2

Misclassifications (in 100 simulations):

          Training            Testing
          Mean      Sd        Mean      Sd
  QDA     15.5      3.75      16.84     3.48
  SVM     13.6      4.03      18.8      4.01
22
Simulations
  • Case 3 (Non-normal)

23
Simulations
  • Case 3

Misclassifications (in 100 simulations):

          Training            Testing
          Mean      Sd        Mean      Sd
  QDA     14        3.79      16.8      3.63
  SVM     9.34      3.46      14.8      3.21
24
Simulations
  • Paired t-test for differences in misclassifications (the R call is sketched after this list)
  • H₀: mean difference = 0; Hₐ: mean difference ≠ 0
  • Case 1
  • mean difference (LDA − SVM) = −0.41, se = 0.3877
  • t = −1.057, p-value = 0.29 (not significant)
  • Case 2
  • mean difference (QDA − SVM) = −1.96, se = 0.4170
  • t = −4.70, p-value = 8.42e-06 (significant)
  • Case 3
  • mean difference (QDA − SVM) = 2, se = 0.4218
  • t = 4.74, p-value = 7.13e-06 (significant)
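In R this comparison is a single call to t.test with paired = TRUE; the two error vectors below are placeholders standing in for the 100 per-simulation misclassification counts of each method.

```r
## Paired t-test on per-simulation error counts (placeholder values).
set.seed(3)
err.lda <- rnorm(100, mean = 8.07, sd = 2.51)   # placeholders only
err.svm <- rnorm(100, mean = 8.48, sd = 2.81)
t.test(err.lda, err.svm, paired = TRUE)         # Ho: mean difference = 0
```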

25
Remarks
  • Support Vector Machines
  • Map the original input space onto a higher-dimensional feature space
  • No assumption on the distribution of the Xs
  • Performance
  • Discriminant analysis and SVM perform similarly when (X|Y) has a Normal distribution and the two groups share the same Σ
  • Discriminant analysis performs better when the covariance matrices of the two groups are different
  • SVM performs better when the inputs X violate the distributional assumption

26
Reference
  • N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. New York: Cambridge University Press, 2000.
  • J. Friedman, T. Hastie, and R. Tibshirani. The Elements of Statistical Learning. New York: Springer, 2001.
  • D. Meyer, C. Chang, and C. Lin. R Documentation: Support Vector Machines. http://www.maths.lth.se/help/R/.R/library/e1071/html/svm.html (last updated March 2006).
  • H. Planatscher and J. Dietzsch. SVM-Tutorial using R (e1071-package). http://www.potschi.de/svmtut/svmtut.htm
  • M. Van Den Heuvel, J. Horrocks, S. Bashar, S. Taylor, S. Burke, K. Hatta, E. Lewis, and A. Croy. Menstrual Cycle Hormones Induce Changes in Functional Interactions Between Lymphocytes and Endothelial Cells. Journal of Clinical Endocrinology and Metabolism, 2005.

27
Thank You !