1
Support Vector Machines Classification
Venables & Ripley, Section 12.5
CSU Hayward Statistics 6601
  • Joseph Rickert
  • Timothy McKusick
  • December 1, 2004

2
Support Vector Machine
  • What is the SVM?
  • The SVM is a generalization of the Optimal Hyperplane Algorithm (OHA)
  • Why is the SVM important?
  • It allows the use of more similarity measures than the OHA
  • Through the use of kernel methods it works with non-vector data

3
Simple Linear Classifier
  • X = R^p
  • f(x) = w^T x + b
  • Each x ∈ X is classified into one of 2 classes, labeled y ∈ {1, -1}
  • y = 1 if f(x) ≥ 0 and y = -1 if f(x) < 0
  • S = {(x1,y1), (x2,y2), ...}
  • Given S, the problem is to learn f (find w and b)
  • For each f, check whether all (xi, yi) are correctly classified, i.e. yi f(xi) ≥ 0
  • Choose f so that the number of errors is minimized (see the sketch below)
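A minimal R sketch of this decision rule; the weight vector w, offset b, and the three toy points are made-up values for illustration, not from the slides.

    w <- c(1, -2); b <- 0.5                    # made-up weight vector and offset
    f <- function(x) sum(w * x) + b            # f(x) = w'x + b
    X <- rbind(c(2, 0.5), c(-1, 1), c(0, 2))   # three toy points in R^2
    y <- c(1, 1, -1)                           # their labels
    f.vals <- apply(X, 1, f)
    pred <- ifelse(f.vals >= 0, 1, -1)         # y = 1 if f(x) >= 0, else y = -1
    data.frame(f.vals, pred, y, correct = y * f.vals >= 0)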

4
But what if the training set is not linearly
separable?
  • f(x) = w^T x + b defines two half planes {x : f(x) ≥ 1} and {x : f(x) ≤ -1}
  • Classify with the hinge loss function c(f,x,y) = max(0, 1 - y f(x))
  • c(f,x,y) increases as the distance from the correct half plane increases
  • If (x,y) is correctly classified with large confidence, then c(f,x,y) = 0 (see the sketch below)

[Figure: hinge loss plotted against y f(x), with the half planes w^T x + b > 1 and w^T x + b < -1 and margin 2/||w||. y f(x) ≥ 1: correct with large confidence (loss 0); 0 ≤ y f(x) < 1: correct with small confidence; y f(x) < 0: misclassified.]
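The hinge loss is easy to tabulate directly; a minimal sketch (the grid of y f(x) values below is arbitrary):

    hinge <- function(yf) pmax(0, 1 - yf)      # c(f,x,y) = max(0, 1 - y f(x))
    yf <- seq(-2, 3, by = 0.5)                 # a grid of margin values y f(x)
    cbind(margin = yf, loss = hinge(yf))       # loss is 0 once y f(x) >= 1,
                                               # and grows linearly on the wrong side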
5
SVMs combine the requirements of a large margin and few
misclassifications by solving the problem
  • New formulation:
  • min (1/2)||w||^2 + C Σ c(f,xi,yi) with respect to w and b
  • C is a parameter that controls the tradeoff between margin and misclassification
  • Large C: small margins, but more samples correctly classified with strong confidence
  • Technical difficulty: the hinge loss function c(f,xi,yi) is not differentiable
  • Even better formulation: use slack variables ξi
  • min (1/2)||w||^2 + C Σ ξi with respect to w, ξ and b
  • under the constraint ξi ≥ c(f,xi,yi)  (*)
  • But (*) is equivalent to
  • ξi ≥ 0
  • ξi - 1 + yi(w^T xi + b) ≥ 0, for i = 1...n
  • Solve this quadratic optimization problem with Lagrange multipliers (a sketch of the effect of C follows below)
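In the e1071 package used on the later slides, the parameter C is exposed as the cost argument of svm(). The comparison below, on the iris data with a linear kernel, is an illustrative sketch of the tradeoff and is not taken from the slides.

    library(e1071)
    for (C in c(0.01, 1, 100)) {
      m <- svm(Species ~ ., data = iris, kernel = "linear", cost = C)
      cat("cost =", C,
          " support vectors =", m$tot.nSV,
          " training errors =", sum(predict(m, iris) != iris$Species), "\n")
    }
    # small cost: wide margin, many support vectors;
    # large cost: narrower margin, fewer support vectors, fewer training errors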
6
Support Vectors
  • Lagrange multiplier formulation:
  • Find α that maximizes W(α) = Σ αi - (1/2) ΣΣ yi yj αi αj xi^T xj
  • under the constraints Σ yi αi = 0 and 0 ≤ αi ≤ C
  • The points with positive Lagrange multipliers, αi > 0, are called support vectors
  • The set of support vectors contains all the information used by the SVM to learn a discrimination function (see the sketch below for how to inspect them in R)

[Figure: separating hyperplane with margin; points are labeled α = 0 (correctly classified, off the margin), 0 < α < C (on the margin), and α = C (margin violators).]
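A fitted e1071 model exposes the support vectors directly; a short sketch, using the iris fit from the later slides as the example model:

    library(e1071)
    model <- svm(Species ~ ., data = iris)
    model$tot.nSV        # how many training points have a_i > 0
    model$index          # row numbers of the support vectors in the training data
    head(model$SV)       # the (scaled) support vectors themselves
    head(model$coefs)    # the corresponding coefficients y_i * a_i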
7
Kernel Methods: data are not represented
individually, but only through a set of pairwise
comparisons
  • X: a set of objects (e.g. proteins)
  • Each object is represented by a sequence:
    S = (aatcgagtcac, atggacgtct, tgcactact)
  • K, the kernel matrix:
        1.0  0.5  0.3
        0.5  1.0  0.6
        0.3  0.6  1.0
  • Each number in the kernel matrix is a measure of the similarity or distance between two objects (a toy string-similarity sketch follows below)
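As an illustration of representing objects only through pairwise comparisons, the sketch below scores the three sequences above with a similarity derived from edit distance. This is only an illustration: such a naive similarity matrix is not guaranteed to be positive definite, i.e. it need not be a valid kernel.

    S <- c("aatcgagtcac", "atggacgtct", "tgcactact")
    D <- adist(S)                           # pairwise edit distances (base R)
    L <- outer(nchar(S), nchar(S), pmax)    # pairwise maximum sequence lengths
    K <- 1 - D / L                          # similarity in [0,1], diagonal = 1
    round(K, 2)                             # a 3 x 3 pairwise-comparison matrix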
8
Kernels
  • Properties of kernels:
  • Kernels are measures of similarity: K(x,x') is large when x and x' are similar
  • Kernels must be
  • positive definite
  • symmetric
  • For every kernel K there exist a Hilbert space F and a mapping Φ: X → F such that K(x,x') = ⟨Φ(x),Φ(x')⟩ for all x, x' ∈ X
  • Hence all kernels can be thought of as dot products in some feature space (see the sketch below)
  • Advantages of kernels:
  • Data of very different nature can be analyzed in a unified framework
  • No matter what the objects are, n objects are always represented by an n x n matrix
  • It is often easier to compare objects than to represent them numerically
  • Complete modularity between the function used to represent the data and the algorithm used to analyze the data
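As a concrete check of the two required properties, the sketch below builds a Gaussian (RBF) kernel matrix on the iris measurements (gamma = 0.25 mirrors the value used on the later slides) and verifies symmetry and positive semi-definiteness.

    X <- as.matrix(iris[, 1:4])
    gamma <- 0.25
    D2 <- as.matrix(dist(X))^2                 # squared Euclidean distances
    K <- exp(-gamma * D2)                      # K(x, x') = exp(-gamma ||x - x'||^2)
    isSymmetric(K)                             # TRUE
    min(eigen(K, symmetric = TRUE, only.values = TRUE)$values)
                                               # smallest eigenvalue >= 0 (up to rounding)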

9
The Kernel Trick
  • Any algorithm for vector data that can be expressed in terms of dot products can be performed implicitly in the feature space associated with the kernel, by replacing each dot product with a kernel evaluation
  • e.g. for some feature space F let
  • d(x,x') = ||Φ(x) - Φ(x')||
  • But
  • ||Φ(x) - Φ(x')||^2 = ⟨Φ(x),Φ(x)⟩ + ⟨Φ(x'),Φ(x')⟩ - 2⟨Φ(x),Φ(x')⟩
  • So
  • d(x,x') = (K(x,x) + K(x',x') - 2K(x,x'))^(1/2)  (see the sketch below)
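A short sketch of this computation, using the same RBF kernel as above on two iris measurements: the feature-space distance is obtained from kernel evaluations only, without ever computing Φ.

    k <- function(x, xp, gamma = 0.25) exp(-gamma * sum((x - xp)^2))
    x  <- unlist(iris[1, 1:4])                 # a setosa measurement
    xp <- unlist(iris[51, 1:4])                # a versicolor measurement
    sqrt(k(x, x) + k(xp, xp) - 2 * k(x, xp))   # d(x, x') in the feature space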

10
Nonlinear Separation
  • Nonlinear kernel:
  • X is a vector space
  • the mapping Φ is nonlinear
  • linear separation in the feature space F corresponds to nonlinear separation in X

[Figure: a nonlinear mapping Φ takes the input space X to the feature space F, where the classes become linearly separable.]
11
SVM with Kernel
  • Final formulation:
  • Find α that maximizes W(α) = Σ αi - (1/2) ΣΣ yi yj αi αj k(xi,xj)
  • under the constraints Σ yi αi = 0 and 0 ≤ αi ≤ C
  • Find an index i with 0 < αi < C and set
  • b = yi - Σ yj αj k(xi,xj)
  • The classification of a new object x ∈ X is then determined by the sign of the function
  • f(x) = Σ yi αi k(xi,x) + b  (a sketch reconstructing this from a fitted model follows below)
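The sketch below recomputes this decision function by hand from a fitted e1071 model on a two-class problem (two of the iris species). In an e1071 fit, m$coefs holds yi*αi, m$SV the support vectors, and m$rho is -b; fitting with scale = FALSE keeps everything in the original coordinates so the hand computation matches predict(). This reconstruction is an illustration, not code from the slides.

    library(e1071)
    d <- droplevels(subset(iris, Species != "virginica"))   # two-class subset
    m <- svm(Species ~ ., data = d, kernel = "radial", scale = FALSE)
    rbf <- function(u, v) exp(-m$gamma * sum((u - v)^2))     # the radial kernel k(u, v)
    x.new <- unlist(d[1, 1:4])
    sum(m$coefs * apply(m$SV, 1, rbf, v = x.new)) - m$rho    # f(x) computed by hand
    attr(predict(m, d[1, , drop = FALSE], decision.values = TRUE),
         "decision.values")                                  # same value from predict()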

12
iris data set (Anderson 1935): 150 cases, 50 each
of 3 species of iris. Example from page 48 of
the e1071 package documentation.
  • First 10 lines of iris:
    > iris
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1           5.1         3.5          1.4         0.2  setosa
    2           4.9         3.0          1.4         0.2  setosa
    3           4.7         3.2          1.3         0.2  setosa
    4           4.6         3.1          1.5         0.2  setosa
    5           5.0         3.6          1.4         0.2  setosa
    6           5.4         3.9          1.7         0.4  setosa
    7           4.6         3.4          1.4         0.3  setosa
    8           5.0         3.4          1.5         0.2  setosa
    9           4.4         2.9          1.4         0.2  setosa
    10          4.9         3.1          1.5         0.1  setosa

13
SVM ANALYSIS OF IRIS DATA
    ## SVM ANALYSIS OF IRIS DATA SET
    ## classification mode
    ## default with factor response
    model <- svm(Species ~ ., data = iris)
    summary(model)

    Call:
    svm(formula = Species ~ ., data = iris)

    Parameters:
       SVM-Type:  C-classification
     SVM-Kernel:  radial
           cost:  1
          gamma:  0.25

    Number of Support Vectors:  51
     ( 8 22 21 )

    Number of Classes:  3

    Levels:
     setosa versicolor virginica

cost = 1 is the parameter C in the Lagrange formulation
Radial kernel: exp(-γ ||u - v||^2)
14
Exploring the SVM Model
    ## test with training data
    x <- subset(iris, select = -Species)
    y <- iris$Species
    pred <- predict(model, x)

    ## check accuracy
    table(pred, y)

    ## compute decision values
    pred <- predict(model, x, decision.values = TRUE)
    attr(pred, "decision.values")[1:4,]

                y
    pred         setosa versicolor virginica
      setosa         50          0         0
      versicolor      0         48         2
      virginica       0          2        48

         setosa/versicolor setosa/virginica versicolor/virginica
    [1,]          1.196000         1.091667            0.6706543
    [2,]          1.064868         1.055877            0.8482041
    [3,]          1.181229         1.074370            0.6438237
    [4,]          1.111282         1.052820            0.6780645

15
Visualize classes with MDS
    ## visualize (classes by color, SV by crosses)
    plot(cmdscale(dist(iris[,-5])),
         col = as.integer(iris[,5]),
         pch = c("o","+")[1:150 %in% model$index + 1])

cmdscale: multidimensional scaling, or principal coordinates analysis
black = setosa, red = versicolor, green = virginica
16
iris split into training and test sets: the first 25
cases of each species form the training set
    ## SECOND SVM ANALYSIS OF IRIS DATA SET
    ## classification mode
    ## default with factor response
    ## train with iris.train data (constructed as sketched below)
    model.2 <- svm(fS.TR ~ ., data = iris.train)

    ## output from summary
    summary(model.2)

    Call:
    svm(formula = fS.TR ~ ., data = iris.train)

    Parameters:
       SVM-Type:  C-classification
     SVM-Kernel:  radial
           cost:  1
          gamma:  0.25

    Number of Support Vectors:  32
     ( 7 13 12 )

    Number of Classes:  3

    Levels:
     setosa versicolor virginica
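The slides do not show how iris.train, iris.test, fS.TR, and fS.TE were created. One construction consistent with "first 25 cases of each species as the training set" is sketched below; the object names follow the slides, but the construction itself is an assumption.

    library(e1071)
    train.rows <- c(1:25, 51:75, 101:125)        # first 25 rows of each species
    iris.train <- iris[ train.rows, ]
    iris.test  <- iris[-train.rows, ]
    names(iris.train)[5] <- "fS.TR"              # factor response column in the training frame
    names(iris.test)[5]  <- "fS.TE"              # factor response column in the test frame
    fS.TE <- iris.test$fS.TE                     # kept as a vector for y.2 on the next slide
    model.2 <- svm(fS.TR ~ ., data = iris.train)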

17
iris test results
    ## test with iris.test data
    x.2 <- subset(iris.test, select = -fS.TE)
    y.2 <- fS.TE
    pred.2 <- predict(model.2, x.2)

    ## check accuracy
    table(pred.2, y.2)

    ## compute decision values and probabilities
    pred.2 <- predict(model.2, x.2, decision.values = TRUE)
    attr(pred.2, "decision.values")[1:4,]

                y.2
    pred.2       setosa versicolor virginica
      setosa         25          0         0
      versicolor      0         25         0
      virginica       0          0        25

         setosa/versicolor setosa/virginica versicolor/virginica
    [1,]          1.253378         1.086341            0.6065033
    [2,]          1.000251         1.021445            0.8012664
    [3,]          1.247326         1.104700            0.6068924
    [4,]          1.164226         1.078913            0.6311566

18
iris training and test sets
19
Microarray Data from Golub et al., "Molecular
Classification of Cancer: Class Prediction by
Gene Expression Monitoring", Science, Vol 286,
10/15/1999
  • Expression levels of predictive genes
  • Rows: genes
  • Columns: samples
  • Expression levels (EL) of each gene are relative to the mean EL for that gene in the initial dataset
  • Red if EL > mean
  • Blue if EL < mean
  • The scale indicates standard deviations above or below the mean
  • Top panel: genes highly expressed in ALL
  • Bottom panel: genes more highly expressed in AML

20
Microarray Data Transposed: rows = samples, columns = genes
           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,]  -214 -153  -58   88 -295 -558  199 -176  252   206
     [2,]  -139  -73   -1  283 -264 -400 -330 -168  101    74
     [3,]   -76  -49 -307  309 -376 -650   33 -367  206  -215
     [4,]  -135 -114  265   12 -419 -585  158 -253   49    31
     [5,]  -106 -125  -76  168 -230 -284    4 -122   70   252
     [6,]  -138  -85  215   71 -272 -558   67 -186   87   193
     [7,]   -72 -144  238   55 -399 -551  131 -179  126   -20
     [8,]  -413 -260    7   -2 -541 -790 -275 -463   70  -169
     [9,]     5 -127  106  268 -210 -535    0 -174   24   506
    [10,]   -88 -105   42  219 -178 -246  328 -148  177   183
    [11,]  -165 -155  -71   82 -163 -430  100 -109   56   350
    [12,]   -67  -93   84   25 -179 -323 -135 -127   -2   -66
    [13,]   -92 -119  -31  173 -233 -227  -49  -62   13   230
    [14,]  -113 -147 -118  243 -127 -398 -249 -228  -37   113
    [15,]  -107  -72 -126  149 -205 -284 -166 -185    1   -23
  • Training Data
  • 38 samples
  • 7129 x 38 matrix
  • ALL: 27
  • AML: 11
  • Test Data
  • 34 samples
  • 7129 x 34 matrix
  • ALL: 20
  • AML: 14
  • (One way to build a samples-by-genes data frame from such a matrix is sketched below)
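The slides do not show how fmat.train (used on the next slide) was created. Below is a hedged sketch of how a samples-by-genes data frame like it could be built from a genes-by-samples expression matrix; golub.train is an assumed 7129 x 38 matrix (e.g. read from the published supplementary files), and a random placeholder stands in for it here so the sketch runs.

    golub.train <- matrix(rnorm(7129 * 38), nrow = 7129)   # placeholder for the real 7129 x 38 matrix
    fmat.train <- as.data.frame(t(golub.train))            # transpose: rows = samples, columns = genes
    names(fmat.train) <- paste0("g", seq_len(ncol(fmat.train)))   # gene column names g1, g2, ...
    dim(fmat.train)                                        # 38 7129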

21
SVM ANALYSIS OF MICROARRAY DATA: classification mode
    ## default with factor response
    y <- c(rep(0,27), rep(1,11))
    fy <- factor(y, levels = 0:1)
    levels(fy) <- c("ALL","AML")

    ## compute svm on the first 3000 genes only, because of memory overflow problems
    model.ma <- svm(fy ~ ., data = fmat.train[,1:3000])

    Call:
    svm(formula = fy ~ ., data = fmat.train[,1:3000])

    Parameters:
       SVM-Type:  C-classification
     SVM-Kernel:  radial
           cost:  1
          gamma:  0.0003333333

    Number of Support Vectors:  37
     ( 26 11 )

    Number of Classes:  2

    Levels:
     ALL AML

22
Visualize Microarray Training Data with
Multidimensional Scaling
    ## visualize training data
    ## (classes by color, SV by crosses)
    ## multidimensional scaling
    pc <- cmdscale(dist(fmat.train[,1:3000]))
    plot(pc,
         col = as.integer(fy),
         pch = c("o","+")[1:3000 %in% model.ma$index + 1],
         main = "Training Data ALL 'Black' and AML 'Red' Classes")

23
Check Model with Training Data; Predict Outcomes
of Test Data
    ## check the training data
    x <- fmat.train[,1:3000]
    pred.train <- predict(model.ma, x)

    ## check accuracy
    table(pred.train, fy)

    ## classify the test data
    y2 <- c(rep(0,20), rep(1,14))
    fy2 <- factor(y2, levels = 0:1)
    levels(fy2) <- c("ALL","AML")
    x2 <- fmat.test[,1:3000]
    pred <- predict(model.ma, x2)

    ## check accuracy
    table(pred, fy2)

              fy
    pred.train ALL AML
           ALL  27   0
           AML   0  11

         fy2
    pred  ALL AML
      ALL  20  13
      AML   0   1

The training data are correctly classified, but the model is worthless so far: on the test set it predicts nearly everything as ALL.
24
Conclusion
  • The SVM appears to be a powerful classifier applicable to many different kinds of data
  • But:
  • Kernel design is a full-time job
  • Selecting model parameters is far from obvious
  • The math is formidable