Title: Support Vector Machine: Introduction and Applications
1 Support Vector Machine: Introduction and Applications
2 Outline
- Basic concept of SVM
- SVC formulations
- Kernel function
- Model selection (tuning SVM hyperparameters)
- SVM applications
3 Introduction
- Learning
- supervised learning (classification)
- unsupervised learning (clustering)
- Data classification
- training
- testing
4 Basic Concept of SVM
- Consider the linearly separable case
- Training data from two classes
5 (No Transcript)
6 Decision Function
- f(x) > 0 ⇒ class 1
- f(x) < 0 ⇒ class 2
- How do we find a good w and b?
- There are many possible choices of (w, b)
7 Support Vector Machines
- A promising technique for data classification
- Based on statistical learning theory: maximize the distance (margin) between the two classes
- Linear separating hyperplane
8 - Maximal margin: the distance between the two classes' closest points is maximized
9 Questions
- 1. Linearly non-separable case
- 2. How to solve for w and b?
- 3. Is this (w, b) good?
- 4. Multi-class case
10 Method to Handle the Non-separable (Nonlinear) Case
- Map the input data into a higher-dimensional feature space
11 Example
12 - Find a linear separating hyperplane
13 - Questions
- 1. How to choose the mapping φ?
- 2. Is it really better? Yes.
- Sometimes, even in a high-dimensional space, the data may still not be separable ⇒ allow training errors
14 Example
- Non-linear curves in the input space correspond to a linear hyperplane in a high-dimensional space (the feature space)
15 SVC Formulations (the Soft-Margin Hyperplane)
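As a sketch, the standard soft-margin SVC primal, with penalty parameter C and slack variables ξ_i for the l training examples, can be written as:

\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l}\xi_{i}
\quad\text{subject to}\quad y_{i}\left(w^{\top}\phi(x_{i})+b\right)\ \ge\ 1-\xi_{i},\qquad \xi_{i}\ \ge\ 0,\quad i=1,\dots,l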
16 If f is convex, a point x satisfying the KKT conditions is optimal
17 How to Solve an Optimization Problem with Constraints? Use Lagrange Multipliers
- Given a constrained optimization problem (written out below)
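A standard sketch for a problem with inequality constraints and its Lagrangian, with one multiplier α_i per constraint, is:

\min_{x}\ f(x)\quad\text{subject to}\quad g_{i}(x)\le 0,\ \ i=1,\dots,m
L(x,\alpha)\ =\ f(x)+\sum_{i=1}^{m}\alpha_{i}\,g_{i}(x),\qquad \alpha_{i}\ \ge\ 0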
18 Why is the Dual Better than the Primal?
- Consider the following primal problem
- (P) variables: w (dimension of φ(x), a very big number), b (one variable), ξ (l variables)
- (D) variables: l
- Derive its dual.
19 Derive the Dual
- The primal Lagrangian for the problem is formed from the objective and the constraints with multipliers.
- The corresponding dual is found by differentiating with respect to w, ξ, and b.
20 Resubstituting the obtained relations into the primal yields the dual objective function W(α); hence, maximizing the Lagrangian over the dual variables is equivalent to maximizing W(α) (written out below).
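As a sketch consistent with the soft-margin primal above, the resulting dual problem in its standard form is:

\max_{\alpha}\ W(\alpha)\ =\ \sum_{i=1}^{l}\alpha_{i}\ -\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_{i}\alpha_{j}\,y_{i}y_{j}\,\phi(x_{i})^{\top}\phi(x_{j})
\text{subject to}\quad 0\le\alpha_{i}\le C,\qquad \sum_{i=1}^{l}\alpha_{i}y_{i}=0,\qquad\text{with}\quad w=\sum_{i=1}^{l}\alpha_{i}y_{i}\,\phi(x_{i})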
21 (No Transcript)
22 - The primal and dual problems share the same KKT conditions
- Primal: the number of variables is very large (a shortcoming)
- Dual: only l variables
- The high-dimensional inner product φ(x_i)·φ(x_j) dominates the computation; reducing its computational cost matters
- For special choices of φ, this inner product can be calculated efficiently (the kernel trick)
23 Kernel Function
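As a sketch, the kernels commonly used in practice (and offered by LIBSVM) replace the inner product φ(x_i)·φ(x_j):

K_{\text{linear}}(x_{i},x_{j})=x_{i}^{\top}x_{j}
K_{\text{poly}}(x_{i},x_{j})=\left(\gamma\,x_{i}^{\top}x_{j}+r\right)^{d}
K_{\text{RBF}}(x_{i},x_{j})=\exp\!\left(-\gamma\,\|x_{i}-x_{j}\|^{2}\right)
K_{\text{sigmoid}}(x_{i},x_{j})=\tanh\!\left(\gamma\,x_{i}^{\top}x_{j}+r\right)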
24 (No Transcript)
25 Model Selection (Tuning SVM Hyperparameters)
- Cross-validation can help avoid overfitting
- Example: 10-fold cross-validation. The l training data are split into 10 groups; each time, 9 groups are used as training data and 1 group as test data (see the sketch below).
- LOO (leave-one-out): cross-validation with l groups; each time, l-1 data are used for training and 1 for testing.
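A minimal sketch (not from the slides) of 10-fold and leave-one-out cross-validation, assuming scikit-learn and using its bundled breast cancer data as a stand-in for the l training examples:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)        # stand-in data set
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
print(cross_val_score(clf, X, y, cv=10).mean())   # 10-fold: 9 groups train, 1 group tests
# LOO on a small subset to keep the run short: l-1 examples train, 1 tests, l times
print(cross_val_score(clf, X[:100], y[:100], cv=LeaveOneOut()).mean())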
26 Model Selection
- A commonly used method for model selection is the grid method: train with every (C, γ) pair on a grid and keep the pair with the best cross-validation accuracy (see the sketch below).
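A minimal sketch (not from the slides) of the grid method, assuming scikit-learn; the grid values are illustrative only:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"C": [2**k for k in range(-5, 16, 4)],       # coarse grid over the penalty C
              "gamma": [2**k for k in range(-15, 4, 4)]}   # and the RBF parameter gamma
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3) # 3-fold CV per grid point
search.fit(X, y)
print(search.best_params_, search.best_score_)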
27 Model Selection of SVMs Using a GA Approach
- Peng-Wei Chen, Jung-Ying Wang and Hahn-Ming Lee, 2004 IJCNN International Joint Conference on Neural Networks, 26-29 July 2004.
- Abstract: A new automatic search methodology for model selection of support vector machines, based on a GA-based tuning algorithm, is proposed to search for adequate hyperparameters of SVMs.
28 Model Selection of SVMs Using a GA Approach
- Procedure: GA-based Model Selection Algorithm
- Begin
-   Read in dataset
-   Initialize hyperparameters
-   While (not termination condition) do
-     Train SVMs
-     Estimate generalization error
-     Create new hyperparameters with the tuning algorithm
-   End
-   Output the best hyperparameters
- End
29 Experiment Setup
- The initial population is selected at random; each chromosome is a single bit string of fixed length 20.
- Each bit can take the value 0 or 1.
- The first 10 bits encode the integer value of C, and the remaining 10 bits encode the decimal value of the kernel parameter σ.
- The suggested population size N = 20 is used.
- A crossover rate of 0.8 and a mutation rate of 1/20 = 0.05 are chosen (see the sketch below).
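A minimal sketch (not the paper's implementation) of a GA search with these settings, assuming scikit-learn; the bit decoding of C and the kernel parameter is my reading of the slide, not the paper's exact scheme:

import random
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)        # stand-in data set
POP, BITS, CX_RATE, MUT_RATE = 20, 20, 0.8, 0.05  # values from the slide

def decode(bits):
    """First 10 bits -> integer C, last 10 bits -> fractional kernel parameter."""
    c = 1 + int("".join(map(str, bits[:10])), 2)             # C in 1..1024
    g = 1e-4 + int("".join(map(str, bits[10:])), 2) / 1023   # gamma in (0, 1]
    return c, g

def fitness(bits):
    c, g = decode(bits)
    return cross_val_score(SVC(C=c, gamma=g), X, y, cv=3).mean()  # estimated generalization

population = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(5):                                    # termination condition: 5 generations
    parents = sorted(population, key=fitness, reverse=True)[:POP // 2]
    population = []
    while len(population) < POP:
        a, b = random.sample(parents, 2)
        if random.random() < CX_RATE:                 # one-point crossover
            cut = random.randrange(1, BITS)
            a = a[:cut] + b[cut:]
        population.append([bit ^ (random.random() < MUT_RATE) for bit in a])  # bit-flip mutation

print("best C, gamma:", decode(max(population, key=fitness)))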
30 SVM Application: Breast Cancer Diagnosis
- Software: WEKA
31 Coding for Weka
- @relation breast_training
- @attribute a1 real
- @attribute a2 real
- @attribute a3 real
- @attribute a4 real
- @attribute a5 real
- @attribute a6 real
- @attribute a7 real
- @attribute a8 real
- @attribute a9 real
- @attribute class {2,4}
32 Coding for Weka
- @data
- 5 ,1 ,1 ,1 ,2 ,1 ,3 ,1 ,1 ,2
- 5 ,4 ,4 ,5 ,7 ,10,3 ,2 ,1 ,2
- 3 ,1 ,1 ,1 ,2 ,2 ,3 ,1 ,1 ,2
- 6 ,8 ,8 ,1 ,3 ,4 ,3 ,7 ,1 ,2
- 8 ,10,10,7 ,10,10,7 ,3 ,8 ,4
- 8 ,10,5 ,3 ,8 ,4 ,4 ,10,3 ,4
- 10,3 ,5 ,4 ,3 ,7 ,3 ,5 ,3 ,4
- 6 ,10,10,10,10,10,8 ,10,10,4
- 1 ,1 ,1 ,1 ,2 ,10,3 ,1 ,1 ,2
- 2 ,1 ,2 ,1 ,2 ,1 ,3 ,1 ,1 ,2
- 2 ,1 ,1 ,1 ,2 ,1 ,1 ,1 ,5 ,2
33 Running Results Using Weka 3.3.6
- Predictor: Support Vector Machines (called the Sequential Minimal Optimization, SMO, algorithm in Weka)
- Weka SMO result for 400 training data
34 Weka SMO result for 283 test data
35 Software and Model Selection
- Software: LIBSVM
- Mapping function: use the Radial Basis Function (RBF) kernel
- Find the best penalty parameter C and kernel parameter g
- Use cross-validation to do the model selection
36 LIBSVM Model Selection Using the Grid Method
- -c 1000 -g 10: 3-fold accuracy 69.8389%
- -c 1000 -g 1000: 3-fold accuracy 69.8389%
- -c 1 -g 0.002: 3-fold accuracy 97.0717% (winner)
- -c 1 -g 0.004: 3-fold accuracy 96.9253%
37 Coding for LIBSVM
- Each training example is written in LIBSVM's sparse format: <label> <index1>:<value1> ... <index9>:<value9>
- For example, a row with class 2 and feature values 5,1,1,1,2,1,3,1,1 is written as: 2 1:5 2:1 3:1 4:1 5:2 6:1 7:3 8:1 9:1
38 Summary
39 Summary
40 Multi-class SVM
- One-against-all method: k SVM models (k = the number of classes); the i-th SVM is trained with all examples in the i-th class as positive and the rest as negative
- One-against-one method: k(k-1)/2 classifiers, where each one is trained on the data from two classes (see the sketch below)
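A minimal sketch (not from the slides) contrasting the two strategies, assuming scikit-learn; the iris data set is just a stand-in k = 3 class problem:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                        # k = 3 classes
ova = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)   # k models (one-against-all)
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)    # k(k-1)/2 models (one-against-one)
print(len(ova.estimators_), len(ovo.estimators_))        # 3 and 3 for k = 3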
41 SVM Applications in Bioinformatics
- SVM-Cabins: Prediction of Solvent Accessibility Using Accumulation Cutoff Set and Support Vector Machine
- Prediction of protein secondary structure
- SVM application in protein fold assignment
42 Solvent Accessibility
- Water can contact residues at the surface of a protein
- Prediction of the solvent-accessible surface area (ASA) helps us understand the complete tertiary structure of proteins
43 Motivation
- Traditionally, ASA prediction is treated as a binary or multi-class classification problem, but the arbitrary choice of cutoff thresholds is a drawback when developing a prediction system.
- To overcome this, many statistical, regression, and machine learning methods have recently been proposed to predict the real value of solvent accessibility.
- We propose a novel method for real-value prediction of solvent accessibility.
44 Related Methods
- Statistical information (Wang et al., 2004)
- Multiple linear regression (Wang et al., 2005)
- Neural networks (Ahmad et al., 2003; Garg et al., 2005)
- Neural network-based regression (Adamczak et al., 2004)
- Support vector regression (Yuan and Huang, 2004)
45 Data Sets
- Rost and Sander data set (RS126)
- Cuff and Barton data set (CB502)
46 Evolutionary Information
- We use BLASTP to generate multiple sequence alignments of proteins
- An expectation value (E-value) of 0.01 is used, searching against the non-redundant protein sequence database (NCBI nr)
- The alignments are represented as profiles, i.e. position-specific scoring matrices (PSSMs)
47 Coding Scheme
- A moving window of 13 neighboring residues is used, and each window position has 22 values
- The data obtained from the PSSM, which include the 20 amino acid substitution scores, an indel (insertion/deletion) term, and an entropy term, are used directly as input to our algorithm (see the sketch below)
- The prediction is made for the central residue of the window
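A minimal sketch (not the paper's code) of this sliding-window coding, assuming a per-residue profile array of shape (sequence_length, 22); the names pssm, WINDOW, and encode are illustrative only:

import numpy as np

WINDOW = 13                                   # window size from the slide
HALF = WINDOW // 2

def encode(pssm):
    """Return one (WINDOW * 22)-dimensional feature vector per residue,
    zero-padding window positions that fall outside the sequence."""
    length, width = pssm.shape
    padded = np.vstack([np.zeros((HALF, width)), pssm, np.zeros((HALF, width))])
    return np.stack([padded[i:i + WINDOW].ravel() for i in range(length)])

pssm = np.random.rand(30, 22)                 # dummy profile for a 30-residue chain
print(encode(pssm).shape)                     # (30, 286) = (residues, 13 * 22)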
48 Intuitive Idea
- Use a multi-class classifier to assign a real value to the test datum
49 Two Problems
- The performance of an SVM is poor when the number of labeled positive data is small.
- This is mainly because the optimal hyperplane of the SVM may be biased when the positive data are much fewer than the negative data.
- Three traditional approaches to address this
- The crisp-set problem
50 Algorithm to Transfer N Binary-class SVM Models to Real Values of Solvent Accessibility
- We construct 13 accumulation cabins from the two end-points of the ASA real-value range.
- They are [0, 0], [0, 5], [0, 10], [0, 20], [0, 30], [0, 40], [0, 50], [50, 100], [60, 100], [70, 100], [80, 100], [90, 100], and [100, 100].
51 SVM Model Selection
52 Accuracy for Each Binary-class SVM Model
53 The Main Output Vector Patterns (over 97%) for the 13 Binary-class SVM Models
54 Algorithm to Assign the Prediction Result
55 An Example
- For example, consider the output vector 1110000111111
- Four binary-class SVM models predict it as belonging to the positive class, namely [0, 20], [0, 30], [0, 40], and [0, 50]
- We can infer that this datum must lie inside the cabin range [0, 20].
- We get the same result by taking the intersection of the above four contiguous positive cabin ranges
56 An Example
- The test datum should not fall inside the nine cabin ranges [0, 0], [0, 5], [0, 10], [50, 100], [60, 100], [70, 100], [80, 100], [90, 100], and [100, 100]
- So we can further infer that the datum lies inside the cabin range [10, 20]
- We can also use the set difference between the cabin range [0, 20] and the above nine negative cabin ranges to get the result [10, 20].
- Finally, we use the middle point, 15, of the cabin range [10, 20] as our real-valued ASA prediction (see the sketch below).
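A minimal sketch of the cabin set operations described in the two examples above; the cabin list comes from the slides, while the function and variable names are illustrative only:

CABINS = [(0, 0), (0, 5), (0, 10), (0, 20), (0, 30), (0, 40), (0, 50),
          (50, 100), (60, 100), (70, 100), (80, 100), (90, 100), (100, 100)]

def assign_asa(positive, negative):
    """positive/negative: cabin ranges the binary SVMs place the datum inside / outside."""
    lo = max(a for a, b in positive)          # intersection of the positive cabins
    hi = min(b for a, b in positive)
    for a, b in negative:                     # remove negative cabins anchored at 0 or 100
        if a == 0:
            lo = max(lo, b)
        if b == 100:
            hi = min(hi, a)
    return (lo + hi) / 2                      # midpoint as the real-valued ASA prediction

positive = [(0, 20), (0, 30), (0, 40), (0, 50)]
negative = [c for c in CABINS if c not in positive]
print(assign_asa(positive, negative))         # 15.0, matching the worked example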
57 Algorithm to Assign the Prediction Result
- Auto-correction of one-bit errors
- Because each binary-class SVM has at least 77% accuracy, when a 1 appears inside a run of contiguous 0s we are confident enough to correct it.
- About 1.5% of the test data have vector patterns with a one-bit error inside two contiguous 0s.
58 Algorithm to Assign the Prediction Result
- For the last 1.5% of test data patterns we could use our previous methods, such as a look-up table or multiple linear regression, to assign the real value of ASA.
- Here, however, we use the simplest approach: we compute the average ASA of each residue type in our experimental data set and assign that average to a new residue as its prediction.
- For example, the average ASA for Alanine (A) residues in the Barton502 data set was 22.8, which could then be assigned to all Alanine residues in the last 1.5% of test data.
59 Validation Method
- Seven-fold cross-validation was carried out for the RS126 data set.
- Five-fold cross-validation was carried out for the CB502 data set.
60 Assessment of Prediction Performance
- Mean absolute error (MAE), defined below
- Correlation coefficient between the predicted and experimental values of ASA
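As a sketch, the standard MAE over N residues, with predicted and observed (experimental) ASA values, is:

\mathrm{MAE}\ =\ \frac{1}{N}\sum_{i=1}^{N}\left|\,\mathrm{ASA}_{i}^{\mathrm{pred}}-\mathrm{ASA}_{i}^{\mathrm{obs}}\,\right|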
61 Results and Discussion
62 Results and Discussion
63 Results and Discussion
Table 2. Variation in prediction error for different ranges of ASA
64 Table 3. Mean absolute error for different amino acid types (all values are on a percentage scale)
65 Table 4. Effect of protein length on mean absolute error (number of protein chains)
66 Table 5. Comparison with other real-value prediction methods
67 Introduction to Secondary Structure
- The prediction of protein secondary structure is an important step toward determining the structural properties of proteins.
- Secondary structure consists of local folding regularities maintained by hydrogen bonds and is traditionally subdivided into three classes: alpha-helices, beta-sheets, and coil.
68 (No Transcript)
69 The Secondary Structure Prediction Task
70 Coding Example: Protein Secondary Structure Prediction
- Given an amino-acid sequence
- Predict a secondary-structure state (α, β, coil) for each residue in the sequence
- Coding: consider a moving window of n (typically 13-21) neighboring residues
- FGWYALVLAMFFYOYQEKSVMKKGD
71 Methods
- Statistical information (Figureau et al., 2003; Yan et al., 2004)
- Neural networks (Qian and Sejnowski, 1988; Rost and Sander, 1993; Pollastri et al., 2002; Cai et al., 2003; Kaur and Raghava, 2004; Wood and Hirst, 2004; Lin et al., 2005)
- Nearest-neighbor algorithms
- Hidden Markov models
- Support vector machines (Hua and Sun, 2001; Hyunsoo and Haesun, 2003; Ward et al., 2003; Guo et al., 2004)
72 Milestones
- In 1988, neural networks first achieved about 62% accuracy (Qian and Sejnowski, 1988; Holley and Karplus, 1989).
- In 1993, using evolutionary information, a neural network system improved the prediction accuracy to over 70% (Rost and Sander, 1993).
- Recently there have been approaches using neural networks (e.g. Baldi et al., 1999; Petersen et al., 2000; Pollastri and McLysaght, 2005) which achieve even higher accuracy (> 78%).
73 Benchmark (Data Sets Used in Protein Secondary Structure Prediction)
- Rost and Sander data set (Rost and Sander, 1993), referred to as RS126
- Note that the RS126 data set consists of 25,184 data points in three classes, of which 47% are coil, 32% are helix, and 21% are strand.
- Cuff and Barton data set (Cuff and Barton, 1999), referred to as CB513
- The prediction accuracy is verified by 7-fold cross-validation.
74 Secondary Structure Assignment
- Assignment follows the DSSP (Dictionary of Secondary Structures of Proteins) algorithm (Kabsch and Sander, 1983), which distinguishes eight secondary structure classes.
- We converted the eight types into three classes in the following way: H (α-helix), I (π-helix), and G (3₁₀-helix) as helix (α); E (extended strand) as β-strand (β); and all others as coil (c).
- Different conversion methods influence the prediction accuracy to some extent, as discussed by Cuff and Barton (Cuff and Barton, 1999).
75 Assessment of Prediction Accuracy
- Overall three-state accuracy Q3 (Qian and Sejnowski, 1988; Rost and Sander, 1993). Q3 is calculated as shown below.
- N is the total number of residues in the test data sets, and q_s is the number of residues of secondary structure type s that are predicted correctly.
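With these definitions, the standard form of Q3 can be sketched as:

Q_{3}\ =\ \frac{\sum_{s\in\{\mathrm{helix},\,\mathrm{strand},\,\mathrm{coil}\}} q_{s}}{N}\times 100\%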
76 Assessment of Prediction Accuracy
- A more refined measure of accuracy is the Matthews correlation coefficient (MCC), introduced by Matthews (1975) and computed per class as shown below.
- TP_i, TN_i, FP_i and FN_i are the numbers of true positives, true negatives, false positives, and false negatives for class i, respectively. A higher MCC is better.
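Using the quantities defined above, the standard per-class form of the MCC is:

\mathrm{MCC}_{i}\ =\ \frac{TP_{i}\,TN_{i}-FP_{i}\,FN_{i}}{\sqrt{(TP_{i}+FP_{i})(TP_{i}+FN_{i})(TN_{i}+FP_{i})(TN_{i}+FN_{i})}}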
77 Support Vector Machine Predictor
- We use the software LIBSVM (Chang and Lin, 2005) as our SVM predictor
- The RBF kernel is used for all experiments
- Optimal parameters are chosen for the support vector machines: the pair C = 10 and γ = 0.01 achieves the best prediction rate (see the sketch below)
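A minimal sketch (not from the slides) of a predictor with this configuration; scikit-learn's SVC stands in for LIBSVM, and the random window features and 3-state labels are dummies:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 13 * 20)), rng.integers(0, 3, 200)  # dummy window features
X_test, y_test = rng.random((50, 13 * 20)), rng.integers(0, 3, 50)

clf = SVC(kernel="rbf", C=10, gamma=0.01)     # parameters reported on the slide
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))              # Q3-style accuracy on the held-out data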
78 Coding Scheme
79 Coding Scheme for Support Vector Machines
- We use BLASTP to generate the alignments (profiles) of the proteins in our database
- An expectation value (E-value) of 10.0 is used, searching against the non-redundant protein sequence database (NCBI nr)
- The profile data obtained from BLASTPGP are normalized to [0, 1] and then used as inputs to our SVM predictor.
80 Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
- (Excerpt of a PSSM for the first residues of the query sequence: one row per residue, with substitution scores and observed percentages for the 20 amino acid columns A R N D C Q E G H I L K M F P S T W Y V, followed by the per-position information content and relative weight.)
81 Results
- Results for the RS126 (Rost and Sander) protein set
- (Using seven-fold cross-validation)
82 Results
- Results for the CB513 protein set
- (Using seven-fold cross-validation)
83 SVM Application in Protein Fold Assignment
- "Fine-grained protein fold assignment by support vector machines using generalized n-peptide coding schemes and jury voting from multiple-parameter sets"
- Chin-Sheng Yu, Jung-Ying Wang, Jin-Moon Young, P.-C. Lyu, Chih-Jen Lin, Jenn-Kang Hwang
- Proteins: Structure, Function, and Genetics, 50, 531-536 (2003).
84 Data Sets
- The Ding and Dubchak data set, which consists of 386 proteins from the 27 most populated SCOP folds, in which protein pairs have sequence identity below 35% for aligned subsequences longer than 80 residues.
- These 27 protein folds cover most major structural classes and each contains at least 7 proteins.
85 Coding Scheme
- We denote the coding schemes by X if all 20 amino acids are used
- X when the amino acids are classified into four groups: charged, polar, aromatic, and nonpolar
- X if predicted secondary structures are used
- The symbol X takes the values D, T, Q, and P, denoting the distributions of dipeptides, 3-peptides, and 4-peptides, respectively.
86 Methods
87 Results
88 Results
89 Results
90 Results
91 Structure Example: Jury SVM Predictor
92 Structure Example: SVM Combiner
93