Title: Support Vector Machines
1 Support Vector Machines
- Cristianini & Shawe-Taylor, Chapter 6
- Hua & Sun, J. Mol. Biol., 308, 397-407 (2001).
2 Overview
- So far we have learned about
- Learning machines, especially linear machines that classify points by separating them into categories using hyperplanes
- Attribute space vs. feature space
- How to embed attribute vectors into a high-dimensional feature space, and how to do this implicitly using kernels
- How to measure the performance of a learning machine, using measures based on the margin
- How to optimize functions in the presence of constraints
3 Putting it together
- A support vector machine
- Maps attribute vectors into a high-dimensional feature space (implicitly) using a kernel
- Given a training set, finds a hyperplane to categorize the data by optimizing the performance of the resulting learning machine
- Uses an algorithm that automatically focuses on those training points most critical to positioning the hyperplane; these are the support vectors.
4 The Maximal Margin Approach
- Given a training set, generate a separating hyperplane with maximal margin with respect to the data.
- The data is implicitly embedded in a high-dimensional feature space using a kernel.
- The data must be separable, which is not practical for many real-world problems (even in high dimensions) because of noise.
5 The target function
The separating plane
The functional margin
The functional margin is affected by the scale factor lambda; the position of the plane is not.
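In the usual notation (a sketch, assuming the standard conventions of Cristianini & Shawe-Taylor), the separating plane and the functional margin of a training example (x_i, y_i) are

$$ f(x) = \langle w, x \rangle + b, \qquad \text{separating plane: } f(x) = 0, $$
$$ \gamma_i = y_i \left( \langle w, x_i \rangle + b \right). $$

Replacing (w, b) by (\lambda w, \lambda b) multiplies \gamma_i by \lambda but leaves the plane f(x) = 0 unchanged.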
6 How to set the weight vector
- The geometric margin measures how far the training points are from the plane; it is the margin computed when the weight vector has unit length.
- It is easy to see that the geometric margin is maximized if the size of the un-normalized weight vector is made as small as possible.
- To see this, suppose that the functional margin is 1.
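A sketch of the reasoning, in the same notation: the geometric margin is the functional margin computed with the weight vector normalized to unit length,

$$ \gamma = \min_i \frac{y_i \left( \langle w, x_i \rangle + b \right)}{\lVert w \rVert}. $$

If the functional margin is fixed at 1, then \gamma = 1 / \lVert w \rVert, so maximizing the geometric margin is equivalent to minimizing the norm of the un-normalized weight vector.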
7 The maximal margin optimization problem
Given a separable training set,
find the hyperplane (w,b) that solves the optimization problem.
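A standard statement of this problem (assuming the usual notation, with training set S = \{(x_i, y_i)\}_{i=1}^{\ell} and y_i \in \{-1, +1\}):

$$ \min_{w, b} \ \tfrac{1}{2} \langle w, w \rangle \quad \text{subject to} \quad y_i \left( \langle w, x_i \rangle + b \right) \ge 1, \quad i = 1, \dots, \ell $$

(the factor 1/2 is a convenience; minimizing \langle w, w \rangle gives the same hyperplane).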
8 The Lagrangian
Conditions
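A sketch in the usual notation: the primal Lagrangian, with multipliers \alpha_i \ge 0, and the conditions obtained by setting its derivatives with respect to w and b to zero:

$$ L(w, b, \alpha) = \tfrac{1}{2} \langle w, w \rangle - \sum_{i=1}^{\ell} \alpha_i \left[ y_i \left( \langle w, x_i \rangle + b \right) - 1 \right], $$
$$ \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{\ell} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{\ell} \alpha_i y_i = 0. $$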
9 Leads to
Substituting
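A sketch in the usual notation: substituting the stationarity conditions back into the Lagrangian eliminates w and b and leaves the dual objective

$$ W(\alpha) = \sum_{i=1}^{\ell} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle. $$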
10 A quadratic optimization problem
Find the vector of alphas that maximizes the dual objective, subject to the constraints below.
This solves for the optimal hyperplane.
This comes from the dual formulation of the optimization problem.
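Collected in one place (a sketch, assuming the usual notation): the dual quadratic program, and the optimal hyperplane recovered from its solution \alpha^*:

$$ \max_{\alpha} \ \sum_{i=1}^{\ell} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{subject to} \quad \sum_{i=1}^{\ell} \alpha_i y_i = 0, \quad \alpha_i \ge 0, $$
$$ w^* = \sum_{i=1}^{\ell} \alpha_i^* y_i x_i, \qquad b^* \text{ chosen so that } y_i \left( \langle w^*, x_i \rangle + b^* \right) = 1 \text{ for the support vectors.} $$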
11 Observations
- The weight vector is expressed as a linear combination of the training examples, just as in the case of the perceptron.
- Consider the inequality constraint: the optimal alphas are non-zero only for those training points that bump into the constraint; these have minimum distance to the hyperplane. They are the support vectors.
12 Observations, continued
- The target function can be expressed in terms of the dual variables.
- Also, we can compute the margin.
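A sketch of the second point, in the same notation: at the optimum one can show \lVert w^* \rVert^2 = \sum_i \alpha_i^*, so the geometric margin can be read directly from the dual solution,

$$ \gamma = \frac{1}{\lVert w^* \rVert} = \Big( \sum_{i \in \mathrm{sv}} \alpha_i^* \Big)^{-1/2}, $$

where the sum runs over the support vectors (the only points with non-zero \alpha_i^*).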
13 Using a Kernel
Find the vector of alphas that maximizes the dual objective, subject to the same constraints, but with the inner product replaced by a kernel evaluation.
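With a kernel K, the inner products in the dual are simply replaced by kernel evaluations (a sketch, in the usual notation); the resulting classifier never needs the feature-space coordinates explicitly:

$$ W(\alpha) = \sum_{i=1}^{\ell} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j K(x_i, x_j), \quad \text{subject to} \quad \sum_i \alpha_i y_i = 0, \quad \alpha_i \ge 0, $$
$$ f(x) = \operatorname{sgn} \Big( \sum_{i=1}^{\ell} \alpha_i^* y_i K(x_i, x) + b^* \Big). $$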
14 Look at Figure 6.2 in Cristianini & Shawe-Taylor.
15 Soft Margins
- Real-life data is noisy. Even if there is an underlying distribution that can be used to classify the data, the presence of random noise means that some points near the separating hyperplane may cross over and spoil the classification; the noise may even make the data inseparable, even in high dimensions.
- The solution is to introduce margin slack variables, which allow the margin to be violated at individual training points.
16 Using slack variables
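A sketch of the slack-variable formulation (assuming the usual notation), where \xi_i \ge 0 measures how much training point i is allowed to violate the margin; the two versions differ in the norm applied to the slacks:

$$ \min_{w, b, \xi} \ \tfrac{1}{2} \langle w, w \rangle + C \sum_{i=1}^{\ell} \xi_i \quad \text{(1-norm)} \qquad \text{or} \qquad \tfrac{1}{2} \langle w, w \rangle + C \sum_{i=1}^{\ell} \xi_i^2 \quad \text{(2-norm)}, $$
$$ \text{subject to} \quad y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0. $$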
17 Observations
- The two versions of the optimization problem correspond to two different norms on the slack variables, and two different Lagrangians.
- The parameter C is adjustable, and is varied while the performance of the machine is assessed. C effectively imposes an additional constraint on the size of the alpha inequality multipliers; it affects the accuracy of the model and how well it regularizes. It can be adjusted to limit the influence of outliers.
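For reference, a sketch of how C appears in the dual (following the usual treatment): the 1-norm version turns the positivity constraint on the multipliers into a box constraint, while the 2-norm version instead adds 1/C to the diagonal of the kernel matrix,

$$ \text{1-norm:} \quad 0 \le \alpha_i \le C, \qquad \text{2-norm:} \quad K(x_i, x_j) \;\to\; K(x_i, x_j) + \tfrac{1}{C} \delta_{ij}. $$

Either way, a single noisy point cannot acquire an arbitrarily large multiplier, which is what limits the influence of outliers.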
18 Project 3: Hua & Sun, J. Mol. Biol., 308, 397-407 (2001).
19 Protein Secondary Structure Prediction
- A classic problem in computational biology; the first successful algorithm is due to Chou & Fasman in the 70s.
- Contemporary methods use neural networks and hidden Markov models.
- Despite the sophistication of current approaches, there appear to be fundamental limits on the accuracy of predicting secondary structure on the basis of sequence alone.
20 An SVM classifier for secondary structure
- The authors propose to use a set of SVMs to classify secondary structure. They justify this by pointing to the superiority of SVMs over competing learning methods in a number of different fields. Also:
- The rigorous learning theory that describes SVMs
- The relative simplicity of constructing them.
21 Implementation
- Use the ever-popular sliding window.
- Use a Gaussian kernel (they refer to this as "radial basis", in keeping with neural network terminology).
- Data sets: RS126 and CB513. We will use the RS126 set (Rost & Sander, J. Mol. Biol., 232, 584-599 (1993)), available on the class home page.
- SVM code? We will use SVM Light (available on the class home page, precompiled for OS X).
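The Gaussian (radial basis) kernel has the form below; in SVM Light it is selected with the -t 2 option and its width is set with -g (the choice of gamma, like the choice of C, is left to experimentation):

$$ K(x, z) = \exp \left( -\gamma \, \lVert x - z \rVert^2 \right). $$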
22 Training data format
- The RS126 set presents a collection of 126 multiple alignments of related proteins.
- We will use the head sequence of each alignment; these are collected conveniently in FASTA format here: http://antheprot-pbil.ibcp.fr/Rost.html
- We will use the secondary structure assignments found using DSSP, with the category identifications used by Hua & Sun.
- We will use a window of 11 residues, classifying the secondary structure assignment of the residue in the central position.
23 Generating Training Data
- We will train three binary classifiers: Helix/not-Helix, Sheet/not-Sheet and Coil/not-Coil.
- For each sequence:
- Slide a window of the selected length along the sequence; to control how many examples are generated, we will employ a user-selected stride when advancing the window.
- Use the DSSP string to assign the type of each residue, with the mapping H,G,I -> H; E,B -> E; all others -> C (coil, represented as "-" in the data set).
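A minimal sketch of the windowing and DSSP mapping, written in Perl since the project calls for a Perl script; the input handling, window length, and stride shown here are illustrative assumptions rather than part of the assignment.

#!/usr/bin/perl
use strict;
use warnings;

# Illustrative parameters (assumptions, not prescribed by the assignment)
my $win    = 11;   # window length; the central residue is classified
my $stride = 1;    # user-selected stride when advancing the window
my $half   = int($win / 2);

# Map a DSSP character to one of the three classes: H,G,I -> H; E,B -> E; all others -> C
sub dssp_class {
    my ($c) = @_;
    return 'H' if $c =~ /[HGI]/;
    return 'E' if $c =~ /[EB]/;
    return 'C';
}

# $seq and $dssp would come from the FASTA and DSSP data for one protein
my ($seq, $dssp) = @ARGV;
die "usage: $0 <sequence> <dssp-string>\n" unless defined $dssp;

for (my $i = $half; $i + $half < length($seq); $i += $stride) {
    my $window = substr($seq,  $i - $half, $win);   # 11-residue window
    my $class  = dssp_class(substr($dssp, $i, 1));  # class of the central residue
    print "$class\t$window\n";
}

In the real script, the class of the central residue would be converted to a +1/-1 label for whichever of the three binary classifiers is being trained.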
24 Data Encoding
- Since this is for educational/recreational purposes, let's code our amino acids using three attributes:
- Molecular weight
- Kyte-Doolittle hydrophobicity
- Charge
- Note that we can also encode the identities of the amino acids using orthogonal binary vectors (just like in our neural network). Hua & Sun also employ this approach.
25 Implementing the Data Encoding
- Use a single perl script.
- Generate three training output files, one for H/not-H, one for E/not-E, one for C/not-C, and three corresponding validation files. Use 50% of the data for training, 50% for validation.
- Generate the encoding expected by SVM Light; see http://svmlight.joachims.org/
  Example: -1 1:0.43 3:0.12 9284:0.2 # abcdef
  (label, then attribute:value pairs, then an optional comment)
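A minimal sketch of the per-window encoding into the SVM Light format, again in Perl; the handful of table values below are copied from the hydrophobicity and molecular-weight slides, the charges are the usual +1/-1/0 assignments at neutral pH, and the +1 label is only an illustration of the positive class.

#!/usr/bin/perl
use strict;
use warnings;

# Three attributes per residue: molecular weight, Kyte-Doolittle hydrophobicity, charge.
# Only a few residues are filled in as an illustration; the full tables are on slides 26-27.
my %mw     = ( A => 71.04, R => 156.10, D => 115.03, K => 128.09, G => 57.02 );
my %hydro  = ( A => 1.8,   R => -4.5,   D => -3.5,   K => -3.9,   G => -0.4  );
my %charge = ( A => 0,     R => 1,      D => -1,     K => 1,      G => 0     );

# Encode one window as an SVM Light line: <label> <index>:<value> ... # <comment>
sub encode_window {
    my ($label, $window) = @_;                 # $label is +1 or -1
    my @features;
    my $idx = 1;
    for my $aa (split //, $window) {
        push @features, sprintf("%d:%g", $idx++, $mw{$aa}     // 0);
        push @features, sprintf("%d:%g", $idx++, $hydro{$aa}  // 0);
        push @features, sprintf("%d:%g", $idx++, $charge{$aa} // 0);
    }
    return "$label " . join(' ', @features) . " # $window";
}

# Example: an 11-residue window whose central residue is helical (+1 for the H classifier)
print encode_window(+1, "ARDKGARDKGA"), "\n";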
26 Kyte-Doolittle Hydrophobicity Scale
Alanine 1.8          Arginine -4.5
Asparagine -3.5      Aspartic acid -3.5
Cysteine 2.5         Glutamine -3.5
Glutamic acid -3.5   Glycine -0.4
Histidine -3.2       Isoleucine 4.5
Leucine 3.8          Lysine -3.9
Methionine 1.9       Phenylalanine 2.8
Proline -1.6         Serine -0.8
Threonine -0.7       Tryptophan -0.9
Tyrosine -1.3        Valine 4.2
http://arbl.cvmbs.colostate.edu/molkit/hydropathy/scales.html
27 Amino Acid MWs (residue masses, Da)
Alanine Ala A 71.04          Lysine Lys K 128.09
Arginine Arg R 156.10        Methionine Met M 131.04
Aspartic acid Asp D 115.03   Phenylalanine Phe F 147.07
Asparagine Asn N 114.04      Proline Pro P 97.05
Cysteine Cys C 103.01        Serine Ser S 87.03
Glutamic acid Glu E 129.04   Threonine Thr T 101.05
Glutamine Gln Q 128.06       Tryptophan Trp W 186.08
Glycine Gly G 57.02          Tyrosine Tyr Y 163.06
Histidine His H 137.06       Valine Val V 99.07
Hydroxyproline Hyp - 113.05
Isoleucine Ile I 113.08
Leucine Leu L 113.08