Title: Support Vector Machines
1 Support Vector Machines
- Cristianini & Shawe-Taylor, Chapter 6
- Hua & Sun, J. Mol. Biol., 308, 397-407 (2001).
2 Overview
- So far we have learned about
- Learning machines, especially linear machines that classify points by separating them into categories using hyperplanes
- Attribute space vs. feature space
- How to embed attribute vectors into a high-dimensional feature space, and how to do this implicitly using kernels
- How to measure the performance of a learning machine, using measures based on the margin
- How to optimize functions in the presence of constraints
3 Putting it together
- A support vector machine
- Maps attribute vectors into a high-dimensional feature space (implicitly) using a kernel
- Given a training set, finds a hyperplane to categorize the data by optimizing the performance of the resulting learning machine
- Uses an algorithm that automatically focuses on those training points most critical to positioning the hyperplane; these are the support vectors.
4 The Maximal Margin Approach
- Given a training set, generate a separating hyperplane with maximal margin with respect to the data.
- The data is implicitly embedded in a high-dimensional feature space using a kernel.
- The data must be separable, which is not practical for many real-world problems (even in high dimensions) because of noise.
5 The target function
The separating plane
The functional margin
The functional margin is affected by the scale factor lambda; the position of the plane is not.
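In the usual notation (a sketch, assuming the standard conventions of Cristianini & Shawe-Taylor), the separating plane and the functional margin of a training example (x_i, y_i) are

$$ f(x) = \langle w, x \rangle + b, \qquad \text{separating plane: } f(x) = 0, $$
$$ \gamma_i = y_i \left( \langle w, x_i \rangle + b \right). $$

Replacing (w, b) by (\lambda w, \lambda b) multiplies \gamma_i by \lambda but leaves the plane f(x) = 0 unchanged.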
6 How to set the weight vector
- The geometric margin measures how far the training points are from the plane; it is the margin computed when the weight vector has unit length.
- It is easy to see that the geometric margin is maximized if the size of the un-normalized weight vector is made as small as possible.
- To see this, suppose that the functional margin is 1.
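A sketch of the reasoning, in the same notation: the geometric margin is the functional margin computed with the weight vector normalized to unit length,

$$ \gamma = \min_i \frac{y_i \left( \langle w, x_i \rangle + b \right)}{\lVert w \rVert}. $$

If the functional margin is fixed at 1, then \gamma = 1 / \lVert w \rVert, so maximizing the geometric margin is equivalent to minimizing the norm of the un-normalized weight vector.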
7 The maximal margin optimization problem
Given a separable training set,
find the hyperplane (w,b) that solves the optimization problem.
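A standard statement of this problem (assuming the usual notation, with training set S = \{(x_i, y_i)\}_{i=1}^{\ell} and y_i \in \{-1, +1\}):

$$ \min_{w, b} \ \tfrac{1}{2} \langle w, w \rangle \quad \text{subject to} \quad y_i \left( \langle w, x_i \rangle + b \right) \ge 1, \quad i = 1, \dots, \ell $$

(the factor 1/2 is a convenience; minimizing \langle w, w \rangle gives the same hyperplane).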
8 The Lagrangian
Conditions
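A sketch in the usual notation: the primal Lagrangian, with multipliers \alpha_i \ge 0, and the conditions obtained by setting its derivatives with respect to w and b to zero:

$$ L(w, b, \alpha) = \tfrac{1}{2} \langle w, w \rangle - \sum_{i=1}^{\ell} \alpha_i \left[ y_i \left( \langle w, x_i \rangle + b \right) - 1 \right], $$
$$ \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{\ell} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{\ell} \alpha_i y_i = 0. $$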
9 Leads to
Substituting
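A sketch in the usual notation: substituting the stationarity conditions back into the Lagrangian eliminates w and b and leaves the dual objective

$$ W(\alpha) = \sum_{i=1}^{\ell} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle. $$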
10 A quadratic optimization problem
Find the vector of alphas that maximizes the dual objective, subject to the constraints below.
This solves for the optimal hyperplane.
This comes from the dual formulation of the optimization problem.
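Collected in one place (a sketch, assuming the usual notation): the dual quadratic program, and the optimal hyperplane recovered from its solution \alpha^*:

$$ \max_{\alpha} \ \sum_{i=1}^{\ell} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{subject to} \quad \sum_{i=1}^{\ell} \alpha_i y_i = 0, \quad \alpha_i \ge 0, $$
$$ w^* = \sum_{i=1}^{\ell} \alpha_i^* y_i x_i, \qquad b^* \text{ chosen so that } y_i \left( \langle w^*, x_i \rangle + b^* \right) = 1 \text{ for the support vectors.} $$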
11 Observations
- The weight vector is expressed as a linear combination of the training examples, just as in the case of the perceptron.
- Consider the inequality constraint: the optimal alphas are non-zero only for those training points that bump into the constraint; these have minimum distance to the hyperplane. They are the support vectors.
12 Observations, continued
- The target function can be expressed in terms of the dual variables.
- Also, we can compute the margin.
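A sketch of the second point, in the same notation: at the optimum one can show \lVert w^* \rVert^2 = \sum_i \alpha_i^*, so the geometric margin can be read directly from the dual solution,

$$ \gamma = \frac{1}{\lVert w^* \rVert} = \Big( \sum_{i \in \mathrm{sv}} \alpha_i^* \Big)^{-1/2}, $$

where the sum runs over the support vectors (the only points with non-zero \alpha_i^*).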
13 Using a Kernel
Find the vector of alphas that maximizes the dual objective, subject to the same constraints, but with the inner product replaced by a kernel evaluation.
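With a kernel K, the inner products in the dual are simply replaced by kernel evaluations (a sketch, in the usual notation); the resulting classifier never needs the feature-space coordinates explicitly:

$$ W(\alpha) = \sum_{i=1}^{\ell} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j K(x_i, x_j), \quad \text{subject to} \quad \sum_i \alpha_i y_i = 0, \quad \alpha_i \ge 0, $$
$$ f(x) = \operatorname{sgn} \Big( \sum_{i=1}^{\ell} \alpha_i^* y_i K(x_i, x) + b^* \Big). $$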
14 Look at Figure 6.2 in Cristianini & Shawe-Taylor.
15 Soft Margins
- Real-life data is noisy. Even if there is an underlying distribution that can be used to classify the data, the presence of random noise means that some points near the separating hyperplane may cross over and spoil the classification; the noise may even make the data inseparable, even in high dimensions.
- The solution is to introduce margin slack variables, which allow the margin to be violated at individual training points.
16 Using slack variables
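A sketch of the slack-variable formulation (assuming the usual notation), where \xi_i \ge 0 measures how much training point i is allowed to violate the margin; the two versions differ in the norm applied to the slacks:

$$ \min_{w, b, \xi} \ \tfrac{1}{2} \langle w, w \rangle + C \sum_{i=1}^{\ell} \xi_i \quad \text{(1-norm)} \qquad \text{or} \qquad \tfrac{1}{2} \langle w, w \rangle + C \sum_{i=1}^{\ell} \xi_i^2 \quad \text{(2-norm)}, $$
$$ \text{subject to} \quad y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0. $$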
17 Observations
- The two versions of the optimization problem correspond to two different norms on the slack variables, and two different Lagrangians.
- The parameter C is adjustable, and is varied while the performance of the machine is assessed. C effectively imposes an additional constraint on the size of the alpha inequality multipliers; it affects the accuracy of the model and how well it regularizes. It can be adjusted to limit the influence of outliers.
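For reference, a sketch of how C appears in the dual (following the usual treatment): the 1-norm version turns the positivity constraint on the multipliers into a box constraint, while the 2-norm version instead adds 1/C to the diagonal of the kernel matrix,

$$ \text{1-norm:} \quad 0 \le \alpha_i \le C, \qquad \text{2-norm:} \quad K(x_i, x_j) \;\to\; K(x_i, x_j) + \tfrac{1}{C} \delta_{ij}. $$

Either way, a single noisy point cannot acquire an arbitrarily large multiplier, which is what limits the influence of outliers.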
18 Project 3: Hua & Sun, J. Mol. Biol., 308, 397-407 (2001).
19 Protein Secondary Structure Prediction
- A classic problem in computational biology; the first successful algorithm is due to Chou & Fasman in the 70s.
- Contemporary methods use neural networks and hidden Markov models.
- Despite the sophistication of current approaches, there appear to be fundamental limits on the accuracy of predicting secondary structure on the basis of sequence alone.
20 An SVM classifier for secondary structure
- The authors propose to use a set of SVMs to classify secondary structure. They justify this by pointing to the superiority of SVMs over competing learning methods in a number of different fields. Also:
- The rigorous learning theory that describes SVMs
- The relative simplicity of constructing them.
21 Implementation
- Use the ever-popular sliding window.
- Use a Gaussian kernel (they refer to this as "radial basis", in keeping with neural network terminology).
- Data sets: RS126 and CB513. We will use the RS126 set (Rost & Sander, J. Mol. Biol., 232, 584-599 (1993)), available on the class home page.
- SVM code? We will use SVM Light (available on the class home page, precompiled for OS X).
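The Gaussian (radial basis) kernel has the form below; in SVM Light it is selected with the -t 2 option and its width is set with -g (the choice of gamma, like the choice of C, is left to experimentation):

$$ K(x, z) = \exp \left( -\gamma \, \lVert x - z \rVert^2 \right). $$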
22 Training data format
- The RS126 set presents a collection of 126 multiple alignments of related proteins.
- We will use the head sequence of each alignment; these are collected conveniently in FASTA format here: http://antheprot-pbil.ibcp.fr/Rost.html
- We will use the secondary structure assignments found using DSSP, with the category identifications used by Hua & Sun.
- We will use a window of 11 residues, classifying the secondary structure assignment of the residue in the central position.
23 Generating Training Data
- We will train three binary classifiers: Helix/not-Helix, Sheet/not-Sheet and Coil/not-Coil.
- For each sequence:
- Slide a window of the selected length along the sequence; to control how many examples are generated, we will employ a user-selected stride when advancing the window.
- Use the DSSP string to assign the type of each residue, with the mapping H,G,I -> H; E,B -> E; all others -> C (coil, represented as "-" in the data set).
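A minimal sketch of the windowing and DSSP mapping, written in Perl since the project calls for a Perl script; the input handling, window length, and stride shown here are illustrative assumptions rather than part of the assignment.

#!/usr/bin/perl
use strict;
use warnings;

# Illustrative parameters (assumptions, not prescribed by the assignment)
my $win    = 11;   # window length; the central residue is classified
my $stride = 1;    # user-selected stride when advancing the window
my $half   = int($win / 2);

# Map a DSSP character to one of the three classes: H,G,I -> H; E,B -> E; all others -> C
sub dssp_class {
    my ($c) = @_;
    return 'H' if $c =~ /[HGI]/;
    return 'E' if $c =~ /[EB]/;
    return 'C';
}

# $seq and $dssp would come from the FASTA and DSSP data for one protein
my ($seq, $dssp) = @ARGV;
die "usage: $0 <sequence> <dssp-string>\n" unless defined $dssp;

for (my $i = $half; $i + $half < length($seq); $i += $stride) {
    my $window = substr($seq,  $i - $half, $win);   # 11-residue window
    my $class  = dssp_class(substr($dssp, $i, 1));  # class of the central residue
    print "$class\t$window\n";
}

In the real script, the class of the central residue would be converted to a +1/-1 label for whichever of the three binary classifiers is being trained.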
24 Data Encoding
- Since this is for educational/recreational purposes, let's code our amino acids using three attributes:
- Molecular weight
- Kyte-Doolittle hydrophobicity
- Charge
- Note that we can also encode the identities of the amino acids using orthogonal binary vectors (just like in our neural network). Hua & Sun also employ this approach.
25 Implementing the Data Encoding
- Use a single perl script.
- Generate three training output files, one for H/not-H, one for E/not-E, one for C/not-C, and three corresponding validation files. Use 50% of the data for training, 50% for validation.
- Generate the encoding expected by SVM Light; see http://svmlight.joachims.org/
  Example: -1 1:0.43 3:0.12 9284:0.2 # abcdef
  (label, then attribute:value pairs, then an optional comment)
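A minimal sketch of the per-window encoding into the SVM Light format, again in Perl; the handful of table values below are copied from the hydrophobicity and molecular-weight slides, the charges are the usual +1/-1/0 assignments at neutral pH, and the +1 label is only an illustration of the positive class.

#!/usr/bin/perl
use strict;
use warnings;

# Three attributes per residue: molecular weight, Kyte-Doolittle hydrophobicity, charge.
# Only a few residues are filled in as an illustration; the full tables are on slides 26-27.
my %mw     = ( A => 71.04, R => 156.10, D => 115.03, K => 128.09, G => 57.02 );
my %hydro  = ( A => 1.8,   R => -4.5,   D => -3.5,   K => -3.9,   G => -0.4  );
my %charge = ( A => 0,     R => 1,      D => -1,     K => 1,      G => 0     );

# Encode one window as an SVM Light line: <label> <index>:<value> ... # <comment>
sub encode_window {
    my ($label, $window) = @_;                 # $label is +1 or -1
    my @features;
    my $idx = 1;
    for my $aa (split //, $window) {
        push @features, sprintf("%d:%g", $idx++, $mw{$aa}     // 0);
        push @features, sprintf("%d:%g", $idx++, $hydro{$aa}  // 0);
        push @features, sprintf("%d:%g", $idx++, $charge{$aa} // 0);
    }
    return "$label " . join(' ', @features) . " # $window";
}

# Example: an 11-residue window whose central residue is helical (+1 for the H classifier)
print encode_window(+1, "ARDKGARDKGA"), "\n";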
26 Kyte-Doolittle Hydrophobicity Scale
Alanine 1.8          Arginine -4.5
Asparagine -3.5      Aspartic acid -3.5
Cysteine 2.5         Glutamine -3.5
Glutamic acid -3.5   Glycine -0.4
Histidine -3.2       Isoleucine 4.5
Leucine 3.8          Lysine -3.9
Methionine 1.9       Phenylalanine 2.8
Proline -1.6         Serine -0.8
Threonine -0.7       Tryptophan -0.9
Tyrosine -1.3        Valine 4.2
http://arbl.cvmbs.colostate.edu/molkit/hydropathy/scales.html
27 Amino Acid MWs (residue masses, Da)
Alanine Ala A 71.04          Lysine Lys K 128.09
Arginine Arg R 156.10        Methionine Met M 131.04
Aspartic acid Asp D 115.03   Phenylalanine Phe F 147.07
Asparagine Asn N 114.04      Proline Pro P 97.05
Cysteine Cys C 103.01        Serine Ser S 87.03
Glutamic acid Glu E 129.04   Threonine Thr T 101.05
Glutamine Gln Q 128.06       Tryptophan Trp W 186.08
Glycine Gly G 57.02          Tyrosine Tyr Y 163.06
Histidine His H 137.06       Valine Val V 99.07
Hydroxyproline Hyp - 113.05
Isoleucine Ile I 113.08
Leucine Leu L 113.08