Title: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines
1 Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines
Blaise Gassend, Charles W. O'Donnell, William Thies, Andrew Lee, Marten van Dijk, and Srinivas Devadas
- Computer Science and Artificial Intelligence Laboratory
- Massachusetts Institute of Technology
- Workshop on Pattern Recognition in Bioinformatics, August 20, 2006
2 Protein Structure Prediction
- Classical problem: given sequence, predict structure
- High-level approaches
- 1. Energy-minimization (ab-initio) techniques
- - Elegant, but often lack correct parameters
- 2. Homology-based techniques
- - Useful, but hard to predict new proteins
[Figure: Sequence to Structure]
Our approach: Use energy minimization, but learn parameters from existing proteins
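3 Our Framework (Training)
[Figure: training loop. The Protein Data Bank supplies an amino-acid sequence and its correct structure. The prediction algorithm uses the current energy parameters to produce a predicted structure. If the prediction is correct: Done! If it is incorrect, the learning algorithm updates the energy parameters.]
Constraints: energy(incorrect) > energy(correct)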
4 Our Framework (Testing)
[Figure: the prediction algorithm takes an amino-acid sequence and the energy parameters and outputs a predicted structure.]
5 Initial Focus: Secondary Structure
- Classify each residue as alpha helix, beta strand, or coil
- In this paper, restrict to all-alpha proteins
- Applications
- Informing tertiary structure predictors
- Identification of homologous proteins
- Identification of active sites (coils)
6-12 Secondary Structure Predictors
[Figure, built up over slides 6-12: secondary structure predictors grouped by input (sequence only vs. sequence and alignment) and by method (statistical methods, neural networks, SVMs, HMMs). Model-size annotations on the figure: 302 params; 1400-2900 parameters; 680 MB of support vectors; 471 parameters.]
- Exploits biochemical models
- Offers biological insight
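13 Our Framework Applied to Helix Prediction
[Figure: the training loop specialized to alpha helices. The Protein Data Bank supplies an amino-acid sequence and its correct structure (example: MNIFEMLRIDEGL, HHHHHHHHH). The prediction algorithm is a Hidden Markov Model; the learning algorithm uses Support Vector Machines to update the energy parameters. A correct prediction ends the loop (Done!); an incorrect one goes to the learning algorithm.]
Constraints: energy(incorrect) > energy(correct)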
14-19 Energy Parameters
302 Total
- Example
- Sequence: MNIFELRIDEGL
- Structure: HHHHHH (the helix spans residues F through D)
- Energy = $H_F + H_E + H_L + H_R + H_I + H_D$ (Helix)
  $+\; N_{M,-3} + N_{N,-2} + N_{I,-1} + N_{F,0} + N_{E,1} + N_{L,2} + N_{R,3}$ (N-cap)
  $+\; C_{L,-3} + C_{R,-2} + C_{I,-1} + C_{D,0} + C_{E,1} + C_{G,2} + C_{L,3}$ (C-cap)
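A minimal Python sketch of how the energy terms in the example above could be enumerated. The function names, the parameter-name strings such as `H_F` or `N_M,-3`, and the cap window of 3 residues on either side are illustrative assumptions read off the example, not the authors' implementation.

```python
from typing import Dict, List


def energy_terms(seq: str, start: int, end: int, cap_radius: int = 3) -> List[str]:
    """Name every parameter that contributes for one helix covering seq[start:end].

    Assumption: the N-cap window is centered on the first helix residue and the
    C-cap window on the last one, each extending cap_radius residues both ways.
    """
    helix = [f"H_{seq[i]}" for i in range(start, end)]
    offsets = range(-cap_radius, cap_radius + 1)
    last = end - 1
    n_cap = [f"N_{seq[start + k]},{k}" for k in offsets if 0 <= start + k < len(seq)]
    c_cap = [f"C_{seq[last + k]},{k}" for k in offsets if 0 <= last + k < len(seq)]
    return helix + n_cap + c_cap


def energy(seq: str, start: int, end: int, params: Dict[str, float]) -> float:
    """Sum the parameters of all contributing terms (missing names count as 0)."""
    return sum(params.get(name, 0.0) for name in energy_terms(seq, start, end))


# The helix in the example covers F..D, i.e. indices 3 through 8 of the sequence.
terms = energy_terms("MNIFELRIDEGL", 3, 9)
```

With these assumptions, `terms` lists the six helix terms, then the seven N-cap terms, then the seven C-cap terms, in the same order as the slide.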
20-39 Learning the Parameters
[Figure, built up over slides 20-39: a 2-D feature space whose axes are A, the number of alanines in helices, and G, the number of glycines in helices. Points mark legal structures, the correct structure, predicted structures, and structures already predicted; the parameter vector w and the separating hyperplane are redrawn after each refinement.]
$\mathrm{Energy}(\text{structure}) = H_A A + H_G G = w \cdot (A, G)$, where $w = (H_A, H_G)$ represents the energy parameters
Highest energy in direction of energy parameters w
1. Predict structure
2. Refine parameters
3. Predict structure
4. Refine parameters
5. Predict structure
6. Refine parameters
7. Predict structure (structure already predicted)
8. Terminate
Details in paper
- How to converge faster
- Early termination condition
[Tsochantaridis et al., ICML02]
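The predict/refine loop can be sketched on a toy 2-D feature space like the one in the figure. This is only an illustration under stated assumptions: structures are represented directly by their feature vectors (A, G), and the refine step uses simple perceptron-style updates as a stand-in for the SVM solver, so the names `predict`, `refine`, and `learn` and the step size are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def dot(w: Vec, x: Vec) -> float:
    return w[0] * x[0] + w[1] * x[1]


def predict(w: Vec, legal: List[Vec]) -> Vec:
    """Return the legal structure with minimal energy w . x."""
    return min(legal, key=lambda x: dot(w, x))


def refine(w: Vec, correct: Vec, wrong: List[Vec],
           lr: float = 0.25, max_steps: int = 10_000) -> Vec:
    """Stand-in for the SVM step: nudge w until every collected constraint
    Energy(wrong) - Energy(correct) >= 1 holds (or max_steps is reached)."""
    for _ in range(max_steps):
        violated = [s for s in wrong if dot(w, s) - dot(w, correct) < 1.0]
        if not violated:
            break
        s = violated[0]
        w = (w[0] + lr * (s[0] - correct[0]), w[1] + lr * (s[1] - correct[1]))
    return w


def learn(legal: List[Vec], correct: Vec, w: Vec) -> Tuple[Vec, List[Vec]]:
    """Alternate prediction and refinement; stop when the prediction is
    correct or repeats a structure that was already predicted."""
    wrong: List[Vec] = []
    while True:
        guess = predict(w, legal)
        if guess == correct or guess in wrong:
            return w, wrong
        wrong.append(guess)
        w = refine(w, correct, wrong)


legal = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (3.0, 3.0), (1.0, 1.0)]
correct = (3.0, 3.0)
w_final, constraints = learn(legal, correct, w=(1.0, -1.0))
```

In this toy run the initial parameters predict (0, 4), one constraint is added, and the refined parameters make the correct structure the minimal-energy one.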
40 Experimental Methodology
- Data set: 300 non-homologous all-alpha proteins
- From EVA's sequence-unique subset of the PDB, July 2005
- Only consider alpha helices (H symbol in DSSP)
- Randomly split into 150 training, 150 test proteins
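A random 150/150 split of this kind could be produced as follows. This is a sketch only: the identifier format and the fixed seed are assumptions made for reproducibility of the example, and nothing here reflects how the authors actually drew their split.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(ids: Sequence[str], n_train: int, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of the identifiers and cut it into training and test sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # local generator, global state untouched
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 300 proteins.
protein_ids = [f"protein_{i:03d}" for i in range(300)]
train_ids, test_ids = split_dataset(protein_ids, n_train=150)
```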
41 Results
- Comparison to others
- Best HMM method to date that does not utilize alignment info
- Offers +3.5% (Qα), +0.2% (SOVα) over previous best
- Lags behind neural networks (e.g., Porter: overall SOV 76.6%)
- However, we could likely gain 6-8% from alignment profiles
- Caveats
- Moving beyond all-alpha proteins, we could suffer a 3% drop
- By considering 3_10 helices, we could decrease by 2%
Cited on slide: Nguyen02, Rost93, Jones99
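For readers unfamiliar with the Q measure, a minimal sketch follows, assuming Q is the percentage of residues whose predicted label matches the observed one. SOV is a segment-overlap measure with a more involved definition and is not sketched here. The function name and label strings are illustrative.

```python
def q_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed labelings must have equal length")
    if not observed:
        raise ValueError("labelings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Three of the four residues agree.
example_q = q_score("HHHC", "HHCC")
```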
42 Conclusions
- Represents first step toward learning biophysical parameters for energy minimization techniques
- Iterative, demand-driven learning process using SVMs
- Promising results on alpha-helix prediction
- 77.6% among best Qα for methods without alignment info
- Future work: super-secondary structure
- Will predict full contact maps rather than 3-state labels
- For beta sheets, replace HMMs by multi-tape grammars
http://protein.csail.mit.edu/
43 Extra Slides
44 Prediction Algorithm
- Parameters represent energetic benefit of a given feature in a protein structure
- Features are fixed, chosen by designer
- Example features
- Number of prolines in an alpha helix
- Number of coils shorter than 2 residues
- $\mathrm{Energy}(\text{structure}) = \sum_{\text{features} \in \text{structure}} \mathrm{Energy}(\text{feature})$
- Minimal-energy structure found with dynamic programming
- Idea: consider all structures, exploiting overlapping subproblems
- Implemented as HMM using Viterbi algorithm
[Figure label: Structure with Minimal Energy]
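The dynamic-programming idea can be illustrated with a small Viterbi-style search over two states. This is a sketch under simplifying assumptions, not the authors' model: only helix (H) and coil (C) states are used, each helix residue contributes a per-residue energy, and each helix start contributes a fixed cost. The names `helix_energy` and `helix_start_cost` and all numeric values are hypothetical.

```python
from typing import Dict, Tuple


def min_energy_structure(seq: str, helix_energy: Dict[str, float],
                         helix_start_cost: float) -> Tuple[float, str]:
    """Find the minimal-energy H/C labeling of a non-empty sequence."""
    states = "HC"
    # best[s]: minimal energy of any labeling of the prefix that ends in state s
    best = {"H": helix_energy.get(seq[0], 0.0) + helix_start_cost, "C": 0.0}
    back = []  # back[i][s]: previous state on the best path ending in s at i + 1
    for aa in seq[1:]:
        new_best, pointers = {}, {}
        for s in states:
            emit = helix_energy.get(aa, 0.0) if s == "H" else 0.0
            candidates = {}
            for p in states:
                trans = helix_start_cost if (p == "C" and s == "H") else 0.0
                candidates[p] = best[p] + trans + emit
            prev = min(candidates, key=candidates.get)
            new_best[s], pointers[s] = candidates[prev], prev
        best = new_best
        back.append(pointers)
    # Trace back from the cheaper final state to recover the labeling.
    last = min(best, key=best.get)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return best[last], "".join(reversed(path))


# Hypothetical parameters: alanine favors helices, glycine disfavors them.
total, labels = min_energy_structure("AAAGGAAA", {"A": -1.0, "G": 2.0}, 1.5)
```

Each position reuses the best energies of the previous position instead of enumerating all labelings, which is the overlapping-subproblem structure the slide refers to.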
45 Learning Algorithm
- Constraints have form
- For all incorrectly predicted structures $S_i$, in future selection of the parameters $w$:
- $\mathrm{Energy}_w(S_i) > \mathrm{Energy}_w(\text{correct structure})$
- Constraints are linear in the energy parameters.
- If feasible, could solve with linear programming
- In general, solve with Support Vector Machines (SVMs)
- $\mathrm{Energy}_w(S_i) - \mathrm{Energy}_w(\text{correct structure}) \ge 1 - \xi_i$ $(\xi_i \ge 0)$
- Find parameters $w$ minimizing $\tfrac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n} \xi_i$
Provides general solution using soft-margin criterion
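To make the soft-margin criterion concrete, the sketch below evaluates the slack variables and the objective for a given parameter vector. It assumes a linear energy, Energy_w(S) = w . x(S), with each structure represented by a 2-D feature vector; the function names and sample values are illustrative, and no optimizer is included.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def dot(w: Vec, x: Vec) -> float:
    return w[0] * x[0] + w[1] * x[1]


def slacks(w: Vec, correct: Vec, wrong: List[Vec]) -> List[float]:
    """Smallest xi_i >= 0 with Energy(S_i) - Energy(correct) >= 1 - xi_i."""
    return [max(0.0, 1.0 - (dot(w, s) - dot(w, correct))) for s in wrong]


def objective(w: Vec, correct: Vec, wrong: List[Vec], C: float) -> float:
    """Soft-margin objective: 1/2 ||w||^2 + (C / n) * sum of slacks."""
    xi = slacks(w, correct, wrong)
    return 0.5 * dot(w, w) + (C / len(wrong)) * sum(xi)


w = (-0.5, -0.5)
correct = (3.0, 3.0)
wrong = [(0.0, 4.0), (4.0, 4.0)]  # the second constraint is violated by this w
xi = slacks(w, correct, wrong)
value = objective(w, correct, wrong, C=1.0)
```

A constraint that already holds with margin 1 gets zero slack; a violated one is penalized in proportion to the violation, which is what lets the method return parameters even when the hard constraints are infeasible.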