Title: Some gory details of protein secondary structure prediction
1Some gory details of protein secondary structure
prediction
- Burkhard Rost
- CUBIC Columbia University
- rost_at_columbia.edu
- http://www.columbia.edu/rost
- http://cubic.bioc.columbia.edu/
3Goal of secondary structure prediction
4Secondary structure predictions of 1. and 2.
generation
- single residues (1. generation)
- Chou-Fasman, GOR (1957-70/80): 50-55% accuracy
- segments (2. generation)
- GORIII (1986-92): 55-60% accuracy
- problems
- < 100%: they said 65% max
- < 40%: they said strand non-local
- short segments
5Helix formation is local
THYROID hormone receptor (2nll)
6b-sheet formation is NOT local
7Problems of secondary structure
predictions (before 1994)
SEQ KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS EEEE E E E EEEEEE EEEEEE EEEEEEHHHEEEE
TYP EHHHH EE EEEE EE HHHEE EEEHH
8Simple neural network
9Training a neural network 1
10Training a neural network 2
Errare = $(\mathrm{out}_{\mathrm{net}} - \mathrm{out}_{\mathrm{want}})^2$
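The squared error on this slide can be turned into a small training loop. The following is only a minimal sketch, assuming a single sigmoid output unit trained by plain gradient descent on one example; the function names, the learning rate, and the toy weights and inputs are illustrative assumptions, not the network shown in the talk.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def squared_error(out_net: float, out_want: float) -> float:
    """Error as on the slide: (out_net - out_want)^2."""
    return (out_net - out_want) ** 2


def train_step(weights, inputs, out_want, learning_rate=0.5):
    """One gradient-descent step for a single sigmoid unit (assumed setup)."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    out_net = sigmoid(activation)
    # dE/dw_i = 2 * (out_net - out_want) * out_net * (1 - out_net) * x_i
    delta = 2.0 * (out_net - out_want) * out_net * (1.0 - out_net)
    return [w - learning_rate * delta * x for w, x in zip(weights, inputs)]


# Toy example: three inputs, desired output 1.0 (all values hypothetical).
weights = [0.1, -0.2, 0.05]
inputs = [1.0, 0.0, 1.0]
out_want = 1.0
errors = []
for _ in range(200):
    out_net = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    errors.append(squared_error(out_net, out_want))
    weights = train_step(weights, inputs, out_want)
```

The list `errors` records the error before each update, so comparing its first and last entries shows whether training moved the output toward the desired value. The weight attached to the zero input never changes, because its gradient is zero.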
11Training a neural network 3
12Training a neural network 4
13Neural networks classify points
14Simple neural network with hidden layer
15Neural Network for secondary structure
16Secondary structure predictions of 1. and 2.
generation
- single residues (1. generation)
- Chou-Fasman, GOR (1957-70/80): 50-55% accuracy
- segments (2. generation)
- GORIII (1986-92): 55-60% accuracy
- problems
- < 100%: they said 65% max
- < 40%: they said strand non-local
- short segments
20Balanced training
normal training
balanced training
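The contrast between normal and balanced training can be sketched as two ways of drawing training examples. This is a hedged illustration, assuming that "balanced" means each secondary structure state (H, E, L for other) is presented equally often regardless of how common it is in the data set; the function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def normal_batch(examples, size, rng):
    """Normal training: draw examples at their natural frequency."""
    return [rng.choice(examples) for _ in range(size)]


def balanced_batch(examples, size, rng):
    """Balanced training (assumed meaning): cycle through the states so
    that each state is drawn equally often."""
    by_state = defaultdict(list)
    for window, state in examples:
        by_state[state].append((window, state))
    states = sorted(by_state)
    return [rng.choice(by_state[states[i % len(states)]]) for i in range(size)]


# Toy data set: 50% other (L), 30% helix (H), 20% strand (E).
rng = random.Random(0)
examples = (
    [("w%d" % i, "L") for i in range(50)]
    + [("w%d" % i, "H") for i in range(50, 80)]
    + [("w%d" % i, "E") for i in range(80, 100)]
)
batch = balanced_batch(examples, 300, rng)
counts = {s: sum(1 for _, st in batch if st == s) for s in "HEL"}
```

With the balanced sampler the rarer strand examples are seen as often as the other two states, which is one plausible way to counter under-prediction of strands.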
22PHDsec structure-to-structure network
23Better prediction of segment lengths
24Evolution has it!
30Spectrin SH3 domain (Src homology 3)
31Prediction accuracy varies!
32Why so bad?
33Stronger predictions more accurate!
34Correct prediction of correctly predicted residues
35BAD errors are frequent!
36False prediction for engineered proteins!
37PHDsec the un-g(l)ory details
- average accuracy > 72% (helix, strand, other)
- 72% is the average over a distribution with sigma = 10%
- stronger predictions more accurate
- WARNING: reliability index almost a factor of 2 too
large for single sequences
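The link between prediction strength and accuracy can be illustrated with a small sketch. It assumes that the reliability index is the gap between the two strongest output units scaled to 0-9; that scaling, the function names, and the toy records are assumptions made here for illustration, since the slides do not spell out the formula.

```python
def reliability_index(outputs):
    """Assumed definition: 10 * (largest output - second largest),
    truncated to an integer and capped at 9."""
    top, second = sorted(outputs, reverse=True)[:2]
    return min(9, int(10 * (top - second)))


def accuracy_at_threshold(records, min_ri):
    """Percentage of correct residues among those with index >= min_ri.

    records: list of (predicted_state, observed_state, reliability_index).
    Returns None if no residue passes the threshold.
    """
    kept = [(pred, obs) for pred, obs, ri in records if ri >= min_ri]
    if not kept:
        return None
    return 100.0 * sum(pred == obs for pred, obs in kept) / len(kept)


# Hypothetical per-residue records: (predicted, observed, reliability index).
records = [("H", "H", 8), ("E", "L", 2), ("L", "L", 5), ("E", "E", 1)]
```

Raising `min_ri` keeps fewer residues but, if the index is informative, a larger fraction of them is correct. For single sequences the warning above applies: the same index value would overstate the expected accuracy.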
38Details PHDsec Multiple alignment
- single sequences: accuracy clearly lower
id    nali  Q3sec  Q2acc
AA    KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS   EEEE E E EEEEEE EEEEEE EEEEEEHHHEEEE
30 N  26    70     77     EEEEEEE EEE EEEEE EEEE EE EEE
self  1     63     72     EEEEEEE EEEE EEEEE EEEEEE HHHHH
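The Q3sec column reports three-state per-residue accuracy. A minimal sketch of that measure follows, assuming the usual definition (percentage of residues whose predicted state equals the observed state) and writing "other" as L instead of a blank; the strings are toy data, not the SH3 example from the slide, whose column alignment cannot be recovered from the transcript.

```python
def q3(observed: str, predicted: str) -> float:
    """Percentage of residues predicted in the correct one of three
    states (H = helix, E = strand, L = other)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings differ in length")
    if not observed:
        raise ValueError("empty strings have no defined accuracy")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Toy 12-residue example with two wrong residues (hypothetical strings).
observed = "LEEEELLHHHHL"
predicted = "LEEELLLHHHLL"
score = q3(observed, predicted)
```

Because Q3 counts residues one by one, it says nothing about whether segment lengths or segment boundaries are right.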
41Secondary structure prediction
- Limit of prediction accuracy reached?
- How does it complement other methods?
- Ultimate rôle in structure prediction (1D-3D)?
- Better to use "pure" secondary structure
prediction methods, or to use 3D methods and
read the secondary structure off the 3D model?
- Conversely, are 3D predictors making optimal use
of secondary structure predictions?
- Will secondary structure and 3D prediction merge
completely?
42Secondary structure prediction 2000
- history
- 1st generation: 50-55%
- 2nd generation: 55-62%
- 3rd generation: 1992: 70-72%; 2000: > 76%
- what improves?
- database growth: 3%
- PSI-BLAST: 0.5%
- new training: 1%
- clever method: 1%
- limit?
- max 88% -> 12% to go
- 1/5 of proteins have families of more than 100
proteins -> > 80%
- and from there?
43Prediction of protein secondary structure
- 1980: 55%, simple
- 1990: 60%, less simple
- 1993: 70%, evolution
- 2000: 76%, more evolution
- what is the limit?
- 88% for proteins of similar structure
- 80% for 1/5th of proteins with families > 100
- missing through better definition of secondary
structure, including long-range interactions
- structural switches
- chameleon / folding
44CAFASP statistics
- 29 proteins not similar to known PDB
- T0086, T0087, T0090, T0091, T0092, T0094, T0095,
T0096, T0097, T0098, T0101, T0102, T0104, T0105,
T0106, T0107, T0108, T0109, T0110, T0114, T0115,
T0116, T0117, T0118, T0120, T0124, T0125, T0126,
T0127
- 2 proteins with PSI-BLAST homologue
- T0089, T0103
- 9 proteins with trivial homologue to PDB
- T0099, T0100, T0111, T0112, T0113, T0121, T0122,
T0123, T0128
45CAFASP sec unique
46CAFASP sec homologous
47CAFASP concept
- Targets Non-targets
- comparative modelling: 85% > all current methods
- Never compare methods on different proteins
- Never rank when too few proteins
- (Never show numbers for one protein between
different proteins)
48What is significant
49Rank only if significant
- e.g. M1: 75%, M2: 73%
- say 16 proteins
- rule of thumb: significant = sigma / sqrt(number of
proteins)
- -> 10/4 = 2.5 -> M1 and M2 cannot be
distinguished
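The rule of thumb above can be written out as a short check. This is a sketch only: it takes sigma = 10 and 16 proteins from the slide, while the function names and the strict "greater than" comparison are assumptions made for illustration.

```python
import math


def significant_difference(sigma: float, n_proteins: int) -> float:
    """Rule of thumb from the slide: sigma / sqrt(number of proteins)."""
    return sigma / math.sqrt(n_proteins)


def distinguishable(acc_a: float, acc_b: float,
                    sigma: float, n_proteins: int) -> bool:
    """True if two average accuracies differ by more than the
    rule-of-thumb threshold (assumed comparison)."""
    return abs(acc_a - acc_b) > significant_difference(sigma, n_proteins)


# Values from the slide: sigma = 10, 16 proteins, M1 = 75%, M2 = 73%.
threshold = significant_difference(10.0, 16)
m1_vs_m2 = distinguishable(75.0, 73.0, 10.0, 16)
```

Since the threshold shrinks only with the square root of the number of proteins, halving it requires four times as many proteins.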
50EVA automatic continuous EVAluation of
structure prediction
51EVA automatic continuous EVAluation of
structure prediction
- statistics: 31 weeks -> 1549 new structures,
352 new sequence-unique chains (of 2200)
- categories
- secondary structure prediction (7 methods)
- comparative modelling (4)
- fold recognition (7)
- contact prediction (4)
52EVA secondary structure
- MAJOR lessons from EVA
- no point comparing apples and oranges
- no point comparing < 20 apples
- EVA team
- CUBIC, Columbia: Volker Eyrich, Dariusz
Przybylski, Burkhard Rost
- Rockefeller: Marc Marti-Renom, Andras Fiser,
Andrej Sali
- Madrid: Florencio Pazos, Alfonso Valencia
- URL
- http://cubic.bioc.columbia.edu/eva/
- http://pipe.rockefeller.edu/eva/
- http://montblanc.cnb.uam.es/eva/
53EVA secondary structure
76%
54Accuracy varies for proteins!
55Averaging over many methods not always a good
idea!
56Some proteins predicted better
57Reliability correlates with accuracy!
58Conclusion
- big gain through using evolutionary information
- are we going to reach above 80%? How high?
- continuous secondary structure
- better methods
- other features
- use secondary structure: ASP (Young M,
Kirshenbaum K, Dill KA, Highsmith S: Predicting
conformational switches in proteins. Protein Sci
1999, 8:1752-1764)
59Availability of methods
- email: PredictProtein_at_columbia.edu
- subject: HELP
- file
- WWW: http://cubic.bioc.columbia.edu/predictprotein/
- META: http://cubic.bioc.columbia.edu/predictprotein/submit_meta.html
- EVA: http://cubic.bioc.columbia.edu/eva
- CUBIC: http://cubic.bioc.columbia.edu/
Email address options protein name SEQWENCE