Some gory details of protein secondary structure prediction - PowerPoint PPT Presentation


Slides: 60

Provided by: burk160

Transcript and Presenter's Notes

1
Some gory details of protein secondary structure prediction

Burkhard Rost
CUBIC Columbia University
rost_at_columbia.edu
http://www.columbia.edu/rost
http://cubic.bioc.columbia.edu/

2
(No Transcript)
3
Goal of secondary structure prediction
4
Secondary structure predictions of 1st and 2nd generation

single residues (1st generation)
Chou-Fasman, GOR: 1957-70/80, 50-55% accuracy
segments (2nd generation)
GOR III: 1986-92, 55-60% accuracy
problems
< 100%: they said 65% is the maximum
< 40% for strands: they said strand formation is non-local
short segments
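The accuracies quoted on this slide are three-state per-residue scores (Q3): the percentage of residues whose predicted state (helix, strand, other) matches the observed one. A minimal Python sketch, assuming observed and predicted states come as equal-length strings in which 'H' is helix, 'E' is strand and every other character (including a blank) counts as "other"; the function names are hypothetical.

```python
def reduce_state(c):
    """Map a state character to one of three classes: H (helix), E (strand), L (other)."""
    return c if c in ("H", "E") else "L"


def q3(observed, predicted):
    """Percentage of residues whose predicted three-state class matches the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(
        reduce_state(o) == reduce_state(p) for o, p in zip(observed, predicted)
    )
    return 100.0 * correct / len(observed)


# Toy example (invented strings, not taken from the slides)
obs = "LLEEEELLHHHHHHLL"
pred = "LLEELLLLHHHHLLLL"
print(f"Q3 = {q3(obs, pred):.1f}%")  # Q3 = 75.0%
```

Because blanks map to "other", rows written like the OBS and TYP lines on slide 7 could be scored directly once their columns are aligned.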

5
Helix formation is local
THYROID hormone receptor (2nll)
6
β-sheet formation is NOT local
7
Problems of secondary structure predictions (before 1994)
SEQ KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS EEEE E E E EEEEEE EEEEEE EEEEEEHHHEEEE
TYP EHHHH EE EEEE EE HHHEE EEEHH
8
Simple neural network
9
Training a neural network 1
10
Training a neural network 2
Error: $E = (\text{out}_{\text{net}} - \text{out}_{\text{want}})^2$
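Slides 9 to 14 name the training steps without detail; slide 10 gives the squared error between the network output and the wanted output. A minimal sketch of that idea, assuming a single sigmoid output unit trained by plain per-sample gradient descent on a toy AND problem; the learning rate, epoch count and function names are hypothetical choices, not the settings of any published predictor.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(w, b, x):
    """Output of one sigmoid unit for input vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_unit(samples, epochs=2000, eta=0.5, seed=0):
    """Gradient descent on E = (out_net - out_want)^2 for one sigmoid unit.

    samples: list of (inputs, target) pairs, inputs a list of floats, target in [0, 1].
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in samples[0][0]]
    b = 0.0
    for _ in range(epochs):
        for x, want in samples:
            out = predict(w, b, x)
            # dE/dnet = 2 (out - want) * out * (1 - out), chain rule through the sigmoid
            delta = 2.0 * (out - want) * out * (1.0 - out)
            w = [wi - eta * delta * xi for wi, xi in zip(w, x)]
            b -= eta * delta
    return w, b


# Logical AND: linearly separable, so a single unit without a hidden layer can learn it
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
w, b = train_unit(data)
for x, want in data:
    print(x, want, round(predict(w, b, x), 2))
```

Problems that are not linearly separable need the hidden layer introduced on slide 14.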
11
Training a neural network 3
12
Training a neural network 4
13
Neural networks classify points
14
Simple neural network with hidden layer
15
Neural Network for secondary structure
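The transcript does not say how residues are presented to the network. A common design for this kind of predictor, sketched here under stated assumptions, slides a window along the sequence and one-hot encodes each position, and the network predicts the state of the central residue. The window length of 13, the 21 units per position (20 amino acids plus one "outside the chain" unit) and the name `encode_window` are illustrative assumptions, not taken from the slides.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNITS_PER_POSITION = len(AMINO_ACIDS) + 1  # 20 amino acids + 1 "outside the chain" unit


def encode_window(sequence, center, window=13):
    """One-hot encode a window of residues centred on position `center`.

    Positions before the N-terminus or after the C-terminus switch on the extra
    'outside' unit; letters that are not standard amino acids leave all units at 0.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        units = [0.0] * UNITS_PER_POSITION
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                units[idx] = 1.0
        else:
            units[-1] = 1.0
        vector.extend(units)
    return vector


# Sequence from slide 7; encode the window around the first residue
seq = "KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD"
x = encode_window(seq, center=0)
print(len(x))  # 273 = 13 positions * 21 units
print(sum(x))  # 13.0: 6 'outside' units + 7 residues
```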
16
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
Balanced training
normal training
balanced training
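The transcript keeps only the two labels. One common reading, assumed here rather than taken from the slides, is that normal training presents residues in their natural proportions (so the rarer strand class is under-represented), whereas balanced training presents each state equally often. A minimal sketch that balances by oversampling the rarer classes with replacement; the function name and the oversampling strategy are illustrative assumptions.

```python
import random
from collections import defaultdict


def balanced_epoch(samples, seed=None):
    """Return one training epoch in which every class is presented equally often.

    samples: list of (features, label) pairs, label e.g. 'H', 'E' or 'L'.
    Smaller classes are topped up by drawing with replacement until they match
    the largest class; the combined list is shuffled.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))
    target = max(len(items) for items in by_class.values())
    epoch = []
    for items in by_class.values():
        epoch.extend(items)
        epoch.extend(rng.choice(items) for _ in range(target - len(items)))
    rng.shuffle(epoch)
    return epoch


# Toy data: strands are the rarest class
data = [(i, "L") for i in range(50)] + [(i, "H") for i in range(30)] + [(i, "E") for i in range(20)]
epoch = balanced_epoch(data, seed=1)
counts = {c: sum(1 for _, lab in epoch if lab == c) for c in "HEL"}
print(counts)  # {'H': 50, 'E': 50, 'L': 50}
```

Oversampling keeps every original example; undersampling the common classes instead would shorten each epoch at the cost of discarding data.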
21
(No Transcript)
22
PHDsec structure-to-structure network
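A sketch of how the input of such a second-level network could be assembled, assuming it sees the first network's three outputs (helix, strand, other) for every residue in a window around the central residue. The window length of 17, the extra padding unit and the function name are assumptions for illustration. Seeing the neighbouring predictions is one way a second level can learn, for example, that a one-residue helix inside a strand is implausible.

```python
def structure_window(first_level, center, window=17):
    """Input vector for a structure-to-structure network.

    first_level: one (helix, strand, other) output triple per residue from the
    first network. Each window position contributes four values: the three
    outputs plus a flag that is 1.0 only for positions outside the chain.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(first_level):
            vector.extend(first_level[pos])
            vector.append(0.0)
        else:
            vector.extend((0.0, 0.0, 0.0, 1.0))
    return vector


first = [(0.8, 0.1, 0.1), (0.2, 0.7, 0.1), (0.1, 0.1, 0.8)]
print(structure_window(first, center=1, window=3))
# [0.8, 0.1, 0.1, 0.0, 0.2, 0.7, 0.1, 0.0, 0.1, 0.1, 0.8, 0.0]
```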
23
Better prediction of segment lengths
24
Evolution has it!
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
Spectrin SH3 (Src homology 3) domain
31
Prediction accuracy varies!
32
Why so bad?
33
Stronger predictions more accurate!
34
Correct prediction of correctly predicted residues
35
BAD errors are frequent!
36
False prediction for engineered proteins!
37
PHDsec: the un-g(l)ory details

average accuracy > 72% (helix, strand, other)
72% is an average over a distribution (±10%)
stronger predictions are more accurate
WARNING: reliability index almost a factor of 2 too large for single sequences
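"Stronger predictions" are usually measured by how clearly one output unit wins over the others. A minimal sketch of such a reliability index, assuming it is 10 times the gap between the highest and the second-highest of the three outputs, truncated to an integer 0 to 9; that exact scaling is an assumption here, and both function names are hypothetical.

```python
def reliability_index(outputs):
    """Reliability index (0-9) from the three output units (helix, strand, other).

    Assumed definition: 10 * (highest output - second-highest output),
    truncated to an integer and capped at 9.
    """
    top, second = sorted(outputs, reverse=True)[:2]
    return min(9, int(10 * (top - second)))


def accuracy_at_reliability(observed, predicted, ri, threshold):
    """Accuracy (%) and count for residues whose reliability index is >= threshold."""
    kept = [(o, p) for o, p, r in zip(observed, predicted, ri) if r >= threshold]
    if not kept:
        return None, 0
    correct = sum(o == p for o, p in kept)
    return 100.0 * correct / len(kept), len(kept)


print(reliability_index((0.62, 0.30, 0.08)))  # 3: fairly ambiguous call
print(reliability_index((0.95, 0.03, 0.02)))  # 9: very confident call
```

Scoring with `accuracy_at_reliability` at increasing thresholds is one way to check the claim that stronger predictions are more accurate; per the warning above, thresholds calibrated on alignment-based predictions should not be carried over to single sequences.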

38
Details PHDsec Multiple alignment

single sequences → accuracy clearly lower

id    nali  Q3sec  Q2acc  AA
                          KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
                          OBS EEEE E E EEEEEE EEEEEE EEEEEEHHHEEEE
30 N  26    70     77     EEEEEEE EEE EEEEE EEEE EE EEE
self  1     63     72     EEEEEEE EEEE EEEEE EEEEEE HHHHH
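The slide contrasts a prediction made from a multiple alignment with one made from the single sequence alone ("self"). With an alignment, each window position can be encoded as an amino-acid frequency profile instead of a one-hot vector. A minimal sketch, assuming unweighted counts over aligned sequences with gaps written as '-'; any sequence weighting or gap handling used by PHDsec is not shown on the slide, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_column(alignment, pos):
    """Relative frequencies of the 20 amino acids in one alignment column.

    alignment: equal-length aligned sequences, gaps written as '-'.
    Gaps and non-standard letters are skipped; a column of only gaps gives all zeros.
    """
    counts = [0] * len(AMINO_ACIDS)
    for seq in alignment:
        idx = AMINO_ACIDS.find(seq[pos])
        if idx >= 0:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


# Toy alignment (invented): the first row is the query, the others hypothetical homologues
aln = ["KELVLA", "KEIVLA", "REVVA-"]
col = profile_column(aln, 2)
print({aa: round(f, 2) for aa, f in zip(AMINO_ACIDS, col) if f})  # {'I': 0.33, 'L': 0.33, 'V': 0.33}
```

With a single sequence (nali = 1) every column collapses back to a one-hot vector, so the network receives no information about which substitutions the family tolerates.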
39
40
Details PHDsec Multiple alignment

single sequences → accuracy clearly lower

Limit of prediction accuracy reached?
How complementing other methods?
Ultimate rôle in structure prediction (1D-3D)?
Better to use "pure" secondary structure
prediction methods, or to use 3D methods and
read the secondary structure off the 3D model?
Conversely, are 3D predictors making optimal use
of secondary structure predictions?
Will secondary structure and 3D prediction merge
completely?

42
Secondary structure prediction 2000

history
1st generation: 50-55%
2nd generation: 55-62%
3rd generation: 1992 70-72%, 2000 > 76%
what improves?
database growth: 3%
PSI-BLAST: 0.5%
new training: 1%
clever method: 1%
limit?
max 88% → 12% to go
1/5 of proteins (those with more than 100 proteins in their family) → > 80%
and from there?
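Reading each "what improves?" entry as percentage points of three-state accuracy (an assumption; the slide gives bare numbers), the gains roughly account for the move from the 1992 range to the 2000 figure, and "12 to go" is the distance from 76 to the 88 limit:

```latex
\begin{aligned}
3 + 0.5 + 1 + 1 &= 5.5 \\
[70, 72] + 5.5 &= [75.5, 77.5] \ni 76 \\
88 - 76 &= 12
\end{aligned}
```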

43
Prediction of protein secondary structure

1980: 55% (simple)
1990: 60% (less simple)
1993: 70% (evolution)
2000: 76% (more evolution)
what is the limit?
88% for proteins of similar structure
80% for 1/5th of proteins with families > 100
missing through better definition of secondary structure including long-range interactions
structural switches
chameleon / folding

44
CAFASP statistics

29 proteins not similar to known PDB
T0086, T0087, T0090, T0091, T0092, T0094, T0095, T0096, T0097, T0098, T0101, T0102, T0104, T0105, T0106, T0107, T0108, T0109, T0110, T0114, T0115, T0116, T0117, T0118, T0120, T0124, T0125, T0126, T0127
2 proteins with PSI-BLAST homologue
T0089, T0103
9 proteins with trivial homologue to PDB
T0099, T0100, T0111, T0112, T0113, T0121, T0122, T0123, T0128

45
CAFASP sec unique
46
CAFASP sec homologous
47
CAFASP concept

Targets Non-targets
comparative modelling: 85% > all current methods
Never compare methods on different proteins
Never rank when too few proteins
(Never show numbers for one protein between
different proteins)

48
What is significant
49
Rank only if significant

e.g. M1 75%, M2 73%
say 16 proteins
rule of thumb: significant = $\sigma / \sqrt{N}$ (N = number of proteins)
→ $10 / \sqrt{16} = 10/4 = 2.5$ → M1 and M2 (difference 2) cannot be distinguished
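The rule of thumb can be written as a two-line check. A minimal sketch, assuming σ is the per-protein standard deviation of accuracy (the ±10 quoted for PHDsec) and that "significant" means the difference strictly exceeds the threshold; the function names are hypothetical.

```python
import math


def significance_threshold(sigma, n_proteins):
    """Rule of thumb from the slide: differences below sigma / sqrt(N) are not significant."""
    return sigma / math.sqrt(n_proteins)


def distinguishable(score_a, score_b, sigma, n_proteins):
    """True if two average accuracies differ by more than the rule-of-thumb threshold."""
    return abs(score_a - score_b) > significance_threshold(sigma, n_proteins)


print(significance_threshold(10.0, 16))       # 2.5
print(distinguishable(75.0, 73.0, 10.0, 16))  # False: 2 points < 2.5
```

Because the threshold shrinks only with the square root of N, halving it to 1.25 points would take four times as many proteins (64).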

50
EVA automatic continuous EVAluation of
structure prediction
51
EVA automatic continuous EVAluation of
structure prediction

statistics: 31 weeks → 1549 new structures
352 new sequence-unique chains (of 2200)
categories
secondary structure prediction (7 methods)
comparative modelling (4)
fold recognition (7)
contact prediction (4)

52
EVA secondary structure

MAJOR lessons from EVA
no point comparing apples and oranges
no point comparing < 20 apples
EVA team
CUBIC, Columbia: Volker Eyrich, Dariusz Przybylski, Burkhard Rost
Rockefeller: Marc Marti-Renom, Andras Fiser, Andrej Sali
Madrid: Florencio Pazos, Alfonso Valencia
URL
http://cubic.bioc.columbia.edu/eva/
http://pipe.rockefeller.edu/eva/
http://montblanc.cnb.uam.es/eva/

53
EVA secondary structure
76
54
Accuracy varies for proteins!
55
Averaging over many methods not always a good idea!
56
Some proteins predicted better
57
Reliability correlates with accuracy!
58
Conclusion

big gain through using evolutionary information
are we going to reach above 80%? How high?
continuous secondary structure
better methods
other features
use secondary structure: ASP (Young M, Kirshenbaum K, Dill KA, Highsmith S. Predicting conformational switches in proteins. Protein Sci 1999, 8:1752-1764.)

59
Availability of methods