Title: Some gory details of protein secondary structure prediction


1
Some gory details of protein secondary structure
prediction
  • Burkhard Rost
  • CUBIC, Columbia University
  • rost@columbia.edu
  • http://www.columbia.edu/rost
  • http://cubic.bioc.columbia.edu/

3
Goal of secondary structure prediction
4
Secondary structure predictions of the 1st and 2nd
generation
  • single residues (1st generation)
  • Chou-Fasman, GOR: 1957-70/80, 50-55% accuracy
  • segments (2nd generation)
  • GORIII: 1986-92, 55-60% accuracy
  • problems
  • < 100%: they said 65% max
  • < 40%: they said strand is non-local
  • short segments

5
Helix formation is local
THYROID hormone receptor (2nll)
6
b-sheet formation is NOT local
7
Problems of secondary structure predictions (before 1994)
SEQ KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS EEEE E E E EEEEEE EEEEEE EEEEEEHHHEEEE
TYP EHHHH EE EEEE EE HHHEE EEEHH
8
Simple neural network
9
Training a neural network 1
10
Training a neural network 2
Errare = (out_net - out_want)^2
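The error on this slide is the squared difference between the network output and the desired output. As a minimal sketch of how such an error can drive training, the following updates a single sigmoid unit by gradient descent. The learning rate, the toy input, and all function names are assumptions for illustration and are not taken from the slides.

```python
import math

def sigmoid(x):
    """Logistic activation, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def squared_error(out_net, out_want):
    """Error = (out_net - out_want)^2 for one output unit."""
    return (out_net - out_want) ** 2

def train_step(weights, bias, inputs, out_want, rate=0.5):
    """One gradient-descent update for a single sigmoid unit.

    Returns the new weights, the new bias, and the error measured
    before the update.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    out_net = sigmoid(z)
    # dE/dz = 2 * (out_net - out_want) * sigmoid'(z)
    delta = 2.0 * (out_net - out_want) * out_net * (1.0 - out_net)
    new_weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - rate * delta
    return new_weights, new_bias, squared_error(out_net, out_want)
```

Calling `train_step` repeatedly on the same example should lower the error step by step, which is the behaviour the training slides illustrate.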
11
Training a neural network 3
12
Training a neural network 4
13
Neural networks classify points
14
Simple neural network with hidden layer
15
Neural Network for secondary structure
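A common way to feed a sequence to such a network is a sliding window: the residues around a central position are one-hot encoded and concatenated into one input vector. The sketch below assumes a window of 13 residues and 21 units per position (20 amino acids plus one spacer unit for positions beyond the chain ends); these numbers and the function name are illustrative assumptions, not values stated on the slide.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter code

def encode_window(sequence, center, window=13):
    """One-hot encode the residues in a window centred on `center`.

    Each window position contributes 21 units: 20 for the amino acids
    and a final spacer unit that is set when the position falls outside
    the sequence. Unknown letters leave all 21 units at zero.
    """
    half = window // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        units = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                units[idx] = 1
        else:
            units[-1] = 1  # spacer: window extends past the chain end
        vector.extend(units)
    return vector
```

For the first residue of a chain, the six window positions to its left are spacers, so the network still receives a vector of the same length (13 x 21 = 273 units under these assumptions).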
20
Balanced training
normal training
balanced training
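One reading of balanced training is that helix, strand, and other examples are presented to the network equally often, instead of in proportion to how often they occur in the database. The sketch below draws one example per class for every batch; the function name, the batch layout, and the use of random sampling are assumptions for illustration.

```python
import random

def balanced_batches(examples, n_batches, seed=0):
    """Build batches that contain exactly one example per class.

    `examples` is a list of (features, label) pairs. Rare classes are
    sampled with replacement, so every class is seen equally often.
    """
    rng = random.Random(seed)
    by_class = {}
    for features, label in examples:
        by_class.setdefault(label, []).append((features, label))
    batches = []
    for _ in range(n_batches):
        batches.append([rng.choice(by_class[label]) for label in sorted(by_class)])
    return batches
```

With normal training the network sees mostly the majority class; with this kind of sampling the under-represented class (typically strand) gets the same number of presentations as the others.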
22
PHDsec: structure-to-structure network
23
Better prediction of segment lengths
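To check whether predicted segments have realistic lengths, one can compare the run lengths of each state in the observed and predicted strings. The following is a minimal sketch; the three-letter alphabet (H, E, L) and the toy strings in the usage note are assumptions, not data from the slides.

```python
from itertools import groupby

def segment_lengths(states, state):
    """Lengths of all consecutive runs of `state` in a structure string."""
    return [len(list(run)) for key, run in groupby(states) if key == state]

def mean_segment_length(states, state):
    """Average run length of `state`, or 0.0 if the state never occurs."""
    lengths = segment_lengths(states, state)
    return sum(lengths) / len(lengths) if lengths else 0.0
```

For a hypothetical observed string `LLEEEEEELLLHHHHHHHHLLEEEEL` and a fragmented prediction `LLEELEEELLLHHHLHHHHLLEEEEL`, the mean strand length drops from 5.0 to 3.0, which is the kind of over-fragmentation a second-level network is meant to reduce.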
24
Evolution has it!
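Evolutionary information is usually passed to a predictor as a profile: for each alignment column, the frequency of every residue seen in the protein family. The sketch below computes such a profile from an already aligned set of sequences. The gap character `-`, the function name, and the toy alignment in the usage note are assumptions for illustration.

```python
from collections import Counter

def alignment_profile(aligned_sequences):
    """Per-column residue frequencies of a multiple alignment.

    Gaps ('-') are ignored. Returns one dict per column that maps
    residue -> frequency; a column with only gaps yields an empty dict.
    """
    length = len(aligned_sequences[0])
    if any(len(s) != length for s in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*aligned_sequences):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile
```

For the hypothetical alignment `["KELV", "KDLV", "RELI", "K-LV"]`, the third column is fully conserved (`{"L": 1.0}`) while the second shows an E/D exchange, which is the kind of pattern a single sequence cannot reveal.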
30
Spectrin SH3 domain (Src homology 3)
31
Prediction accuracy varies!
32
Why so bad?
33
Stronger predictions more accurate!
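The strength of a prediction can be expressed as a reliability index. A common definition, assumed here, takes the difference between the two highest output units and scales it to an integer from 0 to 9. The sketch also shows how accuracy and coverage can be tabulated above a reliability threshold; the function names and the data layout are illustrative assumptions.

```python
def reliability_index(outputs):
    """Integer 0-9 from the gap between the two largest output values.

    Assumes the outputs lie between 0 and 1, one value per state.
    """
    ranked = sorted(outputs, reverse=True)
    return min(9, int(10 * (ranked[0] - ranked[1])))

def accuracy_above(predictions, threshold):
    """Accuracy and coverage for residues with index >= threshold.

    `predictions` is a list of (predicted, observed, index) triples.
    Returns (fraction correct among kept residues, fraction kept).
    """
    kept = [(p, o) for p, o, ri in predictions if ri >= threshold]
    if not kept:
        return 0.0, 0.0
    correct = sum(p == o for p, o in kept)
    return correct / len(kept), len(kept) / len(predictions)
```

Raising the threshold trades coverage for accuracy: fewer residues are kept, but a larger share of them is predicted correctly.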
34
Correct prediction of correctly predicted residues
35
BAD errors are frequent!
36
False prediction for engineered proteins!
37
PHDsec: the un-g(l)ory details
  • average accuracy > 72% (helix, strand, other)
  • 72% is the average over a distribution (sigma = 10%)
  • stronger predictions more accurate
  • WARNING: reliability index is almost a factor of 2
    too large for single sequences
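The three-state accuracy quoted here (often written Q3) is the percentage of residues whose predicted state matches the observed one. Because it is averaged per protein, the spread across proteins matters as much as the mean. The sketch below computes both; the function names and the toy numbers in the usage note are assumptions.

```python
import statistics

def q3(predicted, observed):
    """Percentage of residues predicted in the correct state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty structure string")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

def summarize(per_protein_q3):
    """Mean and sample standard deviation of per-protein Q3 values.

    Needs at least two proteins, otherwise the deviation is undefined.
    """
    return statistics.mean(per_protein_q3), statistics.stdev(per_protein_q3)
```

For three hypothetical proteins with Q3 of 60, 70, and 80, the mean is 70 and the standard deviation is 10, so a single protein can easily land far from the average.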

38
Details PHDsec: multiple alignment
  • single sequences => accuracy clearly lower

id    nali  Q3sec  Q2acc
30 N  26    70     77
self  1     63     72

AA    KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS   EEEE E E EEEEEE EEEEEE EEEEEEHHHEEEE
30 N  EEEEEEE EEE EEEEE EEEE EE EEE
self  EEEEEEE EEEE EEEEE EEEEEE HHHHH
41
Secondary structure prediction
  • Limit of prediction accuracy reached?
  • How does it complement other methods?
  • Ultimate rôle in structure prediction (1D-3D)?
  • Better to use "pure" secondary structure
    prediction methods, or to use 3D methods and
    read the secondary structure off the 3D model?
  • Conversely, are 3D predictors making optimal use
    of secondary structure predictions?
  • Will secondary structure and 3D prediction merge
    completely?

42
Secondary structure prediction 2000
  • history
  • 1st generation: 50-55%
  • 2nd generation: 55-62%
  • 3rd generation: 1992 70-72%; 2000 > 76%
  • what improves?
  • database growth: 3%
  • PSI-BLAST: 0.5%
  • new training: 1%
  • clever method: 1%
  • limit?
  • max 88% -> 12% to go
  • 1/5 of proteins have families with more than 100
    proteins -> > 80%
  • and from there?

43
Prediction of protein secondary structure
  • 1980: 55%, simple
  • 1990: 60%, less simple
  • 1993: 70%, evolution
  • 2000: 76%, more evolution
  • what is the limit?
  • 88% for proteins of similar structure
  • 80% for 1/5th of proteins, those with families > 100
  • missing through better definition of secondary
    structure including long-range interactions
  • structural switches
  • chameleon / folding

44
CAFASP statistics
  • 29 proteins not similar to any known PDB structure
  • T0086, T0087, T0090, T0091, T0092, T0094, T0095, T0096,
    T0097, T0098, T0101, T0102, T0104, T0105, T0106, T0107,
    T0108, T0109, T0110, T0114, T0115, T0116, T0117, T0118,
    T0120, T0124, T0125, T0126, T0127
  • 2 proteins with PSI-BLAST homologue
  • T0089, T0103
  • 9 proteins with trivial homologue to PDB
  • T0099, T0100, T0111, T0112, T0113, T0121, T0122, T0123,
    T0128

45
CAFASP sec unique
46
CAFASP sec homologous
47
CAFASP concept
  • Targets Non-targets
  • comparative modelling: 85% > all current methods
  • Never compare methods on different proteins
  • Never rank when too few proteins
  • (Never show numbers for one protein between
    different proteins)

48
What is significant
49
Rank only if significant
  • e.g. M1: 75%, M2: 73%
  • say 16 proteins
  • rule of thumb: significant = sigma / sqrt(number of
    proteins)
  • -> 10/4 = 2.5 -> M1 and M2 cannot be
    distinguished
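The rule of thumb on this slide can be written out as a short calculation: the difference between two methods counts only if it exceeds sigma divided by the square root of the number of proteins. The sketch uses the slide's own numbers (sigma of 10 percentage points, 16 proteins, methods at 75 and 73); the function names are assumptions.

```python
import math

def standard_error(sigma, n_proteins):
    """Rule-of-thumb significance threshold: sigma / sqrt(N)."""
    return sigma / math.sqrt(n_proteins)

def distinguishable(score_a, score_b, sigma, n_proteins):
    """True if the two mean scores differ by more than the threshold."""
    return abs(score_a - score_b) > standard_error(sigma, n_proteins)
```

With 16 proteins the threshold is 10 / 4 = 2.5, so a gap of 2 points between M1 and M2 is not significant. With 100 proteins the threshold would drop to 1.0 and the same gap would count.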

50
EVA automatic continuous EVAluation of
structure prediction
51
EVA automatic continuous EVAluation of
structure prediction
  • statistics: 31 weeks -> 1549 new structures,
    352 new sequence-unique chains (of 2200)
  • categories
  • secondary structure prediction (7 methods)
  • comparative modelling (4)
  • fold recognition (7)
  • contact prediction (4)

52
EVA secondary structure
  • MAJOR lessons from EVA
  • no point comparing apples and oranges
  • no point comparing < 20 apples
  • EVA team
  • CUBIC, Columbia: Volker Eyrich, Dariusz
    Przybylski, Burkhard Rost
  • Rockefeller: Marc Marti-Renom, Andras Fiser,
    Andrej Sali
  • Madrid: Florencio Pazos, Alfonso Valencia
  • URL
  • http://cubic.bioc.columbia.edu/eva/
  • http://pipe.rockefeller.edu/eva/
  • http://montblanc.cnb.uam.es/eva/

53
EVA secondary structure
76
54
Accuracy varies for proteins!
55
Averaging over many methods not always a good
idea!
56
Some proteins predicted better
57
Reliability correlates with accuracy!
58
Conclusion
  • big gain through using evolutionary information
  • are we going to reach above 80%? How high?
  • continuous secondary structure
  • better methods
  • other features
  • use secondary structure: ASP (Young M,
    Kirshenbaum K, Dill KA, Highsmith S: Predicting
    conformational switches in proteins. Protein Sci
    1999, 8:1752-1764)

59
Availability of methods
  • email: PredictProtein@columbia.edu
  • subject: HELP
  • file
  • WWW: http://cubic.bioc.columbia.edu/predictprotein/
  • META: http://cubic.bioc.columbia.edu/predictprotein/submit_meta.html
  • EVA: http://cubic.bioc.columbia.edu/eva
  • CUBIC: http://cubic.bioc.columbia.edu/

File contents: email address, options, protein name, sequence