Protein secondary structure predictions By: Refael Vivanti

1
Protein secondary structure predictions
By Refael Vivanti and Tal Tabakman
2
Rising accuracy of protein secondary structure prediction
Burkhard Rost
3
Main dogma in biology
  • DNA: AGCTCTCTGAGGCTT
  • RNA: UCGAGAGACUCCGAA
  • Strand of amino acids: AGHTY
  • Protein: ?
4
Sequence-structure gap
  • Today we have many more sequenced proteins than
    solved protein structures.
  • The gap is rapidly increasing.

Problem: finding a protein's structure isn't that
simple.
Solution: a good start is to find its secondary
structure.
5
  • Comparing methods requires common terms
    and tests.
  • Secondary structure types:

H - helix
E - β strand
L/C - other
seq:  A A P P L L L L M M M G I M M R R I M E E E E E
pred: C C C C H H H H C C C E E E
6
How to evaluate a prediction?
The Q3 test:
Q3 = (correctly predicted residues / number of residues) × 100
Of course, all methods should be tested on the
same proteins.
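The Q3 ratio can be sketched in a few lines of Python. The function name and the two example strings are made up for illustration; only the ratio itself comes from the slide.

```python
def q3_score(observed: str, predicted: str) -> float:
    """Return Q3: the percentage of residues whose predicted state matches the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    return 100.0 * correct / len(observed)


# Hypothetical three-state strings (H = helix, E = strand, C = other).
observed = "CCHHHHCCEE"
predicted = "CCHHHCCCEC"
print(q3_score(observed, predicted))  # 8 of the 10 positions agree
```

Because the score is a plain per-residue ratio, it says nothing about whether the predicted segments have sensible lengths.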
7
Old methods
  • First generation: single-residue statistics
  • Chou & Fasman (1974)
  • Some residues have a particular secondary
    structure preference.
  • Examples: Glu → α-helix, Val → β-strand
  • Second generation: segment statistics
  • Similar, but also considering adjacent
    residues.
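A minimal sketch of single-residue statistics, assuming the preference of a residue for a state is measured as P(state | residue) / P(state). The toy sequences and structures below are invented and are not real propensity data.

```python
from collections import Counter


def propensities(sequences, structures, state):
    """Propensity of each residue type for `state`: P(state | residue) / P(state)."""
    residue_total = Counter()
    residue_in_state = Counter()
    for seq, ss in zip(sequences, structures):
        for aa, s in zip(seq, ss):
            residue_total[aa] += 1
            if s == state:
                residue_in_state[aa] += 1
    n_total = sum(residue_total.values())
    n_state = sum(residue_in_state.values())
    if n_state == 0:
        raise ValueError("state never occurs in the data")
    background = n_state / n_total
    return {aa: (residue_in_state[aa] / residue_total[aa]) / background
            for aa in residue_total}


# Invented toy data: Glu (E) always in helices, Val (V) always in strands.
sequences = ["EEAVVG", "EAEVGV"]
structures = ["HHHEEC", "HHHECE"]
helix = propensities(sequences, structures, "H")
strand = propensities(sequences, structures, "E")
print(helix["E"], strand["V"])
```

A value above 1 means the residue occurs in that state more often than the average residue does.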

8
Difficulties
Poor accuracy: below 66% (Q3 results).
Q3 of strands (E): 28% - 48%.
Predicted structure segments were too short.
9
Methods accuracy comparison
10
3rd generation methods
  • Third generation methods reached 77% accuracy.
  • They rest on two new ideas:
  • 1. A biological idea:
    using evolutionary information.
  • 2. A technological idea:
    using neural networks.

11
How can evolutionary information help us?
Homologues have similar structures,
but their sequences can differ by up to 85%.
How a sequence varies depends on the
structure.
12
How can evolutionary information help us?
Where can we find high sequence conservation?
Some examples:
In defined secondary structures.
In protein core segments (more hydrophobic).
In amphipathic helices (a periodic pattern of hydrophobic and
hydrophilic residues).
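One way to quantify conservation per alignment column is Shannon entropy (0 bits means fully conserved). This measure is an assumption chosen for illustration, since the slides do not name one, and the four-sequence alignment is invented.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conservation_profile(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(row[i] for row in alignment))
            for i in range(length)]


# Invented alignment: column 1 is fully conserved, column 2 is fully variable.
alignment = ["LKVA", "LRIA", "LEVG", "LQLA"]
profile = conservation_profile(alignment)
print(profile)
```

Low-entropy columns are the conserved positions that the slide associates with cores and defined secondary structures.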
13
How can evolutionary information help us?
  • Predictions based on multiple alignments were
    made manually.
  • Problem:
    there isn't any well-defined algorithm!
  • Solution:
    use neural networks.

14
Artificial Neural Networks
An attempt to imitate the construction of the
human brain (assuming this is the way it works).
When do we use it?
When we can't solve the problem ourselves!
15
Artificial Neural Network
  • The basic structure of a neural network:
  • A large number of processors
    (neurons).
  • Highly connected.
  • Working together.

16
Artificial Neural Network
What does a neuron do?
  • Receives signals from its neighbours.
  • Each signal has a different weight.
  • When the weighted input reaches a certain
    threshold, it sends a signal.
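The neuron described above can be sketched as a weighted sum compared against a threshold. The function name, weights and threshold are arbitrary illustrative values.

```python
def neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# Illustrative values: three neighbours, the second one is silent.
signals = [1, 0, 1]
weights = [0.6, 0.9, 0.5]
print(neuron(signals, weights, threshold=1.0))  # weighted sum is 1.1
```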

17
Artificial Neural Network
General structure of an ANN
  • One input layer.
  • Some hidden layers.
  • One output layer.
  • Our ANNs have one-directional flow!

18
Artificial Neural Network
  • A neuron may be
  • Because this is a complete system, a neural
    network can compute anything.
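The list that followed "A neuron may be" did not survive in the transcript. One common reading of "complete system", taken here as an assumption, is that threshold neurons can act as logic gates, and that gates can be combined into any Boolean function. A sketch with hand-picked weights:

```python
def fires(inputs, weights, threshold):
    """Threshold neuron: 1 if the weighted sum reaches the threshold, else 0."""
    return int(sum(x * w for x, w in zip(inputs, weights)) >= threshold)


# Single neurons acting as gates (weights and thresholds chosen by hand).
def and_gate(a, b):
    return fires([a, b], [1, 1], 2)


def or_gate(a, b):
    return fires([a, b], [1, 1], 1)


def not_gate(a):
    return fires([a], [-1], 0)


def xor_gate(a, b):
    """XOR needs more than one neuron: combine OR with NOT(AND)."""
    return and_gate(or_gate(a, b), not_gate(and_gate(a, b)))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_gate(a, b))
```

XOR is the standard example of a function that no single threshold neuron can compute but a small network of them can.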

19
Artificial Neural Network
Network training and testing
[Diagram labels: training set, test set, neural network, correct, incorrect, back-propagation]
  • Training set - inputs for which we know the
    desired output.
  • Back-propagation - an algorithm for changing
    the strength of the neurons' signals
    (the connection weights).
  • Test set - inputs used for the final test of
    the network's performance.
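A minimal back-propagation sketch in plain Python. It assumes one hidden layer, sigmoid neurons and squared error; the network size, learning rate and toy task (logical OR) are illustrative choices, not the settings of any published method.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNetwork:
    """One hidden layer and one output neuron, all sigmoid."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each neuron stores its input weights followed by a bias term.
        self.hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                       for _ in range(n_hidden)]
        self.output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(ws, x)) + ws[-1])
             for ws in self.hidden]
        y = sigmoid(sum(w * v for w, v in zip(self.output, h)) + self.output[-1])
        return h, y

    def train_step(self, x, target, rate=0.5):
        h, y = self.forward(x)
        # Error signal at the output (squared error, sigmoid derivative).
        delta_out = (y - target) * y * (1 - y)
        # Propagate the error back to the hidden neurons before updating weights.
        delta_hidden = [delta_out * self.output[j] * h[j] * (1 - h[j])
                        for j in range(len(h))]
        for j in range(len(h)):
            self.output[j] -= rate * delta_out * h[j]
        self.output[-1] -= rate * delta_out
        for j, ws in enumerate(self.hidden):
            for i in range(len(x)):
                ws[i] -= rate * delta_hidden[j] * x[i]
            ws[-1] -= rate * delta_hidden[j]

    def loss(self, data):
        """Mean squared error over (input, target) pairs."""
        return sum((self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Toy training set: logical OR.
training_set = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
net = TinyNetwork(n_inputs=2, n_hidden=2)
before = net.loss(training_set)
for _ in range(2000):
    for x, t in training_set:
        net.train_step(x, t)
after = net.loss(training_set)
print(before, after)
```

In a real experiment, `loss` (or Q3) would also be computed on a separate test set that `train_step` never sees.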

20
Artificial Neural Network
  • The network is a black box:
  • Even when it succeeds,
    it's hard to understand
    how.
  • It's difficult to extract
    an algorithm from the network.
  • It's hard to deduce
    new scientific principles.

21
Structure of 3rd generation methods
Find homologues using large databases.
Create a profile representing the entire protein
family.
Give the sequence and the profile to the ANN.
Output of the ANN: secondary structure prediction.
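A profile for a protein family can be sketched as per-column amino acid frequencies over an alignment of the homologues. This is a simplified stand-in: real profiles usually add sequence weighting and pseudocounts, and the toy alignment is invented.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment):
    """One row per alignment column: the frequency of each of the 20 amino acids.

    Gap characters and any other non-standard symbols are ignored.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(row[i] for row in alignment if row[i] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append([counts[aa] / total if total else 0.0
                        for aa in AMINO_ACIDS])
    return profile


# Invented alignment of four homologues ("-" marks a gap).
alignment = ["ACD", "ACE", "GCD", "A-D"]
profile = build_profile(alignment)
print(profile[0][AMINO_ACIDS.index("A")])  # frequency of Ala in column 1
```

Each residue position is then described by 20 numbers instead of one letter, which is what lets the network see the family's variation.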
22
Structure of 3rd generation methods
  • The ANN learning process
  • Training and testing sets:
    proteins with known sequence and structure.

Training:
  • Insert the training set into the ANN as input.
  • Compare the output to the known structure.
  • Back-propagation.
23
3rd generation methods - difficulties
Main problem: unwise selection of training and
test sets for the ANN.
  • First problem: unbalanced training
  • Overall protein composition:
  • Helices - 32%
  • Strands - 21%
  • Coils - 47%

What will happen if we train the ANN with random
segments?
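With the composition above, a predictor that always answers "coil" already scores 47% in Q3, so random segments push the network toward coil. One simple way to balance the training examples, shown here as an assumption rather than any method's actual procedure, is to sample the same number of examples from each state.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(examples, seed=0):
    """Given (window, state) pairs, return equally many examples of each state."""
    by_state = defaultdict(list)
    for window, state in examples:
        by_state[state].append((window, state))
    smallest = min(len(group) for group in by_state.values())
    rng = random.Random(seed)
    sample = []
    for group in by_state.values():
        sample.extend(rng.sample(group, smallest))
    rng.shuffle(sample)
    return sample


# Hypothetical pool of 100 examples with the composition quoted above.
examples = ([(f"helix_{i}", "H") for i in range(32)]
            + [(f"strand_{i}", "E") for i in range(21)]
            + [(f"coil_{i}", "C") for i in range(47)])
balanced = balanced_sample(examples)
print(Counter(state for _, state in balanced))
```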
24
3rd generation methods - difficulties
  • Second problem: unwise separation between
    training and test proteins

What will happen if homology / correlation exists
between test and training proteins?
Over-optimism!
Above 80% accuracy in testing.
  • Third problem: similarity between test
    proteins.
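A sketch of keeping only test proteins that are not similar to any training protein. It makes strong simplifying assumptions: the sequences are already aligned and of equal length, similarity is plain percent identity, and the 25% cut-off is a hypothetical value. Real pipelines use alignment tools for this step.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def independent_test_set(train, test, max_identity=25.0):
    """Drop every test sequence that is too similar to any training sequence."""
    return [t for t in test
            if all(percent_identity(t, s) <= max_identity for s in train)]


# Invented sequences: the first and third test proteins resemble training proteins.
train = ["ACDEFGHI", "KLMNPQRS"]
test = ["ACDEFGHV", "TVWYTVWY", "KLMNAAAA"]
print(independent_test_set(train, test))
```

The same filter can be applied within the test set itself to address similarity between test proteins.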

25
Protein Secondary Structure Prediction Based on
Position-Specific Scoring Matrices (David T.
Jones)
PSI-PRED: a 3rd generation method based on the
iterated PSI-BLAST algorithm.

26
PSI-BLAST
PSSM - position-specific scoring matrix
[Diagram labels: sequence, distant homologues]
  • PSI-BLAST outperforms other algorithms in
    finding distant homologues.
  • The PSSM is the input for PSI-PRED.

27
PSI-PRED
The ANNs' architecture
  • Two ANNs working together.

Sequence + PSSM → 1st ANN → prediction → 2nd ANN → final prediction
28
PSI-PRED
  • Step 1:
    create a PSSM from the sequence - 3 iterations of
    PSI-BLAST.
  • Step 2: 1st ANN
  • Sequence + PSSM: the 1st ANN's input.

Input: a window of 15 residues, e.g. A D C Q E I L H T S T T W Y V
Output: E/H/C - the secondary state prediction for
the central amino acid.
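The 15-residue window can be sketched as a sliding window centred on each residue in turn. Padding the chain ends with a placeholder symbol is an assumption made here; the slides do not say how ends are handled. In the real input each window position would carry its PSSM column rather than a single letter.

```python
def windows(sequence, size=15, pad="-"):
    """Yield (window, central residue) for every position, padding the chain ends."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so that it has a central residue")
    half = size // 2
    padded = pad * half + sequence + pad * half
    for i, residue in enumerate(sequence):
        yield padded[i:i + size], residue


# The 15-residue example from the slide.
sequence = "ADCQEILHTSTTWYV"
all_windows = list(windows(sequence))
print(all_windows[0])   # window centred on the first residue, A
print(all_windows[7])   # window centred on H covers the whole example
```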
29
PSI-PRED
Using PSI-BLAST brings in PSI-BLAST's
difficulties:
Iteration (extension of the protein family) → updating the PSSM → inclusion of non-homologues → misleading PSSM
30
PSI-PRED
Step 3: 2nd ANN
  • So why do we need a second ANN?

Possible output of the 1st ANN:
seq:  A A P P L L L L M M M G I M M R R I M E E E E E
pred: C C C C C H C C C C C E E E
What's wrong with that?
A one-amino-acid helix doesn't exist.
Solution: an ANN that looks at the whole context!
Input: the output of the 1st ANN.
Output: the final prediction.
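To make the problem concrete, the sketch below finds helix runs that are too short to exist and relabels them as coil. This is a hand-written rule used only as a stand-in. In the method described here the clean-up is learned by the second ANN, and the minimum length of 2 is an illustrative value.

```python
import re


def remove_short_segments(prediction, state="H", min_length=2, replacement="C"):
    """Replace runs of `state` shorter than `min_length` with `replacement`."""
    pattern = re.compile(re.escape(state) + "+")

    def fix(match):
        run = match.group(0)
        return run if len(run) >= min_length else replacement * len(run)

    return pattern.sub(fix, prediction)


# The first-stage output from the slide, with its one-residue helix.
raw = "CCCCCHCCCCCEEE"
print(remove_short_segments(raw))
```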
31
PSI-PRED
Training
  • 10% of the proteins were used as an internal test set.
  • Balanced training.

Testing
  • 187 proteins with highly resolved
    structures.
  • PSI-BLAST was used for
    removing homologues.
  • Without structural similarities.

32
PSI-PRED
Jones's reported results
  • Q3 results: 76% - 77%.

33
PSI-PRED
Reliability numbers
  • The way the ANN tells us
    how sure it is about
    the assignment.
  • Used by many methods.
  • Correlates with accuracy.
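The slides do not say how the reliability number is computed. One plausible definition, assumed here purely for illustration, scales the gap between the two strongest output values to an integer from 0 to 9.

```python
def reliability_index(outputs):
    """Scale the gap between the two strongest outputs to an integer from 0 to 9.

    `outputs` maps each state (H, E, C) to the network's output value in [0, 1].
    This formula is an assumption, not taken from the slides.
    """
    ranked = sorted(outputs.values(), reverse=True)
    return min(9, int(10 * (ranked[0] - ranked[1])))


# Hypothetical output values for two residues.
confident = {"H": 0.90, "E": 0.05, "C": 0.05}
uncertain = {"H": 0.45, "E": 0.40, "C": 0.15}
print(reliability_index(confident), reliability_index(uncertain))
```

A clear winner among the three outputs gives a high index; two nearly equal outputs give an index near zero.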

34
Performance evaluation
  • With 3rd generation methods, accuracy
    jumped by 10%.
  • Many 3rd generation methods exist today.

Which method is the best one? How do we recognize
over-optimism?
35
Performance evaluation
CASP - Critical Assessment of Techniques for
Protein Structure Prediction.
EVA - Automatic Evaluation of Automatic
Prediction Servers.
36
(No Transcript)
37
Performance evaluation
Conclusion: PSI-PRED seems to be one of the
most reliable methods today.
Reasons:
  • The widest evolutionary information
    (PSI-BLAST profiles).
  • Strict training and testing criteria for the ANN.

38
Improvements
The first 3rd generation method, PHD: 72% in Q3.
Best 3rd generation results: 77% in Q3.
Sources of improvement:
  • Larger protein databases.
  • PSI-BLAST.
  • PSI-PRED broke through, many followed...

39
Improvements
How can we do better than that?
  • Through larger databases (?).
  • Combination of methods.

Example: combining the 4 best methods gives a Q3
of 78%!
  • Find out why certain proteins
    are predicted poorly.
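The slides do not say how the four methods were combined. A per-residue majority vote is one simple possibility, sketched here as an assumption with invented predictions.

```python
from collections import Counter


def consensus(predictions):
    """Per-residue majority vote; ties go to the method listed first."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # Pick the first state, in method order, that reached the top count.
        result.append(next(s for s in column if counts[s] == best))
    return "".join(result)


# Invented predictions from four hypothetical methods for the same protein.
predictions = ["CCHHHHCC", "CHHHHCCC", "CCHHHHCE", "CCCHHHCC"]
print(consensus(predictions))
```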

40
Improvements
What is the limit of prediction improvement?
  • Some regions of proteins are more mobile
    than others.
  • 12% of protein structure is unknown even by
    manual methods.
  • The limit of accuracy is 88%!

41
Secondary structure prediction in practice
Secondary structure prediction is used for:
  • finding structural switches
  • genome analysis
  • protein structure
42
Finding Structural Switches
Young et al.
Prediction of secondary structure with several
methods.
Different results, same preferences.
Structural switch?
43
Bibliography
  • Jones DT. Protein secondary structure prediction
    based on position-specific scoring matrices.
    J Mol Biol. 1999; 292: 195-202.
  • Rost B. Rising accuracy of protein secondary
    structure prediction. In: Protein structure
    determination, analysis, and modeling for
    drug discovery (ed. D Chasman), New York:
    Dekker, pp. 207-249.