Title: Structural Bioinformatics 2
1Structural Bioinformatics (2)
2Reminder of the previous lecture
- Protein 3D structure
- primary, secondary, tertiary and quaternary structure
- secondary structure elements (SSEs) and surface loops
- rate of evolution of different parts of protein structures
- relevance to multiple sequence alignment
3Reminder of previous lecture
- Evolution of overall structure
- sequence similarity guarantees structural similarity
- 25% sequence identity in an alignment of 80 or more residues means that the two proteins have the same basic 3D structure
- BUT some proteins which share the same structure have much lower sequence identities
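The rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function names are hypothetical, and it assumes that columns containing a gap are left out of both the identity count and the alignment length (other conventions exist).

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, number of aligned residue pairs).

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Assumption: columns with a gap in either row are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs), len(pairs)


def likely_same_fold(aligned_a, aligned_b, min_identity=25.0, min_length=80):
    """Apply the 25% identity over 80 or more residues rule of thumb."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity >= min_identity and length >= min_length
```

A False result does not mean the structures differ: as noted above, some proteins share a structure at much lower sequence identity.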
4Objectives of this lecture (1)
- To understand the importance of evolutionary relationships in prediction problems in bioinformatics
- To understand what protein structure prediction means
- To understand why protein structure prediction is important
5Objectives of this lecture (2)
- To be aware of a number of structure prediction methods
- secondary structure prediction
- comparative modelling
- To understand the conditions under which they can and should be applied
- To understand their expected accuracy and some factors which affect it
6Prediction in bioinformatics
- Important prediction problems
- prediction of protein sequence from genomic DNA
- prediction of protein function from sequence
- prediction of protein 3D structure from sequence
- prediction of protein function from structure
7Why are these prediction methods important
- Genome sequencing projects
- generate large quantities of genomic sequences
- BUT what does it mean?
- Prediction of protein sequence, structure and function can give clues
- Predictions can be verified experimentally
- but experimental verification is often slow
8Prediction based on similarity
- Many successful prediction methods rely on the detection of similarity
- similar sequences have similar functions
- similar sequences have similar structures (last lecture)
- For instance doing a BLAST/FASTA search with a new protein sequence (previous lectures)
9The importance of evolution
- Divergent evolution gives rise to many similar sequences with related structures and often with related functions
- an evolutionary family
10An evolutionary prediction paradigm
- If a sequence of unknown structure/function can
be shown to be similar to one or more of known
structure/function then functional/structural
details can be transferred to the new sequence.
11Predicting structure is easier
- As sequences evolve 3D structure tends to be conserved
- Function is often conserved as well
- BUT function tends to change with evolution faster than structure
- WHY? Or HOW?
12Protein evolutionary families
- Within an evolutionary protein family we expect
- only one basic 3D structure
- perhaps more than one different function
- Functional differences can be minor or major
- changes in enzyme specificity (minor)
- change from enzyme to structural protein (seen in
GST family) (major).
13What is protein structure prediction?
- In its most general form
- a prediction of the (relative) spatial position
of each atom in the tertiary structure generated
from knowledge only of the primary structure
(sequence)
14Why predict protein structure?
- The sequence-structure gap
- 1 000 000 known sequences, 20 000 known structures
- Structural knowledge brings understanding of function and mechanism of action (last lecture and practical)
- Can help in prediction of function
15Why predict protein structure?
- Predicted structures can be used in structure-based drug design
- It can help us understand the effects of mutations on structure or function
- It is a very interesting scientific problem
- still unsolved in its most general form after more than 20 years of effort
16Methods of structure prediction
- Comparative modelling
- Secondary structure prediction
- Fold recognition/threading
- Ab initio protein folding approaches
17Terminology
- The (prediction) target sequence
- a sequence of unknown structure for which we
require a structure prediction
18Comparative modelling
- Makes a prediction of tertiary structure based on
- sequences of known structure which are similar to the target sequence (called template structures)
- an alignment between these and the target sequence
- Remember 25% sequence identity means two proteins have the same basic structure
19Choice of prediction methods
- If you can find similar sequences of known structure then comparative modelling is the best way to predict structure
- all other methods are less reliable
- Of course, you can't always find similar sequences of known structure.
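The choice between methods can be written down as a small decision sketch. The function name and return values here are hypothetical, and the 25% identity and 80-residue thresholds are simply the rule of thumb quoted earlier in the lecture, used as an assumption for what counts as a usable template.

```python
from typing import List, Optional


def choose_prediction_methods(template_identity: Optional[float],
                              aligned_length: int = 0) -> List[str]:
    """Return the methods to try, most reliable first.

    template_identity is the percent identity to the best sequence of known
    structure, or None if no such sequence was found.
    """
    has_template = (template_identity is not None
                    and template_identity >= 25.0
                    and aligned_length >= 80)
    if has_template:
        # A similar sequence of known structure exists: model on it.
        return ["comparative modelling"]
    # Otherwise fall back to the less reliable methods.
    return ["secondary structure prediction",
            "fold recognition/threading",
            "ab initio protein folding"]
```

For example, choose_prediction_methods(42.0, 150) selects comparative modelling, while choose_prediction_methods(None) falls back to the other methods.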
20When you can't do comparative modelling
- The next step is secondary structure prediction
- less detailed results
- only predicts the H (helix), E (extended) or C (coil/loop) state of each residue, does not predict the full atomic structure
- Example http://www.bioinformatics.leeds.ac.uk/group/undergraduate/biol3000_blgy2212/ex3_1utg.html
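A secondary structure prediction is therefore just a string of H, E and C characters, one per residue. The sketch below (a hypothetical helper, not part of any prediction server) groups such a string into segments so that helices and strands can be listed by residue range.

```python
from itertools import groupby


def secondary_structure_segments(states):
    """Group a per-residue H/E/C string into (state, start, end) segments.

    Residue positions are 1-based and inclusive.
    """
    invalid = set(states) - set("HEC")
    if invalid:
        raise ValueError("unexpected state(s): " + ", ".join(sorted(invalid)))
    segments = []
    position = 1
    for state, run in groupby(states):
        length = len(list(run))
        segments.append((state, position, position + length - 1))
        position += length
    return segments


if __name__ == "__main__":
    # Made-up prediction string, for illustration only.
    for state, start, end in secondary_structure_segments("CCHHHHCCEEEC"):
        print(state, start, end)
```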
21Beyond secondary structure prediction
- When you can't do comparative modelling there are some things you could do beyond a secondary structure prediction
- fold recognition or threading
- ab initio protein folding
22Fold recognition or threading
- Aimed at detecting when the target sequence adopts a known fold, even if it has no significant similarity to sequences of known fold
- remember the globin example last lecture
- Beyond the scope of this module
23Ab initio protein folding
- Aims to predict tertiary structure from basic physico-chemical principles
- does not rely on any detection of similarity to sequences of known structure
- An important scientific question
- As yet very unreliable for practical predictions
24Accuracy of structure prediction
- Comparative modelling
- when template and target sequences are closely related high accuracy is possible
- sometimes < 1.0 Angstrom RMSD (RMSD is the root mean square deviation between atomic positions in predicted and actual structures)
- See handout graph
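The RMSD defined above can be computed directly from two lists of atomic coordinates. This is a minimal sketch with a hypothetical function name; it assumes the two structures contain the same atoms in the same order and have already been superimposed, since it does not search for the best superposition itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z).

    Assumption: atoms are paired by position in the lists and the two
    structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

If every atom in the model is displaced by exactly 1.0 Angstrom from its true position, the RMSD is 1.0 Angstrom. Because deviations are squared before averaging, a few badly placed atoms raise the value more than many slightly misplaced ones.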
25Factors affecting accuracy
- The accuracy of comparative modelling is controlled by the quality of the alignment between target sequence and template structures
- Alignment is easier if the sequences are closely related (e.g. sequence identity > 80%).
26Accuracy of secondary structure prediction
- The best methods have an average accuracy of just about 73% (the percentage of residues predicted correctly)
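This per-residue accuracy measure is commonly called Q3, because there are three states. A minimal sketch of the calculation follows; the function name is hypothetical, and the observed string is assumed to come from a known structure already reduced to H, E and C states.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted H/E/C state matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty state strings of equal length")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)
```

Note that a residue at the end of a helix predicted as coil counts as an error just like a helix predicted as a strand, so Q3 says nothing about how serious the mistakes are.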
27Secondary structure prediction methods
- PHD - Rost and Sander (artificial neural network)
- DSC - King and Sternberg (linear discriminant analysis)
- NNSSP - Salamov and Solovyev (nearest neighbour algorithm)
- PREDATOR - Frishman and Argos
- Jnet - Barton Group
28Use of evolutionary information
- All the previous secondary structure prediction methods make use of evolutionary information in related sequences
- This improves prediction accuracy enormously (without this accuracy would be less than 70%)
29Structure prediction resources
- Secondary structure prediction
- Jpred (http://www.compbio.dundee.ac.uk/Software/JPred/jpred.html)
- several others also on the WWW
- Comparative modelling
- SWISS-MODEL (http://www.expasy.ch/swissmod/SWISS-MODEL.html)
- We will use these in the practical
30Caveats
- The prediction methods we have covered are mainly aimed at water soluble (globular) proteins
- we still do not know very many 3D structures of integral membrane proteins
31Other aspects of structure prediction
- There are methods for predicting which parts of sequences are transmembrane segments
- see for example the ExPASy WWW site
- http://www.expasy.ch/
- There are links to lots of protein prediction resources available from ExPASy
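The lecture does not say how transmembrane prediction methods work. As one classic illustration (an assumption, not necessarily what the ExPASy tools do), a sliding-window hydropathy scan with the Kyte-Doolittle scale flags long hydrophobic stretches. The window of 19 residues and the threshold of 1.6 are the values commonly quoted for that scale; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(sequence, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for non-standard residue letters.
    """
    scores = [KYTE_DOOLITTLE[res] for res in sequence.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_regions(sequence, window=19, threshold=1.6):
    """Merged 1-based (start, end) ranges covered by windows scoring >= threshold."""
    regions = []
    for i, mean in enumerate(window_hydropathy(sequence, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if regions and start <= regions[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions
```

The reported ranges are the windows that pass the threshold, so they can extend a few residues beyond the hydrophobic stretch itself. Real predictors use considerably more information than this.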
32Summary
- The main points of this lecture were
- prediction using methods based on the detection of similarity is important in bioinformatics
- the underlying reason for this is divergent evolution of sequences/structures
- prediction of structure by comparative modelling is an example of such a method
33Summary (2)
- Continued
- Comparative modelling should be used to predict structures whenever possible
- If it is not possible then secondary structure prediction methods can be used, followed by more sophisticated methods
- These prediction methods are mainly for globular proteins