Title: Homology Modelling
1Homology Modelling
- Thomas Blicher
- Center for Biological Sequence Analysis
2Why Do We Need Homology Modelling?
- Ab Initio protein folding (random sampling)
- 100 aa, 3 conf./residue gives approximately 10^48 different overall conformations!
- Random sampling is NOT feasible, even if conformations can be sampled at picosecond (10^-12 sec) rates.
- Levinthal's paradox
- Do homology modelling instead.
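The arithmetic behind this estimate can be checked in a few lines. This is only a back-of-the-envelope sketch using the numbers from the slide (100 residues, 3 conformations per residue, one conformation sampled per picosecond); the variable names are illustrative.

```python
# Back-of-the-envelope estimate behind Levinthal's paradox.
residues = 100             # chain length from the slide
states_per_residue = 3     # conformations per residue from the slide
sampling_rate = 1e12       # conformations per second (one per picosecond)

# Exact integer count of overall conformations, about 5e47 (roughly 10^48).
conformations = states_per_residue ** residues

# Time needed to visit every conformation once.
seconds = conformations / sampling_rate
years = seconds / (365.25 * 24 * 3600)

print(f"conformations: {float(conformations):.2e}")
print(f"exhaustive search: {years:.2e} years")
```

Even at picosecond sampling rates, the exhaustive search comes out on the order of 10^28 years, which is why random sampling is not an option.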
3How Is It Possible?
- The structure of a protein is uniquely determined by its amino acid sequence (but sequence is sometimes not enough):
- prions
- pH, ions, cofactors, chaperones
- Structure is conserved much longer than sequence in evolution.
- Structure > Function > Sequence
4How Often Can We Do It?
- There are currently 40000 structures in the PDB (but only 4000 if you include only ones that are not more than 30% identical and have a resolution better than 3.0 Å).
- An estimated 25% of all sequences can be modeled, and structural information can be obtained for 50%.
5Worldwide Structural Genomics
- Fold space coverage
- Complete genomes
- Signaling proteins
- Improving technology
- Disease-causing organisms
- Model organisms
- Membrane proteins
- Protein-ligand interactions
6Structural Genomics in North America
- 10-year, $600 million project initiated in 2000, funded largely by NIH.
- Aim: structural information on 10000 unique proteins (now 4000-6000); so far 1000 have been determined.
- Improve current techniques to reduce time (from months to days) and cost (from $100,000 to $20,000 per structure).
- 9 research centers currently funded (2005); targets are from model and disease-causing organisms (a separate project on TB proteins).
7Homology Modeling for Structural Genomics
Roberto Sánchez et al. Nature Structural Biology 7, 986-990 (2000)
8How Well Can We Do It?
Sali, A. & Kuriyan, J. Trends Biochem. Sci. 22, M20-M24 (1999)
9How Is It Done?
- Identify template(s) and make an initial alignment
- Improve alignment
- Backbone generation
- Loop modelling
- Side chains
- Refinement
- Validation ?
10Template Identification
- Search with sequence
- Blast
- Psi-Blast
- Fold recognition methods
- Use biological information
- Functional annotation in databases
- Active site/motifs
11Alignment
12
1   2   3   4   5   6   7   8   9   10  11  12  13  14
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
14Improving the Alignment
1   2   3   4   5   6   7   8   9   10  11  12  13  14
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
From "Professional Gambling" by Gert Vriend
http://www.cmbi.kun.nl/gv/articles/text/gambling.html
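To see how the two gap placements in this example differ in plain sequence terms, the sketch below computes percent identity over the aligned (non-gap) positions. It assumes that the first row is the template and that the second and third rows are alternative alignments of the same target sequence; those roles and the names `row1`, `row2`, `row3` are assumptions, since the extracted slide carries no row labels.

```python
GAP = "---"

def percent_identity(a, b):
    """Percent identical residues over positions where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

# Rows copied from the slide; roles (template vs. target) are assumed.
row1 = "PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS".split()
row2 = "PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS".split()
row3 = "PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS".split()

id_row2 = percent_identity(row1, row2)
id_row3 = percent_identity(row1, row3)
print(f"row 2 vs row 1: {id_row2:.1f}% identity")
print(f"row 3 vs row 1: {id_row3:.1f}% identity")
```

The two placements give different identities for the same pair of sequences, so a sequence score alone would prefer one of them. Whether that placement is also the better one structurally has to be judged from the template structure and biological information.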
15Template Quality
- Selecting the best template is crucial!
- The best template may not be the one with the highest % id (best p-value).
- Template 1: 93% id, 3.5 Å resolution
- Template 2: 90% id, 1.5 Å resolution
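One way to make this trade-off explicit is a simple selection rule. The sketch below is a hypothetical heuristic, not an established procedure: among templates whose identity lies within a chosen window of the best identity, it picks the one with the best resolution. The function name `pick_template` and the 5-point window are assumptions for illustration.

```python
def pick_template(templates, identity_window=5.0):
    """Among templates within `identity_window` percentage points of the best
    identity, return the one with the best (lowest) resolution in Å."""
    best_identity = max(t["identity"] for t in templates)
    close = [t for t in templates
             if best_identity - t["identity"] <= identity_window]
    return min(close, key=lambda t: t["resolution"])

# The two templates from the slide.
candidates = [
    {"name": "Template 1", "identity": 93.0, "resolution": 3.5},
    {"name": "Template 2", "identity": 90.0, "resolution": 1.5},
]

chosen = pick_template(candidates)
print(chosen["name"])
```

With the slide's numbers this rule selects Template 2: a loss of three percentage points in identity is traded for a much better resolved structure.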
16The Importance of Resolution
[Figure panels: 4 Å, 3 Å, 2 Å, 1 Å]
17Ramachandran Plot
- Allowed backbone torsion angles in proteins
Amino acid residue
18Template Quality: Ramachandran Plot
X-ray structure: good data.
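The backbone torsion angles shown in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms: phi uses C(i-1), N(i), CA(i), C(i), and psi uses N(i), CA(i), C(i), N(i+1). The sketch below is a minimal standard-library implementation of the signed dihedral angle; the helper names and the toy coordinates are illustrative and not taken from any real structure.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy geometries (made-up coordinates): eclipsed, anti, and a quarter turn.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))
print(cis, trans, quarter)
```

Applying `dihedral` to every residue of a template or model gives the (phi, psi) pairs that are plotted against the allowed regions.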
19Backbone Generation
- Generate the backbone coordinates from the template for the aligned regions.
- Several programs can do this; most of the groups at CASP6 use Modeller.
- http://salilab.org/modeller/modeller.html
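The idea of copying backbone coordinates for aligned regions can be sketched without any modelling package. The example below is not how Modeller works internally; it is a simplified illustration in which every target residue aligned to a template residue inherits that residue's backbone, and residues aligned to a gap are left empty for loop modelling. The function name, the data layout, and the placeholder coordinates are all assumptions.

```python
GAP = "---"

def copy_backbone(template_row, target_row, template_backbone):
    """Return (target_residue, backbone_or_None) for every target residue.

    template_backbone holds one entry per template residue (gaps excluded).
    Target residues aligned to a gap get None and are left for loop modelling.
    """
    model = []
    t_index = 0  # position in the ungapped template
    for t_res, q_res in zip(template_row, target_row):
        coords = None
        if t_res != GAP:
            coords = template_backbone[t_index]
            t_index += 1
        if q_res != GAP:
            model.append((q_res, coords))
    return model

# Toy alignment with one deletion and one insertion in the target.
template_row = ["PHE", "ASP", "ILE", "---", "CYS"]
target_row = ["PHE", "ASN", "---", "GLY", "CYS"]
# Placeholder CA coordinates, one entry per template residue.
template_backbone = [{"CA": (3.8 * i, 0.0, 0.0)} for i in range(4)]

model = copy_backbone(template_row, target_row, template_backbone)
for residue, coords in model:
    print(residue, coords)
```

Here GLY has no template counterpart, so its backbone stays undefined until the loop modelling step.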
20Loop Modelling
- Knowledge based
- Searches PDB for fragments that match the sequence to be modelled (Levitt, Holm, Baker etc.).
- Energy based
- Uses an energy function to evaluate the quality of the loop and minimizes this function by Monte Carlo (sampling) or molecular dynamics (MD) techniques.
- Combination
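The energy-based approach can be illustrated with a toy Metropolis Monte Carlo search over loop torsion angles. This is a sketch under strong assumptions: the energy function `toy_energy` is hypothetical (it simply rewards torsions close to arbitrary target values), whereas real loop modelling uses a physical or statistical force field. Only the sampling scheme is the point here.

```python
import math
import random

def toy_energy(angles, targets):
    """Hypothetical energy: zero when every torsion equals its target value."""
    return sum(1.0 - math.cos(math.radians(a - t))
               for a, t in zip(angles, targets))

def monte_carlo_minimize(angles, targets, steps=5000, step_size=30.0,
                         kT=0.1, seed=0):
    rng = random.Random(seed)
    current = list(angles)
    e_current = toy_energy(current, targets)
    best, e_best = list(current), e_current
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)  # perturb one torsion
        e_trial = toy_energy(trial, targets)
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-dE / kT).
        if (e_trial <= e_current
                or rng.random() < math.exp(-(e_trial - e_current) / kT)):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

start = [180.0, 180.0, 180.0, 180.0]      # arbitrary starting torsions
targets = [-60.0, -45.0, -60.0, -45.0]    # arbitrary minimum of the toy energy
e_start = toy_energy(start, targets)
best_angles, e_final = monte_carlo_minimize(start, targets)
print(f"energy: {e_start:.3f} -> {e_final:.3f}")
```

Accepting occasional uphill moves is what lets the search escape local minima; keeping the best conformation seen so far guarantees the returned energy is never worse than the starting one.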
21Loops: the Rosetta Method
- Find fragments (10 per amino acid) with the same sequence and secondary structure profile as the query sequence.
- Combine them using a Monte Carlo scheme to build the loop.
- David Baker et al.
22Side Chains
- Side chain rotamers are dependent on backbone conformation.
- Most successful method in CASP6 was SCWRL by Dunbrack et al.
- Graph-theory knowledge based method to solve the combinatorial problem of side chain modelling.
- http://dunbrack.fccc.edu/SCWRL3.php
23Side Chains
- Prediction accuracy is high for buried residues, but much lower for surface residues.
- Experimental reasons: side chains at the surface are more flexible.
- Theoretical reasons: much easier to handle hydrophobic packing in the core than the electrostatic interactions, including H-bonds to waters.
24Side Chains
- If the seq. id is high, the networks of side
chain contacts may be conserved, and keeping the
side chain rotamers from the template may be
better than predicting new ones.
25Refinement
- Energy minimization
- Molecular dynamics
- Big errors like atom clashes can be removed, but force fields are not perfect and small errors will also be introduced; keep minimization to a minimum or matters will only get worse.
26Error Recovery
- If errors are introduced in the model, they normally can NOT be recovered at a later step.
- The alignment cannot make up for a bad choice of template.
- Loop modeling cannot make up for a poor alignment.
- If errors are discovered, the step where they were introduced should be redone.
27Validation
- Most programs will get the bond lengths and angles right.
- The Ramachandran plot of the model usually looks pretty much like the Ramachandran plot of the template (so select a high quality template).
- Inside/outside distributions of polar and apolar residues can be useful.
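The inside/outside check can be sketched as a comparison of how many apolar residues sit at buried versus exposed positions. The example below assumes that a relative solvent accessibility (0 to 1) has already been computed for each residue by a separate tool; the accessibility values, the 0.25 burial cutoff, and the simplified set of apolar residues are all assumptions for illustration.

```python
# Simplified set of apolar residues (an assumption; classifications vary).
APOLAR = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP"}

def apolar_fraction(residues, buried_cutoff=0.25):
    """residues: list of (three_letter_name, relative_accessibility in 0..1).

    Returns (fraction apolar among buried, fraction apolar among exposed).
    """
    buried = [name for name, acc in residues if acc < buried_cutoff]
    exposed = [name for name, acc in residues if acc >= buried_cutoff]

    def frac(names):
        if not names:
            return float("nan")
        return sum(n in APOLAR for n in names) / len(names)

    return frac(buried), frac(exposed)

# Made-up accessibilities for a toy model.
toy_model = [("LEU", 0.05), ("VAL", 0.10), ("ILE", 0.02), ("SER", 0.20),
             ("LYS", 0.80), ("GLU", 0.65), ("ASP", 0.70), ("PHE", 0.40)]

buried_apolar, exposed_apolar = apolar_fraction(toy_model)
print(f"apolar among buried:  {buried_apolar:.2f}")
print(f"apolar among exposed: {exposed_apolar:.2f}")
```

A plausible model is expected to have clearly more apolar residues inside than outside; a model where the two fractions are similar, or reversed, deserves a closer look at the alignment.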
28Validation: ProQ Server
- ProQ is a neural network based predictor that predicts the quality of a protein model from a number of structural features.
- ProQ is optimized to find correct models, in contrast to other methods, which are optimized to find native structures.
Arne Elofsson's group: http://www.sbc.su.se/bjorn/ProQ/
29Structure Validation
- ProCheck
- http://www.biochem.ucl.ac.uk/roman/procheck/procheck.html
- WhatIf server
- http://swift.cmbi.kun.nl/WIWWWI/
30Homology Modelling Servers
- Eva-CM performs continuous and automated analysis of comparative protein structure modeling servers.
- A current list of the best performing servers can be found at http://cubic.bioc.columbia.edu/eva/doc/intro_cm.html
31The Hardest Target in CASP6
- Only 8% sequence id between target and template.
Dunbrack, Wang & Jin (2004) CASP6 Fold Recognition Assessment
32Summary
- Successful homology modelling depends on the following:
- Template quality
- Alignment (add biological information)
- Modelling program/procedure (use more than one)
- Always validate your final model!