Homology Modelling presentation


Title: Homology Modelling

1
Homology Modelling

Thomas Blicher
Center for Biological Sequence Analysis

2
Why Do We Need Homology Modelling?

Ab initio protein folding (random sampling)
100 aa, 3 conformations/residue gives approximately 10^48
different overall conformations!
Random sampling is NOT feasible, even if
conformations can be sampled at picosecond (10^-12
sec) rates.
Levinthal's paradox
Do homology modelling instead.
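The numbers on this slide can be checked with a few lines of arithmetic. The sketch below uses Python and the slide's own assumptions (100 residues, exactly 3 conformations per residue, one conformation sampled per picosecond). It computes 3^100 and the time an exhaustive search would take. The function name levinthal_search_time is illustrative only.

```python
# Levinthal back-of-the-envelope estimate, using the slide's assumptions:
# 100 residues, 3 conformations per residue, one conformation per picosecond.

def levinthal_search_time(n_residues=100, conf_per_residue=3, rate_per_second=1e12):
    """Return (number of conformations, seconds, years) for exhaustive sampling."""
    n_conformations = conf_per_residue ** n_residues  # exact integer, 3**100
    seconds = n_conformations / rate_per_second
    years = seconds / (365.25 * 24 * 3600)
    return n_conformations, seconds, years

n, s, y = levinthal_search_time()
print(f"conformations: {n:.2e}")
print(f"years needed:  {y:.1e}")
```

3^100 is about 5 x 10^47, which rounds to the 10^48 quoted above. The exhaustive search time comes out on the order of 10^28 years.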

3
How Is It Possible?

The structure of a protein is uniquely determined
by its amino acid sequence (but sequence is
sometimes not enough):
prions
pH, ions, cofactors, chaperones
Structure is conserved much longer than sequence
in evolution.
Conservation: Structure > Function > Sequence

4
How Often Can We Do It?

There are currently 40,000 structures in the PDB
(but only 4,000 if you include only those that are
no more than 30% identical and have a resolution
better than 3.0 Å).
An estimated 25% of all sequences can be modeled,
and structural information can be obtained for
50%.

5
Worldwide Structural Genomics

Fold space coverage
Complete genomes
Signaling proteins
Improving technology
Disease-causing organisms
Model organisms
Membrane proteins
Protein-ligand interactions

6
Structural Genomics in North America

10-year, $600 million project initiated in 2000,
funded largely by NIH.
Aim: structural information on 10,000 unique
proteins (now 4,000-6,000); so far 1,000 have been
determined.
Improve current techniques to reduce time (from
months to days) and cost (from $100,000 to
$20,000 per structure).
9 research centers currently funded (2005);
targets are from model and disease-causing
organisms (a separate project on TB proteins).

7
Homology Modeling for Structural Genomics
Roberto Sánchez et al. Nature Structural Biology
7, 986-990 (2000)
8
How Well Can We Do It?
Sali, A. & Kuriyan, J. Trends Biochem. Sci. 22,
M20-M24 (1999)
9
How Is It Done?

Identify template(s), initial alignment
Improve alignment
Backbone generation
Loop modelling
Side chains
Refinement
Validation?

10
Template Identification

Search with sequence
BLAST
PSI-BLAST
Fold recognition methods
Use biological information
Functional annotation in databases
Active site/motifs
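A first-pass template search is often a BLAST or PSI-BLAST run against PDB sequences. The sketch below parses the hits and ranks them by E-value. It assumes BLAST+ tabular output (-outfmt 6) with its default 12 columns. The hit lines and template identifiers are made up, and Hit, parse_outfmt6 and rank_templates are hypothetical helper names.

```python
# Rank candidate templates from BLAST+ tabular output (-outfmt 6, default 12 columns).
# The example hits below are made up for illustration.
from dataclasses import dataclass

@dataclass
class Hit:
    subject: str     # template identifier (sseqid)
    pident: float    # percent identity over the aligned region
    length: int      # alignment length
    evalue: float
    bitscore: float

def parse_outfmt6(text):
    hits = []
    for line in text.strip().splitlines():
        f = line.split("\t")
        # Default column order: qseqid sseqid pident length mismatch gapopen
        #                       qstart qend sstart send evalue bitscore
        hits.append(Hit(f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11])))
    return hits

def rank_templates(hits, max_evalue=1e-3):
    """Keep significant hits, lowest E-value first, ties broken by higher identity."""
    good = [h for h in hits if h.evalue <= max_evalue]
    return sorted(good, key=lambda h: (h.evalue, -h.pident))

example = "\n".join([
    "query\t1abcA\t93.0\t120\t8\t0\t1\t120\t5\t124\t1e-60\t230",
    "query\t2xyzB\t90.0\t118\t12\t1\t1\t118\t3\t120\t5e-58\t221",
    "query\t3pqrC\t25.0\t60\t45\t2\t30\t89\t40\t99\t0.5\t30",
])
for h in rank_templates(parse_outfmt6(example)):
    print(h.subject, h.pident, h.evalue)
```

Ranking by E-value is only a first filter. As the Template Quality slide points out, the best template also depends on factors such as resolution, so the top hits still have to be inspected.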

11
Alignment
12
  1   2   3   4   5   6   7   8   9  10  11  12  13  14
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
13
14
Improving the Alignment
From "Professional Gambling" by Gert Vriend
http://www.cmbi.kun.nl/gv/articles/text/gambling.html
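The sketch below makes the alignment example concrete by counting identical residues for the two alternative gap placements shown on these slides. It compares the first row with the second row, then the first row with the third row. This is illustrative Python; ROW1, ALN_A, ALN_B and identity are names chosen here, not taken from the source.

```python
# Compare the two alternative gap placements from the alignment slides.
ROW1 = "PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS".split()
ALN_A = "PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS".split()
ALN_B = "PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS".split()

def identity(seq1, seq2, gap="---"):
    """Return (identical positions, aligned positions with no gap in either row)."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    same = sum(1 for a, b in pairs if a == b)
    return same, len(pairs)

for name, aln in (("second row", ALN_A), ("third row", ALN_B)):
    same, aligned = identity(ROW1, aln)
    print(f"{name}: {same}/{aligned} identical")
```

The second row scores 7 identities out of 11 aligned positions and the third row 6 out of 11, so a pure identity count prefers the second row. The count says nothing about whether a three-residue deletion is structurally plausible at that position. That is why the alignment should be improved with biological information rather than by trusting the score alone.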
15
Template Quality

Selecting the best template is crucial!
The best template may not be the one with the
highest sequence identity (best p-value).
Template 1: 93% identity, 3.5 Å resolution?
Template 2: 90% identity, 1.5 Å resolution?

16
The Importance of Resolution
[Figure: panels at 4 Å, 3 Å, 2 Å and 1 Å resolution]
17
Ramachandran Plot

Allowed backbone torsion angles (φ, ψ) in proteins

[Figure: amino acid residue]
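The φ and ψ angles on a Ramachandran plot are dihedral angles computed from backbone atom coordinates. φ is defined by C(i-1), N, CA, C and ψ by N, CA, C, N(i+1). The sketch below is a minimal pure-Python version that assumes coordinates are given as (x, y, z) tuples. The helper names are illustrative, and angles are returned in degrees between -180 and 180.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points (atan2 formulation)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(c_prev, n, ca, c, n_next):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

For a whole chain, phi_psi is evaluated for every residue that has both neighbours. The resulting pairs are the points plotted on the Ramachandran plot.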
18
Template Quality: Ramachandran Plot
X-ray structure, good data.
19
Backbone Generation

Generate the backbone coordinates from the
template for the aligned regions.
Several programs can do this; most of the groups
at CASP6 used Modeller:
http://salilab.org/modeller/modeller.html
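For the aligned regions, backbone generation amounts to copying template coordinates residue by residue through the alignment. The sketch below shows only that bookkeeping. It assumes gapped one-letter strings and one coordinate record per template residue, and transfer_backbone is a hypothetical name. Real programs such as Modeller build the model differently, by satisfaction of spatial restraints derived from the template.

```python
# Transfer template coordinates to the query for aligned (non-gap) columns.
# Hypothetical data layout: one coordinate record per template residue.

def transfer_backbone(query_aln, template_aln, template_coords, gap="-"):
    """Return ({query_index: template_coords[j]}, [unaligned query indices])."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    model, unaligned = {}, []
    qi = ti = 0  # residue counters in the ungapped query and template
    for q, t in zip(query_aln, template_aln):
        if q != gap and t != gap:
            model[qi] = template_coords[ti]   # aligned column: copy backbone
        elif q != gap:
            unaligned.append(qi)              # query residue opposite a template gap
        if q != gap:
            qi += 1
        if t != gap:
            ti += 1
    return model, unaligned

model, unaligned = transfer_backbone("AB-C", "XYZ-", ["T0", "T1", "T2"])
print(model, unaligned)
```

Query residues that fall opposite template gaps come back in the unaligned list. These are the positions left for loop modelling.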

20
Loop Modelling

Knowledge-based
Searches PDB for fragments that match the
sequence to be modelled (Levitt, Holm, Baker,
etc.).
Energy-based
Uses an energy function to evaluate the quality
of the loop and minimizes this function by Monte
Carlo (sampling) or molecular dynamics (MD)
techniques.
Combination
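The energy-based approach can be illustrated with a generic Metropolis Monte Carlo search over loop torsion angles. This is a sketch, not the protocol of any specific program. toy_energy is a made-up stand-in for a real force field or statistical potential, and the step size and temperature (kT) are arbitrary illustrative values.

```python
import math
import random

def metropolis_minimize(energy, angles, n_steps=5000, step_deg=20.0, kT=1.0, seed=0):
    """Monte Carlo search over torsion angles with the Metropolis criterion.

    Returns the lowest-energy angles seen during the run and their energy.
    """
    rng = random.Random(seed)
    current = list(angles)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        trial = list(current)
        i = rng.randrange(len(trial))  # perturb one torsion, wrap into [-180, 180)
        trial[i] = (trial[i] + rng.uniform(-step_deg, step_deg) + 180.0) % 360.0 - 180.0
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

def toy_energy(angles):
    """Hypothetical energy: every torsion prefers -60 degrees."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)

start = [170.0, 10.0, -100.0, 90.0]
best, e = metropolis_minimize(toy_energy, start, kT=0.2)
print(round(toy_energy(start), 3), "->", round(e, 3))
```

Keeping track of the best state seen, rather than only the final one, matters because Metropolis sampling also accepts some uphill moves.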

21
Loops: the Rosetta Method

Find fragments (10 per amino acid) with the same
sequence and secondary structure profile as the
query sequence.
Combine them using a Monte Carlo scheme to build
the loop.
David Baker et al.
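The fragment-selection step can be sketched as follows. Every library fragment is scored against each window of the query by sequence identity and secondary structure (H/E/L) agreement, and the top N per position are kept (10 on the slide). This is a simplified stand-in that assumes a hypothetical library of (id, sequence, secondary structure) tuples and equal weights. Rosetta's own fragment picker compares sequence profiles rather than raw identities. The selected fragments would then be combined by a Monte Carlo scheme.

```python
# Score candidate fragments against each query window by sequence and
# secondary-structure agreement and keep the top N per position.
# The library format and the scoring weights are hypothetical.

def score_fragment(q_seq, q_ss, f_seq, f_ss, w_seq=1.0, w_ss=1.0):
    seq_matches = sum(a == b for a, b in zip(q_seq, f_seq))
    ss_matches = sum(a == b for a, b in zip(q_ss, f_ss))
    return w_seq * seq_matches + w_ss * ss_matches

def pick_fragments(query_seq, query_ss, library, frag_len=3, top_n=10):
    """Return {window start: [(score, fragment id), ...]} for every window."""
    picks = {}
    for start in range(len(query_seq) - frag_len + 1):
        q_seq = query_seq[start:start + frag_len]
        q_ss = query_ss[start:start + frag_len]
        scored = [(score_fragment(q_seq, q_ss, f_seq, f_ss), frag_id)
                  for frag_id, f_seq, f_ss in library if len(f_seq) == frag_len]
        scored.sort(key=lambda x: (-x[0], x[1]))  # best score first, ties by id
        picks[start] = scored[:top_n]
    return picks

# Made-up library entries: (id, sequence, secondary structure)
library = [("f1", "GSA", "LLH"), ("f2", "GTA", "LLL"), ("f3", "VCR", "EEE")]
picks = pick_fragments("PGSAE", "LLLHH", library, top_n=2)
print(picks[1])
```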

22
Side Chains

Side chain rotamers are dependent on backbone
conformation.
The most successful method in CASP6 was SCWRL by
Dunbrack et al.: a graph-theory, knowledge-based method to solve the
combinatorial problem of side chain modelling.
http://dunbrack.fccc.edu/SCWRL3.php
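The combinatorial problem is easy to see in code. With r rotamers per residue and n residues there are r^n combinations. The brute-force sketch below enumerates all of them for a tiny example with made-up energies. It is not SCWRL's algorithm: graph-based methods such as SCWRL avoid full enumeration by exploiting the fact that each side chain interacts with only a few neighbours.

```python
# Brute-force search over side chain rotamer combinations (NOT the SCWRL
# algorithm): every combination is enumerated, which scales as r**n.
from itertools import product

def best_rotamers(self_energy, pair_energy):
    """self_energy[i][r]: energy of rotamer r at residue i (vs. backbone).
    pair_energy[(i, j)][r][s]: interaction of rotamer r at i with rotamer s at j.
    Returns (lowest-energy rotamer assignment, its total energy)."""
    best, best_e = None, float("inf")
    for combo in product(*(range(len(e)) for e in self_energy)):
        e = sum(self_energy[i][r] for i, r in enumerate(combo))
        e += sum(m[combo[i]][combo[j]] for (i, j), m in pair_energy.items())
        if e < best_e:
            best, best_e = combo, e
    return best, best_e

# Hypothetical energies: three residues, two rotamers each;
# residues 0-1 and 1-2 are in contact.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.1]]
pair_e = {(0, 1): [[0.0, 2.0], [2.0, 0.0]],
          (1, 2): [[1.0, 0.0], [0.0, 1.0]]}
print(best_rotamers(self_e, pair_e))
```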

23
Side Chains

Prediction accuracy is high for buried residues,
but much lower for surface residues.
Experimental reasons: side chains at the surface
are more flexible.
Theoretical reasons: it is much easier to handle
hydrophobic packing in the core than
electrostatic interactions, including H-bonds to
waters.
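The buried/surface difference is usually shown by splitting prediction accuracy by solvent exposure. The sketch below assumes a chi1 angle counts as correct within 40 degrees of the reference, which is a commonly used criterion. It uses a hypothetical relative-accessibility threshold of 0.25 to separate buried from exposed residues, and the data are invented.

```python
# Split side chain (chi1) prediction accuracy by burial.
# Assumed criteria: correct within 40 degrees; "buried" if relative
# solvent accessibility < 0.25 (an illustrative threshold).

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def chi1_accuracy_by_burial(residues, cutoff=40.0, rsa_buried=0.25):
    """residues: iterable of (predicted chi1, reference chi1, relative accessibility)."""
    counts = {"buried": [0, 0], "exposed": [0, 0]}  # [correct, total]
    for pred, ref, rsa in residues:
        key = "buried" if rsa < rsa_buried else "exposed"
        counts[key][1] += 1
        if angle_diff(pred, ref) <= cutoff:
            counts[key][0] += 1
    return {k: (c / t if t else None) for k, (c, t) in counts.items()}

data = [(-65, -60, 0.05), (180, -175, 0.10), (60, -60, 0.40), (-60, -70, 0.60)]
print(chi1_accuracy_by_burial(data))
```

The wrap-around in angle_diff matters: 180 and -175 degrees are only 5 degrees apart.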

24
Side Chains

If the sequence identity is high, the networks of side
chain contacts may be conserved, and keeping the
side chain rotamers from the template may be
better than predicting new ones.

25
Refinement

Energy minimization
Molecular dynamics
Big errors like atom clashes can be removed, but
force fields are not perfect, so small errors
will also be introduced: keep minimization to a
minimum or matters will only get worse.
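Big errors such as atom clashes can be located with a simple distance check before deciding how much minimization a model needs. The sketch assumes atoms are given as (name, residue index, (x, y, z)) and uses an illustrative 2.5 Å cutoff, which is not a standard value. Skipping atoms in the same or adjacent residues is a crude stand-in for proper exclusion of bonded neighbours. math.dist requires Python 3.8 or later.

```python
# Flag pairs of atoms closer than a clash cutoff (illustrative 2.5 Å).
# Atoms: (atom name, residue index, (x, y, z)). Same/adjacent residues are
# skipped as a crude way to ignore bonded neighbours.
import math

def find_clashes(atoms, cutoff=2.5):
    clashes = []
    for i in range(len(atoms)):
        name_i, res_i, xyz_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            name_j, res_j, xyz_j = atoms[j]
            if abs(res_i - res_j) <= 1:
                continue
            d = math.dist(xyz_i, xyz_j)
            if d < cutoff:
                clashes.append((res_i, name_i, res_j, name_j, round(d, 2)))
    return clashes

atoms = [("CB", 1, (0, 0, 0)), ("CG", 3, (1.5, 0, 0)),
         ("CD", 5, (10, 0, 0)), ("N", 2, (1, 0, 0))]
print(find_clashes(atoms))
```

The double loop is O(n^2). For a whole protein, a grid or neighbour list would normally be used instead.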

26
Error Recovery

If errors are introduced in the model, they
normally cannot be recovered at a later step:
The alignment cannot make up for a bad choice of
template.
Loop modeling cannot make up for a poor
alignment.
If errors are discovered, the step where they
were introduced should be redone.
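Redoing the step where an error was introduced is easier if each step's output is kept, so that a rerun can start at the faulty step without repeating the earlier ones. The sketch below is a hypothetical design, not a description of any particular modelling package. The step names follow the How Is It Done? slide, and the step functions are placeholders.

```python
# Hypothetical pipeline: each step's output is cached, so a rerun can start
# at the step where an error was introduced and redo everything after it.

STEPS = ["template", "alignment", "backbone", "loops", "side_chains", "refinement"]

def run_pipeline(step_funcs, start_at="template", cache=None):
    """step_funcs: {step name: function(previous result) -> result}."""
    cache = dict(cache or {})
    first = STEPS.index(start_at)
    # Reuse the cached output of the step just before start_at (if any).
    result = cache[STEPS[first - 1]] if first > 0 else None
    for name in STEPS[first:]:
        result = step_funcs[name](result)
        cache[name] = result
    return result, cache

calls = []

def make_step(name):
    """Placeholder step: records that it ran and appends its name to the result."""
    def step(previous):
        calls.append(name)
        return (previous or []) + [name]
    return step

funcs = {name: make_step(name) for name in STEPS}
model, cache = run_pipeline(funcs)                                   # full run
calls.clear()
model2, _ = run_pipeline(funcs, start_at="alignment", cache=cache)   # redo from alignment
print(calls)  # the template step is not rerun
```

Because an error in template choice propagates to every later step, restarting at "template" reruns the whole pipeline, which matches the point that later steps cannot compensate for earlier ones.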

27
Validation

Most programs will get the bond lengths and
angles right.
The Ramachandran plot of the model usually looks
pretty much like the Ramachandran plot of the
template (so select a high quality template).
Inside/outside distributions of polar and apolar
residues can be useful.
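The inside/outside check can be sketched as a comparison of the fraction of apolar residues at buried and at exposed positions. In a plausible model the core is typically clearly more apolar than the surface. All assumptions here are hypothetical: the apolar set AVLIMFWPC, a relative-accessibility threshold of 0.25, and per-residue accessibilities already computed by a tool such as DSSP.

```python
# Inside/outside check: fraction of apolar residues among buried vs. exposed positions.
# Assumed (hypothetical) apolar set and relative-accessibility threshold.

APOLAR = set("AVLIMFWPC")

def _apolar_fraction(group):
    """Fraction of residues in group that are apolar (None if group is empty)."""
    if not group:
        return None
    return sum(aa in APOLAR for aa in group) / len(group)

def apolar_fractions(sequence, rsa, threshold=0.25):
    buried = [aa for aa, r in zip(sequence, rsa) if r < threshold]
    exposed = [aa for aa, r in zip(sequence, rsa) if r >= threshold]
    return {"buried": _apolar_fraction(buried), "exposed": _apolar_fraction(exposed)}

result = apolar_fractions("LIVKDEAS", [0.0, 0.1, 0.05, 0.8, 0.6, 0.9, 0.1, 0.5])
print(result)
```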

28
Validation ProQ Server

ProQ is a neural-network-based predictor that
predicts the quality of a protein model from a
number of structural features.
ProQ is optimized to find correct models, in
contrast to other methods, which are optimized to
find native structures.

Arne Elofsson's group: http://www.sbc.su.se/~bjorn/ProQ/
29
Structure Validation

ProCheck
http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
WhatIf server
http://swift.cmbi.kun.nl/WIWWWI/

30
Homology Modelling Servers

Eva-CM performs continuous and automated analysis
of comparative protein structure modeling servers.
A current list of the best-performing servers can
be found at
http://cubic.bioc.columbia.edu/eva/doc/intro_cm.html

31
The Hardest Target in CASP6

Only 8% sequence identity between target and template.

Dunbrack, Wang & Jin (2004) CASP6 Fold
Recognition Assessment
32
Summary

Successful homology modelling depends on the
following:
Template quality
Alignment (add biological information)
Modelling program/procedure (use more than one)
Always validate your final model!
