Title: Homology Modeling: Principles, tools and techniques
1Homology Modeling: Principles, tools and techniques
- Supa Hannongbua
- Department of Chemistry,
- Faculty of Science,
- Kasetsart University, Bangkok 10900, THAILAND
- fscisph_at_ku.ac.th, http://kuchem.sci.ku.ac.th
2Introduction
- Development of molecular biology: rapid identification, isolation and sequencing of genes.
- Problem: obtaining the 3D structure of a protein is a time-consuming task.
- An alternative strategy in structural biology is to develop models of a protein when the constraints from X-ray diffraction or NMR are not yet available.
- Homology modeling is a method that can be applied to generate reasonable models of protein structure.
3What is Homology Modeling?
- Predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES)
- If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed.
- In general, at least 30% sequence identity is required for generating useful models.
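As a rough illustration of the identity threshold, the sketch below computes percent identity from an already aligned TARGET/TEMPLATE pair. The function name, the toy sequences and the choice of denominator (columns where neither sequence has a gap) are assumptions made for this example; other conventions for the denominator exist and give different numbers.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Toy alignment (hypothetical sequences, not from the slides)
target = "MKT-AYIAKQR"
template = "MKSLAYLA-QR"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity")  # 7 identical out of 9 aligned columns
```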
4Structural Prediction by Homology Modeling
- (Flowchart of the modeling pipeline. Box labels, in slide order: Structural Databases; SeqFold, Profiles-3D, PSI-BLAST, BLAST, FASTA; Reference Proteins; Cα Matrix Matching; Conserved Regions; Protein Sequence; Sequence Alignment; Coordinate Assignment; Predicted Conserved Regions; Loop Searching/generation; MODELER; Initial Model; Structure Analysis; Sidechain Rotamers and/or MM/MD; WHAT IF, PROCHECK, PROSAII, ...; Refined Model)
5How good can homology modeling be?
- Sequence identity 60-100%: comparable to medium-resolution NMR; substrate specificity
- Sequence identity 30-60%: molecular replacement in crystallography; supports site-directed mutagenesis through visualization
- Sequence identity <30%: serious errors
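The identity ranges on this slide can be written as a simple lookup. This is only a sketch: the function name is hypothetical, and assigning exactly 60% to the upper range is an assumption, since the slide lists 60 in both ranges.

```python
def expected_model_quality(identity_percent: float) -> str:
    """Map percent sequence identity to the rough expectation given on the slide."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be between 0 and 100")
    if identity_percent >= 60.0:  # boundary choice is an assumption
        return "comparable to medium-resolution NMR"
    if identity_percent >= 30.0:
        return "usable for molecular replacement and mutagenesis planning"
    return "serious errors likely"


for value in (85.0, 45.0, 22.0):
    print(value, "->", expected_model_quality(value))
```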
6Significance of Protein Structure
- What does a structure offer in the way of biological knowledge?
- Location of mutants and conserved residues
- Ligand and functional sites
- Clefts/Cavities
- Evolutionary Relationships
- Mechanisms
7The importance of the sequence alignment
- The quality of the sequence alignment is of crucial importance
- Misplaced gaps, representing insertions or deletions, will cause residues to be misplaced in space
- Careful inspection and adjustment of the automatic alignment may improve the quality of the model.
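To see why a misplaced gap moves residues in space, the sketch below derives the target-to-template residue mapping from an alignment and compares two gap placements. The sequences and the function name are made up for illustration; each target residue inherits the position of the template residue it is paired with, so every changed pairing is a residue built in a different place.

```python
def residue_mapping(aligned_target: str, aligned_template: str) -> dict:
    """Map 1-based target residue numbers to 1-based template residue numbers.

    A target residue aligned to a template gap maps to None.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    target_pos = 0
    template_pos = 0
    for a, b in zip(aligned_target, aligned_template):
        if a != "-":
            target_pos += 1
        if b != "-":
            template_pos += 1
        if a != "-":
            mapping[target_pos] = template_pos if b != "-" else None
    return mapping


template = "ACDEFGHIK"                           # hypothetical template sequence
good = residue_mapping("ACD-FGHIK", template)    # gap opposite template residue 4
shifted = residue_mapping("A-CDFGHIK", template) # same gap moved two columns left

# Target residues whose template partner changes when the gap is moved
moved = [i for i in good if good[i] != shifted[i]]
print(moved)
```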
8Programs for Model Protein Construction
- MODELLER 4.0
- guitar.rockefeller.edu/modeller/modeller.html
- SWISS-MODEL Server
- www.expasy.ch/swissmod/SWISS-MODEL.html
- SCWRL (SideChain placement With Rotamer Library)
- www.fccc.edu/research/labs/dunbrack/scwrl/
9Protein Structural Databases
- Templates can be found using the TARGET sequence as a query for searching using FASTA or BLAST
- PDB (http://www.rcsb.org/pdb)
- MODELLER (http://guitar.rockefeller.edu/modeller/modeller.html)
- ModBase (http://pipe.rockefeller.edu/modbase/general-info.html)
- 3DCrunch (http://www.expasy.ch/swissmod/SM_3DCrunch.html)
10Gaining confidence in template searching
- Once a suitable template is found, it is a good idea to do a literature search (PubMed) on the relevant fold to determine what biological role(s) it plays.
- Does this match the biological/biochemical function that you expect?
11Other factors to consider in selecting templates
- Template environment
- pH
- Ligands present?
- Resolution of the templates
- Family of proteins
- Phylogenetic tree construction can help find the subfamily closest to the target sequence
- Multiple templates?
12Target-Template Alignment
- No current comparative modeling method can recover from an incorrect alignment
- Use multiple sequence alignments as an initial guide.
- Consider slightly different alternative alignments in areas of uncertainty and build multiple models
- Sequence-structure alignment programs
- Try to put gaps in variable regions/loops
- Note: the sequence from a database and the sequence from the actual PDB entry are not always identical
13Differences in multiple sequence alignments
- Inserting gaps at the ends of a helix versus in the middle
- When gaps are placed at the ends of helices, all models from these alignments had an RMSD versus the actual structure of 1.3-1.8 Å.
- In another helical region, placing the gaps in the middle results in an RMSD of 2.0 Å, versus less than 1.0 Å for the correct alignment.
14Differences in multiple sequence alignments
- Inserting gaps into the middle of helices and misaligning
- For residues 75-95, this caused an RMSD between the model and the actual structure of 5-7 Å, versus less than 1.0 Å for the correct alignment.
- For residues 95-115, which include a random section, the RMSD is 5.0 Å, versus 1.5 Å for the correct alignment.
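The RMSD values quoted on these two slides compare equivalent atoms of the model and the actual structure. A minimal sketch of that calculation follows; it assumes the two coordinate sets are already superimposed and listed in the same atom order (no fitting is done here), and the coordinates are invented for the example.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between two equally ordered coordinate lists.

    Assumes the structures are already superimposed; no fitting is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy C-alpha coordinates in Å: the model is displaced by 1 Å along z
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
actual = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(model, actual))  # 1.0
```

Restricting both lists to a residue range (for example 75-95) gives a per-region RMSD of the kind reported above.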
15Target-Multiple Template Alignment
- Alignment is prepared by superimposing all template structures
- Add the target sequence to this alignment
- Compare with the multiple sequence alignment and adjust
16Adjusting the alignment
- Use tools such as Genedoc (www.psc.edu/biomed/genedoc) to view secondary structure along the alignment, and use this information as a criterion for adjustments
- Avoid gaps in secondary structure elements
- Use MEME to find a relatively large number of well-conserved regions
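The rule about gaps in secondary structure elements can be checked mechanically once a per-column secondary structure string is available. The sketch below is a hypothetical helper, not a Genedoc feature: it assumes a string with H for helix, E for strand and C for coil, one character per alignment column, and it flags only gaps lying strictly inside an element, so gaps at the ends of a helix are tolerated.

```python
def gaps_inside_elements(aligned_seq: str, ss_by_column: str) -> list:
    """Return 0-based alignment columns where a gap falls inside a helix or strand.

    ss_by_column uses 'H' (helix), 'E' (strand) and any other letter for coil.
    A gap counts as 'inside' when both neighbouring columns share its element.
    """
    if len(aligned_seq) != len(ss_by_column):
        raise ValueError("sequence and secondary structure must have equal length")
    flagged = []
    last = len(aligned_seq) - 1
    for i, (res, ss) in enumerate(zip(aligned_seq, ss_by_column)):
        if res != "-" or ss not in "HE":
            continue
        if 0 < i < last and ss_by_column[i - 1] == ss and ss_by_column[i + 1] == ss:
            flagged.append(i)
    return flagged


# Toy example (invented sequence and secondary structure)
ss = "CCHHHHHHCCEEEECC"
target = "MK-LAE-LGSKV-VTG"
print(gaps_inside_elements(target, ss))  # gap at column 2 sits at a helix end
```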
17Secondary Structure Prediction
- The PredictProtein server
- http://www.embl-heidelberg.de/predictprotein/
- Adding secondary structure prediction algorithms can help make decisions on whether helices should be shortened/extended in areas of poor sequence identity.
- PHD program; its output can be read by Genedoc.
18Constructing Multi-domain protein models
- Building a multi-domain protein using templates corresponding to the individual domains
- proteinA aaaaaaaaaaaaa---------------
- proteinB -------------bbbbbbbbbbbbbbb
- Target   aaaaaaaaaaaaabbbbbbbbbbbbbbb
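The schematic alignment for the multi-domain case can be generated by padding each template with gaps over the region covered by the other domain. This is a sketch of the layout only: the function name is hypothetical, and in practice the target sequence differs from the template sequences and each domain is aligned properly first.

```python
def multidomain_alignment(domain_a: str, domain_b: str) -> dict:
    """Build three equal-length rows: template A, template B and the target.

    Each template is gap-padded over the region covered by the other domain.
    """
    return {
        "proteinA": domain_a + "-" * len(domain_b),
        "proteinB": "-" * len(domain_a) + domain_b,
        "Target": domain_a + domain_b,
    }


rows = multidomain_alignment("a" * 13, "b" * 15)
for name, row in rows.items():
    print(f"{name:<9}{row}")
```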
19Multiple model approach
- Reminder: consider the effects of different substitution matrices, different gap penalties, and different algorithms (Vogt et al., J. Mol. Biol., 1995, 249:816-831).
- Construct multiple models
- Use structural analysis programs to determine the best model
Jaroszewski, Pawlowski and Godzik, J. Molecular Modeling, 1998, 4:294-309; Venclovas, Ginalski and Fidelis, PROTEINS, 1999, 3:73-80 (Suppl)
20Model Building
- Rigid-Body Assembly
- Assembles a model from a small number of rigid bodies obtained from aligned protein structures
- Implemented in COMPOSER
- Segment Matching
- Satisfaction of Spatial Restraints
- MODELLER
- guitar.rockefeller.edu/modeller/modeller.html
21Initial model and procedures
- Calculate coordinates for atoms that have equivalent atoms in the templates as an average over all templates
- CHARMM internal coordinates are used for the remaining unknown coordinates
- Generate stereochemical and homology-derived restraints
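The first step, averaging over equivalent template atoms, might look like the sketch below. The data layout (one dictionary per template, keyed by a hypothetical (target residue number, atom name) pair) is an assumption for the example, as is the premise that the templates are already superimposed. Atoms with no equivalent in any template are simply absent from the result and would be built from internal coordinates as described above.

```python
def average_coordinates(template_coords: list) -> dict:
    """Average the positions of equivalent atoms over all templates that have them.

    template_coords: one dict per template mapping an atom key to (x, y, z).
    """
    sums = {}
    counts = {}
    for coords in template_coords:
        for atom, (x, y, z) in coords.items():
            sx, sy, sz = sums.get(atom, (0.0, 0.0, 0.0))
            sums[atom] = (sx + x, sy + y, sz + z)
            counts[atom] = counts.get(atom, 0) + 1
    return {atom: tuple(v / counts[atom] for v in total) for atom, total in sums.items()}


# Toy data: residue 1 CA is present in both templates, residue 2 CA in one only
template_1 = {(1, "CA"): (0.0, 0.0, 0.0), (2, "CA"): (3.8, 0.0, 0.0)}
template_2 = {(1, "CA"): (0.2, 0.0, 0.0)}
initial = average_coordinates([template_1, template_2])
print(initial)
```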
22Spatial restraints ?
- Minimizes the objective function, F, with respect to the Cartesian coordinates of the protein atoms
- $F(R) = \sum_i c_i(f_i, p_i)$
- R are the Cartesian coordinates of the atoms
- c_i is a restraint term dependent on f_i and p_i
- f_i is a geometric feature of the molecule, such as a distance, angle or dihedral value
- p_i are parameters that help describe the restraint
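The structure of this objective function, a sum of restraint terms each evaluated on a geometric feature with its own parameters, can be sketched as follows. The harmonic form of the restraint and all numbers are assumptions chosen for illustration; they are not taken from MODELLER, and the sketch only evaluates F for a fixed set of coordinates rather than minimizing it.

```python
import math


def harmonic(value: float, params: tuple) -> float:
    """Illustrative restraint term c(f, p): harmonic penalty with p = (mean, sigma)."""
    mean, sigma = params
    return 0.5 * ((value - mean) / sigma) ** 2


def objective(coords, restraints) -> float:
    """F(R) = sum_i c_i(f_i, p_i), evaluated at coordinates R.

    Each restraint is a tuple (c, f, p): a restraint function, a feature
    function of the coordinates, and the parameters of that restraint.
    """
    return sum(c(f(coords), p) for c, f, p in restraints)


# Toy coordinates in Å and two distance restraints (all values hypothetical)
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 3.0, 0.0)]
restraints = [
    (harmonic, lambda r: math.dist(r[0], r[1]), (3.8, 0.1)),
    (harmonic, lambda r: math.dist(r[0], r[2]), (5.0, 0.2)),
]
print(objective(coords, restraints))  # about 2.0; only the first restraint is violated
```

Angle and dihedral restraints fit the same pattern by supplying a different feature function f.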
23Homology and Stereochemical Restraints
- Initial model is an average over all templates
- Stereochemical: bond, angle, dihedral, improper
- Homology: mainchain and sidechain dihedrals
- mainchain CA-CA distances
- sidechain-mainchain distances
- sidechain-sidechain distances
- Non-bonded pairs (on the fly)
- user-defined
24Sidechain Conformation
- Protein sidechains play a key role in molecular recognition and in the packing of the hydrophobic cores of globular proteins
- Protein sidechain conformations tend to exist in a limited number of canonical shapes, usually called rotamers
- Rotamer libraries can be constructed where only 3-50 conformations are taken into account for each sidechain
25Coupling between mainchain and sidechain
- Mainchain shifts (0.2-0.5 Å) cause increased sidechain coordinate errors (0.1-0.8 Å), torsional errors of 10-30° and exaggerated strain energy for overpacked mutants compared with the correct mutant backbones.
- Lee, C. Folding and Design, 1995, 1:1-12
26Sidechains on surface of protein
- Exposed sidechains on the surface can be highly flexible, without a single dominant conformation
- If these solvent-exposed sidechains do not form binding interactions with other molecules and are not involved in, say, a catalytic reaction, then their accuracy may not be crucial; also look at the B-factors
- Can refine the sidechains with molecular mechanics minimization
- Sampling?
- Scoring?
27Clustering the ensemble
- Cluster analysis, based on overall fold, followed by selection of the structure closest to the centroid of the largest cluster, is likely to identify a structure more representative of the ensemble than the commonly used minimized average structure
NMRCLUST (http://neon.chem.le.ac.uk/nmrclust/protocol.html)
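The idea of picking a representative from the largest cluster can be sketched with a pairwise RMSD matrix. This is not the NMRCLUST algorithm: as a simplified stand-in it uses single-linkage grouping at a fixed cutoff and takes the member with the smallest summed distance to the rest of its cluster as the structure closest to the centroid. The cutoff and the matrix values are invented.

```python
def representative_structure(dist, cutoff: float):
    """Return (members of the largest cluster, index of its most central member).

    dist: symmetric matrix of pairwise RMSD values (list of lists).
    Structures are linked when their RMSD is at or below the cutoff;
    clusters are the connected groups of linked structures.
    """
    unvisited = set(range(len(dist)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = [seed]
        stack = [seed]
        while stack:
            i = stack.pop()
            neighbours = [j for j in unvisited if dist[i][j] <= cutoff]
            for j in neighbours:
                unvisited.remove(j)
                cluster.append(j)
                stack.append(j)
        clusters.append(sorted(cluster))
    largest = max(clusters, key=len)
    central = min(largest, key=lambda i: sum(dist[i][j] for j in largest))
    return largest, central


# Toy pairwise RMSD matrix (Å) for five models
rmsd_matrix = [
    [0.0, 0.5, 0.6, 3.0, 3.2],
    [0.5, 0.0, 0.4, 3.1, 3.0],
    [0.6, 0.4, 0.0, 2.9, 3.3],
    [3.0, 3.1, 2.9, 0.0, 0.7],
    [3.2, 3.0, 3.3, 0.7, 0.0],
]
members, central = representative_structure(rmsd_matrix, cutoff=1.0)
print(members, central)
```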
28Errors in Homology Modeling
- a) Side chain packing
- b) Distortions and shifts
- c) No template
29Errors in Homology Modeling
- d) Misalignments
- e) Incorrect template
- Marti-Renom et al., Ann. Rev. Biophys. Biomol. Struct., 2000, 29:291-325.
30Detection of Errors
- The first check should be a stereochemical check on the modeled structure (PROCHECK, WHATCHECK, DISTAN), which will show deviations from normal bond lengths, dihedrals, etc.
- Visualization: follow the backbone trace and then move out to the Cα-Cβ orientation.
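As a very small example of a geometric sanity check, the sketch below flags consecutive Cα-Cα distances that deviate from about 3.8 Å, the value expected for residues joined by a trans peptide bond. It is far less thorough than PROCHECK or WHATCHECK and is not how those programs work internally; the tolerance and the coordinates are assumptions for illustration.

```python
import math


def flag_ca_ca_outliers(ca_coords, expected: float = 3.8, tolerance: float = 0.2):
    """Return (i, i+1, distance) for consecutive C-alpha pairs outside the tolerance.

    expected: typical C-alpha to C-alpha distance for a trans peptide bond (Å).
    tolerance: allowed deviation in Å (illustrative value).
    """
    flagged = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - expected) > tolerance:
            flagged.append((i, i + 1, round(d, 2)))
    return flagged


# Toy backbone trace: the last pair is stretched to 5.0 Å
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (12.6, 0.0, 0.0)]
flagged = flag_ca_ca_outliers(trace)
print(flagged)
```

A flagged pair may indicate a chain break, a cis peptide or a modeling error, so it is a prompt for visual inspection rather than a verdict.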
31PROCHECK
http://www.biochem.ucl.ac.uk/roman/procheck/procheck.html
32Dihydrofolate Reductase (DHFR) multiple sequence
alignment
33Dihydrofolate Reductase (DHFR) alignment
34(No Transcript)