Title: Tertiary Structure Prediction Methods
1. Tertiary Structure Prediction Methods
Goal: predict the three-dimensional structure of any given protein sequence.
2. Why Homology Modelling?
- X-ray diffraction
  - Only a small number of proteins can be made to form crystals.
  - A crystal is not the protein's native environment.
  - Very time-consuming.
- NMR distance measurement
  - Not all proteins can be studied in solution.
  - The method generally looks at isolated proteins rather than protein complexes.
  - Very time-consuming.
3. Homology Modeling: Principles, Tools and Techniques
- Developments in molecular biology allow rapid identification, isolation and sequencing of genes.
- Problem: obtaining the 3D structure of a protein is a time-consuming task.
- An alternative strategy in structural biology is to build protein models when constraints from X-ray diffraction or NMR are not yet available.
- Homology modeling is a method that can generate reasonable models of protein structure.
4. Database Approach to Homology Modelling
As of June 2000, 12,500 protein structures had been deposited in the Protein Data Bank (PDB), while the SwissProt protein sequence database contained 86,500 sequence entries. This is a ratio of about 1:7, so relatively few structures are known. The number of sequences will grow much faster than the number of structures because of advances in sequencing.
5. Sequence Similarity Methods
These methods can be very accurate when sequence similarity is above 50%. They are rarely accurate when sequence similarity is below 30%. They use the same techniques as sequence alignment, such as the dynamic programming algorithm, hidden Markov models and clustering algorithms.
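The dynamic programming idea can be sketched with a minimal Needleman-Wunsch global alignment. The scores used here are arbitrary assumptions for illustration: match +1, mismatch -1 and a linear gap penalty of -2. Practical tools use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b by dynamic programming.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`. Placing the single gap opposite C is the only way to keep A, G and T matched.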
6. What is Homology Modeling?
- Predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES).
- If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed.
- In general, at least 30% sequence identity is required to generate useful models.
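Here is a minimal sketch of applying the 30% rule of thumb: percent identity computed from one target-template alignment. Counting identities only over gap-free columns is one of several common conventions (others divide by the length of the shorter sequence). Both that denominator and the `MIN_IDENTITY` constant are assumptions, and the function names are hypothetical.

```python
MIN_IDENTITY = 30.0  # percent; the rule-of-thumb threshold quoted above


def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [(t, s) for t, s in zip(aligned_target, aligned_template) if t != "-" and s != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for t, s in columns if t == s)
    return 100.0 * identical / len(columns)


def likely_useful_template(aligned_target: str, aligned_template: str) -> bool:
    """Apply the ~30% identity rule of thumb to one target-template alignment."""
    return percent_identity(aligned_target, aligned_template) >= MIN_IDENTITY
```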
7. Structural Prediction by Homology Modeling
[Figure: flowchart of structure prediction by homology modeling. Boxes: Protein Sequence; Structural Databases (SeqFold, Profiles-3D, PSI-BLAST, BLAST, FASTA, fold-recognition methods such as FUGUE); Reference Proteins; Cα Matrix Matching; Conserved Regions; Sequence Alignment; Coordinate Assignment; Predicted Conserved Regions; Loop Searching/Generation; MODELLER; Initial Model; Sidechain Rotamers and/or MM/MD; Structure Analysis (WHAT IF, PROCHECK, PROSA II, ...); Refined Model.]
8. How Good Can Homology Modeling Be?
- Sequence identity 60-100%: comparable to medium-resolution NMR; useful for studying substrate specificity.
- Sequence identity 30-60%: useful for molecular replacement in crystallography and for supporting site-directed mutagenesis through visualization.
- Sequence identity < 30%: serious errors.
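The tiers above can be written as a small lookup function. The function name is hypothetical. The slide gives only ranges, so assigning exactly 30% or 60% to the higher tier is an assumption.

```python
def expected_model_quality(identity: float) -> str:
    """Rough accuracy tier for a model built at the given sequence identity (percent)."""
    if not 0.0 <= identity <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity >= 60.0:
        return "comparable to medium-resolution NMR"
    if identity >= 30.0:
        return "good enough for molecular replacement and guiding mutagenesis"
    return "serious errors likely"
```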
9. Significance of Protein Structure
- What does a structure offer in the way of biological knowledge?
  - Location of mutants and conserved residues
  - Ligand and functional sites
  - Clefts/cavities
  - Evolutionary relationships
  - Mechanisms
10. The Importance of the Sequence Alignment
- The quality of the sequence alignment is of crucial importance.
- Misplaced gaps, representing insertions or deletions, cause residues to be misplaced in space.
- Careful inspection and adjustment of the automatic alignment may improve the quality of the model.
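To see why a misplaced gap misplaces residues in space, the sketch below records which template residue each target residue is aligned to. The helper `residue_mapping` is hypothetical. In a model, each target residue inherits the coordinates of the template residue it maps to.

```python
def residue_mapping(aligned_target: str, aligned_template: str) -> dict:
    """Map each target residue index (0-based) to the template residue index it is aligned to.

    Target residues aligned to a gap get no entry: there are no template
    coordinates to copy, so they must be built another way (e.g. loop modeling).
    """
    mapping = {}
    i = j = 0  # index of the next residue in the target / template
    for t, s in zip(aligned_target, aligned_template):
        if t != "-" and s != "-":
            mapping[i] = j
        if t != "-":
            i += 1
        if s != "-":
            j += 1
    return mapping
```

Take template `ACDEFG` and a target that lacks D. The alignment `AC-EFG` maps target E (index 2) to template E (index 3). The shifted alignment `ACE-FG` maps it to template D (index 2), so E would be built at D's position.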
11. Programs for Model Protein Construction
- MODELLER 4.0
  - guitar.rockefeller.edu/modeller/modeller.html
- SWISS-MODEL server
  - www.expasy.ch/swissmod/SWISS-MODEL.html
- SCWRL (SideChain placement With Rotamer Library)
  - www.fccc.edu/research/labs/dunbrack/scwrl/
12. Protein Structural Databases
- Templates can be found by using the TARGET sequence as a query in a FASTA or BLAST search.
- PDB (http://www.rcsb.org/pdb)
- MODELLER (http://guitar.rockefeller.edu/modeller/modeller.html)
- ModBase (http://pipe.rockefeller.edu/modbase/general-info.html)
- 3DCrunch (http://www.expasy.ch/swissmod/SM_3DCrunch.html)
13. Gaining Confidence in Template Searching
- Once a suitable template is found, it is a good idea to search the literature (PubMed) on the relevant fold to determine what biological role(s) it plays.
- Does this match the biological/biochemical function you expect?
14. Other Factors to Consider in Selecting Templates
- Template environment
  - pH
  - Ligands present?
- Resolution of the templates
- Family of proteins
  - Phylogenetic tree construction can help find the subfamily closest to the target sequence.
- Multiple templates?
15. Target-Template Alignment
- No current comparative modeling method can recover from an incorrect alignment.
- Use multiple sequence alignments as an initial guide.
- In areas of uncertainty, consider slightly different alternative alignments and build multiple models.
- Sequence-structure alignment programs
  - These try to put gaps in variable regions/loops.
- Note that the sequence in a database and the sequence in the actual PDB entry are not always identical.
16. Target-Multiple Template Alignment
- Prepare the alignment by superimposing all template structures.
- Add the target sequence to this alignment.
- Compare with a multiple sequence alignment and adjust.
17. Adjusting the Alignment
- Use tools such as JOY (www-cryst.bioc.cam.ac.uk/joy/) to view secondary structure along the alignment, and use this information as a criterion for adjustments.
- Avoid gaps in secondary structure elements.
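The rule "avoid gaps in secondary structure elements" can be checked automatically. This sketch assumes the template's secondary structure is a DSSP-style string with one character per template residue: `H` for helix, `E` for strand and anything else for coil. The function name is hypothetical. Judging an insertion by its two flanking template residues is an illustrative simplification.

```python
SSE_TYPES = ("H", "E")  # helix and strand codes in a DSSP-style string


def gaps_in_secondary_structure(aligned_target: str, aligned_template: str, template_ss: str) -> list:
    """Return alignment columns whose gap falls inside a template helix or strand.

    template_ss has one character per template residue (gaps excluded).
    """
    # Secondary structure of the template residue in each column (None where the template has a gap)
    col_ss = []
    k = 0
    for s in aligned_template:
        if s == "-":
            col_ss.append(None)
        else:
            col_ss.append(template_ss[k])
            k += 1

    flagged = []
    for col, (t, s) in enumerate(zip(aligned_target, aligned_template)):
        if t == "-" and col_ss[col] in SSE_TYPES:
            flagged.append(col)  # deletion: the target skips a helix/strand residue
        elif s == "-":
            # insertion: flag it if the nearest template residues on both sides share an SSE type
            left = next((col_ss[c] for c in range(col - 1, -1, -1) if col_ss[c] is not None), None)
            right = next((col_ss[c] for c in range(col + 1, len(col_ss)) if col_ss[c] is not None), None)
            if left in SSE_TYPES and left == right:
                flagged.append(col)
    return flagged
```

A flagged column is a candidate for shifting the gap a few positions into a neighbouring loop.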
18. Secondary Structure Prediction
- The PredictProtein server
  - http://www.embl-heidelberg.de/predictprotein/
- Adding secondary structure prediction can help decide whether helices should be shortened or extended in areas of poor sequence identity.
  - PHD program
19. Constructing Multi-domain Protein Models
- Build a multi-domain protein using templates corresponding to the individual domains:
    proteinA  aaaaaaaaaaaaa---------------
    proteinB  -------------bbbbbbbbbbbbbbb
    Target    aaaaaaaaaaaaabbbbbbbbbbbbbbb
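Reading a multi-domain alignment column by column shows which template supplies each stretch of the target. This sketch uses a hypothetical `assign_templates` helper. Where templates overlap, the first listed template wins; that tie-breaking rule is an assumption.

```python
def assign_templates(aligned_target: str, aligned_templates: dict) -> list:
    """For each target residue, return the name of the first template covering it, or None."""
    assignment = []
    for col, residue in enumerate(aligned_target):
        if residue == "-":
            continue  # this column holds no target residue
        covering = [name for name, row in aligned_templates.items() if row[col] != "-"]
        assignment.append(covering[0] if covering else None)
    return assignment
```

With the alignment above, the first 13 target residues come from proteinA and the remaining 15 from proteinB. A `None` entry would mark a target residue that no template covers.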
20. Multiple Model Approach
- Reminder: consider the effects of different substitution matrices, different gap penalties and different algorithms (Vogt et al., J. Mol. Biol., 1995, 249:816-831).
- Construct multiple models.
- Use structural analysis programs to determine the best model.
Jaroszewski, Pawlowski and Godzik, J. Molecular Modeling, 1998, 4:294-309; Venclovas, Ginalski and Fidelis, PROTEINS, 1999, Suppl 3:73-80.
21. Model Building
- Rigid-body assembly
  - Assembles a model from a small number of rigid bodies obtained from aligned protein structures
  - Implemented in COMPOSER
- Segment matching
- Satisfaction of spatial restraints
  - MODELLER
  - guitar.rockefeller.edu/modeller/modeller.html
22. Initial Model and Procedures
- Coordinates of atoms that have equivalent atoms in the templates are calculated as an average over all templates.
- CHARMM internal coordinates are used for the remaining unknown coordinates.
- Stereochemical and homology-derived restraints are generated.
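The averaging step for equivalent atoms can be sketched as below. The sketch assumes the templates have already been superimposed. It also assumes each atom is keyed by its position in the target, for example `(residue_index, atom_name)`, through the alignment. The function is a hypothetical illustration, not MODELLER's code.

```python
def average_coordinates(template_coords: list) -> dict:
    """Average equivalent-atom coordinates over the templates that contain each atom.

    template_coords: one dict per superimposed template, mapping an atom key
    such as (target_residue_index, atom_name) to an (x, y, z) tuple.
    Atoms absent from every template get no entry and must be built another way.
    """
    sums = {}
    counts = {}
    for coords in template_coords:
        for key, (x, y, z) in coords.items():
            sx, sy, sz = sums.get(key, (0.0, 0.0, 0.0))
            sums[key] = (sx + x, sy + y, sz + z)
            counts[key] = counts.get(key, 0) + 1
    return {key: tuple(c / counts[key] for c in total) for key, total in sums.items()}
```

An atom present in only one template simply keeps that template's position.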
23. MODELLER
- The main input is a set of restraints on the spatial structure of the amino acid sequence and ligands to be modeled.
- The output is a 3D structure that satisfies these restraints.
- Restraints are obtained automatically from related protein structures (homology modeling). They can also come from NMR structures, secondary structure packing and other experimental data.
24. Spatial Restraints?
- Minimizes the objective function, $F$, with respect to the Cartesian coordinates of the protein atoms:
  $F(\mathbf{R}) = \sum_i c_i(f_i, p_i)$
  - $\mathbf{R}$ are the Cartesian coordinates of the atoms.
  - $c_i$ is a restraint that depends on $f_i$ and $p_i$.
  - $f_i$ is a geometric feature of the molecule, such as a distance, angle or dihedral angle.
  - $p_i$ are parameters that help describe the restraint.
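Below is a toy version of the objective function $F(\mathbf{R}) = \sum_i c_i(f_i, p_i)$. Every $f_i$ is a distance between two atoms. Every $c_i$ is a harmonic penalty $k(f_i - d_0)^2$ with parameters $p_i = (d_0, k)$. MODELLER's restraints are derived from probability density functions and optimized with conjugate gradients and molecular dynamics. The harmonic form and plain steepest descent here are simplifications chosen for clarity.

```python
import math


def objective(coords, restraints):
    """F(R) = sum_i c_i(f_i, p_i) with harmonic distance restraints.

    restraints: list of (atom_a, atom_b, d0, k); f_i is the a-b distance and
    c_i = k * (f_i - d0)**2, so p_i = (d0, k).
    """
    return sum(k * (math.dist(coords[a], coords[b]) - d0) ** 2 for a, b, d0, k in restraints)


def gradient(coords, restraints):
    """Partial derivatives of F with respect to every Cartesian coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for a, b, d0, k in restraints:
        d = math.dist(coords[a], coords[b])
        if d == 0.0:
            continue  # direction undefined for coincident atoms
        scale = 2.0 * k * (d - d0) / d
        for x in range(3):
            g = scale * (coords[a][x] - coords[b][x])
            grad[a][x] += g
            grad[b][x] -= g
    return grad


def minimize(coords, restraints, step=0.01, n_steps=5000):
    """Plain steepest descent on the Cartesian coordinates R."""
    coords = [list(c) for c in coords]
    for _ in range(n_steps):
        grad = gradient(coords, restraints)
        for atom, g in zip(coords, grad):
            for x in range(3):
                atom[x] -= step * g[x]
    return coords


# Usage: three atoms restrained to 3.8, 3.8 and 6.0 units apart
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
restraints = [(0, 1, 3.8, 1.0), (1, 2, 3.8, 1.0), (0, 2, 6.0, 1.0)]
model = minimize(start, restraints)
```

Because the three target distances form a valid triangle, the minimized coordinates satisfy all restraints and drive $F$ close to zero.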
25. What Are the Restraints?
- Distances, angles, dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms or pseudo-atoms.
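The geometric features listed above can be computed from Cartesian coordinates as follows: distances, angles and dihedral angles. The dihedral uses the common `atan2` formulation and returns degrees in the range (-180, 180]. The helper names are illustrative.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def distance(p1, p2):
    """Distance between two atoms."""
    return math.dist(p1, p2)


def angle(p1, p2, p3):
    """Angle p1-p2-p3 at atom p2, in degrees."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cos = _dot(v1, v2) / (math.sqrt(_dot(v1, v1)) * math.sqrt(_dot(v2, v2)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding noise


def dihedral(p1, p2, p3, p4):
    """Dihedral angle p1-p2-p3-p4 in degrees."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Four atoms in one plane give 0 degrees when the outer atoms lie on the same side of the central bond (cis). They give 180 degrees when the outer atoms are on opposite sides (trans).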
26. Sidechain Conformation
- Protein side chains play a key role in molecular recognition and in packing the hydrophobic cores of globular proteins.
- Protein side-chain conformations tend to exist in a limited number of canonical shapes, usually called rotamers.
- Rotamer libraries can be constructed in which only 3-50 conformations are taken into account for each side chain.
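Choosing a side-chain conformation from a rotamer library comes down to sampling the library and scoring each candidate. The sketch below scores rotamers by counting steric clashes with surrounding atoms. Several details are assumptions for illustration: the 3.0 Å cut-off, the rotamer names and the clash-only score. Programs such as SCWRL use more elaborate energy functions.

```python
import math

CLASH_CUTOFF = 3.0  # Å; hypothetical threshold, not taken from any specific program


def clash_score(side_chain_atoms, environment_atoms, cutoff=CLASH_CUTOFF):
    """Count side-chain/environment atom pairs closer than the cutoff."""
    return sum(1 for a in side_chain_atoms for b in environment_atoms if math.dist(a, b) < cutoff)


def best_rotamer(rotamers, environment_atoms):
    """Return the name of the rotamer with the fewest clashes (ties: first listed)."""
    return min(rotamers, key=lambda name: clash_score(rotamers[name], environment_atoms))


# Usage with two hypothetical one-atom rotamers and one neighbouring atom
rotamers = {"g+": [(1.0, 0.0, 0.0)], "t": [(-3.0, 0.0, 0.0)]}
environment = [(1.5, 0.0, 0.0)]
choice = best_rotamer(rotamers, environment)
```

Here `choice` is `"t"`, because the `"g+"` atom sits only 0.5 Å from the neighbouring atom.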
27. Sidechains on the Protein Surface
- Exposed side chains on the surface can be highly flexible, without a single dominant conformation.
- So if these solvent-exposed side chains do not form binding interactions with other molecules or take part in, say, a catalytic reaction, their accuracy may not be crucial. Also look at the B-factors.
- Side chains can be refined with molecular mechanics minimization.
  - Sampling?
  - Scoring?
28. Errors in Homology Modeling
- (a) Side-chain packing; (b) distortions and shifts; (c) regions without a template
29. Errors in Homology Modeling
- (d) Misalignments; (e) incorrect template
- Marti-Renom et al., Annu. Rev. Biophys. Biomol. Struct., 2000, 29:291-325.
30. Detection of Errors
- The first check should be a stereochemical check of the modeled structure with PROCHECK, WHATCHECK or DISTAN. These show deviations from normal bond lengths, dihedral angles, etc.
- Visualization: follow the backbone trace first, then move out to the Cα-Cβ orientation.
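Here is one simple stereochemical check in the spirit of these tools. Consecutive Cα atoms joined by a trans peptide bond are about 3.8 Å apart, so large deviations point to chain breaks or distortions. The 0.3 Å tolerance and the function name are hypothetical choices. Cis peptides (about 2.9 Å) would need special handling, and PROCHECK and similar programs check far more, including bond lengths, angles and Ramachandran plots.

```python
import math

IDEAL_CA_CA = 3.8  # Å, typical for consecutive residues joined by a trans peptide bond
TOLERANCE = 0.3  # Å; hypothetical cut-off chosen for illustration


def flag_ca_ca_outliers(ca_coords, ideal=IDEAL_CA_CA, tol=TOLERANCE):
    """Return (i, distance) for consecutive Cα pairs i, i+1 whose spacing deviates from ideal."""
    outliers = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - ideal) > tol:
            outliers.append((i, round(d, 2)))
    return outliers


# Usage: the last pair of this hypothetical Cα trace is stretched to 5.0 Å
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (12.6, 0.0, 0.0)]
outliers = flag_ca_ca_outliers(trace)
```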
31. PROCHECK
http://www.biochem.ucl.ac.uk/roman/procheck/procheck.html