Title: Comparative Protein Modeling
Jason Wiscarson (jwiscarson_at_gmail.com), Lloyd
Spaine (llspaine_at_gmail.com)
- Sequence Alignment and Modeling System with Hidden Markov Models (SAM)-T02 provides sequence alignment from the target sequence to all templates in the following steps
  - Find sequences similar to the target sequence.
  - Predict the secondary structure.
  - Find probable templates for threading.
  - Align the target with the templates.
  - Construct a fragment library for the target.
  - Build a 3-D model of the target.
- Threading relates the target to different proteins that have similar structures
  - Creates pseudo-protein models based on solved proteins.
  - Calculates an energy value for each pseudo-protein model.
  - Ranks the alignments based on that energy value.
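The threading steps above (build pseudo-protein models, score them, rank them) can be sketched as follows. This is a toy illustration only: the contact potential, the hydrophobic residue set, and the names `pseudo_energy` and `rank_templates` are assumptions made for this example and are not taken from any threading program.

```python
# Toy threading sketch: thread one target sequence onto several template
# contact maps and rank the templates by a made-up pseudo-energy.
HYDROPHOBIC = set("AVLIMFWC")  # assumed residue classification

def pseudo_energy(sequence, contacts):
    """Sum a toy contact potential over the template contacts (i, j)."""
    energy = 0.0
    for i, j in contacts:
        if i >= len(sequence) or j >= len(sequence):
            continue  # contact falls outside the threaded sequence
        a_hydrophobic = sequence[i] in HYDROPHOBIC
        b_hydrophobic = sequence[j] in HYDROPHOBIC
        if a_hydrophobic and b_hydrophobic:
            energy -= 1.0  # favorable hydrophobic contact
        elif a_hydrophobic != b_hydrophobic:
            energy += 0.5  # hydrophobic residue next to a polar one
    return energy

def rank_templates(sequence, templates):
    """Return (energy, template name) pairs, lowest energy first."""
    scored = [(pseudo_energy(sequence, contacts), name)
              for name, contacts in templates.items()]
    return sorted(scored)

if __name__ == "__main__":
    templates = {"A": [(0, 2), (2, 4)], "B": [(0, 1), (1, 2)]}
    for energy, name in rank_templates("LKVAL", templates):
        print(name, energy)
```

Real threading potentials are derived from statistics over solved structures; only the score-then-rank control flow is meant to carry over.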
Selecting Templates and Improving Alignments
Introduction
Protein Model Refinement
Comparative modeling, or homology modeling, is a computational tool used to predict the three-dimensional structure of proteins whose structures are unknown. If the target sequence shares sequence similarity with proteins of known 3-D structure, those proteins may serve as templates to predict the unknown structure. The term homology refers to an evolutionary relationship between two or more proteins that share a common ancestor in an evolutionary tree, regardless of their sequence similarity. Proteins from similar families often have similar functions, yet there are many instances in which proteins have similar structures but different functions. A careful process for constructing 3-D models of proteins, shown in Figure 1, is therefore essential.
- Side-Chains with Rotamer Library (SCWRL) determines the most likely side-chain conformations by
  - Reading the initial structure and determining possible low-energy side-chain conformations (rotamers).
  - Defining disulfide bridges and performing dead-end elimination to discard rotamers that cannot be part of the lowest-energy solution.
  - Constructing a residue graph, determining the rotamer clusters, and outputting the final structure.
- Molecular Mechanics (MM) is a method that removes repulsive contacts between side chains by allowing the side chains to relax to low-energy rotamers.
- Molecular Dynamics (MD) simulation involves
  - Warm-up, equilibration, and cool-down.
  - Sampling the trajectory during a production run and analyzing the results.
- Molecular Dynamics with Simulated Annealing (MD-SA) is an optimization method that heats a system, samples many energy states, and then slowly cools the system so that low-energy structures are found.
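The heat, sample, and cool cycle of simulated annealing can be illustrated with a minimal sketch on a one-dimensional energy surface. The energy function, the cooling schedule, and all parameter values are assumptions chosen for illustration; a real MD-SA run moves atoms under a molecular force field rather than a single coordinate.

```python
import math
import random

def energy(x):
    """Toy 1-D energy surface with several local minima (hypothetical)."""
    return 0.1 * x * x + math.sin(3.0 * x)

def simulated_annealing(x0, t_start=5.0, t_end=0.01, cooling=0.99,
                        step=0.5, seed=0):
    """Return the lowest-energy (x, energy) seen while cooling."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        x_new = x + rng.uniform(-step, step)  # random trial move
        e_new = energy(x_new)
        # Metropolis criterion: always accept downhill moves, accept
        # uphill moves with a probability that shrinks as t drops.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # slow geometric cooling
    return best_x, best_e

if __name__ == "__main__":
    print(simulated_annealing(4.0))
```

At high temperature the walker crosses barriers between minima; as the temperature falls it settles into one of the low-energy wells.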
The first step is to improve the alignment and select the template. Here the sequence of interest (target) is aligned with other sequences and structures (templates). Afterwards, the best templates are chosen based on evolutionary distance as determined by a phylogenetic tree.
Selecting a template structure for a protein model is done by considering the R-factor (residual index), a value that indicates how well the structural model matches the experimental electron density maps.
Improving Sequence Alignment With Primary and Secondary Structure Analysis is used to reveal regions rich in proline, glutamic acid, serine, and threonine (PEST regions), locate sequence repeats, predict the percentage of buried versus accessible residues, and provide information about a protein's isoelectric point.
Pattern and Motif-Based Secondary Structure Prediction: AA sequence → 3D structure. Well-known pattern and motif-based secondary structure prediction methods include PSIPRED, GenTHREADER, PREDATOR, PROF, MEMSAT, and PHD.
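As a rough illustration of the primary structure analysis mentioned above, the sketch below scans a sequence for windows rich in P, E, S, and T. The window length, the threshold, and the function name `pest_rich_windows` are assumptions for this example; published PEST-finding methods use more elaborate scoring.

```python
PEST = set("PEST")  # proline, glutamic acid, serine, threonine

def pest_rich_windows(sequence, window=10, threshold=0.6):
    """Return (start index, PEST fraction) for each window at or above threshold."""
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        fraction = sum(residue in PEST for residue in segment) / window
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits

if __name__ == "__main__":
    print(pest_rich_windows("AAAAAPESTPESTPEAAAAA", threshold=0.8))
```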
Sequence Alignment
Find known sequences and 3-D structures related
to the target protein
- Alignment based on evolutionary history is applied to the amino acid residues of the target protein. The types of alignment are
  - Global alignment, which aligns the sequences over their entire length, including regions that lack similarity.
  - Local alignment, which finds regions with significant similarity first and then aligns the regions around those optimally aligned residues.
- To prepare sequences, the Sequence to Coordinates (S2C) database is used to examine differences between a protein's sequence and its deposited coordinates, including those that originate from mutagenesis studies.
- Alignment programs differ in the methods used, but they score or evaluate the final alignment using gap penalties, similarity matrices, and alignment scores.
- Similarity matrices describe the probability of a specific amino acid residue mutating to a different residue type. Common similarity matrices include
  - Point Accepted Mutation (PAM) matrices, expressed per 100 amino acid residues, which are based on the probability of one amino acid residue mutating to another.
  - BLOcks SUbstitution Matrix (BLOSUM) matrices, which are similar to PAM but are derived from a more diverse set of sequences.
  - Gonnet similarity matrices, which index and reorganize amino acid sequences using a tree, computed on a small cluster of computers.
- Clustal is an alignment program that quickly aligns large sets of sequences of varying similarity. Sequences are progressively aligned based on the branching order in the phylogenetic tree.
- Tree-Based Consistency Objective Function for Alignment Evaluation (T-Coffee) is a method designed to remedy a weakness of progressive-alignment (heuristic) methods, in which errors in the first alignments cannot be corrected as other sequences are added. Progressive alignment suffers from this greediness: an error such as the addition or extension of a gap is never revisited.
- The Divide-and-Conquer Alignment (DCA) method aligns sequences simultaneously. It uses the multiple sequence alignment (MSA) methodology, in which all sequences are aligned at once.
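The difference between global and local alignment, and the role of gap penalties, can be shown with a minimal dynamic-programming sketch. It uses a simple match/mismatch score in place of a PAM or BLOSUM matrix and a linear gap penalty; those choices, the parameter values, and the name `align_score` are assumptions for illustration. The global branch follows the Needleman-Wunsch recurrence and the local branch follows Smith-Waterman.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal global (default) or local alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,  # align two residues
                       score[i - 1][j] + gap,       # gap in b
                       score[i][j - 1] + gap)       # gap in a
            if local:
                cell = max(cell, 0)  # local alignments may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]

if __name__ == "__main__":
    print(align_score("AAAGATTACA", "GATTACA"))              # global
    print(align_score("AAAGATTACA", "GATTACA", local=True))  # local
```

The global score is pulled down by the three unmatched leading residues, while the local score counts only the shared GATTACA region.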
[Figure 1 flow chart boxes: Align the target and template amino acid residues; Select templates and adjust/improve the alignments; Construct Model; Refine Model; Evaluate Model; Final Model]
Evaluating Protein Models
Figure 1 Flow chart showing the construction of comparative protein models. The solid lines represent comparative modeling steps, and the dotted lines represent parameters (template, alignment, construction environment, or refinement method) that can improve the quality of the protein model.
Several methods exist to check for imperfections in the models:
- PROCHECK performs statistical checks and indicates regions of a protein structure that might require modification because of nonoptimal stereochemistry.
- Verify3D scores 3-D models against a probability table and assesses the probability that each amino acid residue would occupy its specific position in the 3-D structure.
- ERRAT examines nonbonded distances between C-C, C-N, C-O, N-N, N-O, and O-O atom pairs.
- Protein Structure Analysis (ProSa) uses a potential of mean force, which is the change in potential energy of a system caused by the variation of a specific coordinate, to locate regions of the protein structure that may contain improper or unsuitable geometries.
- Protein Volume Evaluation (PROVE) uses the computed volume of individual atoms as a means of evaluating the viability of a protein model.
- Model Clustering Analysis uses NMRCLUST, NMRCORE, and OLDERADO, programs that aid in the superposition and clustering of protein structures.
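The kind of nonbonded-distance bookkeeping described for ERRAT can be sketched as a count of close atom pairs grouped by element type. This is only an illustration of the idea: the 3.5 Å cutoff and the name `nonbonded_pair_counts` are assumptions, and the sketch does not exclude covalently bonded pairs or apply ERRAT's actual statistics.

```python
import math
from collections import Counter
from itertools import combinations

def nonbonded_pair_counts(atoms, cutoff=3.5):
    """Count atom pairs within `cutoff`, keyed by element pair such as 'C-N'.

    `atoms` is a list of (element, (x, y, z)) tuples. The caller is assumed
    to have removed covalently bonded pairs already.
    """
    counts = Counter()
    for (el1, p1), (el2, p2) in combinations(atoms, 2):
        if math.dist(p1, p2) <= cutoff:
            counts["-".join(sorted((el1, el2)))] += 1
    return counts

if __name__ == "__main__":
    atoms = [("C", (0, 0, 0)), ("N", (3, 0, 0)),
             ("O", (0, 3, 0)), ("C", (10, 10, 10))]
    print(dict(nonbonded_pair_counts(atoms)))
```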
Constructing Protein Models
Finding related sequences and structures
- Satisfaction of Spatial Restraints (SSR) constructs a 3-D protein model using spatial restraints based on distances, bond angles, dihedral angles, dihedral pairs, etc.
- Segment Match Modeling (SMM) constructs a protein model by
  - Choosing a protein template.
  - Building a list of possible template matches.
  - Sorting templates by best fit to the target's structure.
  - Using probabilities to select the best segment from a low pseudo-energy subset group.
  - Transferring coordinates from the best segment's template protein to the model.
- Multiple Template Method (MTM) uses solved X-ray structures to build the target sequence's protein model.
- 3D-JIGSAW creates a homology model by
  - Selecting and aligning templates, based on sequence.
  - Selecting template segments.
  - Creating the backbone (framework, scaffold).
  - Adding side chains, then refining and evaluating the target protein model.
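The idea behind satisfying spatial restraints can be sketched with distance restraints only: start from arbitrary coordinates and move the points downhill on the sum of squared restraint violations. The step size, iteration count, function names, and the use of plain gradient descent are assumptions for this illustration; production modeling programs combine many restraint types with more capable optimizers.

```python
import math

def restraint_violation(coords, restraints):
    """Sum of squared deviations from the target distances."""
    return sum((math.dist(coords[i], coords[j]) - target) ** 2
               for i, j, target in restraints)

def satisfy_restraints(coords, restraints, step=0.05, iterations=500):
    """Gradient descent on the violation; returns new coordinates."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            factor = 2.0 * (d - target) / d
            for k in range(3):
                delta = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += delta
                grads[j][k] -= delta
        for point, grad in zip(coords, grads):
            for k in range(3):
                point[k] -= step * grad[k]
    return coords

if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 6.0)]  # (i, j, distance)
    model = satisfy_restraints(start, restraints)
    print(restraint_violation(start, restraints),
          restraint_violation(model, restraints))
```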
In comparative protein modeling, several databases are used to find genomic, amino acid, and protein data.
- The Expert Protein Analysis System (ExPASy) is the starting point for searching for proteins and their related sequences.
- Swiss-Prot contains data that has been refined by removing unnecessary information, and TrEMBL receives and stores initial genomics data.
- PROSITE characterizes proteins by tertiary structure and key amino acid residues, based on biologically significant patterns.
- ENZYME retrieves an enzyme's recommended name, alternative names, catalytic activity, cofactors, human genetic diseases, and cross-references.
- SWISS-MODEL holds comparative models of proteins that do not have a known 3-D structure.
- Basic Local Alignment Search Tool (BLAST) uses a protein sequence to search for and analyze sequences of interest, locating similar protein sequences and producing sequence alignments.
- Protein Data Bank (PDB) is a repository for experimentally determined protein 3-D structures.
References
1 Esposito, E. X.; Tobi, D.; Madura, J. D. Comparative Protein Modeling. Reviews in Computational Chemistry, Volume 22, 2006, Wiley-VCH, John Wiley & Sons, Inc., to be published.
2 Ramachandran Plot and alanine structure. http://www.cgl.ucsf.edu/home/glasfeld/tutorial/AAA/AAA.html
Figure 2 Peptide bonds create rigid plates which
rotate about phi and psi.
Figure 3 A Ramachandran plot for the tripeptide
in Figure 2.
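The phi and psi angles plotted in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms (phi by C, N, CA, C and psi by N, CA, C, N). The sketch below computes a dihedral angle from four points with a standard vector formula; the function names and the example coordinates are made up for illustration.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the p1-p2 bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing this angle for each residue's phi and psi atoms gives the coordinates of one point on the plot.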