Title: Protein Structure Prediction
1Protein Structure Prediction
2What is PSP ?
3Primary sequence (1D)
Tertiary Structure (3D)
all bonds angles, dihedral angles and bond
lengths between each amino acid residue in
protein
4Solving PSP
5View PSP as a search
- Given any primary sequence of an unknown protein
(in the sense of its 3D structure) - Consider PSP as performing a search through the
configuration space of the given protein
The space of different configurations
6Steps in solving PSP
- Given primary sequence predict the final 3D
structure - (1D 3D).
- 2 Step process (1D 2D, 2D 3D)
- 1st find configuration for the secondary
structure (SS Prediction) - 2nd find configuration for the side-chains
(side-chain conformation)
7Required components in solving PSP
- All methods require the definition of a protein
model - A simplified protein structure model
- A potential energy function
8Simplified structure model of Protein
- By the above we mean that the protein in
question has simpler physical properties then an
actual protein - This is needed as trying to solve PSP is too
complex for real proteins - Good simplified models give a good approximation
for the actual shape of the protein - Determining a good model is a research area by
itself
9Example of Simplified Model of Protein
Simplified model of the protein backbone
Actual model of the protein backbone
3 dihedral angles
is the only dihedral angle
Bond angles
10Potential Energy Function
- How do we know when a predicted structure is the
native shape of the protein ?
In thermodynamics, A molecule is most
stable when its free energy is at a minimum
native shape is at a free energy minimum
- The potential energy function is a
simplification of actual forces acting on a real
protein molecule and its formulation is based on
the given simplified structural model
11Example of Potential Energy Function
Purpose Minimize
Hydrophobic Interaction
Van der Waals Interaction
Evdw Cv fvdw
Van der Waals Potential
rij distance between atom i and j Ri van der
waals radii of atom i
Summation over all atoms with rij lt 8A A
Angstrom 1 ten-billionth of meter
12Different approaches to PSP
- Ab Initio Methods
- Knowledge Based Methods
13Ab Initio Methods
14What is Ab Initio ?
- Ab Initio means from 1st principles
- Use thermodynamic laws to figure out the
configuration of the fold of the given protein
protein folding problem -
- Global/semi-global minimization of the function
- 1D 2D secondary structure problem
- 2D 3D side-chain conformation
15Some Ab Initio Methods
- Molecular Dynamic Simulation
- Using complex energy functions simulate folding
of the primary sequence until it reaches its
native state (1D-gt3D) - Genetic Algorithm
- Used in refining a given potential function so
that it can best predict the native state of a
protein - Simulated Annealing
- Branch and Bound Methods (usually used in
side-chain conformation) - Approximation algorithms
- Comparative/Homologue Modeling
- Threading
- Docking
16Knowledge Based Methods
17Knowledge Based Methods
- Using knowledge of currently known protein
folds, predict the shape of the target protein - Assumption is the native fold of the target
protein is similar to a currently known one i.e
in the same family - Unable to predict any novel folds, i.e new fold
family
18Some Knowledge-Based Methods
- Comparative/Homologue Modeling
- Threading
- Docking
19Methodological Framework for solving PSP
Primary Protein Sequence
Knowledge-base, e.g PDB
Ab Initio Methods
Homologue Modeling
Threading
Predicted 3D Structure of Protein
20Side-Chain Prediction
- Find a conformation of the all the side chains
along the given main chain of a protein - Usually done as the 2nd step in predicting the
3D structure of protein - Also useful in drug design, where drug
structures have to be designed to be easily
docked by enzymes for breaking down
21Side-Chain Prediction
- The main chain fold has been computed and given
as input - choose positions of all side chains so as to
minimize some potential energy function - Problem if solved Ab Initio is proven to be
NP-Complete (reduce Clique to it)
22Central Dogma
- The more tightly packed Side-chains are, the
more stable they will be. - Ponder Richards have shown that there are a
fixed set of rotations (rotamers) side-chains can
take. - Most methods now make use of this library of
rotamers (abt 67 different rotations) - Main concern is the search strategy to find the
best conformation
23Methods in Side-chain prediction
- Simulated Annealing
- A algorithm
- Monte Carlo Minimization
- Molecular Dynamics Simulation
- Dead End Elimination
- Genetic Algorithm
24Dead End Elimination
- Deterministic method to determine the global
minimum energy conformation (GMEC) of set of
side-chains. - Continuously eliminate rotamers from
consideration in the GMEC, until only 1 rotamer
is left in each side-chain position (thus giving
final conformation). - DEE can be viewed as a mathematical criteria
that a rotamer must fulfill in order not to be
eliminated
25Dead End Elimination
- Potential function is described in terms of
pair-wise interactions of all rotamers at all
positions. - Therefore energy function to minimized can be
formulated as
Sum of pairwise interaction energy
between rotamer r at position j and rotamer u at
positions j
energy of the given backbone fold
Sum of energy of rotamer r at side chain i
26Dead End Elimination
- Assuming p side-chains and n rotamers for each
side-chain - Time complexity of finding the configuration
that minimizes the energy function takes O(pnp) - Not feasible to use the original formulation
27Original DEE
- A rotamer ir can be eliminated from
consideration if there is an alternative rotamer
it at the same position that satisifies
Maximum pairwise interaction energy between it
and every other side-chain j
Energy resulting from using rotamer r at position
i
Minimum pairwise interaction energy between ir
and every other side-chain j
Energy resulting from using rotamer t at position
i
28Original DEE
- Given some relevant energy landscape, the
previous inequality in fact says the following
Rotamer r at side-chain i can only be eliminated
only if maximum energy conformation using rotamer
t at side-chain i is smaller than the minimum
energy conformation of using rotamer r
29Original DEE
- A simplistic implementation of Original DEE by
simply translating the inequality to code result
in a time complexity of O(n2p2) - There is however still a problem if the
following happens
30Simple Goldstein DEE
- A rotamer ir can be eliminated from
consideration if there is an alternative rotamer
it at the same position that satisifies
Energy resulting from using rotamer r at position
i
Energy resulting from using rotamer t at position
i
Difference in energy of conformation using ir and
conformation using it which are at the point of
closest contact
31Simple Goldstein DEE
- Given some relevant energy landscape, the
previous inequality in fact says the following
Rotamer r at side-chain i can only be eliminated
by both totamer t1 and rotamer t2 since the
difference is ve at the points of closest
contact. Meaning for any given conformation
using t1 or t2 will result in a smaller overall
energy than using r
32Simple Goldstein DEE
- A simplistic implementation of Simple Goldstein
DEE by simply translating the inequality to code
result in a time complexity of O(n3p2) - There is however still a problem if energy
profiles of rotamer r and every other rotamer
intersect. - More powerful criteria will have to be used
- General GoldStein DEE, Simple Split DEE and
general Split DEE - The more powerful the criterion the higher its
time complexity
33Conclusion
- Myriad of methods to attempt to solve the
protein prediction problem - Knowledge-based methods have gained a edge over
Ab initio methods - However not much improvement in the prediction
power of modern heuristics, since the 1st
experiment by Anfisen 3 decades ago - Either problem is too hard / More discovery
awaits the adventurous researcher