Title: An Efficient Protein-Protein Docking Algorithm: Physicochemical and Residue Conservation Approach
Yuhua Duan1, Boojala Reddy2 and Yiannis Kaznessis1,2
1Department of Chemical Engineering and Materials Science, 2Digital Technology Center, University of Minnesota
Results
Introduction
We have employed docking calculations and atomistic simulations to determine the structure and the binding affinity of protein-protein complexes. By exploring the interaction interface, we find that residue conservation information can improve the docking rank. Here we present our docking scheme and apply it to a benchmark of 59 complexes.
Free Energy and Renormalized Rank
- With some approximation, the free energy change can be divided into several terms: $\Delta G = \Delta G_{es} + \Delta G_{cav} + \Delta G_{bonding} = \Delta G_{coulomb} + \Delta G_{pol} + \sum_k \sigma_k A_k + \Delta G_{bonding}$
- The individual terms can be calculated separately.
- $\Delta G_{coulomb}$ and $\Delta G_{pol}$ are calculated by the Generalized Born model with the Debye-Hückel approximation.
- The desolvation term $\sum_k \sigma_k A_k$ is obtained by calculating the solvent-accessible surface area $A_k$ of each residue k and the optimized weight $\sigma_k$.
- The bonding term $\Delta G_{bonding}$ is expressed in a self-consistent Lennard-Jones form whose parameters ($\epsilon$, $\sigma$) are taken from the AMBER and CHARMM force fields.
- The normalized rank of each descriptor is obtained from the value span of that descriptor.
- The global rank is a weighted combination of the normalized ranks, where the weights were obtained from correlation-coefficient calculations.
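The free-energy sum and the span-based ranking described in the Free Energy and Renormalized Rank panel can be sketched as follows. This is only an illustration under stated assumptions: the function names, the example numbers, the weights, and the convention that a lower score is better are all hypothetical, and the poster's own ranking formula (its equation (12)) is not reproduced in this text.

```python
def free_energy(dg_coulomb, dg_pol, sigma, area, dg_bonding):
    """dG = dG_coulomb + dG_pol + sum_k sigma_k * A_k + dG_bonding."""
    desolvation = sum(s * a for s, a in zip(sigma, area))
    return dg_coulomb + dg_pol + desolvation + dg_bonding


def normalize_by_span(values):
    """Map each value to [0, 1] using the span (max - min) of the descriptor."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant descriptor carries no ranking information.
        return [0.0] * len(values)
    return [(v - lo) / span for v in values]


def global_scores(descriptors, weights):
    """Weighted combination of the normalized descriptors, one score per decoy."""
    normalized = {name: normalize_by_span(vals) for name, vals in descriptors.items()}
    n_decoys = len(next(iter(descriptors.values())))
    return [
        sum(weights[name] * normalized[name][i] for name in descriptors)
        for i in range(n_decoys)
    ]


# Hypothetical values for three decoys (not taken from the poster).
descriptors = {
    "free_energy": [-12.0, -8.0, -10.0],
    "shape_rank": [3.0, 1.0, 2.0],
}
# Hypothetical weights; the poster derives its weights from correlation coefficients.
weights = {"free_energy": 0.6, "shape_rank": 0.4}

scores = global_scores(descriptors, weights)
# Assumption: a lower combined score means a better-ranked decoy.
best = min(range(len(scores)), key=scores.__getitem__)
```

Dividing by the span puts descriptors with different units (energies, ranks) on a common 0 to 1 scale before they are weighted, which is the reason a normalization step precedes the global rank.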
Docking Procedure and Energy Minimization
- We studied the 59 benchmark complexes suggested by Chen et al. [1].
- For each protein complex, we ran docking calculations with the FTDock package [2,3] to generate 10,000 possible complexes, and obtained the shape-complementarity rank and the pair-potential rank.
- For each possible complex, we used CHARMM molecular mechanics simulations [4] to minimize the side-chain structure and obtained an estimate of the free energy of the generated complex.
- We applied the residue conservation filter to reduce the number of possible structures to a small number.
- We used a normalized ranking scheme over all descriptors to obtain a global rank for the remaining subset of complexes [5,6].
Figure 1. The improvement after filtering. The results are: (1) 48 out of 60 complexes have I_fact > 1; (2) 5 complexes (1AVW, 1BQL, 1EFU, 1FIN, 1GOT) have I_fact = 1 because FTDock did not generate hits to begin with; (3) for 3 complexes (1FSS, 1IGC, 1MAH) our filters worsen the results, with 1 > I_fact > 0 after filtering; (4) for 4 complexes (1EO8, 1L0Y, 1NCA, 1QFU) our combined filter failed, with I_fact = 0.0, since all of the near-native structures (2, 1, 7 and 5, respectively) were filtered out. After applying only filter II to these 7 complexes, I_fact improved.
Figure 3. Selected structures from our global predictions. Red and blue indicate the experimental co-crystal structure. Green and purple indicate the best prediction with a rank of less than 10, as determined by equation (12). For 1FIN, the bound-bound (green and purple) and unbound-unbound (black and white blue) results are shown in the same figure.
Conclusion
- We described the considerable improvement in the ranking of the FTDock-generated model complexes obtained with the residue conservation filter. Using conservation information, we significantly reduce the number of docking solutions (Table 2).
- We also achieve a ranking improvement for low-RMSD structures with our renormalized ranking scheme. As we determine residue conservation in functionally interacting natural proteins, such as enzyme-inhibitor complexes, we need to give higher ranks to the models with a higher number of conserved positions in the interface region. In the case of unnatural interactions, such as antigen-antibody complexes, the interacting regions are highly variable, and we need to give higher ranks to the models with low numbers of conserved positions (Table 1).
- Our algorithms can be adopted with our docking software to improve the rank.
Our Filters
Residue Conservation
- Filter I: Using homologous sequences, we calculated conservation indices for each docked model. We identified the top 8 (defined as group 1) and top 17 (defined as group 2) of the highly conserved and well-exposed surface residues in each polypeptide chain of the interacting complex, and counted the total number of group 1 and group 2 positions in the interface region of each modeled complex. Using the group 1 and group 2 conservation positions as a filter reduces the total number of docked models. For enzyme-inhibitor model complexes, we selected only the models that have at least 4 group 1 positions or 6 group 2 positions in the interface region. For antigen-antibody complexes, we reversed the selection, keeping only models with 2 or fewer group 1 positions and 4 or fewer group 2 positions (see Table 1). The conservation indices of all residues are summed, and the sum is used as the conservation rank.
- Filter II: If the rank of a complex is worse than 1,200 in any of the four rankings (conservation rank, shape complementarity, pair potential, and desolvation energy), the corresponding model is filtered out of the set of putative near-native structures. Filter II is performed with only three ranks if conservation information is not available.
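The two filters can be expressed as simple predicates. The sketch below is a minimal illustration, not the authors' implementation: the thresholds come from the text above, while the function names, the complex-type labels, and the dictionary layout of the ranks are hypothetical.

```python
RANK_CUTOFF = 1200  # a rank worse (larger) than this in any ranking rejects the model


def passes_filter_one(n_group1, n_group2, complex_type):
    """Filter I: count of conserved group 1 / group 2 positions in the interface."""
    if complex_type == "enzyme-inhibitor":
        # Keep models with many conserved interface positions.
        return n_group1 >= 4 or n_group2 >= 6
    if complex_type == "antigen-antibody":
        # Reversed selection: keep models with few conserved interface positions.
        return n_group1 <= 2 and n_group2 <= 4
    raise ValueError(f"unknown complex type: {complex_type!r}")


def passes_filter_two(ranks):
    """Filter II: reject a model ranked worse than the cutoff in any ranking.

    `ranks` maps a ranking name to the model's rank (1 = best). It may hold
    three entries instead of four when no conservation rank is available.
    """
    return all(rank <= RANK_CUTOFF for rank in ranks.values())


# Hypothetical docked model of an enzyme-inhibitor complex.
model_ranks = {
    "conservation": 310,
    "shape_complementarity": 45,
    "pair_potential": 980,
    "desolvation": 1150,
}
keep = passes_filter_one(5, 3, "enzyme-inhibitor") and passes_filter_two(model_ranks)
```

Passing the ranks as a dictionary lets the same function cover the three-rank case, since it only inspects the rankings that are actually present.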
Homologous sequences: Using the FASTA3 sequence similarity search tool, we obtained homologous sequences from an annotated non-redundant protein sequence database (SWALL). Homologous sequences with less than 30 gaps in the sequence and greater than 35% sequence identity to the parent sequence were used for analysis.
Evolutionary distance: The evolutionary distance among the sequences is calculated using the structure-based amino acid substitution matrix [7]. A similarity score $S_{ii}$ for sequence i is calculated by summing the identical substitution values of its residues from the substitution matrix M(a,b). An evolutionary distance ($ED_{ij}$) between two sequences i and j is then calculated.
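The residue-conservation quantities in this part of the poster (the self-similarity score, the rescaled similarity Mut(a,b), and the conservation index of an alignment position) can be sketched as below. The poster's own equations are not reproduced in this text, so the formulas here are assumptions: Mut is taken as a min-max rescaling of the substitution matrix, and the conservation index as a weighted average of pairwise Mut values with the average evolutionary distances as weights, in the spirit of Valdar and Thornton [8]. The tiny matrix and all names are hypothetical.

```python
from itertools import combinations


def score(matrix, a, b):
    """Look up M(a, b) in a symmetric matrix stored with one key per pair."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def self_similarity(sequence, matrix):
    """S_ii: sum of the identical-substitution values M(a, a) along a sequence."""
    return sum(score(matrix, a, a) for a in sequence)


def mut(a, b, matrix):
    """Assumed form: rescale M(a, b) to [0, 1] with the matrix minimum and maximum."""
    low, high = min(matrix.values()), max(matrix.values())
    if high == low:
        raise ValueError("substitution matrix has no spread")
    return (score(matrix, a, b) - low) / (high - low)


def conservation_index(column, weights, matrix):
    """Assumed form: weighted average of Mut over all residue pairs in a column.

    `column` holds the residue of each sequence at one alignment position;
    `weights` holds each sequence's average evolutionary distance.
    """
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(column)), 2):
        pair_weight = weights[i] * weights[j]
        numerator += pair_weight * mut(column[i], column[j], matrix)
        denominator += pair_weight
    if denominator == 0:
        raise ValueError("need at least two sequences with non-zero weights")
    return numerator / denominator


# Hypothetical two-letter substitution matrix (not a real scoring matrix).
toy_matrix = {("A", "A"): 4, ("G", "G"): 6, ("A", "G"): 0}

s_ii = self_similarity("AGG", toy_matrix)
ci_conserved = conservation_index(["G", "G", "G"], [1.0, 1.0, 1.0], toy_matrix)
ci_mixed = conservation_index(["G", "G", "A"], [1.0, 1.0, 2.0], toy_matrix)
```

Because the numerator and denominator share the same pair weights and Mut is bounded by 0 and 1, the resulting index stays between 0.0 and 1.0, matching the range stated for the conservation index.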
Conservation index of residue position: As described above, the evolutionary distances between the reference sequence and its homologues were used to calculate a residue conservation index ($CI_l$) for each position l using the amino acid substitution matrix, similar to the amino acid conservation score used by Valdar and Thornton [8]. $CI_l$ is a weighted sum of all pairwise similarities between all residues present at the position. It is calculated in a given alignment and takes a value in the range 0.0 to 1.0. In this calculation, N is the number of homologous sequences in the alignment; $s_i(l)$ and $s_j(l)$ are the amino acids at alignment position l of sequences $s_i$ and $s_j$, respectively; and $ED(s_i)$ and $ED(s_j)$ are the average evolutionary distances of $s_i$ and $s_j$ from the remaining homologues. Mut(a,b) measures the similarity between the amino acids a and b as derived from the amino acid substitution matrix M(a,b), where a and b are the pair of amino acids at a given alignment position l, $M(a,b)_{low}$ is the lowest value in the substitution matrix, and $M(a,b)_{max}$ is the maximum value among all the possible substitution pairs at that position. Thus Mut(a,b) takes a value in the range 0 to 1. We identified the top 8 and 17 of the highly conserved residues that have a solvent accessibility greater than 25% of their total surface area.
Current and Future Work
- Combine our filters into the docking model generator to get more hits within the modeled structures.
- Dissecting the structures of known repressor-operator (TetR/TetO) complexes, we use computationally efficient simulations to calculate the binding affinity of repressor-operator complexes and to identify the protein residues that play a central role in binding and are amenable to mutations.
References
1. Chen, R., Mintseris, J., Janin, J., and Weng, Z., Proteins 52, 88-91 (2003)
2. Gabb, H.A., Jackson, R.M., and Sternberg, M.J., J. Mol. Biol. 272, 109-120 (1997)
3. Moont, G., Gabb, H.A., and Sternberg, M.J., Proteins 35, 364-373 (1999)
4. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., and Karplus, M., J. Comp. Chem. 4, 187-217 (1983)
5. Reddy, B.V.B. and Kaznessis, Y., submitted to Protein Engineering, 2004
6. Duan, Y., Reddy, B.V.B. and Kaznessis, Y., Protein Science, in press, 2004
7. Gonnet, G.H., Cohen, M.A., and Benner, S.A., Science 256, 1443-1445 (1992)
8. Valdar, W.S., and Thornton, J.M., Proteins 42, 108-124 (2001)
Acknowledgements
This work is partially supported by the Army High Performance Computing Research Center (AHPCRC) under the auspices of the Department of the Army, Army Research Laboratory (ARL) under contract number DAAD19-01-2-0014. We also thank the University of Minnesota Digital Technology Center for support.
Figure 2. Global ranking for 18 complexes. RMSD versus rank score (eq. (12)) for the decoys generated by FTDock and retained by our filters.