Title: Physicochemical Methods for Protein Function Prediction
1Physicochemical Methods for Protein Function
Prediction
- Mary Jo Ondrechen
- Dept. of Chemistry and Chemical Biology
- Northeastern University
- Boston, MA 02115
2THEMATICS
- Genomics and proteomics
- About titration curves
- Method for active site location and
characterization
- Examples
- Future directions and conclusions
3The post-genomic path
- Genome sequence
- Protein sequence
- Protein structure
- Protein function
- Active site location and characterization, drug
design, understanding protein function, normal
and disease processes
4PROTEOMICS
- Structural genomics
- rapidly discovering new protein structures, many
of unknown function
- The Next Frontier
- Characterizing the $10^6$ proteins for which genes
hold the code.
5Predicting Protein Function
- Protein structure and protein function are not
well correlated. Need methods to predict
function from structure (or sequence).
- THEMATICS (Theoretical Microscopic Titration
Curves): a reliable way to locate and
characterize enzyme active sites.
6Typical Experimental Titration Curve
7In the absence of a field, acids obey
Henderson-Hasselbalch
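- $\mathrm{pH} = \mathrm{p}K_a + \log_{10}\left([\mathrm{A}^-]/[\mathrm{HA}]\right)$
- which may be rewritten in terms of the average
charge as a function of pH
- $C = -10^{\mathrm{pH}} / (10^{\mathrm{pH}} + 10^{\mathrm{p}K_a})$ for an acid (HA / A$^-$), OR
- $C = 10^{\mathrm{p}K_a} / (10^{\mathrm{pH}} + 10^{\mathrm{p}K_a})$ for a base (HB$^+$ / B)
- where C is the mean net charge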
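The two mean-charge expressions above can be evaluated directly. The following is a minimal sketch, not part of the original slides; the function names and the example pKa values are assumptions chosen for illustration.

```python
def mean_charge_acid(ph, pka):
    """Mean net charge of an acidic group (HA / A-); ranges from 0 to -1."""
    return -(10.0 ** ph) / (10.0 ** ph + 10.0 ** pka)


def mean_charge_base(ph, pka):
    """Mean net charge of a basic group (HB+ / B); ranges from +1 to 0."""
    return (10.0 ** pka) / (10.0 ** ph + 10.0 ** pka)


# Tabulate C(pH) for an illustrative acid (pKa 4.0) and base (pKa 10.5).
for ph in range(0, 15, 2):
    print(ph,
          round(mean_charge_acid(ph, 4.0), 3),
          round(mean_charge_base(ph, 10.5), 3))
```

At pH equal to the pKa each function returns a charge of magnitude one half, and almost all of the change happens within about one pH unit on either side, which is the sigmoid shape of an unperturbed curve.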
8C(pH) for a typical residue
9Typical weak acid/base: narrow window of
reactivity
- When the pH is close to the pKa, a weak acid/base
is available to act as both an acid and a base
- By definition of a catalyst, the enzyme must
regenerate itself before one cycle is over.
- for $\mathrm{HA} + \mathrm{B}^- \rightleftharpoons \mathrm{A}^- + \mathrm{HB}$,
- reaction proceeds both ways if HA and HB have
matched pKas
10A common 1st step in enzyme catalysis
- Deprotonate a C-H bond
- $\text{C-H} + \mathrm{B} \rightarrow \mathrm{C}^- + \mathrm{HB}^+$
- What is required of B?
- It must be a strong enough base
- It must be deprotonated at neutral pH
- Mutually contradictory requirements (for a
Henderson-Hasselbalch acid/base)
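The conflict between the two requirements can be made concrete with a short calculation. This is an illustrative sketch only: the function name is hypothetical, and the pKa of about 10.5 is a common textbook value for a free lysine side chain, not a number taken from these slides.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch fraction of a group in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# The stronger the base (higher conjugate-acid pKa), the smaller the
# fraction that is deprotonated, and so able to accept a proton, at pH 7.
for pka in (7.0, 9.0, 10.5):
    print(pka, fraction_deprotonated(7.0, pka))
```

For a Henderson-Hasselbalch base with pKa 10.5, only about 0.03% of the population is deprotonated at pH 7, so a group cannot be both a strong base and substantially deprotonated at neutrality.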
11A better way
- Catalytic base Lysine39 is a very strong base
- AND is partially deprotonated at neutral pH!
12Perturbed titration curves
- Enable residue to act as acid/base over a wide pH
range
- Precise pH adjustment not needed
- Precise pKa match not required
- Enable residue to have right mix of chemical
properties
- acid (or base) strength
- right protonation state at neutrality
13Perturbed curves
- Have been noticed before (in titration curves
obtained computationally for proteins)
- We now understand that they are significant
- Are markers of chemical reactivity
- Can be used to locate active site
- M.J. Ondrechen, J.G. Clifton and D. Ringe, Proc.
Natl. Acad. Sci. USA 98, 12473-12478 (2001)
14THEMATICS
- Theoretical Microscopic Titration Curves
- Conceptually simple
- Require a known structure
- Can be computed
- Highly reliable identifier of active site
- Characterize enzyme active site
15Complementary to other methods
- THEMATICS complements other methods that
predict, or provide clues about, function
- Evolutionary history, sequence relationships,
sequence homology, domain fusion, conservation of
gene position, gene coinheritance, geometric
motif search, cleft search, small molecule
docking, energetics, flexibility
- Characterize function by chemical reactivity
16THEMATICS COMPUTATION
- Start with protein structure
- Solve Poisson-Boltzmann equations for electrical
potential function
- Obtain C(pH) by Monte Carlo method
(Boltzmann-weighted populations)
- Plot C(pH) and find perturbed curves
17Which curves are perturbed?
- Visual inspection
- Mathematical analysis (H. Yang)
- Statistical analysis
- Fit to parametrized sigmoid function
- Neural networks / Support Vector Machines (W.
Tong)
- Only a small fraction (3-7%) of all ionizable
residues are perturbed
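One simple way to put a number on "perturbed" is to measure how wide the titration transition is. The sketch below is a stand-in for illustration, not the mathematical or statistical analysis cited above: it compares the pH span over which the protonated fraction falls from 0.9 to 0.1 with the span of an ideal Henderson-Hasselbalch curve, which is 2 log10(9), about 1.91 pH units. The function names and the 1.5x threshold are assumptions.

```python
import math

HH_WIDTH = 2.0 * math.log10(9.0)  # 0.9 -> 0.1 span of an ideal curve


def crossing_ph(phs, fractions, level):
    """Interpolate the first pH at which a decreasing curve crosses level."""
    for k in range(len(phs) - 1):
        f0, f1 = fractions[k], fractions[k + 1]
        if f0 >= level >= f1 and f0 != f1:
            return phs[k] + (f0 - level) * (phs[k + 1] - phs[k]) / (f0 - f1)
    raise ValueError("curve does not cross the requested level")


def transition_width(phs, fractions):
    """pH span over which the protonated fraction falls from 0.9 to 0.1."""
    return crossing_ph(phs, fractions, 0.1) - crossing_ph(phs, fractions, 0.9)


def is_perturbed(phs, fractions, factor=1.5):
    """Flag a curve whose transition is much wider than the ideal one."""
    return transition_width(phs, fractions) > factor * HH_WIDTH


phs = [0.1 * k for k in range(141)]
ideal = [1.0 / (1.0 + 10.0 ** (p - 7.0)) for p in phs]
stretched = [1.0 / (1.0 + 10.0 ** (0.5 * (p - 7.0))) for p in phs]
print(is_perturbed(phs, ideal), is_perturbed(phs, stretched))
```

A width-based measure like this only captures elongation of the curve; curves with plateaus or other irregular shapes would need a richer description, such as a fit to a parametrized sigmoid.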
18Ionizable residues
- Arg, Asp, Cys, Glu, His, Lys, Tyr, termini
- A cluster of two or more perturbed residues in
physical proximity is a reliable predictor of
active site location
- Success in finding active site is not
particularly sensitive to selection criteria
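The clustering criterion above can be sketched as single-linkage grouping of the flagged residues by distance. This is a minimal illustration: the 9 Å cutoff, the function name, and the coordinates in the example are all made up for the sketch and are not taken from the slides or from any real structure.

```python
import math


def proximity_clusters(residues, cutoff=9.0):
    """Group flagged residues that are linked by distances <= cutoff.

    residues maps a residue label to an (x, y, z) position in angstroms.
    Only clusters of two or more residues are returned; isolated
    residues are dropped.
    """
    unvisited = set(residues)
    clusters = []
    while unvisited:
        seed = min(unvisited)  # deterministic starting point
        unvisited.remove(seed)
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = [name for name in unvisited
                    if math.dist(residues[current], residues[name]) <= cutoff]
            for name in near:
                unvisited.remove(name)
                cluster.add(name)
                frontier.append(name)
        if len(cluster) >= 2:
            clusters.append(cluster)
    return clusters


# Made-up positions: three residues close together and one far away.
toy = {"A": (0.0, 0.0, 0.0), "B": (5.0, 0.0, 0.0),
       "C": (10.0, 0.0, 0.0), "D": (50.0, 50.0, 50.0)}
print(proximity_clusters(toy))
```

Residues A, B and C form one cluster because each is within the cutoff of a neighbor, while the isolated residue D is discarded.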
19THEMATICS: A unique predictive tool for
Proteomics
- Gives chemical information
- Indicates why a particular residue may be
involved in catalysis
- Highly reliable for identifying active sites
- Conceptually simple
- Computationally (relatively) fast
20Alanine Racemase
- Used by bacteria in cell wall construction
- A target for antibiotics (and for drugs to treat
tuberculosis)
- Vitamin B6 dependent
- Active as a dimer
- Active site located at dimer interface
21Alanine Racemase catalysis
- Catalyzes interconversion of D-Ala and L-Ala
- Reaction occurs on a Schiff base intermediate
(alanine-pyridoxal phosphate)
- First step on Schiff base: remove alpha-H atom
from Ala moiety
- K39 and Y265 are the catalytic bases
22Alanine Racemase Lysines 39A-234A
- K39 is the catalytic base for D-to-L
23Tyrosines in Alanine racemase
24Results for Alanine Racemase
- Full results for Alanine racemase
- R219, C311, K39, Y43, Y265, Y284, Y354, C358,
R366, D68
- Bold: known active site residue
- Italics: second shell
- False positives tend to be isolated
25THEMATICS results
- THEMATICS has succeeded in finding the active
site for several dozen proteins with a variety of
folds and chemistries.
- Occasionally, get two or more clusters
- Occasionally, when visual inspection has not
found a positive cluster, statistical analysis
has.
26Human Adenosine Kinase
- Catalyzes the transfer of phosphate from ATP to a
nucleoside analogue
- Unique fold: an α-β three-layer sandwich plus a
smaller α-β two-layer domain
- Antiviral and anticancer drug target
27Human Adenosine Kinase
- One of two proteins to date where the human
observer was unable to locate the active site
- Statistical analysis successful in finding the
active site (H. Yang)
- Perturbations in predicted titration curves are
subtle, but statistically significant
28Aspartates in Human AK
- D300 has slightly perturbed curve
30Colicin E3: important test case
- Nuclease - cleaves a phosphodiester linkage in
the RNA of the ribosome
- Used by E. coli to kill rival bacteria
- Unique fold: cannot infer active site location
from other RNases
- Structure provided by Prof. M. Shoham (CWR) prior
to publication
31THEMATICS results: Colicin E3
- E517, H526, R495, R545, Y519
- Calculation was performed on the structure of the
catalytic fragment
- Active site found prior to completion of the
biochemical characterization
- Active site correctly located by THEMATICS
32HIV-1 protease
- Acid protease
- Cathepsin D fold
- Active as a dimer
- D25 and D25' are the catalytic groups
- THEMATICS human observer found D25 and D25'
33HIV-1 Protease Aspartates
34THEMATICS on HIV-1 Protease
- Human observer finds
- D25, D25'
- Statistical analysis finds
- D25, D25', R8, R87
- R8 and R87 are believed to be involved in
substrate recognition (J.S. Bardi, I. Luque and E.
Freire, Structure-based thermodynamic analysis of
HIV-1 protease inhibitors, Biochemistry 36,
6588-6596 (1997))
35Conclusions
- THEMATICS: simple, computationally fast, and
reliable
- Simple connection with chemistry
- A cluster of two or more positive residues is
predictive of active sites
- Has been automated
- Characterizes residues (reactivity)
- Positive clusters well conserved
36Conclusions - continued
- Perturbed curves result from the polyprotic
nature of proteins
- Working hypotheses about perturbed titration
curves
- Afford catalytic advantage
- Afford advantage in reversible binding at
recognition sites
37Thanks
- David Budil, Leo Murga, Terry Yang, Ying Wei,
Alissa Bologna, Katie Boino, Wenxu Tong, Bob
Futrelle, Ron Williams (Northeastern)
- Jaeju Ko (IUP)
- Ihsan Shehadi (UAEU)
- Dagmar Ringe and Jim Clifton (Brandeis)
38Support Acknowledged
- National Science Foundation
- Institute for Complex Scientific Software (ICSS)
- Northeastern
39mjo_at_neu.edu
- M.J. Ondrechen, J.G. Clifton and D. Ringe, Proc.
Natl. Acad. Sci. USA 98, 12473-12478 (2001)
- I.A. Shehadi, H. Yang and M.J. Ondrechen, Mol.
Biol. Rpts 29, 329-335 (2002)