Title: Acknowledgements
Comparative Analysis of Novel Proteins from the
CATH Family of Zinc Peptidases
Debanu Das (1,2), Abhinav Kumar (1,2), Lukasz
Jaroszewski (1,3) and Ashley Deacon (1,2)
(1) Joint Center for Structural Genomics
(2) Stanford Synchrotron Radiation Laboratory,
Menlo Park, CA 94025
(3) Burnham Institute, La Jolla, CA 92037
III. General structure and biochemistry
These metallopeptidases show a high degree of
structural conservation in the CATH domain, which
has an α/β/α sandwich architecture. The active
site usually comprises histidines and
carboxylates interacting with two zinc
ions. Despite the variety of molecular
functions and substrate specificities of these
proteins, catalysis most likely involves a
hydroxide ion ligand that carries out a
nucleophilic attack. The full proteins often
oligomerize and display some differences in their
oligomerization state; however, the exact role
of the oligomer in the molecular function is
still unclear. In some cases, dimer formation
results in assembly of a productive catalytic
site. Dimerization is usually mediated by a
dimerization domain. Higher oligomeric forms such
as tetramers or octamers are also observed for
some proteins.
Figure of the representative CATH structure from
http://cathwww.biochem.ucl.ac.uk/cgi-bin/cath/GotoCath.pl?cath3.40.630.10
I. Introduction
II. Background and Significance
CATH 3.40.630.10 proteins belong to PFAM clan
CL0035 (Peptidase MH/MC/MF) and to clan MH/MC/MF
of metallopeptidases in the MEROPS database of
peptidases (also termed proteases, proteinases
or proteolytic enzymes). CL0035 has 7591 proteins
in 8 Pfams. These proteins are involved in a
variety of proteolytic activities, have a range
of substrate specificities and are present in
numerous microbial organisms, many of which are
important human pathogens, such as S. aureus, S.
typhimurium, T. vaginalis, M. tuberculosis, N.
gonorrhoeae, N. meningitidis, C. trachomatis, G.
intestinalis and E. coli. Several of these
proteins have been investigated for their
therapeutic potential and roles in disease
(Canavan disease, cancer therapy and
prohormone/propeptide processing).
IV. Progress of structure determination
As part of its mission to increase structural
coverage of protein families, JCSG is targeting
proteins from the large CATH homologous
superfamily 3.40.630.10 of zinc peptidases, which
belong to the phosphorylase/hydrolase-like fold
in SCOP and comprise proteins from
several Pfam families (the peptidase_MH clan).
Hidden Markov Models from the CATH database
were used to identify sequences in the JCSG
genome pool. PSI-BLAST searches seeded with
sequences of these CATH family members were used
to find additional proteins. Together, these two
sets contained 226 unique targets. After removing
targets with more than 30% sequence identity to
any PDB structure or to any crystallized target
from a structural genomics center, 161 targets
remained. Further clustering at 90% sequence
identity (in order to avoid nearly identical
sequences) yielded a set of 137
targets. Prior to commencing work on these
proteins in March 2007, there were 40 unique
structures from these Pfams from global SG and
non-SG efforts. We have contributed 6 new
structures and 7 other targets have been
crystallized. We present our progress towards
complete structural coverage of this family,
highlighting common and variant structural
features that support different molecular and
cellular roles, focusing on active site residues,
ligand binding, protein size and oligomerization
state. This analysis may provide insights into
structural themes that dictate protein function
and also allows modeling of protein structures
related by sequence. Our structures serve as a
nucleation point for the design of further
structure-based experiments to probe the
biochemical and biomedical roles of these
proteins.
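The two redundancy-reduction steps described above (removing targets above 30% identity to known structures, then clustering at 90% identity) can be illustrated with a minimal Python sketch. This is not the JCSG pipeline: the function names, the toy sequences and the simple identity measure on pre-aligned sequences are all assumptions made for illustration, and a real workflow would compute identities with an alignment tool.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Toy stand-in for a real alignment tool: columns where both
    sequences have a gap are ignored; a gap never counts as a match.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(pairs)


def drop_known(targets: Dict[str, str], known: List[str],
               cutoff: float = 30.0) -> Dict[str, str]:
    """Keep targets with at most `cutoff` % identity to every known structure."""
    return {name: seq for name, seq in targets.items()
            if all(percent_identity(seq, k) <= cutoff for k in known)}


def greedy_cluster(targets: Dict[str, str], cutoff: float = 90.0) -> List[str]:
    """Return one representative per cluster.

    A target becomes a new representative only if it is below `cutoff` %
    identity to every representative chosen so far.
    """
    reps: List[str] = []
    for name in sorted(targets):  # sorted for a deterministic result
        if all(percent_identity(targets[name], targets[r]) < cutoff for r in reps):
            reps.append(name)
    return reps


# Hypothetical toy data, not real target sequences.
known_structures = ["AAAAAAAAAA"]
candidates = {
    "t1": "AAAAACCCCC",  # 50% identical to the known structure: removed
    "t2": "CCCCCCCCCC",
    "t3": "CCCCCCCCCD",  # 90% identical to t2: joins the t2 cluster
    "t4": "DDDDDDDDDD",
}
survivors = drop_known(candidates, known_structures, cutoff=30.0)
representatives = greedy_cluster(survivors, cutoff=90.0)
```

In this toy run, t1 is discarded by the 30% filter and t3 collapses into the t2 cluster, leaving t2 and t4 as representatives. The same pattern corresponds to the reduction from 226 to 161 to 137 targets described above.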
Current status of 137 targets
Distribution of selected targets across Pfam
families
All targets selected in March 2007
Targets assigned in PfamA
Targets unassigned in PfamA
Pfam | Family | Proteins | SG structures | Non-SG structures
PF04952 | Succinylglutamate desuccinylase / Aspartoacylase family (AstE-AspA) | 458 | 2 JCSG, 5 all other SG | -
PF02127 | Aminopeptidase I zinc metalloprotease (M18) | 227 | 4 all other SG | -
PF01546 | Peptidase family M20/M25/M40 | 3779 | 4 JCSG, 7 all other SG | 6
PF00246 | Zinc carboxypeptidase (M14) | 1013 | - | 10
PF04389 | Peptidase family M28 | 812 | - | 5
PF00883 | Cytosol aminopeptidase family, catalytic domain | 827 | 1 all other SG | 1
PF05343 | M42 glutamyl aminopeptidase | 427 | 1 JCSG, 1 all other SG | 1
PF05450 | Nicastrin (eukaryotic, not known to be a peptidase, part of the γ-secretase complex, no structures) | 48 | None | None
Pfam | Targets assigned in PfamA | Targets unassigned in PfamA
PF04952 | 32 | 3
PF02127 | 0 | 0
PF01546 | 56 | 1
PF00246 | 9 | 8
PF04389 | 10 | 7
PF00883 | 2 | 0
PF05343 | 5 | 1
PF05450 | 0 | 7
For targets unassigned in PfamA, the Pfam was
assigned based on sequence homology detected with
FFAS (http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl).
There are 3 targets not assigned by PfamA or
FFAS. The 7 targets indicated show a significant
FFAS match to both PF04389 and PF05450 and are
possibly distant bacterial homologs of the
eukaryotic nicastrin family.
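As a consistency check, the per-family counts listed above can be tallied against the stated total of 137 targets. The short Python sketch below assumes that the first number for each Pfam is the count of targets assigned in PfamA and the second is the count assigned only through FFAS, and that the 7 targets matching both PF04389 and PF05450 appear once under each family. Both readings are interpretations of the panel, not statements from it.

```python
# Counts transcribed from the distribution panel:
# Pfam -> (assigned in PfamA, unassigned in PfamA but matched by FFAS)
counts = {
    "PF04952": (32, 3),
    "PF02127": (0, 0),
    "PF01546": (56, 1),
    "PF00246": (9, 8),
    "PF04389": (10, 7),
    "PF00883": (2, 0),
    "PF05343": (5, 1),
    "PF05450": (0, 7),
}

pfam_a_total = sum(a for a, _ in counts.values())
ffas_only_total = sum(f for _, f in counts.values())

# Assumption: the 7 targets matching both PF04389 and PF05450 are
# listed under both families, so they are counted twice above.
double_counted = 7
# Targets assigned by neither PfamA nor FFAS.
unassigned = 3

total = pfam_a_total + ffas_only_total - double_counted + unassigned
print(pfam_a_total, ffas_only_total, total)
```

Under these assumptions the tally gives 114 PfamA-assigned targets and 27 FFAS-only entries, and the corrected total matches the 137 selected targets.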
V. Structures solved by JCSG
HP10625B, 2.3 Å, work in progress. PF01546. 50 close homologs from important human pathogens. Potential in cancer therapy.
2RB7.pdb (HP1666A), 1.6 Å, R/Rfree 15.4/18.0. Unknown function, PF01546. 48 close homologs from important human pathogens. Potential in cancer therapy.
2QYV.pdb (HP9625C), 2.11 Å, R/Rfree 22.0/24.4. Putative Xaa-His dipeptidase, PF01546, Zn2+ bound. 7 close homologs from important human pathogens.
2FVG.pdb (TM1049), 2.01 Å, R/Rfree 20.3/24.4. Endoglucanase, PF05343. 27 close homologs from important human pathogens.
3B2Y.pdb (HP10645E), 1.74 Å, R/Rfree 17.45/21.51. Unknown function, PF04952, Ni2+ bound. Structure suggests target may be closer in homology to PF00246 proteins.
2QVP.pdb (HP10645A), 2.0 Å, R/Rfree 16.1/21.3. Unknown function, PF04952. Structure suggests target may be closer in homology to PF00246 proteins.
2QJ8.pdb (HP10622H), 2.0 Å, R/Rfree 20.7/25.4. Unknown function, PF04952. Homolog involved in Canavan disease.
VI. Phylogenetic tree and structure tree
X. Active site study may lead to structural basis
of substrate specificity
XI. Elucidation of a unique oligomeric form
2RB7 (cyan) and 1CG2, PF01546. Proteins in this
Pfam with solved structures and >30% sequence
identity with one another have functions that
include:
- succinyl-diaminopimelate desuccinylase
- carboxypeptidase G2, which cleaves the C-terminal glutamate moiety from folic acid and its analogues, such as methotrexate
- N-acetyl-L-citrulline deacetylase
- peptidase T (tripeptidase)
The 2QYV (PepD, MEROPS M20.007, clan MH,
subfamily C) monomer is very similar in structure
to the 1LFW monomer (PepV, MEROPS M20.004,
subfamily A). Both are dipeptidases belonging to
PF01546. However, 1LFW is known to function as a
monomer in which the molecular structure mimics
that of a dimer seen in most other proteins in
this Pfam. PepD from E. coli and from Prevotella
albensis has been observed to function as a
dimer. 2QYV represents the first crystal
structure of a PepD. It is dimeric both in the
crystal structure (see panel above) and by size
exclusion chromatography, and it reveals the
structural nature of the dimer. This novel
structure serves as a starting point for further
experiments to probe the effect of this unique
dimer formation on protein function.
Sequences with >30% identity within a particular
Pfam also cluster together in structure space.
Based on this information, it would now be
possible to perform targeted biochemical assays
to determine the substrate of 2RB7, to try to
understand the structural basis for substrate
selection and specificity, and to exploit this
information for its therapeutic potential. For
example, can 2RB7 hydrolyse methotrexate? Can it
do so more efficiently? Can active site
engineering based on structural information
produce a more potent enzyme?
Acknowledgements
Active site in 2RB7
fatcat.burnham.org/POSA
http://www.phylogeny.fr
The active site in 1CG2 is H112, D141, E200, E176,
H385. Based on this, the putative active site in
2RB7 is H72, D99, D100, E138, E139, D162.
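One generic way to transfer known active-site residues from a reference protein to a target is to walk a pairwise alignment and read off the equivalent positions. The Python sketch below illustrates that idea only. The function name and the toy alignment are hypothetical, it is not the 1CG2/2RB7 alignment, and it is not necessarily how the putative 2RB7 site above was derived.

```python
from typing import Dict, Iterable, Optional, Tuple

Mapped = Tuple[str, Optional[int], Optional[str]]


def map_positions(ref_aln: str, tgt_aln: str, ref_positions: Iterable[int],
                  ref_start: int = 1, tgt_start: int = 1) -> Dict[int, Mapped]:
    """Map reference residue numbers onto a target via a gapped alignment.

    Returns {ref_number: (ref_residue, target_number, target_residue)};
    the target fields are None when the reference residue faces a gap.
    """
    if len(ref_aln) != len(tgt_aln):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(ref_positions)
    mapping: Dict[int, Mapped] = {}
    ref_no, tgt_no = ref_start - 1, tgt_start - 1
    for r, t in zip(ref_aln, tgt_aln):
        # Advance each residue counter only on non-gap characters.
        if r != "-":
            ref_no += 1
        if t != "-":
            tgt_no += 1
        if r != "-" and ref_no in wanted:
            mapping[ref_no] = (r, None, None) if t == "-" else (r, tgt_no, t)
    return mapping


# Hypothetical toy alignment, for illustration only.
reference = "MKH-DAEE"
target = "M-HGD-DE"
site = map_positions(reference, target, [3, 4, 5, 6])
for ref_no, (ref_res, tgt_no, tgt_res) in sorted(site.items()):
    print(ref_res, ref_no, "->", tgt_res, tgt_no)
```

In the toy case, reference H3 and D4 map to identical target residues, A5 faces a gap, and E6 maps to an aspartate. This is the kind of conservative substitution that still needs structural inspection before calling a residue part of the active site.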
IX. Suggestion of PfamA assignment based on
structure
HP10645A (2QVP) and HP10645E (3B2Y) are
assigned to PF04952 in PfamB. However, structural
comparisons of only the CATH domain show a
stronger similarity to a PF00246 protein (1QMU,
left) than to a PF04952 protein (2QJ8, center),
and this is also supported by structure-based
phylogenetic trees and FFAS. Also, like 1QMU,
HP10645A/E lacks a 70 amino acid insertion that
forms a C-terminal domain (left, black circle),
which is present in PF04952 proteins and is
important for biochemical function. These two
pieces of evidence support the assignment of
HP10645A/E to PF00246 in PfamA. Alternatively,
HP10645A/E could be novel members of PF04952,
although sequence and structure suggest
PF00246.
Hydrolysis of methotrexate by 1CG2
XII. Inferences and further work
- In the quest for increasing structural coverage
across protein families, it is expected that
proteins similar in sequence within a protein
family will be similar in structure. Increasing
structural coverage provides better templates for
modeling other proteins. The comparative
structural analysis presented here provides
experimental verification of the validity of this
approach.
- The structures for the proteins HP10645A and
HP10645E suggest that they should be assigned to
PF00246 in PfamA instead of the current
suggestion of belonging to PF04952 by PfamB.
- The 7 structures presented here provide a basis
for enhancing the modeling of 2177 out of 7591
proteins (29%) belonging to this Pfam clan.
Furthermore, 3 of these JCSG structures provide
the first examples of structures for proteins
within a particular sequence cluster (2QYV, 2QJ8
and 3B2Y) and thus provide the basis for modeling
384 unique proteins (10 from organisms listed as
top human pathogens) belonging to these 3
clusters from 2 different Pfams (PF01546 and
PF04952).
- 2QYV/HP9625C represents the first crystal
structure of a dipeptidase PepD showing a dimer.
- Further analysis will be performed to try to
understand evolutionary relationships between
these proteins based on sequence-based
phylogenetic trees and structure-based trees.
- Attempts will be made to investigate use of
these structures and their comparative analyses
in understanding the structural basis for enzyme
function and substrate specificities by analysis
of active site amino acids, and to attempt to
exploit this information for therapeutic purposes.
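The coverage figure quoted above can be reproduced with a few lines of arithmetic. The Python sketch below takes the per-family protein counts from the Pfam panel of this poster, checks that they add up to the clan total of 7591, and computes the fraction covered by the 2177 modelable proteins. The variable names are illustrative only.

```python
# Proteins per Pfam family, as listed in the Pfam panel of this poster.
family_sizes = {
    "PF04952": 458,
    "PF02127": 227,
    "PF01546": 3779,
    "PF00246": 1013,
    "PF04389": 812,
    "PF00883": 827,
    "PF05343": 427,
    "PF05450": 48,
}

clan_total = sum(family_sizes.values())  # expected to equal 7591 (CL0035)
modelable = 2177                         # proteins the 7 structures help model

coverage_percent = 100.0 * modelable / clan_total
print(clan_total, round(coverage_percent, 1))
```

The family sizes sum to 7591, and 2177 of 7591 is about 28.7%, which rounds to the 29% stated in the list above.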
Superimposition of all 6 structures in PF04952:
1YW4, 1YW6, 2BCO, 2G9D, 2GU2 and 2QJ8
Common core of 191 aa, RMSD 2.49 Å
Common core of 226 aa, RMSD 2.45 Å