Title: Comparative analysis of novel proteins from the CATH family of zinc peptidases
Debanu Das (1,2), Abhinav Kumar (1,2), Lukasz Jaroszewski (1,3) and Ashley Deacon (1,2)
(1) Joint Center for Structural Genomics; (2) Stanford Synchrotron Radiation Laboratory, Menlo Park, CA 94025; (3) Burnham Institute, La Jolla, CA 92037
Biomedical theme: Central Machinery of Life (proteins conserved in all kingdoms of life)
Biological theme: Complete coverage of Thermotoga maritima
III. General structure and biochemistry
These metallopeptidases show a high degree of structural conservation in the CATH domain, which has an α/β/α sandwich architecture. The active site usually comprises histidines and carboxylates that interact with two zinc ions. Despite the variety of molecular functions and substrate specificities of these proteins, catalysis most likely involves a hydroxide ion ligand that performs a nucleophilic attack. The full-length proteins often oligomerize and differ somewhat in their oligomerization state; however, the exact role of the oligomer in molecular function is still unclear. In some cases, dimer formation results in assembly of a productive catalytic site.
Figure: representative CATH structure, from http://cathwww.biochem.ucl.ac.uk/cgi-bin/cath/GotoCath.pl?cath=3.40.630.10
I. Introduction
II. Background and Significance
CATH 3.40.630.10 proteins belong to Pfam clan CL0035 (Peptidase MH/MC/MF) and to the metallopeptidase clans MH/MC/MF of the MEROPS peptidase database (peptidases are also termed proteases, proteinases or proteolytic enzymes). CL0035 contains 7591 proteins in 8 Pfam families.
These proteins are involved in a variety of proteolytic activities, have a range of substrate specificities and are present in numerous microbial organisms, many of which are important human pathogens such as S. aureus, S. typhimurium, T. vaginalis, M. tuberculosis, N. gonorrhoeae, N. meningitidis, C. trachomatis, G. intestinalis and E. coli. Several of these proteins have been investigated for their therapeutic potential and disease roles (Canavan disease, cancer therapy and prohormone/propeptide processing).
IV. Progress of structure determination
As part of its mission to increase structural coverage of protein families, JCSG is targeting proteins from the large CATH homologous superfamily 3.40.630.10 of zinc peptidases, which belong to the phosphorylase/hydrolase-like fold in SCOP and comprise proteins from several Pfam families (the Peptidase_MH clan).
Hidden Markov models from the CATH database were used to identify sequences in the JCSG genome pool, and PSI-BLAST searches seeded with sequences of these CATH family members were used to find additional proteins. Together, these two sets contained 226 unique targets. After removing targets with more than 30% sequence identity to any PDB structure or to any target already crystallized by a structural genomics center, 161 targets remained. Further clustering at 90% identity (to avoid nearly identical sequences) yielded a set of 137 targets. To date we have solved 7 structures from this CATH family, and 7 other targets have been crystallized. In addition, 16 structures have been solved by other structural genomics centers worldwide. We present our progress towards complete structural coverage of this family, highlighting common and variant structural features that support different molecular and cellular roles, with a focus on active-site residues, ligand binding, protein size and oligomerization state. This analysis may provide insights into the structural themes that dictate protein function, and it allows modeling of protein structures related by sequence. Our structures serve as a nucleation point for the design of further structure-based experiments to probe the biochemical and biomedical roles of these proteins.
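The two filtering steps above (drop targets with more than 30% identity to known structures, then remove near-duplicates at 90% identity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the JCSG pipeline: the poster does not name its clustering tool, so a simple greedy (CD-HIT-style) redundancy filter is swapped in, and `percent_identity` is a toy stand-in that compares pre-aligned, equal-length sequences instead of computing a real alignment. All function names, sequences and identity values are hypothetical.

```python
from typing import Callable, Dict, List


def percent_identity(a: str, b: str) -> float:
    """Toy identity: percent identical positions in two pre-aligned, equal-length sequences.

    Hypothetical stand-in for an alignment-based identity (e.g. from BLAST)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned and non-empty")
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / len(a)


def filter_novel(targets: List[str], best_identity_to_known: Dict[str, float],
                 cutoff: float = 30.0) -> List[str]:
    """Remove targets with more than `cutoff` % identity to any known structure.

    Targets with no recorded hit default to 0% and are kept."""
    return [t for t in targets if best_identity_to_known.get(t, 0.0) <= cutoff]


def greedy_cluster(targets: List[str], identity: Callable[[str, str], float],
                   threshold: float = 90.0) -> List[str]:
    """Greedy redundancy removal: keep a target only if it is below `threshold` %
    identity to every representative kept so far (input order decides priority)."""
    representatives: List[str] = []
    for t in targets:
        if all(identity(t, r) < threshold for r in representatives):
            representatives.append(t)
    return representatives


if __name__ == "__main__":
    seqs = {"A": "MKVLAAGGHE", "B": "MKVLAAGGHD", "C": "PQRSTWYHKL", "D": "WWWWWWWWWW"}
    best_known = {"A": 12.0, "B": 25.0, "C": 45.0, "D": 10.0}  # hypothetical values
    novel = filter_novel(list(seqs), best_known)  # C is dropped (45% > 30%)
    reps = greedy_cluster(novel, lambda x, y: percent_identity(seqs[x], seqs[y]))
    print(novel, reps)  # ['A', 'B', 'D'] ['A', 'D']  (B is 90% identical to A)
```

Applied to the real target lists, the same two steps would correspond to the 226 → 161 → 137 reduction described above, provided the identity values came from real alignments.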
Current status of 137 targets
Distribution of selected targets across Pfam families (all targets selected in March 2007)

Pfam family | Description | Proteins in Pfam | JCSG structures | Other SG structures | Non-SG structures
PF04952 | Succinylglutamate desuccinylase / Aspartoacylase family (AstE-AspA) | 458 | 2 | 5 | -
PF02127 | Aminopeptidase I zinc metalloprotease (M18) | 227 | - | 4 | -
PF01546 | Peptidase family M20/M25/M40 | 3779 | 4 | 7 | 6
PF00246 | Zinc carboxypeptidase (M14) | 1013 | - | - | 10
PF04389 | Peptidase family M28 | 812 | - | - | 5
PF00883 | Cytosol aminopeptidase family, catalytic domain | 827 | - | 1 | 1
PF05343 | M42 glutamyl aminopeptidase | 427 | 1 | 1 | 1
PF05450 | Nicastrin | 48 | - | - | -
(- : none)

Pfam family | Targets assigned in Pfam-A | Targets unassigned in Pfam-A
PF04952 | 32 | 3
PF02127 | 0 | 0
PF01546 | 56 | 1
PF00246 | 9 | 8
PF04389 | 10 | 7
PF00883 | 2 | 0
PF05343 | 5 | 1
PF05450 | 0 | 7
For targets unassigned in Pfam-A, the Pfam family was assigned based on sequence homology detected with FFAS (http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl). There are 3 targets assigned neither by Pfam-A nor by FFAS. The 7 targets indicated show significant FFAS matches to both PF04389 and PF05450 and are possibly distant bacterial homologs of the eukaryotic nicastrin family.
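As a consistency check, the counts in the two tables can be reconciled in a few lines. This sketch rests on our reading of the tables, which the poster does not state explicitly: the second column lists targets assigned to a Pfam only by FFAS, and the 7 targets that match both PF04389 and PF05450 appear in both rows of that column. The helper name `count_targets` is hypothetical.

```python
from typing import Dict

# Counts transcribed from the tables above (column interpretation is an assumption).
PROTEINS_PER_PFAM: Dict[str, int] = {
    "PF04952": 458, "PF02127": 227, "PF01546": 3779, "PF00246": 1013,
    "PF04389": 812, "PF00883": 827, "PF05343": 427, "PF05450": 48,
}
TARGETS_PFAM_A: Dict[str, int] = {
    "PF04952": 32, "PF02127": 0, "PF01546": 56, "PF00246": 9,
    "PF04389": 10, "PF00883": 2, "PF05343": 5, "PF05450": 0,
}
TARGETS_FFAS_ONLY: Dict[str, int] = {
    "PF04952": 3, "PF02127": 0, "PF01546": 1, "PF00246": 8,
    "PF04389": 7, "PF00883": 0, "PF05343": 1, "PF05450": 7,
}


def count_targets(pfam_a: Dict[str, int], ffas_only: Dict[str, int],
                  double_counted: int, unassigned: int) -> int:
    """Unique targets = Pfam-A + FFAS-only - targets listed under two families + unassigned."""
    return sum(pfam_a.values()) + sum(ffas_only.values()) - double_counted + unassigned


if __name__ == "__main__":
    print(sum(PROTEINS_PER_PFAM.values()))  # 7591, the CL0035 clan size
    print(count_targets(TARGETS_PFAM_A, TARGETS_FFAS_ONLY,
                        double_counted=7, unassigned=3))  # 114 + 27 - 7 + 3 = 137
```

Both totals match the figures quoted in the text (7591 clan proteins, 137 targets), which supports this reading of the columns.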
V. Structures solved by JCSG
- 2QVP.pdb (HP10645A), 2.0 Å, R/Rfree 16.1/21.3. Unknown function, PF04952. The structure suggests the target may be closer in homology to PF00246 proteins.
- 2QJ8.pdb (HP10622H), 2.0 Å, R/Rfree 20.7/25.4. Unknown function, PF04952. Homolog of a protein involved in Canavan disease.
- 2FVG.pdb (TM1049), 2.01 Å, R/Rfree 20.3/24.4. Endoglucanase, PF05343. 27 close homologs from important human pathogens.
- 3B2Y.pdb (HP10645E), 1.74 Å, R/Rfree 17.45/21.51. Unknown function, PF04952, Ni2+ bound. The structure suggests the target may be closer in homology to PF00246 proteins.
- 2QYV.pdb (HP9625C), 2.11 Å, R/Rfree 22.0/24.4. Putative Xaa-His dipeptidase, PF01546, Zn2+ bound. 7 close homologs from important human pathogens.
- HP10625B, 2.3 Å, work in progress. PF01546. 50 close homologs from important human pathogens. Potential use in cancer therapy.
- 2RB7.pdb (HP1666A), 1.6 Å, R/Rfree 15.4/18.0. Unknown function, PF01546. 48 close homologs from important human pathogens. Potential use in cancer therapy.
VII. Comparison of two proteins with >30% sequence identity within the same Pfam (PF01546): 1CG2 and 2RB7
- 1CG2 (carboxypeptidase G2): cleaves the C-terminal glutamate moiety from folic acid and its analogues, such as methotrexate
- 2RB7: unknown function, JCSG
- Common core 290 aa, RMSD 3.0 Å
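The "common core" and RMSD values quoted in these comparisons come from structure alignments (the poster credits FATCAT/POSA, which performs flexible alignment). To make the measure concrete, a minimal rigid-body sketch follows: given N equivalent Cα positions from two structures, it superposes them optimally with the Kabsch algorithm and reports the RMSD. This is a swapped-in, simpler technique, so its numbers need not match FATCAT/POSA output; the function name and coordinates are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between two N x 3 coordinate sets after optimal rigid superposition (Kabsch).

    Row i of p and row i of q must be equivalent atoms, e.g. aligned C-alpha
    atoms of the common core of two structures."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of equal shape")
    # Remove translation by centering both sets on their centroids.
    pc = p - p.mean(axis=0)
    qc = q - q.mean(axis=0)
    # The SVD of the 3 x 3 covariance matrix yields the optimal rotation.
    h = pc.T @ qc
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that r is a proper rotation (no reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = pc @ r.T - qc
    return float(np.sqrt((diff ** 2).sum() / len(p)))


if __name__ == "__main__":
    core_a = [[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]]
    # core_b is core_a rotated 90 degrees about z and translated, so the RMSD is ~0.
    core_b = [[5, -2, 1], [5, -1, 1], [3, -2, 1], [5, -2, 4], [4, -1, 2]]
    print(kabsch_rmsd(core_a, core_b))
```

Restricting the calculation to the equivalent core residues is what makes the core size (here 290 aa) and the RMSD meaningful together: a larger shared core at similar RMSD indicates closer structural similarity.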
VI. Phylogenetic tree and structure tree
X. Active site comparisons
XI. First structure of a PepD dipeptidase in clan MH: 2QYV/HP9625C reveals a dimer
2RB7 (cyan) and 1CG2, PF01546. Functions of proteins with solved structures and >30% sequence identity include:
- diaminopimelate biosynthesis (diaminopimelate is a cell-wall component and an intermediate in lysine biosynthesis): dapE gene product, succinyl-diaminopimelate desuccinylase activity
- carboxypeptidase G2: cleaves the C-terminal glutamate moiety from folic acid and its analogues, such as methotrexate
- N-acetyl-L-citrulline deacetylase
- peptidase T: a tripeptidase that hydrolyzes tripeptides at their N-termini
The 2QYV monomer (PepD, MEROPS M20.007, clan MH, subfamily C) is very similar in structure to the 1LFW monomer (PepV, MEROPS M20.004, subfamily A). Both are dipeptidases belonging to PF01546. However, 1LFW is known to function as a monomer whose structure mimics the dimer seen in most other proteins in this Pfam, whereas PepD from E. coli and Prevotella albensis functions as a dimer. 2QYV is the first crystal structure of a PepD and shows the protein to be dimeric both in the crystal and by size-exclusion chromatography. This novel structure serves as a starting point for further experiments to probe the effect of dimer formation on protein function.
Sequences with >30% identity within a particular Pfam also cluster together in structure space. Based on this information, it should now be possible to perform targeted experiments to identify the substrate and hence the function of 2RB7, to carry out structure-based site-directed mutagenesis, and to explore its therapeutic potential.
Acknowledgements
Active site in 2RB7
For structures that cluster together at >30% sequence identity, structural conservation in the common core is highest; generally only slight rearrangement of secondary structural elements is observed within the domain.
http://fatcat.burnham.org/POSA
http://www.phylogeny.fr
VIII. Proteins with <30% sequence identity within the same Pfam
PF01546: 2RB7 and 2QYV (green). Common core 250 aa, RMSD 3.0 Å
IX. Comparison between different Pfams
2RB7, 1XJO (brick) and 2QJ8 (gold)
The active site of 1CG2 comprises H112, D141, E200, E176 and H385. Based on this, the putative active site of 2RB7 comprises H72, D99, D100, E138, E139 and D162.
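Inferring the 2RB7 active site from 1CG2 amounts to transferring residue numbers across an alignment. The sketch below does this for any pairwise alignment (sequence- or structure-based); the function name, the toy alignment and the residue numbers are hypothetical and are not the 1CG2/2RB7 data. Note that the poster lists six candidate residues for 2RB7 against five for 1CG2, so the inference is evidently not a strict one-to-one transfer, and any automated mapping would need to be checked against the structures.

```python
from typing import Dict, List, Optional, Tuple


def map_residues(aln_a: str, aln_b: str, start_a: int, start_b: int,
                 residues_a: List[int]) -> Dict[int, Optional[Tuple[int, str]]]:
    """Map residue numbers of protein A onto protein B through a gapped alignment.

    aln_a and aln_b are the two aligned rows ('-' marks a gap); start_a and start_b
    are the residue numbers of the first non-gap character in each row. For each
    requested residue of A, returns (residue number, amino acid) of the aligned
    residue in B, or None if it faces a gap or lies outside the alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    mapping: Dict[int, Tuple[int, str]] = {}
    ia, ib = start_a, start_b
    for ca, cb in zip(aln_a, aln_b):
        if ca != "-" and cb != "-":
            mapping[ia] = (ib, cb)  # aligned pair: record B's number and residue type
        if ca != "-":
            ia += 1  # advance A's numbering only on real residues
        if cb != "-":
            ib += 1  # advance B's numbering only on real residues
    return {r: mapping.get(r) for r in residues_a}


if __name__ == "__main__":
    # Hypothetical toy alignment: A starts at residue 10, B at residue 1.
    print(map_residues("AH-DE", "GHKDQ", 10, 1, [11, 12, 14]))
    # {11: (2, 'H'), 12: (4, 'D'), 14: None}
```

Returning the residue type alongside the number makes it easy to flag positions where a catalytic His or Asp/Glu in one protein is not conserved in the other.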
Hydrolysis of methotrexate by 1CG2, implications
in cancer and gene therapy
Common core 100 aa, RMSD 3.69 Å. The smallest conserved common core is found when structures from different Pfams within the same Pfam clan and CATH family are compared.
XII. Inferences and further work
- In the quest for increasing structural coverage across protein families, proteins similar in sequence within a protein family are expected to be similar in structure. Increasing structural coverage provides better templates for modeling other proteins. The comparative structural analysis presented here provides experimental support for this approach.
- The 7 structures presented here provide a basis for improved modeling of 2177 of the 7591 proteins (29%) belonging to this Pfam clan. Furthermore, 3 of these JCSG structures (2QYV, 2QJ8 and 3B2Y) are the first structures for proteins within their respective sequence clusters, and thus provide the basis for modeling 384 unique proteins (10 from organisms listed as top human pathogens) belonging to these 3 clusters from 2 different Pfams (PF01546 and PF04952).
- 2QYV/HP9625C is the first crystal structure of a PepD dipeptidase and shows a dimer.
- Further analysis will be performed to understand the evolutionary relationships between these proteins, based on sequence-based phylogenetic trees and structure-based trees.
- We will investigate the use of these structures and their comparative analysis to understand the structural basis of enzyme function and substrate specificity through analysis of active-site amino acids, and will attempt to exploit this information for therapeutic purposes.
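The coverage figure above is simple arithmetic (2177 / 7591 ≈ 28.7%, reported as 29%). The sketch below reproduces it and shows one plausible way such a count might be produced, assuming, hypothetically, that a clan member counts as modelable when it shares at least 30% identity with at least one new structure; the poster does not state the exact criterion, and the names and identity values in the example are invented.

```python
from typing import Callable, List


def modelable(members: List[str], templates: List[str],
              identity: Callable[[str, str], float], cutoff: float = 30.0) -> List[str]:
    """Members sharing at least `cutoff` % identity with at least one template
    (hypothetical criterion for 'a basis for modeling')."""
    return [m for m in members if any(identity(m, t) >= cutoff for t in templates)]


def coverage_percent(n_covered: int, n_total: int) -> float:
    """Percentage of clan members covered by at least one template."""
    return 100.0 * n_covered / n_total


if __name__ == "__main__":
    print(round(coverage_percent(2177, 7591)))  # 29
    # Hypothetical identities (%) of three clan members to two new templates.
    ident = {("m1", "t1"): 45.0, ("m2", "t1"): 12.0, ("m2", "t2"): 31.0, ("m3", "t2"): 8.0}
    print(modelable(["m1", "m2", "m3"], ["t1", "t2"],
                    lambda m, t: ident.get((m, t), 0.0)))  # ['m1', 'm2']
```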
Common core 190 aa, RMSD 3.0 Å. PF04952: 2QJ8 and 3B2Y (cyan).
Larger rearrangements and extensions of secondary structural elements are observed, and insertions and novel features are more common.
http://fatcat.burnham.org/POSA
The JCSG is funded by the Protein Structure Initiative of the National Institutes of Health, National Institute of General Medical Sciences. SSRL operations are funded by DOE BES, and the SSRL Structural Molecular Biology program by DOE BER, NIH NCRR BTP and NIH NIGMS.