Title: The Tree of Life Viewed by Protein Domain Content
1. Evolutionary Insights from Protein Structure
Philip E. Bourne
University of California San Diego
pbourne_at_ucsd.edu
Support Open Access: all the work here does
2. Agenda
- Why is protein structure useful?
- Tree construction using protein structure
- One protein superfamily in more detail
- Environmental influence
- On-going work
  - The role of calcium over time
  - Applying structural domain combinations
  - Co-evolution of kinases and phosphatases
3. [Figure: structures of PKA, ChaK (Channel Kinase), Phosphoinositide-3 Kinase (D) and Actin-Fragmin Kinase (E)]
Why is protein structure useful?
4. The Key is Nature's Reductionism
There are 20^300 possible proteins >>>> all the atoms in the Universe
6.7M protein sequences from 4,734 species (source: RefSeq)
34,494 protein structures yield 1,086 folds (SCOP 1.73)
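The size comparison above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the "300" refers to a chain of 300 residues drawn from 20 amino acids, and assuming the commonly quoted order-of-magnitude figure of about 10^80 atoms in the observable Universe (that figure is not from the slides).

```python
import math

# Number of distinct sequences for a 300-residue chain over 20 amino acids.
ALPHABET_SIZE = 20
CHAIN_LENGTH = 300  # assumption: the slide's "300" is a chain length

# Assumption (not from the slides): roughly 10^80 atoms in the observable Universe.
LOG10_ATOMS_IN_UNIVERSE = 80

# Work in log10 so the numbers stay readable.
log10_sequences = CHAIN_LENGTH * math.log10(ALPHABET_SIZE)

print(f"20^300 is about 10^{log10_sequences:.0f}")
print(f"excess over the atom count: about 10^{log10_sequences - LOG10_ATOMS_IN_UNIVERSE:.0f}")
```

Against that sequence space, the 1,086 folds observed in SCOP 1.73 are what the slide calls Nature's reductionism.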
5. It follows that structure is more conserved than sequence
- Hence, structure comparison reveals relationships not detectable from sequence alone
Stated another way, structure offers the opportunity to look at more distant evolutionary relationships
6. Potential Problems in Using Structure on a Proteomic Scale
- Is structural space well enough populated?
- Is proteome coverage by structure with current detection methods enough?
- Currently 50-70%
7. Initial Bold Question
With this level of coverage and assuming we know a high percentage of all folds, is structure useful in discriminating species?
Tree Construction Using Protein Structure
8. Song Yang, Former Graduate Student, Department of Chemistry and Biochemistry, UCSD
Russ Doolittle, Professor, Center for Molecular Genetics, UCSD
Yang, Doolittle & Bourne (2005) PNAS 102(2): 373-8
9. To Answer this Question We Only Need to Make Use of Existing Resources
- SCOP: further catalogs Nature's reductionism into structural domains, folds, families and superfamilies
- SUPERFAMILY: assigns the above to fully sequenced proteomes
10. Use of SCOP Superfamilies
- Using structure, how do you distinguish convergent versus divergent evolution?
- The SCOP notion of a superfamily, which requires evidence of weak sequence relationships, can be used to discount convergence.
11. Structural Organization: SCOP v1.73
Classes: 7
Folds: 1,086
Superfamilies: 1,777
Families: 3,464
Domains: 97,178
12. Is Structure a Useful Discriminator? Maybe
Distribution among the three kingdoms as taken from SUPERFAMILY
- Superfamily distributions would seem to be related to the complexity of life
- Update of the work of Caetano-Anolles and Caetano-Anolles (2003) Genome Biology 13:1563
[Figure: Venn diagram of superfamily counts across the three kingdoms, each region labeled "any genome / all genomes". Values as extracted: 153/14, 21/2, 310/0, 645/49, 1, 9/1, 29/0, 68/0]
13. The Unique Superfamily in Archaea: d.17.6
- Archaeosine tRNA-guanine transglycosylase (tgt), C2 domain
- First step in the biosynthesis of an archaea-specific modified base, archaeosine (7-formamidino-7-deazaguanosine)
- Found in tRNAs
- Found exclusively in Archaea
Reference: InterPro IPR004804
14. Method: Distance Determination
Presence/Absence Data Matrix

SCOP superfamily (FSF)   C. intestinalis   C. briggsae   F. rubripes
a.1.1                    1                 1             1
a.1.2                    1                 1             1
a.10.1                   0                 0             1
a.100.1                  1                 1             1
a.101.1                  0                 0             0
a.102.1                  0                 1             1
a.102.2                  1                 1             1

Distance Matrix

                         C. intestinalis   C. briggsae   F. rubripes
C. intestinalis          0                 101           109
C. briggsae                                0             144
F. rubripes                                              0
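The step from the presence/absence matrix to the distance matrix can be sketched as follows. This is a minimal illustration, assuming the distance is a simple count of superfamilies present in exactly one of two organisms (a Hamming distance); the published measure may differ. Only the seven example rows shown on the slide are used, so the values will not match the full-data distances of 101, 109 and 144.

```python
from itertools import combinations

# Presence (1) / absence (0) of each SCOP superfamily (FSF), copied from the
# seven example rows on the slide. Column order follows ORGANISMS.
ORGANISMS = ["C. intestinalis", "C. briggsae", "F. rubripes"]
PRESENCE = {
    "a.1.1":   (1, 1, 1),
    "a.1.2":   (1, 1, 1),
    "a.10.1":  (0, 0, 1),
    "a.100.1": (1, 1, 1),
    "a.101.1": (0, 0, 0),
    "a.102.1": (0, 1, 1),
    "a.102.2": (1, 1, 1),
}

def hamming_distances(presence, organisms):
    """For each pair of organisms, count the FSFs present in exactly one of them."""
    distances = {}
    for i, j in combinations(range(len(organisms)), 2):
        differing = sum(1 for row in presence.values() if row[i] != row[j])
        distances[(organisms[i], organisms[j])] = differing
    return distances

DISTANCES = hamming_distances(PRESENCE, ORGANISMS)
for pair, distance in DISTANCES.items():
    print(pair, distance)
```

A pairwise table of this kind is the input that a distance-based tree-building method would then consume.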
15. Is Structure a Useful Discriminator? Yes
[Figure: tree separating Eukaryota, Bacteria and Archaea]
The method cleanly placed all species in their correct superkingdoms
16. Presence/absence vs. Abundance
- Abundance fails to distinctly separate the three superkingdoms
- Presence/absence succeeds in distinctly separating the three superkingdoms
- Why?
  - Emergence or loss of an FSF is a major evolutionary event
  - Emergence of a new FSF may lead to 1-n new functions
  - Gene loss is likely; FSF loss is less likely
  - Horizontal gene transfer is only relevant if it introduces an FSF
  - Not affected by gene duplication
  - Coverage and sensitivity, while not perfect, are enough
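The point about gene duplication can be illustrated with a small sketch. The copy numbers below are hypothetical, invented only for illustration, and the abundance measure (sum of absolute copy-number differences) is one possible choice rather than the one used in the study.

```python
def to_presence(counts):
    """Collapse FSF copy numbers to presence (1) / absence (0)."""
    return {fsf: int(n > 0) for fsf, n in counts.items()}

def presence_distance(a, b):
    """Number of FSFs present in exactly one of the two proteomes."""
    pa, pb = to_presence(a), to_presence(b)
    return sum(pa.get(f, 0) != pb.get(f, 0) for f in set(pa) | set(pb))

def abundance_distance(a, b):
    """Sum of absolute copy-number differences (one possible abundance measure)."""
    return sum(abs(a.get(f, 0) - b.get(f, 0)) for f in set(a) | set(b))

# Hypothetical copy numbers, invented for illustration only.
genome_a = {"a.1.1": 2, "a.102.1": 0, "c.37.1": 10}
genome_b = {"a.1.1": 3, "a.102.1": 1, "c.37.1": 12}
# Same FSF repertoire as genome_b, after a burst of duplication in one FSF.
genome_b_duplicated = {"a.1.1": 3, "a.102.1": 1, "c.37.1": 60}

print("presence:", presence_distance(genome_a, genome_b),
      presence_distance(genome_a, genome_b_duplicated))
print("abundance:", abundance_distance(genome_a, genome_b),
      abundance_distance(genome_a, genome_b_duplicated))
```

In this toy case the presence/absence distance is unchanged by the duplication, while the abundance distance grows more than tenfold.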
17. Trees of Archaea
[Figure: our tree compared with the NCBI taxonomy; group labels: Pyrococcus, Thermoplasma, Crenarchaeota, Methanogen, Euryarchaeota]
First leaf list as extracted: Pyrococcus furiosus, Pyrococcus horikoshii, Pyrococcus abyssi, Thermoplasma volcanium, Thermoplasma acidophilum, Halobacterium sp. NRC-1, Sulfolobus tokodaii, Sulfolobus solfataricus, Pyrobaculum aerophilum, Aeropyrum pernix, Methanosarcina mazei, Methanosarcina acetivorans, Archaeoglobus fulgidus, Methanopyrus kandleri, Methanocaldococcus jannaschii, Methanobacterium thermoautotrophicum, Methanothermobacter thermautotrophicus
Second leaf list as extracted: Sulfolobus tokodaii, Sulfolobus solfataricus, Pyrobaculum aerophilum, Aeropyrum pernix, Pyrococcus furiosus, Pyrococcus horikoshii, Pyrococcus abyssi, Thermoplasma volcanium, Thermoplasma acidophilum, Halobacterium sp. NRC-1, Methanosarcina mazei, Methanosarcina acetivorans, Methanocaldococcus jannaschii, Archaeoglobus fulgidus, Methanopyrus kandleri, Methanobacterium thermoautotrophicum, Methanothermobacter thermautotrophicus
Leaf numbering as extracted: 15 14 11 2 13 12 10 17 16 3 9 4 6 1 7 8 5
18. Our Tree of Bacteria
- 123 Bacteria
- Parasitic bacteria are not grouped with their full gene complement counterparts
- They are sorted into proper groupings that mirror the overall tree
- A few anomalies
19. Eukaryotes: Anomalies May Point to Genome Problems
20. A Closer Look at One Superfamily: The Protein Kinase-Like Superfamily
Eric Scheeff
Scheeff & Bourne (2005) PLoS Comp. Biol. 1(5): e49
A Closer Look at One Superfamily
21. The Protein Kinase-like Superfamily
- A large family important to signal transduction in eukaryotes and many bacteria.
- Phosphotransferases: transfer a phosphate group from ATP to a Ser/Thr or Tyr residue on the target protein, producing a range of downstream signaling effects.
- PKA: an example of a typical protein kinase (TPK) fold, shown in open book format
22. The Protein Kinase-Like Superfamily
- A range of different families, all phosphotransferases
- A variety of different targets
- All possess a core cassette of elements shared with the TPKs
  - ATP binding
  - Catalysis
- Structures can be highly variable, particularly in the substrate binding regions

Family | Structural representative | Phosphorylates | Biological result
Typical Protein Kinases (TPKs) | Protein Kinase A (PKA) | Ser/Thr or Tyr residues of proteins | Range of signaling effects
Alpha kinases | Channel Kinase (ChaK) | Ser/Thr residues in alpha-helices | Range of signaling effects
Actin-Fragmin Kinase (AFK) | Actin-Fragmin Kinase (AFK) | Thr residue of actin | Control of actin polymerization
Phosphatidylinositol 3- and 4-kinases | Phosphatidylinositol 3-kinase (PI3K) | Phosphatidylinositol (PI), PI-phosphates, PI-bisphosphates | Range of second-messenger signaling effects
Phosphatidylinositol phosphate kinases | Phosphatidylinositol phosphate kinase (PIPK) | PI-phosphates | Range of second-messenger signaling effects
Choline/ethanolamine kinases | Choline Kinase (CK) | Choline | Part of pathway that eventually produces phosphatidylcholine, an important constituent of membranes
Aminoglycoside Kinases | Aminoglycoside Kinases (AK) | Aminoglycoside antibiotics | Antibiotic resistance
23. Method
- Begin with a multiple structure alignment using CE-MC (NAR 2004) of 30 comparable TPKs and APKs (atypical protein kinases), and manually correct it in a pair-wise manner over a period of 1-2 person years
- Review the literature on each structure
- Review the associated sequence alignments derived from structure
24. [Figure: structures of PKA, ChaK (Channel Kinase), Phosphoinositide-3 Kinase (D) and Actin-Fragmin Kinase (E)]
25. Can We Propose an Evolutionary History for the Protein Kinase-Like Superfamily?
- Bayesian inference of phylogeny (MrBayes)
- Manual structure alignment produces a very high-quality sequence alignment of diverse homologues
- But sequence information is too degraded to produce branching with sufficient support (i.e. a high posterior probability)
- Addition of a matrix of structural characteristics (similar to morphological characteristics) produces a well supported combined model
- Neither sequence nor structural characteristics alone are sufficient to produce a resolved tree; they must be used in combination.

PDB ID  Group     1  2  3  4  5
1BO1    Atypical  0  0  0  0  1
1IA9    Atypical  1  1  1  1  0
1E8X    Atypical  1  0  1  1  1
1CJA    Atypical  1  0  1  1  1
1NW1    Atypical  1  0  1  0  0
1J7U    Atypical  1  0  1  0  1
1CDK    AGC       1  1  1  0  1
1O6L    AGC       1  1  1  0  1
1OMW    AGC       1  1  1  0  1
1H1W    AGC       1  1  1  0  1
1MUO    Other     1  1  1  0  1
1TKI    CAMK      1  0  1  0  1
1JKL    CAMK      1  0  1  0  1
1A06    CAMK      1  0  1  0  1
1PHK    CAMK      1  0  1  0  1
1KWP    CAMK      1  0  1  0  1
1IA8    CAMK      1  0  1  0  0
1GNG    CMGC      1  0  1  0  1
1HCK    CMGC      1  0  1  0  1
1JNK    CMGC      1  0  1  0  1
1HOW    CMGC      1  0  1  0  1
1LP4    Other     1  0  1  0  1
1F3M    STE       1  0  1  0  1
1O6Y    Other     1  0  1  0  1
1CSN    CK1       1  0  1  0  1
1B6C    TKL       1  0  1  0  1
2SRC    TK        1  0  1  0  1
1LUF    TK        1  0  1  0  1
1IR3    TK        1  0  1  0  1
1M14    TK        1  0  1  0  1
1GJO    TK        1  0  1  0  1

Example columns:
1) Ion pair analogous to K72-E91 in PKA
2) α-Helix B present
3) State of α-Helix C (0 kinked, 1 straight)
4) State of Strand 4 (0 kinked, 1 straight)
5) α-Helix D present
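The character matrix above can be treated like any binary morphological matrix. The sketch below is only an illustration of that encoding, not the MrBayes analysis itself: it copies six rows from the slide and counts how many of the five example characters differ between two entries. The function name is hypothetical.

```python
# Five example structural characters per PDB entry, copied from a subset of
# the rows on the slide. The full matrix used in the study is not shown here.
CHARACTERS = {
    "1BO1": ("Atypical", (0, 0, 0, 0, 1)),
    "1IA9": ("Atypical", (1, 1, 1, 1, 0)),
    "1CDK": ("AGC",      (1, 1, 1, 0, 1)),
    "1O6L": ("AGC",      (1, 1, 1, 0, 1)),
    "1TKI": ("CAMK",     (1, 0, 1, 0, 1)),
    "2SRC": ("TK",       (1, 0, 1, 0, 1)),
}

def character_differences(a, b):
    """Number of structural characters on which two entries disagree."""
    return sum(x != y for x, y in zip(CHARACTERS[a][1], CHARACTERS[b][1]))

for pair in [("1CDK", "1O6L"), ("1CDK", "1TKI"), ("1TKI", "2SRC"), ("1BO1", "1IA9")]:
    print(pair, character_differences(*pair))
```

With only these five example columns, a CAMK entry (1TKI) and a TK entry (2SRC) are indistinguishable, which is consistent with the slide's point that structural characters must be combined with sequence to resolve the tree.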
26. Proposed Evolutionary History for the Protein Kinase-Like Superfamily
- Suggests a distinctive history for atypical kinases, as opposed to intermittent divergence from the typical protein kinases (TPKs)
- TPK portion of the tree shows a high degree of agreement with the Manning tree
- Branching is supported by species representation of kinase families
[Figure: phylogenetic tree with leaves APH, AGC, CK, CAMK, AFK, CMGC, TKL, PI3K, CK1, TK, PIPKIIβ and ChaK; branch posterior probabilities as extracted: 0.64, 0.97, 1.0, 0.85, 0.78]
- Atypical kinase families: blue
- Typical protein kinase groups (subfamilies): red
- Branch labels: posterior probability of branch
27. Has the Environment had an Influence on Modern Day Proteomes?
Chris Dupont, Scripps Institution of Oceanography, UCSD
Dupont, Yang, Palenik, Bourne (2006) PNAS 103(47): 17822-17827
Environmental Influence
28. Consider the Distribution of Disulfide Bonds among Folds
- Disulfides are only stable under oxidizing conditions
- Oxygen content gradually accumulated during the Earth's evolution
- The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
- Oxygen began to accumulate 2.0 billion years ago
- Logical deduction: disulfides are more prevalent in folds (organisms) that evolved later
- This would seem to hold true
- Can we take this further?
29. Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth's History
- Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (2.3 Gya) is debated, therefore both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines).
- The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.
Replotted from Saito et al. (2003) Inorganica Chimica Acta 356: 308-318
30. Making the Metallome of Each Species Can Only be Done from Structure
- Start with SCOP
- Each superfamily level assignment was checked manually for metal binding
- All the structures representing the family had to bind the metal for it to be considered unambiguous
- The literature was consulted to resolve ambiguities
- SUPERFAMILY database used to map to proteomes
- 23 Archaea, 233 Bacteria, 57 Eukaryota
- Cu, Ni, Mo ignored (<0.3% of proteome)
31. Levels of Ambiguity
- Ambiguous superfamily: binds different metals or has members that are not known to bind metals
- Ditto families
- Approx. 50% of superfamilies and 10% of families are ambiguous
- Only unambiguous families used in this study
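The ambiguity rule can be written down as a small classifier. This is a minimal sketch of the rule as stated on the slides, assuming each member structure is annotated with the set of metals it binds; the function name and the example annotations are hypothetical, and the real annotation was done manually with literature checks.

```python
def classify(metals_per_structure):
    """Label a family or superfamily from the metals bound by its member structures.

    Each element is the set of metals bound by one representative structure;
    an empty set means no metal is known to bind.
    """
    bound = [frozenset(m) for m in metals_per_structure]
    if not any(bound):
        return "no metal"
    # Unambiguous only if every structure binds the same metal(s).
    if len(set(bound)) == 1:
        return "unambiguous"
    # Different metals, or some members not known to bind metals.
    return "ambiguous"

# Hypothetical annotations, for illustration only.
print(classify([{"Fe"}, {"Fe"}, {"Fe"}]))   # all members bind Fe
print(classify([{"Zn"}, {"Fe"}]))           # members bind different metals
print(classify([{"Zn"}, set()]))            # one member binds no metal
```

Only groups labeled "unambiguous" under a rule of this kind would be carried forward into the metallome counts.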
32. Superfamily Distribution As Well As Overall Content Has Changed
33. Metallomes are Discriminatory
- A quantile plot showing the percent of Bacterial proteomes each Fe-binding fold family occurs in (x).
- This plot also shows the average copy number of that fold family in the proteomes where it occurs (?).
- Few Fe-binding folds are in most proteomes.
- Widespread Fe-binding folds are not necessarily abundant.
- Similar trends are observed for Zn, Mn, and Co in all three Superkingdoms.
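The two quantities plotted for each fold family can be computed as in the sketch below. The family names and copy numbers are made up for illustration; only the two definitions (percent of proteomes containing the family, and mean copy number in the proteomes where it occurs) come from the slide.

```python
def occurrence_and_copy_number(proteomes, family):
    """Return (percent of proteomes containing the family, mean copies where present)."""
    counts = [p.get(family, 0) for p in proteomes]
    present = [c for c in counts if c > 0]
    percent = 100.0 * len(present) / len(counts)
    mean_copies = sum(present) / len(present) if present else 0.0
    return percent, mean_copies

# Hypothetical copy numbers for two made-up Fe-binding families in four proteomes.
PROTEOMES = [
    {"fam_widespread": 1, "fam_rare": 0},
    {"fam_widespread": 2, "fam_rare": 0},
    {"fam_widespread": 1, "fam_rare": 12},
    {"fam_widespread": 0, "fam_rare": 0},
]

for family in ("fam_widespread", "fam_rare"):
    percent, mean_copies = occurrence_and_copy_number(PROTEOMES, family)
    print(f"{family}: in {percent:.0f}% of proteomes, {mean_copies:.2f} copies where present")
```

In this toy data the widespread family averages under two copies per proteome while the rare one has twelve, mirroring the observation that widespread folds are not necessarily abundant.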
34. Metal Binding Proteins are Not Consistent Across Superkingdoms
Since these data are derived from current species, they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis
35. Power Laws: Fundamental Constants in the Evolution of Proteomes
- A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained in the case of genome reductions).
van Nimwegen E (2006) in Koonin EV, Wolf YI, Karev GP (Eds.), Power laws, scale-free networks, and genome biology
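The slope referred to above is the exponent of a power law, which can be estimated as the least-squares slope on log-log axes. The sketch below uses synthetic data generated only to exercise the fit; the measure of genome size and the fitting procedure used in the actual study are not specified here, so treat both as assumptions.

```python
import math

def loglog_slope(genome_sizes, domain_counts):
    """Least-squares slope of log(domain count) against log(genome size)."""
    xs = [math.log(g) for g in genome_sizes]
    ys = [math.log(c) for c in domain_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic data, generated here only to exercise the fit.
SIZES = [1000, 2000, 4000, 8000]
PROPORTIONAL = [0.02 * g for g in SIZES]         # count grows as size^1
EXPANDING = [0.0001 * g ** 1.5 for g in SIZES]   # count grows as size^1.5

print(f"proportional group slope: {loglog_slope(SIZES, PROPORTIONAL):.2f}")
print(f"expanding group slope:    {loglog_slope(SIZES, EXPANDING):.2f}")
```

A fitted slope near 1 corresponds to the equilibrium case, and a slope above 1 to preferential duplication or retention.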
36. Metal Binding Proteins are Not Consistent Across Superkingdoms
37. Why are the Power Laws Different for Each Superkingdom?
- Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power law slopes describing Eukarya and Prokarya are correlated to the shifts in trace metal geochemistry that occur with the rise in oceanic oxygen
- We hypothesize that proteomes contain an imprint of the environment at the time of the last common ancestor in each Superkingdom
38. Do the Metallomes Contain Further Support for this Hypothesis?
39. e- Transfer Proteins: Same Broad Function, Same Metal, Different Chemistry Induced by the Environment?
- Fe-S clusters
  - Fe bound by S
  - Cluster held in place by Cys
  - Generally negative reduction potentials
  - Very susceptible to oxidation
- Cytochromes
  - Fe bound by heme (and amino acids)
  - Generally positive reduction potentials
  - Less susceptible to oxidation
40. Agenda
- Why is protein structure useful?
- Tree construction using protein structure
- One protein superfamily in more detail
- Environmental influence
- On-going work
  - The role of calcium over time
  - Applying structural domain combinations
  - Co-evolution of kinases and phosphatases
41. The Role of Calcium
- Calcium concentrations have not fluctuated over evolutionary time scales to the same degree as iron and zinc
- Low diffusion rate and rapid kinetics
- Calcium is important for maintaining cell structure
- Calcium became a very important signaling molecule in multi-cellular organisms
The Role of Calcium
42. Calcium: Positive Selection Across All Superkingdoms
Large number of arylsulfatases
43. Calcium: Uni- vs. Multi-Cellular
44. Structural Domain Combinations
- Definition
  - Compact, spatially distinct
  - Fold in isolation
  - Recurrence
- Importance
  - Understand the structure and function of the whole protein
Structural Domain Combinations
45. Domain Trees Might Provide Insights into Horizontal Gene Transfer
[Figure: domain tree with groups Chlamydiales, Alveolata, Rhodophyta, Cyanobacteria, Metazoa, Actinobacteria]
- Exists only in Cyanobacteria
- Exists in only one red alga in Eukaryotes
a.1.1.3: phycocyanin-like phycobilisome proteins, a light harvesting antenna of photosystem II
46. Protein Kinases and Phosphatases
- Protein kinases and phosphatases are components of numerous signal transduction pathways
- They are responsible for regulating many cellular processes
- Implicated in many cancers and diseases
- Comprise a significant portion of genomes
  - At least 518 protein kinase genes
  - At least 107 protein tyrosine phosphatase genes
- Alonso et al. (2004) Cell 117(6): 699-711; Manning et al. (2002) Science 298: 1912-1934
Co-evolution Kinases and Phosphatases
47. Example: ADF/Cofilin
- The Cofilin/ADF (actin depolymerizing factor) family remodels the actin filaments of the cytoskeleton
- They sever actin filaments and increase the rate at which monomers leave the filament's pointed end
- Cofilin/ADF proteins are phosphorylated at a conserved N-terminal serine (Ser3)
- When phosphorylated, cofilin/ADF is unable to bind actin, and is thus inactive
- When dephosphorylated, cofilin/ADF can bind and depolymerize actin
48. Phosphorylation and Dephosphorylation of ADF/Cofilin
- Two serine/threonine kinase families can phosphorylate (deactivate) ADF/cofilin
  - LIMK
  - TESK
- Two phosphatase families have been identified that dephosphorylate ADF/Cofilin
  - Slingshot (SSH) phosphatases
  - Chronophin (CIN)
49. Coordinated Divergence
- Slingshot phosphatase and TESK and LIMK protein kinase families appear to have emerged at the same point in the eukaryotic tree
- They also underwent an apparent gene duplication at the same time (after Ciona divergence)
- Can the point of divergence be more accurately pinpointed as more organisms are sequenced?
[Figure: eukaryotic tree annotated with "Emergence" and "Gene Duplication"]
50. Parting Comments
- Structure plays a useful role at various levels of detail in the study of evolution
- Much of the data used here are sitting on the Web for anyone to apply
- Perhaps we should do more to train students in both the life sciences and the earth sciences?
51. Parting Comments
- The reductionism used here seems useful, but there is a growing sense that protein structure represents more of a continuum, perhaps composed of unique fragments at the sub-fold level: the Russian Doll effect
- Evidence is growing that proteins from different superfamilies may share a functional site but nothing else. Does this speak to a very distant evolutionary relationship?
52. Acknowledgements
- Kristine Briedis
- Andrew Butcher
- Russ Doolittle
- Chris Dupont
- Eric Scheeff
- Song Yang
- The Whole Group
- NSF, NIH
53. Backpocket
54. The Importance of Small Class Zn Folds to Eukarya
Distribution of 53 unique small class Zn families
55. Conclusions
- Metallomes have diverse compositions, yet the total abundances conform to evolutionary constants
- These constants exhibit Superkingdom-specific differences consistent with ancient changes in geochemistry, a hypothesis further supported by the roles of Zn and Fe
- These results provide genomic-based evidence for the theory of Anbar and Knoll that Eukaryotic diversification and oxygen-related changes in trace metal chemistry are linked
- Prokaryotes likely diverged in anoxic environments, while Eukaryotes diverged in oxic environments (supported by the fossil record)
56. Possible Flaws in the Argument
- Proteome coverage: currently only 40% of Eukaryotes and 55% of Prokaryotes are covered by structural families
- Estimate that 90% of the unannotated space is covered by existing families
57. Possible Flaws in the Argument
- Genome bias: there is a disproportionate number of thermophiles among Archaea, whereas the Eukaryotes are almost entirely aerobic
- Bacteria have a better distribution
- The dataset does include the Eukaryotic anaerobic amitochondriate parasite Encephalitozoon cuniculi, which has metallomic features typical of aerobic Eukaryotes
- Principal component analysis shows oxygen tolerance and environment have little effect upon the trends observed. Phylogenetic groupings are apparent, however (suggests vertical inheritance)
58. Possible Flaws in the Argument
- Zn concentrations are associated solely with increased complexity, not the environment
- Eukaryotes of varying complexity follow the same power law
- Zn finger abundance not consistent with complexity
- 3 Zn superfamilies found in Prokaryotes and Eukaryotes are more abundant across all Eukaryotes
59. Manual Annotation of SCOP (1.68) Superfamilies and Families
- 281 of the 1495 superfamilies have at least one metal associated structure at the domain level
- 50% of the 281 metal associated superfamilies are ambiguous; 10% of the families
- Zn associated superfamilies are the most prevalent, followed by Fe, Cu, Mn, Co, Mo, Ni
Dupont, Briedis, Yang, Palenik, Bourne (2005) In preparation.
60.
- Follows an orderly progression through evolution
- Domain duplication events remain proportional to genome size
- Occasionally follow a power law distribution
- Rough estimates of domain abundance, e.g., thioredoxins 1% of global proteome
61. Archaea (1-2% of the proteome), Bacteria (0.7-0.8%), Eukaryotes (0.01-0.05%)
Cytochrome c evolved after Bacteria/Archaea split
Proliferation of cytP450 in Eukaryotes
62. Case study II: Fe vs. Zn
- From 4 Mya to the present
- Fe concentrations in the ocean have fallen 10,000 fold
- Zn concentrations have risen 10,000,000 fold
63. Fe Binding
- 2-3% of Bacteria and Archaea proteomes are Fe-binding
- 0.5-1.5% of Eukaryota
Zn Binding
- 1.5-2.5% of Bacteria and Archaea proteomes are Zn-binding
- 4.5-5% of Eukaryota
64. Zn Binding by Kingdom
Hard ligands: Asp, Glu, Ser, Tyr
Soft ligands: Cys, His
Zn: Lewis acid reactions to informational systems (Zn fingers are >60% of Zn containing superfamilies in Eukaryotes!)
65. Future Work
- Ca concentrations have also changed dramatically: is this evident in modern proteomes and, if so, what are the evolutionary implications?
- Proteins associated with the nervous system 9 before a rapid expansion 0.5 Mya around the time of the TK transition
- c.19: ubiquitous Mg binding
- Evolution of photosynthesis