The Tree of Life Viewed by Protein Domain Content

Slides: 66
Provided by: soy3
Learn more at: http://www.sdsc.edu

1
Evolutionary Insights from Protein Structure
Philip E. Bourne, University of California San Diego
pbourne@ucsd.edu
Support Open Access. All the work here does.
2
Agenda
  • Why is protein structure useful?
  • Tree construction using protein structure
  • One protein superfamily in more detail
  • Environmental influence
  • On-going work
    • The role of calcium over time
    • Applying structural domain combinations
    • Co-evolution of kinases and phosphatases

3
[Figure: structures of PKA, ChaK (Channel Kinase), Phosphoinositide-3 Kinase (D) and Actin-Fragmin Kinase (E)]
Why is protein structure useful?
4
The Key is Nature's Reductionism
There are 20^300 possible 300-residue proteins, vastly more than all the atoms in the Universe
6.7M protein sequences from 4734 species (source: RefSeq)
34,494 protein structures yield 1086 folds (SCOP 1.73)
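A quick log-scale check of the first claim, assuming the exponent refers to chains of 300 residues built from the 20 standard amino acids, and using the commonly quoted order-of-magnitude figure of about 10^80 atoms in the observable universe (both are assumptions, not values stated on the slide):

```python
import math

# 20 standard amino acids; chain length 300 is read from the exponent in "20^300".
alphabet_size = 20
chain_length = 300

# log10(20**300) = 300 * log10(20): the number of possible sequences as a power of ten.
log10_sequences = chain_length * math.log10(alphabet_size)

# Assumption: ~10**80 atoms in the observable universe (order-of-magnitude estimate).
log10_atoms = 80

print(f"20^{chain_length} is about 10^{log10_sequences:.1f}")
print(f"that exceeds the atom count by about 10^{log10_sequences - log10_atoms:.1f}")
```

The exact integer 20**300 has 391 decimal digits, so the gap to ~10^80 is roughly 310 orders of magnitude.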
5
It follows that structure is more conserved than
sequence
  • Hence, structure comparison reveals relationships
    not detectable from sequence alone

Stated another way, structure offers the
opportunity to look at more distant evolutionary
relationships
6
Potential Problems in Using Structure on a
Proteomic Scale
  • Is structural space well enough populated?
  • Is proteome coverage by structure, with current detection methods, enough?
    • Currently 50-70%

7
Initial Bold Question
With this level of coverage, and assuming we know a high percentage of all folds, is structure useful in discriminating species?
Tree Construction Using Protein Structure
8
Song Yang, Former Graduate Student, Department of Chemistry and Biochemistry, UCSD
Russ Doolittle, Professor, Center for Molecular Genetics, UCSD
Yang, Doolittle & Bourne (2005) PNAS 102(2):373-8
9
To Answer this Question We Only Need to Make Use
of Existing Resources
  • SCOP further catalogs Nature's reductionism into structural domains, folds, families and superfamilies
  • SUPERFAMILY assigns the above to fully sequenced
    proteomes

10
Use of SCOP Superfamilies
  • Using structure, how do you distinguish
    convergent versus divergent evolution?
  • The SCOP notion of a SUPERFAMILY, which groups domains with evidence of weak sequence relationships, can be used to discount convergence.

11
Structural Organization (SCOP v1.73)
Classes | 7
Folds | 1086
Superfamilies | 1777
Families | 3464
Domains | 97178
12
Is Structure a Useful Discriminator - Maybe
Distribution among the three kingdoms as taken
from SUPERFAMILY
  • Superfamily distributions would seem to be
    related to the complexity of life
  • Update of the work of Caetano-Anollés & Caetano-Anollés (2003) Genome Research 13:1563

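[Figure: Venn diagram of SCOP superfamilies across Archaea, Bacteria and Eukaryota; each region gives the number found in any genome / in all genomes (values shown: 645/49, 310/0, 153/14, 68/0, 29/0, 21/2, 9/1 and 1)]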
13
The Unique Superfamily in Archaea d.17.6
  • Archaeosine tRNA-guanine transglycosylase (tgt),
    C2 domain
  • First step in the biosynthesis of an
    archaea-specific modified base, archaeosine
    (7-formamidino-7-deazaguanosine)
  • Found in tRNAs
  • Found exclusively in Archaea

Reference: InterPro IPR004804
14
Method: Distance Determination
Presence/Absence Data Matrix (1 = fold superfamily present, 0 = absent)
SCOP superfamily (FSF) | C. intestinalis | C. briggsae | F. rubripes
a.1.1 | 1 | 1 | 1
a.1.2 | 1 | 1 | 1
a.10.1 | 0 | 0 | 1
a.100.1 | 1 | 1 | 1
a.101.1 | 0 | 0 | 0
a.102.1 | 0 | 1 | 1
a.102.2 | 1 | 1 | 1
Distance Matrix
Organism | C. intestinalis | C. briggsae | F. rubripes
C. intestinalis | 0 | 101 | 109
C. briggsae | - | 0 | 144
F. rubripes | - | - | 0
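The slide does not name the distance measure. A minimal sketch, assuming the distance between two organisms is simply the number of superfamilies present in one and absent in the other (a Hamming distance on the 0/1 columns), applied to the seven-row excerpt above. Because the excerpt is only a small slice of SCOP, these toy distances are not expected to reproduce the 101/109/144 values in the distance matrix.

```python
from itertools import combinations

# Excerpt of the presence (1) / absence (0) matrix: superfamily -> one bit per organism.
organisms = ["C. intestinalis", "C. briggsae", "F. rubripes"]
presence = {
    "a.1.1":   (1, 1, 1),
    "a.1.2":   (1, 1, 1),
    "a.10.1":  (0, 0, 1),
    "a.100.1": (1, 1, 1),
    "a.101.1": (0, 0, 0),
    "a.102.1": (0, 1, 1),
    "a.102.2": (1, 1, 1),
}

def hamming_distance(i: int, j: int) -> int:
    """Count superfamilies present in exactly one of organisms i and j (assumed measure)."""
    return sum(row[i] != row[j] for row in presence.values())

# Upper triangle of the pairwise distance matrix.
distances = {
    (organisms[i], organisms[j]): hamming_distance(i, j)
    for i, j in combinations(range(len(organisms)), 2)
}
for (a, b), d in distances.items():
    print(f"{a} vs {b}: {d}")
```

A tree-building method (neighbor joining, for example) would then be run on the full distance matrix; that step is not shown.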
15
Is Structure a Useful Discriminator - Yes
[Figure: tree of all species, with Eukaryota, Bacteria and Archaea labelled]
The method cleanly placed all species in their correct superkingdoms
16
Presence/absence vs. Abundance
  • Abundance fails to distinctly separate the three
    superkingdoms
  • Presence/absence succeeds in distinctly
    separating the three superkingdoms
  • Why?
    • Emergence or loss of an FSF is a major evolutionary event
    • Emergence of a new FSF may lead to 1-n new functions
    • Loss of a gene is likely; loss of an entire FSF is less likely
    • Horizontal gene transfer is only relevant if it introduces an FSF
    • Not affected by gene duplication
    • Coverage and sensitivity, while not perfect, are sufficient
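The "not affected by gene duplication" point above can be illustrated with invented copy numbers (hypothetical superfamily counts, not data from the study). The sketch compares an assumed abundance distance (sum of absolute copy-number differences) with the presence/absence distance, before and after a duplication event in one proteome:

```python
# Hypothetical superfamily copy numbers for two proteomes (illustrative values only).
proteome_a = {"a.1.1": 3, "a.10.1": 1, "b.1.1": 12}
proteome_b = {"a.1.1": 3, "a.10.1": 0, "b.1.1": 12}

def abundance_distance(p: dict, q: dict) -> int:
    """Sum of absolute copy-number differences over all superfamilies."""
    keys = p.keys() | q.keys()
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

def presence_distance(p: dict, q: dict) -> int:
    """Number of superfamilies present (count > 0) in only one proteome."""
    keys = p.keys() | q.keys()
    return sum((p.get(k, 0) > 0) != (q.get(k, 0) > 0) for k in keys)

before = (abundance_distance(proteome_a, proteome_b),
          presence_distance(proteome_a, proteome_b))

# Simulate gene duplication in proteome B: copies of b.1.1 double.
proteome_b_dup = {**proteome_b, "b.1.1": 24}
after = (abundance_distance(proteome_a, proteome_b_dup),
         presence_distance(proteome_a, proteome_b_dup))

print("abundance, presence before:", before)  # (1, 1)
print("abundance, presence after: ", after)   # (13, 1)
```

Only the abundance distance moves when copies are added, which is one way to read why abundance blurs the superkingdom split while presence/absence does not.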

17
Trees of Archaea
[Figure: two trees of 17 Archaea, our tree and the NCBI tree, with clades labelled Pyrococcus, Thermoplasma, Crenarchaeota, Methanogen and Euryarchaeota]
Leaf order, first tree: Pyrococcus furiosus, Pyrococcus horikoshii, Pyrococcus abyssi, Thermoplasma volcanium, Thermoplasma acidophilum, Halobacterium sp. NRC-1, Sulfolobus tokodaii, Sulfolobus solfataricus, Pyrobaculum aerophilum, Aeropyrum pernix, Methanosarcina mazei, Methanosarcina acetivorans, Archaeoglobus fulgidus, Methanopyrus kandleri, Methanocaldococcus jannaschii, Methanobacterium thermoautotrophicum, Methanothermobacter thermautotrophicus
Leaf order, second tree: Sulfolobus tokodaii, Sulfolobus solfataricus, Pyrobaculum aerophilum, Aeropyrum pernix, Pyrococcus furiosus, Pyrococcus horikoshii, Pyrococcus abyssi, Thermoplasma volcanium, Thermoplasma acidophilum, Halobacterium sp. NRC-1, Methanosarcina mazei, Methanosarcina acetivorans, Methanocaldococcus jannaschii, Archaeoglobus fulgidus, Methanopyrus kandleri, Methanobacterium thermoautotrophicum, Methanothermobacter thermautotrophicus
18
Our Tree of Bacteria
  • 123 Bacteria
  • Parasitic bacteria are not grouped with their
    full gene complement counterparts
  • They are sorted into proper groupings that mirror
    the overall tree
  • A few anomalies

19
Eukaryotes: Anomalies May Point to Genome Problems
20
A Closer Look at One Superfamily: The Protein Kinase-Like Superfamily
Eric Scheeff
Scheeff & Bourne (2005) PLoS Comp. Biol. 1(5):e49
A Closer Look at One Superfamily
21
The Protein Kinase-like Superfamily
  • A large family important to signal transduction
    in eukaryotes and many bacteria.
  • Phosphotransferases: they transfer a phosphate group from ATP to a Ser/Thr or Tyr residue on a target protein, producing a range of downstream signaling effects.
  • PKA is an example of a typical protein kinase (TPK) fold, shown in open-book format

22
The Protein Kinase-Like Superfamily
  • A range of different families, all
    phosphotransferases
  • A variety of different targets
  • All possess a core cassette of elements shared with the TPKs
    • ATP binding
    • Catalysis
  • Structures can be highly variable, particularly
    in the substrate binding regions

Family | Structural Representative | Phosphorylates | Biological Result
Typical Protein Kinases (TPKs) | Protein Kinase A (PKA) | Ser/Thr or Tyr residues of proteins | Range of signaling effects
Alpha kinases | Channel Kinase (ChaK) | Ser/Thr residues in alpha-helices | Range of signaling effects
Actin-Fragmin Kinase (AFK) | Actin-Fragmin Kinase (AFK) | Thr residue of actin | Control of actin polymerization
Phosphatidylinositol 3- and 4-kinases | Phosphatidylinositol 3-kinase (PI3K) | Phosphatidylinositol (PI), PI-phosphates, PI-bisphosphates | Range of second-messenger signaling effects
Phosphatidylinositol phosphate kinases | Phosphatidylinositol phosphate kinase (PIPK) | PI-phosphates | Range of second-messenger signaling effects
Choline/ethanolamine kinases | Choline Kinase (CK) | Choline | Part of the pathway that eventually produces phosphatidylcholine, an important constituent of membranes
Aminoglycoside Kinases | Aminoglycoside Kinase (AK) | Aminoglycoside antibiotics | Antibiotic resistance
23
Method
  • Begin with a multiple structure alignment of 30 comparable TPKs and atypical protein kinases (APKs) using CE-MC (NAR 2004), then manually correct it in a pair-wise manner, a task that took 1-2 person-years
  • Review the literature on each structure
  • Review the associated sequence alignments derived
    from structure

24
[Figure: structures of PKA, ChaK (Channel Kinase), Phosphoinositide-3 Kinase (D) and Actin-Fragmin Kinase (E)]
25
Can We Propose an Evolutionary History for the
Protein Kinase-Like Superfamily?
  • Bayesian inference of phylogeny (MrBayes)
  • Manual structure alignment produces a very high-quality sequence alignment of diverse homologues
  • But sequence information is too degraded to produce branching with sufficient support (i.e., a high posterior probability)
  • Adding a matrix of structural characteristics (similar to morphological characteristics) produces a well-supported combined model
  • Neither sequence nor structural characteristics alone are sufficient to produce a resolved tree; they must be used in combination.

PDB ID | Group | 1 | 2 | 3 | 4 | 5
1BO1 | Atypical | 0 | 0 | 0 | 0 | 1
1IA9 | Atypical | 1 | 1 | 1 | 1 | 0
1E8X | Atypical | 1 | 0 | 1 | 1 | 1
1CJA | Atypical | 1 | 0 | 1 | 1 | 1
1NW1 | Atypical | 1 | 0 | 1 | 0 | 0
1J7U | Atypical | 1 | 0 | 1 | 0 | 1
1CDK | AGC | 1 | 1 | 1 | 0 | 1
1O6L | AGC | 1 | 1 | 1 | 0 | 1
1OMW | AGC | 1 | 1 | 1 | 0 | 1
1H1W | AGC | 1 | 1 | 1 | 0 | 1
1MUO | Other | 1 | 1 | 1 | 0 | 1
1TKI | CAMK | 1 | 0 | 1 | 0 | 1
1JKL | CAMK | 1 | 0 | 1 | 0 | 1
1A06 | CAMK | 1 | 0 | 1 | 0 | 1
1PHK | CAMK | 1 | 0 | 1 | 0 | 1
1KWP | CAMK | 1 | 0 | 1 | 0 | 1
1IA8 | CAMK | 1 | 0 | 1 | 0 | 0
1GNG | CMGC | 1 | 0 | 1 | 0 | 1
1HCK | CMGC | 1 | 0 | 1 | 0 | 1
1JNK | CMGC | 1 | 0 | 1 | 0 | 1
1HOW | CMGC | 1 | 0 | 1 | 0 | 1
1LP4 | Other | 1 | 0 | 1 | 0 | 1
1F3M | STE | 1 | 0 | 1 | 0 | 1
1O6Y | Other | 1 | 0 | 1 | 0 | 1
1CSN | CK1 | 1 | 0 | 1 | 0 | 1
1B6C | TKL | 1 | 0 | 1 | 0 | 1
2SRC | TK | 1 | 0 | 1 | 0 | 1
1LUF | TK | 1 | 0 | 1 | 0 | 1
1IR3 | TK | 1 | 0 | 1 | 0 | 1
1M14 | TK | 1 | 0 | 1 | 0 | 1
1GJO | TK | 1 | 0 | 1 | 0 | 1
Example columns: 1) Ion pair analogous to K72-E91 in PKA; 2) α-Helix B present; 3) State of α-Helix C (0 kinked, 1 straight); 4) State of Strand 4 (0 kinked, 1 straight); 5) α-Helix D present
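Structural characters like these are scored much like morphological characters. As a hedged sketch (not the authors' actual input files), a few rows of the matrix above can be written as a NEXUS DATA block using the discrete `standard` datatype that MrBayes accepts for morphological data. The combined analysis on this slide would also need the aligned sequences (MrBayes handles mixed partitions through its `mixed` datatype), which are omitted here. The `to_nexus` helper is hypothetical.

```python
# A few rows of the structural character matrix: PDB ID -> five 0/1 characters.
characters = {
    "1BO1": "00001",
    "1IA9": "11110",
    "1CDK": "11101",
    "2SRC": "10101",
}

def to_nexus(matrix: dict) -> str:
    """Render a discrete character matrix as a NEXUS DATA block (standard datatype)."""
    nchar = len(next(iter(matrix.values())))
    if any(len(states) != nchar for states in matrix.values()):
        raise ValueError("all rows must have the same number of characters")
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(matrix)} nchar={nchar};",
        "  format datatype=standard missing=? gap=-;",
        "  matrix",
    ]
    # One line per taxon: name, then its character states.
    lines += [f"    {taxon}  {states}" for taxon, states in matrix.items()]
    lines += ["  ;", "end;"]
    return "\n".join(lines)

print(to_nexus(characters))
```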
26
Proposed Evolutionary History for the Protein
Kinase-Like Superfamily
  • Suggests distinctive history for atypical
    kinases, as opposed to intermittent divergence
    from the typical protein kinases (TPKs)
  • TPK portion of tree shows high degree of
    agreement with Manning tree
  • Branching is supported by species representation
    of kinase families

[Figure: proposed tree of the protein kinase-like superfamily. Atypical kinase families (blue): APH, CK, AFK, PI3K, PIPKIIβ, ChaK. Typical protein kinase groups (subfamilies, red): AGC, CAMK, CMGC, TKL, CK1, TK. Branch labels give the posterior probability of each branch (values shown include 1.0, 0.97, 0.85, 0.78 and 0.64).]
27
Has the Environment had an Influence on Modern-Day Proteomes?
Chris Dupont, Scripps Institution of Oceanography, UCSD
Dupont, Yang, Palenik, Bourne (2006) PNAS 103(47):17822-17827
Environmental Influence
28
Consider the Distribution of Disulfide Bonds
among Folds
  • Disulfides are only stable under oxidizing conditions
  • Oxygen content gradually accumulated during the Earth's evolution
  • The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
  • Oxygen began to accumulate 2.0 billion years ago
  • Logical deduction: disulfides should be more prevalent in folds (organisms) that evolved later
  • This would seem to hold true
  • Can we take this further?

29
Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth's History
  • Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (2.3 Gya) is debated; therefore both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines).
  • The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.

Replotted from Saito et al. (2003) Inorganica Chimica Acta 356:308-318
30
Making the Metallome of Each Species Can Only Be Done from Structure
  1. Start with SCOP
  2. Each superfamily-level assignment was checked manually for metal binding
  3. All the structures representing a family had to bind the metal for the family to be considered unambiguous
  4. The literature was consulted to resolve ambiguities
  5. The SUPERFAMILY database was used to map families to proteomes
  6. Proteomes analyzed: 23 Archaea, 233 Bacteria, 57 Eukaryota
  7. Cu, Ni and Mo were ignored (<0.3% of the proteome)
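Steps 2-3 above amount to a filtering rule. A minimal sketch, assuming each family is given as a list of the metal sets bound by its representative structures (the family names and metal sets are hypothetical): a family is kept only if every representative binds the same single metal; otherwise it is treated as ambiguous and excluded.

```python
from typing import Optional

# Hypothetical families: each maps to the metal sets bound by its representative structures.
families = {
    "family_A": [{"Zn"}, {"Zn"}, {"Zn"}],
    "family_B": [{"Fe"}, {"Fe"}, set()],   # one member not known to bind a metal
    "family_C": [{"Mn"}, {"Co"}],          # members bind different metals
}

def unambiguous_metal(member_metals: list) -> Optional[str]:
    """Return the single metal if every member binds it (and only it), else None."""
    if not member_metals or any(not metals for metals in member_metals):
        return None  # an empty family or a member with no known metal is ambiguous
    all_metals = set().union(*member_metals)
    return next(iter(all_metals)) if len(all_metals) == 1 else None

for name, members in families.items():
    metal = unambiguous_metal(members)
    print(name, "->", metal if metal else "ambiguous (excluded)")
```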

31
Levels of Ambiguity
  • Ambiguous: a superfamily that binds different metals or has members that are not known to bind metals
  • The same criterion applies to families
  • Approx. 50% of superfamilies and 10% of families are ambiguous
  • Only unambiguous families used in this study

32
Superfamily Distribution As Well As Overall
Content Has Changed
33
Metallomes are Discriminatory
  • A quantile plot showing the percentage of bacterial proteomes in which each Fe-binding fold family occurs (x).
  • This plot also shows the average copy number of that fold family in the proteomes where it occurs.
  • Few Fe-binding folds are in most proteomes.
  • Widespread Fe-binding folds are not necessarily
    abundant.
  • Similar trends are observed for Zn, Mn, and Co in
    all three Superkingdoms.
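The two quantities on this plot can be computed from a table of per-proteome copy numbers. A sketch with invented counts (proteome and family names are hypothetical): for each fold family, the percentage of proteomes that contain it, and its mean copy number over only those proteomes.

```python
# Hypothetical copy numbers of Fe-binding fold families in four bacterial proteomes.
counts = {
    "proteome_1": {"fam_X": 4, "fam_Y": 1},
    "proteome_2": {"fam_X": 2},
    "proteome_3": {"fam_X": 6, "fam_Z": 9},
    "proteome_4": {"fam_X": 1},
}

def occurrence_and_copy_number(table: dict) -> dict:
    """Map family -> (% of proteomes where it occurs, mean copies where it occurs)."""
    n_proteomes = len(table)
    per_family = {}
    for fam_counts in table.values():
        for fam, n in fam_counts.items():
            if n > 0:
                per_family.setdefault(fam, []).append(n)
    return {
        fam: (100.0 * len(copies) / n_proteomes, sum(copies) / len(copies))
        for fam, copies in per_family.items()
    }

stats = occurrence_and_copy_number(counts)
# List families from least to most widespread.
for fam, (pct, mean_copies) in sorted(stats.items(), key=lambda kv: kv[1][0]):
    print(f"{fam}: in {pct:.0f}% of proteomes, mean copy number {mean_copies:.1f}")
```

In this toy table fam_X is in every proteome but averages about 3 copies, while fam_Z appears in a single proteome with 9 copies, mirroring the point that widespread families are not necessarily abundant.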

34
Metal Binding Proteins are Not Consistent Across
Superkingdoms
Since these data are derived from current species, they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis
35
Power Laws: Fundamental Constants in the Evolution of Proteomes
  • A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained, in the case of genome reductions).
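In this model the size of a domain group scales with genome size as n_group ≈ a · n_total^b, so the slope b is read off a straight-line fit on log-log axes. A minimal sketch, assuming an ordinary least-squares fit of log(n_group) on log(n_total) (the slides do not state the fitting method) and noise-free synthetic data generated with b = 1.5; `power_law_slope` is a hypothetical helper:

```python
import math

def power_law_slope(genome_sizes, group_sizes):
    """Least-squares slope b of log(group_size) = log(a) + b * log(genome_size)."""
    xs = [math.log(x) for x in genome_sizes]
    ys = [math.log(y) for y in group_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic genomes: total gene counts and a domain group growing as 0.002 * n**1.5.
totals = [500, 1000, 2000, 4000, 8000]
group = [0.002 * n ** 1.5 for n in totals]

b = power_law_slope(totals, group)
print(f"slope b = {b:.2f}")  # prints: slope b = 1.50 (b > 1: preferential expansion)
```

Feeding in a group that grows in direct proportion to genome size (for example 0.1 * n) returns a slope of 1, the equilibrium case described above.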

van Nimwegen E (2006) In: Koonin EV, Wolf YI, Karev GP (Eds.) Power Laws, Scale-Free Networks and Genome Biology
36
Metal Binding Proteins are Not Consistent Across
Superkingdoms
37
Why are the Power Laws Different for Each
Superkingdom?
  • Power laws are likely influenced by selective
    pressure. Qualitatively, the differences in the
    power law slopes describing Eukarya and Prokarya
    are correlated to the shifts in trace metal
    geochemistry that occur with the rise in oceanic
    oxygen
  • We hypothesize that proteomes contain an imprint
    of the environment at the time of the last common
    ancestor in each Superkingdom

38
Do the Metallomes Contain Further Support for
this Hypothesis?
39
e⁻ Transfer Proteins: Same Broad Function, Same Metal, Different Chemistry Induced by the Environment?
  • Fe-S clusters
    • Fe bound by S
    • Cluster held in place by Cys
    • Generally negative reduction potentials
    • Very susceptible to oxidation
  • Cytochromes
    • Fe bound by heme (and amino acids)
    • Generally positive reduction potentials
    • Less susceptible to oxidation

40
Agenda
  • Why is protein structure useful?
  • Tree construction using protein structure
  • One protein superfamily in more detail
  • Environmental Influence
  • On-going work
    • The role of calcium over time
    • Applying structural domain combinations
    • Co-evolution of kinases and phosphatases

41
The Role of Calcium
  • Calcium concentrations have not fluctuated over
    evolutionary time scales to the same degree as
    iron and zinc
  • Low diffusion rate and rapid kinetics
  • Calcium is important for maintaining cell structure
  • Calcium became a very important signaling molecule in multicellular organisms

The Role of Calcium
42
Calcium: Positive Selection Across All Superkingdoms
Large number of arylsulfatases
43
Calcium: Unicellular vs. Multicellular
44
Structural Domain Combinations
  • Definition
    • Compact, spatially distinct
    • Fold in isolation
    • Recurrence
  • Importance
    • Understand the structure and function of the whole protein

Structural Domain Combinations
45
Domain Trees Might Provide Insights into Horizontal Gene Transfer
[Figure: domain tree with clades Chlamydiales, Alveolata, Rhodophyta, Cyanobacteria, Metazoa and Actinobacteria]
a.1.1.3, phycocyanin-like phycobilisome proteins (a light-harvesting antenna of photosystem II): among Bacteria, exists only in Cyanobacteria; among Eukaryotes, exists in only one red alga
46
Protein Kinases and Phosphatases
  • Protein kinases and phosphatases are components
    of numerous signal transduction pathways
  • They are responsible for regulating many cellular
    processes
  • Implicated in many cancers and diseases
  • Comprise a significant portion of genomes
    • At least 518 protein kinase genes in the human genome (Manning et al. (2002) Science 298:1912-1934)
    • At least 107 protein tyrosine phosphatase genes in the human genome (Alonso et al. (2004) Cell 117(6):699-711)
Co-evolution of Kinases and Phosphatases
47
Example: ADF/Cofilin
  • The cofilin/ADF (actin depolymerizing factor) family remodels the actin filaments of the cytoskeleton
  • They sever actin filaments and increase the rate at which monomers leave the filament's pointed end
  • Cofilin/ADF proteins are phosphorylated at a
    conserved N-terminal serine (Ser3)
  • When phosphorylated, cofilin/ADF is unable to
    bind actin, and is thus inactive
  • When dephosphorylated, cofilin/ADF can bind and
    depolymerize actin

48
Phosphorylation and Dephosphorylation of
ADF/Cofilin
  • Two serine/threonine kinase families can phosphorylate (deactivate) ADF/cofilin
    • LIMK
    • TESK
  • Two phosphatase families have been identified that dephosphorylate ADF/cofilin
    • Slingshot (SSH) phosphatases
    • Chronophin (CIN)

49
Coordinated Divergence
  • The Slingshot phosphatase and the TESK and LIMK protein kinase families appear to have emerged at the same point in the eukaryotic tree
  • They also underwent an apparent gene duplication at the same time (after the divergence of Ciona)
  • Can the point of divergence be pinpointed more accurately as more organisms are sequenced?

[Figure: eukaryotic tree marking the emergence and the gene duplication of these families]
50
Parting Comments
  • Structure plays a useful role at various levels
    of detail in the study of evolution
  • Much of the data used here are sitting on the Web
    for anyone to apply
  • Perhaps we should do more to train students in
    both the life sciences and the earth sciences?

51
Parting Comments
  • The reductionism used here seems useful, but there is a growing sense that protein structure represents more of a continuum, perhaps composed of unique fragments at the sub-fold level (the "Russian doll" effect)
  • Evidence is growing that proteins from different superfamilies may share a functional site but nothing else; does this speak to a very distant evolutionary relationship?

52
Acknowledgements
  • Kristine Briedis
  • Andrew Butcher
  • Russ Doolittle
  • Chris Dupont
  • Eric Scheeff
  • Song Yang
  • The Whole Group
  • NSF, NIH

53
Backpocket
54
The importance of small class Zn folds to
Eukarya
Distribution of 53 unique small class Zn
families
55
Conclusions
  • Metallomes have diverse compositions, yet the
    total abundances conform to evolutionary
    constants
  • These constants exhibit Superkingdom-specific
    differences consistent with ancient changes in
    geochemistry, a hypothesis further supported by
    the roles of Zn and Fe
  • These results provide genomic-based evidence for
    the theory of Anbar and Knoll that Eukaryotic
    diversification and oxygen-related changes in
    trace metal chemistry are linked
  • Prokaryotes likely diverged in anoxic environments, while Eukaryotes diverged in oxic environments (supported by the fossil record)

56
Possible Flaws in the Argument
  • Proteome coverage: currently only 40% of eukaryotic and 55% of prokaryotic proteomes are covered by structural families
  • An estimated 90% of the unannotated space is covered by existing families

57
Possible Flaws in the Argument
  • Genome bias: there is a disproportionate number of thermophiles among Archaea, whereas the Eukaryotes are almost entirely aerobic
  • Bacteria have a better distribution
  • The dataset does include the anaerobic, amitochondriate eukaryotic parasite Encephalitozoon cuniculi, which has metallomic features typical of aerobic Eukaryotes
  • Principal component analysis shows that oxygen tolerance and environment have little effect upon the trends observed; phylogenetic groupings are apparent, however (suggesting vertical inheritance)

58
Possible Flaws in the Argument
  • Zn concentrations are associated solely with increased complexity, not the environment
    • Eukaryotes of varying complexity follow the same power law
    • Zn finger abundance is not consistent with complexity
    • 3 Zn superfamilies found in Prokaryotes and Eukaryotes are more abundant across all Eukaryotes

59
Manual Annotation of SCOP (1.68) Superfamilies
and Families
  • 281 of the 1495 superfamilies have at least one metal-associated structure at the domain level
  • 50% of the 281 metal-associated superfamilies are ambiguous, as are 10% of the families
  • Zn-associated superfamilies are the most prevalent, followed by Fe, Cu, Mn, Co, Mo and Ni

Dupont, Briedis, Yang, Palenik, Bourne (2005) In preparation.
60
  • Follows an orderly progression through evolution
    - domain duplication events remain proportional
    to genome size
  • Occasionally follow power law distribution
  • Rough estimates of domain abundance, e.g., thioredoxins are 1% of the global proteome

61
Archaea (1-2% of the proteome), Bacteria (0.7-0.8%), Eukaryotes (0.01-0.05%)
Cytochrome c evolved after the Bacteria/Archaea split
Proliferation of cytochrome P450 in Eukaryotes
62
Case Study II: Fe vs. Zn
  • From 4Mya to the present
  • Fe concentrations in the ocean have fallen 10,000-fold
  • Zn concentrations have risen 10,000,000-fold

63
Fe Binding
  • 2-3% of Bacteria and Archaea proteomes are Fe-binding
  • 0.5-1.5% of Eukaryota

Zn Binding
  • 1.5-2.5% of Bacteria and Archaea proteomes are Zn-binding
  • 4.5-5% of Eukaryota

64
Zn Binding by Kingdom
Hard ligands: Asp, Glu, Ser, Tyr; soft ligands: Cys, His
Zn: Lewis acid reactions to informational systems (Zn fingers are >60% of Zn-containing superfamilies in Eukaryotes!)
65
Future Work
  • Ca concentrations have also changed dramatically; is this evident in modern proteomes and, if so, what are the evolutionary implications?
  • Proteins associated with the nervous system 9
    before a rapid expansion .5 Mya around the time
    of the TK transition
  • c.19 ubiquitous Mg binding
  • Evolution of photosynthesis