Day 5: Comparative genome analysis

Provided by: cmbi
1
Day 5: Comparative genome analysis
  • Added value of complete genomes
  • More sequences
  • Large-scale pattern detection in genomes
  • -Function/orthology prediction by bi-directional
    best hit approaches
  • More than bags of genes (?)
  • -Presence/absence/variation of pathways
  • -Prediction of new pathways

2
Exponential growth of the number of sequenced
genomes, doubling time of 16 months
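As a small worked example of what a 16-month doubling time implies, the sketch below assumes pure exponential growth. The function name and default are illustrative, not from the slides.

```python
def growth_factor(months, doubling_time_months=16.0):
    """Fold increase after `months`, assuming pure exponential growth."""
    return 2.0 ** (months / doubling_time_months)

# Fold increase in the number of sequenced genomes over one year.
print(round(growth_factor(12), 2))
# Four years span exactly three doubling periods.
print(growth_factor(48))
```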
3
Analyze and compare genomes at various levels:
  • DNA (e.g. GC content (we actually do not need
    sequencing for that), dinucleotide frequencies,
    coding densities of leading/lagging strands,
    GC skew, etc.)
  • Protein coding potential (e.g. coding density)
  • Presence/absence/size of protein families
  • Presence/absence of genes/comparing at the level
    of orthologs
  • Gene order evolution
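The DNA-level statistics can be computed directly from a sequence string. A minimal sketch, assuming the genome is available as a plain string of A/C/G/T; the function names are illustrative, and ambiguous bases such as N are simply skipped.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide in `seq`."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Drop dinucleotides that contain an ambiguous base.
    pairs = [p for p in pairs if set(p) <= set("ACGT")]
    counts = Counter(pairs)
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}

print(gc_content("GGCCAATT"))
print(dinucleotide_frequencies("ACGT"))
```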
4
Strand asymmetries (genes)
Asymmetry in the density of coding/non-coding
sequence in B. subtilis (Kunst et al., Nature 1997)
5
Strand asymmetries (nucleotide frequencies)
GC skew (inner circle, (G-C)/(G+C)) in a complete
genome (Nitrosomonas europaea)
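GC skew is usually plotted per window along the genome. A minimal sketch, assuming the skew is defined as (G-C)/(G+C) and using non-overlapping windows; the window size is an arbitrary choice, not taken from the slide.

```python
def gc_skew(seq, window=1000, step=1000):
    """(G-C)/(G+C) for successive windows along `seq`."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window without any G or C gets a skew of 0.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

# Toy sequence: a G-rich window followed by a C-rich window.
print(gc_skew("GGGC" + "CCCG", window=4, step=4))
```

In a real bacterial genome the sign of the skew typically switches near the origin and terminus of replication, which is what makes the plot useful.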
6
The number of ORFs per nucleotide is more or less
constant in prokaryotes (Doolittle, Nature 2002).
Interesting exceptions: 1100 pseudogenes in M.
leprae; overprediction of ORFs in A. pernix.
7
Functional elements in the human genome
3.4 × 10^9 nt, 20,000 protein genes
  • tRNA, rRNA: 0.5%
  • Coding regions (proteins): 1.7%
  • Satellite DNA (centromeres, telomeres): 12%
  • Non-translated RNA genes: Xist, H19, His-1, bic,
    microRNAs, etc.
  • Regulatory elements: promoters, enhancers, etc.
  • Transposable elements (LINEs, SINEs, ...): 40-45%
  • Introns: 34%
  • Intergenic DNA: 52%
  • 86% no (known) function
8
Gene family size distribution for Bacillus
subtilis (Kunst et al., Nature 1997)
9
A power law in the gene family size
distribution: Y ~ X^b
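One common way to estimate the exponent b is a straight-line fit on log-log axes. The sketch below assumes the form Y = a * X^b and uses ordinary least squares; the input data are synthetic, not the B. subtilis counts.

```python
import math

def fit_power_law(sizes, counts):
    """Least-squares fit of log(count) = log(a) + b * log(size)."""
    xs = [math.log(x) for x in sizes]
    ys = [math.log(y) for y in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic data generated from Y = 1000 * X^-2.
sizes = [1, 2, 4, 8]
counts = [1000 * s ** -2 for s in sizes]
a, b = fit_power_law(sizes, counts)
print(round(a, 3), round(b, 3))
```

A log-log regression is simple but sensitive to the sparse tail of large families; it is meant here only to illustrate the relation.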
10
Buchnera metabolism, deduced from the genome
(Shigenobu et al., Nature, 2000)
11
Genome annotation of Buchnera: classifying
functions into functional categories (Shigenobu
et al., Nature, 2000)
12
  • Evolution of gene content
  • 1) Quantitative approaches: count the number of
    genes that two genomes share (orthology) and
    relate that to their phylogenetic distance.
    - Is there a rate of gene content evolution?
      (quantitative trends)
    - Can we actually reconstruct genome evolution?
      (what happened when, and what are the primary
      processes?)
  • 2) Qualitative approaches: interpret the
    differences between two genomes in terms of the
    functions of the encoded proteins.
    - To what extent can we explain the differences
      between the phenotypes in terms of the genomes'
      gene content?
    - Are there functional patterns, e.g. in genome
      size variation? (qualitative trends)

13
Rate of genome evolution in terms of gene content
(Huynen and Bork, PNAS, 1998)
14
Genome phylogeny based on gene content
  • Count the number of shared orthologs between
    genomes using the bi-directional best,
    significant, hit approach (include
    fusion/fission)
  • Create a similarity matrix by dividing number of
    shared orthologs by the genome size of the
    smallest genome
  • Create a distance based phylogeny from the
    similarity matrix

Snel et al., 1999, Nat. Genet. 21:108; Huynen et
al., 1999, Science 286:1441a
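The three steps can be sketched as follows. This is an illustration under stated assumptions, not the published pipeline: the best-hit dictionaries are assumed to contain only significant hits, fusion/fission is ignored, and converting similarity to distance as 1 - similarity is a choice made here. All gene and genome names are hypothetical.

```python
from itertools import combinations

def bidirectional_best_hits(best_ab, best_ba):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}

def gene_content_distances(genome_sizes, shared_orthologs):
    """Distance = 1 - shared orthologs / gene count of the smaller genome."""
    dist = {}
    for g1, g2 in combinations(sorted(genome_sizes), 2):
        shared = shared_orthologs[frozenset((g1, g2))]
        similarity = shared / min(genome_sizes[g1], genome_sizes[g2])
        dist[(g1, g2)] = 1.0 - similarity
    return dist

# Toy best-hit tables: a3 hits b2, but b2 prefers a2, so (a3, b2) is dropped.
best_ab = {"a1": "b1", "a2": "b2", "a3": "b2"}
best_ba = {"b1": "a1", "b2": "a2", "b3": "a1"}
orthologs = bidirectional_best_hits(best_ab, best_ba)
print(sorted(orthologs))

sizes = {"A": 3, "B": 3}
print(gene_content_distances(sizes, {frozenset(("A", "B")): len(orthologs)}))
```

The resulting distances can then be fed to any distance-based tree method, such as neighbor joining.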
15
(No Transcript)
16
Convergence in gene content is also visible in
phylogenetic trees that are based on the number
of shared genes instead of the fraction of shared
genes. Large bacterial genomes (E. coli,
B. subtilis, M. tuberculosis, Synechocystis)
cluster together, and so do small genomes
(R. prowazekii, C. trachomatis).
17
Shared gene content between Archaea and Bacteria
depends on genome size
18
The topology of a genome phylogeny based on gene
content shows a high similarity to 16S and 23S
rRNA phylogenies
So what!! "(...) If instances of lateral gene
transfer can no longer be dismissed as the
exceptions that prove the rule, it must be
admitted that (...) unless organisms are
constructed as either less or more than the sum
of their genes there is no unique organismal
phylogeny" (W. F. Doolittle, Science 284,
2124-2128)
19
  • Horizontal (lateral) gene transfer
  • The evolutionary history of a gene is not always
    consistent with the history of the species
  • Discovering horizontal gene transfer by:
    - Relative levels of sequence identity.
    - Comparing phylogenetic trees of the species (SSU
      rRNA) and that of the gene in question. Be
      careful, however: the sequences have to be
      orthologous to each other. Ancient gene
      duplications followed by differential loss can
      also give rise to trees that look like
      horizontal gene transfer.
    - Codon usage that differs from that of the other
      genes in the genome.
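The codon usage criterion can be illustrated with a simple distance between a gene's codon frequencies and the genome-wide frequencies. This is a sketch only: the distance used (half the summed absolute differences, between 0 and 1) is an assumption made here, it is not normalized per amino acid, and short genes will give noisy values.

```python
from collections import Counter

def codon_frequencies(cds_list):
    """Relative codon frequencies over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        # Ignore a trailing partial codon, if any.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            counts[cds[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}

def codon_usage_distance(gene_cds, genome_freqs):
    """0 = identical codon usage, 1 = no codons in common."""
    gene_freqs = codon_frequencies([gene_cds])
    codons = set(gene_freqs) | set(genome_freqs)
    return sum(abs(gene_freqs.get(c, 0.0) - genome_freqs.get(c, 0.0))
               for c in codons) / 2

# Toy genome that encodes alanine only with GCT.
genome = codon_frequencies(["GCTGCTGCT"])
print(codon_usage_distance("GCTGCT", genome))  # same usage as the genome
print(codon_usage_distance("GCCGCC", genome))  # different synonymous codon
```

Genes with an unusually high distance are candidates for recent transfer, although highly expressed genes can also deviate from the genome average.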

20
(Figure labels: Eukaryotes, Mitochondria, Archaea,
Bacteria)
No apparent horizontal gene transfer in the
evolution of leucine aminoacyl-tRNA synthetase
(the phylogeny of the sequences more or less fits
the species phylogeny).
21
Apparent horizontal gene transfer to the
parasites Bbu (B. burgdorferi) and Mge, Mpe
(Mycoplasmas) from the eukaryotes, represented by
Cel (C. elegans) and Sce (S. cerevisiae)
22
Relatively few families do not display any
horizontal gene transfer. This has led to the
discussion of whether we can actually talk about
a genome phylogeny (see the Doolittle quote). We
argue that there is a strong, dominant
phylogenetic signal in gene content, and thus one
can speak about a genome phylogeny. But that is,
of course, open for discussion.
23
Reconstructing the course of genome evolution via
a parsimonious approach. Primary processes:
  • Gene gain
    - invention
    - gene duplication
    - horizontal gene transfer
  • Gene loss
    - accumulation of mutations (pseudogene)
    - gene deletion
  • Gene fusion/fission
24
Determining the relative contribution of these
processes in genome evolution requires the
reconstruction of the most likely evolution per
orthologous group of proteins, and adding up the
results. Thus we also explicitly reconstruct
the ancestors of the present genomes. NB. These
approaches are based on the size of orthologous
groups, not based on phylogenetic trees.
Because these methods are not based on trees, we
need an HGT penalty to distinguish between HGT
and multiple losses.
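The role of the HGT penalty can be shown with a weighted-parsimony sketch for one orthologous group. This is an illustration, not the method used in the slides: it assumes a known species tree given as nested tuples, treats every gain on a branch as costing `gain_cost` (the penalty) and every loss as costing `loss_cost`, and leaves the origin of the gene at the root free. The tree and species names are hypothetical.

```python
INF = float("inf")

def parsimony_costs(tree, present_in, gain_cost=2.0, loss_cost=1.0):
    """Return (cost if the gene is absent, cost if present) at the root of `tree`.

    `tree` is a nested tuple of species names; `present_in` is the set of
    species whose genome contains a member of the orthologous group.
    """
    if isinstance(tree, str):  # leaf: only the observed state is allowed
        return (INF, 0.0) if tree in present_in else (0.0, INF)
    absent = present = 0.0
    for child in tree:
        child_absent, child_present = parsimony_costs(
            child, present_in, gain_cost, loss_cost)
        # Parent lacks the gene: the child lacks it too, or gains it.
        absent += min(child_absent, child_present + gain_cost)
        # Parent has the gene: the child keeps it, or loses it.
        present += min(child_present, child_absent + loss_cost)
    return absent, present

species_tree = ((("A", "B"), "C"), "D")  # hypothetical species tree
# Gene found only in A and D: two losses (cost 2) beat two gains (cost 4).
print(parsimony_costs(species_tree, {"A", "D"}))  # → (4.0, 2.0)
```

With a lower `gain_cost` the same presence/absence pattern would instead be explained by transfer, which is why the choice of penalty matters for the reconstructed ancestors.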
25
Gene content evolution is a highly dynamic
process. Even in the evolution towards the
largest genome (e.g. E. coli), a large number of
genes have been lost.
26
Rope as a metaphor to describe an organismal
lineage (Gary Olsen). Individual fibers = genes
that travel for some time in a lineage.
While no individual fiber present at the
beginning might be present at the end, the rope
(or the organismal lineage) nevertheless has
continuity.
27
However, the genome as a whole will acquire the
character of the incoming genes (the rope turns
solidly red over time).
28
  • Qualitative differential genome analysis
  • Find pathogen-specific proteins that
    can serve as drug targets
  • Relate the differences between genomes to the
    differences in the phenotypes

29
Interpreting the differences between genomes in
terms of the functions of their genes
H. influenzae genome
Huynen et al., 1997 Trends Genet 13, 389
30
Three-way comparisons
Huynen et al., 1998, FEBS Lett 426, 1-5
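A three-way comparison boils down to set operations on orthologous groups. A minimal sketch, assuming each genome has already been reduced to a set of orthologous group identifiers; the identifiers and genome names below are hypothetical.

```python
def three_way_comparison(a, b, c):
    """Partition the orthologous groups of three genomes into Venn regions."""
    return {
        "all_three": a & b & c,
        "a_and_b_only": (a & b) - c,
        "a_and_c_only": (a & c) - b,
        "b_and_c_only": (b & c) - a,
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
    }

pathogen = {"og1", "og2", "og3", "og4"}
relative = {"og1", "og2", "og5"}
outgroup = {"og1", "og3", "og6"}
regions = three_way_comparison(pathogen, relative, outgroup)
for name, groups in regions.items():
    print(name, sorted(groups))
```

The groups in `a_only` are the ones to inspect first when looking for pathogen-specific proteins.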
31
Although we can, qualitatively, interpret the
variations in shared gene content in terms of the
phenotypes of the species, quantitatively they
depend on the relative phylogenetic positions of
the species. The closer two species are, the
larger the fraction of their genes they share.
32
Correlation between the amount of regulation per
gene and the size of the genome. Small genomes
tend to lose their regulation → have few
alternative modes of action, and live in
relatively constant environments.
33
Large genomes spend relatively many proteins on
regulation, few on cell division and other
household functions (van Nimwegen, Trends in
Genetics, 2003)
34
A bottom-up approach to superfamily distribution:
supralinearly behaving families tend to be
involved in gene regulation (60 to 80), linearly
behaving families tend to be involved in
metabolism (82 to 87), and logarithmically
behaving families do not show a specific
preponderance of functional classes (Orengo et
al., TIG 2004)
35
The number of regulatory genes versus the number
of metabolic genes
The derivatives of the curves above
The difference between the numbers above is
maximal when the genome size is about 4800 →
maximum amount of metabolic versatility for
minimum number of regulators
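Why the derivatives locate the maximum can be seen in a toy model. Purely for illustration, assume metabolic genes grow linearly with genome size, a*n, and regulatory genes quadratically, c*n^2. The difference a*n - c*n^2 is largest where the two derivatives, a and 2*c*n, are equal. The coefficients below are made up and are not fitted to the curves on the slide.

```python
def size_of_max_difference(a, c):
    """Genome size n that maximises a*n - c*n**2 (where a == 2*c*n)."""
    return a / (2.0 * c)

a, c = 1.0, 0.0001  # hypothetical coefficients
analytic = size_of_max_difference(a, c)

# Brute-force check over integer genome sizes.
numeric = max(range(1, 10001), key=lambda n: a * n - c * n * n)
print(analytic, numeric)
```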
36
Gene order evolution
  • Establish orthologous relations between pairs of
    genomes (e.g. S-W best bidirectional hit
    approach)
  • Put them in a dotplot, color the relative
    direction of transcription (green for the same
    relative direction, red for the opposite
    direction)
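The dotplot data can be prepared as follows. A minimal sketch, assuming the ortholog pairs are already known and each gene is annotated with a position and a strand; the data structures and names are illustrative, and the actual plotting is left to any plotting library.

```python
def dotplot_points(orthologs, genes_a, genes_b):
    """Return (position in A, position in B, color) for each ortholog pair.

    `genes_a` and `genes_b` map a gene name to (position, strand).
    """
    points = []
    for gene_a, gene_b in orthologs:
        pos_a, strand_a = genes_a[gene_a]
        pos_b, strand_b = genes_b[gene_b]
        # Green: same relative direction of transcription; red: opposite.
        color = "green" if strand_a == strand_b else "red"
        points.append((pos_a, pos_b, color))
    return points

genes_a = {"a1": (0, "+"), "a2": (1, "-")}
genes_b = {"b1": (5, "+"), "b2": (4, "+")}
print(dotplot_points([("a1", "b1"), ("a2", "b2")], genes_a, genes_b))
```

Runs of points on a diagonal indicate conserved gene order; an anti-diagonal run of red points is the signature of an inversion.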
37
(No Transcript)
38
  • Evolution of genome organization
  • In prokaryotes, genome inversions centered around
    the origin/terminus of replication are a major
    source of genome rearrangements.
  • This suggests that both replication forks are in
    close contact: comparative genome analysis
    provides support for a hypothesis about genome
    replication.
  • A close proximity of the forks would increase the
    probability of reciprocal recombination or
    transposition between sequences at the two forks.
    That the forks are near each other is also
    consistent with the 'replication factory' model
    based on immunolocalization of components of the
    replication machinery in Bacillus subtilis
    (Tillier and Collins, 2000, Nat. Genet.)
  • Prokaryotic genomes tend to be shuffled in a
    comparatively short time (relative to the total
    amount of time in the evolutionary tree)

39
Newport and Yan, Current Biology, 1996
40
Rapid shuffling of genomes (compared to 16S rRNA
identity)
41
Some species (MP-MG, CP-CT) show a significantly
lower rate of genome shuffling than others. A
possible explanation is the absence of the
protein RecA from these genomes. RecA is involved
in recombination → absence of recombination would
slow down genome shuffling.
42
Further Reading
  • Comparative genome analysis: Eppinger M, Baar C,
    Raddatz G, Huson DH, Schuster SC (2004)
    Comparative analysis of four Campylobacterales.
    Nat Rev Microbiol 2(11):872-85.
  • Gene order evolution by inversion: Suyama M, Bork
    P (2001) Evolution of prokaryotic gene order:
    genome rearrangements in closely related species.
    Trends Genet 17:10-3.
  • Scaling of gene functional classes: van Nimwegen
    E (2003) Scaling laws in the functional content
    of genomes. Trends Genet 19(9):479-84.