Plastid genomes

1
Plastid genomes
  • A small structure occurring in the cytoplasm of
    plant cells. The most important are the
    chloroplasts. Other plastids contain red, orange,
    and yellow pigments, giving color to petals and
    fruits, and some contain starch, oil, etc.,
    acting as storage organelles.
  • 30 genomes finished for 29 organisms
  • http://megasun.bch.umontreal.ca/ogmp/projects/other/cp_list.html

2
Chloroplast DNA (cpDNA)
  • circular double helix, 20-80 copies per
    chloroplast
  • sequences for gene expression (tRNA, rRNA, etc.)
  • sequences for photosynthesis (proteins)
  • no recombination
  • uniparental inheritance
  • conservative evolution
  • nuclear genetic code

7
Genomes
  • The whole genomes of over 800 organisms can be
    found in Entrez Genomes. The genomes represent
    both completely sequenced organisms and those for
    which sequencing is in progress. All three main
    domains of life - bacteria, archaea, and
    eukaryota - are represented, as well as many
    viruses.

8
Genome miniaturization
  • Use and disuse philosophy
  • Mt Genome size following endosymbiosis
  • Reclinomonas (62 protein encoding genes)
  • Plastid Genome size in parasites
  • Epifagus (beechdrops)

10
Phylogenetic distribution of gene loss from
chloroplast genomes. Colour keys designating
frequency of parallel gene losses are given at
top right. Numbers below species names indicate
the number of protein coding genes and ycfs in
the corresponding chloroplast genome. Numbers
above gene columns represent the number of genes
lost which are accounted for in the figure for
the given genome. The symbols for primary and
secondary symbiosis are indicated. Five genes
were excluded from gene-loss analysis for reasons
indicated at the lower left. Some highly
divergent proteins may have escaped detection
with BLAST searches. Functional, transferred
nuclear homologues of chloroplast origin are
indicated in white rectangles. In Pinus, four ndh
genes are completely missing (ndhA, ndhF, ndhG,
ndhJ); the other seven are pseudogenes and are
scored as losses here.
11
Genes are just one of many types of DNA sequences
  • single copy genes
  • multiple copy genes
  • noncoding repetitive sequences (often, most of
    genome!)

12
Increase in Genome Size
  • Regional (particular sequence is multiplied)
  • Gene duplication, unequal crossing over
  • Global (entire genome or chromosome is
    duplicated)
  • Polyploidization
  • Transposons

13
Polyploidy
  • Allopolyploidy: the combination of genetically
    distinct chromosome sets
  • Autopolyploidy: multiplication of one basic set
    of chromosomes

14
Tetraploidy
  • Genome doubling
  • Most common
  • Is found in most organisms

15
Survive only rarely
  • Prolongation of cell division time
  • Increase in the volume of the nucleus
  • Increase in chromosome disjunctions
  • Genetic imbalance
  • Interference with sexual differentiation

16
Arabidopsis
  • 115.4 megabases sequenced out of 125 Mb
  • Whole genome duplication, gene loss and lateral
    transfer from plastid

18
Arabidopsis genes
20
Appearance of genomes
What does 50 kb of sequence look like?
  • One to many chromosomes
  • Repeat sequences common in some genomes, e.g. 35%
    of the human genome is transposable elements
  • Gene structure varies: number and length of introns

[Figure: 50 kb of sequence from three genomes, with repeats, pseudogenes and the intron-exon components of genes marked]
  • Human: very few genes, repeats
  • Yeast: many genes (25), few repeats
  • Maize: mostly repeats
21
Gene Duplication
  • Partial or internal gene duplication
  • Complete gene duplication
  • Partial chromosomal duplication
  • Polyploidy or genome duplication

22
Gene Duplication
  • Duplicative transposition
  • Unequal crossing-over
  • Replication slippage
  • Gene amplification (rolling circle replication)

23
Antifreeze glycoprotein gene
  • Fish living in the Antarctic Ocean have body
    temperatures of -1.0 to -0.7 °C.
  • Freezing resistance is due to a protein in the
    blood that adsorbs small ice crystals and
    inhibits their growth

24
Internal Gene Duplication
[Figure: steps from the ancestral trypsinogen gene (segments 1-6) to the antifreeze glycoprotein gene (segments 1-41): deletion; 4-fold duplication plus addition of spacer sequence (Thr Ala Ala Gly, spacer Gly); internal duplications plus addition of intron sequence]
25
Theory of gene duplication
26
Gene trees vs species trees
27
Gene trees vs species trees
28
Gene trees vs species trees
[Figure: tree diagrams with tips labelled A, C, B and 3, 1, 2]
29
[Figure: the four nucleotides A, G, C and T]
30
Rates of Nucleotide Substitution
  • Basic quantity in studying molecular evolution
  • Among genes
  • Within genes
  • Among organisms
  • Among codon positions or secondary structure

31
Different Gene Regions
  • Coding regions
  • Nondegenerate sites
  • Twofold degenerate sites
  • Fourfold degenerate sites
  • Noncoding regions
  • 5' and 3' untranslated regions
  • Introns
  • Pseudogenes
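
The site classes listed above can be worked out mechanically from the genetic code. The sketch below is an illustration only, assuming the standard genetic code; the names `CODON_TABLE` and `degeneracy` are made up for this example. It counts how many of the four bases at a codon position leave the amino acid unchanged. It also reports a threefold class (third position of the isoleucine codons ATT, ATC, ATA), which is not on the slide; some classification schemes fold it into the twofold class.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy(codon, position):
    """Number of bases (1-4) at `position` (0, 1 or 2) that keep the amino acid.

    1 = nondegenerate, 2 = twofold degenerate, 4 = fourfold degenerate.
    """
    amino_acid = CODON_TABLE[codon]
    synonymous = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == amino_acid:
            synonymous += 1
    return synonymous


if __name__ == "__main__":
    for codon in ("GGA", "TTT", "ATG"):
        print(codon, [degeneracy(codon, pos) for pos in range(3)])
```

For glycine (GGA) the first two positions are nondegenerate and the third is fourfold degenerate; for phenylalanine (TTT) the third position is twofold degenerate.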

32
Table 4.1 Rates of synonymous and nonsynonymous
nucleotide substitutions (± standard errors) in
various mammalian protein-coding genes
33
Table 4.2 Rates of transitional and
transversional substitutions (per site per 10^9
years) at nondegenerate, twofold degenerate, and
fourfold degenerate codon sites
Note: The rates are averages over the genes in Table
4.1.
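
As a reminder of what is being counted in Table 4.2: a transition swaps purine for purine (A, G) or pyrimidine for pyrimidine (C, T), and a transversion swaps a purine for a pyrimidine or vice versa. A minimal sketch, with hypothetical function names and made-up toy sequences, that counts both kinds of difference between two aligned sequences:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'identical', 'transition' or 'transversion' for two bases."""
    if a == b:
        return "identical"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_ts_tv(seq1, seq2):
    """Count (transitions, transversions) between two aligned sequences.

    Positions with gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        kind = classify_substitution(a, b)
        if kind == "transition":
            transitions += 1
        elif kind == "transversion":
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    print(count_ts_tv("AACGT", "GATGA"))
```

These are raw counts of observed differences, not rates; turning them into rates per site per 10^9 years needs a divergence time and a correction for multiple hits.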
34
Noncoding regions
35
Causes of Rate Variation
  • Functional constraints

36
Causes of Rate Variation
  • Synonymous vs. Nonsynonymous rates
  • Should be similar in rate (Ka/Ks = 1)
  • Why not?
  • Selection
  • Advantageous
  • Purifying
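
The reasoning above can be written as a small decision rule. This is only a sketch: the function name and the tolerance around 1 are arbitrary choices for illustration, and in practice the ratio would be tested statistically rather than compared against a fixed band.

```python
def selection_regime(ka, ks, tolerance=0.1):
    """Interpret a Ka/Ks ratio.

    ka, ks    -- nonsynonymous and synonymous substitution rates
    tolerance -- assumed half-width of the band treated as "about 1"
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        return "purifying"      # nonsynonymous changes are being removed
    if ratio > 1 + tolerance:
        return "advantageous"   # nonsynonymous changes are being favoured
    return "neutral"            # both classes change at about the same rate


if __name__ == "__main__":
    # Ka = 0.2 and Ks = 1.5 substitutions/site/10^9 years
    print(selection_regime(0.2, 1.5))
```

With Ka = 0.2 and Ks = 1.5 the ratio is about 0.13, well below 1, which is the usual situation for protein-coding genes under purifying selection.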

37
Causes of Rate Variation
Variation within a gene
38
Causes of Rate Variation
  • Variation among genes
  • Rate of mutation
  • The intensity of selection (1000 fold in Ks)
  • Intensity of purifying selection (functional
    constraint)
  • Partial loss of function
  • Relaxation of selection

39
Nucleotide Substitution Rates in Eukaryotic
Genomes

Genome                          Ks rate   Relative Ks rate   Ka rate
Angiosperm mt                   0.5       1                  0.1
Angiosperm cp single copy       1.5       3                  0.2
Angiosperm cp inverted repeat   0.3       0.6                0.1
Angiosperm nuc.                 5.4       12                 0.4
Mammalian nuc.                  2-8       4-16               0.5-1.3
Mammalian mt                    20-50     40-100             2-3

Estimated rate of substitutions/site/10^9 years.
From Palmer, 1991
40
Phylogenetic trees are about visualizing
evolutionary relationships
"Nothing in Biology Makes Sense Except in the
Light of Evolution"
Theodosius Dobzhansky (1900-1975)
41
Trees
  • Diagram consisting of branches and nodes

[Figure: example tree with leaves A, B, C, D and E, labelled with the following terms]
  • terminal node (leaf)
  • interior node (vertex)
  • split (bipartition), also written ABCDE or
    portrayed ---
  • branch (edge)
  • root of tree
42
Trees
  • Species tree (how are my species related?)
  • contains only one representative from each
    species
  • when did speciation take place?
  • all nodes indicate speciation events
  • Gene tree (how are my genes related?)
  • normally contains a number of genes from a single
    species
  • nodes relate either to speciation or gene
    duplication events

43
Cladogram
44
Phenogram or Phylogram
45
Number of unrooted trees
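
For n taxa (n ≥ 3) the number of distinct unrooted, strictly bifurcating trees is (2n-5)!! = 1 × 3 × 5 × ... × (2n-5), and the number of rooted trees is (2n-3)!!. A short sketch (function names are illustrative) shows how quickly these numbers grow:

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees: 1 * 3 * 5 * ... * (2n - 5)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


def num_rooted_trees(n_taxa):
    """Number of rooted bifurcating trees: 1 * 3 * 5 * ... * (2n - 3)."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    count = 1
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 20):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Ten taxa already give 2,027,025 unrooted trees, which is why exhaustive search is only feasible for small data sets and heuristic or branch-and-bound searches are used beyond that.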
46
Terms
  • Clade: a set of species which includes all of
    the species derived from a single common ancestor
  • Monophyly
  • Polyphyly
  • Paraphyly
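
The clade definition can be checked mechanically: a group is monophyletic exactly when some node of the rooted tree has that group, and nothing else, as its descendants. The sketch below assumes a rooted tree written as nested tuples of taxon names; the function names and the simplified amniote topology are illustrative assumptions, not taken from the slides.

```python
def leaves(tree):
    """Return the set of leaf names below a node (a name or a tuple of subtrees)."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some node of `tree` has exactly `group` as its set of leaves."""
    group = frozenset(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


if __name__ == "__main__":
    # Simplified, illustrative topology only.
    amniotes = ((("lizard", "snake"), ("crocodile", "bird")), "mammal")
    print(is_monophyletic(amniotes, {"crocodile", "bird"}))
    print(is_monophyletic(amniotes, {"lizard", "snake", "crocodile"}))
```

On this toy tree, crocodile plus bird forms a clade, while lizard, snake and crocodile without bird does not, because the group leaves out one descendant of its common ancestor.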

47
Monophyletic
Paraphyletic
[Figure: example trees with tips labelled A, B and C; a branch and a node are marked]
48
Polyphyletic (Reptiles)
[Figure: example tree with tips labelled A, B and C; a branch and a node are marked]
49
Phylogeny Estimation
  • Camin-Sokal Parsimony
  • Wagner Parsimony
  • Fitch Parsimony
  • Transversion Parsimony
  • Generalized Parsimony

  • Transition/transversion bias
  • Nucleotide composition
  • Among-site rate variation
  • Synonymous/nonsynonymous
  • Relaxed clock models
50
Distance methods
  • Calculate the distance, CORRECTING FOR MULTIPLE
    HITS
  • The Distance Matrix

    7
    Rat       0.0000  0.0646  0.1434  0.1456  0.3213  0.3213  0.7018
    Mouse     0.0646  0.0000  0.1716  0.1743  0.3253  0.3743  0.7673
    Rabbit    0.1434  0.1716  0.0000  0.0649  0.3582  0.3385  0.7522
    Human     0.1456  0.1743  0.0649  0.0000  0.3299  0.2915  0.7116
    Opossum   0.3213  0.3253  0.3582  0.3299  0.0000  0.3279  0.6653
    Chicken   0.3213  0.3743  0.3385  0.2915  0.3279  0.0000  0.5721
    Frog      0.7018  0.7673  0.7522  0.7116  0.6653  0.5721  0.0000

51
Distance methods
  • Normally fast and simple
  • e.g. UPGMA, Neighbour Joining, Minimum Evolution,
    Fitch-Margoliash

52
Correction for multiple hits
  • Only differences can be observed directly, not
    distances
  • All distance methods rely (crucially) on this
  • A great many models used for nucleotide sequences
    (e.g. JC, K2P, HKY, Rev, Maximum Likelihood)
  • Amino acid (aa) sequences are far more complicated!
  • Accuracy falls off drastically for highly
    divergent sequences
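
The simplest of the models listed above, Jukes-Cantor (JC), turns the observed proportion of differing sites p into a distance d = -(3/4) ln(1 - 4p/3). A small sketch (function names are illustrative) shows the size of the correction and why it breaks down for very divergent sequences:

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def jc69_distance(p):
    """Jukes-Cantor distance (expected substitutions per site) from p."""
    if p >= 0.75:
        # At p >= 3/4 the sequences look random and the log is undefined.
        raise ValueError("sequences are saturated; JC distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for p in (0.01, 0.10, 0.30, 0.50, 0.70):
        print(f"p = {p:.2f}  d = {jc69_distance(p):.4f}")
```

For small p the correction is negligible, but as p approaches 0.75 the estimated distance grows without bound and small sampling errors in p produce large errors in d.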

53
Maximum Parsimony
  • Occam's Razor
  • Entia non sunt multiplicanda praeter
    necessitatem.
  • William of Occam (1300-1349)

The best tree is the one which requires the fewest
substitutions
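
Counting the minimum number of substitutions on a given tree can be done with Fitch's algorithm, one of the parsimony variants named earlier. The sketch below assumes a rooted binary tree written as nested 2-tuples and a toy alignment invented for illustration; the function names are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree   -- a taxon name, or a 2-tuple of subtrees
    states -- dict mapping taxon name to its base at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one substitution is needed on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Sum the per-site minimum number of changes over the whole alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        column = {taxon: seq[site] for taxon, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


if __name__ == "__main__":
    toy = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
    for candidate in ((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))):
        print(candidate, parsimony_length(candidate, toy))
```

On this toy alignment the tree grouping A with B needs 2 substitutions and the tree grouping A with C needs 4, so parsimony prefers the first.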
54
Maximum Likelihood
  • Requires a model of evolution
  • Each substitution has an associated likelihood
    given a branch of a certain length
  • A function is derived to represent the likelihood
    of the data given the tree, branch-lengths and
    additional parameters

55
The Likelihood Criterion
  • Given two trees, the one maximizing the
    probability of the observed data is best
  • Site likelihood: probability of the data for one
    site, conditional on the assumed model of
    evolution
  • Site log-likelihood: natural logarithm of the site
    likelihood (often abbreviated lnL)
  • Tree score: sum of site log-likelihoods (the term
    "score" is also a general term for the derivative
    of the lnL)
  • Unlike parsimony tree lengths, log-likelihoods
    are comparable across models as well as trees
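
The definitions above can be tried out on the smallest possible case: two sequences joined by a single branch of length d under the Jukes-Cantor model. This compares branch lengths rather than tree topologies, and the toy sequences, function names and grid search are assumptions made for illustration.

```python
import math


def site_likelihood(x, y, d):
    """Probability of observing bases x and y at one site under JC69.

    d is the branch length in expected substitutions per site; base
    frequencies are all 1/4.
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_change = 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay
    return 0.25 * p_change


def log_likelihood(seq1, seq2, d):
    """Score for branch length d: sum of site log-likelihoods (lnL)."""
    return sum(math.log(site_likelihood(x, y, d)) for x, y in zip(seq1, seq2))


def best_branch_length(seq1, seq2, step=0.001, maximum=1.0):
    """Crude grid search for the d that maximizes lnL."""
    candidates = [step * i for i in range(1, int(maximum / step) + 1)]
    return max(candidates, key=lambda d: log_likelihood(seq1, seq2, d))


if __name__ == "__main__":
    s1, s2 = "ACGTACGTAC", "ACGTACGTAT"   # 1 difference in 10 sites
    d_hat = best_branch_length(s1, s2)
    print(d_hat, log_likelihood(s1, s2, d_hat))
```

For these sequences the maximum lies close to the Jukes-Cantor corrected distance for p = 0.1 (about 0.107). Real programs optimize numerically over all branch lengths and model parameters at once instead of using a grid.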

56
Models can be made more parameter rich to
increase their realism
  • The most common additional parameters are
  • A correction to allow different substitution
    rates for each type of nucleotide change
  • A correction for the proportion of sites which
    are unable to change
  • A correction for variable site rates at those
    sites which can change
  • The values of the additional parameters will be
    estimated in the process (e.g. PAUP)

57
A gamma distribution can be used to model site
rate heterogeneity
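
A quick way to see what the gamma shape parameter does is to draw relative site rates from a gamma distribution with mean 1. In this sketch the shape is called alpha and the scale is set to 1/alpha so that the mean rate is 1; the function names, the threshold of 0.1 and the sample size are arbitrary choices for illustration.

```python
import random


def sample_site_rates(alpha, n_sites, seed=0):
    """Draw relative rates for n_sites from Gamma(shape=alpha, scale=1/alpha).

    The mean rate is 1 and the variance is 1/alpha, so a small alpha means
    strong rate heterogeneity among sites.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


def fraction_nearly_invariant(rates, threshold=0.1):
    """Share of sites evolving at less than `threshold` times the mean rate."""
    return sum(rate < threshold for rate in rates) / len(rates)


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 5.0):
        rates = sample_site_rates(alpha, 20000)
        mean_rate = sum(rates) / len(rates)
        print(alpha, round(mean_rate, 3), round(fraction_nearly_invariant(rates), 3))
```

With a small alpha many sites are almost invariant while a few evolve very fast; with a large alpha the rates cluster around 1, which approaches the equal-rates assumption.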
58
Comparison of methods
  • Inconsistency
  • Neighbour Joining (NJ) is very fast but depends
    on accurate estimates of distance. This is more
    difficult with very divergent data
  • Parsimony suffers from Long Branch Attraction.
    This may be a particular problem for very
    divergent data
  • NJ can suffer from Long Branch Attraction
  • Parsimony is also computationally intensive
  • Codon usage bias can be a problem for MP and NJ
  • Maximum Likelihood is the most reliable but
    depends on the choice of model and is very slow
  • Methods may be combined

59
How confident am I that my tree is correct?
  • Bootstrap values
  • Bootstrapping is a statistical technique that
    can use random resampling of data to determine
    sampling error for tree topologies

60
Bootstrapping phylogenies
  • Characters are resampled with replacement to
    create many bootstrap replicate data sets
  • Each bootstrap replicate data set is analysed
    (e.g. with parsimony, distance, ML etc.)
  • Agreement among the resulting trees is summarized
    with a majority-rule consensus tree
  • Frequencies of occurrence of groups, bootstrap
    proportions (BPs), are a measure of support for
    those groups
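
The resampling step can be sketched in a few lines. Alignment columns are drawn with replacement to build each replicate, and a caller-supplied function decides whether a replicate supports the group of interest. That callback (`supports_group`) is a hypothetical stand-in for "build a tree and check whether the group is present"; the toy criterion and toy alignment are invented for illustration.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (same length as original)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in columns)
            for taxon, seq in alignment.items()}


def bootstrap_proportion(alignment, supports_group, n_replicates=100, seed=0):
    """Fraction of bootstrap replicates in which `supports_group` returns True."""
    rng = random.Random(seed)
    hits = sum(
        bool(supports_group(bootstrap_replicate(alignment, rng)))
        for _ in range(n_replicates)
    )
    return hits / n_replicates


def a_closer_to_b_than_c(replicate):
    """Toy criterion: A differs from B at fewer sites than A differs from C."""
    def differences(x, y):
        return sum(p != q for p, q in zip(replicate[x], replicate[y]))
    return differences("A", "B") < differences("A", "C")


if __name__ == "__main__":
    toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACTTGT"}
    print(bootstrap_proportion(toy, a_closer_to_b_than_c, n_replicates=200))
```

In a real analysis each replicate would be analysed with the same tree-building method as the original data, and the bootstrap proportion for a group is the share of replicate trees that contain it.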

61
Bootstrapping - an example
Ciliate SSUrDNA - parsimony bootstrap
[Figure: majority-rule consensus tree. Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9). Bootstrap proportions shown on the branches: 100, 84, 96, 100, 100, 100]
62
Bootstrapping
Majority-rule consensus (with minority components)
Wim de Grave et al. Fiocruz bioinformatics
training course
63
Bootstrap - interpretation
  • Bootstrapping is a very valuable and widely used
    technique (it is demanded by some journals)
  • BPs give an idea of how likely a given branch
    would be to be unaffected if additional data,
    with the same distribution, became available
  • BPs are not the same as confidence intervals.
    There is no simple mapping between bootstrap
    values and confidence intervals. There is no
    agreement about what constitutes a good
    bootstrap value (> 70%, > 80%, > 85%?)
  • Some theoretical work indicates that BPs can be a
    conservative estimate of confidence intervals
  • If the estimated tree is inconsistent, all the
    bootstraps in the world won't help you.