Lecture 1: Overview of Phylogenetic methods and applications

Provided by: MatthewP80
1
Lecture 1: Overview of Phylogenetic methods and applications
Allan Wilson
2
Charles Darwin and Alfred Russel Wallace
Evolution as descent with modification, implying relationships between organisms by unbroken genetic lines.
Phylogenetics seeks to determine these genetic relationships.
Alfred Russel Wallace
Darwin's sketch: the first phylogenetic tree?
Charles Darwin
3
Interpretation of morphological characters is
often subjective, so open to personal biases
Figure: opalized lower jaw of the monotreme Steropodon
Jaw rotation character states by taxon (figure labels: Ji et al., Hu et al.):
  • Cynodonts (0)
  • Morganucodonts (1)
  • Eutriconodonts (1)
  • Spalacotheriids (2)
  • Eupantotheres (2)
  • Archaic therians (2)
  • Modern therians (2)

e.g. Jaw rotation: weak (0), moderate (1) or strong (2), as indicated by vertical wear facets on molars.
Hu et al. (Nature, 1997) and Ji et al. (Nature, 1999) coded Steropodon as (1) and (2) respectively, helping to account for their alternative placements of monotremes.
4
Deoxyribonucleic acid (DNA): Watson, Crick, Wilkins and Franklin
5
Early molecular phylogenetics
  • Immunological distances
  • DNA-DNA hybridization

Without access to the actual sequences, it is difficult to apply corrections and statistical significance testing to these methods.

6
Phylogenetics is now dominated by the clearly
defined 4 nucleotides and 20 amino acids
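Purines: A, G
Pyrimidines: C, T
Transitions (purine to purine, or pyrimidine to pyrimidine); transversions (purine to pyrimidine, or the reverse)
(Figure: hominid phylogeny from DNA, with a time axis in millions of years)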
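The purine/pyrimidine split above is enough to classify any nucleotide substitution. A minimal sketch, assuming plain A/C/G/T strings of equal length; the function names are illustrative and not from the lecture:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'identical', 'transition' or 'transversion' for two nucleotides."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("expected one of A, C, G, T")
    if a == b:
        return "identical"
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def ti_tv_counts(seq1, seq2):
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    kinds = [classify_substitution(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("transition"), kinds.count("transversion")
```

The counts are raw observed differences; they are not corrected for multiple substitutions at the same site.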
7
Tree terminology
(Figure: a rooted tree and an unrooted tree of Taxon 1 to Taxon 8, labelled with node, internode, internal edge/branch and external edge/branch)
8
outgroup
ingroup
polyphyly
Sister taxa
paraphyly
polytomy
bifurcating
9
Overview of phylogenetic procedure - by example
  • 1. Biological problem (the question)
  • 2. Which data to obtain (data sampling)
  • 3. Finding the best tree (search strategy)
  • 4. Defining the best tree (optimality criterion)

10
1. Biological problem (the question)
What is the relationship of the extinct American
Cheetah (Miracinonyx trumani) to other cats?
  • Two main sister group hypotheses
  • A. Cheetahs (Acinonyx jubatus): limb, skull and vertebrae morphology
  • B. Pumas (Felis concolor): geography, early fossils less cheetah-like

See Barnett et al. (Curr. Biol., 2005)
11
2. Which data to obtain (data sampling)
  • Mitochondrial (mt) DNA
  • High mtDNA copy number is important because
    ancient DNA is degraded
  • Inferring relatively recent (2-10 million year)
    divergences, so substantial sequence variation is
    required

mt control region: best for < 2 million years
mt protein/RNA coding: best for 2-25 million years
Nuclear protein-coding: best for > 25 million years
(Figure: observed divergence plotted against time)
12
Mitochondrial partial NADH1 alignment for birds
#NEXUS
Begin DATA;
Dimensions ntax=29 nchar=10692;
Format datatype=dna gap=-;
Matrix
Tinamou        AACTATCTATTCATATCCTTATCATACATCATTCCTATTCTTATTGCA..
Emu            AACCATCTCACTATATCACTCTCCTATGCAATCCCCATTCTAATCGCA..
Cassowary      AACCACCTCACCATATCCCTGTCCTATGCAATCCCAATTCTAATCGCA..
Kiwi           AACTACCTCACTATATCACTATCATATGTCATCCCAATTCTGATTGCA..
Rhea           AACTACCTAATTATGTCCCTGTCATATGCTATCCCAATTCTAATCGCA..
Ostrich        ACACACCTGACTATAGCACTCTCATACGCTGTTCCAATCCTAATTGCA..
Chicken        AACCTTCTAATCATAACCTTATCCTATATTCTCCCCATCCTAATCGCC..
BrushTurkey    AAACACCTCATCATATCCCTATCCTATGTTCTCCCAATTTTAATCGCC..
MagpieGoose    AATCACCTCATTATAACCCTATCGTATGCCATCCCAATCCTAATCGCC..
Duck           AGCTACCTCATTATATCCCTCCTATACGCCATCCCCATTCTAATCGCC..
Broadbill      ACTAACCTTACCATATCCCTATCCTACGCCATCCCCGTCCTAGTTGCC..
Flycatcher     ACCCACCTCATTATATCACTATCCTATGCCGTACCCATCCTAATTGCT..
ZebraFinch     ATTAACCTCATCATAGCCCTCTCCTATGCCCTCCCAATCCTGATCGCA..
Rook           GTCAACCTCATTATAGCACTTTCTTATGCTATCCCTATTCTAATCGCC..
Oystercatcher  ACCTATCTCATTATATCCCTATCCTATGCCATCCCAATCCTGATCGCA..
Turnstone      ACCTACTTCATCATATCCCTATCCTATGCAATCCCAATTCTAATTGCA..
Penguin        GCTCACTTAGCCATATCCCTATCCTATGCCATCCCAATCCTCATTGCA..
Albatross      ACCTATCTTGTCATGTCCCTATCATATGCCATCCCAATCCTAATCGCC..
;
End;
13
Tree reconstruction
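Type of data: distances or discrete characters (e.g. nucleotides)
Reducing sequences to distances loses information, which often means a loss of statistical power
Tree-building method: clustering algorithm (faster) or optimality criterion (slower)
  • Clustering algorithms (distances): unweighted pair group method with arithmetic means (UPGMA), neighbour-joining (NJ)
  • Optimality criteria (distances): minimum evolution (ME)
  • Optimality criteria (discrete): maximum parsimony (MP), maximum likelihood (ML)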
14
3. Finding the best tree (search strategy)
Number of possible trees (where n is the number of taxa)
Unrooted trees: (2n-5) × (2n-7) × ... × 3 × 1
Rooted trees: (2n-3) × (2n-5) × ... × 3 × 1
For the 11-taxon cat phylogeny:
Unrooted: 17 × 15 × 13 × 11 × 9 × 7 × 5 × 3 × 1 = 34,459,425
Rooted: Unrooted × (2n-3) = 654,729,075
An exhaustive search will examine all trees, but is not practical for n > 12
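The tree counts for the 11-taxon example can be checked with a few lines of code. A minimal sketch of the two products given on this slide; the function names are illustrative:

```python
def odd_product(k):
    """Return k * (k-2) * ... * 3 * 1 for odd k (1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return odd_product(2 * n - 5)


def rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 2 taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return odd_product(2 * n - 3)


print(unrooted_trees(11), rooted_trees(11))
```

Each added taxon multiplies the count by the next odd number, which is why exhaustive search stops being practical so quickly.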
15
Reducing the time for searching tree space
Heuristic search
Find an initial tree, and move within nearby tree-space, discarding worse alternatives.
Only a small amount of tree-space is searched and there is no guarantee of finding the optimal tree: the search can be trapped in local optima.
(Figure: tree-space landscape showing the starting point, local optima and the global optimum)
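The trap described here can be shown on a toy problem. The sketch below is not a tree search: it assumes a hypothetical one-dimensional landscape of scores in which each position has only its left and right neighbours, and it always moves to the better neighbour until no neighbour improves the score.

```python
def hill_climb(scores, start):
    """Greedy local search: step to the best neighbour until none is better."""
    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < len(scores)]
        if not neighbours:
            return current
        best = max(neighbours, key=lambda i: scores[i])
        if scores[best] <= scores[current]:
            return current  # no neighbour improves: a local (maybe global) optimum
        current = best


# Hypothetical landscape: a local optimum at index 2, the global optimum at index 6.
landscape = [1, 3, 5, 4, 2, 6, 9, 7]

print(hill_climb(landscape, start=0))  # stops on the local optimum
print(hill_climb(landscape, start=4))  # reaches the global optimum
```

The outcome depends entirely on the starting point, which is why heuristic tree searches are commonly repeated from several different starting trees.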
16
Branch and Bound search
As trees are built and branches added, if the addition of a taxon to a particular branch results in a tree-length greater than a previously determined upper bound for the tree, then this topology and all those derived from it are ignored and the search continues with a new placement for that taxon.
Branch and bound guarantees finding globally optimal trees.
(Figure: tree-space landscape showing the starting point, local optima and the global optimum)
17
4. Defining the best tree (optimality criteria)
Distance methods
Absolute distance matrix
                  1    2    3    4    5    6    7    8    9   10   11
 1 Mongoose       -
 2 Hyena        156    -
 3 Sabretooth   207  147    -
 4 Am.Cheetah   192  140  159    -
 5 Lion         186  134  148  131    -
 6 Tiger        160  143  132  111   64    -
 7 Puma         194  139  162   70  124  100    -
 8 House.Cat    206  133  163  124  118  100  117    -
 9 Cheetah      192  139  162  108  127  109   96  110    -
10 Ocelot       206  123  165  116  116   98  111   98  113    -
11 Jaguarundi   204  147  177  123  143  121  101  119  128  131    -
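As a quick way to read the matrix, the sketch below stores the lower triangle from this slide and looks up the closest taxon to any given one. The helper names are illustrative, and a nearest neighbour by raw distance is only a first look, not a tree-building method.

```python
NAMES = ["Mongoose", "Hyena", "Sabretooth", "Am.Cheetah", "Lion", "Tiger",
         "Puma", "House.Cat", "Cheetah", "Ocelot", "Jaguarundi"]

# Lower triangle of the absolute distance matrix, one row per taxon.
LOWER = [
    [],
    [156],
    [207, 147],
    [192, 140, 159],
    [186, 134, 148, 131],
    [160, 143, 132, 111, 64],
    [194, 139, 162, 70, 124, 100],
    [206, 133, 163, 124, 118, 100, 117],
    [192, 139, 162, 108, 127, 109, 96, 110],
    [206, 123, 165, 116, 116, 98, 111, 98, 113],
    [204, 147, 177, 123, 143, 121, 101, 119, 128, 131],
]


def distance(i, j):
    """Distance between taxa i and j (0-based indices); the matrix is symmetric."""
    if i == j:
        return 0
    if i < j:
        i, j = j, i
    return LOWER[i][j]


def nearest(name):
    """Return (closest taxon, distance) for the named taxon."""
    i = NAMES.index(name)
    others = [j for j in range(len(NAMES)) if j != i]
    j = min(others, key=lambda k: distance(i, k))
    return NAMES[j], distance(i, j)


print(nearest("Am.Cheetah"))
```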
18
Early phenetics (distance/similarity) studies
would note that taxon X and taxon Z are the most
similar
Taxon Y  TCAGCTA
Taxon X  ACATGTG
Taxon Z  ACGTCAG
X-Z: 3 differences
Y-Z: 5 differences
X-Y: 4 differences
(Figure: phenetic tree grouping Taxon X with Taxon Z, then Taxon Y)
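The difference counts for the three taxa can be reproduced directly. A minimal sketch using the sequences from this slide; the dictionary layout and names are illustrative:

```python
from itertools import combinations


def count_differences(seq1, seq2):
    """Number of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


taxa = {"Y": "TCAGCTA", "X": "ACATGTG", "Z": "ACGTCAG"}

# Key each pair by its sorted taxon names so that lookups are order-independent.
pairwise = {
    tuple(sorted((a, b))): count_differences(taxa[a], taxa[b])
    for a, b in combinations(taxa, 2)
}

most_similar = min(pairwise, key=pairwise.get)
print(pairwise, most_similar)
```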
19
Cladistic methods, rather than being concerned
with similarity, are concerned with the nature of
changes (apomorphies)
Taxon Y   TC A GCTA
Taxon X   AC A TGTG
Taxon Z   AC G TCAG
Outgroup  AA G TCTG
(Figure labels on the alignment: synapomorphy, autapomorphy, symplesiomorphy)
Synapomorphies are shared derived characters and so are considered to define clades (relationship groupings).
20
Maximum Parsimony chooses the tree topology that
minimises the number of changes required
Character 3 changes from G to A.
(Figure: two trees of Taxon X, Taxon Y, Taxon Z and the outgroup. On the 8-step, sub-optimal phenetic tree the change at character 3 is a homoplasy; on the 7-step MP tree it is a synapomorphy.)
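The 7-step and 8-step tree lengths can be checked by counting changes site by site. The sketch below uses Fitch's small-parsimony algorithm, which the slide does not name and which is swapped in here as one standard way of counting minimum changes on a fixed tree. Trees are written as nested pairs, with the outgroup joined at the root; that representation is an assumption for illustration.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):          # a tip: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_site(left, states)
    right_set, right_steps = fitch_site(right, states)
    common = left_set & right_set
    if common:                         # children agree: no extra change needed
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, alignment):
    """Total minimum number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        states = {taxon: seq[site] for taxon, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


alignment = {"Y": "TCAGCTA", "X": "ACATGTG", "Z": "ACGTCAG", "Outgroup": "AAGTCTG"}
phenetic_tree = ((("X", "Z"), "Y"), "Outgroup")   # groups the most similar pair
mp_tree = ((("X", "Y"), "Z"), "Outgroup")         # groups X and Y on character 3

print(tree_length(phenetic_tree, alignment), tree_length(mp_tree, alignment))
```

The only site that separates the two trees is character 3, where the phenetic tree needs two changes and the MP tree needs one.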
21
Maximum Likelihood: the explanation that makes the observed outcome the most likely
L = Pr(D | H)
Probability of the data, given a hypothesis. The hypothesis is a tree topology, its branch-lengths and a model under which the data evolved.
First use in phylogenetics: Cavalli-Sforza and Edwards (1967) for gene frequency data; Felsenstein (1981) for DNA sequences
22
Model of rate change, e.g. Hasegawa-Kishino-Yano (1985): 4 base frequencies, transition/transversion (ti/tv) ratio
(Figure: a four-taxon tree with tips A, A, G and G and branch lengths of 0.5, 0.5, 0.6, 0.4 and 0.4 substitutions per site, redrawn for each of the 16 possible combinations of nucleotides at the two internal nodes)
Sum the probabilities for each of the 16 internal node combinations to get the likelihood for this single nucleotide site
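The sum over 16 internal-node combinations can be written out explicitly. The sketch below is a simplification under stated assumptions: it uses the Jukes-Cantor (JC69) model, with equal base frequencies and a single rate, rather than the model named on the slide, and the assignment of the slide's branch lengths to particular branches is a guess made only for illustration.

```python
import math
from itertools import product

BASES = "ACGT"


def jc69_prob(a, b, t):
    """JC69 probability that base a is base b after a branch of t substitutions/site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay


def site_likelihood(tips, tip_lengths, internal_length):
    """Likelihood of one site on the four-taxon tree ((tip1, tip2), (tip3, tip4))."""
    t1, t2, t3, t4 = tips
    b1, b2, b3, b4 = tip_lengths
    total = 0.0
    for x, y in product(BASES, repeat=2):   # the 16 internal-node combinations
        total += (0.25                      # equal base frequency at node x
                  * jc69_prob(x, t1, b1) * jc69_prob(x, t2, b2)
                  * jc69_prob(x, y, internal_length)
                  * jc69_prob(y, t3, b3) * jc69_prob(y, t4, b4))
    return total


# Assumed layout: tips A, A | G, G with hypothetical branch-length placement.
likelihood = site_likelihood("AAGG", (0.5, 0.6, 0.4, 0.4), 0.5)
print(likelihood)
```

Because the model is time-reversible, placing the root at one internal node does not change the result.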
23
The likelihood of a tree is the product of the site likelihoods. Taken as natural logs, the site likelihoods can be summed to give the log likelihood (lnL).
The tree with the highest lnL is the ML tree.
  • ML is computationally intensive (slow)
  • If branch-lengths are long, such that
    substitutions occur multiple times along the same
    branch for the same site, ML will be more
    consistent than MP if the evolutionary process
    is sufficiently well modelled.
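Summing logs is more than a convenience: the product of many small site likelihoods underflows in floating-point arithmetic, while the sum of their logs does not. A minimal sketch with hypothetical site likelihood values:

```python
import math


def log_likelihood(site_likelihoods):
    """lnL of a tree: the sum of the natural logs of its site likelihoods."""
    return sum(math.log(p) for p in site_likelihoods)


# Hypothetical values: 1,000 sites, each with likelihood 1e-5.
sites = [1e-5] * 1000

product_of_sites = math.prod(sites)   # underflows to 0.0 in double precision
lnL = log_likelihood(sites)           # stays finite

print(product_of_sites, lnL)
```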

24
Bayesian Inference: the explanation with the highest posterior probability
Bayes' Theorem:
Pr(H | D) = Pr(H) Pr(D | H) / Pr(D)
Pr(H): prior probability, the probability of the hypothesis on previous knowledge
Pr(D | H): likelihood function, the probability of the data given the hypothesis
Pr(H | D): posterior probability, the probability of the hypothesis given the data
Pr(D): unconditional probability of the data, a normalizing constant ensuring the posterior probabilities sum to 1.00
First use in phylogenetics: Li (1996, PhD thesis), Rannala and Yang (1996)
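For a small, discrete set of hypotheses, Bayes' theorem can be applied directly. The sketch below assumes three candidate trees with a flat prior and made-up likelihood values; none of the numbers come from the lecture.

```python
def posterior(priors, likelihoods):
    """Return Pr(H | D) for each hypothesis H via Bayes' theorem."""
    # Pr(D): the normalising constant, summed over all hypotheses.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


priors = {"tree1": 1 / 3, "tree2": 1 / 3, "tree3": 1 / 3}       # flat prior
likelihoods = {"tree1": 0.004, "tree2": 0.001, "tree3": 0.001}  # hypothetical Pr(D | H)

post = posterior(priors, likelihoods)
print(post)
```

With realistic numbers of taxa the sum in Pr(D) runs over far too many trees to evaluate, which is why sampling methods are used instead.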
25
  • Bayesian inference in phylogenetics is essentially a likelihood method, but may more closely reflect the way humans think.
  • It is informed by prior knowledge (e.g. fossil data)
  • Emphasis is placed on Pr(H | D) instead of Pr(D | H)

Markov chain Monte Carlo (MCMC) is used to approximate Bayesian posterior probabilities (BPP) over 1,000s to 1,000,000s of generations
(Figure: an MCMC chain over generations 1 to 6 moving among Tree 1, Tree 2 and Tree 3, with each proposed new state either accepted or rejected; BPP(tree 1) = 4/6)
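The accept/reject idea can be sketched with a Metropolis sampler over three trees. Everything here is a toy under stated assumptions: the weights stand in for unnormalised posterior values (prior times likelihood), real software proposes changes to topology, branch lengths and model parameters rather than picking from a fixed list, and the function name is illustrative.

```python
import random


def metropolis(weights, generations, seed=0):
    """Visit states in proportion to their unnormalised weights; return visit frequencies."""
    rng = random.Random(seed)
    states = list(weights)
    current = states[0]
    visits = {s: 0 for s in states}
    for _ in range(generations):
        # Symmetric proposal: pick one of the other states uniformly.
        proposal = rng.choice([s for s in states if s != current])
        accept_prob = min(1.0, weights[proposal] / weights[current])
        if rng.random() < accept_prob:
            current = proposal          # new state accepted
        visits[current] += 1            # if rejected, the old state is counted again
    return {s: visits[s] / generations for s in states}


weights = {"tree1": 4.0, "tree2": 1.0, "tree3": 1.0}   # hypothetical values
bpp = metropolis(weights, generations=100_000)
print(bpp)
```

The fraction of generations spent on a tree approximates its posterior probability, so with these weights the estimate for tree 1 should settle near 4/6.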
26
Posterior probabilities are integrated over all trees in the posterior distribution, providing density distributions rather than the optimization of likelihood.
(Figure: two density plots on an axis from 0 to 1.0: a flat prior for a parameter value (e.g. proportion of invariant sites), and the posterior for the proportion of invariant sites)
27
The American cheetah is related to the puma; morphological similarity to the cheetah is convergence.
(Figure: two trees of Mongoose, Hyena, Sabretooth, Am.Cheetah, Puma, Jaguarundi, Cheetah, Cat, Ocelot, Lion and Tiger, with a group labelled American felids. One is the maximum parsimony and neighbour-joining (distance) cladogram; the other is the maximum likelihood and Bayesian inference phylogram, with a scale bar of 0.05 substitutions/site.)
28
Applications
The tree of life and inferring our origins
29
146-gene phylogeny: Delsuc et al. (Nature, 2006)
Little evidence from fossils
30
Identifying selection
ACA GAG CGC   Threonine - Glutamic acid - Arginine
ACG GAG AGC   Threonine - Glutamic acid - Serine
Synonymous (S) and non-synonymous (N) substitutions
Decreased dN/dS suggests purifying selection
Increased dN/dS suggests positive selection
The dN/dS ratio can be estimated along branches of phylogenetic trees (e.g. Guindon et al. PNAS, 2004). Here dN/dS is indicated by branch width.
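The two example sequences can be compared codon by codon to see which changes are synonymous. The sketch below assumes the standard genetic code and only counts differing codons; it is not a dN/dS estimator, which would also normalise by the numbers of synonymous and non-synonymous sites.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_changes(seq1, seq2):
    """Return (synonymous, non-synonymous) counts of differing codons."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("expected aligned coding sequences of whole codons")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq1), 3):
        codon1, codon2 = seq1[i:i + 3], seq2[i:i + 3]
        if codon1 == codon2:
            continue
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            synonymous += 1        # same amino acid, e.g. ACA -> ACG (Thr)
        else:
            nonsynonymous += 1     # different amino acid, e.g. CGC -> AGC (Arg -> Ser)
    return synonymous, nonsynonymous


print(count_codon_changes("ACAGAGCGC", "ACGGAGAGC"))
```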
31
Cohen (Molec. Biol. Evol., 2002) found increased
positive selection at binding sites in the MHC
proteins of estuarine fish Fundulus heteroclitus
populations subject to severe chemical pollution.
Non-synonymous/synonymous ratios for peptide
binding regions and non-peptide binding regions
MHC (Major histocompatibility complex) binds
antigens and presents them to T-cells as part of
the immune response.
Positive selection at binding sites provides high
MHC variability with which to confront new
pathogenic threats.
32
Fish from the Hot Spot and Gloucester populations are genetically adapted to severe chemical pollution and show novel patterns of DNA substitution for the Mhc class II B locus, including strong signals of positive selection at inferred antigen-binding sites.
Mhc class II B with inferred locations of
population-specific amino acid changes for
Gloucester and Hot Spot.
33
Stanhope et al. (Infect. Genet. Evol., 2004): Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) has a recombinant history with lineages of types I and III coronavirus
34
Using more sophisticated models of sequence
evolution, Holmes and Rambaut (Phil. Trans. Roy.
Soc. B, 2004) could not reject a single history
across the SARS genome
(Figure: tree of coronavirus groups I, II and III with SARS-TOR2)
Understanding sequence evolution and the biases that may result from models (which necessarily are simplifications) is of vital importance in phylogenetic inference.
35
  • Host-Parasite coevolution/co-speciation
  • Etherington et al. (J. Gen Virol, 2006)

Carnivoran strains
Artiodactyl strains
Caliciviruses infect diverse mammalian hosts and
include Norovirus, the major cause of food-borne
viral gastroenteritis in humans. Host switching
by caliciviruses is rare, although pigs have
strains from co-speciation (artiodactyl strain)
and host switching (carnivoran strain).
36
Fig (Ficus) and fig wasp mutualism is reflected by co-speciation patterns: Machado et al. (PNAS, 2006)
37
Biogeography: vicariance and dispersal
38
Most frequent area cladograms, mapping taxa onto landmasses
Many plants follow wind dispersal patterns
Many land animals follow continental break-up
(Figure: area cladograms for Africa, S. South America, Australia and New Zealand, shown for midges, Southern beech, cushion herb and marsupial mammals)
From Sanmartín and Ronquist (Syst. Biol., 2004)
39
Conservation genetics: Amur leopard (Panthera pardus orientalis)
Relict population of 25-40 individuals in the
Russian Far East.
  • Nuclear microsatellites and mtDNA Uphyrkina et
    al. (J. Hered., 2002)
  • validates subspecies distinctiveness
  • extreme reduction in genetic diversity in the
    wild
  • captive population genetically mixed with the
    Chinese subspecies

40
Macroevolutionary inference
Cretaceous
Tertiary
65 Ma
Present
Does the 65 Ma meteor impact (Alvarez et al.
Science, 1980) fully explain the great reptile
extinction and the rise of modern birds and
mammals?
41
Molecular clock: DNA/protein divergence between organisms is a function of time
(Figure: timescale around the K/T boundary with intervals 144-83 Ma, 83-71 Ma, 71-68 Ma and 68-65 Ma, and markers at 95 Ma and 65 Ma)
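In its simplest form, the clock idea treats divergence as growing linearly with time. The sketch below assumes a strict clock, in which two lineages each accumulate changes at the same constant rate, so divergence = 2 × rate × time; the slide states only that divergence is a function of time, and the numbers used are hypothetical.

```python
def divergence_time(divergence, rate):
    """Time since two lineages split under a strict clock.

    divergence: substitutions per site between the two sequences
    rate: substitutions per site per million years, per lineage
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Both lineages accumulate changes, hence the factor of 2.
    return divergence / (2.0 * rate)


# Hypothetical values: 0.1 substitutions/site at 0.005 substitutions/site/Myr.
print(divergence_time(0.1, 0.005))
```

In practice the rate is calibrated against dated events such as fossils, and the divergence is first corrected for multiple substitutions at the same site.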
42
Megafaunal extinctions (human induced or climate
change)
Macrauchenia
Bison (Lascaux, France)
43
Arrival of humans in North America
The distribution of coalescence events over time on the tree allows inference of relative population size
Last glacial maximum