Bayesian inference of phylogeny and evolution - PowerPoint PPT Presentation

Slides: 47
Provided by: fredrikr

Transcript and Presenter's Notes


1
(No Transcript)
2
(No Transcript)
3
Models
  • Binary (Restriction) Model
  • Morphology (Standard) Model
  • Parsimony Model

4
Binary Model
  • Markov model for binary data (0-1,
    presence-absence, etc.)
  • Typical example: restriction sites
  • Peculiarities:
  • Stationary state frequencies fully determine
    substitution rates
  • There may be a coding bias

5
Instantaneous Rate Matrix
  • Characteristic Properties
  • Rates equal
  • Rate ratio = 1
  • Stat. freq. of state 0: π0 = 1/2
  • Stat. freq. of state 1: π1 = 1/2
  • Base substitution rate ?
  • Scaling of Q
  • Rate measured in substitution units per time
    (not half-substitutions or anything else)
  • Time 1.0 means one expected substitution (per
    site)
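
The scaling rule above can be sketched in a few lines. This is a minimal illustration, not the slide's own matrix: it assumes the usual two-state form in which the rate into a state is proportional to that state's stationary frequency, and the names `pi0`, `pi1` and `mu` are chosen here for illustration.

```python
import math

def binary_rate_matrix(pi0, pi1):
    """2x2 rate matrix Q, scaled to one expected substitution per unit time.

    Assumed form: rate 0->1 is mu*pi1 and rate 1->0 is mu*pi0.
    """
    # At stationarity the expected rate is pi0*(mu*pi1) + pi1*(mu*pi0),
    # so mu = 1 / (2*pi0*pi1) makes that expectation equal to 1.
    mu = 1.0 / (2.0 * pi0 * pi1)
    return [[-mu * pi1, mu * pi1],
            [mu * pi0, -mu * pi0]]

def transition_probs(pi0, pi1, t):
    """Closed-form P(t) = exp(Q*t) for the two-state model above."""
    mu = 1.0 / (2.0 * pi0 * pi1)
    decay = math.exp(-mu * t)  # the only non-zero eigenvalue of Q is -mu
    return [[pi0 + pi1 * decay, pi1 - pi1 * decay],
            [pi0 - pi0 * decay, pi1 + pi0 * decay]]

q = binary_rate_matrix(0.5, 0.5)
expected_rate = -(0.5 * q[0][0] + 0.5 * q[1][1])
print(q, expected_rate)
```

With equal frequencies both off-diagonal rates equal 1, so a branch of length 1.0 carries one expected substitution per site.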

6
(No Transcript)
7
No matter what the starting frequency of state 1,
the frequency will evolve towards a stationary
value, namely π1
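
A small sketch of this convergence, assuming a two-state model scaled to one expected substitution per unit time. The function name `freq_state1` and the example value of `pi1` are illustrative, not taken from the slides.

```python
import math

def freq_state1(f_start, pi1, t):
    """Expected frequency of state 1 after time t, starting from f_start."""
    pi0 = 1.0 - pi1
    mu = 1.0 / (2.0 * pi0 * pi1)  # assumed scaling: one substitution per unit time
    # The distance from the stationary value shrinks exponentially with time.
    return pi1 + (f_start - pi1) * math.exp(-mu * t)

# Three different starting frequencies, one stationary frequency (pi1 = 0.2).
for start in (0.0, 0.5, 1.0):
    path = [round(freq_state1(start, 0.2, t), 3) for t in (0.0, 0.5, 1.0, 5.0)]
    print(start, path)
```

Whatever the starting value, the printed trajectories end close to `pi1`.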
8
Coding Bias
  • Assume you are studying restriction site
    characters
  • States absence (0) and presence (1)
  • Clearly a rate bias
  • π1 << π0
  • All-absence characters common
  • but cannot be observed!

9
How do we calculate the likelihood of the data or,
more precisely, the probability of the data (X)
given the hypothesis, Pr(X | θ)?
(Figure: example with tip states 0, 1, 0, 1)
10
Calculating Pr(X | θ)
Calculating L = Pr(X | θ)
11
Unbiased case: calculating L = Pr(X | θ)
Biased case: calculating L = Pr(X | θ)
12
Why correct for coding bias?
  • If we do not correct for coding bias, the branch
    lengths will be overestimated
  • If branch lengths are biased, inference could be
    inconsistent even if the model is correct
  • Correction is most important for small datasets
    (small numbers of terminals)
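
One common way to correct for coding bias is to condition each character's likelihood on the character being observable. The sketch below assumes that form, Pr(x) / (1 - Pr(all absent)), for restriction-site data where all-absence characters cannot be recorded. The tree encoding, function names and parameter values are hypothetical, and the slides' own formula is not in the transcript.

```python
import math

def p_matrix(pi1, t):
    """Two-state transition probabilities, one expected substitution per unit time."""
    pi0 = 1.0 - pi1
    mu = 1.0 / (2.0 * pi0 * pi1)
    e = math.exp(-mu * t)
    return [[pi0 + pi1 * e, pi1 - pi1 * e],
            [pi0 - pi0 * e, pi1 + pi0 * e]]

def partials(node, states, pi1):
    """Felsenstein pruning: likelihood of the data below a node for each node state."""
    if isinstance(node, str):                 # leaf: observed state
        return [1.0 if states[node] == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:                # internal node: list of (child, branch length)
        p = p_matrix(pi1, length)
        down = partials(child, states, pi1)
        for s in (0, 1):
            result[s] *= p[s][0] * down[0] + p[s][1] * down[1]
    return result

def site_likelihood(tree, states, pi1):
    """Uncorrected likelihood of one character."""
    root = partials(tree, states, pi1)
    return (1.0 - pi1) * root[0] + pi1 * root[1]

def corrected_likelihood(tree, states, pi1):
    """Likelihood conditional on the character not being all-absence."""
    all_absent = {name: 0 for name in states}
    return site_likelihood(tree, states, pi1) / (1.0 - site_likelihood(tree, all_absent, pi1))

# Hypothetical four-taxon tree ((A,B),(C,D)) with made-up branch lengths.
tree = [([("A", 0.1), ("B", 0.1)], 0.05), ([("C", 0.1), ("D", 0.1)], 0.05)]
site = {"A": 1, "B": 1, "C": 0, "D": 0}
print(site_likelihood(tree, site, 0.2), corrected_likelihood(tree, site, 0.2))
```

The corrected probabilities of all observable patterns sum to one, which is what makes the conditional likelihood a proper probability over the characters that can actually be coded.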

13
Morphological Models
  • Based on Lewis (2001, Syst. Biol.) with several
    extensions
  • Incomplete coding (coding bias)
  • Varying state space
  • Unordered and ordered characters
  • Transformation weighting for arbitrarily labeled
    states

14
Incomplete coding
All
15
Incomplete coding
Variable
16
Incomplete coding
Informative
17
Types of characters
  • A (All), V (Variable), I (Informative)

(Figure: example characters labeled V, V, A, I)
18
Conditional character probability
(Figure: cumulative character probability on a scale from 0.0 to 1.0, with marks for informative (I), variable (V) and all (A) characters)
Conditional probability of one character x_i given
that only informative characters are coded
19
Branch length estimates
(Figure: inferred length plotted against true length for terminal and internal branches. Panels: all characters, assuming all; variable chars, assuming variable; informative chars, assuming informative. Character counts shown: 2000, 1176, 740)
20
Branch length estimates
(Figure: inferred length plotted against true length for terminal and internal branches. Panels: variable characters, assuming all; informative chars, assuming all; informative chars, assuming variable. Character counts shown: 2000, 1176, 740)
21
Transformation series
  • Unordered (Fitch): states 0, 1 and 2, each
    directly connected to the other two
  • Ordered (Wagner): states connected in the linear
    series 0, 1, 2
22
Probabilistic models
  • Unordered (M3u)
  • Ordered (M3o)
23
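Probabilistic models
  • Unordered (M3u)
  • Ordered (M3o)
(Figure: state diagrams for states 0, 1 and 2 with a rate label on each allowed transition: three transitions in the unordered model, two in the ordered model)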
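
The two three-state models can be sketched as rate matrices that differ only in which transitions are allowed. This is an illustration under stated assumptions: equal rates on all allowed transitions, uniform stationary frequencies, and scaling to one expected substitution per unit time. The helper names `scaled_q`, `mat_mul` and `mat_exp` are hypothetical.

```python
import math

def scaled_q(allowed, k=3):
    """Equal-rate Q over the allowed state pairs, scaled to unit expected rate."""
    q = [[0.0] * k for _ in range(k)]
    for i, j in allowed:
        q[i][j] = q[j][i] = 1.0              # symmetric rates: uniform stationary freqs
    for i in range(k):
        q[i][i] = -sum(q[i])                 # rows of a rate matrix sum to zero
    rate = sum(-q[i][i] for i in range(k)) / k
    return [[x / rate for x in row] for row in q]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(q, t, squarings=10, terms=12):
    """exp(Q*t) by a truncated Taylor series plus repeated squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    step = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms + 1):
        term = [[x / order for x in row] for row in mat_mul(term, step)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

q_unordered = scaled_q({(0, 1), (0, 2), (1, 2)})   # every state reaches every other
q_ordered = scaled_q({(0, 1), (1, 2)})             # 0 <-> 1 <-> 2 only

p_u = mat_exp(q_unordered, 0.4)
p_o = mat_exp(q_ordered, 0.4)
# Closed form for the unordered three-state case, as a check on mat_exp.
print(p_u[0][0], 1 / 3 + 2 / 3 * math.exp(-1.5 * 0.4))
print(p_o[0][1], p_o[0][2])
```

In the ordered model the direct rate from 0 to 2 is zero, so a change from 0 to 2 along a branch must pass through state 1 and is less probable than a change from 0 to 1.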
24
Transformation weighting
(Figure: states 0 and 1, with transitions weighted by π1 and π0)
BUT: state labels are typically arbitrary
SOLUTION: mixture model over π
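
Why a mixture helps with arbitrary labels can be shown with a toy example, assuming the mixture runs over the stationary frequency of state 1. The equal-weight grid of categories is a hypothetical stand-in for whatever distribution the mixture actually uses; the only property the sketch relies on is that the categories are symmetric about 1/2. A two-taxon comparison is used to keep it short.

```python
import math

def pair_likelihood(xa, xb, pi1, t):
    """Probability of states xa and xb in two taxa separated by time t."""
    pi = [1.0 - pi1, pi1]
    mu = 1.0 / (2.0 * pi[0] * pi[1])     # one expected substitution per unit time
    e = math.exp(-mu * t)
    same = 1.0 if xa == xb else 0.0
    return pi[xa] * (pi[xb] + (same - pi[xb]) * e)

def symmetric_categories(n=5):
    """Hypothetical equal-weight categories for pi1, symmetric about 1/2."""
    return [(i + 0.5) / n for i in range(n)]

def mixture_likelihood(xa, xb, t, cats):
    """Average the likelihood over the categories of pi1."""
    return sum(pair_likelihood(xa, xb, c, t) for c in cats) / len(cats)

cats = symmetric_categories(5)
# A single asymmetric pi1 depends on which state is called 1 ...
print(pair_likelihood(0, 0, 0.1, 0.3), pair_likelihood(1, 1, 0.1, 0.3))
# ... while the symmetric mixture gives the same value after swapping labels.
print(mixture_likelihood(0, 0, 0.3, cats), mixture_likelihood(1, 1, 0.3, cats))
```

Because every category with frequency c is matched by one with frequency 1 - c, relabeling the states only permutes the terms of the average.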
25
Simultaneous analysis of fossils and recent taxa
26
Simultaneous analysis of extant forms and fossils
  • Uncertainty in phylogeny of extant forms
  • Uncertainty in placement of fossils
  • Model the distance from fossils to ancestors
  • Model the preservation probability

27
Example Bayesian analysis of directional
morphological evolution in the Hymenoptera
28
Character systems
  • Musculature: present → absent, separate → fused
  • Wing veins: present → absent
  • Sclerites: present → absent, separate → fused
  • Articulations / skeletal integration: absent →
    present

29
(No Transcript)
30
Equal frequencies
(Figure: two panels, "Starting state" and "Process", each showing the probability of state 1)
31
Asymmetric equilibrium
(Figure: two panels, "Starting state" and "Process", each showing the probability of state 1)
32
Driving process
(Figure: two panels, "Starting state" and "Process", each showing the probability of state 1)
33
Data
  • Morphological dataset on basal hymenopteran
    relationships from Vilhelmsen (2001, Zool. J.
    Linn. Soc.)
  • Divided into five partitions: muscles, wing
    veins, sclerites, articulations and others
  • Four potentially asymmetric character types
    recoded into binary characters with predicted
    change from state 0 to state 1
  • Final dataset had 254 characters

34
Analysis
  • Character systems believed to show evolutionary
    asymmetry:
  • Binary model with potential rate bias
  • Starting states allowed to differ from the
    stationary states of the process
  • Other characters:
  • Standard morphological model
  • No variation in rate bias across sites

35
Partitions
(Figure: partition sizes, total 254 chars: muscles, wing veins, sclerites, articulations, others)
36
Tree
(Figure: tree with clade support values ranging from 51 to 100)
37
Muscles
(Figure: two panels, "Starting state" and "Process", each showing the probability of state 1)
38
Wing veins
(Figure: two panels, "Starting state" and "Process", each showing the probability of state 1)
39
Sclerites
(Figure: two panels, "Starting state" and "Process", each showing the probability of state 1)
40
Articulations
(Figure: two panels, "Starting state" and "Process", each showing the probability of state 1)
41
Parsimony Model
  • Under certain types of stochastic models, ML
    always chooses the same tree as parsimony
  • The no-common-mechanism (NCM) model (Tuffley and
    Steel 1997, Bull. Math. Biol.) is one. It forms
    the basis for the so-called Parsimony Model in
    MrBayes.
  • In NCM, every branch length is estimated
    separately for each character. The ML branch
    length is infinity if the character changes on
    that branch in the most parsimonious
    reconstruction, and 0 if it does not.
  • The number of parameters grows linearly with the
    number of characters in NCM. For instance, a
    100-taxon tree with 1,000 characters will have
    roughly 200,000 branch length parameters.
  • Because of the large number of parameters, the
    parsimony model is a very unparsimonious
    statistical model.
  • Like parsimony, NCM is statistically inconsistent.
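
The parameter count quoted above follows from simple arithmetic, sketched here under the assumption of a fully resolved unrooted tree, which has 2n - 3 branches for n taxa. The function names are illustrative.

```python
def unrooted_branches(n_taxa):
    """Number of branches in a fully resolved unrooted tree."""
    return 2 * n_taxa - 3

def ncm_branch_parameters(n_taxa, n_chars):
    """NCM estimates one length per branch per character."""
    return unrooted_branches(n_taxa) * n_chars

shared = unrooted_branches(100)               # one set of branch lengths for all characters
separate = ncm_branch_parameters(100, 1000)   # NCM: a new set for every character
print(shared, separate)  # 197 197000
```

So the "roughly 200,000" figure is 197 branches times 1,000 characters, against 197 branch lengths when all characters share them.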

42
Parsimony Model Misconceptions
  • The parsimony model is NOT the default model used
    by MrBayes to model morphological (standard)
    characters
  • The parsimony model can be used for any type of
    character in MrBayes. By default it is not used
    at all. You specifically have to request it if
    you want to use it.
  • The parsimony model, as implemented in MrBayes,
    is not a true Bayesian model because it uses the
    maximum likelihood branch length instead of
    integrating out branch lengths against a prior.
    The parsimony model implementation is a so-called
    Empirical Bayes approach.
  • We do not recommend the Parsimony Model for use
    in any standard phylogenetic analysis.

43
Model space for a data set with 100 taxa and 1000
sites
(Figure: models placed along an axis of number of parameters. Regions: strict clock models, standard non-clock models, parsimony-like models, super model. Models marked: JC, GTR+I+Γ, Goldman, UCM, NCM)
  • Super model could potentially account for:
  • Evolution at the codon, amino acid and
    nucleotide levels
  • Insertion and deletion
  • Correlation across sites
  • Process heterogeneity across sites and across
    tree
  • Rate heterogeneity across tree
  • 3D structure

44
Mapping characters onto phylogenies
45
Mapping Uncertainty
parsimony
ML
Bayesian
46
Phylogenetic and Mapping Uncertainty