Title: Bayesian inference of phylogeny and evolution
3Models
- Binary (Restriction) Model
- Morphology (Standard) Model
- Parsimony Model
4Binary Model
- Markov model for binary data (0-1, presence-absence, etc.)
- Typical example: restriction sites
- Peculiarities
- Stationary state frequencies fully determine substitution rates
- There may be a coding bias
5Instantaneous Rate Matrix
- Characteristic properties
- Rates equal
- Rate ratio = 1
- Stat. freq. of state 0: π0 = 1/2
- Stat. freq. of state 1: π1 = 1/2
- Base substitution rate ?
- Scaling of Q
- Rate measured in substitution units per time (not half-substitutions or anything else)
- Time 1.0 means one expected substitution (per site)
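The scaling rule above can be sketched numerically. This is a minimal illustration, not the MrBayes implementation: it assumes the rate into each state is proportional to that state's stationary frequency, and the names `binary_rate_matrix`, `transition_probs` and `mu` are hypothetical.

```python
import numpy as np

def binary_rate_matrix(pi1):
    """Scaled 2x2 rate matrix Q for stationary frequency pi1 of state 1."""
    pi0 = 1.0 - pi1
    # Assumption: the rate into a state is proportional to its stationary frequency.
    q = np.array([[-pi1, pi1],
                  [pi0, -pi0]])
    # Expected rate at stationarity: pi0*q01 + pi1*q10 = 2*pi0*pi1.
    expected_rate = pi0 * q[0, 1] + pi1 * q[1, 0]
    # Divide so that time 1.0 means one expected substitution per site.
    return q / expected_rate

def transition_probs(pi1, t):
    """Closed-form P(t) = exp(Qt) for the scaled two-state model."""
    pi0 = 1.0 - pi1
    mu = 1.0 / (2.0 * pi0 * pi1)  # scaling factor applied to Q
    stationary = np.array([[pi0, pi1], [pi0, pi1]])
    return stationary + np.exp(-mu * t) * (np.eye(2) - stationary)

q_equal = binary_rate_matrix(0.5)     # equal-rates case
p_equal = transition_probs(0.5, 0.1)  # branch of 0.1 expected substitutions
```

With π1 = 1/2 the off-diagonal rates are both 1, so the expected rate at stationarity is exactly one substitution per unit time.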
7No matter what the starting frequency of state 1, the frequency will evolve towards a stationary value, namely π1.
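A small sketch of this convergence, assuming the two-state model is scaled to one expected substitution per unit time; the function name `freq_state1` and the chosen numbers are illustrative only.

```python
import math

def freq_state1(start_freq, pi1, t):
    """Expected frequency of state 1 after time t, starting from start_freq."""
    pi0 = 1.0 - pi1
    mu = 1.0 / (2.0 * pi0 * pi1)  # assumed scaling: one expected substitution per unit time
    # The deviation from the stationary value decays exponentially.
    return pi1 + (start_freq - pi1) * math.exp(-mu * t)

# Two very different starting frequencies, same process (pi1 = 0.3).
for t in (0.0, 0.5, 2.0, 10.0):
    print(t, round(freq_state1(0.0, 0.3, t), 4), round(freq_state1(1.0, 0.3, t), 4))
```

Both columns approach 0.3 as t grows, regardless of the starting frequency.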
8Coding Bias
- Assume you are studying restriction site characters
- States: absence (0) and presence (1)
- Clearly a rate bias
- π1 << π0
- All-absence characters common
- but cannot be observed!
9How do we calculate the likelihood of data or, more precisely, the probability of the data (X) given the hypothesis (θ), Pr(X | θ)?
(Figure: tip states 0, 1, 0, 1)
10Calculating Pr(X | θ)
Calculating L = Pr(X | θ)
11Unbiased case
Calculating L = Pr(X | θ)
Biased case
Calculating L = Pr(X | θ)
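The unbiased and biased calculations can be sketched for a single binary character. This is an illustration under stated assumptions, not the slides' own worked example: a four-taxon unrooted tree ((A,B),(C,D)) with all five branches of equal length, the standard pruning algorithm, and a correction that conditions on the character not being absent in all taxa. All function names are hypothetical.

```python
import itertools
import numpy as np

def transition_probs(pi1, t):
    """P(t) for the two-state model, scaled to one expected substitution per unit time."""
    pi = np.array([1.0 - pi1, pi1])
    mu = 1.0 / (2.0 * pi[0] * pi[1])
    stationary = np.tile(pi, (2, 1))
    return stationary + np.exp(-mu * t) * (np.eye(2) - stationary)

def site_likelihood(tips, pi1, t):
    """Unbiased case: probability of one character on ((A,B),(C,D))."""
    p = transition_probs(pi1, t)
    one_hot = np.eye(2)
    # Pruning: conditional likelihoods at the two internal nodes.
    left = (p @ one_hot[tips[0]]) * (p @ one_hot[tips[1]])
    right = (p @ one_hot[tips[2]]) * (p @ one_hot[tips[3]])
    # Root the calculation at the left node and cross the internal branch.
    at_root = left * (p @ right)
    return float(np.array([1.0 - pi1, pi1]) @ at_root)

def corrected_likelihood(tips, pi1, t):
    """Biased case: condition on the character being observable (not all-absent)."""
    pr_unobservable = site_likelihood((0, 0, 0, 0), pi1, t)
    return site_likelihood(tips, pi1, t) / (1.0 - pr_unobservable)

patterns = list(itertools.product((0, 1), repeat=4))
raw = {x: site_likelihood(x, 0.2, 0.1) for x in patterns}
observable = [x for x in patterns if any(x)]
corrected = {x: corrected_likelihood(x, 0.2, 0.1) for x in observable}
```

The raw probabilities sum to one over all 16 patterns, while the corrected probabilities sum to one over the 15 observable patterns only.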
12Why correct for coding bias?
- If we do not correct for coding bias, the branch lengths will be overestimated
- If branch lengths are biased, inference could be inconsistent even if the model is correct
- Most important to correct for small datasets (small numbers of terminals)
13Morphological Models
- Based on Lewis (2001, Syst. Biol.) with several extensions
- Incomplete coding (coding bias)
- Varying state space
- Unordered and ordered characters
- Transformation weighting for arbitrarily labeled states
14Incomplete coding
All
15Incomplete coding
Variable
16Incomplete coding
Informative
17Types of characters
- A (All), V (Variable), I (Informative)
(Figure: example characters labeled V, V, A, I)
18Conditional character probability
(Figure: cumulative character probability on an axis from 0.0 to 1.0, with marks for informative (I), variable (V) and all (A) characters)
Conditional probability of one character x_i given that only informative characters are coded
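One way to make the conditioning concrete is to enumerate every possible binary pattern and add up the probability of each coding class. The sketch below assumes a star tree with equal branch lengths purely for brevity, and treats a binary character as informative when each state occurs in at least two taxa; `pattern_prob` and `class_probs` are hypothetical names.

```python
import itertools
import math

def pattern_prob(pattern, pi1, t):
    """Probability of a binary pattern on a star tree with branch length t."""
    pi = (1.0 - pi1, pi1)
    mu = 1.0 / (2.0 * pi[0] * pi[1])
    decay = math.exp(-mu * t)

    def p(i, j):
        return pi[j] + decay * ((1.0 if i == j else 0.0) - pi[j])

    # Sum over the unknown state at the central node.
    return sum(pi[r] * math.prod(p(r, x) for x in pattern) for r in (0, 1))

def class_probs(n_taxa, pi1, t):
    """Total probability of all, variable and informative patterns."""
    pr_all = pr_var = pr_inf = 0.0
    for pattern in itertools.product((0, 1), repeat=n_taxa):
        pr = pattern_prob(pattern, pi1, t)
        ones = sum(pattern)
        pr_all += pr
        if 0 < ones < n_taxa:        # variable: both states present
            pr_var += pr
        if 2 <= ones <= n_taxa - 2:  # informative: each state in >= 2 taxa
            pr_inf += pr
    return pr_all, pr_var, pr_inf

pr_all, pr_var, pr_inf = class_probs(4, 0.5, 0.1)
# Conditional probability of one informative character, given informative-only coding.
conditional = pattern_prob((0, 0, 1, 1), 0.5, 0.1) / pr_inf
```

Dividing by `pr_var` instead gives the corresponding probability when all variable characters are coded.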
19Branch length estimates
(Figure panels: All characters, assuming all; Variable chars, assuming variable; Informative chars, assuming informative)
(Axes: inferred length vs. true length; series: terminal and internal branches)
(Labels: 2000 chars, 1176 chars, 740 chars)
20Branch length estimates
(Figure panels: Variable characters, assuming all; Informative chars, assuming all; Informative chars, assuming variable)
(Axes: inferred length vs. true length; series: terminal and internal branches)
(Labels: 2000 chars, 1176 chars, 740 chars)
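21Transformation series
(Figure: state diagrams for a three-state character; one panel with states 0, 2, 1 and one panel, Ordered (Wagner), with states 0, 1, 2)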
22Probabilistic models
Ordered (M3o)
23Probabilistic models
(Figure: rate diagrams among states 0, 2, 1 and, for Ordered (M3o), among states 0, 1, 2)
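The difference between unordered and ordered three-state models can be sketched as rate matrices. This is an illustration only: the matrices are left unscaled, all allowed changes are given rate 1, and the function names are hypothetical rather than taken from the slides.

```python
import numpy as np

def unordered_q(k):
    """All changes between the k states are allowed at the same rate."""
    q = np.ones((k, k)) - np.eye(k)
    np.fill_diagonal(q, -(k - 1))
    return q

def ordered_q(k):
    """Only changes between adjacent states (0-1, 1-2, ...) are allowed."""
    q = np.zeros((k, k))
    for i in range(k - 1):
        q[i, i + 1] = q[i + 1, i] = 1.0
    np.fill_diagonal(q, -q.sum(axis=1))
    return q

def transition_probs(q, t):
    """P(t) = exp(Qt) via eigendecomposition; valid because q is symmetric here."""
    eigvals, eigvecs = np.linalg.eigh(q)
    return eigvecs @ np.diag(np.exp(eigvals * t)) @ eigvecs.T

p_unordered = transition_probs(unordered_q(3), 0.1)
p_ordered = transition_probs(ordered_q(3), 0.1)
print(p_unordered[0, 2], p_ordered[0, 2])
```

Under the ordered matrix a change from 0 to 2 must pass through state 1, so its probability over a short branch is much smaller than under the unordered matrix.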
24Transformation weighting
(Figure: states 0 and 1 connected by rates π1 and π0)
BUT: State labels are typically arbitrary
SOLUTION: Mixture model over π
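A rough sketch of why a mixture over the state frequency removes the dependence on arbitrary labels. The setup is an assumption made for brevity: two tips joined by one branch, an equal-width grid of frequency values, and weights proportional to a symmetric Beta(alpha, alpha) density. It is not a description of how MrBayes discretizes the mixture, and all names are hypothetical.

```python
import math

def two_tip_likelihood(x, y, pi1, t):
    """Probability of states x and y at two tips separated by branch length t."""
    pi = (1.0 - pi1, pi1)
    mu = 1.0 / (2.0 * pi[0] * pi[1])
    same = 1.0 if x == y else 0.0
    p_xy = pi[y] + math.exp(-mu * t) * (same - pi[y])
    return pi[x] * p_xy

def mixture_likelihood(x, y, t, alpha=1.0, n_cat=5):
    """Average the likelihood over a symmetric grid of frequency values."""
    grid = [(k + 0.5) / n_cat for k in range(n_cat)]
    weights = [(p * (1.0 - p)) ** (alpha - 1.0) for p in grid]
    total = sum(weights)
    return sum(w / total * two_tip_likelihood(x, y, p, t)
               for w, p in zip(weights, grid))

# With a fixed frequency, swapping the labels 0 and 1 changes the likelihood...
fixed_00 = two_tip_likelihood(0, 0, 0.2, 0.1)
fixed_11 = two_tip_likelihood(1, 1, 0.2, 0.1)
# ...but under the symmetric mixture it does not.
mixed_00 = mixture_likelihood(0, 0, 0.1, alpha=2.0)
mixed_11 = mixture_likelihood(1, 1, 0.1, alpha=2.0)
```

Because the grid and the weights are symmetric around 1/2, every frequency value is paired with its mirror image, which is what makes the result independent of the labeling.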
25Simultaneous analysis of fossils and recent taxa
26Simultaneous analysis of extant forms and fossils
- Uncertainty in phylogeny of extant forms
- Uncertainty in placement of fossils
- Model distance from fossils to ancestors
- Model preservation probability
27Example: Bayesian analysis of directional morphological evolution in the Hymenoptera
28Character systems
- Musculature: present → absent, separate → fused
- Wing veins: present → absent
- Sclerites: present → absent, separate → fused
- Articulations / skeletal integration: absent → present
30Equal frequencies
(Figure panels: Starting state, Process; axis: Probability of state 1)
31Asymmetric equilibrium
(Figure panels: Starting state, Process; axis: Probability of state 1)
32Driving process
(Figure panels: Starting state, Process; axis: Probability of state 1)
33Data
- Morphological dataset on basal hymenopteran relationships from Vilhelmsen (2001, Zool. J. Linn. Soc.)
- Divided into five partitions: muscles, wing veins, sclerites, articulations and others
- Four potentially asymmetric character types recoded into binary characters with predicted change from state 0 to state 1
- Final dataset had 254 characters
34Analysis
- Character systems believed to show evolutionary asymmetry
- Binary model with potential rate bias
- Starting states allowed to be different from stationary states of the process
- Other characters
- Standard morphological model
- No variation in rate bias across sites
35Partitions
Total: 254 chars
(Figure: partitions Muscles, Wing veins, Others, Sclerites, Articulations)
36Tree
(Figure: tree with clade support values ranging from 51 to 100)
37Muscles
(Figure panels: Starting state, Process; axis: Probability of state 1)
38Wing veins
(Figure panels: Starting state, Process; axis: Probability of state 1)
39Sclerites
(Figure panels: Starting state, Process; axis: Probability of state 1)
40Articulations
(Figure panels: Starting state, Process; axis: Probability of state 1)
41Parsimony Model
- Under certain types of stochastic models, ML always chooses the same tree as parsimony
- The no-common-mechanism (NCM) model (Tuffley and Steel 1997, Bull. Math. Biol.) is one. It forms the basis for the so-called Parsimony Model in MrBayes.
- In NCM, every branch length is estimated separately for each character. The ML branch length is infinity if the character changes, 0 if the character does not change in the most parsimonious reconstruction.
- The number of parameters grows linearly with the number of characters in NCM. For instance, a 100-taxon tree with 1,000 characters will have roughly 200,000 branch length parameters.
- Because of the large number of parameters, the parsimony model is a very unparsimonious statistical model.
- Like parsimony, NCM is statistically inconsistent.
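The "roughly 200,000" figure can be checked with a short calculation, assuming a fully resolved unrooted tree, which has 2n - 3 branches for n taxa. The function name is hypothetical.

```python
def ncm_branch_params(n_taxa, n_chars):
    """Branch length parameters when every character has its own set of branch lengths."""
    branches = 2 * n_taxa - 3  # fully resolved unrooted tree
    return branches * n_chars

print(ncm_branch_params(100, 1000))  # 197 branches x 1000 characters = 197000
```

A model with shared branch lengths would instead have 197 branch length parameters for the same tree, regardless of the number of characters.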
42Parsimony Model Misconceptions
- The parsimony model is NOT the default model used by MrBayes to model morphological (standard) characters
- The parsimony model can be used for any type of character in MrBayes. By default it is not used at all. You specifically have to request it if you want to use it.
- The parsimony model, as implemented in MrBayes, is not a true Bayesian model because it uses the maximum likelihood branch length instead of integrating out branch lengths against a prior. The parsimony model implementation is a so-called Empirical Bayes approach.
- We do not recommend the Parsimony Model for use in any standard phylogenetic analysis.
43Model space for a data set with 100 taxa and 1000
sites
(Figure: model classes arranged along a "parameters" axis: Strict clock models, Standard non-clock models, Parsimony-like models, Super model; labeled models: Goldman, GTR+I+Γ, JC, UCM, NCM)
- Super model could potentially account for
- Evolution at the codon, amino acid and nucleotide levels
- Insertion and deletion
- Correlation across sites
- Process heterogeneity across sites and across tree
- Rate heterogeneity across tree
- 3D structure
44Mapping characters onto phylogenies
45Mapping Uncertainty
parsimony
ML
Bayesian
46Phylogenetic and Mapping Uncertainty