Title: Comparison of methods
1 Comparison of methods
From Holder & Lewis 2003
2 Bayesian Inference of Phylogenies
- Closely related to ML methods, differing only in the use of a PRIOR DISTRIBUTION on the quantity being inferred (typically the tree)
- Use of a prior enables us to interpret the result as the probability distribution of the tree given the data
- Bayes's theorem was published in 1763, and controversy among statisticians over its appropriateness is almost that old
- Recently, the introduction of Markov chain Monte Carlo (MCMC) methods has given a new impetus to Bayesian inference
3 Simple example of Bayesian inference
Box with 90 fair and 10 biased dice.
Take a die at random from the box and roll it twice: you get a 4 and a 6.
What is the probability that the die is biased?
4 A Bayesian analysis combines one's prior beliefs about the probability of a hypothesis with its likelihood
Likelihood assuming a fair die: 1/6 × 1/6 = 1/36 ≈ 0.0278 (2.78%)
Likelihood assuming a biased die: 0.0544 (5.44%)
The probability of observing the data is 1.96 times greater under the hypothesis that the die is biased
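5 Bayesian inference is based upon the POSTERIOR probability of a hypothesis
The posterior probability that the die is biased can be obtained using Bayes's formula:
$$P(\text{biased} \mid 4, 6) = \frac{P(4, 6 \mid \text{biased})\, P(\text{biased})}{P(4, 6 \mid \text{biased})\, P(\text{biased}) + P(4, 6 \mid \text{fair})\, P(\text{fair})} = \frac{0.0544 \times 0.1}{0.0544 \times 0.1 + 0.0278 \times 0.9} \approx 0.179$$
Our opinion of the die being biased changed from 0.1 to 0.179 after observing a 4 and a 6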
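
The dice calculation can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides. The function name `posterior_biased` is made up for illustration, and the face probabilities of the biased die (face i with probability i/21) are an assumption chosen because they reproduce the 5.44% likelihood quoted above.

```python
from fractions import Fraction

def posterior_biased(prior_biased, lik_biased, lik_fair):
    """Bayes's formula for two competing hypotheses (biased vs. fair)."""
    numerator = lik_biased * prior_biased
    evidence = numerator + lik_fair * (1 - prior_biased)
    return numerator / evidence

# Fair die: every face has probability 1/6.
lik_fair = Fraction(1, 6) * Fraction(1, 6)       # 1/36, about 0.0278

# ASSUMPTION: the biased die shows face i with probability i/21.
lik_biased = Fraction(4, 21) * Fraction(6, 21)   # 24/441, about 0.0544

# Prior: 10 of the 100 dice in the box are biased.
prior_biased = Fraction(10, 100)

post = posterior_biased(prior_biased, lik_biased, lik_fair)
print(f"likelihood ratio = {float(lik_biased / lik_fair):.2f}")
print(f"posterior P(biased | 4, 6) = {float(post):.3f}")
```

Exact fractions are used so that rounding only happens when the result is printed.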
6 Bayesian inference of phylogeny
Based upon the posterior probability of a phylogenetic tree, $P(\tau_i \mid X)$:
$$P(\tau_i \mid X) = \frac{P(X \mid \tau_i)\, P(\tau_i)}{\sum_{j=1}^{B(s)} P(X \mid \tau_j)\, P(\tau_j)}$$
- $P(\tau_i \mid X)$: posterior probability of the ith tree; can be interpreted as the probability that this tree is the correct tree given the data $X$
- $P(X \mid \tau_i)$: likelihood of the ith tree
- $P(\tau_i)$: prior probability of the ith tree, typically $1/B(s)$ (in other words, our prior expectation is that all possible trees are equally probable)
- The summation in the denominator is over all $B(s)$ trees possible for $s$ species (taxa)
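
To make the denominator concrete, the sketch below counts the possible trees for s taxa and normalizes likelihood times prior over them. It assumes that "all possible trees" means unrooted, fully bifurcating trees, for which the count is (2s-5)!!. The likelihood values and the function names `num_unrooted_trees` and `tree_posteriors` are hypothetical and only serve as an illustration.

```python
def num_unrooted_trees(s):
    """Number of unrooted, fully bifurcating trees for s >= 3 taxa: (2s-5)!!."""
    if s < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * s - 4, 2):   # odd factors 3, 5, ..., 2s-5
        count *= k
    return count

def tree_posteriors(likelihoods):
    """Posterior of each tree under a flat prior of 1/B over the B listed trees."""
    b = len(likelihoods)
    prior = 1.0 / b
    weighted = [lik * prior for lik in likelihoods]
    total = sum(weighted)               # the summation in the denominator
    return [w / total for w in weighted]

# Hypothetical likelihoods for the 3 unrooted trees of 4 taxa.
liks = [2.0e-12, 1.0e-12, 1.0e-12]
posteriors = tree_posteriors(liks)

print(num_unrooted_trees(4), num_unrooted_trees(10))
print(posteriors)
```

With a flat prior the prior term cancels, so the posterior is just the normalized likelihood. The tree count grows so quickly with s that the full summation is only feasible for a handful of taxa.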
8 Markov Chain Monte Carlo
- Typically, the posterior probability cannot be calculated analytically. However, the posterior probability of phylogenies can be approximated by sampling trees from the posterior probability distribution
- MCMC can be used to sample phylogenies (and parameter values) according to their posterior probabilities
- Goal of the MCMC: wander randomly in tree (and parameter value) space such that the chain settles down into an equilibrium distribution of trees (and parameter values) that is the desired distribution (i.e., the Bayesian posterior)
- Let Ψ be a specific tree together with a combination of branch lengths, substitution parameters, and gamma shape parameter
- The MH (Metropolis-Hastings) algorithm is an MCMC algorithm that has been successfully used to approximate the posterior probability of trees
9 Steps in MH Markov Chain Monte Carlo
- MCMC takes a series of steps that form a conceptual chain
- At each step, a new location in parameter space is proposed as the next link in the chain
- This proposed location is usually similar to the present one because it is generated by the random perturbation of a few of the parameters in the present state of the chain
- The relative posterior-probability density at the new location is calculated (ratio of probabilities between new and current state)
- If the new location has a higher probability (ratio > 1), the move is accepted (and the proposed location becomes the next link in the chain) and the cycle is repeated
- If the new location has a lower probability (ratio < 1), the move will be accepted only a proportion of the time
- In short, small steps downward are accepted often, whereas big leaps downward are discouraged
- If the proposed location is rejected, the present location is added as the next link in the chain (so, the last two links in the chain will be identical) and the cycle is repeated
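
The steps above can be sketched on a toy problem. This is an illustrative example only, not the sampler used by any phylogenetics program. The five states stand in for trees, the `weights` are hypothetical unnormalized posterior densities, and the proposal (step to a neighbouring state on a ring) is assumed to be symmetric, so the acceptance probability reduces to min(1, ratio).

```python
import random

def metropolis_chain(weights, n_steps, seed=0):
    """Metropolis sampler over states 0..n-1 with a symmetric ring proposal."""
    rng = random.Random(seed)
    n = len(weights)
    state = rng.randrange(n)
    chain = []
    for _ in range(n_steps):
        # Propose a location similar to the present one (a neighbour).
        proposal = (state + rng.choice((-1, 1))) % n
        # Ratio of posterior densities between new and current state.
        ratio = weights[proposal] / weights[state]
        # Uphill moves are always accepted; downhill moves are accepted
        # with probability equal to the ratio.
        if ratio >= 1 or rng.random() < ratio:
            state = proposal
        # On rejection the present state is repeated as the next link.
        chain.append(state)
    return chain

weights = [1.0, 2.0, 4.0, 2.0, 1.0]   # hypothetical unnormalized posterior
chain = metropolis_chain(weights, n_steps=200_000, seed=1)

freqs = [chain.count(k) / len(chain) for k in range(len(weights))]
target = [w / sum(weights) for w in weights]
for k, (f, t) in enumerate(zip(freqs, target)):
    print(f"state {k}: visited {f:.3f}, target {t:.3f}")
```

Only ratios of weights are used, which is why the normalizing constant of the posterior never has to be computed. The fraction of steps spent in each state should approach the normalized weights as the chain gets longer.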
10 MCMC continued
- This process of proposing a new state, calculating the acceptance probability, and accepting or rejecting the move is repeated thousands of times
- The sequence of states visited (and sampled) forms the Markov chain
- The chain tends to stay in regions of high posterior probability; from these regions, almost all proposed moves are downhill and are rarely accepted
- By design, the proportion of time that the chain spends in any region of parameter space can be used as an estimate of the posterior probability of that region
- By creating long chains, this method of estimation can be made arbitrarily accurate
11 Summarizing the posterior
- The chain is sampled after it reaches stationarity, and the sampled trees represent the posterior probability distribution (in most programs, such as MrBayes, the chain is sampled from the start, but the samples obtained prior to reaching stationarity are discarded before estimating posterior probabilities)
- The discarded samples, obtained prior to convergence on stable likelihoods (stationarity), are known as the burn-in
12 The proportion of times a single tree is found among these samples is an estimate of the posterior probability of that tree.
A majority-rule consensus tree can be derived from the sample, and the proportion of samples containing each clade approximates the posterior probability of that clade.
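
As an illustration of this summary step, the sketch below discards a burn-in, then counts how often each tree and each clade appears among the remaining samples. The representation of a tree as a set of clades, the sample list, and the names `tree` and `summarize` are all hypothetical. Real programs read sampled trees from a file and handle much larger samples.

```python
from collections import Counter

def tree(*clades):
    """Represent a tree by the set of clades (groups of taxa) it contains."""
    return frozenset(frozenset(c) for c in clades)

def summarize(samples, burnin):
    """Estimate tree and clade posterior probabilities from post-burn-in samples."""
    kept = samples[burnin:]                       # discard the burn-in
    n = len(kept)
    tree_counts = Counter(kept)
    clade_counts = Counter(clade for t in kept for clade in t)
    tree_pp = {t: c / n for t, c in tree_counts.items()}
    clade_pp = {cl: c / n for cl, c in clade_counts.items()}
    # Majority rule: keep clades found in more than half of the samples.
    majority = {cl for cl, p in clade_pp.items() if p > 0.5}
    return tree_pp, clade_pp, majority

# Hypothetical trees on taxa A-D, each described by its non-trivial clades.
t1 = tree("AB", "ABC")
t2 = tree("AB", "ABD")
t3 = tree("AC", "ABC")

# Hypothetical chain output: the first two samples are treated as burn-in.
samples = [t3, t3] + [t1] * 6 + [t2] * 3 + [t3]
tree_pp, clade_pp, majority = summarize(samples, burnin=2)

for clade, p in sorted(clade_pp.items(), key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), p)
```

Note that a clade can have high posterior probability even when no single tree does, because its support is pooled across every sampled tree that contains it.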
13 [Figure: burn-in (discarded samples)]
14 [Figure: joint estimation (likelihood) versus marginal estimation (Bayesian). From Holder & Lewis 2003]
15 [Figure: Markov chain Monte Carlo (used in Bayesian analysis) compared with the non-parametric bootstrap; panel label: "or new branch lengths or new model parameters"]
16 Bootstrap versus MCMC
Compared to the bootstrap, MCMC yields a much larger sample of trees in the same computational time, because it produces one tree for every proposal cycle, versus one tree per tree search (which assesses numerous alternative trees) in the traditional approach. However, the sample of trees produced by MCMC is highly autocorrelated. As a result, millions of MCMC cycles are usually required, whereas many fewer (on the order of 1,000) bootstrap replicates are sufficient for most problems.
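
The effect of autocorrelation on the useful size of a sample can be illustrated numerically. The sketch below is not an MCMC run: it uses a simple autoregressive series as a stand-in for a correlated chain, compares it with independent draws, and applies the rough approximation ESS ≈ n(1 - ρ)/(1 + ρ), which holds for this kind of series. The coefficient 0.95 and all names are assumptions made for illustration.

```python
import random

def autocorrelation(xs, lag):
    """Sample autocorrelation of the sequence xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

rng = random.Random(42)
n = 20_000

# Independent draws, analogous to separate bootstrap replicates.
independent = [rng.gauss(0, 1) for _ in range(n)]

# Correlated series: each value is a small perturbation of the previous
# one, as in a chain built from local proposals.
chain = [0.0]
for _ in range(n - 1):
    chain.append(0.95 * chain[-1] + rng.gauss(0, 1))

rho_ind = autocorrelation(independent, 1)
rho_chain = autocorrelation(chain, 1)

# Rough effective sample size for an autoregressive series.
ess_chain = n * (1 - rho_chain) / (1 + rho_chain)

print(f"lag-1 autocorrelation, independent: {rho_ind:.3f}")
print(f"lag-1 autocorrelation, chain:       {rho_chain:.3f}")
print(f"approximate effective sample size of chain: {ess_chain:.0f} of {n}")
```

In this toy case, thousands of correlated values carry about as much information as a few hundred independent ones, which is the reason long chains are needed.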
17 Prior Distribution
- Prior probabilities convey the scientist's beliefs before having seen the data
- In most applications, researchers specify prior probability distributions that are largely uninformative, so that most of the differences in the posterior distribution are attributable to likelihood differences
- One way of doing this is to specify a uniform (or flat) prior, in which every possible parameter value has the same probability a priori
- There are many issues with specifying priors:
  - A flat prior for one parameter may result in a prior that is not flat for another parameter (see textbook)
  - Most importantly, if the prior is too restrictive (i.e., does not include realistic values), then inferences could be wrong
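
The point that a flat prior on one parameter is not flat on another can be seen in a small simulation. In this sketch a hypothetical parameter p gets a Uniform(0, 1) prior, and the induced prior on q = p² is examined. The choice of transformation is arbitrary and only meant as an illustration.

```python
import random

rng = random.Random(7)
n = 100_000

p = [rng.random() for _ in range(n)]   # flat prior on p in (0, 1)
q = [x * x for x in p]                 # induced prior on q = p**2

# Under a flat prior, a quarter of the mass lies below 0.25.
frac_p_low = sum(x < 0.25 for x in p) / n

# For q, P(q < 0.25) = P(p < 0.5) = 0.5, so small values are favoured.
frac_q_low = sum(x < 0.25 for x in q) / n

print(f"P(p < 0.25) is about {frac_p_low:.3f}")
print(f"P(q < 0.25) is about {frac_q_low:.3f}")
```

The same model can therefore be "uninformative" in one parameterization and clearly informative in another.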
18 Flat prior distribution of trees
(Huelsenbeck et al. 2001)
19 Implementing a Bayesian analysis: Specifying priors for parameters
- First, specify the substitution model (HKY, GTR, etc., with or without gamma, pinvar, site-specific rates, etc.)

Model settings for partition 1

Parameter    Options                                             Current Setting
--------------------------------------------------------------------------------
Nucmodel     4by4/Doublet/Codon                                  4by4
Nst          1/2/6                                               6
Code         Universal/Vertmt/Mycoplasma/Yeast/Ciliates/Metmt    Universal
Ploidy       Haploid/Diploid                                     Diploid
Rates        Equal/Gamma/Propinv/Invgamma/Adgamma                Invgamma
Ngammacat    <number>                                            4
Nbetacat     <number>                                            5
Omegavar     Equal/Ny98/M3                                       Equal
Covarion     No/Yes                                              No
Coding       All/Variable/Noabsencesites/Nopresencesites         All
Parsmodel    No/Yes                                              No
--------------------------------------------------------------------------------
20 Huelsenbeck et al. 2001
21 Specifying priors for parameters
- Second, specify prior distribution for parameter values

Parameter      Options                      Current Setting
--------------------------------------------------------------------------------
Tratiopr       Beta/Fixed                   Beta(1.0,1.0)
Revmatpr       Dirichlet/Fixed              Dirichlet(1.0,1.0,1.0,1.0,1.0,1.0)
Statefreqpr    Dirichlet/Fixed              Dirichlet(1.0,1.0,1.0,1.0)
Ratepr         Fixed/Variable=Dirichlet     Fixed
Shapepr        Uniform/Exponential/Fixed    Uniform(0.0,50.0)
Ratecorrpr     Uniform/Fixed                Uniform(-1.0,1.0)
Pinvarpr       Uniform/Fixed                Uniform(0.0,1.0)
Topologypr     Uniform/Constraints          Uniform
Brlenspr       Unconstrained/Clock          Unconstrained:Exp(10.0)
--------------------------------------------------------------------------------
Flat Dirichlet: all values are 1 (appropriate if we have no prior knowledge). This is not the same as a Dirichlet with all values equal to 100, which concentrates the prior tightly around equal proportions.
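
The difference between a Dirichlet with all values 1 and one with all values 100 can be checked by simulation. This is a minimal sketch using only the Python standard library. It draws Dirichlet samples by normalizing gamma variates, and the helper names `sample_dirichlet` and `spread_of_first_component` are made up for illustration. The four components stand in for the four state frequencies.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def spread_of_first_component(alpha, n_draws=5000, seed=3):
    """Mean and standard deviation of the first of four components."""
    rng = random.Random(seed)
    draws = [sample_dirichlet([alpha] * 4, rng)[0] for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    sd = (sum((d - mean) ** 2 for d in draws) / n_draws) ** 0.5
    return mean, sd

mean_flat, sd_flat = spread_of_first_component(1.0)
mean_tight, sd_tight = spread_of_first_component(100.0)

print(f"Dirichlet(1,1,1,1):         mean {mean_flat:.3f}, sd {sd_flat:.3f}")
print(f"Dirichlet(100,100,100,100): mean {mean_tight:.3f}, sd {sd_tight:.3f}")
```

Both priors have the same mean (0.25 per component), but the second has a far smaller spread, so it expresses a strong prior belief that the frequencies are nearly equal.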
22 Setting up the analysis
Third, set up the conditions for the MCMC

Parameter       Options        Current Setting
--------------------------------------------------
Seed            <number>       1115403472
Swapseed        <number>       1115403472
Ngen            <number>       1000000
Nruns           <number>       2
Nchains         <number>       4
Temp            <number>       0.200000
Samplefreq      <number>       100
Printfreq       <number>       100
Burnin          <number>       0
Startingtree    Random/User    Random
Nperts          <number>       0
Savebrlens      Yes/No         Yes
--------------------------------------------------
23 References
Bayesian inference chapter in Inferring Phylogenies textbook.
Holder M, Lewis PO (2003) Phylogeny estimation: traditional and Bayesian approaches. Nat Rev Genet 4, 275-284.
Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17, 754-755.
Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP (2001) Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294, 2310-2314.