Provided by: guille3
Title: Comparison of methods


1
Comparison of methods
From Holder & Lewis 2003
2
Bayesian Inference of Phylogenies
  • Closely related to ML methods, differing only in
    the use of a PRIOR DISTRIBUTION for the quantity
    being inferred (typically the tree)
  • Use of a prior enables us to interpret the result
    as the probability distribution of the tree given
    the data
  • Bayes's theorem was published in 1763, and
    controversy among statisticians over its
    appropriateness is almost that old
  • Recently, the introduction of Markov chain Monte
    Carlo (MCMC) methods has given a new impetus to
    Bayesian inference

3
Simple example of Bayesian inference
Box with 90 fair and 10 biased dice.
Take a die at random from the box and roll it
twice: get a 4 and a 6.
What is the probability that the die is biased?
4
A Bayesian analysis combines one's prior beliefs
about the probability of a hypothesis with its
likelihood.
Likelihood assuming a fair die:
1/6 × 1/6 = 1/36 ≈ 0.0278 (2.78%)
Likelihood assuming a biased die:
0.0544 (5.44%)
The probability of observing the data is 1.96 times
greater under the hypothesis that the die is
biased.
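The two likelihoods can be checked with a short Python sketch. The slide does not say how the biased dice are loaded; the code assumes that face k of a biased die comes up with probability k/21, an assumption chosen only because it reproduces the 5.44% figure.

```python
from fractions import Fraction

# Fair die: every face has probability 1/6.
fair = {face: Fraction(1, 6) for face in range(1, 7)}
# ASSUMPTION (not stated on the slide): face k of a biased die has probability k/21.
biased = {face: Fraction(face, 21) for face in range(1, 7)}


def likelihood(die, rolls):
    """Probability of observing the independent rolls with the given die."""
    prob = Fraction(1)
    for face in rolls:
        prob *= die[face]
    return prob


rolls = [4, 6]
lik_fair = likelihood(fair, rolls)      # 1/36
lik_biased = likelihood(biased, rolls)  # 24/441, which reduces to 8/147
ratio = lik_biased / lik_fair

print(f"fair:   {float(lik_fair):.4f}")
print(f"biased: {float(lik_biased):.4f}")
print(f"ratio:  {float(ratio):.2f}")
```

Exact fractions are used so that rounding happens only when the values are printed.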
5
Bayesian inference is based upon the POSTERIOR
probability of a hypothesis.
The posterior probability that the die is biased
can be obtained using Bayes's formula.
Our opinion of the die being biased changed from
0.1 (the prior: 10 of the 100 dice are biased) to
0.179 after observing a 4 and a 6.
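Written out as a worked equation, reading the two likelihoods from slide 4 as 0.0278 (fair) and 0.0544 (biased) and taking the priors 0.9 and 0.1 from the composition of the box, the calculation is roughly:

```latex
\Pr(\text{biased} \mid 4, 6)
  = \frac{\Pr(4, 6 \mid \text{biased})\,\Pr(\text{biased})}
         {\Pr(4, 6 \mid \text{biased})\,\Pr(\text{biased})
          + \Pr(4, 6 \mid \text{fair})\,\Pr(\text{fair})}
  = \frac{0.0544 \times 0.1}{0.0544 \times 0.1 + 0.0278 \times 0.9}
  \approx 0.179
```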
6
Bayesian inference of phylogeny
Based upon the posterior probability of a
phylogenetic tree, $\Pr(\tau_i \mid X)$:
$$\Pr(\tau_i \mid X) = \frac{\Pr(X \mid \tau_i)\,\Pr(\tau_i)}{\sum_{j=1}^{B(s)} \Pr(X \mid \tau_j)\,\Pr(\tau_j)}$$
  • $\Pr(\tau_i \mid X)$: posterior probability of the
    ith tree; can be interpreted as the probability
    that this tree is the correct tree given the
    data $X$
  • $\Pr(\tau_i)$: prior probability of the ith tree,
    typically $1/B(s)$ (in other words, our prior
    expectation is that all possible trees are
    equally probable)
  • The summation in the denominator is over all
    $B(s)$ trees possible for $s$ species (taxa)
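To give a sense of how large the summation over all B trees becomes, here is a minimal Python sketch. It assumes unrooted, strictly bifurcating trees (the slide does not say which kind of tree is counted), for which the count is (2s-5)!!. The function names and the three likelihood values are hypothetical.

```python
from math import prod


def num_unrooted_trees(s):
    """Number of unrooted, strictly bifurcating trees for s >= 3 taxa: (2s-5)!!"""
    if s < 3:
        raise ValueError("need at least 3 taxa")
    return prod(range(1, 2 * s - 4, 2))  # 1 * 3 * 5 * ... * (2s - 5)


def posterior_uniform_prior(likelihoods):
    """Posterior over trees when every tree has the same prior probability."""
    prior = 1 / len(likelihoods)
    joint = [lik * prior for lik in likelihoods]
    total = sum(joint)  # the denominator: summed over all trees
    return [j / total for j in joint]


print(num_unrooted_trees(4))   # 3
print(num_unrooted_trees(10))  # 2027025

# Hypothetical likelihoods for the 3 possible trees on 4 taxa.
posteriors = posterior_uniform_prior([0.002, 0.001, 0.001])
print(posteriors)
```

With a uniform prior the prior terms cancel, so the posterior is simply each likelihood divided by the sum of the likelihoods. The count already exceeds two million trees at 10 taxa, which is why the denominator is not enumerated in practice.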
7
Bayesian inference of phylogeny
Likelihood: the probability of the data given the
ith tree. It is the term that multiplies the prior
probability of the ith tree in the numerator of the
posterior probability; the posterior, the prior, and
the summation over all B trees possible for s
species (taxa) are as on the previous slide.
8
Markov Chain Monte Carlo
  • Typically, the posterior probability cannot be
    calculated analytically. However, the posterior
    probability of phylogenies can be approximated
    by sampling trees from the posterior probability
    distribution
  • MCMC can be used to sample phylogenies (and
    parameter values) according to their posterior
    probabilities
  • Goal of the MCMC: wander randomly in tree (and
    parameter value) space such that the chain
    settles into an equilibrium distribution of
    trees (and parameter values) that has the
    desired distribution (i.e., the Bayesian
    posterior)
  • Let a state of the chain be a specific tree
    together with a combination of branch lengths,
    substitution parameters, and gamma shape
    parameter
  • The Metropolis-Hastings (MH) algorithm is an MCMC
    algorithm that has been successfully used to
    approximate the posterior probability of trees
9
Steps in MH Markov Chain Monte Carlo
  • MCMC takes a series of steps that form a
    conceptual chain
  • At each step, a new location in parameter space
    is proposed as the next link in the chain
  • This proposed location is usually similar to the
    present one because it is generated by the random
    perturbation of a few of the parameters in the
    present state of the chain
  • The relative posterior-probability density at the
    new location is calculated (ratio of
    probabilities between new and current state)
  • If the new location has a higher probability
    (ratio > 1), the move is accepted (and the
    proposed location becomes the next link in the
    chain) and the cycle is repeated
  • If the new location has a lower probability
    (ratio < 1), the move is accepted only a
    proportion of the time, with probability equal
    to the ratio
  • In short, small steps downward are accepted
    often, whereas big leaps downward are discouraged
  • If the proposed location is rejected, the present
    location is added as the next link in the chain
    (so, the last two links in the chain will be
    identical) and the cycle is repeated
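
The steps above can be sketched in Python on a toy problem. This is not how a phylogenetics program implements them: the "parameter space" here is just three states standing in for three trees, the weights are hypothetical unnormalized posterior values, and the proposal (pick one of the other states uniformly at random) is symmetric, so no Hastings correction is needed.

```python
import random


def metropolis_hastings(weights, n_steps, seed=0):
    """Sample state indices in proportion to unnormalized posterior weights.

    The proposal picks one of the other states uniformly at random, so it is
    symmetric and the acceptance ratio reduces to the ratio of the weights.
    """
    rng = random.Random(seed)
    current = 0
    chain = []
    for _ in range(n_steps):
        # Propose a new location that differs from the present one.
        proposed = rng.choice([s for s in range(len(weights)) if s != current])
        ratio = weights[proposed] / weights[current]
        # Uphill moves (ratio >= 1) are always accepted; downhill moves are
        # accepted with probability equal to the ratio.
        if rng.random() < ratio:
            current = proposed
        # On rejection, the present location is added to the chain again.
        chain.append(current)
    return chain


weights = [1.0, 3.0, 6.0]  # hypothetical unnormalized posteriors for three trees
chain = metropolis_hastings(weights, n_steps=50_000)
freqs = [chain.count(s) / len(chain) for s in range(len(weights))]
print(freqs)  # should be close to the normalized weights 0.1, 0.3, 0.6
```

The fraction of the chain spent in each state approximates that state's posterior probability, even though the normalizing constant was never computed.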

10
MCMC continued
  • This process of proposing a new state,
    calculating the acceptance probability, and
    accepting or rejecting the move is repeated
    thousands of times
  • The sequence of states visited (and sampled)
    forms the Markov Chain
  • The chain tends to stay in regions of high
    posterior probability; from these regions, almost
    all proposed moves are downhill and are rarely
    accepted
  • By design, the proportion of time that the chain
    spends in any region of parameter space can be
    used as an estimate of the posterior probability
    of that region
  • By creating long chains, this method of
    estimation can be made arbitrarily accurate

11
Summarizing the posterior
  • The chain is sampled after it reaches
    stationarity, and the sampled trees represent
    the posterior probability distribution (in most
    programs, such as MrBayes, the chain is sampled
    from the start, but the samples obtained before
    reaching stationarity are discarded when
    estimating posterior probabilities)
  • The samples discarded because they were obtained
    before the chain converged on stable likelihoods
    (stationarity) are known as the burn-in
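
Discarding the burn-in amounts to dropping the first part of the sampled chain. A minimal Python sketch follows; the function name, the burn-in fraction, and the log-likelihood values are hypothetical, and in practice the cut-off is chosen by inspecting when the likelihoods stabilize.

```python
def discard_burnin(samples, burnin_fraction=0.25):
    """Return the samples left after dropping the first burnin_fraction of the chain."""
    if not 0 <= burnin_fraction < 1:
        raise ValueError("burnin_fraction must be in [0, 1)")
    n_discard = int(len(samples) * burnin_fraction)
    return samples[n_discard:]


# Hypothetical log-likelihood trace: climbs quickly, then plateaus.
trace = [-5200.0, -4900.0, -4700.0, -4650.0, -4651.2, -4649.8, -4650.5, -4650.1]
kept = discard_burnin(trace, burnin_fraction=0.5)
print(kept)
```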

12
The proportion of times a single tree is found
among these samples approximates the posterior
probability of that tree.
A majority-rule consensus can be derived from the
sample, and the proportion obtained for each clade
is an approximation of the posterior probability
of that clade.
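As a rough illustration of how clade proportions are tallied, the Python sketch below represents each sampled tree as a set of clades and each clade as a frozenset of taxon names. The representation, the function name, and the four sampled trees are all hypothetical.

```python
from collections import Counter


def clade_posteriors(sampled_trees):
    """Fraction of sampled trees that contain each clade."""
    counts = Counter()
    for tree in sampled_trees:
        counts.update(tree)  # each clade in the tree is counted once
    n = len(sampled_trees)
    return {clade: count / n for clade, count in counts.items()}


AB, CD, AC, ABC = (frozenset(c) for c in ("AB", "CD", "AC", "ABC"))

# Hypothetical post-burn-in sample of four rooted trees on taxa A, B, C, D,
# listing only the non-trivial clades of each tree.
sample = [
    {AB, CD},   # ((A,B),(C,D))
    {AB, CD},   # ((A,B),(C,D))
    {AB, ABC},  # (((A,B),C),D)
    {AC, ABC},  # (((A,C),B),D)
]

support = clade_posteriors(sample)
# A majority-rule consensus keeps the clades found in more than half the samples.
majority = {clade for clade, p in support.items() if p > 0.5}
print(support[AB], support[CD], support[AC])
```

Here the clade (A,B) appears in three of the four samples, so its estimated posterior probability is 0.75 and it is the only clade retained in the majority-rule consensus.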
13
Burn-in: discarded samples
14
[Figure: joint estimation (likelihood) versus marginal estimation (Bayesian). From Holder & Lewis 2003]
15
[Figure: panel labels "Markov-chain Monte Carlo (used in Bayesian analysis)" and "Bootstrap (non-parametric)"; annotation "Or new branch lengths or new model parameters"]
16
Bootstrap versus MCMC
Compared to bootstrap, MCMC yields a much larger
sample of trees in the same computational time,
because it produces one tree for every proposal
cycle versus one tree per tree search (which
assesses numerous alternative trees) in the
traditional approach. However, the sample of
trees produced by MCMC is highly auto-correlated.
As a result, millions of cycles through MCMC are
usually required, whereas many fewer (on the order
of 1,000) bootstrap replicates are sufficient for
most problems.
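For comparison, one non-parametric bootstrap replicate is made by resampling alignment columns with replacement. The Python sketch below shows only that resampling step, on a hypothetical toy alignment; the tree search that would follow each replicate (the expensive part) is not shown.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    alignment maps taxon name -> sequence; all sequences have the same length.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every taxon, so columns stay intact.
    return {taxon: "".join(seq[c] for c in columns) for taxon, seq in alignment.items()}


alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACCTTA"}  # hypothetical toy data
rng = random.Random(1)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(1000)]
print(replicates[0])
```

Each replicate is an independent draw, unlike successive MCMC samples, which helps explain why about 1,000 replicates can suffice.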
17
Prior Distribution
  • Prior probabilities convey the scientist's
    beliefs before having seen the data
  • In most applications researchers specify prior
    probability distributions that are largely
    uninformative, so that most of the differences in
    the posterior distribution are attributable to
    likelihood differences
  • One way of doing this is to specify a uniform (or
    flat) prior in which every possible parameter
    value has the same probability a priori
  • There are many issues with specifying priors:
  • A flat prior for one parameter may result in a
    prior that is not flat for another parameter (see
    textbook)
  • Most importantly, if a prior is too restrictive
    (i.e., does not include realistic values), then
    inferences could be wrong
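
One way to see that a flat prior on one parameter is not flat on another is to push a uniform prior on branch length through the Jukes-Cantor formula for the probability that a site differs across the branch. The Python sketch below does this by simulation; the choice of the Jukes-Cantor model and the upper bound of 5.0 on the branch length are assumptions made for illustration.

```python
import math
import random


def jc69_prob_change(v):
    """Probability that a site differs across a branch of length v (Jukes-Cantor)."""
    return 0.75 * (1.0 - math.exp(-4.0 * v / 3.0))


rng = random.Random(0)
# Flat prior on branch length; the upper bound 5.0 is arbitrary for this sketch.
branch_lengths = [rng.uniform(0.0, 5.0) for _ in range(100_000)]
p_values = [jc69_prob_change(v) for v in branch_lengths]

# If the induced prior on p were flat on [0, 0.75), only about 1/15 of the
# draws would exceed 0.7.
frac_high = sum(p > 0.7 for p in p_values) / len(p_values)
print(f"fraction of draws with p > 0.7: {frac_high:.3f}")
```

Well over half of the draws land above 0.7, so a prior that is flat on branch length is strongly informative about the probability of change.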

18
Flat prior distribution of trees
Huelsenbeck et al. 2001
19
Implementing a Bayesian analysis: specifying
priors for parameters
  • First, specify the substitution model (HKY, GTR,
    etc., with or without gamma, pinvar, site-specific
    rates, etc.)

Model settings for partition 1:

Parameter    Options                                            Current Setting
-------------------------------------------------------------------------------
Nucmodel     4by4/Doublet/Codon                                 4by4
Nst          1/2/6                                              6
Code         Universal/Vertmt/Mycoplasma/Yeast/Ciliates/Metmt   Universal
Ploidy       Haploid/Diploid                                    Diploid
Rates        Equal/Gamma/Propinv/Invgamma/Adgamma               Invgamma
Ngammacat    <number>                                           4
Nbetacat     <number>                                           5
Omegavar     Equal/Ny98/M3                                      Equal
Covarion     No/Yes                                             No
Coding       All/Variable/Noabsencesites/Nopresencesites        All
Parsmodel    No/Yes                                             No
-------------------------------------------------------------------------------
20
Huelsenbeck et al. 2001
21
Specifying priors for parameters
  • Second, specify prior distribution for parameter
    values

Parameter     Options                     Current Setting
------------------------------------------------------------------------------
Tratiopr      Beta/Fixed                  Beta(1.0,1.0)
Revmatpr      Dirichlet/Fixed             Dirichlet(1.0,1.0,1.0,1.0,1.0,1.0)
Statefreqpr   Dirichlet/Fixed             Dirichlet(1.0,1.0,1.0,1.0)
Ratepr        Fixed/Variable=Dirichlet    Fixed
Shapepr       Uniform/Exponential/Fixed   Uniform(0.0,50.0)
Ratecorrpr    Uniform/Fixed               Uniform(-1.0,1.0)
Pinvarpr      Uniform/Fixed               Uniform(0.0,1.0)
Topologypr    Uniform/Constraints         Uniform
Brlenspr      Unconstrained/Clock         Unconstrained:Exp(10.0)
------------------------------------------------------------------------------
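Flat Dirichlet: all values are 1 (appropriate if we
have no prior knowledge). This is not the same as a
Dirichlet with all values set to 100, which has the
same mean but concentrates the prior tightly
around equal values.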
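The difference between the two Dirichlet priors can be seen by sampling from them. The Python sketch below draws state-frequency vectors from Dirichlet(1,1,1,1) and Dirichlet(100,100,100,100) using only the standard library (a Dirichlet draw is a set of independent gamma variates divided by their sum); the helper name and sample sizes are arbitrary choices for illustration.

```python
import random
import statistics


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
# Keep only the first component (e.g. the frequency of one nucleotide).
flat = [sample_dirichlet([1.0] * 4, rng)[0] for _ in range(5000)]
peaked = [sample_dirichlet([100.0] * 4, rng)[0] for _ in range(5000)]

sd_flat = statistics.stdev(flat)
sd_peaked = statistics.stdev(peaked)
print(f"mean {statistics.mean(flat):.3f}, sd {sd_flat:.3f}  (all values 1)")
print(f"mean {statistics.mean(peaked):.3f}, sd {sd_peaked:.3f}  (all values 100)")
```

Both priors are centered on 0.25 per state, but the one with all values 100 has a far smaller spread, so it expresses a strong prior belief that the frequencies are nearly equal.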
22
Setting up the analysis
  • Third, set up the conditions for the MCMC

Parameter      Options        Current Setting
---------------------------------------------
Seed           <number>       1115403472
Swapseed       <number>       1115403472
Ngen           <number>       1000000
Nruns          <number>       2
Nchains        <number>       4
Temp           <number>       0.200000
Samplefreq     <number>       100
Printfreq      <number>       100
Burnin         <number>       0
Startingtree   Random/User    Random
Nperts         <number>       0
Savebrlens     Yes/No         Yes
---------------------------------------------
23
References
Felsenstein J (2004) Inferring Phylogenies
(textbook), Bayesian inference chapter.
Holder M, Lewis PO (2003) Phylogeny estimation:
traditional and Bayesian approaches. Nat Rev Genet
4, 275-284.
Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian
inference of phylogenetic trees. Bioinformatics
17, 754-755.
Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP
(2001) Bayesian inference of phylogeny and its
impact on evolutionary biology. Science 294,
2310-2314.