Assessing reliability of results (measures of clade support)

Provided by: jimwo1
1
Assessing reliability of results (measures of clade support)
3
Measures of clade support
  • All methods
      • Number of unambiguous features (e.g., substitutions with sequence data) supporting monophyly
  • Resampling
      • Bootstrap (widely used)
      • Jackknife (not as widely used)
  • Parsimony only
      • Bremer Support (Decay Index)
  • Permutation
      • Permutation Tail Probability (PTP)
  • Likelihood only
      • Parametric bootstrap (simulation of data)
  • Bayesian posterior probabilities (next week's topic)

4
Bootstrap and jackknife
  • Jackknife
      • Drop one observation at a time and calculate the estimate each time
  • Bootstrap (Efron 1979)
      • Resample with replacement to make a fictional sample of the same size as the original
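The two resampling schemes can be sketched on an ordinary statistic before moving to trees. This is a minimal illustration, assuming a made-up sample and the mean as the estimator; the function names are hypothetical and not from the slides.

```python
import random
import statistics

def jackknife_estimates(sample, estimator):
    """Recompute the estimate with each observation dropped in turn."""
    return [estimator(sample[:i] + sample[i + 1:]) for i in range(len(sample))]

def bootstrap_estimates(sample, estimator, n_replicates, rng):
    """Recompute the estimate on fictional samples drawn with replacement,
    each the same size as the original sample."""
    n = len(sample)
    return [estimator([rng.choice(sample) for _ in range(n)])
            for _ in range(n_replicates)]

sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]  # made-up observations
rng = random.Random(0)                              # fixed seed for repeatability

jack = jackknife_estimates(sample, statistics.mean)
boot = bootstrap_estimates(sample, statistics.mean, 1000, rng)

# The spread of the bootstrap estimates approximates the sampling
# variability of the estimator (here, the standard error of the mean).
boot_se = statistics.stdev(boot)
print(len(jack), len(boot), round(boot_se, 3))
```

The jackknife yields one estimate per observation, while the number of bootstrap replicates is a free choice.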

5
The variability of the distribution of these estimates could be estimated if:
  • we knew the family of the distribution (normal, binomial, etc.)
  • the estimator were mathematically tractable
If neither of these is the case, but our sample size is sufficiently large, we can use the empirical distribution of our sample to estimate the true distribution F(θ).
To find out the variation of our estimate of θ, we can draw new datasets (with replacement) from our sample, of the same size as the original dataset, and analyze them in the same way.
(from Felsenstein, 2004)
6
(from Felsenstein, 2004)
7
Goal: mimic the variability that we would get if we could sample more data sets from the underlying true distribution.
The data should be a series of independently sampled points.
In phylogenetics, the data are a matrix of taxa x characters:
  • Taxa are not independent
  • Characters are a better candidate for independent points
(from Felsenstein, 2004)
8
Bootstrap in Phylogenetics
  • Resample characters with replacement
  • Conduct new analyses with resampled data
  • Produce majority-rule consensus tree from
    pseudo-replicates
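The three steps above can be sketched for the smallest possible case, four taxa, where a parsimony analysis reduces to counting the characters that support each of the three possible splits. This is an illustrative sketch under that assumption: the matrix is made up, the function names are hypothetical, and a real analysis would run a full tree search on each pseudo-replicate.

```python
import random
from collections import Counter

SPLITS = ("AB|CD", "AC|BD", "AD|BC")

def resample_characters(matrix, rng):
    """Draw columns with replacement; the pseudo-replicate keeps the original width."""
    n_chars = len(matrix[0])
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return ["".join(row[c] for c in cols) for row in matrix]

def best_split(matrix):
    """Stand-in analysis: four-taxon parsimony. Returns the split supported by
    the most informative characters, or None when the top two are tied."""
    support = Counter({s: 0 for s in SPLITS})
    for a, b, c, d in zip(*matrix):
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    ranked = support.most_common()
    if ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def bootstrap_proportions(matrix, n_replicates, rng):
    """Fraction of pseudo-replicates in which each split is recovered."""
    counts = Counter(best_split(resample_characters(matrix, rng))
                     for _ in range(n_replicates))
    return {s: counts[s] / n_replicates for s in SPLITS}

# Made-up matrix; rows are taxa A, B, C, D and columns are characters.
matrix = [
    "AAGGTTCCAA",
    "AAGGTTCCGA",
    "GGAACCTTAG",
    "GGAACCTTGG",
]
rng = random.Random(1)
proportions = bootstrap_proportions(matrix, 200, rng)

# A split enters the majority-rule consensus if it occurs in more than half
# of the pseudo-replicates.
majority = [s for s, p in proportions.items() if p > 0.5]
print(proportions, majority)
```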

9
(from Felsenstein, 2004)
11
Bootstrapping assumes characters are I.I.D.
  • Independent
      • The value of one character does not depend on the value of another character
      • The outcome of evolution at each character should not be predictable from the outcome at neighboring characters
      • If correlated characters are adjacent, use the block bootstrap (Künsch 1989)
  • And Identically Distributed
      • Characters all have the same probabilities of change
  • Assume
      • The characters are samples from a larger pool of characters
      • Rates are assigned independently to sites in a molecule
      • Thus, each site has a rate randomly drawn from a distribution of rates
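When adjacent characters are correlated, a block bootstrap resamples runs of neighboring characters instead of single columns, so that local correlation is carried into each pseudo-replicate. The sketch below is one simple moving-block variant; the block length, matrix, and function names are assumptions for illustration, not values from the slides.

```python
import random

def block_bootstrap_columns(n_chars, block_len, rng):
    """Column indices for one pseudo-replicate built from blocks of adjacent characters."""
    if not 1 <= block_len <= n_chars:
        raise ValueError("block_len must be between 1 and n_chars")
    cols = []
    while len(cols) < n_chars:
        # Block start positions are drawn with replacement.
        start = rng.randrange(n_chars - block_len + 1)
        cols.extend(range(start, start + block_len))
    return cols[:n_chars]  # trim the last block to the original number of characters

def resample_matrix(matrix, cols):
    """Build the pseudo-replicate matrix from the chosen column indices."""
    return ["".join(row[c] for c in cols) for row in matrix]

matrix = ["ACGTACGTACGT", "ACGTTCGTACGA", "TCGAACGTTCGT"]  # made-up data
rng = random.Random(2)
cols = block_bootstrap_columns(len(matrix[0]), 3, rng)
replicate = resample_matrix(matrix, cols)
print(cols)
print(replicate)
```

With `block_len` set to 1 this reduces to the ordinary bootstrap over single characters.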

12
Interpretation and Bias of the Bootstrap
  • Felsenstein 1985
      • A bootstrap proportion is a measure of repeatability, or the probability that a particular internal branch would be found in an analysis of a new independent sample of characters (replay of the evolutionary tape)
  • Felsenstein and Kishino 1993
      • A bootstrap proportion is a measure of accuracy, or the probability that a particular branch is contained in the true tree (assuming the phylogenetic method is consistent)
  • Hillis and Bull 1993
      • Bootstrap proportions provide relatively unbiased, but highly imprecise, estimates of repeatability
      • Bootstrap proportions provide biased estimates of accuracy
      • When the phylogenetic method is consistent, bootstrapping underestimates accuracy at high bootstrap values and overestimates accuracy at low bootstrap values

13
Methods to reduce bias of the Bootstrap
  • Complete and partial bootstrap (Zharkikh and Li
    1995)
  • Efron, Halloran and Holmes 1996
  • Iterated bootstrap (Rodrigo 1993)
  • Approximately Unbiased (AU) method of Shimodaira
    (2002)

15
Bremer support (Parsimony)
[Figure; labels: trees, 1 step, most parsimonious trees]
16
Bremer support
[Figure; labels: trees, 1 step]
17
Bremer support a.k.a. Decay Index
Where does the clade of interest disappear (decay)?
[Figure; labels: trees, 1 step]
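The decay index asks how many extra steps beyond the most parsimonious length are needed before a tree lacking the clade is found. A minimal sketch, assuming the candidate trees have already been found and scored; the tree lengths, clades, and function name here are made up for illustration.

```python
def bremer_support(trees, clade):
    """Length of the shortest tree lacking the clade minus the length of the
    shortest tree overall.

    trees: iterable of (length, clades) pairs, where clades is a set of frozensets
    clade: frozenset of taxon names
    """
    trees = list(trees)
    shortest = min(length for length, _ in trees)
    lacking = [length for length, clades in trees if clade not in clades]
    if not lacking:
        return None  # every tree examined contains the clade; search longer trees
    return min(lacking) - shortest

# Made-up candidate trees: (parsimony length, clades present in the tree)
candidate_trees = [
    (100, {frozenset("AB"), frozenset("ABC")}),
    (101, {frozenset("AB"), frozenset("ABD")}),
    (103, {frozenset("AC"), frozenset("ACD")}),
]

print(bremer_support(candidate_trees, frozenset("AB")))   # clade lost only at length 103
print(bremer_support(candidate_trees, frozenset("ABC")))  # clade lost at length 101
```

A clade that is absent from the most parsimonious tree gets a value of 0 under this definition.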
18
Bremer support a.k.a. Decay Index
Problem: it is not clear what the value must be for a group to be considered well supported
[Figure; labels: trees, 1 step]
19
Permutation tests (parsimony)
  • Reshuffling of data (rather than resampling)
      • Reshuffle character states among taxa
  • Permutation Tail Probability test (PTP)
      • Does the dataset contain more hierarchical structure than would be expected at random?
      • For each character, reshuffle character states among taxa (with or without outgroup)
      • Compare the tree length of the original dataset against the distribution of tree lengths of the permuted datasets
      • If it is in the tail of the permuted distribution, reject the null hypothesis of no hierarchical structure
  • Topology-dependent PTP
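The PTP procedure can be sketched for four taxa, where the shortest tree is found by scoring all three possible topologies with the Fitch count. The matrix, the number of permutations, and the function names are assumptions for illustration. The P value uses one common convention that counts the original data as one of the replicates.

```python
import random

# The three unrooted four-taxon trees, as pairs of sister taxa (row indices).
PAIRINGS = (((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2)))

def fitch_quartet_steps(column, pairing):
    """Parsimony steps for one character on the quartet tree ((i,j),(k,l))."""
    (i, j), (k, l) = pairing
    steps = 0
    left = {column[i]} & {column[j]}
    if not left:
        left = {column[i], column[j]}
        steps += 1
    right = {column[k]} & {column[l]}
    if not right:
        right = {column[k], column[l]}
        steps += 1
    if not (left & right):
        steps += 1
    return steps

def shortest_tree_length(matrix):
    """Length of the most parsimonious of the three quartet trees."""
    columns = list(zip(*matrix))
    return min(sum(fitch_quartet_steps(col, p) for col in columns)
               for p in PAIRINGS)

def permute_characters(matrix, rng):
    """Shuffle the states of each character among taxa, one character at a time."""
    new_cols = []
    for col in zip(*matrix):
        col = list(col)
        rng.shuffle(col)
        new_cols.append(col)
    return ["".join(row) for row in zip(*new_cols)]

def ptp_test(matrix, n_permutations, rng):
    observed = shortest_tree_length(matrix)
    permuted = [shortest_tree_length(permute_characters(matrix, rng))
                for _ in range(n_permutations)]
    as_short = sum(1 for length in permuted if length <= observed)
    p_value = (as_short + 1) / (n_permutations + 1)  # original data counted as a replicate
    return observed, permuted, p_value

# Made-up matrix in which every character groups taxa 1+2 against taxa 3+4.
matrix = ["AAGGTTCCAAGG", "AAGGTTCCAAGG", "GGAACCTTGGAA", "GGAACCTTGGAA"]
observed, permuted, p_value = ptp_test(matrix, 99, random.Random(4))
print(observed, min(permuted), p_value)
```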

20
Results of PTP test

Tree length    Number of replicates
-----------------------------------
3399*          1
7937           1
7944           1
7947           1
7961           2
7962           2
7964           3
7967           1
7968           2
7969           2
7970           1
7971           4
. .
. .
8008           1
8010           2
8011           1
8014           4
8015           1
8016           2
8020           1
8022           1
8023           2
8026           1
8030           1
8031           1
8044           1

* = length for original (unpermuted) data
P = 0.010000

21
Permutation tests, continued
  • Topology-dependent PTP (Faith 1991)
      • Is there non-random support for a particular grouping (constraint)?
      • Same reshuffling as PTP
      • Compute the length of the shortest tree not compatible with the constraints minus the length of the shortest tree compatible with the constraints
      • If this difference for the original data is in the tail of the permuted distribution, support for the grouping in question is not random
      • Can be extended to comparisons between two alternate groupings
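The topology-dependent test can be sketched in the same four-taxon setting. With only three possible trees, the shortest tree compatible with the constraint is simply the one tree that contains the grouping. The constraint label, matrix, and function names are hypothetical, and the P value again counts the original data as one replicate.

```python
import random

# The three unrooted four-taxon trees, keyed by the split they contain.
PAIRINGS = {
    "AB|CD": ((0, 1), (2, 3)),
    "AC|BD": ((0, 2), (1, 3)),
    "AD|BC": ((0, 3), (1, 2)),
}

def tree_length(matrix, pairing):
    """Fitch parsimony length of the matrix on the quartet tree ((i,j),(k,l))."""
    (i, j), (k, l) = pairing
    total = 0
    for col in zip(*matrix):
        left = {col[i]} & {col[j]} or {col[i], col[j]}
        right = {col[k]} & {col[l]} or {col[k], col[l]}
        total += (col[i] != col[j]) + (col[k] != col[l]) + (not (left & right))
    return total

def length_difference(matrix, constraint):
    """Shortest tree NOT compatible with the constraint minus shortest tree compatible."""
    compatible = tree_length(matrix, PAIRINGS[constraint])
    incompatible = min(tree_length(matrix, p)
                       for name, p in PAIRINGS.items() if name != constraint)
    return incompatible - compatible

def permute_characters(matrix, rng):
    """Same reshuffling as PTP: shuffle each character's states among taxa."""
    new_cols = []
    for col in zip(*matrix):
        col = list(col)
        rng.shuffle(col)
        new_cols.append(col)
    return ["".join(row) for row in zip(*new_cols)]

def tptp_test(matrix, constraint, n_permutations, rng):
    observed = length_difference(matrix, constraint)
    permuted = [length_difference(permute_characters(matrix, rng), constraint)
                for _ in range(n_permutations)]
    as_large = sum(1 for d in permuted if d >= observed)
    return observed, permuted, (as_large + 1) / (n_permutations + 1)

# Made-up matrix; rows are taxa A, B, C, D.
matrix = ["AAGGTTCCAAGG", "AAGGTTCCAAGG", "GGAACCTTGGAA", "GGAACCTTGGAA"]
observed, permuted, p_value = tptp_test(matrix, "AB|CD", 99, random.Random(5))
print(observed, max(permuted), p_value)
```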

22
Results of T-PTP test

Tree length difference    Number of replicates
----------------------------------------------
23                        1
14                        1
8                         2
7                         1
5                         1
4                         1
2                         1
1                         2
0                         1
-2                        3
. .
. .
-65                       1
-68                       1
-71                       1
-74                       1

Length difference for original (unpermuted) data = length of shortest tree not compatible with constraints minus length of shortest tree compatible with constraints
P = 0.300000
24
Parametric Bootstrap
  • Simulate the data under the most likely tree and model of evolution
  • Infer the ML tree for each simulated replicate
  • Compute a majority-rule consensus tree with P values for branches (same as with non-parametric bootstrap)
  • Concerns
      • Close reliance on the correctness of the statistical model of evolution
      • When the model is correct, variation from the parametric bootstrap will be very similar to that of the non-parametric bootstrap
  • Alternative use (and much more widely used): comparing alternative topologies to the most likely tree (SOWH test; see http://www.ebi.ac.uk/goldman/tests/SOWHinstr.html for a step-by-step detailed description)
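The simulation step can be sketched with the Jukes-Cantor model on a four-taxon tree. Everything specific here is an assumption for illustration: the tree ((A,B),(C,D)), the branch lengths, and the function names. To keep the example self-contained, each simulated replicate is analyzed by four-taxon parsimony, a stand-in that is not the ML inference the slide calls for.

```python
import math
import random
from collections import Counter

BASES = "ACGT"
SPLITS = ("AB|CD", "AC|BD", "AD|BC")

def evolve(seq, branch_length, rng):
    """Jukes-Cantor: a site ends in a different base with probability
    3/4 * (1 - exp(-4t/3)), where t is expected substitutions per site."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)

def simulate_quartet(n_sites, tip_len, internal_len, rng):
    """Simulate taxa A, B, C, D on the assumed tree ((A,B),(C,D))."""
    node_ab = "".join(rng.choice(BASES) for _ in range(n_sites))
    node_cd = evolve(node_ab, internal_len, rng)
    return [evolve(node_ab, tip_len, rng), evolve(node_ab, tip_len, rng),
            evolve(node_cd, tip_len, rng), evolve(node_cd, tip_len, rng)]

def best_split(matrix):
    """Parsimony stand-in for the ML step; None when the top two splits tie."""
    support = Counter({s: 0 for s in SPLITS})
    for a, b, c, d in zip(*matrix):
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    ranked = support.most_common()
    return None if ranked[0][1] == ranked[1][1] else ranked[0][0]

def parametric_bootstrap(n_replicates, n_sites, tip_len, internal_len, rng):
    """Fraction of simulated replicates in which each split is recovered."""
    counts = Counter(best_split(simulate_quartet(n_sites, tip_len, internal_len, rng))
                     for _ in range(n_replicates))
    return {s: counts[s] / n_replicates for s in SPLITS}

rng = random.Random(7)
support = parametric_bootstrap(100, 200, 0.05, 0.2, rng)
print(support)
```

In a real analysis the tree, branch lengths, and model parameters would be the ML estimates from the observed data.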

25
Parametric Bootstrap
(from Felsenstein, 2004)
26
Parametric Bootstrap
  • Most useful
      • In many cases, the overall tree structure, rather than a particular branch, may suggest that a null hypothesis is incorrect
      • This happens when no single branch is particularly well supported, but the cumulative effects of many branches contain enough phylogenetic signal to reject a particular null hypothesis

27
From Zanis et al. 2002
28
Parametric bootstrap test of monophyly (SOWH test)
  • Build constraint tree for hypothesis B
  • Perform MP analyses under constraint
  • Estimate parameter values under selected model on
    this constraint tree
  • Use these parameters to search for ML tree under
    constraint
  • Re-estimate parameter values on new tree
  • Use tree and parameter values to simulate
    datasets (e.g., 100)

29
Parametric bootstrap test of monophyly (SOWH test)
  • For each of the simulated datasets

             ML analyses,        ML analyses,        Difference in
             no constraints      constraint Hyp A    likelihood
Replicate    Likelihood score    Likelihood score    score
1            x0                  xA                  δ1
2            x0                  xA                  δ2
3            x0                  xA                  δ3
. . .        . . .               . . .               . . .
100          x0                  xA                  δ100
30
Parametric bootstrap test of monophyly (SOWH test)

31
Parametric bootstrap test of monophyly (SOWH test)

Observed difference is not significantly different from the values in the distribution: can't reject hypothesis B.
Observed difference is significantly different from the values in the distribution (p < 0.01): reject hypothesis B.
32
In other words...

If hypothesis B were true, we could recover
hypothesis A in an analysis
If hypothesis B were true, we would be unlikely
to recover hypothesis A in an analysis
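The final comparison can be sketched as a one-sided test: the observed difference in likelihood score is ranked against the differences from the simulated datasets. The numbers below are made up, the function name is hypothetical, and the sign convention (unconstrained minus constrained score, so larger means stronger evidence against the constraint) is an assumption.

```python
def sowh_p_value(observed_delta, simulated_deltas):
    """Proportion of simulated differences at least as large as the observed one."""
    as_large = sum(1 for d in simulated_deltas if d >= observed_delta)
    return as_large / len(simulated_deltas)

# Made-up null distribution from 100 simulated datasets: 0.0, 0.25, ..., 24.75
simulated = [i / 4 for i in range(100)]

p_extreme = sowh_p_value(30.0, simulated)  # observed difference beyond every simulated value
p_typical = sowh_p_value(5.0, simulated)   # observed difference well inside the distribution

print(p_extreme, p_typical)
```

With 100 simulated datasets, a proportion of zero is reported as p < 0.01 and hypothesis B is rejected; a large proportion means hypothesis B cannot be rejected.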
33
Additional References
  • Hillis, D. M., and J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182-192.
  • Hillis, D. M. 1995. Approaches for assessing phylogenetic accuracy. Syst. Biol. 44:3-16.
  • Goldman, Anderson and Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. 49:652-670.
  • Zharkikh, A., and W. H. Li. 1995. Estimation of confidence in phylogeny: the complete-and-partial bootstrap technique. Mol. Phylogenet. Evol. 4:44-63.
  • Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407-514 in D. M. Hillis, C. Moritz and B. K. Mable, eds. Molecular Systematics. Sinauer Associates, Sunderland.
  • Felsenstein 2004. Inferring Phylogenies. Chapter 20.