Title: Assessing reliability of results (measures of clade support)
1Assessing reliability of results (measures of clade support)
3Measures of clade support
- All methods
- Number of unambiguous features (e.g., substitutions with sequence data) supporting monophyly
- Resampling
- Bootstrap (widely used)
- Jackknife (not as widely used)
- Parsimony only
- Bremer Support (Decay Index)
- Permutation
- Permutation Tail Probability (PTP)
- Likelihood only
- Parametric bootstrap (simulation of data)
- Bayesian posterior probabilities (next week's topic)
4Bootstrap and jackknife
- Jackknife
- Drop one observation at a time and calculate the estimate each time
- Bootstrap (Efron 1979)
- Resampling with replacement to make a fictional sample of the same size as the original
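A minimal sketch of the delete-one jackknife, assuming the quantity of interest is a standard error. The function names and the toy sample are illustrative only, not taken from the slides.

```python
import math

def jackknife_se(data, estimator):
    """Delete-one jackknife standard error of `estimator` on `data` (a list)."""
    n = len(data)
    # Drop one observation at a time and recompute the estimate each time.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    # Standard jackknife variance: (n - 1)/n times the sum of squared deviations.
    variance = (n - 1) / n * sum((x - center) ** 2 for x in leave_one_out)
    return math.sqrt(variance)

def mean(xs):
    return sum(xs) / len(xs)

sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical observations
se = jackknife_se(sample, mean)
```

For the sample mean, this value coincides with the usual s / sqrt(n), which is a convenient sanity check.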
5The variability of the distribution of these estimates could be estimated if
- we knew the family of the distribution (normal, binomial, etc.)
- the estimator were mathematically tractable
If neither of these is the case, but our sample size is sufficiently large, we can use the empirical distribution of our sample to estimate the true distribution F(θ).
To find out the variation of our estimate of θ, we can draw new datasets (with replacement) from our sample, of the same size as the original dataset, and analyze them in the same way.
(Figure from Felsenstein, 2004)
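The resampling idea can be sketched in a few lines. This is only an illustration under stated assumptions: the estimator (a median), the toy sample, the number of replicates, and the function name are all hypothetical choices, not part of the slides.

```python
import random
import statistics

def bootstrap_estimates(data, estimator, n_reps=1000, seed=0):
    """Re-estimate on datasets drawn with replacement, each the same size as `data`."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_reps):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return estimates

sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical observations
boot = bootstrap_estimates(sample, statistics.median)
# The spread of the bootstrap estimates approximates the variability of the estimator.
boot_se = statistics.stdev(boot)
```

The median is used here because its sampling distribution is awkward to derive analytically, which is the situation the bootstrap is meant for.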
6(from Felsenstein, 2004)
7Goal: Mimic the variability that we would get if we could sample more data sets from the underlying true distribution.
The data should be a series of independently sampled points.
In phylogenetics, data are a matrix of taxa x characters
- Taxa are not independent
- Characters are a better candidate for independent points
(Figure from Felsenstein, 2004)
8Bootstrap in Phylogenetics
- Resample characters with replacement
- Conduct new analyses with resampled data
- Produce majority-rule consensus tree from
pseudo-replicates
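The three steps above might be sketched as follows. The tree search itself is left out: the clade sets in `replicates` are hypothetical stand-ins for whatever a tree-building method would return for each pseudo-replicate, and the alignment, taxon names, and function names are likewise illustrative assumptions.

```python
import random
from collections import Counter

def resample_characters(alignment, rng):
    """Pseudo-replicate: characters (columns) drawn with replacement, same width."""
    n_chars = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}

def majority_rule_clades(replicate_clades, threshold=0.5):
    """Clades found in more than `threshold` of the replicate trees, with their frequency."""
    counts = Counter(clade for clades in replicate_clades for clade in set(clades))
    n = len(replicate_clades)
    return {clade: k / n for clade, k in counts.items() if k / n > threshold}

# Hypothetical 4-taxon alignment (taxon name -> sequence).
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACCTAG", "D": "TCCTAG"}
pseudo = resample_characters(alignment, random.Random(1))

# Hypothetical clades recovered from four pseudo-replicates (one set per tree).
replicates = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
]
support = majority_rule_clades(replicates)
```

The frequencies in `support` are the bootstrap proportions that would be written on the branches of the majority-rule consensus tree.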
9(from Felsenstein, 2004)
11Bootstrapping assumes characters are I.I.D.
- Independent
- Value of one character is not dependent upon value of another character
- The outcome of evolution at each character should not be predicted by the outcome at neighboring characters
- If correlated characters are adjacent, use the Künsch (1989) block bootstrap
- And Identically Distributed
- Characters all have the same probabilities of change
- Assume
- the characters are samples from a larger pool of characters
- Rates are assigned independently to sites in a molecule
- Thus, each site has a rate randomly drawn from a distribution of rates
12Interpretation and Bias of the Bootstrap
- Felsenstein 1985
- a bootstrap proportion is a measure of repeatability, or the probability that a particular internal branch would be found in an analysis of a new independent sample of characters (replay of the evolutionary tape)
- Felsenstein and Kishino 1993
- a bootstrap proportion is a measure of accuracy, or the probability that a particular branch is contained in the true tree (assuming the phylogenetic method is consistent)
- Hillis and Bull 1993
- bootstrap proportions provide relatively unbiased, but highly imprecise, estimates of repeatability
- Bootstrap proportions provide biased estimates of accuracy
- When the phylogenetic method is consistent, bootstrapping gives underestimates of accuracy at high bootstrap values, and overestimates of accuracy at low bootstrap values
13Methods to reduce bias of the Bootstrap
- Complete and partial bootstrap (Zharkikh and Li 1995)
- Efron, Halloran and Holmes 1996
- Iterated bootstrap (Rodrigo 1993)
- Approximately Unbiased (AU) method of Shimodaira (2002)
14Measures of clade support
- All methods
- Number of unambiguous features (e.g., substitutions with sequence data) supporting monophyly
- Resampling
- Bootstrap (widely used)
- Jackknife (not as widely used)
- Parsimony only
- Bremer Support (Decay Index)
- Permutation
- Permutation Tail Probability (PTP)
- Likelihood only
- Parametric bootstrap (simulation of data)
- Bayesian posterior probabilities (after Spring
Break)
15Bremer support (Parsimony)
[Figure: most parsimonious trees; trees 1, 2, 3, 4, 5, and 6 steps longer]
16Bremer support
[Figure: most parsimonious trees; trees 1, 2, 3, 4, 5, and 6 steps longer; numeric labels 0 to 6]
17Bremer support a.k.a. Decay Index
Where does clade of interest disappear (decay)?
[Figure: most parsimonious trees; trees 1, 2, 3, 4, 5, and 6 steps longer; numeric labels 0 to 6]
18Bremer support a.k.a. Decay Index
Problem: not clear what the value must be for a group to be considered well supported
[Figure: most parsimonious trees; trees 1, 2, 3, 4, 5, and 6 steps longer; numeric labels 0 to 6]
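As a rough illustration of the decay index, the sketch below takes a list of candidate trees, each given as a length and the set of clades it contains, and reports how many extra steps are needed before a clade of interest is lost. The tree lengths, clades, and function name are hypothetical; in practice the lengths would come from parsimony searches.

```python
def bremer_support(trees, clade):
    """Length of the shortest tree lacking `clade` minus the length of the shortest tree.

    `trees` is a list of (length, clades) pairs, where clades is a set of
    frozensets of taxon names. Returns None if every tree contains the clade.
    """
    best = min(length for length, _ in trees)
    without = [length for length, clades in trees if clade not in clades]
    if not without:
        return None
    return min(without) - best

# Hypothetical trees: two most parsimonious trees (100 steps) and two longer ones.
trees = [
    (100, {frozenset("AB"), frozenset("ABC")}),
    (100, {frozenset("AB"), frozenset("ABD")}),
    (101, {frozenset("AB"), frozenset("CD")}),
    (103, {frozenset("AC"), frozenset("ABC")}),
]
ab = bremer_support(trees, frozenset("AB"))    # clade is lost only at 103 steps
abc = bremer_support(trees, frozenset("ABC"))  # already absent from one MP tree
```

A clade missing from at least one most parsimonious tree gets a value of 0, matching the idea that it collapses in the strict consensus.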
19Permutation tests-Parsimony
- Reshuffling of data (rather than resampling)
- Reshuffle characters among taxa
- Permutation Tail Probability test (PTP)
- Does the dataset contain more hierarchical structure than would be expected at random?
- For each character, reshuffle character states among taxa (with or without outgroup)
- Compare tree length of original dataset against distribution of tree lengths of permuted datasets
- If it is in the tail of the permuted distribution, reject null hypothesis of no hierarchical structure
- Topology-dependent PTP
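A small sketch of the PTP procedure, under simplifying assumptions: only four taxa, so the shortest tree can be found by scoring all three unrooted topologies with Fitch parsimony instead of running a real tree search. The data matrix, the 99 permutations, and all function names are illustrative.

```python
import random

# The three unrooted topologies for four taxa, as pairs of sister taxa (row indices).
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def fitch_pair(x, y):
    """One Fitch step: (state set, cost) for joining two state sets."""
    common = x & y
    return (common, 0) if common else (x | y, 1)

def shortest_length(matrix):
    """Minimum parsimony length over the three 4-taxon trees."""
    best = None
    for (a, b), (c, d) in QUARTETS:
        total = 0
        for col in zip(*matrix):
            left, s1 = fitch_pair({col[a]}, {col[b]})
            right, s2 = fitch_pair({col[c]}, {col[d]})
            _, s3 = fitch_pair(left, right)
            total += s1 + s2 + s3
        best = total if best is None else min(best, total)
    return best

def permute_states(matrix, rng):
    """Reshuffle the states of each character among taxa, one character at a time."""
    cols = []
    for col in zip(*matrix):
        col = list(col)
        rng.shuffle(col)
        cols.append(col)
    return ["".join(row) for row in zip(*cols)]

def ptp(matrix, n_perm=99, seed=0):
    rng = random.Random(seed)
    observed = shortest_length(matrix)
    permuted = [shortest_length(permute_states(matrix, rng)) for _ in range(n_perm)]
    # The original data count as one of the n_perm + 1 replicates.
    p = (1 + sum(length <= observed for length in permuted)) / (n_perm + 1)
    return observed, permuted, p

# Hypothetical matrix: one row per taxon, one column per character.
matrix = ["AAAAAAAAGT", "AAAAAAAAGC", "CCCCCCCCTT", "CCCCCCCCTC"]
observed, permuted, p = ptp(matrix)
```

A small `p` means the original tree length falls in the short tail of the permuted distribution, which is the condition for rejecting the null hypothesis of no hierarchical structure.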
20Results of PTP test

Tree length    Number of replicates
-----------    --------------------
3399*          1
7937           1
7944           1
7947           1
7961           2
7962           2
7964           3
7967           1
7968           2
7969           2
7970           1
7971           4
. .            . .
8008           1
8010           2
8011           1
8014           4
8015           1
8016           2
8020           1
8022           1
8023           2
8026           1
8030           1
8031           1
8044           1

* = length for original (unpermuted) data
P = 0.010000
21Permutation tests-continued
- Topology-dependent PTP (Faith 1991)
- Is there non-random support for a particular grouping (constraint)?
- Same reshuffling as PTP
- Compare length of shortest tree not compatible with constraints minus length of shortest tree compatible with constraints
- If this difference is in the tail of the distribution, support for the grouping in question is not random
- Can be extended to comparisons between two alternate groupings
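The T-PTP statistic and its tail probability can be sketched as below. The tree lengths are hypothetical; each pair would normally come from two constrained parsimony searches (one forbidding and one enforcing the grouping) on the original or a permuted dataset. The function name is an assumption for illustration.

```python
def t_ptp_p_value(observed_pair, permuted_pairs):
    """Each pair is (shortest length not compatible, shortest length compatible)."""
    obs = observed_pair[0] - observed_pair[1]
    diffs = [not_compat - compat for not_compat, compat in permuted_pairs]
    # The original data count as one replicate; large differences favor the grouping.
    p = (1 + sum(d >= obs for d in diffs)) / (len(diffs) + 1)
    return obs, diffs, p

# Hypothetical lengths for the original data and nine permuted datasets.
observed_pair = (120, 112)
permuted_pairs = [
    (130, 131), (128, 126), (127, 118), (131, 133), (129, 129),
    (125, 127), (132, 122), (126, 128), (130, 130),
]
obs, diffs, p = t_ptp_p_value(observed_pair, permuted_pairs)
```

With these made-up numbers, two of the nine permuted differences are at least as large as the observed one, so the grouping would not be judged to have non-random support.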
22Results of T-PTP test

Tree length difference    Number of replicates
----------------------    --------------------
23                        1
14                        1
8                         2
7                         1
5                         1
4                         1
2                         1
1                         2
0                         1
-2                        3
. .                       . .
-65                       1
-68                       1
-71                       1
-74                       1

* = length difference for original (unpermuted) data (length of shortest tree not compatible with constraints minus length of shortest tree compatible with constraints)
P = 0.300000
23Measures of clade support
- All methods
- Number of unambiguous features (e.g., substitutions with sequence data) supporting monophyly
- Resampling
- Bootstrap (widely used)
- Jackknife (not as widely used)
- Parsimony only
- Bremer Support (Decay Index)
- Permutation
- Permutation Tail Probability (PTP)
- Likelihood only
- Parametric bootstrap (simulation of data)
- Bayesian posterior probabilities (after Spring
Break)
24Parametric Bootstrap
- Simulate the data under the most likely tree and model of evolution
- Infer the ML tree for each simulated replicate
- Compute a majority-rule consensus tree with P values for branches (same as with non-parametric bootstrap)
- Concerns
- Close reliance on the correctness of the statistical model of evolution
- When the model is correct, variation from parametric bootstrap will be very similar to that of non-parametric bootstrap
- Alternative use (and much more widely used): to compare alternative topologies to the most likely tree (SOWH test; see http://www.ebi.ac.uk/goldman/tests/SOWHinstr.html for a step-by-step detailed description)
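The simulation step might look like the sketch below. It assumes the Jukes-Cantor (JC69) model purely for simplicity; the tree, branch lengths, sequence length, and function names are hypothetical. In a real analysis the tree and parameter values would be the maximum likelihood estimates under the selected model.

```python
import math
import random

BASES = "ACGT"

def evolve(seq, t, rng):
    """Evolve a sequence along a branch of length t (expected substitutions per site) under JC69."""
    # Probability that a site ends in a different base than it started with.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate(tree, n_sites, rng):
    """Simulate one alignment. A node is a list of (child, branch_length) pairs;
    a child is either a taxon name (leaf) or another node."""
    root = "".join(rng.choice(BASES) for _ in range(n_sites))
    alignment = {}

    def descend(node, seq):
        for child, t in node:
            child_seq = evolve(seq, t, rng)
            if isinstance(child, str):
                alignment[child] = child_seq
            else:
                descend(child, child_seq)

    descend(tree, root)
    return alignment

# Hypothetical tree ((A,B),(C,D)) with made-up branch lengths.
tree = [([("A", 0.1), ("B", 0.1)], 0.05), ([("C", 0.1), ("D", 0.1)], 0.05)]
replicates = [simulate(tree, 200, random.Random(seed)) for seed in range(100)]
```

Each simulated alignment would then be analyzed in the same way as the original data, for example by inferring an ML tree per replicate.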
25Parametric Bootstrap
(from Felsenstein, 2004)
26Parametric Bootstrap
- Most useful
- In many cases, the overall tree structure, rather than a particular branch, may suggest that a null hypothesis is incorrect
- If no single branch is particularly well supported, but the cumulative effects of many branches contain enough phylogenetic signal to reject a particular null hypothesis
27From Zanis et al. 2002
28Parametric bootstrap test of monophyly (SOWH test)
- Build constraint tree for hypothesis B
- Perform MP analyses under constraint
- Estimate parameter values under selected model on this constraint tree
- Use these parameters to search for ML tree under constraint
- Re-estimate parameter values on new tree
- Use tree and parameter values to simulate datasets (e.g., 100)
29Parametric bootstrap test of monophyly (SOWH test)
- For each of the simulated datasets

Replicate    ML analyses, no constraints:    ML analyses, constraint Hyp A:    Difference in
             likelihood score                likelihood score                  likelihood score
1            x0                              xA                                δ1
2            x0                              xA                                δ2
3            x0                              xA                                δ3
. . .        . . .                           . . .                             . . .
100          x0                              xA                                δ100
30Parametric bootstrap test of monophyly (SOWH test)
31Parametric bootstrap test of monophyly (SOWH test)
Observed difference is not significantly different from the values in the distribution. Cannot reject hypothesis B.
Observed difference is significantly different from the values in the distribution (p < 0.01). Reject hypothesis B.
32In other words...
If hypothesis B were true, we could recover
hypothesis A in an analysis
If hypothesis B were true, we would be unlikely
to recover hypothesis A in an analysis
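The final comparison in the SOWH test can be sketched as a simple tail proportion. The likelihood differences below are hypothetical numbers chosen for illustration, and the function name is an assumption; each simulated value would come from the unconstrained and constrained ML analyses of one simulated dataset.

```python
def sowh_p_value(observed_delta, simulated_deltas):
    """Proportion of simulated likelihood differences at least as large as the observed one."""
    n_extreme = sum(d >= observed_delta for d in simulated_deltas)
    return n_extreme / len(simulated_deltas)

# Hypothetical differences in log-likelihood from ten simulated datasets.
simulated = [0.0, 0.4, 1.1, 0.2, 2.5, 0.0, 0.7, 1.8, 0.3, 0.9]

p_small = sowh_p_value(0.8, simulated)   # observed difference sits inside the distribution
p_large = sowh_p_value(12.6, simulated)  # observed difference lies far beyond it
```

In the first case the null hypothesis cannot be rejected; in the second the observed difference exceeds every simulated value, so the hypothesis would be rejected. A real test would use many more simulated datasets than the ten shown here.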
33Additional References
- Hillis, D. M., and J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182-192.
- Hillis, D. M. 1995. Approaches for assessing phylogenetic accuracy. Syst. Biol. 44:3-16.
- Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. 49:652-670.
- Zharkikh, A., and W. H. Li. 1995. Estimation of confidence in phylogeny: the complete-and-partial bootstrap technique. Mol. Phylogenet. Evol. 4:44-63.
- Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407-514 in D. M. Hillis, C. Moritz and B. K. Mable, eds. Molecular Systematics. Sinauer Associates, Sunderland.
- Felsenstein, J. 2004. Inferring Phylogenies. Chapter 20.