Assessing Phylogenetic Hypotheses and Phylogenetic Data


Provided by: bioin6


1
Assessing Phylogenetic Hypotheses and
Phylogenetic Data

We use numerical phylogenetic methods because
most data includes potentially misleading
evidence of relationships
We should not be content with constructing
phylogenetic hypotheses but should also assess
what confidence we can place in our hypotheses
This is not always simple! (but do not despair!)

2
Assessing Data Quality

We expect (or hope) our data will be well
structured and contain strong phylogenetic signal
We can test this using randomization tests of
explicit null hypotheses
The behaviour of some measure of the quality of
our real data is contrasted with that of
comparable but phylogenetically uninformative
data determined by randomization of the data

3
Random Permutation

Random permutation reduces any correlation among
characters to that expected by chance alone
It preserves number of taxa, characters and
character states in each character (and the
theoretical maximum and minimum tree lengths)
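
A minimal sketch of the permutation step described above, assuming the matrix is held as a dictionary of taxon name to list of character states. The function name `permute_characters` and the toy matrix are illustrative, not taken from any particular package. Each character (column) is shuffled among taxa independently, so the states present in each character are preserved while correlations among characters are left to chance.

```python
import random


def permute_characters(matrix, rng=None):
    """Shuffle the states of each character independently among taxa.

    matrix: dict mapping taxon name -> list of character states.
    Returns a new dict; the input matrix is not modified.
    """
    rng = rng or random.Random()
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    permuted = {taxon: [None] * n_chars for taxon in taxa}
    for c in range(n_chars):
        column = [matrix[taxon][c] for taxon in taxa]
        rng.shuffle(column)  # reassign this character's states to taxa at random
        for taxon, state in zip(taxa, column):
            permuted[taxon][c] = state
    return permuted


# Toy matrix (hypothetical data, for illustration only).
toy_matrix = {
    "A": list("RRYYYYYY"),
    "B": list("RRYYYYYY"),
    "C": list("YYYYYRRR"),
    "D": list("YYRRRRRR"),
    "Outgp": list("RRRRRRRR"),
}
permuted_matrix = permute_characters(toy_matrix, random.Random(1))
```

Passing a seeded `random.Random` makes a permutation repeatable, which helps when a whole test has to be rerun.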

Original structured data with strong correlations
among characters

        CHARACTERS
TAXA    1 2 3 4 5 6 7 8
R-P     R P R P R P R P
A-E     A E A E A E A E
N-R     N R N R N R N R
D-M     D M D M D M D M
O-U     O U O U O U O U
M-T     M T M T M T M T
L-E     L E L E L E L E
Y-D     Y D Y D Y D Y D

Randomly permuted data with any correlation
among characters due to chance

        CHARACTERS
TAXA    1 2 3 4 5 6 7 8
R-P     N U D E R T O U
A-E     R E A P L E A D
N-R     M R M M A D N P
D-M     L T R E Y M D R
O-U     D E Y U D E Y M
M-T     O M O T O U L T
L-E     Y D N D M P M E
Y-D     A P L R N R R E

4
Matrix Randomization Tests

Compare some measure of data quality/hierarchical
structure for the real and many randomly permuted
data sets
This allows us to define a test statistic for the
null hypothesis that the real data are no better
structured than randomly permuted and
phylogenetically uninformative data
A permutation tail probability (PTP) is the
proportion of data sets with as good or better
measure of quality than the real data
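
As a rough sketch, a PTP for tree length can be computed once the length of the most parsimonious tree is known for the real data and for each permuted data set. The function name is illustrative, and counting the real data set as one member of the reference set is an assumption about convention, not something stated on the slide. The permuted lengths below are invented for the example.

```python
def permutation_tail_probability(real_length, permuted_lengths):
    """PTP for tree length, where a shorter tree means better-structured data.

    Counts the permuted data sets whose shortest tree is as short as, or
    shorter than, that of the real data. The real data set is counted as one
    member of the reference set (assumed convention), so the smallest
    possible value is 1 / (number of permutations + 1).
    """
    as_good_or_better = sum(1 for length in permuted_lengths if length <= real_length)
    return (as_good_or_better + 1) / (len(permuted_lengths) + 1)


# Hypothetical lengths from 99 permuted data sets, all longer than the real tree.
hypothetical_permuted_lengths = [780 + (i % 30) for i in range(99)]
ptp = permutation_tail_probability(618, hypothetical_permuted_lengths)
```

With 99 permutations, none of which matches the real data, the smallest attainable PTP is 1/100.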

5
Structure of Randomization Tests

Reject null hypothesis if, for example, no more than
5% of random permutations have as good or better
measure than the real data

6
Matrix Randomization Tests

Measures of data quality include
1. Tree length for most parsimonious trees - the
shorter the tree length the better the data
(PAUP)
2. Skewness of the distribution of tree lengths
(PAUP)
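
To make "tree length" concrete, here is a small sketch that scores one fixed tree using Fitch's algorithm for unordered characters. This is an illustrative choice, not a description of how PAUP works internally, and it does not search for the most parsimonious tree. The tree encoding (nested 2-tuples), the function names and the toy matrix are all assumptions made for the example.

```python
def fitch_length(tree, states):
    """Minimum number of changes for one character on a rooted binary tree.

    tree: nested 2-tuples with taxon names (str) at the tips.
    states: dict mapping taxon name -> state of this character.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        steps = left_steps + right_steps
        common = left_set & right_set
        if common:
            return common, steps  # children agree: no extra change needed
        return left_set | right_set, steps + 1  # children disagree: one change

    return visit(tree)[1]


def tree_length(tree, matrix):
    """Sum of Fitch lengths over all characters in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {taxon: row[c] for taxon, row in matrix.items()})
        for c in range(n_chars)
    )


# Toy matrix (hypothetical data, for illustration only).
toy_matrix = {
    "A": "RRYYYYYY",
    "B": "RRYYYYYY",
    "C": "YYYYYRRR",
    "D": "YYRRRRRR",
    "Outgp": "RRRRRRRR",
}
tree_1 = ("Outgp", (("A", "B"), ("C", "D")))
tree_2 = ("Outgp", ("D", ("C", ("A", "B"))))
length_1 = tree_length(tree_1, toy_matrix)
length_2 = tree_length(tree_2, toy_matrix)
```

On this toy matrix the second tree is one step shorter than the first (10 steps against 11), so it is the better fit under parsimony.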

7
Matrix Randomization Tests
Ciliate SSUrDNA (Min = 430, Max = 927)
Real data: 1 MPT, L = 618, PTP = 0.01 - significantly non random
Random data: 3 MPTs (strict consensus), L = 792, PTP = 0.68 - not
significantly different from random
8
Skewness of Tree Length Distributions

Studies with random (and phylogenetically
uninformative) data showed that the distribution
of tree lengths tends to be normal

[Figure: number of trees plotted against tree length, with the shortest tree marked]

In contrast, phylogenetically informative data
is expected to have a strongly skewed
distribution with few shortest trees and few
trees nearly as short
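
One way to put a number on this asymmetry is the moment-based skewness coefficient, often written g1. Using that particular statistic here is an assumption for illustration, and the tree lengths below are invented. A long tail towards short trees gives a negative value, while a symmetric distribution gives a value near zero.

```python
def g1_skewness(tree_lengths):
    """Moment-based skewness (g1) of a sample of tree lengths.

    Negative values mean a long tail towards short trees (left skew).
    Requires at least two different lengths in the sample.
    """
    n = len(tree_lengths)
    mean = sum(tree_lengths) / n
    m2 = sum((x - mean) ** 2 for x in tree_lengths) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in tree_lengths) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples of tree lengths, for illustration only.
symmetric_lengths = [10, 11, 11, 12, 12, 12, 13, 13, 14]
left_skewed_lengths = [10, 13, 14, 14, 15, 15, 15, 15]
g1_symmetric = g1_skewness(symmetric_lengths)
g1_left_skewed = g1_skewness(left_skewed_lengths)
```

In practice the lengths would come from all possible trees, or from a large random sample of trees when there are too many to enumerate.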

[Figure: number of trees plotted against tree length, with the shortest tree marked]
9
Skewness - example
10
Assessing Phylogenetic Hypotheses - groups on
trees

Several methods have been proposed that attach
numerical values to internal branches in trees
that are intended to provide some measure of the
strength of support for those branches and the
corresponding groups
These methods include
character resampling methods - the bootstrap and
jackknife

11
Bootstrapping (non-parametric)

Bootstrapping is a modern statistical technique
that uses computer intensive random resampling of
data to determine sampling error or confidence
intervals for some estimated parameter

12
Bootstrapping (non-parametric)

Characters are resampled with replacement to
create many bootstrap replicate data sets
Each bootstrap replicate data set is analysed
Agreement among the resulting trees is summarized
with a majority-rule consensus tree
Frequency of occurrence of groups, bootstrap
proportions (BPs), is a measure of support for
those groups
Additional information is given in partition
tables
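
The resampling and tallying steps above can be sketched as follows. This is a sketch under stated assumptions: the tree-building analysis of each replicate is left out, and every function name, the toy matrix and the replicate results are hypothetical. Each replicate tree is represented simply as the set of groups it contains.

```python
import random
from collections import Counter


def bootstrap_replicate(matrix, rng):
    """Resample characters with replacement, keeping the original size.

    Returns the replicate matrix and the list of sampled column indices.
    """
    n_chars = len(next(iter(matrix.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    replicate = {taxon: [row[c] for c in columns] for taxon, row in matrix.items()}
    return replicate, columns


def bootstrap_proportions(replicate_groups):
    """Percentage of replicate trees in which each group occurs.

    replicate_groups: one set of groups (frozensets of taxa) per replicate.
    """
    counts = Counter(group for groups in replicate_groups for group in groups)
    n = len(replicate_groups)
    return {group: 100.0 * count / n for group, count in counts.items()}


def majority_rule_groups(proportions, threshold=50.0):
    """Groups found in more than `threshold` percent of replicates."""
    return {group: bp for group, bp in proportions.items() if bp > threshold}


# Toy matrix (hypothetical data, for illustration only).
toy_matrix = {
    "A": list("RRYYYYYY"),
    "B": list("RRYYYYYY"),
    "C": list("YYYYYRRR"),
    "D": list("YYRRRRRR"),
    "Outgp": list("RRRRRRRR"),
}
replicate, sampled_columns = bootstrap_replicate(toy_matrix, random.Random(0))

# Hypothetical groups recovered from four replicate analyses.
hypothetical_results = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
]
bps = bootstrap_proportions(hypothetical_results)
consensus_groups = majority_rule_groups(bps)
```

The full table of proportions, including groups below 50%, corresponds to the partition table; the groups above 50% are those shown on the majority-rule consensus tree.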

13
Bootstrapping

Randomly resample characters from the original
data with replacement to build many bootstrap
replicate data sets of the same size as the
original - analyse each replicate data set

Original data matrix

         Characters
Taxa     1 2 3 4 5 6 7 8
A        R R Y Y Y Y Y Y
B        R R Y Y Y Y Y Y
C        Y Y Y Y Y R R R
D        Y Y R R R R R R
Outgp    R R R R R R R R

Resampled data matrix

         Characters
Taxa     1 2 2 5 5 6 6 8
A        R R R Y Y Y Y Y
B        R R R Y Y Y Y Y
C        Y Y Y Y Y R R R
D        Y Y Y R R R R R
Outgp    R R R R R R R R

Summarise the results of multiple analyses with a
majority-rule consensus tree. Bootstrap
proportions (BPs) are the frequencies with which
groups are encountered in analyses of replicate
data sets

[Figure: example trees for taxa A, B, C and D with the Outgroup; branch labels include the values 96 and 66]
14
Bootstrapping - an example
Ciliate SSUrDNA - parsimony bootstrap

Partition Table
Freq (%) of partitions of taxa 123456789, in the order listed:
100.00, 100.00, 100.00, 100.00, 95.50, 84.33, 11.83, 3.83, 2.50, 1.00, 1.00

Majority-rule consensus
Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4),
Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8),
Tetrahymena (9)
Bootstrap proportions shown on the tree: 100, 84, 96, 100, 100, 100
15
Bootstrapping - random data
Randomly permuted data - parsimony bootstrap

Partition Table
Freq (%) of partitions of taxa 123456789, in the order listed:
71.17, 58.87, 26.43, 25.67, 23.83, 21.00, 18.50, 16.00, 15.67, 13.17,
12.67, 12.00, 12.00, 11.00, 10.80, 10.50, 10.00

Majority-rule consensus (with minority components)
16
Bootstrap - interpretation

Bootstrapping was introduced as a way of
establishing confidence intervals for phylogenies
This interpretation of bootstrap proportions
(BPs) depends on the assumption that the original
data is a random sample from a much larger set of
independent and identically distributed data
However, several things complicate this
interpretation
Perhaps the assumptions are unreasonable -
making any statistical interpretation of BPs
invalid
Some theoretical work indicates that BPs are very
conservative, and may underestimate confidence -
the problem increases with the number of taxa
BPs can be high for incongruent relationships in
separate analyses - and can therefore be
misleading (misleading data -> misleading BPs)
With parsimony, BPs may be highly affected by the
inclusion or exclusion of only a few characters

17
Bootstrap - interpretation

Bootstrapping is a very valuable and widely used
technique - it (or some suitable alternative) is
demanded by some journals, but it may require a
pragmatic interpretation
BPs depend on two aspects of the support for a
group - the number of characters supporting the
group and the level of support for incongruent
groups
BPs thus provide an index of the relative
support for groups provided by a set of data
under whatever interpretation of the data (method
of analysis) is used

18
Bootstrap - interpretation

High BPs (e.g. > 85%) are indicative of strong
signal in the data
Provided we have no evidence of strong misleading
signal (e.g. base composition biases, great
differences in branch lengths) high BPs are
likely to reflect strong phylogenetic signal
Low BPs need not mean the relationship is false,
only that it is poorly supported
Bootstrapping can be viewed as a way of exploring
the robustness of phylogenetic inferences to
perturbations in the balance of supporting
and conflicting evidence for groups

19
Jackknifing

Jackknifing is very similar to bootstrapping and
differs only in the character resampling strategy
Some proportion of characters (e.g. 50%) is
randomly selected and deleted
Replicate data sets are analysed and the results
summarised with a majority-rule consensus tree
Jackknifing and bootstrapping tend to produce
broadly similar results and have similar
interpretations
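
The only step that differs from bootstrapping is the resampling, which might be sketched like this. The function name, the default deletion fraction of 0.5 and the toy matrix are illustrative assumptions. Characters are deleted without replacement, so each replicate is smaller than the original and contains no repeated characters.

```python
import random


def jackknife_replicate(matrix, rng, delete_fraction=0.5):
    """Delete a random subset of characters and keep the rest in order.

    Returns the reduced matrix and the indices of the characters kept.
    """
    n_chars = len(next(iter(matrix.values())))
    n_delete = int(n_chars * delete_fraction)
    kept = sorted(rng.sample(range(n_chars), n_chars - n_delete))
    reduced = {taxon: [row[c] for c in kept] for taxon, row in matrix.items()}
    return reduced, kept


# Toy matrix (hypothetical data, for illustration only).
toy_matrix = {
    "A": list("RRYYYYYY"),
    "B": list("RRYYYYYY"),
    "C": list("YYYYYRRR"),
    "D": list("YYRRRRRR"),
    "Outgp": list("RRRRRRRR"),
}
reduced_matrix, kept_columns = jackknife_replicate(toy_matrix, random.Random(0))
```

Each reduced matrix would then be analysed and the resulting groups tallied in the same way as for bootstrap replicates.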
