Assessing Phylogenetic Hypotheses and Phylogenetic Data

Provided by: bioin6

1
Assessing Phylogenetic Hypotheses and Phylogenetic Data
  • We use numerical phylogenetic methods because
    most data includes potentially misleading
    evidence of relationships
  • We should not be content with constructing
    phylogenetic hypotheses but should also assess
    what confidence we can place in our hypotheses
  • This is not always simple! (but do not despair!)

2
Assessing Data Quality
  • We expect (or hope) our data will be well
    structured and contain strong phylogenetic signal
  • We can test this using randomization tests of
    explicit null hypotheses
  • The behaviour of some measure of the quality of
    our real data is contrasted with that of
    comparable, but phylogenetically uninformative,
    data generated by randomizing the real data

3
Random Permutation
  • Random permutation reduces any correlation among
    characters to that expected by chance alone
  • It preserves the number of taxa, the number of
    characters and the character states in each
    character (and hence the theoretical maximum and
    minimum tree lengths)


[Figure: two taxa × characters matrices. Left: "Original structured data with strong correlations among characters" (the states in the character columns spell RANDOMLY and PERMUTED). Right: "Randomly permuted data with any correlation among characters due to chance" (the same states shuffled within columns).]
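The permutation described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular program: it assumes the matrix is a dict mapping taxon names to equal-length strings of character states, and the function name `permute_characters` is hypothetical. Each character (column) is shuffled independently among taxa, so every character keeps its own states while correlations among characters are reduced to chance.

```python
import random

def permute_characters(matrix, rng):
    """Shuffle the states of each character (column) independently among taxa.

    matrix: dict of taxon name -> string of character states (hypothetical format).
    """
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    columns = []
    for j in range(n_chars):
        column = [matrix[t][j] for t in taxa]  # states of character j across taxa
        rng.shuffle(column)                    # independent shuffle for each character
        columns.append(column)
    # Rebuild each taxon's row from the shuffled columns
    return {t: "".join(columns[j][i] for j in range(n_chars)) for i, t in enumerate(taxa)}

original = {
    "A": "RRYYYYYY", "B": "RRYYYYYY", "C": "YYYYYRRR",
    "D": "YYRRRRRR", "Outgp": "RRRRRRRR",
}
permuted = permute_characters(original, random.Random(42))
for taxon, states in permuted.items():
    print(taxon, states)
```

Because only the assignment of states to taxa changes, each character keeps the same set and counts of states, which is why the minimum and maximum possible tree lengths are unchanged.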
4
Matrix Randomization Tests
  • Compare some measure of data quality/hierarchical
    structure for the real and many randomly permuted
    data sets
  • This allows us to define a test statistic for the
    null hypothesis that the real data are no better
    structured than randomly permuted and
    phylogenetically uninformative data
  • A permutation tail probability (PTP) is the
    proportion of data sets with a measure of quality
    as good as or better than that of the real data

5
Structure of Randomization Tests
  • Reject the null hypothesis if, for example, fewer
    than 5% of random permutations have a measure as
    good as or better than that of the real data
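A PTP and the decision rule can be computed as follows. This sketch assumes that a score for which lower is better (such as tree length) has already been obtained for the real data and for each permuted data set; the function names are hypothetical. Following a common convention, which is an assumption here rather than something stated on the slides, the real data set is counted as one of the data sets, so the smallest attainable PTP with n permutations is 1/(n + 1).

```python
def permutation_tail_probability(real_score, permuted_scores):
    """Proportion of data sets scoring as well as or better than the real data.

    Assumes lower scores are better (e.g. tree length) and counts the real
    data set itself among the data sets (a common convention, assumed here).
    """
    as_good_or_better = sum(1 for s in permuted_scores if s <= real_score)
    return (as_good_or_better + 1) / (len(permuted_scores) + 1)

def reject_null(ptp, alpha=0.05):
    """Reject 'no better structured than random data' when PTP falls below alpha."""
    return ptp < alpha

# Hypothetical tree lengths, for illustration only
ptp = permutation_tail_probability(10, [12, 11, 10, 14])
print(ptp, reject_null(ptp))  # 0.4 False
```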

6
Matrix Randomization Tests
  • Measures of data quality include
  • 1. Tree length for most parsimonious trees - the
    shorter the tree length the better the data
    (PAUP)
  • 2. Skewness of the distribution of tree lengths
    (PAUP)
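Measure 1 relies on computing tree lengths. Finding the most parsimonious trees requires a tree search (as performed by PAUP), but the length of any single given tree can be sketched with Fitch's algorithm. This sketch assumes unordered, equally weighted characters and a rooted binary tree written as nested 2-tuples of taxon names; the representation and function names are hypothetical.

```python
def fitch_states(node, matrix, j):
    """Return (candidate state set, number of changes) for character j below node."""
    if isinstance(node, str):                 # a leaf: the taxon's observed state
        return {matrix[node][j]}, 0
    left_states, left_changes = fitch_states(node[0], matrix, j)
    right_states, right_changes = fitch_states(node[1], matrix, j)
    shared = left_states & right_states
    if shared:                                # children can agree: no extra change
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1

def tree_length(tree, matrix):
    """Parsimony tree length: Fitch changes summed over all characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch_states(tree, matrix, j)[1] for j in range(n_chars))

matrix = {
    "A": "RRYYYYYY", "B": "RRYYYYYY", "C": "YYYYYRRR",
    "D": "YYRRRRRR", "Outgp": "RRRRRRRR",
}
tree_1 = ("Outgp", (("A", "B"), ("C", "D")))
tree_2 = ("Outgp", ((("A", "B"), "C"), "D"))
print(tree_length(tree_1, matrix), tree_length(tree_2, matrix))  # 11 10
```

For the randomization test, the shortest length found by a search on the real data would be compared with the shortest lengths found for each permuted data set.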

7
Matrix Randomization Tests
Ciliate SSU rDNA (minimum possible tree length 430, maximum 927)
Real data: 1 MPT (most parsimonious tree), L = 618, PTP = 0.01 - significantly non-random
Random data: 3 MPTs (strict consensus), L = 792, PTP = 0.68 - not significantly different from random
8
Skewness of Tree Length Distributions
  • Studies with random (and phylogenetically
    uninformative) data showed that the distribution
    of tree lengths tends to be normal

[Figure: approximately normal distribution of tree lengths; x-axis: tree length, y-axis: number of trees; shortest tree marked]
  • In contrast, phylogenetically informative data
    is expected to have a strongly (left-)skewed
    distribution, with few shortest trees and few
    trees nearly as short

[Figure: strongly skewed distribution of tree lengths; x-axis: tree length, y-axis: number of trees; shortest tree marked]
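Skewness of a tree-length distribution is commonly summarised with the g1 statistic, where strongly negative values indicate the left-skewed shape expected of informative data. A minimal sketch, assuming the lengths of all possible trees (or of a large random sample of trees) are already available as a list; it uses the simple moment-based g1, and programs may apply bias-corrected variants or different tree-sampling schemes. The example lengths are hypothetical.

```python
def g1_skewness(lengths):
    """Moment-based skewness g1 = m3 / m2**1.5 of a list of tree lengths."""
    n = len(lengths)
    mean = sum(lengths) / n
    m2 = sum((x - mean) ** 2 for x in lengths) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in lengths) / n  # third central moment
    return m3 / m2 ** 1.5

print(g1_skewness([1, 2, 3]))        # 0.0 (symmetric)
print(g1_skewness([5, 9, 9, 9, 9]))  # about -1.5 (one short tree: left skew)
```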
9
Skewness - example
10
Assessing Phylogenetic Hypotheses - groups on trees
  • Several methods have been proposed that attach
    numerical values to internal branches in trees
    that are intended to provide some measure of the
    strength of support for those branches and the
    corresponding groups
  • These methods include
  • character resampling methods - the bootstrap and
    jackknife

11
Bootstrapping (non-parametric)
  • Bootstrapping is a modern statistical technique
    that uses computer-intensive random resampling of
    data to determine sampling error or confidence
    intervals for some estimated parameter

12
Bootstrapping (non-parametric)
  • Characters are resampled with replacement to
    create many bootstrap replicate data sets
  • Each bootstrap replicate data set is analysed
  • Agreement among the resulting trees is summarized
    with a majority-rule consensus tree
  • Frequency of occurrence of groups, bootstrap
    proportions (BPs), is a measure of support for
    those groups
  • Additional information is given in partition
    tables
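The resampling step can be sketched as follows, assuming the matrix is a dict of taxon name to a string of states; the function names are hypothetical. Each replicate has the same number of characters as the original and would then be analysed (for example by a parsimony search) in the same way as the original data.

```python
import random

def resample_characters(matrix, picks):
    """Build a data set from the characters (0-based column indices) listed in picks."""
    return {taxon: "".join(states[j] for j in picks) for taxon, states in matrix.items()}

def bootstrap_replicate(matrix, rng):
    """One bootstrap replicate: draw as many characters as the original, with replacement."""
    n_chars = len(next(iter(matrix.values())))
    picks = sorted(rng.randrange(n_chars) for _ in range(n_chars))
    return resample_characters(matrix, picks)

matrix = {
    "A": "RRYYYYYY", "B": "RRYYYYYY", "C": "YYYYYRRR",
    "D": "YYRRRRRR", "Outgp": "RRRRRRRR",
}
# Characters 1 2 2 5 5 6 6 8 (1-based) correspond to indices 0 1 1 4 4 5 5 7
print(resample_characters(matrix, [0, 1, 1, 4, 4, 5, 5, 7])["A"])  # RRRYYYYY

rng = random.Random(0)
replicates = [bootstrap_replicate(matrix, rng) for _ in range(100)]
```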

13
Bootstrapping
Original data matrix (characters 1-8)
Taxa    1 2 3 4 5 6 7 8
A       R R Y Y Y Y Y Y
B       R R Y Y Y Y Y Y
C       Y Y Y Y Y R R R
D       Y Y R R R R R R
Outgp   R R R R R R R R

Resampled data matrix (characters drawn: 1 2 2 5 5 6 6 8)
Taxa    1 2 2 5 5 6 6 8
A       R R R Y Y Y Y Y
B       R R R Y Y Y Y Y
C       Y Y Y Y Y R R R
D       Y Y Y R R R R R
Outgp   R R R R R R R R

Randomly resample characters from the original data with replacement to build many bootstrap replicate data sets of the same size as the original, then analyse each replicate data set.

[Figure: trees of taxa A-D rooted with the Outgroup, with their majority-rule consensus and bootstrap proportions]

Summarise the results of multiple analyses with a majority-rule consensus tree. Bootstrap proportions (BPs) are the frequencies with which groups are encountered in analyses of replicate data sets.
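Counting how often each group appears across replicate trees gives the BPs, and keeping the groups found in more than half of the replicates gives the majority-rule consensus groups. This is a minimal sketch, assuming exactly one rooted tree per replicate, written as nested tuples of taxon names; replicates that yield several equally parsimonious trees are not handled here. The function names and the example trees are hypothetical.

```python
from collections import Counter

def groups(tree):
    """Groups (frozensets of taxa) defined by the internal nodes of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    everything = walk(tree)
    found.discard(everything)  # the group of all taxa is trivial
    return found

def bootstrap_proportions(replicate_trees):
    """Percentage of replicate trees containing each group."""
    counts = Counter()
    for tree in replicate_trees:
        counts.update(groups(tree))
    return {g: 100.0 * c / len(replicate_trees) for g, c in counts.items()}

def majority_rule_groups(bps):
    """Groups present in more than 50% of the replicates."""
    return {g: bp for g, bp in bps.items() if bp > 50.0}

# Hypothetical trees from three replicate analyses
trees = [
    ("Outgp", (("A", "B"), ("C", "D"))),
    ("Outgp", ((("A", "B"), "C"), "D")),
    ("Outgp", (("A", "B"), ("C", "D"))),
]
for g, bp in majority_rule_groups(bootstrap_proportions(trees)).items():
    print(sorted(g), round(bp, 1))
```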
14
Bootstrapping - an example
Partition Table
Ciliate SSU rDNA - parsimony bootstrap
Partition frequencies (%): 100.00, 100.00, 100.00, 100.00, 95.50, 84.33, 11.83, 3.83, 2.50, 1.00, 1.00

[Figure: majority-rule consensus tree of Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Euplotes (8), Tetrahymena (9), Loxodes (4), Tracheloraphis (5), Spirostomum (6) and Gruberia (7), with BPs of 100, 100, 100, 100, 96 and 84 on internal branches]
15
Bootstrapping - random data
Partition Table
Randomly permuted data - parsimony bootstrap
Partition frequencies (%): 71.17, 58.87, 26.43, 25.67, 23.83, 21.00, 18.50, 16.00, 15.67, 13.17, 12.67, 12.00, 12.00, 11.00, 10.80, 10.50, 10.00

[Figure: majority-rule consensus (with minority components)]
16
Bootstrap - interpretation
  • Bootstrapping was introduced as a way of
    establishing confidence intervals for phylogenies
  • This interpretation of bootstrap proportions
    (BPs) depends on the assumption that the original
    data is a random sample from a much larger set of
    independent and identically distributed data
  • However, several things complicate this
    interpretation
  • Perhaps the assumptions are unreasonable,
    making any statistical interpretation of BPs
    invalid
  • Some theoretical work indicates that BPs are very
    conservative and may underestimate the confidence
    that can be placed in groups; the problem
    increases with the number of taxa
  • BPs can be high for incongruent relationships in
    separate analyses and can therefore be
    misleading (misleading data -> misleading BPs)
  • With parsimony, BPs may be highly affected by the
    inclusion or exclusion of only a few characters

17
Bootstrap - interpretation
  • Bootstrapping is a very valuable and widely used
    technique - it (or some suitable alternative) is
    demanded by some journals, but it may require a
    pragmatic interpretation
  • BPs depend on two aspects of the support for a
    group - the number of characters supporting the
    group and the level of support for incongruent
    groups
  • BPs thus provide an index of the relative
    support for groups provided by a set of data
    under whatever interpretation of the data (method
    of analysis) is used

18
Bootstrap - interpretation
  • High BPs (e.g. > 85%) are indicative of strong
    signal in the data
  • Provided we have no evidence of strong misleading
    signal (e.g. base composition biases, great
    differences in branch lengths) high BPs are
    likely to reflect strong phylogenetic signal
  • Low BPs need not mean the relationship is false,
    only that it is poorly supported
  • Bootstrapping can be viewed as a way of exploring
    the robustness of phylogenetic inferences to
    perturbations in the balance of supporting
    and conflicting evidence for groups

19
Jackknifing
  • Jackknifing is very similar to bootstrapping and
    differs only in the character resampling strategy
  • Some proportion of characters (e.g. 50%) is
    randomly selected and deleted
  • Replicate data sets are analysed and the results
    summarised with a majority-rule consensus tree
  • Jackknifing and bootstrapping tend to produce
    broadly similar results and have similar
    interpretations
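Relative to bootstrapping, only the resampling rule changes: characters are chosen without replacement and a fixed proportion is deleted. A minimal sketch, assuming the same dict-of-strings matrix format and a hypothetical function name; each replicate would then be analysed and the results summarised with a majority-rule consensus tree.

```python
import random

def jackknife_replicate(matrix, delete_fraction, rng):
    """Delete a random proportion of characters, keeping the rest in their original order."""
    n_chars = len(next(iter(matrix.values())))
    n_keep = n_chars - round(n_chars * delete_fraction)
    keep = sorted(rng.sample(range(n_chars), n_keep))  # retained characters, no repeats
    return {taxon: "".join(states[j] for j in keep) for taxon, states in matrix.items()}

matrix = {
    "A": "RRYYYYYY", "B": "RRYYYYYY", "C": "YYYYYRRR",
    "D": "YYRRRRRR", "Outgp": "RRRRRRRR",
}
rng = random.Random(0)
replicates = [jackknife_replicate(matrix, 0.5, rng) for _ in range(100)]  # 50% deletion
print(replicates[0])
```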