Provided by: David51
1
Tree-searches and tests for phylogenetic signal
2
Many issues glossed over
  • What if characters disagree?
  • How is the tree score determined?
  • How can we root the trees?
  • How do we find the optimal tree?
  • How can we evaluate the robustness of our
    conclusions?

3
Terminology
Polytomy
Binary/dichotomous/fully-resolved
Polytomous/unresolved
4
Unrooted trees
Polytomy
5
Number of unrooted fully resolved trees for t taxa
$$\prod_{i=3}^{t} (2i - 5)$$
6
How many places can you add another taxon?
  • Two taxa
  • Three taxa
  • Four taxa
  • Five taxa

7
What is the number of rooted trees?
  • The root is just one more taxon: same formula, but
    t = number of taxa + 1

8
The number of trees gets big
  Number of tips    Number of binary unrooted trees
  3                 1
  4                 3
  5                 15
  6                 105
  10                2,027,025
  20                2.2 x 10^20
  50                2.8 x 10^74
  500               1 x 10^1277
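
The counts above follow from the product formula for unrooted trees, with the rooted count obtained by treating the root as one extra taxon. A minimal Python sketch (the function names are illustrative and not from the slides):

```python
from math import prod

def n_unrooted(t: int) -> int:
    """Fully resolved unrooted trees for t >= 3 taxa: product of (2i - 5), i = 3..t."""
    if t < 3:
        raise ValueError("need at least three taxa")
    return prod(2 * i - 5 for i in range(3, t + 1))

def n_rooted(t: int) -> int:
    """Rooted trees: the root behaves like one more taxon."""
    return n_unrooted(t + 1)

for tips in (3, 4, 5, 6, 10, 20, 50):
    # Python integers are exact; ".3g" only shortens the printed value.
    print(f"{tips:>3} tips: {n_unrooted(tips):.3g} unrooted, {n_rooted(tips):.3g} rooted")
```

Because the integers are exact, the same function also works for 500 tips, although the result is too large to convert to a float for formatting.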

9
How do you find the optimal tree?
  • Exhaustive (

10
How do you find the optimal tree?
  • Exhaustive (
  • Branch-and-bound (
  • Obtain the length of a random tree (initial upper
    bound)
  • As trees are built determine length
  • If length exceeds upper bound then that tree and
    all its descendant trees are ignored
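
The pruning rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of any particular program: trees are nested tuples anchored on the first taxon, tree length is computed with Fitch's algorithm for unordered characters, "?" is treated as compatible with every state, and the initial upper bound comes from an arbitrary comb tree rather than a random tree. The matrix is the five-taxon example used in the homoplasy slides, and all names are hypothetical.

```python
MATRIX = {
    "Taxon 1": "ACATTTA",
    "Taxon 2": "ACGATTA",
    "Taxon 3": "AGGATAG",
    "Taxon 4": "GAAAAC?",
    "Taxon 5": "GATA?CG",
}
ALL_STATES = frozenset("ACGT")

def fitch_length(tree, seqs):
    """Parsimony length of a nested-tuple tree, summed over all characters."""
    def visit(node, site):
        if not isinstance(node, tuple):              # leaf = taxon name
            state = seqs[node][site]
            return (ALL_STATES if state == "?" else frozenset(state)), 0
        left_set, left_len = visit(node[0], site)
        right_set, right_len = visit(node[1], site)
        shared = left_set & right_set
        if shared:
            return shared, left_len + right_len
        return left_set | right_set, left_len + right_len + 1   # one more change
    n_sites = len(next(iter(seqs.values())))
    return sum(visit(tree, site)[1] for site in range(n_sites))

def attachments(node, taxon):
    """Yield every tree made by attaching `taxon` to a branch at or below `node`."""
    yield (node, taxon)
    if isinstance(node, tuple):
        left, right = node
        for new_left in attachments(left, taxon):
            yield (new_left, right)
        for new_right in attachments(right, taxon):
            yield (left, new_right)

def branch_and_bound(seqs):
    taxa = list(seqs)
    comb = taxa[0]
    for name in taxa[1:]:                            # arbitrary tree for the first bound
        comb = (comb, name)
    best = {"length": fitch_length(comb, seqs), "trees": []}

    def extend(subtree, next_index):
        partial = (taxa[0], subtree)
        length = fitch_length(partial, seqs)
        if length > best["length"]:
            return                                   # adding taxa never shortens a tree
        if next_index == len(taxa):
            if length < best["length"]:
                best["length"], best["trees"] = length, []
            best["trees"].append(partial)
            return
        for bigger in attachments(subtree, taxa[next_index]):
            extend(bigger, next_index + 1)

    extend((taxa[1], taxa[2]), 3)
    return best["length"], best["trees"]

length, trees = branch_and_bound(MATRIX)
print(length, len(trees))
```

The search is exact because a partial tree is discarded only when it is already longer than the best complete tree found so far.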

11
How do you find the optimal tree?
  • Exhaustive (
  • Branch-and-bound (
  • Heuristic search (unlimited?)

12
Heuristic searches
  • Search for optimal trees by finding good trees
    and then rearranging them in the hopes of finding
    an even better tree

13
Getting starting trees
  • Random tree - not done
  • User tree (e.g., a NJ tree)
  • Build a tree by adding taxa to the location that
    is optimal
  • Can hold more than one tree at each step

14
Taxon addition order
  • As-is: in the order of the matrix (not done for
    parsimony)
  • Simple taxon addition: use a distance algorithm to
    decide order
  • Closest taxon addition: add the taxon that makes the
    optimal tree
  • Random taxon addition order: repeat many times
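
Stepwise addition with a random addition order could look roughly like the sketch below. It assumes Fitch parsimony as the optimality criterion, holds only one tree at each step, and uses hypothetical function names; the matrix is the five-taxon example from the homoplasy slides.

```python
import random

MATRIX = {
    "Taxon 1": "ACATTTA", "Taxon 2": "ACGATTA", "Taxon 3": "AGGATAG",
    "Taxon 4": "GAAAAC?", "Taxon 5": "GATA?CG",
}

def fitch_length(tree, seqs, states=frozenset("ACGT")):
    """Fitch parsimony length of a nested-tuple tree ("?" matches any state)."""
    def visit(node, site):
        if not isinstance(node, tuple):
            s = seqs[node][site]
            return (states if s == "?" else frozenset(s)), 0
        (ls, ll), (rs, rl) = visit(node[0], site), visit(node[1], site)
        return (ls & rs, ll + rl) if ls & rs else (ls | rs, ll + rl + 1)
    n_sites = len(next(iter(seqs.values())))
    return sum(visit(tree, i)[1] for i in range(n_sites))

def attachments(node, taxon):
    """Every way of attaching `taxon` to a branch at or below `node`."""
    yield (node, taxon)
    if isinstance(node, tuple):
        left, right = node
        for new_left in attachments(left, taxon):
            yield (new_left, right)
        for new_right in attachments(right, taxon):
            yield (left, new_right)

def stepwise_addition(seqs, order):
    """Add taxa in `order`, each time keeping the single shortest placement."""
    tree = (order[0], (order[1], order[2]))
    for taxon in order[3:]:
        anchor, subtree = tree
        candidates = [(anchor, s) for s in attachments(subtree, taxon)]
        tree = min(candidates, key=lambda t: fitch_length(t, seqs))
    return tree, fitch_length(tree, seqs)

def random_addition_search(seqs, replicates=10, seed=0):
    """Repeat stepwise addition with shuffled orders and keep the shortest tree."""
    rng = random.Random(seed)
    best = None
    for _ in range(replicates):
        order = list(seqs)
        rng.shuffle(order)
        result = stepwise_addition(seqs, order)
        if best is None or result[1] < best[1]:
            best = result
    return best

print(stepwise_addition(MATRIX, list(MATRIX)))   # "as-is" addition order
print(random_addition_search(MATRIX))
```

Each replicate is greedy, so a single addition order can end on a suboptimal tree; repeating with different random orders gives the search more than one starting point.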

15
Heuristic search
Suboptimal island of trees
Global optimum
Starting trees
Treespace
16
Branch swapping
  • Nearest-neighbour interchange (NNI)
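
As an illustration of the simplest rearrangement, the sketch below lists the NNI neighbours of a tree. It assumes a nested-tuple representation anchored on one leaf, as in `("A", (("B", "C"), ("D", "E")))`; the representation and function names are choices made for this example only. Each internal branch joins four subtrees, and swapping one subtree across the branch gives two alternative arrangements.

```python
def nni_below(node):
    """Yield copies of `node` changed by one NNI on an internal branch inside it."""
    if not isinstance(node, tuple):
        return
    left, right = node
    if isinstance(left, tuple):          # internal branch between node and left
        a, b = left
        yield ((right, b), a)            # swap a with right
        yield ((a, right), b)            # swap b with right
    if isinstance(right, tuple):         # internal branch between node and right
        a, b = right
        yield (a, (left, b))             # swap a with left
        yield (b, (a, left))             # swap b with left
    for new_left in nni_below(left):
        yield (new_left, right)
    for new_right in nni_below(right):
        yield (left, new_right)

def nni_neighbours(tree):
    """All NNI neighbours of an unrooted tree written as (anchor_leaf, subtree)."""
    anchor, subtree = tree
    return [(anchor, s) for s in nni_below(subtree)]

start = ("A", (("B", "C"), ("D", "E")))
for neighbour in nni_neighbours(start):
    print(neighbour)
```

A fully resolved unrooted tree with t taxa has t - 3 internal branches, so this yields 2(t - 3) neighbours. SPR and TBR explore larger neighbourhoods and are not covered by this sketch.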

17
Branch swapping
  • Subtree pruning and regrafting (SPR)

18
Branch swapping
  • Tree-bisection reconnection (TBR)

19
Many issues glossed over
  • What if characters disagree?
  • How is the tree score determined?
  • How can we root the trees?
  • How do we find the optimal tree?
  • How can we evaluate the robustness of our
    conclusions?

20
Even if the shortest tree is the best estimate
of the true tree, the true tree might not be the
shortest
We should consider suboptimal trees
We should use statistical tests to help us
determine what to actually believe
21
Questions we can ask
  • Are the data random or do they have signal?
  • How much homoplasy is there?
  • To what extent are particular elements of the
    trees (clades) supported?
  • What alternative results can we reject?

22
How can we evaluate the reliability of the
tree(s) we obtain?
  • Is there agreement within the data?

23
The logic of looking at consistency indices
  • If all the characters have the same signal then
    the tree is more trustworthy
  • The more agreement there is, the less homoplasy
    (more consistency) the characters will show on
    the most parsimonious tree
  • We need statistics to measure consistency

24
How much homoplasy is there?
  Character   1    2    3    4    5    6    7
  Taxon 1     A    C    A    T    T    T    A
  Taxon 2     A    C    G    A    T    T    A
  Taxon 3     A    G    G    A    T    A    G
  Taxon 4     G    A    A    A    A    C    ?
  Taxon 5     G    A    T    A    ?    C    G
  Obs L       1    2    3    1    1    2    1
  Min L       1    2    2    1    1    2    1

Minimum length overall = 10; length of MP tree = 11
25
Consistency index
  • $\mathrm{CI} = \dfrac{\text{Min L}}{\text{Obs L}} = \dfrac{10}{11} = 0.91$

Homoplasy index
$\mathrm{HI} = 1 - \mathrm{CI} = 0.09$
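
The same arithmetic in a short Python sketch, using the per-character observed and minimum lengths from the example matrix (variable and function names are illustrative):

```python
obs_l = [1, 2, 3, 1, 1, 2, 1]   # observed steps per character on the MP tree
min_l = [1, 2, 2, 1, 1, 2, 1]   # minimum possible steps per character

def consistency_index(min_lengths, obs_lengths):
    """Ensemble CI: summed minimum length over summed observed length."""
    return sum(min_lengths) / sum(obs_lengths)

ci = consistency_index(min_l, obs_l)
hi = 1 - ci
per_character_ci = [m / o for m, o in zip(min_l, obs_l)]

print(f"CI = {ci:.2f}, HI = {hi:.2f}")                 # CI = 0.91, HI = 0.09
print([round(value, 2) for value in per_character_ci])
```

Only the third character shows homoplasy on this tree, so it is the only one with a per-character CI below 1.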
26
How much homoplasy is there?
  Character   1    2    3    4    5    6    7
  Taxon 1     A    C    A    T    T    T    A
  Taxon 2     A    C    G    A    T    T    A
  Taxon 3     A    G    G    A    T    A    G
  Taxon 4     G    A    A    A    A    C    ?
  Taxon 5     G    A    T    A    ?    C    G
  Obs L       1    2    3    1    1    2    1
  Min L       1    2    2    1    1    2    1
  CI          1    1    .67  1    1    1    1
27
CI is affected by uninformative characters
  Character   1    2    3    4    5    6    7
  Taxon 1     A    C    A    T    T    T    A
  Taxon 2     A    C    G    A    T    T    A
  Taxon 3     A    G    G    A    T    A    G
  Taxon 4     G    A    A    A    A    C    ?
  Taxon 5     G    A    T    A    ?    C    G
  Min L       1    2    2    1    1    2    1
  CI          1    1    .67  1    1    1    1

Excluding the uninformative characters (4 and 5):
minimum length overall = 8; length of MP tree = 9; CI = 0.89
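
A sketch of how the uninformative characters might be filtered out before computing CI. It assumes the usual definition of a parsimony-informative character (at least two states, each present in at least two taxa) and ignores missing entries; the names are illustrative.

```python
from collections import Counter

ROWS = ["ACATTTA", "ACGATTA", "AGGATAG", "GAAAAC?", "GATA?CG"]
OBS_L = [1, 2, 3, 1, 1, 2, 1]
MIN_L = [1, 2, 2, 1, 1, 2, 1]

def is_informative(column, missing="?"):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(state for state in column if state != missing)
    return sum(1 for n in counts.values() if n >= 2) >= 2

columns = list(zip(*ROWS))
keep = [i for i, column in enumerate(columns) if is_informative(column)]

ci_all = sum(MIN_L) / sum(OBS_L)
ci_informative = sum(MIN_L[i] for i in keep) / sum(OBS_L[i] for i in keep)

print([i + 1 for i in keep])                        # characters kept (1-based)
print(f"{ci_all:.2f} vs {ci_informative:.2f}")
```

Characters 4 and 5 each need exactly one step on every tree, so they can only raise CI and never lower it.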
28
Retention Index
  Character   1    2    3    4    5    6    7
  Taxon 1     A    C    A    T    T    T    A
  Taxon 2     A    C    G    A    T    T    A
  Taxon 3     A    G    G    A    T    A    G
  Taxon 4     G    A    A    A    A    C    ?
  Taxon 5     G    A    T    A    ?    C    G
  Min L       1    2    2    1    1    2    1
  Max L       2    3    3    1    1    3    2

Maximum length overall = 15
29
Retention index (RI)
  • $\mathrm{RI} = \dfrac{\text{MaxL} - \text{ObsL}}{\text{MaxL} - \text{MinL}} = \dfrac{15 - 11}{15 - 10} = \dfrac{4}{5} = 0.80$
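
The minimum and maximum lengths can be derived from the matrix itself. The sketch below assumes unordered characters, so the minimum for a character is its number of states minus one and the maximum is the number of scored taxa minus the count of the commonest state; the observed length of 11 is the MP tree length given for the example matrix. Names are illustrative.

```python
from collections import Counter

ROWS = ["ACATTTA", "ACGATTA", "AGGATAG", "GAAAAC?", "GATA?CG"]

def min_length(column, missing="?"):
    """Fewest steps possible on any tree: number of observed states minus one."""
    return len(set(column) - {missing}) - 1

def max_length(column, missing="?"):
    """Most steps possible: scored taxa minus the count of the commonest state."""
    counts = Counter(state for state in column if state != missing)
    return sum(counts.values()) - max(counts.values())

columns = list(zip(*ROWS))
min_l = sum(min_length(c) for c in columns)
max_l = sum(max_length(c) for c in columns)
obs_l = 11                                   # length of the MP tree

ri = (max_l - obs_l) / (max_l - min_l)
print(min_l, max_l, f"RI = {ri:.2f}")
```

With MinL = 10, MaxL = 15 and ObsL = 11, this evaluates to 4/5 = 0.80.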

30
General trends observed with CI/RIs
  • Strong negative correlation between taxon number
    and CI/RI
  • Data sets with few characters can show unexpectedly
    high CI/RI

31
How can we evaluate the significance of CI/RI?
  • CI depends directly on tree length
  • We can compare the observed tree length with the
    length we would obtain if there were no phylogenetic
    signal

The permutation tail probability (PTP) test
32
Permuting data removes phylogenetic signal
  Original data
  Taxon 1   ACATTTA
  Taxon 2   ACGATTA
  Taxon 3   AGGATAG
  Taxon 4   GAAAAC?
  Taxon 5   GATA?CG

  Permuted data set
  Taxon 1   GAAA?AA
  Taxon 2   ACAATC?
  Taxon 3   GAGTATG
  Taxon 4   AGTATCG
  Taxon 5   ACGATTA
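
A minimal sketch of the permutation step and of the PTP value itself. It assumes the usual definition of PTP as the proportion of data sets, original included, whose shortest tree is as short as or shorter than the original; finding the shortest tree for each permuted matrix is left to whatever search is available. Function names are illustrative.

```python
import random

ROWS = ["ACATTTA", "ACGATTA", "AGGATAG", "GAAAAC?", "GATA?CG"]

def permute_characters(rows, rng):
    """Shuffle the states of each character (column) independently among taxa."""
    columns = [list(column) for column in zip(*rows)]
    for column in columns:
        rng.shuffle(column)
    return ["".join(states) for states in zip(*columns)]

def ptp(observed_length, permuted_lengths):
    """Proportion of data sets (original included) at least as short as the original."""
    as_short = sum(1 for length in permuted_lengths if length <= observed_length)
    return (as_short + 1) / (len(permuted_lengths) + 1)

rng = random.Random(42)                      # fixed seed only for repeatability
for row in permute_characters(ROWS, rng):
    print(row)
```

Every character keeps its own state frequencies after shuffling, and only the association between characters is destroyed. For example, if the original data give a tree of length 1222 and none of 99 permuted data sets does as well, `ptp(1222, permuted_lengths)` is 1/100 = 0.01.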
33
Example with signal
  Tree length    Number of replicates
  1222           1
  1669           1
  1671           1
  1672           1
  1673           1
  1674           1
  1675           2
  1676           2
  1678           1
  1679           2
  1680           4
  1681           5
  1682           8
  1683           4
  1684           4
  1685           2
  1686           8
  1687           7
  1688           6
  1689           8
  1690           6
  1691           3
  1692           2
  1693           3
  1694           3
  1695           3
  1696           3
  1697           2
  1699           2
  1702           1
  1704           2
  1705           1
34
Example without signal
  Tree length    Number of replicates
  1924           3
  1926           1
  1927           4
  1928           1
  1929           2
  1930           8
  1931           6
  1932           5
  1933           4
  1934           4
  1935           5
  1936           1
  1937           8
  1938           11
  1939           7
  1940           6
  1941           7
  1942           4
  1943           2
  1944           1
  1945           1
  1946           1
  1947           1
  1950           3
  1952           1
  1953           1
  1955           1
  1958           1
35
The PTP test is slow
  • Hillis and Huelsenbeck (1991) observed that the
    shape of the tree length distribution changes with
    the amount of phylogenetic signal

36
A data set without signal
mean = 599.182107   sd = 4.944738   g1 = -0.150922

Histogram of tree lengths (axis starts at 582.0)
  Bin label    Number of trees
  583.8        5
  585.6        25
  587.4        71
  589.2        209
  591.0        161
  592.8        521
  594.6        883
  596.4        1132
  598.2        1469
  600.0        788
  601.8        1631
  603.6        1486
  605.4        1047
  607.2        567
  609.0        157
  610.8        171
  612.6        57
  614.4        11
  616.2        3
  618.0        1
37
A data set with signal
mean = 611.572872   sd = 31.049455   g1 = -0.942643

Histogram of tree lengths (axis starts at 501.0)
  Bin label    Number of trees
  508.65       15
  516.30       60
  523.95       84
  531.60       135
  539.25       21
  546.90       26
  554.55       96
  562.20       166
  569.85       290
  577.50       737
  585.15       1118
  592.80       665
  600.45       120
  608.10       268
  615.75       497
  623.40       796
  631.05       1337
  638.70       2031
  646.35       1610
  654.00       323
38
Skewness test for phylogenetic signal
  • Hillis and Huelsenbeck (1991) generated random
    data for different numbers of taxa/characters to
    find the null distribution of g1 scores
  • One can compare observed g1 statistics with this
    null distribution
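
The g1 statistic can be computed from a tree length distribution as sketched below, assuming the usual moment definition of skewness (third central moment divided by the 1.5 power of the second). The histogram holds the bin labels and counts from the "data set with signal" slide; since these are binned values, the result only approximates the g1 reported there. Names are illustrative.

```python
def g1(histogram):
    """Skewness m3 / m2**1.5 for a {tree length: number of trees} histogram."""
    n = sum(histogram.values())
    mean = sum(x * c for x, c in histogram.items()) / n
    m2 = sum(c * (x - mean) ** 2 for x, c in histogram.items()) / n
    m3 = sum(c * (x - mean) ** 3 for x, c in histogram.items()) / n
    return m3 / m2 ** 1.5

# Bin label -> number of trees, from the "data set with signal" histogram.
WITH_SIGNAL = {
    508.65: 15, 516.30: 60, 523.95: 84, 531.60: 135, 539.25: 21,
    546.90: 26, 554.55: 96, 562.20: 166, 569.85: 290, 577.50: 737,
    585.15: 1118, 592.80: 665, 600.45: 120, 608.10: 268, 615.75: 497,
    623.40: 796, 631.05: 1337, 638.70: 2031, 646.35: 1610, 654.00: 323,
}

print(f"g1 (binned) = {g1(WITH_SIGNAL):.2f}")
```

A strongly negative value reflects the long left tail: a few trees are much shorter than the bulk of the distribution.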

39
Tests for phylogenetic signal (g1 and PTP)
  • Are sensitive to any signal in the data
  • For example:
  • g1 of permuted data = -0.04 (ns)
  • Duplicate one taxon and g1 = -1.56
  • Useful for identifying truly useless data (very
    rare)
  • But otherwise do not tell you much about data
    quality

40
Tests of signal
  • These methods seek to determine overall data
    quality as a guide to whether we should believe
    particular results
  • We can, instead, evaluate particular results
  • Clade support measures: bootstrap/decay
  • Statistical tests of alternative hypotheses