Parsimony and searching tree-space

Provided by: Holla154

1
Parsimony and searching tree-space


2
The basic idea
  • To infer trees we want to find clades (groups)
    that are supported by synapomorphies (shared
    derived traits).
  • More simply put, we assume that if species are
    similar it is usually due to common descent
    rather than due to chance.

3
Sometimes the data agrees
[Figure: a tree drawn along a time axis, with ancestral sequences ACCGCTTA, ACTGCTTA and ACCCCTTA at its internal nodes.]
Tip sequences: ACCCCTTA ACCCCATA ACTGCTTA ACTGCTAA
4
Sometimes not
[Figure: a tree drawn along a time axis, with ancestral sequences ACCGCTTA, ACTGCTTA and ACCCCTTA at its internal nodes.]
Tip sequences: ACCCCTTC ACCCCATA ACTGCTTC ACTGCTAA
5
Homoplasy
  • When two or more characters cannot fit on the
    same tree without at least one of them undergoing
    a parallel change or a reversal, this is called
    homoplasy.

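[Figure: a tree drawn along a time axis, with ancestral sequences ACCGCTTA, ACCCCTTA and ACTGCTTA at its internal nodes.]
Tip sequences: ACTGCTAA ACTGCTTC ACCCCTTC ACCCCATA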
6
How can we choose the best tree?
  • To decide which tree is best we can use an
    optimality criterion.
  • Parsimony is one such criterion.
  • It chooses the tree that requires the fewest
    mutations to explain the data.
  • The Principle of Parsimony is the general
    scientific principle that accepts the simpler of
    two explanations as preferable.

7
S1 ACCCCTTC  S2 ACCCCATA
S3 ACTGCTTC  S4 ACTGCTAA
Two candidate trees: (1,2),(3,4) and (1,3),(2,4)
[Figure: the two four-taxon trees, with tips numbered 1 to 4.]
8
S1 ACCCCTTC  S2 ACCCCATA
S3 ACTGCTTC  S4 ACTGCTAA
Site 1 (S1 A, S2 A, S3 A, S4 A)
Per-site costs so far:
(1,2),(3,4)  0
(1,3),(2,4)  0
9
S1 ACCCCTTC  S2 ACCCCATA
S3 ACTGCTTC  S4 ACTGCTAA
Site 3 (S1 C, S2 C, S3 T, S4 T)
Per-site costs so far:
(1,2),(3,4)  0 0 1
(1,3),(2,4)  0 0 2
10
S1 ACCCCTTC  S2 ACCCCATA
S3 ACTGCTTC  S4 ACTGCTAA
Site 4 (S1 C, S2 C, S3 G, S4 G)
Per-site costs so far:
(1,2),(3,4)  0 0 1 1
(1,3),(2,4)  0 0 2 2
11
S1 ACCCCTTC  S2 ACCCCATA
S3 ACTGCTTC  S4 ACTGCTAA
Site 6 (S1 T, S2 A, S3 T, S4 T)
Per-site costs so far:
(1,2),(3,4)  0 0 1 1 0 1
(1,3),(2,4)  0 0 2 2 0 1
12
S1 ACCCCTTC  S2 ACCCCATA
S3 ACTGCTTC  S4 ACTGCTAA
Site 7 (S1 T, S2 T, S3 T, S4 A)
Per-site costs so far:
(1,2),(3,4)  0 0 1 1 0 1 1
(1,3),(2,4)  0 0 2 2 0 1 1
13
S1 ACCCCTTC  S2 ACCCCATA
S3 ACTGCTTC  S4 ACTGCTAA
Site 8 (S1 C, S2 A, S3 C, S4 A)
Per-site costs and totals:
(1,2),(3,4)  0 0 1 1 0 1 1 2  total 6
(1,3),(2,4)  0 0 2 2 0 1 1 1  total 7
According to the parsimony optimality criterion
we should prefer the tree (1,2),(3,4) over the
tree (1,3),(2,4) as it requires fewer mutations
(6 rather than 7).
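The per-site costs on the preceding slides can be checked by brute force. The sketch below is only an illustration, not something from the presentation: for each site it tries every pair of states at the two internal nodes of a four-taxon tree and keeps the cheapest assignment. The names `site_cost` and `per_site_costs` are made up for this example.

```python
from itertools import product

# Alignment from the slides, keyed by taxon number.
ALIGNMENT = {1: "ACCCCTTC", 2: "ACCCCATA", 3: "ACTGCTTC", 4: "ACTGCTAA"}


def site_cost(pair1, pair2, column):
    """Fewest changes at one site on the four-taxon tree (pair1),(pair2).

    column maps taxon -> observed state. The tree has two internal nodes,
    x (joining pair1) and y (joining pair2), and five branches: x to each
    tip of pair1, y to each tip of pair2, and x to y. Every assignment of
    states to x and y is tried and the cheapest one is kept.
    """
    best = None
    for x, y in product("ACGT", repeat=2):
        changes = sum(column[t] != x for t in pair1)
        changes += sum(column[t] != y for t in pair2)
        changes += int(x != y)
        if best is None or changes < best:
            best = changes
    return best


def per_site_costs(pair1, pair2, alignment=ALIGNMENT):
    """List of per-site costs for the tree (pair1),(pair2)."""
    n_sites = len(next(iter(alignment.values())))
    return [
        site_cost(pair1, pair2, {t: seq[i] for t, seq in alignment.items()})
        for i in range(n_sites)
    ]


costs_12_34 = per_site_costs((1, 2), (3, 4))
costs_13_24 = per_site_costs((1, 3), (2, 4))
print(costs_12_34, sum(costs_12_34))  # [0, 0, 1, 1, 0, 1, 1, 2] 6
print(costs_13_24, sum(costs_13_24))  # [0, 0, 2, 2, 0, 1, 1, 1] 7
```

Trying every internal assignment is only practical for tiny trees, since the number of assignments grows exponentially with the number of internal nodes.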
14
Maximum Parsimony
  • The parsimony criterion tries to minimise the
    number of mutations required to explain the data
  • The Small Parsimony Problem is to compute the
    number of mutations required on a given tree.
  • For small examples it is straightforward to see
    how many mutations are needed

[Figure: two four-taxon trees with tips Cat, Rat, Dog and Mouse; each node is labelled with state A or G.]
15
The Fitch algorithm
  • For larger examples we need an algorithm to solve
    the small parsimony problem

Site: a = A, b = A, c = C, d = C, e = G, f = G, g = T, h = A
[Figure: a tree with eight tips, labelled a to h.]
16
The Fitch algorithm
  • Label the tips of the tree with the observed
    sequence at the site

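[Figure: the tree with each tip labelled by its observed state (a = A, b = A, c = C, d = C, e = G, f = G, g = T, h = A).]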
17
The Fitch algorithm
  • Pick an arbitrary root to work towards

[Figure: the same tip-labelled tree, with a root chosen.]
18
The Fitch algorithm
  • Work from the tips of the tree towards the root.
    Label each node with the intersection of the
    state sets of its child nodes.
  • If the intersection is empty, label the node with
    the union instead and add one to the cost.

[Figure: the tree with each internal node labelled by its state set; the sets shown include {A,C,G}, {A,T} and {C,G}.]
Cost 4
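A minimal sketch of the bottom-up pass just described, for a single site. The tree shape used on the slide cannot be recovered from the transcript, so the topology below is a hypothetical one chosen for illustration; only the tip states (a to h) come from the slides. Trees are written as nested pairs, and `fitch` is a name invented for this example.

```python
def fitch(tree, states):
    """Return (state_set, cost) for one site on a rooted binary tree.

    tree   -- a tip name (str) or a pair (left, right) of subtrees
    states -- dict mapping each tip name to its observed state
    """
    if isinstance(tree, str):            # tip: its set is the observed state
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                           # non-empty intersection: no extra change
        return common, left_cost + right_cost
    # empty intersection: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


# Tip states from the slides.
site = {"a": "A", "b": "A", "c": "C", "d": "C",
        "e": "G", "f": "G", "g": "T", "h": "A"}

# Hypothetical topology (an assumption, not the tree drawn on the slide).
tree = (((("a", "b"), ("c", "d")), ("e", "f")), ("g", "h"))

root_set, cost = fitch(tree, site)
print(root_set, cost)  # {'A'} 3
```

This particular topology needs 3 changes. The slide's tree has a different shape, which is why it reports a cost of 4 for the same tip states.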
19
Fitch continued
  • The Fitch algorithm also has a second phase that
    allocates states to the internal nodes but it
    does not affect the cost.
  • To find the Fitch cost of an alignment for a
    particular tree we just sum the Fitch costs of
    all the sites.
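As an illustration of summing over sites, the sketch below applies a per-site Fitch pass to the four-sequence alignment S1 to S4 used earlier in the presentation. The function names and the nested-pair tree encoding are assumptions made for this example.

```python
def fitch(tree, column):
    """Return (state_set, cost) for one site; tree is a tip name or a pair."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def alignment_cost(tree, alignment):
    """Fitch cost of a whole alignment: the sum of the per-site costs."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


alignment = {
    "S1": "ACCCCTTC",
    "S2": "ACCCCATA",
    "S3": "ACTGCTTC",
    "S4": "ACTGCTAA",
}

# The two trees compared earlier, rooted on their central branch.
tree_12_34 = (("S1", "S2"), ("S3", "S4"))
tree_13_24 = (("S1", "S3"), ("S2", "S4"))

print(alignment_cost(tree_12_34, alignment))  # 6
print(alignment_cost(tree_13_24, alignment))  # 7
```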

21
The large parsimony problem
  • The small parsimony problem (finding the score
    of a given tree) can be solved in time linear in
    the size of the tree.
  • The large parsimony problem is to find the tree
    with minimum score.
  • It is known to be NP-hard.

22
How many trees are there?
Species   Unrooted binary tip-labelled trees
4         3
5         3 x 5 = 15
6         3 x 5 x 7 = 105
7         3 x 5 x 7 x 9 = 945
10        2,027,025
20        2.2 x 10^20
n         (2n-5)!!
An exact search for the best tree, where each
tree is evaluated according to some optimality
criterion such as parsimony, quickly becomes
intractable as the number of species increases.
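The (2n-5)!! formula is easy to evaluate directly. A short sketch (the function name is made up for this example):

```python
def num_unrooted_trees(n):
    """Number of unrooted binary tip-labelled trees on n tips: (2n-5)!!.

    (2n-5)!! is the product of the odd numbers 1 * 3 * 5 * ... * (2n-5).
    """
    if n < 3:
        raise ValueError("need at least 3 tips")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # odd numbers 3, 5, ..., 2n-5
        count *= k
    return count


for n in (4, 5, 6, 7, 10, 20):
    print(n, num_unrooted_trees(n))
```

For n = 20 this gives 221,643,095,476,699,771,875, which is the 2.2 x 10^20 quoted in the table.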
23
Counting trees
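[Figure: trees built up by adding one tip at a time. A fourth tip can attach to any of the 3 branches of the three-tip tree; a fifth tip can attach to any of the 5 branches of each four-tip tree.]
1 x 3 = 3
1 x 3 x 5 = 15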
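The counts on this slide can be reproduced by actually building the trees, adding one tip at a time to every branch. The sketch below is an illustration under an assumed encoding: an unrooted tree is written as (first tip, rooted tree on the remaining tips), with rooted trees as nested pairs. `insertions` and `all_trees` are names invented here.

```python
def insertions(tree, tip):
    """Yield every tree obtained by attaching tip to one branch of tree.

    tree is a tip label or a pair (left, right). The first result attaches
    the new tip on the branch above tree; the rest attach it inside.
    """
    yield (tree, tip)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, tip):
            yield (new_left, right)
        for new_right in insertions(right, tip):
            yield (left, new_right)


def all_trees(tips):
    """List every unrooted binary tree on tips (at least three of them).

    Each tree is returned as (tips[0], rooted), i.e. drawn as if rooted
    on the branch leading to the first tip.
    """
    first, second, *rest = tips
    partial = [second]
    for tip in rest:
        partial = [new for tree in partial for new in insertions(tree, tip)]
    return [(first, tree) for tree in partial]


for n in (3, 4, 5, 6):
    trees = all_trees(list(range(1, n + 1)))
    print(n, len(trees))   # 1, 3, 15 and 105 trees respectively
```

A tree on k tips has 2k-3 branches, so each step multiplies the number of trees by the next odd number, which is where the product 1 x 3 x 5 x ... comes from.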
24
Search strategies
  • Exact search
      • possible for small n only
  • Branch and Bound
      • up to 20 taxa
  • Local Search - Heuristics
      • pick a good starting tree and use moves within a
        neighbourhood to find a better tree.
  • Meta-heuristics
      • Genetic algorithms
      • Simulated annealing
      • The ratchet

25
Exact searches
  • For a small number of taxa (n < 12) it is possible
    to compute the score of every tree.
  • Branch and Bound searches are also guaranteed to
    find the optimal solution but use some clever
    rules to avoid having to check all trees. They
    may be effective for up to 25 taxa.

26
http://evolution.gs.washington.edu/gs541/2005/lecture25.pdf
27
No need to evaluate this whole branch of the
search tree, as no tree can have a score better
than 9.
http://evolution.gs.washington.edu/gs541/2005/lecture25.pdf
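The pruning idea can be sketched as follows. This is a simplified illustration and not the procedure from the linked lecture: trees are grown one tip at a time, and a partial tree is abandoned as soon as its parsimony score is no better than the best complete tree found so far, because adding tips can never lower the score. The tree encoding and all function names are assumptions made for this example.

```python
def fitch(tree, column):
    """Return (state_set, cost) for one site; tree is a tip name or a pair."""
    if not isinstance(tree, tuple):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def score(tree, alignment):
    """Parsimony score: sum of per-site Fitch costs over the tips in tree."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


def insertions(tree, tip):
    """Yield every tree obtained by attaching tip to one branch of tree."""
    yield (tree, tip)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, tip):
            yield (new_left, right)
        for new_right in insertions(right, tip):
            yield (left, new_right)


def branch_and_bound(alignment):
    """Return (best_score, best_tree) for an alignment of two or more tips."""
    tips = list(alignment)
    best = {"score": float("inf"), "tree": None}

    def extend(rooted, remaining):
        partial = (tips[0], rooted)
        s = score(partial, alignment)
        if s >= best["score"]:
            return                      # bound: this whole branch cannot win
        if not remaining:
            best["score"], best["tree"] = s, partial
            return
        for bigger in insertions(rooted, remaining[0]):
            extend(bigger, remaining[1:])

    extend(tips[1], tips[2:])
    return best["score"], best["tree"]


alignment = {"S1": "ACCCCTTC", "S2": "ACCCCATA",
             "S3": "ACTGCTTC", "S4": "ACTGCTAA"}
print(branch_and_bound(alignment))  # (6, ('S1', ('S2', ('S3', 'S4'))))
```

With only four taxa there is almost nothing to prune, so the example shows the mechanics only; the savings appear as the number of taxa grows. The result corresponds to the unrooted tree (1,2),(3,4).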
30
The problem of local optima
31
Nearest Neighbor Interchange (NNI)
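The transcript gives only the name of this move, so the following is a hedged sketch of one common way to describe it: for each internal branch, swap one of the subtrees on one side with a subtree on the other side, which gives two neighbours per internal branch. The sketch then uses these neighbours in a simple hill-climbing local search. The tree encoding (first tip, rooted nested pairs) and all function names are assumptions made for this example.

```python
def fitch(tree, column):
    """Return (state_set, cost) for one site; tree is a tip name or a pair."""
    if not isinstance(tree, tuple):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def score(tree, alignment):
    """Parsimony score: sum of per-site Fitch costs."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


def nni_neighbours(rooted):
    """Yield the trees one NNI move away, for every internal branch in rooted."""
    if not isinstance(rooted, tuple):
        return
    left, right = rooted
    if isinstance(left, tuple):         # internal branch leading to left
        x, y = left
        yield ((right, y), x)           # swap x with right
        yield ((x, right), y)           # swap y with right
    if isinstance(right, tuple):        # internal branch leading to right
        x, y = right
        yield (x, (left, y))            # swap x with left
        yield (y, (x, left))            # swap y with left
    for new_left in nni_neighbours(left):
        yield (new_left, right)
    for new_right in nni_neighbours(right):
        yield (left, new_right)


def hill_climb(start, alignment):
    """Move to the best NNI neighbour until no neighbour improves the score."""
    first, current = start
    current_score = score(start, alignment)
    while True:
        scored = [(score((first, n), alignment), n) for n in nni_neighbours(current)]
        if not scored:
            break
        best_score, best_tree = min(scored, key=lambda pair: pair[0])
        if best_score >= current_score:
            break                       # local optimum reached
        current, current_score = best_tree, best_score
    return current_score, (first, current)


alignment = {"S1": "ACCCCTTC", "S2": "ACCCCATA",
             "S3": "ACTGCTTC", "S4": "ACTGCTAA"}
start = ("S1", (("S2", "S3"), "S4"))    # the tree (1,4),(2,3), score 8
print(hill_climb(start, alignment))     # (6, ('S1', (('S4', 'S3'), 'S2')))
```

On four taxa every tree is one move from every other, so the search cannot get stuck; on larger data sets a hill climb like this can stop at a local optimum, which is the problem raised on the previous slide.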
32
Subtree Pruning and Regrafting (SPR)
33
Tree Bisection and Reconnection (TBR)