Title: Parsimony and searching tree-space
1Parsimony and searching tree-space
2The basic idea
- To infer trees we want to find clades (groups)
that are supported by synapomorphies (shared
derived traits).
- More simply put, we assume that if species are
similar it is usually due to common descent
rather than chance.
3Sometimes the data agrees
[Figure: tree drawn against a time axis. Internal-node sequences: ACCGCTTA, ACTGCTTA, ACCCCTTA. Tip sequences: ACCCCTTA, ACCCCATA, ACTGCTTA, ACTGCTAA.]
4Sometimes not
[Figure: tree drawn against a time axis. Internal-node sequences: ACCGCTTA, ACTGCTTA, ACCCCTTA. Tip sequences: ACCCCTTC, ACCCCATA, ACTGCTTC, ACTGCTAA.]
5Homoplasy
- When two or more characters cannot all fit on
the same tree without at least one of them
undergoing a parallel change or a reversal,
this is called homoplasy.
[Figure: tree drawn against a time axis. Internal-node sequences: ACCGCTTA, ACCCCTTA, ACTGCTTA. Tip sequences: ACTGCTAA, ACTGCTTC, ACCCCTTC, ACCCCATA.]
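As a rough illustration of characters that cannot fit on the same tree, the sketch below applies the standard pairwise test for two-state characters: two such characters can share a tree without homoplasy only if at most three of the four possible state combinations occur. The function name `compatible` is hypothetical, and the test is an assumption about how one might check this; it is not taken from the slides and does not apply to characters with more than two states. The columns are sites 3, 4 and 8 of the four tip sequences ACCCCTTC, ACCCCATA, ACTGCTTC, ACTGCTAA.

```python
def compatible(site_a, site_b):
    """Pairwise compatibility test for two two-state characters.

    site_a and site_b are strings giving the state of each species,
    in the same species order. Returns True when at most three of the
    four possible state combinations are observed.
    """
    if len(set(site_a)) > 2 or len(set(site_b)) > 2:
        raise ValueError("this test only applies to two-state characters")
    observed_pairs = set(zip(site_a, site_b))
    return len(observed_pairs) <= 3


# Columns read from the tips ACCCCTTC, ACCCCATA, ACTGCTTC, ACTGCTAA.
site3 = "CCTT"
site4 = "CCGG"
site8 = "CACA"

print(compatible(site3, site4))  # both group species {1,2} against {3,4}
print(compatible(site3, site8))  # the two sites group the species differently
```

Sites 3 and 4 agree with each other, while sites 3 and 8 conflict, so any tree must explain at least one of the conflicting sites with a parallel change or a reversal.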
6How can we choose the best tree?
- To decide which tree is best we can use an
optimality criterion.
- Parsimony is one such criterion.
- It chooses the tree which requires the fewest
mutations to explain the data.
- The Principle of Parsimony is the general
scientific principle that accepts the simplest of
two explanations as preferable.
7 S1 ACCCCTTC S2 ACCCCATA
S3 ACTGCTTC S4 ACTGCTAA
Candidate trees: (1,2),(3,4) and (1,3),(2,4)
[Figure: the two candidate trees with tips labelled 1 to 4]
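8 S1 ACCCCTTC S2 ACCCCATA
S3 ACTGCTTC S4 ACTGCTAA
Mutations per site so far: (1,2),(3,4): 0    (1,3),(2,4): 0
[Figure: site 1 (A, A, A, A) mapped onto both trees; no mutation is needed on either tree]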
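9 S1 ACCCCTTC S2 ACCCCATA
S3 ACTGCTTC S4 ACTGCTAA
Mutations per site so far: (1,2),(3,4): 001    (1,3),(2,4): 002
[Figure: site 3 (C, C, T, T) mapped onto both trees; one mutation on (1,2),(3,4), two on (1,3),(2,4)]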
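10 S1 ACCCCTTC S2 ACCCCATA
S3 ACTGCTTC S4 ACTGCTAA
Mutations per site so far: (1,2),(3,4): 0011    (1,3),(2,4): 0022
[Figure: site 4 (C, C, G, G) mapped onto both trees; one mutation on (1,2),(3,4), two on (1,3),(2,4)]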
11 S1 ACCCCTTC S2 ACCCCATA
S3 ACTGCTTC S4 ACTGCTAA
Mutations per site so far: (1,2),(3,4): 001101    (1,3),(2,4): 002201
[Figure: site 6 (T, A, T, T) mapped onto both trees; one mutation on each tree]
12 S1 ACCCCTTC S2 ACCCCATA
S3 ACTGCTTC S4 ACTGCTAA
Mutations per site so far: (1,2),(3,4): 0011011    (1,3),(2,4): 0022011
[Figure: site 7 (T, T, T, A) mapped onto both trees; one mutation on each tree]
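13 S1 ACCCCTTC S2 ACCCCATA
S3 ACTGCTTC S4 ACTGCTAA
Mutations per site: (1,2),(3,4): 00110112, total 6    (1,3),(2,4): 00220111, total 7
[Figure: site 8 (C, A, C, A) mapped onto both trees; two mutations on (1,2),(3,4), one on (1,3),(2,4)]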
According to the parsimony optimality criterion
we should prefer the tree (1,2),(3,4) over the
tree (1,3),(2,4) as it requires fewer
mutations (6 rather than 7).
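The site-by-site counts above can be reproduced with a short sketch. It assumes a brute-force approach that is only sensible for four taxa: for each site, try every assignment of observed states to the two internal nodes of the tree (a,b),(c,d) and keep the cheapest. The names `quartet_site_cost` and `quartet_scores` are hypothetical helpers introduced here for illustration.

```python
from itertools import product

SEQS = {1: "ACCCCTTC", 2: "ACCCCATA", 3: "ACTGCTTC", 4: "ACTGCTAA"}


def quartet_site_cost(a, b, c, d):
    """Minimum number of changes for one site on the unrooted tree (a,b),(c,d).

    u is the internal node joined to tips a and b, v the one joined to
    c and d; each of the five branches costs 1 if its two ends differ.
    """
    states = {a, b, c, d}
    return min(
        (a != u) + (b != u) + (u != v) + (c != v) + (d != v)
        for u, v in product(states, repeat=2)
    )


def quartet_scores(seqs, pair1, pair2):
    """Per-site costs for the tree that groups pair1 against pair2."""
    (i, j), (k, m) = pair1, pair2
    return [
        quartet_site_cost(w, x, y, z)
        for w, x, y, z in zip(seqs[i], seqs[j], seqs[k], seqs[m])
    ]


scores_12_34 = quartet_scores(SEQS, (1, 2), (3, 4))
scores_13_24 = quartet_scores(SEQS, (1, 3), (2, 4))
print(scores_12_34, sum(scores_12_34))  # [0, 0, 1, 1, 0, 1, 1, 2] 6
print(scores_13_24, sum(scores_13_24))  # [0, 0, 2, 2, 0, 1, 1, 1] 7
```

Trying every internal labelling grows exponentially with the number of internal nodes, which is why larger trees need a dedicated algorithm.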
14Maximum Parsimony
- The parsimony criterion tries to minimise the
number of mutations required to explain the data.
- The Small Parsimony Problem is to compute the
number of mutations required on a given tree.
- For small examples it is straightforward to see
how many mutations are needed.
[Figure: two four-taxon trees on Cat, Rat, Dog and Mouse, with states A and G shown at the tips and internal nodes]
15The Fitch algorithm
- For larger examples we need an algorithm to solve
the small parsimony problem
Site: a = A, b = A, c = C, d = C, e = G, f = G, g = T, h = A
[Figure: tree with tips a to h]
16The Fitch algorithm
- Label the tips of the tree with the observed
sequence at the site
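[Figure: the tree with each tip labelled by its state at the site: f = G, g = T, h = A, e = G, a = A, d = C, b = A, c = C]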
17The Fitch algorithm
- Pick an arbitrary root to work towards
[Figure: the same tip-labelled tree, with a root position chosen]
18The Fitch algorithm
- Work from the tips of the tree towards the root.
Label each node with the intersection of the
states of its child nodes.
- If the intersection is empty, label the node with
the union and add one to the cost.
[Figure: the tree with each internal node labelled by its state set, including {A,C,G}, {A,T} and {C,G}]
Cost 4
19Fitch continued
- The Fitch algorithm also has a second phase that
allocates states to the internal nodes but it
does not affect the cost.
- To find the Fitch cost of an alignment for a
particular tree we just sum the Fitch costs of
all the sites.
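The first phase of the Fitch algorithm and the sum over sites can be sketched as follows. This is a minimal illustration under stated assumptions: a rooted binary tree is written as nested pairs of tip names, and `fitch_site`, `fitch_alignment` and the eight-tip topology `assumed_tree` are hypothetical. The topology of the slide's tree is not recoverable from the transcript, so the cost for the eight-tip example need not equal the slide's cost of 4.

```python
def fitch_site(tree, tip_states):
    """Return (state_set, cost) for one site on a rooted binary tree.

    tree: a tip name (str) or a pair (left, right) of subtrees.
    tip_states: dict mapping each tip name to its observed state.
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, tip_states)
    right_set, right_cost = fitch_site(right, tip_states)
    common = left_set & right_set
    if common:
        # Non-empty intersection: no extra mutation is needed here.
        return common, left_cost + right_cost
    # Empty intersection: take the union and add one to the cost.
    return left_set | right_set, left_cost + right_cost + 1


def fitch_alignment(tree, seqs):
    """Sum the Fitch cost over every site of an alignment (dict name -> sequence)."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(length)
    )


# Tip states from the slides, placed on an assumed (hypothetical) topology.
site = {"a": "A", "b": "A", "c": "C", "d": "C",
        "e": "G", "f": "G", "g": "T", "h": "A"}
assumed_tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
root_set, cost = fitch_site(assumed_tree, site)
print(root_set, cost)  # {'A'} 3 for this assumed topology

# The four-species alignment, with each unrooted tree rooted arbitrarily.
seqs = {"S1": "ACCCCTTC", "S2": "ACCCCATA", "S3": "ACTGCTTC", "S4": "ACTGCTAA"}
print(fitch_alignment((("S1", "S2"), ("S3", "S4")), seqs))  # 6
print(fitch_alignment((("S1", "S3"), ("S2", "S4")), seqs))  # 7
```

Each node is visited once per site, so the work is proportional to the number of nodes times the number of sites.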
21The large parsimony problem
- The small parsimony problem, to find the score
of a given tree, can be solved in time linear in
the size of the tree.
- The large parsimony problem is to find the tree
with minimum score.
- It is known to be NP-hard.
22How many trees are there?
species   unrooted binary tip-labelled trees
4         3
5         3 x 5 = 15
6         3 x 5 x 7 = 105
7         3 x 5 x 7 x 9 = 945
10        2,027,025
20        2.2 x 10^20
n         (2n-5)!!
An exact search for the best tree, where each
tree is evaluated according to some optimality
criterion such as parsimony, quickly becomes
intractable as the number of species increases.
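The table entries follow from the double factorial (2n-5)!! = 3 x 5 x ... x (2n-5). A small sketch that reproduces them is given below; the function name `num_unrooted_trees` is hypothetical.

```python
def num_unrooted_trees(n):
    """Number of unrooted binary tip-labelled trees on n species: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 species")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (an empty product when n = 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 5, 6, 7, 10, 20):
    print(n, num_unrooted_trees(n))
```

For n = 20 the result is 221,643,095,476,699,771,875, which is the 2.2 x 10^20 in the table.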
23Counting trees
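[Figure: counting trees by adding one species at a time. The single tree on species 1, 2, 3 has 3 branches where species 4 can attach, giving 1 x 3 = 3 trees. Each of those has 5 branches where species 5 can attach, giving 1 x 3 x 5 = 15 trees.]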
24Search strategies
- Exact search
  - possible for small n only
- Branch and Bound
  - up to 20 taxa
- Local Search - Heuristics
  - pick a good starting tree and use moves within a
    neighbourhood to find a better tree.
- Meta-heuristics
  - Genetic algorithms
  - Simulated annealing
  - The ratchet
25Exact searches
- For a small number of taxa (n < 12) it is possible
to compute the score of every tree.
- Branch and Bound searches also guarantee to find
the optimal solution but use some clever rules to
avoid having to check all trees. They may be
effective for up to 25 taxa.
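One way to sketch an exact search with a simple bound is shown below. It is an illustration under stated assumptions, not the specific rules used by any particular program. Trees are built by adding one taxon at a time to every branch, and a partial tree is abandoned once its Fitch score already exceeds the best complete tree found, since adding taxa can never lower the score. The names `insertions`, `fitch_score` and `branch_and_bound` are hypothetical; taxa are integer indices into the list of sequences, and at least three sequences are assumed.

```python
def insertions(tree, new):
    """Yield every tree obtained by attaching tip `new` to one branch of `tree`."""
    yield (tree, new)  # attach on the branch above this node
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, new):
            yield (t, right)
        for t in insertions(right, new):
            yield (left, t)


def fitch_score(tree, seqs):
    """Total Fitch cost of the aligned sequences on a nested-pair tree."""
    def walk(node):
        if not isinstance(node, tuple):
            return [{ch} for ch in seqs[node]], 0
        left_sets, left_cost = walk(node[0])
        right_sets, right_cost = walk(node[1])
        sets, cost = [], left_cost + right_cost
        for a, b in zip(left_sets, right_sets):
            common = a & b
            if common:
                sets.append(common)
            else:
                sets.append(a | b)
                cost += 1
        return sets, cost
    return walk(tree)[1]


def branch_and_bound(seqs):
    """Return (best_score, list_of_optimal_trees) over all unrooted trees."""
    n = len(seqs)
    best = {"score": float("inf"), "trees": []}

    def extend(rest, next_taxon):
        tree = (0, rest)  # taxon 0 stays at the root, so each unrooted tree appears once
        score = fitch_score(tree, seqs)
        if score > best["score"]:
            return  # bound: no completion of this partial tree can be optimal
        if next_taxon == n:
            if score < best["score"]:
                best["score"], best["trees"] = score, [tree]
            else:
                best["trees"].append(tree)
            return
        for bigger in insertions(rest, next_taxon):
            extend(bigger, next_taxon + 1)

    extend((1, 2), 3)
    return best["score"], best["trees"]


seqs = ["ACCCCTTC", "ACCCCATA", "ACTGCTTC", "ACTGCTAA"]  # S1..S4 as indices 0..3
score, trees = branch_and_bound(seqs)
print(score, trees)  # 6 [(0, (1, (2, 3)))]
```

The single optimal tree pairs indices 2 and 3 (S3 and S4), which is the tree (1,2),(3,4) in the slides' numbering. How much the bound prunes depends on finding a good tree early, so practical implementations typically start from a heuristic upper bound.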
26http://evolution.gs.washington.edu/gs541/2005/lecture25.pdf
27No need to evaluate this whole branch of the
search tree, as no tree can have a score better
than 9.
http://evolution.gs.washington.edu/gs541/2005/lecture25.pdf
30The problem of local optima
31Nearest Neighbor Interchange (NNI)
32Subtree Pruning Regrafting (SPR)
33Tree Bisection Reconnection (TBR)