Title: Tree-thinking (cont.): Introduction to parsimony
1Tree-thinking (cont.): Introduction to parsimony
2The most important feature of a phylogenetic tree is its topology (the order of branching)
(Figure: two tree drawings, each with tips A-G)
Draw this topology with the taxa in the order E-G-F-C-D-A-B
3Which of the following has a different topology?
(Figure: candidate trees with tips A-E)
4Various types of trees you will see
(Figure: three trees, each with the root labeled R)
5Which topology is different?
(Figure: candidate trees with tips A-F and root R)
6Evolutionary relatedness
- Evolutionary relatedness = recency of common ancestry
- Topology contains the information needed to assess relative degree of relatedness
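One way to make "recency of common ancestry" concrete is to ask which pair of tips shares the deeper most recent common ancestor (MRCA). The sketch below assumes a hypothetical five-taxon tree stored as a child-to-parent map; the internal node names (`tetrapods`, `amniotes`, `mammals`) and all function names are illustrative and not taken from the slides.

```python
# Hypothetical tree, stored as child -> parent (the root has no entry).
PARENT = {
    "Fish": "root", "tetrapods": "root",
    "Newt": "tetrapods", "amniotes": "tetrapods",
    "Lizard": "amniotes", "mammals": "amniotes",
    "Mouse": "mammals", "Human": "mammals",
}

def ancestors(node):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def mrca(a, b):
    """Most recent common ancestor: first node on a's path that is also on b's path."""
    on_b_path = set(ancestors(b))
    return next(n for n in ancestors(a) if n in on_b_path)

def depth(node):
    """Number of branches between a node and the root."""
    return len(ancestors(node)) - 1

def closer_relative(focal, x, y):
    """Return whichever of x or y shares the more recent MRCA with focal.

    Both MRCAs lie on the path from focal to the root, so the deeper one
    is the more recent one. Returns None when the two MRCAs are the same node.
    """
    dx, dy = depth(mrca(focal, x)), depth(mrca(focal, y))
    if dx == dy:
        return None
    return x if dx > dy else y

print(closer_relative("Newt", "Fish", "Human"))
print(closer_relative("Newt", "Lizard", "Human"))
```

Only the branching order enters the calculation; the left-to-right position of the tips in a drawing is never used.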
7Is a newt more closely related to a fish or a human?
(Figure: tree with tips Fish, Newt, Lizard, Mouse, Human)
8Why do people go wrong? Looking along the top
(Figure: tree with tips Fish, Newt, Lizard, Mouse, Human)
Is a newt more closely related to a fish or a human?
9(Figure: Fish, Newt, Lizard, Mouse, Human)
- This is not how evolution happened
- All these species are alive today. A living fish is not an ancestor of a newt
- The order along the top can change without changing the content of the tree
10Now, is a newt more closely related to a fish or
a human?
(Figure: tree with tips Fish, Newt, Lizard, Mouse, Human)
11The tree has the same topology
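Whether two drawings show the same topology can be checked by comparing them in a form that ignores the left-to-right order of branches. This is a minimal sketch, assuming rooted trees written as nested tuples with unique tip names; the three example trees are hypothetical drawings, not copies of the figures.

```python
def canonical(tree):
    """Return a form of the tree that ignores the order of branches at each node.

    A tip is a string; an internal node is a tuple of subtrees.
    """
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)

def same_topology(t1, t2):
    """True if two rooted trees differ only by rotating branches around nodes."""
    return canonical(t1) == canonical(t2)

# Two drawings of one (assumed) tree, with the tips in opposite orders.
drawing_1 = ("Fish", ("Newt", ("Lizard", ("Mouse", "Human"))))
drawing_2 = (((("Human", "Mouse"), "Lizard"), "Newt"), "Fish")
# A tree in which Newt and Lizard have swapped places in the branching order.
rearranged = ("Fish", ("Lizard", ("Newt", ("Mouse", "Human"))))

print(same_topology(drawing_1, drawing_2))
print(same_topology(drawing_1, rearranged))
```

Rotating branches around a node changes the tip order but not the set of groups, so the canonical forms are equal; swapping which lineage branches off first changes the groups themselves.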
12Trees depict descent, not similarity
(Figure: tree with tips Turtle, Lizard, Crocodile, Sparrow)
Is a crocodile more closely related to a lizard or a sparrow?
13Don't be distracted by similarity
(Figure: tree with tips Turtle, Lizard, Crocodile, Sparrow)
It doesn't matter how many changes occurred here, the tree shape remains the same
14Is a newt more closely related to a lizard or a human?
(Figure: tree with tips Fish, Newt, Lizard, Mouse, Human)
15The principle of phylogenetic inference
16General procedure
- We score tips for some variable characters
- We have a model of how evolution might have given rise to the states we see
- We identify the tree (etc.) that is most compatible with our data
17A hypothetical example
AGTTGTACGTATGCCGA
18(Figure: taxa O, A, B, C with their sequences)
O  AGTTGTAGGTATGCCGA
A  AGTAGTACGTATGCCGA
B  AGTAGTACGTATGCCTA
C  AGTAGCACGTATGACTA
19Typical experimental strategy
Extract DNA
PCR with gene-specific primers
Sequence PCR product
Align DNA sequences from different species (data matrix)
20Data matrix
(Table: data matrix with one row per taxon: O, A, B, C)
21Typical experimental strategy
Extract DNA
PCR with gene-specific primers
Sequence PCR product
Align DNA sequences from different species (data matrix)
Phylogeny
22Matrix -> Tree
- Use an algorithm
- Apply an optimality criterion
23The algorithmic approach
- Make assumptions about how evolution works
- Identify properties of the true tree under these assumptions
- Develop an algorithm for finding the tree with these properties (can be very fast)
- Two main ones
  - UPGMA: assumes ultrametricity
  - Neighbor-joining: assumes additivity
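As a rough illustration of the algorithmic approach, the sketch below runs UPGMA on the four sequences from the hypothetical example, using the number of differing sites as the distance. The choice of distance, the function names, and the tie-breaking rule (first closest pair found) are assumptions made for this sketch.

```python
from itertools import combinations

SEQS = {
    "O": "AGTTGTAGGTATGCCGA",
    "A": "AGTAGTACGTATGCCGA",
    "B": "AGTAGTACGTATGCCTA",
    "C": "AGTAGCACGTATGACTA",
}

def hamming(s, t):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(s, t))

def upgma(seqs):
    """Repeatedly join the two closest clusters; return a nested-parenthesis string."""
    size = {name: 1 for name in seqs}  # cluster label -> number of tips it contains
    dist = {
        frozenset(pair): float(hamming(seqs[pair[0]], seqs[pair[1]]))
        for pair in combinations(seqs, 2)
    }
    while len(size) > 1:
        # Closest pair of clusters; ties go to the first pair found (arbitrary).
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        na, nb = size.pop(a), size.pop(b)
        merged = f"({a},{b})"
        for c in size:
            # Distance to the new cluster: size-weighted mean of the old distances.
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        size[merged] = na + nb
    return next(iter(size))

print(upgma(SEQS))
```

`upgma(SEQS)` returns a single tree as a string. With these data the first join is unambiguous, but the second is a tie between two equally close pairs, and the returned tree carries no record of that.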
24The problem with algorithms
- Even if the real world matches our model there is an element of chance in evolution
- The true tree may not be the one found
- We have no way of evaluating the degree of support for the algorithmic tree relative to other possible trees
25Optimality criteria
- Make assumptions about how evolution works
- Identify properties that will tend to be maximized or minimized on true trees
- Score that property for all possible trees
- Trees with better scores will be more likely to be true (if the model is correct)
- Trees can be compared based on their score
26Example of an optimality criterion: Parsimony
- Favor the tree that can explain the distribution of character states with the minimum number of character-state changes
27A hypothetical example
AGTTGTAGGTATGCCGA
AGTAGTACGTATGCCTA
AGTAGTACGTATGCCGA
AGTAGCACGTATGACTA
AGTAGTACGTATGCCTA
AGTAGTACGTATGCCGA
AGTTGTACGTATGCCGA
28Data matrix
Taxon  Sequence (positions 1-17)
O      AGTTGTAGGTATGCCGA
A      AGTAGTACGTATGCCGA
B      AGTAGTACGTATGCCTA
C      AGTAGCACGTATGACTA
29Remove invariant characters
Position  4  6  8  14  16
O         T  T  G  C   G
A         A  T  C  C   G
B         A  T  C  C   T
C         A  C  C  A   T
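Removing invariant characters amounts to keeping only the alignment columns in which the taxa do not all share one state. A minimal sketch, using the four sequences from the data matrix; the function names are made up for illustration.

```python
MATRIX = {
    "O": "AGTTGTAGGTATGCCGA",
    "A": "AGTAGTACGTATGCCGA",
    "B": "AGTAGTACGTATGCCTA",
    "C": "AGTAGCACGTATGACTA",
}

def variable_positions(matrix):
    """1-based alignment positions at which at least two taxa differ."""
    columns = zip(*matrix.values())  # one tuple of states per alignment column
    return [i + 1 for i, column in enumerate(columns) if len(set(column)) > 1]

def drop_invariant(matrix):
    """Return the matrix restricted to its variable positions."""
    keep = variable_positions(matrix)
    return {taxon: "".join(seq[p - 1] for p in keep) for taxon, seq in matrix.items()}

print(variable_positions(MATRIX))
for taxon, states in drop_invariant(MATRIX).items():
    print(taxon, " ".join(states))
```

Invariant columns need no change on any tree, so dropping them leaves every comparison between trees unaffected.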
30There are three possible arrangements that we need to consider
(Figure: Tree 1, Tree 2, and Tree 3, each with tips O, A, B, C)
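With four taxa, each arrangement corresponds to one way of dividing the taxa into two pairs, which is why there are exactly three. The sketch below lists them; the order in which it produces them is arbitrary and is not meant to match the Tree 1, Tree 2, Tree 3 labels in the figure.

```python
def quartet_splits(taxa):
    """List the three ways to divide four taxa into two pairs."""
    if len(taxa) != 4:
        raise ValueError("expected exactly four taxa")
    first, *rest = taxa
    splits = []
    for partner in rest:
        # Pair the first taxon with each of the others in turn;
        # the two taxa left over form the second pair.
        others = tuple(t for t in rest if t != partner)
        splits.append(((first, partner), others))
    return splits

for pair_1, pair_2 in quartet_splits(["O", "A", "B", "C"]):
    print(pair_1, "|", pair_2)
```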
31These trees can be drawn without the root
(Figure: the three trees, each with the root labeled R)
32These trees can be drawn without the root
33Map the characters onto tree 1
(Figure: the five-character matrix for O, A, B, C mapped onto tree 1)
Total cost (length): __ steps
34Actually there are two ways to map character 5
(Figure: two alternative mappings of character 5 on a tree with tips O, A, B, C)
Either way the character contributes __ steps to the overall cost
35Map the characters onto tree 2
(Figure: the five-character matrix for O, A, B, C mapped onto tree 2)
Total cost: __ steps
36Map the characters onto tree 3
(Figure: the five-character matrix for O, A, B, C mapped onto tree 3)
Total cost: __ steps
37What was the cost of each tree?
38The difference in tree length is all due to character 5
(Figure: the five-character matrix for O, A, B, C, with character 5 marked parsimony informative and characters 1-4 marked parsimony uninformative)
39Parsimony informative characters
- At least two states that each occur in at least two taxa
A C G T T T A
T C G A T T A
G G G T T A G
G G A A A T ?
C A T G ? C G
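The definition can be turned into a small check: count how many distinct states are each seen in two or more taxa. The sketch applies it to the five variable characters of the O, A, B, C example, with states listed in the order O, A, B, C. Ignoring `?` as missing data is an assumption of this sketch.

```python
from collections import Counter

def is_parsimony_informative(states, missing="?"):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(s for s in states if s != missing)
    shared_states = [s for s, n in counts.items() if n >= 2]
    return len(shared_states) >= 2

# Characters 1-5 of the hypothetical example, states in the order O, A, B, C.
CHARACTERS = ["TAAA", "TTTC", "GCCC", "CCCA", "GGTT"]

for number, states in enumerate(CHARACTERS, start=1):
    print(number, states, is_parsimony_informative(states))
```

A character in which one taxon differs from all the others requires a single change on every tree, so it cannot favor one tree over another.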
40Redraw tree 2 with root in place
(Figure: tree 2 with the root R marked)
This is the correct tree
41Which rooted tree is correct?
(Figure: candidate rooted trees with tips A-H and O)
42Many issues glossed over
- What if characters disagree?
- How is the tree score determined?
- How can we root the trees?
- How do we find the optimal tree?
- How can we evaluate the robustness of our
conclusions?
43How does character conflict arise?
- The tree is not divergent (ignore)
- A particular character changes more than once
(Homoplasy)
(Figure: tree with tips A-G; a G->A change and an A->G change, labeled Reversal)
44How does character conflict arise?
- The tree is not divergent (ignore)
- A particular character changes more than once (Homoplasy)
(Figure: tree with tips A-G; two separate G->A changes, labeled Parallelism/Convergence)
45Parsimony can still work
- If characters are independent (a key assumption) homoplasy will be randomly distributed
- Homoplasies will tend to cancel each other out
- Non-homoplastic changes will tend to agree
- Therefore, with enough characters the shortest tree is a good estimate of the true tree
46The justification of parsimony
Good characters - mark real clades
Bad characters - the rest
Only bad characters contradict each other
47Many issues glossed over
- What if characters disagree?
- How is the tree score determined?
- How can we root the trees?
- How do we find the optimal tree?
- How can we evaluate the robustness of our
conclusions?
48Tree score calculation
$L_{tot} = \sum_{n=1}^{N} L_n W_n$
The tree score is the sum, over all N characters, of the minimum number of weighted steps for each character ($L_n$) multiplied by the weight of that character ($W_n$)
49How is the minimum number of steps calculated?
- Postorder traversal algorithm
- The tree is arbitrarily rooted
- Each internal node is inspected to see if there is an intersection in the possible states of its descendant nodes; if not, tree length is increased
- It is not necessary to identify all ancestral state reconstructions (this requires a preorder traversal)
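The postorder procedure described above is commonly known as Fitch's method. The sketch below is one way to write it, assuming binary trees given as nested tuples and rooted arbitrarily on the branch leading to O; the tree labels (`A+B` and so on) name the pair each arrangement groups together and are not the Tree 1, Tree 2, Tree 3 labels from the figures. The optional `weights` argument multiplies each character's length by its weight.

```python
def fitch_length(tree, states):
    """Minimum number of changes for one character on a binary tree.

    tree: a tip name (str) or a 2-tuple of subtrees.
    states: dict mapping tip name -> observed state.
    """
    steps = 0

    def visit(node):  # postorder: both children are visited before their parent
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = visit(node[0]), visit(node[1])
        shared = left & right
        if shared:
            return shared
        steps += 1  # no state in common, so a change is needed below this node
        return left | right

    visit(tree)
    return steps

def tree_length(tree, matrix, weights=None):
    """Sum of per-character lengths, each multiplied by its weight (default 1)."""
    n_chars = len(next(iter(matrix.values())))
    weights = weights or [1] * n_chars
    return sum(
        w * fitch_length(tree, {taxon: seq[i] for taxon, seq in matrix.items()})
        for i, w in enumerate(weights)
    )

# The five variable characters of the hypothetical example.
MATRIX = {"O": "TTGCG", "A": "ATCCG", "B": "ATCCT", "C": "ACCAT"}
TREES = {
    "A+B": ("O", (("A", "B"), "C")),
    "A+C": ("O", (("A", "C"), "B")),
    "B+C": ("O", ("A", ("B", "C"))),
}

for name, tree in TREES.items():
    print(name, tree_length(tree, MATRIX))
```

The function only counts steps; it does not report which ancestral states were reconstructed, which is all the tree score needs.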
50Why weight characters?
- If we think some characters are less prone to homoplasy, we can upweight them
- Character weights are multiplied by the character length
51We can also weight character state transitions
- Common examples
  - Ordered character states (morphology)
(Table: step matrix indexed by From state and To state)
52We can also weight character state transitions
- Common examples
  - Transitions vs. transversions
(Table: step matrix indexed by From state and To state)
53We can also weight character state transitions
- Common examples
  - Gains less likely than loss (restriction sites)
(Table: asymmetric step matrix indexed by From state and To state)
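When state changes carry different costs, the length of a character can be found with a dynamic-programming pass usually attributed to Sankoff, which is a different technique from the intersection test used for unweighted steps. The sketch below assumes a hypothetical step matrix in which transitions cost 1 and transversions cost 2; these values, like the function names, are illustrative and are not the matrices shown on the slides.

```python
import math

STATES = "ACGT"
PURINES = {"A", "G"}

def ti_tv_cost(parent_state, child_state, transition=1, transversion=2):
    """Hypothetical step matrix: cost of a change from parent_state to child_state."""
    if parent_state == child_state:
        return 0
    same_class = (parent_state in PURINES) == (child_state in PURINES)
    return transition if same_class else transversion

def sankoff_length(tree, states, cost=ti_tv_cost):
    """Minimum total cost of one character on a tree under a step matrix.

    tree: a tip name (str) or a tuple of subtrees.
    states: dict mapping tip name -> observed state.
    """
    def visit(node):
        # Returns, for each possible state at this node, the cheapest cost below it.
        if isinstance(node, str):
            return {s: (0 if s == states[node] else math.inf) for s in STATES}
        child_tables = [visit(child) for child in node]
        return {
            s: sum(min(cost(s, t) + table[t] for t in STATES) for table in child_tables)
            for s in STATES
        }

    return min(visit(tree).values())

tree = ("O", ("A", ("B", "C")))
character_5 = {"O": "G", "A": "G", "B": "T", "C": "T"}  # G vs T: a transversion
character_2 = {"O": "T", "A": "T", "B": "T", "C": "C"}  # T vs C: a transition

print(sankoff_length(tree, character_5))
print(sankoff_length(tree, character_2))
```

Because `cost` receives the parent state first and the child state second, an asymmetric matrix (for example, gains costing more than losses) can be passed in the same way.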
54The weighting game
- When should you weight characters/character-states?
  - If you think that they differ in evidential power
- How much should you modify weights?
  - There is no simple formula
  - It is probably better to err on the side of less extreme weights
  - Often sensible to try a range of weights