In this approach, trees are constructed by comparing the ..

Slides: 73
Provided by: aleph0
Learn more at: http://aleph0.clarku.edu

Transcript and Presenter's Notes

1
Phylogenetic Trees: Lecture 2
Based on Durbin et al. 7.4; Gusfield 17
2
Character-based methods for constructing
phylogenies
  • In this approach, trees are constructed by
    comparing the characters of the corresponding
    species.
  • Characters may be morphological (tooth
    structures) or molecular (homologous DNA
    sequences). One common approach is Maximum
    Parsimony.
  • Assumptions:
  • Independence of characters (no interactions)
  • The best tree is the one that requires the fewest changes

3
1. Maximum Parsimony
Input: four nucleotide sequences AAG, AAA, GGA,
AGA taken from four species. Question: which
evolutionary tree best explains these sequences?
4
Example Continued
Many trees are possible. For example:
[Figure: two candidate trees for the four sequences]
The left tree is preferred over the right tree.
The total number of changes is called the
parsimony score.
5
Simple Example
  • Suppose we have five species, such that three
    have C and two T at a specified position
  • Minimal tree has one evolutionary change

[Figure: tree with leaves C, T, C, T, C and a single change between T and C marked on one edge]
6
Extension to Many Letters
  • What is the parsimony score of

A: CAGGTA
B: CAGACA
C: CGGGTA
D: TGCACT
E: TGCGTA
We score it character by character; each score is
computed independently of the others.
7
Fitch's Algorithm for Evaluating Trees
  • Assume that a tree is given.
  • Traverse tree from leaves to root determining
    set of possible states (e.g. nucleotides) for
    each internal node
  • Traverse tree from root to leaves picking
    ancestral states for internal nodes

8
Fitch's Algorithm Step 1
  • # of changes = # of union operations

9
Fitch's Algorithm Step 1
  • Do a post-order (from leaves to root) traversal
    of tree
  • Determine possible states $R_i$ of internal node
    $i$ with children $j$ and $k$:
    $R_i = R_j \cap R_k$ if $R_j \cap R_k \neq \emptyset$, otherwise $R_i = R_j \cup R_k$

10
Fitch's Algorithm Step 2
11
Fitch's Algorithm Step 2
  • Do a pre-order (from root to leaves) traversal
    of tree
  • Select state $r_j$ of internal node $j$ with
    parent $i$:
    $r_j = r_i$ if $r_i \in R_j$, otherwise $r_j$ is an arbitrary state in $R_j$
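
The two passes can be sketched in Python for a single character. This is a minimal illustration, not the lecture's own code: the tree encoding (nested tuples of leaf names), the function names, and the tie-breaking rule (take the alphabetically smallest state) are all assumptions.

```python
def fitch_up(node, leaf_state):
    """Post-order pass: return (annotated node, number of changes)."""
    if not isinstance(node, tuple):                  # leaf
        return {"name": node, "R": {leaf_state[node]}, "kids": []}, 0
    left, left_changes = fitch_up(node[0], leaf_state)
    right, right_changes = fitch_up(node[1], leaf_state)
    states = left["R"] & right["R"]
    changes = left_changes + right_changes
    if not states:                                   # empty intersection: take the union
        states = left["R"] | right["R"]
        changes += 1                                 # one union operation = one change
    return {"name": None, "R": states, "kids": [left, right]}, changes


def fitch_down(info, parent_state=None):
    """Pre-order pass: keep the parent's state if allowed, else pick one."""
    if parent_state in info["R"]:
        info["state"] = parent_state
    else:
        info["state"] = min(info["R"])               # arbitrary but deterministic choice
    for kid in info["kids"]:
        fitch_down(kid, info["state"])


def count_changes(info):
    """Count edges whose endpoints were assigned different states."""
    return sum((kid["state"] != info["state"]) + count_changes(kid)
               for kid in info["kids"])


# Hypothetical five-species example: three species carry C, two carry T.
tree = (("s1", ("s2", "s3")), ("s4", "s5"))
leaf_state = {"s1": "C", "s2": "C", "s3": "C", "s4": "T", "s5": "T"}
root, changes = fitch_up(tree, leaf_state)
fitch_down(root)
print(changes, root["state"], count_changes(root))
```

On this tree the bottom-up pass performs one union, and the states chosen in the top-down pass differ across exactly one edge, so both counts agree.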

12
Weighted Version of Fitch's Algorithm
  • Instead of assuming all state changes are
    equally likely, use different costs c(a, b)
    for different changes
  • 1st step of algorithm is to propagate costs up
    through tree

13
Weighted Version of Fitch's Algorithm
  • Want to determine the minimal cost $S(i, a)$
    of assigning character $a$ to node $i$
  • For leaf nodes $i$:
    $S(i, a) = 0$ if $a$ is the character at leaf $i$, otherwise $S(i, a) = \infty$

14
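Weighted Version of Fitch's Algorithm
  • Want to determine the minimal cost $S(i, a)$
    of assigning character $a$ to node $i$
  • For internal nodes $i$ with children $j$ and $k$:
    $S(i, a) = \min_b \big(S(j, b) + c(a, b)\big) + \min_b \big(S(k, b) + c(a, b)\big)$

[Figure: node i with character a above its children j and k, each with character b]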
15
Weighted Version of Fitch's Algorithm Step 2
  • Do a pre-order (from root to leaves) traversal
    of tree
  • Select minimal cost character for root
  • For each internal node j, select character
    that produced minimal cost at parent i

16
Weighted Parsimony Scores
  • Weighted Parsimony score
  • Each change is weighted by a score c(a, b).
  • The weighted parsimony score reduces to the
    parsimony score when $c(a, a) = 0$ and $c(a, b) = 1$ for
    all $b \neq a$.

17
Evaluating Weighted Parsimony Scores
  • Each position is independent and computed by
    itself.
  • Use Dynamic Programming on a given tree.
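  • If $i$ is a node with children $j$ and $k$, then
    $S(i, a) = \min_x \big(S(j, x) + c(a, x)\big) + \min_y \big(S(k, y) + c(a, y)\big)$

$S(i, a)$ is the minimum score of the subtree rooted at $i$
when $i$ has character $a$.
[Figure: node with score S(i,a) above children with scores S(j,x) and S(k,y)]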
18
Evaluating Parsimony Scores
  • Dynamic programming on a given tree
  • Initialization
  • For each leaf $i$, set $S(i, a) = 0$ if $i$ is labeled
    by $a$, otherwise $S(i, a) = \infty$
  • Iteration
  • If $i$ is a node with children $j$ and $k$, then
  • $S(i, a) = \min_x \big(S(j, x) + c(a, x)\big) + \min_y \big(S(k, y) + c(a, y)\big)$
  • Termination
  • The cost of the tree is $\min_x S(r, x)$, where $r$ is the root

Comment: To reconstruct an optimal assignment,
we need to keep in each node $i$ and for each
character $a$ the two characters $x$, $y$ that
bring about the minimum when $i$ has character $a$.
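
The recurrence can be sketched in Python for one character. The nested-tuple tree encoding, the function names, and the transition/transversion weights in `ts_tv_cost` are assumptions chosen for illustration; the sketch returns only the scores and does not store the minimizing pairs needed for reconstruction.

```python
import math

ALPHABET = "ACGT"


def sankoff(node, leaf_state, cost):
    """Return {a: S(node, a)} for a nested-tuple tree and one character."""
    if not isinstance(node, tuple):                  # initialization at a leaf
        return {a: 0 if a == leaf_state[node] else math.inf for a in ALPHABET}
    s_j = sankoff(node[0], leaf_state, cost)
    s_k = sankoff(node[1], leaf_state, cost)
    return {a: min(s_j[x] + cost(a, x) for x in ALPHABET)
               + min(s_k[y] + cost(a, y) for y in ALPHABET)
            for a in ALPHABET}


def unit_cost(a, b):
    """c(a, a) = 0 and c(a, b) = 1: ordinary parsimony."""
    return 0 if a == b else 1


def ts_tv_cost(a, b):
    """Hypothetical weights: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    purines = {"A", "G"}
    return 1 if (a in purines) == (b in purines) else 2


# Hypothetical four-leaf tree with a different nucleotide at every leaf.
tree = (("w", "x"), ("y", "z"))
leaf_state = {"w": "A", "x": "G", "y": "C", "z": "T"}
unit_score = min(sankoff(tree, leaf_state, unit_cost).values())
weighted_score = min(sankoff(tree, leaf_state, ts_tv_cost).values())
print(unit_score, weighted_score)
```

With unit costs the result equals the ordinary parsimony score (3 changes here); with the hypothetical weights the cheapest explanation costs 4.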
19
Cost of Evaluating Parsimony for binary trees
  • If there are n nodes, m characters, and k
    possible values for each character, then the
    complexity is $O(nmk^2)$.
  • Of course, we still need to search over ALL
    possible trees and find the best one. One usually
    resorts to heuristic search techniques.

20
Exploring the Space of Trees
  • We've considered how to find the minimum number
    of changes for a given tree topology
  • Need some search procedure for exploring the
    space of tree topologies
  • Given $n$ sequences there are $(2n-3)!! = 1 \cdot 3 \cdot 5 \cdots (2n-3)$
    possible rooted trees


21
Counting Trees
$n = 3$: one unrooted tree
$n = 4$: 3 unrooted trees
A rooted tree with $n$ leaves has $(2n-1)$ nodes and
$(2n-2)$ edges; discounting the edge to the root,
an unrooted tree has $(2n-3)$ edges. Each
additional leaf adds two edges, and the $n$-th leaf can be
attached on any of the $(2n-5)$ edges of a tree with $n-1$ leaves. Therefore
we have $1 \cdot 3 \cdot 5 \cdots (2n-5)$ unrooted trees
with $n$ leaves. Each such tree has $(2n-3)$
edges, any of which can be chosen as the position of the root of a
rooted tree. Hence we have $1 \cdot 3 \cdot 5 \cdots (2n-5) \cdot (2n-3)$
rooted trees with $n$ leaves.
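
To get a feel for how fast these products grow, they can be evaluated directly. A small Python sketch (the function names are illustrative, not from the lecture):

```python
def num_unrooted_trees(n):
    """1 * 3 * 5 * ... * (2n - 5): unrooted binary trees on n >= 3 leaves."""
    count = 1
    for leaves in range(4, n + 1):
        count *= 2 * leaves - 5      # number of edges where the new leaf can attach
    return count


def num_rooted_trees(n):
    """Each unrooted tree can be rooted on any of its (2n - 3) edges."""
    return num_unrooted_trees(n) * (2 * n - 3)


for n in (3, 4, 5, 10):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

For 10 leaves this already gives 2,027,025 unrooted and 34,459,425 rooted trees.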
22
Exploring the Space of Trees

23
Maximum Parsimony
            1 2 3 4 5 6 7 8 9 10
Species 1 - A G G G T A A C T G
Species 2 - A C G A T T A T T A
Species 3 - A T A A T T G T C T
Species 4 - A A T G T T G T C G
How many possible unrooted trees?
24
Maximum Parsimony
How many possible unrooted trees?
25
Maximum Parsimony
How many substitutions?
MP
26
Maximum Parsimony
    1 2 3 4 5 6 7 8 9 10
1 - A G G G T A A C T G
2 - A C G A T T A T T A
3 - A T A A T T G T C T
4 - A A T G T T G T C G
27
Maximum Parsimony
28
Maximum Parsimony
Site 2: 1 - G, 2 - C, 3 - T, 4 - A
29
Maximum Parsimony
30
Maximum Parsimony
31
Maximum Parsimony
Site 4: 1 - G, 2 - A, 3 - A, 4 - G
[Figure: the three possible unrooted trees for this site, requiring 2, 2, and 1 changes]
32
Maximum Parsimony
33
Maximum Parsimony
                1 2 3 4 5 6 7 8 9 10
1             - A G G G T A A C T G
2             - A C G A T T A T T A
3             - A T A A T T G T C T
4             - A A T G T T G T C G

Substitutions   0 3 2 2 0 1 1 1 1 3    (total: 14)
34
Finding most parsimonious trees - exact solutions
  • Exact solutions can only be used for small
    numbers of taxa.
  • Exhaustive search examines all possible trees.
  • Typically used for problems with fewer than 10
    taxa.

35
Finding most parsimonious trees - exhaustive
search
(1) Starting tree, any 3 taxa (A, B, C)
(2a), (2b), (2c) Add fourth taxon (D) in each of three possible
positions: three trees
Add fifth taxon (E) in each of the five possible
positions on each of the three trees -> 15
trees, and so on
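
The stepwise construction can be sketched in Python. As an assumption of this sketch, an unrooted tree is drawn hanging from the first taxon, so it is stored as `(first_taxon, subtree)` with `subtree` a nested tuple; the function names are illustrative.

```python
def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)                        # attach on the edge above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def all_unrooted_trees(taxa):
    """Enumerate all unrooted binary trees, each drawn hanging from taxa[0]."""
    first, second, *rest = taxa
    subtrees = [second]                       # what hangs below the first taxon
    for leaf in rest:
        subtrees = [t for old in subtrees for t in insertions(old, leaf)]
    return [(first, t) for t in subtrees]


for taxa in (["A", "B", "C"], ["A", "B", "C", "D"], ["A", "B", "C", "D", "E"]):
    print(len(taxa), "taxa:", len(all_unrooted_trees(taxa)), "trees")
```

The counts 1, 3, 15 match the figure; a sixth taxon would give 105 trees, which is why exhaustive search stops being practical quickly.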
36
Finding most parsimonious trees - exact solutions
  • Branch and bound saves time by discarding
    families of trees during tree construction that
    cannot be smaller than the smallest tree found
    so far.
  • (Here smaller means a smaller score, i.e.,
    more parsimonious.)
  • Can be enhanced by specifying an initial upper
    bound for tree length (total # of changes on the
    tree), e.g., from a distance method.
  • Typically used only for problems with fewer than
    20 taxa.
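
A compact Python sketch of the idea follows. It relies on the fact that adding a taxon can never lower the parsimony score, so a partial tree that already scores worse than the best complete tree can be discarded together with all its extensions. The tree encoding (nested tuples hanging from the first taxon), the function names, and the taxon labels `s1` to `s4` are assumptions; the sequences are the four from the Maximum Parsimony example.

```python
def fitch_score(tree, seqs):
    """Parsimony score of a nested-tuple tree whose leaves are keys of `seqs`."""
    total = 0
    for site in range(len(next(iter(seqs.values())))):
        def states(node):
            nonlocal total
            if not isinstance(node, tuple):
                return {seqs[node][site]}
            left, right = states(node[0]), states(node[1])
            if left & right:
                return left & right
            total += 1                        # union operation: one change
            return left | right
        states(tree)
    return total


def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        for new_left in insertions(tree[0], leaf):
            yield (new_left, tree[1])
        for new_right in insertions(tree[1], leaf):
            yield (tree[0], new_right)


def branch_and_bound(seqs):
    """Return (best score, list of all most parsimonious trees)."""
    taxa = sorted(seqs)
    best = {"score": float("inf"), "trees": []}

    def extend(subtree, remaining):
        tree = (taxa[0], subtree)
        score = fitch_score(tree, seqs)
        if score > best["score"]:
            return                            # bound: no extension can do better
        if not remaining:
            if score < best["score"]:
                best["score"], best["trees"] = score, []
            best["trees"].append(tree)
            return
        for bigger in insertions(subtree, remaining[0]):
            extend(bigger, remaining[1:])

    extend(taxa[1], taxa[2:])
    return best["score"], best["trees"]


seqs = {"s1": "AAG", "s2": "AAA", "s3": "GGA", "s4": "AGA"}
print(branch_and_bound(seqs))
```

With only four taxa nothing is pruned before the complete trees are reached; the benefit appears on larger inputs, and a good initial upper bound (instead of infinity) makes the pruning start earlier.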

37
Finding most parsimonious trees - branch and bound
[Figure: branch-and-bound search tree. The starting 3-taxon tree (A, B, C) leads to the three 4-taxon trees B1, B2, B3 (taxon D added); each of these leads to five 5-taxon trees (taxon E added), labeled C1.1-C1.5, C2.1-C2.5, and C3.1-C3.5]
38
Finding most parsimonious trees - heuristics
  • The number of possible trees increases
    exponentially with the number of taxa, making
    exhaustive searches impractical for many data
    sets (an NP-complete problem)
  • Heuristic methods are used to search tree space
    for most parsimonious trees
  • The trees found are not guaranteed to be the most
    parsimonious - they are best guesses

39
Finding most parsimonious trees - heuristics
  • Stepwise addition
  • As-is - the order in the distance matrix
  • Closest - starts with the shortest 3-taxon tree and
    adds taxa in the order that produces the least
    increase in tree length
  • Simple - the first taxon in the matrix is taken
    as a reference; taxa are added to it in
    order of decreasing similarity to the
    reference
  • Random - taxa are added in a random sequence;
    many different sequences can be used
  • Recommended: random, with as many (e.g. 10-100)
    addition sequences as practical

40
Finding most parsimonious trees - heuristics
  • Branch Swapping
  • Nearest neighbor interchange (NNI)
  • Subtree pruning and regrafting (SPR)
  • Tree bisection and reconnection (TBR)

41
Finding most parsimonious trees - heuristics 1
  • Nearest neighbor interchange (NNI)

[Figure: a tree on taxa A-G and the two trees obtained from it by a nearest neighbor interchange around one internal branch]
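
A minimal Python sketch of generating all NNI neighbors. It assumes the unrooted tree is drawn hanging from one reference taxon and that `tree` is the nested-tuple subtree below that taxon; every internal branch then yields two rearrangements, for 2(n-3) neighbors of an unrooted tree with n taxa. The names and the example topology are illustrative.

```python
def nni_neighbors(tree):
    """Yield every tree one nearest neighbor interchange away from `tree`."""
    if not isinstance(tree, tuple):
        return
    left, right = tree
    if isinstance(left, tuple):               # internal branch above `left`
        a, b = left
        yield ((right, b), a)                 # swap subtree a with the sibling
        yield ((a, right), b)                 # swap subtree b with the sibling
    if isinstance(right, tuple):              # internal branch above `right`
        a, b = right
        yield (a, (left, b))
        yield (b, (a, left))
    for new_left in nni_neighbors(left):      # interchanges deeper in the tree
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


def leaves(tree):
    """Set of leaf names in a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


# Hypothetical 7-taxon tree: taxon A is the reference, the rest hangs below it.
below_a = (("B", "C"), (("D", "E"), ("F", "G")))
neighbors = list(nni_neighbors(below_a))
print(len(neighbors))
```

A heuristic search would score each neighbor, move to the best one if it improves the tree length, and repeat until no neighbor is shorter.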
42
Finding most parsimonious trees - heuristics 2
  • Subtree pruning and regrafting (SPR)

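[Figure: subtree pruning and regrafting on a tree with taxa A-G; a subtree is pruned and regrafted at another position on the remaining tree]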
43
Finding most parsimonious trees - heuristics 3
  • Tree bisection and reconnection (TBR)

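[Figure: tree bisection and reconnection on a tree with taxa A-G; the tree is bisected into two subtrees, which are reconnected along a different pair of branches]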
44
Finding most parsimonious trees - heuristics -
summary
  • Branch Swapping
  • Nearest neighbor interchange (NNI)
  • Subtree pruning and regrafting (SPR)
  • Tree bisection and reconnection (TBR)
  • The nature of heuristic searches means we cannot
    know which method will find the most parsimonious
    trees or all such trees.
  • However, TBR is the most extensive swapping
    routine and its use with multiple random addition
    sequences should work well.

45
Tree space may be populated by local minima and
islands of most parsimonious trees
[Figure: tree length across tree space for three random addition sequence replicates; branch swapping from one starting tree reaches the global minimum (success), while branch swapping from the other two ends in local minima (failure)]
46
Multiple most parsimonious trees
  • Many parsimony analyses yield multiple equally
    optimal trees
  • Multiple trees are due to either:
  • Alternative equally parsimonious optimizations of
    homoplastic characters
  • Missing data
  • Or both
  • We can further select among these trees with
    additional criteria, but most commonly the
    relationships common to all the optimal trees
    are summarized with consensus trees

47
Consensus methods
  • A consensus tree is a summary of the agreement
    among a set of fundamental trees
  • There are many different consensus methods that
    differ in
  • 1. the kind of agreement
  • 2. the level of agreement
  • Consensus methods can be used with any type of
    tree - not just parsimony

48
Strict consensus methods
  • Strict consensus methods require agreement across
    all the fundamental trees
  • They show only those relationships that are
    unambiguously supported by the parsimonious
    interpretation of the data
  • The commonest method (strict component consensus)
    focuses on clades
  • This method produces a consensus tree that
    includes all and only those clades found in all
    the fundamental trees
  • Other relationships (those in which the
    fundamental trees disagree) are shown as
    unresolved polytomies
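
The clade-based rule can be sketched in Python: collect the clades of every fundamental tree and keep those found in all of them. Trees are assumed to be rooted and written as nested tuples; the two example trees are hypothetical, and the sketch returns the set of consensus clades rather than drawing the tree.

```python
def clades(tree):
    """Return the clades of a rooted tree as a set of frozensets of leaf names."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def strict_consensus_clades(trees):
    """Clades present in every fundamental tree."""
    return set.intersection(*(clades(t) for t in trees))


# Two hypothetical fundamental trees on taxa A-E.
tree1 = ((("A", "B"), "C"), ("D", "E"))
tree2 = ((("A", "B"), "D"), ("C", "E"))
consensus = strict_consensus_clades([tree1, tree2])
print(sorted("".join(sorted(c)) for c in consensus))
```

Only the clade (A, B) and the full taxon set survive, so C, D, and E would be drawn as a polytomy next to (A, B).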

49
Strict consensus methods
[Figure: two fundamental trees on taxa A-G and their strict component consensus tree]
50
Majority-rule consensus methods
  • Majority-rule consensus methods require agreement
    across a majority of the fundamental trees
  • May include relationships that are not supported
    by the most parsimonious interpretation of the
    data
  • The commonest method focuses on clades
  • This method produces a consensus tree that
    includes all and only those clades found in a
    majority (> 50%) of the fundamental trees
  • Other relationships are shown as unresolved
    polytomies
  • Of particular use in bootstrapping
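
A Python sketch of the majority rule on clades: count how often each clade occurs and keep those found in more than half of the fundamental trees, together with their frequencies. The nested-tuple encoding and the three example trees are assumptions made for illustration.

```python
from collections import Counter


def clades(tree):
    """Return the clades of a rooted tree as a set of frozensets of leaf names."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def majority_rule_clades(trees):
    """Map each clade found in more than half of the trees to its frequency."""
    counts = Counter(clade for tree in trees for clade in clades(tree))
    return {clade: n / len(trees)
            for clade, n in counts.items() if n > len(trees) / 2}


# Three hypothetical fundamental trees on taxa A-E.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
    (("A", "B"), ("C", ("D", "E"))),
]
frequencies = majority_rule_clades(trees)
for clade, freq in sorted(frequencies.items(), key=lambda item: sorted(item[0])):
    print("".join(sorted(clade)), round(100 * freq))
```

Here (A, B) occurs in all three trees and (D, E) in two of three, so (D, E) is kept even though one fundamental tree contradicts it.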

51
Majority rule consensus
[Figure: three fundamental trees on taxa A-G and their majority-rule component consensus tree, with clade frequencies of 66 and 100]
Numbers indicate frequency of clades in the
fundamental trees
52
Reduced consensus methods
  • Focuses upon any cladistic relationships
    (statements that some taxa are more closely
    related to each other than to some other taxa)
  • Reduced consensus methods occur in strict and
    majority-rule varieties
  • Other relationships are shown as unresolved
    polytomies
  • May be more sensitive than methods focusing only
    on clades

53
Reduced consensus methods
[Figure: two fundamental trees on taxa A-G and the strict reduced cladistic consensus tree on taxa A-F]
Strict component consensus: completely unresolved
Strict reduced cladistic consensus tree: taxon G is excluded
54
Consensus methods - 2
[Figure: three fundamental trees for Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tetrahymena, Spirostomum, Tracheloraphis, Euplotes, and Gruberia, shown with their strict (component) consensus, strict reduced cladistic consensus (Euplotes excluded), and majority-rule consensus (clade frequencies of 100 and 66)]
55
Consensus methods
  • Use strict methods to identify those
    relationships unambiguously supported by
    parsimonious interpretation of the data
  • Use reduced methods where consensus trees are
    poorly resolved
  • Use majority-rule methods in bootstrapping
  • Avoid other methods which have ambiguous
    interpretations

56
Parsimony - advantages
  • A simple method - easily understood operation
  • Does not seem to depend on an explicit model of
    evolution
  • Gives both trees and associated hypotheses of
    character evolution
  • Should give reliable results if the data is well
    structured and homoplasy is either rare or
    randomly distributed on the tree

57
(No Transcript)
58
Parsimony - disadvantages
  • May give misleading results if homoplasy is
    common or concentrated in particular parts of the
    tree, e.g.:
  • thermophilic convergence
  • base composition biases
  • long branch attraction
  • Underestimates branch lengths
  • Model of evolution is implicit - behaviour of
    method not well understood
  • Parsimony often justified on purely philosophical
    grounds - we must prefer simplest hypotheses -
    particularly by morphologists
  • For most molecular systematists this is
    uncompelling

59
Parsimony can be inconsistent
  • Felsenstein (1978) developed a simple model
    phylogeny including four taxa and a mixture of
    short and long branches
  • Under this model parsimony will give the wrong
    tree

[Figure: model tree on taxa A-D with two long branches (rate or branch length p) and three short branches (q), p >> q, next to the wrong tree recovered by parsimony]
Long branches are attracted but the similarity
is homoplastic
  • With more data the certainty that parsimony will
    give the wrong tree increases - so that parsimony
    is statistically inconsistent.
  • Advocates of parsimony initially responded by
    claiming that Felsenstein's result showed only
    that his model was unrealistic.
  • It is now recognized that long-branch
    attraction (the Felsenstein zone) is one of the
    most serious problems in phylogenetic inference.

60
2. Perfect Phylogeny
  • Data on species is given by a Character State
    Matrix.
  • Cell (p, i) has value j iff character i of object
    (species) p has state j .
  • Goal: construct an evolutionary tree for the species.

61
Motivation: Evolutionary Tree
Internal nodes correspond to speciation events,
where some character (attribute) is
acquired.
Assumptions:
1. No reversals (characters are not lost)
2. No convergences (a character is created only once)
62
(No Transcript)
63
Perfect Phylogeny for a 0-1 Matrix
  • A 0-1 matrix: each character is either 0 (does
    not exist) or 1 (exists).
  • Each of the n objects labels exactly one leaf of T
  • Each of the m characters labels exactly one edge
    of T
  • Object p has exactly the characters labeling the
    path from p to the root.
  • A perfect phylogeny for the matrix: a tree with no
    convergence and no reversals.

[Figure: example tree T with edges labeled by characters 1-5 and leaves labeled by objects A-E]
64
The (Binary) Perfect Phylogeny Problem
  • Problem: Given a 0-1 matrix M, determine if it
    has a perfect phylogeny, and construct one if it
    does.
  • (Note: edges are labeled by characters; an edge
    labeled i represents character i changing
    state from 0 to 1.)

65
Solution to Perfect Phylogeny Problem
  • Definition: Given a 0-1 matrix M, $O_k = \{ j : M_{jk} = 1 \}$,
    i.e., $O_k$ is the set of objects that have
    character k.
  • Theorem: M has a perfect phylogenetic tree iff
    the sets $O_i$ are laminar, i.e., for all i, j,
    either $O_i$ and $O_j$ are disjoint, or one includes
    the other.

[Figure: a laminar and a non-laminar family of sets]
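
The theorem gives a direct, if quadratic, test, sketched here in Python. The function names are illustrative, and the small matrix (objects A-E as rows, characters 1-5 as columns) is an assumed example rather than data taken verbatim from the slides.

```python
from itertools import combinations


def character_sets(matrix, objects):
    """O_k = set of objects that have character k (a 1 in column k)."""
    n_chars = len(matrix[0])
    return [{obj for obj, row in zip(objects, matrix) if row[k]}
            for k in range(n_chars)]


def is_laminar(sets):
    """True iff every pair of sets is disjoint or nested."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(sets, 2))


objects = ["A", "B", "C", "D", "E"]
matrix = [
    [1, 1, 0, 0, 0],   # A
    [0, 0, 1, 0, 0],   # B
    [1, 1, 0, 0, 1],   # C
    [0, 0, 1, 1, 0],   # D
    [0, 1, 0, 0, 0],   # E
]
print(is_laminar(character_sets(matrix, objects)))

# Giving E characters 1 and 3 makes O_1 and O_3 overlap without nesting.
broken = matrix[:4] + [[1, 0, 1, 0, 0]]
print(is_laminar(character_sets(broken, objects)))
```

This pairwise check compares all pairs of the m characters over n objects; the sorted-column procedure described later in the lecture reaches the same answer in O(mn) time.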
66
Proof
  • (⇒) Assume M has a perfect phylogeny, and let
    i, j be given.
  • Consider the edges labeled i and j.
  • Case 1: There is a root-to-leaf path containing
    both. Then one of $O_i$, $O_j$ includes the other (2 and 1
    below).
  • Case 2: Not case 1. Then they are disjoint (2 and
    3 below).

[Figure: example tree with edges labeled by characters 1-5 and leaves labeled by objects A-E]
67
Proof (cont.)
  • (⇐) Assume that for all i, j, either $O_i$ and $O_j$ are
    disjoint, or one includes the other. We prove by
    induction on the number of characters that M
    has a perfect phylogeny.
  • Basis: one character. Then there are at most two
    objects, one with and one without this character.

68
Proof (cont.)
  • Induction step: Assume correctness for n-1
    characters, and consider a matrix with n
    characters (non-zero columns).
  • WLOG assume that $O_1$ is not contained in $O_j$ for
    j > 1.
  • Let S1 be the set of objects that have character
    1, and S2 be the remaining objects. Then each
    character belongs to objects in S1 or S2, but not
    both. By inductive hypothesis there are trees T1
    and T2 for S1 and S2. Combining them as below
    gives the desired tree.

69
Efficient Implementation
  • 1. Sort the columns by decreasing value when
    considered as binary numbers. (Time complexity
    O(mn), using radix sort.)
  • Claim: If the binary value of column i is larger
    than that of column j, then $O_i$ is not a proper
    subset of $O_j$.
  • Proof: $O_i - O_j > 0$ (as binary numbers) means the 1s in $O_i$ are not
    covered by the 1s in $O_j$.

70
Efficient Implementation (2)
  • 2. Make a backwards linked list of the 1s in
    each row (leftmost 1 in each row points at
    itself). Time complexity O(mn).

     2  1  3  5  4
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  1  0
D    0  0  1  0  1
E    1  0  0  0  0
Claim: If the columns are sorted, then the set of
columns is laminar iff for each column i, all
the links leaving column i point at the same
column. Can be checked in O(mn) time.
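
Steps 1 and 2 and the claim can be sketched together in Python. Two departures from the slides are assumptions of this sketch: Python's built-in comparison sort stands in for the radix sort (so the sort is not O(mn)), and the links are checked on the fly instead of being stored in a linked list. The matrix is an illustrative example with objects A-E as rows and characters 1-5 as columns.

```python
def check_laminar_sorted(matrix):
    """Return (is_laminar, column order) using the backwards-link test."""
    n_cols = len(matrix[0])
    # Step 1: sort columns by decreasing binary value (top row = leading bit).
    order = sorted(range(n_cols),
                   key=lambda c: [row[c] for row in matrix], reverse=True)
    # Step 2: each 1 links back to the previous 1 in its row; the leftmost
    # 1 links to itself. All links leaving a column must agree.
    target = {}
    for row in matrix:
        previous = None
        for position, column in enumerate(order):
            if row[column]:
                link = position if previous is None else previous
                if target.setdefault(position, link) != link:
                    return False, order
                previous = position
    return True, order


matrix = [
    [1, 1, 0, 0, 0],   # A
    [0, 0, 1, 0, 0],   # B
    [1, 1, 0, 0, 1],   # C
    [0, 0, 1, 1, 0],   # D
    [0, 1, 0, 0, 0],   # E
]
ok, order = check_laminar_sorted(matrix)
print(ok, [c + 1 for c in order])          # characters in sorted order

not_laminar = [[1, 0], [1, 1], [0, 1]]     # the two columns overlap without nesting
print(check_laminar_sorted(not_laminar)[0])
```

For the example matrix the sorted character order is 2, 1, 3, 5, 4 and every column's links agree, so the matrix has a perfect phylogeny.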
71
Examples
laminar:
     2  1  3  5  4
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  1  0
D    0  0  1  0  1
E    1  0  0  0  0
72
Efficient Implementation (3)
  • 3. When the matrix is laminar, the tree edges
    corresponding to characters are defined by the
    backwards links in the matrix.

Remaining edges and leaves are determined by the
characters of each object. Need O(mn) time.
[Figure: the resulting tree, with edges labeled by characters 1-5 and leaves labeled by objects A-E]
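
Once the matrix is known to be laminar, the tree can be read off the sorted rows, as in this Python sketch. The output format (two dictionaries), the function name, and the use of a comparison sort in place of radix sort are assumptions; the matrix is the same illustrative example with objects A-E and characters 1-5.

```python
def build_perfect_phylogeny(matrix, objects, characters):
    """Build the tree for a matrix that is assumed to be laminar.

    Returns (parent, attach): parent[c] is the character on the edge directly
    above the edge labeled c (or "root"), and attach[p] is the lowest
    character on the path from the root to object p.
    """
    order = sorted(range(len(characters)),
                   key=lambda c: [row[c] for row in matrix], reverse=True)
    parent, attach = {}, {}
    for name, row in zip(objects, matrix):
        previous = "root"
        for c in order:
            if row[c]:
                parent[characters[c]] = previous   # the backwards link
                previous = characters[c]
        attach[name] = previous                    # leaf hangs below its last 1
    return parent, attach


objects = ["A", "B", "C", "D", "E"]
characters = ["1", "2", "3", "4", "5"]
matrix = [
    [1, 1, 0, 0, 0],   # A
    [0, 0, 1, 0, 0],   # B
    [1, 1, 0, 0, 1],   # C
    [0, 0, 1, 1, 0],   # D
    [0, 1, 0, 0, 0],   # E
]
parent, attach = build_perfect_phylogeny(matrix, objects, characters)
print(parent)
print(attach)
```

Reading the result: edges 2 and 3 leave the root, edge 1 lies below 2, edge 5 below 1, and edge 4 below 3; object C, for instance, hangs below edge 5, so its path to the root carries exactly characters 5, 1, and 2.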