Title: Phylogenetic inference
1Phylogenetic inference
2Phylogenetic inference
- The question is: how do we find the best tree?
- Many methods using many techniques, and many software packages
- New, improved methods and tests appear in the literature constantly
- WHY SO MANY?
- For molecular data, the trend is towards methods based on explicit models with realistic assumptions
3Classification of phylogenetic methods
4Discrete data and Distances
5Algorithms versus optimality criteria
- Phylogenetic inference is an estimation procedure (we seek the best estimate)
- We only have information about the contemporary molecules (and organisms)
- How do we choose a tree from the set of all possible trees?
- Two basic approaches
- Algorithmic: just follow a sequence of steps
- Optimality criterion: how do we compare trees?
6Algorithmic methods
- Combine tree inference and the definition of a preferred tree into a single statement
- Include UPGMA and all forms of pair-group cluster analysis, and neighbor joining
- Computationally fast because they go straight to the final solution
- The task of finding an optimal tree cannot be separated from that of evaluating a specific tree
7Optimality criteria
- Two logical steps
- Define an optimality criterion (an objective function for evaluating trees)
- Find the tree(s) with the best value of the objective function (may use algorithms)
- Evolutionary assumptions made in the first step are decoupled from the computations involved in the second step
- The price of this logical clarity is that these methods can be very slow
8Use of algorithms
- Algorithms are used differently in the two approaches
- In purely algorithmic methods, the algorithm defines the tree selection criterion and is fundamental
- In criterion-based methods, algorithms are merely tools used in evaluating and searching for optimal trees
- Reliability of the tree?
9Optimality criteria
- Parsimony: select the tree that minimizes the total tree length (the number of steps or character transformations required to explain a given set of data)
- Methods based on models of evolutionary change: assumptions are made explicit.
- Is parsimony's model-free nature an advantage or a disadvantage?
- What are the assumptions of parsimony (consistency?)
10Optimality methods (cont.)
- Maximum likelihood: evaluates the probability that a proposed model of evolution and the hypothesized history could give rise to the observed data (attempts to estimate the actual amount of change or history)
- Usually more consistent; estimates tend to have lower variance than those of other methods; often robust to violations of assumptions
11Optimality criteria (cont.)
- Pairwise distance methods: also reduce the effect of multiple hits when an appropriate model is used to estimate the true evolutionary distance between two sequences (less desirable than full ML)
- Additive and ultrametric distances can be fitted to a tree such that every pairwise distance equals the sum of the branch lengths along the path connecting the two sequences in the tree
12TREES AND DISTANCES
- Measures of sequence similarity are used to estimate the evolutionary changes that occurred between 2 sequences
- These measures quantify the evolutionary distance between the 2 sequences
- Trees can represent these distances.
- This has motivated a range of tree-building techniques that convert pairwise distances into evolutionary trees.
13A distance measure can be used to build
phylogenies IF it satisfies some basic
requirements...
- It must be a metric
- It must be additive
141. Metric Distances
- Let d(a,b) be the distance between two sequences a and b
- A distance d is a metric if it satisfies 4 conditions
- d(a,b) ≥ 0 (non-negativity)
- d(a,b) = d(b,a) (symmetry)
- d(a,c) ≤ d(a,b) + d(b,c) (triangle inequality)
- d(a,b) = 0 if and only if a = b (distinctness)
15Triangle inequality: d(a,c) ≤ d(a,b) + d(b,c)
The distance between any pair of sequences must be no greater than the sum of the distances from each of them to a third sequence
16Of these 4 conditions...
1. non-negativity
2. symmetry
3. triangle inequality
4. distinctness
- Conditions 1, 2 and 4 generally hold for most measures of sequence dissimilarity calculated directly from sequences.
- Indirect measures of sequence dissimilarity, such as DNA-DNA hybridization and immunological distance, often fail condition 2 (symmetry)
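The four conditions can be checked mechanically on a distance matrix. A minimal Python sketch, assuming the distances are stored as a square list of lists whose rows correspond to distinct sequences; the function name `is_metric` and both example matrices are hypothetical.

```python
from itertools import permutations

def is_metric(d, tol=1e-9):
    """Check the four metric conditions on a square distance matrix."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:                    # d(a,a) must be 0
            return False
        for j in range(n):
            if d[i][j] < -tol:                    # 1. non-negativity
                return False
            if abs(d[i][j] - d[j][i]) > tol:      # 2. symmetry
                return False
            if i != j and d[i][j] <= tol:         # 4. distinctness (rows are distinct sequences)
                return False
    for a, b, c in permutations(range(n), 3):     # 3. triangle inequality
        if d[a][c] > d[a][b] + d[b][c] + tol:
            return False
    return True

# Hypothetical distances among three sequences a, b, c
good = [[0, 3, 5],
        [3, 0, 4],
        [5, 4, 0]]
bad = [[0, 1, 9],      # d(a,c) = 9 exceeds d(a,b) + d(b,c) = 3
       [1, 0, 2],
       [9, 2, 0]]
print(is_metric(good), is_metric(bad))
```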
17A distance is ultrametric if it satisfies the
additional three-point condition
- d(a,b) ≤ max[d(a,c), d(b,c)]
- This implies that 2 of the 3 pairwise distances
between 3 taxa are equal, and at least as large
as the third, defining an isosceles triangle.
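The isosceles-triangle reading of the three-point condition suggests a direct test: for every triple of taxa, the two largest pairwise distances must be equal. A small Python sketch under that reading, assuming a symmetric matrix; `is_ultrametric` and the two example matrices are hypothetical.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for a, b, c in combinations(range(len(d)), 3):
        smallest, middle, largest = sorted([d[a][b], d[a][c], d[b][c]])
        if abs(largest - middle) > tol:
            return False
    return True

# Hypothetical matrices for three taxa
clock_like = [[0, 2, 6],
              [2, 0, 6],
              [6, 6, 0]]
not_clock_like = [[0, 2, 6],
                  [2, 0, 5],
                  [6, 5, 0]]
print(is_ultrametric(clock_like), is_ultrametric(not_clock_like))
```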
18Ultrametric distances have the very useful property of implying a constant rate of evolution
- In fact, we have a test for the molecular clock called the Relative Rate Test, which is simply a test of how far the pairwise distances between 3 sequences depart from ultrametricity.
- If distances between sequences are ultrametric, then the most similar sequences are also the most closely related
192. Additive Distances
- Being a metric is necessary but not sufficient to ensure that a measure of change is valid and fits a tree exactly (ultrametric distances do fit a tree, but ultrametricity is not required).
- It must also satisfy the four-point condition
- d(a,b) + d(c,d) ≤ max[d(a,c) + d(b,d), d(a,d) + d(b,c)]
- This is equivalent to requiring that, of the three sums d(a,b) + d(c,d), d(a,c) + d(b,d) and d(a,d) + d(b,c), the two largest are equal.
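20d(a,b) + d(c,d), d(a,c) + d(b,d), d(a,d) + d(b,c)
of the three sums (A and C join at internal node e, B and D join at internal node f)...
d(a,b) + d(c,d) = (Ae + ef + fB) + (Ce + ef + fD)
d(a,c) + d(b,d) = (Ae + eC) + (Bf + fD)
d(a,d) + d(b,c) = (Ae + ef + fD) + (Ce + ef + fB)
...the two largest are equal.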
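The four-point condition can be tested the same way on every quartet of taxa. A Python sketch, assuming a symmetric matrix; the function name and the branch lengths used to build the example (Ae = 1, Ce = 2, ef = 3, fB = 4, fD = 5) are hypothetical.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for a, b, c, e in combinations(range(len(d)), 4):
        sums = sorted([d[a][b] + d[c][e],
                       d[a][c] + d[b][e],
                       d[a][e] + d[b][c]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Path-length distances on the tree ((A,C),(B,D)), taxa in the order A, B, C, D
tree_like = [[0, 8, 3, 9],
             [8, 0, 9, 9],
             [3, 9, 0, 10],
             [9, 9, 10, 0]]
# Same matrix with d(C,D) inflated so that it no longer fits any tree
not_tree_like = [[0, 8, 3, 9],
                 [8, 0, 9, 9],
                 [3, 9, 0, 12],
                 [9, 9, 12, 0]]
print(is_additive(tree_like), is_additive(not_tree_like))
```

In the first matrix the three sums are 18, 12 and 18: the two largest are equal and exceed the third by twice the interior branch ef.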
21An additive distance measure defines a tree...
- Sequence d is equidistant from all other sequences
- Sequence c is equidistant from a and b
- For any 2 sequences, the value in the distance matrix corresponds to the sum of the branch lengths along the path between the 2 sequences on the tree.
- The presented tree is ultrametric (draw the isosceles triangle)
22When distances are not ultrametric but still additive, they can be represented by a tree...
An additive tree
also represents additive distances exactly...
23- Notice that sequences b and c are the most similar (distance 3), but ARE NOT the most closely related
- Similarity and evolutionary relationship will only coincide exactly if the distances are ultrametric
- While this tree is additive, it is not ultrametric
24- Observed distances are obtained directly from the sequences themselves, and patristic distances from a tree
- For additive and ultrametric distances, the observed and tree distances match exactly
For real data this is rarely the case, indicating that observed distances cannot be represented completely accurately by a tree.
25The discrepancy between observed and tree distances can be used as an indicator of how well the observed distances fit a tree-like representation.
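One common way to quantify this discrepancy is a sum of squared differences between observed and tree (patristic) distances. A hedged Python sketch: the function name, the `power` parameter and both matrices are hypothetical; `power=2` corresponds to the weighting usually associated with the Fitch-Margoliash criterion.

```python
from itertools import combinations

def sum_of_squares(observed, patristic, power=0):
    """Sum over pairs i<j of (observed - tree)^2 / observed^power.
    power=0: unweighted least squares; power=2: Fitch-Margoliash-style weights."""
    return sum((observed[i][j] - patristic[i][j]) ** 2 / observed[i][j] ** power
               for i, j in combinations(range(len(observed)), 2))

# Hypothetical observed distances and the distances implied by some tree
observed = [[0, 8, 3.5],
            [8, 0, 9],
            [3.5, 9, 0]]
patristic = [[0, 8, 3],
             [8, 0, 9],
             [3, 9, 0]]
print(sum_of_squares(observed, patristic))
print(sum_of_squares(observed, patristic, power=2))
```

A value of zero means the observed distances fit the tree exactly; larger values mean a poorer fit.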
26Distance methods
- Experimentally derived distances are estimates of true distances
- We want to fit them to a mathematical model (an additive tree) and find the optimal values of the adjustable parameters
- Branching pattern
- Branch lengths
- Some methods based on this optimality criterion are Fitch-Margoliash and minimum evolution
- Other methods fit trees to distances algorithmically (NJ, UPGMA)
27UPGMA: an algorithmic method
- Cluster analysis: unweighted pair-group method using arithmetic averages (Sneath and Sokal 1973)
- Assumes ultrametricity
28UPGMA example
1. Given a matrix of pairwise distances, find the clusters (taxa) i and j such that dij is the minimum value in the table
2. Define the depth of the branching between i and j (lij) to be dij/2
3. If i and j were the last two clusters, the tree is complete. Otherwise, create a new cluster called u
4. Define the distance from u to each other cluster k (with k ≠ i, j) to be the average of the distances dki and dkj, weighted by the number of taxa in i and in j
5. Go back to step 1 with one less cluster: clusters i and j have been eliminated, and cluster u has been added
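The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name, the tuple representation `(left, right, depth)` and the example matrix are hypothetical, and the distance to the new cluster is the size-weighted average used in the usual definition of UPGMA.

```python
from itertools import combinations

def upgma(names, matrix):
    """Return a nested tuple (left, right, depth) built by UPGMA clustering."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}   # id -> (subtree, size)
    dist = {frozenset(p): matrix[p[0]][p[1]]
            for p in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Find the pair of clusters with the minimum distance
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda p: dist[frozenset(p)])
        dij = dist[frozenset((i, j))]
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # Distance from the new cluster u to every remaining cluster k
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                ni * dist[frozenset((i, k))] + nj * dist[frozenset((j, k))]
            ) / (ni + nj)
        # The new cluster u joins i and j at depth dij / 2
        clusters[next_id] = ((ti, tj, dij / 2), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical ultrametric distances among four taxa
names = ["A", "B", "C", "D"]
matrix = [[0, 2, 6, 10],
          [2, 0, 6, 10],
          [6, 6, 0, 10],
          [10, 10, 10, 0]]
tree = upgma(names, matrix)
print(tree)
```

With these distances A and B are joined first at depth 1, C joins them at depth 3, and D joins last at depth 5.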
29Distance Matrices and phenogram
30Classification of phylogenetic methods
31Parsimony methods
- The most widely used method; a familiar notion in science (simplicity)
- Shared attributes among taxa are inherited from common ancestors
- When character conflicts occur, ad hoc hypotheses cannot be avoided if you want to explain all the data, and assumptions of homoplasy must be invoked
32Parsimony
- From the set of all possible trees, find all trees τ such that L(τ) is minimal, where
$$L(\tau) = \sum_{k=1}^{B} \sum_{j=1}^{N} w_j \,\mathrm{diff}(x_{k'j}, x_{k''j})$$
- B is the number of branches
- N is the number of characters
- k' and k'' are the two nodes incident to each branch k
- xk'j and xk''j represent either elements of the input matrix or optimal character-state assignments made to internal nodes
- diff(y,z) is a function specifying the cost of a transformation from y to z along any branch (for unrooted trees, diff(y,z) = diff(z,y))
- The coefficient wj is the weight assigned to each character j
33Parsimony
From the set of all possible trees, find all trees τ such that L(τ) is minimal
- Two problems
- How do you actually implement the objective function to evaluate a particular tree?
- How do you find all minimal trees when you have many taxa?
- Objective function and tree searches can be formalized into algorithms
34Fitch and Wagner Parsimony
- Simplest parsimony methods
- Fitch: unordered multistate characters
- Wagner: binary, ordered multistate and continuous characters
- In describing the algorithm we will consider a single character (j) in isolation (characters are treated as independent)
- The tree (strictly bifurcating) is a given: it is just any tree we wish to evaluate (tree-search algorithms come later). We only need to assign an arbitrary root (any taxon), denoted r
35Fitch parsimony: the algorithm
1. To each terminal node i (including the one at the root), assign a state set Si containing the character state assigned to the corresponding taxon in the input data matrix (Xij)
36Post-order traversal
2. Visit an internal node k for which a state set Sk has not been defined but for which the state sets of k's two immediate descendants (Si and Sj) have been defined. Assign to k a state set Sk according to the following rules
37If Si ∩ Sj ≠ ∅, then Sk = Si ∩ Sj
If Si ∩ Sj = ∅, then Sk = Si ∪ Sj and increase the length by 1
The intersection of Sb and Sc is empty, so we use the union as Sx and increase L by 1 (L = 1)
38Post-order traversal
3. If node k is located at the basal fork of the tree (i.e., it is the immediate descendant of the terminal node placed at the root), the traversal has been completed: proceed to step 4. Otherwise return to step 2.
39Return to step 2
The intersection of Sx and Sd is not empty, so we
use the intersection as Sy and do not increase
L (L remains 1)
40Return to step 2
The intersection of Sy and Se is empty, so we use
the union as Sz and increase L by 1 (now, L = 2)
41Post-order traversal
4. If the state assigned to the terminal node at the root of the tree (Xr) is not contained in the state set just assigned to the node at the basal fork of the tree (Sk), increase the tree length by 1
In this case state C is not included in state set {A,G}, so we increase L by 1 (now, L = 3). This procedure concludes the evaluation but does not assign optimal character states to the internal nodes
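The post-order pass can be sketched recursively in Python. The tip states below (a = C, b = A, c = C, d = A, e = G) are hypothetical values chosen only to be consistent with the walk-through above, with taxon a placed at the root; the function names and the nested-tuple tree are also illustrative.

```python
def fitch_sets(tree, states):
    """Post-order pass. Returns (state set, length) for the subtree `tree`,
    given as nested 2-tuples of taxon names."""
    if isinstance(tree, str):                    # terminal node: its observed state
        return {states[tree]}, 0
    left, right = tree
    s_left, len_left = fitch_sets(left, states)
    s_right, len_right = fitch_sets(right, states)
    common = s_left & s_right
    if common:                                   # non-empty intersection: keep it
        return common, len_left + len_right
    return s_left | s_right, len_left + len_right + 1   # union, one extra step

def tree_length(root_taxon, tree, states):
    """Add the final comparison between the root taxon and the basal fork."""
    basal_set, length = fitch_sets(tree, states)
    if states[root_taxon] not in basal_set:
        length += 1
    return basal_set, length

states = {"a": "C", "b": "A", "c": "C", "d": "A", "e": "G"}
tree = ((("b", "c"), "d"), "e")                  # nodes x = (b,c), y = (x,d), z = (y,e)
basal_set, length = tree_length("a", tree, states)
print(sorted(basal_set), length)
```

For these states the basal fork receives the set {A,G} and the tree length is 3, matching the steps described above.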
42Pre-order traversal
- To obtain a most parsimonious reconstruction (MPR) and assign optimal character states to each node, we need to make a second pass over the tree
- This time from the root to the tips
43Pre-order traversal
- Visit an internal node k for which an optimal state assignment Xk has not yet been made but for which an assignment has been made to k's immediate ancestor, denoted m
- If Xm is contained in the state set assigned to k in the first pass (Sk), assign this state to k as well. Otherwise arbitrarily assign any state from Sk to k
44Pre-order traversal
- Two MPRs exist for this case, according to which state we arbitrarily choose from the set {A,G} at the basal fork
- A third option is to assign C to all three internal nodes
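A sketch of the two passes together, in Python. The tip states are the same hypothetical values used to mirror the worked example (a = C at the root, b = A, c = C, d = A, e = G); the function names and the `choose` parameter, which stands in for the arbitrary choice, are assumptions. Note that this simple second pass finds the two reconstructions that pick A or G at the basal fork, not the third one with C at every internal node.

```python
def first_pass(tree, states, sets):
    """Post-order pass: store the Fitch state set of every node in `sets`."""
    if isinstance(tree, str):
        sets[tree] = {states[tree]}
    else:
        left, right = tree
        a = first_pass(left, states, sets)
        b = first_pass(right, states, sets)
        sets[tree] = (a & b) or (a | b)          # intersection if non-empty, else union
    return sets[tree]

def second_pass(tree, ancestor_state, sets, choose=min, out=None):
    """Pre-order pass: assign one state to each internal node."""
    out = {} if out is None else out
    if isinstance(tree, str):
        return out
    s = sets[tree]
    out[tree] = ancestor_state if ancestor_state in s else choose(s)
    for child in tree:
        second_pass(child, out[tree], sets, choose, out)
    return out

states = {"a": "C", "b": "A", "c": "C", "d": "A", "e": "G"}
x = ("b", "c")
y = (x, "d")
z = (y, "e")                                     # z is the basal fork
sets = {}
first_pass(z, states, sets)
mpr_a = second_pass(z, states["a"], sets, choose=min)   # picks A from {A,G}
mpr_g = second_pass(z, states["a"], sets, choose=max)   # picks G from {A,G}
print([mpr_a[n] for n in (z, y, x)], [mpr_g[n] for n in (z, y, x)])
```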
45Other parsimony variants
- Dollo parsimony: every derived character state must be uniquely derived (originate only once in the tree)
- Homoplasy: only reversals are allowed (no parallelism or convergence)
- In practice, Dollo parsimony does not require inclusion of hypothetical ancestors, just character polarity (unrooted Dollo)
- Convenient for restriction-site characters (it is easier to lose than to gain a site)
46Dollo parsimony and RFLP data
Relaxed Dollo criterion, may be applied using
generalized parsimony
47Generalized Parsimony
- All parsimony variants can be subsumed into a generalized method that assigns a cost to each possible transformation
- Costs are represented in an m-by-m cost matrix S, where each element Sij represents the increase in tree length due to a transformation from state i to state j
- The cost of each transformation (weight) can be determined a priori (e.g. for RFLPs or for transition/transversion changes) or a posteriori (using the same data, e.g. the successive approximations method)
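A tree can be scored under an arbitrary cost matrix with a dynamic-programming pass, usually attributed to Sankoff. The Python sketch below is illustrative only: the function names, the 1:2 transition/transversion costs and the tip states (a = C at the root, b = A, c = C, d = A, e = G) are all hypothetical.

```python
INF = float("inf")

def sankoff(tree, states, alphabet, cost):
    """Return {state: minimum cost of the subtree if this node has that state}."""
    if isinstance(tree, str):
        return {s: (0 if s == states[tree] else INF) for s in alphabet}
    left, right = (sankoff(child, states, alphabet, cost) for child in tree)
    return {s: min(cost[s][t] + left[t] for t in alphabet)
               + min(cost[s][t] + right[t] for t in alphabet)
            for s in alphabet}

def make_cost(alphabet, transition=1, transversion=1):
    """Symmetric cost matrix: 0 on the diagonal, separate costs for the two change types."""
    purines = {"A", "G"}
    return {i: {j: 0 if i == j
                else transition if (i in purines) == (j in purines)
                else transversion
                for j in alphabet}
            for i in alphabet}

def total_length(root_taxon, tree, states, alphabet, cost):
    """Tree length including the branch from the root taxon to the basal fork."""
    below = sankoff(tree, states, alphabet, cost)
    root_state = states[root_taxon]
    return min(cost[root_state][s] + below[s] for s in alphabet)

alphabet = "ACGT"
states = {"a": "C", "b": "A", "c": "C", "d": "A", "e": "G"}
tree = ((("b", "c"), "d"), "e")
unit = total_length("a", tree, states, alphabet, make_cost(alphabet))
weighted = total_length("a", tree, states, alphabet, make_cost(alphabet, 1, 2))
print(unit, weighted)
```

With all costs equal to 1 the score reduces to the Fitch length (3 for these states); making transversions twice as costly as transitions raises it to 5.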
48Generalized Parsimony Cost matrices
49Protein parsimony
- A 20x20 matrix specifies the cost of each possible transformation
- The matrix may be based on the genetic code (PROTPARS matrix) and/or the biochemical properties of the amino acids themselves (Dayhoff matrices)
50Parsimony
- From the set of all possible trees, find all trees τ such that L(τ) is minimal
- B is the number of branches
- N is the number of characters
- diff(y,z) is a function specifying the cost of a transformation from y to z along any branch (for unrooted trees, diff(y,z) = diff(z,y)): the cost matrix
- The coefficient w is the weight assigned to each character (weighted parsimony)
- A priori or a posteriori
51Difference in perspective: MP and ML
- Parsimony seeks solutions that minimize the amount of change required to explain the data (it underestimates superimposed changes)
- ML attempts to estimate the actual amount of change (by specifying the evolutionary model that will account for the data with the highest likelihood)
- Methods that incorporate models of evolutionary change can make more efficient use of the data
52Searching for optimal trees
- Methods with explicit optimality criteria
- Parsimony
- Maximum likelihood
- Additive-tree distance
- Separate the problem of
- evaluating the tree
- finding the optimal tree(s)
- Can we evaluate all possible trees for a
particular problem?
53Searching for optimal trees
- For small to moderate data sets, with as many as 8-20 taxa, we can use exact methods
- Exact methods guarantee the discovery of all optimal trees
- Exact methods include
- Exhaustive search
- Branch-and-bound search
54How many trees?
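The number of strictly bifurcating trees grows as a double factorial: (2n-5)!! unrooted trees and (2n-3)!! rooted trees for n taxa. A short Python sketch of these counts (the function names are illustrative):

```python
def n_unrooted(n):
    """Number of unrooted, strictly bifurcating trees for n >= 3 taxa: (2n-5)!!"""
    count = 1
    for k in range(3, 2 * n - 4, 2):     # 3 * 5 * ... * (2n - 5)
        count *= k
    return count

def n_rooted(n):
    """Number of rooted, strictly bifurcating trees for n >= 2 taxa: (2n-3)!!"""
    return n_unrooted(n + 1)

for n in (4, 5, 10, 20):
    print(n, n_unrooted(n), n_rooted(n))
```

Ten taxa already give about two million unrooted trees, and twenty taxa give roughly 2 x 10^20, which is why exhaustive evaluation is limited to small data sets.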
55Exhaustive search: enumerate all possible trees
56- Branch-and-bound search
- Does not require exhaustive search and yet provides an exact solution.
1. Traverse a search tree in a depth-first sequence to get an initial tree (it could be a random tree, but it is better to use heuristics to make branch-and-bound more efficient)
2. Use the score of this tree as the upper bound (L) on the optimal value of the chosen criterion.
57Branch-and-bound
3. Move along a path towards the tips and evaluate the trees. If the score of a tree is > L, discard the rest of that path.
4. If a better tree is found, with score L' < L, we have improved the upper bound on the score of the optimal tree. This may enable us to discard other paths and finish the search more quickly
58Approximate methods
- For larger data sets computing time becomes prohibitive and we only explore some subset of all possible trees (hoping that the optimal trees will be found in the subset explored)
- Heuristic approaches sacrifice the guarantee of optimality in favor of reduced computer time
- Use hill-climbing methods. An initial tree starts the process, then we seek to improve its score
- When we can find no way to further improve the score, we stop. We don't know whether we have reached a local or a global optimum
59Initial trees
- May be obtained by stepwise addition, the most commonly used method
- Similar to exhaustive search, but evaluate trees at every step, each time you add a new taxon, and only follow the path derived from the optimal tree
- Which taxa do you choose first? Which do you connect next?
- These are greedy algorithms
60Stepwise addition
61- Initial trees also may be obtained by star
decomposition, another greedy algorithm
62Branch swapping
- To improve the initial estimate we can perform sets of predefined rearrangements on the tree
- Any of these rearrangements amounts to a stab in the dark
- Globally optimal trees may be several rearrangements away from the starting tree
- If a better tree is found, a new round of rearrangements is then performed on the new tree
- Several branch-swapping algorithms are available
63Branch swapping by tree bisection and reconnection (TBR)
1. Tree is bisected along a branch, yielding two disjoint subtrees
2. The subtrees are reconnected by joining a pair of branches, one from each subtree
3. All possible bisections and pairwise reconnections are evaluated
64Branch swapping by subtree pruning and regrafting (SPR)
1. A subtree is pruned from the tree (e.g. A,B)
2. The subtree is then regrafted to a different location on the tree
3. All possible subtree removals and reattachment points are evaluated
65Branch swapping by nearest-neighbor interchanges (NNI)
1. Each interior branch of the tree defines a local region of four subtrees
2. Interchanging a subtree on one side of the branch with one from the other constitutes an NNI
3. Two such rearrangements are possible for each interior branch (all interior branches are swapped)
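For a single interior branch, the two NNI rearrangements can be written down directly. A minimal Python sketch, assuming the local region is given as `((A, B), (C, D))`, where A and B sit on one side of the branch and C and D on the other; the function names are illustrative.

```python
def nni_neighbors(branch):
    """Return the two alternative arrangements around one interior branch."""
    (a, b), (c, d) = branch
    return [((a, c), (b, d)),      # swap B with C
            ((a, d), (b, c))]      # swap B with D

def nni_count(n_taxa):
    """An unrooted bifurcating tree has n - 3 interior branches, hence 2(n - 3) NNIs."""
    return 2 * (n_taxa - 3)

print(nni_neighbors((("A", "B"), ("C", "D"))))
print(nni_count(10))
```

Each of A, B, C and D may itself be a whole subtree, which is why NNI explores only a small neighborhood compared with SPR or TBR.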
66Landscapes and the problem of islands of trees
67Bayesian Inference of Phylogenies
- Closely related to ML methods, differing only in the use of a PRIOR DISTRIBUTION on the quantity being inferred (which would typically be a tree)
- Use of a prior enables us to interpret the result as the distribution of the tree given the data
- Bayes's work on this was published in 1763, and controversy among statisticians over its appropriateness is almost that old
- Recently, the introduction of Markov chain Monte Carlo (MCMC) methods has given a new impetus to Bayesian inference
- The latest silver bullet for phylogenetic analysis?
68Simple example of Bayesian inference
Box with 90 fair and 10 biased dice
Take a die at random from the box and roll it twice: you get a 4 and a 6. What is the probability that the die is biased?
69A Bayesian analysis combines one's prior beliefs about the probability of a hypothesis with its likelihood
Likelihood assuming a fair die: Pr[4,6 | fair] = 1/6 × 1/6 = 1/36 ≈ 0.0278
Likelihood assuming a biased die: Pr[4,6 | biased] ≈ 0.0544
The probability of observing the data is 1.96 times greater under the hypothesis that the die is biased
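70Bayesian inference is based upon the POSTERIOR probability of a hypothesis
The posterior probability that the die is biased can be obtained using Bayes's formula
$$\Pr[\text{biased} \mid 4,6] = \frac{\Pr[4,6 \mid \text{biased}]\,\Pr[\text{biased}]}{\Pr[4,6 \mid \text{biased}]\,\Pr[\text{biased}] + \Pr[4,6 \mid \text{fair}]\,\Pr[\text{fair}]} \approx \frac{0.0544 \times 0.1}{0.0544 \times 0.1 + 0.0278 \times 0.9} \approx 0.179$$
Our opinion of the die being biased changed from 0.1 to 0.179 after observing a 4 and a 6
Are priors a strength or a weakness of the method?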
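The dice calculation can be reproduced with exact fractions. The slides do not state how the biased dice are loaded; the sketch below assumes that a biased die shows face i with probability i/21, an assumption chosen because it reproduces the quoted likelihood ratio of about 1.96 and posterior of about 0.179. All names are illustrative.

```python
from fractions import Fraction

prior = {"fair": Fraction(90, 100), "biased": Fraction(10, 100)}
face_prob = {
    "fair": {i: Fraction(1, 6) for i in range(1, 7)},
    "biased": {i: Fraction(i, 21) for i in range(1, 7)},   # assumed loading, not from the slides
}

def likelihood(hypothesis, rolls):
    """Probability of the observed rolls under one hypothesis (independent rolls)."""
    result = Fraction(1)
    for roll in rolls:
        result *= face_prob[hypothesis][roll]
    return result

def posterior(hypothesis, rolls):
    """Bayes's formula: likelihood times prior, divided by the sum over hypotheses."""
    evidence = sum(likelihood(h, rolls) * prior[h] for h in prior)
    return likelihood(hypothesis, rolls) * prior[hypothesis] / evidence

rolls = [4, 6]
ratio = likelihood("biased", rolls) / likelihood("fair", rolls)
post = posterior("biased", rolls)
print(float(ratio), float(post))
```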
71Bayesian inference of phylogeny
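Based upon the posterior probability of a phylogenetic tree (τ)
$$\Pr[\tau_i \mid X] = \frac{\Pr[X \mid \tau_i]\,\Pr[\tau_i]}{\sum_{j=1}^{B(s)} \Pr[X \mid \tau_j]\,\Pr[\tau_j]}$$
Pr[τi | X]: posterior probability of the ith tree; it can be interpreted as the probability that this tree is the correct tree given the data X
Pr[X | τi]: likelihood of the ith tree
Pr[τi]: prior probability of the ith tree, typically uniform, 1/B(s)
The summation in the denominator is over all B(s) trees possible for s species (taxa)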
72Markov Chain Monte Carlo
Typically, the posterior probability Pr[τi | X] cannot be calculated analytically. However, the posterior probability of phylogenies can be approximated by sampling trees from the posterior probability distribution.
MCMC can be used to sample phylogenies according to their posterior probabilities.
Let Ψ = {τ, v, θ, α} be a specific tree (τ), combination of branch lengths (v), substitution parameters (θ), and gamma shape parameter (α).
The Metropolis-Hastings (MH) algorithm is an MCMC algorithm that has been successfully used to approximate the posterior probability of trees
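73The MH algorithm constructs a Markov chain that has as its stationary distribution the posterior probability of interest (in this case the joint posterior probability of Ψ)
The current state of the chain is denoted Ψ
A new state Ψ' is then proposed
The new state is accepted with probability
$$R = \min\left(1,\ \frac{f(X \mid \Psi')}{f(X \mid \Psi)} \times \frac{f(\Psi')}{f(\Psi)} \times \frac{f(\Psi \mid \Psi')}{f(\Psi' \mid \Psi)}\right)$$
(likelihood ratio × prior ratio × proposal ratio)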
74A uniform random number is drawn and, if this number is < R, the proposed change is accepted. Otherwise the chain remains in the original state.
This process of proposing a new state, calculating the acceptance probability, and accepting or rejecting the move is repeated thousands of times. The sequence of states visited forms the Markov chain.
The chain is sampled after it has reached stationarity, and the sampled trees represent the posterior probability distribution
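The propose, accept or reject loop can be illustrated on a toy problem. The Python sketch below is not a phylogenetic sampler: it assumes only three candidate topologies with made-up unnormalized posterior weights (likelihood × prior) and a symmetric proposal, so the proposal ratio is 1. All names and numbers are hypothetical.

```python
import random

def mh_sample(weights, n_steps, seed=0):
    """Metropolis-Hastings over a small discrete set of states.
    `weights` maps each state to its unnormalized posterior."""
    rng = random.Random(seed)
    states = list(weights)
    current = states[0]
    chain = []
    for _ in range(n_steps):
        # Propose one of the other states uniformly (symmetric proposal)
        proposal = rng.choice([s for s in states if s != current])
        r = min(1.0, weights[proposal] / weights[current])
        if rng.random() < r:         # accept with probability R
            current = proposal
        chain.append(current)        # otherwise the chain stays where it is
    return chain

# Hypothetical unnormalized posteriors for the three unrooted 4-taxon topologies
weights = {"((A,B),(C,D))": 6.0, "((A,C),(B,D))": 3.0, "((A,D),(B,C))": 1.0}
chain = mh_sample(weights, 50_000)
kept = chain[1_000:]                 # discard an initial burn-in
freq = {tree: kept.count(tree) / len(kept) for tree in weights}
print(freq)
```

The sampled frequencies should approach the normalized weights 0.6, 0.3 and 0.1 even though the sampler only ever uses ratios of the weights, so the denominator of Bayes's formula never has to be computed.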
75The proportion of times a single tree is found among these samples approximates the posterior probability of that tree
A majority-rule consensus can be derived from the sample, and the proportions obtained for each clade approximate the posterior probabilities of the clades
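Clade proportions are simple counts over the sampled trees. A Python sketch, assuming the sample consists of rooted trees written as nested tuples of taxon names; the ten-tree sample and the function names are hypothetical.

```python
from collections import Counter

def collect_clades(tree, found):
    """Return the taxon set below `tree`, adding every internal node's set to `found`."""
    if isinstance(tree, str):
        return frozenset([tree])
    taxa = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.add(taxa)
    return taxa

def clade_frequencies(sample):
    """Proportion of sampled trees that contain each clade."""
    counts = Counter()
    for tree in sample:
        found = set()
        collect_clades(tree, found)
        counts.update(found)
    return {clade: n / len(sample) for clade, n in counts.items()}

# Hypothetical MCMC sample of 10 rooted trees
sample = ([((("A", "B"), "C"), "D")] * 6
          + [((("A", "C"), "B"), "D")] * 3
          + [(("A", "B"), ("C", "D"))] * 1)
freqs = clade_frequencies(sample)
majority = {clade for clade, f in freqs.items() if f > 0.5}
for clade, f in sorted(freqs.items(), key=lambda item: -item[1]):
    print("".join(sorted(clade)), f)
```

Clades found in more than half of the sampled trees (here AB, ABC and the full set ABCD) are the ones retained in a majority-rule consensus.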