Phylogenetic inference

Slides: 76

Provided by: Guille83

Category:

more less

Transcript and Presenter's Notes

Title: Phylogenetic inference

1
Phylogenetic inference
2
Phylogenetic inference

The question is how do we find the best tree?
Many methods using many techniques, many software
packages
New improved methods and tests appear in the
literature constantly
WHY SO MANY?
For molecular data, the trend is towards using
methods based on explicit models based on
realistic assumptions

3
Classification of phylogenetic methods
4
Discrete data and Distances
5
Algorithms versus optimality criteria

Phylogenetic inference is an estimation procedure
(best estimate)
Only have information about the contemporary
molecules (and organisms)
How do we choose a tree from the set of all
possible trees?
Two basic approaches
Algorithmic: just follow a sequence of steps
Optimality criterion: how do we compare trees?

6
Algorithmic methods

Combine tree inference and the definition of a
preferred tree into a single statement
Include UPGMA and all forms of pair-group cluster
analysis, and neighbor joining
Computationally fast because they go straight to
the final solution
The task of finding an optimal tree cannot be
separated from that of evaluating a specific tree

7
Optimality criteria

Two logical steps
Define an optimality criterion (objective
function for evaluating trees)
Find the tree(s) with the best value for the
objective function (may use algorithms)
Evolutionary assumptions made in the first step
are decoupled from the computations involved in
the second step
Price for logical clarity is that these methods
can be very slow

8
Use of algorithms

Different use in the two approaches
In purely algorithmic methods, the algorithm
defines the tree selection criterion and is
fundamental
In criterion-based methods, algorithms are merely
tools used in evaluating and searching for
optimal trees
Reliability of the tree?

9
Optimality criteria

Parsimony: select the tree that minimizes the
total tree length (number of steps or character
transformations required to explain a given set
of data)
Methods based on models of evolutionary change:
assumptions are made explicit.
Is parsimony's "model-free" nature an advantage
or a disadvantage?
What are the assumptions for parsimony
(consistency)?

10
Optimality methods (cont.)

Maximum likelihood: evaluates the probability
that a proposed model of evolution and the
hypothesized history could give rise to the
observed data (attempts to estimate the actual
amount of change or history)
Usually more consistent; estimates have lower
variance than other methods; robust to violations
of assumptions

11
Optimality criteria (cont.)

Pairwise distance methods: also minimize the
effect of multiple hits when using an appropriate
model to estimate the true evolutionary distance
between two sequences (less desirable than full
ML)
Additive and ultrametric distances can be fitted
to a tree such that all pairwise distances are
equal to the sum of the branches along the path
connecting them in the tree

12
TREES AND DISTANCES

Measures of sequence similarity are used to
estimate evolutionary changes that occurred
between 2 sequences
These measures quantify the evolutionary distance
between the 2 sequences
Trees can represent these distances.
This has motivated a range of tree building
techniques to convert pairwise distances into
evolutionary trees.

13
A distance measure can be used to build
phylogenies IF it satisfies some basic
requirements...

It must be a metric
It must be additive

14
1. Metric Distances

Let d (a,b) be the distance between two sequences
a and b
A distance d is a metric if it satisfies 4
conditions
1. $d(a,b) \ge 0$ (non-negativity)
2. $d(a,b) = d(b,a)$ (symmetry)
3. $d(a,c) \le d(a,b) + d(b,c)$ (triangle
inequality)
4. $d(a,b) = 0$ if and only if $a = b$
(distinctness)

15
Triangle inequality: $d(a,c) \le d(a,b) + d(b,c)$
The distance between any pair of sequences must
be no greater than the sum of the distances
between each of those sequences and a third
sequence
16
Of these 4 conditions...

1. non-negativity
2. symmetry
3. triangle inequality
4. distinctness

Conditions 1, 2 and 4 are generally true for most
measures of sequence dissimilarity calculated
directly from sequences.
Indirect measures of sequence dissimilarity such
as DNA-DNA hybridization and immunological
distance often fail condition 2 (symmetry)

17
A distance is ultrametric if it satisfies the
additional three-point condition

$d(a,b) \le \max[d(a,c), d(b,c)]$

This implies that 2 of the 3 pairwise distances
between 3 taxa are equal, and at least as large
as the third, defining an isosceles triangle.
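
The three-point condition can be checked mechanically on a distance matrix. The sketch below is only illustrative: the function name and both example matrices are assumptions, not values taken from the slides.

```python
from itertools import permutations

def is_ultrametric(d, tol=1e-9):
    """Return True if the symmetric distance matrix d satisfies the
    three-point condition d(a,b) <= max(d(a,c), d(b,c)) for all triples."""
    n = len(d)
    return all(
        d[a][b] <= max(d[a][c], d[b][c]) + tol
        for a, b, c in permutations(range(n), 3)
    )

# Hypothetical distances for three taxa (not from the slides).
clocklike = [[0, 2, 6],
             [2, 0, 6],
             [6, 6, 0]]      # two largest distances equal: isosceles
non_clocklike = [[0, 2, 6],
                 [2, 0, 5],
                 [6, 5, 0]]  # 6 > max(2, 5): condition violated

print(is_ultrametric(clocklike))
print(is_ultrametric(non_clocklike))
```

The first matrix forms the isosceles triangle described above; the second does not, so a clock-like tree cannot fit it exactly.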

18
Ultrametric distances have the very useful
property of implying a constant rate of evolution

In fact we have a test for the molecular clock
called the Relative Rate Test which is simply
a test of how far the pairwise distances between
3 sequences depart from Ultrametricity.
If distances between sequences are Ultrametric
then the most similar sequences are also the most
closely related

19
2. Additive Distances

Being a metric is necessary but not sufficient to
ensure that a measure of change is valid and fits
a tree exactly.
It must also satisfy the four-point condition
$d(a,b) + d(c,d) \le \max[d(a,c) + d(b,d),\ d(a,d) + d(b,c)]$
This is equivalent to requiring that of the three
sums $d(a,b) + d(c,d)$, $d(a,c) + d(b,d)$ and
$d(a,d) + d(b,c)$ the two largest are equal.
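
A minimal sketch of the "two largest sums are equal" test follows. The function name and the example matrix are assumptions for illustration: the matrix is built from a hypothetical tree ((A,B),(C,D)) with pendant branches 1, 2, 3, 4 and an internal branch of 5.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Return True if, for every quartet of taxa, the two largest of the
    three pairwise sums are equal (four-point condition)."""
    n = len(d)
    for a, b, c, e in combinations(range(n), 4):
        sums = sorted([d[a][b] + d[c][e],
                       d[a][c] + d[b][e],
                       d[a][e] + d[b][c]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Path-length distances on the hypothetical tree ((A,B),(C,D)).
tree_like = [[0, 3, 9, 10],
             [3, 0, 10, 11],
             [9, 10, 0, 7],
             [10, 11, 7, 0]]

# Same matrix with d(A,D) perturbed from 10 to 12: no tree fits exactly.
perturbed = [row[:] for row in tree_like]
perturbed[0][3] = perturbed[3][0] = 12

print(is_additive(tree_like))
print(is_additive(perturbed))
```

For the first matrix the three sums are 10, 20 and 20, so the condition holds; after the perturbation they are 10, 20 and 22.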

20
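Of the three sums $d(a,b) + d(c,d)$, $d(a,c) + d(b,d)$
and $d(a,d) + d(b,c)$...
$d(a,b) + d(c,d) = (Ae + ef + fB) + (Ce + ef + fD)$
$d(a,c) + d(b,d) = (Ae + eC) + (Bf + fD)$
$d(a,d) + d(b,c) = (Ae + ef + fD) + (Ce + ef + fB)$
...the two largest are equal.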
21
An additive distance measure defines a tree...

Sequence d is equidistant from all other
sequences
Sequence c is equidistant from a and b

For any 2 sequences the value in the distance
matrix corresponds to the sum of the branch
lengths along the path between the 2 sequences on
the tree.
The presented tree is ultrametric (draw the
isosceles triangle)

22
When distances are not ultrametric but are still
additive, they can still be represented by a tree...
An additive tree
Also represents additive distances exactly...
23

Notice that sequences b and c are the most
similar (3), but ARE NOT the most closely related
Similarity and evolutionary relationship will
only coincide exactly if the distances are
ultrametric
While this tree is additive, it is not ultrametric

Observed distances are obtained directly from the
sequences themselves and patristic distances
from a tree
For additive and ultrametric distances, the
observed and tree distances match exactly

For real data this is rarely the case, indicating
that observed distances cannot be represented
with complete accuracy by a tree.
25
The discrepancy between observed and tree
distances can be used as an indicator of how well
observed distances fit a tree-like
representation.
26
Distance methods

Experimentally derived distances are estimates of
true distances
We want to fit them to a mathematical model
(additive tree) and find the optimal value for
the adjustable parameters
Branching pattern
Branch lengths
Some methods based on this optimality criterion
are Fitch-Margoliash and minimum evolution
Other methods fit trees to distances
algorithmically (NJ, UPGMA)

27
UPGMA: an algorithmic method

Cluster analysis: unweighted pair group method
using arithmetic averages (Sneath and Sokal 1973)
Assumes ultrametricity

28
UPGMA example

1. Given a matrix of pairwise distances, find the
clusters (taxa) i and j such that dij is the
minimum value in the table
2. Define the depth of the branching between i and
j (lij) to be dij/2
3. If i and j were the last two clusters, the tree
is complete. Otherwise, create a new cluster
called u
4. Define a distance from u to each other cluster k
(with k ≠ i, j) to be the average of the
distances dki and dkj (weighted by the number of
taxa in i and j when they are themselves clusters)
5. Go back to step 1 with one fewer cluster: clusters
i and j have been eliminated, and cluster u has
been added
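
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the nested-tuple tree representation and the example matrix are hypothetical. The averaging in step 4 is weighted by cluster size, as in standard UPGMA, which reduces to the simple average of dki and dkj when i and j are single taxa.

```python
def upgma(dist, names):
    """Cluster taxa by UPGMA. Returns (tree, heights): a nested tuple of
    taxon names and the depth (dij / 2) of each successive merge."""
    # cluster id -> (subtree, number of taxa in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    heights = []
    next_id = len(names)
    while len(clusters) > 1:
        # Step 1: find the pair of clusters with the minimum distance.
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        # Step 2: depth of the branching between i and j.
        heights.append(dij / 2)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Step 4: size-weighted average distance from new cluster u to k.
        new_d = {}
        for k in clusters:
            dki = d[(min(i, k), max(i, k))]
            dkj = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (n_i * dki + n_j * dkj) / (n_i + n_j)
        # Step 5: drop i and j from the table, add u, and repeat.
        d = {pair: v for pair, v in d.items()
             if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, heights

# Hypothetical ultrametric distances (not the matrix shown on the slides).
names = ["A", "B", "C", "D"]
dist = [[0, 2, 4, 6],
        [2, 0, 4, 6],
        [4, 4, 0, 6],
        [6, 6, 6, 0]]
tree, heights = upgma(dist, names)
print(tree, heights)
```

Because the example distances are ultrametric, the merge depths 1, 2 and 3 reproduce the input matrix exactly; with non-ultrametric data UPGMA still returns a tree, but the tree distances will not match the observed ones.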

29
Distance Matrices and phenogram
30
Classification of phylogenetic methods
31
Parsimony methods

The most widely-used method, familiar notion in
science (simplicity)
Shared attributes among taxa are inherited from
common ancestors
When character conflicts occur, ad hoc hypotheses
cannot be avoided if you want to explain all the
data, and assumptions of homoplasy must be invoked

32
Parsimony

From the set of all possible trees, find all trees
τ such that L(τ) is minimal
$L(\tau) = \sum_{k=1}^{B} \sum_{j=1}^{N} w_j \cdot \mathrm{diff}(x_{k'j}, x_{k''j})$
B is the number of branches
N is the number of characters
k' and k'' are the two nodes incident to each
branch k
$x_{k'j}$ and $x_{k''j}$ represent either elements
of the input matrix or optimal character
assignments made to internal nodes
diff(y,z) is a function specifying the cost of
transformation from y to z along any branch (for
unrooted trees, diff(y,z) = diff(z,y))
The coefficient $w_j$ is the weight assigned to
each character j

33
Parsimony
From the set of all possible trees, find all
trees τ such that L(τ) is minimal

Two problems
How do you actually implement the objective
function to evaluate a particular tree?
How do you find all minimal trees when you have
many taxa?
Objective function and tree-searches can be
formalized into algorithms

34
Fitch and Wagner Parsimony

Simplest parsimony methods
Fitch: unordered multistate characters
Wagner: binary, ordered multistate and continuous
characters
In describing the algorithm we will consider a
single character (j) in isolation (independence)
The tree (strictly bifurcating) is a given: it is
just any tree we wish to evaluate (tree-search
algorithms come later). We only need to assign an
arbitrary root (any taxon), denoted r

35
Fitch parsimony: the algorithm

1. To each terminal node i (including the one at
the root), assign a state set Si containing the
character state assigned to the corresponding
taxon in the input data matrix (Xij)

36
Post-order traversal

2. Visit an internal node k for which a state set
Sk has not been defined but for which the state
sets of k's two immediate descendants (Si and Sj)
have been defined. Assign to k a state set Sk
according to the following rule

37
If $S_i \cap S_j \neq \emptyset$, then $S_k = S_i \cap S_j$

Otherwise $S_k = S_i \cup S_j$

and increase the length by 1
The intersection of Sb and Sc is empty, so we use
the union as Sx and increase L by 1 (L = 1)
38
Post-order traversal

3. If node k is located at the basal fork of the
tree (i.e., the immediate descendant of the
terminal node placed at the root), the traversal
has been completed; proceed to step 4. Otherwise
return to step 2.

39
Return to step 2
The intersection of Sx and Sd is not empty, so we
use the intersection as Sy and do not increase
L (L remains 1)
40
Return to step 2
The intersection of Sy and Se is empty, so we use
the union as Sz and increase L by 1 (now L = 2)
41
Post-order traversal

4. If the state assigned to the terminal node at
the root of the tree (Xr) is not contained in the
state set just assigned to the node at the basal
fork of the tree (Sk), increase the tree length
by 1

In this case state C is not included in state set
{A,G}, so we increase L by 1 (now L = 3). This
procedure concludes the evaluation but does not
assign optimal character states to the internal
nodes
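
The post-order pass (steps 1 to 4) can be sketched as a short recursive function. The figure data did not survive in this transcript, so the tip states below are assumptions chosen only to be consistent with the walkthrough (empty intersection at x, non-empty at y, empty at z, root state C outside {A,G}); the function name and tuple-based tree encoding are likewise illustrative.

```python
def fitch_length(tree, states, root_taxon):
    """Post-order Fitch pass for one unordered character.

    `tree` is a nested 2-tuple of taxon names for the part of the tree
    below the terminal root taxon. Returns (length, basal state set)."""
    length = 0

    def visit(node):
        nonlocal length
        if isinstance(node, str):            # step 1: terminal node
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                           # step 2: intersection
            return common
        length += 1                          # step 2: union costs one step
        return left | right

    basal = visit(tree)                      # step 3: reached basal fork
    if states[root_taxon] not in basal:      # step 4: compare with root
        length += 1
    return length, basal

# Assumed tip states (hypothetical; chosen to match the walkthrough).
states = {"a": "C", "b": "C", "c": "A", "d": "A", "e": "G"}
# x = (b, c), y = (x, d), z = (y, e); taxon a is the terminal root.
tree = ((("b", "c"), "d"), "e")
length, basal = fitch_length(tree, states, "a")
print(length, sorted(basal))
```

As in the slides, this pass only scores the tree; it does not by itself assign states to the internal nodes.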
42
Pre-order traversal

To obtain a most parsimonious reconstruction
(MPR) and assign optimal character states to each
node we need to make a second pass over the tree
This time from the root to the tips

43
Pre-order traversal

Visit an internal node k for which an optimal
state assignment Xk has not yet been made but for
which an assignment has been made to k's
immediate ancestor, denoted m
If Xm is contained in the state set assigned to k
in the first pass (Sk), assign this state to k as
well. Otherwise arbitrarily assign any state from
Sk to k

44
Pre-order traversal

Two MPRs exist for this case according to which
state we arbitrarily choose from the set {A,G} at
the basal fork
A third option is to assign C to all three
internal nodes

45
Other parsimony variants

Dollo parsimony: every derived character must be
uniquely derived (originate only once in the
tree)
Homoplasy: only reversals are allowed (no
parallelism or convergence)
In practice, Dollo parsimony does not require
inclusion of hypothetical ancestors, just
character polarity (unrooted Dollo)
Convenient for restriction-site characters
(easier to lose than to gain a site)

46
Dollo parsimony and RFLP data
Relaxed Dollo criterion, may be applied using
generalized parsimony
47
Generalized Parsimony

All parsimony variants can be subsumed into a
generalized method that assigns a cost for each
possible transformation
Costs are represented in an m-by-m cost matrix S,
where each element Sij represents the increase in
tree length due to a transformation from state i
to j
The cost of each transformation (weight) can be
determined a priori (e.g. for RFLPs or for
transition/transversion changes) or a posteriori
(using the same data, e.g. successive
approximations method)

48
Generalized Parsimony Cost matrices
49
Protein parsimony

A 20x20 matrix specifies the cost for each
possible transformation
The matrix may be based on the genetic code
(PROTPARS matrix) and/or the biochemical
properties of the amino acids themselves (Dayhoff
matrices)

50
Parsimony

From the set of all possible trees, find all trees
τ such that L(τ) is minimal
$L(\tau) = \sum_{k=1}^{B} \sum_{j=1}^{N} w_j \cdot \mathrm{diff}(x_{k'j}, x_{k''j})$
B is the number of branches
N is the number of characters
diff(y,z) is a function specifying the cost of
transformation from y to z along any branch (for
unrooted trees, diff(y,z) = diff(z,y)): cost matrix
The coefficient $w_j$ is the weight assigned to
each character j (weighted parsimony)
A priori or a posteriori

51
Difference in perspective: MP and ML

Parsimony seeks solutions that minimize the
amount of change required to explain the data
(underestimates superimposed changes)
ML attempts to estimate the actual amount of
change (by specifying the evolutionary model that
will account for the data with the highest
likelihood)
Methods that incorporate models of evolutionary
change can make more efficient use of the data

52
Searching for optimal trees

Methods with explicit optimality criteria
Parsimony
Maximum likelihood
Additive-tree distance
Separate the problem of
evaluating the tree
finding the optimal tree(s)
Can we evaluate all possible trees for a
particular problem?

53
Searching for optimal trees

For small to moderate data sets, with as many as
8-20 taxa, we can use exact methods
Exact methods guarantee the discovery of all
optimal trees
Exact methods include
Exhaustive search
Branch-and-bound search

54
How many trees?
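
For strictly bifurcating trees the answer has a closed form: n taxa give (2n-5)!! unrooted and (2n-3)!! rooted topologies, where !! is the product of odd integers. A minimal sketch (the function names are illustrative, not from the slides):

```python
def odd_double_factorial(m):
    """Product of the odd integers 1 * 3 * ... * m (1 if m < 3)."""
    result = 1
    for k in range(3, m + 1, 2):
        result *= k
    return result

def n_unrooted_trees(n_taxa):
    """Unrooted, strictly bifurcating topologies for n_taxa >= 3."""
    return odd_double_factorial(2 * n_taxa - 5)

def n_rooted_trees(n_taxa):
    """Rooted, strictly bifurcating topologies for n_taxa >= 2."""
    return odd_double_factorial(2 * n_taxa - 3)

for n in (4, 5, 10, 20):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

Ten taxa already give 2,027,025 unrooted trees, which is why exhaustive evaluation is limited to small data sets.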
55
Exhaustive search: enumerate all possible trees
56

Branch-and-bound search
Does not require exhaustive search and yet
provides an exact solution.
1. Traverse a search tree in a depth-first sequence
(to get an initial tree; it could be a random tree,
but it is better to use heuristics to make
branch-and-bound more efficient)
2. Use this tree's score as the upper bound (L) on
the optimal value of the chosen criterion.

57
Branch-and-bound
3. Move along each path toward the tips and
evaluate trees. If a tree's score is greater than
L, dispense with the rest of that path.
4. If a better tree is found, with a score lower
than L, we have improved the upper bound on the
score of the optimal tree. This may enable us to
dispense with other paths and finish the search
more quickly
58
Approximate methods

For larger data sets computing time becomes
prohibitive and we only explore some subset of
all possible trees (hoping that the optimal trees
will be found in the subset explored)
Heuristic approaches sacrifice the guarantee of
optimality in favor of reduced computer time
Use hill-climbing methods. An initial tree starts
the process, then we seek to improve its score
When we can find no way to further improve the
score, we stop. We don't know if we reached a
local or a global optimum

59
Initial trees

May be obtained by stepwise addition, the most
commonly used method
Similar to exhaustive search but evaluate trees
at every step, each time you add a new taxon and
only follow the path derived from the optimal
tree
Which taxa do you choose first? Which do you
connect next?
These are greedy algorithms

60
Stepwise addition
61

Initial trees also may be obtained by star
decomposition, another greedy algorithm

62
Branch swapping

To improve the initial estimate we can perform
sets of predefined rearrangements on the tree
Any of these rearrangements amounts to a stab in
the dark
Globally optimal trees may be several
rearrangements away from the starting tree
If a better tree is found, a new round of
rearrangements is then performed in the new tree
Several branch-swapping algorithms are available

63
Branch swapping by tree bisection and
reconnection (TBR)
1. Tree is bisected along a branch, yielding two
disjunct subtrees
2. The subtrees are reconnected by joining a pair
of branches, one from each subtree
3. All possible bisections and pairwise
reconnections are evaluated
64
Branch swapping by subtree pruning and
regrafting (SPR)
1. A subtree is pruned from the tree (e.g. A,B)
2. The subtree is then regrafted to a different
location on the tree
3. All possible subtree removals and reattachment
points are evaluated
65
Branch swapping by nearest-neighbor interchanges
(NNI)
1. Each interior branch of the tree defines a
local region of four subtrees
2. Interchanging a subtree on one side of the
branch with one from the other constitutes an NNI
3. Two such rearrangements are possible for each
interior branch (all interior branches are
swapped)
66
Landscapes and the problem of islands of trees
67
Bayesian Inference of Phylogenies

Closely related to ML methods, differing only in
the use of a PRIOR DISTRIBUTION on the quantity
being inferred (which would typically be the tree)
Use of a prior enables us to interpret the result
as the distribution of the tree given the data
Bayes's work on this was published in 1763, and
controversy among statisticians over its
appropriateness is almost that old
Recently, the introduction of Markov chain Monte
Carlo (MCMC) methods has given a new impetus to
Bayesian inference
The latest silver bullet for phylogenetic
analysis?

68
Simple example of Bayesian inference
Box with 90 fair and 10 biased dice
Take a die at random from the box and roll it
twice: get a 4 and a 6. What is the probability
that the die is biased?
69
A Bayesian analysis combines one's prior beliefs
about the probability of a hypothesis with its
likelihood
Likelihood assuming a fair die
Likelihood assuming a biased die
Probability of observing the data is 1.96 times
greater under the hypothesis that the die is
biased
70
Bayesian inference is based upon the POSTERIOR
probability of a hypothesis
The posterior probability that the die is biased
can be obtained using Bayes's formula
Our opinion of the die being biased changed from
0.1 to 0.179 after observing a 4 and a 6
Are priors a strength or a weakness of the method?
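
The numbers quoted above can be reproduced with a few lines of arithmetic. The transcript lost the definition of the biased die, so the sketch assumes one in which face i comes up with probability i/21; that assumption is not stated in the surviving text, but it yields the quoted likelihood ratio of 1.96 and posterior of 0.179. All names are illustrative.

```python
from fractions import Fraction

# Prior: 90 fair and 10 biased dice in the box.
prior = {"fair": Fraction(90, 100), "biased": Fraction(10, 100)}

def face_probability(kind, face):
    """Probability of rolling `face` with a die of the given kind."""
    if kind == "fair":
        return Fraction(1, 6)
    return Fraction(face, 21)   # ASSUMED bias: P(face i) = i / 21

rolls = (4, 6)
likelihood = {}
for kind in prior:
    prob = Fraction(1)
    for face in rolls:
        prob *= face_probability(kind, face)
    likelihood[kind] = prob

# Bayes's formula: posterior = prior * likelihood / total evidence.
evidence = sum(prior[kind] * likelihood[kind] for kind in prior)
posterior_biased = prior["biased"] * likelihood["biased"] / evidence
ratio = likelihood["biased"] / likelihood["fair"]

print(round(float(ratio), 2))             # 1.96
print(round(float(posterior_biased), 3))  # 0.179
```

Changing the prior (for example, to a box with equal numbers of fair and biased dice) changes the posterior while leaving the likelihood ratio untouched, which is one way to explore the closing question about priors.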
71
Bayesian inference of phylogeny
Based upon the posterior probability of a
phylogenetic tree (τ)
Posterior probability of the ith tree, can be
interpreted as the probability that this tree is
the correct tree given the data
prior probability of the ith tree, typically
The summation in the denominator is over all B
trees possible for s species (taxa)
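
The formula itself did not survive in this transcript. In the standard form of Bayes's rule for trees, with notation that is assumed here ($\tau_i$ for the ith tree, $X$ for the data, $B(s)$ for the number of possible trees for s taxa), it would read:

```latex
f(\tau_i \mid X) =
  \frac{f(X \mid \tau_i)\, f(\tau_i)}
       {\sum_{j=1}^{B(s)} f(X \mid \tau_j)\, f(\tau_j)}
```

Here $f(\tau_i \mid X)$ is the posterior, $f(X \mid \tau_i)$ the likelihood and $f(\tau_i)$ the prior. A flat prior, $f(\tau_i) = 1/B(s)$, is a common choice and is probably what the truncated "typically" refers to, but that is an inference rather than something the surviving text states.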
72
Markov Chain Monte Carlo
Typically, the posterior probability
cannot be calculated analytically. However,
the posterior probability of phylogenies can be
approximated by sampling trees from the posterior
probability distribution. MCMC can be used to
sample phylogenies according to their posterior
probabilities. Let Ψ
be a specific tree, combination of branch
lengths, substitution parameters, and gamma shape
parameter. The Metropolis-Hastings (MH) algorithm
is an MCMC algorithm that has been successfully
used to approximate the posterior probability of
trees
73
The MH algorithm constructs a Markov chain that
has as its stationary frequency the posterior
probability of interest (in this case the joint
posterior probability of Ψ). The
current state of the chain is denoted Ψ. A
new state, Ψ', is then proposed. The new state is
accepted with probability R
74
A uniform random number is drawn and if this
number is less than R, then the proposed change is
accepted. Otherwise the chain remains in the
original state. This process of proposing a new
state, calculating the acceptance probability,
and accepting or rejecting the move is repeated
thousands of times. The sequence of states
visited forms the Markov chain. The chain is
sampled after it has reached stationarity and the
sampled trees represent the posterior probability
distribution
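
The propose, accept or reject loop is easier to see on a toy problem than on tree space. The sketch below runs it on the two-hypothesis dice example, again assuming a biased die with P(face i) = i/21, which the surviving text does not state. The acceptance probability is written in the general Metropolis-Hastings form, $R = \min\left(1,\ \frac{f(X \mid \Psi')\, f(\Psi')}{f(X \mid \Psi)\, f(\Psi)} \times \frac{q(\Psi \mid \Psi')}{q(\Psi' \mid \Psi)}\right)$, where q is the proposal distribution; the proposal here simply switches hypothesis, so the q terms cancel. Names, chain length and seed are all illustrative choices.

```python
import random

PRIOR = {"fair": 0.9, "biased": 0.1}
# Likelihood of rolling a 4 and then a 6 under each hypothesis.
LIKELIHOOD = {"fair": (1 / 6) ** 2,
              "biased": (4 / 21) * (6 / 21)}   # ASSUMED bias: P(i) = i/21

def metropolis_hastings(n_steps, burn_in, seed=1):
    """Sample hypotheses in proportion to their posterior probability."""
    rng = random.Random(seed)
    state = "fair"
    samples = []
    for step in range(n_steps):
        # Symmetric proposal: always propose the other hypothesis.
        proposal = "biased" if state == "fair" else "fair"
        r = min(1.0, (LIKELIHOOD[proposal] * PRIOR[proposal])
                     / (LIKELIHOOD[state] * PRIOR[state]))
        # Accept if a uniform random number is less than R.
        if rng.random() < r:
            state = proposal
        # Only keep samples taken after the burn-in period.
        if step >= burn_in:
            samples.append(state)
    return samples

samples = metropolis_hastings(n_steps=200_000, burn_in=1_000)
estimate = samples.count("biased") / len(samples)
print(round(estimate, 3))
```

The proportion of samples in the "biased" state should settle near the analytical posterior of 0.179. In a phylogenetic analysis the state would instead be a tree with its branch lengths and model parameters, and the proposals would be small changes to those.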
75
The proportion of times a single tree is found
among these samples is the posterior probability
of that tree. A majority-rule consensus can be
derived from the sample, and the proportions
obtained for each clade are an approximation of
the posterior probability of the clades
