Phylogenetic inference - PowerPoint PPT Presentation

About This Presentation
Title:

Phylogenetic inference

Slides: 76
Provided by: Guille83

Transcript and Presenter's Notes

Title: Phylogenetic inference


1
Phylogenetic inference
2
Phylogenetic inference
  • The question is how do we find the best tree?
  • Many methods using many techniques, many software
    packages
  • New improved methods and tests appear in the
    literature constantly
  • WHY SO MANY?
  • For molecular data, the trend is towards methods
    that use explicit models built on realistic
    assumptions

3
Classification of phylogenetic methods
4
Discrete data and Distances
5
Algorithms versus optimality criteria
  • Phylogenetic inference is an estimation procedure
    (best estimate)
  • Only have information about the contemporary
    molecules (and organisms)
  • How do we choose a tree from the set of all
    possible trees?
  • Two basic approaches
  • Algorithmic: just follow a sequence of steps
  • Optimality criterion: how do we compare trees?

6
Algorithmic methods
  • Combine tree inference and the definition of a
    preferred tree into a single statement
  • Include UPGMA and all forms of pair-group cluster
    analysis, and neighbor joining
  • Computationally fast because they go straight to
    the final solution
  • The task of finding an optimal tree cannot be
    separated from that of evaluating a specific tree

7
Optimality criteria
  • Two logical steps
  • Define an optimality criterion (objective
    function for evaluating trees)
  • Find the tree(s) with the best value for the
    objective function (may use algorithms)
  • Evolutionary assumptions made in the first step
    are decoupled from the computations involved in
    the second step
  • Price for logical clarity is that these methods
    can be very slow

8
Use of algorithms
  • Different use in the two approaches
  • In purely algorithmic methods, the algorithm
    defines the tree selection criterion and is
    fundamental
  • In criterion-based methods, algorithms are merely
    tools used in evaluating and searching for
    optimal trees
  • Reliability of the tree?

9
Optimality criteria
  • Parsimony: select the tree that minimizes the
    total tree length (number of steps or character
    transformations required to explain a given set
    of data)
  • Methods based on models of evolutionary change:
    assumptions are made explicit.
  • Is parsimony's model-free nature an advantage
    or a disadvantage?
  • What are the assumptions for parsimony
    (consistency?)

10
Optimality methods (cont.)
  • Maximum likelihood: evaluates the probability
    that a proposed model of evolution and the
    hypothesized history could give rise to the
    observed data (attempts to estimate the actual
    amount of change or history)
  • Usually more consistent; estimates have lower
    variance than other methods; robust to violations
    of assumptions

11
Optimality criteria (cont.)
  • Pairwise distance methods: also minimize the
    effect of multiple hits when an appropriate
    model is used to estimate the true evolutionary
    distance between two sequences (less desirable
    than full ML)
  • Additive and ultrametric distances can be fitted
    to a tree such that all pairwise distances are
    equal to the sum of the branches along the path
    connecting them in the tree

12
TREES AND DISTANCES
  • Measures of sequence similarity are used to
    estimate evolutionary changes that occurred
    between 2 sequences
  • These measures quantify the evolutionary distance
    between the 2 sequences
  • Trees can represent these distances.
  • This has motivated a range of tree building
    techniques to convert pairwise distances into
    evolutionary trees.

13
A distance measure can be used to build
phylogenies IF it satisfies some basic
requirements...
  • It must be a metric
  • It must be additive

14
1. Metric Distances
  • Let d(a,b) be the distance between two sequences
    a and b
  • A distance d is a metric if it satisfies 4
    conditions
  • d(a,b) ≥ 0 (non-negativity)
  • d(a,b) = d(b,a) (symmetry)
  • d(a,c) ≤ d(a,b) + d(b,c) (triangle
    inequality)
  • d(a,b) = 0 if and only if a = b
    (distinctness)
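
The four conditions can be checked mechanically on a distance matrix. The sketch below is only an illustration: the function name `is_metric`, the list-of-lists matrix layout, and the tolerance `tol` are assumptions, and "a = b" is interpreted as "same row index".

```python
def is_metric(d, tol=1e-9):
    """Check the four metric conditions on a square distance matrix
    given as a list of lists (hypothetical layout for this sketch)."""
    n = len(d)
    for a in range(n):
        for b in range(n):
            if d[a][b] < -tol:                       # non-negativity
                return False
            if abs(d[a][b] - d[b][a]) > tol:         # symmetry
                return False
            if (abs(d[a][b]) <= tol) != (a == b):    # distinctness
                return False
            for c in range(n):
                if d[a][c] > d[a][b] + d[b][c] + tol:  # triangle inequality
                    return False
    return True

# Example matrices (made up for illustration)
good = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]   # 5 > 1 + 1 breaks the triangle inequality
print(is_metric(good), is_metric(bad))
```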

15
Triangle inequality: d(a,c) ≤ d(a,b) + d(b,c)
The distance between any pair of sequences must
be no greater than the sum of the distances from
each of those sequences to a third sequence
16
Of these 4 conditions...
  1. non-negativity
  2. symmetry
  3. triangle inequality
  4. distinctness
  • Conditions 1, 2 and 4 are generally true for most
    measures of sequence dissimilarity calculated
    directly from sequences.
  • Indirect measures of sequence dissimilarity, such
    as DNA-DNA hybridization and immunological
    distance, often fail condition 2 (symmetry)

17
A distance is ultrametric if it satisfies the
additional three-point condition
  • d(a,b) ≤ max[d(a,c), d(b,c)]
  • This implies that 2 of the 3 pairwise distances
    between 3 taxa are equal, and at least as large
    as the third, defining an isosceles triangle.
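
The "isosceles triangle" reading of the three-point condition lends itself to a direct check: for every triple of taxa, the two largest of the three pairwise distances must be equal. A minimal sketch, assuming a list-of-lists matrix and a hypothetical function name `is_ultrametric`:

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple of taxa the two largest
    pairwise distances are equal (isosceles triangle)."""
    for a, b, c in combinations(range(len(d)), 3):
        smallest, middle, largest = sorted((d[a][b], d[a][c], d[b][c]))
        if abs(largest - middle) > tol:
            return False
    return True

# Example matrices (made up for illustration)
clocklike = [[0, 2, 6], [2, 0, 6], [6, 6, 0]]
not_clocklike = [[0, 2, 5], [2, 0, 6], [5, 6, 0]]
print(is_ultrametric(clocklike), is_ultrametric(not_clocklike))
```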

18
Ultrametric distances have the very useful
property of implying a constant rate of evolution
  • In fact we have a test for the molecular clock
    called the Relative Rate Test which is simply
    a test of how far the pairwise distances between
    3 sequences depart from Ultrametricity.
  • If distances between sequences are Ultrametric
    then the most similar sequences are also the most
    closely related

19
2. Additive Distances
  • Being a metric is necessary but not sufficient
    to ensure that a measure of change is valid and
    fits a tree exactly.
  • It must also satisfy the four-point condition
  • d(a,b) + d(c,d) ≤ max[d(a,c) + d(b,d),
    d(a,d) + d(b,c)]
  • This is equivalent to requiring that of the three
    sums d(a,b) + d(c,d), d(a,c) + d(b,d) and d(a,d) +
    d(b,c) the two largest are equal.

20
Of the three sums...
d(a,b) + d(c,d) = Ae + ef + fB + Ce + ef + fD
d(a,c) + d(b,d) = Ae + eC + Bf + fD
d(a,d) + d(b,c) = Ae + ef + fD + Ce + ef + fB
...the two largest are equal.
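
The four-point condition can be tested the same way, one quartet at a time. In the sketch below the function name `is_additive` and the example branch lengths (Ae = 1, Ce = 2, ef = 3, Bf = 4, Df = 5, with A and C on one side of the internal branch ef) are assumptions chosen for illustration, not values from the slides.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Four-point condition: for every quartet, the two largest of the
    three pairwise sums must be equal."""
    for a, b, c, e in combinations(range(len(d)), 4):
        sums = sorted((d[a][b] + d[c][e],
                       d[a][c] + d[b][e],
                       d[a][e] + d[b][c]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Path-length distances on the hypothetical tree, taxa ordered A, B, C, D
tree_like = [[0, 8, 3, 9],
             [8, 0, 9, 9],
             [3, 9, 0, 10],
             [9, 9, 10, 0]]
# Same matrix with d(A,B) changed from 8 to 7, so no tree fits exactly
perturbed = [[0, 7, 3, 9],
             [7, 0, 9, 9],
             [3, 9, 0, 10],
             [9, 9, 10, 0]]
print(is_additive(tree_like), is_additive(perturbed))
```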
21
An additive distance measure defines a tree...
  • Sequence d is equidistant from all other
    sequences
  • Sequence c is equidistant from a and b
  • For any 2 sequences the value in the distance
    matrix corresponds to the sum of the branch
    lengths along the path between the 2 sequences on
    the tree.
  • The presented tree is ultrametric (draw the
    isosceles triangle)

22
When distances are additive but not ultrametric,
they can still be represented by a tree:
an additive tree, which represents additive
distances exactly.
23
  • Notice that sequences b and c are the most
    similar (3), but ARE NOT the most closely related
  • Similarity and evolutionary relationship will
    only coincide exactly if the distances are
    ultrametric
  • While this tree is additive, it is not ultrametric

24
  • Observed distances are obtained directly from the
    sequences themselves and patristic distances
    from a tree
  • For additive and ultrametric distances, the
    observed and tree distances match exactly

For real data this is rarely the case, indicating
that observed distances cannot be represented
exactly by a tree.
25
The discrepancy between observed and tree
distances can be used as an indicator of how well
observed distances fit a tree-like
representation.
26
Distance methods
  • Experimentally derived distances are estimates of
    true distances
  • We want to fit them to a mathematical model
    (additive tree) and find the optimal value for
    the adjustable parameters
  • Branching pattern
  • Branch lengths
  • Some methods based on this optimality criterion
    are Fitch-Margoliash and minimum evolution
  • Other methods fit trees to distances
    algorithmically (NJ, UPGMA)

27
UPGMA: an algorithmic method
  • Cluster analysis: unweighted pair group method
    using arithmetic averages (Sneath and Sokal 1973)
  • Assumes ultrametricity

28
UPGMA example
  • 1. Given a matrix of pairwise distances, find the
    clusters (taxa) i and j such that dij is the
    minimum value in the table
  • 2. Define the depth of the branching between i and
    j (lij) to be dij/2
  • 3. If i and j were the last two clusters, the tree
    is complete. Otherwise, create a new cluster
    called u
  • 4. Define the distance from u to each other
    cluster k (with k ≠ i or j) to be the average of
    the distances dki and dkj
  • 5. Go back to step 1 with one fewer cluster:
    clusters i and j have been eliminated, and
    cluster u has been added
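
The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the function name `upgma`, the nested-tuple output `(left, right, height)`, and the input layout are all hypothetical. The average in step 4 is weighted by cluster size, which is the usual definition of UPGMA; when both merged clusters are single taxa it reduces to the plain average of dki and dkj.

```python
def upgma(names, d):
    """names: list of taxon labels; d: symmetric distance matrix.
    Returns a nested tuple (left, right, height)."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {frozenset((i, j)): d[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)        # step 1: smallest distance
        i, j = sorted(pair)
        height = dist[pair] / 2               # step 2: branching depth dij/2
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        updated = {}
        for k in clusters:                    # step 4: distances to new cluster u
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            updated[frozenset((k, next_id))] = (n_i * d_ki + n_j * d_kj) / (n_i + n_j)
        del dist[pair]
        dist.update(updated)
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)  # step 3: cluster u
        next_id += 1                          # step 5: repeat with one fewer cluster
    (tree, _size), = clusters.values()
    return tree

# Made-up ultrametric example
example_tree = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
print(example_tree)
```

In the example, A and B join first at depth 1.0, and C joins that cluster at depth 3.0.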

29
Distance Matrices and phenogram
30
Classification of phylogenetic methods
31
Parsimony methods
  • The most widely-used method, familiar notion in
    science (simplicity)
  • Shared attributes among taxa are inherited from
    common ancestors
  • When character conflicts occur, ad hoc hypotheses
    cannot be avoided if you want to explain all the
    data, and assumptions of homoplasy must be invoked

32
Parsimony
  • From the set of all possible trees, find all
    trees τ such that L(τ) is minimal, where
    L(τ) = Σ(k=1..B) Σ(j=1..N) wj · diff(xk'j, xk''j)
  • B is the number of branches
  • N is the number of characters
  • k' and k'' are the two nodes incident to each
    branch k
  • xk'j and xk''j represent either elements of the
    input matrix or optimal character assignments
    made to internal nodes
  • diff(y,z) is a function specifying the cost of
    transformation from y to z along any branch (for
    unrooted trees, diff(y,z) = diff(z,y))
  • The coefficient wj is the weight assigned to each
    character j

33
Parsimony
From the set of all possible trees, find all
trees τ such that L(τ) is minimal
  • Two problems
  • How do you actually implement the objective
    function to evaluate a particular tree?
  • How do you find all minimal trees when you have
    many taxa?
  • Objective function and tree-searches can be
    formalized into algorithms

34
Fitch and Wagner Parsimony
  • Simplest parsimony methods
  • Fitch: unordered multistate characters
  • Wagner: binary, ordered multistate and continuous
  • In describing the algorithm we will consider a
    single character (j) in isolation (independence)
  • The tree (strictly bifurcating) is a given, just
    any tree we wish to evaluate (tree-search
    algorithms, later), we only need to assign an
    arbitrary root (any taxon), denoted r

35
Fitch parsimony: the algorithm
  • Step 1: To each terminal node i (including the
    one at the root), assign a state set Si
    containing the character state assigned to the
    corresponding taxon in the input data matrix (Xij)

36
Post-order traversal
  • Step 2: Visit an internal node k for which a
    state set Sk has not been defined but for which
    the state sets of k's two immediate descendants
    (Si and Sj) have been defined. Assign to k a
    state set Sk according to the following rules

37
  • If Si ∩ Sj ≠ ∅, then Sk = Si ∩ Sj
  • Otherwise Sk = Si ∪ Sj, and increase the length
    by 1

The intersection of Sb and Sc is empty, so we use
the union as Sx and increase L by 1 (L = 1)
38
Post-order traversal
  • Step 3: If node k is located at the basal fork of
    the tree (i.e., the immediate descendant of the
    terminal node placed at the root), the traversal
    has been completed; proceed to step 4. Otherwise
    return to step 2.

39
Return to step 2
The intersection of Sx and Sd is not empty, so we
use the intersection as Sy and do not increase
L (L remains 1)
40
Return to step 2
The intersection of Sy and Se is empty, so we use
the union as Sz and increase L by 1 (now L = 2)
41
Post-order traversal
  • Step 4: If the state assigned to the terminal
    node at the root of the tree (Xr) is not
    contained in the state set just assigned to the
    node at the basal fork of the tree (Sk), increase
    the tree length by 1

In this case state C is not included in state set
{A,G}, so we increase L by 1 (now L = 3). This
procedure concludes the evaluation but does not
assign optimal character states to the internal
nodes
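
The post-order pass can be written as a short recursive function. The sketch below is an illustration only: the function name, the nested-tuple tree encoding, and in particular the tip states (b = A, c = C, d = A, e = G, root taxon = C) are assumptions chosen so that the state sets match the ones quoted on the slides; the original figure is not available in this transcript.

```python
def fitch_length(tree, tip_states, root_state):
    """Fitch post-order pass for one unordered character.
    tree: nested 2-tuples of tip names, hanging below the root taxon.
    Returns (tree length, state set at the basal fork)."""
    length = 0

    def post_order(node):
        nonlocal length
        if isinstance(node, str):              # step 1: terminal node
            return {tip_states[node]}
        s_i, s_j = post_order(node[0]), post_order(node[1])
        common = s_i & s_j                     # step 2: intersection if non-empty
        if common:
            return common
        length += 1                            # otherwise union, and one more step
        return s_i | s_j

    s_basal = post_order(tree)
    if root_state not in s_basal:              # step 4: compare with the root taxon
        length += 1
    return length, s_basal

# Hypothetical tip states reproducing Sx = {A,C}, Sy = {A}, Sz = {A,G}
tips = {"b": "A", "c": "C", "d": "A", "e": "G"}
tree = ((("b", "c"), "d"), "e")
result = fitch_length(tree, tips, root_state="C")
print(result)
```

With these assumed states the function reports a length of 3 and the basal state set {A,G}, in line with the walkthrough above.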
42
Pre-order traversal
  • To obtain a most parsimonious reconstruction
    (MPR) and assign optimal character states to each
    node we need to make a second pass over the tree
  • This time from the root to the tips

43
Pre-order traversal
  • Visit an internal node k for which an optimal
    state assignment Xk has not yet been made but for
    which an assignment has been made to k's
    immediate ancestor, denoted m
  • If Xm is contained in the state set assigned to k
    in the first pass (Sk), assign this state to k as
    well. Otherwise arbitrarily assign any state from
    Sk to k

44
Pre-order traversal
  • Two MPRs exist for this case, according to which
    state we arbitrarily choose from the set {A,G} at
    the basal fork
  • A third option is to assign C to all three
    internal nodes
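
A hedged sketch of the second (pre-order) pass follows. The tip states are the same hypothetical ones used to reproduce the state sets {A,C}, {A} and {A,G}; the `pick` parameter is an assumption of this sketch that stands in for the "arbitrary" choice, so that both reconstructions can be shown. As the slide notes, this simple pass does not recover the third reconstruction (C at all internal nodes).

```python
def fitch_mpr(tree, tip_states, root_state, pick=min):
    """Return {internal node: assigned state} for a nested-tuple tree."""
    state_sets = {}

    def post_order(node):                       # first pass: state sets
        if isinstance(node, str):
            s = {tip_states[node]}
        else:
            a, b = post_order(node[0]), post_order(node[1])
            s = (a & b) or (a | b)              # intersection, else union
        state_sets[node] = s
        return s

    assigned = {}

    def pre_order(node, ancestor_state):        # second pass: root to tips
        if isinstance(node, str):
            return
        s = state_sets[node]
        state = ancestor_state if ancestor_state in s else pick(s)
        assigned[node] = state
        pre_order(node[0], state)
        pre_order(node[1], state)

    post_order(tree)
    pre_order(tree, root_state)
    return assigned

tips = {"b": "A", "c": "C", "d": "A", "e": "G"}   # hypothetical tip states
x = ("b", "c")
y = (x, "d")
z = (y, "e")
mpr_a = fitch_mpr(z, tips, "C", pick=min)   # chooses A at the basal fork
mpr_g = fitch_mpr(z, tips, "C", pick=max)   # chooses G at the basal fork
print(mpr_a[z], mpr_g[z])
```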

45
Other parsimony variants
  • Dollo parsimony: every derived character must be
    uniquely derived (originate only once in the
    tree)
  • Homoplasy: only reversals are allowed (no
    parallelism or convergence)
  • In practice, Dollo parsimony does not require
    inclusion of hypothetical ancestors, just
    character polarity (unrooted Dollo)
  • Convenient for restriction-site characters
    (easier to lose than to gain a site)

46
Dollo parsimony and RFLP data
Relaxed Dollo criterion, may be applied using
generalized parsimony
47
Generalized Parsimony
  • All parsimony variants can be subsumed into a
    generalized method that assigns a cost for each
    possible transformation
  • Costs are represented in an m-by-m cost matrix S,
    where each element Sij represents the increase in
    tree length due to a transformation from state i
    to j
  • The cost of each transformation (weight) can be
    determined a priori (e.g. for RFLPs or for
    transition/transversion changes) or a posteriori
    (using the same data, e.g. successive
    approximations method)
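
One common way to evaluate a tree under an arbitrary cost matrix is Sankoff-style dynamic programming. The slides do not describe this procedure, so the sketch below is offered only as an illustration; the function name, the tree encoding, the tip states, and the transition/transversion costs (1 and 2) are all assumptions.

```python
INF = float("inf")

def sankoff_length(tree, tip_states, alphabet, cost):
    """Minimum total cost of one character on a rooted nested-tuple tree,
    given cost[i][j] = cost of a transformation from state i to j."""
    def cost_vector(node):
        if isinstance(node, str):
            return {s: (0 if s == tip_states[node] else INF) for s in alphabet}
        children = [cost_vector(child) for child in node]
        return {s: sum(min(cost[s][t] + vec[t] for t in alphabet)
                       for vec in children)
                for s in alphabet}
    return min(cost_vector(tree).values())

alphabet = "ACGT"
purines = {"A", "G"}

def ti_tv(a, b):
    """Hypothetical a priori weights: transition 1, transversion 2."""
    if a == b:
        return 0
    return 1 if (a in purines) == (b in purines) else 2

unit = {a: {b: int(a != b) for b in alphabet} for a in alphabet}
weighted = {a: {b: ti_tv(a, b) for b in alphabet} for a in alphabet}

tips = {"b": "A", "c": "C", "d": "A", "e": "G"}   # hypothetical tip states
tree = ((("b", "c"), "d"), "e")
print(sankoff_length(tree, tips, alphabet, unit),
      sankoff_length(tree, tips, alphabet, weighted))
```

With unit costs this reduces to counting steps (2 on this four-tip tree); with the weighted matrix the A/C transversion counts double, giving 3.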

48
Generalized Parsimony Cost matrices
49
Protein parsimony
  • A 20x20 matrix specifies the cost for each
    possible transformation
  • The matrix may be based on the genetic code
    (PROTPARS matrix) and/or the biochemical
    properties of the amino acids themselves (Dayhoff
    matrices)

50
Parsimony
  • From the set of all possible trees, find all
    trees τ such that L(τ) is minimal
  • B is the number of branches
  • N is the number of characters
  • diff(y,z) is a function specifying the cost of
    transformation from y to z along any branch (for
    unrooted trees, diff(y,z) = diff(z,y)); given by
    the cost matrix
  • The coefficient w is the weight assigned to each
    character (weighted parsimony)
  • A priori or a posteriori

51
Difference in perspective: MP and ML
  • Parsimony seeks solutions that minimize the
    amount of change required to explain the data
    (underestimates superimposed changes)
  • ML attempts to estimate the actual amount of
    change (by specifying the evolutionary model that
    will account for the data with the highest
    likelihood)
  • Methods that incorporate models of evolutionary
    change can make more efficient use of the data

52
Searching for optimal trees
  • Methods with explicit optimality criteria:
    parsimony, maximum likelihood, additive-tree
    distance
  • These separate the problem of evaluating the tree
    from that of finding the optimal tree(s)
  • Can we evaluate all possible trees for a
    particular problem?

53
Searching for optimal trees
  • For small to moderate data sets, with as many as
    8-20 taxa, we can use exact methods
  • Exact methods guarantee the discovery of all
    optimal trees
  • Exact methods include
  • Exhaustive search
  • Branch-and-bound search

54
How many trees?
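
To give a sense of scale for this question: the standard count of strictly bifurcating unrooted trees on n labelled taxa is (2n-5)!! = 3 × 5 × ... × (2n-5), and the count of rooted trees on n taxa equals the unrooted count for n+1. The function names below are hypothetical.

```python
def num_unrooted_trees(n):
    """(2n-5)!! strictly bifurcating unrooted trees for n >= 3 labelled taxa."""
    count = 1
    for k in range(3, 2 * n - 4, 2):   # odd factors 3, 5, ..., 2n-5
        count *= k
    return count

def num_rooted_trees(n):
    """(2n-3)!! rooted trees: rooting adds one more place to attach."""
    return num_unrooted_trees(n + 1)

for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Ten taxa already give about two million unrooted trees, which is why exhaustive enumeration stops being practical so quickly.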
55
Exhaustive search: enumerate all possible trees
56
Branch-and-bound search
  • Does not require exhaustive search and yet
    provides an exact solution.
  • 1. Traverse a search tree in a depth-first sequence
    (to get an initial tree; it could be a random
    tree, but it is better to use heuristics to make
    branch-and-bound more efficient)
  • 2. Use this tree's score as the upper bound (L) on
    the optimal value of the chosen criterion.

57
Branch-and-bound
3. Move along a path towards the tips and evaluate
trees. If a tree's score is > L, dispense with the
rest of that path.
4. If a better tree is found, with score L' < L, we
have improved the upper bound on the score of the
optimal tree. This may enable us to dispense with
other paths and finish the search more quickly
58
Approximate methods
  • For larger data sets computing time becomes
    prohibitive and we only explore some subset of
    all possible trees (hoping that the optimal trees
    will be found in the subset explored)
  • Heuristic approaches sacrifice the guarantee of
    optimality in favor of reduced computer time
  • Use hill climbing methods. Initial tree starts
    the process, then we seek to improve its score
  • When we can find no way to further improve the
    score, we stop. We don't know if we reached a
    local or a global optimum

59
Initial trees
  • May be obtained by stepwise addition, the most
    commonly used method
  • Similar to exhaustive search but evaluate trees
    at every step, each time you add a new taxon and
    only follow the path derived from the optimal
    tree
  • Which taxa do you choose first? Which do you
    connect next?
  • These are greedy algorithms

60
Stepwise addition
61
  • Initial trees also may be obtained by star
    decomposition, another greedy algorithm

62
Branch swapping
  • To improve the initial estimate we can perform
    sets of predefined rearrangements on the tree
  • Any of these rearrangements amounts to a stab in
    the dark
  • Globally optimal trees may be several
    rearrangements away from the starting tree
  • If a better tree is found, a new round of
    rearrangements is then performed in the new tree
  • Several branch-swapping algorithms are available

63
Branch swapping by tree bisection and
reconnection (TBR)
1. The tree is bisected along a branch, yielding
two disjoint subtrees
2. The subtrees are reconnected by joining a pair
of branches, one from each subtree
3. All possible bisections and pairwise
reconnections are evaluated
64
Branch swapping by subtree pruning and
regrafting (SPR)
1. A subtree is pruned from the tree (e.g. A,B)
2. The subtree is then regrafted to a different
location on the tree
3. All possible subtree removals and reattachment
points are evaluated
65
Branch swapping by nearest-neighbor interchanges
(NNI)
1. Each interior branch of the tree defines a
local region of four subtrees
2. Interchanging a subtree on one side of the
branch with one from the other constitutes an NNI
3. Two such rearrangements are possible for each
interior branch (all interior branches are
swapped)
66
Landscapes and the problem of islands of trees
67
Bayesian Inference of Phylogenies
  • Closely related to ML methods, differing only in
    the use of a PRIOR DISTRIBUTION of the quantity
    being inferred (which would typically be the
    tree)
  • Use of a prior enables us to interpret the result
    as the distribution of the tree given the data
  • Bayes's result was published in 1763, and
    controversy among statisticians over its
    appropriateness is almost that old
  • Recently, the introduction of Markov chain Monte
    Carlo (MCMC) methods has given a new impetus to
    Bayesian inference
  • The latest silver bullet for phylogenetic
    analysis?

68
Simple example of Bayesian inference
Box with 90 fair and 10 biased dice
Take a die at random from the box and roll it
twice: you get a 4 and a 6. What is the probability
that the die is biased?
69
A Bayesian analysis combines one's prior beliefs
about the probability of a hypothesis with its
likelihood
Likelihood assuming a fair die
Likelihood assuming a biased die
Probability of observing the data is 1.96 times
greater under the hypothesis that the die is
biased
70
Bayesian inference is based upon the POSTERIOR
probability of a hypothesis
The posterior probability that the die is biased
can be obtained using Bayes's formula
Our opinion of the die being biased changed from
0.1 to 0.179 after observing a 4 and a 6.
Are priors a strength or a weakness of the method?
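
The arithmetic behind these numbers can be reproduced with exact fractions. The transcript does not preserve the face probabilities of the biased die, so the sketch below ASSUMES that face i comes up with probability i/21; that assumption is chosen because it reproduces both the likelihood ratio of 1.96 and the posterior of 0.179 quoted above. All variable names are illustrative.

```python
from fractions import Fraction

prior_biased = Fraction(10, 100)        # 10 biased dice out of 100
prior_fair = 1 - prior_biased

fair = {face: Fraction(1, 6) for face in range(1, 7)}
biased = {face: Fraction(face, 21) for face in range(1, 7)}  # ASSUMPTION, see above

def likelihood(die, rolls):
    """Probability of an ordered sequence of independent rolls."""
    p = Fraction(1)
    for roll in rolls:
        p *= die[roll]
    return p

rolls = (4, 6)
lik_fair = likelihood(fair, rolls)
lik_biased = likelihood(biased, rolls)
ratio = lik_biased / lik_fair

# Bayes's formula: posterior = likelihood * prior / sum over both hypotheses
posterior_biased = (lik_biased * prior_biased) / (
    lik_biased * prior_biased + lik_fair * prior_fair)

print(float(ratio), float(posterior_biased))
```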
71
Bayesian inference of phylogeny
Based upon the posterior probability of a
phylogenetic tree (τ):
Pr(τi | X) = Pr(X | τi) Pr(τi) / Σj Pr(X | τj) Pr(τj)
Pr(τi | X): posterior probability of the ith tree;
can be interpreted as the probability that this
tree is the correct tree given the data (X)
Pr(τi): prior probability of the ith tree, typically
The summation in the denominator is over all B
trees possible for s species (taxa)
72
Markov Chain Monte Carlo
Typically, the posterior probability
cannot be calculated analytically. However,
the posterior probability of phylogenies can be
approximated by sampling trees from the posterior
probability distribution. MCMC can be used to
sample phylogenies according to their posterior
probabilities. Let Ψ
be a specific tree, combination of branch
lengths, substitution parameters, and gamma shape
parameter. The Metropolis-Hastings (MH) algorithm
is an MCMC algorithm
that has been successfully used to approximate
the posterior probability of trees
73
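The MH algorithm constructs a Markov chain that
has as its stationary frequency the posterior
probability of interest (in this case the joint
posterior probability of the tree and its
parameters, Ψ). The current state of the chain is
denoted Ψ. A new state, Ψ', is then proposed. The
new state is accepted with probability
R = min(1, [Pr(X | Ψ') Pr(Ψ')] / [Pr(X | Ψ) Pr(Ψ)]
× [q(Ψ | Ψ') / q(Ψ' | Ψ)]),
where q(Ψ' | Ψ) is the probability of proposing Ψ'
from Ψ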
74
A uniform random number between 0 and 1 is drawn
and if this number is < R, then the proposed change
is accepted. Otherwise the chain remains in the
original state. This process of proposing a new
state, calculating the acceptance probability,
and accepting or rejecting the move is repeated
thousands of times. The sequence of states
visited forms the Markov chain. The chain is
sampled after it has reached stationarity, and the
sampled trees represent the posterior probability
distribution
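
The accept/reject loop can be illustrated on a toy problem far smaller than tree space. The sketch below runs it on the two-hypothesis die example (fair versus biased), with the same ASSUMED biased die (face i with probability i/21); the function name, the burn-in length, and the fixed seed are assumptions. Only unnormalized posteriors (likelihood × prior) are needed, because the normalizing constant cancels in the ratio R. The proposal is symmetric, so the proposal ratio is 1.

```python
import random

def metropolis_hastings(unnormalized, start, steps, burn_in=1000, seed=0):
    """Sample states in proportion to unnormalized[state]."""
    rng = random.Random(seed)
    states = list(unnormalized)
    current = start
    samples = []
    for step in range(steps + burn_in):
        # Propose one of the other states uniformly (symmetric proposal)
        proposal = rng.choice([s for s in states if s != current])
        r = min(1.0, unnormalized[proposal] / unnormalized[current])
        if rng.random() < r:            # accept with probability R
            current = proposal
        if step >= burn_in:             # sample only after the burn-in period
            samples.append(current)
    return samples

# likelihood x prior for rolls (4, 6); the biased die is an assumption
target = {"fair": (1 / 36) * 0.9, "biased": (4 / 21) * (6 / 21) * 0.1}
samples = metropolis_hastings(target, start="fair", steps=100_000)
freq_biased = samples.count("biased") / len(samples)
print(round(freq_biased, 3))
```

The fraction of samples in the "biased" state should settle close to the exact posterior of about 0.179, which is the same idea as reading tree posteriors off the sampled chain.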
75
The proportion of times a single tree is found
among these samples is the posterior probability of
that tree. A majority-rule consensus can be
derived from the sample, and the proportions
obtained for each clade are an approximation of
the posterior probability of the clades