Title: Comparative Genomics
1Comparative Genomics
- Lecture 6
- Phylogenetics I
2Contents
- Evolutionary Trees
- Distance Methods
- UPGMA
- Neighbor-Joining
- Optimality Criterion Methods
- Simple parsimony
- Tree searching
- Weighted Parsimony
- Assessing Phylogenetic Uncertainty
- Bootstrapping
- Decay index and jackknifing
3Tree Terminology
4Types of Trees
Cladogram: branch lengths are arbitrary; only the
topology of the tree is represented.
Phylogram: branch lengths are proportional to the
amount of evolutionary change.
Clock tree (ultrametric or linearized tree):
branch lengths are proportional to time.
5Unrooted and Rooted Trees
[Figure: unrooted tree with ingroup and outgroup taxa; the root is placed on the branch between the two groups]
Many analyses produce unrooted trees. They must
be rooted by locating the branch separating the
ingroup taxa from the outgroup taxa. The ingroup
and outgroup taxa must be circumscribed based on
other data (previous knowledge).
6Tree Rotations
Which trees are identical? Which trees are
identical when the root is removed?
8Infer relationships among three species
9Three possible trees (topologies)
10Data
Aligned sequences
11Distance Methods
- Construct a tree by following a recipe
- First step is to calculate the distances between
sequences
- Typically results in one tree
- Fast (O(n^2)) but sometimes inaccurate
- Examples include UPGMA, WPGMA, and
Neighbor-Joining; also used by PileUp and ClustalW/X
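As a minimal sketch of the first step, the distance between two aligned sequences can be taken as the proportion of sites at which they differ (the p-distance). The choice of p-distance is an assumption here, since the slides do not name a distance measure; the toy alignment reuses the six characters from the parsimony example later in this lecture, and the function names are illustrative.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(alignment):
    """Pairwise p-distances, keyed by an unordered pair of taxon names."""
    return {frozenset((s, t)): p_distance(alignment[s], alignment[t])
            for s, t in combinations(alignment, 2)}

# Toy alignment: one six-character string per taxon.
alignment = {
    "Human":   "ATCGTG",
    "Chimp":   "ATCATT",
    "Gorilla": "ATAACG",
    "Orang":   "CTAGCC",
}
matrix = distance_matrix(alignment)
for pair, d in matrix.items():
    print(sorted(pair), round(d, 3))
```

Real analyses usually correct such raw distances for multiple substitutions at the same site; that correction is left out of this sketch.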
12UPGMA
- Calculate distance matrix
- Find the pair i, j with the smallest distance $D_{ij}$
- Create a new group (ij), which has $n_{(ij)} = n_i + n_j$ members
- Connect i and j to a new node, each with a branch
of length $D_{ij} / 2$
- Compute the distance from the new node to each of
the other taxa / groups k using
$D_{(ij)k} = (n_i / n_{(ij)}) D_{ik} + (n_j / n_{(ij)}) D_{jk}$
- Replace the rows and columns of i and j with a
row and column for (ij)
- If there is more than one group left, find the next
pair with the smallest distance
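The recipe above can be sketched in a few lines of Python. This is an illustrative implementation, not a reference one: the four-taxon distance matrix is hypothetical, and the output format (a Newick-like string) is an assumption. When i or j is itself a group, the new node is still placed at height D_ij / 2, so the branch leading to that group is D_ij / 2 minus the group's own height.

```python
from itertools import combinations

def upgma(names, dist):
    """Build a Newick-like string by UPGMA.

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) -> distance
    """
    size = {n: 1 for n in names}       # n_i: number of members in each group
    height = {n: 0.0 for n in names}   # height of each group's node above the tips
    d = dict(dist)                     # working copy of the distance matrix
    while len(size) > 1:
        # Find the pair (i, j) with the smallest distance D_ij.
        i, j = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        dij = d[frozenset((i, j))]
        # New node at height D_ij / 2; branch = node height minus child height.
        new = f"({i}:{dij / 2 - height[i]:g},{j}:{dij / 2 - height[j]:g})"
        n_new = size[i] + size[j]
        # Size-weighted average distance from (ij) to every other group k.
        for k in size:
            if k not in (i, j):
                d[frozenset((new, k))] = (size[i] * d[frozenset((i, k))]
                                          + size[j] * d[frozenset((j, k))]) / n_new
        # Replace i and j with the merged group (ij).
        for old in (i, j):
            del size[old], height[old]
        size[new], height[new] = n_new, dij / 2
    return next(iter(size)) + ";"

# Hypothetical, perfectly clock-like distances among four taxa.
names = ["A", "B", "C", "D"]
dist = {frozenset(p): v for p, v in {
    ("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6,
    ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 6}.items()}
tree = upgma(names, dist)
print(tree)  # (D:3,(C:2,(A:1,B:1):1):1);
```

Every tip ends up at the same distance (3) from the root, which is the clock-tree property UPGMA assumes.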
13Neighbor Joining
- Similar to UPGMA but produces a non-clock tree
instead of a clock tree by weighting all
distances according to how different each taxon
is from the remaining taxa
15Optimality Criterion Methods
- The criterion associates each tree with a score
- The best tree is the one with the best score
- Implicitly, we need to examine all trees to be
guaranteed to find the tree with the best score
- Computational complexity is high (O(n!) for exact
solutions)
16Parsimony
- The best tree (the most parsimonious or maximum
parsimony tree) is the one requiring the smallest
number of evolutionary changes
- How do you count changes?
17Parsimony
18Parsimony
Most parsimonious solution: requires two
changes, so its length is two steps. This is the solution
requiring the smallest amount of change.
19Parsimony Example
Char.        Human  Chimp  Gorilla  Orang  Tree 1  Tree 2  Tree 3
1            A      A      A        C
2            T      T      T        T
3            C      C      A        A
4            G      A      A        G
5            T      T      C        C
6            G      T      G        C
TREE LENGTH
First draw the possible trees. Then calculate the
number of required changes on each tree. The tree
with the smallest number of changes is the most
parsimonious tree.
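The exercise can be checked with a short sketch. It assumes Fitch's counting method for unordered characters (the slides do not name the counting procedure), and it labels the three trees by their groupings because the slide's own assignment of Tree 1, Tree 2 and Tree 3 to topologies is not recoverable from the text.

```python
def fitch_length(tree, states):
    """Minimum number of changes for one character on a binary tree (Fitch counting)."""
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):            # a tip: its observed state
            return {states[node]}
        left, right = (state_set(child) for child in node)
        if left & right:                     # children agree on some state
            return left & right
        changes += 1                         # otherwise one change is required
        return left | right

    state_set(tree)
    return changes

def tree_length(tree, alignment):
    """Sum of Fitch lengths over all characters."""
    n_chars = len(next(iter(alignment.values())))
    return sum(fitch_length(tree, {t: seq[c] for t, seq in alignment.items()})
               for c in range(n_chars))

# The six characters from the table, one string per taxon.
alignment = {"Human": "ATCGTG", "Chimp": "ATCATT",
             "Gorilla": "ATAACG", "Orang": "CTAGCC"}

# The three possible unrooted trees for four taxa, written as nested pairs.
trees = {
    "((Human,Chimp),(Gorilla,Orang))": (("Human", "Chimp"), ("Gorilla", "Orang")),
    "((Human,Gorilla),(Chimp,Orang))": (("Human", "Gorilla"), ("Chimp", "Orang")),
    "((Human,Orang),(Chimp,Gorilla))": (("Human", "Orang"), ("Chimp", "Gorilla")),
}
lengths = {name: tree_length(t, alignment) for name, t in trees.items()}
for name, length in lengths.items():
    print(name, length)   # 7, 9 and 8 steps respectively
```

With these data the tree grouping Human with Chimp is the most parsimonious, at 7 steps.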
20Tree Quality Measures
Consistency Index
Retention Index
m = minimum number of changes required on any tree
g = maximum number of changes required on any tree
l = actual number of changes required on the tree
Indices vary between 0 and 1. A higher value
indicates a better fit between data and tree.
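The formulas for the two indices did not survive in the slide text. The sketch below assumes the standard definitions, CI = m / l and RI = (g - l) / (g - m), and applies them to the six characters of the parsimony example. The per-character rules for m and g hold for unordered characters; the value l = 7 is the length of the shortest tree for those data, obtained by counting changes as in that example.

```python
from collections import Counter

def min_steps(column):
    """m for one character: number of distinct states minus one."""
    return len(set(column)) - 1

def max_steps(column):
    """g for one character: taxa that do not carry the most common state."""
    return len(column) - max(Counter(column).values())

def consistency_index(m, l):
    return m / l

def retention_index(m, g, l):
    # Undefined when g == m (no character can show homoplasy).
    return (g - l) / (g - m)

alignment = {"Human": "ATCGTG", "Chimp": "ATCATT",
             "Gorilla": "ATAACG", "Orang": "CTAGCC"}
columns = ["".join(col) for col in zip(*alignment.values())]

m = sum(min_steps(c) for c in columns)   # 6
g = sum(max_steps(c) for c in columns)   # 9
l = 7                                    # length of the shortest tree (assumed input)

ci = consistency_index(m, l)
ri = retention_index(m, g, l)
print(round(ci, 3), round(ri, 3))        # 0.857 0.667
```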
21Size of tree space grows very quickly
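To give a sense of the growth, the number of fully bifurcating unrooted trees for n labeled taxa is (2n - 5)!! = 1 x 3 x 5 x ... x (2n - 5), and the number of rooted trees for n taxa equals the number of unrooted trees for n + 1 taxa. The function names below are illustrative.

```python
def n_unrooted_trees(n):
    """Number of unrooted, fully bifurcating trees for n >= 3 labeled taxa."""
    count = 1
    for k in range(3, 2 * n - 4, 2):   # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

def n_rooted_trees(n):
    """Number of rooted, fully bifurcating trees for n >= 2 labeled taxa."""
    return n_unrooted_trees(n + 1)

for n in (3, 4, 5, 10, 20):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

Three taxa give three rooted trees, and ten taxa already give more than two million unrooted trees, which is why exhaustive search is limited to small data sets.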
22Tree Searching Strategies
- Exact Methods (guarantee that you find the best
tree)
- Exhaustive Search: look at all possible trees
- Branch-and-Bound: exclude trees that cannot beat
the best tree you have already found
- Heuristic Methods (no guarantees)
- Stepwise Addition
- Branch Swapping: NNI, SPR, TBR
23Branch-and-Bound
Search Tree for Branch and Bound algorithm.
Search starts at the root and proceeds towards
the tips. Once a tree longer than the shortest
tree already found (the bound) is hit, the search
need not explore more complete trees because they
will be longer. In this way, large sections of
the search tree can be skipped.
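A minimal sketch of this search, under several assumptions: trees are nested pairs, tree length is counted with Fitch's method, taxa are added in the order given, and only one shortest tree is kept (ties are discarded). The first taxon is held fixed at the root position so that each unrooted tree is visited once. With only four taxa the bound has little to prune, so the example mainly shows the mechanics.

```python
def fitch_length(tree, states):
    """Minimum number of changes for one character on a binary tree."""
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (state_set(child) for child in node)
        if left & right:
            return left & right
        changes += 1
        return left | right

    state_set(tree)
    return changes

def tree_length(tree, alignment):
    n_chars = len(next(iter(alignment.values())))
    return sum(fitch_length(tree, {t: seq[c] for t, seq in alignment.items()})
               for c in range(n_chars))

def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (tree, taxon)
    if not isinstance(tree, str):
        left, right = tree
        for t in insertions(left, taxon):
            yield (t, right)
        for t in insertions(right, taxon):
            yield (left, t)

def branch_and_bound(alignment):
    taxa = list(alignment)
    best = {"length": float("inf"), "tree": None}

    def extend(tree, remaining):
        length = tree_length(tree, alignment)
        if length >= best["length"]:
            return            # bound: adding taxa can never shorten a tree
        if not remaining:
            best["length"], best["tree"] = length, tree
            return
        for sub in insertions(tree[1], remaining[0]):
            extend((tree[0], sub), remaining[1:])

    extend((taxa[0], (taxa[1], taxa[2])), taxa[3:])
    return best["length"], best["tree"]

alignment = {"Human": "ATCGTG", "Chimp": "ATCATT",
             "Gorilla": "ATAACG", "Orang": "CTAGCC"}
best_length, best_tree = branch_and_bound(alignment)
print(best_length, best_tree)
```

The pruning step is valid because the length of a partial tree is a lower bound on the length of every complete tree built from it.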
24Heuristic Search
Start at the tree found by stepwise addition,
random addition, or at a random tree. Climb
uphill using branch swapping (tree
rearrangements).
Higher is better (more parsimonious)
25Stepwise Addition
At each level (A, B, C, etc.), all trees are
examined and only the best tree (or the n best
trees) is kept as seed(s) for the next step of
the algorithm.
26Tree Rearrangements
- NNI: Nearest Neighbor Interchange. Fastest but
only modest tree changes
- SPR: Subtree Pruning and Regrafting. Slower but
more substantial rearrangements
- TBR: Tree Bisection and Reconnection. Slowest but
most comprehensive rearrangements
27NNI
Nearest Neighbor Interchange
Delete one branch and try the two other
alternative arrangements of the four subtrees
surrounding that branch
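As a small illustration, suppose the chosen internal branch separates subtrees A and B on one side from C and D on the other. The two alternative arrangements pair A with C, or A with D. The nested-pair representation and the function name are assumptions made for this sketch.

```python
def nni_neighbors(arrangement):
    """Return the two alternative arrangements around one internal branch.

    arrangement is ((A, B), (C, D)): subtrees A and B sit on one side of the
    branch, C and D on the other. Each subtree may be a tip label or a
    nested tuple.
    """
    (a, b), (c, d) = arrangement
    return [((a, c), (b, d)), ((a, d), (b, c))]

current = (("Human", "Chimp"), ("Gorilla", "Orang"))
neighbors = nni_neighbors(current)
for tree in neighbors:
    print(tree)
```

A full NNI search would apply this swap at every internal branch and keep any rearrangement that shortens the tree.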
28SPR
Subtree Pruning and Regrafting
29TBR
Tree Bisection and Reconnection
30Weighted Parsimony
- If different events occur at different rates
(they are not equally likely), parsimony should
weight the rare events more heavily (they are
less likely to evolve repeatedly)
- Two types of weighting are sometimes used
- substitution weighting
- character (homoplasy) weighting
- Branch weighting should be used but is difficult
to implement so is never used
31Substitution Weighting
Cost Matrix (Sankoff Matrix)
[Figure: cost matrix over the states A, C, G, T, with separate costs for transitions (A<->G, C<->T) and transversions (all other changes)]
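A cost matrix of this kind can be used with Sankoff's dynamic-programming algorithm to compute a weighted tree length. In the sketch below the costs (1 for a transition, 2 for a transversion) are hypothetical values chosen for illustration, and the tree is written as nested pairs.

```python
BASES = "ACGT"
PURINES = {"A", "G"}

def cost(x, y, transition=1, transversion=2):
    """Cost of changing x into y (hypothetical weights)."""
    if x == y:
        return 0
    same_class = (x in PURINES) == (y in PURINES)   # purine<->purine or pyrimidine<->pyrimidine
    return transition if same_class else transversion

def sankoff(tree, states):
    """Return {base: minimum cost of the subtree if its root carries that base}."""
    if isinstance(tree, str):                        # a tip: only its observed state is free
        return {b: (0 if b == states[tree] else float("inf")) for b in BASES}
    child_costs = [sankoff(child, states) for child in tree]
    return {b: sum(min(cost(b, x) + cc[x] for x in BASES) for cc in child_costs)
            for b in BASES}

def weighted_length(tree, states):
    return min(sankoff(tree, states).values())

tree = (("Human", "Chimp"), ("Gorilla", "Orang"))
char4 = {"Human": "G", "Chimp": "A", "Gorilla": "A", "Orang": "G"}   # two transitions
char6 = {"Human": "G", "Chimp": "T", "Gorilla": "G", "Orang": "C"}   # two transversions
print(weighted_length(tree, char4), weighted_length(tree, char6))    # 2 4
```

Both characters need two changes under equal weights, but the character that requires transversions now costs twice as much.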
32Homoplasy Weighting
- The more rapidly a character evolves, the more
unreliable it is as an indicator of phylogenetic
relationships because of the high probability of
multiple hits
- To compensate for this, it is possible to weight
slowly evolving characters more heavily in a
parsimony analysis (called implied weights or
Goloboff fit criterion)
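A minimal sketch of the fit function used in implied weighting, assuming Goloboff's form f = k / (k + e), where e is the number of extra steps (homoplasy) a character shows on a tree and k is a concavity constant. The value k = 3 is only an illustrative default, and the function names are hypothetical.

```python
def character_fit(steps, min_steps, k=3.0):
    """Fit of one character on a tree: k / (k + extra steps)."""
    extra = steps - min_steps          # steps beyond the minimum the character needs
    return k / (k + extra)

def total_fit(step_counts, min_counts, k=3.0):
    """Sum of character fits; under implied weights the tree with the highest total is preferred."""
    return sum(character_fit(s, m, k) for s, m in zip(step_counts, min_counts))

# A character needing at least 1 step: perfect fit with 1 step, lower fit with 2.
print(character_fit(1, 1), character_fit(2, 1))   # 1.0 0.75
```

Because each additional extra step lowers the fit by a smaller amount, characters with much homoplasy contribute less to the choice among trees.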
33Saturation Plot
The longer the branch, the larger the discrepancy
between the observed number of differences
(parsimony length) and the true branch length
(true number of changes)
34Long-Branch Attraction
True tree
Parsimony gives you this tree
Parsimony gives you the wrong tree if you do not
weight branches according to their length. Such
weighting is difficult to do within the parsimony
framework.
35Contents
- Evolutionary Trees
- Distance Methods
- UPGMA
- Neighbor-Joining
- Optimality Criterion Methods
- Simple parsimony
- Tree searching
- Weighted Parsimony
- Assessing Phylogenetic Uncertainty
- Bootstrapping
- Decay index and jackknifing
- Post-Tree Analysis
- Phylogenetic Classification
36Measuring the Quality of Trees
- Measures of the overall quality of the tree
- Consistency index
- Retention index
- Measures of how robust different portions
(branches) of a tree are
- Bootstrapping
- Decay index
- Jack-knifing
37Bootstrapping
[Figure: bootstrap workflow]
Original characters
Pseudosampling (resampling characters with replacement), 100-1000 times
Phylogenetic analysis of each pseudoreplicate
Estimate of uncertainty
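The pseudosampling step can be sketched as follows: each replicate draws as many characters (alignment columns) as the original matrix has, with replacement, so some characters appear several times and others not at all. The alignment reuses the six characters from the parsimony example; the function name and the fixed random seed are illustrative.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the matrix size."""
    n_chars = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(seq[c] for c in cols)
            for taxon, seq in alignment.items()}

alignment = {"Human": "ATCGTG", "Chimp": "ATCATT",
             "Gorilla": "ATAACG", "Orang": "CTAGCC"}

rng = random.Random(0)                 # seeded only to make the sketch repeatable
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
print(replicates[0])
```

Each replicate matrix is then analyzed with the same tree-building method as the original data, and the resulting trees are summarized.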
38Majority Rule Consensus
Percentages indicate frequency with which the
corresponding group occurs among the summarized
trees. This can be viewed as a measure of the
support for that group.
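A minimal sketch of how such percentages can be computed, assuming the summarized trees are rooted and written as nested tuples. The four input trees are hypothetical stand-ins for bootstrap results.

```python
from collections import Counter

def clades(tree):
    """Return (tips under this node, set of all groups with two or more tips)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    tips, groups = frozenset(), set()
    for child in tree:
        child_tips, child_groups = clades(child)
        tips |= child_tips
        groups |= child_groups
    groups.add(tips)
    return tips, groups

def group_support(trees):
    """Percentage of trees in which each group occurs."""
    counts = Counter()
    for tree in trees:
        all_tips, groups = clades(tree)
        groups.discard(all_tips)       # the group of all taxa is trivially present
        counts.update(groups)
    return {g: 100 * n / len(trees) for g, n in counts.items()}

def majority_rule(trees):
    """Groups found in more than half of the trees, with their support."""
    return {g: pct for g, pct in group_support(trees).items() if pct > 50}

# Hypothetical set of four trees, e.g. from four bootstrap replicates.
trees = [((("Human", "Chimp"), "Gorilla"), "Orang")] * 3 \
      + [((("Human", "Gorilla"), "Chimp"), "Orang")]
for group, pct in majority_rule(trees).items():
    print(sorted(group), pct)
```

Here the Human plus Chimp group appears in 75% of the trees and so is retained in the consensus, while Human plus Gorilla (25%) is not.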
39Alternative Support Measures
- Decay Index (also known as Bremer Support or
Branch Support): the difference in length between
the best tree with and the best tree without a
particular group (branch)
- Jackknife: similar to the bootstrap, but a proportion
of characters is deleted from the original matrix to
yield the replicate matrices, which are smaller
than the original matrix. In statistics, the
jackknife is an older technique that is generally
considered inferior to bootstrapping.
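Both measures can be sketched briefly. The decay index function assumes every candidate tree has already been scored; the three lengths used here (7, 9 and 8 steps) are those of the three possible trees for the four-taxon parsimony example, each identified by the group it pairs with Human. The jackknife function deletes a fixed fraction of characters without replacement; the 50% deletion rate is an illustrative assumption.

```python
import random

def decay_index(scored_trees, group):
    """Length of the best tree without the group minus the best tree with it."""
    with_group = min(length for groups, length in scored_trees if group in groups)
    without_group = min(length for groups, length in scored_trees if group not in groups)
    return without_group - with_group

def jackknife_replicate(alignment, rng, delete_fraction=0.5):
    """Delete a proportion of characters, keeping the rest in their original order."""
    n_chars = len(next(iter(alignment.values())))
    n_keep = n_chars - round(delete_fraction * n_chars)
    keep = sorted(rng.sample(range(n_chars), n_keep))
    return {taxon: "".join(seq[c] for c in keep)
            for taxon, seq in alignment.items()}

# Each tree is represented by the set of groups it contains, plus its length.
scored_trees = [
    ({frozenset({"Human", "Chimp"})}, 7),
    ({frozenset({"Human", "Gorilla"})}, 9),
    ({frozenset({"Human", "Orang"})}, 8),
]
support = decay_index(scored_trees, frozenset({"Human", "Chimp"}))
print(support)   # 1

alignment = {"Human": "ATCGTG", "Chimp": "ATCATT",
             "Gorilla": "ATAACG", "Orang": "CTAGCC"}
replicate = jackknife_replicate(alignment, random.Random(0))
print(replicate)
```

For larger data sets the best tree without the group is usually found by a constrained search rather than by listing every tree.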