Comparative Genomics - PowerPoint PPT Presentation

Provided by: fredrikr

Transcript and Presenter's Notes


1
Comparative Genomics
  • Lecture 6
  • Phylogenetics I

2
Contents
  • Evolutionary Trees
  • Distance Methods
  • UPGMA
  • Neighbor-Joining
  • Optimality Criterion Methods
  • Simple parsimony
  • Tree searching
  • Weighted Parsimony
  • Assessing Phylogenetic Uncertainty
  • Bootstrapping
  • Decay index and jackknifing

3
Tree Terminology
4
Types of Trees
Cladogram: branch lengths are arbitrary; only the
topology of the tree is represented
Phylogram: branch lengths are proportional to the
amount of evolutionary change
Clock tree (ultrametric or linearized tree):
branch lengths are proportional to time
5
Unrooted and Rooted Trees
[Figure labels: ingroup, outgroup, "Place root here"]
Many analyses produce unrooted trees. These must
be rooted by locating the branch that separates the
ingroup taxa from the outgroup taxa. The ingroup
and outgroup taxa must be circumscribed using
other data (previous knowledge).
6
Tree Rotations
Which trees are identical? Which trees are
identical when the root is removed?
7
Contents
  • Evolutionary Trees
  • Distance Methods
  • UPGMA
  • Neighbor-Joining
  • Optimality Criterion Methods
  • Simple parsimony
  • Tree searching
  • Weighted Parsimony
  • Assessing Phylogenetic Uncertainty
  • Bootstrapping
  • Decay index and jackknifing

8
Infer relationships among three species
9
Three possible trees (topologies)
10
Data
Aligned sequences
11
Distance Methods
  • Construct a tree by following a recipe
  • First step is to calculate the distance between
    sequences
  • Typically results in one tree
  • Fast ($O(n^2)$) but sometimes inaccurate
  • Examples include UPGMA, WPGMA, and
    Neighbor-Joining; PileUp and ClustalW/X also
    use distance methods

12
UPGMA
  • Calculate distance matrix
  • Find the pair i, j with the smallest distance $D_{ij}$
  • Create a new group (ij), which has
    $n_{(ij)} = n_i + n_j$ members
  • Connect i and j to a new node, each with a branch
    of length $D_{ij} / 2$
  • Compute the distance from the new node to each of
    the other taxa / groups k using
    $D_{(ij)k} = (n_i / n_{(ij)}) D_{ik} + (n_j / n_{(ij)}) D_{jk}$
  • Replace the rows and columns of i and j with a
    row and column for (ij)
  • If there is more than one group left, find next
    pair with the smallest distance
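
The recipe above can be sketched in Python roughly as follows. This is a minimal illustration, not code from the lecture: the function name `upgma`, the frozenset-keyed distance dictionary, and the four-taxon distances are all assumptions made for the example. Each merge records the height Dij / 2 of the new node.

```python
from itertools import combinations

def upgma(names, dist):
    """Return (tree, heights): a nested-tuple tree and the height of each merge.

    dist maps frozenset({a, b}) to the distance between clusters a and b.
    """
    d = dict(dist)
    size = {name: 1 for name in names}
    clusters = list(names)
    heights = []
    while len(clusters) > 1:
        # Find the pair i, j with the smallest distance Dij.
        i, j = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        d_ij = d[frozenset((i, j))]
        new = (i, j)
        size[new] = size[i] + size[j]      # n(ij) = ni + nj
        heights.append(d_ij / 2)           # the new node sits at height Dij / 2
        rest = [k for k in clusters if k != i and k != j]
        for k in rest:
            # Size-weighted average of the distances from i and j to k.
            d[frozenset((new, k))] = (
                size[i] * d[frozenset((i, k))] + size[j] * d[frozenset((j, k))]
            ) / size[new]
        clusters = rest + [new]
    return clusters[0], heights

# Hypothetical distances for four taxa A-D (made up for illustration).
pairs = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
         ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}
dist = {frozenset(p): v for p, v in pairs.items()}
tree, heights = upgma(["A", "B", "C", "D"], dist)
print(tree, heights)
```

Because the toy distances are ultrametric, the merge heights increase steadily, which is the clock-like behaviour UPGMA assumes.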

13
Neighbor Joining
  • Similar to UPGMA but produces a non-clock tree
    instead of a clock tree by weighting all
    distances according to how different each taxon
    is from the remaining taxa
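
The slide does not give the weighting formula. The sketch below assumes the commonly used neighbor-joining criterion Q(i,j) = (n - 2) d(i,j) - r(i) - r(j), where r(i) is the sum of distances from taxon i to all other taxa. It shows only the pair-selection step; the function name `nj_pair` and the five-taxon matrix are illustrative.

```python
def nj_pair(d):
    """Return (Q, i, j) for the pair of taxa minimizing the NJ criterion Q."""
    n = len(d)
    row_sum = [sum(row) for row in d]   # r(i): total distance from i to the rest
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sum[i] - row_sum[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best

# Illustrative symmetric distance matrix for taxa a, b, c, d, e.
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pair(d))
```

In this matrix the smallest raw distance is between d and e (3), but the criterion selects a and b. That is the sense in which distances are adjusted by how far each taxon is from everything else.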

14
Contents
  • Evolutionary Trees
  • Distance Methods
  • UPGMA
  • Neighbor-Joining
  • Optimality Criterion Methods
  • Simple parsimony
  • Tree searching
  • Weighted Parsimony
  • Assessing Phylogenetic Uncertainty
  • Bootstrapping
  • Decay index and jackknifing

15
Optimality Criterion Methods
  • The criterion associates each tree with a score
  • The best tree is the one with the best score
  • Implicitly, we need to examine all trees to be
    guaranteed to find the tree with the best score
  • Computational complexity high (O(n!) for exact
    solutions)

16
Parsimony
  • The best tree (the most parsimonious or maximum
    parsimony tree) is the one requiring the smallest
    number of evolutionary changes
  • How do you count changes?

17
Parsimony
18
Parsimony
Most parsimonious solution: requires two changes,
so its length is two steps. This is the solution
requiring the smallest amount of change.
19
Parsimony Example
Char.         Human  Chimp  Gorilla  Orang   Tree 1  Tree 2  Tree 3
1             A      A      A        C
2             T      T      T        T
3             C      C      A        A
4             G      A      A        G
5             T      T      C        C
6             G      T      G        C
TREE LENGTH

First draw the possible trees. Then calculate the
number of required changes on each tree. The tree
with the smallest number of changes is the most
parsimonious tree.
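
One way to count the changes is Fitch's algorithm for unordered characters, sketched below as an assumption (the slides do not name a counting method). The tree drawings did not survive in this transcript, so the three topologies are labeled by their groupings rather than as Tree 1 to 3. The names `fitch` and `tree_length` are illustrative.

```python
def fitch(tree, state):
    """Return (possible states, changes) for one character on a rooted binary tree."""
    if isinstance(tree, str):                      # a tip: its observed state, no changes
        return {state[tree]}, 0
    left_set, left_n = fitch(tree[0], state)
    right_set, right_n = fitch(tree[1], state)
    shared = left_set & right_set
    if shared:                                     # children agree: no extra change
        return shared, left_n + right_n
    return left_set | right_set, left_n + right_n + 1   # disagreement costs one change

def tree_length(tree, seqs):
    """Sum the Fitch change counts over all characters."""
    n_chars = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {taxon: seq[c] for taxon, seq in seqs.items()})[1]
        for c in range(n_chars)
    )

# Characters 1-6 from the table, read down each taxon's column.
seqs = {"Human": "ATCGTG", "Chimp": "ATCATT", "Gorilla": "ATAACG", "Orang": "CTAGCC"}

# The three possible unrooted topologies for four taxa, as nested tuples.
trees = {
    "(Human,Chimp),(Gorilla,Orang)": (("Human", "Chimp"), ("Gorilla", "Orang")),
    "(Human,Gorilla),(Chimp,Orang)": (("Human", "Gorilla"), ("Chimp", "Orang")),
    "(Human,Orang),(Chimp,Gorilla)": (("Human", "Orang"), ("Chimp", "Gorilla")),
}
lengths = {label: tree_length(t, seqs) for label, t in trees.items()}
for label, length in lengths.items():
    print(label, length)
```

With these data the lengths come out as 7, 9, and 8 steps, so the tree grouping Human with Chimp is the most parsimonious.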
20
Tree Quality Measures
Consistency Index: $\mathrm{CI} = m / l$
Retention Index: $\mathrm{RI} = (g - l) / (g - m)$
m = minimum number of changes required on any tree
g = maximum number of changes required on any tree
l = actual number of changes required on the tree
Both indices vary between 0 and 1. A higher value
indicates a better fit between data and tree.
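
As a rough sketch, the two indices can be computed with the standard definitions CI = m / l and RI = (g - l) / (g - m). The code assumes unordered characters, for which m is the number of distinct states minus one and g is the number of taxa minus the count of the most common state. The six columns are the characters from the parsimony example, and l = 7 is taken as the length of a tree grouping Human with Chimp; function names are illustrative.

```python
from collections import Counter

def min_steps(column):
    """m for one unordered character: distinct states minus one."""
    return len(set(column)) - 1

def max_steps(column):
    """g for one unordered character: taxa not carrying the most common state."""
    return len(column) - max(Counter(column).values())

def consistency_index(m, l):
    return m / l

def retention_index(m, g, l):
    return (g - l) / (g - m)

# One string per character; states in the order Human, Chimp, Gorilla, Orang.
columns = ["AAAC", "TTTT", "CCAA", "GAAG", "TTCC", "GTGC"]
m = sum(min_steps(c) for c in columns)
g = sum(max_steps(c) for c in columns)
l = 7  # assumed tree length for the tree being evaluated

print(m, g, consistency_index(m, l), retention_index(m, g, l))
```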
21
Size of tree space grows very quickly
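
To put numbers on this growth, the sketch below uses the standard counts for fully resolved trees on n labeled taxa: (2n - 5)!! unrooted trees and (2n - 3)!! rooted trees. The function names are illustrative.

```python
def num_unrooted_trees(n):
    """Number of unrooted binary trees on n >= 3 labeled taxa: (2n - 5)!!"""
    count = 1
    for k in range(3, 2 * n - 4, 2):   # product 3 * 5 * ... * (2n - 5)
        count *= k
    return count

def num_rooted_trees(n):
    """Number of rooted binary trees on n >= 2 labeled taxa: (2n - 3)!!"""
    # Rooting adds one more attachment point, equivalent to one extra taxon.
    return num_unrooted_trees(n + 1)

for n in (3, 4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```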
22
Tree Searching Strategies
  • Exact Methods (guarantee that you find the best
    tree)
      • Exhaustive Search: look at all possible trees
      • Branch-and-Bound: exclude trees that cannot beat
        the best tree you have already found
  • Heuristic Methods (no guarantees)
      • Stepwise Addition
      • Branch Swapping: NNI, SPR, TBR

23
Branch-and-Bound
Search tree for the branch-and-bound algorithm.
The search starts at the root and proceeds towards
the tips. Once a partial tree is already longer
than the shortest complete tree found so far (the
bound), the search need not explore the more
complete trees derived from it, because adding
taxa cannot make a tree shorter. In this way,
large sections of the search tree can be skipped.
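
A compact sketch of the idea, under several assumptions: trees are nested tuples, changes are counted with Fitch's algorithm, taxa are added in the order given, and the first taxon is attached at the root so that each unrooted topology is visited once. All names (`insertions`, `branch_and_bound`) are illustrative. A partial tree is abandoned as soon as its length reaches the current bound.

```python
def fitch(tree, seqs, site):
    """Return (state set, changes) for one site on a rooted binary tree."""
    if isinstance(tree, str):
        return {seqs[tree][site]}, 0
    left_set, left_n = fitch(tree[0], seqs, site)
    right_set, right_n = fitch(tree[1], seqs, site)
    shared = left_set & right_set
    if shared:
        return shared, left_n + right_n
    return left_set | right_set, left_n + right_n + 1

def tree_length(tree, seqs):
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, seqs, s)[1] for s in range(n_sites))

def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf to one branch of tree."""
    yield (tree, leaf)                      # attach above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)

def branch_and_bound(seqs):
    taxa = list(seqs)
    best = {"length": float("inf"), "tree": None}

    def extend(partial, next_index):
        candidate = (taxa[0], partial)
        length = tree_length(candidate, seqs)
        if length >= best["length"]:
            return                          # bound hit: skip all descendants
        if next_index == len(taxa):         # all taxa placed: new best tree
            best["length"], best["tree"] = length, candidate
            return
        for bigger in insertions(partial, taxa[next_index]):
            extend(bigger, next_index + 1)

    extend(taxa[1], 2)
    return best["length"], best["tree"]

seqs = {"Human": "ATCGTG", "Chimp": "ATCATT", "Gorilla": "ATAACG", "Orang": "CTAGCC"}
best_length, best_tree = branch_and_bound(seqs)
print(best_length, best_tree)
```

With only four taxa little is pruned; the savings grow with the number of taxa and depend on finding a good bound early.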
24
Heuristic Search
Start at the tree found by stepwise addition,
random addition, or at a random tree. Climb
uphill using branch swapping (tree
rearrangements).
Higher is better (more parsimonious)
25
Stepwise Addition
At each level (A, B, C, etc.), all trees are
examined, and only the best tree (or the n best
trees) is kept as the seed(s) for the next step of
the algorithm.
26
Tree Rearrangements
  • NNI: Nearest Neighbor Interchange. Fastest but
    only modest tree changes
  • SPR: Subtree Pruning and Regrafting. Slower but
    more substantial rearrangements
  • TBR: Tree Bisection and Reconnection. Slowest but
    most comprehensive rearrangements

27
NNI
Nearest Neighbor Interchange
Delete one branch and try the two other
alternative arrangements of the four subtrees
surrounding that branch
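
The rearrangement can be sketched on nested-tuple trees as follows. The representation is an assumption: an unrooted tree is written as (first taxon, rest), with the first taxon attached at the root of `rest`, so every internal branch lies inside `rest`. The function names are illustrative.

```python
def nni_rooted(tree):
    """Yield every tree one NNI away, for a rooted tree of nested 2-tuples."""
    if not isinstance(tree, tuple):
        return
    left, right = tree
    if isinstance(left, tuple):
        a, b = left
        # Branch above `left`: swap one of its subtrees with its sibling.
        yield ((a, right), b)
        yield ((b, right), a)
        for changed in nni_rooted(left):
            yield (changed, right)
    if isinstance(right, tuple):
        a, b = right
        # Branch above `right`: same two alternative arrangements.
        yield ((left, a), b)
        yield ((left, b), a)
        for changed in nni_rooted(right):
            yield (left, changed)

def nni_unrooted(tree):
    """NNI neighbors of an unrooted tree written as (first taxon, rest)."""
    first, rest = tree
    for changed in nni_rooted(rest):
        yield (first, changed)

four = ("A", ("B", ("C", "D")))
five = ("A", (("B", "C"), ("D", "E")))
print(list(nni_unrooted(four)))
print(len(list(nni_unrooted(five))))
```

An unrooted tree with n taxa has n - 3 internal branches, so it has 2(n - 3) NNI neighbors: two for the four-taxon tree and four for the five-taxon tree.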
28
SPR
Subtree Pruning and Regrafting
29
TBR
Tree Bisection and Reconnection
30
Weighted Parsimony
  • If different events occur at different rates
    (they are not equally likely), parsimony should
    weight the rare events more heavily (they are
    less likely to evolve repeatedly)
  • Two types of weighting are sometimes used
      • substitution weighting
      • character (homoplasy) weighting
  • Branch weighting should be used but is difficult
    to implement so is never used

31
Substitution Weighting
Cost Matrix (Sankoff Matrix)
[Figure: the four bases A, C, G, T, with
transitions and transversions marked]
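
The cost values from the slide's matrix are not preserved in this transcript. The sketch below therefore assumes a cost of 1 for transitions (A<->G, C<->T) and 2 for transversions, and uses Sankoff's dynamic programming to find the minimum weighted length of one character on a rooted nested-tuple tree. All names are illustrative.

```python
BASES = "ACGT"
PURINES = {"A", "G"}

def cost(x, y):
    """Assumed cost matrix: 0 for no change, 1 for a transition, 2 for a transversion."""
    if x == y:
        return 0
    return 1 if (x in PURINES) == (y in PURINES) else 2

def sankoff(tree, state):
    """Return {base: minimum cost of the subtree if this node has that base}."""
    if isinstance(tree, str):
        return {b: 0 if b == state[tree] else float("inf") for b in BASES}
    left = sankoff(tree[0], state)
    right = sankoff(tree[1], state)
    return {
        b: min(cost(b, x) + left[x] for x in BASES)
           + min(cost(b, y) + right[y] for y in BASES)
        for b in BASES
    }

def weighted_length(tree, state):
    """Minimum weighted length over all possible root states."""
    return min(sankoff(tree, state).values())

tree = (("Human", "Chimp"), ("Gorilla", "Orang"))
char4 = {"Human": "G", "Chimp": "A", "Gorilla": "A", "Orang": "G"}
char6 = {"Human": "G", "Chimp": "T", "Gorilla": "G", "Orang": "C"}
print(weighted_length(tree, char4), weighted_length(tree, char6))
```

Both characters need two changes on this tree under simple parsimony, but under the assumed weights the second costs twice as much because its changes are transversions.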
32
Homoplasy Weighting
  • The more rapidly a character evolves, the more
    unreliable it is as an indicator of phylogenetic
    relationships because of the high probability of
    multiple hits
  • To compensate for this, it is possible to weight
    slowly evolving characters more heavily in a
    parsimony analysis (called implied weights or
    Goloboff fit criterion)
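
The slide does not give the fit function. As usually stated, Goloboff's fit for a character is f = k / (k + h), where h is the number of extra steps (homoplasy) the character shows on the tree and k is a concavity constant. The sketch below assumes that form; k = 3 is an illustrative value, not one taken from the lecture.

```python
def character_fit(extra_steps, k=3.0):
    """Assumed Goloboff fit: 1.0 with no homoplasy, shrinking as extra steps grow."""
    return k / (k + extra_steps)

def total_fit(extra_steps_per_character, k=3.0):
    """Implied-weights score of a tree: the sum of character fits (higher is better)."""
    return sum(character_fit(h, k) for h in extra_steps_per_character)

# Hypothetical extra-step counts for four characters on some tree.
print(total_fit([0, 0, 1, 3]))
```

Each additional step in an already homoplastic character lowers the score less than the first one did, which is how rapidly evolving characters end up with less influence.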

33
Saturation Plot
The longer the branch, the larger the discrepancy
between the observed number of differences
(parsimony length) and the true branch length
(true number of changes)
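
The discrepancy can be illustrated numerically. The sketch below assumes the Jukes-Cantor model, which the slides do not mention, to convert an observed proportion of differing sites p into an estimated number of changes per site, d = -(3/4) ln(1 - 4p/3). Function names are illustrative.

```python
import math

def p_distance(a, b):
    """Observed proportion of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jukes_cantor(p):
    """Estimated changes per site under Jukes-Cantor, valid for p < 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)

for p in (0.05, 0.3, 0.6, 0.7):
    print(p, round(jukes_cantor(p), 3))
```

For small p the two values nearly coincide; as p approaches 0.75 the estimated number of changes grows without bound while the observed differences level off.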
34
Long-Branch Attraction
True tree
Parsimony gives you this tree
Parsimony gives you the wrong tree if you do not
weight branches according to their length. Such
weighting is difficult to do within the parsimony
framework.
35
Contents
  • Evolutionary Trees
  • Distance Methods
  • UPGMA
  • Neighbor-Joining
  • Optimality Criterion Methods
  • Simple parsimony
  • Tree searching
  • Weighted Parsimony
  • Assessing Phylogenetic Uncertainty
  • Bootstrapping
  • Decay index and jackknifing
  • Post-Tree Analysis
  • Phylogenetic Classification

36
Measuring the Quality of Trees
  • Measures of the overall quality of the tree
      • Consistency index
      • Retention index
  • Measures of how robust different portions
    (branches) of a tree are
      • Bootstrapping
      • Decay index
      • Jackknifing

37
Bootstrapping
[Figure: original characters -> pseudosampling
(100-1000 replicates) -> phylogenetic analysis of
each replicate -> estimate of uncertainty]
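
The pseudosampling step can be sketched as drawing characters (alignment columns) with replacement until the replicate has as many characters as the original matrix. The function name, the seeded random generator, and the small data set are assumptions for illustration; each replicate would then go through the same phylogenetic analysis.

```python
import random

def bootstrap_replicate(seqs, rng):
    """Resample columns with replacement; the replicate keeps the original width."""
    n_sites = len(next(iter(seqs.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks) for name, seq in seqs.items()}

seqs = {"Human": "ATCGTG", "Chimp": "ATCATT", "Gorilla": "ATAACG", "Orang": "CTAGCC"}
rng = random.Random(0)                       # seeded only to make runs repeatable
replicates = [bootstrap_replicate(seqs, rng) for _ in range(100)]
print(replicates[0])
```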
38
Majority Rule Consensus
Percentages indicate frequency with which the
corresponding group occurs among the summarized
trees. This can be viewed as a measure of the
support for that group.
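
A small sketch of how such percentages could be tallied. It assumes the summarized trees are rooted (for example on an outgroup) and written as nested tuples, so each group is the set of taxa below an internal node. The three input trees are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

def collect_groups(tree, found):
    """Add the taxon set below every internal node to found; return this node's set."""
    if isinstance(tree, str):
        return frozenset([tree])
    members = frozenset().union(*(collect_groups(child, found) for child in tree))
    found.add(members)
    return members

def group_frequencies(trees):
    """Fraction of trees in which each group occurs."""
    counts = Counter()
    for tree in trees:
        found = set()
        everything = collect_groups(tree, found)
        found.discard(everything)            # the set of all taxa is uninformative
        counts.update(found)
    return {group: n / len(trees) for group, n in counts.items()}

def majority_rule(trees):
    """Groups present in more than half of the trees, with their frequencies."""
    return {g: f for g, f in group_frequencies(trees).items() if f > 0.5}

trees = [
    ((("Human", "Chimp"), "Gorilla"), "Orang"),
    ((("Human", "Chimp"), "Gorilla"), "Orang"),
    ((("Human", "Gorilla"), "Chimp"), "Orang"),
]
consensus = majority_rule(trees)
for group, freq in consensus.items():
    print(sorted(group), round(100 * freq))
```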
39
Alternative Support Measures
  • Decay Index (also known as Bremer Support or
    Branch Support): the difference in length between
    the best tree with and the best tree without a
    particular group (branch)
  • Jackknife: similar to the bootstrap, but a
    proportion of the characters is deleted from the
    original matrix to yield the replicate matrices,
    which are smaller than the original matrix. In
    statistics, the jackknife is an older technique
    that is generally considered inferior to
    bootstrapping.
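
Both measures can be sketched briefly. The jackknife function below deletes a fixed fraction of characters without replacement; the 50% deletion fraction, the function names, and the tree lengths passed to `decay_index` are assumptions for illustration.

```python
import random

def jackknife_replicate(seqs, delete_fraction, rng):
    """Delete a fraction of the columns at random; the rest keep their order."""
    n_sites = len(next(iter(seqs.values())))
    n_keep = n_sites - round(n_sites * delete_fraction)
    kept = sorted(rng.sample(range(n_sites), n_keep))
    return {name: "".join(seq[i] for i in kept) for name, seq in seqs.items()}

def decay_index(best_length_without_group, best_length_with_group):
    """Extra steps needed before the group is no longer supported."""
    return best_length_without_group - best_length_with_group

seqs = {"Human": "ATCGTG", "Chimp": "ATCATT", "Gorilla": "ATAACG", "Orang": "CTAGCC"}
rng = random.Random(0)                       # seeded only to make runs repeatable
replicate = jackknife_replicate(seqs, 0.5, rng)
print(replicate)

# Illustrative lengths: best tree with the group is 7 steps, best without is 8.
print(decay_index(8, 7))
```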