Too Many Trees - PowerPoint PPT Presentation

About This Presentation
Title:

Too Many Trees

Slides: 22
Provided by: den89

Transcript and Presenter's Notes

Title: Too Many Trees


1
Too Many Trees !
The problem of optimal tree identification
becomes computationally hard if the algorithm
has to test every tree. In this case,
heuristics must be used.
Number of rooted trees: $\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
Number of unrooted trees: $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
http://bioinformatics.weizmann.ac.il/courses/BCG/lectures/06_phylo/02approaches/01problem.html
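To get a feel for how fast these counts grow, the two formulas can be evaluated directly. This is a minimal sketch; the function names are illustrative and not from the lecture.

```python
from math import factorial


def rooted_tree_count(n: int) -> int:
    """Number of rooted binary trees on n labeled species (n >= 2)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_tree_count(n: int) -> int:
    """Number of unrooted binary trees on n labeled species (n >= 3)."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


# Print the counts for a few dataset sizes.
for n in (4, 10, 20):
    print(n, rooted_tree_count(n), unrooted_tree_count(n))
```

With only 10 species there are already 2,027,025 unrooted and 34,459,425 rooted trees, which is why exhaustive testing quickly becomes impractical.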
2
Molecular Clocks - evolution rate of different
genes
Hypothesis: A given gene evolves at an equal
mutation rate (number of mutations per million
years) on all branches of the phylogeny tree.
http://www.bio.miami.edu/dana/250/betaglobinclock.jpg
http://nitro.biosci.arizona.edu/courses/EEB182/Lecture07/lect7.html
3
Molecular Clock Hypothesis (Allan Wilson, UC
Berkeley, mid-1970s)
Under the molecular clock hypothesis: differences
in the same gene among different species =
mutation rate × time
The molecular clock hypothesis usually holds only
for closely related species. The hypothesis
makes it easier to construct phylogeny trees.
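The relation "differences = mutation rate × time" can be read in both directions: predicting differences from a known time, or estimating a time from observed differences. The sketch below uses hypothetical numbers and illustrative function names; it only restates the slide's relation and is not a calibrated dating method.

```python
def expected_differences(rate_per_myr: float, time_myr: float) -> float:
    """Differences predicted by the clock: mutation rate x time."""
    return rate_per_myr * time_myr


def estimated_time(differences: float, rate_per_myr: float) -> float:
    """Invert the same relation to estimate elapsed time in million years."""
    return differences / rate_per_myr


# Hypothetical gene: 0.5 mutations per million years.
print(expected_differences(0.5, 10.0))  # differences expected after 10 Myr
print(estimated_time(5.0, 0.5))         # time implied by 5 observed differences
```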
4
  • Lecture 6: Phylogeny Reconstruction
  • What is evolution and what are phylogenetic trees?
  • Types of phylogeny trees and tree properties.
  • Major algorithm families for phylogeny reconstruction:
  • Character Based Methods: Maximum Parsimony, Maximum Likelihood.
  • Distance Based Methods: Neighbor Joining, etc.
  • PHYLIP software for phylogeny reconstruction.
  • Displaying the resulting trees: drawgram (rooted) or drawtree (un-rooted).

5
Phylogeny Reconstruction
  • Morphology Based Input: n-by-m table, where rows are species and columns are properties.
  • Sequence Based Input: n aligned sequences, one per species.

Input: Properties table or aligned sequences → algorithm → Output: Phylogeny tree
Major Families of Algorithms
  • Distance Based Methods: Neighbor Joining, etc.
  • Character Based Methods: Maximum Parsimony, Maximum Likelihood.

6
Tree Reconstruction
  • Build a tree based on organism sequences.
  • Distance based methods:
  • Use pairwise alignment scores to build trees.
  • Fast.
  • Character based methods:
  • Make a tree that minimizes the total number of mutations.
  • Slower (especially max. likelihood) but generally better results.

http://www.cs.technion.ac.il/Labs/cbl/courses/biomed/lectures/08-Multiple_Alignment_Phylogeny.pdf
7
  • Character Based Methods (sequence based methods, NP-hard):
  • Maximum Parsimony.
  • Maximum Likelihood.

8
Character Based Methods: Maximum Parsimony
Input: multiple sequence alignment of n
sequences. The goal is to find the tree(s), and
the sequences at the internal nodes, that minimize
the number of changes (mutations).
[Figure: example tree(s) whose nodes carry the two-letter sequences AA, AC, CC and GC]
9
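Parsimony: internal labeling to minimize the number
of changes (for a given tree)
[Figure: tree with unlabeled internal nodes (?), annotated with Change 1 and Change 2]
The intersection of {A} and {A} is {A}. Changes: 0.
The intersection of {C, T} and {C} is (of course) {C}.
The intersection of {A, C} and {C} is {C}.
When the two sets below a node have no state in common, the node
takes their union and one more change is counted.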
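The intersection rule above can be sketched as a bottom-up pass over a fixed tree for one alignment column. This is a minimal sketch of that pass only (often called Fitch's algorithm); the nested-tuple tree encoding and the example tree are assumptions made for illustration, not the tree drawn on the slide.

```python
def fitch(tree):
    """Return (candidate_states, changes) for one alignment column.

    `tree` is either a leaf, given as a one-letter string such as "A",
    or a pair (left, right) of subtrees (hypothetical encoding).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_set, left_changes = fitch(tree[0])
    right_set, right_changes = fitch(tree[1])
    common = left_set & right_set
    if common:
        # Non-empty intersection: no extra change at this node.
        return common, left_changes + right_changes
    # Empty intersection: take the union and count one more change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical tree chosen to reproduce the three intersections above.
example = ((("A", "A"), "C"), ("C", ("C", "T")))
print(fitch(example))  # candidate root states and the number of changes
```

For this example the root set is {C} with 2 changes. Choosing concrete labels for every internal node would need a second, top-down pass that is not shown here.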
10
Character Based Methods: Maximum Likelihood
Input: multiple sequence alignment of n
sequences, one sequence per species.
[Figure: tree over the sequences AAAAATC, AAAAAAG and CCCCCCG, with edges annotated "long edge more likely" and "long edge very unlikely"]
We look for a tree, focusing on edge lengths,
such that the likelihood (probability) that the
tree will produce the observed data is
maximized. Note: long edges are likely to induce
lots of changes, while short edges are likely to
produce few changes.
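The note about long and short edges can be made concrete with a substitution model. The slides do not name one, so the sketch below assumes the Jukes-Cantor (JC69) model, in which an edge of length t (expected substitutions per site) leaves a site unchanged with probability 1/4 + 3/4·exp(-4t/3). The function names are illustrative.

```python
from math import exp


def jc69_site_prob(same: bool, t: float) -> float:
    """JC69 probability that a site ends in one given state after an edge of length t."""
    decay = exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def edge_likelihood(seq_a: str, seq_b: str, t: float) -> float:
    """Probability of seeing seq_b at one end of an edge given seq_a at the other."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    likelihood = 1.0
    for a, b in zip(seq_a, seq_b):
        likelihood *= jc69_site_prob(a == b, t)
    return likelihood


# Similar sequences favor a short edge; very different ones favor a long edge.
for pair in (("AAAAATC", "AAAAAAG"), ("AAAAATC", "CCCCCCG")):
    print(pair, edge_likelihood(*pair, 0.1), edge_likelihood(*pair, 2.0))
```

A full maximum likelihood method sums over all internal sequences and optimizes every edge of the tree; this sketch only scores a single edge.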
11
Maximum Parsimony and Maximum Likelihood
  • ML is considered more reliable than MP, since there are scenarios where MP fails but ML succeeds.
  • These two sequence-based methods are computationally hard (NP-hard).
  • Consequently, most algorithms perform a heuristic search in the huge space of all trees.

12
Heuristics for Searching the Huge Space of Trees
The search starts with an initial tree, which is
improved by local changes (sub-tree
interchange, see examples). For small datasets,
it is possible to evaluate all possible tree
topologies. This is done by adding taxa to the
growing tree in all possible locations, which
ends up with many trees. Usually, you will be
constrained to search with heuristics!
Nearest Neighbor Interchange
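A nearest neighbor interchange acts on one internal branch: with sub-trees a and b on one side and c and d on the other, swapping a sub-tree across the branch gives two alternative arrangements. The sketch below assumes a nested-tuple encoding of that neighborhood, which is an illustration and not a format used in the lecture.

```python
def nni_neighbors(quartet):
    """Return the two NNI rearrangements around one internal branch.

    `quartet` is ((a, b), (c, d)); a, b, c, d may be species names
    or arbitrarily nested sub-trees (hypothetical encoding).
    """
    (a, b), (c, d) = quartet
    return [
        ((a, c), (b, d)),  # b and c exchanged across the branch
        ((a, d), (c, b)),  # b and d exchanged across the branch
    ]


print(nni_neighbors((("A", "B"), ("C", "D"))))
```

A heuristic search would score each neighbor (by parsimony or likelihood), move to the best one, and repeat over every internal branch until no neighbor improves the score.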
13
Heuristics for Searching the Huge Space of Trees
Sub-tree Pruning Re-grafting
Any branch on the tree can be "cut" off, or
pruned, to create a sub-tree.
The green lines represent possibilities for
re-attaching the pruned portion to the tree.
14
Heuristics for Searching the Huge Space of Trees
Branch-Breaking (tree bisection reconnection)
This method can break the tree on any branch.
The two sub-trees are then each considered
rootless, after which any branch of one can be
connected to any branch of the other.
Result:
Sub-tree ((A B) C)
Sub-tree ((D E)(F G))
The branch leading to A is connected to the
branch leading to E.
15
Constructing Phylogenies: Distance Based
Methods
  • The input is a symmetric n-by-n distance matrix D.
  • D(i,j) denotes the distance between species i and j.
  • The algorithm is iterative. In each iteration, two (or more) species that are deemed neighbors are clustered together.
  • Then, the algorithm updates distances for the new cluster.
  • Now, the next iteration begins, and so on, until there is a single cluster (tree) containing all n species.
  • There are numerous variations of this method (UPGMA, Neighbor-joining (NJ), Fitch-Margoliash, KITSCH, etc.).

[Figure: salamander and frog joined by branches of length 0]
16
Distance Based Methods
  • Iterative process, n-1 iterations (n is the number of species).
  • Each iteration consists of two steps:
  • Step 1: Determine the closest pair of species v, u. Cluster these two neighbors together into a new species w.
  • Step 2: Update the distance matrix: determine the distances from the new species w to the others.
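The two-step iteration can be sketched as a short loop. The slides leave the update rule of Step 2 open, so the sketch assumes a size-weighted average of the two old distances (the UPGMA-style rule); other variants use different rules. The function name, the nested-tuple output, and the 30s in the example matrix are hypothetical, and only the distances 0 and 8 echo values quoted in the example slides.

```python
from itertools import combinations


def cluster_species(names, matrix):
    """Merge the closest pair repeatedly; return the tree as nested tuples."""
    # Active clusters: label -> number of original species it contains.
    sizes = {name: 1 for name in names}
    # Distances keyed by an unordered pair of cluster labels.
    dist = {
        frozenset((names[i], names[j])): matrix[i][j]
        for i, j in combinations(range(len(names)), 2)
    }
    while len(sizes) > 1:
        # Step 1: find the closest pair (u, v) and cluster it into w.
        u, v = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        w = (u, v)
        # Step 2: distances from w to every other cluster x
        # (assumed rule: size-weighted average of the old distances).
        for x in sizes:
            if x not in (u, v):
                d_ux = dist[frozenset((u, x))]
                d_vx = dist[frozenset((v, x))]
                total = sizes[u] + sizes[v]
                dist[frozenset((w, x))] = (sizes[u] * d_ux + sizes[v] * d_vx) / total
        sizes[w] = sizes.pop(u) + sizes.pop(v)
    return next(iter(sizes))


names = ["salamander", "frog", "gecko", "snake"]
matrix = [
    [0, 0, 30, 30],
    [0, 0, 30, 30],
    [30, 30, 0, 8],
    [30, 30, 8, 0],
]
print(cluster_species(names, matrix))
```

With four species the loop runs three times (n-1 iterations) and returns (("salamander", "frog"), ("gecko", "snake")).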

17
EXAMPLE for Tree Reconstruction: Morphology Based
Input using Distance Based Algorithm
  • Input: 10-by-13 table (10 species, 13 properties).
  • Algorithm - 2 phases:
  • 1. Build a distance matrix: the distance between a pair of species is the number of properties where they differ.
  • 2. Construct a tree by iteratively clustering species with small distances (neighbors).
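Phase 1 amounts to counting, for every pair of species, the properties on which their 0/1 rows disagree. The sketch below uses a small hypothetical table (3 species, 4 properties) rather than the lecture's 10-by-13 data.

```python
def distance_matrix(table):
    """Pairwise counts of differing properties between rows of a 0/1 table."""
    n = len(table)
    return [
        [sum(a != b for a, b in zip(table[i], table[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical data: one row per species, one column per property.
table = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
print(distance_matrix(table))  # [[0, 0, 2], [0, 0, 2], [2, 2, 0]]
```

Dividing each count by the number of properties gives the same information as a fraction.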

18
Example: Morphologic Data
http://research.amnh.org/siddall/methods/day1.html
Table representing data as a 0/1 Matrix
19
Algorithm - Initial Phase
Convert the 0/1 Matrix to a Distance
Matrix. The distance between any two species is
defined by the number of properties where they disagree
(out of 13, the total number of properties).
[Table: 10-by-10 distance matrix over the 10 species. The entries appear to be rounded percentages of the 13 properties (for example 8, 15, 23, 31); the layout is not recoverable from this transcript.]
From now on, we use only the distance matrix,
and forget where it came from!
20
Algorithm - 1st Iteration, Step 1:
Cluster the nearest pair of neighbors.
Identify the two closest taxa from the distance
matrix. In our case, one pair has zero distance:
Mr. salamander & Mr. frog.
We join them together, and update the distance
matrix.
[Figure: salamander and frog joined by branches of length 0]
Step 2: Update distance matrix: only 9 species (8
old, one new).
[Table: updated 9-by-9 distance matrix; the layout is not recoverable from this transcript.]
The closest pair here is the gecko & snake
(distance 8).
21
Algorithm - 2nd Iteration, Step 1: Cluster
the nearest pair of neighbors.
Join the gecko and snake. Add the new pair to the
forest, equally distributing the distance (8 = 4 +
4) to their ancestor.
[Figure: the forest so far. gecko and snake are joined by branches of length 4; salamander and frog are joined by branches of length 0; coelacanth, budgie, alligator, turtle and human are not yet joined (?).]
Step 2: Update the distance matrix (yes, but how?).