Phylogenetic Reconstruction Using Neighbor-Joining

Provided by: CISE9

1
Phylogenetic Reconstruction Using
Neighbor-Joining and Maximum Likelihood Methods
With Different Evolutionary Models
  • PROJECT PRESENTATION
  • CAP5510, Fall 2004
  • Rebecca Gray
  • Dennis Klop
  • Alexandra Martinez

2
Outline
  • Problem Description
  • Methodology
  • Neighbor-Joining Algorithm
  • Evolutionary Models
  • Phylip Package for Maximum Likelihood
  • Analysis
  • Results
  • Conclusions

3
Problem Description
  • 24 sequences generated by Mulligan Lab.
  • Sequences from mitochondrial d-loop region
  • Average length of sequences 500 bp
  • Exact evolutionary relationship between these
    sequences is unknown
  • Our goal: infer the correct phylogenetic
    relationship by using the most suitable model of
    evolution

4
Neighbor-Joining Method
  • Reconstructs phylogenetic trees from evolutionary
    distance data.
  • Provides both topology and branch lengths.
  • Input: multiple sequence alignment (MSA).
  • Initial distance matrix is derived from the MSA.
  • Starts with a starlike tree, and iterates through
    a series of clustering steps.
  • At each clustering step, join the pair of OTUs
    (neighbors) that minimizes the sum of branch
    lengths.
  • We implemented this algorithm in Java.
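
The clustering loop described on this slide can be sketched in a few lines. This is a minimal illustration in Python, not the Java implementation mentioned above. It assumes the widely used Q-matrix form of the selection rule, which is commonly presented as equivalent to picking the pair that minimizes the total branch length. The function name, the internal node names (U0, U1, ...) and the example matrix are all hypothetical.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Build an unrooted tree from a symmetric distance matrix.

    Returns a list of (node_a, node_b, branch_length) edges.
    Internal nodes are named U0, U1, ... (illustrative names only).
    """
    # Working copy of the distances, keyed by node name.
    d = {a: {b: matrix[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all others.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"U{next_id}"
        next_id += 1
        edges += [(u, i, li), (u, j, lj)]
        # Distances from u to every node that is still unjoined.
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Two nodes remain: connect them with the final branch.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical 5-taxon distance matrix, for illustration only.
example_labels = ["a", "b", "c", "d", "e"]
example_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for node_a, node_b, length in neighbor_joining(example_labels, example_matrix):
    print(node_a, node_b, length)
```

The result is an edge list rather than a drawn tree, so both the topology and the branch lengths can be inspected directly.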

5
Evolutionary Models
  • Jukes-Cantor model
      • Simplest, equal base frequencies
      • Free parameter: rate of nucleotide substitution
  • Jukes-Cantor corrected model
      • By Mount, unequal base frequencies
      • Free parameter: rate of nucleotide substitution
  • Kimura 2-Parameter model
      • More complex, equal base frequencies
      • Free parameters: transition and transversion
        rates
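
To show how these models turn observed differences into evolutionary distances, here is a minimal Python sketch of the standard Jukes-Cantor and Kimura 2-parameter distance formulas for a pair of aligned sequences. The function names are hypothetical. The sketch assumes the two sequences share at least one comparable site, and it skips gaps and ambiguity codes. The Jukes-Cantor corrected variant is left out because the slides do not give its formula.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_fractions(seq1, seq2):
    """Return (P, Q): fractions of transition and transversion differences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions / compared, transversions / compared


def jc_distance(seq1, seq2):
    """Jukes-Cantor distance: d = -3/4 ln(1 - 4p/3), p = fraction differing."""
    P, Q = site_fractions(seq1, seq2)
    p = P + Q
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    P, Q = site_fractions(seq1, seq2)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

Both corrections give a distance larger than the raw proportion of differing sites, because they account for multiple substitutions at the same site.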

6
The Phylip Package (1)
  • Used for creating trees with the Maximum
    Likelihood method.
  • DNAML and DNAMLK
  • DNAMLK assumes the molecular clock hypothesis
    under which proteins and nucleic acids evolve at
    constant rates through time and for different
    lineages
  • User-specified parameters
      • Global rearrangements
      • Using outgroups: Trastrep sequences

7
The Phylip Package (2)
  • SPR: Subtree Pruning and Regrafting
      • Identify and remove a subtree
      • Reattach to each possible branch of the
        remaining tree
  • Improves result since the position of every
    species is reconsidered
  • High time complexity: triples the runtime of the
    program!

8
The Phylip Package (3)
  • Creating phylogenies of all the Trastrep
    sequences with or without the global
    rearrangements.
  • Comparing Maximum Likelihood scores

9
The Phylip Package (4)
  • Bootstrapping to obtain statistical support for
    the branches of our trees
  • In DNAML: JC, JC corrected, and K2P models
  • In DNAPARS: creates a tree based on the maximum
    parsimony method
  • Creating consensus trees out of a hundred
    alignments for each model
  • Obtaining bootstrap values for each node for each
    model

10
Analysis of the Models of Evolution - Techniques
  • 1. Evaluate trees based on biological intuition
      • There are 6 different genera represented
      • We expect Trastrep to be placed as the
        outgroup
      • Bubalus sequences should be placed together
      • Bos sequences should be placed together
  • 2. Select ML tree with highest likelihood score
  • 3. Compare NJ trees with ML bootstrapped
    consensus trees
      • The model of evolution with the most
        similarities between the two methods is
        considered the best
11
Analysis - Creation of Bootstrapped Consensus
Trees
  • To determine which model of evolution best
    supported our data, we used the bootstrap method
  • Non-parametric bootstrapping involves re-sampling
    the data a set number of times (in this case
    100x)
  • The bootstrap analysis begins with the initial
    multiple alignment and re-creates this data set
    by re-sampling its columns with replacement
  • Bootstrap values indicate the number of replicate
    data sets in which a particular branch was
    created
  • There is no set standard bootstrap value that
    indicates statistical support; we chose the value
    of 70
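
The re-sampling step can be illustrated with a short Python sketch. It draws alignment columns at random with replacement, which is the usual non-parametric bootstrap for sequence data. The function names and the dictionary representation of an alignment are assumptions made for this example and are not taken from PHYLIP.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    length = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    # The same drawn columns are applied to every sequence, so each
    # replicate column is an intact column of the original alignment.
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n=100, seed=0):
    """Return n bootstrap replicates (100 matches the slides)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```

Each replicate has the same dimensions as the original alignment, so it can be fed to the same tree-building program.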

12
Analysis - Consensus Tree
  • The bootstrapped data set, which now consists of
    100 different multiple alignments, was used as
    input into the PHYLIP programs DNAML and DNAPARS
  • The parameters of DNAML were manipulated to
    reflect the particular model of evolution under
    review (JC, JC corrected, K2P) as discussed
    before; DNAPARS does not require any parameter
    optimization
  • For each of the four models, 100 trees were
    created
  • The program CONSENSE in the PHYLIP package was
    used to create a consensus tree from the 100
    trees for each model of evolution
  • This consensus tree reflects the topology that
    best fits the 100 trees; each branch is given a
    bootstrap value
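
How a bootstrap value arises from the replicate trees can be shown with a small Python sketch. It assumes trees are written as nested tuples of taxon names, and it describes each internal branch by the set of taxa on the side that excludes a fixed reference taxon. It covers only the basic idea of counting branches and keeping those present in more than half of the trees. PHYLIP's consensus program offers more elaborate rules that are not reproduced here, and all names below are hypothetical.

```python
from collections import Counter


def leaves(tree):
    """Set of taxon names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Internal branches of a tree, each as the taxon set on the side
    that does not contain the reference taxon."""
    all_taxa = leaves(tree)
    reference = min(all_taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        if reference in side:
            side = all_taxa - side
        # Keep only branches with at least two taxa on each side.
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            walk(child)

    walk(tree)
    return found


def bootstrap_support(trees):
    """Percentage of trees in which each internal branch appears."""
    counts = Counter(s for tree in trees for s in splits(tree))
    return {s: 100 * c / len(trees) for s, c in counts.items()}


def majority_rule(trees):
    """Branches found in more than half of the trees."""
    return {s: v for s, v in bootstrap_support(trees).items() if v > 50}
```

With 100 replicate trees, the support value is simply the number of trees that contain the branch.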

13
Analysis 1: Consensus Parsimony Tree
  • Notice that Trastrep is not the outgroup
  • Notice that Bubalus is not grouped together

14
Analysis 1: Consensus Jukes-Cantor
  • Notice that Bubalus is the outgroup
  • Trastrep is nested within the tree

15
Analysis 1: ML Consensus Bootstrapped Kimura
2-Parameter
  • Notice that Trastrep is nested in tree
  • Bos sequences are separated

16
Analysis 1: ML Consensus Bootstrapped
Jukes-Cantor corrected
  • Notice that Trastrep is placed as the outgroup
  • All of the Bubalus species are placed together
  • All of the Bos sequences are placed together,
    closer to Bubalus than Trastrep

17
Analysis 2: ML tree with highest likelihood score
(JC corrected)
  • We created 3 ML trees in PHYLIP, one for each
    model of evolution
  • The tree with the highest score was the JC
    corrected tree
  • Log-likelihood scores:
      • Jukes-Cantor: -2787.4376
      • Jukes-Cantor corrected: -2740.70636
      • Kimura 2-Parameter: -2818.97662
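
The comparison on this slide is a simple maximum over the three scores. The short Python sketch below assumes the reported values are log-likelihoods whose minus signs were lost in transcription, so the value closest to zero is the highest. The variable names are illustrative.

```python
# Scores from the slide, assumed to be negative log-likelihood values
# (the minus signs are an assumption, not shown in the transcript).
log_likelihoods = {
    "Jukes-Cantor": -2787.4376,
    "Jukes-Cantor corrected": -2740.70636,
    "Kimura 2-Parameter": -2818.97662,
}

# The preferred tree is the one with the highest log-likelihood.
best_model = max(log_likelihoods, key=log_likelihoods.get)

# Gap between the best score and each of the others.
gaps = {model: log_likelihoods[best_model] - score
        for model, score in log_likelihoods.items()}
print(best_model, gaps)
```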

18
Analysis 3: Comparisons between NJ trees and ML
consensus bootstrapped trees
  • 1. Calculated the number of shared branches
    between the NJ tree and the ML consensus tree for
    each model of evolution (Parsimony vs. Mismatch,
    Jukes-Cantor, Jukes-Cantor corrected, Kimura
    2-parameter)
  • 2. Calculated the number of shared branches
    between trees with >70 bootstrap support
  • 3. Calculated the number of shared internal
    branches between trees, i.e. those branches that
    do not just connect leaves
  • 4. Calculated the number of shared internal
    branches with >70 bootstrap support
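
The four counts can be sketched as one small Python function. Each branch is represented by the set of taxa on one side of it, normalized so that a fixed reference taxon is always on the other side. The sketch assumes that bootstrap values are available only for the ML consensus tree, and that a leaf branch is one with a single taxon on either side. The function name, its parameters and the toy data are hypothetical.

```python
def shared_branches(nj_branches, ml_support, n_taxa,
                    min_support=None, internal_only=False):
    """Count branches of the NJ tree that also occur in the ML consensus.

    nj_branches: set of frozensets, one per branch of the NJ tree.
    ml_support:  dict mapping each ML consensus branch to its bootstrap value.
    """
    shared = 0
    for branch in nj_branches:
        if branch not in ml_support:
            continue
        # Internal branches have at least two taxa on each side.
        if internal_only and not (1 < len(branch) < n_taxa - 1):
            continue
        if min_support is not None and ml_support[branch] <= min_support:
            continue
        shared += 1
    return shared


# Toy 5-taxon example; branches are the sides that exclude taxon "A".
leaf_branches = [frozenset("B"), frozenset("C"), frozenset("D"),
                 frozenset("E"), frozenset("BCDE")]
nj = set(leaf_branches) | {frozenset("DE"), frozenset("CDE")}
ml = {branch: 100 for branch in leaf_branches}
ml.update({frozenset("DE"): 65, frozenset("CDE"): 88})

counts = (
    shared_branches(nj, ml, 5),
    shared_branches(nj, ml, 5, min_support=70),
    shared_branches(nj, ml, 5, internal_only=True),
    shared_branches(nj, ml, 5, min_support=70, internal_only=True),
)
print(counts)
```

Leaf branches are shared by any two trees on the same taxa, which is why the internal-only counts are the more informative ones.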

19
Comparison Results

20
Results
  • In three of the four comparisons, the JC
    corrected model had the most shared branches
  • For one comparison, the K2P model had the most
    shared branches
  • The parsimony/mismatch comparison fared the
    poorest

21
Conclusions
  • In the NJ-ML consensus comparison, the JC
    corrected model scored best in 3 of 4 trials
  • Of the 3 ML trees, one per model of evolution,
    the JC corrected tree was given the highest ML
    score
  • The ML consensus JC corrected model produces a
    topology most like the one we would expect based
    on biological intuition
  • We therefore conclude that the JC corrected model
    of evolution best fits our data set

22
References
  • [1] Krane, D.E. and Raymer, M.L. Fundamental
    Concepts of Bioinformatics. 2002. (ISBN
    0-8053-4633-3)
  • [2] Lio, P. and Goldman, N. Models of Molecular
    Evolution and Phylogeny. Genome Research
    8:1233-1244, 1998.
  • [3] Mount, D.W. Bioinformatics: Sequence and
    Genome Analysis, 2nd ed., 2004. (ISBN
    0-87969-712-1)
  • [4] Phylip Program, version 3.62. URL:
    http://evolution.genetics.washington.edu/phylip/
  • [5] PhyloDraw Program, version 0.8. URL:
    http://pearl.cs.pusan.ac.kr/phylodraw/
  • [6] Saitou, N. and Nei, M. The neighbor-joining
    method: a new method for reconstructing
    phylogenetic trees. Mol. Biol. Evol. 4:406-425,
    1987.
  • [7] Subtree Pruning and Regrafting. URL:
    http://www.hyphy.org/docs/analyses/methods/spr.html