Phylogenetic Reconstruction Using Neighbor-Joining

1
Phylogenetic Reconstruction Using
Neighbor-Joining and Maximum Likelihood Methods
With Different Evolutionary Models

PROJECT PRESENTATION
CAP5510, Fall 2004
Rebecca Gray
Dennis Klop
Alexandra Martinez

2
Outline

Problem Description
Methodology
Neighbor-Joining Algorithm
Evolutionary Models
Phylip Package for Maximum Likelihood
Analysis
Results
Conclusions

3
Problem Description

24 sequences generated by Mulligan Lab.
Sequences from mitochondrial d-loop region
Average sequence length: 500 bp
The exact evolutionary relationship between these
sequences is unknown
Our goal: infer the correct phylogenetic
relationship by using the most suitable model of
evolution

4
Neighbor-Joining Method

Reconstructs phylogenetic trees from evolutionary
distance data.
Provides both topology and branch lengths.
Input: a multiple sequence alignment (MSA).
The initial distance matrix is derived from the MSA.
Starts with a star-like tree and iterates through
a series of clustering steps.
At each clustering step, the next pair of OTUs
(neighbors) to join is the pair that minimizes
the sum of branch lengths.
We implemented this algorithm in Java.
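
The clustering loop described on this slide can be sketched in a few lines. This is a minimal illustration in Python, not the Java implementation mentioned above. It assumes the commonly used Q-criterion form of the selection step, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), as the stand-in for "minimizes the sum of branch lengths". The function name `neighbor_joining`, the `nodeK` labels for internal nodes, and the dictionary-of-pairs distance format are assumptions made for the example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted NJ tree as a list of (child, parent, branch_length) edges.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance (assumed format).
    """
    d = dict(dist)          # working copy; grows as internal nodes are added
    nodes = list(names)     # currently unjoined nodes (the "star" tips)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: sum of distances from each node to all other active nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Select the pair with the smallest Q value (assumed selection criterion).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        edges.append((i, u, li))
        edges.append((j, u, lj))
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes with the remaining distance.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges
```

Ties in the selection step are broken by whichever pair is encountered first, so a different tie-breaking rule may list the joins in a different order.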

5
Evolutionary Models

Jukes-Cantor model
Simplest model, equal base frequencies
Free parameter: rate of nucleotide substitution
Jukes-Cantor corrected model
Described by Mount, unequal base frequencies
Free parameter: rate of nucleotide substitution
Kimura 2-Parameter model
More complex, equal base frequencies
Free parameters: transition and transversion rates
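
To make the difference between the models concrete, the sketch below computes pairwise distances under the standard Jukes-Cantor and Kimura 2-Parameter formulas. It is an illustration only: the function names are hypothetical, gaps and ambiguous characters are simply skipped (an assumption), and the Jukes-Cantor corrected variant with unequal base frequencies is not shown because the slides do not give its formula.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a, seq_b):
    """Return (transitions, transversions, compared_sites) for two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped (assumption).
    """
    ts = tv = sites = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1   # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1   # purine <-> pyrimidine
    return ts, tv, sites


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance: one substitution rate, equal base frequencies."""
    ts, tv, n = count_differences(seq_a, seq_b)
    p = (ts + tv) / n
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq_a, seq_b):
    """Kimura 2-Parameter distance: separate transition and transversion rates."""
    ts, tv, n = count_differences(seq_a, seq_b)
    P, Q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

Both functions fail with a math domain error when the sequences are too divergent for the correction to be defined, which is one practical limit of these models.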

6
The Phylip Package (1)

Used for creating trees with the Maximum
Likelihood method.
DNAML vs. DNAMLK
DNAMLK assumes the molecular clock hypothesis,
under which proteins and nucleic acids evolve at
constant rates through time and across different
lineages
User-specified parameters
Global rearrangements
Using outgroups: Trastrep sequences

7
The Phylip Package (2)

SPR: Subtree Pruning and Regrafting
Identify and remove a subtree
Reattach to each possible branch of the remaining
tree
Improves result since the position of every
species is reconsidered
High time complexity: triples the runtime of the
program!
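
The prune-and-reattach step can be illustrated on a toy tree. The sketch below is a simplification, not PHYLIP's implementation: it works on rooted, strictly binary trees written as nested tuples with unique leaf names, it only enumerates candidate topologies, and it does not score them. All function names are hypothetical.

```python
def canonical(tree):
    """Order children consistently so equal topologies compare equal."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def subtrees(tree):
    """Yield every proper subtree (leaf or clade) of a nested-tuple tree."""
    if isinstance(tree, tuple):
        for child in tree:
            yield child
            yield from subtrees(child)


def prune(tree, target):
    """Remove `target` and collapse its parent; return the remaining tree."""
    if tree == target:
        return None
    if isinstance(tree, str):
        return tree
    kept = [prune(child, target) for child in tree]
    kept = [child for child in kept if child is not None]
    return kept[0] if len(kept) == 1 else tuple(kept)


def regraft(tree, sub):
    """Yield each tree obtained by attaching `sub` to one branch of `tree`."""
    yield (tree, sub)  # attach on the branch above this node
    if isinstance(tree, tuple):
        left, right = tree  # assumes a strictly binary tree
        for new_left in regraft(left, sub):
            yield (new_left, right)
        for new_right in regraft(right, sub):
            yield (left, new_right)


def spr_neighbors(tree):
    """Return all distinct topologies one SPR move away from `tree`."""
    found = set()
    for sub in subtrees(tree):
        rest = prune(tree, sub)
        for candidate in regraft(rest, sub):
            found.add(canonical(candidate))
    found.discard(canonical(tree))  # the starting topology is not a neighbor
    return found
```

Every subtree is tried on every remaining branch, so the number of candidates grows quickly with the number of sequences, which is consistent with the longer runtime noted on this slide.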

8
The Phylip Package (3)

Creating phylogenies of all the Trastrep
sequences with and without global
rearrangements.
Comparing Maximum Likelihood scores

9
The Phylip Package (4)

Bootstrapping to obtain statistical support for
the branches of our trees
In DNAML: JC, JC corrected, and K2P models
In DNAPARS: creates a tree based on the maximum
parsimony method
Creating a consensus tree from the hundred
bootstrapped alignments for each model
Obtaining bootstrap values for each node for each
model

10
Analysis of the Models of Evolution - Techniques

1. Evaluate trees based on biological intuition
There are 6 different genera represented
We expect Trastrep to be placed as the outgroup
Bubalus sequences should be placed together
Bos sequences should be placed together
2. Select ML tree with highest likelihood score
3. Compare NJ trees with ML bootstrapped
consensus trees
The model of evolution whose trees agree most
between the two methods is considered the best

11
Analysis - Creation of Bootstrapped Consensus
trees

To determine which model of evolution best
fit our data, we used the bootstrap method
Non-parametric bootstrapping involves re-sampling
the data a set number of times (in this case,
100 times)
The bootstrap analysis begins with the initial
multiple alignment and re-creates this data set
by sampling its columns with replacement
Bootstrap values indicate the number of replicate
data sets in which a particular branch was
recovered
There is no set standard bootstrap value that
indicates statistical support; we chose the value
of 70
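
The re-sampling step can be sketched as follows. This is a minimal illustration of non-parametric bootstrapping of alignment columns, assuming the alignment is held as a dictionary of equal-length strings; the function names, the `seed` parameter, and the data format are assumptions for the example, and the slides do not say which tool produced the replicates.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one replicate: columns drawn with replacement, same length as the input.

    alignment: dict mapping sequence name -> aligned sequence (all the same length).
    """
    length = len(next(iter(alignment.values())))
    # The same column indices are used for every sequence so columns stay intact.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Return a list of resampled alignments (100 by default, as in the slides)."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

Each replicate would then be passed to the tree-building program, giving one tree per replicate.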

12
Analysis - Consensus Tree

The bootstrapped data set, which now consists of
100 different multiple alignments, was used as
input into the PHYLIP programs DNAML and DNAPARS
The parameters of DNAML were manipulated to
reflect the particular model of evolution under
review (JC, JC corrected, K2P), as discussed
before; DNAPARS does not require any parameter
optimization
For each of the four models, 100 trees were
created
The program CONSENSE in the PHYLIP package was
used to create a consensus tree from the
100 trees for each model of evolution
This consensus tree reflects the topology
that best fits the 100 trees; each branch is given
a bootstrap value
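
How a bootstrap value arises from the replicate trees can be shown with a small sketch. It is not the CONSENSE algorithm itself: it only counts how often each internal branch (bipartition of the taxa) appears and keeps those found in more than half of the trees. Trees are assumed to be nested tuples of leaf names over the same taxa, and all function names are hypothetical.

```python
from collections import Counter


def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the internal branches of a tree as bipartitions.

    Each bipartition is stored as the side that lacks a fixed reference taxon
    (the alphabetically first name), so equal splits compare equal across trees.
    """
    all_taxa = leaves(tree)
    reference = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - below if reference in below else below
        # Skip the root and branches that lead to a single leaf.
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def split_support(trees):
    """Percentage of trees in which each internal branch appears."""
    counts = Counter(s for tree in trees for s in splits(tree))
    return {s: 100.0 * c / len(trees) for s, c in counts.items()}


def majority_rule(trees, threshold=50.0):
    """Keep only the branches found in more than `threshold` percent of trees."""
    return {s: pct for s, pct in split_support(trees).items() if pct > threshold}
```

With 100 replicate trees, the percentage for a branch is the same number as the count of replicates that contain it.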

13
Analysis 1: Consensus Parsimony Tree

Notice that Trastrep is not the outgroup
Notice that the Bubalus sequences are not grouped together

14
Analysis 1: Consensus Jukes-Cantor

Notice that Bubalus is the outgroup
Trastrep is nested within the tree

15
Analysis 1: ML Consensus Bootstrapped Kimura
2-Parameter

Notice that Trastrep is nested within the tree
Bos sequences are separated

16
Analysis 1: ML Consensus Bootstrapped
Jukes-Cantor corrected

Notice that Trastrep is placed as the outgroup
All of the Bubalus species are placed together
All of the Bos sequences are placed together,
closer to Bubalus than Trastrep

17
Analysis 2: ML tree with highest likelihood score
(JC corrected)

We created 3 ML trees in PHYLIP, one for each model
of evolution
The tree with the highest score was the JC
corrected tree
Jukes-Cantor: -2787.4376
Jukes-Cantor corrected: -2740.70636
Kimura 2-Parameter: -2818.97662

18
Analysis 3: Comparisons between NJ trees and ML
consensus bootstrapped trees

1. Calculated the number of shared branches
between the NJ tree and the ML consensus tree for
each model of evolution (Parsimony vs. Mismatch,
Jukes-Cantor, Jukes-Cantor corrected, Kimura
2-Parameter)
2. Calculated the number of shared branches
between trees with >70 bootstrap support
3. Calculated the number of shared internal
branches between trees, i.e. those branches that
do not just connect leaves
4. Calculated the number of shared internal
branches with >70 bootstrap support
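
The four counts listed on this slide can be sketched as below. This is only an illustration of the bookkeeping, under several assumptions: each branch is represented by the set of taxa on the side that lacks a fixed reference taxon, the NJ tree is given as a set of such branches, the ML consensus tree is given as a dictionary from branch to bootstrap value, and both trees cover the same taxa. The function and key names are hypothetical.

```python
def compare_trees(nj_splits, ml_support, n_taxa, cutoff=70.0):
    """Count branches shared by an NJ tree and a bootstrapped ML consensus tree.

    nj_splits:  set of frozensets, one per branch of the NJ tree.
    ml_support: dict mapping frozenset (branch) -> bootstrap value of the ML tree.
    n_taxa:     total number of sequences in the trees.
    cutoff:     bootstrap value a branch must exceed to count as supported.
    """
    shared = [s for s in nj_splits if s in ml_support]
    # Internal branches separate at least two taxa from at least two others.
    internal = [s for s in shared if 1 < len(s) < n_taxa - 1]
    return {
        "shared": len(shared),
        "shared_supported": sum(ml_support[s] > cutoff for s in shared),
        "shared_internal": len(internal),
        "shared_internal_supported": sum(ml_support[s] > cutoff for s in internal),
    }
```

Running this once per model of evolution gives the kind of four-row comparison described above, with the model sharing the most branches ranked first in each row.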

19
Comparison Results

20
Results

In three of the four comparisons, the JC corrected
model had the most shared branches
In the remaining comparison, the K2P model had the
most shared branches
The parsimony/mismatch comparison fared the
poorest

21
Conclusions

In the NJ vs. ML consensus comparison, the JC
corrected model scored best in 3 of 4 comparisons
Of the 3 ML trees (one per model of evolution),
the JC corrected tree had the highest
likelihood score
The ML consensus JC corrected model produces a
topology most like the one we would expect based
on biological intuition
We therefore conclude that the JC corrected model
of evolution best fits our data set

22
References

[1] Krane, D.E. and Raymer, M.L. Fundamental
Concepts of Bioinformatics. 2002. (ISBN
0-8053-4633-3)
[2] Lio, P. and Goldman, N. Models of Molecular
Evolution and Phylogeny. Genome Research
8:1233-1244, 1998.
[3] Mount, D.W. Bioinformatics: Sequence and
Genome Analysis, 2nd ed., 2004. (ISBN
0-87969-712-1)
[4] Phylip Program, version 3.62. URL:
http://evolution.genetics.washington.edu/phylip/
[5] PhyloDraw Program, version 0.8. URL:
http://pearl.cs.pusan.ac.kr/phylodraw/
[6] Saitou, N. and Nei, M. The neighbor-joining
method: A new method for reconstructing
phylogenetic trees. Mol. Biol. Evol. 4:406-425,
1987.
[7] Subtree Pruning and Regrafting. URL:
http://www.hyphy.org/docs/analyses/methods/spr.html