Title: Phylogenetic Reconstruction Using Neighbor-Joining
Phylogenetic Reconstruction Using Neighbor-Joining and Maximum Likelihood Methods With Different Evolutionary Models
- PROJECT PRESENTATION
- CAP5510, Fall 2004
- Rebecca Gray
- Dennis Klop
- Alexandra Martinez
Outline
- Problem Description
- Methodology
- Neighbor-Joining Algorithm
- Evolutionary Models
- Phylip Package for Maximum Likelihood
- Analysis
- Results
- Conclusions
Problem Description
- 24 sequences generated by the Mulligan Lab
- Sequences from the mitochondrial D-loop region
- Average sequence length: 500 bp
- The exact evolutionary relationship between these sequences is unknown
- Our goal: infer the correct phylogenetic relationship by using the most suitable model of evolution
Neighbor-Joining Method
- Reconstructs phylogenetic trees from evolutionary distance data
- Provides both topology and branch lengths
- Input: a multiple sequence alignment (MSA)
- The initial distance matrix is derived from the MSA
- Starts with a star-like tree and iterates through a series of clustering steps
- At each clustering step, the next pair of OTUs (neighbors) to join is the pair that minimizes the sum of branch lengths
- We implemented this algorithm in Java
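Since the project implemented neighbor-joining in Java, the clustering loop above can be sketched in Java as follows. This is not the project's code: `NeighborJoining` and `buildTree` are hypothetical names, and the sketch uses the Q-criterion Q_ij = (m - 2) d_ij - r_i - r_j (Studier and Keppler), which selects the same pair as Saitou and Nei's minimum-sum-of-branch-lengths criterion [6]. The example matrix is a small additive textbook matrix, not the Mulligan Lab data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Minimal neighbor-joining sketch; NeighborJoining and buildTree are hypothetical
// names, not the project's actual Java classes.
final class NeighborJoining {

    private static String fmt(double x) {
        return String.format(Locale.ROOT, "%.4f", x);
    }

    /** Builds an unrooted NJ tree (Newick) from a symmetric distance matrix; needs >= 3 taxa. */
    static String buildTree(String[] names, double[][] d) {
        int n = names.length;
        if (n < 3) throw new IllegalArgumentException("need at least 3 taxa");
        double[][] dist = new double[n][];
        for (int i = 0; i < n; i++) dist[i] = d[i].clone();   // work on a copy
        String[] label = names.clone();                       // Newick text of each active node
        List<Integer> active = new ArrayList<>();
        for (int i = 0; i < n; i++) active.add(i);

        while (active.size() > 3) {
            int m = active.size();
            // Net divergence r_i: sum of distances from i to every other active node.
            double[] r = new double[n];
            for (int i : active) for (int k : active) r[i] += dist[i][k];

            // Choose the pair minimizing Q_ij = (m - 2) d_ij - r_i - r_j.
            int bi = -1, bj = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < m; a++) {
                for (int b = a + 1; b < m; b++) {
                    int i = active.get(a), j = active.get(b);
                    double q = (m - 2) * dist[i][j] - r[i] - r[j];
                    if (q < best) { best = q; bi = i; bj = j; }
                }
            }

            // Branch lengths from the two neighbors to their new parent node u.
            double li = dist[bi][bj] / 2 + (r[bi] - r[bj]) / (2.0 * (m - 2));
            double lj = dist[bi][bj] - li;

            // Store u in slot bi: d_uk = (d_ik + d_jk - d_ij) / 2.
            for (int k : active) {
                if (k == bi || k == bj) continue;
                double du = (dist[bi][k] + dist[bj][k] - dist[bi][bj]) / 2;
                dist[bi][k] = du;
                dist[k][bi] = du;
            }
            label[bi] = "(" + label[bi] + ":" + fmt(li) + "," + label[bj] + ":" + fmt(lj) + ")";
            active.remove(Integer.valueOf(bj));
        }

        // Three nodes remain: join them at one central node (unrooted tree).
        int a = active.get(0), b = active.get(1), c = active.get(2);
        double la = (dist[a][b] + dist[a][c] - dist[b][c]) / 2;
        double lb = (dist[a][b] + dist[b][c] - dist[a][c]) / 2;
        double lc = (dist[a][c] + dist[b][c] - dist[a][b]) / 2;
        return "(" + label[a] + ":" + fmt(la) + "," + label[b] + ":" + fmt(lb)
                + "," + label[c] + ":" + fmt(lc) + ");";
    }

    public static void main(String[] args) {
        String[] names = {"a", "b", "c", "d", "e"};
        double[][] d = {
            {0, 5, 9, 9, 8},
            {5, 0, 10, 10, 9},
            {9, 10, 0, 8, 7},
            {9, 10, 8, 0, 3},
            {8, 9, 7, 3, 0}
        };
        System.out.println(buildTree(names, d));
    }
}
```

Running `main` prints `(((a:2.0000,b:3.0000):3.0000,c:4.0000):2.0000,d:2.0000,e:1.0000);`. Because this matrix is additive, NJ recovers its branch lengths exactly; real distance matrices are rarely additive, so the lengths are then estimates.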
Evolutionary Models
- Jukes-Cantor model
  - Simplest; equal base frequencies
  - Free parameter: rate of nucleotide substitution
- Jukes-Cantor corrected model
  - As described by Mount [3]; unequal base frequencies
  - Free parameter: rate of nucleotide substitution
- Kimura 2-Parameter model
  - More complex; equal base frequencies
  - Free parameters: transition and transversion rates
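The three models differ in how an observed proportion of differing sites is corrected into an evolutionary distance. The Java sketch below uses the standard Jukes-Cantor, d = -3/4 ln(1 - 4p/3), and Kimura 2-parameter, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), formulas, where p is the proportion of differing sites, P the proportion of transitions and Q the proportion of transversions. The slides do not give the formula for the "JC corrected" model, so we assume the common unequal-base-frequency form d = -B ln(1 - p/B) with B = 1 - (sum of squared base frequencies), which reduces to Jukes-Cantor when all four frequencies are 0.25. `Distances` and its method names are hypothetical.

```java
// Sketch of the three distance corrections; Distances and its methods are
// hypothetical names. The "JC corrected" formula is an ASSUMPTION (F81-style form).
final class Distances {

    private static boolean isBase(char c) { return "ACGT".indexOf(c) >= 0; }
    private static boolean isPurine(char c) { return c == 'A' || c == 'G'; }

    /** Returns {p, P, Q}: proportions of all differences, transitions and transversions,
     *  counted over sites where both sequences have an unambiguous base. */
    static double[] differences(String s1, String s2) {
        if (s1.length() != s2.length()) throw new IllegalArgumentException("not aligned");
        int sites = 0, transitions = 0, transversions = 0;
        for (int i = 0; i < s1.length(); i++) {
            char a = Character.toUpperCase(s1.charAt(i));
            char b = Character.toUpperCase(s2.charAt(i));
            if (!isBase(a) || !isBase(b)) continue;           // skip gaps and ambiguity codes
            sites++;
            if (a == b) continue;
            if (isPurine(a) == isPurine(b)) transitions++;     // A<->G or C<->T
            else transversions++;                              // purine <-> pyrimidine
        }
        if (sites == 0) throw new IllegalArgumentException("no comparable sites");
        return new double[] {(transitions + transversions) / (double) sites,
                             transitions / (double) sites, transversions / (double) sites};
    }

    /** Jukes-Cantor: d = -3/4 ln(1 - 4p/3). Infinite or NaN once p >= 0.75 (saturation). */
    static double jukesCantor(double p) {
        return -0.75 * Math.log(1.0 - 4.0 * p / 3.0);
    }

    /** ASSUMED "JC corrected": d = -B ln(1 - p/B), B = 1 - sum of squared base frequencies. */
    static double jukesCantorCorrected(double p, double[] baseFreqs) {
        double b = 1.0;
        for (double f : baseFreqs) b -= f * f;
        return -b * Math.log(1.0 - p / b);
    }

    /** Kimura 2-parameter: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). */
    static double kimura2P(double transitionsP, double transversionsQ) {
        return -0.5 * Math.log(1.0 - 2.0 * transitionsP - transversionsQ)
               - 0.25 * Math.log(1.0 - 2.0 * transversionsQ);
    }
}
```

For example, with p = 0.1, `jukesCantor` returns about 0.1073, and `kimura2P(p / 3, 2 * p / 3)` gives the same value: when one of the three possible changes at a site is a transition and all are equally likely, the Kimura model collapses to Jukes-Cantor.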
The Phylip Package (1)
- Used for creating trees with the Maximum Likelihood method
- Programs: DNAML and DNAMLK
- DNAMLK assumes the molecular clock hypothesis, under which proteins and nucleic acids evolve at a constant rate through time and across lineages
- User-specified parameters:
  - Global rearrangements
  - Outgroups: the Trastrep sequences
The Phylip Package (2)
- SPR: Subtree Pruning and Regrafting
  - Identify and remove a subtree
  - Reattach it to each possible branch of the remaining tree
- Improves the result, since the position of every species is reconsidered
- High time complexity: it triples the runtime of the program!
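The SPR move can be sketched in Java as follows: every subtree is pruned and reattached on every edge of the remaining tree, and the distinct resulting topologies are collected. This is an illustration under simplifying assumptions, not PHYLIP code: it works on rooted binary trees without branch lengths and only enumerates candidate topologies, whereas PHYLIP's global rearrangements operate on unrooted trees and re-evaluate the likelihood of each candidate. `Spr`, `Node` and `sprNeighbors` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of SPR neighbourhood enumeration on ROOTED binary trees without branch
// lengths. Spr, Node and sprNeighbors are hypothetical names, not PHYLIP code.
final class Spr {

    static final class Node {
        String name;                 // leaf label; null for internal nodes
        Node left, right, parent;
        Node(String name) { this.name = name; }
        Node(Node l, Node r) { left = l; right = r; l.parent = this; r.parent = this; }
        boolean isLeaf() { return left == null; }
    }

    static Node copy(Node n) {
        return n.isLeaf() ? new Node(n.name) : new Node(copy(n.left), copy(n.right));
    }

    /** Newick with children sorted, so equal topologies give equal strings. */
    static String canonical(Node n) {
        if (n.isLeaf()) return n.name;
        String a = canonical(n.left), b = canonical(n.right);
        return a.compareTo(b) <= 0 ? "(" + a + "," + b + ")" : "(" + b + "," + a + ")";
    }

    static void preorder(Node n, List<Node> out) {
        out.add(n);
        if (!n.isLeaf()) { preorder(n.left, out); preorder(n.right, out); }
    }

    static boolean inSubtree(Node t, Node s) {
        for (Node x = t; x != null; x = x.parent) if (x == s) return true;
        return false;
    }

    /** Prunes subtree s, regrafts it onto the edge above t; returns the new root. */
    static Node pruneAndRegraft(Node root, Node s, Node t) {
        Node p = s.parent, g = p.parent;
        Node sib = (p.left == s) ? p.right : p.left;
        // Prune: remove s's parent p and let s's sibling take its place.
        sib.parent = g;
        if (g == null) root = sib;
        else if (g.left == p) g.left = sib;
        else g.right = sib;
        // Regraft: new internal node q on the edge above t, with children t and s.
        Node tp = t.parent;
        Node q = new Node(t, s);
        q.parent = tp;
        if (tp == null) root = q;
        else if (tp.left == t) tp.left = q;
        else tp.right = q;
        return root;
    }

    /** All distinct topologies exactly one SPR move away from the given tree. */
    static Set<String> sprNeighbors(Node root) {
        String original = canonical(root);
        List<Node> ref = new ArrayList<>();
        preorder(root, ref);
        Set<String> result = new TreeSet<>();
        for (int si = 1; si < ref.size(); si++) {          // index 0 is the root: cannot be pruned
            for (int ti = 0; ti < ref.size(); ti++) {
                Node r = copy(root);                         // fresh copy for every move
                List<Node> nodes = new ArrayList<>();
                preorder(r, nodes);                          // same order as ref
                Node s = nodes.get(si), t = nodes.get(ti);
                if (t == s.parent || inSubtree(t, s)) continue;   // illegal regraft points
                String candidate = canonical(pruneAndRegraft(r, s, t));
                if (!candidate.equals(original)) result.add(candidate);
            }
        }
        return result;
    }
}
```

For the tree ((A,B),C), `sprNeighbors` returns the two other rooted three-taxon topologies. With n taxa there are O(n) subtrees to prune and O(n) places to reattach each one, so a full pass produces O(n^2) candidate trees, each of which a likelihood program must score; this is why global rearrangements are expensive.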
The Phylip Package (3)
- Creating phylogenies of all the Trastrep sequences, with and without global rearrangements
- Comparing Maximum Likelihood scores
The Phylip Package (4)
- Bootstrapping to obtain statistical support for the branches of our trees
  - In DNAML: JC, JC corrected, and K2P models
  - In DNAPARS: a tree based on the maximum parsimony method
- Creating consensus trees from 100 bootstrapped alignments for each model
- Obtaining bootstrap values for each node for each model
Analysis of the Models of Evolution - Techniques
- 1. Evaluate trees based on biological intuition
  - There are 6 different genera represented
  - We expect Trastrep to be placed as the outgroup
  - Bubalus sequences should be placed together
  - Bos sequences should be placed together
- 2. Select the ML tree with the highest likelihood score
- 3. Compare NJ trees with ML bootstrapped consensus trees
  - The model of evolution that shows the most similarity between the two methods is considered the best
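The first technique can be checked mechanically: a group of sequences is "placed together", and an outgroup is correctly placed, when the group forms one side of a branch of the tree. The Java sketch below reads clades from a simple Newick string and performs that check. `CladeCheck` is a hypothetical name, the taxon labels in the example (Trastrep1, Bos1, Bubalus1, ...) are placeholders rather than the real sequence names, and rooting is simplified: a group counts as together if either it or its complement is a clade.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the "biological intuition" checks. CladeCheck and its methods are
// hypothetical; taxon labels used with it are placeholders, not real sequence names.
final class CladeCheck {

    /** Leaf sets of every parenthesized group in a simple Newick string. Branch lengths
     *  and internal labels are skipped; quoted names are not supported. */
    static List<Set<String>> clades(String newick) {
        List<Set<String>> clades = new ArrayList<>();
        Deque<Set<String>> open = new ArrayDeque<>();
        StringBuilder name = new StringBuilder();
        boolean skipping = false;          // inside a branch length or internal label
        for (char c : newick.toCharArray()) {
            if (c == '(') {
                open.push(new TreeSet<>());
                skipping = false;
            } else if (c == ',' || c == ')' || c == ';') {
                if (name.length() > 0) { open.peek().add(name.toString()); name.setLength(0); }
                if (c == ')') {
                    Set<String> done = open.pop();
                    clades.add(done);
                    if (!open.isEmpty()) open.peek().addAll(done);   // parent holds child's leaves
                }
                skipping = (c == ')');     // text right after ')' is an internal label
            } else if (c == ':') {
                skipping = true;
            } else if (!skipping && !Character.isWhitespace(c)) {
                name.append(c);
            }
        }
        return clades;
    }

    /** True if the group or its complement is a clade, i.e. the group sits on one side
     *  of a branch. Used both for "placed together" and "placed as the outgroup". */
    static boolean groupedTogether(String newick, Set<String> group) {
        List<Set<String>> clades = clades(newick);
        Set<String> all = clades.get(clades.size() - 1);   // outermost group closes last
        Set<String> rest = new TreeSet<>(all);
        rest.removeAll(group);
        if (group.size() <= 1 || rest.size() <= 1) return true;  // trivial split
        return clades.contains(group) || clades.contains(rest);
    }
}
```

On ((Trastrep1,Trastrep2),((Bos1,Bos2),(Bubalus1,Bubalus2))); every check passes, while on ((Trastrep1,Bos1),(Trastrep2,(Bos2,(Bubalus1,Bubalus2)))); only the Bubalus group is together, mirroring the kind of failures reported for the parsimony consensus tree.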
Analysis - Creation of Bootstrapped Consensus Trees
- To determine which model of evolution best supported our data, we used the bootstrap method
- Non-parametric bootstrapping involves resampling the data a set number of times (in this case, 100 times)
- The bootstrap analysis begins with the initial multiple alignment and creates each replicate data set by sampling alignment columns with replacement
- Bootstrap values indicate the number of replicate data sets in which a particular branch was recovered
- There is no set standard bootstrap value that indicates statistical support; we chose a value of 70
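The resampling step can be sketched in Java as follows (PHYLIP provides this step as its SEQBOOT program). Each replicate draws as many columns as the alignment has, with replacement, and uses the same columns for every sequence so that each site keeps its pattern across taxa. `Bootstrap` and its method names are hypothetical; the fixed seed is only there to make runs reproducible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of non-parametric bootstrap resampling of an alignment; Bootstrap,
// replicate, replicates and supportPercent are hypothetical names.
final class Bootstrap {

    /** One replicate: draw alignment columns with replacement, using the SAME columns
     *  for every sequence so that each site keeps its pattern across taxa. */
    static String[] replicate(String[] alignment, Random rng) {
        int sites = alignment[0].length();
        int[] cols = new int[sites];
        for (int k = 0; k < sites; k++) cols[k] = rng.nextInt(sites);
        String[] out = new String[alignment.length];
        for (int s = 0; s < alignment.length; s++) {
            StringBuilder sb = new StringBuilder(sites);
            for (int c : cols) sb.append(alignment[s].charAt(c));
            out[s] = sb.toString();
        }
        return out;
    }

    /** `count` replicates (100 in this project) generated from a fixed seed. */
    static List<String[]> replicates(String[] alignment, int count, long seed) {
        Random rng = new Random(seed);
        List<String[]> reps = new ArrayList<>();
        for (int i = 0; i < count; i++) reps.add(replicate(alignment, rng));
        return reps;
    }

    /** Bootstrap support in percent for a branch found in `hits` of `total` replicate trees. */
    static double supportPercent(int hits, int total) {
        return 100.0 * hits / total;
    }
}
```

With 100 replicates the percentage equals the raw count, so a branch recovered in 70 or more replicate trees reaches the 70 cutoff chosen above.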
Analysis - Consensus Tree
- The bootstrapped data set, which consists of 100 different multiple alignments, was used as input to the PHYLIP programs DNAML and DNAPARS
- The parameters of DNAML were set to reflect the particular model of evolution under review (JC, JC corrected, K2P), as discussed before; DNAPARS does not require any parameter optimization
- For each of the four models, 100 trees were created
- The PHYLIP program CONSENS was used to create one consensus tree from the 100 trees of each model of evolution
- This consensus tree reflects the topology that best fits the 100 trees; each branch is given a bootstrap value
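The counting behind a consensus tree can be sketched in Java as follows. CONSENS offers several consensus methods (including an extended majority rule); this sketch shows only the plain majority-rule step: count how many of the input trees contain each clade and keep the clades found in more than half of them. `MajorityRule` is a hypothetical name, and each input tree is assumed to be given already as its set of non-trivial clades (leaf sets).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of majority-rule consensus counting; MajorityRule and its methods are
// hypothetical. Each input tree is given as its set of non-trivial clades (leaf sets).
final class MajorityRule {

    /** Number of input trees containing each clade. */
    static Map<Set<String>, Integer> cladeCounts(List<Set<Set<String>>> trees) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<Set<String>> tree : trees) {
            for (Set<String> clade : tree) counts.merge(clade, 1, Integer::sum);
        }
        return counts;
    }

    /** Clades present in more than half of the trees, mapped to their counts
     *  (with 100 bootstrap trees, the count is the bootstrap value). Such clades are
     *  always mutually compatible, so together they define one consensus tree. */
    static Map<Set<String>, Integer> majorityClades(List<Set<Set<String>>> trees) {
        Map<Set<String>, Integer> kept = new HashMap<>();
        for (Map.Entry<Set<String>, Integer> e : cladeCounts(trees).entrySet()) {
            if (2 * e.getValue() > trees.size()) kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }
}
```

The compatibility guarantee follows from counting: two clades that each appear in more than half of the trees must appear together in at least one tree, so they cannot conflict.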
Analysis 1: Consensus Parsimony Tree
- Notice that Trastrep is not the outgroup
- Notice that the Bubalus sequences are not grouped together
Analysis 1: Consensus Jukes-Cantor
- Notice that Bubalus is the outgroup
- Trastrep is nested within the tree
Analysis 1: ML Consensus Bootstrapped Kimura 2-Parameter
- Notice that Trastrep is nested within the tree
- The Bos sequences are separated
Analysis 1: ML Consensus Bootstrapped Jukes-Cantor Corrected
- Notice that Trastrep is placed as the outgroup
- All of the Bubalus species are placed together
- All of the Bos sequences are placed together, closer to Bubalus than to Trastrep
Analysis 2: ML Tree with Highest Likelihood Score (JC corrected)
- We created 3 ML trees in PHYLIP, one for each model of evolution
- The tree with the highest score was the JC corrected tree
- Log-likelihood scores:
  - Jukes-Cantor: -2787.4376
  - Jukes-Cantor corrected: -2740.70636
  - Kimura 2-Parameter: -2818.97662
Analysis 3: Comparisons Between NJ Trees and ML Consensus Bootstrapped Trees
- 1. Calculated the number of shared branches between the NJ tree and the ML consensus tree for each model of evolution (Parsimony vs. Mismatch, Jukes-Cantor, Jukes-Cantor corrected, Kimura 2-parameter)
- 2. Calculated the number of shared branches between trees with >70 bootstrap support
- 3. Calculated the number of shared internal branches between trees, i.e., branches that do not lead directly to a leaf
- 4. Calculated the number of shared internal branches with >70 bootstrap support
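The slides do not say how branches were matched between two trees. A common approach, assumed in the Java sketch below, treats each branch as a split (bipartition) of the taxa and writes every split as the side that excludes a fixed reference taxon, so the same branch gets the same key in both trees; this is the same bookkeeping used for the Robinson-Foulds distance. `SharedBranches` and its methods are hypothetical names, and the support map is assumed to hold the ML consensus bootstrap values.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the branch comparison between an NJ tree and an ML consensus tree.
// SharedBranches and its methods are hypothetical names. Branches are represented
// as splits of the taxon set, each written as the side WITHOUT a reference taxon.
final class SharedBranches {

    /** Normalizes one side of a split to the side that excludes the reference taxon
     *  (the alphabetically first name), so the same branch always has one key. */
    static Set<String> normalize(Set<String> side, Set<String> allTaxa) {
        String ref = Collections.min(allTaxa);
        if (!side.contains(ref)) return new TreeSet<>(side);
        Set<String> other = new TreeSet<>(allTaxa);
        other.removeAll(side);
        return other;
    }

    /** Counts NJ branches that also occur in the ML consensus tree. mlSupport maps each
     *  consensus branch to its bootstrap value; a branch counts only if its support is
     *  strictly above minSupport (pass Double.NEGATIVE_INFINITY for no cutoff).
     *  internalOnly drops terminal branches (one side is a single taxon). */
    static int countShared(Set<Set<String>> njSplits, Map<Set<String>, Double> mlSupport,
                           int taxonCount, double minSupport, boolean internalOnly) {
        int shared = 0;
        for (Set<String> split : njSplits) {
            Double support = mlSupport.get(split);
            if (support == null) continue;                         // not in the ML tree
            boolean internal = split.size() >= 2 && taxonCount - split.size() >= 2;
            if (internalOnly && !internal) continue;
            if (support > minSupport) shared++;
        }
        return shared;
    }
}
```

The four comparisons on the slide correspond to the four combinations of `minSupport` (no cutoff or 70) and `internalOnly` (false or true). How terminal branches are scored depends on the support values assigned to them; here they are simply whatever the map holds.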
Comparison Results
Results
- In three of the four comparisons, the JC corrected model had the most shared branches
- In the remaining comparison, the K2P model had the most shared branches
- The parsimony/mismatch comparison fared the poorest
Conclusions
- In the NJ-ML consensus comparison, the JC corrected model scored best in 3 of 4 trials
- Of the 3 ML trees created, one for each model of evolution, the JC corrected tree was given the highest ML score
- The ML consensus tree under the JC corrected model has the topology closest to the one we would expect based on biological intuition
- We therefore conclude that the JC corrected model of evolution best fits our data set
References
- [1] Krane, D.E., and Raymer, M.L. Fundamental Concepts of Bioinformatics. 2002. (ISBN 0-8053-4633-3)
- [2] Lio, P., and Goldman, N. Models of Molecular Evolution and Phylogeny. Genome Research 8:1233-1244, 1998.
- [3] Mount, D.W. Bioinformatics: Sequence and Genome Analysis, 2nd ed., 2004. (ISBN 0-87969-712-1)
- [4] Phylip Program, version 3.62. URL: http://evolution.genetics.washington.edu/phylip/
- [5] PhyloDraw Program, version 0.8. URL: http://pearl.cs.pusan.ac.kr/phylodraw/
- [6] Saitou, N., and Nei, M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425, 1987.
- [7] Subtree Pruning and Regrafting. URL: http://www.hyphy.org/docs/analyses/methods/spr.html