Johns Hopkins University - Fall 2003 Phylogenetics - PowerPoint PPT Presentation

Slides: 24
Provided by: kingj
Learn more at: http://physics.gmu.edu

Transcript and Presenter's Notes

1
Johns Hopkins University - Fall 2003
Phylogenetics Computational Genomics -
410.640.71
Week 6: Intro to Phylogenetic Reconstruction
Distance-Based Methods
• introduction to phylogenies
• distance-based methods
• phylogeny exercises
Phylogeny objectives:
1 - understand the essence of phylogenies (definition of terms)
2 - understand distance-based methods of phylogenetic reconstruction
3 - be able to use various software packages to reconstruct and view phylogenies: ClustalX, MEGA, DAMBE, Treeview
Lecture 6 Page 1
2
Molecular phylogeny
• study of relationships among organisms (molecular systematics), proteins or genes using molecular biology techniques
• Darwin, thesis 1: organisms descend with modification from common ancestors (CA)
• relationships among organisms, proteins and genes are illustrated by a phylogenetic tree
• internal node: common ancestor (CA); external node: operational taxonomic unit (OTU)
• the order of branches defines the relationships (topology); branch length defines the number of changes
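To make the terminology concrete, here is a minimal Python sketch (not part of the lecture) of a rooted tree in which leaves play the role of OTUs, internal nodes stand for common ancestors, and each node stores the length of the branch leading to it. The names `Node`, `otus` and `root_to_tip` and the example tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node of a rooted phylogenetic tree (illustrative sketch)."""
    name: Optional[str] = None       # OTU label; None for internal nodes
    branch_length: float = 0.0       # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)

    def is_external(self) -> bool:
        # External nodes (leaves) are OTUs; internal nodes are common ancestors.
        return not self.children


def otus(node: Node) -> List[str]:
    """OTU labels below `node`, left to right."""
    if node.is_external():
        return [node.name]
    return [label for child in node.children for label in otus(child)]


def root_to_tip(node: Node, target: str, acc: float = 0.0) -> Optional[float]:
    """Sum of branch lengths from `node` down to OTU `target` (None if absent)."""
    acc += node.branch_length
    if node.is_external():
        return acc if node.name == target else None
    for child in node.children:
        found = root_to_tip(child, target, acc)
        if found is not None:
            return found
    return None


# Hypothetical tree ((A:1,B:2):1,C:3); the root itself has branch length 0.
tree = Node(children=[
    Node(branch_length=1.0, children=[Node("A", 1.0), Node("B", 2.0)]),
    Node("C", 3.0),
])
print(otus(tree))              # ['A', 'B', 'C']
print(root_to_tip(tree, "B"))  # 3.0
```

Branch lengths live on the child node here, so the path length from the root to an OTU is just the running sum on the way down.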
3
A brief history of molecular phylogeny
phylogenetic inference is old (for biology)
• Charles Darwin, Origin of Species (1859): illustration of descent with modification
• Ernst Haeckel, Tree of life (1891)
4
A brief history of molecular phylogeny
more modern developments
Molecular phylogeny
• Nuttall (1904) found that the strength of serological cross-reactions was correlated with the level of relatedness between animals; applied to primate phylogeny
• starting in the 1950s many more sources of molecular information became available, e.g. amino acid sequences, allozyme frequencies, DNA hybridization
• these data stimulated the development of quantitative numerical taxonomy techniques for phylogenetic analysis
Algorithmic approaches
• first numerical approach to phylogeny based on the phenetic approach, i.e. similarity of morphological characters (Michener and Sokal 1957)
• phylogenetic studies of human populations based on blood allele frequencies led to the introduction of distance, parsimony and likelihood methods (Edwards and Cavalli-Sforza 1963, 1964)
5
A brief history of molecular phylogeny
emergence of sequence-based methods
  • accumulation of amino acid sequences stimulates development of sequence-based phylogenetic methods
  • these soon emerge as the most powerful methods (see slide 6 for reasons)
  • parsimony, distance and maximum likelihood methods (see slide 10)
  • Eck and Dayhoff (1966), working on the Atlas of Protein Sequence and Structure, publish the first method for phylogenetic analysis of sequences, based on parsimony
  • Fitch and Margoliash (1967) publish the first distance-based method, weighted least squares, for sequence-based (cytochrome c) phylogenetic inference
  • statistician Neyman (1971) publishes the first maximum likelihood method for phylogenetic analysis of sequence data

6
Benefits of using molecular sequences for phylogenetics
1 - sequences evolve in a much more regular manner than morphological characters
2 - less prone to confusion between homology and analogy (homoplasies)
3 - vast abundance of characters to analyze
4 - molecular data are more amenable to quantitative treatment
5 - molecular data are ubiquitous and can be used for microorganisms
6 - can be used to study relationships at many different evolutionary levels: faster-evolving genes (mitochondrial DNA) for closely related species; slower-evolving genes (ribosomal RNA genes) for distantly related species
Some success stories:
• primate evolution: who are humans' closest relatives?
• origin of the Cetacea (whales, dolphins, porpoises)
• revising the deep taxonomic classification scheme: 3 domains of life
7
Unrooted versus rooted phylogenies
[Figure: an unrooted tree and a rooted tree with root R and a time axis]
• unrooted: only specifies relationships, not the evolutionary path
• rooted: the root (R) is the common ancestor of all OTUs; the path from the root to the OTUs specifies time; knowledge of an outgroup is required to define the root
8
Different visual representations of trees
• rectangular cladogram and slanted cladogram: branch lengths not proportional to distance
• phylogram: branch lengths proportional to distance
9
Species tree versus gene tree
• species tree: represents evolutionary relationships among species
• gene tree: represents evolutionary relationships among genes
• species trees and gene trees can (and often do!) differ
Reasons for this:
• comparison of orthologous versus paralogous genes
• horizontal (lateral) transfer of genes
More on these important concepts later in the course. The concept of an accurate species tree is notoriously difficult to pin down; in this class we will deal almost exclusively with gene trees.
10
Methods of phylogenetic reconstruction
• Distance based: pairwise evolutionary distances are computed for all taxa, and the tree is constructed using an algorithm based on the relationships between the distances
• Maximum parsimony: nucleotides or amino acids are considered as character states; the best phylogeny is chosen as the one that minimizes the number of changes between character states
• Maximum likelihood: a statistical method of phylogeny reconstruction with an explicit model of how the data set was generated (nucleotide or amino acid substitution); find the topology that maximizes the probability of the data given the model and the parameter values (estimated from the data)
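The distance-based route starts from a matrix of pairwise distances. As an illustration only, the sketch below uses the simplest such measure, the p-distance (the proportion of aligned sites that differ), and skips any site with a gap in either sequence; the lecture does not prescribe this choice, and real analyses usually apply a substitution-model correction (e.g. Jukes-Cantor) on top of it. The sequences and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of compared aligned sites that differ (gapped sites skipped)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue  # ignore sites with a gap in either sequence
        compared += 1
        differences += int(a != b)
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """p-distance for every unordered pair of taxa in the alignment."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Toy alignment of three hypothetical taxa (10 sites each).
aln = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACCTTC"}
for pair, d in distance_matrix(aln).items():
    print(pair, d)   # A-B differ at 1 site, A-C at 3, B-C at 2
```

Any of the distance methods in this lecture (UPGMA, neighbor-joining, least squares, minimum evolution) can take a matrix like this as input.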
11
Phylogenetic inference
1 - sequences change as they evolve from a common ancestor over time
2 - a group of related sequences retains (incomplete) information about the evolutionary history that unites them, based on the pattern of changes
3 - phylogeny is estimation: make the best estimate of evolutionary history given the incomplete information in the sequences being analyzed
4 - information about the past is not available, only extant sequences
5 - therefore any evolutionary scenario (i.e. phylogeny) can be postulated to explain the changes in the sequences being analyzed
6 - we must have some way to discriminate among the (many!) possible phylogenies
12
Number of OTUs and number of possible trees
Unrooted trees: $\prod_{i=3}^{n} (2i-5)$
Rooted trees: $\prod_{i=3}^{n} (2i-5) \times (2n-3)$

OTUs (n)   Unrooted trees   Rooted trees
2          1                1
3          1                3
4          3                15
5          15               105
6          105              945
7          945              10,395
8          10,395           135,135
9          135,135          2,027,025
10         2,027,025        34,459,425

• true tree: the true evolutionary history is one of many possibilities; it is difficult to infer the true tree when the number of OTUs is large
• inferred tree: obtained using data and a reconstruction method; not necessarily the same as the true tree, i.e. a hypothesis
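The two products are easy to evaluate directly; this short sketch (function names are illustrative) reproduces the table and shows how quickly exhaustive search becomes hopeless.

```python
def unrooted_trees(n: int) -> int:
    """Unrooted bifurcating trees for n OTUs: product of (2i - 5) for i = 3..n."""
    count = 1
    for i in range(3, n + 1):
        count *= 2 * i - 5
    return count


def rooted_trees(n: int) -> int:
    """Rooted bifurcating trees: the unrooted count times (2n - 3)."""
    return unrooted_trees(n) * (2 * n - 3)


for n in range(2, 11):
    print(n, unrooted_trees(n), rooted_trees(n))
```

A quick consistency check: the number of unrooted trees for n OTUs equals the number of rooted trees for n - 1 OTUs, as the table also shows.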
13
Algorithms vs. Optimality Criteria
Two ways of selecting preferred phylogenies:
• Algorithms: a sequence of steps that leads to the selection of a phylogeny. They combine phylogeny inference and criterion definition into a single step and move directly toward the best tree without evaluating many different trees, e.g. UPGMA, neighbor-joining
• Optimality criteria: a criterion is defined whereby different phylogenies are compared to one another to determine which is better. Two steps are involved: (1) define the criterion (objective function); (2) use an algorithm to compute the objective function on different trees. This is much slower, since many trees must be evaluated (shortcuts are often necessary), but may be more robust, because scores are assigned to every phylogeny and the phylogenies are then ranked, which yields information about how well specified the tree is, e.g. least squares, minimum evolution
• Compromise: define a starting tree with the algorithmic approach, then search nearby tree space using the optimality criterion approach
14
Least squares method
• First distance-based method developed: Cavalli-Sforza and Edwards (1967); Fitch and Margoliash (1967)
• Optimality criterion: minimize the residual sum of squares (RS) between the observed distances (dij, from the distance matrix) and the patristic distances (eij, based on the branch lengths of the inferred phylogeny):
$RS = \sum_{i<j} (d_{ij} - e_{ij})^2$
e.g. dBD = 18 and eBD = 6 + 2 + 8 = 16, so the BD term of RS is (18 - 16)^2 = 4
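A minimal sketch of the least-squares bookkeeping, assuming the tree is stored as an edge list of (node, node, length) triples: eij is the sum of branch lengths on the path between i and j, and RS sums (dij - eij)^2 over the supplied pairs. Only dBD = 18 and the path lengths 6, 2 and 8 come from the slide; the A and C branches and the internal node names u and v are made up to complete the tree. Fitting the branch lengths themselves (the actual least-squares estimation) is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edges = List[Tuple[str, str, float]]


def patristic_distance(edges: Edges, start: str, end: str) -> float:
    """Sum of branch lengths on the unique path between two nodes of a tree."""
    graph = defaultdict(list)
    for u, v, length in edges:
        graph[u].append((v, length))
        graph[v].append((u, length))
    # Iterative depth-first search; a tree has exactly one path between two nodes.
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == end:
            return dist
        for nxt, length in graph[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + length))
    raise KeyError(f"no path from {start} to {end}")


def residual_sum_of_squares(observed: Dict[Tuple[str, str], float], edges: Edges) -> float:
    """RS = sum over the supplied pairs of (d_ij - e_ij)^2."""
    return sum((d - patristic_distance(edges, i, j)) ** 2 for (i, j), d in observed.items())


# Hypothetical 4-OTU tree: B-u (6), u-v (2), v-D (8) follow the slide; A-u, C-v are made up.
tree = [("A", "u", 3.0), ("B", "u", 6.0), ("u", "v", 2.0), ("C", "v", 4.0), ("D", "v", 8.0)]
print(patristic_distance(tree, "B", "D"))                 # 16.0
print(residual_sum_of_squares({("B", "D"): 18.0}, tree))  # 4.0
```

Passing the full distance matrix (all pairs) instead of the single BD entry gives the complete RS for the tree.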
15
Least squares method
• Statistically a very sound method, since it is based on least squares
• Logically challenged, since it formally estimates branch lengths, not topologies
• In principle RS is computed for all possible topologies, but in practice this quickly becomes impracticable (see slide 12); shortcuts are available to reduce the search space (see lecture week 7)
• Fitch and Margoliash (1967) introduced weighted least squares, which corrects for the bias introduced by long distances
• Negative branch length estimates can confound the method; constraining branch lengths to be non-negative results in substantial improvement
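The weighting can be sketched by dividing each squared residual by a power of the observed distance: a power of 2 gives the inverse-square weighting usually associated with Fitch and Margoliash, and a power of 0 gives ordinary least squares. The function name and the toy (d, e) pairs are illustrative, not taken from the slides.

```python
from typing import Iterable, Tuple


def weighted_rs(pairs: Iterable[Tuple[float, float]], power: float = 2.0) -> float:
    """Sum of (d - e)^2 / d**power over (observed d, patristic e) pairs.

    power=2 gives inverse-square (Fitch-Margoliash style) weights;
    power=0 gives the ordinary, unweighted RS.
    """
    total = 0.0
    for d, e in pairs:
        if power and d <= 0:
            raise ValueError("inverse-power weights need positive observed distances")
        total += (d - e) ** 2 / d ** power
    return total


# Hypothetical pairs with the same absolute misfit of 2 distance units.
short_pair = (4.0, 2.0)    # (d, e) for a short distance
long_pair = (18.0, 16.0)   # (d, e) for a long distance
print(weighted_rs([short_pair]), weighted_rs([long_pair]))
print(weighted_rs([short_pair], power=0), weighted_rs([long_pair], power=0))
```

With equal absolute misfits, the long distance contributes far less to the weighted score (4/324 versus 4/16), which is the intended effect of down-weighting long, noisier distances; unweighted, both contribute 4.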
16
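Minimum evolution method
• Optimality criterion: choose the phylogeny that gives the smallest value of S, the sum of all branch lengths:
$S = \sum_{i=1}^{T} b_i$
where T is the total number of branches and b_i is the estimated length of branch i
[Figure: two alternative trees, with S = 35.6 and S = 35.0; the tree with S = 35.0 is preferred]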
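Given branch-length estimates for each candidate topology, the minimum evolution choice is an argmin over S. In practice those estimates are usually obtained by least squares for each topology; in this sketch they are simply supplied. Both candidate trees and their branch lengths are hypothetical.

```python
from typing import Dict, List, Tuple

Edges = List[Tuple[str, str, float]]


def tree_length(edges: Edges) -> float:
    """S: the sum of all branch-length estimates b_i of one tree."""
    return sum(length for _, _, length in edges)


def minimum_evolution_choice(candidates: Dict[str, Edges]) -> str:
    """Return the key of the candidate tree with the smallest S."""
    return min(candidates, key=lambda name: tree_length(candidates[name]))


# Hypothetical candidates: topology label -> (node, node, estimated length).
candidates = {
    "((A,B),(C,D))": [("A", "u", 3.0), ("B", "u", 6.0), ("u", "v", 2.0),
                      ("C", "v", 4.0), ("D", "v", 8.0)],
    "((A,C),(B,D))": [("A", "u", 4.0), ("C", "u", 5.0), ("u", "v", 1.5),
                      ("B", "v", 5.5), ("D", "v", 8.0)],
}
for name, edges in candidates.items():
    print(name, tree_length(edges))
print("ME choice:", minimum_evolution_choice(candidates))
```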
17
Minimum evolution method
• As with least squares, S should ideally be computed for all possible trees, but this is impossible with many taxa
• One shortcut is to start the search with a neighbor-joining tree and then evaluate closely related trees to find the best one
• Close neighbor interchange (CNI): start with a temporary ME tree (e.g. the NJ tree for the first step) and evaluate all trees that differ by one or two topological changes
• This approach may be more robust than using neighbor-joining alone because it can produce an ordered list of trees; if many trees represent the data almost equally well, the best tree may not be well supported
18
UPGMA method (unweighted pair group method with arithmetic mean)
• simplest method; uses a sequential clustering algorithm
• results in ultrametric trees: equal distances from the root to all tips
• based on the assumption of strict rate constancy among lineages; this is often violated, so the method often gives erroneous trees (not recommended)
[Figure: distance matrix and tree at step 1 and step 2; in step 2, A and B are merged into cluster (AB)]
d(AB)C = (dAC + dBC) / 2
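A compact UPGMA sketch in Python. Distances are stored in a dictionary keyed by frozenset pairs (an assumption of this sketch, not a standard format), the closest pair is merged at half its distance so that all tips stay equidistant from each node, and distances to the merged cluster are size-weighted averages, which reduces to d(AB)C = (dAC + dBC) / 2 when A and B are single OTUs. The 4-OTU matrix is hypothetical and chosen to be ultrametric; it is not the matrix from the example slide.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def upgma(labels: List[str], dist: Dict[FrozenSet[str], float]) -> str:
    """Cluster OTUs by UPGMA and return a Newick string with branch lengths."""
    d = dict(dist)
    size = {x: 1 for x in labels}        # number of OTUs in each cluster
    height = {x: 0.0 for x in labels}    # node-to-tip distance of each cluster
    newick = {x: x for x in labels}
    active = list(labels)
    while len(active) > 1:
        # Find the closest pair of clusters.
        i, j = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        h = d[frozenset((i, j))] / 2     # the new node sits halfway (ultrametric)
        new = f"({newick[i]}:{h - height[i]:g},{newick[j]}:{h - height[j]:g})"
        # Distance from the merged cluster to each other cluster: size-weighted mean.
        for k in active:
            if k not in (i, j):
                d[frozenset((new, k))] = (
                    size[i] * d[frozenset((i, k))] + size[j] * d[frozenset((j, k))]
                ) / (size[i] + size[j])
        size[new], height[new], newick[new] = size[i] + size[j], h, new
        active = [k for k in active if k not in (i, j)] + [new]
    return newick[active[0]] + ";"


# Hypothetical ultrametric distance matrix for four OTUs.
labels = ["A", "B", "C", "D"]
raw = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6,
       ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 6}
dist = {frozenset(p): float(v) for p, v in raw.items()}
print(upgma(labels, dist))  # (D:3,(C:2,(A:1,B:1):1):1);
```

Ties are broken by taking the first closest pair found; real implementations may handle ties differently.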
19
UPGMA example
[Figure: worked example showing the distance matrix and tree at step 1 and step 2, using d(AB)C = (dAC + dBC) / 2]
20
Neighbor-joining method
uses star decomposition: identification of neighbors that sequentially minimize the total length of the tree
1 - start with a star tree (no topology); S = total branch length of the tree
2 - separate a pair of OTUs from all others; S12 = total branch length of the resulting tree
3 - choose the pair of OTUs that minimizes the total branch length of the tree
4 - this pair is collapsed into a single OTU and the distance matrix is recalculated
5 - the next pair of OTUs that gives the smallest total branch length is chosen
6 - iterate until complete
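The steps above can be sketched with the standard Saitou and Nei recurrences, written here using the Q criterion of Studier and Keppler, which picks the same pair as minimizing the star-decomposition total length S12. The frozenset-keyed distance dictionary, the internal node names (n1, n2, ...) and the input matrix are assumptions of this sketch. The matrix is additive (derived from a known tree), so the sketch recovers that tree exactly; on real data NJ can return negative branch lengths, which this sketch does not correct.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def neighbor_joining(labels: List[str],
                     dist: Dict[FrozenSet[str], float]) -> List[Tuple[str, str, float]]:
    """Return the edges (node, node, length) of an unrooted NJ tree."""
    d = dict(dist)
    active = list(labels)
    edges: List[Tuple[str, str, float]] = []
    counter = 0
    while len(active) > 2:
        n = len(active)
        r = {i: sum(d[frozenset((i, k))] for k in active if k != i) for i in active}
        # Q criterion: the minimizing pair gives the smallest total tree length S_ij.
        f, g = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dfg = d[frozenset((f, g))]
        counter += 1
        u = f"n{counter}"  # hypothetical name for the new internal node
        length_f = dfg / 2 + (r[f] - r[g]) / (2 * (n - 2))
        edges += [(f, u, length_f), (g, u, dfg - length_f)]
        # Collapse f and g into u and recalculate distances to the remaining OTUs.
        for k in active:
            if k not in (f, g):
                d[frozenset((u, k))] = (d[frozenset((f, k))] + d[frozenset((g, k))] - dfg) / 2
        active = [k for k in active if k not in (f, g)] + [u]
    a, b = active
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Hypothetical additive distances from the tree A-u 3, B-u 6, u-v 2, C-v 4, D-v 8.
labels = ["A", "B", "C", "D"]
raw = {("A", "B"): 9, ("A", "C"): 9, ("A", "D"): 13,
       ("B", "C"): 12, ("B", "D"): 16, ("C", "D"): 12}
dist = {frozenset(p): float(v) for p, v in raw.items()}
for edge in neighbor_joining(labels, dist):
    print(edge)
```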
21
Neighbor-joining example
22
Neighbor-joining method
• Extremely fast and efficient method; widely used and found in numerous publications
• Tends to perform fairly well in simulation studies
• May produce tied trees from a data set, but this appears to be rare
• The algorithm is greedy and so can get stuck in local optima
• The main criticism is that it produces only one tree and gives no idea of how many other trees are equally well, or almost as well, supported by the data
• For this reason, neighbor-joining is often used to find a starting tree that other methods (e.g. minimum evolution) will evaluate to find the best tree
23
Exercises
1 - choose an alignment to work on
2 - load the alignment into Clustal and build a neighbor-joining tree
3 - open the tree in Treeview, and view, manipulate and save it
4 - load the alignment into DAMBE and into MEGA, reconstruct and view trees using all available distance methods, and look for differences in the results
5 - manually reconstruct the UPGMA tree for the distance matrix on slide 14
6 - open the MEGA-formatted version of this same distance matrix (http://jhunix.hcf.jhu.edu/kjordan6/distances.meg) in MEGA and reconstruct distance-based trees using all 3 available methods (check the UPGMA result against your manually reconstructed UPGMA tree)
7 - calculate RS for all three distance-based trees from 6 and pick the best tree
8 - calculate S for all three distance-based trees from 6 and pick the best tree