Title: General Phylogenetics
1General Phylogenetics
- Points that will be covered in this presentation
- Tree Terminology
- General Points About Phylogenetic Trees
- Phylogenetic Analyses
- The importance of Alignments
- The different analysis methods
- Tree confidence measures
2Tree Terminology
Node: a point at which 2 or more branches diverge.
Internal node: a hypothetical last common ancestor.
Terminal node: the molecular or morphological data from which the tree is derived. (These will often be used to represent species or individual specimens and may be referred to as OTUs, Operational Taxonomic Units.)
Clade: a node (hypothetical ancestor) and all the lineages descending from it.
[Figure: example tree with internal nodes, terminal nodes (OTUs), and clades labeled]
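The terms above map naturally onto a small data structure. The following is a minimal sketch, not a standard library API: the `Node` class, the `clade_otus` helper, and the example tree ((A,B),(C,(D,E))) are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a rooted tree; a node without children is a terminal node (OTU)."""
    name: str = ""
    children: list["Node"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        return not self.children


def clade_otus(node: Node) -> set[str]:
    """Return the names of all OTUs descending from `node`, i.e. its clade."""
    if node.is_terminal():
        return {node.name}
    otus: set[str] = set()
    for child in node.children:
        otus |= clade_otus(child)
    return otus


# Hypothetical example tree: ((A,B),(C,(D,E)))
ab = Node(children=[Node("A"), Node("B")])    # internal node
de = Node(children=[Node("D"), Node("E")])    # internal node
cde = Node(children=[Node("C"), de])          # internal node
root = Node(children=[ab, cde])

print(sorted(clade_otus(cde)))
```

Each internal node stands for a hypothetical ancestor, and calling `clade_otus` on it lists the OTUs of the clade it defines.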
3Tree Terminology
Monophyletic group: a group in which all members are derived from a unique common ancestor, and which includes all descendants of that ancestor.
Polyphyletic group: a group whose members are not all derived from a unique common ancestor. The common ancestor of the group has many descendants that are not in the group.
Paraphyletic group: a group that includes its common ancestor but excludes some of that ancestor's descendants (like a polyphyletic group, it is not monophyletic).
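Whether a group is monophyletic on a given tree can be checked mechanically: the group must match exactly the set of tips under some node. The sketch below assumes a tree written as nested Python tuples with unique tip names; the function names and the example tree are hypothetical. It tests monophyly only and does not try to tell paraphyletic from polyphyletic groups.

```python
def leaves(tree):
    """Return the set of tip names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def all_clades(tree):
    """Return the tip set of every clade (one per node) in the tree."""
    clades = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            clades |= all_clades(child)
    return clades


def is_monophyletic(tree, group):
    """A group is monophyletic if it is exactly the tip set of some node."""
    return frozenset(group) in all_clades(tree)


# Hypothetical example tree: ((A,B),(C,(D,E)))
tree = (("A", "B"), ("C", ("D", "E")))
print(is_monophyletic(tree, {"D", "E"}))   # a clade
print(is_monophyletic(tree, {"C", "D"}))   # leaves out E
```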
4General Points About Phylogenetic Trees
[Figure: two trees with tips A, B, C, D, and E, one drawn as a cladogram and one as a phylogram]
All branches can rotate freely around a node (i.e. B is not more closely related to C than A, and C is not more closely related to D than E).
Branch lengths may be drawn as equal between nodes: cladograms (see tree above). These are used when one is interested only in the branching pattern.
Branch lengths may be proportional to the hypothesized distance between nodes: phylograms (see tree on left).
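The point about rotation can be made concrete by comparing trees in a form that ignores the order of branches at each node. This is a minimal sketch assuming nested-tuple trees with unique tip names; `canonical` and `same_topology` are hypothetical helper names.

```python
def canonical(tree):
    """Return an order-independent form of a nested-tuple tree.

    Assumes tip names are unique, so no two sibling subtrees are identical.
    """
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


def same_topology(tree1, tree2):
    """True if the trees differ only by rotating branches around nodes."""
    return canonical(tree1) == canonical(tree2)


original = (("A", "B"), ("C", ("D", "E")))
rotated = ((("E", "D"), "C"), ("B", "A"))      # every node rotated
different = (("A", "C"), ("B", ("D", "E")))    # B and C swapped between clades

print(same_topology(original, rotated))
print(same_topology(original, different))
```

Rotating nodes changes how the tree is drawn but not which clades it contains, so the first comparison succeeds and the second does not.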
5General Points About Phylogenetic Trees
Fully resolved trees are bifurcating (only two descendant lineages from each node). A node with more than two descendant lineages is a multifurcating node, or polytomy.
Polytomies may be soft or hard:
- Soft: a product of the data or analysis
- Hard: a product of biology
[Figure: trees with polytomies labeled]
6General Points About Phylogenetic Trees
LSU tree (Tronchin et al. 2004): example of a soft polytomy. LSU analysis is unable to resolve the relationships of some Ptilophora species.
rbcL tree (Tronchin et al. 2004): using different data (rbcL), the relationships among Ptilophora species are better resolved.
7Phylogenetic Analyses
The Importance of Alignments
Phylogenetic trees derived from the analysis of DNA or amino acid sequences are only as good as the data they are based upon: Garbage In, Garbage Out.
Consequently, sequence alignment is the most important step in phylogenetic analysis. The aligned sites of a sequence must be homologous (identical by descent: taxa share the same state because their ancestor did). If two taxa share the same state but not by descent, it is called homoplasy.
8Phylogenetic Analyses
The Importance of Alignments
DNA sequences are prone to homoplasy because there are only 4 possible states at each site (plus insertion/deletion mutations, or indels, for some loci).
[Figure: example alignment annotated to show (1) that the same sites in different sequences need to be homologous, (2) an area to possibly remove from analyses because of uncertain homology between sites, and (3) inferred insertion/deletion mutations (gaps)]
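Removing doubtful regions from an alignment can be sketched in a few lines. The example below uses one crude policy as an assumption: it drops every column in which any sequence has a gap. In practice the regions to exclude are a judgment call about homology, not simply gap presence. The function name and the toy alignment are hypothetical.

```python
def drop_gap_columns(alignment: dict[str, str], gap: str = "-") -> dict[str, str]:
    """Return a copy of the alignment without any column that contains a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Keep only the columns in which every sequence has a non-gap state.
    keep = [i for i in range(n_sites)
            if all(seq[i] != gap for seq in alignment.values())]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Hypothetical toy alignment with two inferred indels.
toy = {"taxon1": "ACG-T", "taxon2": "A-GTT", "taxon3": "ACGTT"}
print(drop_gap_columns(toy))
```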
9Phylogenetic Analyses
The Different Analysis Methods
See evolution.genetics.washington.edu/phylip/software.html#methods for a list of software programs.
- Distance methods: based on similarity between OTUs
  - UPGMA: originally used for phenotypic characters in numerical taxonomy. Generally not applied to sequence data because it is highly sensitive to mutation rate changes in lineages, i.e. the data must fit a molecular clock.
  - NJ (Neighbor Joining): an algorithmic method that approximates the minimum evolution tree without examining all possible topologies.
  - The accuracy of a distance tree depends on 2 things:
    - 1) How true the distances calculated between taxa are (how good is the model of evolution that your distances are based upon).
    - 2) The standard error of the distance estimate.
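To illustrate how a model of evolution enters a distance calculation, the sketch below computes the uncorrected proportion of differing sites (the p-distance) and then applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). Jukes-Cantor is chosen here only as the simplest model; it is not named in the slides, and the function names and sequences are hypothetical.

```python
import math


def p_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAT")   # 1 difference in 10 sites
print(p, round(jukes_cantor(p), 4))
```

The corrected distance is slightly larger than the observed one because the model allows for multiple substitutions at the same site.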
10Phylogenetic Analyses
The Different Analysis Methods
- Optimization methods
  - Parsimony: searching for the tree that requires the least number of mutational steps, i.e. the simplest is the best.
  - Maximum Likelihood: searching for the tree with the highest likelihood given the OTUs (sequences) and a model of evolution, i.e. the tree that maximizes the probability of observing the data is the best tree.
  - Bayesian: searching for the best set of trees, i.e. the set of trees in which the likelihoods are so similar that changes between them are essentially random.
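The parsimony criterion can be illustrated by counting mutational steps on a fixed tree. The sketch below uses Fitch's algorithm, which the slides do not name, on a rooted binary tree written as nested tuples. It only scores given trees and does not search tree space; the toy alignment and function names are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, steps) for one site on a binary tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra step needed.
        return common, left_steps + right_steps
    # The children disagree: one mutational step is required.
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_length(tree, alignment):
    """Total number of steps a tree requires across all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


toy = {"A": "AAG", "B": "AAG", "C": "GAT", "D": "GAT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_length(tree1, toy), parsimony_length(tree2, toy))
```

Under parsimony, `tree1` is preferred because it explains the same data with fewer steps.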
11Phylogenetic Analyses
Tree Confidence Measures
Decay Analysis or Goodman-Bremer Support Values: a test used in parsimony analyses in which one determines how many steps less parsimonious than minimal a tree must be before a particular branch in your tree is no longer resolved in the consensus of all possible trees of that length.
[Figure: most parsimonious tree (L = 35) with the consensus trees one step less parsimonious (L = 36) and two steps less parsimonious (L = 37); branches labeled with decay values d1 and d2]
How meaningful the values are may depend on the tree length.
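The decay value of a branch equals the length of the shortest tree that lacks the corresponding clade minus the length of the most parsimonious tree. The sketch below applies that arithmetic to a hand-made list of candidate trees. The tree lengths echo the figure (35, 36, 37), but the clades assigned to each tree are invented for illustration, and a real analysis would need a tree search to find the shortest trees lacking each clade.

```python
def decay_index(clade, trees):
    """Shortest length among trees lacking `clade`, minus the minimum length.

    `trees` is a list of (length, set_of_clades) pairs. Returns None if every
    listed tree contains the clade.
    """
    clade = frozenset(clade)
    shortest = min(length for length, _ in trees)
    lacking = [length for length, clades in trees if clade not in clades]
    if not lacking:
        return None
    return min(lacking) - shortest


# Hypothetical candidate trees, each summarized by its length and its clades.
candidates = [
    (35, {frozenset("AB"), frozenset("DE"), frozenset("CDE")}),  # most parsimonious
    (36, {frozenset("AB"), frozenset("CD"), frozenset("CDE")}),  # lacks clade D+E
    (37, {frozenset("AC"), frozenset("DE"), frozenset("BDE")}),  # lacks clade A+B
]

print(decay_index("DE", candidates), decay_index("AB", candidates))
```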
12Phylogenetic Analyses
Tree Confidence Measures
Bootstrapping: a non-parametric test of how well the data support the nodes of a given tree.
Determining support is a bit of a statistical problem: evolution only happened once, so there is no underlying distribution to sample in order to develop confidence values.
Method: the original analysis is performed multiple times on pseudo-datasets derived by sampling the sites of the original dataset with replacement. The number, or fraction, of times that a particular clade is present in the resulting trees is its bootstrap value.
Bootstrapping is not portable, i.e. you cannot compare values across studies because changing any parameters will change the values.
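The resampling procedure described above can be sketched as follows. To keep the example self-contained, the "analysis" is a toy stand-in, `closest_pair`, which simply groups the two taxa with the fewest differences. It is not a real tree-building method, and all names and sequences here are hypothetical.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Build a pseudo-dataset by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def closest_pair(alignment):
    """Toy stand-in for a tree analysis: the pair of taxa with fewest differences."""
    def differences(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return frozenset(min(combinations(sorted(alignment), 2), key=differences))


def bootstrap_support(alignment, clade, n_reps=100, seed=0):
    """Fraction of pseudo-datasets in which the toy analysis recovers `clade`."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == frozenset(clade)
        for _ in range(n_reps)
    )
    return hits / n_reps


toy = {"A": "AAAAAAAA", "B": "AAAAAAAA", "C": "CCCCCCCC", "D": "GGGGGGGG"}
print(bootstrap_support(toy, {"A", "B"}))
```

Swapping `closest_pair` for a real analysis (with the same settings as the original run) gives the usual bootstrap value for each clade.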
13Phylogenetic Analyses
Tree Confidence Measures
- Bootstrapping
  - By default most programs will show bootstrap values when they are greater than 50%, but does a bootstrap value of 50% mean anything?
  - For a discussion of this see Hillis & Bull (1993) Systematic Biology 42: 182-192 (they tested bootstrap values based on a known phylogeny).
  - Wilson's General Rule
    - 60-80%: is there other evidence to support the relationship? Be cautious.
    - 80-90%: usually pretty solid.
    - 90-100%: solid and unlikely to be misleading.
14General Points About Phylogenetic Trees
DNA or protein sequence trees are hypotheses of how a particular DNA locus or protein has evolved. We assume that the way the DNA or protein has evolved reflects the way the species has evolved, i.e. gene tree = species tree.
IMPORTANT: This may or may not reflect reality, i.e. You Still Have To Think, as molecules do not necessarily trump morphology, development, etc.
15General Points About Phylogenetic Trees
[Figure: a gene tree drawn within a species tree for taxa A, B, and C, shown in two panels comparing the gene tree with the species tree]