Title: Terminology of phylogenetic trees
1 Terminology of phylogenetic trees
Types of phylogenetic trees
Types of data
Character evolution
Approaches to phylogeny reconstruction
2 Phylogenetic tree (dendrogram)
Nodes: branching points
Branches: lines
Topology: branching pattern
3 Sister taxa: two taxa that are more closely related to each other than either is to a third taxon.
(Figure: tree with taxa A, B, C, and D.)
4 Branches can be rotated at a node without changing the relationships among the OTUs (operational taxonomic units).
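A minimal sketch of the rotation rule, assuming trees are written as nested Python tuples (a representation chosen here for illustration, not one used in the slides). Two trees show the same relationships when they contain the same groups of OTUs, whatever the left-to-right order at each node.

```python
def clades(tree):
    """Return (leaves under this node, set of all leaf groups in the subtree)."""
    if isinstance(tree, str):          # a leaf is just an OTU name
        leaves = frozenset([tree])
        return leaves, {leaves}
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found |= child_found
    found.add(leaves)                  # the group defined by this node
    return leaves, found

original = (("A", "B"), ("C", "D"))
rotated = (("D", "C"), ("B", "A"))     # same tree, branches rotated at every node
different = (("A", "C"), ("B", "D"))   # a genuinely different topology

same_relationships = clades(original)[1] == clades(rotated)[1]
```

Here `same_relationships` is true, while comparing `original` with `different` gives a mismatch because the groups {A, B} and {C, D} are missing from the second tree.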
5 Levels of Resolution on a Phylogenetic Tree
6 Hard polytomy: simultaneous divergence.
Soft polytomy: lack of resolution.
7 Rooted: a unique path leads from the root to each OTU.
Unrooted: shows degree of kinship, but no evolutionary path.
8 Number of possible phylogenetic trees
3 OTUs: 1 unrooted tree, 3 rooted trees
4 OTUs: 3 unrooted trees, 15 rooted trees
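The counts above follow the standard formulas for fully resolved (bifurcating) trees: (2n-5)!! unrooted and (2n-3)!! rooted trees for n OTUs. The slides do not state the formulas, so the sketch below is an addition, and the function names are illustrative.

```python
def odd_double_factorial(k):
    """Return k * (k - 2) * ... * 3 * 1 for odd k (1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n_otus):
    """Number of fully resolved unrooted trees for n_otus >= 3."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs")
    return odd_double_factorial(2 * n_otus - 5)

def rooted_trees(n_otus):
    """Number of fully resolved rooted trees for n_otus >= 2."""
    if n_otus < 2:
        raise ValueError("need at least 2 OTUs")
    return odd_double_factorial(2 * n_otus - 3)

for n in (3, 4, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
```

For 10 OTUs the same formulas already give 2,027,025 unrooted and 34,459,425 rooted trees, which is why checking every tree quickly becomes impractical.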
10 TYPES OF TREES
11 Newick (shorthand) format
- text-based representation of relationships.
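As an illustration, the string `((A,B),(C,D));` is a Newick description of a tree in which A and B are sister taxa, as are C and D. The sketch below parses only a simplified subset of the format (names, commas, parentheses, and the closing semicolon, with no branch lengths or whitespace); `parse_newick` is a name made up for this example.

```python
def parse_newick(text):
    """Parse a simplified Newick string into nested tuples of OTU names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string ends with a semicolon")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                        # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1                    # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                        # skip ")"
            return tuple(children)
        start = pos                         # otherwise read a leaf name
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError("unexpected text after the tree")
    return tree

tree = parse_newick("((A,B),(C,D));")
polytomy = parse_newick("(A,B,C);")
```

A node with more than two children, as in `(A,B,C);`, is how a polytomy is written.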
12 Qualitative vs. quantitative data
Quantitative: continuous data (e.g. height or length)
Qualitative: discrete data (2 or more values)
Binary: 2 values
Multistate: more than 2 values
Most molecular data are qualitative
Binary: presence or absence of a band, or a gap in a sequence
Multistate: nucleotide data (A, T, G, C)
13 Nucleotide character data
Characters: position in the nucleotide sequence (e.g. position 352)
Character states: nucleotide at that position in the nucleotide sequence (G, A, T, or C)
14 Assumptions About Character Evolution
Unordered: change from one character state to another occurs in one step (e.g. nucleotide changes).
Ordered: the number of steps from one state to another equals the absolute value of the difference between their state numbers.
1 2 3 4 5 requires 4 steps
5 4 3 2 1 requires 4 steps (reversible vs. irreversible)
15 Phylogenetic reconstruction methods take into account assumptions about
(1) the number of discrete steps required for one character state to change into another
(2) the probability with which such a change occurs.
16 Step matrix
- number of steps required between character states.
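A short sketch of step matrices for the two assumptions about character evolution described earlier: unordered characters (any change costs one step) and ordered characters (cost equals the absolute difference between state numbers). The function names are illustrative only.

```python
def unordered_step_matrix(states):
    """Every change between different states costs exactly one step."""
    return [[0 if a == b else 1 for b in states] for a in states]

def ordered_step_matrix(n_states):
    """Cost is the absolute difference between state numbers 1..n_states."""
    return [[abs(i - j) for j in range(n_states)] for i in range(n_states)]

nucleotides = unordered_step_matrix("ACGT")
five_state = ordered_step_matrix(5)

for row in five_state:
    print(row)
```

In `five_state`, the entry for state 1 to state 5 (row 0, column 4) is 4, and so is the entry for state 5 to state 1, matching the 4 steps quoted for the ordered example.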
17 Approaches to Phylogeny Reconstruction
Cladistics (parsimony): recency of common ancestry
Maximum likelihood: model of sequence evolution
Phenetics (UPGMA, neighbor joining): overall similarity
18 PARSIMONY APPROACH
Parsimony: general scientific criterion for choosing among competing hypotheses, which states that we should accept the hypothesis that explains the data most simply and efficiently.
Maximum parsimony method of phylogeny reconstruction: the optimum reconstruction of ancestral character states is the one that requires the fewest mutations in the phylogenetic tree to account for contemporary character states.
19 First step in maximum parsimony analysis
Identify all of the informative sites.
Invariant: all OTUs possess the same character state at the site. Any invariant site is uninformative.
20 Two types of variable sites
Informative: favors a subset of trees over other possible trees.
Uninformative: a character that contains no grouping information relevant to a cladistic problem (e.g. autapomorphies).
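A sketch of how sites might be sorted into these categories. It assumes the usual working criterion, which the slides do not spell out: a site is parsimony-informative when at least two character states each occur in at least two OTUs. The alignment is invented for illustration.

```python
from collections import Counter

def classify_site(column):
    """Classify one alignment column (a sequence of character states)."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"
    # States found in two or more OTUs can group taxa together.
    shared_states = [s for s, n in counts.items() if n >= 2]
    if len(shared_states) >= 2:
        return "informative"
    return "variable, uninformative"

def classify_alignment(sequences):
    """Classify every column of equal-length sequences."""
    return [classify_site(column) for column in zip(*sequences)]

# Hypothetical alignment: 4 OTUs, 5 sites.
sequences = ["AAGAA",
             "AGGCA",
             "ATAAA",
             "ACACG"]
labels = classify_alignment(sequences)
```

In this example the fifth site (A, A, A, G) is variable but uninformative: the single G is an autapomorphy and costs one step on every tree.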
21 Uninformative: each tree requires 3 steps.
22 Parsimony analysis, 2nd step: calculate the minimum number of substitutions at each informative site.
Tree 1: 1 step
Tree 2: 2 steps
Tree 3: 2 steps
Informative: favors tree 1 over the other 2 trees.
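One common way to obtain the minimum number of substitutions at a site on a given tree is Fitch's algorithm; the slides do not name a method, so this is an assumed choice. The sketch uses nested tuples for trees and an invented site with states G, G, A, A. The trees are rooted arbitrarily, which does not change the count when all changes cost the same.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, minimum substitutions) for one site."""
    if isinstance(tree, str):               # leaf: its observed state, no change
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:                              # children can agree: no extra step
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

# Hypothetical informative site for OTUs A-D.
site = {"A": "G", "B": "G", "C": "A", "D": "A"}
trees = {
    "tree 1": (("A", "B"), ("C", "D")),
    "tree 2": (("A", "C"), ("B", "D")),
    "tree 3": (("A", "D"), ("B", "C")),
}
steps = {name: fitch(tree, site)[1] for name, tree in trees.items()}
print(steps)
```

With these states, tree 1 needs 1 step and the other two trees need 2 steps each, the same pattern as on the slide. Summing such counts over all informative sites gives each tree's total length.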
23 Final step in parsimony analysis: sum the number of changes over all informative sites for each possible tree and choose the tree associated with the smallest number of changes.
(Table: Site 3, Site 4, Site 5, Site 9; step counts shown: 3 steps, 3 steps, 4 steps.)
24 Parsimony search methods
Exhaustive search method: searches all possible fully resolved topologies and guarantees that all of the minimum-length cladograms will be found (not a practical option, time consuming).
Branch and bound methods: begin with a cladogram. The length of the starting cladogram is retained as an upper bound for use during subsequent cladogram construction. As soon as the length of part of a tree exceeds the upper bound, that cladogram is abandoned. If the length is equal, the cladogram is saved as an optimal topology. If the length is less, it is substituted for the original as the optimal upper bound (good option for fewer than 20 taxa, time consuming).
Heuristic methods: approximate or hill-climbing technique. Begin with a cladogram, add taxa, and swap branches until a shorter cladogram is found. The procedure can be replicated many times to increase the chance of finding the minimum-length cladogram.
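To show why exhaustive search becomes impractical, here is a minimal sketch that enumerates every fully resolved rooted tree by adding one taxon at a time to every possible branch. It only generates the topologies; a real exhaustive search would also score each one. The nested-tuple representation and function names are assumptions made for this example.

```python
def insert_everywhere(tree, taxon):
    """Yield each tree obtained by attaching taxon to one branch of tree."""
    yield (tree, taxon)                     # attach on the branch above this node
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)

def all_rooted_trees(taxa):
    """Return every fully resolved rooted topology for the given taxa."""
    trees = [taxa[0]]
    for taxon in taxa[1:]:
        trees = [new for tree in trees for new in insert_everywhere(tree, taxon)]
    return trees

for n in (3, 4, 5):
    print(n, len(all_rooted_trees(list("ABCDE"[:n]))))
```

The list grows from 3 to 15 to 105 trees for 3, 4, and 5 taxa. Branch and bound follows the same stepwise addition but abandons a partial tree as soon as its length exceeds the current upper bound.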
25 Different types of parsimony analyses
Unweighted parsimony: all character state changes are given equal weight in the step matrix.
Weighted parsimony: different weights are assigned to different character state changes.
Transversion parsimony: transitions are completely ignored in the analysis; only transversions are considered.
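The three analyses differ only in the step matrix. The sketch below builds one for each case; a transition is a change within the purines (A, G) or within the pyrimidines (C, T), and any other change is a transversion. The 1:2 weighting is an arbitrary illustrative choice, not a value from the slides.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def change_type(a, b):
    """Classify a nucleotide change as none, transition, or transversion."""
    if a == b:
        return "none"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"

def step_matrix(transition_cost, transversion_cost, states="ACGT"):
    """Return {(from_state, to_state): cost} for all pairs of states."""
    cost = {"none": 0,
            "transition": transition_cost,
            "transversion": transversion_cost}
    return {(a, b): cost[change_type(a, b)] for a in states for b in states}

unweighted = step_matrix(1, 1)
weighted = step_matrix(1, 2)           # hypothetical weights
transversion_only = step_matrix(0, 1)  # transitions ignored
```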
26 Maximum likelihood method
The likelihood (L) of a phylogenetic tree is the probability of observing the data (nucleotide sequences) under a given tree and a specified model of character state changes.
The aim is to find the tree (among all possible trees) with the highest L value.
27 Models of character state changes (sequence evolution)
Jukes and Cantor 1-parameter model: all changes have equal probability
Kimura 2-parameter model: transitions more frequent than transversions
Other more complicated models...
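For reference, the two named models are usually written as instantaneous rate matrices. The notation below is the conventional one and is not taken from the slides: rows and columns are ordered A, G, C, T, α is the single rate in the Jukes and Cantor model, and in the Kimura model α is the transition rate and β the transversion rate (α > β when transitions are more frequent).

```latex
Q_{\mathrm{JC}} =
\begin{pmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{pmatrix},
\qquad
Q_{\mathrm{K2P}} =
\begin{pmatrix}
-(\alpha + 2\beta) & \alpha & \beta & \beta \\
\alpha & -(\alpha + 2\beta) & \beta & \beta \\
\beta & \beta & -(\alpha + 2\beta) & \alpha \\
\beta & \beta & \alpha & -(\alpha + 2\beta)
\end{pmatrix}
```

Setting α = β in the Kimura matrix recovers the Jukes and Cantor matrix.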
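28 Steps in a maximum likelihood analysis
1. Calculate the likelihood for each site on a specific tree.
2. Sum the log-likelihood (ln L) values for all sites on the tree (equivalently, multiply the per-site likelihoods).
3. Compare the L value for all possible trees.
4. Choose the tree with the highest L value.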
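The four steps can be sketched for a small case. This example assumes the Jukes and Cantor model, equal base frequencies of 1/4, fixed branch lengths of 0.1 substitutions per site, and Felsenstein's pruning algorithm for the per-site likelihood; none of these specifics come from the slides, and a real analysis would also optimize branch lengths. An internal node is written as a list of (child, branch length) pairs.

```python
import math

STATES = "ACGT"

def jc69_prob(a, b, d):
    """P(state b at the end of a branch of length d | state a at the start)."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def partial(tree, site):
    """Return {state: P(tip data below this node | node has that state)}."""
    if isinstance(tree, str):               # leaf: observed state is certain
        return {s: 1.0 if s == site[tree] else 0.0 for s in STATES}
    result = {s: 1.0 for s in STATES}
    for child, length in tree:
        below = partial(child, site)
        for s in STATES:
            result[s] *= sum(jc69_prob(s, t, length) * below[t] for t in STATES)
    return result

def site_likelihood(tree, site):
    """Step 1: likelihood of one site, averaging over the root state."""
    return sum(0.25 * p for p in partial(tree, site).values())

def log_likelihood(tree, alignment):
    """Step 2: sum of ln L over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += math.log(site_likelihood(tree, site))
    return total

# Hypothetical data and two candidate trees.
alignment = {"A": "GGAC", "B": "GGAC", "C": "AATC", "D": "AATC"}
tree_1 = [([("A", 0.1), ("B", 0.1)], 0.1), ([("C", 0.1), ("D", 0.1)], 0.1)]
tree_2 = [([("A", 0.1), ("C", 0.1)], 0.1), ([("B", 0.1), ("D", 0.1)], 0.1)]

# Steps 3 and 4: compare trees and keep the one with the highest ln L.
scores = {"tree 1": log_likelihood(tree_1, alignment),
          "tree 2": log_likelihood(tree_2, alignment)}
best = max(scores, key=scores.get)
```

For this invented alignment, `best` is tree 1, which groups the OTUs with identical sequences.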
29 Distance methods: evolutionary distances (number of substitutions) are computed for all pairs of taxa.
UPGMA: unweighted pair-group method with arithmetic means
- assumes equal rate of substitutions
- sequential clustering algorithm
- pairs of taxa are clustered in order of decreasing similarity
Neighbor joining: finds the shortest (minimum evolution) tree by finding neighbors that minimize the total length of the tree. The shortest pairs are chosen to be neighbors and then joined in the distance matrix as one OTU.
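A compact sketch of the UPGMA clustering loop: repeatedly join the closest pair of clusters, then replace them with a single cluster whose distance to every other cluster is the size-weighted average of the two old distances. Branch lengths are omitted, the distance values are invented, and neighbor joining (which uses a different pair-selection criterion) is not shown.

```python
from itertools import combinations

def upgma(distances):
    """Cluster taxa from {(taxon_a, taxon_b): distance}; return nested tuples."""
    d = {frozenset(pair): value for pair, value in distances.items()}
    sizes = {}                              # cluster -> number of taxa inside it
    for pair in distances:
        for taxon in pair:
            sizes[taxon] = 1
    while len(sizes) > 1:
        # Most similar pair of clusters = smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Arithmetic mean over all taxon pairs between the two clusters.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))

# Hypothetical pairwise distances (number of substitutions).
distances = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
             ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
tree = upgma(distances)
```

With these distances A and B are joined first, then C, then D.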
30 Consensus methods
Consensus trees are derived from a set of trees and summarize the phylogenetic information of several trees in a single tree.
Most commonly used consensus trees:
Strict consensus: all conflicting branching patterns are collapsed.
50% majority-rule consensus: branching patterns that occur with a frequency of 50% or more are retained; all others are collapsed.
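Both consensus rules can be expressed as a frequency threshold on the groups (clusters) of taxa found in the input trees. The sketch below returns the retained clusters rather than drawing a tree, and the three input trees are invented. It follows the slide's "50% or more" wording; note that at exactly 50% two conflicting clusters could both pass, so implementations often require strictly more than half.

```python
from collections import Counter

def collect_clusters(tree, found):
    """Add the leaf set of every internal node to found; return this node's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clusters(child, found) for child in tree))
    found.add(leaves)
    return leaves

def consensus_clusters(trees, threshold):
    """Return clusters present in at least the given fraction of trees."""
    counts = Counter()
    for tree in trees:
        found = set()
        collect_clusters(tree, found)
        counts.update(found)
    return {c for c, n in counts.items() if n / len(trees) >= threshold}

# Hypothetical rooted input trees.
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
strict = consensus_clusters(trees, 1.0)
majority = consensus_clusters(trees, 0.5)
```

Here the strict consensus keeps only the cluster of all four taxa (a fully collapsed tree), while the majority-rule consensus also keeps {A, B} and {A, B, C}, each found in two of the three trees.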
31 CONSENSUS METHODS
(Figure: trees for taxa A through G illustrating consensus methods.)
32 Bootstrap method of assessing tree reliability
Inferred tree is constructed from the data set.
Characters are resampled from the data set with replacement.
Resampling is replicated many (100-1000) times.
Bootstrap trees are constructed from the resampled data sets.
Each bootstrap tree is compared to the original inferred tree.
The % of bootstrap trees supporting a node is determined for each node in the tree.
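A minimal sketch of the resampling loop. Columns (characters) are drawn with replacement until the resampled alignment is as long as the original, and support is the percentage of replicates in which a node is recovered. Tree inference is replaced here by a toy check, `a_b_are_closest`, which is a stand-in invented for this example and not a phylogenetic method.

```python
import random

def resample_alignment(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

def bootstrap_support(alignment, supports_node, replicates=100, seed=0):
    """Return the % of bootstrap replicates for which supports_node is True."""
    rng = random.Random(seed)
    hits = sum(supports_node(resample_alignment(alignment, rng))
               for _ in range(replicates))
    return 100.0 * hits / replicates

def hamming(a, b):
    """Number of positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def a_b_are_closest(alignment):
    """Toy stand-in for 'the inferred tree groups A with B'."""
    names = sorted(alignment)
    pairs = [(hamming(alignment[p], alignment[q]), p, q)
             for i, p in enumerate(names) for q in names[i + 1:]]
    return min(pairs)[1:] == ("A", "B")

# Hypothetical alignment in which A and B are identical.
alignment = {"A": "AAAA", "B": "AAAA", "C": "CCCC", "D": "GGGG"}
support = bootstrap_support(alignment, a_b_are_closest, replicates=100)
```

In practice `supports_node` would build a tree from each resampled data set with the same method used for the original tree and check whether the node of interest is present.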
33 Homoplasy: non-homologous similarity
- resemblance not due to common ancestry
- evolved independently
- considered noise
35 Known bacterial phylogeny: ancestors at each node known.
Hillis & Huelsenbeck (1992) tested the ability of different methods to find the true phylogeny.
Maximum parsimony and maximum likelihood performed well; UPGMA and neighbor joining did not.
36 Strengths and weaknesses
UPGMA and neighbor joining: fast, but not as accurate as other methods.
Maximum parsimony: time consuming, but more accurate; can combine morphological characters with DNA characters in a single analysis.
Maximum likelihood: very time consuming; can invoke a specific model of sequence evolution; including information from morphology is a new technique (but it is controversial).
Reference: Molecular Systematics, 2nd ed., Hillis et al. (1996), Sinauer Associates. ISBN 0-87893-282-8