Title: Dan Graur
1Methods of Tree Reconstruction
2(No Transcript)
3(No Transcript)
4(No Transcript)
5(No Transcript)
6Molecular phylogenetic approaches
1. distance-matrix (based on distance measures)
2. character-state (based on character states)
3. maximum likelihood (based on both character states and distances)
7DISTANCE-MATRIX METHODS In distance-matrix
methods, evolutionary distances (usually the
number of nucleotide substitutions or amino-acid
replacements between two taxonomic units) are
computed for all pairs of taxa, and a
phylogenetic tree is constructed by using an
algorithm based on some functional relationships
among the distance values.
8Multiple Alignment
9Compute pairwise distances by correcting for
multiple hits at a single site
Number of differences → number of changes
(e.g., number of nucleotide substitutions, number
of amino acid replacements)
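The correction for multiple hits can be illustrated with the Jukes and Cantor one-parameter model mentioned later in these slides. The sketch below is only one possible correction, not necessarily the one used in the lecture; the function names `p_distance` and `jukes_cantor_distance` are hypothetical, and the sequences are assumed to be aligned and gap-free.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """Estimated number of substitutions per site under Jukes-Cantor.

    Converts the observed proportion of differences (p) into an
    estimate of the number of changes: d = -(3/4) * ln(1 - 4p/3).
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("p >= 0.75: the correction is undefined (saturation)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Illustrative sequences (made up for this sketch): 1 difference in 10 sites.
observed = p_distance("ACGTACGTAC", "ACGTACGTAA")
corrected = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
```

The corrected distance is always at least as large as the observed proportion of differences, because some substitutions are hidden by later substitutions at the same site.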
10Distance Matrix
Units: number of nucleotide substitutions per
1,000 nucleotide sites
11Distance Methods
UPGMA
Neighbors-relation
Neighbor joining
12UPGMA: Unweighted pair-group method with
arithmetic means
13UPGMA employs a sequential clustering algorithm,
in which local topological relationships are
identified in order of decreasing similarity, and
the tree is built in a stepwise manner.
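The sequential clustering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `upgma`, the nested-tuple tree representation, and the example distances are all hypothetical.

```python
def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of simple OTU labels (strings).
    dist:  dict mapping a pair (a, b) of labels to their distance.
    Returns (tree, height): the tree as nested 2-tuples, and a dict
    giving the height of every node (half the distance between the
    two clusters joined at that node).
    """
    size = {n: 1 for n in names}      # simple OTUs per current cluster
    height = {n: 0.0 for n in names}
    d = {frozenset(pair): value for pair, value in dist.items()}

    while len(size) > 1:
        # Join the two most similar (least distant) clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair, key=repr)
        new = (a, b)                  # composite OTU
        height[new] = d[pair] / 2.0

        # Distance from the composite OTU to every other cluster is the
        # arithmetic mean over all simple OTUs it contains.
        for c in size:
            if c in pair:
                continue
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / (size[a] + size[b])

        # Drop every distance that still refers to a or b on its own.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        size[new] = size[a] + size[b]
        del size[a], size[b]

    root = next(iter(size))
    return root, height


# Made-up ultrametric distances for four OTUs.
example = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
    ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
tree, heights = upgma(["A", "B", "C", "D"], example)
```

In this example A and B are joined first, then C, then D, which is the order of decreasing similarity.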
14simple OTUs
15composite OTU
16(No Transcript)
17(No Transcript)
18UPGMA yields the correct answer only if the
distances are ultrametric!
Q: What happens if the distances are only additive?
Q: What happens if the distances are not even additive?
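Whether a set of distances is ultrametric can be checked with the three-point condition: for every triplet of OTUs, the two largest of the three pairwise distances must be equal. The sketch below assumes distances stored in a dict keyed by `frozenset` pairs; the function name `is_ultrametric` and the tolerance `tol` are hypothetical.

```python
from itertools import combinations


def is_ultrametric(names, d, tol=1e-9):
    """Return True if every triplet satisfies the three-point condition."""
    for x, y, z in combinations(names, 3):
        smallest, middle, largest = sorted(
            [d[frozenset((x, y))], d[frozenset((x, z))], d[frozenset((y, z))]]
        )
        # The two largest distances must be equal (within tolerance).
        if abs(largest - middle) > tol:
            return False
    return True


# Made-up example: distances that fit a tree with equal rates.
clocklike = {
    frozenset(("A", "B")): 2.0, frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
# Made-up example: distances that fit a tree with unequal rates.
unequal = {
    frozenset(("A", "B")): 3.0, frozenset(("A", "C")): 9.0,
    frozenset(("B", "C")): 10.0,
}
```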
19Neighborliness methods
The neighbors-relation method (Sattath & Tversky)
The neighbor-joining method (Saitou & Nei)
20In an unrooted bifurcating tree, two OTUs are
said to be neighbors if they are connected
through a single internal node.
Neighbors ≠ Sister Taxa
21If we combine OTUs A and B into one composite
OTU, then the composite OTU (AB) and the simple
OTU C become neighbors.
22[Figure: unrooted tree of four OTUs labeled A, C, B, and D]
Four-Point Condition
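In its usual form, the four-point condition states that if A and B are neighbors (and therefore C and D are too), then d(A,B) + d(C,D) is smaller than both d(A,C) + d(B,D) and d(A,D) + d(B,C), and for additive distances the latter two sums are equal. The sketch below picks the pairing with the smallest sum; the function name `neighbor_pairing` and the example distances are hypothetical, and the slide's own formulation was not recoverable from the transcript.

```python
def neighbor_pairing(d):
    """Return (best pairing, sorted sums) for four OTUs A, B, C, D.

    d maps frozenset({x, y}) to the distance between OTUs x and y.
    """
    pairings = [
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),
        (("A", "D"), ("B", "C")),
    ]

    def total(pairing):
        return sum(d[frozenset(pair)] for pair in pairing)

    best = min(pairings, key=total)
    return best, sorted(total(p) for p in pairings)


# Made-up additive distances from a tree with terminal branches
# A=1, B=2, C=3, D=4 and an internal branch of length 5 separating
# (A, B) from (C, D).
d = {
    frozenset(("A", "B")): 3, frozenset(("C", "D")): 7,
    frozenset(("A", "C")): 9, frozenset(("B", "D")): 11,
    frozenset(("A", "D")): 10, frozenset(("B", "C")): 10,
}
best, sums = neighbor_pairing(d)
```

Here the smallest sum identifies A and B (and C and D) as neighbors, and the two larger sums are equal because the distances are additive.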
23The Neighbor Joining Method
24In distance-matrix methods, it is assumed that
similarity reflects kinship.
25(No Transcript)
26From Similarity to Relationship
- Similarities among OTUs can be due to:
  - Ancestry
    - Shared ancestral characters (symplesiomorphies)
    - Shared derived characters (synapomorphies)
  - Homoplasy
    - Convergent events
    - Parallel events
    - Reversals
27Parsimony Methods
Willi Hennig (1913–1976)
28"Entities must not be multiplied beyond necessity."
William of Occam (ca. 1285–1349), English
philosopher and Franciscan monk.
William of Occam was solemnly excommunicated by Pope John XXII.
29MAXIMUM PARSIMONY METHODS Maximum parsimony
involves the identification of a topology that
requires the smallest number of evolutionary
changes to explain the observed differences among
the OTUs under study. In maximum parsimony
methods, we use discrete character states, and
the shortest pathway leading to these character
states is chosen as the best or maximum
parsimony tree. Often two or more trees with
the same minimum number of changes are found, so
that no unique tree can be inferred. Such trees
are said to be equally parsimonious.
30(No Transcript)
31(No Transcript)
32uninformative
33informative
34(No Transcript)
35(No Transcript)
36(No Transcript)
37(No Transcript)
38In the case of four OTUs, an informative site can
only favor one of the three possible alternative
trees. Thus, the tree supported by the largest
number of informative sites is the most
parsimonious tree.
39Inferring the maximum parsimony tree
1. Identify all the informative sites.
2. For each possible tree, calculate the minimum number of substitutions at each informative site.
3. Sum up the number of changes over all the informative sites for each possible tree.
4. Choose the tree associated with the smallest number of changes as the maximum parsimony tree.
40- Maximum parsimony (Practice)
- Data
- TGCA
- TACC
- AGGT
- AAGT
- Step 1. Identify all the informative sites.
41- Maximum parsimony (Practice)
- Data
- TGC
- TAC
- AGG
- AAG
- Step 2. For each possible tree, calculate the
minimum number of substitutions at each
informative site.
42- Maximum parsimony (Practice)
- Data
- TGC
- TAC
- AGG
- AAG
- Step 3. Sum up the number of changes over all the
informative sites for each possible tree.
Tree lengths for the three possible trees: 4, 5, 6
43- Maximum parsimony (Practice)
- Data
- TGC
- TAC
- AGG
- AAG
- Step 4. Choose the tree associated with the
smallest number of changes as the maximum
parsimony tree.
Tree lengths for the three possible trees: 4, 5, 6
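The four practice steps can be reproduced with a short sketch using the four sequences from the practice slides. The function names and the brute-force scoring (trying every pair of nucleotides at the two internal nodes of a four-OTU tree) are assumptions made for illustration; they are not taken from the lecture.

```python
from itertools import product

sequences = ["TGCA", "TACC", "AGGT", "AAGT"]   # OTUs 0, 1, 2, 3
trees = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def is_informative(column):
    """A site is informative if at least two states each occur twice or more."""
    counts = {}
    for state in column:
        counts[state] = counts.get(state, 0) + 1
    return sum(1 for n in counts.values() if n >= 2) >= 2


def site_length(column, tree):
    """Minimum number of substitutions at one site on a four-OTU tree."""
    (i, j), (k, l) = tree
    best = None
    # u and v are the candidate states at the two internal nodes.
    for u, v in product("ACGT", repeat=2):
        changes = ((column[i] != u) + (column[j] != u) + (u != v)
                   + (column[k] != v) + (column[l] != v))
        if best is None or changes < best:
            best = changes
    return best


columns = list(zip(*sequences))
informative = [c for c in columns if is_informative(c)]           # step 1
lengths = [sum(site_length(c, t) for c in informative)            # steps 2-3
           for t in trees]
best_tree = trees[lengths.index(min(lengths))]                    # step 4
```

With these data, three of the four sites are informative, the tree lengths are 4, 5, and 6, and the tree grouping the first two sequences together is the most parsimonious.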
44Problem (exaggerated)
45Fitch's (1971) method for inferring nucleotides
at internal nodes
The set at an internal node is the intersection
(∩) of the two sets at its immediate descendant
nodes if the intersection is not empty. The set
at an internal node is the union (∪) of the two
sets at its immediate descendant nodes if the
intersection is empty. When a union is required
to form a nodal set, a nucleotide substitution at
this position must be assumed to have occurred.
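The intersection and union rule can be written as a short recursive function. This is a minimal sketch of the bottom-up pass only, for a single site on a rooted bifurcating tree; the name `fitch`, the nested-tuple tree format, and the example trees are hypothetical.

```python
def fitch(tree, states):
    """Return (nodal set, number of substitutions) for one site.

    tree:   a leaf name (string) or a 2-tuple of subtrees.
    states: dict mapping each leaf name to its nucleotide.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0

    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)

    common = left_set & right_set
    if common:
        # Non-empty intersection: no substitution needs to be assumed.
        return common, left_cost + right_cost
    # Empty intersection: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Illustrative site with states G, A, G, A at four OTUs.
site = {"1": "G", "2": "A", "3": "G", "4": "A"}
root_set_a, cost_a = fitch((("1", "2"), ("3", "4")), site)
root_set_b, cost_b = fitch((("1", "3"), ("2", "4")), site)
```

Grouping OTUs 1 and 3 requires one substitution at this site, whereas grouping OTUs 1 and 2 requires two.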
46Fitch's (1971) method for inferring nucleotides
at internal nodes
47Testing properties of ancestral proteins
The ability to infer in silico the sequence of
ancestral proteins, in conjunction with some
astounding developments in synthetic biology,
allows us to resurrect putative ancestral
proteins in the laboratory and test their
properties. These properties, in turn, can be
used to test hypotheses concerning the physical
environment which the ancestral organism
inhabited (its paleoenvironment).
48Testing properties of ancestral proteins
Gaucher et al. (2003) used EF-Tu
(elongation factor thermo-unstable) gene sequences
from completely sequenced mesophilic eubacteria to
reconstruct candidate ancestral sequences at
nodes throughout the bacterial tree. These
inferred ancestral proteins were then
synthesized in the laboratory, and their
activities and thermal stabilities were measured
and compared to those of proteins from extant organisms.
[Figure: thermostability curves]
The temperature profile of the inferred ancestral
protein had its optimum at 55 °C, suggesting that the ancestor
of extant mesophiles was a thermophile.
49Ancestral reconstruction is not possible with
morphological data.
50The impossibility of exhaustively searching for
the maximum-parsimony tree when the number of
OTUs is large
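The scale of the problem can be made concrete by counting trees. For n OTUs, the number of unrooted bifurcating trees is 3 × 5 × ... × (2n − 5), and the number of rooted trees for n OTUs equals the number of unrooted trees for n + 1 OTUs. The function names below are hypothetical.

```python
def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 OTUs."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # 3 * 5 * ... * (2n - 5)
        count *= k
    return count


def num_rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 2 OTUs."""
    return num_unrooted_trees(n + 1)


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

Four OTUs give the three trees used in the parsimony exercise, but ten OTUs already give more than two million.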
51Exhaustive: Examine all trees, get the best tree (guaranteed).
Branch-and-Bound: Examine some trees, get the best tree (guaranteed).
Heuristic: Examine some trees, get a tree that may or may not be the best tree.
52Exhaustive
53Branch-and-Bound
Rationale: The length of a tree with n + 1 OTUs can
either be equal to or larger than the length of a
tree with n OTUs.
Reminder: The total number of substitutions in a
tree = tree length
54Branch-and-Bound
Obtain a tree by a fast method (e.g., the neighbor-joining method).
Compute the number of substitutions (L) for this tree.
Turn L into an upper bound value.
Rationale: the maximum parsimony tree must be either equal in length to
L or shorter.
55Branch-and-Bound
The magnitude of the search will depend on the
data (i.e., luck).
56Heuristic
57(No Transcript)
58Likelihood
- Example: coin tossing
- Data: 10 tosses (6 heads, 4 tails)
- Hypothesis: binomial distribution
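The coin-tossing example can be worked through numerically. Under the binomial hypothesis, the likelihood of a heads probability p given 6 heads in 10 tosses is C(10, 6) p^6 (1 − p)^4. The sketch below evaluates it on a grid of candidate values; the function name and the grid search are assumptions made for illustration.

```python
from math import comb


def binomial_likelihood(p, heads=6, tosses=10):
    """Probability of the observed data given heads probability p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)


# Evaluate the likelihood for p = 0.01, 0.02, ..., 0.99 and keep the best.
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=binomial_likelihood)
```

The likelihood is highest at p = 0.6, the observed proportion of heads, and is lower for a fair coin (p = 0.5).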
59LIKELIHOOD IN MOLECULAR PHYLOGENETICS
- The data are the aligned sequences
- The model is the probability of change from one
character state to another (e.g., the Jukes and Cantor
one-parameter model).
- The parameters to be estimated are the topology and the
branch lengths.
60(No Transcript)
61Bayesian Phylogenetics
Based on Bayes' Theorem
P(A|B) = P(B|A) P(A) / P(B)
A: a proposition, a hypothesis.
B: the evidence.
P(A): the prior, the initial degree of belief in A.
P(A|B): the posterior, the new degree of belief in A given B (the evidence).
P(B|A)/P(B): the support B provides for A.
Thomas Bayes (1701–1761)
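Bayes' theorem can be applied to a small set of competing hypotheses by multiplying each prior by its likelihood and dividing by the total, which plays the role of P(B). The sketch below reuses the coin-tossing data (6 heads in 10 tosses) with two made-up hypotheses and equal priors; the function name `posterior` and the hypotheses themselves are illustrative assumptions, not part of the lecture.

```python
from math import comb


def posterior(priors, likelihoods):
    """Posterior P(A|B) for each hypothesis A, given P(A) and P(B|A)."""
    evidence = sum(priors[h] * likelihoods[h] for h in priors)   # P(B)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


def likelihood(p, heads=6, tosses=10):
    """P(B|A): probability of the data if the heads probability is p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)


priors = {"fair (p=0.5)": 0.5, "biased (p=0.7)": 0.5}
likelihoods = {"fair (p=0.5)": likelihood(0.5), "biased (p=0.7)": likelihood(0.7)}
post = posterior(priors, likelihoods)
```

In phylogenetics the hypotheses would be trees (topologies with branch lengths) rather than coins, and the likelihood would come from a substitution model.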