Title: Phylogenetic Reconstruction
1. Phylogenetic Reconstruction
2. A simple way to understand evolution
3. DATA: Alignment of gene sequences
How can we transform this information into a historical context?
6. Phylogeny inference
- Distance-based methods
  - Pairwise distance matrix
  - Adjust tree branch lengths to fit the distance matrix (e.g., least squares, neighbor joining)
- Character-based methods
  - Parsimony
  - Maximum likelihood (model-based evolution)
7. In 1866, Ernst Haeckel coined the word "phylogeny" and presented phylogenetic trees for most known groups of living organisms.
8. The Tree of Life project
Surf the tree of life at http://tolweb.org/tree/phylogeny.html
9. What is a tree?
A tree is a mathematical structure used to model the actual evolutionary history of a group of sequences or organisms, i.e., an evolutionary hypothesis.
A tree consists of nodes connected by branches.
The ancestor of all the sequences is the root of the tree.
Internal nodes represent hypothetical ancestors.
Terminal nodes represent the sequences or organisms for which we have data. Each is typically called an Operational Taxonomic Unit, or OTU.
10. Types of Trees
Rooted vs. unrooted
(M is the number of OTUs)
11. The number of rooted and unrooted trees
(Table: number of possible rooted and unrooted trees versus the number of OTUs; OTU = Operational Taxonomic Unit)
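The table's numbers follow from a standard counting argument. Each new OTU can be attached to any existing branch, which gives (2n-3)!! rooted and (2n-5)!! unrooted bifurcating trees for n OTUs. The minimal Python sketch below computes those counts. The function names are illustrative and do not come from the slides.

```python
def num_rooted_trees(n: int) -> int:
    """Rooted bifurcating trees for n labeled OTUs: (2n-3)!! = 1 * 3 * 5 * ... * (2n-3)."""
    if n < 2:
        raise ValueError("need at least 2 OTUs")
    count = 1
    for k in range(2, n + 1):
        # the k-th OTU can join any of the 2k-4 existing branches or sit above the root: 2k-3 options
        count *= 2 * k - 3
    return count


def num_unrooted_trees(n: int) -> int:
    """Unrooted bifurcating trees for n labeled OTUs: (2n-5)!! = 1 * 3 * 5 * ... * (2n-5)."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    count = 1
    for k in range(3, n + 1):
        # an unrooted tree with k-1 OTUs has 2k-5 branches where the k-th OTU can attach
        count *= 2 * k - 5
    return count


for n in (3, 4, 5, 10):
    print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

The counts grow super-exponentially. This is why exhaustive searches over "all possible trees" become impractical beyond a modest number of OTUs.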
12. Types of Trees
(Figure labels: multifurcating, bifurcating, polytomy)
- Polytomies: soft vs. hard
  - Soft polytomies reflect a lack of information about the order of divergence.
  - Hard polytomies represent the hypothesis that multiple divergences occurred simultaneously.
13. Types of Trees
- Trees: only one path between any pair of nodes
- Networks: more than one path between any pair of nodes
14. A shorthand for trees: the Newick format
((1,2),(3,4))
(((1,2),((3,4),5)),6)
(Figure: each string drawn as the corresponding tree with leaves 1 to 4 and 1 to 6)
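The sketch below shows how such strings can be produced. It assumes a tree is stored as nested Python tuples, which is a representation chosen for illustration; the slides only show the strings. It also appends the terminating semicolon that the Newick format conventionally uses and that the slide examples leave out.

```python
from typing import Union

# A tree is either a leaf label (str or int) or a tuple of subtrees.
Tree = Union[str, int, tuple]


def to_newick(tree: Tree) -> str:
    """Serialize a nested-tuple tree into a Newick string (topology only, no branch lengths)."""
    def render(node: Tree) -> str:
        if isinstance(node, tuple):
            return "(" + ",".join(render(child) for child in node) + ")"
        return str(node)

    return render(tree) + ";"  # Newick strings are conventionally terminated by ';'


print(to_newick(((1, 2), (3, 4))))
print(to_newick((((1, 2), ((3, 4), 5)), 6)))
```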
15. Comments on Trees
- Trees give insights into the underlying data.
- Identical trees can look different depending on the method of display.
- Information may be lost when creating the tree. The tree is not the underlying data.
16. Different kinds of trees can be used to depict different aspects of evolutionary history
- 1. Cladograms: simply show the relative recency of common ancestry.
- 2. Additive trees: cladograms with branch lengths, also called phylograms or metric trees.
- 3. Ultrametric trees (dendrograms): a special kind of additive tree in which the tips of the tree are all equidistant from the root.
17. Making trees according to morphological features
Ridley, New Scientist (Dec. 1983) 100, 647-51
18. Given a multiple alignment, how do we construct the tree?
A  GCTTGTCCGTTACGAT
B  ACTTGTCTGTTACGAT
C  ACTTGTCCGAAACGAT
D  ACTTGACCGTTTCCTT
E  AGATGACCGTTTCGAT
F  ACTACACCCTTATGAG
?
19. Distance methods
Logic: evolutionary distance is a tree metric and hence defines a tree.
- General method
  - Evolutionary distances are computed for all pairs of taxa.
  - A phylogenetic tree is constructed by considering the relationships among these distance data (fitting a tree to the matrix).
- Methods we'll talk about
  - UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
  - Neighbor Joining
20. Distance methods
21. Ultrametric Trees
(Figure: three taxa a, b and c)
Metric distances must obey four rules:
- Non-negativity: $d(a,b) \ge 0$
- Symmetry: $d(a,b) = d(b,a)$
- Triangle inequality: $d(a,c) \le d(a,b) + d(b,c)$
- Distinctness: $d(a,b) = 0$ if and only if $a = b$
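The four rules can be checked mechanically. The sketch below assumes distances are stored in a dictionary keyed by unordered pairs. `is_metric` and `from_table` are hypothetical helpers, and the numbers are made up for illustration.

```python
import itertools


def is_metric(labels, d, tol=1e-9):
    """Return True if d(x, y) satisfies the four metric rules on all the given labels."""
    for x, y in itertools.product(labels, repeat=2):
        if d(x, y) < -tol:                        # non-negativity
            return False
        if abs(d(x, y) - d(y, x)) > tol:          # symmetry
            return False
        if (abs(d(x, y)) <= tol) != (x == y):     # distinctness: zero exactly when x == y
            return False
    for x, y, z in itertools.product(labels, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:     # triangle inequality
            return False
    return True


def from_table(table):
    """Turn {frozenset({x, y}): distance} into a symmetric function with d(x, x) = 0."""
    return lambda x, y: 0.0 if x == y else table[frozenset((x, y))]


# Hypothetical distances among the three taxa a, b, c of the figure.
good = from_table({frozenset("ab"): 3, frozenset("ac"): 4, frozenset("bc"): 5})
bad = from_table({frozenset("ab"): 1, frozenset("ac"): 5, frozenset("bc"): 1})
print(is_metric("abc", good), is_metric("abc", bad))  # True False
```

The second table fails because d(a,c) = 5 exceeds d(a,b) + d(b,c) = 2.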
22. Construction of a distance tree using clustering with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
First, construct a distance matrix:
A  GCTTGTCCGTTACGAT
B  ACTTGTCTGTTACGAT
C  ACTTGTCCGAAACGAT
D  ACTTGACCGTTTCCTT
E  AGATGACCGTTTCGAT
F  ACTACACCCTTATGAG
From http://www.icp.ucl.ac.be/opperd/private/upgma.html
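One simple way to fill in such a matrix is to count mismatched positions between each pair of aligned sequences. This is the uncorrected p-distance, left here as a raw count rather than divided by the length. It is only a sketch: the slides do not say which distance measure produced their matrix, so these counts are not guaranteed to equal the values used in the UPGMA rounds.

```python
import itertools

# The six aligned sequences from the slide.
SEQS = {
    "A": "GCTTGTCCGTTACGAT",
    "B": "ACTTGTCTGTTACGAT",
    "C": "ACTTGTCCGAAACGAT",
    "D": "ACTTGACCGTTTCCTT",
    "E": "AGATGACCGTTTCGAT",
    "F": "ACTACACCCTTATGAG",
}


def mismatches(s1: str, s2: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2))


def distance_matrix(seqs):
    """Mismatch count for every unordered pair of taxa."""
    return {(x, y): mismatches(seqs[x], seqs[y])
            for x, y in itertools.combinations(seqs, 2)}


for (x, y), dist in distance_matrix(SEQS).items():
    print(f"d({x},{y}) = {dist}")
```

In practice a model-based correction (for example Jukes-Cantor) is often applied to such counts before tree building. The raw count is used here only to keep the example short.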
23. UPGMA
First round:
$d_{(AB),C} = (d_{AC} + d_{BC}) / 2 = 4$
$d_{(AB),D} = (d_{AD} + d_{BD}) / 2 = 6$
$d_{(AB),E} = (d_{AE} + d_{BE}) / 2 = 6$
$d_{(AB),F} = (d_{AF} + d_{BF}) / 2 = 8$
Choose the most similar pair, cluster them together, and calculate the new distance matrix.
24. UPGMA
Second round
Third round
25. UPGMA
Fourth round
Fifth round
Note that this method identifies the root of the tree.
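The rounds above can be sketched in Python under a few assumptions:

- Distances are given in a dictionary keyed by `frozenset` pairs.
- The distance from a merged cluster to any other cluster is the average over all leaf pairs. This is the same as weighting by cluster size, and it reduces to the slide's (d_AC + d_BC)/2 when both merged clusters are single taxa.
- Each new node is placed at half the merging distance. This is where the molecular-clock assumption enters.

The test matrix is hypothetical, not the slide's.

```python
import itertools


def upgma(labels, dist):
    """Build a rooted UPGMA tree and return it as a Newick string with branch lengths.

    labels: taxon names; dist: {frozenset({x, y}): distance} for every pair of taxa.
    """
    # cluster key -> (Newick text, number of leaves, height of the cluster's root)
    clusters = {x: (x, 1, 0.0) for x in labels}
    d = {frozenset(p): float(dist[frozenset(p)]) for p in itertools.combinations(labels, 2)}
    while len(clusters) > 1:
        pair = min(d, key=d.get)                 # the most similar pair of clusters
        a, b = sorted(pair)
        height = d[pair] / 2                     # clock assumption: the node sits halfway
        (text_a, n_a, h_a), (text_b, n_b, h_b) = clusters.pop(a), clusters.pop(b)
        new = f"({a},{b})"                       # key for the merged cluster
        for k in clusters:                       # size-weighted average distance to the others
            d[frozenset((new, k))] = (n_a * d[frozenset((a, k))]
                                      + n_b * d[frozenset((b, k))]) / (n_a + n_b)
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        text = f"({text_a}:{height - h_a:g},{text_b}:{height - h_b:g})"
        clusters[new] = (text, n_a + n_b, height)
    ((text, _, _),) = clusters.values()
    return text + ";"


# Hypothetical clock-like distances (not the slide's matrix).
D = {frozenset("AB"): 2, frozenset("AC"): 4, frozenset("BC"): 4,
     frozenset("AD"): 6, frozenset("BD"): 6, frozenset("CD"): 6}
print(upgma(["A", "B", "C", "D"], D))
```

Because every leaf ends up at the same distance from the last node created, the output is an ultrametric, rooted tree. Ties are broken by dictionary order, which is an arbitrary choice.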
27. A tree of human mitochondrial sequences
http://www.genpat.uu.se/mtDB/
- The mitochondrial genome has about 16,500 base pairs.
- In 2000, Gyllensten and colleagues sequenced the mitochondrial genomes of 53 people of diverse geographical, racial and linguistic backgrounds.
- A molecular clock seems to hold for the divergence of these sequences, at a rate of $1.7 \times 10^{-8}$ substitutions per site per year.
Ingman, M., Kaessmann, H., Pääbo, S. & Gyllensten, U. (2000) Nature 408: 708-713.
28. The deepest branches lead exclusively to sub-Saharan mtDNAs, with the second branch containing both Africans and non-Africans.
(Figure label: sub-Saharan mtDNA)
A tree of 86 mitochondrial sequences, downloaded from http://www.genpat.uu.se/mtDB/sequences.html and analyzed using MEGA (method: UPGMA).
29. Rooting the tree with an outgroup
(Figure labels: root, outgroup)
Ingman, M., Kaessmann, H., Pääbo, S. & Gyllensten, U. (2000) Nature 408: 708-713.
30. Phylogeny based upon the molecular clock
- Evidence for a human mitochondrial origin in Africa: African sequence diversity is twice as large as that of non-Africans.
- Gyllensten and colleagues estimate that the divergence of Africans and non-Africans occurred 52,000 to 28,000 years ago.
Ingman, M., Kaessmann, H., Pääbo, S. & Gyllensten, U. (2000) Nature 408: 708-713.
31. UPGMA assumes a molecular clock
- The UPGMA clustering method is very sensitive to unequal evolutionary rates (it assumes that the evolutionary rate is the same for all branches).
- Clustering works only if the data are ultrametric.
- Ultrametric distances are defined by the satisfaction of the "three-point condition".
The three-point condition: for any three taxa, the two greatest distances are equal.
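The sketch below checks the three-point condition on two hypothetical distance tables, one clock-like and one not. The helper names are illustrative.

```python
import itertools


def satisfies_three_point(labels, d, tol=1e-9):
    """Three-point condition: in every triple of taxa, the two greatest distances are equal."""
    for x, y, z in itertools.combinations(labels, 3):
        _, second, largest = sorted((d(x, y), d(x, z), d(y, z)))
        if largest - second > tol:
            return False
    return True


def from_table(table):
    """Turn {frozenset({x, y}): distance} into a symmetric function with d(x, x) = 0."""
    return lambda x, y: 0.0 if x == y else table[frozenset((x, y))]


clock = from_table({frozenset("AB"): 2, frozenset("AC"): 4, frozenset("BC"): 4,
                    frozenset("AD"): 6, frozenset("BD"): 6, frozenset("CD"): 6})
no_clock = from_table({frozenset("AB"): 2, frozenset("AC"): 3, frozenset("BC"): 4})
print(satisfies_three_point("ABCD", clock), satisfies_three_point("ABC", no_clock))  # True False
```

When the check fails, as for `no_clock`, UPGMA can still produce a tree, but the slides' point is that it may not be the right one.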
32. UPGMA fails when rates of evolution are not constant
(Figure: a tree in which the evolutionary rates are not equal. Neighbor joining will get the right tree in this case.)
From http://www.icp.ucl.ac.be/opperd/private/upgma.html
33. Neighbors
(Figure: unrooted tree with leaves A, B, C and D, terminal branch lengths a, b, c and d, and internal branch x)
A and B are neighbors because they are connected through a single internal node. C and D are also neighbors, but A and D are not neighbors.
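34. The four-point condition can be used to identify neighbors. Basically, it states that neighbors are closer than non-neighbors.
The Four-Point Condition
(Figure: the same tree, with leaves A, B, C and D, branch lengths a, b, c and d, and internal branch x)
For the non-neighbor pairings:
$$d_{AC} + d_{BD} = d_{AD} + d_{BC} = a + b + c + d + 2x = d_{AB} + d_{CD} + 2x$$
Hence, for the neighbor pairing (A with B, C with D):
$$d_{AB} + d_{CD} < d_{AC} + d_{BD}$$
$$d_{AB} + d_{CD} < d_{AD} + d_{BC}$$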
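The four-point condition becomes a rule for picking neighbors among four taxa: choose the pairing with the smallest sum of within-pair distances. The sketch below builds path distances from hypothetical branch lengths for the figure's tree. It then confirms that A, B and C, D come out as neighbors. The function and variable names are illustrative.

```python
# Hypothetical lengths for the figure's branches a, b, c, d and internal branch x.
BRANCH = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "x": 5.0}
PATHS = {
    frozenset("AB"): BRANCH["a"] + BRANCH["b"],
    frozenset("CD"): BRANCH["c"] + BRANCH["d"],
    frozenset("AC"): BRANCH["a"] + BRANCH["x"] + BRANCH["c"],
    frozenset("AD"): BRANCH["a"] + BRANCH["x"] + BRANCH["d"],
    frozenset("BC"): BRANCH["b"] + BRANCH["x"] + BRANCH["c"],
    frozenset("BD"): BRANCH["b"] + BRANCH["x"] + BRANCH["d"],
}


def dist(p, q):
    """Path length between two leaves of the four-taxon tree."""
    return PATHS[frozenset((p, q))]


def four_point_neighbors(w, x, y, z, d):
    """Return the pairing into two neighbor pairs favored by the four-point condition."""
    pairings = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    # Neighbors give the smallest sum of within-pair distances.
    return min(pairings, key=lambda p: d(*p[0]) + d(*p[1]))


print(four_point_neighbors("A", "B", "C", "D", dist))
```

With real (noisy) distances the three sums rarely match the ideal equalities exactly. Taking the smallest sum is the practical version of the condition.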
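35. Neighbor Joining: an algorithm for finding the shortest tree
Start with a star tree (no hierarchical structure) with leaves a, b, c and d.
The length of the star tree, $S_0$, follows from the pairwise distances $D_{ij}$ and the number of OTUs $N$. Each branch of the star lies on $N - 1$ of the pairwise paths, so:
$$S_0 = \frac{1}{N-1} \sum_{i<j} D_{ij}$$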
36. Neighbor Joining (Saitou and Nei, 1987)
37. Neighbor Joining (Saitou and Nei, 1987)
38. Neighbor Joining (Saitou and Nei, 1987)
39. Neighbor Joining (Saitou and Nei, 1987)
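Saitou and Nei (1987) choose, at each step, the pair whose joining gives the shortest total tree length. The sketch below uses the Q-criterion formulation instead (Studier and Keppler, 1988). It selects the same pair at each step and is how neighbor joining is usually implemented. Two things are assumptions: the five-taxon matrix is hypothetical, and the dictionary-of-frozensets representation was chosen for illustration.

```python
import itertools


def nj_step(d):
    """One neighbor-joining step on {frozenset({i, j}): distance}.

    Returns (i, j, branch_i, branch_j, new_d), where new_d replaces i and j by their new parent node.
    """
    nodes = sorted({x for pair in d for x in pair})
    n = len(nodes)

    def dist(x, y):
        return d[frozenset((x, y))]

    r = {x: sum(dist(x, y) for y in nodes if y != x) for x in nodes}   # row sums
    # Q-criterion: join the pair with the smallest (n - 2) d(i, j) - r(i) - r(j)
    i, j = min(itertools.combinations(nodes, 2),
               key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]])
    branch_i = dist(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
    branch_j = dist(i, j) - branch_i
    u = f"({i},{j})"                                   # name of the new internal node
    new_d = {p: v for p, v in d.items() if i not in p and j not in p}
    for k in nodes:
        if k not in (i, j):
            new_d[frozenset((u, k))] = (dist(i, k) + dist(j, k) - dist(i, j)) / 2
    return i, j, branch_i, branch_j, new_d


def neighbor_joining(d):
    """Join pairs until two nodes remain; return the joins and the final connecting branch."""
    joins = []
    while len({x for pair in d for x in pair}) > 2:
        i, j, branch_i, branch_j, d = nj_step(d)
        joins.append((i, j, branch_i, branch_j))
    ((last_pair, last_length),) = d.items()
    return joins, (tuple(sorted(last_pair)), last_length)


# Hypothetical five-taxon distance matrix.
RAW = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
       ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
D = {frozenset(p): v for p, v in RAW.items()}
joins, last = neighbor_joining(D)
for i, j, bi, bj in joins:
    print(f"join {i} ({bi:g}) and {j} ({bj:g})")
print("final branch", last)
```

Unlike UPGMA, the two new branch lengths can differ, so unequal rates along different lineages are allowed. The result is an unrooted tree.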
40. Character-state methods: MAXIMUM PARSIMONY
Logic: examine each column in the multiple alignment of the sequences. Examine all possible trees and choose among them according to some optimality criterion.
- Method we'll talk about
  - Maximum parsimony
41. Maximum Parsimony
Simpler hypotheses are preferable to more complicated ones, and ad hoc hypotheses should be avoided whenever possible (Occam's razor). Thus, find the tree that requires the smallest number of evolutionary changes.
   0123456789012345
W  ACTTGACCCTTACGAT
X  AGCTGGCCCTGATTAC
Y  AGTTGACCATTACGAT
Z  AGCTGGTCCTGATGAC
(Figure: tree relating W, X, Y and Z)
42. Maximum Parsimony
Start by classifying the sites:
       123456789012345678901
Mouse  CTTCGTTGGATCAGTTTGATA
Rat    CCTCGTTGGATCATTTTGATA
Dog    CTGCTTTGGATCAGTTTGAAC
Human  CCGCCTTGGATCAGTTTGAAC
(Figure: sites classified as invariant or variant, and variant sites as informative or non-informative)
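The classification of sites can be automated. In this sketch, a variant site counts as informative when at least two different bases each occur in at least two sequences. The function name is hypothetical.

```python
from collections import Counter

ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}


def classify_sites(alignment):
    """Label each site (1-based, matching the ruler above the alignment)."""
    seqs = list(alignment.values())
    labels = {}
    for pos in range(len(seqs[0])):
        counts = Counter(seq[pos] for seq in seqs)
        if len(counts) == 1:
            labels[pos + 1] = "invariant"
        elif sum(1 for c in counts.values() if c >= 2) >= 2:
            labels[pos + 1] = "informative"      # at least two bases, each seen at least twice
        else:
            labels[pos + 1] = "non-informative"  # variant, but every tree needs the same number of changes
    return labels


for pos, label in classify_sites(ALIGNMENT).items():
    if label != "invariant":
        print(pos, label)
```

For this alignment, sites 2, 3, 20 and 21 come out as informative. Sites 5 and 14 are variant but non-informative.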
43. (Same Mouse, Rat, Dog and Human alignment as on slide 42.)
(Figure: the three possible unrooted trees for Mouse, Rat, Dog and Human, with the bases at Site 5, Site 2 and Site 3 mapped onto the tips of each tree)
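44. Maximum Parsimony
(Same alignment as on slide 42.)
Informative sites supporting each of the three possible unrooted trees: 3, 1, 0
(In this alignment, sites 3, 20 and 21 support ((Mouse,Rat),(Dog,Human)); site 2 supports ((Mouse,Dog),(Rat,Human)); no informative site supports ((Mouse,Human),(Rat,Dog)).)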
45. Maximum Parsimony
The situation is more complicated when there are more than four units.
(Figure: a larger tree with sets of possible ancestral states, such as (AG), (CT), (AGT) and (TAGC), at its internal nodes)
Problems with maximum parsimony:
- It only uses informative sites.
- Long-branch attraction.
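Sets of candidate states at internal nodes, such as (AG) and (TAGC), are what Fitch's small-parsimony algorithm computes. At each node it takes the intersection of the two children's sets when that is non-empty. Otherwise it takes their union and counts one change. The sketch below uses Fitch's algorithm, which is our choice of method and not necessarily the slides' exact procedure, to score the three possible unrooted trees for the Mouse, Rat, Dog and Human alignment. For the calculation, each unrooted tree is rooted on its internal branch; this does not change the Fitch score.

```python
ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}


def fitch(tree, states):
    """Fitch's algorithm for one site on a rooted binary tree of nested 2-tuples.

    Returns (candidate states at the root, minimum number of changes).
    """
    if not isinstance(tree, tuple):                  # a leaf: its observed base
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                       # intersection: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # union: one change


def parsimony_score(tree, alignment):
    """Sum the Fitch counts over all sites of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(length))


TREES = {
    "((Mouse,Rat),(Dog,Human))": (("Mouse", "Rat"), ("Dog", "Human")),
    "((Mouse,Dog),(Rat,Human))": (("Mouse", "Dog"), ("Rat", "Human")),
    "((Mouse,Human),(Rat,Dog))": (("Mouse", "Human"), ("Rat", "Dog")),
}
for label, tree in TREES.items():
    print(label, parsimony_score(tree, ALIGNMENT))
```

For this alignment the sketch reports 8, 10 and 11 changes. The tree grouping Mouse with Rat is therefore the most parsimonious, in line with the 3 : 1 : 0 count of informative sites. The non-informative sites add the same amount to every tree.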
49. Maximum Likelihood Analysis
- Similar to maximum parsimony, except that the different nucleotide substitutions are not considered to have equal probability.
- All possible unrooted trees are evaluated (as for parsimony).
- Each column of the alignment is processed (as for parsimony).
- The substitution A → T will have a different probability than the substitution G → C.
- Start with a table that specifies the probability of one base being substituted for another base.
  - See the probabilities of nucleotide substitution (Table 6.5, p. 275).
- The probability that the unrooted tree predicts each column of the alignment is calculated.
- The probabilities of the columns are multiplied together for each tree (equivalently, their logarithms are summed).
- The unrooted tree with the highest probability is chosen.
50. Maximum Likelihood Example
- Four sequences are compared (w, x, y and z)
- All unrooted trees are shown
- In this example we will examine the first
unrooted tree.
51. Maximum Likelihood Example Continued
- $L(\text{Tree } x) = L_0 \times L_1 \times L_2 \times L_3 \times L_4 \times L_5 \times L_6$
- $L_0$: base probability of the nucleotide at node 0 (0.25).
- $L_1$: probability of the nucleotide changing from its value at node 0 to its value at node 1.
- $L_2$: probability of the nucleotide changing from its value at node 0 to its value at node 2.
- $L_3$: probability of the nucleotide changing from its value at node 1 to its value at leaf 3 (T).
- $L_4$, $L_5$, $L_6$: probability of the nucleotide changing to its value at the corresponding leaf.
52. Maximum Likelihood Example Continued
- There are 64 combinations of internal-node states to evaluate for each column: (number of bases)^(number of internal nodes) = $4^3 = 64$.
- We will show the evaluation of TTG (the states at nodes 0, 1 and 2) against the first unrooted tree for column TTAG.
- Determine the values for $L_0$ through $L_6$ by looking up probabilities in the transition probability table.
  - $L_2$ is the probability of T → G.
  - $L_5$ is the probability of G → A.
  - $L_3$ is the probability of T → T.
- Determine the combined probability $L_0 \times L_1 \times L_2 \times \cdots \times L_6$.
53. Maximum Likelihood Example Continued
- Determine the probability for the combination TGG.
- Determine the probability for the other 62 combinations.
- Sum all 64 combinations together for this column: $L(\text{Tree}) = L(\text{Tree}_1) + L(\text{Tree}_2) + \cdots + L(\text{Tree}_{64})$
- Move to the next column and repeat the same procedure.
- Once all columns are complete, multiply the column probabilities together (or sum their logarithms). This is the likelihood of the first unrooted tree.
- Continue this process for the other unrooted trees.
- Pick the unrooted tree with the highest probability. This is the most likely unrooted tree.
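Below is a brute-force sketch of this procedure for a single four-taxon tree. The slides' substitution table (Table 6.5) is not reproduced here, so the sketch assumes the Jukes-Cantor model with the same hypothetical branch length t = 0.1 on every branch. Jukes-Cantor treats all substitutions as equally likely, so a richer model would change only `sub_prob`. Following our reading of the L0 to L6 labels, node 0 is the root, nodes 1 and 2 are internal, and leaves 3 to 6 hold the column's bases.

```python
import itertools
import math

BASES = "ACGT"


def sub_prob(x, y, t=0.1):
    """P(base x -> base y) along a branch of length t under Jukes-Cantor (an assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def column_likelihood(column, t=0.1):
    """Likelihood of one column on the rooted tree 0 -> (1 -> leaves 3, 4), (2 -> leaves 5, 6).

    Sums L0 * L1 * ... * L6 over all 4**3 = 64 assignments of bases to nodes 0, 1 and 2.
    """
    s3, s4, s5, s6 = column
    total = 0.0
    for n0, n1, n2 in itertools.product(BASES, repeat=3):
        total += (0.25                                          # L0: base frequency at the root
                  * sub_prob(n0, n1, t) * sub_prob(n0, n2, t)   # L1, L2
                  * sub_prob(n1, s3, t) * sub_prob(n1, s4, t)   # L3, L4
                  * sub_prob(n2, s5, t) * sub_prob(n2, s6, t))  # L5, L6
    return total


def tree_log_likelihood(columns, t=0.1):
    """Columns are treated as independent, so likelihoods multiply; summing logs avoids underflow."""
    return sum(math.log(column_likelihood(col, t)) for col in columns)


print(column_likelihood("TTAG"))
print(tree_log_likelihood(["TTAG", "TTTT", "GGAA"]))
```

Real programs avoid the 4^(internal nodes) enumeration with Felsenstein's pruning algorithm, which gives the same number far faster. The brute-force loop is kept here only because it mirrors the slides step by step.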
56. IN VITRO EVOLUTION BY MEANS OF PCR
62. Conclusion
- Phylogenetic prediction can be used for more than evolutionary distance:
  - Verification of taxonomy
  - Identification of unknowns
- The techniques work for genetic and non-genetic data (e.g., fatty acids).
- Use multiple methods for verification:
  - Pick at least two different types of methods from parsimony, distance and likelihood.
  - If the analyses agree, there is a higher level of confidence that the analysis is correct.
64. BOOTSTRAPPING: How confident are we in this tree?
65. Bootstrapping
A statistical method that can be used to place confidence intervals on phylogenies.
67. Resampling from the Data
Original data:
human_myoglobin        -GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHL...
pig_myoglobin          -GLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHL...
horse_myoglobin        -GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHL...
common_seal_myoglobin  -GLSEGEWQLVLNVWGKVEADLAGHGQDVLIRLFKGHPETLEKFDKFKHL...
sperm_whale_myoglobin  MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHL...
sea_hare_myoglobin     -SLSAAEADLAGKSWAPVFANKDANGDAFLVALFEKFPDSANFFADFKG-...
Pick columns with replacement:
human_myoglobin        LQKWDQKHNVHTEFGAEELQGDKLSWKKLDQGKKVVKKELGLDEDEWLGE
pig_myoglobin          LQKWDQKHNVHTEFGAEELQGDKLSWKKLDQGKKVVKKELGLDEDEWLGE
horse_myoglobin        LQKWDQTHNVHTEFGAEELQGDKLSWKTLDQGKKVVTKELGQDEDEWLGE
common_seal_myoglobin  LQKWEQKHNVHTEFGADELQGDKLSWKKLDQGKKVVKKELGLDEDDWLGE
sperm_whale_myoglobin  LQRWEQKHHVHTEFAADELQGDKLSWKKLDQGRKVVKKELGLDEDDWLGE
sea_hare_myoglobin     LDDWADENKSNSNFAAAELDANFASAPELNDGDKVAEKFAALNNAAWAAN
Resampled data number 1
Repeat 99 more times (or 999, ... more times)
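Resampling columns with replacement takes only a few lines of Python. This sketch uses only the first ten columns of three of the myoglobin sequences above. The function name is hypothetical and the fixed random seed is arbitrary.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One pseudo-alignment built by drawing columns with replacement.

    alignment: {name: aligned sequence}, all of the same length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]   # same width as the original
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


# First ten columns of three of the myoglobin sequences above.
DATA = {
    "human_myoglobin": "-GLSDGEWQL",
    "pig_myoglobin": "-GLSDGEWQL",
    "sperm_whale_myoglobin": "MVLSEGEWQL",
}
rng = random.Random(1)                                              # arbitrary seed
replicates = [bootstrap_replicate(DATA, rng) for _ in range(100)]   # 1 + 99 more
print(replicates[0])
```

Whole columns are drawn, never single residues, so each replicate keeps the alignment's column structure. Some original columns appear several times in a replicate and others not at all. Each replicate is then analyzed with the same tree-building method as the original data.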
68. Given the following tree, estimate the confidence of the two internal branches.
(Figure: tree of Chimpanzee, Human, Gibbon, Gorilla and Orang-utan)
69. Estimating Confidence from the Resamplings
1. Of the 100 trees, three topologies were obtained, in 41/100, 28/100 and 31/100 of the replicates.
(Figure: the three topologies relating Gorilla, Human, Chimpanzee, Gibbon and Orang-utan)
2. Upon the original tree we superimpose bootstrap values:
- 41: in 41 of the 100 trees, chimp and gorilla are split from the rest.
- 100: in 100 of the 100 trees, gibbon and orang-utan are split from the rest.
(Figure: the original tree of Chimpanzee, Human, Gorilla, Gibbon and Orang-utan, labeled with these bootstrap values)
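Once each replicate tree is summarized by its splits (the group of taxa cut off by each internal branch), the bootstrap value of a branch is the percentage of replicates containing that split. The replicate set below is invented for illustration. Only its 41% and 100% values echo the slide, and the function name is hypothetical.

```python
from collections import Counter


def split_support(trees):
    """Percentage of replicate trees that contain each split.

    Each tree is a collection of splits; a split is a frozenset naming the taxa
    on one side of an internal branch (encoded the same way in every tree).
    """
    counts = Counter(split for tree in trees for split in set(tree))
    return {split: 100.0 * c / len(trees) for split, c in counts.items()}


# Invented replicates for illustration; only the 41% and 100% values echo the slide.
GIBBON_ORANG = frozenset({"Gibbon", "Orang-utan"})
CHIMP_GORILLA = frozenset({"Chimpanzee", "Gorilla"})
CHIMP_HUMAN = frozenset({"Chimpanzee", "Human"})
HUMAN_GORILLA = frozenset({"Human", "Gorilla"})
replicates = ([[GIBBON_ORANG, CHIMP_GORILLA]] * 41
              + [[GIBBON_ORANG, CHIMP_HUMAN]] * 28
              + [[GIBBON_ORANG, HUMAN_GORILLA]] * 31)
for split, pct in split_support(replicates).items():
    print(sorted(split), pct)
```

The bootstrap values are then written on the matching branches of the original tree.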
74. THE TREE OF LIFE: Relationships between 16S ribosomal RNAs
(Figure labels: eukaryotes, bacteria, archaea; distant relationships, close relationships)
75. The three domains of life as identified by phylogenetic analysis of the highly conserved 16S ribosomal RNA
(Woese and Fox 1977)
76. Where is the root of the tree of life? (By definition, there is no outgroup.)
77. An ancient gene duplication can root a tree
(Figure labels: speciation of 3 and 1-2; gene duplication; speciation of 1 and 2; outgroups for A2; outgroups for A1; root of 1, 2, 3)
Graur & Li, Fundamentals of Molecular Evolution (1999)
78. The root of the tree of life as inferred from EF-Tu and EF-G
Both trees show Archaea and Eucarya as sister taxa.
Graur & Li, Fundamentals of Molecular Evolution (1999)
79. Horizontal Gene Transfer
(Figure labels: archaea, eubacteria)
Mn-dependent transcriptional regulator (Tatusov, 1996)
80. What is the origin of the mitochondria?
http://www.mitomap.org/
81. The endosymbiotic theory
- The evidence:
  - Both mitochondria and chloroplasts can arise only from preexisting mitochondria and chloroplasts. They cannot be formed in a cell that lacks them, because nuclear genes encode only some of the proteins of which they are made.
  - Both mitochondria and chloroplasts have their own genome.
  - Both genomes consist of a single circular molecule of DNA.
  - There are no histones associated with the DNA.
82. The mitochondria sit with the proteobacteria in the tree of life
(Figure: small-subunit (SSU) ribosomal RNA tree, with the mitochondrial (MT) sequences labeled)
Gray MW. Nature. 1998 Nov 12; 396(6707): 109-10.
83. (Figure labels: mitochondrion; chloroplast; lack mitochondria (?))
84. The genome sequence of Rickettsia prowazekii and the origin of mitochondria.
Andersson SG. Nature. 1998 Nov 12; 396(6707): 133-40.
85. Mitochondrial ribosomal proteins are most similar to those of R. prowazekii
Andersson SG. Nature. 1998 Nov 12; 396(6707): 133-40.
86. Mitochondrial proteins involved in ATP synthesis are most similar to those of R. prowazekii
Andersson SG. Nature. 1998 Nov 12; 396(6707): 133-40.
87. Mitochondria derive from α-purple bacteria; chloroplasts derive from cyanobacteria
Graur & Li, Fundamentals of Molecular Evolution (1999)
88. The tree of life with mitochondrial and chloroplast endosymbiotic events
(Doolittle, 1999)
89. Horizontal transfer is a dominant feature of the tree of life
(Doolittle, 1999)