1
Phylogenetic Reconstruction
2
A simple way to understand evolution
3
DATA: Alignment of gene sequences
How can we transform this information into a
historical context?
6
Phylogeny inference
  • 1. Distance-based methods
  •    - Pairwise distance matrix
  •    - Adjust tree branch lengths to fit the distance
    matrix (e.g. least squares, neighbor joining)
  • 2. Character-based methods
  •    - Parsimony
  •    - Maximum likelihood (model-based evolution)

7
In 1866, Ernst Haeckel coined the word
phylogeny and presented phylogenetic trees for
most known groups of living organisms.
8
The Tree of Life project
Surf the tree of life at
http://tolweb.org/tree/phylogeny.html
9
What is a tree?
A tree is a mathematical structure which is used
to model the actual evolutionary history of a
group of sequences or organisms, i.e. an
evolutionary hypothesis.
A tree consists of nodes connected by branches.
The ancestor of all the sequences is the root of
the tree
Internal nodes represent hypothetical ancestors
Terminal nodes represent sequences or organisms
for which we have data. Each is typically called
an Operational Taxonomic Unit (OTU).
10
Types of Trees
Rooted vs. Unrooted
M is the number of OTUs
11
The number of rooted and unrooted trees
Number of OTUs
OTU = Operational Taxonomic Unit
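The table for this slide did not survive in the transcript. As a rough sketch, the standard counts for strictly bifurcating trees can be computed as below: (2M-5)!! unrooted and (2M-3)!! rooted trees for M OTUs. The function names are illustrative, not from the slides.

```python
def double_factorial(k):
    """Product k * (k-2) * (k-4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(m):
    """Number of unrooted bifurcating trees for m >= 3 OTUs: (2m - 5)!!"""
    if m < 3:
        raise ValueError("need at least 3 OTUs")
    return double_factorial(2 * m - 5)


def n_rooted_trees(m):
    """Number of rooted bifurcating trees for m >= 2 OTUs: (2m - 3)!!"""
    if m < 2:
        raise ValueError("need at least 2 OTUs")
    return double_factorial(2 * m - 3)


if __name__ == "__main__":
    for m in range(3, 11):
        print(m, n_unrooted_trees(m), n_rooted_trees(m))
```

The counts grow so fast that exhaustive evaluation of all trees is only practical for a handful of OTUs.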
 
12
Types of Trees
Multifurcating
Bifurcating
Polytomy
  • Polytomies: soft vs. hard
  • Soft: designates a lack of information about the
    order of divergence.
  • Hard: the hypothesis that multiple divergences
    occurred simultaneously.

13
Types of Trees
Trees: only one path between any pair of nodes
Networks: there can be more than one path between
a pair of nodes
14
A shorthand for trees: the Newick format
((1,2),(3,4))
[Figure: the corresponding tree with tips 1, 2, 3, 4.]
(((1,2),((3,4),5)),6)
[Figure: the corresponding tree with tips 1, 2, 3, 4, 5, 6.]
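To make the bracket notation concrete, here is a minimal sketch of a parser for label-only Newick strings like the two above. It assumes no branch lengths, quoting or comments, and the function names are illustrative.

```python
def parse_newick(text):
    """Parse a label-only Newick string into nested tuples of leaf names."""
    text = text.strip().rstrip(";")
    pos = 0

    def peek():
        # Current character, or "" once the end of the string is reached.
        return text[pos] if pos < len(text) else ""

    def node():
        nonlocal pos
        if peek() == "(":
            pos += 1                       # consume '('
            children = [node()]
            while peek() == ",":
                pos += 1                   # consume ','
                children.append(node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                       # consume ')'
            return tuple(children)
        start = pos
        while peek() and peek() not in "(),":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a leaf label at position {pos}")
        return text[start:pos]

    tree = node()
    if pos != len(text):
        raise ValueError("trailing characters after the tree")
    return tree


def leaves(tree):
    """List the leaf labels of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


if __name__ == "__main__":
    print(parse_newick("((1,2),(3,4))"))
    print(leaves(parse_newick("(((1,2),((3,4),5)),6)")))
```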
15
Comments on Trees
  • Trees give insights into underlying data
  • Identical trees can appear differently depending
    upon the method of display
  • Information may be lost when creating the tree.
    The tree is not the underlying data.

16
Different kinds of trees can be used to depict
different aspects of evolutionary history
  • 1. Cladogram:
  •    simply shows relative recency of common
    ancestry
  • 2. Additive trees:
  •    a cladogram with branch lengths,
  •    also called phylograms and metric trees
  • 3. Ultrametric trees (dendrograms):
  •    a special kind of additive tree in which the
    tips of the tree are all equidistant from the
    root

17
Making trees according to morphological features
Ridley, New Scientist (Dec. 1983) 100, 647-51
18
Given a multiple alignment, how do we construct
the tree?
A  GCTTGTCCGTTACGAT
B  ACTTGTCTGTTACGAT
C  ACTTGTCCGAAACGAT
D  ACTTGACCGTTTCCTT
E  AGATGACCGTTTCGAT
F  ACTACACCCTTATGAG
19
Distance methods
Logic: Evolutionary distance is a tree metric and
hence defines a tree
  • General method
  • Evolutionary distances are computed for all pairs
    of taxa.
  • A phylogenetic tree is constructed by considering
    the relationships among these distance data
    (fitting a tree to the matrix).
  • Methods we'll talk about
  • UPGMA (Unweighted Pair Group Method with
    Arithmetic Mean)
  • Neighbor Joining

20
Distance methods
21
Ultrametric Trees
[Figure: tree with three tips a, b and c.]
Metric distances must obey 4 rules:
Non-negativity: d(a,b) ≥ 0
Symmetry: d(a,b) = d(b,a)
Triangle inequality: d(a,c) ≤ d(a,b) + d(b,c)
Distinctness: d(a,b) = 0 iff a = b
22
Construction of a distance tree using clustering
with the Unweighted Pair Group Method with
Arithmetic Mean (UPGMA)
First, construct a distance matrix
A  GCTTGTCCGTTACGAT
B  ACTTGTCTGTTACGAT
C  ACTTGTCCGAAACGAT
D  ACTTGACCGTTTCCTT
E  AGATGACCGTTTCGAT
F  ACTACACCCTTATGAG
From http://www.icp.ucl.ac.be/opperd/private/upgma.html
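One simple way to fill such a matrix is to count mismatching positions for every pair of aligned sequences. The sketch below does this for the six sequences as they appear in this transcript. It is an illustration only: counts obtained this way need not coincide with the matrix used on the cited page, and the names are illustrative.

```python
from itertools import combinations

# Sequences as transcribed on the slide (all 16 positions long).
ALIGNMENT = {
    "A": "GCTTGTCCGTTACGAT",
    "B": "ACTTGTCTGTTACGAT",
    "C": "ACTTGTCCGAAACGAT",
    "D": "ACTTGACCGTTTCCTT",
    "E": "AGATGACCGTTTCGAT",
    "F": "ACTACACCCTTATGAG",
}


def mismatches(s, t):
    """Number of aligned positions at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s, t))


def distance_matrix(alignment):
    """Pairwise mismatch counts, keyed by the unordered pair of names."""
    return {
        frozenset((a, b)): mismatches(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


if __name__ == "__main__":
    for pair, dist in distance_matrix(ALIGNMENT).items():
        print("-".join(sorted(pair)), dist)
```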
23
UPGMA
First round
dist(A,B),C = (distAC + distBC) / 2 = 4
dist(A,B),D = (distAD + distBD) / 2 = 6
dist(A,B),E = (distAE + distBE) / 2 = 6
dist(A,B),F = (distAF + distBF) / 2 = 8
Choose the most similar pair, cluster them
together and calculate the new distance matrix.
24
UPGMA
Second round
Third round
25
UPGMA
Fourth round
Fifth round
Note that this method identifies the root of the
tree.
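The rounds above can be sketched in a few lines. The full distance matrix is not in this transcript, so the matrix below is an assumption, chosen only so that its first-round averages agree with the values 4, 6, 6 and 8 shown earlier. Ties are broken by whichever pair is found first.

```python
from itertools import combinations

NAMES = ["A", "B", "C", "D", "E", "F"]
# Assumed lower-triangular matrix (illustrative, not taken from the slides).
ROWS = {"B": [2], "C": [4, 4], "D": [6, 6, 6], "E": [6, 6, 6, 4], "F": [8, 8, 8, 8, 8]}
DIST = {frozenset((n, NAMES[j])): v for n, vals in ROWS.items() for j, v in enumerate(vals)}


def upgma(names, dist):
    """Cluster by UPGMA. Returns (Newick-like string, list of (cluster, height))."""
    clusters = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2    # node sits halfway between the two
        new = f"({a},{b})"
        na, nb = clusters.pop(a), clusters.pop(b)
        # Distance to every other cluster: average weighted by cluster size,
        # i.e. the plain mean over all leaf-to-leaf distances.
        for c in clusters:
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[new] = na + nb
        merges.append((new, height))
    return new, merges


if __name__ == "__main__":
    tree, merges = upgma(NAMES, DIST)
    for cluster, height in merges:
        print(f"round: join {cluster} at height {height}")
    print(tree)
```

With six sequences the loop runs five times, matching the five rounds on the slides, and the last join is the root.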
27
A tree of human mitochondrial sequences
http://www.genpat.uu.se/mtDB/
  • The mitochondrial genome has about 16,500 base
    pairs.
  • In 2000, Gyllensten and colleagues sequenced the
    mitochondrial genomes of 53 people of diverse
    geographical, racial and linguistic backgrounds.
  • A molecular clock seems to hold: these sequences
    diverge at a rate of 1.7 x 10^-8
    substitutions per site per year.

Ingman, M., Kaessmann, H., Pääbo, S. &
Gyllensten, U. (2000) Nature 408: 708-713.
28
The deepest branches lead exclusively to
sub-Saharan mtDNAs, with the second branch
containing both Africans and non-Africans.
[Figure label: sub-Saharan mtDNA]
A tree of 86 mitochondrial sequences. Downloaded
from http://www.genpat.uu.se/mtDB/sequences.html
and analyzed using MEGA, method: UPGMA
29
Rooting the tree with an outgroup
Root
Outgroup
Ingman, M., Kaessmann, H., Pääbo, S. &
Gyllensten, U. (2000) Nature 408: 708-713.
30
Phylogeny based upon the molecular clock
  • Evidence for a human mitochondrial origin in
    Africa: African sequence diversity is twice as
    large as that of non-Africans.
  • Gyllensten and colleagues estimate that the
    divergence of Africans and non-Africans occurred
    52,000 to 28,000 years ago.

Ingman, M., Kaessmann, H., Pääbo, S. &
Gyllensten, U. (2000) Nature 408: 708-713.
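Under a strict molecular clock, a distance converts to a time with one line of arithmetic. The sketch below assumes the rate of 1.7 x 10^-8 substitutions per site per year applies to each lineage separately, so two sequences that split t years ago differ by about d = 2 * rate * t. The example distance is hypothetical.

```python
RATE = 1.7e-8  # substitutions per site per year (rate quoted in the slides)


def divergence_time(distance, rate=RATE):
    """Years since two sequences split, assuming a strict clock.

    distance: substitutions per site separating the two sequences.
    Both lineages accumulate changes, hence the factor of 2.
    """
    return distance / (2 * rate)


if __name__ == "__main__":
    # Hypothetical pairwise distance, for illustration only.
    print(round(divergence_time(0.0034)))
```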
31
UPGMA assumes a molecular clock
  • The UPGMA clustering method is very sensitive to
    unequal evolutionary rates (assumes that the
    evolutionary rate is the same for all branches).
  • Clustering works only if the data are ultrametric
  • Ultrametric distances are defined by the
    satisfaction of the 'three-point condition'.

The three-point condition
For any three taxa, the two greatest distances
are equal.
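The three-point condition is easy to test directly on a distance matrix. A minimal sketch, with two small made-up matrices (one clock-like, one not):

```python
from itertools import combinations


def is_ultrametric(names, dist, tol=1e-9):
    """True if, for every triple of taxa, the two largest distances are equal."""
    for a, b, c in combinations(names, 3):
        _, middle, largest = sorted(
            [dist[frozenset((a, b))], dist[frozenset((a, c))], dist[frozenset((b, c))]]
        )
        if abs(largest - middle) > tol:
            return False
    return True


# Illustrative matrices (not from the slides).
CLOCKLIKE = {frozenset("AB"): 2, frozenset("AC"): 4, frozenset("BC"): 4}
UNEQUAL_RATES = {frozenset("AB"): 5, frozenset("AC"): 4, frozenset("BC"): 7}

if __name__ == "__main__":
    print(is_ultrametric("ABC", CLOCKLIKE))
    print(is_ultrametric("ABC", UNEQUAL_RATES))
```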
32
UPGMA fails when rates of evolution are not
constant
A tree in which the evolutionary rates are not
equal
(Neighbor joining will get the right tree in this
case.)
From http://www.icp.ucl.ac.be/opperd/private/upgma.html
33
Neighbors
[Figure: unrooted four-taxon tree. Terminal branches a, b, c and d lead to A, B, C and D; x is the internal branch.]
A and B are neighbors because they are connected
through a single internal node. C and D are
also neighbors, but A and D are not neighbors.
34
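The 4-point condition can be used to identify
neighbors. It basically states that neighbors are
closer than non-neighbors.
The Four Point Condition
[Figure: unrooted four-taxon tree. Terminal branches a, b, c and d lead to A, B, C and D; x is the internal branch.]
dAC + dBD = dAD + dBC = a + b + c + d + 2x = dAB + dCD + 2x
dAB + dCD < dAC + dBD
dAB + dCD < dAD + dBC
(left-hand side: neighbors; right-hand side: non-neighbors)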
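For four taxa there are only three ways to pair them up, so the condition can be applied by comparing three sums. A small sketch, using made-up branch lengths a=1, b=4, c=2, d=3, x=1 to build the distances:

```python
def best_pairing(taxa, dist):
    """Return the pairing of four taxa whose summed distance is smallest.

    Under the four-point condition that pairing groups the neighbors;
    the other two sums are equal and larger by twice the internal branch.
    """
    a, b, c, e = taxa
    pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
    sums = {p: dist[frozenset(p[0])] + dist[frozenset(p[1])] for p in pairings}
    return min(sums, key=sums.get), sums


# Distances implied by the illustrative branch lengths a=1, b=4, c=2, d=3, x=1.
DIST = {
    frozenset("AB"): 5, frozenset("AC"): 4, frozenset("AD"): 5,
    frozenset("BC"): 7, frozenset("BD"): 8, frozenset("CD"): 5,
}

if __name__ == "__main__":
    pairing, sums = best_pairing("ABCD", DIST)
    print(pairing, sums)
```

Note that A and C are the closest pair here (distance 4) even though they are not neighbors, which is the situation in which clustering by smallest distance goes wrong.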
35
Neighbor Joining: an algorithm for finding the
shortest tree
Start with a star (no hierarchical structure)
[Figure: star tree with branches a, b, c and d, and a formula giving the length of the tree in terms of the pairwise distances and the number of OTUs.]
36
Neighbor Joining
(Saitou and Nei, 1987)
37
Neighbor Joining
(Saitou and Nei, 1987)
38
Neighbor Joining
(Saitou and Nei, 1987)
39
Neighbor Joining
(Saitou and Nei, 1987)
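The figures for these slides are not in the transcript. As a stand-in, here is a compact sketch of neighbor joining in its commonly used form: at each step join the pair minimizing Q(i,j) = (n-2) d(i,j) - r(i) - r(j), where r is the row sum. The example distances are made up (branch lengths 1, 4, 2, 3 and internal branch 1) and the output format is illustrative.

```python
def neighbor_joining(names, dist):
    """Neighbor joining on a symmetric dict-of-dicts distance matrix.

    Returns a Newick string with branch lengths.
    """
    d = {a: dict(row) for a, row in dist.items()}   # work on a copy
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        pairs = ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:])
        i, j = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node to i and to j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a}:{d[a][b]:g},{b});"


# Illustrative additive distances with unequal rates (not from the slides).
DIST = {
    "A": {"B": 5, "C": 4, "D": 5},
    "B": {"A": 5, "C": 7, "D": 8},
    "C": {"A": 4, "B": 7, "D": 5},
    "D": {"A": 5, "B": 8, "C": 5},
}

if __name__ == "__main__":
    print(neighbor_joining("ABCD", DIST))
```

On this matrix the closest pair is A and C, yet the method joins A with B and C with D and recovers the branch lengths used to build the distances.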
40
Character state methods: MAXIMUM PARSIMONY
Logic: Examine each column in the multiple
alignment of the sequences. Examine all possible
trees and choose among them according to some
optimality criterion.
  • Method we'll talk about
  • Maximum parsimony

41
Maximum Parsimony
Simpler hypotheses are preferable to more
complicated ones, and ad hoc hypotheses
should be avoided whenever possible (Occam's
Razor). Thus, find the tree that requires the
smallest number of evolutionary changes.
   0123456789012345
W  ACTTGACCCTTACGAT
X  AGCTGGCCCTGATTAC
Y  AGTTGACCATTACGAT
Z  AGCTGGTCCTGATGAC
[Figure: tree with tips W, X, Y and Z.]
42
Maximum Parsimony
Start by classifying the sites
       123456789012345678901
Mouse  CTTCGTTGGATCAGTTTGATA
Rat    CCTCGTTGGATCATTTTGATA
Dog    CTGCTTTGGATCAGTTTGAAC
Human  CCGCCTTGGATCAGTTTGAAC
Sites are either invariant or variant.
Variant sites are either informative or
non-informative.
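The classification can be sketched as follows, using the usual definition that a site is parsimony-informative when at least two different states each occur in at least two sequences. Function and key names are illustrative.

```python
from collections import Counter

ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}


def classify_sites(alignment):
    """Sort 1-based site numbers into invariant / informative / non-informative."""
    result = {"invariant": [], "informative": [], "non-informative": []}
    for site, column in enumerate(zip(*alignment.values()), start=1):
        counts = Counter(column)
        if len(counts) == 1:
            result["invariant"].append(site)
        elif sum(1 for c in counts.values() if c >= 2) >= 2:
            result["informative"].append(site)
        else:
            result["non-informative"].append(site)
    return result


if __name__ == "__main__":
    for kind, sites in classify_sites(ALIGNMENT).items():
        print(kind, sites)
```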
43
       123456789012345678901
Mouse  CTTCGTTGGATCAGTTTGATA
Rat    CCTCGTTGGATCATTTTGATA
Dog    CTGCTTTGGATCAGTTTGAAC
Human  CCGCCTTGGATCAGTTTGAAC
[Figure: for each of sites 5, 2 and 3, the three possible unrooted trees with the nucleotides at the tips and at the internal nodes.]
44
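Maximum Parsimony
       123456789012345678901
Mouse  CTTCGTTGGATCAGTTTGATA
Rat    CCTCGTTGGATCATTTTGATA
Dog    CTGCTTTGGATCAGTTTGAAC
Human  CCGCCTTGGATCAGTTTGAAC
[Figure: the three possible unrooted trees, labelled with the number of informative sites supporting each: 3, 1 and 0.]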
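One way to compare the three trees is to count the minimum number of changes each one requires, summed over all columns. The sketch below uses Fitch's small-parsimony counting, which is a choice made here rather than something named in the slides. Trees are written as nested tuples, and the root placement does not affect the count.

```python
ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}

TREES = [
    (("Mouse", "Rat"), ("Dog", "Human")),
    (("Mouse", "Dog"), ("Rat", "Human")),
    (("Mouse", "Human"), ("Rat", "Dog")),
]


def fitch(tree, column):
    """Return (possible states at this node, changes needed below it)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, left_cost = fitch(tree[0], column)
    right, right_cost = fitch(tree[1], column)
    common = left & right
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is needed at this node.
    return left | right, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    for tree in TREES:
        print(tree, parsimony_score(tree, ALIGNMENT))
```

The tree grouping Mouse with Rat and Dog with Human needs the fewest changes, in line with it being supported by the most informative sites.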
45
Maximum Parsimony
The situation is more complicated when there are
more than four units.
[Figure: a tree with more than four tips; internal nodes are labelled with sets of candidate states such as (AT), (TAG), (AGT) and (TAGC).]
Problems with maximum parsimony:
  • Only uses informative sites
  • Long branches attract
49
Maximum Likelihood Analysis
  • Similar to Maximum Parsimony, except that the
    different nucleotide substitutions are not
    considered to have equal probability.
  • All possible unrooted trees are evaluated. (Same
    for Parsimony)
  • Each column of the alignment is processed. (Same
    for Parsimony)
  • The substitution A -> T will have a different
    probability than the substitution G -> C.
  • Start with a table that specifies the probability
    of one base being substituted for another base.
  • See probabilities of nucleotide substitution.
    (Table 6.5 pg 275)
  • The probability that an unrooted tree predicts
    each column of the alignment is calculated.
  • The probabilities for the columns are multiplied
    together for each tree (equivalently, their
    logarithms are summed).
  • The unrooted tree with the highest probability is
    chosen.

50
Maximum Likelihood Example
  • Four sequences are compared (w, x, y and z)
  • All unrooted trees are shown
  • In this example we will examine the first
    unrooted tree.

51
Maximum Likelihood Example Continued
  • L(Tree x) = L0 x L1 x L2 x L3 x L4 x L5 x L6
  • L0: base probability of the nucleotide at node 0
    (0.25)
  • L1: probability of the nucleotide changing from
    the value at 0 to the value at 1.
  • L2: probability of the nucleotide changing from
    the value at 0 to the value at 2.
  • L3: probability of the nucleotide changing from
    the value at 1 to the value at 3 (T).
  • L4, L5, L6: probability of the nucleotide
    changing to the value at the leaf.

52
Maximum Likelihood Example Continued
  • There are 64 likelihood trees to evaluate:
    (number of bases)^(number of internal nodes), or
    4^3.
  • We will show the evaluation of the combination
    TTG against the first unrooted tree for column
    TTAG.
  • Determine values for L0 through L6. Values are
    determined by looking up probabilities in the
    substitution probability table.
  • Probability of L2 is T -> G
  • Probability of L5 is G -> A
  • Probability of L3 is T -> T
  • Determine the combined probability
    L0 x L1 x L2 x ... x L6

53
Maximum Likelihood Example Continued
  • Determine the probability for the combination TGG.
  • Determine the probability for the other 62
    combinations.
  • Sum all the trees together: L(Tree) = L(Tree1) +
    L(Tree2) + ... + L(Tree64)
  • Move to the next column and repeat the same
    procedure.
  • Once all columns are complete, multiply the
    column probabilities together (or sum their
    logarithms). This is the likelihood of the
    first unrooted tree.
  • Continue this process for the other unrooted
    trees.
  • Pick the unrooted tree with the highest
    probability. This is the most likely unrooted
    tree.
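The procedure can be sketched for a single four-sequence tree. Everything numeric here is an assumption made for illustration: a substitution table in which a base stays the same with probability 0.91 and changes to each other base with probability 0.03, the same table on every branch, and the topology ((w,x),(y,z)) with internal nodes 0 (root), 1 and 2.

```python
import math
from itertools import product

BASES = "ACGT"


def p_sub(x, y, p_same=0.91):
    """Hypothetical probability that base x is replaced by base y on a branch."""
    return p_same if x == y else (1 - p_same) / 3


def column_likelihood(column, prior=0.25):
    """Likelihood of one alignment column (w, x, y, z) on the tree ((w,x),(y,z)).

    Sums over all 4^3 = 64 assignments of bases to the internal nodes 0, 1, 2.
    """
    w, x, y, z = column
    total = 0.0
    for n0, n1, n2 in product(BASES, repeat=3):
        total += (
            prior                             # L0: base at node 0
            * p_sub(n0, n1) * p_sub(n0, n2)   # node 0 -> nodes 1 and 2
            * p_sub(n1, w) * p_sub(n1, x)     # node 1 -> leaves w, x
            * p_sub(n2, y) * p_sub(n2, z)     # node 2 -> leaves y, z
        )
    return total


def log_likelihood(columns):
    """Log-likelihood of the tree: column likelihoods multiply, so logs add."""
    return sum(math.log(column_likelihood(c)) for c in columns)


if __name__ == "__main__":
    print(column_likelihood("TTAG"))
    print(log_likelihood(["TTAG", "AAAA", "CCGG"]))
```

To compare topologies, the same calculation would be repeated with the leaves attached differently, keeping the tree with the highest log-likelihood.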

56
IN VITRO EVOLUTION BY MEANS OF PCR
62
Conclusion
  • Phylogenetic prediction can be used for more than
    evolutionary distance:
  •    - Verification of taxonomy
  •    - Identification of unknowns
  • Techniques work for genetic and non-genetic data
    (e.g. fatty acids).
  • Use multiple methods for verification:
  •    - Pick at least two different types of methods
    from Parsimony, Distance and Likelihood.
  •    - If the analyses are in agreement, there is a
    higher level of confidence that the result is
    correct.

64
BOOTSTRAPPING: How confident are we in this tree?
65
Bootstrapping
A statistical method that can be used to place
confidence intervals on phylogenies
67
Resampling from the Data
Original data
human_myoglobin        -GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHL ...
pig_myoglobin          -GLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHL ...
horse_myoglobin        -GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHL ...
common_seal_myoglobin  -GLSEGEWQLVLNVWGKVEADLAGHGQDVLIRLFKGHPETLEKFDKFKHL ...
sperm_whale_myoglobin  MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHL ...
sea_hare_myoglobin     -SLSAAEADLAGKSWAPVFANKDANGDAFLVALFEKFPDSANFFADFKG- ...
Pick columns with replacement
human_myoglobin        LQKWDQKHNVHTEFGAEELQGDKLSWKKLDQGKKVVKKELGLDEDEWLGE
pig_myoglobin          LQKWDQKHNVHTEFGAEELQGDKLSWKKLDQGKKVVKKELGLDEDEWLGE
horse_myoglobin        LQKWDQTHNVHTEFGAEELQGDKLSWKTLDQGKKVVTKELGQDEDEWLGE
common_seal_myoglobin  LQKWEQKHNVHTEFGADELQGDKLSWKKLDQGKKVVKKELGLDEDDWLGE
sperm_whale_myoglobin  LQRWEQKHHVHTEFAADELQGDKLSWKKLDQGRKVVKKELGLDEDDWLGE
sea_hare_myoglobin     LDDWADENKSNSNFAAAELDANFASAPELNDGDKVAEKFAALNNAAWAAN
Resampled data number 1
Repeat 99 more times (or 999, 9999, ...)
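The resampling step can be sketched as follows: draw column positions with replacement, as many as the alignment is long, and apply the same positions to every sequence. The short nucleotide alignment and the seed are chosen here only to keep the example small and repeatable.

```python
import random

ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    # The same column picks are applied to every sequence, so each
    # resampled column is an intact column of the original alignment.
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n, seed=0):
    """Generate n pseudo-replicate alignments from one seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


if __name__ == "__main__":
    for name, seq in bootstrap_replicates(ALIGNMENT, 1)[0].items():
        print(f"{name:6s} {seq}")
```

A tree would then be built from each replicate with the same method as for the original data.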
68
Given the following tree, estimate the confidence
of the two internal branches
Chimpanzee
Human
Gibbon
Gorilla
Orang-utan
69
Estimating Confidence from the Resamplings
1. Of the 100 trees:
[Figure: three alternative topologies for Human, Chimpanzee, Gorilla, Gibbon and Orang-utan, found in 41/100, 28/100 and 31/100 of the resampled trees.]
2. Upon the original tree we superimpose
bootstrap values
[Figure: the original tree with the bootstrap values 41 and 100 on its two internal branches.]
In 41 of the 100 trees, chimp and gorilla are
split from the rest.
In 100 of the 100 trees, gibbon and orang-utan
are split from the rest.
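Counting support amounts to asking, for each group of interest, in how many replicate trees that group appears. In the sketch below the counts 41, 28 and 31 come from the slide, but the three topologies are hypothetical stand-ins, since the figure is not in the transcript. Trees are nested tuples.

```python
def clades(tree):
    """Return (leaves under this node, set of all groups found in the tree)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaf_set, found = frozenset(), set()
    for child in tree:
        child_leaves, child_groups = clades(child)
        leaf_set |= child_leaves
        found |= child_groups
    found.add(leaf_set)
    return leaf_set, found


def support(group, trees):
    """Fraction of trees in which `group` appears as a group."""
    group = frozenset(group)
    return sum(group in clades(t)[1] for t in trees) / len(trees)


OUT = ("Gibbon", "Orang-utan")
# Hypothetical topologies; only the counts are taken from the slide.
REPLICATES = (
    [((("Chimpanzee", "Gorilla"), "Human"), OUT)] * 41
    + [((("Chimpanzee", "Human"), "Gorilla"), OUT)] * 28
    + [((("Human", "Gorilla"), "Chimpanzee"), OUT)] * 31
)

if __name__ == "__main__":
    print(support({"Chimpanzee", "Gorilla"}, REPLICATES))
    print(support({"Gibbon", "Orang-utan"}, REPLICATES))
```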
74
THE TREE OF LIFE Relationships between 16S
ribosomal RNAs
eukaryotes
bacteria
archaea
Distant relationships
Close relationships
75
The three domains of Life as identified by
phylogenetic analysis of the highly conserved
16S ribosomal RNA
(Woese and Fox 1977)
76
Where is the root of the tree of life?
(by definition there is no outgroup)
77
An ancient gene duplication can root a tree
Speciation of 3 and 1-2
Gene duplication
Speciation of 1 and 2
Outgroups for A2
Outgroups for A1
Root of 1,2,3
Graur & Li, Fundamentals of Molecular Evolution
(1999)
78
The root of the tree of life as inferred from
EF-Tu and EF-G
Both trees show Archaea and Eucarya as sister taxa
Graur & Li, Fundamentals of Molecular Evolution
(1999)
79
Horizontal Gene Transfer
archaea
eubacteria
Mn-dependent transcriptional regulator
(Tatusov, 1996)
80
What is the origin of the mitochondria?
http://www.mitomap.org/
81
The endosymbiotic theory
  • The evidence
  • Both mitochondria and chloroplasts can arise only
    from preexisting mitochondria and chloroplasts.
    They cannot be formed in a cell that lacks them
    because nuclear genes encode only some of the
    proteins of which they are made.
  • Both mitochondria and chloroplasts have their own
    genome.
  • Both genomes consist of a single circular
    molecule of DNA.
  • There are no histones associated with the DNA.

82
The Mitochondria sit with the proteobacteria in
the tree of life
mitochondrial (MT)
Small-subunit (SSU) ribosomal RNA tree
Gray MW. Nature. 1998 Nov 12;396(6707):109-10.
83
mitochondrion
chloroplast
Lack mitochondria (?)
84
The genome sequence of Rickettsia prowazekii and
the origin of mitochondria.
Andersson SG. Nature. 1998 Nov 12;396(6707):133-40.
85
Mitochondrial ribosomal proteins are most similar
to those of R. prowazekii
Andersson SG. Nature. 1998 Nov 12;396(6707):133-40.
86
Mitochondrial proteins involved in ATP
synthesis are most similar to those of R.
prowazekii
Andersson SG. Nature. 1998 Nov 12;396(6707):133-40.
87
Mitochondria derive from α-purple bacteria.
Chloroplasts derive from cyanobacteria.
Graur & Li, Fundamentals of Molecular Evolution
(1999)
88
The tree of life with mitochondria and
chloroplast endosymbiotic events
(Doolittle, 1999)
89
Horizontal transfer is a dominant feature of the
tree of life
(Doolittle, 1999)