Title: Dan Graur
1Molecular Phylogenetics
2Objectives of molecular phylogenetics
- Reconstruct the correct evolutionary
relationships among biological entities
- Estimate the time of divergence between
biological entities
- Chronicle the sequence of events along
evolutionary lineages
3Evolutionary relationships are illustrated by
means of a phylogenetic tree or a dendrogram.
4Ernst Heinrich Haeckel 1834-1919
5July 2007
July 1837
6November 1859
7The routes of inheritance represent the passage
of genes from parents to offspring, and the
branching pattern depicts a gene tree.
8Different genes, however, may have different
evolutionary histories, i.e., different routes of
inheritance.
9The routes of inheritance are confined by
reproductive barriers, i.e., gene flow occurs
only within a species. A species tree is a
representation of splitting of species lineages.
10Terminology
11A phylogenetic tree or dendrogram is a graph
composed of nodes and branches, in which only one
branch connects any two adjacent nodes.
12Internal
External or Peripheral Branch
14Assumptions
Bifurcation: real speciation event
Multifurcation: lack of resolution
15Binary tree
16Rooted and unrooted trees
17How many unrooted topologies are here?
[Figure: four tree diagrams, numbered 1 to 4, with tips labeled a to e]
18In an unrooted tree with four external nodes, the
internal branch is referred to as the central
branch.
19Cladograms and phylograms (collectively, dendrograms)
20Unscaled phylogram
Scaled phylogram
23The Newick format
In computer programs, trees are represented in a
linear form by a string of nested parentheses,
enclosing taxon names (and possibly also branch
lengths and bootstrap values), and separated by
commas. This type of representation is called the
Newick format. The originator of this format in
mathematics was Arthur Cayley.
24The Newick format
The Newick format for phylogenetic trees was
adopted on June 26, 1986, at an informal meeting
at Newick's Lobster House in Dover, New Hampshire.
The Newick format currently serves as the de facto
standard for representing phylogenetic trees and
is employed by almost all phylogenetic software
tools. Unfortunately, it has never been described
in a formal publication; the first time it is
mentioned in a publication is in 1992.
25The Newick format
In the Newick format, the pattern of the
parentheses indicates the topology of the tree by
having each pair of parentheses enclose all
members of a monophyletic group. A phylogenetic
tree in the Newick format always ends in a
semicolon (;).
26The Newick format
One can use the Newick format to write down rooted
trees, unrooted trees, multifurcations, branch
lengths, and bootstrap values.
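The idea that each pair of parentheses encloses one group of taxa can be illustrated with a small reader. This is a minimal sketch, not a full Newick parser: the function name `clades` and the example strings are hypothetical, quoted labels and comments are not handled, and any label that follows a closing parenthesis (for example a bootstrap value) is simply skipped.

```python
def clades(newick):
    """Return the set of leaf names enclosed by each pair of parentheses,
    in the order in which the parentheses close."""
    text = newick.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end in a semicolon")
    stack = []           # one set of leaf names per currently open parenthesis
    found = []           # leaf sets of the parentheses closed so far
    token = ""           # characters of the label currently being read
    after_close = False  # True while reading a label that follows ')'
    for ch in text:
        if ch in "(),;":
            leaf = token.split(":")[0].strip()  # drop any branch length
            if leaf and not after_close:
                for group in stack:  # a leaf belongs to every enclosing group
                    group.add(leaf)
            token = ""
            after_close = False
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                found.append(frozenset(stack.pop()))
                after_close = True
        else:
            token += ch
    return found


# Rooted tree, topology only (hypothetical example).
print(clades("((human,chimp),gorilla);"))
# Same topology with branch lengths and a bootstrap value on the inner group.
print(clades("((human:0.1,chimp:0.2)95:0.05,gorilla:0.3);"))
# A multifurcation: one pair of parentheses enclosing three taxa.
print(clades("(human,chimp,gorilla);"))
```

An unrooted tree is conventionally written with a multifurcation at the outermost pair of parentheses, so the same reader applies to it unchanged.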
273 OTUs
1 unrooted tree 3 rooted trees
284 OTUs
3 unrooted trees 15 rooted trees
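The counts for 3 and 4 OTUs can be checked by brute-force enumeration. The sketch below builds every rooted bifurcating topology by attaching one OTU at a time to every branch (including the root branch) of each smaller tree. Trees are represented as nested tuples, and the function names `add_leaf` and `rooted_topologies` are hypothetical. The unrooted count uses the fact that rooting an unrooted tree on the branch leading to one chosen OTU gives a rooted tree of the remaining OTUs.

```python
def add_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`."""
    # Attach on the branch that leads to this (sub)tree.
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        # Attach somewhere inside the left or the right subtree.
        for new_left in add_leaf(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf(right, leaf):
            yield (left, new_right)


def rooted_topologies(otus):
    """Return all rooted bifurcating topologies for a list of two or more OTUs."""
    if len(otus) < 2:
        raise ValueError("need at least two OTUs")
    if len(otus) == 2:
        return [(otus[0], otus[1])]
    result = []
    for smaller in rooted_topologies(otus[:-1]):
        result.extend(add_leaf(smaller, otus[-1]))
    return result


for names in (["a", "b", "c"], ["a", "b", "c", "d"]):
    rooted = len(rooted_topologies(names))
    # Unrooted trees of n OTUs correspond to rooted trees of n - 1 OTUs.
    unrooted = 1 if len(names) == 3 else len(rooted_topologies(names[:-1]))
    print(len(names), "OTUs:", unrooted, "unrooted,", rooted, "rooted")
```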
29The number of possible bifurcating rooted trees
($N_R$) for $n \geq 2$ OTUs:
$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
The number of possible bifurcating unrooted trees
($N_U$) for $n \geq 3$ OTUs:
$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
30
--------------------------------------------------
Number of OTUs   Number of possible rooted trees
--------------------------------------------------
 2               1
 3               3
 4               15
 5               105
 6               945
 7               10,395
 8               135,135
 9               2,027,025
10               34,459,425
15               213,458,046,676,875
20               8,200,794,532,637,891,559,375
--------------------------------------------------
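The table can be reproduced with a few lines of code. This is a minimal sketch that assumes the standard closed-form counts for bifurcating trees, (2n - 3)! / [2^(n-2) (n - 2)!] for rooted trees and (2n - 5)! / [2^(n-3) (n - 3)!] for unrooted trees. The function names are hypothetical.

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 2 OTUs."""
    if n < 2:
        raise ValueError("need at least two OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 OTUs."""
    if n < 3:
        raise ValueError("need at least three OTUs")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


# Print the same rows as the table, with thousands separators.
for n in (2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20):
    print(f"{n:>2}  {rooted_trees(n):,}")
```

Python integers have arbitrary precision, so the 22-digit count for 20 OTUs is computed exactly.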
31Evolution is an historical process. Only one
historical narrative is true. From
8,200,794,532,637,891,559,375 possibilities, 1
possibility is true and
8,200,794,532,637,891,559,374 are false.
Truth is one, falsehoods are many.
32How do we know which of the
8,200,794,532,637,891,559,375 trees is true?
33We don't; we infer by using decision criteria.
34True and inferred trees
The sequence of speciation events that has led to
the formation of a group of OTUs is historically
unique. A tree representing the true evolutionary
history is called the true tree. A tree that is
obtained by using a certain set of data and a
certain method of tree reconstruction is called an
inferred tree. An inferred tree may or may NOT
be the true tree.
35Cladogenesis: the splitting of an evolutionary
lineage into two genetically
independent lineages.
36Anagenesis: changes occurring along an
evolutionary lineage.
37In molecular phylogenetics, we assume that
species are only created by cladogenesis.
38A gene tree may differ from a species tree
39Gene trees and species trees
[Figure: a species tree and a gene tree; labels: A, a, B, b, D, c]
It is often assumed that gene trees always equal
species trees. This may not be true.
40Orthologs and paralogs
[Figure: orthologous and paralogous relationships among gene copies a, b, c in species A, B, C]
Ancestral gene
Duplication yields 2 copies (paralogs) on the
same genome
A mixture of orthologs and paralogs is sampled
42Taxon (singular) Taxa (plural)
A taxon is a species or a group of species that
has been given a name, e.g., Homo sapiens (modern
humans), or Lepidoptera (butterflies), or herbs.
There are codes of biological nomenclature
which seek to ensure that every taxon has a
single and stable name, and that every name is
used for only one taxon.
43Clades
- Strictly: A clade is a group of all the taxa that
have been derived from a common ancestor plus the
common ancestor itself.
- In molecular phylogenetics: A clade is a group of
taxa under study that share a common ancestor,
which is not shared by any other species outside
the group.
Also called monophyletic groups or natural clades.
44Paraphyletic Taxa
- A taxon whose common ancestor is shared by any
other taxon is called a paraphyletic taxon or an
invalid taxon.
Reptiles are paraphyletic.
45- A named taxon that lacks phylogenetic validity,
but is nonetheless used, is called a convenience
taxon.
Fish (Pisces)
a convenience fish
46Sister Taxa
- If a clade is composed of two taxa, these are
referred to as sister taxa.
Birds and crocodiles are sister taxa.
48Which of the following groups are not
monophyletic?
[Figure: tree with tips E. coli, mouse, baboon, rat, human, chimp]
a. human, chimpanzee, baboon
b. mouse, chimpanzee, baboon
c. rat, mouse
d. human, chimpanzee, baboon, rat, mouse
e. E. coli, human, chimpanzee, baboon, rat, mouse
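A question of this kind can be checked mechanically once a tree is fixed. The tree drawing is not preserved in this transcript, so the sketch below assumes a conventional topology (E. coli as the outgroup, mouse with rat, and baboon as sister to human plus chimpanzee). That topology, the nested-tuple representation, and the function names are assumptions made for illustration only.

```python
# ASSUMED topology; the tree shown on the slide is not available in the transcript.
TREE = ("E. coli", (("mouse", "rat"), ("baboon", ("human", "chimpanzee"))))


def leaf_sets(tree, out):
    """Collect into `out` the leaf set below every internal node of `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.add(leaves)
    return leaves


def is_monophyletic(group, tree=TREE):
    """A group is monophyletic if it is exactly the leaf set of some node."""
    found = set()
    leaf_sets(tree, found)
    return frozenset(group) in found


OPTIONS = {
    "a": ["human", "chimpanzee", "baboon"],
    "b": ["mouse", "chimpanzee", "baboon"],
    "c": ["rat", "mouse"],
    "d": ["human", "chimpanzee", "baboon", "rat", "mouse"],
    "e": ["E. coli", "human", "chimpanzee", "baboon", "rat", "mouse"],
}

for label, group in OPTIONS.items():
    print(label, "monophyletic" if is_monophyletic(group) else "NOT monophyletic")
```

Under the assumed topology, only the group that mixes mouse with the two primates fails the test, because the smallest clade containing those three taxa also contains rat and human.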
51A character provides information about an
individual OTU. A distance represents a
quantitative statement concerning the
dissimilarity between two OTUs.
52A character is a well-defined feature that in a
taxonomic unit can assume one out of two or more
mutually exclusive character states.
Mutually exclusive: if David is tall, David
cannot be short.
55Character
Continuous
Discrete
Binary
Multistate
Unordered
Ordered
Unpolar
Polar
56A character is unordered if a change from one
character state to any other character state can
occur in one step.
57A character is ordered if there exists a unique
symmetrical path of change from one character
state to another.
58A character is polar if there exists a unique
asymmetrical (irreversible) path of change from
one character state to another.
Polar
59In partially ordered characters the number of
steps varies for the different pairwise
combinations of character states, but no definite
relationship exists between the number of steps
and the character state. Amino-acid sites are
partially ordered characters. An amino acid
cannot change into all other amino acids in a
single step, as sometimes 2 or 3 steps are
required. For example, a tyrosine may only
change into a leucine through an intermediate
state, i.e., phenylalanine or histidine.
60The number of steps in partially ordered
characters is specified by a step matrix, the
elements of which indicate the number of steps
required between any two character states
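A step matrix for amino acids can be derived from the genetic code. The sketch below assumes that a "step" means one nucleotide substitution and that the standard genetic code applies. For each pair of amino acids it takes the smallest number of differing positions between any of their codons. The names `CODONS`, `min_steps`, and `step_matrix` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of its codons.
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append("".join(codon))


def min_steps(aa1, aa2):
    """Minimum number of nucleotide differences between any two codons."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )


def step_matrix(amino_acids):
    """Step matrix as a dictionary keyed by (from, to) amino-acid pairs."""
    return {(a, b): min_steps(a, b) for a in amino_acids for b in amino_acids}


# Tyrosine (Y), phenylalanine (F), histidine (H), and leucine (L).
states = "YFHL"
matrix = step_matrix(states)
print("  " + " ".join(states))
for a in states:
    print(a, " ".join(str(matrix[a, b]) for b in states))
```

In this sketch tyrosine is one step from phenylalanine and from histidine, each of which is one step from leucine, while tyrosine to leucine takes two steps.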
62Assumptions about character evolution
Methods of phylogenetic reconstruction require
that we make explicit assumptions about:
(1) the number of discrete steps required for one
character state to change into another;
(2) the probability with which such a change may
occur.
63Temporal Polarity of Character States
Character states may be ranked by relative
antiquity into:
(1) primitive or ancestral (plesiomorphy)
(2) derived or novel (apomorphy)
64Taxonomic Distribution of Character States
A primitive state that is shared by several taxa
is a symplesiomorphy.
A derived state that is shared by several taxa is
a synapomorphy.
A derived character state unique to a particular
taxon is an autapomorphy.
A character state that is shared by several taxa
due to convergence, parallelism and reversals,
rather than due to common descent, is a homoplasy.
sympathy synapse syllable system
65[Figure: tree with character states A to D mapped onto it, illustrating plesiomorphy, symplesiomorphy, apomorphy (autapomorphy), synapomorphy, and homoplasy]
67Distance Data
69Most molecular data yield character states that
are subsequently converted into distances.
70Some molecular data can only be expressed as
distances.