Phylogenetic Trees Lecture 12 - PowerPoint PPT Presentation

Slides: 40

Provided by: NirF57

Transcript and Presenter's Notes

1
Phylogenetic Trees, Lecture 12
Based on pages 160-176 in Durbin et al. (the black textbook).
This class has been edited from Nir Friedman's lecture, which was available at www.cs.huji.ac.il/nir. Pictures from Tal Pupko's slides. Changes by Dan Geiger and Shlomo Moran.
2
Evolution

Evolution of new organisms is driven by:
Diversity: different individuals carry different variants of the same basic blueprint.
Mutations: the DNA sequence can be changed by single-base changes, deletions/insertions of DNA segments, etc.
Selection bias.

3
The Tree of Life
Source: Alberts et al.
4
Tree of life: a better picture
D'après Ernst Haeckel, 1891
5
Primate evolution
A phylogeny, also called a phylogenetic tree, is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species.
6
Morphological vs. Molecular

Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.
Modern biological methods allow the use of molecular features:
Gene sequences
Protein sequences
Analysis is based on homologous sequences (e.g., globins) in different species.
Important for many aspects of biology:
Classification
Understanding biological mechanisms

7
Morphological topology
(Based on McKenna and Bell, 1997)
(Figure: morphological tree; labeled groups include Archonta and Ungulata.)
8
From sequences to a phylogenetic tree
Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG
There are many possible types of sequences to use (e.g., mitochondrial vs. nuclear proteins).
9
Mitochondrial topology
(Based on Pupko et al.)
10
Nuclear topology
(Based on a slide by Pupko et al.)
(Tree by Madsen)
11
Theory of Evolution

Basic idea:
Speciation events lead to the creation of different species.
Speciation is caused by physical separation into groups in which different genetic variants become dominant.
Any two species share a (possibly distant) common ancestor.

12
Phylogenetic trees

Leaves: current-day species
Internal nodes: hypothetical most recent common ancestors
Edge lengths: time from one speciation to the next
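The three definitions above translate directly into a data structure. The sketch below is a minimal illustration only. It assumes a tree stored as a list of hypothetical `(u, v, length)` triples, where leaves are the degree-1 vertices and an edge length stands for elapsed time. It sums the lengths along the unique path between each pair of leaves.

```python
from collections import defaultdict

def leaf_distances(edges):
    """Pairwise path lengths between the leaves of a weighted tree.

    edges: (u, v, length) triples of an undirected tree; leaves are degree-1 vertices.
    Returns the sorted leaf list and a dict mapping (leaf, leaf) -> path length.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    leaf_set = set(leaves)
    dist = {}
    for src in leaves:
        # Walk outward from src, summing edge lengths along the unique tree path.
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, acc = stack.pop()
            if node in leaf_set and node != src:
                dist[(src, node)] = acc
            for nxt, w in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, acc + w))
    return leaves, dist

# Made-up rooted tree: root r -> internal node x (length 1) and leaf C (length 4);
# x -> leaves A (length 2) and B (length 3).
leaves, dist = leaf_distances([("r", "x", 1.0), ("x", "A", 2.0),
                               ("x", "B", 3.0), ("r", "C", 4.0)])
print(leaves, dist[("A", "B")], dist[("A", "C")], dist[("B", "C")])
```

For this made-up tree the leaf-to-leaf distances are 5, 7 and 8. Distance-based methods try to recover trees from path lengths of this kind.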

13
Dangers in Molecular Phylogenies

Gene and protein sequences can be homologous for various reasons:
Orthologs: sequences that diverged after a speciation event; indicative of a new species.
Paralogs: sequences that diverged after a duplication event.
Xenologs: sequences that diverged after a horizontal transfer (e.g., by a virus).

14
Gene Phylogenies
Phylogenies can be constructed to describe the evolution of genes.
Three species, termed 1, 2, 3. Two paralogous genes, A and B.
15
Dangers of Paralogs

If we happen to consider only the genes 1A, 2B, and 3A, we get a wrong tree. It does not represent the phylogeny of the host species of the given sequences, because duplication does not create new species.

(Figure: gene tree with a gene duplication followed by speciation events; leaves 2B, 1B, 3A, 3B, 2A, 1A.)
In the sequel we assume all given sequences are orthologs.
16
Types of Trees

A natural model to consider is that of rooted trees.

(Figure: a rooted tree whose root is labeled "Common Ancestor".)
17
Types of trees

An unrooted tree represents the phylogeny without the root node.

Depending on the model, data from current-day species does not distinguish between different placements of the root. In this example there are seven possible ways to place a root.
18
Rooted versus unrooted trees
(Figure: an unrooted tree on leaves a, b, c; it represents the three rooted trees on a, b, c.)
Slide by Tal Pupko
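The counts on the last two slides follow from a standard fact. An unrooted binary tree with n leaves has 2n - 3 edges, and the root can be placed on any of them. The sketch below checks this against the standard topology counts: (2n - 5)!! unrooted and (2n - 3)!! rooted. Its function names are hypothetical. If the example tree two slides back is binary with five leaves, n = 5 gives the seven root placements. n = 3 gives the three rooted trees represented by the unrooted tree on a, b, c.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (or 2); equals 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def root_positions(n):
    """Edges of an unrooted binary tree with n >= 2 leaves; each edge can hold the root."""
    return 2 * n - 3

def count_unrooted(n):
    """Distinct unrooted binary topologies on n >= 3 labeled leaves: (2n - 5)!!."""
    return double_factorial(2 * n - 5)

def count_rooted(n):
    """Distinct rooted binary topologies on n >= 2 labeled leaves: (2n - 3)!!."""
    return double_factorial(2 * n - 3)

for n in range(3, 8):
    # Every unrooted topology yields root_positions(n) distinct rooted topologies.
    assert count_rooted(n) == count_unrooted(n) * root_positions(n)
    print(n, root_positions(n), count_unrooted(n), count_rooted(n))
```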
19
Positioning Roots in Unrooted Trees

We can estimate the position of the root by introducing an outgroup: a set of species that are definitely distant from all the species of interest.

(Figure: proposed root on a tree of Falcon, Aardvark, Bison, Chimp, Dog, and Elephant.)
20
Type of Data

Distance-based
Input is a matrix of distances between species.
The distance can be the fraction of residues on which they disagree, the alignment score between them, etc.
Character-based
Examine each character (e.g., residue) separately
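The sketch below illustrates distance-based input. It computes the "fraction of residues they disagree on" (often called the p-distance) for the four aligned fragments from the "From sequences to a phylogenetic tree" slide. It assumes gap-free alignments of equal length. `p_distance` and `distance_matrix` are hypothetical helper names.

```python
from itertools import combinations

# Aligned protein fragments from the "From sequences to a phylogenetic tree" slide.
SEQS = {
    "Rat": "QEPGGLVVPPTDA",
    "Rabbit": "QEPGGMVVPPTDA",
    "Gorilla": "QEPGGLVVPPTDA",
    "Cat": "REPGGLVVPPTEG",
}

def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs):
    """Symmetric dict-of-dicts of pairwise p-distances."""
    names = list(seqs)
    d = {u: {u: 0.0} for u in names}
    for u, v in combinations(names, 2):
        d[u][v] = d[v][u] = p_distance(seqs[u], seqs[v])
    return d

D = distance_matrix(SEQS)
for u, v in combinations(SEQS, 2):
    print(f"{u:8s}{v:8s}{D[u][v]:.3f}")
```

The Rat and Gorilla fragments are identical, so their distance is 0. A fragment this short cannot separate every pair of species.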

21
Three Methods of Tree Construction

Distance: a tree that recursively combines the two nodes at the smallest distance.
Parsimony: a tree with the minimum total number of character changes between nodes.
Maximum likelihood: finding the best Bayesian network with the shape of a tree. This is the method of choice nowadays. PHYLIP, a well-known and widely used package, implements this method (among others): http://evolution.genetics.washington.edu/phylip.html
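Fitch's small-parsimony algorithm makes the parsimony criterion concrete. It is one standard way to count the minimum number of changes on a fixed rooted binary tree. The search over tree shapes is not shown. The sketch assumes trees written as nested tuples of leaf names and scores each site independently. The function names are hypothetical.

```python
def fitch(tree, states):
    """Fitch's algorithm for one character on a rooted binary tree.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    states: dict mapping leaf name -> observed state at this site.
    Returns (candidate state set for the subtree root, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    s1, c1 = fitch(left, states)
    s2, c2 = fitch(right, states)
    common = s1 & s2
    if common:
        return common, c1 + c2
    # Empty intersection: keep the union and charge one change.
    return s1 | s2, c1 + c2 + 1

def parsimony_score(tree, seqs):
    """Sum of Fitch costs over all aligned columns."""
    length = len(next(iter(seqs.values())))
    return sum(fitch(tree, {name: s[col] for name, s in seqs.items()})[1]
               for col in range(length))

seqs = {"Rat": "QEPGGLVVPPTDA", "Rabbit": "QEPGGMVVPPTDA",
        "Gorilla": "QEPGGLVVPPTDA", "Cat": "REPGGLVVPPTEG"}
print(parsimony_score((("Rat", "Gorilla"), ("Rabbit", "Cat")), seqs))  # 4

# Made-up two-state example where the score depends on the topology.
toy = {"A": "x", "B": "y", "C": "x", "D": "y"}
print(fitch((("A", "B"), ("C", "D")), toy)[1])  # 2
print(fitch((("A", "C"), ("B", "D")), toy)[1])  # 1
```

In the four slide fragments, each variable column differs in only one sequence, so every tree scores 4. The toy example shows the situation parsimony exploits: one topology explains the data with fewer changes.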

22
Distance-Based (1st type Method)

Input: distance matrix between species.
Outline:
Cluster species together.
Initially, clusters are singletons.
At each iteration, combine the two closest clusters to get a new one.

23
UPGMA Clustering

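Let $C_i$ and $C_j$ be clusters; define the distance between them to be
$$d(C_i,C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p,q).$$
When we combine two clusters $C_i$ and $C_j$ to form a new cluster $C_k$, then for every other cluster $C_l$
$$d(C_k,C_l) = \frac{|C_i|\,d(C_i,C_l) + |C_j|\,d(C_j,C_l)}{|C_i| + |C_j|}.$$
Define a node $k$ whose daughter nodes are $i$ and $j$, and place $k$ at height $d(C_i,C_j)/2$.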
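The UPGMA rules above can be sketched as follows. This is a minimal illustration, not a reference implementation. It makes these assumptions:

* distances form a dict keyed by ordered pairs of labels;
* ties are broken arbitrarily;
* trees are returned as nested tuples together with the root height $d(C_i,C_j)/2$.

`upgma` is a hypothetical name.

```python
from itertools import combinations

def upgma(names, d):
    """Sketch of UPGMA clustering.

    names: leaf labels; d: dict with d[(a, b)] for every pair with a listed before b.
    Returns (tree, height): a nested-tuple tree and the height of its root node.
    """
    # Active clusters keyed by integer id: id -> (subtree, size, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {(i, j): d[(names[i], names[j])]
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Closest pair of clusters; keys always store the smaller id first.
        (i, j), dij = min(dist.items(), key=lambda item: item[1])
        ti, ni, _ = clusters.pop(i)
        tj, nj, _ = clusters.pop(j)
        k = next_id
        next_id += 1
        # Size-weighted update: d(Ck, Cl) = (|Ci| d(Ci, Cl) + |Cj| d(Cj, Cl)) / (|Ci| + |Cj|).
        for l in clusters:
            dil = dist[(min(i, l), max(i, l))]
            djl = dist[(min(j, l), max(j, l))]
            dist[(l, k)] = (ni * dil + nj * djl) / (ni + nj)
        # Drop every entry that mentions a merged cluster.
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        # The new node sits at height d(Ci, Cj) / 2.
        clusters[k] = ((ti, tj), ni + nj, dij / 2)
    tree, _, height = next(iter(clusters.values()))
    return tree, height

# Made-up distances from a clock tree ((A,B),(C,D)).
names = ["A", "B", "C", "D"]
close = {("A", "B"): 2.0, ("C", "D"): 4.0}
d = {(a, b): close.get((a, b), close.get((b, a), 6.0))
     for a in names for b in names if a != b}
tree, height = upgma(names, d)
print(tree, height)  # (('A', 'B'), ('C', 'D')) 3.0
```

The input here comes from a made-up tree that satisfies a molecular clock. The sketch recovers that tree, with internal nodes at heights 1, 2 and 3.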

24
Example
UPGMA construction on five objects. The length of an edge is its (vertical) height.
(Figure: UPGMA tree on leaves 1 to 5 with internal nodes 6 to 9; marked heights include 0.5 d(2,3) and 0.5 d(7,8).)
25
Molecular clock
This phylogenetic tree has all leaves at the same level. When this property holds, the phylogenetic tree is said to satisfy a molecular clock: the time from a speciation event to the formation of the current species is identical for all paths. In reality this assumption is usually wrong.
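The three-point (ultrametric) condition gives a quick test of whether distances could come from a tree with a molecular clock: for every three leaves, the two largest pairwise distances are equal. Clock-tree distances always satisfy it, because the distance between two leaves is twice the height of their most recent common ancestor. The sketch below checks the condition. The helper names and both example matrices are made up for illustration.

```python
from itertools import combinations

def symmetric(pairs):
    """Expand {(a, b): x} into a dict that also contains (b, a): x."""
    d = dict(pairs)
    d.update({(b, a): x for (a, b), x in pairs.items()})
    return d

def is_ultrametric(names, d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for a, b, c in combinations(names, 3):
        _, mid, top = sorted((d[(a, b)], d[(a, c)], d[(b, c)]))
        if top - mid > tol:
            return False
    return True

# Clock tree ((A,B),(C,D)) with node heights 1 for (A,B), 2 for (C,D), 3 for the root.
clock = symmetric({("A", "B"): 2, ("C", "D"): 4, ("A", "C"): 6,
                   ("A", "D"): 6, ("B", "C"): 6, ("B", "D"): 6})
# Non-clock tree: root -> x (length 1) and C (length 4); x -> A (length 2), B (length 3).
no_clock = symmetric({("A", "B"): 5, ("A", "C"): 7, ("B", "C"): 8})
print(is_ultrametric(["A", "B", "C", "D"], clock))  # True
print(is_ultrametric(["A", "B", "C"], no_clock))    # False
```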
26
Molecular Clock
UPGMA constructs trees that satisfy a molecular
clock, even if the true tree does not satisfy a
molecular clock.
27
Restrictive Correctness of UPGMA
Proposition: If the distance function is derived by adding edge lengths in a tree T with a molecular clock, then UPGMA will reconstruct T.
28
Additivity

Distances derived from a tree with a molecular clock are additive, namely, they can be realized by a tree: the distance between two objects is the length of the path between them.
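Additivity has an analogous test, the four-point condition due to Buneman. A metric can be realized by a tree exactly when, for every four leaves, the two largest of the three sums d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) are equal. The sketch below checks this condition. The quartet matrix comes from a made-up tree with edge lengths 1 and 10, so it is additive even though it does not satisfy a molecular clock. The helper names are hypothetical.

```python
from itertools import combinations

def symmetric(pairs):
    """Expand {(a, b): x} into a dict that also contains (b, a): x."""
    d = dict(pairs)
    d.update({(b, a): x for (a, b), x in pairs.items()})
    return d

def is_additive(names, d, tol=1e-9):
    """Four-point condition: the two largest of the three pair sums are equal."""
    for i, j, k, l in combinations(names, 4):
        sums = sorted((d[(i, j)] + d[(k, l)],
                       d[(i, k)] + d[(j, l)],
                       d[(i, l)] + d[(j, k)]))
        if sums[2] - sums[1] > tol:
            return False
    return True

# Made-up tree: a, b attach to node u (lengths 1, 10); c, d attach to v (1, 10); u-v = 1.
quartet = symmetric({("a", "b"): 11, ("a", "c"): 3, ("a", "d"): 12,
                     ("b", "c"): 12, ("b", "d"): 21, ("c", "d"): 11})
# Made-up matrix that no tree realizes.
not_tree = symmetric({("a", "b"): 1, ("c", "d"): 1, ("a", "c"): 2,
                      ("b", "d"): 2, ("a", "d"): 3, ("b", "c"): 3})
print(is_additive(list("abcd"), quartet))   # True
print(is_additive(list("abcd"), not_tree))  # False
```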

29
Basic property of Additivity

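Suppose the input distances are additive. For any three leaves $i$, $j$, $m$, let $k$ be the node where the paths between them meet, and let $a = d(i,k)$, $b = d(j,k)$, $c = d(k,m)$. Then
$$d(i,j) = a + b, \qquad d(i,m) = a + c, \qquad d(j,m) = b + c.$$
Thus
$$d(k,m) = c = \tfrac12\big(d(i,m) + d(j,m) - d(i,j)\big).$$
(Figure: leaves i, j, m joined at node k by paths of lengths a, b, c.)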
30
Constructing additive trees: the neighbor finding problem

Can we use this fact to construct trees assuming only additivity (but not a molecular clock)?

Yes. The formula $d(k,m) = \tfrac12\big(d(i,m) + d(j,m) - d(i,j)\big)$ shows that if we knew that i and j are neighboring leaves, then we could construct their parent node k and compute the distances from k to all other leaves m. We then remove nodes i, j and add k.
31
Neighbor Finding

How can we find, from distances alone, that a pair of nodes i, j are neighboring leaves?
The closest nodes aren't necessarily neighbors.

Next we show one way to find neighbors from additive distances.
32
Neighbor Finding
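Theorem (Saitou and Nei): Let $L$ be the number of leaves, let $r_i = \sum_{m} d(i,m)$ (summing over all leaves $m$), and define
$$D(i,j) = (L-2)\,d(i,j) - (r_i + r_j).$$
Assume all edge weights are positive. If $D(i,j)$ is minimal (among all pairs of leaves), then $i$ and $j$ are neighboring leaves in the tree.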
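The sketch below shows why the theorem uses D rather than raw distances. It applies both rules to a made-up quartet: leaves a and b hang off one internal node, c and d hang off another, and the edges to b and d are long. The closest pair (a, c) are not neighbors, while the pair minimizing D are. `nj_criterion` is a hypothetical helper.

```python
from itertools import combinations

def symmetric(pairs):
    """Expand {(a, b): x} into a dict that also contains (b, a): x."""
    d = dict(pairs)
    d.update({(b, a): x for (a, b), x in pairs.items()})
    return d

def nj_criterion(names, d):
    """D(i, j) = (L - 2) d(i, j) - (r_i + r_j), with r_i = sum of d(i, m) over leaves m."""
    L = len(names)
    r = {i: sum(d[(i, m)] for m in names if m != i) for i in names}
    return {(i, j): (L - 2) * d[(i, j)] - (r[i] + r[j])
            for i, j in combinations(names, 2)}

# Made-up tree: a-u = 1, b-u = 10, u-v = 1, c-v = 1, d-v = 10.
names = ["a", "b", "c", "d"]
d = symmetric({("a", "b"): 11, ("a", "c"): 3, ("a", "d"): 12,
               ("b", "c"): 12, ("b", "d"): 21, ("c", "d"): 11})
closest = min(combinations(names, 2), key=lambda pair: d[pair])
D = nj_criterion(names, d)
best = min(D, key=D.get)
print("closest pair:", closest)  # closest pair: ('a', 'c')  (not neighbors)
print("min-D pair:", best)       # min-D pair: ('a', 'b')    (neighbors)
```

Here (a, b) and (c, d) tie at D = -48; both are true neighbor pairs.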
33
Neighbor Joining Algorithm

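Set $L$ to contain all leaves.
Iteration:
Choose $i,j$ such that $D(i,j)$ is minimal (with the $r$ values computed over the current $L$).
Create a new node $k$, and set $d(k,m) = \tfrac12\big(d(i,m) + d(j,m) - d(i,j)\big)$ for all $m \in L \setminus \{i,j\}$.
Remove $i,j$ from $L$, and add $k$.
Terminate when $|L| = 2$; connect the two remaining nodes.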
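The loop above can be sketched compactly, assuming a symmetric distance dict keyed by ordered pairs. The sketch adds one step the slide does not spell out: branch lengths from the new node. It uses the standard neighbor-joining estimate $d(i,k) = \tfrac12\big(d(i,j) + (r_i - r_j)/(L-2)\big)$, written for the unnormalized $r_i$ used here. The function name and the internal node names k0, k1, ... are hypothetical. Ties in D are broken by enumeration order.

```python
from itertools import combinations

def symmetric(pairs):
    """Expand {(a, b): x} into a dict that also contains (b, a): x."""
    d = dict(pairs)
    d.update({(b, a): x for (a, b), x in pairs.items()})
    return d

def neighbor_joining(names, d):
    """Sketch of neighbor joining with D(i, j) = (L - 2) d(i, j) - r_i - r_j.

    Returns a list of (u, v, length) edges of the reconstructed unrooted tree.
    """
    active = list(names)
    d = dict(d)  # copy, since rows for new nodes are added below
    edges = []
    counter = 0
    while len(active) > 2:
        L = len(active)
        r = {i: sum(d[(i, m)] for m in active if m != i) for i in active}
        i, j = min(combinations(active, 2),
                   key=lambda p: (L - 2) * d[p] - r[p[0]] - r[p[1]])
        k = f"k{counter}"
        counter += 1
        # Branch lengths from the new node k to the joined pair.
        dik = 0.5 * (d[(i, j)] + (r[i] - r[j]) / (L - 2))
        edges += [(i, k, dik), (j, k, d[(i, j)] - dik)]
        # d(k, m) = (d(i, m) + d(j, m) - d(i, j)) / 2 for the remaining nodes.
        for m in active:
            if m not in (i, j):
                d[(k, m)] = d[(m, k)] = 0.5 * (d[(i, m)] + d[(j, m)] - d[(i, j)])
        active = [m for m in active if m not in (i, j)] + [k]
    a, b = active
    edges.append((a, b, d[(a, b)]))
    return edges

# Made-up additive quartet: a-u = 1, b-u = 10, u-v = 1, c-v = 1, d-v = 10.
names = ["a", "b", "c", "d"]
dist = symmetric({("a", "b"): 11, ("a", "c"): 3, ("a", "d"): 12,
                  ("b", "c"): 12, ("b", "d"): 21, ("c", "d"): 11})
edges = neighbor_joining(names, dist)
for edge in edges:
    print(edge)
```

On this additive input the sketch recovers the made-up edge lengths exactly. a and b attach to k0 with lengths 1 and 10, c and d attach to k1 with lengths 1 and 10, and k0-k1 has length 1.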

34
Neighbor Finding
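Notation used in the proof: $p(i,j)$ is the path from vertex $i$ to vertex $j$; for example, $P(D,C) = (e_1,e_2,e_3) = (D,E,F,C)$.
For a vertex $i$ and an edge $e$: $N_i(e) = |\{k : k \text{ is a leaf and } e \text{ is on } p(i,k)\}|$. In the example, $N_D(e_1) = 3$, $N_D(e_2) = 2$, $N_D(e_3) = 1$, $N_C(e_1) = 1$.
(Figure: example tree containing the path D, E, F, C.)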
35
Neighbor Finding
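Notation: for $e = (i,m)$, we denote $d(i,m)$ by $d(e)$. Since $r_i$ sums the lengths of the paths $p(i,k)$ over all leaves $k$, each edge $e$ is counted $N_i(e)$ times:
$$r_i = \sum_{e} N_i(e)\, d(e).$$
(Figure: the path i, l, ..., k, j and the rest of T.)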
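The proof compares sums of the form $r_i = \sum_e N_i(e)\,d(e)$, which holds because each edge e lies on exactly $N_i(e)$ of the paths $p(i,k)$. The sketch below checks this identity numerically. It hangs the tree from leaf i, counts the leaves below each edge to get $N_i(e)$, and compares the result with the direct sum of path lengths. The tree and all function names are made up for illustration.

```python
from collections import defaultdict

def build_adj(edges):
    """Adjacency lists of an undirected weighted tree given as (u, v, length) triples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj

def n_counts(adj, leaves, i):
    """N_i(e) for every edge e: number of leaves k whose path p(i, k) contains e."""
    counts = {}

    def visit(node, parent):
        # Leaves strictly below node when the tree is hung from leaf i.
        below = 1 if (node in leaves and node != i) else 0
        for nxt, _ in adj[node]:
            if nxt != parent:
                c = visit(nxt, node)
                counts[frozenset((node, nxt))] = c
                below += c
        return below

    visit(i, None)
    return counts

def distances_from(adj, i):
    """Path length from i to every vertex (paths in a tree are unique)."""
    dist, stack = {i: 0.0}, [i]
    while stack:
        node = stack.pop()
        for nxt, w in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + w
                stack.append(nxt)
    return dist

# Made-up quartet tree: a, b attached to u; c, d attached to v.
edges = [("a", "u", 1.0), ("b", "u", 10.0), ("u", "v", 1.0),
         ("c", "v", 1.0), ("d", "v", 10.0)]
adj = build_adj(edges)
leaves = {x for x in adj if len(adj[x]) == 1}
length = {frozenset((u, v)): w for u, v, w in edges}
results = {}
for i in sorted(leaves):
    dist = distances_from(adj, i)
    r_direct = sum(dist[k] for k in leaves if k != i)
    N = n_counts(adj, leaves, i)
    r_edges = sum(N[e] * length[e] for e in length)
    results[i] = (r_direct, r_edges)
    print(i, r_direct, r_edges)
```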
36
Neighbor Finding
Proof of Theorem: Assume by contradiction that $D(i,j)$ is minimal for $i,j$ which are not neighboring leaves. Let $(i,l,\ldots,k,j)$ be the path from $i$ to $j$. Let $T_1$ and $T_2$ be the subtrees rooted at $l$ and $k$. Let $|T|$ denote the number of leaves in a tree $T$.
37
Neighbor Finding
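Case 1: $i$ or $j$ has a neighboring leaf. WLOG $j$ and $m$ are such leaves, so $k$ is their common parent.
A. By the definition of $D$, and since $d(i,j) - d(j,m) = d(i,k) - d(k,m)$:
$$D(i,j) - D(m,j) = (L-2)\big(d(i,j) - d(j,m)\big) - (r_i + r_j) + (r_m + r_j) = (L-2)\big(d(i,k) - d(k,m)\big) + r_m - r_i.$$
B. Write $r_m - r_i = \sum_e \big(N_m(e) - N_i(e)\big)\,d(e)$. In this sum:
* the edge $(k,m)$ contributes $(L-2)\,d(k,m)$;
* the edge $(i,l)$ contributes $-(L-2)\,d(i,l)$;
* edges off the path from $i$ to $m$ contribute 0;
* for each edge $e \in p(k,l)$, $N_m(e) \ge 2$ and $N_i(e) \le L-2$, so $N_m(e) - N_i(e) \ge 4-L$.

Hence
$$r_m - r_i \ge (L-2)\big(d(k,m) - d(i,l)\big) + (4-L)\,d(k,l).$$
Substituting B in A, and using $d(i,k) - d(i,l) = d(k,l)$:
$$D(i,j) - D(m,j) \ge (L-2)\big(d(i,k) - d(i,l)\big) + (4-L)\,d(k,l) = 2\,d(k,l) > 0,$$
contradicting the minimality assumption.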
38
Neighbor Finding
Case 2: Not case 1. Then both $T_1$ and $T_2$ contain two neighboring leaves. We show that if $D(i,j)$ is minimal, then we must have both $|T_1| > |T_2|$ and $|T_2| > |T_1|$. This is a contradiction, hence $D(i,j)$ is not minimal.
We prove that $|T_1| > |T_2|$ by assuming that $|T_1| \le |T_2|$ and reaching a contradiction. The proof that $|T_2| > |T_1|$ is similar. Let $n,m$ be neighboring leaves in $T_1$.
39
Neighbor Finding
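Below, $p$ denotes the common parent of $m$ and $n$.
A.
$$0 \le D(m,n) - D(i,j) = (L-2)\big(d(m,n) - d(i,j)\big) + (r_i + r_j) - (r_m + r_n).$$
B. Because $N_j(e) - N_m(e) < |T_1| - |T_2|$,
$$r_j - r_m < (L-2)\big(d(j,k) - d(m,p)\big) + (|T_1| - |T_2|)\,d(k,p).$$
C.
$$r_i - r_n < (L-2)\big(d(i,k) - d(n,p)\big) + (|T_1| - |T_2|)\,d(l,p).$$
D. Add B and C. Note that $d(l,p) > d(k,p)$, and use the assumption $|T_1| - |T_2| \le 0$:
$$(r_i + r_j) - (r_m + r_n) < (L-2)\big(d(i,j) - d(n,m)\big) + 2(|T_1| - |T_2|)\,d(k,p).$$
Substituting D in the right-hand side of A:
$$0 \le D(m,n) - D(i,j) < 2(|T_1| - |T_2|)\,d(k,p),$$
hence $|T_1| - |T_2| > 0$, a contradiction.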
