Constructing Phylogenies from Quartets: Elucidation of Eutherian Superordinal Relationships

Computation Theory Lab, CSIE, CCU, Taiwan. 11/11/09.

1
Constructing Phylogenies from Quartets:
Elucidation of Eutherian Superordinal Relationships
A. Ben-Dor, B. Chor, D. Graur, R. Ophir, D. Pelleg
Journal of Computational Biology, Vol. 5, 1998, pp. 377-390.

Speaker: Chuang-Chieh Lin
National Chung Cheng University

2
Outline

Introduction and preliminaries
Problem description
The dynamic programming algorithm
The space complexity and the time complexity

3
Evolutionary trees

Let S be a set of taxa with $|S| = n$.
An evolutionary tree T on S is an unrooted,
leaf-labeled tree such that the leaves of T are
bijectively labeled by the taxa in S, and each
internal node of T has degree 3.

4
Evolutionary trees

For 4 taxa a, b, c, d, we have 3 possible
topologies: ad|bc, ab|cd, and ac|bd.

[Figure: the three unrooted trees ad|bc, ab|cd, ac|bd.]
5
Evolutionary trees (contd.)

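For 5 taxa a, b, c, d, e, how many possible
evolutionary trees can we derive?
The answer is $5 \times 3 = 15$.
There are 5 possible positions (edges) for e to be
inserted into each of the 3 trees on a, b, c, d.

[Figure: a tree on a, b, c, d, with the possible positions for inserting e.]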
6
Evolutionary trees (contd.)

For n taxa, how many possible evolutionary trees
can we derive?
The answer is $(2n - 5)!!$.
This observation can be verified by induction on
n.
For an odd positive integer n, the double factorial
is defined as
$n!! = n \times (n - 2) \times (n - 4) \times \cdots \times 3 \times 1$.
If $n = 15$, $(2n - 5)!!$ is approximately $8 \times 10^{12}$.
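
The count above can be checked numerically. The following is a minimal sketch; the function names `double_factorial` and `count_unrooted_trees` are illustrative and do not come from the slides.

```python
def double_factorial(k: int) -> int:
    """Return k!! = k * (k - 2) * (k - 4) * ..., stopping at 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n: int) -> int:
    """Number of unrooted binary trees on n >= 3 labeled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n - 5)


for n in (4, 5, 15):
    print(n, count_unrooted_trees(n))
# prints:
# 4 3
# 5 15
# 15 7905853580625
```

The value for n = 15 is about $7.9 \times 10^{12}$, which agrees with the rounded figure on the slide.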

7
n!!

Let us analyze n!! in another way.
For a nonnegative integer $m \ge 0$, let $n = 2m + 1$.
Then we have
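
A standard identity for odd double factorials fits this setup and may be what the slide displayed; it is given here as an assumption, not as the speaker's exact formula. It follows from $(2m+1)! = (2m+1)!! \cdot (2m)!!$ together with $(2m)!! = 2^m \, m!$:

$$
n!! = (2m+1)!! = \frac{(2m+1)!}{(2m)!!} = \frac{(2m+1)!}{2^m \, m!}
$$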

8
$(2n - 5)!! = O(n^{n-2})$

For n taxa, we have $(2n - 5)!! = O((n - 3)^{n-2}) = O(n^{n-2})$
possible evolutionary trees.

9
Quartet topologies

A set of four taxa is called a quartet.
Given an evolutionary tree T and a quartet {a, b,
c, d}, the quartet topology of {a, b, c, d}
induced by T is obtained by the following
procedure.

10
Step 1: All leaves except a, b, c and d are deleted
from the tree. Edges incident to these leaves are
also deleted.
11
Step 2: Internal nodes of degree two are
contracted and deleted, so that their two adjacent
nodes become connected. This process is repeated
until no internal nodes of degree two are left.
For simplicity, we denote the quartet topology
above by bc|ad, which is a bipartition
of {a, b, c, d}.
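
The induced topology can also be computed without physically deleting and contracting nodes. The sketch below uses an equivalent test that is not taken from the slides: in a binary tree, the induced topology is pq|rs exactly when the path from p to q shares no node with the path from r to s. The adjacency-dictionary representation and the names `path` and `induced_topology` are assumptions made for illustration.

```python
from collections import deque


def path(adj, src, dst):
    """Return the nodes on the unique src-dst path of a tree (BFS)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    nodes = []
    node = dst
    while node is not None:
        nodes.append(node)
        node = parent[node]
    return nodes


def induced_topology(adj, a, b, c, d):
    """Return the quartet topology of {a, b, c, d} induced by the tree adj."""
    pairings = (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c)))
    for (p, q), (r, s) in pairings:
        # The pairing is the induced topology iff the two paths are disjoint.
        if not set(path(adj, p, q)) & set(path(adj, r, s)):
            return f"{p}{q}|{r}{s}"
    raise ValueError("tree does not resolve this quartet")


# Hypothetical tree: leaves a, b hang off x; e hangs off z; c, d hang off y.
adj = {
    "a": ["x"], "b": ["x"], "e": ["z"], "c": ["y"], "d": ["y"],
    "x": ["a", "b", "z"], "z": ["x", "e", "y"], "y": ["z", "c", "d"],
}
print(induced_topology(adj, "a", "b", "c", "d"))  # prints ab|cd
```
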
12

Note that each input quartet topology t is
accompanied by a positive weight $C_t$.

13
Problem description

Input
A list of weighted quartet topologies over n
taxa.
Output
A binary tree with n leaves such that the total
weight of the satisfied quartet topologies is
maximized.
This problem was shown to be NP-hard.

14
Quartet method

The fact that small phylogenies are easier to
infer than large ones leads to another approach:
the quartet method.
First, consider subsets of 4 taxa, one at a time,
and infer the phylogenies (i.e., quartet
topologies) for these subsets.
The next stage combines the multiple quartet
topologies into a single phylogeny.

Given a set of quartet topologies Q, how do we
determine whether an evolutionary tree T is
good or bad?

Suppose we are given an evolutionary tree T and a
set of quartet topologies Q.
We say that T satisfies a quartet topology $t_q$ of
a quartet q if the quartet topology of q induced
by T is exactly $t_q$.

For example, T satisfies ab|dg, ce|fg,
ad|bc, etc.
17
Score

We denote by S, where $S \subseteq Q$, the set of quartet
topologies that are satisfied by T, and let
$U = Q \setminus S$.
We define the score of the evolutionary tree T as
follows.

18
Score (contd.)

The latter term was chosen because there are
three possible topologies for every quartet.
Therefore this term equals the expected increase.
In a variant of the same method, the latter term
is zeroed, so the quartet topologies which are
not satisfied by T do not contribute to the score.
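
The variant score, in which unsatisfied topologies contribute nothing, is simply the total weight of the satisfied topologies. The sketch below assumes trees are nested 2-tuples of leaf names and a topology ab|cd is written `(("a", "b"), ("c", "d"))`; both conventions and all function names are illustrative, not from the slides. It uses the fact that ab|cd is satisfied exactly when some subtree contains a and b but neither c nor d, or the reverse.

```python
def leaf_sets(tree, out):
    """Append the leaf set below every node of a nested-tuple tree to out."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        left, right = tree
        leaves = leaf_sets(left, out) | leaf_sets(right, out)
    out.append(leaves)
    return leaves


def satisfies(clusters, topology):
    """True if some cluster separates the two pairs of the topology."""
    (a, b), (c, d) = topology
    for cl in clusters:
        if a in cl and b in cl and c not in cl and d not in cl:
            return True
        if c in cl and d in cl and a not in cl and b not in cl:
            return True
    return False


def variant_score(tree, weighted_topologies):
    """Sum of the weights of the topologies satisfied by tree."""
    clusters = []
    leaf_sets(tree, clusters)
    return sum(w for topo, w in weighted_topologies if satisfies(clusters, topo))


# Hypothetical input: one tree and three weighted quartet topologies.
tree = ((("a", "b"), "e"), ("c", "d"))
Q = [
    ((("a", "b"), ("c", "d")), 2),  # ab|cd, satisfied
    ((("a", "c"), ("b", "e")), 1),  # ac|be, not satisfied
    ((("a", "e"), ("c", "d")), 3),  # ae|cd, satisfied
]
print(variant_score(tree, Q))  # prints 5
```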

19
Score (contd.)

It can be easily derived that is
an upper bound on the score of any evolutionary
tree T.

20
Preliminaries for the dynamic programming
algorithm

For technical reasons, the following discussion
deals with rooted evolutionary trees.
For a node v, its left and right children are
denoted by $v_l$ and $v_r$ respectively.

21
Preliminaries for the dynamic programming
algorithm (contd.)

Given a rooted evolutionary tree T and a node v
in it we denote by T(v) the subtree of T rooted
at v.

[Figure: a tree with nodes u, w and v, showing the subtree T(v).]
22
Preliminaries for the dynamic programming
algorithm (contd.)

We denote by L(T) the set of leaves (i.e., taxa)
of the tree T.

[Figure: a tree with nodes u, w and v, showing the leaf set L(T(v)).]

23
Preliminaries for the dynamic programming
algorithm (contd.)

For a pair of nodes u, v, the least common
ancestor of u and v, lca(u, v), is defined as an
ancestor p of both u and v such that no node in
T(p) other than p is an ancestor of both u and v.

24
Preliminaries for the dynamic programming
algorithm (contd.)
[Figure: the lca of a and c.]
25
Preliminaries for the dynamic programming
algorithm (contd.)

Definition: Given a quartet topology $t = ab|cd$
and an evolutionary tree T, the quartet least
common ancestor of t, qlca(t), is defined as a
node p that is the lca of two or more pairs of
elements from {a, b, c, d}, such that no node in T(p)
except p is the lca of two or more pairs of
elements from {a, b, c, d}.

26
Preliminaries for the dynamic programming
algorithm (contd.)
[Figure: the qlca for ab|cd.]
27
Another equivalent definition for the quartet
least common ancestor

Definition: Given a quartet topology $t = ab|cd$
and an evolutionary tree T, the qlca of t is a
node p such that
$|L(T(p)) \cap \{a, b, c, d\}| \ge 3$.
For any child s of p, $|L(T(s)) \cap \{a, b, c, d\}| \le 2$.

28
Some observations

Every quartet topology t has a unique qlca(t).
Given a tree T and a quartet topology t, the
subtree rooted at qlca(t) determines whether t is
satisfied in the evolutionary tree T.
Let $t = ab|cd$ and $v = \mathrm{qlca}(t)$. We look at $v_l$,
$v_r$, $T(v_l)$ and $T(v_r)$.
At least one of these subtrees contains exactly
two taxa e, f from {a, b, c, d}.
Then t is satisfied iff the pair {e, f} is either
{a, b} or {c, d}.

29
Some observations (contd.)

Given a quartet topology $t = ab|cd$ and an
evolutionary tree T, let $v = \mathrm{qlca}(t)$. Then T
satisfies t if and only if at least one of the
following holds:
$\{a, b\} \subseteq L(T(s))$,
$\{c, d\} \subseteq L(T(s))$,
where $s = v_l$ or $s = v_r$.
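
The qlca definition and the satisfaction test above can be sketched together as follows. The nested-tuple tree encoding, the topology encoding `(("a", "b"), ("c", "d"))`, and the function names are assumptions for illustration. `qlca` follows the second definition: it descends from the root while some child still holds three or more of the quartet's taxa.

```python
def leaves(tree):
    """Leaf set L(T) of a rooted tree given as nested 2-tuples of names."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def qlca(tree, quartet):
    """Node p with >= 3 quartet taxa below it and <= 2 below each child."""
    q = frozenset(quartet)
    if len(leaves(tree) & q) < 3:
        raise ValueError("fewer than three quartet taxa in this tree")
    node = tree
    while True:
        heavy = [child for child in node if len(leaves(child) & q) >= 3]
        if not heavy:
            return node
        node = heavy[0]  # at most one child can hold three of four taxa


def satisfied_at_qlca(tree, topology):
    """True iff {a, b} or {c, d} lies inside one child subtree of the qlca."""
    (a, b), (c, d) = topology
    v = qlca(tree, (a, b, c, d))
    for s in v:
        below = leaves(s)
        if {a, b} <= below or {c, d} <= below:
            return True
    return False


# Hypothetical caterpillar tree: its qlca for {a, b, c, d} is ((a, b), c).
caterpillar = ((("a", "b"), "c"), "d")
print(qlca(caterpillar, "abcd"))
print(satisfied_at_qlca(caterpillar, (("a", "b"), ("c", "d"))))
```

Recomputing `leaves` at every step keeps the sketch short; a practical implementation would cache the leaf sets.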

30
The algorithm

We denote by $\mathrm{SAT}_Q(T(v))$ the set of quartet
topologies $t \in Q$ such that t is satisfied by T,
and qlca(t) is a node in T(v).
Let $\mathrm{TOP}_Q(T(v)) \subseteq \mathrm{SAT}_Q(T(v))$ be the set of quartet
topologies in Q that have v as their qlca and are
satisfied by T.

31
The algorithm (contd.)

For a set $A \subseteq Q$ of quartet topologies, let
denote the sum of their weights.
The score of the subtree T(v) (with respect to Q)
is defined as

32
The algorithm (contd.)

By the above equation, we have

33
The algorithm (contd.)

Let S be a set of three or more taxa.
Denote by opt_scoreQ(S) the maximum score with
respect to Q among all trees that have S as their
set of leaves.
We denote by opt_treeQ(S) a tree which attains
the maximum score.

34
The algorithm (contd.)

For every proper partition of S into two subsets
S1 and S2, let T(S1, S2) denote a tree whose left
subtree equals opt_treeQ(S1) and whose right
subtree equals opt_treeQ(S2).
We then have

35
The algorithm (contd.)

This implies that
By employing the dynamic programming paradigm, we
can avoid wasteful repetitions.
To do this, we scan the subsets $S \subseteq \{1, 2, \ldots, n\}$
by increasing size of S.

36
The algorithm (contd.)