Computation Theory Lab, CSIE, CCU, Taiwan. 11/11/09.

1
Constructing Phylogenies from Quartets
Elucidation of Eutherian Superordinal
Relationships
A. Ben-Dor, B. Chor, D. Graur, R. Ophir, D. Pelleg
Journal of Computational Biology, Vol. 5, 1998, pp. 377-390.
  • Speaker: Chuang-Chieh Lin
  • National Chung Cheng University

2
Outline
  • Introduction and preliminaries
  • Problem description
  • The dynamic programming algorithm
  • The space complexity and the time complexity

3
Evolutionary trees
  • Let S be a set of taxa with $|S| = n$.
  • An evolutionary tree T on S is an unrooted,
    leaf-labeled tree such that the leaves of T are
    bijectively labeled by the taxa in S, and each
    internal node of T has degree 3.

4
Evolutionary trees
  • For 4 taxa a, b, c, d, we have 3 possible
    topologies: ad|bc, ab|cd, and ac|bd.
5
Evolutionary trees (contd.)
  • For 5 taxa a, b, c, d, e, how many possible
    evolutionary trees can we derive?
  • The answer is $5 \times 3 = 15$.
  • Each of the 3 topologies on a, b, c, d has 5 edges,
    so there are 5 possible positions for e to be
    inserted.
6
Evolutionary trees (contd.)
  • For n taxa, how many possible evolutionary trees
    can we derive?
  • The answer is $(2n - 5)!!$.
  • This observation can be verified by induction on
    n.
  • For an odd positive integer n, the double factorial is defined as
    $n!! = n \times (n - 2) \times (n - 4) \times \cdots \times 3 \times 1$.
  • If $n = 15$, $(2n - 5)!!$ is approximately $8 \times 10^{12}$.
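The count above is easy to check numerically. The following is a minimal sketch; the function names `double_factorial` and `count_unrooted_trees` are chosen here for illustration and do not come from the slides.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (for odd k >= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n):
    """Number of unrooted, leaf-labeled binary trees on n >= 3 taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n - 5)


for n in (4, 5, 15):
    print(n, count_unrooted_trees(n))
```

For n = 4 and n = 5 this reproduces the counts 3 and 15 from the previous slides.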

7
n!!
  • Let us analyze n!! in another way.
  • For a nonnegative integer $m \ge 0$, let $n = 2m + 1$.
  • Then we have
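A standard identity that fits this setup is given here as a sketch. It is an assumption that the slide displayed this particular form. Multiplying and dividing by the even factors $2 \cdot 4 \cdots 2m = 2^m m!$ gives:

```latex
n!! = (2m+1)!!
    = \frac{(2m+1)!}{2 \cdot 4 \cdot 6 \cdots (2m)}
    = \frac{(2m+1)!}{2^{m}\, m!}
```

With $n$ taxa the relevant value is $(2n-5)!!$, which corresponds to $m = n - 3$.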

8
$(2n-5)!! = O(n^{n-2})$
  • For n taxa, we have $(2n - 5)!! = O((n - 3)^{n-2})$,
  • that is, $O(n^{n-2})$ possible evolutionary trees.

9
Quartet topologies
  • A set of four taxa is called a quartet.
  • Given an evolutionary tree T and a quartet $\{a, b, c, d\}$,
    the quartet topology of $\{a, b, c, d\}$
    induced by T is obtained by the following
    procedure.

10
Step 1: All leaves but a, b, c and d are deleted
from the tree. Edges incident to these leaves are
also deleted.
11
Step 2: Internal nodes with degree two are
contracted and deleted, so their two adjacent
nodes become connected. This process is repeated
until no internal nodes of degree two are left.
For simplicity, we denote the quartet topology
above by bc|ad, which is a bipartition
of $\{a, b, c, d\}$.
12
  • Note that each input quartet topology t is
    accompanied by a positive weight $C_t$.
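The two-step procedure (delete the other leaves, then contract degree-two nodes) can be sketched in code. The representation is an assumption made for this example: a tree is written as nested 2-tuples of leaf labels, a rooted form whose root is ignored when reading off the unrooted topology. The names `restrict`, `find_cherry`, and `induced_topology` are hypothetical.

```python
def restrict(tree, keep):
    """Steps 1 and 2: drop leaves outside `keep`, contracting degree-two nodes."""
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:          # a node left with one child is contracted away
        return kids[0]
    return tuple(kids)


def find_cherry(tree):
    """Return the label pair of some node whose two children are both leaves."""
    if not isinstance(tree, tuple):
        return None
    if all(not isinstance(child, tuple) for child in tree):
        return frozenset(tree)
    for child in tree:
        found = find_cherry(child)
        if found:
            return found
    return None


def induced_topology(tree, quartet):
    """Return the induced topology as a set of two pairs, e.g. {{b, c}, {a, d}}."""
    quartet = frozenset(quartet)
    pair = find_cherry(restrict(tree, quartet))
    return frozenset({pair, quartet - pair})


tree = (("b", "c"), ("a", ("d", "e")))
print(induced_topology(tree, ["a", "b", "c", "d"]))  # the two sides of bc|ad
```

The example assumes all four taxa of the quartet occur in the tree. Any cherry of the restricted four-leaf tree identifies one side of the bipartition, so no separate handling of the root is needed.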

13
Problem description
  • Input
  • A list of weighted quartet topologies over n
    taxa.
  • Output
  • A binary tree with n leaves such that the total
    weight of the satisfied quartet topologies is
    maximized.
  • This problem was shown to be NP-hard.

14
Quartet method
  • The fact that small phylogenies are easier to
    infer than large ones leads to another approach:
    the quartet method.
  • First, consider subsets of 4 taxa, one at a time,
    and infer the phylogenies (i.e., quartet
    topologies) for these subsets.
  • The next stage combines the multiple quartet
    topologies into a single phylogeny.

15
  • Given a set of quartet topologies Q, how do we
    determine whether an evolutionary tree T is
    good or bad?

16
  • Suppose we are given an evolutionary tree T and a set of quartet
    topologies Q.
  • We say that T satisfies a quartet topology $t_q$ of
    a quartet q if the quartet topology of q induced
    by T is exactly $t_q$.

For example, T satisfies ab|dg, ce|fg,
ad|bc, etc.
17
Score
  • We denote by S, where $S \subseteq Q$, the set of quartet
    topologies that are satisfied by T, and let
    $U = Q \setminus S$.
  • We define the score of the evolutionary tree T as
    follows.

18
Score (contd.)
  • The latter term was chosen because there are
    three possible topologies for every quartet.
  • Therefore this term equals the expected increase.
  • In a variant of the same method, the latter term
    is zeroed, so the quartet topologies which are
    not satisfied by T do not contribute to the score.
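The variant in which unsatisfied topologies contribute nothing can be sketched as follows. This illustrates that variant only, not the full score with the latter term. The nested-tuple tree representation and the names `clades`, `satisfies`, and `variant_score` are assumptions made for this example. Satisfaction is tested with the fact that T induces ab|cd exactly when some subtree meets the quartet in precisely {a, b} or {c, d}.

```python
def clades(tree):
    """Return the leaf set of every subtree of a nested-tuple tree."""
    found = []

    def walk(node):
        if not isinstance(node, tuple):
            leaf_set = frozenset([node])
        else:
            leaf_set = frozenset().union(*(walk(child) for child in node))
        found.append(leaf_set)
        return leaf_set

    walk(tree)
    return found


def satisfies(tree, topology):
    """True if the tree induces the topology ((a, b), (c, d)), i.e. ab|cd."""
    (a, b), (c, d) = topology
    quartet = {a, b, c, d}
    return any(clade & quartet in ({a, b}, {c, d}) for clade in clades(tree))


def variant_score(tree, weighted_topologies):
    """Sum of the weights of the satisfied topologies (latter term zeroed)."""
    return sum(w for topology, w in weighted_topologies if satisfies(tree, topology))


tree = (("a", "b"), ("c", ("d", "e")))
Q = [
    ((("a", "b"), ("c", "d")), 3.0),   # satisfied
    ((("a", "c"), ("b", "d")), 2.0),   # conflicts with ab|cd, not satisfied
    ((("d", "e"), ("a", "c")), 1.0),   # satisfied
]
print(variant_score(tree, Q))
```

The example assumes that every taxon named in Q occurs in the tree.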

19
Score (contd.)
  • It can be easily derived that is
    an upper bound on the score of any evolutionary
    tree T.

20
Preliminaries for the dynamic programming
algorithm
  • For technical reasons, the following discussion
    deals with rooted evolutionary trees.
  • For a node v, its left and right children are
    denoted by $v_l$ and $v_r$ respectively.

21
Preliminaries for the dynamic programming
algorithm (contd.)
  • Given a rooted evolutionary tree T and a node v
    in it we denote by T(v) the subtree of T rooted
    at v.

[Figure: a rooted tree with nodes u, w, v; the subtree T(v) is marked.]
22
Preliminaries for the dynamic programming
algorithm (contd.)
  • We denote by L(T) the set of leaves (i.e., taxa)
    of the tree T.

[Figure: a rooted tree with nodes u, w, v; the leaf set $L(T(v))$ is marked.]

23
Preliminaries for the dynamic programming
algorithm (contd.)
  • For a pair of nodes u, v, the least common
    ancestor of u and v, lca(u, v), is defined as an
    ancestor p of both u and v such that no node in
    T(p) other than p is an ancestor of both u and v.
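A small sketch of this definition, restricted to the case where u and v are leaves. The nested-tuple tree representation and the names `leaves` and `lca` are assumptions made for this example; the function returns the subtree T(p) rooted at the least common ancestor p.

```python
def leaves(node):
    """Leaf set L(T(node)) of a nested-tuple tree."""
    if not isinstance(node, tuple):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def lca(tree, u, v):
    """Return the smallest subtree of `tree` that contains both leaves u and v."""
    node = tree
    while isinstance(node, tuple):
        # Descend while a single child still contains both leaves.
        below = [child for child in node if {u, v} <= leaves(child)]
        if not below:
            return node
        node = below[0]
    return node


tree = (("a", "c"), ("b", "d"))
print(lca(tree, "a", "c"))   # the subtree ("a", "c")
print(lca(tree, "a", "b"))   # the whole tree
```

Recomputing leaf sets at every step keeps the sketch short; an implementation concerned with speed would precompute them.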

24
Preliminaries for the dynamic programming
algorithm (contd.)
[Figure: the lca of a and c, shown on trees with leaves a, b, c, d.]
25
Preliminaries for the dynamic programming
algorithm (contd.)
  • Definition: Given a quartet topology $t = ab|cd$
    and an evolutionary tree T, the quartet least
    common ancestor of t, qlca(t), is defined as a
    node p that is the lca of two or more pairs of
    elements from $\{a, b, c, d\}$, such that no node in T(p)
    except p is the lca of two or more pairs of
    elements from $\{a, b, c, d\}$.

26
Preliminaries for the dynamic programming
algorithm (contd.)
[Figure: the qlca for ab|cd, shown on trees with leaves a, b, c, d.]
27
Another equivalent definition for the quartet
least common ancestor
  • Definition: Given a quartet topology $t = ab|cd$
    and an evolutionary tree T, the qlca of t is a
    node p such that
  • $|L(T(p)) \cap \{a, b, c, d\}| \ge 3$.
  • For any child s of p, $|L(T(s)) \cap \{a, b, c, d\}| \le 2$.
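The two conditions translate directly into a top-down search, sketched here under the assumption that trees are nested 2-tuples of leaf labels and that all four taxa occur in the tree. The names `leaves` and `qlca` are chosen for this example.

```python
def leaves(node):
    """Leaf set L(T(node)) of a nested-tuple tree."""
    if not isinstance(node, tuple):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def qlca(tree, quartet):
    """Return the subtree T(p) rooted at the quartet least common ancestor p."""
    quartet = frozenset(quartet)
    node = tree                      # the root contains all four taxa
    while True:
        # At most one child can contain three or more of the four taxa.
        heavy = [child for child in node if len(leaves(child) & quartet) >= 3]
        if not heavy:
            return node              # >= 3 taxa here, <= 2 in every child
        node = heavy[0]


tree = ((("a", "b"), "c"), ("d", "e"))
print(qlca(tree, ["a", "b", "c", "d"]))   # the subtree (("a", "b"), "c")
```

The search moves to a child only while that child still holds at least three of the taxa, so it stops at the unique node meeting both conditions.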

28
Some observations
  • Every quartet topology t has a unique qlca(t).
  • Given a tree T and a quartet topology t, the
    subtree rooted at qlca(t) determines whether t is
    satisfied in the evolutionary tree T.
  • Let $t = ab|cd$ and $v = \mathrm{qlca}(t)$. We look at $v_l$,
    $v_r$, $T(v_l)$ and $T(v_r)$.
  • At least one of these subtrees contains exactly
    two taxa e, f from $\{a, b, c, d\}$.
  • Then t is satisfied iff the pair $\{e, f\}$ is either
    $\{a, b\}$ or $\{c, d\}$.

29
Some observations (contd.)
  • Given a quartet topology $t = ab|cd$ and an
    evolutionary tree T, let $v = \mathrm{qlca}(t)$. Then T
    satisfies t if and only if at least one of the
    following holds
  • $\{a, b\} \subseteq L(T(s))$.
  • $\{c, d\} \subseteq L(T(s))$.
  • where $s = v_l$ or $s = v_r$.
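This criterion can be sketched as a local test at the qlca. The nested-tuple tree representation and the names `leaves` and `satisfied_at_qlca` are assumptions made for this example, and all four taxa are assumed to occur in the tree.

```python
def leaves(node):
    """Leaf set L(T(node)) of a nested-tuple tree."""
    if not isinstance(node, tuple):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def satisfied_at_qlca(tree, ab, cd):
    """True if the tree satisfies ab|cd, decided only from the children of qlca."""
    quartet = set(ab) | set(cd)
    v = tree
    while True:                      # walk down to v = qlca(ab|cd)
        heavy = [child for child in v if len(leaves(child) & quartet) >= 3]
        if not heavy:
            break
        v = heavy[0]
    # Satisfied iff {a, b} or {c, d} lies entirely inside one child subtree.
    return any(set(pair) <= leaves(s) for s in v for pair in (ab, cd))


tree = (("a", "b"), ("c", ("d", "e")))
print(satisfied_at_qlca(tree, ("a", "b"), ("c", "d")))   # True
print(satisfied_at_qlca(tree, ("a", "c"), ("b", "d")))   # False
```

Only the two child subtrees of the qlca are inspected, which is what lets the dynamic program score a node from its children's leaf sets alone.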

30
The algorithm
  • We denote by $\mathrm{SAT}_Q(T(v))$ the set of quartet
    topologies $t \in Q$ such that t is satisfied by T,
    and qlca(t) is a node in T(v).
  • Let $\mathrm{TOP}_Q(T(v)) \subseteq \mathrm{SAT}_Q(T(v))$ be the set of quartet
    topologies in Q that have v as their qlca and are
    satisfied by T.

31
The algorithm (contd.)
  • For a set $A \subseteq Q$ of quartet topologies, let
    denote the sum of their weights.
  • The score of the subtree T(v) (with respect to Q)
    is defined as

32
The algorithm (contd.)
  • By the above equation, we have
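One decomposition that follows directly from the definitions of $\mathrm{SAT}_Q$ and $\mathrm{TOP}_Q$ is sketched here. It assumes the variant in which unsatisfied topologies contribute nothing, and it writes $w(A)$ for the total weight of a set $A$, a symbol introduced only for this sketch. Every satisfied topology whose qlca lies in $T(v)$ has its qlca either at $v$ itself or inside exactly one of the two child subtrees, because the qlca is unique:

```latex
\mathrm{SAT}_Q(T(v)) =
  \mathrm{SAT}_Q(T(v_l)) \,\cup\, \mathrm{SAT}_Q(T(v_r)) \,\cup\, \mathrm{TOP}_Q(T(v))
  \quad \text{(a disjoint union)}

w\bigl(\mathrm{SAT}_Q(T(v))\bigr) =
  w\bigl(\mathrm{SAT}_Q(T(v_l))\bigr)
  + w\bigl(\mathrm{SAT}_Q(T(v_r))\bigr)
  + w\bigl(\mathrm{TOP}_Q(T(v))\bigr)
```

By the qlca criterion, the last term depends only on the leaf sets $L(T(v_l))$ and $L(T(v_r))$, not on the internal shape of the two subtrees.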

33
The algorithm (contd.)
  • Let S be a set of three or more taxa.
  • Denote by $\mathrm{opt\_score}_Q(S)$ the maximum score with
    respect to Q among all trees that have S as their
    set of leaves.
  • We denote by $\mathrm{opt\_tree}_Q(S)$ a tree which attains
    the maximum score.

34
The algorithm (contd.)
  • For every proper partition of S into two subsets
    $S_1$ and $S_2$, let $T(S_1, S_2)$ denote a tree whose left
    subtree equals $\mathrm{opt\_tree}_Q(S_1)$ and whose right
    subtree equals $\mathrm{opt\_tree}_Q(S_2)$.
  • We then have

35
The algorithm (contd.)
  • This implies that
  • By employing the dynamic programming paradigm, we
    can avoid wasteful repetitions.
  • To do this, we scan the subsets $S \subseteq \{1, 2, \ldots, n\}$
    in order of increasing size $|S|$.
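The scan over subsets by increasing size can be sketched with bitmasks. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the variant score in which unsatisfied topologies contribute nothing, it treats subsets of one or two taxa as base cases with score 0, and the names `best_tree` and `top_weight` are hypothetical. A topology is counted at a split $(S_1, S_2)$ when at least three of its taxa lie in $S_1 \cup S_2$, at most two lie on each side, and $\{a, b\}$ or $\{c, d\}$ lies entirely on one side.

```python
def best_tree(taxa, weighted_topologies):
    """Return (score, tree) maximizing the total weight of satisfied topologies.

    taxa: list of distinct labels.
    weighted_topologies: list of ((a, b), (c, d), weight), meaning ab|cd.
    The tree is returned as nested 2-tuples (a rooted binary tree).
    """
    n = len(taxa)
    bit = {t: 1 << i for i, t in enumerate(taxa)}
    encoded = [(bit[a] | bit[b], bit[c] | bit[d], w)
               for (a, b), (c, d), w in weighted_topologies]

    def popcount(x):
        return bin(x).count("1")

    def top_weight(s1, s2):
        """Weight of satisfied topologies whose qlca is the node splitting s1 | s2."""
        total = 0.0
        for ab, cd, w in encoded:
            q = ab | cd
            n1, n2 = popcount(q & s1), popcount(q & s2)
            if n1 + n2 >= 3 and n1 <= 2 and n2 <= 2:
                if any(p & s == p for p in (ab, cd) for s in (s1, s2)):
                    total += w
        return total

    best = {}
    # Scan subsets by increasing size, so both halves of a split are ready.
    for mask in sorted(range(1, 1 << n), key=popcount):
        if popcount(mask) == 1:
            best[mask] = (0.0, taxa[mask.bit_length() - 1])
            continue
        low = mask & -mask            # keep the lowest taxon on the left side
        rest = mask ^ low
        choice = None
        sub = rest
        while sub:
            sub = (sub - 1) & rest    # next proper submask of rest, ending at 0
            s1, s2 = low | sub, rest ^ sub
            score = best[s1][0] + best[s2][0] + top_weight(s1, s2)
            if choice is None or score > choice[0]:
                choice = (score, (best[s1][1], best[s2][1]))
        best[mask] = choice
    return best[(1 << n) - 1]


taxa = ["a", "b", "c", "d", "e"]
Q = [(("a", "b"), ("c", "d"), 3.0), (("a", "c"), ("b", "d"), 1.0),
     (("a", "b"), ("d", "e"), 2.0), (("d", "e"), ("a", "c"), 2.0)]
score, tree = best_tree(taxa, Q)
print(score, tree)
```

Fixing the lowest taxon on the left side avoids evaluating each split twice. The table holds one entry per nonempty subset, and the loop visits every (subset, split) pair, so the sketch is practical only for small n.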

36
The algorithm (contd.)
  • For simplicity, the details of implementing the
    dynamic programming algorithm are omitted.

37
The space complexity and the time complexity
  • The time complexity
  • The space complexity: $\Theta(2^n)$.

38
Thank you.
39
References
  • [S92] M. Steel. The complexity of reconstructing
    trees from qualitative characters and subtrees.
    Journal of Classification, 9 (1992), pp. 91-116.