1
ALGORITHMS FOR QUARTET BASED PHYLOGENY
CONSTRUCTION PROBLEMS
Gang Wu, Department of Computing Science, University of Alberta
2
Common Phylogenetic Tree Terminology
Phylogeny: the pattern of historical relationships among species.
Tree: a mathematical structure used to depict the evolutionary history of a group of species.
Leaf nodes: represent the species (genes, populations, etc.) used to infer the phylogeny.
Branches or edges
Internal nodes: represent hypothetical ancestors of the species.
Root of the tree: the common ancestor of all species.
[Figure: example tree with nodes labeled A, B, C, D, E]
3
Phylogeny Example for Mammal
4
Rooted and Unrooted tree
5
General Process of Phylogeny Construction
Input: A set of (DNA or protein) sequences for the species.
Output: An evolutionary tree (phylogeny) whose leaf nodes are the input species.
Methods: Maximum Parsimony (MP), Maximum Likelihood (ML), etc.
Not suitable for large trees (over 20 species). Current software all uses heuristics to reduce the computation time.
6
Quartet Based Phylogeny Construction
  • There is only one unrooted tree for one, two or
    three species.
  • There are three possible unrooted trees for four
    species (A, B, C, D).
  • Quartets are the smallest informative unrooted trees.
  • MP or ML can be solved exactly on quartets.

AB|CD
AC|BD
AD|BC
7
Process of Quartet Based Phylogeny Construction
8
Definitions
A quartet ab|cd is consistent with a phylogeny T,
or a phylogeny T satisfies a quartet ab|cd, if
and only if a, b, c, d are all leaves of T and the
path from a to b does not share any nodes with
the path from c to d.
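The definition above can be checked directly by comparing the two leaf-to-leaf paths. The following is a minimal sketch, not the presenter's code. The tree, its internal node names (u, v, w, x) and the helper names `path` and `is_consistent` are assumptions made for illustration; a quartet is passed as four leaves, with the first two on one side of the separator (ab|cd becomes a, b, c, d).

```python
from collections import deque

def path(adj, src, dst):
    """Return the nodes on the unique tree path from src to dst (BFS)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    nodes = []
    while dst is not None:
        nodes.append(dst)
        dst = parent[dst]
    return nodes

def is_consistent(adj, a, b, c, d):
    """True if a, b, c, d are leaves and path a-b shares no node with path c-d."""
    leaves = {n for n, nbrs in adj.items() if len(nbrs) == 1}
    if not {a, b, c, d} <= leaves:
        return False
    return not set(path(adj, a, b)) & set(path(adj, c, d))

# Hypothetical unrooted tree: three cherries (a,b), (c,d), (e,f) joined at x.
edges = [("a", "u"), ("b", "u"), ("c", "v"), ("d", "v"),
         ("e", "w"), ("f", "w"), ("u", "x"), ("v", "x"), ("w", "x")]
adj = {}
for p, q in edges:
    adj.setdefault(p, set()).add(q)
    adj.setdefault(q, set()).add(p)

print(is_consistent(adj, "a", "e", "c", "d"))  # ae|cd
print(is_consistent(adj, "b", "d", "c", "f"))  # bd|cf
```

On this assumed tree the path from a to e passes through u, x and w, while the path from c to d passes only through v, so ae|cd is consistent; the paths for bd|cf both pass through x, so it is not.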
9
Quartet Set Q: ae|cd, ab|cd, ab|ce, ab|cf, ab|de, ab|df, ab|ef, af|cd, ac|ef, ad|ef, be|cd, bf|cd, bc|ef, bd|ef, cd|ef
[Figure: Phylogeny T]
Quartet ae|cd is consistent with T, or T satisfies ae|cd.
bd|cf ?
10
Definitions
Given a set of quartets Q on a set S of species,
Q is compatible if and only if there is a
phylogeny on S which satisfies all the quartets
in Q.
A set Q of quartet topologies is complete if Q
contains a quartet topology for every four labels
in the label set S.
11
Quartet Set Q: ae|cd, ab|cd, ab|ce, ab|cf, ab|de, ab|df, ab|ef, af|cd, ac|ef, ad|ef, be|cd, bf|cd, bc|ef, bd|ef, cd|ef
[Figure: Phylogeny T]
  • The quartet set Q is complete (15 quartets in total
    on 6 species).
  • The quartet set Q is compatible (T satisfies all
    the quartets).
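Completeness can be verified mechanically: every 4-subset of the species must be covered by some quartet. The sketch below assumes single-character labels and quartets written as strings with an explicit separator, such as "ae|cd"; the function name `is_complete` is illustrative.

```python
from itertools import combinations

def is_complete(quartets, species):
    """True if every 4-subset of species has a quartet topology in quartets."""
    covered = {frozenset(q.replace("|", "")) for q in quartets}
    return all(frozenset(four) in covered for four in combinations(species, 4))

Q = ["ae|cd", "ab|cd", "ab|ce", "ab|cf", "ab|de", "ab|df", "ab|ef",
     "af|cd", "ac|ef", "ad|ef", "be|cd", "bf|cd", "bc|ef", "bd|ef", "cd|ef"]

print(len(Q), is_complete(Q, "abcdef"))
```

Six species give C(6, 4) = 15 four-element subsets, which matches the 15 quartets listed on the slide.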

12
Problem Descriptions
Quartet Compatibility Problem (QCP)
Input: A set Q of quartets on S.
Question: Is Q compatible? Equivalently, is there a phylogeny T on S such that all quartets in Q are satisfied?
In practice, the given quartet set Q usually contains errors and thus is incompatible.
Maximum Quartet Consistency Problem (MQC)
Input: A set Q of quartets on S.
Goal: Find a phylogeny T on S such that the number of consistent quartets in Q is maximized.
Minimum Quartet Inconsistency Problem (MQI)
Input: A set Q of quartets on S.
Goal: Find a phylogeny T on S such that the number of inconsistent quartets in Q is minimized.
13
Input Quartet Set Q: ac|ed, ab|cd, ab|ce, ab|cf, ab|de, ab|df, ab|ef, af|cd, ac|ef, ad|ef, be|cd, bf|cd, bc|ef, bd|ef, cd|ef
Quartet Compatibility Problem (QCP)? No.
MQC or MQI? Only ac|ed is not satisfied.
14
Known Results
The Quartet Compatibility Problem (QCP) can be solved in polynomial time if the given quartet set Q is complete, but it is NP-complete if Q is incomplete.
The Maximum Quartet Consistency Problem (MQC) and the Minimum Quartet Inconsistency Problem (MQI) are NP-hard even if Q is complete.
Exact algorithms: guaranteed to find the optimal ("best") tree.
Heuristic algorithms: approximate or quick-and-dirty methods that attempt to find the optimal tree but cannot guarantee to do so.
15
Algorithms
  • Exact Algorithms
      • Dynamic Programming (solving the MQC problem)
      • Answer Set Programming (solving the MQC problem)
      • Fixed Parameter Algorithm (solving the MQI problem)
      • Lookahead Branch and Bound Algorithm (solving the MQI
        problem)
  • Approximation Algorithms
      • Hyper Cleaning/Local Edge Cleaning (approximates the
        complete MQI problem with an O(n^2) ratio)
      • Sibling Merging (approximates the complete MQI problem
        with an O(n) ratio ?)

16
Dynamic Programming
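Label Set S = {1, ..., n}; Quartet Set Q
  • Quartet [a,b|c,d] is satisfied by a partition
    (S1, S2) if [formula not transcribed] and one
    of the following holds:
  • [formula not transcribed] and at least one of c, d is in
    S2
  • [formula not transcribed] and at least one of c, d is in
    S1

Example: S1 = {1,2,5}, S2 = {3,4}; [1,2|3,4] and [1,6|3,4]
are consistent with T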
17
Dynamic Programming
{1}, ..., {n}, {1,2}, ..., {n-1,n}, ..., {1,...,n} is an ordering of all 2^n - 1 non-empty subsets of {1,...,n} in which every set appears after all its subsets.
Go over all these subsets in order.
For every subset S:
if |S| < 2 then OptimalScore(S) = 0
else OptimalScore(S) = max { Top(S1, S2) + OptimalScore(S1) + OptimalScore(S2) }, taken over every possible partition (S1, S2) of S.
Function Top(S1, S2) calculates the number of quartets satisfied by (S1, S2).
Complexity: O(n^4 3^n)
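The recurrence can be sketched with subsets encoded as bitmasks, since increasing integer order already places every set after all of its subsets. This is an illustrative sketch only: `top` is a caller-supplied function on two bitmasks because the exact satisfaction condition is not recoverable from the transcript, it is assumed to be symmetric in its arguments, and the names `bipartitions` and `optimal_score` are hypothetical.

```python
def bipartitions(s):
    """Yield each unordered split of bitmask s (two or more bits set) into two non-empty parts."""
    low = s & -s            # the lowest element always stays in the first part
    rest = s ^ low
    t = (rest - 1) & rest   # largest proper submask of rest
    while True:
        s1 = low | t
        yield s1, s ^ s1
        if t == 0:
            break
        t = (t - 1) & rest

def optimal_score(n, top):
    """OptimalScore of the full set {1..n} under the recurrence above."""
    full = (1 << n) - 1
    score = [0] * (full + 1)            # sets with fewer than 2 elements score 0
    for s in range(1, full + 1):        # every proper subset of s is numerically smaller
        if bin(s).count("1") < 2:
            continue
        score[s] = max(top(s1, s2) + score[s1] + score[s2]
                       for s1, s2 in bipartitions(s))
    return score[full]

# Toy scoring function: count the pairs separated by the partition.
pairs_across = lambda s1, s2: bin(s1).count("1") * bin(s2).count("1")
print(optimal_score(4, pairs_across))
```

With the toy function every pair of elements is separated at exactly one partition, so the result is C(4, 2) regardless of the tree shape, which makes it a convenient sanity check for the enumeration. Enumerating all splits of all subsets visits on the order of 3^n partitions, which is the exponential factor in the stated complexity.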
18
Ultrametric Tree and Matrix
Ultrametric Tree: We label each internal node with a number. If along any root-to-leaf path the labels of the internal nodes on the path are strictly decreasing, then the tree with its labels is called an ultrametric tree.
Ultrametric Matrix: Each entry value is the label of the least common ancestor of the two leaf nodes. It is
  • Symmetric, with M(i, i) = 0, and
  • For every triplet (i, j, k) there are two equal
    values among M(i, j), M(j, k), and M(i, k), and they are
    greater than the third value.

e.g. i = 1, j = 3, k = 4: M(1, 3) = M(3, 4) > M(1, 4)
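The two matrix properties translate into a short check. The sketch below uses a hypothetical 5-species matrix (not the one from the slide's figure, which is not in the transcript) built from the labeled tree ((s1,s5):1, (s4,(s2,s3):2):3):4; the function name `is_ultrametric` is an assumption.

```python
from itertools import combinations

def is_ultrametric(M):
    """Check symmetry, zero diagonal and the three-point condition."""
    n = len(M)
    for i in range(n):
        if M[i][i] != 0:
            return False
        for j in range(n):
            if M[i][j] != M[j][i]:
                return False
    for i, j, k in combinations(range(n), 3):
        x, y, z = sorted((M[i][j], M[j][k], M[i][k]))
        if not (x < y == z):   # the two largest are equal and exceed the third
            return False
    return True

# Hypothetical matrix for species s1..s5 (row/column 0 is s1).
M = [[0, 4, 4, 4, 1],
     [4, 0, 2, 3, 4],
     [4, 2, 0, 3, 4],
     [4, 3, 3, 0, 4],
     [1, 4, 4, 4, 0]]

print(is_ultrametric(M))
```

The strict inequality follows the slide's wording; a tree with an internal node of three or more children would produce triplets with three equal values and would need a relaxed test.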
19
Theorem 1: A quartet ab|cd is consistent with a phylogeny T if and only if any ultrametric labeling scheme M of T satisfies min{M(a, c), M(b, d)} > min{M(a, b), M(c, d)}.
20
Example for Theorem 1: s1 s5 | s2 s3 is consistent with the tree and its corresponding matrix: min{M(1, 2), M(5, 3)} = 4 > min{M(1, 5), M(2, 3)} = 1. Condition satisfied!
21
Example for Theorem 1: s1 s4 | s2 s5 is NOT consistent with the tree and its corresponding matrix: min{M(1, 2), M(4, 5)} = min{M(1, 4), M(2, 5)} = 3. Condition not satisfied!
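Theorem 1 turns the consistency test into two comparisons of matrix entries. The sketch below applies it to a hypothetical ultrametric matrix for five species (the slide's own matrix is not in the transcript, so the numeric values differ from the slide's); `quartet_holds` is an illustrative name, and species are numbered from 1 as on the slides.

```python
def quartet_holds(M, a, b, c, d):
    """Theorem 1 test for quartet ab|cd; species are numbered from 1."""
    m = lambda i, j: M[i - 1][j - 1]
    return min(m(a, c), m(b, d)) > min(m(a, b), m(c, d))

# Hypothetical ultrametric matrix for the labeled tree
# ((s1,s5):1, (s4,(s2,s3):2):3):4, where each number is an internal-node label.
M = [[0, 4, 4, 4, 1],
     [4, 0, 2, 3, 4],
     [4, 2, 0, 3, 4],
     [4, 3, 3, 0, 4],
     [1, 4, 4, 4, 0]]

print(quartet_holds(M, 1, 5, 2, 3))  # s1 s5 | s2 s3
print(quartet_holds(M, 1, 4, 2, 5))  # s1 s4 | s2 s5
```

In this assumed tree s1 and s5 are siblings, so s1 s5 | s2 s3 passes the test while s1 s4 | s2 s5 fails because both minima are equal.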
22
Theorem 2: Given a set Q of quartets on a set of species S and an ultrametric phylogeny T on S, T satisfies the maximum number of quartets in Q if and only if the corresponding ultrametric matrix M on S satisfies the maximum number of quartets in Q.
We transform the original MQC problem into an ultrametric matrix search problem.
24
Formulation in Answer Set Programming
Domain
1 { m(1,2,1), m(1,2,2), m(1,2,3), m(1,2,4), m(1,2,5) } 1
Matrix entry (1,2) takes exactly one value in the domain [1, 5].
Ultrametric Constraints
For three matrix values m(i,j), m(j,k) and m(i,k), two of them are equal and greater than the third one.
Quartet Constraints
If min{m(i,k), m(j,l)} > min{m(i,j), m(k,l)}, then quartet [i,j|k,l] is satisfied.
Objective
maximize q(i,j,k,l)
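The same four ingredients (domain, ultrametric constraints, quartet constraints, objective) can be mimicked by exhaustive search on a tiny instance. This is not an answer set program and not the presenter's encoding; it is a brute-force stand-in, feasible only for very small n, with the hypothetical name `max_satisfied`. Species are numbered from 0 and a quartet [i,j|k,l] is the tuple (i, j, k, l).

```python
from itertools import combinations, product

def max_satisfied(n, quartets, labels):
    """Try every matrix over `labels`; return the best quartet count."""
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    best = 0
    for values in product(labels, repeat=len(pairs)):   # domain
        m = dict(zip(pairs, values))
        m.update({(j, i): v for (i, j), v in list(m.items())})
        ultrametric = True
        for i, j, k in triples:                          # ultrametric constraints
            x, y, z = sorted((m[i, j], m[j, k], m[i, k]))
            if not (x < y == z):
                ultrametric = False
                break
        if not ultrametric:
            continue
        count = sum(min(m[a, c], m[b, d]) > min(m[a, b], m[c, d])
                    for a, b, c, d in quartets)           # quartet constraints
        best = max(best, count)                           # objective
    return best

# Two contradictory topologies on the same four species.
print(max_satisfied(4, [(0, 1, 2, 3), (0, 2, 1, 3)], range(1, 4)))
```

Two different topologies on the same four species can never both hold, so at most one of them is counted; an answer set solver explores the same space with constraint propagation instead of enumeration.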
25
Optimizations
26
Optimizations
For a pair of labels (a, b) and a quartet topology q involving both of them and two other labels in S, q conflicts with (a, b) if q is of the form [a, ·|b, ·]. For example, [a,e|b,c] conflicts with (a,b).
Let q1 and q2 be two quartets on {a, c, d, e} and {b, c, d, e}, respectively. If ignoring the difference between a and b gives rise to identical quartet topologies, then q1 and q2 are exchangeable on (a, b); otherwise, they are nonexchangeable on (a, b). For example, [a,c|d,e] and [b,c|d,e] are exchangeable on (a,b), but [a,d|c,e] and [b,c|d,e] are nonexchangeable on (a,b).
3. Reducing the number of species by finding siblings
Theorem: Let Q be a complete set of quartet topologies on a set S of n taxa. For a pair of taxa a and b, let p1 be the number of quartets conflicting with (a, b) and p2 be the number of nonexchangeable pairs on (a, b). If p1 + p2 < (n-3)/2, then a and b must be siblings in the optimal phylogeny.
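Counting p1 and p2 for a pair of taxa is straightforward on a complete quartet set. The sketch below assumes single-character labels and quartets written with an explicit separator ("ab|cd"); `parse` and `sibling_counts` are illustrative names. The operator between p1 and p2 is not legible in the transcript, so the final test reads it as a sum, which is an assumption.

```python
from itertools import combinations

def parse(q):
    """'ab|cd' -> frozenset of the two sides, each a frozenset of labels."""
    left, right = q.split("|")
    return frozenset([frozenset(left), frozenset(right)])

def sibling_counts(quartets, species, a, b):
    """Return (p1, p2) for the pair (a, b); quartets must be complete."""
    topo = {frozenset().union(*t): t for t in map(parse, quartets)}
    # p1: quartets that contain both a and b but place them on opposite sides.
    p1 = sum(1 for labels, t in topo.items()
             if a in labels and b in labels
             and not any(a in side and b in side for side in t))
    # p2: pairs of quartets on {a,c,d,e} and {b,c,d,e} that differ after a -> b.
    p2 = 0
    others = [x for x in species if x not in (a, b)]
    for triple in combinations(others, 3):
        q1 = topo[frozenset(triple) | {a}]
        q2 = topo[frozenset(triple) | {b}]
        swapped = frozenset(frozenset(b if x == a else x for x in side)
                            for side in q1)
        if swapped != q2:
            p2 += 1
    return p1, p2

Q = ["ae|cd", "ab|cd", "ab|ce", "ab|cf", "ab|de", "ab|df", "ab|ef",
     "af|cd", "ac|ef", "ad|ef", "be|cd", "bf|cd", "bc|ef", "bd|ef", "cd|ef"]
n = 6
p1, p2 = sibling_counts(Q, "abcdef", "a", "b")
print(p1, p2, p1 + p2 < (n - 3) / 2)   # threshold form is an assumption
```

For the compatible example set, the pair (a, b) has no conflicting quartets and no nonexchangeable pairs, so the test declares a and b siblings; a pair such as (a, c) accumulates several conflicts and fails it.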
27
Fixed Parameter Algorithm
Local conflict: an incompatible quartet set with 3 quartets and 5 labels. For example, ab|cd, ac|be and ac|de.
Theorem: Given a complete set of quartets Q over a label set S and some label e in S, Q is compatible if and only if there exists no local conflict whose label set includes e.
  • Given a local conflict ab|cd, ac|be and
    ac|de, there are four ways to resolve it:
  • change ab|cd to ac|bd
  • change ab|cd to ad|bc
  • ab|cd is fixed, change ac|be to ab|ce
  • ab|cd is fixed, change ac|de to ae|cd.

Idea: try to resolve all the local conflicts by
changing fewer than k quartets.
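Whether three quartets over five labels form a local conflict can be decided by trying all 15 binary unrooted trees on those labels. The sketch below is illustrative and assumes single-character labels with quartets written as "ab|cd"; the names `five_leaf_trees`, `satisfies` and `is_local_conflict` are hypothetical. A five-leaf binary tree is represented by its two cherries (sibling pairs), and it satisfies a quartet exactly when one side of the quartet is one of the cherries.

```python
def five_leaf_trees(labels):
    """Yield each binary unrooted tree on 5 labels as its two cherries."""
    for middle in labels:
        rest = [x for x in labels if x != middle]
        first = rest[0]
        for partner in rest[1:]:
            cherry1 = frozenset([first, partner])
            cherry2 = frozenset(rest) - cherry1
            yield cherry1, cherry2

def satisfies(tree, q):
    """A 5-leaf tree satisfies 'xy|zw' iff {x,y} or {z,w} is a cherry."""
    left, right = (frozenset(side) for side in q.split("|"))
    return left in tree or right in tree

def is_local_conflict(quartets):
    """True if 3 quartets over 5 labels are satisfied by no tree."""
    labels = sorted(set("".join(quartets)) - {"|"})
    if len(quartets) != 3 or len(labels) != 5:
        return False
    return not any(all(satisfies(tree, q) for q in quartets)
                   for tree in five_leaf_trees(labels))

print(is_local_conflict(["ab|cd", "ac|be", "ac|de"]))
print(is_local_conflict(["ab|cd", "ab|ce", "ab|de"]))
```

The first triple is the slide's example and is rejected by every tree; the second is satisfied by the tree with cherries (a,b) and (d,e), so it is not a conflict.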
28
Fixed Parameter Algorithm
Method: Branch and Bound
  • At every node in the search tree:
  • Build a list of local conflicts.
  • If k quartets have been changed but the conflict
    list is nonempty, kill the node and return to the
    parent node.
  • If the conflict list is empty, update the bound k,
    update the best solution, and return to the
    parent node.
  • Randomly select a local conflict and try to
    resolve it in four ways.
  • Continue the search on the four different branches.

Complexity: O(4^k n + n^4) computation and O(k n^4)
memory.
29
Lookahead Branch and Bound
Theorem 1: Let Q be a complete set of quartet topologies on a set S of n taxa. For a pair of taxa a and b, let p1 be the number of quartets conflicting with (a, b) and p2 be the number of nonexchangeable pairs on (a, b). If p1 + p2 < (n-3)/2, then a and b must be siblings in the optimal phylogeny.
Theorem 2: For a quartet q in Q, if there are more than 3k distinct local conflicts that contain q, then q must be changed in the optimal solution.
Theorem 3: For a set of 5 taxa {a, b, c, d, e}, if [a,b|c,d] and [a,b|c,e] are fixed, then [a,b|d,e] must be fixed too; if [a,b|c,d] and [a,c|d,e] are fixed, then [a,b|c,e], [a,b|d,e], and [b,c|d,e] must be fixed too.
30
Lookahead Branch and Bound
Clever branch: At each search tree node, find the quartet most likely to be an error, then branch on that quartet.
The contribution of a change is defined as the difference between the size of the new conflict list and the size of the conflict list before the quartet was changed.
31
Lookahead Branch and Bound
1. Use Theorem 1 to determine fixed quartets
2. Use Theorem 3 to deduce as many fixed quartets as possible
3. Use Theorem 2 to determine need-to-be-changed quartets
4. Build the conflict list and partition it into two parts
4.1. For each need-to-be-changed quartet, calculate its contribution
4.2. Pick the need-to-be-changed quartet achieving the largest contribution
4.3. If there is no need-to-be-changed quartet, then
4.3.1. For every way of resolving a local conflict, calculate its contribution
4.3.2. Pick the resolution achieving the largest contribution
32
Hyper Cleaning
Bipartition or Edge: (X, Y) where |X| > 0, |Y| > 0, X ∪ Y = S and X ∩ Y = Ø
Quartet Set across (X, Y): Q(X, Y) = { [x,x'|y,y'] : x, x' are in X and y, y' are in Y }
Number of quartet errors across edge (X, Y): |Q(X,Y) - Q|
Example:
Label Set S = {1,2,3,4,5,6}
Bipartition (Edge): ({1,2,5,6}, {3,4})
Q(X, Y) = { [1,2|3,4], [1,5|3,4], [2,5|3,4], ... }
33
Hyper Cleaning
Find all the bipartitions with the property
Best(Q, m) = { (X, Y) : |Q(X,Y) - Q| < (|X|-1)(|Y|-1) m / 2 }, where m can be 0, 1, ....
Then recover the phylogeny based on these partitions.
Complexity: O(n^5 f(m) + n^7 f(2m)), where f(m) = 4 m^2 (1+2m)^(4m)
Local Edge Cleaning is a special case of Hyper Cleaning, where m = 1.
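The quantities on these two slides can be computed directly for a single bipartition. The sketch below only scores one candidate (X, Y); it does not enumerate bipartitions or rebuild the tree, and the names `q`, `across`, `quartet_errors` and `in_best` are assumptions. The threshold is written with the factor m/2 as it appears in the transcript.

```python
from itertools import combinations

def q(p, r):
    """Quartet with sides p and r, e.g. q((1, 2), (3, 4)) for [1,2|3,4]."""
    return frozenset([frozenset(p), frozenset(r)])

def across(X, Y):
    """Q(X, Y): all quartets with one side in X and the other in Y."""
    return {q(p, r) for p in combinations(X, 2) for r in combinations(Y, 2)}

def quartet_errors(X, Y, Q):
    """|Q(X, Y) - Q|: quartets across the edge that are missing from Q."""
    return len(across(X, Y) - Q)

def in_best(X, Y, Q, m):
    """Membership test for Best(Q, m), with the threshold as transcribed."""
    return quartet_errors(X, Y, Q) < (len(X) - 1) * (len(Y) - 1) * m / 2

X, Y = {1, 2, 5, 6}, {3, 4}
Q = {q((1, 2), (3, 4)), q((1, 5), (3, 4)), q((2, 5), (3, 4))}
print(len(across(X, Y)), quartet_errors(X, Y, Q))
print(in_best(X, Y, Q, 1), in_best(X, Y, Q, 3))
```

Here Q(X, Y) has C(4,2) x C(2,2) = 6 quartets, three of which are present in the toy Q, so the edge has three errors; it fails the threshold for m = 1 and passes for m = 3.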
34
Sibling Merging
Add weight to quartet set Q
Assign a weight of 1 to every given quartet in Q, and add to Q all the other possible quartet topologies, each with a weight of 0. Q then contains exactly 3 quartet topologies for every 4-taxa set, among which at most 1 has weight 1 and all the others have weight 0. Each quartet [a,b|c,d] in Q is transformed to [a,b|c,d] with weight 1, [a,c|b,d] with weight 0, and [a,d|b,c] with weight 0.
The MQI problem is then transformed to finding a phylogeny minimizing the total weight of inconsistent quartets.
35
Sibling Merging
For a pair of labels (a, b) and a quartet topology q involving both of them and two other labels in S, q conflicts with (a, b) if q is of the form [a, ·|b, ·]. For example, [a,e|b,c] conflicts with (a,b).
For every pair (a, b), we denote the total weight of quartets in Q conflicting with (a, b) as wc(a,b).
36
Sibling Merging
  • At every iteration,
  • A pair (a, b) whose wc(a,b) reaches the minimum
    is identified, and taxa a and b are merged into a
    super-taxon a.
  • quartets involving both of a and b are removed
    from Q,
  • quartets involving none of them remain unchanged,
  • quartets involving exactly one of them are
    revised by substituting the involved taxon with
    the super-taxon a and subsequently two identical
    quartets are merged into one by adding up their
    weights.

The algorithm iterates the above process until
there are exactly 3 taxa left. It then uses the
tree on 3 taxa as the starting point to recover a
global phylogeny in a straightforward way, which
is inverse to the merging process.
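The whole procedure, from weighting to merging, fits in a short sketch. This is an illustration under stated assumptions rather than the presenter's implementation: the super-taxon is represented as the nested tuple (a, b) instead of reusing the name a, so the merge history (and therefore the final tree) can be read off the result without a separate recovery pass; ties in wc are broken by enumeration order; all function names are hypothetical.

```python
from itertools import combinations

def quartet(p, r):
    return frozenset([frozenset(p), frozenset(r)])

def weighted_topologies(taxa, given):
    """All 3 topologies per 4-taxa set: weight 1 if given, else 0."""
    w = {}
    for a, b, c, d in combinations(taxa, 4):
        for p, r in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            t = quartet(p, r)
            w[t] = 1 if t in given else 0
    return w

def conflict_weight(w, a, b):
    """wc(a, b): total weight of quartets with a and b on opposite sides."""
    total = 0
    for t, weight in w.items():
        s1, s2 = tuple(t)
        if (a in s1 and b in s2) or (a in s2 and b in s1):
            total += weight
    return total

def sibling_merging(taxa, given):
    taxa = list(taxa)
    w = weighted_topologies(taxa, given)
    while len(taxa) > 3:
        a, b = min(combinations(taxa, 2),
                   key=lambda pair: conflict_weight(w, *pair))
        merged = (a, b)                       # super-taxon as a nested tuple
        new_w = {}
        for t, weight in w.items():
            labels = set().union(*t)
            if a in labels and b in labels:   # involves both: removed
                continue
            t2 = frozenset(frozenset(merged if x in (a, b) else x for x in side)
                           for side in t)     # substitute the super-taxon
            new_w[t2] = new_w.get(t2, 0) + weight   # identical quartets add up
        w = new_w
        taxa = [x for x in taxa if x not in (a, b)] + [merged]
    return tuple(taxa)                        # unrooted tree on 3 (super-)taxa

names = ["ae|cd", "ab|cd", "ab|ce", "ab|cf", "ab|de", "ab|df", "ab|ef",
         "af|cd", "ac|ef", "ad|ef", "be|cd", "bf|cd", "bc|ef", "bd|ef", "cd|ef"]
given = {quartet(*s.split("|")) for s in names}
tree = sibling_merging("abcdef", given)
print(tree)
```

On the compatible 15-quartet example, each round finds a pair with wc = 0, and the three remaining super-taxa are the sibling pairs (a,b), (c,d) and (e,f) joined at a central node.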