A Lookahead Branch-and-Bound

1
A Lookahead Branch-and-Bound Algorithm for
Solving the MQC Problem
Gang Wu, Jia-huai You, and Guohui Lin
Department of Computing Science, University of Alberta
Edmonton, Canada
2
Outline
  • Introduction
  • Research Methods
  • Computational Results and Conclusions

3
Common Phylogeny Terminology
Phylogeny: pattern of historical relationships
among species (taxa).
Tree: mathematical structure used to depict the
evolutionary history of a group of species.
Leaf nodes: represent the taxa (genes,
populations, etc.) used to infer the phylogeny.
Internal nodes: represent hypothetical ancestors
of the taxa.
Root of the tree: common ancestor of all taxa.
[Figure: example tree with nodes labeled A to E,
annotated with leaf nodes, branches or edges,
internal nodes, and the root]
4
General Process of Phylogeny Construction
Input: A set of (DNA or protein) sequences for
the species.
Output: An evolutionary tree (phylogeny) whose
leaf nodes are the input species.
Methods: Maximum Parsimony (MP), Maximum
Likelihood (ML), etc.
These methods are not suitable for large trees
(over 20 species). Current software all uses
heuristics to speed up computations.
5
Quartet Based Phylogeny Construction
  • Only one unrooted tree for one, two or three
    species
  • Three possible unrooted resolved trees for four
    species (A, B, C, D)
  • Quartets are the smallest informative unrooted
    trees
  • MP or ML can be solved exactly on quartets

AB|CD
AC|BD
AD|BC
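As a small illustration of why there are exactly three resolved trees on four species, the sketch below pairs the first species with each of the other three in turn. The function name `quartet_topologies` and the `xy|zw` string encoding are assumptions made for this example only.

```python
def quartet_topologies(a, b, c, d):
    """List the three resolved unrooted topologies on four species.

    A topology is fixed by choosing which species is paired with `a`;
    the remaining two species form the other pair.
    """
    others = [b, c, d]
    topologies = []
    for partner in others:
        rest = [x for x in others if x != partner]
        topologies.append(f"{a}{partner}|{rest[0]}{rest[1]}")
    return topologies


print(quartet_topologies("A", "B", "C", "D"))  # ['AB|CD', 'AC|BD', 'AD|BC']
```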
6
Process of Quartet Based Phylogeny Construction
7
Definitions
A quartet topology ab|cd is consistent with a
phylogeny T, or a phylogeny T satisfies a
quartet topology ab|cd, iff a, b, c, d are all
leaves of T and the path from a to b does not
share any nodes with the path from c to d.
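The definition can be checked directly by computing the two paths and testing that they are disjoint. The following is a minimal sketch, assuming the tree is given as an undirected edge list; the example tree (three pairs of leaves a,b / c,d / e,f joined at a central node) and the internal node names u, v, w, x are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_tree(edges):
    """Build an adjacency dict from a list of undirected edges."""
    tree = {}
    for u, v in edges:
        tree.setdefault(u, set()).add(v)
        tree.setdefault(v, set()).add(u)
    return tree


def path(tree, src, dst):
    """Return the set of nodes on the unique src-dst path (BFS with parents)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in tree[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    nodes = set()
    node = dst
    while node is not None:
        nodes.add(node)
        node = parent[node]
    return nodes


def satisfies(tree, quartet):
    """True iff the tree satisfies a topology written as 'ab|cd'."""
    left, right = quartet.split("|")
    a, b = left
    c, d = right
    leaves = {v for v, nbrs in tree.items() if len(nbrs) == 1}
    if len({a, b, c, d}) != 4 or not {a, b, c, d} <= leaves:
        return False
    return path(tree, a, b).isdisjoint(path(tree, c, d))


# Hypothetical example tree: u, v, w, x are internal nodes.
TREE = build_tree([("a", "u"), ("b", "u"), ("c", "v"), ("d", "v"),
                   ("e", "w"), ("f", "w"), ("u", "x"), ("v", "x"), ("w", "x")])

print(satisfies(TREE, "ae|cd"))  # True: paths a..e and c..d are disjoint
print(satisfies(TREE, "ac|ed"))  # False: both paths pass through x and v
```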
8
Quartet topologies set Q:
ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Phylogeny T:
[Figure: phylogeny T with leaves a, b, c, d, e, f]
Quartet topology ae|cd is consistent with T, or T
satisfies ae|cd.
9
Definitions
Given a set of quartet topologies Q on a set S of
taxa, Q is compatible iff there is a phylogeny
on S which satisfies all the quartet topologies
in Q.
A set Q of quartet topologies is complete iff Q
contains a quartet topology for every four taxa
in the taxa set S.
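Completeness is easy to test mechanically: every 4-subset of S must be covered by some topology in Q. A minimal sketch, assuming single-letter taxon names and the `xy|zw` string encoding (both assumptions of this example); Q below is the 15-topology set listed on the slides, with the separator written explicitly.

```python
from itertools import combinations


def is_complete(quartets, taxa):
    """True iff every four taxa have at least one topology in `quartets`."""
    covered = {frozenset(q.replace("|", "")) for q in quartets}
    return all(frozenset(four) in covered
               for four in combinations(sorted(taxa), 4))


Q = ("ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef "
     "af|cd ac|ef ad|ef be|cd bf|cd bc|ef bd|ef cd|ef").split()

print(is_complete(Q, "abcdef"))      # True: all 15 four-taxon subsets covered
print(is_complete(Q[1:], "abcdef"))  # False: {a, c, d, e} is no longer covered
```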
10
Quartet topologies set Q:
ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Phylogeny T:
[Figure: phylogeny T]
  • Q is compatible
  • Q is complete

11
Problem Descriptions
Quartet Compatibility Problem (QCP)
Input: A set Q of quartet topologies on S.
Question: Is Q compatible? Equivalently, is there
a phylogeny T on S such that all quartet
topologies in Q are satisfied?
12
Input quartet topologies set Q:
ac|ed ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Quartet Compatibility Problem (QCP)? No
MQC or MQI?
13
Input quartet topologies set Q:
ac|ed ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Quartet Compatibility Problem (QCP)? No
MQC or MQI?
Only ac|ed is not satisfied. The satisfied
quartet topology is ae|cd.
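The count of satisfied and unsatisfied topologies in this example can be reproduced with a short sketch. It assumes the phylogeny is represented by its internal-edge bipartitions and uses the standard fact that a tree satisfies ab|cd exactly when some edge separates {a, b} from {c, d}. The three splits below describe a tree with leaf pairs (a,b), (c,d), (e,f), which is an inferred reading of the slide figure rather than something stated in the transcript.

```python
def parse(quartet):
    """Split 'ab|cd' into its two sides as frozensets."""
    left, right = quartet.split("|")
    return frozenset(left), frozenset(right)


def satisfied(splits, quartet):
    """True iff some bipartition separates the two sides of the quartet."""
    left, right = parse(quartet)
    return any((left <= x and right <= y) or (left <= y and right <= x)
               for x, y in splits)


def quartet_errors(splits, quartets):
    """Topologies in `quartets` that the tree does not satisfy."""
    return [q for q in quartets if not satisfied(splits, q)]


# Assumed tree: internal edges split off the pairs {a,b}, {c,d}, {e,f}.
SPLITS = [(frozenset("ab"), frozenset("cdef")),
          (frozenset("cd"), frozenset("abef")),
          (frozenset("ef"), frozenset("abcd"))]

Q = ("ac|ed ab|cd ab|ce ab|cf ab|de ab|df ab|ef "
     "af|cd ac|ef ad|ef be|cd bf|cd bc|ef bd|ef cd|ef").split()

errors = quartet_errors(SPLITS, Q)
print(errors)                # ['ac|ed']
print(len(Q) - len(errors))  # 14 satisfied topologies
```

Under this reading, the satisfied count is the quantity MQC maximizes and the error count is the quantity MQI minimizes.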
14
Known Results
The Quartet Compatibility Problem (QCP) can be
solved in polynomial time if the given quartet
topologies set Q is complete, but it is
NP-complete if Q is incomplete.
The Maximum Quartet Consistency Problem (MQC) and
the Minimum Quartet Inconsistency Problem (MQI)
are NP-complete even if Q is complete.
Exact algorithms: guaranteed to find the
optimal or "best" tree.
Heuristic algorithms: approximate or
quick-and-dirty methods that attempt to find the
optimal tree but cannot guarantee to do so.
15
Known Results
Many heuristics exist. The best known
approximation algorithm is Hypercleaning, with an
approximation ratio of [...] for MQI, where n is
the number of taxa.
Dynamic programming can solve the MQC problem
with 20 taxa in 6 days on a 300 MHz computer.
A fixed-parameter algorithm can solve the MQI
problem with 50 taxa when k [...] 100 in 40
minutes on a 750 MHz computer.
16
Theorems
Local conflict: an incompatible set of 3 quartet
topologies over 5 taxa, for example ab|cd, ac|be
and ac|de.
Theorem 1. Given a complete set of quartet
topologies Q over a set of taxa S and some taxon
e in S, Q is compatible iff there exists no local
conflict whose taxa set includes e.
Idea: Construct a list of the local conflicts
involving a taxon e, and then try to resolve all
the local conflicts in the list by changing fewer
than k quartet topologies.
Method: Branch and Bound
Complexity: $O(4^k n + n^4)$ computation and
$O(kn^4)$ memory.
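The local conflict list for a taxon e can be built by brute force, because there are only 15 resolved trees on five taxa. The sketch below assumes that a local conflict is any three topologies from Q over the same five taxa that no five-leaf tree satisfies simultaneously; the helper names (`norm`, `five_leaf_trees`, `local_conflicts`) are made up for this example and are not taken from the talk.

```python
from itertools import combinations


def norm(p, q, r, s):
    """Canonical, order-independent form of the topology pq|rs."""
    return frozenset({frozenset({p, q}), frozenset({r, s})})


def parse(text):
    (p, q), (r, s) = text.split("|")
    return norm(p, q, r, s)


def five_leaf_trees(taxa):
    """Yield the 5 induced quartets of each of the 15 trees on five taxa.

    Every resolved five-leaf tree has two leaf pairs (w,x) and (y,z)
    and one middle leaf `mid`.
    """
    for mid in taxa:
        rest = [t for t in taxa if t != mid]
        w = rest[0]
        for x in rest[1:]:
            y, z = [t for t in rest[1:] if t != x]
            yield {norm(w, x, y, z), norm(w, x, mid, y), norm(w, x, mid, z),
                   norm(w, mid, y, z), norm(x, mid, y, z)}


def local_conflicts(quartets, taxa, e):
    """All incompatible triples of topologies on five taxa that include e."""
    by_taxa = {frozenset().union(*q): q for q in quartets}
    conflicts = []
    others = sorted(t for t in taxa if t != e)
    for four in combinations(others, 4):
        five = sorted(four + (e,))
        present = [by_taxa[frozenset(s)] for s in combinations(five, 4)
                   if frozenset(s) in by_taxa]
        for triple in combinations(present, 3):
            if not any(set(triple) <= tree for tree in five_leaf_trees(five)):
                conflicts.append(triple)
    return conflicts


REST = "ab|cd ab|ce ab|cf ab|de ab|df ab|ef af|cd ac|ef ad|ef be|cd bf|cd bc|ef bd|ef cd|ef"
Q_GOOD = {parse(t) for t in ("ae|cd " + REST).split()}
Q_BAD = {parse(t) for t in ("ac|ed " + REST).split()}

print(len(local_conflicts(Q_GOOD, "abcdef", "e")))      # 0
print(len(local_conflicts(Q_BAD, "abcdef", "e")) > 0)   # True
```

The two example sets are the compatible and incompatible inputs shown on the earlier slides, so the outcome agrees with Theorem 1: the complete set has no local conflict involving e exactly when it is compatible.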
17
Theorems
Theorem 2. Let m be the number of local conflicts
involving e. We need to change at least [...]
quartet topologies to resolve all the local
conflicts.
This theorem can be used as a bound factor to cut
a node during the Branch-and-Bound search.
18
Theorems
Theorem 3. For a quartet topology q in Q, if
there are more than 3k distinct local conflicts
that contain q, then q must be changed in the
optimal solution.
This theorem can be used as a branch factor to
choose which quartet topology to change.
19
Theorems
  • Theorem 4. For a bipartition (edge) (X,Y) of S
    where $|X| = l$, let
  • p1 be the number of quartet errors in Q across
    (X,Y),
  • p2 the number of nonexchangeable l-subsets on X,
  • p3 the number of nonexchangeable (n-l)-subsets
    on Y.
  • If $2p_1 + (l-1)p_2 + (n-l-1)p_3 < (l-1)(n-l-1)$,
    then bipartition (X,Y) must be in the optimal
    phylogeny.

Quartet inference rules:
ab|cd, ab|ce ⇒ ab|de
ab|cd, ac|de ⇒ ab|ce, ab|de, bc|de
They are used to construct a need-to-be-fixed
quartet list, i.e., the quartet topologies in
the list must not be changed during the search.
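Inference from two topologies over five taxa can be reproduced without hard-coding rules: enumerate the 15 five-leaf trees, keep those that satisfy both premises, and report every topology shared by all of them. This is a brute-force stand-in chosen for the example, not necessarily how the authors apply the rules, and the function names are hypothetical.

```python
def norm(p, q, r, s):
    """Canonical, order-independent form of the topology pq|rs."""
    return frozenset({frozenset({p, q}), frozenset({r, s})})


def parse(text):
    (p, q), (r, s) = text.split("|")
    return norm(p, q, r, s)


def five_leaf_trees(taxa):
    """Yield the 5 induced quartets of each of the 15 trees on five taxa."""
    for mid in taxa:
        rest = [t for t in taxa if t != mid]
        w = rest[0]
        for x in rest[1:]:
            y, z = [t for t in rest[1:] if t != x]
            yield {norm(w, x, y, z), norm(w, x, mid, y), norm(w, x, mid, z),
                   norm(w, mid, y, z), norm(x, mid, y, z)}


def infer(q1, q2):
    """Topologies implied by q1 and q2 when together they span five taxa."""
    taxa = sorted(frozenset().union(*q1) | frozenset().union(*q2))
    if len(taxa) != 5:
        return set()
    consistent = [t for t in five_leaf_trees(taxa) if q1 in t and q2 in t]
    if not consistent:
        return set()  # the two premises already contradict each other
    return set.intersection(*consistent) - {q1, q2}


first = infer(parse("ab|cd"), parse("ab|ce"))
second = infer(parse("ab|cd"), parse("ac|de"))
print(first == {parse("ab|de")})                                    # True
print(second == {parse("ab|ce"), parse("ab|de"), parse("bc|de")})   # True
```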
20
Lookahead
Contribution of changing a quartet topology: the
difference between the sizes of the local
conflict list before and after the quartet
topology is changed.
At each search node, a lookahead mechanism first
tests the contribution of each possible branch
and chooses the one with the maximum contribution
to continue searching.
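The lookahead step can be sketched as follows, assuming the same brute-force notion of a local conflict (three topologies over five taxa including e that no five-leaf tree satisfies). The candidate branches considered here, every alternative topology for every quartet that appears in some conflict, are an assumption of this example; the talk does not spell out the exact branch set.

```python
from itertools import combinations


def norm(p, q, r, s):
    return frozenset({frozenset({p, q}), frozenset({r, s})})


def parse(text):
    (p, q), (r, s) = text.split("|")
    return norm(p, q, r, s)


def five_leaf_trees(taxa):
    """Yield the 5 induced quartets of each of the 15 trees on five taxa."""
    for mid in taxa:
        rest = [t for t in taxa if t != mid]
        w = rest[0]
        for x in rest[1:]:
            y, z = [t for t in rest[1:] if t != x]
            yield {norm(w, x, y, z), norm(w, x, mid, y), norm(w, x, mid, z),
                   norm(w, mid, y, z), norm(x, mid, y, z)}


def local_conflicts(quartets, taxa, e):
    by_taxa = {frozenset().union(*q): q for q in quartets}
    conflicts = []
    for four in combinations(sorted(t for t in taxa if t != e), 4):
        five = sorted(four + (e,))
        present = [by_taxa[frozenset(s)] for s in combinations(five, 4)
                   if frozenset(s) in by_taxa]
        conflicts += [tr for tr in combinations(present, 3)
                      if not any(set(tr) <= t for t in five_leaf_trees(five))]
    return conflicts


def alternatives(q):
    """The two other topologies on the same four taxa."""
    a, b, c, d = sorted(frozenset().union(*q))
    return [t for t in (norm(a, b, c, d), norm(a, c, b, d), norm(a, d, b, c))
            if t != q]


def contribution(quartets, taxa, e, old, new):
    """Size of the conflict list before the change minus its size after."""
    changed = (set(quartets) - {old}) | {new}
    return (len(local_conflicts(quartets, taxa, e))
            - len(local_conflicts(changed, taxa, e)))


def best_change(quartets, taxa, e):
    """Candidate (old, new) change with the largest contribution, or None."""
    candidates = {(q, alt) for tr in local_conflicts(quartets, taxa, e)
                  for q in tr for alt in alternatives(q)}
    return max(candidates,
               key=lambda c: contribution(quartets, taxa, e, *c),
               default=None)


REST = "ab|cd ab|ce ab|cf ab|de ab|df ab|ef af|cd ac|ef ad|ef be|cd bf|cd bc|ef bd|ef cd|ef"
Q_BAD = {parse(t) for t in ("ac|ed " + REST).split()}
m = len(local_conflicts(Q_BAD, "abcdef", "e"))
# Restoring ae|cd removes every local conflict, so its contribution is m.
print(contribution(Q_BAD, "abcdef", "e", parse("ac|ed"), parse("ae|cd")) == m)
```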
21
Outline of Algorithm
  • At every node in the search tree:
  1. Use Theorem 2 to decide whether to cut the
     node (test [...], where k1 is the number of
     quartet topologies changed so far)
  2. Use Theorem 3 to determine need-to-be-changed
     quartets (if there are more than 3(k-k1)
     distinct local conflicts involving q, then q
     must be changed)
  3. Use Theorem 4 to determine need-to-be-fixed
     quartets (find optimal bipartitions; all the
     quartet topologies consistent with the optimal
     bipartitions are fixed)
  4. Use the quartet inference rules on the quartet
     topologies generated in step 3
22
Outline of Algorithm (Cont'd)
  5. Build a local conflict list and partition it
     into two parts.
     IF there are need-to-be-changed quartet
     topologies
       Pick the need-to-be-changed quartet topology
       achieving the largest contribution to resolve
     ELSE
       Pick the resolution achieving the largest
       contribution
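A heavily simplified version of the search can be sketched as a depth-bounded recursion. It omits steps 1 to 4 (Theorems 2 to 4 and the inference rules), allows at most k changes rather than fewer than k, branches on every alternative for each quartet of the first local conflict, and orders branches by how many conflicts remain afterwards. All names and these simplifications are assumptions of the example, not the authors' implementation.

```python
from itertools import combinations


def norm(p, q, r, s):
    return frozenset({frozenset({p, q}), frozenset({r, s})})


def parse(text):
    (p, q), (r, s) = text.split("|")
    return norm(p, q, r, s)


def five_leaf_trees(taxa):
    """Yield the 5 induced quartets of each of the 15 trees on five taxa."""
    for mid in taxa:
        rest = [t for t in taxa if t != mid]
        w = rest[0]
        for x in rest[1:]:
            y, z = [t for t in rest[1:] if t != x]
            yield {norm(w, x, y, z), norm(w, x, mid, y), norm(w, x, mid, z),
                   norm(w, mid, y, z), norm(x, mid, y, z)}


def local_conflicts(quartets, taxa, e):
    by_taxa = {frozenset().union(*q): q for q in quartets}
    conflicts = []
    for four in combinations(sorted(t for t in taxa if t != e), 4):
        five = sorted(four + (e,))
        present = [by_taxa[frozenset(s)] for s in combinations(five, 4)
                   if frozenset(s) in by_taxa]
        conflicts += [tr for tr in combinations(present, 3)
                      if not any(set(tr) <= t for t in five_leaf_trees(five))]
    return conflicts


def alternatives(q):
    a, b, c, d = sorted(frozenset().union(*q))
    return [t for t in (norm(a, b, c, d), norm(a, c, b, d), norm(a, d, b, c))
            if t != q]


def solve(quartets, taxa, e, k):
    """Return a set with no local conflict using at most k changes, else None."""
    quartets = set(quartets)
    conflicts = local_conflicts(quartets, taxa, e)
    if not conflicts:
        return quartets
    if k == 0:
        return None  # budget exhausted: cut this node
    branches = []
    for old in conflicts[0]:
        for new in alternatives(old):
            changed = (quartets - {old}) | {new}
            branches.append((len(local_conflicts(changed, taxa, e)), changed))
    branches.sort(key=lambda b: b[0])  # lookahead: fewest conflicts first
    for _, changed in branches:
        result = solve(changed, taxa, e, k - 1)
        if result is not None:
            return result
    return None


REST = "ab|cd ab|ce ab|cf ab|de ab|df ab|ef af|cd ac|ef ad|ef be|cd bf|cd bc|ef bd|ef cd|ef"
Q_BAD = {parse(t) for t in ("ac|ed " + REST).split()}
print(solve(Q_BAD, "abcdef", "e", 0) is None)   # True: no change allowed
fixed = solve(Q_BAD, "abcdef", "e", 1)
print(fixed is not None and len(fixed - Q_BAD) <= 1)  # True
```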
23
Experimental Results
Comparison between the GN algorithm and our
algorithm on finding the first solution with
fewer than k quartet errors.
24
Experimental Results
Comparison among Hypercleaning, LBnB-1st, and
LBnB-Opt.
Hypercleaning: a heuristic algorithm for the MQC
problem.
LBnB-1st: stops when the first solution is found.
LBnB-Opt: searches all possible solutions and
outputs the optimal one.
25
Conclusions
  • Our algorithm can be regarded as an improvement
    over the GN algorithm.
  • It outperforms other exact algorithms
    significantly in both finding the first solution
    and the optimal solution.
  • In some instances, our algorithm has running
    times competitive with the heuristic
    Hypercleaning method.

26
Acknowledgement
  • This research work was supported by
  • CFI
  • NSERC
  • NNSF Grant 60373012

Thanks