Title: A Lookahead Branch-and-Bound Algorithm for Solving the MQC Problem
1. A Lookahead Branch-and-Bound Algorithm for Solving the MQC Problem
Gang Wu, Jia-huai You, and Guohui Lin
Department of Computing Science, University of Alberta, Edmonton, Canada
2. Outline
- Introduction 
- Research Methods 
- Computational Results and Conclusions 
3. Common Phylogeny Terminology
Phylogeny: the pattern of historical relationships among species (taxa).
Tree: a mathematical structure used to depict the evolutionary history of a group of species.
Leaf nodes (A, B, C, D, E in the figure): represent the taxa (genes, populations, etc.) used to infer the phylogeny.
Internal nodes: represent hypothetical ancestors of the taxa.
Branches (edges): the connections between nodes.
Root of the tree: the common ancestor of all taxa.
4. General Process of Phylogeny Construction
Input: a set of (DNA or protein) sequences for the species.
Output: an evolutionary tree (phylogeny) whose leaf nodes are the input species.
Methods: Maximum Parsimony (MP), Maximum Likelihood (ML), etc.
These methods are not suitable for large trees (over 20 species). Current software relies on heuristics to speed up the computation.
5. Quartet Based Phylogeny Construction
- Only one unrooted tree for one, two or three species
- Three possible unrooted resolved trees for four species (A, B, C, D): AB|CD, AC|BD, AD|BC
- Quartets are the smallest informative unrooted trees
- MP or ML can be solved exactly on quartets
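
The count of three comes from choosing the partner of the first taxon. A minimal Python sketch of this, where the function name `quartet_topologies` is illustrative and not taken from the slides:

```python
from math import comb

def quartet_topologies(taxa):
    """Return the three resolved unrooted topologies on four taxa, e.g. 'AB|CD'."""
    a, b, c, d = sorted(taxa)
    # The partner of the first taxon determines the split of the quartet.
    return [f"{a}{b}|{c}{d}", f"{a}{c}|{b}{d}", f"{a}{d}|{b}{c}"]

print(quartet_topologies("ABCD"))  # ['AB|CD', 'AC|BD', 'AD|BC']
print(comb(20, 4))                 # 4845 four-taxon subsets for 20 species
```

The second line only illustrates how quickly the number of four-taxon subsets grows with the number of species.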
6. Process of Quartet Based Phylogeny Construction
7. Definitions
A quartet topology ab|cd is consistent with a phylogeny T, or a phylogeny T satisfies a quartet topology ab|cd, iff a, b, c, d are all leaves of T and the path from a to b does not share any nodes with the path from c to d.
8. Phylogeny T with leaves a, b, c, d, e, f (figure not reproduced)
Quartet topologies set Q:
ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Quartet topology ae|cd is consistent with T, or T satisfies ae|cd.
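
The path-based definition can be checked directly. The sketch below is in Python and rests on an assumption: the figure of T is not available, so the tree used here is the one with cherries (a,b), (c,d), (e,f) joined at a central node, which is a tree that satisfies every topology in the listed Q. The internal node names U, V, W, X and all function names are illustrative.

```python
def build_tree(edges):
    """Adjacency dict for an undirected tree given as (node, node) pairs."""
    tree = {}
    for u, v in edges:
        tree.setdefault(u, []).append(v)
        tree.setdefault(v, []).append(u)
    return tree

def path(tree, src, dst):
    """Nodes on the unique path from src to dst (depth-first search)."""
    stack, seen = [(src, [src])], {src}
    while stack:
        node, trail = stack.pop()
        if node == dst:
            return trail
        for nxt in tree[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, trail + [nxt]))
    raise ValueError(f"{dst} is not reachable from {src}")

def satisfies(tree, topology):
    """True if the a-b path and the c-d path of 'ab|cd' share no node."""
    (a, b), (c, d) = topology.split("|")
    return set(path(tree, a, b)).isdisjoint(path(tree, c, d))

# Assumed shape of T: three cherries attached to the central node X.
T = build_tree([("a", "U"), ("b", "U"), ("c", "V"), ("d", "V"),
                ("e", "W"), ("f", "W"), ("U", "X"), ("V", "X"), ("W", "X")])

print(satisfies(T, "ae|cd"))  # True
print(satisfies(T, "ac|ed"))  # False: both paths pass through X and V
```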
9. Definitions
Given a set of quartet topologies Q on a set S of taxa, Q is compatible iff there is a phylogeny on S which satisfies all the quartet topologies in Q.
A set Q of quartet topologies is complete iff Q contains a quartet topology for every four-taxon subset of S.
10. Quartet topologies set Q and phylogeny T (figure not reproduced)
ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
- Q is compatible
- Q is complete
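
Completeness is a purely combinatorial property and is easy to test. A minimal Python sketch, assuming quartet topologies are written as strings such as `ab|cd` (the function name `is_complete` is illustrative); it does not test compatibility:

```python
from itertools import combinations

def is_complete(quartets, taxa):
    """True if every four-taxon subset of taxa has exactly one topology in quartets."""
    seen = [frozenset(q.replace("|", "")) for q in quartets]
    wanted = {frozenset(s) for s in combinations(taxa, 4)}
    return len(seen) == len(set(seen)) and set(seen) == wanted

Q = ("ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef af|cd ac|ef ad|ef "
     "be|cd bf|cd bc|ef bd|ef cd|ef").split()

print(is_complete(Q, "abcdef"))       # True: 15 topologies for 15 subsets
print(is_complete(Q[:-1], "abcdef"))  # False: the subset {c, d, e, f} is missing
```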
11. Problem Descriptions
Quartet Compatibility Problem (QCP)
Input: a set Q of quartet topologies on S.
Question: is Q compatible? Equivalently, is there a phylogeny T on S such that all quartet topologies in Q are satisfied?
13. Input quartet topologies set Q
ac|ed ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Quartet Compatibility Problem (QCP)? No.
MQC or MQI? Only ac|ed is not satisfied. The satisfied quartet topology is ae|cd.
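
The two objectives count the same thing from opposite sides: MQC maximizes the number of satisfied quartet topologies, MQI minimizes the number of unsatisfied ones. The Python sketch below counts both for this input. It assumes the tree T is the one with cherries (a,b), (c,d), (e,f), represented by one side of each internal split, and uses the standard fact that a tree satisfies ab|cd exactly when one of its edges separates {a, b} from {c, d}. All names are illustrative.

```python
def separates(side, taxa, topology):
    """True if the split side | rest puts the two pairs of 'ab|cd' on opposite sides."""
    side = set(side)
    rest = set(taxa) - side
    left, right = (set(p) for p in topology.split("|"))
    return (left <= side and right <= rest) or (right <= side and left <= rest)

def quartet_errors(splits, taxa, quartets):
    """Quartet topologies that no split of the tree satisfies."""
    return [q for q in quartets if not any(separates(s, taxa, q) for s in splits)]

taxa = "abcdef"
T_splits = ["ab", "cd", "ef"]  # assumed tree: ab|cdef, cd|abef, ef|abcd
Q = ("ac|ed ab|cd ab|ce ab|cf ab|de ab|df ab|ef af|cd ac|ef ad|ef "
     "be|cd bf|cd bc|ef bd|ef cd|ef").split()

errors = quartet_errors(T_splits, taxa, Q)
print(errors)                # ['ac|ed']
print(len(Q) - len(errors))  # 14 satisfied topologies
```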
14. Known Results
The Quartet Compatibility Problem (QCP) can be solved in polynomial time if the given quartet topology set Q is complete, but it is NP-complete if Q is incomplete.
The Maximum Quartet Consistency Problem (MQC) and the Minimum Quartet Inconsistency Problem (MQI) are NP-hard even if Q is complete.
Exact algorithms: guaranteed to find the optimal ("best") tree.
Heuristic algorithms: approximate or quick-and-dirty methods that attempt to find the optimal tree but cannot guarantee to do so.
15. Known Results
Many heuristics exist. The best known approximation algorithm is Hypercleaning, with an approximation ratio of [expression missing in source] for MQI, where n is the number of taxa.
Dynamic programming can solve the MQC problem with 20 taxa in 6 days on a 300 MHz computer.
A fixed-parameter algorithm can solve the MQI problem with 50 taxa when k [relation missing in source] 100 in 40 minutes on a 750 MHz computer.
16. Theorems
Local conflict: an incompatible set of 3 quartet topologies over 5 taxa, for example ab|cd, ac|be and ac|de.
Theorem 1. Given a complete set of quartet topologies Q over a set of taxa S and some taxon e in S, Q is compatible iff there exists no local conflict whose taxa set includes e.
Idea: construct a list of the local conflicts involving a taxon e, and then try to resolve all the local conflicts in the list by changing fewer than k quartet topologies.
Method: branch and bound.
Complexity: $O(4^k n + n^4)$ computation and $O(k n^4)$ memory.
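
Whether three topologies form a local conflict can be decided by brute force, since there are only 15 resolved unrooted trees on five taxa (5 choices of the middle taxon times 3 ways to pair the rest into two cherries). The Python sketch below uses that enumeration; it is an illustration of the definition, not the authors' implementation, and the function name is an assumption. On five taxa, a resolved tree satisfies a quartet topology exactly when one of the topology's two pairs is a cherry of the tree.

```python
def is_local_conflict(triple):
    """True if no resolved five-taxon tree satisfies all three topologies."""
    taxa = sorted(set("".join(triple)) - {"|"})
    if len(taxa) != 5:
        raise ValueError("a local conflict spans exactly five taxa")
    for middle in taxa:
        w, x, y, z = [t for t in taxa if t != middle]
        # The three ways to pair the four remaining taxa into two cherries.
        for cherries in (({w, x}, {y, z}), ({w, y}, {x, z}), ({w, z}, {x, y})):
            if all(any(set(pair) in cherries for pair in q.split("|"))
                   for q in triple):
                return False  # this tree satisfies all three topologies
    return True

print(is_local_conflict(["ab|cd", "ac|be", "ac|de"]))  # True (the slide's example)
print(is_local_conflict(["ab|cd", "ab|ce", "ab|de"]))  # False
```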
17. Theorems
Theorem 2. Let m be the number of local conflicts involving e. We need to change at least [expression missing in source] quartet topologies to resolve all the local conflicts.
This theorem can be used as a bounding criterion to cut a node during the branch-and-bound search.
18. Theorems
Theorem 3. For a quartet topology q in Q, if there are more than 3k distinct local conflicts that contain q, then q must be changed in the optimal solution.
This theorem can be used as a branching criterion to choose which quartet topology to change.
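
Applying Theorem 3 amounts to counting, for each quartet topology, the local conflicts that contain it. A minimal Python sketch, assuming the conflict list is already available as triples of labels; the function name and the toy data are made up for illustration and are not results from the slides:

```python
from collections import Counter

def forced_changes(conflicts, k):
    """Quartet topologies contained in more than 3k distinct local conflicts."""
    counts = Counter(q for triple in conflicts for q in set(triple))
    return sorted(q for q, n in counts.items() if n > 3 * k)

# Toy conflict list: 'q' occurs in four conflicts, every other label in one.
toy = [("q", f"r{i}", f"s{i}") for i in range(4)]

print(forced_changes(toy, 1))  # ['q'] because 4 > 3
print(forced_changes(toy, 2))  # [] because 4 is not more than 6
```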
19. Theorems
Theorem 4. For a bipartition (edge) (X, Y) of S where |X| = l, let
- $p_1$ be the number of quartet errors in Q across (X, Y),
- $p_2$ be the number of nonexchangeable l-subsets on X,
- $p_3$ be the number of nonexchangeable (n-l)-subsets on Y.
If $2p_1 + (l-1)p_2 + (n-l-1)p_3 < (l-1)(n-l-1)$, then bipartition (X, Y) must be in the optimal phylogeny.
Quartet inference rules:
ab|cd, ab|ce ⇒ ab|de
ab|cd, ac|de ⇒ ab|ce, ab|de, bc|de
They are used to construct a need-to-be-fixed quartet list, i.e., all the quartet topologies in the list should not be changed during search.
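
The two standard dyadic inference rules on five taxa, ab|cd, ab|ce ⇒ ab|de and ab|cd, ac|de ⇒ ab|ce, ab|de, bc|de, can be verified exhaustively. The Python sketch below checks a rule against the 15 resolved unrooted trees on the taxa a to e; restricting the check to resolved trees is a simplifying assumption, and the function names are illustrative.

```python
def five_taxon_trees(taxa="abcde"):
    """Yield each resolved unrooted five-taxon tree as its two cherries."""
    for middle in taxa:
        w, x, y, z = [t for t in taxa if t != middle]
        yield from (({w, x}, {y, z}), ({w, y}, {x, z}), ({w, z}, {x, y}))

def displays(cherries, topology):
    """On five taxa, a tree satisfies 'ab|cd' iff one of the two pairs is a cherry."""
    return any(set(pair) in cherries for pair in topology.split("|"))

def rule_holds(premises, conclusions):
    """True if every tree satisfying the premises also satisfies the conclusions."""
    models = [t for t in five_taxon_trees()
              if all(displays(t, p) for p in premises)]
    return bool(models) and all(displays(t, c) for t in models for c in conclusions)

print(rule_holds(["ab|cd", "ab|ce"], ["ab|de"]))                    # True
print(rule_holds(["ab|cd", "ac|de"], ["ab|ce", "ab|de", "bc|de"]))  # True
print(rule_holds(["ab|cd"], ["ab|ce"]))                             # False
```

The last call shows that a single premise is not enough: the tree with cherries (c,d) and (b,e) satisfies ab|cd but not ab|ce.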
20. Lookahead
Contribution of changing a quartet topology: the difference between the sizes of the local conflict list before and after the change.
At each search node, a lookahead mechanism first tests the contribution of each possible branch, and the search continues along the branch with the maximum contribution.
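
The contribution measure can be sketched in Python as follows. This is a simplified illustration, not the authors' code: it tries every single-topology change rather than only the branches of one conflict, detects local conflicts by brute force over the 15 resolved five-taxon trees, and uses illustrative function names throughout.

```python
from itertools import combinations

def is_conflict(triple):
    """True if no resolved five-taxon tree satisfies all three topologies."""
    taxa = sorted(set("".join(triple)) - {"|"})
    for middle in taxa:
        w, x, y, z = [t for t in taxa if t != middle]
        for cherries in (({w, x}, {y, z}), ({w, y}, {x, z}), ({w, z}, {x, y})):
            if all(any(set(p) in cherries for p in q.split("|")) for q in triple):
                return False
    return True

def conflicts_involving(quartets, taxa, e):
    """All local conflicts whose five taxa include e."""
    by_taxa = {frozenset(q.replace("|", "")): q for q in quartets}
    found = []
    for rest in combinations(sorted(set(taxa) - {e}), 4):
        five = (e,) + rest
        present = [by_taxa[frozenset(s)] for s in combinations(five, 4)
                   if frozenset(s) in by_taxa]
        found += [t for t in combinations(present, 3) if is_conflict(t)]
    return found

def alternatives(q):
    """The two other topologies on the same four taxa."""
    a, b, c, d = sorted(q.replace("|", ""))
    options = [f"{a}{b}|{c}{d}", f"{a}{c}|{b}{d}", f"{a}{d}|{b}{c}"]
    left, right = (set(p) for p in q.split("|"))
    return [o for o in options if set(o[:2]) not in (left, right)]

def best_change(quartets, taxa, e):
    """Single change (old, new) with the largest contribution, and that contribution."""
    before = len(conflicts_involving(quartets, taxa, e))
    best, best_gain = None, None
    for q in quartets:
        for alt in alternatives(q):
            trial = [alt if x == q else x for x in quartets]
            gain = before - len(conflicts_involving(trial, taxa, e))
            if best_gain is None or gain > best_gain:
                best, best_gain = (q, alt), gain
    return best, best_gain

Q = ("ac|ed ab|cd ab|ce ab|cf ab|de ab|df ab|ef af|cd ac|ef ad|ef "
     "be|cd bf|cd bc|ef bd|ef cd|ef").split()
change, gain = best_change(Q, "abcdef", "e")
print(change)  # ('ac|ed', 'ae|cd')
```

On this six-taxon input, the change with the largest contribution is the one that restores ae|cd, which removes every local conflict involving e.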
21. Outline of Algorithm
At every node in the search tree:
1. Use Theorem 2 to decide whether to cut the node (test [expression missing in source], where k1 is the number of quartet topologies changed so far).
2. Use Theorem 3 to determine need-to-be-changed quartets (if there are more than 3(k-k1) distinct local conflicts involving q, then q must be changed).
3. Use Theorem 4 to determine need-to-be-fixed quartets (find optimal bipartitions; all the quartet topologies consistent with the optimal bipartitions are fixed).
4. Use the quartet inference rules on the quartet topologies generated in step 3.
22. Outline of Algorithm (Contd.)
5. Build a local conflict list and partition it into two parts.
   IF there are need-to-be-changed quartet topologies: pick the need-to-be-changed quartet topology achieving the largest contribution to resolve.
   ELSE: pick the way of resolving a conflict that achieves the largest contribution.
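
For orientation, the search that these steps refine can be sketched as a plain bounded search tree in Python. This sketch deliberately omits the pruning of Theorems 2 to 4, the inference rules and the lookahead ordering: it only branches on the topologies of the first remaining local conflict with a budget of k changes, and relies on Theorem 1 (complete input, no local conflict involving e) to stop. All function names are illustrative assumptions.

```python
from itertools import combinations

def is_conflict(triple):
    """True if no resolved five-taxon tree satisfies all three topologies."""
    taxa = sorted(set("".join(triple)) - {"|"})
    for middle in taxa:
        w, x, y, z = [t for t in taxa if t != middle]
        for cherries in (({w, x}, {y, z}), ({w, y}, {x, z}), ({w, z}, {x, y})):
            if all(any(set(p) in cherries for p in q.split("|")) for q in triple):
                return False
    return True

def conflicts_involving(quartets, taxa, e):
    """All local conflicts whose five taxa include e."""
    by_taxa = {frozenset(q.replace("|", "")): q for q in quartets}
    found = []
    for rest in combinations(sorted(set(taxa) - {e}), 4):
        five = (e,) + rest
        present = [by_taxa[frozenset(s)] for s in combinations(five, 4)
                   if frozenset(s) in by_taxa]
        found += [t for t in combinations(present, 3) if is_conflict(t)]
    return found

def alternatives(q):
    """The two other topologies on the same four taxa."""
    a, b, c, d = sorted(q.replace("|", ""))
    options = [f"{a}{b}|{c}{d}", f"{a}{c}|{b}{d}", f"{a}{d}|{b}{c}"]
    left, right = (set(p) for p in q.split("|"))
    return [o for o in options if set(o[:2]) not in (left, right)]

def repair(quartets, taxa, e, k):
    """Conflict-free quartet list reachable with at most k changes, or None."""
    conflicts = conflicts_involving(quartets, taxa, e)
    if not conflicts:
        return quartets   # compatible by Theorem 1 when the input is complete
    if k == 0:
        return None       # budget exhausted: cut this node
    for q in conflicts[0]:  # some topology of this conflict has to change
        for alt in alternatives(q):
            trial = [alt if x == q else x for x in quartets]
            fixed = repair(trial, taxa, e, k - 1)
            if fixed is not None:
                return fixed
    return None

Q = ("ac|ed ab|cd ab|ce ab|cf ab|de ab|df ab|ef af|cd ac|ef ad|ef "
     "be|cd bf|cd bc|ef bd|ef cd|ef").split()
print(repair(Q, "abcdef", "e", 0))     # None
print(repair(Q, "abcdef", "e", 1)[0])  # ae|cd
```

With a budget of one change, the only repair of this input is to replace ac|ed by ae|cd.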
23. Experimental Results
Comparison between the GN algorithm and our algorithm on finding the first solution whose quartet errors are less than k.
24. Experimental Results
Comparison among Hypercleaning, LBnB-1st, and LBnB-Opt.
- Hypercleaning is a heuristic algorithm for the MQC problem.
- LBnB-1st stops when the first solution is found.
- LBnB-Opt searches all possible solutions and outputs the optimal one.
25. Conclusions
- Our algorithm can be regarded as an improvement 
 over the GN algorithm.
- It outperforms other exact algorithms 
 significantly in both finding the first solution
 and the optimal solution.
- In some instances, our algorithm has running times competitive with the heuristic Hypercleaning method.
26. Acknowledgement
This research work was supported by:
- CFI
- NSERC
- NNSF Grant 60373012
Thanks