Title: A Faster Reconstruction of Binary Near-Perfect Phylogenetic Trees
1. A Faster Reconstruction of Binary Near-Perfect Phylogenetic Trees
- Srinath Sridhar
- Joint work with Kedar Dhamdhere, Guy E. Blelloch, Eran Halperin, R. Ravi and Russell Schwartz
2. Steiner Tree Problem
- Input: graph G(V, E) with edge weights w: E → R and a terminal set S ⊆ V
- Output: subtree T of G connecting all vertices in S
- Objective: minimize w(T)
- Informally: an MST that may use intermediate (non-terminal) vertices
- NP-complete, even if G is an m-dimensional hypercube with unit edge weights
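The "MST with intermediate vertices" intuition can be checked on a tiny hypercube. The sketch below is a brute-force illustration only, not the algorithm of this talk: it tries every small set of extra hypercube vertices and takes the cheapest MST under Hamming distance (which equals shortest-path distance on a unit-weight hypercube). The function names and the three-terminal instance are assumptions made for the illustration.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions where two equal-length bit-strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mst_length(nodes):
    """Prim's algorithm on the complete graph with Hamming-distance weights."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0
    in_tree = {nodes[0]}
    total = 0
    while len(in_tree) < len(nodes):
        # Cheapest edge leaving the current tree.
        w, v = min(
            (hamming(u, x), x)
            for u in in_tree
            for x in nodes
            if x not in in_tree
        )
        total += w
        in_tree.add(v)
    return total


def steiner_length(terminals):
    """Exhaustive Steiner tree length; exponential, for tiny inputs only.

    A minimal Steiner tree in a metric needs at most len(terminals) - 2
    Steiner nodes, so only subsets up to that size are tried.
    """
    m = len(terminals[0])
    candidates = [
        "".join(bits)
        for bits in product("01", repeat=m)
        if "".join(bits) not in terminals
    ]
    best = mst_length(terminals)
    for r in range(1, max(len(terminals) - 2, 0) + 1):
        for extra in combinations(candidates, r):
            best = min(best, mst_length(list(terminals) + list(extra)))
    return best


# Hypothetical instance: three corners of the 3-dimensional hypercube.
terminals = ["110", "101", "011"]
mst_only = mst_length(terminals)
with_steiner = steiner_length(terminals)
```

On this instance the plain MST must use two edges of length 2, while adding the vertex 111 as a Steiner node connects all three terminals with three unit edges.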
3. Near-Perfect Phylogenetic Trees
- Input: set S of n points on an m-dimensional hypercube (n bit-strings of length m)
- Output: Steiner (unrooted) tree T connecting S using intermediate nodes (Steiner nodes) of the hypercube
- Objective: minimize |T|, the length of T
- Assumption: |T_opt| ≤ m + q for a constant q
4. Why is this important? (Foster et al., 98)
5. Why is this important? (Wirth et al., 04)
6. Typical Input Data
- Rows: different species, languages, etc.
- Columns: yes/no (0/1) properties of rows
- Phenotypes: each column can represent a binary question (thumbs? color-blind?)
- DNA: each position has 2 possibilities (almost always)
7. Example
           H  W  RS  B/NB
Basilisk   1  1  0   0
Boggart    0  0  0   1
Centaur    1  0  1   1
Goblin     1  0  0   1
- H: Head
- W: Wings
- RS: Can read stars
- B/NB: Bad/not-so-bad
Figure: tree with edges 0001 Boggart - 1001 Goblin, 1001 Goblin - 1011 Centaur, 1001 Goblin - 1000 Steiner, and 1000 Steiner - 1100 Basilisk.
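As a quick check of the example, the sketch below lists every pair of labeled bit-strings at Hamming distance 1. The dictionary layout and helper name are illustrative assumptions; the bit-strings are the five from the slide, including the unlabeled Steiner node 1000.

```python
from itertools import combinations

# The five nodes of the example tree (four species plus one Steiner node).
nodes = {
    "Boggart": "0001",
    "Goblin": "1001",
    "Steiner": "1000",
    "Centaur": "1011",
    "Basilisk": "1100",
}


def hamming(a, b):
    """Number of positions where two equal-length bit-strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Pairs that differ in exactly one column are hypercube edges.
edges = sorted(
    (u, v)
    for u, v in combinations(nodes, 2)
    if hamming(nodes[u], nodes[v]) == 1
)

# Every edge has length 1, so the tree length is the number of edges.
tree_length = len(edges)
```

Exactly four such pairs exist and they connect all five nodes, so they form a tree of length 4, equal to the number of columns m.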
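8. Perfectness
- Annotate each edge of tree T with the column that flips along it
- Tree T is perfect if each annotation occurs only once
- Evolution is assumed to be (nearly) perfect
Figure: the example tree with edge annotations: 0001 Boggart - 1001 Goblin (1), 1001 Goblin - 1000 Steiner (4), 1001 Goblin - 1011 Centaur (3), 1000 Steiner - 1100 Basilisk (2).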
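9. Perfectness
- Annotate each edge of tree T with the column that flips along it
- Tree T is perfect if each annotation occurs only once
- Evolution is assumed to be (nearly) perfect
- q-near-perfect: |T_opt| ≤ m + q for a constant q
Figure: annotated tree with edges 0001 Boggart - 1001 Goblin (1), 1001 Goblin - 1000 Dementor (4), 1001 Goblin - 1011 Centaur (3), 1000 Dementor - 1100 Basilisk (2), and 1100 Basilisk - 1101 Hippogriff (4). Column 4 flips twice, so this tree is 1-near-perfect.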
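The annotation rule can be sketched in a few lines: label each edge with the column that flips, count how often each label occurs, and compare the tree length with m. This is a minimal illustration, assuming the penalty of a tree is its length minus the number of columns m (with every column flipping at least once); the function names are hypothetical and the edge lists are read off the two figures.

```python
from collections import Counter


def flipped_columns(a, b):
    """1-based indices of the columns where two bit-strings differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def annotate(edges):
    """Count how many tree edges flip each column."""
    counts = Counter()
    for a, b in edges:
        cols = flipped_columns(a, b)
        if len(cols) != 1:
            raise ValueError(f"{a}-{b} is not a hypercube edge")
        counts[cols[0]] += 1
    return counts


def penalty(edges, m):
    """Tree length minus m; 0 means every annotation occurs exactly once."""
    return sum(annotate(edges).values()) - m


# Tree from the first Perfectness figure (four species plus node 1000).
perfect_tree = [
    ("0001", "1001"),  # column 1
    ("1001", "1000"),  # column 4
    ("1001", "1011"),  # column 3
    ("1000", "1100"),  # column 2
]
# Second figure: Hippogriff 1101 attached to Basilisk 1100 (column 4 again).
near_perfect_tree = perfect_tree + [("1100", "1101")]

p0 = penalty(perfect_tree, m=4)
p1 = penalty(near_perfect_tree, m=4)
```

The first tree has penalty 0 and the second has penalty 1 because column 4 is used on two edges. The penalty describes the tree as drawn; it does not by itself show that no shorter tree exists for the same rows.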
10. General Phylogeny Problem
- Input: set S of n strings in {1, ..., k}^m
- Output: Steiner tree T connecting all of S (Hamming distance)
- Objective: minimize |T|
- Variants:
  - k is bounded by a constant; k = 2
  - Tree T is perfect
  - Tree T is near-perfect
11. Some Prior Work
States k  | Perfectness | Time                                  | Work
2         | perfect     | O(nm)                                 | Gusfield, 92
unbounded | perfect     | NP-complete                           | Bodlaender et al., 92; Steel, 92
constant  | perfect     | O(2^(3k)(nm^3 + m^4)); O(2^(2k) nm^2) | Agarwala, Fernandez-Baca, 93; Kannan, Warnow, 97
constant  | q-near      |                                       | Fernandez-Baca, Lagergren, 03
2         | q-near      | q^O(q) nm + O(nm^2)                   | Our work
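For the binary perfect case (k = 2), existence of a perfect phylogeny can be tested with the classical four-gamete condition: an unrooted perfect phylogeny exists exactly when no pair of columns shows all four combinations 00, 01, 10, 11. The sketch below is this simple O(nm^2) pairwise test, not the O(nm) algorithm cited in the table; the function name and the extra row 0100 are assumptions added for illustration.

```python
from itertools import combinations


def has_perfect_phylogeny(rows):
    """Four-gamete test on a list of equal-length bit-strings."""
    m = len(rows[0])
    for i, j in combinations(range(m), 2):
        # Distinct value pairs seen in columns i and j.
        gametes = {(r[i], r[j]) for r in rows}
        if len(gametes) == 4:
            return False
    return True


# Rows of the running example: Basilisk, Boggart, Centaur, Goblin.
example = ["1100", "0001", "1011", "1001"]
ok = has_perfect_phylogeny(example)

# Hypothetical extra row: columns 1 and 2 now show 11, 00, 10 and 01.
conflicting = example + ["0100"]
bad = has_perfect_phylogeny(conflicting)
```

The example matrix passes the test, while the matrix with the added row fails it on the first two columns.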
12. Overview
- Discover O(q) edges and the topology they induce
Figure: optimal tree
13. Overview
- Discover the assignment of rows to super nodes
Figure: optimal tree
14. Overview
- Grow a perfect phylogeny within each super node
Figure: optimal tree
15. Overview
- Link the super nodes
Figure: optimal tree
16. Current/Future Work
- Simpler algorithm
- States k > 2, near-perfect
- Experimental evaluation, usable code
- Related harder problem: the input is a mixture of 2 strings over {0, 1}^m
  Input: 2 0 1 1 2
  Output: 1 0 1 1 0 and 0 0 1 1 1
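The mixture encoding can be read off the slide's example: a position is written 2 where the two strings differ and keeps the shared bit otherwise. The sketch below assumes that reading; the function names are hypothetical. It mixes two strings and enumerates every unordered pair of strings consistent with a given mixture, which shows why the input alone does not determine the output.

```python
from itertools import product


def mix(a, b):
    """Encode two bit-strings: shared bit where equal, '2' where they differ."""
    return "".join(x if x == y else "2" for x, y in zip(a, b))


def explanations(g):
    """All unordered pairs of bit-strings whose mixture is g."""
    het = [i for i, c in enumerate(g) if c == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        a, b = list(g), list(g)
        for i, bit in zip(het, bits):
            a[i] = bit
            b[i] = "1" if bit == "0" else "0"  # complementary bit
        pairs.add(tuple(sorted(("".join(a), "".join(b)))))
    return sorted(pairs)


mixed = mix("10110", "00111")
candidates = explanations("20112")
```

For the slide's input 2 0 1 1 2 there are two candidate pairs, one of which is the pair shown as the output. A mixture with h positions equal to 2 has 2^(h-1) candidate pairs when h is at least 1, which is one reason the problem is harder.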