Title: Adaptive Fast Convergence Towards Optimal Reconstruction Guarantees for Phylogenetic Trees
1. Adaptive Fast Convergence: Towards Optimal Reconstruction Guarantees for Phylogenetic Trees
- Ilan Gronau
- Technion – Israel Institute of Technology
- Haifa, Israel
Joint work with Shlomo Moran and Sagi Snir
2. Phylogenetic Reconstruction
[Figure: true tree and reconstructed tree on taxa A to J; sequence length k]
Goal: reconstruct the true tree as accurately as possible.
3. Evaluating the Reconstructed Tree
[Figure: true tree and reconstructed tree on taxa A to J]
- False negatives (FN): edges in the true tree that we don't reconstruct.
- False positives (FP): edges we reconstruct that aren't in the true tree.
We'd like to reduce the number of reconstruction errors (FP and FN).
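One standard way to make FP and FN concrete is to compare the two trees by their splits (the bipartitions of the taxa induced by internal edges). The sketch below assumes each tree is given as a list of splits, each written as one side of its bipartition; the function names `normalize` and `fp_fn` are hypothetical, and leaf edges are ignored since every tree has them.

```python
def normalize(side, taxa, ref):
    """Represent a split by the side that does NOT contain the reference taxon."""
    side = frozenset(side)
    return frozenset(taxa) - side if ref in side else side


def fp_fn(true_splits, rec_splits, taxa):
    """Return (false positives, false negatives) between two trees.

    Each tree is a collection of splits (one side of each internal edge's
    bipartition); trivial leaf edges are omitted.
    """
    ref = min(taxa)  # any fixed taxon works as the reference
    t = {normalize(s, taxa, ref) for s in true_splits}
    r = {normalize(s, taxa, ref) for s in rec_splits}
    return r - t, t - r


taxa = "ABCDEF"
true_tree = ["AB", "ABC", "EF"]      # splits AB|CDEF, ABC|DEF, EF|ABCD
reconstructed = ["AB", "ABD", "EF"]  # ABD|CEF is wrong, ABC|DEF is missing
fp, fn = fp_fn(true_tree, reconstructed, taxa)
print(sorted("".join(sorted(s)) for s in fp), sorted("".join(sorted(s)) for s in fn))
# prints ['CEF'] ['DEF']
```

Here the one wrong edge counts as a false positive and the one missing edge as a false negative.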
4. The Reconstruction Threshold
[Figure: true tree on taxa A to J with edges e1 to e7; sequence length k]
The input length may be insufficient to reconstruct some edges (short and deep ones).
Can we guarantee reconstruction of all edges
above the threshold?
5. Fast Convergence
Near-optimal information efficiency
- P. Erdős, M. Steel, L. Székely, and T. Warnow. A few logs suffice to build (almost) all trees (I). Random Structures and Algorithms, 14:153–184, 1999.
- D. Huson, S. Nettles, and T. Warnow. Disk-Covering, a fast-converging method for phylogenetic tree reconstruction. J. Comp. Biol., 6:369–386, 1999.
- T. Warnow, B. Moret, and K. St. John. Absolute convergence: true trees from short sequences. In SODA, pages 186–195, 2001.
- M. Csürös. Fast recovery of evolutionary trees with thousands of nodes. Journal of Computational Biology, 9(2):277–297, 2002.
- E. Mossel. Distorted metrics on trees and phylogenetic forests. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4:108–116, 2007.
- C. Daskalakis, C. Hill, A. Jaffe, R. Mihaescu, E. Mossel, and S. Rao. Maximal accurate forests from distance matrices. In RECOMB, pages 281–295, 2006.
- And more
Reconstruct the entire tree (w.h.p.) from sequences of polynomial length.
6. The Reconstruction Threshold: Fast-Converging Algorithms
[Figure: true tree on taxa A to J with edges e1 to e7; sequence length k]
Existing FC methods provide guarantees only when
the threshold is lower than the weight of the
shortest edge
7. Forest Reconstruction Methods
(Mossel 07; Daskalakis et al. 06)
Shallow edges are easier to reconstruct
8. Forest Reconstruction Methods
(Mossel 07; Daskalakis et al. 06)
Short edges block reconstruction of long edges
deeper in the tree
9. Adaptive Fast Convergence
[Figure: true tree on taxa A to J with edges e1 to e7, reconstruction threshold ε, sequence length k; callout: "Adaptive Fast Convergence!"]
Existing FC methods provide guarantees only when
the threshold is lower than the weight of the
shortest edge
10. Incremental Reconstruction
- The incremental approach (WSSB 77; Csürös 99; KZZ 03): fast converging
- Use directional queries to insert taxa one at a time.
- Directional queries are implemented using a quartet oracle.
- Total time complexity of O(n²).
[Figure: tree on taxa A to G]
- Short edges (below the reconstruction threshold) lead to false positives.
- False positives lead to faulty reconstruction of long edges.
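The incremental scheme on this slide can be sketched for binary trees as follows. This is a simplified illustration, not the WSSB, Csürös, or KZZ implementations: the quartet oracle is a hypothetical exact oracle that reads answers off the true tree's splits, a directional query at a degree-3 vertex is a single quartet query using one representative taxon per direction, and the naive walk and representative search do not attain the O(n²) bound. All function names are hypothetical.

```python
def leaf_beyond(adj, frm, to):
    """Walk from `to` away from `frm` until a leaf (taxa are str, internal nodes int)."""
    prev, cur = frm, to
    while not isinstance(cur, str):
        prev, cur = cur, next(n for n in adj[cur] if n != prev)
    return cur


def make_split_oracle(splits):
    """Hypothetical exact quartet oracle that reads answers off the true splits."""
    def oracle(a, b, c, d):
        for (p, q), (r, s) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            for side in splits:
                same_pq = (p in side) == (q in side)
                same_rs = (r in side) == (s in side)
                if same_pq and same_rs and (p in side) != (r in side):
                    return (p, q), (r, s)
        return None  # quartet not resolved by any split
    return oracle


def insert_taxon(adj, x, oracle, new_node):
    """Insert leaf x by walking the tree with directional (quartet) queries."""
    v = next(n for n in adj if isinstance(n, int))  # start at any internal vertex
    prev = None
    while True:
        reps = {leaf_beyond(adj, v, n): n for n in adj[v]}  # one taxon per direction
        a, b, c = reps
        split = oracle(x, a, b, c)
        if split is None:
            raise ValueError("oracle failed; a reliable variant would stop here")
        nxt = reps[split[0][1]]  # direction of the taxon paired with x
        if nxt == prev or isinstance(nxt, str):
            break  # the insertion edge is (v, nxt)
        prev, v = v, nxt
    # Subdivide edge (v, nxt) with a new internal node and hang x from it.
    adj[v].remove(nxt)
    adj[nxt].remove(v)
    adj[new_node] = {v, nxt, x}
    adj[v].add(new_node)
    adj[nxt].add(new_node)
    adj[x] = {new_node}


def build_tree(taxa, oracle):
    """Start from a 3-leaf star and insert the remaining taxa one at a time."""
    a, b, c = taxa[:3]
    adj = {0: {a, b, c}, a: {0}, b: {0}, c: {0}}
    for i, x in enumerate(taxa[3:], start=1):
        insert_taxon(adj, x, oracle, i)
    return adj


def splits_of(adj, ref):
    """Nontrivial splits of the built tree, each as the side without `ref`."""
    out = set()
    for u in adj:
        for w in adj[u]:
            if isinstance(u, int) and isinstance(w, int):
                side, stack, seen = set(), [w], {u, w}
                while stack:  # collect the leaves on w's side of edge (u, w)
                    n = stack.pop()
                    if isinstance(n, str):
                        side.add(n)
                    for m in adj[n]:
                        if m not in seen:
                            seen.add(m)
                            stack.append(m)
                if ref not in side:
                    out.add(frozenset(side))
    return out


true_splits = [frozenset("CDEF"), frozenset("DEF"), frozenset("EF")]
tree = build_tree(list("ABCDEF"), make_split_oracle(true_splits))
print(splits_of(tree, "A") == set(true_splits))  # True
```

With an exact oracle the walk always finds the correct insertion edge; with a noisy oracle a wrong answer on a short edge silently creates a false positive, which is the failure mode listed on this slide.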
11. A Reliable Quartet Oracle
The basic building block: a reliable quartet oracle
[Figure: quartet of taxa i, j, k, l]
A similar oracle is also used in Daskalakis et al. 06.
- Never returns a false quartet split (may return "fail").
- Returns the correct split if:
  - the separating path is long enough (above the reconstruction threshold), and
  - the quartet is short enough (proportional to the tree depth).
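A minimal sketch of an oracle with these two properties, assuming the oracle works on (estimated) pairwise distances and uses the four-point condition. The parameters `margin` (standing in for "separating path long enough") and `max_diameter` (standing in for "quartet short enough") are hypothetical; the actual tests and constants used in this work and in Daskalakis et al. 06 are not reproduced here.

```python
from itertools import combinations


def reliable_quartet(dist, i, j, k, l, margin, max_diameter):
    """Four-point-condition quartet test that may answer None ("fail").

    dist maps frozenset({x, y}) to an (estimated) distance between taxa x and y.
    margin and max_diameter are hypothetical tuning parameters.
    """
    def d(x, y):
        return dist[frozenset((x, y))]

    # "Quartet short enough": refuse quartets whose estimated diameter is too large.
    if max(d(x, y) for x, y in combinations((i, j, k, l), 2)) > max_diameter:
        return None
    sums = sorted(
        [
            (d(i, j) + d(k, l), ((i, j), (k, l))),
            (d(i, k) + d(j, l), ((i, k), (j, l))),
            (d(i, l) + d(j, k), ((i, l), (j, k))),
        ],
        key=lambda t: t[0],
    )
    # In an exact tree metric the smallest sum gives the true split, and the other
    # two sums are equal and exceed it by twice the separating path's length.
    if sums[1][0] - sums[0][0] < margin:
        return None  # separating path not clearly long enough: fail
    return sums[0][1]


# Quartet tree ab|cd: pendant edges of length 1, internal edge of length 0.5.
pairs = {("a", "b"): 2.0, ("c", "d"): 2.0, ("a", "c"): 2.5,
         ("a", "d"): 2.5, ("b", "c"): 2.5, ("b", "d"): 2.5}
dist = {frozenset(p): v for p, v in pairs.items()}
print(reliable_quartet(dist, "a", "c", "b", "d", margin=0.5, max_diameter=3.0))
# prints (('a', 'b'), ('c', 'd'))
```

Raising `margin` above the gap (here 1.0), or lowering `max_diameter` below the largest distance (here 2.5), makes the call return None instead of a possibly wrong split.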
12. A Reliable Incremental Algorithm
- The idea: never reconstruct faulty edges!
- Use a truthful directional oracle (never wrong, may return "fail").
- Insertion zone: leaves point inwards; internal vertices give no direction.
- Contract edges already reconstructed, if necessary.
[Figure: partial tree on taxa A to G with some positions marked "?"]
- False positives: none (the returned tree is an edge contraction of the true tree).
- False negatives: only short edges (contracted edges are below the reconstruction threshold).
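The two guarantees can be phrased as a check on split sets: an edge contraction of the true tree has a subset of the true splits, and every true split it lacks must be short. A minimal sketch, assuming splits are already normalized frozensets (one fixed side per bipartition) and edge weights are plain numbers; how the threshold ε relates to the sequence length k is not modelled, and the boundary case (weight exactly ε) is treated as "above" by assumption. Function names are hypothetical.

```python
def contract(true_weights, eps):
    """Ideal output: keep the true edges of weight >= eps, contract the rest."""
    return {s for s, w in true_weights.items() if w >= eps}


def meets_guarantee(true_weights, reconstructed, eps):
    """No false positives, and every false negative is a short edge (< eps)."""
    true_splits = set(true_weights)
    fp = set(reconstructed) - true_splits
    fn = true_splits - set(reconstructed)
    return not fp and all(true_weights[s] < eps for s in fn)


w = {frozenset("CDEF"): 0.30, frozenset("DEF"): 0.01, frozenset("EF"): 0.20}
print(meets_guarantee(w, contract(w, 0.05), 0.05))    # True
print(meets_guarantee(w, {frozenset("CDEF")}, 0.05))  # False: long edge EF is missing
```

Note that the check never needs to know which edges are short in advance when the output has no false positives; this matches the adaptive setting, where ε is unknown to the algorithm.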
13. Main Challenges
- Directional oracle on vertices of high degree
  - Correctness: no faulty answers, enough correct answers
  - Complexity: using O(deg(v)) quartet queries (sustaining the O(n²) time complexity of the algorithm)
- Querying only quartets of O(depth) diameter
  - Representing each direction with a close taxon
  - Dealing with very large contracted subtrees

More details in "Fast and Reliable Reconstruction of Phylogenetic Trees with Very Short Edges", in SODA 08.
14. Towards Optimal Reconstruction Guarantees for Phylogenetic Trees
- Further optimizing the reconstruction threshold
  - Reducing the bound on the diameter of quartets we query
  - Allowing reconstruction of short shallow edges (using ideas from forest reconstruction)
- Practical issues
  - Improving the reliability of the directional oracle
  - Using reliable partial reconstruction
[Figure: edges e1 to e7, reconstruction threshold ε, sequence length k; some edges marked "?"]
16. Short Edges
- Very tough to reconstruct: they correspond to delicate splits
[Figure: tree on taxa A to J]
We want to ensure (correct) reconstruction of edges that are as short as possible.
17. The Reconstruction Threshold
[Figure: true tree on taxa A to J with edges e1 to e7; sequence length k]
Input length may be insufficient to reconstruct
some edges
Can we guarantee reconstruction of all edges
above the threshold?
18. The Reconstruction Threshold
[Figure: true tree on taxa A to J with edges e1 to e7; sequence length k]
Existing FC methods provide guarantees only when
the threshold is lower than the weight of the
shortest edge
19. Adaptive Fast Convergence
- Reconstruct all edges above the reconstruction threshold ε
- ε is unknown to the algorithm
[Figure: true tree on taxa A to J with edges e1 to e7, reconstruction threshold ε, sequence length k; some edges marked "?"]