Adaptive Fast Convergence Towards Optimal Reconstruction Guarantees for Phylogenetic Trees


Title: Adaptive Fast Convergence Towards Optimal Reconstruction Guarantees for Phylogenetic Trees


1
Adaptive Fast Convergence Towards Optimal
Reconstruction Guarantees for Phylogenetic Trees
  • Ilan Gronau
  • Technion, Israel Institute of Technology
  • Haifa, Israel

Joint work with Shlomo Moran and Sagi Snir
2
Phylogenetic Reconstruction
[Figure: the true tree and the reconstructed tree over taxa A-J; sequence length k]
Goal: reconstruct the true tree as accurately as
possible
3
Evaluating Reconstructed Tree
[Figure: the reconstructed tree compared with the true tree over taxa A-J]
False Negatives: edges in the true tree which we
don't reconstruct. False Positives: edges we
reconstruct which aren't in the true tree.
We'd like to reduce the number of reconstruction
errors (FP and FN).
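As a rough illustration of how FP and FN can be counted, the sketch below compares the non-trivial splits (internal edges) of two trees over the same taxa. The nested-tuple tree encoding, the function names, and the five-taxon example are assumptions made for this sketch and do not come from the talk.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees (assumed encoding)


def leaves(tree: Tree) -> FrozenSet[str]:
    """All leaf names below (and including) this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial splits of the tree, each stored as the side
    that does not contain the alphabetically smallest taxon."""
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - below if anchor in below else below
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def reconstruction_errors(true_tree: Tree, reconstructed: Tree) -> Tuple[set, set]:
    """Return (false positives, false negatives) as sets of splits."""
    true_splits, rec_splits = splits(true_tree), splits(reconstructed)
    return rec_splits - true_splits, true_splits - rec_splits


# Hypothetical example: the reconstruction misplaces taxa B and C.
true_tree = ((("A", "B"), "C"), ("D", "E"))
reconstructed = ((("A", "C"), "B"), ("D", "E"))
fp, fn = reconstruction_errors(true_tree, reconstructed)
```

Here `fp` holds the split that separates {B, D, E} from the rest (present only in the reconstruction) and `fn` the split that separates {C, D, E} (present only in the true tree); the shared edge above {D, E} counts as neither.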
4
The Reconstruction Threshold
[Figure: tree over taxa A-J with edges e1-e7; sequence length k]
Input length may be insufficient to reconstruct
some edges (short and deep)
Can we guarantee reconstruction of all edges
above the threshold?
5
Fast Convergence
Near-optimal information efficiency
  • P. Erdos, M. Steel, L. Szekely, and T. Warnow. A
    few logs suffice to build (almost) all trees (I).
    Random Structures and Algorithms, 14:153-184,
    1999.
  • D. Huson, S. Nettles, and T. Warnow.
    Disk-Covering, a fast-converging method for
    phylogenetic tree reconstruction. J Comp Biol,
    6:369-386, 1999.
  • T. Warnow, B. Moret, and K. St. John. Absolute
    convergence: true trees from short sequences. In
    SODA, pages 186-195, 2001.
  • M. Csürös. Fast recovery of evolutionary trees
    with thousands of nodes. Journal of Computational
    Biology, 9(2):277-297, 2002.
  • E. Mossel. Distorted metrics on trees and
    phylogenetic forests. ACM Transactions on
    Computational Biology and Bioinformatics,
    4:108-116, 2007.
  • C. Daskalakis, C. Hill, A. Jaffe, R. Mihaescu, E.
    Mossel, and S. Rao. Maximal accurate forests from
    distance matrices. In RECOMB, pages 281-295,
    2006.
  • And more

Reconstruct the entire tree (w.h.p.) from
sequences of polynomial length.
6
The Reconstruction Threshold: Fast Converging
Algorithms
[Figure: tree over taxa A-J with edges e1-e7; sequence length k]
Existing FC methods provide guarantees only when
the threshold is lower than the weight of the
shortest edge
7
Forest Reconstruction Methods
[Mossel 07, Daskalakis et al. 06]
Shallow edges are easier to reconstruct
8
Forest Reconstruction Methods
[Mossel 07, Daskalakis et al. 06]
Short edges block reconstruction of long edges
deeper in the tree
9
Adaptive Fast Convergence
[Figure: tree over taxa A-J with edges e1-e7, threshold e, and sequence length k; callout "Adaptive Fast Convergence"]
Existing FC methods provide guarantees only when
the threshold is lower than the weight of the
shortest edge
10
Incremental Reconstruction
fast converging
  • The incremental approach [WSSB 77, Csuros
    99, KZZ 03]
  • Use directional queries to insert taxa one at a
    time.
  • Directional queries implemented using a quartet
    oracle.
  • Total time complexity of O(n^2).

[Figure: tree over taxa A-G]
  • Short edges (below reconstruction threshold)
    lead to false positives.
  • False positives lead to faulty reconstruction of
    long edges.

11
A Reliable Quartet Oracle
The basic building block: a reliable quartet
oracle
[Figure: a quartet over taxa i, j, k, l]
(a similar oracle is also used in [Daskalakis et al. 06])
  • Never returns a false quartet split (may return
    fail).
  • Returns the correct split if:
    - the separating path is long enough (above the
      reconstruction threshold)
    - the quartet is short enough (proportional to
      tree depth)
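One way such an oracle could behave is sketched below, using the standard four-point method on estimated pairwise distances. The error bound `max_error`, the width cutoff `max_diameter`, the 4 * max_error margin, and the toy distances are all assumptions of this sketch; the transcript does not say how the talk's oracle is implemented.

```python
from itertools import combinations
from typing import Callable, Optional, Tuple

Pair = Tuple[str, str]


def reliable_quartet_split(
    dist: Callable[[str, str], float],
    i: str, j: str, k: str, l: str,
    max_error: float,
    max_diameter: float = float("inf"),
) -> Optional[Tuple[Pair, Pair]]:
    """Return the split of {i, j, k, l}, or None ("fail") when unsure.

    Assumes every estimated distance is within max_error of the truth.
    """
    # Refuse quartets that are too wide for the estimates to be trusted.
    if max(dist(a, b) for a, b in combinations((i, j, k, l), 2)) > max_diameter:
        return None
    pairings = [((i, j), (k, l)), ((i, k), (j, l)), ((i, l), (j, k))]
    scored = sorted(
        ((dist(*p) + dist(*q), (p, q)) for p, q in pairings),
        key=lambda item: item[0],
    )
    (best_sum, best_split), (second_sum, _) = scored[0], scored[1]
    # Each sum is off by at most 2 * max_error, so a gap above
    # 4 * max_error cannot be produced by estimation error alone.
    if second_sum - best_sum > 4 * max_error:
        return best_split
    return None


# Toy quartet ab|cd: pendant edges of weight 1, middle edge of weight 1.
_TOY = {frozenset("ab"): 2.0, frozenset("cd"): 2.0}


def toy_dist(x: str, y: str) -> float:
    return _TOY.get(frozenset((x, y)), 3.0)


confident = reliable_quartet_split(toy_dist, "a", "c", "b", "d", max_error=0.25)
noisy = reliable_quartet_split(toy_dist, "a", "c", "b", "d", max_error=0.6)
too_wide = reliable_quartet_split(
    toy_dist, "a", "c", "b", "d", max_error=0.25, max_diameter=2.5
)
```

With a small error bound the call returns the split ab|cd; with a larger error bound, or with a quartet wider than `max_diameter`, it returns `None` rather than risk a false split. That is the "never wrong, may fail" behaviour described above.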

12
A Reliable Incremental Algorithm
  • The idea: never reconstruct faulty edges!
  • Use a truthful directional oracle (never wrong,
    may return fail).
  • Insertion zone: leaves point inwards,
    internal vertices give no direction.
  • Contract edges already reconstructed, if
    necessary.

[Figure: tree over taxa A-G with three vertices marked "?"]
  • False Positives: none (the returned tree is an
    edge-contraction of the true tree).
  • False Negatives: only short edges (contracted
    edges are below the reconstruction threshold).
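The guarantee above can be restated in terms of splits: contracting an edge removes its split and adds none, so a tree is an edge-contraction of the true tree exactly when its split set is a subset of the true one. The sketch below does not implement the incremental algorithm; it only builds the idealized output (every edge below a threshold contracted) and checks the two claims. The edge weights, the threshold value, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

Split = FrozenSet[str]


def contract_short_edges(edge_weights: Dict[Split, float], threshold: float) -> Set[Split]:
    """Splits that survive after contracting every edge lighter than threshold."""
    return {split for split, weight in edge_weights.items() if weight >= threshold}


def is_edge_contraction(returned: Set[Split], true_splits: Set[Split]) -> bool:
    """True when the returned tree adds no split that the true tree lacks."""
    return returned <= true_splits


# Hypothetical internal edges of a true tree over taxa A-F, keyed by one side of the split.
true_edges = {
    frozenset("AB"): 0.30,
    frozenset("ABC"): 0.02,  # a short edge, below the threshold used here
    frozenset("EF"): 0.25,
}
threshold = 0.05

returned = contract_short_edges(true_edges, threshold)
false_positives = returned - set(true_edges)
false_negatives = set(true_edges) - returned
```

In this example `false_positives` is empty and the only false negative is the edge of weight 0.02, which matches the two bullets above.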

13
Main Challenges
  • Directional oracle on vertices of high degree
    - Correctness: no faulty answers, enough
      correct answers
    - Complexity: using O(deg(v)) quartet queries
      (sustaining the O(n^2) time complexity of the algorithm)
  • Querying only quartets of O(depth)-diameter
    - Representing each direction with a close
      taxon
    - Dealing with very large contracted subtrees

More details in "Fast and Reliable
Reconstruction of Phylogenetic Trees with Very
Short Edges", in SODA 08.
14
Towards Optimal Reconstruction Guarantees for
Phylogenetic Trees
  • Further optimizing reconstruction threshold
    - Reducing bound on diameter of quartets we query
    - Allowing reconstruction of short shallow edges
      (using ideas from forest reconstruction)
  • Practical issues
    - Improving reliability of directional oracle
    - Using reliable partial reconstruction

[Figure: edges e1-e7 shown against the threshold e and sequence length k; some edges marked "?"]
15
(No Transcript)
16
Short Edges
  • very tough to reconstruct
  • correspond to delicate splits
[Figure: tree over taxa A-J]
We want to ensure (correct) reconstruction of
edges that are as short as possible
17
The Reconstruction Threshold
[Figure: tree over taxa A-J with edges e1-e7; sequence length k]
Input length may be insufficient to reconstruct
some edges
Can we guarantee reconstruction of all edges
above the threshold?
18
The Reconstruction Threshold
[Figure: tree over taxa A-J with edges e1-e7; sequence length k]
Existing FC methods provide guarantees only when
the threshold is lower than the weight of the
shortest edge
19
Adaptive Fast Convergence
  • Reconstruct all edges above the reconstruction
    threshold e
  • e is unknown to the algorithm

[Figure: tree over taxa A-J with edges e1-e7 shown against the threshold e; sequence length k; some edges marked "?"]