On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities

Plgw03, 17/12/07



1
On the Hardness of Inferring Phylogenies from
Triplet-Dissimilarities
  • Ilan Gronau, Shlomo Moran
  • Technion, Israel Institute of Technology
  • Haifa, Israel

2
Pairwise-Distance Based Reconstruction
[Figure: taxa B, E, G, H, L, M and a reconstructed tree T with tree metric $D_T$]
3
Optimization Criteria
We wish the tree metric $D_T$ to approximate
simultaneously all the pairwise distances in $D$:
[Figure: $D_T$ should be close to $D$]
Two closeness measures are studied here:
  • Maximal Difference ($\ell_\infty$)
  • Maximal Distortion
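A minimal Python sketch of the two closeness measures for pairwise matrices. The $\ell_\infty$ measure is the standard maximum absolute difference. For Maximal Distortion we assume the usual multiplicative definition (maximal expansion times maximal contraction over all pairs) with the convention 0/0 = 1; this definition and all function names are our assumptions, not taken from the talk.

```python
from itertools import combinations

def max_difference(D, DT):
    """l_inf measure: the largest |D(i,j) - DT(i,j)| over all taxon pairs i < j."""
    n = len(D)
    return max(abs(D[i][j] - DT[i][j]) for i, j in combinations(range(n), 2))

def _ratio(a, b):
    # Convention 0/0 = 1; a zero only in the denominator raises ZeroDivisionError.
    if a == 0 and b == 0:
        return 1.0
    return a / b

def max_distortion(D, DT):
    """Assumed definition: (max expansion) * (max contraction) over all pairs i < j."""
    pairs = list(combinations(range(len(D)), 2))
    expansion = max(_ratio(DT[i][j], D[i][j]) for i, j in pairs)
    contraction = max(_ratio(D[i][j], DT[i][j]) for i, j in pairs)
    return expansion * contraction

# Hypothetical 3-taxon example.
D = [[0, 3, 5], [3, 0, 4], [5, 4, 0]]
DT = [[0, 4, 5], [4, 0, 4], [5, 4, 0]]
print(max_difference(D, DT))   # 1
print(max_distortion(D, DT))   # 1.3333333333333333
```

Under this assumed definition, scaling every tree distance by the same constant leaves the distortion at 1 while the maximal difference grows, so the two criteria can favor different trees.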

4
Maximal Difference ($\ell_\infty$) vs. Maximal Distortion
[Figure: the matrices $D$ and $D_T$ over taxa B, E, G, H, L, M]
Goal: find an optimal tree T, which minimizes the
maximal difference (or distortion) between $D$ and $D_T$.
5
Previous Works on Approximating Dissimilarities
by Tree Distances
  • Negative results (NP-hardness):
    • Closest tree metric (even ultrametric) to a
      dissimilarity matrix under $\ell_1$, $\ell_2$ [Day 87]
    • Closest tree metric to a dissimilarity matrix
      under $\ell_\infty$ [ABFPT99]
      • Hard to approximate better than 1.125
      • Implicit: the closest MaxDist tree is hard to
        approximate within any constant factor
  • Positive results:
    • Closest ultrametric to a dissimilarity matrix
      under $\ell_\infty$ [Krivanek 88]
    • 3-approximation of the closest additive metric to a
      given metric [ABFPT99]
      • (implicit 6-approximation for general
        dissimilarity matrices)

6
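This Work: Triplet-Distances (Distances to
Triplets' Midpoints)
  • $C(i,j,k)$: the midpoint of the triplet $i, j, k$ in T
  • $t_T(i;j,k)$: the distance in T from $i$ to $C(i,j,k)$
  • $t_T(i;j,k) = t_T(i;k,j)$
  • $t_T(i;i,j) = 0$
  • $t_T(i;j,j) = D_T(i,j)$

[Figure: taxa i, j, k and their midpoint C(i,j,k) in T]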
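To make the definition concrete, here is a small Python sketch that finds $C(i,j,k)$ directly in an edge-weighted tree, as the single vertex shared by the three pairwise paths, and measures $t_T(i;j,k)$ as the tree distance from $i$ to that vertex. The tree, its labels and all function names are hypothetical; the example only illustrates the listed properties.

```python
def distances_from(adj, src):
    """Tree distances and DFS parent pointers from src; adj maps u -> {v: edge weight}."""
    dist, parent = {src: 0}, {src: None}
    stack = [src]
    while stack:
        u = stack.pop()
        for v, w in adj[u].items():
            if v not in dist:
                dist[v] = dist[u] + w
                parent[v] = u
                stack.append(v)
    return dist, parent

def path_vertices(adj, a, b):
    """Vertex set of the unique path between a and b."""
    _, parent = distances_from(adj, a)
    vertices, v = set(), b
    while v is not None:
        vertices.add(v)
        v = parent[v]
    return vertices

def center(adj, i, j, k):
    """C(i,j,k): in a tree the three pairwise paths share exactly one vertex."""
    (c,) = path_vertices(adj, i, j) & path_vertices(adj, i, k) & path_vertices(adj, j, k)
    return c

def t_T(adj, i, j, k):
    """Distance in T from i to the midpoint of the triplet i, j, k."""
    dist, _ = distances_from(adj, i)
    return dist[center(adj, i, j, k)]

# Hypothetical tree: leaves a, b hang off u; leaves c, d hang off v; edge u-v.
edges = [("a", "u", 2), ("b", "u", 3), ("u", "v", 1), ("c", "v", 4), ("d", "v", 5)]
adj = {}
for x, y, w in edges:
    adj.setdefault(x, {})[y] = w
    adj.setdefault(y, {})[x] = w

print(t_T(adj, "a", "b", "c"))  # 2  (midpoint is u)
print(t_T(adj, "a", "c", "d"))  # 3  (midpoint is v)
```

The degenerate cases behave as listed: for $t_T(a;a,b)$ the three paths meet at $a$ itself, and for $t_T(a;c,c)$ they meet at $c$, giving $D_T(a,c) = 7$.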
7
Triplet-Distances Defined by 2-Distances
  • Each distance matrix D defines 3-trees:

$t(i;j,k) = \tfrac{1}{2}\left[D(i,j) + D(i,k) - D(j,k)\right]$

[Figure: any metric on 3 taxa i, j, k (example distances 8, 9, 7) is realized by a 3-tree]
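A minimal Python sketch of this formula. The numbers 8, 9, 7 from the figure serve as an example metric, but which pair each number belongs to is our assumption: $D(i,j) = 8$, $D(i,k) = 9$, $D(j,k) = 7$. The three triplet distances are the leg lengths of the 3-tree, and adding two legs gives back the corresponding pairwise distance.

```python
def triplet_distance(D, i, j, k):
    """t(i; j,k) = 1/2 [D(i,j) + D(i,k) - D(j,k)], with D given as a dict of dicts."""
    return 0.5 * (D[i][j] + D[i][k] - D[j][k])

# Example metric on 3 taxa (the pair assignment is assumed for illustration).
pairs = {("i", "j"): 8, ("i", "k"): 9, ("j", "k"): 7}
D = {x: {x: 0} for x in "ijk"}
for (x, y), d in pairs.items():
    D[x][y] = D[y][x] = d

# Leg of each taxon x in the 3-tree: t(x; other two taxa).
legs = {x: triplet_distance(D, x, *[y for y in "ijk" if y != x]) for x in "ijk"}
print(legs)  # {'i': 5.0, 'j': 3.0, 'k': 4.0}

# The 3-tree with these leg lengths reproduces every pairwise distance.
for (x, y), d in pairs.items():
    assert legs[x] + legs[y] == d
```

All legs are nonnegative exactly when D satisfies the triangle inequality, which is why any metric on 3 taxa is realized by a 3-tree.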
8
Triplet-Distance Based Reconstruction
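$t(i;j,k) = \tfrac{1}{2}\left[D(i,j) + D(i,k) - D(j,k)\right]$
[Figure: a table of triplet distances, rows indexed by taxon pairs BB, BE, BG, ..., LL, LM, MM and columns by taxa B, E, G, H, L, M; reconstruct a tree from it?]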
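The input to triplet-distance based reconstruction is the whole table of entries $t(x;y,z)$. A small Python sketch, assuming the table is computed from a hypothetical tree metric on the six taxa (a path metric with made-up positions on a line), shows its size, $n \cdot n(n+1)/2$ entries when $y = z$ is allowed, and that the entries $t(x;y,y)$ already contain every 2-distance $D(x,y)$. Only one of the symmetric entries $t(x;y,z)$, $t(x;z,y)$ is stored.

```python
from itertools import combinations_with_replacement

def triplet_table(D, taxa):
    """All entries t(x; y,z) for x in taxa and unordered pairs {y, z}, y = z allowed."""
    table = {}
    for y, z in combinations_with_replacement(taxa, 2):
        for x in taxa:
            table[(x, y, z)] = 0.5 * (D[x][y] + D[x][z] - D[y][z])
    return table

def pairwise_from_table(table, taxa):
    """Recover D(x, y) from the entries t(x; y, y)."""
    return {x: {y: table[(x, y, y)] for y in taxa} for x in taxa}

taxa = ["B", "E", "G", "H", "L", "M"]
pos = {"B": 0, "E": 2, "G": 3, "H": 7, "L": 8, "M": 12}   # made-up positions
D = {x: {y: abs(pos[x] - pos[y]) for y in taxa} for x in taxa}

table = triplet_table(D, taxa)
print(len(table))  # 126 = 6 * 21
assert pairwise_from_table(table, taxa) == D
```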
9
Why use Triplet-Distances?
1. They enable more accurate estimations of 2-distances.
2. They are used (de facto) by known reconstruction algorithms.
10
Improved Estimations of Pairwise Distances
Information loss
[Figure: calculating D(H,E) from the distance matrix D]
11
Improved Estimations (cont)
  • Estimate D(H,E) by calculating all the 3-trees on
    {H, E, X}, X ≠ H, E
  • (Or calculate just one 3-tree, for a trusted
    third taxon X)
  • V. Ranwez, O. Gascuel, Improvement of
    distance-based phylogenetic methods by a local
    maximum likelihood approach using triplets,
    Mol. Biol. Evol. 19(11), pp. 1952-1963 (2002).
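Each 3-tree on {H, E, X} yields its own estimate of D(H,E), namely $t(H;E,X) + t(E;H,X)$, the two legs that lead to H and E. A minimal sketch of combining these per-triplet estimates; the median used by default is our own illustrative choice, not the combination rule of Ranwez and Gascuel, and the leg lengths are made up.

```python
from statistics import mean, median

def estimate_pair(three_trees, combine=median):
    """three_trees maps each third taxon X to (t(H; E,X), t(E; H,X)) from a 3-tree on {H, E, X}.

    In each 3-tree the path from H to E consists of exactly these two legs,
    so every X gives one estimate of D(H, E); `combine` merges the estimates.
    """
    estimates = [leg_h + leg_e for leg_h, leg_e in three_trees.values()]
    return combine(estimates)

# Made-up leg lengths, as if each 3-tree had been fitted separately.
three_trees = {"B": (2.0, 3.0), "G": (2.25, 2.75), "L": (1.5, 5.0)}
print(estimate_pair(three_trees))                # 5.0 (median)
print(estimate_pair(three_trees, combine=mean))  # 5.5
print(estimate_pair({"B": (2.0, 3.0)}))          # 5.0 (a single trusted X)
```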

12
(Implicit) use of Triplet-Distances in
2-Distance Reconstruction Algorithms
$t(i;j,k) = \tfrac{1}{2}\left[D(i,j) + D(i,k) - D(j,k)\right]$
13
1st Use: Triplet-Distances from a Single
Source
  • Fix a taxon r, and construct a tree T which
    minimizes
  • An optimal solution can be computed in O(n²) time,
    and is used, e.g., in:
    • (FKW95) Optimal approximation of distances by
      ultrametric trees.
    • (ABFPT99) The best known approximation of
      distances by general trees.
    • (BB99) Fast construction of Buneman trees.
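For a fixed source r, the triplet distances $t(r;i,j)$ can be read off any distance matrix. A minimal sketch, assuming that the quantity minimized is the $\ell_\infty$ difference $\max_{i,j} |t(r;i,j) - t_T(r;i,j)|$. It only evaluates this assumed objective for a given candidate tree metric; the O(n²) construction itself is not shown, and the matrices are hypothetical.

```python
def single_source_table(D, r):
    """t(r; i,j) = 1/2 [D(r,i) + D(r,j) - D(i,j)] for all i, j (D is a list of lists)."""
    n = len(D)
    return [[0.5 * (D[r][i] + D[r][j] - D[i][j]) for j in range(n)] for i in range(n)]

def single_source_error(D, DT, r):
    """Assumed objective: max over i, j of |t(r; i,j) - t_T(r; i,j)|."""
    t, tT = single_source_table(D, r), single_source_table(DT, r)
    n = len(D)
    return max(abs(t[i][j] - tT[i][j]) for i in range(n) for j in range(n))

# Hypothetical input matrix D and candidate tree metric DT.
D = [[0, 3, 5], [3, 0, 4], [5, 4, 0]]
DT = [[0, 4, 5], [4, 0, 4], [5, 4, 0]]
print(single_source_error(D, DT, r=0))  # 1.0
```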

14
2nd Use: Saitou-Nei Neighbor Joining
The neighbor-selection criterion of NJ selects a
taxon pair i, j which maximizes the sum
[Figure: the pair i, j and the remaining taxa r]
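Triplet distances can be found inside NJ's selection rule. Writing the standard criterion as $Q(i,j) = (n-2)D(i,j) - \sum_k D(i,k) - \sum_k D(j,k)$, which NJ minimizes, a short calculation (ours, not taken from the slide) gives $-Q(i,j) = 2\sum_{r \ne i,j} t(r;i,j) + 2D(i,j)$. The sketch below checks this identity on a hypothetical 4-taxon tree metric; function names are illustrative.

```python
from itertools import combinations

def nj_q(D, i, j):
    """Standard NJ selection criterion; NJ joins the pair with the smallest Q."""
    n = len(D)
    return (n - 2) * D[i][j] - sum(D[i]) - sum(D[j])

def triplet_sum(D, i, j):
    """Sum over the other taxa r of t(r; i,j) = 1/2 [D(r,i) + D(r,j) - D(i,j)]."""
    return sum(0.5 * (D[r][i] + D[r][j] - D[i][j])
               for r in range(len(D)) if r not in (i, j))

def nj_pair(D):
    """The pair NJ would join first (ties broken by enumeration order)."""
    return min(combinations(range(len(D)), 2), key=lambda p: nj_q(D, *p))

# Hypothetical tree ((0,1),(2,3)): pendant edges of length 1, internal edge of length 5.
D = [[0, 2, 7, 7],
     [2, 0, 7, 7],
     [7, 7, 0, 2],
     [7, 7, 2, 0]]

for i, j in combinations(range(4), 2):
    assert -nj_q(D, i, j) == 2 * triplet_sum(D, i, j) + 2 * D[i][j]
print(nj_pair(D))  # (0, 1)
```

The identity shows how NJ's choice combines the sum of the distances $t(r;i,j)$ from the other taxa to Path(i, j) with the pair's own distance $D(i,j)$.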
15
Previous Works on Triplet-Dissimilarities/Distances
  • I. Gronau, S. Moran, Neighbor Joining Algorithms
    for Inferring Phylogenies via LCA-Distances,
    Journal of Computational Biology 14(1), pp. 1-15
    (2007).
  • Works which use the total weights of 3-trees:
    • S. Joly, G. Le Calvé, Three-Way Distances, Journal
      of Classification 12, pp. 191-205 (1995).
    • L. Pachter, D. Speyer, Reconstructing Trees from
      Subtree Weights, Applied Mathematics Letters 17,
      pp. 615-621 (2004).
    • D. Levy, R. Yoshida, L. Pachter, Beyond Pairwise
      Distances: Neighbor-Joining with Phylogenetic
      Diversity Estimates, Mol. Biol. Evol. 23(3),
      pp. 491-498 (2006).

16
Summary of Results
  • Results for Maximal Difference ($\ell_\infty$):
    • The decision problem is NP-hard:
      is there a tree T s.t. $\|t, t_T\|_\infty \le \varepsilon$?
    • Hardness of approximation of the optimization problem:
      finding a tree T s.t.
      $\|t, t_T\|_\infty < 1.4\,\|t, t_{OPT}\|_\infty$
    • A 15-approximation algorithm,
      using the 6-approximation algorithm for
      2-dissimilarities from [ABFPT99]
  • Result for Maximal Distortion:
    • Hardness of approximation within any constant
      factor
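Checking one candidate tree against the decision question is easy; the hardness lies in searching over all trees. A minimal sketch of the check, assuming both triplet tables are stored as dictionaries with the same keys (x, y, z) standing for $t(x;y,z)$; the function names and numbers are illustrative.

```python
def linf(t, tT):
    """||t, t_T||_inf: the largest entry-wise difference between two triplet tables."""
    if t.keys() != tT.keys():
        raise ValueError("tables must be indexed by the same triplets")
    return max(abs(t[key] - tT[key]) for key in t)

def within(t, tT, eps):
    """Answer the decision question for one fixed candidate: ||t, t_T||_inf <= eps?"""
    return linf(t, tT) <= eps

t = {("a", "b", "c"): 2.0, ("b", "a", "c"): 3.5, ("c", "a", "b"): 4.0}
tT = {("a", "b", "c"): 2.5, ("b", "a", "c"): 3.0, ("c", "a", "b"): 4.0}
print(linf(t, tT))          # 0.5
print(within(t, tT, 0.25))  # False
```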

17
NP Hardness of the Decision Problem
We use a reduction from 3SAT (the problem of
determining whether a 3CNF formula is
satisfiable).
We show: if one can determine for (t, ε) whether there
exists a tree T s.t. $\|t, t_T\|_\infty \le \varepsilon$, then one can
determine for every 3CNF formula f whether it is
satisfiable.
18
The Reduction
Given a 3CNF formula f, we define triplet
distances t and an error bound ε which force
the output tree to imply a satisfying assignment
to f.
  • The set of taxa:
    • Taxa T, F.
    • A taxon for every literal ($x_i$, $\bar{x}_i$).
    • 3 taxa for every clause $C_j$ ($y_{j1}, y_{j2}, y_{j3}$).
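A minimal sketch of this taxa set, assuming a DIMACS-style encoding of f (a clause is a tuple of three nonzero integers, i for $x_i$ and -i for $\bar{x}_i$) and hypothetical taxon names. It yields 2 + 2n + 3m taxa for n variables and m clauses.

```python
def reduction_taxa(n_vars, clauses):
    """Taxa T and F, one taxon per literal, and three taxa y_j1, y_j2, y_j3 per clause C_j."""
    taxa = ["T", "F"]
    for i in range(1, n_vars + 1):
        taxa += [f"x{i}", f"~x{i}"]          # hypothetical names for x_i and its negation
    for j, clause in enumerate(clauses, start=1):
        if len(clause) != 3:
            raise ValueError("3CNF: each clause needs exactly three literals")
        taxa += [f"y{j}_{a}" for a in (1, 2, 3)]
    return taxa

f = [(1, -2, 3), (-1, 2, -3)]   # (x1 v ~x2 v x3) ^ (~x1 v x2 v ~x3)
taxa = reduction_taxa(3, f)
print(len(taxa))  # 14 = 2 + 2*3 + 3*2
```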

19
Properties Enforced by the Input (t, ε)
  • One of the following can be enforced on each taxa
    triplet (u, v, w):
    • taxon u is close to Path(v, w), or
    • taxon u is far from Path(v, w).

[Figure: taxon u and Path(v, w)]
20
Enforcing Truth Assignment
  • A truth assignment to f is implied by the
    following:
    • T is far from F.
    • For each i, $x_i$ is far from $\bar{x}_i$, and both $x_i$
      and $\bar{x}_i$ are close to Path(T, F).

Thus we set $x_i$ = TRUE iff $x_i$ is close to T.
21
Enforcing Clause Satisfaction
A clause $C = (l_1 \lor l_2 \lor l_3)$ is satisfied iff
at least one literal $l_i$ is true, i.e., is close
to T.
$(l_1 \lor l_2 \lor l_3)$ is satisfied iff it does not
look like this:
[Figure: all three literals close to F]
We need to guarantee, through the close/far relations,
that all clauses avoid the above configuration.
22
Clause Satisfaction (cont.)
$(l_1 \lor l_2 \lor l_3)$ is satisfied iff, out of the
three paths Path($l_1, l_2$), Path($l_1, l_3$),
Path($l_2, l_3$), at least two are close
to T.
[Figure: literals $l_1, l_2, l_3$ placed relative to Path(T, F)]
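The case analysis can be checked exhaustively under a simplified model, which is an assumption of this sketch and not the talk's formal argument: a literal taxon is close to T iff the literal is true, and Path($l_a, l_b$) passes close to T iff at least one of its endpoints does. Over all eight assignments, the clause is satisfied exactly when at least two of the three paths are close to T.

```python
from itertools import combinations, product

def paths_close_to_T(values):
    """Count the paths among Path(l1,l2), Path(l1,l3), Path(l2,l3) that pass close to T.

    Simplified model (assumed): a path is close to T iff one of its endpoints is true.
    """
    return sum(1 for a, b in combinations(values, 2) if a or b)

for values in product([False, True], repeat=3):
    # Satisfied clause <=> at least two of the three literal paths are close to T.
    assert any(values) == (paths_close_to_T(values) >= 2)
    print(values, paths_close_to_T(values))
```

No true literal gives 0 close paths, exactly one gives 2, and two or three give 3, so the threshold "at least two" separates satisfied from unsatisfied clauses.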
23
Clause Satisfaction (cont.)
We attach a taxon to each such path:
  • $y_1$ is close to Path($l_2, l_3$)
  • $y_2$ is close to Path($l_1, l_3$)
  • $y_3$ is close to Path($l_1, l_2$)
$(l_1 \lor l_2 \lor l_3)$ is satisfied iff at least
two of the $y_i$'s can be located close to T.
24
Clause Satisfaction (end)
And at least two of the $y_i$'s can be located
close to T iff Path($y_2, y_3$), Path($y_1, y_3$),
Path($y_1, y_2$) are all close to T.
So $(l_1 \lor l_2 \lor l_3)$ is satisfied iff all the
above paths are close to T.
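Chaining both steps of the gadget under a simplified model gives a quick exhaustive check. The model is assumed for this sketch: a taxon placed close to a path is close to T iff that path is, and a path is close to T iff one of its endpoints is. Under it, requiring all three paths Path($y_b, y_c$) to be close to T accepts exactly the satisfying assignments of the clause. The function names are hypothetical.

```python
from itertools import combinations, product

def close_paths(flags):
    """For flags of (u1, u2, u3): is Path(u1,u2), Path(u1,u3), Path(u2,u3) close to T?

    Simplified model (assumed): a path is close to T iff one of its endpoints is.
    """
    return [a or b for a, b in combinations(flags, 2)]

def gadget_accepts(literals):
    # y3 sits next to Path(l1,l2), y2 next to Path(l1,l3), y1 next to Path(l2,l3).
    y3, y2, y1 = close_paths(literals)
    # The gadget requires Path(y1,y2), Path(y1,y3) and Path(y2,y3) all close to T.
    return all(close_paths((y1, y2, y3)))

for literals in product([False, True], repeat=3):
    assert gadget_accepts(literals) == any(literals)
print("gadget accepts exactly the satisfying assignments")
```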
25
Construction Example
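f is satisfiable ⟺ there is a tree T which
satisfies all bounds:
A1: $t_T(T, F) \ge 2\alpha + 2\beta$
A2: $\forall i = 1..n$: $t_T(T; x_i, \bar{x}_i) \le \alpha$, $t_T(F; x_i, \bar{x}_i) \le \alpha$
B1: $\forall j = 1..m$: $t_T(y_{j1}; l_{j2}, l_{j3}) \le \alpha$, $t_T(y_{j2}; l_{j1}, l_{j3}) \le \alpha$, $t_T(y_{j3}; l_{j1}, l_{j2}) \le \alpha$
B2: $\forall j = 1..m$: $t_T(y_{j1}; T, F) \ge \alpha$, $t_T(y_{j2}; T, F) \ge \alpha$, $t_T(y_{j3}; T, F) \ge \alpha$
B3: $\forall j = 1..m$: $t_T(T; y_{j2}, y_{j3}) \le \alpha$, $t_T(T; y_{j1}, y_{j3}) \le \alpha$, $t_T(T; y_{j1}, y_{j2}) \le \alpha$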
26
Hardness of Approximation Results
By stretching the close/far restrictions, the
following problems are also shown to be NP-hard:
  • Approximating Maximal Difference:
    finding a tree T s.t. $\|t, t_T\|_\infty < 1.4\,\|t, t_{OPT}\|_\infty$
  • Approximating Maximal Distortion:
    finding a tree T s.t.
    $\mathrm{MaxDist}(t, t_T) < C \cdot \mathrm{MaxDist}(t, t_{OPT})$, for any
    constant C

Details in: I. Gronau and S. Moran, On the
Hardness of Inferring Phylogenies from
Triplet-Dissimilarities, Theoretical Computer
Science 389(1-2), December 2007, pp. 44-55.
27
Open Problems/Further Research
  • Extending hardness results to 3-diss tables
    induced by 2-diss matrices
    ($t(i;j,k) = \tfrac{1}{2}\left[D(i,j) + D(i,k) - D(j,k)\right]$)
  • Extending hardness results to natural-looking
    trees
    (binary trees with constant-bounded edge
    weights)
  • Checking the performance of NJ when the neighbor-selection
    formula is computed from real 3-distances
  • Devising algorithms which use 3-distances as input
  • Does optimization of 3-diss lead to good
    topological accuracy (under accepted models of
    sequence evolution)?
    (It is known that optimization of 2-diss doesn't
    lead to good topological accuracy.)

28
Thank You
29
Distance-Based Phylogenetic Reconstruction
  • Compute distances between all taxon pairs
  • Find an edge-weighted tree that best describes the
    distances

30
Optimization Criteria
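  • Known measures of closeness:
    • $\ell_\infty$: $\max_{i,j} |D(i,j) - D_T(i,j)|$
    • $\ell_p$: $\left(\sum_{i<j} |D(i,j) - D_T(i,j)|^p\right)^{1/p}$
    • MaxDist

(where 0/0 = 1)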
31
The Reduction
[Diagram: 3CNF formula f mapped to (t, ε)]
There is a tree T s.t. $\|t, t_T\|_\infty \le \varepsilon$ ⟺ f is satisfiable
If one can determine for (t, ε) whether there
exists a tree T s.t. $\|t, t_T\|_\infty \le \varepsilon$, then one can
determine for every 3CNF formula f whether it is
satisfiable.
32
The Reduction
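Define a set of lower and upper bounds:
A1: $t_T(T, F) \ge 2\alpha + 2\beta$
A2: $\forall i = 1..n$: $t_T(T; x_i, \bar{x}_i) \le \alpha$, $t_T(F; x_i, \bar{x}_i) \le \alpha$
B1: $\forall j = 1..m$: $t_T(y_{j1}; l_{j2}, l_{j3}) \le \alpha$, $t_T(y_{j2}; l_{j1}, l_{j3}) \le \alpha$, $t_T(y_{j3}; l_{j1}, l_{j2}) \le \alpha$
B2: $\forall j = 1..m$: $t_T(y_{j1}; T, F) \ge \alpha$, $t_T(y_{j2}; T, F) \ge \alpha$, $t_T(y_{j3}; T, F) \ge \alpha$
B3: $\forall j = 1..m$: $t_T(T; y_{j2}, y_{j3}) \le \alpha$, $t_T(T; y_{j1}, y_{j3}) \le \alpha$, $t_T(T; y_{j1}, y_{j2}) \le \alpha$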
33
The Reduction
[Diagram: 3CNF formula f mapped to bounds $t_l$, $t_u$ (width 2ε)]
There is a tree T s.t. $t_l \le t_T \le t_u$ ⟺ f is satisfiable
If one can determine for (t, ε) whether there
exists a tree T s.t. $\|t, t_T\|_\infty \le \varepsilon$, then one can
determine for every 3CNF formula f whether it is
satisfiable.
34
The Reduction
  • Define the set of taxa.
  • Define a set of lower and upper bounds on some
    entries of $t_T$.
  • f is satisfiable ⟺ there is a tree T which
    satisfies all bounds.
  • Define ε according to the slackness required for
    the proof of this equivalence.

35
The Reduction
  • Define the set of taxa:
    • Taxa T, F.
    • A taxon for every literal ($x_i$, $\bar{x}_i$).
    • 3 taxa for every clause ($y_{j1}, y_{j2}, y_{j3}$).

36
The Analysis
A1: $t_T(T, F) \ge 2\alpha + 2\beta$
A2: $\forall i = 1..n$: $t_T(T; x_i, \bar{x}_i) \le \alpha$, $t_T(F; x_i, \bar{x}_i) \le \alpha$
  • Trees satisfying A1 and A2 imply a
    truth assignment to $x_1, \ldots, x_n$.

37
The Analysis
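B1: $\forall j = 1..m$: $t_T(y_{j1}; l_{j2}, l_{j3}) \le \alpha$, $t_T(y_{j2}; l_{j1}, l_{j3}) \le \alpha$, $t_T(y_{j3}; l_{j1}, l_{j2}) \le \alpha$
B2: $\forall j = 1..m$: $t_T(y_{j1}; T, F) \ge \alpha$, $t_T(y_{j2}; T, F) \ge \alpha$, $t_T(y_{j3}; T, F) \ge \alpha$
B3: $\forall j = 1..m$: $t_T(T; y_{j2}, y_{j3}) \le \alpha$, $t_T(T; y_{j1}, y_{j3}) \le \alpha$, $t_T(T; y_{j1}, y_{j2}) \le \alpha$
There is a tree T which satisfies all bounds ⇒ f
is satisfiable:
  • B1 and B2 imply that $y_{ja} \Rightarrow (l_{jb} \lor l_{jc})$ for
    $\{a, b, c\} = \{1, 2, 3\}$.
  • B3 implies that at least two of $y_{j1}, y_{j2}, y_{j3}$
    are satisfied.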

38
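The Reduction: t(f)
A1: $t_T(T, F) \ge 2\alpha + 2\beta$
A2: $\forall i = 1..n$: $t_T(T; x_i, \bar{x}_i) \le \alpha$, $t_T(F; x_i, \bar{x}_i) \le \alpha$
B1: $\forall j = 1..m$: $t_T(y_{j1}; l_{j2}, l_{j3}) \le \alpha$, $t_T(y_{j2}; l_{j1}, l_{j3}) \le \alpha$, $t_T(y_{j3}; l_{j1}, l_{j2}) \le \alpha$
B2: $\forall j = 1..m$: $t_T(y_{j1}; T, F) \ge \alpha$, $t_T(y_{j2}; T, F) \ge \alpha$, $t_T(y_{j3}; T, F) \ge \alpha$
B3: $\forall j = 1..m$: $t_T(T; y_{j2}, y_{j3}) \le \alpha$, $t_T(T; y_{j1}, y_{j3}) \le \alpha$, $t_T(T; y_{j1}, y_{j2}) \le \alpha$
  • In our constructed tree:
    • All 2-distances are in $[2\alpha, 2\alpha + 2\beta]$.
    • All 3-distances are in $[\alpha, \alpha + 2\beta]$.
    • $\varepsilon = \beta$.

A1: $t(T, F) = 2\alpha + 3\beta$
A2: $\forall i = 1..n$: $t(T; x_i, \bar{x}_i) = \alpha - \beta$, $t(F; x_i, \bar{x}_i) = \alpha - \beta$
B1: $\forall j = 1..m$: $t(y_{j1}; l_{j2}, l_{j3}) = \alpha - \beta$, $t(y_{j2}; l_{j1}, l_{j3}) = \alpha - \beta$, $t(y_{j3}; l_{j1}, l_{j2}) = \alpha - \beta$
B2: $\forall j = 1..m$: $t(y_{j1}; T, F) = \alpha + \beta$, $t(y_{j2}; T, F) = \alpha + \beta$, $t(y_{j3}; T, F) = \alpha + \beta$
B3: $\forall j = 1..m$: $t(T; y_{j2}, y_{j3}) = \alpha - \beta$, $t(T; y_{j1}, y_{j3}) = \alpha - \beta$, $t(T; y_{j1}, y_{j2}) = \alpha - \beta$
Other 2-distances: $t(s, t) = 2\alpha + 2\beta$
Other 3-distances: $t(s; t, u) = \alpha + 2\beta$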
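A minimal sketch of the constrained entries of t(f) (A1 to B3; the "other" default entries are omitted), following the values listed for this slide with ε = β. Assumptions: a DIMACS-style clause encoding (i for $x_i$, -i for $\bar{x}_i$), hypothetical taxon names, keys (u, v, w) standing for $t(u;v,w)$, and the 2-distance $t(T,F)$ stored as $t(T;F,F)$, which equals it since $t(i;j,j) = D(i,j)$. Because the relation symbols of the bounds are not fully legible in this transcript, the sketch reads each bound as $t \pm \varepsilon$, an interpretation rather than a quotation.

```python
def literal_taxon(lit):
    """Hypothetical taxon name for a DIMACS literal: 3 -> 'x3', -3 -> '~x3'."""
    return f"x{lit}" if lit > 0 else f"~x{-lit}"

def reduction_table(n_vars, clauses, alpha, beta):
    """Constrained entries of t(f); key (u, v, w) stands for t(u; v, w)."""
    t = {("T", "F", "F"): 2 * alpha + 3 * beta}                       # A1 (2-distance t(T, F))
    for i in range(1, n_vars + 1):                                      # A2
        t[("T", f"x{i}", f"~x{i}")] = alpha - beta
        t[("F", f"x{i}", f"~x{i}")] = alpha - beta
    for j, clause in enumerate(clauses, start=1):
        lits = [literal_taxon(l) for l in clause]
        ys = [f"y{j}_{a}" for a in (1, 2, 3)]
        for a in range(3):
            b, c = [k for k in range(3) if k != a]
            t[(ys[a], lits[b], lits[c])] = alpha - beta                # B1
            t[(ys[a], "T", "F")] = alpha + beta                        # B2
            t[("T", ys[b], ys[c])] = alpha - beta                      # B3
    return t

alpha, beta = 10, 1
eps = beta
t = reduction_table(3, [(1, -2, 3), (-1, 2, -3)], alpha, beta)
print(len(t))  # 25 = 1 + 2*3 + 9*2
# With eps = beta, the window [t - eps, t + eps] reproduces the bounds:
print(t[("T", "F", "F")] - eps)     # 22 = 2*alpha + 2*beta (A1 lower bound)
print(t[("y1_1", "T", "F")] - eps)  # 10 = alpha (B2 lower bound)
print(t[("T", "x1", "~x1")] + eps)  # 10 = alpha (A2 upper bound)
```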