Gad M. Landau - PowerPoint PPT Presentation

1
LCS Approximation via Embedding into Locally Non-repetitive Strings
  • Gad M. Landau
  • Avivit Levy
  • Ilan Newman

2
The Problem
  • Longest Common Subsequence (LCS)
  • important similarity measure
  • Example
  • X = a a b a e c b b e a c d c e
  • Y = a b d d a e c b b e c d c e
  • ED: deletions, insertions, mismatches

3
Motivation
  • Exact computations
  • dynamic programming - O(n^2)
  • Hunt-Szymanski - O((r + n) log n),
    r = number of matches
  • Hirschberg - O(n·LCS)
  • Landau, Crochemore, Ziv-Ukelson - fast for well
    compressed strings
  • Computing large LCS (of near-linear size) may
    still take quadratic time.
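As a point of reference for the quadratic baseline listed above, here is a minimal sketch of the textbook dynamic program for the LCS length. The function name `lcs_length` is chosen for this illustration and is not from the slides; X and Y in the usage lines are the example strings from the problem slide.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence, in O(len(x) * len(y)) time."""
    prev = [0] * (len(y) + 1)  # DP row for the previous prefix of x
    for xc in x:
        curr = [0]  # LCS of any prefix of x with the empty prefix of y is 0
        for j, yc in enumerate(y, start=1):
            if xc == yc:
                # Extend the best solution for both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last character of x or of y, whichever is better.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    X = "aabaecbbeacdce"
    Y = "abddaecbbecdce"
    print(lcs_length(X, Y))
```

Only two rows of the table are kept, so the space is linear, but the time stays quadratic no matter how large the LCS is.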

4
Motivation
  • Our direction: approximation.
  • Input?
  • Small alphabet: ?-approximation
  • Small LCS: exact algorithms
  • Our input:
  • Relatively large alphabet: Ω(n^ε), 0 < ε ≤ 1
  • Large LCS: near linear
  • Main tool: low distortion embedding.

5
What is embedding?
[Figure: a mapping f from A to B, taking x to f(x) and y to f(y); labels p_A(x,y) and p_B(f(x),f(y)).]
6
LCS embedding
Definition: Let A and B be classes of n-long strings. An LCS-preserving embedding from A into B with distortion α is an injective mapping f: A → B such that for every x, y ∈ A,
α·LCS(x,y) ≤ LCS(f(x),f(y)) ≤ LCS(x,y).
Our goal: f: RL(n,?) → LNR(n), where ???(n?), ?????.
7
Locally Non-repetitive Strings
Definition: A string S is (t,w)-non-repetitive if every w successive t-substrings in S are distinct. If t = 1 then S is locally non-repetitive (LNR).
Example (n = 14): S1 = abbacbababbacb, S2 = abcdeabcdeabcd
S1 is (4,8)-non-rep. but not (3,7)-non-rep. (bab starts at positions 6 and 8).
S2 is (1,5)-non-rep., hence LNR.
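The definition can be checked mechanically. The sketch below is one possible reading of it, with a hypothetical function name: two equal t-substrings violate the property exactly when their start positions are fewer than w apart, so it is enough to remember the most recent start of each t-substring.

```python
def is_non_repetitive(s: str, t: int, w: int) -> bool:
    """True if every w successive t-substrings of s are pairwise distinct."""
    last_start = {}  # t-substring -> most recent start index
    for i in range(len(s) - t + 1):
        sub = s[i:i + t]
        # Two equal t-substrings fall inside a common run of w successive
        # t-substrings exactly when their start indices differ by less than w.
        if sub in last_start and i - last_start[sub] < w:
            return False
        last_start[sub] = i
    return True


if __name__ == "__main__":
    S1 = "abbacbababbacb"
    S2 = "abcdeabcdeabcd"
    print(is_non_repetitive(S1, 3, 7), is_non_repetitive(S1, 4, 8))
    print(is_non_repetitive(S2, 1, 5))
```

When s has fewer than w t-substrings, this sketch simply requires all of them to be distinct.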
8
Approximating LCS in RL strings
  • Observation:
  • RL(n) implies (t,w)-non-rep. for good w.
  • Property: Let S be an n-long string.
  • If S has period length p, S is (p,p)-non-rep.
  • If S is aperiodic, S is (n,w)-non-rep., n/2 ≤ w ≤ n.

Example (n = 14): S1 = abbacbababbacb, S2 = abcdeabcdeabcd
S1 is aperiodic: (14,14)-non-rep.
S2 has period size 5: (5,5)-non-rep.
9
Approximating LCS in RL strings
  • Basic idea
  • Embed RL(n) into LNR with good w.
  • Approximate LCS in LNR strings.

Why is it a good idea?
10
Approximating LCS in LNR strings
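Assumption: (1, n/c)-non-rep. strings
[Figure: x and y are each split into c = 3 blocks, x1, x2, x3 and y1, y2, y3; LIS(xi, yj) is computed for each pair of blocks.]
Cost: c^2 · (n/c) · log log(n/c) = O(c·n·log log n)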
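Why LIS appears here: when no character repeats inside a block, the LCS of two blocks reduces to a longest increasing subsequence. The sketch below illustrates that reduction under stated assumptions. The name `lcs_distinct` is hypothetical, and the LIS step uses plain binary search, which costs O(m log m) per block pair rather than the O(m log log m) that the cost line above assumes; the faster bound would need a different predecessor structure.

```python
from bisect import bisect_left


def lcs_distinct(x: str, y: str) -> int:
    """LCS length of x and y, assuming no character repeats inside x or inside y."""
    pos_in_y = {ch: j for j, ch in enumerate(y)}
    # Positions in y of the characters of x that also occur in y, in x-order.
    seq = [pos_in_y[ch] for ch in x if ch in pos_in_y]
    # Longest strictly increasing subsequence of seq (patience sorting).
    tails = []  # tails[k] = smallest last value of an increasing run of length k + 1
    for v in seq:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)


if __name__ == "__main__":
    print(lcs_distinct("abcde", "badce"))
```

A common subsequence picks characters in increasing position in both strings, so with distinct characters it corresponds exactly to an increasing run in `seq`.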
11
Approximating LCS in LNR strings
What to do with the c^2 block results?
  • choose the best:
  • assures Ω(1/c)-apprx. ratio
  • additional O(c^2) time
  • best: combine non-crossing pairs
  • assures Ω(k/c)-apprx. ratio
  • where LCS(x,y) ≥ kn/c
  • takes O(c^2) time using DP
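The two options can be sketched as follows, assuming the block results are given as a c-by-c table `scores` where `scores[i][j]` is the common-subsequence length found for the block pair (x_i, y_j). The function names and the table layout are assumptions made for this illustration. "Non-crossing" is read here as: the chosen pairs are strictly increasing in both block indices, so their subsequences concatenate into one common subsequence of x and y.

```python
def best_single_pair(scores):
    """Option 1: take the single best block pair."""
    return max((v for row in scores for v in row), default=0)


def combine_non_crossing(scores):
    """Option 2: maximum total over block pairs increasing in both indices."""
    rows = len(scores)
    cols = len(scores[0]) if rows else 0
    best = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            best[i][j] = max(
                best[i - 1][j],                             # leave block x_i unused
                best[i][j - 1],                             # leave block y_j unused
                best[i - 1][j - 1] + scores[i - 1][j - 1],  # pair x_i with y_j
            )
    return best[rows][cols]


if __name__ == "__main__":
    table = [[4, 0], [0, 4]]
    print(best_single_pair(table), combine_non_crossing(table))
```

Both functions touch each of the c^2 table entries once, which matches the O(c^2) figures in the list above.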

12
Approximating LCS in LNR strings
This means:
Large CS can be found much faster in LNR strings with w = n/c.
  • The total time: O(c·n·log log n + c^2)
  • if w = O(n^ε) we get O(n^(2-ε) log log n)
  • The apprx. ratio:
  • Ω(k/c) for LCS(x,y) ≥ kn/c
  • is constant for LCS of linear size.
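The arithmetic behind these two claims can be spelled out. This is a worked sketch, assuming w = n^ε exactly (writing ε for the exponent in the bound on w) and an LCS of at least βn, where β is a constant introduced here only for illustration.

```latex
% Running time: with w = n/c and w = n^{\varepsilon}, we get c = n^{1-\varepsilon}.
\[
  c\,n\log\log n + c^{2}
  \;=\; n^{2-\varepsilon}\log\log n + n^{2-2\varepsilon}
  \;=\; O\!\left(n^{2-\varepsilon}\log\log n\right).
\]
% Approximation ratio: if LCS(x,y) \ge \beta n, take k = \beta c, so that kn/c = \beta n.
\[
  \Omega\!\left(\frac{k}{c}\right) \;=\; \Omega(\beta) \;=\; \Omega(1).
\]
```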

13
Embedding into LNR strings
How to embed RL(n) into LNR strings?
Naïve idea: Use the fact that all p-substrings are distinct in w = p. Rename p-substrings.
abbacbab → 1, bbacbaba → 2, bacbabab → 3, acbababb → 4, cbababba → 5, bababbac → 6, ababbacb → 7, babbacba → 8
Example (n = 14): S = abbacbababbacb is renamed to 12345678123456
14
Embedding into LNR strings
How much of the LCS did we lose?
Write x', y' for the renamed strings.
LCS(x,y) ≥ LCS(x',y'): no expansion.
LCS(x',y') ≥ n - t·(n - LCS(x,y))
= LCS(x,y) - (t-1)·(n - LCS(x,y))
= LCS(x,y) - (t-1)·ED(x,y)/2
Both t and ED may be too big. Unbearable contraction.
15
Embedding into LNR strings
How to overcome this obstacle?
The idea: Use a stronger property. All p-substrings are distinct in w = p in many locations. Can use few coordinates to rename p-substrings and still get different names.
Formally: d > 2 is an arbitrary constant. Fix a random (t-1)-long binary vector: every location is 1 with prob. 2d·ln t/? and 0 otherwise.
16
Embedding into LNR strings
Example (n = 14): S = abbacbababbacb is mapped by our embedding f to 12345678123456
V = (1,0,1,1,0,0,1,0)
abbacbab → 1, bbacbaba → 2, bacbabab → 3, acbababb → 4, cbababba → 5, bababbac → 6, ababbacb → 7, babbacba → 8
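To see that a few sampled coordinates can be enough, the sketch below names each window only by the characters at the positions where the vector V is 1. It is an illustration under assumptions, not the authors' exact embedding: the function name is hypothetical, the mask is passed in rather than drawn at random, the window length is taken to be the length of the mask, and the tail is padded by repeating with that period.

```python
def rename_with_mask(s: str, mask: list[int]) -> list[int]:
    """Name position i of s by the masked window of length len(mask) starting at i."""
    p = len(mask)
    if not 1 <= p <= len(s):
        raise ValueError("mask length must be between 1 and len(s)")
    # Assumption: pad the tail by continuing the string with period p.
    ext = list(s)
    while len(ext) < len(s) + p - 1:
        ext.append(ext[len(ext) - p])
    keep = [k for k, bit in enumerate(mask) if bit]  # sampled coordinates
    names = {}  # sampled characters -> integer name, by first appearance
    out = []
    for i in range(len(s)):
        key = tuple(ext[i + k] for k in keep)
        out.append(names.setdefault(key, len(names) + 1))
    return out


if __name__ == "__main__":
    V = [1, 0, 1, 1, 0, 0, 1, 0]
    print(rename_with_mask("abbacbababbacb", V))
```

With V as on the slide, the eight windows still receive eight different names, although each name now depends on only four input characters instead of eight.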
17
Embedding into LNR strings
What is the distortion now?
LCS(x,y) ≥ LCS(f(x),f(y)): no expansion.
LCS(f(x),f(y)) ≥ n - (1 + 2(t-1)·d·ln t/?)·(n - LCS(x,y))
= LCS(x,y) - 2(t-1)·d·ln t/? · (n - LCS(x,y))
= LCS(x,y) - (t-1)·d·ln t/? · ED(x,y)
If ED = o(LCS(x,y)·?/(t ln t)) the distortion is 1-o(1). Large LCS is preserved unless ED is too large.
18
Conclusions
  • Large LCS can be approximated also for RL(n).
  • LNR accelerates computations.
  • Embedding accelerates computations.
  • Batu, Ergun, Sahinalp (SODA, 06)
  • Andoni, Onak (STOC, 09)

Thank You.