Title: Gad M. Landau
1. LCS Approximation via Embedding into Locally Non-repetitive Strings
- Gad M. Landau
- Avivit Levy
- Ilan Newman
2. The Problem
- Longest Common Subsequence (LCS)
- important similarity measure
- Example
- X: a a b a e c b b e a c d c e
- Y: a b d d a e c b b e c d c e
- ED: deletions, insertions, mismatches
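A minimal sketch of the textbook O(n^2) dynamic program, run on X and Y above. The helper name lcs_length is our own, not from the talk. For this pair the two d's of Y at positions 3 and 4 cannot be matched in a long alignment, so the LCS has length 12.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence, by the classic O(|x|*|y|) DP."""
    prev = [0] * (len(y) + 1)  # row for the previous prefix of x
    for a in x:
        cur = [0] * (len(y) + 1)  # cur[j] = LCS of the current prefix of x and y[:j]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1  # extend a common subsequence by a match
            else:
                cur[j] = max(prev[j], cur[j - 1])  # skip a symbol of x or of y
        prev = cur
    return prev[-1]


X = "aabaecbbeacdce"
Y = "abddaecbbecdce"
print(lcs_length(X, Y))  # 12
```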
3. Motivation
- Exact computations
- Dynamic programming: O(n^2)
- Hunt-Szymanski: O((r + n) log n), r = number of matches
- Hirschberg: O(n·LCS)
- Landau, Crochemore, Ziv-Ukelson: fast for well-compressed strings
- Computing a large LCS (of near-linear size) may still take quadratic time.
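The Hunt-Szymanski bound comes from reducing LCS to a longest strictly increasing subsequence over the r matching pairs (i, j) with x[i] = y[j]. The sketch below is one common way to write that reduction, using Python's bisect for the binary searches; the function name is our own.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_hunt_szymanski(x: str, y: str) -> int:
    """LCS length in the Hunt-Szymanski style: O((r + n) log n) for r matching pairs."""
    occ = defaultdict(list)
    for j, b in enumerate(y):
        occ[b].append(j)  # increasing positions of each symbol in y
    # thresh[k] = smallest y-position at which a common subsequence of length k+1 can end
    thresh: list[int] = []
    for a in x:
        # Visit the matches of a in decreasing y-order, so one symbol of x
        # is never used twice in the same increasing run.
        for j in reversed(occ.get(a, [])):
            k = bisect_left(thresh, j)
            if k == len(thresh):
                thresh.append(j)
            else:
                thresh[k] = j
    return len(thresh)


print(lcs_hunt_szymanski("aabaecbbeacdce", "abddaecbbecdce"))  # 12
```

When r is close to n^2 (few distinct symbols) this is no better than the dynamic program, which is why large alphabets matter in the rest of the talk.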
4. Motivation
- Our direction: approximation.
- Input?
- Small alphabet ?-approximation
- Small LCS: exact algorithms
- Our input:
- Relatively large alphabet: Ω(n^δ), 0 < δ ≤ 1
- Large LCS: near linear
- Main tool: low-distortion embedding.
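5. What is an embedding?
[Figure: an embedding f maps x in space A to f(x) in space B, and y to f(y); the distance p_A(x,y) in A is compared with p_B(f(x),f(y)) in B.]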
6. LCS embedding
Definition: Let A and B be classes of n-long strings. An LCS-preserving embedding from A into B with distortion α is an injective mapping f: A → B such that for every x, y ∈ A,
α·LCS(x,y) ≤ LCS(f(x),f(y)) ≤ LCS(x,y).
Our goal: f: RL(n,δ) → LNR(n), where |Σ| = Ω(n^δ),
?????.
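The definition can be tested, though not certified, on a finite sample of strings. The hypothetical helper empirical_distortion below checks injectivity and the "no expansion" inequality on every pair of the sample, and returns the smallest observed ratio LCS(f(x),f(y)) / LCS(x,y), which is only an upper estimate of the true distortion α over a whole class.

```python
from itertools import combinations
from typing import Callable, Iterable, Sequence


def lcs_length(x: Sequence, y: Sequence) -> int:
    """Classic O(|x|*|y|) DP; works on strings or on sequences of names."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            cur[j] = prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def empirical_distortion(f: Callable[[str], Sequence], sample: Iterable[str]) -> float:
    """Smallest LCS(f(x),f(y)) / LCS(x,y) over all pairs of a finite sample.

    Raises ValueError if f is not injective on the sample or expands some LCS.
    """
    strings = list(dict.fromkeys(sample))  # distinct inputs, order kept
    images = [tuple(f(s)) for s in strings]
    if len(set(images)) < len(images):
        raise ValueError("f is not injective on the sample")
    worst = 1.0
    for (x, fx), (y, fy) in combinations(zip(strings, images), 2):
        base, mapped = lcs_length(x, y), lcs_length(fx, fy)
        if mapped > base:
            raise ValueError(f"LCS expands on {x!r}, {y!r}")
        if base > 0:
            worst = min(worst, mapped / base)
    return worst


print(empirical_distortion(lambda s: s, ["abcd", "acbd", "dcba"]))  # 1.0
```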
7. Locally Non-repetitive Strings
Definition: A string S is (t,w)-non-repetitive if every w successive t-substrings in S are distinct. If t = 1, then S is locally non-repetitive (LNR).
Example: n = 14, S1 = abbacbababbacb, S2 = abcdeabcdeabcd
S1 is not (3,7)-non-rep. (the 3-substring bab starts at positions 6 and 8).
S1 is (4,8)-non-rep.
S2 is (1,5)-non-rep., hence LNR.
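A direct check of the definition; the function name is_non_repetitive is our own. It only compares each t-substring with its latest earlier occurrence: if any two equal t-substrings start fewer than w positions apart, two consecutive occurrences are at least as close.

```python
def is_non_repetitive(s: str, t: int, w: int) -> bool:
    """True if every w successive t-substrings of s are pairwise distinct."""
    last_start: dict[str, int] = {}  # t-substring -> latest start position seen
    for i in range(len(s) - t + 1):
        sub = s[i:i + t]
        # Equal t-substrings starting fewer than w positions apart
        # lie in one window of w successive t-substrings.
        if sub in last_start and i - last_start[sub] < w:
            return False
        last_start[sub] = i
    return True


S1 = "abbacbababbacb"
S2 = "abcdeabcdeabcd"
print(is_non_repetitive(S1, 3, 7))  # False ("bab" starts at positions 6 and 8)
print(is_non_repetitive(S1, 4, 8))  # True
print(is_non_repetitive(S2, 1, 5))  # True, so S2 is LNR
```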
8. Approximating LCS in RL strings
- Observation:
- RL(n) implies (t,w)-non-rep. for a good w.
- Property: Let S be an n-long string.
- If S has period length p, S is (p,p)-non-rep.
- If S is aperiodic, S is (n,w)-non-rep., n/2 ≤ w ≤ n.
Example: n = 14, S1 = abbacbababbacb, S2 = abcdeabcdeabcd
S1 is aperiodic, so it is (14,14)-non-rep. S2 has period size 5, so it is (5,5)-non-rep.
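To see which case of the property applies, one can compute the smallest period with the KMP failure function. The sketch below assumes the usual convention that a string is periodic when its smallest period is at most n/2 and aperiodic otherwise; the function name is our own.

```python
def smallest_period(s: str) -> int:
    """Smallest p >= 1 with s[i] == s[i + p] for all valid i (via the KMP failure function)."""
    n = len(s)
    fail = [0] * n  # fail[i] = length of the longest proper border of s[:i+1]
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    return n - fail[-1] if n else 0


for name, s in [("S1", "abbacbababbacb"), ("S2", "abcdeabcdeabcd")]:
    p = smallest_period(s)
    kind = "periodic" if p <= len(s) // 2 else "aperiodic"
    print(name, p, kind)
# S1 8 aperiodic
# S2 5 periodic
```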
9. Approximating LCS in RL strings
- Basic idea
- Embed RL(n) into LNR with good w.
- Approximate LCS in LNR strings.
Why is it a good idea?
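10. Approximating LCS in LNR strings
Assumption: (1,n/c)-non-rep. strings
[Figure: c = 3; x is cut into blocks x1, x2, x3 and y into blocks y1, y2, y3, and LIS(xi,yj) is computed for every block pair.]
Cost: c^2·(n/c)·log log(n/c) ≤ c·n·log log n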
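Why LIS works here: in a (1,n/c)-non-rep. string every block of length n/c has pairwise distinct symbols, so each symbol of xi matches at most one position of yj, and LCS(xi,yj) is the longest increasing subsequence of those positions. A minimal sketch follows; the names are ours, and it uses binary search (O(m log m) per block pair) where the slide's log log bound assumes a faster integer structure such as van Emde Boas trees.

```python
from bisect import bisect_left


def lcs_distinct_blocks(xb: str, yb: str) -> int:
    """LCS of two blocks in which no symbol repeats, computed as an LIS."""
    if len(set(xb)) != len(xb) or len(set(yb)) != len(yb):
        raise ValueError("a block repeats a symbol")
    pos_in_y = {ch: j for j, ch in enumerate(yb)}
    seq = [pos_in_y[ch] for ch in xb if ch in pos_in_y]  # matched y-positions, in x order
    tails: list[int] = []  # tails[k] = smallest last value of an increasing run of length k+1
    for v in seq:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)


def block_lcs_table(x: str, y: str, c: int) -> list[list[int]]:
    """c x c table of LCS(x_i, y_j) over the c equal-length blocks of x and y."""
    if len(x) != len(y) or len(x) % c:
        raise ValueError("this sketch assumes equal lengths divisible by c")
    m = len(x) // c
    xs = [x[i * m:(i + 1) * m] for i in range(c)]
    ys = [y[j * m:(j + 1) * m] for j in range(c)]
    return [[lcs_distinct_blocks(xi, yj) for yj in ys] for xi in xs]


print(block_lcs_table("abcabc", "acbacb", 2))  # [[2, 2], [2, 2]]
```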
11. Approximating LCS in LNR strings
What to do with the c^2 block results?
- Choose the best:
- assures an Ω(1/c) approximation ratio
- additional O(c^2) time
- Best combination of non-crossing pairs:
- assures an Ω(k/c) approximation ratio,
- where LCS(x,y) ≥ kn/c
- takes O(c^2) time using DP
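Both options can be sketched over the c x c table of block-pair values. Pairs that increase strictly in both block indices do not cross, so their per-block common subsequences concatenate into one common subsequence of x and y; the O(c^2) DP below finds the best such chain. The function names and the table T are illustrative only.

```python
def best_single_pair(table: list[list[int]]) -> int:
    """'Choose the best': the largest single block-pair LCS."""
    return max(max(row) for row in table)


def best_noncrossing_combination(table: list[list[int]]) -> int:
    """'Combine non-crossing pairs': max total of table[i][j] over block pairs
    whose row and column indices both increase strictly. O(c^2) DP."""
    rows, cols = len(table), len(table[0])
    best = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            best[i][j] = max(
                best[i - 1][j],                            # skip block x_i
                best[i][j - 1],                            # skip block y_j
                best[i - 1][j - 1] + table[i - 1][j - 1],  # use pair (x_i, y_j)
            )
    return best[rows][cols]


T = [[3, 0, 0],
     [0, 0, 5],
     [0, 4, 0]]
print(best_single_pair(T), best_noncrossing_combination(T))  # 5 8
```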
12. Approximating LCS in LNR strings
This means:
A large CS can be found much faster in LNR strings with w = n/c.
- The total time: O(c·n·log log n + c^2)
- if w = O(n^δ), we get O(n^(2-δ) log log n)
- The approx. ratio:
- Ω(k/c) for LCS(x,y) ≥ kn/c
- is constant for an LCS of linear size.
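The time bound follows by substituting c = n/w. A worked version, assuming w = n^δ exactly:

```latex
w = \frac{n}{c} = n^{\delta}
\;\Longrightarrow\; c = n^{1-\delta},
\qquad
O\bigl(c\,n\log\log n + c^{2}\bigr)
  = O\bigl(n^{2-\delta}\log\log n + n^{2-2\delta}\bigr)
  = O\bigl(n^{2-\delta}\log\log n\bigr).
```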
13. Embedding into LNR strings
How to embed RL(n) into LNR strings?
Naïve idea: use the fact that all p-substrings are distinct in w = p. Rename the p-substrings.
abbacbab → 1, bbacbaba → 2, bacbabab → 3, acbababb → 4, cbababba → 5, bababbac → 6, ababbacb → 7, babbacba → 8
Example: n = 14, S = abbacbababbacb → S' = 12345678123456
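A sketch of the naïve renaming on this example. It assumes, as the example does, that S has period p = 8, and reads the p-substring at each position cyclically from the first period (this is how the last name, babbacba, arises); names are assigned 1, 2, ... in order of first appearance. The function name is our own.

```python
def rename_periodic(s: str, p: int) -> list[int]:
    """Replace each position of s by the name of the p-substring starting there.

    Assumes s has period p; the p-substring at i is read cyclically as
    s[(i + k) % p] for k < p.
    """
    if any(s[i] != s[i + p] for i in range(len(s) - p)):
        raise ValueError("s does not have period p")
    names: dict[str, int] = {}
    out = []
    for i in range(len(s)):
        sub = "".join(s[(i + k) % p] for k in range(p))
        out.append(names.setdefault(sub, len(names) + 1))  # new substring -> next name
    return out


print(rename_periodic("abbacbababbacb", 8))
# [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6]
```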
14. Embedding into LNR strings
How much of the LCS did we lose?
LCS(x,y) ≥ LCS(x',y'): no expansion (x', y' are the renamed strings).
LCS(x',y') ≥ n - t·(n - LCS(x,y)) = LCS(x,y) - (t-1)·(n - LCS(x,y)) = LCS(x,y) - (t-1)·ED(x,y)/2
Both t and ED may be too big: unbearable contraction.
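A quick numeric illustration of the contraction. The numbers are illustrative only: n = 14 and LCS = 12 come from the example strings X and Y on the problem slide, combined with t = 8, the substring length of the renaming example. The helper name is our own.

```python
def naive_lcs_lower_bound(n: int, t: int, lcs: int) -> int:
    """The bound n - t*(n - LCS) guaranteed after the naive renaming."""
    bound = n - t * (n - lcs)
    # The same quantity written as LCS - (t-1)*(n - LCS).
    assert bound == lcs - (t - 1) * (n - lcs)
    return bound


print(naive_lcs_lower_bound(14, 8, 12))  # -2: the guarantee is vacuous
```

Only two unmatched positions already wipe out the whole guarantee, since each one can destroy up to t names.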
15. Embedding into LNR strings
How to overcome this obstacle?
The idea: use a stronger property. All p-substrings are distinct in w = p in many locations, so a few coordinates suffice to rename the p-substrings and still get different names.
Formally: d > 2 is an arbitrary constant. Fix a random (t-1)-long binary vector: every location is 1 with prob. 2d ln t/? and 0 otherwise.
16. Embedding into LNR strings
Example: n = 14, S = abbacbababbacb → S' = 12345678123456
V = (1,0,1,1,0,0,1,0)
abbacbab → 1, bbacbaba → 2, bacbabab → 3, acbababb → 4, cbababba → 5, bababbac → 6, ababbacb → 7, babbacba → 8
Our embedding: f
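A sketch of renaming through the sampled coordinates. It rests on our own reading of the example: V has one bit per offset 0..7 of the cyclically read 8-long substring, and a position's new name is determined by the symbols kept at the offsets where V is 1. The function name is hypothetical.

```python
def rename_sampled(s: str, p: int, v: list[int]) -> list[int]:
    """Name each position of s by the symbols of its p-substring at offsets where v is 1.

    Assumes s has period p and that v[k] selects offset k (0-based) of the
    cyclically read p-substring s[(i + k) % p].
    """
    if len(v) != p:
        raise ValueError("v must have one entry per offset")
    if any(s[i] != s[i + p] for i in range(len(s) - p)):
        raise ValueError("s does not have period p")
    keep = [k for k, bit in enumerate(v) if bit]
    names: dict[str, int] = {}
    out = []
    for i in range(len(s)):
        sampled = "".join(s[(i + k) % p] for k in keep)  # only the sampled coordinates
        out.append(names.setdefault(sampled, len(names) + 1))
    return out


V = [1, 0, 1, 1, 0, 0, 1, 0]
print(rename_sampled("abbacbababbacb", 8, V))
# [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6]
```

Here only four of the eight offsets are read, yet the eight substrings still receive eight different names, so the output coincides with the naïve S'.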
17. Embedding into LNR strings
What is the distortion now?
LCS(x,y) ≥ LCS(f(x),f(y)): no expansion.
LCS(f(x),f(y)) ≥ n - (1 + 2(t-1)·d ln t/?)·(n - LCS(x,y)) = LCS(x,y) - 2(t-1)·d ln t/?·(n - LCS(x,y)) = LCS(x,y) - (t-1)·d ln t/?·ED(x,y)
If ED = o(LCS(x,y)·?/(t ln t)), the distortion is 1 - o(1): a large LCS is preserved unless ED is too large.
18. Conclusions
- Large LCS can be approximated also for RL(n).
- LNR accelerates computations.
- Embedding accelerates computations.
- Batu, Ergun, Sahinalp (SODA '06)
- Andoni, Onak (STOC '09)
Thank You.