2 Dimensional Parameterized Matching

1
2 Dimensional Parameterized Matching
  • Carmit Hazay
  • Moshe Lewenstein
  • Dekel Tsur

2
Outline
  • Definitions
  • History and Motivation
  • Text Preprocessing
  • Algorithm Outline
  • Pattern Preprocessing

3
Parameterized Matching
  • Input: two strings s and t, $|s| = |t|$, over
    alphabets $\Sigma_s$ and $\Sigma_t$.
  • s parameterize-matches t if there exists a bijection
    $\pi: \Sigma_s \to \Sigma_t$ such that $\pi(s) = t$.

Example
s = a a b b b
t = x x y y y
$\pi(a) = x$, $\pi(b) = y$
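The definition above can be checked directly by building the bijection one character at a time. This is a minimal sketch; the function name `p_match` is an assumption made for illustration and does not come from the slides.

```python
def p_match(s, t):
    """Return True if some bijection pi maps s onto t, character by character."""
    if len(s) != len(t):
        return False
    forward = {}   # pi: character of s -> character of t
    backward = {}  # inverse of pi, kept to ensure pi is one-to-one
    for a, b in zip(s, t):
        # setdefault records the pairing on first sight and returns the
        # existing pairing afterwards, so any conflict shows up as inequality.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(p_match("aabbb", "xxyyy"))  # True, with pi(a) = x and pi(b) = y
```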
4
1D Parameterized Matching
  • Input: two strings T and P, $|T| = n$, $|P| = m$.
  • Output: all text locations i such that
    $\pi(P) = T[i..i+m-1]$ for some bijection $\pi$.

5
2D Parameterized Matching
  • Input: text T and pattern P,
    $|T| = n \times n$, $|P| = m \times m$.
  • Output: all text locations (i,j) such that
    $\pi(P) = T[i..i+m-1,\ j..j+m-1]$ for some bijection $\pi$.
  • Example:

T =
a b c
a a b
b b b

P =
x y z
x x y
y y y

$\pi(x) = a$, $\pi(y) = b$, $\pi(z) = c$
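For reference, the 2D problem statement can be turned into a brute-force check that tries every text location and builds the bijection cell by cell. This is only a baseline sketch running in $O(n^2 m^2)$ time, not the algorithm of the talk; the name `p_match_2d` and the choice of a list of strings as input are assumptions.

```python
def p_match_2d(text, pattern):
    """Return the 1-indexed corners (i, j) where the m x m pattern p-matches."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            forward, backward = {}, {}  # the bijection and its inverse
            ok = True
            for r in range(m):
                for c in range(m):
                    p, t = pattern[r][c], text[i + r][j + c]
                    if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
                        ok = False
            if ok:
                hits.append((i + 1, j + 1))
    return hits


T = ["abc", "aab", "bbb"]
P = ["xyz", "xxy", "yyy"]
print(p_match_2d(T, P))  # [(1, 1)]
```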
6
Parameterized Matching History
  • Introduced by Brenda Baker [Baker93].
  • Two dimensions: [AACLP03], this work.
  • Used in scaled matching [ABL99].
  • Periodicity of parameterized matching
    [ApostolicoGiancarlo].
  • Approximate parameterized matching [AEL],
    [HLS04].
  • Others: [AFM94], [Bak95], [Bak97].

7
Mismatch pairs
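  • A pair of locations at which the two strings
    disagree in the parameterized sense: the characters
    are equal in one string but different in the other.
  • Example: locations 4 and 5 hold a, a in the first
    string but x, z in the second.

a a b a a a
x x y x z y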
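A short sketch that lists mismatch pairs by brute force may help here. It assumes the reading that a pair of locations is a mismatch pair when exactly one of the two strings holds equal characters there; the function name `mismatch_pairs` is hypothetical.

```python
from itertools import combinations


def mismatch_pairs(s, t):
    """List 1-indexed pairs (i, j), i < j, where exactly one string repeats a character."""
    pairs = []
    for i, j in combinations(range(len(s)), 2):
        if (s[i] == s[j]) != (t[i] == t[j]):
            pairs.append((i + 1, j + 1))
    return pairs


pairs = mismatch_pairs("aabaaa", "xxyxzy")
print((4, 5) in pairs)  # True: a, a against x, z
print((3, 6) in pairs)  # True: b, a against y, y
```

Two strings of equal length p-match exactly when this list is empty.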
8
1D Encoding
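  • Encode every text location by the location of its
    predecessor: the nearest location to its left that
    holds the same character (0 if there is none).

Example for the character a (1-indexed locations):
T                  = a b a d d a b d b c b d a a b d a a a a b b b
Locations of a     = 1 3 6 13 14 17 18 19 20
Their predecessors = 0 1 3 6 13 14 17 18 19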
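The predecessor encoding takes one left-to-right pass that remembers where each character was last seen. A minimal sketch, assuming 1-indexed locations and 0 for "no predecessor"; the name `encode` is an illustrative choice.

```python
def encode(text):
    """Replace each character by the location of its previous occurrence (0 if none)."""
    last_seen = {}  # character -> most recent 1-indexed location
    encoded = []
    for location, ch in enumerate(text, start=1):
        encoded.append(last_seen.get(ch, 0))
        last_seen[ch] = location
    return encoded


print(encode("abab"))  # [0, 0, 1, 2]
```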
9
1D Encoding
  • Two p-matching strings have the same encoded
    texts.

S         = a b b c b a a c b b c b a
Encoded S = 0 0 2 0 3 1 6 4 5 9 8 10 7
T         = x y y z y x x z y y z y x
Encoded T = 0 0 2 0 3 1 6 4 5 9 8 10 7
10
1D Encoding
  • Two strings p-match iff their encoded strings match.
  • Reduction to the exact matching problem.

S         = a b b c b b a c b b c b a
Encoded S = 0 0 2 0 3 5 1 4 6 9 8 10 7
T         = x y y z y x x z y y z y x
Encoded T = 0 0 2 0 3 1 6 4 5 9 8 10 7

The encodings differ at locations 6, 7 and 9, so S and T do not p-match.
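To show how the reduction is used for searching, the sketch below compares the encoded pattern with each window of the encoded text. One detail is an assumption on my part: a text predecessor that lies to the left of the current window is treated as 0, because from the window's point of view it does not exist. The comparison loop is quadratic for clarity; an exact-matching algorithm would replace it in a linear-time solution. The names `encode` and `p_match_locations` are hypothetical.

```python
def encode(text):
    """Predecessor encoding: previous 1-indexed occurrence of each character, 0 if none."""
    last_seen, encoded = {}, []
    for location, ch in enumerate(text, start=1):
        encoded.append(last_seen.get(ch, 0))
        last_seen[ch] = location
    return encoded


def p_match_locations(text, pattern):
    """Return the 1-indexed locations i where pattern p-matches text[i..i+m-1]."""
    n, m = len(text), len(pattern)
    enc_t, enc_p = encode(text), encode(pattern)
    hits = []
    for i in range(1, n - m + 2):
        ok = True
        for k in range(1, m + 1):
            pred = enc_t[i + k - 2]  # predecessor of text location i + k - 1
            # Re-express the predecessor relative to the window; predecessors
            # left of the window count as "none".
            relative = pred - i + 1 if pred >= i else 0
            if relative != enc_p[k - 1]:
                ok = False
                break
        if ok:
            hits.append(i)
    return hits


print(p_match_locations("abba", "xx"))  # [2]
```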
11
2D Mismatch Pairs
  • Same as 1D mismatch pairs, but with 2D strings.
  • Example:

a b a      x y x
b a b      y y y
b a b      y y y
12
2D Encoding
  • First idea:
  • Encode the linearization of text and pattern.
  • Overflow problem!

[Figure: occurrences of a and b in the linearized text, separated by runs labeled "different character than a" and "different character than b".]
13
2D Encoding
  • Second idea: use strips.
  • Strip: a substring of T of size $n \times m$.
  • The i-th strip of T is the $n \times m$ substring
    $T[1..n,\ i..i+m-1]$.

Encode the linearization of strips. Why is the overflow
problem solved? Every predecessor lies within an $m \times m$
window.
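One way to see why strips help, offered as my reading rather than the slides' own wording: when a strip is linearized row by row, every $m \times m$ window inside it occupies one contiguous run of $m^2$ positions, so the 1D encoding applies to it directly. The sketch below illustrates this; `linearize_strip` and `window_in_strip` are hypothetical names, and row-by-row order is an assumption.

```python
def linearize_strip(text, i, m):
    """Row-by-row linearization of the i-th strip (columns i..i+m-1, 1-indexed)."""
    return [ch for row in text for ch in row[i - 1:i - 1 + m]]


def window_in_strip(linear, r, m):
    """Cells of the m x m window whose top row is r (1-indexed) in a linearized strip."""
    start = (r - 1) * m
    return linear[start:start + m * m]


text = ["abcd", "efgh", "ijkl", "mnop"]
strip = linearize_strip(text, 2, 2)
print(strip)                         # ['b', 'c', 'f', 'g', 'j', 'k', 'n', 'o']
print(window_in_strip(strip, 2, 2))  # ['f', 'g', 'j', 'k']
```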
14
Text Preprocessing
  • For the first strip, compute predecessors on its
    linearization.
  • How to compute predecessors for the rest of the strips?
  • First solution: do the same as above.
  • Time: $O(n^2 m)$.
  • Can we do better?

15
Update strips
  • Yes, exploit information from previous strips.
  • When moving from strip i to strip i+1, update only the
    O(n) pointers of the first and last columns.

Time: $O(n^2)$.
16
Check Predecessors
  • Are we done?
  • No!
  • Need to check every predecessor against every
    text location that contains it.
  • Worst case: $O(n^2 m^2)$.
  • How to improve?

17
Algorithm Outline
  • Use the duel-and-sweep paradigm:
  • Find candidates (dueling).
  • Divide candidates by strips.
  • Update predecessors of every new strip.
  • Check new predecessors (sweep).
  • Assume the pattern witness table is given.

18
Witness
  • Witness: a mismatch pair between P and a copy of P
    aligned at location (a,b).
19
Set Candidates
  • Using duels:
  • Of two text locations with a witness, one can be
    eliminated.
  • Apply the algorithm of [ABF94] and return a list of
    candidates.
  • Time: $O(n^2)$.

20
Sweep Technique
  • Observation,
  • All candidates agree with each other.
  • Hence,
  • Mismatch pair eliminates all candidates
    containing it.
  • Therefore,
  • For every predecessor, enough to find one
    candidate that contains it.

21
Sweep Technique
  • How to find?
  • Create a new $n \times n$ array A such that
  • $A[i,j]$ = the largest row among candidates that
    start at column j and overlap row i.
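The array A can be filled column by column with a single downward pass that remembers the most recent candidate row. This is a sketch under stated assumptions: candidates are given as 1-indexed top-left corners (row, column), a candidate starting at row r overlaps rows r..r+m-1, and 0 marks "no candidate". The name `build_A` is hypothetical.

```python
def build_A(candidates, n, m):
    """A[i][j] = largest candidate row in column j that overlaps row i, or 0."""
    cand = set(candidates)
    # 1-indexed table; row 0 and column 0 are unused padding.
    A = [[0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        latest = 0  # most recent candidate row seen in this column
        for i in range(1, n + 1):
            if (i, j) in cand:
                latest = i
            if latest and i - latest < m:
                A[i][j] = latest
    return A


A = build_A([(1, 1), (2, 1)], n=4, m=2)
print([A[i][1] for i in range(1, 5)])  # [1, 2, 2, 0]
```

The pass touches every cell once, so it takes $O(n^2)$ time.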
22
Sweep Technique
  • For every predecessor pair (i,j), (x,y), use a range
    minimum query to find the highest candidate containing it.

In case of a mismatch pair, eliminate all
candidates containing it. How?
23
Sweep Technique
  • Use a mismatch vector.
  • Every mismatch pair translates into a range.
  • For new predecessors, add the mismatching predecessors
    and delete old ones.

All candidates within this range are
eliminated. $O(n^2)$ time.
24
Sweep Technique
  • Observation:
  • T p-matches P iff
  • every text location and its predecessor are not a
    mismatch pair, and
  • the number of distinct characters in P and in T is equal.
  • Left to do?
  • Count distinct characters for every candidate.
  • Use algorithm of Amir Cole Dar Church, time $O(n^2)$.
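To make the counting step concrete, here is a brute-force sketch that counts the distinct characters in every $m \times m$ window. It runs in $O(n^2 m^2)$ time and is not the $O(n^2)$ algorithm cited on the slide; the name `distinct_counts` is hypothetical.

```python
def distinct_counts(text, m):
    """Map each 1-indexed corner (i, j) to the number of distinct characters in its window."""
    n = len(text)
    return {
        (i + 1, j + 1): len({text[i + r][j + c] for r in range(m) for c in range(m)})
        for i in range(n - m + 1)
        for j in range(n - m + 1)
    }


counts = distinct_counts(["abc", "aab", "bbb"], 2)
print(counts[(1, 2)])  # 3, from the window b c / a b
```

A candidate whose count differs from the number of distinct characters in P can be discarded.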

25
Overview
Checking all predecessors takes linear
time. Total time: $O(n^2)$.
26
Pattern Preprocessing
  • Find a witness table for P in time
    $O(m^{2.5} \operatorname{polylog} m)$.
  • For every pattern location (i,j), create a list of
    size O( ) pointers.
  • Pointer i is predecessor of lines above
    (i,j).
  • Reduce to exact matching with don't cares.

27
Pattern Preprocessing
  • End cases, multiple cases.

[Figure: regions A1 to A4 and B1 to B4, with the label "Less than".]
28
Open Questions
  • Can the algorithm's time complexity be reduced to
    $O(n^2 + m^2)$?