Boyer Moore Searches on Binary Texts

1
Accelerating Boyer Moore Searches on Binary Texts
  • Shmuel Tomi Klein
  • Miri Kopel Ben-Nissan
  • Bar Ilan University, ISRAEL

2
Outline
Background and motivation
Boyer Moore algorithm
New binary variant
Analysis
Experiments
Summary
3
Important application of Automata
PATTERN MATCHING
KMP BDM BM
Boyer Moore
Match backwards!
this-is-a-sample-text---
pattern
4
Boyer Moore Algorithm
Mismatch case 1 (delta1): b does not occur in x
[Figure: text y containing b followed by u; pattern x ending in a followed by u]
5
Boyer Moore Algorithm
Mismatch case 2 (delta1): b occurs in x
[Figure: text y containing b followed by u; pattern x ending in a followed by u]
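Cases 1 and 2 can be folded into a single lookup table. The following is a minimal Python sketch, assuming delta1 is the distance from the rightmost occurrence of a character to the end of the pattern, and the full pattern length for characters that do not occur in the pattern. The function names are illustrative and do not come from the slides.

```python
def build_delta1(pattern):
    """Map each pattern character to its distance from the pattern end."""
    m = len(pattern)
    table = {}
    for i, ch in enumerate(pattern):
        # A later (more rightward) occurrence overwrites an earlier one,
        # so the table ends up holding the rightmost occurrence.
        table[ch] = m - 1 - i
    return table


def delta1(table, ch, m):
    """Case 2: character occurs in the pattern. Case 1: it does not, use m."""
    return table.get(ch, m)


table = build_delta1("example")
print(table)
print(delta1(table, "z", len("example")))
```

For the pattern "example" this gives the same values as the delta1 row on the Boyer Moore Example slide (e 0, l 1, p 2, m 3, a 4, x 5, rest 7).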
6
Boyer Moore Algorithm
Mismatch case 3 (delta2): u reoccurs in x preceded by c ≠ a
[Figure: text y containing b followed by u; pattern x ending in a followed by u]
7
Boyer Moore Algorithm
Mismatch case 4 (delta2): only a suffix v of u reoccurs in x
[Figure: text y containing b followed by u; pattern x ending in a followed by u; v is a suffix of u]
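Cases 3 and 4 can be computed straight from their definition. The sketch below is a brute-force Python illustration, not an efficient preprocessing routine and not necessarily how the authors compute the table: for each mismatch position it tries shifts 1, 2, ... and takes the first one under which the matched suffix u (or the part of it that still overlaps the pattern) lines up again and the character in front of it differs from the one that just mismatched. The function name and the 0-based indexing are assumptions made for the example.

```python
def delta2_shifts(pattern):
    """Smallest valid pattern shift for a mismatch at each 0-based position j."""
    m = len(pattern)
    shifts = []
    for j in range(m):
        for s in range(1, m + 1):
            # Case 3/4: every matched position t > j must agree with the
            # pattern character that lands on it after shifting by s
            # (positions that fall off the left end impose no constraint).
            suffix_ok = all(
                t - s < 0 or pattern[t - s] == pattern[t]
                for t in range(j + 1, m)
            )
            # The character now aligned with position j must differ from
            # the one that mismatched (or fall off the left end).
            differs = j - s < 0 or pattern[j - s] != pattern[j]
            if suffix_ok and differs:
                shifts.append(s)
                break  # s = m always qualifies, so the loop terminates
    return shifts


shifts = delta2_shifts("example")
m = len("example")
# Text-pointer increase = shift + number of characters already matched.
print([s + (m - 1 - j) for j, s in enumerate(shifts)])
```

Two conventions appear in the deck. Adding the number of matched characters to the shift, as in the last line, reproduces the delta2 row on the Boyer Moore Example slide; the plain shift reproduces the delta2 row for the 16-bit pattern on the later BBBMM delta2 slide.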
8
Boyer Moore Example
Pattern: example

char     a   e   l   m   p   x   rest
delta1   4   0   1   3   2   5   7

Pat      e   x   a   m   p   l   e
delta2   12  11  10  9   8   7   1
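To see how the two tables drive a search, here is a small Python sketch of the classic Boyer Moore loop. It hard-codes the delta1 and delta2 values shown on this slide for the pattern "example" and treats both as increases of the text pointer. The sample texts and all names are made up for illustration.

```python
PATTERN = "example"
# Values copied from the slide; characters not listed get the "rest" value.
DELTA1 = {"a": 4, "e": 0, "l": 1, "m": 3, "p": 2, "x": 5}
DELTA1_REST = 7
DELTA2 = [12, 11, 10, 9, 8, 7, 1]  # indexed by 0-based pattern position


def bm_search(text, pattern=PATTERN):
    """Return the index of the first occurrence of pattern, or -1."""
    m, n = len(pattern), len(text)
    i = m - 1                      # text pointer, starts under the pattern end
    while i < n:
        j = m - 1
        # Match backwards, from the last pattern character to the first.
        while j >= 0 and text[i] == pattern[j]:
            i -= 1
            j -= 1
        if j < 0:
            return i + 1           # whole pattern matched
        # Mismatch: advance the text pointer by the larger of the two values.
        i += max(DELTA1.get(text[i], DELTA1_REST), DELTA2[j])
    return -1


print(bm_search("here-is-a-simple-example"))
print(bm_search("this-is-a-sample-text---"))
```

Taking the maximum matters: delta1 alone can be smaller than the number of characters already matched, while delta2 always exceeds it, so the window never moves backwards.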
9
Problems of Binary Boyer Moore
In Boyer Moore, most of the work is done by delta1
On a binary alphabet, delta1 is nearly useless
10
Need for Binary Boyer Moore
Compressed Matching
Given the encoded text E(T) and a pattern P, look for
E(P) in E(T)
rather than
P in D(E(T)), the decoded text
Suggested Solution
BBBMM
Blocked Binary Boyer Moore Matching
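The idea of searching for E(P) in E(T) can be illustrated with a toy example. The Python sketch below uses a small hypothetical prefix code in place of real Huffman codes, and Python's built-in substring search in place of any Boyer Moore variant. It also reports whether each hit starts on a codeword boundary, since a bit-level occurrence of E(P) need not correspond to an occurrence of P. The boundary check here uses the original text purely to make that effect visible; it is not how a compressed-matching algorithm would verify alignment.

```python
# Hypothetical prefix code, chosen only for this illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(s):
    """Concatenate the codewords of the characters of s into a bit string."""
    return "".join(CODE[ch] for ch in s)


def codeword_starts(s):
    """Bit offsets at which a codeword begins in encode(s)."""
    starts, pos = set(), 0
    for ch in s:
        starts.add(pos)
        pos += len(CODE[ch])
    return starts


def compressed_find(text, pattern):
    """All bit offsets of E(P) in E(T), each with an 'aligned' flag."""
    et, ep = encode(text), encode(pattern)
    starts = codeword_starts(text)
    hits = []
    pos = et.find(ep)
    while pos != -1:
        hits.append((pos, pos in starts))
        pos = et.find(ep, pos + 1)
    return hits


print(encode("dab"))
print(compressed_find("dab", "b"))
```

With this code, E("dab") contains the bits of E("b") twice, but only the second occurrence starts on a codeword boundary and so corresponds to a real b.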
11
BBBMM
12
BBBMM
More information in binary case
ffghabdgttiocb sbgghj
ASCII
01100010 01101010
BINARY
13
BBBMM
extended delta1
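One plausible reading of an extended delta1, offered here as an assumption rather than as the authors' definition, is to index the table by a whole block of k text bits instead of a single bit. The Python sketch below computes, for every k-bit block value, the smallest shift of the pattern that is consistent with that block sitting under the last k pattern positions. The names and the brute-force construction are illustrative only.

```python
def block_delta1(pattern_bits, k):
    """For each k-bit block value, the smallest consistent shift in bits."""
    m = len(pattern_bits)
    table = {}
    for value in range(2 ** k):
        block = format(value, "0{}b".format(k))
        for s in range(m + 1):
            # After shifting by s, pattern position start + t lies under
            # block bit t; positions before the pattern start are free.
            start = m - s - k
            if all(
                start + t < 0 or pattern_bits[start + t] == block[t]
                for t in range(k)
            ):
                table[value] = s   # s = 0 means the block equals the
                break              # pattern's last k bits: keep comparing
    return table


pattern = "1010101011101101"
print(block_delta1(pattern, 1))
print(max(block_delta1(pattern, 8).values()))
```

With k = 1 the table has two entries and never allows a shift larger than 1 for this pattern, which is the weakness of delta1 on a binary alphabet. With k = 8 the table has 256 entries, and blocks that cannot overlap the pattern at all (such as 00000000 here) allow a shift by the full pattern length.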
14
BBBMM
Total size of delta1 tables
If too large, use limit value
Size of delta1 tables reduced to
15
BBBMM
Original delta1: increase of text pointer
BBBMM delta1: shift size
Mismatch not in last block
Correct[sh, j]
16
BBBMM
delta2
j          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Pat[j]     1  0  1  0  1  0  1  0  1  1  1  0  1  1  0  1
delta2[j] 13 13 13 13 13 13 13 13 13 13 13  3  7 15  2  1
17
Analysis
Assumption: random input
Reasonable for compressed text
Expected comparisons till mismatch
Bit-wise
Blocked
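Under the random-input assumption, a standard back-of-the-envelope estimate, not necessarily the expression shown on the slide, is that each bit comparison succeeds with probability 1/2 and each k-bit block comparison with probability 2^-k. The Python sketch below sums the probabilities that the i-th comparison in a window is reached at all. The function name is illustrative, and the pattern length is assumed to be a multiple of k.

```python
def expected_comparisons(units, match_prob):
    """Expected comparisons in one window of `units` comparable units.

    Comparison number i + 1 takes place only if the first i all matched,
    which happens with probability match_prob ** i.
    """
    return sum(match_prob ** i for i in range(units))


m, k = 16, 8  # pattern length in bits, block size in bits
print(expected_comparisons(m, 0.5))            # bit-wise: close to 2
print(expected_comparisons(m // k, 0.5 ** k))  # blocked: close to 1
```

Under these assumptions a bit-wise scan needs about two comparisons before the first mismatch, while a blocked scan with k = 8 usually mismatches on the first block.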
18
Analysis
Expected bits shifted after mismatch
Bit-wise M
Blocked M
19
Experiments
English Bible (2.5MB)
World Factbook (1.5MB)
Text: Huffman encoded
k = 8
Patterns: random substrings of lengths 10 to 500
20
Experiments
Average comparisons between shifts
21
Experiments
Average size of shifts
Bit-wise
22
Experiments
Average comparisons for 1000 bits
23
Experiments
Time to locate first occurrence (ms)
24
Summary
Blocked variant of BM
Faster than alternatives, Overhead 1-10 K
Extensions
ASCII, words instead of characters
25
Thank you!