A new matching algorithm based on prime numbers

Slides: 14

1
A new matching algorithm based on prime numbers

N. D. Atreas and C. Karanikas
Department of Informatics
Aristotle University of Thessaloniki

2
Exact matching: find all the occurrences of a
pattern within a text.

1. The Brute Force algorithm performs character
by character comparison in O(NM) time
complexity, where M is the length of the pattern
and N is the length of the text.
2. The Knuth-Morris-Pratt algorithm runs in
O(N+M) time, avoiding unnecessary re-examinations
of previously matched characters.

3. The Boyer-Moore algorithm
compares character by character, scanning the
pattern backwards. Best case execution
O(N/M), worst case O(N).
4. The Karp-Rabin algorithm
is a randomised algorithm that seeks a
pattern within a text by using hashing. Expected
running time O(N+M).
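
As a point of reference for the first two items in the list above, here is a minimal Python sketch of brute-force matching next to Knuth-Morris-Pratt. The function names `brute_force` and `kmp` are chosen for illustration and do not come from the slides.

```python
def brute_force(text, pattern):
    """Return every start index of pattern in text, O(N*M) in the worst case."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def kmp(text, pattern):
    """Knuth-Morris-Pratt: O(N + M), text characters are never re-examined."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))
    # fail[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:
            hits.append(i - m + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits
```

Both functions return the same list of positions; they differ only in how much work is repeated after a mismatch.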

A hash function must be
efficiently computable,
highly discriminating for strings,
such that hash(x(j+1), ..., x(j+M)) is easily computable
from hash(x(j), ..., x(j+M-1)) and x(j+M).
It is not injective, i.e. the equality of two hash
values suggests, but does not guarantee, equality
of the inputs.
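
The third requirement, computing the hash of the next window from the hash of the current one, can be illustrated with the polynomial rolling hash commonly used in Karp-Rabin implementations. This is a sketch only: the constants `BASE` and `MOD` and the helper names are assumptions chosen for the example, not values from the slides.

```python
BASE = 256          # assumed alphabet size (one byte per character)
MOD = 1_000_003     # an arbitrary prime modulus, chosen for illustration


def poly_hash(s):
    """Hash a whole string from scratch in O(len(s))."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def roll(h, out_ch, in_ch, high):
    """Hash of the window shifted right by one, in O(1).

    high must equal BASE ** (M - 1) % MOD for window length M.
    """
    h = (h - ord(out_ch) * high) % MOD      # drop the leftmost character
    return (h * BASE + ord(in_ch)) % MOD    # append the new character


def window_hashes(text, m):
    """Hashes of all length-m windows of text, using the O(1) update."""
    if m == 0 or m > len(text):
        return []
    high = pow(BASE, m - 1, MOD)
    h = poly_hash(text[:m])
    hashes = [h]
    for j in range(len(text) - m):
        h = roll(h, text[j], text[j + m], high)
        hashes.append(h)
    return hashes
```

Because the hash is not injective, two windows with equal hashes still have to be compared character by character before a match is reported.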

5
Let x = {x(1), ..., x(N)} be a set of positive
integers and p(1) < ... < p(N) be primes such that
p(1) > max{x(i), i = 1, ..., N}. We define the
transform T(x(1), ..., x(N)).
6
Properties of T(x(1), ..., x(N))

T(x(1), ..., x(N)) is one to one.
x(1), ..., x(N) can be recovered from T(x) as the
unique solution of a system of N linear
Diophantine equations defined recursively:
(p(i+1) ... p(N)) x(i) + p(i) c(i+1) = c(i),
where c(1) = T(x) p(1) ... p(N).
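
The defining formula for T is not preserved in this transcript. The recursion (p(i+1) ... p(N)) x(i) + p(i) c(i+1) = c(i) with c(1) = T(x) p(1) ... p(N) is consistent with T(x) = x(1)/p(1) + ... + x(N)/p(N), so the sketch below assumes that form. It is an assumption made for illustration, and the function names `transform` and `recover` are hypothetical.

```python
from fractions import Fraction
from math import prod


def transform(x, primes):
    """ASSUMED form of T: the exact rational sum of x(i) / p(i)."""
    return sum(Fraction(xi, pi) for xi, pi in zip(x, primes))


def recover(t, primes):
    """Recover x(1..N) from T(x) by solving the recursion one prime at a time.

    Requires distinct primes with every x(i) smaller than p(1).
    """
    c = t * prod(primes)                 # c(1) = T(x) p(1)...p(N)
    assert c.denominator == 1, "c(1) must be an integer"
    c = c.numerator
    x = []
    for i, p in enumerate(primes):
        tail = prod(primes[i + 1:])      # p(i+1)...p(N); empty product is 1
        # tail * x(i) + p * c(i+1) = c(i)  implies
        # x(i) = c(i) * tail^(-1) (mod p), and x(i) < p makes it unique.
        xi = (c * pow(tail, -1, p)) % p
        x.append(xi)
        c = (c - tail * xi) // p         # c(i+1)
    return x
```

Under this assumption, `recover(transform(x, primes), primes)` returns `x` whenever the primes are distinct and each x(i) is below p(1), which is one way to see why such a transform is one to one.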

7
Properties of T(x(1), ..., x(N))

T(x) can be used as a measure of similarity
between two strings, since it can be used to
count the elements in which they differ.
It provides a necessary and sufficient condition
for detecting whether a binding operation on strings
can be implemented.
It is not a hash function.

8
Modelling a hash function approximating T.
9
Definition of the hash function

We prove

10
Final form of hash function

Theorem

11
Software implementation

Let X = x(1), ..., x(N) be the text and
Y = y(1), ..., y(M) be the pattern.
Compute T(y(1), ..., y(M)) and T(x(1), ..., x(M)) in O(M)
time.
Compute the hash values in O(N-M) time.

12
Software implementation

for some i, then x(i+1), ..., x(i+M-1) is a candidate
for string matching.
For each candidate, perform at most p character
comparisons (p is the size of the alphabet) to
throw out false matches.
The algorithm executes in O(N) time.
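
The overall shape of these steps (hash the pattern and the first window in O(M), update the window hash once per shift, then verify candidates) can be sketched as follows. The authors' hash function is not preserved in this transcript, so the sketch substitutes a plain sum of character codes as a stand-in; it also verifies each candidate with a full comparison rather than the bounded number of comparisons stated on the slide. The name `find_all` is hypothetical.

```python
def find_all(text, pattern):
    """Filter-and-verify search with a stand-in additive window hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    target = sum(map(ord, pattern))       # hash of the pattern, O(M)
    window = sum(map(ord, text[:m]))      # hash of the first window, O(M)
    hits = []
    for i in range(n - m + 1):
        # Equal hashes only mark a candidate; verification removes false matches.
        if window == target and text[i:i + m] == pattern:
            hits.append(i)
        if i + m < n:
            # O(1) update: drop text[i], take in text[i + m].
            window += ord(text[i + m]) - ord(text[i])
    return hits
```

A sum of character codes is weakly discriminating (any permutation of the pattern is a candidate), so this stand-in shows the control flow only and says nothing about the false-match rate of the hash described in the slides.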

13
Conclusions

We introduce the idea of a hash function
approximation in order to reduce the
computational complexity of an algorithm.
Although the time bounds are the same as, or in some
cases inferior to, those of the Boyer-Moore
algorithm, our algorithm is superior for multiple
matching problems.
