A new matching algorithm based on prime numbers

Transcript and Presenter's Notes

1
A new matching algorithm based on prime numbers
  • N. D. Atreas and C. Karanikas
  • Department of Informatics
  • Aristotle University of Thessaloniki

2
Exact matching: find all the occurrences of a
pattern within a text.
  • 1. The Brute Force algorithm performs character
    by character comparison in O(NM) time
    complexity, where M is the length of the pattern
    and N is the length of the text.
  • 2. The Knuth-Morris-Pratt algorithm runs in
    O(N+M) time, avoiding unnecessary re-examinations
    of previously matched characters.
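
The two algorithms on this slide can be sketched in Python as follows. This is a minimal illustration written for this transcript, not code from the presentation; the function names brute_force, failure_table and kmp_search are hypothetical. The failure table is what lets Knuth-Morris-Pratt skip re-examining characters it has already matched.

```python
def brute_force(text, pattern):
    """Return every start index of pattern in text by direct comparison, O(N*M)."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return every start index of pattern in text in O(N + M) time."""
    if not pattern:
        return list(range(len(text) + 1))
    fail = failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern instead of moving back in the text.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping occurrences
    return matches
```

Both functions return the same list of start positions; only the amount of work differs, since kmp_search never moves backwards in the text.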

3
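  • 3. The Boyer-Moore algorithm
  • involves character by character comparison
    by using backwards checking (right to left within
    the pattern). Best case execution O(N/M), worst
    case O(N) with the Galil rule (O(NM) without it
    when all occurrences are reported).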
  • 4. The Karp-Rabin algorithm
  • It is a randomised algorithm that seeks a
    pattern within a text by using hashing. Expected
    running time O(N+M), worst case O(NM).
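
To illustrate backwards checking, here is a sketch of the Horspool simplification of Boyer-Moore. It is a swapped-in variant, not the full Boyer-Moore algorithm: it keeps only a bad-character shift table and omits the good-suffix rule. The function name horspool_search is hypothetical.

```python
def horspool_search(text, pattern):
    """Return every start index of pattern in text (Horspool variant of Boyer-Moore).

    Returns an empty list for an empty pattern.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # shift[c] = distance from the last occurrence of c in pattern[:-1] to the pattern's end.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        # Compare backwards, starting from the last character of the pattern.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift by the table entry for the text character under the pattern's last
        # position; characters absent from the pattern allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

When the text shares no characters with the pattern, every step shifts by M, which is where the O(N/M) best case comes from.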

4
  • A hash function must be
  • efficiently computable
  • highly discriminating for strings
  • hash(x(j+1), ..., x(j+M)) must be easily computable
    from hash(x(j), ..., x(j+M-1)) and x(j+M).
  • It is in general not injective, i.e. the equality of two hash
    values suggests, but does not guarantee, equality
    of the inputs.
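
The properties listed above can be seen in a sketch of Karp-Rabin matching with a polynomial rolling hash. The base and modulus values are illustrative assumptions, and karp_rabin_search is a hypothetical name; the presentation does not specify a particular hash.

```python
def karp_rabin_search(text, pattern, base=256, modulus=1_000_000_007):
    """Return every start index of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, modulus)  # weight of the character leaving the window

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % modulus
        return h

    target = full_hash(pattern)
    window = full_hash(text[:m])
    matches = []
    for j in range(n - m + 1):
        # Equal hashes only suggest a match, so confirm by direct comparison.
        if window == target and text[j:j + m] == pattern:
            matches.append(j)
        if j + m < n:
            # Rolling update in O(1): drop text[j], append text[j + m].
            window = ((window - ord(text[j]) * high) * base + ord(text[j + m])) % modulus
    return matches
```

Passing modulus=1 makes every window collide with the pattern's hash. The result is still correct because of the verification step, but the running time degrades towards O(NM).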

5
Let x = {x(1), ..., x(N)} be a set of positive
integers and p(1) < ... < p(N) be primes such that
p(1) > Max{x(i), i = 1, ..., N}. We define the
transform T(x) (formula not preserved in this transcript).
6
Properties of T(x(1), ..., x(N))
  • T(x(1), ..., x(N)) is one-to-one.
  • x(1), ..., x(N) can be recovered from T(x) as the
    unique solution of a system of N linear
    Diophantine equations defined recursively:
  • (p(i+1) ... p(N)) x(i) + p(i) c(i+1) = c(i)
  • where c(1) = T(x) p(1) ... p(N).
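
The formula for T is missing from this transcript, so the following sketch rests on an inference. If the recursion is read as (p(i+1) ... p(N)) x(i) + p(i) c(i+1) = c(i) with c(N+1) = 0 and c(1) = T(x) p(1) ... p(N), unrolling it suggests T(x) = x(1)/p(1) + ... + x(N)/p(N). That form is an assumption, not the authors' stated definition, and the helper names primes_above, transform and recover are hypothetical. Under that assumption, each x(i) can be read off from c(1) modulo p(i), which is one way to see why the transform would be one-to-one.

```python
from fractions import Fraction
from math import prod


def primes_above(bound, count):
    """Return the first `count` primes strictly greater than `bound` (trial division)."""
    primes, candidate = [], bound + 1
    while len(primes) < count:
        if candidate > 1 and all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            primes.append(candidate)
        candidate += 1
    return primes


def transform(xs, ps):
    """Assumed form of T (inferred, not given in the slides): exact sum of x(i) / p(i)."""
    return sum((Fraction(x, p) for x, p in zip(xs, ps)), Fraction(0))


def recover(t, ps):
    """Invert the assumed T, given the primes used to build it."""
    total = prod(ps)
    c1 = t * total  # c(1) = T(x) * p(1) * ... * p(N), which is an integer
    assert c1.denominator == 1
    xs = []
    for p in ps:
        others = total // p  # product of all primes except p
        # Modulo p every term of c(1) vanishes except x(i) * others, and x(i) < p.
        xs.append(c1.numerator * pow(others, -1, p) % p)
    return xs
```

For example, with xs = [3, 1, 4, 1, 5] the call primes_above(max(xs), len(xs)) picks primes larger than every entry, and recover(transform(xs, ps), ps) returns the original list. The sketch needs Python 3.8 or later for math.prod and the three-argument pow with a negative exponent.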

7
Properties of T(x(1), ..., x(N))
  • T(x) can be used as a measure of similarity
    between two strings, since it can be used for
    counting the different elements between them.
  • It provides a necessary and sufficient condition
    for detecting whether a binding operation on strings
    can be implemented.
  • It is not a hash function.
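
One possible reading of the similarity property is sketched below. It assumes that T(x) is the sum of x(i)/p(i), a form inferred from the recursion on the previous slide and not stated in the transcript. Under that assumption, the number of positions where two equal-length strings differ can be counted from T(x) - T(y) alone. The names transform and count_differences are hypothetical, and this may not be the counting method the authors had in mind.

```python
from fractions import Fraction
from math import prod


def transform(xs, ps):
    """Assumed form of T (inferred, not given in the slides): exact sum of x(i) / p(i)."""
    return sum((Fraction(x, p) for x, p in zip(xs, ps)), Fraction(0))


def count_differences(tx, ty, ps):
    """Count differing positions of two equal-length strings from T(x), T(y) and the primes."""
    # diff is an integer: the sum over i of (x(i) - y(i)) times the product of the other primes.
    diff = (tx - ty) * prod(ps)
    # p(i) divides diff exactly when x(i) == y(i), because |x(i) - y(i)| < p(1) <= p(i).
    return sum(1 for p in ps if diff.numerator % p != 0)
```

For instance, with the primes [7, 11, 13, 17, 19], the strings [3, 1, 4, 1, 5] and [3, 2, 4, 1, 6] differ in two positions, and count_differences applied to their transforms reports 2.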

8
Modelling a hash function approximating T.
9
Definition of the hash function
  • We prove

10
Final form of hash function
  • Theorem

11
Software implementation
  • Let X = x(1), ..., x(N) be the text and
    Y = y(1), ..., y(M) be the pattern.
  • Compute T(y(1), ..., y(M)) and T(x(1), ..., x(M)) in O(M)
    time.
  • Compute the hash values in O(N-M) time.

12
Software implementation
  • If the hash condition (formula not preserved in this transcript) holds
    for some i, then x(i+1), ..., x(i+M-1) is a candidate
    for string matching.
  • For all candidates, perform at most p character comparisons (p is the
    size of the alphabet) to
    throw out false matches.
  • The algorithm executes in O(N) time.

13
Conclusions
  • We introduce the idea of a hash function
    approximation in order to reduce the
    computational complexity of an algorithm.
  • Although the time bounds are the same as, or in some
    cases inferior to, those of the Boyer-Moore
    algorithm, our algorithm is superior for multiple
    matching problems.