Title: The Rabin-Karp Algorithm


1
The Rabin-Karp Algorithm
  • String Matching

Jonathan M. Elchison 19 November 2004 CS-3410
Algorithms Dr. Shomper
2
Background
  • String matching
  • Naïve method
  • n = size of input string
  • m = size of pattern to be matched
  • $O((n - m + 1)\,m)$
  • $\Theta(n^2)$ if $m = \lfloor n/2 \rfloor$
  • We can do better
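
As a baseline for the bound above, here is a minimal sketch of the naïve method in Python. The function name `naive_match` and the 0-based shifts are assumptions made for illustration; the slides do not give code for this step.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 candidate shifts.
    for s in range(n - m + 1):
        # Each comparison may inspect up to m characters,
        # which gives the O((n - m + 1) m) bound.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_match("BANANA", "ANA"))  # [1, 3]
```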

3
How it works
  • Consider a hashing scheme
  • Each symbol in alphabet $\Sigma$ can be represented by
    an ordinal value 0, 1, 2, ..., d - 1
  • $|\Sigma| = d$
  • Radix-d digits
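
The mapping from symbols to radix-d digits can be sketched as follows. The choice of the 26 uppercase letters as the alphabet and the names `ALPHABET`, `ORDINAL`, and `to_digits` are assumptions for illustration.

```python
import string

# Assumed alphabet: the 26 uppercase letters.
ALPHABET = string.ascii_uppercase
d = len(ALPHABET)  # radix d = size of the alphabet = 26

# Map each symbol to its ordinal value, starting from 0.
ORDINAL = {symbol: value for value, symbol in enumerate(ALPHABET)}


def to_digits(s: str) -> list[int]:
    """Translate a string into its list of radix-d digits."""
    return [ORDINAL[ch] for ch in s]


print(to_digits("BAN"))   # [1, 0, 13]
print(to_digits("CARD"))  # [2, 0, 17, 3]
```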

4
How it works
  • Hash pattern P into a numeric value
  • Let a string be represented by the sum of these
    digits
  • Horner's rule (§30.1)
  • Example
  • A, B, C, ..., Z → 0, 1, 2, ..., 25
  • BAN → 1 + 0 + 13 = 14
  • CARD → 2 + 0 + 17 + 3 = 22
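
The slide mentions both a digit sum and Horner's rule, so the sketch below shows each. `additive_hash` reproduces the slide's arithmetic, while `horner_value` evaluates the string as a radix-d number, which is the usual reading of Horner's rule in this setting. Both function names, the helper `digit`, and the default radix of 26 are assumptions for illustration.

```python
def digit(ch: str) -> int:
    """Ordinal value of an uppercase letter, with A mapped to 0."""
    return ord(ch) - ord("A")


def additive_hash(s: str) -> int:
    """Sum of the digits, as in the slide's BAN and CARD examples."""
    return sum(digit(ch) for ch in s)


def horner_value(s: str, d: int = 26) -> int:
    """Value of s read as a radix-d number, computed with Horner's rule."""
    value = 0
    for ch in s:
        # Shift the digits seen so far one place left, then add the new one.
        value = d * value + digit(ch)
    return value


print(additive_hash("BAN"), additive_hash("CARD"))  # 14 22
print(horner_value("BAN"))                          # 689
```

The positional value distinguishes strings that contain the same letters in a different order; the plain sum does not.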

5
Upper limits
  • Problem
  • For long patterns, or for large alphabets, the
    number representing a given string may be too
    large to be practical
  • Solution
  • Use MOD operation
  • When taken MOD q, values will be < q
  • Example
  • BAN = 1 + 0 + 13 = 14
  • 14 mod 13 = 1
  • BAN → 1
  • CARD = 2 + 0 + 17 + 3 = 22
  • 22 mod 13 = 9
  • CARD → 9
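
A small sketch of the MOD step, using the slide's digit-sum hash and q = 13. Reducing after every addition keeps the running value below q the whole time, which is the point of the technique. The name `additive_hash_mod` is an assumption for illustration.

```python
def additive_hash_mod(s: str, q: int) -> int:
    """Digit-sum hash of an uppercase string, reduced mod q at every step."""
    h = 0
    for ch in s:
        # After this line h is always in the range 0 .. q - 1.
        h = (h + (ord(ch) - ord("A"))) % q
    return h


print(additive_hash_mod("BAN", 13))   # 1
print(additive_hash_mod("CARD", 13))  # 9
```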

6
Searching
7
Spurious Hits
  • Question
  • Does a hash value match mean that the patterns
    match?
  • Answer
  • No; a hash match that is not a pattern match is called a spurious hit
  • Possible cases
  • MOD operation interfered with uniqueness of hash
    values
  • 14 mod 13 = 1
  • 27 mod 13 = 1
  • MOD value q is usually chosen as a prime such
    that dq just fits within one computer word (10q for decimal digits)
  • Information is lost in generalization (addition)
  • BAN → 1 + 0 + 13 = 14
  • CAM → 2 + 0 + 12 = 14
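
The two cases above can be reproduced with a short sketch. It uses the slide's digit-sum hash mod 13; the names `sum_hash` and `classify` and the three result labels are assumptions for illustration.

```python
def sum_hash(s: str, q: int = 13) -> int:
    """Digit-sum hash of an uppercase string, taken mod q."""
    return sum(ord(ch) - ord("A") for ch in s) % q


def classify(pattern: str, window: str, q: int = 13) -> str:
    """Compare hashes first, then verify character by character."""
    if sum_hash(pattern, q) != sum_hash(window, q):
        return "no hit"
    return "match" if pattern == window else "spurious hit"


print(classify("BAN", "BAN"))  # match
print(classify("BAN", "CAM"))  # spurious hit (both sums are 14)
print(classify("BAN", "NAB"))  # spurious hit (addition ignores order)
print(classify("BAN", "BAT"))  # no hit
```

Because equal hashes do not guarantee equal strings, the explicit comparison in `classify` cannot be skipped.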

8
Code
  • RABIN-KARP-MATCHER( T, P, d, q )
  • n ← length[T]
  • m ← length[P]
  • h ← d^(m-1) mod q
  • p ← 0
  • t_0 ← 0
  • for i ← 1 to m    ▹ Preprocessing
  •     do p ← ( d·p + P[i] ) mod q
  •        t_0 ← ( d·t_0 + T[i] ) mod q
  • for s ← 0 to n - m    ▹ Matching
  •     do if p = t_s
  •           then if P[1..m] = T[s+1 .. s+m]
  •                   then print "Pattern occurs with shift" s
  •        if s < n - m
  •           then t_(s+1) ← ( d·( t_s - T[s+1]·h ) + T[s+m+1] ) mod q
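
A possible Python rendering of the pseudocode above, offered as a sketch rather than a reference implementation. Assumptions not in the slides: 0-based indexing, `ord()` as the symbol-to-digit mapping, the defaults d = 256 and q = 101, and returning the shifts in a list instead of printing them.

```python
def rabin_karp(text: str, pattern: str, d: int = 256, q: int = 101) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    h = pow(d, m - 1, q)  # weight of the leading digit: d^(m-1) mod q
    p = 0                 # hash of the pattern
    t = 0                 # hash of the current text window

    # Preprocessing: hash the pattern and the first window with Horner's rule.
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q

    shifts = []
    # Matching: slide the window across the text.
    for s in range(n - m + 1):
        # A hash hit must be verified to rule out a spurious hit.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Drop the leading digit, shift left, append the next digit.
            # Python's % never returns a negative value for positive q.
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


print(rabin_karp("BANANA", "ANA"))  # [1, 3]
```

Even a poor modulus such as q = 1 returns correct shifts, because every hash hit is verified; it only costs more comparisons.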

9
Performance
  • Preprocessing (determining each pattern hash)
  • $\Theta(m)$
  • Worst case running time
  • $\Theta((n - m + 1)\,m)$
  • No better than naïve method
  • Expected case
  • If we assume the number of hits is constant
    compared to n, we expect $O(n)$
  • Only hash hits are verified character by character, not all shifts

10
Demonstration
  • http://www-igm.univ-mlv.fr/~lecroq/string/node5.html

11
The Rabin-Karp Algorithm
  • Sources
  • Cormen, Thomas H., et al. Introduction to
    Algorithms. 2nd ed. Cambridge: MIT Press, 2001.
  • "Karp-Rabin algorithm." 15 Jan 1997.
    <http://www-igm.univ-mlv.fr/~lecroq/string/node5.html>.
  • Shomper, Keith. "Rabin-Karp Animation." E-mail
    to Jonathan Elchison. 12 Nov 2004.