Searching Strings - PowerPoint PPT Presentation

Provided by: Corpo125
Learn more at: http://www.cs.ucf.edu

1
Searching Strings
  • Brandon Ochs

2
The Basics
  • Pattern: the smaller string that we are looking for
  • String: larger than the pattern; the part being
    searched

3
Simple Method
  • Traverse the string from left to right
  • Compare the pattern to the leftmost part of the string
  • If the first characters match, compare the rest of the
    pattern
  • If a mismatch is found, shift the pattern right one
    unit
  • If we run out of space, there is no match
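
The steps above can be sketched in a few lines of Python. This is a minimal illustration only; the function name `simple_search` is hypothetical and does not come from the slides.

```python
def simple_search(string, pattern):
    """Return the index of the first match of pattern in string, or -1."""
    n, m = len(string), len(pattern)
    # Try each alignment in turn, shifting the pattern right one unit.
    for shift in range(n - m + 1):
        j = 0
        # Compare left to right until a mismatch or the pattern is used up.
        while j < m and string[shift + j] == pattern[j]:
            j += 1
        if j == m:
            return shift
    return -1  # ran out of space: there is no match


print(simple_search("ALL YOUR BASE ARE BELONG TO US", "BASE"))  # 9
```

The returned index is zero-based, so 9 is the position of the B in BASE.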

4
Simple Method Example
  • We want to find the word BASE in this sentence

5
Simple Method Example
  • ALL YOUR BASE ARE BELONG TO US
  • BASE
  • Left Most Character B

6
  • ALL YOUR BASE ARE BELONG TO US
  • BASE
  • Left Most Character L

7
  • ALL YOUR BASE ARE BELONG TO US
  • BASE
  • Left Most Character L

8
Eventually
  • ALL YOUR BASE ARE BELONG TO US
  • BASE
  • Running time is O(mn), where m is the length of the
    pattern and n is the length of the string

9
Worst Case Scenario
  • String: AAAAAAAAAAAAAB
  • Pattern: AAAB
  • Total of 44 comparisons
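
The count can be checked with a small sketch that tallies every character comparison made by the simple method. It assumes the string is thirteen A's followed by one B, which is what a total of 44 implies (11 alignments of 4 comparisons each); the function name is hypothetical.

```python
def count_simple_comparisons(string, pattern):
    """Return (index of first match or -1, character comparisons made)."""
    n, m = len(string), len(pattern)
    comparisons = 0
    for shift in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1  # one comparison per character pair examined
            if string[shift + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return shift, comparisons
    return -1, comparisons


worst_string = "A" * 13 + "B"  # assumed reading of the slide's string
print(count_simple_comparisons(worst_string, "AAAB"))  # (10, 44)
```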

10
Boyer-Moore Algorithm
  • Compare the pattern to the string from right to left
  • If the string's character being matched is not in
    the pattern, shift by the pattern length

11
Boyer-Moore Example
  • ALL YOUR BASE ARE BELONG TO US
  • BASE
  • Right Most Character (space)
  • Shift 4

12
  • ALL YOUR BASE ARE BELONG TO US
  • BASE
  • Right Most Character R
  • Shift 4

13
  • ALL YOUR BASE ARE BELONG TO US
  • BASE
  • Right Most Character S
  • Shift 1

14
  • ALL YOUR BASE ARE BELONG TO US
  • BASE
  • Right Most Character E
  • Match Found
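
The walk-through above can be reproduced with a short sketch. The slides only state the rule for a character that is not in the pattern; shifting by one unit when the character does occur in the pattern is an assumption made here so that the trace matches the shifts shown (4, 4, 1). The function name is hypothetical.

```python
def simplified_boyer_moore(string, pattern):
    """Return (index of first match or -1, list of shifts taken)."""
    n, m = len(string), len(pattern)
    pattern_chars = set(pattern)
    shifts = []
    pos = 0
    while pos + m <= n:
        # Compare the pattern to the string from right to left.
        j = m - 1
        while j >= 0 and string[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos, shifts
        # Look at the string character under the pattern's rightmost position.
        rightmost = string[pos + m - 1]
        # Assumption: shift 1 if it occurs in the pattern, else the full length.
        step = 1 if rightmost in pattern_chars else m
        shifts.append(step)
        pos += step
    return -1, shifts


print(simplified_boyer_moore("ALL YOUR BASE ARE BELONG TO US", "BASE"))
# (9, [4, 4, 1])
```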

15
Worst Case
  • String: AAAAAAAAAAAAAB
  • Pattern: AAAB
  • Total of 24 comparisons

16
Improvements
  • Create a table that holds, for each unique character
    in the pattern, its distance from the right end of
    the pattern.
  • Create a second table, which will indicate the
    rightmost recurrence of each possible terminal
    portion of the pattern.

17
Calculating the First Table
  • Start at the last character of the pattern and
    move towards the first character.
  • If the current character is not in the table
    already, add it.
  • The Shift value is its distance from the
    rightmost character.
  • All other characters receive a count equal to
    the length of the pattern.

18
Table 1 Example
  • Pattern: BANANA
  • Character    Shift
    A            0
    N            1
    B            5
    Other        6
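
A minimal sketch of the first table, following the construction steps from the previous slide. The names `build_table1` and `shift_for` are hypothetical, and the pattern is written in upper case to match the table entries.

```python
def build_table1(pattern):
    """Map each pattern character to its distance from the rightmost character."""
    table = {}
    last = len(pattern) - 1
    # Start at the last character and move towards the first.
    for index in range(last, -1, -1):
        ch = pattern[index]
        if ch not in table:  # only the rightmost occurrence is recorded
            table[ch] = last - index
    return table


def shift_for(table, ch, pattern_length):
    """All other characters receive the length of the pattern."""
    return table.get(ch, pattern_length)


table1 = build_table1("BANANA")
print(table1)                     # {'A': 0, 'N': 1, 'B': 5}
print(shift_for(table1, "X", 6))  # 6
```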

19
Calculating the Second Table
  • Traverse the pattern and create a sub-string for
    each value of i less than the total length.
  • For each i, take the sub-string consisting of the
    last i characters, preceded by a mismatch at the
    character before it (meaning anything but the
    pattern's character in that position), and find its
    rightmost recurrence in the pattern.

20
Table 2 Example
  • Pattern: BANANA
  • i Sub-string Shift
  • 0 A 1
  • 1 NA 4
  • 2 ANA 6
  • 3 NANA 2
  • 4 ANANA 6
  • 5 BANANA 6
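
The shift column can be reproduced with a brute-force sketch. It reads row i as "the last i characters matched and the character before them mismatched", which is an interpretation of the table rather than something the slides state outright. For each row it tries shifts 1, 2, ... and keeps the first one that is consistent with the matched characters and puts a different character at the mismatch position. The function name is hypothetical.

```python
def build_table2(pattern):
    """Return the shift for each number of matched trailing characters."""
    m = len(pattern)
    table = []
    for matched in range(m):
        mismatch = m - 1 - matched  # pattern index where the mismatch occurred
        for shift in range(1, m + 1):
            # The matched tail must line up with equal characters after the
            # shift (positions that fall off the left end are unconstrained).
            suffix_ok = all(
                pattern[q - shift] == pattern[q]
                for q in range(mismatch + 1, m)
                if q - shift >= 0
            )
            # The character moved onto the mismatch position must differ.
            before = mismatch - shift
            differs = before < 0 or pattern[before] != pattern[mismatch]
            if suffix_ok and differs:
                table.append(shift)
                break
    return table


print(build_table2("BANANA"))  # [1, 4, 6, 2, 6, 6]
```

This version is quadratic or worse in the pattern length; it favors readability over the linear-time preprocessing used in practice.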

21
More On Table 2
  • Remember, we are moving the Pattern, not the
    substring.
  • Example
  • ABA
  • --ABABABABA
  • Match is found after 2 shifts

22
Pseudo Code
  • i <- width
  • while string remains
  •   j <- width
  •   if j = 0 then print "Match at" i + 1
  •   if string(i) = pattern(j)
  •     then j <- j - 1
  •          i <- i - 1
  •   //if
  •   i <- i + max(table1(string(i)), table2(j))
  •   //if
  • //while

23
How About That Worst Case?
  • String: AAAAAAAAAAAAAB
  • Pattern: AAAB
  • Table 1
    Character    Shift
    B            0
    A            1
    Other        4
  • Table 2
    i    Sub-string    Shift
    0    B             1
    1    AB            4
    2    AAB           4
    3    AAAB          4
  • Total of 14 comparisons
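
A sketch that combines both tables and counts comparisons is given below. It is not a transcription of the pseudo code: it tracks the pattern's alignment rather than a string pointer, and it subtracts the number of already matched characters from the Table 1 value, both of which are choices made here. All function names are hypothetical, and the string is assumed to be thirteen A's followed by one B.

```python
def build_table1(pattern):
    """Distance of each character's rightmost occurrence from the pattern's end."""
    last = len(pattern) - 1
    table = {}
    for index in range(last, -1, -1):
        table.setdefault(pattern[index], last - index)
    return table


def build_table2(pattern):
    """Shift for each count of matched trailing characters (brute force)."""
    m = len(pattern)
    table = []
    for matched in range(m):
        mismatch = m - 1 - matched
        for shift in range(1, m + 1):
            suffix_ok = all(pattern[q - shift] == pattern[q]
                            for q in range(mismatch + 1, m) if q - shift >= 0)
            before = mismatch - shift
            if suffix_ok and (before < 0 or pattern[before] != pattern[mismatch]):
                table.append(shift)
                break
    return table


def boyer_moore(string, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    n, m = len(string), len(pattern)
    table1, table2 = build_table1(pattern), build_table2(pattern)
    comparisons = 0
    pos = 0
    while pos + m <= n:
        matched = 0
        # Compare right to left, counting every character comparison.
        while matched < m:
            comparisons += 1
            if string[pos + m - 1 - matched] != pattern[m - 1 - matched]:
                break
            matched += 1
        if matched == m:
            return pos, comparisons
        bad_char = string[pos + m - 1 - matched]
        # Table 1 value, adjusted for the characters already matched.
        shift1 = table1.get(bad_char, m) - matched
        pos += max(shift1, table2[matched])
    return -1, comparisons


print(boyer_moore("A" * 13 + "B", "AAAB"))  # (10, 14)
```

On this input every failed alignment costs a single comparison (the B against an A), so ten failures plus the four comparisons of the final match give 14.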

24
Summary of Boyer Moore
  • Preprocesses the pattern
  • Time complexity is sub-linear because it
    doesn't need to check every character of the
    string to be searched.
  • Requires, on average, fewer than (i + width)
    instructions, where i is the position of the match
  • The algorithm improves in efficiency as the
    pattern becomes longer

25
Table 1 vs. Table 2
  • Table 1 is more useful with large alphabets and
    small patterns
  • Table 2 is more useful for small alphabets and
    large patterns

26
Applications to Computer Science
  • Really only useful for one thing
  • Searching text!!
  • Text based searching is very important in many
    applications and web pages

27
References
  • Dewdney, A.K. The (New) Turing Omnibus. New York:
    Henry Holt and Company, 1993.
  • Boyer, R.S., and Moore, J.S. A Fast String Searching
    Algorithm. Communications of the ACM, 20(10),
    762-772, 1977.

28
Homework Questions
  • 1) Construct one of the two types of shift tables
    for the pattern PICNIC
  • 2) List the number of comparisons required for
    the simple algorithm to find the pattern AAB in
    the string AAAAAAB