BP1110: Close Enough Indexed Record Retrieval In Progress Using Soundalikes and Near Matches - PowerPoint PPT Presentation

1
BP1110: Close Enough Indexed Record Retrieval In
Progress Using Sound-alikes and Near Matches
  • Steve Southwell (ses_at_bravepointdallas.com)
  • Senior Consultant
  • BravePoint, Inc.

2
The Problem - User Perspective
  • Users expect intuitive text searches.
  • Google and other consumer-oriented web sites have
    raised the bar.
  • Find what I'm looking for, not what I typed.
  • It's not my problem if I'm a bad speller.
  • Oh yeah... Put the most interesting results at
    the top of the list.

3
Types of Searches Where Close Counts
  • Product Searches

4
Target Smart Searching Example
User Can't Spell!

5
Amazon Smart Searching Example
User Can't Spell!

6
Types of Searches Where Close Counts
  • Product Searches
  • Searches for Proper Names

7
Yellow Pages Smart Searching
User Can't Spell!

8
Types of Searches Where Close Counts
  • Product Searches
  • Searches for Proper Names
  • Full-text Searches

9
Google Smart Searching Example
User Can't Spell!

10
AltaVista Smart Searching Example
User Can't Spell!

11
The Problem - Developer Perspective
  • Internal users need quick results. Time is
    money.
  • If customers want to buy, I'll help them find
    it.
  • If they can't spell it, we still sell it.
  • A widget by any other name... It's still for
    sale.
  • List the good stuff first.

12
Technical Issues
  • How can Progress store what a word sounds like?
  • How do I search for sound-alikes or similar
    words?
  • How can I rank search results?

13
Determining What a Word Sounds Like
  • Soundex
      • Used by the US Census Bureau to index census
        records going back to 1880
      • Intended to index surnames
      • Only codes the starting letter and 3 sounds
      • Had to be simple enough to do by hand.

  1  B, P, F, V
  2  C, S, K, G, J, Q, X, Z
  3  D, T
  4  L
  5  M, N
  6  R

14
Soundex Examples
  1  B, P, F, V
  2  C, S, K, G, J, Q, X, Z
  3  D, T
  4  L
  5  M, N
  6  R

  • Last name: Southwell
  • Soundex: S340
      • First letter: S
      • Next consonant: T = 3
      • H and W are not represented.
      • Next consonant: L = 4
      • Next L is a double: skip
      • Pad with 0

Other S340 names: Seidl, Steele, Staley, Stahl,
Stahley, Seidel, Settle, Shadle, Shotwell,
Shuttle, Sidwell, Southall, Stall, Steel, Steely,
Stell, Still, Stoll, Stowell, Stull, Sudlow,
Suttle

15
src/samples/soundex.p
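  DEFINE INPUT  PARAMETER name AS CHARACTER NO-UNDO.
  DEFINE OUTPUT PARAMETER code AS CHARACTER NO-UNDO.
  DEFINE VARIABLE e AS INTEGER   NO-UNDO. /* letter position, 1 to 26 */
  DEFINE VARIABLE i AS INTEGER   NO-UNDO. /* loop index */
  DEFINE VARIABLE k AS CHARACTER NO-UNDO. /* code of the current letter */
  DEFINE VARIABLE l AS CHARACTER NO-UNDO. /* code of the previous letter */

  ASSIGN
      l    = ""
      name = CAPS(name)
      code = SUBSTRING(name,1,1).

  DO i = 2 TO LENGTH(name):
      e = ASC(SUBSTRING(name,i,1)) - 64.
      /* The transcript cuts this condition off after "AND e";
         "<= 26" is assumed from the 26-character lookup string. */
      IF e >= 1 AND e <= 26 THEN DO:
          k = SUBSTRING("01230120022455012623010202",e,1).
          IF k <> l AND k <> "0" THEN code = code + k.
          l = k. /* assumed: not visible in the transcript, needed to skip doubles */
      END.
      IF LENGTH(code) > 3 THEN LEAVE.
  END.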
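
As a cross-check on the S340 examples, the same table can be sketched in Python. This is a simplified reading of the slides, not the Progress sample itself: the function name `soundex` and the zero-padding step are assumptions, and H and W are treated like vowels (uncoded, and they reset the "previous code").

```python
# Letter-to-digit table from the slides; letters not listed
# (vowels, H, W, Y) are treated as uncoded ("0").
CODES = {}
for digit, letters in {"1": "BPFV", "2": "CSKGJQXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Simplified Soundex: first letter plus up to three digits, zero-padded."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "0")
    for c in letters[1:]:
        digit = CODES.get(c, "0")
        # Skip uncoded letters and immediate repeats of the same digit.
        if digit != "0" and digit != prev:
            code += digit
        prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")  # pad short codes with zeros


if __name__ == "__main__":
    for n in ("Southwell", "Shotwell", "Steele", "Sudlow"):
        print(n, soundex(n))
```

Every name in the "Other S340 names" list above should come out as S340 under this sketch, which also shows why Soundex alone is a coarse filter for anything other than surnames.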

16
Soundey
  • More sound codes
  • Indexes vowel positions
  • Codes the entire word
  • Makes phonetic substitutions

  0  a e h i o u w y
  1  b p
  2  c k q x
  3  d t
  4  l
  5  m n
  6  r
  7  f v
  8  g j
  9  s z

17
Soundey Continued
  • Soundeylib.i available free at www.FreeFrameWork.org
  • More sophisticated than Soundex

  0  a e h i o u w y
  1  b p
  2  c k q x
  3  d t
  4  l
  5  m n
  6  r
  7  f v
  8  g j
  9  s z

18
Steps in Soundey Conversion
  • Pre-token
  • Mark word boundaries
  • Anywhere translations
  • Ends translations
  • Begins translations
  • Eliminate silent E
  • Unmark word boundaries
  • Translate characters to digits
  • Eliminate double digits

19
Soundey Example
  • Word: Telephone, Soundey: 3040705
  • Replace 'ph' with 'f':
    telefone
  • Eliminate silent 'e' on the end:
    telefon
  • Translate characters to digits:
    T=3, E=0, L=4, E=0, F=7, O=0, N=5
  • Result: 3040705
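
The Telephone walk-through can be reproduced with a few lines of Python. This is only a sketch of the steps visible on the slides, not the Soundeylib.i implementation: the function name `to_soundey_sketch`, the single 'ph' rule, the silent-'e' test, and the reading of "eliminate double digits" as "collapse adjacent repeated digits" are all assumptions.

```python
# Digit table from the slides: list position is the digit.
GROUPS = ["aehiouwy", "bp", "ckqx", "dt", "l", "mn", "r", "fv", "gj", "sz"]
DIGITS = {ch: str(d) for d, group in enumerate(GROUPS) for ch in group}


def to_soundey_sketch(word: str) -> str:
    """Approximate the slide's steps for a single word."""
    w = "".join(c for c in word.lower() if c in DIGITS)
    # "Anywhere" translation (only the one rule shown on the slide).
    w = w.replace("ph", "f")
    # Silent 'e' (assumed rule): trailing 'e' after a consonant, word longer than 3.
    if len(w) > 3 and w.endswith("e") and w[-2] not in "aeiou":
        w = w[:-1]
    # Translate characters to digits, collapsing adjacent repeats.
    out = []
    for d in (DIGITS[c] for c in w):
        if not out or out[-1] != d:
            out.append(d)
    return "".join(out)


if __name__ == "__main__":
    print(to_soundey_sketch("Telephone"))  # 3040705
    print(to_soundey_sketch("telefone"))   # 3040705
```

Because vowels keep their position as 0 digits and the whole word is coded, a misspelling such as "telefone" lands on the same code as the correct word, while unrelated words of similar shape are less likely to collide than under Soundex.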

20
Technical Issues
  • How can Progress store what a word sounds like?
  • How do I search for sound-alikes or similar
    words?
  • How can I rank search results?

21
Using Soundey
  • Make necessary database modifications

22
Database Mods for Soundey
  • Add extra fields: 2 per target field
      • SoundeyCode: straight Soundey translation
      • SoundFragList: allows matching on the beginning
        or end of the target word.
  • Word-indexes on the above fields
  • Add soundey.df: data definitions needed for
    Soundeylib.i
  • Load data for Soundey tables.

23
Using Soundey
  • Make necessary database modifications
  • Set up code to make search target fields
    Soundeyized

24
Populating Soundey Fields in DB
  {lib/soundeylib.i}
  ...
  FOR EACH ITEM EXCLUSIVE-LOCK:
      ASSIGN ITEM.soundeyCode =
          toSoundey(ITEM.ItemName + " " + ITEM.CatDescription).
  END.
  ...

25
Using Soundey
  • Make necessary database modifications
  • Set up code to make search target fields
    Soundeyized
  • Use Soundey in 4GL queries

26
Using Soundey in 4GL Queries
  mySearch = toSoundey(mySearch).
  FOR EACH ITEM
      WHERE ITEM.SoundeyCode CONTAINS mySearch
      NO-LOCK:
      ...
  END.
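
To make the idea behind the query concrete, here is a small in-memory analogue in Python. It is a sketch under assumptions, not how Progress implements word indexes: `encode` is a stand-in for toSoundey() (only the 'ph' rule plus the digit table), `build_index` plays the role of the word-indexed SoundeyCode field, and `search` assumes that every query word must match.

```python
from collections import defaultdict

GROUPS = ["aehiouwy", "bp", "ckqx", "dt", "l", "mn", "r", "fv", "gj", "sz"]
DIGITS = {ch: str(d) for d, group in enumerate(GROUPS) for ch in group}


def encode(word):
    """Stand-in for toSoundey(): ph->f, letters to digits, collapse repeats."""
    w = "".join(c for c in word.lower() if c in DIGITS).replace("ph", "f")
    out = []
    for d in (DIGITS[c] for c in w):
        if not out or out[-1] != d:
            out.append(d)
    return "".join(out)


def build_index(items):
    """Map each word code to the ids of the items containing such a word."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for word in text.split():
            code = encode(word)
            if code:
                index[code].add(item_id)
    return index


def search(index, query):
    """Return ids of items whose coded words include every coded query word."""
    codes = [c for c in (encode(w) for w in query.split()) if c]
    if not codes:
        return set()
    result = set(index.get(codes[0], set()))
    for code in codes[1:]:
        result &= index.get(code, set())
    return result


# Hypothetical sample data, loosely in the spirit of a sports catalog.
items = {1: "Cellular Telephone", 2: "Ski Poles", 3: "Swim Goggles"}
index = build_index(items)

if __name__ == "__main__":
    print(search(index, "telefone"))  # {1}
    print(search(index, "gogles"))    # {3}
```

The key point carried over from the slides is that both sides are converted: the stored text is coded once when the record is written, and the search string is coded at query time, so the lookup itself stays an ordinary indexed match.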

27
Soundey Use in 4GL
  • Demo of Sports2000 item search with Soundey:
    itemsearch1.w

28
General Soundey Query Tips
  • Try regular contains search first.
  • Convert search string to Soundey code, and do
    contains search on Soundey code field.
  • Try Split and Rejoin
  • Other alternatives
      • Synonym and Related word searches
      • Neural Networks with User Feedback
      • Forced Ranking
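
The "try this first, then that" order of the tips above can be sketched as a fallback chain. The helper names and the tiny catalog are hypothetical, and "split and rejoin" is interpreted here, as one possible reading only, as re-joining a query that was typed as separate words.

```python
def search_with_fallback(query, strategies):
    """Run strategies in order; return (name, hits) for the first one with hits."""
    for name, strategy in strategies:
        hits = strategy(query)
        if hits:
            return name, hits
    return None, []


# Hypothetical catalog and strategies, for illustration only.
catalog = ["baseball glove", "basketball", "ski poles"]


def exact(query):
    """Plain word match: every query word must appear in the entry."""
    words = query.lower().split()
    return [c for c in catalog if words and all(w in c.split() for w in words)]


def rejoined(query):
    """Assumed reading of 'rejoin': drop the spaces and retry as one word."""
    joined = query.lower().replace(" ", "")
    return [c for c in catalog if joined and joined in c.split()]


strategies = [("exact", exact), ("rejoined", rejoined)]

if __name__ == "__main__":
    print(search_with_fallback("ski", strategies))
    print(search_with_fallback("base ball", strategies))
```

Ordering the strategies from strictest to loosest keeps exact matches from being diluted by sound-alikes, and a Soundey-based strategy would slot in between the two shown here.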

29
Soundey Extensibility
  • Can make it replace known words or fragments
      • Anywhere
      • Beginning of words
      • Ending of words
  • GUI demonstration

30
Other Search Issues
  • Numbers and Ordinals
      • 29 Palms / Twentynine Palms
      • 5th Inning / Fifth Inning
  • Abbreviations / Slang
      • Ft. Worth, TX / Fort Worth, Texas
  • Hyphens / Compound Words
  • Word Synonyms
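
One common way to handle the pairs listed above is to normalize both the stored text and the search string before any phonetic coding. The sketch below assumes a small hand-made equivalence table (`EQUIVALENTS` is hypothetical; a real table would be larger and domain-specific) and is not taken from the presentation's source code.

```python
import re

# Hypothetical equivalence table covering only the slide's examples.
EQUIVALENTS = {
    "29": "twentynine",
    "5th": "fifth",
    "ft": "fort",
    "tx": "texas",
}


def normalize(text):
    """Lowercase, strip punctuation, and map known variants to one spelling."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(EQUIVALENTS.get(w, w) for w in words)


if __name__ == "__main__":
    print(normalize("Ft. Worth, TX"))      # fort worth texas
    print(normalize("Fort Worth, Texas"))  # fort worth texas
    print(normalize("29 Palms"))           # twentynine palms
```

Applying the same function on both sides means the variants never have to match each other directly; they only have to agree on the normalized form.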

31
Technical Issues
  • How can Progress store what a word sounds like?
  • How do I search for sound-alikes or similar
    words?
  • How can I rank search results?

32
Ranking Search Results
  • Not an exact science
  • Can use many criteria
      • Number of word matches
      • Similarity to key words
      • Preferred results: upsells, recent additions,
        etc.
  • Requires use of temp-table for results.
  • All results must be analyzed, so keep set small.
    (MAX-ROWS?)
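
A minimal scoring sketch for the criteria above, in Python. The weights (10 per matching word, 5 for a preferred item), the `limit` cap, and the function name `rank` are illustrative assumptions; the list of scored rows stands in for the temp-table.

```python
def rank(results, query, preferred=frozenset(), limit=50):
    """Score (item_id, text) rows and return them best first.

    Only the first `limit` rows are scored, mirroring the advice to keep
    the result set small because every row has to be analyzed.
    """
    query_words = set(query.lower().split())
    scored = []
    for item_id, text in results[:limit]:
        matches = len(query_words & set(text.lower().split()))
        score = matches * 10 + (5 if item_id in preferred else 0)
        scored.append((score, item_id, text))
    # Highest score first; ties broken by item id for a stable order.
    scored.sort(key=lambda row: (-row[0], row[1]))
    return scored


if __name__ == "__main__":
    rows = [(1, "red ski poles"), (2, "ski boots"), (3, "ski poles deluxe")]
    for score, item_id, text in rank(rows, "ski poles", preferred={3}):
        print(score, item_id, text)
```

With these weights the preferred item 3 scores 25, item 1 scores 20, and item 2 scores 10, so an upsell only outranks other results that match the query equally well.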

33
Search Ranking Demonstration
  • itemsearch3.w

34
Technical Issues
  • How can Progress store what a word sounds like?
  • How do I search for sound-alikes or similar
    words?
  • How can I rank search results?

35
Source Code Availability
  • All source code used in this presentation can be
    found at the FreeFrameWork website:
    http://www.freeframework.org
  • Up-to-date copy of this presentation available
    with the source code at the FreeFrameWork site.
