Search - PowerPoint PPT Presentation

About This Presentation
Title:

Search

Description:

Foundations of Computing ... Search – PowerPoint PPT presentation

Slides: 33
Provided by: DanR79
Category:
Tags: search

Transcript and Presenter's Notes


1
Search
2
Search issues
  • How do we say what we want?
  • I want a story about pigs
  • I want a picture of a rooster
  • How many televisions were sold in Vietnam during
    2000?
  • Find a movie like this one
  • How does the computer find what we said?

3
Things to search for
  • Records
  • Text
  • Images
  • Audio
  • Video

4
Records
  • Car
  • Price 5,000
  • Miles 20,000
  • Year 1994
  • Make Toyota
  • Doors 2
  • Queries
  • Price < 6000 and Miles < 100000
  • Make = Toyota and Year > 1993
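
One way to read the queries on these slides is as true/false tests applied to every record. The sketch below assumes each record is a Python dictionary; the field names, the `select` helper, and the second and third cars are made up for illustration.

```python
# Hypothetical records; the first one mirrors the car on the slide.
cars = [
    {"price": 5000, "miles": 20000, "year": 1994, "make": "Toyota", "doors": 2},
    {"price": 2500, "miles": 120000, "year": 1989, "make": "Ford", "doors": 4},
    {"price": 9000, "miles": 45000, "year": 1996, "make": "Toyota", "doors": 4},
]


def select(records, predicate):
    """Return the records for which the query predicate is true."""
    return [r for r in records if predicate(r)]


# Price below 6000 and Miles below 100000
cheap_low_miles = select(cars, lambda r: r["price"] < 6000 and r["miles"] < 100000)

# Make is Toyota and Year after 1993
newer_toyotas = select(cars, lambda r: r["make"] == "Toyota" and r["year"] > 1993)

# Year after 1993 or Price below 3,000
newer_or_cheap = select(cars, lambda r: r["year"] > 1993 or r["price"] < 3000)
```

An "and" query can only shrink the result, while an "or" query can only grow it.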

5
Queries
  • Make = Toyota and Year > 1993

6
Queries
  • Make = Toyota and Year > 1993

7
Queries
  • Year > 1993 or Price < 3,000

8
Queries
  • Year > 1993 or Price < 3,000

9
Databases
  • Large collections of records
  • Accessed by queries
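
As a rough illustration of records accessed by queries, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name, column names, and sample rows are assumptions, not part of the slides.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (make TEXT, year INTEGER, price INTEGER, miles INTEGER, doors INTEGER)"
)
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?, ?)",
    [
        ("Toyota", 1994, 5000, 20000, 2),   # the car from the Records slide
        ("Ford", 1989, 2500, 120000, 4),    # hypothetical
        ("Toyota", 1996, 9000, 45000, 4),   # hypothetical
    ],
)

# The query "Make is Toyota and Year after 1993", written in SQL.
rows = conn.execute(
    "SELECT make, year FROM cars WHERE make = ? AND year > ? ORDER BY year",
    ("Toyota", 1993),
).fetchall()
conn.close()
```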

10
Things to search for
  • Records
  • Text
  • Images
  • Audio
  • Video

11
Text searching
  • How do I say what I want?
  • Type some phrase
  • I want a story about pigs
  • How will the computer match this?
  • What is text?
  • An array of characters
  • What can a computer do with text?
  • Match characters

12
Text searching
  • People think in words not characters
  • How do I convert an array of characters into an
    array of words?
  • Collect together sequences of letters
  • How do I know if character C is a letter?
  • (C >= a and C <= z) or (C >= A and C <= Z)
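
A minimal sketch of this conversion, assuming only the unaccented letters a to z and A to Z count as letters. The function names `is_letter` and `to_words` are illustrative.

```python
def is_letter(c):
    """True if character c falls in the range a..z or A..Z."""
    return ("a" <= c <= "z") or ("A" <= c <= "Z")


def to_words(text):
    """Collect runs of consecutive letters into a list of words."""
    words = []
    current = []
    for c in text:
        if is_letter(c):
            current.append(c)
        elif current:
            # A non-letter ends the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


words = to_words("I want a story about pigs.")
```

Digits, punctuation, and spaces all act as separators here, so a string such as "rooster-crossing" comes out as two words.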

13
Convert to words
  • Because people think in words

14
Every document is an array of words
  • I want a story about pigs
  • How will I find the right documents?
  • Find all documents that have the word pigs

15
Searching text
  • How will I find pigs fast?
  • Hint: the URL Lookup assignment
  • Create an index of all words
  • With each word store the name or address of each
    document that contains that word
  • Search the index for pigs
  • Return the list of documents
  • Use a binary search on the word list (50,000
    words)
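
The index described above can be sketched as a sorted word list plus, for each word, the list of documents that contain it. The document names and contents below are invented for illustration; the lookup uses `bisect_left` from the standard library as the binary search.

```python
from bisect import bisect_left

# Hypothetical documents, already converted to arrays of words.
documents = {
    "farm.txt": ["three", "little", "pigs"],
    "barn.txt": ["a", "rooster", "and", "two", "pigs"],
    "city.txt": ["a", "story", "about", "cars"],
}


def build_index(docs):
    """Return a sorted word list and a parallel list of document-name lists."""
    postings = {}
    for name, words in docs.items():
        for word in words:
            postings.setdefault(word, set()).add(name)
    word_list = sorted(postings)
    return word_list, [sorted(postings[w]) for w in word_list]


def lookup(word_list, doc_lists, word):
    """Binary-search the sorted word list; return the documents for word."""
    i = bisect_left(word_list, word)
    if i < len(word_list) and word_list[i] == word:
        return doc_lists[i]
    return []


word_list, doc_lists = build_index(documents)
pig_docs = lookup(word_list, doc_lists, "pigs")
```

With 50,000 words, a binary search needs at most about 16 comparisons, because each comparison halves the remaining range.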

16
Problems
  • What if a document has the word Pig but not
    pigs?
  • Normalize
  • Case - make all words lower case
  • Pig -> pig
  • Stemming - remove all suffixes and prefixes
    before putting a word into the index
  • pigs -> pig
  • piggy -> pig
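
A deliberately crude sketch of both normalization steps. The two suffix rules and the doubled-letter rule are assumptions chosen only to cover the examples on this slide; real stemmers use many more rules and still make mistakes.

```python
SUFFIXES = ("s", "y")  # assumption: just enough rules for the slide's examples


def normalize(word):
    """Lower-case a word, strip one simple suffix, and undouble a final letter."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            break
    # "pigg" -> "pig": drop a repeated final letter left behind by the suffix.
    if len(word) > 3 and word[-1] == word[-2]:
        word = word[:-1]
    return word


normalized = [normalize(w) for w in ["Pig", "pigs", "piggy"]]
```

The same function has to be applied both when building the index and when reading the query, otherwise the two sides will not match.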

17
Problems
  • I want a story about pigs?
  • How does the computer know to search for pigs?
  • It doesn't
  • How does the computer know what a story is?
  • It doesn't

18
Searching
  • I want a story about pigs
  • Pick out the important words and search for them
  • Which words are important?
  • D = number of times a word appears in a document
  • A = average number of times a word appears in all
    documents
  • Importance = D/A
  • Why?
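
A small worked example of the importance ratio, assuming A is the word's total count across the collection divided by the number of documents. The three documents are invented for illustration.

```python
# Hypothetical collection: document name -> array of words.
documents = {
    "pigs.txt": "the pigs built a house and the wolf chased the pigs".split(),
    "cars.txt": "the car is in the garage and the car is red".split(),
    "news.txt": "the mayor opened the bridge and the road".split(),
}


def importance(word, doc_name, docs):
    """D / A: count in one document over the average count per document."""
    d = docs[doc_name].count(word)
    a = sum(words.count(word) for words in docs.values()) / len(docs)
    return d / a if a else 0.0


score_the = importance("the", "pigs.txt", documents)
score_pigs = importance("pigs", "pigs.txt", documents)
```

Here "the" appears three times in every document, so its score in pigs.txt is 1. "pigs" appears twice in pigs.txt and nowhere else, so its score is about 3. One possible answer to "Why?" is that words common everywhere score near 1, while words concentrated in one document score higher.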

19
How do we create an index of all documents on the
Web?
  • Try = a list of URLs
  • Seen = all URLs from Try
  • While (Try is not empty)
  • Page = take a URL from Try
  • Words = all the important words in Page
  • add Page to the index using all of Words
  • Links = all URLs in Page
  • for every Link that is not in Seen, add Link to
    Try and to Seen
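
The loop above can be sketched in Python as follows. To keep the example self-contained it crawls a small made-up dictionary (`FAKE_WEB`) instead of the real Web, and the `fetch` argument is a hypothetical stand-in for downloading a page and extracting its words and links.

```python
from collections import deque

# Hypothetical stand-in for the Web: URL -> (words on the page, links on the page).
FAKE_WEB = {
    "a.example/home": (["pigs", "farm"], ["a.example/barn", "b.example/"]),
    "a.example/barn": (["rooster", "pigs"], ["a.example/home"]),
    "b.example/": (["cars"], ["a.example/barn", "c.example/missing"]),
}


def crawl(start_urls, fetch):
    """Follow links from start_urls and build a word -> set-of-URLs index."""
    try_list = deque(start_urls)   # "Try" on the slide
    seen = set(start_urls)         # "Seen" on the slide
    index = {}
    while try_list:
        page = try_list.popleft()
        result = fetch(page)
        if result is None:         # page could not be fetched; skip it
            continue
        words, links = result
        for word in words:
            index.setdefault(word, set()).add(page)
        for link in links:
            if link not in seen:
                seen.add(link)
                try_list.append(link)
    return index


index = crawl(["a.example/home"], FAKE_WEB.get)
```

The Seen set is what stops the crawler from looping forever when pages link back to each other.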

20
Other ways to find important words and important
documents
  • A Document is important if many other documents
    point to it
  • A word is important in document D if that word
    occurs frequently in documents that link to
    document D.
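
A rough sketch of both ideas, assuming each page records its own words and the pages it links to. The page names and contents are invented, and counting incoming links is only the simplest possible measure of "many other documents point to it".

```python
from collections import Counter

# Hypothetical pages: name -> its words and the pages it links to.
pages = {
    "home": {"words": ["farm", "animals"], "links": ["pigs", "rooster"]},
    "blog": {"words": ["pigs", "pigs", "mud"], "links": ["pigs"]},
    "pigs": {"words": ["oink"], "links": []},
    "rooster": {"words": ["crow"], "links": ["pigs"]},
}


def inlink_counts(pages):
    """Count how many different pages point to each page."""
    counts = Counter({name: 0 for name in pages})
    for page in pages.values():
        for target in set(page["links"]):
            counts[target] += 1
    return counts


def words_from_linkers(pages, target):
    """Count the words used on pages that link to target."""
    counts = Counter()
    for page in pages.values():
        if target in page["links"]:
            counts.update(page["words"])
    return counts


popularity = inlink_counts(pages)
linker_words = words_from_linkers(pages, "pigs")
```

In this toy data the "pigs" page never uses the word "pigs" itself, yet the pages linking to it do, which is what the second bullet relies on.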

21
Images
  • What will I say when searching for an image?
  • I want a rooster picture
  • Draw a picture of a rooster?

22
Search by picture?
23
What's in a picture?
  • Computers don't understand the contents of images
  • To a computer an image is an array of colors

24
I want a picture of a rooster
  • Label all of the pictures
  • How does Google do it?
  • File name of the picture: rooster-crossingSt.jpg
  • Words around the picture in the HTML

25
Audio
  • Talking
  • Use speech recognition to convert audio to text
  • With each recognized word keep track of where in
    the audio it was recognized.
  • Build an index using the recognized text
  • Normalize based on how words sound rather than
    how they are spelled.
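
The index with positions can be sketched as below. The `recognized` list is a hypothetical stand-in for the output of a speech recognizer, given as (word, seconds from the start) pairs; sound-based normalization is not attempted here.

```python
# Hypothetical recognizer output: (word, seconds from the start of the audio).
recognized = [
    ("three", 1.0),
    ("little", 1.4),
    ("Pigs", 1.9),
    ("the", 30.2),
    ("pigs", 30.5),
]


def build_audio_index(recognized_words):
    """Map each lower-cased word to the times at which it was recognized."""
    index = {}
    for word, seconds in recognized_words:
        index.setdefault(word.lower(), []).append(seconds)
    return index


audio_index = build_audio_index(recognized)
pig_times = audio_index.get("pigs", [])
```

Keeping the times lets a search jump straight to the moment a word was spoken rather than just naming the recording.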

26
Video
  • Where in Casablanca does Bogart say "Play it
    again, Sam"?
  • He never does; he just says "Play it"
  • How can the computer find that?
  • Transcribe the audio
  • Speech recognition on the audio

27
Video
  • Does Woody ever kiss Bo Peep?
  • Exactly what color is a kiss?

28
Video
  • Does Woody ever kiss Bo Peep?
  • Annotate every frame with who is in the frame and
    search for frames with both Woody and Bo Peep.
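
Assuming the frames have already been annotated (by hand or otherwise), the search itself is a simple set test. The frame numbers and annotations below are invented for illustration.

```python
# Hypothetical annotations: frame number -> set of characters in the frame.
frames = {
    0: {"Woody"},
    1: {"Woody", "Buzz"},
    2: {"Woody", "Bo Peep"},
    3: {"Bo Peep"},
    4: {"Bo Peep", "Woody"},
}


def frames_with(annotations, *characters):
    """Return the frame numbers whose annotations include every character."""
    wanted = set(characters)
    return sorted(n for n, who in annotations.items() if wanted <= who)


together = frames_with(frames, "Woody", "Bo Peep")
```

This finds frames where both characters appear, which narrows the search but does not by itself say whether a kiss happens.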

29
So what's with this?
30
Or this?
31
Is Woody cheating?
32
Search
  • Records
  • Queries
  • <, >, And, Or
  • Text
  • Normalized words (case, stemming, thesaurus)
  • Images
  • Add words
  • Audio
  • Transcribe or recognize as words
  • Video
  • Transcribe
  • Annotate