Metasearch Engines - PowerPoint PPT Presentation

Provided by: hoz
Transcript and Presenter's Notes

1
Metasearch Engines
2
Course Outline (recap)
  • Introduction and the MPEG standards
  • The research issues in MPEG-7
  • Introduction to speech processing for multimedia
  • Introduction to statistical pattern recognition
  • Media indexing and retrieval
  • Past, present and future
  • Content-based retrieval (CBR)
  • Introduction to concept-based retrieval
  • Metasearch engines
  • Human-computer interface
  • Human body movement analysis
  • Human emotion recognition
  • Media transmission over peer-to-peer networks
  • Dynamic resource allocation in media
    transmission

3
This class
  • What are metasearch engines
  • Common features
  • Some metasearch engines
  • Research issues

4
Searching more than one database
  • Users find more good documents, but must:
  • Learn how to use each search engine
  • Combine the results themselves

5
Metasearch Engines
  • Metasearch engines search many databases in
    parallel
  • Combine results

6
Metasearch engine
Read query
Choose databases
For each chosen database, translate and send query
7
Metasearch engine
Accept search results
Select a subset from each
Merge and display results
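The two slides above describe the full metasearch loop: read a query, translate and send it to each chosen engine, take a subset of each result list, then merge for display. A minimal sketch in Python (the engines, their fields, and the translation functions are hypothetical placeholders, not real APIs):

```python
# Minimal metasearch pipeline sketch. Engine names, result fields, and
# query-translation rules are invented for illustration only.

def metasearch(query, engines, per_engine=10):
    """Fan a query out to each engine, then merge the results."""
    all_results = []
    for engine in engines:
        native_query = engine["translate"](query)   # per-engine query syntax
        hits = engine["search"](native_query)       # accept search results
        all_results.extend(hits[:per_engine])       # select a subset from each
    # merge: sort by score, highest first, for display
    return sorted(all_results, key=lambda hit: hit["score"], reverse=True)

# Two toy engines returning canned results
engine_a = {
    "translate": lambda q: q.lower(),
    "search": lambda q: [{"url": "http://a.example/1", "score": 0.9},
                         {"url": "http://a.example/2", "score": 0.4}],
}
engine_b = {
    "translate": lambda q: "+".join(q.split()),
    "search": lambda q: [{"url": "http://b.example/1", "score": 0.7}],
}

merged = metasearch("metasearch engines", [engine_a, engine_b])
# merged is ranked 0.9, 0.7, 0.4 across both engines
```

The interesting design decisions, covered in the following slides, are hidden in the two simplest-looking lines: which engines to query and how to combine their incomparable scores.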
8
Advantages
  • A uniform query language
  • Choose best databases for query
  • Save users' time
  • Provide better retrieval results

9
Common features
  • Search most of the popular search engines.
  • Fast, because they use "parallel" (i.e.,
    simultaneous) querying and have high-speed
    processors
  • Allow you to set length of wait time

10
Differences
  • How results are compiled when reported
  • How and whether they can handle complex searches
  • Whether you can customize the search strategy

11
How results are compiled when reported
  • Some report the results from each search engine
    in sequence
  • Others sort the results, eliminating duplicates.

  • In some you can specify how results are sorted
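The compilation strategies above differ mainly in whether duplicates are eliminated and how the combined list is sorted. A small sketch of the "sort and eliminate duplicates" style (the field names are assumed):

```python
# Merge several engines' result lists into one duplicate-free list.
# "url" and "score" are assumed field names for illustration.

def merge_and_dedupe(result_lists):
    seen = {}
    for results in result_lists:
        for hit in results:
            url = hit["url"]
            # keep the highest-scoring copy of each duplicate
            if url not in seen or hit["score"] > seen[url]["score"]:
                seen[url] = hit
    return sorted(seen.values(), key=lambda h: h["score"], reverse=True)

lists = [
    [{"url": "u1", "score": 0.8}, {"url": "u2", "score": 0.5}],
    [{"url": "u1", "score": 0.6}, {"url": "u3", "score": 0.7}],
]
merged = merge_and_dedupe(lists)
# u1 appears once (score 0.8), followed by u3 (0.7) and u2 (0.5)
```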

12
How and whether they can handle complex searches
  • Some allow phrase searching,
  • Some allow Boolean operators (especially OR and
    NOT)
  • Some strip out quotations or Boolean operators,
    or create garbage by passing them through as
    search terms.
  • Few allow you to request truncation.

13
Whether you can customize the search strategy
  • In some you have more flexibility to vary time
    limits and choose how results are reported.
  • Some let you specify which search tool databases
    are queried and in what order.

14
Metacrawler
  • No choice of which search engines are used
  • Fast searches
  • Sends query to AltaVista, Excite, Infoseek,
    LookSmart, Lycos, The Mining Co., WebCrawler,
    Yahoo!
  • Identifies and removes duplicates
  • Consolidates results in one large list, ranked by
    a "vote"

15
Metacrawler
  • Merges results by first normalizing all the
    scores to values 0 to 1000
  • Then adding the scores of multiply retrieved
    documents
  • Query ALL terms (AND), ANY terms (OR), or an
    exact PHRASE; use quotation marks (" ") around
    phrases
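Metacrawler's merge as described above can be sketched directly: normalize each engine's scores into the 0-1000 range, then sum the scores of documents retrieved more than once. (Scaling each engine's top score to 1000 is one plausible normalization; the slide does not specify the exact formula.)

```python
# Metacrawler-style merge sketch: per-engine normalization to 0..1000,
# then summation of scores for multiply retrieved documents.

def metacrawler_merge(result_lists):
    combined = {}
    for results in result_lists:
        top = max(hit["score"] for hit in results)
        for hit in results:
            normalized = 1000 * hit["score"] / top   # 0..1000 within this engine
            combined[hit["url"]] = combined.get(hit["url"], 0) + normalized
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

lists = [
    [{"url": "u1", "score": 0.5}, {"url": "u2", "score": 0.25}],  # scores in 0..1
    [{"url": "u2", "score": 8.0}, {"url": "u3", "score": 4.0}],   # a different scale
]
ranked = metacrawler_merge(lists)
# u2: 500 + 1000 = 1500; u1: 1000; u3: 500
```

Note how the "vote" works: u2 was only mid-ranked in each engine, yet wins overall because two engines agree on it.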

16
Inference Find!
  • Queries 6 search engines; currently uses
    WebCrawler, Yahoo!, Lycos, AltaVista, Infoseek,
    and Excite
  • Results are merged and clustered; redundancies
    are removed
  • Default is AND (can use OR and NOT; ignored by
    tools that don't support them)
  • Allows phrases

17
Internet Sleuth (www.isleuth.com)
  • Users may search for an appropriate database
    (3,000 available)
  • Will search for an appropriate database for you
  • A search for databases with pictures (or recipes)
    finds a variety of databases
  • Then users choose which ones to search
  • Does not merge results

18
Dogpile
  • AltaVista, Excite, Excite Subj. Guide, GoTo.com,
    Infoseek, Lycos, Lycos' a2z, Magellan, The
    Mining Co., PlanetSearch, Thunderstone,
    WebCrawler, What-U-Seek, Yahoo!

19
Dogpile
  • Lists hits after each search tool is queried;
    duplicates may occur
  • If 10 or more hits are found among the first 3
    tools tried, offers the option to search more
  • Click on a link to go to a search engine

20
Cyber 411
  • Fast. Contacts 15 search engines for each query.

  • Query is one word or a phrase
  • Does not merge results

21
SavvySearch (www.savvysearch.com)
  • (Colorado State, Howe)
  • Search engines selected based on: query text,
  • Sources and types of information selected,
  • Estimated Internet traffic,
  • Anticipated response time,
  • The load on the CSU computer

22
Research issues
  • How to choose best DBs
  • How to merge results

23
Choosing the best databases automatically
  • Depends on available information
  • Different researchers and systems make different
    assumptions
  • Choose DB X if it can provide good documents and
    if the user's query can be executed on it

24
Stored queries/relevancy (Voorhees)
  • Queries with relevant results are stored
  • New query compared to stored queries
  • Use previous results to select databases and
  • Number of documents to merge from each
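The stored-query idea above can be illustrated with a toy sketch: compare the new query to past queries (here by simple word overlap, a deliberate simplification) and reuse the most similar past query's known-good databases. All names and data are hypothetical.

```python
# Stored-query database selection, sketched with word-overlap similarity.
# The stored entries and their fields are invented for illustration.

def most_similar_stored(new_query, stored):
    new_terms = set(new_query.split())
    def overlap(entry):
        return len(new_terms & set(entry["query"].split()))
    return max(stored, key=overlap)

stored = [
    {"query": "jazz music history", "good_dbs": ["music_db"]},
    {"query": "protein folding simulation", "good_dbs": ["bio_db"]},
]
best = most_similar_stored("history of jazz", stored)
# picks the jazz entry, so its previously good databases are queried
```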

25
DB summary index (Callan)
  • Collection information is available:
    commonly used keywords and their document
    frequencies (dfs)
  • Query is compared to the database summaries
  • Similarity is used to select databases and the
    number of documents taken from each
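A toy version of this summary-index comparison: score each database by how often the query's terms occur in its collection, then rank. (Callan's actual CORI method uses a more elaborate inference-network formula; this sketch, with invented data, only shows the idea of matching a query against per-database summaries.)

```python
# Database selection from summary indexes: each database is summarized
# by its keywords and their document frequencies (dfs).

def rank_databases(query_terms, db_summaries):
    scores = {}
    for name, dfs in db_summaries.items():
        # crude similarity: total df of the query's terms in this collection
        scores[name] = sum(dfs.get(term, 0) for term in query_terms)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

summaries = {  # hypothetical collections
    "news":   {"election": 900, "football": 50},
    "sports": {"election": 10, "football": 1200},
}
ranked = rank_databases(["football"], summaries)
# "sports" ranks first for this query
```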

26
GlOSS
  • Assumes knowledge of each database's term dfs
  • Computes the probability of finding a document
    containing all of the query terms in a database
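Under the independence assumption GlOSS makes, the chance that a document in a database of N documents contains every query term is the product of df_i / N over the terms, so the expected number of matches is N times that product. A sketch with invented numbers:

```python
# GlOSS-style estimate: expected number of documents in a database
# containing ALL query terms, assuming terms occur independently.

def gloss_score(query_terms, dfs, num_docs):
    p = 1.0
    for term in query_terms:
        p *= dfs.get(term, 0) / num_docs   # probability a doc has this term
    return num_docs * p                    # expected number of matching docs

# hypothetical database of 1000 docs: "image" in 400, "retrieval" in 100
score = gloss_score(["image", "retrieval"], {"image": 400, "retrieval": 100}, 1000)
# 1000 * (400/1000) * (100/1000) = 40 expected matches
```

Databases can then be ranked by this estimate and the top ones selected for the query.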

27
Merging retrieval results
  • Similarity values may not be available
  • Similarity values may not be comparable across
    engines
  • Should similarity be modified when documents are
    retrieved by more than one search engine?

28
Same search engine different databases
  • Same ranking functions
  • However, the same document can receive different
    similarity scores because of different database
    characteristics (idfs)

29
Experiments by Fox
  • Used the maximum score (good for relevant
    documents)
  • Used the minimum score (good if non-relevant)
  • Also the sum of scores and the average
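The combination functions above are simple to state; in the later fusion literature they are known as CombMAX, CombMIN, and CombSUM. A sketch applied to one document's scores across the engines that retrieved it (the scores are made up):

```python
# Score-combination functions from the slide, applied to the list of
# scores one document received from the engines that retrieved it.

def comb_max(scores): return max(scores)               # optimistic
def comb_min(scores): return min(scores)               # pessimistic
def comb_sum(scores): return sum(scores)               # rewards agreement
def comb_avg(scores): return sum(scores) / len(scores)

scores = [3, 1, 2]   # one document, scored by three engines
top = comb_max(scores)      # 3: good when the document is relevant
low = comb_min(scores)      # 1: good when it is not
total = comb_sum(scores)    # 6: boosts multiply retrieved documents
```

CombSUM implicitly rewards documents found by several engines, which is the same intuition behind Metacrawler's "vote".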

30
Difficulty of choosing and merging
  • Search engines are constantly updated
  • Interface changes
  • Search changes
  • Rank changes
  • Display of results changes