Metasearch - PowerPoint PPT Presentation

Provided by: osirisSun

Title: Metasearch


1
Metasearch
  • Search engines that do not use crawlers

2
Metasearch Engine
  • A search engine which operates as the front end
    to other search engines
  • Does not store its own indexes
  • Does not need a crawler
  • Processes queries and passes them to other search
    engines

3
Metasearch
  • How does it work?

4
Metasearch Engine Operation
  • Accept the query on a query entry page
  • Send the query to each (full) search engine
  • Take the results from each search engine
  • Make them into a single result
  • Show them to the user, including links
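The five steps above can be sketched in Python. This is a minimal sketch, not a real engine's interface. It assumes each underlying search engine is modelled as a plain function that maps a query string to a ranked list of URLs. The names `metasearch`, `concatenate_unique`, `engine_a` and `engine_b` are hypothetical. A real metasearch engine would call each engine over HTTP and parse its result page.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: an "engine" is any function from a query to a ranked list of URLs.
Engine = Callable[[str], List[str]]

def metasearch(query: str, engines: Dict[str, Engine],
               merge: Callable[[List[List[str]]], List[str]]) -> List[str]:
    """Send the query to every engine, collect the result lists, and merge them."""
    # Send the query to each engine and take the results from each one.
    result_lists = [engine(query) for engine in engines.values()]
    # Make them into a single result; the merge strategy is pluggable.
    return merge(result_lists)

def concatenate_unique(result_lists: List[List[str]]) -> List[str]:
    """Simplest possible merge: concatenate the lists, dropping repeated URLs."""
    seen, merged = set(), []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Toy engines standing in for real search engines (hypothetical data).
engines = {
    "engine_a": lambda q: [f"http://a.example/{q}/1", "http://shared.example/page"],
    "engine_b": lambda q: ["http://shared.example/page", f"http://b.example/{q}/1"],
}

# Show the merged list to the user.
for url in metasearch("metasearch", engines, concatenate_unique):
    print(url)
```

Because the merge step is passed in as a function, any of the fusion strategies discussed later can be plugged into the same pipeline.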

5
Example MetaSearch Engines
  • www.dogpile.com
  • http://vivisimo.com/ (interesting: uses clustering)
  • http://www.metacrawler.com/

6
Revision
  • Search Engine Architectures

7
Possible Architectures
  • Centralised
  • Shadows
  • Mirrors
  • Distribution by Content

8
Centralised
  • One server
  • Handles all queries
  • Accepts all updates from all spiders
  • Disadvantages: slow (every query goes through one
    machine) and fragile (a single point of failure)
  • Advantage: simple

9
Shadow
  • One Primary Server
  • All updates on Primary Server
  • Shadows periodically receive new indices
  • Queries routed to nearest available shadow
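The shadow arrangement can be sketched as follows. This is a minimal sketch under several assumptions. Each shadow has a hypothetical name, a distance to the client (for example, measured latency in milliseconds) and an availability flag. An index is a simple term-to-URLs dictionary. The primary alone applies updates and periodically pushes a full copy of its index to the shadows. `Shadow`, `Primary` and `route_query` are illustrative names, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class Shadow:
    name: str
    distance_ms: float          # assumed proximity metric, e.g. measured latency
    available: bool = True
    index: Dict[str, Set[str]] = field(default_factory=dict)  # term -> set of URLs

@dataclass
class Primary:
    index: Dict[str, Set[str]] = field(default_factory=dict)
    shadows: List[Shadow] = field(default_factory=list)

    def update(self, url: str, terms: List[str]) -> None:
        """All updates are applied on the primary server only."""
        for term in terms:
            self.index.setdefault(term, set()).add(url)

    def push_to_shadows(self) -> None:
        """Run periodically: every shadow receives a fresh copy of the index."""
        for shadow in self.shadows:
            shadow.index = {term: set(urls) for term, urls in self.index.items()}

def route_query(shadows: List[Shadow]) -> Optional[Shadow]:
    """Route a query to the nearest shadow that is currently available."""
    candidates = [s for s in shadows if s.available]
    return min(candidates, key=lambda s: s.distance_ms, default=None)

primary = Primary(shadows=[Shadow("paris", 5.0, available=False),
                           Shadow("london", 12.0), Shadow("berlin", 20.0)])
primary.update("http://example.org/a", ["metasearch"])
primary.push_to_shadows()
target = route_query(primary.shadows)   # paris is down, so london is chosen
```

Between pushes, the shadows answer from a slightly stale index. This is the price of keeping all updates on one server.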

10
Mirrors
  • Each server accepts both queries and updates
  • Batches of updates are periodically transmitted
    to the other mirror sites
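The mirror arrangement can be sketched as follows. This is a minimal sketch assuming a term-to-URLs dictionary as the index. The class `Mirror` and its methods are hypothetical. Each mirror applies an update locally at once, queues it, and later ships the whole queue to its peers as one batch. Peers apply received updates without re-queueing them, so batches are not echoed back.

```python
from typing import Dict, List, Set, Tuple

class Mirror:
    """A hypothetical mirror: it answers queries and accepts updates locally,
    then periodically ships the batched updates to the other mirrors."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index: Dict[str, Set[str]] = {}       # term -> URLs
        self.pending: List[Tuple[str, str]] = []   # (term, url) updates not yet sent
        self.peers: List["Mirror"] = []

    def _apply(self, term: str, url: str) -> None:
        self.index.setdefault(term, set()).add(url)

    def update(self, term: str, url: str) -> None:
        """Accept an update locally and queue it for the other mirrors."""
        self._apply(term, url)
        self.pending.append((term, url))

    def query(self, term: str) -> Set[str]:
        """Answer from the local index only."""
        return self.index.get(term, set())

    def transmit_batch(self) -> None:
        """Send all queued updates to every peer as one batch, then clear the queue."""
        batch, self.pending = self.pending, []
        for peer in self.peers:
            for term, url in batch:
                peer._apply(term, url)   # peers apply but do not re-queue, so no echo
```

Until `transmit_batch` runs, different mirrors can give different answers to the same query. This lag is the trade-off for letting every site accept updates.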

11
Distribution by Content
  • Each server specialises in documents about a
    topic
  • Spiders update indexes depending on content
  • Queries are routed to the servers specialising in
    their (likely) topic
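Distribution by content can be sketched as follows. This is a minimal sketch in which a document's or query's "likely" topic is guessed by keyword overlap with hand-written topic vocabularies. The vocabularies in `TOPICS` are hypothetical, and a real system would more plausibly use a trained classifier. When no topic matches, the query is sent to every server as a fallback.

```python
from typing import Dict, List, Set

# Hypothetical topic vocabularies; a real system would use a trained classifier.
TOPICS: Dict[str, Set[str]] = {
    "sport": {"football", "tennis", "match", "league"},
    "computing": {"search", "engine", "index", "crawler", "query"},
}

def likely_topics(words: List[str]) -> List[str]:
    """Return the topics whose vocabulary overlaps the words most (ties all returned)."""
    scores = {topic: len(vocab & set(words)) for topic, vocab in TOPICS.items()}
    best = max(scores.values())
    if best == 0:
        return list(TOPICS)          # no clue: fall back to asking every server
    return [topic for topic, score in scores.items() if score == best]

class TopicServer:
    def __init__(self, topic: str) -> None:
        self.topic = topic
        self.docs: Dict[str, List[str]] = {}   # url -> words

servers = {topic: TopicServer(topic) for topic in TOPICS}

def index_document(url: str, words: List[str]) -> None:
    """A spider sends each document to the server(s) for its likely topic."""
    for topic in likely_topics(words):
        servers[topic].docs[url] = words

def route_query_to_topics(query: str) -> List[str]:
    """Return the topics (server keys) that should receive this query."""
    return likely_topics(query.lower().split())
```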

12
Metasearch
  • Distorting Mirrors Variant

13
Metasearch Engine
  • Metasearch Engine accepts queries
  • Processes them and passes them on to
    conventional search engines
  • Collates results

14
Problem
  • How to combine the results from several engines

15
Results Fusion
  • Users want to see a single ranked result list
  • They don't want to see separate result lists from
    each engine
  • They don't want duplicates
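Removing duplicates is harder than it looks, because different engines may spell the same URL differently. The following is a minimal sketch of duplicate removal during fusion. It assumes results arrive as `(engine, url)` pairs already in display order. It also assumes that lower-casing the scheme and host, dropping the fragment and ignoring a trailing slash are enough to identify the same page. That rule set is a simplification, and real engines may need more rules. `normalise_url` and `fuse_without_duplicates` are hypothetical names.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit, urlunsplit

def normalise_url(url: str) -> str:
    """Canonical form used only for duplicate detection (a simplifying assumption)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Scheme and host are case-insensitive; the fragment does not change the page fetched.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

def fuse_without_duplicates(merged: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Given (engine, url) pairs in display order, keep the first occurrence of
    each page and record every engine that returned it."""
    position: Dict[str, int] = {}
    fused: List[Tuple[str, List[str]]] = []
    for engine, url in merged:
        key = normalise_url(url)
        if key in position:
            fused[position[key]][1].append(engine)   # duplicate: just note the engine
        else:
            position[key] = len(fused)
            fused.append((url, [engine]))
    return fused
```

Keeping the list of engines that returned each page is optional. It costs little, and a page returned by several engines is a plausible signal of relevance.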

16
Round Robin
  • Take the first document from collection 1
  • Then the first document from collection 2
  • And so on for each collection
  • Then the second document from collection 1
  • And so on
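Round-robin merging can be written in a few lines of Python. This is a minimal sketch in which each collection's results are a ranked Python list. A private sentinel object marks the gaps left by shorter lists, so a collection simply drops out once it is exhausted. Duplicates are not removed here. The function name `round_robin` is illustrative.

```python
from itertools import zip_longest
from typing import List, TypeVar

T = TypeVar("T")

_MISSING = object()   # sentinel, so that None can still be a legitimate result

def round_robin(result_lists: List[List[T]]) -> List[T]:
    """Interleave ranked lists: 1st from each collection, then 2nd from each, and so on.
    Shorter lists simply drop out once they are exhausted."""
    merged: List[T] = []
    for rank_slice in zip_longest(*result_lists, fillvalue=_MISSING):
        merged.extend(doc for doc in rank_slice if doc is not _MISSING)
    return merged
```

For example, `round_robin([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]])` gives `["a1", "b1", "c1", "a2", "c2", "a3"]`. Round robin treats every engine as equally good, which is its main weakness.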

17
Engine at a Time
  • Use a measure such as tf-idf, computed over each
    collection (or a sample of it), to rank the searched
    collections by relevance
  • Display all results from the best-ranked engine
    first, then those from the next engine, and so on
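The engine-at-a-time strategy can be sketched with a deliberately simple collection-level tf-idf. Each collection's sample is treated as one large document. Under that assumption, tf(t, c) is the relative frequency of term t in the sample from collection c. The inverse frequency is computed over collections, as idf(t) = log(N / cf(t)), where N is the number of collections and cf(t) is the number of collection samples containing t. This is one possible scoring, chosen for illustration. It is not the specific formula the slide has in mind, and published collection-selection methods are more refined. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def rank_collections(query: List[str], samples: Dict[str, List[str]]) -> List[str]:
    """Rank collections by a simple collection-level tf-idf score.

    samples maps a collection (engine) name to the words of a sample of its documents.
    tf(t, c) = relative frequency of t in c's sample; idf(t) = log(N / cf(t)).
    """
    n = len(samples)
    counts = {name: Counter(words) for name, words in samples.items()}
    # cf(t): in how many collection samples does each query term occur?
    cf = Counter(t for c in counts.values() for t in set(query) if c[t] > 0)

    def score(name: str) -> float:
        total = sum(counts[name].values()) or 1
        return sum((counts[name][t] / total) * math.log(n / cf[t])
                   for t in query if cf[t] > 0)

    return sorted(samples, key=score, reverse=True)

def engine_at_a_time(query: List[str], samples: Dict[str, List[str]],
                     results: Dict[str, List[str]]) -> List[str]:
    """Show all results of the best-ranked engine first, then the next engine's, etc."""
    return [doc for name in rank_collections(query, samples) for doc in results[name]]
```

A term that appears in every sample gets an idf of zero, so it contributes nothing to distinguishing the collections.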

18
Relevance based methods
  • Calculate relevance for the documents returned by
    each selected source
  • Try to estimate global statistics (e.g. document
    frequencies across all sources) so scores are comparable
  • Use some special measures
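A relevance-based merge can be sketched by re-scoring every returned document with a single shared formula. This is a minimal sketch under several assumptions. Each source returns `(url, words)` pairs, where the words could come from the result snippet. "Global" document frequencies are estimated by pooling all returned documents. The score is tf × log(1 + N/df), a smoothed idf chosen here so that it never reaches zero. All of these are illustrative choices, and the name `rerank_with_global_idf` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def rerank_with_global_idf(query: List[str],
                           returned: Dict[str, List[Tuple[str, List[str]]]]
                           ) -> List[Tuple[str, float]]:
    """Re-score every returned document with one shared tf-idf formula.

    returned maps each source to (url, words) pairs, e.g. words from the result snippet.
    Document frequencies are pooled over *all* returned documents, so documents from
    every source are scored against the same (estimated) global statistics.
    """
    docs: Dict[str, List[str]] = {}
    for pairs in returned.values():
        for url, words in pairs:
            docs.setdefault(url, words)            # a duplicate URL is scored once
    n = len(docs)
    df = Counter(t for words in docs.values() for t in set(words))

    def score(words: List[str]) -> float:
        tf = Counter(words)
        return sum(tf[t] * math.log(1 + n / df[t]) for t in query if df[t] > 0)

    scored = [(url, score(words)) for url, words in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because every document is scored with the same statistics, the resulting scores can be compared directly across sources. The raw scores reported by different engines cannot be compared in this way.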

19
Other Alternatives
  • Random
  • First come, first shown
  • etc.
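Both the random and the first-come-first-shown alternatives can be sketched with the standard library. This is a minimal sketch that treats engines as plain Python functions from a query to a list of URLs. It queries them concurrently with `concurrent.futures` and appends each engine's results as soon as that engine answers. The optional seed in `random_merge` exists only to make demonstrations repeatable. `slow_engine` and `fast_engine` are hypothetical stand-ins for engines with different response times.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Optional

Engine = Callable[[str], List[str]]

def first_come_first_shown(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Append each engine's results in the order the engines finish answering."""
    merged: List[str] = []
    with ThreadPoolExecutor(max_workers=len(engines) or 1) as pool:
        futures = [pool.submit(engine, query) for engine in engines.values()]
        for future in as_completed(futures):     # yields futures as they complete
            merged.extend(future.result())
    return merged

def random_merge(result_lists: List[List[str]], seed: Optional[int] = None) -> List[str]:
    """Pool every result and shuffle it; the seed only makes demos repeatable."""
    pooled = [doc for results in result_lists for doc in results]
    random.Random(seed).shuffle(pooled)
    return pooled

def slow_engine(q: str) -> List[str]:
    time.sleep(0.3)                     # hypothetical slow engine
    return [f"slow:{q}"]

def fast_engine(q: str) -> List[str]:
    return [f"fast:{q}"]
```

First come, first shown suits an interface that displays results while other engines are still answering. However, the final order then reflects network speed rather than relevance.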

20
Conclusions
  • Metasearch is a way of providing a search engine
    without building a full infrastructure (crawlers and indexes)
  • It is comparatively simple but works well

21
More Information
  • http://searchenginewatch.com/links/article.php/2156241
  • Uses the term "metacrawler", but they don't crawl!
  • Weiyi Meng, Clement Yu, King-Lup Liu. Building
    efficient and effective metasearch engines. ACM
    Computing Surveys (CSUR), Volume 34, Issue 1
    (March 2002), pages 48-89.