Parallel and Distributed Searching - PowerPoint PPT Presentation

Slides: 21
Provided by: john309

Transcript and Presenter's Notes

1
Parallel and Distributed Searching
2
Lecture Objectives
  • Review Boolean Searching
  • Indicate how Searches may be carried out in
    parallel
  • Overview Distributed Searching
  • Collection Partitioning
  • Query Processing
  • Collection/Results Fusion

3
Boolean Queries
  • Queries with terms connected by AND, OR, and NOT
  • (Internet AND retrieval) AND (NOT english)
  • world wide web OR internet
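
A minimal sketch of how the first example query could be evaluated, assuming each term maps to a set of document ids. The index contents and the helper name `postings` are made up for illustration; they are not from the lecture.

```python
# Toy inverted index: term -> set of document ids (illustrative data only).
index = {
    "internet": {1, 2, 4},
    "retrieval": {2, 3, 4},
    "english": {4, 5},
}
all_docs = {1, 2, 3, 4, 5}


def postings(term):
    """Return the set of documents containing the term (empty if unseen)."""
    return index.get(term, set())


# (Internet AND retrieval) AND (NOT english)
# AND is set intersection, OR is set union, NOT is the complement in all_docs.
result = (postings("internet") & postings("retrieval")) & (all_docs - postings("english"))
print(sorted(result))
```

Because each operator is a plain set operation, the operands can be computed independently of one another.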

4
Advantages
  • Easy to Implement
  • Allow very precise query specifications
  • Facilitate parallel execution

5
Disadvantages
  • People are bad at Boolean algebra
  • Difficult to interpret to get effective relevance
    ranking
  • Difficult to include sensible query weighting

6
Parallel Searching
  • Useful in improving performance in very
    large/heavily used search engines
  • break query down into several subqueries
  • execute each at the same time
  • combine results
  • share subqueries between different searches
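
The steps above can be sketched roughly as follows, assuming a conjunctive (AND) query whose terms are looked up concurrently. The index data and the names `run_subquery` and `parallel_and` are hypothetical, and the cache is only one possible way of sharing subqueries between searches.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Toy inverted index: term -> document ids (illustrative data only).
index = {
    "internet": frozenset({1, 2, 4}),
    "retrieval": frozenset({2, 3, 4}),
    "web": frozenset({1, 4}),
}


@lru_cache(maxsize=None)
def run_subquery(term):
    """One subquery: look up a single term. Cached so later searches can reuse it."""
    return index.get(term, frozenset())


def parallel_and(terms):
    """Break the query into one subquery per term, run them at once, intersect."""
    with ThreadPoolExecutor() as pool:
        posting_sets = list(pool.map(run_subquery, terms))
    if not posting_sets:
        return frozenset()
    return frozenset.intersection(*posting_sets)


print(sorted(parallel_and(["internet", "retrieval"])))
```

A second search that also mentions "internet" would be served from the cache instead of repeating the lookup.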

7
Distributed Searching
  • More about metasearching and turning plain
    searching into metasearching

8
Distribution Methods
  • Multiple copies of the collection (mirror sites)
  • Why not split the documents between servers
    according to their topics?

9
Collection Partitioning
  • Manual/semi-automatic topic partitioning
  • medical vs engineering
  • books vs CDs
  • One Central Index
  • One Index per server

10
Distributed Query Processing
  • Select collections to search
  • distribute query to selected collections
  • evaluate query at selected servers in parallel
  • combine results into a final result
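
A rough sketch of the four steps, with servers simulated as in-memory dictionaries. The server names, documents, and the simple term-overlap scoring are assumptions made for the example; a real system would send the query over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical servers: name -> {doc_id: text}.
servers = {
    "medical": {"m1": "titanium alloys in artificial hips", "m2": "flu vaccine trial"},
    "engineering": {"e1": "steel alloys wear testing", "e2": "bridge design"},
    "music": {"c1": "jazz piano recordings"},
}


def select_collections(terms):
    """Step 1: keep servers whose vocabulary shares at least one query term."""
    selected = []
    for name, docs in servers.items():
        vocab = set(" ".join(docs.values()).split())
        if vocab & set(terms):
            selected.append(name)
    return selected


def search_server(name, terms):
    """Step 3: score each document by the number of query terms it contains."""
    hits = []
    for doc_id, text in servers[name].items():
        score = len(set(text.split()) & set(terms))
        if score:
            hits.append((score, doc_id, name))
    return hits


def distributed_search(query):
    terms = query.lower().split()
    selected = select_collections(terms)
    # Steps 2 and 3: send the query to every selected server in parallel.
    with ThreadPoolExecutor() as pool:
        per_server = list(pool.map(lambda n: search_server(n, terms), selected))
    # Step 4: combine into one list, best score first.
    merged = [hit for hits in per_server for hit in hits]
    return sorted(merged, key=lambda h: (-h[0], h[1]))


print(distributed_search("steel alloys"))
```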

11
Source Selection
  • Obtain global term distribution data
  • on the web?
  • Analyse central index of collection relevance
  • Missing gems

12
Missing Gems Example
  • Query
  • wear characteristics of high titanium steel
    alloys
  • the relevant document actually occurs in the medical
    collection, describing use in artificial hips
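
The missing-gems problem can be illustrated with a small sketch of source selection. The document-frequency figures and the scoring rule (sum of per-collection document frequencies of the query terms) are invented for this example; the lecture does not specify a selection formula.

```python
# Hypothetical statistics: collection -> {term: number of documents containing it}.
collection_df = {
    "engineering": {"wear": 120, "steel": 300, "alloys": 150, "titanium": 40},
    "medical": {"wear": 5, "titanium": 8, "alloys": 3},
}


def collection_score(name, terms):
    """Crude relevance of a collection: total document frequency of the query terms."""
    df = collection_df[name]
    return sum(df.get(term, 0) for term in terms)


def select_sources(terms, k):
    """Return the k collections that look most relevant to the query."""
    ranked = sorted(collection_df, key=lambda n: collection_score(n, terms), reverse=True)
    return ranked[:k]


query = "wear characteristics of high titanium steel alloys".split()
print(select_sources(query, 1))
```

With `k = 1` only the engineering collection is searched, so a relevant document held in the medical collection is never seen. That document is the missing gem.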

13
Results Fusion
  • Want to present a single result collected from
    several sources
  • Also known as collection fusion because it makes
    several collections appear as one

14
Results Fusion
  • How do you put together the results from several
    web sites/search engines into a single combined
    result?
  • Collection at a time
  • Relevance ranked
  • Round robin

15
Collection at a Time
  • Use e.g. tf-idf across each collection to rank the
    searched collections by relevance
  • Display the results from the best collection first
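
A minimal sketch of this fusion strategy, assuming each collection has already been given a relevance score and has returned its own ranked list. The function name and the sample data are hypothetical.

```python
def collection_at_a_time(results, collection_scores):
    """Concatenate per-collection result lists, best-scoring collection first.

    results: collection name -> ranked list of document ids
    collection_scores: collection name -> relevance score of the whole collection
    """
    order = sorted(results, key=lambda name: collection_scores[name], reverse=True)
    merged = []
    for name in order:
        merged.extend(results[name])
    return merged


results = {"medical": ["m1", "m2"], "engineering": ["e1", "e2", "e3"]}
scores = {"medical": 0.2, "engineering": 0.7}
print(collection_at_a_time(results, scores))
```

Every document from the best collection is shown before any document from the next one, however well the latter scored individually.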

16
Tf-idf
  • tf - term frequency
  • terms that are frequently mentioned in individual
    documents improve recall
  • idf - inverse document frequency
  • inversely proportional to the number of documents
    which mention a term
  • prefers discriminating terms
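
As a worked illustration, the sketch below uses one common variant: raw term count for tf and log(N / df) for idf. The slides do not state which formula the lecture uses, so this particular weighting is an assumption, as are the sample documents.

```python
import math

# Tiny illustrative collection: doc_id -> text.
docs = {
    "d1": "steel alloys wear steel",
    "d2": "titanium alloys",
    "d3": "jazz piano",
}


def tf(term, text):
    """Term frequency: how often the term occurs in one document."""
    return text.split().count(term)


def idf(term):
    """Inverse document frequency: log(N / number of documents with the term)."""
    n_containing = sum(1 for text in docs.values() if term in text.split())
    return math.log(len(docs) / n_containing) if n_containing else 0.0


def tf_idf(term, doc_id):
    return tf(term, docs[doc_id]) * idf(term)


print(tf_idf("steel", "d1"), tf_idf("alloys", "d1"))
```

In `d1`, "steel" outweighs "alloys" twice over: it occurs more often in the document and appears in fewer documents overall, so it is the more discriminating term.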

17
Round Robin
  • Take the first document from collection 1
  • Then the first document from collection 2
  • and so on for each collection
  • then the second document from collection 1
  • and so on
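
The interleaving described above can be sketched as follows; the function name and sample lists are illustrative. Collections that run out of documents are simply skipped in later rounds.

```python
from itertools import zip_longest


def round_robin(ranked_lists):
    """Interleave ranked lists: first item of each list, then second of each, and so on."""
    missing = object()  # sentinel marking an exhausted list
    merged = []
    for rank_slice in zip_longest(*ranked_lists, fillvalue=missing):
        merged.extend(doc for doc in rank_slice if doc is not missing)
    return merged


print(round_robin([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]))
```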

18
Relevance based methods
  • Calculate Relevance for the documents returned by
    each selected source
  • Try to calculate some global statistics
  • Use some special measures
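
One simple relevance-based approach, shown here only as an assumed example, is to rescale each source's scores to a common range and then sort everything together. Min-max normalisation is a stand-in for the "special measures" mentioned above, not the lecture's stated method; the names and data are hypothetical.

```python
def normalise(hits):
    """Rescale one source's (doc, score) pairs to the range 0..1."""
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If all scores are equal, treat every hit as equally (fully) relevant.
    return [(doc, (score - lo) / span if span else 1.0) for doc, score in hits]


def merge_by_relevance(per_source):
    """Merge hits from all sources into one list ordered by normalised score."""
    merged = []
    for hits in per_source.values():
        if hits:
            merged.extend(normalise(hits))
    return sorted(merged, key=lambda hit: hit[1], reverse=True)


per_source = {
    "A": [("a1", 10.0), ("a2", 5.0)],                # source A scores on a 0..10 scale
    "B": [("b1", 0.9), ("b2", 0.45), ("b3", 0.0)],   # source B scores on a 0..1 scale
}
print([doc for doc, _ in merge_by_relevance(per_source)])
```

Without some shared scale, raw scores from different engines are not comparable, which is why global statistics are worth estimating when they can be obtained.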

19
Other Alternatives
  • Random
  • First come, first shown
  • etc.

20
Conclusions
  • Parallel Searching is one way to speed up
    searching
  • Distributing information can help ease/speed
    searching but has some dangers
  • Some solutions to the results fusion problem