1
Approaches to Collection Selection and Results Merging for Distributed Information Retrieval.
  • By Y. Rasolofo, F. Abbaci, J. Savoy.

2
What's wrong with conventional IR systems?
  • Insufficient bandwidth,
  • Server overload,
  • Unacceptable response times.

3
(Figure 1: single-index IR vs. distributed IR; shown as an image, not transcribed.)
4
Collection Selection.
  • Two ways to select collections (sketched below):
  • Select the collections with the N highest scores.
  • Select the collections whose scores exceed some threshold.
  • Two main sources of evidence for collection selection:
  • Collection descriptions
  • Collection statistics
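
As a minimal sketch of these two selection strategies, in Python (the collection names and scores below are invented for illustration):

  def select_top_n(scores, n):
      # Keep the n collections with the highest selection scores.
      ranked = sorted(scores, key=scores.get, reverse=True)
      return ranked[:n]

  def select_above_threshold(scores, threshold):
      # Keep every collection whose score exceeds the threshold.
      return [c for c in scores if scores[c] > threshold]

  scores = {"news": 0.82, "web": 0.47, "patents": 0.15}
  print(select_top_n(scores, 2))              # ['news', 'web']
  print(select_above_threshold(scores, 0.4))  # ['news', 'web']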

5
Approach Proposed
  • Main evaluation criteria:
  • Number of terms included in each document surrogate
  • Distance between the number of terms and their frequencies

6
Approach Proposed
Document score and distance calculations.
(The score and distance formulas were shown as images and are not transcribed.)
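
Since those formulas are not recoverable, the Python fragment below is only a loose, hypothetical reading of the two criteria from the previous slide; in particular, the distance measure is an assumption, not the authors' actual formula:

  import math

  def surrogate_evidence(query_terms, surrogate_tf):
      # surrogate_tf maps a term to its frequency in the document surrogate.
      matched = [t for t in query_terms if t in surrogate_tf]
      n_terms = len(matched)  # criterion 1: query terms found in the surrogate
      # Criterion 2 (assumed form): a distance between the number of
      # matched terms and those terms' frequencies.
      distance = math.sqrt(sum((surrogate_tf[t] - n_terms) ** 2 for t in matched))
      return n_terms, distance
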
7
Result Set Merging - Prior Work
  • Round robin
  • Raw score merging
  • What if collection statistics are very different? idf?
  • Score merging with collection weights
  • Normalize by the maximum document score
  • CORI
  • w_i = 1 + C * (s_i - s_m) / s_m
  • s_m is the mean collection score, C is the number of collections (see the sketch below)
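
A Python sketch of two of these strategies, raw-score merging and CORI-style weighted merging (the result lists and collection scores are invented; only the w_i formula comes from the slide):

  def raw_score_merge(results):
      # Merge by raw document score, ignoring differences between collections.
      merged = [pair for res in results.values() for pair in res]
      return sorted(merged, key=lambda p: p[1], reverse=True)

  def cori_weights(collection_scores):
      # w_i = 1 + C * (s_i - s_m) / s_m, as on the slide.
      C = len(collection_scores)
      s_m = sum(collection_scores.values()) / C  # mean collection score
      return {i: 1 + C * (s - s_m) / s_m for i, s in collection_scores.items()}

  def weighted_merge(results, weights):
      # Scale each document score by its collection's weight, then merge.
      merged = [(doc, score * weights[i])
                for i, res in results.items() for doc, score in res]
      return sorted(merged, key=lambda p: p[1], reverse=True)

  results = {"A": [("d1", 9.1), ("d2", 7.4)], "B": [("d3", 12.0)]}
  print(weighted_merge(results, cori_weights({"A": 0.6, "B": 0.3})))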

8
Result Set Merging - LSM: Using Result Lengths to Calculate Merging Scores
  • No collection statistics necessary; the inputs are document scores and result lengths
  • Increase document scores if above the mean score, decrease them if below
  • Collections that return many documents are more likely to return relevant documents
  • s_i = log(1 + (l_i * K) / sum_{j=1..C} l_j)
  • w_i = 1 + (s_i - s_m) / s_m
  • K is a constant (600), s_i is the score for the i-th collection, l_i is the number of documents returned by the i-th collection (see the sketch below)
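
A Python sketch of the LSM weighting defined above (the collection names and result lengths are invented):

  import math

  K = 600  # constant from the slide

  def lsm_weights(result_lengths):
      # s_i = log(1 + (l_i * K) / sum_j l_j); w_i = 1 + (s_i - s_m) / s_m,
      # where result_lengths maps collection i to l_i and s_m is the
      # mean of the s_i.
      total = sum(result_lengths.values())
      s = {i: math.log(1 + l * K / total) for i, l in result_lengths.items()}
      s_m = sum(s.values()) / len(s)
      return {i: 1 + (si - s_m) / s_m for i, si in s.items()}

  # Collections that returned more documents get weights above 1
  # (scores increased); smaller result lists get weights below 1.
  print(lsm_weights({"A": 900, "B": 300, "C": 50}))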

9
Analysis of Experiments Results: Major Assumptions
  • Test documents came from the TREC-8 and TREC-9 conferences
  • Only topic titles, which contain two words on average, were used, to simulate the typical queries sent by search-engine users.

10
Analysis of Experiments Results: Interesting Findings
  • Spelling errors in several queries caused no documents to be returned
  • The authors used two methodologies to show that their approach outperformed the approaches it was compared against (sketched below):
  • TREC_EVAL was used to compute the average precision after retrieving 5, 10, 15, 20, 30, 100, 200, 500, and 1000 documents
  • A sign test was used to verify that the findings were statistically significant
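
A minimal sketch of both evaluation steps, assuming per-query relevance judgments and effectiveness scores are already available (the cutoff list comes from the slide; everything else is illustrative, and trec_eval itself is a separate tool):

  from scipy.stats import binomtest  # SciPy >= 1.7, for the sign test's p-value

  CUTOFFS = (5, 10, 15, 20, 30, 100, 200, 500, 1000)

  def precision_at_cutoffs(ranked_docs, relevant, cutoffs=CUTOFFS):
      # Precision after retrieving k documents, for each cutoff k.
      return {k: sum(1 for d in ranked_docs[:k] if d in relevant) / k
              for k in cutoffs}

  def sign_test(scores_a, scores_b):
      # Per-query sign test: count wins for each method, drop ties,
      # and test the win counts against a fair coin.
      wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
      wins_b = sum(a < b for a, b in zip(scores_a, scores_b))
      return binomtest(wins_a, wins_a + wins_b, 0.5).pvalue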

11
Analysis of Experiments Results: TREC-8 Results
12
Analysis of Experiments Results: TREC-9 Results