Server Selection on the World Wide Web - PowerPoint PPT Presentation

Provided by: Rah43

1
Server Selection on the World Wide Web
  • Nick Craswell and Peter Bailey
  • Department of Computer Science
  • The Australian National University,
  • Canberra, Australia
  • David Hawking
  • CSIRO Mathematical and Information Sciences
  • Canberra, Australia

2
Problem
  • In digital libraries, material is carefully chosen
    to meet people's needs for information quality and
    quantity.
  • Including Web documents is problematic because
    they may contain misleading information.
  • Servers must therefore be selected so as to
    maintain information quality.

3
Approach
  • Incorporate a distributed information retrieval
    broker.
  • The broker is capable of selecting servers for a
    query and presenting the merged results of a
    number of chosen web search servers.

4
Goals
  • Evaluate the CORI, vGlOSS and CVV selection
    methods based on probe queries
  • Test server effectiveness
  • Compare the effectiveness of distributed and
    centralized retrieval in the same environment

5
Prior Work
  • A decision-theoretic model has been developed for
    selection based on cost and benefit components.
  • ProFusion performs selection based on the past
    query processing speed of each search server.

6
Distributed Information Retrieval
  • Comprises document servers, search servers and a
    broker
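  • $S, q \rightarrow S'$, then $S', q \rightarrow R_1, \ldots, R_{|S'|} \rightarrow R_m$
  • $S$: servers
  • $q$: query
  • $S'$: selected servers
  • $R_i$: result list from the $i$-th selected server
  • $R_m$: merged result list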
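The flow from servers and query to a merged result list can be sketched as follows. This is only an illustration under stated assumptions: the names `broker_search`, `Server` and `select` are hypothetical, and the merge rule (keep each document's best score, then sort) is a simple stand-in, since the slides do not say how the broker merges results.

```python
from typing import Callable, Dict, List, Tuple

# A result is a (doc_id, score) pair; a server is modelled as a function from
# a query string to a ranked result list. Both are illustrative assumptions.
Result = Tuple[str, float]
Server = Callable[[str], List[Result]]


def broker_search(servers: Dict[str, Server],
                  select: Callable[[str, List[str]], List[str]],
                  query: str,
                  top_k: int = 20) -> List[Result]:
    """Select servers for the query, query them, and merge their results."""
    # Selection step: choose a subset of the known servers for this query.
    selected = select(query, sorted(servers))
    # Retrieval step: each selected server returns its own result list.
    result_lists = [servers[name](query) for name in selected]
    # Merge step (hypothetical rule): keep each document's best score.
    best: Dict[str, float] = {}
    for results in result_lists:
        for doc_id, score in results:
            best[doc_id] = max(score, best.get(doc_id, float("-inf")))
    # Sort by descending score, breaking ties by document id.
    merged = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return merged[:top_k]
```

Comparing raw scores across heterogeneous servers is a simplification; servers that use different retrieval methods produce scores on different scales, so a real broker would need some form of normalization or re-scoring.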

7
Distributed vs. Centralized Web Search
  • Centralized solution - building a new search
    server that covers all documents of interest
  • Distributed solution - using a broker to address
    existing search servers.
  • The decision of which approach to use depends on
    system goals and resource availability.

8
Server Selection Motivation
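  • Selection improves search efficiency, because
    each query is sent to only a subset of servers.
  • Selection can also improve effectiveness, by
    leaving out servers unlikely to return relevant
    documents.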

9
Server Selection Methods
  • The study concentrates on effectiveness, taking a
    simple view of search cost.
  • It evaluates the CORI, vGlOSS and CVV selection
    methods based on probe queries in an environment
    of heterogeneous web search servers.

10
CORI Selection Method
  • Ranks search servers by treating each server as a
    document surrogate consisting of the
    concatenation of the server's documents.
  • Uses the tf.idf document ranking method as an
    analogy.
  • df (document frequency) is analogous to tf, and
    icf (inverse collection frequency) is analogous
    to idf.
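A minimal sketch of the df/icf analogy follows. The constants 50, 150 and the default belief 0.4 are the values commonly quoted for CORI in the literature; the slides do not state them, so they are assumptions here, as are the function and parameter names.

```python
import math
from typing import Dict, List


def cori_belief(df: int, cf: int, cw: float, avg_cw: float,
                num_servers: int, default_belief: float = 0.4) -> float:
    """Belief that a server is useful for one query term.

    df: documents on this server containing the term (plays the role of tf)
    cf: number of servers containing the term (icf plays the role of idf)
    cw: word count of this server; avg_cw: mean word count over servers
    """
    if df == 0 or cf == 0:
        return default_belief
    # T: df component, dampened by the server's relative size.
    t = df / (df + 50.0 + 150.0 * cw / avg_cw)
    # I: inverse collection frequency, scaled by log(num_servers + 1).
    i = math.log((num_servers + 0.5) / cf) / math.log(num_servers + 1.0)
    return default_belief + (1.0 - default_belief) * t * i


def cori_score(query_terms: List[str], df_on_server: Dict[str, int],
               cf: Dict[str, int], cw: float, avg_cw: float,
               num_servers: int) -> float:
    """Score a server as the mean belief over the query terms."""
    if not query_terms:
        return 0.0
    beliefs = [
        cori_belief(df_on_server.get(t, 0), cf.get(t, 0), cw, avg_cw, num_servers)
        for t in query_terms
    ]
    return sum(beliefs) / len(beliefs)
```

Servers are then ranked by `cori_score`, and the broker queries the top-ranked ones.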

11
CVV Selection Method
  • The CVV server ranking method is based on the
    Cue-Validity Variance (CVV) of query terms
  • Terms which can better discriminate between
    servers have a higher CVV and therefore
    contribute more to the suitability score
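The idea that discriminating terms carry more weight can be sketched as below. The formulas follow the usual published description of CVV (cue validity of a term for a server, variance of that value across servers, and a score that weights each term's document frequency by its CVV); the slides give no formulas, so treat the details and all names as assumptions.

```python
from typing import Dict, List


def cue_validity_variance(term: str,
                          df: Dict[str, Dict[str, int]],
                          sizes: Dict[str, int]) -> float:
    """Variance across servers of the term's cue validity.

    df[server][term] is the number of documents on the server containing the
    term; sizes[server] is the number of documents on the server.
    """
    servers = list(sizes)
    cue_validities = []
    for s in servers:
        own = df[s].get(term, 0) / sizes[s]
        other_df = sum(df[o].get(term, 0) for o in servers if o != s)
        other_n = sum(sizes[o] for o in servers if o != s)
        rest = other_df / other_n if other_n else 0.0
        # Cue validity: how strongly the term points to this server.
        cue_validities.append(own / (own + rest) if own + rest > 0 else 0.0)
    mean = sum(cue_validities) / len(cue_validities)
    return sum((cv - mean) ** 2 for cv in cue_validities) / len(cue_validities)


def cvv_rank(query_terms: List[str],
             df: Dict[str, Dict[str, int]],
             sizes: Dict[str, int]) -> List[str]:
    """Rank servers by the CVV-weighted sum of query-term document frequencies."""
    cvv = {t: cue_validity_variance(t, df, sizes) for t in query_terms}
    scores = {
        s: sum(cvv[t] * df[s].get(t, 0) for t in query_terms) for s in sizes
    }
    return sorted(scores, key=lambda s: (-scores[s], s))
```

A term that occurs equally often on every server has zero variance and so contributes nothing to any server's score, whereas a term concentrated on one server dominates the ranking.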

12
vGlOSS Selection Method
  • The vGlOSS server ranking method ranks servers
    using server document frequency statistics and
    the vector sum of the server's normalized
    document vectors.
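The "vector sum of normalized document vectors" can be illustrated with a small sketch. It scores a server by summing the query terms' entries in that summed vector, which resembles the simplest estimator described in the vGlOSS literature; which estimator the authors used is not stated in the slides, and this sketch does not use the document frequency statistics. Raw term frequencies with cosine normalization and all names are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def server_vector(docs: List[List[str]]) -> Dict[str, float]:
    """Vector sum of the server's length-normalized term-frequency vectors."""
    total: Counter = Counter()
    for tokens in docs:
        tf = Counter(tokens)
        # Euclidean length of the document's term-frequency vector.
        norm = math.sqrt(sum(count * count for count in tf.values()))
        for term, count in tf.items():
            total[term] += count / norm
    return dict(total)


def vgloss_score(query_terms: List[str], vector: Dict[str, float]) -> float:
    """Sum of the server vector's weights for the query terms."""
    return sum(vector.get(term, 0.0) for term in query_terms)
```

For example, a server holding the documents `["a", "a"]` and `["a"]` has the vector `{"a": 2.0}`, because each document contributes a unit-length vector.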

13
Probe Queries
  • Used to measure server effectiveness
  • The probe queries used were the titles of TREC
    topics 151-200 and 251-400.
  • Multiple probe queries are sent to each server to
    obtain statistics from the results.
  • Levels of probing: 10, 25, 50, 100, 150 and 200
    probe queries

14
Estimating Server Effectiveness
  • The broker chooses a specific cutoff point in the
    ranking: ten documents per server.
  • The broker builds a merged list based on the
    documents downloaded from the servers S.
  • The top 20 documents in the merged list are marked
    relevant.
  • The broker calculates Ei, the mean number of
    relevant documents returned by server i.
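The estimation steps above can be sketched as follows. The broker-side scoring function `broker_score` is hypothetical, since the slides do not say how the broker ranks downloaded documents, and averaging over the probe queries is an assumed reading of "mean number".

```python
from typing import Callable, Dict, List


def estimate_effectiveness(probe_results: List[Dict[str, List[str]]],
                           broker_score: Callable[[int, str], float],
                           per_server: int = 10,
                           pseudo_relevant: int = 20) -> Dict[str, float]:
    """Estimate Ei for each server from probe query results.

    probe_results[k][server] is the server's ranked document ids for probe
    query k; broker_score(k, doc_id) is the broker's own score for a
    downloaded document (a hypothetical helper).
    """
    totals: Dict[str, int] = {}
    for k, by_server in enumerate(probe_results):
        # Cut each server's ranking at the chosen point.
        top = {s: docs[:per_server] for s, docs in by_server.items()}
        # Merge the downloaded documents into one list ranked by the broker.
        pool = {d for docs in top.values() for d in docs}
        merged = sorted(pool, key=lambda d: (-broker_score(k, d), d))
        # Mark the head of the merged list as relevant.
        relevant = set(merged[:pseudo_relevant])
        for s, docs in top.items():
            totals[s] = totals.get(s, 0) + sum(1 for d in docs if d in relevant)
    n = len(probe_results)
    return {s: total / n for s, total in totals.items()} if n else {}
```

No human relevance judgments are involved here: a document counts as relevant only because it ranks highly in the broker's merged list.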

15
Estimating Server Effectiveness
  • CORI is modified by Ei.
  • The CORI belief value p is adjusted using Ei and
    the constant 0.03.

16
Evaluation Framework
  • The test collection is that of the official
    TREC-8 Small Web Track.
  • The 2 GB of web documents are partitioned into
    956 servers.
  • Retrieval is simulated using one of Okapi BM25,
    summed query term frequencies, or Boolean
    conjunction of query terms.
  • Retrieval algorithms are assigned arbitrarily to
    servers.
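The three simulated retrieval methods can be sketched over a tiny in-memory index. The BM25 function below is one common variant with typical parameter values (k1 = 1.2, b = 0.75); the slides do not give the exact formulation or parameters, so these, along with all names, are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List

Docs = Dict[str, List[str]]  # document id -> list of tokens


def _rank(scores: Dict[str, float]) -> List[str]:
    """Order document ids by descending score, then by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


def bm25(query: List[str], docs: Docs, k1: float = 1.2, b: float = 0.75) -> List[str]:
    """Rank documents with a common BM25 variant (non-negative idf)."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))
    scores: Dict[str, float] = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        if score > 0:
            scores[doc_id] = score
    return _rank(scores)


def summed_tf(query: List[str], docs: Docs) -> List[str]:
    """Rank documents by the sum of their query term frequencies."""
    scores = {d: float(sum(Counter(t)[q] for q in query)) for d, t in docs.items()}
    return _rank({d: s for d, s in scores.items() if s > 0})


def boolean_and(query: List[str], docs: Docs) -> List[str]:
    """Return (unranked, sorted by id) documents containing every query term."""
    return sorted(d for d, t in docs.items() if all(q in t for q in query))
```

Assigning one of these functions to each simulated server at random would reproduce the heterogeneity the slide describes, with some servers returning ranked lists and others plain Boolean matches.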

17
Experiment
  • 200 probe queries, taken from TREC topics 151-200
    and 251-400
  • Centralized indexing is considered at three
    levels of coverage: 100%, 50% and 25%.
  • All centralized indexes use BM25 retrieval.
18
Results
  • CORI is best for selecting a set of 10 servers,
    outperforming vGlOSS and CVV.
  • For low numbers of probe queries, CORI
    effectiveness degrades more than that of vGlOSS
    or CVV.
  • Modifying CORI according to estimated server
    effectiveness Ei provided no significant
    improvement to overall effectiveness

19
Results
  • Modifying CORI based on relevance judgments
    yielded a statistically significant improvement
    for 10 and 25 probe queries.
  • Distributed retrieval over ten servers per query
    is not as effective as retrieval over a
    centralized index with 100% coverage. For 50% and
    25% coverage, the distributed case fares better.