Server Selection on the World Wide Web - PowerPoint PPT Presentation

Slides: 20

Provided by: Rah43

Transcript and Presenter's Notes

Title: Server Selection on the World Wide Web

1
Server Selection on the World Wide Web

Nick Craswell and Peter Bailey
Department of Computer Science
The Australian National University,
Canberra Australia
David Hawking
CSIRO Mathematical and Information Sciences
Canberra Australia

2
Problem

In digital libraries, material is carefully chosen
to meet people's needs for information quality and
quantity.
Including Web documents is a problem because they
may contain misleading information.
Servers must therefore be selected so that
information quality is maintained.

3
Approach

Incorporate a digital information retrieval broker.
The broker is capable of selecting and querying a
number of chosen web search servers and presenting
their results.

4
Goals

Evaluate the CORI, vGlOSS and CVV selection
methods based on probe queries.
Test server effectiveness.
Compare the effectiveness of distributed and
centralized retrieval in the same environment.

5
Prior Work

A decision-theoretic model has been developed for
selection based on cost and benefit components.
ProFusion performs selection based on the past
query processing speed of search servers.

6
Distributed Information Retrieval

Comprises document servers, search servers and a
broker
$S, q \rightarrow S', q \rightarrow R_1 \ldots R_{|S'|} \rightarrow R_m$
$S$: servers
$q$: queries
$S'$: selected servers
$R_i$: result list from selected server $i$
$R_m$: merged result list
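
The broker pipeline on this slide (select servers, query them, merge the result lists) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `broker_search`, `make_server` and `select_all` are hypothetical, the toy server scores documents by query term counts, and merging by raw server-reported score is an assumption made only to keep the example short.

```python
from typing import Callable, Dict, List, Tuple

# A search server is modelled as a function: query -> ranked list of (doc_id, score).
Server = Callable[[str], List[Tuple[str, float]]]


def broker_search(
    servers: Dict[str, Server],
    query: str,
    select: Callable[[Dict[str, Server], str], List[str]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Run one query through the steps S, q -> S', q -> R_1..R_|S'| -> R_m."""
    selected = select(servers, query)              # S': names of the chosen servers
    results = {name: servers[name](query)[:top_k]  # R_1 .. R_|S'|
               for name in selected}
    # Naive merge (an assumption): pool all hits and sort by server-reported score.
    merged = [hit for hits in results.values() for hit in hits]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged                                  # R_m


def make_server(docs: Dict[str, str]) -> Server:
    """Toy server (hypothetical): score = number of query term occurrences."""
    def search(query: str) -> List[Tuple[str, float]]:
        terms = query.lower().split()
        hits = [(doc_id, float(sum(text.lower().split().count(t) for t in terms)))
                for doc_id, text in docs.items()]
        hits = [h for h in hits if h[1] > 0]
        return sorted(hits, key=lambda h: h[1], reverse=True)
    return search


def select_all(servers: Dict[str, Server], query: str) -> List[str]:
    """Trivial selection policy: query every server."""
    return sorted(servers)
```

A real selection policy would replace `select_all` with a ranking method such as CORI, CVV or vGlOSS and keep only the top-ranked servers.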

7
Distributed vs. Centralized Web Search

Centralized solution - building a new search
server that covers all documents of interest
Distributed solution - using a broker to address
existing search servers.
The decision of which approach to use depends on
system goals and resource availability.

8
Server Selection Motivation

Selection improves search efficiency
Selection also improves effectiveness

9
Server Selection Methods

This work concentrates on effectiveness, taking a
simple view of search cost.
It evaluates the CORI, vGlOSS and CVV selection
methods based on probe queries in an environment
of heterogeneous web search servers.

10
CORI Selection Method

Ranks search servers by treating each one as a
document surrogate consisting of the concatenation
of the server's documents.
Uses the tf.idf document ranking method as an
analogy.
df is analogous to tf, and icf is analogous to idf.
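
To make the tf/idf analogy concrete, here is a sketch of the CORI belief computation in the form commonly cited from Callan, Lu and Croft. The constants (50, 150, default belief 0.4) are the usual published defaults and are assumed here; the slides do not state which variant or parameters were used. The function names and the averaging of per-term beliefs in `cori_score` are illustrative choices.

```python
import math
from typing import Dict, List


def cori_belief(df: int, cf: int, cw: int, avg_cw: float,
                num_servers: int, b: float = 0.4) -> float:
    """Belief that a server satisfies one query term.

    df: number of documents on this server that contain the term
    cf: number of servers that contain the term
    cw: number of words on this server; avg_cw: mean cw over all servers
    """
    if df == 0 or cf == 0:
        return b  # term absent: only the default belief remains
    # T plays the role of tf, computed from the server's document frequency.
    t = df / (df + 50 + 150 * cw / avg_cw)
    # I is the inverse collection frequency (icf), playing the role of idf.
    i = math.log((num_servers + 0.5) / cf) / math.log(num_servers + 1.0)
    return b + (1 - b) * t * i


def cori_score(query_terms: List[str], server_df: Dict[str, int],
               cf: Dict[str, int], cw: int, avg_cw: float,
               num_servers: int) -> float:
    """Score a server as the mean belief over the query terms (an assumption)."""
    if not query_terms:
        return 0.0
    beliefs = [cori_belief(server_df.get(t, 0), cf.get(t, 0), cw, avg_cw, num_servers)
               for t in query_terms]
    return sum(beliefs) / len(beliefs)
```

Servers are then ranked by `cori_score` and the top-ranked ones are selected for the query.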

11
CVV Selection Method

The CVV server ranking method is based on the
Cue-Validity Variance (CVV) of query terms
Terms which can better discriminate between
servers have a higher CVV and therefore
contribute more to the suitability score
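
The idea that discriminating terms get a higher CVV can be sketched as below. The formulas follow the commonly described form of the method (cue validity per server, its variance across servers, then a df-weighted sum); they are reproduced from memory of that description and should be checked against the original CVV paper. The function name and the data layout are hypothetical.

```python
from typing import Dict, List


def cvv_scores(df: Dict[str, Dict[str, int]], n_docs: Dict[str, int],
               query_terms: List[str]) -> Dict[str, float]:
    """Rank servers by sum over query terms of CVV(term) * df(server, term).

    df[server][term] is the document frequency of the term on that server;
    n_docs[server] is the number of documents on that server.
    """
    servers = list(n_docs)
    cvv: Dict[str, float] = {}
    for term in query_terms:
        cue_validity = []
        for s in servers:
            # Density of the term on this server versus on all other servers.
            own = df[s].get(term, 0) / n_docs[s]
            other_df = sum(df[k].get(term, 0) for k in servers if k != s)
            other_n = sum(n_docs[k] for k in servers if k != s)
            rest = other_df / other_n if other_n else 0.0
            cue_validity.append(own / (own + rest) if own + rest > 0 else 0.0)
        mean = sum(cue_validity) / len(servers)
        # Variance of the cue validity across servers: high when the term
        # is concentrated on a few servers, zero when evenly spread.
        cvv[term] = sum((cv - mean) ** 2 for cv in cue_validity) / len(servers)
    return {s: sum(cvv[t] * df[s].get(t, 0) for t in query_terms) for s in servers}
```

In this sketch a term that is equally dense on every server has zero variance and so contributes nothing to any server's score.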

12
vGlOSS Selection Method

The vGlOSS server ranking method ranks servers
using server document frequency statistics and
the vector sum of each server's normalized
document vectors.
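
The two per-server statistics named on this slide can be computed as in the sketch below. The `goodness` function is a deliberately simplified dot-product estimate in the spirit of vGlOSS, not necessarily the estimator evaluated in this work; the published vGlOSS estimators combine document frequencies and summed weights under specific assumptions about term co-occurrence. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # term -> weight


def normalize(vec: Vector) -> Vector:
    """Scale a document vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def server_summary(doc_vectors: List[Vector]) -> Tuple[Dict[str, int], Vector]:
    """Return (document frequencies, vector sum of normalized document vectors)."""
    df: Dict[str, int] = {}
    weight_sum: Vector = {}
    for vec in doc_vectors:
        for term, w in normalize(vec).items():
            if w > 0:
                df[term] = df.get(term, 0) + 1
                weight_sum[term] = weight_sum.get(term, 0.0) + w
    return df, weight_sum


def goodness(query_weights: Vector, weight_sum: Vector) -> float:
    """Simplified estimate: dot product of the query with the server's vector sum."""
    return sum(q * weight_sum.get(t, 0.0) for t, q in query_weights.items())
```

Servers with a larger estimate are ranked higher for the query.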

13
Probe Queries

Used to measure server effectiveness.
Probe queries used were the titles of TREC topics
151-200 and 251-400.
Multiple probe queries are sent to each server to
obtain statistics from the results.
Levels of probing: 10, 25, 50, 100, 150 and 200
probe queries.
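
Gathering statistics from probe query results might look like the sketch below. It assumes a server is a function returning `(doc_id, text)` pairs and that the statistic of interest is per-term document frequency over the distinct documents seen; the cutoff of ten documents per query and the name `probe_server` are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Hit = Tuple[str, str]  # (doc_id, document text)


def probe_server(search: Callable[[str], List[Hit]],
                 probe_queries: Iterable[str],
                 docs_per_query: int = 10) -> Tuple[Counter, int]:
    """Send probe queries to one server and summarise the returned documents.

    Returns (document frequency per term, number of distinct documents seen).
    """
    seen: Dict[str, str] = {}
    for query in probe_queries:
        for doc_id, text in search(query)[:docs_per_query]:
            seen[doc_id] = text  # keep each document once, however often it is returned
    df: Counter = Counter()
    for text in seen.values():
        df.update(set(text.lower().split()))  # count each term once per document
    return df, len(seen)
```

Running this with 10, 25, 50, 100, 150 or 200 probe queries gives progressively more complete statistics for each server.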

14
Estimating Server Effectiveness

The broker chooses a specific point in each
server's ranking: ten documents per server.
The broker builds a merged list based on the
documents downloaded from the servers S.
The top 20 documents in the merged list are marked
relevant.
The broker calculates Ei, the mean number of
relevant documents returned by server i.
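
The estimation steps above can be sketched as follows. The slide does not say how the broker orders the merged list, so the `score` callback (the broker's own scoring of downloaded documents) is an assumption, as are the function and parameter names. The defaults of ten documents per server and 20 documents marked relevant come from the slide.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, str]  # (doc_id, document text)


def estimate_effectiveness(
    servers: Dict[str, Callable[[str], List[Hit]]],
    probe_queries: List[str],
    score: Callable[[str, str], float],
    per_server: int = 10,
    pseudo_relevant: int = 20,
) -> Dict[str, float]:
    """Return E_i for each server: mean number of its documents marked relevant."""
    totals: Dict[str, int] = defaultdict(int)
    for query in probe_queries:
        pool = []
        for name, search in servers.items():
            # Download the top documents from each server and score them at the broker.
            for doc_id, text in search(query)[:per_server]:
                pool.append((score(query, text), name, doc_id))
        pool.sort(key=lambda item: item[0], reverse=True)  # merged list
        # Treat the top of the merged list as relevant and credit the source servers.
        for _, name, _ in pool[:pseudo_relevant]:
            totals[name] += 1
    return {name: totals[name] / len(probe_queries) for name in servers}
```

A server whose documents rarely reach the top of the merged list ends up with an Ei close to zero.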

15
Estimating Server Effectiveness

CORI is modified by Ei.
The CORI belief value p is adjusted as a function
of Ei, using the constant 0.03.

16
Evaluation Framework

The test collection is the official TREC-8 Small
Web Track collection.
The 2GB of web documents are partitioned into 956
servers.
Retrieval at each server is simulated using one of
Okapi BM25, summed query term frequencies, or
Boolean conjunction of query terms.
Retrieval algorithms are assigned arbitrarily to
servers.
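
A heterogeneous set of simulated servers of this kind could be set up as in the sketch below. The BM25 parameters (`k1 = 1.2`, `b = 0.75`) and the non-negative idf variant are common defaults assumed here, not values taken from the slides, and "arbitrary" assignment is modelled with a seeded random choice. All function names are illustrative.

```python
import math
import random
from typing import Callable, Dict, List

Doc = List[str]  # a tokenised document
Scorer = Callable[[List[str], Doc, List[Doc]], float]


def bm25(query: List[str], doc: Doc, docs: List[Doc],
         k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 score of one document against the server's own collection."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    total = 0.0
    for term in query:
        tf = doc.count(term)
        if tf == 0:
            continue
        df = sum(1 for d in docs if term in d)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # non-negative idf variant
        total += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return total


def summed_tf(query: List[str], doc: Doc, docs: List[Doc]) -> float:
    """Sum of the query term frequencies in the document."""
    return float(sum(doc.count(t) for t in query))


def boolean_and(query: List[str], doc: Doc, docs: List[Doc]) -> float:
    """1.0 if the document contains every query term, else 0.0."""
    return 1.0 if all(t in doc for t in query) else 0.0


ALGORITHMS: List[Scorer] = [bm25, summed_tf, boolean_and]


def assign_algorithms(server_names: List[str], seed: int = 0) -> Dict[str, Scorer]:
    """Give each server one retrieval algorithm, chosen pseudo-randomly."""
    rng = random.Random(seed)
    return {name: rng.choice(ALGORITHMS) for name in server_names}


def search(query: List[str], docs: List[Doc], algorithm: Scorer,
           top_k: int = 10) -> List[int]:
    """Return indices of matching documents, best first (ties broken by index)."""
    scored = [(algorithm(query, d, docs), i) for i, d in enumerate(docs)]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [i for s, i in ranked if s > 0][:top_k]
```

Because each server may use a different scorer, the scores returned by different servers are not directly comparable, which is part of what makes selection and merging difficult in this setting.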

17
Experiment

200 probe queries drawn from TREC topics 151-200
and 251-400.
Centralized indexing is considered at three levels
of coverage: 100%, 50% and 25%.
For all centralized indexes, retrieval uses BM25.

18
Results

CORI is best for selecting a set of 10 servers,
outperforming vGlOSS and CVV.
For low numbers of probe queries, CORI
effectiveness degrades more than that of vGlOSS
or CVV.
Modifying CORI according to estimated server
effectiveness Ei provided no significant
improvement to overall effectiveness.

19
Results

Modifying CORI using server effectiveness based on
relevance judgments yielded a statistically
significant improvement for 10 and 25 probe queries.
Distributed retrieval over ten servers per query
is not as effective as retrieval over a
centralized index with 100% coverage. For 50% and
25% coverage, the distributed case fares better.
