1
Impact of Database Selection on Distributed
Searching
By Pinki Thakkar and Uma Gopinath
2
The impact of database selection on distributed
searching
  • The Problem
  • Centralized Information Retrieval
  • Expensive in terms of bandwidth
  • Speed
  • Size
  • Finding data becomes harder as the net grows

3
  • Distributed Information Retrieval
  • Select the databases to which queries will
    be sent.
  • Process the queries at the selected
    databases producing result-lists.
  • Merge the result-lists into one.
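
The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the toy databases, the term-count scorer, and all function names (`select_databases`, `search`, `merge`, `distributed_search`) are hypothetical and do not come from the presentation, which uses CORI for selection and Inquery for query processing.

```python
from collections import Counter

# Hypothetical toy databases: name -> {doc_id: text}. Not from the slides.
DATABASES = {
    "db_a": {"a1": "distributed search over many databases", "a2": "cooking pasta"},
    "db_b": {"b1": "database selection for distributed search", "b2": "search engines"},
    "db_c": {"c1": "gardening tips", "c2": "bird watching"},
}


def score(text, query_terms):
    """Toy relevance score: number of query-term occurrences in the text."""
    counts = Counter(text.split())
    return sum(counts[term] for term in query_terms)


def select_databases(query_terms, databases, n):
    """Step 1: keep the n databases whose documents best match the query."""
    totals = {
        name: sum(score(text, query_terms) for text in docs.values())
        for name, docs in databases.items()
    }
    return sorted(totals, key=lambda name: (-totals[name], name))[:n]


def search(query_terms, docs):
    """Step 2: produce a result list of (doc_id, score), best first."""
    hits = [(doc_id, score(text, query_terms)) for doc_id, text in docs.items()]
    return sorted((h for h in hits if h[1] > 0), key=lambda h: (-h[1], h[0]))


def merge(result_lists):
    """Step 3: merge the per-database result lists into one ranking."""
    merged = [hit for results in result_lists for hit in results]
    return sorted(merged, key=lambda h: (-h[1], h[0]))


def distributed_search(query, databases, n=2):
    terms = query.split()
    selected = select_databases(terms, databases, n)
    return merge([search(terms, databases[name]) for name in selected])


results = distributed_search("distributed search", DATABASES)
print(results)
```

Only the two selected databases are searched here; `db_c` never receives the query, which is the bandwidth saving that motivates the selection step.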

4

[Diagram: Submit Query; Internet; DB156, DB500, DB55; Merge and Present Results]

5


  • Prior Work
  • Retrieval in centralized environment.
  • Heterogeneous environment.
  • Document clustering approach.

6


  • Methodology used
  • Database Selection
  • Collection Retrieval Inference Network (CORI) algorithm

For a query term $r_k$:
$T = d_t + (1 - d_t) \cdot \frac{\log(\mathit{df} + 0.5)}{\log(\mathit{max\_df} + 1.0)}$
$I = \frac{\log\left(\frac{C + 0.5}{\mathit{cf}}\right)}{\log(C + 1.0)}$

7
  • $p(r_k \mid c_i) = d_b + (1 - d_b) \cdot T \cdot I$
  • where,
  • $\mathit{df}$ = number of documents in $c_i$
    containing $r_k$
  • $\mathit{max\_df}$ = number of docs containing the most
    frequent term in $c_i$
  • $C$ = number of collections
  • $\mathit{cf}$ = number of collections containing
    term $r_k$
  • $d_t$ = minimum term frequency component
    when term $r_k$ occurs in
    collection $c_i$
  • $d_b$ = minimum belief component when term
    $r_k$ occurs in collection $c_i$
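
The belief formula can be written as a short function. This is a minimal sketch, not the authors' implementation: the function name `cori_belief`, the parameter name `num_collections` (standing in for C), and the per-collection statistics below are hypothetical, and the defaults of 0.4 for d_t and d_b are simply the values used in the worked example on the following slides.

```python
import math


def cori_belief(df, max_df, cf, num_collections, d_t=0.4, d_b=0.4):
    """Belief p(r_k | c_i) that collection c_i satisfies query term r_k."""
    # T: term-frequency component, driven by df within the collection.
    t = d_t + (1 - d_t) * math.log(df + 0.5) / math.log(max_df + 1.0)
    # I: inverse collection frequency component, driven by cf across collections.
    i = math.log((num_collections + 0.5) / cf) / math.log(num_collections + 1.0)
    return d_b + (1 - d_b) * t * i


# Hypothetical statistics for one query term in three of ten collections.
stats = {
    "c1": {"df": 40, "max_df": 100},
    "c2": {"df": 5, "max_df": 100},
    "c3": {"df": 90, "max_df": 100},
}
beliefs = {
    name: cori_belief(s["df"], s["max_df"], cf=3, num_collections=10)
    for name, s in stats.items()
}
ranking = sorted(beliefs, key=beliefs.get, reverse=True)
print(ranking)
```

Because I is the same for every collection once the term is fixed, the ordering for a single term is decided by T, that is, by how many documents in each collection contain the term.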

8

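Example of CORI algorithm, $C = 2$
  • Collection C1
  • D1: A, B, C, C, C
  • D2: A, A, C, C
  • Collection C2
  • D3: A, B, B, C, C, C, C
  • D4: A, C, C, C, C
$r_k = A$, $d_t = d_b = 0.4$, $\mathit{df} = 4$, $\mathit{max\_df} = 4$, $C = 2$, $\mathit{cf} = 2$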
9
$T = 0.4 + 0.6 \cdot \frac{\log(4 + 0.5)}{\log(4 + 1.0)} \approx 0.95$
$I = \frac{\log\left(\frac{2 + 0.5}{2}\right)}{\log(2 + 1.0)} \approx 0.20$
$p(r_k \mid c_i) = 0.4 + 0.6 \cdot 0.95 \cdot 0.20 = 0.514$
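
The arithmetic on this slide can be checked in a few lines of Python, using the parameter values stated in the example (d_t = d_b = 0.4, df = 4, max_df = 4, C = 2, cf = 2). Both components are ratios of logarithms, so the base of the logarithm does not affect the result. Carrying T and I through without rounding gives a belief of about 0.517; the slide's 0.514 results from multiplying the rounded values 0.95 and 0.20.

```python
import math

# Parameter values as stated in the slide's example.
d_t = d_b = 0.4
df, max_df, C, cf = 4, 4, 2, 2

T = d_t + (1 - d_t) * math.log(df + 0.5) / math.log(max_df + 1.0)
I = math.log((C + 0.5) / cf) / math.log(C + 1.0)
p = d_b + (1 - d_b) * T * I

print(f"T = {T:.3f}, I = {I:.3f}, p = {p:.3f}")
# prints: T = 0.961, I = 0.203, p = 0.517
```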
10
  • Relevance-Based Ranking (RBR)
  • Rankings produced by using relevance
    judgements supplied with TREC data
  • Databases ordered by number of relevant docs
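
A relevance-based ranking for one query can be sketched as a count over the relevance judgements. The data structures here are assumptions for illustration: `relevant_docs` is the list of documents judged relevant to the query, `doc_to_db` maps each document to the database that holds it, and the document IDs are invented.

```python
from collections import Counter


def rbr_ranking(relevant_docs, doc_to_db):
    """Order databases by how many relevant documents each one holds."""
    counts = Counter(doc_to_db[doc] for doc in relevant_docs if doc in doc_to_db)
    # Most relevant documents first; ties broken by database name.
    return sorted(counts, key=lambda db: (-counts[db], db))


# Hypothetical judgements and document placement.
relevant_docs = ["d1", "d2", "d3", "d5"]
doc_to_db = {"d1": "DB55", "d2": "DB156", "d3": "DB156", "d4": "DB500", "d5": "DB156"}

print(rbr_ranking(relevant_docs, doc_to_db))
# prints: ['DB156', 'DB55']
```

Databases holding no relevant documents (DB500 in this toy data) are left out of the ranking in this sketch. Since the ordering depends on relevance judgements, it is available only in an evaluation setting, where it serves as a reference point for a selection algorithm.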

11
  • Query Processing using Inquery
  • Document network.
  • Query network.
  • Combine the networks to get conditional
    probability.
  • Rank the system using this probability.

12


  • Merging result lists
  • Raw score merge
  • Inquery multi-database merging algorithm
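
A raw score merge takes the scores returned by each database at face value and sorts all results together. The sketch below assumes each database returns a list of (doc_id, score) pairs; the function name, database names, and scores are hypothetical. The Inquery multi-database merging algorithm is not sketched here because the slides do not describe its details.

```python
def raw_score_merge(result_lists, k=None):
    """Merge per-database result lists into one ranking by raw score.

    result_lists maps a database name to a list of (doc_id, score) pairs.
    Returns (database, doc_id, score) tuples, highest score first.
    """
    merged = [
        (db, doc_id, score)
        for db, results in result_lists.items()
        for doc_id, score in results
    ]
    merged.sort(key=lambda hit: (-hit[2], hit[0], hit[1]))
    return merged if k is None else merged[:k]


# Hypothetical result lists from two selected databases.
result_lists = {
    "DB55": [("d7", 0.62), ("d2", 0.40)],
    "DB156": [("d9", 0.81), ("d4", 0.35)],
}

top3 = raw_score_merge(result_lists, k=3)
print(top3)
# prints: [('DB156', 'd9', 0.81), ('DB55', 'd7', 0.62), ('DB55', 'd2', 0.4)]
```

This approach is only meaningful when scores from different databases are comparable, since nothing in the merge rescales them.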

13
  • Scenarios considered
  • Centralized
  • dist-CWI (collection-wide information)
  • dist-LI (local information)

14
  • Results
  • Centralized Scenario.
  • Comparison of distributed and centralized
    performance.
  • Comparison of dist-CWI and dist-LI.
  • Number of Databases Searched.
  • Alternate interpretation of dist-CWI.

15
16
Average precision achieved in dist-CWI and
dist-LI for UBC-100 and SYM-236. Bold: significantly
better; italics: significantly worse.
17
Average precision achieved in dist-CWI and
dist-LI for UDC-236. Bold: significantly better;
italics: significantly worse.
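
For readers unfamiliar with the metric, average precision for a single query can be computed as below. This sketch assumes the common non-interpolated definition (the mean of the precision values at the rank of each relevant document, divided by the total number of relevant documents); the slides do not state which variant was used, and the document IDs are invented.

```python
def average_precision(ranked_doc_ids, relevant):
    """Non-interpolated average precision for one ranked result list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


# Hypothetical ranking: relevant documents appear at ranks 1 and 3, one is missed.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"})
print(round(ap, 4))
# prints: 0.5556
```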
18
19
20
  • Conclusions
  • Good database selection gives distributed
    retrieval an edge.
  • Selecting more databases improves performance
    only up to a point.
  • Using local information works well if good
    selection is employed.
  • Given a good selection, conceptually decomposing
    a centralized database and interposing a
    selection step has the potential to improve
    performance.

21
QUESTIONS