Impact of Database Selection on Distributed Searching
By Pinki Thakkar and Uma Gopinath
- The Problem
  - Centralized information retrieval
    - Expensive in terms of bandwidth
    - Speed
    - Size
    - Difficult to find data as the network grows
- Distributed Information Retrieval
  - Select the databases to which queries will be sent.
  - Process the queries at the selected databases, producing result lists.
  - Merge the result lists into one.
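The three steps above can be sketched as one function. This is a minimal illustration, not the system from the talk: the function name `distributed_search`, the `selection_scores` input (one precomputed score per database) and the choice to merge by sorting on raw score are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Result = Tuple[str, float]                # (document id, score)
Database = Callable[[str], List[Result]]  # runs a query, returns a result list


def distributed_search(
    query: str,
    databases: Dict[str, Database],
    selection_scores: Dict[str, float],
    n: int,
    k: int,
) -> List[Result]:
    # 1. Select the n databases with the highest selection scores.
    selected = sorted(selection_scores, key=lambda name: selection_scores[name],
                      reverse=True)[:n]
    # 2. Process the query at each selected database.
    result_lists = [databases[name](query) for name in selected]
    # 3. Merge the result lists into one (here: sorted by raw score, an assumption).
    merged = [r for results in result_lists for r in results]
    merged.sort(key=lambda r: r[1], reverse=True)
    return merged[:k]


# Toy databases; their contents and selection scores are invented for illustration.
dbs = {
    "DB156": lambda q: [("a", 0.9), ("b", 0.2)],
    "DB500": lambda q: [("c", 0.5)],
    "DB55": lambda q: [("d", 0.99)],
}
scores = {"DB156": 0.8, "DB500": 0.6, "DB55": 0.1}
print(distributed_search("query", dbs, scores, n=2, k=3))
# prints [('a', 0.9), ('c', 0.5), ('b', 0.2)]
```

Note that document `d` is never returned even though it has the highest score: its database was not selected, which is why the selection step matters.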
[Figure: Submit Query → Internet → databases (DB156, DB500, DB55) → Merge and Present results]
- Prior Work
  - Retrieval in a centralized environment
  - Heterogeneous environments
  - Document clustering approach
- Methodology used
  - Database selection
    - Collection Retrieval Inference Network (CORI) algorithm
    - For a query term $r_k$ and collection $c_i$:
      $T = d_t + (1 - d_t) \cdot \dfrac{\log(df + 0.5)}{\log(max\_df + 1.0)}$
      $I = \dfrac{\log\left(\dfrac{C + 0.5}{cf}\right)}{\log(C + 1.0)}$
    - $p(r_k \mid c_i) = d_b + (1 - d_b) \cdot T \cdot I$
    - where:
      - $df$: number of documents in $c_i$ containing $r_k$
      - $max\_df$: number of documents containing the most frequent term in $c_i$
      - $C$: number of collections
      - $cf$: number of collections containing term $r_k$
      - $d_t$: minimum term frequency component when term $r_k$ occurs in collection $c_i$
      - $d_b$: minimum belief component when term $r_k$ occurs in collection $c_i$
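- Example of the CORI algorithm ($C = 2$)
  - Collection C1: D1 = A, B, C, C, C; D2 = A, A, C, C
  - Collection C2: D3 = A, B, B, C, C, C, C; D4 = A, C, C, C, C
  - $r_k = A$, $d_t = d_b = 0.4$, $df = 4$, $max\_df = 4$, $C = 2$, $cf = 2$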
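  - $T = 0.4 + 0.6 \cdot \dfrac{\log(4 + 0.5)}{\log(4 + 1.0)} \approx 0.96$
  - $I = \dfrac{\log\left(\dfrac{2 + 0.5}{2}\right)}{\log(2 + 1.0)} \approx 0.20$
  - $p(r_k \mid c_i) = 0.4 + 0.6 \cdot 0.96 \cdot 0.20 \approx 0.515$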
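The CORI belief for one term and one collection is simple enough to compute directly. The sketch below transcribes the $T$, $I$ and $p(r_k \mid c_i)$ formulas as stated; the function name and the defaults $d_t = d_b = 0.4$ (taken from the worked example) are assumptions. The log base cancels in each ratio, so `math.log` is used throughout. Evaluated without rounding the intermediate $T$ and $I$, the example inputs give approximately 0.517.

```python
import math


def cori_belief(df: int, max_df: int, num_collections: int, cf: int,
                d_t: float = 0.4, d_b: float = 0.4) -> float:
    """Belief p(r_k | c_i) for one query term r_k and one collection c_i.

    cf must be at least 1, i.e. the term occurs in some collection.
    """
    # T: document-frequency component, normalised by max_df.
    t = d_t + (1 - d_t) * math.log(df + 0.5) / math.log(max_df + 1.0)
    # I: inverse collection frequency; terms found in fewer collections score higher.
    i = math.log((num_collections + 0.5) / cf) / math.log(num_collections + 1.0)
    return d_b + (1 - d_b) * t * i


# Inputs from the worked example: df = 4, max_df = 4, C = 2, cf = 2.
print(round(cori_belief(df=4, max_df=4, num_collections=2, cf=2), 3))  # prints 0.517
```

Ranking collections would repeat this for every candidate collection and sort by the resulting belief.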
- Relevance-Based Ranking (RBR)
  - Rankings produced using the relevance judgements supplied with the TREC data
  - Databases ordered by number of relevant documents
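RBR needs only the relevance judgements and a record of which database holds each document. A minimal sketch, assuming the judgements for one query are given as a set of relevant document ids and `doc_to_db` maps every document to its database (both representations, and breaking ties by database name, are assumptions):

```python
from collections import Counter
from typing import Dict, List, Set


def relevance_based_ranking(relevant: Set[str], doc_to_db: Dict[str, str]) -> List[str]:
    """Order databases by how many of the query's relevant documents each holds."""
    counts = Counter(doc_to_db[doc] for doc in relevant if doc in doc_to_db)
    databases = set(doc_to_db.values())
    # Most relevant documents first; ties broken by database name.
    # Databases with no relevant documents get count 0 and come last.
    return sorted(databases, key=lambda db: (-counts[db], db))


doc_to_db = {"d1": "DB1", "d2": "DB2", "d3": "DB2", "d4": "DB3"}
print(relevance_based_ranking({"d1", "d2", "d3"}, doc_to_db))  # prints ['DB2', 'DB1', 'DB3']
```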
- Query Processing using Inquery
  - Document network
  - Query network
  - Combine the networks to compute the conditional probability that a document satisfies the query
  - Rank documents using this probability
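The slide names the components without detail. The sketch below illustrates only the last step, combining per-term beliefs into one belief per document and ranking by it. It assumes the document network has already produced a belief for each (document, term) pair; the operators follow the commonly described behaviour of InQuery's `#and`, `#or`, `#not` and `#sum`, and the default belief of 0.4 for an absent term is an assumption. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


# Belief operators in the style of InQuery's #and, #or, #not and #sum.
def op_and(beliefs: List[float]) -> float:
    return math.prod(beliefs)


def op_or(beliefs: List[float]) -> float:
    return 1 - math.prod(1 - b for b in beliefs)


def op_not(belief: float) -> float:
    return 1 - belief


def op_sum(beliefs: List[float]) -> float:
    return sum(beliefs) / len(beliefs)


def rank_documents(term_beliefs: Dict[str, Dict[str, float]], query_terms: List[str],
                   default_belief: float = 0.4) -> List[Tuple[str, float]]:
    """Score each document with #sum over its term beliefs; highest belief first."""
    scores = {
        doc: op_sum([beliefs.get(t, default_belief) for t in query_terms])
        for doc, beliefs in term_beliefs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


beliefs = {"d1": {"a": 0.8, "b": 0.6}, "d2": {"a": 0.5}}
print([doc for doc, _ in rank_documents(beliefs, ["a", "b"])])  # prints ['d1', 'd2']
```

Swapping `op_sum` for `op_and` or `op_or` in `rank_documents` changes how strictly a document must match every query term.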
- Merging result lists
  - Raw score merge
  - Inquery multi-database merging algorithm
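A raw score merge interleaves the result lists purely by the scores each database returned, which is only meaningful when those scores are on a comparable scale. A minimal sketch, assuming each list is already sorted by descending score (the function name is hypothetical); it does not attempt the Inquery multi-database merging algorithm, whose details the slide does not give.

```python
import heapq
from itertools import islice
from typing import List, Tuple

Result = Tuple[str, float]  # (document id, raw score)


def raw_score_merge(result_lists: List[List[Result]], k: int) -> List[Result]:
    """Interleave result lists by raw score and keep the top k."""
    # Each input list must already be sorted by descending score;
    # heapq.merge then yields a single descending stream lazily.
    merged = heapq.merge(*result_lists, key=lambda r: r[1], reverse=True)
    return list(islice(merged, k))


lists = [[("a", 0.9), ("b", 0.2)], [("c", 0.5)], [("d", 0.7), ("e", 0.1)]]
print(raw_score_merge(lists, k=3))  # prints [('a', 0.9), ('d', 0.7), ('c', 0.5)]
```

Using `heapq.merge` avoids sorting the concatenated lists and stops as soon as `k` results have been produced.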
- Scenarios considered
  - Centralized
  - dist-CWI (collection-wide information)
  - dist-LI (local information)
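The slide only names the two distributed scenarios. One plausible reading, assumed here rather than taken from the slides, is that they differ in which term statistics a database uses when scoring its documents: pooled statistics from every database (dist-CWI) or only its own (dist-LI). The sketch shows how a term's inverse document frequency changes under each; the smoothed IDF formula and all names are illustrative assumptions, not Inquery's weighting.

```python
import math
from typing import Dict, List, Set

Doc = Set[str]  # a document reduced to the set of terms it contains


def idf(term: str, docs: List[Doc]) -> float:
    """A simple smoothed IDF (illustrative; not Inquery's exact formula)."""
    df = sum(1 for d in docs if term in d)
    return math.log((len(docs) + 1) / (df + 0.5))


def scenario_idf(term: str, databases: Dict[str, List[Doc]], db: str,
                 collection_wide: bool) -> float:
    if collection_wide:
        # dist-CWI: statistics pooled over all databases, so every database
        # weights the term the same way.
        docs = [d for db_docs in databases.values() for d in db_docs]
    else:
        # dist-LI: only the scoring database's own statistics.
        docs = databases[db]
    return idf(term, docs)


databases = {"DB1": [{"a", "b"}, {"a"}], "DB2": [{"b"}, {"c"}]}
print(scenario_idf("a", databases, "DB1", collection_wide=False))  # log(3 / 2.5)
print(scenario_idf("a", databases, "DB1", collection_wide=True))   # log(5 / 2.5)
```

Under local statistics, term `a` looks common because it appears in every document of DB1, so it gets a lower weight there than under collection-wide statistics.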
- Results
  - Centralized scenario
  - Comparison of distributed and centralized performance
  - Comparison of dist-CWI and dist-LI
  - Number of databases searched
  - Alternate interpretation of dist-CWI
[Table: Average precision achieved in dist-CWI and dist-LI for UBC-100 and SYM-236. Bold: significantly better; italics: significantly worse.]
[Table: Average precision achieved in dist-CWI and dist-LI for UDC-236. Bold: significantly better; italics: significantly worse.]
- Conclusions
  - Good database selection gives distributed retrieval an edge.
  - Selecting more databases improves performance only up to a point.
  - Using local information works well if good selection is employed.
  - Given good selection, conceptually decomposing a centralized database and interposing a selection step has the potential to improve performance.
QUESTIONS