Title: CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists
Slide 1: CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists
- Luo Si and Jamie Callan
- Language Technologies Institute, School of Computer Science, Carnegie Mellon University
- CLEF 2005
Slide 2: Task Definition
- Multi-8 Two Years On: multilingual information retrieval
- Multi-8 Merging Only: participants merge provided bilingual ranked lists
Slide 3: Task 1: Multilingual Retrieval System
Slide 4: Task 1: Multilingual Retrieval System
- Text preprocessing
  - Stop words
  - Stemming
  - Decompounding
  - Word translation
Slide 5: Task 1: Multilingual Retrieval System
Slide 6: Task 1: Multilingual Retrieval System
- Method 1: multilingual retrieval via query translation, no query feedback; raw-score merge and Okapi system
- Method 2: multilingual retrieval via query translation, with query feedback; raw-score merge and Okapi system
- Method 3: multilingual retrieval via document translation, no query feedback; raw-score merge and Okapi system
- Method 4: multilingual retrieval via document translation, with query feedback; raw-score merge and Okapi system
- Method 5: UniNE system
Slide 7: Task 1: Multilingual Retrieval System
Slide 8: Task 1: Multilingual Retrieval System
- Normalization
  - drs_{k,mj}: the raw document score of the j-th document retrieved from the m-th ranked list for the k-th query; each list's raw scores are normalized before combination
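The slide defines only the notation; the normalization formula itself was shown graphically. A minimal Python sketch, assuming simple min-max normalization within each ranked list (the exact formula used in the system is not reproduced here):

```python
def minmax_normalize(raw_scores):
    """Normalize one ranked list's raw document scores drs_{k,mj} to [0, 1].

    raw_scores: raw scores of the m-th ranked list for the k-th query,
    in rank order.  Min-max normalization is an assumption; the slide
    does not spell out the formula.
    """
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:                      # degenerate list: all scores identical
        return [0.0 for _ in raw_scores]
    return [(s - lo) / (hi - lo) for s in raw_scores]
```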
Slide 9: Task 1: Multilingual Retrieval System
Slide 10: Task 1: Multilingual Retrieval System
- Combine multilingual ranked lists
  - (w_m, r_m): the weight of the vote and the exponential normalization factor for the m-th ranked list
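The slide names the parameters (w_m, r_m) but not the combination formula itself. The sketch below assumes a weighted sum in which each list's normalized score is passed through an exponential scaled by r_m; the function name and the exact functional form are assumptions, not the authors' published formula.

```python
import math

def combine_ranked_lists(lists, weights, exponents):
    """Combine several multilingual ranked lists into one final list.

    lists     : one dict {doc_id: normalized_score} per ranked list
    weights   : w_m, weight of each list's vote
    exponents : r_m, exponential normalization factor of each list

    Assumed combination: each document's final score is the weighted sum,
    over the lists that retrieved it, of exp(r_m * normalized score).
    """
    combined = {}
    for scores, w_m, r_m in zip(lists, weights, exponents):
        for doc_id, s in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + w_m * math.exp(r_m * s)
    # final multilingual list: documents sorted by combined score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```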
Slide 11: Task 1: Experimental Results, Multilingual Retrieval
- Qry/Doc: whether queries or documents were translated
- fb/nofb: with/without pseudo-relevance feedback
- UniNE: UniNE system
Slide 12: Task 1: Experimental Results, Multilingual Retrieval
- MX: combined models
- W1/Trn: equal or learned (trained) combination weights
Slide 13: Task 2: Results Merging for Multilingual Retrieval
- Merge ranked lists of eight different languages (i.e., bilingual or monolingual) into a single final list
- Logistic model of (rank, doc score)?
- Language-specific methods?
- Query-specific and language-specific methods?
Slide 14: Task 2: Results Merging for Multilingual Retrieval
- Learn a query-independent, language-specific merging model
  - Estimates the probability of relevance of document d_{k,ij}
  - Model parameters estimated by maximizing the log-likelihood (MLE)
  - or by maximizing MAP
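As a rough illustration of the query-independent, language-specific model, the sketch below fits a logistic regression of relevance on (rank, doc score) by (regularized) maximum likelihood. The feature choice, the use of scikit-learn, and the helper name are assumptions; the MAP-maximizing variant mentioned above is not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_language_merging_model(ranks, scores, relevance):
    """Query-independent, language-specific merging model (illustrative).

    Fits P(rel | rank, score) with a logistic model trained on relevance
    judgments from training queries.  sklearn's default fit is maximum
    likelihood with an L2 penalty, standing in for plain MLE here.
    """
    X = np.column_stack([ranks, scores])   # features: rank and normalized score
    y = np.asarray(relevance)              # 0/1 relevance judgments
    model = LogisticRegression()
    model.fit(X, y)
    return model

# At merge time, documents from all eight languages are ranked together by
# the estimated probabilities of relevance:
#   probs = model.predict_proba(X_new)[:, 1]
```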
Slide 15: Task 2: Results Merging for Multilingual Retrieval
- Learn a query-specific, language-specific merging model
- Calculate comparable scores for the top-ranked documents in each language
  - (1) Combine the scores of the query-translation-based and document-translation-based methods
  - (2) Build language-specific, query-specific logistic models to transform language-specific scores into comparable scores
Slide 16: Task 2: Results Merging for Multilingual Retrieval
- (2) Build language-specific, query-specific logistic models to transform language-specific scores into comparable scores
  - Logistic model parameters are estimated by minimizing the mean squared error between the exact normalized comparable scores and the estimated comparable scores
- Estimate comparable scores for all retrieved documents in each language
- Use the comparable scores to create a merged multilingual result list (see the sketch after this list)
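A minimal sketch of step (2), under stated assumptions: fit a two-parameter logistic transform per language and query by least squares (equivalent to minimizing the mean squared error), then apply it to all retrieved documents in that language. The parameterization and the helper names are illustrative, not the authors' exact model.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(s, a, b):
    """Logistic transform from a language-specific score to a comparable score."""
    return 1.0 / (1.0 + np.exp(-(a * s + b)))

def fit_query_language_model(lang_scores_top, comparable_scores_top):
    """Fit one language-specific, query-specific logistic model.

    lang_scores_top       : language-specific scores of the top-ranked documents
    comparable_scores_top : their exact normalized comparable scores (from the
                            combined query/document translation runs)

    curve_fit chooses (a, b) by least squares, i.e. it minimizes the mean
    squared error between exact and estimated comparable scores.
    """
    (a, b), _ = curve_fit(logistic,
                          np.asarray(lang_scores_top),
                          np.asarray(comparable_scores_top),
                          p0=[1.0, 0.0])
    return a, b

def estimate_comparable_scores(lang_scores_all, a, b):
    """Apply the fitted transform to every retrieved document in that language."""
    return logistic(np.asarray(lang_scores_all), a, b)
```

The estimated comparable scores from all eight languages can then be pooled and sorted to produce the merged multilingual result list.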
Slide 17: Task 2: Experimental Results, Results Merging
- Query-independent, language-specific
- Mean average precision of merged multilingual lists of different methods on the UniNE result lists
- Mean average precision of merged multilingual lists of different methods on the Hummingbird result lists
- Estimating the model parameters by maximizing MAP is more accurate than MLE
Slide 18: Task 2: Experimental Results, Results Merging
- Query-specific, language-specific
- Mean average precision of merged multilingual lists of different methods on the UniNE result lists
- C_X: top X documents from each list, merged by their exact comparable scores
- Top_X_0.5: top X documents from each list downloaded to fit the logistic models that estimate comparable scores; the estimates are combined with the exact scores with equal weight
- In some cases, combining estimated comparable scores with exact comparable scores is more accurate than using the exact comparable scores alone
Slide 19: Task 2: Experimental Results, Results Merging
- Query-specific, language-specific
- Mean average precision of merged multilingual lists of different methods on the Hummingbird result lists
- Outperforms the query-independent, language-specific algorithm