Title: Microsoft Research India's Participation in FIRE2008
1 Microsoft Research India's Participation in FIRE2008
- Raghavendra Udupa
- raghavu@microsoft.com
2 CLIR System
CLEF07 Query 10.2452/447-AH [Hindi query text not recoverable]
Dictionary
Query Translator
Pim Fortuyn politics
Inverted Index
Document Ranker
LA Times 2002 articles
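The components named on this slide can be sketched as a toy pipeline: a dictionary-based query translator feeding a ranker that reads an inverted index. This is only an illustrative sketch, not the system's actual implementation. The romanized source tokens, the dictionary entries, the document IDs, and the match-count ranking are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def translate_query(tokens, dictionary):
    """Replace each source token by its dictionary translations.

    Tokens missing from the dictionary (OOV terms) are silently dropped.
    """
    translated = []
    for token in tokens:
        translated.extend(dictionary.get(token, []))
    return translated


def rank(query_terms, index):
    """Rank documents by the number of query terms they contain (toy scoring)."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id in index.get(term.lower(), ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical data: romanized stand-ins for the Hindi query tokens.
dictionary = {"pim": ["Pim"], "phortyun": ["Fortuyn"], "rajniti": ["politics"]}
docs = {
    "d1": "pim fortuyn and dutch politics",
    "d2": "local politics report",
    "d3": "weather forecast",
}

index = build_inverted_index(docs)
english_query = translate_query(["pim", "phortyun", "rajniti"], dictionary)
ranking = rank(english_query, index)
```

Because untranslatable tokens are dropped in this sketch, any query term missing from the dictionary contributes nothing to retrieval.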
3 Domain Adaptation
Mining transliterations of OOV words
Mining Translation Lexicon from Comparable Corpora
Dictionary
Query Translator
Mining NETE Transliterations from Comparable
Corpora
Inverted Index
Document Ranker
Cross-Language Ranking Model
Document Collection
4 Domain Adaptation
Mining transliterations of OOV terms (ECIR 2009)
Mining Translation Lexicon from Comparable Corpora (MT Summit 2007)
Dictionary
Query Translator
Mining NETE Transliterations from Comparable Corpora (CIKM08)
Inverted Index
Document Ranker
Cross-Language Ranking Models
Document Collection
5 Baseline Retrieval System
- Language Model-Based Retrieval
- Probabilistic Translation Lexicon
- 100K parallel sentences, IBM Model 3 alignment (GIZA)
- J. Jagarlamudi and A. Kumaran, Cross-Lingual Information Retrieval System for Indian Languages. Working Notes for the CLEF 2007 Workshop.
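One common way to combine a language model with a probabilistic translation lexicon is to score a document d for a source-language query q as the product over query terms f of sum_e t(f|e) * P(e|d), with P(e|d) smoothed against the collection. The sketch below assumes that formulation. The slides do not confirm the exact model or smoothing, and the lexicon entries, the interpolation weight `lam`, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def cross_language_score(query_terms, doc_tokens, collection_counts,
                         collection_len, lexicon, lam=0.5):
    """Log query likelihood of a source-language query under a target-language doc.

    lexicon[f][e] holds an assumed translation probability t(f|e).
    P(e|d) is linearly interpolated with the collection model (weight lam).
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    log_p = 0.0
    for f in query_terms:
        p_f = 0.0
        for e, t_fe in lexicon.get(f, {}).items():
            p_doc = doc_counts[e] / doc_len if doc_len else 0.0
            p_col = collection_counts[e] / collection_len
            p_f += t_fe * (lam * p_doc + (1 - lam) * p_col)
        # Floor the probability so OOV terms do not produce log(0).
        log_p += math.log(max(p_f, 1e-12))
    return log_p


# Hypothetical toy collection and lexicon.
docs = {
    "d1": ["dutch", "politics", "fortuyn"],
    "d2": ["weather", "report", "today"],
}
collection_counts = Counter(tok for toks in docs.values() for tok in toks)
collection_len = sum(collection_counts.values())
lexicon = {"rajniti": {"politics": 0.8, "policy": 0.2}}

scores = {
    doc_id: cross_language_score(["rajniti"], toks, collection_counts,
                                 collection_len, lexicon)
    for doc_id, toks in docs.items()
}
```

In this sketch a query term with no lexicon entry receives only the floor probability for every document, so it cannot help separate relevant from irrelevant documents.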
6 FIRE Fighting
- Mining Transliterations of Out-Of-Vocabulary Query Terms.
- Date-Based Document Restriction.
7 Mining Transliterations of Out-Of-Vocabulary Query Terms
8 OOV Query Terms
- Many OOV query terms are NEs
- NEs are often the focus of a query
- NEs form an open class of terms in all languages.
- Getting their transliterations right is extremely important
- Many OOV query terms are not NEs but transliterations of English words.
- E.g. the transliterated forms of seminar, corporation, champion, and film
9 A Hypothesis
- The transliterations of most of the
transliteratable OOV terms of a query can be
found in documents relevant to the query.
10 Empirical Validation
11 A Practical Hypothesis
- The transliterations of many of the
transliteratable OOV terms of a query can be
found in the top results of the CLIR system for
the query.
12 Mining OOV Transliteration Equivalents
- Basic Idea
- Pair the query with each of the top N results.
- Treat each pair as a comparable document pair.
- Mine transliteration equivalents from the
comparable document pairs.
"They Are Out There, If You Know Where to Look": Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval. ECIR 2009, Toulouse.
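The basic idea above can be sketched as follows: each OOV query term is compared against every word of each top-N result, and the best-scoring word is kept if it clears a threshold. This is a minimal sketch under strong assumptions. The character-overlap similarity from `difflib` is a stand-in for a real transliteration similarity model, the OOV terms are assumed to be already romanized, and the threshold and sample documents are hypothetical.

```python
import re
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in transliteration similarity: character overlap ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mine_transliterations(oov_terms, top_docs, threshold=0.7):
    """Pair the OOV terms with each top result and mine the best-matching word.

    Returns a dict mapping each OOV term to its mined equivalent; terms with
    no candidate at or above the threshold are left out.
    """
    mined = {}
    for term in oov_terms:
        best_word, best_score = None, 0.0
        for doc in top_docs:
            # Sorting keeps tie-breaking deterministic.
            for word in sorted(set(re.findall(r"[A-Za-z]+", doc))):
                score = similarity(term, word)
                if score > best_score:
                    best_word, best_score = word, score
        if best_word is not None and best_score >= threshold:
            mined[term] = best_word
    return mined


# Hypothetical romanized OOV terms and top results from a first retrieval pass.
top_docs = ["The champion lifted the trophy", "A seminar on film"]
mined = mine_transliterations(["chaimpiyan", "seminaar"], top_docs)
```

The mined pairs could then be added to the translated query before a second retrieval pass. Whether and how that re-retrieval is done is not specified in the slides.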
13 Long Queries MAP
14 Short Queries MAP
15 FIRE 2008 MAP
16 FIRE2008 MAP Difference (Long, official)
17 FIRE 2008 Num_Rel_Ret
18 FIRE 2008 P@10
19 Mining Transliterations @ FIRE2008
20 Date-Based Document Restriction
21 Dates
- Some queries contain dates
- CLEF 2007, Topic 407: Who was the Australian Prime Minister in 2002?
- CLEF 2007, Topic 411: terrorist car bomb in Bali, Indonesia, in 2002.
- CLEF 2006, Topic 326: winners in any category of the 1995 Emmy Awards.
- CLEF 2006, Topic 327: earthquakes in Mexico City in 1995.
22 Hypothesis
- If a query contains a date then the relevant
documents for the query are likely to be from the
same time period.
23 Empirical Validation
- CLEF07
- LATimes 2002
- CLEF06
- GH 95, LATimes 1994
24 CLEF06 C327
- Title
- Earthquakes in Mexico City
- Description
- Find documents that provide details on the
impact of or the damage caused by earthquakes in
Mexico City in 1995.
- Narrative
- Relevant document should contain some information
on earthquakes in Mexico City in 1995, such as
their magnitude, damages caused, panic of the
inhabitants, etc. Documents on earthquakes in
other places in Mexico are not relevant unless
the seismic impact was also felt in Mexico City.
25 Relevant Document
- <DOCNO> LA121194-0313 </DOCNO>
- <DOCID> 107228 </DOCID>
- December 11, 1994, Sunday, Home Edition
- A magnitude 6.3 earthquake rocked Mexico City,
causing people to flee their homes in fear. There
were no immediate reports of injuries or severe
damage. The U.S. Geological Survey's National
Earthquake Information Center in Golden, Colo.,
said the quake's epicenter was in Petatlan in the
southwestern state of Guerrero.
26 Date-Based Document Restriction
- Identify dates (if any) in the query.
- Restrict candidate documents to the set of
documents coming from the same time period.
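The two steps above can be sketched as a year extractor plus a filter over document years. This is an illustrative sketch only: the regular expression, the per-document year table, and the `tolerance` parameter are assumptions, and the slides do not say how dates were detected or how wide the time period was. The tolerance is included because the relevant document shown for the 1995 Mexico City topic is dated December 11, 1994, so a strict same-year match would exclude it.

```python
import re

# Assumption: a date is any standalone four-digit year in the 1900s or 2000s.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_years(query):
    """Return the set of years mentioned in the query text."""
    return {int(match) for match in YEAR_RE.findall(query)}


def restrict_by_date(query, doc_years, tolerance=0):
    """Keep only documents whose year is within `tolerance` of a query year.

    If the query mentions no year, every document remains a candidate.
    """
    years = extract_years(query)
    if not years:
        return set(doc_years)
    return {
        doc_id
        for doc_id, year in doc_years.items()
        if any(abs(year - q) <= tolerance for q in years)
    }


# Document years: LA121194-0313 comes from the slides; the others are hypothetical.
doc_years = {"LA121194-0313": 1994, "doc-1995": 1995, "doc-2002": 2002}
query = "earthquakes in Mexico City in 1995"

strict = restrict_by_date(query, doc_years)
relaxed = restrict_by_date(query, doc_years, tolerance=1)
```

With `tolerance=0` only `doc-1995` survives, while `tolerance=1` also keeps `LA121194-0313`.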
27 FIRE 2008 Relevant Docs
28 FIRE 2008 Hindi→English MAP
29 Date-Based Document Restriction @ FIRE2008
- Hurt us.
- Deeper investigation needed.