Microsoft Research India's Participation in FIRE 2008

1
Microsoft Research India's Participation in FIRE 2008
  • Raghavendra Udupa
  • raghavu@microsoft.com

2
CLIR System
CLEF'07 Query #10.2452/447-AH [Hindi query text not recoverable from the transcript]
Dictionary
[Hindi query title not recoverable from the transcript]
Query Translator
Pim Fortuyn politics
Inverted Index
Document Ranker
LA Times 2002 articles
3
Domain Adaptation
Mining transliterations of OOV words
Mining Translation Lexicon from Comparable Corpora
Dictionary
Query Translator
Mining NETE Transliterations from Comparable
Corpora
Inverted Index
Document Ranker
Cross-Language Ranking Model
Document Collection
4
Mining transliterations of OOV terms (ECIR 2009)
Domain Adaptation
Mining Translation Lexicon from Comparable
Corpora (MT Summit 2007)
Dictionary
Query Translator
Mining NETE Transliterations from Comparable
Corpora (CIKM'08)
Inverted Index
Document Ranker
Cross-Language Ranking Models
Document Collection
5
Baseline Retrieval System
  • Language Model-Based Retrieval

Probabilistic Translation Lexicon
100K parallel sentences; IBM Model 3 alignment with GIZA++
J. Jagarlamudi and A. Kumaran, Cross-Lingual Information
Retrieval System for Indian Languages.
Working Notes for the CLEF 2007 Workshop.
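
As a rough illustration of how a language-model ranker can use a probabilistic translation lexicon, the sketch below scores a document by the likelihood of the source-language query, summing over each query term's candidate translations. The function name, the lexicon layout, the Jelinek-Mercer smoothing weight `lam`, and the toy romanized query term are all assumptions made for illustration; the slides do not give the system's exact formula.

```python
import math
from collections import Counter

def query_log_likelihood(query_terms, doc_terms, lexicon,
                         collection_tf, collection_len, lam=0.5):
    """Log-likelihood of a source-language query given a target-language document.

    lexicon[s][t] is an assumed P(s | t): the probability that query term s
    is a translation of document term t (e.g. estimated from word alignments).
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for s in query_terms:
        p = 0.0
        for t, p_s_given_t in lexicon.get(s, {}).items():
            p_doc = doc_tf[t] / doc_len if doc_len else 0.0
            p_coll = collection_tf.get(t, 0) / collection_len
            # Smoothed document language model (Jelinek-Mercer, assumed).
            p += p_s_given_t * (lam * p_doc + (1 - lam) * p_coll)
        if p == 0.0:
            # Out-of-vocabulary term (or no translation occurs anywhere):
            # it contributes nothing to the ranking.
            continue
        score += math.log(p)
    return score

# Toy data: "rajniti" is an illustrative romanized Hindi query term.
lexicon = {"rajniti": {"politics": 0.9, "policy": 0.1}}
doc_a = ["dutch", "politics"]
doc_b = ["football", "match"]
collection_tf = Counter(doc_a + doc_b)
collection_len = len(doc_a) + len(doc_b)

score_a = query_log_likelihood(["rajniti"], doc_a, lexicon, collection_tf, collection_len)
score_b = query_log_likelihood(["rajniti"], doc_b, lexicon, collection_tf, collection_len)
oov_only = query_log_likelihood(["unknownterm"], doc_a, lexicon, collection_tf, collection_len)
```

In this sketch a query term with no lexicon entry is simply skipped, which is why out-of-vocabulary terms cannot influence the ranking until a translation or transliteration is found for them.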
6
FIRE Fighting
  • Mining Transliterations of Out-Of-Vocabulary
    Query Terms.
  • Date-Based Document Restriction.

7
Mining Transliterations of Out-Of-Vocabulary
Query Terms
  • Raghavendra Udupa

8
OOV Query Terms
  • Many OOV query terms are NEs
  • NEs are often the focus of a query
  • NEs form an open class of terms in all languages.
  • Getting their transliterations right is extremely
    important
  • Many OOV query terms are not NEs but
    transliterations of English words.
  • E.g. Hindi transliterations of the English words seminar,
    corporation, champion, and film

9
A Hypothesis
  • The transliterations of most of the
    transliteratable OOV terms of a query can be
    found in documents relevant to the query.

10
Empirical Validation
11
A Practical Hypothesis
  • The transliterations of many of the
    transliteratable OOV terms of a query can be
    found in the top results of the CLIR system for
    the query.

12
Mining OOV Transliteration Equivalents
  • Basic Idea
  • Pair the query with each of the top N results.
  • Treat each pair as a comparable document pair.
  • Mine transliteration equivalents from the
    comparable document pairs.

"They Are Out There, If You Know Where to Look: Mining
Transliterations of OOV Query Terms for Cross-Language
Information Retrieval", ECIR 2009, Toulouse
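
The basic idea on this slide can be sketched as follows: every (query, top result) pair is treated as a comparable document pair, and each OOV query term is matched against the words of the result. This is a minimal sketch, not the method from the paper: it assumes the OOV terms have already been romanized (a hypothetical preprocessing step), and it uses `difflib.SequenceMatcher` string similarity as a stand-in for a trained transliteration similarity model. The function names and the `threshold` value are illustrative.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in transliteration similarity: character overlap ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def mine_transliterations(oov_terms, top_results, threshold=0.7):
    """Map each OOV term to its best-matching word in the top-N results.

    oov_terms:   romanized OOV query terms (romanization is assumed done).
    top_results: one token list per retrieved document.
    """
    best = {}  # term -> (word, score)
    for doc_tokens in top_results:      # one comparable pair per result
        for term in oov_terms:
            for word in set(doc_tokens):
                s = similarity(term, word)
                if s >= threshold and s > best.get(term, ("", 0.0))[1]:
                    best[term] = (word, s)
    return {term: word for term, (word, _) in best.items()}

# Toy example: "fortuin" is an illustrative romanization of an OOV query term.
top_results = [
    ["Pim", "Fortuyn", "was", "a", "Dutch", "politician"],
    ["the", "election", "result"],
]
mined = mine_transliterations(["fortuin", "xyz"], top_results)
```

The mined equivalents can then be added to the translated query and the retrieval run repeated, so that the formerly untranslatable terms take part in ranking.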
13
Long Queries MAP
14
Short Queries MAP
15
FIRE 2008 MAP
16
FIRE2008 MAP Difference (Long, official)
17
FIRE 2008 Num_Rel_Ret
18
FIRE 2008 P@10
19
Mining Transliterations @ FIRE 2008
  • Worked.

20
Date-Based Document Restriction
  • Raghavendra Udupa

21
Dates
  • Some queries contain dates
  • CLEF 2007, Topic 407: Who was the Australian
    Prime Minister in 2002?
  • CLEF 2007, Topic 411: terrorist car bomb in
    Bali, Indonesia, in 2002.
  • CLEF 2006, Topic 326: winners in any category of
    the 1995 Emmy Awards.
  • CLEF 2006, Topic 327: earthquakes in Mexico City
    in 1995.

22
Hypothesis
  • If a query contains a date then the relevant
    documents for the query are likely to be from the
    same time period.

23
Empirical Validation
  • CLEF07: LATimes 2002
  • CLEF06: GH 95, LATimes 1994

24
CLEF06 C327
  • Title: Earthquakes in Mexico City
  • Description: Find documents that provide details on the
    impact of or the damage caused by earthquakes in
    Mexico City in 1995.
  • Narrative: Relevant document should contain some information
    on earthquakes in Mexico City in 1995, such as
    their magnitude, damages caused, panic of the
    inhabitants, etc. Documents on earthquakes in
    other places in Mexico are not relevant unless
    the seismic impact was also felt in Mexico City.

25
Relevant Document
  • <DOCNO> LA121194-0313 </DOCNO>
  • <DOCID> 107228 </DOCID>
  • December 11, 1994, Sunday, Home Edition
  • A magnitude 6.3 earthquake rocked Mexico City,
    causing people to flee their homes in fear. There
    were no immediate reports of injuries or severe
    damage. The U.S. Geological Survey's National
    Earthquake Information Center in Golden, Colo.,
    said the quake's epicenter was in Petatlan in the
    southwestern state of Guerrero.

26
Date-Based Document Restriction
  • Identify dates (if any) in the query.
  • Restrict candidate documents to the set of
    documents coming from the same time period.
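
The two steps above can be sketched as below. This is a minimal sketch under stated assumptions: dates are reduced to four-digit years found with a regular expression, each document carries a known publication year, and the `window` parameter (how many years of slack to allow) is a hypothetical knob that the slides do not mention. In the toy data, the first ID is the document from the Relevant Document slide; the other two IDs are made up.

```python
import re

# Four-digit years in the 1900s or 2000s (assumed date format).
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

def query_years(query):
    """Return the set of years mentioned in the query text."""
    return {int(m.group(0)) for m in YEAR_RE.finditer(query)}

def restrict_by_date(query, docs, window=0):
    """Keep documents whose year is within `window` years of a query year.

    docs: list of (doc_id, publication_year) pairs.
    Queries without a date leave the candidate set unchanged.
    """
    years = query_years(query)
    if not years:
        return [doc_id for doc_id, _ in docs]
    return [doc_id for doc_id, year in docs
            if any(abs(year - y) <= window for y in years)]

docs = [
    ("LA121194-0313", 1994),   # from the Relevant Document slide
    ("doc-1995-a", 1995),      # hypothetical ID
    ("doc-2002-a", 2002),      # hypothetical ID
]
query = "earthquakes in Mexico City in 1995"
strict = restrict_by_date(query, docs)             # same year only
relaxed = restrict_by_date(query, docs, window=1)  # allow one year of slack
```

With `window=0` the December 1994 document is filtered out even though it was judged relevant to the 1995 topic, so how strictly "same time period" is interpreted matters.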

27
FIRE 2008 Relevant Docs
28
FIRE 2008 Hindi-English MAP
29
Date-Based Document Restriction @ FIRE 2008
  • Hurt us.
  • Deeper investigation needed.