CrossLingual IR - PowerPoint PPT Presentation

Slides: 9
Provided by: jsmc

Transcript and Presenter's Notes

Title: CrossLingual IR


1
Cross-Lingual IR
  • Salim Roukos, IBM T. J. Watson Research Center
  • 9/11/02

2
Assumptions for 2010 (Asilomar Report)
  • 1 TB Mem, 1000 TB disk, 1B users,
  • 1T devices > 1B servers
  • self-managing, very secure, and very reliable
  • Auto-x: install, heal, adaptive, auto-tuning
    wizard
  • Information discovery: metadata for describing
    schema,
  • cast operations
  • Federation across 1k, 1m databases
  • "Find the average enterprise-wide employee
    salary."
  • "Are there any really good Italian restaurants
    within 5 miles of where I live?"

3
Exploit multilingual information streams
  • Xinhua
  • SDA
  • AFP
  • AP
  • ...
  • Parallel vs comparable documents
  • Build translingual search

4
X-lingual Retrieval
  • [Diagram labels] Collections: xxx Docs, English Docs, French Docs, Chinese Docs (online)
  • [Diagram labels] MT components: E -> X MT, X -> E MT, E -> C MT
  • [Diagram labels] Query (English), IR scoring, Ranked Docs, Chinese, English for gisting
  • Caveat: Machine translation isn't perfect, and
    queries tend to be short.
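
The query-translation path in this slide (translate a short English query, then score documents in the other language) can be sketched as follows. This is a minimal illustration, not the system described in the talk: the translation table, its probabilities, the sample documents, and the function names `translate_query` and `score_document` are all hypothetical, and the scoring is a plain weighted term-frequency sum rather than any specific IR model.

```python
from collections import Counter, defaultdict

# Hypothetical toy table: P(french_term | english_term). Values are made up.
TRANSLATION_TABLE = {
    "notebook": {"portable": 0.6, "carnet": 0.4},
    "sales": {"ventes": 0.9, "soldes": 0.1},
    "share": {"part": 0.7, "action": 0.3},
}


def translate_query(query_terms, table):
    """Expand an English query into weighted target-language terms."""
    weights = defaultdict(float)
    for term in query_terms:
        # Terms missing from the table are dropped (no translation known).
        for target, prob in table.get(term, {}).items():
            weights[target] += prob
    return dict(weights)


def score_document(weighted_query, doc_terms):
    """Sum of translation weight times relative term frequency in the doc."""
    if not doc_terms:
        return 0.0
    counts = Counter(doc_terms)
    total = len(doc_terms)
    return sum(w * counts[t] / total for t, w in weighted_query.items())


french_docs = {
    "f1": "la part des ventes du portable augmente".split(),
    "f2": "le carnet de recettes de cuisine".split(),
}

weighted = translate_query(["notebook", "sales", "share"], TRANSLATION_TABLE)
ranked = sorted(
    french_docs,
    key=lambda doc_id: score_document(weighted, french_docs[doc_id]),
    reverse=True,
)
print(ranked)
```

Keeping every candidate translation with its weight, instead of picking a single best one, is one way to soften the caveat above: a short query gives little context for disambiguation, so the wrong sense ("carnet", a paper notebook) still matches a document but contributes less to its score.
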
5
From information need to query
  • Who has the largest market share for notebooks,
    IBM or Dell?
  • Q1: notebook market share
  • Q2: laptop market share IBM Dell
  • Q3: ThinkPad IBM Dell

[Diagram: information need I, query q, documents D; labels P(q | I) and p(q | D is R, C)]
6
Probabilistic Models of IR
D: document; C: doc collection; q: query
P(D is R | q, C) ∝ P(q | D is R, C) P(D is R | C)
Prior: link analysis, other?
LM: beyond 1-gram? Currently P(q | D is R) = k p(q | D) + (1-k) p(q)
  • Need training data to estimate model
  • Order 100k queries (not 1k)
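
The interpolated language model on this slide can be sketched in a few lines. The sketch below makes assumptions the slide does not state: a unigram model, interpolation applied per query term, p(q) estimated from the whole collection, a uniform prior P(D is R | C), and a hand-picked k = 0.5 (in practice k would be estimated from training queries). The names `score` and `rank` and the toy documents are illustrative only.

```python
import math
from collections import Counter


def score(query, doc, collection_counts, collection_len, k=0.5):
    """Log-likelihood of the query under k * p(t|D) + (1 - k) * p(t|C)."""
    doc_counts = Counter(doc)
    doc_len = len(doc)
    log_p = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_bg = collection_counts[term] / collection_len
        p = k * p_doc + (1 - k) * p_bg
        if p == 0.0:
            # Term unseen in the whole collection: the query has zero likelihood.
            return float("-inf")
        log_p += math.log(p)
    return log_p


def rank(query, docs, k=0.5):
    """Rank documents by query likelihood, assuming a uniform prior."""
    collection_counts = Counter(t for d in docs.values() for t in d)
    collection_len = sum(collection_counts.values())
    scored = [
        (doc_id, score(query, d, collection_counts, collection_len, k))
        for doc_id, d in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "ibm thinkpad notebook market share grows".split(),
    "d2": "dell laptop sales report".split(),
    "d3": "italian restaurants near me".split(),
}
ranking = rank("notebook market share".split(), docs)
print(ranking)
```

The background term (1 - k) p(q) is what keeps a document from scoring zero when it misses a single query word; a non-uniform prior (for example from link analysis) would be added to the log score.
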

7
Probabilistic Model of What?
P(R | a, D, q, C)
Many features in ME/MIX models: word
n-grams, synonyms, WordNet, ontologies; hidden:
topics, top N docs, ...
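
As a rough illustration of a feature-based relevance model, a binary maximum-entropy model reduces to logistic regression over features of the query-document pair. The sketch below is an assumption-laden example, not the model from the talk: the feature set (word overlap, synonym match, membership in a top-N list), the weights, and the function names are hypothetical, and the variable a from the slide is not modeled. Real weights would be fit on labeled training queries.

```python
import math

# Hypothetical hand-set weights; a real model would learn these from data.
WEIGHTS = {
    "bias": -2.0,
    "word_overlap": 3.0,
    "synonym_match": 1.5,
    "in_top_n": 1.0,
}


def extract_features(query, doc, synonyms, top_n_ids, doc_id):
    """Build a small feature vector for one (query, document) pair."""
    q_terms, d_terms = set(query), set(doc)
    n = len(q_terms)
    overlap = len(q_terms & d_terms) / n if n else 0.0
    # Count query terms that have at least one listed synonym in the document.
    syn_hits = sum(1 for t in q_terms if synonyms.get(t, set()) & d_terms)
    return {
        "bias": 1.0,
        "word_overlap": overlap,
        "synonym_match": syn_hits / n if n else 0.0,
        "in_top_n": 1.0 if doc_id in top_n_ids else 0.0,
    }


def p_relevant(features, weights):
    """P(R | features) as a logistic function of the weighted feature sum."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


synonyms = {"notebook": {"laptop"}}
query = ["notebook", "market", "share"]
good = extract_features(query, ["laptop", "market", "share", "ibm"], synonyms, {"d1"}, "d1")
bad = extract_features(query, ["italian", "restaurants"], synonyms, {"d1"}, "d2")
p_good = p_relevant(good, WEIGHTS)
p_bad = p_relevant(bad, WEIGHTS)
print(p_good, p_bad)
```

Each new evidence source (ontology matches, hidden topics) would enter as one more named feature with its own weight, which is why this family of models depends on large training sets.
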
8
Goal -- Give users info they are seeking in
context
  • Is XIR different from IR?
  • Translingual search -> improved monolingual
    retrieval?
  • Monolingual vs multilingual users
  • How are XIR and MT related?
  • How can we scale up?
  • Create training sets to foster probabilistic
    modeling research for IR (100k queries)
  • Modeling multilingual web content and link
    structure
  • Dialog Interaction
  • It's about modeling!