Support for Multilingual Information Access

1
Support for Multilingual Information Access
  • Douglas W. Oard
  • College of Information Studies and
  • Institute for Advanced Computer Studies
  • University of Maryland, College Park, MD, USA

2
Multilingual Information Access
Help people find information that is expressed
in any language
3
Outline
  • User needs
  • System design
  • User studies
  • Next steps

4
Global Languages
Source: http://www.g11n.com/faq.html
5
Global Internet User Population
[Figure: charts for 2000 and 2005; surviving labels: English, Chinese]
Source: Global Reach
6
Global Internet Hosts
Source: Network Wizards, Jan 99 Internet Domain Survey
7
European Web Size Projection
Source: Extrapolated from Grefenstette and Nioche, RIAO 2000
8
Global Internet Audio
Over 2500 Internet-accessible radio and television stations
Source: www.real.com, Mar 2001
9
Who needs Cross-Language Search?
  • Searchers who can read several languages
    • Eliminate multiple queries
    • Query in most fluent language
  • Monolingual searchers
    • If translations can be provided
    • If it suffices to know that a document exists
    • If text captions are used to search for images

10
Outline
  • User needs
  • System design
  • User studies
  • Next steps

12
Multilingual Information Access
Cross-Language Search
Query
13
The Search Process
Author
Choose Document-Language Terms
Query-Document Matching
Document
14
Interactive Search
Query Formulation
16
Synonym Selection
17
KeyWord In Context (KWIC)
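A minimal sketch of a keyword-in-context display, assuming simple whitespace tokenization. The function name `kwic`, the `width` parameter, and the sample sentence are illustrative assumptions, not taken from the presentation.

```python
def kwic(text: str, keyword: str, width: int = 3) -> list:
    """Return one context line per occurrence of `keyword` in `text`.

    Assumption: tokens are split on whitespace, and surrounding
    punctuation is ignored when comparing a token to the keyword.
    """
    tokens = text.split()
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.strip(".,;:!?\"'()").lower() == target:
            # Up to `width` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


sample = "The bank raised rates. She sat on the river bank at noon."
for line in kwic(sample, "bank", width=2):
    print(line)
```

Each output line shows one occurrence of the keyword in brackets with its neighbors, which lets a reader tell different senses of the same word apart.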
19
Outline
  • User needs
  • System design
  • User studies
  • Next steps

20
Cross-Language Evaluation Forum
  • Annual European-language retrieval evaluation
    • Documents: 8 languages
      Dutch, English, Finnish, French, German, Italian, Spanish, Swedish
    • Topics: 8 languages, plus Chinese and Japanese
  • Batch retrieval since 2000
  • Interactive track (iCLEF) started in 2001
    • 2001 focus: document selection
    • 2002 focus: query formulation

21
iCLEF 2001 Experiment Design
144 trials, in blocks of 16, at 3 sites

Participant   Task Order
1             Topic11, Topic17   Topic13, Topic29
2             Topic11, Topic17   Topic13, Topic29
3             Topic17, Topic11   Topic29, Topic13
4             Topic17, Topic11   Topic29, Topic13

Topic Key: Narrow: 11, 13; Broad: 17, 29
System Key: System A, System B
22
An Experiment Session
  • Task and system familiarization
  • 4 searches (20 minutes each)
    • Read topic description
    • Examine document translations
    • Judge as many documents as possible
    • Relevant, Somewhat relevant, Not relevant, Unsure, Not judged
    • Instructed to seek high precision
  • 8 questionnaires
    • Initial, each topic (4), each system (2), final

23
Measure of Effectiveness
  • Unbalanced F-Measure
    • P = precision
    • R = recall
    • α = 0.8
  • Favors precision over recall
  • This models an application in which:
    • Fluent translation is expensive
    • Missing some relevant documents would be okay
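
A minimal sketch of an unbalanced F-measure, assuming van Rijsbergen's weighted harmonic mean F = 1 / (α/P + (1 − α)/R) with the weight set to the slide's value of 0.8. The formula itself does not appear in the transcript, so its exact form, the function name `f_alpha`, and the sample precision and recall values are assumptions for illustration only.

```python
def f_alpha(precision: float, recall: float, alpha: float = 0.8) -> float:
    """Weighted harmonic mean of precision and recall.

    Assumed form: F = 1 / (alpha / P + (1 - alpha) / R).
    With alpha > 0.5 the score leans toward precision.
    """
    if precision == 0.0 or recall == 0.0:
        return 0.0  # harmonic mean is zero if either component is zero
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Hypothetical searchers: one precise but incomplete, one thorough but noisy.
precise_searcher = f_alpha(precision=0.9, recall=0.3)
thorough_searcher = f_alpha(precision=0.3, recall=0.9)
print(f"precise: {precise_searcher:.3f}, thorough: {thorough_searcher:.3f}")
```

Under this assumed form, the precise searcher scores higher than the thorough one even though the two (P, R) pairs are mirror images, which is what "favors precision over recall" means in practice.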

24
French Results Overview
25
English Results Overview
26
Commercial vs. Gloss Translation
  • Commercial Machine Translation (MT) is almost always better
    • Significant with one-tailed t-test (p < 0.05) over 16 trials
  • Gloss translation usually beats random selection

27
iCLEF 2002 Experiment Design
[Diagram labels: Topic Description, Query Formulation, Automatic Retrieval, Standard Ranked List, Interactive Selection]
Measures: Mean Average Precision, F (0.8)
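
Mean Average Precision is named on this slide without a definition. The following is a minimal sketch under the standard definition (average precision per topic, averaged over topics); the function names and document IDs are hypothetical, and nothing here reproduces the actual iCLEF scoring scripts.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic.

    Sums precision at each rank where a relevant document appears,
    then divides by the total number of relevant documents.
    Assumes `ranked_ids` contains no duplicates.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    scores = [average_precision(ranked, rel) for ranked, rel in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Two made-up topics with made-up document IDs.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
]
print(round(mean_average_precision(runs), 4))
```

Because the score depends only on the ranked list, it can serve as a measure of query quality before any interactive selection takes place.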
28
Maryland Experiments
  • 48 trials (12 participants)
    • Half with automatic query translation
    • Half with semi-automatic query translation
  • 4 subjects searched Der Spiegel and SDA
    • 20-60 relevant documents for 4 topics
  • 8 subjects searched Der Spiegel
    • 8-20 relevant documents for 3 topics
    • 0 relevant documents for 1 topic!

29
Some Preliminary Results
  • Average of 8 query iterations per search
  • Relatively insensitive to topic
    • Topic 4 (Hunger Strikes): 6 iterations
    • Topic 2 (Treasure Hunting): 16 iterations
  • Sometimes sensitive to system
    • Topics 1 and 2: system effect was small
    • Topics 3 and 4: fewer iterations with semi-automatic
    • Topic 3: European Campaigns against Racism

30
Subjective Evaluation
  • Semi-automatic system
    • Ability to select translations - good
  • Automatic system
    • Simpler / less user-involvement needed - good
    • Few functions / easier to learn and use - good
    • No control over translations - bad
  • Both systems
    • Highlighting keywords helps - good
    • Untranslated/poorly-translated words - bad
    • No Boolean or proximity operator - bad

31
Outline
  • User needs
  • System design
  • User studies
  • Next steps

32
Next Steps
  • Quantitative analysis from 2002 (MAP, F)
  • Iterative improvement of query quality
  • Utility of MAP as a measure of query quality?
  • Utility of semi-automatic translation
  • Accuracy of relevance judgments
  • Search strategies
  • Dependence on system
  • Dependence on topic
  • Dependence on density of relevant documents

33
An Invitation
  • Join CLEF
    • A first step: Hungarian topics
    • http://clef.iei.pi.cnr.it
  • Join iCLEF
    • Help us focus on true user needs!
    • http://terral.lsi.uned.es/iCLEF