Title: Support for Multilingual Information Access
1. Support for Multilingual Information Access
- Douglas W. Oard
- College of Information Studies and
- Institute for Advanced Computer Studies
- University of Maryland, College Park, MD, USA
2. Multilingual Information Access
Help people find information that is expressed in any language.
3. Outline
- User needs
- System design
- User studies
- Next steps
4. Global Languages
Source: http://www.g11n.com/faq.html
5. Global Internet User Population
[Chart: Internet users by language, 2000 vs. 2005; English and Chinese are labeled. Source: Global Reach]
6. Global Internet Hosts
Source: Network Wizards, Jan 99 Internet Domain Survey
7. European Web Size Projection
Source: Extrapolated from Grefenstette and Nioche, RIAO 2000
8. Global Internet Audio
Over 2,500 Internet-accessible radio and television stations
Source: www.real.com, Mar 2001
9. Who Needs Cross-Language Search?
- Searchers who can read several languages
  - Eliminate multiple queries
  - Query in the most fluent language
- Monolingual searchers
  - If translations can be provided
  - If it suffices to know that a document exists
  - If text captions are used to search for images
10. Outline
- User needs
- System design
- User studies
- Next steps
12. Multilingual Information Access
[Diagram: cross-language search, driven by a query]
13. The Search Process
[Diagram: the search process, linking the author, the choice of document-language terms, query-document matching, and the document]
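To make the process on this slide concrete, here is a minimal, hypothetical Python sketch of one common approach, dictionary-based cross-language retrieval: query terms are mapped to document-language terms with a bilingual term list, then matched against an inverted index. The term list, index entries, and document IDs are invented for illustration; this is not the system shown in the slides.

```python
from collections import defaultdict

# Toy bilingual term list: English query terms -> candidate German terms (illustrative only).
bilingual_dict = {
    "hunger": ["hunger"],
    "strike": ["streik", "schlag"],
}

# Toy inverted index: document-language term -> documents containing it (invented IDs).
index = {
    "hungerstreik": {"doc1"},
    "streik": {"doc1", "doc2"},
    "schlag": {"doc3"},
}

def cross_language_search(query_terms):
    """Map each query term to document-language terms, then rank documents
    by how many translated terms they contain (coordination-level matching)."""
    scores = defaultdict(int)
    for term in query_terms:
        for translation in bilingual_dict.get(term, []):
            for doc in index.get(translation, set()):
                scores[doc] += 1
    return sorted(scores.items(), key=lambda item: -item[1])

print(cross_language_search(["hunger", "strike"]))
```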
14. Interactive Search
[Diagram: interactive search, including query formulation]
16. Synonym Selection
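If synonym selection here means letting the searcher choose among candidate translations of each query term, as the semi-automatic system on later slides suggests, the rough sketch below shows the idea: candidates come from a toy bilingual term list and only the user's picks are kept. The candidate glosses and the selection are invented, not details from the study interface.

```python
# Hypothetical sketch of semi-automatic query translation: show candidate
# document-language translations for each query term and keep the user's picks.
candidates = {
    "strike": ["streik (labor stoppage)", "schlag (a blow)"],
    "hunger": ["hunger"],
}

def select_translations(query_terms, chosen):
    """Keep, for each query term, only the candidate translations the user chose;
    if no choice was recorded, keep all candidates."""
    selected = {}
    for term in query_terms:
        options = candidates.get(term, [])
        selected[term] = [t for t in options if t in chosen.get(term, options)]
    return selected

# The searcher keeps "streik" for "strike" and accepts the only option for "hunger".
print(select_translations(["strike", "hunger"],
                          {"strike": ["streik (labor stoppage)"]}))
```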
17. KeyWord In Context (KWIC)
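As a rough illustration of what a KWIC display provides, the sketch below pulls each occurrence of a keyword together with a fixed window of surrounding text. The window size, formatting, and sample sentence are arbitrary choices for illustration, not details from the talk.

```python
import re

def kwic(text, keyword, window=40):
    """Return keyword-in-context snippets: each match of the keyword
    with up to `window` characters of context on either side."""
    snippets = []
    for match in re.finditer(re.escape(keyword), text, re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(f"...{text[start:end]}...")
    return snippets

# Example: show how a translated query term might be displayed in context.
doc = "The hunger strike continued as prisoners demanded better conditions."
print(kwic(doc, "hunger strike"))
```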
19. Outline
- User needs
- System design
- User studies
- Next steps
20. Cross-Language Evaluation Forum
- Annual European-language retrieval evaluation
- Documents: 8 languages
  - Dutch, English, Finnish, French, German, Italian, Spanish, Swedish
- Topics: 8 languages, plus Chinese and Japanese
- Batch retrieval since 2000
- Interactive track (iCLEF) started in 2001
  - 2001 focus: document selection
  - 2002 focus: query formulation
21. iCLEF 2001 Experiment Design
144 trials, in blocks of 16, at 3 sites

Participant | Task Order
1 | Topic 11, Topic 17 / Topic 13, Topic 29
2 | Topic 11, Topic 17 / Topic 13, Topic 29
3 | Topic 17, Topic 11 / Topic 29, Topic 13
4 | Topic 17, Topic 11 / Topic 29, Topic 13

Topic Key: Narrow = 11, 13; Broad = 17, 29
System Key: System A, System B
22. An Experiment Session
- Task and system familiarization
- 4 searches (20 minutes each)
  - Read topic description
  - Examine document translations
  - Judge as many documents as possible
    - Relevant, Somewhat relevant, Not relevant, Unsure, Not judged
  - Instructed to seek high precision
- 8 questionnaires
  - Initial, each topic (4), each system (2), final
23. Measure of Effectiveness
- Unbalanced F-Measure
  - P: precision
  - R: recall
  - α = 0.8
  - Favors precision over recall
- This models an application in which
  - Fluent translation is expensive
  - Missing some relevant documents would be okay
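The slide lists only the components of the measure. As a hedged illustration, here is a minimal Python sketch assuming the standard van Rijsbergen unbalanced F-measure, F_α = 1 / (α/P + (1 − α)/R); with α = 0.8 this weights precision more heavily than recall, consistent with the slide. The example values are made up.

```python
def f_alpha(precision, recall, alpha=0.8):
    """Unbalanced F-measure, assumed to be the van Rijsbergen form:
    F_alpha = 1 / (alpha/P + (1 - alpha)/R).
    With alpha = 0.8, precision is weighted more heavily than recall."""
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)

# Example: high precision with modest recall still scores well under alpha = 0.8.
print(f_alpha(precision=0.9, recall=0.4))   # about 0.72
```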
24. French Results Overview
25. English Results Overview
26. Commercial vs. Gloss Translation
- Commercial Machine Translation (MT) is almost always better
  - Significant with a one-tailed t-test (p < 0.05) over 16 trials
- Gloss translation usually beats random selection
27. iCLEF 2002 Experiment Design
[Diagram: topic description, query formulation, automatic retrieval, standard ranked list, interactive selection; measures: mean average precision and F (α = 0.8)]
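Since this design scores the automatic retrieval stage with mean average precision (and the interactive stage with F at α = 0.8), here is a minimal sketch of how MAP is conventionally computed. The document IDs and relevance sets are toy values, not CLEF data.

```python
def average_precision(ranked_docs, relevant):
    """Average precision for one topic: the mean of the precision values
    at the ranks where relevant documents are retrieved."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over several topics; each run is (ranked_docs, relevant_set)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy example with two topics (hypothetical document IDs).
runs = [(["d1", "d2", "d3"], {"d1", "d3"}),
        (["d4", "d5"], {"d5"})]
print(mean_average_precision(runs))   # about 0.67
```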
28. Maryland Experiments
- 48 trials (12 participants)
  - Half with automatic query translation
  - Half with semi-automatic query translation
- 4 subjects searched Der Spiegel and SDA
  - 20-60 relevant documents for 4 topics
- 8 subjects searched Der Spiegel
  - 8-20 relevant documents for 3 topics
  - 0 relevant documents for 1 topic!
29. Some Preliminary Results
- Average of 8 query iterations per search
- Relatively insensitive to topic
  - Topic 4 (Hunger Strikes): 6 iterations
  - Topic 2 (Treasure Hunting): 16 iterations
- Sometimes sensitive to system
  - Topics 1 and 2: system effect was small
  - Topics 3 and 4: fewer iterations with semi-automatic
    - Topic 3: European Campaigns against Racism
30. Subjective Evaluation
- Semi-automatic system
  - Ability to select translations: good
- Automatic system
  - Simpler / less user involvement needed: good
  - Few functions / easier to learn and use: good
  - No control over translations: bad
- Both systems
  - Highlighting keywords helps: good
  - Untranslated / poorly-translated words: bad
  - No Boolean or proximity operator: bad
31. Outline
- User needs
- System design
- User studies
- Next steps
32. Next Steps
- Quantitative analysis from 2002 (MAP, F)
  - Iterative improvement of query quality
  - Utility of MAP as a measure of query quality?
  - Utility of semi-automatic translation
  - Accuracy of relevance judgments
- Search strategies
  - Dependence on system
  - Dependence on topic
  - Dependence on density of relevant documents
33. An Invitation
- Join CLEF
  - A first step: Hungarian topics
  - http://clef.iei.pi.cnr.it
- Join iCLEF
  - Help us focus on true user needs!
  - http://terral.lsi.uned.es/iCLEF