Title: Translation Enhancement: a New Relevance Feedback Method for Cross-Language Information Access
1. Translation Enhancement: a New Relevance Feedback Method for Cross-Language Information Access
Daqing He University of Pittsburgh dah44_at_pitt.edu
Dan Wu Wuhan University woodan_at_whu.edu.cn
2. Outline
- Motivations
- Translation Enhancement
- Experiments and Results
- Conclusions
3. Query Translation Based CLIR in TREC-like Environments
[Figure: query translation step in the CLIR pipeline]
4. Usages of RF Information
- Query expansion (QE) methods perform QE before query translation (pre-translation QE) and/or after query translation (post-translation QE)
- Post-translation QE, or the combination of the two, performed the best [Ballesteros & Croft 97; McNamee & Mayfield 02]
[Figure: RF loop — Query, Pre-translation Query Expansion (via Search on a Source Language Collection), Query Translation, Post-translation Query Expansion, Search on Target Language Collection, Relevance Feedback]
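The pre-/post-translation QE pipeline on this slide can be sketched as follows. The toy dictionary and expansion lists are purely illustrative, not the resources used in the experiments:

```python
# Illustrative sketch of the slide's pipeline: pre-translation QE on the
# source query, dictionary-based query translation, then post-translation QE
# on the target query. All data below is toy data, not the paper's resources.

TOY_DICT = {"election": ["选举"], "president": ["总统"]}
PRE_EXPANSIONS = {"election": ["vote"]}    # mined from a source-language collection
POST_EXPANSIONS = {"选举": ["投票"]}         # mined from the target collection

def pre_translation_qe(terms):
    """Expand the source-language query before translation."""
    return terms + [t for w in terms for t in PRE_EXPANSIONS.get(w, [])]

def translate(terms):
    """Translate each source term via the bilingual dictionary."""
    return [f for e in terms for f in TOY_DICT.get(e, [])]

def post_translation_qe(terms):
    """Expand the target-language query after translation."""
    return terms + [t for w in terms for t in POST_EXPANSIONS.get(w, [])]

query = ["election"]
target_query = post_translation_qe(translate(pre_translation_qe(query)))
# target_query is the expanded Chinese query sent to the target collection
```

Terms introduced by pre-translation QE that have no dictionary entry (here, "vote") are dropped at the translation step, which is one reason post-translation QE tends to help more.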
5. Translations in Query Translation based CLIR
[Figure: query translation and result translation flows]
6. What Can We Obtain From RF?
[Figure: query translation maps the query to f1, f2, …, fn; result translation maps returned documents to e1, e2, …, em; relevance feedback yields translation pairs e1 <-> f1, …, en <-> fn]
7. Usages of RF Information - II
- Query expansion (QE) methods expand pre-translation and/or post-translation queries
- Translation Enhancement (TE) improves query translation resources using the obtained relevant translation relationships
[Figure: RF loop — Query, Pre-translation Query Expansion (via Search on a Source Language Collection), Query Translation, Post-translation Query Expansion, Search on Target Language Collection, Relevance Feedback]
8. Benefits
- By applying extracted translation relationships back to query translation:
  - Makes query translation and result document translation consistent with each other
  - Helps future pre-translation query expansion
  - Tailors the query translation resources toward the user's current search
- New translation alternatives can be introduced by TE
  - Potentially resolves some out-of-vocabulary (OOV) terms
- TE does not replace QE
  - They work at different steps of RF in CLIA
  - TE can help pre-translation QE
  - Maybe they can be combined?
9. Contributions of This Work
- Many related works apply extracted translation relationships to improve CLIR effectiveness:
  - [Nie, Simard, Isabelle & Durand 99] used web-mined parallel texts for CLIR
  - [Xu, Weischedel & Nguyen 01] estimated translation probabilities based on a parallel corpus
  - [Lavrenko, Choquette & Croft 02] described a cross-lingual relevance model that uses a parallel corpus as one resource for translation
- Our contributions:
  - Studying methods for extracting translation relationships
  - Using translation relationships extracted from relevant returned document pairs to enhance query translation directly
  - Exploring the combination of TE and QE
10. Research Questions on TE
- How to obtain relevant translation relationships?
- How to enhance query translation with the relevant translation relationships?
- Does it make sense to integrate TE with other RF methods?
11. Obtain Translation Relationships
- Borrow ideas from mining parallel corpora
- Establish alignment at a certain level
  - Best: word alignment between docs and their translations
  - Minimum: sentence alignment
- When word alignment is available:
  - Translations based on Word Alignment (TWA): train GIZA to obtain a word alignment model, and get word alignments from the model
- When only sentence alignment is available:
  - Keep All Translations (KAT): keep all the translation relationships of the query terms identified in the sentence pairs in relevant docs
  - Keep One Best Translation (K1T): based on KAT, but keep the translation with the highest translation probability in the dictionary
  - Keep Most Frequent Translation (KFT): based on KAT, but keep the translation with the highest frequency in the relevant docs
12. Obtain Translation Relationships without Word Alignment

Dictionary: E1 → F11, …, F1m1; E2 → F21, …, F2m2; …; En → Fn1, …, Fnmn
Query: E1, E2, …, En
Relevant doc D1: E1 E2 E2 E1 E1 E2 E1 E1
Translation of D1: F11 F21 F22 F11 F12 F22 F11 F11
Extracted pairs: E1 → F11 (D1, 4); E1 → F12 (D1, 1); E2 → F21 (D1, 1); E2 → F22 (D1, 2)

Stemming and a back-off strategy are used to find more instances of query terms and their translations inside the relevant docs and their translated docs.

KAT: E1 → F11 (D1, 4); E1 → F12 (D1, 1); E2 → F21 (D1, 1); E2 → F22 (D1, 2); E1 → F11 (D2, 4); …
K1T: E1 → F11 (D1, 4); E2 → F21 (D1, 1)
KFT: E1 → F11 (D1, 4); E2 → F22 (D1, 2)
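The three sentence-alignment heuristics above can be sketched as follows. The data structures and function names are illustrative, and this sketch counts a co-occurrence once per aligned sentence pair rather than per token occurrence:

```python
# Sketch of the KAT / K1T / KFT extraction heuristics from the slides.
# `dictionary` maps each source term to {translation: probability};
# `sent_pairs` are aligned (source_tokens, target_tokens) sentence pairs
# drawn from relevant documents and their translations.

from collections import Counter

def extract_kat(query_terms, sent_pairs, dictionary):
    """KAT: keep every dictionary translation of a query term that
    co-occurs with it in an aligned sentence pair, with its count."""
    counts = Counter()
    for src_sent, tgt_sent in sent_pairs:
        for e in query_terms:
            if e not in src_sent:
                continue
            for f in dictionary.get(e, {}):
                if f in tgt_sent:
                    counts[(e, f)] += 1
    return counts

def extract_k1t(query_terms, sent_pairs, dictionary):
    """K1T: from the KAT pairs, keep for each term the translation
    with the highest dictionary probability."""
    best = {}
    for (e, f), n in extract_kat(query_terms, sent_pairs, dictionary).items():
        if e not in best or dictionary[e][f] > dictionary[e][best[e][0]]:
            best[e] = (f, n)
    return best

def extract_kft(query_terms, sent_pairs, dictionary):
    """KFT: from the KAT pairs, keep for each term the translation
    that co-occurs most frequently in the relevant documents."""
    best = {}
    for (e, f), n in extract_kat(query_terms, sent_pairs, dictionary).items():
        if e not in best or n > best[e][1]:
            best[e] = (f, n)
    return best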
13. Convert Extracted Relationships into Translation Probability
- P_{i,j} = P(j is a translation of i | j appears in Rel): the probability of translation alternative j being the translation of term i, given that j appears in the relevant document set
- tf_{j,k}: the frequency with which j is extracted as the translation of i from relevant document k
- n: the number of relevant documents
- m_i: the number of translation alternatives of term i
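The slide's equation image did not survive text extraction. Given the definitions above, a plausible reconstruction (an assumption on my part, not taken from the source) is a relative-frequency estimate:

```latex
P_{i,j} \;=\; P(j \mid i,\; j \in \mathrm{Rel})
        \;=\; \frac{\sum_{k=1}^{n} tf_{j,k}}
                   {\sum_{l=1}^{m_i} \sum_{k=1}^{n} tf_{l,k}}
```

i.e., the number of times j was extracted as a translation of i across all n relevant documents, normalized over all m_i translation alternatives of i.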
14. Enhanced Translation Probability
- Combine the translation probabilities obtained from the relevant document set with those in the original dictionary
- λ: a parameter that adjusts the relative weight of the translation probabilities from the relevant document set versus the general dictionary
- The combined probabilities are then normalized
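A minimal sketch of this interpolation step, assuming a simple linear mixture in which `lam` stands in for the slide's λ (the function name and data shapes are illustrative):

```python
# Sketch of the enhancement step: interpolate feedback-derived translation
# probabilities with the original dictionary probabilities, then renormalize
# so the enhanced distribution over alternatives sums to 1.

def enhance(dict_probs, rel_probs, lam=0.5):
    """lam * P_relevant + (1 - lam) * P_dictionary, renormalized.

    dict_probs / rel_probs: {translation: probability} for one source term.
    """
    alts = set(dict_probs) | set(rel_probs)
    combined = {f: lam * rel_probs.get(f, 0.0) + (1 - lam) * dict_probs.get(f, 0.0)
                for f in alts}
    total = sum(combined.values())
    return {f: p / total for f, p in combined.items()}
```

With lam = 0 the dictionary is unchanged; with lam = 1 only the feedback evidence survives. Translations seen in relevant documents gain mass at the expense of unseen alternatives.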
15. Experiment Goals and Objectives
- Is Translation Enhancement an effective RF method?
  - Test whether translation enhancement methods can improve CLIA in blind RF
- Can Translation Enhancement be combined with other RF methods?
  - Test whether combining translation enhancement with query expansion can improve CLIA in blind RF
- Is Translation Enhancement effective in a real interactive search environment?
  - Test whether translation enhancement can improve CLIA in interactive RF (not discussed in this talk)
16. Experiment Resources
- English-to-Chinese CLIR
  - English queries and Chinese documents
- Preprocessing tools
  - Stanford Chinese segmentation tool for Chinese documents
  - Porter stemmer for English queries and documents
  - An English and a Chinese stop word list
- Collections
  - TDT4 and TDT5 Chinese collection (83,627 documents)
  - TDT4 and TDT5 English MT collection (83,627 documents)
  - TDT4 and TDT5 English collection (306,498 documents)
- Translation resources
  - An English-Chinese bilingual lexicon with translation probabilities obtained from a large parallel corpus [Wang & Oard 06]
  - GIZA machine translation toolkit
- Indri 2.4 search engine
- Evaluation metric (TREC evaluation)
  - MAP: Mean Average Precision
17. Query Types
- Topics
  - 44 TDT4 and TDT5 English topics converted into TREC format
  - All topics manually translated into Chinese
- Query (TREC format)
  - Title (short "T" queries)
  - Title + Description (medium "TD" queries)
  - Title + Description + Narrative (long "TDN" queries)
18. Baselines
- Monolingual baseline
  - Use Chinese queries to search the Chinese collection
- Lower cross-language baseline
  - Use English queries to search the Chinese collection cross-lingually without any performance enhancement technique
  - Cumulative probability threshold (CPT) swept from 0.0 to 1.0 in increments of 0.1; the results below show the run with the best MAP
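The CPT pruning mentioned above can be sketched as follows (a minimal illustration under my reading of the slide; the function name is not from the source): keep the most probable translations of a term until their cumulative probability reaches the threshold.

```python
# Sketch of cumulative-probability-threshold (CPT) pruning of translation
# alternatives. A threshold of 0.0 keeps only the single best translation;
# a threshold of 1.0 keeps them all.

def cpt_select(trans_probs, threshold):
    """trans_probs: {translation: probability} for one source term.
    Returns translations kept, most probable first."""
    kept, cum = [], 0.0
    for f, p in sorted(trans_probs.items(), key=lambda x: -x[1]):
        kept.append(f)
        cum += p
        if cum >= threshold:
            break
    return kept
```

Sweeping the threshold from 0.0 to 1.0 in 0.1 steps, as on this slide, trades translation precision against coverage.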
19. Baselines - II
- Higher cross-lingual baseline
  - Same as the lower CL baseline, but with query expansion
  - Uses the default Indri pseudo-RF mechanism
    - Uses the top 20 documents of the result ranked list
    - The top 20 terms are used for expansion
    - The relative weight between the original query and expanded terms is tuned for each specific QE method
  - Pre-translation query expansion
  - Post-translation query expansion
  - Combined pre- and post-translation query expansion
20. TE Methods vs. Baselines
- All four TE methods performed better than the CL lower baseline
  - TWA improved the most; KAT improved the least
- TWA improved significantly for all three query types
- KFT improved significantly for the T and TD query types
- But only TWA achieved 93% of the monolingual baseline at TDN
21. TE Methods vs. QE Methods
- Pre-QE performed the worst among the QE methods
- All TE methods are at least comparable to the best QE method
- TWA outperforms the best QE method at TD and TDN
  - Significant at TDN
22. TE and QE Combination
- Combining TWA and Post-QE
  - Comparable to state-of-the-art CLIR performance
  - Significant improvement over the single runs in almost all query types
- (** p < 0.01; * 0.01 < p < 0.05)
23. TE in Resolving OOV Terms
- Through word alignment, some OOV terms can be resolved with high-quality translations
- 11 OOV terms found their translations through TWA; only 2 of them were wrong (marked in the table)
24. Conclusion
- Translation enhancement can improve CLIA in pseudo RF
  - The translation enhancement approach performs even better in processes where humans are involved (discussed in the paper)
- Translation enhancement can be combined with QE
  - TE and QE work on different parts of the RF process
  - Their combination significantly improves CLIR performance
- Translation enhancement can help resolve out-of-vocabulary terms in query translation
  - The quality of OOV resolution is reasonably high
- Future work
  - Extract translation relationships from statistical MT output, so that no word alignment is needed
  - Better integration of TE and QE
  - Interactive translation enhancement
25. Thank you!