Towards Interactive and Automatic Refinement of Translation Rules

1
Towards Interactive and Automatic Refinement of
Translation Rules

Ariadna Font Llitjós
PhD Thesis Proposal
Jaime Carbonell (advisor)
Alon Lavie (co-advisor)
Lori Levin
Bonnie Dorr (Univ. Maryland)
5 November 2004

2
Outline

Introduction
Thesis statement and scope
Preliminary Research
Interactive elicitation of error information
A framework for automatic rule adaptation
Proposed Research
Contributions and Thesis Timeline

3
Machine Translation (MT)

Source Language (SL) sentence:
Gaudí was a great artist
In Spanish, it translates as:
Gaudí era un gran artista
MT system outputs:
- Gaudí estaba un artista grande
- Gaudí era un artista grande

4
Spanish Adjectives
Automatic Rule Adaptation
Completed Work
General order: grande → big in size
Exception: gran → exceptional
5
Commercial and Online Systems

Correct translation: Gaudí era un gran artista
Systran, Babelfish (Altavista), WorldLingo, Translated.net:
Gaudí era ∅ gran artista
ImTranslation:
El Gaudí era un gran artista
1-800-Translate:
Gaudí era un fenomenal artista

Current solutions
- Manual post-editing [Allen, 2003]
- Automated post-editing module (APE) [Allen & Hogan, 2000]

7
Drawbacks of Current Methods

Manual post-editing → corrections do not generalize:
- Gaudí era un artista grande
- Juan es un amigo grande (Juan is a great friend)
- Era una oportunidad grande (It was a great opportunity)
APE → humans need to predict all the errors ahead of time and code the post-editing rules; new error → ?

8
My Solution

Automate post-editing efforts by feeding
them back into the MT system.
Possible alternatives
Automatic learning of post-editing rules
system independent
- several thousands of sentences might need to
be corrected for the same error
Automatic refinement of translation rules
attacks the core of the problem
for transfer-based MT systems (need rules to fix!)

9
Related Work
[Diagram relating Machine Translation, Rule Adaptation, Post-editing, and My Thesis]
Corston-Oliver & Gammon, 2003; Imamura et al., 2003; Menezes & Richardson, 2001
Brill, 1993; Gavaldà, 2000
Callison-Burch, 2004; Su et al., 1995
My Thesis:
- No pre-existing training data required
- No human reference translations required
- Uses non-expert user feedback
10
Resource-poor Scenarios (AVENUE)

Lack of electronic parallel data
Lack of manual grammar (or very small initial
grammar)
→ Need to validate elicitation corpus and automatically learned translation rules
Why bother?
Indigenous communities have difficulty accessing crucial information that directly affects their lives (such as land laws, plagues, health warnings, etc.)
Preservation of their language and culture

Mapudungun, Quechua, Aymara
11
How is MT possible for resource-poor languages?
Bilingual speakers
12
AVENUE Project Overview
13
My Thesis
[Diagram: AVENUE architecture. Stages: Elicitation, Rule Learning, Run-Time System, Rule Refinement, Morphology. Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Learning Module, Handcrafted rules, Transfer Rules, Lexical Resources, Morphological analyzer, Run Time Transfer System, Lattice, Translation Correction Tool, Rule Refinement Module]
14
Recycle corrections of Machine Translation output
back into the system by refining and expanding
existing translation rules
15
Thesis Statement

- Given a rule-based Transfer MT system, we
can extract useful information from non-expert
bilingual speakers about the corrections required
to make MT output acceptable.
- We can automatically refine and expand
translation rules, given corrected and aligned
translation pairs and some error information, to
improve coverage and overall MT quality.

16
Assumptions

No parallel training data available
No human reference translations available
The SL sentence needs to be fully parsed by the
translation grammar.
Bilingual speakers can give enough information
about the MT errors.

17
Scope

Types of errors that:
Focus 1: can be refined fully automatically just by using correction information.
Focus 2: can be refined fully automatically using correction and error information.
Focus 3: require a reasonable amount of further user interaction and can be solved by available correction and error information.

18
Technical Challenges
Automatic Evaluation of Refinement process
Elicit minimal MT information from non-expert
users
19
Preliminary Work

Interactive elicitation of error information
A framework for automatic rule adaptation

20
Interactive Elicitation of MT Errors

Goal
Simplify MT correction task maximally
Challenges
Find appropriate level of granularity for MT error classification
Design a user-friendly graphical user interface with:
- SL sentence (e.g. I see them)
- TL sentence (e.g. Yo veo los)
- word-to-word alignments (I-yo, see-veo, them-los)
- (context)

21
MT Error Typology for RR (simplified)
Completed Work
Interactive elicitation of error information

Missing word
Extra word
Wrong word order: local vs. long distance; word vs. phrase; word change
Incorrect word: sense; form; selectional restrictions; idiom
Wrong agreement: missing constraint; extra constraint
22
TCTool (Demo)
Interactive elicitation of error information

Actions:
Add a word
Delete a word
Modify a word
Change word order
23
1st Eng2Spa User Study
Interactive elicitation of error information
Completed Work

LREC 2004
MT error classification → 9 linguistically-motivated classes:
word order, sense, agreement error (number, person, gender, tense), form, incorrect word and no translation

                       precision (%)   recall (%)   F1 (%)
error detection        90              89           89
error classification   72              71           72
24
Automatic Rule Refinement Framework
Completed Work
Automatic Rule Adaptation

Find best RR operations given a
Grammar (G),
Lexicon (L),
(Set of) Source Language sentence(s) (SL),
(Set of) Target Language sentence(s) (TL),
Its Parse tree (P), and
Minimal correction of TL (TL')
such that $TQ_2 > TQ_1$
Which can also be expressed as
$\max \; TQ(TL \mid TL', P, SL, RR(G, L))$

25
Types of Refinement Operations
Automatic Rule Adaptation
Completed Work

1. Refine a translation rule
R0 → R1 (R0 modified, either made more specific or more general)

R0: a nice house → una casa bonito
R1 (N gender = ADJ gender): a nice house → una casa bonita
26
Types of Refinement Operations (2)
Automatic Rule Adaptation
Completed Work

2. Bifurcate a translation rule
R0 → R0 (same, general rule)
   + R1 (R0 modified, specific rule)

R0: a nice house → una casa bonita
R1: a great artist → un gran artista
27
Formalizing Error Information
Automatic Rule Adaptation
Completed Work

Wi = error
Wi' = correction
Wc = clue word
Example:
SL: the red car; TL: el auto roja → TL': el auto rojo
Wi = roja, Wi' = rojo, Wc = auto
(Wi' and Wc need to agree)
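
A minimal sketch of how the three variables might be read off a single-word correction. It assumes the MT output and its correction are whitespace-tokenized and of equal length, and that the clue word is supplied by the user. The `ErrorInfo` class and the `from_modification` helper are hypothetical names, not part of the system described in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorInfo:
    wi: str                    # Wi: the erroneous word in the MT output
    wi_prime: str              # Wi': the user's correction
    wc: Optional[str] = None   # Wc: clue word, if the user identified one


def from_modification(tl: str, tl_corrected: str,
                      clue: Optional[str] = None) -> ErrorInfo:
    """Build ErrorInfo for a one-word 'modify' correction (hypothetical helper)."""
    before, after = tl.split(), tl_corrected.split()
    if len(before) != len(after):
        raise ValueError("this sketch only handles same-length corrections")
    changed = [(b, a) for b, a in zip(before, after) if b != a]
    if len(changed) != 1:
        raise ValueError("expected exactly one modified word")
    wi, wi_prime = changed[0]
    return ErrorInfo(wi, wi_prime, clue)


info = from_modification("el auto roja", "el auto rojo", clue="auto")
print(info)
```

Word additions, deletions, and reorderings would need the word alignments as well, which this sketch leaves out.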
28
Triggering Feature Detection
Automatic Rule Adaptation
Completed Work

Comparison at the feature level to detect triggering feature(s)
Delta function: δ(Wi, Wi')
Examples:
δ(rojo, roja) = {gender}
δ(comíamos, comía) = {person, number}
δ(mujer, guitarra) = ∅
If the δ set is empty, need to postulate a new binary feature
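
The feature-level comparison can be sketched as a set difference over feature structures. The toy feature values below are assumptions made for illustration (for instance, comía is treated as third person singular), and `delta` and `triggering_features` are hypothetical names.

```python
# Toy feature structures; the values are illustrative assumptions.
FEATURES = {
    "rojo":     {"pos": "adj", "gender": "masc", "number": "sg"},
    "roja":     {"pos": "adj", "gender": "fem", "number": "sg"},
    "comíamos": {"pos": "v", "person": 1, "number": "pl", "tense": "imperfect"},
    "comía":    {"pos": "v", "person": 3, "number": "sg", "tense": "imperfect"},
    "mujer":    {"pos": "n", "gender": "fem", "number": "sg"},
    "guitarra": {"pos": "n", "gender": "fem", "number": "sg"},
}


def delta(feats_a: dict, feats_b: dict) -> set:
    """Names of the shared features whose values differ between two words."""
    shared = feats_a.keys() & feats_b.keys()
    return {name for name in shared if feats_a[name] != feats_b[name]}


def triggering_features(word_a: str, word_b: str, next_id: int = 1) -> set:
    """Return the delta set, or a freshly postulated binary feature if it is empty."""
    diff = delta(FEATURES[word_a], FEATURES[word_b])
    return diff if diff else {f"feat{next_id}"}


print(delta(FEATURES["rojo"], FEATURES["roja"]))
print(delta(FEATURES["comíamos"], FEATURES["comía"]))
print(triggering_features("mujer", "guitarra"))
```

When no existing feature distinguishes the two words, the sketch falls back to a new feature name, mirroring the "postulate a new binary feature" step.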

29
Deciding on the Refinement Op
Automatic Rule Adaptation
Completed Work

Given:
- Action performed by the user (add, delete, modify, change word order), and
- Error information available (clue word, word alignments, etc.)
→ Refinement Action

30
Rule Refinement Operations
31
Proposed Work

- Batch and Interactive mode
User studies
Evaluation

32
Rule Refinement Example
Automatic Rule Adaptation

Change word order
SL: Gaudí was a great artist
TL: Gaudí era un artista grande
Corrected TL (TL'): Gaudí era un gran artista

33
Automatic Rule Adaptation
1. Error Information Elicitation
Refinement Operation Typology
34
2. Variable Instantiation from Log File
Automatic Rule Adaptation

Correcting Actions
1. Word order change (artista grande → grande artista)
Wi = grande
2. Edited grande into gran
Wi' = gran
identified artista as clue word → Wc = artista
In this case, even if the user had not identified Wc, the refinement process would have been the same

35
3. Retrieve Relevant Lexical Entries
Automatic Rule Adaptation

No lexical entry for great → gran
Duplicate lexical entry great-grande and change TL side:
ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc))
(Morphological analyzer: grande / gran)

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc))
36
4. Finding Triggering Feature(s)
Automatic Rule Adaptation

Feature δ function: δ(Wi, Wi') = ∅
→ need to postulate a new binary feature: feat1
5. Blame assignment
tree: <((S,1 (NP,2 (N,51 "GAUDI") )
(VP,3 (VB,2 (AUX,172 "ERA") )
(NP,8 (DET,03 "UN")
(N,45 "ARTISTA")
(ADJ,54 "GRANDE")
) ) ) )>
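
One way to read blame assignment off such a tree is to parse the bracketed string and walk down to the erroneous word, collecting the node labels on the way. The sketch below assumes the tree is a plain s-expression whose labels have the form CATEGORY,id and whose leaves are quoted words; `parse_tree` and `path_to_word` are hypothetical helpers, not the transfer engine's own API.

```python
import re

TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

TREE = '''((S,1 (NP,2 (N,51 "GAUDI") )
(VP,3 (VB,2 (AUX,172 "ERA") )
(NP,8 (DET,03 "UN") (N,45 "ARTISTA") (ADJ,54 "GRANDE") ) ) ) )'''


def parse_tree(text):
    """Parse a bracketed tree into nested lists of labels and quoted words."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing parenthesis
            return items
        return tok

    return read()


def path_to_word(node, word):
    """Labels from the root down to the leaf holding `word`, or None if absent."""
    if isinstance(node, str):
        return [] if node == f'"{word}"' else None
    head = node[0] if node and isinstance(node[0], str) else None
    children = node[1:] if head else node
    for child in children:
        sub = path_to_word(child, word)
        if sub is not None:
            return ([head] if head else []) + sub
    return None


print(path_to_word(parse_tree(TREE), "GRANDE"))
```

For the erroneous word GRANDE, the path passes through NP,8, which is the rule instantiated in the next step.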

37
6. Variable Instantiation in the Rules
Automatic Rule Adaptation

Wi = grande → POSi = ADJ = Y3, y3
Wc = artista → POSc = N = Y2, y2
NP,8
NP::NP : [DET ADJ N] -> [DET N ADJ]
( (X1::Y1) (X2::Y3) (X3::Y2)
((x0 def) = (x1 def))
(x0 = x3)
((y1 agr) = (y2 agr)) ; det-noun agreement
((y3 agr) = (y2 agr)) ; adj-noun agreement
(y2 = x3) )

(R0)
38
7. Refining Rules
Automatic Rule Adaptation

NP,8
NP::NP : [DET ADJ N] -> [DET ADJ N]
( (X1::Y1) (X2::Y2) (X3::Y3)
((x0 def) = (x1 def))
(x0 = x3)
((y1 agr) = (y3 agr)) ; det-noun agreement
((y2 agr) = (y3 agr)) ; adj-noun agreement
(y2 = x3)
((y2 feat1) =c + ))

(R1)
39
8. Refining Lexical Entries
Automatic Rule Adaptation

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = -))
ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = +))

40
Done? Not yet
Automatic Rule Adaptation

NP,8 (R0): ADJ(grande) [feat1 = -]
NP,8 (R1) [feat1 =c +]: ADJ(gran) [feat1 = +]
Need to restrict application of general rule (R0) to just post-nominal ADJ

Candidate translations: un artista grande, un artista gran, un gran artista, un grande artista
41
Add Blocking Constraint
Automatic Rule Adaptation

NP,8 (R0) [feat1 = -]: ADJ(grande) [feat1 = -]
NP,8 (R1) [feat1 =c +]: ADJ(gran) [feat1 = +]
Can we also eliminate incorrect translations automatically?

Candidate translations: un artista grande, un artista gran, un gran artista, un grande artista
42
Making the grammar tighter
Automatic Rule Adaptation

If Wc = artista:
Add feat1 = + to N(artista)
Add agreement constraint to NP,8 (R0) between N and ADJ: ((N feat1) = (ADJ feat1))

Candidate translations: un artista grande, un artista gran, un gran artista, un grande artista
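
The effect of the successive constraints on the candidate list can be simulated with a small sketch. It reflects one reading of the slides: R1 requires feat1 to be present and positive (=c), the blocking constraint on R0 only accepts a negative feat1, and the tightening step gives N(artista) a positive feat1 that must unify with the adjective's. The function names and the flag-based interface are hypothetical.

```python
# Refined lexical entries for "great": feat1 separates gran from grande.
LEXICON = {
    "grande": {"feat1": "-"},
    "gran": {"feat1": "+"},
}


def unifies(value, expected):
    """'=' constraint: passes when either side is unset or both values match."""
    return value is None or expected is None or value == expected


def candidates(block_r0=False, agree_r0=False, noun_feat1=None):
    """NP translations of 'a great artist' licensed under the given constraints."""
    out = set()
    for adj, feats in LEXICON.items():
        feat1 = feats.get("feat1")
        # R0 (general rule): DET N ADJ
        r0_ok = True
        if block_r0:   # blocking constraint: (ADJ feat1) = -
            r0_ok = r0_ok and unifies(feat1, "-")
        if agree_r0:   # tightening: (N feat1) = (ADJ feat1)
            r0_ok = r0_ok and unifies(noun_feat1, feat1)
        if r0_ok:
            out.add(f"un artista {adj}")
        # R1 (specific rule): DET ADJ N with (ADJ feat1) =c +
        if feat1 == "+":
            out.add(f"un {adj} artista")
    return out


print(sorted(candidates()))
print(sorted(candidates(block_r0=True)))
print(sorted(candidates(block_r0=True, agree_r0=True, noun_feat1="+")))
```

Under these assumptions the list shrinks from three candidates to two with the blocking constraint, and to the single correct translation once the noun carries feat1.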
43
Batch Mode Implementation
Automatic Rule Adaptation
Proposed Work

For Refinement operations of errors that can be
refined
Fully automatically just by using correction
information (Focus 1)
Fully automatically using correction and error
information (Focus 2)

44
Rule Refinement Operations
45
Focus 1
Rule Refinement Operations
It is a nice house: Es una casa bonito → Es una casa bonita
John and Mary fell: Juan y María ∅ cayeron → Juan y María se cayeron
46
Focus 1
Rule Refinement Operations
J y M cayeron → J y M se cayeron
Es una casa bonito → Es una casa bonita
Gaudí was a great artist: Gaudí era un artista grande → Gaudí era un gran artista
I will help him fix the car: Ayudaré a él a arreglar el auto → Le ayudaré a arreglar el auto
47
Focus 1
Rule Refinement Operations
I would like to go: Me gustaría que ir → Me gustaría ∅ ir
I will help him fix the car: Ayudaré a él a arreglar el auto → Le ayudaré a arreglar el auto
48
Focus 1 & 2
Rule Refinement Operations
PP → PREP NP
I am proud of you: Estoy orgullosa tu → Estoy orgullosa de ti
49
Interactive Mode Implementation
Automatic Rule Adaptation
Proposed Work

Extra error information is required
More sentences need to be evaluated (and
corrected) by users
Relevant Minimal Pairs (MP)
Focus 3: types of errors that require a reasonable amount of further user interaction and can be solved by available correction and error information.

50
Focus 3
Rule Refinement Operations
Wally plays the guitar: Wally juega la guitarra → Wally toca la guitarra
I saw the woman: Vi ∅ la mujer → Vi a la mujer
I see them: Veo los → Los veo
51
Example Requiring Minimal Pair
Automatic Rule Adaptation
Proposed Work

1. Run SL sentence through the transfer engine
I see them → veo los → los veo
2. Wi = los, but no Wi' nor Wc
Need a minimal pair to determine appropriate refinement
I see cars → veo autos
3. Triggering feature(s): δ(los, autos) = {pos}
PRON(los) [pos = pron], N(autos) [pos = n]

52
Refining and Adding Constraints
Proposed Work

VP,3: VP NP → VP NP
VP,3: VP NP → NP VP [(NP pos) =c pron]
Percolate triggering features up to the constituent level:
NP: PRON → PRON [(NP pos) = (PRON pos)]
Block application of general rule (VP,3):
VP,3: VP NP → VP NP [(NP pos) = (NOT pron)]
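
The combined effect of the specific rule, the percolated feature, and the blocking constraint can be sketched as a simple reordering decision. This is an illustration only: `percolate_pos` and `apply_vp3` are hypothetical names, and the object NP is reduced to a single head word.

```python
def percolate_pos(head_pos: str) -> str:
    """Percolate the head's pos up to the NP level: (NP pos) = (head pos)."""
    return head_pos


def apply_vp3(verb: str, obj: str, obj_head_pos: str) -> str:
    """Order a verb and its object NP using the refined VP,3 rules."""
    np_pos = percolate_pos(obj_head_pos)
    if np_pos == "pron":
        # Specific rule: VP NP -> NP VP, with (NP pos) =c pron
        return f"{obj} {verb}"
    # General rule, blocked for pronouns: (NP pos) = (NOT pron)
    return f"{verb} {obj}"


print(apply_vp3("veo", "los", "pron"))
print(apply_vp3("veo", "autos", "n"))
```

The pronoun object is moved in front of the verb, while the noun object from the minimal pair keeps the general order.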

53
Generalization Power

Other example sentences with the same error would be translated correctly after refinement!

54
Outside Scope of Thesis
Rule Refinement Operations
John read the book: A Juan leyó el libro → ∅ Juan leyó el libro
Where are you from?: Donde eres tu de? → De donde eres tu?
55
User Studies
Proposed Work

TCTool new MT classification (Eng2Spa)
Different language pair (Mapudungun or Quechua → Spanish)
Manual vs Learned grammars
Batch vs Interactive mode (Active Learning)
Amount of error information elicited

56
User Studies Map
Proposed Work
Mapu2Spa
57
Data Set
Proposed Work

Split development set (400 sentences) into:
Dev set → Run User Studies
Develop Refinement Module
Validate functionality
Test set → Evaluate effect of Refinement operations
Wild test set (from naturally occurring text)
Requirement: sentences need to be fully parsed by the grammar

58
Evaluation of Refined MT System

Evaluate best translation → Automatic evaluation metrics (BLEU, NIST, METEOR)
Evaluate candidate list → precision (parsimony)

59
Evaluate Best translation

Hypothesis file (translations to be evaluated automatically):
Raw MT output:
Best sentence (picked by user to be correct or requiring the least amount of correction)
Refined MT output:
Use METEOR score at sentence level to pick best candidate from the list
→ Run all automatic metrics on the new hypothesis file using user corrections as reference translations.
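
Picking the best candidate with a sentence-level score can be sketched as below. The scoring function here is a plain unigram F-measure used as a stand-in, not METEOR itself, and `unigram_f1` and `pick_best` are hypothetical names.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram F-measure between two sentences (a stand-in for METEOR)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def pick_best(candidates, reference, score=unigram_f1):
    """Return the candidate with the highest sentence-level score."""
    return max(candidates, key=lambda c: score(c, reference))


refined_output = [
    "Gaudí estaba un artista grande",
    "Gaudí era un artista grande",
    "Gaudí era un gran artista",
]
user_correction = "Gaudí era un gran artista"
print(pick_best(refined_output, user_correction))
```

Any sentence-level metric with the same call shape could be passed in through the `score` parameter.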

60
Evaluate Candidate List

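$\text{Precision} = \frac{tp}{tp + fp}$, where $tp$ is binary, $tp \in \{0, 1\}$, and $tp + fp$ is the total number of TC

[Diagram: one SL sentence with its list of TL candidates]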
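
A small sketch of the candidate-list precision, assuming tp is 1 when the correct translation appears in the list and 0 otherwise, and that the denominator is the length of the list. The function name is hypothetical.

```python
def candidate_list_precision(candidates, correct):
    """Precision of a candidate list: tp / (tp + fp), with tp in {0, 1}."""
    if not candidates:
        return 0.0
    tp = 1 if correct in candidates else 0
    return tp / len(candidates)


before = ["un artista grande", "un artista gran", "un gran artista"]
after = ["un gran artista"]
print(candidate_list_precision(before, "un gran artista"))
print(candidate_list_precision(after, "un gran artista"))
```

A shorter list that still contains the correct translation scores higher, which is the sense in which this measure rewards parsimony.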

61
Expected Contributions

An efficient online GUI to display translations
and alignments and solicit pinpoint fixes from
non-expert bilingual users.
An expandable set of rule refinement operators
triggered by user corrections,
to automatically refine and expand different
types of grammars.
A mechanism to automatically evaluate rule
refinements with user corrections as reference
translations.

62
Thesis Timeline

Research components                              Duration (months)
Back-end implementation                          7
User Studies                                     3
Resource-poor language (data + manual grammar)   2
Adapt system to new language pair                1
Active Learning methods                          1
Evaluation                                       1
Write and defend thesis                          3
Total                                            18

Expected graduation date: May 2006
63
References

Add references
Related work
Probst et al. 2002
AL

64
Thanks!

Questions?

65
Some Questions

Is the refinement process deterministic?

66
Others

TCTool Demo Simulation
RR operation patterns
Automatic Evaluation feasibility study
AMTA paper results
BLEU, NIST and METEOR
Precision, recall and F1

67
Automatic Rule Adaptation
68
Automatic Rule Adaptation
SL best TL picked by user
69
Automatic Rule Adaptation
Changing word order
70
Automatic Rule Adaptation
Changing grande into gran
71
Automatic Rule Adaptation
Back to main
72
Automatic Rule Adaptation
73
Input to RR module
Automatic Rule Adaptation

User correction log file
Transfer engine output (+ parse tree)

sl: I see them
tl: VEO LOS
tree: <((S,0 (VP,3 (VP,1 (V,12 "VEO") ) (NP,0 (PRON,23 "LOS") ) ) ) )>
sl: I see cars
tl: VEO AUTOS
tree: <((S,0 (VP,3 (VP,1 (V,12 "VEO") ) (NP,2 (N,13 "AUTOS") ) ) ) )>
74
Types of RR Operations
Automatic Rule Adaptation
Completed Work

Grammar
R0 → R0 + R1 [= R0 + constr]; Cov[R0] → Cov[R0, R1] (bifurcate)
R0 → R1 [= R0 + constr = -] + R2 [= R0 + constr =c +]; Cov[R0] → Cov[R1, R2] (bifurcate)
R0 → R1 [= R0 + constr]; Cov[R0] → Cov[R1] (refine)
Lexicon
Lex0 → Lex0 + Lex1 [= Lex0 + constr]
Lex0 → Lex1 [= Lex0 + constr]
Lex0 → Lex0 + Lex1 [Lex0 with a different TL word]
∅ → Lex1 (adding lexical item)
75
Manual vs Learned Grammars
Automatic Rule Adaptation

Manual inspection
Automatic MT Evaluation

AMTA 2004

                  NIST   BLEU   METEOR
Manual grammar    4.3    0.16   0.6
Learned grammar   3.7    0.14   0.55
76
Human Oracle experiment
Automatic Rule Adaptation
Completed Work

As a feasibility experiment, compared raw MT output with manually corrected MT:
the difference is statistically significant (confidence interval test)
This is an upper bound on how much difference we should expect any refinement approach to make.

77
Active Learning
Automatic Rule Adaptation
Proposed Work

Minimize the number of examples a human annotator must label [Cohn et al., 1994], usually by processing examples in order of usefulness.
Minimize the number of Minimal Pairs presented to users

78
Order deterministic?

Application of Rule Refinement operations is not deterministic; it directly depends on the order in which the system sees the corrected sentences
Example
1st agr constraint bifurcate (WWO)
C-set
Reverse order
C-set (!)
