Towards Interactive and Automatic Refinement of Translation Rules
1
Towards Interactive and Automatic Refinement of Translation Rules
  • Ariadna Font Llitjós
  • PhD Thesis Proposal
  • Jaime Carbonell (advisor)
  • Alon Lavie (co-advisor)
  • Lori Levin
  • Bonnie Dorr (Univ. Maryland)
  • 5 November 2004

2
Outline
  • Introduction
  • Thesis statement and scope
  • Preliminary Research
  • Interactive elicitation of error information
  • A framework for automatic rule adaptation
  • Proposed Research
  • Contributions and Thesis Timeline

3
Machine Translation (MT)
  • Source Language (SL) sentence:
  • Gaudi was a great artist
  • In Spanish, this translates as:
  • Gaudí era un gran artista
  • MT system outputs:
  • ✗ Gaudi estaba un artista grande
  • ? Gaudi era un artista grande

4
Spanish Adjectives
Automatic Rule Adaptation (Completed Work)
General order: grande → "big in size"
Exception: gran → "exceptional"
5
Commercial and Online Systems
  • Correct translation: Gaudi era un gran artista
  • Systran, Babelfish (Altavista), WorldLingo, Translated.net:
  • Gaudi era ? gran artista
  • ImTranslation:
  • ✗ El Gaudi era un gran artista
  • 1-800-Translate:
  • ? Gaudi era un fenomenal artista

6
  • Current solutions
  • Manual post-editing [Allen, 2003]
  • Automated post-editing module (APE) [Allen & Hogan, 2000]

7
Drawbacks of Current Methods
  • Manual post-editing → corrections do not generalize:
  • ✗ Gaudi era un artista grande
  • ✗ Juan es un amigo grande (Juan is a great friend)
  • ✗ Era una oportunidad grande (It was a great opportunity)
  • APE → humans need to predict all the errors ahead of time and write
    post-editing rules for them; every new error type requires new rules

8
My Solution
  • Automate post-editing efforts by feeding them back into the MT system.
  • Possible alternatives:
  • Automatic learning of post-editing rules
  • (+) system independent
  • (-) several thousand sentences might need to be corrected for the same error
  • Automatic refinement of translation rules
  • (+) attacks the core of the problem
  • (-) only applies to transfer-based MT systems (need rules to fix!)

9
Related Work
  • Machine translation and rule adaptation: [Corston-Oliver & Gammon, 2003],
    [Imamura et al., 2003], [Menezes & Richardson, 2001], [Brill, 1993],
    [Gavaldà, 2000]
  • Post-editing: [Callison-Burch, 2004], [Su et al., 1995]
  • My thesis: no pre-existing training data required, no human reference
    translations required, uses non-expert user feedback
10
Resource-poor Scenarios (AVENUE)
  • Lack of electronic parallel data
  • Lack of a manual grammar (or only a very small initial grammar)
  • → Need to validate the elicitation corpus and the automatically learned
    translation rules
  • Why bother?
  • Indigenous communities have difficult access to crucial information that
    directly affects their lives (such as land laws, plagues, health
    warnings, etc.)
  • Preservation of their language and culture

(Example languages: Mapudungun, Quechua, Aymara)
11
How is MT possible for resource-poor languages?
→ Bilingual speakers
12
AVENUE Project Overview
13
My Thesis
(AVENUE architecture diagram: the Elicitation, Rule Learning, Run-Time System,
and Rule Refinement stages. Components: Elicitation Tool and Elicitation
Corpus, word-aligned parallel corpus, Learning Module, handcrafted rules,
Transfer Rules, Lexical Resources, morphological analyzer, Run-Time Transfer
System, output lattice, Translation Correction Tool, and Rule Refinement
Module.)
14
Recycle corrections of Machine Translation output back into the system by
refining and expanding the existing translation rules.
15
Thesis Statement
  • - Given a rule-based Transfer MT system, we
    can extract useful information from non-expert
    bilingual speakers about the corrections required
    to make MT output acceptable.
  • - We can automatically refine and expand
    translation rules, given corrected and aligned
    translation pairs and some error information, to
    improve coverage and overall MT quality.

16
Assumptions
  • No parallel training data available
  • No human reference translations available
  • The SL sentence needs to be fully parsed by the
    translation grammar.
  • Bilingual speakers can give enough information
    about the MT errors.

17
Scope
  • Types of errors that:
  • Focus 1: can be refined fully automatically, just by using correction
    information.
  • Focus 2: can be refined fully automatically, using correction and error
    information.
  • Focus 3: require a reasonable amount of further user interaction and can
    be solved with the available correction and error information.

18
Technical Challenges
  • Automatic evaluation of the refinement process
  • Eliciting minimal MT information from non-expert users
19
Preliminary Work
  • Interactive elicitation of error information
  • A framework for automatic rule adaptation

20
Interactive Elicitation of MT Errors
  • Goal:
  • Simplify the MT correction task as much as possible
  • Challenges:
  • Find the appropriate level of granularity for MT error classification
  • Design a user-friendly graphical user interface that shows:
  • the SL sentence (e.g. I see them)
  • the TL sentence (e.g. Yo veo los)
  • word-to-word alignments (I-yo, see-veo, them-los)
  • (context)

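To make this concrete, the information presented for one correction instance
could be bundled as follows; a minimal Python sketch whose field names are
hypothetical, not the actual TCTool schema.

    # Hypothetical container for one TCTool correction instance.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class CorrectionInstance:
        sl: List[str]                      # source-language tokens
        tl: List[str]                      # MT output tokens to be corrected
        alignments: List[Tuple[int, int]]  # word-to-word links (SL idx, TL idx)
        context: str = ""                  # optional context shown to the user

    instance = CorrectionInstance(
        sl=["I", "see", "them"],
        tl=["Yo", "veo", "los"],
        alignments=[(0, 0), (1, 1), (2, 2)],  # I-yo, see-veo, them-los
    )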
21
MT Error Typology for RR (simplified)
Interactive elicitation of error information (Completed Work)
  • Missing word
  • Extra word
  • Wrong word order (local vs. long distance; word vs. phrase)
  • Incorrect word (word change: sense, form, selectional restrictions, idiom)
  • Wrong agreement (missing constraint, extra constraint)
22
TCTool (Demo)
Interactive elicitation of error information
Correcting actions:
  • Add a word
  • Delete a word
  • Modify a word
  • Change word order
23
1st Eng2Spa User Study
Interactive elicitation of error information (Completed Work)
  • [LREC 2004]
  • MT error classification → 9 linguistically-motivated classes:
  • word order, sense, agreement error (number, person, gender, tense), form,
    incorrect word, and no translation

                        precision   recall   F1
error detection             90        89     89
error classification        72        71     72
24
Automatic Rule Refinement Framework
Automatic Rule Adaptation (Completed Work)
  • Find the best RR operations given:
  • a grammar (G),
  • a lexicon (L),
  • a (set of) Source Language sentence(s) (SL),
  • a (set of) Target Language sentence(s) (TL),
  • its parse tree (P), and
  • a minimal correction of TL (TL'),
  • such that TQ2 > TQ1.
  • This can also be expressed as:
  • max TQ(TL' | TL, P, SL, RR(G, L))

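Read procedurally, this maximization is a search over candidate refinement
operations. The sketch below illustrates the idea under the assumption that
translation and quality scoring are available as callables; translate,
quality, and the toy demo values are illustrative stand-ins, not the AVENUE
interfaces.

    # Hypothetical search for the refinement operation that maximizes TQ.
    def best_refinement(ops, grammar, lexicon, sl, tl_corrected,
                        translate, quality):
        """Return the op with the highest TQ2, or None if none beats TQ1."""
        best_op = None
        best_score = quality(translate(grammar, lexicon, sl), tl_corrected)  # TQ1
        for op in ops:                     # candidate RR operations
            g2, l2 = op(grammar, lexicon)  # each op maps (G, L) -> (G', L')
            score = quality(translate(g2, l2, sl), tl_corrected)             # TQ2
            if score > best_score:         # require TQ2 > TQ1
                best_op, best_score = op, score
        return best_op

    # Toy demo: word-by-word "translation" and exact-match "quality".
    ops = [lambda g, l: (g, {**l, "great": "gran"})]
    translate = lambda g, l, sl: " ".join(l.get(w, w) for w in sl.split())
    quality = lambda hyp, ref: 1.0 if hyp == ref else 0.0
    best = best_refinement(ops, {}, {"great": "grande"}, "a great artist",
                           "a gran artist", translate, quality)
    print(best is ops[0])  # True: the lexical refinement improves TQ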
25
Types of Refinement Operations
Automatic Rule Adaptation (Completed Work)
  • 1. Refine a translation rule:
  • R0 → R1 (R0 modified, either made more specific or more general)

R0: a nice house → una casa bonito
R1 (adds N gender = ADJ gender agreement): a nice house → una casa bonita
26
Types of Refinement Operations (2)
Automatic Rule Adaptation (Completed Work)
  • 2. Bifurcate a translation rule:
  • R0 → R0 (same, general rule)
  •      + R1 (R0 modified, specific rule)

R0: a nice house → una casa bonita
R1: a great artist → un gran artista
27
Formalizing Error Information
Automatic Rule Adaptation (Completed Work)
  • Wi = error word
  • Wi' = corrected word
  • Wc = clue word
  • Example:
  • SL: the red car; TL: el auto roja → TL': el auto rojo
  • Wi = roja, Wi' = rojo, Wc = auto (roja needs to agree with auto)
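A minimal sketch of this triple as a data structure; the field names are
illustrative.

    # Hypothetical error-information record for one correction.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ErrorInfo:
        wi: str                  # error word in the MT output, e.g. "roja"
        wi_prime: Optional[str]  # user's correction, e.g. "rojo" (None if deleted)
        wc: Optional[str]        # clue word it must agree with, e.g. "auto"

    err = ErrorInfo(wi="roja", wi_prime="rojo", wc="auto")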
28
Triggering Feature Detection
Automatic Rule Adaptation (Completed Work)
  • Comparison at the feature level to detect the triggering feature(s)
  • Delta function Δ(Wi, Wi')
  • Examples:
  • Δ(rojo, roja) = {gender}
  • Δ(comíamos, comía) = {person, number}
  • Δ(mujer, guitarra) = ∅
  • If the Δ set is empty, we need to postulate a new binary feature

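The delta function itself is a set comparison over feature bundles; a sketch,
assuming each word form comes with a flat feature dictionary from the lexicon
or morphological analyzer (the dictionaries below are illustrative).

    def delta(feats_wi, feats_wi_prime):
        """Return the set of features on which Wi and Wi' disagree."""
        keys = set(feats_wi) | set(feats_wi_prime)
        return {f for f in keys if feats_wi.get(f) != feats_wi_prime.get(f)}

    print(delta({"gender": "fem", "number": "sg"},
                {"gender": "masc", "number": "sg"}))  # {'gender'}
    # An empty result, as for delta(mujer, guitarra), signals that a new
    # binary feature must be postulated.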
29
Deciding on the Refinement Op
Automatic Rule Adaptation (Completed Work)
  • Given:
  • the action performed by the user (add, delete, modify, change word
    order), and
  • whatever error information is available (clue word, word alignments, etc.)
  • → choose a Refinement Action

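The decision step can be pictured as a small dispatch from the user's action
plus the available error information to a refinement operation. A hedged
sketch; the returned operation names are shorthand for the typology shown
next, not an exhaustive or official list.

    def decide_refinement(action, has_clue_word, delta_set):
        """Map (user action, error info) to a refinement op (illustrative)."""
        if action == "modify" and delta_set:
            return "refine rule: add/modify agreement constraint"  # bonito -> bonita
        if action == "modify" and not delta_set:
            return "bifurcate rule with a new binary feature"      # grande -> gran
        if action == "reorder":
            return "bifurcate rule with reordered constituents"
        if action == "add":
            return "add word (lexical or structural refinement)"
        if action == "delete":
            return "remove word / add blocking constraint"
        return "elicit more information" if not has_clue_word else "refine rule"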
30
Rule Refinement Operations
31
Proposed Work
  • Batch and interactive modes
  • User studies
  • Evaluation

32
Rule Refinement Example
Automatic Rule Adaptation
  • Change word order
  • SL: Gaudí was a great artist
  • TL: Gaudí era un artista grande
  • Corrected TL (TL'): Gaudí era un gran artista

33
Automatic Rule Adaptation
1. Error Information Elicitation
(Refinement Operation Typology diagram)
34
2. Variable Instantiation from the Log File
Automatic Rule Adaptation
  • Correcting actions:
  • 1. Word order change (artista grande → grande artista): Wi = grande
  • 2. Edited grande into gran: Wi' = gran
  • The user identified artist as the clue word → Wc = artist
  • In this case, even if the user had not identified Wc, the refinement
    process would have been the same.

35
3. Retrieve Relevant Lexical Entries
Automatic Rule Adaptation
  • No lexical entry for great → gran
  • Duplicate the lexical entry great → grande and change its TL side:

ADJ::ADJ great -> gran
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc))

  • (Morphological analyzer: grande → gran)

Existing entry:
ADJ::ADJ great -> grande
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc))
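Operationally, this step is clone-and-edit on the lexicon. A sketch with
entries modeled as plain Python dictionaries, a simplification of the
transfer-lexicon format shown above:

    import copy

    great_grande = {
        "pos": ("ADJ", "ADJ"),
        "sl": "great",
        "tl": "grande",
        "constraints": {("y0", "agr", "num"): "sg",
                        ("y0", "agr", "gen"): "masc"},
    }

    # Duplicate the entry and swap the TL side; the morphological analyzer
    # is what relates the forms grande ~ gran.
    great_gran = copy.deepcopy(great_grande)
    great_gran["tl"] = "gran"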
36
4. Finding Triggering Feature(s)
Automatic Rule Adaptation
  • Feature Δ function: Δ(Wi, Wi') = ∅
  • → need to postulate a new binary feature: feat1
  • 5. Blame assignment:

tree: <((S,1 (NP,2 (N,51 "GAUDI"))
             (VP,3 (VB,2 (AUX,172 "ERA"))
                   (NP,8 (DET,03 "UN")
                         (N,45 "ARTISTA")
                         (ADJ,54 "GRANDE")))))>

37
6. Variable Instantiation in the Rules
Automatic Rule Adaptation
  • Wi = grande → POSi = ADJ → Y3, y3
  • Wc = artista → POSc = N → Y2, y2

NP,8
NP::NP [DET ADJ N] -> [DET N ADJ]
((X1::Y1) (X2::Y3) (X3::Y2)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y2 agr))   ; det-noun agreement
 ((y3 agr) = (y2 agr))   ; adj-noun agreement
 (y2 = x3))
(R0)
38
7. Refining Rules
Automatic Rule Adaptation

NP,8
NP::NP [DET ADJ N] -> [DET ADJ N]
((X1::Y1) (X2::Y2) (X3::Y3)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y3 agr))   ; det-noun agreement
 ((y2 agr) = (y3 agr))   ; adj-noun agreement
 (y2 = x3)
 ((y2 feat1) =c +))

(R1)
39
8. Refining Lexical Entries
Automatic Rule Adaptation

ADJ::ADJ great -> grande
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((y0 feat1) = -))

ADJ::ADJ great -> gran
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((y0 feat1) = +))

40
Done? Not yet
Automatic Rule Adaptation
  • NP,8 (R0): no feat1 constraint, with ADJ(grande): feat1 = -
  • NP,8 (R1): feat1 =c +, with ADJ(gran): feat1 = +
  • Need to restrict application of the general rule (R0) to just
    post-nominal adjectives

Candidate strings: un artista grande / un artista gran / un gran artista /
un grande artista
41
Add Blocking Constraint
Automatic Rule Adaptation
  • NP,8 (R0): feat1 = -, with ADJ(grande): feat1 = -
  • NP,8 (R1): feat1 =c +, with ADJ(gran): feat1 = +
  • Can we also eliminate the incorrect translations automatically?

Candidate strings: un artista grande / un artista gran / un gran artista /
un grande artista
42
Making the grammar tighter
Automatic Rule Adaptation
  • If Wc = artista:
  • add feat1 to N(artista)
  • add an agreement constraint between N and ADJ to NP,8 (R0):
    ((N feat1) = (ADJ feat1))

Candidate strings: un artista grande / un artista gran / un gran artista /
un grande artista
(the sketch below shows how these constraints filter the candidates)
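A sketch of the filtering effect, assuming gran carries feat1 = + and grande
feat1 = - as in step 8; the rule checkers are toy stand-ins for the transfer
engine's unification.

    # feat1 values from the refined lexical entries (step 8).
    LEX = {"gran": "+", "grande": "-"}

    def r1_ok(adj):                    # specific rule R1: DET ADJ N
        return LEX[adj] == "+"         # ((ADJ feat1) =c +)

    def r0_ok(adj, noun_feat1="-"):    # tightened general rule R0: DET N ADJ
        return LEX[adj] == noun_feat1  # ((N feat1) = (ADJ feat1))

    for adj in ("gran", "grande"):
        print(f"un {adj} artista:", r1_ok(adj))   # gran: True,  grande: False
        print(f"un artista {adj}:", r0_ok(adj))   # gran: False, grande: True
    # Only "un gran artista" and "un artista grande" survive.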
43
Batch Mode Implementation
Automatic Rule Adaptation (Proposed Work)
  • For refinement operations covering errors that can be refined:
  • fully automatically, just by using correction information (Focus 1)
  • fully automatically, using correction and error information (Focus 2)

44
Rule Refinement Operations
45
Focus 1
Rule Refinement Operations
It is a nice house: Es una casa bonito → Es una casa bonita
John and Mary fell: Juan y María cayeron → Juan y María se cayeron
46
Focus 1
Rule Refinement Operations
J y M cayeron → J y M se cayeron
Es una casa bonito → Es una casa bonita
Gaudi was a great artist: Gaudi era un artista grande → Gaudi era un gran
artista
I will help him fix the car: Ayudaré a él a arreglar el auto → Le ayudaré a
arreglar el auto
47
Focus 1
Rule Refinement Operations
I would like to go: Me gustaría que ir → Me gustaría ir
I will help him fix the car: Ayudaré a él a arreglar el auto → Le ayudaré a
arreglar el auto
48
Focus 1 & 2
Rule Refinement Operations
PP → PREP NP
I am proud of you: Estoy orgullosa tu → Estoy orgullosa de ti
49
Interactive Mode Implementation
Automatic Rule Adaptation (Proposed Work)
  • Extra error information is required
  • More sentences need to be evaluated (and corrected) by users
  • Relevant Minimal Pairs (MPs)
  • Focus 3: types of errors that require a reasonable amount of further user
    interaction and can be solved with the available correction and error
    information.

50
Focus 3
Rule Refinement Operations
Wally plays the guitar: Wally juega la guitarra → Wally toca la guitarra
I saw the woman: Vi la mujer → Vi a la mujer
I see them: Veo los → Los veo
51
Example Requiring a Minimal Pair
Automatic Rule Adaptation (Proposed Work)
  • 1. Run the SL sentence through the transfer engine:
  • I see them → veo los, corrected to los veo
  • 2. Wi = los, but no Wi' and no Wc
  • → Need a minimal pair to determine the appropriate refinement (see the
    sketch below):
  • I see cars → veo autos
  • 3. Triggering feature(s): Δ(los, autos) = {pos}
  • PRON(los): pos = pron; N(autos): pos = n

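A sketch of minimal-pair selection: scan a corpus for a sentence that differs
from the problematic one only at the error position, so that the delta
function can compare the two fillers. The corpus and tokenization here are
illustrative.

    def find_minimal_pair(sl_tokens, error_idx, corpus):
        """Return a corpus sentence identical to sl_tokens except at error_idx."""
        for cand in corpus:
            if cand != sl_tokens and len(cand) == len(sl_tokens) and all(
                a == b
                for i, (a, b) in enumerate(zip(sl_tokens, cand))
                if i != error_idx
            ):
                return cand
        return None

    pair = find_minimal_pair(["I", "see", "them"], 2,
                             [["I", "like", "tea"], ["I", "see", "cars"]])
    print(pair)  # ['I', 'see', 'cars']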
52
Refining and Adding Constraints
Proposed Work
  • General rule VP,3: [VP NP] → [VP NP]
  • Refined rule VP,3': [VP NP] → [NP VP], with (NP pos) =c pron
  • Percolate the triggering features up to the constituent level:
  • NP → PRON, with (NP pos) = (PRON pos)
  • Block application of the general rule (VP,3): [VP NP] → [VP NP], with
    (NP pos) ≠ pron

53
Generalization Power
  • Other example sentences with the same error will be translated correctly
    after the refinement!

54
Outside the Scope of the Thesis
Rule Refinement Operations
John read the book: A Juan leyó el libro → Juan leyó el libro
Where are you from?: ¿Dónde eres tú de? → ¿De dónde eres tú?
55
User Studies
Proposed Work
  • TCTool + new MT error classification (Eng2Spa)
  • Different language pair (Mapudungun or Quechua → Spanish)
  • Manual vs. learned grammars
  • Batch vs. interactive mode (Active Learning)
  • Amount of error information elicited

56
User Studies Map
Proposed Work
Mapu2Spa
57
Data Set
Proposed Work
  • Split the development set (400 sentences) into:
  • Dev set → run user studies, develop the Refinement Module, validate
    functionality
  • Test set → evaluate the effect of the refinement operations
  • Wild test set (from naturally occurring text)
  • Requirement: sentences need to be fully parsed by the grammar

58
Evaluation of the Refined MT System
  • Evaluate the best translation → automatic evaluation metrics (BLEU, NIST,
    METEOR)
  • Evaluate the candidate list → precision (parsimony)

59
Evaluate Best Translation
  • Hypothesis file (translations to be evaluated automatically):
  • raw MT output
  • best sentence (picked by the user as correct or as requiring the least
    amount of correction)
  • refined MT output
  • Use the METEOR score at the sentence level to pick the best candidate
    from the list
  • → Run all automatic metrics on the new hypothesis file, using the user
    corrections as reference translations (a scoring sketch follows).

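A sketch of the scoring step, using the sacrebleu package as a convenient
stand-in for the BLEU/NIST/METEOR suite named above; this is an assumption
for illustration (METEOR is not part of sacrebleu, and the thesis predates
the package), with user corrections serving as references.

    # pip install sacrebleu
    import sacrebleu

    hypotheses = ["Gaudi era un gran artista"]    # refined MT output
    references = [["Gaudi era un gran artista"]]  # user corrections as references

    print(sacrebleu.corpus_bleu(hypotheses, references).score)  # 100.0 here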
60
Evaluate Candidate List
  • Precision = tp / (tp + fp), where tp is binary {0, 1} (whether the
    corrected translation appears among the candidates) and tp + fp is the
    total number of translation candidates (TC).
(Diagram: one SL sentence mapped to its list of TL candidates, with the
correct ones marked.)
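A direct transcription of this metric; the candidate list and the exact-match
membership test are illustrative.

    def candidate_precision(candidates, corrected_tl):
        """tp is 1 if the corrected translation is among the candidates."""
        tp = 1 if corrected_tl in candidates else 0
        return tp / len(candidates) if candidates else 0.0

    print(candidate_precision(
        ["Gaudi era un artista grande", "Gaudi era un gran artista"],
        "Gaudi era un gran artista"))  # 0.5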
61
Expected Contributions
  • An efficient online GUI to display translations
    and alignments and solicit pinpoint fixes from
    non-expert bilingual users.
  • An expandable set of rule refinement operators
  • triggered by user corrections,
  • to automatically refine and expand different
    types of grammars.
  • A mechanism to automatically evaluate rule
    refinements with user corrections as reference
    translations.

62
Thesis Timeline
  • Research component: duration (months)
  • Back-end implementation: 7
  • User studies: 3
  • Resource-poor language (data + manual grammar): 2
  • Adapt system to new language pair: 1
  • Active Learning methods: 1
  • Evaluation: 1
  • Write and defend thesis: 3
  • Total: 18

Expected graduation date May 2006
63
References
  • (References to be added)
  • Related work
  • Probst et al. 2002
  • Active Learning

64
Thanks!
  • Questions?

65
Some Questions
  • Is the refinement process deterministic?

66
Others
  • TCTool Demo Simulation
  • RR operation patterns
  • Automatic Evaluation feasibility study
  • AMTA paper results
  • BLEU, NIST and METEOR
  • Precision, recall and F1

67
Automatic Rule Adaptation
68
Automatic Rule Adaptation
SL + best TL picked by user
69
Automatic Rule Adaptation
Changing word order
70
Automatic Rule Adaptation
Changing grande into gran
71
Automatic Rule Adaptation
Back to main
72
Automatic Rule Adaptation
(TCTool demo screenshots, steps 1-3)
73
Input to the RR Module
Automatic Rule Adaptation
  • User correction log file
  • Transfer engine output (+ parse tree)

sl: I see them
tl: VEO LOS
tree: <((S,0 (VP,3 (VP,1 (V,12 "VEO"))
             (NP,0 (PRON,23 "LOS")))))>

sl: I see cars
tl: VEO AUTOS
tree: <((S,0 (VP,3 (VP,1 (V,12 "VEO"))
             (NP,2 (N,13 "AUTOS")))))>
74
Types of RR Operations
Automatic Rule Adaptation (Completed Work)
  • Grammar:
  • bifurcate: R0 → R0 + R1 (R0 + constr); Cov(R0) ⊆ Cov({R0, R1})
  • bifurcate: R0 → R1 (R0 + constr = -) + R2 (R0 + constr =c +);
    Cov(R0) = Cov({R1, R2})
  • refine: R0 → R1 (R0 + constr); Cov(R0) ⊇ Cov(R1)
  • Lexicon:
  • Lex0 → Lex0 + Lex1 (Lex0 + constr)
  • Lex0 → Lex1 (Lex0 + constr)
  • Lex0 → Lex0 + Lex1 (Lex0 with a different TL word)
  • ∅ → Lex1 (adding a lexical item)
75
Manual vs. Learned Grammars
Automatic Rule Adaptation
  • Manual inspection
  • Automatic MT evaluation [AMTA 2004]

                   NIST   BLEU   METEOR
Manual grammar      4.3   0.16    0.60
Learned grammar     3.7   0.14    0.55
76
Human Oracle Experiment
Automatic Rule Adaptation (Completed Work)
  • As a feasibility experiment, compared raw MT output with manually
    corrected MT output
  • The difference is statistically significant (confidence interval test)
  • This is an upper bound on how much difference we should expect any
    refinement approach to make.

77
Active Learning
Automatic Rule Adaptation (Proposed Work)
  • Minimize the number of examples a human annotator must label
    [Cohn et al., 1994], usually by processing examples in order of
    usefulness.
  • Minimize the number of Minimal Pairs presented to users.

78
Order Deterministic?
  • The application of Rule Refinement operations is not deterministic; it
    depends directly on
  • the order in which the module sees the corrected sentences.
  • Example:
  • 1st: agreement constraint, then bifurcate (WWO) → one constraint set
    (C-set)
  • Reverse order → a different C-set (!)