Statistical XFER: Hybrid Statistical Rule-based Machine Translation

Slides: 38

Provided by: AlonL

Learn more at: http://www.cs.cmu.edu

1
Statistical XFER: Hybrid Statistical Rule-based
Machine Translation

Alon Lavie
Language Technologies Institute
Carnegie Mellon University
Joint work with:
Jaime Carbonell, Lori Levin, Bob Frederking, Erik
Peterson, Christian Monson, Vamshi Ambati, Greg
Hanneman, Kathrin Probst, Ariadna Font-Llitjos,
Alison Alvarez, Roberto Aranovich

2
Outline

Background and Rationale
Stat-XFER Framework Overview
Elicitation
Learning Transfer Rules
Automatic Rule Refinement
Example Prototypes
Major Research Challenges

3
Progression of MT

Started with rule-based systems
Very large expert human effort to construct
language-specific resources (grammars, lexicons)
High-quality MT extremely expensive → only for a
handful of language pairs
Along came EBMT and then Statistical MT
Replaced human effort with extremely large
volumes of parallel text data
Less expensive, but still only feasible for a
small number of language pairs
We traded human labor for data
Where does this take us in 5-10 years?
Large parallel corpora for maybe 25-50 language
pairs
What about all the other languages?
Is all this data (with very shallow
representation of language structure) really
necessary?
Can we build MT approaches that learn deeper
levels of language structure and how they map
from one language to another?

4
Rule-based vs. Statistical MT

Traditional Rule-based MT
Expressive and linguistically-rich formalisms
capable of describing complex mappings between
the two languages
Accurate clean resources
Everything constructed manually by experts
Main challenge: obtaining broad coverage
Phrase-based Statistical MT
Learn word and phrase correspondences
automatically from large volumes of parallel data
Search-based decoding framework
Models propose many alternative translations
Effective search algorithms find the best
translation
Main challenge: obtaining high translation
accuracy

5
Main Principles of Stat-XFER

Integrate the major strengths of rule-based and
statistical MT within a common framework
Linguistically rich formalism that can express
complex and abstract compositional transfer rules
Rules can be written by human experts and also
acquired automatically from data
Easy integration of morphological analyzers and
generators
Word and basic phrase correspondences (e.g. base
NPs) can be automatically acquired from parallel
text when available
Search-based decoding from statistical MT adapted
to find the best translation within the search
space: multi-feature scoring, beam-search,
parameter optimization, etc.
Framework suitable for both resource-rich and
resource-poor language scenarios

6
Stat-XFER MT Approach

[Figure: MT pyramid diagram. Labels: Source (e.g. Quechua), Target (e.g. English); Direct (SMT, EBMT); Syntactic Parsing, Transfer Rules (Statistical-XFER), Text Generation; Semantic Analysis, Sentence Planning; Interlingua]

7
(No Transcript)
8
Transfer Rule Formalism
SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2)
((X1 AGR) = 3-SING)
((X1 DEF) = DEF)
((X3 AGR) = 3-SING)
((X3 COUNT) = +)
((Y1 DEF) = DEF)
((Y3 DEF) = DEF)
((Y2 AGR) = 3-SING)
((Y2 GENDER) = (Y4 GENDER))
)

Type information
Part-of-speech/constituent information
Alignments
x-side constraints
y-side constraints
xy-constraints,
e.g. ((Y1 AGR) = (X1 AGR))

9
Transfer Rule Formalism (II)
SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2)
((X1 AGR) = 3-SING)
((X1 DEF) = DEF)
((X3 AGR) = 3-SING)
((X3 COUNT) = +)
((Y1 DEF) = DEF)
((Y3 DEF) = DEF)
((Y2 AGR) = 3-SING)
((Y2 GENDER) = (Y4 GENDER))
)

Value constraints
Agreement constraints
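
To make the alignment part of the formalism concrete, here is a minimal Python sketch that applies the NP rule above to "the old man". The TransferRule class, the toy LEXICON and the render helper are hypothetical names introduced only for this illustration; the value and agreement constraints are deliberately left out, and this is not the XFER engine's implementation.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class TransferRule:
    src_type: str
    tgt_type: str
    src_seq: list[str]                 # x-side constituent sequence
    tgt_seq: list[str]                 # y-side constituent sequence
    alignments: list[tuple[int, int]]  # 1-based (Xi, Yj) pairs

# Hypothetical toy lexicon, just enough for the slide's example.
LEXICON = {"the": "ha-", "old": "zaqen", "man": "ish"}

NP_RULE = TransferRule(
    "NP", "NP",
    ["DET", "ADJ", "N"],
    ["DET", "N", "DET", "ADJ"],
    [(1, 1), (1, 3), (2, 4), (3, 2)],
)

def apply_rule(rule, tagged_words):
    """Fill each y-side slot with the translation of its aligned x-side word.

    Returns None when the source POS sequence does not match the rule.
    Feature constraints are not checked in this sketch.
    """
    if [pos for _, pos in tagged_words] != rule.src_seq:
        return None
    out = [None] * len(rule.tgt_seq)
    for x, y in rule.alignments:
        word, _ = tagged_words[x - 1]
        out[y - 1] = LEXICON[word]
    return out

def render(tokens):
    """Attach prefix tokens such as 'ha-' to the following word."""
    return " ".join(tokens).replace("- ", "-")

tokens = apply_rule(NP_RULE, [("the", "DET"), ("old", "ADJ"), ("man", "N")])
print(render(tokens))
```

Because X1 is aligned to both Y1 and Y3, the single English determiner is realized twice on the Hebrew side, which is the structural difference the rule encodes.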

10
Hebrew Manual Transfer Grammar (human-developed)

Initially developed in a couple of days, with
some later revisions by a CL post-doc
Current grammar has 36 rules
21 NP rules
one PP rule
6 verb complexes and VP rules
8 higher-phrase and sentence-level rules
Captures the most common (mostly local)
structural differences between Hebrew and English

11
Hebrew Transfer Grammar: Example Rules

NP1,2
SL: MLH ADWMH
TL: A RED DRESS
NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
(
(X2::Y1)
(X1::Y2)
((X1 def) = -)
((X1 status) =c absolute)
((X1 num) = (X2 num))
((X1 gen) = (X2 gen))
(X0 = X1)
)

NP1,3
SL: H MLWT H ADWMWT
TL: THE RED DRESSES
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
(
(X3::Y1)
(X1::Y2)
((X1 def) = +)
((X1 status) =c absolute)
((X1 num) = (X3 num))
((X1 gen) = (X3 gen))
(X0 = X1)
)

12
The XFER Engine

Input: source-language input sentence, or
source-language confusion network
Output: lattice representing collection of
translation fragments at all levels supported by
transfer rules
Basic Algorithm: bottom-up integrated
parsing-transfer-generation guided by the
transfer rules
Start with translations of individual words and
phrases from translation lexicon
Create translations of larger constituents by
applying applicable transfer rules to previously
created lattice entries
Beam-search controls the exponential
combinatorics of the search-space, using multiple
scoring features

13
Source-language Confusion Network: Hebrew Example

Input word: BWRH
0 1 2 3 4
--------BWRH--------
-----B-----WR--H--
--B---H----WRH---

14
XFER Output Lattice
(28 28 "AND" -5.6988 "W" "(CONJ,0 'AND')")
(29 29 "SINCE" -8.20817 "MAZ " "(ADVP,0 (ADV,5 'SINCE')) ")
(29 29 "SINCE THEN" -12.0165 "MAZ " "(ADVP,0 (ADV,6 'SINCE THEN')) ")
(29 29 "EVER SINCE" -12.5564 "MAZ " "(ADVP,0 (ADV,4 'EVER SINCE')) ")
(30 30 "WORKED" -10.9913 "BD " "(VERB,0 (V,11 'WORKED')) ")
(30 30 "FUNCTIONED" -16.0023 "BD " "(VERB,0 (V,10 'FUNCTIONED')) ")
(30 30 "WORSHIPPED" -17.3393 "BD " "(VERB,0 (V,12 'WORSHIPPED')) ")
(30 30 "SERVED" -11.5161 "BD " "(VERB,0 (V,14 'SERVED')) ")
(30 30 "SLAVE" -13.9523 "BD " "(NP0,0 (N,34 'SLAVE')) ")
(30 30 "BONDSMAN" -18.0325 "BD " "(NP0,0 (N,36 'BONDSMAN')) ")
(30 30 "A SLAVE" -16.8671 "BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,34 'SLAVE')) ) ) ) ")
(30 30 "A BONDSMAN" -21.0649 "BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,36 'BONDSMAN')) ) ) ) ")
15
The Lattice Decoder

Simple Stack Decoder, similar in principle to
simple Statistical MT decoders
Searches for best-scoring path of non-overlapping
lattice arcs
No reordering during decoding
Scoring based on log-linear combination of
scoring components, with weights trained using
MERT
Scoring components:
Statistical Language Model
Fragmentation: how many arcs to cover the entire
translation?
Length Penalty
Rule Scores
Lexical Probabilities
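
The search for a best-scoring path of non-overlapping arcs can be sketched as a small dynamic program. This is only an illustration under several assumptions: arcs use half-open word spans, every feature is local to one arc (so the language model, which depends on neighbouring arcs, is left out), fragmentation is modelled as a fixed per-arc weight, and the weights are made up rather than MERT-trained. The names arc_score and decode are hypothetical, not the decoder's API.

```python
def arc_score(features, weights):
    """Log-linear score of one arc: weighted sum of its feature values."""
    return sum(weights[name] * value for name, value in features.items())

def decode(n_words, arcs, weights):
    """Best monotone path of non-overlapping arcs covering words 0..n_words.

    arcs: list of (start, end, target_text, features) with start < end.
    Returns (score, [target_text, ...]) or None if no full cover exists.
    """
    best = {0: (0.0, [])}
    for pos in range(n_words):
        if pos not in best:
            continue
        score, path = best[pos]
        for start, end, text, feats in arcs:
            if start != pos:
                continue
            # Each extra arc pays the fragmentation weight once.
            cand = score + arc_score(feats, weights) + weights["frag"]
            if end not in best or cand > best[end][0]:
                best[end] = (cand, path + [text])
    return best.get(n_words)

# Toy lattice over a three-word input (all numbers are illustrative).
ARCS = [
    (0, 1, "THE", {"lex": -1.0}),
    (1, 2, "LION", {"lex": -1.5}),
    (2, 3, "ATE", {"lex": -1.0}),
    (0, 2, "THE LION", {"lex": -2.0, "rule": -0.2}),
]
WEIGHTS = {"lex": 1.0, "rule": 1.0, "frag": -0.5}

print(decode(3, ARCS, WEIGHTS))
```

With these toy numbers the two-arc path beats the three-arc path, because the fragmentation weight penalizes covering the sentence with many small pieces.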

16
XFER Lattice Decoder
0 0 ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL
Overall: -8.18323, Prob: -94.382, Rules: 0, Frag: 0.153846, Length: 0, Words: 13,13
235 < 0 8 -19.7602 B H IWM RBII (PP,0 (PREP,3 'ON')(NP,2 (LITERAL 'THE') (NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH'))(NP1,0 (NP0,1 (N,6 'DAY')))))))>
918 < 8 14 -46.2973 H ARIH AKL AT H PN (S,2 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,17 'LION')))))(VERB,0 (V,0 'ATE'))(NP,100 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,24 'RABBIT')))))))>
584 < 14 17 -30.6607 L ARWXH BWQR (PP,0 (PREP,6 'TO')(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING'))(NP0,0 (N,27 'MEAL')))))))>
17
Data Elicitation for Languages with Limited
Resources

Rationale:
Large volumes of parallel text not available →
create a small maximally-diverse parallel corpus
that directly supports the learning task
Bilingual native informant(s) can translate and
align a small pre-designed elicitation corpus,
using elicitation tool
Elicitation corpus designed to be typologically
and structurally comprehensive and compositional
Transfer-rule engine and new learning approach
support acquisition of generalized transfer-rules
from the data

18
Elicitation Tool: English-Chinese Example
19
Elicitation Tool: English-Chinese Example
20
Elicitation Tool: English-Hindi Example
21
Elicitation Tool: English-Arabic Example
22
Elicitation Tool: Spanish-Mapudungun Example
23
Designing Elicitation Corpora

Goal: Create a small representative parallel
corpus that contains examples of the most
important translation correspondences and
divergences between the two languages
Method:
Elicit translations and word alignments for a
broad diversity of linguistic phenomena and
constructions
Current Elicitation Corpus: 3100 sentences and
phrases, constructed based on a broad
feature-based specification
Open Research Issues:
Feature Detection: discover what features exist
in the language and where/how they are marked
Example: does the language mark gender of nouns?
How and where are these marked?
Dynamic corpus navigation based on feature
detection: no need to elicit for combinations
involving non-existent features

24
Rule Learning - Overview

Goal: Acquire Syntactic Transfer Rules
Use available knowledge from the source side
(grammatical structure)
Three steps:
Flat Seed Generation: first guesses at transfer
rules; flat syntactic structure
Compositionality Learning: use previously learned
rules to learn hierarchical structure
Constraint Learning: refine rules by learning
appropriate feature constraints

25
Flat Seed Rule Generation
Learning Example: NP
Eng: the big apple
Heb: ha-tapuax ha-gadol
Generated Seed Rule:
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
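
A minimal sketch of how such a flat seed rule could be assembled from one word-aligned example, writing the rule in the NP::NP [x-side] -> [y-side] notation. It assumes POS tags are already available for both sides and that the alignment is given as 1-based (x, y) pairs; the function name flat_seed_rule is hypothetical and the actual learner may work differently.

```python
def flat_seed_rule(category, src_pos, tgt_pos, alignment):
    """Build a flat transfer rule string from one aligned example.

    category:  constituent type of the example (e.g. "NP")
    src_pos:   POS sequence of the source phrase
    tgt_pos:   POS sequence of the target phrase (assumed to be known)
    alignment: iterable of 1-based (x, y) word alignment pairs
    """
    aligns = " ".join(f"(X{x}::Y{y})" for x, y in sorted(alignment))
    src = " ".join(src_pos)
    tgt = " ".join(tgt_pos)
    return f"{category}::{category} [{src}] -> [{tgt}] ({aligns})"

# Eng: the big apple / Heb: ha-tapuax ha-gadol
rule = flat_seed_rule(
    "NP",
    ["ART", "ADJ", "N"],
    ["ART", "N", "ART", "ADJ"],
    [(1, 1), (1, 3), (2, 4), (3, 2)],
)
print(rule)
```

The rule is "flat" because it records only the POS sequences and word alignments of the example, with no internal constituent structure yet.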
26
Compositionality Learning
Initial Flat Rules:
S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
Generated Compositional Rule:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
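
One way to picture this step: scan the flat S rule for source spans that match a previously learned NP rule, check that the aligned target span matches that rule's target side, and replace both spans with NP. The sketch below is a simplified greedy version under those assumptions; compose and collapse are hypothetical names, and it does not claim to reproduce the authors' learning algorithm.

```python
def collapse(seq, spans, label):
    """Replace half-open spans with label; return new seq and 1-based old->new index map."""
    new_seq, index_map, i = [], {}, 0
    while i < len(seq):
        span = next((s for s in spans if s[0] == i), None)
        if span:
            new_seq.append(label)
            for k in range(span[0], span[1]):
                index_map[k + 1] = len(new_seq)
            i = span[1]
        else:
            new_seq.append(seq[i])
            index_map[i + 1] = len(new_seq)
            i += 1
    return new_seq, index_map

def compose(src, tgt, align, sub_rules, label="NP"):
    """Generalize a flat rule using lower-level rules given as (src_seq, tgt_seq) pairs."""
    src_spans, tgt_spans = [], []
    by_length = sorted(sub_rules, key=lambda r: -len(r[0]))  # prefer longer matches
    i = 0
    while i < len(src):
        for s_seq, t_seq in by_length:
            j = i + len(s_seq)
            if src[i:j] != s_seq:
                continue
            ys = [y for x, y in align if i < x <= j]
            if not ys:
                continue
            y0, y1 = min(ys) - 1, max(ys)
            # Reject if a source word outside the span aligns into the target span.
            leaks = any(y0 < y <= y1 and not (i < x <= j) for x, y in align)
            if tgt[y0:y1] == t_seq and not leaks:
                src_spans.append((i, j))
                tgt_spans.append((y0, y1))
                i = j
                break
        else:
            i += 1
    new_src, x_map = collapse(src, src_spans, label)
    new_tgt, y_map = collapse(tgt, tgt_spans, label)
    new_align = sorted({(x_map[x], y_map[y]) for x, y in align})
    return new_src, new_tgt, new_align

FLAT_SRC = ["ART", "ADJ", "N", "V", "ART", "N"]
FLAT_TGT = ["ART", "N", "ART", "ADJ", "V", "P", "ART", "N"]
FLAT_ALIGN = [(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)]
NP_RULES = [
    (["ART", "ADJ", "N"], ["ART", "N", "ART", "ADJ"]),
    (["ART", "N"], ["ART", "N"]),
]
print(compose(FLAT_SRC, FLAT_TGT, FLAT_ALIGN, NP_RULES))
```

On the slide's example this yields [NP V NP] -> [NP V P NP] with alignments (X1::Y1) (X2::Y2) (X3::Y4); the unaligned target P stays in place.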
27
Constraint Learning
Input: Rules and their Example Sets
S::S [NP V NP] -> [NP V P NP] {ex1,ex12,ex17,ex26}
((X1::Y1) (X2::Y2) (X3::Y4))
NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}
((X1::Y1) (X2::Y2))
Output: Rules with Feature Constraints
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
((X1 NUM) = (X2 NUM))
((Y1 NUM) = (Y2 NUM))
((X1 NUM) = (Y1 NUM)))
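
As a rough illustration of how agreement constraints might be proposed from a rule's example set, the sketch below keeps every pair of constituents whose values for a feature are equal in all examples. This is a simple heuristic chosen for the illustration, not the authors' constraint-learning procedure; the example feature values and the name learn_agreement_constraints are made up.

```python
from itertools import combinations

def learn_agreement_constraints(examples, feature):
    """Propose ((A f) = (B f)) for node pairs that agree on feature f in every example.

    examples: list of dicts mapping a node name (e.g. "X1") to its feature dict.
    """
    # Only consider nodes that carry the feature in every example.
    nodes = sorted(set.intersection(
        *(set(n for n, feats in ex.items() if feature in feats) for ex in examples)
    ))
    constraints = []
    for a, b in combinations(nodes, 2):
        if all(ex[a][feature] == ex[b][feature] for ex in examples):
            constraints.append(f"(({a} {feature}) = ({b} {feature}))")
    return constraints

# Hypothetical feature values for two examples of S::S [NP V NP] -> [NP V P NP].
EXAMPLES = [
    {"X1": {"NUM": "sg"}, "X2": {"NUM": "sg"}, "X3": {"NUM": "pl"},
     "Y1": {"NUM": "sg"}, "Y2": {"NUM": "sg"}},
    {"X1": {"NUM": "pl"}, "X2": {"NUM": "pl"}, "X3": {"NUM": "pl"},
     "Y1": {"NUM": "pl"}, "Y2": {"NUM": "pl"}},
]

for constraint in learn_agreement_constraints(EXAMPLES, "NUM"):
    print(constraint)
```

In this toy data the subject and verb agree in number on both sides, while the object (X3) does not agree with the subject in the first example, so no constraint linking X1 and X3 is proposed.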
28
Automated Rule Refinement

Bilingual informants can detect translation
errors and pinpoint where they occur
A sophisticated trace of the translation path can
identify likely sources for the error and do
Blame Assignment
Rule Refinement operators can be developed to
modify the underlying translation grammar (and
lexicon) based on characteristics of the error
source
Add or delete feature constraints from a rule
Bifurcate a rule into two rules (general and
specific)
Add or correct lexical entries
See Font-Llitjos, Carbonell & Lavie, 2005

29
Stat-XFER MT Prototypes

General Statistical XFER framework under
development for past five years (funded by NSF
and DARPA)
Prototype systems so far
Chinese-to-English
Dutch-to-English
French-to-English
Hindi-to-English
Hebrew-to-English
Mapudungun-to-Spanish
In progress or planned
Brazilian Portuguese-to-English
Native-Brazilian languages to Brazilian
Portuguese
Hebrew-to-Arabic
Iñupiaq-to-English
Urdu-to-English
Turkish-to-English

30
Chinese-English Stat-XFER System

Bilingual lexicon: over 1.1 million entries
(multiple resources, incl. ADSO, Wikipedia,
extracted base NPs)
Manual syntactic XFER grammar: 76 rules! (mostly
NPs, a few PPs, and reordering of NPs/PPs within
VPs)
Multiple overlapping Chinese word segmentations
English morphology generation
Uses CMU SMT group's Suffix-Array LM toolkit for
LM
Current Performance (GALE dev-test):
NW:
XFER: 10.89(B)/0.4509(M)
Best (UMD): 15.58(B)/0.4769(M)
NG:
XFER: 8.92(B)/0.4229(M)
Best (UMD): 12.96(B)/0.4455(M)
In Progress:
Automatic extraction of clean base NPs from
parallel data
Automatic learning and extraction of high-quality
transfer-rules from parallel data

31
Translation Example

REFERENCE: When responding to whether it is
possible to extend Russian fleet's stationing
deadline at the Crimean peninsula, Yanukovych
replied, "Without a doubt."
Stat-XFER (0.3989): In reply to whether the
possibility to extend the Russian fleet stationed
in Crimea Pen. left the deadline of the problem ,
Yanukovich replied " of course .
IBM-ylee (0.2203): In response to the
possibility to extend the deadline for the
presence in Crimea peninsula , the Queen Vic said
" of course .
CMU-SMT (0.2067): In response to a possible
extension of the fleet in the Crimean Peninsula
stay on the issue , Yanukovych vetch replied "
of course .
maryland-hiero (0.1878): In response to the
possibility of extending the mandate of the
Crimean peninsula in , replied "of course.
IBM-smt (0.1862): The answer is likely to be
extended the Crimean peninsula of the presence of
the problem, Yanukovych said " Of course.
CMU-syntax (0.1639): In response to the
possibility of extension of the presence in the
Crimean Peninsula , replied " of course .

32
Major Research Directions

Automatic Transfer Rule Learning
From manually word-aligned elicitation corpus
From large volumes of automatically word-aligned
wild parallel data
In the absence of morphology or POS annotated
lexica
Compositionality and generalization
Distinguishing good rules from bad rules
Effective models for rule scoring, for:
Decoding using scores at runtime
Pruning the large collections of learned rules
Learning Unification Constraints

33
Major Research Directions

Extraction of Base-NP translations from parallel
data
Base-NPs are extremely important building
blocks for transfer-based MT systems
Frequent, often align 1-to-1, improve coverage
Correctly identifying them greatly helps
automatic word-alignment of parallel sentences
Parsers (or NP-chunkers) available for both
languages: extract base-NPs independently on
both sides and find their correspondences
Parsers (or NP-chunkers) available for only one
language (e.g. English): extract base-NPs on one
side, and find reliable correspondences for them
using word-alignment, frequency distributions,
other features
Promising preliminary results
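
For the one-sided case, a minimal sketch of projecting an English base-NP span onto the other language through word alignments, keeping it only when no word outside the NP aligns into the projected span. The alignment below is hand-made for the "THE LION ATE THE RABBIT" example from the decoder slide, the function name project_np is hypothetical, and frequency distributions and other features are not modelled.

```python
def project_np(span, alignment):
    """Project a half-open source span [s0, s1) through (src, tgt) alignment links.

    Returns the half-open target span, or None if the NP has no links or if a
    source word outside the NP aligns inside the projected target span.
    """
    s0, s1 = span
    tgt = [t for s, t in alignment if s0 <= s < s1]
    if not tgt:
        return None
    t0, t1 = min(tgt), max(tgt) + 1
    if any(t0 <= t < t1 and not (s0 <= s < s1) for s, t in alignment):
        return None
    return (t0, t1)

ENGLISH = ["THE", "LION", "ATE", "THE", "RABBIT"]
HEBREW = ["H", "ARIH", "AKL", "AT", "H", "PN"]
# Hypothetical 0-based word alignment (English index, Hebrew index); AT is unaligned.
ALIGNMENT = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5)]

for np_span in [(0, 2), (3, 5)]:  # base NPs found on the English side
    t0, t1 = project_np(np_span, ALIGNMENT)
    print(" ".join(ENGLISH[np_span[0]:np_span[1]]), "<->", " ".join(HEBREW[t0:t1]))
```

The consistency check is what makes the extracted pairs "clean": an NP whose projected span also absorbs words aligned elsewhere is discarded rather than guessed at.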

34
Major Research Directions

Algorithms for XFER and Decoding
Integration and optimization of multiple features
into search-based XFER parser
Complexity and efficiency improvements (e.g.
Cube Pruning)
Non-monotonicity issues (LM scores, unification
constraints) and their consequences on search

35
Major Research Directions

Discriminative Language Modeling for MT
Current standard statistical LMs provide only
weak discrimination between good and bad
translation hypotheses
New Idea: Use occurrence-based statistics
Extract instances of lexical, syntactic and
semantic features from each translation
hypothesis
Determine whether these instances have been seen
before (at least once) in a large monolingual
corpus
The Conjecture: more grammatical MT hypotheses
are likely to contain higher proportions of
feature instances that have been seen in a corpus
of grammatical sentences.
Goals
Find the set of features that provides the best
discrimination between good and bad translations
Learn how to combine these into a LM-like
function for scoring alternative MT hypotheses
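
The occurrence-based idea can be sketched with the simplest possible feature type. The code below assumes the features are word n-grams up to length 3 (syntactic and semantic features are not modelled) and scores a hypothesis by the proportion of its feature instances seen at least once in a monolingual corpus. The tiny corpus and the function names are invented for the illustration.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_seen(corpus, max_n=3):
    """Set of every n-gram (n <= max_n) occurring at least once in the corpus."""
    seen = set()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            seen.update(ngrams(tokens, n))
    return seen

def seen_proportion(hypothesis, seen, max_n=3):
    """Fraction of the hypothesis's n-gram instances that occur in the corpus."""
    tokens = hypothesis.lower().split()
    feats = [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]
    if not feats:
        return 0.0
    return sum(g in seen for g in feats) / len(feats)

# Toy monolingual corpus of grammatical sentences (illustrative only).
CORPUS = ["the lion ate the rabbit", "the rabbit ate the meal"]
SEEN = build_seen(CORPUS)

print(seen_proportion("the lion ate the meal", SEEN))
print(seen_proportion("lion the meal ate the", SEEN))
```

The scrambled hypothesis contains the same words but fewer previously seen bigrams and trigrams, so it receives a lower proportion, in line with the conjecture.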