Title: Statistical XFER: Hybrid Statistical Rule-based Machine Translation
1 Statistical XFER: Hybrid Statistical Rule-based Machine Translation
- Alon Lavie
- Language Technologies Institute
- Carnegie Mellon University
- Joint work with: Jaime Carbonell, Lori Levin, Bob Frederking, Erik Peterson, Christian Monson, Vamshi Ambati, Greg Hanneman, Kathrin Probst, Ariadna Font-Llitjos, Alison Alvarez, Roberto Aranovich
2 Outline
- Background and Rationale
- Stat-XFER Framework Overview
- Elicitation
- Learning Transfer Rules
- Automatic Rule Refinement
- Example Prototypes
- Major Research Challenges
3 Progression of MT
- Started with rule-based systems
- Very large expert human effort to construct language-specific resources (grammars, lexicons)
- High-quality MT extremely expensive -> only for a handful of language pairs
- Along came EBMT and then Statistical MT
- Replaced human effort with extremely large volumes of parallel text data
- Less expensive, but still only feasible for a small number of language pairs
- We traded human labor for data
- Where does this take us in 5-10 years?
- Large parallel corpora for maybe 25-50 language pairs
- What about all the other languages?
- Is all this data (with very shallow representation of language structure) really necessary?
- Can we build MT approaches that learn deeper levels of language structure and how they map from one language to another?
4 Rule-based vs. Statistical MT
- Traditional Rule-based MT:
- Expressive and linguistically-rich formalisms capable of describing complex mappings between the two languages
- Accurate, clean resources
- Everything constructed manually by experts
- Main challenge: obtaining broad coverage
- Phrase-based Statistical MT:
- Learn word and phrase correspondences automatically from large volumes of parallel data
- Search-based decoding framework
- Models propose many alternative translations
- Effective search algorithms find the best translation
- Main challenge: obtaining high translation accuracy
5 Main Principles of Stat-XFER
- Integrate the major strengths of rule-based and statistical MT within a common framework
- Linguistically rich formalism that can express complex and abstract compositional transfer rules
- Rules can be written by human experts and also acquired automatically from data
- Easy integration of morphological analyzers and generators
- Word and basic phrase correspondences (i.e. base NPs) can be automatically acquired from parallel text when available
- Search-based decoding from statistical MT adapted to find the best translation within the search space: multi-feature scoring, beam-search, parameter optimization, etc.
- Framework suitable for both resource-rich and resource-poor language scenarios
6 Stat-XFER MT Approach
[Figure: diagram with the labels Source (e.g. Quechua), Syntactic Parsing, Semantic Analysis, Sentence Planning, Text Generation, Target (e.g. English); Transfer Rules: Statistical-XFER; Direct: SMT, EBMT]
8 Transfer Rule Formalism
  ;SL: the old man, TL: ha-ish ha-zaqen
  NP::NP [DET ADJ N] -> [DET N DET ADJ]
  (
  (X1::Y1)
  (X1::Y3)
  (X2::Y4)
  (X3::Y2)
  ((X1 AGR) = *3-SING)
  ((X1 DEF) = *DEF)
  ((X3 AGR) = *3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = *DEF)
  ((Y3 DEF) = *DEF)
  ((Y2 AGR) = *3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
  )
- Type information
- Part-of-speech/constituent information
- Alignments
- x-side constraints
- y-side constraints
- xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))
9 Transfer Rule Formalism (II)
  ;SL: the old man, TL: ha-ish ha-zaqen
  NP::NP [DET ADJ N] -> [DET N DET ADJ]
  (
  (X1::Y1)
  (X1::Y3)
  (X2::Y4)
  (X3::Y2)
  ((X1 AGR) = *3-SING)
  ((X1 DEF) = *DEF)
  ((X3 AGR) = *3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = *DEF)
  ((Y3 DEF) = *DEF)
  ((Y2 AGR) = *3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
  )
- Value constraints, e.g. ((X1 AGR) = *3-SING)
- Agreement constraints, e.g. ((Y2 GENDER) = (Y4 GENDER))
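The parts of a transfer rule listed on these two slides (type, constituent sequences, alignments, value constraints) can be held in a small data structure. The following is only a minimal sketch, not the Stat-XFER implementation: the class `TransferRule`, the functions `x_side_ok` and `transfer`, and the toy feature dictionaries are all hypothetical names introduced for illustration, and only x-side value constraints are checked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TransferRule:
    """Hypothetical container for one rule such as NP::NP [DET ADJ N] -> [DET N DET ADJ]."""
    src_type: str
    tgt_type: str
    src_rhs: List[str]
    tgt_rhs: List[str]
    alignments: List[Tuple[int, int]]  # (i, j) means Xi::Yj, 1-based
    # (node, feature, value), e.g. ("X1", "DEF", "DEF")
    value_constraints: List[Tuple[str, str, str]] = field(default_factory=list)


def x_side_ok(rule: TransferRule, src_feats: List[Dict[str, str]]) -> bool:
    """Check the value constraints that mention source-side (X) constituents."""
    for node, feat, value in rule.value_constraints:
        if node.startswith("X"):
            idx = int(node[1:]) - 1
            if src_feats[idx].get(feat) != value:
                return False
    return True


def transfer(rule: TransferRule, src_translations: List[str]) -> List[str]:
    """Place the translation of each Xi at every aligned target position Yj."""
    out = [""] * len(rule.tgt_rhs)
    for x, y in rule.alignments:
        out[y - 1] = src_translations[x - 1]
    return out


# The rule from the slide: "the old man" -> "ha-ish ha-zaqen".
np_rule = TransferRule(
    "NP", "NP", ["DET", "ADJ", "N"], ["DET", "N", "DET", "ADJ"],
    alignments=[(1, 1), (1, 3), (2, 4), (3, 2)],
    value_constraints=[("X1", "AGR", "3-SING"), ("X1", "DEF", "DEF"),
                       ("X3", "AGR", "3-SING"), ("X3", "COUNT", "+")],
)
# Toy feature structures for "the", "old", "man" (assumed values).
definite = [{"AGR": "3-SING", "DEF": "DEF"}, {}, {"AGR": "3-SING", "COUNT": "+"}]
indefinite = [{"AGR": "3-SING", "DEF": "INDEF"}, {}, {"AGR": "3-SING", "COUNT": "+"}]

rule_applies = x_side_ok(np_rule, definite)
rule_blocked = not x_side_ok(np_rule, indefinite)
target = transfer(np_rule, ["ha-", "zaqen", "ish"])  # word translations of the, old, man
```

Because X1 is aligned to both Y1 and Y3, the determiner is copied to two target positions, which is how the rule doubles the Hebrew definite marker.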
10 Hebrew Manual Transfer Grammar (human-developed)
- Initially developed in a couple of days, with some later revisions by a CL post-doc
- Current grammar has 36 rules:
- 21 NP rules
- one PP rule
- 6 verb-complex and VP rules
- 8 higher-phrase and sentence-level rules
- Captures the most common (mostly local) structural differences between Hebrew and English
11 Hebrew Transfer Grammar: Example Rules
  {NP1,2}
  ;;SL: MLH ADWMH
  ;;TL: A RED DRESS
  NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
  (
  (X2::Y1)
  (X1::Y2)
  ((X1 def) = -)
  ((X1 status) =c absolute)
  ((X1 num) = (X2 num))
  ((X1 gen) = (X2 gen))
  (X0 = X1)
  )

  {NP1,3}
  ;;SL: H MLWT H ADWMWT
  ;;TL: THE RED DRESSES
  NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
  (
  (X3::Y1)
  (X1::Y2)
  ((X1 def) = +)
  ((X1 status) =c absolute)
  ((X1 num) = (X3 num))
  ((X1 gen) = (X3 gen))
  (X0 = X1)
  )
12 The XFER Engine
- Input: source-language input sentence, or source-language confusion network
- Output: lattice representing collection of translation fragments at all levels supported by transfer rules
- Basic Algorithm: bottom-up integrated parsing-transfer-generation guided by the transfer rules
- Start with translations of individual words and phrases from translation lexicon
- Create translations of larger constituents by applying applicable transfer rules to previously created lattice entries
- Beam-search controls the exponential combinatorics of the search-space, using multiple scoring features
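The beam-search step can be pictured as keeping only the best few lattice entries for each source span before larger constituents are built from them. This is a rough sketch under that assumption, not the engine's actual pruning code: the function `prune`, the `(start, end, text, score)` tuple layout, and the beam size are illustrative choices, while the scores are copied from the output lattice slide of this deck.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[int, int, str, float]  # (start, end, translation, score); higher score is better


def prune(entries: List[Entry], beam_size: int) -> Dict[Tuple[int, int], List[str]]:
    """Keep the beam_size best-scoring translations for every source span."""
    by_span = defaultdict(list)
    for start, end, text, score in entries:
        by_span[(start, end)].append((score, text))
    return {
        span: [text for _, text in heapq.nlargest(beam_size, candidates)]
        for span, candidates in by_span.items()
    }


entries = [
    (30, 30, "WORKED", -10.9913),
    (30, 30, "FUNCTIONED", -16.0023),
    (30, 30, "WORSHIPPED", -17.3393),
    (30, 30, "SERVED", -11.5161),
    (30, 30, "SLAVE", -13.9523),
    (30, 30, "BONDSMAN", -18.0325),
    (30, 30, "A SLAVE", -16.8671),
    (30, 30, "A BONDSMAN", -21.0649),
]
kept = prune(entries, beam_size=3)
```

With a beam of three, only the three highest-scoring candidates for span 30 remain available to rules that build larger constituents, which is what keeps the combinatorics bounded.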
13 Source-language Confusion Network: Hebrew Example
- Input word: BWRH
  0     1     2     3     4
  |---------BWRH----------|
  |-----B-----|-WR--|--H--|
  |--B--|--H--|----WRH----|
14 XFER Output Lattice
(28 28 "AND" -5.6988 "W" "(CONJ,0 'AND')")
(29 29 "SINCE" -8.20817 "MAZ " "(ADVP,0 (ADV,5 'SINCE')) ")
(29 29 "SINCE THEN" -12.0165 "MAZ " "(ADVP,0 (ADV,6 'SINCE THEN')) ")
(29 29 "EVER SINCE" -12.5564 "MAZ " "(ADVP,0 (ADV,4 'EVER SINCE')) ")
(30 30 "WORKED" -10.9913 "BD " "(VERB,0 (V,11 'WORKED')) ")
(30 30 "FUNCTIONED" -16.0023 "BD " "(VERB,0 (V,10 'FUNCTIONED')) ")
(30 30 "WORSHIPPED" -17.3393 "BD " "(VERB,0 (V,12 'WORSHIPPED')) ")
(30 30 "SERVED" -11.5161 "BD " "(VERB,0 (V,14 'SERVED')) ")
(30 30 "SLAVE" -13.9523 "BD " "(NP0,0 (N,34 'SLAVE')) ")
(30 30 "BONDSMAN" -18.0325 "BD " "(NP0,0 (N,36 'BONDSMAN')) ")
(30 30 "A SLAVE" -16.8671 "BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,34 'SLAVE')) ) ) ) ")
(30 30 "A BONDSMAN" -21.0649 "BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,36 'BONDSMAN')) ) ) ) ")
15 The Lattice Decoder
- Simple Stack Decoder, similar in principle to simple Statistical MT decoders
- Searches for best-scoring path of non-overlapping lattice arcs
- No reordering during decoding
- Scoring based on log-linear combination of scoring components, with weights trained using MERT
- Scoring components:
- Statistical Language Model
- Fragmentation: how many arcs to cover the entire translation?
- Length Penalty
- Rule Scores
- Lexical Probabilities
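The search for a best-scoring path of non-overlapping arcs under a log-linear score can be sketched as follows. Note the substitutions: this uses a simple left-to-right dynamic program rather than a stack decoder, it has no language model (so per-arc scores are independent), and the feature names `xfer` and `frag` and the weights are made up for illustration rather than MERT-trained values.

```python
from typing import Dict, List, Optional, Tuple

# (first source word, last source word, translation, feature values)
Arc = Tuple[int, int, str, Dict[str, float]]


def arc_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear combination: weighted sum of the feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_path(arcs: List[Arc], first: int, last: int,
              weights: Dict[str, float]) -> Optional[Tuple[float, List[str]]]:
    """Best monotone cover of source words first..last with non-overlapping arcs."""
    # best[p] holds the best (score, translations) covering words first..p-1.
    best: Dict[int, Tuple[float, List[str]]] = {first: (0.0, [])}
    for pos in range(first, last + 1):
        if pos not in best:
            continue
        score_so_far, words = best[pos]
        for start, end, text, feats in arcs:
            if start != pos:
                continue
            candidate = score_so_far + arc_score(feats, weights)
            nxt = end + 1
            if nxt not in best or candidate > best[nxt][0]:
                best[nxt] = (candidate, words + [text])
    return best.get(last + 1)  # None if the arcs cannot cover the whole span


# Arc scores taken from the output lattice slide; "frag" counts one per arc used.
arcs = [
    (28, 28, "AND", {"xfer": -5.6988, "frag": 1.0}),
    (29, 29, "SINCE", {"xfer": -8.20817, "frag": 1.0}),
    (29, 29, "SINCE THEN", {"xfer": -12.0165, "frag": 1.0}),
    (30, 30, "WORKED", {"xfer": -10.9913, "frag": 1.0}),
    (30, 30, "SERVED", {"xfer": -11.5161, "frag": 1.0}),
]
weights = {"xfer": 1.0, "frag": -0.5}  # assumed weights, not tuned
result = best_path(arcs, 28, 30, weights)
```

A negative weight on `frag` penalizes every additional arc, so when longer arcs are available the search prefers covering the sentence with fewer, larger fragments.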
16 XFER Lattice Decoder
0 0 ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL
Overall: -8.18323, Prob: -94.382, Rules: 0, Frag: 0.153846, Length: 0, Words: 13,13
235 < 0 8 -19.7602 B H IWM RBII (PP,0 (PREP,3 'ON')(NP,2 (LITERAL 'THE') (NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH'))(NP1,0 (NP0,1 (N,6 'DAY')))))))>
918 < 8 14 -46.2973 H ARIH AKL AT H PN (S,2 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,17 'LION')))))(VERB,0 (V,0 'ATE'))(NP,100 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,24 'RABBIT')))))))>
584 < 14 17 -30.6607 L ARWXH BWQR (PP,0 (PREP,6 'TO')(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING'))(NP0,0 (N,27 'MEAL')))))))>
17 Data Elicitation for Languages with Limited Resources
- Rationale:
- Large volumes of parallel text not available -> create a small maximally-diverse parallel corpus that directly supports the learning task
- Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using elicitation tool
- Elicitation corpus designed to be typologically and structurally comprehensive and compositional
- Transfer-rule engine and new learning approach support acquisition of generalized transfer-rules from the data
18 Elicitation Tool: English-Chinese Example
19 Elicitation Tool: English-Chinese Example
20 Elicitation Tool: English-Hindi Example
21 Elicitation Tool: English-Arabic Example
22 Elicitation Tool: Spanish-Mapudungun Example
23 Designing Elicitation Corpora
- Goal: Create a small representative parallel corpus that contains examples of the most important translation correspondences and divergences between the two languages
- Method:
- Elicit translations and word alignments for a broad diversity of linguistic phenomena and constructions
- Current Elicitation Corpus: 3100 sentences and phrases, constructed based on a broad feature-based specification
- Open Research Issues:
- Feature Detection: discover what features exist in the language and where/how they are marked
- Example: does the language mark gender of nouns? How and where are these marked?
- Dynamic corpus navigation based on feature detection: no need to elicit for combinations involving non-existent features
24 Rule Learning - Overview
- Goal: Acquire Syntactic Transfer Rules
- Use available knowledge from the source side (grammatical structure)
- Three steps:
- Flat Seed Generation: first guesses at transfer rules; flat syntactic structure
- Compositionality Learning: use previously learned rules to learn hierarchical structure
- Constraint Learning: refine rules by learning appropriate feature constraints
25 Flat Seed Rule Generation
Learning Example: NP
  Eng: the big apple
  Heb: ha-tapuax ha-gadol
Generated Seed Rule:
  NP::NP [ART ADJ N] -> [ART N ART ADJ]
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
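Flat seed generation amounts to reading the part-of-speech sequences and the word alignments off one elicited example. A minimal sketch, assuming the POS tags on both sides are already known and that the rule is printed with `::`, brackets, and `->` (an assumed rendering of the formalism); the function name `flat_seed_rule` is hypothetical.

```python
from typing import List, Tuple


def flat_seed_rule(category: str, src_tags: List[str], tgt_tags: List[str],
                   alignments: List[Tuple[int, int]]) -> str:
    """Build a flat rule string from POS sequences and 1-based word alignments."""
    lhs = " ".join(src_tags)
    rhs = " ".join(tgt_tags)
    aligned = " ".join(f"(X{i}::Y{j})" for i, j in sorted(alignments))
    return f"{category}::{category} [{lhs}] -> [{rhs}]\n({aligned})"


# "the big apple" / "ha-tapuax ha-gadol": the Hebrew side is tagged ART N ART ADJ,
# treating each ha- prefix as its own ART token (as the slide's rule does).
seed = flat_seed_rule(
    "NP",
    ["ART", "ADJ", "N"],
    ["ART", "N", "ART", "ADJ"],
    [(1, 1), (1, 3), (2, 4), (3, 2)],
)
```

Printing `seed` gives the NP rule shown on the slide; no generalization has happened yet, which is why the later steps are needed.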
26 Compositionality Learning
Initial Flat Rules:
  S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
  NP::NP [ART ADJ N] -> [ART N ART ADJ]
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
  NP::NP [ART N] -> [ART N]
  ((X1::Y1) (X2::Y2))
Generated Compositional Rule:
  S::S [NP V NP] -> [NP V P NP]
  ((X1::Y1) (X2::Y2) (X3::Y4))
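One way to picture the step from the flat S rule to the compositional one is to replace any sub-sequence that an already-learned rule covers on both sides with that rule's category. The sketch below is a simplification made for illustration, not the authors' learning algorithm: `Rule`, `find_sub`, `apply_sub`, and `compose` are hypothetical names, and the internal alignments of the sub-rule are not compared against the flat rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    cat: str
    src: List[str]
    tgt: List[str]
    align: List[Tuple[int, int]]  # 1-based (source index, target index)


def find_sub(flat: Rule, sub: Rule) -> Optional[Tuple[int, int]]:
    """Find 0-based offsets (s, t) where sub matches flat on both sides."""
    n, m = len(sub.src), len(sub.tgt)
    for s in range(len(flat.src) - n + 1):
        if flat.src[s:s + n] != sub.src:
            continue
        ys = [y for x, y in flat.align if s < x <= s + n]
        if not ys:
            continue
        t = min(ys) - 1
        if flat.tgt[t:t + m] != sub.tgt:
            continue
        # Every alignment link must lie entirely inside or entirely outside the spans.
        if all((s < x <= s + n) == (t < y <= t + m) for x, y in flat.align):
            return s, t
    return None


def apply_sub(flat: Rule, sub: Rule, s: int, t: int) -> Rule:
    """Collapse the matched spans into a single constituent of category sub.cat."""
    n, m = len(sub.src), len(sub.tgt)

    def new_x(x: int) -> int:
        return x if x <= s else (s + 1 if x <= s + n else x - (n - 1))

    def new_y(y: int) -> int:
        return y if y <= t else (t + 1 if y <= t + m else y - (m - 1))

    align = sorted({(new_x(x), new_y(y)) for x, y in flat.align})
    return Rule(flat.cat, flat.src[:s] + [sub.cat] + flat.src[s + n:],
                flat.tgt[:t] + [sub.cat] + flat.tgt[t + m:], align)


def compose(flat: Rule, subs: List[Rule]) -> Rule:
    """Repeatedly apply known rules, longest source side first, until none fits."""
    ordered = sorted(subs, key=lambda r: len(r.src), reverse=True)
    changed = True
    while changed:
        changed = False
        for sub in ordered:
            hit = find_sub(flat, sub)
            if hit is not None:
                flat = apply_sub(flat, sub, *hit)
                changed = True
                break
    return flat


flat_s = Rule("S", ["ART", "ADJ", "N", "V", "ART", "N"],
              ["ART", "N", "ART", "ADJ", "V", "P", "ART", "N"],
              [(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)])
np_adj = Rule("NP", ["ART", "ADJ", "N"], ["ART", "N", "ART", "ADJ"],
              [(1, 1), (1, 3), (2, 4), (3, 2)])
np_plain = Rule("NP", ["ART", "N"], ["ART", "N"], [(1, 1), (2, 2)])
composed = compose(flat_s, [np_adj, np_plain])
```

On the slide's example this yields the source side NP V NP and the target side NP V P NP, with the unaligned target preposition P left in place.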
27 Constraint Learning
Input: Rules and their Example Sets
  S::S [NP V NP] -> [NP V P NP] {ex1,ex12,ex17,ex26}
  ((X1::Y1) (X2::Y2) (X3::Y4))
  NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2,ex3,ex13}
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
  NP::NP [ART N] -> [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}
  ((X1::Y1) (X2::Y2))
Output: Rules with Feature Constraints
  S::S [NP V NP] -> [NP V P NP]
  ((X1::Y1) (X2::Y2) (X3::Y4)
  (X1 NUM = X2 NUM)
  (Y1 NUM = Y2 NUM)
  (X1 NUM = Y1 NUM))
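As a rough illustration of how agreement constraints might be proposed from a rule's example set, the sketch below keeps a constraint between two constituents when their feature values match in every example and the value actually varies across examples. This heuristic, the function `agreement_constraints`, and the toy examples are assumptions for illustration only; the deck does not spell out the actual learning procedure.

```python
from itertools import combinations
from typing import Dict, List

# One example: constituent name -> feature dictionary.
Example = Dict[str, Dict[str, str]]


def agreement_constraints(examples: List[Example], feature: str) -> List[str]:
    """Propose ((A f) = (B f)) when A and B always agree and the value varies."""
    nodes = sorted(examples[0])
    found = []
    for a, b in combinations(nodes, 2):
        values = [(ex[a].get(feature), ex[b].get(feature)) for ex in examples]
        agree = all(va is not None and va == vb for va, vb in values)
        varies = len({va for va, _ in values}) > 1
        if agree and varies:
            found.append(f"(({a} {feature}) = ({b} {feature}))")
    return found


# Toy example set (invented values): X1 and X2 always share NUM, X3 does not.
toy_examples = [
    {"X1": {"NUM": "SG"}, "X2": {"NUM": "SG"}, "X3": {"NUM": "PL"}},
    {"X1": {"NUM": "PL"}, "X2": {"NUM": "PL"}, "X3": {"NUM": "PL"}},
]
proposed = agreement_constraints(toy_examples, "NUM")
```

Requiring the value to vary avoids proposing agreement when a fixed value constraint would explain the data equally well.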
28 Automated Rule Refinement
- Bilingual informants can identify translation errors and pinpoint the errors
- A sophisticated trace of the translation path can identify likely sources for the error and do Blame Assignment
- Rule Refinement operators can be developed to modify the underlying translation grammar (and lexicon) based on characteristics of the error source:
- Add or delete feature constraints from a rule
- Bifurcate a rule into two rules (general and specific)
- Add or correct lexical entries
- See Font-Llitjos, Carbonell & Lavie, 2005
29 Stat-XFER MT Prototypes
- General Statistical XFER framework under development for past five years (funded by NSF and DARPA)
- Prototype systems so far:
- Chinese-to-English
- Dutch-to-English
- French-to-English
- Hindi-to-English
- Hebrew-to-English
- Mapudungun-to-Spanish
- In progress or planned:
- Brazilian Portuguese-to-English
- Native-Brazilian languages to Brazilian Portuguese
- Hebrew-to-Arabic
- Iñupiaq-to-English
- Urdu-to-English
- Turkish-to-English
30 Chinese-English Stat-XFER System
- Bilingual lexicon: over 1.1 million entries (multiple resources, incl. ADSO, Wikipedia, extracted base NPs)
- Manual syntactic XFER grammar: 76 rules! (mostly NPs, a few PPs, and reordering of NPs/PPs within VPs)
- Multiple overlapping Chinese word segmentations
- English morphology generation
- Uses CMU SMT-group's Suffix-Array LM toolkit for LM
- Current Performance (GALE dev-test):
- NW:
- XFER: 10.89(B)/0.4509(M)
- Best (UMD): 15.58(B)/0.4769(M)
- NG:
- XFER: 8.92(B)/0.4229(M)
- Best (UMD): 12.96(B)/0.4455(M)
- In Progress:
- Automatic extraction of clean base NPs from parallel data
- Automatic learning and extraction of high-quality transfer-rules from parallel data
31 Translation Example
- REFERENCE: When responding to whether it is possible to extend Russian fleet's stationing deadline at the Crimean peninsula, Yanukovych replied, "Without a doubt.
- Stat-XFER (0.3989): In reply to whether the possibility to extend the Russian fleet stationed in Crimea Pen. left the deadline of the problem , Yanukovich replied " of course .
- IBM-ylee (0.2203): In response to the possibility to extend the deadline for the presence in Crimea peninsula , the Queen Vic said " of course .
- CMU-SMT (0.2067): In response to a possible extension of the fleet in the Crimean Peninsula stay on the issue , Yanukovych vetch replied " of course .
- maryland-hiero (0.1878): In response to the possibility of extending the mandate of the Crimean peninsula in , replied "of course.
- IBM-smt (0.1862): The answer is likely to be extended the Crimean peninsula of the presence of the problem, Yanukovych said " Of course.
- CMU-syntax (0.1639): In response to the possibility of extension of the presence in the Crimean Peninsula , replied " of course .
32 Major Research Directions
- Automatic Transfer Rule Learning:
- From manually word-aligned elicitation corpus
- From large volumes of automatically word-aligned wild parallel data
- In the absence of morphology or POS annotated lexica
- Compositionality and generalization
- Distinguishing good rules from bad rules
- Effective models for rule scoring for:
- Decoding: using scores at runtime
- Pruning the large collections of learned rules
- Learning Unification Constraints
33 Major Research Directions
- Extraction of Base-NP translations from parallel data:
- Base-NPs are extremely important building blocks for transfer-based MT systems
- Frequent, often align 1-to-1, improve coverage
- Correctly identifying them greatly helps automatic word-alignment of parallel sentences
- Parsers (or NP-chunkers) available for both languages: Extract base-NPs independently on both sides and find their correspondences
- Parsers (or NP-chunkers) available for only one language (i.e. English): Extract base-NPs on one side, and find reliable correspondences for them using word-alignment, frequency distributions, other features
- Promising preliminary results
34 Major Research Directions
- Algorithms for XFER and Decoding:
- Integration and optimization of multiple features into search-based XFER parser
- Complexity and efficiency improvements (e.g. Cube Pruning)
- Non-monotonicity issues (LM scores, unification constraints) and their consequences on search
35 Major Research Directions
- Discriminative Language Modeling for MT:
- Current standard statistical LMs provide only weak discrimination between good and bad translation hypotheses
- New Idea: Use occurrence-based statistics:
- Extract instances of lexical, syntactic and semantic features from each translation hypothesis
- Determine whether these instances have been seen before (at least once) in a large monolingual corpus
- The Conjecture: more grammatical MT hypotheses are likely to contain higher proportions of feature instances that have been seen in a corpus of grammatical sentences.
- Goals:
- Find the set of features that provides the best discrimination between good and bad translations
- Learn how to combine these into a LM-like function for scoring alternative MT hypotheses
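The occurrence-based idea can be illustrated with the simplest kind of feature, word n-grams: compute the proportion of a hypothesis's n-grams that occur at least once in a monolingual corpus. This is only a toy sketch covering lexical features (the syntactic and semantic features mentioned above are not modeled), and `ngrams`, `seen_fraction`, and the two-sentence corpus are hypothetical.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def seen_fraction(hypothesis: str, corpus_ngrams: Set[Tuple[str, ...]], n: int) -> float:
    """Proportion of the hypothesis n-grams that were seen in the corpus."""
    grams = ngrams(hypothesis.split(), n)
    if not grams:
        return 0.0
    return sum(g in corpus_ngrams for g in grams) / len(grams)


# Tiny stand-in for a large monolingual corpus.
corpus = ["the lion ate the rabbit", "on the fourth day"]
corpus_bigrams = {g for sentence in corpus for g in ngrams(sentence.split(), 2)}

fluent = seen_fraction("the lion ate the rabbit", corpus_bigrams, 2)
scrambled = seen_fraction("lion the ate rabbit the", corpus_bigrams, 2)
```

Under the conjecture on the slide, the fluent hypothesis should receive the higher proportion; combining several such proportions into one LM-like score is the open goal listed above.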
36 Major Research Directions
- Building Elicitation Corpora
- Feature Detection
- Corpus Navigation
- Automatic Rule Refinement
- Translation for highly polysynthetic languages such as Mapudungun and Iñupiaq
37 Questions?