1
Statistical XFER: Hybrid Statistical Rule-based
Machine Translation
  • Alon Lavie
  • Language Technologies Institute
  • Carnegie Mellon University
  • Joint work with
  • Jaime Carbonell, Lori Levin, Bob Frederking, Erik
    Peterson, Christian Monson, Vamshi Ambati, Greg
    Hanneman, Kathrin Probst, Ariadna Font-Llitjos,
    Alison Alvarez, Roberto Aranovich

2
Outline
  • Background and Rationale
  • Stat-XFER Framework Overview
  • Elicitation
  • Learning Transfer Rules
  • Automatic Rule Refinement
  • Example Prototypes
  • Major Research Challenges

3
Progression of MT
  • Started with rule-based systems
  • Very large expert human effort to construct
    language-specific resources (grammars, lexicons)
  • High-quality MT extremely expensive → feasible
    only for a handful of language pairs
  • Along came EBMT and then Statistical MT
  • Replaced human effort with extremely large
    volumes of parallel text data
  • Less expensive, but still only feasible for a
    small number of language pairs
  • We traded human labor for data
  • Where does this take us in 5-10 years?
  • Large parallel corpora for maybe 25-50 language
    pairs
  • What about all the other languages?
  • Is all this data (with very shallow
    representation of language structure) really
    necessary?
  • Can we build MT approaches that learn deeper
    levels of language structure and how they map
    from one language to another?

4
Rule-based vs. Statistical MT
  • Traditional Rule-based MT
  • Expressive and linguistically-rich formalisms
    capable of describing complex mappings between
    the two languages
  • Accurate, clean resources
  • Everything constructed manually by experts
  • Main challenge: obtaining broad coverage
  • Phrase-based Statistical MT
  • Learn word and phrase correspondences
    automatically from large volumes of parallel data
  • Search-based decoding framework
  • Models propose many alternative translations
  • Effective search algorithms find the best
    translation
  • Main challenge: obtaining high translation
    accuracy

5
Main Principles of Stat-XFER
  • Integrate the major strengths of rule-based and
    statistical MT within a common framework
  • Linguistically rich formalism that can express
    complex and abstract compositional transfer rules
  • Rules can be written by human experts and also
    acquired automatically from data
  • Easy integration of morphological analyzers and
    generators
  • Word and basic phrase correspondences (e.g. base
    NPs) can be automatically acquired from parallel
    text when available
  • Search-based decoding from statistical MT adapted
    to find the best translation within the search
    space: multi-feature scoring, beam search,
    parameter optimization, etc.
  • Framework suitable for both resource-rich and
    resource-poor language scenarios

6
Stat-XFER MT Approach
  • [Figure: the MT pyramid. Interlingua at the apex;
    Semantic Analysis paired with Sentence Planning;
    Syntactic Parsing with Text Generation; Transfer
    Rules at the Statistical-XFER level; Direct SMT
    and EBMT at the base, mapping Source (e.g.
    Quechua) to Target (e.g. English)]
8
Transfer Rule Formalism
SL: the old man,  TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
  (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
  ((X1 AGR) = *3-SING)
  ((X1 DEF) = *DEF)
  ((X3 AGR) = *3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = *DEF)
  ((Y3 DEF) = *DEF)
  ((Y2 AGR) = *3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
)
  • Type information
  • Part-of-speech/constituent information
  • Alignments
  • x-side constraints
  • y-side constraints
  • xy-constraints,
  • e.g. ((Y1 AGR) = (X1 AGR))

9
Transfer Rule Formalism (II)
SL: the old man,  TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
  (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
  ((X1 AGR) = *3-SING)
  ((X1 DEF) = *DEF)
  ((X3 AGR) = *3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = *DEF)
  ((Y3 DEF) = *DEF)
  ((Y2 AGR) = *3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
)
  • Value constraints
  • Agreement constraints
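
As a concrete illustration of the formalism, here is a minimal
sketch in Python of how such a rule could be represented and its
two kinds of constraints checked. This is hypothetical code for
exposition, not the actual XFER engine; the dict-based feature
structures stand in for real unification.

NP_RULE = {
    "type": ("NP", "NP"),                      # x-side :: y-side category
    "x_side": ["DET", "ADJ", "N"],             # SL: the old man
    "y_side": ["DET", "N", "DET", "ADJ"],      # TL: ha-ish ha-zaqen
    "alignments": [(1, 1), (1, 3), (2, 4), (3, 2)],   # the Xi::Yj pairs
    "value_constraints": [                     # e.g. ((X1 AGR) = *3-SING)
        ("x", 1, "AGR", "3-SING"),
        ("x", 1, "DEF", "DEF"),
        ("y", 1, "DEF", "DEF"),
    ],
    "agreement_constraints": [                 # ((Y2 GENDER) = (Y4 GENDER))
        (("y", 2, "GENDER"), ("y", 4, "GENDER")),
    ],
}

def satisfies(rule, x_feats, y_feats):
    """x_feats / y_feats: one feature dict per constituent (1-indexed)."""
    sides = {"x": x_feats, "y": y_feats}
    for side, idx, feat, value in rule["value_constraints"]:
        if sides[side][idx - 1].get(feat) != value:
            return False   # a value constraint pins a feature to a constant
    for (s1, i1, f1), (s2, i2, f2) in rule["agreement_constraints"]:
        if sides[s1][i1 - 1].get(f1) != sides[s2][i2 - 1].get(f2):
            return False   # an agreement constraint equates two features
    return True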

10
Hebrew Manual Transfer Grammar (human-developed)
  • Initially developed in a couple of days, with
    some later revisions by a CL post-doc
  • Current grammar has 36 rules
  • 21 NP rules
  • 1 PP rule
  • 6 verb-complex and VP rules
  • 8 higher-phrase and sentence-level rules
  • Captures the most common (mostly local)
    structural differences between Hebrew and English

11
Hebrew Transfer Grammar: Example Rules
{NP1,2}
;; SL: $MLH ADWMH   TL: A RED DRESS
NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
(
  (X2::Y1) (X1::Y2)
  ((X1 def) = -)
  ((X1 status) =c absolute)
  ((X1 num) = (X2 num))
  ((X1 gen) = (X2 gen))
  (X0 = X1)
)

{NP1,3}
;; SL: H $MLWT H ADWMWT   TL: THE RED DRESSES
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
(
  (X3::Y1) (X1::Y2)
  ((X1 def) = +)
  ((X1 status) =c absolute)
  ((X1 num) = (X3 num))
  ((X1 gen) = (X3 gen))
  (X0 = X1)
)
12
The XFER Engine
  • Input: a source-language sentence, or a
    source-language confusion network
  • Output: a lattice representing the collection of
    translation fragments at all levels supported by
    the transfer rules
  • Basic algorithm: bottom-up integrated
    parsing-transfer-generation, guided by the
    transfer rules (see the sketch below)
  • Start with translations of individual words and
    phrases from translation lexicon
  • Create translations of larger constituents by
    applying applicable transfer rules to previously
    created lattice entries
  • Beam-search controls the exponential
    combinatorics of the search-space, using multiple
    scoring features
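
The loop described above can be sketched as follows (hypothetical
Python, heavily simplified: binary rules only, no feature
unification, no beam pruning; the entry and rule formats are
invented for the example):

def xfer(words, lexicon, rules):
    """words: SL tokens; lexicon: word -> [(category, translation, score)];
    rules: [(lhs_cat, (cat1, cat2), reorder, rule_score)] binary rules,
    where reorder is a permutation of (0, 1), e.g. (1, 0) to swap."""
    chart = []
    # Step 1: seed the lattice with word translations from the lexicon.
    for i, w in enumerate(words):
        for cat, trans, score in lexicon.get(w, []):
            chart.append((i, i + 1, cat, trans, score))
    # Step 2: build larger constituents by applying transfer rules to
    # adjacent entries until nothing new can be added (no beam here;
    # the real engine prunes the combinatorics with multiple features).
    added = True
    while added:
        added = False
        for (s1, e1, c1, t1, sc1) in list(chart):
            for (s2, e2, c2, t2, sc2) in list(chart):
                if e1 != s2:
                    continue
                for (lhs, rhs, reorder, rscore) in rules:
                    if rhs == (c1, c2):
                        parts = (t1, t2)
                        trans = " ".join(parts[j] for j in reorder)
                        entry = (s1, e2, lhs, trans, sc1 + sc2 + rscore)
                        if entry not in chart:
                            chart.append(entry)
                            added = True
    return chart   # all translation fragments, at all levels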

13
Source-language Confusion Network: Hebrew Example
  • Input word: B$WRH
  • 0    1    2    3    4
  • |-------B$WRH--------|
  • |--B--|--$WR---|--H--|
  • |--B--|--H--|--$WRH--|
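
Written out, the three competing analyses above look like this (a
hypothetical encoding for illustration; the engine actually consumes
them as arcs in a confusion network):

segmentations = [
    ["B$WRH"],            # unsegmented reading
    ["B", "$WR", "H"],    # three-way split
    ["B", "H", "$WRH"],   # split including the definite article H
]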

14
XFER Output Lattice
(28 28 "AND" -5.6988 "W" "(CONJ,0 'AND')") (29 29
"SINCE" -8.20817 "MAZ " "(ADVP,0 (ADV,5 'SINCE'))
") (29 29 "SINCE THEN" -12.0165 "MAZ " "(ADVP,0
(ADV,6 'SINCE THEN')) ") (29 29 "EVER SINCE"
-12.5564 "MAZ " "(ADVP,0 (ADV,4 'EVER SINCE'))
") (30 30 "WORKED" -10.9913 "BD " "(VERB,0 (V,11
'WORKED')) ") (30 30 "FUNCTIONED" -16.0023 "BD "
"(VERB,0 (V,10 'FUNCTIONED')) ") (30 30
"WORSHIPPED" -17.3393 "BD " "(VERB,0 (V,12
'WORSHIPPED')) ") (30 30 "SERVED" -11.5161 "BD "
"(VERB,0 (V,14 'SERVED')) ") (30 30 "SLAVE"
-13.9523 "BD " "(NP0,0 (N,34 'SLAVE')) ") (30 30
"BONDSMAN" -18.0325 "BD " "(NP0,0 (N,36
'BONDSMAN')) ") (30 30 "A SLAVE" -16.8671 "BD "
"(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0
(N,34 'SLAVE')) ) ) ) ") (30 30 "A BONDSMAN"
-21.0649 "BD " "(NP,1 (LITERAL 'A') (NP2,0
(NP1,0 (NP0,0 (N,36 'BONDSMAN')) ) ) ) ")
15
The Lattice Decoder
  • Simple Stack Decoder, similar in principle to
    simple Statistical MT decoders
  • Searches for best-scoring path of non-overlapping
    lattice arcs
  • No reordering during decoding
  • Scoring based on log-linear combination of
    scoring components, with weights trained using
    MERT
  • Scoring components
  • Statistical Language Model
  • Fragmentation: how many arcs are needed to cover
    the entire translation?
  • Length Penalty
  • Rule Scores
  • Lexical Probabilities
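
A minimal sketch of such a monotone decoder (hypothetical Python: a
single dynamic-programming pass stands in for the stack search, and
the feature names are invented; in the real system the weights would
be tuned with MERT):

import math

def decode(arcs, n, weights, lm_score):
    """arcs: (start, end, text, features) with features a dict of
    component scores (rule score, lexical prob, length penalty, ...);
    n: input length; lm_score: text -> LM log-prob (a stand-in; a
    real LM also scores across arc boundaries).
    Returns (best_score, best_translation)."""
    best = {0: (0.0, "")}   # best[i]: best hypothesis covering 0..i
    for i in range(n):
        if i not in best:
            continue
        base_score, base_text = best[i]
        for (s, e, text, feats) in arcs:
            if s != i:
                continue    # monotone: no reordering during decoding
            score = base_score + lm_score(text)
            score += sum(weights.get(k, 0.0) * v for k, v in feats.items())
            score += weights.get("frag", 0.0)   # one more arc used
            cand = (score, (base_text + " " + text).strip())
            if e not in best or cand[0] > best[e][0]:
                best[e] = cand
    return best.get(n, (-math.inf, ""))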

16
XFER Lattice Decoder
0 0 ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL
Overall: -8.18323, Prob: -94.382, Rules: 0, Frag: 0.153846,
Length: 0, Words: 13,13

235 <0 8 -19.7602 B H IWM RBI&I (PP,0 (PREP,3 'ON') (NP,2
  (LITERAL 'THE') (NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH'))
  (NP1,0 (NP0,1 (N,6 'DAY')))))))>
918 <8 14 -46.2973 H ARIH AKL AT H $PN (S,2 (NP,2 (LITERAL 'THE')
  (NP2,0 (NP1,0 (NP0,1 (N,17 'LION'))))) (VERB,0 (V,0 'ATE'))
  (NP,100 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1
  (N,24 'RABBIT')))))))>
584 <14 17 -30.6607 L ARWXH BWQR (PP,0 (PREP,6 'TO') (NP,1
  (LITERAL 'A') (NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING'))
  (NP0,0 (N,27 'MEAL')))))))>
17
Data Elicitation for Languages with Limited
Resources
  • Rationale
  • Large volumes of parallel text not available →
    create a small maximally-diverse parallel corpus
    that directly supports the learning task
  • Bilingual native informant(s) can translate and
    align a small pre-designed elicitation corpus,
    using elicitation tool
  • Elicitation corpus designed to be typologically
    and structurally comprehensive and compositional
  • Transfer-rule engine and new learning approach
    support acquisition of generalized transfer-rules
    from the data

18
Elicitation Tool: English-Chinese Example
19
Elicitation Tool: English-Chinese Example
20
Elicitation Tool: English-Hindi Example
21
Elicitation Tool: English-Arabic Example
22
Elicitation Tool: Spanish-Mapudungun Example
23
Designing Elicitation Corpora
  • Goal: Create a small representative parallel
    corpus that contains examples of the most
    important translation correspondences and
    divergences between the two languages
  • Method:
  • Elicit translations and word alignments for a
    broad diversity of linguistic phenomena and
    constructions
  • Current Elicitation Corpus: 3100 sentences and
    phrases, constructed based on a broad
    feature-based specification
  • Open Research Issues:
  • Feature Detection: discover what features exist
    in the language and where/how they are marked
  • Example: does the language mark gender on nouns?
    How and where is it marked?
  • Dynamic corpus navigation based on feature
    detection: no need to elicit for combinations
    involving non-existent features

24
Rule Learning - Overview
  • Goal: Acquire Syntactic Transfer Rules
  • Use available knowledge from the source side
    (grammatical structure)
  • Three steps:
  • Flat Seed Generation: first guesses at transfer
    rules; flat syntactic structure
  • Compositionality Learning: use previously learned
    rules to learn hierarchical structure
  • Constraint Learning: refine rules by learning
    appropriate feature constraints

25
Flat Seed Rule Generation
Learning Example: NP
  Eng: the big apple
  Heb: ha-tapuax ha-gadol

Generated Seed Rule:
  NP::NP [ART ADJ N] -> [ART N ART ADJ]
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
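
What the seed-generation step computes can be sketched as follows
(hypothetical code; the real learner also records the SL/TL lexical
items and handles unaligned words):

def seed_rule(src_tags, tgt_tags, alignments, category="NP"):
    """src_tags/tgt_tags: tag sequences for the aligned example;
    alignments: 1-indexed (i, j) word-alignment pairs."""
    return {
        "type": (category, category),
        "x_side": list(src_tags),      # flat: just the source tag string
        "y_side": list(tgt_tags),
        "alignments": sorted(alignments),   # copied from the example
    }

# The example above: "the big apple" -> "ha-tapuax ha-gadol"
rule = seed_rule(["ART", "ADJ", "N"],
                 ["ART", "N", "ART", "ADJ"],
                 [(1, 1), (1, 3), (2, 4), (3, 2)])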
26
Compositionality Learning
Initial Flat Rules:
  S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
  NP::NP [ART ADJ N] -> [ART N ART ADJ]
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
  NP::NP [ART N] -> [ART N]
  ((X1::Y1) (X2::Y2))

Generated Compositional Rule:
  S::S [NP V NP] -> [NP V P NP]
  ((X1::Y1) (X2::Y2) (X3::Y4))
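
The substitution at the heart of compositionality learning, sketched
in hypothetical Python (alignment remapping and the y-side are
omitted for brevity):

def compose(flat_x, sub_rules):
    """flat_x: a flat rule's x-side tags;
    sub_rules: {tuple(tags): category} for already-learned rules."""
    out, i = [], 0
    while i < len(flat_x):
        for span in range(len(flat_x) - i, 0, -1):   # longest match first
            chunk = tuple(flat_x[i:i + span])
            if span > 1 and chunk in sub_rules:
                out.append(sub_rules[chunk])   # fold the span into NP, etc.
                i += span
                break
        else:
            out.append(flat_x[i])              # keep the bare tag
            i += 1
    return out

# [ART ADJ N V ART N] with NP rules for [ART ADJ N] and [ART N]
# yields the hierarchical x-side [NP, V, NP], as in the slide.
x = compose(["ART", "ADJ", "N", "V", "ART", "N"],
            {("ART", "ADJ", "N"): "NP", ("ART", "N"): "NP"})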
27
Constraint Learning
Input: Rules and their Example Sets
  S::S [NP V NP] -> [NP V P NP]    {ex1, ex12, ex17, ex26}
  ((X1::Y1) (X2::Y2) (X3::Y4))
  NP::NP [ART ADJ N] -> [ART N ART ADJ]    {ex2, ex3, ex13}
  ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
  NP::NP [ART N] -> [ART N]    {ex4, ex5, ex6, ex8, ex10, ex11}
  ((X1::Y1) (X2::Y2))

Output: Rules with Feature Constraints
  S::S [NP V NP] -> [NP V P NP]
  ((X1::Y1) (X2::Y2) (X3::Y4)
   ((X1 NUM) = (X2 NUM))
   ((Y1 NUM) = (Y2 NUM))
   ((X1 NUM) = (Y1 NUM)))
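
A bare-bones sketch of the induction idea (hypothetical code):
propose an agreement constraint only if it holds in every example
the rule was learned from. The real learner also considers value
constraints and scores candidate constraints.

def induce_agreement(examples, pairs, feature="NUM"):
    """examples: feature assignments per example, e.g.
    {"X1": {"NUM": "sg"}, "X2": {"NUM": "sg"}, ...};
    pairs: candidate constituent pairs, e.g. [("X1", "X2")]."""
    constraints = []
    for a, b in pairs:
        values = [(ex.get(a, {}).get(feature), ex.get(b, {}).get(feature))
                  for ex in examples]
        # Propose ((a NUM) = (b NUM)) only if the values are present
        # and identical in every example.
        if values and all(va is not None and va == vb for va, vb in values):
            constraints.append(((a, feature), "=", (b, feature)))
    return constraints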
28
Automated Rule Refinement
  • Bilingual informants can identify translation
    errors and pinpoint their location
  • A sophisticated trace of the translation path can
    identify likely sources for the error and perform
    Blame Assignment
  • Rule Refinement operators can be developed to
    modify the underlying translation grammar (and
    lexicon) based on characteristics of the error
    source
  • Add or delete feature constraints from a rule
  • Bifurcate a rule into two rules (general and
    specific)
  • Add or correct lexical entries
  • See Font-Llitjos, Carbonell &amp; Lavie, 2005
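
As an illustration, the bifurcation operator might look like this
(hypothetical code, reusing the dict-style rule representation from
the formalism sketch earlier):

import copy

def bifurcate(rule, new_constraint):
    # Keep a general variant untouched, and add the constraint
    # suggested by the informant's correction only to a specific
    # variant, so the fix cannot damage the contexts in which the
    # original rule already works.
    general = copy.deepcopy(rule)
    specific = copy.deepcopy(rule)
    specific["value_constraints"] = (
        specific.get("value_constraints", []) + [new_constraint])
    specific["name"] = rule.get("name", "rule") + "-specific"
    return general, specific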

29
Stat-XFER MT Prototypes
  • General Statistical XFER framework under
    development for the past five years (funded by
    NSF and DARPA)
  • Prototype systems so far
  • Chinese-to-English
  • Dutch-to-English
  • French-to-English
  • Hindi-to-English
  • Hebrew-to-English
  • Mapudungun-to-Spanish
  • In progress or planned
  • Brazilian Portuguese-to-English
  • Native-Brazilian languages to Brazilian
    Portuguese
  • Hebrew-to-Arabic
  • Iñupiaq-to-English
  • Urdu-to-English
  • Turkish-to-English

30
Chinese-English Stat-XFER System
  • Bilingual lexicon: over 1.1 million entries
    (multiple resources, incl. ADSO, Wikipedia,
    extracted base NPs)
  • Manual syntactic XFER grammar: 76 rules! (mostly
    NPs, a few PPs, and reordering of NPs/PPs within
    VPs)
  • Multiple overlapping Chinese word segmentations
  • English morphology generation
  • Uses the CMU SMT group's Suffix-Array LM toolkit
    for the LM
  • Current Performance (GALE dev-test), BLEU (B) /
    METEOR (M) scores:
  • Newswire (NW):
  • XFER: 10.89 (B) / 0.4509 (M)
  • Best (UMD): 15.58 (B) / 0.4769 (M)
  • Newsgroups (NG):
  • XFER: 8.92 (B) / 0.4229 (M)
  • Best (UMD): 12.96 (B) / 0.4455 (M)
  • In Progress
  • Automatic extraction of clean base NPs from
    parallel data
  • Automatic learning and extraction of high-quality
    transfer-rules from parallel data

31
Translation Example
  • REFERENCE When responding to whether it is
    possible to extend Russian fleet's stationing
    deadline at the Crimean peninsula, Yanukovych
    replied, "Without a doubt.
  • Stat-XFER (0.3989) In reply to whether the
    possibility to extend the Russian fleet stationed
    in Crimea Pen. left the deadline of the problem ,
    Yanukovich replied " of course .
  • IBM-ylee (0.2203) In response to the
    possibility to extend the deadline for the
    presence in Crimea peninsula , the Queen Vic said
    " of course .
  • CMU-SMT (0.2067) In response to a possible
    extension of the fleet in the Crimean Peninsula
    stay on the issue , Yanukovych vetch replied "
    of course .
  • maryland-hiero (0.1878) In response to the
    possibility of extending the mandate of the
    Crimean peninsula in , replied "of course.
  • IBM-smt (0.1862) The answer is likely to be
    extended the Crimean peninsula of the presence of
    the problem, Yanukovych said " Of course.
  • CMU-syntax (0.1639) In response to the
    possibility of extension of the presence in the
    Crimean Peninsula , replied " of course .

32
Major Research Directions
  • Automatic Transfer Rule Learning
  • From manually word-aligned elicitation corpus
  • From large volumes of automatically word-aligned
    "wild" parallel data
  • In the absence of morphology or POS-annotated
    lexica
  • Compositionality and generalization
  • Identifying good rules from bad rules
  • Effective models for rule scoring, for:
  • Decoding using scores at runtime
  • Pruning the large collections of learned rules
  • Learning Unification Constraints

33
Major Research Directions
  • Extraction of Base-NP translations from parallel
    data
  • Base-NPs are extremely important "building
    blocks" for transfer-based MT systems
  • Frequent, often align 1-to-1, improve coverage
  • Correctly identifying them greatly helps
    automatic word-alignment of parallel sentences
  • Parsers (or NP-chunkers) available for both
    languages: extract base-NPs independently on
    both sides and find their correspondences
  • Parsers (or NP-chunkers) available for only one
    language (e.g. English): extract base-NPs on one
    side, and find reliable correspondences for them
    using word-alignment, frequency distributions and
    other features
  • Promising preliminary results
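
For the one-parser case, the projection step can be sketched as
follows (hypothetical Python; the standard phrase-extraction
consistency check stands in for the richer correspondence features
mentioned above):

def project_base_nps(en_nps, alignment):
    """en_nps: (start, end) English token spans from an NP-chunker;
    alignment: set of (en_i, src_j) word-alignment links.
    Returns (english_span, source_span) base-NP pairs."""
    pairs = []
    for (s, e) in en_nps:
        src = [j for (i, j) in alignment if s <= i < e]
        if not src:
            continue
        lo, hi = min(src), max(src) + 1
        # Consistency check: no source word inside the projected span
        # may align to an English word outside the base NP.
        if all(s <= i < e for (i, j) in alignment if lo <= j < hi):
            pairs.append(((s, e), (lo, hi)))
    return pairs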

34
Major Research Directions
  • Algorithms for XFER and Decoding
  • Integration and optimization of multiple features
    into search-based XFER parser
  • Complexity and efficiency improvements (e.g.
    cube pruning)
  • Non-monotonicity issues (LM scores, unification
    constraints) and their consequences on search

35
Major Research Directions
  • Discriminative Language Modeling for MT
  • Current standard statistical LMs provide only
    weak discrimination between good and bad
    translation hypotheses
  • New Idea: Use occurrence-based statistics
  • Extract instances of lexical, syntactic and
    semantic features from each translation
    hypothesis
  • Determine whether these instances have been seen
    before (at least once) in a large monolingual
    corpus
  • The Conjecture: more grammatical MT hypotheses
    are likely to contain higher proportions of
    feature instances that have been seen in a corpus
    of grammatical sentences.
  • Goals
  • Find the set of features that provides the best
    discrimination between good and bad translations
  • Learn how to combine these into a LM-like
    function for scoring alternative MT hypotheses
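
A toy sketch of this conjecture as a scoring function (hypothetical
code; word n-grams stand in for the lexical, syntactic and semantic
feature instances described above):

def seen_proportion(hypothesis, seen_instances, orders=(2, 3)):
    """hypothesis: list of tokens; seen_instances: set of n-gram
    tuples harvested once from a large monolingual corpus."""
    instances = [tuple(hypothesis[i:i + n])
                 for n in orders
                 for i in range(len(hypothesis) - n + 1)]
    if not instances:
        return 0.0
    # Score = fraction of feature instances attested at least once
    # in the corpus of grammatical sentences.
    hits = sum(1 for inst in instances if inst in seen_instances)
    return hits / len(instances)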

36
Major Research Directions
  • Building Elicitation Corpora
  • Feature Detection
  • Corpus Navigation
  • Automatic Rule Refinement
  • Translation for highly polysynthetic languages
    such as Mapudungun and Iñupiaq

37
Questions?