MT for Languages with Limited Resources
Provided by: AlonL
1
MT for Languages with Limited Resources
  • 11-731
  • Machine Translation
  • April 20, 2009
  • Joint Work with Lori Levin, Jaime Carbonell,
    Stephan Vogel, Shuly Wintner, Danny Shacham,
    Katharina Probst, Erik Peterson, Christian
    Monson, Roberto Aranovich and Ariadna Font-Llitjos

2
Why Machine Translation for Minority and
Indigenous Languages?
  • Commercial MT economically feasible for only a
    handful of major languages with large resources
    (corpora, human developers)
  • Is there hope for MT for languages with limited
    resources?
  • Benefits include
  • Better government access to indigenous
    communities (Epidemics, crop failures, etc.)
  • Better participation of indigenous communities
    in information-rich activities (health care,
    education, government) without giving up their
    languages
  • Language preservation
  • Civilian and military applications (disaster
    relief)

3
MT for Minority and Indigenous Languages
Challenges
  • Minimal amount of parallel text
  • Possibly competing standards for
    orthography/spelling
  • Often relatively few trained linguists
  • Access to native informants possible
  • Need to minimize development time and cost

4
MT for Low Resource Languages
  • Possible Approaches
  • Phrase-based SMT, with whatever small amount of
    parallel data is available
  • Build a rule-based system: requires bilingual
    experts and resources
  • The AVENUE approach
  • Incorporate acquired manual resources within a
    general statistical framework
  • Augment with targeted elicitation and resource
    acquisition from bilingual non-experts

5
CMU Statistical Transfer (Stat-XFER) MT Approach
  • Integrate the major strengths of rule-based and
    statistical MT within a common framework
  • Linguistically rich formalism that can express
    complex and abstract compositional transfer rules
  • Rules can be written by human experts and also
    acquired automatically from data
  • Easy integration of morphological analyzers and
    generators
  • Word and syntactic-phrase correspondences can be
    automatically acquired from parallel text
  • Search-based decoding from statistical MT adapted
    to find the best translation within the search
    space: multi-feature scoring, beam-search,
    parameter optimization, etc.
  • Framework suitable for both resource-rich and
    resource-poor language scenarios

6
Stat-XFER Main Principles
  • Framework: statistical search-based approach with
    syntactic translation transfer rules that can be
    acquired from data but also developed and
    extended by experts
  • Automatic word and phrase translation lexicon
    acquisition from parallel data
  • Transfer-rule Learning: apply ML-based methods to
    automatically acquire syntactic transfer rules
    for translation between the two languages
  • Elicitation: use bilingual native informants to
    produce a small high-quality word-aligned
    bilingual corpus of translated phrases and
    sentences
  • Rule Refinement: refine the acquired rules via a
    process of interaction with bilingual informants
  • XFER Decoder
  • XFER engine produces a lattice of possible
    transferred structures at all levels
  • Decoder searches and selects the best scoring
    combination

7
Stat-XFER Framework
Source Input
8
9
Transfer Rule Formalism
; SL: the old man, TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
 (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
 ((X1 AGR) = *3-SING)
 ((X1 DEF) = *DEF)
 ((X3 AGR) = *3-SING)
 ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF)
 ((Y3 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y4 GENDER))
)
  • Type information
  • Part-of-speech/constituent information
  • Alignments
  • x-side constraints
  • y-side constraints
  • xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))
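As an illustration of how such a rule can be represented programmatically, here is a small sketch (hypothetical names, not the actual Stat-XFER implementation); the alignments alone already determine the reordering of "the old man" into the Hebrew "the man the old" pattern.

```python
# Hypothetical data-structure sketch of the DET-ADJ-N rule (invented names).

TRANSFER_RULE = {
    "type": ("NP", "NP"),                    # source/target constituent type
    "x_side": ["DET", "ADJ", "N"],           # source c-structure
    "y_side": ["DET", "N", "DET", "ADJ"],    # target c-structure
    "alignments": [(1, 1), (1, 3), (2, 4), (3, 2)],  # X-to-Y index pairs (1-based)
    "x_constraints": [(("X1", "AGR"), "3-SING"), (("X3", "COUNT"), "+")],
    "y_constraints": [(("Y1", "DEF"), "DEF"), (("Y3", "DEF"), "DEF")],
}

def reorder(x_words, rule):
    """Place each aligned source word in its target-side slot."""
    y = [None] * len(rule["y_side"])
    for xi, yi in rule["alignments"]:
        y[yi - 1] = x_words[xi - 1]
    return y

print(reorder(["the", "old", "man"], TRANSFER_RULE))
# ['the', 'man', 'the', 'old']  -- the 'the man the old' pattern
```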

10
Transfer Rule Formalism (II)
; SL: the old man, TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
 (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
 ((X1 AGR) = *3-SING)
 ((X1 DEF) = *DEF)
 ((X3 AGR) = *3-SING)
 ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF)
 ((Y3 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y4 GENDER))
)
  • Value constraints
  • Agreement constraints
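As a minimal sketch of the distinction (dict-based feature structures, invented helper name): a value constraint compares a feature to a constant, while an agreement constraint compares two features against each other.

```python
# Toy evaluation of the two constraint kinds over feature structures
# represented as plain dicts (illustrative only).

def check(constraint, fs):
    """fs maps node names ('X1', 'X2', ...) to feature dicts."""
    (node_a, feat_a), rhs = constraint
    left = fs[node_a].get(feat_a)
    if isinstance(rhs, tuple):            # agreement: ((X1 num) = (X2 num))
        node_b, feat_b = rhs
        return left == fs[node_b].get(feat_b)
    return left == rhs                    # value: ((X1 def) = -)

fs = {"X1": {"def": "-", "num": "p", "gen": "f"},
      "X2": {"num": "p", "gen": "f"}}

print(check((("X1", "def"), "-"), fs),           # value constraint holds
      check((("X1", "num"), ("X2", "num")), fs), # agreement constraint holds
      check((("X1", "gen"), "m"), fs))           # value constraint fails
# True True False
```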

11
Translation Lexicon: Hebrew-to-English Examples (Semi-manually developed)

PRO::PRO ["ANI"] -> ["I"]
(
 (X1::Y1)
 ((X0 per) = 1)
 ((X0 num) = s)
 ((X0 case) = nom)
)

PRO::PRO ["ATH"] -> ["you"]
(
 (X1::Y1)
 ((X0 per) = 2)
 ((X0 num) = s)
 ((X0 gen) = m)
 ((X0 case) = nom)
)

N::N ["$&H"] -> ["HOUR"]
(
 (X1::Y1)
 ((X0 NUM) = s)
 ((Y0 NUM) = s)
 ((Y0 lex) = "HOUR")
)

N::N ["$&H"] -> ["hours"]
(
 (X1::Y1)
 ((Y0 NUM) = p)
 ((X0 NUM) = p)
 ((Y0 lex) = "HOUR")
)
12
Translation Lexicon: French-to-English Examples (Automatically acquired)

DET::DET ["le"] -> ["the"]
( (X1::Y1) )

Prep::Prep ["dans"] -> ["in"]
( (X1::Y1) )

N::N ["principes"] -> ["principles"]
( (X1::Y1) )

N::N ["respect"] -> ["accordance"]
( (X1::Y1) )

NP::NP ["le respect"] -> ["accordance"]
( )

PP::PP ["dans le respect"] -> ["in accordance"]
( )

PP::PP ["des principes"] -> ["with the principles"]
( )
13
Hebrew-English Transfer Grammar: Example Rules (Manually developed)

{NP1,2}
; SL: $MLH ADWMH, TL: A RED DRESS
NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
(
 (X2::Y1) (X1::Y2)
 ((X1 def) = -)
 ((X1 status) =c absolute)
 ((X1 num) = (X2 num))
 ((X1 gen) = (X2 gen))
 (X0 = X1)
)

{NP1,3}
; SL: H $MLWT H ADWMWT, TL: THE RED DRESSES
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
(
 (X3::Y1) (X1::Y2)
 ((X1 def) = +)
 ((X1 status) =c absolute)
 ((X1 num) = (X3 num))
 ((X1 gen) = (X3 gen))
 (X0 = X1)
)
14
French-English Transfer Grammar: Example Rules (Automatically acquired)

{PP,24691}
; SL: des principes, TL: with the principles
PP::PP ["des" N] -> ["with" "the" N]
( (X1::Y1) )

{PP,312}
; SL: dans le respect des principes, TL: in accordance with the principles
PP::PP [Prep NP] -> [Prep NP]
( (X1::Y1) (X2::Y2) )
15
The Transfer Engine
  • Input: source-language input sentence, or
    source-language confusion network
  • Output: lattice representing the collection of
    translation fragments at all levels supported by
    transfer rules
  • Basic Algorithm: bottom-up integrated
    parsing-transfer-generation chart-parser guided
    by the synchronous transfer rules
  • Start with translations of individual words and
    phrases from translation lexicon
  • Create translations of larger constituents by
    applying applicable transfer rules to previously
    created lattice entries
  • Beam-search controls the exponential
    combinatorics of the search-space, using multiple
    scoring features
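The basic algorithm above can be sketched as follows, under strong simplifications (toy lexicon, one constraint-free rule, single-word children only; all names are invented, not the actual engine):

```python
# Simplified sketch of the bottom-up parse/transfer loop: seed a chart with
# lexical translations, then combine adjacent entries via a transfer rule.
from itertools import product

LEXICON = {"dans": ("Prep", "in"), "le": ("DET", "the"),
           "respect": ("N", "accordance")}
# rule: (source child categories) -> target category, builder over child outputs
# (mirrors the acquired phrase pair "dans le respect" -> "in accordance")
RULES = [(("Prep", "DET", "N"), "PP", lambda c: f"{c[0]} {c[2]}")]

def translate(words):
    # chart[(i, j)] holds (category, translation) items for span words[i:j]
    chart = {(i, i + 1): [LEXICON[w]] for i, w in enumerate(words)}
    n = len(words)
    for rhs, lhs, build in RULES:
        k = len(rhs)
        for i in range(n - k + 1):
            kids = [chart.get((i + o, i + o + 1), []) for o in range(k)]
            for combo in product(*kids):
                if tuple(cat for cat, _ in combo) == rhs:
                    chart.setdefault((i, i + k), []).append(
                        (lhs, build([t for _, t in combo])))
    return chart

chart = translate(["dans", "le", "respect"])
print(chart[(0, 3)])   # the PP covering the whole input
```

The real engine additionally applies unification constraints, handles multi-word constituents recursively, and scores competing entries.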

16
The Transfer Engine
  • Some Unique Features
  • Works with either learned or manually-developed
    transfer grammars
  • Handles rules with or without unification
    constraints
  • Supports interfacing with servers for
    morphological analysis and generation
  • Can handle ambiguous source-word analyses and/or
    SL segmentations represented in the form of
    lattice structures

17
Hebrew Example (From Lavie et al., 2004)
  • Input word: B$WRH
  • 0 1 2 3 4
  • |--------B$WRH--------|
  • |--B--|--$WR--|--H--|
  • |--B--|--H--|--$WRH--|
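A sketch of how such ambiguous segmentations can be enumerated against a toy prefix/stem lexicon ("$" stands for shin in the transliteration assumed here; the covert definite-article reading requires a real analyzer and is omitted from this character-level toy):

```python
# Toy enumeration of prefix+stem segmentations of a bare Hebrew word,
# as spans over the input (invented mini-lexicon, illustrative only).

PREFIXES = {"B", "H"}                  # preposition B-, definite article H-
STEMS = {"B$WRH", "$WR", "$WRH", "H"}  # toy stems / suffixed forms

def segmentations(word, start=0, prefix_ok=True):
    """All ways to split word[start:] into prefixes followed by stem material."""
    if start == len(word):
        return [[]]
    out = []
    for end in range(start + 1, len(word) + 1):
        piece = word[start:end]
        if prefix_ok and piece in PREFIXES:
            out += [[(piece, start, end)] + rest
                    for rest in segmentations(word, end, True)]
        if piece in STEMS:
            out += [[(piece, start, end)] + rest
                    for rest in segmentations(word, end, False)]
    return out

for seg in segmentations("B$WRH"):
    print(seg)
```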

18
Hebrew Example (From Lavie et al., 2004)
  • Y0: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
  • Y1: ((SPANSTART 0) (SPANEND 2) (LEX B) (POS PREP))
  • Y2: ((SPANSTART 1) (SPANEND 3) (LEX $WR) (POS N) (GEN M) (NUM S) (STATUS ABSOLUTE))
  • Y3: ((SPANSTART 3) (SPANEND 4) (LEX $LH) (POS POSS))
  • Y4: ((SPANSTART 0) (SPANEND 1) (LEX B) (POS PREP))
  • Y5: ((SPANSTART 1) (SPANEND 2) (LEX H) (POS DET))
  • Y6: ((SPANSTART 2) (SPANEND 4) (LEX $WRH) (POS N) (GEN F) (NUM S))
  • Y7: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS LEX))
19
XFER Output Lattice
(28 28 "AND" -5.6988 "W" "(CONJ,0 'AND')")
(29 29 "SINCE" -8.20817 "MAZ" "(ADVP,0 (ADV,5 'SINCE'))")
(29 29 "SINCE THEN" -12.0165 "MAZ" "(ADVP,0 (ADV,6 'SINCE THEN'))")
(29 29 "EVER SINCE" -12.5564 "MAZ" "(ADVP,0 (ADV,4 'EVER SINCE'))")
(30 30 "WORKED" -10.9913 "&BD" "(VERB,0 (V,11 'WORKED'))")
(30 30 "FUNCTIONED" -16.0023 "&BD" "(VERB,0 (V,10 'FUNCTIONED'))")
(30 30 "WORSHIPPED" -17.3393 "&BD" "(VERB,0 (V,12 'WORSHIPPED'))")
(30 30 "SERVED" -11.5161 "&BD" "(VERB,0 (V,14 'SERVED'))")
(30 30 "SLAVE" -13.9523 "&BD" "(NP0,0 (N,34 'SLAVE'))")
(30 30 "BONDSMAN" -18.0325 "&BD" "(NP0,0 (N,36 'BONDSMAN'))")
(30 30 "A SLAVE" -16.8671 "&BD" "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,34 'SLAVE')))))")
(30 30 "A BONDSMAN" -21.0649 "&BD" "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,36 'BONDSMAN')))))")
20
The Lattice Decoder
  • Stack Decoder, similar to standard Statistical MT
    decoders
  • Searches for best-scoring path of non-overlapping
    lattice arcs
  • No reordering during decoding
  • Scoring based on log-linear combination of
    scoring features, with weights trained using
    Minimum Error Rate Training (MERT)
  • Scoring components:
  • Statistical Language Model
  • Bi-directional MLE phrase and rule scores
  • Lexical Probabilities
  • Fragmentation: how many arcs to cover the entire
    translation?
  • Length Penalty: how far from expected target
    length?
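The path score can be sketched as a weighted sum of log-domain features (feature names and weights invented for illustration; the real decoder uses a trained language model and MERT-tuned weights):

```python
# Log-linear scoring of one decoder path over non-overlapping lattice arcs
# (toy feature set; illustrative only).

def score_path(arcs, weights, expected_len):
    """Weighted sum of log-domain feature scores for one lattice path."""
    words = [w for arc in arcs for w in arc["words"]]
    feats = {
        "lm": sum(arc["lm_logprob"] for arc in arcs),      # stand-in for a real LM
        "rule": sum(arc["rule_logprob"] for arc in arcs),  # phrase/rule scores
        "frag": -len(arcs),                                # fragmentation: fewer arcs is better
        "length": -abs(len(words) - expected_len),         # length penalty
    }
    return sum(weights[k] * v for k, v in feats.items())

arcs = [{"words": ["the", "decision"], "lm_logprob": -4.1, "rule_logprob": -1.2},
        {"words": ["of", "the", "president"], "lm_logprob": -6.0, "rule_logprob": -0.7}]
weights = {"lm": 1.0, "rule": 0.5, "frag": 0.3, "length": 0.2}
print(round(score_path(arcs, weights, expected_len=5), 3))   # -11.65
```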

21
XFER Lattice Decoder
0 0 ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL
Overall: -8.18323, Prob: -94.382, Rules: 0, Frag: 0.153846, Length: 0, Words: 13,13
235 < 0 8 -19.7602 B H IWM RBI&I (PP,0 (PREP,3 'ON') (NP,2 (LITERAL 'THE') (NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH')) (NP1,0 (NP0,1 (N,6 'DAY'))))))) >
918 < 8 14 -46.2973 H ARIH AKL AT H $PN (S,2 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,17 'LION'))))) (VERB,0 (V,0 'ATE')) (NP,100 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,24 'RABBIT'))))))) >
584 < 14 17 -30.6607 L ARWXH BWQR (PP,0 (PREP,6 'TO') (NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING')) (NP0,0 (N,27 'MEAL'))))))) >
22
Stat-XFER MT Systems
  • General Stat-XFER framework under development for
    the past seven years
  • Systems so far
  • Chinese-to-English
  • French-to-English
  • Hebrew-to-English
  • Urdu-to-English
  • German-to-English
  • Hindi-to-English
  • Dutch-to-English
  • Turkish-to-English
  • Mapudungun-to-Spanish
  • In progress or planned
  • Arabic-to-English
  • Brazilian Portuguese-to-English
  • English-to-Arabic
  • Hebrew-to-Arabic

23
Learning Transfer-Rules for Languages with
Limited Resources
  • Rationale
  • Large bilingual corpora not available
  • Bilingual native informant(s) can translate and
    align a small pre-designed elicitation corpus,
    using elicitation tool
  • Elicitation corpus designed to be typologically
    comprehensive and compositional
  • Transfer-rule engine and rule learning approach
    support acquisition of generalized transfer-rules
    from the data

24
English-Chinese Example
25
English-Hindi Example
26
Spanish-Mapudungun Example
27
English-Arabic Example
28
The Typological Elicitation Corpus
  • Translated, aligned by bilingual informant
  • Corpus consists of linguistically diverse
    constructions
  • Based on elicitation and documentation work of
    field linguists (e.g. Comrie 1977, Bouquiaux
    1992)
  • Organized compositionally: elicit simple
    structures first, then use them as building
    blocks
  • Goal: minimize size, maximize linguistic coverage

29
The Structural Elicitation Corpus
  • Designed to cover the most common phrase
    structures of English → learn how these
    structures map onto their equivalents in other
    languages
  • Constructed using the constituent parse trees
    from the Penn TreeBank
  • Extracted and frequency-ranked all rules in parse
    trees
  • Selected top 200 rules, filtered idiosyncratic
    cases
  • Revised lexical choices within examples
  • Goal: minimize size, maximize linguistic coverage
    of structures
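The extraction-and-ranking step can be sketched as counting CFG productions over constituent trees (toy trees here, not the actual Penn TreeBank):

```python
# Count and rank CFG productions extracted from constituent parse trees.
from collections import Counter

def productions(tree):
    """tree = (label, children); children are subtrees or word strings."""
    label, children = tree
    if all(isinstance(c, str) for c in children):  # preterminal: skip lexical rules
        return []
    rules = [(label, tuple(c[0] for c in children))]
    for c in children:
        rules += productions(c)
    return rules

trees = [
    ("S", [("NP", [("DET", ["the"]), ("N", ["boy"])]),
           ("VP", [("V", ["ate"]), ("NP", [("DET", ["the"]), ("N", ["apple"])])])]),
    ("NP", [("DET", ["the"]), ("N", ["forest"])]),
]
counts = Counter(r for t in trees for r in productions(t))
print(counts.most_common(1))   # the NP -> DET N production is most frequent
```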

30
The Structural Elicitation Corpus
Examples:

srcsent: in the forest
tgtsent: B H I&R
aligned: ((1,1),(2,2),(3,3))
context:
C-Structure: (<PP> (PREP in-1) (<NP> (DET the-2) (N forest-3)))

srcsent: steps
tgtsent: MDRGWT
aligned: ((1,1))
context:
C-Structure: (<NP> (N steps-1))

srcsent: the boy ate the apple
tgtsent: H ILD AKL AT H TPWX
aligned: ((1,1),(2,2),(3,3),(4,5),(5,6))
context:
C-Structure: (<S> (<NP> (DET the-1) (N boy-2)) (<VP> (V ate-3) (<NP> (DET the-4) (N apple-5))))

srcsent: the first year
tgtsent: H $NH H RA$WNH
aligned: ((1,1 3),(2,4),(3,2))
context:
C-Structure: (<NP> (DET the-1) (<ADJP> (ADJ first-2)) (N year-3))
31
A Limited Data Scenario for Hindi-to-English
  • Conducted during a DARPA Surprise Language
    Exercise (SLE) in June 2003
  • Put together a scenario with miserly data
    resources
  • Elicited Data corpus: 17,589 phrases
  • Cleaned portion (top 12) of LDC dictionary:
    2,725 Hindi words (23,612 translation pairs)
  • Manually acquired resources during the SLE:
  • 500 manual bigram translations
  • 72 manually written phrase transfer rules
  • 105 manually written postposition rules
  • 48 manually written time expression rules
  • No additional parallel text!!

32
Examples of Learned Rules (Hindi-to-English)
33
Manual Transfer Rules: Hindi Example
; PASSIVE OF SIMPLE PAST (NO AUX) WITH LIGHT VERB
; passive of 43 (7b)
{VP,28}
VP::VP [V V V] -> [Aux V]
(
 (X1::Y2)
 ((x1 form) = root)
 ((x2 type) =c light)
 ((x2 form) = part)
 ((x2 aspect) = perf)
 ((x3 lexwx) = 'jAnA')
 ((x3 form) = part)
 ((x3 aspect) = perf)
 (x0 = x1)
 ((y1 lex) = be)
 ((y1 tense) = past)
 ((y1 agr num) = (x3 agr num))
 ((y1 agr pers) = (x3 agr pers))
 ((y2 form) = part)
)
34
Manual Transfer Rules: Example
; NP1 ke NP2 -> NP2 of NP1
; Ex: jIvana ke eka aXyAya
;     life   of (one) chapter
;     => a chapter of life

{NP,12}
NP::NP [PP NP1] -> [NP1 PP]
(
 (X1::Y2) (X2::Y1)
 ((x2 lexwx) = 'kA')
)

{NP,13}
NP::NP [NP1] -> [NP1]
( (X1::Y1) )

{PP,12}
PP::PP [NP Postp] -> [Prep NP]
( (X1::Y2) (X2::Y1) )
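At the string level, the effect of these rules on the example can be sketched with a hypothetical helper (the real system reorders parsed constituents, not flat token lists):

```python
# Toy illustration of the 'NP1 ke NP2 -> NP2 of NP1' reordering.

def reorder_possessive(tokens):
    """Rewrite [NP1..., 'ke', NP2...] as [NP2..., 'of', NP1...]."""
    if "ke" not in tokens:
        return tokens
    i = tokens.index("ke")
    np1, np2 = tokens[:i], tokens[i + 1:]
    return np2 + ["of"] + np1

print(" ".join(reorder_possessive("jIvana ke eka aXyAya".split())))
# eka aXyAya of jIvana  ("a chapter of life")
```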
35
Manual Grammar Development
  • Covers mostly NPs, PPs and VPs (verb complexes)
  • 70 grammar rules, covering basic and recursive
    NPs and PPs, verb complexes of main tenses in
    Hindi (developed in two weeks)

36
Testing Conditions
  • Tested on a section of the JHU-provided data: 258
    sentences with four reference translations
  • SMT system (stand-alone)
  • EBMT system (stand-alone)
  • XFER system (naïve decoding)
  • XFER system with strong decoder
  • No grammar rules (baseline)
  • Manually developed grammar rules
  • Automatically learned grammar rules
  • XFER+SMT with strong decoder (MEMT)

37
Results on JHU Test Set
38
Effect of Reordering in the Decoder

39
Observations and Lessons (I)
  • XFER with strong decoder outperformed SMT even
    without any grammar rules in the miserly data
    scenario
  • SMT trained on elicited phrases that are very
    short
  • SMT has insufficient data to train more
    discriminative translation probabilities
  • XFER takes advantage of morphology
  • Token coverage without morphology: 0.6989
  • Token coverage with morphology: 0.7892
  • Manual grammar was somewhat better than
    automatically learned grammar
  • Learned rules were very simple
  • Large room for improvement on learning rules
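The token-coverage figures correspond to the fraction of input tokens with at least one translation; a sketch with an invented mini-lexicon and toy analyzer:

```python
# Token coverage: fraction of corpus tokens the system can translate.
# With morphology, surface forms are first mapped to base forms (toy data).

LEXICON = {"ghar", "bacca", "khana"}                 # base forms with translations
ANALYZER = {"gharoM": "ghar", "bacce": "bacca"}      # surface -> lexeme (invented)

def coverage(tokens, use_morphology):
    def known(tok):
        if tok in LEXICON:
            return True
        return use_morphology and ANALYZER.get(tok) in LEXICON
    return sum(known(t) for t in tokens) / len(tokens)

toks = ["gharoM", "ghar", "bacce", "khana", "kitab"]
print(coverage(toks, False), coverage(toks, True))   # 0.4 0.8
```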

40
Observations and Lessons (II)
  • MEMT (XFER and SMT) based on strong decoder
    produced best results in the miserly scenario.
  • Reordering within the decoder provided very
    significant score improvements
  • Much room for more sophisticated grammar rules
  • Strong decoder can carry some of the reordering
    burden

41
Modern Hebrew
  • Native language of about 3-4 million people in
    Israel
  • Semitic language, closely related to Arabic and
    with similar linguistic properties
  • Root+Pattern word formation system
  • Rich verb and noun morphology
  • Particles attach as prefixes to the following
    word: definite article (H), prepositions
    (B, K, L, M), coordinating conjunction (W),
    relativizers ($, K)
  • Unique alphabet and writing system
  • 22 letters represent (mostly) consonants
  • Vowels represented (mostly) by diacritics
  • Modern texts omit the diacritic vowels, adding a
    further level of ambiguity: bare word → word
  • Example: MHGR → mehager, mhagar, mhger
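The bare-word ambiguity can be illustrated with a toy dictionary of voweled readings:

```python
# One undiacritized consonantal string maps to several voweled words
# (invented mini-dictionary, Latin transliteration for illustration).
from collections import defaultdict

VOWELED = {"mehager", "mhagar", "mhger", "davar"}
VOWELS = set("aeiou")

def bare(word):
    """Strip vowel characters, mimicking undiacritized consonantal spelling."""
    return "".join(ch for ch in word if ch not in VOWELS).upper()

readings = defaultdict(set)
for w in VOWELED:
    readings[bare(w)].add(w)

print(sorted(readings["MHGR"]))   # three readings for one bare word
```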

42
Modern Hebrew Spelling
  • Two main spelling variants
  • KTIV XASER ("deficient"): spelling meant to be
    read with the vowel diacritics; words reduce to
    bare consonants when the diacritics are removed
  • KTIV MALEH ("full"): words with I/O/U vowels are
    written with long vowels, which include a letter
  • KTIV MALEH is predominant, but not strictly
    adhered to even in newspapers and official
    publications → inconsistent spelling
  • Example:
  • niqud ("pointing"): NIQWD, NQWD, NQD
  • When written as NQD, could also be niqed, naqed,
    nuqad
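The relation among the variants can be roughly modeled by optionally deleting the vowel letters W and I (a deliberate simplification; real KTIV MALEH conventions are more constrained):

```python
# Generate spelling variants by keeping or dropping each vowel letter W/I.
from itertools import product

def spelling_variants(word):
    """All strings obtainable by keeping or dropping each W/I."""
    choices = [(ch, "") if ch in "WI" else (ch,) for ch in word]
    return {"".join(p) for p in product(*choices)}

print(sorted(spelling_variants("NIQWD")))
# ['NIQD', 'NIQWD', 'NQD', 'NQWD'] -- includes all three forms from the slide
```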

43
Challenges for Hebrew MT
  • Paucity of existing language resources for Hebrew
  • No publicly available broad coverage
    morphological analyzer
  • No publicly available bilingual lexicons or
    dictionaries
  • No POS-tagged corpus or parse tree-bank corpus
    for Hebrew
  • No large Hebrew/English parallel corpus
  • Scenario well suited for CMU transfer-based MT
    framework for languages with limited resources

44
Morphological Analyzer
  • We use a publicly available morphological
    analyzer distributed by the Technion's Knowledge
    Center, adapted for our system
  • Coverage is reasonable (for nouns, verbs and
    adjectives)
  • Produces all analyses or a disambiguated analysis
    for each word
  • Output format includes lexeme (base form), POS,
    morphological features
  • Output was adapted to our representation needs
    (POS and feature mappings)

45
Morphology Example
  • Input word: B$WRH
  • 0 1 2 3 4
  • |--------B$WRH--------|
  • |--B--|--$WR--|--H--|
  • |--B--|--H--|--$WRH--|

46
Morphology Example
  • Y0: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
  • Y1: ((SPANSTART 0) (SPANEND 2) (LEX B) (POS PREP))
  • Y2: ((SPANSTART 1) (SPANEND 3) (LEX $WR) (POS N) (GEN M) (NUM S) (STATUS ABSOLUTE))
  • Y3: ((SPANSTART 3) (SPANEND 4) (LEX $LH) (POS POSS))
  • Y4: ((SPANSTART 0) (SPANEND 1) (LEX B) (POS PREP))
  • Y5: ((SPANSTART 1) (SPANEND 2) (LEX H) (POS DET))
  • Y6: ((SPANSTART 2) (SPANEND 4) (LEX $WRH) (POS N) (GEN F) (NUM S))
  • Y7: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS LEX))
47
Translation Lexicon
  • Constructed our own Hebrew-to-English lexicon,
    based primarily on an existing Dahan H-to-E and
    E-to-H dictionary made available to us, augmented
    by other public sources
  • Coverage is not great but not bad as a start
  • Dahan H-to-E is about 15K translation pairs
  • Dahan E-to-H is about 7K translation pairs
  • Base forms, POS information on both sides
  • Converted Dahan into our representation, added
    entries for missing closed-class entries
    (pronouns, prepositions, etc.)
  • Had to deal with spelling conventions
  • Recently augmented with 50K translation pairs
    extracted from Wikipedia (mostly proper names and
    named entities)

48
Manual Transfer Grammar (human-developed)
  • Initially developed by Alon in a couple of days,
    extended and revised by Nurit over time
  • Current grammar has 36 rules
  • 21 NP rules
  • one PP rule
  • 6 verb complexes and VP rules
  • 8 higher-phrase and sentence-level rules
  • Captures the most common (mostly local)
    structural differences between Hebrew and English

49
Transfer Grammar: Example Rules

{NP1,2}
; SL: $MLH ADWMH, TL: A RED DRESS
NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
(
 (X2::Y1) (X1::Y2)
 ((X1 def) = -)
 ((X1 status) =c absolute)
 ((X1 num) = (X2 num))
 ((X1 gen) = (X2 gen))
 (X0 = X1)
)

{NP1,3}
; SL: H $MLWT H ADWMWT, TL: THE RED DRESSES
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
(
 (X3::Y1) (X1::Y2)
 ((X1 def) = +)
 ((X1 status) =c absolute)
 ((X1 num) = (X3 num))
 ((X1 gen) = (X3 gen))
 (X0 = X1)
)
50
Hebrew-to-English MT Prototype
  • Initial prototype developed within a two month
    intensive effort
  • Accomplished
  • Adapted available morphological analyzer
  • Constructed a preliminary translation lexicon
  • Translated and aligned Elicitation Corpus
  • Learned XFER rules
  • Developed (small) manual XFER grammar
  • System debugging and development
  • Evaluated performance on unseen test data using
    automatic evaluation metrics

51
Example Translation
  • Input (Hebrew; literal gloss): after debates many
    decided the government to hold referendum in
    issue the withdrawal
  • Output:
  • AFTER MANY DEBATES THE GOVERNMENT DECIDED TO HOLD
    A REFERENDUM ON THE ISSUE OF THE WITHDRAWAL

52
Noun Phrases: Construct State

HXL@T HN$IA HRA$WN
decision.3SF-CS the-president.3SM the-first.3SM
THE DECISION OF THE FIRST PRESIDENT

HXL@T HN$IA HRA$WNH
decision.3SF-CS the-president.3SM the-first.3SF
THE FIRST DECISION OF THE PRESIDENT
53
Noun Phrases - Possessives

HN$IA HKRIZ $HM$IMH HRA$WNH $LW THIH
the-president announced that-the-task.3SF the-first.3SF of-him will.3SF
LMCWA PTRWN LSKSWK BAZWRNW
to-find solution to-the-conflict in-region-POSS.1P

Without transfer grammar: THE PRESIDENT ANNOUNCED THAT THE TASK THE BEST OF HIM WILL BE TO FIND SOLUTION TO THE CONFLICT IN REGION OUR

With transfer grammar: THE PRESIDENT ANNOUNCED THAT HIS FIRST TASK WILL BE TO FIND A SOLUTION TO THE CONFLICT IN OUR REGION
54
Subject-Verb Inversion

ATMWL HWDI&H HMM$LH
yesterday announced.3SF the-government.3SF
$T&RKNH BXIRWT BXWD$ HBA
that-will-be-held.3PF elections.3PF in-the-month the-next

Without transfer grammar: YESTERDAY ANNOUNCED THE GOVERNMENT THAT WILL RESPECT OF THE FREEDOM OF THE MONTH THE NEXT

With transfer grammar: YESTERDAY THE GOVERNMENT ANNOUNCED THAT ELECTIONS WILL ASSUME IN THE NEXT MONTH
55
Subject-Verb Inversion

LPNI KMH $BW&WT HWDI&H HNHLT HMLWN
before several weeks announced.3SF management.3SF.CS the-hotel
$HMLWN ISGR BSWF H$NH
that-the-hotel.3SM will-be-closed.3SM at-end.3SM.CS the-year

Without transfer grammar: IN FRONT OF A FEW WEEKS ANNOUNCED ADMINISTRATION THE HOTEL THAT THE HOTEL WILL CLOSE AT THE END THIS YEAR

With transfer grammar: SEVERAL WEEKS AGO THE MANAGEMENT OF THE HOTEL ANNOUNCED THAT THE HOTEL WILL CLOSE AT THE END OF THE YEAR
56
Evaluation Results
  • Test set of 62 sentences from Haaretz newspaper,
    2 reference translations

57
Current and Future Work
  • Issues specific to the Hebrew-to-English system
  • Coverage: further improvements in the translation
    lexicon and morphological analyzer
  • Manual Grammar development
  • Acquiring/training of word-to-word translation
    probabilities
  • Acquiring/training of a Hebrew language model at
    a post-morphology level that can help with
    disambiguation
  • General Issues related to XFER framework
  • Discriminative Language Modeling for MT
  • Effective models for assigning scores to transfer
    rules
  • Improved grammar learning
  • Merging/integration of manual and acquired
    grammars

58
Conclusions
  • Test case for the CMU XFER framework for rapid MT
    prototyping
  • Preliminary system was a two-month, three-person
    effort; we were quite happy with the outcome
  • Core concept of XFER + Decoding is very powerful
    and promising for low-resource MT
  • We experienced the main bottlenecks of knowledge
    acquisition for MT: morphology, translation
    lexicons, grammar...

59
Mapudungun-to-Spanish Example
English: I didn't see Maria
Mapudungun: pelafiñ Maria
Spanish: No vi a María
60
Mapudungun-to-Spanish Example
English: I didn't see Maria

Mapudungun: pelafiñ Maria
  pe  -la  -fi    -ñ                    Maria
  see -neg -3.obj -1.subj.indicative    Maria

Spanish: No vi a María
  no   vi                              a    María
  neg  see.1.subj.past.indicative      acc  Maria
61
pe-la-fi-ñ Maria
(V pe)
62
pe-la-fi-ñ Maria
(V pe)  (VSuff la)   ; la = negation
63
pe-la-fi-ñ Maria
(V pe)  (VSuffG (VSuff la))   ; pass all features up
64
pe-la-fi-ñ Maria
(V pe)  (VSuffG (VSuff la))  (VSuff fi)   ; fi: object person = 3
65
pe-la-fi-ñ Maria
(V pe)  (VSuffG (VSuffG (VSuff la)) (VSuff fi))   ; pass all features up from both children
66
pe-la-fi-ñ Maria
(V pe)  (VSuffG (VSuffG (VSuff la)) (VSuff fi))  (VSuff ñ)   ; ñ: person = 1, number = sg, mood = ind
67
pe-la-fi-ñ Maria
(V pe)  (VSuffG (VSuffG (VSuffG (VSuff la)) (VSuff fi)) (VSuff ñ))   ; pass all features up from both children
68
pe-la-fi-ñ Maria
(V (V pe) (VSuffG (VSuffG (VSuffG (VSuff la)) (VSuff fi)) (VSuff ñ)))
; check that 1) negation = + and 2) tense is undefined; pass all features up from both children
69
pe-la-fi-ñ Maria
(V (V pe) (VSuffG (VSuffG (VSuffG (VSuff la)) (VSuff fi)) (VSuff ñ)))  (NP (N Maria))
; Maria: person = 3, number = sg, human = +
70
pe-la-fi-ñ Maria
(S (VP (V (V pe) (VSuffG (VSuffG (VSuffG (VSuff la)) (VSuff fi)) (VSuff ñ)))) (NP (N Maria)))
; check that the NP is human = +; pass features up from VP
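The repeated "pass all features up" steps can be sketched as feature-structure union with a clash check (invented representation; real unification is more general):

```python
# Toy feature percolation: a parent's features are the union of its
# children's features, and rule conditions are checked on the result.

def percolate(children):
    feats = {}
    for child in children:
        for k, v in child.items():
            if k in feats and feats[k] != v:
                raise ValueError(f"feature clash on {k}")
            feats[k] = v
    return feats

la = {"negation": "+"}
fi = {"obj_person": "3"}
nj = {"person": "1", "number": "sg", "mood": "ind"}

vsuffg = percolate([percolate([la]), fi])     # la + fi
vsuffg = percolate([vsuffg, nj])              # + ñ
verb = percolate([{"lex": "pe"}, vsuffg])

# the V-level rule checks: negation is + and tense is undefined
assert verb["negation"] == "+" and "tense" not in verb
print(verb)
```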
71
Transfer to Spanish Top-Down
The completed source analysis is transferred top-down, mapping the Mapudungun S and VP onto a Spanish S and VP:
(S (VP (V pe-la-fi-ñ)) (NP (N Maria)))  →  (S (VP ...))
72
Transfer to Spanish Top-Down
Pass all features to the Spanish side; the Spanish VP introduces the object marker "a":
(S (VP (V ...) "a" (NP ...)))
73
Transfer to Spanish Top-Down
Pass all features down within the Spanish tree.
74
Transfer to Spanish Top-Down
Pass object features down to the Spanish object NP.
75
Transfer to Spanish Top-Down
The accusative marker "a" on objects is introduced because human = +.
76
Transfer to Spanish Top-Down
The rule that introduces the accusative marker:

VP::VP [VBar NP] -> [VBar "a" NP]
(
 (X1::Y1) (X2::Y3)
 ((X2 type) = (NOT personal))
 ((X2 human) =c +)
 (X0 = X1)
 ((X0 object) = X2)
 (Y0 = X0)
 ((Y0 object) = (X0 object))
 (Y1 = Y0)
 (Y3 = (Y0 object))
 ((Y1 objmarker person) = (Y3 person))
 ((Y1 objmarker number) = (Y3 number))
 ((Y1 objmarker gender) = (Y3 gender))
)
77
Transfer to Spanish Top-Down
Pass person, number, and mood features to the Spanish verb; assign tense = past:
(VP (V no) (V ...) "a" (NP ...))
78
Transfer to Spanish Top-Down
"no" is introduced because negation = +.
79
Transfer to Spanish Top-Down
The verb stem pe is translated as Spanish ver.
80
Transfer to Spanish Top-Down
ver is generated as vi (person = 1, number = sg, mood = indicative, tense = past):
(VP (V no) (V vi) "a" (NP ...))
81
Transfer to Spanish Top-Down
Pass features over to the Spanish side; Maria is rendered as Spanish María:
(S (VP (V no) (V vi) "a" (NP (N María))))
82
I Didn't See Maria
Source: (S (VP (V (V pe) (VSuffG (VSuffG (VSuffG (VSuff la)) (VSuff fi)) (VSuff ñ)))) (NP (N Maria)))
Target: (S (VP (V no) (V vi) "a" (NP (N María))))
→ "No vi a María"