1
Improving a Statistical MT System with
Automatically Learned Rewrite Rules
Fei Xia and Michael McCord
IBM T. J. Watson Research Center, Yorktown Heights, New York
2
Previous attempt at using syntax
  • 2003 NSF/JHU MT Summer Workshop
  • (Och et al., 2004)
  • Method: run SMT to generate the top-N translations,
    and use syntactic information to rerank the candidates.
  • => Parsing MT output is problematic.
  • Result: no gain from syntax.

3
Outline
  • Current phrase-based SMT systems (a.k.a. clump-based systems)
  • Overview of the new approach
  • Learning and applying rewrite rules
  • Experimental results
  • Conclusion and future work

4
Clump-based SMT
  • The unit of translation is a clump, rather than a
    word. A clump is simply a word n-gram.
  • Ex: P(est le premier | is the first)
  • Baseline system (Tillmann and Xia, 2003)
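Since a clump is just a word n-gram, the source side of a clump library draws on the contiguous n-grams of each sentence. The Python sketch below enumerates them; the function name `clumps` and the `max_len` cap are illustrative assumptions, not part of the baseline system's interface.

```python
def clumps(words, max_len=3):
    """Return every contiguous word n-gram (clump) of length 1..max_len, in order."""
    result = []
    for start in range(len(words)):
        # The longest clump starting here is capped by max_len and by the sentence end.
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            result.append(tuple(words[start:end]))
    return result


if __name__ == "__main__":
    for clump in clumps("France is the first western country".split(), max_len=2):
        print(" ".join(clump))
```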

5
Clump-based system: training stage
[Diagram: parallel data (source sentences and target sentences) -> Preprocessor -> Clump Extractor -> Clump Library]
6
Clump-based system: translation stage
[Diagram: source sentence -> Preprocessor -> Decoder (using the Clump Library and a Language Model) -> target translation]
7
Baseline system clump extraction
France is the first western country
La France est le premier pays occidental
France -> France
France is -> France est
8
Baseline system clump extraction
France is the first western country
La France est le premier pays occidental
France -> France
France is -> France est
France is the -> France est le
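The slides do not spell out how the baseline (Tillmann and Xia, 2003) extracts clump pairs. As an assumption, the Python sketch below uses the common criterion of consistency with a word alignment; the alignment links are hand-written for this sentence pair, and `extract_phrase_pairs` is a hypothetical name.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Extract (source clump, target clump) pairs consistent with a word alignment.

    A pair is kept when it contains at least one alignment link and no link
    connects a word inside the pair to a word outside it. Unaligned target
    words (such as "La" below) are simply left out in this simplified sketch.
    """
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            # Target positions linked to the source span src[s1..s2]
            linked = [t for (s, t) in alignment if s1 <= s <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Reject the span if a target word inside it links outside src[s1..s2]
            if any(t1 <= t <= t2 and not s1 <= s <= s2 for (s, t) in alignment):
                continue
            pairs.append((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


if __name__ == "__main__":
    src = "France is the first western country".split()
    tgt = "La France est le premier pays occidental".split()
    # Hand-written links (source index, target index); "La" is left unaligned.
    alignment = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 6), (5, 5)}
    for s, t in extract_phrase_pairs(src, tgt, alignment):
        print(s, "->", t)
```

With these links the sketch returns France -> France, France is -> France est and France is the -> France est le, as on the slide, but it rejects first western, whose target span would have to include pays, a word aligned to country.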
9
Baseline system decoding
He is the first international student
Clumps: he -> il, he is -> il est, he is the -> il est le,
first -> premier, is the first -> est le premier,
international -> international, student -> étudiant
10
Monotonic vs. non-monotonic decoding
He is the first international student
(S1 = il est le, S2 = premier, S3 = international, S4 = étudiant)
Monotonic decoding:
(S1, S2, S3, S4) il est le premier international étudiant
Non-monotonic decoding:
(S1, S2, S4, S3) il est le premier étudiant international
(S2, S1, S3, S4) premier il est le international étudiant
...
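Monotonic decoding can be sketched as a dynamic program over segmentations of the source into clumps, here using the clumps from the previous slide. This is a hedged illustration, not the system's decoder: the probabilities are invented, the input is lowercased, and no language model is used, whereas the system's decoder also consults one. Because clumps are never reordered, the work grows linearly with sentence length rather than with the n! orders of n clumps that a non-monotonic decoder may consider.

```python
import math

# Hypothetical clump table: source clump -> (target clump, probability).
# The clumps follow the slide; the probabilities are made up for illustration.
TABLE = {
    ("he",): ("il", 0.9),
    ("he", "is"): ("il est", 0.8),
    ("he", "is", "the"): ("il est le", 0.6),
    ("first",): ("premier", 0.9),
    ("is", "the", "first"): ("est le premier", 0.7),
    ("international",): ("international", 0.9),
    ("student",): ("étudiant", 0.9),
}


def monotonic_decode(words, table):
    """Segment `words` into clumps, translate each clump, and concatenate the
    translations in source order, choosing the highest-scoring segmentation.

    Dynamic programming over prefix lengths needs O(n * max clump length)
    table lookups; no clump reordering is considered.
    """
    n = len(words)
    max_len = max(len(clump) for clump in table)
    # best[i] = (log probability, target clumps) for the best cover of words[:i]
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            clump = tuple(words[j:i])
            if clump not in table or best[j][0] == -math.inf:
                continue
            target, prob = table[clump]
            score = best[j][0] + math.log(prob)
            if score > best[i][0]:
                best[i] = (score, best[j][1] + [target])
    if best[n][0] == -math.inf:
        raise ValueError("no segmentation covers the whole sentence")
    return " ".join(best[n][1])
```

On the original word order the sketch returns il est le premier international étudiant; given the reordered input he is the first student international, it returns il est le premier étudiant international.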
11
Challenges for current clump-based systems
(1) Non-monotonic decoding is expensive (n! possible orders for n clumps), and
it can hurt performance:
(S2, S1, S3, S4) premier il est le international étudiant
(S2, S3, S1, S4) premier international il est le étudiant
(S2, S3, S4, S1) premier international étudiant il est le
12
Challenges for current clump-based systems (ctd)
  • (2) No phrase-level generalizations are learned and used.
  • France is the first western country.
  • He is the first international student.
  • Rewrite rules are useful:
  • word-level rule: Adj N -> N Adj
  • phrase-level rule: Subj V Obj -> V Subj Obj

13
New approach
He is the first international student
Applying rewrite rule Adj N -> N Adj:
He is the first student international
Monotonic decoding:
il est le premier étudiant international
14
Rewrite rules
NP V NP -> V NP NP
15
Defaults and exceptions
Adj N -> N Adj
Adj (first) N -> Adj (premier) N
NP0 V NP1 -> NP0 V NP1
NP0 V NP1 (iobj, pron) -> NP0 NP1 (iobj, pron) V
=> Learn both defaults and exceptions
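One way to realize defaults and exceptions is to let the most specific matching rule win. In the Python sketch below, a pattern lists (label, word) pairs, where None leaves the word unconstrained; the rule table, the function names and the specificity measure (the number of fixed words) are assumptions for illustration. Feature constraints such as NP1 (iobj, pron) would need an extra field and are not modelled.

```python
from itertools import product

# Hypothetical rule table. A pattern is a tuple of (label, word) pairs, where
# word None matches any word; the value lists child indices in target order.
RULES = {
    (("Adj", None), ("N", None)): (1, 0),     # default:   Adj N -> N Adj
    (("Adj", "first"), ("N", None)): (0, 1),  # exception: Adj(first) N -> Adj N
}


def select_order(children, rules=RULES):
    """Return the target order given by the most specific matching rule.

    `children` is a sequence of (label, word) pairs. Every way of keeping or
    dropping each word is tried; among the patterns found in `rules`, the one
    that keeps the most words wins. With no match, source order is kept.
    """
    best_order, best_specificity = tuple(range(len(children))), -1
    for keep in product((True, False), repeat=len(children)):
        pattern = tuple((label, word if k else None)
                        for (label, word), k in zip(children, keep))
        if pattern in rules and sum(keep) > best_specificity:
            best_order, best_specificity = rules[pattern], sum(keep)
    return best_order


def reorder(children, rules=RULES):
    """Apply the selected order to the children."""
    return [children[i] for i in select_order(children, rules)]
```

Trying every keep/drop combination is exponential in the number of children, which is tolerable for the short child lists of a single node.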
16
New approach: training stage 1
[Diagram: parallel data (source and target sentences) -> Parser (each side) -> Phrase Aligner -> Rewrite Rule Extractor -> rewrite rules]
17
New approach: training stage 2
[Diagram: source sentence -> Parser -> Rewrite Rule Applier (using the rewrite rules) -> source sentence in target word order -> Preprocessor; target sentence -> Preprocessor; both -> Clump Extractor -> Clump Library]
18
New approach: translation stage
[Diagram: source sentence -> Parser -> Rewrite Rule Applier (using the rewrite rules) -> source sentence in target word order -> Preprocessor -> Decoder (using the Clump Library and a Language Model) -> target translation]
19
Tasks
  • Learn rewrite rules automatically from data.
  • Apply rewrite rules to source parse trees.

20
Learning rewrite rules
  • Parse source and target sentences
  • Align linguistic phrases
  • Extract rewrite rules
  • Organize rewrite rules into a hierarchy

21
Parsing
Slot Grammar (McCord 1980, 1993, ...)
[Figure: slot-grammar parse of "France is the first western country"]
22
Parse trees in Penn Treebank style
(S (NP-SBJ (N France)) (V is) (NP-PRD (Det the) (Adj first) (Adj western) (N country)))
23
Aligning phrases
24
Extracting rewrite rules
French parse tree: (S (NP-SBJ (Det la) (N France)) (V est) (NP-PRD (Det le) (Adj premier) (N pays) (Adj occidental)))
NP0 (France) V (is) NP1 (country) -> NP0 V NP1
25
Extracting rewrite rules
NP0 (France) V (is) NP1 (country) -> NP0 V NP1
N (France) -> Det (la) N
26
Extracting rewrite rules
NP0 (France) V (is) NP1 (country) -> NP0 V NP1
N (France) -> Det (la) N
Det (the) Adj1 (first) Adj2 (western) N (country) -> Det Adj1 N Adj2
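Reading a rule off one aligned node amounts to sorting the source children by the positions of their aligned target phrases. The Python sketch below assumes the alignment is already given as one target position per child (0-based positions in la France est le premier pays occidental); `extract_rule` is a hypothetical name. It numbers repeated labels from 1, so the top-level rule comes out as NP1 V NP2 rather than the slide's NP0 V NP1, and it does not handle inserted target words such as la in N (France) -> Det (la) N.

```python
from collections import Counter


def extract_rule(children, target_pos):
    """Read a reordering rule off one source node and its aligned target phrases.

    children:   (label, head word) for each child, in source order
    target_pos: position of each child's aligned target phrase, or None if
                the child is unaligned (such children are dropped from the
                right-hand side in this simplified sketch)
    Repeated labels are numbered from 1 (Adj1, Adj2) to tell them apart.
    """
    totals = Counter(label for label, _ in children)
    seen = Counter()
    names = []
    for label, _ in children:
        if totals[label] > 1:
            seen[label] += 1
            names.append(f"{label}{seen[label]}")
        else:
            names.append(label)
    lhs = " ".join(f"{name}({word})" for name, (_, word) in zip(names, children))
    aligned = [i for i, pos in enumerate(target_pos) if pos is not None]
    rhs = " ".join(names[i] for i in sorted(aligned, key=lambda i: target_pos[i]))
    return f"{lhs} -> {rhs}"


if __name__ == "__main__":
    # NP-PRD children; le=3, premier=4, pays=5, occidental=6 in the French sentence.
    print(extract_rule([("Det", "the"), ("Adj", "first"), ("Adj", "western"), ("N", "country")],
                       [3, 4, 6, 5]))
```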
27
Creating more generalized rules
  • Det (the) Adj1 (first) Adj2 (western) N (country) -> Det Adj1 N Adj2
  • Adj (first) N (country) -> Adj N
  • Adj N -> Adj N
  • Adj (first) N -> Adj N
  • Adj (western) N (country) -> N Adj
  • Adj N -> N Adj
  • Adj (western) N -> N Adj
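The generalizations listed above can be produced by projecting the rule's target order onto pairs of children and then dropping lexical items. In the Python sketch below, restricting to two-child subsets and emitting every keep/drop combination of words are assumptions, so it yields a superset of the variants on the slide. Conflicting generalizations such as Adj N -> Adj N and Adj N -> N Adj are expected at this stage; counting over the corpus later decides between them.

```python
from itertools import combinations, product


def base(name):
    """Drop the index that distinguishes repeated labels: 'Adj1' -> 'Adj'."""
    return name.rstrip("0123456789")


def generalize(children, order, size=2):
    """Derive more general rules from one lexicalized rule.

    children: (name, word) pairs of the rule's left-hand side, in source order
    order:    child indices in target order
    For each subset of `size` children, project the target order onto the
    subset, then emit every variant that keeps or drops each child's word.
    """
    rules = set()
    for subset in combinations(range(len(children)), size):
        projected = [i for i in order if i in subset]
        rhs = " ".join(base(children[i][0]) for i in projected)
        for keep in product((True, False), repeat=size):
            lhs = " ".join(
                f"{base(children[i][0])}({children[i][1]})" if k else base(children[i][0])
                for i, k in zip(subset, keep))
            rules.add(f"{lhs} -> {rhs}")
    return rules


if __name__ == "__main__":
    # Det(the) Adj1(first) Adj2(western) N(country) -> Det Adj1 N Adj2
    children = [("Det", "the"), ("Adj1", "first"), ("Adj2", "western"), ("N", "country")]
    for rule in sorted(generalize(children, (0, 1, 3, 2))):
        print(rule)
```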

28
Merging the counts and normalizing
  • ADJ N -> N ADJ: 539328 (0.64)
  • ADJ N -> ADJ N: 278091 (0.33)
  • ADJ (first) N -> ADJ N: 10245 (0.99)
  • ADJ (first) N -> N ADJ: 103 (0.01)
  • ADJ (first) N (country) -> ADJ N: 27 (1.0)
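Merging and normalizing is relative-frequency estimation: identical rules are counted over the corpus and each count is divided by the total for its left-hand side. The Python sketch below reproduces only the ADJ (first) rows, because for ADJ N the two rewrites shown account for 0.64 + 0.33 = 0.97 of the probability mass, so other ADJ N rewrites not listed on the slide must make up the rest. `merge_and_normalize` is a hypothetical name.

```python
from collections import Counter


def merge_and_normalize(instances):
    """Merge rule instances into counts and normalize them per left-hand side.

    instances: iterable of (lhs, rhs) pairs, one per extracted occurrence.
    Returns {(lhs, rhs): (count, P(rhs | lhs))}.
    """
    counts = Counter(instances)
    totals = Counter()
    for (lhs, _), count in counts.items():
        totals[lhs] += count
    return {rule: (count, count / totals[rule[0]]) for rule, count in counts.items()}


if __name__ == "__main__":
    instances = ([("ADJ(first) N", "ADJ N")] * 10245
                 + [("ADJ(first) N", "N ADJ")] * 103
                 + [("ADJ(first) N(country)", "ADJ N")] * 27)
    for (lhs, rhs), (count, prob) in merge_and_normalize(instances).items():
        print(f"{lhs} -> {rhs}: {count} ({prob:.2f})")
```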

29
Organizing rewrite rules
[Figure: rule patterns arranged in a hierarchy, from general to specific]
  • N
  • N PP
  • Adj (first) N: -> Adj N 0.99, -> N Adj 0.01
  • Adj N PP: -> N Adj PP 0.61, -> Adj N PP 0.30
  • Adj (first) N (country): -> Adj N 1.0
30
Organizing rewrite rules
Adj (first) N
-> Adj N 0.99
Adj (first) N
-> N Adj 0.26
-> Det Adj N 0.35
31
Applying rewrite rules
Adj N -> N Adj
Adj1 Adj2 N -> N Adj2 Adj1
Adj1 (first) Adj2 N -> Adj1 N Adj2
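Applying rules to a parse tree can be sketched as a bottom-up pass that reorders each node's children with the first matching rule, trying more specific rules first, and then reads the words off the leaves. In the Python sketch below the NPs are flat (Det Adj Adj N) as in the Penn-Treebank-style tree shown earlier, so the rule keys include the determiner; that choice, the rule table and all names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    word: str                                  # head word (the word itself for a leaf)
    children: list = field(default_factory=list)


# Hypothetical rules, keyed by the sequence of child labels. Each entry lists
# (lexical constraints, target order) from most to least specific; the first
# entry whose constraints {child index: word} all hold is applied.
RULES = {
    ("Det", "Adj", "Adj", "N"): [
        ({1: "first"}, (0, 1, 3, 2)),  # Det Adj1(first) Adj2 N -> Det Adj1 N Adj2
        ({}, (0, 3, 2, 1)),            # Det Adj1 Adj2 N -> Det N Adj2 Adj1
    ],
    ("Det", "Adj", "N"): [
        ({}, (0, 2, 1)),               # Det Adj N -> Det N Adj
    ],
}


def rewrite(node, rules=RULES):
    """Return a reordered copy of the tree, applying at most one rule per node."""
    kids = [rewrite(child, rules) for child in node.children]
    for constraints, order in rules.get(tuple(k.label for k in kids), []):
        if all(kids[i].word == word for i, word in constraints.items()):
            kids = [kids[i] for i in order]
            break
    return Node(node.label, node.word, kids)


def leaves(node):
    """Read the (reordered) sentence off the tree."""
    if not node.children:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


if __name__ == "__main__":
    tree = Node("S", "is", [
        Node("NP", "He"),
        Node("V", "is"),
        Node("NP", "student", [Node("Det", "the"), Node("Adj", "first"),
                               Node("Adj", "international"), Node("N", "student")]),
    ])
    print(" ".join(leaves(rewrite(tree))))
```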
32
Decoding
He is the first international student
Applying rewrite rule Adj (first) Adj N -> Adj N Adj:
He is the first student international
Monotonic decoding:
il est le premier étudiant international
33
Experimental Result
  • Training data: 90M-word English-French Canadian Hansard
  • Test data: 500 sentences in the news domain
  • Metric: Bleu score (Papineni et al., 2002)
  • 1 reference translation
  • Parsers: English and French slot grammars
  • Baseline system (Tillmann and Xia, 2003)

34
Extracted rewrite rules
  • Extracted rules: 15.0M
  • After removing singletons: 2.9M
  • After filtering with the hierarchy: 56K
  • -- 1K unlexicalized rules
  • -- 55K lexicalized rules, represented as 760 compact rule schemes
  • Ex: Adj (w) N -> Adj N
  • w: new, first, prime, many, other, ...
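A rule scheme can be modelled as one pattern with a word slot plus the set of words that may fill it. The Python sketch below is a hypothetical data structure, not the system's actual format; its word set holds only the example words visible on the slide, whose full list is elided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleScheme:
    """A compact scheme standing for many lexicalized rules that share one
    pattern and one rewrite and differ only in the word filling a slot."""
    lhs: str              # pattern with a {w} slot, e.g. "Adj({w}) N"
    rhs: str              # rewrite shared by every rule in the scheme
    words: frozenset      # words that may fill the slot

    def covers(self, word):
        """True if the scheme contains a rule for this word."""
        return word in self.words

    def expand(self):
        """List the individual lexicalized rules the scheme abbreviates."""
        return [f"{self.lhs.format(w=w)} -> {self.rhs}" for w in sorted(self.words)]


if __name__ == "__main__":
    # Only the example words visible on the slide; its full list is elided.
    scheme = RuleScheme("Adj({w}) N", "Adj N",
                        frozenset({"new", "first", "prime", "many", "other"}))
    print(scheme.expand())
```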

35
Most commonly used rules
  • Number of rules applied per sentence: 1.4
  • Adj N -> N Adj: 0.32
  • Adj (w) N -> Adj N: 0.15
  • NP1 's NP2 -> NP2 de NP1: 0.05
  • NP Adv V -> NP V Adv: 0.03
  • NP1 V NP2 (pron) -> NP1 NP2 V: 0.03

36
Monotonic vs. non-monotonic decoding
[Chart: Bleu scores (axis from 0.17 to 0.215) for non-monotonic and monotonic decoding, baseline vs. with rewrite rules]
37
Monotonic decoding results
[Chart: Bleu score (1 reference) against maximum source clump size, including the baseline]
38
Conclusion
  • Use automatically learned rewrite rules to
    reorder source sentences
  • Rewrite rules allow generalizations
  • Monotonic decoding speeds up translation
  • Gain: 10% relative improvement in Bleu
  • (0.196 -> 0.215)
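The 10% figure is a relative gain over the baseline score:

$$\frac{0.215 - 0.196}{0.196} = \frac{0.019}{0.196} \approx 0.097 \approx 10\%$$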

39
Future work
  • Try other language pairs (Ar-Eng, Ch-Eng).
  • Inject rewriting lattice into statistical
    models.
  • Use rewrite rules directly in the decoder.

40
Backup slides
41
An example of filtering rules (counts in parentheses)
  • N (10^9): -> N 0.9 (gain 0)
  • Adj N (10^6): -> N Adj 0.7, -> Adj N 0.3 (gain 4×10^5)
  • Adj(prime) N (10^3): -> Adj N 1.0 (gain 10^3)
  • Det Adj N
  • Det Adj(prime) N (10^2): -> Det Adj N 0.85 (gain 0)
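The slide gives counts, rewrite probabilities and gains but not the gain formula. The figures are consistent with gain = count × (best rewrite probability at the node - probability of the rewrite the parent would apply), e.g. 10^6 × (0.7 - 0.3) = 4×10^5 for Adj N, whose parent keeps source order. The Python sketch below encodes that inferred reading; it is a reconstruction, not a formula stated in the talk, and `gain` is a hypothetical name.

```python
def gain(count, probs, parent_rewrite):
    """Gain of keeping a node's rule rather than inheriting its parent's rewrite.

    count:          occurrences of the node's pattern in the training data
    probs:          P(rewrite | pattern) for the rewrites seen at this node
    parent_rewrite: the rewrite the parent rule would produce for this pattern
    """
    return count * (max(probs.values()) - probs.get(parent_rewrite, 0.0))


if __name__ == "__main__":
    print(gain(10**6, {"N Adj": 0.7, "Adj N": 0.3}, "Adj N"))   # Adj N vs. N
    print(gain(10**3, {"Adj N": 1.0}, "N Adj"))                 # Adj(prime) N vs. Adj N
    print(gain(10**2, {"Det Adj N": 0.85}, "Det Adj N"))        # Det Adj(prime) N
```

Under this reading, a node with zero gain, such as Det Adj(prime) N, adds nothing over its parent and is a candidate for removal; the threshold actually used is not given.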
42
An example
Eng: the prime minister's press office issued the following press release
Fr (word for word): le premier ministre de presse service diffusé le suivant communiqué
Fr (reordered): le presse service de le premier ministre
Corrections: service de presse, du, a diffusé, le communiqué suivant
43
Main issues in MT
  • Word choice
  • office -> bureau, cabinet, ..., service
  • release -> libération, sortie, disque, ..., communiqué
  • Inserting glue words
  • e.g., the preposition de, the auxiliary verb a
  • Ordering target words: service de presse
  • Morphing target words: subject-verb agreement, contraction (de le -> du), etc.

44
Two approaches to MT
  • Syntax-based MT
  • Statistical MT (SMT)

45
Syntax-based MT
  • Major steps
  • Parse the source sentence
  • Translate source words into target words
  • Reshape source parse tree with rewrite rules
  • Read target sentence off the tree.

46
NP
de
press office
presse service
the prime minister
le premier ministre
Rewrite rules NP1 de NP2 gt NP2 de NP1
Translation lexicon prime gt premier, s gt de
office gt service if modified by press
N1 N2 gt N2 de N1
Translation service de presse de le premier
ministre
du
47
Syntax-based approach
  • It requires
  • a parser for the source language
  • a translation lexicon
  • a set of rewrite rules
  • Normally, these components are created by hand.

48
Statistical Machine Translation
E->F Translator
F->E Translator
  • Learn from a parallel corpus
  • Easier to create translation systems for new
    language pairs
  • Phrase-based models outperform word-based
    models.

49
Advantages of phrase pairs
  • Translating source words with extended context:
    press office -> service de presse
  • Glue-word insertion, e.g., de
  • Ordering of target words
  • Morphing target words
  • the prime minister's -> du premier ministre
  • de le -> du

50
NSF workshop experiments
  • Data: 150M-word Chinese-English parallel corpora;
    top 1000 candidates, 4 references
  • Baseline (SMT): 0.316 Bleu
  • Oracle result: 0.398 Bleu
  • Individual methods range from 0.304 to 0.325
  • Combining all good methods: 0.332
  • Typical improvements:
  • no syntax > shallow tricky > deep syntax
  • Sadly, no gain from syntax

51
Syntax-based rewrite rules
  • (X0 -> X1 ... Xn) -> (Y0 -> Y1 ... Yn)
  • Xi, Yi: head word, thematic role (e.g., subj, obj),
    syntax label (POS tag of the head word), etc.
  • (VP -> V iobj NP w_it) ->
  • (VP -> iobj noun w_le V)
  • V NP (iobj, it) -> NP (iobj, le) V

52
Parsing: ESG parser
  • Slot Grammar is a lexicalized, dependency-oriented
    system (McCord 1980)
  • Languages covered: English, German, French,
    Spanish, Italian, and Portuguese.

53
(No Transcript)
54
Training (1): parse and learn rewrite rules
[Figure: aligned English and French NP trees, with children numbered 1 and 2]
NP1 's NP2 -> NP2 de NP1
N1 N2 -> le N2 de N1
Det Adj N -> Det Adj N
55
Training (2): put English sentences into French order
[Figure: parse tree of "the prime minister's press office"]
NP1 's NP2 -> NP2 de NP1
N1 N2 -> le N2 de N1
Det Adj N -> Det Adj N
=> le office de press de the prime minister
56
Training (3): learn phrases from training data
Eng: le office de press de the prime minister
Fr: le service de presse du premier ministre
Phrase pairs learned:
  • le office de press -> le service de presse
  • press de the prime minister -> presse du premier ministre
  • le -> le, de -> de, de the -> du
57
Translating (1): put English sentences into French order
[Figure: parse tree of "the government's economic policy"]
NP1 's NP2 -> NP2 de NP1
Adj N -> N Adj
=> policy economic de the government
58
Translating (2): translate with the SMT decoder
Eng: policy economic de the government
Phrase pairs learned at training time: policy -> politique, economic -> économique,
de the government -> du gouvernement
SMT output (translating in linear order): politique économique du gouvernement
59
Test the idea
  • Training data: 90M-word English-French Candide data
  • Test data: 500 sentences, 1 reference translation
  • Parsers: English and French slot grammars (ESG and
    FSG)
  • Rewrite rules: 10 hand-written rewrite rules, e.g.,
  • Adj N -> N Adj

60
Experimental results (Bleu)
                     Not reorder source   Reorder source
Not reorder target   0.196                0.214
Reorder target       0.187                0.184
Improvement so far: from 0.196 to 0.214 (9%)
NSF workshop: no gain from syntax
61
Learning rewrite rules from data
There are many rules and many exceptions:
  • ADJ N -> N ADJ 0.47
  • ADJ N -> ADJ N 0.27
Ex: small, recent, past, former, next, last,
good, previous, serious, certain, large,
great, various, ...
62
Algorithm
  • Parse source and target sentences
  • Align linguistic phrases
  • Extract rewrite rules

63
Filtering rewrite rules
  • Why?
  • Too many rules
  • Most are redundant
  • How?
  • Put rules into a hierarchy
  • Calculate gains w.r.t. parents

64
Translation results
  • Baseline (no rewrite rules): 0.196
  • With 10 hand-written rules: 0.214
  • With 1K unlexicalized rules: 0.211
  • With 1K unlexicalized rules and 760 meta rules: 0.215

65
Details of filtering algorithm
  • Remove redundant unlexicalized rules
  • Remove redundant lexicalized rules w.r.t. the
    corresponding unlexicalized rules
  • Put