Title: Improving a Statistical MT System with Automatically Learned Rewrite Rules
1Improving a Statistical MT System with
Automatically Learned Rewrite Rules
Fei Xia and Michael McCord
IBM T. J. Watson Research Center
Yorktown Heights, New York
2Previous attempt at using syntax
- 2003 NSF/JHU MT Summer Workshop
- (Och et al., 2004)
- Method: run SMT to generate top-N translations, and use syntactic info to rerank the candidates.
- => Parsing MT output is problematic.
- Result: no gain from syntax.
3Outline
- Current phrase-based SMT systems
- (a.k.a. clump-based systems)
- Overview of the new approach
- Learning and applying rewrite rules
- Experimental results
- Conclusion and future work
4Clump-based SMT
- The unit of translation is a clump, rather than a word. A clump is simply a word n-gram.
- Ex: P(est le premier | is the first)
- Baseline system (Tillmann and Xia, 2003)
5Clump-based system training stage
[Diagram: parallel data -> source sentence / target sentence -> Preprocessor (one per side) -> Clump Extractor -> Clump Library]
6Clump-based system translation stage
[Diagram: source sentence -> Preprocessor -> Decoder -> target translation; the Decoder uses the Clump Library and the Language Model]
7Baseline system clump extraction
France is the first western country
La France est le premier pays occidental
France => France
France is => France est
8Baseline system clump extraction
France is the first western country
La France est le premier pays occidental
France => France
France is => France est
France is the => France est le
9Baseline system decoding
He is the first international student
he => il, he is => il est, he is the => il est le
first => premier, is the first => est le premier
international => international
student => étudiant
He is the first international student
10Monotonic vs. non-monotonic decoding
He is the first international student
Monotonic decoding
(S1, S2, S3, S4)
il est le premier international étudiant
Non-monotonic decoding
(S1, S2, S4, S3)
il est le premier étudiant international
(S2, S1, S3, S4)
premier il est le international étudiant
...
11Challenges for current clump-based systems
(1) Non-monotonic decoding is expensive (n!), and
it can hurt performance.
premier il est le international étudiant
(S2, S1, S3, S4)
premier international il est le étudiant
(S2, S3, S1, S4)
premier international étudiant il est le
(S2, S3, S4, S1)
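The cost difference between the two decoding modes can be sketched in a few lines. This is only an illustration, not the baseline decoder: the segment translations S1..S4 are copied from the slides, and the function names are hypothetical.

```python
from itertools import permutations
from math import factorial

# Target-side segments S1..S4, as shown on the slides.
segments = ["il est le", "premier", "international", "étudiant"]

def monotonic(segs):
    """Keep the source order: exactly one candidate."""
    return " ".join(segs)

def non_monotonic_candidates(segs):
    """Enumerate every ordering of the segments (n! candidates)."""
    return [" ".join(p) for p in permutations(segs)]

candidates = non_monotonic_candidates(segments)
print(monotonic(segments))
print(len(candidates), factorial(len(segments)))
```

With four segments there are already 24 orderings to score, and most of them, such as (S2, S3, S4, S1), are not plausible French.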
12Challenges for current clump-based systems (ctd)
- (2) No phrase-level generalizations are learned and used.
- France is the first western country.
- He is the first international student.
- Rewrite rules are useful:
- word-level rule: Adj N => N Adj
- phrase-level rule: Subj V Obj => V Subj Obj
13New approach
He is the first international student
Applying rewrite rules: Adj N => N Adj
He is the first student international
Monotonic decoding
il est le premier étudiant international
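The two steps on this slide can be sketched as follows. This is a toy illustration under several assumptions: the rule is applied to a flat, hand-tagged token list rather than a parse tree, the decoder is a greedy longest-match lookup with no language model, and the clump table holds only a few entries taken from the baseline decoding slide. All names are hypothetical.

```python
# Hypothetical POS tags; the slides obtain this information from a parser.
tagged = [("He", "Pron"), ("is", "V"), ("the", "Det"),
          ("first", "Adj"), ("international", "Adj"), ("student", "N")]

def apply_adj_n(tokens):
    """Swap each adjacent (Adj, N) pair, scanning left to right."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "Adj" and out[i + 1][1] == "N":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out

# Toy clump library with entries shown on the baseline decoding slide.
clumps = {("he", "is", "the"): "il est le", ("first",): "premier",
          ("student",): "étudiant", ("international",): "international"}

def monotonic_decode(words, library, max_len=3):
    """Greedy left-to-right decoding: longest matching clump first."""
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in library:
                out.append(library[key])
                i += n
                break
        else:
            out.append(words[i])  # pass unknown words through unchanged
            i += 1
    return " ".join(out)

reordered = [w for w, _ in apply_adj_n(tagged)]
translation = monotonic_decode(reordered, clumps)
print(" ".join(reordered))
print(translation)
```

Because the source is already in target word order, the decoder never has to consider reordering.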
14Rewrite rules
NP V NP => V NP NP
15Defaults and exceptions
Adj N => N Adj
Adj (first) N => Adj (premier) N
NP0 V NP1 => NP0 V NP1
NP0 V NP1 (iobj, pron) => NP0 NP1 (iobj, pron) V
=> Learn both defaults and exceptions
16New approach training stage 1
[Diagram: parallel data -> source sentence / target sentence -> Parser (one per side) -> Phrase Aligner -> Rewrite Rule Extractor -> rewrite rules]
17New approach training stage 2
[Diagram: parallel data -> source sentence -> Parser -> Rewrite Rule Applier (using the rewrite rules) -> source sentence in target word order -> Preprocessor -> Clump Extractor; target sentence -> Preprocessor -> Clump Extractor -> Clump Library]
18New approach translation stage
[Diagram: source sentence -> Parser -> Rewrite rule applier (using the rewrite rules) -> source sentence in target word order -> Preprocessor -> Decoder (using the Clump Library and the Language Model) -> target translation]
19Tasks
- Learn rewrite rules automatically from data.
- Apply rewrite rules to source parse trees.
20Learning rewrite rules
- Parse source and target sentences
- Align linguistic phrases
- Extract rewrite rules
- Organize rewrite rules into a hierarchy
21Parsing
Slot grammar (McCord 1980, 1993, ...)
France is the first western country
22Parse trees in Penn Treebank style
(S (NP-SBJ (N France)) (V is) (NP-PRD (Det the) (Adj first) (Adj western) (N country)))
23Aligning phrases
24Extracting rewrite rules
(S (NP-SBJ (Det la) (N France)) (V est) (NP-PRD (Det le) (Adj premier) (N pays) (Adj occidental)))
NP0 (France) V (is) NP1 (country) => NP0 V NP1
25Extracting rewrite rules
(S (NP-SBJ (Det la) (N France)) (V est) (NP-PRD (Det le) (Adj premier) (N pays) (Adj occidental)))
NP0 (France) V (is) NP1 (country) => NP0 V NP1
N (France) => Det (la) N
26Extracting rewrite rules
(S (NP-SBJ (Det la) (N France)) (V est) (NP-PRD (Det le) (Adj premier) (N pays) (Adj occidental)))
NP0 (France) V (is) NP1 (country) => NP0 V NP1
N (France) => Det (la) N
Det (the) Adj1 (first) Adj2 (western) N (country) => Det Adj1 N Adj2
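As a rough illustration of the extraction step, the sketch below builds the last rule from the children of the English NP and a child-to-child alignment. It assumes a one-to-one alignment given as a dictionary (so it cannot produce insertions such as the Det in N (France) => Det (la) N), and the function names and the way repeated labels are numbered are assumptions, not the authors' implementation.

```python
from collections import Counter

def number_labels(labels):
    """Append 1, 2, ... to labels that occur more than once (Adj -> Adj1, Adj2)."""
    totals = Counter(labels)
    seen = Counter()
    names = []
    for label in labels:
        if totals[label] > 1:
            seen[label] += 1
            names.append(f"{label}{seen[label]}")
        else:
            names.append(label)
    return names

def extract_rule(source, alignment):
    """source: list of (label, head word) for the children of one node.
    alignment[i]: position of the target child aligned to source child i."""
    names = number_labels([label for label, _ in source])
    lhs = " ".join(f"{n} ({head})" for n, (_, head) in zip(names, source))
    order = sorted(range(len(source)), key=lambda i: alignment[i])
    rhs = " ".join(names[i] for i in order)
    return f"{lhs} => {rhs}"

np_prd = [("Det", "the"), ("Adj", "first"), ("Adj", "western"), ("N", "country")]
# Hypothetical alignment to "le premier pays occidental".
rule = extract_rule(np_prd, {0: 0, 1: 1, 2: 3, 3: 2})
print(rule)
```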
27Creating more generalized rules
- Det (the) Adj1 (first) Adj2 (western) N (country) => Det Adj1 N Adj2
- Adj (first) N (country) => Adj N
- Adj N => Adj N
- Adj (first) N => Adj N
- Adj (western) N (country) => N Adj
- Adj N => N Adj
- Adj (western) N => N Adj
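One way to generate such generalizations is to drop the head word from some of the elements. The sketch below enumerates every subset of elements that keep their head word, which is a superset of the three variants listed per pair on this slide; which subsets the authors actually keep is not stated, so treat the enumeration as an assumption.

```python
from itertools import combinations

def generalizations(lhs, rhs):
    """lhs: list of (label, head word); rhs: target-side order as a string.
    Return one rule per subset of elements that keep their head word,
    from fully lexicalized to fully unlexicalized."""
    rules = []
    positions = range(len(lhs))
    for k in range(len(lhs), -1, -1):
        for keep in combinations(positions, k):
            parts = [f"{label} ({head})" if i in keep else label
                     for i, (label, head) in enumerate(lhs)]
            rules.append(f"{' '.join(parts)} => {rhs}")
    return rules

rules = generalizations([("Adj", "first"), ("N", "country")], "Adj N")
for r in rules:
    print(r)
```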
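28Merging the counts and normalizing
- ADJ N => N ADJ: count 539328, probability 0.64
- ADJ N => ADJ N: count 278091, probability 0.33
- ADJ (first) N => ADJ N: count 10245, probability 0.99
- ADJ (first) N => N ADJ: count 103, probability 0.01
- ADJ (first) N (country) => ADJ N: count 27, probability 1.0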
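The probabilities for ADJ (first) N are consistent with a relative-frequency estimate over rules that share a left-hand side. A minimal sketch of that normalization, assuming counts keyed by (left-hand side, right-hand side), is shown below. Only the two ADJ (first) N counts are used, because the unlexicalized probabilities (0.64 and 0.33) do not sum to 1, so other right-hand sides must exist that the slide does not list.

```python
from collections import defaultdict

# Counts copied from the slide for the lexicalized pattern "ADJ (first) N".
counts = {
    ("ADJ (first) N", "ADJ N"): 10245,
    ("ADJ (first) N", "N ADJ"): 103,
}

def normalize(rule_counts):
    """Turn rule counts into relative frequencies P(rhs | lhs)."""
    totals = defaultdict(int)
    for (lhs, _), c in rule_counts.items():
        totals[lhs] += c
    return {rule: c / totals[rule[0]] for rule, c in rule_counts.items()}

probs = normalize(counts)
for (lhs, rhs), p in probs.items():
    print(f"{lhs} => {rhs}: {p:.2f}")
```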
29Organizing rewrite rules
[Diagram: hierarchy of source patterns with their rewrite probabilities]
N
N PP
Adj N PP: => N Adj PP 0.61, => Adj N PP 0.30
Adj (first) N: => Adj N 0.99, => N Adj 0.01
Adj (first) N (country): => Adj N 1.0
30Organizing rewrite rules
Adj (first) N
=> Adj N 0.99
Adj (first) N
=> N Adj 0.26
=> Det Adj N 0.35
31Applying rewrite rules
Adj N => N Adj
Adj1 Adj2 N => N Adj2 Adj1
Adj1 (first) Adj2 N => Adj1 N Adj2
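When several rules match the same node, a lexicalized exception should win over the default. The sketch below encodes the three rules on this slide and picks the matching rule with the most lexicalized elements. That tie-breaking criterion, the data layout, and the function names are assumptions for illustration; the slides do not spell out the selection procedure.

```python
# Each rule: (source pattern, target order as indices into the pattern).
# A head word of None matches any word.
RULES = [
    # Adj N => N Adj
    ([("Adj", None), ("N", None)], [1, 0]),
    # Adj1 Adj2 N => N Adj2 Adj1
    ([("Adj", None), ("Adj", None), ("N", None)], [2, 1, 0]),
    # Adj1 (first) Adj2 N => Adj1 N Adj2
    ([("Adj", "first"), ("Adj", None), ("N", None)], [0, 2, 1]),
]

def matches(pattern, children):
    """True if labels agree and every lexicalized element has the same head."""
    return len(pattern) == len(children) and all(
        label == c_label and head in (None, c_head)
        for (label, head), (c_label, c_head) in zip(pattern, children))

def specificity(pattern):
    """Number of lexicalized elements (assumed preference criterion)."""
    return sum(head is not None for _, head in pattern)

def rewrite(children):
    """Reorder children with the most specific matching rule, if any."""
    candidates = [r for r in RULES if matches(r[0], children)]
    if not candidates:
        return list(children)
    _, order = max(candidates, key=lambda r: specificity(r[0]))
    return [children[i] for i in order]

node = [("Adj", "first"), ("Adj", "international"), ("N", "student")]
print([w for _, w in rewrite(node)])
```

For the node above, both three-element rules match, and the exception for "first" keeps that adjective in front of the noun.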
32Decoding
He is the first international student
Applying rewrite rules: Adj (first) Adj N => Adj N Adj
He is the first student international
Monotonic decoding
il est le premier étudiant international
33Experimental Result
- Training data: 90M-word Eng-Fr Canadian Hansard
- Test data: 500 sentences in news domain
- Metrics: Bleu score (Papineni et al., 2002)
- 1-reference translation
- Parser: English and French slot grammars
- Baseline system (Tillmann and Xia, 2003)
34Extracted rewrite rules
- Extracted rules: 15.0 M
- After removing singletons: 2.9 M
- After filtering with hierarchy: 56 K
- -- 1K unlexicalized rules
- -- 55K lexicalized rules, represented as 760 compact rule schemes.
- Ex: Adj (w) N => Adj N
- w: new, first, prime, many, other, ...
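A rule scheme can be read as a template plus a word list, which is how a few hundred schemes can stand in for tens of thousands of lexicalized rules. The sketch below expands the example scheme using only the words shown on the slide. The string-template representation and the function name are assumptions made for illustration.

```python
def expand_scheme(template, words):
    """Instantiate the placeholder (w) once per word in the list."""
    return [template.replace("(w)", f"({w})") for w in words]

# Only the words listed on the slide; the full list is elided there.
words = ["new", "first", "prime", "many", "other"]
lexicalized = expand_scheme("Adj (w) N => Adj N", words)
for rule in lexicalized:
    print(rule)
```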
35Most commonly used rules
- Rules applied per sentence: 1.4 times
- Adj N => N Adj: 0.32
- Adj (w) N => Adj N: 0.15
- NP1 's NP2 => NP2 de NP1: 0.05
- NP Adv V => NP V Adv: 0.03
- NP1 V NP2 (pron) => NP1 NP2 V: 0.03
36Monotonic vs. non-monotonic decoding
[Chart: Bleu score (axis from 0.17 to 0.215) for monotonic and non-monotonic decoding, baseline vs. w/ rewrite rules]
37Monotonic decoding results
[Chart: Bleu score (1-ref) against max source clump size, including the baseline]
38Conclusion
- Use automatically learned rewrite rules to reorder source sentences
- Rewrite rules allow generalizations
- Monotonic decoding speeds up translation
- Gain: 10% improvement in Bleu (0.196 => 0.215)
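The reported gain is a relative improvement over the baseline Bleu score, not an absolute difference. A quick check of the arithmetic (the function name is just for illustration):

```python
def relative_gain(baseline, new):
    """Relative improvement of `new` over `baseline`."""
    return (new - baseline) / baseline

gain = relative_gain(0.196, 0.215)
print(f"{gain:.1%}")  # roughly 10%
```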
39Future work
- Try other language pairs (Ar-Eng, Ch-Eng).
- Inject rewriting lattice into statistical models.
- Use rewrite rules directly in the decoder.
40Backup slides
41An example of filtering rules
=> N 0.9
N (109)
(gain=0)
Adj N (106)
Adj(prime) N (103)
=> Adj N 1.0
Det Adj N
(gain=103)
Det Adj(prime) N (102)
=> Det Adj N 0.85
(gain=0)
42An example
Eng: the prime minister's press office issued the following press release
Fr: le premier ministre de presse service diffusé le suivant communiqué
Fr: le presse service de le premier ministre
Corrections: service de presse, du, a diffusé, le communiqué suivant
43 Main issues in MT
- Word choice
- office => bureau, cabinet, ..., service
- release => libération, sortie, disque, ..., communiqué
- Inserting glue words
- e.g., preposition de, aux. verb a
- Ordering target words: service de presse
- Morphing target words: subject-verb agreement, contraction (de le => du), etc.
44Two approaches to MT
- Syntax-based MT
- Statistical MT (SMT)
45Syntax-based MT
- Major steps
- Parse the source sentence
- Translate source words into target words
- Reshape source parse tree with rewrite rules
- Read target sentence off the tree.
46[Tree diagram: NP with children "the prime minister" (le premier ministre), "de", and "press office" (presse service)]
Rewrite rules: NP1 de NP2 => NP2 de NP1; N1 N2 => N2 de N1
Translation lexicon: prime => premier, 's => de, office => service if modified by press
Translation: service de presse de le premier ministre (de le => du)
47Syntax-based approach
- It requires
- a parser for the source language
- a translation lexicon
- a set of rewrite rules
- Normally, these components are created by hand.
48Statistical Machine Translation
E->F Translator
F->E Translator
- Learn from parallel corpus
- Easier to create translation systems for new language pairs
- Phrase-based models outperform word-based models.
49Advantages of phrase pairs
- Translating source word with extended context: press office => service de presse
- Glue word insertion, e.g., de
- Ordering of target words
- Morphing target words
- the prime minister's => du premier ministre
- de le => du
50NSF workshop experiments
- Data: 150M-word Chinese-English parallel corpora. Top 1000 candidates, 4 references
- Baseline (SMT): 0.316 in Bleu
- Oracle result: 0.398 in Bleu
- Each method: range from 0.304 to 0.325
- Adding all good methods: 0.332
- Typical improvements:
- no syntax > shallow tricky > deep syntax
- Sadly, no gain from syntax
51Syntax-based rewrite rules
- (X0 -> X1 ... Xn) => (Y0 -> Y1 ... Yn)
- Xi, Yi: head word, thematic role (e.g., subj, obj), syntax label (POS tag of the head word), etc.
- VP -> V iobj NP w_it => VP -> iobj noun w_le V
- V NP (iobj, it) => NP (iobj, le) V
52Parsing ESG parser
- Slot grammar is a lexicalized, dependency-oriented system (McCord 1980)
- Languages covered: English, German, French, Spanish, Italian, and Portuguese.
53(No Transcript)
54Training (1) parse and learn rewrite rules
[Tree diagram: an NP whose constituents 1 and 2 swap order; includes an N N compound]
NP1 's NP2 => NP2 de NP1
N1 N2 => le N2 de N1
Det Adj N => Det Adj N
55Training (2) put Eng sentences into Fr order
[Tree diagram: an N N compound and the NP "the prime minister"]
NP1 's NP2 => NP2 de NP1
N1 N2 => le N2 de N1
Det Adj N => Det Adj N
=> le office de press de the prime minister
56Training (3) learn phrases from training data
Eng: le office de press de the prime minister
Fr: le service de presse du premier ministre
Phrase pairs learned:
le office de press => le service de presse
press de the prime minister => presse du premier ministre
le => le, de => de, de the => du
57Translating (1) put Eng sentences into Fr order
economic policy
the government
NP1 's NP2 => NP2 de NP1
Adj N => N Adj
=> policy economic de the government
58Translating (2) translate with SMT decoder
Eng: policy economic de the government
Phrase pairs learned at training time:
policy => politique
economic => économique
de the government => du gouvernement
SMT output, translating in linear order: politique économique du gouvernement
59Test the idea
- Training data: 90M English-French Candide data
- Test data: 500 sentences, 1 reference translation
- Parser: English and French slot grammars (ESG and FSG)
- Rewrite rules: 10 hand-written rewrite rules
- Adj N => N Adj
60Experimental results
                     Not reorder source   Reorder source
Not reorder target   0.196                0.214
Reorder target       0.187                0.184
Improvement so far: from 0.196 to 0.214 (9%)
NSF workshop: no gain from syntax
61Learning rewrite rules from data
There are many rules and many exceptions
ADJ N => N ADJ 0.47
ADJ N => ADJ N 0.27
Ex: small, recent, past, former, next, last, good, previous, serious, certain, large, great, various, ...
62Algorithm
- Parse source and target sentences
- Align linguistic phrases
- Extract rewrite rules
63Filtering rewrite rules
- Why?
- Too many rules
- Most are redundant
- How?
- Put rules into a hierarchy
- Calculate gains w.r.t. parents
64Translation results
- Baseline (no rewrite rules): 0.196
- with 10 hand-written rules: 0.214
- with 1K unlexicalized rules: 0.211
- with 1K unlexicalized rules and 760 meta rules: 0.215
65Details of filtering algorithm
- Remove redundant unlexicalized rules
- Remove redundant lexicalized rules w.r.t. the corresponding unlexicalized rules
- Put