Title: Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation
1 Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation
Kei Hashimoto (1,2), Hirofumi Yamamoto (2,3), Hideo Okuma (2,4), Eiichiro Sumita (2,4), and Keiichi Tokuda (1)
1 Nagoya Institute of Technology
2 National Institute of Information and Communications Technology
3 Kinki University
4 ATR Spoken Language Communication Research Labs.
2 Background (1/2)
- Phrase-based statistical machine translation
- Can model local word reordering
- Short idioms
- Insertions and deletions of words
- Errors in global word reordering
- Word reordering constraint techniques
- Linguistically syntax-based approaches
- Source tree, target tree, both tree structures
- Formal constraints on word permutations
- IBM distortion, lexical reordering model, ITG
3 Background (2/2)
- Imposing a source tree on ITG (IST-ITG)
- Extension of the ITG constraints
- Introduces a source-sentence tree structure
- Cannot evaluate the accuracy of the target word order
- Reordering model using syntactic information
- Extension of the IST-ITG constraints
- Models the rotation of the source-side parse tree
- Can be easily introduced into a phrase-based translation system
4 Outline
- Background
- ITG and IST-ITG constraints
- Proposed reordering model
- Training of the proposed model
- Decoding using the proposed model
- Experiments
- Conclusions and future work
5 Inversion transduction grammar
- ITG constraints
- All possible binary tree structures are generated from the source word sequence
- The target sentence is obtained by rotating any node of the generated binary trees
- Reduces the number of possible target word orders
- Does not consider a specific tree-structure instance
6 Imposing a source tree on ITG (IST-ITG)
- Directly introduces a source-sentence tree structure into ITG
- The target sentence is obtained by rotating any node of the source-sentence tree structure (see the sketch below)
[Figure: source-sentence tree structure and its rotations]
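As a concrete illustration of the constraint, here is a minimal Python sketch (not code from the paper) that enumerates every target word order reachable under IST-ITG for one fixed binary source tree. The tree representation, nested tuples with words as leaves, is an assumption made only for this example.

```python
from itertools import product

def ist_itg_orders(tree):
    """Enumerate target word orders reachable by keeping or swapping
    the children of each node of a fixed binary source tree.
    Leaves are plain strings (source words)."""
    if isinstance(tree, str):                  # leaf: a single source word
        return [[tree]]
    left, right = tree
    orders = []
    for l, r in product(ist_itg_orders(left), ist_itg_orders(right)):
        orders.append(l + r)                   # monotone: keep child order
        orders.append(r + l)                   # swap: invert child order
    return orders

# A 4-word sentence with a fixed binary parse yields 2^3 = 8 orders,
# whereas unconstrained permutation would allow 4! = 24.
tree = (("this", "is"), ("a", "pen"))
print(len(ist_itg_orders(tree)))               # -> 8
```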
7 Non-binary trees
- The parser sometimes produces non-binary trees
- Any reordering of the child nodes in a non-binary subtree is allowed
8 Problem of IST-ITG
- Cannot evaluate the accuracy of the target word reordering
- → Assigns an equal probability to all rotations
→ Propose a reordering model using syntactic information
9 Outline
- Background
- ITG and IST-ITG constraints
- Proposed reordering model
- Training of the proposed model
- Decoding using the proposed model
- Experiments
- Conclusions and future work
10 Overview of the proposed method
- The rotation of each subtree type is modeled
[Figure: source-side parse tree, its subtree types, and their reordering probabilities under the reordering model using syntactic information]
11 Related work 1
- Statistical syntax-directed translation with extended domain of locality (Liang Huang et al., 2006)
- Extracts rules for tree-to-string translation
- Considers syntactic information
- Considers multi-level trees on the source side
[Figure: multi-level source-side tree fragment with nodes S, NP, VP, VB]
12 Related work 2
- Proposed reordering model
- Used in phrase-based translation
- Estimation of the proposed model is conducted independently of phrase extraction
- Models child-node reordering within one-level subtrees
- Cannot represent complex reorderings
- Reordering using syntactic information can be easily introduced into phrase-based translation
13 Training algorithm (1/3)
- Reordering model training
- 1. Word alignment
- 2. Parsing of the source sentence
[Figure: steps 1 and 2: word alignment between the source and target sentences, and the source-side parse tree with nodes S, NP, VP, AUX, DT, NN]
14 Training algorithm (2/3)
- 3. The word alignments and the source-side parse trees are combined
- 4. The rotation position of each subtree is checked (monotone or swap), as in the sketch below
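A minimal sketch of step 4, assuming each binary subtree carries the source-word indices covered by its two children and that the word alignment is a map from source to target positions; the decision rule used here (comparing average aligned target positions) is an illustrative simplification, not necessarily the exact rule in the paper.

```python
def rotation_position(left_span, right_span, align):
    """Classify one binary subtree as 'monotone' or 'swap'.
    left_span / right_span: source word indices covered by the two children.
    align: dict mapping a source index to an iterable of target indices.
    Returns None when the alignment gives no usable evidence."""
    def mean_target(span):
        positions = [t for s in span for t in align.get(s, ())]
        return sum(positions) / len(positions) if positions else None

    left, right = mean_target(left_span), mean_target(right_span)
    if left is None or right is None or left == right:
        return None                            # unaligned or ambiguous sample
    return "monotone" if left < right else "swap"

# Example: the left child aligns to later target positions than the right
# child, so the subtree is realized swapped.
print(rotation_position([0, 1], [2], {0: [2], 1: [3], 2: [0]}))  # -> swap
```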
15 Training algorithm (3/3)
- 5. The reordering probability of each subtree type is estimated by counting the rotation positions (see the sketch below)
- Non-binary subtrees
- Any ordering of the child nodes is allowed
- Rotation positions are therefore categorized into only two types
- → Monotone or other (swap)
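A minimal sketch of step 5 (relative-frequency estimation), assuming the previous steps emit (subtree type, rotation position) pairs; the data format and function name are illustrative.

```python
from collections import Counter

def estimate_reordering_model(samples):
    """Estimate P(rotation | subtree type) by relative frequency.
    samples: list of (subtree_type, rotation) pairs, e.g.
    ('NP -> DT NN NN', 'monotone'). For non-binary subtrees the rotation
    is already collapsed to the two classes 'monotone' / 'swap'."""
    counts = Counter(samples)
    totals = Counter(subtree for subtree, _ in samples)
    return {(subtree, rot): c / totals[subtree]
            for (subtree, rot), c in counts.items()}

model = estimate_reordering_model([
    ("S -> NP VP .", "monotone"),
    ("S -> NP VP .", "monotone"),
    ("S -> NP VP .", "swap"),
])
print(model[("S -> NP VP .", "monotone")])     # -> 0.666...
```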
16 Removing subtree samples
- Samples whose target word orders cannot be derived by rotating nodes of the source-side parse tree are removed
- Linguistic reasons
- Differences in sentence structure
- Non-linguistic reasons
- Errors in word alignment and syntactic analysis
17 Clustering of subtree types
- The number of possible subtree types is large
- Unseen subtree types
- Subtree types observed only a few times
- → Cannot be modeled accurately
- Clustering of subtree types
- Applied when the number of training samples is less than a heuristic threshold
- The clustered model is estimated from the counts of the clustered subtree types (a backoff sketch follows)
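A minimal sketch of the lookup with backoff implied by this slide. The clustering key used here (the parent label of the subtree) is purely an assumption for illustration; the paper's actual clustering of subtree types may differ.

```python
def reordering_probability(subtree_type, rotation, model, clustered_model,
                           counts, threshold=10):
    """Look up P(rotation | subtree type), backing off to a clustered model
    when the subtree type was seen fewer than `threshold` times in training.
    model / clustered_model: dicts keyed by (type, rotation) -> probability;
    counts: number of training samples per subtree type."""
    if counts.get(subtree_type, 0) >= threshold:
        # Smoothing is omitted in this sketch; unseen rotations get 0.0.
        return model.get((subtree_type, rotation), 0.0)
    # Illustrative clustering key: parent label only, e.g. 'NP -> DT JJ NN' -> 'NP'
    cluster = subtree_type.split("->")[0].strip()
    return clustered_model.get((cluster, rotation), 0.5)
```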
18 Decoding using the proposed model
- Phrase-based decoder
- Constrained by the IST-ITG constraints
- The target sentence is generated by rotating any node of the source-side parse tree
- Target word orderings that break a source phrase are not allowed
- The rotation positions of the subtrees are checked
- The reordering probabilities are calculated
19 Decoding using the proposed model
- Calculating the reordering probability
[Figure: source sentence A B C D E translated as b a c d e; subtree rotation positions: monotone, swap, monotone]
20 Decoding using the proposed model
- Calculating the reordering probability
[Figure: source sentence A B C D E translated as c d e a b; subtree rotation positions: swap, monotone, monotone]
21 Rotation position included in a phrase
- The rotation position cannot be determined
- Word alignments inside a phrase are not observed
- → Assign the higher of the two probabilities (monotone or swap), as in the sketch below
[Figure: source sentence A B C D E covered by two multi-word phrases; subtree rotation positions: swap, higher, higher]
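Putting the decoding slides together, this is a minimal scoring sketch (illustrative, not the paper's decoder): the reordering probability of a derivation is the product over subtrees of P(rotation | subtree type), and when a rotation point falls inside a phrase the higher of the monotone and swap probabilities is used.

```python
import math

def reordering_score(decisions, model):
    """Log reordering probability of one decoder derivation.
    decisions: list of (subtree_type, rotation) pairs where rotation is
    'monotone', 'swap', or None when the rotation point lies inside a
    translated phrase and the alignment is therefore not observed.
    model: dict mapping (subtree_type, rotation) to a probability."""
    log_prob = 0.0
    for subtree_type, rotation in decisions:
        if rotation is None:                   # in-phrase: take the higher one
            p = max(model.get((subtree_type, "monotone"), 0.5),
                    model.get((subtree_type, "swap"), 0.5))
        else:
            p = model.get((subtree_type, rotation), 0.5)
        log_prob += math.log(max(p, 1e-10))    # floor to avoid log(0)
    return log_prob
```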
22 Outline
- Background
- ITG and IST-ITG constraints
- Proposed reordering model
- Training of the proposed model
- Decoding using the proposed model
- Experiments
- Conclusions and future work
23 Experimental conditions
- Compared methods
- Baseline: IBM distortion and lexical reordering models
- IST-ITG: Baseline + IST-ITG constraints
- Proposed: Baseline + proposed reordering model
- Training
- GIZA toolkit
- SRI language model toolkit
- Minimum error rate training (BLEU-4)
- Charniak parser
24 Experimental conditions (E-J)
- English-to-Japanese translation experiment
- JST Japanese-English paper abstract corpus
                             English    Japanese
Training data     Sentences  1.0M       1.0M
Training data     Words      24.6M      28.8M
Development data  Sentences  2.0K       2.0K
Development data  Words      50.1K      58.7K
Test data         Sentences  2.0K       2.0K
Test data         Words      49.5K      58.0K
Dev. and test data: single reference
25 Experimental results (E-J)
- Proposed reordering model
- Results on the test set
Subtree samples     13M
Removed samples     3M (25.38%)
Subtree types       54K
Threshold           10
Number of models    6K (clustered)
Coverage            99.29%

          Baseline   IST-ITG   Proposed
BLEU-4    27.87      29.31     29.80
Improvement of 0.49 BLEU points over IST-ITG
26 Experimental conditions (E-C)
- English-to-Chinese translation experiment
- NIST MT08 English-to-Chinese translation track
                             English    Chinese
Training data     Sentences  4.6M       4.6M
Training data     Words      79.6M      73.4M
Development data  Sentences  1.6K       1.6K
Development data  Words      46.4K      39.0K
Test data         Sentences  1.9K       1.9K
Test data         Words      45.7K      47.0K (avg.)
Test data: 4 references
Dev. data: single reference
27 Experimental results (E-C)
- Proposed reordering model
- Results on the test set
Subtree samples     50M
Removed samples     10M (20.36%)
Subtree types       2M
Threshold           10
Number of models    19K (clustered)
Coverage            99.45%

          Baseline   IST-ITG   Proposed
BLEU-4    17.54      18.60     18.93
Improvement of 0.33 BLEU points over IST-ITG
28 Conclusions and future work
- Conclusions
- An extension of the IST-ITG constraints
- Reordering using syntactic information can be easily introduced into phrase-based translation
- Improvement of 0.49 BLEU points over IST-ITG (E-J)
- Future work
- Simultaneous training of the translation and reordering models
- Handling complex reorderings caused by differences in sentence tree structure
29 Thank you very much!
30 Number of target word orders
- Number of possible target word orders for an n-word sequence (binary trees); see the sketch after the table
Number of words   IST-ITG   ITG              No constraint
1                 1         1                1
2                 2         2                2
4                 8         22               24
8                 128       8,558            40,320
10                512       206,098          3,628,800
15                16,384    745,387,038      1,307,674,368,000
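The table can be reproduced with a short script: a fixed binary tree with n leaves allows 2^(n-1) orders under IST-ITG (one keep/swap choice per internal node), plain ITG allows the large Schröder number S_{n-1}, and the unconstrained case allows n!. The script below is only an illustration of those formulas.

```python
from math import factorial

def ist_itg_count(n):
    """Orders reachable by rotating a fixed binary tree with n leaves."""
    return 2 ** (n - 1)

def itg_count(n):
    """Orders reachable under plain ITG constraints: the large Schroeder
    number S_{n-1}, via (k+1)S_k = 3(2k-1)S_{k-1} - (k-2)S_{k-2}."""
    s = [1, 2]
    for k in range(2, n):
        s.append((3 * (2 * k - 1) * s[k - 1] - (k - 2) * s[k - 2]) // (k + 1))
    return s[n - 1]

for n in (1, 2, 4, 8, 10, 15):
    print(n, ist_itg_count(n), itg_count(n), factorial(n))
# e.g. n=8 -> 128, 8558, 40320, matching the table above
```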
31 Example of the subtree model
Subtree type          Monotone probability
S → PP , NP VP .      0.764
NP → DT NN NN         0.816
VP → AUX VP           0.664
VP → VBN PP           0.864
NP → NP PP            0.837
NP → DT JJ NN         0.805
Swap probability = 1.0 - Monotone probability