Title: A Non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation
Jun Sun, Min Zhang, Chew Lim Tan
2. Outline
- Introduction
- Non-contiguous Tree Sequence Modeling
- Rule Extraction
- Non-contiguous Decoding: the Pisces Decoder
- Experiments
- Conclusion
3. Contiguous and Non-contiguous Bilingual Phrases
Non-contiguous translational equivalences
Contiguous translational equivalences
4. Previous Work on Non-contiguous Phrases
- (−) Zhang et al. (2008) acquire non-contiguous phrasal rules from contiguous tree sequence pairs, and find them not useful in real syntax-based translation systems.
- (+) Wellington et al. (2006) report statistically that discontinuities are very useful for translational equivalence analysis using binary-branching structures under word alignment and parse tree constraints.
- (+) Bod (2007) also finds that discontinuous phrasal rules yield significant improvement in a linguistically motivated STSG-based translation model.
5. Previous Work on Non-contiguous Phrases (cont.)
VP(VV(?), NP(CP0, NN(??))) ↔ SBAR(WRB(when), S0)
Non-contiguous
Contiguous tree sequence pair
Contiguous tree sequence pair
6. Previous Work on Non-contiguous Phrases (cont.)
No match in rule set
7. Proposed Non-contiguous Phrase Modeling
Extracted from non-contiguous tree sequence pairs
8. Contributions
- The proposed model extracts translation rules not only from contiguous tree sequence pairs but also from non-contiguous tree sequence pairs (i.e., pairs with gaps). With the help of non-contiguous tree sequences, the model captures non-contiguous phrases well without requiring a large applicable context, and it enhances non-contiguous constituent modeling.
- A decoding algorithm for non-contiguous phrase modeling
9. Outline
- Introduction
- Non-contiguous Tree Sequence Modeling
- Rule Extraction
- Non-contiguous Decoding: the Pisces Decoder
- Experiments
- Conclusion
10. SncTSSG
- Synchronous Tree Substitution Grammar (STSG, Chiang, 2006)
- Synchronous Tree Sequence Substitution Grammar (STSSG, Zhang et al., 2008)
- Synchronous non-contiguous Tree Sequence Substitution Grammar (SncTSSG)
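To make the grammar hierarchy concrete, here is a minimal Python sketch (types and names are my own illustration, not from the paper) of a synchronous rule whose sides are tree *sequences* that may contain gaps; a rule with no gaps on either side degenerates to an ordinary STSSG rule:

```python
# Sketch of an SncTSSG rule: each side is a sequence of elementary
# trees, and a sequence may contain gaps (written "*" in the grammar,
# represented here by None).

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TreeNode:
    label: str                          # e.g. "VP", "NN", or a word
    children: List["TreeNode"] = field(default_factory=list)

# A tree sequence; None stands for a gap ("*").
TreeSequence = List[Optional[TreeNode]]

@dataclass
class SncTSSGRule:
    source: TreeSequence                # source-side tree sequence (may have gaps)
    target: TreeSequence                # target-side tree sequence (may have gaps)

    def is_contiguous(self) -> bool:
        """True iff neither side contains a gap, i.e. an STSSG rule."""
        return None not in self.source and None not in self.target

# A contiguous rule vs. a rule with a source-side gap:
stssg_rule = SncTSSGRule(source=[TreeNode("NP")], target=[TreeNode("NP")])
gapped_rule = SncTSSGRule(source=[TreeNode("VV"), None, TreeNode("NN")],
                          target=[TreeNode("SBAR")])
print(stssg_rule.is_contiguous(), gapped_rule.is_contiguous())  # True False
```

Representing the gap explicitly in the rule, rather than forcing a covering context, is what lets the extractor keep non-contiguous phrase pairs as first-class rules.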
11. Word-aligned Parse Tree and Two Parse Tree Sequences
[Figure: (1) a word-aligned bi-parsed tree; (2) two abstract substructures; (3) two tree sequences. Node labels include VBA, VO, R, VG, P, and NG.]
12. Contiguous Translation Rules
r1. Contiguous Tree-to-Tree Rule
r2. Contiguous Tree Sequence Rule
13. Non-contiguous Translation Rules
r1. Non-contiguous Tree-to-Tree Rule
r2. Non-contiguous Tree Sequence Rule
14. Outline
- Introduction
- Non-contiguous Tree Sequence Modeling
- Rule Extraction
- Non-contiguous Decoding: the Pisces Decoder
- Experiments
- Conclusion
15. A Word-aligned Parse Tree Pair
16. Example for contiguous rule extraction (1)
17. Example for contiguous rule extraction (2)
18. Example for contiguous rule extraction (3)
19. Example for contiguous rule extraction (4)
Abstract into substructures
20. Example for non-contiguous rule extraction (1)
Extracted from non-contiguous tree sequence pairs
21. Example for non-contiguous rule extraction (2)
Abstract into substructures from non-contiguous tree sequence pairs
22. Outline
- Introduction
- Non-contiguous Tree Sequence Modeling
- Rule Extraction
- Non-contiguous Decoding: the Pisces Decoder
- Experiments
- Conclusion
23. The Pisces Decoder
- Pisces conducts its search with two modules:
  - The first is a CFG-based chart parser, used as a pre-processor that maps an input sentence to a parse tree Ts (for details of the chart parser, see Charniak (1997)).
  - The second is a span-based tree decoder with three phases:
    - Contiguous decoding (same as in Zhang et al. (2008))
    - Source-side non-contiguous translation
    - Tree sequence reordering on the target side
24. Source-side Non-contiguous Translation
Right insertion
Left insertion
IN(in)
NP(...)
NP(...)
25. Tree Sequence Reordering on the Target Side
- Binarize each span into a left sub-span and a right sub-span.
- Generate new translation hypotheses for the span by inserting the candidate translations of the right sub-span into each gap in those of the left sub-span.
- Generate translation hypotheses for the span by inserting the candidate translations of the left sub-span into each gap in those of the right sub-span.
- A candidate hypothesis
- target span
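The three steps above can be sketched in Python (a toy illustration under my own naming conventions, not the decoder's actual code): each hypothesis is a token list in which a placeholder marks a gap, and combining a binarized span tries insertion in both directions.

```python
# Target-side reordering sketch: fill the gaps of one sub-span's
# hypotheses with the other sub-span's hypotheses.

GAP = "<gap>"  # placeholder for a gap ("*") in a partial hypothesis

def insert_into_gaps(outer, inner):
    """All hypotheses obtained by filling one gap of `outer`
    with the full token sequence `inner`."""
    results = []
    for i, tok in enumerate(outer):
        if tok == GAP:
            results.append(outer[:i] + inner + outer[i + 1:])
    return results

def combine_spans(left_hyps, right_hyps):
    """Combine the left and right hypothesis lists of a binarized
    span, trying both insertion directions as on the slide."""
    combined = []
    for l in left_hyps:
        for r in right_hyps:
            combined += insert_into_gaps(l, r)  # right into left's gaps
            combined += insert_into_gaps(r, l)  # left into right's gaps
    return combined

left = [["when", GAP, "strengthened"]]
right = [["the", "inspection"]]
print(combine_spans(left, right))
# → [['when', 'the', 'inspection', 'strengthened']]
```

In the real decoder each combination would also be scored and pruned; here only the gap-filling mechanics are shown.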
26. Modeling
- the source/target sentence
- the source/target parse tree
- a non-contiguous source/target tree sequence
- the source/target spans
- h_m: the m-th feature function
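The feature functions h_m are combined in the standard log-linear framework for SMT (Och and Ney, 2002); a generic sketch of the decoding objective, with symbols of my own choosing rather than the paper's exact notation:

```latex
% Log-linear decoding objective: pick the target sentence t maximizing
% the weighted sum of the M feature functions over source s.
\hat{t} \;=\; \arg\max_{t} \; \sum_{m=1}^{M} \lambda_m \, h_m(s, t)
```

The weights λ_m are tuned by minimum error rate training (Och, 2003), as listed in the experimental settings.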
27. Features
- The bi-phrasal translation probabilities
- The bi-lexical translation probabilities
- The target language model
- The number of words in the target sentence
- The number of rules utilized
- The average tree depth in the source side of the adopted rules
- The number of non-contiguous rules utilized
- The number of reordering operations caused by the utilization of the non-contiguous rules
28. Outline
- Introduction
- Non-contiguous Tree Sequence Modeling
- Rule Extraction
- Non-contiguous Decoding: the Pisces Decoder
- Experiments
- Conclusion
29. Experimental Settings
- Training Corpus
- Chinese-English FBIS corpus
- Development Set
- NIST MT 2002 test set
- Test Set
- NIST MT 2005 test set
- Evaluation Metrics
- case-sensitive BLEU-4
- Parser
- Stanford Parser (Chinese/English)
- Evaluation
- mteval-v11b.pl
- Language Model
- SRILM 4-gram
- Minimum error rate training
- (Och, 2003)
- Model Optimization
- Gaps are allowed on one side only
30. Model Comparison in BLEU
- Table 1: Translation results of different models (cBP refers to contiguous bilingual phrases without syntactic structural information, as used in Moses)
31. Rule Combination
- cR: rules derived from contiguous tree sequence pairs (i.e., all STSSG rules)
- ncPR: non-contiguous rules derived from contiguous tree sequence pairs, with at least one non-terminal leaf node between two lexicalized leaf nodes
- srcncR: non-contiguous rules with gaps in the source side
- tgtncR: non-contiguous rules with gaps in the target side
- srctgtncR: non-contiguous rules with gaps in either side
- Table 2: Performance of different rule combinations
32. Bilingual Phrasal Rules
- cR: rules derived from contiguous tree sequence pairs (i.e., all STSSG rules)
- ncPR: non-contiguous rules derived from contiguous tree sequence pairs, with at least one non-terminal leaf node between two lexicalized leaf nodes
- srcncBP: non-contiguous phrasal rules with gaps in the source side
- tgtncBP: non-contiguous phrasal rules with gaps in the target side
- srctgtncBP: non-contiguous phrasal rules with gaps in either side
- Table 3: Performance of bilingual phrasal rules
33. Maximal Number of Gaps
- Table 4: Performance and rule size as the maximal number of gaps changes
34. Sample Translations
35. Conclusion
- A non-contiguous tree sequence alignment model based on SncTSSG, which attains better modeling of non-contiguous phrases and of the reordering caused by non-contiguous constituents with large gaps
- Observations
  - In the Chinese-English translation task, gaps are more effective on the Chinese side than on the English side.
  - Allowing only one gap is effective.
- Future Work
  - Redundant non-contiguous rules
  - Optimization of the large rule set