Title: An Overview of Syntax-based SMT
1An Overview of Syntax-based SMT
2Outline
- Why syntax-based SMT (SSMT)
- Types of SSMT
- SSMT based on formal structures
- SSMT based on phrase structures
- SSMT based on dependency structures
- Conclusions
3Why syntax-based SMT
- Weakness of phrase-based SMT
- Long-distance reordering (only phrase-level reordering is handled)
- Discontinuous phrases
- Generalization
- Other methods using syntactic knowledge
- Word alignment integrating syntactic constraints
- Pre-order source sentences
- Rerank n-best output of translation models
4Types of SSMT
- Three categories
- Formal structures
- Phrase structures
- Dependency structures
5SSMT based on formal structures
- Compared with phrase-based SMT
- Translated hierarchically
- The target structures finally generated are not necessarily real linguistic structures, but they:
- Make long-distance reorderings more feasible
- Introduce non-terminals/variables
- Discontinuous phrases: put x on, ? x ?
- Generalization
6Work based on formal structures
- Inversion Transduction Grammar (ITG), proposed by Wu (1997, 1998)
- Hierarchical phrase-based model, proposed by Chiang (2005)
- Both are synchronous context-free grammars (SCFG)
7SCFG
- Formulated
- Two CFGs and their correspondences
- Or
- P
8SCFG: an example
9SCFG derivation
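An SCFG derivation rewrites linked nonterminals on both sides at the same time, so one derivation tree yields a source string and a target string together. The following is a minimal sketch under stated assumptions: the toy grammar, the romanized target words, and the function name `derive` are all invented for illustration and do not come from the slides.

```python
# Toy synchronous CFG (hypothetical). Each rule is
# (left-hand side, child nonterminals, source template, target template).
# In a template, a string is a terminal and an integer i refers to the
# i-th child nonterminal, which is expanded on both sides together.
RULES = {
    "r1": ("S", ["NP", "VP"], [0, 1], [0, 1]),    # same order on both sides
    "r2": ("VP", ["V", "NP"], [0, 1], [1, 0]),    # verb and object swapped
    "r3": ("NP", [], ["she"], ["kanojo-wa"]),
    "r4": ("NP", [], ["books"], ["hon-o"]),
    "r5": ("V", [], ["reads"], ["yomu"]),
}


def derive(tree):
    """Expand a derivation tree (rule_id, [subtrees]) into (source, target)."""
    rule_id, subtrees = tree
    _, child_labels, source_template, target_template = RULES[rule_id]
    if len(child_labels) != len(subtrees):
        raise ValueError(f"{rule_id} expects {len(child_labels)} children")
    expanded = []
    for label, subtree in zip(child_labels, subtrees):
        if RULES[subtree[0]][0] != label:
            raise ValueError(f"{subtree[0]} does not rewrite {label}")
        expanded.append(derive(subtree))

    def fill(template, side):
        # side 0 selects the source yield of a child, side 1 the target yield
        words = []
        for item in template:
            if isinstance(item, int):
                words.extend(expanded[item][side])
            else:
                words.append(item)
        return words

    return fill(source_template, 0), fill(target_template, 1)


tree = ("r1", [("r3", []), ("r2", [("r5", []), ("r4", [])])])
source, target = derive(tree)
print(" ".join(source))  # she reads books
print(" ".join(target))  # kanojo-wa hon-o yomu
```

The reordering lives entirely in rule `r2`: its two templates list the same children in different orders.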
10ITG
- Synchronous CFGs in which the links between nonterminals in a production are restricted to two possible configurations
- Inverted
- Straight
- Any ITG can be converted into a synchronous CFG of rank two.
11BTG
12ITG as reordering constraint
- Two kinds of reorderings
- Inverted
- Straight
- Coverage
- Wu (1997) was unable to find real examples of cases where alignments would fail under this constraint, at least in lightly inflected languages, such as English and Chinese.
- Wellington (2006) found examples in at least 5% of the Chinese/English sentence pairs.
- Weakness
- No strong mechanism for determining which order is better, inverted or straight.
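The coverage question can be made concrete by testing whether a given word-order permutation is reachable using only straight and inverted binary combinations. The sketch below uses a greedy shift-reduce check over target-position ranges. The function name `itg_reachable` is hypothetical, and the check assumes one-to-one alignments expressed as a permutation of 1..n.

```python
def itg_reachable(permutation):
    """Return True if the permutation can be built from straight/inverted merges.

    Each stack entry is a (low, high) range of target positions covered by a
    contiguous block of source words. Two neighbouring blocks can be merged
    when their target ranges are adjacent: in order (straight) or swapped
    (inverted).
    """
    stack = []
    for position in permutation:
        stack.append((position, position))
        while len(stack) >= 2:
            (low1, high1), (low2, high2) = stack[-2], stack[-1]
            straight = high1 + 1 == low2
            inverted = high2 + 1 == low1
            if not (straight or inverted):
                break
            stack[-2:] = [(min(low1, low2), max(high1, high2))]
    return len(stack) == 1


print(itg_reachable([2, 1, 4, 3]))  # True: two inverted pairs, joined straight
print(itg_reachable([2, 4, 1, 3]))  # False: an "inside-out" pattern
print(itg_reachable([3, 1, 4, 2]))  # False: the other "inside-out" pattern
```

The two four-word permutations that fail are the well-known cases that binary straight/inverted rules cannot generate.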
13Chiang05 Hierarchical Phrase-based Model (HPM)
- Rules
- Glue rule
- Model: log-linear
- Decoder: CKY
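A log-linear model scores a derivation as a weighted product of feature values, which is a weighted sum in log space. The sketch below only illustrates that arithmetic. The feature names, feature values, and weights are invented for the example and are not taken from Chiang (2005).

```python
import math


def loglinear_score(features, weights):
    """Return sum_i weight_i * log(feature_i), i.e. log of prod_i feature_i ** weight_i."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


# Hypothetical weights and feature values for two competing derivations.
weights = {"p_target_given_source": 1.0, "p_source_given_target": 0.5, "lm": 1.0}
candidates = {
    "derivation_a": {"p_target_given_source": 0.5, "p_source_given_target": 0.4, "lm": 0.01},
    "derivation_b": {"p_target_given_source": 0.3, "p_source_given_target": 0.6, "lm": 0.05},
}

# The decoder prefers the derivation with the highest score.
best = max(candidates, key=lambda name: loglinear_score(candidates[name], weights))
print(best)  # derivation_b
```

In practice the weights would be tuned on held-out data; here they are fixed by hand.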
14Chiang05 rule extraction
15Chiang05 rule extraction restrictions
- Initial base rule: at most 15 on French side
- Final rule: at most 5 on French side
- At most two nonterminals on each side, nonadjacent
- At least one aligned terminal pair
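The restrictions on final rules can be read as a filter applied to each candidate rule. The sketch below is one possible reading, with several assumptions: the limit of 5 is counted in symbols (terminals plus nonterminals), adjacency is checked on the source (French) side only, nonterminals are represented as integers, and the function name `keep_rule` is hypothetical.

```python
MAX_SOURCE_SYMBOLS = 5   # "at most 5 on French side" (unit assumed: symbols)
MAX_NONTERMINALS = 2     # "at most two nonterminals"


def keep_rule(source_side, aligned_terminal_pairs):
    """Decide whether a candidate hierarchical rule passes the filters.

    source_side: list of symbols; strings are terminals, integers are
                 nonterminal indices (for example ["X1-like" -> 1]).
    aligned_terminal_pairs: number of word-aligned terminal pairs in the rule.
    """
    if len(source_side) > MAX_SOURCE_SYMBOLS:
        return False
    positions = [i for i, symbol in enumerate(source_side) if isinstance(symbol, int)]
    if len(positions) > MAX_NONTERMINALS:
        return False
    # Nonterminals must not sit next to each other on the source side.
    if any(right - left == 1 for left, right in zip(positions, positions[1:])):
        return False
    return aligned_terminal_pairs >= 1


print(keep_rule([1, "de", 2], aligned_terminal_pairs=1))   # True
print(keep_rule([1, 2, "de"], aligned_terminal_pairs=1))   # False: adjacent nonterminals
print(keep_rule([1], aligned_terminal_pairs=0))            # False: no aligned terminal pair
```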
16Chiang05 Model
17Chiang05 decoder
18Outline
- Why syntax-based SMT (SSMT)
- Types of SSMT
- SSMT based on formal structures
- SSMT based on phrase structures
- SSMT based on dependency structures
- Conclusions
19SSMT based on phrase structures
- Using grammars with linguistic knowledge
- The grammars are based on SCFG
- Two categories
- Tree-string
- Tree-to-string
- String-to-tree
- Tree-tree
20String-to-tree Models
- ISI family models
- Yamada & Knight 2001, 2003
- Galley et al. 2004, 2006
- Marcu et al. 2006
21Yamada & Knight 2001, 2003
22Yamada's work vs. SCFG
- Insertion operation
- A → (w A1, A1)
- Reordering operation
- A → (A1 A2 A3, A1 A3 A2)
- Translating operation
- A → (x, y)
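The three operations can be pictured as one recursive pass over a parse tree: reorder the children of a node, translate each leaf word, and insert an extra word next to a node. The sketch below is a deterministic toy version. The actual model is probabilistic, and the tables, the romanized output words, and the function name `channel` are assumptions made for illustration.

```python
def channel(node, reorder, insert_right, translate):
    """Apply reordering, leaf translation, and right-insertion to a tree.

    node: (label, word) for a leaf, or (label, [children]) for an inner node.
    reorder: maps (label, child labels) to a new child order.
    insert_right: maps a label to words inserted after that node's output.
    translate: maps a leaf word to its translation.
    """
    label, body = node
    if isinstance(body, str):
        words = [translate[body]]                       # translating operation
    else:
        child_labels = tuple(child[0] for child in body)
        default_order = tuple(range(len(body)))
        order = reorder.get((label, child_labels), default_order)
        words = []
        for index in order:                             # reordering operation
            words.extend(channel(body[index], reorder, insert_right, translate))
    return words + insert_right.get(label, [])          # insertion operation


tree = ("VP", [("VB", "adores"), ("NN", "music")])
output = channel(
    tree,
    reorder={("VP", ("VB", "NN")): (1, 0)},
    insert_right={"NN": ["wo"]},
    translate={"adores": "daisuki", "music": "ongaku"},
)
print(" ".join(output))  # ongaku wo daisuki
```

Because each table is consulted at a single node, reordering is limited to the children of one node at a time.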
23Yamada: weaknesses
- Single-level mapping
- Multi-level reordering
- Yamada: flatten
- Word-based
- Yamada: phrasal leaf
24Galley et al. 2004, 2006
- Translation model incorporates syntactic structure on the target language side
- Trained by learning translation rules from bilingual data
- The decoder uses a parser-like method to create syntactic trees as output hypotheses
25Translation rules
- Translation rules
- Target multi-level subtrees
- Source continuous or discontinuous phrases
- Types of translation rules
- Translating source phrases into target chunks
- NPB(PRP/I) ??
- NP-C(NPB(DT/this NN/address)) ??? ??
26Types of translation rules
- Have variables
- NP-C(NPB(PRP/my x0:NN)) ?? ? x0
- PP(TO/to NP-C(NPB(x0:NNS NNP/park))) ? ? x0 ??
- Combine previously translated results together
- VP(x0:VBZ x1:NP-C) → x1 x0
- Takes a noun phrase followed by a verb, switches their order, then combines them into a new verb phrase
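The combining rule can be sketched as a function that takes two already-built target subtrees, given in source order, and returns the new verb phrase with the verb first. This is only an illustration of the variable binding: the tree encoding, the example words, and the names `apply_vp_rule` and `yield_of` are hypothetical.

```python
def apply_vp_rule(first, second):
    """Apply VP(x0:VBZ x1:NP-C) -> x1 x0 to two adjacent partial results.

    first, second: target subtrees covering adjacent source spans, in source
    order. The source side of the rule is "x1 x0", so the noun phrase must
    come first and the verb second. Returns None if the rule does not match.
    """
    if first[0] != "NP-C" or second[0] != "VBZ":
        return None
    # On the target side the verb (x0) precedes the noun phrase (x1).
    return ("VP", [second, first])


def yield_of(tree):
    """Return the target words at the leaves of a (label, body) tree."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in yield_of(child)]


noun_phrase = ("NP-C", [("NPB", [("NN", "music")])])
verb = ("VBZ", "likes")
verb_phrase = apply_vp_rule(noun_phrase, verb)
print(" ".join(yield_of(verb_phrase)))  # likes music
```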
27Rule extraction
- Word-align a parallel corpus
- Parse the target side
- Extract translation rules
- Minimal rules: cannot be decomposed
- Composed rules: composed of minimal rules
- Estimate probabilities
28Rule extraction
Minimal rule
29Composed rules
30Format is Expressive
[Figure: example rule trees for six phenomena: Non-constituent Phrases, Phrasal Translation, Non-contiguous Phrases, Multilevel Re-Ordering, Lexicalized Re-Ordering, Context-Sensitive Word Insertion. Source sides shown include "poner, x0", "hay, x0", and "está, cantando". Source: Knight & Graehl, 2005]
31Decoder
- Probabilistic CYK-style parsing algorithm with beams
- Results in an English syntax tree corresponding to the Chinese sentence
- Guarantees the output to have some kind of globally coherent syntactic structure
32Decoding example
33Decoding example
34Decoding example
35Decoding example
36Decoding example
37Marcu et al. 2006
- SPMT
- Integrating non-syntactifiable phrases
- Multiple features for each rule
- Decoding with multiple models
38SSMT based on phrase structures
- Two categories
- Tree-string
- String-to-tree
- Tree-to-string
- Tree-tree
39Tree-to-string
- Liu et al. 2006
- Tree-to-string alignment template model
40TAT
41TAT extraction
- Constraints
- Source trees have to be subtrees
- Have to be consistent with word alignment
- Restrictions on extraction
- Both the first and last symbols in the target string must be aligned to some source symbols
- The height of T(z) is limited to no greater than h
- The number of direct descendants of a node of T(z) is limited to no greater than c
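Two of the conditions above are easy to state over a set of word-alignment links: consistency with the alignment, and the requirement that the first and last target symbols be aligned. The sketch below uses the usual phrase-extraction notion of consistency (no link crosses the boundary of the pair), which is an assumption about what the slide intends. The function names are hypothetical, and spans are inclusive (low, high) index pairs.

```python
def consistent(links, source_span, target_span):
    """True if no alignment link crosses the span pair and one lies inside it."""
    (source_low, source_high), (target_low, target_high) = source_span, target_span
    inside = 0
    for source_index, target_index in links:
        source_in = source_low <= source_index <= source_high
        target_in = target_low <= target_index <= target_high
        if source_in != target_in:
            return False          # a link leaves the pair on one side only
        if source_in:
            inside += 1
    return inside > 0


def target_boundaries_aligned(links, target_span):
    """True if the first and last target positions both carry a link."""
    aligned_targets = {target_index for _, target_index in links}
    return target_span[0] in aligned_targets and target_span[1] in aligned_targets


links = {(0, 0), (1, 2), (2, 1)}   # (source index, target index) pairs
print(consistent(links, (1, 2), (1, 2)))        # True
print(consistent(links, (0, 1), (0, 1)))        # False: link (1, 2) crosses
print(target_boundaries_aligned(links, (1, 2))) # True
```

The height limit h and the branching limit c would be further checks on the source subtree and are not covered here.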
42TAT Model
43Decoding
44Tree-to-string vs. string-to-tree
- Tree-to-string
- Integrating source structures into translation and reordering
- The output may not be grammatical
- String-to-tree
- Guarantees the output to have some kind of globally coherent syntactic structure
- Cannot use any knowledge from source structures
45SSMT based on phrase structures
- Two categories
- Tree-string
- String-to-tree
- Tree-to-string
- Tree-tree
46Tree-Tree
- Synchronous tree-adjoining grammar (STAG)
- Synchronous tree substitution grammar (STSG)
47STAG
48STAG derivation
49STSG
50STSG elementary trees
51Outline
- Why syntax-based SMT (SSMT)
- Types of SSMT
- SSMT based on formal structures
- SSMT based on phrase structures
- SSMT based on dependency structures
- Conclusions
52Dependency structures
[Figure: two panels, (a) and (b), showing tree analyses of an example sentence, including a phrase-structure tree with IP, VP, NP, and ADJP nodes; the sentence text did not survive extraction]
53For MT: dependency structures vs. phrase structures
- Advantages of dependency structures over phrase structures for machine translation
- Inherent lexicalization
- Meaning-related
- Better representation of divergences across languages
54SSMT based on dependency structures
- Lin 2004
- A Path-based Transfer Model for Machine Translation
- Quirk et al. 2005
- Dependency Treelet Translation: Syntactically Informed Phrasal SMT
- Ding et al. 2005
- Machine Translation Using Probabilistic Synchronous Dependency Insertion Grammars
55Lin 2004
- Translation model: trained by learning transfer rules from a bilingual corpus where the source language sentences are parsed.
- Decoding: finding the minimum path covering of the source language dependency tree
56Lin 2004 path
57Lin 2004 transfer rule
58Quirk et al. 2005
- Translation model: trained by learning treelet pairs from a bilingual corpus where the source language sentences are parsed.
- Decoding: CKY-style
59Treelet pairs
60Quirk 2005 decoding
61Ding 2005
62Outline
- Why syntax-based SMT (SSMT)
- Types of SSMT
- SSMT based on formal structures
- SSMT based on phrase structures
- SSMT based on dependency structures
- Conclusions
63Conclusions