Title: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)
1. Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)
- Chris Quirk, Arul Menezes, and Colin Cherry
2. Outline
- Limitations of SMT and previous work
- Modeling and training
- Decoding
- Experiments
- Conclusion
3. Limitations of string-based phrasal SMT
- It allows only limited phrase reordering.
  - Ex: max jump, max skip
- It cannot express linguistic generalizations.
  - Ex: it cannot express SOV → SVO
- Source and target phrases have to be contiguous.
  - Ex: it cannot handle ne ... pas
4. Previous work on syntactic SMT: simultaneous parsing
- Inversion Transduction Grammars (Wu, 1997)
  - Uses simplifying assumptions such as X → AB
- Head transducers (Alshawi et al., 2000)
  - Simultaneous induction of src and tgt dependency trees
5. Previous work on syntactic SMT: parsing + transfer
- Tree-to-string (Yamada and Knight, 2001)
  - Parse the tgt sentence, and convert the tgt tree to a src string
- Path-based transfer model (Lin, 2004)
  - Translate paths in src dependency trees
- LF-level transfer (Menezes and Richardson, 2001)
  - Parse both src and tgt
6. Previous work on syntactic SMT: pre- or post-processing
- Post-processing (JHU 2003): re-rank the n-best list of SMT output using syntactic models.
  - Parse the MT output
  - No improvement, even when n = 16,000
- Pre-processing (Xia & McCord, 2004; Collins et al., 2005; ...): reorder src sents before SMT
  - Some improvement
7. Outline
- Limitations of SMT and previous work
- Modeling and training
- Decoding
- Experiments
- Conclusion
8. What's new?
- The unit of translation: a treelet pair.
- A treelet is an arbitrary connected subgraph (not necessarily a subtree) of a dependency tree.
- In comparison:
  - Src n-grams: phrase-based SMT
  - Paths: (Lin, 2004)
  - Context-free rules: many transfer-based MT systems
- → Decoding is more complicated.
9. Required modules
- Source dependency parser
- Target word segmenter / tokenizer
- Word aligner: GIZA
10. Major steps for training
- Align src and tgt words
- Parse source side
- Project dependency trees
- Extract treelet translation pairs
- Train an order model
- Train other models
11. Step 1: Word alignment
- Use GIZA to get alignments in both directions, and combine the results with heuristics.
- One constraint: for n-to-1 alignments, the n src words have to be adjacent (i.e., form a connected subgraph) in the src dependency tree (see the connectivity check below).
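A minimal sketch of this adjacency constraint, assuming heads are given as a parent-index array (not the authors' code):

```python
def connected_in_tree(indices, heads):
    """Return True if `indices` induce a connected subgraph of the tree."""
    nodes = set(indices)
    # In a tree, a node set is connected iff exactly one member has its
    # parent outside the set (the subgraph's local root).
    local_roots = sum(1 for i in nodes if heads[i] not in nodes)
    return local_roots == 1

# Example: in "the startup options", "the" and "startup" both attach to
# "options" (index 2); heads[i] = parent index, -1 for the root.
heads = [2, 2, -1]
print(connected_in_tree([1, 2], heads))  # True: "startup options" is connected
print(connected_in_tree([0, 1], heads))  # False: two siblings, no shared node
```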
12. Heuristics used to accept alignments from the union
- → It does not accept m-to-n alignments
13. Step 2: Parsing the source side
- It requires a source dependency parser that
  - produces unlabeled, ordered dependency trees, and
  - annotates each src word with a POS tag
- Their system does not allow crossing dependencies:
  - If h(i) = k, then for any j between i and k, h(j) is also between i and k (see the check below).
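The same no-crossing (projectivity) condition, written out as a hedged Python check; representing heads as a parent-index list is an assumption:

```python
def is_projective(heads):
    """heads[i] = head index of word i; -1 marks the root."""
    for i, k in enumerate(heads):
        if k == -1:
            continue
        lo, hi = min(i, k), max(i, k)
        # every word strictly between i and k must be headed inside [i, k]
        for j in range(lo + 1, hi):
            if not (lo <= heads[j] <= hi):
                return False
    return True

print(is_projective([1, -1, 1]))     # True: both outer words attach to word 1
print(is_projective([2, 3, -1, 2]))  # False: arcs 0->2 and 1->3 cross
```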
14. Step 3: Projecting dependency trees
- Add links in the tgt dependency tree according to word alignment types:
  - 1-to-1: trivial
  - n-to-1: trivial
  - 1-to-n: use heuristics
  - Unaligned tgt words: use heuristics
  - Unaligned src words: ignore them
15. 1-to-1 and n-to-1 alignments
(Figure: alignment links between src words s_k, s_l and tgt words t_i, t_j)
16. 1-to-n alignment
(Figure: src words a, b aligned to tgt words a, b1, b2)
- The n tgt words should move as a unit (sketched below):
  - treat the rightmost one as the head
  - all other words depend on it.
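A minimal sketch of this heuristic; tgt_heads is an assumed position-to-head mapping, not the paper's data structure:

```python
def attach_one_to_many(tgt_positions, tgt_heads):
    """tgt_positions: positions of the n tgt words aligned to one src word;
    tgt_heads: dict position -> head position, updated in place."""
    head = max(tgt_positions)        # rightmost word becomes the head
    for p in tgt_positions:
        if p != head:
            tgt_heads[p] = head      # all other words depend on it
    return head

# Example: "doesn't" aligned to "ne"(1) and "pas"(3): "pas" heads "ne".
tgt_heads = {}
print(attach_one_to_many([1, 3], tgt_heads), tgt_heads)  # 3 {1: 3}
```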
17. Unaligned target words
- Given an unaligned tgt word at position j, find the closest positions (i, k) s.t. j is between i and k and t_i depends on t_k (or vice versa). (Sketched below.)
- Such (i, k) might not exist. Because no crossing is allowed, if (i, k) exists, it is unique.
(Figure: t_j attached under the arc between t_i and t_k)
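A sketch of the search for (i, k); representing the tree as a node-to-head dict is an assumption for illustration:

```python
def find_attachment(j, tgt_heads):
    """Find the narrowest dependency arc (i, k), with i < j < k, spanning
    the unaligned position j; tgt_heads maps a node to its head."""
    best = None
    for m, h in tgt_heads.items():            # one arc per (node, head) pair
        i, k = min(m, h), max(m, h)
        if i < j < k and (best is None or k - i < best[1] - best[0]):
            best = (i, k)
    return best                               # None if no arc spans j
```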
18. An example
- src: startup properties and options
- tgt: propriétés et options de démarrage
19. The reattachment pass to ensure phrasal cohesion
(Figure: reattachment in the projected tgt tree around 'et' and 'démarrage')
20. Reattachment pass
- For each node in the wrong order (relative to its siblings), we reattach it to the lowest of its ancestors s.t. it is in the correct place relative to its siblings and parent.
- Question: how does the reattachment work?
  - In what order are tree nodes checked?
  - Once a node is moved, can it be moved again?
  - How many levels do we have to check to decide where to attach a node?
21. An example
22. Step 3: Projecting dependency trees (recap)
- Before reattachment, the src and tgt dependency trees are almost isomorphic:
  - n-to-1: treat many src words as one node
  - 1-to-n: treat many tgt words as one node
  - Unaligned tgt words
  - Unaligned src words
- After reattachment, the two trees can look very different.
23. Step 4: Extracting treelet translation pairs
- We extract all pairs of aligned src and tgt treelets, along with word-level alignment linkages, up to a configurable max size.
- Due to the reattachment step, a src treelet might not align to a tgt treelet.
24. Extraction algorithm
- Enumerate all possible source treelets.
- Look at the union of the target nodes aligned to the source nodes. If it is a treelet, keep the treelet pair.
- Allow treelets with wildcard roots.
  - Ex: doesn't → ne ... pas
- Max size of treelets: in practice, up to 4 src words.
- Question: how many source treelets are there? (See the sketch after this list.)
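A sketch answering the question by brute force: grow connected node sets along tree edges up to the size cap. The parent-index representation is assumed:

```python
def enumerate_treelets(heads, max_size=4):
    """Enumerate all connected subgraphs (treelets) of a dependency tree,
    up to max_size nodes, by growing node sets along tree edges."""
    n = len(heads)
    adj = [set() for _ in range(n)]           # undirected view of the tree
    for i, h in enumerate(heads):
        if h != -1:
            adj[i].add(h)
            adj[h].add(i)
    treelets = {frozenset([i]) for i in range(n)}
    frontier = set(treelets)
    for _ in range(max_size - 1):
        grown = set()
        for t in frontier:                    # extend each set by one neighbor
            for node in t:
                for nb in adj[node]:
                    if nb not in t:
                        grown.add(t | {nb})
        treelets |= grown
        frontier = grown
    return treelets

# A 10-word chain has 10 + 9 + 8 + 7 = 34 treelets of size <= 4.
chain = [-1] + list(range(9))                 # word i+1 attaches to word i
print(len(enumerate_treelets(chain, 4)))      # 34
```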
25. An example
- src: startup properties and options
- tgt: propriétés et options de démarrage
26. Step 5: Training an order model
27. Another representation
28. Learning a dependent's position w.r.t. its head
- P(pos(m, t) | S, T), where
  - S: src dependency tree
  - T: unordered tgt dependency tree
  - t (a.k.a. h): a node in T
  - m: a child of t
- → Use a decision tree to decide pos(m)
29. (Figure only; no transcript)
30. The probability of the order of the tgt tree
- P(order(T) | S, T) = ∏_{t ∈ T} ∏_{m ∈ c(t)} P(pos(m, t) | S, T)
- c(t) is the set of nodes modifying t (i.e., the children of t in the dependency tree).
- Assumption: the position of each child can be modeled independently in terms of head-relative position (sketched below).
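A minimal sketch of this product in log space; position_prob stands in for the paper's decision-tree model, and its signature is an assumption:

```python
from math import log

def modifier_positions(head, children, order):
    """Head-relative positions: -1 is the closest pre-modifier, -2 the next
    one out, +1 the closest post-modifier, and so on."""
    pre = sorted((m for m in children if order[m] < order[head]),
                 key=lambda m: order[m], reverse=True)
    post = sorted((m for m in children if order[m] > order[head]),
                  key=lambda m: order[m])
    pos = {m: -(i + 1) for i, m in enumerate(pre)}
    pos.update({m: i + 1 for i, m in enumerate(post)})
    return pos

def order_log_prob(children_of, order, position_prob):
    """log P(order(T)|S,T) = sum over heads t and modifiers m in c(t) of
    log P(pos(m, t) | S, T); position_prob is the decision-tree stand-in."""
    lp = 0.0
    for t, children in children_of.items():
        for m, p in modifier_positions(t, children, order).items():
            lp += log(position_prob(m, t, p))
    return lp
```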
31. The order model (cont.)
- Comment: this model is straightforward, but also somewhat counter-intuitive, since treelets are subgraphs.
32. Step 6: Training other models
- (s_i, t_i) is a treelet pair.
- It assumes a uniform distribution over all possible decompositions of a tree into treelets.
- Two models (sketched below):
  - MLE
  - IBM Model 1
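Hedged sketches of the two estimates; pair_counts, src_counts, and t_table are assumed training artifacts, and the Model 1 form is the standard one rather than the paper's exact setup:

```python
def mle_prob(src_treelet, tgt_treelet, pair_counts, src_counts):
    """P(t | s) = count(s, t) / count(s), over extracted treelet pairs."""
    return pair_counts[(src_treelet, tgt_treelet)] / src_counts[src_treelet]

def model1_prob(src_words, tgt_words, t_table, floor=1e-9):
    """IBM Model 1: P(t | s) = prod_j (1/(l+1)) * sum_i t(t_j | s_i),
    including a NULL source word."""
    srcs = ["NULL"] + list(src_words)
    p = 1.0
    for t in tgt_words:
        p *= sum(t_table.get((s, t), floor) for s in srcs) / len(srcs)
    return p
```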
33. Step 6: Training other models (cont.)
- Target LM: n-gram LM
- Other features:
  - Target word count (word penalty)
  - The number of phrases used
  - ...
34. Treelet vs. string-based SMT
- Similarities:
  - Use the log-linear framework.
  - Similar features: LM, word penalty, ...
- Differences:
  - Use a treelet TM, instead of a string-based TM.
  - The order model is w.r.t. dependency trees.
35. Outline
- Limitations of SMT and previous work
- Modeling and training
- Decoding
- Experiments
- Conclusion
36. Challenges
- The traditional left-to-right decoding approach is inapplicable.
- The need to handle treelets, which may be discontiguous or overlapping.
37. Ordering strategies
- Exhaustive search
- Greedy ordering
- No ordering
38. Exhaustive search
- For each input node s, find the set of all treelet pairs that match S and are rooted at s.
- Move bottom-up through the src dependency tree, computing a list of possible tgt trees for each src subtree.
- When attaching one subtree to another, try all possible permutations of the children of the root node.
39. Definitions
40. Exhaustive decoding algorithm
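The slides show the algorithm itself as a figure. As a minimal sketch of its core ordering step, here is the interleaving enumeration, assuming distinct child items (not the authors' code):

```python
from itertools import permutations

def all_orderings(fixed, new):
    """Yield every interleaving of the `new` subtrees into the `fixed`
    sequence (head plus treelet-specified children), preserving the
    relative order of `fixed`. Assumes all elements are distinct."""
    for perm in permutations(fixed + new):
        if [x for x in perm if x in fixed] == list(fixed):
            yield perm

# Head plus c=2 fixed children, r=1 new subtree:
# (c+r+1)!/(c+1)! = 4!/3! = 4 interleavings.
print(len(list(all_orderings(["m1", "HEAD", "m2"], ["new"]))))  # 4
```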
41. Greedy ordering
- Too many permutations to consider in exhaustive search.
- In the greedy ordering:
  - Given a fixed pre- and post-modifier count, we choose the best modifier for each position.
42. Greedy ordering algorithm
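This slide is also a figure; a minimal sketch of the greedy strategy described above, reusing the assumed position_prob callback:

```python
def greedy_order(head, modifiers, n_pre, position_prob):
    """Fill each head-relative slot with the best remaining modifier;
    n_pre modifiers go before the head, the rest after."""
    remaining = list(modifiers)
    # positions: -n_pre, ..., -1, then +1, +2, ...
    slots = list(range(-n_pre, 0)) + list(range(1, len(modifiers) - n_pre + 1))
    ordered = []
    for pos in slots:
        best = max(remaining, key=lambda m: position_prob(m, head, pos))
        remaining.remove(best)
        ordered.append((pos, best))
    return ordered
```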
43. Number of candidates considered at each node
- c = # of children specified in the treelet pair
- r = # of subtrees that need to be attached
- Exhaustive search: (c + r + 1)! / (c + 1)!
- Greedy search: (c + r) · r² (compare the counts below)
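A quick numeric check of the two formulas; note the greedy count is reconstructed from the garbled slide, so treat it as approximate:

```python
from math import factorial

def exhaustive_count(c, r):
    return factorial(c + r + 1) // factorial(c + 1)

def greedy_count(c, r):
    return (c + r) * r * r

for c, r in [(1, 3), (2, 4), (3, 5)]:
    print(c, r, exhaustive_count(c, r), greedy_count(c, r))
# c=2, r=4: exhaustive 7!/3! = 840 vs. greedy 6 * 16 = 96
```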
44. Dynamic programming
- In string-based SMT, hyps covering the same src word vector are grouped; they are distinguished by:
  - The last two target words in the hyp, for the LM
  - List size is O(V²)
- In treelet translation, hyps for the same src subtree are distinguished by:
  - The head word, for the order model
  - The first two target words, for the LM
  - The last two target words, for the LM
  - List size is O(V⁵)
- DP does not allow for great savings because of the context we have to keep (see the key sketched below).
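A sketch of the hypothesis signature this implies; names and types are illustrative only:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class HypKey:
    head: str                      # head word, needed by the order model
    first_two: Tuple[str, str]     # leading tgt words (left LM context)
    last_two: Tuple[str, str]      # trailing tgt words (right LM context)

# Hypotheses for the same src subtree recombine only on identical keys,
# giving the O(V^5) list size noted above.
```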
45. Duplicate elimination
- To eliminate unnecessary ordering operations, they use a hash table to check whether an unordered T has appeared before.
46. Pruning
- Prune treelet pairs (before the search starts; sketched below):
  - Keep pairs whose MLE prob > threshold
  - Given a src treelet, keep those whose prob is within a ratio r of the best pair.
- N-best lists:
  - Keep the N best for each node in the src dep tree.
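A sketch of the pair-level pruning; the threshold and ratio values are hypothetical placeholders:

```python
def prune_pairs(pairs, threshold=1e-4, ratio=0.1):
    """pairs: src_treelet -> list of (tgt_treelet, mle_prob). Keep a pair if
    its prob clears the threshold and is within `ratio` of the best pair
    for the same src treelet."""
    kept = {}
    for src, cands in pairs.items():
        best = max(p for _, p in cands)
        kept[src] = [(t, p) for t, p in cands
                     if p > threshold and p >= ratio * best]
    return kept
```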
47. Outline
- Limitations of SMT and previous work
- Modeling and training
- Decoding
- Experiments
- Conclusion
48. Setting
- Eng-Fr corpus of Microsoft technical data
- Eng parser (NLPWIN): a rule-based in-house parser
49. Main results
- Max phrase size: 4
50. Effect of max phrase size
51. Effect of training set size (BLEU)

  Training size   1K      3K      10K     30K     100K    300K
  Pharaoh         17.20   22.51   27.70   33.73   38.83   42.75
  Treelet         18.70   25.39   30.96   35.81   40.66   44.32
  Difference      +1.50   +2.88   +3.26   +2.08   +1.83   +1.57
52. Effect of ordering strategies
53. Effect of allowing discontiguous phrases
54. Effect of optimization
55. Conclusion
- Modeling:
  - Treelet translation
  - Order model based on dependency structure
- Training:
  - Project the tgt dependency tree using heuristics
  - Learn treelet pairs
- Decoding:
  - Exhaustive search
  - Greedy ordering
- Results: better performance than string-based SMT, especially for small max phrase sizes.
56. Advantages
- Over string-based SMT:
  - Src phrases do not have to be contiguous n-grams.
  - It can express linguistic generalizations.
- Over previous transfer-based approaches:
  - Treelets are more expressive than paths or context-free rules.
57. Discussion
- Projecting the tgt dependency tree:
  - Reattachment: how and why?
- Extracting treelet pairs:
  - How many subgraphs?
- Order model
- Decoding: when hyps are extended, updating the score is more complicated.