Title: Syntax for MT
1. Syntax for MT
2. Outline
- Motivation
- Syntax-based translation model
- Formalization
- Training
- Using syntax in MT
- Using multiple features
- Syntax-based features
3. The IBM Models
- Word reordering
- Single words, not groups
- Conditioned on position of words
- Null-word insertion
- Uniform across position
4. The Alignment Template Model
- Word reordering
- Phrases can be reordered in any way, but tend to stay in the same order as the source.
- Reordering within phrases is defined by templates
- Word translations
- Must match up
- No null
5. Implied Assumptions
- Word Order
- Similar to source sentence
- Translation
- Near 1-1 correspondence
6. What goes wrong?
- We see many errors in machine translation when we only look at the word level
- Missing content words
- MT: Condemns US interference in its internal affairs.
- Human: Ukraine condemns US interference in its internal affairs.
- Verb phrase
- MT: Indonesia that oppose the presence of foreign troops.
- Human: Indonesia reiterated its opposition to foreign military presence.
WS 2003 Syntax for Statistical Machine
Translation Final Presentation
7. What goes wrong cont.
- Wrong dependencies
- MT: , particularly those who cheat the audience the players.
- Human: , particularly those players who cheat the audience.
- Missing articles
- MT: , he is fully able to activate team.
- Human: , he is fully able to activate the team.
8. What goes wrong cont.
- Word salad
- the world arena on top of the u . s . sampla
competitors , and since mid july has not
appeared in sports field , the wounds heal go
back to the situation is very good , less than a
half hours in the same score to eliminate 62 in
light of the south african athletes to the second
round .
9. How can we improve?
- Relying on the language model to produce more accurate sentences is not enough
- Many of the problems can be considered syntactic
- Perhaps MT systems don't know enough about what is important to people
- So, include syntax in MT
- Build a model around syntax
- Include syntax-based features in a model
10. A New Translation Story
- You have a sentence and its parse tree
- The children at each node in the tree are rearranged
- New nodes may be inserted before or after a child node
- These new nodes are assigned a translation
- Each of the leaf lexical nodes is then translated
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
11. A Syntax-based model
- Assume word order is based on a reordering of the source syntax tree.
- Assume null-generated words happen at syntactic boundaries.
- (For now) Assume a word translates into a single word.
12. Reorder
13. Insert
14. Translate
15. Parameters
- Reorder (R): child node reordering
- Can take any possible child node reordering
- Defines word order in translation sentence
- Conditioned on original child node order
- Only applies to non-leaf nodes
16. Parameters cont.
- Insertion (N): placement and translation
- Left, right, or none
- Defines word to be inserted
- Place conditioned on current and parent labels
- Word choice is unconditioned
17. Parameters cont.
- Translation (T): 1 to 1
- Conditioned only on source word
- Can take on null
- Translation (T): N to N
- Consider word fertility (for 1-to-N mapping)
- Consider phrase translation at each node
- Limit size of possible phrases
- Mix phrasal w/ word-to-word translation
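The reorder, insert, and translate operations described on the Parameters slides can be sketched on a toy tree. This is a minimal, deterministic illustration, not the probabilistic model itself. The `Node` class, the three lookup tables, and the English-to-Japanese example are assumptions made up for the sketch, and each table lookup stands in for a draw from the R, N, or T distribution.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

INSERTED = "X"  # hypothetical label marking nodes added by the insert step


@dataclass
class Node:
    label: str                      # syntactic label, e.g. "VB" or "PRP"
    word: Optional[str] = None      # set only on leaf (lexical) nodes
    children: List["Node"] = field(default_factory=list)


def reorder(node: Node, table: Dict[Tuple[str, ...], Tuple[int, ...]]) -> Node:
    """R: permute the children of each non-leaf node, keyed on their original label order."""
    kids = [reorder(c, table) for c in node.children]
    key = tuple(c.label for c in kids)
    perm = table.get(key, tuple(range(len(kids))))  # default: keep the order
    return Node(node.label, node.word, [kids[i] for i in perm])


def insert(node: Node, table: Dict[Tuple[str, str], Tuple[str, str]]) -> Node:
    """N: add a word left or right of a child, keyed on (parent label, child label)."""
    out: List[Node] = []
    for child in node.children:
        side, word = table.get((node.label, child.label), ("none", ""))
        if side == "left":
            out.append(Node(INSERTED, word))
        out.append(insert(child, table))
        if side == "right":
            out.append(Node(INSERTED, word))
    return Node(node.label, node.word, out)


def translate(node: Node, lexicon: Dict[str, str]) -> Node:
    """T: translate each original leaf word; an empty string plays the role of null."""
    if node.word is not None:
        if node.label == INSERTED:      # inserted words are already target words
            return node
        return Node(node.label, lexicon.get(node.word, node.word))
    return Node(node.label, None, [translate(c, lexicon) for c in node.children])


def leaves(node: Node) -> List[str]:
    if node.word is not None:
        return [node.word] if node.word else []
    return [w for c in node.children for w in leaves(c)]


# Toy parse tree for "he adores listening to music".
tree = Node("VB", children=[
    Node("PRP", "he"),
    Node("VB1", "adores"),
    Node("VB2", children=[
        Node("VB", "listening"),
        Node("TO", children=[Node("TO", "to"), Node("NN", "music")]),
    ]),
])
reorder_table = {("PRP", "VB1", "VB2"): (0, 2, 1), ("VB", "TO"): (1, 0), ("TO", "NN"): (1, 0)}
insert_table = {("VB", "PRP"): ("right", "ha"), ("VB2", "VB"): ("right", "no"),
                ("VB", "VB2"): ("right", "ga"), ("VB", "VB1"): ("right", "desu")}
lexicon = {"he": "kare", "music": "ongaku", "to": "wo", "listening": "kiku", "adores": "daisuki"}

result = translate(insert(reorder(tree, reorder_table), insert_table), lexicon)
sentence = " ".join(leaves(result))
print(sentence)
```

In the full model each of these choices carries a probability, and training estimates the three tables rather than fixing them by hand.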
18. Formalization
Set of nodes in parse tree
Total probability
Assume node independence
Assume random variables are independent of one another and dependent only on certain features
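As a sketch, the quantities named on this slide can be written as follows, assuming the notation of Yamada and Knight (2001). The symbols are that assumption: \varepsilon_i are the nodes of the English parse tree, \theta_i = \langle \nu_i, \rho_i, \tau_i \rangle are the insertion, reorder, and translation choices at node i, and \mathcal{N}, \mathcal{R}, \mathcal{T} are the features each choice is conditioned on.

```latex
\begin{align}
\theta &= \langle \theta_1, \ldots, \theta_n \rangle,
  \qquad \theta_i = \langle \nu_i, \rho_i, \tau_i \rangle
  && \text{(one choice triple per node)} \\
P(f \mid \varepsilon) &= \sum_{\theta \,:\, \mathrm{Str}(\theta(\varepsilon)) = f} P(\theta \mid \varepsilon)
  && \text{(total probability)} \\
P(\theta \mid \varepsilon) &= \prod_{i=1}^{n} P(\theta_i \mid \varepsilon_i)
  && \text{(node independence)} \\
P(\theta_i \mid \varepsilon_i) &= n\bigl(\nu_i \mid \mathcal{N}(\varepsilon_i)\bigr)\,
  r\bigl(\rho_i \mid \mathcal{R}(\varepsilon_i)\bigr)\,
  t\bigl(\tau_i \mid \mathcal{T}(\varepsilon_i)\bigr)
  && \text{(independent N, R, T)}
\end{align}
```

Here \mathrm{Str}(\theta(\varepsilon)) denotes the leaf string of the transformed tree, so the sum runs over every combination of choices that yields the foreign sentence f.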
19. Training (EM)
1. Initialize all probability tables (uniform)
2. Reset all counters
3. For each pair in the training corpus
   - Try all possible mappings of N, R, and T
   - Update the counts as seen in the mappings
4. Normalize the probability tables with the new counts
5. Repeat steps 2-4 several times
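The shape of this loop can be sketched on a much simpler model. The code below is not the syntax-based trainer: as a stated simplification it estimates only a word translation table t(f | e), in the style of IBM Model 1, and enumerates every word-to-word mapping explicitly. The function name `train_em` and the three-sentence corpus are made up for illustration.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (source words, target words)


def train_em(corpus: List[Pair], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    src_vocab = {e for es, _ in corpus for e in es}
    tgt_vocab = {f for _, fs in corpus for f in fs}

    # Step 1: initialize the probability table uniformly.
    t = {(f, e): 1.0 / len(tgt_vocab) for e in src_vocab for f in tgt_vocab}

    for _ in range(iterations):
        # Step 2: reset all counters.
        counts: Dict[Tuple[str, str], float] = defaultdict(float)
        totals: Dict[str, float] = defaultdict(float)

        # Step 3: for each pair, try all possible mappings and collect counts.
        for es, fs in corpus:
            mappings = list(product(range(len(es)), repeat=len(fs)))
            weights = []
            for m in mappings:
                w = 1.0
                for j, i in enumerate(m):   # target word j is generated by source word i
                    w *= t[(fs[j], es[i])]
                weights.append(w)
            z = sum(weights)
            for m, w in zip(mappings, weights):
                posterior = w / z           # fractional count for this mapping
                for j, i in enumerate(m):
                    counts[(fs[j], es[i])] += posterior
                    totals[es[i]] += posterior

        # Step 4: normalize the table with the new counts.
        t = {(f, e): counts[(f, e)] / totals[e] for (f, e) in t}

    return t


corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
table = train_em(corpus)
print(round(table[("das", "the")], 3), round(table[("haus", "the")], 3))
```

Explicit enumeration grows exponentially with sentence length, so it is only workable on toy data like this.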
20. Decoding
- Modify original CFG with new reorderings and their probabilities
- Add in VP -> VP X and X -> word rules from N
- Add lexical rules englishWord -> foreignWord
- Use the noisy-channel approach starting with a translated sentence
- Proceed through the parse tree using a bottom-up beam search, keeping an N-best list of good partial translations for each subtree
YamadaKnight A Decoder for Syntax-based
Statistical MT 2002
21. Decoding cont.
22. Performance (Alignment)
23. Performance (Alignment) cont.
- Counting number of individual alignments
- Perfect means all alignments in a pair are correct
24. Performance cont.
- Chinese-English BLEU scores
25. Do we need the entire model to be based on syntax?
- Good performance increase
- Large computational cost
- Many permutations to CFG rules (120K non-lexical)
- How about trying something else?
- Add syntax-based features that look for more
specific things
26. Using Syntax in MT
- Multiple Features
- Formalization
- Baseline
- Training
- Syntax-based Features
- Shallow
- Deep
27. Multiple Features (log-linear)
Calculate the translation probability from a variety of feature functions, each parameterized by an associated weight
Find the translated sentence that maximizes the weighted feature score for your foreign sentence
JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
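In symbols, and as a sketch only, the two statements on this slide take the usual log-linear form. The notation is chosen here for illustration: M feature functions h_m(e, f) over a candidate translation e and the foreign sentence f, each with a weight \lambda_m.

```latex
\begin{align}
P(e \mid f) &= \frac{\exp\Bigl(\sum_{m=1}^{M} \lambda_m h_m(e, f)\Bigr)}
                   {\sum_{e'} \exp\Bigl(\sum_{m=1}^{M} \lambda_m h_m(e', f)\Bigr)} \\
\hat{e} &= \operatorname*{arg\,max}_{e} \; \sum_{m=1}^{M} \lambda_m h_m(e, f)
\end{align}
```

The denominator does not depend on e, so the search only needs the weighted sum in the second line.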
28. Baseline System
29. Baseline System
30. Baseline Features
- Alignment template feature
- Uses simple counts
31. Baseline Features
- Word selection feature
- Uses lexicon probability estimated by relative frequency
- Additional feature capturing word reordering within phrasal alignments
32. Baseline Features
- Phrase alignment feature
- Measure of deviation from monotone alignment
33. Baseline Features
- Language model feature
- Standard backing-off trigram probability
- Word/Phrase penalty feature
- Feature counting number of words in translated sentence
- Feature counting number of phrases in translated sentence
- Alignment lexicon feature
- Feature counting the number of times something from a given alignment lexicon is used
34. A possible training method
35. Use reranking of N-best lists
- Feature functions do not need to be integrated in dynamic programming search
- A feature function can arbitrarily condition itself on any part of the English/Chinese sentence/parse tree/chunks
- Provides a simple software architecture
- Using a fixed set of translations allows feature functions to be a vector of numbers
- You are limited to improvements you see within the N-best lists
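Because every candidate in a fixed N-best list is reduced to a vector of numbers, reranking amounts to a weighted sum and a maximum. The sketch below assumes three hypothetical features (language model log-probability, translation model log-probability, word count) with made-up values and weights. The names `score` and `rerank` are illustrative and are not the workshop's code.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, Sequence[float]]  # (translation, feature vector)


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of feature values, i.e. the log-linear score."""
    return sum(w * h for w, h in zip(weights, features))


def rerank(nbest: List[Candidate], weights: Sequence[float]) -> Candidate:
    """Return the candidate with the highest score under the given weights."""
    return max(nbest, key=lambda cand: score(cand[1], weights))


# Hypothetical 2-best list; feature values are invented for the example.
nbest = [
    ("he is fully able to activate team", [-14.0, -6.0, 7.0]),
    ("he is fully able to activate the team", [-11.0, -6.5, 8.0]),
]
weights = [1.0, 0.5, -0.2]

best_translation, best_features = rerank(nbest, weights)
print(best_translation)
```

A new feature, syntactic or otherwise, only adds a column to each vector and a weight to tune. The decoder itself is untouched, but a translation missing from the list can never be selected.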
36. Syntax-based Features
- Shallow
- POS and Chunk Tag counts
- Projected POS language model
- Deep
- Tree-to-string
- Tree-to-tree
- Verb arguments
37. Shallow Syntax-Based Features
- POS and chunk tag count
- Low-level syntactic problems with baseline system: too many articles, commas and singular nouns; too few pronouns, past tense verbs, and plural nouns.
- Reranker can learn balanced distributions of tags from various features
- Examples
- Number of NPs in English
- Difference in number of NPs between English and Chinese
- Number of Chinese N tags translated to only non-N tags in English.
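The three example features can be sketched as simple counts over tag sequences. The sketch assumes tags are given as plain lists, that noun tags in both tag sets start with "N" (as in the Penn English and Chinese Treebank conventions), and that the word alignment is a set of (Chinese index, English index) pairs. The function names and the data are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def tag_count_features(en_tags: List[str], zh_tags: List[str], tag: str) -> Dict[str, int]:
    """Count of a tag in English, and its difference from the Chinese count."""
    en, zh = Counter(en_tags), Counter(zh_tags)
    return {f"en_count_{tag}": en[tag], f"diff_{tag}": en[tag] - zh[tag]}


def n_to_non_n(zh_pos: List[str], en_pos: List[str],
               alignment: Set[Tuple[int, int]]) -> int:
    """Number of Chinese N* words whose aligned English words are all non-N*."""
    count = 0
    for j, tag in enumerate(zh_pos):
        if not tag.startswith("N"):
            continue
        linked = [en_pos[i] for (jj, i) in alignment if jj == j]
        if linked and all(not t.startswith("N") for t in linked):
            count += 1
    return count


# Hypothetical chunk and POS sequences for one sentence pair.
en_chunks = ["NP", "VP", "NP", "PP", "NP"]
zh_chunks = ["NP", "VP", "NP"]
np_feats = tag_count_features(en_chunks, zh_chunks, "NP")

zh_pos = ["NR", "VV", "NN"]
en_pos = ["NNP", "VBD", "JJ", "NN"]
alignment = {(0, 0), (1, 1), (2, 2)}   # Chinese NN is linked only to English JJ
mismatch = n_to_non_n(zh_pos, en_pos, alignment)
print(np_feats, mismatch)
```

Each value becomes one entry in a candidate's feature vector, and the reranker learns how much weight it deserves.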
38. Shallow Syntax-Based Features
- Projected POS language model
- Use word-level alignments to project Chinese POS tags onto the English words
- Possibly keeping relative position within Chinese phrase
- Possibly keeping NULLs in POS sequence
- Possibly using lexicalized NULLs from English word
- Use the POS tags to train a language model based on POS N-grams
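The projection step can be sketched as follows. The sketch assumes the alignment is a list of (English index, Chinese index) links and, as a simplification, that an English word linked to several Chinese words takes the tag of the leftmost one. The function names, the `NULL` tag spelling, and the example sentence are illustrative. The n-gram helper only collects the counts a POS language model would be estimated from.

```python
from collections import Counter
from typing import List, Tuple


def project_pos(en_words: List[str], zh_pos: List[str],
                alignment: List[Tuple[int, int]],
                lexicalize_null: bool = False) -> List[str]:
    """Give each English word the POS tag of the Chinese word it is aligned to."""
    projected = []
    for i, word in enumerate(en_words):
        linked = sorted(j for (ii, j) in alignment if ii == i)
        if linked:
            projected.append(zh_pos[linked[0]])      # assumption: leftmost link wins
        elif lexicalize_null:
            projected.append(f"NULL_{word}")         # lexicalized NULL
        else:
            projected.append("NULL")                 # plain NULL kept in the sequence
    return projected


def ngrams(tags: List[str], n: int) -> List[Tuple[str, ...]]:
    """All n-grams of a tag sequence, padded with sentence boundary symbols."""
    padded = ["<s>"] * (n - 1) + list(tags) + ["</s>"]
    return [tuple(padded[k:k + n]) for k in range(len(padded) - n + 1)]


en_words = ["the", "team", "won"]
zh_pos = ["NN", "VV"]
alignment = [(1, 0), (2, 1)]            # "the" is unaligned

plain = project_pos(en_words, zh_pos, alignment)
lexical = project_pos(en_words, zh_pos, alignment, lexicalize_null=True)
bigram_counts = Counter(ngrams(plain, 2))
print(plain, lexical)
```

Counts like `bigram_counts`, gathered over a whole training corpus, would then feed a standard smoothed n-gram model whose score on a candidate's projected tag sequence serves as the feature.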
39. Deep Syntax-Based Features
- Tree to string
- Uses the syntax-based model we saw previously
- Reduces computational cost by limiting size of reorderings
- Add in a feature for the probability as defined by the model and the probability of the Viterbi alignment defined by the model
40. Deep Syntax-Based Features
- Tree to tree
- Uses tree transformation functions similar to those in the tree-to-string model
- The probability of transforming a source tree into a target tree is modeled as a sequence of steps starting from the root of the target tree down.
41. Tree to Tree cont.
- At each level of the tree
- At most one of the current node's children is grouped with the current node into a single elementary tree, with its probability conditioned on the current node and its children.
- An alignment of the children of the current elementary tree is chosen, with its probability conditioned on the current node and the children of the child in the elementary tree. This is similar to the reorder operation in the tree-to-string model, but allows for node addition and removal.
- Leaf-level parameters are ignored when calculating the probability of tree-to-tree.
42. Verb Arguments
- Idea: a feature that counts the difference in the number of arguments to the main verb between the Chinese and English sentences
- Perform a breadth-first traversal of the dependency trees
- Mark the first verb encountered as the main verb
- The number of arguments is equal to the number of its children
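A minimal sketch of this feature, assuming each dependency tree is given as a dict from a head's index to the list of its dependents, plus a POS tag per word. Treating any tag that starts with "V" as a verb, and taking the difference as Chinese minus English, are assumptions made for the example. The function name and the data are hypothetical.

```python
from collections import deque
from typing import Dict, List


def main_verb_arg_count(deps: Dict[int, List[int]], pos: List[str], root: int) -> int:
    """Breadth-first search from the root; the first verb found is the main verb."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if pos[node].startswith("V"):          # assumption: verb tags start with "V"
            return len(deps.get(node, []))     # arguments = number of children
        queue.extend(deps.get(node, []))
    return 0                                   # no verb in the sentence


# Hypothetical English tree: the root (index 1, VBD) has two dependents.
en_pos = ["NNP", "VBD", "PRP$", "NN"]
en_deps = {1: [0, 3], 3: [2]}

# Hypothetical Chinese tree: the root (index 1, VV) has three dependents.
zh_pos = ["NR", "VV", "NN", "NN"]
zh_deps = {1: [0, 2, 3]}

en_args = main_verb_arg_count(en_deps, en_pos, root=1)
zh_args = main_verb_arg_count(zh_deps, zh_pos, root=1)
verb_arg_difference = zh_args - en_args
print(en_args, zh_args, verb_arg_difference)
```

Breadth-first order matters when the root is not itself a verb: the shallowest verb is picked, even if a deeper verb comes earlier in the sentence.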
43. Performance
- Some things helped, some things didn't
- Is syntax useful? Necessary?
44. References
- K. Yamada and K. Knight. 2001. A syntax-based statistical translation model. In ACL-01.
- K. Yamada. 2002. A Syntax-Based Statistical Translation Model. Ph.D. thesis, University of Southern California.
- K. Yamada and K. Knight. 2002. A decoder for syntax-based MT. In Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA.
- Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. 2004. A smorgasbord of features for statistical machine translation. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting, pages 161-168. MIT Press.
- Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. Final Report of the Johns Hopkins 2003 Summer Workshop on Syntax for Statistical Machine Translation.
- Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting. MIT Press.