Syntax for MT

Author: Angust Faradin. Created 1/31/2006 (PowerPoint presentation).


1
Syntax for MT

EECS 767
Feb. 1, 2006

2
Outline

Motivation
Syntax-based translation model
Formalization
Training
Using syntax in MT
Using multiple features
Syntax-based features

3
The IBM Models

Word reordering
Single words, not groups
Conditioned on position of words
Null-word insertion
Uniform across position

4
The Alignment Template Model

Word Reordering
Phrases can be reordered in any way, but tend to
stay in the same order as in the source.
Reordering within phrases defined by templates
Word Translations
Words must match up; no null-word insertion

5
Implied Assumptions

Word Order
Similar to source sentence
Translation
Near 1-1 correspondence

6
What goes wrong?

We see many errors in machine translation when we
only look at the word level
Missing content words
MT: Condemns US interference in its internal affairs.
Human: Ukraine condemns US interference in its internal affairs.
Verb phrase
MT: Indonesia that oppose the presence of foreign troops.
Human: Indonesia reiterated its opposition to foreign military presence.

WS 2003 Syntax for Statistical Machine
Translation Final Presentation
7
What goes wrong cont.

Wrong dependencies
MT: ..., particularly those who cheat the audience the players.
Human: ..., particularly those players who cheat the audience.
Missing articles
MT: ..., he is fully able to activate team.
Human: ..., he is fully able to activate the team.

WS 2003 Syntax for Statistical Machine
Translation Final Presentation
8
What goes wrong cont.

Word salad
the world arena on top of the u . s . sampla
competitors , and since mid july has not
appeared in sports field , the wounds heal go
back to the situation is very good , less than a
half hours in the same score to eliminate 62 in
light of the south african athletes to the second
round .

WS 2003 Syntax for Statistical Machine
Translation Final Presentation
9
How can we improve?

Relying on the language model to produce more
accurate sentences is not enough
Many of the problems can be considered
syntactic
Perhaps MT systems don't know enough about what
is important to people
So, incorporate syntax into MT
Build a model around syntax
Include syntax-based features in a model

WS 2003 Syntax for Statistical Machine
Translation Final Presentation
10
A New Translation Story

You have a sentence and its parse tree
The children at each node in the tree are
rearranged
New nodes may be inserted before or after a child
node
These new nodes are assigned a translation
Each of the leaf lexical nodes is then translated
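The three operations can be made concrete with a toy sketch. It applies one fixed choice of reorder, insert, and translate operations to a small English tree. The tree, the tables, and the function name `channel` are our own illustration, not taken from the thesis. As a simplification, insertion is keyed on the node label alone, whereas the model also conditions placement on the parent label.

```python
def channel(node, reorder, insert, translate):
    """Apply one fixed reorder/insert/translate choice per node; return foreign words.

    A node is (label, word) for a leaf or (label, [children]) for an internal node.
    """
    label, rest = node
    if isinstance(rest, str):  # leaf: translate the English word (T); unknown words pass through
        words = [translate.get(rest, rest)]
    else:  # internal node: permute children according to their label sequence (R)
        labels = tuple(child[0] for child in rest)
        order = reorder.get(labels, range(len(rest)))  # default: keep original order
        words = [w for i in order for w in channel(rest[i], reorder, insert, translate)]
    side, extra = insert.get(label, (None, None))  # insertion (N): left, right, or none
    if side == "left":
        words = [extra] + words
    elif side == "right":
        words = words + [extra]
    return words


# Hypothetical English tree and channel choices producing an SOV-style output.
tree = ("S", [("PRP", "he"), ("VP", [("VB", "adores"), ("NN", "music")])])
out = channel(
    tree,
    reorder={("VB", "NN"): (1, 0)},  # object before verb
    insert={"PRP": ("right", "ha"), "NN": ("right", "ga"), "VB": ("right", "desu")},
    translate={"he": "kare", "adores": "daisuki", "music": "ongaku"},
)
print(" ".join(out))  # kare ha ongaku ga daisuki desu
```

Applying the operations recursively at each node yields the same string as doing all reorders first, then all insertions, then all translations, because each operation only touches its own node.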

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
11
A Syntax-based model

Assume word order is based on a reordering of
source syntax tree.
Assume null-generated words are inserted only at syntactic
boundaries.
(For now) Assume a word translates into a single
word.

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
12
Reorder
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
13
Insert
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
14
Translate
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
15
Parameters

Reorder (R): child node reordering
Can take any possible child node reordering
Defines word order in translation sentence
Conditioned on original child node order
Only applies to non-leaf nodes

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
16
Parameters cont.

Insertion (N): placement and translation
Left, right, or none
Defines the word to be inserted
Placement conditioned on the current and parent node labels
Word choice is unconditioned

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
17
Parameters cont.

Translation (T): 1-to-1
Conditioned only on the source word
Can be NULL (the source word is dropped)
Translation (T): N-to-N
Consider word fertility (for 1-to-N mapping)
Consider phrase translation at each node
Limit the size of possible phrases
Mix phrasal with word-to-word translation

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
18
Formalization
Set of nodes in parse tree
Total probability
Assume node independence
Assume random variables are independent of one
another and depend only on certain features
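The equations on this slide did not survive extraction. The following is our reading of the formulation in Yamada & Knight (2001), given for reference. Let ε = ε_1 … ε_n be the nodes of the English parse tree and f the foreign sentence. Let θ_i = ⟨ν_i, ρ_i, τ_i⟩ be the insertion, reorder, and translation choices at node i. The total probability sums over every choice sequence θ whose output string is f, and node independence factors each term. N(ε_i) is the node's label together with its parent's label, R(ε_i) is the node's child label sequence, and T(ε_i) is the word at a leaf.

```latex
P(f \mid \varepsilon) = \sum_{\theta \,:\, \mathrm{Str}(\theta(\varepsilon)) = f} P(\theta \mid \varepsilon),
\qquad
P(\theta \mid \varepsilon) = \prod_{i=1}^{n} P(\theta_i \mid \varepsilon_i)
  = \prod_{i=1}^{n} n(\nu_i \mid N(\varepsilon_i))\; r(\rho_i \mid R(\varepsilon_i))\; t(\tau_i \mid T(\varepsilon_i))
```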
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
19
Training (EM)

1. Initialize all probability tables (uniform)
2. Reset all counters
3. For each pair in the training corpus:
   try all possible mappings of N, R, and T,
   and update the counts as seen in the mappings
4. Normalize the probability tables with the new counts
5. Repeat steps 2-4 several times
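The training loop can be sketched on a deliberately reduced model. Only the reorder (R) and 1-to-1 translation (T) tables are learned. Insertion (N) is omitted. Every mapping is enumerated by brute force, which is exponential and only viable for toy trees; the original work uses a more efficient computation that this sketch does not reproduce. The function names, the tuple tree encoding, and the tiny corpus are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations, product
from math import factorial


def derivations(node, f_words, r, t):
    """Enumerate every (output_words, events, prob) mapping of a subtree."""
    _, rest = node
    if isinstance(rest, str):  # leaf: (POS label, English word)
        for f in f_words:
            yield (f,), [("t", rest, f)], t((rest, f))
        return
    labels = tuple(child[0] for child in rest)
    child_derivs = [list(derivations(c, f_words, r, t)) for c in rest]
    for perm in permutations(range(len(rest))):
        p_r = r((labels, perm))
        for combo in product(*child_derivs):
            words, events, prob = (), [("r", labels, perm)], p_r
            for i in perm:  # emit children in the reordered sequence
                w, ev, p = combo[i]
                words, events, prob = words + w, events + ev, prob * p
            yield words, events, prob


def em_train(corpus, iterations=5):
    f_vocab = {w for _, f in corpus for w in f}
    r_tab = t_tab = None  # step 1: None stands for "uniform table"

    def r(key):  # r(rho | R), key = (child labels, permutation)
        return 1 / factorial(len(key[1])) if r_tab is None else r_tab.get(key, 0.0)

    def t(key):  # t(f | e), key = (English word, foreign word)
        return 1 / len(f_vocab) if t_tab is None else t_tab.get(key, 0.0)

    for _ in range(iterations):  # step 5: repeat steps 2-4
        counts = defaultdict(float)  # step 2: reset counters
        for tree, f in corpus:  # step 3: try every mapping that yields f
            good = [(ev, p) for w, ev, p in derivations(tree, sorted(set(f)), r, t)
                    if w == tuple(f)]
            total = sum(p for _, p in good)
            if total == 0:
                continue
            for ev, p in good:
                for event in ev:
                    counts[event] += p / total  # posterior-weighted count
        totals = defaultdict(float)  # step 4: normalize per conditioning context
        for (kind, cond, _), c in counts.items():
            totals[kind, cond] += c
        r_tab = {(cond, out): c / totals["r", cond]
                 for (kind, cond, out), c in counts.items() if kind == "r"}
        t_tab = {(cond, out): c / totals["t", cond]
                 for (kind, cond, out), c in counts.items() if kind == "t"}
    return r_tab, t_tab


corpus = [
    (("VP", [("V", "eats"), ("N", "fish")]), ["sakana", "taberu"]),
    (("VP", [("V", "eats"), ("N", "rice")]), ["gohan", "taberu"]),
    (("VP", [("V", "buys"), ("N", "fish")]), ["sakana", "kau"]),
]
r_tab, t_tab = em_train(corpus)
print(r_tab[(("V", "N"), (1, 0))])  # verb-final order gains probability mass
```

On the first pass both orders are equally likely. Co-occurrence of "eats" with "taberu" and "fish" with "sakana" then breaks the tie in favor of the swapped order.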

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
20
Decoding

Modify the original CFG with the new reorderings and their
probabilities
Add VP -> VP X and X -> word rules from N
Add lexical rules englishWord -> foreignWord
Use the noisy-channel approach, starting from the foreign
sentence to be translated
Proceed through the parse tree using a bottom-up
beam search keeping an N-best list of good
partial translations for each subtree
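The per-subtree N-best idea can be illustrated with a stripped-down sketch. It assumes trained `r` and `t` tables keyed as (child labels, permutation) and (English word, foreign word). It omits the language model and the insertion rules that a real noisy-channel decoder would score. It also runs in the channel direction, from a given tree to a string, rather than parsing the foreign sentence. It only shows how each subtree keeps a beam of its best partial outputs. The names `decode` and `beam` are hypothetical.

```python
import heapq
import math
from itertools import permutations, product


def decode(node, r, t, beam=3):
    """Return up to `beam` best (log_prob, words) outputs for a subtree, bottom-up."""
    _, rest = node
    if isinstance(rest, str):  # leaf: candidate word translations
        cands = [(math.log(p), (f,)) for (e, f), p in t.items() if e == rest and p > 0]
        return heapq.nlargest(beam, cands)
    labels = tuple(child[0] for child in rest)
    kids = [decode(c, r, t, beam) for c in rest]  # N-best list for each child subtree
    hyps = []
    for perm in permutations(range(len(rest))):
        p = r.get((labels, perm), 0.0)
        if p == 0.0:
            continue
        for combo in product(*kids):  # combine one hypothesis per child
            score = math.log(p) + sum(combo[i][0] for i in perm)
            words = tuple(w for i in perm for w in combo[i][1])
            hyps.append((score, words))
    return heapq.nlargest(beam, hyps)  # prune to the beam


t = {("eats", "taberu"): 0.9, ("eats", "kau"): 0.1, ("fish", "sakana"): 1.0}
r = {(("V", "N"), (1, 0)): 0.8, (("V", "N"), (0, 1)): 0.2}
best = decode(("VP", [("V", "eats"), ("N", "fish")]), r, t)
print(best[0][1])  # ('sakana', 'taberu')
```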

Yamada & Knight A Decoder for Syntax-based
Statistical MT 2002
21
Decoding cont.
Yamada & Knight A Decoder for Syntax-based
Statistical MT 2002
22
Performance (Alignment)
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
23
Performance (Alignment) cont.

Counting number of individual alignments
Perfect means all alignments in a pair are correct

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
24
Performance cont.

Chinese-English BLEU scores

Yamada & Knight A Decoder for Syntax-based
Statistical MT 2002
25
Do we need the entire model to be based on syntax?

Good performance increase
Large computational cost
Many permuted CFG rules (120K non-lexical rules)
How about trying something else?
Add syntax-based features that look for more
specific things

26
Using Syntax in MT

Multiple Features
Formalization
Baseline
Training
Syntax-based Features
Shallow
Deep

27
Multiple Features (log-linear)
Calculate probability using a variety of feature functions,
each weighted by an associated parameter
Find the translation that maximizes the weighted sum of
feature functions for the given foreign sentence
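In the usual log-linear notation, let h_m(e, f) be the feature functions, λ_m their weights, e the English candidate, and f the foreign sentence. The model and the decision rule are written below. The normalizer does not depend on e, so the search only needs the weighted sum.

```latex
P(e \mid f) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m h_m(e, f)\right)}
                   {\sum_{e'} \exp\left(\sum_{m=1}^{M} \lambda_m h_m(e', f)\right)},
\qquad
\hat{e} = \operatorname*{arg\,max}_{e} \sum_{m=1}^{M} \lambda_m h_m(e, f)
```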
JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
28
Baseline System
JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
29
Baseline System
JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
30
Baseline Features

Alignment template feature
Uses simple counts
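"Simple counts" here means relative-frequency estimation. The sketch below assumes the extracted pairs are available as (source phrase, target phrase or template) tuples. The function name and the romanized example data are hypothetical.

```python
from collections import Counter


def relative_frequency(pairs):
    """Estimate p(target | source) = N(source, target) / N(source) from extracted pairs."""
    joint = Counter(pairs)                     # N(source, target)
    source = Counter(s for s, _ in pairs)      # N(source)
    return {(s, t): c / source[s] for (s, t), c in joint.items()}


pairs = [("zhongguo", "China"), ("zhongguo", "China"),
         ("zhongguo", "Chinese"), ("meiguo", "US")]
probs = relative_frequency(pairs)
print(probs[("zhongguo", "China")])  # 2 of the 3 "zhongguo" extractions
```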

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
31
Baseline Features

Word selection feature
Uses lexicon probability estimated by relative
frequency

Additional feature capturing word reordering
within phrasal alignments
JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
32
Baseline Features

Phrase alignment feature
Measure of deviation from monotone alignment
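One common way to measure deviation from monotone order sums the jumps between the source spans of consecutive phrases, so a monotone translation scores 0. This formulation is assumed here rather than taken from the report. The function name is hypothetical.

```python
def distortion(src_spans):
    """Sum of jumps between consecutive phrases.

    src_spans: (start, end) source positions of each phrase, listed in target order.
    """
    total, prev_end = 0, -1
    for start, end in src_spans:
        total += abs(start - prev_end - 1)  # 0 when this phrase directly follows the last
        prev_end = end
    return total


print(distortion([(0, 1), (2, 3)]))  # monotone: 0
print(distortion([(2, 3), (0, 1)]))  # swapped phrases: 6
```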

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
33
Baseline Features

Language model feature
Standard backing-off trigram probability
Word/Phrase penalty feature
Feature counting number of words in translated
sentence
Feature counting number of phrases in translated
sentence
Alignment lexicon feature
Feature counting the number of times an entry
from a given alignment lexicon is used
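The counting features are straightforward to compute once a hypothesis is available. This sketch assumes a hypothesis is a list of (source phrase, target phrase) strings, and that "using" the alignment lexicon means a phrase pair appears in it. That reading is an assumption, and the names are hypothetical.

```python
def count_features(phrase_pairs, lexicon):
    """Word penalty, phrase penalty, and alignment-lexicon count for one hypothesis."""
    words = sum(len(tgt.split()) for _, tgt in phrase_pairs)      # words in translation
    phrases = len(phrase_pairs)                                   # phrases in translation
    lexicon_hits = sum(1 for pair in phrase_pairs if pair in lexicon)
    return {"words": words, "phrases": phrases, "lexicon_hits": lexicon_hits}


hyp = [("wo", "I"), ("xihuan yinyue", "like music")]
print(count_features(hyp, lexicon={("wo", "I")}))
```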

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
34
A possible training method

Line optimization method
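Line optimization tunes the feature weights directly against translation quality on N-best lists, searching along one direction in weight space at a time. The sketch below is a simplified stand-in, not the exact method. It tries a fixed grid of values for one weight at a time, and it sums a hypothetical per-sentence quality score instead of computing corpus BLEU. The function names and data layout are assumptions.

```python
def rerank(nbest, w):
    """For each sentence, pick the (features, quality) hypothesis with the best weighted score."""
    return [max(hyps, key=lambda h: sum(wi * fi for wi, fi in zip(w, h[0]))) for hyps in nbest]


def corpus_quality(nbest, w):
    # Simplification: sum per-sentence quality (real BLEU is not additive).
    return sum(q for _, q in rerank(nbest, w))


def line_optimize(nbest, weights, grid, rounds=3):
    w = list(weights)
    for _ in range(rounds):
        for d in range(len(w)):  # optimize one weight while holding the others fixed
            best_v, best_q = w[d], corpus_quality(nbest, w)
            for v in grid:
                q = corpus_quality(nbest, w[:d] + [v] + w[d + 1:])
                if q > best_q:
                    best_v, best_q = v, q
            w[d] = best_v
    return w


nbest = [
    [((1.0, 0.0), 0.2), ((0.0, 1.0), 0.9)],  # sentence 1: two hypotheses
    [((1.0, 0.0), 0.1), ((0.0, 1.0), 0.8)],  # sentence 2: two hypotheses
]
w = line_optimize(nbest, [1.0, 0.0], grid=[-1.0, 0.0, 0.5, 1.0])
print(w, corpus_quality(nbest, w))
```

Because the hypotheses are fixed, each evaluation only rescores stored feature vectors. This is what makes N-best reranking cheap to tune.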

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
35
Use reranking of N-best lists

Feature functions do not need to be integrated in
dynamic programming search
A feature function can condition on any part
of the English/Chinese sentence, parse tree, or
chunks
Provides a simple software architecture
Using a fixed set of translations allows each feature
function to be represented as a vector of numbers
Improvements are limited to what appears within
the N-best lists

WS 2003 Syntax for Statistical Machine
Translation Final Presentation
36
Syntax-based Features

Shallow
POS and Chunk Tag counts
Projected POS language model
Deep
Tree-to-string
Tree-to-tree
Verb arguments

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
37
Shallow Syntax-Based Features

POS and chunk tag count
Low-level syntactic problems with the baseline
system: too many articles, commas, and singular
nouns; too few pronouns, past-tense verbs, and
plural nouns.
Reranker can learn balanced distributions of tags
from various features
Examples
Number of NPs in English
Difference in number of NPs between English and
Chinese
Number of Chinese N tags translated to only non-N
tags in English.
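The three example features can be computed from chunk labels, POS tags, and word alignments. This sketch assumes alignments are (Chinese index, English index) pairs and that noun tags start with "N" in both tag sets. The NP difference is signed (English minus Chinese). These choices and the names are hypothetical.

```python
def tag_count_features(en_chunks, zh_chunks, en_tags, zh_tags, alignment):
    """alignment: iterable of (zh_index, en_index) word links."""
    np_en = en_chunks.count("NP")                  # number of NPs in English
    np_diff = np_en - zh_chunks.count("NP")        # English minus Chinese NP count
    n_to_non_n = 0
    for j, tag in enumerate(zh_tags):
        linked = [en_tags[i] for zj, i in alignment if zj == j]
        # Chinese noun aligned to at least one English word, none of them nouns
        if tag.startswith("N") and linked and not any(t.startswith("N") for t in linked):
            n_to_non_n += 1
    return {"np_en": np_en, "np_diff": np_diff, "zh_n_to_non_n": n_to_non_n}


feats = tag_count_features(["NP", "VP", "NP"], ["NP", "VP"],
                           ["PRP", "VBZ", "JJ"], ["NN", "VV"], {(0, 2), (1, 1)})
print(feats)
```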

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
38
Shallow Syntax-Based Features

Projected POS language model
Use word-level alignments to project Chinese POS
tags onto the English words
Possibly keeping relative position within Chinese
phrase
Possibly keeping NULLs in POS sequence
Possibly lexicalizing NULLs with the English
word
Use the POS tags to train a language model based
on POS N-grams
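The basic variant (no relative positions, plain NULLs) can be sketched in two steps: project the tags, then collect POS N-gram counts for the language model. The sketch assumes alignments are (Chinese index, English index) pairs. When an English word has several links, it takes the tag of the leftmost aligned Chinese word, which is our own choice. The names are hypothetical.

```python
from collections import Counter


def project_tags(n_en, zh_tags, alignment):
    """Give each English word the tag of its leftmost aligned Chinese word, else NULL."""
    tags = ["NULL"] * n_en
    for zj, ei in sorted(alignment):  # sorted by Chinese index, so leftmost wins
        if tags[ei] == "NULL":
            tags[ei] = zh_tags[zj]
    return tags


def pos_ngram_counts(tag_seqs, n=3):
    """Count POS N-grams (with sentence padding) for training a POS language model."""
    counts = Counter()
    for tags in tag_seqs:
        padded = ["<s>"] * (n - 1) + tags + ["</s>"]
        for k in range(len(padded) - n + 1):
            counts[tuple(padded[k:k + n])] += 1
    return counts


projected = project_tags(3, ["NR", "VV"], {(0, 0), (1, 2)})
counts = pos_ngram_counts([projected])
print(projected)  # ['NR', 'NULL', 'VV']
```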

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
39
Deep Syntax-Based Features

Tree to string
Uses the syntax-based model we saw previously
Reduces computational cost by limiting the size of
reorderings
Adds features for the total probability defined by
the model and for the probability of the Viterbi
alignment under the model
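The two features differ only in summing versus maximizing over derivations. This assumes the English hypothesis's parse tree ε_e is mapped to the Chinese string f, and θ ranges over derivations whose output string is f. The feature names are our own labels.

```latex
h_{\mathrm{TTS}}(e, f) = \log \sum_{\theta \,:\, \mathrm{Str}(\theta(\varepsilon_e)) = f} P(\theta \mid \varepsilon_e),
\qquad
h_{\mathrm{Vit}}(e, f) = \log \max_{\theta \,:\, \mathrm{Str}(\theta(\varepsilon_e)) = f} P(\theta \mid \varepsilon_e)
```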

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
40
Deep Syntax-Based Features

Tree to Tree
Uses tree transformation functions similar to
those in the tree-to-string model
The probability of transforming a source tree
into a target tree is modeled as a sequence of
steps starting from the root of the target tree
down.

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
41
Tree to Tree cont.

At each level of the tree
At most one of the current node's children is
grouped with the current node into a single
elementary tree, with probability conditioned
on the current node and its children.
An alignment of the children of the current
elementary tree is chosen, with probability
conditioned on the current node and the children
in the elementary tree. This is similar
to the reorder operation in the tree-to-string
model, but allows for node addition and removal.
Leaf-level parameters are ignored when
calculating probability of tree-to-tree.

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
42
Verb Arguments

Idea: a feature that counts the difference in the
number of arguments to the main verb between the
Chinese and English sentences
Perform a breadth-first search traversal of the
dependency trees
Mark the first verb encountered as the main verb
The number of arguments is equal to the number of
its children
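The procedure maps directly onto a breadth-first search. This sketch assumes each dependency tree is given as a root index, a dict from head to dependents, and a per-word tag list, and that verb tags start with "V" in both tag sets. Taking the absolute difference is our reading of "difference". The names are hypothetical.

```python
from collections import deque


def main_verb_arg_count(root, children, tags):
    """BFS over a dependency tree; the first verb reached is the main verb."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if tags[node].startswith("V"):
            return len(children.get(node, []))  # arguments = number of dependents
        queue.extend(children.get(node, []))
    return 0  # no verb found


def verb_arg_feature(en_tree, zh_tree):
    return abs(main_verb_arg_count(*en_tree) - main_verb_arg_count(*zh_tree))


# "he eats fish quickly" vs. a Chinese parse with two dependents on the verb
en = (1, {1: [0, 2, 3]}, ["PRP", "VBZ", "NN", "RB"])
zh = (1, {1: [0, 2]}, ["PN", "VV", "NN"])
print(verb_arg_feature(en, zh))  # 1
```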

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
43
Performance

Some things helped, some things didn't
Is syntax useful? Necessary?

44
References

K. Yamada and K. Knight. 2001. A syntax-based
statistical translation model. In ACL-01.
K. Yamada. 2002. A Syntax-Based Statistical
Translation Model. Ph.D. thesis, University of
Southern California.
K. Yamada and K. Knight. 2002. A decoder
for syntax-based MT. In Proc. of the 40th Annual
Meeting of the Association for Computational
Linguistics (ACL), Philadelphia, PA.
Franz Josef Och, Daniel Gildea, Sanjeev
Khudanpur, Anoop Sarkar, Kenji Yamada, Alex
Fraser, Shankar Kumar, Libin Shen, David Smith,
Katherine Eng, Viren Jain, Zhen Jin, and Dragomir
Radev. 2004. A smorgasbord of features for statistical
machine translation. In Proceedings of the Human
Language Technology Conference/North American
Chapter of the Association for Computational
Linguistics Annual Meeting, pages 161-168. MIT Press.
Franz Josef Och, Daniel Gildea, Sanjeev
Khudanpur, Anoop Sarkar, Kenji Yamada, Alex
Fraser, Shankar Kumar, Libin Shen, David Smith,
Katherine Eng, Viren Jain, Zhen Jin, and Dragomir
Radev. Final Report of the Johns Hopkins 2003
summer workshop on Syntax for Statistical Machine
Translation.
Philipp Koehn, Franz Josef Och, and Daniel Marcu.
2003. Statistical phrase-based translation. In
Proceedings of the Human Language Technology
Conference/North American Chapter of the
Association for Computational Linguistics Annual
Meeting. MIT Press.