Title: Improvements in Phrase-Based Statistical Machine Translation
1. Improvements in Phrase-Based Statistical Machine Translation
- Authors: Richard Zens and Hermann Ney
- Presented by Ovidiu Fortu
2. Outline
- Short introduction to machine translation
- The base model
- Additional information introduced in the model
- Evaluation and results
3. Introduction
- Problem setting
- Notation: a sentence in a given language is represented as a sequence of words, e_1^I = (e_1, e_2, ..., e_I)
- Usually the letter e is used for the target language, f for the source language
- Fundamental equation:
  ê_1^I = argmax_{e_1^I} Pr(e_1^I) · Pr(f_1^J | e_1^I)
4. Introduction, continued
- Why is it difficult?
- Hard to estimate the probabilities
- The argmax is computed over a very large space
- Word sense ambiguity
- "The spirit is strong, but the flesh is weak"
- translated into Russian and then back:
- "The vodka is good but the meat is rotten"
5. Fundamental equation
- There is no straightforward way to incorporate additional dependencies
- It gives optimal solutions only if the two models are accurately estimated
- The authors have found that the direct model, ê_1^I = argmax_{e_1^I} Pr(e_1^I | f_1^J), gives similar results in practice (hard to justify why)
6. Direct Max Entropy Translation Model
- We try to estimate P(e_1^I | f_1^J) directly (this allows a more efficient search)
- We use a set of M feature functions h_m(e_1^I, f_1^J), m = 1, ..., M
- For each feature function we have a model parameter λ_m
- P(e_1^I | f_1^J) = exp( Σ_m λ_m h_m(e_1^I, f_1^J) ) / Z(f_1^J), where Z(f_1^J) is the normalization factor
7. Direct Max Entropy Translation Model, continued
- Now the objective is to find ê_1^I = argmax_{e_1^I} Σ_m λ_m h_m(e_1^I, f_1^J)
- For M = 2, λ_1 = λ_2 = 1, h_1(e_1^I, f_1^J) = log p(e_1^I) and h_2(e_1^I, f_1^J) = log p(f_1^J | e_1^I), we obtain the classical source-channel model
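The decision rule above can be sketched in a few lines of Python. The two feature functions and the tiny dictionary below are invented toy stand-ins (a word penalty and a coverage score), not the paper's actual features; the point is only that Z(f_1^J) does not depend on e_1^I and therefore cancels in the argmax:

```python
# Toy sketch of the direct maximum-entropy decision rule:
#   e_hat = argmax_e sum_m lambda_m * h_m(e, f)
# Feature functions and the dictionary are hypothetical examples.

TOY_DICT = {"maison": "house", "bleue": "blue"}

def h_word_penalty(e, f):
    # negative target length: with a positive weight, shorter output is favored
    return -len(e)

def h_coverage(e, f):
    # counts source words whose dictionary translation appears in the output
    return sum(1.0 for s in f if TOY_DICT.get(s) in e)

def loglinear_score(e, f, features, lambdas):
    # sum_m lambda_m * h_m(e, f); Z(f) is constant in e, so it cancels
    return sum(lam * h(e, f) for lam, h in zip(lambdas, features))

def best_translation(candidates, f, features, lambdas):
    return max(candidates, key=lambda e: loglinear_score(e, f, features, lambdas))

f = ["maison", "bleue"]
candidates = [["house"], ["blue", "house"], ["blue", "house", "house"]]
best = best_translation(candidates, f,
                        [h_word_penalty, h_coverage], [0.5, 1.0])
# best == ["blue", "house"]
```

The weights λ trade the features off against each other: here the coverage bonus outweighs the word penalty until the output starts repeating words.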
8. Phrase-based Translation
- Translating word by word fails to take context into account
- Translating the whole sentence at once may prove too difficult
- The middle way: split the sentences (in both source and target) into phrases and translate the phrases
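As a minimal illustration of the middle way, one can enumerate the contiguous-phrase segmentations of a sentence (the function name is mine, not the paper's). A sentence of J words has 2^(J-1) such segmentations, which is why the search must later be organized carefully:

```python
def segmentations(words):
    # yield every split of a word sequence into contiguous phrases
    if not words:
        yield []
        return
    for i in range(1, len(words) + 1):
        first = tuple(words[:i])          # the leading phrase
        for rest in segmentations(words[i:]):
            yield [first] + rest

segs = list(segmentations(["the", "blue", "house"]))
# 2^(3-1) = 4 segmentations, from three single-word phrases
# up to one three-word phrase
```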
9. Translation model
- Assume we have a way to identify phrases
- Use the general model on phrases
- One-to-one phrase alignment
- Phrases are contiguous
- Introduce a hidden variable S as a way to give a segmentation of the pair (f_1^J, e_1^I) into K phrase pairs
10. Translation model, continued
- The sum over all segmentations is approximated by the maximum:
  Pr(f_1^J | e_1^I) = Σ_S Pr(S, f_1^J | e_1^I) ≈ max_S Pr(S, f_1^J | e_1^I)
11. Translation model, continued
- Now, which phrase maps to which?
- Assume monotone translation, i.e., phrases are translated in the order in which they appear in the source sentence
- Assume a zero-order model at the phrase level
- This leads to
  Pr(f_1^J | e_1^I) ≈ max_S Π_{k=1..K} p(f̃_k | ẽ_k)
12. Translation model, estimation
- The computation reduces to estimating the phrase translation probability p(f̃ | ẽ)
- Which is done based on relative frequencies: p(f̃ | ẽ) = N(f̃, ẽ) / N(ẽ)
- N(f̃, ẽ) counts the instances where f̃ is translated as ẽ
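A minimal sketch of the relative-frequency estimate p(f̃ | ẽ) = N(f̃, ẽ) / N(ẽ) over a list of extracted phrase pairs; the pair-extraction step itself is assumed to have already happened:

```python
from collections import Counter

def estimate_phrase_probs(phrase_pairs):
    # phrase_pairs: list of (source_phrase, target_phrase) occurrences
    pair_counts = Counter(phrase_pairs)                  # N(f~, e~)
    target_counts = Counter(e for _, e in phrase_pairs)  # N(e~)
    return {(f, e): n / target_counts[e]
            for (f, e), n in pair_counts.items()}

pairs = [("la maison", "the house"),
         ("la maison", "the house"),
         ("maison", "the house"),
         ("bleue", "blue")]
probs = estimate_phrase_probs(pairs)
# probs[("la maison", "the house")] == 2/3
```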
13. Translation model, estimation (II)
- Using a bigram language model and the Bayes decision rule, the search criterion becomes
  ê_1^I = argmax_{e_1^I} max_S Π_{k=1..K} p(f̃_k | ẽ_k) · Π_{i=1..I} p(e_i | e_{i-1})
- obtained by considering the segmentation probability Pr(S) constant
14. Equivalence to the direct approach
- The previous formula fits the general format of the direct model
- We have designed an h-function of the general model; additional dependencies will be incorporated into the model by adding other h-functions
15. Translation model, additional dependencies
- Word penalty: h_wp(e_1^I, f_1^J) = I, the target sentence length
- This feature penalizes long sentences
- Phrase penalty: h_pp = K, the number of phrases in the segmentation
- This one penalizes long phrases
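Both penalty features are plain counts; in the log-linear model their weights λ decide whether long output or coarse segmentations are rewarded or punished. A sketch with my own naming:

```python
def h_word_penalty(e_words):
    # number of target words I: a negative weight on this feature
    # penalizes long output sentences
    return len(e_words)

def h_phrase_penalty(segmentation):
    # number of phrases K in the segmentation: its weight trades off
    # many short phrases against few long ones
    return len(segmentation)

e = ["the", "blue", "house"]
seg = [("the",), ("blue", "house")]
# h_word_penalty(e) == 3, h_phrase_penalty(seg) == 2
```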
16. Probability estimation for long phrases
- Problem: long phrases are rare, therefore relative frequencies tend to overestimate their probabilities
- Solution: add an h-function, h_lex, computed from word-level (lexical) translation probabilities p(f | e)
17. Probability estimation for long phrases, continued
- We need an estimator for p(f | e) (to compute h_lex); use smoothed relative frequencies
- d is a positive discounting factor, α(e) is a normalization constant, and a backing-off distribution handles unseen events
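One standard way to realize this kind of smoothing is absolute discounting with backing-off; the sketch below implements that generic scheme under my own naming and is not necessarily the paper's exact formula. A fixed amount d is subtracted from every seen count, and the freed mass α(e) is redistributed via the backing-off distribution:

```python
from collections import Counter

def smoothed_phrase_probs(phrase_pairs, backoff, d=0.4):
    # p(f|e) = max(N(f,e) - d, 0)/N(e) + alpha(e) * backoff(f)
    pair_counts = Counter(phrase_pairs)
    target_counts = Counter(e for _, e in phrase_pairs)
    seen = Counter(e for (f, e) in pair_counts)   # distinct f seen with e
    def prob(f, e):
        n_e = target_counts[e]
        discounted = max(pair_counts[(f, e)] - d, 0.0) / n_e
        alpha = d * seen[e] / n_e                 # mass freed by discounting
        return discounted + alpha * backoff(f)
    return prob

pairs = [("la maison", "the house"),
         ("la maison", "the house"),
         ("maison", "the house")]
vocab = ["la maison", "maison", "bleue"]
uniform = lambda f: 1.0 / len(vocab)              # toy backing-off distribution
p = smoothed_phrase_probs(pairs, uniform)
# the smoothed estimates still sum to 1 over the vocabulary
```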
18. The search
- The search space is the space of all possible sequences e_1^I in the target language
- Efficiency becomes problematic
- Simplification: the model was constructed monotonic
- Solution: use dynamic programming
19. The search, continued
- Define Q(j, e) = maximum probability of a phrase sequence ending with the word e and covering positions 1..j of the source sentence
- Q(J + 1, $) = probability of the optimum translation ($ marks the end of sentence)
- This generates a DP matrix
20. The search, recursion formula
Q(j, e) = max over e', ẽ and j − M ≤ j' < j of Q(j', e') · p(f_{j'+1}^j | ẽ) · p_LM(ẽ | e'), where ẽ ranges over candidate target phrases ending in the word e, and M is the maximum phrase length in the source language. The worst-case complexity is O(J · M · V_e · E), where V_e is the vocabulary size in the target language and E is the maximum number of phrase translation candidates for a source phrase.
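The recursion can be sketched as a small dynamic program over states (j, e) with a bigram language model. The phrase table, the toy sentence, and all probabilities below are invented for illustration, and real decoders add pruning on top of this:

```python
import math

def monotone_decode(source, phrase_table, bigram, max_phrase_len=3):
    # Q[(j, e)] = (best log-prob covering source[0:j] and ending in target
    #              word e, backpointer state, target phrase used)
    J = len(source)
    Q = {(0, "<s>"): (0.0, None, ())}
    for j in range(J):
        current = [(e, v) for (jj, e), v in Q.items() if jj == j]
        for e, (score, _, _) in current:
            for l in range(1, min(max_phrase_len, J - j) + 1):
                src = tuple(source[j:j + l])
                for tgt, lp in phrase_table.get(src, []):
                    s, prev = score + lp, e
                    for w in tgt:                 # bigram LM inside the phrase
                        s += bigram(prev, w)
                        prev = w
                    key = (j + l, prev)
                    if s > Q.get(key, (-math.inf, None, ()))[0]:
                        Q[key] = (s, (j, e), tgt)
    # pick the best state covering the whole source, closed by "</s>"
    best_key = max((k for k in Q if k[0] == J),
                   key=lambda k: Q[k][0] + bigram(k[1], "</s>"))
    out, key = [], best_key
    while Q[key][1] is not None:                  # follow backpointers
        _, back, tgt = Q[key]
        out = list(tgt) + out
        key = back
    return out

log = math.log
phrase_table = {                                  # invented toy phrase table
    ("la",): [(("the",), log(0.7))],
    ("maison",): [(("house",), log(0.6))],
    ("bleue",): [(("blue",), log(0.9))],
    ("la", "maison"): [(("the", "house"), log(0.8))],
    ("maison", "bleue"): [(("blue", "house"), log(0.5))],
}

def bigram(prev, w):                              # toy bigram language model
    good = {("the", "blue"), ("blue", "house")}
    return log(0.2 if (prev, w) in good else 0.05)

best = monotone_decode(["la", "maison", "bleue"], phrase_table, bigram)
# best == ["the", "blue", "house"]: the LM makes the decoder prefer the
# segmentation ("la")("maison bleue") despite a lower phrase probability
```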
21. Evaluation and Results
- Criteria
- WER (word error rate): minimum number of insertions/deletions/substitutions required to correct the sentence
- PER (position-independent word error rate)
- BLEU: measures precision of n-grams up to 4th order with respect to a reference translation; larger scores are better
- NIST: similar to BLEU, weighted n-gram precision
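WER is the classic Levenshtein distance computed at the word level and normalized by the reference length. A compact sketch (my own helper, not the paper's evaluation tool):

```python
def word_error_rate(hyp, ref):
    # d[i][j] = minimum edits turning hyp[:i] into ref[:j]
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                               # deletions
    for j in range(n + 1):
        d[0][j] = j                               # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[m][n] / n

wer = word_error_rate("the house blue".split(), "the blue house".split())
# wer == 2/3: two substitutions fix the swapped words
```

PER drops the positional constraint (it compares the two sentences as bags of words), so it is always at most the WER.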
22. Results
23. Questions?