Title: Improvements in Phrase-Based Statistical Machine Translation
1. Improvements in Phrase-Based Statistical Machine Translation
- Authors: Richard Zens and Hermann Ney
- Presented by Ovidiu Fortu
2. Outline
- Short introduction to machine translation
- The base model
- Additional information introduced in the model
- Evaluation and results
3. Introduction
- Problem setting
- Notation: a sentence in a given language is represented as a sequence of words, e_1^I = (e_1, e_2, ..., e_I)
- Usually the letter e is used for the target language, f for the source language
- Fundamental equation:
  ê_1^I = argmax_{e_1^I} Pr(e_1^I) · Pr(f_1^J | e_1^I)
4. Introduction, continued
- Why is it difficult?
- Hard to estimate the probabilities
- The argmax is computed over a very large space
- Word sense ambiguity
- "The spirit is strong, but the flesh is weak"
- translated into Russian and then back:
- "The vodka is good but the meat is rotten"
5. Fundamental equation
- There is no straightforward way to incorporate additional dependencies
- It gives optimal solutions only if the two models are accurately estimated
- The authors have found that the direct model, ê_1^I = argmax_{e_1^I} Pr(e_1^I | f_1^J), gives similar results in practice (hard to justify why)
6. Direct Max Entropy Translation Model
- We try to estimate P(e_1^I | f_1^J) directly (this allows a more efficient search)
- We use a set of M feature functions h_m(e_1^I, f_1^J), m = 1, ..., M
- For each feature function we have a model parameter λ_m
- P(e_1^I | f_1^J) = exp( Σ_m λ_m h_m(e_1^I, f_1^J) ) / Z(f_1^J), where Z(f_1^J) is the normalization factor
7. Direct Max Entropy Translation Model, continued
- Now the objective is to find ê_1^I = argmax_{e_1^I} Σ_m λ_m h_m(e_1^I, f_1^J)
- For M = 2, λ_1 = λ_2 = 1, h_1(e_1^I, f_1^J) = log p(e_1^I) and h_2(e_1^I, f_1^J) = log p(f_1^J | e_1^I), we obtain the classical source-channel model
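The decision rule above can be sketched in a few lines of Python. The two feature functions and the tiny dictionary below are invented toy stand-ins (a word penalty and a coverage score), not the paper's actual features; the point is only that Z(f_1^J) does not depend on e_1^I and therefore cancels in the argmax:

```python
# Toy sketch of the direct maximum-entropy decision rule:
#   e_hat = argmax_e sum_m lambda_m * h_m(e, f)
# Feature functions and the dictionary are hypothetical examples.

TOY_DICT = {"maison": "house", "bleue": "blue"}

def h_word_penalty(e, f):
    # negative target length: with a positive weight, shorter output is favored
    return -len(e)

def h_coverage(e, f):
    # counts source words whose dictionary translation appears in the output
    return sum(1.0 for s in f if TOY_DICT.get(s) in e)

def loglinear_score(e, f, features, lambdas):
    # sum_m lambda_m * h_m(e, f); Z(f) is constant in e, so it cancels
    return sum(lam * h(e, f) for lam, h in zip(lambdas, features))

def best_translation(candidates, f, features, lambdas):
    return max(candidates, key=lambda e: loglinear_score(e, f, features, lambdas))

f = ["maison", "bleue"]
candidates = [["house"], ["blue", "house"], ["blue", "house", "house"]]
best = best_translation(candidates, f,
                        [h_word_penalty, h_coverage], [0.5, 1.0])
# best == ["blue", "house"]
```

The weights λ trade the features off against each other: here the coverage bonus outweighs the word penalty until the output starts repeating words.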
8. Phrase-based Translation
- Translating word by word fails to take context into account
- Translating the whole sentence at once may prove too difficult
- The middle way: split the sentences (in both source and target) into phrases and translate the phrases
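As a minimal illustration of the middle way, one can enumerate the contiguous-phrase segmentations of a sentence (the function name is mine, not the paper's). A sentence of J words has 2^(J-1) such segmentations, which is why the search must later be organized carefully:

```python
def segmentations(words):
    # yield every split of a word sequence into contiguous phrases
    if not words:
        yield []
        return
    for i in range(1, len(words) + 1):
        first = tuple(words[:i])          # the leading phrase
        for rest in segmentations(words[i:]):
            yield [first] + rest

segs = list(segmentations(["the", "blue", "house"]))
# 2^(3-1) = 4 segmentations, from three single-word phrases
# up to one three-word phrase
```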
9. Translation model
- Assume we have a way to identify phrases
- Use the general model on phrases
- One-to-one phrase alignment
- Phrases are contiguous
- Introduce a hidden variable S as a way to give a segmentation of the pair (f_1^J, e_1^I) into K phrase pairs
10. Translation model, continued
- The sum over all segmentations is approximated by the maximum:
  Pr(f_1^J | e_1^I) = Σ_S Pr(S, f_1^J | e_1^I) ≈ max_S Pr(S, f_1^J | e_1^I)
11. Translation model, continued
- Now, which phrase maps to which?
- Assume monotone translation, i.e., phrases are translated in the order in which they appear in the source sentence
- Assume a zero-order model at the phrase level
- This leads to
  Pr(f_1^J | e_1^I) ≈ max_S Π_{k=1..K} p(f̃_k | ẽ_k)
12. Translation model, estimation
- The computation reduces to estimating the phrase translation probability p(f̃ | ẽ)
- Which is done based on relative frequencies: p(f̃ | ẽ) = N(f̃, ẽ) / N(ẽ)
- N(f̃, ẽ) counts the instances where f̃ is translated as ẽ
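A minimal sketch of the relative-frequency estimate p(f̃ | ẽ) = N(f̃, ẽ) / N(ẽ) over a list of extracted phrase pairs; the pair-extraction step itself is assumed to have already happened:

```python
from collections import Counter

def estimate_phrase_probs(phrase_pairs):
    # phrase_pairs: list of (source_phrase, target_phrase) occurrences
    pair_counts = Counter(phrase_pairs)                  # N(f~, e~)
    target_counts = Counter(e for _, e in phrase_pairs)  # N(e~)
    return {(f, e): n / target_counts[e]
            for (f, e), n in pair_counts.items()}

pairs = [("la maison", "the house"),
         ("la maison", "the house"),
         ("maison", "the house"),
         ("bleue", "blue")]
probs = estimate_phrase_probs(pairs)
# probs[("la maison", "the house")] == 2/3
```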
13. Translation model, estimation (II)
- Using a bigram language model and the Bayes decision rule, the search criterion becomes
  ê_1^I = argmax_{e_1^I} max_S Π_{k=1..K} p(f̃_k | ẽ_k) · Π_{i=1..I} p(e_i | e_{i-1})
- obtained by considering the segmentation probability Pr(S) constant
14. Equivalence to the direct approach
- The previous formula fits the general format of the direct model
- We have designed an h-function of the general model; additional dependencies will be incorporated into the model by adding other h-functions
15. Translation model, additional dependencies
- Word penalty: h_wp(e_1^I, f_1^J) = I, the target sentence length
- This feature penalizes long sentences
- Phrase penalty: h_pp = K, the number of phrases in the segmentation
- This one penalizes long phrases
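Both penalty features are plain counts; in the log-linear model their weights λ decide whether long output or coarse segmentations are rewarded or punished. A sketch with my own naming:

```python
def h_word_penalty(e_words):
    # number of target words I: a negative weight on this feature
    # penalizes long output sentences
    return len(e_words)

def h_phrase_penalty(segmentation):
    # number of phrases K in the segmentation: its weight trades off
    # many short phrases against few long ones
    return len(segmentation)

e = ["the", "blue", "house"]
seg = [("the",), ("blue", "house")]
# h_word_penalty(e) == 3, h_phrase_penalty(seg) == 2
```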
16. Probability estimation for long phrases
- Problem: long phrases are rare, therefore relative frequencies tend to overestimate their probabilities
- Solution: add an h-function, h_lex, computed from word-level (lexical) translation probabilities p(f | e)
17. Probability estimation for long phrases, continued
- We need an estimator for p(f | e) (to compute h_lex); use smoothed relative frequencies
- d is a positive discounting factor, α(e) is a normalization constant, and a backing-off distribution handles unseen events
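One standard way to realize this kind of smoothing is absolute discounting with backing-off; the sketch below implements that generic scheme under my own naming and is not necessarily the paper's exact formula. A fixed amount d is subtracted from every seen count, and the freed mass α(e) is redistributed via the backing-off distribution:

```python
from collections import Counter

def smoothed_phrase_probs(phrase_pairs, backoff, d=0.4):
    # p(f|e) = max(N(f,e) - d, 0)/N(e) + alpha(e) * backoff(f)
    pair_counts = Counter(phrase_pairs)
    target_counts = Counter(e for _, e in phrase_pairs)
    seen = Counter(e for (f, e) in pair_counts)   # distinct f seen with e
    def prob(f, e):
        n_e = target_counts[e]
        discounted = max(pair_counts[(f, e)] - d, 0.0) / n_e
        alpha = d * seen[e] / n_e                 # mass freed by discounting
        return discounted + alpha * backoff(f)
    return prob

pairs = [("la maison", "the house"),
         ("la maison", "the house"),
         ("maison", "the house")]
vocab = ["la maison", "maison", "bleue"]
uniform = lambda f: 1.0 / len(vocab)              # toy backing-off distribution
p = smoothed_phrase_probs(pairs, uniform)
# the smoothed estimates still sum to 1 over the vocabulary
```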
18. The search
- The search space is the space of all possible sequences e_1^I in the target language
- Efficiency becomes problematic
- Simplification: the model was constructed monotonic
- Solution: use dynamic programming
19. The search, continued
- Define Q(j, e) = maximum probability of a phrase sequence ending with the word e and covering positions 1..j of the source sentence
- Q(J + 1, $) = probability of the optimum translation ($ marks the end of sentence)
- This generates a DP matrix
20. The search, recursion formula
Q(j, e) = max over e', ẽ and j − M ≤ j' < j of Q(j', e') · p(f_{j'+1}^j | ẽ) · p_LM(ẽ | e'), where ẽ ranges over candidate target phrases ending in the word e, and M is the maximum phrase length in the source language. The worst-case complexity is O(J · M · V_e · E), where V_e is the vocabulary size in the target language and E is the maximum number of phrase translation candidates for a source phrase.
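The recursion can be sketched as a small dynamic program over states (j, e) with a bigram language model. The phrase table, the toy sentence, and all probabilities below are invented for illustration, and real decoders add pruning on top of this:

```python
import math

def monotone_decode(source, phrase_table, bigram, max_phrase_len=3):
    # Q[(j, e)] = (best log-prob covering source[0:j] and ending in target
    #              word e, backpointer state, target phrase used)
    J = len(source)
    Q = {(0, "<s>"): (0.0, None, ())}
    for j in range(J):
        current = [(e, v) for (jj, e), v in Q.items() if jj == j]
        for e, (score, _, _) in current:
            for l in range(1, min(max_phrase_len, J - j) + 1):
                src = tuple(source[j:j + l])
                for tgt, lp in phrase_table.get(src, []):
                    s, prev = score + lp, e
                    for w in tgt:                 # bigram LM inside the phrase
                        s += bigram(prev, w)
                        prev = w
                    key = (j + l, prev)
                    if s > Q.get(key, (-math.inf, None, ()))[0]:
                        Q[key] = (s, (j, e), tgt)
    # pick the best state covering the whole source, closed by "</s>"
    best_key = max((k for k in Q if k[0] == J),
                   key=lambda k: Q[k][0] + bigram(k[1], "</s>"))
    out, key = [], best_key
    while Q[key][1] is not None:                  # follow backpointers
        _, back, tgt = Q[key]
        out = list(tgt) + out
        key = back
    return out

log = math.log
phrase_table = {                                  # invented toy phrase table
    ("la",): [(("the",), log(0.7))],
    ("maison",): [(("house",), log(0.6))],
    ("bleue",): [(("blue",), log(0.9))],
    ("la", "maison"): [(("the", "house"), log(0.8))],
    ("maison", "bleue"): [(("blue", "house"), log(0.5))],
}

def bigram(prev, w):                              # toy bigram language model
    good = {("the", "blue"), ("blue", "house")}
    return log(0.2 if (prev, w) in good else 0.05)

best = monotone_decode(["la", "maison", "bleue"], phrase_table, bigram)
# best == ["the", "blue", "house"]: the LM makes the decoder prefer the
# segmentation ("la")("maison bleue") despite a lower phrase probability
```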
21. Evaluation and Results
- Criteria
- WER (word error rate): minimum number of insertions/deletions/substitutions required to correct the sentence
- PER (position-independent word error rate)
- BLEU: measures precision of n-grams up to 4th order with respect to a reference translation; larger scores are better
- NIST: similar to BLEU, weighted n-gram precision
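WER is the classic Levenshtein distance computed at the word level and normalized by the reference length. A compact sketch (my own helper, not the paper's evaluation tool):

```python
def word_error_rate(hyp, ref):
    # d[i][j] = minimum edits turning hyp[:i] into ref[:j]
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                               # deletions
    for j in range(n + 1):
        d[0][j] = j                               # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[m][n] / n

wer = word_error_rate("the house blue".split(), "the blue house".split())
# wer == 2/3: two substitutions fix the swapped words
```

PER drops the positional constraint (it compares the two sentences as bags of words), so it is always at most the WER.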
22. Results
23. Questions?