Improvements in Phrase-Based Statistical Machine Translation

Transcript and Presenter's Notes

Title: Improvements in Phrase-Based Statistical Machine Translation


1
Improvements in Phrase-Based Statistical Machine
Translation
  • Authors: Richard Zens and Hermann Ney
  • Presented by Ovidiu Fortu

2
Outline
  • Short introduction to machine translation
  • The base model
  • Additional information introduced in the model
  • Evaluation and results

3
Introduction
  • Problem setting
  • Notation: a sentence in a given language is
    represented as a sequence of words
  • e_1^I = (e_1, e_2, ..., e_I)
  • We usually use the letter e for the target
    language and f for the source language
  • Fundamental equation
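
Written out, this is the standard Bayes decision rule of statistical MT:

    \hat{e}_1^I = \arg\max_{e_1^I} \Pr(e_1^I \mid f_1^J)
                = \arg\max_{e_1^I} \left\{ \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I) \right\}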

4
Introduction, continued
  • Why is it difficult?
  • Hard to estimate the probabilities
  • The argument maximization is done over a very
    large space
  • Word sense ambiguity
  • "The spirit is strong, but the flesh is weak"
  • translated into Russian and then back becomes
  • "The vodka is good but the meat is rotten"

5
Fundamental equation
  • There is no straightforward way to incorporate
    additional dependencies
  • It gives optimal solutions only if the two models
    are accurately estimated
  • The authors have found that an alternative
    decision rule gives similar results in practice
    (it is hard to justify why), as sketched below
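
The formula itself is not reproduced in this transcript; one likely candidate, assuming the slide follows Och and Ney's earlier observation, is the rule that replaces Pr(f_1^J | e_1^I) with the direct probability:

    \hat{e}_1^I = \arg\max_{e_1^I} \left\{ \Pr(e_1^I) \cdot \Pr(e_1^I \mid f_1^J) \right\}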

6
Direct Max Entropy Translation Model
  • We try to estimate P(e_1^I | f_1^J) directly
    (this allows a more efficient search)
  • We use a set of M feature functions
    h_m(e_1^I, f_1^J), m = 1, ..., M
  • For each feature function we have a model
    parameter λ_m
  • P(e_1^I | f_1^J) = exp(Σ_m λ_m h_m(e_1^I, f_1^J)) / Z(f_1^J),
    where Z(f_1^J) is the normalization factor

7
Direct Max Entropy Translation Model, continued
  • Now the objective is to find the translation
    e_1^I that maximizes Σ_m λ_m h_m(e_1^I, f_1^J)
  • For M = 2, λ_1 = λ_2 = 1 and
    h_1(e_1^I, f_1^J) = log p(e_1^I),
    h_2(e_1^I, f_1^J) = log p(e_1^I | f_1^J), we
    obtain the decision rule from the previous slide
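
A toy sketch (not from the paper) of this decision rule in code: score each candidate translation by the weighted feature sum Σ_m λ_m h_m(e, f) and keep the argmax. The two feature functions and the weights below are placeholders for illustration only.

    # word-penalty-style feature: prefers candidates close to the source length
    def h_length(e_words, f_words):
        return -abs(len(e_words) - len(f_words))

    # crude stand-in for a translation-model feature
    def h_overlap(e_words, f_words):
        return len(set(e_words) & set(f_words))

    FEATURES = [h_length, h_overlap]
    LAMBDAS = [0.5, 1.0]

    def best_translation(candidates, f_words):
        """candidates: list of target-word lists; returns the highest-scoring one."""
        def score(e_words):
            return sum(l * h(e_words, f_words) for l, h in zip(LAMBDAS, FEATURES))
        return max(candidates, key=score)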

8
Phrase-based Translation
  • Translating word by word fails to take into
    account context
  • Translating the whole phrase at once may prove
    too difficult
  • The middle way: split the sentences (in both
    source and target) into phrases and translate
    the phrases

9
Translation model
  • Assume we have a way to identify phrases
  • Use the general model on phrases
  • One-to-one phrase alignment
  • Phrases are contiguous
  • Introduce a hidden variable S that gives a
    segmentation of the pair (f_1^J, e_1^I) into K
    phrases

10
Translation model, continued
  • The sum over all segmentations is approximated
    with the maximum
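
In symbols, with S ranging over segmentations into K phrase pairs:

    \Pr(e_1^I \mid f_1^J) = \sum_{S} \Pr(e_1^I, S \mid f_1^J)
                          \approx \max_{S} \Pr(e_1^I, S \mid f_1^J)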

11
Translation model, continued
  • Now, which phrase maps into which?
  • Assume monotone translation, i.e. the k-th source
    phrase translates into the k-th target phrase
  • Assume a zero-order model at the phrase level
  • This leads to the factorized form sketched below
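
Under the monotone, zero-order assumptions the phrase model factorizes into a product over the K phrase pairs; a sketch of the standard form (the slide's exact notation is not reproduced here):

    \Pr(e_1^I, S \mid f_1^J) \approx \prod_{k=1}^{K} p(\tilde{e}_k \mid \tilde{f}_k)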

12
Translation model, estimation
  • The computation reduces to estimating the phrase
    translation probabilities
  • This is done based on relative frequencies
  • N(f, e) = the number of instances in which the
    phrase f is translated as the phrase e
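
The relative-frequency estimate this refers to has the usual form:

    p(\tilde{e} \mid \tilde{f}) = \frac{N(\tilde{f}, \tilde{e})}{\sum_{\tilde{e}'} N(\tilde{f}, \tilde{e}')}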

13
Translation model, estimation(II)
  • Using a bigram language model and Bayes decision
    rule, the search criterion becomes
  • by considering Pr(f_1^J) constant (it does not
    depend on e_1^I)
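
One standard form of the resulting criterion, assuming a bigram language model p(e_i | e_{i-1}) and source-channel phrase probabilities p(f̃_k | ẽ_k) (the exact formula on the slide is not reproduced here):

    \hat{e}_1^I = \arg\max_{e_1^I} \left\{ \prod_{i=1}^{I} p(e_i \mid e_{i-1})
                  \cdot \max_{S} \prod_{k=1}^{K} p(\tilde{f}_k \mid \tilde{e}_k) \right\}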

14
Equivalence to the direct approach
  • The previous formula fits the general format of
    the direct model
  • We have designed an h-function of the general
    model; additional dependencies will be
    incorporated in the model by adding other
    h-functions

15
Translation model, additional dependencies
  • Word penalty
  • This function penalizes long sentences
  • Phrase penalty
  • This one penalizes long phrases
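
The usual definitions of these two features (the slide's exact formulas are not reproduced) are simple counts; the sign of the associated weight λ determines whether each acts as a penalty or a bonus:

    h_{\mathrm{WP}}(e_1^I, f_1^J, S) = I   \quad\text{(number of target words)}
    h_{\mathrm{PP}}(e_1^I, f_1^J, S) = K   \quad\text{(number of phrases)}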

16
Probability estimation for long phrases
  • Problem: long phrases are rare, so the relative
    frequencies tend to overestimate their
    probabilities
  • Solution: add an h-function, h_lex, based on
    single-word lexicon probabilities p(f|e)
    (one common form is sketched below)
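
The definition of the added feature is not reproduced in the transcript; one common choice, assumed here in the spirit of an IBM-model-1 score computed inside each phrase pair, is:

    h_{\mathrm{lex}}(e_1^I, f_1^J, S) = \log \prod_{k=1}^{K} \prod_{j \in \tilde{f}_k} \sum_{i \in \tilde{e}_k} p(f_j \mid e_i)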

17
Probability estimation for long phrases, continued
  • We need an estimator for p(f|e) (to compute
    h_lex): use smoothed relative frequencies
  • d is a positive discounting factor; the smoothed
    estimate also involves a normalization constant
    and a backing-off distribution (see the sketch
    below)
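
This description matches standard absolute discounting; a sketch of that smoother, with α(e) and β(f) used here as assumed names for the normalization constant and the backing-off distribution:

    p(f \mid e) = \frac{\max\{N(f, e) - d,\, 0\}}{\sum_{f'} N(f', e)} + \alpha(e)\,\beta(f)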

18
The search
  • The search space is the space of all possible
    sequences e_1^I in the target language
  • Efficiency becomes problematic
  • Simplification: the model was constructed to be
    monotone
  • Solution: use dynamic programming

19
The search, continued
  • Define Q(j, e) = maximum probability of a phrase
    sequence ending with the word e and covering
    positions 1..j of the source sentence
  • Q(J + 1, $) = probability of the optimum
    translation ($ marks the end of the sentence)
  • This generates a DP matrix

20
The search, recursion formula
  • M is the maximum phrase length in the source
    language
  • The worst-case complexity is expressed in terms
    of Ve, the vocabulary size of the target
    language, and E, the maximum number of phrase
    translation candidates for a source phrase
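
A minimal sketch, in code, of the monotone DP search described on the last two slides. This is not the authors' implementation; the phrase table and bigram LM are toy placeholders, and scores are log-probabilities so that products become sums.

    import math
    from collections import defaultdict

    def monotone_decode(src, phrase_table, bigram_lm, max_phrase_len=3):
        """Best target word sequence for src (a list of source words).

        phrase_table: maps a source phrase (tuple of words) to a list of
                      (target phrase tuple, log p(target phrase | source phrase)).
        bigram_lm:    maps (previous word, word) to log p(word | previous word).
        """
        J = len(src)
        # Q[j][e] = (best log-prob of a hypothesis covering source positions
        #            0..j-1 and ending in target word e, backpointer)
        Q = [defaultdict(lambda: (-math.inf, None)) for _ in range(J + 1)]
        Q[0]["<s>"] = (0.0, None)

        for j in range(1, J + 1):
            for m in range(1, min(max_phrase_len, j) + 1):   # source phrase length
                f_phrase = tuple(src[j - m:j])
                for e_phrase, tm_logp in phrase_table.get(f_phrase, []):
                    for prev_word, (prev_logp, _) in list(Q[j - m].items()):
                        if prev_logp == -math.inf:
                            continue
                        # bigram LM score for the new target words
                        lm_logp, last = 0.0, prev_word
                        for w in e_phrase:
                            lm_logp += bigram_lm.get((last, w), -10.0)  # crude unseen-bigram cost
                            last = w
                        score = prev_logp + tm_logp + lm_logp
                        if score > Q[j][last][0]:
                            Q[j][last] = (score, (j - m, prev_word, e_phrase))

        if not Q[J]:
            return []                     # no segmentation covers the whole source
        best = max(Q[J], key=lambda w: Q[J][w][0])
        words, j, w = [], J, best
        while Q[j][w][1] is not None:     # follow backpointers
            prev_j, prev_w, e_phrase = Q[j][w][1]
            words = list(e_phrase) + words
            j, w = prev_j, prev_w
        return words

The work per source position is bounded by the maximum phrase length, the number of translation candidates per source phrase, and the number of distinct predecessor words, which is the kind of bound the slide describes in terms of M, E and Ve.
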
21
Evaluation and Results
  • Criteria:
  • WER (word error rate): minimum number of
    insertions/deletions/substitutions required to
    correct the sentence (a sketch follows below)
  • PER (position-independent word error rate)
  • BLEU: measures precision of n-grams up to 4th
    order with respect to a reference translation;
    larger scores are better
  • NIST: similar to BLEU, uses weighted n-gram
    precision
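
A small sketch of the WER criterion mentioned above (plain word-level Levenshtein distance, normalized by the reference length); this is illustrative code, not the evaluation tool used in the paper.

    def wer(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                      # delete all reference words
        for j in range(len(hyp) + 1):
            d[0][j] = j                      # insert all hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + cost) # substitution
        return d[len(ref)][len(hyp)] / max(len(ref), 1)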

22
Results
23
Questions?