1
Parameter estimation in IBM Models
  • Ling 572
  • Fei Xia
  • Week ??

2
Outline
  • IBM Model 1 Review (from LING571)
  • Word alignment
  • Modeling
  • Training formula
  • Formulae

3
IBM Model Basics
  • Classic paper: Brown et al. (1993)
  • Translation: F → E (or Fr → Eng)
  • Resources required:
  • Parallel data (a set of sentence pairs)
  • Main concepts:
  • Source channel model
  • Hidden word alignment
  • EM training

4
Intuition
  • Sentence pairs where word mapping is one-to-one:
  • (1) S: a b c d e
  •     T: l m n o p
  • (2) S: c a e
  •     T: p n m
  • (3) S: d a c
  •     T: n p l
  • ⇒ (b, o), (d, l), (e, m), and either
  •    (a, p), (c, n), or
  •    (a, n), (c, p)

5
Source channel model for MT
[Diagram: Eng sent E → noisy channel → Fr sent F; source model P(E), channel model P(F | E)]
  • Two types of parameters:
  • Language model P(E)
  • Translation model P(F | E)
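In equation form (standard source-channel decoding; the slide shows only the diagram, so this line is a reconstruction): decoding picks the Eng sentence that maximizes the product of the two models,

  \hat{E} = \arg\max_{E} P(E \mid F) = \arg\max_{E} P(E)\, P(F \mid E)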

6
Word alignment
  • a(j) = i, also written aj = i
  • a = (a1, ..., am)
  • Ex:
  • F: f1 f2 f3 f4 f5
  • E: e1 e2 e3 e4
  • a4 = 3
  • a = (0, 1, 1, 3, 2)

7
Word alignment
  • An alignment, a, is a function from Fr word position to Eng word position: a(j) = i means that fj is generated by ei.
  • The constraint: each Fr word is generated by exactly one Eng word (including e0).

8
Modeling P(F | E) with alignment
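The equation for this slide was lost in extraction; the standard formulation treats the alignment as hidden and sums it out:

  P(F \mid E) = \sum_{a} P(F, a \mid E)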
9
Notation
  • E: the Eng sentence, E = e1 ... el
  • ei: the i-th Eng word
  • F: the Fr sentence, F = f1 ... fm
  • fj: the j-th Fr word
  • e0: the Eng NULL word
  • f0: the Fr NULL word
  • aj: the position of the Eng word that generates fj

10
Notation (cont)
  • l: Eng sent length
  • m: Fr sent length
  • i: Eng word position
  • j: Fr word position
  • e: an Eng word
  • f: a Fr word

11
Generative process
  • To generate F from E:
  • Pick a length m for F, with prob P(m | l)
  • Choose an alignment a, with prob P(a | E, m)
  • Generate the Fr sent given the Eng sent and the alignment, with prob P(F | E, a, m).
  • Another way to look at it (see the sketch below):
  • Pick a length m for F, with prob P(m | l).
  • For j = 1 to m:
  • Pick an Eng word index aj, with prob P(aj | j, m, l).
  • Pick a Fr word fj according to the Eng word ei, where aj = i, with prob P(fj | ei).
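A minimal Python sketch of this second view of the generative story. The names generate_f, sample, length_prob, and t are hypothetical (not from the slides); the alignment pick uses Model 1's uniform choice of P(aj | j, m, l), introduced on a later slide.

  import random

  def sample(dist):
      # Draw one item from a {item: prob} dict.
      items = list(dist.keys())
      return random.choices(items, weights=[dist[x] for x in items])[0]

  def generate_f(E, length_prob, t):
      # E: list of Eng words e1..el; e0 (NULL) is prepended below.
      # length_prob[l]: hypothetical distribution over Fr lengths m, i.e. P(m | l)
      # t[e]: hypothetical distribution over Fr words given e, i.e. t(f | e)
      words = ["NULL"] + E              # e0: the Eng NULL word
      l = len(E)
      m = sample(length_prob[l])        # pick a length m with prob P(m | l)
      F, a = [], []
      for j in range(m):
          aj = random.randint(0, l)     # Model 1: P(aj | j, m, l) = 1/(l+1)
          a.append(aj)
          F.append(sample(t[words[aj]]))  # pick fj with prob t(fj | e_aj)
      return F, a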

12
Decomposition
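The slide's equation did not survive extraction; the standard chain-rule decomposition (as in Brown et al. 1993) is:

  P(F, a \mid E) = P(m \mid E) \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, E)\; P(f_j \mid a_1^{j}, f_1^{j-1}, m, E)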
13
Approximation
  • Fr sent length depends only on Eng sent length
  • A Fr word depends only on the Eng word that generates it
  • Estimating P(a | E, m): all alignments are equally likely
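Written out (a reconstruction; the slide states these only in words), the three approximations are:

  P(m \mid E) \approx P(m \mid l), \qquad P(f_j \mid \ldots) \approx t(f_j \mid e_{a_j}), \qquad P(a_j \mid j, m, l) = \frac{1}{l+1} \;\Rightarrow\; P(a \mid E, m) = \frac{1}{(l+1)^m}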

14
Decomposition
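Reconstructed (the slide's equation was lost): plugging the approximations into the chain-rule decomposition gives

  P(F, a \mid E) = \frac{P(m \mid l)}{(l+1)^m} \prod_{j=1}^{m} t(f_j \mid e_{a_j})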
15
Final formula and parameters for Model 1
  • Two types of parameters:
  • Length prob P(m | l)
  • Translation prob P(fj | ei), or t(fj | ei)
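The final formula itself did not survive extraction; reconstructed from the standard Model 1 derivation, the sum over alignments swaps into a product of sums, which is computable in O(l · m) time:

  P(F \mid E) = \sum_{a} P(F, a \mid E) = \frac{P(m \mid l)}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)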

16
Training
  • Mathematically motivated:
  • Having an objective function to optimize
  • Using several clever tricks
  • The resulting formulae:
  • are intuitively expected
  • can be calculated efficiently
  • EM algorithm:
  • Hill climbing; each iteration is guaranteed to improve the objective function
  • It is not guaranteed to reach the global optimum.

17
Training: Fractional counts
  • Let Ct(f, e) be the fractional count of the (f, e) pair in the training data, given the alignment prob P.

  Ct(f, e) = \sum_{(E,F)} \sum_{a} P(a \mid E, F) \cdot count(f, e; a, E, F)

where P(a | E, F) is the alignment prob and count(f, e; a, E, F) is the actual count of times e and f are linked in (E, F) by alignment a.
18
Estimating P(a | E, F)
  • We could list all the alignments and estimate
    P(a | E, F).
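Reconstructed from the model above (the slide's equation image was lost): by Bayes' rule, the length term and the uniform alignment prob cancel, leaving

  P(a \mid E, F) = \frac{P(F, a \mid E)}{\sum_{a'} P(F, a' \mid E)} = \frac{\prod_{j=1}^{m} t(f_j \mid e_{a_j})}{\sum_{a'} \prod_{j=1}^{m} t(f_j \mid e_{a'_j})}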

19
Formulae so far
⇒ New estimate for t(f | e)
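The normalization step, reconstructed from the definitions above:

  t(f \mid e) = \frac{Ct(f, e)}{\sum_{f'} Ct(f', e)}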
20
The algorithm
  • Step 1: Start with an initial estimate of t(f | e), e.g., a uniform distribution
  • Step 2: Calculate P(a | F, E)
  • Step 3: Calculate Ct(f, e); normalize to get t(f | e)
  • Repeat Steps 2-3 until the improvement is too small.

21
No need to enumerate all word alignments
  • Luckily, for Model 1, there is a way to calculate Ct(f, e) efficiently: the posterior link probs factor per position, so each occurrence of fj contributes t(fj | ei) / Σ_{i'} t(fj | e_{i'}) to Ct(fj, ei). See the sketch below.
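A sketch of one EM iteration using this factorization. The names em_iteration and corpus are hypothetical, and t is assumed to be a dict keyed by (f, e) pairs, initialized (e.g., uniformly) over all co-occurring word pairs:

  from collections import defaultdict

  def em_iteration(corpus, t):
      # corpus: list of (E, F) sentence pairs, each a list of words.
      # t: dict mapping (f, e) to the current estimate of t(f | e).
      count = defaultdict(float)   # fractional counts Ct(f, e)
      total = defaultdict(float)   # normalizer: sum of Ct(f', e) over f'
      for E, F in corpus:
          eng = ["NULL"] + E       # include e0
          for f in F:
              # The posterior prob that f links to each e factors per position:
              z = sum(t[(f, e)] for e in eng)
              for e in eng:
                  c = t[(f, e)] / z        # expected count of the (f, e) link
                  count[(f, e)] += c
                  total[e] += c
      # M-step: normalize the fractional counts to get the new t(f | e)
      return {(f, e): count[(f, e)] / total[e] for (f, e) in count}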

22
The algorithm
  • Step 1: Start with an initial estimate of t(f | e), e.g., a uniform distribution
  • Step 2: Calculate P(a | F, E)
  • Step 3: Calculate Ct(f, e); normalize to get t(f | e)
  • Repeat Steps 2-3 until the improvement is too small.

23
Summary of Model 1
  • Modeling:
  • Pick the length of F with prob P(m | l).
  • For each position j:
  • Pick an Eng word position aj, with prob P(aj | j, m, l).
  • Pick a Fr word fj according to the Eng word ei, with t(fj | ei), where i = aj.
  • The resulting formula can be calculated efficiently.
  • Training: EM algorithm. The update can be done efficiently.
  • Finding the best alignment can be easily done (see below).
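Because Model 1 treats Fr positions independently, the best (Viterbi) alignment decomposes per word; a standard consequence of the formula above:

  \hat{a}_j = \arg\max_{0 \le i \le l} t(f_j \mid e_i) \quad \text{for } j = 1, \ldots, m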

24
New stuff
25
EM algorithm
  • EM = expectation-maximization
  • In a model with hidden states (e.g., word alignment), how can we estimate model parameters?
  • EM does the following:
  • E-step: Take an initial model parameterization and calculate the expected values of the hidden data.
  • M-step: Use the expected values to maximize the likelihood of the training data.
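In equation form (standard EM, not taken from the slide): iteration k+1 maximizes the expected complete-data log-likelihood under the previous parameters,

  \theta^{(k+1)} = \arg\max_{\theta} \sum_{a} P(a \mid E, F; \theta^{(k)}) \, \log P(F, a \mid E; \theta)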

26
Objective function
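The equation on this slide did not survive extraction; for Model 1, the objective EM climbs is the log-likelihood of the parallel corpus, reconstructed from the earlier formula:

  L(t) = \sum_{(E,F)} \log P(F \mid E), \qquad P(F \mid E) = \frac{P(m \mid l)}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)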
27
Training Summary
  • Mathematically motivated:
  • Having an objective function to optimize
  • Using several clever tricks
  • The resulting formulae:
  • are intuitively expected
  • can be calculated efficiently
  • EM algorithm:
  • Hill climbing; each iteration is guaranteed to improve the objective function
  • It is not guaranteed to reach the global optimum.