Title: Machine Translation and MT tools: Giza and Moses
1. Machine Translation and MT tools: Giza and Moses
2. Outline
- Problem statement in SMT
- Translation models
- Using Giza and Moses
3. Introduction to SMT
- Given a sentence F in a foreign language, find the most appropriate translation E in English:
  E* = argmax_E P(E|F) = argmax_E P(F|E) P(E)
- P(F|E): translation model
- P(E): language model
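A minimal sketch of this decision rule in Python, assuming toy hand-set probability tables (all names and numbers below are illustrative, not output of a real model):

    import math

    # Noisy-channel decision rule: E* = argmax_E P(F|E) * P(E).
    TOY_TM = {("casa verde", "green house"): 0.4,   # P(F|E): adequacy
              ("casa verde", "house green"): 0.4}
    TOY_LM = {"green house": 0.01,                  # P(E): fluency
              "house green": 0.0001}

    def decode(f, candidates):
        # Score in log space to avoid underflow on longer sentences.
        return max(candidates,
                   key=lambda e: math.log(TOY_TM.get((f, e), 1e-12))
                                 + math.log(TOY_LM.get(e, 1e-12)))

    print(decode("casa verde", ["green house", "house green"]))
    # -> green house: the TM ties the two orders; the LM prefers the fluent one

The translation model cannot distinguish the two word orders here; ranking them is exactly the language model's job.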
4. The Generation Process
- Partition: think of all possible partitions of the source language sentence
- Lexicalization: for a given partition, translate each phrase into the foreign language
- Reordering: permute the set of all foreign words, with words possibly moving across phrase boundaries
- We need the notion of alignment to better explain the mathematics behind the generation process
5. Alignment
6. Word-based alignment
- For each word in the source language, align the words from the target language that this word possibly produces
- Based on IBM models 1-5
- Model 1 is the simplest
- As we go from model 1 to model 5, the models get more complex but more realistic
- This is all that Giza does
7. Alignment
- A function from target position to source position
- Example: the alignment sequence 2,3,4,5,6,6,6 corresponds to the alignment function A with A(1) = 2, A(2) = 3, ...
- A different alignment function would give the sequence 1,2,1,2,3,4,3,4 for A(1), A(2), ...
- To allow spurious insertion, also allow alignment with word 0 (NULL)
- Number of possible alignments: (I+1)^J, for a source sentence of I words and a target sentence of J words
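The same alignment function as a short Python sketch (the source words are generic placeholders; the alignment sequence is the one from the slide):

    # Alignment as a function from target position j to source position a(j);
    # source position 0 is reserved for the NULL word (spurious insertions).
    source = ["NULL", "e1", "e2", "e3", "e4", "e5", "e6"]   # I = 6 real words
    a = [2, 3, 4, 5, 6, 6, 6]                               # J = 7 target words

    for j, i in enumerate(a, start=1):
        print(f"A({j}) = {i}  ->  {source[i]}")

    # Each of the J target positions may link to any of the I source words
    # or to NULL, so there are (I+1)^J possible alignment functions.
    I, J = len(source) - 1, len(a)
    print((I + 1) ** J)   # 7^7 = 823543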
8. IBM Model 1: Generative Process
9. IBM Model 1: Details
- P(F|E) = Σ_A P(F,A|E): no assumptions, this formula is exact
- Choosing length: P(J|E) = P(J|E,I) = P(J|I), taken to be a constant ε
- Choosing alignment: all alignments are equiprobable, P(A|E,J) = 1/(I+1)^J
- Translation probability: P(F|A,E) = Π_j t(f_j | e_{a(j)})
- Putting it together: P(F|E) = ε/(I+1)^J · Π_{j=1..J} Σ_{i=0..I} t(f_j | e_i)
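A minimal sketch of the closed form above, assuming a toy t-table (the lexicon values are illustrative):

    # IBM Model 1: P(F|E) = eps/(I+1)^J * prod_j sum_i t(f_j | e_i), e_0 = NULL.
    # Summing over all (I+1)^J alignments factorizes into this product of sums.
    def model1_likelihood(f_words, e_words, t, eps=1.0):
        e_words = ["NULL"] + e_words
        I, J = len(e_words) - 1, len(f_words)
        prob = eps / (I + 1) ** J
        for f in f_words:
            prob *= sum(t.get((f, e), 0.0) for e in e_words)
        return prob

    t = {("casa", "house"): 0.8, ("verde", "green"): 0.9,   # toy t(f|e)
         ("casa", "green"): 0.1, ("verde", "house"): 0.1}
    print(model1_likelihood(["casa", "verde"], ["green", "house"], t))  # 0.1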
10. Training Alignment Models
- Given a parallel corpus, for each sentence pair (F,E) learn the best alignment A and the component probabilities:
- t(f|e) for Model 1
- lexicon probability P(f|e) and alignment probability P(a_i | a_{i-1}, I) for the HMM alignment model
- How do we compute these probabilities if all we have is a parallel corpus?
11. Intuition: Interdependence of Probabilities
- If you knew which words are probable translations of each other, then you could guess which alignments are probable and which are improbable
- If you were given alignments with probabilities, then you could compute the translation probabilities
- Looks like a chicken-and-egg problem
- The EM algorithm comes to the rescue
12. Expectation Maximization (EM) Algorithm
- Used when we want a maximum-likelihood estimate of the parameters of a model that depends on hidden variables
- In the present case, the parameters are the translation probabilities and the hidden variables are the alignments
- Init: start with an arbitrary estimate of the parameters
- E-step: compute the expected values of the hidden variables
- M-step: recompute the parameters that maximize the likelihood of the data, given the expected values of the hidden variables from the E-step (a runnable sketch follows the worked example on the next slides)
13. Example of EM Algorithm
Green house ↔ Casa verde
The house ↔ La casa
Init: assume that any word can generate any word with equal probability, e.g.
P(la|house) = 1/3
14. E-Step
15. M-Step
16. E-Step Again
(figure: revised translation probabilities after one iteration: 1/3, 2/3, 2/3, 1/3)
Repeat till convergence.
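A runnable sketch of the whole loop on the slide's two sentence pairs (Model 1 EM with uniform initialization; the variable names are mine):

    from collections import defaultdict

    corpus = [(["casa", "verde"], ["green", "house"]),
              (["la", "casa"], ["the", "house"])]

    f_vocab = {f for fs, _ in corpus for f in fs}
    t = {(f, e): 1.0 / len(f_vocab)
         for f in f_vocab for _, es in corpus for e in es}

    for _ in range(20):
        count = defaultdict(float)   # expected co-occurrence counts c(f,e)
        total = defaultdict(float)   # expected counts c(e)
        # E-step: distribute each f over the e's of its sentence pair,
        # proportionally to the current t(f|e)
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    count[(f, e)] += t[(f, e)] / z
                    total[e] += t[(f, e)] / z
        # M-step: re-estimate t(f|e) from the expected counts
        for (f, e) in t:
            t[(f, e)] = count[(f, e)] / total[e] if total[e] else 0.0

    print(round(t[("casa", "house")], 3))   # climbs toward 1.0
    print(round(t[("la", "house")], 3))     # falls toward 0.0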
17. Limitation: Only 1-to-Many Alignments Allowed
18. Phrase-based alignment
- More natural
- Many-to-one mappings allowed
19. Generating Bi-directional Alignments
- Existing models only generate uni-directional alignments
- Combine two uni-directional alignments to get many-to-many bi-directional alignments
20. Hindi-Eng Alignment
(figure: Hindi-to-English word-alignment grid for the sentence "Goa is a premier beach vacation destination"; the Hindi text was lost in extraction)
21. Eng-Hindi Alignment
(figure: English-to-Hindi word-alignment grid for the same sentence; Hindi text lost in extraction)
22. Combining Alignments
(figure: the two directional alignment grids overlaid for the same sentence; Hindi text lost in extraction)
Precision and recall for the alignment sets shown:
- P = 4/5 = .8, R = 4/7 ≈ .6
- P = 2/3 ≈ .67, R = 2/7 ≈ .3
- P = 5/6 ≈ .83, R = 5/7 ≈ .7
- P = 6/9 ≈ .67, R = 6/7 ≈ .85
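The arithmetic behind these numbers, as a small sketch (the link sets below are illustrative stand-ins for the grids in the lost figure):

    # Precision/recall of a predicted set of alignment links vs. a gold set:
    # P = |pred & gold| / |pred|,  R = |pred & gold| / |gold|
    def precision_recall(pred, gold):
        hits = len(pred & gold)
        return hits / len(pred), hits / len(gold)

    gold = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}  # 7 links
    pred = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 6)}                  # 5 links
    p, r = precision_recall(pred, gold)
    print(f"P = {p:.2f}, R = {r:.2f}")   # P = 4/5 = 0.80, R = 4/7 = 0.57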
23. A Different Heuristic from the Moses Site
GROW-DIAG-FINAL(e2f,f2e):
  neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
  alignment = intersect(e2f,f2e)
  GROW-DIAG(); FINAL(e2f); FINAL(f2e)

GROW-DIAG():
  iterate until no new points added
    for english word e = 0 ... en
      for foreign word f = 0 ... fn
        if ( e aligned with f )
          for each neighboring point ( e-new, f-new ):
            if ( ( e-new, f-new ) in union( e2f, f2e ) and
                 ( e-new not aligned and f-new not aligned ) )
              add alignment point ( e-new, f-new )

FINAL(a):
  for english word e-new = 0 ... en
    for foreign word f-new = 0 ... fn
      if ( ( ( e-new, f-new ) in alignment a ) and
           ( e-new not aligned or f-new not aligned ) )
        add alignment point ( e-new, f-new )
Proposed changes: after growing the diagonal, align the shorter sentence first, and use alignments only from the corresponding directional alignment.
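A compact Python sketch of the GROW-DIAG stage above, operating on sets of (e, f) index pairs (the FINAL passes are omitted and the input alignments are toy values):

    def grow_diag(e2f, f2e, e_len, f_len):
        neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                     (-1, -1), (-1, 1), (1, -1), (1, 1)]
        alignment = e2f & f2e          # start from the intersection
        union = e2f | f2e
        added = True
        while added:                   # iterate until no new points added
            added = False
            for (e, f) in sorted(alignment):
                for de, df in neighbors:
                    e2, f2 = e + de, f + df
                    if not (0 <= e2 < e_len and 0 <= f2 < f_len):
                        continue
                    e_free = all(ep != e2 for ep, _ in alignment)
                    f_free = all(fp != f2 for _, fp in alignment)
                    # the slide's condition: the point occurs in the union
                    # and both words are still unaligned
                    if (e2, f2) in union and e_free and f_free:
                        alignment.add((e2, f2))
                        added = True
        return alignment

    e2f = {(0, 0), (1, 1), (2, 2)}     # toy directional alignments
    f2e = {(0, 0), (1, 1), (1, 2)}
    print(sorted(grow_diag(e2f, f2e, e_len=3, f_len=3)))
    # -> [(0, 0), (1, 1), (2, 2)]: (2, 2) is grown along the diagonal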
24. Generating Phrase Alignments
(figure: combined alignment grid for "Goa is a premier beach vacation destination"; Hindi text lost in extraction)
Example extracted phrase pairs (the Hindi side was lost in extraction):
- premier beach vacation ↔ [Hindi phrase]
- a premier beach vacation destination ↔ [Hindi phrase]
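A minimal sketch of the consistency criterion used to extract such phrase pairs from a word-aligned sentence pair (it omits the usual extension to unaligned boundary words):

    def extract_phrases(alignment, e_len, f_len, max_len=5):
        # A phrase pair (e1..e2, f1..f2) is consistent if no alignment link
        # connects a word inside the box to a word outside it.
        pairs = []
        for e1 in range(e_len):
            for e2 in range(e1, min(e1 + max_len, e_len)):
                fs = [f for (e, f) in alignment if e1 <= e <= e2]
                if not fs:
                    continue
                f1, f2 = min(fs), max(fs)
                if f2 - f1 >= max_len:
                    continue
                if all(e1 <= e <= e2
                       for (e, f) in alignment if f1 <= f <= f2):
                    pairs.append(((e1, e2), (f1, f2)))
        return pairs

    # toy alignment links (English index, Hindi index)
    print(extract_phrases({(0, 0), (1, 1), (2, 1)}, e_len=3, f_len=2))
    # -> [((0, 0), (0, 0)), ((0, 2), (0, 1)), ((1, 2), (1, 1))]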
25. Using Moses and Giza
- Refer to http://www.statmt.org/moses_steps.html
26. Steps
- Install all the Moses packages
- Input: a sentence-aligned parallel corpus
- Training
- Tuning
- Generate output on the test corpus (decoding) (see the sketch after this list)
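As a rough sketch only, the three stages driven from Python, loosely following the Moses baseline walkthrough; every path, corpus name, and flag value below is an illustrative assumption, and the linked page has the authoritative commands:

    import subprocess

    MOSES = "$HOME/mosesdecoder"   # assumed install location

    # Training: GIZA++ word alignment, phrase extraction, model estimation
    subprocess.run(f"{MOSES}/scripts/training/train-model.perl "
                   "-root-dir train -corpus corpus/train -f hi -e en "
                   "-alignment grow-diag-final "
                   "-lm 0:3:$HOME/lm/train.blm.en:8",
                   shell=True, check=True)

    # Tuning: MERT on a held-out development set
    subprocess.run(f"{MOSES}/scripts/training/mert-moses.pl "
                   f"dev.hi dev.en {MOSES}/bin/moses train/model/moses.ini",
                   shell=True, check=True)

    # Decoding: translate the test corpus with the tuned configuration
    subprocess.run(f"{MOSES}/bin/moses -f mert-work/moses.ini "
                   "< test.hi > test.out", shell=True, check=True)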
27. Example
- train.pr (phone sequences):
- hh eh l ow
- hh ah l ow
- w er l d
- k aa m p aw n d w er d
- hh ay f ah n ey t ih d
- ow eh n iy
- b uw m
- k w iy z l ah b aa t ah r
- train.en (letter sequences):
- h e l l o
- h e l l o
- w o r l d
- c o m p o u n d w o r d
- h y p h e n a t e d
- o n e
- b o o m
- k w e e z l e b o t t e r
28. Sample from Phrase-table
Format: source ||| target ||| source-side alignments ||| target-side alignments ||| scores
- b o ||| b aa ||| (0) (1) ||| (0) (1) ||| 1 0.666667 1 0.181818 2.718
- b ||| b ||| (0) ||| (0) ||| 1 1 1 1 2.718
- c o m p o ||| aa m p ||| (2) (0,1) (1) (0) (1) ||| (1,3) (1,2,4) (0) ||| 1 0.0486111 1 0.154959 2.718
- c ||| p ||| (0) ||| (0) ||| 1 1 1 1 2.718
- d w ||| d w ||| (0) (1) ||| (0) (1) ||| 1 0.75 1 1 2.718
- d ||| d ||| (0) ||| (0) ||| 1 1 1 1 2.718
- e b ||| ah b ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
- e l l ||| ah l ||| (0) (1) (1) ||| (0) (1,2) ||| 1 1 0.5 0.5 2.718
- e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.111111 0.5 0.111111 2.718
- e l ||| eh ||| (0) (0) ||| (0,1) ||| 1 0.111111 1 0.133333 2.718
- e ||| ah ||| (0) ||| (0) ||| 1 1 0.666667 0.6 2.718
- h e ||| hh ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
- h ||| hh ||| (0) ||| (0) ||| 1 1 1 1 2.718
- l e b ||| l ah b ||| (0) (1) (2) ||| (0) (1) (2) ||| 1 1 1 0.5 2.718
- l e ||| l ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.5 2.718
- l l o ||| l ow ||| (0) (0) (1) ||| (0,1) (2) ||| 0.5 1 1 0.227273 2.718
- l l ||| l ||| (0) (0) ||| (0,1) ||| 0.25 1 1 0.833333 2.718
- l o ||| l ow ||| (0) (1) ||| (0) (1) ||| 0.5 1 1 0.227273 2.718
- l ||| l ||| (0) ||| (0) ||| 0.75 1 1 0.833333 2.718
- m ||| m ||| (0) ||| (0) ||| 1 0.5 1 1 2.718
- n d ||| n d ||| (0) (1) ||| (0) (1) ||| 1 1 1 1 2.718
- n e ||| eh n iy ||| (1) (2) ||| () (0) (1) ||| 1 1 0.5 0.3 2.718
- n e ||| n iy ||| (0) (1) ||| (0) (1) ||| 1 1 0.5 0.3 2.718
- n ||| eh n ||| (1) ||| () (0) ||| 1 1 0.25 1 2.718
- o o m ||| uw m ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.5 1 0.181818 2.718
- o o ||| uw ||| (0) (0) ||| (0,1) ||| 1 1 1 0.181818 2.718
- o ||| aa ||| (0) ||| (0) ||| 1 0.666667 0.2 0.181818 2.718
- o ||| ow eh ||| (0) ||| (0) () ||| 1 1 0.2 0.272727 2.718
- o ||| ow ||| (0) ||| (0) ||| 1 1 0.6 0.272727 2.718
- w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) ||| 1 0.1875 1 0.424242 2.718
- w ||| w ||| (0) ||| (0) ||| 1 0.75 1 1 2.718
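A hedged parser for such lines, assuming the field layout above; the five scores are conventionally φ(f|e), lex(f|e), φ(e|f), lex(e|f), and the constant phrase penalty e ≈ 2.718:

    def parse_phrase_table_line(line):
        src, tgt, a_src, a_tgt, scores = [x.strip()
                                          for x in line.split("|||")]
        names = ["phi(f|e)", "lex(f|e)", "phi(e|f)", "lex(e|f)", "penalty"]
        return {"source": src.split(),
                "target": tgt.split(),
                "scores": dict(zip(names, map(float, scores.split())))}

    line = ("b o ||| b aa ||| (0) (1) ||| (0) (1) ||| "
            "1 0.666667 1 0.181818 2.718")
    print(parse_phrase_table_line(line)["scores"])
    # {'phi(f|e)': 1.0, 'lex(f|e)': 0.666667, 'phi(e|f)': 1.0,
    #  'lex(e|f)': 0.181818, 'penalty': 2.718}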
29. Testing Output
- h o t -> hh aa t
- p h o n e -> pUNK hh ow eh n iy
- b o o k -> b uw k