Title: Statistical Machine Translation
1 Statistical Machine Translation
- Translation without Understanding
- Colin Cherry
2 Who is this guy?
- One of Dr. Lin's PhD students
- Did my Master's degree at U of A
- Research area: machine translation
- Hometown: Halifax, Nova Scotia
- Please ask questions!
3 Machine Translation
- Translation is easy for (bilingual) people
- Process
- Read the text in English
- Understand it
- Write it down in French
4 Machine Translation
- Translation is easy for (bilingual) people
- Process
- Read the text in English
- Understand it
- Write it down in French
- Hard for computers
- The human process is invisible, intangible
5 One approach: Babelfish
- A rule-based approach to machine translation
- A 30-year-old feat in software engineering
- Programming knowledge in by hand is difficult and expensive
6 Alternate approach: statistics
- What if we had a model for P(F|E)?
- We could use Bayes' rule
7 Why Bayes' rule at all?
- Why not model P(E|F) directly?
- The P(F|E)·P(E) decomposition allows us to be sloppy
- P(E) worries about good English
- P(F|E) worries about French that matches the English
- The two can be trained independently
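The decomposition can be sketched with toy numbers. Everything below (the candidate list, the language-model and translation-model probabilities) is made up for illustration, not from a trained system:

```python
import math

def best_translation(f, candidates, lm, tm):
    """Noisy-channel choice: pick the English E maximizing P(E) * P(F|E).
    Work in log space to avoid underflow on longer sentences."""
    return max(candidates,
               key=lambda e: math.log(lm[e]) + math.log(tm[(f, e)]))

lm = {"the fast car": 1e-2, "car fast the": 1e-5}    # hypothetical P(E)
tm = {("voiture rapide", "the fast car"): 0.2,       # hypothetical P(F|E)
      ("voiture rapide", "car fast the"): 0.2}

best = best_translation("voiture rapide", list(lm), lm, tm)
# the translation model ties here, so P(E) breaks the tie
# in favour of the fluent word order
```

Note how the two models divide the labor exactly as the slide says: P(F|E) scores both candidates equally, and P(E) alone rules out the disfluent one.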
8 Crime Scene Analogy
- F is a crime scene. E is a person who may have committed the crime
- P(E|F): look at the scene - who did it?
- P(E): who had a motive? (Profiler)
- P(F|E): could they have done it? (CSI - transportation, access to weapons, alibi)
- Some people might have great motives, but no means - you need both!
9 On voit Jon à la télévision ("We see Jon on television")
Table borrowed from Jason Eisner
10 Where will we get P(F|E)?
[Diagram: books in English + the same books in French → machine learning magic → P(F|E) model]
- We call collections stored in two languages parallel corpora or parallel texts
- Want to update your system? Just add more text!
11 Our Inspiration
- The Canadian Parliamentary Debates!
- Stored electronically in both French and English, and available over the Internet
12 Problem
- How are we going to generalize from examples of translations?
- I'll spend the rest of this lecture telling you
- What makes a useful P(F|E)
- How to obtain the statistics needed for P(F|E) from parallel texts
13 Strategy: Generative Story
- When modeling P(X|Y)
- Assume you start with Y
- Decompose the creation of X from Y into some number of operations
- Track statistics of individual operations
- For a new example (X, Y), P(X|Y) can be calculated based on the probability of the operations needed to get X from Y
14 What if?
15 New Information
- Call this new info a word alignment (A)
- With A, we can make a good story
The quick fox jumps over the lazy dog
Le renard rapide saute par-dessus le chien paresseux
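Given an alignment A, the generative story scores the pair by multiplying one factor per French word. Here is a minimal sketch in the style of IBM Model 1 (the t table and sentences below are made-up toy values; the 1/(l+1) factor is the uniform choice of an aligning position, counting the null word):

```python
def p_f_a_given_e(french, english, alignment, t):
    """P(F,A|E) sketch: French word j is generated by the English word at
    position alignment[j] (0 means the null word). The position is chosen
    uniformly from l+1 options, then the word with probability t(f|e)."""
    l = len(english)
    p = 1.0
    for f, a in zip(french, alignment):
        e = "null" if a == 0 else english[a - 1]
        p *= (1.0 / (l + 1)) * t.get((f, e), 0.0)
    return p

# hypothetical t values for a tiny example
t = {("le", "the"): 0.5, ("chien", "dog"): 0.5}
p = p_f_a_given_e(["le", "chien"], ["the", "dog"], [1, 2], t)
# p = (1/3 * 0.5) * (1/3 * 0.5)
```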
16 P(F,A|E) Story
null The quick fox jumps over the lazy dog
17 P(F,A|E) Story
null The quick fox jumps over the lazy dog
f1 f2 f3 … f10
19 P(F,A|E) Story
null The quick fox jumps over the lazy dog
Le renard rapide saute par-dessus le chien paresseux
21 Getting Pt(f|e)
- We need numbers for Pt(f|e)
- Example: Pt(le|the)
- Count lines in a large collection of aligned text
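Counting lines from already word-aligned text can be sketched as a normalized count (the alignment links below are illustrative toy data):

```python
from collections import Counter

def estimate_t(aligned_links):
    """Estimate Pt(f|e) by counting alignment links (f, e) and
    normalizing by how often each English word e was linked."""
    links, totals = Counter(), Counter()
    for f, e in aligned_links:
        links[(f, e)] += 1
        totals[e] += 1
    return {(f, e): c / totals[e] for (f, e), c in links.items()}

# toy hand-aligned data: "the" links to "le" twice and to "la" once
t = estimate_t([("le", "the"), ("le", "the"), ("la", "the"), ("chien", "dog")])
# t[("le", "the")] is 2/3, t[("la", "the")] is 1/3
```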
22 Where do we get the lines?
- That sure looked like a lot of monkeys
- Remember POS tagging w/ HMMs?
- You didn't need a tagged corpus to train a tagger
- We'll get alignments out of unaligned text by treating the alignment as a hidden variable
- A generalization of the ideas in HMM training, called EM
23 Where's heaven in Vietnamese?
- English: In the beginning God created the heavens and the earth.
- Vietnamese: Ban đầu Đức Chúa Trời dựng nên trời đất.
- English: God called the expanse heaven.
- Vietnamese: Đức Chúa Trời đặt tên khoảng không là trời.
- English: you are this day like the stars of heaven in number.
- Vietnamese: các ngươi đông như sao trên trời.
Example borrowed from Jason Eisner
25 EM: Expectation Maximization
- Assume a probability distribution (weights) over hidden events
- Take counts of events based on this distribution
- Use counts to estimate new parameters
- Use parameters to re-weight the examples
- Rinse and repeat
26 Alignment Hypotheses
27 Weighted Alignments
- What we'll do is
- Consider every possible alignment
- Give each alignment a weight indicating how good it is
- Count weighted alignments as normal
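The "consider every alignment, weight it, then count" idea can be sketched as below. Note one assumption: this enumerates every mapping of each French word to any English position (Model 1 style), so a 2-word/2-word pair has four alignments, whereas the slides' pictures track only the one-to-one pairings:

```python
from itertools import product

def alignment_weights(french, english, t):
    """Enumerate every alignment (each French word to any English
    position), score each by the product of the current t(f|e)
    parameters, and normalize so the weights sum to 1."""
    alignments = list(product(range(len(english)), repeat=len(french)))
    raw = []
    for a in alignments:
        w = 1.0
        for j, i in enumerate(a):
            w *= t.get((french[j], english[i]), 0.0)
        raw.append(w)
    total = sum(raw)
    return {a: w / total for a, w in zip(alignments, raw)}

# with uniform starting parameters, every alignment gets equal weight
t0 = {(f, e): 0.5 for f in ("voiture", "rapide") for e in ("fast", "car")}
w = alignment_weights(["voiture", "rapide"], ["fast", "car"], t0)
```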
28 Good grief! We forgot about P(F|E)!
- No worries, a little more stats gets us what we need
29 Big Example Corpus
- Sentence pair 1: "fast car" ↔ "voiture rapide"
- Sentence pair 2: "fast" ↔ "rapide"
30 Possible Alignments
[Diagram: the two alignments of "fast car" ↔ "voiture rapide", labeled 1a and 1b, and the single alignment of "fast" ↔ "rapide", labeled 2]
31 Parameters
[Diagram: the three alignments (1a, 1b, 2) annotated with the current Pt(f|e) parameters]
32 Weight Calculations
[Diagram: each alignment's weight computed as the product of the current parameters]
33 Count Lines
[Diagram: under the uniform starting parameters, alignment 1a has weight 1/2, 1b has weight 1/2, and 2 has weight 1; the lines in each alignment are counted with these weights]
35 Count Lines
[Diagram: the weighted counts from alignments 1a (1/2), 1b (1/2), and 2 (1) are normalized per English word to give new parameters]
36 Parameters
[Diagram: the three alignments annotated with the re-estimated Pt(f|e) parameters]
37 Weight Calculations
[Diagram: the alignment weights recomputed from the new parameters]
38 Count Lines
[Diagram: after re-weighting, alignment 1a has weight 1/4, 1b has weight 3/4, and 2 still has weight 1]
40 Count Lines
[Diagram: the weighted counts from alignments 1a (1/4), 1b (3/4), and 2 (1) are normalized again]
41 After many iterations
[Diagram: alignment 1a has weight 0, 1b has weight 1, and 2 has weight 1 - EM has converged on a single alignment for each pair]
42 Seems too easy?
- What if you have no 1-word sentence?
- Words in shorter sentences will get more weight - fewer possible alignments
- Weight is additive throughout the corpus: if a word e shows up frequently with some other word f, P(f|e) will go up
43 Some things I skipped
- Enumerating all possible alignments
- Very easy with this model: the independence assumptions save us
- The model could be a lot better
- Word positions
- Multiple f's generated by the same e
- Can actually use an HMM!
44 The Final Product
- Now we have a model for P(F|E)
- Test it by aligning a corpus!
- i.e., find argmax_A P(A|F,E)
- Use it for translation
- Combine with your favorite model for P(E)
- Search the space of English sentences for one that maximizes P(E)·P(F|E) for a given F
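The alignment step is easy under this model because P(A|F,E) factors over French words, so the best alignment just picks the best English word for each French word independently (a sketch with hypothetical t values):

```python
def viterbi_align(french, english, t):
    """argmax_A P(A|F,E) under a Model 1-style model: each French word
    independently aligns to the English position with the highest t(f|e)."""
    return [max(range(len(english)),
                key=lambda i: t.get((f, english[i]), 0.0))
            for f in french]

# hypothetical learned parameters
t = {("le", "the"): 0.7, ("le", "dog"): 0.05,
     ("chien", "dog"): 0.9, ("chien", "the"): 0.1}
a = viterbi_align(["le", "chien"], ["the", "dog"], t)
# "le" aligns to position 0 ("the"), "chien" to position 1 ("dog")
```

Richer models (word positions, HMM alignments) break this independence, and the argmax then needs dynamic programming rather than a per-word maximum.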
45 Questions?