Title: Machine Translation
- Dekang Lin
- Department of Computing Science
- University of Alberta
Overview
- Approaches
- Sentence Alignment
- Word Alignment
Bitext
- Bitexts: the same content in two languages
- Canadian parliamentary proceedings
- Hong Kong Hansards
- The Bible (parallel texts)
Different Approaches to Text Alignment
- Length-Based Approaches
- Short sentences will be translated as short sentences, and long sentences as long sentences.
- Lexical Methods
- Use lexical information to align beads of sentences (small groups of sentences, such as 1:1 or 2:1, that are translations of each other).
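Length-Based Methods
- Goal
- Find the alignment A with the highest probability given the two parallel texts S and T:
$\arg\max_A P(A \mid S, T) = \arg\max_A P(A, S, T)$, since $P(S, T)$ does not depend on $A$.
- Assumption
- Each bead is independent of the others:
$P(A, S, T) \approx \prod_k P(B_k)$, where $B_k$ is the k-th bead of $A$.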
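Because the logarithm is monotonic, maximizing the product of bead probabilities is equivalent to minimizing a sum of per-bead costs $-\log P(B_k)$. This is presumably the "cost" that the dynamic-programming step minimizes:

```latex
\arg\max_A \prod_k P(B_k)
  = \arg\max_A \sum_k \log P(B_k)
  = \arg\min_A \sum_k \bigl(-\log P(B_k)\bigr)
```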
Probability Estimation
- How can we estimate the probability of a certain type of alignment bead given the sentences in that bead?
- The length-based alignment algorithm ignores everything except the lengths (in characters or words) of the sentences.
- $c$: the average ratio between the lengths of corresponding sentences in L2 and L1 (L2 length per unit of L1 length).
- $c$ can be estimated by the ratio between the document lengths, since the majority of sentences are aligned 1:1.
- German/English: $c \approx 1.1$
- French/English: $c \approx 1.06$
- Gale and Church (1993) used $c = 1$ for all language pairs.
- $s$: the standard deviation of the ratios of the lengths of corresponding sentences in L1 and L2.
- The squares of the differences between the lengths of corresponding paragraphs can be used to estimate $s^2$.
- English-German: $s^2 \approx 7.3$
- English-French: $s^2 \approx 5.6$
- Gale and Church (1993) used $s^2 = 6.8$ for all language pairs.
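A minimal sketch of both estimates, assuming the inputs are hypothetical lists of paired paragraph lengths (in characters) for L1 and L2. The $s^2$ estimator is our own assumption, not necessarily Gale and Church's exact procedure: if $l_2 - c\,l_1$ has variance $s^2 l_1$, then the sum of squared differences divided by the total L1 length is a simple estimate of $s^2$.

```python
def estimate_c(l1_lengths, l2_lengths):
    """Document-level estimate of c: total L2 length / total L1 length."""
    return sum(l2_lengths) / sum(l1_lengths)


def estimate_s2(l1_lengths, l2_lengths, c):
    """Assumed estimator: Var(l2 - c*l1) ~ s2 * l1 for paired paragraphs,
    so s2 ~ sum of squared differences / total L1 length."""
    squared = sum((b - c * a) ** 2 for a, b in zip(l1_lengths, l2_lengths))
    return squared / sum(l1_lengths)


# Hypothetical paragraph lengths (English = L1, German = L2).
en = [120, 340, 95]
de = [131, 372, 108]
c = estimate_c(en, de)
print(c, estimate_s2(en, de, c))
```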
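- Define $\delta$ as $\delta = (l_2 - c\,l_1)/\sqrt{l_1 s^2}$, where $l_1$ and $l_2$ are the lengths of the L1 and L2 portions of a bead.
- $\delta$ is a random variable with expected value 0 and standard deviation 1 (Gale and Church model it as standard normal).
- $P(\text{match}, \delta) = P(\delta \mid \text{match})\,P(\text{match})$, where match is the bead type (1:1, 2:1, ...).
- $P(\text{match})$ is estimated from manually annotated data.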
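The formulas above can be sketched directly. The names `PRIORS`, `delta`, `p_delta_given_match` and `p_match_and_delta` are hypothetical, and the prior values are an assumption: roughly the figures Gale and Church report, used here for illustration only. The two-tailed probability $2\,(1 - \Phi(|\delta|))$ is written with `math.erfc`, since $1 - \Phi(x) = \tfrac12\,\mathrm{erfc}(x/\sqrt2)$.

```python
import math

# Assumed bead-type priors P(match), keyed by (#L1 sentences, #L2 sentences).
# Roughly the Gale and Church (1993) figures; illustrative only.
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}


def delta(l1, l2, c=1.0, s2=6.8):
    """delta = (l2 - c*l1) / sqrt(l1 * s2); requires l1 > 0."""
    return (l2 - c * l1) / math.sqrt(l1 * s2)


def p_delta_given_match(d):
    """Two-tailed normal probability 2 * (1 - Phi(|d|)), written with erfc."""
    return math.erfc(abs(d) / math.sqrt(2))


def p_match_and_delta(l1, l2, bead, c=1.0, s2=6.8):
    """P(match, delta) = P(delta | match) * P(match)."""
    return p_delta_given_match(delta(l1, l2, c, s2)) * PRIORS[bead]


print(p_match_and_delta(100, 104, (1, 1)))
```

A bead whose lengths agree closely gets a probability near its prior; large length mismatches drive $P(\delta \mid \text{match})$ toward zero.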
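- $P(\delta \mid \text{match})$ is computed as $2\,(1 - \Phi(|\delta|))$, the probability of a deviation at least as large as the observed one,
- where $\Phi$ is the standard normal cumulative distribution function.
- $\Phi(d)$, with $d = |\delta|$, can be approximated by the polynomial formula of Abramowitz and Stegun (26.2.17):
- $t = 1/(1 + 0.2316419\,d)$
- $p_d = 1 - 0.3989423\,e^{-d^2/2}\,\big((((1.330274429\,t - 1.821255978)\,t + 1.781477937)\,t - 0.356563782)\,t + 0.319381530\big)\,t$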
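The approximation above translates line by line into code; `pnorm_approx` and `p_delta_given_match` are hypothetical names. The sketch checks itself against the exact CDF built from `math.erf`.

```python
import math


def pnorm_approx(d):
    """Phi(d) for d >= 0 via the Abramowitz-Stegun 26.2.17 polynomial."""
    t = 1.0 / (1.0 + 0.2316419 * d)
    poly = ((((1.330274429 * t - 1.821255978) * t + 1.781477937) * t
             - 0.356563782) * t + 0.319381530) * t
    return 1.0 - 0.3989423 * math.exp(-d * d / 2.0) * poly


def p_delta_given_match(delta):
    """2 * (1 - Phi(|delta|)): probability of a deviation at least this large."""
    return 2.0 * (1.0 - pnorm_approx(abs(delta)))


for d in (0.0, 1.0, 1.96):
    exact = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    print(d, pnorm_approx(d), exact)
```

The polynomial avoids calling an error function, which older C libraries did not always provide; in modern Python, `math.erf` would serve equally well.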
Dynamic Programming
Length-Based Methods
- The algorithm uses a dynamic programming technique that lets the system efficiently consider all possible alignments and find the minimum-cost alignment.
- The method performs well, at least on related languages, with a 4% error rate. It works best on 1:1 alignments (only a 2% error rate) and has a high error rate on more difficult alignments.
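A minimal sketch of the dynamic program, assuming sentence lengths in characters and the six bead types 1:1, 1:0, 0:1, 2:1, 1:2, 2:2. `BEADS`, `bead_cost` and `align` are hypothetical names, and the priors are assumed values (roughly Gale and Church's). One deliberate deviation from the $\delta$ formula above: the sketch normalizes by the mean length $(l_1 + l_2/c)/2$ instead of $l_1$, so that beads with an empty L1 side stay defined.

```python
import math

# Bead types (#L1 sentences, #L2 sentences) with assumed priors P(match).
BEADS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
         (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}


def bead_cost(l1, l2, prior, c=1.0, s2=6.8):
    """-log P(match, delta) for a bead whose two sides have lengths l1 and l2."""
    mean = (l1 + l2 / c) / 2.0          # mean length, so l1 = 0 is allowed
    d = (l2 - c * l1) / math.sqrt(s2 * mean) if mean > 0 else 0.0
    p = math.erfc(abs(d) / math.sqrt(2.0))        # 2 * (1 - Phi(|d|))
    return -math.log(max(p, 1e-300)) - math.log(prior)


def align(src_lens, tgt_lens, c=1.0, s2=6.8):
    """Minimum-cost bead sequence as [(src sentence ids, tgt sentence ids), ...]."""
    n, m = len(src_lens), len(tgt_lens)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in BEADS.items():
                if di > i or dj > j or cost[i - di][j - dj] == math.inf:
                    continue
                total = cost[i - di][j - dj] + bead_cost(
                    sum(src_lens[i - di:i]), sum(tgt_lens[j - dj:j]), prior, c, s2)
                if total < cost[i][j]:
                    cost[i][j], back[i][j] = total, (di, dj)
    # Trace back from (n, m) to (0, 0) to recover the chosen beads.
    beads, i, j = [], n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]


print(align([10, 12, 40], [22, 41]))
```

In the usage line, the two short L1 sentences should merge into one 2:1 bead, because their combined length matches the first L2 sentence.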
Word Alignment
- Input
- Pairs of sentences in two different languages
- Goal
- determine the translation of each word
<PAIR>
<ENGLISH> I am very pleased to see that happening. </ENGLISH>
<FRENCH> Je suis très heureux que cela se produise. </FRENCH>
</PAIR>
<PAIR>
<ENGLISH> As I mentioned earlier, my riding is very diverse. </ENGLISH>
<FRENCH> Comme je l'ai dit tout à l'heure, ma circonscription est très diversifiée. </FRENCH>
</PAIR>
<PAIR>
<ENGLISH> Dauphin-Swan River is located in west central Manitoba, the second largest settled area riding. </ENGLISH>
<FRENCH> La circonscription est située au centre ouest du Manitoba et vient au deuxième rang quant à sa superficie habitée. </FRENCH>
</PAIR>
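A minimal reader for the pair format above, assuming the file is a bare sequence of `<PAIR>` elements with no XML declaration and no unescaped `&` characters. `read_pairs` is a hypothetical helper; the list position doubles as the sentence identifier used in the next step.

```python
import xml.etree.ElementTree as ET


def read_pairs(text):
    """Return [(english, french), ...]; the list index serves as the sentence id."""
    # The input has no single root element, so wrap it in one before parsing.
    root = ET.fromstring("<ROOT>" + text + "</ROOT>")
    return [(pair.findtext("ENGLISH", "").strip(),
             pair.findtext("FRENCH", "").strip())
            for pair in root.iter("PAIR")]


sample = ("<PAIR> <ENGLISH> I am very pleased to see that happening. </ENGLISH> "
          "<FRENCH> Je suis très heureux que cela se produise. </FRENCH> </PAIR>")
for sid, (en, fr) in enumerate(read_pairs(sample)):
    print(sid, en, "|", fr)
```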
Inducing a Bilingual Lexicon
- Input: a sequence of paired sentences, each assigned a unique identifier.
- Construct a feature vector for each word.
- The features of a word are the sentences it appears in.
- Compute the similarity between English and French words as the cosine of their respective mutual information vectors.
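A minimal sketch of this procedure, assuming tokenized, lower-cased sentences where list position is the shared sentence id. The MI weighting is an assumption on our part: $\log \frac{P(w,s)}{P(w)P(s)}$ estimated from counts within each language, with only positive values kept. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def mi_vectors(sentences):
    """Map each word to {sentence_id: MI(word, sentence)}, keeping positive MI only.

    sentences: token lists for one language; the list index is the sentence id.
    """
    pair_counts, word_counts, sent_counts = Counter(), Counter(), Counter()
    for sid, tokens in enumerate(sentences):
        for w in tokens:
            pair_counts[(w, sid)] += 1
            word_counts[w] += 1
            sent_counts[sid] += 1
    total = sum(sent_counts.values())
    vectors = {}
    for (w, sid), n in pair_counts.items():
        # MI = log P(w, s) / (P(w) P(s)), all probabilities estimated from counts.
        mi = math.log(n * total / (word_counts[w] * sent_counts[sid]))
        if mi > 0:
            vectors.setdefault(w, {})[sid] = mi
    return vectors


def cosine(u, v):
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def candidates(src_sents, tgt_sents, word, top=5):
    """Target words ranked by cosine similarity to the MI vector of `word`."""
    src, tgt = mi_vectors(src_sents), mi_vectors(tgt_sents)
    vec = src.get(word, {})
    ranked = sorted(((cosine(vec, tv), t) for t, tv in tgt.items()), reverse=True)
    return ranked[:top]


# Toy data (hypothetical, loosely based on the example pairs).
english = [["i", "am", "very", "pleased"],
           ["as", "i", "mentioned", "my", "riding"],
           ["riding", "in", "manitoba"]]
french = [["je", "suis", "très", "heureux"],
          ["comme", "je", "ma", "circonscription"],
          ["circonscription", "manitoba"]]
print(candidates(english, french, "riding", top=3))
```

Words that occur in exactly the same sentences get identical sentence profiles, which is why a tiny corpus produces many ties. A large bitext is what separates true translations from co-occurring function words.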
E_I occ S0 1
E_I occ S1 1
E_in occ S2 1
E_large occ S2 1
E_locate occ S2 1
E_Manitoba occ S2 1
E_mention occ S1 1
E_my occ S1 1
E_pleased occ S0 1
E_rid occ S2 1
E_riding occ S1 1

F_centre occ S2 1
F_circonscription occ S1 1
F_circonscription occ S2 1
F_Comme occ S1 1
F_deuxième occ S2 1
F_dit occ S1 1
F_diversifiée occ S1 1
F_du occ S2 1
F_est occ S1 1
F_est occ S2 1
F_et occ S2 1
F_habitée occ S2 1
F_heure occ S1 1
F_heureux occ S0 1
F_Je occ S0 1
F_Je occ S1 1
(E_I (desc 24949.3)
  (sims F_Je 0.66442 F_que 0.324093 F_à 0.270347 F_j 0.269776 F_' 0.238522 F_d 0.225402))
(E_riding (desc 27361.9)
  (sims F_circonscription 0.565143 F_comté 0.315204 F_circonscriptions 0.23595 F_ma 0.15598 F_comtés 0.155452 F_électeurs 0.0848727 F_pétition 0.071375 F_signée 0.0593435 F_représente 0.0576735 F_située 0.054995 F_transitoire 0.0474934))
(E_my (desc 26117.7)
  (sims F_mon 0.318009 F_ma 0.301029 F_Monsieur 0.173827 F_mes 0.171226 F_question 0.171068 F_j 0.159929 F_Je 0.153954 F_président 0.13523 F_Adresse 0.12909 F_ai 0.122849 F_collègue 0.117225 F_s 0.10922 F_ministre 0.0981145 F_au 0.0878793))
Dealing with Large Bitexts
- Frequent words may appear in hundreds of thousands of sentences.
- Uses too much memory
- Slows down the similarity computation
- Solution
- Only use the first K (K = 10,000) occurrences of a word.
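A minimal sketch of the cap, assuming sentences are token lists indexed by sentence id. `occurrence_index` is a hypothetical helper, and `k` defaults to the K = 10,000 above.

```python
def occurrence_index(sentences, k=10000):
    """Map each word to the ids of (at most) the first k sentences containing it."""
    index = {}
    for sid, tokens in enumerate(sentences):
        for w in set(tokens):                  # one feature per sentence, not per token
            ids = index.setdefault(w, [])
            if len(ids) < k:
                ids.append(sid)
    return index


print(occurrence_index([["the", "cat"], ["the", "the"], ["the"]], k=2))
```

Because sentences are visited in order, the retained occurrences are the earliest ones. The memory for any single word is thus bounded by `k`, however frequent the word is.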
Homomorphic Dependency Hypothesis
- Let S and T be translations of each other. The dependency structures of S and T are homomorphic.
First(0) I(1) congratulate(2) you(3) on(4) your(5) re(6) -(7) election(8) to(9) the(10) Chair(11)
Je tiens à vous féliciter pour votre réélection à la présidence
Alignment (English indices, in French word order): 1 3 2 5 8(2 0) 11

I(0) suggest(1) they(2) were(3) saying(4) four(5) things(6)
À mon avis , leur message est quadruple
Alignment (English indices, in French word order): 0 1 2 4 3 5
... regional(16) interests(17) are(18) expressed(19) in(20) their(21) federal(22) arenas(23) through(24) an(25) effective(26) upper(27) chamber(28)
... une Chambre haute efficace défend les intérêts régionaux sur la scène fédérale
Alignment (English indices, in French word order): 25 28 27 26 19 17 16 23 22

We(0) see(1) ourselves(2) as(3) having(4) a(5) twofold(6) mandate(7)
Nous considérons avoir un mandat à deux volets
Alignment (English indices, in French word order): 0 1 4 5 7 6 6
Constraint on Alignment
- Through the word alignment, the dependency structure of the source language sentence can be used to derive a dependency structure for the target language sentence.
- Dependency structures normally do not have crossing links.
- Constraint on alignment
- The derived dependency structure of the target language sentence must not have crossing links.
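A minimal sketch of the derivation and the check. It assumes a source tree given as a head array (`-1` marks the root) and a partial one-to-one alignment from source to target positions. `project_links`, `crosses` and `has_crossing` are hypothetical names, and the four-word tree in the usage lines is made up.

```python
def project_links(src_heads, alignment):
    """Derive target dependency links (head_pos, dep_pos) from a source tree.

    src_heads[i] is the head of source word i (-1 for the root);
    alignment maps source positions to target positions (partial, one-to-one).
    """
    return [(alignment[h], alignment[d])
            for d, h in enumerate(src_heads)
            if h >= 0 and d in alignment and h in alignment]


def crosses(link1, link2):
    """Links cross when each has exactly one endpoint strictly inside the other."""
    a, b = sorted(link1)
    c, d = sorted(link2)
    return a < c < b < d or c < a < d < b


def has_crossing(links):
    return any(crosses(l1, l2)
               for i, l1 in enumerate(links) for l2 in links[i + 1:])


heads = [-1, 0, 0, 2]                       # hypothetical 4-word source tree
print(has_crossing(project_links(heads, {0: 0, 1: 1, 2: 2, 3: 3})))
print(has_crossing(project_links(heads, {0: 0, 1: 2, 2: 1, 3: 3})))
```

Swapping the target positions of words 1 and 2 makes the projected link for word 3 cross the link for word 1. That swapped alignment therefore violates the constraint.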
Checking for Crossing Links
- There are many possible alignments.
- The check for crossing links should be done while the alignment is being constructed, instead of after it is finished.
- This reduces the search space.
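A sketch of checking while the alignment is being built. It uses a simple backtracking search, which is our own choice of search strategy, and assumes every source word must be aligned one-to-one to a target position (no null alignments). `search_alignments` and its `candidates` argument are hypothetical. When a word is aligned, only the links that just became projectable are compared against the existing ones, and a crossing prunes the whole branch.

```python
def crosses(link1, link2):
    a, b = sorted(link1)
    c, d = sorted(link2)
    return a < c < b < d or c < a < d < b


def search_alignments(src_heads, candidates):
    """Enumerate one-to-one alignments whose projected links never cross.

    candidates[i] lists the target positions allowed for source word i.
    The crossing check runs as each word is aligned, so a bad partial
    alignment is abandoned before any of its completions are built.
    """
    results = []

    def extend(i, alignment, links):
        if i == len(src_heads):
            results.append(dict(alignment))
            return
        for t in candidates[i]:
            if t in alignment.values():
                continue                                   # keep it one-to-one
            new = []
            h = src_heads[i]
            if h >= 0 and h in alignment:                  # link to an aligned head
                new.append((alignment[h], t))
            for j, hj in enumerate(src_heads):             # links to aligned dependents
                if hj == i and j in alignment:
                    new.append((t, alignment[j]))
            # New links share endpoint t, so only new-vs-old pairs can cross.
            if any(crosses(n, old) for n in new for old in links):
                continue                                   # prune this branch
            alignment[i] = t
            extend(i + 1, alignment, links + new)
            del alignment[i]

    extend(0, {}, [])
    return results


heads = [-1, 0, 0, 2]                                      # hypothetical source tree
print(len(search_alignments(heads, [[0, 1, 2, 3]] * 4)))
```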
Problem 1
- Before the alignment is complete, some dependency
links in T cannot be established. Can we apply
the non-crossing constraint nonetheless?
Problem 2
- Some of the dependency links in T correspond to paths in S. Does that mean we have to check all the paths?
Implicitly Checking for Crossings
- The distance from a node A to a node B in a dependency tree is defined as the minimum number of levels one must go up from A so that A and B are in the same subtree.
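The definition can be computed by climbing from A until reaching a node whose subtree contains B, that is, an ancestor of B (counting B itself). The head-array tree encoding and the names `ancestors` and `distance` are our own assumptions. Note that the distance is not symmetric.

```python
def ancestors(heads, node):
    """[node, parent, grandparent, ..., root]; heads[i] == -1 marks the root."""
    path = [node]
    while heads[path[-1]] >= 0:
        path.append(heads[path[-1]])
    return path


def distance(heads, a, b):
    """Levels to climb from a until the subtree rooted there also contains b."""
    above_b = set(ancestors(heads, b))   # roots of all subtrees that contain b
    for levels, node in enumerate(ancestors(heads, a)):
        if node in above_b:
            return levels
    raise ValueError("a and b are not in the same tree")


heads = [-1, 0, 0, 2]                    # hypothetical tree: 0 -> 1, 0 -> 2, 2 -> 3
print([distance(heads, 3, b) for b in range(3)])
```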
- If a dependency tree does not contain crossing links, the distances relative to a node should be monotonic on both sides of the node: they never decrease as one moves away from the node.
- When the distances are mapped to T, the monotonicity should be preserved.
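A sketch of the mapped monotonicity test for a single pivot word. It assumes the source-tree distances from the pivot have already been computed. For the hypothetical tree with heads `[-1, 0, 0, 2]` and pivot 3, these distances are 2, 2 and 1 for words 0, 1 and 2. `preserves_monotonicity` is a hypothetical name, and the sketch only checks the stated condition, without claiming to be the author's full procedure.

```python
def preserves_monotonicity(src_dist, alignment, pivot):
    """Check the distance pattern around `pivot` after mapping it into T.

    src_dist[j]: distance (in the source tree) from the pivot to source word j.
    alignment:   source position -> target position.
    In target order, distances must never decrease while moving away from the
    pivot's target position, on the left side and on the right side.
    """
    p = alignment[pivot]
    placed = sorted((alignment[j], d) for j, d in src_dist.items()
                    if j in alignment and j != pivot)
    right = [d for t, d in placed if t > p]
    left = [d for t, d in reversed(placed) if t < p]
    return all(x <= y for side in (left, right) for x, y in zip(side, side[1:]))


dist = {0: 2, 1: 2, 2: 1}                # distances from word 3 in the source tree
print(preserves_monotonicity(dist, {0: 0, 1: 1, 2: 2, 3: 3}, pivot=3))
print(preserves_monotonicity(dist, {0: 0, 1: 2, 2: 1, 3: 3}, pivot=3))
```

The second alignment, which produces a crossing link, gives the sequence 2, 1, 2 to the left of the pivot. Monotonicity fails there without any projected link being built explicitly.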