Title: noun verb noun ? subj predicate object
1 ????
- ???? ????????
- ?????????????? ??)
- ????
2??????
- ?????????????
-
- ???????????
- noun verb noun ? subj predicate object
- ?????
- (action=???, agent=?, target=???, time=past)
- ????????(????????????????????????? ???????
- (action=eat, agent=I, target=an apple, time=past)
- ???????????(?????)??????????
- ???????
- noun=I, verb(past)=ate, noun=an apple
- ??? I ate an apple.
3??????
- ???????????????????????????
- ??????????????????????????????????
- ??????????????????
- ????????????????
- ? ? hot water?
- ???????? ?
- check???????????!
4????
- ??????
- ??? ? APPLE
- ?????
- APPLE: if bare noun or singular, apple
- if plural, apples
- ??????? an apple,???? apples??????????????????
5????????example based translation
- ???????????????????????
- ??????????? ?? I ate an orange.
- ?????????????
- ???????????????????????
- ??????????
- ???????????????????
- ????????????an apple?????
- ????I ate an apple.
- ??????????????????????????????????????????????????
???????????
6????????example based translation
- ??????????????????????????????????????????????
- ????????????????????????
- ???????????????
7???????Statistical Machine Translation (SMT)
- ??????????????????????????
- 2???????????
- ???????????? aligned corpus
- ????????????????????????????????
- ??????????????????????
- ???????
- Peter Brown, S. Della Pietra, V. Della Pietra, and Robert Mercer (IBM), "The Mathematics of Statistical Machine Translation: Parameter Estimation", CL, 1993
8Bayes???
- Canadian Hansard French-English bilingual corpus
- ?????????f ????????????? e ????
- Given a French string f, find ê = argmax_e Pr(e|f)
- ???f???????e???????!!
- then, by Bayes' rule, ê = argmax_e Pr(f|e) Pr(e)
9??Pr(e|f)?????Pr(f|e)Pr(e)??
- ???f???????e???????!!
- ????????????????
- ????????????????? f ???????????????????????
- Pr(e|f)????????????e???????????????????????
- ????????????????????Pr(e)????????????????
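The decomposition into Pr(f|e) and Pr(e) can be illustrated with a toy decoder. This is only a sketch: the candidate list, both probability tables, and the name `decode` are invented for illustration and do not come from the slides. It shows how the language model Pr(e) breaks a tie that the translation model alone cannot resolve.

```python
# Toy noisy-channel decoder. Every probability below is invented for
# illustration; real systems estimate them from corpora.

def decode(f, translation_model, language_model):
    """Return (e, score) maximizing Pr(f|e) * Pr(e) over candidate e."""
    best_e, best_score = None, 0.0
    for e, p_e in language_model.items():
        score = translation_model.get((f, e), 0.0) * p_e
        if score > best_score:
            best_e, best_score = e, score
    return best_e, best_score

# Hypothetical language model Pr(e): fluent English gets more mass.
LANGUAGE_MODEL = {"the poor": 0.60, "poor the": 0.01, "the rich": 0.39}

# Hypothetical translation model Pr(f|e), keyed by (f, e).
TRANSLATION_MODEL = {
    ("les pauvres", "the poor"): 0.50,
    ("les pauvres", "poor the"): 0.50,
    ("les pauvres", "the rich"): 0.01,
}

best, score = decode("les pauvres", TRANSLATION_MODEL, LANGUAGE_MODEL)
print(best, round(score, 4))
```

Here "the poor" and "poor the" have the same Pr(f|e), so the word order is decided entirely by Pr(e).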
10Alignment??
- The1 poor2 don't3 have4 any5 money6
- Les1 pauvres2 sont3 démunis4
- (Les pauvres sont démunis | The(1) poor(2) don't(3,4) have(3,4) any(3,4) money(3,4))
- A(e,f)a
- ? e,f??????
11??
- Alignment?????Pr(f,a|e)
- ???Pr(f,a,e)???????
12IBM Model 1
- ?????????????????????????????????-(1)
- ?????????????????-(2)
13Model 1
- ????????Alignment aj ?0?? m ??????????1/(l+1)?????
???????????Pr(f|e)???????
14- c() ????(fe)????????e ? ???????f
?????????2???????alignment a ????f,e????????????
15- (9)?? ????f?e???????????????(alignment a????????)?????
- f f1, f2(f), f3, ..,f7(f), . fm
- e1(e)
- e2
- e
- e8(e)
-
- el
- ????????S???? (f(s)|e(s)), s=1,...,S ????????????????S
????????????(10)??????????????????????????????
(10)??ePr(fe) ??e???????
16t(f|e)????????????
17 18????EM?t(f|e)???-1
- t(f|e)???????????
- ?(f(s),e(s)), 1<=s<=S ?????
-
- ??????
- ??? f,e
?f(s),e(s)??? - ?????0????
19????EM?t(f|e)???-2
- ? ??????1??????
- ???e??????t(f|e)???????????
- t(f|e)???????2,3??????
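The EM procedure for t(f|e) can be sketched as below. This is a minimal sketch of the standard IBM Model 1 training loop (Brown et al. 1993), not a transcription of the slides: the function name `train_model1`, the `<null>` token standing in for the empty word e0, and the fixed iteration count are all assumptions made for the example.

```python
from collections import defaultdict

NULL = "<null>"  # stand-in for the empty English word e0 (assumed name)

def train_model1(pairs, iterations=10):
    """Estimate t(f|e) by EM.

    pairs: list of (french_tokens, english_tokens).
    Returns a dict mapping (f, e) to t(f|e).
    """
    pairs = [(fs, [NULL] + list(es)) for fs, es in pairs]
    f_vocab = {f for fs, _ in pairs for f in fs}

    # Initialize t(f|e) to a uniform, nonzero value.
    t = defaultdict(float)
    for fs, es in pairs:
        for f in fs:
            for e in es:
                t[(f, e)] = 1.0 / len(f_vocab)

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f|e)
        total = defaultdict(float)  # normalizer for each e
        # E-step: distribute each f over the e words in its sentence pair.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the counts so that sum_f t(f|e) = 1.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)

corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
]
t = train_model1(corpus, iterations=20)
print(round(t[("maison", "house")], 3), round(t[("la", "house")], 3))
```

On this two-sentence corpus the co-occurrence of "la" with "the" in both pairs pushes probability mass for "house" toward "maison", which is the effect the repeated re-estimation is meant to produce.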
20Model 2
21????????????h?????????
22Model 1?????????
- Model 1 ??(l+1)^-1 ???a(i|j,m,l)?Model 2 ????????????
- ?????EM???????t(f|e)????
- ?????Model 1????????
23Model 3
- 1???n????? not => ne pas
- n0(??????)??????? ???????????
- ????????????????
- ????????
- ???????????????
24- ???? n(fe) ????e?f?????????????????
- ????t(f|e)????e????????f????????
- ???d(j|i,m,l)????l,???????m,???????i???????????j????????
- ?????????f0
25- ????????????????????????????????f0???????p1???????
????
26???????
fi ! ?ei???????fi?????????????????????????
27- (32)??????n,t,d,p0,1?????????1???????????????????
????? Pr(ef) ?????????? - ????model1,2????????????????????????
- ?????????????????????????
28???????????????
- ???MT??????????????????BLEU???????????????????????
????????? - ????SYSTRAN??????????MT????BLEU???????????????????
??????? - ??????????SMT????????????????????????
29?????MT?????
- ????????????????MT????????????????????
????????(RMT) ???? (SMT)
BLEU ???? ????
????? ?????? ????????
???? ??????????????? ???????????????
30????MT????(1)
- BLEU
- WER(word error rate)
- ??????????????????????
31????MT????(2)
- PER(position independent WER)
- GTM(General Text Matcher)
- ?????????????????????MMS
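Two of the metrics listed above, WER and PER, are simple enough to sketch directly. The code below uses one common formulation: WER as word-level edit distance divided by reference length, and PER as the same idea with word order ignored. The exact variants used for the results in these slides are not recoverable from the text, so treat the definitions and function names as assumptions.

```python
from collections import Counter

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)

def per(ref, hyp):
    """Position-independent error rate: compares bags of words only."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)

ref = "tomorrow i will go to the conference".split()
hyp = "i will go to the conference tomorrow".split()
print(wer(ref, hyp), per(ref, hyp))
```

The example pair contains the same words in a different order, so PER is zero while WER is not; this is the difference between the two metrics.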
32???????SMT???
- ?????JAPIO???????????/PAJ?????(1993???2004????12?
??G06??77??????1000??????????500???? - ???????SMT
- ????????(???????????????)???
- ??????????
- ?????????????
33???
Tomorrow I will go to the conference in Japan
?? F ??? ??? ????
?????MT?????????
BLEU WER PER
0.2713 0.848 0.452
MT2006(NIST??)??BLEU?0.35? ?????????
34?????????????? -- Aligned corpus ???--
- Parallel Corpus(?????????)
- Aligned Corpus ?????????????????2????????????????
???????? ?????(align) ??????? - 90?????????2???????????????????????
- 90??????Noisy Parallel Corpus ??????????
(Fung94,Fung98)
35???????????????????
- Gale and Church 1993
- 2????? S,T??????(Alignment) A?????
- S?T?????????? bead ????
- ? B(?? language),
- B(les eaux mineral, mineral water)
- Alignment = argmax_A P(A|S,T) = argmax_A P(A,S,T)
- Bk? ??????k???bead
36???????????????????
- ? B(?? language),
- B(les eaux mineral, mineral water)
- ???????????????????????collocation?????2??????????
????bead Bk??????P(Bk)????????????? - ???????????DP???
37???????????????????
- ???????????????????????????????????????
- ????????????????????????????????????????
- ????????????????????????????????DP????
- Bead??????????????????????????????????????????????
?????
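The slides above name Gale and Church 1993, beads, and DP. The following is a deliberately simplified sketch of length-based sentence alignment by dynamic programming. It is not the Gale and Church cost model, which scores beads with a probabilistic model of length ratios; here the cost is just the absolute length difference plus a hand-picked penalty per bead type, and the penalties, function name, and input format are all assumptions.

```python
def align_by_length(src_lens, tgt_lens, penalties=None):
    """Align two sentence sequences given only their lengths.

    Returns a list of beads, each a pair (source indices, target indices).
    """
    if penalties is None:
        # (n_src, n_tgt) bead types with illustrative penalties.
        penalties = {(1, 1): 0, (1, 0): 50, (0, 1): 50, (2, 1): 10, (1, 2): 10}
    inf = float("inf")
    n, m = len(src_lens), len(tgt_lens)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward DP over prefixes (i source sentences, j target sentences).
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), pen in penalties.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                diff = abs(sum(src_lens[i:ni]) - sum(tgt_lens[j:nj]))
                c = cost[i][j] + diff + pen
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Trace back from the end to recover the bead sequence.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]

# Lengths (e.g. in characters) of three source and four target sentences.
print(align_by_length([20, 30, 15], [21, 12, 17, 16]))
```

In the example the second source sentence (length 30) is best explained as a 1-2 bead covering two target sentences of lengths 12 and 17.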
38???????????????
- ?????????? Gale and Church 93, Nissen et al 98
- A?? Wang Waibel 97
- ???????????????????????? Kay Roscheisen 93
39Aligned corpus ??????collocation ???????
- ???????
- ??????????????????????w1,w2(????????????)????????
????????
- w1,w2 ???????????????????????0??EM?????????? Kupiec93
- w1,w2?????? Haruno93, Smadja96
- w1,w2?Dice?? Smadja96
- ??-??????
- Likelihood ratio Melamed97
40??2???????????????????
- ?????????(??????)??W1(??A),w2(??B)?contingency matrix (??) a+b+c+d=N
- ?????
- Dice??
         W2??   W2???
W1??     a      b
W1???    c      d
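Both association measures named on this slide can be computed from the four cells of the contingency matrix. The sketch below assumes a is the number of aligned sentence pairs containing both words, b and c the pairs containing only W1 or only W2, and d the pairs containing neither. It uses the usual textbook definitions of the Dice coefficient and pointwise mutual information; the function names are made up for the example.

```python
import math

def dice(a, b, c, d):
    """Dice coefficient: 2 * joint count / (count of W1 + count of W2)."""
    return 2 * a / ((a + b) + (a + c))

def mutual_information(a, b, c, d):
    """Pointwise mutual information log2( P(W1,W2) / (P(W1) P(W2)) )."""
    n = a + b + c + d
    return math.log2(a * n / ((a + b) * (a + c)))

# Hypothetical counts over N = 100 aligned sentence pairs.
print(dice(10, 5, 5, 80), mutual_information(10, 5, 5, 80))
```

Note that d does not affect the Dice coefficient, whereas mutual information depends on it through N.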
41Champollion (Smadja et al 96)
- Translating collocations based on sentence aligned bilingual corpus
- 1????????Xtract? collocation ???
- ????????????????????????????????????? collocation ???????
- ???????????????Dice??????????
42Champollion
- Dice????X0,Y0(????????collocation????????)????????????????????????
- ????? collocation (?????)?Dice????????????????
- ?????????????????????
43Champollion
- Canadian Hansards (50MB order)
- 3000, 5000 collocations extracted by Xtract
- ????300 collocations ???
- Xtract?error rate 11%
- Incorrect translations 24%
- Correct translations 65%
- Champollion's precision 73%
44Likelihood ratio Melamed 97
- Melamed 97 ????????one-to-one?????????
- 2?????u,v???????????????????
- ????????????????
- ????u,v?????????
- ? P(????????)? ? -P(?????????)?????????????
- ? ?- ??u,vP(k(u,v)n(u,v),?)??????????????
- ?????????? L(u,v) = B(k(u,v)|n(u,v),λ+) / B(k(u,v)|n(u,v),λ-)
- ? ????????????
- recall 90% -- precision 87%
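The likelihood ratio on this slide can be sketched as follows, reading B(k|n,p) as the binomial probability of k successes in n trials. Following Melamed's formulation, k(u,v) is taken to be the number of times u and v are linked and n(u,v) the number of times they co-occur, with λ+ and λ- the link probabilities for true and false translation pairs. The numeric values of λ+ and λ- below are invented for illustration.

```python
from math import comb

def binomial(k, n, p):
    """B(k | n, p): probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def likelihood_ratio(k, n, lam_plus, lam_minus):
    """L(u,v) = B(k|n, lam_plus) / B(k|n, lam_minus)."""
    return binomial(k, n, lam_plus) / binomial(k, n, lam_minus)

# A pair linked in 8 of its 10 co-occurrences looks like a translation;
# a pair linked in 0 of 10 does not. The lambda values are assumptions.
print(likelihood_ratio(8, 10, 0.8, 0.1) > 1)
print(likelihood_ratio(0, 10, 0.8, 0.1) < 1)
```

A ratio above 1 favours the hypothesis that u and v are mutual translations; in practice the logarithm of the ratio is usually used to avoid very large or small numbers.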
45?????????????? -- non aligned corpus ???--
- Non-aligned Corpus ???
- Align???????????? Fung95ACL
- Alignment ??????????????
- ?????? Fung95WVLC
- ??????????? Rapp95,Tanaka96 ,Fung98,
- ???????????????
46?????????????? -- non aligned corpus ???--
- ????????????????????? collocation
?????????????????????????
47Noisy Parallel Corpus ???
- Fung94 ACL ?? English-French parallel corpus (Hansards) ?????????????????
- ???????????????K????
- ???Wen????Wfr?K????????????(<1,0,1,0,0> ?????)?????????MI???
- MI?????????????
- ??????MI???????????(t-score?????)????
- K????????????Fung95??????
48???????????????Alignment ???????????????????Noisy
Parallel Corpus ?????
- Fung95 ACL95
- Alignment ? ???????????
- Step1. ???????????????????????????????????????????
???????? - ?????????????????
- ???????DP?????????????????????????????????????????
?????????? - ???????????????
- ????????????????????? alignment ??????
49Fung 95 ACL ???
- Step2 ??????????
- ????????????????s1,s2,???(K????????????)
- ???????????????????????( i of si
)?????????????????? alignment ????????????????????
? - ???????????????????????
50Fung 95 ACL ???
- ??
- 6000?????????????
- ?????? 128? 80????
- ?????? 533? 70?????
- ???? 73???
- ????????????????
51?????????2?????????????
- Non-parallel comparable corpora
- Similarity of context is cue.
- language A
- language B
- Calculation of contexts similarity is heavy
a b X c d
a b Y c d
52Context Heterogeneity (Fung95WVLC)
- ????????? parallel ???????
- ?? trigram ????????????
- ??L ??0 ??R
- ??????0 ? context heterogeneity ?
- ??L?????a, ??R?????b
- ??0 ?????c ???
- Left-heterogeneity = a/c, right-heterogeneity = b/c
- ?????????????(??)?????
- ????????w1,w2? context heterogeneity x1,y1 (for
w1), x2,y2 (for w2)????
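The left and right heterogeneity defined on this slide can be computed in a single pass over a tokenized corpus: a is the number of distinct left neighbours of the word, b the number of distinct right neighbours, and c its number of occurrences. The sketch below also compares two words by the Euclidean distance between their (x, y) pairs, which is one plausible reading of how the values are compared; both function names are assumptions.

```python
import math

def context_heterogeneity(tokens, word):
    """Return (left, right) heterogeneity = (a/c, b/c) for `word`."""
    left, right, c = set(), set(), 0
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        c += 1
        if i > 0:
            left.add(tokens[i - 1])    # distinct left neighbours (a)
        if i + 1 < len(tokens):
            right.add(tokens[i + 1])   # distinct right neighbours (b)
    if c == 0:
        raise ValueError(f"{word!r} does not occur in the corpus")
    return len(left) / c, len(right) / c

def heterogeneity_distance(h1, h2):
    """Euclidean distance between two (left, right) heterogeneity pairs."""
    return math.hypot(h1[0] - h2[0], h1[1] - h2[1])

tokens = "a b x c d a b x e d f b x c".split()
print(context_heterogeneity(tokens, "x"))
```

In the toy corpus "x" occurs three times, always after "b" and before either "c" or "e", so its left heterogeneity is 1/3 and its right heterogeneity is 2/3.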
53Context heterogeneity ???
- e??????????????????????????????????????
- ??????????non-aligned corpus ?????????
- Context heterogeneity ?????????????????????????
- ?????????????
- ????????????????????????????????????
54??????????????? Rapp
- Rapp95,99 ???????????????????????????????
- ????????????????
- ?????????????wd???2?????????????
- ?????????????????? wa,wb ??????????MI?????????????
(????????)?????????
55??????????????? Rapp
- ?????
- ?????????(??????)??wa,wb(??B)?contingency matrix (??) a+b+c+d=N
- wa,wb??? a, wa?? b, wb?? c, ??????d
- ???????????????????wa?????w(ger)?????w(eng)???????
???????
56??????????????? Rapp
- ???w(ger)???w(eng)???????????
- ??s(w(ger),w(eng))?wa????????????????????S????
- S???????wa??????????
- ????????????????100??????????????
- ?1?????????72?
- ??10???????????????????89???
57??????????????? Fung
- Fung98 Rapp????????
- ????????????????????W?????????
- W?????????????????????????? Wd??????????????????
- W????????Wd?????tf
- W?????document ????????idf (W?????????????????
58??????????????? Fung
- tfidf ???Wd????????Wd?????????????????k????
- cosine, Dice???????????????Wd???????
- ?1????????30
- ?20????????76???????
59??????????????? Tanaka K
- Tanaka96 ????MI????EDICT
- ??A T ??B
- ??u ??k
- A TAT
vs B - ??v ??l
- Tij = p(??B???j | ??A???i)
- TAT?B ???????????????T?????????
- 378????????
???????? 82??????????????85????
62??????????????????Nakagawa 2000 LREC WTRC, 2001 NLPRS
- ?????????????????
- ??????????????????????????
- ??????wj????????we1,we2,..????????(EDICT)???????????
- wj ??????? wj ???????????????????wei (i=1,2,..) ???
63?????????? ????(???)
??????????????(??)
1? N? memory system 100? ?????
1? N-2???????? N3??????? N50???? 100? ????
?
?
64Distance
- distance(Xe, Xj) = |rank(Xe) - rank(Xj)|
- If distance(Xe, Xj) is small, Xe is the translation of Xj.
- distance(Xe1,Xj) < distance(Xe2,Xj) < ...
- then Xe1 is the most likely translation of Xj
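The selection rule on this slide can be sketched in a few lines, reading the distance as the absolute difference between the rank scores of a candidate Xe and the source term Xj. How the rank scores themselves are computed is not recoverable from the slide, so the numbers below are placeholders, and the function names are assumptions.

```python
def distance(rank_e, rank_j):
    """Absolute difference between two rank scores."""
    return abs(rank_e - rank_j)

def rank_candidates(rank_j, candidate_ranks):
    """Sort candidate translations by increasing distance to rank_j."""
    return sorted(candidate_ranks,
                  key=lambda e: distance(candidate_ranks[e], rank_j))

# Hypothetical rank scores for one source term and three candidates.
candidates = {"memory system": 2.9, "storage": 10.0, "system": 55.0}
print(rank_candidates(3.0, candidates))
```

The first element of the returned list is the candidate with the smallest distance, which the slide treats as the most likely translation.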
65Example of distance
??????? 0.051493 ??????
0.956459 memory system ?????
1.234347 ???? 3.809609 ??? 63.498688
??,?????????????????60??80???
66??????????????? ???
- Rapp?Fung, Tanaka ????
- ????????????????
- ????????????(???????)????????
- ????????????????????????????????????????
- ????????????????????(local minimum)???????????????