Title: Example-based Machine Translation via the Web
1. Example-based Machine Translation via the Web
Nano Gough, Dr. Andy Way, Mary Hearne
- School of Computer Applications
- Dublin City University
- Dublin
- Ireland
2. Overview
- System description
- Building the example base
- Matching and recombination
- Calculation of weights
- Experiments and results
- Conclusions and further work
3. Building the Phrasal Lexicon
- Approx. 200,000 phrases extracted from the Penn Treebank
- Rules occurring 1,000 times or more (59 rule types)
- Translated using 3 on-line MT systems:
  - SDL International's Enterprise Translation Server (A): http://www.freetranslation.com
  - Reverso by Softissimo (B): http://trans.voila.fr
  - Logomedia (C): http://www.logomedia.net
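The frequency cut-off in this slide can be illustrated with a minimal sketch. It assumes the treebank has already been flattened into one string per rule occurrence (e.g. "NP -> DT NN"); that string format and the function name `frequent_rule_types` are hypothetical, and only the 1,000 threshold comes from the slide.

```python
from collections import Counter
from typing import Iterable


def frequent_rule_types(rules: Iterable[str], min_count: int = 1000) -> dict[str, int]:
    """Keep only the rule types that occur at least `min_count` times.

    `rules` is assumed (hypothetically) to hold one string per rule
    occurrence extracted from the treebank, e.g. "NP -> DT NN".
    """
    counts = Counter(rules)
    return {rule: n for rule, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Toy data standing in for the real treebank productions.
    toy = ["NP -> DT NN"] * 3 + ["PP -> IN NP"] * 2 + ["VP -> VBD NP"]
    print(frequent_rule_types(toy, min_count=2))
```

With the real data, the threshold stays at its default of 1,000; the toy call lowers it only so the small example produces output.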
4. The Marker Hypothesis
The Marker Hypothesis states that all natural
languages have a closed set of specific words or
morphemes which appear in a limited set of
grammatical contexts and which signal that
context. Green (1979)
Gaijin: A Bootstrapping, Template-Driven Approach to EBMT (Veale & Way, 1997)
5. The Marker Hypothesis
- <DT> the, a, these, ... → <DT> le, la, l', une, un, ces, ...
- English phrase: on virtually all uses of asbestos
- French translation (C): sur virtuellement tous usages d'asbeste
- Marker-tagged English: <IN> on virtually <PDT> all uses <IN> of asbestos
- Marker-tagged French: <IN> sur virtuellement <PDT> tous usages <IN> d'asbeste
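The marker tagging in this example can be approximated by inserting a tag before every word that belongs to a closed marker set. The sketch below uses a tiny, hypothetical marker table that covers only this English example, since the full marker sets are not listed on the slide. The function name `insert_markers` is illustrative.

```python
# Hypothetical, deliberately tiny marker table: closed-class word -> marker tag.
MARKERS = {
    "the": "DT", "a": "DT", "these": "DT",
    "on": "IN", "of": "IN",
    "all": "PDT",
}


def insert_markers(phrase: str, markers: dict[str, str] = MARKERS) -> str:
    """Prefix every marker word in `phrase` with its <TAG>."""
    out = []
    for word in phrase.split():
        tag = markers.get(word.lower())
        if tag is not None:
            out.append(f"<{tag}>")  # the tag signals the start of a new chunk
        out.append(word)
    return " ".join(out)


print(insert_markers("on virtually all uses of asbestos"))
# -> <IN> on virtually <PDT> all uses <IN> of asbestos
```

A French-side table (le, la, l', sur, tous, ...) would be handled the same way. Elided forms such as "d'asbeste" would need extra tokenisation that this sketch omits.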
6. Alignment
- <IN> on virtually → sur virtuellement
- <PDT> all uses → tous usages
- <IN> of asbestos → d'amiante
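Chunk pairs like these could be obtained with a simple positional heuristic. Under this heuristic, both sides are assumed to be already marker-tagged, and chunks are paired in order only when the two sides produce the same sequence of marker tags. This heuristic is an assumption chosen for illustration and is not necessarily the alignment method used in the system. The names `split_chunks` and `align_chunks` are hypothetical.

```python
import re

TAG_RE = re.compile(r"<([A-Z]+)>")


def split_chunks(tagged: str) -> list[tuple[str, str]]:
    """Split '<IN> on virtually <PDT> all uses' into [('IN', 'on virtually'), ...]."""
    # re.split with a capturing group yields [prefix, tag1, text1, tag2, text2, ...];
    # any text before the first tag (the prefix) is ignored here.
    parts = TAG_RE.split(tagged)
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]


def align_chunks(source: str, target: str) -> list[tuple[str, str, str]]:
    """Pair chunks by position if both sides share the same marker-tag sequence."""
    src, tgt = split_chunks(source), split_chunks(target)
    if [tag for tag, _ in src] != [tag for tag, _ in tgt]:
        return []  # marker sequences differ: this naive heuristic gives up
    return [(tag, s, t) for (tag, s), (_, t) in zip(src, tgt)]


pairs = align_chunks(
    "<IN> on virtually <PDT> all uses <IN> of asbestos",
    "<IN> sur virtuellement <PDT> tous usages <IN> d'asbeste",
)
for tag, en, fr in pairs:
    print(f"<{tag}> {en} -> {fr}")
```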
7. Generalized Lexicon
8. System design
[Figure: English phrases extracted from the Penn Treebank (PT) → on-line translation systems A, B, C → Marker Hypothesis → generalized lexicon / word lexicon]
9. Segmenting the input
Input: The man bought the house
10. Knowledge Sources
- Phrasal Lexicon
- Marker Hypothesis Lexicon
- Generalized Lexicon
- Word Lexicon
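One way to consult these four knowledge sources for a single input chunk is sketched below. The sketch assumes the sources are tried in the order listed and that the first source with an entry wins. The lookup order, the plain-dictionary representation, and the function name `lookup` are all assumptions. The generalized lexicon is treated here as an ordinary lookup table for simplicity, and the toy entries are invented for illustration.

```python
from typing import Optional

# Hypothetical representation: English string -> candidate French translations.
Lexicon = dict[str, list[str]]


def lookup(chunk: str, sources: list[tuple[str, Lexicon]]) -> Optional[tuple[str, list[str]]]:
    """Return (source name, candidates) from the first source that contains `chunk`."""
    for name, lexicon in sources:
        candidates = lexicon.get(chunk)
        if candidates:
            return name, candidates
    return None  # no knowledge source covers this chunk


# Toy entries, invented for illustration only; tried in the order listed on the slide.
SOURCES = [
    ("phrasal", {}),
    ("marker", {"<DT> the house": ["la maison"]}),
    ("generalized", {}),
    ("word", {"the": ["le", "la", "l'", "les"], "house": ["maison"]}),
]

print(lookup("<DT> the house", SOURCES))  # -> ('marker', ['la maison'])
print(lookup("collapsed", SOURCES))  # -> None
```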
11. Calculation of Weights

$$\text{Weight} = \frac{\text{no. of occurrences of the proposed translation}}{\text{total no. of translations produced for the source-language phrase}}$$
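For example, suppose the three systems A, B and C each translate one source-language phrase and two of them return the same string. Under the ratio on this slide, that string gets weight 2/3. The minimal sketch below assumes the translations produced for one source phrase are collected in a list, with one entry per translation. The function name is hypothetical and the French strings are made-up toy data.

```python
from collections import Counter
from fractions import Fraction


def translation_weights(translations: list[str]) -> dict[str, Fraction]:
    """Weight = occurrences of a proposed translation / total translations produced."""
    counts = Counter(translations)
    total = len(translations)
    return {t: Fraction(n, total) for t, n in counts.items()}


# Toy outputs from systems A, B and C for one source phrase (invented).
weights = translation_weights(["la maison", "la maison", "le maison"])
print(weights["la maison"])  # -> 2/3
```

Exact fractions are used so that the weights for one source phrase always sum to exactly 1.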
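12. Calculation of Weights (contd.)

$$\text{Translation weight} = \prod_{i=1}^{k} W_i$$

where $W_i$ is the weight of the $i$-th of the $k$ chunks that make up the translation.

Input: the house collapsed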
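Suppose a candidate translation built from k chunks is scored by multiplying the chunk weights W_1 ... W_k. The score then reduces to a single product. The split of "the house collapsed" into two chunks and the weights below are hypothetical values, chosen only to show the arithmetic. The function name is illustrative.

```python
from fractions import Fraction
from math import prod


def sentence_weight(chunk_weights: list[Fraction]) -> Fraction:
    """Translation weight = product of the weights W_1 ... W_k of its k chunks."""
    return prod(chunk_weights, start=Fraction(1))


# Hypothetical chunk weights for "the house" and "collapsed".
w = sentence_weight([Fraction(2, 3), Fraction(1, 2)])
print(w)  # -> 1/3
```

Because the weights are multiplied, a candidate built from more chunks tends to receive a lower score than one covered by fewer, larger chunks.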
13. Experiments
- Issues investigated:
  - Translation coverage
  - Translation quality
  - Combination of knowledge sources
  - Automatic ranking vs. manual evaluation
14. Test Sets
15. Experiment on 100 sentences
16. Evaluation
17. Quality
- Reasons for low quality:
  - Verb form
  - Noun/verb agreement
  - Biased knowledge base
- Reasons for failed translations:
  - Word insertion
18. Word insertion
19. Experiment on 500 Noun Phrases
20. Reasons for failed translations
- Many failures were due to the absence of a relevant template in the generalized marker lexicon
21. Combining Knowledge Sources (sentences)
22. Combining Knowledge Sources (NPs)
23. Ranking: sentences
24. Ranking: NPs
25. Web Validation
- Input: the personal computers
- Chunks retrieved:
  - personal computers → ordinateurs personnels
  - the → le / la / l' / les (word lexicon)
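The validation step in this example can be sketched as follows. Each determiner option from the word lexicon is combined with the retrieved chunk, and the candidates are ranked by how often each string occurs on the web. Web counting is abstracted into a caller-supplied function, since no particular search API is given here. The name `web_validate` is hypothetical, and the hit counts in the example are invented placeholders rather than real search results.

```python
from typing import Callable


def web_validate(determiners: list[str], chunk: str,
                 hit_count: Callable[[str], int]) -> list[tuple[str, int]]:
    """Rank '<determiner> <chunk>' candidates by their (supplied) web hit counts."""
    candidates = []
    for det in determiners:
        # Elided forms such as "l'" attach directly to the following word.
        text = det + chunk if det.endswith("'") else f"{det} {chunk}"
        candidates.append((text, hit_count(text)))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Invented placeholder counts; a real system would query a search engine instead.
FAKE_HITS = {"les ordinateurs personnels": 900, "le ordinateurs personnels": 3}

ranked = web_validate(["le", "la", "l'", "les"], "ordinateurs personnels",
                      lambda s: FAKE_HITS.get(s, 0))
print(ranked[0][0])  # -> les ordinateurs personnels
```

Passing the counter in as a function keeps the ranking logic independent of any particular search engine and makes it easy to test offline.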
26. Further work
- Increase the size of the lexicon:
  - Increase the number of rules used
  - Include rules where the RHS contains a single non-terminal
  - Provide singular verb forms
- Evaluation:
  - Evaluation of the original examples
  - Automatic evaluation of the translations produced
- Validation:
  - Noun/verb validation