Example-based Machine Translation via the Web

1
Example-based Machine Translation via the Web
Nano Gough, Dr. Andy Way, Mary Hearne
  • School of Computer Applications
  • Dublin City University
  • Dublin
  • Ireland

2
Overview
  • System description
  • Building the example base
  • Matching and recombination
  • Calculation of weights
  • Experiments and results
  • Conclusions and further work

3
Building the Phrasal Lexicon
  • 200,000 (approx.) phrases extracted from the Penn
    Treebank
  • Rules occurring 1000 times or more (59 rule
    types)
  • Translated using 3 on-line MT systems:
    • SDL International's Enterprise Translation Server (A):
      http://www.freetranslation.com
    • Reverso by Softissimo (B): http://trans.voila.fr
    • Logomedia (C): http://www.logomedia.net

4
The Marker Hypothesis
The Marker Hypothesis states that all natural
languages have a closed set of specific words or
morphemes which appear in a limited set of
grammatical contexts and which signal that
context. Green (1979)
Gaijin: A Bootstrapping, Template-Driven Approach
to EBMT (Veale & Way, 1997)
5
The Marker Hypothesis
  • English <DT>: the, a, these; French <DT>: le, la,
    l', une, un, ces ...
  • English phrase: on virtually all uses of
    asbestos
  • French translation (C): sur virtuellement tous
    usages d'asbeste
  • <IN> on virtually <PDT> all uses <IN> of asbestos
  • <IN> sur virtuellement <PDT> tous usages <IN>
    d'asbeste
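
The marker-based segmentation on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `MARKERS` table holds only the English marker words visible on the slide, the function name `segment` is hypothetical, and the rule that a new chunk starts only once the current chunk already contains a non-marker word is an assumption.

```python
# Tiny marker table: only the English markers shown on the slide (assumed, incomplete).
MARKERS = {
    "the": "DT", "a": "DT", "these": "DT",
    "on": "IN", "of": "IN",
    "all": "PDT",
}

def segment(phrase, markers=MARKERS):
    """Split a phrase into (tag, text) chunks, opening a chunk at each marker word."""
    chunks = []  # each entry: [tag, list of words, chunk already has a non-marker word]
    for word in phrase.split():
        tag = markers.get(word.lower())
        # Assumption: a marker opens a new chunk only if the current chunk
        # already contains at least one non-marker (content) word.
        if not chunks or (tag is not None and chunks[-1][2]):
            chunks.append([tag, [word], tag is None])
        else:
            chunks[-1][1].append(word)
            if tag is None:
                chunks[-1][2] = True
    return [(tag, " ".join(words)) for tag, words, _ in chunks]

english_chunks = segment("on virtually all uses of asbestos")
# english_chunks == [("IN", "on virtually"), ("PDT", "all uses"), ("IN", "of asbestos")]
```

Segmenting the French side the same way would additionally need a tokenizer that separates elided forms such as d' from the following word, which this sketch does not attempt.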

6
Alignment
  • <IN> on virtually : sur virtuellement
  • <PDT> all uses : tous usages
  • <IN> of asbestos : d'amiante
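
One simple way to obtain chunk pairs like these is to align the segmented source and target position by position whenever both sides carry the same sequence of marker tags. The slides do not spell out the alignment procedure, so the sketch below is an assumption; `align_chunks` is a hypothetical helper, and the data is copied from this slide.

```python
def align_chunks(source_chunks, target_chunks):
    """Pair chunks position by position when both sides have the same tag sequence."""
    source_tags = [tag for tag, _ in source_chunks]
    target_tags = [tag for tag, _ in target_chunks]
    if source_tags != target_tags:
        return []  # this simplified sketch gives up when the tag sequences differ
    return [
        (tag, src, tgt)
        for (tag, src), (_, tgt) in zip(source_chunks, target_chunks)
    ]

english = [("IN", "on virtually"), ("PDT", "all uses"), ("IN", "of asbestos")]
french = [("IN", "sur virtuellement"), ("PDT", "tous usages"), ("IN", "d'amiante")]
aligned = align_chunks(english, french)
# aligned[1] == ("PDT", "all uses", "tous usages")
```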

7
Generalized Lexicon
8
System design
[Diagram: English phrases extracted from PT; on-line translation systems A, B, C; Marker Hypothesis; generalized lexicon/word lexicon]
9
Segmenting the input
Input: The man bought the house
10
Knowledge Sources
Phrasal Lexicon
Marker Hypothesis Lexicon
Generalized Lexicon
Word Lexicon
11
Calculation of Weights
\[
\text{Weight} = \frac{\text{no. of occurrences of the proposed translation}}{\text{total no. of translations produced for the SL phrase}}
\]
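
As a rough illustration of this ratio, the sketch below counts how often each proposed translation occurs among all translations collected for one source-language phrase. The function name `translation_weights` and the sample translations are invented for the example and do not come from the presentation.

```python
from collections import Counter
from fractions import Fraction

def translation_weights(translations):
    """Map each distinct translation to occurrences / total translations produced."""
    counts = Counter(translations)
    total = len(translations)
    return {text: Fraction(count, total) for text, count in counts.items()}

# Hypothetical outputs collected for a single source-language phrase.
weights = translation_weights(["la maison", "la maison", "le domicile"])
# weights["la maison"] == Fraction(2, 3); weights["le domicile"] == Fraction(1, 3)
```

Using `Fraction` keeps the weights exact, so the weights for one phrase always sum to 1.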
12
Calculation of Weights (contd.)
[Equation: translation weight computed from the weights W_i, i = 1, ..., p]
Input: the house collapsed
13
Experiments
  • Issues investigated:
    • Translation coverage
    • Translation quality
    • Combination of knowledge sources
    • Automatic ranking vs. manual evaluation

14
Test Sets
15
Experiment on 100 sentences
16
Evaluation
17
Quality
  • Reasons for low quality:
    • Verb form
    • Noun/Verb agreement
    • Biased knowledge base
  • Reasons for failed translations:
    • Word insertion

18
Word insertion

19
Experiment on 500 Noun Phrases
20
Reasons for failed translations
  • Many failures due to absence of a relevant
    template in generalized marker lexicon

21
Combining Knowledge Sources (sentences)

22
Combining Knowledge Sources (NPs)
23
Ranking - sentences
24
Ranking - NPs
25
Web Validation
  • Input: the personal computers
  • Chunks retrieved: personal computers :
    ordinateurs personnels
  • the : le / la / l' / les (word
    lexicon)
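
The slide does not describe the validation step itself. One plausible reading is that each determiner from the word lexicon is combined with the retrieved chunk and the candidate string best attested on the Web is kept. The sketch below follows that assumption: `validate_with_web` and the `hit_count` callable are hypothetical, and the hit counts are invented stand-ins for a real search query.

```python
def validate_with_web(determiners, chunk, hit_count):
    """Build one candidate per determiner and keep the one with the most hits."""
    candidates = [
        det + chunk if det.endswith("'") else f"{det} {chunk}"
        for det in determiners
    ]
    return max(candidates, key=hit_count)

# Invented hit counts standing in for a Web search (assumption, not real data).
fake_hits = {
    "les ordinateurs personnels": 120,
    "le ordinateurs personnels": 3,
}

best = validate_with_web(
    ["le", "la", "l'", "les"],
    "ordinateurs personnels",
    lambda text: fake_hits.get(text, 0),
)
# best == "les ordinateurs personnels"
```

Passing the hit counter in as a function keeps the sketch independent of any particular search service.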

26
Further work
  • Increase the size of the lexicon
  • Increase in number of rules used
  • Inclusion of rules where RHS contains a single
    non-terminal
  • Provision of singular verb forms
  • Evaluation
  • Evaluation of original examples
  • Automatic evaluation of translations produced
  • Validation
  • Noun/Verb Validation