Title: SMV2 Final Workshop
Machine Translation with Commercial Impact
Schedule
- 15.00-15.10 Welcome
- 15.10-15.50 Machine translation in a translation company: not only the future, but the present. Lisbeth Kjeldgaard Almsten, Inter-Set
- 15.50-16.20 Development and evaluation of the SMT-SMV system, including a demonstration. Director Bente Maegaard and Development Engineer Lene Offersgaard, CST, KU
- 16.20-16.35 Commercial MT: analysis and considerations. Associate Professor Daniel Hardt, CBS
- 16.35-17.00 Wrap-up: general discussion and conclusions
- 17.00-18.00 Reception
Background: SDMT Project
- The SDMT project was intended as basic research in MT
- It didn't receive funding for a large basic research project
- Funding was obtained for Jakob Elming's PhD and the construction of a parallel treebank
Background: First SMV Project
- In connection with the SDMT research project, a one-year project for collaboration between researchers and business (SMV, i.e. small and medium-sized enterprises)
- Asked two questions:
  - Can we build or deploy a state-of-the-art prototype MT system?
  - How close is MT technology to being commercially viable?
Results of the First SMV Project
- We built our own prototype system and obtained a high-quality, publicly available system
- MT technology is now commercially viable: the response from the companies was extremely positive
Questions for the Second SMV Project
- Systematic evaluation: how good is MT technology in practice?
  - Can we further confirm the positive results of the previous SMV project?
  - When is MT good and when is it bad?
- Translators systematically evaluate MT output, sentence by sentence
Machine Translation Evaluation and Reflections
Inter-Set Evaluation of MT Output
- Three categories
  - Perfect
  - Usable: needs post-editing
  - Unusable: easier to translate from scratch
Evaluation Situation
- Research prototype MT system
  - Based on Philipp Koehn's Moses
  - Crude formatting and user interface
- Small amount of data
  - Commercial systems frequently use 10 or even 100 times as much data
Evaluation Results (sentences per category)
- Unusable 57
- Usable 138
- Perfect 52
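Assuming these figures are sentence counts (they sum to 247), a few lines of Python turn them into shares. Usable plus perfect comes to 190 of 247 sentences, about 77%, which is consistent with the "nearly 80" in the overall verdict. The variable names are illustrative only.

```python
# Sentence counts per category, from the Evaluation Results slide.
counts = {"Unusable": 57, "Usable": 138, "Perfect": 52}

total = sum(counts.values())  # 247 evaluated sentences
shares = {category: n / total for category, n in counts.items()}

# Output "with value" for the translator = usable (needs post-editing) + perfect.
with_value = (counts["Usable"] + counts["Perfect"]) / total

for category, share in shares.items():
    print(f"{category:<10} {share:6.1%}")
print(f"{'With value':<10} {with_value:6.1%}")  # 190 / 247, about 76.9%
```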
Overall Verdict
- Nearly 80% of system output has value for the translator
  - It saves time, in the translator's estimation
- Results with a commercial system are likely to be even better
- Reinforces the positive verdict from the previous SMV workshop
Examples: Perfect
This application does not support keypad tones .
Dette program understøtter ikke
tastaturtonerne.
A message can contain a calendar note and a
business card as attachments .
En besked kan indeholde en kalendernote og et
visitkort som vedhæftede filer.
Example: Usable
- Your phone provides many functions that are practical for daily use , such as a text and multimedia messaging , a calendar , a clock , an alarm clock , a radio , a music player , and a built-in camera .
- System Output
- Telefonen indeholder mange funktioner, der er
nyttige i hverdagen, f.eks. en tekst- og
mms-beskeder, en kalender, et ur, en alarm,
radio, en musikafspiller, og et indbygget kamera.
- Corrected
- Telefonen indeholder mange funktioner, der er
nyttige i hverdagen, f.eks. tekst- og
MMS-beskeder, en kalender, et ur, en alarm, en
radio, en musikafspiller og et indbygget kamera.
Example: Usable
- To download e-mail messages that have been sent to your e-mail account , select Retrieve .
- System Output
- For at hente e-mail-beskeder, der er blevet sendt til din e-mail-konto, skal du vælge. hente
- Corrected
- Du kan hente e-mail-beskeder, der er blevet sendt
til din e-mail-konto, ved at vælge Hent
Example: Unusable
- When you finish writing your message , to send the message ,select Send .
- System Output
- Når du slutte skrive din besked, hvis du vil sende beskeden, skal du vælge. sende
- Corrected
- Når du er færdig med at skrive din besked, skal
du vælge Send for at sende beskeden.
Example: Unusable
System Output
Corrected
Questions
- How good are the manual classifications into classes 1-3?
- Can translators correctly predict the usability of system output?
- Can usability be determined automatically?
Automatic Comparisons
- Edit distance: the number of changes needed to turn the system output into the corrected output
- May well correspond to post-editing time, which is what the classification is intended to predict
- (We don't have post-editing times per sentence)
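The slides do not say whether edit distance was counted in characters or in words, nor which variant was used. A minimal sketch, assuming plain Levenshtein distance (insertions, deletions and substitutions each cost 1), works for either unit because it accepts any sequence: pass strings for character-level distance or token lists for word-level distance. The function name `edit_distance` is hypothetical, and the two strings are fragments of the "Retrieve" example's system output and correction.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance: the minimum number of insertions, deletions
    and substitutions (each costing 1) needed to turn a into b.

    Pass strings for character-level distance, or token lists for
    word-level distance.
    """
    # prev[j] = distance between the first i-1 items of a and the first j of b
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning the first i items of a into nothing: i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x -> y (free if equal)
            ))
        prev = curr
    return prev[-1]


# Fragments of the "Retrieve" example: system output vs. corrected output.
system = "skal du vælge. hente"
corrected = "ved at vælge Hent"
print(edit_distance(system, corrected))                  # character level
print(edit_distance(system.split(), corrected.split()))  # word level
```

Character-level and word-level distances can rank sentences differently, so the unit matters when comparing against the manual categories.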
Average Edit Distance
- Unusable 34
- Usable 16
- Perfect 4
Sentence Length
- Unusable 87
- Usable 68
- Perfect 46
A Bias Against Long Sentences?
- SMT does not generally do worse with longer sentences
- The human evaluator should balance sentence length against the number of observed problems, and this is hard to do!
Really Unusable?
System Output
Corrected
- Edit Distance 19
- Normalized 0.14
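Assuming "Normalized" means the edit distance d divided by the sentence length n, in the same (unstated) unit as on the Sentence Length slide, the implied length of this sentence can be backed out:

```latex
\text{normalized} = \frac{d}{n}
\quad\Longrightarrow\quad
n = \frac{d}{\text{normalized}} \approx \frac{19}{0.14} \approx 136
```

Because 0.14 is rounded, n lies roughly between 131 and 141. That is well above the average of 87 for unusable sentences, while 0.14 is below the .23 normalized figure for usable sentences: a long sentence judged unusable despite relatively few changes per unit of length.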
Edit Distance vs Sentence Length
- Unusable .39
- Usable .23
- Perfect .08
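The slide does not say how these figures were computed: as the mean of per-sentence ratios, or as the average edit distance divided by the average sentence length from the two earlier slides. Dividing those averages gives about 0.39, 0.24 and 0.09, close to but not identical with the values above. A minimal sketch computing both variants; `summarize` is a hypothetical helper and the records are toy data, not the evaluation data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (category, edit distance, sentence length).
# These are NOT the workshop's evaluation data.
records = [
    ("Unusable", 30, 60),
    ("Unusable", 10, 40),
    ("Usable", 5, 50),
    ("Perfect", 0, 20),
]


def summarize(rows):
    """Per category: average edit distance, average length, and the
    normalized edit distance computed two ways."""
    by_category = defaultdict(list)
    for category, dist, length in rows:
        by_category[category].append((dist, length))

    summary = {}
    for category, pairs in by_category.items():
        avg_dist = mean(d for d, _ in pairs)
        avg_len = mean(n for _, n in pairs)
        summary[category] = {
            "avg_edit_distance": avg_dist,
            "avg_length": avg_len,
            # Mean of the per-sentence ratios d / n.
            "mean_of_ratios": mean(d / n for d, n in pairs),
            # Ratio of the two category averages.
            "ratio_of_averages": avg_dist / avg_len,
        }
    return summary


for category, stats in summarize(records).items():
    print(category, round(stats["mean_of_ratios"], 3),
          round(stats["ratio_of_averages"], 3))

# Ratio-of-averages check on the averages reported on the slides.
slide_averages = {"Unusable": (34, 87), "Usable": (16, 68), "Perfect": (4, 46)}
for category, (d, n) in slide_averages.items():
    print(category, round(d / n, 2))  # prints 0.39, 0.24, 0.09 in turn
```

On the toy data the two variants already differ for the unusable category (0.375 vs 0.4), because long sentences weigh more heavily in the ratio of averages.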
Summing Up
- The two SMV projects impressively demonstrate the commercial viability of the newest MT technology
- We're a long way from perfect or human-quality MT
- But we have passed the threshold where MT output demonstrably saves time for human translators
Some Predictions
- Steady incremental progress in core MT technology will continue
- More dramatic progress will come from two factors:
  - Expansion of the data available for MT
  - More effective ways of deploying MT