UIMA-based MEMT: Multi-Engine Machine Translation

Provided by: AlonL
Learn more at: http://www.cs.cmu.edu

1
UIMA-based MEMT: Multi-Engine Machine Translation
  • Faculty
    • Alon Lavie, Jaime Carbonell, Eric Nyberg
  • Students and Staff
    • Gregory Hanneman, Justin Merrill
    • (Shyamsundar Jayaraman, Satanjeev Banerjee)

2
MEMT Goals and Approach
  • Scientific Challenge
    • How to combine the output of multiple MT engines
      into a synthetic output that outperforms the
      originals in translation quality
    • Synthetic combination of the output from the
      original systems, NOT just selecting the best
      system
  • Engineering Challenge
    • How to integrate multiple distributed translation
      engines and the MEMT combination engine in a
      common framework that supports ongoing
      development and evaluation

3
Synthetic Combination MEMT
  • Approach
    • Original MT engines are treated as black boxes:
      each provides a single best translation
    • Explicitly identify and align the words that are
      common between any pair of translations
    • Use the alignments as reinforcement and as
      indicators of possible locations for the words in
      the combined output
    • Each engine has a weight that is used for the
      words that it contributes
    • Decoder searches for an optimal synthetic
      combination of words and phrases that optimizes a
      scoring function that combines the alignment
      weights and an LM score
  • See details in the EAMT-05 and ACL-05 papers
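
The scoring idea above (engine weights for contributed words, combined with an LM score) can be illustrated with a minimal sketch. This is not the scoring function from the EAMT-05 or ACL-05 papers: the function names, the engine weights, the `lm_weight` value, and the uniform toy language model are all assumptions made for illustration.

```python
import math

def lm_logprob(words, vocab_size=10_000):
    """Toy stand-in for a language model: uniform unigram log-probability."""
    return -len(words) * math.log(vocab_size)

def hypothesis_score(words, support, engine_weights, lm_weight=0.1):
    """Score a candidate word sequence (hypothetical formulation).

    support maps each word to the set of engines whose output contains it;
    every supporting engine adds its weight, so agreement is rewarded.
    """
    alignment_score = sum(
        engine_weights[engine]
        for word in words
        for engine in support.get(word, ())
    )
    return alignment_score + lm_weight * lm_logprob(words)

# Assumed weights; support sets follow the three system outputs on slide 4.
engine_weights = {"sys1": 0.3, "sys2": 0.4, "sys3": 0.3}
support = {
    "north": {"sys2", "sys3"},
    "korea": {"sys1", "sys2", "sys3"},
    "stands": {"sys1"},
}

agreed = hypothesis_score(["north", "korea"], support, engine_weights)
lone = hypothesis_score(["korea", "stands"], support, engine_weights)
print(agreed > lone)
```

With equal-length candidates the toy LM term cancels, so the candidate whose words are backed by more engine weight wins. A real decoder would search over many partial hypotheses instead of scoring two fixed ones.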

4
Example
  • Sys1: korea stands ready to allow visits to
    verify that it does not manufacture nuclear
    weapons (0.7407)
  • Sys2: North Korea Is Prepared to Allow
    Washington to Verify that It Does Not Make
    Nuclear Weapons (0.8007)
  • Sys3: North Korea prepared to allow Washington
    to the verification of that is to manufacture
    nuclear weapons (0.7668)
  • Selected MEMT Sentence
    • north korea is prepared to allow washington to
      verify that it does not manufacture nuclear
      weapons . (0.8894)
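
To make the word-alignment step concrete, the sketch below aligns the words shared by the Sys1 and Sys2 outputs from this example. It uses Python's standard `difflib.SequenceMatcher` as a stand-in matcher, which is an assumption: it only finds identical, non-crossing matches after lowercasing, and is not the matcher used in the MEMT system. The function name `align_common_words` is hypothetical.

```python
from difflib import SequenceMatcher

def align_common_words(hyp_a, hyp_b):
    """Return (index_in_a, index_in_b) pairs for words common to both outputs.

    Stand-in technique: case-insensitive exact matching via difflib,
    which yields a monotone (non-crossing) alignment.
    """
    a = hyp_a.lower().split()
    b = hyp_b.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return a, b, pairs

sys1 = ("korea stands ready to allow visits to verify that it "
        "does not manufacture nuclear weapons")
sys2 = ("North Korea Is Prepared to Allow Washington to Verify that It "
        "Does Not Make Nuclear Weapons")

words1, words2, pairs = align_common_words(sys1, sys2)
for i, j in pairs:
    print(f"{words1[i]:>10}  sys1[{i}] <-> sys2[{j}]")
```

Words that end up unaligned here ("stands", "ready", "visits", "manufacture" in Sys1) are the places where the engines disagree, which is where a combination decoder has to choose between alternatives.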

5
Architecture and Engineering
  • Challenge: How do we construct an effective
    architecture for running MEMT within large-scale
    distributed projects?
  • Example: GALE Project
    • Multiple MT engines running at different
      locations
    • Input may be text or output of speech
      recognizers; output may go downstream to other
      applications (IE, Summarization, TDT, etc.)
  • Approach: Use IBM's UIMA (Unstructured
    Information Management Architecture)
    • Provides support for building robust processing
      workflows with heterogeneous components
    • Components designed as annotators at the
      character level within documents
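
Character-level annotation means that a component never rewrites the document: it attaches labeled spans identified by begin and end character offsets. The sketch below illustrates the idea in plain Python. It is not the UIMA API; the `Annotation` class, its field names, the sample text, and the engine label "X" are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """A stand-off annotation: a labeled character span plus features."""
    begin: int
    end: int
    label: str
    features: dict = field(default_factory=dict)

    def covered_text(self, document: str) -> str:
        """Return the document text that this annotation spans."""
        return document[self.begin:self.end]

document = "First sentence. Second sentence."
first_end = document.index(".") + 1

annotations = [
    Annotation(0, first_end, "Sentence"),
    Annotation(first_end + 1, len(document), "Sentence"),
    # A hypothetical translation annotation attached to the first sentence.
    Annotation(0, first_end, "Translation",
               {"engine": "X", "text": "<translation of sentence 1>"}),
]

for ann in annotations:
    print(ann.label, repr(ann.covered_text(document)), ann.features)
```

Because the source text is left untouched, several components can annotate the same span independently, which is what allows outputs from different MT engines to sit side by side on one sentence.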

6
UIMA-based MEMT
  • MT engines and MEMT engine are set up as
    distributed servers
    • Communication over socket connections
    • Sentence-by-sentence translation
    • Java wrapper interfaces convert these into
      UIMA-style annotator components
  • Integrate components with UIMA-based workflows
    • Can chain together multiple components
  • Decomposing tasks: construct separate UIMA-based
    workflows for a variety of independent tasks,
    with results stored in a common Annotations
    Database (ADB)
    • Translation workflows
    • MEMT workflow
    • Evaluation/scoring workflow
  • ADB and ADB Collection Reader/Consumer components
    developed at CMU by Eric Nyberg's group
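
The server setup described above can be sketched as a client that sends one sentence at a time over a socket and reads one translation back. The slides do not specify the wire protocol, so the newline-delimited UTF-8 exchange here is an assumption, and the toy engine simply uppercases its input as a placeholder for real translation. The sketch is in Python for brevity, although the actual wrappers were written in Java.

```python
import socket
import socketserver
import threading

class ToyEngineHandler(socketserver.StreamRequestHandler):
    """Placeholder MT engine: one sentence per line in, one line out."""
    def handle(self):
        for raw in self.rfile:
            sentence = raw.decode("utf-8").rstrip("\n")
            reply = sentence.upper()  # stand-in for real translation
            self.wfile.write((reply + "\n").encode("utf-8"))

def start_toy_engine():
    """Start the toy engine on a free local port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), ToyEngineHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def translate_document(address, sentences):
    """Send sentences one by one and collect the engine's replies in order."""
    results = []
    with socket.create_connection(address) as conn:
        with conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
            for sentence in sentences:
                stream.write(sentence + "\n")
                stream.flush()
                results.append(stream.readline().rstrip("\n"))
    return results

if __name__ == "__main__":
    engine = start_toy_engine()
    print(translate_document(engine.server_address, ["hello world", "second sentence"]))
    engine.shutdown()
    engine.server_close()
```

A wrapper of this kind hides the network exchange behind a single function call, which is the role the Java wrapper interfaces play when they expose each engine as an annotator component.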

7
UIMA-based MEMT Examples
  • Translation Workflow
    • Retrieve document from ADB
    • Annotate document with translation annotator X
    • Write back new annotation into ADB

8
UIMA-based MEMT Examples
  • MEMT Workflow
    • Retrieve document translation annotations labeled
      by X, Y, Z from ADB
    • Annotate the document with a new MEMT
      annotation
    • Write back MEMT annotation into ADB
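
The translation and MEMT workflows from the two example slides share one pattern: retrieve from the ADB, annotate, write back. A minimal sketch follows, using an in-memory dictionary in place of the real Annotations Database. `ToyADB`, both workflow functions, and the stub engines are hypothetical; in particular, the `combine` stub that picks the longest hypothesis is only a placeholder and is not the synthetic combination that MEMT performs.

```python
class ToyADB:
    """In-memory stand-in for the Annotations Database (ADB)."""
    def __init__(self):
        self.documents = {}    # doc_id -> source text
        self.annotations = {}  # doc_id -> {label: annotation text}

    def add_document(self, doc_id, text):
        self.documents[doc_id] = text
        self.annotations[doc_id] = {}

def translation_workflow(adb, doc_id, label, translate):
    """Retrieve the document, annotate it with one engine, write back."""
    source = adb.documents[doc_id]
    adb.annotations[doc_id][label] = translate(source)

def memt_workflow(adb, doc_id, labels, combine, out_label="MEMT"):
    """Retrieve the labeled translations, combine them, write back."""
    hypotheses = [adb.annotations[doc_id][label] for label in labels]
    adb.annotations[doc_id][out_label] = combine(hypotheses)

adb = ToyADB()
adb.add_document("doc1", "<source sentence>")

# Stub engines X, Y, Z; real engines would be remote translation servers.
engines = {
    "X": lambda text: "translation from x",
    "Y": lambda text: "a longer translation from y",
    "Z": lambda text: "from z",
}
for label, translate in engines.items():
    translation_workflow(adb, "doc1", label, translate)

# Placeholder combiner (assumption): picks the longest hypothesis.
memt_workflow(adb, "doc1", ["X", "Y", "Z"], lambda hyps: max(hyps, key=len))
print(adb.annotations["doc1"])
```

Because each workflow only reads from and writes to the shared store, the translation runs, the combination step, and any later scoring step can be executed independently and at different times.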

9
Development and Evaluation
  • Initial development tests performed on TIDES 2003
    Arabic-to-English MT data, using IBM, ISI and CMU
    SMT system output
  • Further development tests performed on
    Arabic-to-English EBMT Apptek and SYSTRAN system
    output and on three Chinese-to-English COTS
    systems

10
Experimental Results: Arabic-to-English

11
Conclusions and Open Research Issues
  • New sentence-level MEMT approach with promising
    performance
  • Easy to run on both research and COTS systems
  • UIMA-based architecture design for effective
    integration in large distributed systems/projects
    → GALE
  • Main Open Research Issues
    • Improvements to the underlying algorithm: better
      word alignments, artificial word alignments
    • Confidence scores at the sentence or word level
    • Decoding is still suboptimal
    • Oracle scores show there is much room for
      improvement
    • Need for additional discriminant features
    • Extend approach to Multi-Engine SR combination
    • Engineering issues: synchronization,
      human-friendly interfaces with workflows

12
References
  • 2005, Jayaraman, S. and A. Lavie. "Multi-Engine
    Machine Translation Guided by Explicit Word
    Matching". In Companion Volume of Proceedings of
    the 43rd Annual Meeting of the Association for
    Computational Linguistics (ACL-2005), Ann Arbor,
    Michigan, June 2005.
  • 2005, Jayaraman, S. and A. Lavie. "Multi-Engine
    Machine Translation Guided by Explicit Word
    Matching". In Proceedings of the 10th Annual
    Conference of the European Association for
    Machine Translation (EAMT-2005), Budapest,
    Hungary, May 2005.