1
NESPOLE! Project Assessment and Evaluation
  • Trento Kickoff Meeting
  • February 21-22, 2000

2
Assessment and Evaluation - Main Issues
  • Evaluation of single components and modules
  • Evaluation of end-to-end paths and the integrated
    system
  • Types of assessment and evaluation:
    • sentence-based
    • task-based
    • user studies

3
Single Component/Site Evaluations
  • Speech Recognition for each source language
  • Analysis into IF
  • Generation from IF
  • Speech Synthesis into each target language
  • Single-language end-to-end evaluations
    (e.g. English → IF → English)

4
Speech Recognition Evaluation
  • Standard evaluation methodology - calculate the word
    error rate (WER) for an unseen test set (see the
    sketch below)
  • Does not take into account that some
    misrecognitions are more harmful than others
  • Alternative method: grade the output of the SR as
    if it were a paraphrase translation
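
As a concrete illustration of the standard methodology in the first bullet, here is a minimal word-level WER computation in Python; the function name and the toy example are illustrative only and are not part of the NESPOLE! tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / number of reference
    words, computed with the usual word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One deletion ("a") and one substitution ("room" -> "zoom")
# over 6 reference words -> WER = 2/6
print(word_error_rate("i would like a double room", "i would like double zoom"))
```

As the second bullet notes, this single number weights all errors equally, which is what motivates the paraphrase-grading alternative.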

5
Evaluation of Analysis into IF
  • IF output of the analysis module compared with
    manually tagged IFs
  • CMU has a matcher utility (a simplified comparison
    is sketched below)
  • Requires manually coding a test corpus
  • Does not easily discriminate between minor and
    major errors in the analysis
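
The slide does not describe how the CMU matcher utility works; the hypothetical sketch below simply compares a system IF against a manually tagged IF as a flat set of matches over the dialogue act and argument values, which also makes visible why such a comparison cannot distinguish minor from major analysis errors. The dictionary representation and the example act/argument names are assumptions for illustration only.

```python
def if_match_score(gold: dict, system: dict) -> float:
    """Hypothetical flat comparison of a system IF against a gold IF.
    Each IF is represented here as {'act': str, 'args': {name: value}};
    every mismatch counts the same, whether trivial or meaning-changing."""
    total, correct = 1, int(system.get("act") == gold.get("act"))
    for name, value in gold.get("args", {}).items():
        total += 1
        if system.get("args", {}).get(name) == value:
            correct += 1
    return correct / total

gold = {"act": "give-information+availability+room",
        "args": {"room-type": "double", "time": "july"}}
system = {"act": "give-information+availability+room",
          "args": {"room-type": "single"}}
print(if_match_score(gold, system))  # act matches, both arguments wrong or missing -> 1/3
```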

6
Evaluation of Generation from IF
  • Generate text output from the manually tagged IFs
    of a test corpus
  • Grade the quality of the generated sentence, or
    grade the output as a paraphrase of the input utterance
  • Again - requires a manually tagged corpus of IFs

7
Single Language End-to-End Evals
  • Analyze from the source language into IF and
    generate back into the same language
  • Grade end-to-end performance at the sentence level
  • Isolate the error set and track down the source of
    each error
  • CMU has a well-developed grading methodology (a
    simple grade-aggregation scheme is sketched below)
  • Evaluations on both transcribed and SR input
  • Easier than single-component evaluations
  • Supports frequent evaluations and the development
    cycle
  • Can be performed by monolingual speakers
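
The grading methodology itself is not spelled out on the slide; assuming a simple three-way sentence grade (perfect / ok / bad), which is one common convention for this kind of evaluation, the aggregation could look like the following sketch. The grade labels and the "acceptable" pooling are assumptions, not the documented CMU scheme.

```python
from collections import Counter

def summarize_grades(grades):
    """Aggregate sentence-level human grades into summary rates.
    Assumes each grade is one of 'perfect', 'ok', 'bad';
    'acceptable' pools perfect and ok."""
    counts = Counter(grades)
    total = len(grades)
    return {
        "perfect": counts["perfect"] / total,
        "acceptable": (counts["perfect"] + counts["ok"]) / total,
        "bad": counts["bad"] / total,
    }

# Toy run over five graded utterances
print(summarize_grades(["perfect", "ok", "bad", "perfect", "ok"]))
```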

8
Complete System and Multi-site Evaluations
  • End-to-end evaluations of translation where the
    target language is not the same as the source -
    combines components from different sites
  • Sites communicate via IF
  • Batch mode - one site analyzes a test set into IF,
    the other site generates from the IFs (workflow
    sketched below)
  • Alternatively - online tests using the C-STAR
    prototype systems and CommSwitch
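
Neither the file format nor the tools used for the batch-mode exchange are specified on the slide; the sketch below only illustrates the workflow, with hypothetical analyzer and generator functions and a JSON file standing in for whatever carrier the sites actually use.

```python
import json

def analysis_site(utterances, analyze, path="testset.if.json"):
    """Site A: analyze a source-language test set into IF and write the
    IFs to disk so they can be shipped to the partner site."""
    ifs = [analyze(u) for u in utterances]
    with open(path, "w") as f:
        json.dump(ifs, f)
    return path

def generation_site(path, generate):
    """Site B: read the exchanged IFs and generate target-language text,
    which is then graded as in the single-language evaluations."""
    with open(path) as f:
        ifs = json.load(f)
    return [generate(if_repr) for if_repr in ifs]
```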

9
Complete System and Multi-site Evaluations
  • Evaluation of multimodal components
  • Evaluation of integrated system, including
    multimodal capabilities
  • User studies

10
Task Based Evaluations
  • In addition to sentence/accuracy-based evaluations
  • Goal is to evaluate the ability of users to
    achieve the task they are trying to perform
  • CMU has been working on developing an appropriate
    task-based evaluation (TBE) methodology (LREC-2000
    paper)
  • Main issues:
    • separating human error from machine error - we
      want to evaluate the MT, not the human
    • appropriate definitions for communicative goals
    • scoring scheme for goals that succeed or fail (a
      possible scheme is sketched below)
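
The scoring scheme is listed as an open issue rather than defined; one possible scheme, which also addresses the human-vs-machine separation by scoring only the communicative goals the user actually attempted, is sketched below. The goal records and field names are illustrative assumptions.

```python
def task_success_rate(goals):
    """Hypothetical task-based score: fraction of attempted communicative
    goals that were achieved. Goals the user never attempted are excluded,
    so purely human omissions are not charged to the MT system."""
    attempted = [g for g in goals if g["attempted"]]
    if not attempted:
        return 0.0
    return sum(g["achieved"] for g in attempted) / len(attempted)

goals = [
    {"name": "book-double-room",   "attempted": True,  "achieved": True},
    {"name": "ask-price",          "attempted": True,  "achieved": False},
    {"name": "ask-breakfast-time", "attempted": False, "achieved": False},
]
print(task_success_rate(goals))  # 0.5 over the two attempted goals
```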