1
NESPOLE! Project Assessment and Evaluation
  • Trento Kickoff Meeting
  • February 21-22, 2000

2
Assessment and Evaluation - Main Issues
  • Evaluation of single components and modules
  • Evaluation of end-to-end paths and the integrated
    system
  • Types of assessment and evaluation:
    • sentence-based
    • task-based
    • user studies

3
Single Component/Site Evaluations
  • Speech Recognition for each source language
  • Analysis into IF
  • Generation from IF
  • Speech Synthesis into each target language
  • Single-language end-to-end evaluations
    (e.g. English → IF → English)

4
Speech Recognition Evaluation
  • Standard evaluation methodology - calculate the word
    error rate (WER) for an unseen test set (see the
    sketch below)
  • Does not take into account that some
    misrecognitions are more harmful than others
  • Alternative method: grade the output of the SR as
    if it were a paraphrase translation
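
As a concrete illustration of the standard methodology in the first bullet, here is a minimal word-level WER computation in Python; the function name and the toy example are illustrative only and are not part of the NESPOLE! tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / number of reference
    words, computed with the usual word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One deletion ("a") and one substitution ("room" -> "zoom")
# over 6 reference words -> WER = 2/6
print(word_error_rate("i would like a double room", "i would like double zoom"))
```

As the second bullet notes, this single number weights all errors equally, which is what motivates the paraphrase-grading alternative.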

5
Evaluation of Analysis into IF
  • IF output of the analysis module compared with
    manually tagged IFs
  • CMU has a matcher utility (a simplified comparison
    is sketched below)
  • Requires manually coding a test corpus
  • Does not easily discriminate between minor and
    major errors in the analysis
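
The slide does not describe how the CMU matcher utility works; the hypothetical sketch below simply compares a system IF against a manually tagged IF as a flat set of matches over the dialogue act and argument values, which also makes visible why such a comparison cannot distinguish minor from major analysis errors. The dictionary representation and the example act/argument names are assumptions for illustration only.

```python
def if_match_score(gold: dict, system: dict) -> float:
    """Hypothetical flat comparison of a system IF against a gold IF.
    Each IF is represented here as {'act': str, 'args': {name: value}};
    every mismatch counts the same, whether trivial or meaning-changing."""
    total, correct = 1, int(system.get("act") == gold.get("act"))
    for name, value in gold.get("args", {}).items():
        total += 1
        if system.get("args", {}).get(name) == value:
            correct += 1
    return correct / total

gold = {"act": "give-information+availability+room",
        "args": {"room-type": "double", "time": "july"}}
system = {"act": "give-information+availability+room",
          "args": {"room-type": "single"}}
print(if_match_score(gold, system))  # act matches, both arguments wrong or missing -> 1/3
```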

6
Evaluation of Generation from IF
  • Generate text output from the manually tagged IFs
    of a test corpus
  • Grade the quality of the generated sentence, or
    grade the output as a paraphrase of the input utterance
  • Again - requires a manually tagged corpus of IFs

7
Single Language End-to-End Evals
  • Analyze from the source language into IF and
    generate back into the same language
  • Grade end-to-end performance at the sentence level
  • Isolate the error set and track down the source of
    each error
  • CMU has a well-developed grading methodology (a
    simple grade-aggregation scheme is sketched below)
  • Evaluations on both transcribed and SR input
  • Easier than single-component evaluations
  • Supports frequent evaluations and the development
    cycle
  • Can be performed by monolingual speakers
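
The grading methodology itself is not spelled out on the slide; assuming a simple three-way sentence grade (perfect / ok / bad), which is one common convention for this kind of evaluation, the aggregation could look like the following sketch. The grade labels and the "acceptable" pooling are assumptions, not the documented CMU scheme.

```python
from collections import Counter

def summarize_grades(grades):
    """Aggregate sentence-level human grades into summary rates.
    Assumes each grade is one of 'perfect', 'ok', 'bad';
    'acceptable' pools perfect and ok."""
    counts = Counter(grades)
    total = len(grades)
    return {
        "perfect": counts["perfect"] / total,
        "acceptable": (counts["perfect"] + counts["ok"]) / total,
        "bad": counts["bad"] / total,
    }

# Toy run over five graded utterances
print(summarize_grades(["perfect", "ok", "bad", "perfect", "ok"]))
```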

8
Complete System and Multi-site Evaluations
  • End-to-end evaluations of translation where the
    target language is not the same as the source -
    combines components from different sites
  • Sites communicate via IF
  • Batch mode - one site analyzes a test set into IF,
    the other site generates from the IFs (workflow
    sketched below)
  • Alternatively - online tests using the C-STAR
    prototype systems and CommSwitch
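
Neither the file format nor the tools used for the batch-mode exchange are specified on the slide; the sketch below only illustrates the workflow, with hypothetical analyzer and generator functions and a JSON file standing in for whatever carrier the sites actually use.

```python
import json

def analysis_site(utterances, analyze, path="testset.if.json"):
    """Site A: analyze a source-language test set into IF and write the
    IFs to disk so they can be shipped to the partner site."""
    ifs = [analyze(u) for u in utterances]
    with open(path, "w") as f:
        json.dump(ifs, f)
    return path

def generation_site(path, generate):
    """Site B: read the exchanged IFs and generate target-language text,
    which is then graded as in the single-language evaluations."""
    with open(path) as f:
        ifs = json.load(f)
    return [generate(if_repr) for if_repr in ifs]
```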

9
Complete System and Multi-site Evaluations
  • Evaluation of multimodal components
  • Evaluation of integrated system, including
    multimodal capabilities
  • User studies

10
Task Based Evaluations
  • In addition to sentence/accuracy-based evaluations
  • Goal is to evaluate the ability of users to
    achieve the task they are trying to perform
  • CMU has been working on developing an appropriate
    task-based evaluation (TBE) methodology (LREC-2000
    paper)
  • Main issues:
    • separating human error from machine error - we
      want to evaluate the MT, not the human
    • appropriate definitions for communicative goals
    • scoring scheme for goals that succeed or fail (a
      possible scheme is sketched below)
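
The scoring scheme is listed as an open issue rather than defined; one possible scheme, which also addresses the human-vs-machine separation by scoring only the communicative goals the user actually attempted, is sketched below. The goal records and field names are illustrative assumptions.

```python
def task_success_rate(goals):
    """Hypothetical task-based score: fraction of attempted communicative
    goals that were achieved. Goals the user never attempted are excluded,
    so purely human omissions are not charged to the MT system."""
    attempted = [g for g in goals if g["attempted"]]
    if not attempted:
        return 0.0
    return sum(g["achieved"] for g in attempted) / len(attempted)

goals = [
    {"name": "book-double-room",   "attempted": True,  "achieved": True},
    {"name": "ask-price",          "attempted": True,  "achieved": False},
    {"name": "ask-breakfast-time", "attempted": False, "achieved": False},
]
print(task_success_rate(goals))  # 0.5 over the two attempted goals
```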