A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System


1
A Multi-Perspective Evaluation of the NESPOLE!
Speech-to-Speech Translation System
  • Alon Lavie, Carnegie Mellon University
  • Florian Metze, University of Karlsruhe
  • Roldano Cattoni, ITC-irst
  • Erica Costantini, University of Trieste

2
Outline
  • The NESPOLE! Project
  • Approach and System Architecture
  • Performance and Usability Evaluation
  • Distributed real-time performance over internet
  • Integration and use of multi-modal capabilities
  • End-to-end Translation performance
  • Lessons learned and future work

3
The NESPOLE! Project
  • Speech-to-speech translation for E-Commerce applications
  • Partners: CMU, Univ of Karlsruhe, ITC-irst, UJF-CLIPS, AETHRA, APT-Trentino
  • Builds on successful collaboration within C-STAR
  • Improved limited-domain speech translation
  • Experiment with multimodality and with multi-engine MT (MEMT)
  • Showcase-1: Travel and Tourism in Trentino, completed in Nov-2001, demonstrated at IST and HLT
  • Showcase-2: expanded travel domain plus medical service

4
Speech-to-speech in E-commerce
  • Augment current passive web E-commerce with live
    interaction capabilities
  • Client starts via web, can easily connect to
    agent for specific detailed information
  • Thin client: very little special hardware or software on the client PC (browser, MS NetMeeting, shared Whiteboard)

5
NESPOLE! User Interfaces
6
NESPOLE! Architecture
7
Distributed S2S Translation over the Internet
8
Outline
  • The NESPOLE! Project
  • Approach and System Architecture
  • Performance and Usability Evaluation
  • Distributed real-time performance over internet
  • Integration and use of multi-modal capabilities
  • End-to-end Translation performance
  • Lessons learned and future work

9
Distributed Translation over the Internet
  • Distributed architecture: voice-over-IP performance is highly dependent on network bandwidth conditions
  • Two types of Internet connections: UDP vs. TCP
  • End users to Mediator: H.323/UDP, speech (G.711) + text, 56 Kb/sec; packets received out of order are discarded (sketched below)
  • Mediator to HLT servers: TCP, speech + text, 56 Kb/sec; no packet loss, but possible delays
  • Communication between HLT servers: IF (text), 1 Kb/sec, also TCP; no packet loss
  • To what extent is speech recognition performance affected by the physical location of system modules and Internet bandwidth conditions?
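
A minimal sketch of the out-of-order discard behavior described above, assuming RTP-style monotonically increasing sequence numbers; the function and variable names are illustrative and are not taken from the NESPOLE! implementation.

```python
# Illustrative sketch (not NESPOLE! code): an audio receiver that, like the
# H.323/UDP path described above, simply discards packets arriving out of
# order instead of reordering them. Sequence numbers are assumed to be
# monotonically increasing integers.

def receive_stream(packets):
    """packets: iterable of (seq_no, payload) in arrival order."""
    last_seq = -1
    for seq_no, payload in packets:
        if seq_no <= last_seq:
            # Late or duplicate packet: drop it (heard as a gap in the audio).
            continue
        last_seq = seq_no
        yield payload

# Example: packet 2 arrives after packet 3 and is discarded.
if __name__ == "__main__":
    arrivals = [(1, b"a"), (3, b"c"), (2, b"b"), (4, b"d")]
    print(list(receive_stream(arrivals)))  # [b'a', b'c', b'd']
```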

10
Network Traffic Impact Experiment
  • Client station in either US (CMU) or Germany
    (UKA), Mediator in Italy (irst)
  • German HLT server at UKA
  • One hour of high-quality DAT recording of German
    speech piped into system input, setup otherwise
    as with real subjects
  • 16 tests run at various times of day and days of
    week to capture variety of real-life network
    conditions
  • Goal: correlate packet loss with word error rates (WERs); see the WER sketch below
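
For reference, word error rate can be computed by Levenshtein alignment of the recognizer output against a reference transcript. The sketch below is a generic illustration, not the scoring code used in this experiment.

```python
# Generic WER sketch: word error rate = edit distance between reference and
# hypothesis word sequences, divided by the number of reference words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one deleted word out of four reference words -> 0.25
print(wer("ich moechte ein hotel", "ich moechte hotel"))
```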

11
Network Traffic Impact - Results
  • Average WER: 39.6% (28.8% on a clean 16 kHz recording)
  • Packet loss of 0.1-5.2% results in WERs of 37-41%
  • One case of 21% packet loss: 49.7% WER, very difficult to understand
  • Two cases of complete breakdown of connections
  • No severe performance degradation under normal network conditions
  • No clear correlation between packet loss and WER in the range of up to 5% packet loss (see the correlation sketch below)
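
One simple way to test for the correlation mentioned above is a Pearson coefficient over the per-run measurements. The sketch below is illustrative only; the numbers in it are placeholders, not the actual results of the 16 test runs.

```python
# Illustrative Pearson correlation between per-run packet loss and WER.
import statistics

def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

packet_loss = [0.1, 0.8, 1.5, 2.9, 5.2]       # percent, placeholder values
wers        = [38.1, 40.2, 37.5, 39.0, 41.0]  # percent, placeholder values
print(pearson(packet_loss, wers))
```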

12
Outline
  • The NESPOLE! Project
  • Approach and System Architecture
  • Performance and Usability Evaluation
  • Distributed real-time performance over internet
  • Integration and use of multi-modal capabilities
  • End-to-end Translation performance
  • Lessons learned and future work

13
Effects of Multimodality on Communication
  • Multimodal capabilities: sharing of web pages and graphical information via the Whiteboard; gesture annotations on shared images - mostly spatial: indicate/point/select, draw connections between points (a hypothetical annotation structure is sketched below)
  • To what extent do multimodal gesture capabilities enhance communication?
  • increase task completion rate
  • reduce mis-communication and disfluencies
  • support faster recovery from translation errors
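
A hypothetical sketch of how a spatial gesture annotation on a shared image might be represented as a structured message between client and agent. This is an assumption for illustration only; it is not the actual AeThra Whiteboard protocol or data format.

```python
# Hypothetical representation of a spatial gesture annotation on a shared
# image (illustrative only; not the NESPOLE!/AeThra Whiteboard format).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GestureAnnotation:
    image_id: str                  # shared image the gesture refers to
    kind: str                      # "select_area", "point", or "connect"
    points: List[Tuple[int, int]]  # pixel coordinates on the image
    author: str = "agent"          # per the results, 98% of gestures came from agents

# Example: the agent selects an area of a shared map of the Trentino region.
select_area = GestureAnnotation(
    image_id="trentino_map",
    kind="select_area",
    points=[(120, 80), (240, 80), (240, 160), (120, 160)],
)
print(select_area)
```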

14
Aethra Whiteboard
15
Experimental Setup and Design
  • Well-defined task of obtaining winter tourist information: select a resort and hotel that satisfies a list of geographical/fiscal constraints
  • Real users: novice clients, semi-real agents
  • Head-mounted microphone, push-to-talk, original audio disabled
  • Two conditions:
  • Speech Only (SO): translated speech + shared images
  • Multimodal (MM): the above + gestures by either party
  • 28 collected dialogues: 14 E/I, 14 G/I; 7 dialogues in each condition for each language pair
  • Detailed transcriptions (Costantini et al., LREC-2002), including gesture annotations, marking of repeated turns, and tagging of successful/unsuccessful/partly successful turns

16
MM Experiment Results
  • Speech characteristics not statistically different between the two conditions
  • avg. dialogue duration, number of turns, task completion rate (86%)
  • Gestures in the MM condition:
  • avg. 7.6 per dialogue; 98% performed by agents; mostly area selections (61%); usually preceded by a dialogue contribution (79%), usually a verbal cue; almost no deictics
  • Significant differences between conditions in terms of repeated and (un)successful turns
[Results table by language pair: E/I and G/I]
17
Outline
  • The NESPOLE! Project
  • Approach and System Architecture
  • Performance and Usability Evaluation
  • Distributed real-time performance over internet
  • Integration and use of multi-modal capabilities
  • End-to-end Translation performance
  • Lessons learned and future work

18
End-to-End Evaluation Setup
  • ENG/ITA, GER/ITA, FR/ITA
  • 4 unseen dialogues for each language pair: 2 from winter vacations, 2 from summer resorts; 2 collected monolingually, 2 bilingually
  • Monolingual and cross-lingual evaluations
  • to Italian on the client side, from Italian on the agent side
  • 3-4 human graders per language pair
  • Accuracy-based evaluation at the Semantic Dialogue Unit (SDU) level: one grader segmented the data, all graders used that segmentation
  • Three-point grading scheme: perfect/ok/bad (an accuracy aggregation sketch follows below)
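
A minimal sketch of turning per-SDU grades into an accuracy figure. It assumes that both "perfect" and "ok" count as acceptable; the slide does not spell out this aggregation, so treat the sketch as illustrative rather than the paper's exact scoring.

```python
# Illustrative aggregation of per-SDU grades (perfect/ok/bad) into accuracy,
# assuming "perfect" and "ok" both count as acceptable.
from collections import Counter

def accuracy(grades):
    """grades: list of 'perfect' / 'ok' / 'bad', one per SDU for one grader."""
    counts = Counter(grades)
    acceptable = counts["perfect"] + counts["ok"]
    return acceptable / len(grades)

# Example with hypothetical grades for 8 SDUs -> 0.75
print(accuracy(["perfect", "ok", "bad", "ok", "perfect", "bad", "ok", "perfect"]))
```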

19
Evaluation Results
20
Lessons Learned
  • Network bandwidth is a concern, but performance effects are reasonable under normal packet-loss conditions
  • Gestures significantly enhance communication
    effectiveness
  • End-to-end accuracy is not yet impressive, but
    task completion is much higher

21
Future Work and Directions
  • Network bandwidth: desire to add video and to support standard 56 Kb/sec modems; thus the need to move to G.723 (6.3 Kb/sec)
  • Improved user interface design, particularly translation feedback
  • Get rid of push-to-talk
  • Improved evaluation methods and metrics:
  • three-way vs. binary grading
  • averaging over graders vs. majority votes
  • inter-grader agreement (see the kappa sketch below)
  • automatic metrics, e.g. BLEU?
  • Evaluate domain portability - medical service for tourists
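
Inter-grader agreement could be measured, for example, with Cohen's kappa over two graders' SDU-level labels. The slides do not name a specific agreement measure, so the sketch below is only one plausible choice, with hypothetical grades.

```python
# Illustrative Cohen's kappa for two graders labeling the same SDUs
# with perfect/ok/bad (one plausible agreement measure, not necessarily
# the one the project would adopt).
from collections import Counter

def cohens_kappa(grades_a, grades_b):
    n = len(grades_a)
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    ca, cb = Counter(grades_a), Counter(grades_b)
    labels = set(ca) | set(cb)
    expected = sum((ca[l] / n) * (cb[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Example with hypothetical grades from two graders -> 0.5
a = ["perfect", "ok", "bad", "ok", "perfect", "bad"]
b = ["perfect", "bad", "bad", "ok", "ok", "bad"]
print(round(cohens_kappa(a, b), 2))
```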

22
NESPOLE! Monitor