Speech-to-Speech MT: C-STAR/Nespole!/LingWear

Provided by: AlonL

1
Speech-to-Speech MT: C-STAR/Nespole!/LingWear

Lori Levin, Alon Lavie, Alex Waibel,
Bob Frederking, Tanja Schultz
LTI Immigration Course
August 24, 2001

2
Outline

Problems in Speech-to-Speech MT
The JANUS Approach
The Task-oriented Interlingua (IF)
System Design and Engineering
The C-STAR, Nespole!, and LingWear Projects
Open Problems, Current and Future Research

3
Issues in Speech Translation

Spoken dialogue is very different from written text:
  different linguistically: syntax, constructions
  contains unique phenomena: repairs, hesitations, filled pauses
Speech translation requires specialized approaches:
  robust analysis
  focus on communicative goals and semantics rather than syntax

4
Our Speech Translation Approach

Translation via a task-oriented interlingua
representation
Focus on large, well-defined domains
Robust analysis approaches
Semantic grammars
Modular grammar design
Incorporate alternative translation engines

5
The Travel Planning Domain

General Scenario
Dialogue between one traveler and a travel
service provider (agent, hotel clerk, etc.)
Task-oriented: the goal is to obtain information, or to
reserve or purchase services related to travel
Free spontaneous speech

6
The Travel Planning Domain

Natural breakdown into several sub-domains:
Hotel Information and Reservation
Transportation Information and Reservation
Information about Sights and Events
General Travel Information
Cross Domain

7
Semantic Grammars

Describe structure of semantic concepts instead
of syntactic constituency of phrases
Well suited for task-oriented dialogue containing
many fixed expressions
Appropriate for spoken language - often disfluent
and syntactically ill-formed
Faster to develop reasonable coverage for limited
domains

8
Semantic Grammars

Hotel Reservation Example
Input: we have two hotels available
Parse Tree:
[give-information+availability+hotel]
  (we have [hotel-type]
    ([quantity] (two)
     [hotel] (hotels))
   available)
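
A minimal sketch of how a semantic grammar of this kind could be matched against the hotel input. The concept names come from the slide; writing them in square brackets to separate them from words, the rule bodies, and the `parse` function are assumptions made for illustration. This is not the JANUS grammar formalism or its parser.

```python
# Toy semantic grammar: nonterminals name semantic concepts rather than
# syntactic categories. Bracketed symbols are concepts, bare symbols are words.
# The rule bodies are hypothetical.
GRAMMAR = {
    "[give-information+availability+hotel]": [
        ["we", "have", "[hotel-type]", "available"],
    ],
    "[hotel-type]": [["[quantity]", "[hotel]"]],
    "[quantity]": [["one"], ["two"], ["three"]],
    "[hotel]": [["hotel"], ["hotels"]],
}


def parse(symbol, words, pos=0):
    """Return (tree, next_pos) for the first matching rule, or None."""
    if not symbol.startswith("["):
        # Terminal: must equal the word at the current position.
        if pos < len(words) and words[pos] == symbol:
            return symbol, pos + 1
        return None
    for rhs in GRAMMAR[symbol]:
        children, cur = [], pos
        for item in rhs:
            result = parse(item, words, cur)
            if result is None:
                break  # this alternative fails, try the next one
            child, cur = result
            children.append(child)
        else:
            return (symbol, children), cur
    return None


tree, end = parse("[give-information+availability+hotel]",
                  "we have two hotels available".split())
print(tree)
```

The sketch takes the first alternative that matches and does not backtrack, which is enough for this single fixed expression but not for a realistic grammar.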

9
HLT Server Architecture
10
HLT Server Architecture
11
Rule-based Translation Approach
12
The SOUP Parser

Specifically designed to parse spoken language
using domain-specific semantic grammars
Robust - can skip over disfluencies in input
Stochastic - probabilistic CFG encoded as a
collection of RTNs with arc probabilities
Top-Down - parses from top-level concepts of the
grammar down to matching of terminals
Chart-based - dynamic matrix of parse DAGs
indexed by start and end positions and head category
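
The robustness property (skipping over disfluencies) can be illustrated with a small sketch. It is a greedy phrase matcher, not SOUP's chart-based RTN algorithm; the `PATTERNS` table and the `robust_scan` function are hypothetical names introduced only for this example.

```python
# Hypothetical phrase patterns keyed by concept; not taken from a real grammar.
PATTERNS = {
    "[availability]": [("we", "have")],
    "[room-type]": [("a", "double", "room"), ("a", "single", "room")],
}


def robust_scan(words, patterns):
    """Greedy left-to-right matching that skips words no pattern can start on.

    Returns (matches, skipped): matches is a list of (concept, phrase) pairs,
    skipped is the list of words that were passed over.
    """
    matches, skipped, i = [], [], 0
    while i < len(words):
        best = None
        for concept, phrases in patterns.items():
            for phrase in phrases:
                n = len(phrase)
                longer = best is None or n > len(best[1])
                if tuple(words[i:i + n]) == phrase and longer:
                    best = (concept, phrase)
        if best is not None:
            matches.append(best)
            i += len(best[1])
        else:
            skipped.append(words[i])  # e.g. a filled pause such as "uh"
            i += 1
    return matches, skipped


matches, skipped = robust_scan("uh we have um a double room".split(), PATTERNS)
print(matches, skipped)
```

Here the filled pauses are skipped and the two phrases on either side of them are still recognized, which is the behavior the "Robust" bullet describes.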

13
The SOUP Parser

Supports parsing with large multiple domain
grammars
Produces a lattice of parse analyses headed by
top-level concepts
Disambiguation heuristics rank the analyses in
the parse lattice and select a single best path
through the lattice
Graphical grammar editor

14
SOUP Disambiguation Heuristics

Maximize coverage (of input)
Minimize number of parse trees (fragmentation)
Minimize number of parse tree nodes
Minimize the number of wild-card matches
Maximize the probability of parse trees
Find sequence of domain tags with maximal
probability given the input words: P(T | W), where
T = t1, t2, ..., tn is a sequence of domain tags
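
One way to read the heuristics above is as an ordered list of tie-breakers over candidate analyses. The sketch below makes that assumption explicit; the slide does not say how SOUP actually combines the criteria, and the `Analysis` fields and function names are hypothetical. The domain-tag model P(T | W) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """Summary statistics for one candidate path through the parse lattice."""
    words_covered: int   # number of input words covered
    num_trees: int       # fragmentation: number of parse trees on the path
    num_nodes: int       # total parse tree nodes
    num_wildcards: int   # wild-card matches used
    log_prob: float      # log probability of the parse trees


def rank_key(a):
    # Assumption: criteria are applied in the listed order, each one only
    # breaking ties left by the previous one (lexicographic comparison).
    return (-a.words_covered, a.num_trees, a.num_nodes,
            a.num_wildcards, -a.log_prob)


def best_analysis(analyses):
    """Pick the candidate that wins under the assumed lexicographic order."""
    return min(analyses, key=rank_key)


candidates = [
    Analysis(words_covered=5, num_trees=2, num_nodes=9, num_wildcards=0, log_prob=-4.0),
    Analysis(words_covered=5, num_trees=1, num_nodes=11, num_wildcards=1, log_prob=-6.5),
    Analysis(words_covered=4, num_trees=1, num_nodes=6, num_wildcards=0, log_prob=-2.0),
]
print(best_analysis(candidates))
```

With these made-up candidates, the second one wins: it ties on coverage with the first and is less fragmented, even though it has more nodes and a lower probability.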

15
Generation Modules

Two alternative generation modules
GenKit - unification-based generator augmented
with Morphe morphology module - used for German
Top-Down context-free based generator - fast,
used for English and Japanese

16
Translation with Multiple Domain Grammars
17
A SOUP Parse Lattice
18
Hybrid Stat/Rule-based Analysis

Developing large coverage semantic analysis
grammars is time consuming, so it is difficult to port
the analysis system to new domains
Low-level argument grammars are more
domain-independent: they contain many concepts that
are used across domains (time, location, prices,
etc.)
High-level domain-actions are domain-specific and
must be redeveloped for each new domain,
e.g. give-info+onset+symptom
Tagging data sets with interlingua
representations is less time consuming, and is needed
anyway for system development

19
Hybrid Rule/Stat Approach

Combines grammar-based and statistical approaches
to analysis
Develop semantic grammars for phrase-level
arguments that are more portable to new domains
Use statistical machine learning techniques for
classifying into domain-actions
Porting to a new domain requires:
  developing argument parse rules for the new domain
  tagging a training set with domain-actions for the new domain
  training the classifiers for domain-actions on the tagged data

20
The Hybrid Analysis Process

Parse an utterance for arguments
Segment the utterance into sentences
Extract features from the utterance and the
single best parse output
Use a learned classifier to identify the speech
act
Use a learned classifier to identify the concept
sequence
Combine into a full parse

21
Automatic Classification of Domain Actions

Train classifiers for speech acts and concepts
Training data: utterances labeled with speech
act, concepts, and best argument parse
Input features:
  n most common words
  Arguments and pseudo-arguments in best parse
  Speaker
  Predicted speech act (for concept classifier)
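
A minimal sketch of a speech-act classifier built on the input features listed above, assuming a plain nearest-neighbor (memory-based) learner with an overlap metric. It does not use TiMBL and leaves out feature weighting; all function names, the argument inventory, the training examples, and the `request-action` label are hypothetical.

```python
from collections import Counter


def most_common_words(utterances, n):
    """Vocabulary feature set: the n most frequent words in the training data."""
    counts = Counter(w for u in utterances for w in u.split())
    return [w for w, _ in counts.most_common(n)]


def extract_features(utterance, arg_labels, speaker, vocab, arg_inventory):
    """Binary word features, binary argument features, then the speaker."""
    words = set(utterance.split())
    return (tuple(w in words for w in vocab)
            + tuple(a in arg_labels for a in arg_inventory)
            + (speaker,))


def overlap(x, y):
    """Number of feature positions on which two instances agree."""
    return sum(a == b for a, b in zip(x, y))


def classify(memory, x, k=1):
    """Majority label among the k stored instances most similar to x."""
    nearest = sorted(memory, key=lambda ex: overlap(ex[0], x), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


VOCAB = ["have", "want", "room"]             # stand-in for the n most common words
ARGS = ["room-type", "price", "time"]        # hypothetical argument inventory
TRAIN = [
    ("we have a double room", {"room-type"}, "agent", "give-information"),
    ("i want a room", {"room-type"}, "client", "request-action"),
    ("that costs fifty dollars", {"price"}, "agent", "give-information"),
]
memory = [(extract_features(u, a, s, VOCAB, ARGS), label)
          for u, a, s, label in TRAIN]

query = extract_features("i want a single room", {"room-type"}, "client", VOCAB, ARGS)
print(classify(memory, query))
```

For the concept classifier, the predicted speech act would be appended as one more feature, following the last bullet above.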

22
Argument Parse Example
Input: We have a double room available for you at twenty-three thousand five hundred yen

availabilityPSD ( we have
  super_room-type ( room-type ( a roomdouble ( double room ) ) )
  available )
arg-partyfor-whomARG ( for you ( you ) )
argtimeARG ( point ( at hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )
argsuper_priceARG ( price (
  one-pricemain-quantity ( n-1000 ( thousand ) pricen-100 ( five hundred ) )
  currency ( yen ( yen ) ) ) )
23
Full Parse Example
Input: We have a double room available for you at twenty-three thousand five hundred yen

give-information+availability+room (
  availabilityPSD ( we have
    super_room-type ( room-type ( a roomdouble ( double room ) ) )
    available )
  arg-partyfor-whomARG ( for you ( you ) )
  argtimeARG ( point ( at hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )
  argsuper_priceARG ( price (
    one-pricemain-quantity ( n-1000 ( thousand ) pricen-100 ( five hundred ) )
    currency ( yen ( yen ) ) ) )
)
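
The final combination step can be sketched as string assembly: the predicted speech act and concept sequence form the domain action, which then heads the argument parse. This assumes the interlingua convention of joining the speech act and concepts with "+"; the function names are hypothetical and the argument parse is abbreviated.

```python
def domain_action(speech_act, concepts):
    """Join the speech act and the concept sequence into one domain action."""
    return "+".join([speech_act, *concepts])


def full_parse(speech_act, concepts, argument_parse):
    """Wrap an argument parse (given as a string) under its domain action."""
    return f"{domain_action(speech_act, concepts)} ( {argument_parse} )"


# Outputs of the two classifiers for the example utterance.
speech_act = "give-information"
concepts = ["availability", "room"]

# Abbreviated stand-in for the argument parse shown on the previous slide.
args = "arg-partyfor-whomARG ( for you ( you ) )"

print(full_parse(speech_act, concepts, args))
```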
24
Classification Results Using Memory-based (TiMBL)
Classifiers
25
Alternative Approaches: MEMT

Glossary-based Translation
Translates directly into target language (no IF)
Based on Pangloss translation system developed at
CMU
Uses a combination of EBMT, phrase glossaries and
a bilingual dictionary
Good fall-back for uncovered utterances

26
C-STAR-III

Partners: ATR, CMU, CLIPS, ETRI, IRST, UKA
Main Research Goals:
  Expandability - towards unlimited domains
  Accessibility - speech translation over wireless phone
  Usability - real service for real users

27
Nespole!

Speech-to-speech translation for eCommerce
Partners: CMU, Karlsruhe, IRST, CLIPS, and 2 commercial partners
Improved limited-domain speech translation
Experiment with multimodality and with MEMT
EU-side has strict scheduling and deliverables
First test domain: Italian travel agency
Second showcase: international help desk
Tied in to C-STAR-III

28
LingWear for the Information Warrior

New Ideas
The pre-development of appropriate interlingua
representations for domains of interest
facilitates generation into a new language within
two weeks.
The development of new MT engines (e.g.
learnable transfer rules) and improved
multi-engine integration supports rapid
deployment of MT for a new language with scarce
resources.
Gisting and summarization in the source language
followed by MT is better than vice versa.

Impact
Allow military and relief organizations to
converse in limited domains of interest with the
local population in an area of conflict and/or
disaster
Allow military and other operatives in the field
to assimilate foreign language information they
encounter on-the-move
Rapidly port and deploy the technology into new
languages with scarce resources

Schedule
Port to second language
Baseline summarizer ready
Baseline MT systems ready
Port to third language
Carnegie Mellon University, School of Computer
Science: A. Waibel, L. Levin, A. Lavie, R.
Frederking
29
Domain Portability: Travel to Medical

Knowledge-Based Methods: re-usability of knowledge
sources for translation and speech recognition
Corpus-Based Methods: reduce the amount of new
training data for translation and speech
recognition
30
Portability

Advantage: Interlingua
Problem: Writing semantic grammars
  Domain dependent
  Requires time, effort, and expertise
Approach:
  Grammar modularity
  Domain action learning
  Automatic/Interactive semantic grammar induction

31
Automatic Induction of Semantic Grammars

Seed grammar for a new domain has very limited
coverage
Corpus of development data tagged with
interlingua representations is available
Expand the seed grammar by learning new rules for
covering the same domain-actions
First step: how well can we do with no human
intervention?

32
System Evaluation Methodology

End-to-end evaluations conducted at the SDU
(sentence) level
Multiple bilingual graders compare the input with
translated output and assign a grade of Perfect,
OK or Bad
OK: meaning of SDU comes across
Perfect: OK + fluent output
Bad: translation incomplete or incorrect
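
A small sketch of how grades from several graders might be turned into summary figures. The slide does not state how grades are aggregated, so the majority vote, the tie-breaking toward the worse grade, and the "acceptable = Perfect or OK" rate are all assumptions, as are the function names and the sample grades.

```python
from collections import Counter

GRADE_ORDER = ["Bad", "OK", "Perfect"]  # worst to best


def majority_grade(grades):
    """Most frequent grade for one SDU; ties go to the worse grade (assumed)."""
    counts = Counter(grades)
    top = max(counts.values())
    return next(g for g in GRADE_ORDER if counts.get(g, 0) == top)


def summarize(sdu_grades):
    """Fractions of SDUs graded Perfect and graded acceptable (Perfect or OK).

    sdu_grades must be a non-empty list with one list of grades per SDU.
    """
    final = [majority_grade(g) for g in sdu_grades]
    n = len(final)
    return {
        "perfect": final.count("Perfect") / n,
        "acceptable": sum(g in ("Perfect", "OK") for g in final) / n,
    }


# Made-up grades from three graders for four SDUs.
grades = [
    ["Perfect", "Perfect", "OK"],
    ["OK", "Bad", "OK"],
    ["Bad", "Bad", "Perfect"],
    ["Perfect", "OK", "Bad"],
]
print(summarize(grades))
```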

33
C-STAR 1999 Evaluation Results
34
Evaluation - Progress Over Time
35
Current and Future Research

Expanding the domains of coverage
Machine Learning-based approaches to analysis:
hybrid rule/stat analysis approach, grammar
induction
Multiple interfaces: web, phone, PDAs
Integration of multiple MT approaches into a MEMT
system
Disambiguation: improved sentence-level
disambiguation; applying discourse contextual
information for disambiguation

36
Students Working on the Project

Chad Langley: Hybrid Rule/Stat analyzer
Benjamin Han: Grammar Induction
Stan Jou: Phone interfaces and recognizer
Alicia Tribble: Language portability
Kornel Laskowski: H323 Speech Recognizer

37
The C-STAR/Nespole!/LingWear Team

Project Leaders: Lori Levin, Alon Lavie, Alex
Waibel, Bob Frederking, Tanja Schultz
Grammar and Component Developers: Donna
Gates, Dorcas Wallace, Kay Peterson, Chad
Langley, Benjamin Han, Alicia Tribble, Kornel
Laskowski, Stan Jou, Celine Morel, Susie Burger
