Title: Speech-to-Speech MT: C-STAR/Nespole!/LingWear
1Speech-to-Speech MT: C-STAR/Nespole!/LingWear
- Lori Levin, Alon Lavie, Alex Waibel,
- Bob Frederking, Tanja Schultz
- LTI Immigration Course
- August 24, 2001
2Outline
- Problems in Speech-to-Speech MT
- The JANUS Approach
- The Task-oriented Interlingua (IF)
- System Design and Engineering
- The C-STAR, Nespole!, and LingWear Projects
- Open Problems, Current and Future Research
3Issues in Speech Translation
- Spoken dialogue is very different from written text
- different linguistically: syntax, constructions
- contains unique phenomena: repairs, hesitations, filled pauses
- Speech translation requires specialized approaches
- robust analysis
- focus on communicative goals and semantics rather than syntax
4Our Speech Translation Approach
- Translation via a task-oriented interlingua representation
- Focus on large, well-defined domains
- Robust analysis approaches
- Semantic grammars
- Modular grammar design
- Incorporate alternative translation engines
5The Travel Planning Domain
- General Scenario
- Dialogue between one traveler and a travel service provider (agent, hotel clerk, etc.)
- Task-oriented: the goal is to obtain information, or to reserve or purchase services related to travel
- Free spontaneous speech
6The Travel Planning Domain
- Natural breakdown into several sub-domains
- Hotel Information and Reservation
- Transportation Information and Reservation
- Information about Sights and Events
- General Travel Information
- Cross Domain
7Semantic Grammars
- Describe the structure of semantic concepts instead of the syntactic constituency of phrases
- Well suited for task-oriented dialogue containing many fixed expressions
- Appropriate for spoken language, which is often disfluent and syntactically ill-formed
- Faster to develop reasonable coverage for limited domains
8Semantic Grammars
- Hotel Reservation Example
- Input: we have two hotels available
- Parse tree:
    give-information+availability+hotel
      (we have hotel-type
        (quantity (two)
         hotel (hotels))
       available)
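The hotel example above can be reproduced with a toy semantic grammar and a naive top-down matcher. This is only a minimal sketch: the dictionary grammar format, the bracket convention for marking concepts, the extra alternatives (`one`, `three`, `hotel`), and the function names are assumptions made for illustration, not the grammar formalism or parser used in the project.

```python
# Toy semantic grammar: bracketed names are concepts (nonterminals),
# everything else is a literal word. The format is hypothetical.
GRAMMAR = {
    "[give-information+availability+hotel]": [
        ["we", "have", "[hotel-type]", "available"],
    ],
    "[hotel-type]": [["[quantity]", "[hotel]"], ["[hotel]"]],
    "[quantity]": [["one"], ["two"], ["three"]],
    "[hotel]": [["hotel"], ["hotels"]],
}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` matches words[pos:]."""
    if symbol not in GRAMMAR:  # literal word
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for alternative in GRAMMAR[symbol]:
        yield from match_sequence(symbol, alternative, words, pos)


def match_sequence(head, symbols, words, pos):
    """Match a right-hand side left to right, keeping all partial matches."""
    partial = [([], pos)]
    for sym in symbols:
        partial = [
            (children + [tree], nxt)
            for children, cur in partial
            for tree, nxt in parse(sym, words, cur)
        ]
    for children, end in partial:
        yield (head, children), end


def parse_utterance(text, top="[give-information+availability+hotel]"):
    """Return the first tree that covers the whole utterance, or None."""
    words = text.lower().split()
    for tree, end in parse(top, words, 0):
        if end == len(words):
            return tree
    return None


tree = parse_utterance("we have two hotels available")
print(tree)
```

The rules name concepts such as quantity and hotel rather than syntactic categories such as noun phrase, so the parse tree is already close to a meaning representation for the utterance.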
9HLT Server Architecture
10HLT Server Architecture
11Rule-based Translation Approach
12The SOUP Parser
- Specifically designed to parse spoken language using domain-specific semantic grammars
- Robust - can skip over disfluencies in input
- Stochastic - probabilistic CFG encoded as a collection of RTNs with arc probabilities
- Top-Down - parses from top-level concepts of the grammar down to matching of terminals
- Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category
13The SOUP Parser
- Supports parsing with large multiple-domain grammars
- Produces a lattice of parse analyses headed by top-level concepts
- Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice
- Graphical grammar editor
14SOUP Disambiguation Heuristics
- Maximize coverage (of input)
- Minimize number of parse trees (fragmentation)
- Minimize number of parse tree nodes
- Minimize the number of wild-card matches
- Maximize the probability of parse trees
- Find the sequence of domain tags with maximal probability given the input words, $P(T \mid W)$, where $T = t_1, t_2, \ldots, t_n$ is a sequence of domain tags
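One way to picture how the first five heuristics above could interact is a lexicographic ranking of candidate analyses. The slide does not say how the criteria are weighted or combined, so the strict priority order used here, the `Analysis` record, its field names, and the example numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """Hypothetical summary statistics for one path through a parse lattice."""
    label: str
    covered_words: int   # input words covered by the analysis
    num_trees: int       # number of parse trees (fragmentation)
    num_nodes: int       # total parse tree nodes
    num_wildcards: int   # wild-card matches
    log_prob: float      # log probability of the parse trees


def rank_key(a):
    # Tuples compare element by element, so earlier criteria dominate.
    # Quantities to be maximized are negated so that "smaller is better".
    return (-a.covered_words, a.num_trees, a.num_nodes,
            a.num_wildcards, -a.log_prob)


def select_best(analyses):
    """Pick the analysis that wins under the assumed priority order."""
    return min(analyses, key=rank_key)


candidates = [
    Analysis("fragmented", 5, 3, 9, 0, -4.0),
    Analysis("single-tree", 5, 1, 7, 0, -6.5),
    Analysis("partial", 4, 1, 5, 0, -2.0),
]
best = select_best(candidates)
print(best.label)
```

In this example the two full-coverage analyses beat the partial one, and the less fragmented of the two is selected even though its probability is lower. A weighted combination of the criteria would be an equally plausible reading of the slide.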
15Generation Modules
- Two alternative generation modules
- GenKit - unification-based generator augmented with the Morphe morphology module - used for German
- Top-Down context-free based generator - fast, used for English and Japanese
16Translation with Multiple Domain Grammars
17A SOUP Parse Lattice
18Hybrid Stat/Rule-based Analysis
- Developing large-coverage semantic analysis grammars is time consuming, which makes it difficult to port the analysis system to new domains
- Low-level argument grammars are more domain-independent: they contain many concepts that are used across domains (time, location, prices, etc.)
- High-level domain-actions are domain-specific and must be redeveloped for each new domain (e.g. give-info+onset+symptom)
- Tagging data sets with interlingua representations is less time consuming, and is needed anyway for system development
19Hybrid Rule/Stat Approach
- Combines grammar-based and statistical approaches to analysis
- Develop semantic grammars for phrase-level arguments that are more portable to new domains
- Use statistical machine learning techniques for classifying into domain-actions
- Porting to a new domain requires:
- developing argument parse rules for the new domain
- tagging a training set with domain-actions for the new domain
- training the classifiers for domain-actions on the tagged data
20The Hybrid Analysis Process
- Parse an utterance for arguments
- Segment the utterance into sentences
- Extract features from the utterance and the single best parse output
- Use a learned classifier to identify the speech act
- Use a learned classifier to identify the concept sequence
- Combine into a full parse
21Automatic Classification of Domain Actions
- Train classifiers for speech acts and concepts
- Training data: utterances labeled with speech act, concepts, and best argument parse
- Input features:
- n most common words
- Arguments and pseudo-arguments in best parse
- Speaker
- Predicted speech act (for concept classifier)
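The feature set above can be sketched with a very small memory-based (nearest-neighbour) classifier for speech acts. This is a simplified stand-in written in plain Python, not the TiMBL toolkit or its API; the training utterances, argument labels, overlap distance, and function names are invented for illustration.

```python
from collections import Counter


def top_words(utterances, n):
    """Return the n most common words in the training utterances."""
    counts = Counter(w for u in utterances for w in u.lower().split())
    return [w for w, _ in counts.most_common(n)]


def featurize(utterance, arguments, speaker, vocab):
    """Build features: word presence, arguments in the best parse, speaker."""
    words = set(utterance.lower().split())
    feats = {f"word={w}": (w in words) for w in vocab}
    feats.update({f"arg={a}": True for a in arguments})
    feats["speaker"] = speaker
    return feats


def overlap_distance(a, b):
    """Count the features on which two instances disagree."""
    keys = set(a) | set(b)
    return sum(1 for k in keys if a.get(k, False) != b.get(k, False))


def classify(instance, memory, k=1):
    """Majority vote among the k stored examples closest to the instance."""
    nearest = sorted(memory, key=lambda ex: overlap_distance(instance, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy training set: (utterance, arguments, speaker, speech act).
TRAIN = [
    ("we have a double room available", ["availability", "room-type"],
     "agent", "give-information"),
    ("do you have a single room", ["room-type"],
     "client", "request-information"),
    ("i would like to reserve a room", ["room-type"],
     "client", "request-action"),
    ("the price is ninety dollars", ["price"],
     "agent", "give-information"),
]

vocab = top_words([t[0] for t in TRAIN], n=10)
memory = [(featurize(u, args, spk, vocab), label) for u, args, spk, label in TRAIN]

query = featurize("we have two single rooms available",
                  ["availability", "room-type"], "agent", vocab)
predicted = classify(query, memory)
print(predicted)
```

A concept classifier could be built the same way, with the predicted speech act added as one more feature, as the last bullet above suggests.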
22Argument Parse Example
We have a double room available for you at twenty-three thousand five hundred yen

availabilityPSD ( we have super_room-type ( room-type ( a roomdouble ( double room ) ) ) available )
arg-partyfor-whomARG ( for you ( you ) )
argtimeARG ( point ( at hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )
argsuper_priceARG ( price ( one-pricemain-quantity ( n-1000 ( thousand ) pricen-100 ( five hundred ) ) currency ( yen ( yen ) ) ) )
23Full Parse Example
We have a double room available for you at twenty-three thousand five hundred yen

give-information+availability+room (
  availabilityPSD ( we have super_room-type ( room-type ( a roomdouble ( double room ) ) ) available )
  arg-partyfor-whomARG ( for you ( you ) )
  argtimeARG ( point ( at hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )
  argsuper_priceARG ( price ( one-pricemain-quantity ( n-1000 ( thousand ) pricen-100 ( five hundred ) ) currency ( yen ( yen ) ) ) )
)
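The final "combine" step of the hybrid analysis can be sketched as string assembly: the classified speech act and concept sequence form the domain action, which then wraps the argument parses. The function names are hypothetical, and the sketch assumes that the domain action joins its parts with `+`. The two argument parses are copied from the example as they appear in the slide text.

```python
def combine(speech_act, concepts, argument_parses):
    """Wrap the argument parses under a domain action (speech act + concepts)."""
    domain_action = "+".join([speech_act, *concepts])
    return f"{domain_action} ( {' '.join(argument_parses)} )"


def is_balanced(text):
    """Check that every parenthesis in the bracketed parse is matched."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


argument_parses = [
    "arg-partyfor-whomARG ( for you ( you ) )",
    "argtimeARG ( point ( at hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )",
]
full_parse = combine("give-information", ["availability", "room"], argument_parses)
print(full_parse)
print(is_balanced(full_parse))
```

Only the speech act and the concept sequence come from the classifiers. The argument parses are reused unchanged from the grammar-based stage, which is what makes the argument grammars portable across domains.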
24Classification Results Using Memory-based (TiMBL) Classifiers
25Alternative Approaches: MEMT
- Glossary-based Translation
- Translates directly into target language (no IF)
- Based on the Pangloss translation system developed at CMU
- Uses a combination of EBMT, phrase glossaries and a bilingual dictionary
- Good fall-back for uncovered utterances
26C-STAR-III
- Partners: ATR, CMU, CLIPS, ETRI, IRST, UKA
- Main Research Goals:
- Expandability - towards unlimited domains
- Accessibility - speech translation over wireless phone
- Usability - real service for real users
27Nespole!
- Speech-to-speech translation for eCommerce
- CMU, Karlsruhe, IRST, CLIPS, 2 commercial partners
- Improved limited-domain speech translation
- Experiment with multimodality and with MEMT
- EU-side has strict scheduling and deliverables
- First test domain: Italian travel agency
- Second showcase: international help desk
- Tied in to C-STAR-III
28LingWear for the Information Warrior
- New Ideas
- The pre-development of appropriate interlingua representations for domains of interest facilitates generation into a new language within two weeks.
- The development of new MT engines (e.g. learnable transfer rules) and improved multi-engine integration supports rapid deployment of MT for a new language with scarce resources.
- Gisting and summarization in the source language followed by MT is better than vice versa.
- Impact
- Allow military and relief organizations to converse in limited domains of interest with the local population in an area of conflict and/or disaster
- Allow military and other operatives in the field to assimilate foreign-language information they encounter on the move
- Rapidly port and deploy the technology into new languages with scarce resources
- Schedule
- Port to second language
- Baseline summarizer ready
- Baseline MT systems ready
- Port to third language
Carnegie Mellon University School of Computer Science: A. Waibel, L. Levin, A. Lavie, R. Frederking
29Domain Portability: Travel to Medical
- Knowledge-Based Methods: re-usability of knowledge sources for translation and speech recognition
- Corpus-Based Methods: reduce the amount of new training data for translation and speech recognition
30Portability
- Advantage: Interlingua
- Problem: Writing semantic grammars
- Domain dependent
- Requires time, effort, and expertise
- Approach:
- Grammar modularity
- Domain action learning
- Automatic/Interactive semantic grammar induction
31Automatic Induction of Semantic Grammars
- Seed grammar for a new domain has very limited coverage
- Corpus of development data tagged with interlingua representations is available
- Expand the seed grammar by learning new rules for covering the same domain-actions
- First step: how well can we do with no human intervention?
32System Evaluation Methodology
- End-to-end evaluations conducted at the SDU (sentence) level
- Multiple bilingual graders compare the input with the translated output and assign a grade of Perfect, OK, or Bad
- OK: meaning of the SDU comes across
- Perfect: OK and fluent output
- Bad: translation incomplete or incorrect
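The grading scheme above can be turned into summary percentages with a few lines of code. The slide does not state how the judgments of multiple graders are combined, so pooling all judgments is an assumption here, as are the "Acceptable" label for Perfect plus OK and the made-up grades in the example.

```python
from collections import Counter

GRADES = ("Perfect", "OK", "Bad")


def summarize(grades_by_grader):
    """Pool all graders' SDU-level grades and report percentages per grade."""
    pooled = [g for grades in grades_by_grader.values() for g in grades]
    counts = Counter(pooled)
    total = len(pooled)
    pct = {g: 100.0 * counts[g] / total for g in GRADES}
    # Assumed summary figure: the meaning came across, fluent or not.
    pct["Acceptable"] = pct["Perfect"] + pct["OK"]
    return pct


# Invented grades for four SDUs from two hypothetical graders.
grades = {
    "grader_1": ["Perfect", "OK", "Bad", "Perfect"],
    "grader_2": ["OK", "OK", "Bad", "Perfect"],
}
summary = summarize(grades)
print(summary)
```

Averaging per grader before combining, or taking a majority vote per SDU, would be alternative ways to aggregate, and could give different figures when graders disagree.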
33C-STAR 1999 Evaluation Results
34Evaluation - Progress Over Time
35Current and Future Research
- Expanding the domains of coverage
- Machine learning-based approaches to analysis: hybrid rule/stat analysis approach, grammar induction
- Multiple interfaces: web, phone, PDAs
- Integration of multiple MT approaches into a MEMT system
- Disambiguation: improved sentence-level disambiguation, applying discourse contextual information for disambiguation
36Students Working on the Project
- Chad Langley: Hybrid Rule/Stat analyzer
- Benjamin Han: Grammar Induction
- Stan Jou: Phone interfaces and recognizer
- Alicia Tribble: Language portability
- Kornel Laskowski: H323 Speech Recognizer
37The C-STAR/Nespole!/LingWear Team
- Project Leaders: Lori Levin, Alon Lavie, Alex Waibel, Bob Frederking, Tanja Schultz
- Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Chad Langley, Benjamin Han, Alicia Tribble, Kornel Laskowski, Stan Jou, Celine Morel, Susie Burger