Title: Speech-to-Speech MT: Design and Engineering
1. Speech-to-Speech MT: Design and Engineering
- Alon Lavie and Lori Levin
- MT Class
- April 16, 2001
2. Outline
- Design and engineering of the JANUS speech-to-speech MT system
- The Travel and Medical Domain Interlingua (IF)
- Portability to new domains: ML approaches
- Evaluation and User Studies
- Open Problems, Current and Future Research
3. Overview
- Fundamentals of our approach
- System overview
- Engineering a multi-domain system
- Evaluations and user studies
- Alternative translation approaches
- Current and future research
4. JANUS Speech Translation
- Translation via an interlingua representation
- Main translation engine is rule-based
- Semantic grammars
- Modular grammar design
- System engineered for multiple domains
- Recent focus on domain portability: using machine learning for rapid extension to a new domain
5. The C-STAR Travel Planning Domain
- General Scenario
- Dialogue between one traveler and one or more travel agents
- Focus on making travel arrangements for a personal leisure trip (not business)
- Free spontaneous speech
6. The C-STAR Travel Planning Domain
- Natural breakdown into several sub-domains
- Hotel Information and Reservation
- Transportation Information and Reservation
- Information about Sights and Events
- General Travel Information
- Cross Domain
7. Semantic Grammars
- Describe the structure of semantic concepts instead of the syntactic constituency of phrases
- Well suited for task-oriented dialogue containing many fixed expressions
- Appropriate for spoken language, which is often disfluent and syntactically ill-formed
- Faster to develop reasonable coverage for limited domains
8. Semantic Grammars
- Hotel Reservation Example
- Input: "we have two hotels available"
- Parse Tree:
  give-information+availability+hotel
  ( we have hotel-type
      ( quantity ( two )
        hotel ( hotels ) )
    available )
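The parse tree above can be recreated with a toy top-down matcher over semantic-grammar rules. This is a minimal sketch, not the SOUP implementation; the rule set and concept names are illustrative stand-ins for a real domain grammar.

```python
# Minimal sketch of semantic-grammar parsing: concepts expand to sequences of
# items, where an item is either a literal token or another concept (marked
# with a leading '['). Rules and names here are illustrative, not from JANUS.
RULES = {
    "[quantity]": [["two"], ["three"]],
    "[hotel]": [["hotels"], ["hotel"]],
    "[hotel-type]": [["[quantity]", "[hotel]"]],
    "[availability+hotel]": [["we", "have", "[hotel-type]", "available"]],
}

def match(concept, tokens, pos=0):
    """Naive top-down matching of `concept` starting at tokens[pos].
    Returns (parse_tree, next_position) or None."""
    for expansion in RULES[concept]:
        children, p = [], pos
        for item in expansion:
            if item.startswith("["):          # nonterminal: recurse
                sub = match(item, tokens, p)
                if sub is None:
                    break
                children.append(sub[0])
                p = sub[1]
            elif p < len(tokens) and tokens[p] == item:  # literal token
                children.append(item)
                p += 1
            else:
                break
        else:                                  # whole expansion matched
            return (concept, children), p
    return None

tree, end = match("[availability+hotel]", "we have two hotels available".split())
```

Because the grammar describes semantic concepts rather than syntactic constituents, coverage for a new limited domain is mostly a matter of writing concept rules like these.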
9. The JANUS-III Translation System
10. The JANUS-III Translation System (cont.)
11. The SOUP Parser
- Specifically designed to parse spoken language using domain-specific semantic grammars
- Robust: can skip over disfluencies in the input
- Stochastic: probabilistic CFG encoded as a collection of RTNs with arc probabilities
- Top-Down: parses from the top-level concepts of the grammar down to matching of terminals
- Chart-based: dynamic matrix of parse DAGs indexed by start and end positions and head category
12. The SOUP Parser
- Supports parsing with large multiple-domain grammars
- Produces a lattice of parse analyses headed by top-level concepts
- Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice
- Graphical grammar editor
13. SOUP Disambiguation Heuristics
- Maximize coverage (of the input)
- Minimize the number of parse trees (fragmentation)
- Minimize the number of parse tree nodes
- Minimize the number of wild-card matches
- Maximize the probability of the parse trees
- Find the sequence of domain tags with maximal probability given the input words, P(T|W), where T = t1, t2, ..., tn is a sequence of domain tags
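One simple way to apply heuristics like these in priority order is lexicographic comparison: each heuristic only breaks ties left by the previous one. The sketch below assumes each candidate path through the lattice has already been scored on the listed quantities; the field names are illustrative, and SOUP's actual scoring internals are not shown on the slide.

```python
# Rank candidate paths through a parse lattice by the slide's heuristics,
# applied in priority order via lexicographic tuple comparison.
from dataclasses import dataclass

@dataclass
class Path:
    words_covered: int   # heuristic 1: maximize coverage
    num_trees: int       # heuristic 2: minimize fragmentation
    num_nodes: int       # heuristic 3: minimize parse tree nodes
    wildcards: int       # heuristic 4: minimize wild-card matches
    log_prob: float      # heuristic 5: maximize parse probability

def rank_key(p):
    # Negate the "maximize" fields so that min() prefers larger values;
    # tuple comparison applies each heuristic only as a tie-breaker.
    return (-p.words_covered, p.num_trees, p.num_nodes, p.wildcards, -p.log_prob)

def best_path(paths):
    return min(paths, key=rank_key)
```

For example, a path covering all input words beats a shorter one regardless of probability, and two full-coverage paths are compared on fragmentation next.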
14. JANUS Generation Modules
- Two alternative generation modules
- Top-Down context-free-based generator: fast, used for English and Japanese
- GenKit: unification-based generator augmented with the Morphe morphology module, used for German
15. Modular Grammar Design
- Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain)
- Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices)
- Grammars can be developed independently (using the shared core grammar)
- Shared and Cross-Domain grammars significantly reduce the effort in expanding to new domains
- Separate grammar modules facilitate associating parses with domain tags, useful for multi-domain integration within the parser
16. Translation with Multiple Domain Grammars
- Parser is loaded with all domain grammars
- A domain tag is attached to the grammar rules of each domain
- Previously developed grammars for other domains can also be incorporated
- Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts
- Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
17. Translation with Multiple Domain Grammars
18. A SOUP Parse Lattice
19. Domain Portability: Travel to Medical
- Knowledge-Based Methods: re-usability of knowledge sources for translation and speech recognition
- Corpus-Based Methods: reduce the amount of new training data for translation and speech recognition
20. Background
- New domain: Medical
- Doctor-patient diagnostic conversations
- Global importance in emergencies and in machine translation for remote health care
- Synergy with Lincoln Lab
  - Joint evaluation
  - Joint interlingua
- Test case for portability
21. Portability
- Advantage: Interlingua
- Problem: writing semantic grammars
  - Domain dependent
  - Requires time, effort, and expertise
- Approach:
  - Grammar modularity
  - Domain-action learning
  - Automatic/interactive semantic grammar induction
22. Hybrid Stat/Rule-based Analysis
- Developing large-coverage semantic analysis grammars is time consuming, making it difficult to port the analysis system to new domains
- Low-level argument grammars are more domain-independent: they contain many concepts that are used across domains (time, location, prices, etc.)
- High-level domain-actions are domain-specific and must be redeveloped for each new domain (e.g. give-info+onset+symptom)
- Tagging data sets with interlingua representations is less time consuming, and is needed anyway for system development
23. Hybrid Rule/Stat Approach
- Combines grammar-based and statistical approaches to analysis
- Develop semantic grammars for phrase-level arguments that are more portable to new domains
- Use statistical machine learning techniques for classifying into domain-actions
- Porting to a new domain requires:
  - developing argument parse rules for the new domain
  - tagging a training set with domain-actions for the new domain
  - training the classifiers for domain-actions on the tagged data
24. The Hybrid Analysis Process
- Parse an utterance for arguments
- Segment the utterance into sentences
- Extract features from the utterance and the single best parse output
- Use a learned classifier to identify the speech act
- Use a learned classifier to identify the concept sequence
- Combine into a full parse
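The steps above can be sketched as a small pipeline with pluggable components. All of the function names below are placeholders standing in for the real JANUS modules (the SOUP argument parser, the segmenter, and the two trained classifiers), which are assumptions rather than actual APIs.

```python
# Sketch of the hybrid analysis pipeline: argument parse, segmentation,
# feature extraction, speech-act classification, concept classification,
# and combination into a full parse. Component names are illustrative.

def analyze(utterance, argument_parser, sa_clf, concept_clf, segmenter):
    parse = argument_parser(utterance)          # 1. best argument parse
    results = []
    for sdu in segmenter(utterance, parse):     # 2. split into SDUs
        feats = extract_features(sdu, parse)    # 3. features from words + parse
        speech_act = sa_clf(feats)              # 4. classify speech act
        feats["speech_act"] = speech_act        # predicted SA feeds next step
        concepts = concept_clf(feats)           # 5. classify concept sequence
        # 6. combine: the domain action (speech act + concepts) heads
        #    the phrase-level argument parse
        results.append({"domain_action": [speech_act] + concepts,
                        "arguments": parse})
    return results

def extract_features(sdu, parse):
    # Illustrative features: surface words plus the argument labels
    # found in the best parse (parse = list of (label, text) pairs).
    return {"words": sdu.split(), "args": [lab for lab, _ in parse]}
```

With stub components in place of trained models, the pipeline wires a speech act and concept sequence on top of whatever arguments the parser found.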
25. Argument Parsing
- The SOUP parser produces a forest of parse trees that cover as much of the input as possible
- The parse forest can be a mixture of trees allowed by any of the grammars
- Only the best parse is used for further processing
26. Argument Parse Example
Input: "We have a double room available for you at twenty-three thousand five hundred yen"

availabilityPSD ( we have super_room-type ( room-type ( a roomdouble ( double room ) ) ) available )
arg-partyfor-whomARG ( for you ( you ) )
argtimeARG ( point ( at hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )
argsuper_priceARG ( price ( one-pricemain-quantity ( n-1000 ( thousand ) pricen-100 ( five hundred ) ) currency ( yen ( yen ) ) ) )
27. Automatic Classification of Domain Actions
- Train classifiers for speech acts and concepts
- Training data: utterances labeled with speech act, concepts, and best argument parse
- Input features:
  - n most common words
  - Arguments and pseudo-arguments in the best parse
  - Speaker
  - Predicted speech act (for the concept classifier)
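The feature list above amounts to a fixed-length binary vector per SDU. The sketch below shows one plausible encoding; the vocabulary, argument inventory, and speech-act inventory are illustrative assumptions, not the actual feature set used with TiMBL.

```python
# Turn one SDU into a feature vector following the slide's feature list:
# indicators for the n most common words, for arguments/pseudo-arguments
# in the best parse, for the speaker, and (for the concept classifier only)
# a one-hot encoding of the predicted speech act.
def make_features(words, parse_args, speaker, common_words, arg_labels,
                  sa_inventory=(), predicted_sa=None):
    vec = [int(w in words) for w in common_words]        # common-word indicators
    vec += [int(a in parse_args) for a in arg_labels]    # argument indicators
    vec.append(int(speaker == "agent"))                  # speaker indicator
    if predicted_sa is not None:                         # SA one-hot, used only
        vec += [int(predicted_sa == sa) for sa in sa_inventory]  # by concept clf
    return vec
```

The same extractor serves both classifiers; the speech-act classifier simply omits the final one-hot block, since its own output is what that block encodes.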
28. Full Parse Example
Input: "We have a double room available for you at twenty-three thousand five hundred yen"

give-information+availability+room
( availabilityPSD ( we have super_room-type ( room-type ( a roomdouble ( double room ) ) ) available )
  arg-partyfor-whomARG ( for you ( you ) )
  argtimeARG ( point ( at hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )
  argsuper_priceARG ( price ( one-pricemain-quantity ( n-1000 ( thousand ) pricen-100 ( five hundred ) ) currency ( yen ( yen ) ) ) ) )
29. Classification Results Using Memory-Based (TiMBL) Classifiers
30. Status and Open Research
- Preliminary analysis engine implemented, currently used for the travel domain in NESPOLE!
- Areas for further research and development:
  - Explore a variety of classifiers
  - Explore features for domain-action classification
  - Classification compositionality: how to classify the components of the domain-action separately and combine them?
  - Taking advantage of additional knowledge sources: the interlingua specification, dialogue context
  - Better address segmentation of utterances into DAs
31. Automatic Induction of Semantic Grammars
- The seed grammar for a new domain has very limited coverage
- A corpus of development data tagged with interlingua representations is available
- Expand the seed grammar by learning new rules for covering the same domain-actions
- First step: how well can we do with no human intervention?
32. Outline of Semantic Grammar Induction
[Flowchart with components: Seed Grammar, Parser, IF, Tree Matching, Linearization, Hypotheses Generation, Rules Management, Rules Induction Knowledge, Learned Grammar]
Example rule: sgionsetsym ( manner sym-loc became adjsym-name )
33. Human vs. Machine Experiment
- Seed grammar
- Extended by a human
- Extended by automatic semantic grammar induction
34. Seed Grammar
- Medical (e.g. "I have a burning sensation in my foot."): around 200 rules
- Cross Domain (e.g. "Hello. My name is Sam."): around 600 rules and growing
- Shared: around 100 rules and 6000 lexical items
35. A Parse Tree
request-information+existence+body-stateMED
( WH-PHRASESXDM ( qdurationXDM ( durquestionXDM ( how long ) ) )
  HAVE-GET-FEELMED ( GET ( have ) )
  you
  HAVE-GET-FEELMED ( HAS ( had ) )
  super_body-state-specMED ( body-state-specMED
    ( ID-WHOSEMED ( identifiability ( idnon-distant ( this ) ) )
      BODY-STATEMED ( painMED ( pain ) ) ) ) )
36. Manual Grammar Development
- About five additional days of development after the seed grammar was finalized
- Focused on medical rules only
- Domain-independent rules remained untouched
37. Development and Evaluation Sets
- Development set: 133 sentences
  - from one dialog
- Evaluation set: 83 sentences
  - from two dialogs
  - unseen speakers
- Only SDUs that could be manually tagged with a full IF according to the current specification were included.
38. Grading Procedure: Recall and Precision of IF Components
- c:give-information: speech act
- existence+body-state: concepts
- (body-state-spec=(pain,: top-level argument
- identifiability=no),: sub-argument
- body-location=: top-level argument
- (inside=head)): sub-argument
- Recall: ignored if the number of items is 0
- Precision: ignored if 0 out of 0
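Scoring per-component recall and precision with the two "ignored if 0" rules can be sketched as follows. The component representation (flat strings in a set) is a simplification of the nested IF structure on the slide.

```python
# Grade a hypothesis IF against a reference IF over their components
# (speech act, concepts, top-level arguments, sub-arguments), skipping
# recall when the reference has no items and precision when the
# hypothesis produced nothing - the slide's "ignored if 0" rules.
def pr(reference, hypothesis):
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    recall = correct / len(ref) if ref else None       # ignored if 0 items
    precision = correct / len(hyp) if hyp else None    # ignored if 0 out of 0
    return precision, recall
```

For instance, a hypothesis that recovers the speech act and concepts but drops an argument scores perfect precision and partial recall.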
39. Human vs. Machine Evaluation Results
40. User Studies
- We conducted three sets of user tests
- Travel agent played by an experienced system user
- Traveler played by a novice given five minutes of instruction
- Traveler given a general scenario, e.g. plan a trip to Heidelberg
- Communication only via the ST system, with a multi-modal interface and a muted video connection
- Data collected was used for system evaluation, error analysis, and then grammar development
41. System Evaluation Methodology
- End-to-end evaluations conducted at the SDU (sentence) level
- Multiple bilingual graders compare the input with the translated output and assign a grade of Perfect, OK, or Bad
- OK: the meaning of the SDU comes across
- Perfect: OK plus fluent output
- Bad: translation incomplete or incorrect
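Aggregating the SDU grades is a one-liner per figure; evaluations of this kind typically report Perfect alone and "acceptable" (Perfect + OK) together, which is an assumption about how the result slides summarize the grades.

```python
# Summarize a list of SDU-level grades ("Perfect" / "OK" / "Bad") into
# the fractions typically reported: perfect, and acceptable = Perfect + OK.
def acceptability(grades):
    n = len(grades)
    perfect = sum(g == "Perfect" for g in grades) / n
    ok = sum(g == "OK" for g in grades) / n
    return {"perfect": perfect, "acceptable": perfect + ok}
```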
42. August-99 Evaluation
- Data from the latest user study: a traveler planning a trip to Japan
- 132 utterances containing one or more SDUs, from six different users
- SR word error rate: 14.7%
- 40.2% of utterances contain recognition error(s)
43. Evaluation Results
44. Evaluation: Progress Over Time
45. Current and Future Work
- Expanding the interlingua: covering descriptive as well as task-oriented sentences
- Developing the new portable approaches
- Development of the server-based architecture for supporting multiple applications
- NESPOLE!: speech-MT for advanced e-commerce
- C-STAR: speech-to-speech MT over mobile phones
- LingWear: MT and language assistance on wearable devices
46. Students Working on the Project
- Chad Langley: Hybrid Rule/Stat Analysis, Speech-MT architecture
- Ben Han: Automatic Grammar Induction
- Alicia Tribble: Interlingua and grammar development for the Medical Domain
- Joy Zhang, Erik Peterson: Chinese EBMT for LingWear
47. The JANUS Speech-MT Team
- Project Leaders: Lori Levin, Alon Lavie, Tanja Schultz, Alex Waibel
- Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Alicia Tribble, Chad Langley, Ben Han, Celine Morel, Susie Burger, Vicky MacLaren, Kornel Laskowski, Erik Peterson