Title: Speech-to-Speech MT: Design and Engineering
1. Speech-to-Speech MT: Design and Engineering
- Alon Lavie and Lori Levin
- MT Class
- April 16, 2001
2. Outline
- Design and engineering of the JANUS speech-to-speech MT system
- The Travel and Medical Domain Interlingua (IF)
- Portability to new domains: ML approaches
- Evaluation and User Studies
- Open Problems, Current and Future Research
3. Overview
- Fundamentals of our approach
- System overview
- Engineering a multi-domain system
- Evaluations and user studies
- Alternative translation approaches
- Current and future research
4. JANUS Speech Translation
- Translation via an interlingua representation
- Main translation engine is rule-based
- Semantic grammars
- Modular grammar design
- System engineered for multiple domains
- Recent focus on domain portability: using machine learning for rapid extension to a new domain
5. The C-STAR Travel Planning Domain
- General Scenario
- Dialogue between one traveler and one or more travel agents
- Focus on making travel arrangements for a personal leisure trip (not business)
- Free spontaneous speech
6. The C-STAR Travel Planning Domain
- Natural breakdown into several sub-domains
- Hotel Information and Reservation
- Transportation Information and Reservation
- Information about Sights and Events
- General Travel Information
- Cross Domain
7. Semantic Grammars
- Describe the structure of semantic concepts instead of the syntactic constituency of phrases
- Well suited for task-oriented dialogue containing many fixed expressions
- Appropriate for spoken language, which is often disfluent and syntactically ill-formed
- Faster to develop reasonable coverage for limited domains
8. Semantic Grammars
- Hotel Reservation Example
- Input: "we have two hotels available"
- Parse Tree:
  give-information+availability+hotel
  ( we have hotel-type
      ( quantity ( two )
        hotel ( hotels ) )
    available )
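The parse tree above can be recreated with a toy top-down matcher over semantic-grammar rules. This is a minimal sketch, not the SOUP implementation; the rule set and concept names are illustrative stand-ins for a real domain grammar.

```python
# Minimal sketch of semantic-grammar parsing: concepts expand to sequences of
# items, where an item is either a literal token or another concept (marked
# with a leading '['). Rules and names here are illustrative, not from JANUS.
RULES = {
    "[quantity]": [["two"], ["three"]],
    "[hotel]": [["hotels"], ["hotel"]],
    "[hotel-type]": [["[quantity]", "[hotel]"]],
    "[availability+hotel]": [["we", "have", "[hotel-type]", "available"]],
}

def match(concept, tokens, pos=0):
    """Naive top-down matching of `concept` starting at tokens[pos].
    Returns (parse_tree, next_position) or None."""
    for expansion in RULES[concept]:
        children, p = [], pos
        for item in expansion:
            if item.startswith("["):          # nonterminal: recurse
                sub = match(item, tokens, p)
                if sub is None:
                    break
                children.append(sub[0])
                p = sub[1]
            elif p < len(tokens) and tokens[p] == item:  # literal token
                children.append(item)
                p += 1
            else:
                break
        else:                                  # whole expansion matched
            return (concept, children), p
    return None

tree, end = match("[availability+hotel]", "we have two hotels available".split())
```

Because the grammar describes semantic concepts rather than syntactic constituents, coverage for a new limited domain is mostly a matter of writing concept rules like these.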
9. The JANUS-III Translation System
10. The JANUS-III Translation System (cont.)
11. The SOUP Parser
- Specifically designed to parse spoken language using domain-specific semantic grammars
- Robust: can skip over disfluencies in the input
- Stochastic: probabilistic CFG encoded as a collection of RTNs with arc probabilities
- Top-Down: parses from the top-level concepts of the grammar down to matching of terminals
- Chart-based: dynamic matrix of parse DAGs indexed by start and end positions and head category
12. The SOUP Parser
- Supports parsing with large multiple-domain grammars
- Produces a lattice of parse analyses headed by top-level concepts
- Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice
- Graphical grammar editor
13. SOUP Disambiguation Heuristics
- Maximize coverage (of the input)
- Minimize the number of parse trees (fragmentation)
- Minimize the number of parse tree nodes
- Minimize the number of wild-card matches
- Maximize the probability of the parse trees
- Find the sequence of domain tags with maximal probability given the input words, P(T|W), where T = t1, t2, ..., tn is a sequence of domain tags
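One simple way to apply heuristics like these in priority order is lexicographic comparison: each heuristic only breaks ties left by the previous one. The sketch below assumes each candidate path through the lattice has already been scored on the listed quantities; the field names are illustrative, and SOUP's actual scoring internals are not shown on the slide.

```python
# Rank candidate paths through a parse lattice by the slide's heuristics,
# applied in priority order via lexicographic tuple comparison.
from dataclasses import dataclass

@dataclass
class Path:
    words_covered: int   # heuristic 1: maximize coverage
    num_trees: int       # heuristic 2: minimize fragmentation
    num_nodes: int       # heuristic 3: minimize parse tree nodes
    wildcards: int       # heuristic 4: minimize wild-card matches
    log_prob: float      # heuristic 5: maximize parse probability

def rank_key(p):
    # Negate the "maximize" fields so that min() prefers larger values;
    # tuple comparison applies each heuristic only as a tie-breaker.
    return (-p.words_covered, p.num_trees, p.num_nodes, p.wildcards, -p.log_prob)

def best_path(paths):
    return min(paths, key=rank_key)
```

For example, a path covering all input words beats a shorter one regardless of probability, and two full-coverage paths are compared on fragmentation next.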
14. JANUS Generation Modules
- Two alternative generation modules
- Top-Down context-free-based generator: fast, used for English and Japanese
- GenKit: unification-based generator augmented with the Morphe morphology module, used for German
15. Modular Grammar Design
- Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain)
- Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices)
- Grammars can be developed independently (using the shared core grammar)
- Shared and Cross-Domain grammars significantly reduce the effort in expanding to new domains
- Separate grammar modules facilitate associating parses with domain tags, useful for multi-domain integration within the parser
16. Translation with Multiple Domain Grammars
- Parser is loaded with all domain grammars
- A domain tag is attached to the grammar rules of each domain
- Previously developed grammars for other domains can also be incorporated
- Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts
- Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
17. Translation with Multiple Domain Grammars
18. A SOUP Parse Lattice
19. Domain Portability: Travel to Medical
- Knowledge-Based Methods: re-usability of knowledge sources for translation and speech recognition
- Corpus-Based Methods: reduce the amount of new training data for translation and speech recognition
20. Background
- New domain: Medical
- Doctor-patient diagnostic conversations
- Global importance in emergencies and in machine translation for remote health care
- Synergy with Lincoln Lab
  - Joint evaluation
  - Joint interlingua
- Test case for portability
21. Portability
- Advantage: Interlingua
- Problem: writing semantic grammars
  - Domain dependent
  - Requires time, effort, and expertise
- Approach:
  - Grammar modularity
  - Domain-action learning
  - Automatic/interactive semantic grammar induction
22. Hybrid Stat/Rule-based Analysis
- Developing large-coverage semantic analysis grammars is time consuming, making it difficult to port the analysis system to new domains
- Low-level argument grammars are more domain-independent: they contain many concepts that are used across domains (time, location, prices, etc.)
- High-level domain-actions are domain-specific and must be redeveloped for each new domain (e.g. give-info+onset+symptom)
- Tagging data sets with interlingua representations is less time consuming, and is needed anyway for system development
23. Hybrid Rule/Stat Approach
- Combines grammar-based and statistical approaches to analysis
- Develop semantic grammars for phrase-level arguments that are more portable to new domains
- Use statistical machine learning techniques for classifying into domain-actions
- Porting to a new domain requires:
  - developing argument parse rules for the new domain
  - tagging a training set with domain-actions for the new domain
  - training the classifiers for domain-actions on the tagged data
24. The Hybrid Analysis Process
- Parse an utterance for arguments
- Segment the utterance into sentences
- Extract features from the utterance and the single best parse output
- Use a learned classifier to identify the speech act
- Use a learned classifier to identify the concept sequence
- Combine into a full parse
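The steps above can be sketched as a small pipeline with pluggable components. All of the function names below are placeholders standing in for the real JANUS modules (the SOUP argument parser, the segmenter, and the two trained classifiers), which are assumptions rather than actual APIs.

```python
# Sketch of the hybrid analysis pipeline: argument parse, segmentation,
# feature extraction, speech-act classification, concept classification,
# and combination into a full parse. Component names are illustrative.

def analyze(utterance, argument_parser, sa_clf, concept_clf, segmenter):
    parse = argument_parser(utterance)          # 1. best argument parse
    results = []
    for sdu in segmenter(utterance, parse):     # 2. split into SDUs
        feats = extract_features(sdu, parse)    # 3. features from words + parse
        speech_act = sa_clf(feats)              # 4. classify speech act
        feats["speech_act"] = speech_act        # predicted SA feeds next step
        concepts = concept_clf(feats)           # 5. classify concept sequence
        # 6. combine: the domain action (speech act + concepts) heads
        #    the phrase-level argument parse
        results.append({"domain_action": [speech_act] + concepts,
                        "arguments": parse})
    return results

def extract_features(sdu, parse):
    # Illustrative features: surface words plus the argument labels
    # found in the best parse (parse = list of (label, text) pairs).
    return {"words": sdu.split(), "args": [lab for lab, _ in parse]}
```

With stub components in place of trained models, the pipeline wires a speech act and concept sequence on top of whatever arguments the parser found.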
25. Argument Parsing
- The SOUP parser produces a forest of parse trees that cover as much of the input as possible
- The parse forest can be a mixture of trees allowed by any of the grammars
- Only the best parse is used for further processing
26. Argument Parse Example
Input: "We have a double room available for you at twenty-three thousand five hundred yen"

availabilityPSD ( we have super_room-type ( room-type ( a roomdouble ( double room ) ) ) available )
arg-partyfor-whomARG ( for you ( you ) )
argtimeARG ( point ( at hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )
argsuper_priceARG ( price ( one-pricemain-quantity ( n-1000 ( thousand ) pricen-100 ( five hundred ) ) currency ( yen ( yen ) ) ) )
27. Automatic Classification of Domain Actions
- Train classifiers for speech acts and concepts
- Training data: utterances labeled with speech act, concepts, and best argument parse
- Input features:
  - n most common words
  - Arguments and pseudo-arguments in the best parse
  - Speaker
  - Predicted speech act (for the concept classifier)
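The feature list above amounts to a fixed-length binary vector per SDU. The sketch below shows one plausible encoding; the vocabulary, argument inventory, and speech-act inventory are illustrative assumptions, not the actual feature set used with TiMBL.

```python
# Turn one SDU into a feature vector following the slide's feature list:
# indicators for the n most common words, for arguments/pseudo-arguments
# in the best parse, for the speaker, and (for the concept classifier only)
# a one-hot encoding of the predicted speech act.
def make_features(words, parse_args, speaker, common_words, arg_labels,
                  sa_inventory=(), predicted_sa=None):
    vec = [int(w in words) for w in common_words]        # common-word indicators
    vec += [int(a in parse_args) for a in arg_labels]    # argument indicators
    vec.append(int(speaker == "agent"))                  # speaker indicator
    if predicted_sa is not None:                         # SA one-hot, used only
        vec += [int(predicted_sa == sa) for sa in sa_inventory]  # by concept clf
    return vec
```

The same extractor serves both classifiers; the speech-act classifier simply omits the final one-hot block, since its own output is what that block encodes.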
28. Full Parse Example
Input: "We have a double room available for you at twenty-three thousand five hundred yen"

give-information+availability+room
( availabilityPSD ( we have super_room-type ( room-type ( a roomdouble ( double room ) ) ) available )
  arg-partyfor-whomARG ( for you ( you ) )
  argtimeARG ( point ( at hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )
  argsuper_priceARG ( price ( one-pricemain-quantity ( n-1000 ( thousand ) pricen-100 ( five hundred ) ) currency ( yen ( yen ) ) ) ) )
29. Classification Results Using Memory-Based (TiMBL) Classifiers
30. Status and Open Research
- Preliminary analysis engine implemented, currently used for the travel domain in NESPOLE!
- Areas for further research and development:
  - Explore a variety of classifiers
  - Explore features for domain-action classification
  - Classification compositionality: how to classify the components of the domain-action separately and combine them?
  - Taking advantage of additional knowledge sources: the interlingua specification, dialogue context
  - Better address segmentation of utterances into DAs
31. Automatic Induction of Semantic Grammars
- The seed grammar for a new domain has very limited coverage
- A corpus of development data tagged with interlingua representations is available
- Expand the seed grammar by learning new rules for covering the same domain-actions
- First step: how well can we do with no human intervention?
32. Outline of Semantic Grammar Induction
[Flowchart with components: Seed Grammar, Parser, IF, Tree Matching, Linearization, Hypotheses Generation, Rules Management, Rules Induction Knowledge, Learned Grammar]
Example rule: sgionsetsym ( manner sym-loc became adjsym-name )
33. Human vs. Machine Experiment
- Seed grammar
- Extended by a human
- Extended by automatic semantic grammar induction
34. Seed Grammar
- Medical (e.g. "I have a burning sensation in my foot."): around 200 rules
- Cross Domain (e.g. "Hello. My name is Sam."): around 600 rules and growing
- Shared: around 100 rules and 6000 lexical items
35. A Parse Tree
request-information+existence+body-stateMED
( WH-PHRASESXDM ( qdurationXDM ( durquestionXDM ( how long ) ) )
  HAVE-GET-FEELMED ( GET ( have ) )
  you
  HAVE-GET-FEELMED ( HAS ( had ) )
  super_body-state-specMED ( body-state-specMED
    ( ID-WHOSEMED ( identifiability ( idnon-distant ( this ) ) )
      BODY-STATEMED ( painMED ( pain ) ) ) ) )
36. Manual Grammar Development
- About five additional days of development after the seed grammar was finalized
- Focused on medical rules only
- Domain-independent rules remained untouched
37. Development and Evaluation Sets
- Development set: 133 sentences
  - from one dialog
- Evaluation set: 83 sentences
  - from two dialogs
  - unseen speakers
- Only SDUs that could be manually tagged with a full IF according to the current specification were included.
38. Grading Procedure: Recall and Precision of IF Components
- c:give-information: speech act
- existence+body-state: concepts
- (body-state-spec=(pain,: top-level argument
- identifiability=no),: sub-argument
- body-location=: top-level argument
- (inside=head)): sub-argument
- Recall: ignored if the number of items is 0
- Precision: ignored if 0 out of 0
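Scoring per-component recall and precision with the two "ignored if 0" rules can be sketched as follows. The component representation (flat strings in a set) is a simplification of the nested IF structure on the slide.

```python
# Grade a hypothesis IF against a reference IF over their components
# (speech act, concepts, top-level arguments, sub-arguments), skipping
# recall when the reference has no items and precision when the
# hypothesis produced nothing - the slide's "ignored if 0" rules.
def pr(reference, hypothesis):
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    recall = correct / len(ref) if ref else None       # ignored if 0 items
    precision = correct / len(hyp) if hyp else None    # ignored if 0 out of 0
    return precision, recall
```

For instance, a hypothesis that recovers the speech act and concepts but drops an argument scores perfect precision and partial recall.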
39. Human vs. Machine Evaluation Results
40. User Studies
- We conducted three sets of user tests
- Travel agent played by an experienced system user
- Traveler played by a novice given five minutes of instruction
- Traveler given a general scenario, e.g. plan a trip to Heidelberg
- Communication only via the ST system, with a multi-modal interface and a muted video connection
- Data collected was used for system evaluation, error analysis, and then grammar development
41. System Evaluation Methodology
- End-to-end evaluations conducted at the SDU (sentence) level
- Multiple bilingual graders compare the input with the translated output and assign a grade of Perfect, OK, or Bad
- OK: the meaning of the SDU comes across
- Perfect: OK plus fluent output
- Bad: translation incomplete or incorrect
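Aggregating the SDU grades is a one-liner per figure; evaluations of this kind typically report Perfect alone and "acceptable" (Perfect + OK) together, which is an assumption about how the result slides summarize the grades.

```python
# Summarize a list of SDU-level grades ("Perfect" / "OK" / "Bad") into
# the fractions typically reported: perfect, and acceptable = Perfect + OK.
def acceptability(grades):
    n = len(grades)
    perfect = sum(g == "Perfect" for g in grades) / n
    ok = sum(g == "OK" for g in grades) / n
    return {"perfect": perfect, "acceptable": perfect + ok}
```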
42. August-99 Evaluation
- Data from the latest user study: a traveler planning a trip to Japan
- 132 utterances containing one or more SDUs, from six different users
- SR word error rate: 14.7%
- 40.2% of utterances contain recognition error(s)
43. Evaluation Results
44. Evaluation: Progress Over Time
45. Current and Future Work
- Expanding the interlingua: covering descriptive as well as task-oriented sentences
- Developing the new portable approaches
- Development of the server-based architecture for supporting multiple applications
- NESPOLE!: speech-MT for advanced e-commerce
- C-STAR: speech-to-speech MT over mobile phones
- LingWear: MT and language assistance on wearable devices
46. Students Working on the Project
- Chad Langley: Hybrid Rule/Stat Analysis, Speech-MT architecture
- Ben Han: Automatic Grammar Induction
- Alicia Tribble: Interlingua and grammar development for the Medical Domain
- Joy Zhang, Erik Peterson: Chinese EBMT for LingWear
47. The JANUS Speech-MT Team
- Project Leaders: Lori Levin, Alon Lavie, Tanja Schultz, Alex Waibel
- Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Alicia Tribble, Chad Langley, Ben Han, Celine Morel, Susie Burger, Vicky MacLaren, Kornel Laskowski, Erik Peterson