Speech-to-Speech MT Design and Engineering - PowerPoint PPT Presentation
Provided by: AlonL
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Speech-to-Speech MT Design and Engineering
  • Alon Lavie and Lori Levin
  • MT Class
  • April 16, 2001

2
Outline
  • Design and Engineering of the JANUS
    speech-to-speech MT system
  • The Travel and Medical Domain Interlingua (IF)
  • Portability to new domains: ML approaches
  • Evaluation and User Studies
  • Open Problems, Current and Future Research

3
Overview
  • Fundamentals of our approach
  • System overview
  • Engineering a multi-domain system
  • Evaluations and user studies
  • Alternative translation approaches
  • Current and future research

4
JANUS Speech Translation
  • Translation via an interlingua representation
  • Main translation engine is rule-based
  • Semantic grammars
  • Modular grammar design
  • System engineered for multiple domains
  • Recent focus on domain portability: using
    machine learning for rapid extension to a new
    domain

5
The C-STAR Travel Planning Domain
  • General Scenario
  • Dialogue between one traveler and one or more
    travel agents
  • Focus on making travel arrangements for a
    personal leisure trip (not business)
  • Free spontaneous speech

6
The C-STAR Travel Planning Domain
  • Natural breakdown into several sub-domains
  • Hotel Information and Reservation
  • Transportation Information and Reservation
  • Information about Sights and Events
  • General Travel Information
  • Cross Domain

7
Semantic Grammars
  • Describe structure of semantic concepts instead
    of syntactic constituency of phrases
  • Well suited for task-oriented dialogue containing
    many fixed expressions
  • Appropriate for spoken language - often disfluent
    and syntactically ill-formed
  • Faster to develop reasonable coverage for limited
    domains

8
Semantic Grammars
  • Hotel Reservation Example
  • Input: we have two hotels available
  • Parse Tree:
  • give-information+availability+hotel
      ( we have
        hotel-type ( quantity ( two )
                     hotel ( hotels ) )
        available )
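The semantic-grammar idea can be illustrated with a toy pattern matcher: concepts match word patterns of the domain directly, rather than syntactic constituents. This is only a sketch; the rule names and regex patterns below are hypothetical, not the actual SOUP grammar formalism.

```python
# Toy illustration of a semantic grammar (rule names hypothetical):
# concepts match word patterns rather than syntactic constituents.
import re

# A top-level domain action built from fixed phrases and argument concepts.
TOP_LEVEL_RULES = {
    "give-information+availability+hotel":
        r"we have (?P<quantity>(one|two|three)) (?P<hotel>hotels?) available",
}

def parse(utterance):
    """Return (domain_action, arguments) for the first matching rule."""
    for action, pattern in TOP_LEVEL_RULES.items():
        m = re.fullmatch(pattern, utterance)
        if m:
            return action, m.groupdict()
    return None, {}

action, args = parse("we have two hotels available")
```

Because the rules name semantic concepts (quantity, hotel) rather than NPs and VPs, reasonable coverage of a limited domain can be written quickly, at the cost of domain dependence.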

9
The JANUS-III Translation System
10
The JANUS-III Translation System
11
The SOUP Parser
  • Specifically designed to parse spoken language
    using domain-specific semantic grammars
  • Robust - can skip over disfluencies in input
  • Stochastic - probabilistic CFG encoded as a
    collection of RTNs with arc probabilities
  • Top-Down - parses from top-level concepts of the
    grammar down to matching of terminals
  • Chart-based - dynamic matrix of parse DAGs
    indexed by start and end positions and head
    category

12
The SOUP Parser
  • Supports parsing with large multiple domain
    grammars
  • Produces a lattice of parse analyses headed by
    top-level concepts
  • Disambiguation heuristics rank the analyses in
    the parse lattice and select a single best path
    through the lattice
  • Graphical grammar editor

13
SOUP Disambiguation Heuristics
  • Maximize coverage (of input)
  • Minimize number of parse trees (fragmentation)
  • Minimize number of parse tree nodes
  • Minimize the number of wild-card matches
  • Maximize the probability of parse trees
  • Find the sequence of domain tags with maximal
    probability given the input words, P(T|W), where
    T = t1, t2, ..., tn is a sequence of domain tags
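One plausible reading of the first five heuristics is as a lexicographic preference over candidate analyses: maximize coverage first, then break ties by fragmentation, node count, wild-card matches, and probability. A minimal sketch under that assumption (all field names hypothetical):

```python
# Sketch of lexicographic disambiguation over parse analyses
# (field names and the exact tie-breaking order are assumptions).
from dataclasses import dataclass

@dataclass
class Analysis:
    words_covered: int   # heuristic 1: maximize input coverage
    num_trees: int       # heuristic 2: minimize fragmentation
    num_nodes: int       # heuristic 3: minimize parse-tree nodes
    num_wildcards: int   # heuristic 4: minimize wild-card matches
    log_prob: float      # heuristic 5: maximize parse probability

def rank_key(a):
    # Smaller key = better: negate the quantities to be maximized.
    return (-a.words_covered, a.num_trees, a.num_nodes,
            a.num_wildcards, -a.log_prob)

candidates = [
    Analysis(5, 2, 12, 1, -3.2),
    Analysis(6, 3, 15, 0, -4.0),   # covers more words, fewer wild cards
    Analysis(6, 3, 15, 2, -2.5),
]
best = min(candidates, key=rank_key)
```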

14
JANUS Generation Modules
  • Two alternative generation modules
  • Top-Down context-free based generator - fast,
    used for English and Japanese
  • GenKit - unification-based generator augmented
    with Morphe morphology module - used for German

15
Modular Grammar Design
  • Grammar development separated into modules
    corresponding to sub-domains (Hotel,
    Transportation, Sights, General Travel, Cross
    Domain)
  • Shared core grammar for lower-level concepts that
    are common to the various sub-domains (e.g.
    times, prices)
  • Grammars can be developed independently (using
    shared core grammar)
  • Shared and Cross-Domain grammars significantly
    reduce effort in expanding to new domains
  • Separate grammar modules facilitate associating
    parses with domain tags - useful for multi-domain
    integration within the parser

16
Translation with Multiple Domain Grammars
  • Parser is loaded with all domain grammars
  • Domain tag attached to grammar rules of each
    domain
  • Previously developed grammars for other domains
    can also be incorporated
  • Parser creates a parse lattice consisting of
    multiple analyses of the input into sequences of
    top-level domain concepts
  • Parser disambiguation heuristics rank the
    analyses in the parse lattice and select a single
    best sequence of concepts

17
Translation with Multiple Domain Grammars
18
A SOUP Parse Lattice
19
Domain Portability: Travel to Medical
  • Knowledge-Based Methods: re-usability of
    knowledge sources for translation and speech
    recognition
  • Corpus-Based Methods: reduce the amount of new
    training data for translation and speech
    recognition
20
Background
  • New domain: Medical
  • Doctor-patient diagnostic conversations
  • Global importance in emergencies and in machine
    translation for remote health care
  • Synergy with Lincoln Lab
  • Joint evaluation
  • Joint interlingua
  • Test case for portability

21
Portability
  • Advantage: Interlingua
  • Problem: Writing semantic grammars
  • Domain dependent
  • Requires time, effort, and expertise
  • Approach:
  • Grammar modularity
  • Domain action learning
  • Automatic/Interactive semantic grammar induction

22
Hybrid Stat/Rule-based Analysis
  • Developing large-coverage semantic analysis
    grammars is time consuming → difficult to port
    the analysis system to new domains
  • Low-level argument grammars are more
    domain-independent; they contain many concepts
    that are used across domains: time, location,
    prices, etc.
  • High-level domain-actions are domain-specific and
    must be redeveloped for each new domain, e.g.
    give-info+onset+symptom
  • Tagging data sets with interlingua
    representations is less time consuming, and is
    needed anyway for system development

23
Hybrid Rule/Stat Approach
  • Combines grammar-based and statistical approaches
    to analysis
  • Develop semantic grammars for phrase-level
    arguments that are more portable to new domains
  • Use statistical machine learning techniques for
    classifying into domain-actions
  • Porting to a new domain requires
  • developing argument parse rules for new domain
  • tagging training set with domain-actions for new
    domain
  • training the classifiers for domain-actions on
    the tagged data

24
The Hybrid Analysis Process
  • Parse an utterance for arguments
  • Segment the utterance into sentences
  • Extract features from the utterance and the
    single best parse output
  • Use a learned classifier to identify the speech
    act
  • Use a learned classifier to identify the concept
    sequence
  • Combine into a full parse
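The steps above can be sketched as a pipeline. Everything here is stubbed: the real system uses the SOUP argument parser and trained classifiers, while these functions are hypothetical placeholders (and utterance segmentation is folded into a single-sentence case for brevity).

```python
# Skeletal version of the hybrid analysis pipeline; every component
# below is a stub standing in for the real parser/classifiers.

def parse_arguments(utterance):
    # Stub argument parse: pretend the grammar found one price argument.
    return [("price", "five hundred yen")] if "yen" in utterance else []

def extract_features(utterance, arg_parse):
    # Features: input words plus argument labels from the best parse.
    return {"words": set(utterance.split()),
            "args": [label for label, _ in arg_parse]}

def classify_speech_act(features):
    # Stub classifier; the real one is trained on tagged data.
    return ("give-information" if "have" in features["words"]
            else "request-information")

def classify_concepts(features, speech_act):
    # Stub concept-sequence classifier (may condition on the speech act).
    return ["availability", "room"] if "available" in features["words"] else []

def analyze(utterance):
    args = parse_arguments(utterance)
    feats = extract_features(utterance, args)
    sa = classify_speech_act(feats)
    concepts = classify_concepts(feats, sa)
    # Combine speech act, concepts, and arguments into a full parse.
    return {"speech_act": sa, "concepts": concepts, "arguments": args}

result = analyze("we have a double room available for five hundred yen")
```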

25
Argument Parsing
  • The SOUP parser produces a forest of parse trees
    that cover as much of the input as possible
  • The parse forest can be a mixture of trees
    allowed by any of the grammars
  • Only the best parse is used for further processing

26
Argument Parse Example
We have a double room available for you at
twenty-three thousand five hundred yen

availabilityPSD ( we have
    super_room-type ( room-type ( a
      roomdouble ( double room ) ) )
    available )
arg-partyfor-whomARG ( for you ( you ) )
argtimeARG ( point ( at
    hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )
argsuper_priceARG ( price ( one-pricemain-quantity (
    n-1000 ( thousand ) pricen-100 ( five hundred ) )
    currency ( yen ( yen ) ) ) )
27
Automatic Classification of Domain Actions
  • Train classifiers for speech acts and concepts
  • Training data: Utterances labeled with speech
    act, concepts, and best argument parse
  • Input features
  • n most common words
  • Arguments and pseudo-arguments in best parse
  • Speaker
  • Predicted speech act (for concept classifier)
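TiMBL (slide 29) is a memory-based learner: it stores training instances and classifies by nearest neighbors under a feature-overlap metric. A minimal 1-NN sketch of that idea, with hypothetical feature names (words, argument labels, speaker):

```python
# Minimal memory-based (1-nearest-neighbor) speech-act classifier,
# in the spirit of TiMBL's overlap metric; feature names are made up.

def overlap(a, b):
    """Count shared features between two feature sets."""
    return len(a & b)

def classify_1nn(memory, features):
    """Return the label of the most similar stored instance."""
    return max(memory, key=lambda item: overlap(item[0], features))[1]

# "Memory" of labeled training utterances: (feature set, speech act).
memory = [
    ({"we", "have", "arg:availability"}, "give-information"),
    ({"do", "you", "have", "arg:availability"}, "request-information"),
    ({"thank", "you"}, "thank"),
]

label = classify_1nn(memory, {"we", "have", "arg:availability",
                              "speaker:agent"})
```

Real memory-based learners add k > 1 neighbors, feature weighting, and distance-weighted voting, but the classification principle is the same.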

28
Full Parse Example
We have a double room available for you at
twenty-three thousand five hundred yen

give-information+availability+room
  ( availabilityPSD ( we have
      super_room-type ( room-type ( a
        roomdouble ( double room ) ) )
      available )
    arg-partyfor-whomARG ( for you ( you ) )
    argtimeARG ( point ( at
      hour-minute ( bighour ( big23 ( twenty-three ) ) ) ) )
    argsuper_priceARG ( price ( one-pricemain-quantity (
      n-1000 ( thousand ) pricen-100 ( five hundred ) )
      currency ( yen ( yen ) ) ) ) )
29
Classification Results Using Memory-based (TiMBL)
Classifiers
30
Status and Open Research
  • Preliminary analysis engine implemented,
    currently used for travel domain in NESPOLE!
  • Areas for further research and development
  • Explore a variety of classifiers
  • Explore features for domain-action classification
  • Classification compositionality: how to
    classify the components of the domain-action
    separately and combine them?
  • Taking advantage of additional knowledge sources:
    the interlingua specification, dialogue context
  • Better address segmentation of utterances into DAs

31
Automatic Induction of Semantic Grammars
  • Seed grammar for a new domain has very limited
    coverage
  • Corpus of development data tagged with
    interlingua representations available
  • Expand the seed grammar by learning new rules for
    covering the same domain-actions
  • First step: how well can we do with no human
    intervention?

32
Outline of Semantic Grammar Induction
[Flow diagram: components include the Parser, IF, Tree
Matching, Linearization, Hypotheses Generation, Rules
Management, Rules Induction, and Knowledge, which expand
the Seed Grammar into the Learned Grammar; example rule:
sgionsetsym ( manner sym-loc became adjsym-name )]
33
Human vs Machine Experiment
  • Seed grammar
  • Extended by a human
  • Extended by automatic semantic grammar induction

34
Seed Grammar
[Diagram of grammar modules:
  Medical (e.g. "I have a burning sensation in my foot."):
    around 200 rules, around 600 rules and growing
  Cross Domain (e.g. "Hello. My name is Sam.")
  Shared: around 100 rules and 6000 lexical items]
35
A Parse Tree
request-informationexistencebody-stateMED
  ( WH-PHRASESXDM ( qdurationXDM ( durquestionXDM ( how long ) ) )
    HAVE-GET-FEELMED ( GET ( have ) )
    you
    HAVE-GET-FEELMED ( HAS ( had ) )
    super_body-state-specMED (
      body-state-specMED (
        ID-WHOSEMED ( identifiability ( idnon-distant ( this ) ) )
        BODY-STATEMED ( painMED ( pain ) ) ) ) )
36
Manual Grammar Development
  • About five additional days of development after
    the seed grammar was finalized
  • Focusing on medical rules only
  • Domain-independent rules remain untouched

37
Development and evaluation sets
  • Development set: 133 sentences
  • from one dialog
  • Evaluation set: 83 sentences
  • from two dialogs
  • unseen speakers
  • Only SDUs that could be manually tagged with a
    full IF according to the current specification
    were included.

38
Grading Procedure: Recall and Precision of IF
Components
  • Example IF and its graded components:
  • c:give-information  [speech act]
  • +existence+body-state  [concepts]
  • (body-state-spec=(pain,  [top-level argument]
  • identifiability=no),  [sub-argument]
  • body-location  [top-level argument]
  • (insidehead))  [sub-argument]
  • Recall: ignored if number of items is 0
  • Precision: ignored if 0 out of 0
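Component-level scoring as described can be sketched by treating the reference and hypothesis IFs as sets of components. The component labels below are hypothetical placeholders, not actual grading data:

```python
# Sketch of component-level IF scoring: recall over the reference
# components, precision over the hypothesis components, each skipped
# when its denominator is zero (as the grading procedure specifies).

def precision_recall(reference, hypothesis):
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    recall = correct / len(ref) if ref else None       # ignored if 0 items
    precision = correct / len(hyp) if hyp else None    # ignored if 0 of 0
    return precision, recall

reference = ["sa:give-information", "concept:existence",
             "concept:body-state", "arg:body-state-spec=pain",
             "arg:body-location"]
hypothesis = ["sa:give-information", "concept:existence",
              "concept:body-state", "arg:body-location"]
p, r = precision_recall(reference, hypothesis)
```

Here the hypothesis misses one reference argument, so recall drops to 0.8 while precision stays at 1.0.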

39
Human vs. Machine Evaluation Results
40
User Studies
  • We conducted three sets of user tests
  • Travel agent played by experienced system user
  • Traveler is played by a novice and given five
    minutes of instruction
  • Traveler is given a general scenario - e.g., plan
    a trip to Heidelberg
  • Communication only via ST system, multi-modal
    interface and muted video connection
  • Data collected used for system evaluation, error
    analysis and then grammar development

41
System Evaluation Methodology
  • End-to-end evaluations conducted at the SDU
    (sentence) level
  • Multiple bilingual graders compare the input with
    translated output and assign a grade of Perfect,
    OK or Bad
  • OK: meaning of SDU comes across
  • Perfect: OK + fluent output
  • Bad: translation incomplete or incorrect
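Under this scheme, "acceptable" translations are commonly reported as Perfect + OK, since both get the meaning across. A small aggregation sketch (the grade list is invented, not evaluation data):

```python
# Aggregate Perfect/OK/Bad grades into rates; "acceptable" counts any
# grade where the meaning comes across (Perfect or OK).

def score(grades):
    n = len(grades)
    perfect = grades.count("perfect") / n
    acceptable = (grades.count("perfect") + grades.count("ok")) / n
    return perfect, acceptable

grades = ["perfect", "ok", "bad", "perfect", "ok"]  # made-up grades
perfect_rate, acceptable_rate = score(grades)
```

With multiple bilingual graders, such rates would be computed per grader and then averaged.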

42
August-99 Evaluation
  • Data from latest user study - traveler planning a
    trip to Japan
  • 132 utterances containing one or more SDUs, from
    six different users
  • SR word error rate: 14.7%
  • 40.2% of utterances contain recognition error(s)

43
Evaluation Results
44
Evaluation - Progress Over Time
45
Current and Future Work
  • Expanding the interlingua: covering descriptive
    as well as task-oriented sentences
  • Developing the new portable approaches
  • development of the server-based architecture for
    supporting multiple applications
  • NESPOLE!: speech-MT for advanced e-commerce
  • C-STAR: speech-to-speech MT over mobile phones
  • LingWear: MT and language assistance on wearable
    devices

46
Students Working on the Project
  • Chad Langley: Hybrid Rule/Stat Analysis, Speech
    MT architecture
  • Ben Han: Automatic Grammar Induction
  • Alicia Tribble: Interlingua and grammar
    development for Medical Domain
  • Joy Zhang, Erik Peterson: Chinese EBMT for
    LingWear

47
The JANUS Speech-MT Team
  • Project Leaders: Lori Levin, Alon Lavie, Tanja
    Schultz, Alex Waibel
  • Grammar and Component Developers: Donna Gates,
    Dorcas Wallace, Kay Peterson, Alicia Tribble,
    Chad Langley, Ben Han, Celine Morel, Susie
    Burger, Vicky MacLaren, Kornel Laskowski, Erik
    Peterson