Research Challenges for Spoken Language Dialog Systems

1
Research Challenges for Spoken Language Dialog Systems
  • Julie Baca, Ph.D.
  • Center for Advanced Vehicular Systems
  • Mississippi State University
  • Computer Science Graduate Seminar
  • November 27, 2002

2
Overview
  • Define dialog systems
  • Describe research issues
  • Present current work
  • Give conclusions and discuss future work

3
What is a Dialog System?
  • Current commercial voice products require
    adherence to a command-and-control language,
    e.g.,
  • User: Plan Route
  • Such interfaces are not robust to variations from
    the fixed words and phrases.

4
What is a Dialog System?
  • Dialog systems seek to provide a natural
    conversational interaction between the user and
    the computer system, e.g.,
  • User: Is there a way I can get to Canal Street
    from here?

5
Domains for Dialog Systems
  • Travel reservation
  • Weather forecasting
  • In-vehicle driver assistance
  • On-line learning environments

6
Dialog Systems Information Flow
  • Must model two-way flow of information
  • User-to-system
  • System-to-user

7
Dialog System
  • [Diagram of dialog system components: Speech
    Recognition, NLP, Dialog Manager, Application
    Database, Response Generation, TTS]

8
Research Issues
  • Many fundamental problems must be solved for
    these systems to mature.
  • Three general areas include:
  • Automatic Speech Recognition (ASR)
  • Natural Language Processing (NLP)
  • Human-Computer Interaction (HCI)

9
NLP Issue for Dialog Systems: Semantics
  • Must assess meaning, not just syntactic
    correctness.
  • Therefore, must handle ungrammatical inputs,
    e.g.,
  • The nearest .....station is is there a gas
    station nearby?

10
NLP Issue: Semantic Representation (1)
  • For NLP, use semantic grammars
  • Semantic frame with slots and fillers
  • <destination> -> <prep> <place>
  • <prep> -> nearest
  • <place> -> gas station

11
NLP Issue: Semantic Representation (2)
  • Must also represent:
  • How do I get from Canal Street to Royal Street?
  • <directions> -> <start> <destination>
  • <destination> -> <prep> <place>
  • <place> -> <street_name> | <business>
  • <street_name> -> Canal St | Royal St
  • <prep> -> <to_prep> | <near_prep>
  • <near_prep> -> nearest | closest
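
A minimal sketch of how a semantic grammar like the one on slides 10 and 11 might fill a frame, written in Python. It is not the presenter's parser: the lexicon contents, the `parse` and `find_first` names, and the keyword-spotting strategy are all assumptions chosen for illustration. The point it shows is that the parser looks for grammar terminals and ignores everything else, so a disfluent request can still produce a usable frame.

```python
import re

# Hypothetical lexicon mirroring the slide's terminals (an assumption,
# not taken from the presenter's system).
NEAR_PREP = ("nearest", "closest")
STREET_NAMES = ("canal street", "royal street")
BUSINESSES = ("gas station", "restaurant", "video store")


def find_first(phrases, text):
    """Return the phrase that occurs earliest in text, or None."""
    hits = [(text.find(p), p) for p in phrases if p in text]
    return min(hits)[1] if hits else None


def parse(utterance):
    """Fill a semantic frame by spotting terminals and skipping other words."""
    # Normalize: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^a-z ]", " ", utterance.lower())
    text = re.sub(r"\s+", " ", text).strip()

    # <directions> -> <start> <destination>
    match = re.search(r"from (.+?) to (.+)", text)
    if match:
        start = find_first(STREET_NAMES, match.group(1))
        dest = find_first(STREET_NAMES + BUSINESSES, match.group(2))
        if start and dest:
            return {"type": "directions", "start": start, "destination": dest}

    # <destination> -> <prep> <place>, with the preposition optional here
    frame = {}
    place = find_first(BUSINESSES + STREET_NAMES, text)
    if place:
        frame = {"type": "destination", "place": place}
        prep = find_first(NEAR_PREP, text)
        if prep:
            frame["prep"] = prep
    return frame
```

Calling `parse("The nearest .....station is is there a gas station nearby?")` fills the `place` slot with "gas station" and the `prep` slot with "nearest", despite the false start in the input.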

12
NLP Issue: Semantic Representation (3)
  • Two approaches:
  • Hand-craft the grammar for the application, using
    robust parsing to understand meaning [1,2].
  • Problem: time, expense
  • Use a statistical approach, generating initial
    rules and using annotated tree-banked data to
    discover the full rule set [3,4].
  • Problem: annotated training data

13
ASR/NLP Issue: Reducing Errors
  • Most systems use a loose coupling of ASR and NLP.
  • Try earlier integration of semantics with the
    recognizer.
  • Incorporate dialog state into the underlying
    statistical model.
  • Problems:
  • Increases search space
  • Training data

14
NLP Issue: Resolving Meaning Using Context
  • Must maintain knowledge of the conversational
    context.
  • After a request for the nearest gas station, the
    user says, "What is it close to?"
  • Resolving "it": anaphora
  • Another follow-up by the user,
    "How about ... restaurant?"
  • Resolving "..." with "nearest": ellipsis

15
Resolving Meaning: Discourse Analysis
  • To resolve such requests, system must track
    context of the conversation.
  • This is typically handled by a discourse analysis
    component in the Dialog Manager.

16
Dialog Manager: Discourse Analysis
  • Anaphora resolution approach: Use a focus
    mechanism, assuming the conversation has a focus [5].
  • For our example, "gas station" is the current focus.
  • But how about:
  • "I'm at Food Max. How do I get to a gas station
    close to it and a video store close to it?"
  • Problem: Resolving the two "it"s.
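
The focus idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the focusing algorithm of [5]: the `DiscourseContext` class, its method names, and the frame layout are all hypothetical. It keeps a single focus (the most recently requested place) and the last complete frame, which is enough to handle the "it" and "How about ... restaurant?" follow-ups from the gas station example.

```python
class DiscourseContext:
    """Tracks one current focus and the last frame (hypothetical sketch)."""

    def __init__(self):
        self.focus = None
        self.last_frame = {}

    def update(self, frame):
        """Record a completed request; its place becomes the new focus."""
        self.last_frame = dict(frame)
        if "place" in frame:
            self.focus = frame["place"]

    def resolve_anaphora(self, frame):
        """Replace the pronoun 'it' in any slot with the current focus."""
        return {
            slot: (self.focus if value == "it" and self.focus else value)
            for slot, value in frame.items()
        }

    def resolve_ellipsis(self, fragment):
        """Fill slots missing from a fragment using the previous frame."""
        merged = dict(self.last_frame)
        merged.update(fragment)
        return merged


ctx = DiscourseContext()
ctx.update({"type": "destination", "prep": "nearest", "place": "gas station"})

# "What is it close to?" -> "it" resolves to the current focus.
landmark_query = ctx.resolve_anaphora({"type": "landmark", "near": "it"})

# "How about ... restaurant?" -> the missing "nearest" is inherited.
follow_up = ctx.resolve_ellipsis({"place": "restaurant"})
```

With only one focus, both pronouns in the Food Max request would resolve to the same entity, which is exactly why that example is hard: the second "it" may refer to Food Max or to the gas station, and a single-focus tracker cannot tell which.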

17
Dialog System
  • [Diagram of dialog system components, with
    Discourse Analysis added: Speech Recognition, NLP,
    Dialog Manager, Discourse Analysis, Application
    Database, Response Generation, TTS]

18
Dialog Manager: Clarification
  • Often cannot satisfy request in one iteration.
  • The previous example may require clarification
    from the user,
  • Do you want to go to the gas station first?

19
HCI Issue: System vs. User Initiative
  • What level of control do you provide the user in
    the conversation?
  • [Figure: initiative spectrum from Computer to Human]
  • Computer initiative, C: "Please say departure city"
  • Human initiative, U: "Tell me how to get to the Hilton."

20
Mixed Initiative
  • Total system initiative provides low usability.
  • Total user initiative introduces higher error
    rate.
  • Thus, mixed initiative approach, balancing
    usability and error rate, is taken most often.
  • Allowing the user to adapt the level explicitly
    has also shown merit [6].

21
ASR/HCI Issue: Error Handling
  • How to handle possible errors?
  • Assign a confidence score to the result of the
    recognizer.
  • For results with a lower confidence score, request
    clarification or revert to system-oriented
    initiative.
  • Can incorporate dialog state in computing the
    confidence score [7].
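
The confidence-based strategy above can be sketched as a simple threshold policy in Python. The function name `choose_action` and the two threshold values are assumptions made for illustration; a real system would tune them on data, and the slide does not say how the confidence score itself is computed.

```python
def choose_action(confidence, accept=0.8, clarify=0.5):
    """Map a recognizer confidence score in [0, 1] to a dialog action.

    The thresholds are hypothetical defaults, not values from the talk.
    """
    if confidence >= accept:
        return "accept"             # act on the recognized request
    if confidence >= clarify:
        return "clarify"            # ask the user to confirm or rephrase
    return "system_initiative"      # fall back to directed prompts
```

For example, `choose_action(0.65)` returns "clarify", so the system would confirm the request before acting on it.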

22
HCI Issue: Response Generation
  • How to present the response to the user in a way
    that minimizes cognitive load?
  • Varies depending on whether output is speech-only
    or speech/visual.
  • Speech-only output must respect the user's
    short-term memory limitations, e.g., lists must be
    short, timed appropriately, and allow repetition.
  • Speech/visual output must be complementary, e.g.,
    importance of redundancy and timing.

23
HCI Issue: Evaluating Dialog Systems
  • How to compare and evaluate dialog systems?
  • PARADISE (Paradigm for Dialogue System Evaluation)
    provides a standard framework [8].

24
PARADISE: Evaluating Dialog Systems
  • Task success: Was the necessary information
    exchanged?
  • Efficiency/Cost: Number of dialog turns, task
    completion time
  • Qualitative: ASR rejections, timeouts, help
    requests
  • Usability: User satisfaction with ASR, task ease,
    interaction pace, system response
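
The slide lists the measures but not how PARADISE combines them. In the framework of [8], performance is a weighted combination of normalized task success (kappa) minus normalized costs, with the weights obtained by regressing against user satisfaction. The Python sketch below assumes that form; the function names, the sample numbers in the usage note, and the use of the population standard deviation for normalization are illustrative assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize values to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def performance(kappas, costs, alpha, weights):
    """Per-dialog score: alpha * N(kappa) - sum_i w_i * N(cost_i).

    kappas:  task-success value for each dialog
    costs:   dict mapping a cost name to its per-dialog values
    alpha, weights: coefficients, assumed to come from a regression
                    against user satisfaction (not computed here)
    """
    n_kappa = z_scores(kappas)
    n_costs = {name: z_scores(vals) for name, vals in costs.items()}
    return [
        alpha * n_kappa[i] - sum(weights[name] * n_costs[name][i] for name in costs)
        for i in range(len(kappas))
    ]
```

As a usage example, `performance([0.9, 0.5], {"turns": [10, 20]}, alpha=0.5, weights={"turns": 0.3})` ranks the first dialog above the second, since it has both higher task success and fewer turns.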

25
Current Work
  • Sponsored by CAVS
  • Examining:
  • In-vehicle Environment
  • Manufacturing Environment
  • Multidisciplinary Team:
  • CS, ECE, IE
  • Baca, Picone, Duffy
  • ECE graduate students:
  • Hualin Gao, Zheng Feng

26
Current Work: In-vehicle Dialog System
  • Specific ASR Issues for In-vehicle Environment:
  • Real-time performance
  • Noise cancellation

27
Current Work: In-vehicle Dialog System
  • Other Significant Issues:
  • Reducing error rate
  • Graceful error handling and mixed initiative
    strategy
  • Response generation to reduce user cognitive load
  • Evaluation

28
Current Work: In-vehicle Dialog System
  • Approach:
  • Develop prototype in-vehicle system
  • Initial focus on ASR and NLP issues
  • Integrate real-time recognizer [9]
  • Employ noise-cancellation techniques [10]
  • Use semantic grammar for NLP
  • Examine tighter integration of ASR and NLP
  • Incorporate dialog state in underlying
    statistical models for ASR

29
Current Work: In-vehicle Dialog System
  • Second phase, focus on:
  • Response generation
  • Mixed initiative strategies
  • Evaluation

30
Current Work: Workforce Training Dialog System
  • Significant issues in manufacturing environment:
  • Recognition issues:
  • Real-time performance
  • Noisy environments
  • Understanding issues:
  • Multimodal interface for reducing error rate,
    e.g., voice and pen [11].
  • HCI/Human Factors issues:
  • Response generation to integrate speech and
    visual output

31
Research Significance
  • Advance the development of dialog systems
    technology by addressing fundamental issues
    as they arise in the automotive domain.
  • Potential areas: ASR, NLP, HCI

32
References
  • [1] S.J. Young and C.E. Proctor, "The design and
    implementation of dialogue control in voice
    operated database inquiry systems," Computer
    Speech and Language, vol. 3, no. 4, pp. 329-353,
    1992.
  • [2] W. Ward, "Understanding spontaneous speech,"
    in Proceedings of the International Conference on
    Acoustics, Speech, and Signal Processing, Toronto,
    Canada, 1991, pp. 365-368.
  • [3] R. Pieraccini and E. Levin, "Stochastic
    representation of semantic structure for speech
    understanding," Speech Communication, vol. 11,
    no. 2, pp. 283-288, 1992.
  • [4] Y. Wang and A. Acero, "Evaluation of spoken
    grammar learning in the ATIS domain," in
    Proceedings of the International Conference on
    Acoustics, Speech, and Signal Processing,
    Orlando, Florida, 2002.
  • [5] C. Sidner, "Focusing in the comprehension of
    definite anaphora," in Computational Models of
    Discourse, M. Brady and R. Berwick, eds.,
    Cambridge, MA: The MIT Press, 1983, pp. 267-330.
  • [6] D. Litman and S. Pan, "Empirically
    evaluating an adaptable spoken language dialog
    system," in Proceedings of the International
    Conference on User Modeling, UM 99, Banff,
    Canada, 1999.
  • [7] S. Pradhan and W. Ward, "Estimating Semantic
    Confidence for Spoken Dialogue Systems," in
    Proceedings of the IEEE International Conference
    on Acoustics, Speech, and Signal Processing
    (ICASSP-2002), Orlando, Florida, USA, May 2002.

33
References
  • [8] M. Walker, et al., "PARADISE: A Framework for
    Evaluating Spoken Dialogue Agents," in Proceedings
    of the 35th Annual Meeting of the Association for
    Computational Linguistics (ACL-97), pp. 271-289,
    1997.
  • [9] F. Zheng, J. Hamaker, F. Goodman, B. George,
    N. Parihar, and J. Picone, "The ISIP 2001 NRL
    Evaluation for Recognition of Speech in Noisy
    Environments," presented at the Speech In Noisy
    Environments (SPINE) Workshop, Orlando, Florida,
    USA, November 2001.
  • [10] F. Zheng and J. Picone, "Robust Low
    Perplexity Voice Interfaces," MITRE
    Corporation, December 31, 2001.
  • [11] S. Oviatt, "Taming Speech Recognition Errors
    within a Multimodal Interface," Communications
    of the ACM, vol. 43, no. 9, pp. 45-51, Sept. 2000
    (special issue on "Conversational Interfaces").