Speech recognition, understanding and conversational interfaces

1
Speech recognition, understanding and
conversational interfaces
  • Alexander Rudnicky
  • School of Computer Science
  • http://www.cs.cmu.edu/air

2
Outline
  • Speech
  • Types of speech interfaces
  • Speech systems and their structure
  • Designing speech interfaces
  • Some applications
    • SpeechWear
    • Communicator

3
Speech as a signal
  • The difference between speech and sound
  • CD quality vs. intelligible quality
    • high-quality audio: 44.1 / 48 kHz sampling
    • desirable speech bandwidth: 0-8 kHz (16 kHz
      sampling), 16 bits
    • at 16 bits/sample: 256 kbps (tethered mic)
    • telephone: 64 kbps (and lower)
  • Compression
    • MPEG: 64 kbps/channel and up (but not
      speech-optimal)
    • CELP: 16 kbps down to 2.4 kbps (optimized for
      speech)
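The uncompressed rates above follow from sampling rate times bits per sample times channels. A 0-8 kHz bandwidth needs at least 16 kHz sampling (the Nyquist rate), which is where 256 kbps comes from. The following minimal Python sketch shows that arithmetic. It assumes 8 kHz, 8-bit samples for the telephone case (as in G.711-style PCM) and two channels for CD audio.

```python
def nyquist_rate_hz(bandwidth_hz: int) -> int:
    """Minimum sampling rate that captures a 0..bandwidth_hz signal."""
    return 2 * bandwidth_hz


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bit rate = samples/s * bits/sample * channels."""
    return sample_rate_hz * bits_per_sample * channels


# 0-8 kHz speech bandwidth -> 16 kHz sampling, 16 bits, mono
wideband = pcm_bitrate_bps(nyquist_rate_hz(8_000), 16)
# Telephone (assumed G.711-style): 8 kHz sampling, 8 bits, mono
telephone = pcm_bitrate_bps(8_000, 8)
# CD audio: 44.1 kHz, 16 bits, stereo
cd = pcm_bitrate_bps(44_100, 16, channels=2)

for name, bps in [("wideband speech", wideband), ("telephone", telephone), ("CD", cd)]:
    print(f"{name}: {bps / 1000:g} kbps")
```

By the same arithmetic, a 16 kbps CELP stream is a 16:1 reduction from 256 kbps wideband PCM.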

4
Speech for communication
  • The difference between speech and language
  • Speech recognition and speech understanding

5
Computers and speech
  • Transcription
    • dictation, information retrieval
  • Command and control
    • data entry, device control, navigation
  • Information access
    • airline schedules, stock quotes
  • Problem solving
    • travel planning, logistics

6
Speech system architecture
  • SIGNAL PROCESSING
  • DECODING
  • UNDERSTANDING
  • DISCOURSE
  • ACTION

7
Varieties of speech systems
8
A generic speech system
[Figure: generic speech system, with speech as its input]
9
Decoding speech
  • Acoustic models
  • Language models
  • Corpus-based statistical models
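Corpus-based decoders are commonly framed as choosing the word sequence W that maximizes P(O | W) · P(W). The acoustic model supplies P(O | W) for the observed audio O, and the language model supplies P(W). The sketch below rescores a fixed list of hypotheses using invented scores. A real decoder searches a far larger hypothesis space (for example, with beam search). The `lm_weight` parameter is an assumption, included because such a weight is a common tuning knob.

```python
import math

# Hypothetical log acoustic scores log P(O | W) for competing transcriptions
# of the same audio (invented numbers for illustration).
acoustic_logprob = {
    "recognize speech": -120.0,
    "wreck a nice beach": -118.5,
    "recognise peach": -125.0,
}

# Hypothetical language-model probabilities P(W).
lm_prob = {
    "recognize speech": 1e-4,
    "wreck a nice beach": 1e-7,
    "recognise peach": 1e-8,
}


def decode(candidates, lm_weight=1.0):
    """Return the W maximizing log P(O|W) + lm_weight * log P(W)."""
    def score(w):
        return acoustic_logprob[w] + lm_weight * math.log(lm_prob[w])
    return max(candidates, key=score)


print(decode(acoustic_logprob))                  # language model favors the fluent phrase
print(decode(acoustic_logprob, lm_weight=0.0))   # acoustics alone pick the best-matching sound
```

With the language model switched off, the acoustically best but implausible hypothesis wins. This is why the two kinds of models are combined.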
10
Creating models for recognition
  • Speech data → Transcribe → Train → Acoustic models
  • Text data → Train → Language models
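On the text-data side, the simplest corpus-based language model is a bigram model estimated by counting. The following is a minimal sketch. It assumes a toy corpus and plain maximum-likelihood estimates. Real systems add smoothing for unseen word pairs and train on far more text. Acoustic-model training from transcribed speech is not shown.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model: P(w2 | w1) = c(w1, w2) / c(w1)."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # <s> and </s> mark sentence boundaries
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(words[:-1])                 # every word that precedes another
        bigrams.update(zip(words[:-1], words[1:]))

    def prob(w1, w2):
        if contexts[w1] == 0:
            return 0.0                              # unseen context (no smoothing here)
        return bigrams[(w1, w2)] / contexts[w1]

    return prob


# Hypothetical "text data"; in practice, a large domain corpus
corpus = [
    "i want to fly to boston",
    "i want to go to pittsburgh",
    "show me flights to boston",
]
p = train_bigram_lm(corpus)
print(p("to", "boston"))   # "to" occurs 5 times as a context, 2 of them before "boston"
```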
11
Understanding speech
Grammar
Ontology design, language acquisition
Parser
  • Extract semantic content from utterance

Post parser
  • Introduce context and world knowledge into
    interpretation

Context
Domain Agents
Grounding, knowledge engineering
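To make the parser / post-parser split concrete, here is a sketch that assumes a tiny, hypothetical pattern-based grammar. The parser extracts slot values from the recognized words. The post-parser then fills a missing slot from dialog context. The slot names, city list, and context rule are illustrative only, and world knowledge is not modeled.

```python
import re

# Hypothetical semantic grammar: one pattern per slot over the recognized words.
SLOT_PATTERNS = {
    "destination": re.compile(r"\bto (boston|pittsburgh|denver)\b"),
    "origin": re.compile(r"\bfrom (boston|pittsburgh|denver)\b"),
    "day": re.compile(r"\bon (monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"),
}


def parse(utterance):
    """Parser: extract semantic content (slot -> value) from an utterance."""
    text = utterance.lower()
    frame = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            frame[slot] = match.group(1)
    return frame


def post_parse(frame, context):
    """Post-parser: fill gaps from dialog context (world knowledge omitted here)."""
    resolved = dict(frame)
    # If no origin was said, assume the one already established in the dialog.
    if "origin" not in resolved and "origin" in context:
        resolved["origin"] = context["origin"]
    return resolved


context = {"origin": "pittsburgh"}   # e.g. established earlier in the dialog
print(post_parse(parse("I'd like to go to Boston on Friday"), context))
```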
12
Interacting with the user
Task schemas
Task analysis
Context
Dialog manager
  • Guide interaction through task
  • Map user inputs and system state into actions

Domain agent
  • Interact with back-end(s)
  • Interpret information using domain knowledge

Domain agent
Domain agent
Database
Live data (e.g. Web)
Domain expert
Knowledge engineering
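The following sketches a dialog manager that maps user input plus system state to an action and delegates back-end work to domain agents. The agents, data, and topics are hypothetical stand-ins for a database or live Web source. Note that the hotel request reuses the destination already held in the system state.

```python
# Hypothetical back-end data standing in for a database or live Web source.
FLIGHTS = [
    {"from": "pittsburgh", "to": "boston", "depart": "17:40"},
    {"from": "pittsburgh", "to": "boston", "depart": "08:05"},
]
HOTELS = {"boston": ["Hotel A", "Hotel B"]}


def flight_agent(state):
    """Domain agent: query flights, then apply domain knowledge (order by departure)."""
    hits = [f for f in FLIGHTS
            if f["from"] == state.get("origin") and f["to"] == state.get("destination")]
    return sorted(hits, key=lambda f: f["depart"])


def hotel_agent(state):
    """Domain agent: hotels in the destination city."""
    return HOTELS.get(state.get("destination"), [])


AGENTS = {"flight": flight_agent, "hotel": hotel_agent}


def dialog_manager(user_input, state):
    """Map the user's input and the current system state to an action."""
    state.update(user_input.get("slots", {}))
    topic = user_input.get("topic", state.get("topic"))
    state["topic"] = topic
    agent = AGENTS.get(topic)
    if agent is None:
        return ("prompt", "What would you like to plan: a flight or a hotel?")
    return ("present", agent(state))


state = {}
print(dialog_manager({"topic": "flight",
                      "slots": {"origin": "pittsburgh", "destination": "boston"}}, state))
print(dialog_manager({"topic": "hotel"}, state))   # destination carried over in state
```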
13
Communicating with the user
Language Generator
  • Decide what to say to user (and how to phrase it)

Speech synthesizer
Display Generator
Action Generator
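A language generator can be as simple as a set of templates keyed by dialog act. The following is a minimal sketch that assumes hypothetical act names and templates. Here, deciding what to say reduces to choosing the act, and deciding how to phrase it reduces to filling the template.

```python
# Hypothetical templates keyed by dialog act.
TEMPLATES = {
    "ask_slot": {
        "date": "When would you like to fly?",
        "destination": "Where would you like to go?",
    },
    "confirm": "Okay, {origin} to {destination} on {date}. Is that right?",
}


def generate(act, **slots):
    """Decide what to say (the act) and how to phrase it (template + slot values)."""
    if act == "ask_slot":
        return TEMPLATES["ask_slot"][slots["slot"]]
    if act == "confirm":
        # Capitalize city and day names before inserting them.
        return TEMPLATES["confirm"].format(**{k: v.title() for k, v in slots.items()})
    raise ValueError(f"unknown dialog act: {act}")


print(generate("ask_slot", slot="date"))
print(generate("confirm", origin="pittsburgh", destination="boston", date="friday"))
```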
14
Speech recognition and understanding
  • Sphinx system
    • speaker-independent
    • continuous speech
    • large vocabulary
  • ATIS system
    • air travel information retrieval
    • context management
    • film clip

15
Command and control systems
  • Small vocabularies, fixed syntax
    • OPEN WINDOW <window_id>
    • MOVE OBJECT <object_id> to <position>
  • Applications
    • data entry (e.g., zip codes), process control
      (e.g., electron microscope, darkroom equipment)
  • Large vocabulary, fixed syntax
    • Web browsing (?)
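A fixed-syntax command grammar can be written as one pattern per command. This sketch applies the two commands from the slide to recognized text and assumes single-word slot values. In a command-and-control recognizer, the same grammar would typically constrain the decoder itself, so the recognizer produces only in-grammar strings (or a rejection).

```python
import re

# The slide's two commands as patterns; slot values are assumed to be single words.
COMMANDS = {
    "open_window": re.compile(r"^OPEN WINDOW (?P<window_id>\w+)$"),
    "move_object": re.compile(r"^MOVE OBJECT (?P<object_id>\w+) TO (?P<position>\w+)$"),
}


def interpret(recognized):
    """Return (command, arguments), or None if the input is out of grammar."""
    text = recognized.strip().upper()
    for name, pattern in COMMANDS.items():
        match = pattern.match(text)
        if match:
            return name, match.groupdict()
    return None


print(interpret("open window three"))
print(interpret("move object cube to left"))
print(interpret("please open the window"))   # out of grammar
```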

16
SpeechWear
  • Vehicle inspection task
    • USMC mechanics, fixed inspection form
  • Wearable computer (COTS components)
  • HTML-based task representation
  • film clip

17
Information access
  • Moderate to very large vocabulary
  • IVR and frame-based systems
  • Commercial systems
    • Nuance: http://www.nuance.com/demo/index.html
    • SpeechWorks: http://www.speechworks.com/demos/demos.htm
    • lots of others...

18
IVR and frame-based systems
  • Interactive voice response (IVR)
    • interactions specified by a graph (typically a
      tree)
  • Frame systems
    • ergodic graphs
    • states defined by multi-item forms

19
Graph-based systems
Welcome to Bank ABC! Please say one of the
following: Balance, Hours, Loan, ...
What type of loan are you interested in? Please
say one of the following: Mortgage, Car,
Personal, ...
. . . .
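An IVR graph can be represented as a tree of prompts whose edges are labeled by the keywords the caller may say. The following sketch is built from the Bank ABC prompts above. The leaf prompts are hypothetical placeholders, and menu options that the slide elides are left out.

```python
# IVR menu tree from the Bank ABC prompts; leaf prompts are placeholders.
MENU = {
    "prompt": "Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan.",
    "options": {
        "balance": {"prompt": "(balance sub-dialog)", "options": {}},
        "hours": {"prompt": "(hours sub-dialog)", "options": {}},
        "loan": {
            "prompt": "What type of loan are you interested in? "
                      "Please say one of the following: Mortgage, Car, Personal.",
            "options": {
                "mortgage": {"prompt": "(mortgage sub-dialog)", "options": {}},
                "car": {"prompt": "(car loan sub-dialog)", "options": {}},
                "personal": {"prompt": "(personal loan sub-dialog)", "options": {}},
            },
        },
    },
}


def run_ivr(responses, node=MENU):
    """Walk the tree: each recognized keyword selects one child node."""
    prompts = [node["prompt"]]
    for word in responses:
        child = node["options"].get(word.lower())
        if child is None:
            # Out-of-menu input: stay at the same node and repeat its prompt.
            prompts.append("Sorry, I didn't understand. " + node["prompt"])
            continue
        node = child
        prompts.append(node["prompt"])
    return prompts


for line in run_ivr(["Loan", "Car"]):
    print(line)
```

The caller can move only along the tree's edges. This rigidity is what makes IVR a good fit for shallow, well-structured tasks.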
20
Frame-based systems
  • I would like to fly to Boston
  • I'd like to go to Boston on Friday.
  • When would you like to fly?

21
Frame-based systems
[Diagram: a series of multi-item forms, each with blank fields to fill]
Transition on keyword or phrase
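In a frame-based system, the state is a form, and any utterance may fill any subset of its slots, so the user can supply information in any order. The following minimal sketch models the Boston example, assuming a hypothetical two-slot flight form.

```python
# Hypothetical two-slot flight form.
SLOTS = ("destination", "date")
QUESTIONS = {
    "destination": "Where would you like to fly?",
    "date": "When would you like to fly?",
}


def update(frame, parsed):
    """Any utterance may fill any slots, in any order (ergodic transitions)."""
    return {**frame, **parsed}


def next_action(frame):
    """Ask for the first unfilled slot; act once the form is complete."""
    for slot in SLOTS:
        if slot not in frame:
            return QUESTIONS[slot]
    return f"Looking up flights to {frame['destination']} on {frame['date']}."


frame = update({}, {"destination": "Boston"})   # "I would like to fly to Boston"
print(next_action(frame))                        # asks for the date
frame = update(frame, {"date": "Friday"})
print(next_action(frame))

# "I'd like to go to Boston on Friday" fills both slots at once
print(next_action(update({}, {"destination": "Boston", "date": "Friday"})))
```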
22
Some problems
  • IVR systems work great, but only for
    well-structured (shallow) tasks
  • Frame systems are good for tasks that
    correspond to a single form leading to an action
  • Neither approach does well with more complex
    problem-solving activities

23
Dialog Systems
  • Problem-solving activity: complex task
  • Order of progression through the task depends on
    user goals (which can change) and on system state
    (e.g., the result of a back-end retrieval), and is
    not predictable.
  • Track progress and help the task along
    • mixed-initiative dialog
  • Discourse phenomena
    • Users expect to converse with the system

24
Carnegie Mellon Communicator
  • A dialog system that supports complex problem
    solving in a travel planning domain
    • create an itinerary using air schedule, hotel and
      car information
    • 186 U.S. airports (>140k enplanements/yr)
    • currently >500 world airports
  • Web-based data resources
    • Live and cached flight information
    • Airport, airline, etc. information

25
Value schema/handlers
[Diagram labels: receptors, transform, value, Domain Agent]
26
Compound schema
[Diagram labels: transform, value, Domain Agent; annotation: e.g. SQL query]
27
Schema ordering
[Diagram labels: Schema i / Value i, Schema j / Value j, Schema k / Value k, transform, Value]
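The schema slides survive only as diagram labels, so the following is one possible reading rather than the Communicator's actual design. In this reading, component schemas each produce a value, the values are taken in a fixed order, and a compound schema's transform turns them into a SQL query that a domain agent executes. The table name, columns, flight numbers, and airport codes are hypothetical. The query uses parameter placeholders rather than string formatting.

```python
import sqlite3


def compound_transform(values):
    """Combine ordered component values into a parameterized SQL query."""
    sql = ("SELECT flight, depart FROM flights "
           "WHERE origin = ? AND destination = ? AND day = ? ORDER BY depart")
    # Parameter order follows the (assumed) schema ordering: origin, destination, date.
    return sql, (values["origin"], values["destination"], values["date"])


def flight_domain_agent(conn, query):
    """Hypothetical domain agent: run the query against its back-end."""
    sql, params = query
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight TEXT, origin TEXT, destination TEXT, day TEXT, depart TEXT)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?, ?)", [
    ("XX202", "PIT", "BOS", "Fri", "17:40"),
    ("XX101", "PIT", "BOS", "Fri", "08:05"),
    ("XX303", "PIT", "DEN", "Fri", "09:15"),
])

values = {"origin": "PIT", "destination": "BOS", "date": "Fri"}   # from component schemas
print(flight_domain_agent(conn, compound_transform(values)))
```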
28
Carnegie Mellon Communicator
  • CMU Communicator
    • Call 268-5144
    • The information is accurate: you can use it for
      your own travel planning...

29
User-aware speech interfaces
  • Predictable behavior on the system's part
  • Users communicate at different levels
  • http://www.speech.cs.cmu.edu/air/papers/InterfaceChars.html

30
User-aware speech interfaces
  • Content: task-centric utterances
  • Possibility: "What can I do?"
  • Orientation: "Where are we?"
  • Navigation: moving through the task space
  • Control: verbose/terse, "listen!"
  • Customization: "define this word"
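Before treating an utterance as task content, a system must first decide which level it addresses. The following is a naive keyword-based sketch of that routing. The cue phrases are hypothetical, particularly those for navigation. A real system would need more robust detection than substring matching.

```python
# Hypothetical cue phrases per interaction level; unmatched input is task content.
LEVEL_CUES = {
    "possibility": ("what can i do", "what can i say"),
    "orientation": ("where are we", "where was i"),
    "navigation": ("go back", "start over"),
    "control": ("be verbose", "be terse", "listen"),
    "customization": ("define",),
}


def classify_level(utterance):
    """Return the interaction level an utterance addresses (first matching cue wins)."""
    text = utterance.lower()
    for level, cues in LEVEL_CUES.items():
        if any(cue in text for cue in cues):
            return level
    return "content"


for u in ["What can I do?", "Where are we?", "Be terse", "I'd like to fly to Boston"]:
    print(u, "->", classify_level(u))
```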

31
Speech interface guidelines
  • Speech recognition is errorful
  • System state is often opaque to the user
  • http://www.speech.cs.cmu.edu/air/papers/SpInGuidelines/SpInGuidelines.html

32
Interface guidelines
  • State transparency
  • Input control
  • Error recovery
    • Error detection
    • Error correction
  • Log performance
  • Application integration
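Because recognition is errorful, error detection is often tied to a recognition confidence score. The following sketch assumes a confidence value in [0, 1] and uses hypothetical thresholds. High confidence proceeds with an implicit confirmation, medium confidence triggers an explicit confirmation, and low confidence re-prompts. Each decision is recorded, in the spirit of the "Log performance" guideline.

```python
# Hypothetical thresholds; in practice they would be tuned on logged data.
ACCEPT, REJECT = 0.8, 0.4
log = []   # (hypothesis, confidence, action) records for later analysis


def respond(hypothesis, confidence):
    """Choose a response to a recognized destination based on its confidence score."""
    if confidence >= ACCEPT:
        # Proceed, echoing the value so the user can catch an error (implicit confirmation).
        action, prompt = "implicit_confirm", f"Flying to {hypothesis}. What day?"
    elif confidence >= REJECT:
        # Uncertain: ask the user to confirm explicitly.
        action, prompt = "explicit_confirm", f"Did you say {hypothesis}?"
    else:
        # Too unreliable to use: treat it as an error and re-prompt.
        action, prompt = "reprompt", "Sorry, where would you like to fly?"
    log.append((hypothesis, confidence, action))
    return action, prompt


print(respond("Boston", 0.92))
print(respond("Austin", 0.55))
print(respond("Houston", 0.2))
```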

33
Summary
  • Speech and language communication
  • Dialog structure
  • Interface design