Speech recognition, understanding and conversational interfaces

Slides: 34

Provided by: alexander5

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Speech recognition, understanding and
conversational interfaces

Alexander Rudnicky
School of Computer Science
http://www.cs.cmu.edu/air

2
Outline

Speech
Types of speech interfaces
Speech systems and their structure
Designing speech interfaces
Some applications
SpeechWear
Communicator

3
Speech as a signal

The difference between speech and sound
CD quality vs. intelligible quality
high-quality is 44.1 / 48 kHz
desirable speech bandwidth 0-8 kHz, 16 bits
at 16 bits/sample, 256 kbps (tethered mic)
telephone 64 kbps (and lower)
Compression
MPEG 64 kbps/channel and up (but not speech-optimal)
CELP 16 kbps to 2.4 kbps (optimized for speech)
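
The bit rates on this slide follow from sample rate times bits per sample. A minimal sketch of that arithmetic; the 16 kHz sampling rate is inferred from the 0-8 kHz bandwidth (Nyquist), and the 8 kHz / 8-bit telephone figures are an assumption not stated on the slide:

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# 0-8 kHz bandwidth needs at least 16 kHz sampling (Nyquist).
wideband_speech = pcm_bitrate_kbps(16_000, 16)    # 256.0 kbps, the slide's figure
# Assumed telephone format: 8 kHz sampling, 8 bits per sample.
telephone = pcm_bitrate_kbps(8_000, 8)            # 64.0 kbps
# CD audio for comparison: 44.1 kHz, 16 bits, stereo.
cd_stereo = pcm_bitrate_kbps(44_100, 16, channels=2)

print(wideband_speech, telephone, cd_stereo)
```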

4
Speech for communication

The difference between speech and language
Speech recognition and speech understanding

5
Computers and speech

Transcription
dictation, information retrieval
Command and control
data entry, device control, navigation
Information access
airline schedules, stock quotes
Problem solving
travel planning, logistics

6
Speech system architecture

SIGNAL PROCESSING
DECODING
UNDERSTANDING
DISCOURSE
ACTION

7
Varieties of speech systems
8
A generic speech system
speech
9
Decoding speech
Acoustic models
Language models
Corpus-based statistical models
10
Creating models for recognition
Speech data -> Transcribe -> Train -> Acoustic models
Text data -> Train -> Language models
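
Training a language model from text data can be illustrated with a toy bigram model. This is only a sketch: the three-sentence corpus is made up, the estimates are unsmoothed relative frequencies, and the slides do not say which kind of language model was used.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Return prob(prev, word) estimated by relative frequency (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        history_counts.update(words[:-1])            # every word that has a successor
        bigram_counts.update(zip(words[:-1], words[1:]))

    def prob(prev, word):
        if history_counts[prev] == 0:
            return 0.0                               # unseen history
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob


# Hypothetical training text in the air travel domain.
corpus = [
    "show me flights to boston",
    "show me fares to boston",
    "list flights to denver",
]
prob = train_bigram_model(corpus)
print(prob("show", "me"), prob("to", "boston"), prob("me", "flights"))
```

A practical recognizer would smooth these estimates so that unseen word pairs do not get probability zero.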
11
Understanding speech
Grammar
Ontology design, language acquisition
Parser

Extract semantic content from utterance

Post parser

Introduce context and world knowledge into
interpretation

Context
Domain Agents
Grounding, knowledge engineering
12
Interacting with the user
Task schemas
Task analysis
Context
Dialog manager

Guide interaction through task
Map user inputs and system state into actions

Domain agent

Interact with back-end(s)
Interpret information using domain knowledge

Domain agent
Domain agent
Database
Live data (e.g. Web)
Domain expert
Knowledge engineering
13
Communicating with the user
Language Generator

Decide what to say to user (and how to phrase it)

Speech synthesizer
Display Generator
Action Generator
14
Speech recognition and understanding

Sphinx system
speaker-independent
continuous speech
large vocabulary
ATIS system
air travel information retrieval
context management
film clip

15
Command and control systems

Small vocabularies, fixed syntax
OPEN WINDOW <window_id>
MOVE OBJECT <object_id> to <position>
Applications
data entry (e.g., zip codes), process control (e.g., electron microscope, darkroom equipment)
Large vocabulary, fixed syntax
Web browsing (?)
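
A fixed-syntax command language like the two templates on this slide can be matched with plain patterns once the recognizer has produced text. A minimal sketch, assuming identifiers are single alphanumeric tokens; the pattern table and function name are illustrative, not part of any system described here:

```python
import re

# One pattern per command template; slot names follow the slide.
COMMAND_PATTERNS = {
    "open_window": re.compile(r"^OPEN WINDOW (?P<window_id>\w+)$"),
    "move_object": re.compile(
        r"^MOVE OBJECT (?P<object_id>\w+) TO (?P<position>\w+)$"
    ),
}


def parse_command(text):
    """Return (command_name, slots) or None if no template matches."""
    normalized = " ".join(text.upper().split())   # uppercase, collapse whitespace
    for name, pattern in COMMAND_PATTERNS.items():
        match = pattern.match(normalized)
        if match:
            return name, match.groupdict()
    return None


print(parse_command("open window w3"))
print(parse_command("move object box1 to corner"))
print(parse_command("close window w3"))           # not in the grammar
```

Anything outside the templates is rejected outright, which is what keeps small-vocabulary command systems simple.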

16
SpeechWear

Vehicle inspection task
USMC mechanics, fixed inspection form
Wearable computer (COTS components)
html-based task representation
film clip

17
Information access

Moderate to very large vocabulary
IVR and frame-based systems
Commercial systems
Nuance http://www.nuance.com/demo/index.html
SpeechWorks http://www.speechworks.com/demos/demos.htm
lots of others...

18
IVR and frame-based systems

Interactive voice response (IVR)
interactions specified by a graph (typically a tree)
Frame systems
ergodic graphs
states defined by multi-item forms

19
Graph-based systems
Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan, ...
What type of loan are you interested in? Please say one of the following: Mortgage, Car, Personal, ...
. . . .
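
The bank dialog above can be written down as a menu tree in which each recognized keyword selects the next node. A minimal sketch; the node names and the rule of staying in place on an unrecognized word are assumptions for illustration:

```python
# Each node has a prompt and a keyword -> next-node table.
MENU_TREE = {
    "root": {
        "prompt": "Welcome to Bank ABC! Please say one of the following: "
                  "Balance, Hours, Loan.",
        "next": {"balance": "balance", "hours": "hours", "loan": "loan"},
    },
    "loan": {
        "prompt": "What type of loan are you interested in? Please say one of "
                  "the following: Mortgage, Car, Personal.",
        "next": {"mortgage": "mortgage", "car": "car", "personal": "personal"},
    },
}


def step(state, utterance):
    """Follow the edge for the spoken keyword; stay put if it is not offered."""
    node = MENU_TREE.get(state)
    if node is None:                      # leaf node: no further choices
        return state
    return node["next"].get(utterance.strip().lower(), state)


def run(utterances, state="root"):
    """Return the sequence of states visited for a list of utterances."""
    path = [state]
    for utterance in utterances:
        state = step(state, utterance)
        path.append(state)
    return path


print(run(["Loan", "Car"]))
print(run(["pizza"]))
```

The user can only move along edges the designer drew, which is why this style suits shallow, well-structured tasks.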
20
Frame-based systems

I would like to fly to Boston
I'd like to go to Boston on Friday
When would you like to fly?
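
The exchange on this slide shows the idea of a frame: the system fills whichever slots the user supplies and prompts only for what is still missing. A minimal sketch under stated assumptions: the city list, the keyword spotting, and the destination prompt are hypothetical; only "When would you like to fly?" comes from the slide.

```python
import re

CITIES = {"boston", "pittsburgh", "denver"}          # hypothetical vocabulary
DAYS = {"monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"}
PROMPTS = {
    "destination": "Where would you like to fly?",   # assumed wording
    "date": "When would you like to fly?",
}


def update_frame(frame, utterance):
    """Fill any slots mentioned in the utterance, in whatever order they occur."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for i, word in enumerate(words):
        if word in CITIES and i > 0 and words[i - 1] == "to":
            frame["destination"] = word
        elif word in DAYS:
            frame["date"] = word
    return frame


def next_prompt(frame):
    """Ask for the first empty slot; None means the form is complete."""
    for slot in ("destination", "date"):
        if frame.get(slot) is None:
            return PROMPTS[slot]
    return None


frame = update_frame({"destination": None, "date": None},
                     "I would like to fly to Boston")
print(frame, next_prompt(frame))
```

If the user instead says "I'd like to go to Boston on Friday", both slots are filled in one turn and no follow-up prompt is needed.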

21
Frame-based systems
[Figure: several multi-item forms with blank slots, linked by transitions]
Transition on keyword or phrase
22
Some problems

IVR systems work great, but only for well-structured (shallow) tasks
Frame systems are good for tasks that correspond to a single form leading to an action
Neither approach does well with more complex problem-solving activities

23
Dialog Systems

Problem solving activity complex task
Order of progression through task depends on user goals (which can change) and system state (a back-end retrieval) and is not predictable.
Track progress and help task along
mixed-initiative dialog
Discourse phenomena
Users expect to converse with the system

24
Carnegie Mellon Communicator

A dialog system that supports complex problem solving in a travel planning domain
create an itinerary using air schedule, hotel and car information
186 U.S. airports (>140k enplanements/yr)
currently >500 world airports
Web-based data resources
Live and cached flight information
Airport, airline, etc. information

25
Value schema/handlers
transform
value
receptors
Domain Agent
26
Compound schema
transform
value

e.g. SQL query
Domain Agent
27
Schema ordering
Schema i
Value i
Schema j
Value j
Schema k
Value k
transform
Value
28
Carnegie Mellon Communicator