SpeechBuilder: Facilitating Spoken Dialogue System Creation

Provided by: jimg150

Learn more at: http://people.csail.mit.edu

1
SpeechBuilder: Facilitating Spoken Dialogue System Creation

Eugene Weinstein
Project Oxygen Core Team
MIT Laboratory for Computer Science
ecoder_at_mit.edu

2
Bridging the Experience Gap

Developing robust, mixed-initiative spoken
dialogue systems is difficult
Complex systems can be created by human-language
technology experts

Novice developers must overcome a considerable
technical challenge

SpeechBuilder aims to help novices rapidly create
speech-based systems
Uses intuitive methods for specifying
domain-specific constraints
Automatically configures HLT components using MIT
GALAXY architecture
Leverages future technical advances
Encourages research on portability

3
Baseline Configuration

Gives developer total control over application
functionality

Communication with Galaxy via simple HTTP protocol

Developer Application
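
The slide says only that the application talks to Galaxy over a simple HTTP protocol. As a rough sketch, assuming the recognized action and concept values arrive as URL query parameters (the parameter names `action`, `source`, and `destination` and the function `decode_request` are hypothetical, not taken from the slides), an application might decode a request like this:

```python
from urllib.parse import parse_qs


def decode_request(query_string):
    """Split a query string into an action name and a dict of concept values.

    Assumption: the action arrives in a parameter called "action" and every
    other parameter is a concept key/value pair. The slides do not specify
    the actual wire format.
    """
    params = parse_qs(query_string)
    # parse_qs returns a list per key; keep the first value for simplicity.
    flat = {key: values[0] for key, values in params.items()}
    action = flat.pop("action", None)
    return action, flat


action, concepts = decode_request(
    "action=lookup_flight&source=Boston&destination=Taipei"
)
print(action, concepts)
```

The application stays in full control of what happens next: it can query a back end, build a reply, or ignore the request.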
4
Modified Baseline Configuration (this class)

Still gives developer total control over
application functionality
Frame Relay server exposes Galaxy meaning
representation to app

Developer Application
5
Database Access Configuration

For a speech-based interface to structured data

No programming required: specify table(s) and
constraints

6
Creating a Speech-Based Application
Step 1: Off-line creation and compilation
Step 2: On-line deployment
7
Human Language Technologies
8
Extracting Database Information
What is the phone number for Victor Zue?

Some columns are used to access entries (e.g.,
Name)
Column entries must be incorporated into ASR
and NLU
Some columns are only used in responses (e.g.,
Phone)
Column names must be incorporated into ASR and NLU
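
The distinction between access columns and response columns can be illustrated with a small sketch. It assumes the table is available as a list of dictionaries; the function name `build_vocabulary`, the concept name `property` (borrowed from the Knowledge Representation slide), and the phone numbers are illustrative placeholders, not SpeechBuilder's actual implementation or data.

```python
def build_vocabulary(rows, access_columns, response_columns):
    """Derive concept vocabularies from a table.

    Access columns contribute their *entries* (e.g., every Name value).
    Response columns contribute only their *names* (e.g., the word "phone").
    """
    concepts = {}
    for column in access_columns:
        values = []
        for row in rows:
            # Preserve table order and drop duplicate entries.
            if row[column] not in values:
                values.append(row[column])
        concepts[column.lower()] = values
    # Group response column names under one concept (hypothetical name).
    concepts["property"] = [column.lower() for column in response_columns]
    return concepts


# Placeholder rows; the phone numbers are made up for illustration.
rows = [
    {"Name": "Victor Zue", "Phone": "555-0100"},
    {"Name": "Jim Glass", "Phone": "555-0101"},
]
vocabulary = build_vocabulary(rows, ["Name"], ["Phone"])
print(vocabulary)
```

With this vocabulary, a query such as "What is the phone number for Victor Zue?" contains one response-column name (phone) and one access-column entry (Victor Zue).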

9
Knowledge Representation

Concepts and actions form basis for understanding
Concepts become key/value entries in meaning
representation
city: Boston, New York; day: Monday, Tuesday
Actions provide sentence-level patterns of
specific queries
I want to fly from Boston to Taipei
action=lookup_flight
Action text can be bracketed to define
hierarchical concepts
I want to fly source=(from Boston)
destination=(to Taipei)
source=Boston, destination=Taipei
Concepts and actions used to configure the
following components
Speech Recognition
Natural Language Understanding
Discourse
Database columns define basic concepts
Column names can be grouped into concepts
property: phone, email; weather: snow, rain
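
To make the bracketing idea concrete, here is a minimal sketch of how a bracketed action sentence could be turned into key/value entries. It is not SpeechBuilder's parser: the regular expression, the `build_frame` helper, and the addition of Taipei to the city list are assumptions made for illustration.

```python
import re

# Concept values; Taipei is added here so the example sentence resolves.
CONCEPTS = {"city": ["Boston", "New York", "Taipei"]}

# Matches a named bracket such as source=(from Boston) or source(from Boston).
BRACKET = re.compile(r"(\w+)=?\(([^)]*)\)")


def build_frame(action, action_text, concepts):
    """Return a flat key/value frame for one bracketed action sentence."""
    frame = {"action": action}
    for name, phrase in BRACKET.findall(action_text):
        # Look for any known concept value inside the bracketed phrase.
        for values in concepts.values():
            for value in values:
                if value.lower() in phrase.lower():
                    frame[name] = value
    return frame


frame = build_frame(
    "lookup_flight",
    "I want to fly source=(from Boston) destination=(to Taipei)",
    CONCEPTS,
)
print(frame)
```

The bracket name becomes the key and the concept value found inside the bracket becomes the value, which is how the same city concept can fill two different roles in one sentence.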

10
Language Modeling and Understanding

By default, concepts are used for language
modeling, parsing grammar, and meaning
representation

Will it snow?
weather=snow

Concept usage can be fine-tuned to improve
performance

For language modeling and parsing grammar only
(i.e., no meaning)
For keyword spotting only (i.e., no role in
language modeling)
For fine-grained language modeling with coarser
meaning representation

[Diagram: example weather words (snowfall, sprinkles, snowstorm, breezy, showers, accumulation, snowy, thunderstorm, flurries, blizzard, rainy, rainfall) with example meaning weather=snow]
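
The last option, fine-grained language modeling with a coarser meaning representation, can be sketched as a many-to-one mapping. The grouping of words into snow and rain below is an assumption for illustration, as is the `meaning_of` helper; the slide does not show which words map to which value.

```python
# Many surface words (useful for language modeling) collapse to a few
# coarse values in the meaning representation. Grouping is assumed.
COARSE_MEANING = {
    "snow": "snow",
    "snowfall": "snow",
    "snowstorm": "snow",
    "snowy": "snow",
    "flurries": "snow",
    "blizzard": "snow",
    "rain": "rain",
    "rainy": "rain",
    "rainfall": "rain",
    "showers": "rain",
    "sprinkles": "rain",
}


def meaning_of(utterance):
    """Return a key/value frame with the coarse weather value, if any."""
    frame = {}
    for token in utterance.lower().split():
        word = token.strip("?.,!")
        if word in COARSE_MEANING:
            frame["weather"] = COARSE_MEANING[word]
    return frame


print(meaning_of("Will it snow?"))
print(meaning_of("Any flurries tomorrow?"))
```

The recognizer can still model each word separately, while the application only ever sees a small set of values.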
11
Current Status

SpeechBuilder has been operational for over two
years
Used by over 50 developers from MIT and elsewhere
Used in undergraduate classes at MIT and
Georgetown University
ASR capabilities benchmarked against main systems
Achieves same ASR performance as MIT Jupiter
weather information system (6.8% word error rate
on clean data) (phone )
Several prototype systems have been developed
Information about faculty, staff and students at
LCS and AI Labs (phone, email, room, voice
messages, transfer, etc.)
Application to control the various physical items
in a typical office (lights, curtains, TV, VCR,
projector, etc.)
Others include TV schedules, real-time weather
forecasts, hotel and restaurant information etc.
SpeechBuilder used for initial design of many
more complex domains
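
For readers unfamiliar with the word error rate figure quoted above, the standard definition is the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below computes it with a word-level edit distance; the example sentences are illustrative and are not taken from the benchmark.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref)


wer = word_error_rate(
    "what is the phone number for victor zue",
    "what is the phone number for victor sue",
)
print(wer)  # one substitution in eight words -> 0.125
```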

12
Ongoing and Future Work

Increase sophistication of discourse and dialogue
manager to handle more complex dialogues
Enable finer specification of discourse
capabilities
Add generic capabilities for times, dates, etc.
Incorporate confidence scoring and implement
unsupervised training of acoustic and language
models
Create functionality to allow developers to
create domain-specific concatenative speech
synthesis
Create alternative methods of domain
specifications to streamline development
Advanced developers don't necessarily use web
interface
Allow for more efficient automatic generation of
SpeechBuilder domains

13
Acknowledgements

Issam Bazzi
Scott Cyphers
Ed Filisko
Jim Glass
TJ Hazen
Lee Hetherington
Joe Polifroni
Stephanie Seneff
Michelle Spina
Eugene Weinstein
Jon Yi
Misha Zitser

14
SpeechBuilder Hands-on Activity

Eugene Weinstein
Project Oxygen Core Team
MIT Laboratory for Computer Science
ecoder_at_mit.edu

15
Modified Baseline Configuration (this class)

Still gives developer total control over
application functionality
Frame Relay server exposes Galaxy meaning
representation to app

[Diagram labels: Jaim, Developer Application, Semantic Frame]
16
SpeechBuilder API

Galaxy meaning representation provided through
frame relay
Applications connect via TCP sockets
API provided in Perl, Python, and Java
This class: Python API

[Diagram: Galaxy Frame Relay, TCP Socket, Python API, Application]
Python class galaxy.server.Server
Methods: Constructor(machine, port, ID), connect(), processMessage(blocking), disconnect()
Python class galaxy.frame.Frame
Methods: getAction(), getAttribute(attr_name), getText(), toString()
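
The method names on this slide suggest a connect, process, disconnect cycle. Since the real `galaxy` package is not reproduced here, the sketch below uses stand-in classes (`FakeFrame`, `FakeServer`) that only mimic the listed method names. It also assumes that `processMessage(blocking)` returns a frame object, which the slide does not state, and `handle_frame` and `serve_once` are hypothetical helpers.

```python
class FakeFrame:
    """Stand-in exposing the Frame method names listed on the slide."""

    def __init__(self, action, attributes, text):
        self._action = action
        self._attributes = attributes
        self._text = text

    def getAction(self):
        return self._action

    def getAttribute(self, attr_name):
        return self._attributes.get(attr_name)

    def getText(self):
        return self._text

    def toString(self):
        # Not the real Galaxy frame syntax; just a readable dump.
        return "%s %r" % (self._action, self._attributes)


class FakeServer:
    """Stand-in for the Server class; returns one canned frame."""

    def __init__(self, machine, port, ID):
        self.machine, self.port, self.ID = machine, port, ID
        self.connected = False

    def connect(self):
        self.connected = True

    def processMessage(self, blocking):
        # Assumption: the real call yields the next frame from the relay.
        return FakeFrame(
            "lookup_flight",
            {"source": "Boston", "destination": "Taipei"},
            "I want to fly from Boston to Taipei",
        )

    def disconnect(self):
        self.connected = False


def handle_frame(frame):
    """Application logic: turn a semantic frame into a reply string."""
    if frame.getAction() == "lookup_flight":
        return "Looking up flights from %s to %s" % (
            frame.getAttribute("source"),
            frame.getAttribute("destination"),
        )
    return "Sorry, I did not understand: " + frame.getText()


def serve_once(server):
    """Connect, handle a single frame, and disconnect."""
    server.connect()
    try:
        return handle_frame(server.processMessage(True))
    finally:
        server.disconnect()


# Host name, port, and ID are placeholders.
reply = serve_once(FakeServer("localhost", 9000, "demo"))
print(reply)
```

Keeping the application logic in `handle_frame` means it depends only on the four frame methods, so the stand-ins could be swapped for the class API without changing it, provided the assumptions above hold.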
