Title: SpeechBuilder: Facilitating Spoken Dialogue System Creation
1 SpeechBuilder: Facilitating Spoken Dialogue System Creation
- Eugene Weinstein
- Project Oxygen Core Team
- MIT Laboratory for Computer Science
- ecoder_at_mit.edu
2 Bridging the Experience Gap
- Developing robust, mixed-initiative spoken dialogue systems is difficult
  - Complex systems can be created by human-language technology experts
  - Novice developers must overcome a considerable technical challenge
- SpeechBuilder aims to help novices rapidly create speech-based systems
  - Uses intuitive methods for specifying domain-specific constraints
  - Automatically configures HLT components using MIT GALAXY architecture
  - Leverages future technical advances
  - Encourages research on portability
3 Baseline Configuration
- Gives developer total control over application functionality
- Communication with Galaxy via simple HTTP protocol
[Diagram label: Developer Application]
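The slide does not spell out the HTTP protocol, so the following is only a minimal sketch of what the developer application's side could look like, assuming the recognized action and concepts arrive as URL query parameters. The parameter names (`action`, `source`, `destination`) and the function `handle_query` are hypothetical, chosen for illustration.

```python
from urllib.parse import parse_qs


def handle_query(query_string):
    """Turn a URL query string into a plain-text reply.

    Assumption: the request carries one value per key, e.g.
    action=lookup_flight&source=Boston&destination=Taipei
    """
    # parse_qs returns lists of values; keep the first value of each key.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    action = params.pop("action", None)

    if action == "lookup_flight":
        return "Looking up flights from {} to {}".format(
            params.get("source", "?"), params.get("destination", "?")
        )
    # Anything else is outside this toy application's domain.
    return "Sorry, I did not understand."
```

A function like this could sit behind any HTTP server or CGI script; all application logic stays in the developer's hands, which is the point of the baseline configuration.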
4 Modified Baseline Configuration (this class)
- Still gives developer total control over application functionality
- Frame Relay server exposes Galaxy meaning representation to app
[Diagram label: Developer Application]
5 Database Access Configuration
- For a speech-based interface to structured data
- No programming required: specify table(s) and constraints
6 Creating a Speech-Based Application
Step 1: Off-line creation and compilation
Step 2: On-line deployment
7 Human Language Technologies
8 Extracting Database Information
Example query: "What is the phone number for Victor Zue?"
- Some columns are used to access entries (e.g., Name)
  - Column entries must be incorporated into ASR and NLU
- Some columns are only used in responses (e.g., Phone)
  - Column names must be incorporated into ASR and NLU
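The distinction between access columns and response columns can be sketched with an in-memory SQLite table. This is an illustration only, not SpeechBuilder's implementation: the table name `people`, the helper names, and the phone and email values are all made up for the example.

```python
import sqlite3

# Columns whose *names* the user may ask about (only used in responses).
RESPONSE_COLUMNS = {"phone", "email"}


def build_demo_db():
    """Create a tiny in-memory table with placeholder data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, phone TEXT, email TEXT)")
    conn.execute(
        "INSERT INTO people VALUES (?, ?, ?)",
        ("Victor Zue", "555-0100", "vz@example.org"),  # made-up values
    )
    return conn


def lookup(conn, meaning):
    """Answer a query such as {'property': 'phone', 'name': 'Victor Zue'}."""
    prop = meaning["property"]
    # Column names cannot be bound as SQL parameters, so whitelist them.
    if prop not in RESPONSE_COLUMNS:
        raise ValueError("unknown property: {}".format(prop))
    row = conn.execute(
        "SELECT {} FROM people WHERE name = ?".format(prop),
        (meaning["name"],),  # column *entry* used to access the row
    ).fetchone()
    return row[0] if row else None
```

Here the entries of `name` ("Victor Zue") are what the recognizer must know as vocabulary, while for `phone` and `email` only the column names themselves need to be recognized.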
9 Knowledge Representation
- Concepts and actions form basis for understanding
- Concepts become key/value entries in meaning representation
  - city: Boston, New York
  - day: Monday, Tuesday
- Actions provide sentence-level patterns of specific queries
  - I want to fly from Boston to Taipei -> action=lookup_flight
- Action text can be bracketed to define hierarchical concepts
  - I want to fly source(from Boston) destination(to Taipei) -> source=Boston, destination=Taipei
- Concepts and actions used to configure the following components
  - Speech Recognition
  - Natural Language Understanding
  - Discourse
- Database columns define basic concepts
- Column names can be grouped into concepts
  - property: phone, email
  - weather: snow, rain
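As a rough illustration of how a bracketed action example yields hierarchical key/value pairs, the sketch below parses the `source(...)` and `destination(...)` groups and fills each with the first concept value found inside. The data layout and the function `parse_action_example` are assumptions for this example, not SpeechBuilder's internal format; "Taipei" is added to the city list so the slide's sentence can be parsed.

```python
import re

# Concepts: a name mapped to example values (layout is an assumption).
CONCEPTS = {
    "city": ["Boston", "New York", "Taipei"],
    "day": ["Monday", "Tuesday"],
}

# Matches bracketed groups such as: source(from Boston)
BRACKET = re.compile(r"(\w+)\(([^)]*)\)")


def parse_action_example(text, concepts=CONCEPTS):
    """Extract {bracket_name: concept_value} pairs from a bracketed example."""
    meaning = {}
    for key, inner in BRACKET.findall(text):
        for values in concepts.values():
            for value in values:
                # Whole-word match of a known concept value inside the bracket.
                if re.search(r"\b" + re.escape(value) + r"\b", inner):
                    meaning.setdefault(key, value)
    return meaning


example = "I want to fly source(from Boston) destination(to Taipei)"
print(parse_action_example(example))
```

The same city concept fills two different roles here, which is what the bracketing buys: without it, the meaning representation could not tell the origin from the destination.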
10 Language Modeling and Understanding
- By default, concepts are used for language modeling, parsing grammar, and meaning representation
  - Example: "Will it snow?" -> weather=snow
- Concept usage can be fine-tuned to improve performance
  - For language modeling and parsing grammar only (i.e., no meaning)
  - For keyword spotting only (i.e., no role in language modeling)
  - For fine-grained language modeling with coarser meaning representation
[Figure: word list (snowfall, sprinkles, snowstorm, breezy, showers, accumulation, snowy, thunderstorm, flurries, blizzard, rainy, rainfall) shown alongside the meaning representation weather=snow]
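The last option, fine-grained language modeling with a coarser meaning representation, can be pictured as a many-to-one mapping: the recognizer models each word separately, but several words collapse to the same value in the meaning representation. The grouping below is an assumption made for illustration; the slide does not say which words map to which value.

```python
# Fine-grained vocabulary mapped to a coarse meaning value.
# The word-to-value assignment here is illustrative, not from the slides.
WORD_TO_WEATHER = {
    "snow": "snow", "snowfall": "snow", "snowstorm": "snow",
    "snowy": "snow", "flurries": "snow", "blizzard": "snow",
    "rainy": "rain", "rainfall": "rain", "showers": "rain",
    "sprinkles": "rain",
}


def meaning_of(utterance):
    """Return the coarse meaning for the first known weather word, if any."""
    for word in utterance.lower().split():
        word = word.strip("?.,!")  # drop trailing punctuation
        if word in WORD_TO_WEATHER:
            return {"weather": WORD_TO_WEATHER[word]}
    return {}


print(meaning_of("Will there be flurries?"))
```

The application then only has to handle a handful of meaning values, while the language model still benefits from seeing each surface word individually.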
11 Current Status
- SpeechBuilder has been operational for over two years
  - Used by over 50 developers from MIT and elsewhere
  - Used in undergraduate classes at MIT and Georgetown University
- ASR capabilities benchmarked against main systems
  - Achieves same ASR performance as MIT Jupiter weather information system (6.8% word error rate on clean data) (phone)
- Several prototype systems have been developed
  - Information about faculty, staff and students at LCS and AI Labs (phone, email, room, voice messages, transfer, etc.)
  - Application to control the various physical items in a typical office (lights, curtains, TV, VCR, projector, etc.)
  - Others include TV schedules, real-time weather forecasts, hotel and restaurant information, etc.
- SpeechBuilder used for initial design of many more complex domains
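For readers unfamiliar with the benchmark figure above, word error rate is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a generic implementation of that standard definition, not the evaluation tool used for the reported result; the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than the current row) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substitution (snow -> know) and one deletion (in) over 5 words.
print(word_error_rate("will it snow in boston", "will it know boston"))
```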
12 Ongoing and Future Work
- Increase sophistication of discourse and dialogue manager to handle more complex dialogues
  - Enable finer specification of discourse capabilities
  - Add generic capabilities for times, dates, etc.
- Incorporate confidence scoring and implement unsupervised training of acoustic and language models
- Create functionality to allow developers to create domain-specific concatenative speech synthesis
- Create alternative methods of domain specifications to streamline development
  - Advanced developers don't necessarily use web interface
  - Allow for more efficient automatic generation of SpeechBuilder domains
13 Acknowledgements
- Issam Bazzi
- Scott Cyphers
- Ed Filisko
- Jim Glass
- TJ Hazen
- Lee Hetherington
- Joe Polifroni
- Stephanie Seneff
- Michelle Spina
- Eugene Weinstein
- Jon Yi
- Misha Zitser
14 SpeechBuilder Hands-on Activity
- Eugene Weinstein
- Project Oxygen Core Team
- MIT Laboratory for Computer Science
- ecoder_at_mit.edu
15 Modified Baseline Configuration (this class)
- Still gives developer total control over application functionality
- Frame Relay server exposes Galaxy meaning representation to app
[Diagram labels: Jaim, Developer Application, Semantic Frame]
16 SpeechBuilder API
- Galaxy meaning representation provided through frame relay
  - Applications connect via TCP sockets
  - API provided in Perl, Python, and Java
  - This class: Python API
- Python class galaxy.server.Server
  - Methods: Constructor(machine, port, ID), connect(), processMessage(blocking), disconnect()
- Python class galaxy.frame.Frame
  - Methods: getAction(), getAttribute(attr_name), getText(), toString()
[Diagram labels: Galaxy Frame Relay, TCP Socket, Python API, Application]
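The slide lists only method names, so the sketch below uses small stand-in classes that mimic those names rather than importing the real `galaxy` package. Return values, argument meanings, and how replies are sent back are all assumptions; the stand-in `Server` replays canned frames instead of opening a TCP socket, and the host, port, and ID values are hypothetical.

```python
class Frame:
    """Stand-in with the method names listed for galaxy.frame.Frame."""

    def __init__(self, action, attributes, text):
        self._action = action
        self._attributes = dict(attributes)
        self._text = text

    def getAction(self):
        return self._action

    def getAttribute(self, attr_name):
        return self._attributes.get(attr_name)

    def getText(self):
        return self._text

    def toString(self):
        return "{}: {}".format(self._action, sorted(self._attributes.items()))


class Server:
    """Stand-in for galaxy.server.Server; no real network I/O happens."""

    def __init__(self, machine, port, ID, canned_frames=()):
        self.machine, self.port, self.ID = machine, port, ID
        self._frames = list(canned_frames)
        self.connected = False

    def connect(self):
        self.connected = True

    def processMessage(self, blocking):
        # Assumption: returns the next frame, or None when nothing is left.
        return self._frames.pop(0) if self._frames else None

    def disconnect(self):
        self.connected = False


def respond(frame):
    """Application logic: map a semantic frame to a reply string."""
    if frame.getAction() == "lookup_flight":
        return "Flights from {} to {}".format(
            frame.getAttribute("source"), frame.getAttribute("destination"))
    return "Sorry, I did not understand: " + frame.getText()


def run(server):
    """Connect, handle frames until none remain, then disconnect."""
    replies = []
    server.connect()
    try:
        while True:
            frame = server.processMessage(True)
            if frame is None:
                break
            replies.append(respond(frame))
    finally:
        server.disconnect()
    return replies
```

With the real API, the stand-in classes would presumably be replaced by `galaxy.server.Server` and the frames it delivers, while `respond` would remain the part the developer writes.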