Title: Stochastic Language Generation for Dialog Systems
1. Stochastic Language Generation for Dialog Systems
- Alice Oh
- aliceo_at_cs.cmu.edu
- 27 September 2009
2. Motivation
- To build a generation engine for a dialog system that combines the advantages, and overcomes the difficulties, of the two current approaches: template-based generation and traditional linguistic NLG
3. General Spec of NLG Input, Output, Parts (from Kevin Knight's presentation at the DARPA Communicator Meeting, June 1999)
- [Diagram: an NLG pipeline of macro planning, micro planning, sentence realizing, prosody, and speech synthesis, with intermediate representations between the stages (database + discourse history; speech acts + discourse history; semantic frame, pronouns, speech act; labeled syntactic bracketing, speech act; SABLE; .wav file) and systems positioned along it: CMU, SRI CTalk, SRI Comm, MIT Comm, MIT/Envoice, chatterbot.]
- Intermediate reps all allow for varying depth.
4. Current Approaches
- Traditional NLG produces high-quality output but needs hand-crafted rules and other knowledge sources.
- Template-based NLG offers lower quality but is simple to build.
- Recent workshop on NLG vs. templates:
  - http://www.dfki.de/service/NLG/KI99.html
5. Our Approach
- corpus-driven
- easy to build (no expert knowledge)
- fast prototyping
- minimal input
- natural output
- leverages data-collecting/tagging effort
- modular (enables plug-n-play)
6. Stochastic NLG Overview
- Language model: an n-gram language model built from a corpus of travel reservation dialogs
- Generation: given an utterance class, randomly generates a set of candidate utterances based on the LM distributions
- Scoring: based on a set of rules, scores the candidates and picks the best one
- Slot filling: substitutes slots in the utterance with the appropriate values from the input frame
7. Stochastic NLG can also be thought of as a way to automatically build templates from a corpus
- If you set n to a large enough number, most utterances generated by LM-NLG will be exact duplicates of the utterances in the corpus.
8. Stochastic NLG Language Model
- Human-Human dialogs in travel reservations
- (Leah, ATIS/American Express dialogs)
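The class-conditioned language model can be sketched as a per-class n-gram count table. A minimal illustration; the function name, corpus format, and dict layout are assumptions for this sketch, not the actual Communicator code:

```python
from collections import defaultdict

def build_class_lms(tagged_corpus, n=3):
    """Build one n-gram count table per utterance class.

    tagged_corpus: list of (utterance_class, token_list) pairs,
    e.g. ("query_depart_time", ["what", "time", "do", "you", "leave"]).
    Returns {class: {(n-1)-token context tuple: {next_token: count}}}.
    """
    lms = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for uclass, tokens in tagged_corpus:
        # Pad with boundary markers so the model also learns how
        # utterances of this class begin and end.
        padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            lms[uclass][context][padded[i]] += 1
    return lms
```

Counting is all the training there is: the generation step can sample directly from these count tables.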
9. Tags
- Utterance classes (29)
  - query_arrive_city, query_arrive_time, query_arrive_time, query_confirm, query_depart_date, query_depart_time, query_pay_by_card, query_preferred_airport, query_return_date, query_return_time
  - hotel_car_info, hotel_hotel_chain, hotel_hotel_info, hotel_need_car, hotel_need_hotel, hotel_where
  - inform_airport, inform_confirm_utterance, inform_epilogue, inform_flight, inform_flight_another, inform_flight_earlier, inform_flight_earliest, inform_flight_later, inform_flight_latest, inform_not_avail, inform_num_flights, inform_price
  - other
- Attributes (24)
  - airline, am, arrive_airport, arrive_city, arrive_date, arrive_time, car_company, car_price, connect_airline, connect_airport, connect_city, depart_airport, depart_city, depart_date, depart_time, depart_tod
  - flight_num, hotel, hotel_city, hotel_price, name, num_flights, pm, price
10. Tagging
- CMU corpus tagged manually
- SRI corpus tagged semi-automatically using
trigram language models built from CMU corpus
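The semi-automatic step can be sketched as assigning each untagged SRI utterance the class whose CMU-trained trigram model scores it highest. A hedged illustration; add-one smoothing, the function name, and the dict layout are assumptions of this sketch:

```python
import math

def classify_utterance(tokens, class_lms, vocab_size, n=3):
    """Return the utterance class whose n-gram count table gives the
    tokens the highest add-one-smoothed log-probability.

    class_lms: {class: {(n-1)-gram context tuple: {next_token: count}}}
    """
    best_class, best_lp = None, float("-inf")
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    for uclass, lm in class_lms.items():
        lp = 0.0
        for i in range(n - 1, len(padded)):
            counts = lm.get(tuple(padded[i - n + 1:i]), {})
            total = sum(counts.values())
            # Add-one smoothing keeps unseen n-grams from zeroing out
            # the whole utterance.
            lp += math.log((counts.get(padded[i], 0) + 1) / (total + vocab_size))
        if lp > best_lp:
            best_class, best_lp = uclass, lp
    return best_class
```

A human would then review the assigned tags, which is what makes the process semi-automatic rather than fully automatic.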
11. Stochastic NLG Generation
- Given an utterance class, randomly generates a set of candidate utterances based on the LM distributions
- Generation stops when an utterance has a penalty score of 0 or the maximum number of iterations (50) has been reached
- Average time: 238 msec for Communicator dialogs
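The candidate sampler can be sketched as a random walk over the class's n-gram counts. A minimal sketch, assuming the LM is stored as a dict mapping (n-1)-token context tuples to next-token count dicts; names are illustrative:

```python
import random

def sample_utterance(class_lm, n=3, max_len=30):
    """Randomly sample one candidate utterance from an n-gram count table.

    class_lm: {(n-1)-token context tuple: {next_token: count}}.
    Next tokens are drawn in proportion to their counts, i.e. from
    the LM distribution.
    """
    tokens = ["<s>"] * (n - 1)
    while len(tokens) - (n - 1) < max_len:
        counts = class_lm.get(tuple(tokens[-(n - 1):]))
        if not counts:
            break  # unseen context: stop this candidate
        words = list(counts)
        next_word = random.choices(words, weights=[counts[w] for w in words])[0]
        if next_word == "</s>":
            break
        tokens.append(next_word)
    return tokens[n - 1:]
```

In the full loop, candidates would be drawn repeatedly (up to the 50 iterations the slide mentions) until one scores a penalty of 0.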
12. Stochastic NLG Scoring
- Assign various penalty scores for:
  - unusual length of utterance (thresholds for too long and too short)
  - a slot in the generated utterance with an invalid (or no) value in the input frame
  - a new and required attribute in the input frame that's missing from the generated utterance
  - repeated slots in the generated utterance
- Pick the utterance with the lowest penalty (or stop generating at an utterance with 0 penalty)
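The four penalty rules can be sketched as a single scoring function. All thresholds, weights, and names below are illustrative guesses, not the system's actual values:

```python
def penalty(tokens, frame, slot_names,
            min_len=3, max_len=20,
            w_len=1, w_bad_slot=2, w_missing=2, w_repeat=1):
    """Score a candidate utterance; 0 means no detected problems.

    frame: {attribute: value or None}; slot_names: set of slot tokens.
    """
    score = 0
    if not (min_len <= len(tokens) <= max_len):
        score += w_len                      # unusual utterance length
    slots = [t for t in tokens if t in slot_names]
    for s in set(slots):
        if frame.get(s) is None:
            score += w_bad_slot             # slot with no/invalid value in frame
        if slots.count(s) > 1:
            score += w_repeat               # repeated slot
    for attr, value in frame.items():
        if value is not None and attr in slot_names and attr not in slots:
            score += w_missing              # required attribute left out
    return score
```

Candidates are compared on this score, and generation can stop early as soon as one scores 0.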
13. Stochastic NLG Slot Filling
- Substitute slots in the utterance with the appropriate values from the input frame
- Example:
  - What time do you need to arrive in arrive_city?
  - What time do you need to arrive in New York?
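Slot filling itself is a simple token substitution. A minimal sketch, assuming slot tokens carry the attribute name and the frame maps attribute names to values:

```python
def fill_slots(tokens, frame):
    """Replace each slot token with its value from the input frame;
    non-slot tokens pass through unchanged. (Candidates whose slots
    have no frame value should already have been rejected by scoring.)
    """
    return " ".join(str(frame.get(t, t)) for t in tokens)
```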
14. Examples
- I have a u.s. air flight at ten ten a.m. from pittsburgh arriving at twelve eleven p.m.
- I have a flight departing seattle at one thirty arrives into pittsburgh international at eight fifty seven.
- There is a u.s. air flight departing pittsburgh at ten ten a.m. arriving at twelve eleven p.m.
- Which one is the template?
- You WILL know after calling the template-based system a few times.
15. Rejected Examples
- Not enough info:
  - There is a flight at depart_time ampm.
- Contains attributes not specified in the frame:
  - I have an airline flight at depart_time ampm from depart_city arriving at arrive_time ampm with a stop-over in connect_city at connect_airport.
- Scoring to get the best utterance is important!
16. Stochastic NLG Shortcomings
- What might sound natural for a human speaker (imperfect grammar, intentional omission of words, etc.) may sound awkward (or wrong) coming from the system.
- It is difficult to define utterance boundaries and utterance classes. Some utterances in the corpus may be a conjunction of more than one utterance class.
- Factors other than the utterance class may affect the words (e.g., discourse history).
- Some sophistication built into traditional NLG engines is not available (e.g., aggregation, anaphorization).
17. Related Work
- Statistical NLG
- Irene Langkilde and Kevin Knight (USC/ISI)
- Jon Oberlander and Chris Brew (U. Edinburgh)
- NLG in dialog systems
- Amanda Stent (U. Rochester)
- Lena Santamarta (Linköping Univ.)
18. Evaluation
- User satisfaction questionnaire
- Comparative evaluation:
  - two systems with different NLG
  - a human reads the output aloud, teasing out TTS effects
  - compare task completion as well as user satisfaction
- Batch-mode generation, with output evaluated by a human grader
19. Future Work
- How big a corpus do we need?
- How much of it needs manual tagging?
- How does the n in n-gram affect the output?
- What happens to the output when two different human speakers are modeled in one model?
- Can we replace scoring with a search algorithm?