Title: Stochastic Language Generation for Dialog Systems
1. Stochastic Language Generation for Dialog Systems
- Alice Oh
- aliceo_at_cs.cmu.edu
- 27 September 2009
2. Motivation
- To build a generation engine for a dialog system that combines the advantages, and overcomes the difficulties, of the two current approaches: template-based generation and traditional linguistic NLG
3. General Spec of NLG Input, Output, Parts (from Kevin Knight's presentation at the DARPA Communicator Meeting, June 1999)
- [Diagram: an NLG pipeline of macro planning, micro planning, sentence realizing, prosody, and speech synthesis, with intermediate representations between the stages (database + discourse history; speech acts + discourse history; semantic frame, pronouns, speech act; labeled syntactic bracketing, speech act; SABLE; .wav file) and systems positioned along it: CMU, SRI CTalk, SRI Comm, MIT Comm, MIT/Envoice, chatterbot.]
- Intermediate reps all allow for varying depth.
4. Current Approaches
- Traditional NLG produces high-quality output but needs hand-crafted rules and other knowledge sources.
- Template-based NLG offers lower quality but is simple to build.
- Recent workshop on NLG vs. templates:
  - http://www.dfki.de/service/NLG/KI99.html
5. Our Approach
- corpus-driven
- easy to build (no expert knowledge)
- fast prototyping
- minimal input
- natural output
- leverages data-collecting/tagging effort
- modular (enables plug-n-play)
6. Stochastic NLG Overview
- Language model: an n-gram language model built from a corpus of travel reservation dialogs
- Generation: given an utterance class, randomly generates a set of candidate utterances based on the LM distributions
- Scoring: based on a set of rules, scores the candidates and picks the best one
- Slot filling: substitutes slots in the utterance with the appropriate values from the input frame
7. Stochastic NLG can also be thought of as a way to automatically build templates from a corpus
- If you set n to a large enough number, most utterances generated by LM-NLG will be exact duplicates of the utterances in the corpus.
8. Stochastic NLG Language Model
- Human-Human dialogs in travel reservations
- (Leah, ATIS/American Express dialogs)
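The class-conditioned language model can be sketched as a per-class n-gram count table. A minimal illustration; the function name, corpus format, and dict layout are assumptions for this sketch, not the actual Communicator code:

```python
from collections import defaultdict

def build_class_lms(tagged_corpus, n=3):
    """Build one n-gram count table per utterance class.

    tagged_corpus: list of (utterance_class, token_list) pairs,
    e.g. ("query_depart_time", ["what", "time", "do", "you", "leave"]).
    Returns {class: {(n-1)-token context tuple: {next_token: count}}}.
    """
    lms = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for uclass, tokens in tagged_corpus:
        # Pad with boundary markers so the model also learns how
        # utterances of this class begin and end.
        padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            lms[uclass][context][padded[i]] += 1
    return lms
```

Counting is all the training there is: the generation step can sample directly from these count tables.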
9. Tags
- Utterance classes (29)
  - query_arrive_city, query_arrive_time, query_arrive_time, query_confirm, query_depart_date, query_depart_time, query_pay_by_card, query_preferred_airport, query_return_date, query_return_time
  - hotel_car_info, hotel_hotel_chain, hotel_hotel_info, hotel_need_car, hotel_need_hotel, hotel_where
  - inform_airport, inform_confirm_utterance, inform_epilogue, inform_flight, inform_flight_another, inform_flight_earlier, inform_flight_earliest, inform_flight_later, inform_flight_latest, inform_not_avail, inform_num_flights, inform_price
  - other
- Attributes (24)
  - airline, am, arrive_airport, arrive_city, arrive_date, arrive_time, car_company, car_price, connect_airline, connect_airport, connect_city, depart_airport, depart_city, depart_date, depart_time, depart_tod
  - flight_num, hotel, hotel_city, hotel_price, name, num_flights, pm, price
10. Tagging
- CMU corpus tagged manually
- SRI corpus tagged semi-automatically using
trigram language models built from CMU corpus
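The semi-automatic step can be sketched as assigning each untagged SRI utterance the class whose CMU-trained trigram model scores it highest. A hedged illustration; add-one smoothing, the function name, and the dict layout are assumptions of this sketch:

```python
import math

def classify_utterance(tokens, class_lms, vocab_size, n=3):
    """Return the utterance class whose n-gram count table gives the
    tokens the highest add-one-smoothed log-probability.

    class_lms: {class: {(n-1)-gram context tuple: {next_token: count}}}
    """
    best_class, best_lp = None, float("-inf")
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    for uclass, lm in class_lms.items():
        lp = 0.0
        for i in range(n - 1, len(padded)):
            counts = lm.get(tuple(padded[i - n + 1:i]), {})
            total = sum(counts.values())
            # Add-one smoothing keeps unseen n-grams from zeroing out
            # the whole utterance.
            lp += math.log((counts.get(padded[i], 0) + 1) / (total + vocab_size))
        if lp > best_lp:
            best_class, best_lp = uclass, lp
    return best_class
```

A human would then review the assigned tags, which is what makes the process semi-automatic rather than fully automatic.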
11. Stochastic NLG Generation
- Given an utterance class, randomly generates a set of candidate utterances based on the LM distributions
- Generation stops when an utterance has a penalty score of 0 or the maximum number of iterations (50) has been reached
- Average time: 238 msec for Communicator dialogs
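The candidate sampler can be sketched as a random walk over the class's n-gram counts. A minimal sketch, assuming the LM is stored as a dict mapping (n-1)-token context tuples to next-token count dicts; names are illustrative:

```python
import random

def sample_utterance(class_lm, n=3, max_len=30):
    """Randomly sample one candidate utterance from an n-gram count table.

    class_lm: {(n-1)-token context tuple: {next_token: count}}.
    Next tokens are drawn in proportion to their counts, i.e. from
    the LM distribution.
    """
    tokens = ["<s>"] * (n - 1)
    while len(tokens) - (n - 1) < max_len:
        counts = class_lm.get(tuple(tokens[-(n - 1):]))
        if not counts:
            break  # unseen context: stop this candidate
        words = list(counts)
        next_word = random.choices(words, weights=[counts[w] for w in words])[0]
        if next_word == "</s>":
            break
        tokens.append(next_word)
    return tokens[n - 1:]
```

In the full loop, candidates would be drawn repeatedly (up to the 50 iterations the slide mentions) until one scores a penalty of 0.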
12. Stochastic NLG Scoring
- Assign various penalty scores for:
  - unusual length of utterance (thresholds for too long and too short)
  - a slot in the generated utterance with an invalid (or no) value in the input frame
  - a new and required attribute in the input frame that's missing from the generated utterance
  - repeated slots in the generated utterance
- Pick the utterance with the lowest penalty (or stop generating at an utterance with 0 penalty)
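The four penalty rules can be sketched as a single scoring function. All thresholds, weights, and names below are illustrative guesses, not the system's actual values:

```python
def penalty(tokens, frame, slot_names,
            min_len=3, max_len=20,
            w_len=1, w_bad_slot=2, w_missing=2, w_repeat=1):
    """Score a candidate utterance; 0 means no detected problems.

    frame: {attribute: value or None}; slot_names: set of slot tokens.
    """
    score = 0
    if not (min_len <= len(tokens) <= max_len):
        score += w_len                      # unusual utterance length
    slots = [t for t in tokens if t in slot_names]
    for s in set(slots):
        if frame.get(s) is None:
            score += w_bad_slot             # slot with no/invalid value in frame
        if slots.count(s) > 1:
            score += w_repeat               # repeated slot
    for attr, value in frame.items():
        if value is not None and attr in slot_names and attr not in slots:
            score += w_missing              # required attribute left out
    return score
```

Candidates are compared on this score, and generation can stop early as soon as one scores 0.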
13. Stochastic NLG Slot Filling
- Substitute slots in the utterance with the appropriate values from the input frame
- Example:
  - What time do you need to arrive in arrive_city?
  - What time do you need to arrive in New York?
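Slot filling itself is a simple token substitution. A minimal sketch, assuming slot tokens carry the attribute name and the frame maps attribute names to values:

```python
def fill_slots(tokens, frame):
    """Replace each slot token with its value from the input frame;
    non-slot tokens pass through unchanged. (Candidates whose slots
    have no frame value should already have been rejected by scoring.)
    """
    return " ".join(str(frame.get(t, t)) for t in tokens)
```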
14. Examples
- I have a u.s. air flight at ten ten a.m. from pittsburgh arriving at twelve eleven p.m.
- I have a flight departing seattle at one thirty arrives into pittsburgh international at eight fifty seven.
- There is a u.s. air flight departing pittsburgh at ten ten a.m. arriving at twelve eleven p.m.
- Which one is the template?
- You WILL know after calling the template-based system a few times.
15. Rejected Examples
- Not enough info:
  - There is a flight at depart_time ampm.
- Contains attributes not specified in the frame:
  - I have an airline flight at depart_time ampm from depart_city arriving at arrive_time ampm with a stop-over in connect_city at connect_airport.
- Scoring to get the best utterance is important!
16. Stochastic NLG Shortcomings
- What might sound natural for a human speaker (imperfect grammar, intentional omission of words, etc.) may sound awkward (or wrong) coming from the system.
- It is difficult to define utterance boundaries and utterance classes. Some utterances in the corpus may be a conjunction of more than one utterance class.
- Factors other than the utterance class may affect the words (e.g., discourse history).
- Some sophistication built into traditional NLG engines is not available (e.g., aggregation, anaphorization).
17. Related Work
- Statistical NLG
- Irene Langkilde and Kevin Knight (USC/ISI)
- Jon Oberlander and Chris Brew (U. Edinburgh)
- NLG in dialog systems
- Amanda Stent (U. Rochester)
- Lena Santamarta (Linköping Univ.)
18. Evaluation
- User satisfaction questionnaire
- Comparative evaluation:
  - two systems with different NLG
  - a human reads the output aloud, teasing out TTS effects
  - compare task completion as well as user satisfaction
- Batch-mode generation, with output evaluated by a human grader
19. Future Work
- How big a corpus do we need?
- How much of it needs manual tagging?
- How does the n in n-gram affect the output?
- What happens to the output when two different human speakers are modeled in one model?
- Can we replace scoring with a search algorithm?