1
Learning the Structure of Task-Oriented
Conversations from the Corpus

Ananlada Chotimongkol
LTI Ph.D. thesis proposal
Thesis Committee
Alexander Rudnicky (Chair)
William Cohen
Carolyn Penstein Rose
Gokhan Tur (AT&T Labs Research)

2
Outline

Introduction to the problem
Approach
Research program
Summary

4
Building a new dialog system
[Figure: the components of a dialog system (Speech Recognizer, Natural Language Understanding, Dialog Manager, Natural Language Generator, Speech Synthesizer, Domain Knowledge) with the example exchange "When would you like to leave?" / "I would like to fly to Seattle tomorrow."]
5
Domain knowledge

Steps in the task
Specify the desired flight
Search for flights that match the criteria
Negotiate the flights
Make a reservation
Important information (keywords)
Destination, date, time, airlines, etc.
Domain language: how people talk

6
What is the problem?
[Figure: the dialog system components from the previous slide (Domain Knowledge, Speech Recognizer, Natural Language Understanding, Dialog Manager, Natural Language Generator, Speech Synthesizer) with the example exchange "When would you like to leave?" / "I would like to fly to Seattle tomorrow."]

Can't reuse
Time consuming
May need an expert
7
Research goal

Reduce the human effort required to acquire domain knowledge when creating a dialog system in a new domain

8
Outline

Introduction to the problem
Approach
Research Program
Summary

9
Observations

Task-oriented conversations have a clear
structure
Reflects domain information, e.g., a task is divided into sub-tasks
Has recurring patterns that are observable
through the language

10
Thesis statement

Approach
Identify the structure of task-oriented dialogs
Learn the structure from observations

Develop a learning system that is able to
identify all necessary domain knowledge required
by a dialog system in a task-oriented domain
through the observation of human-human
conversations
11
Desired structure properties

Sufficient
Capture all domain knowledge required to carry
out the task
General (domain-independent)
Can describe dialogs of dissimilar types and domains
Learnable
Can be learned from data using a machine learning
technique

12
Previous Approaches

Theory-oriented
Theory of Discourse Structure (Grosz and Sidner,
1986)
Discourse Representation Theory (DRT) (Kamp and
Reyle, 1993)
Engineering-oriented
Plan-based theory (Allen and Perrault, 1980)
The theory of Conversation Acts (Traum and
Hinkelman, 1992)

13
Outline

Introduction to the problem
Approach
Form-based dialog structure
Dialog structure learning
Research Program
Summary

14
Form-based dialog structure

Use a form-based dialog architecture to represent the structure of a dialog
Concrete mapping between structure components and
dialog system components
Sufficient for an information-accessing task
General enough to represent other types of
task-oriented dialogs
Through the analysis of dialogs
Learnable from a corpus of human-human
conversations
Preliminary experiments on concept clustering

15
Form-based structure components

Task Structure
Domain information necessary for achieving the
task goal
Dialog mechanism
The mechanisms that the participants use to
advance the dialog toward the goal

16
Task structure

Data representation for domain information
Task: a subset of dialogs that has a specific goal → a set of forms
Sub-task: a step in a task that contributes toward the task goal → a form
Concept: key information → a slot
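The mapping above (task → set of forms, sub-task → form, concept → slot) can be sketched as plain Python data classes. This is an illustration under assumptions, not the thesis's implementation: the names `Form`, `Task`, and `make_form` are hypothetical, and a slot value of `None` stands for "not filled yet".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Form:
    """A sub-task, represented as a form whose slots hold concept values."""
    name: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def unfilled(self) -> List[str]:
        """Names of slots that have not received a value yet."""
        return [slot for slot, value in self.slots.items() if value is None]


@dataclass
class Task:
    """A task (dialogs sharing one goal), represented as a set of forms."""
    goal: str
    forms: List[Form] = field(default_factory=list)


def make_form(name: str, slot_names: List[str]) -> Form:
    """Create a form with every slot empty (None)."""
    return Form(name, {slot: None for slot in slot_names})


# Hypothetical instance: the map reading task with its line-segment sub-task.
route = Task(
    goal="draw a route on a map",
    forms=[make_form("Line_Segment",
                     ["Origin", "Orientation", "Distance", "Path", "Destination"])],
)
print(route.forms[0].unfilled())
# ['Origin', 'Orientation', 'Distance', 'Path', 'Destination']
```

A dialog system could use `unfilled()` to decide which concept to ask about next; that use is an assumption, not something the slides specify.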

17
Task structure example: bus schedule enquiry domain

Task (multiple tasks)
Which bus runs between A and B?
When will the bus X arrive?
Sub-tasks: no further decomposition
Concepts
Bus number: 61C, 28X, …
Location: CMU, airport, …

18
Task structure example: map reading domain

Task: draw a route on a map
Sub-tasks
Draw a segment of a route
Concepts
Landmark: White_Mountain, Machete, …
Orientation: down, left, …
Distance: a couple of centimeters, an inch, …

19
Dialog mechanisms (form operators)

Task-oriented operations
Manipulate a form (data structure)
e.g., init_form, fill_form
Discourse-oriented operations
Manage the flow of a conversation
e.g., acknowledgement, greeting
Domain independent
An operation has the same effect in every domain; only its parameters differ
Fill city_name in the flight_information form
Fill landmark in the line_segment form
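A minimal sketch of why task-oriented operations can be domain independent: `fill_form` below behaves identically whether it fills `city_name` in a `flight_information` form or `landmark` in a `line_segment` form; only its arguments change. Representing a form as a dictionary, and the extra slots `date` and `orientation`, are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional

FormDict = Dict[str, Optional[str]]  # slot name -> value (None = not filled yet)


def init_form(slot_names: Iterable[str]) -> FormDict:
    """Task-oriented operation: start an empty form for a new sub-task."""
    return {name: None for name in slot_names}


def fill_form(form: FormDict, slot: str, value: str) -> FormDict:
    """Task-oriented operation: store a concept value in one slot.

    The logic never looks at the domain; only the arguments differ.
    """
    if slot not in form:
        raise KeyError(f"form has no slot {slot!r}")
    updated = dict(form)  # return a new form; the caller's copy is unchanged
    updated[slot] = value
    return updated


# Same operators, two domains (the extra slots "date" and "orientation" are assumed).
flight_information = fill_form(init_form(["city_name", "date"]), "city_name", "Seattle")
line_segment = fill_form(init_form(["landmark", "orientation"]), "landmark", "white mountain")
print(flight_information)  # {'city_name': 'Seattle', 'date': None}
print(line_segment)        # {'landmark': 'white mountain', 'orientation': None}
```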

20
Bus schedule enquiry domain
U2: [fill_form_info] i wanted to take the 28X bus from /um/ [DepLoc forbes avenue] to [ArLoc the airport]

Form: Query_Departure_Time (empty)
Depart_Location:
Arrive_Location:
Arrive_Time:
Bus_Number:

Form: Query_Departure_Time (after U2)
Depart_Location: forbes avenue
Arrive_Location: the airport
Arrive_Time:
Bus_Number: 28X
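The filled form can be derived mechanically from a concept-annotated utterance. This sketch assumes a hypothetical inline tag format `[Label value]` and a hypothetical label-to-slot table; in particular, the `BusNum` label is invented here, because the annotated utterance does not show how 28X is labeled.

```python
import re
from typing import Dict, Optional

# Hypothetical mapping from concept labels to slots of Query_Departure_Time.
LABEL_TO_SLOT = {
    "DepLoc": "Depart_Location",
    "ArLoc": "Arrive_Location",
    "BusNum": "Bus_Number",  # assumed label; not shown in the annotated example
}
SLOTS = ["Depart_Location", "Arrive_Location", "Arrive_Time", "Bus_Number"]

# Assumed inline annotation format: [Label value words]
TAG = re.compile(r"\[(\w+) ([^\]]+)\]")


def fill_from_utterance(utterance: str) -> Dict[str, Optional[str]]:
    """Apply a fill_form step for every tagged concept in the utterance."""
    form: Dict[str, Optional[str]] = {slot: None for slot in SLOTS}
    for label, value in TAG.findall(utterance):
        slot = LABEL_TO_SLOT.get(label)
        if slot is not None:  # ignore labels that map to no slot of this form
            form[slot] = value
    return form


u2 = "i wanted to take the [BusNum 28X] bus from /um/ [DepLoc forbes avenue] to [ArLoc the airport]"
print(fill_from_utterance(u2))
# {'Depart_Location': 'forbes avenue', 'Arrive_Location': 'the airport', 'Arrive_Time': None, 'Bus_Number': '28X'}
```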
21
Map reading domain
GIVER 89: [fill_form_info] well go [Orient straight up] from [Ori the [Mod top] of the [Landmark white mountain]] 'til you're just [Dest [Mod beside] the [Landmark golden beach]]
FOLLOWER 90: [acknowledge] right,

Form: Line_Segment (empty)
Origin:
Orientation:
Distance:
Path:
Destination:

Form: Line_Segment (after GIVER 89)
Origin: Modifier = top, Landmark = white mountain
Orientation: straight up
Distance:
Path:
Destination: Modifier = beside, Landmark = golden beach
22
Outline

Introduction to the problem
Approach
Form-based dialog structure
Dialog structure learning
Research Program
Contributions
Thesis timeline

23
The learning framework

Goal: minimize human effort
Use unsupervised learning when possible
Incorporate information from existing knowledge sources
If additional knowledge from a human is required
Train an initial model with a small amount of annotated data
Use unsupervised learning or active learning to explore informative un-annotated data
Let a human correct the model's mistakes

24
Learning problems

Concept identification and clustering
Form identification
Operation classification

25
Concept identification and clustering

Goal: identify concept words and group similar ones into the same cluster
City: Pittsburgh, Boston, Austin, …
Month: January, February, March, …
Assumption
Word boundaries, including compound-word boundaries, are given

26
Approach

1. Identify potential concept members
Filter out noise and function words
2. Cluster similar words together
Statistical: mutual information, Kullback-Leibler distance
Knowledge base: WordNet
3. Select clusters that represent domain concepts
Use the same criteria as step 1, but at the cluster level
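The clustering step can be illustrated with the Kullback-Leibler option. This is one possible instantiation, not the proposal's exact method: each candidate word is described by a smoothed distribution over its immediate left and right neighbours (an assumed context definition), words are compared with the symmetric KL divergence, and average-link agglomerative clustering merges clusters until no pair is closer than a hand-picked threshold. Filtering and cluster selection are omitted, and the toy sentences are invented.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set

Dist = Dict[str, float]


def context_distributions(sentences: List[str], targets: Set[str],
                          alpha: float = 0.1) -> Dict[str, Dist]:
    """Smoothed distribution over the immediate left/right neighbours of each target word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            if tokens[i] in targets:
                counts[tokens[i]]["L:" + tokens[i - 1]] += 1
                counts[tokens[i]]["R:" + tokens[i + 1]] += 1
    features = sorted({f for c in counts.values() for f in c})
    dists = {}
    for word, c in counts.items():
        total = sum(c.values()) + alpha * len(features)
        dists[word] = {f: (c[f] + alpha) / total for f in features}  # add-alpha smoothing
    return dists


def sym_kl(p: Dist, q: Dist) -> float:
    """Symmetric KL divergence KL(p||q) + KL(q||p) = sum (p - q) * log(p / q)."""
    return sum((p[f] - q[f]) * math.log(p[f] / q[f]) for f in p)


def cluster(dists: Dict[str, Dist], threshold: float) -> List[Set[str]]:
    """Average-link agglomerative clustering; stop when the closest pair is >= threshold."""
    clusters = [{w} for w in sorted(dists)]

    def distance(a: Set[str], b: Set[str]) -> float:
        return sum(sym_kl(dists[x], dists[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]))
        if distance(clusters[i], clusters[j]) >= threshold:
            break
        merged = clusters.pop(j)  # j > i, so popping j leaves index i valid
        clusters[i] |= merged
    return clusters


sentences = [
    "i want to fly to boston", "i want to fly to austin", "i want to fly to pittsburgh",
    "leave in january", "leave in february", "leave in march",
]
targets = {"boston", "austin", "pittsburgh", "january", "february", "march"}
print(cluster(context_distributions(sentences, targets), threshold=0.5))
```

On this toy data the city names share the context "to _ </s>" and the months share "in _ </s>", so two clusters emerge; real data would need far more context and a tuned threshold.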

27
Concept clustering result
28
Form-based dialog structure summary

Concrete mapping between structure components and
dialog system components
Sufficient for an information-accessing task
General enough to explain other types of
task-oriented dialogs
Through the analysis of dialogs
Learnable from a corpus of human-human
conversations
Preliminary experiments on concept clustering

29
Outline

Introduction to the problem
Approach
Research Program
Summary

30
Proposed research program

Dialog structure analysis
Is the scheme generalizable?
Inter-annotator agreement experiment
Is the scheme unambiguous?
Improve concept clustering
How can concepts best be identified?
Form identification
How are topics/forms identified?
Operation classification
How can operators be identified?

31
Dialog structure analysis

Goal: verify that the proposed dialog structure generalizes to other task-oriented domains
Analyze two more domains
Tutoring domain (WHY Human Tutoring corpus)
Meeting domain (CMU CALO Meeting corpus)

32
Inter-annotator agreement

Goal: verify that the proposed dialog structure can be understood and applied by other annotators
Evaluate with the kappa coefficient (K)
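The slide does not say which kappa variant is used. As one common choice, Cohen's kappa compares the observed agreement P(A) with the agreement P(E) expected by chance from each annotator's own label distribution, K = (P(A) - P(E)) / (1 - P(E)); other variants estimate P(E) differently. A minimal sketch; the two label sequences are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """K = (P(A) - P(E)) / (1 - P(E)) for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    p_agree = sum(x == y for x, y in zip(a, b)) / n          # observed agreement P(A)
    ca, cb = Counter(a), Counter(b)
    p_chance = sum(ca[k] * cb[k] for k in ca) / (n * n)     # chance agreement P(E)
    if p_chance == 1.0:
        return 1.0  # both used one identical label everywhere; K is undefined, treat as perfect
    return (p_agree - p_chance) / (1 - p_chance)


annotator_1 = ["fill_form", "fill_form", "acknowledge", "greeting"]
annotator_2 = ["fill_form", "acknowledge", "acknowledge", "greeting"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.636
```

Here P(A) = 3/4 and P(E) = 5/16, so K = 7/11.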

33
Inter-annotator agreement experiments

Two annotation tasks
Task-structure identification
Identify the structure of the task in the new
domain
Design domain-specific labels from the definition
of dialog structure
Dialog structure recognition
Annotate dialogs with the task structure and the operations
Two different types of task-oriented dialogs
Air travel domain (information-accessing task)
Map reading domain (command-and-control task)

34
Improve concept clustering

Goal: improve the quality of the concept identification and clustering technique
Combine concept identification features
Develop a concept likelihood score
Combine statistical clustering with knowledge-based clustering
Revise the results of statistical clustering using information in the knowledge base
Implement post-clustering selection

35
Form Identification

Goal: determine the different types of forms that occur in the domain
Assumption
A dialog may be annotated with concept labels

36
Approach

Segment a dialog into a sequence of sub-tasks (form boundary identification)
Train a classifier on lexical cohesion (Hearst, 1994) and prosodic features
Group together the sub-tasks that belong to the same form type
Use unsupervised clustering based on cosine similarity
Identify the set of slots associated with each form type
Analyze a cluster of similar form instances
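Two ingredients of this approach can be sketched with bag-of-words vectors. `cohesion_scores` computes a lexical-cohesion score at each gap between utterances, in the spirit of Hearst's block comparison (cosine similarity of the words just before and just after the gap; low values hint at a sub-task boundary), which could serve as one feature for the boundary classifier. `group_segments` is a simple greedy stand-in for the unsupervised clustering step. Smoothing, depth scores, and prosodic features are left out, the thresholds are arbitrary, and the function names and examples are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion_scores(utterances: List[str], window: int = 2) -> List[float]:
    """Score k compares the `window` utterances before gap k+1 with those after it."""
    bags = [Counter(u.lower().split()) for u in utterances]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def group_segments(segments: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Greedy single pass: join the first group whose centroid is similar enough."""
    groups: List[List[int]] = []
    centroids: List[Counter] = []  # each centroid is the summed bag of its members
    for i, segment in enumerate(segments):
        bag = Counter(segment.lower().split())
        for g, centroid in enumerate(centroids):
            if cosine(bag, centroid) >= threshold:
                groups[g].append(i)
                centroid.update(bag)
                break
        else:
            groups.append([i])
            centroids.append(bag)
    return groups


utterances = [
    "i need a flight to boston",
    "a morning flight to boston please",
    "what hotel do you want",
    "a hotel near the airport",
]
scores = cohesion_scores(utterances, window=1)
print(scores.index(min(scores)))  # 1: the gap between utterances[1] and utterances[2]

segments = ["flight to boston", "flight to austin", "hotel in boston", "hotel near the airport"]
print(group_segments(segments, threshold=0.25))  # [[0, 1], [2, 3]]
```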

37
Operation Classification

Goal: learn the expressions associated with each operation by classifying an utterance into a predefined set of operations
Assumptions
A dialog may be annotated with concept labels
The list of operation types is given
Operation boundaries are known

38
Supervised classification

Features: words, concepts, prosody
Markov model (Woszczyna and Waibel, 1994)
States: operation types
Emission probability
Operation-dependent language model probability
Decision tree probability for prosodic features
Conditional random fields (Lafferty et al., 2001)
Use the same model structure as the Markov model
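A toy sketch of the Markov-model idea: operation types are hidden states, each state emits an utterance through its own unigram language model, and Viterbi decoding recovers the most likely operation sequence. All probabilities and word lists are invented, and the prosodic decision-tree term is omitted; this is not a reproduction of Woszczyna and Waibel's model.

```python
import math
from typing import Dict, List

# Toy parameters (hypothetical): operation types are the hidden states.
STATES = ["fill_form", "acknowledge"]
START = {"fill_form": 0.6, "acknowledge": 0.4}
TRANS = {
    "fill_form": {"fill_form": 0.3, "acknowledge": 0.7},
    "acknowledge": {"fill_form": 0.8, "acknowledge": 0.2},
}
# Operation-dependent unigram language models; unseen words get a small floor.
LM = {
    "fill_form": {"go": 0.2, "up": 0.2, "from": 0.2, "the": 0.2, "mountain": 0.2},
    "acknowledge": {"right": 0.5, "okay": 0.5},
}
UNSEEN = 1e-4


def log_emission(state: str, utterance: str) -> float:
    """log P(words | operation) under that operation's unigram model."""
    lm = LM[state]
    return sum(math.log(lm.get(w, UNSEEN)) for w in utterance.lower().split())


def viterbi(utterances: List[str]) -> List[str]:
    """Most likely sequence of operation labels for a sequence of utterances."""
    best = [{s: math.log(START[s]) + log_emission(s, utterances[0]) for s in STATES}]
    back: List[Dict[str, str]] = []
    for u in utterances[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[-1][p] + math.log(TRANS[p][s]))
            scores[s] = best[-1][prev] + math.log(TRANS[prev][s]) + log_emission(s, u)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):  # follow back-pointers from the last utterance
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi(["go up from the mountain", "right", "okay"]))
# ['fill_form', 'acknowledge', 'acknowledge']
```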

39
Unsupervised learning and active learning

1. Train an initial classifier from human-labeled data
2. Apply the current classifier to an unlabeled operation instance
(Unsupervised learning) If the confidence is high, add the instance and its predicted label to the training set
(Active learning) If the confidence is low, ask a human to label the instance, then add it to the training set
3. Train a new classifier on all labeled data (both machine-labeled and human-labeled)
Steps 2-3 can be iterated
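The loop above can be sketched as follows. To keep the example self-contained, a small multinomial naive Bayes classifier stands in for the Markov model or CRF, confidence is the gap between the two highest class probabilities, and `ask_human` is a hypothetical callback standing in for the human annotator; the threshold, batch size, and example utterances are arbitrary.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (utterance, operation label)


def train(data: List[Example]):
    """Fit a multinomial naive Bayes model with add-one smoothing."""
    priors = Counter(label for _, label in data)
    words: Dict[str, Counter] = defaultdict(Counter)
    for text, label in data:
        words[label].update(text.split())
    vocab = {w for counts in words.values() for w in counts}
    return priors, words, vocab


def predict_proba(model, text: str) -> Dict[str, float]:
    """Posterior probability of each operation label for one utterance."""
    priors, words, vocab = model
    n = sum(priors.values())
    log_scores = {}
    for label in priors:
        denom = sum(words[label].values()) + len(vocab)
        log_scores[label] = math.log(priors[label] / n) + sum(
            math.log((words[label][w] + 1) / denom) for w in text.split())
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}  # avoid underflow
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def margin(probs: Dict[str, float]) -> float:
    """Top probability minus the runner-up (0.0 if there is only one class)."""
    ranked = sorted(probs.values(), reverse=True) + [0.0]
    return ranked[0] - ranked[1]


def self_train(labeled: List[Example], unlabeled: List[str],
               ask_human: Callable[[str], str],
               threshold: float = 0.5, batch: int = 2):
    labeled, pool, queries = list(labeled), list(unlabeled), 0
    model = train(labeled)                          # initial classifier from human labels
    while pool:
        current, pool = pool[:batch], pool[batch:]
        for text in current:                        # apply the current classifier
            probs = predict_proba(model, text)
            if margin(probs) >= threshold:          # confident: keep the prediction
                labeled.append((text, max(probs, key=probs.get)))
            else:                                   # unsure: ask the human
                labeled.append((text, ask_human(text)))
                queries += 1
        model = train(labeled)                      # retrain on all labeled data
    return model, labeled, queries


seed = [("go up from the mountain", "fill_form"), ("right", "acknowledge"), ("hello there", "greeting")]
pool = ["go down to the beach", "right okay", "hi"]
answers = {"go down to the beach": "fill_form", "right okay": "acknowledge", "hi": "greeting"}
model, labeled, queries = self_train(seed, pool, answers.get)
print(queries, "instance(s) sent to the human annotator")
```

Raising `threshold` shifts work from self-training to the human; a threshold above 1.0 turns the loop into pure active learning.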

40
Classifier confidence score

Difference in probability between the first-ranked and second-ranked labels
Entropy of the classifier output
High entropy → low confidence
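Both confidence measures are short functions of the classifier's output distribution. A minimal sketch; measuring entropy in bits is an arbitrary choice of base, and the example distributions are invented.

```python
import math
from typing import Sequence


def margin_confidence(probs: Sequence[float]) -> float:
    """Top-ranked probability minus second-ranked probability (needs >= 2 classes)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; higher entropy means lower confidence."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


confident = [0.9, 0.05, 0.05]
uncertain = [0.4, 0.35, 0.25]
for probs in (confident, uncertain):
    print(round(margin_confidence(probs), 3), round(entropy(probs), 3))
```

The two measures usually agree, but the margin looks only at the top two labels, while entropy also reflects how the remaining probability mass is spread.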

41
Outline

Introduction to the problem
Approach
Research Program
Summary

42
Thesis contributions

A dialog structure framework that is sufficient, general and learnable, and has a concrete mapping between dialog structure components and dialog system behavior
A machine learning technique for inferring the structure of a dialog from data with a limited amount of human supervision
Reduce human effort in acquiring domain-specific information

43
Thesis contributions (Cont.)

An unsupervised algorithm that can identify and cluster domain concepts from un-annotated data
An utterance-type classifier that can use unlabeled data through unsupervised learning and active learning
A discourse segmentation algorithm that can identify the boundaries between sub-tasks of similar types and between sub-tasks of dissimilar types

44
Timeline
45
Questions?
46
References

Grosz, B. and Sidner, C., Attention, intentions, and the structure of discourse, Computational Linguistics, Vol. 12, pp. 175-204, 1986.
Kamp, H. and Reyle, U., From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory, Kluwer, Dordrecht, The Netherlands, 1993.
Allen, J. and Perrault, R., Analyzing intention in utterances, Artificial Intelligence, Vol. 15, pp. 143-178, 1980.
Traum, D. and Hinkelman, E., Conversation Acts in Task-Oriented Spoken Dialogue, Computational Intelligence, Vol. 8, No. 3, pp. 575-599, 1992.
Hearst, M., Multi-paragraph segmentation of expository text, Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, 1994.
Woszczyna, M. and Waibel, A., Inferring linguistic structure in spoken language, Proceedings of ICSLP-1994, Yokohama, Japan, September 1994.
Lafferty, J., McCallum, A. and Pereira, F., Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the 18th International Conference on Machine Learning, pp. 282-289, San Francisco, CA, 2001.
