Learning the Structure of Task-Oriented Conversations from the Corpus

1
Learning the Structure of Task-Oriented
Conversations from the Corpus
  • Ananlada Chotimongkol
  • LTI Ph.D. thesis proposal
  • Thesis Committee
  • Alexander Rudnicky (Chair)
  • William Cohen
  • Carolyn Penstein Rose
  • Gokhan Tur (AT&T Labs Research)

2
Outline
  • Introduction to the problem
  • Approach
  • Research program
  • Summary

3
Outline
  • Introduction to the problem
  • Approach
  • Research program
  • Summary

4
Building a new dialog system
  • Example exchange
  • System: When would you like to leave?
  • User: I would like to fly to Seattle tomorrow.
  • [Diagram: Speech Recognizer, Natural Language Understanding,
    Dialog Manager, Natural Language Generator, Speech Synthesizer,
    and Domain Knowledge]

5
Domain knowledge
  • Steps in the task
  • Specify the desired flight
  • Search for flights that match the criteria
  • Negotiate the flights
  • Make a reservation
  • Important information, keywords
  • Destination, date, time, airlines, etc.
  • Domain language: how do people talk?

6
What is the problem?
  • Can't reuse
  • Time consuming
  • May need an expert
  • [Diagram: same dialog system components and example
    exchange as on slide 4]

7
Research goal
  • Reduce the human effort of acquiring domain knowledge
    when creating a dialog system in a new domain

8
Outline
  • Introduction to the problem
  • Approach
  • Research Program
  • Summary

9
Observations
  • Task-oriented conversations have a clear
    structure
  • Reflects domain information, e.g., a task is
    divided into sub-tasks
  • Has recurring patterns that are observable
    through the language

10
Thesis statement
Develop a learning system that is able to
identify all necessary domain knowledge required
by a dialog system in a task-oriented domain
through the observation of human-human
conversations
  • Approach
  • Identify the structure of task-oriented dialogs
  • Learn the structure from observations

11
Desired structure properties
  • Sufficient
  • Capture all domain knowledge required to carry
    out the task
  • General (domain-independent)
  • Can describe dialog in dissimilar domains and
    types
  • Learnable
  • Can be learned from data using a machine learning
    technique

12
Previous Approaches
  • Theoretical-oriented
  • Theory of Discourse Structure (Grosz and Sidner,
    1986)
  • Discourse Representation Theory (DRT) (Kamp and
    Reyle, 1993)
  • Engineering-oriented
  • Plan-based theory (Allen and Perrault, 1980)
  • The theory of Conversation Acts (Traum and
    Hinkelman, 1992)

13
Outline
  • Introduction to the problem
  • Approach
  • Form-based dialog structure
  • Dialog structure learning
  • Research Program
  • Summary

14
Form-based dialog structure
  • Use a form-based dialog architecture to represent
    a structure of a dialog
  • Concrete mapping between structure components and
    dialog system components
  • Sufficient for an information-accessing task
  • General enough to represent other types of
    task-oriented dialogs
  • Through the analysis of dialogs
  • Learnable from a corpus of human-human
    conversations
  • Preliminary experiments on concept clustering

15
Form-based structure components
  • Task Structure
  • Domain information necessary for achieving the
    task goal
  • Dialog mechanism
  • The mechanisms that the participants use to
    advance the dialog toward the goal

16
Task structure
  • Data representation for domain information
  • Task: a subset of dialogs that has a specific
    goal
  • Represented as a set of forms
  • Sub-task: a step in a task that contributes
    toward a task goal
  • Represented as a form
  • Concept: key information
  • Represented as a slot

17
Task structure example: Bus schedule enquiry domain
  • Task (multiple tasks)
  • Which bus runs between A and B?
  • When will the bus X arrive?
  • Sub-tasks: no further decomposition
  • Concepts
  • Bus Number: 61C, 28X, ...
  • Location: CMU, airport, ...

18
Task structure example: Map reading domain
  • Task: draw a route on a map
  • Sub-tasks
  • Draw a segment of a route
  • Concepts
  • Landmark: White_Mountain, Machete, ...
  • Orientation: down, left, ...
  • Distance: a couple of centimeters, an inch, ...

19
Dialogue mechanisms (form operators)
  • Task-oriented operations
  • Manipulate a form (data structure)
  • Ex: init_form, fill_form
  • Discourse-oriented operations
  • Manage the flow of a conversation
  • Ex: acknowledgement, greeting
  • Domain independent
  • Same consequence; only the operation parameters
    differ
  • Fill city_name in flight_information form
  • Fill landmark in line_segment form
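
The domain independence of the form operators can be illustrated with a minimal Python sketch. The names init_form and fill_form, the form names, and the slots city_name and landmark come from the slide; the Form class, the extra slots (date, orientation), and the slot values are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Form:
    """A sub-task: a named form whose slots hold concept values."""
    name: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)


def init_form(name: str, slot_names: List[str]) -> Form:
    """Task-oriented operation: create a form with every slot empty."""
    return Form(name, {slot: None for slot in slot_names})


def fill_form(form: Form, slot: str, value: str) -> Form:
    """Task-oriented operation: put a concept value into one slot."""
    if slot not in form.slots:
        raise KeyError(f"{form.name} has no slot {slot!r}")
    form.slots[slot] = value
    return form


# The same two operators serve both domains; only the parameters change.
# The slot lists below are hypothetical.
flight = init_form("flight_information", ["city_name", "date"])
fill_form(flight, "city_name", "Seattle")

segment = init_form("line_segment", ["landmark", "orientation"])
fill_form(segment, "landmark", "white mountain")

print(flight.slots)
print(segment.slots)
```

Nothing in init_form or fill_form mentions flights or maps, which is the sense in which the operators are domain independent.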

20
Bus schedule enquiry domain
U2 (fill_form_info): i wanted to take the 28X bus
from /um/ [DepLoc: forbes avenue] to [ArLoc: the airport]
Form before U2: Query_Departure_Time
  • Depart_Location: (empty)
  • Arrive_Location: (empty)
  • Arrive_Time: (empty)
  • Bus_Number: (empty)
Form after U2: Query_Departure_Time
  • Depart_Location: forbes avenue
  • Arrive_Location: the airport
  • Arrive_Time: (empty)
  • Bus_Number: 28X

21
Map reading domain
GIVER89 (fill_form_info): well go [Orient: straight up] from
[Ori: the [Mod: top] of the [Landmark: white mountain]] 'til you're just
[Dest: [Mod: beside] the [Landmark: golden beach]]
FOLLOWER90 (acknowledge): right,
Form before GIVER89: Line_Segment
  • Origin: (empty)
  • Orientation: (empty)
  • Distance: (empty)
  • Path: (empty)
  • Destination: (empty)
Form after GIVER89: Line_Segment
  • Origin: Modifier top, Landmark white mountain
  • Orientation: straight up
  • Distance: (empty)
  • Path: (empty)
  • Destination: Modifier beside, Landmark golden beach

22
Outline
  • Introduction to the problem
  • Approach
  • Form-based dialog structure
  • Dialog structure learning
  • Research Program
  • Contributions
  • Thesis timeline

23
The learning framework
  • Goal: minimize human effort
  • Use unsupervised learning when possible
  • Incorporate information from existing knowledge
    sources
  • If additional knowledge from a human is required
  • Train an initial model with a small amount of
    annotated data
  • Use unsupervised learning or active learning to
    explore un-annotated data that is informative
  • A human can correct a mistake

24
Learning problems
  • Concept identification and clustering
  • Form identification
  • Operation classification

25
Concept identification and clustering
  • Goal: Identify concept words and group the
    similar ones into the same cluster
  • City: Pittsburgh, Boston, Austin, ...
  • Month: January, February, March, ...
  • Assumption
  • Word boundaries, including compound word
    boundaries, are given

26
Approach
  • Identify potential concept members
  • Filter out noise, function words
  • Cluster similar words together
  • Statistical-based: Mutual information,
    Kullback-Leibler distance
  • Knowledge base: WordNet
  • Select clusters that represent domain concepts
  • Use the same criteria as in the first step, but
    work at the cluster level
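
As a rough illustration of the statistical clustering criterion, the sketch below computes a symmetric Kullback-Leibler distance between the neighbour-word distributions of two words. The context definition (one word to the left and right), the add-alpha smoothing, and the toy sentences are assumptions; the slides do not specify how the distance is computed.

```python
import math
from collections import Counter
from typing import Dict, List


def context_counts(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Count the left and right neighbours of every word."""
    contexts: Dict[str, Counter] = {}
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        for i in range(1, len(padded) - 1):
            counts = contexts.setdefault(padded[i], Counter())
            counts["L:" + padded[i - 1]] += 1
            counts["R:" + padded[i + 1]] += 1
    return contexts


def kl(p: Counter, q: Counter, features: List[str], alpha: float = 0.1) -> float:
    """KL(p || q) over add-alpha smoothed context distributions."""
    p_total = sum(p.values()) + alpha * len(features)
    q_total = sum(q.values()) + alpha * len(features)
    total = 0.0
    for f in features:
        pf = (p[f] + alpha) / p_total
        qf = (q[f] + alpha) / q_total
        total += pf * math.log(pf / qf)
    return total


def kl_distance(p: Counter, q: Counter, features: List[str]) -> float:
    """Symmetric distance: KL(p || q) + KL(q || p)."""
    return kl(p, q, features) + kl(q, p, features)


# Toy corpus, invented for illustration only.
sentences = [
    "i want to fly to boston in january".split(),
    "i want to fly to austin in march".split(),
    "i need to fly to pittsburgh in february".split(),
]
contexts = context_counts(sentences)
features = sorted({f for counts in contexts.values() for f in counts})

d_city = kl_distance(contexts["boston"], contexts["austin"], features)
d_mixed = kl_distance(contexts["boston"], contexts["january"], features)
print(d_city, d_mixed)
```

In this toy corpus the two city names occur in identical contexts, so their distance is smaller than the distance between a city and a month; a clustering step would merge the closest pairs first.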

27
Concept clustering result

28
Form-based dialog structure summary
  • Concrete mapping between structure components and
    dialog system components
  • Sufficient for an information-accessing task
  • General enough to explain other types of
    task-oriented dialogs
  • Through the analysis of dialogs
  • Learnable from a corpus of human-human
    conversations
  • Preliminary experiments on concept clustering

29
Outline
  • Introduction to the problem
  • Approach
  • Research Program
  • Summary

30
Proposed research program
  • Dialog structure analysis
  • Is the scheme generalizable?
  • Inter-annotator agreement experiment
  • Is the scheme unambiguous?
  • Improve concept clustering
  • How can concepts best be identified?
  • Form identification
  • How are topics/forms identified?
  • Operation classification
  • How can operators be identified?

31
Dialog structure analysis
  • Goal: Verify that the proposed dialog structure
    generalizes to other task-oriented domains
  • Analyze 2 more domains
  • Tutoring domain (WHY Human Tutoring corpus)
  • Meeting domain (CMU CALO Meeting corpus)

32
Inter-annotator agreement
  • Goal: Verify that the proposed dialog structure
    can be understood and applied by other annotators
  • Evaluate with kappa coefficient (K)
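
The slide does not say which kappa variant is meant. Assuming Cohen's kappa for two annotators, K = (P(A) - P(E)) / (1 - P(E)), where P(A) is the observed agreement and P(E) the agreement expected by chance, a minimal sketch looks like this. The label sequences are invented for illustration.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # P(A): fraction of items on which the annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # P(E): chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical operation labels from two annotators.
annotator_1 = ["fill_form", "fill_form", "acknowledge", "greeting"]
annotator_2 = ["fill_form", "acknowledge", "acknowledge", "greeting"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Here the annotators agree on 3 of 4 items (P(A) = 0.75) and chance agreement is 5/16, giving K = 7/11.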

33
Inter-annotator agreement experiments
  • Two annotation tasks
  • Task-structure identification
  • Identify the structure of the task in the new
    domain
  • Design domain-specific labels from the definition
    of dialog structure
  • Dialog structure recognition
  • Annotate dialogs for the task-structure and the
    operation
  • Two different types of task-oriented dialogs
  • Air travel domain (information-accessing task)
  • Map reading domain (command-and-control task)

34
Improve concept clustering
  • Goal: Improve the quality of the concept
    identification and clustering technique
  • Combine concept identification features
  • Develop the concept likelihood score
  • Combine statistical-based clustering with
    knowledgebase clustering
  • Revise result from statistical-based clustering
    with information in the knowledgebase
  • Implement post-clustering selection

35
Form Identification
  • Goal: determine the different types of forms that
    occur in the domain
  • Assumption
  • A dialog may be annotated with concept labels

36
Approach
  • Segment a dialog into a sequence of sub-tasks
    (form boundary identification)
  • Train a classifier on lexical cohesion (Hearst,
    1994) and prosodic features
  • Group together the sub-tasks that belong to the
    same form type
  • Use unsupervised clustering based on cosine
    similarity
  • Identify the set of slots associated with each
    form type
  • Analyze a cluster of similar form instances
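
The grouping step can be sketched as follows, assuming each segmented sub-task is represented by the bag of concept labels it contains. The greedy single-pass clustering, the 0.5 threshold, and the example segments are assumptions; the slide only names cosine similarity as the basis.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-label vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_segments(segments: List[List[str]], threshold: float = 0.5) -> List[List[int]]:
    """Greedy clustering: a segment joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster."""
    vectors = [Counter(segment) for segment in segments]
    clusters: List[List[int]] = []
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            if cosine(vector, vectors[cluster[0]]) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


# Hypothetical sub-task segments, each listed by its concept labels.
segments = [
    ["Depart_Location", "Arrive_Location", "Bus_Number"],
    ["Bus_Number", "Arrive_Time"],
    ["Depart_Location", "Arrive_Location"],
]
clusters = cluster_segments(segments)
print(clusters)
```

Each resulting cluster is a candidate form type, and the concept labels that recur across its members are candidates for that form's slots.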

37
Operation Classification
  • Goal: Learn the expressions associated with
    each operation
  • by classifying an utterance into a pre-defined
    set of operations
  • Assumption
  • A dialog may be annotated with concept labels
  • The list of operation types is given
  • Operation boundaries are known

38
Supervised classification
  • Features: words, concepts, prosody
  • Markov model (Woszczyna and Waibel, 1994)
  • States: operation types
  • Emission probability
  • Operation-dependent language model probability
  • Decision tree probability for prosodic features
  • Conditional random fields (Lafferty et al., 2001)
  • Use the same model structure as the Markov model
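
A minimal sketch of the Markov model idea: the states are operation types, the emission score of an utterance comes from an operation-dependent unigram language model, and Viterbi decoding picks the best operation sequence. All probabilities below are invented, the unigram model with a floor for unseen words is an assumption, and the prosodic decision-tree term is left out.

```python
import math
from typing import Dict, List


def viterbi(utterances: List[List[str]], states: List[str],
            start: Dict[str, float], trans: Dict[str, Dict[str, float]],
            lm: Dict[str, Dict[str, float]], floor: float = 1e-3) -> List[str]:
    """Return the most likely sequence of operation types."""

    def emit(state: str, words: List[str]) -> float:
        # Log-probability of the words under the state's unigram model.
        return sum(math.log(lm[state].get(word, floor)) for word in words)

    scores = {s: math.log(start[s]) + emit(s, utterances[0]) for s in states}
    back: List[Dict[str, str]] = []
    for words in utterances[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: scores[p] + math.log(trans[p][s]))
            new_scores[s] = scores[best_prev] + math.log(trans[best_prev][s]) + emit(s, words)
            pointers[s] = best_prev
        scores = new_scores
        back.append(pointers)

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy parameters, invented for illustration only.
states = ["greeting", "fill_form", "acknowledge"]
start = {"greeting": 0.6, "fill_form": 0.3, "acknowledge": 0.1}
trans = {
    "greeting": {"greeting": 0.1, "fill_form": 0.8, "acknowledge": 0.1},
    "fill_form": {"greeting": 0.1, "fill_form": 0.4, "acknowledge": 0.5},
    "acknowledge": {"greeting": 0.1, "fill_form": 0.6, "acknowledge": 0.3},
}
lm = {
    "greeting": {"hello": 0.5, "hi": 0.4},
    "fill_form": {"to": 0.2, "the": 0.2, "airport": 0.2, "bus": 0.2},
    "acknowledge": {"okay": 0.5, "right": 0.4},
}
dialog = [["hello"], ["the", "bus", "to", "the", "airport"], ["okay"]]
path = viterbi(dialog, states, start, trans, lm)
print(path)
```

A conditional random field with the same chain structure would replace the start, transition, and emission probabilities with learned feature weights.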

39
Unsupervised learning and active learning
  • Train an initial classifier from human-labeled
    data
  • Apply the current classifier to an unlabeled
    operation
  • (Unsupervised learning) if the confidence is
    high, add this instance and the predicted label
    into the training set
  • (Active learning) if the confidence is low, ask a
    human to label this instance and then add it into
    the training set
  • Train a new classifier on all labeled data (both
    machine-labeled and human-labeled)
  • Steps 2-3 (apply the classifier, then retrain) can
    be iterated
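
One round of the procedure above can be sketched as follows. The toy Naive Bayes classifier stands in for whichever classifier is actually used, and the 0.8 confidence threshold, the ask_human callback, and the example data are all assumptions. For brevity the sketch labels a batch of unlabeled instances before retraining rather than one instance at a time.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Example = Tuple[List[str], str]


def train(examples: List[Example]):
    """Toy unigram Naive Bayes; a stand-in for any classifier."""
    word_counts, label_counts, vocab = defaultdict(Counter), Counter(), set()
    for words, label in examples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab


def predict_proba(model, words: List[str]) -> Dict[str, float]:
    """Posterior over labels, with add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab) + 1
        log_scores[label] = math.log(count / total) + sum(
            math.log((word_counts[label][w] + 1) / denom) for w in words)
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}


def one_round(labeled: List[Example], unlabeled: List[List[str]],
              ask_human: Callable[[List[str]], str], threshold: float = 0.8):
    model = train(labeled)                        # step 1: initial classifier
    grown = list(labeled)
    for words in unlabeled:                       # step 2: apply it
        probs = predict_proba(model, words)
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            grown.append((words, label))          # machine-labeled
        else:
            grown.append((words, ask_human(words)))  # human-labeled
    return train(grown), grown                    # step 3: retrain on all


# Toy data and a scripted "human", invented for illustration only.
labeled = [(["okay"], "acknowledge"), (["right"], "acknowledge"),
           (["to", "the", "airport"], "fill_form"),
           (["from", "forbes", "avenue"], "fill_form")]
unlabeled = [["okay", "right"], ["the", "bus"]]
asked: List[List[str]] = []


def ask_human(words: List[str]) -> str:
    asked.append(words)
    return "fill_form"


model, grown = one_round(labeled, unlabeled, ask_human)
print(asked)
```

Only the low-confidence utterance is sent to the human, which is where the reduction in annotation effort is expected to come from.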

40
Classifier confidence score
  • Difference in probabilities between the first
    rank and the second rank
  • The entropy of the classifier output
  • High entropy means low confidence
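
Both confidence scores can be computed directly from the classifier's output distribution. In the sketch below the function names and the two example distributions are assumptions made for illustration.

```python
import math
from typing import Dict


def margin_confidence(probs: Dict[str, float]) -> float:
    """Difference between the top-ranked and second-ranked probabilities."""
    ranked = sorted(probs.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    return ranked[0] - second


def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy in bits; higher entropy means lower confidence."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical classifier outputs over three operation types.
peaked = {"fill_form": 0.9, "acknowledge": 0.05, "greeting": 0.05}
flat = {"fill_form": 0.4, "acknowledge": 0.35, "greeting": 0.25}

print(margin_confidence(peaked), entropy(peaked))
print(margin_confidence(flat), entropy(flat))
```

The peaked distribution has the larger margin and the smaller entropy, so both scores rank it as the more confident prediction.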

41
Outline
  • Introduction to the problem
  • Approach
  • Research Program
  • Summary

42
Thesis contributions
  • A dialog structure framework that is sufficient,
    general and learnable, and has a concrete mapping
    between dialog structure components and dialog
    system behavior
  • A machine learning technique for inferring the
    structure of the dialog from data with a limited
    amount of human supervision
  • Reduce human effort in acquiring domain-specific
    information

43
Thesis contributions (Cont.)
  • An unsupervised algorithm that can identify and
    cluster domain concepts from un-annotated data
  • An utterance-type classifier that is able to
    utilize unlabeled data through unsupervised
    learning and active learning
  • A discourse segmentation algorithm that can
    identify the boundaries between similar type
    sub-tasks and dissimilar type sub-tasks

44
Timeline

45
Questions?

46
References
  • Grosz, B. and Sidner, C., Attention, intentions
    and the structure of discourse, Computational
    Linguistics, Vol. 12, pp. 175-204, 1986.
  • Kamp, H. and Reyle, U., From Discourse to Logic:
    Introduction to Modeltheoretic Semantics of
    Natural Language, Formal Logic and Discourse
    Representation Theory, Kluwer, Dordrecht, The
    Netherlands, 1993.
  • Allen, J. and Perrault, R., Analyzing intention
    in utterances, Artificial Intelligence, Vol. 15,
    pp. 143-178, 1980.
  • Traum, D. and Hinkelman, E., Conversation Acts
    in Task-Oriented Spoken Dialogue, Computational
    Intelligence, Vol. 8, No. 3, pp. 575-599, 1992.
  • Hearst, M., Multi-paragraph segmentation of
    expository text, Proceedings of the 32nd Annual
    Meeting of the Association for Computational
    Linguistics, Las Cruces, NM, 1994.
  • Woszczyna, M. and Waibel, A., Inferring
    linguistic structure in spoken language,
    Proceedings of ICSLP-1994, Yokohama, Japan,
    September, 1994.
  • Lafferty, J., McCallum, A. and Pereira, F.,
    Conditional random fields: Probabilistic models
    for segmenting and labeling sequence data,
    Proceedings of 18th International Conference on
    Machine Learning, pp. 282-289, San Francisco, CA,
    2001.