Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source

1
Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source
  • Jens-Uwe Moller
  • Natural Language Systems Division, Dept. of Computer Science,
    Univ. of Hamburg

2
Overview
  • Dialog modeling is based on a set of units called
    dialog acts
  • Dialog acts taken from theory don't fit a
    specific domain
  • Labeling dialogs is time-consuming and subjective
  • Learn application-specific dialog acts from
    speech data using conceptual clustering

3
The learning task
  • Learning dialog acts from turns
  • Unsupervised classification (no prior definition
    of dialog acts is given)
  • Hierarchical classification with inspectable
    classification rules

4
Features
  • Domain knowledge: structure of the task; task
    knowledge represented by goals and plans
  • Word recognizer: word hypotheses
  • Prosodic data: pauses and stress mark important units
  • Lexical semantics
  • Syntax (less important in spoken dialog)
  • Semantics (larger units of lexical semantics)

5
COWEB
  • Symbolic machine learning algorithm
  • Builds a classification tree
  • Distinctions between subnodes are made by a
    function over all attributes
  • Supports probabilistic data
  • Supports multiple overlapping hierarchies (for
    ambiguous cases)
  • Can handle multiple entries of one attribute
    (e.g. stream of words)
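The slide says only that subnodes are distinguished by a function over all attributes. Assuming that function follows COBWEB's category utility (COWEB builds on the COBWEB family, but its exact formula is not given here), a minimal sketch for single-valued nominal attributes looks like this. The names `category_utility` and `expected_correct_guesses` are hypothetical, and the sketch does not cover COWEB's extensions (probabilistic data, multiple entries per attribute, structured data).

```python
from collections import Counter
from typing import Dict, List

# One instance: attribute name -> single nominal value (e.g. a feature of a turn).
Instance = Dict[str, str]


def expected_correct_guesses(instances: List[Instance]) -> float:
    """Sum over attributes i and values j of P(A_i = V_ij)^2 within `instances`."""
    n = len(instances)
    counts = Counter((attr, val) for inst in instances for attr, val in inst.items())
    return sum((c / n) ** 2 for c in counts.values())


def category_utility(partition: List[List[Instance]]) -> float:
    """COBWEB-style category utility of splitting a node into the given clusters."""
    all_instances = [inst for cluster in partition for inst in cluster]
    total = len(all_instances)
    baseline = expected_correct_guesses(all_instances)  # guessing without clusters
    gain = sum(
        (len(cluster) / total) * (expected_correct_guesses(cluster) - baseline)
        for cluster in partition
        if cluster  # ignore empty clusters
    )
    return gain / len(partition)  # averaged over the number of clusters


if __name__ == "__main__":
    q = {"form": "question", "pause": "long"}
    s = {"form": "statement", "pause": "short"}
    print(category_utility([[q, q], [s, s]]))  # clean split: 0.5
    print(category_utility([[q, s], [q, s]]))  # mixed split: 0.0
```

A split that makes attribute values predictable within each subnode scores higher, which is what lets the tree choose between candidate subnode structures.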

6
COWEB (2)
  • Learning from simultaneous events
  • Learns from structured data: Conceptual Graphs
  • Learns case descriptions from terminological
    descriptions
  • Subsumption correlation criterion over
    structured data, e.g. subsumption of individuals
    under classes

7
Metrics for Measuring Domain Independence of Semantic Classes
  • Andrew Pargellis, Eric Fosler-Lussier, Alexandros
    Potamianos, Chin-Hui Lee
  • Dialogue Systems Research Dept., Bell Labs, Lucent
    Technologies
  • Murray Hill, NJ, USA

8
Introduction
  • Employ semantic classes (concepts) from another
    domain
  • Need to identify domain-independent concepts based
    on comparison across domains
  • Domain-independent concepts should occur in
    similar syntactic (lexical) contexts across
    domains

9
Comparing concepts across domains
  • Concept-comparison method
  • Concept-projection method

10
Concept-comparison method
  • Find the similarity between all pairs of concepts
    across the two domains
  • Two concepts are similar if their respective
    bigram contexts are similar
  • Use left and right context bigram language models

11
Kullback-Leibler (KL) distance
  • Compare how San Francisco and Newark are used in
    the Travel domain with how comedies and westerns
    are used in the Movie domain
  • Distance between two concepts
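The transcript omits the distance formula itself. As a sketch only, assuming each concept is a set of words, its left and right contexts are add-alpha smoothed neighbor distributions over a shared vocabulary, and the distance is the symmetric KL divergence of the left contexts plus that of the right contexts, the comparison could look like this. The smoothing, the symmetric form, and all function names are assumptions, not the authors' formulation.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Corpus = List[List[str]]  # tokenized utterances


def build_vocab(*corpora: Corpus) -> Set[str]:
    """Shared vocabulary over all domains, plus sentence-boundary markers."""
    vocab = {"<s>", "</s>"}
    for corpus in corpora:
        for sent in corpus:
            vocab.update(sent)
    return vocab


def context_distributions(
    corpus: Corpus, concept: Set[str], vocab: Set[str], alpha: float = 0.1
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Smoothed left/right neighbor distributions of a concept's words (assumed add-alpha)."""
    left, right = Counter(), Counter()
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            if padded[i] in concept:
                left[padded[i - 1]] += 1
                right[padded[i + 1]] += 1

    def smooth(counts: Counter) -> Dict[str, float]:
        total = sum(counts.values()) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    return smooth(left), smooth(right)


def kl(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL divergence D(p || q); smoothing keeps every q[w] > 0."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def concept_distance(
    corpus_a: Corpus, concept_a: Set[str], corpus_b: Corpus, concept_b: Set[str], vocab: Set[str]
) -> float:
    """Assumed form: symmetric KL of left contexts plus symmetric KL of right contexts."""
    la, ra = context_distributions(corpus_a, concept_a, vocab)
    lb, rb = context_distributions(corpus_b, concept_b, vocab)
    return kl(la, lb) + kl(lb, la) + kl(ra, rb) + kl(rb, ra)


if __name__ == "__main__":
    travel = [["fly", "to", "san_francisco"], ["fly", "to", "newark"]]
    movie = [["show", "me", "comedies"], ["show", "me", "westerns"]]
    vocab = build_vocab(travel, movie)
    print(concept_distance(travel, {"san_francisco", "newark"}, movie, {"comedies", "westerns"}, vocab))
```

Because the vocabulary is shared, the same code serves both the within-domain and the cross-domain comparison; a smaller distance means the two concepts appear in more similar contexts.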

12
Concept-projection method
  • How well a single concept from one domain is
    represented in another domain.
  • How the words comedies and westerns are used in
    both domains
  • Useful for identifying the degree of
    domain-independence for a particular concept.

13
Results: Concept-comparison
14
Results: Concept-projection
15
Concept Example
16
Semi-Automatic Acquisition of Domain-Specific Semantic Structures
  • Siu K.C., Meng H.M.
  • Human-Computer Communications Laboratory
  • Department of Systems Engineering and Engineering
    Management
  • The Chinese University of Hong Kong

17
Grammar induction
  • Uses unannotated corpora
  • Portable across domains and languages
  • Output grammar has reasonable coverage of
    within-domain data and rejects out-of-domain data
  • Amenable to interactive refinement by humans
  • Supports optional injection of prior knowledge

18
Spatial clustering
  • Uses the Kullback-Leibler distance
  • Uses left and right contexts
  • Words w1, w2 (later classes c1, c2) are compared
    pair-wise
  • Only words with at least a preset minimum number of
    occurrences are considered (set to 5)
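One spatial-clustering step can be sketched as follows, assuming the KL distance is symmetrized and summed over left and right contexts, add-alpha smoothing, and the minimum count of 5 from the slide. `closest_pair` and `merge_pair` are hypothetical names; the actual system's distance and merge policy may differ.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Corpus = List[List[str]]


def neighbor_counts(corpus: Corpus) -> Tuple[Dict[str, Counter], Dict[str, Counter]]:
    """Left and right neighbor counts for every word, with boundary markers."""
    left: Dict[str, Counter] = {}
    right: Dict[str, Counter] = {}
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            w = padded[i]
            left.setdefault(w, Counter())[padded[i - 1]] += 1
            right.setdefault(w, Counter())[padded[i + 1]] += 1
    return left, right


def smooth(counts: Counter, vocab: List[str], alpha: float) -> Dict[str, float]:
    """Add-alpha smoothing (assumed) so no context probability is zero."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}


def sym_kl(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D(p || q) + D(q || p)."""
    return sum(p[v] * math.log(p[v] / q[v]) + q[v] * math.log(q[v] / p[v]) for v in p)


def closest_pair(
    corpus: Corpus, min_count: int = 5, alpha: float = 0.1
) -> Optional[Tuple[str, str, float]]:
    """Among words seen at least `min_count` times, the pair with the most similar contexts."""
    freq = Counter(w for sent in corpus for w in sent)
    eligible = sorted(w for w, c in freq.items() if c >= min_count)
    if len(eligible) < 2:
        return None
    left, right = neighbor_counts(corpus)
    vocab = sorted({"<s>", "</s>"} | set(freq))
    contexts = {w: (smooth(left[w], vocab, alpha), smooth(right[w], vocab, alpha)) for w in eligible}
    best: Optional[Tuple[str, str, float]] = None
    for w1, w2 in combinations(eligible, 2):
        (l1, r1), (l2, r2) = contexts[w1], contexts[w2]
        d = sym_kl(l1, l2) + sym_kl(r1, r2)
        if best is None or d < best[2]:
            best = (w1, w2, d)
    return best


def merge_pair(corpus: Corpus, w1: str, w2: str, label: str) -> Corpus:
    """Replace both words by a class label so the next iteration treats them as one unit."""
    return [[label if w in (w1, w2) else w for w in sent] for sent in corpus]
```

Calling `closest_pair` and then `merge_pair` repeatedly grows the classes; the comparison is O(V²) per step, which is acceptable only for a sketch.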

19
Temporal clustering
  • Uses Mutual Information (MI)
  • The N pairs with the highest MI are clustered (N = 5
    in the experiment)
  • Spatial and temporal clustering are applied
    iteratively
  • Results are post-processed by a human
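Temporal clustering can be sketched as scoring adjacent word pairs by mutual information and joining the top N into phrase tokens. This assumes the score P(w1,w2) log[P(w1,w2) / (P(w1) P(w2))] and an underscore-joined phrase token; both are illustrative choices, as are the function names `top_mi_pairs` and `merge_phrase`.

```python
import math
from collections import Counter
from typing import List, Tuple

Corpus = List[List[str]]


def top_mi_pairs(corpus: Corpus, n: int = 5) -> List[Tuple[Tuple[str, str], float]]:
    """Adjacent word pairs ranked by (assumed) weighted MI; returns the N best."""
    unigrams = Counter(w for sent in corpus for w in sent)
    bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), c in bigrams.items():
        p_ab = c / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        scores[(a, b)] = p_ab * math.log(p_ab / (p_a * p_b))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken by pair
    return ranked[:n]


def merge_phrase(corpus: Corpus, pair: Tuple[str, str]) -> Corpus:
    """Replace each non-overlapping occurrence of `a b`, scanned left to right, with `a_b`."""
    a, b = pair
    out = []
    for sent in corpus:
        merged, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and sent[i] == a and sent[i + 1] == b:
                merged.append(f"{a}_{b}")
                i += 2
            else:
                merged.append(sent[i])
                i += 1
        out.append(merged)
    return out
```

Alternating this step with the spatial step lets phrases such as a two-word city name become a single token before it is grouped with other city names.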

20
Automatic Concept Identification in Goal-Oriented Conversations
  • Ananlada Chotimongkol and Alexander I. Rudnicky
  • Language Technologies Institute, Carnegie Mellon
    University

21
Concept identification
  • First step towards the goal of automatically
    inferring domain ontologies
  • Goal-oriented human-human conversation has a
    clear structure
  • This structure can be used to automatically
    identify domain topics, e.g. dialog classification

22
Clustering algorithm
  • Hierarchical clustering
  • Mutual information based
    • Criterion: minimize the loss of average mutual
      information
  • Kullback-Leibler based
    • Criterion: word pair with minimum distance
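The MI-based criterion can be illustrated with a brute-force greedy step in the style of Brown clustering: try every pair of classes, recompute the average mutual information (AMI) between adjacent classes, and pick the merge that loses the least. This is a sketch for small vocabularies under those assumptions (no sentence-boundary tokens, exhaustive search), not the authors' implementation; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Corpus = List[List[str]]


def average_mutual_information(corpus: Corpus, cls: Dict[str, str]) -> float:
    """Sum over class bigrams of p(c1,c2) log[p(c1,c2) / (p_left(c1) p_right(c2))]."""
    bigrams = Counter((cls[a], cls[b]) for sent in corpus for a, b in zip(sent, sent[1:]))
    total = sum(bigrams.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in bigrams.items():
        left[c1] += n
        right[c2] += n
    return sum(
        (n / total) * math.log((n / total) / ((left[c1] / total) * (right[c2] / total)))
        for (c1, c2), n in bigrams.items()
    )


def merge_loss(corpus: Corpus, cls: Dict[str, str], a: str, b: str) -> float:
    """AMI lost if class b is merged into class a (never negative, up to rounding)."""
    merged = {w: (a if c == b else c) for w, c in cls.items()}
    return average_mutual_information(corpus, cls) - average_mutual_information(corpus, merged)


def best_merge(corpus: Corpus, cls: Dict[str, str]) -> Optional[Tuple[str, str, float]]:
    """One greedy step: the class pair whose merge loses the least AMI."""
    best: Optional[Tuple[str, str, float]] = None
    for a, b in combinations(sorted(set(cls.values())), 2):
        loss = merge_loss(corpus, cls, a, b)
        if best is None or loss < best[2]:
            best = (a, b, loss)
    return best


if __name__ == "__main__":
    corpus = [["fly", "to", "boston"], ["fly", "to", "denver"], ["show", "me", "comedies"]]
    classes = {w: w for sent in corpus for w in sent}  # start with one class per word
    print(best_merge(corpus, classes))
```

Repeating the step and recording each merge yields the hierarchy; real implementations update AMI incrementally instead of recomputing it for every candidate.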

23
Evaluation metrics
  • Reference concept from class-based n-gram model
  • Cluster concept: the majority concept in the cluster
  • Precision
  • Recall
  • Singularity score (SS)
  • Quality score (QS)
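Precision and recall per cluster can be sketched once each cluster is labeled with its majority reference concept. The definitions below (precision as the majority concept's share of the cluster, recall as the share of that concept's reference words the cluster captures) are assumptions; the singularity and quality scores are not defined in the transcript and are not reproduced here. `cluster_scores` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Optional, Union

Score = Dict[str, Union[Optional[str], float]]


def cluster_scores(clusters: List[List[str]], reference: Dict[str, str]) -> List[Score]:
    """Label each cluster with its majority reference concept, then score it."""
    concept_sizes = Counter(reference.values())  # reference words per concept
    results: List[Score] = []
    for members in clusters:
        labels = Counter(reference[w] for w in members if w in reference)
        if not labels:  # no member has a reference concept
            results.append({"concept": None, "precision": 0.0, "recall": 0.0})
            continue
        concept, hits = labels.most_common(1)[0]
        results.append({
            "concept": concept,
            "precision": hits / len(members),
            "recall": hits / concept_sizes[concept],
        })
    return results
```

For example, a cluster {boston, denver, comedies} against a reference with three CITY words would be labeled CITY with precision and recall of 2/3 each under these definitions.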