Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source - PowerPoint PPT Presentation

About This Presentation

Slides: 24

Provided by: ananladach

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source

Jens-Uwe Moller
Natural Language Systems Division,
Dept. of Computer Science, Univ. of Hamburg

2
Overview

Dialog modeling is based on a set of units called dialog acts
Dialog acts taken from theory do not fit a specific domain
Labeling dialogs is time-consuming and subjective
Instead, learn application-specific dialog acts from speech data using conceptual clustering

3
The learning task

Learning dialog acts from turns
Unsupervised classification (no prior definition of dialog acts is given)
Hierarchical classification with inspectable classification rules

4
Features

Domain knowledge: structure of the task; task knowledge represented by goals and plans
Word recognizer: word hypotheses
Prosodic data: pauses and stress mark important units
Lexical semantics
Syntax (less important in spoken dialog)
Semantics (larger units of lexical semantics)

5
COWEB

Symbolic machine learning algorithm
Builds a classification tree
Distinctions between subnodes are made by a function over all attributes
Supports probabilistic data
Supports multiple overlapping hierarchies (for ambiguous cases)
Can handle multiple entries of one attribute (e.g. a stream of words)
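The slide does not spell out COWEB's partition function. Purely as an illustration, the sketch below assumes it behaves like the category utility of Fisher's COBWEB: a score computed over all attributes that rewards child nodes in which attribute values become more predictable. Turns are modelled as hypothetical dicts of nominal features; COWEB's probabilistic data and multiple entries per attribute are not handled here.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical representation: one turn = a dict of nominal features.
Turn = Dict[str, str]


def expected_correct_guesses(turns: List[Turn]) -> float:
    """Sum over attributes A_i and values V_ij of P(A_i = V_ij)^2 within `turns`."""
    n = len(turns)
    attributes = {attr for turn in turns for attr in turn}
    total = 0.0
    for attr in attributes:
        counts = Counter(turn[attr] for turn in turns if attr in turn)
        total += sum((c / n) ** 2 for c in counts.values())
    return total


def category_utility(partition: List[List[Turn]]) -> float:
    """COBWEB-style category utility of splitting a node into the given children."""
    all_turns = [turn for child in partition for turn in child]
    n = len(all_turns)
    baseline = expected_correct_guesses(all_turns)
    gain = sum(
        (len(child) / n) * (expected_correct_guesses(child) - baseline)
        for child in partition
        if child  # skip empty children
    )
    return gain / len(partition)


turns = [
    {"mood": "question", "topic": "date"},
    {"mood": "question", "topic": "date"},
    {"mood": "statement", "topic": "place"},
    {"mood": "statement", "topic": "place"},
]
by_mood = [turns[:2], turns[2:]]
mixed = [[turns[0], turns[2]], [turns[1], turns[3]]]
print(category_utility(by_mood), category_utility(mixed))  # 0.5 0.0
```

Grouping the toy turns by mood makes every feature fully predictable inside each child, so it scores higher than a grouping that mixes them.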

6
COWEB (2)

Learns from simultaneous events
Learns from structured data: Conceptual Graphs
Learns case descriptions from terminological descriptions
Subsumption correlation criterion over structured data, e.g. subsumption of individuals to classes
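To illustrate what "subsumption of individuals to classes" means, here is a minimal sketch over a hand-written taxonomy. The class names, the individuals, and the tree-shaped is-a table are all hypothetical; a real terminological system derives subsumption from concept definitions instead of looking it up in an explicit parent table.

```python
from typing import Dict, Optional

# Hypothetical toy taxonomy: each class maps to its direct superclass (None = root).
TAXONOMY: Dict[str, Optional[str]] = {
    "Thing": None,
    "Location": "Thing",
    "City": "Location",
    "TimeExpression": "Thing",
    "Weekday": "TimeExpression",
}

# Hypothetical individuals (e.g. recognized words) with their asserted class.
INDIVIDUALS: Dict[str, str] = {"hamburg": "City", "monday": "Weekday"}


def class_subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its superclasses."""
    current: Optional[str] = specific
    while current is not None:
        if current == general:
            return True
        current = TAXONOMY[current]
    return False


def instance_of(individual: str, cls: str) -> bool:
    """Subsumption of an individual to a class, via the individual's asserted class."""
    asserted = INDIVIDUALS.get(individual)
    return asserted is not None and class_subsumes(cls, asserted)


print(instance_of("hamburg", "Location"))        # True
print(instance_of("hamburg", "TimeExpression"))  # False
```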

7
Metrics for Measuring Domain Independence of Semantic Classes

Andrew Pargellis, Eric Fosler-Lussier, Alexandros Potamianos, Chin-Hui Lee
Dialogue Systems Research Dept., Bell Labs, Lucent Technologies
Murray Hill, NJ, USA

8
Introduction

Employ semantic classes (concepts) from another domain
Need to identify domain-independent concepts based on comparisons across domains
Domain-independent concepts should occur in similar syntactic (lexical) contexts across domains

9
Comparing concepts across domains

Concept-comparison method
Concept-projection method

10
Concept-comparison method

Find the similarity between all pairs of concepts across the two domains
Two concepts are similar if their respective bigram contexts are similar
Use left and right context bigram language models
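A minimal sketch of the left and right bigram contexts of a concept, assuming the concept is given as a set of single-token member words (a multi-word value such as "san francisco" would first need to be joined into one token) and that sentence boundaries are marked with `<s>` and `</s>`. The function names and the toy Movie sentences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Distribution = Dict[str, float]


def normalize(counts: Counter) -> Distribution:
    """Turn raw counts into relative frequencies (empty counts -> empty dict)."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def context_distributions(
    sentences: List[List[str]], members: Set[str]
) -> Tuple[Distribution, Distribution]:
    """Left and right bigram contexts of a concept given by its member words."""
    left: Counter = Counter()
    right: Counter = Counter()
    for sentence in sentences:
        padded = ["<s>"] + sentence + ["</s>"]
        for i in range(1, len(padded) - 1):
            if padded[i] in members:
                left[padded[i - 1]] += 1   # word immediately before the concept
                right[padded[i + 1]] += 1  # word immediately after the concept
    return normalize(left), normalize(right)


movie = [["i", "want", "to", "see", "comedies"], ["show", "me", "westerns", "tonight"]]
left, right = context_distributions(movie, {"comedies", "westerns"})
print(left)   # {'see': 0.5, 'me': 0.5}
print(right)  # {'</s>': 0.5, 'tonight': 0.5}
```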

11
Kullback-Leibler (KL) distance

Compare how "San Francisco" and "Newark" are used in the Travel domain with how "comedies" and "westerns" are used in the Movie domain
Distance between two concepts
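The distance formula itself is not preserved in this transcript. The sketch below assumes one common choice: a symmetrised KL divergence computed separately for the left and the right context distributions and then summed, with additive smoothing (the `eps` parameter is an assumption) so that unseen context words do not make the divergence infinite. It is not necessarily the exact formula used by Pargellis et al.

```python
import math
from typing import Dict, Set

Distribution = Dict[str, float]


def smooth(dist: Distribution, vocab: Set[str], eps: float) -> Distribution:
    """Additive smoothing over `vocab`, renormalized so the result sums to 1."""
    raw = {w: dist.get(w, 0.0) + eps for w in vocab}
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}


def kl(p: Distribution, q: Distribution) -> float:
    """KL(p || q) for two smoothed distributions over the same vocabulary."""
    return sum(pv * math.log(pv / q[w]) for w, pv in p.items())


def concept_distance(
    left1: Distribution, right1: Distribution,
    left2: Distribution, right2: Distribution,
    eps: float = 1e-3,
) -> float:
    """Symmetric KL over left contexts plus symmetric KL over right contexts."""
    total = 0.0
    for a, b in ((left1, left2), (right1, right2)):
        vocab = set(a) | set(b)
        pa, pb = smooth(a, vocab, eps), smooth(b, vocab, eps)
        total += kl(pa, pb) + kl(pb, pa)
    return total


city_l, city_r = {"to": 0.5, "from": 0.5}, {"</s>": 1.0}      # toy contexts of a city
genre_l, genre_r = {"see": 1.0}, {"tonight": 1.0}              # toy contexts of a genre
print(concept_distance(city_l, city_r, city_l, city_r))        # 0.0
print(concept_distance(city_l, city_r, genre_l, genre_r) > 0)  # True
```

Identical context distributions give a distance of 0; the more two concepts' neighbours differ, the larger the value.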

12
Concept-projection method

How well a single concept from one domain is represented in another domain
E.g. how the words "comedies" and "westerns" are used in both domains
Useful for identifying the degree of domain independence of a particular concept
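One way to sketch the projection idea, under the assumption that it reuses a context-based KL measure: collect the contexts of one concept's words (for example "comedies" and "westerns") in each domain and compare them. A small value suggests the concept is used the same way in both domains. Left and right neighbours are folded into one distribution with `L:`/`R:` prefixes, a simplification chosen for brevity; the names, `eps`, and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set


def concept_contexts(sentences: List[List[str]], members: Set[str]) -> Counter:
    """Count left ("L:w") and right ("R:w") neighbours of the concept's member words."""
    counts: Counter = Counter()
    for sentence in sentences:
        padded = ["<s>"] + sentence + ["</s>"]
        for i in range(1, len(padded) - 1):
            if padded[i] in members:
                counts["L:" + padded[i - 1]] += 1
                counts["R:" + padded[i + 1]] += 1
    return counts


def projection_distance(
    members: Set[str], domain_a: List[List[str]], domain_b: List[List[str]],
    eps: float = 1e-3,
) -> float:
    """Symmetric KL between one concept's smoothed context distributions in two domains."""
    ca, cb = concept_contexts(domain_a, members), concept_contexts(domain_b, members)
    vocab = set(ca) | set(cb)
    za = sum(ca.values()) + eps * len(vocab)
    zb = sum(cb.values()) + eps * len(vocab)
    total = 0.0
    for f in vocab:
        pa, pb = (ca[f] + eps) / za, (cb[f] + eps) / zb
        total += pa * math.log(pa / pb) + pb * math.log(pb / pa)
    return total


genres = {"comedies", "westerns"}
movie = [["show", "me", "comedies"], ["any", "westerns", "tonight"]]
travel = [["show", "me", "westerns"], ["fly", "to", "newark"]]
print(projection_distance(genres, movie, movie))   # 0.0
print(projection_distance(genres, movie, travel))  # positive: contexts only partly shared
```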

13
Result Concept-comparison
14
Result Concept-projection
15
Concept Example
16
Semi-Automatic Acquisition of Domain-Specific Semantic Structures

Siu K.C., Meng H.M.
Human-Computer Communications Laboratory
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong

17
Grammar induction

Uses unannotated corpora
Portable across domains and languages
Output grammar has reasonable coverage of within-domain data and rejects out-of-domain data
Amenable to interactive refinement by humans
Supports optional injection of prior knowledge

18
Spatial clustering

Use Kullback-Leibler distance
Use left and right contexts
Consider words w1, w2 (later classes c1, c2) pairwise, restricted to words with at least a pre-set minimum number of occurrences (set to 5)
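A sketch of one spatial-clustering step as described above: keep only words occurring at least 5 times, compare every candidate pair by the KL distance of their left and right contexts, and return the closest pair. The symmetric, smoothed form of the distance and the `eps` value are assumptions, as are all function names; a full implementation would merge the winning pair into a class and repeat.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Optional, Tuple

MIN_COUNT = 5  # pre-set minimum occurrence from the slide


def contexts(sentences: List[List[str]], word: str) -> Tuple[Counter, Counter]:
    """Left and right neighbour counts of `word`, with <s>/</s> boundary markers."""
    left: Counter = Counter()
    right: Counter = Counter()
    for sentence in sentences:
        padded = ["<s>"] + sentence + ["</s>"]
        for i in range(1, len(padded) - 1):
            if padded[i] == word:
                left[padded[i - 1]] += 1
                right[padded[i + 1]] += 1
    return left, right


def symmetric_kl(c1: Counter, c2: Counter, eps: float = 1e-3) -> float:
    """KL(p||q) + KL(q||p) between additively smoothed count distributions."""
    vocab = set(c1) | set(c2)
    z1 = sum(c1.values()) + eps * len(vocab)
    z2 = sum(c2.values()) + eps * len(vocab)
    total = 0.0
    for w in vocab:
        p, q = (c1[w] + eps) / z1, (c2[w] + eps) / z2
        total += p * math.log(p / q) + q * math.log(q / p)
    return total


def closest_pair(
    sentences: List[List[str]], min_count: int = MIN_COUNT
) -> Optional[Tuple[str, str, float]]:
    """Return the pair of frequent words whose left+right contexts are most similar."""
    freq = Counter(w for s in sentences for w in s)
    candidates = sorted(w for w, c in freq.items() if c >= min_count)
    ctx = {w: contexts(sentences, w) for w in candidates}
    best: Optional[Tuple[str, str, float]] = None
    for w1, w2 in combinations(candidates, 2):
        (l1, r1), (l2, r2) = ctx[w1], ctx[w2]
        d = symmetric_kl(l1, l2) + symmetric_kl(r1, r2)
        if best is None or d < best[2]:
            best = (w1, w2, d)
    return best


corpus = (
    [["fly", "to", "boston", "please"]] * 5
    + [["fly", "to", "denver", "please"]] * 5
    + [["leave", "on", "monday", "morning"]] * 5
)
print(closest_pair(corpus))  # ('boston', 'denver', 0.0)
```

Words below the frequency threshold are never compared, which keeps sparse, unreliable context estimates out of the clustering.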

19
Temporal clustering

Use Mutual Information (MI)
The N highest-MI pairs are clustered (N = 5 in the experiment)
Spatial and temporal clustering are performed iteratively
Post-processing by a human
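A sketch of one temporal-clustering step, assuming the MI score of an adjacent word pair is its contribution to average mutual information, P(a,b) log[P(a,b) / (P(a)P(b))]. This weighting is an assumption (plain pointwise MI would favour rare pairs). The N highest-scoring pairs are merged into phrase tokens such as `san_francisco`; the alternation with spatial clustering and the human post-processing are not shown, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

N_HIGHEST = 5  # N from the slide

Pair = Tuple[str, str]


def mi_scores(sentences: List[List[str]]) -> List[Tuple[Pair, float]]:
    """Score adjacent word pairs by P(a,b) * log(P(a,b) / (P(a) P(b))), best first."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter((a, b) for s in sentences for a, b in zip(s, s[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = []
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        scores.append(((a, b), p_ab * math.log(p_ab / (p_a * p_b))))
    scores.sort(key=lambda item: (-item[1], item[0]))  # ties broken alphabetically
    return scores


def merge_top_pairs(sentences: List[List[str]], n: int = N_HIGHEST) -> List[List[str]]:
    """Rewrite each sentence, joining occurrences of the n highest-scoring pairs."""
    top: Set[Pair] = {pair for pair, _ in mi_scores(sentences)[:n]}
    merged = []
    for s in sentences:
        out, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and (s[i], s[i + 1]) in top:
                out.append(s[i] + "_" + s[i + 1])  # e.g. "san_francisco"
                i += 2
            else:
                out.append(s[i])
                i += 1
        merged.append(out)
    return merged


corpus = [
    ["fly", "to", "san", "francisco"],
    ["i", "want", "san", "francisco"],
    ["san", "francisco", "please"],
]
print(merge_top_pairs(corpus, n=1))
# [['fly', 'to', 'san_francisco'], ['i', 'want', 'san_francisco'], ['san_francisco', 'please']]
```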

20
Automatic Concept Identification in Goal-Oriented Conversations

Ananlada Chotimongkol and Alexander I. Rudnicky
Language Technologies Institute, Carnegie Mellon University

21
Concept identification

First step towards the goal of automatically inferring domain ontologies
Goal-oriented human-human conversation has a clear structure
This structure can be used to automatically identify domain topics, e.g. dialog classification

22
Clustering algorithm