Title: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source
1Towards Learning Dialogue Structures from Speech
Data and Domain Knowledge: Challenges to
Conceptual Clustering using Multiple and
Complex Knowledge Source
- Jens-Uwe Moller
- Natural Language Systems Division,
- Dept. of Computer Science, Univ. of Hamburg
2Overview
- Dialog modeling is based on a set of units called dialog acts
- Dialog acts from theory do not fit a specific domain
- Labeling dialogs is time-consuming and subjective
- Learn application-specific dialog acts from speech data using conceptual clustering
3The learning task
- Learning dialog acts from turns
- Unsupervised classification (no prior definition of dialog acts is given)
- Hierarchical classification with inspectable classification rules
4Features
- Domain knowledge: structure of the task; task knowledge represented by goals and plans
- Word recognizer: word hypotheses
- Prosodic data: pauses and stress mark important units
- Lexical semantics
- Syntax (less important in spoken dialog)
- Semantics (larger units of lexical semantics)
5COBWEB
- Symbolic machine learning algorithm
- Builds a classification tree
- Distinctions between subnodes are made by a function over all attributes
- Supports probabilistic data
- Supports multiple overlapping hierarchies (for ambiguous cases)
- Can handle multiple entries of one attribute (e.g. a stream of words)
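The "function over all attributes" can be illustrated with category utility, the evaluation function of the standard COBWEB formulation. This is a minimal sketch that assumes nominal attributes and that every instance carries every attribute. The names `expected_correct_guesses` and `category_utility` and the toy prosodic attributes are hypothetical, and the extensions listed on the slide (probabilistic data, overlapping hierarchies) are not modelled.

```python
from collections import Counter


def expected_correct_guesses(instances):
    """Sum over attributes and values of P(attribute = value) ** 2."""
    n = len(instances)
    counts = Counter()
    for inst in instances:
        for attr, value in inst.items():
            counts[(attr, value)] += 1
    return sum((c / n) ** 2 for c in counts.values())


def category_utility(partition):
    """Category utility of a partition given as a list of clusters.

    Each cluster is a list of instances; each instance is a dict that maps
    attribute names to nominal values.
    """
    everything = [inst for cluster in partition for inst in cluster]
    n = len(everything)
    baseline = expected_correct_guesses(everything)
    gain = sum(
        (len(cluster) / n) * (expected_correct_guesses(cluster) - baseline)
        for cluster in partition
    )
    return gain / len(partition)


# Toy turns described by two hypothetical prosodic attributes.
a = {"stress": "high", "pause": "long"}
b = {"stress": "high", "pause": "long"}
c = {"stress": "low", "pause": "short"}
d = {"stress": "low", "pause": "short"}

good_split = category_utility([[a, b], [c, d]])
poor_split = category_utility([[a, c], [b, d]])
```

A partition that groups similar turns scores higher than one that mixes them, which is what lets the tree builder choose between candidate subnodes.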
6COBWEB (2)
- Learning from simultaneous events
- Learning from structured data: Conceptual Graphs
- Learning case descriptions from terminological descriptions
- Subsumption correlation criterion over structured data, e.g. subsumption of individuals to classes
7Metrics for Measuring Domain Independence of
Semantic Classes
- Andrew Pargellis, Eric Fosler-Lussier, Alexandros Potamianos, Chin-Hui Lee
- Dialogue Systems Research Dept., Bell Labs, Lucent Technologies, Murray Hill, NJ, USA
8Introduction
- Employ semantic classes (concepts) from another domain
- Need to identify domain-independent concepts based on comparison across domains
- Domain-independent concepts should occur in similar syntactic (lexical) contexts across domains
9Comparing concepts across domains
- Concept-comparison method
- Concept-projection method
10Concept-comparison method
- Find the similarity between all pairs of concepts across the two domains
- Two concepts are similar if their respective bigram contexts are similar
- Use left and right context bigram language models
11Kullback-Leibler (KL) distance
- Compare how san francisco and newark are used in the Travel domain with how comedies and westerns are used in the Movie domain
- Distance between two concepts
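The distance formula from this slide is not preserved in the text, so the following is only a sketch of one plausible reading: a symmetrised KL divergence between the left-context distributions of the two concepts plus the same quantity for the right contexts. The add-one smoothing, the function names, and the toy counts are assumptions made for illustration.

```python
import math


def context_distribution(counts, vocab, alpha=1.0):
    """Smoothed probability of each context word in vocab (add-alpha)."""
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def kl(p, q):
    """KL divergence D(p || q) for distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def concept_distance(left_a, right_a, left_b, right_b):
    """Symmetrised KL distance summed over left and right contexts."""
    total = 0.0
    for counts_a, counts_b in ((left_a, left_b), (right_a, right_b)):
        vocab = set(counts_a) | set(counts_b)
        p = context_distribution(counts_a, vocab)
        q = context_distribution(counts_b, vocab)
        total += kl(p, q) + kl(q, p)
    return total


# Hypothetical context counts: words seen to the left and right of a concept.
city_left = {"to": 8, "from": 6}
city_right = {"please": 3, "on": 5}
genre_left = {"any": 7, "show": 4}
genre_right = {"playing": 6, "please": 2}

same = concept_distance(city_left, city_right, city_left, city_right)
different = concept_distance(city_left, city_right, genre_left, genre_right)
```

Identical contexts give a distance of zero, and the distance grows as the two concepts are used in increasingly different surroundings.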
12Concept-projection method
- How well a single concept from one domain is represented in another domain
- How the words comedies and westerns are used in both domains
- Useful for identifying the degree of domain-independence for a particular concept
13Result Concept-comparison
14Result Concept-projection
15Concept Example
16Semi-Automatic Acquisition of Domain-Specific
Semantic Structures
- Siu K.C., Meng H.M.
- Human-Computer Communications Laboratory
- Department of Systems Engineering and Engineering Management
- The Chinese University of Hong Kong
17Grammar induction
- Use unannotated corpora
- Portable across domains and languages
- Output grammar has reasonable coverage of within-domain data and rejects out-of-domain data
- Amenable to interactive refinement by a human
- Supports optional injection of prior knowledge
18Spatial clustering
- Use Kullback-Leibler distance
- Use left and right context
- Consider words w1, w2 (later c1, c2) pairwise, restricted to words with at least a pre-set minimum number of occurrences (set to 5)
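Before any pairwise distance can be computed, the left and right contexts have to be counted for each eligible word. The sketch below shows one way to do that, assuming tokenised sentences and sentence-boundary markers `<s>` and `</s>`. The function name `collect_contexts` and the boundary markers are hypothetical; only the minimum-occurrence threshold of 5 comes from the slide.

```python
from collections import Counter, defaultdict


def collect_contexts(sentences, min_count=5):
    """Count left and right neighbours for words seen at least min_count times."""
    freq = Counter(w for sentence in sentences for w in sentence)
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for sentence in sentences:
        padded = ["<s>"] + list(sentence) + ["</s>"]
        for i in range(1, len(padded) - 1):
            word = padded[i]
            if freq[word] >= min_count:
                left[word][padded[i - 1]] += 1
                right[word][padded[i + 1]] += 1
    return dict(left), dict(right)


# Toy corpus: "boston" occurs 5 times, "denver" only once.
corpus = [["fly", "to", "boston"]] * 5 + [["fly", "to", "denver"]]
left, right = collect_contexts(corpus, min_count=5)
```

Words below the threshold are dropped as clustering candidates but still appear as context for their neighbours.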
19Temporal clustering
- Use Mutual Information (MI)
- N-highest MI pairs are clustered (N = 5 in experiment)
- Do spatial clustering and temporal clustering iteratively
- Post-process by human
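The selection of the N highest MI pairs can be sketched as follows. The slide does not state which MI variant is used, so this example assumes a joint-probability-weighted pointwise MI over adjacent word pairs; the function name `top_mi_pairs` and the `min_count` filter are hypothetical.

```python
import math
from collections import Counter


def top_mi_pairs(sentences, n=5, min_count=2):
    """Return the n adjacent word pairs with the highest weighted MI."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter((a, b) for s in sentences for a, b in zip(s, s[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs that are too rare to estimate reliably
        p_ab = count / total_bi
        p_a = unigrams[a] / total_uni
        p_b = unigrams[b] / total_uni
        scores[(a, b)] = p_ab * math.log(p_ab / (p_a * p_b))
    return sorted(scores, key=scores.get, reverse=True)[:n]


corpus = [
    ["i", "want", "new", "york"],
    ["fly", "to", "new", "york"],
    ["new", "york", "please"],
    ["i", "want", "to", "fly"],
]
best = top_mi_pairs(corpus, n=5)
```

In an iterative setup, each selected pair would be replaced by a single phrase token before the next round of spatial clustering.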
20Automatic Concept identification in goal-oriented conversations
- Ananlada Chotimongkol and Alexander I. Rudnicky
- Language Technologies Institute, Carnegie Mellon University
21Concept identification
- First step towards the goal of automatically inferring domain ontologies
- Goal-oriented human-human conversation has a clear structure
- This structure can be used to automatically identify domain topics, e.g. dialog classification
22Clustering algorithm
- Hierarchical clustering
- Mutual information based
- Criterion: minimize the loss of average mutual information
- Kullback-Leibler based
- Criterion: word pair with minimum distance
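The minimum-distance criterion can be sketched as a greedy bottom-up merge: at each step the two closest clusters are joined. This example assumes average linkage between clusters as a stand-in for recomputing context statistics after each merge, and it takes any word-level distance function as input. The names `average_linkage` and `agglomerate` and the toy one-dimensional distance are hypothetical.

```python
from itertools import combinations


def average_linkage(a, b, distance):
    """Mean pairwise distance between the members of clusters a and b."""
    return sum(distance(x, y) for x in a for y in b) / (len(a) * len(b))


def agglomerate(items, distance, num_merges):
    """Greedily merge the closest pair of clusters num_merges times."""
    clusters = [frozenset([x]) for x in items]
    for _ in range(num_merges):
        if len(clusters) < 2:
            break
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], distance),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Toy distance: words placed on a line, close words behave alike.
position = {"boston": 0.0, "denver": 0.1, "monday": 5.0, "tuesday": 5.2}
clusters = agglomerate(
    list(position), lambda x, y: abs(position[x] - position[y]), num_merges=2
)
```

A Kullback-Leibler distance over word contexts could be passed in place of the toy distance without changing the merge loop.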
23Evaluation metrics
- Reference concept from class-based n-gram model
- Cluster concept: majority concept
- Precision
- Recall
- Singularity score (SS)
- Quality score (QS)
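The majority-concept labelling and the first two metrics can be sketched as below. The definitions used here are assumptions: precision is taken as the share of cluster members that belong to the majority concept, and recall as the share of that concept's reference words that the cluster captures. The singularity and quality scores are left out because the slide does not define them. The function name `evaluate_cluster` and the toy reference are hypothetical.

```python
from collections import Counter


def evaluate_cluster(cluster, reference):
    """Return (majority concept, precision, recall) for one cluster.

    cluster is a list of words; reference maps each word to its concept.
    """
    labels = Counter(reference[w] for w in cluster if w in reference)
    if not labels:
        return None, 0.0, 0.0
    majority, hits = labels.most_common(1)[0]
    concept_size = sum(1 for c in reference.values() if c == majority)
    return majority, hits / len(cluster), hits / concept_size


reference = {
    "boston": "city",
    "denver": "city",
    "newark": "city",
    "monday": "day",
    "tuesday": "day",
}
concept, precision, recall = evaluate_cluster(
    ["boston", "denver", "monday"], reference
)
```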