Spoken Language Understanding for Conversational Dialog Systems - PowerPoint PPT Presentation

1
Spoken Language Understanding for Conversational
Dialog Systems

Michael McTear
University of Ulster

IEEE/ACL 2006 Workshop on Spoken Language
Technology, Aruba, December 10-13, 2006
2
Overview

Introductory definitions
Task-based and conversational dialog systems
Spoken language understanding
Issues for spoken language understanding
Coverage
Robustness
Overview of spoken language understanding
Hand-crafted approaches
Data-driven methods
Conclusions

3
Basic dialog system architecture
4
Task-based Dialog Systems

Mainly interact with databases to get information
or support transactions
SLU module creates a database query from the user's
spoken input by extracting relevant concepts
System initiative constrains user input
Keyword / keyphrase extraction
User initiative: less constrained input
Call routing: call classification with named
entity extraction
Question answering
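As a minimal sketch of keyword/keyphrase extraction feeding a database query, the code below assumes a hypothetical city lexicon, a simple "from X" / "to Y" keyphrase rule, and a hypothetical `flights` table with `origin` and `destination` columns. None of these come from the slide; they only illustrate the concept-to-query step.

```python
import re

# Hypothetical lexicon of cities the application knows about.
CITIES = {"belfast", "malaga", "pittsburgh", "boston"}

def extract_concepts(utterance: str) -> dict[str, str]:
    """Spot 'from <city>' and 'to <city>' keyphrases as origin/destination."""
    words = re.findall(r"[a-z]+", utterance.lower())
    concepts = {}
    for prev, word in zip(words, words[1:]):
        if word in CITIES and prev == "from":
            concepts["origin"] = word
        elif word in CITIES and prev == "to":
            concepts["destination"] = word
    return concepts

def build_query(concepts: dict[str, str]) -> tuple[str, list[str]]:
    """Build a parameterised query on a hypothetical flights table."""
    columns = sorted(concepts)  # fixed order keeps the SQL deterministic
    where = " AND ".join(f"{col} = ?" for col in columns)
    sql = "SELECT * FROM flights" + (f" WHERE {where}" if where else "")
    return sql, [concepts[col] for col in columns]

print(build_query(extract_concepts("a flight from Belfast to Malaga")))
```

The `?` placeholders keep spotted values out of the SQL text; a driver such as Python's `sqlite3` would bind them at execution time.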

5
Conversational Dialog

AI (agent-based systems) e.g. TRIPS
User can take initiative, e.g. raise new topic,
ask for clarification (TRIPS)
More complex interactions involving recognition
of the user's intentions, goals, beliefs or plans
Deep understanding of the user's utterance,
taking into account contextual information
Information State Theory, Planning Theory, User
Modelling, Belief Modelling
Simulated conversation e.g. CONVERSE
Conversational companions, chatbots, help desk
Does not require deep understanding
SLU involves identifying system utterance type
and determining a suitable response
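For simulated conversation, this shallow kind of SLU can be sketched as ordered pattern rules that assign a coarse utterance type and pick a canned response. The patterns, types and responses below are hypothetical illustrations, not the rules used by CONVERSE or any other system.

```python
import re

# Hypothetical rules: (pattern, utterance type, canned response), tried in order.
RULES = [
    (re.compile(r"\?\s*$"), "question", "Good question. What do you think?"),
    (re.compile(r"\b(?:not sure|don't know)\b", re.IGNORECASE), "uncertainty",
     "Why are you unsure?"),
    (re.compile(r"\b(?:hello|hi)\b", re.IGNORECASE), "greeting",
     "Hello! What shall we talk about?"),
]
FALLBACK = ("statement", "Tell me more.")

def classify_and_respond(utterance: str) -> tuple[str, str]:
    """Return (utterance type, response) from the first rule whose pattern matches."""
    for pattern, utt_type, response in RULES:
        if pattern.search(utterance):
            return utt_type, response
    return FALLBACK

print(classify_and_respond("I am not sure if he has any real friends."))
```

Rule order matters: the first match wins, so more specific patterns would normally be listed before general ones.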

6
Defining Spoken Language Understanding

extracting the meaning from speech utterances
a transduction of the recognition result to an
interpretable representation
Meaning (in human-computer interactive systems):
a representation that can be executed by an
interpreter in order to change the state of the
system
(Bangalore et al., 2006)
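One way to read this definition, sketched under assumptions: the meaning representation is a hypothetical dict with an `act` and `slots`, and a small interpreter executes it against a hypothetical `DialogState`. Only the idea that executing a meaning changes the system's state comes from the definition above.

```python
from dataclasses import dataclass, field

@dataclass
class DialogState:
    """Hypothetical system state: filled slots plus a log of executed acts."""
    slots: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

def execute(meaning: dict, state: DialogState) -> DialogState:
    """Interpret a meaning representation, changing the state of the system."""
    act = meaning["act"]
    if act == "inform":
        state.slots.update(meaning.get("slots", {}))
    elif act == "cancel":
        for name in meaning.get("slots", {}):
            state.slots.pop(name, None)
    else:
        raise ValueError(f"unknown act: {act}")
    state.history.append(act)
    return state

state = DialogState()
execute({"act": "inform", "slots": {"origin": "Belfast", "destination": "Malaga"}}, state)
execute({"act": "cancel", "slots": {"destination": None}}, state)
print(state.slots)  # {'origin': 'Belfast'}
```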

7
SLU for task based systems

a flight from Belfast to Malaga
uh I'd like uh um could you uh is there a flight
from Bel- uh Belfast to um Gran- I mean Malaga
I would like to find a flight from Pittsburgh to
Boston on Wednesday and I have to be in Boston by
one so I would like a flight out of here no later
than 11 a.m.
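A simplified sketch of cleaning the disfluent example before concept spotting. It assumes a small hypothetical filler inventory, treats "I mean" as an editing phrase that cancels the word just before it, and drops cut-off words transcribed with a trailing "-". Real repair detection is considerably harder; restarts such as "could you uh is there" are left untouched here.

```python
FILLERS = {"uh", "um", "er"}   # assumed inventory of filled pauses
EDIT_PHRASE = ("i", "mean")    # assumed editing phrase that signals a repair

def clean_disfluencies(utterance: str) -> list[str]:
    """Simplified disfluency removal before concept spotting."""
    # Pass 1: drop filled pauses.
    tokens = [t for t in utterance.lower().split() if t not in FILLERS]
    # Pass 2: an editing phrase cancels the word just before it (the reparandum).
    repaired: list[str] = []
    i = 0
    while i < len(tokens):
        if tuple(tokens[i:i + 2]) == EDIT_PHRASE:
            if repaired:
                repaired.pop()
            i += 2
        else:
            repaired.append(tokens[i])
            i += 1
    # Pass 3: drop word fragments (cut-off words ending in "-").
    return [t for t in repaired if not t.endswith("-")]

raw = "uh I'd like uh um could you uh is there a flight from Bel- uh Belfast to um Gran- I mean Malaga"
print(" ".join(clean_disfluencies(raw)))
# i'd like could you is there a flight from belfast to malaga
```

Handling the repair before removing fragments matters: "Gran-" is the word that "I mean" corrects, so it must still be present when the repair is applied.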

8
SLU for advanced conversational systems (TRIPS)

Interpretation requires intention recognition:
"Can we use a helicopter to get the people from
Abyss / Barnacle / Delta?"
Abyss: request to modify plan
Barnacle: include sub-goal and suggest solution
Delta: extension of a solution
Six possible interpretations, with only the city
name changing
Requires reasoning about task and current context
to identify most plausible interpretation
Requires more than concept spotting to identify
structure and meaning of utterance as basis for
reasoning

9
Role of syntax in deep understanding

List all employees of the companies who/which are
based in the city centre

I would like to know where to mail the check.
I would like you to mail the check to me

10
SLU for simulated dialog

C: Are you still a friend of XXX?
H: I am not sure if he has any real friends. He
has achieved a lot and has left a lot of people
behind.
C: You really don't like the guy - well, at least
that's definite - I thought most people were just
indifferent or numb by now.

11
Coverage

Possible requirement
The system should be able to understand
everything the user might say
Predicting user input
Analysis of corpora and iterative design of
hand-crafted grammars
Use of carefully designed prompts to constrain
user input
Learning grammar from data

12
Robustness

Characteristics of spontaneous spoken language
Disfluencies and filled pauses are not just errors:
they reflect cognitive aspects of speech production
and interaction management
Output from speech recognition component
Words and word boundaries not known with
certainty
Recognition errors
Approaches
Use of semantic grammars and robust parsing for
concept spotting
Data-driven approaches learn mappings between
input strings and output structures

13
Developing the SLU component

Hand-crafted approaches
Grammar development
Parsing
Data-driven approaches
Learning from data
Statistical models rather than grammars
Efficient decoding

14
Hand-crafting grammars

Traditional software engineering approach of
design and iterative refinement
Decisions about type of grammar required
Chomsky hierarchy
Flat v hierarchical representations
Processing issues (parsing)
Dealing with ambiguity
Efficiency

15
Semantic Grammar and Robust Parsing: PHOENIX
(CMU/CU)

The Phoenix parser maps input word strings onto
a sequence of semantic frames.
A frame is a named set of slots, where the slots
represent related pieces of information.
Each slot has an associated context-free grammar
that specifies word string patterns that match
the slot.
Chart parsing with path pruning, e.g. a path that
accounts for fewer words is pruned.
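A much-simplified sketch of the Phoenix idea, under stated assumptions: a hypothetical `flight` frame whose slots each have a few regular expressions standing in for their context-free grammars. Words no slot covers are simply skipped, and where matches overlap the one accounting for more words is kept. Real Phoenix uses chart parsing over full CFGs; this only imitates its pruning preference.

```python
import re

# Hypothetical city vocabulary used inside the slot patterns.
CITY = r"(?P<v>belfast|malaga|pittsburgh|boston)"

# Hypothetical "flight" frame; regexes stand in for each slot's CFG.
FLIGHT_FRAME = {
    "depart_loc": [r"from " + CITY, r"out of " + CITY],
    "arrive_loc": [r"to " + CITY, r"arriving in " + CITY],
    "loc": [CITY],  # bare city name: covers fewer words, so it usually loses
    "day": [r"on (?P<v>monday|tuesday|wednesday|thursday|friday|saturday|sunday)"],
}

def parse_frame(utterance: str) -> dict:
    """Fill frame slots from word-string matches, skipping words no slot covers."""
    text = utterance.lower()
    candidates = []
    for slot, patterns in FLIGHT_FRAME.items():
        for pattern in patterns:
            for m in re.finditer(r"\b" + pattern + r"\b", text):
                n_words = len(m.group(0).split())
                candidates.append((n_words, m.start(), m.end(), slot, m.group("v")))
    # Pruning: consider matches covering more words first; an overlapping
    # match that accounts for fewer words is discarded.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    taken, frame = [], {}
    for _, start, end, slot, value in candidates:
        if any(start < t_end and t_start < end for t_start, t_end in taken):
            continue
        taken.append((start, end))
        frame.setdefault(slot, value)
    return frame

print(parse_frame("uh a flight from Belfast to Malaga on Saturday"))
```

In the example, the bare-city `loc` matches for "belfast" and "malaga" overlap the longer "from Belfast" and "to Malaga" matches and are pruned.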

16
Deriving Meaning directly from ASR output
VoiceXML
Uses finite state grammars as language models for
recognition and semantic tags in the grammars for
semantic parsing
[Figure: ASR → meaning representation]
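VoiceXML platforms normally take such grammars in the W3C SRGS format with semantic interpretation tags. Rather than reproduce that syntax, the sketch below models the idea in Python: a hypothetical finite state grammar whose arcs carry both a word and an optional semantic tag, so that walking an accepted word string yields the meaning directly. It covers only the tagging side; on a real platform the same grammar also constrains the recogniser.

```python
from typing import Optional

# Hypothetical finite state grammar: state -> arcs of (word, next_state, tag).
# A tag is a (slot, value) pair attached to the arc, or None.
GRAMMAR = {
    0: [("to", 1, None), ("from", 2, None)],
    1: [("boston", 3, ("destination", "boston")), ("denver", 3, ("destination", "denver"))],
    2: [("boston", 3, ("origin", "boston")), ("denver", 3, ("origin", "denver"))],
    3: [("please", 4, None)],
}
FINAL_STATES = {3, 4}

def recognise_and_tag(words: list[str]) -> Optional[dict]:
    """Accept a word string with the grammar, collecting semantic tags on the way.

    Returns the meaning as a dict, or None if the string is out of grammar.
    """
    state, meaning = 0, {}
    for word in words:
        for arc_word, next_state, tag in GRAMMAR.get(state, []):
            if arc_word == word:
                if tag is not None:
                    slot, value = tag
                    meaning[slot] = value
                state = next_state
                break
        else:
            return None  # no arc matches this word
    return meaning if state in FINAL_STATES else None

print(recognise_and_tag(["to", "boston", "please"]))
```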
17
Deep understanding

Requirements for deep understanding
advanced grammatical formalisms
syntax-semantics issues
parsing technologies
Example TRIPS
Uses feature-based augmented CFG with
agenda-driven best-first chart parser
Combined strategy integrating shallow and deep
parsing (Swift et al.)

18
Combined strategies: TINA (MIT)

Grammar rules include mix of syntactic and
semantic categories
Context-free grammar with probabilities trained
from user utterances to estimate the likelihood of a
parse
Parse tree converted to a semantic frame that
encapsulates the meaning
Robust parsing strategy
Sentences that fail to parse are parsed using
fragments that are combined into a full semantic
frame
When all else fails, word spotting is used
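The robust parsing strategy is essentially a cascade of increasingly tolerant analysers. A minimal sketch, assuming three hypothetical stand-ins (a whole-sentence pattern for the full parse, per-fragment patterns merged into one frame, and city word spotting), might look like this; TINA's actual probabilistic CFG parser is not reproduced here.

```python
import re
from typing import Optional

CITIES = {"boston", "denver", "burbank"}  # hypothetical vocabulary

def full_parse(text: str) -> Optional[dict]:
    """Stand-in for a complete parse: accepts only one whole-sentence pattern."""
    m = re.fullmatch(r"show me flights from (\w+) to (\w+)", text)
    return {"origin": m.group(1), "destination": m.group(2)} if m else None

def fragment_parse(text: str) -> Optional[dict]:
    """Parse fragments independently and combine them into one frame."""
    frame = {}
    for key, pattern in (("origin", r"\bfrom (\w+)"), ("destination", r"\bto (\w+)")):
        m = re.search(pattern, text)
        if m and m.group(1) in CITIES:
            frame[key] = m.group(1)
    return frame or None

def word_spot(text: str) -> Optional[dict]:
    """Last resort: report known city words, with no role assigned."""
    found = [w for w in text.split() if w in CITIES]
    return {"cities": found} if found else None

def understand(utterance: str) -> tuple[Optional[str], dict]:
    """Try each strategy in turn; report which one produced the frame."""
    text = utterance.lower().strip(" ?.!")
    for strategy in (full_parse, fragment_parse, word_spot):
        frame = strategy(text)
        if frame is not None:
            return strategy.__name__, frame
    return None, {}

print(understand("uh flights um to Denver from Boston I guess"))
```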

19
Problems with hand-crafted approaches

Hand-crafted grammars are
not robust to spoken language input
require linguistic and engineering expertise to
develop if grammar is to have good coverage and
optimised performance
time consuming to develop
error prone
subject to designer bias
difficult to maintain

20
Statistical modelling for SLU
SLU as a pattern matching problem: given a word
sequence W, find the semantic representation of
meaning M that has maximum a posteriori
probability P(M|W)
$\hat{M} = \arg\max_M P(M \mid W) = \arg\max_M P(W \mid M)\,P(M)$
(by Bayes' rule, since P(W) does not depend on M)
P(M): semantic prior model, assigns probability
to the underlying semantic structure
P(W|M): lexicalisation model, assigns probability
to the word sequence W given the semantic structure
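A toy instance of the decision rule: choose the meaning that maximises P(W|M)P(M), computed in log space to avoid underflow. The two meanings, their priors, and the per-meaning unigram lexicalisation model are all hypothetical; real lexicalisation models condition on much richer semantic structure.

```python
import math

# Hypothetical semantic prior P(M).
PRIOR = {"FLIGHT_QUERY": 0.7, "FARE_QUERY": 0.3}

# Hypothetical lexicalisation model P(w | M): a unigram distribution per meaning.
LEX = {
    "FLIGHT_QUERY": {"flights": 0.4, "to": 0.3, "boston": 0.2, "cost": 0.1},
    "FARE_QUERY":   {"flights": 0.1, "to": 0.2, "boston": 0.2, "cost": 0.5},
}
FLOOR = 1e-6  # assumed probability for unseen words

def decode(words: list[str]) -> str:
    """Return argmax_M [log P(M) + sum_i log P(w_i | M)]."""
    def score(m: str) -> float:
        return math.log(PRIOR[m]) + sum(math.log(LEX[m].get(w, FLOOR)) for w in words)
    return max(PRIOR, key=score)

print(decode(["flights", "to", "boston"]))  # FLIGHT_QUERY
print(decode(["cost", "to", "boston"]))     # FARE_QUERY
```

In the second call the lexicalisation term outweighs the prior: 0.3 x 0.5 x 0.2 x 0.2 = 0.006 beats 0.7 x 0.1 x 0.3 x 0.2 = 0.0042.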
21
Early Examples

CHRONUS (AT&T; Pieraccini et al., 1992; Levin &
Pieraccini, 1995)
Finite state semantic tagger
Flat-concept model: simple to train but does
not represent hierarchical structure
HUM (Hidden Understanding Model) (BBN; Miller et
al., 1995)
Probabilistic CFG using tree-structured meaning
representations
Grammatical constraints represented in networks
rather than rules
Ordering of constituents unconstrained: this
increases robustness
Transition probabilities constrain
over-generation
Requires fully annotated treebank data for
training
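The flat-concept idea behind a finite state semantic tagger can be sketched as an HMM whose hidden states are concept labels and whose observations are words, with Viterbi decoding assigning a concept to each word. The state set (which, as an assumption, splits each concept into a keyword state and a value state) and every probability below are hypothetical; this is a generic HMM tagger, not CHRONUS itself.

```python
import math

STATES = ["DUMMY", "FROM", "ORIGIN", "TO", "DEST"]
START = {"DUMMY": 0.9, "FROM": 0.05, "TO": 0.05}
TRANS = {
    "DUMMY":  {"DUMMY": 0.6, "FROM": 0.2, "TO": 0.2},
    "FROM":   {"ORIGIN": 1.0},
    "TO":     {"DEST": 1.0},
    "ORIGIN": {"DUMMY": 0.5, "TO": 0.5},
    "DEST":   {"DUMMY": 0.5, "FROM": 0.5},
}
EMIT = {
    "DUMMY":  {"a": 0.4, "flight": 0.4, "please": 0.2},
    "FROM":   {"from": 1.0},
    "TO":     {"to": 1.0},
    "ORIGIN": {"belfast": 0.5, "malaga": 0.5},
    "DEST":   {"belfast": 0.5, "malaga": 0.5},
}
FLOOR = 1e-12  # assumed probability for unlisted transitions/emissions

def logp(table: dict, key: str) -> float:
    return math.log(table.get(key, FLOOR))

def viterbi(words: list[str]) -> list[str]:
    """Most likely concept sequence for the words under the toy HMM."""
    # best[s] = (log prob of best path ending in state s, that path)
    best = {s: (logp(START, s) + logp(EMIT[s], words[0]), [s]) for s in STATES}
    for word in words[1:]:
        best = {
            s: max(
                (score + logp(TRANS[prev], s) + logp(EMIT[s], word), path + [s])
                for prev, (score, path) in best.items()
            )
            for s in STATES
        }
    return max(best.values())[1]

print(viterbi("a flight from belfast to malaga".split()))
# ['DUMMY', 'DUMMY', 'FROM', 'ORIGIN', 'TO', 'DEST']
```

The output is a flat label sequence: nothing in it records that ORIGIN and DEST both belong to one flight request, which is the limitation the slide notes.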

22
Using Hidden State Vectors (He & Young)

Extends flat-concept HMM model
Represents hierarchical structure
(right-branching) using hidden state vectors
Each state is expanded to encode the stack of a
push-down automaton
Avoids computational tractability issues
associated with hierarchical HMMs
Can be trained using lightly annotated data
Comparison with FST model and with hand-crafted
SLU systems using ATIS test sets and reference
parse results

23
Which flights arrive in Burbank from Denver on
Saturday?
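In the hidden vector state approach, each state is a stack of semantic concept labels, and moving to the next word pops some number of labels and then pushes one new label. The sketch below applies such pop/push operations to the example sentence. The concept labels, the operation sequence and the depth limit of 4 are hypothetical, hand-chosen values for illustration, not the output of a trained model or the annotation used by He & Young.

```python
# Hypothetical words and (n_pop, push) operations for the example sentence.
WORDS = "which flights arrive in burbank from denver on saturday".split()
OPS = [(0, "DUMMY"), (1, "FLIGHT"), (0, "TOLOC"), (1, "TOLOC"), (0, "CITY"),
       (2, "FROMLOC"), (0, "CITY"), (2, "DATE"), (0, "DAY")]

def vector_states(words, ops, max_depth=4):
    """Apply pop-n-then-push-one operations; return each word's stack, top first."""
    if len(words) != len(ops):
        raise ValueError("need exactly one operation per word")
    stack = ["SS"]  # sentence-start concept stays at the bottom
    states = []
    for word, (n_pop, push) in zip(words, ops):
        if n_pop >= len(stack):
            raise ValueError("cannot pop the sentence-start concept")
        for _ in range(n_pop):
            stack.pop()
        stack.append(push)
        if len(stack) > max_depth:
            raise ValueError("stack deeper than the model allows")
        states.append((word, stack[::-1]))
    return states

for word, state in vector_states(WORDS, OPS):
    print(f"{word:10} {' '.join(state)}")
```

Each vector state is just a finite tuple of labels, so the model remains an HMM over an (enlarged) discrete state space; "in" keeps the same concept by popping TOLOC and pushing it again.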
24
SLU Evaluation Performance

Statistical models competitive with approaches
based on handcrafted rules
Hand-crafted grammars better for full
understanding and for users familiar with the
system's coverage; statistical models better for
shallow and more robust understanding for naïve
users
Statistical systems more robust to noise and more
portable

25
SLU Evaluation Software Development

Cost of producing training data should be less
than cost of hand-crafting a semantic grammar
(Young, 2002)
Issues
Availability of training data
Maintainability
Portability
Objective metrics? e.g. time, resources, lines of
code,
Subjective issues e.g. designer bias, designer
control over system
Few concrete results, except:
HVS model (He & Young) can be robustly trained
from only minimally annotated corpus data
Model is robust to noise and portable to other
domains

26
Additional technologies

Named entity extraction
Rule-based methods, e.g. using grammars in the form
of regular expressions compiled into finite state
acceptors (AT&T SLU system): higher precision
Statistical methods, e.g. HMIHY (AT&T's How May I
Help You?), learn mappings between strings and NEs:
higher recall, as more robust
Call routing
Question Answering
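A minimal sketch of the rule-based side of named entity extraction, assuming hypothetical regular expressions for three entity types. Python's `re` engine (a backtracking matcher) stands in here for grammars compiled into finite state acceptors, and the patterns are illustrative only.

```python
import re

# Hypothetical entity grammars written as regular expressions.
PATTERNS = {
    "AMOUNT": r"\$\d+(?:\.\d{2})?",
    "PHONE": r"\b\d{3}[- ]\d{3}[- ]\d{4}\b",
    "TIME": r"\b(?:1[0-2]|[1-9])(?::[0-5]\d)? ?[ap]\.?m\.?",
}

def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs in order of appearance."""
    entities = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            entities.append((m.start(), label, m.group(0)))
    return [(label, value) for _, label, value in sorted(entities)]

print(extract_entities("My bill of $45.50 is wrong, call me at 555-123-4567 after 11 a.m."))
```

Precision is high because only strings of exactly these shapes are accepted; an unusual phrasing such as "eleven in the morning" is simply missed, which is the recall gap the statistical methods address.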

27
Additional Issues 1

ASR/SLU coupling
Post-processing results from ASR
noisy channel model of ASR errors (Ringger &
Allen)
Combining shallow and deep parsing
major gains in speed, slight gains in accuracy
(Swift et al.)
Use of context, discourse history, prosodic
information
re-ordering n-best hypotheses
determining dialog act based on combinations of
features at various levels: ASR and parse
probabilities, semantic and contextual features
(Purver et al.; Lemon)
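One simple way to use context when re-ordering n-best hypotheses is a weighted combination of the recogniser's score with other features. The sketch assumes hypothetical feature names, weights and scores; in practice the weights would be learned (for example by a classifier trained on features like those listed above) rather than set by hand.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    asr_logprob: float    # recogniser score for the hypothesis (log domain)
    parse_logprob: float  # parser score for the hypothesis (log domain)
    fits_context: bool    # hypothetical feature: matches what the system just asked for

# Hypothetical hand-set weights; a deployed system would learn these.
WEIGHTS = {"asr": 1.0, "parse": 0.5, "context": 2.0}

def rerank(nbest: list[Hypothesis]) -> list[Hypothesis]:
    """Re-order n-best hypotheses by a weighted sum of feature scores."""
    def score(h: Hypothesis) -> float:
        return (WEIGHTS["asr"] * h.asr_logprob
                + WEIGHTS["parse"] * h.parse_logprob
                + WEIGHTS["context"] * float(h.fits_context))
    return sorted(nbest, key=score, reverse=True)

# After a prompt such as "Where would you like to fly to?"
nbest = [
    Hypothesis("two boston", asr_logprob=-3.0, parse_logprob=-6.0, fits_context=False),
    Hypothesis("to boston", asr_logprob=-3.5, parse_logprob=-2.0, fits_context=True),
]
print([h.text for h in rerank(nbest)])  # ['to boston', 'two boston']
```

The recogniser alone prefers "two boston" (-3.0 vs -3.5), but the combined scores (-6.0 vs -2.5) promote the hypothesis that parses well and fits the dialog context.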

28
Additional Issues 2

Methods for learning from sparse data or without
annotation
e.g. the AT&T system uses active learning (Tur et
al., 2005) to reduce the effort of human data
labelling: it uses only those data items that
improve classifier performance the most
Development tools, e.g. SGStudio (Wang & Acero):
build a semantic grammar with little linguistic
knowledge
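The slide does not give the selection criterion. A common choice, and the one assumed here, is certainty-based (uncertainty) sampling: send to human labellers the unlabelled utterances on which the current classifier is least confident. The call types and posterior values below are hypothetical.

```python
def least_confident(posteriors: dict[str, dict[str, float]], budget: int) -> list[str]:
    """Return the `budget` unlabelled utterances whose best class posterior is lowest."""
    confidence = {utt: max(probs.values()) for utt, probs in posteriors.items()}
    return sorted(confidence, key=confidence.get)[:budget]

# Hypothetical classifier posteriors over call types for unlabelled utterances.
posteriors = {
    "i want to pay my bill": {"Billing": 0.92, "Agent": 0.08},
    "um it's about the thing from last month": {"Billing": 0.41, "Agent": 0.35, "Cancel": 0.24},
    "cancel my service": {"Cancel": 0.88, "Agent": 0.12},
    "can i talk to someone": {"Agent": 0.55, "Billing": 0.45},
}
print(least_confident(posteriors, budget=2))
```

In a full loop the selected utterances would be labelled, added to the training set, and the classifier retrained before the next round of selection.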

29
Additional Issues 3

Some issues addressed in poster session
Using SLU for
Dialog act tagging
Prosody labelling
User satisfaction analysis
Topic segmentation and labelling
Emotion prediction

30
Conclusions 1

SLU approach is determined by
type of application
finite state dialog with single word recognition
frame based dialog with topic classification and
named entity extraction
advanced dialog requiring deep understanding
simulated conversation,

31
Conclusions 2

SLU approach is determined by
type of output required
syntactic / semantic parse trees
semantic frames
speech / dialog acts,
intentions, beliefs, emotions,

32
Conclusions 3

SLU approach is determined by
Deployment and usability issues
applications requiring accurate extraction of
information
applications involving complex processing of
content
applications involving shallow processing of
content (e.g. conversational companions,
interactive games)

33
Selected References

Bangalore, S., Hakkani-Tür, D., Tur, G. (eds)
(2006) Special Issue on Spoken Language
Understanding in Conversational Systems. Speech
Communication 48.
Gupta, N., Tur, G., Hakkani-Tür, D., Bangalore,
S., Riccardi, G., Gilbert, M. (2006) The AT&T
Spoken Language Understanding System. IEEE
Transactions on Speech and Audio Processing 14(1),
213-222.
Allen, J.F., Byron, D.K., Dzikovska, O., Ferguson, G.,
Galescu, L., Stent, A. (2001) Towards
conversational human-computer interaction. AI
Magazine 22(4), 27-35.
Jurafsky, D., Martin, J. (2000) Speech and
Language Processing. Prentice-Hall.
Huang, X., Acero, A., Hon, H.-W. (2001) Spoken
Language Processing: A Guide to Theory, Algorithm
and System Development. Prentice-Hall.