1
Spoken Language Understanding for Conversational
Dialog Systems
  • Michael McTear
  • University of Ulster

IEEE/ACL 2006 Workshop on Spoken Language
Technology, Aruba, December 10-13, 2006
2
Overview
  • Introductory definitions
  • Task-based and conversational dialog systems
  • Spoken language understanding
  • Issues for spoken language understanding
  • Coverage
  • Robustness
  • Overview of spoken language understanding
  • Hand-crafted approaches
  • Data-driven methods
  • Conclusions

3
Basic dialog system architecture
4
Task-based Dialog Systems
  • Mainly interact with databases to get information
    or support transactions
  • SLU module creates a database query from the
    user's spoken input by extracting relevant concepts
  • System initiative: constrains user input
  • Keyword / keyphrase extraction
  • User initiative: less constrained input
  • Call routing / call classification with named
    entity extraction
  • Question answering

5
Conversational Dialog
  • AI (agent-based systems) e.g. TRIPS
  • User can take initiative, e.g. raise new topic,
    ask for clarification (TRIPS)
  • More complex interactions involving recognition
    of the user's intentions, goals, beliefs or plans
  • Deep understanding of the user's utterance,
    taking into account contextual information
  • Information State Theory, Planning Theory, User
    Modelling, Belief Modelling
  • Simulated conversation e.g. CONVERSE
  • Conversational companions, chatbots, help desk
  • Does not require deep understanding
  • SLU involves identifying the utterance type and
    determining a suitable system response

6
Defining Spoken Language Understanding
  • extracting the meaning from speech utterances
  • a transduction of the recognition result to an
    interpretable representation
  • Meaning (in human-computer interactive systems)
  • a representation that can be executed by an
    interpreter in order to change the state of the
    system
  • Bangalore et al., 2006

7
SLU for task based systems
  • a flight from Belfast to Malaga
  • uh I'd like uh um could you uh is there a flight
    from Bel- uh Belfast to um Gran- I mean Malaga
  • I would like to find a flight from Pittsburgh to
    Boston on Wednesday and I have to be in Boston by
    one so I would like a flight out of here no later
    than 11 a.m.
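All three utterances should map to the same kind of semantic frame. As an illustration only (not any particular system), a minimal keyword/concept-spotting sketch in Python; the slot names, city list and patterns are invented for the example:

```python
import re

# Hypothetical slot vocabularies for a toy flight domain.
CITIES = ["belfast", "malaga", "pittsburgh", "boston"]
CITY_RE = "|".join(CITIES)

PATTERNS = {
    "origin":      re.compile(rf"\bfrom\s+(?:\w+-\s+)?(?:\w+\s+)?({CITY_RE})\b"),
    "destination": re.compile(rf"\bto\s+(?:\w+\s+)?(?:\w+-\s+)?(?:i mean\s+)?({CITY_RE})\b"),
    "depart_by":   re.compile(r"\bno later than\s+(\d{1,2}(?::\d{2})?\s*(?:a\.?m\.?|p\.?m\.?)?)"),
}

def spot_concepts(utterance: str) -> dict:
    """Fill a flat semantic frame by spotting keywords, ignoring disfluencies."""
    text = utterance.lower()
    frame = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            frame[slot] = match.group(1)
    return frame

print(spot_concepts("uh I'd like uh um could you uh is there a flight "
                    "from Bel- uh Belfast to um Gran- I mean Malaga"))
# -> {'origin': 'belfast', 'destination': 'malaga'}
```

Because anything that does not match a pattern is simply ignored, the clean and the disfluent versions of the request yield the same frame.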

8
SLU for advanced conversational systems (TRIPS)
  • Interpretation requires intention recognition:
    "Can we use a helicopter to get the people from
    Abyss?" (request to modify plan) / "... from
    Barnacle" (include sub-goal and suggest solution)
    / "... from Delta" (extension of a solution)
  • Six possible interpretations with only change of
    city name
  • Requires reasoning about task and current context
    to identify most plausible interpretation
  • Requires more than concept spotting to identify
    structure and meaning of utterance as basis for
    reasoning

9
Role of syntax in deep understanding
  • List all employees of the companies who/which are
    based in the city centre
  • I would like to know where to mail the check.
  • I would like you to mail the check to me

10
SLU for simulated dialog
  • C: Are you still a friend of XXX?
  • H: I am not sure if he has any real friends. He
    has achieved a lot and has left a lot of people
    behind.
  • C: You really don't like the guy - well, at least
    that's definite - I thought most people were just
    indifferent or numb by now.

11
Coverage
  • Possible requirement
  • The system should be able to understand
    everything the user might say
  • Predicting user input
  • Analysis of corpora and iterative design of
    hand-crafted grammars
  • Use of carefully designed prompts so that user
    input is constrained
  • Learning grammar from data

12
Robustness
  • Characteristics of spontaneous spoken language
  • Disfluencies and filled pauses: not just errors,
    they reflect cognitive aspects of speech production
    and interaction management
  • Output from speech recognition component
  • Words and word boundaries not known with
    certainty
  • Recognition errors
  • Approaches
  • Use of semantic grammars and robust parsing for
    concept spotting
  • Data-driven approaches learn mappings between
    input strings and output structures

13
Developing the SLU component
  • Hand-crafted approaches
  • Grammar development
  • Parsing
  • Data-driven approaches
  • Learning from data
  • Statistical models rather than grammars
  • Efficient decoding

14
Hand-crafting grammars
  • Traditional software engineering approach of
    design and iterative refinement
  • Decisions about type of grammar required
  • Chomsky hierarchy
  • Flat vs. hierarchical representations
  • Processing issues (parsing)
  • Dealing with ambiguity
  • Efficiency

15
Semantic Grammar and Robust Parsing: PHOENIX
(CMU/CU)
  • The Phoenix parser maps input word strings on to
    a sequence of semantic frames.
  • a frame is a named set of slots, where the slots
    represent related pieces of information
  • each slot has an associated Context-Free Grammar
    that specifies word string patterns that match
    the slot
  • chart parsing with path pruning, e.g. a path that
    accounts for fewer words is pruned
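A much-simplified Python sketch of the frame-and-slot idea with word-coverage pruning. This is not the actual Phoenix grammar format or parser: the slot names are invented, and regular expressions stand in for the per-slot CFGs and the chart.

```python
import re

# Toy stand-ins for per-slot grammars (real Phoenix attaches a CFG to each slot).
SLOT_PATTERNS = {
    "depart_loc":  re.compile(r"\bfrom (\w+)"),
    "arrive_loc":  re.compile(r"\bto (\w+)"),
    "depart_time": re.compile(r"\bat (\d{1,2}(?::\d{2})?)"),
}

def parse(words: str) -> dict:
    """Collect all slot matches, then keep the combination (path) that
    accounts for the most input words, mimicking Phoenix's path pruning."""
    text = words.lower()
    matches = []  # (slot, value, (start, end))
    for slot, pat in SLOT_PATTERNS.items():
        for m in pat.finditer(text):
            matches.append((slot, m.group(1), m.span()))

    def words_covered(path):
        return sum(len(text[a:b].split()) for _, _, (a, b) in path)

    best = []
    def extend(path, remaining):
        nonlocal best
        if words_covered(path) > words_covered(best):
            best = path
        for i, cand in enumerate(remaining):
            _, _, (a, b) = cand
            # Only extend with matches that do not overlap the current path.
            if all(b2 <= a or a2 >= b for _, _, (a2, b2) in path):
                extend(path + [cand], remaining[i + 1:])
    extend([], matches)
    return {slot: value for slot, value, _ in best}

print(parse("a flight from Belfast to Malaga at 11:30"))
# -> {'depart_loc': 'belfast', 'arrive_loc': 'malaga', 'depart_time': '11:30'}
```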

16
Deriving Meaning directly from ASR output: VoiceXML
Uses finite state grammars as language models for
recognition and semantic tags in the grammars for
semantic parsing
(Diagram: ASR output mapped directly to a meaning
representation)
17
Deep understanding
  • Requirements for deep understanding
  • advanced grammatical formalisms
  • syntax-semantics issues
  • parsing technologies
  • Example: TRIPS
  • Uses feature-based augmented CFG with
    agenda-driven best-first chart parser
  • Combined strategy of shallow and deep parsing
    (Swift et al.)

18
Combined strategies: TINA (MIT)
  • Grammar rules include mix of syntactic and
    semantic categories
  • Context free grammar using probabilities trained
    from user utterances to estimate likelihood of a
    parse
  • Parse tree converted to a semantic frame that
    encapsulates the meaning
  • Robust parsing strategy
  • Sentences that fail to parse are parsed using
    fragments that are combined into a full semantic
    frame
  • When all else fails, word spotting is used
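A toy Python sketch of such a fallback cascade. The three stages below are hypothetical stand-ins, not TINA's grammar, fragment parser or frame representation.

```python
from typing import Optional

def full_parse(text: str) -> Optional[dict]:
    """Stand-in full parse: succeeds only on one known sentence pattern."""
    if text == "i want a flight from belfast to malaga":
        return {"origin": "belfast", "destination": "malaga"}
    return None

def fragment_parse(text: str) -> list:
    """Stand-in fragment parser: pick up any 'from X' / 'to X' fragments."""
    frags, words = [], text.split()
    for i, w in enumerate(words[:-1]):
        if w == "from":
            frags.append({"origin": words[i + 1]})
        elif w == "to":
            frags.append({"destination": words[i + 1]})
    return frags

def word_spot(text: str) -> dict:
    """Last resort: spot any known city name."""
    cities = {"belfast", "malaga", "boston"}
    found = [w for w in text.split() if w in cities]
    return {"city": found[0]} if found else {}

def understand(text: str) -> dict:
    """Cascade: full parse, else combine fragments, else word spotting."""
    text = text.lower()
    frame = full_parse(text)
    if frame is not None:
        return frame
    merged = {}
    for fragment in fragment_parse(text):
        merged.update(fragment)      # combine fragments into one semantic frame
    return merged if merged else word_spot(text)

print(understand("er something to Malaga from Belfast maybe"))
# -> {'destination': 'malaga', 'origin': 'belfast'}
```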

19
Problems with hand-crafted approaches
  • Hand-crafted grammars
  • not robust to spoken language input
  • require linguistic and engineering expertise to
    develop if grammar is to have good coverage and
    optimised performance
  • time consuming to develop
  • error prone
  • subject to designer bias
  • difficult to maintain

20
Statistical modelling for SLU
SLU as a pattern matching problem: given a word
sequence W, find the semantic representation of
meaning M that has the maximum a posteriori
probability P(M|W):

  M* = argmax_M P(M|W) = argmax_M P(W|M) P(M)

P(M): semantic prior model, assigns a probability
to the underlying semantic structure
P(W|M): lexicalisation model, assigns a probability
to the word sequence W given the semantic structure
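A minimal Python sketch of this decoding, treating M as a flat sequence of concept labels, one per word (in the spirit of the flat-concept models on the next slide). All labels and probabilities are invented for illustration; this is not CHRONUS or HUM themselves.

```python
import math

# Toy flat-concept model: M is a sequence of concept labels, one per word.
CONCEPTS = ["OTHER", "ORIGIN", "DEST"]

# P(M): semantic prior model, here a bigram over concept labels.
PRIOR = {
    ("<s>", "OTHER"): 0.8, ("<s>", "ORIGIN"): 0.1, ("<s>", "DEST"): 0.1,
    ("OTHER", "OTHER"): 0.6, ("OTHER", "ORIGIN"): 0.2, ("OTHER", "DEST"): 0.2,
    ("ORIGIN", "OTHER"): 0.5, ("ORIGIN", "ORIGIN"): 0.3, ("ORIGIN", "DEST"): 0.2,
    ("DEST", "OTHER"): 0.5, ("DEST", "ORIGIN"): 0.2, ("DEST", "DEST"): 0.3,
}

# P(W|M): lexicalisation model, P(word | concept); unseen words get a small floor.
LEX = {
    "OTHER":  {"a": 0.2, "flight": 0.2, "from": 0.2, "to": 0.2, "belfast": 0.1, "malaga": 0.1},
    "ORIGIN": {"belfast": 0.6, "malaga": 0.3, "from": 0.1},
    "DEST":   {"malaga": 0.6, "belfast": 0.3, "to": 0.1},
}

def decode(words):
    """Viterbi search for M* = argmax_M P(W|M) P(M)."""
    scores = {c: math.log(PRIOR[("<s>", c)]) + math.log(LEX[c].get(words[0], 1e-6))
              for c in CONCEPTS}
    back = []
    for w in words[1:]:
        new_scores, pointers = {}, {}
        for c in CONCEPTS:
            prev = max(CONCEPTS, key=lambda p: scores[p] + math.log(PRIOR[(p, c)]))
            new_scores[c] = (scores[prev] + math.log(PRIOR[(prev, c)])
                             + math.log(LEX[c].get(w, 1e-6)))
            pointers[c] = prev
        back.append(pointers)
        scores = new_scores
    best = max(CONCEPTS, key=lambda c: scores[c])
    path = [best]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

words = "a flight from belfast to malaga".split()
print(list(zip(words, decode(words))))
# Under these toy numbers, belfast comes out as ORIGIN and malaga as DEST.
```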
21
Early Examples
  • CHRONUS (AT&T; Pieraccini et al., 1992; Levin &
    Pieraccini, 1995)
  • Finite state semantic tagger
  • Flat-concept model: simple to train but does
    not represent hierarchical structure
  • HUM (Hidden Understanding Model) (BBN; Miller
    et al., 1995)
  • Probabilistic CFG using tree structured meaning
    representations
  • Grammatical constraints represented in networks
    rather than rules
  • Ordering of constituents unconstrained -
    increases robustness
  • Transition probabilities constrain
    over-generation
  • Requires fully annotated treebank data for
    training

22
Using Hidden State Vectors (He & Young)
  • Extends flat-concept HMM model
  • Represents hierarchical structure
    (right-branching) using hidden state vectors
  • Each state is expanded to encode the stack of a
    push-down automaton
  • Avoids computational tractability issues
    associated with hierarchical HMMs
  • Can be trained using lightly annotated data
  • Comparison with FST model and with hand-crafted
    SLU systems using ATIS test sets and reference
    parse results
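An illustrative Python sketch of the vector-state idea: each word is associated with a stack of semantic labels, and a transition may only pop some labels and then push one new label. The label names and the pop/push choices below are assumptions for a toy query, not He & Young's trained model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VectorState:
    """One hidden vector state: a stack of semantic labels, root at the left."""
    stack: tuple

    def step(self, pops: int, push: str) -> "VectorState":
        # The HVS constraint: each word transition pops 0..n labels from the
        # stack and then pushes one new label, which keeps the model
        # equivalent to a right-branching push-down automaton.
        return VectorState(self.stack[: len(self.stack) - pops] + (push,))

# A hand-written (illustrative, not trained) state sequence for a toy query.
state, trace = VectorState(("SS",)), []
for word, pops, push in [
    ("show", 0, "DUMMY"), ("flights", 1, "FLIGHT"),
    ("to", 0, "TOLOC"), ("boston", 0, "CITY"),
    ("from", 2, "FROMLOC"), ("denver", 0, "CITY"),
]:
    state = state.step(pops, push)
    trace.append((word, state.stack))

for word, stack in trace:
    print(f"{word:10s} {list(stack)}")
# e.g. boston -> ['SS', 'FLIGHT', 'TOLOC', 'CITY'],
#      denver -> ['SS', 'FLIGHT', 'FROMLOC', 'CITY']
```

The hierarchy is recovered without full treebank annotation because only the pop count and the pushed label vary per word, which is what keeps training on lightly annotated data tractable.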

23
Which flights arrive in Burbank from Denver on
Saturday?
24
SLU Evaluation: Performance
  • Statistical models competitive with approaches
    based on handcrafted rules
  • Hand-crafted grammars are better for full
    understanding and for users familiar with the
    system's coverage; statistical models are better
    for shallow and more robust understanding with
    naïve users
  • Statistical systems more robust to noise and more
    portable

25
SLU Evaluation: Software Development
  • Cost of producing training data should be less
    than cost of hand-crafting a semantic grammar
    (Young, 2002)
  • Issues
  • Availability of training data
  • Maintainability
  • Portability
  • Objective metrics? e.g. time, resources, lines of
    code,
  • Subjective issues e.g. designer bias, designer
    control over system
  • Few concrete results, except
  • HVS model (He & Young) can be robustly trained
    from only minimally annotated corpus data
  • Model is robust to noise and portable to other
    domains

26
Additional technologies
  • Named entity extraction
  • Rule-based methods, e.g. grammars in the form of
    regular expressions compiled into finite state
    acceptors (AT&T SLU system): higher precision
    (see the sketch after this list)
  • Statistical methods, e.g. HMIHY, learn mappings
    between strings and NEs: higher recall as they
    are more robust
  • Call routing
  • Question Answering
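A hedged Python sketch of the rule-based option, with regular expressions standing in for grammars compiled into finite state acceptors; the entity types and patterns are invented, not the AT&T rule set.

```python
import re

# Hypothetical named-entity patterns for a customer-care style domain.
NE_PATTERNS = {
    "PHONE":  re.compile(r"\b\d{3}[ -]\d{3}[ -]\d{4}\b"),
    "DATE":   re.compile(r"\b(?:january|february|march|april|may|june|july|"
                         r"august|september|october|november|december)\s+\d{1,2}\b"),
    "AMOUNT": re.compile(r"\b\d+\s+dollars?\b"),
}

def extract_entities(text: str):
    """Return (type, surface string) pairs found by the rule set.
    Rules tend towards high precision: they only fire on exact pattern matches."""
    found = []
    lowered = text.lower()
    for ne_type, pattern in NE_PATTERNS.items():
        for m in pattern.finditer(lowered):
            found.append((ne_type, m.group(0)))
    return found

print(extract_entities("I was charged 50 dollars on May 3, call me on 555-123-4567"))
# -> [('PHONE', '555-123-4567'), ('DATE', 'may 3'), ('AMOUNT', '50 dollars')]
```

A statistical tagger trained on labelled utterances would also catch variants the rules miss ("fifty bucks", mis-recognised digits), which is the recall/robustness trade-off noted above.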

27
Additional Issues 1
  • ASR/SLU coupling
  • Post-processing results from ASR
  • noisy channel model of ASR errors (Ringger &
    Allen)
  • Combining shallow and deep parsing
  • major gains in speed, slight gains in accuracy
    (Swift et al.)
  • Use of context, discourse history, prosodic
    information
  • re-ordering n-best hypotheses
  • determining the dialog act based on combinations
    of features at various levels: ASR and parse
    probabilities, semantic and contextual features
    (Purver et al.; Lemon), as sketched below
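For the n-best re-ordering point, a small Python sketch that rescores hypotheses with a weighted combination of ASR, parse and contextual scores. The hypotheses, scores and weights are all invented; real systems learn or tune the combination.

```python
# Re-order ASR n-best hypotheses by a weighted combination of ASR score,
# parse score and a simple dialog-context feature (all numbers invented).
NBEST = [
    # (hypothesis, asr_log_prob, parse_log_prob, matches_dialog_context)
    ("a flight to austin", -12.0, -4.0, False),
    ("a flight to boston", -12.5, -3.5, True),
    ("a fly to boston",    -11.8, -9.0, False),
]

WEIGHTS = {"asr": 1.0, "parse": 1.0, "context": 2.0}

def combined_score(hyp):
    _, asr, parse, in_context = hyp
    return (WEIGHTS["asr"] * asr
            + WEIGHTS["parse"] * parse
            + WEIGHTS["context"] * (1.0 if in_context else 0.0))

best = max(NBEST, key=combined_score)
print(best[0])   # -> "a flight to boston" once parse and context scores are added in
```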

28
Additional Issues 2
  • Methods for learning from sparse data or without
    annotation
  • e.g. the AT&T system uses active learning (Tur et
    al., 2005) to reduce the effort of human data
    labelling: only those data items that improve
    classifier performance the most are labelled
    (see the sketch at the end of this slide)
  • Development tools, e.g. SGStudio (Wang & Acero),
    build a semantic grammar with little linguistic
    knowledge
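A generic uncertainty-sampling sketch of the active-learning idea in Python. The confidence function is a stand-in, and the selection criterion in Tur et al. is more elaborate than this.

```python
# Label only the utterances the current classifier is least confident about.

def classifier_confidence(utterance: str) -> float:
    """Stand-in for the confidence of the current call-type classifier.
    Purely illustrative: pretend longer utterances are harder."""
    return max(0.05, 1.0 - 0.05 * len(utterance.split()))

def select_for_labelling(unlabelled, budget):
    """Pick the `budget` utterances with the lowest classifier confidence."""
    ranked = sorted(unlabelled, key=classifier_confidence)
    return ranked[:budget]

pool = [
    "agent please",
    "I want to check the balance on my account",
    "um yeah so the thing is my last bill had this charge I do not recognise at all",
]
print(select_for_labelling(pool, budget=1))
# -> picks the long, low-confidence utterance for human labelling
```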

29
Additional Issues 3
  • Some issues addressed in poster session
  • Using SLU for
  • Dialog act tagging
  • Prosody labelling
  • User satisfaction analysis
  • Topic segmentation and labelling
  • Emotion prediction

30
Conclusions 1
  • SLU approach is determined by
  • type of application
  • finite state dialog with single word recognition
  • frame based dialog with topic classification and
    named entity extraction
  • advanced dialog requiring deep understanding
  • simulated conversation,

31
Conclusions 2
  • SLU approach is determined by
  • type of output required
  • syntactic / semantic parse trees
  • semantic frames
  • speech / dialog acts,
  • intentions, beliefs, emotions,

32
Conclusions 3
  • SLU approach is determined by
  • Deployment and usability issues
  • applications requiring accurate extraction of
    information
  • applications involving complex processing of
    content
  • applications involving shallow processing of
    content (e.g. conversational companions,
    interactive games)

33
Selected References
  • Bangalore, S., Hakkani-Tür, D. & Tur, G. (eds.)
    (2006) Special Issue on Spoken Language
    Understanding in Conversational Systems. Speech
    Communication 48.
  • Gupta, N., Tur, G., Hakkani-Tür, D., Bangalore,
    S., Riccardi, G. & Gilbert, M. (2006) The AT&T
    Spoken Language Understanding System. IEEE
    Transactions on Audio, Speech, and Language
    Processing 14(1), 213-222.
  • Allen, J.F., Byron, D.K., Dzikovska, O., Ferguson,
    G., Galescu, L. & Stent, A. (2001) Towards
    conversational human-computer interaction. AI
    Magazine 22(4), 27-35.
  • Jurafsky, D. & Martin, J. (2000) Speech and
    Language Processing. Prentice-Hall.
  • Huang, X., Acero, A. & Hon, H.-W. (2001) Spoken
    Language Processing: A Guide to Theory, Algorithm
    and System Development. Prentice-Hall.