1
Belief Updating in Spoken Dialog Systems
  • Dialogs on Dialogs Reading Group
  • June, 2005
  • Dan Bohus
  • Carnegie Mellon University

2
Misunderstandings
  • Misunderstandings are an important problem in
    spoken dialog systems
  • The system obtains an incorrect semantic
    interpretation of the user's utterance
  • 15-40% of turns
  • Significant negative impact on overall success
    rate

3
Confidence annotation
  • Use confidence scores to guard against potential
    misunderstandings
  • Traditionally from the speech recognition engine
    [Chase, Bansal, Cox, Kemp, etc.]
  • Focuses on WER, not tuned to the task at hand
  • More recently, system-specific semantic
    confidence scores [Carpenter, Walker,
    San-Segundo, etc.]
  • Integrate knowledge from different levels in the
    system
  • speech recognition, language understanding,
    dialog management

4
Correction Detection
  • Detect whether or not the user is trying to
    correct the system
  • Related: aware-site detection
  • Similar ML approaches using multiple sources of
    knowledge [Litman, Swerts, Krahmer, etc.]

5
Proposed Belief Updating
  • Integrate confidence annotation and correction
    detection in a unified framework for continuously
    tracking beliefs
  • A belief updating problem

S: Where are you flying from?
U: [CityName: Aspen/0.6, Austin/0.2]            (initial belief)
S: Did you say you wanted to fly out of Aspen?  (system action)
U: No/0.6 [CityName: Boston/0.8]                (user response)
Updated belief: [CityName: Aspen/?, Austin/?, Boston/?]
6
Formally
  • Given
  • An initial belief Pinitial(C) over concept C
  • A system action SA
  • A user response R
  • Construct an updated belief Pupdated(C)
  • As accurate as possible
  • Pupdated(C) ← f(Pinitial(C), SA, R)
    (a sketch of this interface follows below)
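A minimal sketch of what such an update function could look like, assuming
beliefs are stored as dictionaries mapping concept values to confidence
scores. The naive evidence merge inside is only a placeholder, not the
learned model discussed in the talk.

    from typing import Dict

    def update_belief(initial: Dict[str, float], sys_action: str,
                      response: Dict[str, float]) -> Dict[str, float]:
        """Placeholder for Pupdated(C) <- f(Pinitial(C), SA, R).
        sys_action is ignored by this naive stand-in; a learned model
        would condition on it."""
        updated = dict(initial)
        for value, score in response.items():
            updated[value] = updated.get(value, 0.0) + score
        total = sum(updated.values()) or 1.0
        return {v: p / total for v, p in updated.items()}

    # The Aspen/Austin/Boston dialog above, with illustrative numbers:
    print(update_belief({"Aspen": 0.6, "Austin": 0.2},
                        "explicit_confirm(Aspen)", {"Boston": 0.8}))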

7
Examples
8
Examples - continued
9
Outline
  • Introduction
  • Data
  • A simplified version of the problem. Approach
  • User behaviors
  • Learning: preliminary results
  • More on evaluation
  • Where to from here?

10
Data
  • Collected in an experiment with RoomLine
  • Phone-based, mixed-initiative system for making
    conference room reservations
  • Equipped with explicit and implicit confirmations
  • Corpus statistics
  • 46 participants
  • 449 sessions, 8278 turns
  • 13.5% misunderstandings (9.8% / 22.5%)
  • 25.6% WER (19.6% / 39.5%)
  • 11362 concept updates

11
System actions and concept updates
  • Explicit and implicit confirmations

12
System actions and concept updates
  • Implicit Confirmations Task

13
Number of Conflicting Hypotheses
  • Fewer than 3% involve more than 1 hypothesis
  • The system was not using multiple hypotheses
  • Future work: regenerate multiple hypotheses in
    batch

14
Outline
  • Introduction
  • Data
  • A simplified version of the problem. Approach
  • User behaviors
  • Learning: preliminary results
  • More on evaluation
  • Where to from here?

15
A Simplified Version
  • Given that only ~3% have more than 1 hypothesis,
  • Update belief in the top-hypothesis after
    implicit and explicit confirmations
  • Instead of
  • Pupdated(C) ← f(Pinitial(C), SA, R)
  • Do
  • ConfTopupdated(C) ← f(ConfTopinitial(C), SA, R)
  • For SA ∈ {EC, IC, ICT} (a sketch follows below)
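A rough sketch of this reduced interface, with hypothetical response
features; in the talk the mapping is learned from data, so the hand-written
rules below only stand in for that model.

    def update_top_confidence(conf_initial: float, sys_action: str,
                              response: dict) -> float:
        """ConfTopupdated(C) <- f(ConfTopinitial(C), SA, R),
        for SA in {EC, IC, ICT}. Placeholder rules only."""
        assert sys_action in {"EC", "IC", "ICT"}
        if response.get("disconfirm_marker"):   # e.g. the user said "no"
            return min(conf_initial, 0.1)
        if response.get("confirm_marker"):      # e.g. the user said "yes"
            return max(conf_initial, 0.9)
        return conf_initial                     # no clear evidence either way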

16
Approach
  • Use machine learning
  • Dataset
  • Concept updates for EC, IC, ICTs
  • Features
  • Initial confidence score ConfTopinitial(C)
  • System action (SA)
  • User response (R)
  • Target
  • Updated confidence score ConfTopupdated(C)
  • Data is labeled, so we have a binary target

17
Outline
  • Introduction
  • Data
  • A simplified version of the problem. Approach
  • User behaviors
  • Learning: preliminary results
  • More on evaluation
  • Where to from here?

18
User behaviors
  • Study of user behaviors in response to ICs and
    ECs
  • Can inform feature selection and feature
    development
  • Provide insights into where the difficulties are
  • Can inform potential strategy refinements

19
User responses to ECs
  • Transcripts
  • Decoded

20
Other Responses to EC
  • Eyeball estimates (out of 146 responses)
  • 70% simply repeat the correct concept value
  • That should come in as a handy feature
  • 10% change the conversation focus
  • 10% turn-overtaking issues
  • Maybe inhibit barge-in until Antoine finishes his
    thesis
  • 10% other

21
User responses to ICs
  • Transcripts
  • Decoded

22
Users Don't Always Correct ICs
  • Actually, they corrected in 45% of the cases
  • That means even if we knew exactly when they
    correct, we'd still have roughly 16% error
    (about 126 of the 788 cases)
  • So what do users do when they don't correct?
  • They may actually correct only partially
  • Completely ignore the error (if non-essential)
  • Readjust to accommodate task

23
More questions
  • Better understand this "ignore" phenomenon
  • Impact on task success?
  • IC correction rate: 49% (successful tasks) vs.
    41% (unsuccessful)
  • Fixed vs. more flexible scenarios
  • Impact of prompt length on P(user will correct)?
  • Essential vs non-essential concepts?

24
Outline
  • Introduction
  • Data
  • A simplified version of the problem. Approach
  • User behaviors
  • Learning: preliminary results
  • More on evaluation
  • Where to from here?

25
Which ML technique?
  • Need good probability outputs
  • Margins produced by discriminant classifiers are
    inadequate
  • If you want probability scores (i.e. conf = 0.85
    means that in 85% of cases with conf = 0.85 the
    concept is right)
  • evaluate on a soft metric (I'll contradict myself
    later!)
  • Step-wise logistic regression
  • Sample-efficient
  • Feature selection
  • Good soft-metric performance
  • optimizes for the avg. log-likelihood of the data
    (see the sketch below)
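Not the original experimental setup, but a small sketch of fitting a
logistic regression and reporting the average log-likelihood it optimizes,
using scikit-learn as an assumed stand-in and random placeholder data (the
step-wise feature selection is not reproduced here).

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import log_loss

    # Placeholder data: X holds features of concept updates, y marks whether
    # the top hypothesis was actually correct (given by the corpus labels).
    rng = np.random.default_rng(0)
    X = rng.random((200, 5))
    y = (rng.random(200) > 0.5).astype(int)

    model = LogisticRegression(max_iter=1000).fit(X, y)
    probs = model.predict_proba(X)[:, 1]

    # log_loss is the average negative log-likelihood, so its negation is the
    # average log-likelihood that the regression (implicitly) maximizes.
    print("avg log-likelihood:", -log_loss(y, probs))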

26
Data and Features
  • For each system action (EC, IC, ICT)
  • Initial Confidence score
  • Other indicators about current state
  • How well has the dialog been going
  • Which concept are we talking about
  • How far back was this concept acquired
  • Features on user response
  • Confirmation and disconfirmation markers
  • Acoustic / prosodic: f0 (min, max, range,
    max slope, etc.) and normalized versions
  • Number of words, turn length (secs)
  • Concept information: expected / repeated / new
    concepts and grammar slots
  • Confidence
  • Barge-in / timeout info
  • Lexical features (preselected by MI with the
    target or with confirm/disconfirm markers)
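For illustration only, a single concept update might be represented roughly
as below; the field names and values are invented, not the RoomLine system's
actual feature schema.

    example = {
        # current-state indicators
        "conf_initial": 0.62, "concept": "date", "turns_since_acquired": 1,
        # user response: markers, prosody, length
        "confirm_marker": 0, "disconfirm_marker": 1,
        "f0_range": 83.5, "num_words": 4, "turn_length_secs": 1.7,
        # concept information and recognizer confidence
        "repeated_concept": 1, "new_slots": 0, "response_confidence": 0.71,
        # barge-in / timeout and (preselected) lexical features
        "barge_in": 0, "timeout": 0, "has_word_no": 1,
    }
    target = 0   # corpus label: the confirmed value turned out to be wrong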

27
Results
  • Actually using a 1-level logistic model tree
  • Split on answer_type ∈ {yes, no, other, no_parse}
  • Perform step-wise logistic regression on the 4
    leaves
  • P-entry = 0.05
  • P-reject = 0.30
  • BIC stopping criterion
  • Also tried full-blown model tree, results are
    similar, maybe marginally worse
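A minimal sketch of such a 1-level model tree, with scikit-learn as an
assumed stand-in estimator; the step-wise selection (p-entry/p-reject) and
the BIC stopping rule are not reproduced here.

    from collections import defaultdict
    from sklearn.linear_model import LogisticRegression

    LEAVES = ("yes", "no", "other", "no_parse")

    def fit_model_tree(examples):
        """examples: iterable of (answer_type, feature_vector, label).
        Routes each example by answer_type and fits one logistic
        regression per leaf; returns the dict of per-leaf models."""
        by_leaf = defaultdict(list)
        for answer_type, x, label in examples:
            by_leaf[answer_type].append((x, label))
        models = {}
        for leaf in LEAVES:
            if by_leaf[leaf]:
                xs, ys = zip(*by_leaf[leaf])
                models[leaf] = LogisticRegression(max_iter=1000).fit(xs, ys)
        return models

    def predict(models, answer_type, x):
        """Updated confidence for the top hypothesis, from the matching leaf."""
        return models[answer_type].predict_proba([x])[0, 1]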

28
Explicit Confirmation
29
Implicit Confirmation
30
Outline
  • Introduction
  • Data
  • A simplified version of the problem. Approach
  • User behaviors
  • Learning: preliminary results
  • More on evaluation
  • Where to from here?

31
What can Logistic Regression / AVG-LL do for you?
  • D = {d1, d2, d3, d4, ...}, with di ∈ {0, 1}
  • P(D) = ∏i P(di | xi)
  • Express the density P(di = 1 | xi) as
  • P(d = 1 | x) = 1 / (1 + exp(-w·x))
  • You can actually derive this if you start with
    P(x | d) Gaussian
  • Find the parameters w that maximize P(D)
  • argmax P(D) = argmax ∏i P(di | xi)
  • argmax P(D) = argmin ∑i -log P(di | xi)
  • Hence we maximize the average log-likelihood
  • But what does that mean?
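A tiny worked example of the quantity being maximized, with made-up model
outputs:

    import math

    # Made-up predictions P(d_i = 1 | x_i) and the true labels d_i.
    probs  = [0.9, 0.2, 0.7, 0.6]
    labels = [1,   0,   1,   0]

    # Each case contributes log P(d_i | x_i): log p if d = 1, log(1 - p) if d = 0.
    avg_ll = sum(math.log(p if d == 1 else 1.0 - p)
                 for p, d in zip(probs, labels)) / len(labels)
    print(f"average log-likelihood: {avg_ll:.3f}")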

32
Loss function in Logistic Regression
  • Log-likelihood loss function

If d = 1, then P(d = 1) = 0.01 is ten times worse
than P(d = 1) = 0.1, but P(d = 1) = 0.7 is about
the same as P(d = 1) = 0.8. Things are mirrored
for d = 0.
[Plot: log-likelihood loss as a function of P(d = 1), for d = 1]
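The asymmetry is easy to check numerically; the lines below just evaluate
the loss -log p at the points mentioned above.

    import math

    loss = lambda p: -math.log(p)   # log-likelihood loss for a d = 1 example

    # Dropping the prediction from 0.1 to 0.01 adds log(10) ~ 2.30 loss ...
    print(loss(0.01) - loss(0.1))
    # ... while dropping it from 0.8 to 0.7 adds only ~0.13.
    print(loss(0.7) - loss(0.8))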
33
A New Loss Function T2
  • A loss function that better matches our domain:
    T2 (or even T3)

[Figure: step loss for d = 1 and d = 0 over the confidence
regions [0, t1), [t1, t2), [t2, 1], with region costs C1-C4]
  • Optimize: argmax ∑ T2(P(di = c | xi))
  • Not differentiable
  • Not convex
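A hedged sketch of what a T2-style step objective could look like; the
thresholds t1, t2, the region costs, and the sign convention (here a cost to
be minimized, i.e. the negation of the quantity being maximized above) are
placeholders, not values from the talk.

    def t2_cost(p: float, d: int, t1: float = 0.3, t2: float = 0.7,
                costs=(1.0, 0.5, 0.5, 1.0)) -> float:
        """Step cost over the confidence regions [0, t1), [t1, t2), [t2, 1].
        A correct hypothesis (d = 1) is penalized for low confidence; an
        incorrect one (d = 0) for high confidence. All values illustrative."""
        c1, c2, c3, c4 = costs
        if d == 1:
            return c1 if p < t1 else (c2 if p < t2 else 0.0)
        return c4 if p >= t2 else (c3 if p >= t1 else 0.0)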

34
Smoothed version
  • A loss function that better matches our domain:
    T2 (or even T3)

SmoothT2(p) = s1(p) + s2(p), where
si(p) = 1 / (1 + exp(ki(p - θi))),
with the k's and θ's chosen accordingly

[Figure: smoothed loss curve for d = 1, with costs C1, C2]
  • Optimize: argmax ∑ SmoothT2(P(di = c | xi))
  • Differentiable!
  • But still not convex: multiple local maxima
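A sketch of the smoothed term, directly following the si(p) formula above;
the steepness values k and centers θ are assumptions standing in for the
"chosen accordingly" constants.

    import math

    def smooth_t2(p: float, k=(20.0, 20.0), theta=(0.3, 0.7)) -> float:
        """SmoothT2(p) = s1(p) + s2(p), with
        s_i(p) = 1 / (1 + exp(k_i * (p - theta_i))).
        Each sigmoid smooths one of the step thresholds t1, t2."""
        return sum(1.0 / (1.0 + math.exp(k_i * (p - th_i)))
                   for k_i, th_i in zip(k, theta))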

35
Costs and Thresholds
  • Costs: where from?
  • Expert knowledge
  • Derive from data (might be tricky)
  • Thresholds: where from?
  • Fixed
  • Actually optimize them at the same time
  • SmoothT2 = SmoothT2(w, th1, th2)
  • Differentiable in th1 and th2, so we can do
    gradient search for it
  • Calibrates in one step both the belief updating
    and the threshold to minimize loss
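A sketch of that one-step calibration: treat th1 and th2 as extra parameters
of the smoothed objective and let a gradient-based optimizer tune them
together with the regression weights w. scipy is an assumed stand-in, and
the steepness, costs and data are placeholders.

    import numpy as np
    from scipy.optimize import minimize

    K = 20.0   # assumed sigmoid steepness for the smoothed steps

    def objective(params, X, y):
        """Mean SmoothT2-style cost as a function of (w, th1, th2)."""
        w, th1, th2 = params[:-2], params[-2], params[-1]
        p = 1.0 / (1.0 + np.exp(-X @ w))                    # predicted P(correct)
        s = lambda th: 1.0 / (1.0 + np.exp(K * (p - th)))   # smoothed step at th
        cost = np.where(y == 1, s(th1) + s(th2),            # low conf on correct
                        (1.0 - s(th1)) + (1.0 - s(th2)))    # high conf on wrong
        return cost.mean()

    # Placeholder data; start from zero weights and mid-range thresholds.
    rng = np.random.default_rng(0)
    X = rng.random((200, 5))
    y = (rng.random(200) > 0.5).astype(int)
    res = minimize(objective, x0=np.r_[np.zeros(5), 0.3, 0.7],
                   args=(X, y), method="BFGS")
    w_opt, th1_opt, th2_opt = res.x[:-2], res.x[-2], res.x[-1]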

36
Questions What Next?
  • ICT: can we do anything there?
  • Looks really tough
  • Push for better performance
  • Add more features?
  • Debug the models more, eliminate singularities
  • Why doesn't the model tree do better?
  • Push for better understanding
  • What are the other interesting questions?
  • Optimize for new loss function
  • More in the future: look at the full belief
    updating problem

37
Thank You!
38
Encoding System Actions
  • For each concept update, define a system action
    signature <IC, ICT, EC, REQ>
  • IC: Implicit Confirm (grounding)
  • ICT: Implicit Confirm (task)
  • EC: Explicit Confirm
  • REQ: Request
  • Each variable can have 1 of 4 values
  • 0 (the action does not occur)
  • C (action happens on concept of interest)
  • OC (action happens on some other concept)
  • COC (action happens both on concept of interest
    and some other concept)
  • Only certain combinations are valid and appear in
    the data
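For illustration, the four-slot signature could be encoded as below; the
type and value names are assumptions layered on the slide's description.

    from typing import NamedTuple

    VALUES = ("0", "C", "OC", "COC")   # none / this concept / other / both

    class ActionSignature(NamedTuple):
        """System action signature <IC, ICT, EC, REQ> for one concept update."""
        ic: str    # Implicit Confirm (grounding)
        ict: str   # Implicit Confirm (task)
        ec: str    # Explicit Confirm
        req: str   # Request

    # e.g. the system explicitly confirms this concept while requesting another
    sig = ActionSignature(ic="0", ict="0", ec="C", req="OC")
    assert all(v in VALUES for v in sig)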