Title: Belief Updating in Spoken Dialog Systems
1 Belief Updating in Spoken Dialog Systems
- Dialogs on Dialogs Reading Group
- June, 2005
- Dan Bohus
- Carnegie Mellon University, January 2004
2 Misunderstandings
- Misunderstandings are an important problem in spoken dialog systems
- System obtains an incorrect semantic interpretation of the user's utterance
- 15-40% of turns
- Significant negative impact on overall success rate
3 Confidence annotation
- Use confidence scores to guard against potential misunderstandings
- Traditionally from the speech recognition engine (Chase, Bansal, Cox, Kemp, etc.)
- Focuses on WER, not tuned to the task at hand
- More recently: system-specific semantic confidence scores (Carpenter, Walker, San-Segundo, etc.)
- Integrate knowledge from different levels in the system
- Speech recognition, language understanding, dialog management
4 Correction Detection
- Detect whether or not the user is trying to correct the system
- Related: aware-site detection
- Similar ML approaches using multiple sources of knowledge (Litman, Swerts, Krahmer, etc.)
5 Proposed: Belief Updating
- Integrate confidence annotation and correction detection in a unified framework for continuously tracking beliefs
- A belief updating problem
S: Where are you flying from?
U: CityName = {Aspen/0.6, Austin/0.2} (initial belief)
S: Did you say you wanted to fly out of Aspen? (system action)
U: No/0.6, CityName = {Boston/0.8} (user response)
CityName = {Aspen/?, Austin/?, Boston/?} (updated belief)
6 Formally
- Given
- An initial belief $P_{\text{initial}}(C)$ over concept $C$
- A system action $SA$
- A user response $R$
- Construct an updated belief $P_{\text{updated}}(C)$
- As accurate as possible
- $P_{\text{updated}}(C) \leftarrow f(P_{\text{initial}}(C), SA, R)$
7 Examples
8 Examples - continued
9 Outline
- Introduction
- Data
- A simplified version of the problem. Approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?
data problem/approach user behaviors
preliminary results more on evaluation what
next?
10 Data
- Collected in an experiment with RoomLine
- Phone-based, mixed-initiative system for making conference room reservations
- Equipped with explicit and implicit confirmations
- Corpus statistics
- 46 participants
- 449 sessions, 8278 turns
- 13.5% misunderstandings (9.8% / 22.5%)
- 25.6% WER (19.6% / 39.5%)
- 11362 concept updates
11 System actions and concept updates
- Explicit and implicit confirmations
12 System actions and concept updates
- Implicit Confirmations Task
13 Number of Conflicting Hypotheses
- Below 3% involve more than 1 hypothesis
- System not using multiple hypotheses
- Future work: regenerate multiple hypotheses in batch
14 Outline
- Introduction
- Data
- A simplified version of the problem. Approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?
15 A Simplified Version
- Given that only 3% have more than 1 hypothesis,
- Update belief in the top hypothesis after implicit and explicit confirmations
- Instead of
- $P_{\text{updated}}(C) \leftarrow f(P_{\text{initial}}(C), SA, R)$
- Do
- $\mathit{ConfTop}_{\text{updated}}(C) \leftarrow f(\mathit{ConfTop}_{\text{initial}}(C), SA, R)$
- For $SA \in \{EC, IC, ICT\}$
16 Approach
- Use machine learning
- Dataset
- Concept updates for ECs, ICs, ICTs
- Features
- Initial confidence score $\mathit{ConfTop}_{\text{initial}}(C)$
- System action ($SA$)
- User response ($R$)
- Target
- Updated confidence score $\mathit{ConfTop}_{\text{updated}}(C)$
- Data is labeled, so we have a binary target
17 Outline
- Introduction
- Data
- A simplified version of the problem. Approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?
18 User behaviors
- Study of user behaviors in response to ICs and ECs
- Can inform feature selection and feature development
- Provide insights into where the difficulties are
- Can inform potential strategy refinements
19 User responses to ECs
20 Other Responses to EC
- Eyeball estimates (out of 146 responses)
- 70% simply repeat the correct concept value
- That should come in as a handy feature
- 10% change conversation focus
- 10% turn overtaking issues
- Maybe inhibit barge-in until Antoine finishes his thesis
- 10% other
21 User responses to ICs
22 Users Don't Always Correct ICs
- Actually, they corrected in 45% of the cases
- That means if we knew exactly when they correct, we'd still have (126+1)/788 = 16% error
- So what do users do when they don't correct?
- They may actually correct partially
- Completely ignore the error (if non-essential)
- Readjust to accommodate task
23 More questions
- Understand better this "ignore" phenomenon
- Impact on task success?
- IC correction rate: 49% (successful tasks) vs 41% (unsuccessful)
- Fixed vs more flexible scenarios
- Impact of prompt length on P(user will correct)?
- Essential vs non-essential concepts?
24 Outline
- Introduction
- Data
- A simplified version of the problem. Approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?
25 Which ML technique?
- Need good probability outputs
- Margins produced by discriminant classifiers are inadequate
- If you want probability scores, i.e. conf = 0.85 means that in 85% of cases with conf = 0.85 the concept is right
- Evaluate on a soft metric (I'll contradict myself later!)
- Step-wise logistic regression
- Sample-efficient
- Feature selection
- Good soft-metric performance
- Optimizes for avg. log-likelihood of data
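The two notions on this slide, calibrated confidence and average log-likelihood, can be checked numerically. The sketch below is only an illustration: the function names, the equal-width binning, and the toy data are assumptions, not part of the original study.

```python
import math
from collections import defaultdict

def average_log_likelihood(confs, labels, eps=1e-12):
    """Mean log-likelihood of binary labels under the predicted confidences."""
    total = 0.0
    for p, d in zip(confs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += math.log(p) if d == 1 else math.log(1.0 - p)
    return total / len(confs)

def reliability_table(confs, labels, n_bins=10):
    """Bin predictions by confidence; map bin -> (mean conf, fraction correct, count)."""
    bins = defaultdict(list)
    for p, d in zip(confs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, d))
    table = {}
    for idx, items in sorted(bins.items()):
        mean_conf = sum(p for p, _ in items) / len(items)
        frac_correct = sum(d for _, d in items) / len(items)
        table[idx] = (mean_conf, frac_correct, len(items))
    return table

# Toy data (made up): 20 concept updates scored 0.85, of which 17 are correct.
confs = [0.85] * 20
labels = [1] * 17 + [0] * 3
table = reliability_table(confs, labels)
avg_ll = average_log_likelihood(confs, labels)
```

In this toy case the single occupied bin has mean confidence 0.85 and 85% of its concepts are correct, which is what a well-calibrated score should look like.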
26 Data. Features
- For each system action: EC, IC, ICT
- Initial confidence score
- Other indicators about current state
- How well has the dialog been going
- Which concept are we talking about
- How far back was this concept acquired
- Features on user response
- Confirmation and disconfirmation markers
- Acoustic / prosodic: f0 (min, max, range, maxslope, etc.), normalized versions
- Number of words, turn length (secs)
- Concept information: expected / repeated / new concepts and grammar slots
- Confidence
- Barge-in, timeout info
- Lexical features (preselected by MI with target or confirm/disconfirm markers)
27 Results
- Actually using a 1-level logistic model tree
- Split on answer_type: yes, no, other, no_parse
- Perform step-wise logistic regression on the 4 leaves
- P-entry = 0.05
- P-reject = 0.30
- BIC stopping criterion
- Also tried a full-blown model tree; results are similar, maybe marginally worse
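A 1-level model tree of this kind can be sketched as one logistic model per answer_type leaf. The code below is a simplified illustration under stated assumptions: it fits plain logistic regression by gradient ascent and omits the step-wise feature selection (P-entry, P-reject, BIC); the function names, the single feature (initial confidence), and the toy rows are all hypothetical.

```python
import math
from collections import defaultdict

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias stored last) by batch gradient ascent on the average log-likelihood."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, di in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = di - p
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def fit_model_tree(rows):
    """rows: (answer_type, feature_vector, label). Fits one logistic model per leaf."""
    leaves = defaultdict(lambda: ([], []))
    for answer_type, x, d in rows:
        leaves[answer_type][0].append(x)
        leaves[answer_type][1].append(d)
    return {a: fit_logistic(X, y) for a, (X, y) in leaves.items()}

def predict(tree, answer_type, x):
    """Updated confidence for one concept update, using the leaf for its answer_type."""
    w = tree[answer_type]
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

# Toy rows (made up): the only feature is the initial confidence score.
rows = [
    ("yes", [0.2], 0), ("yes", [0.3], 1), ("yes", [0.4], 0),
    ("yes", [0.5], 1), ("yes", [0.7], 1), ("yes", [0.9], 1),
    ("no", [0.2], 0), ("no", [0.4], 0), ("no", [0.6], 0),
    ("no", [0.8], 1), ("no", [0.9], 0),
]
tree = fit_model_tree(rows)
p_yes = predict(tree, "yes", [0.9])
p_no = predict(tree, "no", [0.9])
```

Splitting first on answer_type lets each leaf learn its own weights, so the same initial confidence can lead to different updated confidences after a "yes" and after a "no".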
28 Explicit Confirmation
29 Implicit Confirmation
30 Outline
- Introduction
- Data
- A simplified version of the problem. Approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?
31 What can Logistic Regression / AVG-LL do for you?
- $D = \{d_1, d_2, d_3, d_4, \ldots\}$, $d_i \in \{1, 0\}$
- $P(D) = \prod_i P(d_i = 1 \mid x_i)$
- Express density $P(d_i = 1 \mid x_i)$ as
- $P(d = 1 \mid x) = 1 / (1 + \exp(-wx))$
- You can actually derive this if you start with $P(x \mid d)$ Gaussian
- Find parameters $w$ to $\max(P(D))$
- $\arg\max P(D) = \arg\max \prod_i P(d_i = 1 \mid x_i)$
- $\arg\max P(D) = \arg\min \sum_i -\log P(d_i = 1 \mid x_i)$
- Hence we maximize the average log-likelihood
- But what does that mean?
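32 Loss function in Logistic Regression
- Log-likelihood loss function
- If d = 1, then P(d=1) = 0.01 is ten times worse than P(d=1) = 0.1, but P(d=1) = 0.7 is about the same as P(d=1) = 0.8
- Things are mirrored for d = 0
[Figure: log-likelihood loss for d = 1 plotted against P(d=1), with 0.01, 0.1, 0.7, 0.8 and 1 marked on the axis]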
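The comparison on this slide can be made concrete by evaluating the negative log-likelihood at the four marked points. This is a minimal sketch; the function name is hypothetical.

```python
import math

def log_loss_positive(p):
    """Negative log-likelihood of a positive example (d = 1) predicted with P(d=1) = p."""
    return -math.log(p)

losses = {p: log_loss_positive(p) for p in (0.01, 0.1, 0.7, 0.8)}
for p, loss in losses.items():
    print(f"P(d=1) = {p:<4}  -log P = {loss:.3f}")
```

A tenfold drop in likelihood (0.1 to 0.01) adds log(10), about 2.3, to the loss, while moving from 0.8 to 0.7 adds only about 0.13. The loss therefore punishes confident mistakes heavily and barely distinguishes among reasonably confident correct predictions.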
33 A New Loss Function: T2
- A loss function that better matches our domain: T2 (or even T3)
[Figure: two panels, d = 1 and d = 0, each with axis points 0, t1, t2, 1 and cost levels C1, C2, C3, C4]
- Optimize $\arg\max \sum_i T2(P(d_i = c \mid x_i))$
- Not differentiable
- Not convex
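34 Smoothed version
- A loss function that better matches our domain: T2 (or even T3)
- $\mathit{SmoothT2}(p) = s_1(p) + s_2(p)$, where $s_i(p) = 1 / (1 + \exp(k_i (p - \theta_i)))$, with the $k$s and $\theta$s chosen accordingly
[Figure: smoothed loss for d = 1, with axis points 0, t1, t2, 1 and cost levels C1, C2]
- Optimize $\arg\max \sum_i \mathit{SmoothT2}(P(d_i = c \mid x_i))$
- Differentiable!
- But still not convex: multiple local maxima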
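The smoothed function is a sum of two sigmoids, each acting as a soft step at its own threshold. The sketch below evaluates it for d = 1. The steepness values (50) and thresholds (0.3 and 0.7) are illustrative assumptions, and the cost levels C1 and C2 from the figure are not modeled because the slide does not say how they enter the formula.

```python
import math

def soft_step(p, k, theta):
    """s_i(p) = 1 / (1 + exp(k * (p - theta))): near 1 below theta, near 0 above it (for k > 0)."""
    return 1.0 / (1.0 + math.exp(k * (p - theta)))

def smooth_t2(p, k1=50.0, theta1=0.3, k2=50.0, theta2=0.7):
    """Sum of two soft steps: a smooth, differentiable two-threshold staircase."""
    return soft_step(p, k1, theta1) + soft_step(p, k2, theta2)

# Evaluate below the first threshold, between the two, and above the second.
low, mid, high = smooth_t2(0.0), smooth_t2(0.5), smooth_t2(1.0)
```

Larger k values make each step sharper and bring the curve closer to the piecewise-constant T2, at the price of steeper gradients around the thresholds.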
35 Costs and Thresholds
- Costs: where from?
- Expert knowledge
- Derive from data (might be tricky)
- Thresholds: where from?
- Fixed
- Actually optimize at the same time
- SmoothT2 = SmoothT2(w, th1, th2)
- Differentiable in th1 and th2, so we can do gradient search for it
- Calibrates in one step both the belief updating and the threshold to minimize loss
36 Questions. What Next?
- ICT: can we do anything there?
- Looks really tough
- Push for better performance
- Add more features?
- Debug the models more, eliminate singularities
- Why doesn't the model tree do better?
- Push for better understanding
- What are the other interesting questions?
- Optimize for new loss function
- More in the future: look at the full belief updating problem
37 Thank You!
38 Encoding System Actions
- For each concept update, define system action signature <IC, ICT, EC, REQ>
- IC: Implicit Confirm (grounding)
- ICT: Implicit Confirm (task)
- EC: Explicit Confirm
- REQ: Request
- Each variable can have 1 of 4 values
- 0
- C (action happens on concept of interest)
- OC (action happens on some other concept)
- COC (action happens both on concept of interest and some other concept)
- Only certain combinations are valid and appear in the data
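One way to turn such a signature into model features is a one-hot encoding of each of the four variables. The sketch below is an assumption about representation, not the encoding actually used in the study; the class name, field names, and example signature are hypothetical, and the check for which combinations are valid is not modeled.

```python
from dataclasses import dataclass, astuple

# The four values each signature variable can take, as listed on the slide.
VALUES = ("0", "C", "OC", "COC")

@dataclass(frozen=True)
class ActionSignature:
    ic: str = "0"   # Implicit Confirm (grounding)
    ict: str = "0"  # Implicit Confirm (task)
    ec: str = "0"   # Explicit Confirm
    req: str = "0"  # Request

    def __post_init__(self):
        for v in astuple(self):
            if v not in VALUES:
                raise ValueError(f"invalid value {v!r}; expected one of {VALUES}")

    def one_hot(self):
        """Concatenate a 4-way one-hot vector per variable (16 features in total)."""
        vec = []
        for v in astuple(self):
            vec.extend(1 if v == candidate else 0 for candidate in VALUES)
        return vec

# Hypothetical example: explicit confirm on the concept of interest,
# implicit confirm on some other concept in the same system turn.
sig = ActionSignature(ec="C", ic="OC")
features = sig.one_hot()
```

The one-hot form keeps the four variables independent, so a learner can weight "EC on this concept" separately from "IC on another concept".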