Belief Updating in Spoken Dialog Systems - PowerPoint PPT Presentation


1
Belief Updating in Spoken Dialog Systems

Dialogs on Dialogs Reading Group
June, 2005
Dan Bohus
Carnegie Mellon University, January 2004

2
Misunderstandings

Misunderstandings are an important problem in spoken dialog systems
The system obtains an incorrect semantic interpretation of the user's utterance
15-40% of turns
Significant negative impact on overall success rate

3
Confidence annotation

Use confidence scores to guard against potential misunderstandings
Traditionally computed by the speech recognition engine [Chase, Bansal, Cox, Kemp, etc.]
Focuses on WER, not tuned to the task at hand
More recently: system-specific semantic confidence scores [Carpenter, Walker, San-Segundo, etc.]
Integrate knowledge from different levels in the system: speech recognition, language understanding, dialog management

4
Correction Detection

Detect whether or not the user is trying to correct the system
Related: aware-site detection
Similar ML approaches using multiple sources of knowledge [Litman, Swerts, Krahmer, etc.]

5
Proposed: Belief Updating

Integrate confidence annotation and correction detection in a unified framework for continuously tracking beliefs

A belief updating problem

S: Where are you flying from?
U: [CityName = {Aspen/0.6; Austin/0.2}]   (initial belief)
S: Did you say you wanted to fly out of Aspen?   (system action)
U: [No/0.6] [CityName = {Boston/0.8}]   (user response)
[CityName = {Aspen/?; Austin/?; Boston/?}]   (updated belief)
6
Formally

Given:
An initial belief $P_{initial}(C)$ over concept $C$
A system action $SA$
A user response $R$
Construct an updated belief $P_{updated}(C)$, as accurate as possible:
$P_{updated}(C) \leftarrow f(P_{initial}(C), SA, R)$

7
Examples
8
Examples - continued
9
Outline

Introduction
Data
A simplified version of the problem. Approach
User behaviors
Learning Preliminary results
More on evaluation
Where to from here?

data problem/approach user behaviors
preliminary results more on evaluation what
next?
10
Data

Collected in an experiment with RoomLine
Phone-based, mixed-initiative system for making conference room reservations
Equipped with explicit and implicit confirmations
Corpus statistics:
46 participants
449 sessions, 8278 turns
13.5% misunderstandings (9.8% / 22.5%)
25.6% WER (19.6% / 39.5%)
11362 concept updates

11
System actions and concept updates

Explicit and implicit confirmations

12
System actions and concept updates

Implicit Confirmations Task

13
Number of Conflicting Hypotheses

Below 3% involve more than 1 hypothesis
System not using multiple hypotheses
Future work: regenerate multiple hypotheses in batch

14
Outline

Introduction
Data
A simplified version of the problem. Approach
User behaviors
Learning preliminary results
More on evaluation
Where to from here?

15
A Simplified Version

Given that only 3% have more than 1 hypothesis, update the belief in the top hypothesis after implicit and explicit confirmations
Instead of
$P_{updated}(C) \leftarrow f(P_{initial}(C), SA, R)$
Do
$ConfTop_{updated}(C) \leftarrow f(ConfTop_{initial}(C), SA, R)$
For $SA \in \{EC, IC, ICT\}$
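The simplification can be illustrated with a minimal sketch. It assumes a belief is stored as a dictionary from hypothesized values to confidence scores. The names `Belief`, `ConceptUpdate` and `conf_top` are hypothetical. Of the initial belief, only the top hypothesis and its score remain as input to f.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical representation: each hypothesized value of a concept
# maps to its confidence score.
Belief = Dict[str, float]


@dataclass
class ConceptUpdate:
    """One concept update: initial belief, system action, user response."""
    initial: Belief
    system_action: str  # assumed labels: "EC", "IC" or "ICT"
    user_response: str  # decoded user text, kept as a plain string here


def conf_top(belief: Belief) -> Tuple[str, float]:
    """Return the top hypothesis and its score, i.e. ConfTop(C)."""
    value = max(belief, key=belief.get)
    return value, belief[value]


# The Aspen example: the explicit confirmation targets the top hypothesis.
update = ConceptUpdate(
    initial={"Aspen": 0.6, "Austin": 0.2},
    system_action="EC",
    user_response="no Boston",
)
print(conf_top(update.initial))  # ('Aspen', 0.6)
```

In this reduced view, Austin/0.2 no longer plays any role. Only Aspen's score is updated, and the user response still enters f as an input.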

16
Approach

Use machine learning
Dataset: concept updates for ECs, ICs, ICTs
Features:
Initial confidence score $ConfTop_{initial}(C)$
System action (SA)
User response (R)
Target:
Updated confidence score $ConfTop_{updated}(C)$
Data is labeled, so we have a binary target
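The next sketch shows how one labeled concept update might become a training example: a feature vector paired with a binary target (1 if the top hypothesis was correct). The feature names `has_no_marker` and `num_words` are hypothetical placeholders. A real system would use a richer set of user-response features, such as the ones listed for each system action.

```python
from typing import Dict, List

SYSTEM_ACTIONS = ["EC", "IC", "ICT"]


def featurize(conf_initial: float, system_action: str,
              response: Dict[str, float]) -> List[float]:
    """Initial confidence, one-hot system action, then user-response
    features in sorted key order (so every example has the same layout)."""
    one_hot = [1.0 if system_action == sa else 0.0 for sa in SYSTEM_ACTIONS]
    response_feats = [response[k] for k in sorted(response)]
    return [conf_initial] + one_hot + response_feats


# Hypothetical response features for "No ... Boston" after the Aspen EC.
x = featurize(0.6, "EC", {"has_no_marker": 1.0, "num_words": 2.0})
y = 0  # hypothetical label: the top hypothesis (Aspen) was wrong
print(x, y)
```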

17
Outline

Introduction
Data
A simplified version of the problem. Approach
User behaviors
Learning preliminary results
More on evaluation
Where to from here?

18
User behaviors

Study of user behaviors in response to ICs and ECs
Can inform feature selection and feature development
Provides insights into where the difficulties are
Can inform potential strategy refinements

19
User responses to ECs

[Chart: user responses to ECs, from transcripts vs. decoded input]

20
Other Responses to EC

Eyeball estimates (out of 146 responses):
70% simply repeat the correct concept value
That should come in as a handy feature
~10% change the conversation focus
~10% turn-overtaking issues
Maybe inhibit barge-in until Antoine finishes his thesis
~10% other

21
User responses to ICs

[Chart: user responses to ICs, from transcripts vs. decoded input]

22
Users Don't Always Correct ICs

Actually, they corrected in 45% of the cases

That means that even if we knew exactly when they correct, we'd still have (1261)/788 ≈ 16% error
So what do users do when they don't correct?
They may actually correct partially
Completely ignore the error (if non-essential)
Readjust to accommodate the task

23
More questions

Understand this "ignore" phenomenon better
Impact on task success?
IC correction rate: 49% (successful tasks) vs. 41% (unsuccessful)
Fixed vs. more flexible scenarios
Impact of prompt length on P(user will correct)?
Essential vs. non-essential concepts?

24
Outline

Introduction
Data
A simplified version of the problem. Approach
User behaviors
Learning preliminary results
More on evaluation
Where to from here?

25
Which ML technique?

Need good probability outputs
Margins produced by discriminant classifiers are inadequate if you want probability scores, i.e. conf = 0.85 means that in 85% of cases with conf = 0.85 the concept is right
Evaluate on a soft metric (I'll contradict myself later!!)
Step-wise logistic regression
Sample-efficient
Feature selection
Good soft-metric performance: optimizes for the average log-likelihood of the data
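The calibration requirement ("conf = 0.85 means 85% of such concepts are right") can be checked empirically by binning confidence scores. The sketch below is one common way to do it, assuming equal-width bins. The function name `reliability_table` and the toy data are illustrative only.

```python
from typing import List, Tuple


def reliability_table(confs: List[float], labels: List[int],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Group confidence scores into equal-width bins and return, per non-empty
    bin, (mean confidence, fraction of concepts that were correct, count)."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confs, labels):
        idx = min(int(c * n_bins), n_bins - 1)  # c == 1.0 goes into the last bin
        bins[idx].append((c, y))
    table = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(y for _, y in b) / len(b)
            table.append((mean_conf, accuracy, len(b)))
    return table


# Toy data: 20 concepts scored 0.85, of which 17 were actually correct.
confs = [0.85] * 20
labels = [1] * 17 + [0] * 3
print([(round(m, 2), round(a, 2), n) for m, a, n in reliability_table(confs, labels)])
# [(0.85, 0.85, 20)]
```

For a well-calibrated model, mean confidence and accuracy roughly agree in every bin. A classifier's raw margins give no such guarantee.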

26
Data: Features

For each system action (EC, IC, ICT):
Initial confidence score
Other indicators about the current state
How well has the dialog been going?
Which concept are we talking about?
How far back was this concept acquired?
Features on the user response
Confirmation and disconfirmation markers
Acoustic / prosodic: f0 (min, max, range, maxslope, etc.) and normalized versions
Number of words, turn length (secs)
Concept information: expected / repeated / new concepts and grammar slots
Confidence
Barge-in, timeout info
Lexical features (preselected by mutual information (MI) with the target or with confirm/disconfirm markers)

27
Results

Actually using a 1-level logistic model tree
Split on answer_type ∈ {yes, no, other, no_parse}
Perform step-wise logistic regression on the 4 leaves
P-entry = 0.05
P-reject = 0.30
BIC stopping criterion
Also tried a full-blown model tree; results are similar, maybe marginally worse
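The structure of a 1-level logistic model tree can be sketched as follows: route each example by `answer_type`, then fit a separate logistic regression per leaf. This sketch does not reproduce the step-wise selection (P-entry, P-reject, BIC stopping). Each leaf is fit with plain batch gradient ascent instead, which is a deliberate simplification. All names and toy rows are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

ANSWER_TYPES = ("yes", "no", "other", "no_parse")


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(xs: List[Sequence[float]], ys: List[int],
               lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient ascent on the average log-likelihood.
    The last weight is a bias term. No step-wise feature selection."""
    w = [0.0] * (len(xs[0]) + 1)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            xb = list(x) + [1.0]
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            for j, xj in enumerate(xb):
                grad[j] += err * xj / n
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w


class OneLevelModelTree:
    """Split on answer_type, then fit one logistic regression per leaf."""

    def fit(self, rows: List[Tuple[str, Sequence[float], int]]) -> "OneLevelModelTree":
        groups: Dict[str, Tuple[list, list]] = defaultdict(lambda: ([], []))
        for answer_type, x, y in rows:
            if answer_type not in ANSWER_TYPES:
                raise ValueError(f"unknown answer_type: {answer_type}")
            groups[answer_type][0].append(x)
            groups[answer_type][1].append(y)
        self.leaves = {a: fit_logreg(xs, ys) for a, (xs, ys) in groups.items()}
        return self

    def predict_proba(self, answer_type: str, x: Sequence[float]) -> float:
        w = self.leaves[answer_type]
        return sigmoid(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))


# Made-up rows: (answer_type, [initial confidence], top hypothesis correct?)
rows = [("yes", [0.9], 1), ("yes", [0.8], 1), ("yes", [0.3], 0), ("yes", [0.2], 0),
        ("no", [0.9], 0), ("no", [0.8], 1), ("no", [0.6], 0), ("no", [0.3], 0)]
tree = OneLevelModelTree().fit(rows)
print(tree.predict_proba("yes", [0.85]), tree.predict_proba("no", [0.85]))
```

Because each leaf is fit separately, the same initial confidence can map to very different updated confidences depending on the answer type.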

28
Explicit Confirmation
29
Implicit Confirmation
30
Outline

Introduction
Data
A simplified version of the problem. Approach
User behaviors
Learning preliminary results
More on evaluation
Where to from here?

31
What can Logistic Regression / AVG-LL do for you?

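$D = \{d_1, d_2, d_3, d_4, \ldots\}$, $d_i \in \{0, 1\}$
$P(D) = \prod_i P(d_i \mid x_i)$
Express the probability $P(d_i = 1 \mid x_i)$ as
$P(d = 1 \mid x) = 1 / (1 + \exp(-w \cdot x))$
You can actually derive this if you start with Gaussian class-conditionals $P(x \mid d)$ that share a covariance matrix
Find parameters $w$ that maximize $P(D)$:
$\arg\max_w P(D) = \arg\max_w \prod_i P(d_i \mid x_i)$
$\arg\max_w P(D) = \arg\min_w \sum_i -\log P(d_i \mid x_i)$
Hence we maximize the average log-likelihood
But what does that mean?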
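The objective above can be written out in a few lines of plain Python. The sketch computes the average log-likelihood and its standard analytic gradient $(1/N)\sum_i (d_i - p_i)\,x_i$, then checks that gradient against finite differences. It assumes a constant 1.0 feature serves as the bias term. Function names and toy data are illustrative.

```python
import math
from typing import List, Sequence


def p_correct(w: Sequence[float], x: Sequence[float]) -> float:
    """P(d = 1 | x) = 1 / (1 + exp(-w . x))."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))


def avg_log_likelihood(w, xs, ds) -> float:
    """(1/N) * sum_i log P(d_i | x_i), where P(d_i | x_i) is p_i or 1 - p_i."""
    total = 0.0
    for x, d in zip(xs, ds):
        p = p_correct(w, x)
        total += math.log(p if d == 1 else 1.0 - p)
    return total / len(xs)


def gradient(w, xs, ds) -> List[float]:
    """Analytic gradient of the average log-likelihood: (1/N) sum_i (d_i - p_i) x_i."""
    g = [0.0] * len(w)
    for x, d in zip(xs, ds):
        err = d - p_correct(w, x)
        for j, xj in enumerate(x):
            g[j] += err * xj / len(xs)
    return g


# Toy data; the first feature is a constant 1.0 acting as a bias term.
xs = [[1.0, 0.9], [1.0, 0.3], [1.0, 0.6]]
ds = [1, 0, 1]
w = [0.1, -0.2]

# Central finite differences should agree with the analytic gradient.
eps = 1e-6
analytic = gradient(w, xs, ds)
for j in range(len(w)):
    up, down = list(w), list(w)
    up[j] += eps
    down[j] -= eps
    numeric = (avg_log_likelihood(up, xs, ds) - avg_log_likelihood(down, xs, ds)) / (2 * eps)
    print(j, round(analytic[j], 6), round(numeric, 6))
```

Since $-\log$ is monotone, maximizing the product and minimizing the summed negative log give the same $w$. The log form is simply easier to differentiate and numerically safer.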

32
Loss function in Logistic Regression

Log-likelihood loss function

If d = 1, then P(d=1) = 0.01 is ten times worse (in likelihood) than P(d=1) = 0.1, but P(d=1) = 0.7 is about the same as P(d=1) = 0.8
Things are mirrored for d = 0
[Plot: log-likelihood loss as a function of P(d=1), for d = 1; marked values 0.01, 0.1, 0.7, 0.8, 1]
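To put numbers on this comparison, the sketch below computes the loss $-\log P(d_i \mid x_i)$ for the probabilities quoted on the slide. The helper name `log_loss` is ours.

```python
import math


def log_loss(p: float, d: int) -> float:
    """Negative log-likelihood of label d given predicted P(d = 1) = p."""
    return -math.log(p if d == 1 else 1.0 - p)


# For a concept that is actually correct (d = 1):
for p in (0.01, 0.1, 0.7, 0.8):
    print(f"P(d=1) = {p:<4}  loss = {log_loss(p, 1):.3f}")

# Mirrored case (d = 0): P(d=1) = 0.99 costs as much as P(d=1) = 0.01 did above.
print(round(log_loss(0.99, 0), 3), round(log_loss(0.01, 1), 3))
```

The gap between 0.01 and 0.1 is ln 10 ≈ 2.303. Between 0.7 and 0.8 it is only ln(8/7) ≈ 0.134. The loss therefore concentrates on confidently wrong predictions and barely distinguishes among middling ones.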
33
A New Loss Function: T2

A loss function that better matches our domain
T2 (or even T3)

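[Plot: step-shaped cost T2 as a function of P(d=1), one panel for d = 1 and one for d = 0; thresholds t1 and t2 on an axis from 0 to 1; cost levels C1, C2, C3, C4]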

Optimize $\arg\max \sum_i T2(P(d_i = c \mid x_i))$
Not differentiable
Not convex
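The sketch below implements a two-threshold step cost: reject below t1, confirm between t1 and t2, accept from t2 upward. The slide's figure labels C1 to C4 but does not say which level goes where. The placement below, and the numeric costs in the usage, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class T2Costs:
    # Hypothetical placement of the four cost levels on the two panels.
    c1: float  # d = 1, p < t1: a correct concept is rejected
    c2: float  # d = 1, t1 <= p < t2: a correct concept is confirmed
    c3: float  # d = 0, p >= t2: an incorrect concept is accepted
    c4: float  # d = 0, t1 <= p < t2: an incorrect concept is confirmed


def t2_cost(p: float, d: int, t1: float, t2: float, c: T2Costs) -> float:
    """Piecewise-constant cost of P(d=1) = p: reject below t1, confirm
    between t1 and t2, accept from t2 upward."""
    if d == 1:
        if p < t1:
            return c.c1
        return c.c2 if p < t2 else 0.0
    if p >= t2:
        return c.c3
    return c.c4 if p >= t1 else 0.0


costs = T2Costs(c1=4.0, c2=1.0, c3=6.0, c4=1.0)  # made-up values
for p in (0.2, 0.5, 0.79, 0.81):
    print(p, t2_cost(p, 1, 0.3, 0.8, costs), t2_cost(p, 0, 0.3, 0.8, costs))
```

Between p = 0.79 and p = 0.81 the cost jumps from 1.0 to 0.0 for d = 1, and from 1.0 to 6.0 for d = 0. Everywhere else it is flat. Its derivative is therefore zero almost everywhere and undefined at t1 and t2, so gradient methods cannot use it directly.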

34
Smoothed version

A loss function that better matches our domain
T2 (or even T3)

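$\mathrm{SmoothT2}(p) = s_1(p) + s_2(p)$, where $s_i(p) = 1 / (1 + \exp(k_i (p - \theta_i)))$, with the $k$'s and $\theta$'s chosen accordingly
[Plot: SmoothT2 as a function of P(d=1), for d = 1, with cost levels C1 and C2 and thresholds t1, t2 on an axis from 0 to 1]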

Optimize $\arg\max \sum_i \mathrm{SmoothT2}(P(d_i = c \mid x_i))$
Differentiable!
But still not convex: multiple local maxima
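Here is a sketch of the smoothed cost for the d = 1 panel, built from the slide's decreasing sigmoids $s_i(p) = 1/(1 + \exp(k_i(p - \theta_i)))$ with $\theta_1 = t_1$ and $\theta_2 = t_2$. The slide's plain sum $s_1 + s_2$ gives levels 2, 1 and 0. Weighting $s_1$ by $C1 - C2$ and $s_2$ by $C2$, so the levels become C1, C2 and 0, is our assumption. So is a single shared steepness k.

```python
import math


def s(p: float, k: float, theta: float) -> float:
    """Decreasing sigmoid step: 1 / (1 + exp(k (p - theta)))."""
    z = k * (p - theta)
    if z > 0:  # rewrite to avoid overflow in exp for large positive z
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def smooth_t2_correct(p: float, t1: float, t2: float,
                      c1: float, c2: float, k: float = 50.0) -> float:
    """Smoothed cost for d = 1: about c1 below t1, about c2 between t1 and t2,
    about 0 above t2. Weighting s1 by (c1 - c2) and s2 by c2 is an assumption."""
    return (c1 - c2) * s(p, k, t1) + c2 * s(p, k, t2)


def d_smooth_t2_correct(p: float, t1: float, t2: float,
                        c1: float, c2: float, k: float = 50.0) -> float:
    """Derivative with respect to p, using ds/dp = -k s (1 - s)."""
    s1, s2 = s(p, k, t1), s(p, k, t2)
    return -(c1 - c2) * k * s1 * (1 - s1) - c2 * k * s2 * (1 - s2)


for p in (0.1, 0.5, 0.95):
    print(p, round(smooth_t2_correct(p, 0.3, 0.8, c1=4.0, c2=1.0), 3))
```

With k = 50 the printed values are close to 4, 1 and 0, matching the step function. A larger k gives a sharper approximation but steeper, less informative gradients away from the thresholds.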

35
Costs and Thresholds

Costs: where from?
Expert knowledge
Derive from data (might be tricky)
Thresholds: where from?
Fixed
Actually optimize at the same time: SmoothT2 → SmoothT2(w, th1, th2)
Differentiable in th1 and th2, so we can do gradient search for them
Calibrates in one step both the belief updating and the thresholds to minimize loss
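The idea of gradient search over th1 and th2 can be sketched under two simplifications. First, the updated beliefs P(d=1) are held fixed, so only the thresholds move and the weights w are not optimized. Second, gradients come from central finite differences rather than analytic derivatives. The cost placement (c1, c2 for d = 1; c3, c4 for d = 0), the toy data and all names are hypothetical.

```python
import math
from typing import List, Tuple

Costs = Tuple[float, float, float, float]  # (c1, c2, c3, c4), hypothetical placement


def s(p: float, k: float, theta: float) -> float:
    """Decreasing sigmoid step 1 / (1 + exp(k (p - theta))), overflow-safe."""
    z = k * (p - theta)
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def smooth_cost(p: float, d: int, t1: float, t2: float, c: Costs, k: float = 20.0) -> float:
    c1, c2, c3, c4 = c
    if d == 1:  # ~c1 (reject), ~c2 (confirm), ~0 (accept)
        return (c1 - c2) * s(p, k, t1) + c2 * s(p, k, t2)
    # d == 0: ~0 (reject), ~c4 (confirm), ~c3 (accept)
    return c4 * (1 - s(p, k, t1)) + (c3 - c4) * (1 - s(p, k, t2))


def mean_cost(ps: List[float], ds: List[int], t1: float, t2: float, c: Costs) -> float:
    return sum(smooth_cost(p, d, t1, t2, c) for p, d in zip(ps, ds)) / len(ps)


def optimize_thresholds(ps, ds, c, t1=0.3, t2=0.7, lr=0.05, steps=300, eps=1e-5):
    """Gradient descent on (t1, t2) with central-difference gradients.
    A step is kept only if the mean cost does not increase; otherwise lr is halved."""
    current = mean_cost(ps, ds, t1, t2, c)
    for _ in range(steps):
        g1 = (mean_cost(ps, ds, t1 + eps, t2, c) - mean_cost(ps, ds, t1 - eps, t2, c)) / (2 * eps)
        g2 = (mean_cost(ps, ds, t1, t2 + eps, c) - mean_cost(ps, ds, t1, t2 - eps, c)) / (2 * eps)
        new_t1 = min(max(t1 - lr * g1, 0.0), 1.0)
        new_t2 = min(max(t2 - lr * g2, new_t1), 1.0)  # keep t1 <= t2 <= 1
        candidate = mean_cost(ps, ds, new_t1, new_t2, c)
        if candidate <= current:
            t1, t2, current = new_t1, new_t2, candidate
        else:
            lr /= 2.0
    return t1, t2, current


# Made-up updated beliefs P(d=1) and true labels.
ps = [0.95, 0.9, 0.85, 0.6, 0.55, 0.4, 0.2, 0.1, 0.7, 0.3]
ds = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
costs = (4.0, 1.0, 6.0, 1.0)
before = mean_cost(ps, ds, 0.3, 0.7, costs)
t1, t2, after = optimize_thresholds(ps, ds, costs)
print(round(before, 3), round(after, 3), round(t1, 2), round(t2, 2))
```

Because the smoothed cost is not convex, the thresholds found depend on the starting point. Trying several starts is a cheap safeguard.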

36
Questions? What Next?