constructing accurate beliefs in task-oriented spoken dialog systems

1
constructing accurate beliefs in task-oriented spoken dialog systems

Dan Bohus
Computer Science Department, Carnegie Mellon University
Pittsburgh, PA 15213
www.cs.cmu.edu/~dbohus
dbohus_at_cs.cmu.edu

2
problem

spoken language interfaces lack robustness when
faced with understanding errors

errors stem mostly from speech recognition
typical word error rates: 20-30%
significant negative impact on interactions

3
more concretely

S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow. I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th. I have a flight departing Chicago at 1:40pm, arrives Seoul at

4
two types of understanding errors

(same dialog as on the previous slide)

non-understandings: the system obtains no usable interpretation of the user turn, e.g. Urbana Champaign [OKAY IN THAT SAME PAY]
misunderstandings: the system obtains an incorrect interpretation and acts on it, e.g. Huntsville [SEOUL]

5
approaches for increasing robustness

improve recognition

gracefully handle errors through interaction

detect the problems
develop a set of recovery strategies
know how to choose between them (policy)

6
six not-so-easy pieces
7
today's talk
misunderstandings

construct more accurate beliefs by integrating information over multiple turns in a conversation

detection
S: Where would you like to go?
U: Huntsville [SEOUL] / 0.65
destination = seoul / 0.65
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M] / 0.60
destination = ?
8
belief updating: problem statement
destination = seoul / 0.65
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M] / 0.60
destination = ?

given
an initial belief P_initial(C) over concept C
a system action SA
a user response R
construct an updated belief
P_updated(C) ← f(P_initial(C), SA, R)

9
outline

related work
a restricted version
data
user response analysis
experiments and results
current and future work

10
current solutions

most systems only track values, not beliefs
new values overwrite old values
use confidence scores
explicit confirm: yes → trust hypothesis; no → delete hypothesis; other → non-understanding
implicit confirm: not much
"users who discover errors through incorrect implicit confirmations have a harder time getting back on track" [Shin et al, 2002]
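
The explicit-confirmation rule above can be sketched in a few lines. This is a minimal illustration, not the code of any particular system: the function name, the (value, confidence) tuple, the use of 1.0 to encode "trust", and leaving the belief unchanged on an OTHER response are all assumptions made for the example.

```python
from typing import Optional, Tuple

Hypothesis = Tuple[str, float]  # (concept value, confidence score)


def heuristic_explicit_update(hyp: Optional[Hypothesis],
                              response_type: str) -> Optional[Hypothesis]:
    """Heuristic update of a single tracked hypothesis after an explicit confirmation."""
    if hyp is None:
        return None
    value, _confidence = hyp
    if response_type == "YES":
        return (value, 1.0)   # trust the hypothesis (1.0 is an assumed encoding)
    if response_type == "NO":
        return None           # delete the hypothesis
    return hyp                # OTHER: non-understanding, belief left as it was (assumption)


belief = ("seoul", 0.65)
after_no = heuristic_explicit_update(belief, "NO")
after_yes = heuristic_explicit_update(belief, "YES")
```

Note that the rule discards the initial confidence score entirely, which is one reason a learned update can do better.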

related work restricted version data user
response analysis results current and future
work
11
confidence / detecting misunderstandings

traditionally focused on word-level errors [Chase, Cox, Bansal, Ravinshankar, and many others]
recently: detecting misunderstandings [Walker, Wright, Litman, Bosch, Swerts, San-Segundo, Pao, Gurevych, Bohus, and many others]
machine learning approach: binary classification
in-domain, labeled dataset
features from different knowledge sources: acoustic, language model, parsing, dialog management
50% relative reduction in classification error

12
detecting corrections

detect if the user is trying to correct the system [Litman, Swerts, Hirschberg, Krahmer, Levow]
machine learning approach: binary classification
in-domain, labeled dataset
features from different knowledge sources: acoustic, prosody, language model, parsing, dialog management
50% relative reduction in classification error

13
integration

confidence annotation and correction detection
are useful tools
but separately, neither solves the problem
bridge together in a unified approach to
accurately track beliefs

14
outline

related work
a restricted version
data
user response analysis
experiments and results
current and future work

15
belief updating: general form

given
an initial belief P_initial(C) over concept C
a system action SA
a user response R
construct an updated belief
P_updated(C) ← f(P_initial(C), SA, R)

16
two simplifications

1. belief representation
system unlikely to hear more than 3 or 4 values for a concept within a dialog session
in our data, considering only the top hypothesis from recognition:
max 3 conflicting values heard
more than 1 value heard in only 6.9% of cases
compressed beliefs: top-K concept hypotheses + other
for now, K = 1
2. updates following system confirmation actions
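
A compressed belief of this kind can be sketched as follows. The function name, the dictionary representation and the "<other>" key are hypothetical, and the input is assumed to be a normalized probability distribution over concept values.

```python
from typing import Dict


def compress_belief(belief: Dict[str, float], k: int = 1) -> Dict[str, float]:
    """Keep the k most likely values and fold the remaining mass into '<other>'."""
    ranked = sorted(belief.items(), key=lambda item: item[1], reverse=True)
    compressed = dict(ranked[:k])
    # Whatever probability mass is not held by the top-k values goes to '<other>'.
    compressed["<other>"] = max(0.0, 1.0 - sum(compressed.values()))
    return compressed


full = {"seoul": 0.65, "berlin": 0.20, "boston": 0.15}
top1 = compress_belief(full, k=1)   # keys: 'seoul' and '<other>'
top2 = compress_belief(full, k=2)   # keys: 'seoul', 'berlin' and '<other>'
```

With K = 1 the belief reduces to a single confidence score for the top hypothesis, which is the form used in the rest of the talk.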

17
belief updating: reduced version

given
an initial confidence score Conf_init(th_C) for the current top hypothesis th_C of concept C
a system confirmation action SA
a user response R
construct an updated confidence score for that hypothesis
Conf_upd(th_C) ← f(Conf_init(th_C), SA, R)

18
outline

related work
a restricted version
data
user response analysis
experiments and results
current and future work

19
data

collected with RoomLine
a phone-based mixed-initiative spoken dialog system
conference room reservation
explicit and implicit confirmations
confidence threshold model (+ some exploration)
unplanned implicit confirmations

I found 10 rooms for Friday between 1 and 3 p.m. Would you like a small room or a large one?

20
corpus

user study
46 participants (naïve users)
10 scenario-based interactions each
compensated per task success
corpus
449 sessions, 8848 user turns
orthographically transcribed
manually annotated
misunderstandings
corrections
correct concept values

21
outline

related work
a restricted version
data
user response analysis
experiments and results
current and future work

22
user response types

following Krahmer and Swerts, 2000
study on Dutch train timetable information system
3 user response types
YES: yes, right, that's right, correct, etc.
NO: no, wrong, etc.
OTHER
cross-tabulated against correctness of system confirmations
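
A rough sketch of this kind of analysis: label each user response as YES, NO or OTHER, then count responses against whether the confirmed value was correct. The keyword patterns cover only the examples listed above, and matching on the start of the transcript is an assumption; the actual labeling scheme is not specified here.

```python
import re
from collections import Counter

# Patterns cover only the example terms listed on the slide (assumption).
YES_PATTERN = re.compile(r"^(yes|right|that's right|correct)\b")
NO_PATTERN = re.compile(r"^(no|wrong)\b")


def response_type(transcript: str) -> str:
    """Label a transcribed user response as YES, NO or OTHER."""
    text = transcript.lower().strip()
    if YES_PATTERN.match(text):
        return "YES"
    if NO_PATTERN.match(text):
        return "NO"
    return "OTHER"


def cross_tabulate(turns):
    """turns: iterable of (transcript, confirmed_value_was_correct)."""
    counts = Counter()
    for transcript, was_correct in turns:
        row = "CORRECT" if was_correct else "INCORRECT"
        counts[(row, response_type(transcript))] += 1
    return counts


# Made-up turns, for illustration only.
example_turns = [
    ("yes", True),
    ("That's right", True),
    ("no no I'm traveling to Birmingham", False),
    ("the small one", False),
]
table = cross_tabulate(example_turns)
```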

23
user responses to explicit confirmations
            YES          NO           Other
CORRECT     94% [93%]    0% [0%]      5% [7%]
INCORRECT   1% [6%]      72% [57%]    27% [37%]

numbers in brackets from Krahmer & Swerts

24
other responses to explicit confirmations

70%: users repeat the correct value
15%: users don't address the question
attempt to shift conversation focus
how often do users correct the system?

            User does not correct    User corrects
CORRECT     1159                     0
INCORRECT   29 [10% of incor]        250 [90% of incor]
25
user responses to implicit confirmations
            YES          NO           Other
CORRECT     30% [0%]     7% [0%]      63% [100%]
INCORRECT   6% [0%]      33% [15%]    61% [85%]

numbers in brackets from Krahmer & Swerts

26
ignoring errors in implicit confirmations

how often do users correct the system?

            User does not correct    User corrects
CORRECT     552                      2
INCORRECT   118 [51% of incor]       111 [49% of incor]

explanation
users correct later (40% of 118)
users interact strategically / correct only if essential

             ¬correct later    correct later
¬critical    55                2
critical     14                47
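
The counts above can be turned into rates with a few lines of arithmetic. The sketch reads the second table as rows non-critical / critical and columns not corrected later / corrected later; that reading is an assumption, chosen because the four cells sum to the 118 uncorrected errors.

```python
def rate(part: int, whole: int) -> float:
    """Fraction part / whole, or 0.0 for an empty denominator."""
    return part / whole if whole else 0.0


# Incorrect implicit confirmations: immediate user reaction.
not_corrected, corrected = 118, 111
immediate_correction_rate = rate(corrected, not_corrected + corrected)

# The 118 uncorrected errors, split by criticality (assumed reading of the table):
# (not corrected later, corrected later)
later = {"non-critical": (55, 2), "critical": (14, 47)}
later_rates = {kind: rate(c, n + c) for kind, (n, c) in later.items()}
overall_later_rate = rate(sum(c for _, c in later.values()), not_corrected)

for kind, value in later_rates.items():
    print(f"{kind}: {value:.1%} corrected later")
print(f"overall: {overall_later_rate:.1%} of uncorrected errors corrected later")
```

Under this reading, most critical errors are eventually corrected while non-critical ones almost never are, which is the "correct only if essential" pattern.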
27
outline

related work
a restricted version
data
user response analysis
experiments and results
current and future work

28
machine learning approach

problem: Conf_upd(th_C) ← f(Conf_init(th_C), SA, R)
need good probability outputs
low cross-entropy between model predictions and reality
logistic regression
sample efficient
stepwise approach → feature selection
logistic model tree for each action
root splits on response-type
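
As a simplified stand-in for this setup, the sketch below fits one logistic regression per response type, i.e. a fixed one-level split at the root rather than a learned logistic model tree, and with no stepwise feature selection. The training loop, function names and toy data are all assumptions for illustration; the single feature is the initial confidence score.

```python
import math
from collections import defaultdict


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on mean cross-entropy; returns (weights, bias)."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def fit_by_response_type(examples):
    """examples: list of (response_type, features, label); one model per type."""
    groups = defaultdict(lambda: ([], []))
    for rtype, x, y in examples:
        groups[rtype][0].append(x)
        groups[rtype][1].append(y)
    return {rtype: fit_logistic(xs, ys) for rtype, (xs, ys) in groups.items()}


def updated_confidence(models, rtype, x):
    """Probability that the top hypothesis is correct, given the response."""
    w, b = models[rtype]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy data (made up): label 1 means the top hypothesis was correct.
toy = [("YES", [0.9], 1), ("YES", [0.6], 1), ("YES", [0.3], 1), ("YES", [0.2], 0),
       ("NO", [0.8], 0), ("NO", [0.5], 0), ("NO", [0.4], 0), ("NO", [0.9], 1)]
models = fit_by_response_type(toy)
```

Calling updated_confidence(models, "YES", [0.9]) returns a probability rather than a hard decision, which matches the stated need for good probability outputs.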

29
features. target.
Initial: initial confidence score of top hypothesis, # of initial hypotheses, concept type (bool / non-bool), concept identity
System action: indicators describing other system actions in conjunction with current confirmation
User response, acoustic / prosodic: acoustic and language scores, duration, pitch (min, max, mean, range, std.dev, min and max slope, plus normalized versions), voiced-to-unvoiced ratio, speech rate, initial pause
User response, lexical: number of words, lexical terms highly correlated with corrections (MI)
User response, grammatical: number of slots (new, repeated), parse fragmentation, parse gaps
User response, dialog: dialog state, turn number, expectation match, new value for concept, timeout, barge-in

target: was the top hypothesis correct?

30
baselines

initial baseline
accuracy of system beliefs before the update
heuristic baseline
accuracy of heuristic update rule used by the
system
oracle baseline
accuracy if we knew exactly what the user said
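
The results slides compare these baselines on a hard error and a soft error, neither of which is defined in the transcript. The sketch below assumes the hard error is the fraction of cases where the confidence score, thresholded at 0.5, disagrees with the truth, and the soft error is the mean cross-entropy between predicted confidence and truth (in line with the cross-entropy goal on the machine learning slide). Both definitions and all names are assumptions.

```python
import math


def hard_error(probs, labels, threshold=0.5):
    """Fraction of cases where the thresholded prediction disagrees with the label."""
    wrong = sum((p >= threshold) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(labels)


def soft_error(probs, labels, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Made-up confidence scores and truth values, for illustration only.
confidences = [0.9, 0.2, 0.6, 0.4]
top_hyp_correct = [1, 0, 0, 1]
print(hard_error(confidences, top_hyp_correct))
print(soft_error(confidences, top_hyp_correct))
```

Any belief source can be scored this way: the initial confidence scores, the output of a heuristic rule, or a learned model.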

31
results: explicit confirmation
[bar chart: hard error (%) and soft error for initial, heuristic, logistic model tree, oracle. Value labels: hard error 31.15, 8.41, 3.57, 2.71; soft error 0.51, 0.19, 0.12]
32
results: implicit confirmation
[bar chart: hard error (%) and soft error for initial, heuristic, logistic model tree, oracle. Value labels: hard error 30.40, 23.37, 16.15, 15.33; soft error 0.67, 0.61, 0.43]
33
results: unplanned implicit confirmation
[bar chart: hard error (%) and soft error for initial, heuristic, logistic model tree, oracle. Value labels: hard error 15.40, 14.36, 12.64, 10.37; soft error 0.46, 0.43, 0.34]
34
informative features

initial confidence score
prosody features
barge-in
expectation match
repeated grammar slots
concept identity

35
summary

data-driven approach for constructing accurate
system beliefs
integrate information across multiple turns
bridge together detection of misunderstandings
and corrections
performs better than current heuristics
user response analysis
users don't correct unless the error is critical

36
outline

related work
a restricted version
data
user response analysis
experiments and results
current and future work

37
current extensions
belief representation

top hypothesis + other
logistic regression model

system action

confirmation actions

38
2 hypotheses + other
[bar charts comparing initial, heuristic, lmt(basic), lmt(basic+concept), oracle; panels: explicit confirmation, implicit confirmation, unplanned impl. conf., request, unexpected update]
39
other work
misunderstandings
non-understandings