constructing accurate beliefs in task-oriented spoken dialog systems

Dan Bohus, Computer Science Department, Carnegie Mellon University, www.cs.cmu.edu/~dbohus

1
constructing accurate beliefs in task-oriented
spoken dialog systems
  • Dan Bohus
  • www.cs.cmu.edu/~dbohus, dbohus_at_cs.cmu.edu
  • Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213

2
problem
  • spoken language interfaces lack robustness when
    faced with understanding errors
  • errors stem mostly from speech recognition
  • typical word error rates: 20-30%
  • significant negative impact on interactions

3
more concretely
  • S: What city are you leaving from?
  • U: Urbana Champaign [OKAY IN THAT SAME PAY]
  • S: Sorry, I'm not sure I understood what you said.
    What city are you leaving from?
  • U: Urbana Champaign [FOR MINUTE SINCE HEY]
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from?
  • U: Chicago [CHICAGO]
  • S: traveling from Chicago. Where would you like
    to go?
  • U: Huntsville [SEOUL]
  • S: traveling to Seoul. What day did you need to
    travel?
  • U: no no I'm traveling to Birmingham [THE
    TRAVELING TO BERLIN P_M]
  • S: traveling in the afternoon. Okay, what day
    would you be departing Chicago?
  • U: /uh/ the tenth of august [AT THE TENTH OF
    AUGUST]
  • S: a flight tomorrow ... I have a Korean Airlines
    flight departing Chicago at 1:40pm, arrives Seoul
    at 5pm the next day. Do you want to take that?
  • U: my destination is Birmingham [FLIGHT
    DESTINATION MR WEEKEND]
  • S: traveling on Saturday, August 12th ... I have a
    flight departing Chicago at 1:40pm arrives Seoul
    at

4
two types of understanding errors
  • same dialog as on the previous slide
  • non-understandings: the system obtains no usable
    interpretation of the user's turn (the two Urbana
    Champaign turns)
  • misunderstandings: the system acts on an incorrect
    interpretation (e.g. Huntsville recognized as SEOUL)

5
approaches for increasing robustness
  • improve recognition
  • gracefully handle errors through interaction
  1. detect the problems
  2. develop a set of recovery strategies
  3. know how to choose between them (policy)

6
six not-so-easy pieces
7
today's talk
misunderstandings
  • construct more accurate beliefs by integrating
    information over multiple turns in a conversation

detection
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]
destination: seoul/0.65
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
destination: ?
8
belief updating: problem statement
destination: seoul/0.65
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination: ?
  • given
    • an initial belief $P_{initial}(C)$ over concept C
    • a system action SA
    • a user response R
  • construct an updated belief
    • $P_{updated}(C) \leftarrow f(P_{initial}(C), SA, R)$

9
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • current and future work

10
current solutions
  • most systems only track values, not beliefs
    • new values overwrite old values
  • use confidence scores
  • explicit confirm
    • yes → trust hypothesis
    • no → delete hypothesis
    • other → non-understanding
  • implicit confirm: not much
    • "users who discover errors through incorrect
      implicit confirmations have a harder time getting
      back on track" [Shin et al, 2002]
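
The heuristic rule on this slide can be written down in a few lines. This is a minimal sketch, not the system's actual code: the `Belief` type, the action and response labels, and the choice to raise confidence to 1.0 after a "yes" are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# (top hypothesis, confidence), or None when the concept has no value.
# This representation is an assumption for the sketch.
Belief = Optional[Tuple[str, float]]


def heuristic_update(belief: Belief, action: str, response: str) -> Belief:
    """Hand-written update rule; the string labels are illustrative only."""
    if belief is None:
        return None
    value, _confidence = belief
    if action == "explicit_confirm":
        if response == "yes":
            return (value, 1.0)   # trust the hypothesis (1.0 is an assumed value)
        if response == "no":
            return None           # delete the hypothesis
        return belief             # "other": handled as a non-understanding
    # implicit confirmation: "not much" is done, so the belief is left unchanged
    return belief


print(heuristic_update(("seoul", 0.65), "explicit_confirm", "no"))
```

Note that the rule ignores the initial confidence and everything about the response except its coarse type, which is the gap a learned update is meant to close.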

related work restricted version data user
response analysis results current and future
work
11
confidence / detecting misunderstandings
  • traditionally focused on word-level errors
    [Chase, Cox, Bansal, Ravishankar, and many
    others]
  • recently: detecting misunderstandings [Walker,
    Wright, Litman, Bosch, Swerts, San-Segundo, Pao,
    Gurevych, Bohus, and many others]
  • machine learning approach: binary classification
    • in-domain, labeled dataset
    • features from different knowledge sources
      • acoustic, language model, parsing, dialog
        management
    • 50% relative reduction in classification error

12
detecting corrections
  • detect if the user is trying to correct the
    system [Litman, Swerts, Hirschberg, Krahmer,
    Levow]
  • machine learning approach: binary classification
    • in-domain, labeled dataset
    • features from different knowledge sources
      • acoustic, prosody, language model, parsing,
        dialog management
    • 50% relative reduction in classification error

13
integration
  • confidence annotation and correction detection
    are useful tools
  • but separately, neither solves the problem
  • bridge together in a unified approach to
    accurately track beliefs

14
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • current and future work

15
belief updating: general form
  • given
    • an initial belief $P_{initial}(C)$ over concept C
    • a system action SA
    • a user response R
  • construct an updated belief
    • $P_{updated}(C) \leftarrow f(P_{initial}(C), SA, R)$

16
two simplifications
  • 1. belief representation
    • system unlikely to hear more than 3 or 4 values
      for a concept within a dialog session
    • in our data, considering only the top hypothesis
      from recognition
      • max = 3 conflicting values heard
      • only in 6.9% of cases was more than 1 value heard
    • compressed beliefs: top-K concept hypotheses +
      other
    • for now, K = 1
  • 2. updates following system confirmation actions
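
The compressed belief representation (top-K hypotheses plus "other") can be illustrated with a small helper. This is a sketch under assumptions: the function name, the `<other>` label, and the example distribution are hypothetical, and only the seoul/0.65 score comes from the slides.

```python
def compress_belief(belief, k=1, other_label="<other>"):
    """Keep the k most probable values and pool the remaining mass as 'other'."""
    ranked = sorted(belief.items(), key=lambda item: item[1], reverse=True)
    top = dict(ranked[:k])
    # Everything not in the top k, heard or unheard, is lumped together.
    top[other_label] = max(0.0, 1.0 - sum(top.values()))
    return top


# Hypothetical full belief over the destination concept.
full = {"seoul": 0.65, "berlin": 0.20, "birmingham": 0.15}
compressed = compress_belief(full, k=1)
print(compressed)
```

With K = 1 the belief reduces to a single confidence score for the top hypothesis, which is what the reduced problem statement works with.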

17
belief updating: reduced version
  • given
    • an initial confidence score $Conf_{init}(th_C)$ for
      the current top hypothesis $th_C$ of concept C
    • a system confirmation action SA
    • a user response R
  • construct an updated confidence score for that
    hypothesis
    • $Conf_{upd}(th_C) \leftarrow f(Conf_{init}(th_C), SA, R)$

18
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • current and future work

19
data
  • collected with RoomLine
    • a phone-based mixed-initiative spoken dialog
      system
    • conference room reservation
  • explicit and implicit confirmations
  • confidence threshold model (+ some exploration)
  • unplanned implicit confirmations
    • "I found 10 rooms for Friday between 1 and 3 p.m.
      Would you like a small room or a large one?"

20
corpus
  • user study
  • 46 participants (naïve users)
  • 10 scenario-based interactions each
  • compensated per task success
  • corpus
  • 449 sessions, 8848 user turns
  • orthographically transcribed
  • manually annotated
  • misunderstandings
  • corrections
  • correct concept values

21
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • current and future work

22
user response types
  • following [Krahmer and Swerts, 2000]
    • study on Dutch train-table information system
  • 3 user response types
    • YES: yes, right, that's right, correct, etc.
    • NO: no, wrong, etc.
    • OTHER
  • cross-tabulated against correctness of system
    confirmations

23
user responses to explicit confirmations

            YES          NO           Other
CORRECT     94% [93%]    0% [0%]      5% [7%]
INCORRECT   1% [6%]      72% [57%]    27% [37%]

  • numbers in brackets from Krahmer & Swerts

24
other responses to explicit confirmations
  • 70%: users repeat the correct value
  • 15%: users don't address the question
    • attempt to shift conversation focus
  • how often do users correct the system?

            User does not correct    User corrects
CORRECT     1159                     0
INCORRECT   29 (10% of incorrect)    250 (90% of incorrect)
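
The rounded percentages for incorrect explicit confirmations can be re-derived from the raw counts in the table. The helper name is hypothetical; the counts are the ones on the slide.

```python
def correction_rates(not_corrected, corrected):
    """Return (share not corrected, share corrected) among incorrect confirmations."""
    total = not_corrected + corrected
    return not_corrected / total, corrected / total


ignored, fixed = correction_rates(29, 250)
print(f"{ignored:.1%} not corrected, {fixed:.1%} corrected")
# prints "10.4% not corrected, 89.6% corrected"
```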
25
user responses to implicit confirmations

            YES         NO           Other
CORRECT     30% [0%]    7% [0%]      63% [100%]
INCORRECT   6% [0%]     33% [15%]    61% [85%]

  • numbers in brackets from Krahmer & Swerts

26
ignoring errors in implicit confirmations
  • how often do users correct the system?

            User does not correct     User corrects
CORRECT     552                       2
INCORRECT   118 (51% of incorrect)    111 (49% of incorrect)

  • explanation
    • users correct later (40% of 118)
    • users interact strategically / correct only if
      essential

                do not correct later    correct later
not critical    55                      2
critical        14                      47
27
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • current and future work

28
machine learning approach
  • problem: $Conf_{upd}(th_C) \leftarrow f(Conf_{init}(th_C), SA, R)$
  • need good probability outputs
    • low cross-entropy between model predictions and
      reality
  • logistic regression
    • sample efficient
    • stepwise approach → feature selection
  • logistic model tree for each action
    • root splits on response-type
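
To make the logistic regression step concrete, here is a minimal pure-Python sketch that fits a model by maximizing average log-likelihood. It is not the model from the talk: the three features (initial confidence, a "yes" indicator, a "no" indicator), the toy data, the learning rate, and the epoch count are all invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """P(top hypothesis is correct) for feature vector x; w[0] is the bias."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient ascent on the average log-likelihood."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Toy data (made up): [initial confidence, said_yes, said_no] -> hypothesis correct?
X = [[0.4, 1, 0], [0.6, 1, 0], [0.8, 1, 0], [0.6, 1, 0],
     [0.4, 0, 1], [0.6, 0, 1], [0.8, 0, 1], [0.6, 0, 1],
     [0.8, 0, 0], [0.3, 0, 0]]
y = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

w = fit_logistic(X, y)
p_yes = predict(w, [0.6, 1, 0])   # updated confidence after a "yes"
p_no = predict(w, [0.6, 0, 1])    # updated confidence after a "no"
print(p_yes, p_no)
```

The output of `predict` is the updated confidence score, so the same initial confidence moves up or down depending on what the user said.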

29
features and target
  • initial: initial confidence score of top hypothesis, number of initial hypotheses, concept type (bool / non-bool), concept identity
  • system action: indicators describing other system actions in conjunction with current confirmation
  • user response
    • acoustic / prosodic: acoustic and language scores, duration, pitch (min, max, mean, range, std. dev., min and max slope, plus normalized versions), voiced-to-unvoiced ratio, speech rate, initial pause
    • lexical: number of words, lexical terms highly correlated with corrections (MI)
    • grammatical: number of slots (new, repeated), parse fragmentation, parse gaps
    • dialog: dialog state, turn number, expectation match, new value for concept, timeout, barge-in
  • target: was the top hypothesis correct?

30
baselines
  • initial baseline
  • accuracy of system beliefs before the update
  • heuristic baseline
  • accuracy of heuristic update rule used by the
    system
  • oracle baseline
  • accuracy if we knew exactly what the user said
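
The results slides compare these baselines on a "hard error" and a "soft error". The sketch below shows one plausible reading, stated here as an assumption rather than the talk's definition: hard error as the classification error rate after thresholding the confidence at 0.5, and soft error as the average negative log-likelihood (cross-entropy) of the predicted confidences.

```python
import math


def hard_error(probs, labels, threshold=0.5):
    """Fraction of cases where the thresholded confidence disagrees with the label."""
    wrong = sum((p >= threshold) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(labels)


def soft_error(probs, labels, eps=1e-12):
    """Average negative log-likelihood of the labels under the predicted confidences."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # guard against log(0)
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(labels)


# Hypothetical confidences and ground truth (1 = top hypothesis was correct).
probs = [0.9, 0.2, 0.6]
labels = [1, 0, 0]
print(hard_error(probs, labels), soft_error(probs, labels))
```

Under this reading, the soft error rewards well-calibrated confidences even when the thresholded decision is the same.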

31
results: explicit confirmation
[bar chart: hard error (%) and soft error for initial, heuristic, logistic model tree, and oracle]
  • hard error (%) data labels, top to bottom: 31.15, 8.41, 3.57, 2.71
  • soft error data labels, top to bottom: 0.51, 0.19, 0.12
32
results: implicit confirmation
[bar chart: hard error (%) and soft error for initial, heuristic, logistic model tree, and oracle]
  • hard error (%) data labels, top to bottom: 30.40, 23.37, 16.15, 15.33
  • soft error data labels, top to bottom: 0.67, 0.61, 0.43
33
results: unplanned implicit confirmation
[bar chart: hard error (%) and soft error for initial, heuristic, logistic model tree, and oracle]
  • hard error (%) data labels, top to bottom: 15.40, 14.36, 12.64, 10.37
  • soft error data labels, top to bottom: 0.46, 0.43, 0.34
34
informative features
  • initial confidence score
  • prosody features
  • barge-in
  • expectation match
  • repeated grammar slots
  • concept identity

35
summary
  • data-driven approach for constructing accurate
    system beliefs
  • integrate information across multiple turns
  • bridge together detection of misunderstandings
    and corrections
  • performs better than current heuristics
  • user response analysis
  • users don't correct unless the error is critical

36
outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • current and future work

37
current extensions
belief representation
  • top hypothesis + other
  • logistic regression model

system action
  • confirmation actions

38
2 hypotheses + other
[bar charts, one panel each for implicit confirmation, unplanned impl. conf., explicit confirmation, unexpected update, and request; series: initial, heuristic, lmt(basic), lmt(basic+concept), oracle]
39
other work
misunderstandings
non-understandings

detection
  • belief updating [ASRU-05]
  • costs for errors
  • rejection threshold adaptation
  • non-understanding impact on performance [Interspeech-05]
  • transferring confidence annotators across domains
    [in progress]

strategies
  • comparative analysis of 10 recovery strategies
    [SIGdial-05]

policy
  • impact of policy on performance
  • towards learning non-understanding recovery
    policies [SIGdial-05]

  • RavenClaw: dialog management for task-oriented
    systems - RoomLine, Let's Go Public!, Vera,
    LARRI, TeamTalk, Sublime [EuroSpeech-03, HLT-05]

40
thank you! questions?
41
a more subtle caveat
  • distribution of training data
    • confidence annotator + heuristic update rules
  • distribution of run-time data
    • confidence annotator + learned model
  • always a problem when interacting with the world!
  • hopefully, distribution shift will not cause
    large degradation in performance
  • remains to validate empirically
  • maybe a bootstrap approach?

42
KL-divergence and cross-entropy
  • KL divergence: $D(p \| q)$
  • Cross-entropy: $CH(p, q) = H(p) + D(p \| q)$
  • Negative log likelihood
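
For reference, the standard definitions behind the three quantities named on this slide, written in the slide's own notation ($CH$ for cross-entropy). These are textbook identities, not formulas recovered from the slide.

```latex
\begin{align*}
D(p \,\|\, q) &= \sum_x p(x) \log \frac{p(x)}{q(x)} \\
H(p) &= -\sum_x p(x) \log p(x) \\
CH(p, q) &= -\sum_x p(x) \log q(x) = H(p) + D(p \,\|\, q) \\
\mathrm{NLL}(q) &= -\frac{1}{N} \sum_{i=1}^{N} \log q(y_i \mid x_i)
\end{align*}
```

When $p$ is the empirical distribution of the $N$ labeled examples, the average negative log-likelihood is exactly the cross-entropy between the data and the model $q$, so minimizing one minimizes the other.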

43
logistic regression
  • regression model for binomial (binary) dependent
    variables
  • fit a model using max likelihood (avg
    log-likelihood)
  • any stats package will do it for you
  • no $R^2$ measure
  • test fit using likelihood ratio test
  • stepwise logistic regression
  • keep adding variables while data likelihood
    increases significantly
  • use Bayesian information criterion to avoid
    overfitting
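
The stepwise procedure can be sketched as a greedy forward search. This is a simplification under stated assumptions: it uses the BIC alone as the stopping rule (the slide adds variables while the likelihood increases significantly), and `toy_loglik` with its `GAINS` table is a made-up stand-in for actually refitting a logistic model on each feature subset.

```python
import math


def bic(log_likelihood, num_params, n_samples):
    """Bayesian information criterion; lower is better."""
    return num_params * math.log(n_samples) - 2.0 * log_likelihood


def forward_stepwise(candidates, loglik_fn, n_samples):
    """Greedily add the feature that lowers BIC most; stop when none does."""
    selected = []
    best = bic(loglik_fn(selected), 1, n_samples)   # intercept-only model
    remaining = list(candidates)
    while remaining:
        # A model with the current features plus one more has len(selected) + 2
        # parameters (the weights plus the intercept).
        scored = [(bic(loglik_fn(selected + [f]), len(selected) + 2, n_samples), f)
                  for f in remaining]
        score, feature = min(scored)
        if score >= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected


# Hypothetical log-likelihood gains per feature, for illustration only.
GAINS = {"initial_confidence": 30.0, "barge_in": 10.0, "speech_rate": 1.0}


def toy_loglik(features):
    return -100.0 + sum(GAINS[f] for f in features)


chosen = forward_stepwise(list(GAINS), toy_loglik, n_samples=200)
print(chosen)
```

With 200 samples each extra parameter costs about 5.3 BIC points, so a feature must raise the log-likelihood by more than about 2.65 to be kept; the weakest toy feature is therefore left out.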

44
logistic regression
45
logistic model tree
  • regression tree, but with logistic models on
    leaves

[tree diagram: one node splits on feature f (f=0 / f=1), another node splits on feature g (g>10 / g<10)]
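
A logistic model tree routes each case down the tree and then applies the logistic model stored at the leaf it reaches. The sketch below assumes a single root split on the user's response type, as the machine learning slide describes; the leaf coefficients in `LEAVES` are invented for illustration, not learned values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical leaf models: response type -> (bias, weight on initial confidence).
LEAVES = {
    "yes": (2.0, 1.5),
    "no": (-3.0, 1.0),
    "other": (-1.0, 2.5),
}


def lmt_update(conf_init, response_type):
    """Root splits on response type; the matching leaf's logistic model does the update."""
    bias, weight = LEAVES.get(response_type, LEAVES["other"])
    return sigmoid(bias + weight * conf_init)


for response in ("yes", "no", "other"):
    print(response, round(lmt_update(0.65, response), 3))
```

Each leaf can weight the initial confidence differently, which a single global logistic model cannot do without interaction terms.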
46
user study
  • 46 participants, 1st time users
  • 10 scenarios, fixed order
  • presented graphically (explained during briefing)
  • participants compensated per task success