"k hypotheses + other" belief updating in spoken dialog systems

1
"k hypotheses + other" belief updating in spoken dialog systems
  • Dialogs on Dialogs Talk, March 2006
  • Dan Bohus
  • Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
  • www.cs.cmu.edu/dbohus
  • dbohus_at_cs.cmu.edu

2
problem
  • spoken language interfaces lack robustness when
    faced with understanding errors
  • errors stem mostly from speech recognition
  • typical word error rates: 20-30%
  • significant negative impact on interactions

3
guarding against understanding errors
  • use confidence scores
    • machine learning approaches for detecting misunderstandings (Walker, Litman, San-Segundo, Wright, and others)
  • engage in confirmation actions
    • explicit confirmation: "did you say you wanted to fly to Seoul?"
      • yes → trust hypothesis
      • no → delete hypothesis
      • other → non-understanding
    • implicit confirmation: "traveling to Seoul ... what day did you need to travel?"
      • rely on new values overwriting old values

4
today's talk
  • construct accurate beliefs by integrating
    information over multiple turns in a conversation

S: Where would you like to go?
U: Huntsville
   [SEOUL / 0.65]
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham
   [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
5
belief updating problem statement
  • given
    • an initial belief $B_{\text{initial}}(C)$ over concept $C$
    • a system action $SA$
    • a user response $R$
  • construct an updated belief
    • $B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA, R)$

destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
6
outline
  • proposed approach
  • data
  • experiments and results
  • effect on dialog performance
  • conclusion

proposed approach data experiments and results
effect on dialog performance conclusion
7
belief updating problem statement
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
  • given
    • an initial belief $B_{\text{initial}}(C)$ over concept $C$
    • a system action $SA(C)$
    • a user response $R$
  • construct an updated belief
    • $B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$

8
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$
belief representation
  • most accurate representation
    • probability distribution over the set of possible values
  • however
    • system will hear only a small number of conflicting values for a concept within a dialog session
  • in our data
    • max 3 (conflicting values heard)
    • only in 6.9% of cases was more than 1 value heard

9
belief representation
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$
  • compressed belief representation
    • k hypotheses + other
  • at each turn, the system retains the top m initial hypotheses and adds n new hypotheses from the input ($m + n = k$)
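
The slot selection described on this slide can be sketched as follows. This is a minimal illustration, not the system's implementation: the function name `compress_belief`, the dictionary format (value mapped to confidence), and the example values are assumptions made here. It only chooses which hypotheses occupy the k slots; the probabilities are assigned by the update model.

```python
def compress_belief(initial, new_hyps, m=1, n=1):
    """Return the k = m + n hypothesis slots plus the catch-all 'other'.

    initial:  dict mapping value -> confidence for hypotheses already held
    new_hyps: dict mapping value -> confidence for values heard this turn
    (both formats are assumptions for this sketch)
    """
    # Keep the m most confident hypotheses from the initial belief.
    kept = sorted(initial, key=initial.get, reverse=True)[:m]
    # Add up to n of the most confident new values not already kept.
    ranked_new = sorted(new_hyps, key=new_hyps.get, reverse=True)
    added = [v for v in ranked_new if v not in kept][:n]
    # Everything else is lumped into a single 'other' outcome.
    return kept + added + ["other"]


slots = compress_belief({"seoul": 0.65, "boston": 0.10},
                        {"berlin": 0.60, "seoul": 0.20}, m=1, n=1)
print(slots)
```

With m = 1 and n = 1 this corresponds to the k = 2 setting.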

10
belief representation
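$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$
  • $B(C)$ modeled as a multinomial variable
    • $\{h_1, h_2, \ldots, h_k, \text{other}\}$
    • $B(C) = \langle c_{h_1}, c_{h_2}, \ldots, c_{h_k}, c_{\text{other}} \rangle$
    • where $c_{h_1} + c_{h_2} + \ldots + c_{h_k} + c_{\text{other}} = 1$
  • belief updating can be cast as a multinomial regression problem
    • $B_{\text{updated}}(C) \leftarrow B_{\text{initial}}(C) + SA(C) + R$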
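
To make the multinomial view concrete, here is a rough sketch of how a fitted multinomial (softmax) regression would turn features into an updated belief whose confidences sum to 1. The feature vector, the weights, and the names `softmax` and `update_belief` are invented for illustration; they are not fitted values or code from the talk.

```python
import math


def softmax(scores):
    """Map arbitrary real scores to probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_belief(features, weights, outcomes):
    """Score each outcome linearly, then normalize with softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return dict(zip(outcomes, softmax(scores)))


# Hypothetical features: [bias, initial confidence, confidence of new value]
features = [1.0, 0.65, 0.60]
# Hypothetical weights, one row per outcome (made up, not estimated from data)
weights = [
    [0.0, 2.0, -1.0],   # h1: the initial hypothesis
    [0.0, -1.0, 2.0],   # h2: the newly heard hypothesis
    [-1.0, 0.0, 0.0],   # other
]
belief = update_belief(features, weights, ["h1", "h2", "other"])
print(belief)
```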

11
system action
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$
12
user response
$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$
13
approach
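$B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$
  • problem
    • $\langle uc_{h_1}, \ldots, uc_{h_k}, uc_{\text{oth}} \rangle \leftarrow f(\langle ic_{h_1}, \ldots, ic_{h_k}, ic_{\text{oth}} \rangle, SA(C), R)$
  • approach: multinomial generalized linear model
    • regression model, multinomial dependent variable
    • sample efficient
  • stepwise approach
    • feature selection
    • BIC to control over-fitting
  • one model for each system action
    • $\langle uc_{h_1}, \ldots, uc_{h_k}, uc_{\text{oth}} \rangle \leftarrow f_{SA(C)}(\langle ic_{h_1}, \ldots, ic_{h_k}, ic_{\text{oth}} \rangle, R)$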
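
The stepwise, BIC-controlled feature selection mentioned above could look roughly like the greedy forward search below. This is a generic sketch, not the procedure actually used: `fit_loglik` stands in for fitting a model on a feature subset and returning its log-likelihood, the toy gains are invented, and the parameter count is simplified to one per feature plus an intercept (a real multinomial model has one coefficient per feature per non-reference outcome). One such search would be run per system action.

```python
import math


def bic(log_likelihood, num_params, num_samples):
    """Bayesian Information Criterion: lower is better."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


def forward_stepwise(candidates, fit_loglik, num_samples):
    """Greedily add the feature that lowers BIC most; stop when none does."""
    selected = []
    best = bic(fit_loglik(selected), 1, num_samples)  # intercept-only model
    while True:
        trials = []
        for feat in candidates:
            if feat not in selected:
                subset = selected + [feat]
                score = bic(fit_loglik(subset), len(subset) + 1, num_samples)
                trials.append((score, feat))
        if not trials:
            break
        score, feat = min(trials)
        if score >= best:  # no candidate improves BIC: stop
            break
        best = score
        selected.append(feat)
    return selected


# Toy stand-in for model fitting: each feature adds a fixed log-likelihood gain.
toy_gains = {"initial_confidence": 80.0, "barge_in": 10.0, "noise": 0.5}


def toy_loglik(subset):
    return -1000.0 + sum(toy_gains[f] for f in subset)


chosen = forward_stepwise(list(toy_gains), toy_loglik, num_samples=1000)
print(chosen)
```

In the toy run the uninformative "noise" feature is rejected because its small likelihood gain does not pay for the extra parameter.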

14
outline
  • proposed approach
  • data
  • experiments and results
  • effect on dialog performance
  • conclusion

15
data
  • collected with RoomLine
  • a phone-based mixed-initiative spoken dialog
    system
  • conference room reservation
  • explicit and implicit confirmations
  • simple heuristic rules for belief updating
    • explicit confirm: yes / no
    • implicit confirm: new values overwrite old ones
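
The heuristic rules above, which serve as the comparison point for the learned models, can be written down in a few lines. This is an interpretation for illustration only: the function name, the tuple format `(value, confidence)`, and the choice to represent "trust" as confidence 1.0 are assumptions, not details given in the slides.

```python
def heuristic_update(belief, action, response, new_value=None):
    """Apply the simple confirmation heuristics to one concept.

    belief:    (value, confidence) tuple, or None if nothing is held
    action:    "explicit_confirm" or "implicit_confirm"
    response:  "yes", "no", or "other" (used for explicit confirms)
    new_value: (value, confidence) heard in the user's response, if any
    """
    if action == "explicit_confirm":
        if response == "yes":
            return (belief[0], 1.0)   # trust the hypothesis (assumed as 1.0)
        if response == "no":
            return None               # delete the hypothesis
        return belief                 # anything else: leave unchanged
    if action == "implicit_confirm":
        # A newly heard value simply overwrites the old one.
        return new_value if new_value is not None else belief
    raise ValueError(f"unknown system action: {action}")


print(heuristic_update(("seoul", 0.65), "implicit_confirm", "other",
                       new_value=("berlin", 0.60)))
```

Applied to the Seoul example from earlier in the talk, the implicit-confirm rule replaces the destination with the misrecognized "berlin", which is exactly the failure the learned update is meant to avoid.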

16
corpus
  • user study
  • 46 participants (naïve users)
  • 10 scenario-based interactions each
  • compensated per task success
  • corpus
  • 449 sessions, 8848 user turns
  • orthographically transcribed
  • manually annotated
  • misunderstandings
  • corrections
  • correct concept values

17
outline
  • proposed approach
  • data
  • experiments and results
  • effect on dialog performance
  • conclusion

18
baselines
  • initial baseline
  • accuracy of system beliefs before the update
  • heuristic baseline
  • accuracy of heuristic update rule used by the
    system
  • oracle baseline
  • accuracy if we knew exactly when the user corrects

19
k=2 hypotheses + other
Informative features
  • priors and confusability
  • initial confidence score
  • concept identity
  • barge-in
  • expectation match
  • repeated grammar slots

20
outline
  • proposed approach
  • data
  • experiments and results
  • effect on dialog performance
  • conclusion

21
a question remains
  • does this really matter?

what is the effect on global dialog performance?
22
let's run an experiment
  • guinea pigs from Speech Lab for exp: $0
  • getting change from guys in the lab: $2/$3/$5
  • real subjects for the experiment: $25
  • picture with advisor of the VERY last exp at CMU: priceless!!!!
(courtesy of Mohit Kumar)
23
a new user study
  • implemented models in RavenClaw, performed a new
    user study
  • 40 participants, first-time users
  • 10 scenario-driven interactions each
  • non-native speakers of North-American English
    • improvements more likely at higher WER
    • supported by empirical evidence
  • between-subjects: 2 gender-balanced groups
    • control: RoomLine using heuristic update rules
    • treatment: RoomLine using runtime models

24
effect on task success
[Figure: task success by condition]
  • control: 73.6%
  • treatment: 81.3%
25
effect on task success: a closer look
[Figure: probability of task success as a function of word error rate]
$\text{Task Success} \leftarrow 2.09 - 0.05 \cdot \text{WER} + 0.69 \cdot \text{Condition}$
p0.001
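
As a worked illustration of the fitted model on this slide, the coefficients can be plugged into a logistic function to get a success probability at a given word error rate. Several things are assumed here rather than stated in the slides: that the model uses a logistic link (the plot shows probability of task success), that WER is expressed in percentage points, that Condition is coded 0 for control and 1 for treatment, and that the Condition term enters with a plus sign.

```python
import math


def p_task_success(wer, treatment):
    """Predicted probability of task success under the assumed logistic model.

    wer:       word error rate in percentage points (assumed unit)
    treatment: True for the learned belief-updating models, False for control
    """
    condition = 1 if treatment else 0  # assumed coding of Condition
    score = 2.09 - 0.05 * wer + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-score))


for wer in (10, 20, 30, 40):
    control = p_task_success(wer, treatment=False)
    treated = p_task_success(wer, treatment=True)
    print(f"WER {wer}%: control {control:.3f}, treatment {treated:.3f}")
```

Under these assumptions the Condition coefficient offsets the same change in score as 0.69 / 0.05 = 13.8 points of WER.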
26
improvements at different WER
[Figure: absolute improvement in task success as a function of word error rate]
27
effect on task duration (for successful tasks)
  • ANOVA on task duration for successful tasks
    • $\text{Duration} \leftarrow -0.21 + 0.013 \cdot \text{WER} - 0.106 \cdot \text{Condition}$
  • significant improvement, equivalent to a 7.9% absolute reduction in WER

28
outline
  • proposed approach
  • data
  • experiments and results
  • effect on dialog performance
  • conclusion

29
summary
  • data-driven approach for constructing accurate
    system beliefs
  • integrate information across multiple turns
  • bridge together detection of misunderstandings
    and corrections
  • significantly outperforms current heuristics
  • significantly improves effectiveness and
    efficiency

30
other advantages
  • sample efficient
  • performs a local one-turn optimization
  • good local performance leads to good global
    performance
  • scalable
  • works independently on concepts
  • 29 concepts, varying cardinalities
  • portable
  • decoupled from dialog task specification
  • doesn't make strong assumptions about dialog
    management technology

31
thank you! questions?
32
user study
  • 10 scenarios, fixed order
  • presented graphically (explained during briefing)
  • participants compensated per task success