misunderstandings, corrections and beliefs in spoken language interfaces

1
misunderstandings, corrections and beliefs in spoken language interfaces
  • Dan Bohus
  • Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
  • www.cs.cmu.edu/dbohus
  • dbohus_at_cs.cmu.edu

2
problem
  • spoken language interfaces lack robustness when
    faced with understanding errors
  • stems mostly from speech recognition
  • spans most domains and interaction types
  • exacerbated by operating conditions

3
more concretely
  • S: What city are you leaving from?
  • U: Urbana Champaign [OKAY IN THAT SAME PAY]
  • S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
  • U: Urbana Champaign [FOR MINUTE SINCE HEY]
  • S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
  • U: Chicago [CHICAGO]
  • S: traveling from Chicago. Where would you like to go?
  • U: Huntsville [SEOUL]
  • S: traveling to Seoul. What day did you need to travel?
  • U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
  • S: traveling in the afternoon. Okay, what day would you be departing Chicago?
  • U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
  • S: a flight tomorrow ... I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
  • U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
  • S: traveling on Saturday, August 12th ... I have a flight departing Chicago at 1:40pm arrives Seoul at

4
some statistics
  • semantic error rates: 25-35%

System               Site    Semantic error rate
SpeechActs           SRI     25%
CU Communicator      CU      27%
Jupiter              MIT     28%
CMU Communicator     CMU     32%
How May I Help You?  AT&T    36%

  • corrections [Krahmer, Swerts, Litman, Levow]
  • 30% of utterances correct system mistakes
  • 2-3 times more likely to be misrecognized

5
two types of understanding errors
6
misunderstandings
  • fix recognition
  • detect potential misunderstandings; do something about them

7
outline
  • detecting misunderstandings
  • detecting user corrections: late-detection of misunderstandings
  • belief updating: construct accurate beliefs by integrating information from multiple turns

8
detecting misunderstandings
  • recognition confidence scores

S: What city are you leaving from?
U: Birmingham [BERLIN PM] conf=0.63
  • traditionally [Bansal, Chase, Cox, Kemp, many others]
  • speech recognition confidence scores
  • use acoustic, language model and search info
  • frame, phoneme, word-level

9
semantic confidence scores
  • we're interested in semantics, not words
  • YES = YEAH, NO = NO WAY
  • use machine learning to build confidence annotators
  • in-domain, manually labeled data
  • utterance: [BERLIN PM] Birmingham
  • labels: correct / misunderstood
  • features from different knowledge sources
  • binary classification problem
  • probability of misunderstanding: regression problem

10
a typical result
  • Identifying User Corrections Automatically in a Spoken Dialog System [Walker, Wright, Langkilde]
  • HowMayIHelpYou corpus: call routing for phone services
  • 11787 turns
  • features
  • ASR: recog, numwords, duration, dtmf, rg-grammar, tempo
  • understanding: confidence, context-shift, top-task, diff-conf, ...
  • dialog history: sys-label, confirmation, num-reprompts, num-confirms, num-subdials, ...
  • binary classification task
  • majority baseline (error): 36.5%
  • RIPPER (error): 14%
11
outline
  • detecting misunderstandings
  • detecting user corrections
  • late-detection of misunderstandings
  • belief updating
  • construct accurate beliefs by integrating
    information from multiple turns

12
detect user corrections
  • is the user trying to correct the system?

S: Where would you like to go?
U: Huntsville [SEOUL] (misunderstanding)
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M] (user correction, misunderstanding)
  • same story: use machine learning
  • in-domain, manually labeled data
  • features from different knowledge sources
  • binary classification problem
  • probability of correction: regression problem

13
typical result
  • Identifying User Corrections Automatically in a Spoken Dialog System [Hirschberg, Litman, Swerts]
  • TOOT corpus: access to train information
  • 2328 turns, 152 dialogs
  • features
  • prosodic: f0max, f0mn, rmsmax, dur, ppau, tempo
  • ASR: gram, str, conf, ynstr, ...
  • dialog position: diadist
  • dialog history: preturn, prepreturn, pmeanf
  • binary classification task
  • majority baseline: 29%
  • RIPPER: 15.7%

14
outline
  • detecting misunderstandings
  • detecting user corrections: late-detection of misunderstandings
  • belief updating: construct accurate beliefs by integrating information from multiple turns

15
belief updating problem an easy case
S: on which day would you like to travel?
U: on September 3rd [AN DECEMBER THIRD] CONF=0.25
departure_date = {Dec-03/0.25}
S: did you say you wanted to leave on December 3rd?
U: no [NO] CONF=0.88
departure_date = {Ø}
16
belief updating problem a trickier case
S: Where would you like to go?
U: Huntsville [SEOUL] CONF=0.65
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M] CONF=0.60 COR=0.35
destination = {?}
17
belief updating problem formalized
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M] CONF=0.60 COR=0.35
destination = {?}
  • given
  • an initial belief $P_{\text{initial}}(C)$ over concept C
  • a system action SA
  • a user response R
  • construct an updated belief
  • $P_{\text{updated}}(C) \leftarrow f(P_{\text{initial}}(C), SA, R)$
18
outline
  • detecting misunderstandings
  • detecting user corrections
  • late-detection of misunderstandings
  • belief updating
  • construct accurate beliefs by integrating
    information from multiple turns
  • current solutions
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • discussion. caveats. future work

19
belief updating current solutions
  • most systems only track values, not beliefs
  • new values overwrite old values
  • explicit confirm + yes → trust hypothesis
  • explicit confirm + no → kill hypothesis
  • explicit confirm + other → non-understanding
  • implicit confirm: not much
  • "users who discover errors through incorrect implicit confirmations have a harder time getting back on track" [Shin et al., 2002]
20
outline
  • detecting misunderstandings
  • detecting user corrections
  • late-detection of misunderstandings
  • belief updating
  • construct accurate beliefs by integrating
    information from multiple turns
  • current solutions
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • discussion. caveats. future work

21
belief updating general form
  • given
  • an initial belief $P_{\text{initial}}(C)$ over concept C
  • a system action SA
  • a user response R
  • construct an updated belief
  • $P_{\text{updated}}(C) \leftarrow f(P_{\text{initial}}(C), SA, R)$

22
restricted version 2 simplifications
  • compact belief
  • system unlikely to hear more than 3 or 4 values
  • single vs. multiple recognition results
  • in our data: max 3 values, only 6.9% have >1 value
  • confidence score of top hypothesis
  • updates after confirmation actions
  • reduced problem
  • $\text{ConfTop}_{\text{updated}}(C) \leftarrow f(\text{ConfTop}_{\text{initial}}(C), SA, R)$
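
The compact belief can be illustrated as reducing a full distribution over values to its top hypothesis and that hypothesis's confidence. A minimal sketch, assuming a belief is stored as a dictionary from value to confidence (this representation and the function name are assumptions for illustration):

```python
def top_confidence(belief):
    """Return (value, confidence) of the highest-scoring hypothesis.

    `belief` maps candidate values to confidence scores; an empty belief
    yields (None, 0.0).
    """
    if not belief:
        return (None, 0.0)
    value = max(belief, key=belief.get)
    return (value, belief[value])


# Hypothetical belief over the destination concept.
print(top_confidence({"seoul": 0.65, "berlin": 0.20}))
```

Since only 6.9% of the concepts in the data had more than one value, little information is lost by keeping just this pair.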

23
outline
  • detecting misunderstandings
  • detecting user corrections
  • late-detection of misunderstandings
  • belief updating
  • construct accurate beliefs by integrating
    information from multiple turns
  • current solutions
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • discussion. caveats. future work

24
data
  • collected with RoomLine
  • a phone-based mixed-initiative spoken dialog
    system
  • conference room reservation
  • search and negotiation
  • explicit and implicit confirmations
  • confidence threshold model (+ some exploration)
  • implicit confirmation task
  • I found 10 rooms for Friday between 1 and 3 p.m. Would you like a small room or a large one?

25
user study
  • 46 participants, 1st time users
  • 10 scenarios, fixed order
  • presented graphically (explained during briefing)
  • compensated per task success

26
corpus statistics
  • 449 sessions, 8848 user turns
  • orthographically transcribed
  • manually annotated
  • misunderstandings (concept-level)
  • non-understandings
  • user corrections
  • correct concept values

27
outline
  • detecting misunderstandings
  • detecting user corrections
  • late-detection of misunderstandings
  • belief updating
  • construct accurate beliefs by integrating
    information from multiple turns
  • current solutions
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • discussion. caveats. future work

28
user response types
  • following Krahmer and Swerts
  • study on Dutch train-table information system
  • 3 user response types
  • YES: yes, right, that's right, correct, etc.
  • NO: no, wrong, etc.
  • OTHER
  • cross-tabulated against correctness of
    confirmations

29
user responses to explicit confirmations
  • from transcripts (numbers in brackets from Krahmer & Swerts)

             YES          NO           Other
CORRECT      94% [93%]    0% [0%]      5% [7%]
INCORRECT    1% [6%]      72% [57%]    27% [37%]

  • from decoded

             YES     NO      Other
CORRECT      87%     1%      12%
INCORRECT    1%      61%     38%
30
other responses to explicit confirmations
  • 70%: users repeat the correct value
  • 15%: users don't address the question
  • attempt to shift conversation focus

             User does not correct    User corrects
CORRECT      1159                     0
INCORRECT    29 [10% of incor]        250 [90% of incor]
31
user responses to implicit confirmations
  • transcripts (numbers in brackets from Krahmer & Swerts)

             YES         NO           Other
CORRECT      30% [0%]    7% [0%]      63% [100%]
INCORRECT    6% [0%]     33% [15%]    61% [85%]

  • decoded

             YES     NO      Other
CORRECT      28%     5%      67%
INCORRECT    7%      27%     66%
32
ignoring errors in implicit confirmations
             User does not correct    User corrects
CORRECT      552                      2
INCORRECT    118 [51% of incor]       111 [49% of incor]

  • users correct later (40% of 118)
  • users interact strategically
  • correct only if essential

                does not correct later    corrects later
not critical    55                        2
critical        14                        47
33
outline
  • detecting misunderstandings
  • detecting user corrections
  • late-detection of misunderstandings
  • belief updating
  • construct accurate beliefs by integrating
    information from multiple turns
  • current solutions
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • discussion. caveats. future work

34
machine learning approach
  • need good probability outputs
  • low cross-entropy between model predictions and
    reality
  • cross-entropy = negative average log posterior
  • logistic regression
  • sample efficient
  • stepwise approach → feature selection
  • logistic model tree for each action
  • root splits on response-type
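
To make the "good probability outputs" requirement concrete, here is a minimal sketch of logistic regression fitted by batch gradient descent on the average cross-entropy, with both a hard error (thresholded) and a soft error (cross-entropy) computed afterwards. The two features, the toy data, and all function names are hypothetical; this is plain gradient descent, not the stepwise procedure or the logistic model tree from the slide.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Probability that the concept value is correct; w[0] is the bias."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Minimize average cross-entropy with batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi          # derivative of the loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def soft_error(probs, y):
    """Cross-entropy: negative average log probability assigned to the true label."""
    eps = 1e-12
    return -sum(math.log(max(p if yi else 1.0 - p, eps)) for p, yi in zip(probs, y)) / len(y)


def hard_error(probs, y):
    """Fraction of cases where thresholding at 0.5 gives the wrong label."""
    return sum((p >= 0.5) != bool(yi) for p, yi in zip(probs, y)) / len(y)


# Hypothetical toy data: [initial confidence, user response decoded as NO (1/0)].
X = [[0.9, 0], [0.8, 0], [0.7, 0], [0.6, 1], [0.3, 1], [0.2, 1], [0.4, 0], [0.5, 1]]
y = [1, 1, 1, 0, 0, 0, 1, 0]   # target: was the value correct?

w = fit_logistic(X, y)
probs = [predict(w, x) for x in X]
print("hard error:", hard_error(probs, y))
print("soft error:", round(soft_error(probs, y), 3))
```

The soft error rewards well-calibrated probabilities, whereas the hard error only checks which side of 0.5 each prediction falls on.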

35
features. target.
  • initial situation
  • initial confidence score
  • concept identity, dialog state, turn number
  • system action
  • other actions performed in parallel
  • features of the user response
  • acoustic / prosodic features
  • lexical features
  • grammatical features
  • dialog-level features
  • target: was the value correct?

36
baselines
  • initial baseline
  • accuracy of system beliefs before the update
  • heuristic baseline
  • accuracy of heuristic rule currently used in the
    system
  • oracle baseline
  • accuracy if we knew exactly when the user is
    correcting the system
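
The three baselines can be compared on the same annotated cases. The sketch below uses made-up annotations and assumes the heuristic drops a value only when it decodes a "no", while the oracle drops a value exactly when the user is annotated as correcting; both rules are simplifications chosen for illustration.

```python
def hard_error(believed_correct, actually_correct):
    """Fraction of cases where the system's belief disagrees with the annotation."""
    pairs = list(zip(believed_correct, actually_correct))
    return sum(b != t for b, t in pairs) / len(pairs)


# Hypothetical annotations, one entry per confirmed concept value.
truth          = [True, True, False, False, True, False]    # was the value correct?
heard_no       = [False, False, True, False, False, False]  # system decoded a "no"
user_corrected = [False, False, True, True, False, False]   # annotated user correction

initial   = [True] * len(truth)              # beliefs before the update: keep everything
heuristic = [not no for no in heard_no]      # drop only on a decoded "no"
oracle    = [not c for c in user_corrected]  # drop exactly when the user corrects

errors = {
    "initial": hard_error(initial, truth),
    "heuristic": hard_error(heuristic, truth),
    "oracle": hard_error(oracle, truth),
}
print(errors)
```

In this toy example even the oracle makes an error, because the last value is wrong but the user never corrects it.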

37
results: explicit confirmation
[charts: hard error (%) and soft error]
38
results: implicit confirmation
[charts: hard error (%) and soft error]
39
results: unplanned implicit confirmation
[charts: hard error (%) and soft error]
40
informative features
  • initial confidence score
  • prosody features
  • barge-in
  • expectation match
  • repeated grammar slots
  • concept id
  • priors on concept values not included in these
    results

41
outline
  • detecting misunderstandings
  • detecting user corrections
  • late-detection of misunderstandings
  • belief updating
  • construct accurate beliefs by integrating
    information from multiple turns
  • current solutions
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • discussion. caveats. future work

42
discussion
  • evaluation
  • does it make sense?
  • what would be a better evaluation?
  • current limitation: belief compression
  • extending models to N hypotheses + other
  • current limitation: system actions
  • extending models to cover all system actions

43
thank you!
44
a more subtle caveat
  • distribution of training data
  • confidence annotator + heuristic update rules
  • distribution of run-time data
  • confidence annotator + learned model
  • always a problem when interacting with the world!
  • hopefully, distribution shift will not cause
    large degradation in performance
  • remains to validate empirically
  • maybe a bootstrap approach?

45
KL-divergence & cross-entropy
  • KL divergence: $D(p \| q)$
  • Cross-entropy: $CH(p, q) = H(p) + D(p \| q)$
  • Negative log likelihood
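
For reference, the standard definitions behind these three bullets can be written out as follows. Here $p$ is the true distribution, $q$ the model, and $\hat{p}$ the empirical distribution over $N$ labeled samples $(x_i, y_i)$; the indexing is an assumption made for this sketch.

```latex
\begin{align*}
H(p) &= -\sum_x p(x)\,\log p(x) \\
D(p \,\|\, q) &= \sum_x p(x)\,\log\frac{p(x)}{q(x)} \\
CH(p, q) &= -\sum_x p(x)\,\log q(x) = H(p) + D(p \,\|\, q) \\
CH(\hat{p}, q) &= -\frac{1}{N}\sum_{i=1}^{N} \log q(y_i \mid x_i)
\end{align*}
```

The last line is the negative average log-likelihood of the data under the model, so minimizing cross-entropy and maximizing likelihood are the same objective.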

46
logistic regression
  • regression model for binomial (binary) dependent
    variables
  • fit a model using max likelihood (avg
    log-likelihood)
  • any stats package will do it for you
  • no $R^2$ measure
  • test fit using likelihood ratio test
  • stepwise logistic regression
  • keep adding variables while data likelihood increases significantly
  • use Bayesian information criterion to avoid
    overfitting
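
As an illustration of the last bullet, the Bayesian information criterion trades off fit against the number of parameters; a candidate variable is worth adding only if it lowers the BIC. A minimal sketch with made-up log-likelihood values (the numbers and variable names are hypothetical):

```python
import math


def bic(log_likelihood, num_params, num_samples):
    """Bayesian information criterion: lower is better."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


# Hypothetical fits on 100 samples.
current = bic(log_likelihood=-100.0, num_params=2, num_samples=100)
small_gain = bic(log_likelihood=-99.5, num_params=3, num_samples=100)
large_gain = bic(log_likelihood=-90.0, num_params=3, num_samples=100)

print("add variable with small gain?", small_gain < current)
print("add variable with large gain?", large_gain < current)
```

A small likelihood gain does not pay for the extra parameter, whereas a large one does.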

47
logistic regression
48
logistic model tree
  • regression tree, but with logistic models on
    leaves

[figure: example tree with splits on f (f=0 / f=1) and on g (g>10 / g<10), with logistic models at the leaves]
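
A logistic model tree with a single split can be sketched as a lookup from the split value to a leaf-level logistic model. Following the earlier slide, the root here splits on the user response type; the leaf weights, feature names, and function names are invented for illustration and are not the learned models.

```python
import math


def logistic(weights, bias, features):
    """Logistic model over named features; missing features count as 0."""
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical leaf models: (feature weights, bias) per response type.
LEAVES = {
    "YES": ({"initial_conf": 2.0}, 2.0),
    "NO": ({"initial_conf": 1.0}, -4.0),
    "OTHER": ({"initial_conf": 3.0, "barge_in": -1.0}, -1.0),
}


def updated_confidence(response_type, features):
    """Root node: pick the leaf by response type, then apply its logistic model."""
    weights, bias = LEAVES[response_type]
    return logistic(weights, bias, features)


features = {"initial_conf": 0.65}
for response in ("YES", "OTHER", "NO"):
    print(response, round(updated_confidence(response, features), 3))
```

Each leaf sees only the cases routed to it, so the same feature can carry a different weight after a YES than after an OTHER response.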