Title: sorry, I didn't catch that! an investigation of non-understandings and recovery strategies
1. sorry, I didn't catch that! an investigation
of non-understandings and recovery strategies
- Dan Bohus www.cs.cmu.edu/dbohus
- Alexander I. Rudnicky www.cs.cmu.edu/air
- Computer Science Department
- Carnegie Mellon University
- Pittsburgh, PA, 15213
2. systems often do not understand correctly
- non-understandings and misunderstandings
3. systems often do not understand correctly
- detection: typically trivial, although diagnosis is not
- strategies: large space of strategies; tradeoffs between them not well understood
- policy (knowing how to engage the strategies): simple heuristics, e.g. incremental prompting
4. questions under investigation
- what are the main causes of non-understandings?
- how large is their impact on performance?
- how do various recovery strategies compare to each other?
- what are the relationships between strategies and user behaviors?
- can we improve global dialog performance by using a smarter policy?
- if yes, can we learn a better policy from data?
5. data collection
- Roomline
- phone-based, mixed-initiative system
- conference room reservations
- experimental design
- control group: uninformed recovery policy
- wizard group: recovery policy implemented by wizard
- 46 participants, first-time users
- tasks and experimental procedure
- up to 10 scenario-driven interactions
6. non-understanding recovery strategies
S: For when do you need the conference room?
1. ASK REPEAT: Could you please repeat that?
2. ASK REPHRASE: Could you please try to rephrase that?
3. NOTIFY (NTFY): Sorry, I didn't catch that ...
4. YIELD TURN (YLD)
5. REPROMPT (RP): For when do you need the conference room?
6. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation.
7. MOVE-ON: Sorry, I didn't catch that. For which day do you need the room?
8. YOU CAN SAY (YCS): Sorry, I didn't catch that. For when do you need the conference room? You can say something like "tomorrow at 10 am".
9. TERSE YOU CAN SAY (TYCS): Sorry, I didn't catch that. You can say something like "tomorrow at 10 am".
10. FULL HELP (HELP): Sorry, I didn't catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like "tomorrow at 10 am".
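As a rough illustration of how the ten strategies could be held in a system and engaged by the control group's uninformed policy, here is a minimal Python sketch. The slides do not say how the uninformed policy picks a strategy; the uniform random choice below is an assumption. The names `STRATEGIES` and `uninformed_policy` are hypothetical, and the short keys AREP, ARPH and MOVE are borrowed from the pair-wise comparison table later in the deck.

```python
import random

# Prompt fragments taken from the strategy list on this slide.
QUESTION = "For when do you need the conference room?"
SORRY = "Sorry, I didn't catch that."
EXAMPLE = "You can say something like tomorrow at 10 am."
DETAIL = ("Right now I need to know the date and time "
          "for when you need the reservation.")

# Strategy abbreviation -> system prompt. YIELD TURN lists no prompt text.
STRATEGIES = {
    "AREP": "Could you please repeat that?",
    "ARPH": "Could you please try to rephrase that?",
    "NTFY": SORRY,
    "YLD": "",
    "RP": QUESTION,
    "DRP": DETAIL,
    "MOVE": f"{SORRY} For which day do you need the room?",
    "YCS": f"{SORRY} {QUESTION} {EXAMPLE}",
    "TYCS": f"{SORRY} {EXAMPLE}",
    "HELP": (f"{SORRY} I am currently trying to make a conference room "
             f"reservation for you. {DETAIL} {EXAMPLE}"),
}


def uninformed_policy(rng):
    """Pick a strategy without looking at the dialog state.

    Assumption: 'uninformed' is modeled here as a uniform random choice.
    """
    name = rng.choice(sorted(STRATEGIES))
    return name, STRATEGIES[name]


if __name__ == "__main__":
    name, prompt = uninformed_policy(random.Random(0))
    print(name, "->", repr(prompt))
```

A smarter policy would replace `uninformed_policy` with a function that also takes features of the current non-understanding as input.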
7. corpus statistics
- 449 sessions
- 8278 user turns
- utterances transcribed and checked
- manual annotations
- misunderstandings
- correct concept values at each turn
- sources of understanding errors
- user response-types to recovery strategies
8. questions under investigation
- data
- what are the main causes of non-understandings?
- how large is their impact on performance?
- how do various recovery strategies compare to each other?
- what are the relationships between strategies and user behaviors?
9. causes of non-understandings
[diagram: user and system, linked at four levels: conversation level, intention level, signal level, channel level]
10. causes of non-understandings
- conversation level: out-of-application (16%)
- intention level: out-of-grammar (16%)
- signal level: ASR error (62%)
- channel level: endpointer error
11. questions under investigation
- data
- what are the main causes of non-understandings?
- how large is their impact on performance?
- how do various recovery strategies compare to each other?
- what are the relationships between strategies and user behaviors?
data | causes of non-understandings | impact on performance | strategy comparison | user behaviors
12. modeling impact on performance
- logistic regression
- $P(\text{Task Success}) = \dfrac{1}{1 + e^{-(\alpha + \beta \cdot \mathrm{FNON})}}$
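To make the model concrete, the sketch below evaluates a logistic curve of the standard form P = 1 / (1 + e^-(alpha + beta * FNON)). FNON is presumably the frequency of non-understandings in a session, which is an interpretation and not something the slide states. The coefficient values are placeholders chosen only for illustration; they are not the fitted values from the study.

```python
import math


def p_task_success(fnon, alpha, beta):
    """Logistic model: probability of task success given FNON."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * fnon)))


if __name__ == "__main__":
    # Placeholder coefficients (assumed, NOT fitted on the Roomline corpus).
    alpha, beta = 2.0, -5.0
    for fnon in (0.0, 0.2, 0.4, 0.6):
        print(f"FNON={fnon:.1f}  P(success)={p_task_success(fnon, alpha, beta):.3f}")
```

With a negative beta, the predicted probability of task success falls as non-understandings become more frequent, which is the kind of quantitative relationship the regression is meant to capture.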
13. questions under investigation
- data
- what are the main causes of non-understandings?
- how large is their impact on performance?
- how do various recovery strategies compare to each other?
- what are the relationships between strategies and user behaviors?
14. strategy performance: recovery rate
[figure: recovery rate by strategy: Help, Yield, Notify, MoveOn, RePrompt, YouCanSay, AskRepeat, AskRephrase, TerseYouCanSay, DetailedReprompt]
- overall logistic ANOVA
- significant differences in mean recovery rates
- all pairs comparison (corrected using FDR)
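The slide does not name the FDR procedure used for the all-pairs comparison. A common choice is the Benjamini-Hochberg step-up procedure, sketched below under that assumption. The function name and the p-values in the usage example are made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the input order, that is True where the
    corresponding hypothesis is rejected at false discovery rate q.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank

    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Illustrative p-values only, not results from the study.
    print(benjamini_hochberg([0.02, 0.03, 0.035, 0.5]))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.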
15. questions under investigation
- data
- what are the main causes of non-understandings?
- how large is their impact on performance?
- how do various recovery strategies compare to each other?
- what are the relationships between strategies and user behaviors?
16. user response types
- tagging scheme by Shin
- also used by Choularton, Raux
- 5 categories
- repeat
- rephrase
- contradict
- change
- other
17. response types after non-understanding
[figure: response types after non-understanding (axis 0 to 50) by category (repeat, rephrase, contradict, change, other) for three corpora: Communicator (Shin et al.), Pizza (Choularton & Dale), Roomline (this study)]
18. user response types by strategy
[figure: user response types (Repeat, Rephrase, Change, Other; axis 0 to 100) by strategy: Help, Yield, Notify, MoveOn, RePrompt, AskRepeat, YouCanSay, AskRephrase, TerseYouCanSay, DetailedReprompt]
19. summary
- sources of non-understandings: ASR, but also language errors -> more shaping strategies
- impact on performance: regression model allows better quantitative assessment
- strategy comparison: help, move-on -> further investigate move-on
- user responses: margin for improving control over user responses
- can we improve global dialog performance by using a smarter policy?
- can we learn a better policy from data?
- preliminary results promising
20. thank you! questions
21. rejections
22. strategy performance assessment
- recovery rate
- recovery utility
- weighted sum of correctly and incorrectly acquired concepts
- weights are determined in a data-driven fashion
- recovery efficiency
- also takes time to recovery into account
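The first two metrics can be sketched in a few lines of Python. The sketch assumes that a recovery means the turn following a non-understanding is correctly understood, as a later slide in the deck puts it. The weights below are placeholders: the slides say the real weights were determined from data and do not give their values. Recovery efficiency is left out because the slides do not define how time enters the measure.

```python
def recovery_rate(recovered_flags):
    """Fraction of non-understandings followed by a correctly understood turn."""
    flags = list(recovered_flags)
    return sum(flags) / len(flags) if flags else 0.0


def recovery_utility(n_correct, n_incorrect, w_correct=1.0, w_incorrect=-1.0):
    """Weighted sum of correctly and incorrectly acquired concepts.

    The default weights are illustrative assumptions, not the data-driven
    weights used in the study.
    """
    return w_correct * n_correct + w_incorrect * n_incorrect


if __name__ == "__main__":
    # One flag per non-understanding: was the next turn correctly understood?
    print(recovery_rate([True, False, True, True]))
    # Two concepts acquired correctly, one incorrectly.
    print(recovery_utility(n_correct=2, n_incorrect=1))
```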
23. experimental design: scenarios
- 10 scenarios, fixed order
- presented graphically (explained during briefing)
24. strategy pair-wise comparison
- recovery performance: ranked list, based on pair-wise t-tests
RNK  STRAT MOVE  HELP  TYCS  RP    YCS   ARPH  DRP   NTFY  AREP  YLD
1    MOVE  -     -     -     1.31  1.33  1.35  1.71  1.8   1.91  2.06
2    HELP  -     -     -     -     -     -     1.55  1.64  1.73  1.87
3    TYCS  -     -     -     -     -     -     1.5   1.58  1.68  1.81
4    RP    -     -     -     -     -     -     -     -     1.46  1.58
5    YCS   -     -     -     -     -     -     -     -     1.44  1.55
6    ARPH  -     -     -     -     -     -     -     -     1.42  1.53
?    DRP   -     -     -     -     -     -     -     -     -     -
?    NTFY  -     -     -     -     -     -     -     -     -     -
?    AREP  -     -     -     -     -     -     -     -     -     -
?    YLD   -     -     -     -     -     -     -     -     -     -
- CER evaluation shows similar results
25. recovery for various response-types
27. impact of recovery rate on performance
- recovery: next turn is correctly understood
- $P(\text{Task Success}) = \dfrac{1}{1 + e^{-(\alpha + \beta \cdot \mathrm{RecoveryRate})}}$