online supervised learning of nonunderstanding recovery policies

About This Presentation

Slides: 35

Provided by: DanB92

Transcript and Presenter's Notes

1
online supervised learning of non-understanding recovery policies

Dan Bohus
www.cs.cmu.edu/dbohus
dbohus_at_cs.cmu.edu
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213

with thanks to Alex Rudnicky, Brian Langner, Antoine Raux, Alan Black, Maxine Eskenazi

2
understanding errors in spoken dialog

MIS-understanding: system constructs an incorrect semantic representation of the user's turn
S: Where are you flying from?
U: Birmingham [BERLIN PM]

NON-understanding: system fails to construct a semantic representation of the user's turn
S: Where are you flying from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]

(the user's actual words, with the recognition result in brackets)

3
recovery strategies

large set of strategies (strategy = 1-step action)
tradeoffs not well understood
some strategies are more appropriate at certain times
OOV -> ask repeat is not a good idea
door slam -> ask repeat might work well

examples (system prompts):
Sorry, I didn't catch that
Can you repeat that?
Can you rephrase that?
Where are you flying from?
Please tell me the name of the city you are leaving from
Could you please go to a quieter place?
Sorry, I didn't catch that. Tell me the state first

4
recovery policy

policy = method for choosing between strategies
difficult to handcraft, especially over a large set of recovery strategies
common approaches: heuristic
"three strikes and you're out" [Balentine]
1st non-understanding: ask user to repeat
2nd non-understanding: provide more help, including examples
3rd non-understanding: transfer to an operator
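
The "three strikes" heuristic is a fixed lookup on how many non-understandings have happened in a row. A minimal Python sketch; the function name and action labels are hypothetical, and treating every turn after the third like the third is an assumption the slide does not state:

```python
def three_strikes_policy(nonu_in_a_row: int) -> str:
    """Map the number of consecutive non-understandings to a recovery action."""
    if nonu_in_a_row < 1:
        raise ValueError("called only after at least one non-understanding")
    if nonu_in_a_row == 1:
        return "ask_repeat"            # 1st: ask the user to repeat
    if nonu_in_a_row == 2:
        return "more_help_examples"    # 2nd: more help, including examples
    return "transfer_to_operator"      # 3rd and later (assumed): hand off to a human


for n in (1, 2, 3):
    print(n, three_strikes_policy(n))
```

Nothing about the current situation enters the decision, which is what makes such policies easy to write but hard to handcraft well over a large set of strategies.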

5
this talk

an online, supervised method for learning a
non-understanding recovery policy from data

6
overview

introduction
approach
experimental setup
results
discussion

7
overview: approach

8
intuition

if we knew the probability of success for each strategy in the current situation, we could easily construct a policy

S: Where are you flying from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]

probability of success for each strategy:
Sorry, I didn't catch that (32%)
Can you repeat that? (15%)
Can you rephrase that? (20%)
Where are you flying from? (30%)
Please tell me the name of the city you are leaving from (45%)
Could you please go to a quieter place? (25%)
Sorry, I didn't catch that. Tell me the state first (43%)

9
two step approach

step 1: learn to estimate probability of success for each strategy, in a given situation
step 2: use these estimates to choose between strategies (and hence build a policy)

10
learning predictors for strategy success

supervised learning: logistic regression
target: whether the strategy recovered successfully or not
success = next turn is correctly understood
labeled semi-automatically
features: describe the current situation
extracted from different knowledge sources
recognition features
language understanding features
dialog-level features (state, history)
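
A minimal sketch of step 1 under simplifying assumptions: plain batch gradient descent on the log-loss stands in for the stepwise logistic regression used in the talk (there is no feature selection here), and the features and training data are invented for illustration. One model is trained per strategy, on the turns where that strategy was used:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit P(success | features) by batch gradient descent on the log-loss.
    Returns (bias, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict(model, x):
    """Estimated probability that the strategy succeeds in situation x."""
    b, w = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# One model per strategy, trained only on turns where that strategy was used.
# Hypothetical features: [recognition confidence, number of words / 10]; label 1 = success.
data = {
    "AREP": ([[0.2, 0.1], [0.8, 0.3], [0.7, 0.2], [0.1, 0.5]], [0, 1, 1, 0]),
    "ASA": ([[0.3, 0.9], [0.6, 0.8], [0.2, 0.1], [0.5, 0.7]], [1, 1, 0, 1]),
}
models = {s: train_logistic(X, y) for s, (X, y) in data.items()}
print(round(predict(models["AREP"], [0.9, 0.2]), 2))
```

Keeping a separate model per strategy is what makes the data sparse: each model only sees the turns where its strategy was actually tried.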

11
logistic regression

well-calibrated class-posterior probabilities
predictions reflect the empirical probability of success
of all cases where $P(S \mid F) = x$, a fraction $x$ are indeed successful (S = success, F = features)
sample efficient
one model per strategy, so data will be sparse
stepwise construction
automatic feature selection
provides confidence bounds
very useful for online learning
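
The calibration property can be checked on held-out data with a reliability table: bin the predicted probabilities and compare each bin's average prediction with its observed success rate. A generic sketch with made-up numbers; it is not necessarily how the talk evaluated its models:

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Group predictions into equal-width bins and compare the average
    predicted probability with the observed success rate in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        mean_p = sum(p for p, _ in members) / len(members)
        rate = sum(o for _, o in members) / len(members)
        rows.append((idx, len(members), mean_p, rate))
    return rows


probs = [0.1, 0.15, 0.3, 0.35, 0.5, 0.55, 0.7, 0.75, 0.9, 0.95]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
for idx, n, mean_p, rate in reliability_table(probs, outcomes):
    print(f"bin {idx}: n={n} predicted={mean_p:.2f} observed={rate:.2f}")
```

For a well-calibrated model the predicted and observed columns roughly agree in every bin that holds enough samples.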

12
two step approach

step 2: use these estimates to choose between strategies (and hence build a policy)

13
policy learning

choose strategy most likely to succeed

[figure: estimated probability of success (0 to 1) for strategies S1-S4]

BUT
we want to learn online
we have to deal with the exploration / exploitation tradeoff
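
Taken alone, step 2 reduces to an argmax over the strategies that are currently allowed. A sketch with hypothetical probability estimates (the strategy names appear in the talk, the numbers do not); it only exploits, which is why it is not enough when learning online:

```python
def choose_greedy(success_prob: dict, allowed: set) -> str:
    """Return the allowed strategy with the highest estimated P(success)."""
    candidates = {s: p for s, p in success_prob.items() if s in allowed}
    if not candidates:
        raise ValueError("no strategy is allowed in this situation")
    return max(candidates, key=candidates.get)


estimates = {"AREP": 0.15, "ARPH": 0.20, "ASA": 0.45, "MOVE": 0.30}
print(choose_greedy(estimates, {"AREP", "ARPH", "MOVE"}))
```

A strategy whose early estimate is pessimistic is never tried again under this rule, so its estimate never gets the data that could correct it.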

14
highest-upper-bound learning

choose strategy with highest-upper-bound
proposed by Kaelbling 93
empirically shown to do well in various problems
intuition

[figure: two panels, exploration and exploitation; probability of success (0 to 1) for strategies S1-S4]
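
The slides do not show how the bounds are computed. A minimal sketch, assuming each strategy's model supplies a predicted logit and its standard error (a logistic regression's coefficient covariance makes this possible), and using a normal-approximation interval on the logit scale as an illustrative choice; the numbers are hypothetical:

```python
import math
from statistics import NormalDist


def upper_bound(logit: float, std_err: float, confidence: float = 0.95) -> float:
    """Upper end of a two-sided interval on P(success), built on the
    logit scale and mapped back to a probability with the sigmoid."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return 1.0 / (1.0 + math.exp(-(logit + z * std_err)))


def choose_highest_upper_bound(estimates: dict) -> str:
    """estimates maps strategy -> (predicted logit, standard error of the logit)."""
    return max(estimates, key=lambda s: upper_bound(*estimates[s]))


# ASA looks worse on average but is uncertain, so it gets explored.
early = {"AREP": (0.0, 0.1), "ASA": (-0.5, 1.0)}
# With more data ASA's standard error shrinks and AREP wins again.
later = {"AREP": (0.0, 0.1), "ASA": (-0.5, 0.2)}
print(choose_highest_upper_bound(early), choose_highest_upper_bound(later))
```

A greedy choice would pick AREP in both cases; the upper bound spends a few tries on an uncertain strategy in exchange for information about it.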
19
overview: experimental setup

20
system

Let's Go! Public bus information system
connected to the PAT customer service line during non-business hours
30-50 calls / night

21
strategies
22
constraints

don't AREP more than twice in a row
don't ARPH if words < 3
don't ASA unless words > 5
don't ASO unless (4 nonu in a row) and (ratio.nonu > 50%)
don't GUP unless (dialog > 30 turns) and (ratio.nonu > 80%)
capture expert knowledge; ensure the system doesn't use an unreasonable policy
4.2/11 strategies available on average (min = 1, max = 9)
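
The constraints act as a filter applied before the policy chooses. A sketch; the context fields are hypothetical, "words" is assumed to mean words in the recognized user turn, "4 nonu in a row" is read as "at least 4", the ratio thresholds are assumed to be percentages, and MOVE stands in for a strategy with no listed constraint:

```python
from dataclasses import dataclass


@dataclass
class RecoveryContext:
    # Hypothetical field names for the quantities the constraints mention.
    recent_strategies: list  # recovery strategies used so far, most recent last
    num_words: int           # words in the recognized user turn (assumed meaning of "words")
    nonu_in_a_row: int       # consecutive non-understandings
    nonu_ratio: float        # fraction of turns that were non-understandings, 0-1
    dialog_turns: int        # turns so far in the dialog


def allowed_strategies(all_strategies, ctx: RecoveryContext) -> set:
    """Drop strategies that the hand-written constraints rule out."""
    allowed = set(all_strategies)
    if ctx.recent_strategies[-2:] == ["AREP", "AREP"]:
        allowed.discard("AREP")   # not more than twice in a row
    if ctx.num_words < 3:
        allowed.discard("ARPH")
    if not ctx.num_words > 5:
        allowed.discard("ASA")
    if not (ctx.nonu_in_a_row >= 4 and ctx.nonu_ratio > 0.5):
        allowed.discard("ASO")
    if not (ctx.dialog_turns > 30 and ctx.nonu_ratio > 0.8):
        allowed.discard("GUP")
    return allowed


STRATEGIES = ["AREP", "ARPH", "ASA", "ASO", "GUP", "MOVE"]
early = RecoveryContext(["AREP", "AREP"], num_words=2, nonu_in_a_row=1, nonu_ratio=0.3, dialog_turns=10)
late = RecoveryContext(["ARPH"], num_words=6, nonu_in_a_row=4, nonu_ratio=0.9, dialog_turns=40)
print(sorted(allowed_strategies(STRATEGIES, early)))
print(sorted(allowed_strategies(STRATEGIES, late)))
```

Because the filter runs first, the learned policy can only ever choose among strategies an expert considered reasonable in the current situation.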

23
features

current non-understanding: recognition, lexical, grammar, timing info
current non-understanding segment: length, which strategies already taken
current dialog state and history: encoded dialog states, how good things have been going

24
learning

baseline period: 2 weeks, 3/11 -> 3/25, 2006
system randomly chose a strategy, while obeying constraints
in effect, a heuristic / stochastic policy
learning period: 5 weeks, 3/26 -> 5/5, 2006
each morning, labeled the data from the previous night
retrained the likelihood-of-success predictors
installed them in the system for the next night
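
The schedule can be imitated in a toy simulation. This is a deliberately simplified sketch, not the deployed system: there are no features (one success rate per strategy), counts replace the logistic regression models, a Wilson score interval provides the upper bound, and the success rates and recoveries-per-night figure are invented. As in the study, the estimates stay frozen during a night and are updated each morning:

```python
import math
import random


def wilson_upper(successes: int, trials: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for a success proportion."""
    if trials == 0:
        return 1.0  # no data yet: be maximally optimistic
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre + margin) / denom


def simulate(true_rates, baseline_nights=14, learning_nights=35,
             recoveries_per_night=40, seed=0):
    rng = random.Random(seed)
    strategies = list(true_rates)
    counts = {s: [0, 0] for s in strategies}  # strategy -> [successes, trials]
    chosen_while_learning = {s: 0 for s in strategies}
    for night in range(baseline_nights + learning_nights):
        # Bounds are computed once and stay fixed for the whole night.
        bounds = {s: wilson_upper(*counts[s]) for s in strategies}
        log = []
        for _ in range(recoveries_per_night):
            if night < baseline_nights:
                s = rng.choice(strategies)          # baseline: random strategy
            else:
                s = max(strategies, key=bounds.get)  # learning: highest upper bound
                chosen_while_learning[s] += 1
            log.append((s, rng.random() < true_rates[s]))  # simulated outcome
        # "Each morning": add last night's labeled outcomes and "retrain".
        for s, success in log:
            counts[s][0] += int(success)
            counts[s][1] += 1
    return counts, chosen_while_learning


counts, chosen = simulate({"AREP": 0.1, "ARPH": 0.3, "ASA": 0.6})
print(chosen)
```

Because the toy has no features, every recovery in a learning night goes to the same strategy; with situation features, as in the talk, the choice can differ from turn to turn.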

25
2 strategies eliminated
26
overview

introduction
approach
experimental setup
results
discussion

27
results

average non-understanding recovery rate (ANRR)
improvement: 33.6% -> 37.8% (p = 0.03), 12.5% relative
fitted learning curve parameters: A = 0.3385, B = 0.0470, C = 0.5566, D = -11.44
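
The 12.5% relative figure follows directly from the two recovery rates:

```latex
\[
\frac{37.8 - 33.6}{33.6} = \frac{4.2}{33.6} = 0.125 = 12.5\%
\]
```
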
28
policy evolution

MOVE, HLP, ASA engaged more often
AREP, ARPH engaged less often

29
overview

introduction
approach
experimental setup
results
discussion

30
are the predictors learning anything?

AREP (653), IT (273), SLL (300): no informative features
ARPH (674), MOVE (1514): 1 informative feature (prev.nonu, words)
ASA (637), RP (2532), HLP (3698), HLP_R (989): 4 or more informative features in the model
dialog state (especially explicit confirm states)
dialog history

31
more features, more (specific) strategies

more features would be useful
day-of-week
clustered dialog states
(any ideas?)
more strategies / variants
approach might be able to filter out bad versions
more specific strategies, features
ask short answers worked well
speak less loud didn't (why?)

32
noise in the experiment

15-20% of responses following non-understandings are non-user-responses
transient noises
secondary speech
primary speech not directed to the system
this might affect training; in a future experiment we want to eliminate that

33
unsupervised learning

supervised version
success = next turn is correctly understood, i.e. no misunderstanding, no non-understanding
unsupervised version
success = next turn is not a non-understanding
success = confidence score of next turn
training labels automatically available
performance improvements might still be possible
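
The three success definitions differ in what they count as a recovery. A sketch with a hypothetical record for the turn that follows a recovery prompt; only the supervised label needs a human to spot misunderstandings:

```python
from dataclasses import dataclass


@dataclass
class NextTurn:
    # Hypothetical description of the user turn that follows a recovery prompt.
    non_understanding: bool  # system failed to build a semantic representation
    misunderstanding: bool   # representation was built but is wrong (needs a human label)
    confidence: float        # the system's confidence score for the turn, 0-1


def supervised_label(t: NextTurn) -> int:
    # Correctly understood: neither a misunderstanding nor a non-understanding.
    return int(not t.non_understanding and not t.misunderstanding)


def unsupervised_binary_label(t: NextTurn) -> int:
    # Automatic: success just means the next turn was not a non-understanding.
    return int(not t.non_understanding)


def unsupervised_confidence_label(t: NextTurn) -> float:
    # Automatic: the confidence score itself is used as a graded target.
    return t.confidence


turn = NextTurn(non_understanding=False, misunderstanding=True, confidence=0.6)
print(supervised_label(turn), unsupervised_binary_label(turn), unsupervised_confidence_label(turn))
```

The example turn, a misunderstanding, counts as a failure under the supervised definition but as a success under the first unsupervised one.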

34
thank you!
