online supervised learning of non-understanding recovery policies

Provided by: DanB92
1
online supervised learning of non-understanding
recovery policies
  • Dan Bohus
  • www.cs.cmu.edu/dbohus
  • dbohus_at_cs.cmu.edu
  • Computer Science Department
  • Carnegie Mellon University
  • Pittsburgh, PA 15213

with thanks to Alex Rudnicky, Brian Langner,
Antoine Raux, Alan Black, Maxine Eskenazi
2
understanding-errors in spoken dialog
MIS-understanding
System constructs an incorrect semantic
representation of the user's turn
S: Where are you flying from?
U: Birmingham [recognized: BERLIN PM]
NON-understanding
System fails to construct a semantic
representation of the user's turn
S: Where are you flying from?
U: Urbana Champaign [recognized: OKAY IN THAT SAME PAY]
3
recovery strategies
  • large set of strategies (strategy = 1-step
    action)
  • tradeoffs not well understood
  • some strategies are more appropriate at certain
    times
  • OOV -> ask repeat is not a good idea
  • door slam -> ask repeat might work well

S:
  • Sorry, I didn't catch that
  • Can you repeat that?
  • Can you rephrase that?
  • Where are you flying from?
  • Please tell me the name of the city you are
    leaving from
  • Could you please go to a quieter place?
  • Sorry, I didn't catch that ... tell me the state
    first

4
recovery policy
  • policy = method for choosing between strategies
  • difficult to handcraft
  • especially over a large set of recovery
    strategies
  • common approaches
  • heuristic
  • "three strikes and you're out" [Balentine]
  • 1st non-understanding: ask user to repeat
  • 2nd non-understanding: provide more help,
    including examples
  • 3rd non-understanding: transfer to an operator
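
The "three strikes" heuristic above can be written down in a few lines. This is a minimal sketch: the function name and the action labels are illustrative, not taken from the talk or from Balentine.

```python
def three_strikes_policy(consecutive_nonunderstandings: int) -> str:
    """Map the count of consecutive non-understandings to a recovery action.

    The action labels are hypothetical names chosen for this sketch.
    """
    if consecutive_nonunderstandings < 1:
        raise ValueError("expected at least one non-understanding")
    if consecutive_nonunderstandings == 1:
        return "ask_repeat"                # 1st strike: ask user to repeat
    if consecutive_nonunderstandings == 2:
        return "give_help_with_examples"   # 2nd strike: more help, examples
    return "transfer_to_operator"          # 3rd strike (or later): operator


actions = [three_strikes_policy(n) for n in (1, 2, 3)]
```

Such a policy looks only at the strike count, so it cannot tell an out-of-vocabulary word from a door slam.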

5
this talk
  • an online, supervised method for learning a
    non-understanding recovery policy from data

6
overview
  • introduction
  • approach
  • experimental setup
  • results
  • discussion

8
intuition
  • if we knew the probability of success for each
    strategy in the current situation, we could
    easily construct a policy

S: Where are you flying from?
U: Urbana Champaign [recognized: OKAY IN THAT SAME PAY]
S:
  • Sorry, I didn't catch that
  • Can you repeat that?
  • Can you rephrase that?
  • Where are you flying from?
  • Please tell me the name of the city you are
    leaving from
  • Could you please go to a quieter place?
  • Sorry, I didn't catch that ... tell me the state
    first

probability of success shown for the seven strategies: 32%, 15%, 20%, 30%, 45%, 25%, 43%
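
To make the intuition concrete, the sketch below picks the strategy with the highest success estimate. It assumes the seven figures on the slide are percentages. Which figure belongs to which prompt is not recoverable from the transcript, so strategies are referred to only by position.

```python
# Slide figures read as success probabilities (assumption: they are percentages).
success_estimates = [0.32, 0.15, 0.20, 0.30, 0.45, 0.25, 0.43]


def greedy_choice(estimates):
    """Return the index of the strategy with the highest estimated success."""
    return max(range(len(estimates)), key=lambda i: estimates[i])


best_index = greedy_choice(success_estimates)
best_estimate = success_estimates[best_index]
```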
9
two step approach
  • step 1: learn to estimate probability of success
    for each strategy, in a given situation
  • step 2: use these estimates to choose between
    strategies (and hence build a policy)

10
learning predictors for strategy success
  • supervised learning: logistic regression
  • target: strategy recovered successfully or not
  • success = next turn is correctly understood
  • labeled semi-automatically
  • features describe current situation
  • extracted from different knowledge sources
  • recognition features
  • language understanding features
  • dialog-level features: state, history
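
A minimal sketch of step 1, with one logistic-regression predictor per strategy. It is fit here by plain batch gradient descent in pure Python, which is not the stepwise procedure the talk used. The single feature (a word count) and the toy records are invented for illustration only.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """P(success | features x) under weights w (w[0] is the bias)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def fit_per_strategy(records):
    """records: (strategy, feature_vector, success) triples -> {strategy: weights}."""
    grouped = defaultdict(lambda: ([], []))
    for strategy, features, success in records:
        grouped[strategy][0].append(features)
        grouped[strategy][1].append(success)
    return {s: fit_logistic(X, y) for s, (X, y) in grouped.items()}


# Toy, made-up data: one feature (number of recognized words).
records = [
    ("ARPH", [1.0], 0), ("ARPH", [2.0], 0), ("ARPH", [5.0], 1), ("ARPH", [6.0], 1),
    ("AREP", [1.0], 1), ("AREP", [2.0], 1), ("AREP", [5.0], 0), ("AREP", [6.0], 0),
]
models = fit_per_strategy(records)
p_arph_long = predict_proba(models["ARPH"], [6.0])
```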

11
logistic regression
  • well-calibrated class-posterior probabilities
  • predictions reflect empirical probability of
    success
  • x% of cases where P(S|F) = x% are indeed successful
  • sample efficient
  • one model per strategy, so data will be sparse
  • stepwise construction
  • automatic feature selection
  • provide confidence bounds
  • very useful for online learning
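
One way to check the calibration property described above is to bin the predicted probabilities and compare each bin's mean prediction with the observed success rate. The helper below is a sketch with made-up inputs, not a procedure from the talk.

```python
def calibration_table(predicted, outcomes, n_bins=5):
    """Return (bin_index, count, mean_predicted, empirical_rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    table = []
    for idx, items in enumerate(bins):
        if not items:
            continue
        mean_p = sum(p for p, _ in items) / len(items)
        rate = sum(y for _, y in items) / len(items)
        table.append((idx, len(items), mean_p, rate))
    return table


# Made-up predictions and outcomes (1 = recovery succeeded).
predicted = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0]
table = calibration_table(predicted, outcomes)
```

For a well-calibrated model the last two columns stay close in every bin. With only a handful of samples per bin, as here, large gaps are expected.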

12
two step approach
  • step 1: learn to estimate probability of success
    for each strategy, in a given situation
  • step 2: use these estimates to choose between
    strategies (and hence build a policy)

13
policy learning
  • choose strategy most likely to succeed

[figure: 0-to-1 axis over strategies S1, S2, S3, S4]
  • BUT
  • we want to learn online
  • we have to deal with the exploration /
    exploitation tradeoff

14
highest-upper-bound learning
  • choose strategy with highest-upper-bound
  • proposed by Kaelbling 93
  • empirically shown to do well in various problems
  • intuition

[figure: two panels labeled exploration and exploitation, each with a 0-to-1 axis over strategies S1, S2, S3, S4]
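
A small sketch of highest-upper-bound selection. As a simplification, it uses the Wilson score upper bound on raw success counts in place of the confidence bounds of the logistic-regression predictors described earlier. The counts are invented.

```python
import math


def wilson_upper(successes, trials, z=1.96):
    """Upper end of the Wilson score interval for a success rate.

    An untried strategy gets the maximal bound 1.0, so it is explored first.
    """
    if trials == 0:
        return 1.0
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = p + z * z / (2.0 * trials)
    margin = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials))
    return (centre + margin) / denom


def choose_greedy(counts):
    """Pick the strategy with the highest observed success rate."""
    return max(counts, key=lambda s: counts[s][0] / counts[s][1])


def choose_highest_upper_bound(counts):
    """Pick the strategy whose upper confidence bound is highest."""
    return max(counts, key=lambda s: wilson_upper(*counts[s]))


# Invented (successes, trials) counts: S1 is well explored, S2 barely tried.
counts = {"S1": (45, 100), "S2": (1, 4)}
greedy = choose_greedy(counts)                    # exploits S1
optimistic = choose_highest_upper_bound(counts)   # explores S2
```

The two rules disagree here because S2's estimate is still uncertain. As trials accumulate its bound tightens, and the optimistic rule behaves more and more like the greedy one.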
19
overview
  • introduction
  • approach
  • experimental setup
  • results
  • discussion

20
system
  • Let's Go! Public bus information system
  • connected to PAT customer service line during
    non-business hours
  • 30-50 calls / night

21
strategies
22
constraints
  • constraints
  • don't AREP more than twice in a row
  • don't ARPH if words < 3
  • don't ASA unless words > 5
  • don't ASO unless (4 nonu in a row) and
    (ratio.nonu > 50%)
  • don't GUP unless (dialog > 30 turns) and
    (ratio.nonu > 80%)
  • capture expert knowledge; ensure system doesn't
    use an unreasonable policy
  • 4.2/11 strategies available on average
  • min = 1, max = 9
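
The constraints above amount to a filter applied before any strategy is chosen. In the sketch below the parameter names are illustrative, "4 nonu in a row" is read as "at least 4", and any strategy without a listed constraint is assumed to be always allowed.

```python
def is_allowed(strategy, num_words, consecutive_nonu, nonu_ratio,
               dialog_turns, recent_strategies):
    """Check one strategy against the hand-written constraints.

    nonu_ratio is a fraction in [0, 1]; recent_strategies lists the
    strategies already used, most recent last.
    """
    if strategy == "AREP":   # not more than twice in a row
        return list(recent_strategies[-2:]) != ["AREP", "AREP"]
    if strategy == "ARPH":   # not if fewer than 3 words
        return num_words >= 3
    if strategy == "ASA":    # only if more than 5 words
        return num_words > 5
    if strategy == "ASO":    # assumption: "4 in a row" means at least 4
        return consecutive_nonu >= 4 and nonu_ratio > 0.50
    if strategy == "GUP":
        return dialog_turns > 30 and nonu_ratio > 0.80
    return True              # assumption: other strategies are unconstrained


def available_strategies(candidates, **situation):
    """Filter the candidate list down to strategies the constraints permit."""
    return [s for s in candidates if is_allowed(s, **situation)]


candidates = ["AREP", "ARPH", "ASA", "ASO", "GUP", "MOVE", "HLP"]
allowed_now = available_strategies(
    candidates, num_words=2, consecutive_nonu=1, nonu_ratio=0.2,
    dialog_turns=5, recent_strategies=["AREP", "AREP"],
)
```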

23
features
  • current non-understanding
  • recognition, lexical, grammar, timing info
  • current non-understanding segment
  • length, which strategies already taken
  • current dialog state and history
  • encoded dialog states
  • how good things have been going

24
learning
  • baseline period: 2 weeks, 3/11 -> 3/25, 2006
  • system randomly chose a strategy, while obeying
    constraints
  • in effect, a heuristic / stochastic policy
  • learning period: 5 weeks, 3/26 -> 5/5, 2006
  • each morning: labeled data from previous night
  • retrained likelihood of success predictors
  • installed in the system for the next night

25
2 strategies eliminated
26
overview
  • introduction
  • approach
  • experimental setup
  • results
  • discussion

27
results
  • average non-understanding recovery rate (ANNR)
  • improvement: 33.6% -> 37.8% (p = 0.03) (12.5% rel.)
  • fitted learning curve

A = 0.3385, B = 0.0470, C = 0.5566, D = -11.44
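
The relative improvement follows directly from the two recovery rates. The transcript does not give the functional form of the fitted learning curve, so the sigmoid below is purely an assumption. It is used only because it makes A read as a starting level and A + B as a plateau. The time unit of t is likewise unknown.

```python
import math

baseline_rate, learned_rate = 0.336, 0.378
relative_gain = (learned_rate - baseline_rate) / baseline_rate  # about 0.125

A, B, C, D = 0.3385, 0.0470, 0.5566, -11.44


def assumed_learning_curve(t):
    """ASSUMED form A + B * sigmoid(C*t + D); not confirmed by the source."""
    return A + B / (1.0 + math.exp(-(C * t + D)))


start_level = assumed_learning_curve(0)    # close to A under this assumption
plateau = assumed_learning_curve(60)       # close to A + B under this assumption
```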
28
policy evolution
  • MOVE, HLP, ASA engaged more often
  • AREP, ARPH engaged less often

29
overview
  • introduction
  • approach
  • experimental setup
  • results
  • discussion

30
are the predictors learning anything?
  • AREP(653), IT(273), SLL(300)
  • no informative features
  • ARPH(674), MOVE(1514)
  • 1 informative feature (prev.nonu, words)
  • ASA(637), RP(2532), HLP(3698), HLP_R(989)
  • 4 or more informative features in the model
  • dialog state (especially explicit confirm states)
  • dialog history

31
more features, more (specific) strategies
  • more features would be useful
  • day-of-week
  • clustered dialog states
  • (any ideas?)
  • more strategies / variants
  • approach might be able to filter out bad versions
  • more specific strategies, features
  • "ask short answers" worked well
  • "speak less loud" didn't (why?)

32
noise in the experiment
  • 15-20% of responses following non-understandings
    are non-user-responses
  • transient noises
  • secondary speech
  • primary speech not directed to the system
  • this might affect training; in a future
    experiment we want to eliminate it

33
unsupervised learning
  • supervised version
  • success = next turn is correctly understood,
    i.e. no misunderstanding, no
    non-understanding
  • unsupervised version
  • success = next turn is not a non-understanding
  • success = confidence score of next turn
  • training labels automatically available
  • performance improvements might still be possible
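
The two unsupervised labeling options need only information the system already has at runtime. In this sketch the argument names are hypothetical, and using 0.0 as the soft label for a turn with no parse is an assumption.

```python
def label_not_nonunderstanding(next_turn_has_parse: bool) -> int:
    """Binary label: 1 if the next turn produced any semantic representation."""
    return int(next_turn_has_parse)


def label_confidence(next_turn_has_parse: bool, confidence: float) -> float:
    """Soft label: the next turn's confidence score.

    Assumption: a non-understanding (no parse) is given a soft label of 0.0.
    """
    return confidence if next_turn_has_parse else 0.0


# Hypothetical next turns: (has_parse, confidence score).
next_turns = [(True, 0.82), (False, 0.0), (True, 0.35)]
binary_labels = [label_not_nonunderstanding(p) for p, _ in next_turns]
soft_labels = [label_confidence(p, c) for p, c in next_turns]
```

Neither label can tell a misunderstanding from a correct understanding, which is the cost of removing the manual labeling step.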

34
thank you!