baselines for recovering from non-understandings

Slides: 41
Provided by: dbo1
Learn more at: http://www.cs.cmu.edu

1
baselines for recovering from non-understandings
  • a planned investigation
  • Dan Bohus
  • DoD Reading Group, May 2004

2
understanding errors
  • Two main types of errors
  • System cannot form a valid interpretation for the
    user's input → non-understanding
  • System forms an incorrect interpretation of the
    user's input → misunderstanding

3
non-understandings
  • Can be a big pain, there's not much to act upon
  • S: When do you need a room?
  • U: blah bla blahaha [3 unparsed words]
  • S: I'm sorry, I didn't understand you. Can you
    please repeat that?
  • What would you do in the system's place?

4
an example
  • 1 S: Ok. I will now make the reservation for
    Newell Simon thirty three oh five. Could I have
    your name please? If you are not registered with
    the system, just say 'guest'.
  • U: guest [YES]
  • 2 S: Sorry, I'm not sure I understood what you
    said.
  • U: guest [DATE]
  • 3 S: I am currently trying to make a room
    reservation for Newell Simon thirty three oh
    five. Right now I need to know your name in order
    to make a conference room reservation. Tell me
    your name, or say 'guest' if you are not
    registered with the system. If you do not want to
    make this reservation, say 'cancel reservation'.
  • U: guest [THE]
  • 4 S: I apologize, I still did not understand you.
    Could you try to rephrase that please?
  • U: guest [YEAH]
  • 5 S: Sorry, I didn't catch that.
  • U: oh my god guest we're a guest [.WHAT. .YES.
    .YES. FOR TUESDAY]
  • 6 S: Newell Simon thirty five oh one is the only
    room available Tuesday, April twentieth between
    ten and noon. This room can accommodate seventeen
    people, and has a projector, a whiteboard and
    network access. Do you want a reservation for
    Newell Simon thirty five oh one?
  • U: yes [YES]
  • 7 S: Ok. I will now make the reservation for
    Newell Simon thirty five oh one. Could I have
    your name please? If you are not registered with
    the system, just say 'guest'.
  • U: guest [OKAY]
  • 8 S: I'm sorry, I don't think I understood you
    correctly. Could you please repeat that?
  • U: I'm a f guest [UP AGAIN]
  • 9 S: Sorry, I still did not understand you.

5
causes
  • User's language does not match system's input
    language
  • Languages match, but recognition problems
  • Recognition OK, but parsing problems
  • Parsing OK, but interpretation problems

6
causes
  • User's language does not match system's input
    language
  • Languages match, but recognition problems
  • Recognition OK, but parsing problems
  • Parsing OK, but interpretation problems
  • To do: a more detailed analysis!

7
what can one do?
  • Notify non-understanding
  • Repeat system prompt
  • Ask repeat
  • Ask rephrase
  • Help: give state-specific help
  • Help: give help about what the user can say
  • Help: establish the context

8
what else?
  • Try an alternative dialog plan to achieve the
    same goal
  • including ignore, take default value
  • Extract more information/content from the
    non-understanding, and do something smarter with
    that
  • Use fall-back parses on the recognition
    hypothesis
  • Explicit confirm turn (Antoine)
  • Targeted help
  • Other ideas?

9
the decision process
POLICY
Strategies
True causes
  • Handcraft a policy
  • Learn it: for instance in a reinforcement
    learning framework

10
markov decision processes
  • States
  • Various non-understanding states
  • 1 understanding state (final)
  • Actions
  • Recovery strategies
  • Rewards
  • -10 on each transition to a non-understanding
    state
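
The formulation above can be sketched as a small MDP solved by value iteration. This is only an illustration under stated assumptions: the state names NU1 to NU3, the three action names, the failure transitions and every success probability are made up, and value iteration is used here (instead of a learning algorithm) because it needs no data once a model is assumed. Only the reward scheme (-10 for each transition into a non-understanding state, one final understanding state) comes from the slide.

```python
# Hypothetical model: none of these names or numbers come from the talk.
NU_STATES = ["NU1", "NU2", "NU3"]          # non-understanding states
ACTIONS = ["repeat_prompt", "ask_rephrase", "help"]
NU_PENALTY = -10.0                         # reward for entering an NU state

# Assumed probability that an action leads to the understanding state U.
P_SUCCESS = {
    "NU1": {"repeat_prompt": 0.6, "ask_rephrase": 0.4, "help": 0.3},
    "NU2": {"repeat_prompt": 0.3, "ask_rephrase": 0.6, "help": 0.4},
    "NU3": {"repeat_prompt": 0.2, "ask_rephrase": 0.3, "help": 0.7},
}
# Assumed state reached when the recovery attempt fails.
NEXT_ON_FAILURE = {"NU1": "NU2", "NU2": "NU3", "NU3": "NU3"}


def q_value(state, action, values):
    """Expected return of taking `action` in `state`, then acting greedily."""
    p = P_SUCCESS[state][action]
    fail_state = NEXT_ON_FAILURE[state]
    # Success: reach U (final, value 0) with reward 0.
    # Failure: pay the penalty and continue from the next NU state.
    return p * 0.0 + (1.0 - p) * (NU_PENALTY + values[fail_state])


def value_iteration(tol=1e-9, max_iters=10_000):
    """Return (state values, greedy policy) for the toy model above."""
    values = {s: 0.0 for s in NU_STATES}
    for _ in range(max_iters):
        new_values = {
            s: max(q_value(s, a, values) for a in ACTIONS) for s in NU_STATES
        }
        delta = max(abs(new_values[s] - values[s]) for s in NU_STATES)
        values = new_values
        if delta < tol:
            break
    policy = {
        s: max(ACTIONS, key=lambda a: q_value(s, a, values)) for s in NU_STATES
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for state in NU_STATES:
        print(state, policy[state], round(values[state], 2))
```

In a real system the success probabilities are unknown, which is where learning from interaction data would come in.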

[Diagram: states NU1, NU2, NU3 and U; action Repeat; edge rewards -10, -10 and 0]

11
pros and cons of learning
  • Cons
  • Would a heuristic be good enough?
  • Is there going to be enough data?
  • Pros
  • Adaptive (different levels)
  • Harder to devise heuristics with a large number
    of strategies () more justification
  • Less development effort (?)

12
better policy or strategies?
POLICY
Strategies
True causes
?
?
  • Hypothesis
  • This set of strategies is sufficient, and a good
    policy would make a whole lot of difference

13
a checkpoint experiment
  • Run an experiment
  • Let a human make the non-understanding recovery
    decisions
  • Goal: can we do significantly better than a
    random policy? (given a fixed set of strategies)
  • Create a second, higher (upper-bound) baseline,
    and hence a frame for the learning approach
  • Validating the set of strategies / green light
    for concentrating on the policy (?)

14
experimental design
  • Goal
  • How well does random do? Preliminary results
  • Variables
  • System / Setup
  • Participants
  • Tasks
  • Potential outcomes, alternatives, discussion

15
random baseline (preliminary)
  • 103 sessions (1040 utterances), RoomLine
  • 274 non-understandings (26.3%)
  • 172 non-understanding segments
  • 1-6 turns (distribution on next slide)
  • avg. segment length: 1.6 turns
  • To do: more stats
  • Identify trouble spots
  • Correlation of success to various indicators
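
The two derived figures on this slide follow directly from the counts. The sketch below recomputes them and shows one way to extract segments, assuming a segment means a maximal run of consecutive non-understandings within a session; that definition and the function name are assumptions, not taken from the talk.

```python
def segment_lengths(flags):
    """Lengths of maximal runs of consecutive non-understandings.

    `flags` holds one boolean per user turn in a session, True when the
    turn was a non-understanding (assumed representation).
    """
    lengths, run = [], 0
    for is_nu in flags:
        if is_nu:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # close a run that lasts until the end of the session
        lengths.append(run)
    return lengths


# Counts reported on the slide.
utterances = 1040
non_understandings = 274
segments = 172

nu_rate = non_understandings / utterances            # fraction of utterances
avg_segment_length = non_understandings / segments   # turns per segment

if __name__ == "__main__":
    print(f"non-understanding rate: {nu_rate:.1%}")
    print(f"average segment length: {avg_segment_length:.1f} turns")
```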

16
random baseline (preliminary)
17
random baseline (preliminary)
18
random baseline (preliminary)
19
random baseline (preliminary)
20
confidence intervals
21
experimental design
  • Goal
  • How well does random do? Preliminary results
  • Variables
  • System / Setup
  • Participants
  • Tasks
  • Potential outcomes, alternatives, discussion

22
variables
  • Independent variable: recovery policy
  • 2 levels: random and human
  • 3 levels? expert-designed policy?
  • Dependent variable: recovery performance
  • Evaluating efficiencies of each strategy
  • Data requirements are problematic in WoZ
    condition
  • Evaluating global, dialog-level metrics
  • Task completion rates
  • Various statistics of error segments
  • To do: assess data requirements

23
variables (2)
  • Potential confounding variable: response time
  • Wizard response will be slower (how much so?)
  • Compensate?
  • Using distribution of wait times from pilot
    experiments
  • Conditions would be consistent, but both
    different from reality (lowered performance)
  • Don't compensate? (it will presumably lower the
    performance)
  • Hmm... Other ideas?
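
One reading of the "compensate" option is to hold each system response in the random condition for a delay drawn from the wait times measured in pilot wizard sessions. The sketch below shows that idea only; the function name and the sample values are hypothetical, and how the delay would be hooked into RoomLine is not covered.

```python
import random


def sample_delay(pilot_wait_times, rng):
    """Draw one delay (in seconds) from the empirical pilot distribution."""
    if not pilot_wait_times:
        raise ValueError("need at least one pilot measurement")
    return rng.choice(pilot_wait_times)


# Hypothetical pilot measurements in seconds (not data from the talk).
PILOT_WAIT_TIMES = [1.2, 1.8, 2.5, 3.1, 4.0]

if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_delay(PILOT_WAIT_TIMES, rng) for _ in range(3)])
```

Sampling from the raw measurements keeps the shape of the observed distribution without assuming any parametric form.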

24
experimental design
  • Goal
  • How well does random do? Preliminary results
  • Variables
  • System / Setup
  • Participants
  • Tasks
  • Potential outcomes, alternatives, discussion

25
system setup
  • Random condition
  • RoomLine current system
  • Wizard condition
  • RoomLine guides all interaction, except for the
    non-understanding recovery decisions → wizard
  • Physical setup: all in speech lab, wizard @ rack
  • noise conditions okay?
  • Alternative: for random condition, call from home
  • can be done for both between and within-subjects
  • are there other confounding variables? (phone
    line?)

26
system setup / strategies
  • Notify non-understanding
  • Repeat prompt / w. notify
  • Ask repeat / w. notify
  • Ask rephrase / w. notify
  • Help: state dependent / w. notify
  • Help: you can say / w. notify
  • Help: full help / w. notify
  • To do: add alternative plans
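
For reference, a random policy over the seven strategies listed above could look like the sketch below. It assumes the choice is uniform and ignores the dialog state, and it does not model the "w. notify" variants; the enum and function names are hypothetical.

```python
import random
from enum import Enum


class Strategy(Enum):
    """The seven recovery strategies from the slide (names are illustrative)."""
    NOTIFY = "notify non-understanding"
    REPEAT_PROMPT = "repeat prompt"
    ASK_REPEAT = "ask repeat"
    ASK_REPHRASE = "ask rephrase"
    HELP_STATE = "help: state dependent"
    HELP_YOU_CAN_SAY = "help: you can say"
    HELP_FULL = "help: full help"


def random_policy(rng):
    """Pick a recovery strategy uniformly at random (assumed distribution)."""
    return rng.choice(list(Strategy))


if __name__ == "__main__":
    rng = random.Random(0)
    print([random_policy(rng).value for _ in range(3)])
```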

27
system setup / who is the wizard
  • Me?
  • Pros: already familiar with the process
  • Cons: might already be biased in various ways
  • does bias matter if I'm trying to do my best?
  • should I avoid biasing myself?
  • or should I actively try and do my homework?
  • Someone else?
  • Cons: will have to train, explain
  • Multiple wizards?
  • Would probably be the way to go, but too expensive

28
system setup / what should the wizard see?
  • Full Knowledge
  • audio
  • recognition results, conf scores, etc
  • parsing results
  • non-understanding type
  • System Knowledge
  • no audio, only what the system knows
  • that seems like a hard task for a human

29
experimental design
  • Goal
  • How well does random do? Preliminary results
  • Variables
  • System / Setup
  • Participants
  • Tasks
  • Potential outcomes, alternatives, discussion

30
participants / data
  • 100 trials / strategy (0.15 conf interval) →
    200 sessions for each condition (this is @ 7
    strategies)
  • Within subjects (?)
  • 40 users, 5 sessions in each condition
    (randomized)
  • Between subjects (?)
  • 2x20 users, 10 sessions
  • 20 in random condition: can they call from home?
  • System could still have simulated response delay
    (?)
  • Balance for gender, computer-savviness (?)
  • Anything else?
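
With 7 strategies, 100 trials per strategy means 700 recovery attempts per condition. The slide does not say how the 0.15 interval was obtained, so the sketch below does not try to reproduce it; it only shows how the width of a confidence interval for a success rate depends on the number of trials, using the normal-approximation (Wald) interval as an assumed method. Function names are hypothetical.

```python
import math


def wald_half_width(p, n, z=1.96):
    """Half-width of the Wald interval for a proportion p estimated from n trials."""
    return z * math.sqrt(p * (1.0 - p) / n)


def trials_for_half_width(p, half_width, z=1.96):
    """Smallest n whose Wald half-width does not exceed `half_width`."""
    return math.ceil(z * z * p * (1.0 - p) / (half_width * half_width))


if __name__ == "__main__":
    # Worst case for the width is p = 0.5.
    for p in (0.5, 0.7, 0.9):
        print(p, round(wald_half_width(p, 100), 3))
    print(trials_for_half_width(0.5, 0.1))
```

The Wald interval is a rough guide only; it is known to behave poorly for small samples or success rates near 0 or 1.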

31
experimental design
  • Goal
  • How well does random do? Preliminary results
  • Variables
  • System / Setup
  • Participants
  • Tasks
  • Potential outcomes, alternatives, discussion

32
tasks
  • 5/10 scenarios (out of a pool of multiple?)
  • How does one design those?
  • Any papers? Any rules?
  • Use graphical representation? to avoid lexical
    entrainment
  • 2 free interactions, 1 @ beginning, 1 @ end
  • Briefing
  • Debriefing: SASSI

33
experimental design
  • Goal
  • How well does random do? Preliminary results
  • Variables
  • System / Setup
  • Participants
  • Tasks
  • Potential outcomes, alternatives, discussion

34
outcomes / when wizard knows all
  • There is a statistically significant improvement
  • We have a frame for learning
  • There's space for improvement given this set of
    strategies
  • But we can't really claim an upper baseline!
  • Can use data for further analysis
  • correlation of indicators to strategy invocation
    success
  • There is no statistically significant difference
  • Not guaranteed what that means
  • Is the set of strategies too inefficient?
  • Are strategies insensitive to conditions?
  • Is task too complex for a human? (least likely)

35
outcomes / when wizard knows system
  • There is a statistically significant improvement
  • That result is even stronger than before
  • There is no statistically significant difference
  • Probably task is inappropriate for a human, but
    other explanations could be valid, too

36
most likely plan (as of before this talk)
  • wizard has full audio
  • i am the wizard
  • train myself
  • add the alternative plan strategy
  • between-subjects experiments

37
most likely plan (as of now)

38
alternative directions
POLICY
Observables / Indicators
Strategies
True causes
True causes
  • Concentrate more on strategies
  • A comparative experiment to assess the benefits
    of having more strategies

39
alternative directions
POLICY
True causes
Observables / Indicators
Strategies
?
  • Different approach
  • Infer true causes and use a simple policy

40
conclusion next time