roadmap to a thesis proposal - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Increased Robustness in Spoken Dialog Systems
  • (roadmap to a thesis proposal)
  • Dan Bohus,
  • SPHINX Lunch, May 2003

2
The problem
  • S: Are you a registered user ?
  • U: No I'm not. No [NO I'M NOT NO]
  • S: What is your full name ?
  • U: Adam Schumacher [ATHENS IN AKRON]
  • S: There is an Athens in Georgia and in Greece.
    Which destination did you want ?
  • U: Neither [NEITHER]
  • S: I'm sorry, I didn't catch that. There is an
    Athens in Georgia and in Greece. Which
    destination did you want ?
  • U: Georgia [GEORGIA]
  • S: A flight from Athens... Where do you want to
    go ?
  • U: Start over [START OVER]
  • S: Do you really want to start over ?
  • U: Yes [YES]
  • S: What city are you leaving from ?
  • U: Hamilton, Ontario [HILTON ONTARIO]
  • S: Sorry, I'm not sure I understood what you
    said. Where are you leaving from ?
  • U: Hamilton [HILTON]
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from ?
  • U: Toronto [TORONTO]

3
Is this really a big problem?
  • Lots of anecdotal evidence
  • And some statistics
  • CMU Communicator
  • 66% of the sessions contain a serious
    misunderstanding
  • 26% result in a complete breakdown in interaction
  • The remaining 40% are still frustrating experiences

4
More statistics
  • USC study: Shin et al (1)
  • Labeled errors and user behavior on the
    Communicator (multi-site) corpus
  • Average: 1.66 error segments/session
  • 78% of error segments get back on track
  • 37% of the sessions have errors leading to a
    complete breakdown in interaction

5
Yet more statistics
  • Utterance-level understanding error rates
  • CMU Communicator: 32.4% → 66% of sess.
  • Rudnicky, Bohus et al (2)
  • CU Communicator: 27.5% → ? of sess.
  • Segundo (3)
  • HMIHY (AT&T): 36.5% → ? of sess.
  • Walker (4)
  • Jupiter (MIT): 28.5% → ? of sess.
  • Hazen (5)

6
It is a significant problem !
  • Roughly


10-30% lead to interaction breakdowns
60-70% contain misunderstandings
7
Goal of proposed work

[Chart: reduce interaction breakdowns and sessions containing misunderstandings]
8
Outline
  • The problem
  • Sources of the problem
  • The approach
  • Infrastructure: the RavenClaw framework
  • Proposed work, in detail
  • Discussion

9
The problems in more detail
  • S: Are you a registered user ?
  • U: No I'm not. No [NO I'M NOT NO]
  • S: What is your full name ?
  • U: Adam Schumacher [ATHENS IN AKRON]
  • S: There is an Athens in Georgia and in Greece.
    Which destination did you want ?
  • U: Neither [NEITHER]
  • S: I'm sorry, I didn't catch that. There is an
    Athens in Georgia and in Greece. Which
    destination did you want ?
  • U: Georgia [GEORGIA]
  • S: A flight from Athens... Where do you want to
    go ?
  • U: Start over [START OVER]
  • S: Do you really want to start over ?
  • U: Yes [YES]
  • S: What city are you leaving from ?
  • U: Hamilton, Ontario [HILTON ONTARIO]
  • S: Sorry, I'm not sure I understood what you
    said. Where are you leaving from ?
  • U: Hamilton [HILTON]
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from ?
  • U: Toronto [TORONTO]

10
Three contributing factors
  • 1. Low accuracy of speech recognition
  • 2. Inability to assess reliability of beliefs
  • 3. Lack of efficient error recovery and
    prevention mechanisms

11
Factor 1: Low recognition accuracy
  • ASR is still imperfect at best
  • Variability: environmental, speaker
  • 10-30% WER in spoken language systems
  • Tradeoff: Accuracy vs. System Flexibility
  • Effect: Main source of errors in SDS
  • WER → most important predictor of user
    satisfaction: Walker et al (6,7)
  • Users prefer less flexible, more accurate systems:
    Walker et al (8)

12
Factor 2: Inability to assess reliability of
beliefs
  • Errors typically propagate to the upper levels of
    the system, leading to:
  • Non-understandings
  • Misunderstandings
  • Effect: Misunderstandings are taken as facts and
    acted upon
  • At best: extra turns, user-initiated repairs,
    frustration
  • At worst: complete breakdown in interaction

13
Factor 3: Lack of recovery mechanisms
  • Small number of strategies
  • Implicit and explicit verifications most popular
  • Sub-optimal implementations
  • Triggered in an ad-hoc / heuristic manner
  • Problem is often regarded as an add-on
  • Non-uniform, domain-specific treatment
  • Effect: Systems prone to complete breakdowns in
    interaction

14
Outline
  • The problem
  • Sources of the problem
  • The approach
  • Infrastructure: the RavenClaw framework
  • Proposed work, in detail
  • Discussion

15
Three contributing factors
  • 1. Low accuracy of speech recognition
  • 2. Inability to assess reliability of beliefs
  • 3. Lack of efficient error recovery and
    prevention mechanisms

16
Approach 1
  • 1. Low accuracy of speech recognition
  • 2. Inability to assess reliability of beliefs
  • 3. Lack of efficient error recovery and
    prevention mechanisms

17
Approach 2
  • 1. Low accuracy of speech recognition
  • 2. Inability to assess reliability of beliefs
  • 3. Lack of efficient error recovery and
    prevention mechanisms

18
Why not just fix ASR?
  • ASR performance is improving, but requirements
    are increasing too
  • ASR will not become perfect anytime soon
  • ASR is not the only source of errors
  • Approach 2: ensure robustness under a large
    variety of conditions

19
Proposed solution
  • Assuming the inputs are unreliable

A. Make systems able to assess the reliability of
   their beliefs
B. Optimally deploy a set of error prevention and
   recovery strategies
20
Proposed solution, more precisely
  • Assuming the inputs are unreliable

1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
B. Optimally deploy a set of error prevention and
   recovery strategies
21
Proposed solution, more precisely
  • Assuming the inputs are unreliable

1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
  • Do it in a domain-independent manner !

22
Outline
  • The problem
  • Sources of the problem
  • The approach
  • Infrastructure: the RavenClaw framework
  • Proposed work, in detail
  • Discussion

23
The RavenClaw DM framework
  • Dialog Management framework for complex,
    task-oriented dialog systems
  • Separation between Dialog Task and Generic
    Conversational Skills
  • Developer focuses only on Dialog Task description
  • Dialog Engine automatically ensures a minimum set
    of conversational skills
  • Dialog Engine automatically ensures the grounding
    behaviors

24
RavenClaw architecture
[Diagram: Communicator dialog task tree (agents: Welcome, Login, Travel, Locals, Bye, AskRegistered, GreetUser, GetProfile, Leg1, AskName, DepartLocation, ArriveLocation)]
  • Dialog Task implemented by a hierarchy of agents
  • Information captured in concepts
  • Probability distributions over sets of values
  • Support for belief assessment and grounding
    mechanisms

25
Domain-Independent Grounding
26
RavenClaw-based systems
  • LARRI / Symphony: Language-based Assistant for
    Retrieval of Repair Information
  • IPA / NASA Ames: Intelligent Procedure Assistant
  • BusLine / Let's Go!: Pittsburgh bus route
    information
  • RoomLine: conference room reservation at CMU
  • TeamTalk / 11-754: spoken command and control for
    a team of robots

27
Outline
  • The problem
  • Sources of the problem
  • The approach
  • Infrastructure: the RavenClaw framework
  • Proposed work, in detail
  • Discussion

28
Previous/Proposed Work Overview
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
29
Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
30
Reliability of beliefs
  • Continuously assess reliability of beliefs
  • Two sub-problems
  • Computing the initial confidence in a concept
  • Confidence annotation problem
  • Update confidence based on events in the dialog
  • User reaction to implicit or explicit
    verifications
  • Domain reasoning

31
Confidence annotation
  • Traditionally focused on ASR: Chase (9)
  • More recently, interest in CA geared towards use
    in SDS: Walker (4), Segundo (3), Hazen (5),
    Rudnicky, Bohus et al (2)
  • Utterance-level and concept-level CA
  • Integrating multiple features:
  • ASR: acoustic / LM scores, lattice, n-best
  • Parser: various measures of parse goodness
  • Dialog Management: state, expectations, history,
    etc.
  • 50% relative improvement in classification error
32
Confidence annotation To Do List
  • Improve accuracy even more
  • More features / fewer features / better features
  • Study transferability across domains
  • Q: Can we identify a set of features that
    transfer well?
  • Q: Can we use un- or semi-supervised learning, or
    bootstrap from little data and an annotator in a
    different domain?

33
Confidence updating
  • To my knowledge, not really studied yet!

34
Confidence updating approaches
  • Naïve Bayesian updating
  • Assumptions do not match reality
  • Analytical model
  • Set of heuristic / probabilistic rules
  • Data-driven model
  • Define events as features
  • Learning task:
  • Initial Conf. + E1 + E2 + E3 → Current Conf. (1/0)
  • Bypass confidence updating
  • Keep all events as grounding state indicators
    (doesn't lose that much information)
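One way to read the naïve Bayesian option above: treat each dialog event (e.g. the user not correcting the system after an implicit verification) as independent evidence and update the confidence in a concept via Bayes' rule in odds form. The event likelihoods here are hypothetical placeholders, not learned values.

```python
# Sketch: naive Bayesian confidence updating in odds form.
def update_confidence(conf, p_event_given_correct, p_event_given_incorrect):
    """Return P(correct | event) from a prior confidence and event likelihoods."""
    odds = (conf / (1.0 - conf)) * (p_event_given_correct / p_event_given_incorrect)
    return odds / (1.0 + odds)

# Example: initial confidence 0.6; the user does not correct the system
# after an implicit verification (assumed far likelier if the value is right).
conf = update_confidence(0.6, p_event_given_correct=0.9, p_event_given_incorrect=0.3)
```

Chaining one such update per event is exactly where the "assumptions do not match reality" caveat bites: dialog events are rarely conditionally independent.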

35
Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
36
Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
37
Correction Detection
  • Automatically detect correction sites or aware
    sites at run-time
  • Another data-driven classification task
  • Prosodic features, bag-of-words features, lexical
    markers: Litman (10), Bosch (11), Swerts (12),
    Levow (13)
  • Useful for:
  • implementation of implicit / explicit
    verifications
  • belief assessment / updating
  • as a direct indicator for grounding decisions
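As a toy illustration of the classification task above: flag a user turn as a likely correction from two shallow cues, an explicit correction marker or repetition of words from the user's previous turn. The real detectors cited on the slide learn from prosodic and lexical features; this rule-based stand-in, including its marker list, is purely an invented example.

```python
# Sketch: shallow rule-based correction detection (illustrative only).
CORRECTION_MARKERS = {"no", "not", "wrong", "neither"}
STOPWORDS = {"i", "the", "a"}

def looks_like_correction(turn, previous_user_turn):
    """Flag a user turn as a likely correction of the previous exchange."""
    words = set(turn.lower().split())
    has_marker = bool(words & CORRECTION_MARKERS)
    # Re-saying content words from the prior turn is a common correction cue.
    repeats_self = bool(words & (set(previous_user_turn.lower().split()) - STOPWORDS))
    return has_marker or repeats_self

likely = looks_like_correction("No I said Hamilton", "hamilton ontario")
```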

38
Correction Detection To Do List
  • Build an aware site detector
  • Q: Can we identify what the user is correcting?
  • Study transferability across domains
  • Q: Can we identify a set of features that
    transfer well?
  • Q: Can we use un- or semi-supervised learning, or
    bootstrap from little data and a detector in a
    different domain?

39
Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
40
Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
41
Goodness-of-dialog indicators
  • Assessing how well a conversation is advancing
  • Non-understandings
  • Q: Can we identify the cause?
  • Q: Can we relate a non-understood utterance to a
    dialog expectation?
  • Dialog-state-related indicators / Stay_Here
  • Q: Can we expand this to some distance to an
    optimal dialog trace?
  • Overall confidence in beliefs within a topic
  • Q: How to aggregate? Entropy-based measures?
  • Allow for task-specific metrics of goodness
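To make the entropy question concrete: one candidate aggregate scores a topic by the mean entropy of its concepts' belief distributions (low entropy = confident beliefs, high entropy = uncertain). The concept names and distributions below are invented for illustration; this is one possible measure, not a proposed answer.

```python
# Sketch: entropy-based aggregation of belief confidence within a topic.
import math

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution over values."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def topic_uncertainty(concepts):
    """Mean entropy across a topic's concepts ({name: {value: prob}})."""
    return sum(entropy(d) for d in concepts.values()) / len(concepts)

concepts = {"depart_city": {"Pittsburgh": 0.9, "Hamilton": 0.1},
            "arrive_city": {"Athens, GA": 0.5, "Athens, Greece": 0.5}}
score = topic_uncertainty(concepts)  # the 50/50 belief contributes a full bit
```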

42
Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
43
Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
44
Grounding Actions
  • Design and evaluate a rich set of strategies for
    preventing and recovering from errors (both
    misunderstandings and non-understandings)
  • Current status: few strategies used / analyzed
  • Explicit verification: "Did you say Pittsburgh?"
  • Implicit verification: "Traveling from
    Pittsburgh. When do you want to leave?"

45
Explicit / Implicit Verifications
  • Analysis of user behavior following these 2
    strategies: Krahmer (10), Swerts (11)
  • User behavior is rich; correction detectors are
    important!
  • Design is important!
  • "Did you say Pittsburgh?"
  • "Did you say Pittsburgh? Please respond yes or
    no."
  • "Do you want to fly from Pittsburgh?"
  • Correct implementation / adequate support is
    important!
  • Users discovering errors through implicit
    confirmations are less likely to get back on
    track... hmm

46
Strategies for misunderstandings
  • Explicit verification (w/ variants)
  • Implicit verification (w/ variants)
  • Disambiguation
  • "I'm sorry, are you flying out of Pittsburgh or
    San Francisco?"
  • Rejection
  • "I'm not sure I understood what you said. Can you
    tell me again where you are flying from?"

47
Strategies for non-understandings - I
  • Lexically entrain
  • "Right now I need you to tell me the departure
    city. You can say, for instance, 'I'd like to fly
    from Pittsburgh.'"
  • Ask repeat
  • "I'm not sure I understood you. Can you repeat
    that please?"
  • Ask reformulate
  • "Can you please rephrase that?"
  • Diagnose
  • If the non-understanding source can be
    known/estimated, give that information to the
    user
  • "I can't hear you very well. Can you please speak
    closer to the microphone?"

48
Strategies for non-understandings - II
  • Select alternative plan / domain-specific
    strategies
  • E.g. try to get the state name first, then the
    city name
  • Establish context (+ Confirm context variant)
  • "Right now I'm trying to gather enough
    information to make a room reservation. So far I
    know you want a room on Tuesday. Now I need to
    know for what time you need the room."
  • Give targeted help
  • Give help on the topic / focus of the
    conversation / estimated user goal
  • Constrain language model / recognition

49
Strategies for non-understandings - III
  • Switch input modality (i.e. DTMF, pen, etc)
  • Restart topic / backup dialog
  • Start-over
  • Switch to operator
  • Terminate session

50
Grounding Strategies To Do List
  • Design, implement, analyze, iterate
  • Human-human dialog analysis
  • Design the strategies, with variants and
    appropriate support
  • Implement in the RavenClaw framework
  • Perform data-driven analysis
  • Q: User behaviors
  • Q: Applicability conditions
  • Q: Costs, success rates

51
Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
52
Grounding decision model
  • Decide which is the best grounding action to take
    at a certain time
  • Goals / Desired properties
  • Domain Independent
  • Adaptive
  • Learn and target any dialog performance metric
  • Adjust to large variations in the reliability of
    inputs
  • Accept any new strategies on the fly
  • Scalable

53
Previous work
  • Conversation as action under uncertainty:
    Horvitz (14), Paek (15)
  • Bayesian decision theory with assumed utilities
  • Reinforcement learning in spoken dialog systems:
    Kearns (16), Singh (17), Pieraccini (18),
    Litman (19), Walker (20)
  • Learning dialog policies
  • Heuristic approaches [add refs]
  • Predominant in today's systems

54
Grounding Decision Theoretic Approach
  • Given:
  • A set of states S = {s} and a probabilistic model
    of the state given evidence e, P(s|e) → grounding
    state indicators
  • A set of actions A = {a} → grounding actions
  • A model describing the utility of each action
    from each state, U(s,a) → grounding model
  • Take the action that maximizes expected utility:
  • EU(a|e) = Σ_s U(a,s) P(s|e)
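The decision rule above can be sketched in a few lines: choose the grounding action maximizing EU(a|e) = Σ_s U(a,s) P(s|e). The states, actions, and utility values below are hypothetical illustrations, not figures from the proposal.

```python
# Sketch: maximum-expected-utility selection of a grounding action.
def best_grounding_action(p_state, utility):
    """p_state: {state: P(s|e)}; utility: {(action, state): U(a, s)}."""
    actions = {a for (a, _s) in utility}
    def expected_utility(a):
        return sum(utility[(a, s)] * p for s, p in p_state.items())
    return max(actions, key=expected_utility)

# Example: belief that a captured concept is correct vs. incorrect, choosing
# between no grounding action (NGA) and explicit verification (EV).
p_state = {"correct": 0.8, "incorrect": 0.2}
utility = {("NGA", "correct"): 1.0, ("NGA", "incorrect"): -2.0,
           ("EV", "correct"): 0.5, ("EV", "incorrect"): 0.5}
action = best_grounding_action(p_state, utility)  # EV wins: 0.5 > 0.4
```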

55
The missing ingredient: Utilities
  • Utilities matrix (S x A)
  • Handcraft
  • Learn from data

56
Learning utilities
  • Essentially a POMDP problem
  • Hidden state
  • Belief dictated by grounding state indicator
    models
  • Actions
  • Strategies
  • Rewards
  • Targeted optimization measures

[Diagram: state-action lattice over grounding actions (EV, IC, IV, NGA) and states (C, U)]
57
A possible overall architecture
  • 2 types of grounding models
  • Dealing with misunderstandings, one grounding
    model per concept
  • Dealing with non-understandings, one grounding
    model per agent

[Diagram: Communicator dialog task tree, as on slide 24]
58
A possible overall architecture
  • Q: How to combine the decisions?
  • Identify a small set of rules
  • E.g. concepts first, then agents; focused-to-top
  • Hierarchical POMDP approaches → Roy, Pineau,
    Thrun

[Diagram: Communicator dialog task tree, as on slide 24]
59
A possible overall architecture
  • Q: Formulate a parallel learning problem
  • Large numbers of small models are good in
    principle
  • Need to clearly identify assumptions
  • Or a hierarchical learning problem

[Diagram: Communicator dialog task tree, as on slide 24]
60
Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
61
Evaluation

[Chart: reduce interaction breakdowns and sessions containing misunderstandings]
62
Evaluation
  • Evaluate the proposed framework across a large
    variety of domains
  • RoomLine, BusLine, LARRI, TeamTalk, etc.
  • Grounding state indicators: evaluation
  • Internal metrics, e.g. accuracy, etc.
  • Grounding strategies: analysis
  • Empirical analysis
  • Quantitative assessments: costs, success rates
  • Qualitative insights: user behaviors, best
    variants

63
Evaluation
  • Grounding model / framework evaluation (in terms
    of the chosen performance metric)
  • Against an expert heuristic strategy
  • Against a smaller number of strategies
  • Against a non-adaptive system

64
? and !