Title: A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces


1
A Scalable Reinforcement Learning Approach to
Error Handling in Spoken Language Interfaces
  • Dan Bohus
  • www.cs.cmu.edu/dbohus
  • dbohus_at_cs.cmu.edu
  • Computer Science Department
  • Carnegie Mellon University
  • Pittsburgh, PA, 15217

2
problem
spoken language interfaces lack robustness when
faced with understanding errors.
3
more concretely
  • S: What city are you leaving from?
  • U: Urbana Champaign [OKAY IN THAT SAME PAY]
  • S: Sorry, I'm not sure I understood what you said.
    What city are you leaving from?
  • U: Urbana Champaign [FOR MINUTE SINCE HEY]
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from?
  • U: Chicago [CHICAGO]
  • S: traveling from Chicago. Where would you like
    to go?
  • U: Huntsville [SEOUL]
  • S: traveling to Seoul. What day did you need to
    travel?
  • U: no no I'm traveling to Birmingham [THE
    TRAVELING TO BERLIN P_M]
  • S: traveling in the afternoon. Okay what day
    would you be departing chicago
  • U: /uh/ the tenth of august [AT THE TENTH OF
    AUGUST]
  • S: a flight tomorrow... I have a Korean Airlines
    flight departing Chicago at 1:40pm, arrives Seoul
    at 5pm the next day. Do you want to take that?
  • U: my destination is Birmingham [FLIGHT
    DESTINATION MR WEEKEND]
  • S: traveling on Saturday, August 12th... I have a
    flight departing Chicago at 1:40pm, arrives Seoul
    at

4
problem source
  • stems mostly from speech recognition
  • spans most domains and interaction types
  • exacerbated by operating conditions
  • spontaneous speech
  • medium / large vocabularies
  • large, varied, and changing user populations

5
speech recognition impact
  • typical word-error-rates
  • 10-20% for natives (novice users)
  • 40% and above for non-native users
  • significant negative impact on performance
    [Walker, Sanders]

[Figure: task success vs. word-error-rate]
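
Word-error-rate is conventionally the word-level edit distance between what was said and what the recognizer returned, divided by the number of words said. A minimal Python sketch, assuming whitespace tokenization and case-insensitive matching (both simplifications); `word_error_rate` is a hypothetical helper, not part of any system in this talk:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One turn from the example dialog, reading the uppercase text as the
# recognizer output (an assumption about the transcript convention).
wer = word_error_rate("the tenth of august", "AT THE TENTH OF AUGUST")
print(wer)  # one inserted word over four spoken words -> 0.25
```

Because insertions count as errors, the rate can exceed 1.0 when the recognizer returns many more words than were spoken.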
6
approaches for increasing robustness
  • fix recognition
  • gracefully handle errors through interaction
  • detect the problems
  • develop a set of recovery strategies
  • know how to choose between them (policy)

a closer look RL in spoken dialog systems
current challenges RL for error handling
7
outline
  • a closer look at the problem
  • RL in spoken dialog systems
  • current challenges
  • a proposed RL approach for error handling

8
non- and misunderstandings
  • S: What city are you leaving from?
  • U: Urbana Champaign [OKAY IN THAT SAME PAY]
  • S: Sorry, I'm not sure I understood what you said.
    What city are you leaving from?
  • U: Urbana Champaign [FOR MINUTE SINCE HEY]
  • S: I'm still having trouble understanding you. To
    go on, I need you to answer the following
    question. What city are you leaving from?
  • U: Chicago [CHICAGO]
  • S: traveling from Chicago. Where would you like
    to go?
  • U: Huntsville [SEOUL]
  • S: traveling to Seoul. What day did you need to
    travel?
  • U: no no I'm traveling to Birmingham [THE
    TRAVELING TO BERLIN P_M]
  • S: traveling in the afternoon. Okay what day
    would you be departing chicago
  • U: /uh/ the tenth of august [AT THE TENTH OF
    AUGUST]
  • S: a flight tomorrow... I have a Korean Airlines
    flight departing Chicago at 1:40pm, arrives Seoul
    at 5pm the next day. Do you want to take that?
  • U: my destination is Birmingham [FLIGHT
    DESTINATION MR WEEKEND]
  • S: traveling on Saturday, August 12th... I have a
    flight departing Chicago at 1:40pm, arrives Seoul
    at

9
six not-so-easy pieces
  • recognition or semantic confidence scores
  • typically trivial; some exceptions may apply
  • explicit confirmation: "Did you say 10am?"
  • implicit confirmation: "Starting at 10am...
    until what time?"
  • accept, reject
  • "Sorry, I didn't catch that"
  • "Can you repeat that?"
  • "Can you rephrase that?"
  • "You can say something like 'at 10 a.m.'"
  • MoveOn
  • handcrafted heuristics: first notify, then ask
    repeat, then give help, then give up
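
The handcrafted baseline on this slide can be sketched as two small rules: a confidence-threshold rule for choosing among accept, implicit confirmation, explicit confirmation and reject, and a fixed escalation sequence for non-understandings. The threshold values and function names below are assumptions made for illustration; the slide does not give them.

```python
# Hypothetical thresholds on a confidence score in [0, 1]; not from the talk.
REJECT_BELOW = 0.3
EXPLICIT_BELOW = 0.6
IMPLICIT_BELOW = 0.9

# Escalation order taken from the slide: notify, ask repeat, give help, give up.
NONUNDERSTANDING_LADDER = ("notify", "ask_repeat", "give_help", "give_up")


def misunderstanding_action(confidence: float) -> str:
    """Pick a confirmation strategy from a recognition confidence score."""
    if confidence < REJECT_BELOW:
        return "reject"
    if confidence < EXPLICIT_BELOW:
        return "explicit_confirm"
    if confidence < IMPLICIT_BELOW:
        return "implicit_confirm"
    return "accept"


def nonunderstanding_action(consecutive_failures: int) -> str:
    """Escalate one step per consecutive non-understanding, ending at give_up."""
    if consecutive_failures < 1:
        raise ValueError("expected at least one non-understanding")
    step = min(consecutive_failures, len(NONUNDERSTANDING_LADDER)) - 1
    return NONUNDERSTANDING_LADDER[step]


print(misunderstanding_action(0.45))   # explicit_confirm
print(nonunderstanding_action(3))      # give_help
```

Rules of this kind are easy to write, which is why the question of whether a learned policy can do better than them is the interesting one.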
10
outline
  • a closer look at the problem
  • RL in spoken dialog systems
  • current challenges
  • a proposed RL approach for error handling

11
spoken dialog system architecture
Language Understanding
Dialog Manager
Domain Back-end
Language Generation
12
reinforcement learning in dialog systems
  • debate over design choices
  • learn choices using reinforcement learning
  • agent interacting with an environment
  • noisy inputs
  • temporal / sequential aspect
  • task success / failure

Language Understanding
noisy semantic input
Dialog Manager
Domain Back-end
actions (semantic output)
Language Generation
13
NJFun
  • Optimizing Dialog Management with Reinforcement
    Learning: Experiments with the NJFun System
  • Singh, Litman, Kearns, Walker
  • provides information about fun things to do in
    New Jersey
  • slot-filling dialog
  • type-of-activity
  • location
  • time
  • provide information from a database

14
NJFun as an MDP
  • define state-space
  • define action-space
  • define reward structure
  • collect data for training; learn policy
  • evaluate learned policy

15
NJFun as an MDP: state-space
  • internal system state: 14 variables
  • state for RL → vector of 7 variables
  • greet: has the system greeted the user
  • attribute: which attribute the system is
    currently querying
  • confidence: recognition confidence level (binned)
  • value: value has been obtained for current
    attribute
  • tries: how many times the current attribute was
    asked
  • grammar: non-restrictive or restrictive grammar
    was used
  • history: was there any trouble on previous
    attributes
  • 62 different states
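
One way to hold the 7-variable state for a tabular policy is an immutable record that can serve as a dictionary key. This is only a sketch: the field names follow the slide, but the Python types and value encodings are assumptions, and `DialogState` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so usable as dict keys
class DialogState:
    greet: bool        # has the system greeted the user
    attribute: str     # attribute currently being queried
    confidence: int    # binned recognition confidence (bin index is assumed)
    value: bool        # has a value been obtained for the current attribute
    tries: int         # how many times the current attribute was asked
    grammar: str       # "restrictive" or "non-restrictive"
    history: bool      # was there trouble on previous attributes


# A tabular policy is then a plain mapping from state to action.
policy = {}
state = DialogState(
    greet=True, attribute="location", confidence=1, value=False,
    tries=0, grammar="non-restrictive", history=False,
)
policy[state] = "explicit confirmation"

# An equal state built later finds the same table entry.
same_state = DialogState(True, "location", 1, False, 0, "non-restrictive", False)
print(policy[same_state])  # explicit confirmation
```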

16
NJFun as an MDP: actions & rewards
  • type of initiative (3 types)
  • system initiative
  • mixed initiative
  • user initiative
  • confirmation strategy (2 types)
  • explicit confirmation
  • no confirmation
  • resulting MDP has only 2 action choices per state
  • reward: binary task success

17
NJFun as an MDP: learning a policy
  • training data: 311 complete dialogs
  • collected using exploratory policy
  • learned the policy using value iteration
  • begin with user initiative
  • back-off to mixed or system initiative when
    re-asking for an attribute
  • specific type of back-off is different for
    different attributes
  • confirm when confidence is low
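
The general recipe on this slide (log dialogs under an exploratory policy, estimate an MDP from the logs, then run value iteration) can be sketched as follows. The states, actions and outcome counts are invented toy data, not NJFun's, and the function names are hypothetical.

```python
from collections import defaultdict


def estimate_model(transitions):
    """Maximum-likelihood MDP from (state, action, reward, next_state) tuples.

    next_state is None when the dialog ended on that turn.
    """
    next_counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s2 in transitions:
        next_counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
    model = {}
    for sa, nexts in next_counts.items():
        n = sum(nexts.values())
        model[sa] = (reward_sum[sa] / n, {s2: c / n for s2, c in nexts.items()})
    return model


def value_iteration(model, gamma=0.95, tol=1e-9, max_iters=1000):
    """Return (state values, greedy policy) for the estimated model."""
    actions = defaultdict(list)
    for s, a in model:
        actions[s].append(a)
    values = {s: 0.0 for s in actions}

    def q(s, a):
        reward, nexts = model[(s, a)]
        # Terminal (None) and never-visited states contribute a value of 0.
        return reward + gamma * sum(p * values.get(s2, 0.0) for s2, p in nexts.items())

    for _ in range(max_iters):
        delta = 0.0
        for s in values:
            best = max(q(s, a) for a in actions[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {s: max(actions[s], key=lambda a: q(s, a)) for s in values}
    return values, policy


# Toy log (made-up counts): reward 1.0 means the task succeeded.
log = (
    [("start", "user_initiative", 0.0, "high_conf")] * 3
    + [("start", "user_initiative", 0.0, "low_conf")]
    + [("start", "system_initiative", 0.0, "low_conf")] * 3
    + [("start", "system_initiative", 0.0, "high_conf")]
    + [("low_conf", "confirm", r, None) for r in (1.0, 1.0, 1.0, 0.0)]
    + [("low_conf", "no_confirm", r, None) for r in (1.0, 0.0, 0.0, 0.0)]
    + [("high_conf", "confirm", r, None) for r in (1.0, 1.0, 1.0, 0.0)]
    + [("high_conf", "no_confirm", r, None) for r in (1.0, 1.0, 1.0, 1.0)]
)
values, policy = value_iteration(estimate_model(log))
print(policy)
```

With these invented counts the greedy policy confirms only when confidence is low, which mirrors the qualitative shape of the learned policy described above without reproducing its data.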

18
NJFun as an MDP: evaluation
  • evaluated policy on 124 testing dialogs
  • task success rate: 52% → 64%
  • weak task completion: 1.72 → 2.18
  • subjective evaluation: no significant
    improvements, but move-to-the-mean effect
  • learned policy better than hand-crafted policies
  • comparatively evaluated policies on learned MDP

19
outline
  • a closer look at the problem
  • RL in spoken dialog systems
  • current challenges
  • a proposed RL approach for error handling

20
challenge 1: scalability
  • contrast NJFun with RoomLine
  • conference room reservation and scheduling
  • mixed-initiative task-oriented interaction
  • system obtains list of rooms matching initial
    constraints
  • system negotiates with user to identify room that
    best matches their needs
  • 37 concepts (slots), 25 questions that can be
    asked
  • another example: LARRI
  • full-blown MDP is intractable
  • not clear how to do state-abstraction
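
A back-of-the-envelope count shows why a full-blown MDP is intractable here. The sketch assumes each of RoomLine's 37 concepts is tracked with only 3 abstract values; that figure is an illustrative assumption, not something stated in the talk.

```python
def joint_state_count(num_concepts: int, values_per_concept: int) -> int:
    """Number of joint states when every concept varies independently."""
    return values_per_concept ** num_concepts


ROOMLINE_CONCEPTS = 37          # from the slide
ASSUMED_VALUES_PER_CONCEPT = 3  # hypothetical, e.g. empty / uncertain / grounded
NJFUN_STATES = 62               # from the NJFun state-space slide

full = joint_state_count(ROOMLINE_CONCEPTS, ASSUMED_VALUES_PER_CONCEPT)
print(f"joint states: {full:.2e}")
print(f"ratio to NJFun's state count: {full / NJFUN_STATES:.2e}")
```

Even under this coarse assumption the joint space is on the order of 10^17 states, far beyond what a few hundred training dialogs could cover.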

21
challenge 2: reusability
  • underlying MDP is system-specific
  • MDP design still requires a lot of human
    expertise
  • new MDP for each system
  • new training, new evaluation
  • are we really saving time & expertise?
  • maybe we're asking for too much?

22
addressing the scalability problem
  • approach 1: user models / simulations
  • costly to obtain real data → simulate
  • simplistic simulators [Eckert, Levin]
  • more complex, task-specific simulators [Scheffler
    & Young]
  • real-world evaluation becomes paramount
  • approach 2: value function approximation
  • data-driven state abstraction / state aggregation
    [Denecke]

23
outline
  • a closer look at the problem
  • RL in spoken dialog systems
  • current challenges
  • a proposed RL approach for error handling

24
reinforcement learning in dialog systems
Language Understanding
semantic input
Dialog Manager
Domain Back-end
actions / semantic output
Language Generation
  • Focus RL only on the difficult decisions!

25
task-decoupled approach
  • use reinforcement learning
  • decouple
  • use your favorite DM framework
  • advantages
  • reduces the size of the learning problem
  • favors reusability of learned policies
  • lessens system authoring effort

26
RavenClaw
Dialogue Task (Specification)
Domain-Independent Dialogue Engine
27
decision process architecture
RoomLine
Login
Welcome
GreetUser
Gating Mechanism
AskRegistered
AskName
  • Favors reusability of policies
  • Initial policies can be easily handcrafted
  • Small-size models
  • Parameters can be tied across models
  • Accommodate dynamic task generation
  • Independence assumption
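
A gating mechanism over several small MDPs can be sketched as follows. The slide does not say how the gate chooses, so the rule used here (forward the single highest-utility proposal and ignore `NoAct`) is an assumption, as are the `Proposal` type and the utility numbers.

```python
from typing import NamedTuple, Optional, Sequence


class Proposal(NamedTuple):
    source: str     # which small MDP made the proposal, e.g. a task-tree node
    action: str     # proposed error-handling action
    utility: float  # that MDP's own estimate of the action's value


def gate(proposals: Sequence[Proposal]) -> Optional[Proposal]:
    """Let at most one proposed action through per turn.

    Assumed rule: drop NoAct proposals, then take the highest utility.
    Returns None when no MDP wants to act.
    """
    candidates = [p for p in proposals if p.action != "NoAct"]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.utility)


# Node names come from the slide's task tree; the numbers are invented.
turn = [
    Proposal("AskRegistered", "ExplConf", 0.4),
    Proposal("AskName", "ImplConf", 0.7),
    Proposal("GreetUser", "NoAct", 0.9),
]
chosen = gate(turn)
print(chosen.source, chosen.action)  # AskName ImplConf
```

Keeping each model small and letting the gate arbitrate is what makes the per-model learning problems tractable, at the cost of the independence assumption listed above.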

28
reward structure & learning
Global, post-gate rewards
Local rewards
[Figure: two diagrams of MDPs, Actions, Gating Mechanism and Rewards]
  • Rewards based on any dialogue performance metric
  • Atypical, multi-agent reinforcement learning
    setting
  • Multiple, standard RL problems
  • Risk solving local problems, but not the global
    one

29
conclusion
  • reinforcement learning: a very appealing approach
    for dialog control
  • in practical systems, scalability is a big issue
  • how to leverage knowledge we have?
  • state-space design
  • solutions that account for or handle sparse data
  • bounds on policies
  • hierarchical models

30
thank you!
31
Structure of Individual MDPs
  • Concept MDPs
  • State-space: belief indicators
  • Action-space: concept scoped system actions

[Figure: concept MDP with state labels HC, MC, LC, 0 and action labels ExplConf, ImplConf, NoAct]
  • Topic MDPs
  • State-space: non-understanding, dialogue-on-track
    indicators
  • Action-space: non-understanding actions,
    topic-level actions
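
One reading of the concept MDP labels on this slide is a four-state model in which the state is a binned belief about the concept's value. In the sketch below, treating HC/MC/LC as high/medium/low confidence and 0 as "no value yet", the per-state action sets, and the bin thresholds are all assumptions.

```python
from typing import Optional

# Assumed action sets: confirmations only make sense once a value exists.
CONCEPT_ACTIONS = {
    "0": ("NoAct",),
    "LC": ("ExplConf", "ImplConf", "NoAct"),
    "MC": ("ExplConf", "ImplConf", "NoAct"),
    "HC": ("ExplConf", "ImplConf", "NoAct"),
}


def belief_to_state(belief: Optional[float], low: float = 0.4, high: float = 0.8) -> str:
    """Map a belief in [0, 1] (or None for an empty concept) to a state label.

    The thresholds `low` and `high` are hypothetical.
    """
    if belief is None:
        return "0"
    if belief < low:
        return "LC"
    if belief < high:
        return "MC"
    return "HC"


state = belief_to_state(0.55)
print(state, CONCEPT_ACTIONS[state])  # MC ('ExplConf', 'ImplConf', 'NoAct')
```

A model this small has only ten state-action pairs, which is what allows the same structure, and potentially the same parameters, to be reused for every concept.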