Title: A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces
- Dan Bohus
- www.cs.cmu.edu/dbohus
- dbohus_at_cs.cmu.edu
- Computer Science Department
- Carnegie Mellon University
- Pittsburgh, PA, 15217
2. problem
spoken language interfaces lack robustness when faced with understanding errors.
3. more concretely
- S: What city are you leaving from?
- U: Urbana Champaign [OKAY IN THAT SAME PAY]
- S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
- U: Urbana Champaign [FOR MINUTE SINCE HEY]
- S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
- U: Chicago [CHICAGO]
- S: traveling from Chicago. Where would you like to go?
- U: Huntsville [SEOUL]
- S: traveling to Seoul. What day did you need to travel?
- U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
- S: traveling in the afternoon. Okay, what day would you be departing Chicago?
- U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
- S: a flight tomorrow. I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
- U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
- S: traveling on Saturday, August 12th. I have a flight departing Chicago at 1:40pm, arrives Seoul at
4. problem source
- stems mostly from speech recognition
- spans most domains and interaction types
- exacerbated by operating conditions
  - spontaneous speech
  - medium / large vocabularies
  - large, varied, and changing user populations
5. speech recognition impact
- typical word-error-rates
  - 10-20% for natives (novice users)
  - 40% and above for non-native users
- significant negative impact on performance [Walker, Sanders]
[Figure: task success vs. word-error-rate]
6. approaches for increasing robustness
- gracefully handle errors through interaction
  - detect the problems
  - develop a set of recovery strategies
  - know how to choose between them (policy)
a closer look RL in spoken dialog systems
current challenges RL for error handling
7. outline
- a closer look at the problem
- RL in spoken dialog systems
- current challenges
- a proposed RL approach for error handling
8. non- and misunderstandings
- S: What city are you leaving from?
- U: Urbana Champaign [OKAY IN THAT SAME PAY]
- S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
- U: Urbana Champaign [FOR MINUTE SINCE HEY]
- S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
- U: Chicago [CHICAGO]
- S: traveling from Chicago. Where would you like to go?
- U: Huntsville [SEOUL]
- S: traveling to Seoul. What day did you need to travel?
- U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
- S: traveling in the afternoon. Okay, what day would you be departing Chicago?
- U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
- S: a flight tomorrow. I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
- U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
- S: traveling on Saturday, August 12th. I have a flight departing Chicago at 1:40pm, arrives Seoul at
9. six not-so-easy pieces
- recognition or semantic confidence scores
- typically trivial (some exceptions may apply)
- explicit confirmation: "Did you say 10am?"; implicit confirmation: "Starting at 10am... until what time?"; accept, reject
- "Sorry, I didn't catch that"; "Can you repeat that?"; "Can you rephrase that?"; "You can say something like 'at 10 a.m.'"; MoveOn
- handcrafted heuristics: first notify, then ask repeat, then give help, then give up
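The strategies and the handcrafted heuristic listed on this slide can be sketched as code. This is only an illustration: the confidence thresholds (0.3, 0.6, 0.9) and all function names are assumptions made for the example, not values or APIs from the talk. The escalation order is the one named above (notify, ask repeat, give help, give up).

```python
# Illustrative sketch only: thresholds and names are assumptions,
# not values or APIs taken from the slides.

def misunderstanding_action(confidence, reject_below=0.3,
                            explicit_below=0.6, implicit_below=0.9):
    """Map a recognition/semantic confidence score in [0, 1] to a strategy."""
    if confidence < reject_below:
        return "reject"
    if confidence < explicit_below:
        return "explicit_confirmation"
    if confidence < implicit_below:
        return "implicit_confirmation"
    return "accept"

# Escalation order of the handcrafted heuristic:
# first notify, then ask repeat, then give help, then give up.
NON_UNDERSTANDING_LADDER = ["notify", "ask_repeat", "give_help", "give_up"]

def non_understanding_action(consecutive_failures):
    """Pick the recovery step for the n-th non-understanding in a row (n >= 1)."""
    step = min(max(consecutive_failures, 1), len(NON_UNDERSTANDING_LADDER))
    return NON_UNDERSTANDING_LADDER[step - 1]
```

A fixed ladder like this is exactly the kind of hand-tuned policy that a learned policy would replace.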
11. spoken dialog system architecture
[Diagram: Language Understanding, Dialog Manager, Domain Back-end, Language Generation]
12. reinforcement learning in dialog systems
- debate over design choices
- learn choices using reinforcement learning
- agent interacting with an environment
- noisy inputs
- temporal / sequential aspect
- task success / failure
[Diagram: Language Understanding, noisy semantic input, Dialog Manager, Domain Back-end, actions (semantic output), Language Generation]
13. NJFun
- "Optimizing Dialog Management with Reinforcement Learning: Experiments with the NJFun System" [Singh, Litman, Kearns, Walker]
- provides information about fun things to do in New Jersey
- slot-filling dialog
  - type-of-activity
  - location
  - time
- provide information from a database
14. NJFun as an MDP
- define state-space
- define action-space
- define reward structure
- collect data for training; learn policy
- evaluate learned policy
15. NJFun as an MDP: state-space
- internal system state: 14 variables
- state for RL -> vector of 7 variables
  - greet: has the system greeted the user
  - attribute: which attribute the system is currently querying
  - confidence: recognition confidence level (binned)
  - value: value has been obtained for current attribute
  - tries: how many times the current attribute was asked
  - grammar: non-restrictive or restrictive grammar was used
  - history: was there any trouble on previous attributes
- 62 different states
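The 7-variable state vector described on this slide could be represented as an immutable, hashable record so it can index a value table. This is a minimal sketch: the field types, the attribute and grammar labels, and the confidence bin edges are assumptions for illustration, since the slide does not give the value ranges.

```python
from bisect import bisect_right
from typing import NamedTuple

class RLState(NamedTuple):
    """Hypothetical encoding of the 7 state variables listed above."""
    greet: bool       # has the system greeted the user
    attribute: str    # attribute currently being queried
    confidence: int   # binned recognition confidence (bin index)
    value: bool       # has a value been obtained for the current attribute
    tries: int        # how many times the current attribute was asked
    grammar: str      # "restrictive" or "non-restrictive" (labels assumed)
    history: bool     # was there trouble on previous attributes

def bin_confidence(score, edges=(0.33, 0.66)):
    """Turn a raw confidence score into a bin index; the edges are made up."""
    return bisect_right(edges, score)

state = RLState(greet=True, attribute="location",
                confidence=bin_confidence(0.5), value=False,
                tries=1, grammar="non-restrictive", history=False)

# Tuples are hashable, so a state can be used directly as a table key.
value_table = {state: 0.0}
```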
16. NJFun as an MDP: actions and rewards
- type of initiative (3 types)
  - system initiative
  - mixed initiative
  - user initiative
- confirmation strategy (2 types)
  - explicit confirmation
  - no confirmation
- resulting MDP has only 2 action choices per state
- reward: binary task success
17. NJFun as an MDP: learning a policy
- training data: 311 complete dialogs
  - collected using exploratory policy
- learned the policy using value iteration
  - begin with user initiative
  - back-off to mixed or system initiative when re-asking for an attribute
  - specific type of back-off is different for different attributes
  - confirm when confidence is low
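Learning a policy from logged dialogs with value iteration can be sketched as follows. This is a generic, minimal version under stated assumptions: the model is estimated by simple counting, the function names are hypothetical, and the toy log at the end is made up purely to exercise the code (it is not NJFun data).

```python
from collections import defaultdict

def estimate_model(transitions):
    """Estimate mean reward and next-state probabilities per (state, action).

    `transitions` holds (state, action, reward, next_state) tuples;
    next_state is None when the dialog ended.
    """
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s2 in transitions:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
    model = {}
    for sa, nexts in counts.items():
        n = sum(nexts.values())
        model[sa] = (reward_sum[sa] / n, {s2: c / n for s2, c in nexts.items()})
    return model

def q_value(model, V, s, a, gamma):
    mean_reward, probs = model[(s, a)]
    # Terminal (None) or never-visited next states contribute a value of 0.
    return mean_reward + gamma * sum(p * V.get(s2, 0.0) for s2, p in probs.items())

def value_iteration(model, gamma=1.0, sweeps=100):
    """Run a fixed number of sweeps and return (values, greedy policy)."""
    actions = defaultdict(list)
    for s, a in model:
        actions[s].append(a)
    V = {s: 0.0 for s in actions}
    for _ in range(sweeps):
        for s in actions:
            V[s] = max(q_value(model, V, s, a, gamma) for a in actions[s])
    policy = {s: max(actions[s], key=lambda a: q_value(model, V, s, a, gamma))
              for s in actions}
    return V, policy

# Made-up toy log: in a low-confidence state, confirming succeeds 3 times
# out of 4, not confirming succeeds 1 time out of 4.
toy_log = (
    [("low_conf", "confirm", 1.0, None)] * 3
    + [("low_conf", "confirm", 0.0, None)]
    + [("low_conf", "no_confirm", 1.0, None)]
    + [("low_conf", "no_confirm", 0.0, None)] * 3
)
V, policy = value_iteration(estimate_model(toy_log))
```

On this toy log the greedy policy confirms in the low-confidence state, which mirrors the "confirm when confidence is low" behavior listed above.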
18. NJFun as an MDP: evaluation
- evaluated policy on 124 testing dialogs
- task success rate: 52% -> 64%
- weak task completion: 1.72 -> 2.18
- subjective evaluation: no significant improvements, but move-to-the-mean effect
- learned policy better than hand-crafted policies
- comparatively evaluated policies on learned MDP
20. challenge 1: scalability
- contrast NJFun with RoomLine
  - conference room reservation and scheduling
  - mixed-initiative task-oriented interaction
  - system obtains list of rooms matching initial constraints
  - system negotiates with user to identify room that best matches their needs
  - 37 concepts (slots), 25 questions that can be asked
- another example: LARRI
- full-blown MDP is intractable
- not clear how to do state-abstraction
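A quick calculation shows why the full-blown MDP is intractable. The sketch below assumes, purely for illustration, that each of the 37 concepts is tracked with only 3 coarse status values; the slide does not say how many values a concept would need, and a real system would likely need more.

```python
def joint_state_count(num_concepts, values_per_concept):
    """Size of the joint state space if every concept is tracked independently."""
    return values_per_concept ** num_concepts

NJFUN_STATES = 62          # from the NJFun state-space slide
ROOMLINE_CONCEPTS = 37     # from the slide above
ASSUMED_VALUES = 3         # assumption: e.g. unknown / uncertain / grounded

roomline_states = joint_state_count(ROOMLINE_CONCEPTS, ASSUMED_VALUES)
ratio = roomline_states // NJFUN_STATES
print(f"{roomline_states:.2e} joint states, about {ratio:.1e} times NJFun's 62")
```

Even under this very coarse assumption, the joint space is far too large to cover with a few hundred training dialogs.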
21. challenge 2: reusability
- underlying MDP is system-specific
- MDP design still requires a lot of human expertise
- new MDP for each system
- new training, new evaluation
- are we really saving time and expertise?
- maybe we're asking for too much?
22. addressing the scalability problem
- approach 1: user models / simulations
- costly to obtain real data -> simulate
- simplistic simulators [Eckert, Levin]
- more complex, task-specific simulators [Scheffler, Young]
- real-world evaluation becomes paramount
- approach 2: value function approximation
- data-driven state abstraction / state aggregation [Denecke]
24. reinforcement learning in dialog systems
[Diagram: Language Understanding, semantic input, Dialog Manager, Domain Back-end, actions / semantic output, Language Generation]
- Focus RL only on the difficult decisions!
25. task-decoupled approach
- use reinforcement learning
- use your favorite DM framework
- advantages
  - reduces the size of the learning problem
  - favors reusability of learned policies
  - lessens system authoring effort
26. RavenClaw
Dialogue Task (Specification)
Domain-Independent Dialogue Engine
27. decision process architecture
[Diagram: RoomLine dialog task tree (RoomLine, Login, Welcome, GreetUser, AskRegistered, AskName) with a Gating Mechanism]
- Favors reusability of policies
- Initial policies can be easily handcrafted
- Small-size models
- Parameters can be tied across models
- Accommodate dynamic task generation
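One way to picture this architecture is a set of small models, one per node of the task tree, that each propose an action, with a gate deciding which proposal is executed. The sketch below is only a guess at the shape of such a mechanism: the slide does not state how the gate arbitrates, so choosing the highest-utility proposal, the `Proposal` type, and the utility numbers are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Proposal:
    source: str     # node of the task tree whose small model made the proposal
    action: str     # proposed action; "NoAct" means "nothing to do"
    utility: float  # hypothetical score used by the gate to arbitrate

def gate(proposals: List[Proposal]) -> Optional[Proposal]:
    """Let at most one non-trivial action through per turn (assumed rule:
    highest utility wins; returns None when every model proposes NoAct)."""
    candidates = [p for p in proposals if p.action != "NoAct"]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.utility)

turn = [
    Proposal("AskRegistered", "ExplConf", 0.7),
    Proposal("AskName", "NoAct", 0.0),
    Proposal("Login", "AskRepeat", 0.4),
]
winner = gate(turn)
```

Because each small model only sees its own node, the same model definition can be reused wherever that kind of node appears, which is what makes policies reusable across tasks.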
28. reward structure and learning
[Diagram, two panels: "Global, post-gate rewards" and "Local rewards"; each shows several MDPs and a Gating Mechanism, with Action and Reward labels]
- Global, post-gate rewards
  - Rewards based on any dialogue performance metric
  - Atypical, multi-agent reinforcement learning setting
- Local rewards
  - Multiple, standard RL problems
  - Risk solving local problems, but not the global one
29. conclusion
- reinforcement learning: very appealing approach for dialog control
- in practical systems, scalability is a big issue
- how to leverage knowledge we have?
- state-space design
- solutions that account for or handle sparse data
- bounds on policies
- hierarchical models
30. thank you!
31. Structure of Individual MDPs
- Concept MDPs
  - State-space: belief indicators
  - Action-space: concept-scoped system actions
  [Diagram: states HC, MC, LC, and 0; actions ExplConf, ImplConf, and NoAct]
- Topic MDPs
  - State-space: non-understanding, dialogue-on-track indicators
  - Action-space: non-understanding actions, topic-level actions
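A concept MDP of this shape is small enough to learn with a plain lookup table. The sketch below uses tabular Q-learning, which is a technique swapped in for illustration and not something the slides specify. Reading HC / MC / LC as high / medium / low confidence and 0 as "no value" is also an assumption, as are the class name and the learning parameters. Sharing one `ConceptPolicy` instance across all concepts is one simple way to tie parameters across models.

```python
import random
from collections import defaultdict

# States and actions as named in the diagram; the set of actions allowed
# in each state is inferred from the label counts and is an assumption.
CONCEPT_ACTIONS = {
    "HC": ("ExplConf", "ImplConf", "NoAct"),
    "MC": ("ExplConf", "ImplConf", "NoAct"),
    "LC": ("ExplConf", "ImplConf", "NoAct"),
    "0": ("NoAct",),
}

class ConceptPolicy:
    """Epsilon-greedy tabular Q-learning over the concept MDP (illustrative)."""

    def __init__(self, epsilon=0.1, alpha=0.1, gamma=1.0, seed=0):
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.q = defaultdict(float)
        self.rng = random.Random(seed)

    def choose(self, state):
        actions = CONCEPT_ACTIONS[state]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state=None):
        """One Q-learning step; next_state=None marks the end of the episode."""
        best_next = 0.0
        if next_state is not None:
            best_next = max(self.q[(next_state, a)]
                            for a in CONCEPT_ACTIONS[next_state])
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

shared = ConceptPolicy(epsilon=0.0)          # one table shared by all concepts
shared.update("LC", "ExplConf", reward=1.0)  # made-up experience
```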