A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces

1
A Scalable Reinforcement Learning Approach to
Error Handling in Spoken Language Interfaces

Dan Bohus
www.cs.cmu.edu/dbohus
dbohus_at_cs.cmu.edu
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA, 15217

2
problem
spoken language interfaces lack robustness when
faced with understanding errors.
3
more concretely

S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay what day would you be departing chicago
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow. I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th. I have a flight departing Chicago at 1:40pm arrives Seoul at ...

4
problem source

stems mostly from speech recognition
spans most domains and interaction types
exacerbated by operating conditions
spontaneous speech
medium / large vocabularies
large, varied, and changing user populations

5
speech recognition impact

typical word-error-rates
10-20% for native speakers (novice users)
40% and above for non-native users
significant negative impact on performance [Walker, Sanders]

(plot: task success vs. word-error-rate)
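
As a rough illustration of the metric on this slide, word-error-rate can be computed as the word-level edit distance between what the user said and what the recognizer returned, divided by the number of words the user said. The sketch below assumes that the capitalized text in the example dialog is the recognizer output; the function name `word_error_rate` is hypothetical and not part of any system described in the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Utterances from the example dialog: (what the user said, recognizer output).
print(word_error_rate("Chicago", "CHICAGO"))
print(word_error_rate("the tenth of august", "AT THE TENTH OF AUGUST"))
print(word_error_rate("Urbana Champaign", "OKAY IN THAT SAME PAY"))
```

Because insertions count as errors, the rate can exceed 1 when the recognizer returns more words than were spoken, as in the last utterance.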
6
approaches for increasing robustness

fix recognition

gracefully handle errors through interaction

detect the problems
develop a set of recovery strategies
know how to choose between them (policy)

a closer look RL in spoken dialog systems
current challenges RL for error handling
7
outline

a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling

8
non- and misunderstandings

9
six not-so-easy pieces
recognition or semantic confidence scores
typically trivial (some exceptions may apply)
explicit confirmation: "Did you say 10am?"
implicit confirmation: "Starting at 10am ... until what time?"
accept, reject
"Sorry, I didn't catch that." "Can you repeat that?" "Can you rephrase that?" "You can say something like 'at 10 a.m.'" MoveOn
handcrafted heuristics: first notify, then ask repeat, then give help, then give up
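
The handcrafted heuristic on this slide can be sketched as a fixed escalation ladder indexed by the number of consecutive non-understandings. This is only an illustration of that kind of rule; the strategy labels and the function `choose_recovery` are hypothetical names, not the ones used in any actual system.

```python
# Escalation order taken from the slide; the labels themselves are hypothetical.
ESCALATION = ["notify", "ask_repeat", "give_help", "give_up"]


def choose_recovery(consecutive_non_understandings: int) -> str:
    """Pick a recovery strategy from the count of consecutive non-understandings."""
    if consecutive_non_understandings < 1:
        raise ValueError("expected at least one non-understanding")
    # Stay on the last rung once the ladder is exhausted.
    index = min(consecutive_non_understandings, len(ESCALATION)) - 1
    return ESCALATION[index]


for count in range(1, 6):
    print(count, choose_recovery(count))
```

A rule like this ignores everything except the error count, which is the kind of fixed decision that a learned policy is meant to replace.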
10
outline

a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling

11
spoken dialog system architecture
Language Understanding
Dialog Manager
Domain Back-end
Language Generation
12
reinforcement learning in dialog systems

debate over design choices
learn choices using reinforcement learning
agent interacting with an environment
noisy inputs
temporal / sequential aspect
task success / failure

Language Understanding
noisy semantic input
Dialog Manager
Domain Back-end
actions (semantic output)
Language Generation
13
NJFun

"Optimizing Dialog Management with Reinforcement Learning: Experiments with the NJFun System" [Singh, Litman, Kearns, Walker]
provides information about fun things to do in
New Jersey
slot-filling dialog
type-of-activity
location
time
provide information from a database

14
NJFun as an MDP

define state-space
define action-space
define reward structure
collect data for training, learn policy
evaluate learned policy

15
NJFun as an MDP: state-space

internal system state: 14 variables
state for RL → vector of 7 variables
greet: has the system greeted the user
attribute: which attribute the system is currently querying
confidence: recognition confidence level (binned)
value: value has been obtained for current attribute
tries: how many times the current attribute was asked
grammar: non-restrictive or restrictive grammar was used
history: was there any trouble on previous attributes
62 different states
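
One way to picture the 7-variable state vector is as an immutable record that can index a value table. The sketch below is an assumption-laden illustration: the field names follow the slide, but the types, encodings, and example values are made up, and `NJFunState` is a hypothetical name rather than anything from the NJFun code.

```python
from typing import NamedTuple


class NJFunState(NamedTuple):
    greet: bool        # has the system greeted the user
    attribute: int     # index of the attribute currently being queried (assumed encoding)
    confidence: int    # binned recognition confidence level (assumed encoding)
    value: bool        # has a value been obtained for the current attribute
    tries: int         # how many times the current attribute was asked
    grammar: str       # "restrictive" or "non-restrictive"
    history: bool      # was there trouble on previous attributes


# Tabular RL only needs states to be hashable so they can key a value table.
state = NJFunState(greet=True, attribute=1, confidence=0, value=False,
                   tries=1, grammar="non-restrictive", history=False)
q_table = {
    (state, "explicit_confirmation"): 0.0,
    (state, "no_confirmation"): 0.0,
}
print(len(NJFunState._fields), "state variables")
```

Keeping the vector this small is what makes a table of 62 reachable states feasible to learn from a few hundred dialogs.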

16
NJFun as an MDP: actions and rewards

type of initiative (3 types)
system initiative
mixed initiative
user initiative
confirmation strategy (2 types)
explicit confirmation
no confirmation
resulting MDP has only 2 action choices per state
reward: binary task success

17
NJFun as an MDP: learning a policy

training data: 311 complete dialogs
collected using exploratory policy
learned the policy using value iteration
begin with user initiative
back-off to mixed or system initiative when
re-asking for an attribute
specific type of back-off is different for
different attributes
confirm when confidence is low
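
The learning step on this slide (estimate a model from dialogs collected with an exploratory policy, then run value iteration) can be sketched on a toy problem. Everything concrete below is an assumption for illustration: the two states, the two actions, the logged turns, and the discount factor are invented and are not NJFun's.

```python
from collections import defaultdict


def estimate_model(transitions):
    """Estimate P(s'|s,a) and the mean reward R(s,a) from logged (s, a, r, s') tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    model = {}
    for (s, a), nexts in counts.items():
        total = sum(nexts.values())
        probs = {s_next: n / total for s_next, n in nexts.items()}
        model.setdefault(s, {})[a] = (probs, reward_sum[(s, a)] / total)
    return model


def value_iteration(model, gamma=0.9, tol=1e-9, max_sweeps=1000):
    """Return (state values, greedy policy) for the estimated model."""
    values = defaultdict(float)  # states never acted in (e.g. terminal) keep value 0

    def q(s, a):
        probs, reward = model[s][a]
        return reward + gamma * sum(p * values[s2] for s2, p in probs.items())

    for _ in range(max_sweeps):
        delta = 0.0
        for s in model:
            best = max(q(s, a) for a in model[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {s: max(model[s], key=lambda a: q(s, a)) for s in model}
    return dict(values), policy


# Made-up exploratory data: (state, action, reward, next state).
LOGGED_TURNS = (
    [("low_conf", "no_confirm", 1, "done")] * 1
    + [("low_conf", "no_confirm", 0, "done")] * 3
    + [("low_conf", "confirm", 0, "high_conf")] * 4
    + [("high_conf", "no_confirm", 1, "done")] * 3
    + [("high_conf", "no_confirm", 0, "done")] * 1
    + [("high_conf", "confirm", 0, "high_conf")] * 4
)

values, policy = value_iteration(estimate_model(LOGGED_TURNS))
print(policy)
```

On this toy data the greedy policy confirms in the low-confidence state and skips confirmation in the high-confidence one, which mirrors the "confirm when confidence is low" behavior reported on the slide.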

18
NJFun as an MDP: evaluation

evaluated policy on 124 testing dialogs
task success rate: 52% → 64%
weak task completion: 1.72 → 2.18
subjective evaluation: no significant improvements, but move-to-the-mean effect
learned policy better than hand-crafted policies
comparatively evaluated policies on learned MDP

19
outline

a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling

20
challenge 1: scalability

contrast NJFun with RoomLine
conference room reservation and scheduling
mixed-initiative task-oriented interaction
system obtains list of rooms matching initial constraints
system negotiates with user to identify room that best matches their needs
37 concepts (slots), 25 questions that can be asked
another example: LARRI
full-blown MDP is intractable
not clear how to do state-abstraction

21
challenge 2: reusability

underlying MDP is system-specific
MDP design still requires a lot of human expertise
new MDP for each system
new training, new evaluation
are we really saving time and expertise?
maybe we're asking for too much?

22
addressing the scalability problem

approach 1: user models / simulations
costly to obtain real data → simulate
simplistic simulators [Eckert, Levin]
more complex, task-specific simulators [Scheffler and Young]
real-world evaluation becomes paramount
approach 2: value function approximation
data-driven state abstraction / state aggregation [Denecke]

23
outline

a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling

24
reinforcement learning in dialog systems
Language Understanding
semantic input
Dialog Manager
Domain Back-end
actions / semantic output
Language Generation

Focus RL only on the difficult decisions!

25
task-decoupled approach

use reinforcement learning

decouple

use your favorite DM framework

advantages
reduces the size of the learning problem
favors reusability of learned policies
lessens system authoring effort

26
RavenClaw
Dialogue Task (Specification)
Domain-Independent Dialogue Engine
27
decision process architecture
RoomLine
Login
Welcome
GreetUser
Gating Mechanism
AskRegistered
AskName

Favors reusability of policies
Initial policies can be easily handcrafted

Small-size models
Parameters can be tied across models
Accommodate dynamic task generation

Independence assumption

28
reward structure and learning

Global, post-gate rewards
Rewards based on any dialogue performance metric
Atypical, multi-agent reinforcement learning setting

Local rewards
Multiple, standard RL problems
Risk solving local problems, but not the global one

(diagrams: MDPs, Gating Mechanism, Action, Reward)

29
conclusion

reinforcement learning: very appealing approach for dialog control
in practical systems, scalability is a big issue
how to leverage knowledge we have?
state-space design
solutions that account for or handle sparse data
bounds on policies
hierarchical models

30
thank you!
31
Structure of Individual MDPs

Concept MDPs
State-space: belief indicators
Action-space: concept-scoped system actions

(diagram: states 0, LC, MC, HC; actions ExplConf, ImplConf, NoAct)

Topic MDPs
State-space: non-understanding, dialogue-on-track indicators
Action-space: non-understanding actions, topic-level actions
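
A concept-level MDP of the kind listed on this slide could be sketched as a small table from belief indicators to the actions available in each. The state labels (0, LC, MC, HC) and action names (ExplConf, ImplConf, NoAct) come from the slide; the reading of "0" as "no value for the concept", the numeric thresholds, and the per-state action sets are assumptions made only for illustration.

```python
# Assumed action sets: confirmation is only offered when the concept has a value.
CONCEPT_ACTIONS = {
    "0": ["NoAct"],
    "LC": ["ExplConf", "ImplConf", "NoAct"],
    "MC": ["ExplConf", "ImplConf", "NoAct"],
    "HC": ["ExplConf", "ImplConf", "NoAct"],
}


def belief_state(confidence, low=0.3, high=0.7):
    """Map a concept's confidence (None if it has no value) to a belief indicator.

    The thresholds `low` and `high` are hypothetical.
    """
    if confidence is None:
        return "0"
    if confidence < low:
        return "LC"
    if confidence < high:
        return "MC"
    return "HC"


for conf in (None, 0.1, 0.5, 0.9):
    state = belief_state(conf)
    print(conf, state, CONCEPT_ACTIONS[state])
```

With only four states and at most three actions each, a model this small is cheap to learn and could plausibly be shared across concepts, in line with the small-size and parameter-tying points made earlier in the talk.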
