Exploring adaptation in human-computer dialogs

Transcript and Presenter's Notes

1
Exploring adaptation in human-computer dialogs

Svetlana Stoyanchev
Advisor: Amanda Stent
Thesis Proposal
February 8, 2008

2
Outline

Dialog examples
Responsive and directive adaptation in dialog
systems
Analysis of system errors
Summary of experiments
My system building work
Proposed experiments
Schedule

3
Have you tried the new free 411 voice interfaces?

System is too quick to make a selection
When a number of choices match, the
system should ask the user to narrow them down

User rephrases category: 'Turkish restaurant' ->
'restaurant'
System loses important information about the
category

4
An example from the Switchboard human-human corpus

A: but uh see that our whole system is built on
on owing and borrowing
B: that's just
A: uh true but uh uh uh without it people
wouldn't be able to own automobiles or they
wouldn't be able to own a house
B: i don't have a master charge thank you
[laughter]
A: uh but i still go right back to what i said
when is the last time you had uh fifteen thousand
dollars all at one time to go out and buy an
automobile
B: right but that's the problem see as our system
shouldn't be based on owing and borrowing and all
that

Psycholinguistic Research on Adaptation
Garrod & Anderson 1987. Production and comprehension
in dialog become tightly coupled.
Brennan & Clark 1996. While there is great variability
across conversations, there is less variability within
a conversation.
Brennan 1998 finds that speakers form
conceptual pacts with particular addressees by
using consistent terminology.
Branigan 2004 finds effects of priming in syntactic
structure in dialog utterances.
Owing and borrowing; getting in debt; borrowing
and owing; etc.
5
Example communication with a non-adaptive dialog
system
6
Outline

Dialog examples
Responsive and directive adaptation in dialog
systems
Analysis of system errors
Summary of experiments
My system building work
Proposed experiments
Schedule

7
(No Transcript)
8
Examples of responsive adaptation
If a user hyper-articulates, ASR switches to an
acoustic model trained on hyper-articulated
data (Soltau and Waibel 2000)
Adapting to a particular user's accent (Humphries
and Woodland 97)
Adapting the language model to the state of the
system (most systems)
9
Example communication with the Let's Go dialog
system
10
Examples of responsive adaptation
S: Which topic is the most important to you?
U: instructor
S: Was the instructor good, average, or excellent?
Here the system could have used "teacher" or
"lecturer" instead
In a grammar-based NLU, adjust rule probabilities
based on the statistics from the user's past
utterances
Adapting the syntax of an utterance based on a
user's preference (Walker, Stent 2004)
Adapting to the user's knowledge level and whether
the user is in a hurry (Komatani, 2005)
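The idea of adjusting rule probabilities from a user's past utterances can be sketched as a simple blend of prior probabilities and per-user counts. This is only an illustration: the rule names, the `weight` parameter, and the blending formula are assumptions, not taken from any of the cited systems.

```python
from collections import Counter

def adapt_rule_probs(prior, user_counts, weight=1.0):
    """Blend prior rule probabilities with rules a user has actually used.

    prior: dict of rule name -> prior probability (assumed to sum to 1)
    user_counts: Counter of rules seen in this user's past utterances
    weight: pseudo-count mass given to the prior (hypothetical tuning knob)
    """
    # Only count observations for rules the grammar knows about.
    observed = sum(user_counts.get(rule, 0) for rule in prior)
    total = weight + observed
    return {rule: (weight * p + user_counts.get(rule, 0)) / total
            for rule, p in prior.items()}

# Hypothetical grammar rules for the time concept.
prior = {"HOUR_OCLOCK": 0.5, "HOUR_AM_PM": 0.3, "HOUR_ONLY": 0.2}
history = Counter({"HOUR_AM_PM": 3})  # this user said "ten a m" three times
adapted = adapt_rule_probs(prior, history)
```

With more observations the user's own usage dominates the prior, which is the responsive behavior described above.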
11
Directive adaptation
System guides a user into using grammar and
vocabulary that is best understood by the system.

Example system confirmation prompt: Traveling
from A to B, is this correct?
Possible user responses to correct the error:
Adaptive:
No, traveling from A to C
No, from A to C
Non-adaptive:
No, I will fly from A to C
Arriving at C
12
How to evaluate impact of adaptation

Speech recognition performance
Dialog length (for some task-oriented dialogs)
User satisfaction: subjective user survey
How quickly the user recovers from errors in a dialog

13
System errors

In my experiments I look at places of system
errors because:
In my directive adaptation experiments, they give
me an opportunity
to prime a user in a rejection prompt and to
detect the reaction in response to varying prompts,
to see the difference between the user's utterance
before and after the correction prompt.
In my responsive adaptation experiment, the goal
is to minimize the time to recover from errors
in a dialog.

14
Analysis of system errors

Description of errors
How do we find them
How long do they go on
What are their causes
Response to errors
System
User

15
Analysis of system errors in Communicator corpus

Domain: air travel, travel assistance
Contains ~3000 dialogs with 9 systems
Each user calls one system at least 5 times
System utterances are labeled
One type of error is a system non-understanding.
It is marked by slu_reject
Total number of system rejections: 4118

16
Communicator annotations
17
Example from Communicator corpus
18
Analysis of system rejections

Prob(reject next | reject) = 39%
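One way to read this figure is as the chance that a system rejection is immediately followed by another rejection. A minimal sketch of how such an estimate could be computed from labeled system turns follows; the toy dialogs and all labels other than `slu_reject` are hypothetical, and this is not the original analysis script.

```python
def prob_reject_after_reject(dialogs, label="slu_reject"):
    """Estimate P(next system turn is a reject | current turn is a reject).

    dialogs: list of dialogs, each a list of system-turn labels in order.
    """
    pairs = 0    # rejects that have a following system turn
    repeats = 0  # ... where that following turn is also a reject
    for turns in dialogs:
        for current, following in zip(turns, turns[1:]):
            if current == label:
                pairs += 1
                if following == label:
                    repeats += 1
    return repeats / pairs if pairs else 0.0

# Toy data; labels other than slu_reject are made up for illustration.
dialogs = [
    ["greeting", "slu_reject", "slu_reject", "request_info"],
    ["request_info", "slu_reject", "confirm"],
]
p = prob_reject_after_reject(dialogs)
```

Rejects at the very end of a dialog are not counted, since they have no following turn to condition on.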

19
Hypothesizing about causes of rejections

Errors due to out-of-grammar utterances

User attempts to take the initiative by asking a
question
User initiates a correction

20
Hypothesizing about causes of rejections

Errors due to out-of-grammar utterances

User attempts to take the initiative by asking a
question
User initiates a correction

21
System's actions on non-understandings (dialog
acts)
22
Change in system's dialog act after first reject

Example of OMIT
Before
<imp conf, depart-arrive date>,
<req info, depart-arrive time>
After
<req info, depart-arrive time>
Example of CHANGE
Before
<exp-conf depart-arrive time>
After
<exp-conf orig-city>

23
Change in system's dialog act after first reject

What do users do when they encounter a system error?

24
Partner model
Users build a partner model of a system: a
user's perception of the system's knowledge
and capabilities
Great! I can say anything!
S: Hello, how can I help you?
U: I want to fly from New York to London
S: I am sorry, I did not understand. Where are you leaving from?
U: Leaving from New York
Maybe I have to specify the cities separately
S: What time would you like to leave?
U: at ten o'clock
S: I am sorry, I did not understand. Please specify the time of your departure
U: ten a m
It did not understand 10 o'clock. I should try
to rephrase
25
Examples of users' paraphrases

Observation on rejections: users rephrase, trying
to guess what the system recognizes
Study users' behavior at the level of a single
concept:
how users vary their choices of the form of
concepts
whether prompts affect users' choices

26
Motivation
Better recognition
Ideal
Poor recognition
Limited interaction
Natural interaction
27
Motivation
Better recognition
Ideal
Free speech (Let's Go)
Poor recognition
Limited interaction
Natural interaction
28
Motivation
Better recognition
Ideal
Speech Graffiti (S. Tomko)
Users can use a limited set of keywords and
concepts
Free speech (Let's Go)
Poor recognition
Limited interaction
Natural interaction
29
Motivation
Better recognition
Ideal
Speech Graffiti (S. Tomko)
Users can use a limited set of keywords and
concepts
Using Adaptation
Free speech (Let's Go)
Poor recognition
Limited interaction
Natural interaction
30
Computer science studies on adaptation

Church 2000 introduced a method for measuring
lexical adaptation in text.
A. Dubey, P. Sturt, and F. Keller 2006 use
Church's measures and detect adaptation both
between speakers and within a speaker.
D. Reitter, F. Keller, and J. Moore 2006,
Computational modeling of structural priming in
dialogue, show rapid degradation of the priming
effect in a dialog over time.
D. Reitter and J. Moore 2007 show that lexical
adaptation positively correlates with task
success in the human-human task-oriented Map Task
corpus.
S. Stenchikova, A. Stent 2007 create a new
technique for measuring adaptation between
dialogs and compare partner-specific and recency
adaptation.
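A prime/target estimate in the spirit of Church's lexical adaptation measure can be sketched as follows: each text is split into a first half (prime) and a second half (target), and the chance of a word appearing in the target is compared with the same chance given that the word already appeared in the prime. The function name and toy texts are hypothetical, and the exact formulation in the cited papers may differ.

```python
def church_adaptation(docs, word):
    """Return (prior, adapted) for one word.

    docs: list of token lists; each is split in half into prime and target.
    prior:   fraction of docs whose target half contains the word
    adapted: the same fraction among docs whose prime half contains the word
    """
    in_prime = in_target = in_both = 0
    for tokens in docs:
        mid = len(tokens) // 2
        prime, target = set(tokens[:mid]), set(tokens[mid:])
        if word in prime:
            in_prime += 1
        if word in target:
            in_target += 1
        if word in prime and word in target:
            in_both += 1
    prior = in_target / len(docs) if docs else 0.0
    adapted = in_both / in_prime if in_prime else 0.0
    return prior, adapted

# Toy texts, made up for illustration.
docs = [
    "the noriega trial opened today and noriega denied it".split(),
    "rain is expected in the north and in the south".split(),
    "noriega spoke briefly and then left the court".split(),
    "markets rose again as investors awaited the report".split(),
]
prior, adapted = church_adaptation(docs, "noriega")
```

When `adapted` is well above `prior`, seeing the word once makes seeing it again more likely, which is the adaptation effect being measured.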

31
Summary of experiments
32
Outline

Dialog examples
Responsive and directive adaptation in dialog
systems
Analysis of system errors
Summary of experiments
My system building work
Proposed experiments
Schedule

33
RavenCalendar

Built at Stony Brook
Provides voice interface for manipulating a
calendar
Built using distributed Olympus architecture
Ability to replace components
Application is suited for long-term users

RavenCalendar: A Multimodal Dialog System for
Managing a Personal Calendar. S. Stenchikova, B.
Mucha, S. Hoffman, A. Stent. NAACL HLT
Demonstration Program, pages 15-16, Rochester,
New York, USA, April 2007
34
Rate-a-Course

Survey system for evaluating courses
Built at Stony Brook
Uses VoiceXML
Ran in-lab experiments on 48 subjects

Dialog Systems for Surveys: the Rate-a-Course
System. A. Stent, S. Stenchikova, and M. Marge.
Proceedings of the 1st IEEE/ACL Workshop on
Spoken Language Technology (SLT), 2006.
35
Let's Go System

Provides local bus information to people
Developed and deployed at CMU
Has a constant pool of real users
We received permission to run the proposed
experiments on the system

36
Question Answering

Retrieves answers to natural language questions
Experiments are performed with speech interface
to the system

Name-Aware Speech Recognition for Interactive
Question Answering. S. Stenchikova, D.
Hakkani-Tur, and G. Tur. ICASSP 2008
QASR: Question Answering Using Semantic Roles for
Speech Interface. S. Stenchikova, D. Hakkani-Tur,
and G. Tur. Proceedings of ICSLP-Interspeech 2006,
Pittsburgh, PA
37
Outline

Dialog Examples
Responsive and directive adaptation in dialog
systems
Analysis of system errors
Summary of experiments
My system building work
Proposed experiments
Schedule

38
Summary of experiments
39
Proposed experiment

Match Natural Language Understanding and Natural
Language Generation for the form of the time concept
The time concept appears in the majority of systems
and has multiple realizations
Explore non-understanding prompt strategies and
the power of directive prompts.

40
Experimental questions

How do users form models of the system?
Can prompts help users build a model of the
system that matches reality?
Explore the effect of variation in the form of
concepts in the system's non-understanding prompt

41
Prompt variation experiment

System grammar is limited to understanding a
particular format: X o'clock
Method
See whether the user's next utterance will use the
system's format
How long will it take the user to guess the
recognized format

42
Experiment

Variable 1: system's utterance at
non-understanding/explicit confirmation
Generic: I did not understand, please repeat
Specific: Did you say <another time in format Y>?
Variable 2: system's ASR grammar for time
Specific
A. Hour pm
B. Hour o'clock
C. Hour
Flexible
D. All of the above
Dependent variable: user's grammar.
Measure whether the user's grammar matches the
system's prompt
How long will it take for the user to figure out
the ASR's grammar
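The four grammar conditions and the "how long until the user finds the format" measure can be sketched with regular expressions standing in for the ASR grammar. This is an assumption-heavy illustration: a real system would use a speech grammar rather than regexes over transcripts, and treating condition A as covering "am" as well as "pm" is a guess.

```python
import re

HOUR = r"(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)"

# Hypothetical text-level stand-ins for the specific grammar conditions.
GRAMMARS = {
    "A": re.compile(rf"{HOUR} (?:a ?m|p ?m)"),  # "ten pm", "ten a m"
    "B": re.compile(rf"{HOUR} o'?clock"),       # "ten o'clock"
    "C": re.compile(HOUR),                      # "ten"
}

def accepts(condition, utterance):
    """True if the utterance is in-grammar for the given condition."""
    text = utterance.lower().strip()
    if condition == "D":  # flexible: any of the specific formats
        return any(p.fullmatch(text) for p in GRAMMARS.values())
    return GRAMMARS[condition].fullmatch(text) is not None

def turns_until_match(condition, user_turns):
    """1-based index of the first in-grammar turn, or None if never."""
    for i, utterance in enumerate(user_turns, start=1):
        if accepts(condition, utterance):
            return i
    return None

turns = ["at ten o'clock", "ten o'clock", "ten a m"]
turns_a = turns_until_match("A", turns)
turns_b = turns_until_match("B", turns)
```

`turns_until_match` corresponds to the second measure on the slide: the number of attempts a user needs before producing a form the restricted grammar accepts.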

43
Systems experiment 1
X is the default or the user's preferred form:
either the user's original utterance or the
generally most frequent form
Y and Z are different from X
Open question: how to choose X, Y, and Z
Optional
44
Follow up experiment
Vary prompts on the system to check if there is
an effect on the users (maybe try more
variations)
Hypothesis: even if NLU is not
biased, users follow the system's prompt
45
Complication

The system may need to say a time in the generic
prompt condition when doing implicit confirmation
or presenting a result
Solution: throw away dialogs where confirmation
happens and the time concept is specified
If a real system is used for the experiment, it
may genuinely misunderstand the user even when they
speak using the format understood by the system,
which may confuse the user
The number of unforced errors will have to be
taken into account when analyzing the data.

46
Systems
Option 1

I plan to perform this experiment on Let's Go
unless NLU limitations impair the system's
performance significantly
Pros: short conversations, no overhead for more
users (can throw away dialogs where confirmation
happens and the concept is specified)
Model additional conditions in the same
experiment
Specifying date
Relative day (today, tomorrow)
Weekday
December first
First of december
December one
Variation in verb
Add/new/create
Remove/delete
Change/modify

Option 2 (back-up)
47
Implications of the study

I limit the scope of the test for this
experiment, but I hope it generalizes. Using
directive prompts can be a powerful tool for
guiding users into using particular syntax and
lexicon.
If I can reliably predict the form of a concept,
it may be possible to do concept spotting in ASR.
Concept spotting refers to identifying concepts
instead of full utterances at the ASR stage.
If I find that prompts affect users' form of a
concept, this will support an argument for using
a shared grammar in NLG and NLU.

48
Summary of experiments
49
Proposed experiment on responsive adaptation in
ASR
Method: two-pass ASR that classifies whether a
user's utterance is a correction
50
Motivation

Examples of correction utterances from
COMMUNICATOR where a system requests a date

S: Leaving Chicago on what date?
S: OK, from Zurich to Denver. What date will you
be traveling?
51
Related work

Characterizing and Predicting Corrections in
Spoken Dialogue Systems. D. Litman, J. Hirschberg,
and M. Swerts. Computational Linguistics, 2006
Describes a machine learning experiment on
predicting corrections
Features
Prosodic features (frequency, duration, etc.)
ASR features (recognized string, grammar used)
System features (confirmation strategy,
initiative)
History features (features of turn-1 and turn-2,
prior mentions of keywords like cancel, lengths
of prior turns)
Result: best feature set Prosodic + ASR + SYS +
POS + History (previous turn), error rate 15.72%
(F-measure .72, .89)

52
Method training stage

Data for ASR training: transcribed and annotated
users' utterances from past communication.
Hypothesis: splitting users' utterances into
correction and non-correction may benefit ASR
performance
Classify users' utterances from the training data
as correction or non-correction
Use unsupervised clustering using lexical and
history features
Use unsupervised clustering using prosodic
features
Use rules learned by Litman et al. (may need to
be adjusted)
Train 2 models for each dialog system state:
one on utterances classified as corrections and
one on the rest of the utterances
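The data-splitting step above can be sketched as partitioning training utterances by dialog state and by a correction/non-correction decision, so that two language models can later be trained per state. The `keyword_rule` classifier, the state names, and the record layout are hypothetical placeholders for whichever clustering or rule-based classifier is eventually used.

```python
from collections import defaultdict

def partition_training_data(utterances, is_correction):
    """Group utterance texts by (dialog state, correction / non_correction).

    utterances: list of dicts with "state" and "text" keys (assumed layout)
    is_correction: callable deciding whether an utterance is a correction
    """
    buckets = defaultdict(list)
    for utt in utterances:
        kind = "correction" if is_correction(utt) else "non_correction"
        buckets[(utt["state"], kind)].append(utt["text"])
    return dict(buckets)

def keyword_rule(utt):
    # Hypothetical stand-in for a learned classifier: a turn that
    # starts with the word "no" is treated as a correction.
    words = utt["text"].lower().split()
    return bool(words) and words[0] == "no"

# Toy training data, made up for illustration.
training = [
    {"state": "ask_date", "text": "october third"},
    {"state": "ask_date", "text": "no october third"},
    {"state": "ask_city", "text": "no i said denver"},
]
buckets = partition_training_data(training, keyword_rule)
```

Each bucket would then feed one language model, giving the two models per dialog state described on the slide.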

53
Method runtime

Run 2 ASR models per dialog state in parallel
1) Language model trained on non-corrections
2) Language model trained on corrections
Predict the probability of an utterance being a
correction
When the probability of correction is > threshold,
use ASR output from model 2, else from model 1
Evaluate ASR performance
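The runtime decision rule reduces to a threshold test over the two recognizers' outputs. In this sketch the hypotheses, the probability, and the default threshold of 0.5 are all placeholder values; the proposal does not fix a threshold.

```python
def select_hypothesis(hyp_non_correction, hyp_correction,
                      p_correction, threshold=0.5):
    """Pick between the outputs of the two ASR models.

    hyp_non_correction: output of model 1 (trained on non-corrections)
    hyp_correction:     output of model 2 (trained on corrections)
    p_correction:       predicted probability the utterance is a correction
    threshold:          hypothetical cut-off, to be tuned on held-out data
    """
    if p_correction > threshold:
        return hyp_correction
    return hyp_non_correction

# Made-up recognizer outputs for one user turn.
chosen_high = select_hypothesis("october third", "no october third", 0.8)
chosen_low = select_hypothesis("october third", "no october third", 0.2)
```

Because both models run in parallel, the classifier only has to choose between finished hypotheses, so it adds no recognition latency of its own.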

54
Summary of experiments
55
Schedule
56
List of publications

Published work
Name-Aware Speech Recognition for Interactive
Question Answering. S. Stenchikova, D.
Hakkani-Tur, and G. Tur. ICASSP 2008
Measuring Adaptation Between Dialogs. S.
Stenchikova, A. Stent. SIGDIAL, 2007
Dialog Systems for Surveys: the Rate-a-Course
System. A. Stent, S. Stenchikova, and M. Marge.
Proceedings of the 1st IEEE/ACL Workshop on
Spoken Language Technology (SLT), 2006.
QASR: Question Answering Using Semantic Roles for
Speech Interface. S. Stenchikova, D. Hakkani-Tur,
and G. Tur. Proceedings of ICSLP-Interspeech 2006,
Pittsburgh, PA
Demo Papers
RavenCalendar: A Multimodal Dialog System for
Managing a Personal Calendar. S. Stenchikova, B.
Mucha, S. Hoffman, A. Stent. NAACL HLT
Demonstration Program, pages 15-16, Rochester,
New York, USA, April 2007
QASR: Spoken Question Answering Using Semantic
Role Labeling. S. Stenchikova, D. Hakkani-Tur, G.
Tur. ASRU-2005, 9th biannual IEEE workshop on
Automatic Speech Recognition and Understanding,
Cancun, Mexico, December 2005
Planned publications
Exploring Directive Adaptation in Spoken Dialog
Systems (submitted to Student Workshop at ACL
2008)
Analysis of rejections in Communicator (to submit
for SIGDIAL 2008, March 14)
Adaptation in the Rate-a-Course system experiment
(to submit a short paper to HLT 2008, March 14)
Responsive adaptation ASR experiments on Let's Go
Corpus (to submit to GOTAL, April 4)
Directive adaptation experiment (Fall 2008)

Thank you
Questions? Comments?

58
Definitions

Adaptation: the process of bringing one thing
into correspondence with another; implies a
modification according to changing circumstances
(Merriam-Webster)
Convergence: when dialog participants change
their language use to be more similar to each
other over time
Priming: a process that influences linguistic
decision-making. An instance of priming occurs
when a syntactic structure or lexical item giving
evidence of a linguistic choice (prime)
influences the recipient to make the same
decision, i.e. re-use the structure, at a later
choice-point (Reitter)
I differentiate between directive and responsive
adaptation

Priming: 'Activating parts of particular representations
or associations in memory just before carrying
out an action or task. It is considered to be
one of the manifestations of implicit memory. A
property of priming is that the remembered item
is remembered best in the form in which it was
originally encountered.' (Wikipedia)

59
Proposed corpus study

What are users' strategies in rephrasing?
Try to detect a pattern
Does the form of a concept used in prompts affect
users' choices?
Hypothesis: some forms of a concept are recognized
correctly more frequently than others
Hypothesis: through trial and error, the user finds
the optimal (most frequently recognized) phrasing.

60
Variation in date-time confirmations in
Communicator
61
Rate-a-course experiment
Aspects of the courses
Initiative condition: system / mixed / user
Adaptive condition: adaptive / non-adaptive
62
Rate-a-course experiment
Aspects of the courses
Initiative condition: system / mixed / user
Adaptive condition: adaptive / non-adaptive
63
Rate-a-course experiment: comparing adaptation
Compare using inference on proportions
System adaptive vs. system non-adaptive:
no significant difference
System initiative vs. user initiative vs. mixed
initiative: samples are too small
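One standard form of inference on two proportions is the pooled two-proportion z-test, sketched below. Whether this exact test was used in the Rate-a-Course analysis is an assumption, and the counts in the example are made up for illustration; they are not results from the study.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-sided z-test for the difference of two proportions.

    Returns (z, p_value) under the normal approximation.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("test is undefined when all outcomes are identical")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF, written with erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: users who re-used the system's term in each condition.
z, p_value = two_proportion_z_test(14, 24, 12, 24)
```

With groups this small the normal approximation is rough, which is consistent with the slide's remark that some samples are too small to compare.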
64
Rate-a-course experiments: detecting adaptation
Hypothesis: users are more likely to follow the
system's lexical choice
Problem: unbalanced data. Cannot draw a conclusion
65
QA ASR experiment
66
Evaluation
Evaluation: 3 speakers read 40 questions
Set 3: 40 questions with a named entity
Set 4: 40 questions without a named entity
67
Evaluation Result
Cheating model (contains questions)
68
The End!
