Title: Exploring adaptation in human-computer dialogs
1Exploring adaptation in human-computer dialogs
- Svetlana Stoyanchev
- Advisor: Amanda Stent
- Thesis Proposal
- February 8, 2008
2Outline
- Dialog examples
- Responsive and directive adaptation in dialog systems
- Analysis of system errors
- Summary of experiments
- My system building work
- Proposed experiments
- Schedule
3Have you tried the new free 411 voice interfaces?
- System is too quick to make a selection
- Considering a number of matching choices, the
system should ask a user to narrow down
- User rephrases category: Turkish restaurant -> restaurant
- System loses important information about the category
4An example from the Switchboard human-human corpus
- A: but uh see that our whole system is built on on owing and borrowing
- B: that's just
- A: uh true but uh uh uh without it people wouldn't be able to own automobiles or they wouldn't be able to own a house
- B: i don't have a master charge thank you [laughter]
- A: uh but i still go right back to what i said when is the last time you had uh fifteen thousand dollars all at one time to go out and buy an automobile
- B: right but that's the problem see as our system shouldn't be based on owing and borrowing and all that
Psycholinguistic Research on Adaptation
- Garrod and Anderson 1987: Production and comprehension in dialog become tightly coupled
- Brennan and Clark 1996: While there is great variability across conversations, there is less variability within a conversation
- Brennan 1998 finds that speakers form conceptual pacts with particular addressees by using consistent terminology
- Branigan 2004 finds effects of priming in syntactic structure in dialog utterances
Owing and borrowing / getting in debt / borrowing and owing / etc.
5Example communication with a non-adaptive dialog
system
6Outline
- Dialog examples
- Responsive and directive adaptation in dialog systems
- Analysis of system errors
- Summary of experiments
- My system building work
- Proposed experiments
- Schedule
8Examples of responsive adaptation
- If a user hyper-articulates, ASR switches to an acoustic model trained on hyper-articulated data (Soltau and Waibel 2000)
- Adapting to a particular user's accent (Humphries and Woodland 97)
- Adapting the language model to the state of the system (most systems)
9Example communication with the Let's Go dialog system
10Examples of responsive adaptation
- Example dialog:
  S: Which topic is the most important to you?
  U: instructor
  S: Was the instructor good, average, or excellent?
  Here the system could have used "teacher" or "lecturer"
- In a grammar-based NLU, adjust rule probabilities based on statistics from the user's past utterances
- Adapting the syntax of an utterance based on a user's preference (Walker, Stent 2004)
- Adapting to the user's knowledge level and to whether the user is in a hurry (Komatani, 2005)
11Directive adaptation
System guides a user into using grammar and vocabulary that is best understood by the system.
- Example system's confirmation prompt: Traveling from A to B, is this correct?
- Possible user responses to correct the error:
  - Adaptive: No, traveling from A to C / No, from A to C
  - Non-adaptive: No, I will fly from A to C / Arriving at C
12How to evaluate the impact of adaptation
- Speech recognition performance
- Dialog length (for some task-oriented dialogs)
- User satisfaction: subjective user survey
- How quickly the user recovers from errors in a dialog
13System errors
- In my experiments I look at places of system errors because:
- In my directive adaptation experiments, errors give me an opportunity
  - To prime a user in a rejection prompt and to detect the reaction in response to varying prompts.
  - To see the difference between the user's utterance before and after the correction prompt.
- In my responsive adaptation experiment, the goal is to minimize the time to recover from errors in dialog
14Analysis of system errors
- Description of errors
- How do we find them
- How long do they go on
- What are their causes
- Response to errors
- System
- User
15Analysis of system errors in Communicator corpus
- Domain: air travel, travel assistance
- Contains 3000 dialogs with 9 systems
- Each user calls one system at least 5 times
- System utterances are labeled
- One type of error is a system non-understanding. It is marked by slu_reject
- Total number of system rejections: 4118
16Communicator annotations
17Example from Communicator corpus
18Analysis of system rejections
- Prob(reject followed by next reject): 39%
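A statistic of this kind can be estimated by counting adjacent pairs of labeled system turns. The sketch below is a minimal illustration, assuming each dialog is available as a list of system-utterance labels in order; the label names other than slu_reject and the toy data are hypothetical, not taken from the corpus.

```python
from typing import Iterable, Sequence

REJECT = "slu_reject"  # rejection label named on the corpus slide


def reject_after_reject_prob(dialogs: Iterable[Sequence[str]]) -> float:
    """Estimate P(next system turn is a reject | current system turn is a reject)."""
    followed = 0  # rejects that have a following system turn
    repeated = 0  # of those, how many are followed by another reject
    for labels in dialogs:
        for cur, nxt in zip(labels, labels[1:]):
            if cur == REJECT:
                followed += 1
                if nxt == REJECT:
                    repeated += 1
    return repeated / followed if followed else 0.0


# Toy label sequences (hypothetical, for illustration only).
toy_dialogs = [
    ["request_info", "slu_reject", "slu_reject", "request_info"],
    ["slu_reject", "explicit_confirm"],
]
toy_prob = reject_after_reject_prob(toy_dialogs)
```

A reject in the final system turn of a dialog has no successor and is left out of the denominator; whether the reported figure was computed the same way is not stated on the slide.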
19Hypothesizing about causes of rejections
- Errors due to out-of-grammar utterances
- User attempts to take the initiative by asking a question
- User initiates a correction
21System's actions on non-understandings (dialog acts)
22Change in system's dialog act after first reject
- Example of OMIT
  - Before
    - <imp-conf, depart-arrive date>,
    - <req-info, depart-arrive time>
  - After
    - <req-info, depart-arrive time>
- Example of Change
  - Before
    - <exp-conf, depart-arrive time>
  - After
    - <exp-conf, orig-city>
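The two examples above suggest a simple way to label how the system's dialog acts differ before and after a rejection. The following is a rough sketch under stated assumptions: dialog acts are represented as (act type, concept) pairs, OMIT means the acts after the reject are a proper subset of those before, and anything else that differs counts as Change. The REPEAT label and the function name are hypothetical additions for illustration.

```python
from typing import List, Tuple

# (act type, concept), e.g. ("req-info", "depart-arrive time")
DialogAct = Tuple[str, str]


def classify_change(before: List[DialogAct], after: List[DialogAct]) -> str:
    """Label the difference between system dialog acts before and after a reject."""
    if after == before:
        return "REPEAT"  # hypothetical label: the system repeats the same acts
    if after and set(after) < set(before):
        return "OMIT"    # some acts were dropped, the remaining ones are unchanged
    return "CHANGE"      # the system moved to different acts or concepts


omit_label = classify_change(
    [("imp-conf", "depart-arrive date"), ("req-info", "depart-arrive time")],
    [("req-info", "depart-arrive time")],
)
change_label = classify_change(
    [("exp-conf", "depart-arrive time")],
    [("exp-conf", "orig-city")],
)
```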
23Change in system's dialog act after first reject
- What do users do when they encounter a system error?
24Partner model
Users build a partner model of a system: a user's perception of the system's knowledge and capabilities
User thinks: Great! I can say anything!
S: Hello, how can I help you?
U: I want to fly from New York to London
S: I am sorry I did not understand, where are you leaving from?
U: Leaving from New York
User thinks: Maybe I have to specify the cities separately
S: What time would you like to leave
U: at ten o'clock
S: I am sorry, I did not understand, please specify the time of your departure
U: ten a m
User thinks: It did not understand 10 o'clock. I should try to rephrase
25Examples of users' paraphrases
- Observation on rejections: users rephrase, trying to guess what the system recognizes
- Study users' behavior at the level of a single concept
  - how users vary their choices of the form of concepts
  - do prompts affect users' choices
29Motivation
[Figure: recognition quality (poor to better) plotted against interaction style (limited to natural). Labeled points: Ideal; Speech Graffiti (S. Tomko), where users can use a limited set of keywords and concepts; Using Adaptation; Free speech (Let's Go)]
30Computer science studies on adaptation
- Church 2000 introduced a method for measuring lexical adaptation in text.
- A. Dubey, P. Sturt, and F. Keller 2006 use Church's measures and detect adaptation both between speakers and within a speaker.
- D. Reitter, F. Keller, and J. Moore 2006. Computational modeling of structural priming in dialogue. Show rapid decay of the priming effect in a dialog over time.
- D. Reitter and J. Moore 2007 show that lexical adaptation positively correlates with task success in the human-human task-oriented Maptask corpus.
- S. Stenchikova and A. Stent 2007 create a new technique for measuring adaptation between dialogs and compare partner-specific and recency adaptation.
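As a rough illustration of the kind of measure these studies build on, the sketch below compares the prior chance of seeing a word in the second (target) part of a text with the chance of seeing it there given that it already appeared in the first (prime) part. This is a simplified reading of Church-style adaptation, not the exact procedure of any of the cited papers; the function name and the toy data are hypothetical.

```python
from typing import List, Tuple

# Each document is split into a (prime, target) pair of token lists.
SplitDoc = Tuple[List[str], List[str]]


def adaptation_scores(docs: List[SplitDoc], word: str) -> Tuple[float, float]:
    """Return (prior, positive_adaptation) for `word`.

    prior               = fraction of documents whose target part contains the word
    positive_adaptation = the same fraction, restricted to documents whose
                          prime part also contains the word
    """
    n_docs = len(docs)
    in_target = sum(1 for prime, target in docs if word in target)
    in_prime = sum(1 for prime, target in docs if word in prime)
    in_both = sum(1 for prime, target in docs if word in prime and word in target)
    prior = in_target / n_docs if n_docs else 0.0
    positive = in_both / in_prime if in_prime else 0.0
    return prior, positive


# Toy data (hypothetical): adaptation shows up as positive > prior.
toy_docs = [
    (["owing", "and", "borrowing"], ["owing", "and", "borrowing"]),
    (["buy", "a", "house"], ["own", "a", "house"]),
    (["owing", "money"], ["in", "debt"]),
    (["bus", "schedule"], ["next", "bus"]),
]
prior, positive = adaptation_scores(toy_docs, "owing")
```

For dialog, the prime and target parts could be one speaker's earlier turns and the other speaker's later turns; that mapping is an assumption of this sketch.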
31Summary of experiments
32Outline
- Dialog examples
- Responsive and directive adaptation in dialog systems
- Analysis of system errors
- Summary of experiments
- My system building work
- Proposed experiments
- Schedule
33RavenCalendar
- Built at Stony Brook
- Provides a voice interface for manipulating a calendar
- Built using the distributed Olympus architecture
- Ability to replace components
- Application is suited to long-term users
RavenCalendar: A Multimodal Dialog System for Managing a Personal Calendar. S. Stenchikova, B. Mucha, S. Hoffman, A. Stent. NAACL HLT Demonstration Program, pages 15-16, Rochester, New York, USA, April 2007
34Rate-a-Course
- Survey system for evaluating courses
- Built at Stony Brook
- Uses VoiceXML
- Ran in-lab experiments with 48 subjects
Dialog Systems for Surveys: the Rate-a-Course System. A. Stent, S. Stenchikova, and M. Marge. Proceedings of the 1st IEEE/ACL Workshop on Spoken Language Technology, SLT 2006.
35Let's Go System
- Provides local bus information to people
- Developed and deployed at CMU
- Has a constant pool of real users
- We received permission to run the proposed
experiments on the system
36Question Answering
- Retrieves answers to natural language questions
- Experiments are performed with a speech interface to the system
Name-Aware Speech Recognition for Interactive Question Answering. S. Stenchikova, D. Hakkani-Tur, and G. Tur. ICASSP 2008
QASR: Question Answering Using Semantic Roles for Speech Interface. S. Stenchikova, D. Hakkani-Tur, and G. Tur. Proceedings of ICSLP-Interspeech 2006, Pittsburgh, PA
37Outline
- Dialog Examples
- Responsive and directive adaptation in dialog systems
- Analysis of system errors
- Summary of experiments
- My system building work
- Proposed experiments
- Schedule
38Summary of experiments
39Proposed experiment
- Match Natural Language Understanding (NLU) and Natural Language Generation (NLG) for the form of the time concept
- The time concept appears in the majority of systems and has multiple realizations
- Explore non-understanding prompt strategies and the power of directive prompts.
40Experimental questions
- How do users form models of the system?
- Can prompts help users build a model of the system that matches reality?
- Explore the effect of variation in the form of concepts in the system's non-understanding prompt
41Prompt variation experiment
- System grammar is limited to understanding a particular format, e.g. X o'clock
- Method
  - See whether the user's next utterance uses the system's format
  - How long it takes the user to guess the recognized format
42Experiment
- Variable 1: system's utterance at non-understanding/explicit confirmation
  - Generic: I did not understand, please repeat
  - Specific: Did you say <another time in format Y>?
- Variable 2: system's ASR grammar for time
  - Specific
    - A. Hour pm
    - B. Hour o'clock
    - C. Hour
  - Flexible
    - D. All of the above
- Dependent variable: user's grammar.
  - Measure whether the user's grammar matches the system's prompt
  - How long it takes the user to figure out the ASR's grammar
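The two measurements above could be scripted roughly as follows. This is only a sketch: the regular expressions are a hypothetical approximation of formats A to C (whole hours only, spelled out or as digits), not the grammar of any deployed system, and the function names are invented for illustration.

```python
import re
from typing import List, Optional

# Whole hours only, as words or digits (an assumption of this sketch).
HOUR = r"(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|1[0-2]|[1-9])"

FORMATS = {
    "A": re.compile(rf"^(?:at\s+)?{HOUR}\s*(?:a\.?\s?m\.?|p\.?\s?m\.?)$"),  # Hour am/pm
    "B": re.compile(rf"^(?:at\s+)?{HOUR}\s+o'?clock$"),                      # Hour o'clock
    "C": re.compile(rf"^(?:at\s+)?{HOUR}$"),                                 # Hour
}


def time_format(utterance: str) -> Optional[str]:
    """Return the format label (A, B, or C) of a time utterance, or None."""
    text = utterance.strip().lower()
    for name, pattern in FORMATS.items():
        if pattern.match(text):
            return name
    return None


def turns_until_match(utterances: List[str], accepted: str) -> Optional[int]:
    """1-based index of the first user turn in the accepted format.

    `accepted` is "A", "B", "C", or "D" (flexible: any of A to C).
    Returns None if the user never produces an accepted form.
    """
    for i, utt in enumerate(utterances, start=1):
        fmt = time_format(utt)
        if fmt is not None and (accepted == "D" or fmt == accepted):
            return i
    return None


attempts = ["at ten o'clock", "ten", "ten a m"]
turns_needed = turns_until_match(attempts, "A")
```

In a real experiment the user turns would come from transcriptions rather than typed strings, and the format inventory would need to cover minutes and other realizations.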
43Systems experiment 1
X is the default or user's preferred form: either the original user's utterance or the generally most frequent form
Y and Z are different from X
Open question: how to choose X, Y, and Z
Optional
44Follow up experiment
Vary prompts on the system to check if there is an effect on the users (maybe try more variations)
Hypothesis: even if NLU is not biased, users follow the system's prompt
45Complication
- System may need to say a time in a generic prompt condition when doing implicit confirmation or presenting a result
  - Solution: throw away dialogs where confirmation happens and the time concept is specified
- If a real system is used for the experiment, the system may genuinely misunderstand the user even when they speak in the format understood by the system, which may confuse the user
  - The number of unforced errors will have to be taken into account when analyzing the data.
46Systems
Option 1
- I plan to perform this experiment on Let's Go unless NLU limitations impair the system performance significantly
- Pros: short conversations, no overhead for more users (can throw away dialogs where confirmation happens and concept is specified)
- Model additional conditions in the same experiment
  - Specifying date
    - Relative day (today, tomorrow)
    - Weekday
    - December first
    - First of December
    - December one
  - Variation in verb
    - Add/new/create
    - Remove/delete
    - Change/modify
Option 2 (back-up)
47Implications of the study
- I limit the scope of the test for this experiment, but I hope it generalizes. Using directive prompts can be a powerful tool for guiding users into using particular syntax and lexicon.
- If I can reliably predict the form of a concept, it may be possible to do concept spotting in ASR.
  - Concept spotting refers to identifying concepts instead of full utterances at the ASR stage.
- If I find that prompts affect the user's form of a concept, this will support an argument for using a shared grammar in NLG and NLU.
48Summary of experiments
49Proposed experiment on responsive adaptation in
ASR
Method: two-pass ASR that classifies whether a user's utterance is a correction
50Motivation
- Examples of correction utterances from
COMMUNICATOR where a system requests a date
S Leaving Chicago on what date?
S OK, from Zurich to Denver. What date will you
be traveling?
51 Related work
- Characterizing and Predicting Corrections in Spoken Dialogue Systems. D. Litman, J. Hirschberg, and M. Swerts. Computational Linguistics 2006
- Describes a machine learning experiment on predicting corrections
- Features
  - Prosodic features (frequency, duration, etc.)
  - ASR features (recognized string, grammar used)
  - System features (confirmation strategy, initiative)
  - History features (features of turn-1 and turn-2, prior mentions of keywords like cancel, lengths of prior turns)
- Result: best feature set Prosodic + ASR + SYS + POS + History (previous turn), error rate 15.72% (F-measure .72, .89)
52Method training stage
- Data for ASR training: transcribed and annotated users' utterances from past communication.
- Hypothesis: splitting users' utterances into correction and non-correction may benefit ASR performance
- Classify users' utterances from training data as correction or non-correction
  - Use unsupervised clustering with lexical and history features
  - Use unsupervised clustering with prosodic features
  - Use rules learned by Litman et al. (may need to be adjusted)
- Train 2 models for each dialog system state: one on utterances classified as corrections and one on the rest of the utterances
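The last training step can be pictured with a toy stand-in. The sketch below assumes each training utterance already carries a dialog state and a correction/non-correction label (however that label was obtained), and it uses add-one smoothed unigram counts in place of a real ASR language model; the state name, the data, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (dialog state, is_correction, transcript)
Utterance = Tuple[str, bool, str]
ModelKey = Tuple[str, bool]


def train_unigram_models(data: List[Utterance]) -> Dict[ModelKey, Counter]:
    """Build one word-count model per (dialog state, correction label) pair."""
    models: Dict[ModelKey, Counter] = defaultdict(Counter)
    for state, is_correction, transcript in data:
        models[(state, is_correction)].update(transcript.lower().split())
    return dict(models)


def log_prob(model: Counter, transcript: str, vocab_size: int) -> float:
    """Add-one smoothed unigram log-probability of a transcript."""
    total = sum(model.values())
    return sum(
        math.log((model[word] + 1) / (total + vocab_size))
        for word in transcript.lower().split()
    )


# Toy training data (hypothetical).
toy_data = [
    ("ask_date", False, "december first"),
    ("ask_date", False, "tomorrow"),
    ("ask_date", True, "no december first"),
    ("ask_date", True, "no i said tomorrow"),
]
models = train_unigram_models(toy_data)
vocab = {w for _, _, text in toy_data for w in text.split()}
lp_correction = log_prob(models[("ask_date", True)], "no tomorrow", len(vocab))
lp_plain = log_prob(models[("ask_date", False)], "no tomorrow", len(vocab))
```

On this toy data the correction model assigns the higher score to "no tomorrow", which is the effect the split is hoped to produce on real data.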
53Method runtime
- Run 2 ASR models per dialog state in parallel
  - 1) Language model trained on non-corrections
  - 2) Language model trained on corrections
- Predict the probability of an utterance being a correction
- When the probability of correction is > threshold, use ASR output from model 2, else from model 1
- Evaluate ASR performance
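The selection step at runtime amounts to a threshold test. A minimal sketch, assuming both recognizers have already produced a hypothesis and a separate classifier has produced the correction probability; the AsrResult type, the function name, and the default threshold of 0.5 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AsrResult:
    text: str          # recognized string
    confidence: float  # recognizer confidence, unused by the selection rule


def select_hypothesis(
    non_correction: AsrResult,
    correction: AsrResult,
    p_correction: float,
    threshold: float = 0.5,  # assumed value; would be tuned on held-out data
) -> AsrResult:
    """Pick the output of model 2 (corrections) when the utterance is
    likely a correction, otherwise the output of model 1."""
    return correction if p_correction > threshold else non_correction


model1_out = AsrResult("december first", 0.61)
model2_out = AsrResult("no december first", 0.58)
chosen = select_hypothesis(model1_out, model2_out, p_correction=0.8)
```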
54Summary of experiments
55Schedule
56List of publications
- Published work
  - Name-Aware Speech Recognition for Interactive Question Answering. S. Stenchikova, D. Hakkani-Tur, and G. Tur. ICASSP 2008
  - Measuring Adaptation Between Dialogs. S. Stenchikova, A. Stent. SIGDIAL, 2007
  - Dialog Systems for Surveys: the Rate-a-Course System. A. Stent, S. Stenchikova, and M. Marge. Proceedings of the 1st IEEE/ACL Workshop on Spoken Language Technology, SLT 2006.
  - QASR: Question Answering Using Semantic Roles for Speech Interface. S. Stenchikova, D. Hakkani-Tur, and G. Tur. Proceedings of ICSLP-Interspeech 2006, Pittsburgh, PA
- Demo Papers
  - RavenCalendar: A Multimodal Dialog System for Managing a Personal Calendar. S. Stenchikova, B. Mucha, S. Hoffman, A. Stent. NAACL HLT Demonstration Program, pages 15-16, Rochester, New York, USA, April 2007
  - QASR: Spoken Question Answering Using Semantic Role Labeling. S. Stenchikova, D. Hakkani-Tur, G. Tur. ASRU-2005, 9th biannual IEEE workshop on Automatic Speech Recognition and Understanding, Cancun, Mexico, December 2005
- Planned publications
  - Exploring Directive Adaptation in Spoken Dialog Systems (submitted to Student Workshop at ACL 2008)
  - Analysis of rejections in Communicator (to submit for SIGDIAL 2008, March 14)
  - Adaptation in the Rate-a-Course system experiment (to submit a short paper to HLT 2008, March 14)
  - Responsive adaptation ASR experiments on Let's Go Corpus (to submit to GOTAL, April 4)
  - Directive adaptation experiment (Fall 2008)
57Thank you
- Questions? Comments?
58Definitions
- Adaptation: the process of bringing one thing into correspondence with another; implies a modification according to changing circumstances (Merriam-Webster)
- Convergence: when dialog participants change their language use to be more similar to each other over time
- Priming: a process that influences linguistic decision-making. An instance of priming occurs when a syntactic structure or lexical item giving evidence of a linguistic choice (prime) influences the recipient to make the same decision, i.e. re-use the structure, at a later choice-point (Reitter)
- Priming: 'Activating parts of particular representations or associations in memory just before carrying out an action or task. It is considered to be one of the manifestations of implicit memory. A property of priming is that the remembered item is remembered best in the form in which it was originally encountered.' (Wikipedia)
- I differentiate between Directive and Responsive adaptation
59Proposed corpus study
- What are users' strategies in rephrasing?
  - Try to detect a pattern
- Does the form of concept used in prompts affect users' choices?
- Hypothesis: some forms of a concept are recognized correctly more frequently than others
- Hypothesis: through trial and error, the user finds the optimal (most frequently recognized) phrasing.
60Variation in date-time confirmations in
Communicator
61Rate-a-course experiment
Aspects of the courses
Initiative condition: System / mixed / user
Adaptive condition: Adaptive / non-adaptive
63Rate-a-course experiment: comparing adaptation
- Compare using inference on proportions
- System Adaptive vs. System Non-adaptive: no significant difference
- System initiative vs. User initiative vs. Mixed initiative: samples are too small
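One common form of inference on proportions is a two-proportion z-test. The sketch below shows that test with a pooled standard error and a two-sided p-value; it is an illustration of the general technique, not necessarily the exact test used in the experiment, and the counts in the usage lines are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    success_a: int, n_a: int, success_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one proportion."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all successes or all failures: no evidence of a difference
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: users who reused the system's term in each condition.
z_same, p_same = two_proportion_z_test(12, 24, 12, 24)
z_diff, p_diff = two_proportion_z_test(18, 24, 6, 24)
```

With small groups, as noted for the initiative conditions, the normal approximation behind this test becomes unreliable and an exact test would be the safer choice.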
64Rate-a-course experiments: detecting adaptation
Hypothesis: users are more likely to follow the system's lexical choice
Problem: unbalanced data. Cannot draw a conclusion
65QA ASR experiment
66Evaluation
Evaluation: 3 speakers read 40 questions
- Set 3: 40 questions with a named entity
- Set 4: 40 questions without a named entity
67Evaluation Result
Cheating model (contains questions)
68The End!