Error Detection and Correction in SDS

About This Presentation

Slides: 33

Provided by: juliahir

Learn more at: http://www1.cs.columbia.edu

Transcript and Presenter's Notes

1
Error Detection and Correction in SDS

Julia Hirschberg
CS 4706

2
Today

Avoiding errors
Detecting errors
From the user side: what cues does the user provide to indicate an error?
From the system side: how likely is it that the system made an error?
Dealing with errors: what can the system do when it thinks an error has occurred?
Evaluating SDS: evaluating problem dialogues

3
Avoiding misunderstandings

The problem
By imitating human performance
Timing and grounding (Clark 03)
Confirmation strategies
Clarification and repair subdialogues

4
Today

Avoiding errors
Detecting errors
From the user side: what cues does the user provide to indicate an error?
From the system side: how likely is it that the system made an error?
Dealing with errors: what can the system do when it thinks an error has occurred?
Evaluating SDS: evaluating problem dialogues

5
Learning from Human Behavior: Features in Repetition Corrections (KTH)
[Figure: bar chart of the percentage of all repetitions (0 to 50) for adults vs. children, by feature: more clearly articulated, increased loudness, shifting of focus]
6
Learning from Human Behavior (Krahmer et al 01)

Learning from human behavior
"Go on" and "go back" signals in grounding situations (implicit/explicit verification)
Positive: short turns, unmarked word order, confirmation, answers, no corrections or repetitions, new info
Negative: long turns, marked word order, disconfirmation, no answer, corrections, repetitions, no new info

Hypotheses supported, but:
Can these cues be identified automatically?
How might they affect the design of SDS?
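One way to picture "identifying these cues automatically" is a simple vote over the positive/negative cue lists above. The sketch below assumes each cue has already been annotated or detected for a user turn that follows a system verification; the `UserTurn` fields, the 8-word cutoff for a "long" turn, and the majority vote are illustrative assumptions, not Krahmer et al's method.

```python
from dataclasses import dataclass


@dataclass
class UserTurn:
    """Hypothetical cue annotations for a user turn that follows a system verification."""
    n_words: int
    marked_word_order: bool
    disconfirms: bool
    answers: bool
    corrects_or_repeats: bool
    adds_new_info: bool


LONG_TURN = 8  # assumed word-count cutoff for a "long" turn


def negative_cues(turn: UserTurn) -> int:
    """Count how many of the six "go back" (negative) cues are present."""
    return sum([
        turn.n_words > LONG_TURN,
        turn.marked_word_order,
        turn.disconfirms,
        not turn.answers,
        turn.corrects_or_repeats,
        not turn.adds_new_info,
    ])


def signals_go_back(turn: UserTurn) -> bool:
    """Majority vote: more than half of the six cues point to a problem."""
    return negative_cues(turn) > 3
```

A short confirming answer that adds new information scores 0 negative cues; a long disconfirmation that repeats earlier material scores 5 and would be flagged.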

8
Today

Avoiding errors
Detecting errors
From the user side: what cues does the user provide to indicate an error?
From the system side: how likely is it that the system made an error?
Dealing with errors: what can the system do when it thinks an error has occurred?
Evaluating SDS: evaluating problem dialogues

9
Systems Have Trouble Knowing When They've Made a Mistake

Hard for humans to correct system misconceptions (Krahmer et al 99)
User: I want to go to Boston.
System: What day do you want to go to Baltimore?
Easier: answering explicit requests for confirmation or responding to ASR rejections
System: Did you say you want to go to Baltimore?
System: I'm sorry. I didn't understand you. Could you please repeat your utterance?

But constant confirmation or over-cautious rejection lengthens the dialogue and decreases user satisfaction
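A common way to balance these costs is to choose the grounding action from the recognizer's confidence: reject when it is very low, confirm explicitly or implicitly in between, and accept silently when it is high. This is a minimal sketch under stated assumptions: the thresholds are hypothetical and the confidence is assumed to be normalized to [0, 1] (TOOT's ASR confidence scores, which appear later as values like -2.85, are on a different scale).

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept silently"
    IMPLICIT_CONFIRM = "implicit confirmation"
    EXPLICIT_CONFIRM = "explicit confirmation"
    REJECT = "reject and reprompt"


def choose_action(confidence: float,
                  reject_below: float = 0.3,
                  explicit_below: float = 0.6,
                  implicit_below: float = 0.85) -> Action:
    """Map a normalized ASR confidence in [0, 1] to a grounding action.

    The default thresholds are hypothetical placeholders, not tuned values.
    """
    if not 0.0 <= reject_below <= explicit_below <= implicit_below <= 1.0:
        raise ValueError("thresholds must be ordered within [0, 1]")
    if confidence < reject_below:
        return Action.REJECT
    if confidence < explicit_below:
        return Action.EXPLICIT_CONFIRM
    if confidence < implicit_below:
        return Action.IMPLICIT_CONFIRM
    return Action.ACCEPT
```

Raising the thresholds produces more confirmations and rejections (longer dialogues, lower satisfaction); lowering them lets more misrecognitions like "Baltimore" through unchallenged.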

11
And Systems Have Trouble Recognizing User Corrections

Probability of recognition failures increases after a misrecognition (Levow 98)
Corrections of system errors are often hyperarticulated (louder, slower, more internal pauses, exaggerated pronunciation), which leads to more ASR error (Wade et al 92, Oviatt et al 96, Swerts & Ostendorf 97, Levow 98, Bell & Gustafson 99)

12
Can Prosodic Information Help Systems Perform Better?

If errors occur where speaker turns are prosodically marked:
Can we recognize turns that will be misrecognized by examining their prosody?
Can we modify our dialogue and recognition strategies to handle corrections more appropriately?

13
Approach

Collect a corpus from an interactive voice response system
Identify speaker turns:
that are incorrectly recognized (misrecognitions)
where speakers first become aware of an error (aware sites)
that correct misrecognitions (corrections)
Identify prosodic features of turns in each category and compare them to other turns
Use machine learning techniques to train a classifier to make these distinctions automatically
14
Turn Types
TOOT: Hi. This is AT&T Amtrak Schedule System. This is TOOT. How may I help you?
User: Hello. I would like trains from Philadelphia to New York leaving on Sunday at ten thirty in the evening.
TOOT: Which city do you want to go to?
User: New York.
[Figure labels marking turns: misrecognition, correction, aware site]
15
TOOT Dialogues

Collected to study effects of differences in dialogue strategy on user performance and satisfaction (Litman & Pan 99)
type of initiative (system, user, mixed)
type of confirmation (explicit, implicit, none)
adaptability condition
Subjects
39 summer students
16/23 (F/M)
20/19 (native speaker/non)

Platform: combined over-the-phone ASR and TTS (Kamm et al 97) with web access to train information
Task: find train information for 4 scenarios
Corpus for current study
2328 speaker turns
52 dialogues
Misrecognitions
Overall word accuracy: 61%
Overall concept accuracy (CA): 71%
Example: "I want to go to Boston from Philadelphia" (2 domain concepts) recognized as "I want to go to Boston" (one concept) has CA = 50%
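The 50% concept accuracy in the Boston/Philadelphia example is the fraction of reference concepts that survive recognition. A minimal sketch, assuming concepts are represented as slot/value pairs (the slot names are hypothetical) and ignoring inserted concepts, which a fuller scoring scheme would also penalize:

```python
from typing import Dict


def concept_accuracy(reference: Dict[str, str], hypothesis: Dict[str, str]) -> float:
    """Fraction of reference concepts (slot=value) recovered exactly in the hypothesis.

    Simplification: concepts present only in the hypothesis are not counted.
    """
    if not reference:
        raise ValueError("reference must contain at least one concept")
    correct = sum(1 for slot, value in reference.items() if hypothesis.get(slot) == value)
    return correct / len(reference)


ref = {"destination": "Boston", "origin": "Philadelphia"}  # "...to Boston from Philadelphia"
hyp = {"destination": "Boston"}                              # "...to Boston"
print(concept_accuracy(ref, hyp))  # 0.5
```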

17
A Successful Dialogue

S: Hi, this is AT&T Amtrak schedule system. This is TOOT. How may I help you?
U: I would like a train to New York City from Philadelphia on Sunday at ten thirty P M
S: I heard you say "go to New York City from Philadelphia on Sunday approximately at ten thirty p.m." Do you want me to find the trains from Philadelphia to New York City on Sunday approximately at ten thirty p.m. now?
U: Yes.
S: I am going to get the train schedule for you....

18
Are Misrecognitions, Aware Turns, and Corrections Measurably Different from Other Turns?

For each type of turn:
For each speaker and each prosodic feature, calculate mean values for, e.g., all correctly recognized speaker turns and all incorrectly recognized turns
Perform paired t-tests on these per-speaker pairs of means (e.g., for each speaker, pairing the mean values for correctly and incorrectly recognized turns)
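The per-speaker procedure above can be sketched as follows: average one feature separately over each speaker's misrecognized and correctly recognized turns, then run a paired t-test on the resulting pairs. This is a sketch, not the study's code; the `(speaker_id, misrecognized, value)` turn format is an assumption. Only the t statistic and degrees of freedom are computed here; a p-value needs the t distribution (for example, SciPy's `scipy.stats.ttest_rel` computes both).

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple


def per_speaker_means(turns: List[Tuple[str, bool, float]]) -> List[Tuple[float, float]]:
    """Return (mean over misrecognized turns, mean over correct turns) for each speaker.

    Each turn is (speaker_id, misrecognized, feature_value). Speakers lacking
    either kind of turn are dropped because they cannot form a pair.
    """
    groups: Dict[str, Dict[bool, List[float]]] = {}
    for speaker, misrecognized, value in turns:
        groups.setdefault(speaker, {True: [], False: []})[misrecognized].append(value)
    return [(mean(g[True]), mean(g[False])) for g in groups.values() if g[True] and g[False]]


def paired_t(pairs: List[Tuple[float, float]]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for the per-speaker differences a - b."""
    diffs = [a - b for a, b in pairs]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two speakers")
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical f0-maximum values (normalized) for three speakers
turns = [("s1", True, 3.0), ("s1", False, 1.0),
         ("s2", True, 4.0), ("s2", False, 1.0),
         ("s3", True, 2.0), ("s3", False, 1.0), ("s3", False, 3.0)]
t, df = paired_t(per_speaker_means(turns))
```

Pairing by speaker means each speaker serves as their own control, so differences in baseline pitch or loudness between speakers do not swamp the effect.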

19
Prosodic Features Examined per Turn

Raw prosodic/acoustic features:
f0 maximum and mean (pitch excursion/range)
rms maximum and mean (amplitude)
total duration
duration of preceding silence
amount of silence within turn
speaking rate (estimated from syllables of recognized string per second)
Normalized versions of each feature (compared to the first turn in the task, to the previous turn in the task, and as Z scores)
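The speaking-rate estimate and the three normalizations in the list above might be computed roughly as follows. Assumptions, all labeled in the code: syllables are approximated by counting vowel groups (a crude heuristic, not necessarily what the study used), "compared to" a turn is taken to mean dividing by its value, and the first turn's previous-turn ratio is set to 1.0.

```python
import re
from statistics import mean, pstdev
from typing import Dict, List


def estimate_syllables(text: str) -> int:
    """Crude syllable count: vowel groups per word, at least one per word (assumed heuristic)."""
    return sum(max(1, len(re.findall(r"[aeiouy]+", word))) for word in text.lower().split())


def speaking_rate(recognized: str, duration_s: float) -> float:
    """Syllables of the recognized string per second of turn duration."""
    return estimate_syllables(recognized) / duration_s


def normalize(values: List[float]) -> Dict[str, List[float]]:
    """Normalize one feature over a speaker's turns in a task, given in turn order.

    Ratio normalizations assume nonzero feature values.
    """
    first = values[0]
    mu, sd = mean(values), pstdev(values)
    return {
        "vs_first": [v / first for v in values],
        # The first turn has no predecessor; 1.0 is an assumed placeholder.
        "vs_previous": [1.0] + [cur / prev for prev, cur in zip(values, values[1:])],
        "z": [(v - mu) / sd if sd else 0.0 for v in values],
    }
```

For example, `speaking_rate("New York", 0.5)` counts two syllables in half a second.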

20
Distinguishing Correct Recognitions from Misrecognitions (NAACL 00)

Misrecognitions differ prosodically from correct recognitions in:
F0 maximum (higher)
RMS maximum (louder)
turn duration (longer)
preceding pause (longer)
speaking rate (slower)
The effect holds up across speakers, and even when hyperarticulated turns are excluded

21
WER-Based Results
Misrecognitions are higher in pitch, louder, and longer, with more preceding pause and less internal silence
22
Predicting Turn Types Automatically

Ripper (Cohen 96) automatically induces rule sets for predicting turn types
greedy search guided by a measure of information gain
input: vectors of feature values
output: ordered rules for predicting the dependent variable, and (cross-validated) scores for each rule set
Independent variables:
all prosodic features, raw and normalized
experimental conditions (adaptability of system, initiative type, confirmation style, subject, task)
gender, native/non-native status
ASR recognized string, grammar, and acoustic confidence score
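The "information gain" guiding Ripper's greedy search is, in Cohen's RIPPER, FOIL's gain: a rule is grown by repeatedly adding the condition that most increases the precision of the covered examples, weighted by how many positives remain covered. A minimal sketch of that criterion and one greedy step follows; the candidate conditions and counts are hypothetical, and full Ripper also prunes and optimizes the rules, which is not shown.

```python
import math
from typing import Dict, Tuple


def foil_gain(p0: int, n0: int, p1: int, n1: int) -> float:
    """FOIL information gain for adding one condition to a rule.

    p0, n0: positive/negative examples covered before adding the condition.
    p1, n1: positive/negative examples covered after adding it.
    """
    if p0 == 0:
        raise ValueError("the rule must cover at least one positive example")
    if p1 == 0:
        return 0.0  # the refined rule covers no positives, so it gains nothing
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


def best_condition(p0: int, n0: int, candidates: Dict[str, Tuple[int, int]]) -> str:
    """One greedy step: pick the candidate condition with the highest FOIL gain."""
    return max(candidates, key=lambda c: foil_gain(p0, n0, *candidates[c]))


# Hypothetical counts: the current rule covers 40 misrecognitions and 60 correct turns.
candidates = {"conf < -2.85": (30, 10), "tempo < .81": (10, 5)}
chosen = best_condition(40, 60, candidates)
```

Here "conf < -2.85" raises precision from 0.4 to 0.75 while keeping 30 positives, so it wins over the narrower tempo condition.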

23
ML Results: WER-defined Misrecognition
24
Best Rule-Set for Predicting WER
Using prosody, ASR conf, ASR string, ASR grammar:
if (conf < -2.85) & (duration > 1.27) then F
if (conf < -4.34) then F
if (tempo < .81) then F
if (conf < -4.09) then F
if (conf < -2.46) & (str contains "help") then F
if (conf < -2.47) & (ppau > .77) & (tempo < .25) then F
if (str contains "nope") then F
if (dur > 1.71) & (tempo < 1.76) then F
else T
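Applying an induced rule set to a new turn means testing its ordered rules top to bottom and returning the first match, with the final else as the default. The sketch below hand-codes this WER rule set's conditions (confidence below -2.85 with duration above 1.27; confidence below -4.34; tempo below .81; and so on). Assumptions: F is read as "misrecognized", "duration" and "dur" are taken to be the same feature, the features are on whatever (possibly normalized) scale the study used, "contains" is a case-insensitive substring test, and a conjunct may have been lost from the "-4.09" rule in transcription.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    conf: float    # ASR acoustic confidence score (negative log-scale values)
    dur: float     # turn duration (assumed to be the same feature as "duration")
    tempo: float   # speaking rate
    ppau: float    # preceding pause
    string: str    # ASR recognized string


# Ordered rules; each one predicts "F" when it fires.
RULES = [
    lambda t: t.conf < -2.85 and t.dur > 1.27,
    lambda t: t.conf < -4.34,
    lambda t: t.tempo < 0.81,
    lambda t: t.conf < -4.09,
    lambda t: t.conf < -2.46 and "help" in t.string.lower(),
    lambda t: t.conf < -2.47 and t.ppau > 0.77 and t.tempo < 0.25,
    lambda t: "nope" in t.string.lower(),
    lambda t: t.dur > 1.71 and t.tempo < 1.76,
]


def predict(turn: Turn) -> str:
    """Return the label of the first rule that fires, else the default "T".

    Because every rule here predicts F, rule order does not change the output;
    in general a Ripper rule list is order-sensitive.
    """
    for rule in RULES:
        if rule(turn):
            return "F"
    return "T"
```

A turn recognized with very low confidence (say -5.0) is labeled F by the second rule, while a confident, brisk, short turn falls through every rule to T.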
25
Today

Avoiding errors
Detecting errors
From the user side: what cues does the user provide to indicate an error?
From the system side: how likely is it that the system made an error?
Dealing with errors: what can the system do when it thinks an error has occurred?
Evaluating SDS: evaluating problem dialogues

26
Error Handling Strategies

If systems can recognize their lack of recognition, how should they inform the user that they don't understand (Goldberg et al 03)?
System rephrasing vs. repetitions vs. statement of not understanding
Apologies
What behaviors might these produce?
Hyperarticulation
User frustration
User repetition vs. rephrasing

What lessons do we learn?
When users are frustrated, they are generally harder to recognize accurately
When users are increasingly misrecognized, they tend to be misrecognized more often and become increasingly frustrated
Apologies combined with rephrasing of system prompts tend to decrease frustration and improve WER: don't just repeat!
Users are better recognized when they rephrase their input
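The "apologize and rephrase, don't just repeat" lesson can be built into a prompt selector that escalates with the number of consecutive recognition failures. A minimal sketch; the dialogue state name and all prompt wording are hypothetical.

```python
from typing import Dict, List

# Hypothetical prompt variants for one dialogue state; wording is illustrative only.
REPROMPTS: Dict[str, List[str]] = {
    "destination": [
        "Which city do you want to go to?",
        "Sorry, I didn't catch that. Please tell me just the name of the city you're traveling to.",
        "I'm sorry, I'm still having trouble. For example, you can say 'New York'.",
    ],
}


def reprompt(state: str, failures: int) -> str:
    """Choose a prompt given the number of consecutive recognition failures.

    Each retry apologizes and rephrases instead of repeating the original
    prompt; once the variants run out, the last (most specific) one is reused.
    """
    variants = REPROMPTS[state]
    return variants[min(failures, len(variants) - 1)]
```

Making the later variants narrower (asking only for a city, then giving an example) also nudges users toward short, well-recognized replies rather than long restarts.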

28
How Does an SDS Recognize a Correction? (ICSLP 00)
TOOT: Hi. This is AT&T Amtrak Schedule System. This is TOOT. How may I help you?
User: Hello. I would like trains from Philadelphia to New York leaving on Sunday at ten thirty in the evening.
TOOT: Which city do you want to go to?
User: New York.
[Figure label marking a turn: correction]
29
Serious Problem for Spoken Dialogue Systems

29% of turns in our corpus are corrections
52% of corrections are hyperarticulated, but only 12% of other turns
Corrections are misrecognized about twice as often as non-corrections (60% vs. 31%)
But corrections are no more likely to be rejected than non-corrections (9% vs. 8%)
Are corrections also measurably distinct from non-corrections?

30
Prosodic Indicators of Corrections

Corrections differ from other turns prosodically: longer, louder, higher in pitch excursion, longer preceding pause, less internal silence
ML results:
Baseline: 30% error
norm'd prosody + non-prosody: 18.45% +/- 0.78
automatic: 21.48% +/- 0.68
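Where the 30% baseline comes from, and what the 18.45% result buys, can be checked with a little arithmetic. Since 29% of turns are corrections, always predicting "not a correction" is wrong about 29% of the time, which roughly matches the quoted 30% baseline; the 100-turn label list below is illustrative, not the corpus. Relative to that baseline, 18.45% error removes about 38.5% of the errors.

```python
from collections import Counter
from typing import Sequence


def majority_baseline_error(labels: Sequence[bool]) -> float:
    """Error rate of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return 1 - most_common_count / len(labels)


def relative_error_reduction(baseline: float, model: float) -> float:
    """Fraction of the baseline's errors that the model eliminates."""
    return (baseline - model) / baseline


# Illustrative: 29 corrections out of 100 turns
labels = [True] * 29 + [False] * 71
print(round(majority_baseline_error(labels), 2))        # 0.29
print(round(relative_error_reduction(30.0, 18.45), 3))  # 0.385
```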

32
ML Rules for Correction Prediction

Baseline: 30% error (predict not correction)
norm'd prosody + non-prosody: 18.45% +/- 0.78
automatic: 21.48% +/- 0.68
TRUE :- gram=universal, f0max>0.96, dur>6.55
TRUE :- gram=universal, zeros>0.57, asr<-2.95
TRUE :- gram=universal, f0max<1.98, dur<1.10, tempo>1.21, zeros>0.71
TRUE :- dur>0.76, asr<-2.97, strat=UsrNoConf
TRUE :- dur>2.28, ppau<0.86
TRUE :- rmsav>1.11, strat=MixedImplicit, gram=cityname, f0max>0.70
default FALSE
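Rules printed in a Prolog-like `TRUE :- feature<op>value, ...` form can be read back and applied mechanically: a turn is predicted to be a correction if every condition of at least one rule holds, and FALSE by default otherwise. A minimal sketch, assuming exactly that textual form with the operators `<`, `>`, and `=`; the feature names (gram, f0max, dur, zeros, asr, tempo, strat, ppau, rmsav) come from the slide, but their scales and exact definitions are not given here.

```python
import re
from typing import Dict, List, Tuple, Union

Value = Union[float, str]
Condition = Tuple[str, str, Value]

COND_RE = re.compile(r"^\s*(\w+)\s*(<|>|=)\s*(\S+)\s*$")


def parse_rule(line: str) -> List[Condition]:
    """Parse 'TRUE :- gram=universal, dur>6.55' into [(feature, op, value), ...]."""
    _, body = line.split(":-", 1)
    conditions: List[Condition] = []
    for part in body.split(","):
        match = COND_RE.match(part)
        if match is None:
            raise ValueError(f"cannot parse condition: {part!r}")
        feature, op, raw = match.groups()
        conditions.append((feature, op, raw if op == "=" else float(raw)))
    return conditions


def fires(conditions: List[Condition], turn: Dict[str, Value]) -> bool:
    """True if every condition holds for the turn's feature values."""
    for feature, op, value in conditions:
        x = turn[feature]
        if op == "=" and x != value:
            return False
        if op == "<" and not x < value:
            return False
        if op == ">" and not x > value:
            return False
    return True


def is_correction(rules: List[List[Condition]], turn: Dict[str, Value]) -> bool:
    """Any firing rule predicts TRUE; otherwise fall back to the default FALSE."""
    return any(fires(conditions, turn) for conditions in rules)
```

For instance, `parse_rule("TRUE :- dur>2.28, ppau<0.86")` yields `[("dur", ">", 2.28), ("ppau", "<", 0.86)]`, and a turn with `dur` 3.0 and `ppau` 0.5 would then be flagged as a correction.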

33
Corrections in Context

Similar in prosodic features, but:
What about their form and content?
How do system behaviors affect the corrections users produce?
What sorts of corrections are most and least effective?
When users correct the same mistake more than once, do they vary their strategy in productive ways?

34
User Correction Behavior

Correction classes:
omits and repetitions lead to fewer misrecognitions than adds and paraphrases
Turns that correct rejections are more likely to be repetitions, while turns correcting misrecognitions are more likely to be omits

Type of correction is sensitive to strategy:
users are much more likely to exactly repeat their misrecognized utterance in a system-initiative environment
users are much more likely to correct by omitting information if there is no system confirmation than with explicit confirmation
omits are used more in the MixedImplicit and UserNoConfirm conditions
Restarts are unlikely to be recognized (77% misrecognized) and skewed in distribution:
31% of corrections are restarts in MI and UNC

None in SE, where initial turns are well recognized
It doesn't pay to start over!
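To study correction classes automatically, one needs at least a rough way to label a correction against the utterance it corrects. The sketch below compares word sets: an identical word sequence is a repetition, a strict subset an omit, a strict superset an add, and anything else a paraphrase. This is an assumed heuristic, not the study's annotation scheme; it ignores punctuation handling and does not detect restarts.

```python
def correction_type(original: str, correction: str) -> str:
    """Label a correction relative to the utterance it corrects (heuristic sketch)."""
    orig_words = original.lower().split()
    corr_words = correction.lower().split()
    if corr_words == orig_words:
        return "repetition"
    o, c = set(orig_words), set(corr_words)
    if c < o:
        return "omit"  # keeps only part of the original content
    if c > o:
        return "add"   # original content plus extra words
    return "paraphrase"
```

With this labeling, "Boston" correcting "I want to go to Boston" is an omit, the class the slide associates with fewer misrecognitions.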

37
Today

Avoiding errors
Detecting errors
From the user side: what cues does the user provide to indicate an error?
From the system side: how likely is it that the system made an error?
Dealing with errors: what can the system do when it thinks an error has occurred?
Evaluating SDS: evaluating problem dialogues

38
Recognizing Problematic Dialogues

Hastie et al, "What's the Trouble?" ACL 2002
How to define a dialogue as problematic?
User satisfaction is low
Task is not completed
How to recognize?
Train on a corpus of recorded dialogues (1242 DARPA Communicator dialogues)
Predict:
User Satisfaction
Task Completion (0, 1, 2)
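The two-part definition of a problematic dialogue translates directly into a labeling function for training data. A minimal sketch under stated assumptions: the satisfaction cutoff is a parameter the designer must choose, and reading Task Completion 2 as "fully completed" is an assumption, since the slide gives only the values 0, 1, 2.

```python
from dataclasses import dataclass

FULLY_COMPLETED = 2  # assumption: the highest Task Completion value means full completion


@dataclass
class DialogueOutcome:
    user_satisfaction: float  # e.g. a summed questionnaire score; scale assumed
    task_completion: int      # 0, 1, or 2, as on the slide


def is_problematic(d: DialogueOutcome, min_satisfaction: float) -> bool:
    """Problematic if satisfaction is below the chosen cutoff or the task was not fully completed."""
    return d.user_satisfaction < min_satisfaction or d.task_completion < FULLY_COMPLETED
```

Labels produced this way could then serve as training targets for a classifier that predicts trouble from features available during the dialogue.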

User Satisfaction features

40
Results
41
Next Class

Speech data mining
HW3c due
