Title: Classification of Discourse Functions of Affirmative Words in Spoken Dialogue
1 Classification of Discourse Functions of Affirmative Words in Spoken Dialogue
INTERSPEECH, Antwerp, August 2007
- Agustín Gravano, Stefan Benus, Julia Hirschberg
- Shira Mitchell, Ilia Vovsha
Spoken Language Processing Group, Columbia University
2 Cue Words
- Ambiguous linguistic expressions used for
  - Making a semantic contribution, or
  - Conveying a pragmatic function.
- Examples: now, well, so, alright, and, okay, first, by the way, on the other hand.
- Single affirmative cue words
  - Examples: alright, okay, mm-hm, right, uh-huh, yes.
  - May be used to convey acknowledgment or agreement, to change topic, to backchannel, etc.
3 Research Goals
- Learn which features best characterize the different functions of single affirmative cue words.
- Determine how these can be identified automatically.
- Important in Spoken Dialogue Systems
  - Understand user input.
  - Produce output appropriately.
4 Previous Work
- Classification of cue words into discourse vs. sentential use.
  - Hirschberg & Litman 87, 93; Litman 94; Heeman, Byron & Allen 98; Zufferey & Popescu-Belis 04.
- In our corpus
  - right: 15% discourse, 85% sentential.
  - All other affirmative cue words: 99% discourse, 1% sentential.
- The discourse vs. sentential distinction is insufficient. Need to define new classification tasks.
5 Talk Overview
- Columbia Games Corpus
- Classification tasks
- Experimental features
- Results
6 The Columbia Games Corpus
- 12 spontaneous task-oriented dyadic conversations in Standard American English.
- 2 subjects playing computer games; no eye contact.
7 The Columbia Games Corpus: Function of Affirmative Cue Words
- Cue Words
  - alright
  - gotcha
  - huh
  - mm-hm
  - okay
  - right
  - uh-huh
  - yeah
  - yep
  - yes
  - yup
- Functions
  - Acknowledgment / Agreement
  - Backchannel
  - Cue beginning discourse segment
  - Cue ending discourse segment
  - Check with the interlocutor
  - Stall / Filler
  - Back from a task
  - Literal modifier
  - Pivot beginning: Ack/Agree + Cue begin
  - Pivot ending: Ack/Agree + Cue end
Affirmative cue words: 7.9% of the words in our corpus
8 The Columbia Games Corpus: Function of Affirmative Cue Words
- Literal Modifier
  - that's pretty much okay
- Backchannel
  - Speaker 1: between the yellow mermaid and the whale
    Speaker 2: okay
    Speaker 1: and it is
- Cue beginning discourse segment
  - okay we gonna be placing the blue moon
9 The Columbia Games Corpus: Function of Affirmative Cue Words
- 3 trained labelers
- Inter-labeler agreement
  - Fleiss' Kappa: 0.69 (Fleiss 71)
- In this study we use the majority label for each affirmative cue word.
  - Majority label: the label chosen by at least two of the three labelers.
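The two quantities on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the function names `majority_label` and `fleiss_kappa` and the toy annotations are hypothetical, and returning `None` when all three labelers disagree is an assumption.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by at least two labelers, or None if all disagree."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else None


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that each received the same number of labels."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    mean_agreement = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Number of agreeing ordered rater pairs on this item: sum c*(c-1).
        agreeing = sum(c * c for c in counts.values()) - n_raters
        mean_agreement += agreeing / (n_raters * (n_raters - 1))
    mean_agreement /= n_items
    # Chance agreement from the overall category proportions.
    total = n_items * n_raters
    chance = sum((c / total) ** 2 for c in category_totals.values())
    return (mean_agreement - chance) / (1 - chance)


# Hypothetical toy annotations (three labelers per token), not corpus data.
toy = [
    ["Ack/Agree", "Ack/Agree", "Ack/Agree"],
    ["Ack/Agree", "Ack/Agree", "Backchannel"],
    ["Cue Begin", "Cue Begin", "Cue Begin"],
    ["Backchannel", "Backchannel", "Backchannel"],
    ["Ack/Agree", "Cue Begin", "Backchannel"],
]
majorities = [majority_label(item) for item in toy]
kappa = fleiss_kappa(toy)
```

On the toy data the last token has no majority label, which is the kind of case a study using majority labels has to set aside or handle separately.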
10 Method: Two new classification tasks
- Identification of a discourse segment boundary function
  - Segment beginning vs. Segment end vs. No discourse segment boundary function
- Identification of an acknowledgment function
  - Acknowledgment vs. No acknowledgment
11 Method: Machine Learning Experiments
- ML Algorithm
  - JRip: Weka's implementation of the propositional rule learner Ripper (Cohen 95).
  - We also tried J4.8, Weka's implementation of the decision tree learner C4.5 (Quinlan 93, 96), with similar results.
- 10-fold cross validation in all experiments.
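For readers unfamiliar with the evaluation protocol, 10-fold cross validation can be sketched as follows. This is a generic illustration and not Weka's implementation: `k_fold_indices`, `cross_validated_error`, and the stand-in learner `train_majority` are hypothetical names, and the data is a toy example.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_error(features, labels, train_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the errors."""
    errors = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(features) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        model = train_fn(train_x, train_y)
        errors += sum(model(features[i]) != labels[i] for i in held_out)
    return errors / len(labels)


def train_majority(train_x, train_y):
    """Stand-in learner: always predict the most frequent training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda x: majority


# Toy data: 80 tokens of one class and 20 of another (not corpus data).
toy_labels = ["ACK"] * 80 + ["NO_ACK"] * 20
toy_features = [None] * len(toy_labels)
toy_error = cross_validated_error(toy_features, toy_labels, train_majority)
```

Any rule or tree learner could be passed in place of `train_majority`; the protocol itself does not depend on the learner.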
12 Method: Experimental features
- IPU (Inter-pausal unit)
  - Maximal sequence of words delimited by pauses > 50 ms.
- Conversational Turn
  - Maximal sequence of IPUs by the same speaker, with no contribution from the other speaker.
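The two definitions above translate directly into a grouping procedure over time-aligned words. The sketch below assumes each word is a `(start, end, token)` tuple in seconds; the function names and the toy alignment are hypothetical, and ordering IPUs by start time is a simplification that ignores how overlapping speech should be handled.

```python
PAUSE_THRESHOLD = 0.050  # seconds; a pause longer than this ends the IPU


def group_ipus(words, threshold=PAUSE_THRESHOLD):
    """Group one speaker's time-sorted (start, end, token) words into IPUs."""
    ipus = []
    for start, end, token in words:
        if ipus and start - ipus[-1]["end"] <= threshold:
            ipus[-1]["tokens"].append(token)
            ipus[-1]["end"] = end
        else:
            ipus.append({"start": start, "end": end, "tokens": [token]})
    return ipus


def group_turns(ipus_by_speaker):
    """Merge consecutive IPUs by the same speaker into turns."""
    tagged = sorted(
        ((ipu["start"], speaker, ipu)
         for speaker, ipus in ipus_by_speaker.items()
         for ipu in ipus),
        key=lambda item: item[0],
    )
    turns = []
    for _, speaker, ipu in tagged:
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["ipus"].append(ipu)
        else:
            turns.append({"speaker": speaker, "ipus": [ipu]})
    return turns


# Hypothetical alignment times for a backchannel exchange.
speaker_1 = [(0.00, 0.40, "between"), (0.42, 0.60, "the"), (0.62, 1.00, "whale"),
             (1.80, 2.00, "and"), (2.02, 2.20, "it")]
speaker_2 = [(1.20, 1.50, "okay")]
ipus = {"S1": group_ipus(speaker_1), "S2": group_ipus(speaker_2)}
turns = group_turns(ipus)
```

Here Speaker 1 produces two IPUs, and Speaker 2's "okay" between them splits Speaker 1's speech into two turns.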
13 Method: Experimental features
- Text-based features
  - Extracted from the text transcriptions.
  - Lexical id; POS tags; position of word in IPU / turn; etc.
- Timing features
  - Extracted from the time alignment of the transcriptions.
  - Word / IPU / turn duration; amount of overlap; etc.
- Acoustic features
  - {min, mean, max, stdev} x {pitch, intensity}
  - Slope of pitch, stylized pitch, and intensity, over the whole word, and over its last 100, 200, 300 ms.
  - Acoustic features from the end of the other speaker's previous turn.
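As an illustration of the acoustic features, the sketch below computes summary statistics and least-squares slopes from a sampled contour. It assumes the contour is a list of `(time_ms, value)` pairs with integer millisecond timestamps; the function names, the feature names, the use of population standard deviation, and the toy pitch track are all hypothetical choices, not the authors' extraction pipeline.

```python
import statistics


def slope_per_second(points):
    """Least-squares slope of value over time for (time_ms, value) pairs."""
    n = len(points)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    denom = sum((t - mean_t) ** 2 for t, _ in points)
    if denom == 0:
        return 0.0
    per_ms = sum((t - mean_t) * (v - mean_v) for t, v in points) / denom
    return per_ms * 1000.0


def acoustic_features(track, name, windows_ms=(100, 200, 300)):
    """Summary statistics plus slopes over the whole word and its final windows."""
    values = [v for _, v in track]
    feats = {
        f"{name}_min": min(values),
        f"{name}_mean": statistics.fmean(values),
        f"{name}_max": max(values),
        f"{name}_stdev": statistics.pstdev(values),
        f"{name}_slope_word": slope_per_second(track),
    }
    word_end = track[-1][0]
    for window in windows_ms:
        tail = [(t, v) for t, v in track if t >= word_end - window]
        feats[f"{name}_slope_last_{window}ms"] = slope_per_second(tail)
    return feats


# Toy pitch contour in Hz: flat, then rising over the last 200 ms.
pitch_track = [(0, 100.0), (50, 100.0), (100, 100.0), (150, 100.0), (200, 100.0),
               (250, 110.0), (300, 120.0), (350, 130.0), (400, 140.0)]
pitch_feats = acoustic_features(pitch_track, "pitch")
```

The same function could be applied to an intensity contour by passing a different `name`.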
14 Results: Discourse segment boundary function
Feature Set          Error Rate (%)  F-Measure (Begin)  F-Measure (End)
Text-based           11.6            .77                .30
Timing               11.3            .73                .52
Acoustic             14.2            .66                .19
Text-based + Timing  9.8             .81                .53
Full set             9.6             .81                .57
Baseline (1)         19.0            .00                .00
Human labelers (2)   5.7             .94                .71
(1) Majority class baseline: NO BOUNDARY.
(2) Calculated with respect to each labeler's agreement with the majority labels.
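The two metrics in the table can be computed as sketched below, which also shows why a majority-class baseline scores .00 on the Begin and End classes. The function names and the toy labels are hypothetical, and scoring F as 0 when precision and recall are both undefined or zero is an assumed convention.

```python
def error_rate(gold, predicted):
    """Fraction of tokens whose predicted label differs from the gold label."""
    wrong = sum(g != p for g, p in zip(gold, predicted))
    return wrong / len(gold)


def f_measure(gold, predicted, target):
    """Harmonic mean of precision and recall for one target class."""
    tp = sum(g == target and p == target for g, p in zip(gold, predicted))
    fp = sum(g != target and p == target for g, p in zip(gold, predicted))
    fn = sum(g == target and p != target for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy gold labels and a majority-class prediction (not corpus data).
gold = ["NONE"] * 8 + ["BEGIN", "END"]
baseline = ["NONE"] * len(gold)
baseline_error = error_rate(gold, baseline)
baseline_f_begin = f_measure(gold, baseline, "BEGIN")
baseline_f_end = f_measure(gold, baseline, "END")
```

Because the baseline never predicts BEGIN or END, it has no true positives for either class, so both F-measures are zero even though its error rate is moderate.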
15 Results: Acknowledgment function
Feature Set          Error Rate (%)  F-Measure
Text-based           8.3             .94
Timing               11.0            .92
Acoustic             17.2            .87
Text-based + Timing  6.2             .95
Full set             6.5             .95
Baseline (1)         16.7            .88
Human labelers (2)   5.5             .98
(1) Baseline based on lexical identity: huh, right → no ACK; all other words → ACK.
(2) Calculated with respect to each labeler's agreement with the majority labels.
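The lexical-identity baseline in footnote (1) amounts to a one-line rule. A minimal sketch follows; the label strings `ACK` / `NO_ACK`, the function name, and the lowercasing step are assumptions made for illustration.

```python
NO_ACK_WORDS = {"huh", "right"}  # words the baseline maps to "no ACK"


def lexical_baseline(word):
    """Predict the acknowledgment function from the word's identity alone."""
    return "NO_ACK" if word.lower() in NO_ACK_WORDS else "ACK"


# Example tokens drawn from the cue word list on slide 7.
tokens = ["okay", "right", "mm-hm", "huh", "yeah", "alright"]
predictions = [lexical_baseline(token) for token in tokens]
```

This rule uses no context at all, so any token of "right" used as an agreement, or "okay" used purely to open a segment, is necessarily misclassified.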
16 Best-performing features
- Discourse Segment Boundary Function
  - Lexical identity
  - POS tag of the following word
  - Number and proportion of succeeding words in the turn
  - Context-normalized mean intensity
- Acknowledgment Function
  - Lexical identity
  - POS tag of preceding word
  - Number and proportion of preceding words in the turn
  - IPU and turn length
17 Results: Classification of individual words
- Classification of each individual word into its most common functions.
  - alright → Ack/Agree, Cue Begin, Other
  - mm-hm → Ack/Agree, Backchannel
  - okay → Ack/Agree, Backchannel, Cue Begin, AckCueBegin, AckCueEnd, Other
  - right → Ack/Agree, Check, Literal Modifier
  - yeah → Ack/Agree, Backchannel
18 Results: Classification of the word okay
                                     F-Measure
Feature Set          Error Rate (%)  Ack/Agree  Backchannel  Cue Begin  Ack/Agree + Cue Begin  Ack/Agree + Cue End
Text-based           31.7            .76        .16          .77        .09                    .33
Acoustic             40.2            .69        .24          .64        .03                    .25
Text-based + Timing  25.6            .79        .31          .82        .18                    .67
Full set             25.5            .80        .46          .83        .21                    .66
Baseline (1)         48.3            .68        .00          .00        .00                    .00
Human labelers (2)   14.0            .89        .78          .94        .56                    .73
(1) Majority class baseline: ACK/AGREE.
(2) Calculated with respect to each labeler's agreement with the majority labels.
19 Summary
- Discourse/sentential distinction is insufficient for affirmative cue words in spoken dialogue.
- Two new classification tasks
  - Detection of an acknowledgment function.
  - Detection of a discourse boundary function.
- Best performing ML models
  - Based on textual and timing features.
  - Slight improvement when using acoustic features.
20 Further Work
- Gravano et al., 2007
  - On the role of context and prosody in the interpretation of okay. ACL 2007, Prague, Czech Republic, June 2007.
- Benus et al., 2007
  - The prosody of backchannels in American English. ICPhS 2007, Saarbrücken, Germany, August 2007.
22
Function          alright  mm-hm  okay  right  uh-huh  yeah  Other  Total
Ack / Agree            99     61  1137    114      18   808    133   2370
Backchannel             6    402   121     14     143    72      5    763
Cue Begin              89      0   548      2       0     2      0    641
Cue End                 8      0    10      0       0     0      0     18
Pivot Begin             5      0    68      0       0     0      0     73
Pivot End              13     12   232      2       0    22     17    298
Back from Task          9      1    33      0       0     0      0     43
Check                   0      0     6     53       0     1      8     68
Stall                   1      0    15      1       0     2      0     19
Literal Modifier        9      0    29   1079       0     0      1   1118
?                      56     27   235     10       3    65     11    407
Total                 295    503  2434   1275     164   972    175   5818
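The majority-class baselines reported on slides 14 and 18 can be related to this table with a short calculation. The sketch below rests on two assumptions that the slides do not state: tokens in the "?" row are excluded, and for the boundary task the Pivot rows are counted together with the Cue Begin and Cue End rows. Under those assumptions the computed error rates round to the reported 48.3 and 19.0. All variable names are hypothetical.

```python
WORDS = ["alright", "mm-hm", "okay", "right", "uh-huh", "yeah", "Other"]
TABLE = {
    "Ack/Agree":        [99, 61, 1137, 114, 18, 808, 133],
    "Backchannel":      [6, 402, 121, 14, 143, 72, 5],
    "Cue Begin":        [89, 0, 548, 2, 0, 2, 0],
    "Cue End":          [8, 0, 10, 0, 0, 0, 0],
    "Pivot Begin":      [5, 0, 68, 0, 0, 0, 0],
    "Pivot End":        [13, 12, 232, 2, 0, 22, 17],
    "Back from Task":   [9, 1, 33, 0, 0, 0, 0],
    "Check":            [0, 0, 6, 53, 0, 1, 8],
    "Stall":            [1, 0, 15, 1, 0, 2, 0],
    "Literal Modifier": [9, 0, 29, 1079, 0, 0, 1],
    "?":                [56, 27, 235, 10, 3, 65, 11],
}

# Assumption: tokens in the "?" row are left out of the experiments.
labeled = {label: row for label, row in TABLE.items() if label != "?"}

# Majority-class baseline for the word "okay": always predict Ack/Agree.
okay = WORDS.index("okay")
okay_counts = {label: row[okay] for label, row in labeled.items()}
okay_error = 1 - max(okay_counts.values()) / sum(okay_counts.values())

# Majority-class baseline for the boundary task: always predict NO BOUNDARY.
# Assumption: Pivot rows count as boundaries alongside the Cue rows.
boundary_rows = ["Cue Begin", "Pivot Begin", "Cue End", "Pivot End"]
boundary_tokens = sum(sum(labeled[label]) for label in boundary_rows)
all_tokens = sum(sum(row) for row in labeled.values())
boundary_error = boundary_tokens / all_tokens

okay_error_pct = round(100 * okay_error, 1)
boundary_error_pct = round(100 * boundary_error, 1)
```

The agreement with the reported figures suggests, but does not prove, that the experiments were run on tokens with a majority label only.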