1. Why predict emotions?
- Affective computing direction for improving spoken dialogue systems
  - Emotion detection (prediction)
  - Emotion handling
Poster by Greg Nicholas. Adapted from a paper by Greg Nicholas, Mihai Rotaru,
and Diane Litman.
2. Feature granularity levels
Detecting emotion: train a classifier on features extracted from user turns.
Types of features: lexical, pitch, amplitude, duration. We concentrate on
pitch features to detect uncertainty.

Turn level
- Previous work mostly uses features computed over the entire turn.
- Efficient, but offers only a coarse approximation of the pitch contour.

Word level
- [1] uses pitch features computed at the word level.
- Offers a better approximation of the pitch contour (e.g. captures the big
  changes in uttering the word "great").
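The granularity difference can be illustrated with a toy sketch: given an f0 (pitch) track aligned to words, summary statistics are computed either once over the whole turn or once per word. The feature names, the alignment format, and the toy f0 values below are our own illustration, not the paper's exact feature set.

```python
# Toy illustration of feature granularity: the same f0 samples summarized
# once per turn (coarse) vs. once per word (finer contour approximation).

def pitch_stats(f0):
    """Summary statistics over a sequence of f0 samples (Hz)."""
    return {"max": max(f0), "min": min(f0), "mean": sum(f0) / len(f0)}

def turn_level_features(aligned_words):
    """One feature set for the whole turn: stats over all f0 samples."""
    all_f0 = [v for _, f0 in aligned_words for v in f0]
    return pitch_stats(all_f0)

def word_level_features(aligned_words):
    """One feature set per word: a finer approximation of the contour."""
    return [(word, pitch_stats(f0)) for word, f0 in aligned_words]

# Hypothetical alignment: (word, f0 samples within the word's time span)
turn = [("the", [180.0, 182.0]),
        ("force", [200.0, 240.0, 210.0]),
        ("of", [190.0]),
        ("the", [185.0, 183.0]),
        ("truck", [175.0, 160.0, 150.0])]

print(turn_level_features(turn))     # one coarse set for the whole turn
print(word_level_features(turn)[1])  # finer detail for "force"
```

Note how a local rise inside "force" is visible in the word-level statistics but averaged away at the turn level.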
3. Problems classifying the overall turn emotion

Turn-level is simple:
- Labeling granularity: turn
- One set of features per turn

Word-level is more complicated:
- Label granularity mismatch: labels at the turn level, features at the word
  level
- Variable number of features per turn

4. Techniques to solve this problem
Technique 1: Word-level emotion model (WLEM)
- Train a word-level model using each turn's emotion label
- Predict the emotion label of each word
- Combine: majority vote over the word-level predictions

Technique 2: Predefined subset of sub-turn units (PSSU)
- Combine: concatenate features from 3 words (first, middle, last) into one
  conglomerate feature set
- Train and predict a turn-level model with each turn's emotion label
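The two combination steps can be sketched in a few lines. The helper names are ours, and the word-level classifier itself is assumed to exist elsewhere; only the combining logic is shown.

```python
# Sketch of the two combination schemes: WLEM combines per-word predictions
# by majority vote; PSSU combines per-word features by concatenation.
from collections import Counter

def wlem_combine(word_predictions):
    """WLEM: majority vote over the per-word emotion predictions."""
    return Counter(word_predictions).most_common(1)[0][0]

def pssu_features(word_feature_sets):
    """PSSU: concatenate the feature vectors of the first, middle, and
    last words into one fixed-size turn-level feature set."""
    first, last = word_feature_sets[0], word_feature_sets[-1]
    middle = word_feature_sets[len(word_feature_sets) // 2]
    return first + middle + last

# WLEM on the example turn "the force of the truck" (toy predictions):
preds = ["non-uncertain", "uncertain", "non-uncertain",
         "uncertain", "non-uncertain"]
print(wlem_combine(preds))  # "non-uncertain" (3/5 majority)

# PSSU keeps only the first, middle, and last words ("the", "of", "truck"):
feats = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # one toy feature per word
print(pssu_features(feats))  # [1.0, 3.0, 5.0]
```

PSSU yields one fixed-size feature set per turn, which sidesteps the variable-number-of-features problem that word-level features otherwise create.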
Example student turn: "The force of the truck"

[Figure: pipeline diagrams for the three approaches on the example turn.
Turn-level: one turn-level feature set is extracted from the whole turn and
yields one overall prediction. WLEM: a word-level feature set is extracted
for each of the five words, each word is predicted separately, and the
predictions are combined by majority vote (here, Non-uncertain at 3/5).
PSSU: the feature sets of the first, middle, and last words ("the", "of",
"truck") are concatenated into one PSSU feature set, which yields one
overall prediction (here, Uncertain).]
WLEM issues:
- Turn-to-word labeling assumption (each word inherits the turn's label)
- Majority voting is a very simple scheme

PSSU issues:
- Might lose details from the discarded words
5. Experimental Results

Corpus:
- ITSPOKE dialogues
- Domain: qualitative physics tutoring
- Backend: WHY2-Atlas, Sphinx2 speech recognition, Cepstral text-to-speech

We compare overall prediction accuracy, and recall/precision for predicting
uncertain turns. [1] showed that the WLEM method works better than the
turn-level approach; sub-turn units like PSSU were used in [2] at the
breath-group level, but not at the word level.
Corpus comparison with previous study [1]:

                            Previous           Current
  # of turns                220                9854
  # of words                511                27548
  words/turn                2.32               2.80
  Emotion classification    Emotional /        Uncertain /
                            Non-emotional      Non-uncertain
  Class distribution        129/91 (E/nE)      2189/7665 (U/nU)
  Baseline                  58.64              77.79

Overall prediction accuracy (baseline: 77.79):

  Turn-level      Word-level (WLEM)      Word-level (PSSU)
  81.97 (0.09)    82.53 (0.07)           84.11 (0.05)

- WLEM word-level slightly improves upon turn-level (+0.56)
- PSSU word-level shows a much larger improvement (+2.14)
- Overall, PSSU is best according to this metric as well

Recall/Precision:
- Turn-level: medium recall, medium precision
- WLEM: best recall, lowest precision; tends to over-generalize
- PSSU: good recall, best precision; much less over-generalization, overall
  the best choice

6. Future work

Many alterations could further improve these techniques:
- Annotate each individual word for certainty instead of whole turns
- Include the other features pictured above: lexical, amplitude, etc.
- Try predicting in a human-human dialogue context
- Better combination techniques (e.g. confidence weighting)
- More selective PSSU choices than the middle word of the turn (e.g. the
  longest word in the turn, or ensuring the chosen word has domain-specific
  content)
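The headline numbers above can be checked directly: the baseline is the majority-class accuracy implied by each class distribution, and the reported improvements are differences from the turn-level accuracy.

```python
# Checking the table arithmetic: baseline = majority-class accuracy,
# improvements = accuracy differences from the turn-level result.

# Current corpus: 2189 uncertain vs. 7665 non-uncertain turns
baseline_current = 7665 / (2189 + 7665) * 100
print(round(baseline_current, 2))   # 77.79

# Previous corpus: 129 emotional vs. 91 non-emotional turns
baseline_previous = 129 / (129 + 91) * 100
print(round(baseline_previous, 2))  # 58.64

# Improvements over turn-level accuracy (81.97)
print(round(82.53 - 81.97, 2))  # 0.56 (WLEM)
print(round(84.11 - 81.97, 2))  # 2.14 (PSSU)
```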
[1] M. Rotaru and D. Litman, "Using Word-level Pitch Features to Better
Predict Student Emotions during Spoken Tutoring Dialogues," Proceedings of
Interspeech, 2005.
[2] J. Liscombe, J. Hirschberg, and J. J. Venditti, "Detecting Certainness
in Spoken Tutorial Dialogues," Proceedings of Interspeech, 2005.