1. Why predict emotions?
- Affective computing direction for improving spoken dialogue systems
  - Emotion detection (prediction)
  - Emotion handling
Poster by Greg Nicholas. Adapted from a paper by Greg Nicholas, Mihai Rotaru,
and Diane Litman.
2. Feature granularity levels
Detecting emotion: train a classifier on features extracted from user turns.
Types of features: lexical, pitch, amplitude, duration. We concentrate on
pitch features to detect uncertainty.

Turn level
- Previous work mostly uses features computed over the entire turn.
- Efficient, but offers only a coarse approximation of the pitch contour.

Word level
- [1] uses pitch features computed at the word level.
- Offers a better approximation of the pitch contour (e.g. captures the big
  changes in uttering the word "great").
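The granularity difference can be illustrated with a toy sketch: given an f0 (pitch) track aligned to words, summary statistics are computed either once over the whole turn or once per word. The feature names, the alignment format, and the toy f0 values below are our own illustration, not the paper's exact feature set.

```python
# Toy illustration of feature granularity: the same f0 samples summarized
# once per turn (coarse) vs. once per word (finer contour approximation).

def pitch_stats(f0):
    """Summary statistics over a sequence of f0 samples (Hz)."""
    return {"max": max(f0), "min": min(f0), "mean": sum(f0) / len(f0)}

def turn_level_features(aligned_words):
    """One feature set for the whole turn: stats over all f0 samples."""
    all_f0 = [v for _, f0 in aligned_words for v in f0]
    return pitch_stats(all_f0)

def word_level_features(aligned_words):
    """One feature set per word: a finer approximation of the contour."""
    return [(word, pitch_stats(f0)) for word, f0 in aligned_words]

# Hypothetical alignment: (word, f0 samples within the word's time span)
turn = [("the", [180.0, 182.0]),
        ("force", [200.0, 240.0, 210.0]),
        ("of", [190.0]),
        ("the", [185.0, 183.0]),
        ("truck", [175.0, 160.0, 150.0])]

print(turn_level_features(turn))     # one coarse set for the whole turn
print(word_level_features(turn)[1])  # finer detail for "force"
```

Note how a local rise inside "force" is visible in the word-level statistics but averaged away at the turn level.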
3. Problems classifying the overall turn emotion

Turn-level is simple:
- Labeling granularity: turn
- One set of features per turn

Word-level is more complicated:
- Label granularity mismatch: labels at the turn level, features at the word
  level
- Variable number of features per turn

4. Techniques to solve this problem
Technique 1: Word-level emotion model (WLEM)
- Train a word-level model using each turn's emotion label
- Predict the emotion label of each word
- Combine: majority vote over the word-level predictions

Technique 2: Predefined subset of sub-turn units (PSSU)
- Combine: concatenate features from 3 words (first, middle, last) into one
  conglomerate feature set
- Train and predict a turn-level model with each turn's emotion label
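The two combination steps can be sketched in a few lines. The helper names are ours, and the word-level classifier itself is assumed to exist elsewhere; only the combining logic is shown.

```python
# Sketch of the two combination schemes: WLEM combines per-word predictions
# by majority vote; PSSU combines per-word features by concatenation.
from collections import Counter

def wlem_combine(word_predictions):
    """WLEM: majority vote over the per-word emotion predictions."""
    return Counter(word_predictions).most_common(1)[0][0]

def pssu_features(word_feature_sets):
    """PSSU: concatenate the feature vectors of the first, middle, and
    last words into one fixed-size turn-level feature set."""
    first, last = word_feature_sets[0], word_feature_sets[-1]
    middle = word_feature_sets[len(word_feature_sets) // 2]
    return first + middle + last

# WLEM on the example turn "the force of the truck" (toy predictions):
preds = ["non-uncertain", "uncertain", "non-uncertain",
         "uncertain", "non-uncertain"]
print(wlem_combine(preds))  # "non-uncertain" (3/5 majority)

# PSSU keeps only the first, middle, and last words ("the", "of", "truck"):
feats = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # one toy feature per word
print(pssu_features(feats))  # [1.0, 3.0, 5.0]
```

PSSU yields one fixed-size feature set per turn, which sidesteps the variable-number-of-features problem that word-level features otherwise create.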
Example student turn: "The force of the truck"

[Figure: pipeline diagrams for the three approaches on the example turn.
Turn-level: one turn-level feature set is extracted from the whole turn and
yields one overall prediction. WLEM: a word-level feature set is extracted
for each of the five words, each word is predicted separately, and the
predictions are combined by majority vote (here, Non-uncertain at 3/5).
PSSU: the feature sets of the first, middle, and last words ("the", "of",
"truck") are concatenated into one PSSU feature set, which yields one
overall prediction (here, Uncertain).]
WLEM issues:
- Turn-to-word labeling assumption (each word inherits the turn's label)
- Majority voting is a very simple scheme

PSSU issues:
- Might lose details from the discarded words
5. Experimental Results

Corpus:
- ITSPOKE dialogues
- Domain: qualitative physics tutoring
- Backend: WHY2-Atlas, Sphinx2 speech recognition, Cepstral text-to-speech

We compare overall prediction accuracy, and recall/precision for predicting
uncertain turns. [1] showed that the WLEM method works better than the
turn-level approach; sub-turn units like PSSU were used in [2] at the
breath-group level, but not at the word level.
Corpus comparison with previous study [1]:

                            Previous           Current
  # of turns                220                9854
  # of words                511                27548
  words/turn                2.32               2.80
  Emotion classification    Emotional /        Uncertain /
                            Non-emotional      Non-uncertain
  Class distribution        129/91 (E/nE)      2189/7665 (U/nU)
  Baseline                  58.64              77.79

Overall prediction accuracy (baseline: 77.79):

  Turn-level      Word-level (WLEM)      Word-level (PSSU)
  81.97 (0.09)    82.53 (0.07)           84.11 (0.05)

- WLEM word-level slightly improves upon turn-level (+0.56)
- PSSU word-level shows a much larger improvement (+2.14)
- Overall, PSSU is best according to this metric as well

Recall/Precision:
- Turn-level: medium recall, medium precision
- WLEM: best recall, lowest precision; tends to over-generalize
- PSSU: good recall, best precision; much less over-generalization, overall
  the best choice

6. Future work

Many alterations could further improve these techniques:
- Annotate each individual word for certainty instead of whole turns
- Include the other features pictured above: lexical, amplitude, etc.
- Try predicting in a human-human dialogue context
- Better combination techniques (e.g. confidence weighting)
- More selective PSSU choices than the middle word of the turn (e.g. the
  longest word in the turn, or ensuring the chosen word has domain-specific
  content)
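The headline numbers above can be checked directly: the baseline is the majority-class accuracy implied by each class distribution, and the reported improvements are differences from the turn-level accuracy.

```python
# Checking the table arithmetic: baseline = majority-class accuracy,
# improvements = accuracy differences from the turn-level result.

# Current corpus: 2189 uncertain vs. 7665 non-uncertain turns
baseline_current = 7665 / (2189 + 7665) * 100
print(round(baseline_current, 2))   # 77.79

# Previous corpus: 129 emotional vs. 91 non-emotional turns
baseline_previous = 129 / (129 + 91) * 100
print(round(baseline_previous, 2))  # 58.64

# Improvements over turn-level accuracy (81.97)
print(round(82.53 - 81.97, 2))  # 0.56 (WLEM)
print(round(84.11 - 81.97, 2))  # 2.14 (PSSU)
```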
[1] M. Rotaru and D. Litman, "Using Word-level Pitch Features to Better
Predict Student Emotions during Spoken Tutoring Dialogues," Proceedings of
Interspeech, 2005.
[2] J. Liscombe, J. Hirschberg, and J. J. Venditti, "Detecting Certainness
in Spoken Tutorial Dialogues," Proceedings of Interspeech, 2005.