1. Why predict emotions?
  • Affective computing is a direction for improving spoken dialogue systems, with two parts:
  • Emotion detection (prediction)
  • Emotion handling

Poster by Greg Nicholas. Adapted from paper by
Greg Nicholas, Mihai Rotaru, Diane Litman
2. Feature granularity levels
Detecting emotion: train a classifier on features extracted from user turns. Types of features: Lexical, Pitch, Duration, Amplitude. We concentrate on pitch features to detect uncertainty.

  • Turn level: previous work mostly uses features computed over the entire turn. Efficient, but offers only a coarse approximation of the pitch contour.
  • Word level: [1] uses pitch features computed at the word level. Offers a better approximation of the pitch contour (e.g. it captures the big changes in uttering the word "great").
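A minimal sketch of the two granularities, assuming frame-level pitch (f0) values and word boundaries are available as plain Python lists; the actual ITSPOKE feature extraction is richer than this and is not shown on the poster:

```python
# Hypothetical sketch: the same pitch statistics computed at two granularities.

def pitch_stats(f0):
    """Summary features over a sequence of pitch (f0) values in Hz."""
    return {
        "mean": sum(f0) / len(f0),
        "min": min(f0),
        "max": max(f0),
        "range": max(f0) - min(f0),
    }

def turn_level_features(frames):
    # One feature set for the whole turn: a coarse contour approximation.
    return pitch_stats(frames)

def word_level_features(frames, word_spans):
    # One feature set per word: a finer approximation of the pitch contour.
    return [pitch_stats(frames[start:end]) for start, end in word_spans]
```

For example, a large pitch excursion inside a single word shows up in that word's `range` feature, while the turn-level `range` smooths together all words in the turn.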
3. Problems classifying the overall turn emotion
  • Turn level is simple: labeling granularity is the turn, and there is one set of features per turn.
  • Word level is more complicated: labels are at the turn level while features are at the word level (a label granularity mismatch), and there is a variable number of features per turn.

4. Techniques to solve this problem
Technique 1: Word-level emotion model (WLEM). Train a word-level model with the turn's emotion label, predict the emotion label of each word, then combine by majority voting over the per-word predictions.

Technique 2: Predefined subset of sub-turn units (PSSU). Concatenate the features from 3 words (first, middle, last) into one conglomerate feature set, then train and predict a turn-level model with the turn's emotion label.
Example student turn: "The force of the truck"

  • Turn-level: extract one turn-level feature set from the whole turn (one set), then predict the overall turn prediction (one prediction): Uncertain.
  • WLEM: extract a word-level feature set for each word, "the force of the truck" (five sets); predict a label for each word (here three Non-uncertain and two Uncertain); combine by majority voting into the overall turn prediction: Non-uncertain (3/5).
  • PSSU: extract word-level feature sets for the first, middle, and last words ("the", "of", "truck"); combine them into one PSSU feature set; predict the overall turn prediction: Uncertain.
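The two combination strategies can be sketched as follows; the classifier itself is omitted, and the per-word labels are just the illustrative ones from the example turn:

```python
# Sketch of the WLEM and PSSU combination steps, assuming per-word
# predictions / feature sets are already available.
from collections import Counter

def wlem_combine(word_predictions):
    """WLEM: majority vote over the per-word emotion predictions."""
    return Counter(word_predictions).most_common(1)[0][0]

def pssu_select(items):
    """PSSU: pick the first, middle, and last element of the turn."""
    return [items[0], items[len(items) // 2], items[-1]]

def pssu_features(word_feature_sets):
    """Concatenate the three selected feature sets into one conglomerate set."""
    first, middle, last = pssu_select(word_feature_sets)
    return first + middle + last

words = ["the", "force", "of", "the", "truck"]
labels = ["non-uncertain", "uncertain", "non-uncertain",
          "uncertain", "non-uncertain"]
print(wlem_combine(labels))  # non-uncertain (3/5 majority)
print(pssu_select(words))    # ['the', 'of', 'truck']
```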
  • WLEM issues: it assumes the turn-level label applies to every word (a turn → word labeling assumption), and majority voting is a very simple combination scheme.
  • PSSU issues: it might lose details from the discarded words.

5. Experimental Results

Corpus
  • ITSPOKE dialogues
  • Domain: qualitative physics tutoring
  • Backend: WHY2-Atlas, Sphinx2 speech recognition, Cepstral text-to-speech

[1] showed that the WLEM method works better than turn-level prediction; sub-turn units were also used in [2], but at the breath-group level rather than the word level. Below: comparison of recall and precision for predicting uncertain turns.
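The majority-class baselines in the corpus comparison can be reproduced from the reported class distributions (a sketch, not the authors' code):

```python
# Majority-class baseline: accuracy of always predicting the most
# frequent class, given the per-class turn counts.

def majority_baseline(counts):
    """Accuracy (%) of always predicting the most frequent class."""
    return round(100 * max(counts) / sum(counts), 2)

print(majority_baseline([129, 91]))     # previous corpus, E/nE: 58.64
print(majority_baseline([2189, 7665]))  # current corpus, U/nU: 77.79
```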

Overall prediction accuracy (%):

  Turn-level          81.97 (0.09)
  Word-level (WLEM)   82.53 (0.07)
  Word-level (PSSU)   84.11 (0.05)

  • WLEM word-level slightly improves upon turn-level (+0.56).
  • PSSU word-level shows a much larger improvement (+2.14).
  • Overall, PSSU is best according to this metric as well.

Recall/precision for predicting uncertain turns:
  • Turn-level: medium recall and precision.
  • WLEM: best recall, lowest precision; it tends to over-generalize.
  • PSSU: good recall, best precision; much less over-generalization, and the overall best choice.

Corpus comparison with the previous study [1]:

                           Previous                  Current
  # of turns               220                       9854
  # of words               511                       27548
  words/turn               2.32                      2.80
  Emotion classification   Emotional/Non-emotional   Uncertain/Non-uncertain
  Class distribution       129/91 (E/nE)             2189/7665 (U/nU)
  Baseline (%)             58.64                     77.79

6. Future work
Many alterations could further improve these techniques:
  • Annotate each individual word for certainty instead of whole turns.
  • Include the other features pictured above: lexical, amplitude, etc.
  • Try predicting in a human-human dialogue context.
  • Better combination techniques (e.g. confidence weighting).
  • More selective choices for the PSSU words than the middle word of the turn (e.g. the longest word in the turn, or ensuring the chosen word has domain-specific content).
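The recall/precision tradeoff described in the results can be made concrete with a small helper; the gold labels and predictions below are invented for illustration, not the paper's outputs:

```python
# Recall and precision for the positive ("uncertain") class,
# computed from parallel lists of gold and predicted labels.

def recall_precision(gold, pred, positive="uncertain"):
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

gold = ["uncertain", "uncertain", "non-uncertain", "non-uncertain"]
# An over-generalizing model (WLEM-like) flags every turn as uncertain:
r, p = recall_precision(gold, ["uncertain"] * 4)
print(r, p)  # 1.0 0.5 -> perfect recall, but low precision
```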

[1] M. Rotaru and D. Litman, "Using Word-level Pitch Features to Better Predict Student Emotions during Spoken Tutoring Dialogues," Proceedings of Interspeech, 2005.
[2] J. Liscombe, J. Hirschberg, and J. J. Venditti, "Detecting Certainness in Spoken Tutorial Dialogues," Proceedings of Interspeech, 2005.