Title: EMOTIONAL SPACE IMPROVES EMOTION RECOGNITION
Man Machine Interface Lab, Advanced Technology Center Stuttgart, Sony International (Europe) GmbH
- Raquel Tato, Rocio Santos, Ralf Kompe
Contents
- State of the Art
- Motivation
- Goal
- Approach
- Results
- Conclusions
- Future Research
State of the Art
State of the Art Motivation Goal Approach
Results Conclusions Future Research
- Database: professional actors
- Not really spontaneous speech.
- Exaggerated emotions following stereotypes.
- Features: prosody features
- Easy to calculate.
- Represent only one dimension of the emotional space: arousal.
- The pleasure dimension is related to voice quality features.
Activation-Evaluation Theory
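[Figure: activation-evaluation space. Activation axis (related to prosody), from VERY PASSIVE to VERY ACTIVE. Evaluation axis (related to voice quality), from VERY NEGATIVE to VERY POSITIVE. Emotion words placed in the space: furious, excited, exhilarated, terrified, interested, angry, disgusted, delighted, happy, afraid, pleased, blissful, neutral, sad, relaxed, bored, content, despairing, serene, depressed.]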
Prosody Features
- Acoustic: based on the speech signal.
- Ex. rising or falling intonation, accents, stress.
- Linguistic (lexical, syntactic, semantic)
- Ex. syllable accent, sentence structure.
"Komm, wir spielen" [Come on, let's play] (happy)
"Komm, wir spielen" [Come on, let's play] (bored)
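Rising versus falling intonation, the first acoustic prosody cue listed above, can be estimated from an F0 contour. The following is a minimal sketch, not the authors' method: it assumes an F0 track sampled at a fixed frame step with 0 marking unvoiced frames, and labels the contour by the least-squares slope of log F0. The function name `intonation_direction`, the 10 ms frame step, and the 0.5 threshold (log-Hz per second) are hypothetical choices.

```python
import math


def intonation_direction(f0_hz, frame_step_s=0.01, threshold=0.5):
    """Label an F0 contour as 'rising', 'falling' or 'flat' (hypothetical helper).

    Fits a least-squares line to log F0 over voiced frames (F0 > 0) and compares
    its slope, in log-Hz per second, with an assumed threshold.
    """
    points = [(i * frame_step_s, math.log(f)) for i, f in enumerate(f0_hz) if f > 0]
    if len(points) < 2:
        return "flat"  # not enough voiced frames to fit a slope
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    slope = cov / var  # var > 0 because frame times are distinct
    if slope > threshold:
        return "rising"
    if slope < -threshold:
        return "falling"
    return "flat"
```

Working in log F0 makes the slope a relative change, so the same threshold applies to low and high voices.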
Voice Quality Features
- Articulatory precision: vocal tract properties.
- Ex. formant structure.
- Phonatory quality: auditory qualities that arise from variation in the source signal.
- Ex. glottal spectrum.
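Phonatory quality is often quantified with the harmonic-to-noise ratio (HNR), which also appears among the quality features used in this work. A simplified sketch, assuming the common autocorrelation relation HNR = 10 log10(r / (1 - r)), where r is the peak normalized autocorrelation over a plausible pitch-lag range. The function names and the 75 to 500 Hz search range are assumptions, and practical tools such as Praat add windowing corrections omitted here.

```python
import math


def normalized_autocorr(x, lag):
    """Normalized autocorrelation of x at a positive integer lag (overlapping part only)."""
    a, b = x[:-lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def hnr_db(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Simplified autocorrelation HNR in dB for one analysis frame (assumed method)."""
    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    if max_lag < min_lag:
        raise ValueError("frame too short for the requested pitch range")
    # Peak correlation over candidate pitch periods approximates the periodic share.
    r = max(normalized_autocorr(frame, lag) for lag in range(min_lag, max_lag + 1))
    # Clamp so the log stays finite for perfectly periodic or non-periodic frames.
    r = min(max(r, 1e-9), 1.0 - 1e-9)
    return 10.0 * math.log10(r / (1.0 - r))
```

A strongly periodic (harmonic) frame gives r close to 1 and a high HNR; aperiodic energy lowers r and therefore the HNR.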
Goal
- Spontaneous emotion recognizer
- Language- and speaker-independent.
- Only acoustic information.
- No stereotyped speech.
- New view of automatic emotion recognition.
- Need to take into account at least the second emotional dimension.
- Relation of the emotional dimensions to different types of features.
- Application: recognition of regions in the emotional space.
Approach
Database
Feature Calculation
Classification
- Target scenario
- Sony entertainment robot AIBO
- One day with AIBO
- How to provoke emotions?
- Context action
- Automatic labeling
- Happy, bored, sad, angry and neutral
- Data
- 14 speakers
- 40 commands/emotion
- Prosody features
- Logarithmic F0 derivative
- Energy
- Durational aspects
- Jitter, tremor
- Quality features
- Formants
- Harmonic-to-noise ratio
- Spectral energy distribution
- Voiced-to-unvoiced energy ratio
- Glottal flow
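Several of the features listed above have widely used definitions. The slides do not give the exact formulas, so the following is only a sketch under those common definitions: the log F0 derivative between consecutive voiced frames, local jitter as the mean absolute difference of consecutive pitch periods divided by the mean period, and the voiced-to-unvoiced energy ratio. All function names, the 10 ms frame step, and the input format (per-frame F0 with 0 for unvoiced, per-frame energy with a voicing flag) are hypothetical.

```python
import math


def log_f0_derivative(f0_hz, frame_step_s=0.01):
    """Derivative of log F0 (per second) between consecutive voiced frames (F0 > 0)."""
    out = []
    for prev, cur in zip(f0_hz, f0_hz[1:]):
        if prev > 0 and cur > 0:
            out.append((math.log(cur) - math.log(prev)) / frame_step_s)
    return out


def local_jitter(periods_s):
    """Mean absolute difference of consecutive pitch periods divided by the mean period."""
    if len(periods_s) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods_s) / len(periods_s))


def voiced_unvoiced_energy_ratio(energy, voiced):
    """Summed energy of voiced frames divided by summed energy of unvoiced frames."""
    e_voiced = sum(e for e, v in zip(energy, voiced) if v)
    e_unvoiced = sum(e for e, v in zip(energy, voiced) if not v)
    return e_voiced / e_unvoiced if e_unvoiced > 0 else math.inf
```

In practice these frame-level values would be summarized per utterance (for example mean, range, or percentiles) before classification; which statistics the authors used is not stated.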
- Sequential classifiers
- First classifier
- Arousal dimension, prosody features
- High (happy/angry)
- Medium (neutral)
- Low (sad/bored)
- Second classifier
- Pleasure dimension, quality features
- Final decision
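The sequential classification above can be sketched as a two-stage decision: the first classifier assigns an arousal level from prosody features, the second separates the two emotions of that level using quality features, and the medium level maps directly to neutral. The slides do not name the classifier type, so a nearest-centroid rule stands in as a placeholder; the centroids, feature vectors, and function names are hypothetical.

```python
import math

# Arousal grouping of the five emotion classes, as listed on the slide.
AROUSAL_GROUPS = {
    "high": ("happy", "angry"),
    "medium": ("neutral",),
    "low": ("sad", "bored"),
}


def nearest_centroid(x, centroids):
    """Return the label whose centroid is closest (Euclidean) to feature vector x."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def sequential_classify(prosody, quality, arousal_centroids, pleasure_centroids):
    """Stage 1: arousal level from prosody. Stage 2: pleasure from quality features."""
    level = nearest_centroid(prosody, arousal_centroids)
    candidates = AROUSAL_GROUPS[level]
    if len(candidates) == 1:
        return candidates[0]  # medium arousal: neutral, no second stage needed
    # Only the emotions of the detected arousal group compete in the second stage.
    group_centroids = {e: pleasure_centroids[e] for e in candidates}
    return nearest_centroid(quality, group_centroids)
```

Restricting the second stage to the detected arousal group mirrors the grouping on the slide; the cost of this design is that an arousal error cannot be corrected by the pleasure classifier.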
SPEAKER DEPENDENT - AROUSAL
- Speaker-dependent, prosody features
- Discrimination along the arousal axis
- Emotions grouped according to their position on the axis:
- High level: happy, angry
- Medium level: neutral
- Low level: bored, sad
SPEAKER DEPENDENT - AROUSAL
- Average recognition rate: 84%
- No confusion along the arousal dimension.
- Confusion occurs only with the neutral emotion, due to:
- its intermediate position
- database properties
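The slides do not define "average recognition rate". Two common readings are the mean per-class recall and the overall accuracy, which differ when classes have unequal numbers of test utterances. A sketch computing both from a confusion matrix (rows are true classes, columns are predicted classes); the function names are illustrative, and the matrix below is an invented toy example, not data from this work.

```python
def average_recognition_rate(confusion):
    """Mean per-class recall in percent; rows are true classes, columns predictions."""
    recalls = [row[i] / sum(row) for i, row in enumerate(confusion) if sum(row) > 0]
    if not recalls:
        raise ValueError("confusion matrix has no test samples")
    return 100.0 * sum(recalls) / len(recalls)


def overall_accuracy(confusion):
    """Correct decisions over all test samples, in percent."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total
```

For the invented matrix [[8, 2, 0], [1, 3, 1], [0, 1, 9]], the per-class recalls are 0.8, 0.6 and 0.9, so the mean recall is about 76.7%, while the overall accuracy is 20/25 = 80%.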
SPEAKER DEPENDENT - PLEASURE
- Discrimination between happy and angry: 74%
- Discrimination between bored and sad: 66%
[Figure: speaker-dependent happy-angry and sad-bored classification]
- Happy and angry are farther apart on the pleasure axis than sad and bored.
SPEAKER INDEPENDENT - AROUSAL
- Average recognition rate: 59.3%
- Neutral recognition rate close to chance.
- Need for real neutral speech.
SPEAKER INDEPENDENT - AROUSAL
Training with new neutrals
- Original test (emotional neutrals): 61%
- New test (new neutrals): 77%
SPEAKER INDEPENDENT - PLEASURE
- Average recognition rate: 60%
- Discrimination between happy and angry is better than between bored and sad.
- Quality features are very speaker-dependent.
Conclusions
- Prosody features: arousal
- Not enough
- Quality features: pleasure
- Further research needed
- Application
- Find a place in the emotional space
- Additional information on the emotional state
- Pure neutral is very ambiguous.
- In general, emotional expression is very contingent upon the environment.
- An appropriate emotional database is crucial.
Future Research
- Speaker-independent voice quality features
- Improve the reliability of the estimation.
- Different features for different vowels.
- Pleasure dimension
- Quality features, but also some prosody features.
- Classification design: speaker dependencies
- Speaker identification
- Specific models: age, gender
- Feature selection