Multimodal Information Analysis for Emotion Recognition
(Transcript)

1
Multimodal Information Analysis for Emotion Recognition
  • (Tele-Health Care Application)
  • Malika Meghjani
  • Gregory Dudek and Frank P. Ferrie

2
Content
  1. Our Goal
  2. Motivation
  3. Proposed Approach
  4. Results
  5. Conclusion

3
Our Goal
  • Automatic emotion recognition using audio-visual
    information analysis.
  • Create video summaries by automatically labeling
    the emotions in a video sequence.

4
Motivation
  • Map Emotional States of the Patient to Nursing
    Interventions.
  • Evaluate the role of Nursing Interventions in
    improving the patient's health.

5
Proposed Approach
(Diagram: system pipeline. Visual feature extraction feeds visual-based
emotion classification; audio feature extraction feeds audio-based emotion
classification. The two streams are combined by data fusion, at either the
feature level or the decision level, to produce the recognized emotional
state.)
6
Visual Analysis
  • Face Detection (Viola-Jones face detector using
    Haar wavelets and a boosting algorithm)
  • Feature Extraction (Gabor filters: 5 spatial
    frequencies and 4 orientations, giving 20 filter
    responses per frame; see the sketch after this
    list)
  • Feature Selection (select the most discriminative
    features across all emotional classes)
  • SVM Classification (classification with
    probability estimates)
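
A minimal sketch of the detection and feature-extraction steps, assuming
OpenCV (cv2) and its bundled Haar cascade; the crop size, kernel size, and
frequency spacing are illustrative assumptions, not the authors' settings.

  import cv2
  import numpy as np

  def detect_face(gray):
      # Viola-Jones detector via OpenCV's bundled Haar cascade.
      cascade = cv2.CascadeClassifier(
          cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
      faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
      if len(faces) == 0:
          return None
      x, y, w, h = faces[0]                     # take the first detection
      return cv2.resize(gray[y:y + h, x:x + w], (48, 48))

  def gabor_features(face, n_freq=5, n_orient=4):
      # Gabor filter bank: 5 spatial frequencies x 4 orientations
      # = 20 filter responses per frame, concatenated into one vector.
      feats = []
      for i in range(n_freq):
          wavelength = 4.0 * (2 ** (i / 2.0))   # assumed frequency spacing
          for j in range(n_orient):
              theta = j * np.pi / n_orient
              kernel = cv2.getGaborKernel((21, 21), 4.0, theta,
                                          wavelength, 0.5)
              response = cv2.filter2D(face.astype(np.float32),
                                      cv2.CV_32F, kernel)
              feats.append(response.ravel())
      return np.concatenate(feats)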

7
Visual Feature Extraction

X
( 5 Frequencies X 4 Orientation )
Frequency Domain Filters
Feature Selection
Automatic Emotion Classification
8
Audio Analysis
  • Audio Pre-Processing (trim leading and trailing
    silence)
  • Feature Extraction (statistics of pitch and
    intensity contours, and Mel-frequency cepstral
    coefficients; see the sketch after this list)
  • Feature Normalization (remove inter-speaker
    variability)
  • SVM Classification (classification with
    probability estimates)
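
A minimal sketch of the audio side, assuming the librosa library; the
choice of summary statistics, pitch range, and sample rate are assumptions
for illustration.

  import numpy as np
  import librosa

  def stats(v):
      # Summary statistics over a contour (assumed set of statistics).
      return [float(np.mean(v)), float(np.std(v)),
              float(np.min(v)), float(np.max(v))]

  def audio_features(path):
      y, sr = librosa.load(path, sr=16000)
      y, _ = librosa.effects.trim(y)                  # trim silent edges
      f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)   # pitch contour
      rms = librosa.feature.rms(y=y)[0]               # intensity contour
      mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
      feat = stats(f0) + stats(rms)
      for coeff in mfcc:                              # stats per MFCC band
          feat += stats(coeff)
      return np.array(feat)

  def normalize_speaker(X):
      # z-score the features of one speaker to reduce
      # inter-speaker variability.
      return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)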

9
Audio Feature Extraction
(Diagram: the audio signal passes through feature extraction, covering
speech rate, pitch, intensity, spectrum analysis, and Mel-frequency
cepstral coefficients (the short-term power spectrum of sound), followed
by automatic emotion classification.)
10
SVM Classification
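
The figure on this slide is not recoverable. For concreteness, a minimal
sketch of SVM classification with probability estimates, assuming
scikit-learn; the data here are random placeholders.

  import numpy as np
  from sklearn.pipeline import make_pipeline
  from sklearn.preprocessing import StandardScaler
  from sklearn.svm import SVC

  rng = np.random.default_rng(0)
  X = rng.normal(size=(120, 40))        # placeholder feature vectors
  y = rng.integers(0, 6, size=120)      # six emotion classes

  # RBF-kernel SVM; probability=True fits a sigmoid on the SVM margins
  # to yield per-class probability estimates.
  clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
  clf.fit(X, y)
  proba = clf.predict_proba(X[:5])      # each row sums to one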
11
Feature Selection
  • The feature selection method is similar to the SVM
    classification (a wrapper method).
  • It generates a separating plane by minimizing the
    weighted sum of the distances of misclassified
    data points to two parallel bounding planes.
  • It suppresses as many components of the normal to
    the separating plane as possible while keeping the
    classification results consistent (a rough sketch
    follows below).

(Diagram: the two parallel bounding planes and the distances of
misclassified points; plot of average error count versus the number of
features selected.)
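
The slide does not give the exact wrapper formulation. As a rough
stand-in, the sketch below suppresses features whose component of a linear
SVM's separating-plane normal (the weight vector) is small, using
scikit-learn's SelectFromModel; this is a simplification, not the authors'
method.

  import numpy as np
  from sklearn.svm import LinearSVC
  from sklearn.feature_selection import SelectFromModel

  rng = np.random.default_rng(0)
  X = rng.normal(size=(120, 400))      # placeholder Gabor-style features
  y = rng.integers(0, 6, size=120)

  # Train a linear SVM, then keep only features whose weight magnitude
  # in the separating-plane normal is above the median.
  svm = LinearSVC(C=0.1, dual=False).fit(X, y)
  selector = SelectFromModel(svm, threshold="median", prefit=True)
  X_sel = selector.transform(X)        # reduced feature matrix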
12
Data Fusion
  • 1. Decision Level
  • Obtain a probability estimate for each emotional
    class from the SVM margins.
  • The probability estimates from the two modalities
    are multiplied and re-normalized to give the final
    decision-level emotion classification.
  • 2. Feature Level
  • Concatenate the audio and visual features and
    repeat the feature selection and SVM
    classification process (both schemes are sketched
    below).
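
A minimal sketch of both fusion schemes, assuming per-class probability
vectors from the audio and visual SVMs; function and variable names are
illustrative.

  import numpy as np

  def decision_level_fusion(p_audio, p_visual):
      # Multiply the per-class probability estimates from the two
      # modalities, then re-normalize so they sum to one.
      p = p_audio * p_visual
      return p / p.sum(axis=-1, keepdims=True)

  def feature_level_fusion(f_audio, f_visual):
      # Concatenate audio and visual feature vectors; feature selection
      # and SVM classification are then repeated on the joint vector.
      return np.concatenate([f_audio, f_visual], axis=-1)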

13
Database and Training
  • Database
  • Visual-only posed database
  • Audio-visual posed database
  • Training
  • Audio segmentation based on the minimum window
    required for feature extraction.
  • Corresponding visual key-frame extraction within
    the segmented window (a pairing sketch follows
    below).
  • Image-based training and audio-segment-based
    training.
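
A sketch of one plausible pairing scheme, assuming fixed-length audio
windows with the video frame at each window's center taken as the key
frame; the window length and centering rule are assumptions.

  def segment_and_keyframes(n_samples, sr, fps, win_s=1.0):
      # Split the audio into fixed windows (win_s seconds, an assumed
      # minimum window) and pair each with the frame at its center.
      hop = int(win_s * sr)
      segments, keyframes = [], []
      for start in range(0, n_samples - hop + 1, hop):
          segments.append((start, start + hop))
          center_t = (start + hop / 2) / sr       # window center (seconds)
          keyframes.append(int(center_t * fps))   # visual key-frame index
      return segments, keyframes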

14
Experimental Results
Database: Posed Visual Data Only (CKDB)
  Training examples: 120 | Subjects: 20 | Emotional states: 5 + Neutral
  Recognition rate: 75% | Validation: leave-one-subject-out cross-validation

Database: Posed Audio-Visual Data (EDB)
  Training examples: 270 | Subjects: 9 | Emotional states: 6
  Recognition rate: 82% (decision-level fusion), 76% (feature-level fusion)
15
Time Series Plot
(Plot: time series of recognized emotions: Surprise, Sad, Angry, Disgust,
Happy. 75% leave-one-subject-out cross-validation results on the
Cohn-Kanade posed visual-only database.)
16
Feature Level Fusion
(eNTERFACE 2005, posed audio-visual database)
17
Decision Level Fusion
(eNTERFACE 2005, posed audio-visual database)
18
Confusion Matrix
19
Demo
20
Conclusion
  • Combining the two modalities (audio and visual)
    improves the overall recognition rate by 11% with
    Decision Level Fusion and by 6% with Feature
    Level Fusion.
  • Emotions where vision wins: Disgust, Happy, and
    Surprise.
  • Emotions where audio wins: Anger and Sadness.
  • Fear was equally well recognized by the two
    modalities.
  • Automated multimodal emotion recognition is
    clearly effective.

21
Things to do
  • Inference based on the temporal relation between
    instantaneous classifications (one possible scheme
    is sketched below).
  • Tests on a natural audio-visual database (ongoing).
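
This is future work on the slide; one simple instance of temporal
inference is a sliding-window majority vote over the instantaneous labels,
shown here as an assumed illustration rather than the authors' plan.

  from collections import Counter

  def smooth_labels(labels, window=5):
      # Sliding-window majority vote: each instantaneous classification
      # is replaced by the most frequent label in its neighborhood.
      half = window // 2
      smoothed = []
      for i in range(len(labels)):
          neighborhood = labels[max(0, i - half):i + half + 1]
          smoothed.append(Counter(neighborhood).most_common(1)[0][0])
      return smoothed

  # Example: a spurious single-frame "sad" inside a run of "happy"
  # is smoothed away.
  print(smooth_labels(["happy", "happy", "sad", "happy", "happy"]))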