Titelmaster - PowerPoint PPT Presentation


Description:

Stefan Scherer, Hansjörg Hofmann, Malte Lampmann, ... Spectrogram. Modulation. Spectrogram. Time. Echo State Networks. Recurrent artificial neural network ... – PowerPoint PPT presentation

Slides: 16
Provided by: lrec
Learn more at: http://www.lrec-conf.org

Transcript and Presenter's Notes



1
Emotion Recognition from Speech: Stress Experiment
Stefan Scherer, Hansjörg Hofmann, Malte Lampmann, Martin Pfeil, Steffen Rhinow,
Friedhelm Schwenker, Günther Palm
Institute of Neural Information Processing, Ulm University
stefan.scherer_at_uni-ulm.de
Stefan Scherer, 24.09.2007, LREC 2008
2
Motivation
  • Why stress recognition from speech?
      • Safety and usability purposes
      • More efficient and natural interfaces
      • Several existing applications are based on speech
        only (e.g., call center applications)
  • Existing problems
      • Existing databases are limited
      • Stress induced by increasing workload is not covered
      • Choosing representative features is difficult

3
Experimental Setup
4
Experimental Setup Summary
  • Direct the planes towards the corresponding exit
  • Four types of questions (personal, enumerations,
    general knowledge, Jeopardy)
  • Difficulty levels differ in plane speed, number
    of planes and exit sizes
  • Points are earned or lost; the current score is
    color-coded
  • One game lasts 10 minutes
  • Participants self-assess their experienced stress
    three times

5
Evaluation and Labeling of Recordings
  • Everybody reacts differently to stress
  • No common labels are available for the recordings
  • → Second labeling experiment to obtain fuzzy
    labels for each of the recordings

6
Evaluation and Labeling of Recordings
Speaker  Mean  P25    P75  Self-Assess.  Crashes
1        35.8  24     47   1/2/4         0/4/13
2        41.9  25     59   2/4/?         0/4/30
3        45.2  29.5   61   7/6/8         1/10/37
4        31.0  20     40   1/1/2         0/2/16
5        43.2  25     61   7/8/9         0/3/28
6        43.0  23     60   4/4/6         0/3/26
7        31.2  21     37   1/3/7         0/1/23
8        33.2  21     41   1/1/3-4       0/0/8
9        38.0  23     51   1/1-2/5       0/6/31
10       35.7  22     49   1/2/5         0/3/11
11       49.6  31.75  65   7/9/10        5/9/17
12       49.1  32     65   4/4/?         0/5/27
13       43.4  26     62   1/3/4         6/22/38
14       32.1  22     41   2/5/8         1/1/26
15       41.6  26     56   2/3/7         0/2/19
(P25, P75: 25th and 75th percentiles)
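The Mean, P25 and P75 columns presumably summarize the human ratings collected for each speaker. A minimal sketch of such a summary, assuming one speaker's ratings are available as a flat list; the function name `rating_summary` and the example numbers are hypothetical, and numpy's default linear interpolation is an assumption since the slides do not say which percentile method was used.

```python
import numpy as np


def rating_summary(ratings):
    """Return (mean, 25th percentile, 75th percentile) of one speaker's ratings.

    Hypothetical helper: uses numpy's default (linear) percentile
    interpolation, which may differ from the method used in the study.
    """
    r = np.asarray(ratings, dtype=float)
    return r.mean(), np.percentile(r, 25), np.percentile(r, 75)


# Hypothetical ratings for one speaker (illustrative numbers, not study data)
mean, p25, p75 = rating_summary([10, 20, 30, 40, 50])
print(f"Mean={mean:.1f} P25={p25:.1f} P75={p75:.1f}")  # Mean=30.0 P25=20.0 P75=40.0
```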
7
Evaluation and Labeling of Recordings
  • Spearman correlation tests
  • Mean vs. self-assessment
  • Mean vs. crashes
  • Self-assessment vs. crashes

            ρ     p-value
M vs. SA    0.61  0.01
M vs. C     0.68  0.005
C vs. SA    0.40  0.13
(ρ: Spearman's rank correlation coefficient; M: mean, SA: self-assessment, C: crashes)
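Spearman's ρ is the Pearson correlation computed on ranks instead of raw values, which makes it suitable for ordinal data such as self-assessments. A minimal pure-Python sketch; the helper names are hypothetical and tied values receive the average of their ranks. It computes only ρ, not the p-values shown above; if SciPy is available, `scipy.stats.spearmanr` returns both.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative numbers only (not the study's data)
print(round(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 3))  # 0.8
```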
8
Automatic Stress Recognition
  • Biologically motivated features
      • Representing the rate of change of frequency
      • Representative features
      • Robust against noisy conditions
  • Echo state networks
      • Easy to train using the direct pseudo-inverse method
      • Use the sequential characteristics of the features
      • Robust against noisy conditions

9
Utilized Features
  • Motivation
      • Pitch is not always easy to extract
      • Statistics of pitch may not suffice
      • Preliminary experiments show worse performance
  • Goal: representative features that do not need
    to be aggregated over time
  • Modulation spectrum based features
      • Representing the rate of change of frequency
      • Extracted at 25 Hz

10
Modulation Spectrum Features
  • Rate of change of frequency
  • Standard procedures: FFT and Mel filtering
  • The most prominent modulation energies are observed
    between 2 and 16 Hz
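The steps on this slide (short-time FFT, Mel filtering, then a second FFT over time of each band's energy) can be sketched with numpy as follows. All parameters are assumptions, not the authors' settings: 16 kHz audio, 25 ms frames with a 10 ms hop (a 100 Hz frame rate), 8 Mel bands, 1 s modulation windows, and a hop of 4 frames so that one feature vector is produced every 40 ms, i.e. at 25 Hz. Only modulation frequencies from 2 to 16 Hz are kept. The function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, fs):
    """Triangular Mel filters as a (n_mels, n_fft // 2 + 1) matrix."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def mel_band_energies(signal, fs=16000, n_fft=512, win=400, hop=160, n_mels=8):
    """Short-time FFT + Mel filtering -> band energies, shape (n_frames, n_mels)."""
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return power @ mel_filterbank(n_mels, n_fft, fs).T


def modulation_features(energies, frame_rate=100, mod_win=100, mod_hop=4, f_lo=2.0, f_hi=16.0):
    """FFT over time of each band's energy trajectory.

    Returns (features, mod_freqs); features has shape (n_vectors, n_mels, n_bins)
    and keeps only modulation frequencies in [f_lo, f_hi] Hz. A hop of 4 frames
    at a 100 Hz frame rate yields one feature vector every 40 ms (25 Hz).
    """
    mod_freqs = np.fft.rfftfreq(mod_win, d=1.0 / frame_rate)
    keep = (mod_freqs >= f_lo) & (mod_freqs <= f_hi)
    taper = np.hanning(mod_win)[:, None]
    feats = []
    for start in range(0, len(energies) - mod_win + 1, mod_hop):
        seg = energies[start:start + mod_win]
        seg = seg - seg.mean(axis=0)  # drop the 0 Hz component of each band
        spec = np.abs(np.fft.rfft(seg * taper, axis=0))
        feats.append(spec[keep].T)
    return np.array(feats), mod_freqs[keep]


# Toy check: a 1 kHz tone whose amplitude is modulated at 4 Hz
fs = 16000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 1000 * t) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))
feats, freqs = modulation_features(mel_band_energies(x, fs=fs))
print(feats.shape, freqs[np.argmax(feats.mean(axis=(0, 1)))])  # (25, 8, 15) 4.0
```

The toy signal is only a sanity check: its amplitude changes at 4 Hz, so the largest modulation energy should land in the 4 Hz bin.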

11
[Figure: waveform, spectrogram and modulation spectrogram; horizontal axis: time]
12
Echo State Networks
  • Recurrent artificial neural network
  • The dynamic reservoir represents the input history
    → echo state property
  • Only the output connections W_out need to be
    adapted, using the pseudo-inverse method
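A minimal echo state network sketch with numpy, assuming a tanh reservoir whose random, fixed weights are rescaled to spectral radius 0.9 (a common heuristic for obtaining the echo state property, not a guarantee) and a linear readout with a bias term. As on the slide, only W_out (`w_out` here) is learned, via the Moore-Penrose pseudo-inverse (`np.linalg.pinv`). Reservoir size, scaling, washout length and the toy one-step-delay task are illustrative assumptions; the slides do not give the authors' settings, and all function names are hypothetical.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Random, fixed input and reservoir weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scale, input_scale, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale W so its largest absolute eigenvalue equals spectral_radius
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(u, w_in, w):
    """Drive the reservoir: x(t) = tanh(W_in u(t) + W x(t-1)); u has shape (T, n_in)."""
    x = np.zeros(w.shape[0])
    states = np.empty((len(u), w.shape[0]))
    for t, u_t in enumerate(u):
        x = np.tanh(w_in @ u_t + w @ x)
        states[t] = x
    return states


def with_bias(states):
    """Append a constant 1 column so the readout can learn an offset."""
    return np.hstack([states, np.ones((len(states), 1))])


def train_readout(states, targets, washout=50):
    """Fit W_out by least squares via the pseudo-inverse, skipping the initial transient."""
    return np.linalg.pinv(with_bias(states[washout:])) @ targets[washout:]


def predict(states, w_out):
    return with_bias(states) @ w_out


# Toy check (hypothetical task): reproduce the input delayed by one step
rng = np.random.default_rng(1)
u = rng.uniform(-0.5, 0.5, size=(600, 1))
y = np.roll(u, 1, axis=0)  # y(t) = u(t-1); the wrapped first sample falls inside the washout
w_in, w = make_reservoir(n_in=1, n_res=100)
states = run_reservoir(u, w_in, w)
w_out = train_readout(states, y)
mse = np.mean((predict(states, w_out)[50:] - y[50:]) ** 2)
print(f"training MSE: {mse:.2e}")
```

Because the reservoir is never trained, fitting reduces to one linear least-squares solve, which is what makes the pseudo-inverse approach cheap.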

13
Experiments and Results
  • No ground-truth label → the mean rating of all
    labelers for each utterance is used as the target
  • 10-fold cross-validation
  • Human labelers vs. ESN
  • The ESN outperforms the labelers

           MSE    ME
Labeler 1  0.284  0.421
Labeler 2  0.151  0.281
Labeler 3  0.291  0.422
Labeler 4  0.241  0.384
Labeler 5  0.211  0.365
ESN        0.084  0.235
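The evaluation protocol can be sketched as follows: the fuzzy target of each utterance is the mean over all labelers, each labeler (or the model) is scored against it, and the model is evaluated with 10-fold cross-validation. Reading ME as mean absolute error is an assumption, since the slides give only the abbreviation; splitting at the utterance level is likewise assumed. Function names and example ratings are hypothetical.

```python
import numpy as np


def fuzzy_targets(ratings):
    """ratings: (n_labelers, n_utterances) -> mean rating per utterance."""
    return np.asarray(ratings, dtype=float).mean(axis=0)


def error_scores(pred, target):
    """Return (MSE, ME), where ME is taken to be the mean absolute error (assumption)."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return np.mean(diff ** 2), np.mean(np.abs(diff))


def kfold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n items."""
    idx = np.random.default_rng(seed).permutation(n)
    for test_idx in np.array_split(idx, k):
        yield np.setdiff1d(idx, test_idx), test_idx


# Hypothetical ratings: 2 labelers x 3 utterances (illustrative, not study data)
ratings = np.array([[0.2, 0.6, 1.0],
                    [0.4, 0.8, 0.6]])
target = fuzzy_targets(ratings)
for i, r in enumerate(ratings, start=1):
    mse, me = error_scores(r, target)
    print(f"Labeler {i}: MSE={mse:.3f} ME={me:.3f}")
```

For the model, one would train on each `train_idx` split and score its predictions on the matching `test_idx` with `error_scores`.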
14
Conclusions
  • Experimental setup to record speech data with
    different levels of stress
  • Large vocabulary dataset is available (with
    additional video material and mouse movement
    data)
  • Method for humans to label the individual
    stressed utterances
  • Automatic stress recognizer based on recurrent
    neural networks
  • → outperforms the human labelers in accuracy

15
Thank you for your attention!