Title: Emotional Speech Recognition with Gender Determination
1. Emotional Speech Recognition with Gender Determination
- Kisang Pak
- E6820: Speech & Audio Processing & Recognition
- Professor Dan Ellis
- Columbia University
2. System Overview
Project Overview
- Emotions: Neutral, Sadness, Hot Anger, Happy, (Contempt)
- Language: English
- Features to be used: see below
- Classification: see below
- Ranking of the probabilities of each emotion
Feature Extraction
- Fundamental Frequency
- Jitters in Speech Energy
- Rise Duration in Speech Energy
- Rise/Falling Ratio in Speech Energy
- Formant Frequency
- Pitch Contour under 500 Hz
Classification
- Input signal → Result: Neutral, Sadness, Anger, Happy, (Contempt)
- Bayes/Neural Networks for emotion recognition
- Pitch track for gender determination
3. Emotional Speech Samples
Sample clips for Neutral, Sadness, Anger, Happy, and (Contempt), with perceptual notes:
- neutral
- neutral (sounds like sadness)
- anger
- anger (sounds like contempt)
- sadness
- sadness (sounds like neutral)
- happy
- happy (sounds like anger)
4. Feature Extraction 1: Fundamental Frequency
- Frequency-domain analysis (adequate for highly repetitive signals)
- Time-domain analysis (short-term autocorrelation), 50-500 Hz
- Cross-correlate the signal against itself at delays corresponding to 50 Hz through 500 Hz
- The delay that produces the highest amplitude can be converted to the fundamental frequency
- Report the median of the fundamental-frequency estimates
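The autocorrelation search described above can be sketched as follows; a minimal single-frame version, assuming a mono signal array and its sample rate (function and parameter names are my own, not from the report):

```python
import numpy as np

def estimate_f0(x, fs, fmin=50.0, fmax=500.0):
    """Estimate fundamental frequency by short-term autocorrelation:
    search lags corresponding to 50-500 Hz and pick the peak."""
    x = x - np.mean(x)
    lag_min = int(fs / fmax)          # shortest period considered
    lag_max = int(fs / fmin)          # longest period considered
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0, 1, 2, ...
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1]) # strongest delay
    return fs / lag                                    # delay -> frequency
```

In practice this runs per frame and the median over voiced frames gives the feature described on this slide.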
5. Feature Extractions 2, 3, and 4: Speech Energy
Feature 2: Local peaks in speech energy
Short-time speech energy: E(m) = Σn [s(n)·w(m−n)]², where s(n) is the speech signal and w(m−n) is a window (e.g., Hamming) of length Nw.
MATLAB: winLen = 301; win = hamming(winLen);
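A short sketch of the energy contour above, assuming the 301-sample Hamming window from the MATLAB fragment (since the window weights the signal before squaring, the sum collapses to a convolution of s² with w²):

```python
import numpy as np

def short_time_energy(s, win_len=301):
    """E(m) = sum_n [s(n) w(m-n)]^2, i.e. the squared signal
    convolved with the squared Hamming window."""
    w = np.hamming(win_len)
    return np.convolve(s**2, w**2, mode="same")
```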
Feature 3: Rise duration in speech energy
Feature 4: Rise/falling ratio of peaks (male)
[Figure: speech-energy contours, neutral vs. anger]
6. Feature Extraction 5: 1st Formant (via LPC filter)
1. Simplify and minimize the signal and energy using Linear Prediction.
2. Plot the frequency response.
3. Find the roots.
[Figure: 1st formant (male), happy vs. sadness; sample gg_001_happy_1648.62_April-thirteenth.wav]
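The three steps above can be sketched as follows: fit LPC coefficients by the autocorrelation (Yule-Walker) method, find the roots of the prediction polynomial, and convert the lowest strong pole angle to a frequency. The function names, LPC order, and the 0.8 pole-radius cutoff are my own assumptions, not values from the report:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coeffs(x, order):
    """Autocorrelation-method LPC: solve the Yule-Walker equations."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])   # prediction coefficients
    return np.concatenate(([1.0], -a))            # A(z) = 1 - sum a_k z^-k

def first_formant(x, fs, order=8):
    """Lowest resonance: angle of the lowest strong LPC pole."""
    roots = np.roots(lpc_coeffs(x, order))
    # keep upper-half-plane poles close to the unit circle (real resonances)
    roots = roots[(np.imag(roots) > 0.01) & (np.abs(roots) > 0.8)]
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
    return float(freqs[0]) if len(freqs) else None
```

On a synthetic signal made by exciting a known single-resonance all-pole filter with noise, the estimate lands near the true resonance.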
7. Classifier (Bayes/Neural Networks): Emotion Recognition
[Diagram] The input first passes through gender separation; then six features (Feature 1 - Feature 6) are extracted. For each emotion, every feature i contributes a weighted probability wi·pi; after normalization, the contributions are summed into one score per emotion:
- S1 = Σi wi·pi,N (Neutral)
- S2 = Σi wi·pi,S (Sad)
- S3 = Σi wi·pi,A (Anger)
- S4 = Σi wi·pi,C (Contempt)
- S5 = Σi wi·pi,H (Happy)
N = Neutral, S = Sad, A = Anger, C = Contempt, H = Happy
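The weighted-sum scoring above can be sketched as follows, assuming the per-feature, per-emotion probabilities are already available as a 6x5 array (the normalization step and the argsort-based ranking are my reading of the diagram, not code from the report):

```python
import numpy as np

EMOTIONS = ["neutral", "sad", "anger", "contempt", "happy"]  # S1..S5

def rank_emotions(p, w):
    """p: (6, 5) per-feature, per-emotion probabilities p_i,e;
    w: (6,) feature weights. Returns emotions sorted by
    S_e = sum_i w_i * p_i,e, highest first."""
    p = p / p.sum(axis=1, keepdims=True)   # normalize each feature's row
    scores = w @ p                          # (5,): one score per emotion
    order = np.argsort(scores)[::-1]
    return [(EMOTIONS[e], float(scores[e])) for e in order]
```

The full ranking (not just the top score) is what slide 11 reports.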
8. Normal Distribution Fitting
The samples did not closely follow the fitted Gaussian curves.
- Example: fundamental-frequency distributions, neutral vs. anger. At f0 = 185 Hz, the fitted anger density contributes a nonzero weighted probability (w1·p1,A to S1's counterpart S3), while the neutral density gives w1·p1,N ≈ 0.
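A minimal sketch of the fitting step, using invented f0 samples (the means and spreads below are illustrative, not the report's data): fit a Gaussian per class and evaluate each density at a test pitch such as 185 Hz.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical f0 samples (Hz); class means/spreads are invented.
rng = np.random.default_rng(0)
f0_neutral = rng.normal(120.0, 15.0, 200)
f0_anger = rng.normal(190.0, 30.0, 200)

mu_n, sd_n = norm.fit(f0_neutral)   # maximum-likelihood Gaussian fit
mu_a, sd_a = norm.fit(f0_anger)

# At f0 = 185 Hz the anger density dominates; the neutral density is ~0.
p_n = norm.pdf(185.0, mu_n, sd_n)
p_a = norm.pdf(185.0, mu_a, sd_a)
```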
9. Weight Factors
- Example: happy speech
10. Results: Gender Separation
Results (male): Emotional Speech Recognition (65 trained, 100 tested)
- weights = [5 4 1 3 7 2]
- weights = [3 1 4 6 2 9]
[Confusion matrices; rows labeled by actual emotion]
11. Probability Ranking (Example)
- Example: actual emotion = Contempt
- Contempt was ranked 2nd 38.5% of the time
12. Use of Different Weights
- From the list of emotions to be recognized, select the best weight constants.
- Best results:
  - If Happy is included: weights = [5 4 1 3 7 2]
  - If Contempt is included: weights = [3 1 4 6 2 9]
13. Classifier: Gender Separation
- General rule: fundamental frequency < 140 Hz → male
- In emotional speech analysis, this general rule does not apply because of the wide variance in pitch.
- One method: pitch tracking between 250 Hz and 500 Hz
14. Future Work
- Immediate improvement (by the report due date): gender determination, target 80%
- Long-term improvements:
  - Remodeling the Gaussian density distributions
  - More efficient and faster processing
  - Emotional speech resynthesis
  - Determination of the weight factors