Title: Robust Recognition of Emotion from Speech
1. Robust Recognition of Emotion from Speech
- Mohammed E. Hoque
- Mohammed Yeasin
- Max M. Louwerse
- {mhoque, myeasin, mlouwerse}@memphis.edu
- Institute for Intelligent Systems
- University of Memphis
2. Presentation Overview
- Motivation
- Methods
- Database
- Results
- Conclusion
3. Motivation
- Animated agents to recognize emotion in an e-Learning environment.
- Agents need to be sensitive and adaptive to learners' emotions.
4. Methods
- Our method is partially motivated by the work of Lee and Narayanan [1], who first introduced the notion of salient words.
5. Shortcomings of Lee and Narayanan's Work
Lee et al. argued that there is a one-to-one correspondence between a word and a positive or negative emotion. This is NOT true in every case.
6. Examples
Figure 1: Pictorial depiction of the word "okay" uttered with different intonations (confusion, flow, normal, delight) to express different emotions.
7. More Examples
Scar!! Scar??
8. More Examples
Two months!!
Two months??
9. Our Hypothesis
- Lexical information extracted from combined prosodic and acoustic features that correspond to the intonation patterns of salient words will yield robust recognition of emotion from speech.
- It also provides a framework for signal-level analysis of speech for emotion.
10. Creation of the Database
11. Details on the Database
- 15 utterances were selected for four emotion categories: confusion/uncertain, delight, flow (confident, encouragement), and frustration [2].
- Utterances were stand-alone, ambiguous expressions in conversations, dependent on the context.
- Examples are "Great", "Yes", "Yeah", "No", "Ok", "Good", "Right", "Really", "What", "God".
12. Details on the Database
- Three graduate students listened to the audio clips.
- They successfully distinguished between the positive and negative emotions 65% of the time.
- No specific instructions were given as to what intonation patterns to listen to.
13. High-Level Diagram
Figure 2: High-level description of the overall emotion recognition process (word-level utterances → feature extraction → data projection → classifiers → positive/negative).
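For concreteness, below is a minimal sketch of this pipeline in Python. The slide does not name a toolkit, so scikit-learn, a PCA projection, and placeholder feature vectors and labels are all assumptions here, not the authors' implementation.

```python
# Minimal sketch of the Figure 2 pipeline. Assumed tooling: NumPy and
# scikit-learn; the actual projection and classifier used in the paper
# are not specified on this slide.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 24))    # one prosodic/acoustic feature vector per word-level utterance
y = rng.integers(0, 2, size=60)  # 1 = positive emotion, 0 = negative emotion (placeholder labels)

pipeline = make_pipeline(
    StandardScaler(),            # normalize features
    PCA(n_components=10),        # "data projection" step
    SVC(kernel="rbf"),           # binary positive/negative classifier
)
print("mean CV accuracy:", cross_val_score(pipeline, X, y, cv=5).mean())
```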
14. Hierarchical Classifiers
Figure 3: The design of the hierarchical binary classifiers.
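Figure 3 itself is not reproduced in this transcript. The sketch below shows one plausible reading of a hierarchy of binary classifiers (a root positive/negative classifier followed by delight-vs-flow and confusion-vs-frustration classifiers); the grouping of the four categories and the choice of scikit-learn SVMs are assumptions.

```python
# Sketch of hierarchical binary classification: a root classifier separates
# positive from negative emotions, then one leaf classifier per branch picks
# the final category. The exact hierarchy in Figure 3 may differ.
import numpy as np
from sklearn.base import clone
from sklearn.svm import SVC

POSITIVE = ("delight", "flow")   # assumed grouping of the four categories

class HierarchicalEmotionClassifier:
    def __init__(self, base=None):
        base = base if base is not None else SVC(kernel="rbf")
        self.root = clone(base)  # positive vs. negative
        self.pos = clone(base)   # delight vs. flow
        self.neg = clone(base)   # confusion vs. frustration

    def fit(self, X, labels):
        # X: NumPy array of feature vectors, labels: array of category names
        labels = np.asarray(labels)
        is_pos = np.isin(labels, POSITIVE)
        self.root.fit(X, is_pos)
        self.pos.fit(X[is_pos], labels[is_pos])
        self.neg.fit(X[~is_pos], labels[~is_pos])
        return self

    def predict(self, X):
        is_pos = self.root.predict(X).astype(bool)
        out = np.empty(len(X), dtype=object)
        if is_pos.any():
            out[is_pos] = self.pos.predict(X[is_pos])
        if (~is_pos).any():
            out[~is_pos] = self.neg.predict(X[~is_pos])
        return out
```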
15. Emotion Models Using Lexical Information
- Pitch: minimum, maximum, mean, standard deviation, absolute value, quantile, ratio between voiced and unvoiced frames.
- Duration: etime, eheight.
- Intensity: minimum, maximum, mean, standard deviation, quantile.
- Formant: first formant, second formant, third formant, fourth formant, fifth formant, second formant / first formant, third formant / first formant.
- Rhythm: speaking rate.
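As an illustration of how the per-utterance statistics above might be computed: the F0 and intensity contours are assumed to come from an external tool such as Praat, the specific quantile is not stated on the slide (the 75th percentile below is a placeholder), and the formant and rhythm features are omitted from the sketch.

```python
# Per-word statistics over precomputed contours. Unvoiced frames are assumed
# to be marked with NaN in f0. Formant ratios and speaking rate are omitted.
import numpy as np

def utterance_features(f0, intensity):
    """f0, intensity: 1-D frame-by-frame arrays for one word-level utterance."""
    voiced = f0[~np.isnan(f0)]
    n_unvoiced = int(np.isnan(f0).sum())
    return {
        # pitch statistics
        "pitch_min": float(voiced.min()),
        "pitch_max": float(voiced.max()),
        "pitch_mean": float(voiced.mean()),
        "pitch_std": float(voiced.std()),
        "pitch_quantile": float(np.percentile(voiced, 75)),  # placeholder quantile
        "voiced_unvoiced_ratio": len(voiced) / max(1, n_unvoiced),
        # intensity statistics
        "intensity_min": float(intensity.min()),
        "intensity_max": float(intensity.max()),
        "intensity_mean": float(intensity.mean()),
        "intensity_std": float(intensity.std()),
        "intensity_quantile": float(np.percentile(intensity, 75)),
    }
```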
16. Duration Features
Figure 4: Measures of F0 for computing the parameters (etime, eheight), which correspond to the rising and lowering of intonation.
Inclusion of height and time accounts for possible low or high pitch accents.
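Since Figure 4 is not reproduced here, the following is only a guess at how (etime, eheight) could be measured: the duration and magnitude of the main F0 rise within the word (a fall would use the mirrored logic).

```python
# Illustrative computation of (etime, eheight) as the duration and height of
# the main F0 rise; this is an assumption about Figure 4, not necessarily the
# authors' exact definition.
import numpy as np

def rise_parameters(f0, frame_period=0.01):
    """f0: 1-D array of voiced F0 values (Hz); frame_period: seconds per frame."""
    i_min = int(np.argmin(f0))                   # start of the rise
    i_max = i_min + int(np.argmax(f0[i_min:]))   # peak following the minimum
    etime = (i_max - i_min) * frame_period       # duration of the rise (s)
    eheight = float(f0[i_max] - f0[i_min])       # size of the rise (Hz)
    return etime, eheight
```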
17. Types of Classifiers
18. Shortcomings of Lee and Narayanan's Work (2004)
19. Results
20. Summary of Results
21. 21 Classifiers on Positive and Negative Emotions
22. Limitations and Future Work
- Algorithm
  - Feature selection
  - Discourse information
  - Future efforts will include fusion of video and audio data in a signal-level framework.
- Database
  - Clipping arbitrary words from a conversation may be ineffective in various cases.
  - May need to look at words in sequence.
23. More Examples
24.
- M. E. Hoque, M. Yeasin, and M. M. Louwerse, "Robust Recognition of Emotion from Speech," 6th International Conference on Intelligent Virtual Agents, Marina Del Rey, CA, August 2006.
- M. E. Hoque, M. Yeasin, and M. M. Louwerse, "Robust Recognition of Emotion in e-Learning Environment," 18th Annual Student Research Forum, Memphis, TN, April 2006. (2nd Best Poster Award)
25. Acknowledgments
- This research was partially supported by grant NSF-IIS-0416128 awarded to the third author. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding institution.
26. Questions?
27. Robust Recognition of Emotion from Speech
- Mohammed E. Hoque
- Mohammed Yeasin
- Max Louwerse
- {mhoque, myeasin, mlouwerse}@memphis.edu
- Institute for Intelligent Systems
- University of Memphis
28. References
[1] C. Lee and S. Narayanan, "Toward detecting emotions in spoken dialogs," IEEE Transactions on Speech and Audio Processing, vol. 13, 2005.
[2] B. Kort, R. Reilly, and R. W. Picard, "An Affective Model of Interplay Between Emotions and Learning: Reengineering Educational Pedagogy - Building a Learning Companion," in Proceedings of the International Conference on Advanced Learning Technologies (ICALT 2001), Madison, Wisconsin, August 2001.