Cues to Emotion: Language

1
Cues to Emotion: Language
  • Suzanne Yuen
  • Monday Oct 5, 2009
  • COMS 6998

2
Overview
  • Two-Stream Emotion Recognition for Call Center
    Monitoring
  • Voice Quality and f0 Cues for Affect Expression:
    Implications for Synthesis

3
Two-Stream Emotion Recognition for Call Center
Monitoring
  • Background: To aid supervisors in the evaluation
    of agents at call centers
  • Objective: To present a two-stream processing
    technique to detect strong emotion
  • Previous Work:
  • Fernandez categorized affect into four main
    components: intonation, loudness, rhythm, and
    voice quality
  • Yang studied feature selection methods in text
    categorization and suggested that information
    gain should be used
  • Petrushin and Yacoub examined agitated and calm
    states in human-machine interaction

A typical medium-sized call center receives about
100,000 calls per day
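
The information-gain criterion mentioned under Previous Work can be illustrated with a minimal sketch. It scores a word by how much knowing whether the word is present reduces the entropy of the emotion labels. The toy documents, labels, and function names are hypothetical and are not taken from the papers discussed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting the documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    gain = entropy(labels)
    for subset in (with_term, without_term):
        if subset:  # skip an empty side of the split
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


# Hypothetical toy data: each document is a set of words.
docs = [
    {"thanks", "great"},
    {"useless", "service"},
    {"thanks", "service"},
    {"useless", "disgusting"},
]
labels = ["happy", "anger", "happy", "anger"]

# Rank the vocabulary from most to least informative.
vocabulary = {word for doc in docs for word in doc}
ranked = sorted(
    vocabulary,
    key=lambda word: information_gain(docs, labels, word),
    reverse=True,
)
```

In this toy set, "thanks" and "useless" separate the two labels perfectly, while "service" appears equally often with both and so carries no information.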
4
Two-Stream Recognition
  • Semantic Stream
  • Performed speech-to-text conversion
  • Text classification algorithms identified phrases
    such as "pleasure", "thanks", "useless", and
    "disgusting"
  • Acoustic Stream
  • Extracted features based on pitch and energy
  • Trained on 900 calls, 60 hrs of speech
  • Vocabulary of more than 10,000 words
  • TF-IDF (Term Frequency-Inverse Document
    Frequency) scheme
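
As a rough illustration of the TF-IDF scheme named on this slide, the sketch below weights each word by its relative frequency in an utterance times the log of the inverse document frequency. This is one common textbook variant, chosen as an assumption. The slides do not state which weighting the paper used, and the example utterances are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical recognized utterances, already split into words.
utterances = [
    ["thanks", "for", "your", "help"],
    ["this", "service", "is", "useless"],
    ["thanks", "this", "is", "a", "pleasure"],
]
weights = tf_idf(utterances)
```

Words that occur in only one utterance, such as "useless", receive a higher weight than words shared across utterances, such as "thanks".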

5
Implementation
  • Method
  • Two streams analyzed separately
  • speech utterance/acoustic features
  • spoken text/semantics/speech recognition of
    conversation
  • Confidence levels of two streams combined
  • Examined 3 emotions
  • Neutral
  • Hot-anger
  • Happy
  • Tested two data sets
  • LDC data
  • 20 real-world call-center calls
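
One way to picture "confidence levels of two streams combined" is a weighted average of per-emotion confidences. The slides do not say how the paper fuses the streams, so the averaging rule, the weight, and the scores below are all assumptions made for illustration.

```python
def combine_streams(acoustic, semantic, acoustic_weight=0.5):
    """Weighted average of per-emotion confidences from the two streams.

    `acoustic` and `semantic` map emotion names to confidences in [0, 1].
    The averaging rule is an assumed fusion method, not the paper's.
    """
    emotions = acoustic.keys() & semantic.keys()
    combined = {
        emotion: acoustic_weight * acoustic[emotion]
        + (1.0 - acoustic_weight) * semantic[emotion]
        for emotion in emotions
    }
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical confidences for one utterance.
acoustic = {"neutral": 0.5, "hot-anger": 0.3, "happy": 0.2}
semantic = {"neutral": 0.2, "hot-anger": 0.7, "happy": 0.1}
label, scores = combine_streams(acoustic, semantic)
```

Here the acoustic stream alone would pick neutral, but the semantic evidence tips the combined decision toward hot-anger.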

6
Two-Stream: Conclusion
  • Table 2 suggested that two-stream analysis is
    more accurate than acoustic or semantic analysis
    alone
  • Recognition on LDC data was significantly higher
    than on real-world data
  • Neutral emotions were recognized less accurately
  • Combining the two streams improved identification
    of happy and anger emotions (by 20%)
  • Low acoustic-stream accuracy may be attributed to
    the length of sentences in real-world data:
    ordinary speakers do not express markedly
    different emotions over long sentences

7
Discussion
  • Gupta analyzed three emotions (happy, neutral,
    hot-anger). Why break it down into these
    categories? What are the implications? Can this
    technique be applied to a wider range of
    emotions, or to other applications?
  • Speech-to-text conversion may not capture the
    complete conversation. Would further examination
    greatly improve results? What are the pros and
    cons?
  • The pitch range was 50-400 Hz. The research may
    not be applicable outside this range. Do you
    think it is necessary to examine other
    frequencies?
  • In this paper, the TF-IDF (Term Frequency-Inverse
    Document Frequency) technique is used to classify
    utterances. Accuracy for acoustics only is about
    55%. Previous research suggests that alternative
    techniques may be better. Would implementing them
    give better results? What are the pros and cons
    of using the TF-IDF technique?
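
To make the pitch and energy features and the 50-400 Hz search range concrete, here is a minimal sketch. It estimates f0 for one frame by picking the strongest autocorrelation peak among lags in that range, and computes RMS energy. The autocorrelation method is an assumption, since the slides do not say how the paper extracts pitch, and the test signal is synthetic.

```python
import math


def rms_energy(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_f0(frame, sample_rate, f0_min=50.0, f0_max=400.0):
    """Return the f0 (Hz) whose lag maximizes the autocorrelation.

    Only lags corresponding to f0_min..f0_max are searched.
    Returns None if no lag has positive correlation.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


# Synthetic 200 Hz sine wave, 50 ms at 8 kHz (not real speech).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
f0 = estimate_f0(frame, sample_rate)
energy = rms_energy(frame)
```

Any f0 outside the search range is simply never considered, which is the limitation the discussion question points at. Real speech would also need voicing detection and protection against octave errors, which this sketch omits.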

8
Voice Quality and f0 Cues for Affect Expression:
Implications for Synthesis
  • Previous work
  • 1995: Mozziconacci suggested that VQ combined
    with f0 could create affect
  • 2002: Gobl suggested that synthesized stimuli
    with VQ can add affective coloring. The study
    suggested that VQ + f0 stimuli are more affective
    than f0-only stimuli
  • 2003: Gobl tested VQ with a large f0 range but
    did not examine the contribution of
    affect-related f0 contours
  • Objective: To examine the effects of VQ and f0 on
    affect expression

9
Voice Quality and f0 Cues for Affect Expression:
Implications for Synthesis
  • 3 series of stimuli based on the Swedish
    utterance "ja adjö"
  • Stimuli exemplifying VQ
  • Stimuli with modal voice quality with different
    affect-related f0 contours
  • Stimuli combining both
  • Tested parameters exemplifying 5 voice qualities
    (VQ)
  • Modal voice
  • Breathy voice
  • Whispery voice
  • Lax-creaky voice
  • Tense voice
  • 15 synthesized stimuli test samples (see Table 1)

10
What is Voice Quality? Phonation Gestures
  • Derived from a variety of laryngeal and
    supralaryngeal features
  • Adductive tension: interarytenoid muscles adduct
    the arytenoid cartilages
  • Medial compression: adductive force on the vocal
    processes; adjustment of the ligamental glottis
  • Longitudinal tension: tension of the vocal folds

11
Tense Voice
  • Very strong tension of vocal folds, very high
    tension in vocal tract

12
Whispery Voice
  • Very low adductive tension
  • Medial compression moderately high
  • Longitudinal tension moderately high
  • Little or no vocal fold vibration
  • Turbulence generated by friction of air in and
    above larynx

13
Creaky Voice
  • Vocal fold vibration at low frequency, irregular
  • Low tension (only ligamental part of glottis
    vibrates)
  • Vocal folds strongly adducted
  • Longitudinal tension weak
  • Moderately high medial compression

14
Breathy Voice
  • Tension low
  • Minimal adductive tension
  • Weak medial compression
  • Medium longitudinal vocal fold tension
  • Vocal folds do not come together completely,
    leading to frication

15
Modal Voice
  • Neutral mode
  • Muscular adjustments moderate
  • Vibration of vocal folds periodic, full closing
    of glottis, no audible friction
  • Frequency of vibration and loudness in low to mid
    range for conversational speech
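
The phonation settings listed on the voice-quality slides above can be gathered into a single lookup table for side-by-side comparison. The sketch below paraphrases the slide wording. The class and field names are hypothetical. The tense-voice entry is left unset because that slide gives only overall tension, not the three individual parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhonationSetting:
    """Qualitative settings of the three phonation parameters."""
    adductive_tension: Optional[str]
    medial_compression: Optional[str]
    longitudinal_tension: Optional[str]


# Values paraphrase the slides; None means the slide does not specify it.
VOICE_QUALITIES = {
    # Modal: "muscular adjustments moderate" (read here as all three).
    "modal": PhonationSetting("moderate", "moderate", "moderate"),
    "breathy": PhonationSetting("minimal", "weak", "medium"),
    "whispery": PhonationSetting("very low", "moderately high",
                                 "moderately high"),
    # Creaky: "vocal folds strongly adducted" (read as adductive tension).
    "creaky": PhonationSetting("strong", "moderately high", "weak"),
    # Tense: only "very strong tension of vocal folds" is given.
    "tense": PhonationSetting(None, None, None),
}


def qualities_with(parameter, value):
    """Names of the voice qualities whose `parameter` equals `value`."""
    return sorted(
        name for name, setting in VOICE_QUALITIES.items()
        if getattr(setting, parameter) == value
    )


shared = qualities_with("medial_compression", "moderately high")
```

Laid out this way, whispery and creaky voice share moderately high medial compression but differ in adductive and longitudinal tension.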

16
Voice Quality and f0 Cues for Affect Expression:
Implications for Synthesis
  • Six sub-tests with 20 native speakers of
    Hiberno-English
  • Rated on 12 affective attributes (6 opposing
    pairs)
  • Sad / happy
  • Intimate / formal
  • Relaxed / stressed
  • Bored / interested
  • Apologetic / indignant
  • Fearless / scared
  • Participants were asked to mark their responses
    on a scale

[Figure: example response scale running from "Intimate" to "Formal", with "No affective load" marked on the scale]
17
Voice Quality and f0 Test: Conclusion
  • Categorized results into 4 groups. No simple
    one-to-one mapping between quality and affect
  • Happy was the most difficult to synthesize
  • Suggested that, in addition to f0, VQ should be
    used to synthesize affectively colored speech. VQ
    appears to be crucial for expressive synthesis

18
Voice Quality and f0 Test: Discussion
  • If the scale runs from 1 to 7, then the midpoint
    (4) should be neutral; however, most ratings are
    less than 2. Do the conclusions (see Fig 2) seem
    strong?
  • In terms of VQ and f0, the groupings in Fig 2
    seem to suggest that certain affects are closely
    related. What are the implications of this? For
    example, are happy and indignant affects closer
    than relaxed or formal? Do you agree?
  • Do you consider an intimate voice more breathy
    or whispery? Does your intuition agree with the
    paper?
  • Yanushevskaya found that VQ accounts for the
    highest affect ratings overall. How can the range
    of voice quality be compared with frequency? Do
    you think they are comparable? Is there a
    different way to describe these qualities?

19
Questions?