Investigating the Role of Glottal Features in Classifying Clinical Depression - PowerPoint PPT Presentation

Provided by: elliotm

1
Investigating the Role of Glottal Features
in Classifying Clinical Depression
Elliot Moore II¹, Mark Clements¹, John Peifer²
and Lydia Weisser³
  • ¹School of Electrical and Computer Engineering,
    Georgia Tech, Atlanta, GA
  • ²Interactive Media Technology Center, Georgia
    Tech, Atlanta, GA
  • ³Department of Psychiatry and Behavioral Health,
    Medical College of Georgia, Augusta, GA

2
Overview
  • Research Synopsis
  • Glottal Wave and Speech Production
  • Feature Extraction
  • Analysis Method
  • Classification Procedure
  • Results / Conclusions

3
Research Synopsis
  • Part of an overall research effort on stress/emotion in speech
  • Goals
      • Investigate extracted features of glottal dynamics
      • Preliminary separation of the control and depressed groups
  • Potential areas of application
      • Clinical evaluation/monitoring tools
          • Monitor improvement/regression
          • Evaluate potential warning signs of depression
      • Telemedicine treatment
          • Additional data for remote clinical sessions
      • General stress analysis
          • 911 calls, forensic analysis, psychological evaluation, customer call centers

4
Glottal Wave?
  • Three broad categories of speech analysis
      • Prosodics: relative pitch, intensity, rhythm, and rate of speech
      • Vocal tract: related to the shape and resonances of the mouth and other vocal articulators
      • Glottal waveform
  • Glottal waveform
      • Related to the volume-velocity profile of airflow through the vocal folds
      • Shape affected by vocal effort, vocal cord tension, etc.
      • High correlation with speaker states
          • Speaker characterization
          • Dialect information
          • Speaker stress styles

5
Speech Production Face Model
6
Speech Production
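  • Source-filter theory
      • $S(z) = R(z)\,G(z)\,V(z)$
      • $R(z)$: lip radiation
      • $G(z)$: glottal volume velocity transformation
      • $V(z)$: upper vocal tract (formant frequencies)
  • Inverse filtering
      • $G(z) = S(z) / \big(R(z)\,V(z)\big)$
      • $R(z)$ modeled as a single zero, i.e., $R(z) = 1 - z_0 z^{-1}$ with $z_0 \approx 1$, so dividing by $R(z)$ adds a single pole (a leaky integrator) to the inverse filter
      • Estimate $V(z)$ from the acoustic waveform as an all-pole filter: $V(z) = 1 / \big(1 - \sum_{i=1}^{P} a_i z^{-i}\big)$
          • $P$: prediction order
          • $a_i$: $i$th linear prediction coefficient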
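The inverse-filtering steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it estimates the LP coefficients with the autocorrelation method (Levinson-Durbin recursion) over a whole frame, uses the convention $V(z) = 1/(1 - \sum_{i=1}^{P} a_i z^{-i})$, and undoes the lip radiation with $1/(1 - z_0 z^{-1})$. The function names and the default $z_0 = 0.99$ are hypothetical; practical glottal inverse filtering often uses more careful analysis (e.g. closed-phase covariance LP), which is omitted here.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for a_1..a_P.

    Convention (assumed): V(z) = 1 / (1 - sum_{i=1}^{P} a_i z^{-i}),
    i.e. s[n] is predicted as sum_i a_i * s[n - i].
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def inverse_filter(s, a, z0=0.99):
    """Estimate a glottal flow signal g[n] from speech s[n].

    1) Multiply by 1/V(z): e[n] = s[n] - sum_i a_i s[n - i]
    2) Multiply by 1/R(z) = 1 / (1 - z0 z^{-1}): g[n] = e[n] + z0 * g[n - 1]
    """
    p = len(a)
    g, prev = [], 0.0
    for n in range(len(s)):
        e = s[n] - sum(a[i - 1] * s[n - i] for i in range(1, p + 1) if n >= i)
        prev = e + z0 * prev
        g.append(prev)
    return g
```

For a speech frame `frame`, a call might look like `a = levinson_durbin(autocorrelation(frame, P), P)` followed by `g = inverse_filter(frame, a)`; both the prediction order `P` and `z0` are tuning choices.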

7
Ideal Glottal Flow
8
Glottal Flow Example
[Figure panels: acoustic signal; glottal flow]
9
Problem Areas in Emotion/stress Research
  • Data acquisition
      • Difficult to generate authentic responses
      • Difficult to assess the level of stress/emotion induced in a subject
      • Stereotypical responses and exaggeration in actors' responses
      • Lack of definite labels and definitions (e.g., anger as "hot" anger or "cold" anger)
  • No model
      • The wide variety of human responses makes a working model extremely difficult to build
      • Lack of consistent deviations in traditional prosodic measures across speakers

10
Research Overview
  • Depression in speech
      • One of the more common emotional disorders
      • Can be a precursor to suicide
      • Primary research efforts have focused on prosodics
  • Objectives
      • Investigate speech features related to depression and vocal affect
      • Determine a robust method of vocal affect analysis
      • Determine the potential significance of glottal features

11
Database Collection
  • Experimental group: diagnosed with some form of depressive illness
      • 9 females, 6 males
      • From an outpatient clinic (Psychiatry and Behavioral Health Department, MCG)
  • Control group: no prior diagnosis of any depressive illness
      • 9 females, 9 males
      • From staff, students, etc. at MCG
  • 3-5 minutes of total speech per subject
      • Short story broken into 65 sentences

12
Glottal Feature Extraction
  • Timing
  • Timing Ratios
  • Amplitude Shimmer
  • Spectrum

13
Feature Extraction - Timing
  • Timing
      • Open phase (O)
          • Opening phase (OP): onset of airflow through the vocal folds
          • Closing phase (CP): termination of airflow through the vocal folds
      • Closed phase
          • Minimal or no airflow through the vocal folds
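One way to turn an estimated glottal flow cycle into these four timing features is a simple threshold rule. The sketch below is hypothetical (the slides do not say how the phase boundaries were located): it treats samples more than a fraction `frac` of the peak-to-minimum excursion above the minimum as "open", splits the open interval at the flow peak into opening and closing phases, and counts the rest of the cycle as closed. It assumes exactly one pitch period and a single contiguous open interval.

```python
def phase_durations(cycle, fs, frac=0.1):
    """Hypothetical threshold-based segmentation of one glottal flow cycle.

    cycle: flow samples for exactly one pitch period
    fs:    sampling rate in Hz
    frac:  fraction of the (peak - minimum) excursion above the minimum
           that counts as "open" (assumed value)
    Returns durations in seconds for O, OP, CP, C and the total cycle TC.
    """
    base = min(cycle)
    peak = max(range(len(cycle)), key=lambda n: cycle[n])
    thresh = base + frac * (cycle[peak] - base)
    above = [n for n, v in enumerate(cycle) if v > thresh]
    if not above:
        raise ValueError("flat cycle: no open phase detected")
    start, end = above[0], above[-1]  # assumes one contiguous open interval
    op = peak - start                 # opening phase: onset up to the peak
    cp = end - peak + 1               # closing phase: peak through closure
    o = op + cp                       # open phase = opening + closing
    tc = len(cycle)
    c = tc - o                        # closed phase = remainder of the cycle
    counts = {"O": o, "OP": op, "CP": cp, "C": c, "TC": tc}
    return {name: samples / fs for name, samples in counts.items()}
```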

14
Feature Extraction - Timing Ratios
  • Timing Ratios
  • rCPOP - Ratio of closing phase to opening phase
  • rOTC - Ratio of open phase to total cycle (TC)
  • rCTC - Ratio of closed phase to total cycle (TC)
  • rOPO - Ratio of opening phase to open phase
  • rCPO - Ratio of closing phase to open phase
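Given the phase durations, the five ratios are plain divisions. A minimal sketch; the function name and dictionary keys are hypothetical, chosen to mirror the abbreviations above:

```python
def timing_ratios(d):
    """Compute the five glottal timing ratios from phase durations.

    d: mapping with keys "O", "OP", "CP", "C", "TC" in any consistent time unit.
    """
    return {
        "rCPOP": d["CP"] / d["OP"],  # closing phase / opening phase
        "rOTC": d["O"] / d["TC"],    # open phase / total cycle
        "rCTC": d["C"] / d["TC"],    # closed phase / total cycle
        "rOPO": d["OP"] / d["O"],    # opening phase / open phase
        "rCPO": d["CP"] / d["O"],    # closing phase / open phase
    }
```

For example, a cycle with O = 7, OP = 3, CP = 4, C = 9 and TC = 16 gives rOTC = 7/16 = 0.4375 and rCTC = 9/16 = 0.5625; because O + C = TC and OP + CP = O, both rOTC + rCTC and rOPO + rCPO equal 1.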

15
Feature Extraction - Amplitude
  • Amplitude Shimmer
  • Peak amplitude variation
  • Measured at peak glottal opening
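The slides describe shimmer only as peak-amplitude variation measured at maximum glottal opening. A common way to quantify such cycle-to-cycle variation, assumed here rather than taken from the slides, is the mean absolute difference between consecutive peak amplitudes divided by the mean peak amplitude (`relative_shimmer` is a hypothetical name):

```python
def relative_shimmer(peaks):
    """Relative shimmer of a sequence of per-cycle peak glottal amplitudes.

    Assumed definition: mean |A[k+1] - A[k]| divided by mean A[k].
    """
    if len(peaks) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(peaks, peaks[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_amp = sum(peaks) / len(peaks)
    return mean_diff / mean_amp
```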

16
Feature Extraction - Spectrum
  • Spectral tilt and bias are estimated by fitting a line to the glottal frequency response over a set interval
      • Spectral tilt: slope of the spectral roll-off of the glottal frequency response
      • Spectral bias: intercept term
  • Two frequency intervals are used for estimation
      • Between the peak of the frequency response and 1000 Hz (gSt1000: spectral tilt; gSb1000: spectral bias)
      • Between the peak of the frequency response and 3700 Hz (gSt3700: spectral tilt; gSb3700: spectral bias)
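A least-squares version of the tilt/bias fit can be sketched as follows. It assumes the glottal frequency response is given as magnitudes in dB on a linear frequency axis in Hz; the slides do not specify the axis scaling or the fitting method, and `tilt_and_bias` is a hypothetical helper. Calling it with `f_upper=1000.0` or `f_upper=3700.0` corresponds to the gSt1000/gSb1000 and gSt3700/gSb3700 intervals.

```python
def tilt_and_bias(freqs_hz, mag_db, f_upper):
    """Least-squares line fit mag_db ~ tilt * f + bias between the frequency
    of the peak response and f_upper.

    Returns (tilt in dB/Hz, bias in dB), assuming a linear Hz axis.
    """
    peak = max(range(len(mag_db)), key=lambda i: mag_db[i])
    f_peak = freqs_hz[peak]
    pts = [(f, m) for f, m in zip(freqs_hz, mag_db) if f_peak <= f <= f_upper]
    if len(pts) < 2:
        raise ValueError("interval contains fewer than two frequency bins")
    n = len(pts)
    mean_f = sum(f for f, _ in pts) / n
    mean_m = sum(m for _, m in pts) / n
    sxx = sum((f - mean_f) ** 2 for f, _ in pts)
    sxy = sum((f - mean_f) * (m - mean_m) for f, m in pts)
    tilt = sxy / sxx               # slope of the roll-off
    bias = mean_m - tilt * mean_f  # intercept term
    return tilt, bias
```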

17
Feature Extraction - Spectrum Example
18
Analysis Method
  • Assumption: individual sentences carry contextual affective content
  • Statistical analysis of affect is broken into two categories
      • Intra-sentence statistics (within a sentence)
          • Relate to momentary affective expression as it fits the context of the story (e.g., character expression, narrator expression)
      • Inter-sentence statistics (between sentences)
          • Attempt to capture the overall affect of the speaker by monitoring the variation of affective expression across the passage
          • Computed over the intra-sentence statistics
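The two-level statistics can be sketched for a single glottal feature as follows. The statistic set is a hypothetical stand-in: the results slides mention MAX, DRNG, IQR and Std, but the full list of eight is not given, and DRNG is assumed here to mean max minus min. Each sentence is represented by the per-cycle values of one feature; `describe` and `two_level_stats` are hypothetical names.

```python
import statistics


def describe(values):
    """Illustrative statistic set (not the slides' exact eight)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),
        "drng": max(values) - min(values),  # assumed meaning of DRNG
        "iqr": q3 - q1,
    }


def two_level_stats(sentences):
    """sentences: list of per-sentence lists of one feature's per-cycle values.

    Intra: describe() each sentence.
    Inter: describe() each intra statistic across the sentences.
    Returns {(intra_name, inter_name): value}.
    """
    intra = [describe(values) for values in sentences]
    out = {}
    for intra_name in intra[0]:
        series = [stats[intra_name] for stats in intra]
        for inter_name, value in describe(series).items():
            out[(intra_name, inter_name)] = value
    return out
```

Each key pairs an intra-sentence statistic with an inter-sentence statistic, so `("drng", "max")` corresponds to a label such as "Intra DRNG Inter Max". With the 7 illustrative statistics here each feature yields 7 × 7 = 49 measures; with 15 features and 8 statistics at each level the same construction gives 15 × 8 × 8 = 960.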

19
Sentence Analysis Setup
$[S_1, \dots, S_K],\ [S_{K+1}, \dots, S_{2K}],\ \dots,\ [S_{65-K+1}, \dots, S_{65}]$
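The grouping above amounts to slicing the 65 sentences into consecutive, non-overlapping blocks of K sentences. A small hypothetical helper that returns 1-based sentence indices:

```python
def group_sentences(n_sentences, k):
    """Split sentences 1..n_sentences into consecutive blocks of k:
    [S1..SK], [SK+1..S2K], ..., [S(n-K+1)..Sn]."""
    if n_sentences % k != 0:
        raise ValueError("k must divide the number of sentences evenly")
    return [list(range(start, start + k)) for start in range(1, n_sentences + 1, k)]
```

Since 65 = 5 × 13, K = 5 gives the 13 observations of grouping G1 and K = 13 gives the 5 observations of grouping G2.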
20
Analysis Setup
  • Within each sentence (intra-)
      • 15 extracted glottal features (timing, timing ratios, etc.)
      • 8 intra-sentence statistics
      • 15 × 8 = 120 feature statistics (vector)
  • Create observations
      • Group 1 (G1): 13 observations of 5 sentences each
      • Group 2 (G2): 5 observations of 13 sentences each
  • Between grouped sentences (inter-)
      • 120 intra-sentence statistics per sentence
      • 8 inter-sentence statistics
      • 120 × 8 = 960 statistical measures per speaker
  • One-way analysis of variance (ANOVA)
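The slides do not say exactly how the ANOVA results were used; one plausible use is to test each of the 960 measures for a difference between the control and patient groups. The one-way ANOVA F statistic for a single measure can be computed directly, as in this minimal sketch (`one_way_anova_f` is a hypothetical name; `scipy.stats.f_oneway` returns the same F together with a p-value):

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA.

    groups: list of lists, e.g. [control_values, patient_values] for one measure.
    F = (SS_between / (k - 1)) / (SS_within / (N - k))
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```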

21
Classification Setup
  • Separation accuracy
      • Evaluate individual inter-sentence feature statistics
      • Subject-based accuracy (i.e., using the entire set of observations)
  • Gaussian mixtures
      • 2 models: $\lambda_c$ for controls, $\lambda_p$ for patients
      • N observations per subject (based on grouping G1 or G2)
      • Choose the class with the maximum summed log-likelihood over the N observations
  • Leave-one-out testing and evaluation
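The decision rule above can be sketched with one-dimensional mixtures, matching the evaluation of individual feature statistics. Assumptions: each observation is a scalar feature value; `gmm_logpdf`, `classify`, `fit_single_gaussian` and `leave_one_out` are hypothetical names; and, to keep the example short, each class model is fit as a single Gaussian (a one-component mixture) instead of running EM for several components. Leave-one-out here holds out one subject at a time and refits both models on the remaining subjects.

```python
import math


def gmm_logpdf(x, weights, means, variances):
    """Log-density of a 1-D Gaussian mixture at x, via log-sum-exp."""
    terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(observations, models):
    """Pick the label whose model maximizes the summed log-likelihood
    over all observations of one subject."""
    scores = {
        label: sum(gmm_logpdf(x, *params) for x in observations)
        for label, params in models.items()
    }
    return max(scores, key=scores.get)


def fit_single_gaussian(values):
    """Simplified 'mixture' with one component (EM for more components omitted)."""
    m = sum(values) / len(values)
    v = max(sum((x - m) ** 2 for x in values) / len(values), 1e-6)
    return [1.0], [m], [v]


def leave_one_out(subjects):
    """subjects: list of (label, observations). Returns the fraction of
    held-out subjects classified correctly."""
    correct = 0
    for i, (label, obs) in enumerate(subjects):
        train = [s for j, s in enumerate(subjects) if j != i]
        models = {
            lab: fit_single_gaussian([x for lg, o in train if lg == lab for x in o])
            for lab in {lg for lg, _ in train}
        }
        if classify(obs, models) == label:
            correct += 1
    return correct / len(subjects)
```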

22
Classification Accuracy (Males, G1)
23
Classification Accuracy (Males, G2)
24
Classification Accuracy (Females, G1)
25
Classification Accuracy (Females, G2)
26
Results (Males)
  • Males
      • Single-feature accuracy (87%)
          • 2 misclassified subjects (1 patient, 1 control)
      • Best feature-pair accuracy (grouping G2, 93%)
          • Closing phase: intra DRNG, inter Max
          • Closing phase: intra IQR, inter DRng
          • 1 misclassified control
      • Best overall single-feature statistics
          • Glottal spectrum
          • Glottal ratios
      • G1 vs. G2
          • Observational length affected feature selection slightly
          • Overall accuracy similar for both

27
Results (Females)
  • Females
      • Single-feature accuracy
          • 100% separation for gSt3700 (intra MAX, inter Std) in G1
      • Feature pair
          • No improvement in overall accuracy for G2
      • Best overall single-feature statistics
          • Glottal timing
          • Glottal spectrum
      • G1 vs. G2
          • Best overall separation in G1 (i.e., shorter observational length, more observations)

28
Discussion
  • Conclusions
      • Relatively high separation accuracy for both males and females
      • Exclusive use of glottal features
      • Potentially more robust than prosodic features
  • Further work
      • Subject accuracy vs. observational accuracy
          • Subject accuracy: classifying a subject using all observations (3-5 minutes of speech) for a majority-vote decision
          • Observational accuracy: accuracy in classifying single observations drawn at random
          • High observational accuracy is the ultimate goal
      • Finding an optimal set of class separators
      • Understanding the perceptual link
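The distinction between the two accuracy measures can be made concrete in a short sketch. It assumes per-observation decisions are already available (for example from a classifier like the one in the classification setup) and follows the discussion slide in using a majority vote for the subject-level decision; the classification setup itself instead sums log-likelihoods over observations. Names are hypothetical, and ties are reported as undecided (`None`).

```python
from collections import Counter


def majority_vote(decisions):
    """Subject-level label from per-observation decisions (ties -> None)."""
    counts = Counter(decisions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def accuracies(per_subject):
    """per_subject: list of (true_label, [per-observation predicted labels]).

    Returns (observational_accuracy, subject_accuracy).
    """
    obs_total = sum(len(preds) for _, preds in per_subject)
    obs_correct = sum(p == truth for truth, preds in per_subject for p in preds)
    subj_correct = sum(majority_vote(preds) == truth for truth, preds in per_subject)
    return obs_correct / obs_total, subj_correct / len(per_subject)
```

A subject can be classified correctly by majority vote even when a sizable fraction of its individual observations are wrong, which is why high observational accuracy is the stricter goal.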