PF-STAR: emotional speech synthesis

Slides: 9
Provided by: piero9
Transcript and Presenter's Notes

1
PF-STAR: emotional speech synthesis
Istituto di Scienze e Tecnologie della Cognizione,
Sezione di Padova "Fonetica e Dialettologia", CNR
2
Analysis of emotive speech: audio
Emotions: disgust (D), surprise (SU), neutral (N),
anger (A), joy (J), fear (F), sadness (SA)
Recordings: /aba/, /ava/, /mamma/
  • Cue extraction and analysis
  • Intensity, duration, pitch, pitch range,
    formants.
  • Mean F0 of the stressed vowel and F0mid
    values are strongly correlated.
  • Shimmer, jitter, HNR, Hammarberg's index,
    spectral flatness, spectral energy distribution:
    voice quality correlates.
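
The perturbation and flatness measures listed above can be illustrated with a minimal sketch. It assumes the common "local" definitions (mean absolute cycle-to-cycle difference divided by the mean) for jitter and shimmer, and the ratio of geometric to arithmetic mean for spectral flatness. The slides do not say which variants were actually used, so the function names and definitions here are assumptions, not the authors' procedure.

```python
import math

def relative_perturbation(values):
    """Mean absolute difference of consecutive values, divided by their mean."""
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def local_jitter(periods):
    """Cycle-to-cycle variation of the glottal period durations (seconds)."""
    return relative_perturbation(periods)

def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of the peak amplitudes of each period."""
    return relative_perturbation(amplitudes)

def spectral_flatness(power):
    """Geometric mean over arithmetic mean of a strictly positive power spectrum."""
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))

# Hypothetical measurements for one vowel (illustrative values only).
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [0.80, 0.80, 0.80, 0.80]
print(local_jitter(periods), local_shimmer(amplitudes))
```

A perfectly regular voice gives zero jitter and shimmer, and a flat (noise-like) spectrum gives a flatness close to 1, while a strongly peaked spectrum gives a value close to 0.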

[Figure: F0mean (global and for stressed vowel), F0mid,
and F0range for /aba/]
3
Analysis of emotive speech: voice quality
  • Voice quality characterization
  • Anger: harsh voice (/a/)
  • Disgust: creaky voice (/a/)
  • Joy, Fear, Surprise: breathy voice
  • Discriminant analysis classification scores:
    60/70 for stressed and unstressed vowel
  • Best score: Fear, Anger
  • Worst score: Surprise
VOQUAL 2003 paper: "Emotions and Voice Quality:
Experiments with Sinusoidal Modeling"
4
Processing of emotive speech
Neutral-to-emotive transformation based on
sinusoidal modeling
[Figure panels: Neutral; Disgust (PsTs) and Target Disgust;
Sadness (PsTs) and Target Sadness]
Results:
  • Time-stretch and (formant-preserving) pitch
    shift alone cannot account for the principal
    emotion-related cues.
  • Spectral conversion can account for some of the
    emotion cues.
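
To make the two operations named in the results concrete, here is a minimal sketch of time-stretching and formant-preserving pitch shifting on sinusoidal-model parameters. It assumes each analysis frame is a list of (frequency in Hz, amplitude) partials sorted by frequency, and it approximates the spectral envelope by linear interpolation between the partial peaks. The data layout and function names are hypothetical and this is not the system described in the slides.

```python
from bisect import bisect_left

def envelope_at(partials, freq):
    """Linearly interpolate the envelope defined by sorted (freq, amp) peaks."""
    freqs = [f for f, _ in partials]
    amps = [a for _, a in partials]
    if freq <= freqs[0]:
        return amps[0]            # clamp below the first partial
    if freq >= freqs[-1]:
        return amps[-1]           # clamp above the last partial
    i = bisect_left(freqs, freq)
    weight = (freq - freqs[i - 1]) / (freqs[i] - freqs[i - 1])
    return amps[i - 1] + weight * (amps[i] - amps[i - 1])

def pitch_shift(partials, factor):
    """Scale partial frequencies but re-read amplitudes from the original
    envelope, so the formant structure stays in place."""
    return [(f * factor, envelope_at(partials, f * factor)) for f, _ in partials]

def time_stretch(frames, factor):
    """Change duration by re-indexing analysis frames (nearest earlier frame)."""
    n_out = max(1, round(len(frames) * factor))
    last = len(frames) - 1
    return [frames[min(last, int(i / factor))] for i in range(n_out)]

# Hypothetical single frame with three harmonics of a 100 Hz voice.
frame = [(100.0, 1.0), (200.0, 0.5), (300.0, 0.25)]
shifted = pitch_shift(frame, 1.5)       # harmonics move to 150, 300, 450 Hz
stretched = time_stretch([frame, shifted], 2.0)
```

Because both operations leave the envelope itself unchanged, a sketch like this also shows why they cannot alter voice quality on their own: a further spectral conversion step would be needed to reshape the envelope.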

5
Processing of emotive speech
Neutral-to-emotive transformation based on
sinusoidal modeling
[Audio examples: Neutral, PsTs, PsTsSc, and Target versions
for anger, disgust, joy, fear, surprise, sadness]
6
SI voice processing for TTS systems
Processing of emotive speech: results
Emotive synthesis based on FESTIVAL/MBROLA (male
voice)
[Audio examples: Neutral, PsTs, PsTsVQtr, and Target versions
for anger, disgust, joy, fear, surprise, sadness]
7
E-TTS Audio Examples
8
Mark-Up Languages for E-TTS
  • Hierarchic description of emotive voice
  • High level: emotive tag (e.g., <anger>, <joy>,
    <fear>, etc.)
  • Medium level: phonetic voice description
    (e.g., <modal>, <soft>, <pressed>, etc.)
  • Low level: acoustic description (e.g.,
    <spectral tilt>, <shimmer>, <jitter>, etc.)
  • Definition of speaker-independent rules to
    control voice quality within a text-to-speech
    synthesizer.
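
The three-level hierarchy can be illustrated with a small parsing sketch. It assumes the tags are written as well-formed XML elements named after the examples on the slide (anger, joy, fear; modal, soft, pressed; shimmer, jitter, and spectral_tilt with an underscore, since XML names cannot contain spaces). The `<utt>` wrapper, the `LEVELS` table, and the `annotate` function are hypothetical, since the actual mark-up syntax is not given in the slides.

```python
import xml.etree.ElementTree as ET

# Tag names taken from the slide examples; spelling and grouping are assumptions.
LEVELS = {
    "high": {"anger", "joy", "fear"},
    "medium": {"modal", "soft", "pressed"},
    "low": {"spectral_tilt", "shimmer", "jitter"},
}

def level_of(tag):
    """Return the hierarchy level a tag belongs to, or None if unknown."""
    for level, tags in LEVELS.items():
        if tag in tags:
            return level
    return None

def annotate(markup):
    """Return (text, {level: tag}) pairs for every text span in the markup."""
    root = ET.fromstring(f"<utt>{markup}</utt>")
    spans = []

    def walk(node, active):
        level = level_of(node.tag)
        if level is not None:
            active = {**active, level: node.tag}   # inner tags refine outer ones
        if node.text and node.text.strip():
            spans.append((node.text.strip(), active))
        for child in node:
            walk(child, active)
            if child.tail and child.tail.strip():
                spans.append((child.tail.strip(), active))

    walk(root, {})
    return spans

for text, tags in annotate("Hello <anger>go <pressed>away</pressed> now</anger>"):
    print(text, tags)
```

Each text span carries the tags active at every level, so a synthesizer front end could let a lower-level tag override or refine whatever the enclosing emotive tag implies.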