Dr. O. Dakkak - PowerPoint PPT Presentation

1
Prosodic Feature Introduction and Emotion
Incorporation in an Arabic TTS
Presented by Dr. O. Al Dakkak
  • Dr. O. Dakkak, Dr. N. Ghneim (HIAST)
  • M. Abu-Zleikha, S. Al-Moubyed (IT Faculty,
    Damascus University)

2
Outline
  • Arabic TTS
  • Why Prosody Generation?
  • Prosody Analysis and Rule Extraction
  • Emotion Inclusion
  • Results
  • Conclusion

3
Arabic Text-to-Speech System
  • Arabic Text-to-Phonemes (ATOPH), including the
    open /E/ and /O/ phonemes and emphatic vowels
  • MBROLA diphone units are used to synthesize
    speech until our semi-syllable units are ready
    (their corpus is currently being recorded)
  • Prosody Generation and Emotion Inclusion

4
Arabic Text-to-Speech System
  • MBROLA synthesizes phonemes with control over
    duration and the F0 contour (given as a set of
    segments); we implemented a tool to control
    amplitude.
  • Phonemes absent from the MBROLA voice are
    replaced by the nearest available phonemes.
  • This makes it possible to generate and test
    prosody.
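MBROLA reads its input as a .pho file: one line per phoneme giving the phoneme symbol, its duration in milliseconds, and optional pairs of (position as a percentage of the phoneme's duration, F0 in Hz). The minimal sketch below shows how duration and F0 targets, together with the substitution of absent phonemes, might be written out. The fallback table, the toy inventory and the helper names `pho_line` and `to_pho` are hypothetical; the real substitutions depend on the phoneme inventory of the Arabic MBROLA voice actually used.

```python
# Hypothetical fallback table: phonemes produced by the front end (e.g. the
# open vowels /E/ and /O/) mapped to the nearest phoneme the voice provides.
FALLBACK = {"E": "e", "O": "o"}


def pho_line(phoneme, duration_ms, pitch_points, inventory, fallback=FALLBACK):
    """Format one segment as an MBROLA .pho line.

    pitch_points is a list of (position_percent, f0_hz) pairs, where the
    position is a percentage (0-100) of the segment's duration.
    """
    if phoneme not in inventory:
        # Replace an absent phoneme by its nearest available neighbour.
        phoneme = fallback.get(phoneme, phoneme)
    fields = [phoneme, str(round(duration_ms))]
    for position, f0 in pitch_points:
        fields += [str(round(position)), str(round(f0))]
    return " ".join(fields)


def to_pho(segments, inventory):
    """segments: list of (phoneme, duration_ms, pitch_points) tuples."""
    return "\n".join(pho_line(p, d, pts, inventory) for p, d, pts in segments)


inventory = {"_", "s", "a", "e"}  # toy inventory, not a real voice
segments = [
    ("_", 100, []),
    ("s", 80, []),
    ("a", 120, [(0, 120), (100, 110)]),
    ("E", 90, [(50, 115)]),  # /E/ is absent from the toy voice, becomes "e"
    ("_", 100, []),
]
print(to_pho(segments, inventory))
```

Keeping the substitution inside the formatting step means the prosody stages can keep working with the front end's full phoneme set, and only the final .pho output is adapted to the diphone voice.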

5
Why Prosody Generation?
  • Increases intelligibility and expressiveness
  • Provides the context in which speech is
    interpreted
  • Signals speaker intentions (special aids)
  • Man-machine communication (airports, ...)
  • Dubbing

6
Methodology
  • Based on the punctuation marks (comma, period,
    ? and !), we classify sentences as continuous
    affirmation, long affirmation, interrogative
    and exclamation, respectively.
  • Recording a corpus and analyzing its sentences
    to produce F0 and intensity curves
  • Statistical study of the curves and rule
    extraction to generate them automatically.
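The punctuation-based classification can be sketched as a split-and-label pass over the input text. This is an illustration, not the system's actual front end: the function name `classify_chunks`, the handling of text with no final mark, and the inclusion of the Arabic comma (U+060C) and Arabic question mark (U+061F) alongside the Latin marks are all assumptions.

```python
# Sentence type signalled by each punctuation mark, as listed on the slide.
# The Arabic comma (U+060C) and question mark (U+061F) are an added
# assumption, since Arabic text may use them instead of the Latin marks.
SENTENCE_TYPES = {
    ",": "continuous affirmation",
    "\u060c": "continuous affirmation",
    ".": "long affirmation",
    "?": "interrogative",
    "\u061f": "interrogative",
    "!": "exclamation",
}


def classify_chunks(text, default="long affirmation"):
    """Split text after each listed mark and label every chunk.

    Trailing text without a final mark gets `default` (an assumption).
    """
    chunks, start = [], 0
    for i, ch in enumerate(text):
        if ch in SENTENCE_TYPES:
            chunk = text[start:i + 1].strip()
            if chunk:
                chunks.append((chunk, SENTENCE_TYPES[ch]))
            start = i + 1
    tail = text[start:].strip()
    if tail:
        chunks.append((tail, default))
    return chunks


print(classify_chunks("I went home, then I slept. Did you? What a day!"))
```

A splitter this simple would also break on periods inside abbreviations (such as "Dr."), so a real front end would need to normalize those first.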

7
The corpus
  • Use of a pre-recorded corpus of 12 short
    sentences for each type, spoken by 5 speakers
    (4 male, 1 female). Each sentence has at most
    14 phonemes.
  • Recording of a further 10 sentences of variable
    length, pronounced by 3 speakers:
    • short: 4-20 phonemes
    • medium: 20-40 phonemes
    • long: more than 40 phonemes
  • The F0 and intensity curves were already
    available for the pre-recorded corpus and were
    computed for the new recordings.

8
Rules Extraction
  • Redefinition of the sentence-length concept
    using fuzzy sets
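One way to make "short", "medium" and "long" fuzzy is to give each class a trapezoidal membership function over the phoneme count. The slides do not give the actual membership functions, so the breakpoints below are an assumption, chosen so that the shared crisp boundaries of the corpus classes (20 and 40 phonemes) receive membership 0.5 in both neighbouring classes. The names `LENGTH_SETS`, `trapezoid` and `length_memberships` are hypothetical.

```python
import math

# Assumed trapezoid breakpoints (a, b, c, d), in phonemes: membership rises
# from 0 at a to 1 at b, stays 1 until c, and falls back to 0 at d.
LENGTH_SETS = {
    "short": (0, 4, 16, 24),
    "medium": (16, 24, 36, 44),
    "long": (36, 44, math.inf, math.inf),
}


def trapezoid(x, a, b, c, d):
    """0 outside (a, d), 1 on [b, c], linear on the two slopes."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def length_memberships(n_phonemes):
    """Degree to which a sentence of n_phonemes belongs to each length class."""
    return {name: trapezoid(n_phonemes, *abcd) for name, abcd in LENGTH_SETS.items()}


print(length_memberships(20))
```

With these choices a 20-phoneme sentence is half "short" and half "medium", so prosody rules for both classes can be blended in proportion to the memberships rather than switching abruptly at a hard boundary.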

9
Rules Extraction
  • Curve stylization after stochastic analysis
    (example figure)
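The slides do not say which stylization procedure was used. A common approach, shown here purely as an assumed stand-in, is to time-normalize every F0 contour of a sentence class, average the contours point by point, and keep a few anchor points that define a piecewise-linear target curve. The anchor positions, the resolution `n` and the function names are hypothetical.

```python
def resample(contour, n):
    """Linearly resample a voiced F0 contour (Hz values) to n equally spaced points."""
    if len(contour) < 2 or n < 2:
        raise ValueError("need at least two input and two output points")
    m = len(contour)
    out = []
    for k in range(n):
        t = k * (m - 1) / (n - 1)  # position measured in input-sample units
        i = min(int(t), m - 2)
        frac = t - i
        out.append(contour[i] * (1 - frac) + contour[i + 1] * frac)
    return out


def stylize(contours, anchors=(0, 25, 50, 75, 100), n=101):
    """Average time-normalized contours and keep values at anchor positions (%)."""
    resampled = [resample(c, n) for c in contours]
    mean_curve = [sum(col) / len(col) for col in zip(*resampled)]
    return [(p, mean_curve[round(p * (n - 1) / 100)]) for p in anchors]


def evaluate(points, position):
    """Piecewise-linear value of a stylized curve at `position` (%)."""
    for (p0, v0), (p1, v1) in zip(points, points[1:]):
        if p0 <= position <= p1:
            return v0 + (v1 - v0) * (position - p0) / (p1 - p0)
    raise ValueError("position outside the stylized range")


curve = stylize([[100, 150, 200], [110, 160, 150, 120]])
print(curve, evaluate(curve, 60))
```

Time normalization lets contours of different durations be averaged; the handful of anchors that survive are the "rules" the synthesizer needs to regenerate a contour for any sentence length.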

10
Emotion Inclusion
  • Recording a corpus of sentences in 5 emotions
    (joy, anger, sadness, fear, surprise), each with
    its emotionless version (20 sentences per
    emotion).
  • Measuring the prosodic features F0, duration
    and intensity, and their variations (Praat).
  • Extracting rules to automatically produce
    emotion in synthetic speech.
  • Validating the rules.

11
?????? ??????? ???? ??????????? ????? ???? Is it
my fault to bear it?
Range difference between F0max F0min F0
Averag Mean value
Jitter Irregularities between successive glottal
pulses
Pitch variation of F0
Variability deg. Of it (high, low..) .
Contour slope shape of contour slope (range
variation).
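The feature definitions above can be computed from an F0 track (for example one exported from Praat) and a list of glottal pulse times. In this minimal sketch, jitter follows the definition of Praat's "jitter (local)", the mean absolute difference between consecutive periods divided by the mean period, but without Praat's period-range checks. Using the standard deviation as the variability measure and a least-squares line as the contour slope are assumptions, as are the function names.

```python
from statistics import mean, pstdev


def f0_features(f0_track, frame_step_s):
    """Summary F0 features from a frame-based track (0 marks unvoiced frames).

    Returns range and mean (Hz), the standard deviation as a variability
    measure (Hz), and the least-squares contour slope (Hz per second).
    """
    voiced = [(i * frame_step_s, f) for i, f in enumerate(f0_track) if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    times = [t for t, _ in voiced]
    values = [f for _, f in voiced]
    f_mean, t_mean = mean(values), mean(times)
    num = sum((t - t_mean) * (f - f_mean) for t, f in voiced)
    den = sum((t - t_mean) ** 2 for t in times)
    return {
        "range": max(values) - min(values),
        "mean": f_mean,
        "variability": pstdev(values),
        "slope": num / den,
    }


def local_jitter(pulse_times):
    """Simplified jitter (local): mean absolute difference between
    consecutive glottal periods divided by the mean period."""
    periods = [b - a for a, b in zip(pulse_times, pulse_times[1:])]
    if len(periods) < 2:
        raise ValueError("need at least three glottal pulses")
    diffs = [abs(q - p) for p, q in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


print(f0_features([100, 0, 110, 120], 0.01))
print(local_jitter([0.0, 0.01, 0.021, 0.030]))
```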
12
Example: Anger
  • F0 mean 40-75
  • F0 range 50-100
  • F0 at vowels and semi-vowels 30
  • F0 slope
  • Speech rate
  • Silence rate -
  • Duration of vowels and semi-vowels
  • Intensity mean
  • Intensity varies monotonically with F0
  • Others: F0 variability, F0 jitter
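Figures such as "F0 mean 40-75" read like ranges of change observed across sentence pairs. Assuming (the slide does not state units) that they are percentage changes of the emotional version relative to its emotionless version, they could be derived as below. The feature values in the example are made up for illustration, and the function names are hypothetical.

```python
def percent_change(neutral, emotional):
    """Percent change of each feature from the emotionless to the emotional version."""
    return {
        k: 100.0 * (emotional[k] - neutral[k]) / neutral[k]
        for k in neutral
        if k in emotional and neutral[k] != 0
    }


def change_ranges(pairs):
    """(min, max) percent change per feature over several sentence pairs."""
    changes = [percent_change(n, e) for n, e in pairs]
    shared = set.intersection(*(set(c) for c in changes))
    return {
        k: (min(c[k] for c in changes), max(c[k] for c in changes))
        for k in sorted(shared)
    }


# Made-up (emotionless, angry) feature values for two sentence pairs.
pairs = [
    ({"f0_mean": 120, "f0_range": 60}, {"f0_mean": 180, "f0_range": 120}),
    ({"f0_mean": 100, "f0_range": 50}, {"f0_mean": 140, "f0_range": 90}),
]
print(change_ranges(pairs))
```

Reporting a (min, max) spread per feature, rather than a single average, keeps the variability between sentences and speakers visible when the rules are later tuned.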

13
Analysis and Rule Extraction: Anger
(Figure panels: emotionless; with emotion)

14
Emotion Synthesis: Anger
  • F0 mean 30
  • F0 range 30
  • F0 at vowels and semi-vowels 100
  • Speech rate 75-80
  • Duration of vowels and semi-vowels 30
  • Duration of fricatives 20
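The synthesis rules can be applied to the (phoneme, duration, pitch points) segments that are later written to MBROLA. This sketch assumes the figures are percentage increases and that the F0 mean and range are changed around the utterance's mean F0: the mean is scaled by one factor and the excursions from it by another. Both the rule format and this mean/range scheme are assumptions, not the authors' confirmed method; the direction of each change is not given on the slide, and the "F0 at vowels and semi-vowels" and speech-rate rules are left out.

```python
def apply_rule(segments, rule, vowel_like, fricatives):
    """Apply percentage-change rules to (phoneme, duration_ms, pitch_points) segments.

    Hypothetical rule keys: "f0_mean", "f0_range", "dur_vowel" and
    "dur_fricative", each a percentage change (positive means increase).
    """
    f0s = [f for _, _, pts in segments for _, f in pts]
    center = sum(f0s) / len(f0s) if f0s else 0.0
    mean_k = 1 + rule.get("f0_mean", 0) / 100
    range_k = 1 + rule.get("f0_range", 0) / 100
    out = []
    for ph, dur, pts in segments:
        if ph in vowel_like:
            dur *= 1 + rule.get("dur_vowel", 0) / 100
        elif ph in fricatives:
            dur *= 1 + rule.get("dur_fricative", 0) / 100
        # Scale the mean and stretch the excursions around the old mean.
        new_pts = [(p, center * mean_k + (f - center) * range_k) for p, f in pts]
        out.append((ph, dur, new_pts))
    return out


# Illustrative values only; the signs are placeholders.
anger = {"f0_mean": 30, "f0_range": 30, "dur_vowel": 30, "dur_fricative": 20}
segments = [("s", 100, []), ("a", 100, [(0, 100.0), (100, 140.0)])]
print(apply_rule(segments, anger, vowel_like={"a", "w", "y"}, fricatives={"s"}))
```

Because the new mean is `center * mean_k` and every excursion is multiplied by `range_k`, the two factors control the F0 mean and the F0 range independently.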

15
Synthetic examples
  • Each example in an emotionless and a with-emotion
    version:
  • Anger: "Who do you think you are?"
  • Joy: "No more clouds in the sky."
  • Sadness: "I'm so sad today."
  • Fear: "What a scary scene!"
  • Surprise: "What a beautiful scene!"

16
EmoGen
Architecture (diagram components):
  • Interface: text editor, voice, input text,
    speech and emotion properties
  • MBROLA player interface
  • Normal Text to MBROLA Text Converter (NTMTC)
  • Prosody Generator
  • Emotion Generator

17
Results
  • Five sentences per emotion were synthesized
    and listened to by 10 people.
  • Each listener named the perceived emotion for
    each sentence (our list of emotions was not
    provided to them).
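With 5 sentences and 10 listeners, each emotion receives 50 free-form answers. A per-emotion recognition rate can be computed as sketched below. Because listeners were not given a list of emotions, their answers must first be mapped onto emotion labels; the `synonyms` table here is a hypothetical stand-in for that manual step, and the responses are made up.

```python
from collections import Counter


def recognition_rates(responses, synonyms=None):
    """Fraction of responses naming the intended emotion, per emotion.

    responses: iterable of (intended, perceived) strings, with intended
    labels in lower case; synonyms maps a free-form answer (e.g. "happy")
    to an emotion label (e.g. "joy").
    """
    synonyms = synonyms or {}
    totals, hits = Counter(), Counter()
    for intended, perceived in responses:
        answer = perceived.strip().lower()
        answer = synonyms.get(answer, answer)
        totals[intended] += 1
        if answer == intended:
            hits[intended] += 1
    return {emotion: hits[emotion] / totals[emotion] for emotion in totals}


# Made-up responses for illustration.
responses = [("anger", "Anger"), ("anger", "fear"), ("joy", "happy"), ("joy", "joy ")]
print(recognition_rates(responses, synonyms={"happy": "joy", "angry": "anger"}))
```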

18
Results

19
Conclusion
  • An automated tool for emotional Arabic synthesis
    has been developed
  • The prosodic model proposed and tested in this
    work proved successful, especially in
    conversational contexts.
  • Further work will include other emotions:
    disgust, annoyance, ...