1
A Speech Coding Method using an Anthropomorphic and Acoustic Approach
Nguyen Viet Son, Eric Castelli
Speech Processing Group
Oriental-COCOSDA 2007, Hanoi
2
Introduction
  • Most automatic speech synthesis systems are based on signal processing techniques such as unit concatenation.
  • The quality is rather good.
  • The greatest disadvantage: it is difficult to change the voice.
  • Another approach: modeling the mechanical and aerodynamic behavior of the vocal cords and the propagation of the acoustic wave in the vocal tract.
  • The quality is very high and the voice can be changed, but the model is difficult to control.
  • To reduce this difficulty, a new approach consists in coupling the analog model to the Distinctive Region Model (DRM).
  • For telecommunications and data transmission, our anthropomorphic approach can be an interesting way to reduce transmission rates while keeping good speech quality.

3
Content
  • I. Anthropomorphic synthesis.
  • II. Validation.
  • III. Use for coding.
  • IV. Conclusion.

4
Acoustic model
  • The vocal cords
  • Assumed to be symmetric.
  • They vibrate and act as an oscillating valve.
  • The airflow is interrupted into a series of pulses that serve as the excitation source for the vocal tract.
  • The vocal tract
  • Divided into elementary tubes.
  • The relationship between the volume flows at each tube junction is as follows:
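A standard form of these junction relations in concatenated-tube models (a sketch of our own, not necessarily the exact formulation on the original slide): at the junction between sections $k$ and $k+1$, with cross-sectional areas $A_k$ and $A_{k+1}$, acoustic pressure and volume velocity are continuous,

\[ p_k = p_{k+1}, \qquad U_k = U_{k+1}. \]

Written for the forward/backward travelling volume-velocity waves $U^{\pm}$, this gives the scattering relations

\[ r_k = \frac{A_k - A_{k+1}}{A_k + A_{k+1}}, \qquad
   U_{k+1}^{+} = (1 - r_k)\,U_k^{+} - r_k\,U_{k+1}^{-}, \qquad
   U_{k}^{-} = r_k\,U_k^{+} + (1 + r_k)\,U_{k+1}^{-}. \]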

Schematic diagram of the vocal cord and vocal tract system (J. L. Flanagan et al., 1975).
5
Distinctive region model (DRM)
  • The DRM is structured into regions
  • delimited by the zero-crossings of the sensitivity functions computed on the uniform closed-open tube.
  • The sensitivity function specifies how each formant responds to a local perturbation of the area along the tube (see the sketch after this list).
  • In speech synthesis, the vocal tract is considered as a closed-open tube or a closed-closed tube.
  • Initial area function: 4 cm², which can vary from 0.5 to 16 cm².
  • Total length: 18 cm.
  • Deforming the tube shape changes (increases/decreases) F1, F2 and/or F3 efficiently: a small deformation leads to large formant variations.
  • At each step, the sensitivity function is recomputed and a new deformation is performed based on it.
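A minimal numerical sketch of this idea, under assumptions of our own (lossless concatenated-tube acoustics, a 36-section discretization, a 350 m/s sound speed; none of these details come from the slides): the formants of the uniform 18 cm, 4 cm² closed-open tube are found from its chain matrix, each formant's sensitivity to a small local area change is estimated by finite differences, and the zero-crossings of the resulting curves mark the region boundaries.

# Sketch only: illustrates sensitivity-function zero-crossings on a uniform
# closed-open tube; not the authors' code.
import numpy as np

C = 350.0     # speed of sound, m/s (assumption)
RHO = 1.2     # air density, kg/m^3 (assumption)

def chain_d(areas, section_len, freqs):
    """D (= M[1,1]) element of the glottis-to-lips chain matrix, one value per frequency."""
    k = 2 * np.pi * freqs / C
    a = np.ones(len(freqs), dtype=complex)
    b = np.zeros(len(freqs), dtype=complex)
    c = np.zeros(len(freqs), dtype=complex)
    d = np.ones(len(freqs), dtype=complex)
    for A in areas:                      # multiply the per-section lossless-tube matrices
        Z = RHO * C / A                  # characteristic impedance of the section
        cl, sl = np.cos(k * section_len), 1j * np.sin(k * section_len)
        a, b, c, d = (a * cl + b * sl / Z, a * Z * sl + b * cl,
                      c * cl + d * sl / Z, c * Z * sl + d * cl)
    return d.real                        # real-valued for a lossless tube

def formants(areas, section_len, n=3, fmax=3000.0, df=2.0):
    """Closed-open tube resonances: zeros of D (U = 0 at the glottis, P = 0 at the lips)."""
    freqs = np.arange(df, fmax, df)
    dv = chain_d(areas, section_len, freqs)
    idx = np.where(np.sign(dv[:-1]) != np.sign(dv[1:]))[0]
    # refine each bracketed zero by linear interpolation
    return (freqs[idx] - dv[idx] * df / (dv[idx + 1] - dv[idx]))[:n]

def sensitivity(areas, section_len, which, rel=0.05):
    """Relative shift of formant `which` per relative area change, section by section."""
    f_ref = formants(areas, section_len)[which]
    out = np.zeros(len(areas))
    for j in range(len(areas)):
        pert = areas.copy()
        pert[j] *= 1.0 + rel
        out[j] = (formants(pert, section_len)[which] - f_ref) / (f_ref * rel)
    return out

areas = np.full(36, 4e-4)               # 4 cm^2 uniform tube (slide's value)
seg = 0.18 / len(areas)                 # 18 cm total length (slide's value)
for i in range(3):                      # F1, F2, F3
    S = sensitivity(areas, seg, i)
    cross = np.where(np.sign(S[:-1]) != np.sign(S[1:]))[0] + 1
    print(f"F{i + 1}: sensitivity zero-crossings near sections {cross.tolist()}")

Taking the union of the zero-crossings of the F1, F2 and F3 curves yields the region boundaries discussed on the next slides.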

6
Distinctive region model (DRM)
Figure: the distinctive regions R1-R4.
  • The DRM is
  • a 2-region model when only F1 is controlled,
  • a 4-region model when (F1, F2) are controlled,
  • an 8-region model when (F1, F2, F3) are controlled.
  • The limits of these 8 regions correspond to the zero-crossings of the sensitivity functions.

7
Distinctive region model (DRM)
  • The places of articulation are fixed:
  • R8 corresponds to the lips,
  • R7 to the teeth,
  • R3, R4, R5, R6 to the tongue,
  • R1 to the larynx.
  • For vowel production, the DRM uses (F1, F2) with
  • the front constriction (R5, R6) and the back constriction (R3, R4) for the closed-open tube,
  • the central constriction (R4, R5) for the closed-closed tube.
  • For consonant production, the DRM uses (F1, F2, F3) with
  • R8: labial,
  • R6: coronal,
  • R5: velar.

8
The DRM commands the anthropomorphic synthesizer
Figure: lungs, vocal cords and vocal tract.
  • Our new approach consists in replacing the articulatory model by the DRM:
  • (R1-R8) are used to determine the vocal tract shape,
  • (Q, Ag0) to control the fundamental frequency (F0),
  • Ps for the energy (see the sketch below).
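The control interface this slide suggests can be pictured with a small sketch (our own illustration; the class name, default values and units are assumptions, not taken from the presentation): the DRM areas R1-R8 fix the vocal tract shape, (Q, Ag0) drive F0, and Ps drives the energy.

# Sketch only: an illustrative command frame for the DRM-driven synthesizer.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SynthesizerCommand:
    # DRM region areas R1..R8 in cm^2 (uniform 4 cm^2 tube as the neutral shape)
    R: List[float] = field(default_factory=lambda: [4.0] * 8)
    Q: float = 1.0      # vocal-cord tension factor: raises/lowers F0 (illustrative default)
    Ag0: float = 0.05   # rest glottal area in cm^2: voicing / fine F0 control (illustrative)
    Ps: float = 800.0   # subglottal (lung) pressure in Pa: overall energy (illustrative)

    def area_function(self, sections_per_region: int = 4) -> List[float]:
        """Expand the 8 DRM areas into elementary tubes for the acoustic model.

        Equal-length regions are assumed here for simplicity; the actual DRM
        regions delimited by the sensitivity zero-crossings have unequal lengths.
        """
        return [a for a in self.R for _ in range(sections_per_region)]

# Example: one command frame with a back constriction and an open front cavity
# (illustrative /a/-like values, not taken from the presentation).
cmd = SynthesizerCommand()
cmd.R[2:4] = [1.0, 1.0]   # constrict R3, R4
cmd.R[6:8] = [8.0, 8.0]   # widen R7, R8
print(cmd.area_function()[:12])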

9
Validation (1)
  • Production of the vowels /a/, /i/, /u/ and their transitions:
  • correct for the /a-u/ and /a-i/ transitions,
  • not exactly reproduced for the /i-u/ transition, but the directions are more or less close.
  • Production of some simple sentences with some consonants in Vietnamese.
  • The system is capable of synthesizing the vowels and some consonants.

10
Validation (2)
11
Use for coding
  • Controlling the speech system requires 12 parameters, but from the coding point of view (to control the synthesizer/coder more efficiently):
  • R1 and R2 change very little, so their geometry is kept constant.
  • Only 10 parameters are used: Ps, (Q, Ag0), (R3-R8) and a velum position parameter.
  • Vocalic and consonant gestures are modeled as linear or logarithmic geometry changes.
  • For the coding application:
  • no need to refresh the command parameters at every sample,
  • simple interpolation determines the intermediate values,
  • a low transmission rate:
  • 10 parameters × 8 bits × 40 Hz = 3.2 kbit/s (see the sketch below).
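A minimal sketch of this coding scheme (our own illustration, not the authors' implementation; the 16 kHz synthesis rate, the normalization to [0, 1] and the choice of linear interpolation are assumptions): the 10 parameters are quantized to 8 bits each and sent at the 40 Hz frame rate, giving 10 × 8 × 40 = 3200 bit/s = 3.2 kbit/s, and the receiver interpolates between frames to recover per-sample values.

# Sketch only: 8-bit quantization of 10 command parameters at 40 Hz,
# with linear interpolation up to the synthesis sample rate at the receiver.
import numpy as np

FRAME_RATE = 40          # command frames per second (slide's value)
SAMPLE_RATE = 16000      # synthesis sample rate (assumption)
N_PARAMS = 10            # Ps, (Q, Ag0), R3..R8, velum
BITS = 8

def quantize(values, lo, hi, bits=BITS):
    """Uniform quantization of each parameter to `bits` bits within [lo, hi]."""
    levels = 2 ** bits - 1
    codes = np.round((np.clip(values, lo, hi) - lo) / (hi - lo) * levels)
    return codes.astype(np.uint8)

def dequantize(codes, lo, hi, bits=BITS):
    return lo + codes.astype(float) / (2 ** bits - 1) * (hi - lo)

def interpolate_frames(frames, samples_per_frame):
    """Linearly interpolate decoded frames up to the synthesis sample rate."""
    frames = np.asarray(frames)                      # shape (n_frames, N_PARAMS)
    t_frames = np.arange(len(frames))
    t_samples = np.linspace(0, len(frames) - 1, (len(frames) - 1) * samples_per_frame + 1)
    return np.stack([np.interp(t_samples, t_frames, frames[:, p])
                     for p in range(frames.shape[1])], axis=1)

bit_rate = N_PARAMS * BITS * FRAME_RATE
print(f"transmission rate: {bit_rate} bit/s = {bit_rate / 1000:.1f} kbit/s")   # 3.2 kbit/s

# Toy example: two successive command frames, normalized to [0, 1] for simplicity.
f0 = np.linspace(0.1, 0.9, N_PARAMS)
f1 = np.linspace(0.2, 0.8, N_PARAMS)
codes = [quantize(f, 0.0, 1.0) for f in (f0, f1)]        # what would be transmitted
decoded = [dequantize(c, 0.0, 1.0) for c in codes]
dense = interpolate_frames(decoded, SAMPLE_RATE // FRAME_RATE)
print(dense.shape)   # (401, 10): one parameter vector per audio sample between the frames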

12
Conclusion
  • A new approach to command an anthropomorphic speech synthesizer, with several advantages:
  • it allows a better understanding of the phenomena involved in speech production,
  • the coupling between the acoustic model and the DRM is simple,
  • only 10-12 parameters are needed to control the whole system,
  • it is possible to change the quality of the voice and produce different types of voices,
  • for coding, only a 3.2 kbit/s transmission rate is needed to control the system,
  • an embedded version of the coder can be realized on a DSP processor.

13
References
  • W. C. Chu. 2003. Speech Coding Algorithms: Foundation and Evolution of Standardized Coders. Wiley & Sons.
  • E. Castelli. 1999. Modélisation anthropomorphique de la parole. Thèse HDR, INP Grenoble, France.
  • J. L. Flanagan, K. Ishizaka, and K. L. Shipley. 1975. Synthesis of speech from a dynamic model of the vocal cord and vocal tract. The Bell System Technical Journal, Vol. 54, No. 3, pages 485-506.
  • K. Ishizaka and J. L. Flanagan. 1972. Synthesis of voiced sounds from a two-mass model of the vocal cords. The Bell System Technical Journal, Vol. 51, No. 6, pages 1233-1268.
  • M. Mrayati, R. Carré and B. Guérin. 1988. Distinctive regions and modes: A new theory of speech production. Speech Communication 7, pages 257-286.
  • R. Carré. 1996. Prediction of vowel systems using a deductive approach. Proceedings of the Int. Conf. on Spoken Language Processing, Philadelphia, pages 434-437.
  • R. Carré and M. Mody. 1997. Prediction of vowel and consonant place of articulation. Proceedings of the Third Meeting of the ACL Special Interest Group in Computational Phonology (SIGPHON 97), Madrid.
  • R. Carré. 2003. From an acoustic tube to speech production. Speech Communication 42, pages 227-240.
  • S. Maeda. 1982. A digital simulation method of the vocal tract system. Speech Communication 1, pages 199-229.

14
  • Thank you for your attention!
