PRESENTED BY OMKAR PUND. VIKRAM RAJIVADE. SHAILESH RASKAR presentation

1
PRESENTED BY OMKAR PUND. VIKRAM RAJIVADE.
SHAILESH RASKAR

DR. D. Y. PATIL POLYTECHNIC, AMBI
COMPUTER DEPARTMENT
TOPIC
VOICE MORPHING

2
CONTENTS

WHAT IS VOICE MORPHING ?
APPROACHES TO THE PROBLEM.
SPEECH PRODUCTION.
CONVERSION OF VOICE.
TYPES OF VOICE MORPHING.
REFERENCES OR METHODS.
APPLICATION OF VOICE MORPHING.
AVAILABLE SOFTWARE FOR VOICE MORPHING.
SUMMARY.
CONCLUSION.

3
WHAT IS VOICE MORPHING ?

Voice morphing, also referred to as voice
transformation or voice conversion, is a
technique for modifying a source speaker's speech
utterance so that it sounds as if it were spoken
by a target speaker.
Many applications may benefit from this sort of
technology. For example, a TTS (text-to-speech)
system with integrated voice morphing technology
can produce many different voices. In cases where
the speaker identity plays a key role, such as
dubbing movies and TV shows, high-quality voice
morphing technology will be very valuable,
allowing the appropriate voice to be generated
(possibly in different languages) without the
original actors being present.

4
APPROACHES TO THE PROBLEM

Voice conversion will be performed in two phases.
In the first phase, training, the speech signals
of the source and target speakers will be
analyzed and their voice characteristics will be
extracted by means of Linear Predictive Coding
(LPC), a mathematical optimization technique that
is very popular in the speech processing world.
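
The slides do not say how the LPC coefficients are computed. One common choice is the autocorrelation method solved with the Levinson-Durbin recursion, and the sketch below assumes that method. The function names `autocorrelation` and `lpc` and the choice of model order are illustrative, not taken from the presentation.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the autocorrelation of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1.0 and the predictor is
    s[n] ~ -(a[1]*s[n-1] + ... + a[order]*s[n-order]);
    err is the energy of the prediction residual.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        # Reflection coefficient for this model order.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

In a training phase like the one described, a function of this kind would be run on each frame of the source and target recordings, and the resulting coefficient vectors would serve as the voice characteristics to be mapped.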

5
APPROACHES TO THE PROBLEM

In the second phase, the transformed features
will be used to synthesize speech that will,
hopefully, resemble that of the target speaker.
Speech synthesis will again be performed by means
of Linear Predictive Coding.
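
As a rough illustration of LPC-based synthesis, the sketch below drives an all-pole filter 1/A(z) with an excitation signal. The impulse-train excitation, the `gain` parameter, and both function names are assumptions made for the example; the presentation does not describe its excitation model.

```python
def lpc_synthesize(a, excitation, gain=1.0):
    """Run an excitation through the all-pole filter 1/A(z).

    a is [1, a1, ..., ap]; each output sample obeys
    s[n] = gain*e[n] - (a1*s[n-1] + ... + ap*s[n-p]).
    """
    out = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * out[n - j]
        out.append(acc)
    return out


def impulse_train(length, period):
    """Crude voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


# Example: 160 samples of a voiced-like sound from a 1-pole filter.
samples = lpc_synthesize([1.0, -0.5], impulse_train(160, 80))
```

Feeding the filter with coefficients mapped toward the target speaker, rather than the source speaker's own coefficients, is what would make the output resemble the target voice.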

6
Speech production

The respiratory subsystem is composed of the
lungs, the trachea (windpipe), the diaphragm and
the chest cavity.
The larynx and the pharyngeal cavity (throat)
constitute the laryngeal subsystem.
The articulatory subsystem includes the oral
cavity and the nasal cavity.

7
Speech production

The oral cavity is comprised of the velum, the
tongue, the lips, the jaw and the teeth.
In speech processing technical discussions, the
vocal tract is referred to as the combination of
the larynx, the pharyngeal cavity and the oral
cavity.
The respiratory subsystem behaves like an air
pump, supplying the aerodynamic energy for the
other two subsystems.
In speech processing, the basic aerodynamic
parameters are air volume, flow, pressure and
resistance.

8
Conversion of voice

TECHNIQUES:
Wavelet Decomposition.
Proposed model.
Wavelet Decomposition:
Wavelets are a class of functions that
possess compact support and form a basis for all
finite energy signals.
They are able to capture the non-stationary
spectral characteristics of a signal by
decomposing it over a set of atoms which are
localized in both time and frequency. The DWT
uses the set of dyadic scales and translates of
the mother wavelet to form an orthonormal basis
for signal analysis.
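
For reference, the dyadic scaling and translation mentioned above is usually written as follows. The symbols (mother wavelet \psi, scale index j, shift index k, coefficients d_{j,k}) follow the standard textbook convention and are not taken from the slides.

```latex
\psi_{j,k}(t) = 2^{-j/2}\,\psi\!\left(2^{-j}t - k\right), \qquad j,k \in \mathbb{Z}

x(t) = \sum_{j}\sum_{k} d_{j,k}\,\psi_{j,k}(t), \qquad
d_{j,k} = \langle x, \psi_{j,k} \rangle = \int x(t)\,\psi_{j,k}(t)\,dt
```

Because the family \psi_{j,k} is orthonormal, each coefficient is obtained by a simple inner product with the signal.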

9
Example

The original signal S is split into an
approximation cA1 and a detail cD1.
The approximation is then itself split into an
approximation and a detail and so on.
Decomposing a signal into k levels therefore
results in k+1 sets of coefficients at different
frequency resolutions: k levels of detail
coefficients and 1 level of approximation
coefficients.
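
The repeated approximation/detail split can be sketched in a few lines. The slides do not name the wavelet used, so this example assumes the Haar wavelet because it is the simplest orthonormal choice; `haar_step` and `wavedec` are illustrative names.

```python
import math


def haar_step(signal):
    """One DWT level with the Haar wavelet: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavedec(signal, levels):
    """Return [cA_k, cD_k, ..., cD_1] for a k-level decomposition.

    The signal length should be divisible by 2**levels.
    """
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        # Only the approximation is split again at the next level.
        approx, detail = haar_step(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


# A 3-level decomposition of 8 samples gives 3 + 1 = 4 coefficient sets.
coeffs = wavedec([0, 1, 2, 3, 4, 5, 6, 7], 3)
set_sizes = [len(c) for c in coeffs]
```

Here `set_sizes` lists the length of the final approximation followed by the detail sets from coarsest to finest.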

10
Conversion of voice

Proposed model
Voice morphing is performed in two steps:
training and transformation. The training data
consist of repetitions of the same phonemes
uttered by both source and target speakers.
The source and target training data are divided
into frames of 128 samples, and the frames are
randomly divided into training and validation
sets.
A 5-level wavelet decomposition is then performed
on the source and target training data.
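
The framing and random split can be sketched as below. Non-overlapping frames, a 20% validation share, a fixed random seed, and frame-by-frame pairing of source and target are all assumptions made for the example; the slides give only the frame length of 128 samples.

```python
import random


def split_into_frames(samples, frame_len=128):
    """Cut a signal into non-overlapping frames, dropping any short tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def train_validation_split(source_frames, target_frames,
                           val_fraction=0.2, seed=0):
    """Randomly split paired frames, keeping source and target aligned."""
    pairs = list(zip(source_frames, target_frames))
    random.Random(seed).shuffle(pairs)
    n_val = int(len(pairs) * val_fraction)
    return pairs[n_val:], pairs[:n_val]


# Example with placeholder data standing in for recorded phonemes.
source = [float(n) for n in range(1000)]
target = [float(-n) for n in range(1000)]
train, val = train_validation_split(split_into_frames(source),
                                    split_into_frames(target))
```

A frame of 128 = 2^7 samples can be halved five times, so a 5-level dyadic decomposition of each frame leaves 4 approximation coefficients.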

11
Types of voice morphing

THIS SECTION SHOWS THE FORMS INTO WHICH A NORMAL
VOICE OR SPEECH CAN BE TRANSFORMED.

CONVERSION   SOURCE    TARGET    RESULT1   RESULT2
F TO M       SPEECH1   TARGET1   RESULT1   VOICE1
M TO F       SPEECH2   TARGET2   RESULT2   VOICE2
F TO F       SPEECH3   TARGET3   RESULT3   VOICE3
M TO M       SPEECH4   TARGET4   RESULT4   VOICE4
12
Types of voice morphing

The "Source Speech" column contains the
utterances of the source speaker.
The "Target Speech" column contains the target
speaker's utterances.
The utterances in these two columns are NOT
included in the training data used to estimate
the conversion function.
The next two columns contain the results.
The difference between them is that "RESULT1"
applies the target prosody extracted from the
target utterance, whereas "RESULT2" keeps the
original prosody of the source utterance.

13
REFERENCES OR METHODS.

Abe M., Nakamura S., Shikano K. and Kuwabara H.,
"Voice Conversion through Vector Quantization,"
Proceedings of ICASSP, 1988.
Stylianou Y., Cappé O. and Moulines E.,
"Statistical Methods for Voice Quality
Transformation," Proceedings of Eurospeech, 1995.
Arslan L. and Talkin D., "Voice Conversion by
Codebook Mapping of Line Spectral Frequencies and
Excitation Spectrum," Proceedings of Eurospeech,
1997.

14
APPLICATION OF VOICE MORPHING

ENTERTAINMENT.
IN FILM INDUSTRY.
SECURITY.
IN COMPUTER GAMING

15
AVAILABLE SOFTWARE FOR VOICE MORPHING.

MORPH VOX PRO VOICE CHANGER 2.0.6.
MORPH VOX PRO VOICE CHANGER 4.2.2.
MORPH VOX PRO VOICE CHANGER 4.3.8.
TERA VOICE SERVER 2004.
FLASH VOICE BUTTONS 3.0.
VOICE TWISTER 1.0.4.
VOICE AGAIN 1.5.2.
QUICK VOICE FOR OSX 2.2.0.
QUICK VOICE FOR WINDOWS 2.2.0.

16
SUMMARY.

Voice morphing is the process of changing voice
personality, i.e., speech uttered by a source
speaker is modified to sound as if the target
speaker had uttered it.
In this dissertation, our study of voice morphing
began by introducing the basic properties of
speech signals.
We also introduced the basic techniques of voice
morphing and the concept behind it.

17
CONCLUSION.

Voice morphing is a technology with many
interesting, useful and fun applications. Further
research on the subject, with or without the GTM
(Generative Topographic Mapping) model, is bound
to follow and should lead to morphed speech of
excellent quality.

18
Thank you
