An efficient architecture for speech synthesis using Unit Selection Synthesis

1
An efficient architecture for speech synthesis
using Unit Selection Synthesis.

CS422 Project Presentation
Anil Muthineni
Ashish Kumar Agarwal
Atul Singh

2
Speech Synthesis

A very important branch of Natural Language
Processing.
Reads text aloud for us, e.g. Microsoft Narrator,
Plain Talk, TalkIt.
An essential component of bilingual computers, so
that a person in India can talk to a person in
the US without the help of a third person.
Widely used in railway announcement systems and
IVRS.
Makes using a computer equally easy for blind
people.

3
Steps involved in Speech Synthesis

Normalization
Phonemization (text-to-phoneme conversion)
Prosodization

4
Basic Terminology and Techniques for Speech
Synthesis

The text is first normalized.
It is then converted into phonetic transcriptions.
These are then divided into prosodic units.
This linguistic representation is sent to the
output unit.
The output unit synthesizes the digital sound
signal from this input.
The output signal is fed to the last stage, where
it undergoes inverse quantization (IQ) and an
inverse discrete cosine transform (IDCT).
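
The last stage can be illustrated with a minimal sketch, assuming "IQ" means inverse quantization with a uniform step size and "IDCT" means the orthonormal inverse discrete cosine transform. The slides do not give the quantizer or the transform size, so the function names and the `step` parameter here are hypothetical. The forward `dct` is included only to check the round trip.

```python
import math

def inverse_quantize(levels, step):
    """Map integer quantization levels back to coefficients (uniform quantizer assumed)."""
    return [level * step for level in levels]

def idct(coeffs):
    """Orthonormal inverse DCT (DCT-III), the inverse of the orthonormal DCT-II."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = coeffs[0] / math.sqrt(n)
        for k in range(1, n):
            angle = math.pi * (2 * i + 1) * k / (2 * n)
            total += math.sqrt(2.0 / n) * coeffs[k] * math.cos(angle)
        samples.append(total)
    return samples

def dct(samples):
    """Orthonormal forward DCT-II, used here only to verify the inverse."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, x in enumerate(samples))
        out.append(scale * acc)
    return out

# Decoding path: quantized levels -> coefficients -> time-domain samples.
frame = idct(inverse_quantize([4, 0, 0, 0], step=0.5))
```

With only the first coefficient non-zero, every sample in `frame` has the same value, which is what a pure DC coefficient should produce.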

5
Challenges

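Normal text is full of heteronyms, numbers and
abbreviations.
Heteronyms: "entrance" (a way in, or to enchant?)
Abbreviations: "St. John St." ("St." stands for
both "Saint" and "Street")
Numbers: "1234" and "1,234" can be read in more
than one way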
in
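
A minimal normalization sketch for the two examples above, "St. John St." and "1234". The position heuristic for "St." and the helper names are assumptions made for illustration; the slides do not say how these cases are resolved.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def cardinal(n):
    """Spell out an integer in the range 0..9999 as a quantity."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + cardinal(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return cardinal(thousands) + " thousand" + (" " + cardinal(rest) if rest else "")

def digit_string(token):
    """Read a token digit by digit, as for a phone or account number."""
    return " ".join(ONES[int(ch)] for ch in token)

def expand_st(tokens):
    """Assumed heuristic: 'St.' before a capitalized word is 'Saint', otherwise 'Street'."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == "St.":
            following = tokens[i + 1] if i + 1 < len(tokens) else ""
            out.append("Saint" if following[:1].isupper() else "Street")
        else:
            out.append(tok)
    return out

print(cardinal(int("1,234".replace(",", ""))))  # one thousand two hundred thirty-four
print(digit_string("1234"))                     # one two three four
print(" ".join(expand_st("St. John St.".split())))  # Saint John Street
```

The same digits yield two different readings, so a real normalizer needs context to choose between them; this sketch only shows that the choice exists.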

6
Major approaches for text to phoneme conversion

Dictionary based
Each token is looked up in a lexicon. If found,
the word is pronounced. Requires a large database
but is very fast. Words outside the dictionary
cannot be pronounced.
Syllable based
The token is parsed into syllables using
pronunciation rules; the syllables are converted
into phonemes and concatenated. Requires only a
small database but is slower. Not all words are
pronounced correctly, e.g. "put" and "cut" are
spelled alike but do not rhyme.
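
The trade-off between the two approaches can be sketched as a lexicon lookup with a rule-based fallback. The lexicon entries, the ARPAbet-style symbols and the one-letter rules below are hypothetical, and the letter rules are a much cruder stand-in for real syllable rules.

```python
# Hypothetical mini-lexicon with ARPAbet-style phonemes.
LEXICON = {"put": "P UH T", "cut": "K AH T"}

# Hypothetical one-letter-to-one-phoneme rules (a stand-in for syllable rules).
LETTER_RULES = {"b": "B", "c": "K", "n": "N", "p": "P", "t": "T", "u": "AH"}

def rule_based(word):
    """Convert a word letter by letter; small table, but it ignores exceptions."""
    return " ".join(LETTER_RULES[ch] for ch in word.lower() if ch in LETTER_RULES)

def to_phonemes(word):
    """Try the lexicon first and fall back to the rules for unknown words."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key], "lexicon"
    return rule_based(key), "rules"

print(to_phonemes("cut"))   # ('K AH T', 'lexicon')
print(to_phonemes("nut"))   # ('N AH T', 'rules')
print(rule_based("put"))    # P AH T, which wrongly rhymes with "cut"
```

The rules handle "nut" without a lexicon entry but mispronounce "put", which is the "Put Cut" problem: spelling alone does not determine the vowel.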

7
Major approaches used for synthesis of sound

Concatenative Synthesis: Based on concatenation
of segments of recorded speech. Can sometimes
produce audible glitches; soft thresholding is
proposed as a solution to this problem.
Formant Synthesis: Based on synthesizing speech
from the frequency, amplitude and other
characteristics of the sound waveform.

8
Requirement Analysis

Because of the complexity of the algorithms
involved, the processing cannot be done in real
time on current processors.
An application-specific instruction set processor
(ASIP) provides the desired flexibility and
extensibility.
Scalability with minimal changes is required.
Naturalness and intelligibility are two further
essential requirements.

9
Requirement Analysis Contd.

Glitches need to be minimized.
For this, work has to be done well in advance,
where "in advance" is measured on a nanosecond
scale.
So computation must be fast for the data that
occurs most frequently.
This suggests storing frequently used data in a
memory whose access time matches that of a cache.

10
Requirement Analysis Contd.

The pipelines also need to be well balanced,
because text to morpheme conversion takes almost
half the time of morpheme to sound conversion
(source: various research papers on speech
synthesis).
This is basically due to bandwidth considerations.
Either increase the bandwidth or have two
pipelines work in parallel to hide the latency.
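
The balance argument can be made concrete with a small calculation. This is a sketch under the assumption that the stages form a simple linear pipeline whose rate is set by its slowest stage; the 1:2 timing ratio comes from the slide, while the `lanes` parameter (parallel copies of a stage) is a hypothetical way to model the added bandwidth.

```python
def pipeline_stats(stage_times, lanes=None):
    """Return (cycle_time, utilizations) for a linear pipeline.

    stage_times: time each stage needs per item.
    lanes: number of parallel copies of each stage (defaults to one each).
    """
    lanes = lanes or [1] * len(stage_times)
    effective = [t / n for t, n in zip(stage_times, lanes)]
    cycle = max(effective)  # the slowest stage sets the rate
    return cycle, [e / cycle for e in effective]

# Text conversion takes about half as long as sound conversion.
print(pipeline_stats([1.0, 2.0]))          # (2.0, [0.5, 1.0])
# Doubling the bandwidth of the slow stage balances the pipeline.
print(pipeline_stats([1.0, 2.0], [1, 2]))  # (1.0, [1.0, 1.0])
```

In the unbalanced case the first stage sits idle half the time; with the slow stage doubled, both stages stay busy and the time per item halves.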

11
Concatenative Synthesis

Unit Selection Synthesis: Uses a large database
of recorded speech, created by segmenting
recorded utterances into syllables, phones,
morphemes, words, phrases and sentences.
Diphone Synthesis: Uses a minimal speech database
containing only diphones.
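
The slides do not describe how units are chosen from the database. A common formulation, shown here as an assumption and not as the authors' method, picks one candidate per target so that the sum of target costs (how well a unit matches what is wanted) and join costs (how smoothly adjacent units connect) is minimal, using dynamic programming. The pitch-only cost functions and the unit labels are hypothetical.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Choose one candidate per target, minimizing total target + join cost."""
    # best[i][j] = (cost of the cheapest path ending at candidates[i][j], back-pointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            costs = [best[i - 1][k][0] + join_cost(p, c)
                     for k, p in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_best] + target_cost(targets[i], c), k_best))
        best.append(row)
    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total

# Hypothetical data: targets are desired pitches, units are (label, pitch) pairs.
targets = [100, 110, 120]
candidates = [[("a1", 100), ("a2", 130)],
              [("b1", 90), ("b2", 115)],
              [("c1", 120), ("c2", 125)]]
path, total = select_units(
    targets, candidates,
    target_cost=lambda t, u: abs(t - u[1]),
    join_cost=lambda p, u: abs(p[1] - u[1]),
)
print([label for label, _ in path], total)  # ['a1', 'b2', 'c1'] 25
```

Diphone synthesis needs no such search, because its database holds a single recording per diphone.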

12
Implementation

A unique approach with novel ideas.
Frequent cases are designed to be processed fast,
while infrequent ones, such as symbols and
numbers, are not optimized.
The design is simple, with the least possible
complexity.
Power consumption has been kept in mind, with
real-time processing as the primary goal.

13
Major features of the proposed architecture

Two parallel pipelines, each with specialized
tasks.
A data cache is not required, because the data we
are dealing with is essentially a stream without
any significant spatial locality.
Besides the conventional cache, we need one local
store with an access time equal to that of the
cache, split into three parts:
The first part stores recently processed words,
in accordance with the principle of temporal and
spatial locality.
The second part stores the database of all
frequently occurring morphemes.
The third part is a hash-based lookup table that
maps a morpheme to its linguistic transcription.
This minimizes hard disk access.
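
A software sketch of the three-part local store, to show how the parts cooperate to avoid disk access. The class name, the least-recently-used eviction policy, the capacity and the `disk_lookup` callback are all assumptions; the slides describe a hardware structure and give none of these details.

```python
from collections import OrderedDict

class LocalStore:
    """Three-part store: recent words, frequent morpheme data, hash lookup table."""

    def __init__(self, frequent_units, transcriptions, disk_lookup, recent_capacity=2):
        self.recent = OrderedDict()           # part 1: recently processed words
        self.frequent_units = frequent_units  # part 2: data for frequent morphemes
        self.transcriptions = transcriptions  # part 3: hash table, morpheme -> transcription
        self.disk_lookup = disk_lookup        # slow path, stands in for the hard disk
        self.recent_capacity = recent_capacity
        self.disk_reads = 0

    def unit(self, morpheme):
        """Return stored data for a frequent morpheme, or None if it is not resident."""
        return self.frequent_units.get(morpheme)

    def transcription(self, word):
        """Look up a transcription, touching the disk only as a last resort."""
        if word in self.recent:
            self.recent.move_to_end(word)     # refresh its recency
            return self.recent[word]
        if word in self.transcriptions:
            result = self.transcriptions[word]
        else:
            self.disk_reads += 1
            result = self.disk_lookup(word)
        self.recent[word] = result
        if len(self.recent) > self.recent_capacity:
            self.recent.popitem(last=False)   # evict the least recently used word
        return result

store = LocalStore({"the": b"\x00\x01"}, {"the": "DH AH"}, disk_lookup=str.upper)
store.transcription("the")     # served from the hash table
store.transcription("zebra")   # first request goes to "disk"
store.transcription("zebra")   # second request hits the recent-word part
print(store.disk_reads)        # 1
```

Here `str.upper` merely stands in for a real disk-backed lookup so that the example runs on its own.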

14
Pipelines

One of the pipelines converts the text into
phonetic transcriptions.
This requires many partitioning algorithms that
depend mainly on loops. This code cannot be
vectorized, so this pipeline has an integer ALU
with scalar registers.
Meanwhile, the other pipeline processes the
morphemes into linguistic units.
This pipeline has a SIMD FPU because of its
intense number-crunching requirements.
Care must be taken that the pipelines are
interlocked.
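
As a software analogy for two interlocked pipelines, the sketch below runs the two stages as threads joined by a bounded queue: a full queue stalls the front stage and an empty queue stalls the back stage. This is only an analogy for the hardware interlock; the function names and the queue depth are hypothetical.

```python
import queue
import threading

def run_pipelines(words, to_phonemes, to_samples, depth=2):
    """Run a text front end and a sound back end concurrently, in lockstep."""
    handoff = queue.Queue(maxsize=depth)  # bounded buffer acts as the interlock
    output = []

    def front():
        for word in words:
            handoff.put(to_phonemes(word))  # blocks while the buffer is full
        handoff.put(None)                   # sentinel: no more work

    def back():
        while True:
            item = handoff.get()            # blocks while the buffer is empty
            if item is None:
                break
            output.append(to_samples(item))

    threads = [threading.Thread(target=front), threading.Thread(target=back)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output

# Stand-in stage functions: upper-casing for the front end, length for the back end.
print(run_pipelines(["ab", "c"], str.upper, len))  # [2, 1]
```

Because the queue is first-in, first-out with a single producer and a single consumer, results come out in input order.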

15
Block diagram of proposed architecture

16
Cost and Performance Analysis

As we have just shown, the processor has a very
simple design.
This was done to minimize hardware complexity.
It has a two-pronged benefit in terms of cost:
Fewer actual hardware resources.
Less control logic, meaning lower power
consumption.

17
Contd

Performance is expected to be good because the
design is very straightforward.
There are no quantitative results so far, because
the idea is in its infancy and has not been
marketed.
Intuitively, though, we expect performance to be
at least on par with that of current commodity
processors on general mathematical and scientific
computations.

18
Instruction Set Architecture

The instruction set is small yet powerful, with a
variety of instructions.
There are a couple of special-purpose registers
in addition to the general-purpose registers.
A status register stores information about pitch,
intensity and speed/tempo.
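
One way such a status register could be laid out is as packed bit fields. The 16-bit width and the 8/4/4 split below are purely hypothetical; the slides do not specify the register format.

```python
# Hypothetical layout: [15:8] pitch, [7:4] intensity, [3:0] tempo.
PITCH_BITS, INTENSITY_BITS, TEMPO_BITS = 8, 4, 4

def pack_status(pitch, intensity, tempo):
    """Pack the three fields into one 16-bit register value."""
    fields = ((pitch, PITCH_BITS), (intensity, INTENSITY_BITS), (tempo, TEMPO_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError("field value out of range")
    return (pitch << (INTENSITY_BITS + TEMPO_BITS)) | (intensity << TEMPO_BITS) | tempo

def unpack_status(register):
    """Split a register value back into (pitch, intensity, tempo)."""
    pitch = (register >> (INTENSITY_BITS + TEMPO_BITS)) & ((1 << PITCH_BITS) - 1)
    intensity = (register >> TEMPO_BITS) & ((1 << INTENSITY_BITS) - 1)
    tempo = register & ((1 << TEMPO_BITS) - 1)
    return pitch, intensity, tempo

print(hex(pack_status(0x7F, 0xA, 0x3)))  # 0x7fa3
print(unpack_status(0x7FA3))             # (127, 10, 3)
```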

19
(No Transcript)
20
Q & A
