Automatic Pruning of Unit Selection Speech Databases for Synthesis without loss of Naturalness

1
Automatic Pruning of Unit Selection
Speech Databases for Synthesis without loss of
Naturalness

Rohit Kumar, S. P. Kishore
International Institute of Information Technology, Hyderabad, INDIA
Carnegie Mellon University, Pittsburgh

ICSLP INTERSPEECH 2004 Jeju Island, Korea
2
Organization of the Talk

Introduction to Unit Selection based T.T.S.
Need / Scope for Pruning of Speech Databases
Our Aims in this Work
Low Memory Device Synthesizer
Definitions of Neutral and Optimal Units
Implementation and Results
How do we prune?
Ranking of Units
Heuristic for Creating Database of Any Size
Optimal Sized Database: Perceptual Evaluation
Conclusions

3
Unit Selection based T.T.S.
[Figure: block diagram of a unit selection based TTS system. Labeled blocks: Text, Text / Linguistic Processing, Linguistic Information, Sequence of Basic Units, Unit Selection Algorithms, Selected Units, Signal Processing, Speech, Basic Unit Inventory, Inventory Building Modules, Speech Corpus (Signal, Transcriptions, Labels, Features). The inventory table lists each unit (A, B, ...), its instances (1..n, 1..m) and their features x(A1), y(A1), ..., x(An), y(An), x(B1), y(B1), ..., x(Bm), y(Bm).]
4
Unit Pruning

The typical size of a high quality Unit Selection database is large (100 MB to 500 MB).
Using Unit Pruning techniques, the database size can be significantly reduced without loss in quality.
Unit Pruning refers to the removal of unit instances from the Unit Selection database that do not add to (or may even harm) the quality of synthesized speech.

5
Need / Scope for Pruning
1. To Improve Quality
2. To Reduce Size

<< Deviant Units >>
Units having features too deviant from usual
values of such features
These units rarely get selected due to the high
costs of selection
Removal of such units from database improves the
quality of synthesized output

<< Redundant Units >>
Units having very similar feature sets
Do not contribute significantly to the diversity
of units in the database
Removal of such units from database does no harm
to synthesis quality and helps in reducing
database size

6
Our Aims in this Work

Multiple Aims
To come up with a Low Memory Device Synthesizer (for PDAs, mobiles)
To be able to create a database of any size with a corresponding quality
To arrive at an optimal size of the database without any loss of quality

7
Low Memory Device Synthesizer

The requirement is a speech synthesizer that fits into a low memory device such as a PDA.
The database of a normal Unit Selection based TTS is prohibitively large for a small device.
So we trade off Quality against Size,
but keep all possible basic units.
Instead of multiple instances of each basic unit, we keep only one instance of each basic unit.

8
Low Memory Device Synthesizer

QUESTION: How do we choose the single most suitable instance from the various instances of each unit?
9
QUESTION: How do we choose the single most suitable instance from the various instances of each unit?
NEUTRAL UNITS
Hypothesis: The best instance is the one that is prosodically neutral and has minimal contextual effects. Neutral units will join best with neutral units.
Definition: The Neutral (Average) Unit is the unit instance whose features are closest to the average of the features.

$P_{Ideal}$ = Average Pitch, $D_{Ideal}$ = Average Duration, $E_{Ideal}$ = Average Energy
So $\langle P_{Neutral}, D_{Neutral}, E_{Neutral} \rangle$ is closest to $\langle P_{Ideal}, D_{Ideal}, E_{Ideal} \rangle$
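
The neutral-unit definition can be illustrated with a short sketch. The slide does not say how "closest" is measured, so the distance below (squared Euclidean over features scaled by their standard deviation) is an assumption, as are the `Instance` class, the function name, and the sample values.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Instance:
    """One recorded instance of a basic unit (hypothetical container)."""
    name: str
    pitch: float      # e.g. mean F0 in Hz
    duration: float   # e.g. seconds
    energy: float     # e.g. mean RMS energy


def neutral_instance(instances):
    """Return the instance whose features are closest to the per-unit average."""
    columns = list(zip(*[(i.pitch, i.duration, i.energy) for i in instances]))
    means = [mean(c) for c in columns]
    # Assumption: scale each feature by its spread so Hz, seconds and energy
    # are comparable; use 1.0 when a feature does not vary at all.
    scales = [pstdev(c) or 1.0 for c in columns]

    def distance(inst):
        feats = (inst.pitch, inst.duration, inst.energy)
        return sum(((v - m) / s) ** 2 for v, m, s in zip(feats, means, scales))

    return min(instances, key=distance)


# Made-up feature values for three instances of one unit.
instances = [
    Instance("A1", 110.0, 0.10, 0.50),
    Instance("A2", 120.0, 0.12, 0.55),
    Instance("A3", 160.0, 0.20, 0.90),
]
print(neutral_instance(instances).name)  # A2
```

Here A3 pulls the averages upward, yet A2 still lies nearest the mean on all three features, so it is chosen as the neutral instance.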
10
QUESTION: How do we choose the single most suitable instance from the various instances of each unit? (Contd.)
OPTIMAL UNITS
Alternative Hypothesis: The best instance is the one that joins most suitably in all the contexts that it is likely to appear in.
Definition: The Optimal Unit is the unit instance that joins most suitably with all the units that appear in the context of the instances of the unit under consideration.

11
Optimal Units
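Let $A_1, A_2, \ldots, A_i, \ldots, A_n$ be the instances of a basic unit $A$.
Let $A_{i-1}$ and $A_{i+1}$ be the units preceding and succeeding the instance $A_i$ in the corpus.
Global Prosodic Mismatch Function (GPMF)
[Equation not recoverable from the transcript. Surviving terms: $P^{Expected}_{A_{i-1}}$ and $P_{A_i}$ (pitch), $D^{Expected}_{A_{i-1}}$ and $D_{A_i}$ (duration), $E^{Expected}_{A_{i-1}}$ and $E_{A_i}$ (energy).]
By definition, the Optimal Instance of the unit is the one that minimizes GPMF.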
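
The GPMF formula itself did not survive in this transcript, so the sketch below is only one plausible reading, not the authors' definition. It assumes that the prosody expected in the context of instance A_i is the prosody of A_i itself (the instance that occurred there naturally), and that mismatch is a weighted sum of absolute differences in pitch, duration and energy. The function names, weights and sample values are hypothetical.

```python
def gpmf(candidate, all_features, weights=(1.0, 1.0, 1.0)):
    """Assumed mismatch of one candidate instance, summed over every context.

    candidate and each entry of all_features are (pitch, duration, energy)
    tuples. The features of the instance that naturally occurred in a context
    stand in for the prosody expected there (assumption).
    """
    return sum(
        w * abs(expected - actual)
        for natural in all_features
        for w, expected, actual in zip(weights, natural, candidate)
    )


def optimal_instance(features_by_name, weights=(1.0, 1.0, 1.0)):
    """Return the name of the instance that minimizes the assumed GPMF."""
    all_features = list(features_by_name.values())
    return min(
        features_by_name,
        key=lambda name: gpmf(features_by_name[name], all_features, weights),
    )


# Made-up (pitch Hz, duration s, energy) values for three instances of unit A.
features = {
    "A1": (110.0, 0.10, 0.50),
    "A2": (120.0, 0.12, 0.55),
    "A3": (160.0, 0.20, 0.90),
}
print(optimal_instance(features))  # A2
```

With unit weights, pitch in Hz dominates the sum; in practice the weights would have to balance the three feature scales.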
12
Low Memory Device Synthesizer
Implementation
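[Figure: block diagram of the Low Memory Device Synthesizer. Labeled blocks: Text, Text / Linguistic Processing, Linguistic Information, Sequence of Basic Units, Basic Unit Inventory, Signal Processing, Speech. The full inventory (units A, B, ... with instances 1..n, 1..m and features x(A1), y(A1), ..., x(Bm), y(Bm)) passes through Neutral / Optimal selection to give a table with one instance per unit (units A, B, C, ..., P, Q; instances x, y, z, ..., w, m).]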
Unit Selection has now moved from Synthesis Time
to Inventory Building Time
13
Low Memory Device Synthesizer
Results
Perceptual tests: 8 subjects scored 10 sentences on a scale of 0 (worst) to 5 (best).
Database F (Optimal Units) is the best performing approach.
14
Low Memory Device Synthesizer
Results
An LMDS system was implemented on a handheld computing device.
Hindi database consisting of 2786 unique basic units (syllables, phonemes) collected using the Optimal Unit approach.
Actual database size @ 16 kHz, 256 kbps: 180 MB
GSM coded @ 8 kHz: database size 1.27 MB
G.722 coded @ 16 kHz: database size 5.02 MB
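
As a rough sanity check on these figures, storage at a constant bit rate is duration times bit rate. The sketch assumes that 256 kbps corresponds to 16 kHz, 16-bit mono PCM and that the 180 MB figure refers to the full 96-minute Hindi corpus mentioned later in the talk. The slide states neither, and the function name is hypothetical.

```python
def audio_size_mb(duration_s, bitrate_kbps):
    """Size in MB (10**6 bytes) of audio stored at a constant bit rate."""
    return duration_s * bitrate_kbps * 1000 / 8 / 1e6


# 16 kHz sampling with 16-bit mono samples gives 16000 * 16 = 256000 bits/s.
pcm_kbps = 16000 * 16 / 1000

corpus_seconds = 96 * 60  # assumption: the 96-minute corpus
print(round(audio_size_mb(corpus_seconds, pcm_kbps), 2))  # 184.32
```

Under these assumptions the uncompressed corpus comes to about 184 MB, the same order as the 180 MB quoted on the slide.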
15
How do we Prune?

For every unit in the database:
Score each instance of the unit for its (un)desirability, given all the other instances of that unit.
Pick the top x% of the instances of the unit and remove all the others.
(x) is the Pruning Control Parameter.
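
The per-unit pruning step can be sketched as follows. The sketch treats x as a percentage, assumes higher scores are better, and always keeps at least one instance so that no basic unit disappears. All three are assumptions, and the scores and unit names are made up.

```python
import math


def prune(scores_by_unit, keep_percent):
    """Keep the top keep_percent% highest-scoring instances of every unit.

    scores_by_unit maps unit -> {instance_name: score}. At least one instance
    per unit is kept (assumption), so every basic unit stays synthesizable.
    """
    pruned = {}
    for unit, scores in scores_by_unit.items():
        # Rank instance names by score, best first.
        ranked = sorted(scores, key=scores.get, reverse=True)
        n_keep = max(1, math.ceil(len(ranked) * keep_percent / 100))
        pruned[unit] = ranked[:n_keep]
    return pruned


# Hypothetical scores for two units.
scores = {
    "ka": {"ka_1": 0.9, "ka_2": 0.4, "ka_3": 0.7, "ka_4": 0.1},
    "a": {"a_1": 0.2, "a_2": 0.8},
}
print(prune(scores, 50))  # {'ka': ['ka_1', 'ka_3'], 'a': ['a_2']}
```

Setting `keep_percent` close to 0 reduces every unit to its single best instance, which corresponds to the one-instance-per-unit Low Memory Device Synthesizer case.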

16
How do we Prune?

Question: How do we rank the units?
17
Ranking the Unit Occurrences
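1. Measure of Undesirability of Instance
2. Inter-Instance Repulsion
Using Weighted Global Prosodic Mismatch Function (GPMF)
[Equations for Undesirability ($U_x$) and Repulsion ($R_x$) not recoverable from the transcript.]
$SCORE_x$ is formed from $U_x$ and $(W_{REPULSION} \times R_x)$; the operators joining these terms did not survive extraction.
Ranking is in descending order of Score.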
18
Database of Any Desired Size
Having the instances of units in ranked order, we need a Pruning Control Parameter (x) to decide what kind of database we want.
Experiment: Hindi Speech Corpus (96 minutes)
2 kinds of basic units:
Syllables: Unique 2391, Total 23096
Phonemes: Unique 49, Total 54734
Pruning Control Parameters:
P: Percentage of Phonemes to be Kept
S: Percentage of Syllables to be Kept
W_REPULSION: Inter-Instance Repulsion Weight in Scoring
19
Database of Any Desired Size
We created several pruned databases using different sets of pruning control parameters.
A database-specific empirical formula has been derived to come up with a pruned database of any desired size.
20
Optimal Database Size
Aim: to come up with an optimal set of pruning parameters, so that the minimal database size can be achieved without degradation in the quality (naturalness, perceptibility) of speech.
Perceptual tests were conducted on several pruned databases created with different Pruning Control Parameters.
10 databases with different Pruning Control Parameter values
8 subjects rated 5 sentences each on a scale of 0 (worst) to 5 (best)
21
Contd.
22
Contd.
[Figure: two panels, W_REPULSION = 2.0 and W_REPULSION = 0.5]
23
Contd.
[Figure: two panels, W_REPULSION = 2.0 and W_REPULSION = 0.5]
24
Conclusions

Various approaches for selecting the most suitable instance of a unit type in a unit selection database were proposed.
The GPMF-based Optimal Unit was found to be most suitable.
A technique for ranking unit instances using GPMF was described and used for pruning unit selection databases.
A Low Memory Device Synthesizer was implemented.
A database-specific empirical formula was derived to come up with a database of any desired size (based on a suitable set of pruning control parameters).
An optimal-sized database was created (pruned) without any loss of naturalness.