Title: Automatic Pruning of Unit Selection Speech Databases for Synthesis without loss of Naturalness
1. Automatic Pruning of Unit Selection Speech Databases for Synthesis without Loss of Naturalness
- Rohit Kumar, S. P. Kishore
- International Institute of Information Technology, Hyderabad, India
- Carnegie Mellon University, Pittsburgh
ICSLP INTERSPEECH 2004, Jeju Island, Korea
2. Organization of the Talk
- Introduction to Unit Selection based T.T.S.
- Need / Scope for Pruning of Speech Databases
- Our Aims in This Work
- Low Memory Device Synthesizer
- Definitions of Neutral and Optimal Units
- Implementation and Results
- How Do We Prune?
- Ranking of Units
- Heuristic for Creating a Database of Any Size
- Optimal-Sized Database: Perceptual Evaluation
- Conclusions
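3. Unit Selection based T.T.S.
[Figure: block diagram of a unit selection TTS system. Text / Linguistic Processing turns Text into a Sequence of Basic Units and Linguistic Information; Unit Selection Algorithms choose Selected Units from the Basic Unit Inventory; Signal Processing produces Speech. Inventory Building Modules construct the Basic Unit Inventory from a Speech Corpus (Signal, Transcriptions, Labels, Features).]
Basic Unit Inventory:
- Unit A: instances A1, A2, ..., An, with features x(A1), y(A1); x(A2), y(A2); ...; x(An), y(An)
- Unit B: instances B1, ..., Bm, with features x(B1), y(B1); ...; x(Bm), y(Bm)
- ...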
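The inventory on this slide maps each basic unit to its instances and their features. A minimal Python sketch of that data structure follows; the names `Instance` and `build_inventory` and the generic feature tuple are hypothetical, not taken from the authors' system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One occurrence of a basic unit in the speech corpus (hypothetical layout)."""
    unit: str        # basic unit label, e.g. a syllable or a phoneme
    index: int       # 1-based instance number, as in A1, A2, ..., An
    features: tuple  # feature values, e.g. (x, y) as on the slide


def build_inventory(labelled_units):
    """Group corpus occurrences by unit label: unit -> [instance 1, ..., instance n]."""
    inventory = defaultdict(list)
    for unit, features in labelled_units:
        instances = inventory[unit]
        instances.append(Instance(unit, len(instances) + 1, tuple(features)))
    return dict(inventory)


inventory = build_inventory([("A", (1.0, 2.0)), ("B", (0.5, 0.1)), ("A", (1.2, 1.9))])
print([i.index for i in inventory["A"]])  # [1, 2]
```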
4. Unit Pruning
- High-quality Unit Selection databases are typically large (100 MB to 500 MB)
- Using Unit Pruning techniques, the database size can be reduced significantly without loss in quality
- Unit Pruning refers to the removal of unit instances from the Unit Selection database that do not add to (or may even harm) the quality of synthesized speech
5. Need / Scope for Pruning
1. To Improve Quality
2. To Reduce Size
- Deviant Units
  - Units whose features are too deviant from the usual values of those features
  - These units rarely get selected because of their high selection costs
  - Removing such units from the database improves the quality of the synthesized output
- Redundant Units
  - Units having very similar feature sets
  - They do not contribute significantly to the diversity of units in the database
  - Removing such units from the database does no harm to synthesis quality and helps reduce database size
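The slide does not say how deviant or redundant units are detected. As an illustration only, the sketch below swaps in two simple heuristics: a z-score test for deviant feature values and a pairwise distance test for near-duplicate feature vectors. The function names and both thresholds are hypothetical.

```python
import math
from statistics import mean, pstdev


def deviant_instances(values, z_threshold=3.0):
    """Indices whose value lies more than z_threshold std devs from the mean.

    z_threshold is a hypothetical cut-off, not a value from the slides.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing is deviant
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]


def redundant_pairs(feature_vectors, tolerance=0.05):
    """Pairs of instances whose feature vectors lie within `tolerance` (Euclidean)."""
    pairs = []
    for i in range(len(feature_vectors)):
        for j in range(i + 1, len(feature_vectors)):
            if math.dist(feature_vectors[i], feature_vectors[j]) <= tolerance:
                pairs.append((i, j))
    return pairs
```

With the population standard deviation, the largest possible z-score among n values is sqrt(n - 1), so a threshold of 3 can only flag anything when a unit has more than 10 instances.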
6. Our Aims in This Work
- Multiple aims:
  - To come up with a Low Memory Device Synthesizer (for PDAs, mobiles)
  - To be able to create a database of any size with a corresponding quality
  - To arrive at an optimal database size without any loss of quality
7. Low Memory Device Synthesizer
- The requirement is to come up with a speech synthesizer that fits into a low-memory device such as a PDA
- The database of a normal Unit Selection based TTS would be prohibitively large to fit into such a small device
- So we trade off quality against size
- But keep all possible basic units
- So instead of multiple instances of each basic unit, we keep only one instance of each basic unit
8. Low Memory Device Synthesizer (contd.)
Question: How do we choose the single most suitable instance from the various instances of each unit?
9. Question: How do we choose the single most suitable instance from the various instances of each unit?
NEUTRAL UNITS
Hypothesis: The best instance is the one that is prosodically neutral and has minimal contextual effects. Neutral units will join best with neutral units.
Definition: The Neutral (Average) Unit is the unit instance whose features are closest to the average of the features:
- $P_{Ideal}$ = average pitch, $D_{Ideal}$ = average duration, $E_{Ideal}$ = average energy (over all instances of the unit)
- So $\langle P_{Neutral}, D_{Neutral}, E_{Neutral} \rangle$ is closest to $\langle P_{Ideal}, D_{Ideal}, E_{Ideal} \rangle$
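A neutral instance can be picked as sketched below. The slides only say "closest to the average"; the z-normalisation and the squared Euclidean distance are assumptions made so that pitch, duration and energy, which are measured in different units, can be compared. `neutral_instance` is a hypothetical name.

```python
from statistics import mean, pstdev


def neutral_instance(instances):
    """Index of the instance whose (pitch, duration, energy) is closest to
    <P_Ideal, D_Ideal, E_Ideal>, the per-feature averages over all instances.

    Assumption: each feature is z-normalised before a squared Euclidean
    distance is taken; the slides do not say which distance was used.
    """
    columns = list(zip(*instances))                   # pitch, duration, energy columns
    ideal = [mean(col) for col in columns]            # P_Ideal, D_Ideal, E_Ideal
    spread = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero

    def distance(inst):
        return sum(((v - m) / s) ** 2 for v, m, s in zip(inst, ideal, spread))

    return min(range(len(instances)), key=lambda i: distance(instances[i]))


# Hypothetical (pitch Hz, duration s, energy) values for three instances of one unit
print(neutral_instance([(100, 0.20, 50), (120, 0.25, 60), (140, 0.30, 70)]))  # 1
```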
10. Question: How do we choose the single most suitable instance from the various instances of each unit? (contd.)
OPTIMAL UNITS
Alternative Hypothesis: The best instance is the one that joins most suitably in all the contexts it is likely to appear in.
Definition: The Optimal Unit is the instance that joins most suitably with all the units that appear in the context of the instances of the unit under consideration.
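11. Optimal Units
Let $A_1, A_2, \ldots, A_i, \ldots, A_n$ be the instances of a basic unit A. Let $A_{i-1}$ and $A_{i+1}$ be the units preceding and succeeding the instance $A_i$ in the corpus.
Global Prosodic Mismatch Function (GPMF): [equation not recoverable from the extracted slide]
But
$$P_{Expected}(A_{i-1}) = P(A_i), \quad D_{Expected}(A_{i-1}) = D(A_i), \quad E_{Expected}(A_{i-1}) = E(A_i)$$
i.e., the prosody that $A_{i-1}$ expects of its successor is taken to be that of $A_i$, which actually follows it in the corpus.
By definition, the Optimal Instance of the unit is the one that minimizes the GPMF.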
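The GPMF formula did not survive extraction. The sketch below assumes one plausible form: a weighted sum, over all corpus occurrences A_i, of absolute differences between the prosody expected by the left neighbour (replaced by that of A_i, as on the slide) and the candidate's prosody. The weights, the absolute-difference form and the function names are hypothetical, and the right context A_{i+1} is left out.

```python
def gpmf(candidate, instances, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted GPMF of one candidate (pitch, duration, energy).

    For each occurrence A_i, the left neighbour A_{i-1} 'expects' the prosody
    of A_i (P_Expected(A_{i-1}) = P(A_i), etc.); the mismatch is the weighted
    absolute difference between that expectation and the candidate.
    """
    return sum(
        sum(w * abs(expected - value) for w, expected, value in zip(weights, occ, candidate))
        for occ in instances
    )


def optimal_instance(instances, weights=(1.0, 1.0, 1.0)):
    """Index of the instance that minimises the assumed GPMF."""
    return min(range(len(instances)), key=lambda k: gpmf(instances[k], instances, weights))
```

Under this assumed form the optimal instance is a weighted medoid of the instances, which need not coincide with the instance closest to the mean.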
12. Low Memory Device Synthesizer: Implementation
[Figure: block diagram. Text / Linguistic Processing turns Text into a Sequence of Basic Units and Linguistic Information; Signal Processing produces Speech. The Basic Unit Inventory used at synthesis time holds a single instance per unit (Unit A, B, C, ..., P, Q with Instance x, y, z, ..., w, m), chosen by Neutral / Optimal selection from the full inventory (Unit A: instances 1 to n; Unit B: instances 1 to m; features x(A1), y(A1), ..., x(Bm), y(Bm)).]
Unit selection has now moved from synthesis time to inventory-building time.
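Moving selection to inventory-building time means the choice is made once per unit, and synthesis becomes a lookup. A sketch, assuming an inventory stored as unit label → list of (pitch, duration, energy) tuples; `build_lmds_inventory`, `lookup_units`, the stand-in selector and the syllable labels are all hypothetical.

```python
def build_lmds_inventory(inventory, select):
    """Keep exactly one instance per basic unit (done once, at build time).

    inventory: dict mapping unit label -> list of feature tuples (all instances)
    select:    function(list of feature tuples) -> index of the instance to keep,
               e.g. a neutral-unit or optimal-unit selector
    """
    return {unit: instances[select(instances)] for unit, instances in inventory.items()}


def lookup_units(lmds_inventory, unit_sequence):
    """At synthesis time no search is needed: each unit maps to its one stored instance."""
    return [lmds_inventory[unit] for unit in unit_sequence]


def closest_to_mean_pitch(instances):
    """Stand-in selector for the demo: instance whose pitch is closest to the mean pitch."""
    avg = sum(p for p, *_ in instances) / len(instances)
    return min(range(len(instances)), key=lambda i: abs(instances[i][0] - avg))


inventory = {"ka": [(110, 0.18, 55), (130, 0.22, 60), (180, 0.30, 70)], "ma": [(120, 0.20, 58)]}
lmds = build_lmds_inventory(inventory, closest_to_mean_pitch)
print(lookup_units(lmds, ["ma", "ka"]))  # [(120, 0.2, 58), (130, 0.22, 60)]
```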
13. Low Memory Device Synthesizer: Results
Perceptual tests: 8 subjects scored 10 sentences on a scale of 0 (worst) to 5 (best).
Database F (Optimal Units) is the best-performing approach.
14. Low Memory Device Synthesizer: Results
- An LMDS system implemented on a handheld computing device
- Hindi database consisting of 2786 unique basic units (syllables, phonemes), collected using the Optimal Unit approach
- Actual database size @ 16 kHz, 256 kbps: 180 MB
- GSM coded @ 8 kHz: database size 1.27 MB
- G.722 coded @ 16 kHz: database size 5.02 MB
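The sizes on this slide can be checked with a little arithmetic. The sketch assumes 1 MB = 10^6 bytes; the reduction factors compare the full 180 MB database with the coded LMDS databases, so they combine the effect of keeping one instance per unit with that of speech coding.

```python
# Figures from the slide
full_mb = 180.0      # full database, 16 kHz, 256 kbps
gsm_mb = 1.27        # LMDS database, GSM coded at 8 kHz
g722_mb = 5.02       # LMDS database, G.722 coded at 16 kHz
sample_rate_hz = 16_000
bitrate_bps = 256_000

bits_per_sample = bitrate_bps / sample_rate_hz        # 16.0, i.e. 16-bit PCM
# Assumption: 1 MB = 10**6 bytes
duration_min = full_mb * 1e6 * 8 / bitrate_bps / 60   # 93.75 minutes of audio
gsm_reduction = full_mb / gsm_mb                      # about 141.7x smaller
g722_reduction = full_mb / g722_mb                    # about 35.9x smaller

print(bits_per_sample, duration_min, round(gsm_reduction, 1), round(g722_reduction, 1))
```

180 MB at 256 kbps is about 94 minutes of audio, close to the 96-minute Hindi corpus described on slide 18; the slides do not state that the two are the same recordings.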
15. How Do We Prune?
- For every unit in the database:
  - Score each instance of the unit for the (un)desirability of that particular instance, given all the other instances
  - Keep the top x% of the instances of the unit and remove all the others
- x is the Pruning Control Parameter
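The pruning loop can be sketched as follows, assuming each unit's instances are already ranked best-first and that x is a percentage (as P and S are on slide 18). Rounding up and always keeping at least one instance are assumptions, not details from the slides; `prune_unit` and `prune_database` are hypothetical names.

```python
import math


def prune_unit(ranked_instances, keep_percent):
    """Keep the top keep_percent % of one unit's instances (ranked best-first).

    Assumptions: the count is rounded up, and at least one instance is always
    kept so that no basic unit disappears from the database.
    """
    n_keep = max(1, math.ceil(len(ranked_instances) * keep_percent / 100))
    return ranked_instances[:n_keep]


def prune_database(ranked_inventory, keep_percent):
    """Apply the same pruning control parameter x to every unit in the database."""
    return {unit: prune_unit(instances, keep_percent)
            for unit, instances in ranked_inventory.items()}
```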
16. How Do We Prune? (contd.)
Question: How do we rank the units?
17. Ranking the Unit Occurrences
1. Measure of Undesirability of an Instance
2. Inter-Instance Repulsion
Using a weighted Global Prosodic Mismatch Function (GPMF):
- Undesirability: $U_x$
- Repulsion: $R_x$
$$\mathrm{SCORE}_x = U_x + W_{REPULSION} \times R_x$$
Ranking is in descending order of score.
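Ranking by the score on this slide can be sketched as below. The plus sign between U_x and the weighted repulsion term is a reading of the garbled slide text, and U_x and R_x are taken as precomputed inputs because their GPMF-based definitions are not recoverable; `rank_by_score` is a hypothetical name.

```python
def rank_by_score(undesirability, repulsion, w_repulsion):
    """Instance indices ordered by SCORE_x = U_x + W_REPULSION * R_x, highest first.

    undesirability, repulsion: per-instance U_x and R_x values (precomputed)
    w_repulsion: the inter-instance repulsion weight W_REPULSION
    """
    scores = [u + w_repulsion * r for u, r in zip(undesirability, repulsion)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Changing W_REPULSION changes the order: with U = [1.0, 0.5, 2.0] and R = [0.0, 2.0, 0.1], a weight of 0.5 ranks the instances [2, 1, 0], while a weight of 2.0 ranks them [1, 2, 0].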
18. Database of Any Desired Size
Having the instances of units in ranked order, we need pruning control parameters (x) to decide what kind of database we want.
Experiment: Hindi speech corpus (96 minutes), two kinds of basic units:
- Syllables: 2391 unique, 23096 total
- Phonemes: 49 unique, 54734 total
Pruning control parameters:
- P: percentage of phonemes to be kept
- S: percentage of syllables to be kept
- W_REPULSION: inter-instance repulsion weight in scoring
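The corpus figures translate into very different numbers of instances per unit type, which is one plausible reason for controlling phonemes (P) and syllables (S) with separate percentages. A quick check follows; `instances_per_unit` and `approx_kept` are hypothetical helpers, and `approx_kept` ignores per-unit rounding, so it is only an estimate.

```python
# Corpus figures from the slide (96-minute Hindi speech corpus)
CORPUS = {
    "syllables": {"unique": 2391, "total": 23096},
    "phonemes": {"unique": 49, "total": 54734},
}


def instances_per_unit(kind):
    """Average number of corpus instances per unique unit of this kind."""
    counts = CORPUS[kind]
    return counts["total"] / counts["unique"]


def approx_kept(kind, percent):
    """Rough number of instances kept when `percent` % of each unit is kept."""
    return round(CORPUS[kind]["total"] * percent / 100)


for kind in CORPUS:
    print(f"{kind}: {instances_per_unit(kind):.1f} instances per unit on average")
# syllables: 9.7 instances per unit on average
# phonemes: 1117.0 instances per unit on average
```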
19. Database of Any Desired Size
We created several pruned databases using different sets of pruning control parameters. A database-specific empirical formula has been derived to come up with a pruned database of any desired size.
20. Optimal Database Size
Goal: to come up with an optimal set of pruning parameters, so that the minimal database size can be achieved without degradation in the quality (naturalness, perceptibility) of speech.
Perceptual tests were conducted on several pruned databases created with different pruning control parameters:
- 10 databases with different pruning control parameter values
- 8 subjects rated 5 sentences each on a scale of 0 (worst) to 5 (best)
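Choosing the optimal database from such perceptual tests can be sketched as below: compute a mean opinion score per database and keep the smallest database whose score is within a tolerance of a reference. The tolerance, the reference score and the function names are hypothetical; the slides do not say how "without degradation" was judged.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average of all 0-5 ratings (every subject x sentence) for one database."""
    return mean(r for subject in ratings for r in subject)


def smallest_non_degraded(databases, reference_mos, tolerance=0.1):
    """Name of the smallest database whose MOS is within `tolerance` of the reference.

    databases: list of (name, size_mb, ratings), where ratings[s][t] is
               subject s's score for sentence t
    Returns None if no database qualifies.
    """
    acceptable = [(size, name) for name, size, ratings in databases
                  if mean_opinion_score(ratings) >= reference_mos - tolerance]
    return min(acceptable)[1] if acceptable else None
```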
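21. Optimal Database Size (contd.)
[Figure not recoverable from the extracted text.]
22. Optimal Database Size (contd.)
[Figure: two panels, W_REPULSION = 2.0 and W_REPULSION = 0.5]
23. Optimal Database Size (contd.)
[Figure: two panels, W_REPULSION = 2.0 and W_REPULSION = 0.5]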
24. Conclusions
- Various approaches proposed for selecting the most suitable instance of a unit type in a unit selection database
- GPMF-based Optimal Unit found to be the most suitable
- Technique for ranking unit instances using GPMF described
  - Used for pruning the unit selection database
- A Low Memory Device Synthesizer implemented
- Database-specific empirical formula derived to come up with a database of any desired size (based on a set of suitable pruning control parameters)
- Optimal-sized database created (pruned) without loss of naturalness