Title: Automatic Pruning of Unit Selection Speech Databases for Synthesis without loss of Naturalness


1
Automatic Pruning of Unit Selection
Speech Databases for Synthesis without Loss of
Naturalness
  • Rohit Kumar, S. P. Kishore
  • International Institute of Information
    Technology
  • Hyderabad, INDIA
  • Carnegie Mellon University, Pittsburgh

ICSLP INTERSPEECH 2004 Jeju Island, Korea
2
Organization of the Talk
  • Introduction to Unit Selection based T.T.S.
  • Need / Scope for Pruning of Speech Databases
  • Our Aims in this Work
  • Low Memory Device Synthesizer
  • Definitions of Neutral and Optimal Units
  • Implementation and Results
  • How do we prune?
  • Ranking of Units
  • Heuristic for Creating a Database of Any Size
  • Optimal-Sized Database: Perceptual Evaluation
  • Conclusions

3
Unit Selection based T.T.S.
[Figure: block diagram of a unit selection TTS system. Blocks: Text, Text / Linguistic Processing, Linguistic Information, Sequence of Basic Units, Unit Selection Algorithms, Selected Units, Signal Processing, Speech. Inventory side: Speech Corpus (Signal, Transcriptions, Labels, Features), Inventory Building Modules, Basic Unit Inventory.]
Basic Unit Inventory:
Unit     | A .. A                               | B .. B
Instance | 1 .. n                               | 1 .. m
Features | x(A1), y(A1) .. ; .... ; x(An), y(An) .. | x(B1), y(B1) .. ; .... ; x(Bm), y(Bm) ..
4
Unit Pruning
  • Typical size of High quality Unit Selection
    Databases is large (100 MB to 500 MB)
  • Using Unit Pruning Techniques, the database size
    can be significantly reduced without loss in
    quality
  • Unit Pruning refers to the removal of unit instances
    from the Unit Selection Database that do not add
    to (or may even harm) the quality of
    synthesized speech.

5
Need / Scope for Pruning
1. To Improve Quality 2. To Reduce Size
  • << Deviant Units >>
  • Units having features too deviant from usual
    values of such features
  • These units rarely get selected due to the high
    costs of selection
  • Removal of such units from database improves the
    quality of synthesized output
  • << Redundant Units >>
  • Units having very similar feature sets
  • Do not contribute significantly to the diversity
    of units in the database
  • Removal of such units from database does no harm
    to synthesis quality and helps in reducing
    database size
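The two categories above can be illustrated with a minimal sketch. The slides do not say how deviance or similarity is measured, so everything here is an assumption: the z-score threshold, the relative tolerance, the function names, and the (pitch, duration, energy) feature tuples are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

def find_deviant(instances, z_threshold=2.0):
    """Indices of instances with any feature more than z_threshold
    population standard deviations away from that feature's mean (assumed rule)."""
    n_features = len(instances[0])
    means = [mean(inst[k] for inst in instances) for k in range(n_features)]
    stds = [pstdev([inst[k] for inst in instances]) for k in range(n_features)]
    deviant = []
    for i, inst in enumerate(instances):
        for k in range(n_features):
            if stds[k] > 0 and abs(inst[k] - means[k]) / stds[k] > z_threshold:
                deviant.append(i)
                break  # one deviant feature is enough to flag the instance
    return deviant

def find_redundant(instances, tolerance=0.05):
    """Indices of instances whose every feature lies within a relative
    tolerance of an earlier, already kept instance (assumed rule)."""
    kept, redundant = [], []
    for i, inst in enumerate(instances):
        is_duplicate = any(
            all(abs(a - b) <= tolerance * max(abs(b), 1e-9)
                for a, b in zip(inst, instances[j]))
            for j in kept
        )
        (redundant if is_duplicate else kept).append(i)
    return redundant

# Hypothetical (pitch Hz, duration ms, energy) values for six instances of one unit.
example = [(120, 80, 0.60), (122, 81, 0.61), (118, 90, 0.55),
           (125, 70, 0.65), (121, 85, 0.58), (240, 200, 0.10)]
deviant_idx = find_deviant(example)
redundant_idx = find_redundant(example)
```

In this toy data the last instance is far from the rest on every feature, and the second instance nearly duplicates the first.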

6
Our Aims in this Work
  • Multiple Aims
  • To come up with a Low Memory Device Synthesizer
    (for PDAs, Mobiles)
  • To be able to Create a Database of Any Size with
    a Corresponding Quality
  • To arrive at an Optimal Size of the Database
    without any loss of Quality

7
Low Memory Device Synthesizer
  • Requirement is to come up with a Speech
    Synthesizer that would fit into a Low memory
    device like a PDA
  • The database size of a normal Unit Selection
    based TTS would be prohibitively large to fit the
    system into a Small Device
  • So we trade off Quality against Size
  • But keep all possible basic units
  • So instead of multiple instances of each basic
    unit, we keep only one instance of each basic unit

9
QUESTION: How to choose the one most suitable
instance from the various instances of
each unit?
NEUTRAL UNITS
Hypothesis: The best instance is the one that is
prosodically neutral and will have minimal
contextual effects. Neutral units will join best
with neutral units.
Definition: The Neutral (Average) Unit is the unit
instance whose features are closest to the
average of the features.

Average Pitch, Average Duration, Average
Energy
So $\langle P_{Neutral}, D_{Neutral}, E_{Neutral} \rangle$ is closest to
$\langle P_{Ideal}, D_{Ideal}, E_{Ideal} \rangle$
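A minimal sketch of the neutral-unit definition: pick the instance whose (pitch, duration, energy) vector is closest to the average vector. The slides do not state the distance measure, so the per-feature standardization and squared Euclidean distance used here are assumptions, as are the function name and the sample values.

```python
from statistics import mean, pstdev

def neutral_instance(instances):
    """Index of the instance closest to the average (pitch, duration, energy).

    Each feature is divided by its standard deviation so that no single
    feature dominates the distance (an assumption, not taken from the slides).
    """
    n_features = len(instances[0])
    means = [mean(inst[k] for inst in instances) for k in range(n_features)]
    # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
    stds = [pstdev([inst[k] for inst in instances]) or 1.0
            for k in range(n_features)]

    def distance(inst):
        return sum(((inst[k] - means[k]) / stds[k]) ** 2
                   for k in range(n_features))

    return min(range(len(instances)), key=lambda i: distance(instances[i]))

# Hypothetical (pitch Hz, duration ms, energy) values for four instances of one unit.
example = [(110, 70, 0.50), (120, 80, 0.60), (130, 90, 0.70), (121, 82, 0.61)]
neutral_idx = neutral_instance(example)
```

With these values the second instance sits almost exactly on the averages, so it is the one selected.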
10
QUESTION: How to choose the one most suitable
instance from the various instances of
each unit? (Contd.)
OPTIMAL UNITS
Alternative Hypothesis: The best instance is the one
that joins most suitably in all contexts that it
is likely to appear in.
Definition: The Optimal Unit is the unit
instance that joins most suitably with all the
units that appear in the context of the instances
of the unit under consideration.

11
Optimal Units
Let $A_1, A_2, \ldots, A_i, \ldots, A_n$ be the instances of a
basic unit $A$.
Let $A_{i-1}$ and $A_{i+1}$ be the units preceding
and succeeding the instance $A_i$ in the corpus.
Global Prosodic Mismatch Function (GPMF)
But $P^{Expected}_{A_{i-1}} = P_{A_i}$, $D^{Expected}_{A_{i-1}} = D_{A_i}$,
$E^{Expected}_{A_{i-1}} = E_{A_i}$
By definition, the Optimal Instance of the unit is
the one that minimizes GPMF.
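The GPMF formula itself did not survive in this transcript, so the following is only a sketch under a stated assumption: the mismatch of a candidate instance is taken to be the weighted sum of absolute pitch, duration, and energy differences between the candidate and the prosody expected in every context where the unit occurs. Following the slide, the prosody expected after a preceding unit is read as that of the instance that actually followed it, so the instances themselves serve as the expected values. The function names, weights, and sample values are hypothetical.

```python
def assumed_gpmf(candidate, expected_contexts, weights=(1.0, 1.0, 1.0)):
    """Assumed mismatch: weighted sum of absolute feature differences between
    the candidate and the prosody expected in each context. This is NOT the
    formula from the talk, which is missing from the transcript."""
    return sum(
        w * abs(expected - actual)
        for context in expected_contexts
        for w, expected, actual in zip(weights, context, candidate)
    )

def optimal_instance(instances, weights=(1.0, 1.0, 1.0)):
    """Index of the instance minimizing the assumed mismatch, plus all scores."""
    # The prosody expected in the context of instance j is that of instance j
    # itself, so the instance list doubles as the list of expected contexts.
    scores = [assumed_gpmf(inst, instances, weights) for inst in instances]
    best = min(range(len(instances)), key=scores.__getitem__)
    return best, scores

# Hypothetical (pitch Hz, duration ms, energy) values for five instances of one unit.
example = [(110, 70, 0.50), (120, 80, 0.60), (135, 95, 0.70),
           (122, 83, 0.62), (128, 88, 0.66)]
optimal_idx, mismatch_scores = optimal_instance(example)
```

Under this assumed form the winner is the instance nearest the per-feature median of all instances, which differs from the neutral unit's closeness to the mean.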
12
Low Memory Device Synthesizer
Implementation
[Figure: block diagram of the Low Memory Device Synthesizer. Blocks: Text, Text / Linguistic Processing, Linguistic Information, Sequence of Basic Units, Basic Unit Inventory, Signal Processing, Speech. The full inventory (units A .. A, B .. B with instances 1 .. n, 1 .. m and features x(A1), y(A1), ..., x(Bm), y(Bm)) is reduced by Neutral / Optimal selection to one instance per unit (units A, B, C, ..., P, Q with instances x, y, z, ..., w, m).]
Unit Selection has now moved from Synthesis Time
to Inventory Building Time
13
Low Memory Device Synthesizer
Results
Perceptual tests: 8 subjects scored 10
sentences on a scale of 0 (worst) to 5 (best).
Database F (Optimal Units) is the best performing
approach.
14
Low Memory Device Synthesizer
Results
An LMDS system was implemented on a handheld computing
device.
Hindi database consisting of 2786 unique
basic units (syllables, phonemes) collected
using the Optimal Unit approach.
Actual database size @ 16 kHz, 256 kbps: 180 MB
GSM coded @ 8 kHz → database size 1.27 MB
G.722 coded @ 16 kHz → database size 5.02 MB
15
How do we Prune ??
  • For every unit in the database,
  • Score each instance of the unit for the
    (un)desirability of that particular instance given
    all the other instances
  • Pick the top x% of the instances of the unit and
    remove all the others
  • (x) is the Pruning Control Parameter
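The pruning steps listed above can be sketched as follows. Scores are taken as given inputs, and several details are assumptions rather than anything stated in the slides: a higher score is treated as better, the number of instances kept is rounded up, and at least one instance of every unit always survives. The function and variable names are hypothetical.

```python
import math

def prune(database, scores, keep_percent):
    """Keep the top keep_percent of instances for every unit.

    database: dict mapping unit name -> list of instance ids
    scores:   dict mapping instance id -> score (higher is assumed better)
    """
    pruned = {}
    for unit, instances in database.items():
        ranked = sorted(instances, key=lambda inst: scores[inst], reverse=True)
        # Round up and never drop a unit entirely (both are assumptions).
        n_keep = max(1, math.ceil(len(ranked) * keep_percent / 100))
        pruned[unit] = ranked[:n_keep]
    return pruned

# Hypothetical toy database with two units.
database = {"ka": ["ka1", "ka2", "ka3", "ka4"], "a": ["a1", "a2"]}
scores = {"ka1": 0.2, "ka2": 0.9, "ka3": 0.5, "ka4": 0.7, "a1": 0.3, "a2": 0.8}
pruned = prune(database, scores, keep_percent=50)
```

Setting `keep_percent` very low degenerates to one instance per unit, which matches the single-instance inventory of the Low Memory Device Synthesizer.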

Question: How do we rank the units?
17
Ranking the Unit Occurrences
1. Measure of Undesirability of Instance
2. Inter-Instance Repulsion
Using the Weighted Global Prosodic Mismatch Function
(GPMF):
Undesirability: $U_x$
Repulsion: $R_x$
$SCORE_x$ combines $U_x$ with $(W_{REPULSION} \times R_x)$.
Ranking is in descending order of Score.
18
Database of Any Desired Size
Having the instances of units in ranked order, we
need a Pruning Control Parameter (x) to decide
what kind of database we want.
Experiment: Hindi Speech Corpus (96 minutes)
2 kinds of basic units:
  • Syllables: Unique 2391, Total 23096
  • Phonemes: Unique 49, Total 54734
Pruning Control Parameters:
  • P: Percentage of Phonemes to be Kept
  • S: Percentage of Syllables to be Kept
  • W_REPULSION: Inter-Instance Repulsion Weight in Scoring
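As a rough illustration of what P and S control, the sketch below estimates how many phoneme and syllable instances remain for a given parameter pair, using the corpus totals quoted above. It is only an approximation: it applies the percentage to the corpus-wide totals and ignores per-unit rounding and differences in instance length, so it does not reproduce the database-specific empirical formula from the talk. The function name is hypothetical.

```python
PHONEME_TOTAL = 54734   # total phoneme instances in the 96-minute Hindi corpus
SYLLABLE_TOTAL = 23096  # total syllable instances in the same corpus

def instances_kept(p_percent, s_percent):
    """Approximate (phoneme, syllable) instance counts left after pruning."""
    phonemes = round(PHONEME_TOTAL * p_percent / 100)
    syllables = round(SYLLABLE_TOTAL * s_percent / 100)
    return phonemes, syllables

half_kept = instances_kept(50, 50)
```

For P = S = 50 this gives 27367 phoneme instances and 11548 syllable instances.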
19
Database of Any Desired Size
We created several pruned databases using
different sets of pruning control parameters. A
database-specific empirical formula has been derived to
come up with a pruned database of any desired size.
20
Optimal Database Size
Goal: to come up with an optimal set of pruning
parameters, so that the minimal database size can
be achieved without degradation in quality
(naturalness, perceptibility) of speech.
Perceptual tests were conducted on several
pruned databases created with different Pruning
Control Parameters:
10 databases with different
Pruning Control Parameter values
8 subjects rated 5 sentences each on a scale of 0 (worst)
to 5 (best)
21
Optimal Database Size (Contd.)
22
Optimal Database Size (Contd.)
[Figure panels: W_REPULSION = 2.0, W_REPULSION = 0.5]
24
Conclusions
  • Various approaches for selecting the most
    suitable instance of a unit type in a unit
    selection database proposed.
  • GPMF based Optimal Unit found to be most
    suitable.
  • Technique for Ranking unit instances using GPMF
    described
  • Used for Pruning Unit selection database
  • A Low Memory Device Synthesizer implemented
  • Database Specific Empirical Formula derived to
    come up with a database of any desired size
    (based on set of suitable pruning control
    parameters)
  • Optimal Sized Database created (pruned) without
    loss of any naturalness