1
EXPERIMENTS WITH UNIT SELECTION SPEECH DATABASES
FOR INDIAN LANGUAGES
  • S P Kishore, Alan W Black, Rohit Kumar,
    Rajeev Sangal
  • Language Technologies Research Center
  • International Institute of Information
    Technology, Hyderabad
  • Language Technologies Institute, Carnegie
    Mellon University
  • Institute of Software Research International,
    Carnegie Mellon University

2
ORGANIZATION OF THE TALK
  • Role of Language Technologies
  • Text to Speech Systems
  • Text Processing Front End
  • Speech Generation Component
  • Unit Selection Approach
  • Experiments
  • Choice of Unit Size
  • Generation of Databases: Content and Size of
    Database
  • Evaluation of Hindi Speech Synthesis System
  • Applications
  • Conclusion

3
ROLE OF LANGUAGE TECHNOLOGIES
  • Natural Interfaces for Information Access
  • Crucial Role in Multilingual Societies
  • Integration of Speech Recognition, Machine
    Translation and Speech Synthesis
  • Enables Interaction between Two People Speaking
    Different Languages

4
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS
  • A Text to Speech System converts arbitrary input
    text into the corresponding speech waveform.
  • Why Text to Speech Synthesis?

[Figure: Basic Blocks of a Text to Speech System (Text → Basic Units Sequence and Prosody Information → Speech)]

5
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: TEXT PROCESSING FRONT END
  • Nature of Indian Scripts
  • Basic units of Indian writing system are Aksharas
  • An Akshara is typically of the form V, CV, CCV
  • Common Phonetic Base
  • About 35 Consonants and 18 Vowels
  • Phonetic nature of languages: what is written
    is what is spoken
  • Exception: Schwa Deletion (Inherent Vowel
    Suppression)
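
Since an akshara is typically of the form V, CV or CCV, a phone sequence can be grouped into akshara-like units by collecting consonants until a vowel closes the unit. The following is a minimal sketch under stated assumptions: the romanized phone labels, the VOWELS set and the name split_aksharas are hypothetical, not the authors' phone set or code.

```python
# Hypothetical romanized vowel inventory (an assumption for illustration).
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}

def split_aksharas(phones):
    """Group phones into akshara-like units of the form C*V (V, CV, CCV, ...).

    Consonants accumulate until a vowel closes the unit. Trailing consonants
    (e.g. left over after schwa deletion) are attached to the last unit.
    """
    units, pending = [], []
    for p in phones:
        pending.append(p)
        if p in VOWELS:
            units.append(pending)
            pending = []
    if pending:
        if units:
            units[-1].extend(pending)
        else:
            units.append(pending)
    return ["".join(u) for u in units]

print(split_aksharas(["k", "a", "m", "a", "l", "a"]))   # ['ka', 'ma', 'la']
print(split_aksharas(["p", "r", "a", "k", "aa", "sh"]))  # ['pra', 'kaash']
```

Attaching a trailing consonant to the previous unit is one simple design choice; a real system would follow the conventions of its own syllabifier.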

6
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: TEXT PROCESSING FRONT END
  • Format of Input Text
  • ISCII, Unicode, Various Fonts
  • Can be handled by use of appropriate conversion
    module(s)
  • Mapping Non-Standard Words (NSWs) to Standard Words
  • NSWs: symbols, digits, initials, abbreviations,
    punctuation, non-native words, etc.
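
The first step in mapping NSWs to standard words is usually to detect which class each token belongs to, so that a class-specific expansion rule can be applied. A minimal sketch, assuming regular-expression classifiers; the class names, patterns and the function classify are hypothetical, and the expansion into Hindi words is not shown.

```python
import re

# Hypothetical NSW classes and patterns (illustrative only); a real front end
# would have many more classes plus rules to expand each class into words.
NSW_PATTERNS = [
    ("DIGITS", re.compile(r"^\d+$")),
    ("DATE", re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$")),
    ("CURRENCY", re.compile(r"^Rs\.?\d+(\.\d+)?$")),
    ("ABBREVIATION", re.compile(r"^([A-Z]\.){2,}$")),
    ("PUNCTUATION", re.compile(r"^[^\w\s]+$")),
]

def classify(token):
    """Return the first matching NSW class, or STANDARD for ordinary words."""
    for label, pattern in NSW_PATTERNS:
        if pattern.match(token):
            return label
    return "STANDARD"

print([classify(t) for t in "Rs.500 on 15/08/2003 !".split()])
# ['CURRENCY', 'STANDARD', 'DATE', 'PUNCTUATION']
```

Pattern order matters: the first match wins, so more specific classes should be listed before more general ones.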

7
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: TEXT PROCESSING FRONT END
  • Standard Words to Phoneme Sequence
  • For English, this involves lexicon lookup and
    letter-to-sound rules
  • Due to the phonetic nature of Indian scripts,
    simple letter-to-sound rules can be used
  • Problems in some languages
  • Inherent Vowel Suppression (schwa deletion)
  • e.g., ratana (rtana) is spoken as ratan
  • Presently, a set of heuristic rules is used
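
The talk does not list its heuristic rules. As an illustration only, the sketch below implements one widely used heuristic for Hindi: delete the inherent schwa of the word-final akshara, which turns ratana into ratan. The input format (a list of consonant/vowel pairs) and the name delete_final_schwa are assumptions.

```python
def delete_final_schwa(aksharas):
    """Apply one schwa-deletion heuristic (an assumption, not the talk's rule set):
    drop the inherent 'a' of the last akshara in a multi-akshara word.

    `aksharas` is a list of (consonants, vowel) pairs from a hypothetical
    romanized letter-to-sound step. A vowel 'a' following a consonant is the
    inherent schwa; an independent vowel has no consonant and is kept.
    """
    out = [c + v for c, v in aksharas]
    if len(aksharas) > 1:
        cons, vowel = aksharas[-1]
        if vowel == "a" and cons:
            out[-1] = cons
    return "".join(out)

print(delete_final_schwa([("r", "a"), ("t", "a"), ("n", "a")]))  # ratan
print(delete_final_schwa([("n", "a")]))  # na (single-akshara word kept)
```

Word-medial deletions need context-sensitive rules on top of this, which is where heuristic systems tend to fail, as the evaluation later in the talk observes.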

8
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: SPEECH GENERATION COMPONENT
  • ARTICULATORY MODEL BASED SYNTHESIS
  • Involves simplistic modeling of human speech
    production mechanism
  • Difficult to accurately model the motion of
    articulators
  • PARAMETER BASED SYNTHESIS
  • Speech segments are parameterized in terms of
    formant frequencies or linear prediction
    coefficients
  • Difficult to devise the large number of rules
    needed to accurately capture coarticulation and
    prosody
  • CONCATENATION BASED SYNTHESIS
  • Uses an inventory of recorded speech segments
    (units)
  • Prosodic Variations
  • Intonation and duration could be acquired and
    incorporated in the form of rules
  • Store multiple realizations of units with
    differing prosody

9
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: SPEECH GENERATION COMPONENT
  • Unit Selection (Data Driven) Approach
  • Multiple realizations of basic units with varying
    prosodic features are stored in the speech
    database
  • Storing and retrieving a large number of recorded
    units in real time is feasible due to the
    availability of cheap memory and computation power

10
UNIT SELECTION APPROACH
  • Building up of Speech Databases
  • Collection of optimal text corpora
  • Recording the text corpora
  • Automatic labeling followed by manual correction
    of labels
  • Extraction of unit features
  • Clustering units to facilitate selection

11
UNIT SELECTION APPROACH
  • ISSUES INVOLVED
  • Choice of Unit Size
  • Sub-word units: half phone, phone, diphone,
    syllable
  • The larger the unit size, the fewer the joins
    and hence the fewer the discontinuities
  • Wide coverage of units in various contexts is
    also desirable
  • Generation of Speech Databases
  • Approach for Optimal Selection of Utterances
  • Criteria for Unit Selection
  • The most suitable units are selected from the
    database by minimizing the target and
    concatenation costs
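
Minimizing the sum of target and concatenation costs over a whole utterance is typically done with a Viterbi (dynamic programming) search over the candidate units for each target position. The sketch below is an illustration of that technique under stated assumptions: toy features (mean pitch and duration), arbitrary weights, and a zero join cost for units that were contiguous in the recordings. The Unit fields and function names are hypothetical, not the cost functions used in these experiments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    name: str     # unit label, e.g. a syllable
    utt: int      # id of the source utterance in the database
    pos: int      # index of the unit within that utterance
    pitch: float  # hypothetical prosodic features: mean F0 (Hz)
    dur: float    # and duration (ms)

def target_cost(unit, target):
    # Distance between the candidate's prosody and the predicted target prosody.
    # The divisors act as hypothetical feature weights.
    return abs(unit.pitch - target["pitch"]) / 10 + abs(unit.dur - target["dur"]) / 10

def concat_cost(prev, cur):
    # Units that were adjacent in the original recording join seamlessly.
    if prev.utt == cur.utt and prev.pos + 1 == cur.pos:
        return 0.0
    # Otherwise penalise the pitch mismatch at the join.
    return abs(prev.pitch - cur.pitch) / 5

def select_units(targets, candidates):
    """Viterbi search for the candidate sequence with the lowest
    total target cost plus concatenation cost."""
    # best[i][j] = (lowest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(u, targets[0]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + concat_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, targets[i]), back))
        best.append(row)
    # Trace back from the cheapest candidate for the last target.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]

# Toy example: two target syllables, two database candidates for each.
targets = [{"pitch": 120, "dur": 150}, {"pitch": 125, "dur": 140}]
candidates = [
    [Unit("ka", 1, 0, 121, 150), Unit("ka", 2, 3, 140, 150)],
    [Unit("ma", 1, 1, 130, 145), Unit("ma", 3, 0, 126, 140)],
]
path = select_units(targets, candidates)
print([(u.name, u.utt, u.pos) for u in path])  # [('ka', 1, 0), ('ma', 1, 1)]
```

In the toy example the second "ma" matches its target better, but the contiguous pair from utterance 1 still wins (total cost 1.1 versus 1.2) because its join is free. This is also why larger units help: fewer joins means fewer concatenation penalties.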

12
EXPERIMENTS: CHOICE OF UNIT SIZE
  • Hindi synthesizers built using different unit
    sizes
  • Syllable, diphone, phone, half phone
  • 24 sentences from a Hindi news bulletin
    synthesized
  • Perceptual test conducted with native
    Hindi-speaking subjects
  • AB Test
  • Results
  • Syllables performed better than diphones, phones
    and half phones
  • Half phones performed better than diphones and
    phones
  • Ref. S. P. Kishore, Alan W. Black, Unit Size in
    Unit Selection Speech Synthesis, Eurospeech
    2003, Geneva
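
In an AB test, listeners hear the same sentence synthesized by two systems and state which one they prefer. The helper below (hypothetical name ab_preferences) tallies such judgements as percentages. The responses are invented for illustration, and whether the original test allowed a "no preference" answer is not stated in the talk.

```python
from collections import Counter

def ab_preferences(judgements):
    """Summarise an AB listening test.

    `judgements` holds one entry per (listener, sentence) pair: 'A' if system A
    was preferred, 'B' if system B was preferred, '=' for no preference.
    Returns the percentage of judgements in each category.
    """
    counts = Counter(judgements)
    total = len(judgements)
    return {k: 100.0 * counts[k] / total for k in ("A", "B", "=")}

# Hypothetical responses: A = syllable voice, B = diphone voice.
print(ab_preferences(["A", "A", "B", "A", "="]))
# {'A': 60.0, 'B': 20.0, '=': 20.0}
```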

13
EXPERIMENTS: CHOICE OF UNIT
  • Example Utterances
  • Half Phones
  • Phones
  • Diphones
  • Syllables

14
GENERATION OF SPEECH DATABASES
  • Selection of utterances with wide phonetic and
    prosodic coverage
  • High Frequency Syllables
  • Syllables with a relatively high frequency of
    occurrence in a corpus
  • A sentence is selected if it has at least one
    high frequency syllable not present in the
    previously selected sentences
  • Selected utterances are recorded and labeled
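
The selection criterion above amounts to a single greedy pass over the corpus. A minimal sketch, assuming the sentences are already syllabified and that "high frequency" means the top_n most frequent syllables (the talk does not give the actual threshold); the function names are hypothetical.

```python
from collections import Counter

def high_frequency_syllables(corpus, top_n):
    """Return the `top_n` most frequent syllables in a syllabified corpus."""
    counts = Counter(s for sentence in corpus for s in sentence)
    return {s for s, _ in counts.most_common(top_n)}

def select_sentences(corpus, top_n):
    """Keep a sentence only if it contains at least one high-frequency
    syllable not already covered by previously selected sentences."""
    targets = high_frequency_syllables(corpus, top_n)
    covered, selected = set(), []
    for idx, sentence in enumerate(corpus):
        new = (set(sentence) & targets) - covered
        if new:
            selected.append(idx)
            covered |= new
    return selected, covered

# Toy corpus: each sentence is a list of (romanized) syllables.
corpus = [["ma", "ka"], ["ma", "ka", "ka"], ["ma", "na"],
          ["ma", "na", "na", "ka"], ["ta", "ma"]]
selected, covered = select_sentences(corpus, top_n=3)
print(selected)  # [0, 2]
```

The result depends on corpus order; a common variant instead picks, at each step, the sentence that adds the most uncovered syllables.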

15
GENERATION OF SPEECH DATABASES
SYLLABLE COVERAGE AND DURATION OF SPEECH DATABASES
To Study Dependency of Quality on Coverage >>

16
EXPERIMENTS: GENERATION OF SPEECH DATABASES
  • Six databases with varying syllable coverage were built

17
EXPERIMENTS: GENERATION OF SPEECH DATABASES
PERCEPTUAL TESTS
Five subjects were asked to listen to five sentences and
score them on a scale of 0 (worst) to 5 (best).
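
Scores on a 0 to 5 scale are usually summarized as a mean opinion score (MOS), and the spread of the scores indicates how consistent the output is. A minimal sketch, assuming population variance as the spread measure; the function name mos_summary and the ratings are invented for illustration and are not the study's data.

```python
from statistics import mean, pvariance

def mos_summary(ratings):
    """Mean opinion score (MOS) and population variance per database.

    `ratings` maps a database name to every 0-5 score it received
    (e.g. 5 subjects x 5 sentences = 25 scores per database)."""
    return {db: (mean(r), pvariance(r)) for db, r in ratings.items()}

# Invented scores for two hypothetical databases (not the study's data).
summary = mos_summary({"low_coverage": [2, 3, 1, 4, 2],
                       "high_coverage": [4, 4, 5, 4, 4]})
for db, (mos, var) in summary.items():
    print(f"{db}: MOS={mos:.2f}, variance={var:.2f}")
```
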
18
EXPERIMENTS: GENERATION OF SPEECH DATABASES

19
EVALUATION OF HINDI SPEECH SYNTHESIS SYSTEM
  • Text Processing Front End developed
  • Supports Hindi text in Unicode
  • Handles non-standard words such as
  • Dates, currency, digits, addresses, abbreviations,
    etc.
  • Schwa deletion using heuristic rules
  • 200 sentences synthesized
  • Nine native Hindi-speaking subjects evaluated the
    perceptual quality of the synthesizer
  • Each subject evaluated nearly 40 of the 200
    sentences
  • Scoring on a scale of 0 (worst) to 5 (best)
  • Words not sounding natural were marked

20
EVALUATION OF HINDI SPEECH SYNTHESIS SYSTEM

21
EVALUATION OF HINDI SPEECH SYNTHESIS SYSTEM
  • OBSERVATIONS
  • 30 of the words marked as not sounding natural
    were loan words from English
  • Proper nouns were not pronounced correctly
  • Schwa deletion rules failed to delete the schwa
    in some places
  • Some punctuation characters were not handled
    properly
  • LESSONS
  • Additional Phonetic Coverage for proper nouns and
    loan words required
  • Good text processing component needed for high
    quality speech synthesis

22
APPLICATIONS
  • Talking Tourists Aid
  • Limited Domain Synthesis
  • Allows a person to communicate queries about the
    city, travel, accommodation, etc.
  • News Reader
  • Reading news from a Hindi News Portal
  • Screen Reader for Visually Impaired

23
CONCLUSION
  • Syllables are better units for Indian Language
    Speech Synthesis
  • Syllable > Half Phone > Diphone > Phone
  • High coverage of units produces high-quality
    speech; the lower variance of the scores also
    indicates more consistent results
  • Effects of loan words should be considered in
    the design of the speech corpus
  • Good text processing front end needed for high
    quality synthesis

24
  • QUESTIONS