Building a Catalan diphone voice

Learn more at: http://www.cs.cmu.edu

1
Building a Catalan diphone voice
  • Ariadna Font Llitjos
  • May 10, 2001

2
Defining the phoneset
  • Most Catalan phones (34) plus 2 Spanish phones
    (th and jj)
  • Reason: All Catalan speakers also have Spanish
    phones, and many borrowed Spanish words are in
    most Catalan speakers' lexicon
  • Left out phones that need a much finer
    classification than the ones made for English
    phones (beta, gamma, etc.)

3
Generating Diphone Schema
  • Mostly same as Spanish, but with the new set of
    phones.
  • Catalan has 8 vowels (without considering stress),
    whereas Spanish has only 5 -> had to add a level
    of vheight (high, mid-high, mid-low, low)
  • (draw graph on the board)
  • Mapping Catalan phones to a predefined set of
    phones
  • Overgenerative: the schema also covers phone
    combinations that are not legal in the language,
    so the voice is better suited to pronouncing
    foreign or nonsense words built from its phones
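
The overgeneration point can be illustrated with a minimal sketch, assuming the schema simply pairs every phone with every other phone and ignores phonotactics. The phone list below is a toy inventory, not the phoneset defined in these slides, and the real festvox schema wraps each pair in a nonsense carrier word.

```python
from itertools import product

# Toy inventory for illustration only; NOT the actual Catalan phoneset.
PHONES = ["pau", "a", "i", "k", "l"]

def all_diphones(phones):
    """Return every ordered phone pair as a 'left-right' diphone name."""
    return [f"{left}-{right}" for left, right in product(phones, repeat=2)]

diphones = all_diphones(PHONES)
print(len(diphones))   # 5 phones give 5 * 5 = 25 ordered pairs
print(diphones[:3])    # ['pau-pau', 'pau-a', 'pau-i']
```

Because no pair is filtered out, the list includes combinations that never occur in native words, which is what lets the voice attempt foreign or nonsense words.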

4
Mapping Catalan phones to a predefined set of
phones
  • Options: Spanish and English
  • My choice: English
  • Reasons:
  • English has more phones for vowels, so it is more
    appropriate than Spanish
  • Spanish phones have already been mapped to
    English phones, so it is better to map the phones
    directly to English rather than indirectly
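
A small sketch of why the direct mapping may preserve more vowel distinctions than going through Spanish. Every phone name and every mapping entry here is hypothetical and chosen only to show the effect; none of them is taken from the actual voice.

```python
# Hypothetical phone names and mappings, for illustration only.
CATALAN_TO_SPANISH = {"e": "e", "E": "e", "o": "o", "O": "o"}
SPANISH_TO_ENGLISH = {"e": "eh", "o": "ow"}
CATALAN_TO_ENGLISH = {"e": "ey", "E": "eh", "o": "ow", "O": "ao"}

def compose(first, second):
    """Chain two mappings: source -> intermediate -> target."""
    return {src: second[mid] for src, mid in first.items()}

def distinct_targets(mapping):
    """Count how many different target phones a mapping uses."""
    return len(set(mapping.values()))

indirect = compose(CATALAN_TO_SPANISH, SPANISH_TO_ENGLISH)
print(distinct_targets(indirect))            # 2: open and close vowels merge
print(distinct_targets(CATALAN_TO_ENGLISH))  # 4: each vowel keeps its own target
```

In this toy setup the five-vowel intermediate step collapses the open/close contrast before English is ever reached, while the direct table keeps it.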

5
Generating and recording the prompts
  • 1109 prompts (recorded on festvox0)
  • Lots of room noise (typing, door, talking, etc.)
  • Microphone not always in same position
  • Different power and even different intonation and
    duration throughout the whole recording process

6
Labeling nonsense words
  • Automatically
  • make_labs
  • make_diph_index
  • Manually
  • Find a set of diphones that are wrong and look
    them up in dic/afldiph.est
  • Edit and correct the corresponding file with
    emulabel
  • Rerun make_diph_index (etc.)
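
Finding the wrong diphones by ear is slow, so a rough automatic pre-check can help narrow the search. The sketch below assumes the index is a text file with a header ending in `EST_Header_End`, followed by lines of the form `diphone file start mid end` (times in seconds). That layout, the file ids, and the 30 ms threshold are all assumptions, so check them against the real dic/afldiph.est before relying on this.

```python
def parse_diphone_index(text):
    """Parse index lines that follow the 'EST_Header_End' marker (assumed layout)."""
    entries = []
    in_body = False
    for line in text.splitlines():
        line = line.strip()
        if not in_body:
            in_body = (line == "EST_Header_End")
            continue
        if not line:
            continue
        name, file_id, start, mid, end = line.split()
        entries.append((name, file_id, float(start), float(mid), float(end)))
    return entries

def suspicious(entries, min_dur=0.03):
    """Flag diphones whose boundaries are out of order or implausibly close."""
    return [name for name, _file_id, start, mid, end in entries
            if not (start < mid < end) or (end - start) < min_dur]

# Made-up sample data; the file ids are hypothetical.
SAMPLE = """EST_File index
DataType ascii
NumEntries 3
EST_Header_End
a-l afl_0001 0.410 0.480 0.550
l-a afl_0001 0.550 0.540 0.600
k-a afl_0002 0.300 0.310 0.320
"""

print(suspicious(parse_diphone_index(SAMPLE)))  # ['l-a', 'k-a']
```

Flagged entries are only candidates; the corresponding label files still need to be inspected and corrected by hand in emulabel.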

7
Extracting pitchmarks and LPC coefficients
  • Automatically
  • make_pm_wav (edit to modify pitch range of
    speaker)
  • find_powerfactors (tells us what general power
    difference exists between files and calculates a
    table of power modifiers for each file)
  • make_lpc
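
The idea of a per-file power modifier can be sketched as follows. This is not the algorithm used by find_powerfactors; it is a simplified stand-in that assumes each file is scaled toward the mean RMS level of all files, with made-up sample values.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def power_factors(files):
    """Map each file name to the gain that brings it to the mean RMS level."""
    levels = {name: rms(samples) for name, samples in files.items()}
    target = sum(levels.values()) / len(levels)
    return {name: target / level for name, level in levels.items()}

# Made-up waveforms standing in for two recording sessions.
RECORDINGS = {
    "quiet": [0.1, -0.1, 0.1, -0.1],
    "loud": [0.3, -0.3, 0.3, -0.3],
}

factors = power_factors(RECORDINGS)
for name, gain in factors.items():
    print(f"{name}: {gain:.2f}")  # quiet is boosted, loud is attenuated
```

A silent file would give a zero level and a division error here, so a real script would need to skip or report such files.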

8
Testing phone synthesis
  • (SayPhones (pau o l a pau s o k l a r i a d n a
    pau))
  • Catalan voice
  • Spanish voice
  • English voice
  • (modifying the phones)

9
Catalan voice is still quite bad
  • Bad example
  • But it does have a basic Spanish phone
  • and without it, it would sound like this
  • And here is how kal_diphone sounds

10
Added tokenization
  • To be able to say numbers in Catalan
    (followed the Spanish tokenizer)
  • Show file
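
As an illustration of what number tokenization has to do, here is a minimal sketch that expands digit tokens from 0 to 99 into Central Catalan words. It is not the tokenizer from the file shown in the talk, which follows the Spanish Festival tokenizer; the function names are hypothetical, and gender agreement (un/una, dos/dues) is ignored.

```python
UNITS = ["zero", "un", "dos", "tres", "quatre",
         "cinc", "sis", "set", "vuit", "nou"]
TEENS = ["deu", "onze", "dotze", "tretze", "catorze",
         "quinze", "setze", "disset", "divuit", "dinou"]
TENS = {2: "vint", 3: "trenta", 4: "quaranta", 5: "cinquanta",
        6: "seixanta", 7: "setanta", 8: "vuitanta", 9: "noranta"}

def number_to_words(n):
    """Spell out an integer from 0 to 99 in Catalan."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is handled in this sketch")
    if n < 10:
        return UNITS[n]
    if n < 20:
        return TEENS[n - 10]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens]
    if tens == 2:
        return f"vint-i-{UNITS[unit]}"      # 21..29 take the linking "-i-"
    return f"{TENS[tens]}-{UNITS[unit]}"    # 31..99 use a plain hyphen

def expand_digits(text):
    """Replace whitespace-separated digit tokens (0..99) with words."""
    words = []
    for tok in text.split():
        if tok.isascii() and tok.isdigit() and int(tok) <= 99:
            words.append(number_to_words(int(tok)))
        else:
            words.append(tok)               # leave everything else untouched
    return " ".join(words)

print(expand_digits("tinc 21 llibres"))  # tinc vint-i-un llibres
```

Larger numbers, ordinals, dates and currency would each need their own rules, as in the Spanish tokenizer the slide refers to.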

11
Added some lexical entries
  • Letters of the alphabet, symbols, punctuation,
    some content words

12
Phrasing, duration and intonation
  • Not there yet
  • Nor can I get it to SayText

13
Summary: building a diphone voice
  • Define phoneset
  • Generate diphone schema
  • Generate prompts
  • Record prompts
  • Label prompts
  • Extract pitchmarks and LPC coefficients
  • Test phone synthesis
  • Hand correct labels
  • Add tokenizer
  • Add lexicon
  • Add prosody, durations and intonation
  • Test and evaluate voice
  • Package for distribution