Title: Synthesis
1. Synthesis and evaluation of prosodically exaggerated utterances: A preliminary study
- Kyuchul Yoon
- Division of English
- Kyungnam University
- Spring 2008 Joint Conference of KSPS and KASS
2. Contents
- Synthesis and evaluation of human utterances with exaggerated prosody
- Synthesis of exaggerated prosody
  - Useful for native utterances
  - The definition of prosody exaggeration
  - The algorithm
- Evaluation of exaggerated prosody
  - Useful for evaluating learner utterances
  - The algorithm and an experiment
3. Teaching and evaluating prosody
- Teaching language prosody
  - The need for exaggeration of native utterances
  - How to define exaggeration
- Evaluating language prosody
  - Given the native version of an utterance, evaluate learners' utterances with atypical prosody
  - How to measure the differences between the native and learner utterances
4. Exaggerating native prosody
- Exaggeration of the F0 contour
  - One way would be to make the pitch peaks/valleys higher/lower
- Exaggeration of the intensity contour
  - One way would be to manipulate the intensity contour of the pitch peaks/valleys
- Exaggeration of the segmental durations
  - One way would be to manipulate the segmental durations of the pitch peaks/valleys
5. Exaggerating native prosody
F0
The fundamental frequency (F0) contour of the utterance "Marianna!".
6. Exaggerating native prosody
Intensity
The intensity contour of the utterance "Marianna!".
7. Exaggerating native prosody
Duration
The segmental durations of the utterance "Marianna!" before and after the exaggeration.
8. Algorithm: prosody exaggeration
- Definition of prosody exaggeration
  - F0 contour
    - Make pitch peaks/valleys higher/lower in Hz values
  - Intensity contour
    - Make pitch peaks higher in dB values
  - Segmental durations
    - Make pitch peaks longer in time values
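The F0 part of this definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Praat script used in the study: the contour is assumed to be a list of pitch-point values in Hz, a peak or valley is assumed to be any interior point that is strictly higher or lower than both neighbors, and the function name `exaggerate_f0` and the parameter `shift_hz` are hypothetical.

```python
def exaggerate_f0(f0_hz, shift_hz):
    """Raise local pitch peaks and lower local pitch valleys by shift_hz.

    f0_hz    -- F0 values in Hz for consecutive pitch points (assumed input)
    shift_hz -- size of the exaggeration in Hz (hypothetical parameter)
    """
    out = list(f0_hz)
    for i in range(1, len(f0_hz) - 1):
        prev, cur, nxt = f0_hz[i - 1], f0_hz[i], f0_hz[i + 1]
        if cur > prev and cur > nxt:
            # Local peak: make it higher.
            out[i] = cur + shift_hz
        elif cur < prev and cur < nxt:
            # Local valley: make it lower, but never below 0 Hz.
            out[i] = max(cur - shift_hz, 0.0)
    return out


contour = [180.0, 220.0, 200.0, 150.0, 170.0, 160.0]
print(exaggerate_f0(contour, 20.0))
# [180.0, 240.0, 200.0, 130.0, 190.0, 160.0]
```

The same pattern would apply to the intensity contour (in dB) and, with a multiplicative factor instead of an additive shift, to segment durations.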
9. Algorithm: prosody exaggeration
F0
10. Algorithm: prosody exaggeration
Intensity
11. Algorithm: prosody exaggeration
Durations
12. How the Praat script works
13. How the Praat script works
Panels: F0, Intensity, Durations
14. How the Praat script works
Panels: Original, F0, Durations, F0, Durations, Intensity
15. Evaluating learner prosody
- Assumes the existence of the native version
- Evaluates the learner versions
- Evaluation of the F0 and intensity contours
  - Is preceded by duration manipulation
    - The durations of the matching segments of the two utterances are made identical [3]
  - Is preceded by F0/intensity normalization and F0 smoothing
    - The mean difference is added to or subtracted from the learner utterance
  - Is followed by pitch/intensity point-to-point comparison
- Evaluation of segmental durations
  - Done without any duration manipulation
  - Segment-to-segment comparison
- Evaluation measure: Euclidean distance metric
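The normalization and point-to-point comparison steps can be sketched as follows. This is a minimal illustration, not the author's script: it assumes the two contours have already been time-aligned by the duration manipulation and therefore contain the same number of points, it omits F0 smoothing, and all function names are hypothetical.

```python
import math


def normalize_to_native(native, learner):
    """Shift the learner contour so that its mean equals the native mean."""
    mean_diff = sum(native) / len(native) - sum(learner) / len(learner)
    return [value + mean_diff for value in learner]


def euclidean_distance(p, q):
    """Point-to-point Euclidean distance between two equal-length contours."""
    if len(p) != len(q):
        raise ValueError("contours must have the same number of points")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def contour_score(native, learner):
    """Distance between the native contour and the mean-normalized learner contour."""
    return euclidean_distance(native, normalize_to_native(native, learner))


native_f0 = [200.0, 220.0, 180.0]
same_shape = [150.0, 170.0, 130.0]   # same contour, 50 Hz lower overall
other_shape = [150.0, 190.0, 110.0]  # wider pitch range than the native

print(contour_score(native_f0, same_shape))   # 0.0
print(contour_score(native_f0, other_shape))
```

Because of the mean normalization, a learner who produces the native contour shape in a different overall register scores 0, while a difference in shape yields a positive distance. For segmental durations, `euclidean_distance` would be applied directly to the two lists of segment durations, with no prior manipulation.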
16. Algorithm: prosody evaluation
Before and after duration manipulation
Panels: native, learner (before), learner (after)
17. Algorithm: prosody evaluation
F0 point-to-point comparison between native and learner
Panels: native, learner (after)
18. Algorithm: prosody evaluation
Intensity point-to-point comparison between native and learner
Panels: native, learner (after)
19. Algorithm: prosody evaluation
Duration segment-to-segment comparison between native and learner
Panels: native, learner (before)
Evaluation measure: the Euclidean distance metric between P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) in Euclidean n-space
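For reference, the standard Euclidean distance between the two points P and Q named on this slide can be written as below. The formula does not survive in the slide text, so this is the textbook definition, with p_i and q_i standing for the i-th native and learner values (pitch points, intensity points, or segment durations).

```latex
d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```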
20. A pilot experiment
Panels: native, learner (after)
The Euclidean distance should be at a minimum
21. A pilot experiment
Panels: native, learner (after)
- F0: -100 Hz to 100 Hz at 10 Hz intervals → 21 stimuli
- Intensity: -25 dB to 25 dB at 5 dB intervals → 11 stimuli
- Duration: 0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50, 3.00 times the original → 8 stimuli
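The stimulus counts follow directly from the stated ranges and step sizes, as the short check below shows. The variable names are illustrative only.

```python
# Manipulation values for the pilot stimuli, as listed on the slide.
f0_shifts_hz = list(range(-100, 101, 10))        # -100, -90, ..., 100
intensity_shifts_db = list(range(-25, 26, 5))    # -25, -20, ..., 25
duration_factors = [0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50, 3.00]

print(len(f0_shifts_hz))         # 21
print(len(intensity_shifts_db))  # 11
print(len(duration_factors))     # 8
```

Each series includes the unmodified case (a shift of 0 Hz, a shift of 0 dB, and a factor of 1.00).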
22. Results and conclusion
23. Results and conclusion
24. Results and conclusion
25. Results and conclusion
- Prosody exaggeration
  - Can be a tool for teaching language prosody
  - Can be used to test measures for evaluating prosody
- Limitations of the current prosody evaluation
  - Native utterances should exist to yield measures
    - TTS systems with advanced prosody models could be helpful
  - Weights of the three separate measures (F0/intensity/duration) need to be determined
    - Experiments with human evaluators could provide the weights
26. References
[1] Boersma, Paul. 2001. Praat, a system for doing phonetics by computer. Glot International 5(9/10). pp. 341-345.
[2] Moulines, E. & F. Charpentier. 1990. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication 9. pp. 453-467.
[3] Yoon, K. 2007. Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody. Journal of the Modern British & American Language & Literature 25(4). pp. 197-215.