PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS Jan P.H. van Santen and Xiaochuan Niu

1
PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON
SPECTRAL BALANCE OF VOWELS
Jan P.H. van Santen and Xiaochuan Niu
Center for Spoken Language Understanding
OGI School of Science & Technology at OHSU

2
OVERVIEW
  1. IMPORTANCE OF SPECTRAL BALANCE
  2. MEASUREMENT OF SPECTRAL BALANCE
  3. ANALYSIS METHODS
  4. RESULTS
  5. SYNTHESIS
  6. CONCLUSIONS

3
1. IMPORTANCE OF SPECTRAL BALANCE
  • Linguistic Control Factors
  • Stress-like factors
  • Positional factors
  • Phonemic factors
  • Acoustic Correlates
  • Traditionally TTS-controlled
  • Pitch, timing, amplitude
  • Demonstrated in natural speech,
    but usually not TTS-controlled
  • Spectral tilt, balance
  • Formant dynamics

4
2. MEASUREMENT OF SPECTRAL BALANCE
  • Data
  • 472 greedily selected sentences
  • Genre: newspaper
  • Greedy features: linguistic control factors
  • One female speaker
  • Manual segmentation
  • Accent: independent rating by 3 judges
  • 0-3 score

5
2. MEASUREMENT OF SPECTRAL BALANCE
  • Energy in 5 formant-range frequency bands
  • B0: 100-300 Hz (F0)
  • B1: 300-800 Hz (F1)
  • B2: 800-2500 Hz (F2)
  • B3: 2500-3500 Hz (F3)
  • B4: 3500-max Hz (fricative noise)
  • In other words, multidimensional measure
  • Filter bank → Square → Average (1 ms rect.) → 20 log10(Bi)
  • Subtract estimated per-utterance means

6
2. MEASUREMENT OF SPECTRAL BALANCE
  • Details
  • Confounding with F0
  • Measure pitch-corrected and raw
  • For certain wave shapes, pitch is directly related
    to fixed-frame energy
  • Why do both? Wave shapes may change in unknown
    ways
  • F0 not confined to B0 (female speech)
  • Vowel formants not quite confined to bands, e.g.,
    F1 for /EE/ and F3 for /ER/

7
2. MEASUREMENT OF SPECTRAL BALANCE
  • Why not more or different bands?
  • Multiple interacting Linguistic Control Factors
  • Need measurements that minimize interactions
  • 5 bands → different vowels behave similarly
  • Can model vowels as a class
  • Why not simply spectral tilt?
  • 5 bands: more information than a single measure
  • Supply more information for synthesis

8
3. ANALYSIS METHODS
  • Measures likely to behave like segmental
    duration
  • Multiple interacting, confounded factors
  • Interaction: magnitude of effects of one factor
    may depend on other factors
  • Confounding: unequal frequencies of control
    factor combinations
  • Directional Invariance
  • Direction of effects of one factor is
    independent of other factors

9
3. ANALYSIS METHODS
  • Need method that
  • can handle multiple interacting, confounded
    factors and
  • takes advantage of Directional Invariance
  • Used Sums of Products Model

10
3. ANALYSIS METHODS
  • Special cases
  • Multiplicative model: K = {1}, I_1 = {0, ..., n}
  • Additive model: K = {0, ..., n}, I_i = {i}
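
For reference, the general sums-of-products form that these two special cases instantiate is commonly written as below. The slide's own equation did not survive in the transcript, so the symbols y, f_j, and S_{i,j} are assumptions; only K and I_i come from the slide.

```latex
% General form: a sum over product terms, each term using a subset I_i of the factors
y(f_0, \dots, f_n) = \sum_{i \in K} \prod_{j \in I_i} S_{i,j}(f_j)

% Multiplicative model: K = \{1\}, I_1 = \{0, \dots, n\}
y(f_0, \dots, f_n) = \prod_{j=0}^{n} S_{1,j}(f_j)

% Additive model: K = \{0, \dots, n\}, I_i = \{i\}
y(f_0, \dots, f_n) = \sum_{i=0}^{n} S_{i,i}(f_i)
```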

11
3. ANALYSIS METHODS
  • Used additive model
  • Note: parameter estimates are estimates of
    marginal means in a balanced design
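
The note about marginal means can be illustrated with a toy example. The sketch below assumes two factors and a fully balanced design with one observation per cell; the factor names and all numbers are made up for illustration and are not data from the study.

```python
from itertools import product

def marginal_effects(cells):
    """cells maps (level_a, level_b) -> value, one value per cell (balanced).

    Returns the grand mean and each level's marginal mean minus the grand mean,
    which is what an additive model estimates in a balanced design.
    """
    grand = sum(cells.values()) / len(cells)
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    a_eff = {a: sum(cells[a, b] for b in b_levels) / len(b_levels) - grand
             for a in a_levels}
    b_eff = {b: sum(cells[a, b] for a in a_levels) / len(a_levels) - grand
             for b in b_levels}
    return grand, a_eff, b_eff

# Hypothetical, exactly additive data: value = 5 + stress effect + position effect.
stress = {"stressed": 2.0, "unstressed": -2.0}
position = {"left": -1.0, "right": 1.0}
cells = {(s, p): 5.0 + stress[s] + position[p] for s, p in product(stress, position)}
grand, s_eff, p_eff = marginal_effects(cells)
```

Here the recovered effects match the generating ones. With unequal cell frequencies (confounding), simple marginal means no longer separate the factors, which is why a fitted model is needed.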

12
3. ANALYSIS METHODS
  • Pitch correction
  • Confounding with F0: show both
  • <B0, B1, B2, B3, B4>
  • and
  • <B0 + B1, B2, B3, B4>

13
4. RESULTS (A) POSITIONAL EFFECTS
  • 5 Bands, not pitch-corrected
  • Solid: right position; dashed: left position.
    Y-axis: corrected mean

14
4. RESULTS (A) POSITIONAL EFFECTS
  • 5 Bands, pitch-corrected

15
4. RESULTS (A) POSITIONAL EFFECTS
  • 4 Bands, not pitch-corrected

16
4. RESULTS (A) POSITIONAL EFFECTS
  • 4 Bands, pitch-corrected

17
4. RESULTS (B) STRESS/ACCENT EFFECTS
  • 5 Bands, not pitch-corrected
  • Solid: stressed syllable; dashed: unstressed.
    Y-axis: corrected mean

18
4. RESULTS (B) STRESS/ACCENT EFFECTS
  • 5 Bands, pitch-corrected

19
4. RESULTS (B) STRESS/ACCENT EFFECTS
  • 4 Bands, not pitch-corrected

20
4. RESULTS (B) STRESS/ACCENT EFFECTS
  • 4 Bands, pitch-corrected

21
4. RESULTS (C) TILT EFFECTS
22
5. SYNTHESIS
  • Use ABS/OLA sinusoidal model
  • s[n] = sum of overlapped short-time signal
    frames s_k[n]
  • s_k[n] = sum of quasi-harmonic sinusoidal
    components
  • s_k[n] = Σ_l A_{k,l} cos(ω_{k,l} n + φ_{k,l})
  • Each frame of a unit is represented by a set of
    quasi-harmonic sinusoidal parameters
  • Given the desired F0 contour, pitch shift is
    applied to the sinusoidal parameter component of
    the unit to obtain the target parameter A_{k,l}
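
The frame model and overlap-add step can be sketched as below. This is a simplified illustration rather than the ABS/OLA algorithm itself: the triangular window, the hop size, and the function names are assumptions, and the analysis-by-synthesis parameter estimation is not shown.

```python
import math

def synth_frame(amps, freqs, phases, length):
    """One frame: sum over components of A * cos(w * n + phi).

    freqs are in radians per sample; for a quasi-harmonic frame they lie
    close to integer multiples of the fundamental.
    """
    return [sum(a * math.cos(w * n + p) for a, w, p in zip(amps, freqs, phases))
            for n in range(length)]

def overlap_add(frames, hop):
    """Sum windowed frames placed `hop` samples apart (frames share one length)."""
    length = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + length)
    for k, frame in enumerate(frames):
        for n, x in enumerate(frame):
            # Triangular window (an assumption): 0 at the edges, 1 at the centre.
            window = 1.0 - abs(2.0 * n / (length - 1) - 1.0)
            out[k * hop + n] += window * x
    return out
```

With a hop of half the frame length, the triangular windows of neighbouring frames sum to one, so a stationary signal passes through unchanged away from the ends.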

23
5. SYNTHESIS
  • Given the differences in prosodic factors
    between the original and target unit, obtain the
    band differences
  • Transform the band differences into weights
    applied to the sinusoidal parameters
  • Each weight applies to the jth harmonic when it is
    located in the ith band
  • Spectral smoothing across unit boundaries.
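
One way the band-to-harmonic weighting could look is sketched below. The weight formula itself is missing from the transcript, so the dB-to-gain conversion 10^(d/20), the function names, and the treatment of harmonics below 100 Hz are all assumptions; smoothing across unit boundaries is not shown.

```python
# Lower edges of B0..B4 in Hz, taken from the measurement slides.
BAND_LOWER_EDGES_HZ = [100, 300, 800, 2500, 3500]

def band_index(freq_hz):
    """Return i such that freq_hz falls in band Bi, or None below 100 Hz."""
    idx = None
    for i, lo in enumerate(BAND_LOWER_EDGES_HZ):
        if freq_hz >= lo:
            idx = i
    return idx

def apply_band_differences(amps, f0_hz, band_diff_db):
    """Scale harmonic j (at (j + 1) * f0_hz) by the gain of the band it lies in.

    band_diff_db holds one target-minus-original difference in dB per band.
    """
    out = []
    for j, a in enumerate(amps):
        i = band_index((j + 1) * f0_hz)
        # Assumed mapping: a difference of d dB becomes an amplitude gain 10^(d/20).
        gain = 1.0 if i is None else 10 ** (band_diff_db[i] / 20.0)
        out.append(a * gain)
    return out
```

For example, with F0 at 200 Hz and a +6 dB difference in B1 only, the first harmonic (200 Hz, B0) is left alone while the second and third (400 and 600 Hz, B1) are roughly doubled in amplitude.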

24
5. SYNTHESIS
  • 5 Bands modification example

25
6. CONCLUSIONS
  • Described simple methods for predicting and
    synthesizing spectral balance
  • But: spectral balance is only one
    non-standard acoustic correlate
  • Others that remain to be addressed:
  • Spectral dynamics
  • Phase