Title: PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS
Jan P.H. van Santen and Xiaochuan Niu
Center for Spoken Language Understanding, OGI School of Science & Technology at OHSU
OVERVIEW
- IMPORTANCE OF SPECTRAL BALANCE
- MEASUREMENT OF SPECTRAL BALANCE
- ANALYSIS METHODS
- RESULTS
- SYNTHESIS
- CONCLUSIONS
1. IMPORTANCE OF SPECTRAL BALANCE
- Linguistic Control Factors
- Stress-like factors
- Positional factors
- Phonemic factors
- Acoustic Correlates
- Traditionally TTS-controlled: pitch, timing, amplitude
- Demonstrated in natural speech, but usually not TTS-controlled: spectral tilt and balance, formant dynamics
2. MEASUREMENT OF SPECTRAL BALANCE
- Data
- 472 greedily selected sentences
- Genre: newspaper
- Greedy features: linguistic control factors
- One female speaker
- Manual segmentation
- Accent: independent rating by 3 judges
- 0-3 score
2. MEASUREMENT OF SPECTRAL BALANCE
- Energy in 5 formant-range frequency bands
- B0: 100-300 Hz (F0)
- B1: 300-800 Hz (F1)
- B2: 800-2500 Hz (F2)
- B3: 2500-3500 Hz (F3)
- B4: 3500 Hz to max (fricative noise)
- In other words, a multidimensional measure
- Filter bank → Square → Average (1 ms rect.) → $20 \log_{10}(B_i)$
- Subtract estimated per-utterance means
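The measurement chain above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: summing DFT bin powers stands in for the filter bank, squaring, and averaging steps, the per-utterance mean is taken as a plain arithmetic mean over frames, and the names `band_levels_db` and `subtract_utterance_means` are hypothetical.

```python
import cmath
import math

# Band edges in Hz from the list above; None means "up to the Nyquist frequency".
BANDS = [(100, 300), (300, 800), (800, 2500), (2500, 3500), (3500, None)]


def band_levels_db(frame, sample_rate):
    """Return 20*log10(B_i) for each band i, for one frame of samples."""
    n = len(frame)
    nyquist = sample_rate / 2
    energy = [0.0] * len(BANDS)
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        # DFT bin k; summing bin powers stands in for filter, square, average.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        for i, (lo, hi) in enumerate(BANDS):
            upper = nyquist if hi is None else hi
            in_band = lo <= freq < upper or (hi is None and freq == nyquist)
            if in_band:
                energy[i] += abs(coeff) ** 2 / n
    # The small floor avoids log10(0) when a band is empty.
    return [20 * math.log10(e + 1e-12) for e in energy]


def subtract_utterance_means(levels_per_frame):
    """Subtract each band's mean over the utterance from every frame."""
    n_frames = len(levels_per_frame)
    n_bands = len(levels_per_frame[0])
    means = [sum(f[i] for f in levels_per_frame) / n_frames
             for i in range(n_bands)]
    return [[f[i] - means[i] for i in range(n_bands)]
            for f in levels_per_frame]


# Hypothetical check: a 200 Hz tone sampled at 8 kHz should peak in B0.
tone = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(80)]
levels = band_levels_db(tone, 8000)
centered = subtract_utterance_means([[1.0, 2.0], [3.0, 6.0]])
```

The result is one five-element vector per frame, which is what makes the measure multidimensional rather than a single tilt value.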
2. MEASUREMENT OF SPECTRAL BALANCE
- Details
- Confounding with F0
- Measure pitch-corrected and raw
- For certain wave shapes, pitch is directly related to fixed-frame energy
- Why do both? Wave shapes may change in unknown ways
- F0 not confined to B0 (female speech)
- Vowel formants not quite confined to bands, e.g., F1 for /EE/ and F3 for /ER/
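The link between pitch and fixed-frame energy can be illustrated with the simplest possible wave shape. The sketch below assumes a unit impulse train as the "wave shape", with hypothetical defaults of 16 kHz sampling and a 20 ms frame; it is not taken from the original analysis.

```python
def impulse_train_frame_energy(f0_hz, sample_rate=16000, frame_ms=20):
    """Energy of a unit impulse train inside one fixed-length frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    period = round(sample_rate / f0_hz)
    # One unit pulse per pitch period; every other sample is zero.
    frame = [1.0 if t % period == 0 else 0.0 for t in range(frame_len)]
    return sum(x * x for x in frame)


low = impulse_train_frame_energy(100)
high = impulse_train_frame_energy(200)
```

Under these assumptions, doubling F0 doubles the number of pulses in the frame and therefore the frame energy, which is why raw band energies are confounded with F0.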
2. MEASUREMENT OF SPECTRAL BALANCE
- Why not more or different bands?
- Multiple interacting Linguistic Control Factors
- Need measurements that minimize interactions
- 5 bands → Different vowels behave similarly
- Can model vowels as a class
- Why not simply spectral tilt?
- 5 bands: more information than a single measure
- Supply more information for synthesis
3. ANALYSIS METHODS
- Measures likely to behave like segmental duration:
- Multiple interacting, confounded factors
- Interaction: magnitude of the effects of one factor may depend on other factors
- Confounding: unequal frequencies of control factor combinations
- Directional Invariance
- Direction of the effects of one factor is independent of other factors
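The distinction between interaction and Directional Invariance can be made concrete with a small check on a table of cell means. This is a sketch with hypothetical numbers and a hypothetical function name, restricted to two factors for brevity.

```python
def is_directionally_invariant(cell_means):
    """cell_means[a][b]: mean for level a of factor A and level b of factor B.

    True if every pair of A levels is ordered the same way at every level of B.
    """
    n_a = len(cell_means)
    n_b = len(cell_means[0])
    for a1 in range(n_a):
        for a2 in range(a1 + 1, n_a):
            signs = set()
            for b in range(n_b):
                diff = cell_means[a2][b] - cell_means[a1][b]
                if diff != 0:
                    signs.add(diff > 0)
            # Both a positive and a negative difference: direction flips.
            if len(signs) > 1:
                return False
    return True


# Effect of A is +2 at one level of B and +4 at the other: the magnitudes
# interact, but the direction is the same.
same_direction = is_directionally_invariant([[1.0, 2.0], [3.0, 6.0]])
# Effect of A is +2 at one level of B and -2 at the other: direction flips.
flipped = is_directionally_invariant([[1.0, 4.0], [3.0, 2.0]])
```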
3. ANALYSIS METHODS
- Need method that
- can handle multiple interacting, confounded factors, and
- takes advantage of Directional Invariance
- Used Sums of Products Model
3. ANALYSIS METHODS
- Special cases
- Multiplicative model: $K = \{1\}$, $I_1 = \{0, \dots, n\}$
- Additive model: $K = \{0, \dots, n\}$, $I_i = \{i\}$
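The two special cases can be illustrated with a small evaluator. The model equation itself is not preserved in this transcript, so the sketch assumes the general form implied by the index sets: a sum over $i \in K$ of products over $j \in I_i$ of per-factor parameters $S_{i,j}(f_j)$. The function name, factor labels, and parameter values are hypothetical.

```python
from math import prod


def sums_of_products(levels, index_sets, params):
    """Sum over i in K of the product over j in I_i of params[i][j][levels[j]].

    levels[j] is the level of factor j; index_sets maps each i in K to I_i.
    """
    return sum(
        prod(params[i][j][levels[j]] for j in factors)
        for i, factors in index_sets.items()
    )


# Two factors (0 and 1) with two levels each; all numbers are made up.
multiplicative = sums_of_products(
    levels=(1, 0),
    index_sets={1: [0, 1]},                      # K = {1}, I_1 = {0, 1}
    params={1: {0: [1.0, 1.5], 1: [2.0, 3.0]}},
)
additive = sums_of_products(
    levels=(1, 0),
    index_sets={0: [0], 1: [1]},                 # K = {0, 1}, I_i = {i}
    params={0: {0: [0.0, 2.0]}, 1: {1: [-1.0, 1.0]}},
)
```

With a single product term the model multiplies the factor parameters (1.5 × 2.0); with one single-factor term per factor it adds them (2.0 + (-1.0)).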
3. ANALYSIS METHODS
- Used additive model
- Note: parameter estimates are estimates of marginal means in a balanced design
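The note about marginal means can be checked on a toy example. The sketch below assumes a balanced two-factor design with one mean per cell and expresses each effect as a marginal mean minus the grand mean; the function name and the numbers are hypothetical.

```python
def additive_estimates(cells):
    """cells[a][b]: observed mean in a balanced two-factor design.

    Returns (grand_mean, a_effects, b_effects), each effect being a
    marginal mean minus the grand mean.
    """
    n_a, n_b = len(cells), len(cells[0])
    grand = sum(sum(row) for row in cells) / (n_a * n_b)
    a_eff = [sum(cells[a]) / n_b - grand for a in range(n_a)]
    b_eff = [sum(cells[a][b] for a in range(n_a)) / n_a - grand
             for b in range(n_b)]
    return grand, a_eff, b_eff


# Cells generated from grand mean 10, A effects (-1, +1), B effects (-2, 0, +2).
grand, a_eff, b_eff = additive_estimates([[7.0, 9.0, 11.0],
                                          [9.0, 11.0, 13.0]])
```

In an unbalanced design, such as natural text with unequal frequencies of factor combinations, raw marginal means would mix in the effects of the other factors, which is the motivation for fitting the model instead.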
3. ANALYSIS METHODS
- Confounding with F0: show both
- <B0, B1, B2, B3, B4>
- and
- <B0+B1, B2, B3, B4>
4. RESULTS (A) POSITIONAL EFFECTS
- 5 bands, not pitch-corrected
- Solid: right position; dashed: left position. Y-axis: corrected mean
4. RESULTS (A) POSITIONAL EFFECTS
- 4 Bands, not pitch-corrected
4. RESULTS (B) STRESS/ACCENT EFFECTS
- 5 bands, not pitch-corrected
- Solid: stressed syllable; dashed: unstressed. Y-axis: corrected mean
4. RESULTS (B) STRESS/ACCENT EFFECTS
- 4 Bands, not pitch-corrected
4. RESULTS (C) TILT EFFECTS
5. SYNTHESIS
- Use ABS/OLA sinusoidal model
- $s[n]$: sum of overlapped short-time signal frames $s_k[n]$
- $s_k[n]$: sum of quasi-harmonic sinusoidal components
- $s_k[n] = \sum_l A_{k,l} \cos(\omega_{k,l} n + \phi_{k,l})$
- Each frame of a unit is represented by a set of quasi-harmonic sinusoidal parameters
- Given the desired F0 contour, pitch shift is applied to the sinusoidal parameter component of the unit to obtain the target parameter $A_{k,l}$
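The frame-plus-overlap structure described above can be sketched as follows. This is a simplified illustration, not the ABS/OLA implementation used in the talk: the triangular window, the frame-local time index starting at 0, and the function names are all assumptions.

```python
import math


def synth_frame(components, length):
    """One frame s_k[n]: sum over l of A * cos(w * n + phi), n = 0..length-1."""
    return [sum(a * math.cos(w * n + phi) for a, w, phi in components)
            for n in range(length)]


def overlap_add(frames, hop, length):
    """s[n]: windowed frames placed `hop` samples apart and summed."""
    # Triangular window (an assumption); with hop == length // 2 and an even
    # length, overlapping windows sum to 1.
    window = [1.0 - abs(n - (length - 1) / 2) / (length / 2)
              for n in range(length)]
    out = [0.0] * (hop * (len(frames) - 1) + length)
    for k, components in enumerate(frames):
        frame = synth_frame(components, length)
        for n in range(length):
            out[k * hop + n] += window[n] * frame[n]
    return out


# Hypothetical input: three identical frames, each holding one constant
# component given as (amplitude, frequency in rad/sample, phase).
signal = overlap_add([[(1.0, 0.0, 0.0)]] * 3, hop=4, length=8)
```

Because each frame is a list of (amplitude, frequency, phase) triples, prosodic modification reduces to editing those triples before the frames are overlapped and added.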
5. SYNTHESIS
- From the differences in prosodic factors between the original and target unit, obtain the band differences
- Transform the band differences into weights applied to the sinusoidal parameters
- Harmonic j takes the weight of band i when the jth harmonic is located in the ith band
- Spectral smoothing across unit boundaries
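The weighting step above can be sketched as a per-harmonic gain lookup. The weight formula is not preserved in this transcript, so the conversion used here, a gain of 10^(d/20) for a band difference of d dB, is an assumption, as are the function names and the choice to leave harmonics below 100 Hz unchanged. Spectral smoothing across unit boundaries is not shown.

```python
# Band edges in Hz as defined for B0..B4; the top band is open-ended.
BANDS_HZ = [(100, 300), (300, 800), (800, 2500), (2500, 3500),
            (3500, float("inf"))]


def band_index(freq_hz):
    """Index i of the band containing freq_hz, or None below 100 Hz."""
    for i, (lo, hi) in enumerate(BANDS_HZ):
        if lo <= freq_hz < hi:
            return i
    return None


def apply_band_weights(harmonics, band_diff_db):
    """Scale each (amplitude, freq_hz) pair by the gain of its band.

    band_diff_db[i] is the target-minus-original level difference for band i.
    """
    out = []
    for amp, freq in harmonics:
        i = band_index(freq)
        # Assumed dB-to-amplitude conversion; harmonics outside all bands
        # keep their original amplitude.
        gain = 1.0 if i is None else 10 ** (band_diff_db[i] / 20)
        out.append((amp * gain, freq))
    return out


# Hypothetical frame: two harmonics, with B2 raised by 20 dB.
modified = apply_band_weights([(1.0, 200.0), (1.0, 1000.0)],
                              [0.0, 0.0, 20.0, 0.0, 0.0])
```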
5. SYNTHESIS
- 5 bands modification example
CONCLUSIONS
- Described simple methods for predicting and synthesizing spectral balance
- But: spectral balance is only one non-standard acoustic correlate
- Others that remain to be addressed:
- Spectral dynamics
- Phase