Title: Speech and speaker normalization (in vowel normalization)
- Venice International University
- Phonetic and technological aspects of speaker characteristics
- Prof. Dr. J. Harrington
- Presented by
- Clara Tillmanns
- clarainindia_at_yahoo.com
- 18.10.2007
2 Contents
- Speech and speaker normalization in vowel normalization: definition
- Influencing parameters and instruments for vowel normalization
- Theories
- Studies: Johnson 1990 and 1999
- Recapitulation
3 Definition
- Normalization.
- We know there is extensive variation in speech. How is it, then, that listeners agree in their perception of vowels?
4 Fig. 1: Scatter plot of first and second formant values of American English vowels (from Peterson & Barney 1952)
5 Definition
- Normalization.
- Which information influences this decision?
6 Definition
- Normalization.
- And, which mechanism leads to the decision?
7 Contents
- Speech and speaker normalization: definition
- Influencing parameters and instruments for vowel normalization
  - Context
  - Formant ratio
  - F0
  - Visual information
  - Auditory gestalts
- Theories
- Studies: Johnson 1990 and 1999
- Recapitulation
8 Influencing parameters and instruments for vowel normalization
Context
Formant ratio
F0
Auditory gestalts
Visual information
9 Influencing parameters and instruments for vowel normalization
Syllable internal
Syllable external
Context
Formant ratio
F0
Auditory gestalts
Visual information
10 Influencing parameters and instruments for vowel normalization
Syllable internal
Syllable external
Context
Formant ratio
Vocalic
Prosodic
F0
Tonal
Auditory gestalts
Visual information
11 Influencing parameters and instruments for vowel normalization
- Context
  - Perceived vowel quality is influenced
    - by the formant frequencies of context vowels (Ladefoged & Broadbent 1957)
    - by the F0 range of the carrier phrase (Johnson 1990)
  - Tones: the pitch range of a context utterance influences the perception of Mandarin Chinese tones (Leather 1983)
12 Influencing parameters and instruments for vowel normalization
Syllable internal
Syllable external
Context
Formant ratio
Vocalic
Relative patterns
Prosodic
Gender
F0
Tonal
Auditory gestalts
Visual information
13 Influencing parameters and instruments for vowel normalization
- Formant ratio
  - Vowels are relative patterns, not absolute frequencies
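The relative-pattern idea can be illustrated with a minimal Python sketch, assuming an idealized talker difference in which every formant is multiplied by one common factor (roughly what a uniformly shorter vocal tract would produce). The formant values and the factor 1.2 are hypothetical. Under that assumption the absolute frequencies change while the formant ratios stay the same.

```python
def formant_ratios(f1: float, f2: float, f3: float) -> tuple[float, float]:
    """Describe a vowel by F2/F1 and F3/F2 instead of absolute frequencies."""
    return (f2 / f1, f3 / f2)


def scale_formants(formants: tuple[float, ...], k: float) -> tuple[float, ...]:
    """Multiply every formant by the same factor k (idealized uniform talker difference)."""
    return tuple(k * f for f in formants)


# Hypothetical formants (Hz) of one vowel from a talker with a longer vocal tract.
talker_a = (300.0, 2200.0, 3000.0)
# The same vowel from a hypothetical talker whose formants are all 20% higher.
talker_b = scale_formants(talker_a, 1.2)

print("absolute:", talker_a, talker_b)
print("ratios A:", formant_ratios(*talker_a))
print("ratios B:", formant_ratios(*talker_b))
```

Real talker differences are not a single uniform scale factor, so ratio invariance holds only under the stated idealization.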
14 Influencing parameters and instruments for vowel normalization
- Formant ratio
  - Fig. 2: Spectrogram of a man and a woman saying "cat". The three lowest vowel formants (vocal tract resonant frequencies) are marked as F1, F2 and F3 (Johnson 2004).
15 Influencing parameters and instruments for vowel normalization
- F0
  - Miller 1953
    - doubled F0 and found vowel category shifts for most American English vowels
  - Fujisaki & Kawashima 1968
    - found F1 boundary shifts from 100 Hz to 200 Hz for F0 shifts of 200 Hz
16 Influencing parameters and instruments for vowel normalization
Syllable internal
Syllable external
Context
Formant ratio
Vocalic
Relative patterns
Prosodic
Gender
F0
Tonal
Auditory gestalts
Visual information
Articulatory gestures
Gender / Age
17 Influencing parameters and instruments for vowel normalization
- Visual information
  - Gender: boundary shift much like the F0 shift (Strand & Johnson 1996)
  - Age
  - Vowel quality boundary shift through differing visual phonetic information (Johnson 1999)
  - Sociocultural: speech intelligibility is reduced when the voice is associated with an Asian-looking face (Rubin 1992)
18 Influencing parameters and instruments for vowel normalization
- Auditory gestalts: secondary cues
  - Duration
  - Formant frequency movements (trajectories)
  - Lehiste & Metzger 1973
    - Fixed-duration vowels synthesized with steady-state formant frequencies: 51% correct
    - Mixed lists of the original vowels from men, women and children: 79% correct
  - Hillenbrand & Nearey 1999
    - Flat-formant vowels were correctly identified 74% of the time, while vowels synthesized with the original formant frequency trajectories were correctly identified 89% of the time.
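The contrast between flat-formant and original-trajectory stimuli can be made concrete with a formant track sampled over the vowel. The Python sketch below derives a simple movement cue (the change between 20% and 80% of the vowel's duration) and a flat version that holds one value throughout. The sampling points, the use of the midpoint value for the flat version and the F2 values are assumptions for illustration, not Hillenbrand & Nearey's synthesis procedure.

```python
def value_at(track: list[float], fraction: float) -> float:
    """Linearly interpolate a formant track (equally spaced samples) at a fraction of its duration."""
    pos = fraction * (len(track) - 1)
    i = min(int(pos), len(track) - 2)
    return track[i] + (pos - i) * (track[i + 1] - track[i])


def movement(track: list[float], start: float = 0.2, end: float = 0.8) -> float:
    """Formant change (Hz) between two relative time points: a simple dynamic cue."""
    return value_at(track, end) - value_at(track, start)


def flatten(track: list[float]) -> list[float]:
    """Hold the midpoint value constant over the whole vowel (a 'flat-formant' version)."""
    mid = value_at(track, 0.5)
    return [mid] * len(track)


# Hypothetical F2 track (Hz) of one vowel, 9 equally spaced samples.
f2 = [1700.0, 1750.0, 1820.0, 1900.0, 1980.0, 2050.0, 2100.0, 2130.0, 2150.0]
print(movement(f2))           # dynamic cue present in the original trajectory
print(movement(flatten(f2)))  # 0.0: the flat version carries no movement information
```

Both versions share the same midpoint value, so any identification advantage of the original can only come from the formant movement.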
19 Contents
- 1. Speech and speaker normalization in vowel normalization: definition
- 2. Influencing parameters and instruments for vowel normalization
- 3. Theories
  - 3.1 Vocal tract normalization (VTN)
  - 3.2 Talker normalization (TN)
- 4. Studies: Johnson 1990 and 1999
- 5. Recapitulation
20 Theories - VTN
- Vocal tract normalization theories assume that listeners perceptually evaluate vowels in a talker-specific coordinate system (Johnson 2004)
  - Context vowels (reference)
  - Visual information about the size of the vocal tract
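One concrete way to build a talker-specific coordinate system from a talker's own vowels is z-score (Lobanov-style) normalization: each formant is re-expressed relative to that talker's mean and standard deviation. This is a swapped-in illustration of the VTN idea, not a method proposed on these slides, and the token values are hypothetical.

```python
from statistics import mean, pstdev


def lobanov(tokens: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Z-score each formant (F1, F2) within one talker's set of vowel tokens."""
    f1s = [f1 for f1, _ in tokens]
    f2s = [f2 for _, f2 in tokens]
    m1, s1 = mean(f1s), pstdev(f1s)
    m2, s2 = mean(f2s), pstdev(f2s)
    return [((f1 - m1) / s1, (f2 - m2) / s2) for f1, f2 in tokens]


# Hypothetical (F1, F2) values in Hz for the same three vowels from two talkers.
talker_low = [(300.0, 2300.0), (700.0, 1200.0), (350.0, 900.0)]
talker_high = [(360.0, 2760.0), (840.0, 1440.0), (420.0, 1080.0)]

for raw, norm in zip(talker_low, lobanov(talker_low)):
    print(raw, "->", tuple(round(v, 2) for v in norm))
for raw, norm in zip(talker_high, lobanov(talker_high)):
    print(raw, "->", tuple(round(v, 2) for v in norm))
```

Because the second talker's values are the first talker's multiplied by 1.2, both talkers end up with the same normalized coordinates. The talker's own vowels act as the reference set; with only a few tokens per talker the mean and standard deviation are poorly estimated.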
21 Theories - VTN
- But: talkers may differ from each other at the level of their articulatory habits of speech
- Perception may therefore not be able to rely on vocal tract normalization to remove talker differences by removing vocal tract differences (Johnson 2004)
- Open question: does speaker/speech variation depend only on anatomical differences?
22 Theories - VTN
- Cross-linguistic gender differences
- Bladon, Henton & Pickering (1984)
  - The differences between men and women vary from language to language
  - Cultural factors are involved in defining and shaping male or female speech
  - Anatomy does not completely determine vowel formant frequencies
23 Theories - VTN
Fig. 3: Spectral shift needed to normalize male and female spectra (from Bladon, Henton & Pickering 1984)
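Fig. 3 describes normalization as a spectral shift between male and female spectra. The Python sketch below shows what a constant shift on an auditory frequency scale (Bark) corresponds to in Hz, assuming Traunmüller's (1990) Hz-to-Bark approximation without its end-range corrections and a hypothetical shift size `delta_bark`; neither is read off Fig. 3.

```python
def hz_to_bark(f: float) -> float:
    """Traunmüller (1990) approximation of the Bark scale (no end-range corrections)."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z: float) -> float:
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def shift_spectrum(freqs_hz: list[float], delta_bark: float) -> list[float]:
    """Shift every frequency by the same number of Bark (negative = downward)."""
    return [bark_to_hz(hz_to_bark(f) + delta_bark) for f in freqs_hz]


# Hypothetical female formants (Hz) and a hypothetical 1-Bark downward shift.
female = [400.0, 2000.0, 2900.0]
shifted = shift_spectrum(female, delta_bark=-1.0)
for before, after in zip(female, shifted):
    print(f"{before:7.1f} Hz -> {after:7.1f} Hz (change {after - before:+.1f} Hz)")
```

A fixed shift in Bark removes more Hz at high frequencies than at low ones, so it is not the same as subtracting a constant number of Hz from every formant.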
24 Theories - VTN
- This seems to suggest that talkers choose different styles of speaking as social, dialectal and gender markers.
- A speaker normalization that removes vocal tract differences will fail to account for the linguistic categorical similarity of vowels that differ because of different habits of articulation (Johnson 2004)
25 Theories - TN
- Talker normalization is subject to expectations
  - Magnuson & Nusbaum (1994) compared 1-voice with 2-voice instructions in a mixed-talker and blocked-talker experiment
  - The advantage of the blocked-talker condition disappeared when subjects didn't know about the different F0s of the two voices
- Talker normalization is an active process
  - Kato & Kakehi (1988): listener adaptation to talker voice
  - Increase in recognition accuracy over the course of 5 stimuli presented in noise
26 Theories - TN
- In this approach, cognitive categories are represented as collections of the stored cognitive representations of experienced instances of the category,
- rather than as normalized abstract representations from which category-internal structure has been removed (Johnson 2004)
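A toy Python illustration of the exemplar idea: each category is simply a list of stored, unnormalized tokens, and a new token is assigned to the category whose stored tokens are, in total, most similar to it. The exponential decay of similarity with distance (a common choice in exemplar models), its sensitivity parameter `c` and the (F1, F2) tokens are assumptions for illustration, not Johnson's model.

```python
import math


def similarity(a: tuple[float, float], b: tuple[float, float], c: float) -> float:
    """Similarity decays exponentially with Euclidean distance (in Hz)."""
    return math.exp(-c * math.dist(a, b))


def classify(token: tuple[float, float],
             memory: dict[str, list[tuple[float, float]]],
             c: float = 0.01) -> str:
    """Return the label whose stored exemplars give the highest summed similarity."""
    scores = {
        label: sum(similarity(token, ex, c) for ex in exemplars)
        for label, exemplars in memory.items()
    }
    return max(scores, key=scores.get)


# Hypothetical stored (F1, F2) exemplars in Hz from several talkers, never normalized.
memory = {
    "i": [(280.0, 2250.0), (310.0, 2790.0), (370.0, 3200.0)],
    "a": [(710.0, 1100.0), (850.0, 1220.0), (1030.0, 1370.0)],
}
print(classify((330.0, 2600.0), memory))
```

Learning a new experienced token is just `memory[label].append(token)`, so the category keeps its talker-specific internal structure instead of being reduced to an abstract prototype.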
27 Contents
- 1. Speech and speaker normalization in vowel normalization: definition
- 2. Influencing parameters and instruments for vowel normalization
- 3. Theories
- 4. Studies
  - 4.1 Johnson 1990
  - 4.2 Johnson 1999
- 5. Recapitulation
28 Studies
- The role of perceived speaker identity in F0 normalization of vowels (Johnson 1990)
  - Vowels from a hood-hud continuum were presented in two different intonational contexts. The contexts were judged to have been produced by different speakers, even though the F0 of the test word was identical in the two contexts.
29 Studies
- The role of perceived speaker identity in F0 normalization of vowels (Johnson 1990)
  - Identification shifted as a result of the intonational context, which was interpreted as evidence for the role of perceived speaker identity in vowel normalization
30 Studies
- Auditory-visual integration of talker gender in vowel perception (Johnson 1999)
  - Exp. 1 found that the gender of auditory-visually presented stimuli shifts the phoneme boundary of a vowel continuum
  - Exp. 2 found that visual phonetic information is integrated in the boundary shift
  - Exp. 3 showed that listeners integrate abstract gender information with phonetic information in speech perception
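Both studies report their results as a shift in identification along a vowel continuum. A common way to quantify such a shift is to locate the 50% crossover (the category boundary) of each identification function and take the difference. The Python sketch below does this by linear interpolation between continuum steps; the step numbers and response proportions are hypothetical, not data from Johnson (1990) or Johnson et al. (1999).

```python
def boundary(steps: list[float], prop_b: list[float], criterion: float = 0.5) -> float:
    """Interpolate the continuum position where the proportion of 'B' responses crosses criterion.

    Assumes prop_b rises across steps and crosses the criterion once.
    """
    points = list(zip(steps, prop_b))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("identification function does not cross the criterion")


# Hypothetical proportions of "hud" responses along a 7-step hood-hud continuum.
steps = [1, 2, 3, 4, 5, 6, 7]
context_1 = [0.02, 0.05, 0.20, 0.45, 0.80, 0.95, 0.98]
context_2 = [0.01, 0.03, 0.10, 0.25, 0.60, 0.90, 0.97]

b1, b2 = boundary(steps, context_1), boundary(steps, context_2)
print(f"boundary 1: {b1:.2f}, boundary 2: {b2:.2f}, shift: {b2 - b1:+.2f} steps")
```

Fitting a logistic or probit function to the responses is a common alternative; interpolation keeps the example dependency-free.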
31 Contents
- Speech and speaker normalization in vowel normalization: definition
- Influencing parameters and instruments for vowel normalization
- Theories
- Studies: Johnson 1990 and 1999
- Recapitulation
32 Recapitulation
- Internal and external factors strongly influence the perception (of vowels)
- An explanation must integrate repeated learning
- Information on speaker identity influences the perception (of vowels)
- But: is the perception of speaker identity influenced by certain components of the speech signal?
- Can speaker identity be manipulated?
33 References
- Bladon, R. A., Henton, C. G. & Pickering, J. B. (1984). Towards an auditory theory of speaker normalization. Language & Communication 4, 59-69.
- Fujisaki, H. & Kawashima, T. (1968). The roles of pitch and higher formants in the perception of vowels. IEEE Transactions on Audio and Electroacoustics AU-16, 73-77.
- Hillenbrand, J. M. & Nearey, T. M. (1999). Identification of synthesized /hVd/ utterances: Effects of formant contour. J. Acoust. Soc. Am. 105, 3509-3523.
- Johnson, K. (1990). The role of perceived speaker identity in F0 normalization of vowels. J. Acoust. Soc. Am. 88, 642-654.
- Johnson, K. (2004). Speaker normalization in speech perception. Ohio State University.
- Johnson, K., Strand, E. A. & D'Imperio, M. (1999). Auditory-visual integration of talker gender in vowel perception. Journal of Phonetics 27, 359-384.
- Kato, K. & Kakehi, K. (1988). Listener adaptability to individual speaker differences in monosyllabic speech perception. J. Acoust. Soc. of Japan 44, 180-186.
- Ladefoged, P. & Broadbent, D. E. (1957). Information conveyed by vowels. J. Acoust. Soc. Am. 29, 98-104.
- Leather, J. (1983). Speaker normalization in the perception of lexical tone. Journal of Phonetics 11, 373-382.
- Lehiste, I. & Metzger, D. (1973). Vowel and speaker identification in natural and synthetic speech. Language and Speech 16, 356-364.
- Magnuson, J. & Nusbaum, H. (1994). Are representations used for talker identification available for talker normalization? Proceedings of the International Conference on Spoken Language Processing.
- Miller, R. L. (1953). Auditory tests with synthetic vowels. J. Acoust. Soc. Am. 25, 114-121.
- Peterson, G. E. & Barney, H. L. (1952). Control methods used in a study of the vowels. J. Acoust. Soc. Am. 24, 175-184.
- Rubin, D. L. (1992). Non-language factors affecting undergraduates' judgements of non-native English-speaking teaching assistants. Research in Higher Education 33(4).
- Strand, E. A. & Johnson, K. (1996). Gradient and visual speaker normalization in the perception of fricatives. In D. Gibbon (Ed.), Natural language processing and speech technology: Results of the 3rd KONVENS conference, Bielefeld (pp. 14-26). Berlin: Mouton de Gruyter.