Variability in the Speech Signal

1
Variability in the Speech Signal
  • Why perfect speech recognition is always ten
    years away.
  • 11-752 Spring 2004
  • Antoine Raux

2
Outline
  • What is the speech signal?
  • What is in the speech signal?
  • Linguistic variability
  • Speaker variability
  • Task variability
  • Environmental variability

3
What is the speech signal?
  • A one-dimensional waveform, analyzed in terms of:
      • Spectrum
      • F0
      • Power
      • Duration (segments, pauses)
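
As a rough illustration of three of these measurements (power, spectrum, F0), here is a minimal sketch on a synthetic signal. Everything in it is an assumption made for the example and not part of the slides: the 8 kHz sample rate, the four-harmonic test tone standing in for voiced speech, the function names, and the choice of a naive DFT and an autocorrelation peak picker. Real systems typically use windowed FFTs and more robust pitch trackers.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, assumed for this sketch


def synth_voiced(f0_hz, num_samples, sample_rate=SAMPLE_RATE):
    """Synthetic stand-in for voiced speech: four harmonics of f0, amplitude 1/k."""
    return [
        sum(math.sin(2 * math.pi * k * f0_hz * t / sample_rate) / k
            for k in range(1, 5))
        for t in range(num_samples)
    ]


def power(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 (slow, but dependency-free)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def estimate_f0(frame, sample_rate=SAMPLE_RATE, fmin=60.0, fmax=400.0):
    """Return sample_rate / lag for the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


frame = synth_voiced(f0_hz=100.0, num_samples=400)
spectrum = magnitude_spectrum(frame)
peak_hz = spectrum.index(max(spectrum)) * SAMPLE_RATE / len(frame)
print(power(frame), peak_hz, estimate_f0(frame))
```

Duration is not covered here, since measuring it requires segmenting the signal first.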

4
What information is in the Speech Signal?
  • Linguistic content
      • Phonemes, words, and sentences
      • Prosody
  • Speaker characteristics
      • Gender
      • Dialect
      • Individual differences

5
What is in the Speech Signal?
  • Task/State characteristics
      • Emotions
      • Lombard speech in noisy environments
      • Speaking style
  • Environment characteristics
      • Surrounding noise
      • Microphone/channel

6
Source-Channel Model of Speech Production
  • Diagram: Linguistic Message -> Channel -> Speech
    Signal

7
Source-Channel Model of Speech Production
  • Diagram: Linguistic Message -> Channel -> Speech
    Signal
  • Added diagram labels: Speaker, Environment, Task

8
Source-Channel Model
  • Diagram: Linguistic Message -> Channel -> Speech
    Signal
  • Diagram labels: Speaker, Environment, Task
  • Callout: Assumed to be Invariant!!
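
In the usual statistical reading of a source-channel model, a recognizer observes the speech signal and searches for the most probable linguistic message. The slides show only the diagram, so the formulation below is the standard textbook one and not necessarily what was presented; the symbols W (linguistic message) and X (speech signal) are assumptions introduced here.

```latex
\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} P(X \mid W)\, P(W)
```

Under this reading, P(W) describes the source and P(X | W) describes the channel, so any variability due to speaker, environment, and task has to be absorbed by P(X | W).
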
9
Linguistic Variability
  • Different phonemes have different spectral
    characteristics (of course)
  • Coarticulation effect: the spectral
    characteristics of phonemes change depending on
    the neighboring phonemes
  • F0, duration, and power vary according to
    intonation and stress

10
Environment Variability
  • Non-speech events (usually not the focus of the
    task)
  • Noise at the source
      • Static noises (e.g. fan, engine)
      • Transient noises (e.g. door slam, other
        speakers)
  • Noise in the channel
      • Microphone buzz
      • Telephone/cell phone (limited bandwidth)
  • (Speech Enhancement Assessment Resource (SpEAR)
    Database, http://ee.ogi.edu/NSEL/, Beta Release
    v1.0. CSLU, Oregon Graduate Institute of Science
    and Technology. E. Wan, A. Nelson, and Rick
    Peterson.)
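
To make the idea of a static noise concrete, the sketch below mixes stationary noise into a clean signal at a chosen signal-to-noise ratio (SNR). It is only an illustration under stated assumptions: white Gaussian noise stands in for a fan or engine, the clean signal is a synthetic 200 Hz tone, and the function names and the 10 dB target are hypothetical choices, not taken from the slides or from the SpEAR database.

```python
import math
import random


def mean_power(samples):
    """Mean squared amplitude."""
    return sum(v * v for v in samples) / len(samples)


def add_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise, scaled so the mix has the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise."""
    residual = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Synthetic "clean" signal: 0.1 s of a 200 Hz tone at 8 kHz (assumed values).
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(800)]
noisy = add_noise(clean, snr_db=10.0)
print(measured_snr_db(clean, noisy))
```

Transient noises and bandwidth limits are not modeled here; they would need a time-localized noise burst and a band-limiting filter respectively.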

11
Speaker Variability: Gender
  • Females usually have higher mean F0 than males
  • Formant frequencies are often also higher for
    females
  • Some phonetic phenomena are more frequent in one
    gender than the other (e.g. in North American
    English, creaky voice is more frequent for
    females than males)

12
Speaker Variability: Gender
  • (A. Syrdal, "Acoustic Variability in Spontaneous
    Conversational Speech of American English
    Talkers," ICSLP 96)

13
Speaker Variability: Dialect
  • Different dialects use different phonemes for the
    same word
      • e.g. British vs. American "better"
  • Different dialects use different allophones for
    the same phoneme (in a given context)
      • e.g. Japanese-accented vs. American L/R
  • Differences in prosody

14
Speaker Variability: Individual Differences
  • Physical constitution (lungs, vocal tract)
  • Level of education/social environment
  • Personal history
  • All of these yield differences between the speech
    of different individuals.

15
Task/State Variability
  • Emotions
      • Irritation, frustration (e.g. dialogue systems)
      • Tiredness (e.g. at the end of long recording
        sessions)
  • Lombard speech (speech produced in noisy
    environments)
      • Energy shifts towards higher frequencies
      • Vowels get longer
  • (J.C. Junqua, "The Lombard reflex and its role
    on human listeners and automatic speech
    recognizers," J. Acoust. Soc. Am., 1993)
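
One simple way to quantify "energy shifts towards higher frequencies" is the spectral centroid, the power-weighted mean frequency of the spectrum. The sketch below is an illustration only: the spectral centroid is a measure chosen here and is not taken from the cited paper, and the two harmonic amplitude profiles are made-up synthetic values, not measurements of real normal or Lombard speech.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 (slow, but dependency-free)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid_hz(frame, sample_rate):
    """Power-weighted mean frequency of the frame's spectrum."""
    powers = [m * m for m in magnitude_spectrum(frame)]
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * p for k, p in enumerate(powers)) / sum(powers)


def harmonic_tone(amplitudes, f0_hz=100.0, sample_rate=8000, num_samples=400):
    """Sum of harmonics of f0; amplitudes[k-1] scales the k-th harmonic."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0_hz * t / sample_rate)
            for k, a in enumerate(amplitudes))
        for t in range(num_samples)
    ]


# Hypothetical amplitude profiles: the second keeps more energy in the
# upper harmonics, mimicking a flatter spectral tilt.
normal_like = harmonic_tone([1.0, 0.5, 0.25, 0.125])
lombard_like = harmonic_tone([1.0, 0.8, 0.6, 0.4])

print(spectral_centroid_hz(normal_like, 8000),
      spectral_centroid_hz(lombard_like, 8000))
```

The second tone has the higher centroid, which is the direction of change the slide describes.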

16
Task/State Variability
  • Speaking style
      • The speech signal of the same speaker differs
        between reading a novel aloud and having an
        informal conversation.
      • Differences in formant positions can be
        significant (i.e. comparable to inter-speaker
        differences)
      • The same holds for prosodic features (F0,
        duration)
  • (M. Abe, "Speaking Styles: Statistical Analysis
    and Synthesis by a Text-to-Speech System," in
    Progress in Speech Synthesis, 1997)

17
Interaction Between Different Sources of
Variability
  • Examples:
      • Effect of dialect depends on linguistic context
      • Effect of gender depends on dialect
      • Effect of emotion depends on gender
  • The components of the speech signal are HARD to
    separate

18
The Good Thing about Variability
  • Many types of information are combined in the
    speech signal
  • There are many things to learn (about the speaker,
    environment) just from the speech signal (or
    combined with visual cues)!

19
Conclusion
  • Speech signal is analyzed in terms of spectrum,
    F0, duration and power
  • Language, Environment, Speaker and Task all
    affect one or more features
  • The impact of each source of variability depends
    on the others

20
Conclusion
  • This is why speech recognition is really
    difficult!
  • A speech processing system needs to either:
      • Separate the uninteresting sources of
        variability from the interesting one(s)
      • OR work in limited conditions. Examples:
          • Speech recognition: fixed speaker, task,
            and environment
          • Speaker recognition: fixed linguistic
            content, task, and environment
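
As one concrete instance of the first option, a fixed channel (for example a particular microphone) can be factored out by mean normalization of log-domain features, commonly known as cepstral mean normalization. This technique is not mentioned in the slides and is shown here only as an illustration. It relies on the assumption that a fixed linear channel multiplies the spectrum, which becomes a constant additive offset per feature dimension in the log domain. The feature values, the offset, and the function name below are hypothetical.

```python
def mean_normalize(frames):
    """Subtract the per-dimension mean over the utterance from every frame."""
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


# Hypothetical log-domain features: 3 frames x 2 dimensions.
clean = [[1.0, 2.0], [2.0, 0.0], [3.0, 4.0]]

# Assumed fixed channel: a constant offset per dimension in the log domain.
channel_offset = [0.5, -1.0]
filtered = [[v + o for v, o in zip(frame, channel_offset)] for frame in clean]

# After normalization, both versions of the utterance yield the same features.
print(mean_normalize(clean))
print(mean_normalize(filtered))
```

This removes only a constant channel effect. Additive noise and the speaker and task variability discussed above are not constant offsets and are not removed by it.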