Speech Signal Processing

Slides: 19
Provided by: TMH6


1
Speech Signal Processing
  • Lecturer: Jonas Samuelsson
  • TAs: Barbara Resch and Jan Plasberg
  • Speech Processing Group (TSB), Dept. Signals, Sensors, and Systems (S3)

2
Speech Processing
  • Algorithms (Programming)
  • Acoustics: psychoacoustics, room acoustics, speech production
  • Signal Processing: Fourier transforms, discrete time filters, AR(MA) models
  • Information Theory: entropy, communication theory, rate-distortion theory
  • Phonetics
  • Statistical SP: stochastic models

3
Topics, part I
  • Analysis of speech signals
  • Fourier analysis: spectrogram
  • Autocorrelation: pitch estimation
  • Linear prediction: compression, recognition
  • Cepstral analysis: pitch estimation, enhancement

4
Topics, part II
  • Speech compression.
  • Scalar quantization (PCM, DPCM).
  • (Transform Coding.)
  • Vector quantization.
  • State of the art speech coders: CELP, sinusoidal

5
Topics, part III
  • Statistical modeling of speech.
  • Gaussian mixtures: speaker identification.
  • Hidden Markov models: speech recognition.

6
Topics, part IV
  • Speech enhancement
  • Microphone array processing.
      • Beamforming.
      • Blind signal separation (cocktail party).
  • Echo cancellation.
      • The LMS algorithm.
  • Noise suppression.
      • Spectral subtraction.
      • The Wiener filter.

7
Practicalities
  • 12 lectures, 12 exercises (48h altogether).
  • 4 compulsory (graded) assignments.
  • 1 written exam.
  • 4 study points awarded on successful completion.
  • 4 pts correspond to 17 h/week.
  • Textbook: Spoken Language Processing: A Guide to Theory,
    Algorithm, and System Development by Huang et al.,
    available at Kårbokhandeln.
  • Headphones can be borrowed against a 200 SEK deposit.
  • More info in the syllabus and at
    http://www.s3.kth.se/speech/courses/2E1400/

8
Tools for Speech Processing: Prerequisites
  • Fourier transform (continuous and discrete time,
    periodic and aperiodic signals).
  • Digital filter theory. Z-transform.
  • Random processes. Innovation processes, AR, MA.
    Filtering of stochastic signals.
  • Probability theory. ML and MMSE estimation.
  • And more; cf. chapters 3 and 5 in Huang.

9
Speech Production
[Figure: speech production diagram; only the label "Lungs" survives]
10
Speech Sounds
  • Coarse classification with phonemes.
  • A phone is the acoustic realization of a phoneme.
  • Allophones are context-dependent variants of a phoneme.

11
Phoneme Hierarchy
Speech sounds: language dependent, about 50 in English.
  • Vowels: iy, ih, ae, aa, ah, ao, ax, eh, er, ow, uh, uw
  • Diphthongs: ay, ey, oy, aw
  • Consonants
      • Plosive: p, b, t, d, k, g
      • Nasal: m, n, ng
      • Fricative: f, v, th, dh, s, z, sh, zh, h
      • Retroflex liquid: r
      • Lateral liquid: l
      • Glide: w, y

12
Speech Waveform Characteristics
  • Loudness
  • Voiced/Unvoiced.
  • Pitch.
  • Fundamental frequency.
  • Spectral envelope.
  • Formants.

13
Speech Waveform Characteristics Cont.
[Figure: example waveforms]
  • Voiced speech: /ih/
  • Unvoiced speech: /s/

14
Short-Time Speech Analysis
  • Segments (or frames, or vectors) are typically of
    length 20 ms.
  • Speech characteristics are approximately constant within a segment.
  • Allows for relatively simple modeling.
  • Often overlapping segments are extracted.
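
The segmentation described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_frames`, the 50 % overlap, and the 8 kHz sampling rate are assumptions chosen for the example, not values given in the lecture.

```python
def split_into_frames(samples, sample_rate, frame_ms=20.0, overlap=0.5):
    """Split a signal into fixed-length, possibly overlapping frames.

    frame_ms and overlap are illustrative defaults (20 ms frames as on the
    slide; the 50 % overlap is an assumption).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    # Hop size: how far the frame start advances between consecutive frames.
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# Example: 1 s of silence at an assumed 8 kHz sampling rate.
signal = [0.0] * 8000
frames = split_into_frames(signal, sample_rate=8000)
```

At 8 kHz a 20 ms frame holds 160 samples, so a 50 % overlap advances the frame start by 80 samples each time. Trailing samples that do not fill a whole frame are dropped in this sketch.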

15
[Figure: only the labels "B" and "1/N" survive]
16
The Spectrogram
  • A classic analysis tool.
  • Consists of DFTs of overlapping, windowed frames.
  • Displays the distribution of energy in time and
    frequency.
  • is typically displayed.
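
A minimal sketch of how such a spectrogram could be computed, using only the Python standard library. The Hamming window, the log-magnitude (dB) scale, the frame length of 64 samples, and the function names are all assumptions made for this example; the slide does not specify them. A direct O(N²) DFT is used for clarity where real code would use an FFT.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (an assumed window choice)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitude_db(frame, floor=1e-12):
    """Log-magnitude DFT of one frame, bins 0..N/2 (direct DFT, not an FFT)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(20.0 * math.log10(max(abs(acc), floor)))
    return out


def spectrogram(samples, frame_len, hop):
    """One column of dB magnitudes per overlapping, windowed frame."""
    window = hamming(frame_len)
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        columns.append(dft_magnitude_db(frame))
    return columns


# Example: a 1 kHz sine at an assumed 8 kHz sampling rate.
fs = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(256)]
spec = spectrogram(tone, frame_len=64, hop=32)
```

Each entry of `spec` is one time slice and each value in it is the energy at one frequency bin, which gives the time-frequency picture the slide describes. With 64-sample frames at 8 kHz the bin spacing is 125 Hz, so the test tone falls in bin 8.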

17
The Spectrogram Cont.
18
Short time ACF
[Figure: ACF and DFT panels for /m/, /s/, and /ow/]
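
The short-time ACF is listed earlier in the course topics as a tool for pitch estimation. The sketch below shows one way this could be done: compute the autocorrelation of a single frame and pick the lag with the largest value inside a plausible pitch range. The function names, the 50 to 400 Hz search range, the 40 ms frame, and the synthetic test signal are assumptions for illustration, not the method presented in the lecture.

```python
import math


def short_time_acf(frame, max_lag):
    """Unnormalized autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def estimate_pitch_hz(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Pick the ACF peak within the assumed pitch range and convert to Hz."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    acf = short_time_acf(frame, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: acf[lag])
    return sample_rate / best_lag


# Example: synthetic "voiced" frame, 100 Hz fundamental plus one harmonic,
# 40 ms long at an assumed 8 kHz sampling rate.
fs = 8000
voiced = [
    math.sin(2.0 * math.pi * 100.0 * t / fs)
    + 0.5 * math.sin(2.0 * math.pi * 200.0 * t / fs)
    for t in range(320)
]
pitch = estimate_pitch_hz(voiced, fs)
```

For a periodic (voiced) frame the ACF peaks at multiples of the pitch period, while for noise-like (unvoiced) frames such as /s/ no clear peak appears. The frame here is longer than 20 ms so that it spans several pitch periods; the estimate is quantized to whole-sample lags.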