Speech/Audio Signal Processing in MATLAB/Simulink

1
Speech/Audio Signal Processing in MATLAB/Simulink
2006 Speech/Audio Signal Processing in
MATLAB/Simulink

J.-S. Roger Jang
CS Dept, Tsing-Hua Univ, Taiwan
http://www.cs.nthu.edu.tw/~jang
jang@cs.nthu.edu.tw

2
About Me

Experiences
1993-1995 The MathWorks, Inc.
1995-now CS Dept., Tsing Hua Univ., Taiwan
Research interests
Speech/Audio Signal Processing, Fuzzy Logic,
Neural Networks, Pattern Recognition, Biometric
Identification, Document Classification,
Web-based Technologies
Programming languages
MATLAB, C, JavaScript, VBScript, Perl

3
Outline

Wave file manipulation
Reading, writing, recording ...
Time-domain processing
Delay, filtering, sptools
Frequency-domain processing
Spectrogram
Pitch determination
Auto-correlation, SIFT, AMDF, HPS ...
Others
Formant estimation, speech coding

4
Toolbox/Blockset Used

MATLAB
Simulink
Signal Processing Toolbox
DSP Blockset

5
MATLAB Primer

Before you start, you need to be familiar with
MATLAB. Please read the MATLAB Primer at the
following page:
http://neural.cs.nthu.edu.tw/jang/demo/demoDownload.asp
Exercise
Plot two curves, y = sin(2t) and y = cos(3t),
in the same figure.
Plot x vs. y, where x = sin(2t) and
y = cos(3t).
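One possible way to approach the two plots, as a minimal sketch. The time range 0 to 2*pi and the 200-point grid are assumptions, not part of the exercise.

```matlab
% Sketch for the plotting exercise; range and resolution are assumed.
t = linspace(0, 2*pi, 200);   % assumed time axis
y1 = sin(2*t);
y2 = cos(3*t);

figure;
plot(t, y1, t, y2);           % two curves in the same figure
legend('sin(2t)', 'cos(3t)');
xlabel('t');

figure;
plot(y1, y2);                 % x = sin(2t) plotted against y = cos(3t)
xlabel('x = sin(2t)');
ylabel('y = cos(3t)');
```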

6
To Read a Wave File

To read an MS .wav file (PCM format only): wavread
y = wavread(file)
[y, fs, nbits] = wavread(file)
[y, fs, nbits, opts] = wavread(file)
[...] = wavread(file, n)
[...] = wavread(file, [n1, n2])
If the wav file is stereo, y will be a two-column
matrix.
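In recent MATLAB releases wavread is no longer available and audioread takes its place. A minimal sketch of the same read operations follows. It assumes a scratch file created on the spot; the tone frequencies, sample rate, and file name are illustrative choices, not files from the slides.

```matlab
% Sketch: write a short stereo test tone, then read it back with audioread.
fs = 8000;                                    % assumed sample rate in Hz
t = (0:fs-1)'/fs;                             % one second, column vector
y = 0.5*[sin(2*pi*440*t), sin(2*pi*660*t)];   % two columns = stereo

file = [tempname, '.wav'];                    % hypothetical scratch file
audiowrite(file, y, fs);                      % 16-bit PCM by default

[y2, fs2] = audioread(file);                  % whole file; N-by-2 for stereo
info = audioinfo(file);                       % NumChannels, BitsPerSample, ...
yPart = audioread(file, [1001, 2000]);        % samples 1001..2000 only

delete(file);                                 % clean up the scratch file
```

Here audioread(file, [n1, n2]) plays the role of wavread(file, [n1, n2]), and audioinfo returns the bit depth that wavread reported as nbits.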

7
To Read a Wav File

Example (wavRead01.m)
[y, fs] = wavread('singapore.wav');
plot((1:length(y))/fs, y);
xlabel('Time in seconds')
ylabel('Amplitude')

Exercise
Plot the waveform of rrrrr.wav. Use MATLAB's
zoom button to find where the consecutive rolled
R's occur.
Plot the two-channel waveform in flanger.wav.

8
Solution to the Previous Exercise

wavRead02.m
[y, fs] = wavread('flanger.wav');
subplot(2,1,1), plot((1:length(y))/fs, y(:,1));
subplot(2,1,2), plot((1:length(y))/fs, y(:,2));

9
To Play Wav Files

To play sound using the Windows audio output device:
wavplay, sound, soundsc
wavplay(y, fs)
wavplay(y, fs, 'async')   non-blocking call
wavplay(y, fs, 'sync')    blocking call
sound(y, fs)
soundsc(y, fs)            autoscales the sound
Example (wavPlay01.m)
[y, fs] = wavread('rrrrr.wav');
wavplay(y, fs)
Exercise
Follow the example to play flanger.wav.

10
To Read/Play Using DSP Blocks

To read/play sound using DSP Blockset
DSP Blockset/DSP Sources/From Wave File
DSP Blockset/DSP Sinks/To Wave Device
Example
Exercise
Create a model as shown above.

Frame-based operation!
11
Solution

Solution to the previous exercise
slWavFilePlay01.mdl

12
To Write a Wave File

To write MS wave files: wavwrite
wavwrite(y, fs, nbits, wavefile)
nbits must be 8 or 16.
y must have two columns for stereo data.
Amplitude values outside [-1, 1] are clipped.
Example (wavWrite01.m)
[y, fs] = wavread('rrrrr.wav');
wavwrite(y, fs*1.2, 8, 'testout.wav');
!start testout.wav
Exercise
Try out the above example.
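audiowrite is the current counterpart of wavwrite. The sketch below uses an assumed synthetic tone in place of rrrrr.wav, writes 8-bit data at 1.2 times the original rate, and reads it back to show that out-of-range amplitudes come back clipped. MATLAB may print a clipping warning during the write.

```matlab
% Sketch: 8-bit write at a raised sample rate, with deliberate clipping.
fs = 8000;                            % assumed original sample rate
t = (0:fs-1)'/fs;
y = 1.5*sin(2*pi*200*t);              % deliberately exceeds [-1, 1]

file = [tempname, '.wav'];            % hypothetical scratch file
newFs = round(1.2*fs);                % 9600 Hz: plays faster and higher
audiowrite(file, y, newFs, 'BitsPerSample', 8);

[z, fsz] = audioread(file);           % values are back inside [-1, 1]
delete(file);
```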

13
To Record a Wave File

To record wave files
1. Use the recording utility under WinXP.
2. Use wavrecord under MATLAB.
3. Use From Wave Device under Simulink, under
DSP Blocksets/Platform Specific IO/Windows
(Win32)
Example
1. Go ahead and try WinXP recording utility!
2. Try wavRecord01.m
3. Try slWavFileRecord01.mdl
Exercise
Try out the above examples.

14
Time-Domain Speech Signals

A typical time-domain plot of speech signals
Amplitude: volume or intensity
Frequency: pitch

15
Changing Wave Playback Param.

To control the playback of a sound:
Normal: wavplay(y, fs)
High volume: wavplay(2*y, fs)
Low volume: wavplay(0.5*y, fs)
High pitch (and faster): wavplay(y, 1.2*fs)
Low pitch (and slower): wavplay(y, 0.8*fs)
Exercise
Try wavPlay01.m and trace the code.
Create wavPlay02.m such that you can record
your own voice on the fly.
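The effect of these parameters can be checked numerically before listening. The following is a sketch on an assumed 220 Hz test tone (not one of the course files): changing the playback rate rescales both duration and pitch by the same factor, and scaling the amplitude can push samples past the playable range.

```matlab
% Sketch: how playback rate and gain change duration, pitch, and range.
fs = 8000;                              % assumed recording rate
f0 = 220;                               % assumed tone frequency in Hz
t = (0:2*fs-1)'/fs;                     % two seconds
y = 0.8*sin(2*pi*f0*t);

rates = [1.0, 1.2, 0.8]*fs;             % normal, faster, slower playback
durations = numel(y)./rates;            % seconds of playback per rate
pitches = f0*rates/fs;                  % perceived tone frequency per rate

loud = 2*y;                             % "high volume" version
nOver = sum(abs(loud) > 1);             % samples that would be clipped
loud = max(min(loud, 1), -1);           % clip explicitly to [-1, 1]

% sound(y, rates(2));                   % uncomment to listen (needs audio device)
```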

16
Time-Domain Signal Processing

Take-home exercise
How to get a high pitch with the same time span?

17
Synthetic Sounds

Use a sine wave generator (under DSP blocksets)
to produce sounds
Single frequency
Multiple frequencies
Amplitude modulation
Exercise
Create the above models.
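The same three kinds of synthetic sound can be sketched in plain MATLAB, outside Simulink. All frequencies, the modulation depth, and the one-second duration below are assumed values for illustration.

```matlab
% Sketch: single tone, sum of tones, and amplitude modulation.
fs = 8000;                                   % assumed sample rate
t = (0:fs-1)'/fs;                            % one second

single = sin(2*pi*440*t);                    % single frequency

multi = (sin(2*pi*440*t) + sin(2*pi*554*t) + sin(2*pi*659*t))/3;
                                             % three tones, scaled to stay in [-1, 1]

carrier = sin(2*pi*440*t);
envelope = 1 + 0.5*sin(2*pi*5*t);            % 5 Hz modulator, depth 0.5
am = envelope .* carrier / 1.5;              % amplitude modulation, peak <= 1

% soundsc(am, fs);                           % uncomment to listen
```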

18
Solution

Solution to the previous exercise
sineSource01
sineSource02
sineSource03

19
Delay in Speech/Audio

What is a delay in a signal?
y(n) --> y(n-k)
What effects can delay generate?
Echo
Reverberation
Chorus
Flanging

20
Single Delay in Audio Signal

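Block diagram: the input u(n) is added to a copy
of itself delayed by k samples and scaled by a
gain a, giving the output y(n).
y(n) = u(n) + a*u(n-k)
Simulink model
Exercise: Create the above model.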
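A single delay is an FIR filter with two nonzero taps, so it can be sketched with filter. The delay time of 0.25 s and the gain 0.6 are assumed values; feeding in a unit impulse makes the echo visible directly in the output.

```matlab
% Sketch: single echo y(n) = u(n) + a*u(n-k) as an FIR filter.
fs = 8000;                        % assumed sample rate
k = round(0.25*fs);               % assumed delay: 0.25 s = 2000 samples
a = 0.6;                          % assumed echo gain

u = [1; zeros(fs-1, 1)];          % unit impulse, one second long
b = [1; zeros(k-1, 1); a];        % taps: 1 at lag 0, a at lag k
y = filter(b, 1, u);              % y(1) = 1, y(k+1) = a, zero elsewhere
```

Replacing u with a recorded signal gives the audible echo.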
21
Multiple Delay in Audio Signal

How to create karaoke effects

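Block diagram: the input u(n) is summed with a
copy of the output y(n) that is fed back through
a delay of k samples and a gain a.
y(n) = u(n) + a*u(n-k) + a^2*u(n-2k) + a^3*u(n-3k) + ...
Simulink model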
22
Multiple Delay in Audio Signal

Parameter values
Feedback gain: a < 1
Actual delay time = k/fs
Exercise
Create the above model and change some parameters
to see their effects.
Modify the model to take microphone input (so you
can start singing karaoke now!)
Use a configurable subsystem to include all
possible input files and the microphone. (See
next page.)
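The infinite sum of decaying echoes is equivalent to the recursion y(n) = u(n) + a*y(n-k), which filter can evaluate as an IIR filter. A minimal sketch with assumed values for fs, k, and a:

```matlab
% Sketch: multiple echoes via the feedback form y(n) = u(n) + a*y(n-k).
fs = 8000;                            % assumed sample rate
k = round(0.1*fs);                    % assumed delay: 800 samples
a = 0.6;                              % feedback gain, must satisfy |a| < 1
delaySeconds = k/fs;                  % actual delay time = k/fs

u = [1; zeros(3*k, 1)];               % unit impulse, long enough for 3 echoes
den = [1; zeros(k-1, 1); -a];         % denominator 1 - a*z^(-k)
y = filter(1, den, u);                % echoes a, a^2, a^3 at lags k, 2k, 3k
```

With a >= 1 the echoes no longer decay, which is why the feedback gain has to stay below 1.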

23
Multiple Delay in Audio Signal

How to use configurable subsystem block?
1. Create a library (say, wavinput.mdl)
2. Get a block of configurable subsystem
3. Fill the dialog box with the library name

24
Audio Flanging

Flanging sound
A sound similar to the sound of a jet plane
flying overhead, or a "whooshing" sound
Pitch modulation due to a variable delay
Simulink demo
dspafxf.mdl (all platforms)
dspafxf_nt.mdl (for 95/98/NT)
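The variable delay behind flanging can be sketched without Simulink. This is a simplified illustration, not the dspafxf demo: the delay is rounded to whole samples (no interpolation), and the 3 ms maximum delay, 0.5 Hz sweep rate, and 440 Hz test tone are assumed values.

```matlab
% Sketch: flanger = signal plus a copy with a slowly varying delay.
fs = 8000;                                        % assumed sample rate
n = (0:2*fs-1)';                                  % sample index, two seconds
u = sin(2*pi*440*n/fs);                           % assumed input tone

maxDelay = round(0.003*fs);                       % 3 ms = 24 samples
d = round(maxDelay/2*(1 + sin(2*pi*0.5*n/fs)));   % delay sweeps 0..maxDelay

idx = n + 1 - d;                                  % 1-based index of delayed sample
valid = idx >= 1;                                 % skip samples before the start
delayed = zeros(size(u));
delayed(valid) = u(idx(valid));

y = 0.5*(u + delayed);                            % mix dry and delayed signals
```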

25
Audio Flanging

Simulink model

Original spectrogram
Modified spectrogram
26
Signal Processing Using sptool

To invoke sptool, type sptool.

27
Speech Production

How is speech produced?
Speech is produced when air is forced from the
lungs through the vocal cords (glottis) and along
the vocal tract.
Analogy to System Theory
Input: air forced into the vocal cords
Output: media vibration
System (or filter): vocal tract
Pitch frequency: frequency of the input
Formant frequency: resonant frequency

28
Source Filter Model of Speech

The source-filter model of speech production
Speech is split into a rapidly varying excitation
signal and a slowly varying filter. The envelope
of the power spectra contains the vocal tract
information.

Two important characteristics of the model are
fundamental (pitch) frequency (f0) and formants
(F1, F2, F3, ...)
29
Frame Analysis of Speech Signal
Speech wave form
Zoom in
Overlap
Frame
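Splitting a signal into overlapping frames can be sketched as follows. The frame size of 256 samples and the overlap of 128 samples are assumed values, and the 200 Hz tone stands in for a speech recording.

```matlab
% Sketch: cut a signal into overlapping frames, one frame per column.
fs = 8000;                                  % assumed sample rate
s = sin(2*pi*200*(0:fs-1)'/fs);             % stand-in for a speech signal

frameSize = 256;                            % assumed frame length
overlap = 128;                              % assumed overlap between frames
hop = frameSize - overlap;                  % step from one frame to the next

nFrames = floor((numel(s) - overlap)/hop);  % only complete frames are kept
frames = zeros(frameSize, nFrames);
for i = 1:nFrames
    first = (i-1)*hop + 1;
    frames(:, i) = s(first:first+frameSize-1);
end
```

The tail of each frame equals the head of the next one, which is the overlap shown in the figure.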
30
Spectrogram

Spectrogram (specgram.m) displays short-time
frequency contents

Wave form
Spectrogram
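What a spectrogram computes can be sketched with fft alone: window each frame, take its magnitude spectrum, and stack the results as columns. This is an illustration under assumed settings (256-point frames, 128-sample hop, Hamming window, 1000 Hz test tone), not the toolbox implementation.

```matlab
% Sketch: short-time magnitude spectra, one column per frame.
fs = 8000;                                   % assumed sample rate
t = (0:fs-1)'/fs;
s = sin(2*pi*1000*t);                        % assumed 1000 Hz test tone

N = 256;                                     % assumed frame length
hop = 128;                                   % assumed hop size
w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));    % Hamming window

nFrames = floor((numel(s) - N)/hop) + 1;
S = zeros(N/2 + 1, nFrames);
for i = 1:nFrames
    idx = ((i-1)*hop + (1:N))';              % sample indices of frame i
    X = fft(s(idx) .* w);
    S(:, i) = abs(X(1:N/2 + 1));             % keep 0 .. fs/2
end

f = (0:N/2)'*fs/N;                           % frequency axis in Hz
[~, peakBin] = max(S(:, 1));                 % strongest bin of the first frame

% imagesc((0:nFrames-1)*hop/fs, f, 20*log10(S + eps)); axis xy;
```

For this tone, f(peakBin) falls on the 1000 Hz bin in every frame; for speech, the columns change over time and show pitch harmonics and formants.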
31
Real-time Spectrogram

Try dspstfft_win32

Spectrogram
Spectrum
32
Pitch and Formants

Pitch and formants can be defined visually

Pitch period = 1/f0
First formant F1
Second formant F2
33
Spectrogram Reading

Spectrogram Reading
http://cslu.cse.ogi.edu/tutordemos/SpectrogramReading/spectrogram_reading.html

Waveform
Spectrogram
compute
34
Pitch Determination Algorithms

Time-domain
Auto-correlation
AMDF (Average Magnitude Difference Function)
Gold-Rabiner algorithm (1969)
Frequency-domain
Cepstrum (Noll 1964)
Harmonic product spectrum (Schroeder 1968)
Others
SIFT (Simple inverse filter tracking)
Maximum likelihood
Neural network approach
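As an illustration of one time-domain method from this list, AMDF can be sketched in a few lines: for each lag h, average |s(k) - s(k-h)| over the overlapping part of the frame, then take the lag with the smallest value as the pitch period. The frame below is an assumed synthetic signal with an exact period of 32 samples, and the lag search range is also an assumption.

```matlab
% Sketch: pitch period from the Average Magnitude Difference Function.
fs = 8000;                                   % assumed sample rate
P = 32;                                      % true period of the test frame
k = (0:P-1)';
onePeriod = sin(2*pi*k/P) + 0.5*sin(4*pi*k/P);
s = repmat(onePeriod, 4, 1);                 % 128-sample, exactly periodic frame
N = numel(s);

minLag = 16;                                 % assumed search range: 80..500 Hz
maxLag = 100;
d = zeros(maxLag, 1);
for h = 1:maxLag
    d(h) = mean(abs(s(h+1:N) - s(1:N-h)));   % AMDF at lag h
end

[~, i] = min(d(minLag:maxLag));              % first minimum in the range
lag = minLag + i - 1;                        % pitch period in samples
f0 = fs/lag;                                 % pitch in Hz
```

On real speech the minimum is not exactly zero and multiples of the period give similar dips, so practical trackers add thresholds or smoothing across frames.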

35
Autocorrelation of Each Frame

Let s(k) be a frame of size 128.

Figure: the frame s(k), k = 1..128, and its copy
s(k-h) shifted by h = 30.
x(30) = dot product of the overlapped part
      = sum(s(31:128).*s(1:98))
Figure: autocorrelation x(h), with the pitch
period marked.
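The per-frame computation can be sketched as a loop over lags. The frame here is an assumed synthetic signal with a pitch period of 32 samples (250 Hz at an assumed 8 kHz rate), and the lag search range is an assumption as well.

```matlab
% Sketch: autocorrelation of one 128-sample frame and its pitch period.
fs = 8000;                                   % assumed sample rate
k = (1:128)';
w = 2*pi*250/fs;                             % assumed pitch of 250 Hz
s = sin(w*k) + 0.5*sin(2*w*k);               % stand-in for a voiced frame

maxLag = 100;
x = zeros(maxLag + 1, 1);                    % x(h+1) holds the value at lag h
for h = 0:maxLag
    x(h+1) = sum(s(h+1:128) .* s(1:128-h));  % dot product of overlapped part
end

minLag = 16;                                 % skip the large peak near lag 0
[~, i] = max(x(minLag+1:maxLag+1));
period = minLag + i - 1;                     % pitch period in samples
f0 = fs/period;                              % pitch in Hz
```

For h = 30 the loop evaluates sum(s(31:128).*s(1:98)), the overlapped dot product written out above.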
36
Autocorrelation via DSP Blockset

Real-time autocorrelation demo
Exercise
Construct the above model and try it.

37
Pitch Tracking via Autocorrelation

Real-time pitch tracking via autocorrelation
pitch2.mdl

38
Formant Analysis

Characteristics of formants
Formants are perceptually defined.
The corresponding physical property is the
frequencies of resonances of the vocal tract.
Formant analysis is useful as the position of the
first two formants pretty much identifies a
vowel.
Computation methods
Peak picking on the smoothed spectrum
Peak picking on the LP spectrum
Factoring for the LP roots
Fitting of mixture of Gaussians
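Factoring for the LP roots can be sketched with base MATLAB functions only. The example assumes a synthetic single-resonance signal (700 Hz, pole radius 0.97, both illustrative) and solves the autocorrelation normal equations directly instead of calling a toolbox routine. The angle of each complex root of the prediction polynomial then gives a formant frequency estimate.

```matlab
% Sketch: formant frequency from the roots of a linear-prediction polynomial.
fs = 8000;                                   % assumed sample rate
F = 700;                                     % assumed resonance in Hz
r = 0.97;                                    % assumed pole radius
theta = 2*pi*F/fs;
aTrue = [1, -2*r*cos(theta), r^2];           % two-pole resonator
s = filter(1, aTrue, [1; zeros(799, 1)]);    % its impulse response

p = 2;                                       % LP order: 2 per expected formant
R = zeros(p + 1, 1);
for h = 0:p
    R(h+1) = sum(s(h+1:end) .* s(1:end-h));  % autocorrelation at lag h
end

pred = toeplitz(R(1:p)) \ R(2:p+1);          % normal equations for the predictor
aLp = [1; -pred];                            % prediction polynomial A(z)

z = roots(aLp);
z = z(imag(z) > 0);                          % one root per conjugate pair
formantHz = angle(z)*fs/(2*pi);              % root angle -> frequency in Hz
```

For real speech, a higher order (roughly two poles per formant plus a few extra) and a check on root radius are common, since some roots model spectral tilt rather than formants.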

39
Formant Analysis

Track Draw
A package for formant synthesis with options to
sketch formant tracks on a spectrogram.
http://www.utdallas.edu/assmann/TRACKDRAW/trackdraw.html
Formant Location Algorithm
MATLAB code by Michelle Jamrozik
http://ece.clemson.edu/speech/files.htm

40
Speech Waveform Coding

Time domain coding
PCM: Pulse Code Modulation
DPCM: Differential PCM
ADPCM: Adaptive Differential PCM (dspadpcm.mdl)
Frequency domain coding
Sub-band coding
Transform coding
Speech Coding in MATLAB
http://www.eas.asu.edu/speech/education/educ1.html
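The idea behind DPCM can be sketched in a few lines: instead of quantizing each sample, quantize the difference between the sample and a prediction, here simply the previous reconstructed sample. The step size, test tone, and first-order predictor are assumptions for illustration; this is not the dspadpcm demo.

```matlab
% Sketch: first-order DPCM encoder with the matching decoder in the loop.
fs = 8000;                              % assumed sample rate
t = (0:fs-1)'/fs;
y = 0.9*sin(2*pi*300*t);                % assumed input signal

step = 1/64;                            % assumed quantizer step size
code = zeros(size(y));                  % integer codes that would be sent
rec = zeros(size(y));                   % decoder output
pred = 0;                               % prediction of the current sample
for n = 1:numel(y)
    e = y(n) - pred;                    % prediction error
    code(n) = round(e/step);            % quantized error (transmitted)
    rec(n) = pred + code(n)*step;       % what the decoder reconstructs
    pred = rec(n);                      % next prediction = last reconstruction
end

maxCode = max(abs(code));               % range the codes actually need
maxErr = max(abs(y - rec));             % reconstruction error
```

Because neighboring samples are similar, the codes span a much smaller range than the samples themselves would at the same step size, which is where the bit saving comes from. ADPCM additionally adapts the step size over time.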

41
Conclusions

Ideal tools for speech/audio signal processing
MATLAB
Simulink
Signal Processing Toolbox
DSP Blockset
Advantages
Reliable functions: well-established and tested
Visible: graphical algorithm design tools
High-level programming language, yet C-compatible
Powerful visualization capabilities
Easy debugging
Integrated environment

42
References

[1] Discrete-Time Processing of Speech Signals,
by Deller, Proakis and Hansen, Prentice
Hall, 1993
[2] Fundamentals of Speech Recognition, by
Rabiner and Juang, Prentice Hall, 1993
[3] Effects Explained,
http://www.harmony-central.com/Effects/effects-explained.html
[4] TrackDraw,
http://www.utdallas.edu/assmann/TRACKDRAW/trackdraw.html
[5] Speech Coding in MATLAB,
http://www.eas.asu.edu/speech/education/educ1.html