Title: Course Projects
1. Course Projects
2. Outline
- Part I
  - Course: Intelligent Systems (Dr. Wail Gueaieb)
  - Project: Sound Classification for Hearing Aid Application
- Part II
  - Course: Adaptive Signal Processing (Dr. Claude D'Amours)
  - Project: Adaptive Feedback Cancellation for Hearing Aids
3. Sound Classification for Hearing Aid Application
4. Outline
- Introduction
- Background Information
- Implementation
- Results
- Conclusion
5. The Compromise
- When a patient is fitted with a new hearing aid, it is customized for the specific type of hearing loss.
- The hearing aid is programmed to optimize the user's speech intelligibility and sound quality.
- It is, however, not possible to optimize both measures for all environments, so a compromised frequency response is used.
- It is widely agreed that a hearing aid that changes its algorithm for different environments would significantly increase user satisfaction.
6. Introduction
[Figure: example sound environments - nature, cocktail party]
7. Feature Extraction
[Figure: time-domain and frequency-domain views of speech and music signals]
8. Feature Extraction
- Overall input level
- Fluctuation strength of the overall level
- Spectral Center
- Fluctuation strength of spectral center
- Zero Crossing Ratio (ZCR)
- Percentage of Low-Energy Frames
- RMS of a Low-Pass Response
- Spectral Flux (SF)
- Mean and Variance of the Discrete Wavelet Transform (DWT)
- Difference of Maximum and Minimum Zero Crossings
- Linear Predictor Coefficients (LPC)
- High Zero-Crossing Rate Ratio (HZCRR)
- Low Short-Time Energy Ratio (LSTER)
- LSP distance
- Band Periodicity (BP)
- Noise Frame Ratio (NFR)
- RMS
- VDR
- Silence Ratio
9. Feature Extraction
Of the candidate features, the following were selected:
- Spectral Flux (SF)
- High Zero-Crossing Rate Ratio (HZCRR)
- Low Short-Time Energy Ratio (LSTER)
- Sub-Band Energy Ratio (SBER)
- Pitch
- Salience of Pitch
- Spectrogram
10. Classes
- Speech in foreground
- Speech in foreground mixed with speech in background
- Speech in foreground mixed with traffic noise
- Speech in background
- Traffic noise
- Music
- Nature
- Alarm signals
12. Feature Extraction
- Small sample segments are taken of the audio signal for the calculations.
- These segments are further divided into frames, usually overlapping by 50%.
- Features are computed for each frame and, depending on the feature, combined into a single value (e.g., by averaging over frames).
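The segmentation described above can be sketched in a few lines (a minimal illustration assuming numpy; the 8 kHz rate and 20 ms frame length are made-up example values, not from the project):

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames; hop = frame_len // 2 gives 50% overlap."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

# Example: a 1-second segment at an assumed 8 kHz rate, 20 ms frames, 50% overlap
fs = 8000
x = np.random.randn(fs)
frames = frame_signal(x, frame_len=160, hop=80)
print(frames.shape)  # (99, 160)
```

Per-frame feature values can then be averaged (or otherwise pooled) along the first axis to obtain one value per segment.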
13. Zero Crossing Ratio (ZCR)
- ZCR is defined as the number of times that a signal changes sign in a frame.
- Speech generally has a higher zero crossing ratio since it is composed of alternating voiced and unvoiced sounds at the syllable rate.
- A more useful measure based on the zero crossing ratio is the High Zero-Crossing Rate Ratio.
14. High Zero-Crossing Rate Ratio
- HZCRR is defined as the ratio of frames with a ZCR above 1.5 times the average ZCR.
[Figure: HZCRR distribution for 42 speech samples and 44 music samples]
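The ZCR and HZCRR definitions can be sketched as follows (an illustrative sketch assuming numpy; the tone-plus-noise segment is a synthetic test case, not project data):

```python
import numpy as np

def zcr(frame):
    # Fraction of adjacent sample pairs whose sign differs
    return np.mean(np.abs(np.diff(np.sign(frame))) > 0)

def hzcrr(frames):
    # Ratio of frames whose ZCR exceeds 1.5x the segment-average ZCR
    z = np.array([zcr(f) for f in frames])
    return np.mean(z > 1.5 * z.mean())

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(160) / fs
tone = np.sin(2 * np.pi * 100 * t)       # few sign changes per frame (voiced-like)
noise = rng.standard_normal((5, 160))    # roughly half the pairs change sign (unvoiced-like)
frames = np.vstack([np.tile(tone, (5, 1)), noise])
print(hzcrr(frames))                     # only the noisy frames exceed the threshold
```

The noise-like frames dominate the count above the 1.5x threshold, mirroring why alternating voiced/unvoiced speech yields a higher HZCRR than music.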
15. Low Short-Time Energy Ratio
- LSTER is defined as the ratio of frames with a short-time energy (STE) below 0.5 times the average STE.
[Figure: LSTER distribution for speech and music]
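A sketch of the STE/LSTER computation (assuming numpy; the quiet/loud frame matrix is a synthetic example):

```python
import numpy as np

def lster(frames):
    # Ratio of frames whose short-time energy is below half the average STE
    e = np.sum(frames ** 2, axis=1)
    return np.mean(e < 0.5 * e.mean())

# Synthetic segment: nine quiet frames and one loud frame
frames = np.vstack([0.01 * np.ones((9, 160)), 10.0 * np.ones((1, 160))])
print(lster(frames))  # 0.9
```

Speech, with its pauses and unvoiced stretches, produces many low-energy frames and hence a higher LSTER than music.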
16. Spectral Flux (SF)
- Measures the fluctuation in the spectrum between two adjacent frames.
- Spectral flux is generally slightly higher for speech than for music.
- Speech frames contain different phonemes (which differ in spectrum).
- Music maintains its spectrum for a longer period of time.
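One common way to compute spectral flux (a sketch assuming numpy; the exact spectrum normalization varies between papers):

```python
import numpy as np

def spectral_flux(frames):
    # Mean squared difference between normalized magnitude spectra of adjacent frames
    mags = np.abs(np.fft.rfft(frames, axis=1))
    mags /= mags.sum(axis=1, keepdims=True) + 1e-12   # normalize each spectrum
    return np.mean(np.sum(np.diff(mags, axis=0) ** 2, axis=1))

t = np.arange(160) / 8000.0
tone_frames = np.tile(np.sin(2 * np.pi * 440 * t), (5, 1))
print(spectral_flux(tone_frames))  # 0.0 for identical (stationary) frames
```

A stationary signal gives zero flux, while frame-to-frame spectral change (as between phonemes) gives a positive value.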
17. Spectral Flux (SF)
[Figure: SF distribution for speech and music]
18. Pitch
- Pitch is defined as the fundamental frequency of a human speech waveform.
- It is calculated by finding the autocorrelation lag with the largest energy.
[Figure: autocorrelation of a speech sample of length 512]
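The autocorrelation pitch estimate can be sketched as follows (assuming numpy; the 50-500 Hz search range and the 200 Hz test tone are illustrative assumptions):

```python
import numpy as np

def pitch_autocorr(frame, fs, fmin=50.0, fmax=500.0):
    """Estimate pitch as fs / (lag of the largest autocorrelation peak),
    searching only lags corresponding to an assumed 50-500 Hz pitch range."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(r[lo:hi])
    return fs / lag

fs = 8000
t = np.arange(512) / fs
frame = np.sin(2 * np.pi * 200 * t)   # a 200 Hz "voiced" test tone
print(pitch_autocorr(frame, fs))      # 200.0
```

Restricting the lag search avoids picking the trivial zero-lag peak.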
19. Pitch
[Figure: distribution of pitch for speech and music]
20. Salience of Pitch (SOP)
- A second measure based on pitch.
- It is defined as the ratio of the first peak (pitch) value to the zero-lag value of the autocorrelation function.
[Figure: autocorrelation of a speech sample of length 512]
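The SOP ratio follows directly from that definition (a sketch assuming numpy; the tone and noise signals are synthetic examples):

```python
import numpy as np

def salience_of_pitch(frame, fs, fmin=50.0, fmax=500.0):
    # Ratio of the autocorrelation value at the pitch lag to the zero-lag value
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(r[lo:hi])
    return r[lag] / r[0]

fs = 8000
t = np.arange(512) / fs
rng = np.random.default_rng(1)
tone = np.sin(2 * np.pi * 200 * t)
noise = rng.standard_normal(512)
print(salience_of_pitch(tone, fs))   # close to 1: strongly periodic
print(salience_of_pitch(noise, fs))  # much smaller: no clear pitch
```

A value near 1 indicates a strongly periodic (pitched) frame; noise-like frames score low.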
21. Salience of Pitch (SOP)
[Figure: SOP distribution for speech and music]
22. Sub-Band Energy Ratio (SBER)
- To measure the SBER, the spectrum is first divided into four non-uniform sub-bands.
- The four sub-bands are [0, w0/8], [w0/8, w0/4], [w0/4, w0/2], and [w0/2, w0], where w0 is half of the sampling frequency.
[Figure: SBER distributions for speech and music]
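The four-band split can be sketched as follows (assuming numpy; the band edges follow the slide's [0, w0/8], ..., [w0/2, w0] partition, and the 100 Hz test tone is a made-up example):

```python
import numpy as np

def sub_band_energy_ratio(frame):
    """Fraction of spectral energy in each of the four non-uniform sub-bands
    [0, w0/8], [w0/8, w0/4], [w0/4, w0/2], [w0/2, w0] (w0 = half the sampling rate)."""
    p = np.abs(np.fft.rfft(frame)) ** 2
    n = len(p)
    edges = [0, n // 8, n // 4, n // 2, n]
    total = p.sum()
    return [p[a:b].sum() / total for a, b in zip(edges[:-1], edges[1:])]

fs = 8000
t = np.arange(512) / fs
low_tone = np.sin(2 * np.pi * 100 * t)   # energy concentrated in the lowest band
ratios = sub_band_energy_ratio(low_tone)
print([round(r, 3) for r in ratios])
```

The four ratios sum to one, so each acts as a normalized energy-distribution feature.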
23. SBER2
[Figure: distribution of SBER2 for speech and music]
24. Spectrogram
- The short-time Fourier transform is a method to transform a signal from the time domain to a time-frequency domain. It performs a Fourier transform on a finite window centered at t of the signal x(t).
- Two features can be extracted:
  - Mean of the spectrogram
  - Variance of the spectrogram
[Figure: spectrogram mean and variance distributions for speech and music]
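The two spectrogram features can be computed as below (a sketch assuming numpy; the Hann window and 256/128 frame sizes are assumed choices, not stated in the slides):

```python
import numpy as np

def spectrogram_stats(x, frame_len=256, hop=128):
    """Mean and variance of the magnitude spectrogram (windowed short-time FFT)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * win for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return spec.mean(), spec.var()

rng = np.random.default_rng(2)
m, v = spectrogram_stats(rng.standard_normal(4096))
print(m > 0 and v > 0)  # True
```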
25. Implementation
What type of network? A multilayer perceptron (MLP). Why?
Table 1: Results for the classification of speech in foreground
Table 2: Results for the classification of background noises
26. Implementation
[Figure: block diagram - signal to feature extraction to classifier]
27. Most Influential Feature
Using each feature as a single input to the ANN, the following results were observed.
29. Using the 3 Best Features
- The training error was reduced to 10.56%.
- Accuracy: Speech 97.7%, Music 89.3%, Total 93.33%
30. Comparison to Other Work
- Dr. F. Feldbusch reported 96.3% for speech and 89.8% over 6 other classes, using 59 features.
- M. Kashif Saeed Khan reported a total accuracy of 96.6% (speech/music).
- L. Lu reported Speech 97.45%, Music 93.04%, Environment sound 84.43%; total for speech/music 98.03%.
My results:
- Accuracy: Speech 97.7%, Music 89.3%, Total 93.33%
31. Conclusion
- The implemented ANN performs relatively well compared to others.
- A classification rate of 100% is unreachable because humans use context and background knowledge to identify sounds, which our system cannot use.
- Therefore, a remote or override switch is still needed for the hearing aid.
- Further work would be to increase the number of classes.
32. References
- [1] M. K. S. Khan, W. G. Al-Khatib, and M. Moinuddin, "Automatic classification of speech and music using neural networks," in Proc. 2nd ACM Int. Workshop on Multimedia Databases, 2004, pp. 94-99.
- [2] L. Lu, H. Jiang, and H. J. Zhang, "A robust audio classification and segmentation method," in Proc. 9th ACM Int. Conf. Multimedia, 2001, pp. 203-211.
- [3] F. Feldbusch, "Identification of noises by neural nets for application in hearing aids," in Second International ICSC Symposium on Neural Computation (NC 2000), Berlin, May 23-26, 2000, pp. 505-510.
- [4] M. Liu, C. Wan, and L. Wang, "Content-based audio classification and retrieval using a fuzzy logic system: towards multimedia search engines," Soft Computing, vol. 6, no. 5, pp. 357-364, 2002.
33. Adaptive Feedback Cancellation for Hearing Aids
34. Outline
- Introduction
- Background Information
- Adaptive filter used
- Decorrelation stage
- Implementation
- Results
- Conclusion
35. Introduction
- One of the biggest problems hearing aid users complain about is the howling noise produced when acoustic feedback is present.
- Without a method to reduce the acoustic feedback, the hearing aid user is forced to reduce the gain of the hearing aid in order to eliminate such problems.
36. Solution to the Problem
- A solution to the acoustic feedback problem is the use of an adaptive filter.
- The adaptive filter models the feedback transfer function.
37. Adaptive Filter
[Figure: transversal (FIR) adaptive filter structure with delay elements z^-1, weights w0 ... wM-1, and summing nodes]
38. Implementation
- Adaptive filter used: Normalized Least Mean Square (NLMS)
1. Initialization: w(0) = 0
2. Filter output: y(n) = w(n)^T u(n)
3. Compute the error: e(n) = d(n) - y(n)
4. Update weights: w(n+1) = w(n) + (mu / (delta + ||u(n)||^2)) u(n) e(n)
where mu is the step size and delta is a small constant that prevents division by zero.
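The four NLMS steps can be sketched as a system-identification demo (a minimal sketch assuming numpy; the three-tap path h and all parameter values are made-up test values, not the models used in the project):

```python
import numpy as np

def nlms(d, x, M=32, mu=0.5, delta=1e-6):
    """Adapt M weights so y(n) = w(n)^T u(n) tracks d(n); returns (weights, error)."""
    w = np.zeros(M)                              # 1. initialization
    e = np.zeros(len(x))
    for n in range(M - 1, len(x)):
        u = x[n - M + 1 : n + 1][::-1]           # u(n) = [x(n), x(n-1), ..., x(n-M+1)]
        y = w @ u                                # 2. filter output
        e[n] = d[n] - y                          # 3. error
        w += (mu / (delta + u @ u)) * u * e[n]   # 4. normalized weight update
    return w, e

rng = np.random.default_rng(3)
h = np.array([0.5, -0.3, 0.1])                   # unknown "feedback path" to identify
x = rng.standard_normal(5000)
d = np.convolve(x, h)[: len(x)]                  # desired signal: x filtered by h
w, e = nlms(d, x, M=8)
print(np.abs(e[-100:]).max())                    # error is tiny after convergence
```

The adapted weights converge to the unknown path's coefficients, which is exactly how the filter models the feedback transfer function.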
39. Adaptive Filter
- The adaptive filter adapts its weights to minimize the mean square of the signal e(n): minimize E[e^2(n)].
- In this system the signal e(n) contains not only the feedback signal u(n) but also s(n), which we do not want to be cancelled, so a decorrelation stage is needed.
40. Decorrelation
- Based on the paper by Joson et al. [1], the decorrelation method used is a frequency compressor.
- This decorrelation stage is added to the hearing aid path as a preprocessor to the hearing aid.
- In theory, two pure tones of different frequencies are uncorrelated.
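The claim that pure tones at different frequencies are uncorrelated can be checked numerically (an illustrative numpy sketch; the 5000/4700 Hz pair mirrors the compressor example on slide 42):

```python
import numpy as np

fs = 22050
t = np.arange(fs) / fs                      # one second of samples
a = np.sin(2 * np.pi * 5000 * t)
b = np.sin(2 * np.pi * 4700 * t)
rho = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(abs(rho))                             # essentially zero: the tones are uncorrelated
```

Because the compressed signal is uncorrelated with the original, the adaptive filter can cancel the feedback component without cancelling the desired signal.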
41. Decorrelation Stage
- Frequency compressor: output frequency = r x input frequency, where r is the compression ratio (r < 1).
42. Frequency Compressor
[Figure: a 5000 Hz sine wave is compressed to 4700 Hz]
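A hypothetical sketch of compressing all frequencies by the ratio r (assuming numpy; a real hearing-aid compressor works block-wise in real time, so this whole-signal time-scaling only illustrates the frequency mapping, with r = 0.94 inferred from the 5000 Hz to 4700 Hz example):

```python
import numpy as np

fs = 22050
r = 0.94                                    # compression ratio: 5000 Hz -> 4700 Hz
n = np.arange(fs)                           # one second of samples
x = np.sin(2 * np.pi * 5000 * n / fs)
y = np.interp(r * n, n, x)                  # read the input at compressed time indices
f_out = np.argmax(np.abs(np.fft.rfft(y)))   # FFT bin index = Hz for a 1-second signal
print(f_out)  # 4700
```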
43. Simulation and Results
- Simplifications
  - Feedback path F(z) was modeled as a fixed transfer function (equation not shown)
  - Hearing aid was modeled as a fixed transfer function (equation not shown)
- Adaptive filter used
  - NLMS
  - 32 taps
- Conditions: F(z) in real life changes with the environment; therefore a fast convergence time is needed for the adaptive filter.
44. Results
- Without the adaptive filter, feedback creates an unstable system causing a howling noise.
45. Results
46. Results
- Convergence takes approximately 5000 samples, which is 0.23 seconds at a sampling frequency of 22050 Hz.
Figure 7: 3 of the 32 adaptive filter weights
47. Results
- Adaptive filter output compared to the feedback signal
Figure 8: feedback signal u(n) compared with the adaptive filter output (µ = 0.01)
48. Results
Figure 9 a): e(n) with µ = 0.01
Figure 9 b): MSE with µ = 0.05
Figure 9 c): MSE with µ = 0.1
Figure 9 d): MSE with µ = 1
49. Conclusion
- A compromise must be made between convergence speed and quality by choosing an appropriate step size µ.
- Results showed that the adaptive filter successfully reduced the feedback.
- Further work would be to use a varying feedback transfer function to test the tracking ability and to find an optimum step size µ.
50. References
- [1] H. A. L. Joson, F. Asano, Y. Suzuki, and T. Sone, "Adaptive feedback cancellation with frequency compression for hearing aids," The Journal of the Acoustical Society of America, vol. 94, no. 6, pp. 3248-3254, December 1993.
- [2] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed. New Jersey: Prentice Hall, 1996, pp. 782-792.