Title: Course Projects
1. Course Projects
2. Outline
- Part I
  - Course: Intelligent Systems (Dr. Wail Gueaieb)
  - Project: Sound Classification for Hearing Aid Application
- Part II
  - Course: Adaptive Signal Processing (Dr. Claude D'Amours)
  - Project: Adaptive Feedback Cancellation for Hearing Aids
3. Sound Classification for Hearing Aid Application
4. Outline
- Introduction
- Background Information
- Implementation
- Results
- Conclusion
5. The Compromise
- When a patient is fitted with a new hearing aid, it is customized for the specific type of hearing loss.
- The hearing aid is programmed to optimize the user's speech intelligibility and sound quality.
- It is, however, not possible to optimize both measures for all environments, so a compromised frequency response is used.
- It is widely agreed that a hearing aid that changes its algorithm for different environments would significantly increase user satisfaction.
6. Introduction
[Figure: example sound environments - nature, cocktail party]
7. Feature Extraction
[Figure: time-domain and frequency-domain views of speech and music signals]
8. Feature Extraction
- Overall input level
- Fluctuation strength of the overall level
- Spectral Center
- Fluctuation strength of spectral center
- Zero Crossing Ratio (ZCR)
- Percentage of Low-Energy Frames
- RMS of a Low-Pass Response
- Spectral Flux (SF)
- Mean and Variance of the Discrete Wavelet Transform (DWT)
- Difference of Maximum and Minimum Zero Crossings
- Linear Predictor Coefficients (LPC)
- High Zero-Crossing Rate Ratio (HZCRR)
- Low Short-Time Energy Ratio (LSTER)
- LSP distance
- Band Periodicity (BP)
- Noise Frame Ratio (NFR)
- RMS
- VDR
- Silence Ratio
9. Feature Extraction
Of the candidate features, the following were selected:
- Spectral Flux (SF)
- High Zero-Crossing Rate Ratio (HZCRR)
- Low Short-Time Energy Ratio (LSTER)
- Sub-Band Energy Ratio (SBER)
- Pitch
- Salience of Pitch
- Spectrogram
10. Classes
- Speech in foreground
- Speech in foreground mixed with speech in background
- Speech in foreground mixed with traffic noise
- Speech in background
- Traffic noise
- Music
- Nature
- Alarm signals
12. Feature Extraction
- Small sample segments are taken of the audio signal for the calculations.
- These segments are further divided into frames, usually overlapping by 50%.
- Features are computed for each frame and, depending on the feature, combined into a single value (e.g., by averaging over frames).
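The segmentation described above can be sketched in a few lines (a minimal illustration assuming numpy; the 8 kHz rate and 20 ms frame length are made-up example values, not from the project):

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames; hop = frame_len // 2 gives 50% overlap."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

# Example: a 1-second segment at an assumed 8 kHz rate, 20 ms frames, 50% overlap
fs = 8000
x = np.random.randn(fs)
frames = frame_signal(x, frame_len=160, hop=80)
print(frames.shape)  # (99, 160)
```

Per-frame feature values can then be averaged (or otherwise pooled) along the first axis to obtain one value per segment.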
13. Zero Crossing Ratio (ZCR)
- ZCR is defined as the number of times that a signal changes sign in a frame.
- Speech generally has a higher zero crossing ratio since it is composed of alternating voiced and unvoiced sounds at the syllable rate.
- A more useful measure based on the zero crossing ratio is the High Zero-Crossing Rate Ratio.
14. High Zero-Crossing Rate Ratio
- HZCRR is defined as the ratio of frames with a ZCR above 1.5 times the average ZCR.
[Figure: HZCRR distribution for 42 speech samples and 44 music samples]
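The ZCR and HZCRR definitions can be sketched as follows (an illustrative sketch assuming numpy; the tone-plus-noise segment is a synthetic test case, not project data):

```python
import numpy as np

def zcr(frame):
    # Fraction of adjacent sample pairs whose sign differs
    return np.mean(np.abs(np.diff(np.sign(frame))) > 0)

def hzcrr(frames):
    # Ratio of frames whose ZCR exceeds 1.5x the segment-average ZCR
    z = np.array([zcr(f) for f in frames])
    return np.mean(z > 1.5 * z.mean())

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(160) / fs
tone = np.sin(2 * np.pi * 100 * t)       # few sign changes per frame (voiced-like)
noise = rng.standard_normal((5, 160))    # roughly half the pairs change sign (unvoiced-like)
frames = np.vstack([np.tile(tone, (5, 1)), noise])
print(hzcrr(frames))                     # only the noisy frames exceed the threshold
```

The noise-like frames dominate the count above the 1.5x threshold, mirroring why alternating voiced/unvoiced speech yields a higher HZCRR than music.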
15. Low Short-Time Energy Ratio
- LSTER is defined as the ratio of frames with a short-time energy (STE) below 0.5 times the average STE.
[Figure: LSTER distribution for speech and music]
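A sketch of the STE/LSTER computation (assuming numpy; the quiet/loud frame matrix is a synthetic example):

```python
import numpy as np

def lster(frames):
    # Ratio of frames whose short-time energy is below half the average STE
    e = np.sum(frames ** 2, axis=1)
    return np.mean(e < 0.5 * e.mean())

# Synthetic segment: nine quiet frames and one loud frame
frames = np.vstack([0.01 * np.ones((9, 160)), 10.0 * np.ones((1, 160))])
print(lster(frames))  # 0.9
```

Speech, with its pauses and unvoiced stretches, produces many low-energy frames and hence a higher LSTER than music.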
16. Spectral Flux (SF)
- Measures the fluctuation in the spectrum between two adjacent frames.
- Spectral flux is generally slightly higher for speech than for music.
- Speech frames contain different phonemes (which differ in spectrum).
- Music maintains its spectrum for a longer period of time.
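One common way to compute spectral flux (a sketch assuming numpy; the exact spectrum normalization varies between papers):

```python
import numpy as np

def spectral_flux(frames):
    # Mean squared difference between normalized magnitude spectra of adjacent frames
    mags = np.abs(np.fft.rfft(frames, axis=1))
    mags /= mags.sum(axis=1, keepdims=True) + 1e-12   # normalize each spectrum
    return np.mean(np.sum(np.diff(mags, axis=0) ** 2, axis=1))

t = np.arange(160) / 8000.0
tone_frames = np.tile(np.sin(2 * np.pi * 440 * t), (5, 1))
print(spectral_flux(tone_frames))  # 0.0 for identical (stationary) frames
```

A stationary signal gives zero flux, while frame-to-frame spectral change (as between phonemes) gives a positive value.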
17. Spectral Flux (SF)
[Figure: SF distribution for speech and music]
18. Pitch
- Pitch is defined as the fundamental frequency of a human speech waveform.
- It is calculated by finding the autocorrelation lag with the largest energy.
[Figure: autocorrelation of a speech sample of length 512]
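The autocorrelation pitch estimate can be sketched as follows (assuming numpy; the 50-500 Hz search range and the 200 Hz test tone are illustrative assumptions):

```python
import numpy as np

def pitch_autocorr(frame, fs, fmin=50.0, fmax=500.0):
    """Estimate pitch as fs / (lag of the largest autocorrelation peak),
    searching only lags corresponding to an assumed 50-500 Hz pitch range."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(r[lo:hi])
    return fs / lag

fs = 8000
t = np.arange(512) / fs
frame = np.sin(2 * np.pi * 200 * t)   # a 200 Hz "voiced" test tone
print(pitch_autocorr(frame, fs))      # 200.0
```

Restricting the lag search avoids picking the trivial zero-lag peak.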
19. Pitch
[Figure: distribution of pitch for speech and music]
20. Salience of Pitch (SOP)
- A second measure based on pitch.
- It is defined as the ratio of the first peak (pitch) value to the zero-lag value of the autocorrelation function.
[Figure: autocorrelation of a speech sample of length 512]
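The SOP ratio follows directly from that definition (a sketch assuming numpy; the tone and noise signals are synthetic examples):

```python
import numpy as np

def salience_of_pitch(frame, fs, fmin=50.0, fmax=500.0):
    # Ratio of the autocorrelation value at the pitch lag to the zero-lag value
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(r[lo:hi])
    return r[lag] / r[0]

fs = 8000
t = np.arange(512) / fs
rng = np.random.default_rng(1)
tone = np.sin(2 * np.pi * 200 * t)
noise = rng.standard_normal(512)
print(salience_of_pitch(tone, fs))   # close to 1: strongly periodic
print(salience_of_pitch(noise, fs))  # much smaller: no clear pitch
```

A value near 1 indicates a strongly periodic (pitched) frame; noise-like frames score low.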
21. Salience of Pitch (SOP)
[Figure: SOP distribution for speech and music]
22. Sub-Band Energy Ratio (SBER)
- To measure the SBER, the spectrum is first divided into four non-uniform sub-bands.
- The four sub-bands are [0, w0/8], [w0/8, w0/4], [w0/4, w0/2], and [w0/2, w0], where w0 is half of the sampling frequency.
[Figure: SBER distributions for speech and music]
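The four-band split can be sketched as follows (assuming numpy; the band edges follow the slide's [0, w0/8], ..., [w0/2, w0] partition, and the 100 Hz test tone is a made-up example):

```python
import numpy as np

def sub_band_energy_ratio(frame):
    """Fraction of spectral energy in each of the four non-uniform sub-bands
    [0, w0/8], [w0/8, w0/4], [w0/4, w0/2], [w0/2, w0] (w0 = half the sampling rate)."""
    p = np.abs(np.fft.rfft(frame)) ** 2
    n = len(p)
    edges = [0, n // 8, n // 4, n // 2, n]
    total = p.sum()
    return [p[a:b].sum() / total for a, b in zip(edges[:-1], edges[1:])]

fs = 8000
t = np.arange(512) / fs
low_tone = np.sin(2 * np.pi * 100 * t)   # energy concentrated in the lowest band
ratios = sub_band_energy_ratio(low_tone)
print([round(r, 3) for r in ratios])
```

The four ratios sum to one, so each acts as a normalized energy-distribution feature.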
23. SBER2
[Figure: distribution of SBER2 for speech and music]
24. Spectrogram
- The short-time Fourier transform is a method to transform a signal from the time domain to a time-frequency domain. It performs a Fourier transform on a finite window centered at t of the signal x(t).
- Two features can be extracted:
  - Mean of the spectrogram
  - Variance of the spectrogram
[Figure: spectrogram mean and variance distributions for speech and music]
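The two spectrogram features can be computed as below (a sketch assuming numpy; the Hann window and 256/128 frame sizes are assumed choices, not stated in the slides):

```python
import numpy as np

def spectrogram_stats(x, frame_len=256, hop=128):
    """Mean and variance of the magnitude spectrogram (windowed short-time FFT)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * win for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return spec.mean(), spec.var()

rng = np.random.default_rng(2)
m, v = spectrogram_stats(rng.standard_normal(4096))
print(m > 0 and v > 0)  # True
```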
25. Implementation
What type of network? A multilayer perceptron (MLP). Why?
Table 1: Results for the classification of speech in foreground
Table 2: Results for the classification of background noises
26. Implementation
[Figure: block diagram - signal to feature extraction to classifier]
27. Most Influential Feature
Using each feature as a single input to the ANN, the following results were observed.
29. Using the 3 Best Features
- The training error was reduced to 10.56%.
- Accuracy: Speech 97.7%, Music 89.3%, Total 93.33%
30. Comparison to Other Work
- Dr. F. Feldbusch reported 96.3% for speech and 89.8% over 6 other classes, using 59 features.
- M. Kashif Saeed Khan reported a total accuracy of 96.6% (speech/music).
- L. Lu reported Speech 97.45%, Music 93.04%, Environment sound 84.43%; total for speech/music 98.03%.
My results:
- Accuracy: Speech 97.7%, Music 89.3%, Total 93.33%
31. Conclusion
- The implemented ANN performs relatively well compared to others.
- A classification rate of 100% is unreachable because humans use context and background knowledge to identify sounds, which our system cannot use.
- Therefore, a remote or override switch is still needed for the hearing aid.
- Further work would be to increase the number of classes.
32. References
- [1] M. K. S. Khan, W. G. Al-Khatib, and M. Moinuddin, "Automatic classification of speech and music using neural networks," in Proc. 2nd ACM Int. Workshop on Multimedia Databases, 2004, pp. 94-99.
- [2] L. Lu, H. Jiang, and H. J. Zhang, "A robust audio classification and segmentation method," in Proc. 9th ACM Int. Conf. Multimedia, 2001, pp. 203-211.
- [3] F. Feldbusch, "Identification of noises by neural nets for application in hearing aids," in Second International ICSC Symposium on Neural Computation (NC 2000), Berlin, May 23-26, 2000, pp. 505-510.
- [4] M. Liu, C. Wan, and L. Wang, "Content-based audio classification and retrieval using a fuzzy logic system: towards multimedia search engines," Soft Computing, vol. 6, no. 5, pp. 357-364, 2002.
33. Adaptive Feedback Cancellation for Hearing Aids
34. Outline
- Introduction
- Background Information
- Adaptive filter used
- Decorrelation stage
- Implementation
- Results
- Conclusion
35. Introduction
- One of the biggest problems hearing aid users complain about is the howling noise produced when acoustic feedback is present.
- Without a method to reduce the acoustic feedback, the hearing aid user is forced to reduce the gain of the hearing aid in order to eliminate such problems.
36. Solution to the Problem
- A solution to the acoustic feedback problem is the use of an adaptive filter.
- The adaptive filter models the feedback transfer function.
37. Adaptive Filter
[Figure: transversal (FIR) adaptive filter structure with delay elements z^-1, weights w0 ... wM-1, and summing nodes]
38. Implementation
- Adaptive filter used: Normalized Least Mean Square (NLMS)
1. Initialization: w(0) = 0
2. Filter output: y(n) = w(n)^T u(n)
3. Compute the error: e(n) = d(n) - y(n)
4. Update weights: w(n+1) = w(n) + (mu / (delta + ||u(n)||^2)) u(n) e(n)
where mu is the step size and delta is a small constant that prevents division by zero.
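The four NLMS steps can be sketched as a system-identification demo (a minimal sketch assuming numpy; the three-tap path h and all parameter values are made-up test values, not the models used in the project):

```python
import numpy as np

def nlms(d, x, M=32, mu=0.5, delta=1e-6):
    """Adapt M weights so y(n) = w(n)^T u(n) tracks d(n); returns (weights, error)."""
    w = np.zeros(M)                              # 1. initialization
    e = np.zeros(len(x))
    for n in range(M - 1, len(x)):
        u = x[n - M + 1 : n + 1][::-1]           # u(n) = [x(n), x(n-1), ..., x(n-M+1)]
        y = w @ u                                # 2. filter output
        e[n] = d[n] - y                          # 3. error
        w += (mu / (delta + u @ u)) * u * e[n]   # 4. normalized weight update
    return w, e

rng = np.random.default_rng(3)
h = np.array([0.5, -0.3, 0.1])                   # unknown "feedback path" to identify
x = rng.standard_normal(5000)
d = np.convolve(x, h)[: len(x)]                  # desired signal: x filtered by h
w, e = nlms(d, x, M=8)
print(np.abs(e[-100:]).max())                    # error is tiny after convergence
```

The adapted weights converge to the unknown path's coefficients, which is exactly how the filter models the feedback transfer function.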
39. Adaptive Filter
- The adaptive filter adapts its weights to minimize the mean square of the signal e(n): minimize E[e^2(n)].
- In this system the signal e(n) contains not only the feedback signal u(n) but also s(n), which we do not want to be cancelled, so a decorrelation stage is needed.
40. Decorrelation
- Based on the paper by Joson et al. [1], the decorrelation method used is a frequency compressor.
- This decorrelation stage is added to the hearing aid path as a preprocessor to the hearing aid.
- In theory, two pure tones of different frequencies are uncorrelated.
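The claim that pure tones at different frequencies are uncorrelated can be checked numerically (an illustrative numpy sketch; the 5000/4700 Hz pair mirrors the compressor example on slide 42):

```python
import numpy as np

fs = 22050
t = np.arange(fs) / fs                      # one second of samples
a = np.sin(2 * np.pi * 5000 * t)
b = np.sin(2 * np.pi * 4700 * t)
rho = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(abs(rho))                             # essentially zero: the tones are uncorrelated
```

Because the compressed signal is uncorrelated with the original, the adaptive filter can cancel the feedback component without cancelling the desired signal.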
41. Decorrelation Stage
- Frequency compressor: output frequency = r x input frequency, where r is the compression ratio (r < 1).
42. Frequency Compressor
[Figure: a 5000 Hz sine wave is compressed to 4700 Hz]
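A hypothetical sketch of compressing all frequencies by the ratio r (assuming numpy; a real hearing-aid compressor works block-wise in real time, so this whole-signal time-scaling only illustrates the frequency mapping, with r = 0.94 inferred from the 5000 Hz to 4700 Hz example):

```python
import numpy as np

fs = 22050
r = 0.94                                    # compression ratio: 5000 Hz -> 4700 Hz
n = np.arange(fs)                           # one second of samples
x = np.sin(2 * np.pi * 5000 * n / fs)
y = np.interp(r * n, n, x)                  # read the input at compressed time indices
f_out = np.argmax(np.abs(np.fft.rfft(y)))   # FFT bin index = Hz for a 1-second signal
print(f_out)  # 4700
```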
43. Simulation and Results
- Simplifications
  - Feedback path F(z) was modeled as a fixed transfer function (equation not shown)
  - Hearing aid was modeled as a fixed transfer function (equation not shown)
- Adaptive filter used
  - NLMS
  - 32 taps
- Conditions: F(z) in real life changes with the environment; therefore a fast convergence time is needed for the adaptive filter.
44. Results
- Without the adaptive filter, feedback creates an unstable system causing a howling noise.
45. Results
46. Results
- Convergence takes approximately 5000 samples, which is 0.23 seconds at a sampling frequency of 22050 Hz.
Figure 7: 3 of the 32 adaptive filter weights
47. Results
- Adaptive filter output compared to the feedback signal
Figure 8: feedback signal u(n) compared with the adaptive filter output (µ = 0.01)
48. Results
Figure 9 a): e(n) with µ = 0.01
Figure 9 b): MSE with µ = 0.05
Figure 9 c): MSE with µ = 0.1
Figure 9 d): MSE with µ = 1
49. Conclusion
- A compromise must be made between convergence speed and quality by choosing an appropriate step size µ.
- Results showed that the adaptive filter successfully reduced the feedback.
- Further work would be to use a varying feedback transfer function to test the tracking ability and to find an optimum step size µ.
50. References
- [1] H. A. L. Joson, F. Asano, Y. Suzuki, and T. Sone, "Adaptive feedback cancellation with frequency compression for hearing aids," The Journal of the Acoustical Society of America, vol. 94, no. 6, pp. 3248-3254, December 1993.
- [2] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed. New Jersey: Prentice Hall, 1996, pp. 782-792.