Title: System architecture for Pattern Recognition in Eco Systems
Benjamín Dugnol Álvarez, Carlos Fernández García
Departamento de Matemáticas, Universidad de Oviedo
Abstract
The purpose of the present work is to count and classify the individuals in a wolf pack by processing its audio signals, assuming that we have recordings of sufficient duration obtained with a single microphone. We set out an architecture that includes the treatment of the environmental background noise, the separation of the signals and their classification.
Keywords: biological signals, wolf pack size, spectral subtraction, monaural signal separation, signal features, cepstral analysis, signal classification.
Outline
1. Motivation of the proposal.
2. The music of the wolves.
3. Introduction and state of the art.
4. Proposed architecture.
5. Processing of the background noise.
6. Classification of the signal segments.
7. Processing segments with one voice.
8. Processing segments with two voices.
9. Characterization, recognition and estimation.
10. Conclusions and future work.
11. References.
Motivation: the problem
Wolves usually live in packs of between 2 and 10 individuals. How many wolves are there in a pack? A direct estimate of the pack size is hard to obtain, because it is difficult for humans to approach the group while preserving its integrity.
[Figure: wolf pack]
The wolf pack generates acoustic information when it wants to communicate with another pack.
Motivation: the answer
Communication relies mainly on different sounds such as howls, barks, whimpers and growls. The howls of pack members can be heard about 15 km away.
Can an individual wolf be recognized by its howl?
Digital signal processing (DSP) offers a possible solution: the sound emitted by the wolf pack is processed in order to extract the part of it that corresponds to each animal.
First, biologists capture the sound.
Then, engineers digitize the signal.
Finally, engineers process the signal (using Matlab) to obtain a pack size estimate.
Wolf howls serve both to decrease and to increase the distance between communicating individuals and, as such, might be expected to carry information on individual identity.
Authors disagree on the presence of vocal signatures in wolf howls.
But I can recognize my dog by its sound!
The music of the wolves
The adults
The howl signals are simple. They can be broadly described as loud, continuous, tonal sounds with a fundamental frequency between 150 and 800 Hz. The pitch is often constant or piecewise constant over long time segments, or varies smoothly within each piece. Sometimes a howl is composed of several consecutive segments, either connected or separated by very brief interruptions.
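The pitch description above can be illustrated with a short Python sketch. It synthesizes a hypothetical howl as a harmonic tone whose fundamental is piecewise constant, and estimates the fundamental of a frame by autocorrelation restricted to the 150 to 800 Hz range. The names `synth_howl` and `frame_f0`, the three harmonics and their 1/k amplitude decay are assumptions for illustration only, not measurements of real howls.

```python
import numpy as np

FS = 44100  # sampling rate used in the proposed architecture


def synth_howl(f0_segments, fs=FS, n_harmonics=3):
    """Hypothetical adult howl: a harmonic tone with piecewise-constant pitch.

    f0_segments is a list of (f0_hz, duration_s) pairs.
    """
    f0 = np.concatenate([np.full(int(d * fs), f) for f, d in f0_segments])
    # Integrating the instantaneous frequency keeps the phase continuous at pitch changes
    phase = 2 * np.pi * np.cumsum(f0) / fs
    x = sum(np.sin(k * phase) / k for k in range(1, n_harmonics + 1))
    return x, f0


def frame_f0(frame, fs=FS, fmin=150.0, fmax=800.0):
    """Autocorrelation pitch estimate restricted to the 150-800 Hz howl range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    # The strongest peak in the allowed lag range is taken as the pitch period
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / lag
```

For example, `synth_howl([(300.0, 0.5), (450.0, 0.5)])` yields one second of a two-piece howl, and `frame_f0` applied to a 2048-sample frame inside each piece should return values close to 300 Hz and 450 Hz respectively. Real howls would need a more robust pitch tracker.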
The pups
The pups (4 to 6 months old) howl very often, answering any other howl they hear. Their signals are usually made up of shorter segments than those of the adults, with higher average fundamental frequencies.
The chorus
It begins with a single howl of relatively simple structure. After a second or two, a second wolf joins in, followed by one or two more, before the rest of the pack follows virtually en masse. This accelerating start makes it possible to pick out the first three or four individuals but, after that, too many wolves begin howling at once to count them.
Proposed architecture
The recording, obtained with a single microphone, is digitized at 44100 Hz. The background noise is then reduced, and the signal is segmented using a Hanning window with 50% overlap.
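The segmentation step can be sketched in Python as below. The frame length of 2048 samples (about 46 ms at 44100 Hz) and the function name `segment_signal` are assumptions; the slides only fix the sampling rate, the Hanning window and the 50% overlap.

```python
import numpy as np
from scipy.signal import get_window

FS = 44100  # sampling rate stated in the architecture


def segment_signal(x, frame_len=2048):
    """Split a signal into Hanning-windowed frames with 50% overlap.

    frame_len is an assumed value, not one given in the slides.
    """
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    hop = frame_len // 2                      # 50% overlap
    window = get_window("hann", frame_len)    # periodic Hann window
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] * window
                     for i in range(n_frames)])
```

A periodic Hann window was chosen because, at 50% overlap, the shifted windows add up to exactly one, so overlap-add resynthesis after processing does not introduce amplitude modulation. One second of signal gives 42 frames with these settings.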
The output is a collection of signal segments classified according to a content-based criterion:
- Type 0: segments without an underlying signal, containing only residual background noise.
- Type I: segments that possibly contain one individual voice and residual background noise.
- Type II: segments that possibly contain a mixture of two individual voices and residual background noise.
- Type III: segments that contain a mixture of three or more individual voices.
Separation of the signals contained in each segment:
- Type 0: nothing is extracted.
- Type I: one signal is extracted.
- Type II: two signals are extracted.
- Type III: nothing is extracted.
The extracted signals are then described by features, the features are used for classification, and the classification yields the pack size estimation.
Processing of the background noise
We use Boll's algorithm, originally proposed for the suppression of acoustic noise in speech signals. The algorithm performs spectral subtraction in a computationally efficient way [Boll79].
[Boll79] Boll, S.F. Suppression of Acoustic Noise in Speech Using Spectral Subtraction. IEEE Trans. Acoust., Speech and Signal Processing, vol. 27, pp. 113-120, 1979.
- Estimation of the spectrum of the signal using the STFT,
- Estimation of the noise spectrum,
- Spectral subtraction,
- Reduction of the residual noise,
- Suppression of the noise in segments where the underlying signal is inactive,
- Synthesis of the clean signal.
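The steps above can be sketched in Python with SciPy's STFT. This is a simplified magnitude subtraction in the spirit of Boll's method, not the authors' Matlab code. It assumes that the first `noise_frames` frames contain background noise only, and it omits Boll's magnitude averaging and the extra attenuation of inactive segments. The residual-noise step replaces any value below the maximum noise residual with the minimum over the adjacent frames.

```python
import numpy as np
from scipy.signal import stft, istft


def spectral_subtraction(x, fs, noise_frames=10, nperseg=1024):
    """Simplified magnitude spectral subtraction in the spirit of Boll (1979).

    Assumption: the first `noise_frames` STFT frames contain background noise only.
    """
    noverlap = nperseg // 2
    # 1. Spectrum of the noisy signal (Hann window, 50% overlap)
    _, _, X = stft(x, fs=fs, window="hann", nperseg=nperseg, noverlap=noverlap)
    mag, phase = np.abs(X), np.angle(X)

    # 2. Noise spectrum: mean magnitude over the noise-only frames, plus the
    #    maximum residual left after subtracting that mean
    noise = mag[:, :noise_frames]
    noise_mean = noise.mean(axis=1, keepdims=True)
    max_residual = (noise - noise_mean).max(axis=1, keepdims=True)

    # 3. Spectral subtraction with half-wave rectification
    clean = np.maximum(mag - noise_mean, 0.0)

    # 4. Residual noise reduction: below the maximum residual, keep the
    #    minimum over the previous, current and next frames
    prev_f = np.concatenate([clean[:, :1], clean[:, :-1]], axis=1)
    next_f = np.concatenate([clean[:, 1:], clean[:, -1:]], axis=1)
    local_min = np.minimum(np.minimum(prev_f, clean), next_f)
    clean = np.where(clean < max_residual, local_min, clean)

    # 5. Synthesis of the clean signal, reusing the noisy phase
    _, y = istft(clean * np.exp(1j * phase), fs=fs, window="hann",
                 nperseg=nperseg, noverlap=noverlap)
    return y[:len(x)]
```

In a real recording, the noise-only frames would come from the inactive segments found by the VAD rather than from the start of the file.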
Classification of the segments
Type 0 segments
First, a voice activity detector (VAD) detects the segments without an underlying signal.
Types I, II and III segments
Next, the remaining segments are classified using a multipitch estimation technique [Tolonen00].
[Tolonen00] Tolonen, T., Karjalainen, M. A Computationally Efficient Multipitch Analysis Model. IEEE Transactions on Speech and Audio Processing, vol. 8, no. 6, November 2000.
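The two-stage classification above can be sketched in Python. This is a simplified stand-in, not the authors' method: a plain energy threshold plays the role of the VAD, using a hypothetical margin of 6 dB above the background noise floor, and the number of pitches is taken as given. In the proposed architecture, that number would come from a multipitch estimator such as the Tolonen-Karjalainen model, which is not reproduced here.

```python
import numpy as np


def frame_energy_db(frame):
    """Mean power of a frame in dB (small offset avoids the log of zero)."""
    return 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)


def classify_segment(frame, noise_floor_db, n_pitches, margin_db=6.0):
    """Return 0, 1, 2 or 3 for segment Types 0, I, II and III.

    Assumptions: energy-based VAD with a hypothetical margin `margin_db`;
    `n_pitches` is supplied by an external multipitch estimator.
    """
    if frame_energy_db(frame) < noise_floor_db + margin_db or n_pitches == 0:
        return 0                # Type 0: residual background noise only
    return min(n_pitches, 3)    # Type I, II, or III (three or more voices)
```

The noise floor could be measured, for example, on segments that the noise-processing stage already marked as inactive.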
Processing segments with one voice
- The mathematical model adopted to describe the signal in these segments is:
- The signal is extracted by combining:
  - state estimation using a Kalman recursion,
  - model parameter estimation using the Expectation-Maximization (EM) algorithm.
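The equation of the one-voice model is not reproduced in this version of the slides. The Python sketch below therefore assumes a common choice consistent with the AR features listed under Characterization: an AR(p) signal observed in additive white Gaussian noise, written in state-space form. It shows only the Kalman recursion with known parameters. The EM step, which would re-estimate `a`, `sigma_e2` and `sigma_v2` from the estimated states, is not included, and the function name `kalman_ar_denoise` is hypothetical.

```python
import numpy as np


def kalman_ar_denoise(y, a, sigma_e2, sigma_v2):
    """Kalman filter estimate of an AR(p) signal observed in white noise.

    Assumed model (hypothetical, parameters taken as known):
        s[n] = a[0] s[n-1] + ... + a[p-1] s[n-p] + e[n],   e ~ N(0, sigma_e2)
        y[n] = s[n] + v[n],                                  v ~ N(0, sigma_v2)
    State vector: x[n] = (s[n], s[n-1], ..., s[n-p+1]).
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    F = np.zeros((p, p))
    F[0, :] = a                   # first row applies the AR recursion
    F[1:, :-1] = np.eye(p - 1)    # remaining rows shift past samples down
    Q = np.zeros((p, p))
    Q[0, 0] = sigma_e2            # excitation only drives the newest sample
    H = np.zeros(p)
    H[0] = 1.0                    # we observe the newest sample plus noise

    x = np.zeros(p)
    P = np.eye(p)
    s_hat = np.empty(len(y))
    for n, yn in enumerate(y):
        # Prediction step
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step with the scalar observation y[n]
        S = H @ P @ H + sigma_v2  # innovation variance
        K = P @ H / S             # Kalman gain
        x = x + K * (yn - H @ x)
        P = P - np.outer(K, H @ P)
        s_hat[n] = x[0]
    return s_hat
```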
Processing segments with two voices
- Now we have two signals, s1 and s2.
- For both we assume a Gaussian model.
- We know only the observed data y = s1 + s2.
- We calculate the MAP estimate of s1.
[Godsill97] Godsill, S.J., Tan, C.H. Removal of low frequency transient noise from old recordings using model-based signal separation techniques. In Proc. IEEE Workshop on Audio and Acoustics, Mohonk, NY State, October 1997.
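For the Gaussian case above, the MAP estimate has a closed form, sketched below in Python. The sketch assumes that s1 and s2 are independent, zero-mean Gaussian vectors with known covariance matrices C1 and C2. Under that assumption the posterior of s1 given y is Gaussian, so the MAP estimate equals the posterior mean C1 (C1 + C2)^{-1} y. The function name `map_separate` is hypothetical.

```python
import numpy as np


def map_separate(y, C1, C2):
    """MAP separation of y = s1 + s2 for independent zero-mean Gaussian sources.

    Assumes s1 ~ N(0, C1) and s2 ~ N(0, C2) with known covariances.
    """
    # Posterior mean (= MAP for a Gaussian posterior): C1 (C1 + C2)^{-1} y
    s1_hat = C1 @ np.linalg.solve(C1 + C2, y)
    # The remainder equals C2 (C1 + C2)^{-1} y, the MAP estimate of s2
    s2_hat = y - s1_hat
    return s1_hat, s2_hat
```

With C1 = 2I and C2 = I, the estimator simply assigns two thirds of the observation to s1. In practice, C1 and C2 would have to come from fitted models of each voice, for example AR models as in the one-voice case. How Godsill and Tan build these covariances is not reproduced here.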
Characterization
- A1: linear prediction coefficients of the AR model, or of the WLPC (warped) model,
- A2: cepstral coefficients, also on the Mel frequency scale,
- A3: delta-cepstral coefficients,
- A4: impulse response h(n) of the AR model,
- A5: spectral centroid,
- A6: onset segment duration,
- A7: amplitude envelope,
- A8: amplitude modulation,
- A9: fundamental frequency,
- A10: first and second harmonics,
- A11: frequency modulation,
- A12: spectral ratio,
- A13: normalized energy.
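A few of these features can be computed as sketched below in Python: A1 (LPC, autocorrelation method), A2 on a linear rather than Mel frequency scale, A5 and A13. The function names are hypothetical, and the definition of "normalized energy" as energy per sample is an assumption, since the slides do not define it.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import get_window


def lpc(x, order):
    """A1: LPC coefficients (autocorrelation method), x[n] ~ sum_k a[k] x[n-1-k]."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Solve the symmetric Toeplitz normal equations R a = r[1:]
    return solve_toeplitz(r[:order], r[1:order + 1])


def real_cepstrum(x):
    """A2 on a linear frequency scale: inverse FFT of the log magnitude spectrum."""
    return np.fft.irfft(np.log(np.abs(np.fft.rfft(x)) + 1e-12), n=len(x))


def spectral_centroid(x, fs):
    """A5: magnitude-weighted mean frequency of a Hann-windowed frame, in Hz."""
    mag = np.abs(np.fft.rfft(x * get_window("hann", len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.sum(freqs * mag) / np.sum(mag)


def normalized_energy(x):
    """A13, assuming 'normalized' means energy divided by the segment length."""
    return np.sum(x ** 2) / len(x)
```

Each segment would be mapped to a vector that concatenates such features before classification.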
Classification
Based on the features, the signal classification proceeds in two directions:
- First, we design a two-class mechanism: adults and pups.
- Next, we build an initial class, and then a new class for each individual that does not belong to an existing class.
Estimation
- The number of distinct classes found is the estimate of the pack size.
- The number of individuals in the pup and adult classes shows the pack structure.
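The "new class for each individual" rule and the resulting estimate can be sketched in Python with sequential (leader) clustering. The slides do not name a clustering algorithm, so this choice, the Euclidean distance and the `threshold` parameter are assumptions. In practice the features would need to be standardized first, because they have very different units.

```python
import numpy as np


def incremental_classes(features, threshold):
    """Sequential ('leader') clustering of feature vectors.

    Each vector joins the nearest existing class if its distance to that class
    centroid is below `threshold` (a hypothetical tuning parameter); otherwise
    it opens a new class. Returns one class label per vector.
    """
    centroids, counts, labels = [], [], []
    for v in np.asarray(features, dtype=float):
        if centroids:
            d = [np.linalg.norm(v - c) for c in centroids]
            j = int(np.argmin(d))
            if d[j] < threshold:
                counts[j] += 1
                centroids[j] += (v - centroids[j]) / counts[j]  # running mean
                labels.append(j)
                continue
        centroids.append(v.copy())   # start a new class for this individual
        counts.append(1)
        labels.append(len(centroids) - 1)
    return np.array(labels)
```

The pack size estimate is then the number of distinct labels, `len(np.unique(labels))`. The adult/pup split could be applied before or after this step, for instance by using the higher average fundamental frequency of the pups, but that rule is not part of this sketch.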
Future work
- Separating audio signals recorded with a single channel is a very complex problem.
1. Capturing the signal with two microphones is very relevant. This improvement reduces the indeterminacy of the problem, simplifies the background noise processing and allows the echo signals to be processed.
2. The background noise processing can be done with the wavelet transform.
3. The background noise processing can be carried out with a Kalman recursion.
4. The separation task, the most difficult step, can be carried out with overcomplete signal dictionaries, using methods such as Best Basis, Basis Pursuit or Matching Pursuit.
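Of the dictionary methods listed in item 4, Matching Pursuit is the simplest to sketch. The Python code below is a generic greedy Matching Pursuit over a dictionary with unit-norm columns. The spike-plus-DCT dictionary is an arbitrary overcomplete stand-in chosen only for illustration. A separation stage would need atoms adapted to howls and a rule for assigning atoms to individual wolves, and neither is shown here.

```python
import numpy as np
from scipy.fft import dct


def matching_pursuit(x, D, n_iter=10, tol=1e-8):
    """Greedy Matching Pursuit over a dictionary D whose columns have unit norm.

    Returns the coefficient vector and the final residual.
    """
    residual = np.asarray(x, dtype=float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual              # inner products with every atom
        k = int(np.argmax(np.abs(corr)))   # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]      # remove its contribution
        if np.linalg.norm(residual) < tol:
            break
    return coeffs, residual


def spike_dct_dictionary(n):
    """Overcomplete dictionary: n unit spikes plus n orthonormal DCT-II atoms."""
    return np.hstack([np.eye(n), dct(np.eye(n), norm="ortho", axis=0)])
```

A signal built from a single dictionary atom is recovered in one iteration. Mixtures of atoms from both sub-dictionaries need several iterations, and the residual energy decreases at every step.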
5. Following the human auditory system, Stéphane Maes [Maes96] suggests a nonlinear squeezing method that derives the amplitude and phase components of the signal, and from them signal features called 'wastrum' instead of cepstrum.
6. The linear model selected to describe the signals can be enhanced by considering a more sophisticated representation of the excitation signal.
For more information, see http://coco.ccu.univi.es