Title: Sound Source Separation using 3D Correlogram, Fuzzy Logic, and Neural Networks
1Sound Source Separation using 3D
Correlogram, Fuzzy Logic, and Neural Networks
- A RESEARCH PROJECT
- Eduardo Dias Trama
2Table of Contents
- INTRODUCTION
- PROJECT OVERVIEW
- THE PREPROCESSOR
- THE LEARNING PROCESSOR
- THE SEPARATION PROCESSOR
- PROJECT EXPERIMENTS
- CONCLUSION
3INTRODUCTION
- Overview of sound source separation
- Sound separation methods
- Related applications of sound separation
4Overview of sound source separation
- What is sound separation?
- Psychoacoustic properties
- Timbre
- How can sound be modeled?
5Sound separation methods
- CASA (Computational Auditory Scene Analysis),
Marrian
- Spatial and Periodicity-and-Harmonicity
- CASA 3D Correlogram analysis
- Blind source separation and prediction-driven
6Related applications of sound separation
- Sound and voice recognition
- Noise removal
- Compression
7PROJECT OVERVIEW
- Overview
- Auditory model analysis
- Sound data library and classification
- Sound data matching
- Complete sound separation system
8Overview
- What is a piano sound?
- Memory
- Clustering
9Auditory model analysis
- Properties
- Grouping
- Past knowledge
- Correlation
10Sound data library and classification
- Sound memory
- How much information is needed for later
analysis?
- Does it matter if audio data is compressed?
- Structure of classification
11Sound data matching
12Complete sound separation system
13THE PREPROCESSOR
- The Cochlea Filter Model
- Correlogram
- 3-D Correlogram
14The Cochlea Filter Model
- Filtering: basilar membrane (BM)
- Detection: inner hair cell (IHC)
- Compression: automatic gain control (AGC)
- Cochleagram
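The three stages above can be sketched in a few lines. This is only an illustrative stand-in, not the Lyon model used in the project: the two-pole resonator, the half-wave rectifier, the running-average gain control, and all names and parameter values (`resonator`, `r`, `decay`, `eps`) are assumptions chosen for brevity.

```python
import math

def resonator(signal, center_hz, sample_rate, r=0.97):
    """Filtering: a two-pole band-pass resonator standing in for one BM channel."""
    w = 2.0 * math.pi * center_hz / sample_rate
    a1, a2 = 2.0 * r * math.cos(w), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def half_wave_rectify(signal):
    """Detection: keep only positive excursions (crude IHC stand-in)."""
    return [max(0.0, v) for v in signal]

def agc(signal, decay=0.99, eps=1e-3):
    """Compression: divide each sample by a running average of the level."""
    level = 0.0
    out = []
    for v in signal:
        level = decay * level + (1.0 - decay) * v
        out.append(v / (level + eps))
    return out

def cochleagram(signal, center_freqs, sample_rate):
    """Rows are channels (one per centre frequency), columns are time samples."""
    return [agc(half_wave_rectify(resonator(signal, f, sample_rate)))
            for f in center_freqs]
```

A channel tuned near the input frequency responds much more strongly than a distant one before gain control is applied, which is the place-versus-frequency behavior the cochleagram displays.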
15Lyon cochlear model
16Correlogram
- Short-time auto-correlations of the neural firing
rates as a function of cochlear place (best
frequency) versus time
- Correlogram movie
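A minimal sketch of one Correlogram frame, assuming each cochlear channel is available as a plain list of firing-rate samples. The function names, the rectangular window, and the unnormalized sum are illustrative assumptions, not the Slaney/Lyon implementation.

```python
def autocorrelation(x, start, window, max_lag):
    """Short-time autocorrelation of x over [start, start + window), lags 0..max_lag-1.

    Requires len(x) >= start + window + max_lag - 1.
    """
    return [sum(x[start + n] * x[start + n + lag] for n in range(window))
            for lag in range(max_lag)]

def correlogram_frame(channels, start, window, max_lag):
    """One frame: rows are cochlear channels (best frequency), columns are lags."""
    return [autocorrelation(ch, start, window, max_lag) for ch in channels]

# A pulse train with a period of 8 samples produces a peak at lag 8.
pulses = [1.0 if n % 8 == 0 else 0.0 for n in range(64)]
frame = correlogram_frame([pulses], start=0, window=32, max_lag=16)
```

Moving `start` forward in time and recomputing the frame gives the sequence of images that makes up the Correlogram movie.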
17Correlogram
- Speech processing
- Extract the formants of voiced and unvoiced
sounds
- Short duration
- Auto-correlation window size
18Correlogram Frame
- Vertical axis shows low to high frequencies from
bottom to top
- Horizontal axis represents the lag or time delay
19Correlogram Frame
- Dark areas in the image show activity in the
Correlogram frame
- Vertical lines indicate cochlear channels firing
with the same period
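One way to make the "vertical line" reading concrete is to sum a frame over its channels and look for the lag where the sum peaks. This summary step is an assumption added for illustration (the slides do not state that the project used it), and the small frame below is made-up data.

```python
def summary_correlogram(frame):
    """Sum a frame (rows = channels, columns = lags) over channels."""
    return [sum(column) for column in zip(*frame)]

def common_period(frame, min_lag=1):
    """Lag (in samples) at which the channels agree most strongly, skipping lag 0."""
    summary = summary_correlogram(frame)
    return max(range(min_lag, len(summary)), key=lambda lag: summary[lag])

# Hypothetical 3-channel frame with lags 0..5; every channel peaks again at lag 3.
example_frame = [
    [4, 0, 1, 3, 1, 0],
    [5, 1, 0, 4, 0, 1],
    [3, 0, 0, 2, 1, 0],
]
period = common_period(example_frame)
```

Dividing the lag by the sample rate converts it to a period in seconds.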
20Correlogram Frame
- Horizontal bands are indicators of large amounts
of energy within a frequency band
21Slaney, Lyon structure to compute a Correlogram
22 3-D Correlogram
- A series of Correlograms over time
- Frequency information comes from a cochlea filter
bank
- A finite time/frequency analysis
- It depends on the initial time
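A minimal sketch of the 3-D structure as a stack of frames indexed by time, channel, and lag. The hop size, window length, and function names are assumptions for illustration only; the example also shows how the result changes with the initial time.

```python
def autocorrelation(x, start, window, max_lag):
    """Short-time autocorrelation over [start, start + window), lags 0..max_lag-1."""
    return [sum(x[start + n] * x[start + n + lag] for n in range(window))
            for lag in range(max_lag)]

def correlogram_3d(channels, window, max_lag, hop, start=0):
    """Stack of Correlogram frames: result[frame][channel][lag]."""
    n = min(len(ch) for ch in channels)
    last_start = n - window - max_lag + 1  # last start that stays inside the data
    return [[autocorrelation(ch, t, window, max_lag) for ch in channels]
            for t in range(start, last_start + 1, hop)]

# Same signal, two different initial times.
pulses = [1.0 if n % 8 == 0 else 0.0 for n in range(64)]
cube_a = correlogram_3d([pulses], window=12, max_lag=12, hop=8, start=0)
cube_b = correlogram_3d([pulses], window=12, max_lag=12, hop=8, start=3)
```

Here the first frame of `cube_a` sees two pulses in its window while the first frame of `cube_b` sees only one, so the two stacks differ even though the signal is identical.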
23Daniel Ellis signal-processing front-end
implementation
24THE LEARNING PROCESSOR
- Creating the network input
- Classification
- Artificial neural network fuzzy classification
25Creating the network input
- Responsible for learning each Correlogram frame
of a selected sound
- It should be exposed to many small variations of
the target (selected) sound
- The total number of neural nets (NN) is
NN = FB x CF
26Signal path to the network input
27Classification
- Class
- Family
- Length
- Frequency range
- Number of Correlogram frames
- Sufficient to classify one particular sound
- Make the matching process faster
- Intensive parallel processing
28Figure of a parallel neural network classification
29Artificial neural network fuzzy classification
- Fuzzy IF-THEN rules to describe a classifier
- An adaptive-network-based fuzzy classifier to
solve fuzzy classification problems
- ANFIS (adaptive-network-based fuzzy inference
system)
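ANFIS is built on a Sugeno-type fuzzy inference system, so the forward pass of such a system illustrates how IF-THEN rules become a classifier output. The sketch below assumes Gaussian membership functions and product firing strengths; the rule parameters are made up, and the training step that ANFIS uses to fit them is not shown.

```python
import math

def gaussian(x, center, sigma):
    """Degree to which x belongs to a fuzzy set centred at `center`."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_predict(inputs, rules):
    """First-order Sugeno inference.

    Each rule is (centers, sigmas, coeffs, bias): IF every input is near its
    centre THEN output = coeffs . inputs + bias.
    """
    weights, outputs = [], []
    for centers, sigmas, coeffs, bias in rules:
        strength = 1.0
        for x, c, s in zip(inputs, centers, sigmas):
            strength *= gaussian(x, c, s)  # product acts as fuzzy AND
        weights.append(strength)
        outputs.append(sum(a * x for a, x in zip(coeffs, inputs)) + bias)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    # Weighted average of the rule outputs.
    return sum(w * y for w, y in zip(weights, outputs)) / total

# Two hypothetical rules with constant consequents: class 0 near (0, 0), class 1 near (4, 4).
rules = [
    ((0.0, 0.0), (1.0, 1.0), (0.0, 0.0), 0.0),
    ((4.0, 4.0), (1.0, 1.0), (0.0, 0.0), 1.0),
]
```

An input close to one rule's centres yields an output close to that rule's class value, and an input halfway between them yields a value in between, which can be thresholded to obtain a class.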
30Block diagram of a general fuzzy inference system
31THE SEPARATION PROCESSOR
- Choosing method for sound matching
- The Matching Fuzzy Logic sound library
- Sound separation
32Choosing method for sound matching
- Preamble, search, matching and interpolation
- Target and precision
- Fuzzy clustering algorithms
33The Matching Fuzzy Logic sound library
- A set of fuzzy sound elements will be used for
matching (FIS)
- The initial values for search need to be
determined by external inputs
- ANFIS (Adaptive Neuro-Fuzzy Inference Systems)
34Sound separation
- Search, match and extract
- Step 1 Input process
- Step 2 Classification
- Step 3 Choosing what to separate
- Step 4 Dynamics and pitch extraction
- Step 5 Re-synthesis
35Step 1 Input process
- Analog to digital conversion
- Cochlea filter bank
- Cochleagram
- Correlogram frames
- Neuro-Fuzzy input matrix
36Step 2 Classification
37Step 3 Choosing what to separate
- Rule 1: Assume that the human auditory system can
recognize one or more sounds in the audio input
mixture
- Rule 2: One recognizable sound should be selected
for separation
- Rule 3: Assume that complete or partial
information on the selected audio class exists in
the sound library
38Step 4 Dynamics and pitch extraction
39Step 5 Re-synthesis
- Re-synthesis of selected sound Correlogram
frames at unit pitch
- Apply dynamics to each Correlogram frame
- Correlogram frame inversion
40PROJECT EXPERIMENTS
- Experiment setup
- Experiment procedures
- Experiment results
41Experiment setup
42Experiment procedures
- Recorded wave data: 5 sec. at 44100 Hz sample rate,
16-bit resolution, and two channels (stereo)
- Down-sampled to 11025 Hz, one channel
- Mixed combinations without delay
- Mixed combinations with 0.5 sec. delay
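The preparation steps above can be sketched as follows. The slides do not say how the down-sampling or mixing was done, so the channel averaging, the block-average decimation, and the sample-wise sum are assumptions, as are all function names.

```python
SOURCE_RATE = 44100
TARGET_RATE = 11025
FACTOR = SOURCE_RATE // TARGET_RATE  # 4 input samples per output sample
DELAY = int(0.5 * TARGET_RATE)       # 0.5 sec. expressed in samples at 11025 Hz

def to_mono(left, right):
    """Average the two stereo channels into one."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def downsample(signal, factor):
    """Average each block of `factor` samples (crude anti-aliasing), one value per block."""
    usable = len(signal) - len(signal) % factor
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]

def mix(a, b, delay_samples=0):
    """Sum two signals sample by sample, with b starting `delay_samples` late."""
    out = [0.0] * max(len(a), len(b) + delay_samples)
    for i, v in enumerate(a):
        out[i] += v
    for i, v in enumerate(b):
        out[i + delay_samples] += v
    return out
```

With two prepared sources `a` and `b`, `mix(a, b)` gives the combination without delay and `mix(a, b, DELAY)` the combination with the 0.5 sec. delay.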
43Experiment results
- Single Sound Source
- Two sound sources without delay
- Two sound sources with delay
- Modeling ANFIS for Correlogram frames
- Correlogram frame channel training
(classification)
- Correlogram frame channel evaluation (matching)
44Single Sound Source
45Single Sound Source
46Single Sound Source
47Two sound sources without delay
48Two sound sources without delay
49Two sound sources without delay
50Two sound sources with delay
51Two sound sources with delay
52Two sound sources with delay
54Modeling ANFIS for Correlogram frames
55Correlogram frame channel training with ANFIS
56Generic training data matrix format for FIS
57Correlogram frame channel training
(classification)
58Correlogram frame channel evaluation (matching)
59Correlogram frame channel evaluation (matching)
60Correlogram frame channel evaluation (matching)
61CONCLUSION
- Experiment conclusions
- Limits on sound separation
- Multidimension to two dimension transform
- Final remarks
62Experiment conclusions
- Exhausting classification process
- Large amounts of data generated
- Correlogram frame clustering constraints
- Correlogram frame can change significantly just
by changing its time window analysis (delay)
63Limits on sound separation
- Human auditory limitations
- How advanced humanlike adaptive learning
techniques are
- Experience and learning
64Multidimension to two dimension transform
- An audio signal is a non-linear function in a
multi-dimension space
- A 3-D Correlogram can give one more dimension to
a given analysis but it is still not suitable for
every signal
65 A sound wave from its source to receiver
- An audio signal function will be far more
complex when its analysis includes the source of
generation, the medium and the receiver (ear)
66 Final remarks
- Sound separation is a problem not yet solved
- Proposed techniques are not suitable for
real-time analysis