1
Sound Source Separation using 3D
Correlogram,Fuzzy Logic, and Neural Networks
  • A RESEARCH PROJECT
  • Eduardo Dias Trama

2
Table of Contents
  • INTRODUCTION
  • PROJECT OVERVIEW
  • THE PREPROCESSOR
  • THE LEARNING PROCESSOR
  • THE SEPARATION PROCESSOR
  • PROJECT EXPERIMENTS
  • CONCLUSION

3
INTRODUCTION
  • Overview of sound source separation
  • Sound separation methods
  • Related applications of sound separation

4
Overview of sound source separation
  • What is sound separation?
  • Psychoacoustic properties
  • Timbre
  • How can sound be modeled?

5
Sound separation methods
  • CASA (Computational Auditory Scene Analysis),
    the Marrian approach
  • Spatial and Periodicity-and-Harmonicity
  • CASA 3D Correlogram analysis
  • Blind source separation and prediction-driven

6
Related applications of sound separation
  • Sound and voice recognition
  • Noise removal
  • Compression

7
PROJECT OVERVIEW
  • Overview
  • Auditory model analysis
  • Sound data library and classification
  • Sound data matching
  • Complete sound separation system

8
Overview
  • What is a piano sound?
  • Memory
  • Clustering

9
Auditory model analysis
  • Properties
  • Grouping
  • Past knowledge
  • Correlation

10
Sound data library and classification
  • Sound memory
  • How much information is needed for later
    analysis?
  • Does it matter if audio data is compressed?
  • Structure of classification

11
Sound data matching
12
Complete sound separation system
13
THE PREPROCESSOR
  • The Cochlea Filter Model
  • Correlogram
  • 3-D Correlogram

14
The Cochlea Filter Model
  • Filtering: basilar membrane (BM)
  • Detection: inner hair cell (IHC)
  • Compression: automatic gain control (AGC)
  • Cochleagram
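
The three stages listed above (BM filtering, IHC detection, AGC
compression) can be illustrated with a minimal cochleagram sketch.
This is not the Lyon model used in the project: plain Butterworth
band-pass filters stand in for the cochlear channels, and the
rectification and gain normalisation are deliberately crude.

# Minimal cochleagram sketch: band-pass "basilar membrane" filters,
# half-wave rectification for the inner hair cells, and a crude AGC.
# Illustrative stand-in only, not the project's Lyon cochlear model.
import numpy as np
from scipy.signal import butter, lfilter

def cochleagram(x, fs, n_channels=32, fmin=100.0):
    """Return an (n_channels, len(x)) array of rectified channel outputs."""
    centers = np.geomspace(fmin, 0.45 * fs, n_channels)
    out = np.empty((n_channels, len(x)))
    for i, fc in enumerate(centers):
        lo, hi = fc / 1.2, min(fc * 1.2, 0.49 * fs)    # roughly 1/3-octave band
        b, a = butter(2, [lo, hi], btype="band", fs=fs)
        y = lfilter(b, a, x)              # basilar-membrane filtering (BM)
        y = np.maximum(y, 0.0)            # half-wave rectification (IHC detection)
        y /= (np.mean(y) + 1e-9)          # crude automatic gain control (AGC)
        out[i] = y
    return out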

15
Lyon cochlear model
16
Correlogram
  • Short-time auto-correlations of the neural firing
    rates as a function of cochlear place (best
    frequency) versus time
  • Correlogram movie
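
Read concretely, one correlogram frame is built by windowing each
cochlear channel around the same instant, autocorrelating it, and
stacking the channels into a channel-by-lag image. A hedged sketch
(the window length and lag range are arbitrary choices, not the
project's settings; it assumes start + win fits inside the signal):

import numpy as np

def correlogram_frame(coch, start, win=512, max_lag=256):
    """coch: (channels, samples) cochleagram; returns a (channels, max_lag) frame."""
    n_ch = coch.shape[0]
    frame = np.zeros((n_ch, max_lag))
    window = np.hanning(win)
    for c in range(n_ch):
        seg = coch[c, start:start + win] * window      # short windowed segment
        ac = np.correlate(seg, seg, mode="full")[win - 1:win - 1 + max_lag]
        frame[c] = ac / (ac[0] + 1e-9)                 # normalise by zero-lag energy
    return frame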

17
Correlogram
  • Speech processing
  • Extract the formants of voiced and unvoiced
    sounds
  • Short duration
  • Auto-correlation window size

18
Correlogram Frame
  • Vertical axis shows low to high frequencies from
    bottom to top
  • Horizontal axis represents the lag or time delay

19
Correlogram Frame
  • Dark areas in the image show activity in the
    Correlogram frame
  • Vertical lines indicate cochlear channels firing
    in the same period

20
Correlogram Frame
  • Horizontal bands are indicators of large amounts
    of energy within a frequency band

21
Slaney and Lyon's structure to compute a Correlogram
22
3-D Correlogram
  • A series of Correlograms over time
  • Frequency information comes from a cochlea filter
    bank
  • A finite time/frequency analysis
  • It depends on the initial time
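
Stacking such frames at successive start times gives the 3-D
correlogram as a (frames, channels, lags) volume; as the last bullet
notes, shifting the initial time shifts every frame with it. A minimal
sketch reusing correlogram_frame from the sketch above (the hop size
is an arbitrary choice):

import numpy as np

def correlogram_3d(coch, win=512, max_lag=256, hop=256):
    """Stack correlogram frames over time into a (frames, channels, lags) array."""
    starts = range(0, coch.shape[1] - win, hop)
    return np.stack([correlogram_frame(coch, s, win, max_lag) for s in starts])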

23
Daniel Ellis's signal-processing front-end
implementation
24
THE LEARNING PROCESSOR
  • Creating the network input
  • Classification
  • Artificial neural network fuzzy classification

25
Creating the network input
  • Responsible for learning each Correlogram frame
    of a selected sound
  • It should be exposed to many small variations of
    the target (selected) sound
  • The total number of neural nets (NN) is
    NN = FB x CF
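
Reading FB as the number of filter-bank (cochlear) channels and CF as
the number of correlogram frames (an interpretation of the slide's
abbreviations, not something it spells out), the bookkeeping is simply:

import numpy as np

fb = 32       # assumed number of filter-bank (cochlear) channels
cf = 40       # assumed number of correlogram frames for the target sound
nn_total = fb * cf                      # NN = FB x CF small networks
print(f"networks needed: {nn_total}")   # 1280 for these assumed sizes

# Each network is trained on one channel's lag vector from one frame:
frames = np.random.rand(cf, fb, 256)    # stand-in 3-D correlogram (frames, channels, lags)
inputs = frames.reshape(nn_total, -1)   # row i: the lag vector one small network learns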

26
Signal path to the network input
27
Classification
  • Class
  • Family
  • Length
  • Frequency range
  • Number of Correlogram frames
  • Sufficient to classify one particular sound
  • Make the matching process faster
  • Intensive parallel processing
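
Taken together, the attributes above amount to a small record per
sound in the library. A sketch of such a record, with field names and
types assumed purely for illustration:

from dataclasses import dataclass

@dataclass
class SoundClassEntry:
    """One sound-library entry built from the attributes on this slide (types assumed)."""
    sound_class: str        # e.g. "piano"
    family: str             # e.g. "struck string"
    length_s: float         # sound duration in seconds
    freq_range_hz: tuple    # (low, high) frequency range in Hz
    n_frames: int           # number of stored Correlogram frames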

28
Figure of a parallel neural network classification
29
Artificial neural network fuzzy classification
  • Fuzzy IF-THEN rules to describe a classifier
  • An adaptive-network-based fuzzy classifier to
    solve fuzzy classification problems
  • ANFIS (adaptive-network-based fuzzy inference
    system)
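
ANFIS is normally trained with dedicated tooling, but the fuzzy
IF-THEN machinery it adapts can be sketched as a first-order Sugeno
system: Gaussian memberships on the inputs, product firing strengths,
and a weighted average of linear consequents. All memberships and rule
parameters below are toy values, not the project's trained classifier.

import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def sugeno_fis(x1, x2, rules):
    """First-order Sugeno inference; rules = [((c1, s1), (c2, s2), (p, q, r)), ...]."""
    w, f = [], []
    for (c1, s1), (c2, s2), (p, q, r) in rules:
        w.append(gauss_mf(x1, c1, s1) * gauss_mf(x2, c2, s2))  # rule firing strength
        f.append(p * x1 + q * x2 + r)                          # linear consequent
    w = np.asarray(w)
    return float(np.dot(w, f) / (w.sum() + 1e-9))              # weighted average output

# Two toy rules: IF x1 is LOW and x2 is LOW THEN y ~ 0; IF HIGH and HIGH THEN y ~ 1.
rules = [((0.2, 0.15), (0.2, 0.15), (0.0, 0.0, 0.0)),
         ((0.8, 0.15), (0.8, 0.15), (0.0, 0.0, 1.0))]
print(sugeno_fis(0.75, 0.80, rules))    # close to class label 1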

30
Block diagram of a general fuzzy inference system
31
THE SEPARATION PROCESSOR
  • Choosing a method for sound matching
  • The Matching Fuzzy Logic sound library
  • Sound separation

32
Choosing a method for sound matching
  • Preamble, search, matching and interpolation
  • Target and precision
  • Fuzzy clustering algorithms

33
The Matching Fuzzy Logic sound library
  • A set of fuzzy sound elements will be used for
    matching (FIS)
  • The initial values for search need to be
    determined by external inputs
  • ANFIS (Adaptive Neuro-Fuzzy Inference Systems)

34
Sound separation
  • Search, match and extract
  • Step 1: Input process
  • Step 2: Classification
  • Step 3: Choosing what to separate
  • Step 4: Dynamics and pitch extraction
  • Step 5: Re-synthesis

35
Step 1: Input process
  • Analog to digital conversion
  • Cochlea filter bank
  • Cochleagram
  • Correlogram frames
  • Neuro-Fuzzy input matrix

36
Step 2: Classification
37
Step 3: Choosing what to separate
  • Rule 1: Assume that the human auditory system can
    recognize one or more sounds in the audio input
    mixture
  • Rule 2: One recognizable sound should be selected
    for separation
  • Rule 3: Assume that complete or partial
    information about the selected audio class exists
    in the sound library

38
Step 4: Dynamics and pitch extraction
39
Step 5: Re-synthesis
  • Re-synthesis of the selected sound's Correlogram
    frames at unit pitch
  • Apply dynamics to each Correlogram frame
  • Correlogram frame inversion
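
Correlogram-frame inversion itself is an involved procedure and is not
sketched here, but the "apply dynamics" step reduces to scaling the
stored unit-pitch frames by the gains extracted in Step 4. A minimal
sketch under that assumption:

import numpy as np

def apply_dynamics(frames, gains):
    """Scale each unit-pitch Correlogram frame by its extracted gain (the dynamics)."""
    gains = np.asarray(gains, dtype=float).reshape(-1, 1, 1)  # one gain per frame
    return frames * gains                    # frames: (n_frames, channels, lags)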

40
PROJECT EXPERIMENTS
  • Experiment setup
  • Experiment procedures
  • Experiment results

41
Experiment setup
42
Experiment procedures
  • Recorded wave data: 5 sec. at 44100 Hz sample
    rate, 16-bit resolution, two channels (stereo)
  • Down-sampled to 11025 Hz and mixed down to one
    channel
  • Mixed combinations without delay
  • Mixed combinations with 0.5 sec. delay
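
The data preparation described above can be reproduced approximately
as follows; scipy is used as a stand-in for whatever tools the project
actually used, and the file path is a placeholder.

import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

def load_mono_11k(path):
    """Load a 44100 Hz stereo recording, mix to one channel, down-sample to 11025 Hz."""
    fs, data = wavfile.read(path)                       # expects 16-bit stereo at 44100 Hz
    mono = data.astype(np.float64).mean(axis=1)         # two channels down to one
    return resample_poly(mono, up=1, down=4), fs // 4   # 44100 / 4 = 11025 Hz

def mix(a, b, fs, delay_s=0.0):
    """Mix two sources, optionally delaying the second by delay_s seconds (e.g. 0.5)."""
    b = np.concatenate([np.zeros(int(round(delay_s * fs))), b])
    n = max(len(a), len(b))
    return np.pad(a, (0, n - len(a))) + np.pad(b, (0, n - len(b)))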

43
Experiment results
  • Single Sound Source
  • Two sound sources without delay
  • Two sound sources with delay
  • Modeling ANFIS for Correlogram frames
  • Correlogram frame channel training
    (classification)
  • Correlogram frame channel evaluation (matching)

44
Single Sound Source
45
Single Sound Source
46
Single Sound Source
47
Two sound sources without delay
48
Two sound sources without delay
49
Two sound sources without delay
50
Two sound sources with delay
51
Two sound sources with delay
52
Two sound sources with delay
53
(No Transcript)
54
Modeling ANFIS for Correlogram frames
55
Correlogram frame channel training with ANFIS
56
Generic training data matrix format for FIS
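
The conventional generic format for FIS/ANFIS training data is one row
per example, with the input features first and the desired output in
the last column. A sketch assembling such a matrix with assumed sizes
and random stand-in data:

import numpy as np

n_examples, n_lags = 200, 256                            # assumed sizes
features = np.random.rand(n_examples, n_lags)            # stand-in Correlogram-channel vectors
labels = np.random.randint(0, 2, size=(n_examples, 1))   # 1 = target sound present
training_matrix = np.hstack([features, labels])          # last column is the desired output
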
57
Correlogram frame channel training
(classification)
58
Correlogram frame channel evaluation (matching)
59
Correlogram frame channel evaluation (matching)
60
Correlogram frame channel evaluation (matching)
61
CONCLUSION
  • Experiment conclusions
  • Limits on sound separation
  • Multidimensional to two-dimensional transform
  • Final remarks

62
Experiment conclusions
  • Exhausting classification process
  • Large amounts of data generated
  • Correlogram frame clustering constraints
  • A Correlogram frame can change significantly just
    by changing its analysis time window (delay)

63
Limits on sound separation
  • Human auditory limitations
  • The maturity of humanlike adaptive learning
    techniques
  • Experience and learning

64
Multidimensional to two-dimensional transform
  • An audio signal is a non-linear function in a
    multi-dimensional space
  • A 3-D Correlogram can add one more dimension to
    the analysis, but it is still not suitable for
    every signal

65
A sound wave from its source to receiver
  • The audio signal function becomes far more
    complex when the analysis includes the generating
    source, the medium, and the receiver (ear)

66
Final remarks
  • Sound separation is not yet a solved problem
  • The proposed techniques are not suitable for
    real-time analysis

67
(No Transcript)