1
Sound Source Separation using 3D
Correlogram,Fuzzy Logic, and Neural Networks
  • A RESEARCH PROJECT
  • Eduardo Dias Trama

2
Table of Contents
  • INTRODUCTION
  • PROJECT OVERVIEW
  • THE PREPROCESSOR
  • THE LEARNING PROCESSOR
  • THE SEPARATION PROCESSOR
  • PROJECT EXPERIMENTS
  • CONCLUSION

3
INTRODUCTION
  • Overview of sound source separation
  • Sound separation methods
  • Related applications of sound separation

4
Overview of sound source separation
  • What is sound separation?
  • Psychoacoustic properties
  • Timbre
  • How can sound be modeled?

5
Sound separation methods
  • CASA (Computational Auditory Scene Analysis),
    the Marrian approach
  • Spatial and Periodicity-and-Harmonicity
  • CASA 3D Correlogram analysis
  • Blind source separation and prediction-driven

6
Related applications of sound separation
  • Sound and voice recognition
  • Noise removal
  • Compression

7
PROJECT OVERVIEW
  • Overview
  • Auditory model analysis
  • Sound data library and classification
  • Sound data matching
  • Complete sound separation system

8
Overview
  • What is a piano sound?
  • Memory
  • Clustering

9
Auditory model analysis
  • Properties
  • Grouping
  • Past knowledge
  • Correlation

10
Sound data library and classification
  • Sound memory
  • How much information is needed for later
    analysis?
  • Does it matter if audio data is compressed?
  • Structure of classification

11
Sound data matching
12
Complete sound separation system
13
THE PREPROCESSOR
  • The Cochlea Filter Model
  • Correlogram
  • 3-D Correlogram

14
The Cochlea Filter Model
  • Filtering: basilar membrane (BM)
  • Detection: inner hair cell (IHC)
  • Compression: automatic gain control (AGC)
  • Cochleagram
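
The three stages listed above (BM filtering, IHC detection, AGC
compression) can be illustrated with a minimal cochleagram sketch.
This is not the Lyon model used in the project: plain Butterworth
band-pass filters stand in for the cochlear channels, and the
rectification and gain normalisation are deliberately crude.

# Minimal cochleagram sketch: band-pass "basilar membrane" filters,
# half-wave rectification for the inner hair cells, and a crude AGC.
# Illustrative stand-in only, not the project's Lyon cochlear model.
import numpy as np
from scipy.signal import butter, lfilter

def cochleagram(x, fs, n_channels=32, fmin=100.0):
    """Return an (n_channels, len(x)) array of rectified channel outputs."""
    centers = np.geomspace(fmin, 0.45 * fs, n_channels)
    out = np.empty((n_channels, len(x)))
    for i, fc in enumerate(centers):
        lo, hi = fc / 1.2, min(fc * 1.2, 0.49 * fs)    # roughly 1/3-octave band
        b, a = butter(2, [lo, hi], btype="band", fs=fs)
        y = lfilter(b, a, x)              # basilar-membrane filtering (BM)
        y = np.maximum(y, 0.0)            # half-wave rectification (IHC detection)
        y /= (np.mean(y) + 1e-9)          # crude automatic gain control (AGC)
        out[i] = y
    return out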

15
Lyon cochlear model
16
Correlogram
  • Short-time auto-correlations of the neural firing
    rates as a function of cochlear place (best
    frequency) versus time
  • Correlogram movie
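
Read concretely, one correlogram frame is built by windowing each
cochlear channel around the same instant, autocorrelating it, and
stacking the channels into a channel-by-lag image. A hedged sketch
(the window length and lag range are arbitrary choices, not the
project's settings; it assumes start + win fits inside the signal):

import numpy as np

def correlogram_frame(coch, start, win=512, max_lag=256):
    """coch: (channels, samples) cochleagram; returns a (channels, max_lag) frame."""
    n_ch = coch.shape[0]
    frame = np.zeros((n_ch, max_lag))
    window = np.hanning(win)
    for c in range(n_ch):
        seg = coch[c, start:start + win] * window      # short windowed segment
        ac = np.correlate(seg, seg, mode="full")[win - 1:win - 1 + max_lag]
        frame[c] = ac / (ac[0] + 1e-9)                 # normalise by zero-lag energy
    return frame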

17
Correlogram
  • Speech processing
  • Extract the formants of voiced and unvoiced
    sounds
  • Short duration
  • Auto-correlation window size

18
Correlogram Frame
  • Vertical axis shows low to high frequencies from
    bottom to top
  • Horizontal axis represents the lag or time delay

19
Correlogram Frame
  • Dark areas in the image show activity in the
    Correlogram frame
  • Vertical lines indicate cochlear channels firing
    in the same period

20
Correlogram Frame
  • Horizontal bands are indicators of large amounts
    of energy within a frequency band

21
Slaney and Lyon's structure to compute a Correlogram
22
3-D Correlogram
  • A series of Correlograms over time
  • Frequency information comes from a cochlea filter
    bank
  • A finite time/frequency analysis
  • It depends on the initial time
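
Stacking such frames at successive start times gives the 3-D
correlogram as a (frames, channels, lags) volume; as the last bullet
notes, shifting the initial time shifts every frame with it. A minimal
sketch reusing correlogram_frame from the sketch above (the hop size
is an arbitrary choice):

import numpy as np

def correlogram_3d(coch, win=512, max_lag=256, hop=256):
    """Stack correlogram frames over time into a (frames, channels, lags) array."""
    starts = range(0, coch.shape[1] - win, hop)
    return np.stack([correlogram_frame(coch, s, win, max_lag) for s in starts])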

23
Daniel Ellis's signal-processing front-end
implementation
24
THE LEARNING PROCESSOR
  • Creating the network input
  • Classification
  • Artificial neural network fuzzy classification

25
Creating the network input
  • Responsible for learning each Correlogram frame
    of a selected sound
  • It should be exposed to many small variations of
    the target (selected) sound
  • The total number of neural nets (NN) is
    NN = FB x CF
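
Reading FB as the number of filter-bank (cochlear) channels and CF as
the number of correlogram frames (an interpretation of the slide's
abbreviations, not something it spells out), the bookkeeping is simply:

import numpy as np

fb = 32       # assumed number of filter-bank (cochlear) channels
cf = 40       # assumed number of correlogram frames for the target sound
nn_total = fb * cf                      # NN = FB x CF small networks
print(f"networks needed: {nn_total}")   # 1280 for these assumed sizes

# Each network is trained on one channel's lag vector from one frame:
frames = np.random.rand(cf, fb, 256)    # stand-in 3-D correlogram (frames, channels, lags)
inputs = frames.reshape(nn_total, -1)   # row i: the lag vector one small network learns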

26
Signal path to the network input
27
Classification
  • Class
  • Family
  • Length
  • Frequency range
  • Number of Correlogram frames
  • Sufficient to classify one particular sound
  • Make the matching process faster
  • Intensive parallel processing
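
Taken together, the attributes above amount to a small record per
sound in the library. A sketch of such a record, with field names and
types assumed purely for illustration:

from dataclasses import dataclass

@dataclass
class SoundClassEntry:
    """One sound-library entry built from the attributes on this slide (types assumed)."""
    sound_class: str        # e.g. "piano"
    family: str             # e.g. "struck string"
    length_s: float         # sound duration in seconds
    freq_range_hz: tuple    # (low, high) frequency range in Hz
    n_frames: int           # number of stored Correlogram frames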

28
Figure of a parallel neural network classification
29
Artificial neural network fuzzy classification
  • Fuzzy IF-THEN rules to describe a classifier
  • An adaptive-network-based fuzzy classifier to
    solve fuzzy classification problems
  • ANFIS (adaptive-network-based fuzzy inference
    system)
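
ANFIS is normally trained with dedicated tooling, but the fuzzy
IF-THEN machinery it adapts can be sketched as a first-order Sugeno
system: Gaussian memberships on the inputs, product firing strengths,
and a weighted average of linear consequents. All memberships and rule
parameters below are toy values, not the project's trained classifier.

import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def sugeno_fis(x1, x2, rules):
    """First-order Sugeno inference; rules = [((c1, s1), (c2, s2), (p, q, r)), ...]."""
    w, f = [], []
    for (c1, s1), (c2, s2), (p, q, r) in rules:
        w.append(gauss_mf(x1, c1, s1) * gauss_mf(x2, c2, s2))  # rule firing strength
        f.append(p * x1 + q * x2 + r)                          # linear consequent
    w = np.asarray(w)
    return float(np.dot(w, f) / (w.sum() + 1e-9))              # weighted average output

# Two toy rules: IF x1 is LOW and x2 is LOW THEN y ~ 0; IF HIGH and HIGH THEN y ~ 1.
rules = [((0.2, 0.15), (0.2, 0.15), (0.0, 0.0, 0.0)),
         ((0.8, 0.15), (0.8, 0.15), (0.0, 0.0, 1.0))]
print(sugeno_fis(0.75, 0.80, rules))    # close to class label 1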

30
Block diagram of a general fuzzy inference system
31
THE SEPARATION PROCESSOR
  • Choosing a method for sound matching
  • The Matching Fuzzy Logic sound library
  • Sound separation

32
Choosing a method for sound matching
  • Preamble, search, matching and interpolation
  • Target and precision
  • Fuzzy clustering algorithms

33
The Matching Fuzzy Logic sound library
  • A set of fuzzy sound elements will be used for
    matching (FIS)
  • The initial values for search need to be
    determined by external inputs
  • ANFIS (Adaptive Neuro-Fuzzy Inference Systems)

34
Sound separation
  • Search, match and extract
  • Step 1: Input process
  • Step 2: Classification
  • Step 3: Choosing what to separate
  • Step 4: Dynamics and pitch extraction
  • Step 5: Re-synthesis

35
Step 1: Input process
  • Analog to digital conversion
  • Cochlea filter bank
  • Cochleagram
  • Correlogram frames
  • Neuro-Fuzzy input matrix

36
Step 2: Classification
37
Step 3: Choosing what to separate
  • Rule 1: Assume that the human auditory system can
    recognize one or more sounds in the audio input
    mixture
  • Rule 2: One recognizable sound should be selected
    for separation
  • Rule 3: Assume that complete or partial
    information about the selected audio class exists
    in the sound library

38
Step 4: Dynamics and pitch extraction
39
Step 5: Re-synthesis
  • Re-synthesis of the selected sound's Correlogram
    frames at unit pitch
  • Apply dynamics to each Correlogram frame
  • Correlogram frame inversion
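
Correlogram-frame inversion itself is an involved procedure and is not
sketched here, but the "apply dynamics" step reduces to scaling the
stored unit-pitch frames by the gains extracted in Step 4. A minimal
sketch under that assumption:

import numpy as np

def apply_dynamics(frames, gains):
    """Scale each unit-pitch Correlogram frame by its extracted gain (the dynamics)."""
    gains = np.asarray(gains, dtype=float).reshape(-1, 1, 1)  # one gain per frame
    return frames * gains                    # frames: (n_frames, channels, lags)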

40
PROJECT EXPERIMENTS
  • Experiment setup
  • Experiment procedures
  • Experiment results

41
Experiment setup
42
Experiment procedures
  • Recorded wave data: 5 sec. at 44100 Hz sample
    rate, 16-bit resolution, two channels (stereo)
  • Down-sampled to 11025 Hz and mixed down to one
    channel
  • Mixed combinations without delay
  • Mixed combinations with 0.5 sec. delay
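
The data preparation described above can be reproduced approximately
as follows; scipy is used as a stand-in for whatever tools the project
actually used, and the file path is a placeholder.

import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

def load_mono_11k(path):
    """Load a 44100 Hz stereo recording, mix to one channel, down-sample to 11025 Hz."""
    fs, data = wavfile.read(path)                       # expects 16-bit stereo at 44100 Hz
    mono = data.astype(np.float64).mean(axis=1)         # two channels down to one
    return resample_poly(mono, up=1, down=4), fs // 4   # 44100 / 4 = 11025 Hz

def mix(a, b, fs, delay_s=0.0):
    """Mix two sources, optionally delaying the second by delay_s seconds (e.g. 0.5)."""
    b = np.concatenate([np.zeros(int(round(delay_s * fs))), b])
    n = max(len(a), len(b))
    return np.pad(a, (0, n - len(a))) + np.pad(b, (0, n - len(b)))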

43
Experiment results
  • Single Sound Source
  • Two sound sources without delay
  • Two sound sources with delay
  • Modeling ANFIS for Correlogram frames
  • Correlogram frame channel training
    (classification)
  • Correlogram frame channel evaluation (matching)

44
Single Sound Source
45
Single Sound Source
46
Single Sound Source
47
Two sound sources without delay
48
Two sound sources without delay
49
Two sound sources without delay
50
Two sound sources with delay
51
Two sound sources with delay
52
Two sound sources with delay
53
(No Transcript)
54
Modeling ANFIS for Correlogram frames
55
Correlogram frame channel training with ANFIS
56
Generic training data matrix format for FIS
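
The conventional generic format for FIS/ANFIS training data is one row
per example, with the input features first and the desired output in
the last column. A sketch assembling such a matrix with assumed sizes
and random stand-in data:

import numpy as np

n_examples, n_lags = 200, 256                            # assumed sizes
features = np.random.rand(n_examples, n_lags)            # stand-in Correlogram-channel vectors
labels = np.random.randint(0, 2, size=(n_examples, 1))   # 1 = target sound present
training_matrix = np.hstack([features, labels])          # last column is the desired output
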
57
Correlogram frame channel training
(classification)
58
Correlogram frame channel evaluation (matching)
59
Correlogram frame channel evaluation (matching)
60
Correlogram frame channel evaluation (matching)
61
CONCLUSION
  • Experiment conclusions
  • Limits on sound separation
  • Multidimensional to two-dimensional transform
  • Final remarks

62
Experiment conclusions
  • Exhausting classification process
  • Large amounts of data generated
  • Correlogram frame clustering constraints
  • A Correlogram frame can change significantly just
    by changing its analysis time window (delay)

63
Limits on sound separation
  • Human auditory limitations
  • The maturity of humanlike adaptive learning
    techniques
  • Experience and learning

64
Multidimensional to two-dimensional transform
  • An audio signal is a non-linear function in a
    multi-dimensional space
  • A 3-D Correlogram can add one more dimension to
    the analysis, but it is still not suitable for
    every signal

65
A sound wave from its source to receiver
  • The audio signal function becomes far more
    complex when the analysis includes the generating
    source, the medium, and the receiver (ear)

66
Final remarks
  • Sound separation is not yet a solved problem
  • The proposed techniques are not suitable for
    real-time analysis

67
(No Transcript)