1
Content Based Audio Classification: A Neural Network Approach
  • Vikramjit Mitra, Chia-Jiu Wang
  • Spring 2005

2
Abstract
  • Aim: audio classification based on audio content.
  • A parallel Artificial Neural Network (ANN) based architecture is introduced.
  • Audio signals are processed to extract features.
  • The features are fed to the parallel ANN architecture.
  • Genre classification accuracy: 87.3%

3
Acronyms
  • ANN: Artificial Neural Network
  • MLP: Multi-Layer Perceptron
  • GRBF: Gaussian Radial Basis Function
  • MFCC: Mel Frequency Cepstrum Coefficient
  • LPC: Linear Predictive Coding
  • SVM: Support Vector Machine
  • DWT: Discrete Wavelet Transform
  • CWT: Continuous Wavelet Transform
  • IE: Inference Engine
  • DCT: Discrete Cosine Transform
  • PA: Polynomial Approximation
  • HMM: Hidden Markov Model

4
1. Introduction
  • The Internet contains a huge number of audio-visual files.
  • Text searching is possible, but what about multimedia search?
  • Current multimedia search is based on file header information.
  • What if the header contains wrong information?
  • Proposed solution: content based searching.
  • The contents of a multimedia file are analyzed.
  • The analysis results determine the category to which the file belongs.

5
1.1 System Architecture
(Block diagram: Input Audio → Audio File Identification and Reading → Feature Extractor → Parallel ANN Classifier → Inference Engine → Final Decision)
6
2.0 Audio Data
  • 2 types of audio data considered:
  • MP3 (MPEG Audio Layer 3)
  • WAV
  • Sampling rate: 44.1 kHz
  • Total audio files: 60
  • 6 classes (genres): Classical (CL), Hard Rock (HR), Jazz (JZ), Pop (PP), Rap (RP) and Soft Rock (SR)
  • Each file segmented into 5 windows of 7.8 sec each (see the windowing sketch below)
  • Number of windows: 300 (50 per class)
  • Training: 180 windows
  • Testing: 120 windows
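A minimal windowing sketch in Python (the language, numpy, and the helper name are assumptions, not part of the paper): it splits a 44.1 kHz mono signal into the five non-overlapping 7.8 s windows described above.

```python
import numpy as np

def segment_into_windows(samples, sr=44100, window_sec=7.8, n_windows=5):
    """Split a 1-D sample array into n_windows non-overlapping windows."""
    window_len = int(window_sec * sr)          # 7.8 s at 44.1 kHz = 343,980 samples
    needed = window_len * n_windows
    if len(samples) < needed:
        raise ValueError("audio shorter than 5 x 7.8 s")
    # Keep only the first 5 windows and reshape to (n_windows, window_len)
    return samples[:needed].reshape(n_windows, window_len)
```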

7
3.0 Features
  • 8 feature vectors: V1, V2, V3, V4, V5, V6, V7, V8
  • V1 → LPC [1]: estimates s(n) as a linear combination of the previous samples
  • 9 coefficients selected per window (a sketch of the computation follows below)
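A minimal LPC sketch, assuming the standard autocorrelation (Levinson-Durbin) method and numpy; the slides only state that 9 coefficients per window are kept, so the order and the helper name here are assumptions.

```python
import numpy as np

def lpc_coefficients(x, order=9):
    """Return `order` prediction coefficients so that
    s(n) is approximated by sum_k a_k * s(n - k)."""
    # Autocorrelation lags r[0] .. r[order]
    r = np.correlate(x, x, mode="full")[len(x) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):            # Levinson-Durbin recursion
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]  # update previous coefficients
        a[i] = k
        err *= 1.0 - k * k
    return -a[1:]                            # coefficients of the linear combination
```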

8
  • DWT of each window using the Haar wavelet.
  • V2 → mean (µ) and variance (σ²) of the segments of DWT coefficients (8 values).
  • V3 → polynomial approximation of the DWT coefficients (9 values).
  • The DWT of a discrete signal s_k is given as DWT(j, n) = Σ_k s_k · Ψ_{a,b}(k), with Ψ_{a,b}(k) = (1/√a) Ψ((k - b)/a)
  • Dilation factor a = 2^j, translation factor b = 2^j·n, where j, n are integers
  • Ψ_{a,b} is the wavelet function derived from the mother wavelet Ψ
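A hedged sketch of V2 under these definitions, assuming PyWavelets (the slides do not name a toolkit): a 3-level Haar DWT followed by the mean and variance of each coefficient segment, giving the 8 values mentioned above.

```python
import numpy as np
import pywt

def v2_features(window):
    # 3-level Haar DWT: returns [cA3, cD3, cD2, cD1]
    coeffs = pywt.wavedec(window, "haar", level=3)
    # Mean and variance of each of the 4 segments -> 8 values
    return np.array([stat for c in coeffs for stat in (c.mean(), c.var())])
```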

9
3-level DWT decomposition tree
  • Each DWT stage down-samples s_k by 2 and produces an approximation coefficient vector CA and a detail coefficient vector CD.

(Wavelet functions: Haar, Daubechies, Symlet)
10

(Plots of feature vectors V2 and V3)
11
  • Mel Frequency Cepstrum Coefficients (MFCC) [3]
  • Used as features for speech recognition.
  • Steps:
  • Break the signal into frames (using a Hamming window).
  • Obtain the amplitude spectrum of each frame.
  • Take the logarithm (log10) of the amplitude spectrum.
  • Convert to the Mel (perceptually based) spectrum.
  • Take the Discrete Cosine Transform (DCT) of the spectrum.
  • MFCC generates 13 coefficient sets;
  • each coefficient set represents a subband spectrum.
  • CA vector → the Haar DWT of each subband is generated.
  • Each CA vector is coded into 5 coefficients by PA.
  • 13 subbands give 65 coefficients → V4 (a sketch follows below).
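A hedged sketch of V4, assuming librosa for the MFCCs, PyWavelets for the Haar DWT, and a degree-4 polynomial fit as the PA step; these library choices and the exact polynomial order are assumptions, not details given in the slides.

```python
import numpy as np
import pywt
import librosa

def v4_features(window, sr=44100):
    # 13 MFCC coefficient sets, one trajectory (subband) per row
    mfcc = librosa.feature.mfcc(y=window, sr=sr, n_mfcc=13)   # shape (13, n_frames)
    feats = []
    for subband in mfcc:
        ca, _ = pywt.dwt(subband, "haar")                     # approximation coefficients CA
        t = np.arange(len(ca))
        feats.extend(np.polyfit(t, ca, deg=4))                # 5 polynomial coefficients (PA)
    return np.array(feats)                                    # 13 subbands * 5 = 65 values
```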

12

(Plot of feature vector V4)
13
  • V5 and V6 are similar to V2 and V3, but the wavelet used is Daubechies.

14
  • V7 and V8 are similar to V2 and V3, but the wavelet used is Symlet.

15
  • Feature vector summary (table)

16
4.0 Artificial Neural Networks
  • Adaptive, generally nonlinear learning agents.
  • Built from Processing Elements (PEs).
  • PEs receive input from:
  • external sources
  • other PEs
  • The interconnection of PEs defines the topology of the ANN.
  • A signal flowing through a connection is scaled by a weight w_ij.
  • Each PE sums the different contributions
  • and produces its output as a nonlinear function of the sum (see the sketch below).
  • The output from a PE is processed by other PEs until the system output is generated.
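A minimal sketch of a single PE, assuming numpy and a tanh activation (the activation function is an assumption; the slides only say the output is a nonlinear function of the weighted sum).

```python
import numpy as np

def processing_element(x, w, bias=0.0):
    """One PE: weighted sum of the inputs followed by a nonlinearity."""
    return np.tanh(np.dot(w, x) + bias)     # y = f( sum_j w_j * x_j + b )

# Example: a PE with three inputs
print(processing_element(np.array([0.2, -0.5, 1.0]), np.array([0.4, 0.1, -0.3])))
```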

17
(Diagram: input → adaptive system → output; the output is compared with the desired output, and the resulting cost/error drives the training algorithm.)
  • A typical ANN architecture

18
2 types of ANN studied
  • Multi-Layer Perceptron (MLP) (a sketch follows below)
  • Layered feed-forward network [2]
  • Ideal for pattern recognition or classification
  • Slow training
  • Needs lots of training data
  • Gaussian Radial Basis Function (GRBF) network
  • Nonlinear hybrid network [2]
  • Uses a Gaussian transfer function
  • Faster to train than MLP
  • Uses both supervised and unsupervised learning
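A hedged sketch of the MLP branch with scikit-learn; the slides do not name a toolkit, and the topology, hyperparameters, and placeholder data below are all assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(180, 9))       # placeholder features (e.g. the 9 LPC values of V1)
y_train = rng.integers(0, 6, size=180)    # 6 genre labels (CL, HR, JZ, PP, RP, SR)

mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)                 # one such network would be trained per feature vector
```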

19
5.0 Proposed Classifier Architecture

(Block diagram: each feature vector V1 … V8 feeds its own network ANN-1 … ANN-8; the eight ANN outputs go to the Inference Engine, which produces the decision.)

20
  • Desired output for each class (table)

Rule base for the IE (table)
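Since the desired-output table itself is not reproduced here, the following is only a hedged example of a one-hot target coding for the six genres; the actual coding used in the paper may differ.

```python
import numpy as np

CLASSES = ["CL", "HR", "JZ", "PP", "RP", "SR"]
# One row of the identity matrix per genre, e.g. targets["JZ"] == [0, 0, 1, 0, 0, 0]
targets = {genre: np.eye(len(CLASSES))[i] for i, genre in enumerate(CLASSES)}
```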
21
6.0 Results

22
  • Each ANN was trained separately
  • and tested separately.
  • Prediction accuracy was low for certain classes.
  • Parallel ANN architecture → average the predictions.
  • Imitates the human nervous system:
  • each sensation is triggered through multiple neurons.
  • Each ANN receives input from one unique feature vector.
  • The results from each ANN are summed.
  • Processed by the IE → final decision (see the sketch below).
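A hedged sketch of the combination step, assuming the eight per-class score vectors are averaged and the IE simply picks the highest-scoring class; the actual IE rule base is given as a table in the paper and may be more elaborate.

```python
import numpy as np

CLASSES = ("CL", "HR", "JZ", "PP", "RP", "SR")

def combine_predictions(per_ann_scores):
    """per_ann_scores: list of 8 arrays, one score per class from each ANN."""
    averaged = np.mean(per_ann_scores, axis=0)   # average the eight predictions
    return CLASSES[int(np.argmax(averaged))]     # IE decision (assumed: argmax)
```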

23
  • Prediction result example
  • Parallel ANN and IE prediction for each case (table)

24
  • Prediction result example

Prediction accuracy for each class (table)
25
7.0 Conclusion
  • Parallel ANN (MLP based) accuracy → 87.3%
  • MLP performed better than GRBF.
  • MLP is easy to construct, train and test.
  • PA coefficients → poor features.
  • Future research:
  • other possible ANN architectures
  • parametric classifiers (Bayesian, SVM, HMM, etc.)
  • better feature vectors, probably through more exploration of wavelets
  • capability to read other audio file formats.

26
8.0 References
  • [1] B. Atal and M. Schroeder, "Predictive coding of speech signals and subjective error criteria," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 27, Issue 3, pp. 247-254, June 1979.
  • [2] J.C. Principe, N.R. Euliano and W.C. Lefebvre, Neural and Adaptive Systems: Fundamentals through Simulations, John Wiley & Sons, Inc., February 2000.
  • [3] Malcolm Slaney, Auditory Toolbox for Matlab, Interval Research Corporation, Version 2.