Biologically-Inspired Audio Coding - PowerPoint Presentation

1
Biologically-Inspired Audio Coding
  • Ramin Pishehvar
  • Advanced Audio Systems, VPBT

2
Plan
  • What is audio coding?
  • What is sparse coding?
  • Why sparse coding?
  • Motivations For a New Paradigm
  • Mathematical Background
  • Details of the Proposed Coding Paradigm
  • Pattern extraction
  • Coding Results For Different Audio Signals

3
Audio Coding
  • Techniques for transmitting an audio signal at a
    bitrate lower than that of the raw material
    without perceptible loss of quality
  • Used in iPods, cell phones, DVDs, Blu-ray discs,
    TVs, computers, etc.
  • State-of-the-art techniques are based on
    Fourier-like (frequency-domain) transforms: MDCT,
    FFT, etc.
  • Well-known industry standards: MP3, AAC,
    High-Efficiency AAC, AMR-WB, etc.
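To make "Fourier-like transforms" concrete, here is a minimal, direct (O(N²)) MDCT of a single frame in Python/NumPy. It is an illustrative sketch only, not the transform code of any of the standards listed above. The sine window, the frame length, and the function name `mdct` are assumptions.

```python
import numpy as np

def mdct(frame: np.ndarray) -> np.ndarray:
    """Direct MDCT: one frame of 2N samples -> N coefficients.

    A sine window is applied first (assumed; real codecs choose their own windows).
    """
    two_n = len(frame)
    if two_n % 2:
        raise ValueError("frame length must be even")
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)
    window = np.sin(np.pi * (n + 0.5) / two_n)
    # MDCT basis: cos[pi/N * (n + 1/2 + N/2) * (k + 1/2)], shape (N, 2N)
    basis = np.cos(np.pi / n_half * (n[None, :] + 0.5 + n_half / 2) * (k[:, None] + 0.5))
    return basis @ (window * frame)

# Usage: a 1 kHz tone sampled at 8 kHz, one 256-sample frame
fs = 8000
t = np.arange(256) / fs
coeffs = mdct(np.sin(2 * np.pi * 1000 * t))
print(coeffs.shape)  # (128,)
# Bin k is centred near (k + 0.5) * fs / 256 Hz, so the energy sits around k = 31-32.
print(int(np.argmax(np.abs(coeffs))))
```

Each frame is transformed independently. This frame-by-frame processing is the property questioned on the next slides.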

4
Linkages - Industry Canada
Conceptual Differences between Classical Coding and Sparse Coding
[Figure: two panels, each titled "Histogram of Energy"]
5
Next Steps of Action
6
Why Is Change Necessary?
  • Current coders are frame-based (FFT, DCT, MDCT,
    etc.)
  • Transients and acoustic events are smeared across
    frames
  • Analysis results change with the alignment of the
    frames (even for wavelets; Simoncelli et al. 1992)
  • Frame-based analysis is not shift-invariant
  • Sparse coding saves energy
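The shift-variance problem shows up in a few lines of Python/NumPy. The same short click is delayed by 24 samples, and as a result it is split across two frames: the frame-based representation changes shape instead of just moving. The frame length, the click, and the shift are arbitrary choices for illustration.

```python
import numpy as np

def frame_energies(x, frame_len):
    """Energy of each non-overlapping frame, computed from its FFT (Parseval)."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.fft(frames, axis=1)
    return np.sum(np.abs(spectra) ** 2, axis=1) / frame_len

frame_len = 64
click = np.zeros(512)
click[100:108] = np.hanning(8)   # short transient entirely inside frame 1 (samples 64-127)
shifted = np.roll(click, 24)     # same transient 24 samples later (samples 124-131)

active = np.flatnonzero(frame_energies(click, frame_len) > 1e-12)
active_shifted = np.flatnonzero(frame_energies(shifted, frame_len) > 1e-12)
print(active)          # [1]
print(active_shifted)  # [1 2]
```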

7
Shift Variance In Frame-Based Coding
From Smith and Lewicki 2005
8
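Matching Pursuit (MP)
[Figure: MP loop. The kernel with the MAX correlation is selected, the optimal kernel is saved, and its contribution is subtracted (-) to form the Residual, which is fed back into the loop]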
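The loop in the figure can be sketched in Python/NumPy as plain matching pursuit. The dictionary holds unit-norm kernels tried at every time shift. The dictionary, the fixed atom count used as a stopping rule, and the name `matching_pursuit` are assumptions for illustration. This is not the coder's actual implementation.

```python
import numpy as np

def matching_pursuit(signal, kernels, n_atoms):
    """Greedy MP. Returns spikes as (kernel index, time shift, amplitude) and the residual."""
    residual = np.asarray(signal, dtype=float).copy()
    kernels = [k / np.linalg.norm(k) for k in kernels]   # unit-norm atoms
    if any(len(k) > len(residual) for k in kernels):
        raise ValueError("kernels must not be longer than the signal")
    spikes = []
    for _ in range(n_atoms):
        best = (0.0, 0, 0)  # (amplitude, kernel index, shift)
        for m, g in enumerate(kernels):
            # Inner product of the residual with kernel m at every valid shift
            corr = np.correlate(residual, g, mode="valid")
            tau = int(np.argmax(np.abs(corr)))           # the "MAX" step
            if abs(corr[tau]) > abs(best[0]):
                best = (corr[tau], m, tau)
        amp, m, tau = best
        if amp == 0.0:
            break
        spikes.append((m, tau, amp))                     # save the optimal kernel
        residual[tau : tau + len(kernels[m])] -= amp * kernels[m]  # the "-" step
    return spikes, residual

# Usage: a signal built from two known atoms is decomposed back into them
t = np.arange(32)
g0 = np.hanning(32) * np.cos(2 * np.pi * 0.1 * t)
g1 = np.hanning(32) * np.cos(2 * np.pi * 0.3 * t)
g0 /= np.linalg.norm(g0)
g1 /= np.linalg.norm(g1)
x = np.zeros(400)
x[50:82] += 3.0 * g0
x[200:232] += 2.0 * g1
spikes, res = matching_pursuit(x, [g0, g1], n_atoms=2)
```

Because MP minimizes the residual energy (MSE), the first spike recovered is the stronger atom (kernel 0 at shift 50). The second is the weaker one (kernel 1 at shift 200).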
9
Proposed Biologically-Inspired Audio Coder
10
Gammatone/Gammachirp Filterbanks
  • Gammachirps are frequency-modulated versions of
    gammatones
  • The gammachirp minimizes the scale-time
    uncertainty (Irino and Patterson 2001)
  • Better fit to physiological data
  • More free tuning parameters
  • Rotation of the tilings in the time-frequency
    plane

[Figure: gammachirp kernel annotated with its parameters: decay slope, attack slope, center frequency, deviation from center frequency, chirp factor, time delay]
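A commonly used analytic form of the gammachirp (Irino and Patterson) is g(t) = t^(n-1) exp(-2π b ERB(fc) t) cos(2π fc t + c ln t). It reduces to a gammatone when the chirp factor c = 0. The Python/NumPy sketch below samples such a kernel. Several choices are assumptions: the order n = 4, the bandwidth factor b = 1.019, the Glasberg-Moore ERB formula, and the mapping of the slide's labels onto these symbols (n ~ attack, b ~ decay, c ~ chirp factor, t0 ~ time delay). The "deviation from center frequency" parameter is not modelled.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore 1990)."""
    return 24.7 * (4.37 * fc / 1000 + 1)

def gammachirp(fc, fs, duration=0.025, n=4, b=1.019, c=0.0, t0=0.0):
    """Unit-norm gammachirp sampled at fs; c = 0 gives a gammatone.

    n: filter order (shapes the attack), b: bandwidth factor (sets the decay),
    c: chirp factor (instantaneous frequency fc + c / (2*pi*t)), t0: delay in s.
    All defaults are assumptions chosen for illustration.
    """
    t = np.arange(int(round(duration * fs))) / fs
    tau = t - t0
    g = np.zeros_like(t)
    on = tau > 0                        # kernel is zero up to the time delay
    ta = tau[on]
    envelope = ta ** (n - 1) * np.exp(-2 * np.pi * b * erb(fc) * ta)
    g[on] = envelope * np.cos(2 * np.pi * fc * ta + c * np.log(ta))
    return g / np.linalg.norm(g)

# Usage: a 1 kHz gammatone and a gammachirp with a negative chirp factor
fs = 16000
gt = gammachirp(1000.0, fs)             # c = 0: gammatone
gc = gammachirp(1000.0, fs, c=-2.0)     # frequency-modulated version
```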
11
Why Gammatone/Gammachirp?
  • Optimal auditory coding strategy: maximize the
    information conveyed to the brain while
    minimizing the required energy and neural
    resources (Smith and Lewicki 2006)
  • For natural sounds, optimal auditory coding is
    achieved when gammatones are used (Smith and
    Lewicki 2006)
  • The gammatone is optimal for audio just as the
    Gabor function is optimal for images (Smith and
    Lewicki 2006)

12
Adaptive vs. Non-Adaptive
  • Non-Adaptive
    • Gammatone filterbank
    • Center frequencies, time delays, and spike
      amplitudes computed
  • Adaptive
    • Gammachirp filterbank
    • Center frequencies, time delays, spike
      amplitudes, chirp (modulation) factors, attack,
      and decay parameters computed
    • Combinatorial explosion: suboptimal search

Our claim: switching to the adaptive approach
increases coding efficiency
13
Comparison of Adaptive vs. Non-Adaptive For Speech
Only the modulation (chirp) factor is adapted
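Here only the chirp factor is adapted. One simple way to do this is a 1-D grid search: keep the center frequency and time shift of an atom, then pick the chirp factor c that maximizes the absolute inner product with the residual. This is a sketch under assumptions. The slide does not say how the search is done, and the kernel form, the grid, and the helper names `chirp_kernel` and `adapt_chirp` are hypothetical.

```python
import numpy as np

def chirp_kernel(fc, fs, length, c, n=4, b=1.019):
    """Unit-norm gammachirp (gammatone when c = 0); form and defaults are assumed."""
    t = np.arange(1, length + 1) / fs           # start at 1/fs so log(t) is finite
    erb = 24.7 * (4.37 * fc / 1000 + 1)
    g = t ** (n - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t + c * np.log(t))
    return g / np.linalg.norm(g)

def adapt_chirp(residual, fc, fs, tau, length, grid):
    """For fixed fc and shift tau, return (best chirp factor, its amplitude)."""
    segment = residual[tau : tau + length]
    best_c, best_amp = 0.0, 0.0
    for c in grid:
        amp = float(np.dot(segment, chirp_kernel(fc, fs, length, c)))
        if abs(amp) > abs(best_amp):
            best_c, best_amp = float(c), amp
    return best_c, best_amp

# Usage: recover the chirp factor of a synthetic atom
fs, length, fc = 16000, 400, 1000.0
x = np.zeros(1000)
x[100 : 100 + length] += 1.5 * chirp_kernel(fc, fs, length, c=-3.0)
c_hat, amp = adapt_chirp(x, fc, fs, tau=100, length=length, grid=np.linspace(-5, 5, 11))
print(c_hat, round(amp, 6))  # -3.0 1.5
```

Searching one parameter at a time keeps the cost linear in the grid size. This trades optimality for tractability, in the spirit of the "suboptimal search" noted for the adaptive approach.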
14
Masking
  • MP is based on the mean squared error (MSE)
  • Perceptually based MP uses only instantaneous
    masking
  • Remove spikes below the absolute threshold of
    hearing
  • Remove inaudible spikes due to forward or
    backward temporal masking (on-frequency masking)
  • Remove inaudible spikes in adjacent critical
    bands (i.e., off-frequency masking)
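The first masking step, removing spikes below the absolute threshold of hearing, can be sketched with Terhardt's widely used approximation of the threshold in quiet. Mapping a spike amplitude to dB SPL requires a playback-level calibration. The `spl_offset_db` value and the spike tuple layout are therefore assumptions. The temporal and off-frequency masking steps are not modelled here.

```python
import numpy as np

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    f = np.asarray(f_hz, dtype=float) / 1000.0
    return 3.64 * f ** -0.8 - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

def drop_inaudible(spikes, spl_offset_db=90.0):
    """Keep only spikes whose level exceeds the threshold in quiet.

    spikes: list of (center_freq_hz, time_s, amplitude) tuples (assumed layout).
    spl_offset_db: assumed calibration, the SPL of an amplitude of 1.0.
    """
    kept = []
    for fc, t, amp in spikes:
        level_db = 20 * np.log10(abs(amp) + 1e-12) + spl_offset_db
        if level_db > threshold_in_quiet_db(fc):
            kept.append((fc, t, amp))
    return kept

# Usage: a loud 1 kHz spike survives; weak 100 Hz and 15 kHz spikes are removed
spikes = [(1000.0, 0.01, 0.5), (100.0, 0.02, 1e-4), (15000.0, 0.03, 1e-3)]
kept = drop_inaudible(spikes)
print(kept)  # [(1000.0, 0.01, 0.5)]
```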

15
Coding Results for Percussion
Spikes Before Masking
Previous works: 0.66N-3.2N for 4 kHz speech
30000
10000
29370
Spikes After Masking
9430
0.37N
0.12N
Spike Gain
2.90
Bit rate
1.93
Adaptive
Non-Adaptive
High quality in informal listening tests
16
Pattern Extraction
  • Extraction of auditory objects
  • Spikes are not statistically independent
  • Episode discovery in spikegrams
  • Codebook generation based on audio objects
  • Signal coded as codebook elements plus residual
  • Bitrate reduced by 40-50%
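Episode discovery can be illustrated with a toy counter of ordered spike pairs: how often a spike in channel a is followed by a spike in channel b after a given lag. Frequent pairs are candidate codebook entries. This is a hypothetical, much-simplified stand-in for the episode-mining algorithm the slide refers to. The spike representation, `max_lag`, and `min_count` are assumptions.

```python
from collections import Counter

def frequent_pairs(spikes, max_lag, min_count):
    """Count ordered episodes (ch_a, ch_b, lag) where ch_b fires at most max_lag
    samples after ch_a; return those seen at least min_count times.

    spikes: list of (time_sample, channel) tuples.
    """
    spikes = sorted(spikes)
    counts = Counter()
    for i, (t_a, ch_a) in enumerate(spikes):
        for t_b, ch_b in spikes[i + 1 :]:
            lag = t_b - t_a
            if lag > max_lag:
                break                  # spikes are time-sorted, so stop early
            counts[(ch_a, ch_b, lag)] += 1
    return {ep: n for ep, n in counts.items() if n >= min_count}

# Usage: channel 3 followed by channel 7 five samples later, repeated three times
spikes = [(0, 3), (5, 7), (100, 3), (105, 7), (200, 3), (205, 7), (50, 1)]
print(frequent_pairs(spikes, max_lag=10, min_count=3))  # {(3, 7, 5): 3}
```

Each occurrence of a frequent episode can then be sent as one codebook index plus a time stamp instead of two separate spikes. This is where the bitrate saving would come from.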

17
(No Transcript)
18
WO Pattern 21982 1911
8
23704
19
Future and Ongoing Work
  • Generalizing pattern extraction to other features
    (time, amplitude, etc.)
  • Closed-form, precomputed, tree-like search in
    matching pursuit to speed it up (MPTK: 0.25 real
    time for large signals)
  • Parametric coding of spike parameters (mean firing
    rate, delay, etc.)
  • Modular approach to replace MP
  • Compressed sensing
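Most of the cost of matching pursuit lies in correlating the residual with every kernel at every shift. A standard speed-up computes all shifts at once with the FFT. It is shown here as an assumption-level sketch, not as MPTK's code; the helper name `correlate_valid_fft` is hypothetical. The result matches `np.correlate(x, g, mode="valid")`.

```python
import numpy as np

def correlate_valid_fft(x, g):
    """All-shift inner products <x[tau:tau+len(g)], g> computed via the FFT.

    Same result as np.correlate(x, g, mode="valid") for real inputs, at
    O(L log L) cost instead of O(L * len(g)).
    """
    n = len(x) + len(g) - 1                   # length of the full linear convolution
    nfft = 1 << (n - 1).bit_length()          # next power of two >= n (no wrap-around)
    X = np.fft.rfft(x, nfft)
    G = np.fft.rfft(g[::-1], nfft)            # reversing g turns convolution into correlation
    full = np.fft.irfft(X * G, nfft)[:n]
    return full[len(g) - 1 : len(x)]          # keep only fully overlapping shifts

# Usage: compare against the direct computation
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
g = rng.standard_normal(64)
print(np.allclose(correlate_valid_fft(x, g), np.correlate(x, g, mode="valid")))  # True
```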

20
Conclusion
  • Efficient coding paradigm when coding delay can
    be afforded
  • Paradigm mimics the auditory pathway
  • Adaptive approach (with gammachirp) more
    efficient than non-adaptive (with gammatones)
  • Masking removes inaudible spikes
  • Object-based coding
  • Expected to give 1 bit/sample for high quality
    44.1 kHz audio for archiving and broadcasting
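For scale, the 1 bit/sample target can be converted into a bitrate and compared with uncompressed audio. This is a back-of-the-envelope calculation. The 16-bit PCM reference and the per-channel accounting are assumptions.

```python
def bitrate_kbps(bits_per_sample, fs_hz, channels=1):
    """Bitrate in kbit/s for a given average number of bits per sample."""
    return bits_per_sample * fs_hz * channels / 1000

coded = bitrate_kbps(1, 44_100)               # target of the proposed coder, mono
pcm = bitrate_kbps(16, 44_100)                # 16-bit PCM reference (assumed), mono
stereo = bitrate_kbps(1, 44_100, channels=2)
print(coded, pcm, stereo)  # 44.1 705.6 88.2
```

At 1 bit/sample, that is 44.1 kbit/s per channel, a 16:1 reduction relative to 16-bit PCM.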