Robust Audio Identification for Commercial Applications

Learn more at: http://virtualgoods.org

1
Robust Audio Identification for Commercial
Applications
  • Matthias Gruhne
  • ghe@emt.iis.fhg.de
  • Fraunhofer IIS, AEMT, D-98693 Ilmenau, Germany

2
Overview
  • What is AudioID?
  • Requirements
  • System Architecture
  • MPEG-7
  • Recognition Performance
  • Applications
  • Conclusions
  • Demonstration

3
What is AudioID?
4
What is AudioID?
Purpose
  • Identify audio material (artist, song, etc.) by
    analysis of the signal itself
  • Content-based identification
Conditions
  • No associated information required (headers, ID3
    tags)
  • No embedded signals (e.g. watermarks) required
  • Some knowledge about the music to be identified
    must be available (reference database)

5
Requirements
Recognition rate
  • High recognition rates (> 95%), even with
    distorted signals
Robustness
  • Robust against various distortions:
  • volume change, equalization, noise addition,
    audio coding (e.g. MP3), ...
  • analog artifacts (e.g. D/A, A/D)
Compactness
  • Small signature size
Scalability
  • Extensibility of database (> 10^6 items) while
    keeping processing time low (a few ms/item)

6
System Architecture - Overview
7
System Architecture
Feature Extractor
  • Signal preprocessing
  • Extract the essence of the audio signal
Feature Processor
  • Increase discrimination efficiency
  • Temporal grouping of features (super vector)
  • Statistics calculation (mean, variance, etc.)

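The Feature Processor steps on this slide (temporal grouping into super vectors, then statistics) can be illustrated with a minimal sketch. The slides do not give the group size or the exact statistics layout, so `group_frames`, `summarize`, the group size of 2 and the toy feature values are all hypothetical choices made only for illustration.

```python
from statistics import fmean, pvariance


def group_frames(frames, group_size):
    """Split per-frame feature vectors into consecutive, non-overlapping groups.

    Trailing frames that do not fill a whole group are dropped (an assumption).
    """
    return [frames[i:i + group_size]
            for i in range(0, len(frames) - group_size + 1, group_size)]


def summarize(group):
    """Build one super vector: per-dimension means followed by variances."""
    dims = list(zip(*group))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    variances = [pvariance(d) for d in dims]
    return means + variances


# Toy input: four frames with two feature values each (made-up numbers).
frames = [[0.1, 0.9], [0.3, 0.7], [0.2, 0.8], [0.4, 0.6]]
super_vectors = [summarize(g) for g in group_frames(frames, 2)]
```

Each super vector here holds two means and two variances, so four frames of two values become two vectors of four values; with longer groups the data reduction grows accordingly.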
8
System Architecture
Class Generator
  • Clustering of processed feature vectors:
  • further reduce the amount of data
  • enhance robustness (avoid overfitting)
  • Add class with associated metadata to database
Classification
  • Compare feature vectors against classes in
    database by means of some metric
  • Find class yielding the best approximation
  • Retrieve associated metadata

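As a rough illustration of the Class Generator and Classification stages, the sketch below reduces each item's training vectors to a single centroid and then picks the class with the smallest accumulated Euclidean distance to the query vectors. The slides only say "clustering" and "some metric", so the single-centroid reduction, the Euclidean metric, the function names and the example data are assumptions, not the method used in AudioID.

```python
import math


def centroid(vectors):
    """Average a list of equal-length vectors (stand-in for real clustering)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_database(training):
    """Map each metadata string to one class centroid."""
    return {meta: centroid(vecs) for meta, vecs in training.items()}


def classify(database, query_vectors):
    """Return the metadata of the class closest to all query vectors."""
    def total_distance(ref):
        return sum(math.dist(ref, q) for q in query_vectors)
    return min(database, key=lambda meta: total_distance(database[meta]))


# Hypothetical reference items and a slightly distorted query.
training = {
    "Artist A - Song 1": [[0.1, 0.9], [0.2, 0.8]],
    "Artist B - Song 2": [[0.7, 0.3], [0.8, 0.2]],
}
database = build_database(training)
result = classify(database, [[0.75, 0.3], [0.7, 0.2]])
```

Accumulating the distance over several query vectors, rather than deciding per vector, is one simple way to make the decision less sensitive to individual distorted frames.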
9
MPEG-7 - Elements for Robust Audio Matching
Low-level data
  • AudioSpectrumFlatness LLD
  • Derived from the Spectral Flatness Measure (SFM)
  • Describes the un/flatness of the spectrum in
    frequency bands (tonal ↔ noise)
Fingerprint
  • AudioSignature Description Scheme
  • Statistical data summarization of the
    AudioSpectrumFlatness LLD
  • Textual description in XML syntax

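The Spectral Flatness Measure is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum coefficients in a band: values near 1 indicate a flat, noise-like band, values near 0 a tonal one. The sketch below applies that definition per band; the band edges and the toy spectra are hypothetical, and the exact band layout and normalization of the MPEG-7 descriptor are not reproduced here.

```python
import math


def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of strictly positive power values."""
    n = len(power)
    arithmetic = sum(power) / n
    geometric = math.exp(sum(math.log(p) for p in power) / n)
    return geometric / arithmetic


def band_flatness(power_spectrum, band_edges):
    """Flatness per band; band_edges are index boundaries (an assumed layout)."""
    return [spectral_flatness(power_spectrum[lo:hi])
            for lo, hi in zip(band_edges, band_edges[1:])]


flat = [1.0] * 8             # noise-like band: equal power in every bin
peaky = [1e-6] * 7 + [1.0]   # tonal band: one dominant bin
values = band_flatness(flat + peaky, [0, 8, 16])
```

Because the measure is a ratio of two means of the same values, scaling the whole band by a constant leaves it unchanged, which is consistent with the robustness to volume changes listed under Requirements.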
10
MPEG-7 - Benefits
  • Standardized feature format → guarantees
    worldwide interoperability
  • Published, open format → descriptive data can be
    produced easily
  • Large MPEG-7 compliant databases expected to be
    available in the near future (incl. fingerprints)
  • Long-term format stability / lifetime

11
Recognition Performance - Conditions
Conditions
  • Training and test sets (mostly rock / pop):
  • 15,000 items
  • 90,000 items
Considered feature
  • Spectral Flatness Measure (SFM)
Classification performance
  • Number of correctly identified items (both
    single best and within top 10)

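The two figures reported per test (single best and within top 10) correspond to a top-k recognition rate. A minimal sketch of that computation follows; `top_k_rate`, the ranked result lists and the ground-truth labels are made up for illustration and are not taken from the evaluation behind these slides.

```python
def top_k_rate(ranked_results, truth, k):
    """Percentage of queries whose correct item appears among the first k candidates."""
    hits = sum(1 for ranking, expected in zip(ranked_results, truth)
               if expected in ranking[:k])
    return 100.0 * hits / len(truth)


# Each inner list is the classifier's candidate ranking for one query (toy data).
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "a", "b"]]
truth = ["a", "a", "b"]

single_best = top_k_rate(ranked, truth, 1)
within_top_3 = top_k_rate(ranked, truth, 3)
```

By construction the top-k rate can only stay equal or grow as k increases, which is why the second number in each pair is never smaller than the first.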
12
Recognition Performance - 15k items
Feature: SFM (correctly identified items in %, single best / within top 10)
Cropping          100.0 / 100.0
MP3 @ 96 kbps      99.6 /  99.8
Loudsp./Mic.       98.0 /  99.0
  • 16 bands
  • Advanced matching with temporal tracking

13
Recognition Performance - 90k items
  • 16 bands
  • Advanced matching with temporal tracking

14
Applications
  • Retrieve associated metadata by identifying audio
    content
  • Automated search of audio content on the Internet
  • Broadcast monitoring by logging the
    transmission of audio material
  • Feature based indexing of audio databases
    (similarity search)
  • ...

15
Conclusions
  • High recognition rates (> 99%, tested with 90,000
    items)
  • Robust to real-world signal distortions
  • Fast and reliable extraction and classification
  • Underlying feature specified in the MPEG-7
    standard → ensures worldwide interoperability,
    with licensing available to everyone

16
Real Time Demonstration
  • Demo running on a laptop (Pentium III @ 500 MHz)
  • Local database with 15,000 items (rock / pop
    genre)
  • Acoustic transmission: MP3 → D/A → speakers →
    noisy environment → microphone → A/D →
    AudioID

17
Thanks for your attention!