Title: Audio Fingerprinting


1
Audio Fingerprinting
  • Wes Hatch
  • MUMT-614
  • Mar.13, 2003

2
What is Audio Fingerprinting?
  • a small, unknown segment of audio data (it can be
    as short as just a couple of seconds) is used to
    identify the original audio file from which it
    came

3
Applications
  • Broadcast monitoring
  • playlist generation
  • royalty collection
  • ad verification
  • Connected Audio
  • general term for consumer applications
  • Other
  • Napster: use of fingerprinting systems to
    prohibit the transmission of copyrighted
    materials
  • Finding desired content efficiently in an
    overwhelming amount of audio material

4
Benefits
  • Automated search of illegal content on the
    Internet
  • examines the real audio information rather than
    just tag information
  • For the consumer
  • make the meta-data of songs in a library
    consistent, allowing for easy organization
  • can guarantee that what is downloaded is actually
    what it says it is
  • will allow consumer to record signatures of sound
    and music on small handheld devices

5
Two principal components
  • Compute the fingerprint
  • Compare it to a database of previously computed
    fingerprints

6
Details to worry about
  • Robustness (to noise, distortion)
  • Reliability
  • Fingerprint size (reduced dimensionality)
  • Granularity
  • Search speed and scalability
  • Computational efficiency
  • Resulting features must be informative about the
    audio content
  • Semantic or non-semantic features?
  • Hash table or vector representation?

7
Computing the fingerprint
  • Compare to hash functions?
  • compare computed hash value with that stored in a
    database
  • Drawback
  • need to worry about perceptual similarity and not
    mathematical similarity
  • PCM audio and its MP3 version sound alike but
    are mathematically (i.e. in spectral content)
    quite different
  • perceptual similarity is not transitive
  • so it is not possible to design a system which
    computes mathematically equal fingerprints for
    all perceptually similar objects
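
The drawback above can be illustrated with a minimal sketch. It assumes SHA-256 (from Python's standard hashlib) as the ordinary hash function and a made-up 440 Hz test tone; the helper names are hypothetical. Changing one sample by a single quantization step, which is inaudible, produces an unrelated digest.

```python
import hashlib
import math

def pcm_bytes(samples):
    """Pack integer samples (assumed 16-bit signed) as little-endian bytes."""
    return b"".join(s.to_bytes(2, "little", signed=True) for s in samples)

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Hypothetical clip: 1000 samples of a 440 Hz tone sampled at 11,025 Hz.
original = [int(10000 * math.sin(2 * math.pi * 440 * n / 11025)) for n in range(1000)]

# A copy with one sample nudged by a single quantization step.
altered = list(original)
altered[500] += 1

h1 = hashlib.sha256(pcm_bytes(original)).digest()
h2 = hashlib.sha256(pcm_bytes(altered)).digest()

print(h1 == h2)  # False: the digests do not match
print(bit_difference(h1, h2), "of 256 digest bits differ")
```

A fingerprint, by contrast, should change only slightly under such a change, which is why matching is done by similarity rather than equality.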

8
Techniques (general)
  • Any segment of x seconds may be used to compute
    the fingerprint
  • Audio gets separated into frames
  • Features computed for each frame
  • Fourier coefficients
  • MFCC, LPC
  • Spectral flatness
  • sharpness
  • features are mapped into a more compact
    representation using an HMM or quantization
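
The framing and per-frame feature steps above can be sketched as follows. This is only an illustration: spectral flatness is chosen as the feature, the frame size and hop are arbitrary, the DFT is a naive pure-Python one, and the test signals are made up.

```python
import cmath
import math
import random

def frames(signal, size, hop):
    """Split a signal into overlapping frames of `size` samples, `hop` apart."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def power_spectrum(frame):
    """Naive DFT power spectrum over bins 1 .. n/2 - 1 (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    p = [v + eps for v in power_spectrum(frame)]
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic

# Made-up inputs: a pure tone and white noise, 256 samples each.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]

# One feature value per frame (64-sample frames, 50% overlap).
tone_flatness = [spectral_flatness(f) for f in frames(tone, 64, 32)]
noise_flatness = [spectral_flatness(f) for f in frames(noise, 64, 32)]
print(max(tone_flatness), min(noise_flatness))
```

Flatness is near zero for the tone and much higher for the noise, so even this single value per frame says something about the audio content.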

9
Techniques (Haitsma, Kalker)
  • one 32-bit sub-fingerprint every 11.6 ms
  • A block consists of 256 sub-fingerprints
  • Corresponds to a granularity of only 3 seconds
  • Large overlap (31/32), so subsequent
    sub-fingerprints are similar and vary slowly in
    time
  • in the worst-case scenario, the frame boundaries
    used during identification are 5.8 ms off from
    those in the database
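
The figures on this slide follow from one another, as this short arithmetic sketch shows. The constants come from the slide; the variable names are illustrative, and the frame length is derived from the hop and overlap rather than stated on the slide.

```python
HOP_MS = 11.6        # one sub-fingerprint every 11.6 ms
BLOCK_SIZE = 256     # sub-fingerprints per block
BITS_PER_SUB = 32    # bits per sub-fingerprint
OVERLAP = 31 / 32    # overlap between consecutive frames

# Audio covered by one block: 256 * 11.6 ms, i.e. roughly 3 seconds.
block_seconds = BLOCK_SIZE * HOP_MS / 1000

# With 31/32 overlap, each new frame advances by 1/32 of the frame length,
# so the frame length implied by an 11.6 ms hop is 32 * 11.6 ms.
frame_ms = HOP_MS / (1 - OVERLAP)

# A query can start anywhere between two database frame boundaries, so it is
# never further than half a hop from the nearest one.
worst_offset_ms = HOP_MS / 2

block_bits = BLOCK_SIZE * BITS_PER_SUB

print(round(block_seconds, 2), "s per block")
print(round(frame_ms, 1), "ms per frame")
print(worst_offset_ms, "ms worst-case misalignment")
print(block_bits, "bits per block")
```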

10
Techniques (Haitsma, Kalker)
  • Data from each frame is sent through a filterbank
  • 33 filters, logarithmically spaced (to correspond
    roughly to the Bark scale)
  • between 300 and 2000 Hz
  • phase is neglected (perceptual reasons)
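
A minimal sketch of the band layout, assuming the 33 bands are spaced with a constant frequency ratio between 300 and 2000 Hz. The bit rule in `sub_fingerprint` is not on the slide: it follows the sign-of-energy-difference scheme described in the Haitsma and Kalker paper and is included only to show how 33 bands yield 32 bits. Function names are illustrative.

```python
def band_edges(low_hz=300.0, high_hz=2000.0, bands=33):
    """Return bands + 1 edge frequencies with a constant ratio between neighbours."""
    ratio = (high_hz / low_hz) ** (1.0 / bands)
    return [low_hz * ratio ** i for i in range(bands + 1)]

def sub_fingerprint(prev_energies, cur_energies):
    """Build one sub-fingerprint from the band energies of two consecutive frames.

    Bit m is 1 when the energy difference between bands m and m + 1 grew
    relative to the previous frame, so 33 band energies give 32 bits.
    """
    value = 0
    for m in range(len(cur_energies) - 1):
        diff = (cur_energies[m] - cur_energies[m + 1]) - (prev_energies[m] - prev_energies[m + 1])
        value = (value << 1) | (1 if diff > 0 else 0)
    return value

edges = band_edges()
print(len(edges) - 1, "bands from", edges[0], "Hz to", round(edges[-1], 1), "Hz")

# Made-up band energies: a flat frame followed by one sloping downward in frequency.
flat = [0.0] * 33
falling = [float(33 - m) for m in range(33)]
print(hex(sub_fingerprint(flat, falling)))  # 0xffffffff: every difference increased
```

Only energy differences are used, which is consistent with the slide's note that phase is neglected.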

11
System overview

12
Techniques (Burges, Platt)
  • downsampled to 11.025 kHz, split into frames that
    overlap by half
  • MCLT is then applied to each frame. A 128-sample
    log spectrum is generated by taking the log
    modulus of each MCLT coefficient

13
Techniques (Burges, Platt)
  • Use prior knowledge to define form of the feature
    extractor
  • Features computed by a linear, convolutional
    neural network
  • convert signal into a feature vector
  • uses Principal Component Analysis (PCA) to find
    a set of projections
  • generates a vector of 128 values for every 11.6 ms
    interval
  • dimensionality-reduction method (i.e. lots of math)
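
To make "find a set of projections" concrete, here is a minimal sketch of plain PCA on toy data, using power iteration to find the leading component. It is not the authors' feature extractor: the 3-dimensional data stands in for the 128-value vectors, only one projection is computed, and all names are illustrative.

```python
import random

def mean_center(rows):
    """Subtract the per-dimension mean from every row."""
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
    return [[r[d] - means[d] for d in range(dims)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of mean-centered rows."""
    n, dims = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(dims)]
            for i in range(dims)]

def top_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    dims = len(cov)
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def project(rows, direction):
    """Reduce each row to one value: its coordinate along `direction`."""
    return [sum(a * b for a, b in zip(r, direction)) for r in rows]

# Toy data: almost all variation lies along the (1, 1, 0) direction.
rng = random.Random(1)
rows = []
for _ in range(200):
    t = rng.gauss(0.0, 1.0)
    rows.append([t + rng.gauss(0.0, 0.05), t + rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)])

centered = mean_center(rows)
direction = top_component(covariance(centered))
reduced = project(centered, direction)  # 3 values per row reduced to 1
print([round(x, 2) for x in direction])
```

The recovered direction is close to (0.71, 0.71, 0), so projecting onto it keeps most of the variation while discarding two of the three dimensions.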

14
Techniques (Burges, Platt)
  • 3 layers of Oriented PCA (OPCA)
  • operates on a frame of 128 values
  • layer 1 generates 10 values for each frame
  • layer 2 takes 42 layer 1 outputs and produces
    20 values
  • layer 3 takes 40 layer 2 outputs and produces
    64 values (11K inputs -> 64 outputs)

15
Searching the Database
  • Look for the most similar (not necessarily exact)
    fingerprint
  • 10,000 5-min. songs ≈ 250 million
    sub-fingerprints
  • brute force takes in excess of 20 minutes on a
    very fast PC
  • brute force computes bit-error rate for every
    possible position in the database
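
A minimal sketch of the brute-force search described above: the query block is compared against every position in the database and the position with the lowest bit-error rate wins. The database here is random toy data, far smaller than the real thing, and the function names are illustrative.

```python
import random

def bit_errors(a, b):
    """Number of differing bits between two runs of 32-bit sub-fingerprints."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def brute_force_search(database, query):
    """Try every position; return (best_position, bit_error_rate)."""
    total_bits = 32 * len(query)
    best_pos, best_ber = -1, float("inf")
    for pos in range(len(database) - len(query) + 1):
        ber = bit_errors(database[pos:pos + len(query)], query) / total_bits
        if ber < best_ber:
            best_pos, best_ber = pos, ber
    return best_pos, best_ber

# Size of the real problem: 10,000 songs of 5 minutes, one value every 11.6 ms.
total_sub_fingerprints = 10_000 * 5 * 60 / 0.0116
print(round(total_sub_fingerprints / 1e6), "million sub-fingerprints")

# Toy database of 2000 random sub-fingerprints.
rng = random.Random(0)
database = [rng.getrandbits(32) for _ in range(2000)]

# Query: the 256-value block starting at position 700, with one bit flipped
# in every fourth sub-fingerprint to imitate distortion.
query = list(database[700:956])
for i in range(0, 256, 4):
    query[i] ^= 1

position, ber = brute_force_search(database, query)
print(position, ber)
```

The cost grows with the number of database positions times the block size, which is why this approach is impractical at 250 million sub-fingerprints.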

16
Searching the Database
  • make the assumption that at least 1 (of the 256)
    sub-fingerprints is error-free
  • then, use a hash table (as opposed to more
    memory-intensive look-up table)
  • 800,000 times faster
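
The idea above can be sketched with a Python dict as the hash table: each sub-fingerprint value maps to the positions where it occurs, and only positions reached through an exact match are checked in full. The toy database, the function names, and the 0.35 acceptance threshold are assumptions for illustration, not values given in the slides.

```python
import random
from collections import defaultdict

def build_index(database):
    """Map each sub-fingerprint value to every position where it occurs."""
    index = defaultdict(list)
    for pos, value in enumerate(database):
        index[value].append(pos)
    return index

def indexed_search(database, index, query, max_ber=0.35):
    """Check only block positions implied by an exact sub-fingerprint match.

    Returns (start_position, bit_error_rate) for the best candidate under
    the assumed threshold `max_ber`, or None if there is no candidate.
    """
    total_bits = 32 * len(query)
    best = None
    for offset, value in enumerate(query):
        for pos in index.get(value, ()):
            start = pos - offset
            if start < 0 or start + len(query) > len(database):
                continue
            block = database[start:start + len(query)]
            errors = sum(bin(x ^ y).count("1") for x, y in zip(block, query))
            ber = errors / total_bits
            if ber <= max_ber and (best is None or ber < best[1]):
                best = (start, ber)
    return best

# Toy database of 2000 random sub-fingerprints.
rng = random.Random(0)
database = [rng.getrandbits(32) for _ in range(2000)]
index = build_index(database)

# Query: the block starting at 700 with one bit flipped in every
# sub-fingerprint except number 10, which stays error-free.
query = list(database[700:956])
for i in range(256):
    if i != 10:
        query[i] ^= 1

print(indexed_search(database, index, query))
```

If no sub-fingerprint in the query survives without errors, this search finds nothing, which is the price of the assumption stated above.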

17
Results
  • false-positive rate of 3.6 x 10^-2 (Haitsma, Kalker)
  • On tests with a large (500,000) set of input
    traces, has low false-positive and false-negative
    rates (Burges, Platt)
  • didn't test on time compression or expansion
  • can withstand distortions occurring from
    transmission over mobile phones.