Audio Meets Image Retrieval Techniques - PowerPoint PPT Presentation

Slides: 22
Provided by: DaveKa
Learn more at: https://cseweb.ucsd.edu
Transcript and Presenter's Notes


1
Audio Meets Image Retrieval Techniques
  • Dave Kauchak
  • Department of Computer Science
  • University of California, San Diego
  • dkauchak_at_cs.ucsd.edu

2
Image vs. Audio
[Images not preserved in transcript. Audio genre labels: Rock, Classical, Country]
3
Image techniques to audio
  • Idea: Apply image retrieval (and classification)
    techniques to audio
  • Image is 2-D
  • Audio is 1-D

4
Benefits
  • Don't have to reinvent the wheel
  • Image techniques have had fairly good success
  • More literature in image processing
  • Audio retrieval is a relatively new field

5
Key Concepts and Goals
  • Image techniques to audio processing
  • Apply a number of different image techniques (and
    show they work)
  • Relate various parts of audio to counterparts in
    image
  • Novel data set with known ground truth
  • Multiple input for audio
  • Raw audio

6
A first step
  • Audio retrieval
  • Input: A number of songs
  • Output: Similar songs from an audio database
  • Histogramming methods (Puzicha et al.)
  • Wavelets instead of Gabor filters

7
Basic Technique
[Diagram: input songs -> DWT -> histogram -> compare against Database -> most similar songs]
8
Normal vs. Proportional Histogramming
  • Remember: the DWT yields a different number of
    samples per level
  • Normal: Histogram each level with the same number of
    bins
  • Proportional: Histogram each level keeping
    samples/bin equal
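
The difference between the two binning schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name the wavelet, so a Haar wavelet is assumed here, and the helper names (`haar_dwt`, `normal_bins`, `proportional_bins`, `histogram`) and the reading of the proportional setting as "bins at the coarsest level" are hypothetical.

```python
import math

def haar_dwt(signal, levels):
    """Return Haar detail coefficients per level, finest level first.
    (Haar is an assumption; the slides do not say which wavelet was used.)"""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        # Pair up neighbouring samples; an odd trailing sample is dropped.
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx) - 1, 2)]
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details

def normal_bins(details, n_bins):
    """Normal: every level gets the same number of bins."""
    return [n_bins for _ in details]

def proportional_bins(details, coarsest_bins):
    """Proportional: bin count scales with coefficient count, so
    samples/bin is equal on every level (hypothetical parameterization)."""
    coarsest_len = len(details[-1])
    return [max(1, coarsest_bins * len(d) // coarsest_len) for d in details]

def histogram(values, n_bins, lo, hi):
    """Normalized equal-width histogram; assumes hi > lo and non-empty values."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
    return [c / len(values) for c in counts]

signal = [math.sin(0.3 * i) for i in range(64)]
details = haar_dwt(signal, levels=3)
print([len(d) for d in details])      # [32, 16, 8]
print(normal_bins(details, 16))       # [16, 16, 16]
print(proportional_bins(details, 4))  # [16, 8, 4]
```

With normal binning the finest level has 2 samples per bin and the coarsest only 0.5; with proportional binning every level has 2 samples per bin.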

9
Compare Histograms
  • Chi-square on each level
  • Sum the chi-square values over all levels and use
    as the dissimilarity measure (lower is better)
  • Sum dissimilarity over all input songs
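
A minimal sketch of this comparison step follows. The slides do not give the exact chi-square formula, so the symmetric form that compares each histogram against the bin-wise mean is assumed; the function names and the dictionary-based database layout are hypothetical.

```python
def chi_square(h, g):
    """Symmetric chi-square statistic between two normalized histograms.
    Assumed form: sum over bins of (h_i - m_i)^2 / m_i with m_i = (h_i + g_i) / 2."""
    total = 0.0
    for hi, gi in zip(h, g):
        mean = (hi + gi) / 2
        if mean > 0:  # bins that are empty in both histograms contribute nothing
            total += (hi - mean) ** 2 / mean
    return total

def song_dissimilarity(query_levels, candidate_levels):
    """Sum the chi-square value over all DWT levels (lower is better)."""
    return sum(chi_square(h, g) for h, g in zip(query_levels, candidate_levels))

def rank_candidates(queries, database):
    """Sum each candidate's dissimilarity over all input songs, then sort.
    queries: list of per-level histogram lists, one per input song.
    database: dict mapping song name to its per-level histogram list."""
    scores = {
        name: sum(song_dissimilarity(q, levels) for q in queries)
        for name, levels in database.items()
    }
    return sorted(scores, key=scores.get)

# Toy example with one level and two bins per song.
queries = [[[0.5, 0.5]], [[0.6, 0.4]]]
database = {"near": [[0.5, 0.5]], "far": [[1.0, 0.0]]}
print(rank_candidates(queries, database))  # ['near', 'far']
```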

10
Ground Truth Data Set
  • Songs by 4 different bands (10 songs each)
  • Dave Matthews Band
  • U2
  • Blink 182
  • Green Day
  • Mono, sampled at 22 kHz from a number of sources

11
Experiment
  • Input: 5 songs by a single band
  • Goal: Pull out 5 other songs by that band
  • 10 random experiments per band (40 total)
  • Normal bins: 8, 16, 32, 64, 128, 192, 256, 320,
    384, 448, 512
  • Proportional bins: 4, 8, 16, 32, 64
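
One way to set up a single random experiment is sketched below. It assumes the search pool is every song not used as input (the 5 remaining songs by the band plus the 30 songs by the other bands); the slides do not state this explicitly, and `make_trial` and the placeholder song names are hypothetical.

```python
import random

def make_trial(songs_by_band, band, rng, n_query=5):
    """Pick n_query random input songs by `band`; the remaining songs by that
    band are the targets, and the pool is every song not used as input."""
    songs = songs_by_band[band]
    query = rng.sample(songs, n_query)
    targets = [s for s in songs if s not in query]
    others = [s for b, ss in songs_by_band.items() if b != band for s in ss]
    return query, targets, targets + others

# Placeholder data set: 4 bands with 10 songs each.
bands = ["Dave Matthews Band", "U2", "Blink 182", "Green Day"]
songs_by_band = {b: [f"{b} song {i}" for i in range(10)] for b in bands}

rng = random.Random(0)  # fixed seed so the trials are repeatable
trials = [make_trial(songs_by_band, b, rng) for b in bands for _ in range(10)]
print(len(trials))  # 40
```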

12
Scoring
  • By points
  • 5 pts.: Correct answer in first place
  • 4 pts.: Correct answer in second place, etc.
  • Perfect: 5 + 4 + 3 + 2 + 1 = 15
  • Percentage correct at each place
  • Percentage with a correct answer at a rank less than
    or equal to each place
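
The point scheme can be written out as a short sketch; the function names are hypothetical. The second helper turns per-place fractions of correct answers into an average score, which is how the Score column in the 16-bin results relates to the per-place columns.

```python
def points(ranked, targets, top=5):
    """5 points for a correct song in first place, 4 in second, ..., 1 in fifth."""
    return sum(top - place
               for place, song in enumerate(ranked[:top])
               if song in targets)

def expected_points(fraction_correct_by_place):
    """Average score implied by the fraction of trials correct at each place."""
    top = len(fraction_correct_by_place)
    return sum((top - i) * f for i, f in enumerate(fraction_correct_by_place))

print(points(["a", "b", "c", "d", "e"], {"a", "b", "c", "d", "e"}))  # 15
print(points(["x", "a", "y", "z", "b"], {"a", "b"}))                 # 5
print(round(expected_points([.6, .8, .4, .3, .2]), 1))               # 8.2
```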

13
Results: Points
14
Results: Points, Proportional
15
Best Score Results: 16 bins
                1st    2nd    3rd    4th    5th    Score
Dave Matthews   .6     .8     .4     .3     .2     8.2
Blink 182       .3     .1     .1     0      .1     2.3
U2              0      0      0      .1     0      .2
Green Day       .2     .3     .2     0      .5     3.3
Average         .275   .3     .175   .1     .2     3.5
16
Different Bands
                Normal   Proportional
Dave Matthews   6.9      5.8
Blink 182       1.3      2
U2              .9       1.5
Green Day       2.1      2
Average         2.8      2.8
17
Percentage correct
               1st    2nd    3rd    4th    5th
Normal         .23    .17    .17    .17    .18
Proportional   .16    .21    .24    .15    .15
18
One last result
19
Summary of Results
  • Overall, results are not amazing
  • Band choice has large influence
  • Normal and Proportional perform somewhat similarly
  • Proportional is more even over all bands
  • Bin size doesn't appear to be crucial
  • 75% chance that a song by the same band will end
    up in the top 5

20
Next Step
  • Adaptive Binning
  • Vary Parameters
  • Levels
  • Song length
  • Histogram comparison methods
  • Another image retrieval algorithm
  • Boosting for feature selection using large
    feature set?
  • Other?
  • Larger and more diverse database

21
Conclusion
  • Even though results are not fabulous, image
    processing techniques CAN be used for audio
    processing
  • Using bands for testing allows for ground truth
  • Audio files are BIG!