Audio-based Music Similarity Analysis

1
Audio-based Music Similarity Analysis
  • MUMT 611
  • Beinan Li
  • Music Tech @ McGill
  • 2005-3-17

2
Content
  • Overview (based on Foote's and Logan's work)
  • Application background
  • Summary of the common approach
  • Foote 1997: Content-based Retrieval of Music and
    Audio
  • Foote 2002: Audio Retrieval by Rhythmic
    Similarity
  • Logan 2001: A Music Similarity Function Based on
    Signal Analysis

3
Overview
  • Application background
  • Query by similarity
  • Automatic play-list generation
  • Automatic D.J.
  • Automatic summarizing and categorization

4
Overview
  • Common approach to audio similarity analysis
  • Find an underlying metaphor that relates acoustic
    similarity to statistical audio features, within
    a defined application domain.
  • The distance between features from different audio
    samples is taken as the measure of similarity.
  • Supervised vs. unsupervised approaches.

5
Overview
  • Common steps
  • Audio parameterization
  • Window the time-domain signal and extract
    low-level features, usually frequency-domain or
    cepstral features.
  • Quantization (lowers the dimensionality)
  • Transform low-level features into
    higher-level statistical features according to
    the metaphor.
  • Supervised: discriminative quantization, training
    involved.
  • Unsupervised: statistical clustering
  • Distance calculation
  • Calculate the distance between high-level features
    from different audio samples, using a chosen
    distance measure.

6
Foote Content-based Retrieval of Music and Audio
  • Application domain
  • Audio search engine for audio documents by
    acoustic similarity
  • Metaphor
  • Sample-specific template: histogram of feature
    groups
  • Maximum Mutual Information (MMI), from
    information theory
  • Supervised approach
  • Multiple distance measures are tested.

7
Foote Content-based Retrieval of Music and Audio
  • Picture taken from Foote 1997

8
Foote Content-based Retrieval of Music and Audio
  • Audio parameterization
  • Hamming window, overlapping steps.
  • 13-D feature
  • 12 MFCC coefficients and an energy term
  • Emphasizes mid-frequency bands.
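
The windowing step above can be sketched as follows. This is a minimal illustration in plain Python, not Foote's implementation: the frame length and hop size are assumed values chosen for the toy example, and the MFCC computation that would follow each windowed frame is left out.

```python
import math

def hamming(n):
    """Return an n-point Hamming window."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def windowed_frames(signal, frame_len, hop):
    """Split a signal into overlapping frames and window each one."""
    window = hamming(frame_len)
    frames = []
    # Step by `hop` samples; hop < frame_len gives overlapping frames.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# Toy usage (assumed sizes): 16 samples, 8-sample frames, 50% overlap.
signal = [1.0] * 16
frames = windowed_frames(signal, frame_len=8, hop=4)
```

Each windowed frame would then be turned into one 13-dimensional vector (12 MFCCs plus energy) by a cepstral analysis routine.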

9
Foote Content-based Retrieval of Music and Audio
  • Quantization
  • Tree-based vector quantization.
  • Leaves: histogram bins (into which feature
    vectors fall)
  • Off-line training to construct the binary tree
  • Supervised: training data labeled by known
    classes
  • Binary branching threshold
  • Determined via MMI between features and
    class labels
  • 1-D partition of the feature space (the
    dimension is chosen by MMI)
  • Stopping rule: thresholds on probability-weighted
    MMI
  • Practical tree construction
  • Sample-specific template (called a signature in
    Logan 2001)
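
One node of such a tree can be sketched as a search over single dimensions and thresholds for the split that shares the most information with the class labels. The code below is an illustrative assumption about how that search might look, not the procedure from Foote 1997; the helper names, the toy vectors, and the "speech"/"music" labels are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mutual_information(values, labels, threshold):
    """Mutual information between the class and the side of the split."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - conditional

def best_split(vectors, labels):
    """Return (dimension, threshold, mi) of the best 1-D split."""
    best = (None, None, 0.0)
    for d in range(len(vectors[0])):
        values = [v[d] for v in vectors]
        candidates = sorted(set(values))
        # Try a threshold halfway between each pair of adjacent values.
        for lo, hi in zip(candidates, candidates[1:]):
            t = (lo + hi) / 2
            mi = mutual_information(values, labels, t)
            if mi > best[2]:
                best = (d, t, mi)
    return best

# Hypothetical labeled training vectors (2-D for brevity).
vectors = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]
labels = ["speech", "speech", "music", "music"]
dim, threshold, mi = best_split(vectors, labels)
```

Applying the same search recursively to each side, until a stopping rule fires, would grow the tree; the leaves then serve as histogram bins.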

10
Foote Content-based Retrieval of Music and Audio
  • Distance calculation
  • Euclidean distance
  • Straightforward
  • Sensitive to magnitude
  • Successful in Speaker ID domain.
  • Cosine distance
  • Derived from scalar product
  • Insensitive to magnitude
  • No evidence relating these measures to
    perception.
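
The difference in magnitude sensitivity can be seen on two toy histograms with the same shape but different total counts. This is a sketch with made-up numbers; "cosine distance" is taken here as one minus the cosine similarity, which is an assumption about the exact form used.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; grows with differences in magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the normalized scalar product; ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Same bin proportions, but the second clip is ten times longer.
hist_short = [2.0, 1.0, 1.0]
hist_long = [20.0, 10.0, 10.0]
d_euc = euclidean_distance(hist_short, hist_long)
d_cos = cosine_distance(hist_short, hist_long)
```

Here `d_euc` is large while `d_cos` is essentially zero, so clips of different length but similar content stay close under the cosine measure.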

11
Foote Content-based Retrieval of Music and Audio
  • Experiments
  • Performance measurement
  • No subjective test
  • Relevance judged from file-naming hints (e.g.
    "oboe"); TREC-like Average Precision.
  • Percentage of top-ranked items that are actually
    relevant
  • On simple sounds (laughter, musical notes,
    animal, etc.)
  • On music clips
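
A TREC-style average precision over a ranked result list can be sketched as below. The file names are hypothetical, relevance is derived from a substring of the name as the slide describes, and the sketch assumes every relevant item appears somewhere in the ranked list.

```python
def average_precision(ranked_relevance):
    """Mean of the precision values at each rank that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Hypothetical ranked results for a query made with an oboe sound.
ranked_files = ["oboe1.wav", "laugh3.wav", "oboe7.wav", "bell2.wav"]
relevance = ["oboe" in name for name in ranked_files]
ap = average_precision(relevance)
```

In this toy ranking the relevant items sit at ranks 1 and 3, so the score is the mean of 1/1 and 2/3.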

12
Foote Content-based Retrieval of Music and Audio
  • Experiments on simple sounds (no predominance)
  • Q-tree vs. Muscle Fish (unsupervised)
  • Pitch vs. timbre similarity - subjective
    importance?

(Diagram from Foote 1997)

13
Foote Content-based Retrieval of Music and Audio
  • Experiments on music clips
  • Each artist as a class
  • Cosine distance performs best

(Diagram from Foote 1997)

14
Foote Content-based Retrieval of Music and Audio
  • Conclusion and future
  • Histogram bins can be further weighted to
    maximize the entropy.
  • May be used to measure subjective perceptual
    qualities.
  • Detecting audio content change (via templates
    within a stream)
  • The tree can show the importance of individual
    feature dimensions (1-D splits)
  • Compressed audio may allow skipping the
    parameterization step.
  • Demo link: http://www.fxpal.com/people/foote/musicr/doc0.html

15
Foote Audio Retrieval by Rhythmic Similarity
  • Previous tempo/rhythm tracking approaches
  • Restricted to narrow application domains
  • Dannenberg 1987: MIDI
  • Scheirer 1998: strong percussive elements
  • Goto 1994: 4/4, bass drum on downbeat
  • Muscle Fish: drum-only tracks
  • Cliff 2000: dance music
  • Not robust.

16
Foote Audio Retrieval by Rhythmic Similarity
  • Application domain
  • Automatic D.J., play-list via rhythmic similarity
  • Metaphor
  • Beat spectrum: autocorrelation
  • Within a certain time range (lag), the
    autocorrelation of spectral-related audio
    features hints at rhythm.
  • Unsupervised approach
  • Multiple distance measures are tested.

17
Foote Audio Retrieval by Rhythmic Similarity
  • Audio Parameterization
  • Overlapped windowing, FFT
  • Logarithmic magnitude response (power spectrum)
  • Others (similar sounds should give similar
    parameters)
  • Linear prediction
  • MFCC
  • Psychoacoustic

18
Foote Audio Retrieval by Rhythmic Similarity
  • Quantization
  • Distance between short-time power spectra within
    the stream
  • Similarity Matrix (visualization of audio
    structure)
  • End-to-end repeated time-line
  • Main diagonal (each frame compared with itself)
  • Diagonal stripes (D(i,j), S(k, k+l))
  • Visible periodicity (repeated stripes)
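
A similarity matrix of this kind can be sketched by comparing every frame's feature vector with every other frame's. The example below uses cosine similarity and two-element toy "spectra"; both choices are assumptions for illustration, and any of the distance measures listed on these slides could be substituted.

```python
import math

def cosine_similarity(a, b):
    """Normalized scalar product of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def similarity_matrix(frames):
    """S[i][j] is the similarity between frame i and frame j."""
    n = len(frames)
    return [[cosine_similarity(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]

# Toy stream: two spectral shapes alternating, i.e. a period of 2 frames.
frames = [[1.0, 0.0], [0.0, 1.0]] * 3
S = similarity_matrix(frames)
```

The main diagonal of `S` is all ones, and the ones recurring at every second off-diagonal are the "stripes" that reveal the 2-frame periodicity.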

(Picture from Foote 2002)

19
Foote Audio Retrieval by Rhythmic Similarity
  • Similarity Matrix
  • Brightness encodes distance
  • Beat spectrum (within a range of lags)
  • Autocorrelation
  • Peaks of the beat spectrum indicate repetition
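
In its simplest form the beat spectrum sums the similarity matrix along each diagonal, B(l) = sum over k of S(k, k+l). The sketch below uses that plain diagonal sum on a hand-built toy matrix; the matrix values and the maximum lag are assumptions for illustration.

```python
def beat_spectrum(S, max_lag):
    """B(l) = sum over k of S[k][k + l], for lags 0..max_lag."""
    n = len(S)
    return [sum(S[k][k + lag] for k in range(n - lag))
            for lag in range(max_lag + 1)]

# Toy similarity matrix for a stream that repeats every 2 frames.
n = 6
S = [[1.0 if (i - j) % 2 == 0 else 0.0 for j in range(n)] for i in range(n)]
B = beat_spectrum(S, max_lag=4)
```

The values at lags 2 and 4 stand out against lags 1 and 3, which is the repetition the peaks indicate. Because longer lags have fewer terms in the sum, the peaks shrink with lag in this unnormalized form.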

(Picture from Foote 2002)

20
Foote Audio Retrieval by Rhythmic Similarity
  • Distance calculation
  • Euclidean distance
  • Cosine distance
  • Fourier Beat Spectral Coefficients
  • Further lower the dimensionality

21
Foote Audio Retrieval by Rhythmic Similarity
  • Experiments
  • Euclidean distance
  • Different-tempo versions of identical music
  • Find itself

(Diagram from Foote 2002)

22
Foote Audio Retrieval by Rhythmic Similarity
  • Experiments
  • Three 10-sec sections from each of 4 songs
  • Only sections of the same song count as relevant
  • Lag size carefully chosen
  • Within a certain time range (lag), the
    autocorrelation of spectral-related audio
    features hints at rhythm.
  • Rule out overly small/large lag candidates
  • Cosine and FBSC win with a precision of 96.7%

23
Foote Audio Retrieval by Rhythmic Similarity
  • Conclusion and future
  • Beat spectra live in a vector space, so
    common classification / machine learning methods
    can be used.
  • Automatic play-list: build a similarity matrix
    between the ending of candidate song N and the
    beginning of candidate song N+1.
  • Knowledge constraints.

24
Logan A Music Similarity Function Based on
Signal Analysis
  • Application domain
  • Automatic play-list via similarity
  • Metaphor
  • Song signature: instrument type,
    presence of singing?
  • Spectral features
  • Transformation cost between signatures
  • Unsupervised approach
  • Multiple distance measures are tested.

25
Logan A Music Similarity Function Based on
Signal Analysis
  • Audio Parameterization
  • Windowing, MFCC
  • Many other candidates so long as a distance
    measure can be found.

26
Logan A Music Similarity Function Based on
Signal Analysis
  • Quantization
  • Signature based on the template of Foote 1997
  • Supervision may rely too heavily on training data
    and thus overemphasize a few specific histogram
    bins.
  • K-means clustering
  • Assumes a fixed number of clusters per song
  • Signature: vector of per-cluster statistics
  • (mean, covariance, weight)
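
Building such a signature can be sketched with a small k-means loop over a song's frame vectors. This is an illustrative sketch, not Logan's code: the initialization from the first k frames, the iteration count, and the use of per-dimension variances in place of a full covariance matrix are all simplifying assumptions.

```python
def kmeans_signature(frames, k, iterations=20):
    """Cluster frames with k-means; return (mean, variance, weight) per cluster."""
    dims = len(frames[0])
    centers = [list(f) for f in frames[:k]]  # assumed init: first k frames
    assignment = [0] * len(frames)
    for _ in range(iterations):
        # Assign each frame to its nearest center (squared Euclidean).
        for i, f in enumerate(frames):
            dists = [sum((x - c) ** 2 for x, c in zip(f, center))
                     for center in centers]
            assignment[i] = dists.index(min(dists))
        # Move each center to the mean of its members.
        for c in range(k):
            members = [f for f, a in zip(frames, assignment) if a == c]
            if members:
                centers[c] = [sum(m[d] for m in members) / len(members)
                              for d in range(dims)]
    signature = []
    for c in range(k):
        members = [f for f, a in zip(frames, assignment) if a == c]
        if not members:
            continue
        mean = centers[c]
        var = [sum((m[d] - mean[d]) ** 2 for m in members) / len(members)
               for d in range(dims)]
        signature.append((mean, var, len(members) / len(frames)))
    return signature

# Toy "song": four 2-D frame vectors forming two obvious groups.
frames = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
signature = kmeans_signature(frames, k=2)
```

Each tuple in `signature` describes one cluster; the weights sum to one and play the role of probability mass when two signatures are compared.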

27
Logan A Music Similarity Function Based on
Signal Analysis
  • Distance measure
  • Earth Mover's Distance (EMD)
  • Weighted cost of moving probability mass from one
    cluster to another. EMD is the normalized cost.
  • The distance between clusters is based on the
    Kullback-Leibler divergence.
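
The per-cluster distance that feeds the EMD can be sketched as a symmetrized Kullback-Leibler divergence between two Gaussian clusters. The version below assumes diagonal covariances (one variance per dimension), which is a simplification made for this example. The EMD itself additionally requires solving a transportation problem over all cluster pairs, which is not shown here.

```python
import math

def kl_diag_gaussian(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return (kl_diag_gaussian(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussian(mean_q, var_q, mean_p, var_p))

# Toy clusters: identical, then shifted by 2 in the first dimension.
d_same = symmetric_kl([0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0])
d_diff = symmetric_kl([0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 1.0])
```

Identical clusters give zero, and the value grows as the means move apart or the variances diverge, which makes it usable as the cost of moving mass between two clusters.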

28
Logan A Music Similarity Function Based on
Signal Analysis
  • Experiments
  • Tested on over 8,000 songs of varied styles in a
    database.
  • Several numbers of MFCC coefficients are tested.
  • Main metrics
  • Average distance between all songs
  • Average distance between songs on the same album
  • Average distance between songs in the same genre
  • Average distance between songs by the same artist
  • Objective and subjective relevance tested.
  • Robustness to corruption tested.
  • A section of a song is removed on purpose

29
Logan A Music Similarity Function Based on
Signal Analysis
  • Experiments (Objective)

(Diagrams from Logan 2001)

30
Logan A Music Similarity Function Based on
Signal Analysis
(Diagrams from Logan 2001)
  • Experiments
  • Subjective
  • Corruption

31
Other approaches
  • Li Guo, 200x Content-Based Audio
    Classification and Retrieval Using SVM Learning

32
Bibliography
  • Summary
  • HTML Bibliography