Title: Audio-based Music Similarity Analysis
1. Audio-based Music Similarity Analysis
- MUMT 611
- Beinan Li
- Music Tech _at_ McGill
- 2005-3-17
2. Content
- Overview (based on Foote's and Logan's work)
- Application background
- Summary of the common approach
- Foote 1997: Content-based Retrieval of Music and Audio
- Foote 2002: Audio Retrieval by Rhythmic Similarity
- Logan 2001: A Music Similarity Function Based on Signal Analysis
3. Overview
- Application background
- Query by similarity
- Automatic play-list generation
- Automatic D.J.
- Automatic summarizing and categorization
4. Overview
- Common approach to audio similarity analysis
- Find a hidden metaphor that relates acoustic similarity to statistical audio features, based on the defined application domain.
- Distance between features from different audio samples is taken as the measure of similarity.
- Supervised vs. unsupervised approaches.
5. Overview
- Common steps
- Audio parameterization
- Window the time-domain signal and extract low-level features, usually frequency-domain or cepstral features.
- Quantization (lowers the dimensionality)
- Transform low-level features into higher-level statistical features according to the metaphor.
- Supervised: discriminative quantization, training involved.
- Unsupervised: statistical clustering
- Distance calculation
- Calculate the distance between high-level features from different audio samples, using a chosen distance measure.
6. Foote: Content-based Retrieval of Music and Audio
- Application domain
- Audio search engine for audio documents by acoustic similarity
- Metaphor
- Sample-specific template: histogram of feature groups
- Maximized Mutual Information (informatics)
- Supervised approach
- Multiple distance measures are tested.
7. Foote: Content-based Retrieval of Music and Audio
- Picture taken from Foote 1997
8. Foote: Content-based Retrieval of Music and Audio
- Audio parameterization
- Hamming window, overlapping steps.
- 13-D feature
- 12 MFCC coefficients and an energy term
- Emphasizes mid-frequency bands.
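
The windowing step above can be sketched as follows. This is a minimal illustration, not Foote's implementation: the frame length, hop size, and the names `hamming` and `frame_log_energy` are assumptions, and only the energy term of the 13-D feature is computed (the 12 MFCCs would normally come from a signal-processing library).

```python
import math

def hamming(n):
    """Hamming window of length n (n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_log_energy(signal, frame_len=256, hop=128):
    """Cut the signal into overlapping Hamming-windowed frames and
    return the log energy of each frame.
    frame_len and hop are illustrative values, not taken from the slides."""
    window = hamming(frame_len)
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        energy = sum(x * x for x in frame)
        energies.append(math.log(energy + 1e-12))  # small floor avoids log(0)
    return energies

# Toy input: 1024 samples of a 440 Hz sine at an assumed 8 kHz sample rate.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
energies = frame_log_energy(tone)
```

With a hop of half the frame length, consecutive frames overlap by 50%, which is a common choice but only one of many.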
9. Foote: Content-based Retrieval of Music and Audio
- Quantization
- Tree-based vector quantization.
- Leaves: histogram bins (into which the features fall)
- Off-line training for binary-tree construction
- Supervised: training data labeled by known classes
- Binary branching threshold
- Determined via MMI between features and class labels
- 1-D feature space partition (the dimension is found by MMI)
- Stopping rule: thresholds on probability-weighted MMI
- Practical tree construction
- Sample-specific template: signature (Logan 2001)
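
One way to picture the branching rule above is a search for the single dimension and threshold whose binary test shares the most information with the class labels. The sketch below covers only one split, on toy data. The function names and the midpoint candidate thresholds are assumptions, and the stopping rule and recursive tree growth are not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_information(values, labels, threshold):
    """Mutual information (bits) between the class label and the
    binary test `value <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - conditional

def best_split(vectors, labels):
    """Try every dimension and every midpoint between sorted values;
    return (dimension, threshold, mutual_information) of the best split."""
    best = (None, None, 0.0)
    for d in range(len(vectors[0])):
        values = [v[d] for v in vectors]
        uniq = sorted(set(values))
        for a, b in zip(uniq, uniq[1:]):
            t = (a + b) / 2
            mi = split_information(values, labels, t)
            if mi > best[2]:
                best = (d, t, mi)
    return best

# Toy 2-D feature vectors with hypothetical class labels.
vectors = [(0.1, 5.0), (0.2, 1.0), (0.9, 4.0), (0.8, 2.0)]
labels = ["speech", "speech", "music", "music"]
dim, thr, mi = best_split(vectors, labels)
```

In this toy set only the first dimension separates the two classes, so the search selects it with a threshold between 0.2 and 0.8.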
10. Foote: Content-based Retrieval of Music and Audio
- Distance calculation
- Euclidean distance
- Straightforward
- Sensitive to magnitude
- Successful in Speaker ID domain.
- Cosine distance
- Derived from scalar product
- Insensitive to magnitude
- No evidence on how the measures relate to perception.
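
The contrast between the two measures can be seen on a pair of toy histograms. This is a sketch under two assumptions: the histogram values are invented, and cosine distance is taken as one minus the cosine of the angle between the vectors.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; grows with the magnitude of the vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cos(angle), derived from the scalar product; ignores magnitude.
    Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Two histograms with the same shape; h2 has twice the counts
# (for example, a clip twice as long).
h1 = [4, 2, 0, 2]
h2 = [8, 4, 0, 4]

d_euclid = euclidean_distance(h1, h2)
d_cosine = cosine_distance(h1, h2)
```

Here the Euclidean distance is clearly non-zero while the cosine distance is essentially zero, which is the magnitude sensitivity noted above.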
11. Foote: Content-based Retrieval of Music and Audio
- Experiments
- Performance measurement
- No subjective test
- File-naming hints (oboe)
- TREC-like Average Precision.
- Percentage of top-ranked items that are actually relevant
- On simple sounds (laughter, musical notes, animal, etc.)
- On music clips
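
A common form of average precision can be sketched as below. The exact TREC-style variant used in the paper is not given on the slide, so this is an assumption: it averages the precision at each rank where a relevant item appears, and it assumes every relevant item is present in the ranked list.

```python
def average_precision(ranked_relevance):
    """ranked_relevance: list of booleans, best-ranked item first.
    Returns the mean of precision@rank over the ranks of relevant items."""
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Hypothetical query result: relevant items at ranks 1 and 3.
ap = average_precision([True, False, True, False])
```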
12. Foote: Content-based Retrieval of Music and Audio
- Experiments on simple sounds (no predominance)
- Q-tree vs. Muscle Fish (unsupervised)
- Pitch vs. timbre similarity: subjective importance?
(Diagram from Foote 1997)
13. Foote: Content-based Retrieval of Music and Audio
- Experiments on music clips
- Each artist as a class
- Cosine distance performs best
(Diagram from Foote 1997)
14. Foote: Content-based Retrieval of Music and Audio
- Conclusion and future
- Histogram bins can be further weighted to maximize the entropy.
- May be used to measure subjective perceptual qualities.
- Audio content change (via templates within a stream)
- Tree can show importance of feature dimensions (1-D)
- Compressed audio may help skip the step of parameterization.
- Demo link: http://www.fxpal.com/people/foote/musicr/doc0.html
15. Foote: Audio Retrieval by Rhythmic Similarity
- Previous tempo/rhythm tracking approaches
- Restricted to narrow application domains
- Dannenberg 1987: MIDI
- Scheirer 1998: strong percussive elements
- Goto 1994: 4/4, bass drum on downbeat
- Muscle Fish: drum-only tracks
- Cliff 2000: dance music
- Not robust.
16. Foote: Audio Retrieval by Rhythmic Similarity
- Application domain
- Automatic D.J., play-list via rhythmic similarity
- Metaphor
- Beat Spectrum: autocorrelation
- Within a certain time range (lag), autocorrelation of spectral-related audio features hints at rhythm.
- Unsupervised approach
- Multiple distance measures are tested.
17. Foote: Audio Retrieval by Rhythmic Similarity
- Audio Parameterization
- Overlapped windowing, FFT
- Logarithmic magnitude response (power spectrum)
- Others (similar sounds - similar parameters)
- Linear prediction
- MFCC
- Psychoacoustic
18. Foote: Audio Retrieval by Rhythmic Similarity
- Quantization
- Distance between ST-powers within the stream
- Similarity Matrix (visualization of audio structure)
- End-to-end repeated time-line
- Main diagonal (self-correlate)
- Diagonal stripes (D(i,j) S(k, k+l))
- Visible periodicity (repeated stripes)
(Picture from Foote 2002)
19. Foote: Audio Retrieval by Rhythmic Similarity
- Similarity Matrix
- Brightness - distance
- Beat Spectrum (in a range)
- Autocorrelation
- Peak of BS - repetition
(Picture from Foote 2002)
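
The similarity matrix and beat spectrum can be sketched on a toy sequence of frames. Several things here are assumptions for illustration: cosine similarity as the frame-to-frame measure, the two-bin "spectra", and the normalization of each diagonal sum by its length so that lags are comparable.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity_matrix(frames):
    """S[i][j] = similarity between frame i and frame j."""
    n = len(frames)
    return [[cosine_similarity(frames[i], frames[j]) for j in range(n)] for i in range(n)]

def beat_spectrum(S, max_lag):
    """Average of S along each diagonal: B(l) = mean over k of S[k][k + l]."""
    n = len(S)
    return [sum(S[k][k + lag] for k in range(n - lag)) / (n - lag)
            for lag in range(max_lag)]

# Toy "spectra": an accent frame A followed by three frames B, repeated 4 times,
# so the sequence repeats every 4 frames.
A, B = [1.0, 0.0], [0.0, 1.0]
frames = [A, B, B, B] * 4

S = similarity_matrix(frames)
bs = beat_spectrum(S, max_lag=8)
# Strongest repetition, ignoring the trivial peak at lag 0.
peak_lag = max(range(1, len(bs)), key=lambda lag: bs[lag])
```

The peak falls at lag 4, matching the period of the toy sequence, which is the sense in which a peak of the beat spectrum indicates repetition.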
20. Foote: Audio Retrieval by Rhythmic Similarity
- Distance calculation
- Euclidean distance
- Cosine distance
- Fourier Beat Spectral Coefficients
- Further lower the dimensionality
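
The idea behind Fourier beat spectral coefficients can be sketched by taking a discrete Fourier transform of a beat spectrum and keeping only a few low-order magnitudes. The details below (mean removal, skipping the DC term, using plain rather than log magnitudes, the name `fourier_coefficients`) are assumptions, not the paper's exact recipe.

```python
import cmath

def fourier_coefficients(beat_spec, n_coeffs):
    """Magnitudes of DFT bins 1..n_coeffs of a beat spectrum.
    The mean is removed first and the DC bin is skipped."""
    n = len(beat_spec)
    mean = sum(beat_spec) / n
    centered = [b - mean for b in beat_spec]
    coeffs = []
    for k in range(1, n_coeffs + 1):
        s = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(centered))
        coeffs.append(abs(s))
    return coeffs

# Hypothetical beat spectrum of length 16 with a peak every 4 lags.
toy_beat_spectrum = [1.0, 0.0, 0.0, 0.0] * 4
fbsc = fourier_coefficients(toy_beat_spectrum, n_coeffs=6)
```

Sixteen beat-spectrum values are reduced to six coefficients here, and the largest one sits at bin 4, the bin that corresponds to a period of 4 lags.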
21. Foote: Audio Retrieval by Rhythmic Similarity
- Experiments
- Euclidean distance
- Different-tempo versions of identical music
- Find itself
(Diagram from Foote 2002)
22. Foote: Audio Retrieval by Rhythmic Similarity
- Experiments
- Three 10-sec sections from each of 4 songs
- A section is relevant only to sections of the same song
- Lag size carefully chosen
- Within a certain time range (lag), autocorrelation of spectral-related audio features hints at rhythm.
- Rule out overly small/large lag candidates
- Cosine and FBSC win with precision of 96.7%
23. Foote: Audio Retrieval by Rhythmic Similarity
- Conclusion and future
- Beat Spectra actually form a vector space, so common classification / machine learning methods can be used.
- Auto-play-list: build Similarity Matrix in terms of the ending of Candidate Song N and the beginning of Candidate Song N+1.
- Knowledge constraints.
24. Logan: A Music Similarity Function Based on Signal Analysis
- Application domain
- Automatic play-list via similarity
- Metaphor
- Song signature: instrument type, singing presence?
- Spectral features
- Transformation cost between signatures
- Unsupervised approach
- Multiple distance measures are tested.
25. Logan: A Music Similarity Function Based on Signal Analysis
- Audio Parameterization
- Windowing, MFCC
- Many other candidates, so long as a distance measure can be found.
26. Logan: A Music Similarity Function Based on Signal Analysis
- Quantization
- Signature based on Foote 1997 (template)
- Supervision may rely too heavily on training data and thus overemphasize a few specific histogram bins.
- K-means clustering
- Assume the number of clusters is fixed for a song
- Signature: vector of common statistical parameters
- (means, covariance, weight)
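
A song signature of this kind can be sketched with a small k-means routine. The sketch makes several assumptions: 2-D toy frames stand in for MFCC vectors, the initial centers are picked evenly through the song rather than at random, and a per-dimension variance stands in for the covariance.

```python
def kmeans_signature(frames, k, iterations=20):
    """Cluster the frames of one song with k-means and return a signature:
    a list of (mean, variance, weight) tuples, one per non-empty cluster."""
    dims = len(frames[0])
    step = len(frames) // k
    # Deterministic initialization (an assumption): k frames spread through the song.
    centers = [list(frames[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k),
                          key=lambda c: sum((x - m) ** 2 for x, m in zip(f, centers[c])))
            clusters[nearest].append(f)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(m[d] for m in members) / len(members)
                              for d in range(dims)]
    signature = []
    for c, members in enumerate(clusters):
        if not members:
            continue
        mean = centers[c]
        variance = [sum((m[d] - mean[d]) ** 2 for m in members) / len(members)
                    for d in range(dims)]
        weight = len(members) / len(frames)
        signature.append((mean, variance, weight))
    return signature

# Toy "song" with two obvious groups of frames.
frames = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
signature = kmeans_signature(frames, k=2)
```

Each tuple summarizes one cluster, and the weights sum to one, so the signature can be read as a small mixture describing the song.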
27. Logan: A Music Similarity Function Based on Signal Analysis
- Distance measure
- Earth Mover's Distance
- Weighted cost of moving probability mass from one cluster to another. EMD is the normalized cost.
- Distance is based on Kullback-Leibler distance.
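
The cluster-to-cluster cost that the Earth Mover's Distance builds on can be sketched as a symmetric Kullback-Leibler distance between two Gaussian clusters. This uses the textbook closed form for Gaussians with diagonal covariance, which is an assumption and may differ in detail from the paper's formula. Finding the cheapest flow of mass between two signatures is a transportation problem and is not shown.

```python
def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """KL(p||q) + KL(q||p) for two Gaussians with diagonal covariance.
    Each argument is a list with one entry per feature dimension;
    all variances are assumed to be strictly positive."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        d2 = (mp - mq) ** 2
        total += (vp + d2) / (2 * vq) + (vq + d2) / (2 * vp) - 1.0
    return total

# Hypothetical 1-D clusters: identical, then shifted by 2 with unit variance.
same = symmetric_kl([0.0], [1.0], [0.0], [1.0])
shifted = symmetric_kl([0.0], [1.0], [2.0], [1.0])
```

Identical clusters cost nothing to match, and the cost grows with the squared gap between means relative to the variances.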
28. Logan: A Music Similarity Function Based on Signal Analysis
- Experiments
- Tests over 8,000 songs of varying styles in a database.
- Several different numbers of MFCC coefficients are tested.
- Main metrics
- Average distance between all songs
- Average distance between songs on the same album
- Average distance between songs in the same genre
- Average distance between songs by the same artist
- Objective and subjective relevance tested.
- Robustness to corruption tested.
- Remove a section of a song on purpose
29. Logan: A Music Similarity Function Based on Signal Analysis
(Diagrams from Logan 2001)
30. Logan: A Music Similarity Function Based on Signal Analysis
(Diagrams from Logan 2001)
- Experiments
- Subjective
- Corruption
31. Other approaches
- Li Guo, 200x: Content-Based Audio Classification and Retrieval Using SVM Learning
32. Bibliography
- Summary
- HTML Bibliography