AUDIO TONALITY MODE CLASSIFICATION

AUDIO TONALITY MODE CLASSIFICATION WITHOUT TONIC ANNOTATIONS
Zhiyao Duan (1,2), Lie Lu (1), and Changshui Zhang (2)
1. Microsoft Research Asia (MSRA), China.
2. Department of Automation, Tsinghua University, China.

Summary
Tonality mode classification for popular songs, where only the mode is labeled in the training data.
Traditional key-finding algorithms often rely on tonic annotations of the training songs.
Keys of popular songs are hard to obtain.
It is easier to label the mode than the key of a song.
Mode is more important than tonic.
An alignment approach transposes chroma features to a reference (but unknown) tonic.
Three methods for mode learning:
Single Profile Correlation (SPC)
Multiple Profile Correlation (MPC)
Support Vector Machine (SVM)
Key: C-major, a-minor, Eb-major, etc.
Mode: major/minor
Tonic: C, C#, D, etc.

After N updates, the average vector q is used to initialize the procedure again, and Step 2 is performed once more.
The calculated average vector is stable when the order of the training chroma vectors is randomly changed.

training set is small, i.e., the training samples of the major and minor modes are close to each other in the feature space. Therefore, it is hard for the SVM to find a good classification surface between the two modes.
The decisive (shifted) chroma vector among the 12 shifted vectors is the one furthest from the classification surface. This makes the distribution of the decisive test vectors different from that of the training vectors in Methods (a) and (b).
For Method (c), this alignment, together with the inner-class alignment, can be seen as analogous to minimizing the intra-class distance while maximizing the inter-class distance.

Learning and Classification
Single Profile Correlation (SPC)
In training, each mode is represented by one chroma profile, using a 12-d or 7-d feature. Each element of the 7-d profile corresponds to a diatonic note of the 12-d profile.
In testing, circularly shift the chroma vector of an excerpt 12 times to generate 12 vectors.
Correlate the shifted vectors against the major/minor profiles. The most highly correlated profile indicates the mode.
Majority voting over excerpts gives the song mode.
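The SPC test procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy profiles, the helper names (correlation, shift_left, classify_excerpt, classify_song) and the use of Pearson correlation are not taken from the slides, and real profiles would be learned from aligned training chroma vectors.

```python
import math
from collections import Counter

def correlation(x, y):
    """Pearson correlation between two equal-length vectors (assumed measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def shift_left(vec, j):
    """Circularly shift the items of vec j positions to the left."""
    return vec[j:] + vec[:j]

def classify_excerpt(chroma, profiles):
    """Try all 12 shifts against each mode profile; return the best mode."""
    best_mode, best_score = None, -2.0
    for mode, profile in profiles.items():
        for j in range(12):
            score = correlation(shift_left(chroma, j), profile)
            if score > best_score:
                best_mode, best_score = mode, score
    return best_mode

def classify_song(excerpt_chromas, profiles):
    """Majority vote over the excerpt-level decisions."""
    votes = Counter(classify_excerpt(c, profiles) for c in excerpt_chromas)
    return votes.most_common(1)[0][0]

# Hypothetical toy profiles (tonic at index 0), NOT learned from data.
PROFILES = {
    "major": [3, 0, 1, 0, 2, 1, 0, 2, 0, 1, 0, 1],
    "minor": [3, 0, 1, 2, 0, 1, 0, 2, 1, 0, 1, 0],
}

# Excerpts whose tonic is not at index 0: two "D major", one "E minor".
d_major = shift_left(PROFILES["major"], 10)
e_minor = shift_left(PROFILES["minor"], 8)
song_mode = classify_song([d_major, d_major, e_minor], PROFILES)
```

Because every shift is tried, the excerpt's tonic never has to be known; only the mode label of the winning profile is kept.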
Multiple Profile Correlation (MPC)
In training, K profiles (12-d or 7-d) represent each mode, using a K-kernel Gaussian Mixture Model.
In testing, circularly shift the chroma vector of an excerpt 12 times to generate 12 vectors.
Correlate the shifted vectors with the major/minor profiles (Eq. (6)). The maximum or the weighted summation of the correlations defines the confidence score. The highest confidence score indicates the mode.
Majority voting over excerpts gives the song mode.
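A rough sketch of the MPC confidence score follows. Eq. (6) is not reproduced in this transcript, so the details here are assumptions: Pearson correlation is used, the K correlations are combined per shift (by maximum, or by a weighted sum using hypothetical mixture weights), and the best shift gives the confidence. The profiles are hand-made toys rather than Gaussian mixture means.

```python
import math

def correlation(x, y):
    """Pearson correlation between two equal-length vectors (assumed measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def shift_left(vec, j):
    """Circularly shift the items of vec j positions to the left."""
    return vec[j:] + vec[:j]

def mode_confidence(chroma, profiles, weights=None):
    """Confidence of one mode given its K profiles.

    For each of the 12 shifts, combine the K correlations by maximum
    (weights=None) or by weighted sum (assumed mixture weights), then
    keep the best shift.
    """
    best = -float("inf")
    for j in range(12):
        shifted = shift_left(chroma, j)
        scores = [correlation(shifted, p) for p in profiles]
        if weights is None:
            combined = max(scores)
        else:
            combined = sum(w * s for w, s in zip(weights, scores))
        best = max(best, combined)
    return best

def classify_excerpt(chroma, mode_profiles, mode_weights=None):
    """Return the mode with the highest confidence score."""
    def conf(mode):
        w = None if mode_weights is None else mode_weights[mode]
        return mode_confidence(chroma, mode_profiles[mode], w)
    return max(mode_profiles, key=conf)

# Hypothetical toy profiles with K = 2 per mode (tonic at index 0).
MODE_PROFILES = {
    "major": [[3, 0, 1, 0, 2, 1, 0, 2, 0, 1, 0, 1],
              [3, 0, 1, 0, 2, 1, 0, 3, 0, 1, 0, 1]],
    "minor": [[3, 0, 1, 2, 0, 1, 0, 2, 1, 0, 1, 0],
              [3, 0, 1, 2, 0, 1, 0, 3, 1, 0, 1, 0]],
}

excerpt = shift_left(MODE_PROFILES["major"][1], 5)  # transposed major excerpt
predicted = classify_excerpt(excerpt, MODE_PROFILES)
```

Song-level decisions would then follow by majority voting over excerpts, as in SPC.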
Support Vector Machine (SVM)
In training, train an SVM using the training chroma vectors.

Experiments
Materials
4,528 (2,786 major and 1,742 minor) songs.
Various genres including rock, electronica,
folk, country, jazz, etc.
Songs having ambiguous modes or major-minor
modulations were discarded.
Training set 25%, test set 75%.
Results

Algorithm Flow

Feature Extraction and Alignment
Chroma feature extraction
Divide a song into excerpts (15 s, 30 s, or the whole song).
In each frame (130 ms long, with a 10 ms shift) of an excerpt, a 48-bin constant-Q transform (CQT) is calculated over the frequency range from 130 Hz (C3) to 1975 Hz (B6).
For each excerpt, a 12-d chroma vector is calculated from the average CQT vector.
Each chroma vector is normalized.
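The folding step from the averaged CQT vector to a chroma vector might look like the sketch below. It assumes one CQT bin per semitone (48 bins over the four octaves C3 to B6), that bin 0 corresponds to C, and that normalization means scaling to unit Euclidean norm; the slides do not state the normalization, and the function names are illustrative.

```python
import math

def fold_cqt_to_chroma(avg_cqt, bins_per_semitone=1):
    """Sum CQT bins that share a pitch class into a 12-d chroma vector."""
    chroma = [0.0] * 12
    for k, magnitude in enumerate(avg_cqt):
        pitch_class = (k // bins_per_semitone) % 12  # bin 0 assumed to be C
        chroma[pitch_class] += magnitude
    return chroma

def normalize(vec):
    """Scale to unit Euclidean norm (assumed); zero vectors are left as-is."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]

# Toy averaged CQT: energy on C in all four octaves and on one G.
avg_cqt = [0.0] * 48
for k in (0, 12, 24, 36):
    avg_cqt[k] = 1.0
avg_cqt[7] = 3.0

chroma = normalize(fold_cqt_to_chroma(avg_cqt))
```

In this toy case the folded vector has 4 on C and 3 on G before normalization, so the normalized chroma holds 0.8 and 0.6 at those two pitch classes.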
Alignment
Transpose the chroma vectors within each mode to a reference (but unknown) tonic.
Criterion: maximize the overall correlation.
Notation used in the criterion: the inner product; the norm; the transposition of a chroma vector, obtained by circularly shifting its items j positions to the left; the i-th aligned vector; and q, the average vector.
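One way to realize such an alignment is a greedy two-pass procedure, sketched below. The exact update rule is not recoverable from this transcript, so everything here is an assumption: the average vector q is seeded with the first training vector, each new vector is shifted to maximize its normalized inner product with q, q is updated as a running average, and a second pass re-aligns all vectors to the first-pass average. All function names are illustrative.

```python
import math

def shift_left(vec, j):
    """Circularly shift the items of vec j positions to the left."""
    return vec[j:] + vec[:j]

def cosine(x, y):
    """Inner product divided by the product of the norms."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)

def best_shift(p, q):
    """Return the circular shift of p that correlates best with q."""
    return max((shift_left(p, j) for j in range(12)),
               key=lambda v: cosine(v, q))

def align(chromas):
    """Greedy alignment of same-mode chroma vectors to a common tonic."""
    # Pass 1: running average, seeded with the first vector.
    q = [float(x) for x in chromas[0]]
    for n, p in enumerate(chromas[1:], start=2):
        b = best_shift(p, q)
        q = [(qi * (n - 1) + bi) / n for qi, bi in zip(q, b)]
    # Pass 2: re-align every vector to the pass-1 average, then re-average.
    aligned = [best_shift(p, q) for p in chromas]
    q = [sum(col) / len(aligned) for col in zip(*aligned)]
    return aligned, q

# Toy data: the same pattern transposed to four different tonics.
base = [3, 0, 1, 0, 2, 1, 0, 2, 0, 1, 0, 1]
chromas = [base, shift_left(base, 3), shift_left(base, 7), shift_left(base, 10)]
aligned, q = align(chromas)
```

The reference tonic stays unknown: it is whatever tonic the seed vector happens to have, which is all the mode classifiers need.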