AUDIO TONALITY MODE CLASSIFICATION WITHOUT TONIC ANNOTATIONS
Zhiyao Duan (1,2), Lie Lu (1), and Changshui Zhang (2)
1. Microsoft Research Asia (MSRA), China.
2. Department of Automation, Tsinghua University, China.
  • Summary
  • Tonality mode classification for popular songs,
    where only the mode is labeled in the training data.
  • Traditional key finding algorithms often rely on
    tonic annotations of the training songs.
  • Keys of popular songs are hard to obtain.
  • It is easier to label the mode of a song than its key.
  • Mode is more important than tonic.
  • An alignment approach to transpose chroma
    features to a reference (but unknown) tonic.
  • Three methods for mode learning
  • Single Profile Correlation (SPC)
  • Multiple Profile Correlation (MPC)
  • Support Vector Machine (SVM)
  • Key: C-major, a-minor, Eb-major, etc.
  • Mode: major/minor.
  • Tonic: C, C#, D, etc.
  1. After N updates, [...] is used to initialize
    again, and Step 2 is performed once more.
  2. The calculated average vector remains stable when
    the order of the training chroma vectors is
    randomly changed.
  • The class separation in the training set is small,
    i.e., the training samples of the major and minor
    modes are close to each other in the feature space.
    Therefore, it is hard for the SVM to find a good
    classification surface between the two modes.
  • The decisive (shifted) chroma vector among the 12
    shifted vectors is the one furthest from the
    classification surface. This makes the
    distribution of the decisive test vectors
    different from that of the training vectors in
    Methods (a) and (b).
  • For Method (c), this alignment, together with the
    inner-class alignment, can be seen as analogous to
    minimizing the intra-class distance while
    maximizing the inter-class distance.
  • Learning and Classification
  • Single Profile Correlation (SPC)
  • In training, each mode is represented by one
    chroma profile, using a 12-d or 7-d feature.
  • Each element of the 7-d profile corresponds to
    a diatonic note of the 12-d profile.
  • In testing, circularly shift the chroma vector
    of an excerpt 12 times.
  • Correlate against the major/minor profiles. The
    most highly correlated one indicates the mode.
  • Majority voting over excerpts gives the song mode.
  • Multiple Profile Correlation (MPC)
  • In training, each mode is represented by K
    profiles (12-d or 7-d), using a K-component
    Gaussian Mixture Model.
  • In testing, circularly shift the chroma vector of
    an excerpt 12 times to generate 12 vectors.
  • Correlate the shifted vectors with the major/minor
    profiles (Eq. (6)). The maximum or the
    weighted summation of the correlations defines
    the confidence score. The highest confidence
    score indicates the mode.
  • Majority voting over excerpts gives the song mode.
  • Support Vector Machine (SVM)
  • In training, train an SVM on the training chroma
    vectors.
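The SPC test procedure described above (shift the excerpt chroma 12 times, correlate with the major and minor profiles, then vote across excerpts) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two toy profiles, the function names, and the use of Pearson correlation as the correlation measure are all assumptions made here, since the slides learn the profiles from training data and do not give the formula.

```python
from collections import Counter

import numpy as np

# Hypothetical toy profiles (tonic at index 0); the slides learn these from data.
TOY_PROFILES = {
    "major": np.array([3, 0, 1, 0, 2, 1, 0, 2, 0, 1, 0, 1], dtype=float),
    "minor": np.array([3, 0, 1, 2, 0, 1, 0, 2, 1, 0, 1, 0], dtype=float),
}


def correlate(x, y):
    """Pearson correlation between two equal-length vectors (assumed measure)."""
    x = x - x.mean()
    y = y - y.mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def classify_excerpt(chroma, profiles=TOY_PROFILES):
    """Return the mode whose profile best matches any of the 12 circular shifts."""
    chroma = np.asarray(chroma, dtype=float)
    best_mode, best_score = None, -np.inf
    for shift in range(12):
        shifted = np.roll(chroma, -shift)  # circular shift to the left
        for mode, profile in profiles.items():
            score = correlate(shifted, profile)
            if score > best_score:
                best_mode, best_score = mode, score
    return best_mode


def classify_song(excerpt_chromas, profiles=TOY_PROFILES):
    """Majority vote over the per-excerpt decisions."""
    votes = [classify_excerpt(c, profiles) for c in excerpt_chromas]
    return Counter(votes).most_common(1)[0][0]


# Usage: three excerpts of one song, two of which look minor.
excerpts = [
    np.roll(TOY_PROFILES["minor"], 9),
    np.roll(TOY_PROFILES["minor"], 9),
    np.roll(TOY_PROFILES["major"], 2),
]
song_mode = classify_song(excerpts)
```

Note that the profiles must weight scale degrees unequally: a plain binary diatonic template for major is a circular shift of the natural-minor template, so the 12-shift search could not tell the two modes apart.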
  • Experiments
  • Materials
  • 4,528 (2,786 major and 1,742 minor) songs.
  • Various genres including rock, electronica,
    folk, country, jazz, etc.
  • Songs having ambiguous modes or major-minor
    modulations were discarded.
  • Training set: 25%; test set: 75%.
  • Results

Algorithm Flow
  • Feature Extraction and Alignment
  • Chroma feature extraction
  • Divide a song into excerpts (15s, 30s, whole).
  • In each frame (130 ms with a 10 ms shift) of an
    excerpt, a 48-bin constant-Q transform (CQT) is
    calculated over the frequency range from 130 Hz
    (C3) to 1975 Hz (B6).
  • For each excerpt, a 12-d chroma vector is
    calculated from the average CQT vector.
  • Each chroma vector is normalized.
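The folding step above can be sketched in a few lines. The range C3 to B6 spans four octaves, so 48 bins corresponds to one bin per semitone; the sketch assumes that layout, assumes the chroma is obtained by summing the same pitch class across octaves, and assumes L2 normalization (the slides do not say which norm is used). The function name is hypothetical, and the CQT itself is taken as given.

```python
import numpy as np


def fold_cqt_to_chroma(avg_cqt, bins_per_octave=12):
    """Fold an averaged CQT magnitude vector into a unit-norm 12-d chroma vector."""
    avg_cqt = np.asarray(avg_cqt, dtype=float)
    if avg_cqt.size % bins_per_octave != 0:
        raise ValueError("CQT length must cover a whole number of octaves")
    # Rows are octaves (C3..B3, C4..B4, ...); columns are pitch classes C..B.
    octaves = avg_cqt.reshape(-1, bins_per_octave)
    chroma = octaves.sum(axis=0)
    norm = np.linalg.norm(chroma)  # L2 norm: an assumption of this sketch
    return chroma / norm if norm > 0 else chroma


# Usage: average the per-frame CQT magnitudes of an excerpt, then fold.
rng = np.random.default_rng(0)
frame_cqts = np.abs(rng.normal(size=(200, 48)))  # stand-in for real CQT frames
chroma = fold_cqt_to_chroma(frame_cqts.mean(axis=0))
```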
  • Alignment
  • Goal: transpose the chroma vectors within each mode
    to a common reference (but unknown) tonic.
  • Criterion: maximize the overall correlation.
  • Notation used in the criterion: the inner product;
    the norm; the transposition of a chroma vector by
    circularly shifting its items j positions to the
    left; the i-th aligned vector; and q, the average
    vector.
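One way to read the alignment step is as a greedy procedure: each chroma vector is circularly shifted to the position that maximizes its inner product with the running sum of the vectors aligned so far, and the whole pass is repeated once, seeded with the average from the first pass. The sketch below follows that reading. It is an interpretation under stated assumptions, not the authors' exact update rule, which did not survive in the slides; the function names and the seeding scheme are hypothetical.

```python
import numpy as np


def best_shift(x, q):
    """Left shift j in 0..11 that maximizes the inner product of shifted x with q."""
    scores = [np.roll(x, -j) @ q for j in range(12)]
    return int(np.argmax(scores))


def align_pass(vectors, seed):
    """Align every vector against a running sum that starts from `seed`."""
    total = np.array(seed, dtype=float)  # copy, so the caller's seed is untouched
    shifts, aligned = [], []
    for x in vectors:
        j = best_shift(x, total)  # argmax is unaffected by the scale of `total`
        x_aligned = np.roll(x, -j)
        total += x_aligned
        shifts.append(j)
        aligned.append(x_aligned)
    return shifts, np.array(aligned)


def align_chroma(vectors, n_passes=2):
    """Return per-vector shifts, aligned vectors, and their average vector."""
    vectors = np.asarray(vectors, dtype=float)
    seed = vectors[0]  # assumed initialization: the first training vector
    for _ in range(n_passes):
        shifts, aligned = align_pass(vectors, seed)
        seed = aligned.mean(axis=0)  # average vector, reused as the next seed
    return shifts, aligned, seed


# Usage: the same toy pattern transposed to five different tonics.
pattern = np.array([3, 0, 1, 0, 2, 1, 0, 2, 0, 1, 0, 1], dtype=float)
transpositions = [0, 5, 7, 2, 11]
songs = [np.roll(pattern, k) for k in transpositions]
shifts, aligned, average = align_chroma(songs)
```

Because only relative shifts are estimated, the aligned vectors share a tonic without that tonic ever being identified, which matches the "reference (but unknown) tonic" wording.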
  • Future Work
  • How can a key-independent feature be designed
    for mode classification?
  • How can temporal information be exploited to
    improve mode model building?