Musical Genre Classification

1
Musical Genre Classification

Prepared by Elliot Sinyor
for MUMT 611
March 3, 2005

2
Table of Contents

What is Genre?
Approaches to Genre Classification
Manual
Automatic
Related Work
Soltau 1998
Tzanetakis & Cook
prescriptive approach
Pachet et al. 2001
emergent approach
Conclusion

3
What is Genre?

A way of describing what an item shares with
other items as well as what differentiates it
from other items
From Aucouturier and Pachet:
"The genesis of genre is therefore to be found in
our natural and irrepressible tendency to
classify"

4
What is Genre?

Aucouturier and Pachet (AP) separate genre into two broad categories:
Intentional vs. Extensional

5
What is Genre? - Intentional

More subjective
Relies on collective cultural knowledge
Social/Historical context
E.g. 60s, hippies, Brit-pop

6
Problems with Genre

What do the names mean?
Rock? Pop?
No fixed semantics
Amazon.com: genres by
Period (60s pop)
Topic (love song)
Country of Origin (Japanese music)
Genre is based on extrinsic habits rather than
intrinsic properties
To a French person, C. Aznavour is Variety
To an English person, C. Aznavour is French

7
What is Genre? - Extensional

Analysis-based
Describes the music itself
Tempo, timbre, pitch, language, etc.
(sometimes) easier for automatic genre
classification systems
E.g. fast rock, mellow classical

8
Problems with Genre

What granularity to use?
By Artist?
Please Please Me vs. Sgt. Pepper
By Album?
Revolution 9 vs. Helter Skelter vs. Mother
Nature's Son
Does work for broad categories
Rock vs. Classical

9
Problems with Genre

Does anyone agree?
Allmusic.com: 531 genres
Amazon.com: 719 genres
Mp3.com: 430 genres
Only 70 words common to the three taxonomies
(Pachet and Cazaly 2000)

10
Approaches to Genre Classification

Manual
Musicologists and Elbow Grease
Automatic
Prescriptive
Signal Analysis based
Emergent
Uses existing human-entered meta-data to group
things together

11
Manual Classification

Dannenberg et al. 2001
To build a taxonomy for MSN Music Search Engine
Few hundred thousand songs
Hired full-time musicologists
Took 30 human years
The details of the taxonomy and the design
methodology are, however, not available

12
Manual Classification

Pachet and Cazaly 2001 (CUIDADO)
Separated descriptors: country, instrumentation,
artist type, etc.
_____ Rock
Too sensitive to musical evolution, difficult to
build, difficult to maintain
Changed focus to artists instead of titles.
In any case, insufficient for millions of titles

13
Prescriptive: History

Originated from Speech Recognition work
Most systems classified audio from TV into
music/speech/environmental sound

14
Prescriptive: Various Approaches

Saunders 1996
Thresholding/ZCR techniques
Scheirer and Slaney 1997
Multiple features and statistical pattern
recognition
Kimber and Wilcox 1996
MFCCs and HMM to classify into music, speech,
laughter and nonspeech
Zhang and Kuo 2001
Rule-based system for classifying audio from
movies and TV into
Non-music: pure speech, non-harmonic environmental sound
Music: harmonic environmental sound, pure music, song,
speech with music, environmental sound with music
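
The zero-crossing rate (ZCR) behind the thresholding approaches listed above can be sketched in a few lines of Python. This is only an illustration: the function names and the threshold value are assumptions made for this example and are not taken from Saunders 1996.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def looks_like_speech(frames, spread_threshold=0.1):
    """Toy rule with a hypothetical threshold: speech alternates voiced and
    unvoiced sounds, so its ZCR tends to vary more between frames."""
    rates = [zero_crossing_rate(f) for f in frames]
    return max(rates) - min(rates) > spread_threshold
```

A real system would tune the threshold on labelled audio and usually combine ZCR with other features.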

15
Prescriptive

Soltau et al. 1998, "Recognition of Music Types"
New approach: Explicit Time Modelling with
Neural Network (ETM-NN)

16
Prescriptive: Soltau et al. 1998

In a nutshell
Transform acoustic signal into sequence of
abstract sonic events
Look at statistical patterns derived from
sequences → combine into vectors that represent
temporal structure
3-layer feed-forward network

17
Prescriptive: Soltau et al. 1998

Experimental Results
3 hours of data (360 samples, 30 sec each)
Rock, Pop, Techno, Classical
67% training, 13% cross-validation, 20%
evaluation
Compare ETM-NN vs. HMM, using cepstral
coefficients
ETM-NN: 86.1%, HMM: 79.2%

18
Musical Genre Classification of Audio Signals
Tzanetakis and Cook, 2002

Timbral Texture Features
Spectral Centroid, Rolloff, Flux, ZCR, MFCC (5
coefficients)
Analysis Window: features should be stable within it (23
ms)
Texture Window: minimum amount of time needed to
identify a "texture" (43 analysis windows, about 1 sec)
Memory of the past
Statistics (means, variances) of features over
the texture window
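
A minimal sketch of three of the spectral features named above, computed on one analysis window. It uses a naive DFT to stay dependency-free. The function names and the 85% rolloff fraction are assumptions for this example; the slide does not state them.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for a short demo frame)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags):
    """Magnitude-weighted mean bin index (the spectrum's centre of gravity)."""
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total else 0.0


def spectral_rolloff(mags, fraction=0.85):
    """Lowest bin at which the cumulative magnitude reaches `fraction`."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k
    return len(mags) - 1


def spectral_flux(mags, prev_mags):
    """Squared difference between successive normalised spectra."""
    def normalise(v):
        s = sum(v)
        return [x / s for x in v] if s else list(v)
    a, b = normalise(mags), normalise(prev_mags)
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Results are in bin units; multiplying a bin index by (sampling rate / frame length) converts it to Hz. For real audio an FFT routine would replace the naive DFT.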

20
Timbral Texture Feature Vector

Statistics (means, variances) of features over
the texture window
19 dimensions
(mean, variance) of spectral centroid (SC), flux (SF),
rolloff (SR), ZCR and 5 MFCCs: 18 values
Low-energy feature: fraction of analysis windows
over the texture window that have less than average
RMS energy
E.g. vocal music will have more silences
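
The 19-dimensional vector described above can be assembled roughly as follows. This is a sketch under assumptions: the per-window feature values are taken as already computed, and all function names are made up for the example.

```python
import math


def mean_var(values):
    """Mean and population variance of one feature over a texture window."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


def rms(frame):
    """Root-mean-square energy of one analysis window."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def low_energy(frames):
    """Fraction of analysis windows whose RMS is below the window average."""
    energies = [rms(f) for f in frames]
    avg = sum(energies) / len(energies)
    return sum(1 for e in energies if e < avg) / len(energies)


def texture_vector(feature_tracks, frames):
    """Concatenate (mean, variance) per feature track, then low energy.

    `feature_tracks` holds one list per feature, with one value per
    analysis window; 9 tracks give 2 * 9 + 1 = 19 numbers.
    """
    vec = []
    for track in feature_tracks:
        vec.extend(mean_var(track))
    vec.append(low_energy(frames))
    return vec
```

With the nine features on the slide (SC, SF, SR, ZCR and 5 MFCCs) the mean/variance pairs contribute 18 values and the low-energy feature supplies the 19th.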

21
Rhythmic Content: Beat Histogram

Like pitch detection, but with larger periods
Use DWT to divide signal into frequency bands
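
The idea of treating beat detection as pitch detection over longer periods can be sketched with a plain autocorrelation. This simplified example skips the DWT band-splitting step and works on a single amplitude envelope. The function names and the 40 to 200 bpm search range are assumptions for illustration.

```python
def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def strongest_beat_bpm(envelope, envelope_rate, min_bpm=40, max_bpm=200):
    """Lag of the highest autocorrelation value, expressed in bpm.

    `envelope` is an amplitude envelope sampled at `envelope_rate` Hz.
    The bpm search range is an assumption chosen for this example.
    """
    min_lag = int(envelope_rate * 60 / max_bpm)
    max_lag = int(envelope_rate * 60 / min_bpm)
    ac = autocorrelation(envelope, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: ac[lag])
    return 60.0 * envelope_rate / best_lag


# A pulse every 0.5 s in a 10 s envelope sampled at 100 Hz.
pulses = [1.0 if t % 50 == 0 else 0.0 for t in range(1000)]
print(strongest_beat_bpm(pulses, 100))  # 120.0
```

A beat histogram would accumulate several such peaks per texture window across the whole piece instead of keeping only the strongest one.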

22
Rhythmic Content: Beat Histogram

23
Features taken from the Beat Histogram (BH)

A0, A1: relative amplitude (divided by the sum of
amplitudes) of the first and second histogram
peaks
RA: ratio of the amplitude of the second peak
to the amplitude of the first peak
P1, P2: period of the first and second peaks in bpm
SUM: overall sum of the histogram (indication of
beat strength)
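
Given a beat histogram, the six features above reduce to a few arithmetic steps. The sketch below assumes the histogram is a dictionary mapping bpm to amplitude and, as a simplification, treats the two highest bins as the first and second peaks. A fuller implementation would pick local maxima. The function name is hypothetical.

```python
def beat_histogram_features(hist):
    """Six features from a beat histogram given as {bpm: amplitude}."""
    total = sum(hist.values())
    # Simplification: the two highest bins stand in for the two peaks.
    (p1, a_first), (p2, a_second) = sorted(
        hist.items(), key=lambda kv: kv[1], reverse=True)[:2]
    return {
        "A0": a_first / total,      # relative amplitude of first peak
        "A1": a_second / total,     # relative amplitude of second peak
        "RA": a_second / a_first,   # second peak relative to first
        "P1": p1,                   # period of first peak in bpm
        "P2": p2,                   # period of second peak in bpm
        "SUM": total,               # overall beat strength
    }
```

For a toy histogram `{60: 3.0, 120: 5.0, 90: 2.0}` the first peak is at 120 bpm and the second at 60 bpm.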

24
Pitch Content Features

Used enhanced autocorrelation function to create
folded (1 octave) and unfolded (all notes) pitch
histograms
Mapped to MIDI note numbers
Folded: common pitch classes
Unfolded: pitch range
Higher for jazz, classical
FA0, UP0, UP1, IPO1 (interval between 2 highest
peaks), SUM
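
Once pitches are mapped to MIDI note numbers, the two histograms differ only in whether the octave is kept. The sketch below assumes pitch detection has already produced a list of MIDI notes. The function names are hypothetical, and computing the interval on the folded histogram is an assumption, since the slide does not say which histogram it uses.

```python
from collections import Counter


def pitch_histograms(midi_notes):
    """Return (unfolded, folded) histograms from detected MIDI note numbers."""
    unfolded = Counter(midi_notes)                 # one bin per note
    folded = Counter(n % 12 for n in midi_notes)   # one bin per pitch class
    return unfolded, folded


def interval_between_top_peaks(folded):
    """Distance in semitones between the two most frequent pitch classes."""
    (c1, _), (c2, _) = folded.most_common(2)
    return abs(c1 - c2)
```

For the notes `[60, 72, 48, 67, 67, 64]`, the three Cs in different octaves land in separate unfolded bins but share folded bin 0.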

25
Experimental Results

Used GMM classifiers with diagonal covariance
matrices
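
A rough sketch of how classification with diagonal-covariance GMMs works once the models exist. The mixture parameters here are hand-written toy values; in practice they would be fitted to training feature vectors, typically with EM. All names are assumptions for this example.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, components):
    """Log-likelihood of x under a mixture given as (weight, mean, var)."""
    logs = [math.log(w) + diag_gaussian_logpdf(x, m, v)
            for w, m, v in components]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))


def classify(x, genre_models):
    """Pick the genre whose mixture gives x the highest likelihood."""
    return max(genre_models, key=lambda g: gmm_loglik(x, genre_models[g]))


# Toy two-dimensional models (hypothetical parameters).
models = {
    "classical": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "rock": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [8.0, 8.0], [1.0, 1.0])],
}
print(classify([7.5, 8.2], models))  # rock
```

A diagonal covariance stores one variance per feature, so each component needs far fewer parameters than a full covariance matrix.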

26
Experimental Results

27
Prescriptive: Some Results (from AP)

Gaussian and Gaussian Mixture Models: 48%
successful classification in Ermolinskiy et
al. (2001), using 100 songs for each class in the
training phase. This result has to be taken with
care since the system uses only pitch
information.
Tzanetakis et al. (2001) achieves a rather
disappointing 57%, but also reports 75% in
Tzanetakis and Cook (2000a) using 50 songs per
class.
90% in Lambrou and Sandler (1998) and 75% in
Deshpande et al. (2001) on a very small training
and test set, which may not be representative.
Pye (2000) reports 90% on a total set of 175
songs.
Soltau (1998) reports 80% with HMM, 86% with NN,
with a database of 360 songs.

28
Emergent

Unlike Prescriptive, it is unsupervised
Based on cultural similarity from text
documents
Possible to extract similarities that are not
possible to extract from the audio signal

29
Emergent: Collaborative Filtering

Shardanand & Maes 1995, Pestoni et al. 2001
There are patterns in tastes
Have users rate their music, match like-tasted
users, recommend unknown items to users
Problems
Good for naïve profiles, bad for broad, eclectic
tastes
Favors middle-of-the-road items liked by a large
proportion of users
Only works some time after release of new music

30
Emergent: Co-occurrence Analysis

Pachet et al. 2001
Looks at online text sources for co-occurrences
of songs (aka data mining)
If 2 items appear in the same context (or share a
common neighbour), this is evidence of some sort
of similarity

31
Co-occurrence

Pachet et al. 2001, "Musical Data Mining for
Electronic Music Distribution"
Sources used
Track listing databases (CDDB)
Mostly look at compilations of similar artists
Radio Show playlists
Specialty programs better than daily commercial
radio
Lists made by experts

32
Co-occurrence

Build a matrix where
Value of entry (i, j) corresponds to number of
times title i co-occurs with title j
What about indirect co-occurrence?
E.g. Eleanor Rigby/Good Vibrations, Good
Vibrations/God Only Knows → Eleanor Rigby/God
Only Knows
Correlation measure, using co-variance matrices
of each title
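
The difference between direct co-occurrence and a correlation measure can be shown with the three titles from the example. This is a sketch under assumptions: the playlists are invented, and correlating the rows of the co-occurrence matrix (Pearson) is one plausible reading of the slide, not necessarily the exact measure in Pachet et al. 2001.

```python
from itertools import combinations
import math


def cooccurrence_matrix(playlists, titles):
    """Entry [i][j] counts playlists in which titles i and j both appear."""
    index = {t: k for k, t in enumerate(titles)}
    m = [[0] * len(titles) for _ in titles]
    for playlist in playlists:
        for a, b in combinations(sorted(set(playlist)), 2):
            i, j = index[a], index[b]
            m[i][j] += 1
            m[j][i] += 1
    return m


def row_correlation(m, i, j):
    """Pearson correlation between the co-occurrence profiles of i and j."""
    n = len(m)
    mi, mj = sum(m[i]) / n, sum(m[j]) / n
    cov = sum((m[i][k] - mi) * (m[j][k] - mj) for k in range(n))
    var_i = sum((v - mi) ** 2 for v in m[i])
    var_j = sum((v - mj) ** 2 for v in m[j])
    if var_i == 0 or var_j == 0:
        return 0.0
    return cov / math.sqrt(var_i * var_j)


titles = ["Eleanor Rigby", "Good Vibrations", "God Only Knows"]
playlists = [  # hypothetical sources
    ["Eleanor Rigby", "Good Vibrations"],
    ["Good Vibrations", "God Only Knows"],
]
m = cooccurrence_matrix(playlists, titles)
```

Here `m[0][2]` is 0 because Eleanor Rigby and God Only Knows never share a playlist, yet their rows are identical (both co-occur only with Good Vibrations), so `row_correlation(m, 0, 2)` comes out at essentially 1.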

33
Experimental Results

Using distance functions, use Ascendant
Hierarchical Clustering
Used CDDB database, compared co-occurrence vs
correlation
Manually examined results
70% of clusters had interesting similarities
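
Ascendant (agglomerative) hierarchical clustering starts with one cluster per title and repeatedly merges the two closest clusters. A minimal sketch follows. Average linkage and the function name are assumptions, since the slides do not name the linkage used.

```python
def agglomerative(distance, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    `distance[i][j]` is a symmetric distance matrix. Clusters are
    compared with average linkage (an assumption for this example).
    """
    clusters = [[i] for i in range(len(distance))]

    def linkage(a, b):
        return sum(distance[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        _, p, q = min(pairs)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Toy distances: items 0 and 1 are close, as are items 2 and 3.
d = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(agglomerative(d, 2))  # [[0, 1], [2, 3]]
```

Recording the order of merges instead of stopping at a fixed count yields the full hierarchy.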

34
Experimental Results

35
Challenges

Name format is not strictly enforced
E.g. "The Beatles", "Beatles, The", "Beatles"
Difficult to characterize the nature of the
similarities
Cover songs can sound nothing alike

36
Conclusions and Future directions

It seems that samples of Techno and Classical
are easy to discriminate, while Rock and Pop
are more difficult (Soltau et al. 1998)
Manual classification not feasible
Why not combine prescriptive/emergent techniques?
