Title: Semantic Audio
Semantic Audio
- Studio Tools and Techniques using MPEG-7
- Dr. Michael Casey
- Centre for Computational Creativity
- Department of Computing
- City University, London
Overview
- MPEG-7 Tools
- Low-Level Audio Descriptors
- Statistical Sound Models (Semantic?)
- Music Unmixing
- Independent Spectrogram Separation
- Sound Classification
- Automatic label extraction
- Semantic processing
- Segment Similarity, Structure Extraction, Musaics
- S-Matrix (Self-Similarity Matrix)
- C-Matrix (Cross-Similarity Matrix)
- Segment Replacement
- Musaics
Semantic Audio Analysis
(Figure: Acoustic Features, Extraction, Semantic Audio Description)
MPEG-7 Audio Descriptors
- Header
- Segments
- Descriptor
Some Useful Descriptors for Music Processing
- AudioSpectrumEnvelopeD
- AudioSpectrumBasisD
- AudioSpectrumProjectionD
- SoundModelDS
- SoundModelStatePathD
- SoundModelStateHistogramD
EXAMPLE 1: MUSIC UNMIXING
AudioSpectrumBasisD
(Figure: AudioSpectrumBasisD, SVD / ICA Basis Rotation, AudioSpectrumProjectionD)
AudioSpectrumProjectionD
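A rough sketch of how the basis and projection pair can be computed. It follows our reading of the usual MPEG-7 recipe: convert the envelope to dB, normalise each frame to unit L2 norm, take an SVD, keep the first k right singular vectors as the basis, and project the frames onto them. The optional ICA rotation of the basis shown on the slides is omitted. The function name is hypothetical.

```python
import numpy as np

def spectrum_basis_and_projection(env, k):
    """Sketch of an AudioSpectrumBasisD / AudioSpectrumProjectionD-style reduction.

    env: (n_frames, n_bands) power envelope.
    Returns (V, P, norms): V is the (n_bands, k) spectral basis, P the
    (n_frames, k) projection (time functions), norms each frame's dB-spectrum norm.
    """
    X = 10.0 * np.log10(env + 1e-12)               # power -> dB
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)              # unit-norm spectral frames
    # Rows of Vt are orthonormal spectral basis functions, strongest first.
    _, _, Vt = np.linalg.svd(Xn, full_matrices=False)
    V = Vt[:k].T                                   # keep the k leading basis vectors
    P = Xn @ V                                     # project frames onto the basis
    return V, P, norms.ravel()
```

The basis columns are orthonormal. When all components are kept, P @ V.T reproduces the normalised dB spectrogram exactly.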
Outer Product Spectrum Reconstruction
- Individual basis component
4-Component Reconstruction
10-Component Reconstruction
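Outer-product reconstruction treats each component as one spectrogram layer: the outer product of a time function p_i with a spectral basis vector v_i. A k-component reconstruction is the sum of the first k layers. Here is a minimal sketch, assuming P holds the time functions as columns and V holds the basis vectors as columns (the name is hypothetical).

```python
import numpy as np

def reconstruct(P, V, k):
    """Rank-k spectrogram approximation: sum over i < k of outer(P[:, i], V[:, i])."""
    X_hat = np.zeros((P.shape[0], V.shape[0]))
    for i in range(k):
        # One "spectrogram layer": a time function times a spectral basis vector.
        X_hat += np.outer(P[:, i], V[:, i])
    return X_hat
```

The loop is equivalent to P[:, :k] @ V[:, :k].T. With an SVD basis, adding components (4, then 10, ...) can only reduce the reconstruction error, which is what the 4- and 10-component slides illustrate.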
Music Unmixing
- Linear basis projection using SVD and ICA
- Spectrum subspace separation
- Fast computation of subspace ICA
- Full-rate filterbank masking
- Blocked ICA functions
- Subspace reconstruction $Y_j = X V_j V_j^T$
- Cluster subspaces to identify tracks
- Sum masked filterbank output to create audio
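The unmixing steps above can be sketched as follows. The sketch assumes a non-negative magnitude spectrogram X (frames by bins) and a basis V with orthonormal columns, such as the SVD basis of X; the ICA rotation is omitted. Each track j is a given group of component indices, and Y_j = X V_j V_j^T projects the mixture onto that track's subspace. The soft-mask rule and the function name are assumptions of this sketch. Clustering components into tracks and the full-rate filterbank resynthesis are not shown.

```python
import numpy as np

def unmix(X, V, groups):
    """Split a magnitude spectrogram into per-track layers by subspace projection.

    X: (n_frames, n_bins) non-negative mixture spectrogram.
    V: (n_bins, n_comp) spectral basis with orthonormal columns.
    groups: list of component-index lists, one list per output track.
    Returns (n_tracks, n_frames, n_bins) masked spectrograms.
    """
    # Y_j = X V_j V_j^T: projection of the mixture onto track j's subspace.
    layers = np.stack([np.abs(X @ V[:, g] @ V[:, g].T) for g in groups])
    # Soft masks (assumed rule): each track takes its share of the layer energy.
    masks = layers / np.maximum(layers.sum(axis=0, keepdims=True), 1e-12)
    # Masking the mixture makes the tracks sum back to X wherever any layer is non-zero.
    return masks * X
```

Masking the mixture, rather than using the projections directly, keeps every track non-negative and makes the tracks add back up to the original spectrogram.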
Independent Spectrogram Subspace Layers
(Figure panels: Time Function, Spectral Basis, Spectrogram Layer (1 Component), Mixture Spectrogram, 4 Components, 10 Components)
Music Unmixing Example (Pink Floyd, mono, 9 subspace tracks)
EXAMPLE 2: AUTOMATIC AUDIO CLASSIFICATION
SoundModelDS and Related Descriptors
(Figure: AudioSpectrumEnvelopeD, AudioSpectrumBasisD, AudioSpectrumProjectionD, ContinuousHiddenMarkovModelDS with transitions T(i,j), SoundModelStatePathD, e.g. state sequence 1 3 3 2 2 3 4 4 4 4 ...)
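A state path such as 1 3 3 2 2 3 4 4 4 4 ... is the most likely HMM state sequence for a sequence of projected features. The sketch below uses the Viterbi algorithm to compute one. It assumes diagonal-Gaussian state densities, with A[i, j] playing the role of T(i, j). States are 0-based here, whereas the slide counts from 1. The function names are hypothetical.

```python
import numpy as np

def diag_gauss_loglik(F, means, variances):
    """log N(F[t] | means[s], diag(variances[s])) for every frame t and state s -> (T, S)."""
    diff = F[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)

def viterbi(log_b, log_pi, log_A):
    """Most likely state path given log emissions (T, S), log initial probs (S,)
    and log transitions (S, S) with A[i, j] = P(next state j | current state i)."""
    T, S = log_b.shape
    delta = log_pi + log_b[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: best path in i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):           # follow the back-pointers to the start
        path[t - 1] = back[t, path[t]]
    return path
```

Working in log probabilities avoids the numerical underflow that products of many small probabilities would cause on long recordings.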
Sound Recognition using HMMs
(Figure: Sound Database, Trained HMMs)
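Recognition with a set of trained HMMs can be sketched as maximum-likelihood classification. Each class model scores the query features with the forward algorithm, and the best-scoring class wins. The model dictionary layout, the diagonal-Gaussian densities and the function names are assumptions of this sketch, not the MPEG-7 reference software.

```python
import numpy as np

def diag_gauss_loglik(F, means, variances):
    """Per-frame, per-state diagonal-Gaussian log-likelihoods, shape (T, S)."""
    diff = F[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)

def log_forward(log_b, log_pi, log_A):
    """log p(features | HMM) by the forward algorithm, in the log domain."""
    alpha = log_pi + log_b[0]
    for t in range(1, len(log_b)):
        m = alpha.max()                      # log-sum-exp over the previous state
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_A)) + log_b[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def classify(F, models):
    """Return (best class name, per-class log-likelihoods) for features F of shape (T, d)."""
    scores = {name: log_forward(diag_gauss_loglik(F, hmm["means"], hmm["vars"]),
                                hmm["log_pi"], hmm["log_A"])
              for name, hmm in models.items()}
    return max(scores, key=scores.get), scores
```

The Viterbi path score could replace the forward likelihood as a cheaper approximation. The forward version sums over all state paths instead of keeping only the best one.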
MPEG-7 Intelligent Music Browsing
Music Genre Classification
Class Name     Number of Files   Number of Segments
1) Blues                    79                   86
2) Hip-hop                  15                  129
3) Gospel                   23                   25
4) Country                  27                   28
5) DrumNBass                26                  275
6) Classical                 8                  156
7) 2Step                    39                  311
8) Merengue                 34                  304
9) Reggae                   80                  398
10) Salsa                   39                  425
---------------------------------------------------
Totals                     370                 2137
Semantic Audio: General Sound Taxonomy
DS General Audio Classification
EXAMPLE 3: STRUCTURE EXTRACTION
Structure Discovery
(Figure: Acoustic Features, State-Space Models, Hierarchical Structure Discovery)
SoundModelStatePathD
- A simplified representation of spectral dynamics
(Figure: state path)
SoundModelStateHistogramD
(Figure: state index vs. 0.01 s frames; state histogram, state index vs. seconds)
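The move from a 0.01 s state path to a per-second state histogram can be sketched as counting state occurrences in fixed windows of the path and normalising each count vector. With 10 ms frames, win=100 gives one histogram per second. The window and hop choices and the function name are assumptions of this sketch.

```python
import numpy as np

def state_histograms(path, n_states, win, hop=None):
    """Normalised state-occupancy histograms over windows of a state path.

    path: 1-D sequence of integer state indices, one per frame.
    win, hop: window length and hop in frames (hop defaults to win).
    Returns (n_windows, n_states); each row sums to 1.
    """
    path = np.asarray(path, dtype=int)
    hop = hop or win
    starts = range(0, len(path) - win + 1, hop)
    counts = [np.bincount(path[s:s + win], minlength=n_states) for s in starts]
    return np.array(counts, dtype=float).reshape(-1, n_states) / win
```

For example, the path 0 0 1 1 1 2 with three states and win=3 yields the two histograms [2/3, 1/3, 0] and [0, 2/3, 1/3].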
High-Level Structure Discovery
S-Matrix
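An S-matrix compares every window of a song with every other window. Below is a minimal sketch that uses cosine similarity between state histograms; the choice of similarity measure is an assumption here, and the function name is hypothetical. Repeated sections appear as bright off-diagonal stripes and blocks.

```python
import numpy as np

def s_matrix(H):
    """Self-similarity matrix: S[i, j] = cosine similarity of histogram rows i and j."""
    H = np.asarray(H, dtype=float)
    Hn = H / np.maximum(np.linalg.norm(H, axis=1, keepdims=True), 1e-12)
    return Hn @ Hn.T
```

Because the rows are normalised, the diagonal is 1. The matrix is symmetric, so only one triangle carries information.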
STRUCTURE EXTRACTION: SEGMENTATION
Structure Discovery
(Figure: low-level features (Acoustic Features, State-Space Models) to high-level structure (Hierarchical Structure Discovery))
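One way to go from state histograms to high-level sections is to cluster the windows into k section types and then merge runs of equal labels into segments. The slides do not say which clustering is used. This sketch assumes plain k-means with a deterministic farthest-point initialisation, and its function name is hypothetical.

```python
import numpy as np

def segment_by_clustering(H, k, n_iter=50):
    """Label each histogram window with one of k section types, then merge runs.

    Returns a list of (start_window, end_window_exclusive, label) tuples.
    """
    H = np.asarray(H, dtype=float)
    # Deterministic farthest-point initialisation of the k centroids.
    centers = [H[0]]
    for _ in range(1, k):
        d = np.min([((H - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(H[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):                      # standard k-means iterations
        dist = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = H[labels == c].mean(axis=0)
    # Collapse runs of identical labels into contiguous segments.
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, int(labels[start])))
            start = i
    return segments
```

Section labels that recur, such as a chorus returning, get the same label. Segments of this kind are what the machine segmentations below are compared against human ones.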
High-Level Structure Discovery
(Figure: Alanis Morissette, machine segmentation vs. human segmentation)
(Figure: Cranberries, machine segmentation vs. human segmentation)
(Figure: Nirvana, machine segmentation vs. human segmentation)
EXAMPLE 4: MUSAICS
Musaics (Music Mosaics)
- C-Matrix: cross-song similarity matrix
- Outer product of target and source histograms
- Find segments similar to target segment
- Similarity between all target and database segments
- Sort columns of similarity matrix
- Replace segments with similar material
- Segmentation boundaries (beat alignment)
- Replace with best fit using DTW on most similar segments
- Examples
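The C-matrix step can be sketched as a matrix product of source and target histograms, followed by sorting each column so the best-matching source segments come first. Each histogram is normalised to unit length before the product. That normalisation and the function name are assumptions; the slides only say "outer product of target and source histograms".

```python
import numpy as np

def best_matches(target_H, source_H, n_best=3):
    """Rank database (source) segments for every target segment.

    C[i, j] = similarity of source segment i and target segment j.
    Returns (idx, C): idx[:, j] lists the n_best source indices for target j.
    """
    t = np.asarray(target_H, dtype=float)
    s = np.asarray(source_H, dtype=float)
    t = t / np.maximum(np.linalg.norm(t, axis=1, keepdims=True), 1e-12)
    s = s / np.maximum(np.linalg.norm(s, axis=1, keepdims=True), 1e-12)
    C = s @ t.T                                      # cross-similarity (C-matrix)
    order = np.argsort(-C, axis=0, kind="stable")    # sort each column, best first
    return order[:n_best], C
```

The top few candidates per target segment are then the ones worth the more expensive DTW fit.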
Musaics
(Figure: MPEG-7 Database, Extract, Segment, Match, Target, StatePathHistograms, Beats, Replace, Musaic)
Musaics
- New Content by Similarity Replacement
- C-Matrix: Cross-Song Similarity Map
- 1 Target, Many Sources
- Constraints
- Preserve Rhythm by Beat Tracking
- Preserve Beats by DTW alignment
- A Bigger Source Database Is Better
- Greater Number of Accurate Matches
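DTW alignment can keep the beats of a replacement segment in step with the target. The sketch below is a basic dynamic time warping between two feature sequences that returns the total cost and the warping path. It assumes Euclidean frame distances and the standard three-step recursion, and its function name is hypothetical. It is not tied to any particular MPEG-7 descriptor.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping between feature sequences a (n, d) and b (m, d).

    Returns (total cost, warping path as a list of (i, j) index pairs).
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of: diagonal match, step in a, step in b.
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to recover the alignment.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]
```

Aligning [0, 1, 2] with [0, 0, 1, 2] costs 0: the first target frame is held over two source frames, giving the path (0,0), (0,1), (1,2), (2,3).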
Acknowledgements
- International Organization for Standardization (ISO)
- ISO/IEC JTC 1/SC 29/WG 11 (MPEG)
- Mitsubishi Electric Research Labs
- Massachusetts Institute of Technology
- Music Mind Machine Group (formerly Machine Listening Group)
- Paris Smaragdis, Youngmoo Kim, Brian Whitman
- Iroro Orife, John Hershey, Alex Westner, Kevin Wilson
- City University
- Department of Computing
- Centre for Computational Creativity