Semantic Audio - PowerPoint PPT Presentation

Slides: 56
Provided by: Cas16
Tags: audio | floyd | pink | semantic


Transcript and Presenter's Notes



1
  • Semantic Audio
  • Studio Tools and Techniques
  • using MPEG-7
  • Dr. Michael Casey
  • Centre for Computational Creativity
  • Department of Computing
  • City University, London

2
Overview
  • MPEG-7 Tools
  • Low Level Audio Descriptors
  • Statistical Sound Models (Semantic ?)
  • Music Unmixing
  • Independent Spectrogram Separation
  • Sound Classification
  • Automatic label extraction
  • Semantic processing
  • Segment Similarity, Structure Extraction, Musaics
  • S-Matrix (Self-Similarity Matrix)
  • C-Matrix (Cross-Similarity Matrix)
  • Segment Replacement
  • Musaics

3
Semantic Audio Analysis
Acoustic Features
Extraction
Semantic Audio Description
4
MPEG-7 Audio Descriptors
Header
5
MPEG-7 Audio Descriptors
Segments
6
MPEG-7 Audio Descriptors
Descriptor
7
Some Useful Descriptors for Music Processing
  • AudioSpectrumEnvelopeD
  • AudioSpectrumBasisD
  • AudioSpectrumProjectionD
  • SoundModelDS
  • SoundModelStatePathD
  • SoundModelStateHistogramD

8
EXAMPLE 1: MUSIC UNMIXING
9
AudioSpectrumBasisD
10
AudioSpectrumBasisD
AudioSpectrumBasisD
SVD / ICA Basis Rotation
AudioSpectrumProjectionD
11
AudioSpectrumBasisD
12
AudioSpectrumProjectionD
AudioSpectrumBasisD
SVD / ICA Basis Rotation
AudioSpectrumProjectionD
13
AudioSpectrumProjectionD
14
Outer Product Spectrum Reconstruction
Individual Basis Component
15
4 Component Reconstruction
16
10 Component Reconstruction
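The outer-product reconstruction named on slides 14 to 16 can be sketched in a few lines. This is a minimal illustration, not the presenter's code: each layer is assumed to be the outer product of one time function with one spectral basis vector, and a k-component reconstruction is assumed to be the sum of the first k layers. The function names and the tiny input values are hypothetical.

```python
def outer(time_fn, spectral_basis):
    """One spectrogram layer: rows are time frames, columns are frequency bins."""
    return [[t * b for b in spectral_basis] for t in time_fn]


def reconstruct(time_fns, bases, k):
    """Sum the first k outer-product layers into one spectrogram."""
    frames, bins = len(time_fns[0]), len(bases[0])
    total = [[0.0] * bins for _ in range(frames)]
    for time_fn, basis in zip(time_fns[:k], bases[:k]):
        layer = outer(time_fn, basis)
        total = [[a + b for a, b in zip(row_t, row_l)]
                 for row_t, row_l in zip(total, layer)]
    return total


# Made-up example: 3 time frames, 2 frequency bins, 2 components.
time_fns = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
bases = [[1.0, 0.5], [0.0, 2.0]]
one_component = reconstruct(time_fns, bases, 1)
two_components = reconstruct(time_fns, bases, 2)
```

Adding components can only bring in more layers, so the 4-component and 10-component slides differ from the single-component slide only in how many terms of this sum are kept.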
17
Music Unmixing
  • Linear basis projection using SVD and ICA
  • spectrum subspace separation
  • fast computation of subspace ICA
  • full-rate filterbank masking
  • Blocked ICA functions
  • subspace reconstruction $Y_j = X V_j V_j^{+}$
  • cluster subspaces to identify tracks
  • sum masked filterbank output to create audio

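The subspace reconstruction step in the list above can be illustrated with a small sketch. It reads the reconstruction bullet as $Y_j = X V_j V_j^{+}$, which is an assumption about the slide's lost notation, and it further assumes an orthonormal basis so that the pseudo-inverse reduces to the transpose. ICA rotation, clustering, and filterbank masking are not shown. All names and values are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def subspace_reconstruction(X, V_j):
    """Y_j = X V_j V_j^T; equals X V_j V_j^+ only when V_j has orthonormal columns."""
    return matmul(matmul(X, V_j), transpose(V_j))


# Made-up spectrogram: 2 time frames (rows) by 2 frequency bins (columns).
X = [[1.0, 2.0], [3.0, 4.0]]
# Two orthonormal basis vectors, each stored as a one-column matrix.
V1 = [[0.6], [0.8]]
V2 = [[-0.8], [0.6]]

Y1 = subspace_reconstruction(X, V1)
Y2 = subspace_reconstruction(X, V2)
# With a complete orthonormal basis the subspace layers sum back to X.
total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(Y1, Y2)]
```

Because the two basis vectors span the whole space here, the layers add up to the mixture spectrogram; keeping only some of them gives a partial reconstruction of the kind the later slides cluster into tracks.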
18
Independent Spectrogram Subspace Layers
Time Function
Spectrogram Layer
1 Component
Spectral Basis
Mixture Spectrogram
4 Components
10 Components
19
Music Unmixing Example (Pink Floyd: mono - 9 subspace tracks)
20
EXAMPLE 2: AUTOMATIC AUDIO CLASSIFICATION
21
SoundModelDS and related descriptors
AudioSpectrumBasisD
ContinuousHiddenMarkovModelDS
SoundModelStatePathD
x
1 3 3 2 2 3 4 4 4 4 ...
T(i,j)
AudioSpectrumEnvelopeD
AudioSpectrumProjectionD
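The relationship between a state path such as 1 3 3 2 2 3 4 4 4 4 and a transition matrix T(i,j) can be sketched by counting transitions. This is only an illustration: in the slide the transition matrix belongs to the trained hidden Markov model, whereas the hypothetical `transition_matrix` helper below estimates one from a single path.

```python
def transition_matrix(path, num_states):
    """Row-normalised transition counts; states are numbered from 1."""
    counts = [[0] * num_states for _ in range(num_states)]
    for current, following in zip(path, path[1:]):
        counts[current - 1][following - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# The first ten state indices shown on the slide.
path = [1, 3, 3, 2, 2, 3, 4, 4, 4, 4]
T = transition_matrix(path, 4)
```

Row i of `T` holds the estimated probabilities of moving from state i to each state j, so every non-empty row sums to one.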
22
Sound Recognition using HMMs
Trained HMMs
Sound Database
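Recognition with a bank of trained HMMs amounts to scoring the input against each model and keeping the best score. The sketch below shows that idea under a loud simplification: it uses discrete observation symbols and the scaled forward algorithm, whereas the slides refer to continuous HMMs over spectral projection features. The model names and parameters are invented for illustration.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete-observation HMM."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical two-state models: (start, transition, emission) probabilities.
emit = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "steady": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], emit),
    "alternating": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], emit),
}
label_a = classify([0, 0, 0, 0, 1, 1, 1, 1], models)
label_b = classify([0, 1, 0, 1, 0, 1, 0, 1], models)
```

The per-step rescaling keeps the forward variables from underflowing on long sequences, which matters once the input is thousands of short frames.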
23
MPEG-7 Intelligent Music Browsing
24
Music Genre Classification
Class Name      Num of Files   Num Segments
 1) Blues             79             86
 2) hiphop            15            129
 3) Gospel            23             25
 4) Country           27             28
 5) DrumNBass         26            275
 6) Classical          8            156
 7) 2Step             39            311
 8) Merengue          34            304
 9) Reggae            80            398
10) Salsa             39            425
-------------------------------------------
Totals               370           2137
25
Music Genre Classification
26
Semantic Audio: General Sound Taxonomy
27
DS General Audio Classification
28
EXAMPLE 3: STRUCTURE EXTRACTION
29
Structure Discovery
Acoustic Features
State-Space Models
Hierarchical Structure Discovery
30
SoundModelStatePathD
A simplified representation of spectral dynamics
State Path
31
SoundModelStateHistogramD
[Figure: two panels; state index against 0.01 s frames, and state index against seconds]
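A state histogram can be read as the relative frequency of each state index within a stretch of the state path. The sketch below assumes that reading; the helper names, the window length, and the example path are hypothetical, and the descriptor's actual serialisation is not modelled.

```python
from collections import Counter


def state_histogram(path, num_states):
    """Relative frequency of each state (numbered from 1) in a state path."""
    if not path:
        return [0.0] * num_states
    counts = Counter(path)
    return [counts.get(state, 0) / len(path)
            for state in range(1, num_states + 1)]


def windowed_histograms(path, num_states, window):
    """One histogram per non-overlapping window of `window` frames."""
    return [state_histogram(path[i:i + window], num_states)
            for i in range(0, len(path) - window + 1, window)]


path = [1, 3, 3, 2, 2, 3, 4, 4, 4, 4]
whole = state_histogram(path, 4)
per_window = windowed_histograms(path, 4, 5)
```

With 0.01 s frames, a window of 100 frames would give one histogram per second, which is one way to move from the frame axis to the seconds axis.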
32
High-Level Structure Discovery
33
S-Matrix
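A self-similarity matrix compares every segment of a song with every other segment of the same song. The sketch below assumes each segment is summarised by a state histogram and uses cosine similarity as the measure; the slides do not name the measure, so that choice and the example values are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two histograms; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def s_matrix(histograms):
    """Entry [i][j] is the similarity between segments i and j."""
    return [[cosine(a, b) for b in histograms] for a in histograms]


# Hypothetical song with three segments; the first and third are alike.
segments = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
S = s_matrix(segments)
```

Repeated sections show up as high off-diagonal entries, here `S[0][2]`, which is the kind of pattern a structure-extraction step can look for.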
34
STRUCTURE EXTRACTION: SEGMENTATION
35
Structure Discovery
Low level features
Acoustic Features
State-Space Models
High-level Structure
Hierarchical Structure Discovery
36
High-Level Structure Discovery
Alanis Morissette
Machine Segmentation
Human Segmentation
37
High-Level Structure Discovery
Cranberries
Machine Segmentation
Human Segmentation
38
High-Level Structure Discovery
Nirvana
Machine Segmentation
Human Segmentation
39
High-Level Structure Discovery
40
EXAMPLE 4: MUSAICS
41
Musaics (Music Mosaics)
  • C-Matrix Cross-Song Similarity Matrix
  • Outer product of target and source histograms
  • Find segments similar to target segment
  • Similarity between all target and database
    segments
  • SORT columns of similarity matrix
  • Replace segments with similar material
  • Segmentation boundaries (beat alignment)
  • Replace with best fit using DTW on most similar
    segments
  • EXAMPLES
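The matching steps in the list above can be sketched as follows. The cross-similarity matrix is taken here to be the matrix of dot products between target and source histograms, and "sort columns" is read as ranking the source segments for each target segment; both readings are assumptions, as are the function names and the data. Beat alignment and DTW refinement are not shown.

```python
def c_matrix(target_hists, source_hists):
    """Rows index target segments, columns index source (database) segments."""
    return [[sum(x * y for x, y in zip(t, s)) for s in source_hists]
            for t in target_hists]


def ranked_sources(row):
    """Source segment indices ordered from most to least similar."""
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)


def replacement_plan(target_hists, source_hists):
    """For each target segment, the index of the most similar source segment."""
    C = c_matrix(target_hists, source_hists)
    return [ranked_sources(row)[0] for row in C]


# Hypothetical state histograms for 2 target and 3 source segments.
target = [[0.8, 0.2, 0.0], [0.0, 0.1, 0.9]]
source = [[0.1, 0.1, 0.8], [0.7, 0.3, 0.0], [0.3, 0.4, 0.3]]
C = c_matrix(target, source)
plan = replacement_plan(target, source)
```

Keeping the full ranking rather than only the top match leaves room for a later stage to choose among the most similar candidates.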

42
Musaics
MPEG-7 Database
Extract
Segment
Match
Target
StatePathHistograms
Beats
Replace
Musaic
43
Musaics
44
Musaics
45
Musaics
46
Musaics
47
Musaics
48
Musaics
49
Musaics
50
Musaics
51
Musaics
52
Musaics
53
Musaics
54
Musaics
  • New Content by Similarity Replacement
  • C-Matrix Cross-Song Similarity Map
  • 1 Target, Many Sources
  • Constraints
  • Preserve Rhythm by Beat Tracking
  • Preserve Beats by DTW alignment
  • Bigger Source Database is Better
  • Greater Number of Accurate Matches
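The DTW alignment mentioned under Constraints can be illustrated with the textbook dynamic time warping recurrence. This is a generic sketch, not the presenter's implementation: it works on hypothetical one-dimensional feature sequences with an absolute-difference cost, and picks the candidate segment with the smallest warped distance to the target.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]


def best_fit(target, candidates):
    """Index of the candidate with the lowest DTW cost to the target."""
    return min(range(len(candidates)),
               key=lambda k: dtw_distance(target, candidates[k]))


# Made-up feature contours: candidate 0 is a time-stretched copy of the target.
target = [0, 1, 2, 1, 0]
candidates = [[0, 0, 1, 2, 2, 1, 0], [2, 2, 2, 2, 2], [0, 1, 0, 1, 0]]
choice = best_fit(target, candidates)
```

Because warping absorbs differences in timing, a stretched copy of the target scores a distance of zero, which is why DTW suits replacement segments that do not have the same length as the original.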

55
Acknowledgements
  • International Standards Organisation
  • ISO/IEC JTC 1 SC29 WG11 (MPEG)
  • Mitsubishi Electric Research Labs
  • Massachusetts Institute of Technology
  • Music, Mind and Machine Group (formerly Machine
    Listening Group)
  • Paris Smaragdis, Youngmoo Kim, Brian Whitman
  • Iroro Orife, John Hershey, Alex Westner, Kevin
    Wilson
  • City University
  • Department of Computing
  • Centre for Computational Creativity