Title: CUIDADO: WP2.1 Audio Features Extraction, M2 meeting
Slide 1: CUIDADO WP2.1. Audio Features Extraction, M2 meeting
- Geoffroy Peeters
- 2002/03/18-19
Slide 2: WP2.1 goals until M2
- WP2.1.1. (ircam) Static description (descriptors, mapping/matching, classification)
- WP2.1.2. (ircam) Web Music Monitoring System
- WP2.1.5. (ircam) Psycho-acoustic Timbre descriptors
- WP2.1.9. (bgu) HOS for instrument and sound FX matching
- WP2.1.10. (iua-upf) Improving fundamental frequency estimation
- WP2.1.11. (iua-upf) Blind temporal segmentation
Slide 3: WP2.1. Sound description
[Diagram labels: Classification scheme; Sound and descriptors database; Classification; Search by similarity]
Slide 4: WP2.1. Sound description
[Diagram labels: Classification scheme; Sound and descriptors database; Classification; Search by similarity; annotated with WP2.1.11., WP2.1.1. / WP2.1.9., WP2.1.10., WP2.1.5.]
Slide 5: WP2.1.1. Static Sound Description
- WP2.1.1.
- -> Descriptors Extraction modules
- -> Class modeling modules
WP2.1.1. Sound description
Slide 6: WP2.1.1. Audio descriptors: Taxonomy
- Global descriptors
- Instantaneous descriptors
- temporal modeling: mean, std, derivative, cross-correlation, ...
Slide 7: WP2.1.1. Audio descriptors: Taxonomy
- DT: temporal descriptors
- DE: energy descriptors
- DS: spectral descriptors
- DH: harmonic descriptors
- DP: perceptual descriptors
Slide 8: WP2.1.1. Audio descriptors: DT Temporal/Energy descriptors
[Diagram labels: sound; Energy Envelope]
- DT.log-attack time
- DT.temporal centroid
- DT.temporal decrease
- DT.effective duration
- DT.zero-crossing rate
- DT.auto-correlation
- DE.total energy
- DE.energy of sinusoidal part
- DE.energy of noise part
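A minimal sketch of three of the DT descriptors listed above, assuming a sampled signal and a precomputed energy envelope as plain Python lists. The function names and the 20% / 90% attack thresholds are illustrative assumptions, not the definitions used in the WP2.1.1 modules.

```python
import math

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)

def temporal_centroid(envelope, sr):
    """Envelope-weighted mean time, in seconds (sr = envelope rate in Hz)."""
    total = sum(envelope)
    return sum(i * e for i, e in enumerate(envelope)) / (total * sr)

def log_attack_time(envelope, sr, lo=0.2, hi=0.9):
    """log10 of the time taken to rise from lo*peak to hi*peak.

    The lo/hi thresholds are assumed values chosen for illustration.
    """
    peak = max(envelope)
    start = next(i for i, e in enumerate(envelope) if e >= lo * peak)
    stop = next(i for i, e in enumerate(envelope) if e >= hi * peak)
    # Clamp to one envelope step so the logarithm is always defined.
    return math.log10(max(stop - start, 1) / sr)
```

The envelope itself could come from any smoothing of the squared signal; that step is left out here.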
Slide 9: WP2.1.1. Audio descriptors: DS Spectral descriptors
[Diagram labels: sound; Window; FFT]
- DS.centroid, DS.spread, DS.skewness, DS.kurtosis
- DS.slope, DS.decrease, DS.roll-off
- DS.variation
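The first four DS descriptors can be read as statistical moments of the magnitude spectrum treated as a distribution over frequency. The sketch below assumes the bin frequencies and magnitudes of one windowed FFT frame are already available as lists; the function names and the 95% roll-off fraction are assumptions made for illustration.

```python
import math

def spectral_moments(freqs, mags):
    """Return (centroid, spread, skewness, kurtosis) of one spectrum frame."""
    total = sum(mags)
    p = [m / total for m in mags]  # normalize magnitudes to sum to 1
    centroid = sum(f * w for f, w in zip(freqs, p))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, p))
    spread = math.sqrt(variance)
    skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs, p)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w for f, w in zip(freqs, p)) / spread ** 4
    return centroid, spread, skewness, kurtosis

def spectral_rolloff(freqs, mags, fraction=0.95):
    """Lowest frequency below which `fraction` of the spectral energy lies."""
    energy = [m * m for m in mags]
    threshold = fraction * sum(energy)
    cumulative = 0.0
    for f, e in zip(freqs, energy):
        cumulative += e
        if cumulative >= threshold:
            return f
    return freqs[-1]
```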
Slide 10: WP2.1.1. Audio descriptors: DH Harmonic descriptors
[Diagram labels: sound; Window; FFT; Sinusoidal model]
- DH.Centroid, DH.Spread, DH.Skewness, DH.Kurtosis
- DH.Slope, DH.Decrease, DH.Roll-off
- DH.Variation
- DH.Fundamental frequency
- DH.Noisiness, DH.OddEvenRatio, DH.Inharmonicity
- DH.Tristimulus
- DH.Deviation
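A sketch of three DH descriptors, assuming the sinusoidal model has already produced the frequencies and amplitudes of the harmonics (index 0 is the first harmonic). The formulas follow commonly used textbook definitions and are an assumption here; the slides do not state which variants the project uses.

```python
def tristimulus(amps):
    """Relative weight of harmonic 1, harmonics 2-4, and harmonics 5+."""
    total = sum(amps)
    return (amps[0] / total, sum(amps[1:4]) / total, sum(amps[4:]) / total)

def odd_even_ratio(amps):
    """Energy of odd harmonics (1, 3, 5, ...) over energy of even harmonics."""
    odd = sum(a * a for a in amps[0::2])
    even = sum(a * a for a in amps[1::2])
    return odd / even

def inharmonicity(freqs, amps, f0):
    """Energy-weighted deviation of partials from exact multiples of f0."""
    num = sum(abs(f - (h + 1) * f0) * a * a
              for h, (f, a) in enumerate(zip(freqs, amps)))
    den = sum(a * a for a in amps)
    return 2.0 * num / (f0 * den)
```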
Slide 11: WP2.1.1. Audio descriptors: DP Perceptual descriptors
[Diagram labels: sound; Window; FFT; Perception; Mid-ear filtering; Bark scale; Mel scale]
- DP.Loudness, Specific Loudness (normalized)
- DP.Sharpness
- DP.Spread
- DP.MFCC, DP.Delta-MFCC, DP.Delta-Delta-MFCC
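The Bark and Mel scales and the delta features can be sketched as follows. The two frequency-warping formulas are widely used approximations, and the central-difference delta is only one of several common choices; whether the project uses these exact variants is an assumption.

```python
import math

def hz_to_mel(f):
    """Common Mel-scale approximation."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    """Common Bark-scale approximation."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def delta(frames):
    """Central difference over time for a list of coefficient vectors.

    The first and last frames are repeated at the edges.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out
```

Under this sketch, Delta-Delta coefficients are obtained by applying `delta` twice to the MFCC frames.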
Slide 12: WP2.1.1. Audio descriptors: Temporal Modeling
- Temporal modeling
- Mean, Std, Dev
- Auto-correlation
- Polynomial model
- State model (HMM)
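The simpler temporal models above can be sketched for one instantaneous descriptor sampled over frames (a "track"). The function names are hypothetical, and the polynomial model is reduced to a first-order (linear) fit for brevity; the HMM state model is not covered.

```python
import math

def mean_std(track):
    """Mean and (population) standard deviation of a descriptor track."""
    n = len(track)
    mean = sum(track) / n
    var = sum((v - mean) ** 2 for v in track) / n
    return mean, math.sqrt(var)

def autocorrelation(track, lag):
    """Normalized auto-correlation of the mean-removed track at `lag` frames."""
    mean, _ = mean_std(track)
    centered = [v - mean for v in track]
    denom = sum(c * c for c in centered)
    num = sum(a * b for a, b in zip(centered, centered[lag:]))
    return num / denom

def linear_fit(track):
    """Least-squares slope and intercept against the frame index."""
    n = len(track)
    t_mean = (n - 1) / 2.0
    v_mean = sum(track) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(track))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return slope, v_mean - slope * t_mean
```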
Slide 13: WP2.1.1. Mapping/Matching
- Pre-selection of descriptors
- Linear Discriminant Analysis
- Mutual Information
- Descriptors Space Transformation
- Linear Discriminant Analysis
- Class modeling
- Multi-dimensional gaussian
- (Multi-dimensional gaussian-mixture)
- (K Nearest Neighbors)
- (Tree)
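A sketch of the multi-dimensional Gaussian class modeling step: each class is summarized by a mean vector and a covariance, and a new descriptor vector is assigned to the class with the highest likelihood. To stay short, this version assumes a diagonal covariance (per-dimension variances), which is a simplification of the full covariance matrix mentioned in the slides. The class names in the usage note are hypothetical.

```python
import math

def fit_gaussian(vectors):
    """Per-dimension mean and variance of the training vectors of one class."""
    n = len(vectors)
    dims = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dims)]
    # A small floor keeps the variance strictly positive.
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n + 1e-9
           for d in range(dims)]
    return mean, var

def log_likelihood(x, model):
    """Log-density of x under a diagonal-covariance Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
               for xi, m, s in zip(x, mean, var))

def classify(x, models):
    """Name of the class whose Gaussian gives x the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(x, models[name]))
```

Usage: build `models = {"flute": fit_gaussian(flute_vectors), "violin": fit_gaussian(violin_vectors)}` from descriptor vectors (after any selection and transformation), then call `classify(new_vector, models)`.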
Slide 14: WP2.1.1. Mapping/Matching: Evaluation
[Diagram labels: New sound; Extraction; Descriptors; Descriptor Selection; List of descriptors; Class definition; Transformation matrix; For all classes: mean vector, covariance matrix; Class; Class names]
Slide 15: WP2.1.1. Mapping/Matching: Evaluation of the technology
- Database: 1400 sounds from Studio OnLine, 16 instruments
- Learning on 66% / evaluation on the remaining 33%
- Taxonomy used
Slide 16: WP2.1.1. Mapping/Matching: Evaluation of the technology
Slide 17: WP2.1.1. Conclusion
- Achieved
- Set of descriptors
- Classification modules
- Remaining
- Hierarchical classifiers
- XML output (WP2.1.4.)
- Temporal modeling of descriptors (WP2.1.7. Dynamic Descriptors)
Slide 18: WP2.1.2. Web Music Monitoring System
- Goal
- music identification from signal content (Music ID)
- Music ID/Watermarking
Slide 19: WP2.1.2. Web Music Monitoring System: Technology
- Coding
- code extracted from the signal content
- represents the energy variation inside several frequency channels
- choice of the features by mutual information (trained on a database)
- features: 13 words / sec
- 650 titles: 11 MB
- Search
- Algorithm 1: distance (10 sec recording)
- Algorithm 2: cumulative probabilities (12 sec recording)
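To illustrate the general idea of coding energy variation in frequency channels and searching by distance, here is a toy sketch. It is not the coding or search algorithm of WP2.1.2: the one-bit-per-channel words, the Hamming distance, and all function names are assumptions chosen for illustration, and the mutual-information feature selection is omitted.

```python
def encode(band_energies):
    """One integer word per frame transition.

    Each bit is 1 if the energy in that channel rose since the previous frame.
    """
    words = []
    for prev, cur in zip(band_energies, band_energies[1:]):
        word = 0
        for p, c in zip(prev, cur):
            word = (word << 1) | (1 if c > p else 0)
        words.append(word)
    return words

def hamming(a, b):
    """Number of differing bits between two words."""
    return bin(a ^ b).count("1")

def best_match(query, database):
    """Slide the query over every stored code.

    Returns (title, distance, offset) for the alignment with the fewest
    differing bits.
    """
    best = None
    for title, code in database.items():
        for offset in range(len(code) - len(query) + 1):
            window = code[offset:offset + len(query)]
            d = sum(hamming(q, c) for q, c in zip(query, window))
            if best is None or d < best[1]:
                best = (title, d, offset)
    return best
```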
Slide 20: WP2.1.2. Web Music Monitoring System: Evaluation
- Evaluation
- 650 mp3 titles
- a random 10 sec (12 sec) excerpt taken from each title
Slide 21: WP2.1.5. Psychoacoustic descriptors for Timbre Spaces (I)
- Objective
- provide an optimal set of numerical descriptors
- Method
- Collect all timbre spaces studied in the psychoacoustic literature (McAdams 2002)
- Collect available numerical descriptors (Peeters 2000, Susini 2000)
- Apply multiple regression analysis on each timbre space for all descriptors
- Propose an optimal set of descriptors
- Indicate possible limitations
- Source
- 6 timbre spaces (128 sounds)
- 69 purely signal-based descriptors (48 harmonic, 21 percussive)
- 3 descriptors with auditory modeling
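The multiple regression step in the method can be sketched as an ordinary least-squares fit of one timbre-space coordinate on a set of descriptor values, solved through the normal equations. The function names are hypothetical and the solver is a plain Gaussian elimination, chosen only to keep the example self-contained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def multiple_regression(descriptors, coordinate):
    """Least-squares coefficients [intercept, w1, w2, ...]."""
    rows = [[1.0] + list(d) for d in descriptors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, coordinate)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(descriptors, coordinate, coeffs):
    """Share of the coordinate's variance explained by the fitted model."""
    pred = [coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], d))
            for d in descriptors]
    mean = sum(coordinate) / len(coordinate)
    ss_res = sum((y - p) ** 2 for y, p in zip(coordinate, pred))
    ss_tot = sum((y - mean) ** 2 for y in coordinate)
    return 1.0 - ss_res / ss_tot
```

Each row of `descriptors` holds the descriptor values of one sound, and `coordinate` holds the position of the same sounds along one perceptual dimension.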
Slide 22: WP2.1.5. Psychoacoustic descriptors for Timbre Spaces (II)
- Results
- Cluster analysis of the correlation matrix of all descriptors
- 9 basic groups of descriptors (from available set)
- spectral slope
- spectral centroid
- spectral flux
- spectral spread (standard deviation)
- spectral deviation
- spectral shape (kurtosis, skewness, slope)
- fluctuation/roughness
- rms power and energy
- attack time
Slide 23: WP2.1.5. Psychoacoustic descriptors for Timbre Spaces (II)
- Results
- Multiple regression analysis: 5 groups of descriptors emerged
- spectral centroid
- spectral spread (standard deviation)
- spectral deviation
- energy
- effective duration / attack time
- Conclusions
- Found optimal list of descriptors
- A distance between sounds can be computed
- A global distance model requires a priori knowledge of the class of sounds and listeners
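One simple way to compute a distance between sounds from such descriptors is a weighted Euclidean distance in descriptor space. This is only a sketch: the slides do not give the distance model, and the weights (which would depend on the class of sounds and listeners) are hypothetical inputs.

```python
import math

def timbre_distance(a, b, weights):
    """Weighted Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
```

Changing the weights changes which descriptors dominate the distance, which is one way to read the need for a priori knowledge in a global model.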
Slide 24: WP2.1.9. HOS for instrumental and sound FX matching
- See Shlomo Dubnov presentation
Slide 25: WP2.1.10. Improving Fundamental Frequency Estimation (I)
- Monophonic F0 detector based on Two-Way Mismatch (Maher & Beauchamp, 1993)
- Integration into the WP2.1.1 modules is going to use single-frame estimations; this may not be optimal, as context (previous F0, next F0, instrument, etc.) is not considered
- Polyphonic F0 detector (Klapuri 2000) using bandwise processing
- Intended mainly for polyphonic-monotimbral instruments, or small ensembles, not for dense mixtures of sounds
- Estimation of candidates is performed for each analysis frame. Several candidates are obtained
- The tracking of candidates is not yet implemented
- Our main interest is in deriving a predominant F0 only
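The two-way idea behind the monophonic detector can be sketched for a single frame: each F0 candidate is scored by how far its predicted harmonics lie from the measured spectral peaks, and how far the measured peaks lie from the predicted harmonics. This is a deliberately simplified version that ignores the amplitude weighting and tuning constants of the published Two-Way Mismatch procedure; the function names are hypothetical.

```python
def two_way_mismatch(peaks, f0):
    """Simplified mismatch error of candidate f0 against measured peaks (Hz)."""
    n_harm = max(1, int(round(max(peaks) / f0)))
    harmonics = [f0 * h for h in range(1, n_harm + 1)]
    # Predicted-to-measured: each harmonic vs. its nearest measured peak.
    pred_to_meas = sum(min(abs(h - p) for p in peaks) / h
                       for h in harmonics) / len(harmonics)
    # Measured-to-predicted: each peak vs. its nearest predicted harmonic.
    meas_to_pred = sum(min(abs(p - h) for h in harmonics) / p
                       for p in peaks) / len(peaks)
    return pred_to_meas + meas_to_pred

def estimate_f0(peaks, candidates):
    """Candidate with the smallest combined mismatch for this frame."""
    return min(candidates, key=lambda f0: two_way_mismatch(peaks, f0))
```

Because each frame is scored independently, this sketch has the limitation noted above: no context from neighboring frames is used.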
Slide 26: WP2.1.10. Improving Fundamental Frequency Estimation (I)
Slide 27: WP2.1.11. Blind Temporal Segmentation
- See Perfecto Herrera presentation
Slide 28: WP2.1 current status at M2
Slide 29: WP2.1 Future work
- WP2.1.1. Add hierarchical classification
- WP2.1.3. Fast Browsing Over Sound Archives
- WP2.1.4. XML output
- WP2.1.6. Textual Timbre Descriptors
- WP2.1.7. Dynamic Sound Description
- WP2.1.7.4. HMM/Viterbi model
- WP2.1.8 ICA/ISA Signal Analysis
- WP2.1.11 Blind temporal segmentation