Transcript and Presenter's Notes

Title: CUIDADO: WP2.1. Audio Features extraction - M2 meeting


1
CUIDADO WP2.1. Audio Features extraction - M2 meeting
  • Geoffroy Peeters
  • 2002/03/18-19

2
WP2.1 goals until M2
  • WP2.1.1. (ircam) Static description
    (descriptors, mapping/matching, classification)
  • WP2.1.2. (ircam) Web Music Monitoring System
  • WP2.1.5. (ircam) Psycho-acoustic Timbre
    descriptors
  • WP2.1.9. (bgu) HOS for instrument and sound
    FX matching
  • WP2.1.10 (iua-upf) Improving fundamental
    frequency estimation
  • WP2.1.11 (iua-upf) Blind temporal segmentation

3
WP2.1. Sound description
(overview diagram: classification scheme, sound and descriptors database, classification, search by similarity)
4
WP2.1. Sound description
(same diagram annotated with the contributing tasks: WP2.1.1./WP2.1.9., WP2.1.5., WP2.1.10., WP2.1.11.)
5
WP2.1.1. Static Sound Description
  • WP2.1.1.
  • > Descriptors extraction modules
  • > Class modeling modules

(diagram: WP2.1.1. Sound description)
6
WP2.1.1. Audio descriptors: Taxonomy
  • Global descriptors
  • Instantaneous descriptors
  • temporal modeling: mean, std, derivative,
    cross-correlation, ...

7
WP2.1.1. Audio descriptors: Taxonomy
  • DT: temporal descriptors
  • DE: energy descriptors
  • DS: spectral descriptors
  • DH: harmonic descriptors
  • DP: perceptual descriptors

8
WP2.1.1. Audio descriptors: DT Temporal/Energy descriptors
(diagram: sound -> energy envelope)
  • DT.log-attack time
  • DT.temporal centroid
  • DT.temporal decrease
  • DT.effective duration
  • DT.zero-crossing rate
  • DT.auto-correlation
  • DE.total energy
  • DE.energy of sinusoidal part
  • DE.energy of noise part

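A minimal numpy sketch of how a few of these temporal/energy descriptors can be computed from a mono signal (the crude amplitude envelope and the attack thresholds are illustrative assumptions, not the CUIDADO implementation):

    import numpy as np

    def temporal_descriptors(x, sr, attack_lo=0.2, attack_hi=0.9):
        """Rough DT/DE descriptors for a mono signal x sampled at sr (sketch)."""
        env = np.abs(x)                                  # crude amplitude envelope
        t = np.arange(len(x)) / sr
        peak = env.max()

        # log-attack time: time to rise from attack_lo*peak to attack_hi*peak
        t_lo = t[np.argmax(env >= attack_lo * peak)]
        t_hi = t[np.argmax(env >= attack_hi * peak)]
        log_attack_time = np.log10(max(t_hi - t_lo, 1e-6))

        # temporal centroid: energy-weighted mean time
        temporal_centroid = np.sum(t * env ** 2) / np.sum(env ** 2)

        # effective duration: time during which the envelope exceeds 40% of its maximum
        effective_duration = np.sum(env >= 0.4 * peak) / sr

        # zero-crossing rate (crossings per second) and total energy
        zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2 * sr
        total_energy = np.sum(x ** 2)

        return dict(log_attack_time=log_attack_time, temporal_centroid=temporal_centroid,
                    effective_duration=effective_duration, zcr=zcr, total_energy=total_energy)
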
9
WP2.1.1. Audio descriptors: DS Spectral descriptors
(diagram: sound -> window -> FFT)
  • DS.centroid, DS.spread, DS.skewness, DS.kurtosis
  • DS.slope, DS.decrease, DS.roll-off
  • DS.variation

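The DS descriptors are moments and related statistics of the magnitude spectrum of each windowed frame. A minimal single-frame sketch, assuming a Hann window and a 95% roll-off point (illustrative choices, not the project's code):

    import numpy as np

    def spectral_descriptors(frame, sr, rolloff_pct=0.95):
        """Spectral shape descriptors of one frame (illustrative sketch)."""
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
        p = spec / (spec.sum() + 1e-12)                  # normalized amplitude distribution

        centroid = np.sum(freqs * p)                     # 1st moment
        spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))
        skewness = np.sum(((freqs - centroid) ** 3) * p) / (spread ** 3 + 1e-12)
        kurtosis = np.sum(((freqs - centroid) ** 4) * p) / (spread ** 4 + 1e-12)

        # roll-off: frequency below which rolloff_pct of the spectral energy lies
        cum = np.cumsum(spec ** 2) / (np.sum(spec ** 2) + 1e-12)
        rolloff = freqs[np.argmax(cum >= rolloff_pct)]

        return dict(centroid=centroid, spread=spread, skewness=skewness,
                    kurtosis=kurtosis, rolloff=rolloff)
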
10
WP2.1.1. Audio descriptors: DH Harmonic descriptors
(diagram: sound -> window -> FFT -> sinusoidal model)
  • DH.Centroid, DH.Spread, DH.Skewness, DH.Kurtosis
  • DH.Slope, DH.Decrease, DH.Roll-off
  • DH.Variation
  • DH.Fundamental frequency
  • DH.Noisiness, DH.OddEvenRatio, DH.Inharmonicity
  • DH.Tristimulus
  • DH.Deviation

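Assuming the partial frequencies and amplitudes of harmonics 1..H have already been extracted by the sinusoidal model, some of the DH descriptors can be sketched as follows (the formulas follow common textbook definitions; the exact CUIDADO variants may differ):

    import numpy as np

    def harmonic_descriptors(f0, partial_freqs, partial_amps, frame_energy):
        """DH-style descriptors from a harmonic sinusoidal model (sketch)."""
        a = np.asarray(partial_amps, dtype=float)
        f = np.asarray(partial_freqs, dtype=float)
        h = np.arange(1, len(a) + 1)
        harm_energy = np.sum(a ** 2)

        noisiness = 1.0 - harm_energy / (frame_energy + 1e-12)
        odd_even_ratio = np.sum(a[0::2] ** 2) / (np.sum(a[1::2] ** 2) + 1e-12)

        # inharmonicity: amplitude-weighted deviation of partials from h * f0
        inharmonicity = (2.0 / f0) * np.sum(np.abs(f - h * f0) * a ** 2) / (harm_energy + 1e-12)

        # tristimulus: energy share of harmonic 1, harmonics 2-4, harmonics 5+
        tristimulus = (a[0] ** 2 / harm_energy,
                       np.sum(a[1:4] ** 2) / harm_energy,
                       np.sum(a[4:] ** 2) / harm_energy)

        return dict(noisiness=noisiness, odd_even_ratio=odd_even_ratio,
                    inharmonicity=inharmonicity, tristimulus=tristimulus)
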
11
WP2.1.1. Audio descriptors: DP Perceptual descriptors
(diagram: sound -> window -> FFT -> perception: mid-ear filtering, Bark scale, Mel scale)
  • DP.Loudness, Specific Loudness (normalized)
  • DP.Sharpness
  • DP.Spread
  • DP.MFCC, DP.Delta-MFCC, DP.Delta-Delta-MFCC

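A compact sketch of an MFCC computation for one frame (triangular mel filterbank followed by a DCT); the filter count, frequency range and normalization are illustrative choices, not the project's settings:

    import numpy as np
    from scipy.fftpack import dct

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mfcc_frame(frame, sr, n_mels=24, n_ceps=13):
        """MFCCs of one windowed frame via a mel filterbank and a DCT (sketch)."""
        n_fft = len(frame)
        power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
        freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)

        # triangular mel filterbank between 0 Hz and the Nyquist frequency
        mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
        fbank = np.zeros((n_mels, len(freqs)))
        for i in range(n_mels):
            lo, ctr, hi = mel_pts[i], mel_pts[i + 1], mel_pts[i + 2]
            fbank[i] = np.clip(np.minimum((freqs - lo) / (ctr - lo),
                                          (hi - freqs) / (hi - ctr)), 0.0, None)

        log_mel = np.log(fbank @ power + 1e-12)          # log energy per mel band
        return dct(log_mel, type=2, norm='ortho')[:n_ceps]

Delta-MFCC and Delta-Delta-MFCC are then the frame-to-frame first and second differences of these coefficients.
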
12
WP2.1.1. Audio descriptors: Temporal Modeling
  • Temporal modeling
    • Mean, Std, Derivative
    • Auto-correlation
    • Polynomial model
    • State model (HMM)

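A sketch of temporal modeling applied to the trajectory of one instantaneous descriptor (mean, standard deviation, derivative, autocorrelation, polynomial fit); the HMM state model is not sketched here:

    import numpy as np

    def temporal_model(traj, poly_order=2, n_lags=4):
        """Summarize an instantaneous-descriptor trajectory over time (sketch)."""
        traj = np.asarray(traj, dtype=float)
        deriv = np.diff(traj)                            # frame-to-frame derivative
        centered = traj - traj.mean()
        ac = np.correlate(centered, centered, mode='full')[len(traj) - 1:]
        ac = ac[:n_lags + 1] / (ac[0] + 1e-12)           # normalized autocorrelation
        poly = np.polyfit(np.linspace(0.0, 1.0, len(traj)), traj, poly_order)
        return dict(mean=traj.mean(), std=traj.std(),
                    deriv_mean=deriv.mean(), autocorr=ac, poly=poly)
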
13
WP2.1.1. Mapping/Matching
  • Pre-selection of descriptors
    • Linear Discriminant Analysis
    • Mutual Information
  • Descriptors Space Transformation
    • Linear Discriminant Analysis
  • Class modeling
    • Multi-dimensional Gaussian
    • (Multi-dimensional Gaussian mixture)
    • (K Nearest Neighbors)
    • (Tree)

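A sketch of this matching chain using scikit-learn's LDA for the space transformation and one multi-dimensional Gaussian per class; the number of retained dimensions and the covariance regularization are illustrative assumptions, not the project's settings:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def fit_gaussian_classifier(X, y, n_dims=5):
        """LDA projection + one multi-dimensional Gaussian per class (sketch)."""
        lda = LinearDiscriminantAnalysis(n_components=n_dims).fit(X, y)
        Z = lda.transform(X)
        models = {}
        for c in np.unique(y):
            Zc = Z[y == c]
            models[c] = (Zc.mean(axis=0), np.cov(Zc, rowvar=False))
        return lda, models

    def classify(x, lda, models):
        """Pick the class whose Gaussian gives the highest log-likelihood."""
        z = lda.transform(x.reshape(1, -1))[0]

        def loglik(mu, cov):
            d = z - mu
            cov = cov + 1e-6 * np.eye(len(mu))           # regularize the covariance
            return -0.5 * (d @ np.linalg.solve(cov, d) + np.log(np.linalg.det(cov)))

        return max(models, key=lambda c: loglik(*models[c]))
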
14
WP2.1.1. Mapping/Matching: Evaluation
(flowchart: new sound -> extraction -> descriptors -> descriptor selection (list of descriptors) -> transformation matrix -> class definition (for all classes: mean vector, covariance matrix) -> class / class names)
15
WP2.1.1. Mapping/Matching: Evaluation of the technology
  • Database: 1400 sounds from Studio OnLine, 16 instruments
  • Learning on 66% / evaluation on the remaining 33%
  • Taxonomy used

16
WP2.1.1. Mapping/Matching: Evaluation of the technology
17
WP2.1.1. Conclusion
  • Achieved
    • Set of descriptors
    • Classification modules
  • Remaining
    • Hierarchical classifiers
    • XML output (WP2.1.4.)
    • Temporal modeling of descriptors (WP2.1.7 Dynamic Descriptors)

18
WP2.1.2. Web Music Monitoring System
  • Goal
  • music identification from signal content (Music
    ID)
  • Music ID/Watermarking

19
WP2.1.2. Web Music Monitoring System: Technology
  • Coding (see the sketch after this slide)
    • code extracted from the signal content
    • represents the energy variation inside several frequency channels
    • choice of the features by mutual information (trained on a database)
    • features: 13 words / sec
    • 650 titles: 11 MB
  • Search
    • Algorithm 1: distance (10 sec recording)
    • Algorithm 2: cumulative probabilities (12 sec recording)

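A rough sketch of this kind of coding: per-frame energies in a set of frequency channels, turned into one binary word per frame that encodes the sign of the energy variation across channels and time. The band layout, frame size and word length are assumptions; the actual CUIDADO features are chosen by mutual information, as noted above.

    import numpy as np

    def fingerprint(x, sr, n_bands=16, frame_len=4096, hop=2048):
        """One binary word per frame from band-energy variation (sketch)."""
        edges = np.logspace(np.log10(300.0), np.log10(3000.0), n_bands + 1)
        freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
        words, prev = [], None
        for start in range(0, len(x) - frame_len, hop):
            frame = x[start:start + frame_len] * np.hanning(frame_len)
            spec = np.abs(np.fft.rfft(frame)) ** 2
            e = np.array([spec[(freqs >= lo) & (freqs < hi)].sum()
                          for lo, hi in zip(edges[:-1], edges[1:])])
            if prev is not None:
                # bit = 1 if the cross-band energy difference increased since last frame
                bits = (np.diff(e) - np.diff(prev)) > 0
                words.append(int(''.join('1' if b else '0' for b in bits), 2))
            prev = e
        return words                                     # (n_bands - 1)-bit words
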
20
WP2.1.2. Web Music Monitoring System: Evaluation
  • Evaluation
    • 650 mp3 titles
    • randomly taking 10 sec (12 sec) from each title

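A sketch of the kind of distance-based search such an evaluation exercises: slide the excerpt's fingerprint over each stored title and keep the smallest mean Hamming distance between corresponding words. This is an assumed matching scheme for illustration, not the project's Algorithm 1 or 2.

    import numpy as np

    def match_excerpt(query_words, database):
        """Return the (title, score) with the best fingerprint match (sketch)."""
        def hamming(a, b):
            return bin(a ^ b).count('1')

        best_title, best_score = None, np.inf
        for title, words in database.items():
            for offset in range(len(words) - len(query_words) + 1):
                d = np.mean([hamming(q, w) for q, w in
                             zip(query_words, words[offset:offset + len(query_words)])])
                if d < best_score:
                    best_title, best_score = title, d
        return best_title, best_score
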
21
WP2.1.5. Psychoacoustic descriptors for Timbre Spaces (I)
  • Objective
    • provide an optimal set of numerical descriptors
  • Method
    • Collect all timbre spaces studied in the psychoacoustic literature (McAdams 2002)
    • Collect available numerical descriptors (Peeters 2000, Susini 2000)
    • Apply multiple regression analysis on each timbre space for all descriptors (see the sketch after this list)
    • Propose an optimal set of descriptors
    • Indicate possible limitations
  • Source
    • 6 timbre spaces (128 sounds)
    • 69 purely signal-based descriptors (48 harmonic, 21 percussive)
    • 3 descriptors with auditory modeling

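A sketch of the multiple regression step: each perceptual (MDS) dimension of a timbre space is regressed on the candidate descriptors and the explained variance R^2 is recorded, so that descriptor sets can be compared across timbre spaces (plain least squares; the published analysis may differ in detail):

    import numpy as np

    def regress_timbre_space(mds_coords, descriptors):
        """Multiple regression of each MDS dimension on the descriptors (sketch).
        mds_coords: (n_sounds, n_dims), descriptors: (n_sounds, n_descriptors)."""
        X = np.column_stack([descriptors, np.ones(len(descriptors))])   # add intercept
        results = []
        for dim in range(mds_coords.shape[1]):
            y = mds_coords[:, dim]
            coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ coef
            r2 = 1.0 - resid.var() / (y.var() + 1e-12)
            results.append((coef[:-1], r2))              # (descriptor weights, R^2)
        return results
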
22
WP2.1.5. Psychoacoustic descriptors for Timbre Spaces (II)
  • Results
    • Cluster analysis of the correlation matrix of all descriptors
    • 9 basic groups of descriptors (from the available set)
      • spectral slope
      • spectral centroid
      • spectral flux
      • spectral spread (standard deviation)
      • spectral deviation
      • spectral shape (kurtosis, skewness, slope)
      • fluctuation/roughness
      • rms power and energy
      • attack time

23
WP2.1.5. Psychoacoustic descriptors for Timbre Spaces (II)
  • Results
    • Multiple regression analysis: 5 groups of descriptors emerged
      • spectral centroid
      • spectral spread (standard deviation)
      • spectral deviation
      • energy
      • effective duration / attack time
  • Conclusions
    • Found an optimal list of descriptors
    • A distance between sounds can be computed
    • A global distance model requires a priori knowledge of the class of sounds and listeners

24
WP2.1.9. HOS for instrumental and sound FX matching
  • See Shlomo Dubnov presentation

25
WP2.1.10. Improving Fundamental Frequency
Estimation (I)
  • Monophonic F0 detector based on Two-Way Mismatch
    (Maher & Beauchamp, 1993); a simplified sketch
    follows this list
  • Integration into WP2.1.1 modules will use
    single-frame estimations; this may not be
    optimal, as context (previous F0, next F0,
    instrument, etc.) is not considered
  • Polyphonic F0 detector (Klapuri 2000) using
    bandwise processing
  • Intended mainly for polyphonic-monotimbral
    instruments or small ensembles, not for dense
    mixtures of sounds
  • Estimation of candidates is performed for each
    analysis frame; several candidates are obtained
  • The tracking of candidates is not yet implemented
  • Our main interest is in deriving a predominant F0 only

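A simplified sketch of the two-way mismatch idea: for each F0 candidate, combine the predicted-to-measured and measured-to-predicted frequency mismatch errors and keep the candidate with the smallest total error. The full Maher & Beauchamp procedure also weights each term by partial amplitude and frequency, which is omitted here.

    import numpy as np

    def twm_error(f0, peak_freqs, n_harm=10, rho=0.33):
        """Simplified two-way mismatch error for one F0 candidate (sketch)."""
        peaks = np.asarray(peak_freqs, dtype=float)
        harmonics = f0 * np.arange(1, n_harm + 1)
        # predicted-to-measured: each predicted harmonic vs. its closest measured peak
        err_pm = np.mean([np.min(np.abs(peaks - h)) / h for h in harmonics])
        # measured-to-predicted: each measured peak vs. its closest predicted harmonic
        err_mp = np.mean([np.min(np.abs(harmonics - p)) / p for p in peaks])
        return err_pm + rho * err_mp

    def estimate_f0(peak_freqs, candidates):
        """Pick the candidate F0 with the smallest two-way mismatch error."""
        errors = [twm_error(c, peak_freqs) for c in candidates]
        return candidates[int(np.argmin(errors))]
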
26
WP2.1.10. Improving Fundamental Frequency
Estimation (I)
27
WP2.1.11. Blind Temporal Segmentation
  • See Perfecto Herrera presentation

28
WP2.1 current status at M2
29
WP2.1 Future work
  • WP2.1.1. Add hierarchical classification
  • WP2.1.3. Fast Browsing Over Sound Archives
  • WP2.1.4. XML output
  • WP2.1.6. Textual Timbre Descriptors
  • WP2.1.7. Dynamic Sound Description
  • WP2.1.7.4. HMM/Viterbi model
  • WP2.1.8 ICA/ISA Signal Analysis
  • WP2.1.11 Blind temporal segmentation