1
Singer Similarity: A Brief Literature Review
  • Catherine Lai
  • MUMT-611 MIR
  • March 24, 2005

2
Outline of Presentation
  • Introduction
  • Motivation
  • Related research
  • Recent publications
  • Kim & Whitman, 2002
  • Liu & Huang, 2002
  • Tsai, Wang, Rodgers, Cheng & Yu, 2003
  • Bartsch & Wakefield, 2004
  • Discussion
  • Conclusion

3
Introduction
  • Motivation
  • Multitude of audio files circulating on the
    Internet
  • Replace human documentation efforts and organize
    collections of music recordings automatically
  • Singer identification is relatively easy for
    humans but not for machines
  • Related Research
  • Speaker identification
  • Musical instrument identification

4
Kim & Whitman, 2002
  • Singer Identification in Popular Music
    Recordings Using Voice Coding Features (MIT
    Media Lab)
  • Automatically establish the identity of the
    singer using acoustic features extracted from
    songs in a database of pop music
  • Perform segmentation of vocal regions prior to
    singer identification
  • Classifier uses features drawn from voice coding
    based on Linear Predictive Coding (LPC)
  • Good at highlighting formant locations
  • Regions of resonance are perceptually
    significant

5
Kim & Whitman, 2002: Detection of Vocal Regions
  • To detect regions of singing, detect energy
    within the frequencies bounded by the range of
    vocal energy
  • Filter the audio signal with a band-pass filter
  • Used a Chebyshev IIR digital filter of order 12
  • Attenuates other instruments that fall outside
    the vocal range, e.g. bass and cymbals
  • Voice is not the only instrument remaining in
    the region
  • To discriminate the other sounds, e.g. drums,
    use a measure of harmonicity
  • A vocal segment is 90% voiced, i.e. highly
    harmonic
  • Measure the harmonicity of the filtered signal
    within an analysis frame and threshold it
    against a fixed value
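The detection scheme above can be sketched in Python with SciPy. Only the filter family (a Chebyshev IIR filter of order 12) comes from the slides; the Type I variant, the band edges, sample rate, frame size, and the harmonicity threshold below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import cheby1, lfilter

FS = 16000  # sample rate (Hz); assumed, not from the paper

def vocal_bandpass(x, lo=200.0, hi=2000.0, fs=FS):
    # An order-6 Chebyshev Type I design becomes an order-12 IIR
    # filter after the band-pass transform, matching the order-12
    # filter mentioned in the slides (Type I is an assumption).
    b, a = cheby1(6, 1.0, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
    return lfilter(b, a, x)

def harmonicity(frame, fs=FS, f_lo=80.0, f_hi=400.0):
    # Crude harmonicity proxy: the peak of the normalized
    # autocorrelation over plausible sung-pitch lags.
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return 0.0
    ac = ac / ac[0]
    lo, hi = int(fs / f_hi), int(fs / f_lo)
    return float(ac[lo:hi].max())

def is_vocal(frame, threshold=0.7):
    # Threshold the harmonicity against a fixed value, as in the
    # slides; 0.7 is an illustrative choice.
    return harmonicity(frame) >= threshold

# A harmonic tone (220 Hz plus overtones) scores high; white noise low.
t = np.arange(FS) / FS
tone = sum(np.sin(2 * np.pi * 220 * k * t) for k in range(1, 5))
noise = np.random.default_rng(0).standard_normal(FS)
print(is_vocal(vocal_bandpass(tone)[2000:3024]))
print(is_vocal(vocal_bandpass(noise)[2000:3024]))
```

The autocorrelation peak is only one of several plausible harmonicity measures; the slides do not say which one the authors used.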

6
Kim & Whitman, 2002: Feature Extraction
  • 12-pole LP analysis, based on the general
    principle behind LPC for speech, used for
    feature extraction
  • LP analysis performed on linear and warped
    frequency scales
  • The linear scale treats all frequencies equally
  • Human ears are not equally sensitive to all
    frequencies
  • The warping function closely follows the Bark
    scale, which approximates the frequency
    sensitivity of human hearing
  • The warped analysis is better at capturing
    formant locations at lower frequencies
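The 12-pole LP analysis can be sketched with the standard autocorrelation method and Levinson-Durbin recursion. This is generic LPC, not the authors' exact pipeline, and the Bark-scale frequency warping step is omitted.

```python
import numpy as np

def lpc(frame, order=12):
    # Linear-prediction coefficients via the autocorrelation
    # method and the Levinson-Durbin recursion. Returns
    # a = [1, a1, ..., a_order] such that the predictor is
    # x[n] ~ -(a1*x[n-1] + ... + a_order*x[n-order]).
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for this model order
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1 : i + 1] = a[1 : i + 1] + k * a[i - 1::-1][:i]
        err *= 1.0 - k * k
    return a

# Sanity check on a synthetic AR(2) signal
# x[n] = 0.75*x[n-1] - 0.5*x[n-2] + e[n]:
# order-2 LPC should recover approximately [1, -0.75, 0.5].
rng = np.random.default_rng(1)
x = np.zeros(100000)
e = rng.standard_normal(len(x))
for n in range(2, len(x)):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + e[n]
print(lpc(x, order=2))
```

In practice the frame would first be windowed and pre-emphasized, and for the warped variant the autocorrelation would be computed on an all-pass-warped signal.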

7
Kim & Whitman, 2002: Experiments
  • Data set includes 17 different singers and 200
    songs
  • 2 classifiers, Gaussian Mixture Model (GMM) and
    SVM, used on 3 different feature sets
  • Linear-scaled, warped-scaled, and combined
    linear and warped data
  • Run on entire songs and on segments classified
    as vocal-only

8
Kim & Whitman, 2002: Results
  • Linear frequency features tend to outperform
    warped frequency features when each is used
    alone; the combination is best
  • Song and frame accuracy increases when using
    only vocal segments with the GMM
  • Song and frame accuracy decreases when using
    only vocal segments with the SVM

(Results table from Kim & Whitman, 2002)
9
Kim & Whitman, 2002: Discussion and Future Work
  • The better performance of linear-scale features
    over warped-scale features indicates
  • The machine benefits from the increased
    resolution of the linear scale at higher
    frequencies
  • Contrary to the human auditory system
  • The decreased performance of the SVM on vocal
    segments is puzzling
  • Possibly exploiting aspects of the features not
    specifically related to voice
  • Add high-level musical knowledge to the system
  • Attempt to identify song structure, such as
    locating verses or choruses
  • Higher probability of vocals in these sections

10
Liu & Huang, 2002
  • A Singer Identification Technique for
    Content-Based Classification of MP3 Music
    Objects
  • Automatically classify MP3 music objects
    according to singer
  • Major steps
  • Coefficients extracted from the compressed raw
    data are used to compute MP3 features for
    segmentation
  • These features are used to segment MP3 objects
    into a sequence of notes or phonemes
  • Waveform of two phonemes (figure)
  • For each MP3 phoneme in the training set, its
    MP3 features are extracted and stored with its
    associated singer in a phoneme database
  • Phonemes in the phoneme database are used as
    discriminators in an MP3 classifier to identify
    the singers of unknown MP3 objects

(Figure from Liu & Huang, 2002)
11
Liu & Huang, 2002: Classification
  • The number of different phonemes a singer can
    sing is limited, and singers with different
    timbres possess unique phoneme sets
  • Phonemes of an unknown MP3 song can be
    associated with similar phonemes of the same
    singer in the phoneme database
  • kNN classifier used for classification
  • Each unknown MP3 song is first segmented into
    phonemes
  • The first N phonemes are used and compared with
    every discriminator in the phoneme database
  • The k closest neighbors are found
  • For each of the k closest neighbors,
  • If its distance is within a threshold, a
    weighted vote is given
  • The k·N weighted votes are accumulated per
    singer
  • The unknown MP3 song is assigned to the singer
    with the largest score
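The voting scheme above can be sketched as follows. The Euclidean distance and the 1/(1+d) vote weight are illustrative assumptions; the slides only state that neighbors within a distance threshold cast weighted votes.

```python
import numpy as np

def classify_song(phonemes, db_feats, db_singers, k=5, threshold=2.0):
    # Weighted kNN vote over the first N phonemes of an unknown
    # song, in the spirit of Liu & Huang (2002): each query phoneme
    # casts up to k weighted votes, votes are accumulated per
    # singer, and the singer with the largest score wins.
    #   phonemes   : (N, d) feature vectors of the unknown song
    #   db_feats   : (M, d) discriminator feature vectors
    #   db_singers : length-M list of singer labels
    scores = {}
    for q in phonemes:
        d = np.linalg.norm(db_feats - q, axis=1)  # distances to all discriminators
        for j in np.argsort(d)[:k]:               # k closest neighbors
            if d[j] <= threshold:                 # vote only within the threshold
                w = 1.0 / (1.0 + d[j])            # assumed weighting, not from the paper
                scores[db_singers[j]] = scores.get(db_singers[j], 0.0) + w
    return max(scores, key=scores.get) if scores else None

# Toy phoneme DB: singer A's phonemes near (0, 0), singer B's near (5, 5).
rng = np.random.default_rng(0)
db = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels = ["A"] * 20 + ["B"] * 20
query = rng.normal(0, 0.3, (6, 2))  # unknown song drawn from singer A's region
print(classify_song(query, db, labels))
```

Real MP3 features would be higher-dimensional, but the vote-accumulation logic is unchanged.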

12
Liu & Huang, 2002: Experiments
  • Data set consists of 10 male and 10 female
    Chinese singers, each with 30 songs
  • 3 factors dominate the results of the MP3 music
    classification method
  • Setting of k in the kNN classifier (best k = 80,
    resulting in a 90% precision rate)
  • Threshold for the vote decision used by the
    discriminator (best threshold = 0.2)
  • Number of singers allowed in a music class (the
    larger the number, the higher the precision)
  • Allowing more than 1 singer in a music class,
    i.e. grouping several singers with similar
    voices, provides the ability to find songs with
    singers of similar voices

13
Liu & Huang, 2002: Results and Future Work
  • Results were within expectation
  • Songs sung by a singer with a very unique style
    resulted in the highest precision rate (over
    90%)
  • Songs sung by a singer with a common voice
    resulted in only 50% precision
  • Future work: use more music features
  • Pitch, melody, rhythm, and harmonicity for music
    classification
  • Represent MP3 features according to the syntax
    and semantics of the MPEG-7 standard

(Table from Liu & Huang, 2002)
14
Tsai et al., 2003
  • Blind Clustering of Popular Music Recordings
    Based on Singer Voice Characteristics (ISMIR)
  • Technique for automatically clustering
    undocumented music recordings based on the
    associated singers, given no information about
    the singers or the singer population
  • Clustering method based on the singer's voice
    rather than background music, genre, or other
    attributes
  • 3-stage process proposed
  • Segmentation of each recording into
    vocal/non-vocal segments
  • Suppression of the background characteristics
    in the vocal segments
  • Clustering of the recordings based on singer
    characteristic similarity

15
Tsai et al., 2003: Classification
  • Classifier for vocal/non-vocal segmentation
  • Front-end signal processor converts the digital
    waveform into spectrum-based feature vectors
  • Back-end statistical processor performs
    modeling, matching, and decision making

16
Tsai et al., 2003: Classification
  • Classifier operates in 2 phases: training and
    testing
  • During the training phase, a music database with
    manual vocal/non-vocal transcriptions is used to
    form two separate GMMs: a vocal GMM and a
    non-vocal GMM
  • In the testing phase, the recognizer takes as
    input feature vectors extracted from an unknown
    recording and produces as output the frame
    log-likelihoods for the vocal GMM and the
    non-vocal GMM
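With scikit-learn's GaussianMixture as a stand-in for the authors' GMM implementation, the two phases can be sketched like this. The 2-D toy features and mixture sizes are illustrative assumptions, not the paper's spectrum-based features.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy stand-ins for spectrum-based feature vectors with manual
# vocal/non-vocal transcriptions (2-D instead of real spectra).
vocal_train = rng.normal([2.0, 2.0], 0.5, (300, 2))
nonvocal_train = rng.normal([-2.0, -2.0], 0.5, (300, 2))

# Training phase: one GMM per class.
vocal_gmm = GaussianMixture(n_components=4, random_state=0).fit(vocal_train)
nonvocal_gmm = GaussianMixture(n_components=4, random_state=0).fit(nonvocal_train)

# Testing phase: per-frame log-likelihoods under each model; the
# frame-based rule labels each frame by the larger likelihood.
test_frames = np.vstack([rng.normal([2.0, 2.0], 0.5, (5, 2)),
                         rng.normal([-2.0, -2.0], 0.5, (5, 2))])
ll_vocal = vocal_gmm.score_samples(test_frames)
ll_nonvocal = nonvocal_gmm.score_samples(test_frames)
decisions = ll_vocal > ll_nonvocal
print(decisions)
```

`score_samples` returns the per-sample log-likelihood, which matches the frame log-likelihoods described on this slide.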

17
Tsai et al., 2003: Classification
  • Block diagram

(Block diagram from Tsai et al., 2003)
18
Tsai et al., 2003: Decision Rules
  • Decision for each frame made according to one of
    three decision rules: 1. frame-based, 2.
    fixed-length-segment-based, and 3.
    homogeneous-segment-based

(Figure from Tsai et al., 2003: each rule assigns a single classification per segment)
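The fixed-length-segment-based rule can be sketched by pooling per-frame log-likelihood ratios over each segment and assigning one label per segment. The segment length and the toy likelihood values are illustrative assumptions.

```python
import numpy as np

def segment_decisions(ll_vocal, ll_nonvocal, seg_len=10):
    # Fixed-length-segment-based rule: sum the per-frame
    # log-likelihood ratios over each segment of seg_len frames
    # and assign a single vocal/non-vocal label per segment.
    llr = np.asarray(ll_vocal) - np.asarray(ll_nonvocal)
    n_seg = len(llr) // seg_len
    labels = []
    for s in range(n_seg):
        seg = llr[s * seg_len:(s + 1) * seg_len]
        labels.append("vocal" if seg.sum() > 0 else "non-vocal")
    return labels

# A noisy run of mostly-vocal frames followed by mostly-non-vocal
# frames: individual frame decisions could flip, but each segment
# still gets one stable label.
rng = np.random.default_rng(0)
llv = np.concatenate([rng.normal(3.0, 1.0, 10), rng.normal(-3.0, 1.0, 10)])
lln = np.zeros(20)
print(segment_decisions(llv, lln, seg_len=10))
```

Pooling evidence over a segment is what makes the segment-based rules more robust than per-frame decisions.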
19
Tsai et al., 2003: Singer Characteristic Modeling
  • Characteristics of the voice must be modeled to
    cluster recordings
  • V = {v1, v2, v3, ...}, the feature vectors from
    a vocal region, is a mixture of
  • solo feature vectors S = {s1, s2, s3, ...}
  • background accompaniment feature vectors
    B = {b1, b2, b3, ...}
  • S and B are unobservable
  • B can be approximated from the non-vocal
    segments
  • S is subsequently estimated given V and B
  • A solo model and a background music model are
    generated for each recording to be clustered

20
Tsai et al., 2003: Clustering
  • Each recording is evaluated against each
    singer's solo model
  • The log-likelihood of the vocal portion of one
    recording tested against one solo model is
    computed (for all solo models)
  • K-means algorithm used for clustering
  • Starts with a single cluster and recursively
    splits clusters
  • Bayesian Information Criterion (BIC) employed to
    decide the best value of k
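The model-selection step can be sketched with scikit-learn. Here Gaussian mixtures stand in for the k-means partitions because `GaussianMixture` exposes a `bic()` score directly; this is a simplification of the authors' recursive-splitting procedure, and the toy 2-D features are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy "recordings": per-recording feature summaries for 3
# underlying singers (2-D stand-ins for real solo-model scores).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in centers])

# Fit a model per candidate k and keep the one with the lowest
# Bayesian Information Criterion.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print(best_k)  # BIC should recover the 3 well-separated clusters
```

BIC trades goodness of fit against model complexity, which is what lets the recursive splitting stop at a sensible number of singer clusters.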

21
Tsai et al., 2003: Experiments
  • Data set consists of 416 tracks from Mandarin
    pop music CDs
  • Experiments run to validate the vocal/non-vocal
    segmentation method
  • Best accuracy achieved was 78% using the
    homogeneous-segment-based method

22
Tsai et al., 2003: Results
  • System evaluated on the basis of average cluster
    purity
  • When k = the singer population, the highest
    purity was 0.77

(Table from Tsai et al., 2003)
23
Tsai et al., 2003: Future Work
  • Test the method on a wider variety of data
  • Larger singer population
  • Richer set of songs from different genres

24
Discussion and Conclusion
  • Singer similarity techniques can be used to
  • Automatically organize a collection of music
    recordings based on the lead singer
  • Label guest performers, information usually
    omitted in music databases
  • Replace human documentation efforts
  • Extend to handle duets, choruses, background
    vocals, and other musical data with multiple
    simultaneous or non-simultaneous singers
  • Rock band songs with parts sung by the
    guitarist, drummer, or other band members can be
    identified

25
Bibliography
  • Bartsch, M., and G. Wakefield (2004). Singing
    voice identification using spectral envelope
    estimation. IEEE Transactions on Speech and
    Audio Processing, vol. 12, no. 2, 100-109.
  • Kim, Y., and B. Whitman (2002). Singer
    identification in popular music recordings using
    voice coding features. In Proceedings of the
    2002 International Symposium on Music
    Information Retrieval.
  • Liu, C., and C. Huang (2002). A singer
    identification technique for content-based
    classification of MP3 music objects. In
    Proceedings of the 2002 Conference on
    Information and Knowledge Management (CIKM),
    438-445.
  • Tsai, W., H. Wang, D. Rodgers, S. Cheng, and H.
    Yu (2003). Blind clustering of popular music
    recordings based on singer voice
    characteristics. In Proceedings of the 2003
    International Symposium on Music Information
    Retrieval.
