Acoustic Vector Re-sampling for GMMSVM-Based Speaker Verification - PowerPoint PPT Presentation

About This Presentation
Title:

Acoustic Vector Re-sampling for GMMSVM-Based Speaker Verification

Description:

Acoustic Vector Re-sampling for GMMSVM-Based Speaker Verification Man-Wai MAK and Wei RAO The Hong Kong Polytechnic University enmwmak_at_polyu.edu.hk – PowerPoint PPT presentation

Slides: 43

Transcript and Presenter's Notes

Title: Acoustic Vector Re-sampling for GMMSVM-Based Speaker Verification


1
Acoustic Vector Re-sampling for GMMSVM-Based
Speaker Verification
  • Man-Wai MAK and Wei RAO
  • The Hong Kong Polytechnic University
  • enmwmak_at_polyu.edu.hk
  • http://www.eie.polyu.edu.hk/mwmak/

2
Outline
  • GMM-UBM for Speaker Verification
  • GMM-SVM for Speaker Verification
  • Data-Imbalance Problem in GMM-SVM
  • Utterance Partitioning for GMM-SVM
  • Experiments on NIST SRE

3
Speaker Verification
  • To verify the identity of a claimant based on
    his/her own voice

Is this Mary's voice?
I am Mary
4
Verification Process
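[Figure: block diagram of the verification process. The claimant says "I'm John". After feature extraction, the features are scored against John's model (built from John's voiceprint) and against an impostor model (built from impostors' voiceprints). The impostor score is subtracted from the speaker score, and score normalization and decision making compare the result with a decision threshold to accept or reject the claim.]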
5
Acoustic Features
  • Speech is a continuous evolution of the vocal
    tract
  • Need to extract a sequence of spectra or sequence
    of spectral coefficients
  • Use a sliding window - 25 ms window, 10 ms shift

[Figure: MFCC extraction. Log|X(ω)| → DCT → MFCC]
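The sliding-window analysis on this slide can be sketched as follows. This is a minimal illustration only: the function name `frame_signal`, the 8 kHz sampling rate, and the choice to drop trailing samples that do not fill a whole window are assumptions, not details from the slides.

```python
import numpy as np

def frame_signal(signal, sample_rate, win_ms=25.0, shift_ms=10.0):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    win_len = int(round(sample_rate * win_ms / 1000.0))   # samples per window
    shift = int(round(sample_rate * shift_ms / 1000.0))   # samples per shift
    signal = np.asarray(signal)
    if len(signal) < win_len:
        return np.empty((0, win_len))
    num_frames = 1 + (len(signal) - win_len) // shift
    # Row t of the index matrix holds the sample indices of frame t.
    idx = np.arange(win_len)[None, :] + shift * np.arange(num_frames)[:, None]
    return signal[idx]

# One second of audio at an assumed 8 kHz sampling rate.
frames = frame_signal(np.zeros(8000), sample_rate=8000)
print(frames.shape)
```

Each row would then be passed to a spectral analysis stage to produce one acoustic vector per 10 ms shift.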
6
GMM-UBM for Speaker Verification
  • The acoustic vectors (MFCCs) of speaker s are
    modeled by a probability density function parameterized
    by a set of mixture weights, mean vectors, and covariance matrices
  • This density is the Gaussian mixture model (GMM) for speaker s
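For reference, a standard way to write a speaker-dependent GMM is sketched below. The symbols (Λ for the parameter set, λ_i for mixture weights, M for the number of mixtures) are assumed notation and may differ from the slide's own.

```latex
\Lambda^{(s)} = \left\{ \lambda_i^{(s)}, \mu_i^{(s)}, \Sigma_i^{(s)} \right\}_{i=1}^{M},
\qquad
p\!\left(x \mid \Lambda^{(s)}\right) = \sum_{i=1}^{M} \lambda_i^{(s)} \,
  \mathcal{N}\!\left(x;\, \mu_i^{(s)}, \Sigma_i^{(s)}\right),
\qquad
\sum_{i=1}^{M} \lambda_i^{(s)} = 1
```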

7
GMM-UBM for Speaker Verification
  • The acoustic vectors of a general population are
    modeled by another GMM called the universal
    background model (UBM)
  • Parameters of the UBM

8
GMM-UBM for Speaker Verification
[Figure: the enrollment utterance X(s) of the client speaker and the universal background model are fed to MAP adaptation, which produces the client speaker model.]
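A minimal sketch of MAP adaptation is given below. It assumes mean-only adaptation with diagonal covariances, in the style of Reynolds et al. (2000); the slides do not state which parameters are adapted, and the function names are hypothetical. The default relevance factor of 16 is the value quoted later in the experimental setup.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log N(x_t; mu_i, diag(var_i)) for all frames t and mixtures i: shape (T, M)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (D * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Adapt the UBM means toward the enrollment frames X (T, D)."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)      # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)        # mixture posteriors, (T, M)
    n = post.sum(axis=0)                           # soft frame counts, (M,)
    E = (post.T @ X) / np.maximum(n, 1e-10)[:, None]   # posterior-weighted means
    alpha = (n / (n + relevance))[:, None]         # data-dependent mixing factor
    return alpha * E + (1.0 - alpha) * means

# Toy 1-D UBM with two mixtures; all enrollment frames sit at 5.0.
ubm_w = np.array([0.5, 0.5])
ubm_mu = np.array([[0.0], [4.0]])
ubm_var = np.array([[1.0], [1.0]])
adapted = map_adapt_means(np.full((100, 1), 5.0), ubm_w, ubm_mu, ubm_var)
print(adapted)
```

In the toy run, the mixture near the data moves toward 5.0, while the mixture that sees almost no data stays at its UBM mean.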
9
GMM-UBM Scoring
  • Two-class hypothesis problem
  • H0: MFCC sequence X(c) comes from the true
    speaker
  • H1: MFCC sequence X(c) comes from an impostor
  • Verification score is a likelihood ratio

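[Figure: GMM-UBM scoring. After feature extraction, the claimant's features are scored against the speaker model and the background model. The background-model score is subtracted from the speaker-model score, and the resulting score is passed to the decision stage.]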
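The likelihood-ratio score can be sketched in the log domain as below. This is an illustration under assumptions: diagonal-covariance GMMs, a per-frame average of the log-likelihood, and hypothetical function names; the slides do not specify these details.

```python
import numpy as np

def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of X (T, D) under a diagonal GMM."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    log_comp = (np.log(weights)[None, :]
                - 0.5 * (D * np.log(2 * np.pi)
                         + np.sum(np.log(variances), axis=1)[None, :]
                         + np.sum(diff ** 2 / variances[None, :, :], axis=2)))
    m = log_comp.max(axis=1, keepdims=True)        # log-sum-exp over mixtures
    frame_ll = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return frame_ll.mean()

def llr_score(X, speaker_gmm, ubm):
    """Log-likelihood ratio: speaker model (H0) versus background model (H1)."""
    return gmm_avg_loglik(X, *speaker_gmm) - gmm_avg_loglik(X, *ubm)

# Toy single-Gaussian models in one dimension.
ubm = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
spk = (np.array([1.0]), np.array([[2.0]]), np.array([[1.0]]))
print(llr_score(np.array([[2.0]]), spk, ubm))
```

A positive score favors H0 (true speaker) and a negative score favors H1 (impostor); the decision compares the score with a threshold.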
10
Outline
  • GMM-UBM for Speaker Verification
  • GMM-SVM for Speaker Verification
  • Data-Imbalance Problem in GMM-SVM
  • Acoustic Vector Resampling for GMM-SVM
  • Results on NIST SRE

11
GMM-SVM for Speaker Verification
[Figure: mapping an utterance to a GMM supervector. Feature extraction → MAP adaptation of the UBM → mean stacking → GMM supervector]
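Mean stacking can be illustrated with the short sketch below. The function name `stack_means` and the row-major stacking order are assumptions; the example dimensions (256 mixtures, 24-dimensional features) are taken from the experimental setup later in the deck.

```python
import numpy as np

def stack_means(adapted_means):
    """Concatenate M adapted mean vectors of shape (M, D) into one (M*D,) supervector."""
    return np.asarray(adapted_means).reshape(-1)

# 256 mixtures with 24-dimensional means give a 6144-dimensional supervector.
supervector = stack_means(np.zeros((256, 24)))
print(supervector.shape)
```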
12
GMM-SVM Scoring
[Figure: GMM-SVM scoring. Feature extraction and the UBM are used to compute the GMM supervector of target speaker s, the GMM supervectors of the background speakers, and the GMM supervector of claimant c. All of these feed the SVM scoring stage.]
13
GMM-UBM Scoring Vs. GMM-SVM Scoring
[Figure: scoring functions of GMM-UBM and GMM-SVM. The GMM-SVM score is computed from the normalized GMM supervector of the claimant's utterance and the normalized GMM supervector of the target speaker's utterance.]
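A hedged sketch of the two scoring rules in assumed notation (the symbols below are not taken from the slides). The kernel shown is the KL-divergence-based linear kernel of Campbell et al. (2006), which is one common way to normalize GMM supervectors; α_i are the SVM's Lagrange multipliers, d is its bias, and b_1, ..., b_B index the background speakers.

```latex
S_{\text{GMM-UBM}}\!\left(X^{(c)}\right)
  = \log p\!\left(X^{(c)} \mid \Lambda^{(s)}\right)
  - \log p\!\left(X^{(c)} \mid \Lambda^{\text{ubm}}\right)

S_{\text{GMM-SVM}}\!\left(X^{(c)}\right)
  = \alpha_0^{(s)} K\!\left(X^{(c)}, X^{(s)}\right)
  - \sum_{i=1}^{B} \alpha_i^{(s)} K\!\left(X^{(c)}, X^{(b_i)}\right)
  + d^{(s)}

K\!\left(X^{(c)}, X^{(s)}\right)
  = \sum_{j=1}^{M}
    \left( \sqrt{\lambda_j}\, \Sigma_j^{-1/2} \mu_j^{(c)} \right)^{\!\top}
    \left( \sqrt{\lambda_j}\, \Sigma_j^{-1/2} \mu_j^{(s)} \right)
```

Here λ_j and Σ_j are the UBM's mixture weights and covariances, and μ_j^(c), μ_j^(s) are the MAP-adapted means of the claimant's and target speaker's utterances.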
14
Outline
  • GMM-UBM for Speaker Verification
  • GMM-SVM for Speaker Verification
  • Data-Imbalance Problem in GMM-SVM
  • Utterance Partitioning for GMM-SVM
  • Results on NIST SRE

15
Data Imbalance in GMM-SVM
  • For each target speaker, we only have one
    utterance (GMM-supervector) from the target
    speaker and many utterances from the background
    speakers.
  • So, we have a highly imbalanced learning problem.

Only one training vector from the target speaker
16
Data Imbalance in GMM-SVM
Orientation of the decision boundary depends
mainly on impostor-class data
17
Data Imbalance in GMM-SVM
[Figure: a 3-dimensional two-class example illustrating that the SVM decision plane is largely governed by the impostor-class supervectors. Labels: Impostor Class; Speaker Class; region in which the target-speaker vector can be located without changing the orientation of the decision plane.]
18
Outline
  • GMM-UBM for Speaker Verification
  • GMM-SVM for Speaker Verification
  • Data-Imbalance Problem in GMM-SVM
  • Utterance Partitioning for GMM-SVM
  • Results on NIST SRE

19
Utterance Partitioning
  • Partition an enrollment utterance of a target
    speaker into a number of sub-utterances, with each
    sub-utterance producing one GMM supervector.
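Plain utterance partitioning can be sketched as below, assuming the utterance is already a (frames × dimensions) array of acoustic vectors. The function name is hypothetical, and near-equal contiguous segments are one reasonable choice rather than the authors' stated rule.

```python
import numpy as np

def partition_utterance(features, num_parts):
    """Split a (T, D) feature matrix into num_parts contiguous sub-utterances."""
    return np.array_split(features, num_parts, axis=0)

# A 10-frame utterance split into 4 sub-utterances.
parts = partition_utterance(np.zeros((10, 3)), 4)
print([len(p) for p in parts])
```

Each sub-utterance would then go through MAP adaptation and mean stacking to give one supervector.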

20
Utterance Partitioning
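[Figure: GMM-SVM training with utterance partitioning. The background speakers' utterances and the target speaker's enrollment utterance each go through feature extraction, then MAP adaptation (from the UBM) and mean stacking. The resulting supervectors are used for SVM training, which yields the SVM of target speaker s.]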
21
Length-Representation Trade-off
  • When the number of partitions increases, the
    length of each sub-utterance decreases.
  • If a sub-utterance is too short, its supervector
    will be almost the same as that of the UBM.

Supervector corresponding to the UBM
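The trade-off can be seen from the usual mean-only MAP update, sketched here in assumed notation (r is the relevance factor, n_i the soft count of frames assigned to mixture i, and E_i(X) the posterior-weighted mean of those frames). The slides do not show this formula, so treat it as an illustration.

```latex
\mu_i^{(s)} = \alpha_i \, E_i(X) + (1 - \alpha_i) \, \mu_i^{\text{ubm}},
\qquad
\alpha_i = \frac{n_i}{n_i + r},
\qquad
n_i = \sum_{t=1}^{T} \Pr(i \mid x_t)

% With r = 16: n_i = 64 gives alpha_i = 0.8, while n_i = 4 gives alpha_i = 0.2.
% As T shrinks, n_i -> 0, so alpha_i -> 0 and mu_i^{(s)} -> mu_i^{ubm}.
```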
22
Utterance Partitioning with Acoustic Vector
Resampling (UP-AVR)
Goal: Increase the number of sub-utterances
without compromising their representation power
Procedure of UP-AVR
  • 1. Randomly rearrange the sequence of acoustic
    vectors in an utterance
  • 2. Partition the acoustic vectors of the
    utterance into N segments
  • 3. Repeating Steps 1 and 2 R times yields
    RN + 1 target-speaker supervectors (RN from the
    sub-utterances plus one from the full utterance).

MFCC seq. before randomization
MFCC seq. after randomization
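The UP-AVR procedure can be sketched as follows. This is a minimal illustration under assumptions: the function name, the seeded random generator, and the near-equal split are hypothetical choices, and the full utterance is kept as the extra segment that brings the total to RN + 1.

```python
import numpy as np

def up_avr_segments(features, num_parts, num_resamplings, seed=0):
    """Return the full utterance plus R * N randomized sub-utterances."""
    rng = np.random.default_rng(seed)
    segments = [features]                       # the full utterance
    for _ in range(num_resamplings):
        # Step 1: randomly rearrange the order of the acoustic vectors.
        shuffled = features[rng.permutation(len(features))]
        # Step 2: partition the rearranged sequence into N segments.
        segments.extend(np.array_split(shuffled, num_parts, axis=0))
    return segments

# 20 frames of 2-dimensional features, N = 4 partitions, R = 3 resamplings.
feats = np.arange(40).reshape(20, 2)
segs = up_avr_segments(feats, num_parts=4, num_resamplings=3)
print(len(segs))
```

Each returned segment would be mapped to one supervector by MAP adaptation and mean stacking. Because a GMM ignores frame order, shuffling changes which frames fall into each segment without changing the frames themselves.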
23
Utterance Partitioning with Acoustic Vector
Resampling (UP-AVR)
[Figure: GMM-SVM training with UP-AVR. The target speaker's enrollment utterance and the background speakers' utterances each pass through feature extraction and index randomization, then MAP adaptation (from the UBM) and mean stacking. The resulting supervectors are used for SVM training, which yields the SVM of target speaker s.]
24
Utterance Partitioning with Acoustic Vector
Resampling (UP-AVR)
  • Characteristics of supervectors (SVs) created by UP-AVR
  • The average pairwise distance between sub-utterance SVs is
    larger than the average pairwise distance between
    sub-utterance SVs and the full-utterance SV.
  • The average pairwise distance between the speaker class's
    sub-utterance SVs and the impostor class's SVs is smaller
    than the average pairwise distance between the
    speaker class's full-utterance SV and the impostor class's
    SVs.

Impostor-class
Speaker-class
Sub-utt supervector
Full-utt supervector
25
Nuisance Attribute Projection
Nuisance Attribute Projection (NAP) [Solomonoff et
al., ICASSP 2005]
Goal: To reduce the effect of session variability
Recall the GMM-supervector kernel
Define the session- and speaker-dependent
supervector as
Remove the session-dependent part (h) by removing
the subspace that causes the session variability
Subspace representing session variability, defined
by V
The new kernel becomes
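A minimal sketch of the projection step, assuming the common NAP form P = I - V V^T, where the columns of V are orthonormal and span the session-variability subspace. The function name and the toy dimensions are hypothetical; the kernel after NAP is then the inner product of the projected supervectors.

```python
import numpy as np

def nap_project(supervectors, V):
    """Apply P = I - V V^T to each row; V is (dim, corank) with orthonormal columns."""
    S = np.atleast_2d(supervectors)
    # Subtract the component lying in the session-variability subspace,
    # without forming the (dim x dim) matrix P explicitly.
    return S - (S @ V) @ V.T

# Toy example: 10-dimensional supervectors and a 3-dimensional nuisance subspace.
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((10, 3)))   # orthonormal basis
x = rng.standard_normal((4, 10))
y = nap_project(x, V)
kernel_matrix = y @ y.T                             # linear kernel after NAP
print(kernel_matrix.shape)
```

Avoiding the explicit P matters at realistic sizes, where the supervector dimension is in the thousands.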
26
Nuisance Attribute Projection
Nuisance Attribute Projection (NAP) [Solomonoff et
al., ICASSP 2005]
Subspace representing session variability, defined
by V
27
Enrollment Process of GMM-SVM with UP-AVR
[Figure: enrollment pipeline. MFCCs of an utterance from target speaker s → resampling/partitioning → MAP adaptation (from the UBM) and mean stacking → session-dependent supervectors → NAP → session-independent supervectors → SVM training → SVM of target speaker s]
28
Verification Process of GMM-SVM with UP-AVR
[Figure: verification pipeline. MFCCs of a test utterance from claimant c → MAP adaptation (from the UBM) and mean stacking → session-dependent supervector → NAP → session-independent supervector → SVM scoring with the SVM of target speaker s → score → T-norm (using the T-norm models) → normalized score]
29
T-Norm (Auckenthaler, 2000)
Goal: To shift and scale the verification scores
so that a global decision threshold can be used
for all speakers
[Figure: T-norm. The test utterance is scored (SVM scoring) against T-norm SVM 1 through T-norm SVM R; the mean and standard deviation of these scores are computed and passed to a Z-norm block.]
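The shift-and-scale step can be sketched as below. The function name is hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption; `tnorm_scores` stands for the scores of the same test utterance against the R T-norm SVMs.

```python
import numpy as np

def t_norm(raw_score, tnorm_scores):
    """Normalize a score by the mean and standard deviation of the T-norm scores."""
    tnorm_scores = np.asarray(tnorm_scores, dtype=float)
    return (raw_score - tnorm_scores.mean()) / tnorm_scores.std()

# T-norm scores with mean 1 and standard deviation 1 map a raw score of 3 to 2.
print(t_norm(3.0, [0.0, 2.0]))
```

Because the statistics come from the test utterance itself, the normalized scores of different target speakers become comparable against one global threshold.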
30
Outline
  • GMM-UBM for Speaker Verification
  • GMM-SVM for Speaker Verification
  • Data-Imbalance Problem in GMM-SVM
  • Utterance Partitioning for GMM-SVM
  • Experiments on NIST SRE

31
Experiments
Speech Data
  • Evaluations on NIST SRE 2002 and 2004
  • NIST SRE 2002
  • Use NIST01 for computing the UBMs,
    impostor-class supervectors of SVMs, Tnorm
    models, and NAP parameters
  • 2983 true-speaker trials and 36287 impostor
    attempts
  • 2-min utterances for training and about 1-min
    utterances for testing
  • NIST SRE 2004
  • Use the Fisher corpus for computing UBMs,
    impostor-class supervectors of SVMs, and Tnorm
    models
  • NIST99 and NIST00 for computing NAP parameters
  • 2386 true-speaker trials and 23838 impostor
    attempts
  • 5-min utterances for training and testing

32
Experiments
Features and Models
  • 12 MFCCs + 12 ΔMFCCs with feature warping
  • 1024-mixture GMMs for GMM-UBM
  • 256-mixture GMMs for GMM-SVM
  • MAP relevance factor: 16
  • 300 impostor-class supervectors for GMM-SVM
  • 200 T-norm models
  • 64-dim session variability subspace (NAP corank,
    rank of V)
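As a quick sanity check of the sizes implied by this setup, assuming the features are the 12 MFCCs concatenated with their 12 delta coefficients and that the supervector stacks all 256 mixture means (D, M, and the shape of V are assumed notation):

```latex
D = 12 + 12 = 24,
\qquad
\dim(\text{supervector}) = M \times D = 256 \times 24 = 6144,
\qquad
V \in \mathbb{R}^{6144 \times 64}
```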

33
Results
No. of mixtures in GMM-SVM (NIST02)
[Figure annotations: threshold below which the variances of features are deemed too small; normalized; large number of features with small variance]
34
Results
Effects of NAP on Different NIST SRE
Large eigenvalues mean large session variation
35
Results
Effect of NAP Corank on Performance
No NAP
36
Results
Comparing discriminative power of GMM-SVM and
GMM-SVM with UP-AVR
37
Results
EER and MinDCF vs. No. of Target-Speaker
Supervectors
NIST02
38
Results
Varying the number of resampling (R) and number
of partitions (N)
NIST02
39
Results
NIST02
40
Experiments and Results
Performance on NIST02
EER = 9.39%
EER = 9.05%
EER = 8.16%
41
Experiments and Results
Performance on NIST04
GMM-UBM: EER = 16.05%
GMM-SVM: EER = 10.42%
GMM-SVM w/ UP-AVR: EER = 9.46%
42
References
  • S.X. Zhang and M.W. Mak, "Optimized Discriminative
    Kernel for SVM Scoring and its Application to
    Speaker Verification", IEEE Trans. on Neural
    Networks, to appear.
  • M.W. Mak and W. Rao, "Utterance Partitioning with
    Acoustic Vector Resampling for GMM-SVM Speaker
    Verification", Speech Communication, vol. 53, no. 1,
    Jan. 2011, pp. 119-130.
  • M.W. Mak and W. Rao, "Acoustic Vector Resampling
    for GMMSVM-Based Speaker Verification",
    Interspeech 2010, Sept. 2010, Makuhari, Japan,
    pp. 1449-1452.
  • S.Y. Kung, M.W. Mak, and S.H. Lin, "Biometric
    Authentication: A Machine Learning Approach",
    Prentice Hall, 2005.
  • W.M. Campbell, D.E. Sturim, and D.A. Reynolds,
    "Support vector machines using GMM supervectors
    for speaker verification", IEEE Signal Processing
    Letters, vol. 13, pp. 308-311, 2006.
  • D.A. Reynolds, T.F. Quatieri, and R.B. Dunn,
    "Speaker verification using adapted Gaussian
    mixture models", Digital Signal Processing, vol.
    10, pp. 19-41, 2000.