Phoneme-Based Speaker Verification using Adapted Gaussian Mixture Models

1
Phoneme-Based Speaker Verification using Adapted Gaussian Mixture Models
M.Sc. thesis by Yossi Bar-Yosef
Supervised by Prof. Yuval Bistritz
2
Outline
  • Speaker Verification - Introduction
  • GMM and Adapted GMM
  • Phoneme-Based SV Configurations
  • Results
  • Phoneme Selection
  • Conclusion

3
Introduction to SV
  • Enrollment Phase

Enrollment speech for each speaker
Trained Model for each speaker
4
Introduction to SV
  • Verification Phase

Likelihood Ratio Test (LRT) between two hypotheses:
X is from the claimed speaker, or X is not from the claimed speaker.
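
The decision rule can be sketched as follows. This is a minimal illustration, assuming the common log-domain form of the test: the score is the average per-frame log-likelihood of X under the claimed speaker's model minus that under a background model, and the claim is accepted when the score reaches a threshold. The name `lrt_decision`, the per-frame averaging, and the default threshold are illustrative assumptions, not taken from the slides.

```python
def lrt_decision(loglik_speaker, loglik_background, threshold=0.0):
    """Accept the identity claim if the average log-likelihood ratio reaches the threshold.

    loglik_speaker / loglik_background: per-frame log-likelihoods of the same
    utterance X under the claimed speaker's model and under the background model.
    Returns (score, accepted).
    """
    if len(loglik_speaker) != len(loglik_background) or not loglik_speaker:
        raise ValueError("need two non-empty sequences of equal length")
    # Average log-likelihood ratio over all frames of the utterance.
    score = sum(s - b for s, b in zip(loglik_speaker, loglik_background)) / len(loglik_speaker)
    return score, score >= threshold
```

Raising the threshold trades false acceptances for false rejections, which is what the ROC on the next slide visualizes.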
5
Introduction to SV
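  • Receiver Operating Characteristic (ROC) curve

[Figure: ROC curve plotting the False Acceptance rate against the False Rejection rate; the Equal Error Rate (EER) is the point where the two rates are equal.]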
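
As a rough sketch of how the EER can be estimated from trial scores: sweep the decision threshold over the observed scores and take the point where the false acceptance and false rejection rates are closest. The function names and the convention "accept if score >= threshold" are assumptions made for this illustration.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) at one threshold, accepting when score >= threshold."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER as the operating point where |FAR - FRR| is smallest.

    genuine: scores of true-speaker trials; impostor: scores of impostor trials.
    Returns (eer_estimate, threshold).
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]
```

With few trials the two rates may never be exactly equal, so the result is only an approximation of the crossing point.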
6
GMM
  • Gaussian Mixture Models (GMM)
  • For x, a D-dimensional feature vector, GMM is a
    weighted sum of M Gaussian densities
  • The GMM is denoted as
  • The log-likelihood of a sequence of T feature
    vectors,
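
For reference, the three quantities named above are commonly written as follows. These are the standard textbook forms rather than a transcription of the slide; the symbols w_i, \mu_i, \Sigma_i, \lambda and X = \{x_1, ..., x_T\} follow the usual convention and are assumed here, as is the independence of frames in the last line.

```latex
\begin{align}
  % Mixture density: weighted sum of M Gaussian densities
  p(x \mid \lambda) &= \sum_{i=1}^{M} w_i \, \mathcal{N}(x;\, \mu_i, \Sigma_i),
  \qquad \sum_{i=1}^{M} w_i = 1, \\
  % Each component is a D-variate Gaussian
  \mathcal{N}(x;\, \mu_i, \Sigma_i) &=
  \frac{1}{(2\pi)^{D/2} \, \lvert \Sigma_i \rvert^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i) \right), \\
  % The model is denoted by its parameter set
  \lambda &= \{ w_i, \mu_i, \Sigma_i \}, \qquad i = 1, \dots, M, \\
  % Log-likelihood of a sequence of T feature vectors
  \log p(X \mid \lambda) &= \sum_{t=1}^{T} \log p(x_t \mid \lambda).
\end{align}
```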

7
GMM
  • Expectation-Maximization (EM)
  • An iterative algorithm for refining the model
    parameters to increase the likelihood
  • Expectation
  • Maximization
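
The two steps can be illustrated with a single EM iteration. The sketch below assumes a one-dimensional GMM to keep it short; `em_step`, `gaussian_pdf`, and the variance floor are illustrative choices, not the thesis implementation.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances, var_floor=1e-6):
    """One EM iteration for a 1-D GMM.

    Returns the updated (weights, means, variances) and the log-likelihood
    of the data under the parameters that were passed in.
    """
    m = len(weights)
    counts = [0.0] * m   # soft counts n_i
    sum_x = [0.0] * m    # posterior-weighted sums of x
    sum_x2 = [0.0] * m   # posterior-weighted sums of x^2
    loglik = 0.0
    for x in data:
        # Expectation: posterior probability of each component given x.
        joint = [weights[i] * gaussian_pdf(x, means[i], variances[i]) for i in range(m)]
        total = sum(joint)
        loglik += math.log(total)
        for i in range(m):
            post = joint[i] / total
            counts[i] += post
            sum_x[i] += post * x
            sum_x2[i] += post * x * x
    # Maximization: re-estimate the parameters from the soft statistics.
    new_weights = [c / len(data) for c in counts]
    new_means = [sum_x[i] / counts[i] for i in range(m)]
    new_vars = [max(sum_x2[i] / counts[i] - new_means[i] ** 2, var_floor) for i in range(m)]
    return new_weights, new_means, new_vars, loglik
```

Calling `em_step` repeatedly should give a non-decreasing log-likelihood. The sketch does not guard against a component that receives no data, or against a frame whose density underflows to zero under every component.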

8
Adapted GMM
  • Adaptation of a GMM
  • Adjusts the current model parameters to new
    data.
  • Expectation step is identical to EM
  • Combination

9
Adapted GMM
  • Adaptation of a GMM
  • The adaptation coefficients depend on the data

r is the relevance factor
[Figure labels: New Training Data, Adapted GMM, GMM]
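
A minimal sketch of such an adaptation, assuming the widely used relevance-MAP form in which each component's coefficient is alpha_i = n_i / (n_i + r), with n_i the soft count of new data assigned to component i. The slide equations are not in the transcript, so this form, the restriction to means only, the one-dimensional model, and the default r = 16 are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def adapt_means(data, weights, means, variances, relevance=16.0):
    """Mean-only adaptation of a 1-D GMM toward new data (relevance must be > 0)."""
    m = len(weights)
    counts = [0.0] * m
    sum_x = [0.0] * m
    for x in data:
        # Expectation step, identical to EM: component posteriors for x.
        joint = [weights[i] * gaussian_pdf(x, means[i], variances[i]) for i in range(m)]
        total = sum(joint)
        for i in range(m):
            post = joint[i] / total
            counts[i] += post
            sum_x[i] += post * x
    adapted = []
    for i in range(m):
        # Data-dependent coefficient: more data for a component means more trust in it.
        alpha = counts[i] / (counts[i] + relevance)
        data_mean = sum_x[i] / counts[i] if counts[i] > 0 else means[i]
        # Combination: interpolate between the new-data mean and the prior mean.
        adapted.append(alpha * data_mean + (1.0 - alpha) * means[i])
    return adapted
```

Components that see little or no new data keep their prior means, which is why adaptation from a well-trained model can work with short enrollment speech.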
10
Phoneme-Based SV Configurations
  • Basic Structure

[Figure: basic structure; phoneme labels shown: r, ah, n]
11
Phoneme-Based SV Configurations
  • Models Categorization

12
Phoneme-Based SV Configurations
  • Possible Training Paths

13
Phoneme-Based SV Configurations
  • Training Configurations

[Figure: training configurations Cfg1 to Cfg5, drawn as paths among the UBM, SM, PDUBM, and PDSM models]
14
Results
  • Benchmark

[Figure: benchmark UBM-SM system (UBM, SM; World Bkg); panels: Clean Speech, Telephone Speech]
15
Results
  • Cfg1

[Figure: Cfg1 results; models: UBM, SM, PDUBM, PDSM; panels: Clean Speech, Telephone Speech]
16
Results
  • Cfg2

[Figure: Cfg2 results; models: UBM, SM, PDUBM, PDSM; panels: Clean Speech, Telephone Speech]
17
Results
  • Cfg3

[Figure: Cfg3 results; models: PDUBM, PDSM; panels: Clean Speech, Telephone Speech]
18
Results
  • Cfg4

[Figure: Cfg4 results; models: UBM, SM, PDUBM, PDSM; panels: Clean Speech, Telephone Speech]
19
Results
  • Cfg5

[Figure: Cfg5 results; models: UBM, SM, PDUBM, PDSM; panels: Clean Speech, Telephone Speech]
20
Results
  • Configurations Comparison

Clean Speech
20 sec training speech
21
Results
  • Configurations Comparison

Telephone Speech
20 sec training speech
22
Results
  • Configurations Comparison

Clean Speech
10 sec training speech
23
Results
  • Configurations Comparison

Telephone Speech
10 sec training speech
24
Results
  • Configurations Comparison

25
Results
  • Configurations Comparison

26
Phoneme Selection
  • Phonetic Symbols

27
Phoneme Selection
  • Performance of Phoneme
  • Clean speech
  • 20 sec training

28
Phoneme Selection
  • Performance of Phoneme
  • Telephone speech
  • 20 sec training

29
Phoneme Selection
  • Selection Methods

Phoneme set of size N = 38
  • Optimal Subset Selection

Required number of evaluations: 2^N = 274,877,906,944
  • N-Best

Requires N evaluations, N = 38
  • Knockout Rejection

Required number of evaluations: N(N+1)/2 - 1 = 740
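
The evaluation counts can be checked directly, and the Knockout Rejection count matches a backward elimination that starts from all N phonemes and, in each round, drops the phoneme whose removal leaves the best-performing subset (N + (N-1) + ... + 2 evaluations). The sketch below assumes that reading of the procedure; the slides do not spell it out. The `evaluate` callback, which returns an error measure such as the EER for a phoneme subset, is hypothetical.

```python
def evaluation_counts(n):
    """Evaluations needed by each selection method for a set of n phonemes."""
    return {
        "optimal_subset": 2 ** n,
        "n_best": n,
        "knockout": n * (n + 1) // 2 - 1,
    }


def knockout_rejection(phonemes, evaluate):
    """Backward elimination over a phoneme set.

    In each round, every remaining phoneme is tentatively removed and the
    reduced subset is scored with evaluate(subset) (lower is better); the
    phoneme whose removal gives the lowest error is knocked out.
    Returns (order, calls): phonemes in knockout order with the final
    survivor last, and the number of evaluate() calls made.
    """
    remaining = list(phonemes)
    knocked_out = []
    calls = 0
    while len(remaining) > 1:
        best_error, best_index = None, None
        for i in range(len(remaining)):
            subset = remaining[:i] + remaining[i + 1:]
            error = evaluate(subset)
            calls += 1
            if best_error is None or error < best_error:
                best_error, best_index = error, i
        knocked_out.append(remaining.pop(best_index))
    knocked_out.extend(remaining)
    return knocked_out, calls
```

Reading the returned order from the end gives a ranking from which a subset of any size can be taken, at a cost of 740 evaluations for N = 38 instead of 2^38.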
30
Phoneme Selection
  • Results

Clean speech
[Figure: UBM-SM, order 128 (UBM, SM) and Cfg5, order 128 (UBM, SM, PDUBM, PDSM)]
31
Phoneme Selection
  • Results

Clean speech
[Figure: Cfg4, order 128 (UBM, SM, PDUBM, PDSM) and Cfg3, order 8 (UBM, SM, PDUBM, PDSM)]
32
Phoneme Selection
  • Results

Telephone speech
[Figure: UBM-SM, order 128 (UBM, SM) and Cfg5, order 128 (UBM, SM, PDUBM, PDSM)]
33
Phoneme Selection
  • Results

Telephone speech
[Figure: Cfg4, order 128 (UBM, SM, PDUBM, PDSM) and Cfg3, order 18 (UBM, SM, PDUBM, PDSM)]
34
Phoneme Selection
  • Results

Clean Speech
Telephone Speech
35
Phoneme Selection
  • Results

[Figure legends: UBM, SM; and UBM, SM, PDUBM, PDSM]
36
Conclusion
  • The coupling principle in model generation and
    scoring
  • Adaptation of a well-trained speaker model
  • Lower storage-cost system
  • Phoneme Selection by the Knockout Rejection
    procedure