1
Variational Bayesian Methods for Audio Indexing
  • Fabio Valente, Christian Wellekens
  • Institut Eurecom

2
Outline
  • Generalities on speaker clustering
  • Model selection/BIC
  • Variational learning
  • Variational model selection
  • Results

3
Speaker clustering
  • Many applications (speaker indexing, speech
    recognition) require clustering segments with the
    same characteristics, e.g. speech from the same
    speaker.
  • Goal: group together speech segments from the
    same speaker.
  • Fully connected (ergodic) HMM topology with a
    duration constraint; each state represents a
    speaker (see the sketch after this list).
  • When the number of speakers is not known, it must
    be estimated with a model selection criterion
    (e.g. BIC).
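
As an illustration only (not from the original slides), a minimal sketch of
how a minimum-duration constraint can be imposed on an ergodic speaker HMM;
the function name and parameter values are hypothetical:

```python
import numpy as np

def ergodic_duration_hmm(n_speakers, min_dur, p_stay=0.9):
    """Toy transition matrix for a fully connected (ergodic) HMM in which
    each speaker is a chain of `min_dur` tied sub-states, enforcing a
    minimum segment duration (illustrative values only)."""
    n = n_speakers * min_dur
    A = np.zeros((n, n))
    for s in range(n_speakers):
        first, last = s * min_dur, s * min_dur + min_dur - 1
        # move deterministically through the speaker's duration chain
        for i in range(first, last):
            A[i, i + 1] = 1.0
        # at the end of the chain: stay with the same speaker or jump to another
        A[last, last] = p_stay
        others = [t * min_dur for t in range(n_speakers) if t != s]
        for j in others:
            A[last, j] = (1.0 - p_stay) / len(others)
    return A

# Example: 3 speakers, minimum duration of 5 frames -> 15x15 matrix
print(ergodic_duration_hmm(3, 5).shape)
```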

4
Model selection
Given data Y and a model m, the optimal model
maximizes the posterior p(m|Y).
If the prior over models is uniform, the decision depends only on
p(Y|m) (a.k.a. the marginal likelihood).
Bayesian modeling assumes distributions over the
parameters.
The criterion is thus the marginal likelihood,
which is prohibitive to compute for some models (HMM, GMM).
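
In standard notation (a reconstruction of the formulas implied above, with
theta denoting the model parameters):

\[ p(m \mid Y) \;\propto\; p(Y \mid m)\, p(m),
   \qquad
   p(Y \mid m) \;=\; \int p(Y \mid \theta, m)\, p(\theta \mid m)\, d\theta \]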
5
Bayesian information criterion (BIC)
A first-order approximation obtained from the
Laplace approximation of the marginal likelihood
(Schwarz, 1978).
In practice, the penalty term is multiplied by a constant
(threshold).
BIC does not depend on parameter distributions!
Asymptotically (large n), BIC converges to the
log-marginal likelihood.
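
In its usual form (a reconstruction; lambda is the threshold constant, d_m the
number of free parameters of model m, and n the number of data points):

\[ \mathrm{BIC}(m) \;=\; \log p(Y \mid \hat{\theta}_{ML}, m)
   \;-\; \lambda\, \frac{d_m}{2}\, \log n \]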
6
Variational Learning
Introduce an approximate variational distribution
q(θ) over the parameters.
Applying Jensen's inequality gives a lower bound on ln p(Y|m).
Maximization of ln p(Y|m) is then replaced by
maximization of this bound (the free energy).
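
The bound in question, written in the standard way (a reconstruction):

\[ \ln p(Y \mid m)
   \;\ge\; \int q(\theta)\, \ln \frac{p(Y, \theta \mid m)}{q(\theta)}\, d\theta
   \;=\; \mathcal{F}_m(q) \]

The gap is the KL divergence between q(θ) and the true posterior p(θ|Y,m),
so maximizing the free energy pushes q toward the posterior.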
7
Variational Learning with hidden variables
Model optimization sometimes requires hidden
variables (e.g. the state sequence in the EM algorithm).
If x is the hidden variable, we can write the free
energy in terms of the joint distribution q(θ, x).
Independence hypothesis: q(θ, x) factorizes over θ and x.
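
With hidden variables the same bound reads (a reconstruction, using the usual
mean-field factorization):

\[ \mathcal{F}_m(q) \;=\; \sum_x \int q(\theta, x)\,
   \ln \frac{p(Y, x \mid \theta, m)\, p(\theta \mid m)}{q(\theta, x)}\, d\theta,
   \qquad q(\theta, x) \;\approx\; q(\theta)\, q(x) \]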
8
EM-like algorithm
Under the independence hypothesis, the bound is
optimized by alternating two updates:
E-step: update q(x) with q(θ) fixed.
M-step: update q(θ) with q(x) fixed.
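
The resulting updates, in their standard VB-EM form (a reconstruction):

\[ \text{E-step:}\quad q(x) \;\propto\;
   \exp\big\{ \langle \ln p(Y, x \mid \theta, m) \rangle_{q(\theta)} \big\} \]

\[ \text{M-step:}\quad q(\theta) \;\propto\; p(\theta \mid m)\,
   \exp\big\{ \langle \ln p(Y, x \mid \theta, m) \rangle_{q(x)} \big\} \]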
9
VB Model selection
In the same way, an approximate posterior
distribution over models q(m) can be defined.
Maximizing the free energy w.r.t. q(m) yields a closed-form solution.
Model selection is then based on q(m):
the best model maximizes q(m).
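
The closed-form solution has the standard form (a reconstruction):

\[ q(m) \;\propto\; p(m)\, \exp\{\mathcal{F}_m\} \]

With a uniform prior p(m), picking the model with the largest q(m) is
equivalent to picking the model with the largest free energy F_m.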
10
Experimental framework
  • BN-96 Hub4 evaluation data set.
  • Initialize a model with N speakers (states) and
    train the system using VB and ML (or VB and MAP
    with a UBM).
  • Reduce the speaker number from N-1 down to 1,
    training with VB and ML (or MAP) at each step.
  • Score the N models with VB and BIC and choose the
    best one.
  • Three scores are reported:
  • Best score
  • Selected score (with VB or BIC)
  • Score obtained with the known speaker number
  • Results are given in terms of (see the sketch
    after this list):
  • acp: average cluster purity
  • asp: average speaker purity
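
As an illustration only (not from the original slides), a minimal sketch of
the two purity metrics computed from a cluster-vs-speaker frame count matrix;
the exact scoring tool used in the experiments may differ:

```python
import numpy as np

def purity_scores(counts):
    """counts[i, j] = number of frames assigned to cluster i that belong
    to speaker j. Returns (acp, asp) using the usual purity definitions."""
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    n_clu = counts.sum(axis=1)   # frames per cluster
    n_spk = counts.sum(axis=0)   # frames per speaker
    # purity of each cluster and of each speaker
    p_clu = (counts ** 2).sum(axis=1) / np.maximum(n_clu, 1) ** 2
    p_spk = (counts ** 2).sum(axis=0) / np.maximum(n_spk, 1) ** 2
    acp = (p_clu * n_clu).sum() / N   # average cluster purity
    asp = (p_spk * n_spk).sum() / N   # average speaker purity
    return acp, asp

# Example: 2 clusters, 2 speakers
print(purity_scores([[90, 10], [5, 95]]))
```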

11
Experiments I
12
Experiments II
13
Dependence on the threshold
K as a function of the threshold
Number of speakers as a function of the threshold
14
Free Energy vs. BIC
15
Experiments III
16
Experiments IV
17
Conclusions and Future Work
  • VB uses the free energy for both parameter
    learning and model selection.
  • VB generalizes both the ML and MAP learning
    frameworks.
  • VB outperforms ML/BIC on 3 of the 4 BN files.
  • VB outperforms MAP/BIC on 4 of the 4 BN files.
  • Future work: repeat the experiments on other
    databases (e.g. NIST speaker diarization).

18
Thanks for your attention!
19
Data vs. Gaussian components
Final number of Gaussian components as a function of
the amount of data for each speaker.
20
Experiments (file 1)
21
Experiments (file 2)
22
Experiments (file 3)
23
Experiments (file 4)