Title: Variational Bayesian Methods for Audio Indexing
1. Variational Bayesian Methods for Audio Indexing
- Fabio Valente, Christian Wellekens
- Institut Eurecom
2. Outline
- Generalities on speaker clustering
- Model selection/BIC
- Variational learning
- Variational model selection
- Results
3. Speaker clustering
- Many applications (speaker indexing, speech recognition) require clustering segments with the same characteristics, e.g. speech from the same speaker.
- Goal: grouping together speech segments of the same speaker.
- Fully connected (ergodic) HMM topology with a duration constraint; each state represents a speaker (a minimal sketch of such a topology follows below).
- When the number of speakers is not known, it must be estimated with a model selection criterion (e.g. BIC).
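As an illustration only (not the authors' code), a minimal numpy sketch of such a duration-constrained ergodic topology; expanding each speaker into a chain of min_dur sub-states and the 0.9 self-loop weight are assumptions:

import numpy as np

def ergodic_transitions(n_speakers, min_dur, stay=0.9):
    # Each speaker state is expanded into a left-to-right chain of
    # min_dur sub-states, so no speaker segment can be shorter than
    # min_dur frames.
    n = n_speakers * min_dur
    A = np.zeros((n, n))
    for s in range(n_speakers):
        first = s * min_dur
        last = first + min_dur - 1
        for k in range(first, last):
            A[k, k + 1] = 1.0                  # forced advance along the chain
        exits = [t * min_dur for t in range(n_speakers) if t != s]
        if exits:
            A[last, last] = stay               # remain with the same speaker
            for e in exits:                    # jump to another speaker's entry state
                A[last, e] = (1.0 - stay) / len(exits)
        else:
            A[last, last] = 1.0                # degenerate single-speaker case
    return A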
4. Model selection
Given data Y and model m, the optimal model maximizes the posterior p(m|Y).
If the prior over models is uniform, the decision depends only on p(Y|m) (a.k.a. the marginal likelihood).
Bayesian modeling assumes distributions over the parameters.
The criterion is thus the marginal likelihood, which is prohibitive to compute for some models (HMM, GMM).
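The formulas on this slide were images and did not survive extraction; a standard reconstruction, consistent with the slide text:

m^* = \arg\max_m p(m|Y), \qquad p(m|Y) \propto p(Y|m)\, p(m)

p(Y|m) = \int p(Y|\theta, m)\, p(\theta|m)\, d\theta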
5. Bayesian information criterion (BIC)
First-order approximation obtained from the Laplace approximation of the marginal likelihood (Schwarz, 1978).
Generally, the penalty is multiplied by a constant (threshold).
BIC does not depend on parameter distributions!
Asymptotically (large n), BIC converges to the log-marginal likelihood.
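A standard reconstruction of the missing formula (d_m free parameters, n frames, lambda the threshold constant):

\mathrm{BIC}(m) = \log p(Y|\hat{\theta}_m, m) - \lambda\, \frac{d_m}{2} \log n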
6. Variational Learning
Introduce an approximate variational distribution q(theta) over the parameters.
Applying Jensen's inequality gives a lower bound on the log-marginal likelihood.
Maximization of ln p(Y|m) is then replaced by maximization of this bound, the variational free energy.
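A standard reconstruction of the missing bound:

\ln p(Y|m) = \ln \int q(\theta)\, \frac{p(Y, \theta|m)}{q(\theta)}\, d\theta \;\geq\; \int q(\theta) \ln \frac{p(Y, \theta|m)}{q(\theta)}\, d\theta = F_m[q(\theta)]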
7. Variational Learning with hidden variables
Sometimes model optimization requires hidden variables (e.g. the state sequence in EM).
If x denotes the hidden variables, the free energy can be written over a joint variational posterior q(x, theta).
Independence hypothesis: q(x, theta) = q(x) q(theta).
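A standard reconstruction of the missing free energy (the integral over x becomes a sum when x is discrete, e.g. an HMM state sequence):

F_m[q(x, \theta)] = \int\!\!\int q(x, \theta) \ln \frac{p(Y, x, \theta|m)}{q(x, \theta)}\, dx\, d\theta, \qquad q(x, \theta) = q(x)\, q(\theta)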
8. EM-like algorithm
Under the independence hypothesis q(x, theta) = q(x) q(theta), the free energy can be maximized by an iterative EM-like algorithm:
E-step: update q(x) given the current q(theta).
M-step: update q(theta) given the current q(x).
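A standard reconstruction of the missing update equations (angle brackets denote expectations under the subscripted distribution):

\text{E-step:} \quad q(x) \propto \exp \langle \ln p(Y, x|\theta, m) \rangle_{q(\theta)}

\text{M-step:} \quad q(\theta) \propto p(\theta|m)\, \exp \langle \ln p(Y, x|\theta, m) \rangle_{q(x)}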
9. VB Model selection
In the same way, an approximate posterior distribution over models q(m) can be defined.
Maximizing the free energy w.r.t. q(m) yields a closed-form solution.
Model selection is then based on the free energy F_m: the best model maximizes q(m).
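A standard reconstruction of the missing solution:

q(m) \propto p(m)\, \exp(F_m), \qquad m^* = \arg\max_m q(m)

With a uniform prior p(m), selecting the best model reduces to selecting the largest free energy F_m.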
10. Experimental framework
- BN-96 Hub4 evaluation data set
- Initialize a model with N speakers (states) and train the system using VB and ML (or VB and MAP with a UBM).
- Reduce the speaker number from N-1 to 1 and train using VB and ML (or MAP).
- Score the N models with VB and BIC and choose the best one (a sketch of this loop follows the score list).
- Three scores:
- Best score
- Selected score (with VB or BIC)
- Score obtained with the known speaker number
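A schematic of the selection loop described above, as an illustration only; train_vb and free_energy are hypothetical stand-ins for the actual training and scoring routines:

def select_speaker_number(features, n_max):
    # Train one model per candidate speaker count, score each with the
    # variational free energy, and keep the best-scoring model.
    candidates = []
    for n in range(n_max, 0, -1):                  # N, N-1, ..., 1 speakers
        model = train_vb(features, n_speakers=n)   # VB training (stand-in)
        f = free_energy(model, features)           # model-selection score
        candidates.append((f, n, model))
    best_f, best_n, best_model = max(candidates, key=lambda c: c[0])
    return best_n, best_model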
- Results given in terms of:
- Acp: average cluster purity
- Asp: average speaker purity
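The deck does not define these metrics; assuming the standard purity definitions (n_{ij} = frames of speaker j in cluster i, n_{i.} and n_{.j} the marginals, N the total frame count), with K the geometric mean used on the threshold slide:

acp = \frac{1}{N} \sum_i \sum_j \frac{n_{ij}^2}{n_{i.}}, \qquad asp = \frac{1}{N} \sum_j \sum_i \frac{n_{ij}^2}{n_{.j}}, \qquad K = \sqrt{acp \cdot asp}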
11. Experiments I
12. Experiments II
13. Dependence on threshold
K as a function of the threshold.
Speaker number as a function of the threshold.
14. Free Energy vs. BIC
15. Experiments III
16. Experiments IV
17. Conclusions and Future Work
- VB uses the free energy for both parameter learning and model selection.
- VB generalizes both the ML and MAP learning frameworks.
- VB outperforms ML/BIC on 3 of the 4 BN files.
- VB outperforms MAP/BIC on 4 of the 4 BN files.
- Future work: repeat the experiments on other databases (e.g. NIST speaker diarization).
18. Thanks for your attention!
19. Data vs. Gaussian components
Final number of Gaussian components as a function of the amount of data for each speaker.
20. Experiments (file 1)
21. Experiments (file 2)
22. Experiments (file 3)
23. Experiments (file 4)