Title: An introduction to the Speaker Verification Task
1. An introduction to the Speaker Verification Task
Benoit Fauve
2. Outline
- Introduction: a biometric problem
- Measure/Features
- Learning/Model
- Result/Decision
- Extra
3. Features in biometrics
A simple biometric problem: how to build an automatic male/female discriminator?
4. Features in biometrics
[Diagram: acquisition, then feature extraction, producing two features T1 and T2]
5. Features: building a statistical model
[Scatter plot of feature pairs (T1, T2) for the two classes]
6. Features: building a statistical model
[Scatter plot of feature pairs (T1, T2) with a discriminative model separating the two classes]
- Key steps
  - Measure
  - Discriminative model
  - Likelihood estimation, decision
7. Speaker verification task
[Diagram: a sound sample, to be attributed either to Joe Bloggs or to someone else]
Is it Joe Bloggs talking in the sound sample? This is a similar problem to gender discrimination: instead of "Male or Female?", the question becomes "Joe Bloggs or someone else?"
9. What is a good feature?
- We are looking for parameters with the following properties:
  - Low variability between sessions of the same speaker.
  - High variability between different speakers.
  - Limited perturbation due to the recording channel (codec, channel and microphone bandwidth, noise).
10. Speech production
[Diagram: air from the lungs passes the vocal folds and the vocal tract to produce speech]
11. Vocal tract measurement
- Limitations of direct measurement:
  - Most databases (e.g. NIST) only contain sound recordings.
  - Full access to the speaker's throat is required (which they might decline to offer).
  - Not reproducible (a limitation for experiments).
12. A friendly way to get vocal tract characteristics
- Ways to get the spectral envelope:
  - Prediction family
    - LPC: Linear Prediction
    - PLP: Perceptual Linear Prediction
  - Filter bank family
    - MF: Mel-frequency-spaced filterbank
    - LF: Linear-frequency-spaced filterbank
The spectral envelope reflects morphological characteristics of the vocal tract.
13. Example: Mel-Frequency Cepstral Coefficients (MFCC)
14. Features in speech
[Diagram: acquisition, then feature extraction, producing a sequence of vectors X1 ... Xi ...]
- Frame length: 20 ms
- Shift: 10 ms
- Vector size: 30 to 60 coefficients
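The 20 ms / 10 ms framing step can be sketched in a few lines of NumPy. This is a minimal sketch: the 16 kHz sampling rate, the Hamming window, and the noise signal are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Assumed sampling rate; the slides only specify the 20 ms / 10 ms framing.
SAMPLE_RATE = 16000
FRAME_LEN = int(0.020 * SAMPLE_RATE)    # 20 ms -> 320 samples
FRAME_SHIFT = int(0.010 * SAMPLE_RATE)  # 10 ms -> 160 samples

def frame_signal(signal):
    """Slice a 1-D waveform into overlapping, windowed analysis frames."""
    n_frames = 1 + (len(signal) - FRAME_LEN) // FRAME_SHIFT
    idx = (np.arange(FRAME_LEN)[None, :]
           + FRAME_SHIFT * np.arange(n_frames)[:, None])
    # A Hamming window is a common choice before spectral analysis.
    return signal[idx] * np.hamming(FRAME_LEN)

# One second of noise stands in for real speech in this sketch.
frames = frame_signal(np.random.RandomState(0).randn(SAMPLE_RATE))
```

Each row of `frames` would then go through the spectral analysis of the previous slide (e.g. a mel filterbank followed by a DCT) to yield one MFCC vector per 10 ms.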
15. Probabilistic approach: speaker modelling
16. Introduction to the probabilistic approach
[Diagram: the client (speaker S) against the world (other speakers), with a test utterance Y]
- H1: Y has been pronounced by the speaker S.
- H2: Y has been pronounced by someone other than the speaker S.
17. Probabilistic approach: training
[Diagram: feature vectors X1 ... Xi ... are used to train one model for speaker S and one for the other speakers]
A mixture of Gaussians representing probability densities describes the statistical distribution of the acoustic observations xi from class S; a second mixture models the other speakers.
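The training step above can be sketched with scikit-learn's `GaussianMixture` on toy 2-D data. The data, its dimensionality, and the mixture sizes here are illustrative assumptions; a real system fits much larger mixtures on 30-60 dimensional MFCC vectors.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Toy 2-D "features": the speaker's data sits at a different mean
# than the pooled "other speakers" data.
speaker_feats = rng.randn(500, 2) + [2.0, 0.0]
world_feats = rng.randn(2000, 2)

# Small mixtures for the toy data; real systems use 512 to 2048 components.
speaker_gmm = GaussianMixture(n_components=4, random_state=0).fit(speaker_feats)
world_gmm = GaussianMixture(n_components=4, random_state=0).fit(world_feats)

# score_samples returns per-frame log-likelihoods log p(x_i | model).
test = rng.randn(100, 2) + [2.0, 0.0]     # more frames from the speaker
llr = float((speaker_gmm.score_samples(test)
             - world_gmm.score_samples(test)).mean())
```

Frames drawn from the speaker's distribution get a higher average log-likelihood under the speaker model than under the world model, so `llr` comes out positive.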
18. In practice: Gaussian mixtures and MAP adaptation
- The data do not follow a single Gaussian distribution.
- There is a limited amount of data for the target speaker.
- Mixtures of 512 to 2048 Gaussians are therefore used.
- The speaker model is derived by MAP adaptation of a background model.
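Mean-only MAP adaptation, as commonly used in GMM-UBM systems, can be sketched as follows. The relevance factor r = 16, the toy UBM, and the uniform dummy responsibilities are illustrative assumptions; in practice the responsibilities come from evaluating the UBM on the enrolment frames.

```python
import numpy as np

def map_adapt_means(ubm_means, posteriors, feats, r=16.0):
    """Mean-only MAP adaptation of UBM component means towards the
    enrolment data, with relevance factor r (16 is a common default)."""
    n_k = posteriors.sum(axis=0)                 # zeroth-order stats, (K,)
    f_k = posteriors.T @ feats                   # first-order stats, (K, D)
    e_k = f_k / np.maximum(n_k, 1e-10)[:, None]  # per-component data mean
    alpha = (n_k / (n_k + r))[:, None]           # data-driven interpolation
    # Components with much data move towards the data mean; components
    # with little data stay close to the UBM prior.
    return alpha * e_k + (1 - alpha) * ubm_means

rng = np.random.RandomState(1)
ubm_means = np.zeros((2, 3))     # toy UBM: 2 components, 3-D features
feats = rng.randn(200, 3) + 1.0  # enrolment data offset from the UBM
post = np.full((200, 2), 0.5)    # dummy responsibilities for the sketch
adapted = map_adapt_means(ubm_means, post, feats)
```

This interpolation is what makes limited enrolment data workable: poorly observed components fall back on the UBM instead of being estimated from a handful of frames.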
20. Probabilistic approach: test
In theory we look for the value
S(Y) = log P(Y|H1) - log P(Y|H2)
In practice the test utterance Y = {Y1, ..., YN} is scored frame by frame against the client model, giving P(Yi|H1), and against the UBM (Universal Background Model), giving P(Yi|H2). The output is
S(Y) = (1/N) Σi [ log P(Yi|H1) - log P(Yi|H2) ]
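The averaged log-likelihood-ratio score can be sketched with scikit-learn mixtures standing in for the client model and the UBM. All data here is synthetic; a real UBM is trained on speech from many speakers.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Synthetic stand-ins: the "client" data is offset from the "world" data.
client = GaussianMixture(n_components=2,
                         random_state=0).fit(rng.randn(500, 2) + 1.5)
ubm = GaussianMixture(n_components=2,
                      random_state=0).fit(rng.randn(500, 2))

def score(test_frames):
    """S(Y) = (1/N) * sum_i [log P(Yi|H1) - log P(Yi|H2)]."""
    return float(np.mean(client.score_samples(test_frames)
                         - ubm.score_samples(test_frames)))

target_trial = score(rng.randn(100, 2) + 1.5)   # same "speaker"
impostor_trial = score(rng.randn(100, 2))       # someone else
```

Target trials score above zero and impostor trials below, which is exactly the separation the threshold of the next slide exploits.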
21. ASR decision: soft/hard
[Diagram: the test goes through the ASR system; the soft output is a score, and applying a threshold gives the hard output, accepted or rejected]
22. Error types
23. System evaluation: DET curve
S1 ... Sn: target scores (example outputs when the two sound samples come from the same person). Sk ... Sl: non-target scores (example outputs when the two sound samples come from different persons).
24. System evaluation: DET curve
[Figure: DET curve]
Martin, A. and Przybocki, M. A., "The DET curve in assessment of detection task performance", Eurospeech 1997, pages 1895-1898.
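From sets of target and non-target scores, the DET operating points and the equal error rate (EER) can be computed as in this sketch. The Gaussian score distributions are a toy assumption standing in for real system outputs.

```python
import numpy as np

rng = np.random.RandomState(0)
# Toy assumption: Gaussian scores, targets shifted above non-targets.
target_scores = rng.randn(1000) + 2.0    # S1 ... Sn
nontarget_scores = rng.randn(1000)       # Sk ... Sl

def det_points(targets, nontargets):
    """Sweep a threshold over all observed scores and return the
    false-acceptance and false-rejection rates at each point.
    Plotting FRR against FAR on normal-deviate axes gives the DET curve."""
    thresholds = np.sort(np.concatenate([targets, nontargets]))
    frr = np.array([(targets < t).mean() for t in thresholds])
    far = np.array([(nontargets >= t).mean() for t in thresholds])
    return far, frr

far, frr = det_points(target_scores, nontarget_scores)
i = int(np.argmin(np.abs(far - frr)))
eer = (far[i] + frr[i]) / 2              # equal error rate summary
```

For these toy distributions (unit-variance Gaussians two standard deviations apart) the EER lands near 16%, the point where the two error curves cross.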
25. Extra: score normalisation (T-Norm)
26. T-Norm principle
[Diagram: a test file is scored against the claimed speaker's model]
27. T-Norm principle
[Diagram: the same test file is also scored against a cohort of impostor models]
In practice every test file is scored against a series of impostor models (70 to 100). The final score is then normalised using the mean and variance of these impostor scores.
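The T-Norm step can be sketched as below. The cohort scores and the raw score of 3.0 are made-up numbers; in practice the cohort comes from scoring the test file against 70 to 100 impostor models.

```python
import numpy as np

def t_norm(raw_score, impostor_scores):
    """Normalise a trial score by the mean and standard deviation of the
    same test file's scores against a cohort of impostor models."""
    return (raw_score - impostor_scores.mean()) / impostor_scores.std()

rng = np.random.RandomState(0)
# Made-up cohort of 100 impostor-model scores for one test file.
cohort_scores = rng.randn(100) * 0.5 + 1.0
normalised = t_norm(3.0, cohort_scores)   # raw score of 3.0 is illustrative
```

Because the normalisation statistics are computed per test file, T-Norm compensates for test conditions (channel, noise) that shift all of that file's scores, which makes a single global decision threshold more reliable.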
28. Summary
Adaptation