Title: An introduction to the Speaker Verification Task
1. An introduction to the Speaker Verification Task
Benoit Fauve
2. Outline
- Introduction: a biometric problem
- Measure/Features
- Learning/Model
- Result/Decision
- Extra
3. Features in biometrics
A simple biometric problem: how to build an automatic male/female discriminator?
4. Features in biometrics
[Diagram: acquisition, then feature extraction, producing two features T1 and T2]
5. Features: building a statistical model
[Scatter plot of feature pairs (T1, T2) for the two classes]
6. Features: building a statistical model
[Scatter plot of feature pairs (T1, T2) with a discriminative model separating the two classes]
- Key steps
  - Measure
  - Discriminative model
  - Likelihood estimation, decision
7. Speaker verification task
[Diagram: a sound sample, to be attributed either to Joe Bloggs or to someone else]
Is it Joe Bloggs talking in the sound sample? This is a similar problem to gender discrimination: instead of "Male or Female?", the question becomes "Joe Bloggs or someone else?"
9. What is a good feature?
- We are looking for parameters with the following properties:
  - Low variability between sessions of the same speaker.
  - High variability between different speakers.
  - Limited perturbation due to the recording channel (codec, channel and microphone bandwidth, noise).
10. Speech production
[Diagram: air from the lungs passes the vocal folds and the vocal tract to produce speech]
11. Vocal tract measurement
- Limitations of direct measurement:
  - Most databases (e.g. NIST) only contain sound recordings.
  - Full access to the speaker's throat is required (which they might decline to offer).
  - Not reproducible (a limitation for experiments).
12. A friendly way to get vocal tract characteristics
- Ways to get the spectral envelope:
  - Prediction family
    - LPC: Linear Prediction
    - PLP: Perceptual Linear Prediction
  - Filter bank family
    - MF: Mel-frequency-spaced filterbank
    - LF: Linear-frequency-spaced filterbank
The spectral envelope reflects morphological characteristics of the vocal tract.
13. Example: Mel-Frequency Cepstral Coefficients (MFCC)
14. Features in speech
[Diagram: acquisition, then feature extraction, producing a sequence of vectors X1 ... Xi ...]
- Frame length: 20 ms
- Shift: 10 ms
- Vector size: 30 to 60 coefficients
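The 20 ms / 10 ms framing step can be sketched in a few lines of NumPy. This is a minimal sketch: the 16 kHz sampling rate, the Hamming window, and the noise signal are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Assumed sampling rate; the slides only specify the 20 ms / 10 ms framing.
SAMPLE_RATE = 16000
FRAME_LEN = int(0.020 * SAMPLE_RATE)    # 20 ms -> 320 samples
FRAME_SHIFT = int(0.010 * SAMPLE_RATE)  # 10 ms -> 160 samples

def frame_signal(signal):
    """Slice a 1-D waveform into overlapping, windowed analysis frames."""
    n_frames = 1 + (len(signal) - FRAME_LEN) // FRAME_SHIFT
    idx = (np.arange(FRAME_LEN)[None, :]
           + FRAME_SHIFT * np.arange(n_frames)[:, None])
    # A Hamming window is a common choice before spectral analysis.
    return signal[idx] * np.hamming(FRAME_LEN)

# One second of noise stands in for real speech in this sketch.
frames = frame_signal(np.random.RandomState(0).randn(SAMPLE_RATE))
```

Each row of `frames` would then go through the spectral analysis of the previous slide (e.g. a mel filterbank followed by a DCT) to yield one MFCC vector per 10 ms.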
15. Probabilistic approach: speaker modelling
16. Introduction to the probabilistic approach
[Diagram: the client (speaker S) against the world (other speakers), with a test utterance Y]
- H1: Y has been pronounced by the speaker S.
- H2: Y has been pronounced by someone other than the speaker S.
17. Probabilistic approach: training
[Diagram: feature vectors X1 ... Xi ... are used to train one model for speaker S and one for the other speakers]
A mixture of Gaussians representing probability densities describes the statistical distribution of the acoustic observations xi from class S; a second mixture models the other speakers.
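The training step above can be sketched with scikit-learn's `GaussianMixture` on toy 2-D data. The data, its dimensionality, and the mixture sizes here are illustrative assumptions; a real system fits much larger mixtures on 30-60 dimensional MFCC vectors.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Toy 2-D "features": the speaker's data sits at a different mean
# than the pooled "other speakers" data.
speaker_feats = rng.randn(500, 2) + [2.0, 0.0]
world_feats = rng.randn(2000, 2)

# Small mixtures for the toy data; real systems use 512 to 2048 components.
speaker_gmm = GaussianMixture(n_components=4, random_state=0).fit(speaker_feats)
world_gmm = GaussianMixture(n_components=4, random_state=0).fit(world_feats)

# score_samples returns per-frame log-likelihoods log p(x_i | model).
test = rng.randn(100, 2) + [2.0, 0.0]     # more frames from the speaker
llr = float((speaker_gmm.score_samples(test)
             - world_gmm.score_samples(test)).mean())
```

Frames drawn from the speaker's distribution get a higher average log-likelihood under the speaker model than under the world model, so `llr` comes out positive.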
18. In practice: Gaussian mixtures and MAP adaptation
- The data do not follow a single Gaussian distribution.
- There is a limited amount of data for the target speaker.
- Mixtures of 512 to 2048 Gaussians are therefore used.
- The speaker model is derived by MAP adaptation of a background model.
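Mean-only MAP adaptation, as commonly used in GMM-UBM systems, can be sketched as follows. The relevance factor r = 16, the toy UBM, and the uniform dummy responsibilities are illustrative assumptions; in practice the responsibilities come from evaluating the UBM on the enrolment frames.

```python
import numpy as np

def map_adapt_means(ubm_means, posteriors, feats, r=16.0):
    """Mean-only MAP adaptation of UBM component means towards the
    enrolment data, with relevance factor r (16 is a common default)."""
    n_k = posteriors.sum(axis=0)                 # zeroth-order stats, (K,)
    f_k = posteriors.T @ feats                   # first-order stats, (K, D)
    e_k = f_k / np.maximum(n_k, 1e-10)[:, None]  # per-component data mean
    alpha = (n_k / (n_k + r))[:, None]           # data-driven interpolation
    # Components with much data move towards the data mean; components
    # with little data stay close to the UBM prior.
    return alpha * e_k + (1 - alpha) * ubm_means

rng = np.random.RandomState(1)
ubm_means = np.zeros((2, 3))     # toy UBM: 2 components, 3-D features
feats = rng.randn(200, 3) + 1.0  # enrolment data offset from the UBM
post = np.full((200, 2), 0.5)    # dummy responsibilities for the sketch
adapted = map_adapt_means(ubm_means, post, feats)
```

This interpolation is what makes limited enrolment data workable: poorly observed components fall back on the UBM instead of being estimated from a handful of frames.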
20. Probabilistic approach: test
In theory we look for the value
S(Y) = log P(Y|H1) - log P(Y|H2)
In practice the test utterance Y = {Y1, ..., YN} is scored frame by frame against the client model, giving P(Yi|H1), and against the UBM (Universal Background Model), giving P(Yi|H2). The output is
S(Y) = (1/N) Σi [ log P(Yi|H1) - log P(Yi|H2) ]
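The averaged log-likelihood-ratio score can be sketched with scikit-learn mixtures standing in for the client model and the UBM. All data here is synthetic; a real UBM is trained on speech from many speakers.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Synthetic stand-ins: the "client" data is offset from the "world" data.
client = GaussianMixture(n_components=2,
                         random_state=0).fit(rng.randn(500, 2) + 1.5)
ubm = GaussianMixture(n_components=2,
                      random_state=0).fit(rng.randn(500, 2))

def score(test_frames):
    """S(Y) = (1/N) * sum_i [log P(Yi|H1) - log P(Yi|H2)]."""
    return float(np.mean(client.score_samples(test_frames)
                         - ubm.score_samples(test_frames)))

target_trial = score(rng.randn(100, 2) + 1.5)   # same "speaker"
impostor_trial = score(rng.randn(100, 2))       # someone else
```

Target trials score above zero and impostor trials below, which is exactly the separation the threshold of the next slide exploits.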
21. ASR decision: soft/hard
[Diagram: the test goes through the ASR system; the soft output is a score, and applying a threshold gives the hard output, accepted or rejected]
22. Error types
23. System evaluation: DET curve
S1 ... Sn: target scores (example outputs when the two sound samples come from the same person). Sk ... Sl: non-target scores (example outputs when the two sound samples come from different persons).
24. System evaluation: DET curve
[Figure: DET curve]
Martin, A. and Przybocki, M. A., "The DET curve in assessment of detection task performance", Eurospeech 1997, pages 1895-1898.
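From sets of target and non-target scores, the DET operating points and the equal error rate (EER) can be computed as in this sketch. The Gaussian score distributions are a toy assumption standing in for real system outputs.

```python
import numpy as np

rng = np.random.RandomState(0)
# Toy assumption: Gaussian scores, targets shifted above non-targets.
target_scores = rng.randn(1000) + 2.0    # S1 ... Sn
nontarget_scores = rng.randn(1000)       # Sk ... Sl

def det_points(targets, nontargets):
    """Sweep a threshold over all observed scores and return the
    false-acceptance and false-rejection rates at each point.
    Plotting FRR against FAR on normal-deviate axes gives the DET curve."""
    thresholds = np.sort(np.concatenate([targets, nontargets]))
    frr = np.array([(targets < t).mean() for t in thresholds])
    far = np.array([(nontargets >= t).mean() for t in thresholds])
    return far, frr

far, frr = det_points(target_scores, nontarget_scores)
i = int(np.argmin(np.abs(far - frr)))
eer = (far[i] + frr[i]) / 2              # equal error rate summary
```

For these toy distributions (unit-variance Gaussians two standard deviations apart) the EER lands near 16%, the point where the two error curves cross.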
25. Extra: score normalisation (T-Norm)
26. T-Norm principle
[Diagram: a test file is scored against the claimed speaker's model]
27. T-Norm principle
[Diagram: the same test file is also scored against a cohort of impostor models]
In practice every test file is scored against a series of impostor models (70 to 100). The final score is then normalised using the mean and variance of these impostor scores.
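The T-Norm step can be sketched as below. The cohort scores and the raw score of 3.0 are made-up numbers; in practice the cohort comes from scoring the test file against 70 to 100 impostor models.

```python
import numpy as np

def t_norm(raw_score, impostor_scores):
    """Normalise a trial score by the mean and standard deviation of the
    same test file's scores against a cohort of impostor models."""
    return (raw_score - impostor_scores.mean()) / impostor_scores.std()

rng = np.random.RandomState(0)
# Made-up cohort of 100 impostor-model scores for one test file.
cohort_scores = rng.randn(100) * 0.5 + 1.0
normalised = t_norm(3.0, cohort_scores)   # raw score of 3.0 is illustrative
```

Because the normalisation statistics are computed per test file, T-Norm compensates for test conditions (channel, noise) that shift all of that file's scores, which makes a single global decision threshold more reliable.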
28. Summary
Adaptation