An introduction to the Speaker Verification Task - PowerPoint PPT Presentation

Description:

PRIFYSGOL CYMRU ABERTAWE. UNIVERSITY OF WALE ... Discriminative model. Likeliness estimation, decision. Speaker Verification. 7 ...

1
An introduction to the Speaker Verification Task
Benoit Fauve
2
Outline
  • Introduction: a biometric problem
  • Measure/Features
  • Learning/Model
  • Result/Decision
  • Extra

3
Features in biometrics
A simple biometric problem: how can male/female
discrimination be done automatically?
4
Features in biometrics
[Diagram: acquisition, then feature extraction, producing two measurements T1 and T2]
5
Features: building a statistical model
[Figure: a collection of (T1, T2) measurement pairs]
6
Features: building a statistical model
[Figure: a collection of (T1, T2) measurement pairs, plotted along the T1 and T2 axes]
Key steps
  • Measure
  • Discriminative model
  • Likeliness estimation, decision
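
The three key steps can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the slides do not say what T1 and T2 measure, so the training pairs below are invented, and each class is modelled by a single Gaussian with independent dimensions, which is only one possible choice of model.

```python
import math

def fit_gaussian(samples):
    """Model step: fit one diagonal Gaussian to a list of (T1, T2) pairs."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    variances = [sum((s[d] - means[d]) ** 2 for s in samples) / n
                 for d in range(dims)]
    return means, variances

def log_likelihood(x, model):
    """Likeliness step: log density of measurement x under a fitted model."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))

def classify(x, model_a, model_b):
    """Decision step: pick the class whose model explains x better."""
    score = log_likelihood(x, model_a) - log_likelihood(x, model_b)
    return ("A" if score > 0 else "B"), score

# Measure step: invented (T1, T2) training pairs for two classes.
class_a = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1)]
class_b = [(3.0, 0.5), (3.2, 0.7), (2.8, 0.4), (3.1, 0.6)]

model_a = fit_gaussian(class_a)
model_b = fit_gaussian(class_b)
label, score = classify((1.0, 2.0), model_a, model_b)
```

The same pattern (measure, model, compare likelihoods) carries over to the speaker verification task, with "Joe Bloggs" and "someone else" in place of the two classes.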

7
Speaker verification task
[Diagram: a sound sample of unknown origin (????), to be attributed either to Joe Bloggs or to someone else]
Is it Joe Bloggs talking in the sound sample?
This is a problem similar to gender discrimination:
male or female? Joe Bloggs or someone else?
8
  • Feature extraction

9
What is a good feature?
  • We are looking for parameters with the following
    properties:
  • Low variability between sessions of the same
    speaker
  • High variability between different speakers
  • Limited perturbations due to the recording channel
    (codec, channel and microphone bandwidth, noise)

10
Speech production
Air from the lungs → vocal folds → vocal tract → speech
11
Vocal tract measurement
  • Limitations
  • Most databases (e.g. NIST) contain only sound
    recordings.
  • Full access to the speaker's throat is required
    (which the speaker might decline to give).
  • Not reproducible (a limitation for experiments)

12
Friendly way to get vocal tract characteristics
  • Ways to obtain the spectral envelope
  • Prediction family
  • LPC: Linear Prediction
  • PLP: Perceptual Linear Prediction
  • Filter bank family
  • MF: Mel-frequency-spaced filterbank
  • LF: Linear-frequency-spaced filterbank

The spectral envelope reflects the morphological
characteristics of the vocal tract
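
The difference between the two filter bank families lies in where the filters are centred. The sketch below compares linear and mel spacing. It rests on assumptions not given in the slides: the mel formula used is one common convention (2595 log10(1 + f/700)), and the band limits and number of filters are hypothetical values.

```python
import math

def hz_to_mel(f_hz):
    """One common Hz-to-mel convention (an assumption, not from the slides)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def linear_centres(f_min, f_max, n_filters):
    """Centre frequencies equally spaced in Hz (LF family)."""
    step = (f_max - f_min) / (n_filters + 1)
    return [f_min + step * (k + 1) for k in range(n_filters)]

def mel_centres(f_min, f_max, n_filters):
    """Centre frequencies equally spaced on the mel scale (MF family)."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * (k + 1)) for k in range(n_filters)]

# Hypothetical band of 0 to 4000 Hz covered by 5 filters.
linear = linear_centres(0.0, 4000.0, 5)
mel = mel_centres(0.0, 4000.0, 5)
```

With mel spacing the centres are packed more densely at low frequencies and spread out at high frequencies, while linear spacing keeps a constant gap.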
13
Example: Mel-Frequency Cepstral Coefficients (MFCC)
14
Features in speech
[Diagram: acquisition, then feature extraction, producing a sequence of feature vectors X1 ... Xi ...]
Frame length: 20 ms
Shift: 10 ms
Size: 30 to 60
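
The frame length and shift above can be illustrated by cutting a signal into overlapping frames. This is a minimal sketch: the 8000 Hz sample rate and the dummy signal are assumptions, and the feature computation applied to each frame is left out.

```python
def frame_signal(samples, sample_rate, frame_ms=20, shift_ms=10):
    """Cut a signal into overlapping frames; incomplete trailing frames are dropped."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames

# Hypothetical input: one second of dummy samples at an assumed 8000 Hz.
sample_rate = 8000
samples = list(range(sample_rate))
frames = frame_signal(samples, sample_rate)
```

With these values each frame holds 160 samples and consecutive frames overlap by half. One feature vector Xi would then be computed from each frame.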
15
  • Probabilistic approach
  • Speaker modelling

16
Introduction to the probabilistic approach
Client
Speaker S
Test Y
other speakers
H1: Y has been pronounced by the speaker S.
H2: Y has been pronounced by someone other than
the speaker S.
World
17
Probabilistic approach training
X1 . . . . Xi . . . . .
Mixture of Gaussians representing probability
densities
Speaker S
xi
Description of the statistical distribution of
the acoustic observations from the class S.
Features
other speakers
xi
18
In practice: multi-Gaussian models and MAP adaptation
  • Data do not follow a Gaussian distribution.
  • There is a limited amount of data for the target
    speaker.
  • Mixture of 512 to 2048 Gaussians
  • MAP adaptation
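
The slides name MAP adaptation without giving details. The sketch below shows one widely used variant, in which only the means of a world model are shifted towards the target speaker's data. It is an assumption that this is the variant meant here. The relevance factor of 16, the one-dimensional features, and all numbers are hypothetical and chosen for brevity.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def map_adapt_means(data, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a 1-D Gaussian mixture (illustrative)."""
    n_k = [0.0] * len(means)    # soft count of frames per Gaussian
    sum_k = [0.0] * len(means)  # soft sum of frames per Gaussian
    for x in data:
        likes = [w * gauss_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # frame too far from every Gaussian to assign
        for k, like in enumerate(likes):
            gamma = like / total  # posterior probability of Gaussian k
            n_k[k] += gamma
            sum_k[k] += gamma * x
    adapted = []
    for k, mean in enumerate(means):
        if n_k[k] == 0.0:
            adapted.append(mean)  # no data seen: keep the world mean
            continue
        alpha = n_k[k] / (n_k[k] + relevance)
        adapted.append(alpha * (sum_k[k] / n_k[k]) + (1.0 - alpha) * mean)
    return adapted

# Hypothetical two-Gaussian world model and 16 frames of speaker data.
world_weights, world_means, world_vars = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
speaker_data = [2.5] * 16
speaker_means = map_adapt_means(speaker_data, world_weights, world_means, world_vars)
```

Gaussians that see a lot of speaker data move towards it, while Gaussians that see almost none stay at the world model's values. This is how a large mixture can be used despite the limited amount of target data.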
19
  • Result/Decision

20
Probabilistic approach test
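In theory, we look for the value
$S(Y) = \log P(Y \mid H_1) - \log P(Y \mid H_2)$
In practice, the test Y is a sequence $Y_1, \dots, Y_i, \dots, Y_N$.
The client model gives $P(Y_i \mid H_1)$, the UBM gives $P(Y_i \mid H_2)$,
and the output is
$S(Y) = \frac{1}{N} \sum_{i=1}^{N} \left[ \log P(Y_i \mid H_1) - \log P(Y_i \mid H_2) \right]$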
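
The practical score, the average over the N test vectors of the log-likelihood difference between the client model and the UBM, can be sketched as follows. The model layout (a list of weight, means, variances tuples with diagonal covariances) and all numbers are assumptions for illustration. Real systems use far larger mixtures.

```python
import math

def gmm_log_likelihood(x, gmm):
    """log p(x) under a diagonal-covariance Gaussian mixture.

    gmm is a list of (weight, means, variances) tuples (hypothetical layout).
    """
    logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(log_p)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def verification_score(frames, client_gmm, ubm):
    """Average log-likelihood ratio over the test vectors."""
    total = sum(gmm_log_likelihood(y, client_gmm) - gmm_log_likelihood(y, ubm)
                for y in frames)
    return total / len(frames)

# Invented toy models: one Gaussian for the client, two for the UBM.
client_gmm = [(1.0, [1.0, 1.0], [1.0, 1.0])]
ubm = [(0.5, [0.0, 0.0], [4.0, 4.0]), (0.5, [3.0, 3.0], [4.0, 4.0])]

score_close = verification_score([(1.0, 1.0), (1.2, 0.9)], client_gmm, ubm)
score_far = verification_score([(5.0, 5.0)], client_gmm, ubm)
```

Test vectors close to the client model yield a positive score, and vectors better explained by the UBM yield a negative one.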
21
ASR decision soft/hard
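[Diagram: the test sample (??) and JB enter the ASR system. Soft decision: the output is a score. Hard decision: the score is compared with a threshold, and the result is Accepted or Rejected.]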
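
The step from a soft to a hard decision is a single comparison. In this minimal sketch the threshold value and the handling of a score exactly equal to the threshold are assumptions.

```python
def hard_decision(score, threshold):
    """Turn a soft score into a hard decision (ties count as accepted here)."""
    return "Accepted" if score >= threshold else "Rejected"

# Hypothetical scores and threshold.
decisions = [hard_decision(s, threshold=0.0) for s in (1.9, -12.9)]
```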
22
Error types
23
System evaluation - DET curve
S1 ... Sn: target scores (example outputs when
the two sound samples come from the same person)
Sk ... Sl: non-target scores (example outputs when
the two sound samples come from different persons)
24
System evaluation - DET curve
DET curve
Martin, A. and Przybocki, M. A. The DET curve in
assessment of detection task performance.
Eurospeech 1997, pages 1895-1898
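
A DET curve plots the miss probability against the false-alarm probability as the decision threshold varies. The sketch below computes those pairs from target and non-target scores. The scores are invented, the rule that a score equal to the threshold is accepted is an assumption, and the normal-deviate axis scaling used for plotting is left out.

```python
def det_points(target_scores, non_target_scores):
    """Return (threshold, false_alarm_rate, miss_rate) for each candidate threshold."""
    thresholds = sorted(set(target_scores) | set(non_target_scores))
    points = []
    for t in thresholds:
        # Miss: a target trial scoring below the threshold is rejected.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scoring at or above it is accepted.
        false_alarm = (sum(s >= t for s in non_target_scores)
                       / len(non_target_scores))
        points.append((t, false_alarm, miss))
    return points

# Invented scores for illustration.
target = [2.0, 1.5, 0.3]
non_target = [0.5, -1.0, -0.2]
points = det_points(target, non_target)
```

Raising the threshold lowers the false-alarm rate and raises the miss rate. The curve shows this trade-off over all thresholds.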
25
  • Extra: score normalisation (T-Norm)

26
T-Norm principle
Scores
Model
Test File
27
T-Norm principle
In practice, every test file is scored against a
series of impostor models (70 to 100). The final
score is normalised using the mean and variance
of these results.
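
A minimal sketch of this normalisation: the raw score is shifted by the mean of the impostor-model scores for the same test file and divided by their standard deviation. The numbers are invented, and the use of the population variance (division by n) is an assumption.

```python
import math

def t_norm(raw_score, impostor_scores):
    """Normalise a score using the same test file's scores against impostor models."""
    n = len(impostor_scores)
    mean = sum(impostor_scores) / n
    variance = sum((s - mean) ** 2 for s in impostor_scores) / n
    return (raw_score - mean) / math.sqrt(variance)

# Invented example: two impostor scores only, to keep the arithmetic visible.
normalised = t_norm(3.0, [0.0, 2.0])
```

In this toy case the impostor scores have mean 1 and variance 1, so a raw score of 3 becomes 2.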
28
Summary
Adaptation