Automatic Speaker Verification: Technologies, Evaluations and Possible Future

1
Automatic Speaker Verification: Technologies,
Evaluations and Possible Future
Biometrics in Current Security Environments

Gérard CHOLLET
CNRS-LTCI, GET-ENST
chollet_at_tsi.enst.fr

2
Outline

State of affairs (tasks, security, forensics, ...)
Speaker characteristics in the speech signal
Automatic Speaker Verification
Decision theory
Text dependent / Text independent
Imposture (occasional, dedicated)
Voice transformations
Audio-visual speaker verification
Evaluations (algorithms, field tests, ergonomics, ...)
Conclusions, Perspectives

3
Why should a computer recognize who is speaking?

Protection of individual property (habitation,
bank account, personal data, messages, mobile
phone, PDA,...)
Limited access (secured areas, data bases)
Personalization (only respond to its master's
voice)
Locate a particular person in an audio-visual
document (information retrieval)
Who is speaking in a meeting?
Is a suspect the criminal? (forensic
applications)

4
Tasks in Automatic Speaker Recognition

Speaker verification (Voice Biometric)
Are you really who you claim to be?
Identification (Speaker ID)
Is this speech segment coming from a known
speaker?
How large is the set of speakers (population of
the world)?
Speaker detection, segmentation, indexing,
retrieval, tracking
Looking for recordings of a particular speaker
Combining Speech and Speaker Recognition
Adaptation to a new speaker, speaker typology
Personalization in dialogue systems

5
Applications

Access Control
Physical facilities, Computer networks, Websites
Transaction Authentication
Telephone banking, e-Commerce
Speech data Management
Voice messaging, Search engines
Law Enforcement
Forensics, Home incarceration

6
Voice Biometric

Advantages
Often the only modality over the telephone
Low cost (microphone, A/D), ubiquity
Possible integration on a smart (SIM) card
Natural bimodal fusion: speaking face
Disadvantages
Lack of discretion
Possibility of imitation and electronic imposture
Lack of robustness to noise, distortion, ...
Temporal drift

7
Speaker Identity in Speech

Differences in
Vocal tract shapes and muscular control
Fundamental frequency (typical values)
100 Hz (Male), 200 Hz (Female), 300 Hz (Child)
Glottal waveform
Phonotactics
Lexical usage
The difference between the voices of twins is a
limiting case
Voices can also be imitated or disguised

9
Acoustic features

Short term spectral analysis
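
As a rough illustration of short-term spectral analysis, the sketch below cuts a signal into overlapping frames, applies a Hamming window, and takes the magnitude of a naive DFT per frame. The frame length, hop size, window choice and all function names are assumptions made for this example; the slide does not specify the front end actually used.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames (trailing remainder dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def magnitude_spectrum(frame):
    """Magnitude of the DFT of one windowed frame, bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def short_term_spectra(samples, frame_len=64, hop=32):
    """One magnitude spectrum per frame (frame_len and hop are illustrative)."""
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]

# Toy input (assumed): a 1 kHz sine sampled at 8 kHz
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spectra = short_term_spectra(signal)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * sample_rate / 64
```

In practice an FFT routine would replace the naive DFT, and further steps (for example cepstral coefficients) would usually follow.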

10
Intra- and Inter-speaker variability
11
Speaker Verification

Typology of approaches (EAGLES Handbook)
Text dependent
Public password
Private password
Customized password
Text prompted
Text independent
Incremental enrolment
Evaluation

12
History of Speaker Recognition
13
Current approaches
14
HMM structure depends on the application
15
Gaussian Mixture Model

Parametric representation of the probability
distribution of observations
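
A minimal sketch of how such a parametric model assigns a likelihood to an observation, assuming diagonal covariance matrices (an assumption; the slide does not say which covariance structure is used). The function names and the two-component toy model are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) = log sum_k w_k N(x; mu_k, var_k), via log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def average_log_likelihood(frames, weights, means, variances):
    """Mean per-frame log-likelihood of a sequence of feature vectors."""
    return sum(gmm_log_likelihood(f, weights, means, variances)
               for f in frames) / len(frames)

# Hypothetical 2-component model over 2-D features
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
near = gmm_log_likelihood([0.1, -0.1], weights, means, variances)
far = gmm_log_likelihood([10.0, 10.0], weights, means, variances)
```

A verification score is often formed by comparing the average log-likelihood under a client model with that under a background ("world") model.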

16
Gaussian Mixture Models
8 Gaussians per mixture
17
Decision theory for identity verification

Two types of errors
False rejection (a client is rejected)
False acceptance (an impostor is accepted)
Decision theory: given an observation O and a
claimed identity
H0 hypothesis: it comes from an impostor
H1 hypothesis: it comes from our client
H1 is chosen if and only if P(H1|O) > P(H0|O),
which could be rewritten (using Bayes' law) as
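
Writing each posterior with Bayes' law, P(H_i|O) = p(O|H_i) P(H_i) / p(O), turns the comparison into a likelihood-ratio test. The derivation below is a reconstruction from the definitions on this slide, not a copy of the original formula:

```latex
\begin{align*}
P(H_1 \mid O) &> P(H_0 \mid O) \\
\frac{p(O \mid H_1)\,P(H_1)}{p(O)} &> \frac{p(O \mid H_0)\,P(H_0)}{p(O)} \\
\frac{p(O \mid H_1)}{p(O \mid H_0)} &> \frac{P(H_0)}{P(H_1)}
\end{align*}
```

The left-hand side is the likelihood ratio of the client and impostor models; the right-hand side is a threshold that depends only on the prior probabilities.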

18
Signal detection theory
19
Decision
20
Distribution of scores
21
Detection Error Tradeoff (DET) Curve
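
The points of such a curve can be sketched by sweeping a decision threshold over client and impostor scores and recording both error rates. The scores below are made up, the function names are hypothetical, and the equal error rate is only approximated by the closest observed operating point.

```python
def error_rates(client_scores, impostor_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) at a threshold.

    A trial is accepted when its score is >= threshold.
    """
    fr = sum(s < threshold for s in client_scores) / len(client_scores)
    fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fr, fa

def det_points(client_scores, impostor_scores):
    """One (threshold, FR, FA) triple per distinct observed score."""
    thresholds = sorted(set(client_scores) | set(impostor_scores))
    return [(t, *error_rates(client_scores, impostor_scores, t))
            for t in thresholds]

def approx_equal_error_rate(points):
    """Mean of FR and FA at the operating point where they are closest."""
    _, fr, fa = min(points, key=lambda p: abs(p[1] - p[2]))
    return (fr + fa) / 2

# Made-up scores for illustration only
clients = [2.1, 1.7, 0.9, 2.5, 0.2]
impostors = [-1.0, 0.4, -0.3, 1.0, -2.2]
points = det_points(clients, impostors)
eer = approx_equal_error_rate(points)
```

A DET plot draws the (FA, FR) pairs on normal-deviate axes; the plotting step is left out here.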
22
Evaluation

Decision cost (FA, FR, priors, costs, ...)
Receiver Operating Characteristic Curve
Reference systems (open software)
Evaluations (algorithms, field trials, ergonomics, ...)
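
One common way to combine the error rates, priors and costs into a single decision cost is a weighted sum, sketched below. The parameter names are hypothetical, and the default values resemble figures commonly quoted for NIST evaluations; they should be treated as placeholders rather than as the settings used in this presentation.

```python
def detection_cost(p_fr, p_fa, cost_fr=10.0, cost_fa=1.0, p_client=0.01):
    """Expected cost of a decision: weighted false rejections plus false acceptances.

    p_fr, p_fa : false rejection and false acceptance rates at one operating point
    cost_fr, cost_fa, p_client : assumed costs and client prior (placeholders)
    """
    return cost_fr * p_fr * p_client + cost_fa * p_fa * (1.0 - p_client)

def bayes_threshold(cost_fr=10.0, cost_fa=1.0, p_client=0.01):
    """Likelihood-ratio threshold that minimizes the expected cost.

    Accept the claimed identity when p(O|client) / p(O|impostor) exceeds it.
    """
    return (cost_fa * (1.0 - p_client)) / (cost_fr * p_client)

cost = detection_cost(p_fr=0.1, p_fa=0.02)
threshold = bayes_threshold()
```

With equal costs the threshold reduces to the ratio of priors, P(impostor) / P(client).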

23
National Institute of Standards and Technology
(NIST) Speaker Verification Evaluations

Annual evaluation since 1995
Common paradigm for comparing technologies

24
NIST evaluations: Results
25
Combining Speech Recognition and Speaker
Verification.

Speaker independent phone HMMs
Selection of segments or segment classes which
are speaker specific
Preliminary evaluations are performed on the NIST
extended data set (one hour of training data per
speaker)

26
ALISP data-driven speech segmentation
27
Searching in client and world speech dictionaries
for speaker verification purposes
28
Fusion
29
Fusion results
30
Speaking Faces: Motivations

A person speaking in front of a camera offers 2
modalities for identity verification (speech and
face).
The sequence of face images and the
synchronisation of speech and lip movements could
be exploited.
Imposture is much more difficult than with single
modalities.
Many PCs, PDAs and mobile phones are equipped with a
camera. Audio-Visual Identity Verification will
offer non-intrusive security for e-commerce,
e-banking, ...
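
One simple way to exploit two modalities is score-level fusion, where a speech score and a face score are combined before a single accept/reject decision. The slides do not state which fusion scheme is used, so the weighted sum, the weight, the threshold and the function names below are all assumptions for illustration.

```python
def fuse_scores(speech_score, face_score, speech_weight=0.5):
    """Weighted sum of two verification scores (assumed to be on comparable scales)."""
    return speech_weight * speech_score + (1.0 - speech_weight) * face_score

def accept(speech_score, face_score, threshold=0.0, speech_weight=0.5):
    """Accept the claimed identity when the fused score exceeds the threshold."""
    return fuse_scores(speech_score, face_score, speech_weight) > threshold

# Made-up scores: speech supports the claim, face weakly contradicts it
fused = fuse_scores(2.0, -1.0)
decision = accept(2.0, -1.0)
```

Scores from different modalities usually need normalization to a common scale before such a sum is meaningful.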

31
Talking Face Recognition (hybrid verification)
32
Lip features

Tracking lip movements

33
A talking face model

Using Hidden Markov Models (HMMs)

34
Morphing, avatars
35
Conclusions, Perspectives

Deliberate imposture is a challenge for
speech-only systems
Verification of identity based on features
extracted from talking faces should be developed
Common databases and evaluation protocols are
necessary
Free access to reference systems will facilitate
future developments
