Title: Automatic Speaker Verification: Technologies, Evaluations and Possible Future
1Automatic Speaker Verification: Technologies, Evaluations and Possible Future
Biometrics in Current Security Environments
- Gérard CHOLLET
- CNRS-LTCI, GET-ENST
- chollet_at_tsi.enst.fr
2Outline
- State of affairs (tasks, security, forensics, ...)
- Speaker characteristics in the speech signal
- Automatic Speaker Verification
- Decision theory
- Text dependent / Text independent
- Imposture (occasional, dedicated)
- Voice transformations
- Audio-visual speaker verification
- Evaluations (algorithms, field tests, ergonomics, ...)
- Conclusions, Perspectives
3Why should a computer recognize who is speaking?
- Protection of individual property (home, bank account, personal data, messages, mobile phone, PDA, ...)
- Limited access (secured areas, databases)
- Personalization (only respond to its master's voice)
- Locate a particular person in an audio-visual document (information retrieval)
- Who is speaking in a meeting?
- Is a suspect the criminal? (forensic applications)
4Tasks in Automatic Speaker Recognition
- Speaker verification (Voice Biometric)
- Are you really who you claim to be?
- Identification (Speaker ID)
- Is this speech segment coming from a known speaker?
- How large is the set of speakers (population of the world)?
- Speaker detection, segmentation, indexing, retrieval, tracking
- Looking for recordings of a particular speaker
- Combining Speech and Speaker Recognition
- Adaptation to a new speaker, speaker typology
- Personalization in dialogue systems
5Applications
- Access Control
- Physical facilities, Computer networks, Websites
- Transaction Authentication
- Telephone banking, e-Commerce
- Speech data Management
- Voice messaging, Search engines
- Law Enforcement
- Forensics, Home incarceration
6Voice Biometric
- Advantages
- Often the only modality over the telephone
- Low cost (microphone, A/D), ubiquity
- Possible integration on a smart (SIM) card
- Natural bimodal fusion: speaking face
- Disadvantages
- Lack of discretion
- Possibility of imitation and electronic imposture
- Lack of robustness to noise, distortion, ...
- Temporal drift
7Speaker Identity in Speech
- Differences in
- Vocal tract shapes and muscular control
- Fundamental frequency (typical values)
- 100 Hz (Male), 200 Hz (Female), 300 Hz (Child)
- Glottal waveform
- Phonotactics
- Lexical usage
- The difference between the voices of twins is a limiting case
- Voices can also be imitated or disguised
9Acoustic features
- Short term spectral analysis
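A minimal sketch of short-term spectral analysis, assuming a fixed frame length, a 50% overlap and a Hamming window. None of these choices come from the slides, and the function names are hypothetical. The naive DFT keeps the example dependency-free; a real front end would use an FFT and usually go on to cepstral features.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), acceptable for a demo)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def short_term_spectra(signal, frame_len=64, hop=32):
    """One windowed magnitude spectrum per frame."""
    window = hamming(frame_len)
    return [magnitude_spectrum([x * w for x, w in zip(frame, window)])
            for frame in frame_signal(signal, frame_len, hop)]


# Usage: a 1 kHz tone sampled at 8 kHz (illustrative values only).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spectra = short_term_spectra(tone)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * sample_rate / 64  # bin index converted to Hz
```

Each row of `spectra` describes one short window of the signal, which is the kind of frame-level observation the later models operate on.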
10Intra- and Inter-speaker variability
11Speaker Verification
- Typology of approaches (EAGLES Handbook)
- Text dependent
- Public password
- Private password
- Customized password
- Text prompted
- Text independent
- Incremental enrolment
- Evaluation
12History of Speaker Recognition
13Current approaches
14HMM structure depends on the application
15Gaussian Mixture Model
- Parametric representation of the probability
distribution of observations
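As an illustration of this parametric representation, the sketch below evaluates the log-likelihood of feature vectors under a Gaussian mixture. Diagonal covariances and the function names are assumptions made for the example, not details given in the slides, and training (for instance by EM) is left out.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) = log sum_k w_k N(x; mean_k, var_k), computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def average_log_likelihood(frames, weights, means, variances):
    """Mean per-frame log-likelihood of a sequence of feature vectors."""
    total = sum(gmm_log_likelihood(x, weights, means, variances) for x in frames)
    return total / len(frames)


# Usage with a toy two-component model in two dimensions (made-up parameters).
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
score = average_log_likelihood([[0.1, -0.2], [2.9, 3.1]], weights, means, variances)
```

The log-sum-exp step avoids numerical underflow when component likelihoods are very small, which is common with high-dimensional features.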
16Gaussian Mixture Models
8 Gaussians per mixture
17Decision theory for identity verification
- Two types of errors
- False rejection (a client is rejected)
- False acceptance (an impostor is accepted)
- Decision theory: given an observation O and a claimed identity
- H0 hypothesis: it comes from an impostor
- H1 hypothesis: it comes from our client
- H1 is chosen if and only if P(H1|O) > P(H0|O)
- which can be rewritten (using Bayes' law) as P(O|H1) / P(O|H0) > P(H0) / P(H1)
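With H1 the client hypothesis and H0 the impostor hypothesis, the Bayes rule on this slide amounts to comparing a likelihood ratio with a ratio of priors. The sketch below works in the log domain. The function name and the default prior are hypothetical, and in practice the two log-likelihoods would come from a client model and an impostor (world) model.

```python
import math


def accept_claim(log_lik_client, log_lik_impostor, prior_client=0.5):
    """Choose H1 iff P(O|H1) / P(O|H0) > P(H0) / P(H1), evaluated in logs.

    prior_client is P(H1); P(H0) is taken as 1 - P(H1) (an assumption:
    only these two hypotheses are considered).
    """
    log_threshold = math.log((1.0 - prior_client) / prior_client)
    return (log_lik_client - log_lik_impostor) > log_threshold


# Usage: the same evidence leads to different decisions under different priors.
equal_priors = accept_claim(-10.0, -12.0)                     # threshold log(1) = 0
rare_client = accept_claim(-10.0, -12.0, prior_client=0.01)   # threshold log(99)
```

Lowering the prior of a genuine client raises the threshold, so the same log-likelihood ratio of 2 is accepted in the first call and rejected in the second.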
18Signal detection theory
19Decision
20Distribution of scores
21Detection Error Tradeoff (DET) Curve
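A DET curve plots the false rejection rate against the false acceptance rate as the decision threshold varies. The sketch below computes those operating points from two lists of scores and picks the point closest to equal error. The scores are made up for illustration and the function names are hypothetical; a plotted DET curve would additionally use normal-deviate axes, which are omitted here.

```python
def error_rates(client_scores, impostor_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate).

    A trial is accepted when its score is >= threshold.
    """
    fr = sum(s < threshold for s in client_scores) / len(client_scores)
    fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fr, fa


def det_points(client_scores, impostor_scores):
    """One (threshold, FR, FA) triple per distinct observed score."""
    thresholds = sorted(set(client_scores) | set(impostor_scores))
    return [(t,) + error_rates(client_scores, impostor_scores, t)
            for t in thresholds]


def approximate_eer(client_scores, impostor_scores):
    """Operating point where FR and FA are closest; returns (eer, threshold)."""
    t, fr, fa = min(det_points(client_scores, impostor_scores),
                    key=lambda p: abs(p[1] - p[2]))
    return (fr + fa) / 2, t


# Usage with invented scores (higher means more client-like).
clients = [2.0, 1.5, 0.5, 3.0]
impostors = [-1.0, 0.0, 1.0, -2.0]
eer, eer_threshold = approximate_eer(clients, impostors)
```

With so few trials the equal error rate is only a coarse estimate; evaluation campaigns use thousands of trials per condition.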
22Evaluation
- Decision cost (FA, FR, priors, costs, ...)
- Receiver Operating Characteristic Curve
- Reference systems (open software)
- Evaluations (algorithms, field trials, ergonomics, ...)
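The decision cost mentioned in this list combines the two error rates with their costs and the prior probability of a genuine client. A minimal sketch follows; the function name and the default cost and prior values are illustrative assumptions, not figures taken from the slides.

```python
def detection_cost(fr_rate, fa_rate, cost_fr=10.0, cost_fa=1.0, prior_client=0.01):
    """Expected cost = C_FR * P(FR | client) * P(client)
                     + C_FA * P(FA | impostor) * P(impostor).

    All default values are placeholders chosen for the example.
    """
    return (cost_fr * fr_rate * prior_client
            + cost_fa * fa_rate * (1.0 - prior_client))


# Usage: compare two hypothetical operating points of the same system.
strict = detection_cost(fr_rate=0.10, fa_rate=0.02)
lenient = detection_cost(fr_rate=0.02, fa_rate=0.10)
```

Because impostor trials dominate under this prior, the lenient operating point ends up more expensive despite rejecting fewer clients.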
23National Institute of Standards and Technology (NIST) Speaker Verification Evaluations
- Annual evaluation since 1995
- Common paradigm for comparing technologies
24NIST evaluations: results
25Combining Speech Recognition and Speaker Verification
- Speaker-independent phone HMMs
- Selection of segments or segment classes which are speaker specific
- Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)
26ALISP data-driven speech segmentation
27Searching in client and world speech dictionaries
for speaker verification purposes
28Fusion
29Fusion results
30Speaking Faces: Motivations
- A person speaking in front of a camera offers 2 modalities for identity verification (speech and face).
- The sequence of face images and the synchronisation of speech and lip movements could be exploited.
- Imposture is much more difficult than with single modalities.
- Many PCs, PDAs and mobile phones are equipped with a camera. Audio-visual identity verification will offer non-intrusive security for e-commerce, e-banking, ...
31Talking Face Recognition (hybrid verification)
32Lip features
33A talking face model
- Using Hidden Markov Models (HMMs)
34Morphing, avatars
35Conclusions, Perspectives
- Deliberate imposture is a challenge for speech-only systems
- Verification of identity based on features extracted from talking faces should be developed
- Common databases and evaluation protocols are necessary
- Free access to reference systems will facilitate future developments