Title: Fusion
1 Fusion
Gérard CHOLLET
chollet_at_tsi.enst.fr
GET-ENST/CNRS-LTCI
46 rue Barrault
75634 PARIS cedex 13
http://www.tsi.enst.fr/chollet
2 Outline
- Motivations, Applications
- Pattern recognition
- Multiple sensors
- Signal enhancement
- Features
- Scores
- Decisions
- Conclusions
- Perspectives
3 Introduction
- Pattern recognition
- Why fuse?
- What to fuse?
- Signals from different sensors,
- Features measured on these signals,
- Scores computed by classifiers,
- Decisions made by classifiers
- How to fuse?
4 Pattern recognition
5 Fusion of signals
- Number of sensors
- Types of sensors
- Identical?
- Number of sources
- Examples
- Microphone arrays
- Stereo vision
- Seismograph
6 Fusion of features
- From a single sensor
- From several sensors
- Multi-stream models
- Examples
- Speech recognition
- Bayesian networks
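One simple reading of feature-level fusion across several sensors is to concatenate time-aligned feature vectors frame by frame before classification. The sketch below is only an illustration under that assumption: it presumes both streams already share the same frame rate, and the function name `concat_features` and the sample values are hypothetical, not taken from the slides.

```python
def concat_features(stream_a, stream_b):
    """Concatenate time-aligned feature vectors from two sensors, frame by frame."""
    if len(stream_a) != len(stream_b):
        raise ValueError("streams must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(stream_a, stream_b)]


# Hypothetical example: 2 frames, 2 audio features and 1 visual feature each.
audio = [[0.1, 0.2], [0.3, 0.4]]
video = [[1.0], [2.0]]
fused = concat_features(audio, video)
```

Multi-stream models go further than plain concatenation by modelling each stream separately, which this sketch does not attempt.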
7 Fusion of scores
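A common way to fuse scores from several classifiers is to normalize each score and then take a weighted sum. The sketch below assumes z-normalization against a set of reference scores and equal weights; the helper names, the reference scores and the zero threshold are all hypothetical choices for illustration, not values from the slides.

```python
from statistics import mean, pstdev


def z_normalize(score, reference_scores):
    """Shift and scale a raw score using the mean and std of reference scores."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)
    if sigma == 0:
        raise ValueError("reference scores must not all be equal")
    return (score - mu) / sigma


def fuse_scores(scores, weights):
    """Weighted sum of already-normalized scores."""
    if len(scores) != len(weights):
        raise ValueError("one weight per score is required")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical reference scores for two classifiers with different ranges.
speech_ref = [1.0, 2.0, 3.0]
face_ref = [10.0, 20.0, 30.0]

speech_z = z_normalize(3.0, speech_ref)
face_z = z_normalize(20.0, face_ref)
fused = fuse_scores([speech_z, face_z], [0.5, 0.5])
accept = fused > 0.0  # assumed decision threshold
```

Normalization matters here because raw scores from different classifiers rarely share a scale; without it the classifier with the larger range would dominate the sum.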
8 Fusion of decisions
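When only the hard decisions of the classifiers are available, one standard combination rule is majority voting. The sketch below is a minimal illustration of that rule; the function name `majority_vote` and the choice to return `None` on a tie are assumptions made for the example.

```python
from collections import Counter


def majority_vote(decisions):
    """Return the label chosen by most classifiers, or None if the top two tie."""
    if not decisions:
        raise ValueError("at least one decision is required")
    ranked = Counter(decisions).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical decisions from three classifiers.
result = majority_vote(["accept", "reject", "accept"])
```

Using an odd number of classifiers avoids ties for two-class problems such as accept/reject.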
9 Vector Quantization (VQ)
SOONG, ROSENBERG 1987
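The general idea behind VQ-based speaker recognition can be sketched as follows: each speaker is represented by a codebook of feature vectors, and a test utterance is attributed to the speaker whose codebook yields the lowest average quantization distortion. This is a minimal illustration of that idea, not necessarily the exact procedure of the cited work; the codebooks, speaker names and squared Euclidean distance are assumptions, and codebook training is omitted.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def avg_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def identify(frames, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk]))


# Hypothetical two-codeword codebooks for two speakers.
codebooks = {
    "alice": [[0.0, 0.0], [1.0, 1.0]],
    "bob": [[5.0, 5.0], [6.0, 6.0]],
}
test_frames = [[0.1, 0.0], [0.9, 1.1]]
best = identify(test_frames, codebooks)
```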
10 Hidden Markov Models (HMM)
ROSENBERG 1990, TSENG 1992
11 Ergodic HMM
PORITZ 1982, SAVIC 1990
12 Gaussian Mixture Models (GMM)
REYNOLDS 1995
13 HMM structure depends on the application
14 Gaussian Mixture Model
- Parametric representation of the probability distribution of observations
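As an illustration of this parametric representation, a GMM models the density of an observation x as a weighted sum of Gaussian components. The sketch below evaluates the log-likelihood of one observation, assuming diagonal covariance matrices (a common simplification, not stated on the slide); the function names and parameter values are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) = log sum_k w_k N(x; mean_k, diag(var_k)), via log-sum-exp."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Hypothetical one-dimensional mixture with two components.
ll = gmm_log_likelihood([0.0], [0.5, 0.5], [[0.0], [4.0]], [[1.0], [1.0]])
```

The log-sum-exp form avoids numerical underflow when component likelihoods are very small, which is typical for high-dimensional speech features.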
15 Gaussian Mixture Models
8 Gaussians per mixture
16 Support Vector Machines and Speaker Verification
- A hybrid GMM-SVM system is proposed
- An SVM scoring model is trained on development data to classify true-target speaker accesses and impostor accesses, using a new feature representation based on GMMs
17 SVM principles
18 Results
19 Combining Speech Recognition and Speaker Verification
- Speaker-independent phone HMMs
- Selection of segments or segment classes which are speaker specific
- Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)
- Some developments were done during a 6-week workshop (SuperSID) during summer 2002
20 SuperSID experiments
21 GMM with cepstral features
22 Selection of nasals in words in -ing
being everything getting anything thing something things going
23 Fusion
24 Fusion results
25 Audio-Visual Identity Verification
- A person speaking in front of a camera offers 2 modalities for identity verification (speech and face).
- The sequence of face images and the synchronisation of speech and lip movements could be exploited.
- Imposture is much more difficult than with single modalities.
- Many PCs, PDAs and mobile phones are equipped with a camera. Audio-Visual Identity Verification will offer non-intrusive security for e-commerce, e-banking, etc.
26 Examples of Speaking Faces
Sequence of digits (PIN code)
Free text
27 Fusion of Speech and Face
(from the thesis of Conrad Sanderson, Aug. 2002)
28 An illustration
Insecure Network
Distant server
- Access to private data
- Secured transactions
- Acquisition of biometric signals for each modality
- Scores are computed for each modality
- Fusion of scores and decision
29 Conclusions and Perspectives
- Speech is often the only usable biometric modality (over the telephone network).
- Interactive Voice Servers may use both text-dependent and text-independent approaches for improved verification accuracy.
- Evaluation campaigns and research workshops are efficient means to stimulate progress.
- Most PCs, PDAs and mobile phones will be equipped with cameras. Audio-Visual Identity Verification should find applications in e-Banking, e-Commerce, etc.