Title: Robust Multi-modal Person Identification with Tolerance of Facial Expression
1Robust Multi-modal Person Identification with
Tolerance of Facial Expression
Niall Fox, Dr Richard Reilly
University College Dublin, Ireland
2Overview
- Motivation
- Analysis for Speech and Mouth Feature Experts
- Results for the 2 Individual Experts
- Automatic Integration of Experts
- Results of Integration
- Conclusions
3Motivation
- Human Communication is multimodal
- Benefits of using visual information
- - Unaffected by acoustic noise
- - Complementary to audio signal
- - Audio and visual noise are uncorrelated
- - Increased robustness and accuracy
4Audio-Visual Platform
[Figure: block diagram of the audio-visual platform, with Modelling/Scoring and Score Integration stages]
5Audio Expert
- 20 ms Hamming window, 10 ms overlap
- 16 static features
- - 15 Mel Frequency Cepstral Coefficients (MFCC)
- - 1 energy value per frame
- 16 delta features
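The audio front end listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the sampling rate `fs`, the number of mel filters (24), the FFT size, the use of log energy, and the use of `np.gradient` for the delta features are all assumptions that do not come from the slides.

```python
import numpy as np

def frame_signal(x, fs, win_ms=20.0, hop_ms=10.0):
    """Split x into 20 ms Hamming-windowed frames with a 10 ms hop (10 ms overlap)."""
    x = np.asarray(x, dtype=float)
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(win)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly on the mel scale (filter count is an assumption)."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel2hz(np.linspace(0.0, hz2mel(fs / 2.0), n_filters + 2))
    bins = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_filters, len(bins)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - lo) / (mid - lo)
        falling = (hi - bins) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def audio_features(x, fs, n_mfcc=15, n_filters=24):
    """Return one 32-dimensional vector per frame: 16 static + 16 delta features."""
    frames = frame_signal(x, fs)
    win = frames.shape[1]
    n_fft = 1 << (win - 1).bit_length()          # next power of two >= frame length
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, fs).T + 1e-10)
    # DCT-II of the log mel energies; coefficients 1..15 are kept as the MFCCs.
    k = np.arange(1, n_mfcc + 1)[:, None]
    n = np.arange(n_filters)[None, :]
    mfcc = log_mel @ np.cos(np.pi * k * (n + 0.5) / n_filters).T
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    static = np.hstack([mfcc, log_energy[:, None]])   # 15 MFCC + 1 energy
    delta = np.gradient(static, axis=0)               # assumed delta estimator
    return np.hstack([static, delta])
```

Calling `audio_features(x, fs=16000)` on a one-second signal yields a matrix with one row per 10 ms frame and 32 columns.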
6Mouth Features Expert
- ROI extraction
- - Gray scale image is employed
- Pre-processing
- - Histogram equalisation
- - De-meaning
- DCT applied to ROI (top 14 features selected)
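The mouth feature pipeline above could look something like the sketch below. The slides do not say how the "top 14" DCT coefficients are chosen; this example assumes the 14 lowest-frequency coefficients taken diagonal by diagonal with the DC term skipped (DC is zero after de-meaning). The function names and the 8-bit gray scale input are also assumptions.

```python
import numpy as np

def equalise_histogram(img):
    """Histogram-equalise a 2-D uint8 gray scale image via its cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist) / img.size
    return cdf[img] * 255.0

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (m + 0.5) / n)
    c[0] /= np.sqrt(2.0)
    return c

def mouth_features(roi, n_keep=14):
    """Equalise, de-mean, apply a 2-D DCT, and keep n_keep low-frequency coefficients."""
    eq = equalise_histogram(roi)
    eq = eq - eq.mean()                               # de-meaning
    h, w = eq.shape
    coeffs = dct_matrix(h) @ eq @ dct_matrix(w).T     # separable 2-D DCT
    # Assumed selection rule: walk the anti-diagonals from low to high frequency.
    order = sorted(((u, v) for u in range(h) for v in range(w)),
                   key=lambda p: (p[0] + p[1], p[0]))
    return np.array([coeffs[u, v] for u, v in order[1:n_keep + 1]])  # skip DC
```

Each video frame's mouth ROI then contributes one 14-dimensional feature vector to the mouth features expert.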
7Database
- XM2VTS database
- 295 subjects
- 4 sessions (spaced one month apart) of the sentence
- - "Joe took father's green shoe bench out"
8Person Identification Tests
- Tested on 251 subjects from database of 295
- Train models on monthly sessions 1, 2 and 3; test on session 4
- HMMs model audio and mouth features
- AWGN was added to the audio
- JPEG compression of video images
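The audio degradation can be illustrated with a short sketch that adds white Gaussian noise at a chosen signal-to-noise ratio. The function name `add_awgn` and the choice to rescale the noise so the requested SNR is met exactly are assumptions; the slides only state that AWGN was added. The JPEG degradation of the video is not shown here.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise at the requested SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    noise = rng.standard_normal(signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Noise power required so that 10*log10(signal_power / noise_power) == snr_db.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise *= np.sqrt(target_noise_power / noise_power)
    return signal + noise
```

For example, `add_awgn(clean, 21.0)` produces a test utterance at 21 dB SNR from a clean waveform `clean`.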
9Audio Expert Scores
10Image Degradation Levels
- 10 levels of JPEG compression
11Mouth Features Expert Scores
12Audio-Visual Platform
[Figure: block diagram of the audio-visual platform, with Modelling/Scoring and Score Integration stages]
13Expert Weightings
- Weighted Likelihood Summation
- Expert Reliability Measure
- Automatically Choose Weight
14Expert Weightings
- Automatically choose weight
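A weighted likelihood summation with an automatically chosen weight might be sketched as below. The slides do not define the reliability measure or how it maps to a weight, so both are assumptions here: reliability is taken as the gap between the best and second-best normalised score of an expert, and the audio weight is that expert's share of the total reliability. All function names are hypothetical.

```python
import numpy as np

def normalise(scores):
    """Zero-mean, unit-variance normalisation of one expert's per-subject scores."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / (scores.std() + 1e-12)

def reliability(scores):
    """Assumed reliability measure: gap between the two highest normalised scores."""
    second, best = np.sort(normalise(scores))[-2:]
    return best - second

def fuse(audio_scores, visual_scores):
    """Weighted summation of expert scores; returns (identified subject index, audio weight)."""
    r_audio = reliability(audio_scores)
    r_visual = reliability(visual_scores)
    w = r_audio / (r_audio + r_visual + 1e-12)        # assumed mapping to a weight
    fused = w * normalise(audio_scores) + (1.0 - w) * normalise(visual_scores)
    return int(np.argmax(fused)), w
```

With this rule, an expert whose top two candidates are nearly tied (for example, audio under heavy noise) receives a small weight, so the other expert dominates the decision.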
15Fusion of Audio and Mouth Feature Experts
- A = 37% at 21 dB, V = 48% at QF 2, AV = 72% at (21 dB, QF 2)
16Conclusions
- AV system is robust to both audio and visual degradations
- High performance of mouth region (85%)
- - Robust to facial expressions and occlusion
Further work
- Test other types of audio and visual degradations
- - XM2VTS database is high quality
- Record real world data in office type scenario
17XM2VTS Database
- Controlled, uniform illumination
- Constant visual background
- Controlled acoustic background
18UCD Recordings
- Non-controlled, non-uniform illumination
- Varying visual background
- Noisy acoustic background
19Niall Fox (Email: niall_at_ee.ucd.ie, Web: http://ee.ucd.ie/niall/)
Dr Richard Reilly (richard.reilly_at_ucd.ie)
DSP Group, UCD, Dublin, Ireland
This work is supported by Enterprise Ireland
under the Informatics Research Initiative