Robust Multi-modal Person Identification with Tolerance of Facial Expression

Transcript and Presenter's Notes

1
Robust Multi-modal Person Identification with
Tolerance of Facial Expression
Niall Fox, Dr Richard Reilly
University College Dublin, Ireland
2
Overview
  • Motivation
  • Analysis for Speech and Mouth Feature Experts
  • Results for the 2 Individual Experts
  • Automatic Integration of Experts
  • Results of Integration
  • Conclusions

3
Motivation
  • Human Communication is multimodal
  • Benefits of using visual information
  • - Unaffected by acoustic noise
  • - Complementary to audio signal
  • - Audio and visual noise are uncorrelated
  • - Increased robustness and accuracy

4
Audio-Visual Platform
[Figure: block diagram of the audio-visual platform; surviving labels: Modelling/Scoring, Score Integration]
5
Audio Expert
  • 20 ms Hamming window, 10 ms overlap
  • 16 static features
  • - 15 Mel Frequency Cepstral Coefficients (MFCC)
  • - 1 energy value per frame
  • 16 delta features
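
The framing and delta steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give the sample rate, the delta formula, or the MFCC implementation, so the function names, the two-frame central difference, and the edge handling below are all hypothetical, and the 15 MFCCs themselves are not computed here.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=20.0, overlap_ms=10.0):
    """Cut the signal into Hamming-windowed frames (20 ms long, 10 ms overlap)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = frame_len - int(round(sample_rate * overlap_ms / 1000.0))
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + n] * window[n] for n in range(frame_len)])
    return frames

def log_energy(frame, floor=1e-10):
    """Log of the frame energy; the floor avoids log(0) on silent frames."""
    return math.log(max(sum(x * x for x in frame), floor))

def delta(features):
    """First-order deltas by central difference (assumed formula).

    Edge frames reuse their own value as the missing neighbour.
    """
    last = len(features) - 1
    out = []
    for t in range(len(features)):
        prev = features[max(t - 1, 0)]
        nxt = features[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def stack_static_and_delta(static):
    """Append deltas to the static vectors: 16 static + 16 delta = 32 per frame."""
    return [s + d for s, d in zip(static, delta(static))]
```

With a hypothetical 8 kHz sample rate, one second of audio gives 99 frames of 160 samples each. Each frame would then contribute 15 MFCCs plus `log_energy(frame)` as its 16 static features before `stack_static_and_delta` is applied.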

6
Mouth Features Expert
  • ROI Extraction
  • Gray scale image is employed
  • Pre-processing
  • - Histogram equalisation
  • - De-meaning
  • DCT applied to ROI (top 14 features selected)
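
As a rough illustration of the mouth-feature chain above (histogram equalisation, de-meaning, DCT, feature selection), the following pure-Python sketch processes a small grey-scale ROI. The slides do not say how the "top 14" coefficients are chosen; picking the 14 lowest-frequency coefficients in a zig-zag-like order is an assumption made here, and all function names are hypothetical.

```python
import math

def equalise_histogram(roi, levels=256):
    """Histogram-equalise a grey-scale ROI (rows of ints in [0, levels - 1])."""
    flat = [p for row in roi for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    denom = max(len(flat) - cdf_min, 1)  # guard against a constant image
    return [[(cdf[p] - cdf_min) * (levels - 1) / denom for p in row] for row in roi]

def de_mean(roi):
    """Subtract the mean pixel value so the ROI has zero mean."""
    mean = sum(sum(row) for row in roi) / (len(roi) * len(roi[0]))
    return [[p - mean for p in row] for row in roi]

def _dct_basis(size):
    """Rows are the orthonormal DCT-II basis vectors of the given length."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / size)
             * math.cos(math.pi * (2 * i + 1) * k / (2.0 * size))
             for i in range(size)] for k in range(size)]

def dct_2d(roi):
    """Direct 2-D DCT-II; cubic cost, acceptable for a small mouth ROI."""
    h, w = len(roi), len(roi[0])
    rows_basis, cols_basis = _dct_basis(h), _dct_basis(w)
    # Transform along x for every row, then along y for every column.
    tmp = [[sum(cols_basis[v][x] * roi[y][x] for x in range(w)) for v in range(w)]
           for y in range(h)]
    return [[sum(rows_basis[u][y] * tmp[y][v] for y in range(h)) for v in range(w)]
            for u in range(h)]

def top_coefficients(coeffs, count=14):
    """Assumed selection rule: the lowest-frequency coefficients first."""
    h, w = len(coeffs), len(coeffs[0])
    order = sorted(((u, v) for u in range(h) for v in range(w)),
                   key=lambda uv: (uv[0] + uv[1], uv[0]))
    return [coeffs[u][v] for u, v in order[:count]]

def mouth_features(roi):
    """ROI -> equalise -> de-mean -> 2-D DCT -> 14 coefficients."""
    return top_coefficients(dct_2d(de_mean(equalise_histogram(roi))))
```

Because the ROI is de-meaned before the transform, the DC coefficient is zero, so a real system might skip it when selecting features; the slides do not say either way.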

7
Database
  • XM2VTS database
  • 295 subjects
  • 4 sessions (spaced one month apart) of the sentence
    "Joe took father's green shoe bench out"

8
Person Identification Tests
  • Tested on 251 subjects from database of 295
  • Train models on monthly sessions 1, 2 and 3;
    test on session 4
  • HMMs model audio and mouth features
  • AWGN was added to the audio
  • JPEG compression of video images
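
The audio degradation step can be illustrated with a short sketch that adds white Gaussian noise at a chosen level. It assumes the dB figures quoted in the results are signal-to-noise ratios, which the slides imply but do not state; the function names are hypothetical.

```python
import math
import random

def add_awgn(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise at the requested SNR (in dB)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def measured_snr_db(clean, noisy):
    """Estimate the SNR actually achieved, for checking the degradation level."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)
```

For example, `add_awgn(samples, 21.0)` would produce the most degraded audio condition quoted in the results, under the SNR assumption above.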

9
Audio Expert Scores
  • 97% at 48 dB, 37% at 21 dB
  • Large roll-off

10
Image Degradation Levels
  • Image frames
  • Mouth regions
  • 10 levels of JPEG compression

11
Mouth Features Expert Scores
  • 86% at QF 50, 48% at QF 2

12
Audio-Visual Platform
[Figure: block diagram of the audio-visual platform; surviving labels: Modelling/Scoring, Score Integration]
13
Expert Weightings
  • Weighted Likelihood Summation
  • Expert Reliability Measure
  • Automatically Choose Weight
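
One way to picture the three steps above is the sketch below. It is not the authors' method: the slides name weighted likelihood summation and a reliability measure but give no formulas, so the min-max normalisation, the "gap between the two best scores" reliability measure, and the weight rule are all assumptions chosen for illustration.

```python
def normalise(scores):
    """Min-max normalise per-subject log-likelihoods to [0, 1] (assumed step)."""
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    if span == 0:
        return {subject: 0.0 for subject in scores}
    return {subject: (s - low) / span for subject, s in scores.items()}

def reliability(normalised):
    """Hypothetical reliability: gap between the best and second-best scores."""
    ranked = sorted(normalised.values(), reverse=True)
    return ranked[0] - ranked[1] if len(ranked) > 1 else 0.0

def fuse(audio_scores, video_scores):
    """Weighted summation of the two experts' scores.

    The audio weight is chosen automatically from the two reliabilities;
    the video expert receives the remainder.
    """
    audio, video = normalise(audio_scores), normalise(video_scores)
    r_audio, r_video = reliability(audio), reliability(video)
    total = r_audio + r_video
    weight = r_audio / total if total > 0 else 0.5
    fused = {subject: weight * audio[subject] + (1.0 - weight) * video[subject]
             for subject in audio}
    return max(fused, key=fused.get), weight
```

The intent is that an expert whose scores barely separate the candidates (for example, audio under heavy noise) is given less say in the fused decision than one with a clear winner.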

14
Expert Weightings
  • Automatically choose weight

15
Fusion of Audio and Mouth Feature Experts
  • Audio: 37% at 21 dB; Video: 48% at QF 2;
    AV: 72% at (21 dB, QF 2)

16
Conclusions
  • AV system is robust to both audio and visual
    degradations
  • High performance of mouth region (85%)
  • - Robust to facial expressions and occlusion

Further work
  • Test other types of audio and visual degradations
  • XM2VTS DB is high quality
  • Record real world data in office type scenario

17
XM2VTS Database
  • Controlled, uniform illumination
  • Constant visual background
  • Controlled acoustic background

18
UCD Recordings
  • Non-controlled, non-uniform illumination
  • Varying visual background
  • Noisy acoustic background

19
Niall Fox. Email: niall_at_ee.ucd.ie. Web:
http://ee.ucd.ie/niall/
Dr Richard Reilly (richard.reilly_at_ucd.ie), DSP
Group, UCD, Dublin, Ireland
This work is supported by Enterprise Ireland
under the Informatics Research Initiative