Title: Performance Evaluation Measures for Face Detection Algorithms
1. Performance Evaluation Measures for Face Detection Algorithms
- Prag Sharma, Richard B. Reilly
- DSP Research Group, Department of Electronic and Electrical Engineering, University College Dublin, Ireland.
2. Aim
- To highlight the lack of standard performance evaluation measures for face detection purposes.
- To propose a method for the evaluation and comparison of existing face detection algorithms in an unbiased manner.
- To apply the proposed method to an existing face detection algorithm.
3. Face Detection Applications and Challenges Posed
4. Need for Face Detection
- Face Recognition
- Intelligent Vision-based Human-Computer Interaction
- Object-based Video Processing
  - Content-based functionalities
  - Improved coding efficiency
  - Improved error-robustness
  - Content description
5. Challenges Associated with Face Detection
- Pose and Orientation
- Presence or Absence of Structural Components
- Facial Expressions and Occlusion
- Imaging Conditions
6. Performance Evaluation Measures
7. Need for Standard Performance Evaluation Measures
- Comparison and testing are the main drivers of research advancement.
- To obtain an impartial and empirical evaluation and comparison of any two methods, it is important to consider the following points:
  - Use of a standard and representative test set for evaluation.
  - Use of standard terminology for the presentation of results.
8. Standard and Representative Test Set for Evaluation
9. Use of Standard Terminology
- Lack of standard terminology to describe results leads to difficulty in comparing algorithms.
- E.g., while one algorithm may consider a detection successful if the bounding box contains the eyes and mouth, another may require the entire face (including forehead and hair) to be enclosed in the bounding box for a positive result.
Figure: Successful face detection by (a) Rowley et al. and (b) Hsu et al.
10. Use of Standard Terminology
- Lack of standard terminology to describe results leads to difficulty in comparing algorithms.
- Moreover, there may be differences in the definition of a face (e.g., cartoon, hand-drawn or human faces).
11. Use of Standard Terminology
- Therefore, the first step towards a standard evaluation protocol is to answer the following questions:
  - What is a face?
  - What constitutes successful face detection?
12. Use of Standard Terminology
- What is a face?
  - Several databases contain human faces, animal faces, cartoon faces, line-drawn faces, and frontal and profile view faces.
  - MIT-23 contains 23 images with 149 faces.
  - MIT-20 contains only 20 images with 136 faces (excluding hand-drawn and cartoon faces).
  - For the CMU database, Rowley established ground truth for 483 faces by excluding some of the occluded faces and the non-human faces.
- Therefore, the total number of faces in a database can vary for different algorithms!
13. Use of Standard Terminology
- To eliminate this problem:
  - Use only standard databases that come with clearly marked faces in terms of cartoon/human, pose, orientation, occlusion and the presence or absence of structural components such as glasses or sunglasses.
  - Previous work in this area has led to the development of the UCD Colour Face Image Database, in which each face is marked using clearly defined terms (http://dsp.ucd.ie/prag).
  - This eliminates any misinterpretation of pose variations, orientation etc. by different researchers, as a fixed number of cartoon faces, hand-drawn faces and faces in different poses and orientations is provided with the database.
14. Use of Standard Terminology
- What constitutes successful face detection?
  - Most face detection algorithms do not clearly define a successful face detection process.
  - A uniform criterion should be adopted to define a successful detection.
Figure: (a) Test image. (b) Possible face detection results to be classified as face or non-face.
15. Use of Standard Terminology
- What constitutes successful face detection?
  - Criterion adopted by Rowley: the center of the detected bounding box must be within four pixels, and the scale must be within a factor of 1.2 (their scale step size), of the ground truth (recorded manually).
  - Face detection results should be presented in such a manner that the interpretation of the results remains open for specific applications.
  - Graphical representation: number of faces vs. percentage overlap.
  - Use a database that comes with hand-segmented results outlining each face, e.g. the UCD Colour Face Image Database.
- Therefore, a correct face detection is one in which the bounding box includes the visible eyes and the mouth region, and the overlap between the hand-segmented result and the detection result is greater than a fixed threshold (the threshold depending on the application).
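The overlap criterion above can be sketched in a few lines of Python. This is a minimal sketch, assuming axis-aligned (x1, y1, x2, y2) boxes and measuring overlap as the fraction of the hand-segmented region covered by the detection; the box representation and function names are illustrative, not taken from the paper.

```python
def overlap_percentage(gt, det):
    """Percentage of the ground-truth (hand-segmented) box covered by the detection.

    Boxes are axis-aligned (x1, y1, x2, y2) tuples -- an assumed representation.
    """
    ix1, iy1 = max(gt[0], det[0]), max(gt[1], det[1])
    ix2, iy2 = min(gt[2], det[2]), min(gt[3], det[3])
    iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)  # intersection width/height
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return 100.0 * iw * ih / gt_area if gt_area else 0.0


def is_correct_detection(gt, det, threshold=85.0):
    # A detection counts as correct when its overlap with the hand-segmented
    # ground truth exceeds the application-dependent threshold.
    return overlap_percentage(gt, det) >= threshold
```

A perfectly aligned detection gives 100% overlap; a disjoint one gives 0% and is rejected at any threshold.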
16. Use of Standard Terminology
- What constitutes successful face detection?
  - Use standard terminology in describing results:
    - Detection rate: the ratio of the number of faces correctly detected to the number of faces determined by a human expert (hand-segmented results).
    - False positive: an image region is declared to be a face when it is not.
    - False negative: an image region that is a face is not detected at all.
    - False detections = false positives + false negatives.
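The definitions above can be sketched as a small summary helper. The function and its inputs are hypothetical; the counts are assumed to come from comparing detections against hand-segmented ground truth.

```python
def summarize(num_ground_truth, num_correct, num_false_positives):
    """Summarize detection results using the terminology defined above."""
    false_negatives = num_ground_truth - num_correct
    return {
        # faces correctly detected / faces determined by a human expert
        "detection_rate": num_correct / num_ground_truth,
        "false_positives": num_false_positives,
        "false_negatives": false_negatives,
        # false detections = false positives + false negatives
        "false_detections": num_false_positives + false_negatives,
    }
```

For example, 90 correct detections out of 100 ground-truth faces with 5 false positives gives a 90% detection rate and 15 false detections.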
17. Use of Standard Terminology
- What constitutes successful face detection?
  - For methods that require training:
    - The number and variety of training examples have a direct effect on the classification performance.
    - The training and execution time varies for different algorithms.
    - Most of these systems can often be tested at different threshold values to balance the detection rate against the number of false positives.
18. Use of Standard Terminology
- What constitutes successful face detection?
  - To standardize this variability:
    - Training should be completed on a different dataset prior to testing.
    - The number and variety of training examples should be left to the algorithm developer.
    - The training and execution time should always be reported for all algorithms that require training.
    - All methods should present results in terms of an ROC curve.
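The ROC recommendation amounts to sweeping the decision threshold and recording the detection-rate/false-positive trade-off at each setting. A minimal sketch, assuming the classifier produces a per-region score (the score lists and function name are illustrative):

```python
def roc_points(face_scores, nonface_scores, thresholds):
    """One (false positives, detection rate) point per threshold value.

    face_scores: classifier scores for true face regions;
    nonface_scores: scores for non-face regions.
    """
    points = []
    for t in thresholds:
        # Detection rate: fraction of true faces scoring at or above t.
        detection_rate = sum(s >= t for s in face_scores) / len(face_scores)
        # False positives: non-face regions that also pass the threshold.
        false_positives = sum(s >= t for s in nonface_scores)
        points.append((false_positives, detection_rate))
    return points
```

Plotting these points (false positives on the x-axis, detection rate on the y-axis) yields the ROC curve; lowering the threshold raises both quantities.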
19. Overall Procedure
- Employ a colour face detection database that comes with hand-segmented results in the form of eye and mouth coordinates along with segmented face regions.
- The face database should also describe the faces in standard terminology of pose, orientation, occlusion and presence of structural components, along with the type of faces (hand-drawn, cartoon etc.).
- Clearly define the type of faces the algorithm can detect.
20. Overall Procedure
- For algorithms that require training, the training should be completed prior to testing, using face recognition databases for the face class and the bootstrap training technique for the non-face class.
- All results should be presented in the form of two graphical plots: ROC curves to show the correct-detection/false-positive trade-off, and "number of faces vs. percentage overlap" for determining correct face detection.
- All results should also report the training and execution times for comparison.
21. Presentation of Results
- The above procedure is applied to the performance evaluation of a previously developed face detection algorithm as follows:
  - The colour face detection database chosen is the HHI MPEG-7 image database.
  - The algorithm does not require any training before execution.
  - The results are presented in terms of number of faces vs. percentage overlap for the HHI MPEG-7 database (see figure).
  - Since there is no adjustable threshold, the ROC curve is not presented.
  - The execution time is 3.54 seconds/image on a Pentium III processor.
22. Presentation of Results
- The graph shows that there are 13 faces with no overlap (i.e. false detections) and 43 faces with over 85% overlap with the hand-segmented results.
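The "number of faces vs. percentage overlap" plot is essentially a histogram of per-face overlap values. A minimal sketch, assuming one overlap percentage per ground-truth face; the bin edges are illustrative, not the ones used for the graph above.

```python
def overlap_histogram(overlaps, bin_edges=(0, 25, 50, 75, 85, 100)):
    """Bin per-face overlap percentages for the 'faces vs. overlap' plot.

    overlaps: one overlap percentage per ground-truth face.
    Returns one count per bin; faces at 0% overlap (false detections)
    land in the first bin, and 100% lands in the last.
    """
    counts = [0] * (len(bin_edges) - 1)
    for o in overlaps:
        for i in range(len(bin_edges) - 1):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            # Bins are half-open [lo, hi), except the final bin, which
            # is closed so that 100% overlap is counted.
            if lo <= o < hi or (o == hi == bin_edges[-1]):
                counts[i] += 1
                break
    return counts
```

Plotting the bin counts against the bin edges reproduces the shape of the graph described above.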
23. Conclusions
- This paper highlights the problems associated with evaluating and comparing the performance of new and existing face detection methods in an unbiased manner.
- A solution in the form of a standard procedure for the evaluation and presentation of results has been presented.
- The evaluation procedure described in this paper concentrates on using standard terminology along with carefully labelled face databases for evaluation purposes.
- The method also recommends that results be presented graphically: ROC curves to show the correct-detection/false-positive trade-off, and "number of faces vs. percentage overlap" to determine correct face detection accuracy.
24. Questions?