Title: Neural Network-Based Face Detection
1. Neural Network-Based Face Detection
- Henry A. Rowley, Shumeet Baluja, and Takeo Kanade
- January 1998
Presented by Roscoe Cook, UCSD ECE 285, Jan 30, 2002
2. Overview
- Upright frontal face detection using multiple neural networks
- Each neural network examines windows of varying size and location, deciding if each contains a face or not
- The system then arbitrates between the results of the networks to improve results
3. Stage 1: Neural Network-Based Filter
- Receives a preprocessed 20x20 pixel window (subsampled if necessary)
- Outputs a value ranging from -1 to 1
- Window sizes are incremented by a scale factor of 1.2
- The image is examined using each window size at every pixel position
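The multi-scale scan described above can be sketched in Python. This is an illustrative sketch, not the authors' code: the 20x20 window and the 1.2 scale factor come from the slide, while the function name `pyramid_windows` and the shortcut of tracking only the shrinking image dimensions (rather than actually resampling pixels) are assumptions made here.

```python
import numpy as np

def pyramid_windows(image, window=20, scale=1.2):
    """Yield (pyramid_level, y, x) for every 20x20 window position
    at every scale.  The image is repeatedly shrunk by a factor of
    1.2 until it is smaller than the window; at each scale the
    window is placed at every pixel position.  For brevity only the
    dimensions are tracked; a real scan would resample the pixels."""
    h, w = image.shape
    level = 0
    while min(h, w) >= window:
        for y in range(h - window + 1):
            for x in range(w - window + 1):
                yield level, y, x
        # Subsample: shrink both dimensions by the scale factor.
        h, w = int(h / scale), int(w / scale)
        level += 1
```

For a 24x24 image this yields 25 positions at the original scale plus a single position at the next pyramid level (24 / 1.2 = 20), after which the image is too small for the window.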
4. Preprocessing
- Goal: compensate for differences in camera input gains and improve contrast
- Fit a linear function to an oval region inside the window and subtract it from the image
- Histogram equalization: nonlinearly map intensity values to expand the intensity range
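The two preprocessing steps can be sketched as follows. This is a simplified sketch: the slide fits the linear function over an oval region inside the window, whereas the sketch below fits over the whole window; the function names and the uint8-style integer input assumed by `equalize` are choices made here, not taken from the paper.

```python
import numpy as np

def correct_lighting(win):
    """Fit a linear brightness plane a*x + b*y + c to the window by
    least squares and subtract it, flattening illumination gradients.
    (The paper fits over an oval region; a whole-window fit is used
    here for simplicity.)"""
    h, w = win.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
    coef, *_ = np.linalg.lstsq(A, win.ravel().astype(float), rcond=None)
    plane = (A @ coef).reshape(h, w)
    return win - plane

def equalize(win, levels=256):
    """Histogram equalization: map each intensity through the
    normalized cumulative histogram, spreading values over the full
    range.  Assumes non-negative integer intensities (e.g. uint8)."""
    flat = win.ravel()
    hist = np.bincount(flat, minlength=levels)
    cdf = np.cumsum(hist) / flat.size
    return cdf[flat].reshape(win.shape) * (levels - 1)
```

Applying `correct_lighting` to a window that is a pure brightness ramp returns (near) zeros, since the ramp is exactly the fitted plane.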
5. Preprocessing
7. Neural Network Architecture
- Three types of hidden units:
- 4 look at 10x10 pixel subregions
- 16 look at 5x5 pixel subregions
- 6 look at 20x5 pixel horizontal stripes
- Horizontal stripes are useful for finding features such as a mouth or pair of eyes
- Square subregions are useful for finding individual features, such as an eye or nose
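The receptive-field layout above can be enumerated explicitly. The counts and sizes (4 of 10x10, 16 of 5x5, 6 of 20x5) are from the slide; the exact placement of the six stripes (height 5, stepped 3 rows apart so they overlap) is an assumption here, since the slide does not specify it.

```python
def receptive_fields():
    """Return (y, x, height, width) for each hidden unit's receptive
    field over the 20x20 input: 4 non-overlapping 10x10 squares,
    16 non-overlapping 5x5 squares, and 6 overlapping 20x5 stripes
    (stripe spacing is an assumption)."""
    fields = []
    for y in range(0, 20, 10):          # 4 units: 10x10 subregions
        for x in range(0, 20, 10):
            fields.append((y, x, 10, 10))
    for y in range(0, 20, 5):           # 16 units: 5x5 subregions
        for x in range(0, 20, 5):
            fields.append((y, x, 5, 5))
    for y in range(0, 18, 3):           # 6 units: 20x5 stripes
        fields.append((y, 0, 5, 20))
    return fields
```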
9. Training
- Positive training set: normalized face examples
- Negative training set: generated during training using a bootstrap method
10. Training: Face Examples
- 1,050 face examples gathered from face databases at CMU and Harvard and from the World Wide Web
- Various sizes, orientations, positions, and intensities
- Eyes, tip of nose, and corners and center of mouth labeled manually
- Labeling used to normalize each face to the same scale, orientation, and position
- Normalization maps each face to a 20x20 pixel window
- Fifteen faces generated for the training set from each original image by randomly rotating the images up to 10 degrees, scaling between 90% and 110%, translating up to half a pixel, and mirroring
11. Training: Non-face Examples
- Non-face examples are collected during training:
- 1. Build an initial non-face set of 1000 randomly generated, then preprocessed, images
- 2. Train to output 1 for face and -1 for non-face inputs
- 3. Run the system on an image of scenery which contains no faces, collecting subimages which the network incorrectly identifies as a face (output > 0)
- 4. Select up to 250 of these images at random, apply preprocessing, and add them into the training set as negative examples
- 5. Go to step 2
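The bootstrap loop above can be sketched as a control-flow skeleton. Only the structure (1000 initial negatives, retraining, collecting false alarms with output > 0, capping additions at 250 per round) follows the slide; `train` and `predict` are placeholder callables standing in for the real network, and the data representations are purely illustrative.

```python
import random

def bootstrap_negatives(train, predict, scenery_windows,
                        rounds=3, cap=250):
    """Bootstrap collection of non-face examples: repeatedly train,
    scan face-free scenery, and add up to `cap` windows the network
    wrongly calls faces (output > 0) as new negatives.

    `train(pos, neg)` returns a trained network; `predict(net, w)`
    returns its output for window `w`.  Both are placeholders."""
    positives = ["face"] * 10                        # stand-in positives
    negatives = [("noise", i) for i in range(1000)]  # initial random negatives
    for _ in range(rounds):
        net = train(positives, negatives)
        false_alarms = [w for w in scenery_windows if predict(net, w) > 0]
        random.shuffle(false_alarms)                 # pick at random
        negatives.extend(false_alarms[:cap])
    return negatives
```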
12. Stage 2: Merging Overlapping Detections and Arbitration
- Most faces are detected at multiple nearby positions or scales
- False detections are less consistent
- Heuristics eliminate many false detections:
- Spatial thresholding: collapse multiple detections
- Overlap elimination: when detections overlap, keep only the most dominant
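The spatial-thresholding heuristic can be sketched as below. This is a simplification of the paper's version (which works on a detection pyramid and is followed by overlap elimination): detections are (x, y, scale) triples, and a detection survives only if at least `threshold` detections lie within `dist` of it, in which case the cluster is collapsed to its centroid. The function name and default parameters are assumptions.

```python
def collapse_detections(detections, dist=2, threshold=2):
    """Keep a detection only if at least `threshold` detections fall
    within `dist` of it along every axis (x, y, and scale), and
    collapse each surviving cluster to its centroid.  A simplified
    form of the paper's spatial-thresholding heuristic."""
    kept = []
    for d in detections:
        near = [e for e in detections
                if all(abs(a - b) <= dist for a, b in zip(d, e))]
        if len(near) >= threshold:
            centroid = tuple(sum(c) / len(near) for c in zip(*near))
            if centroid not in kept:     # collapse duplicates
                kept.append(centroid)
    return kept
```

Two detections a pixel apart merge into one centroid; an isolated detection (a typical false alarm) is discarded.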
13. Initial Detection Results
14. Arbitration Between Multiple Networks
- Detection and false-positive rates of individual networks are quite close
- Individual networks have different biases and make different errors (because of self-selection of negative training examples)
- This allows improved results by combining the results of the individual networks
15. Arbitration Strategies
- Simple logic strategies:
- ANDing
- ORing
- Voting
- Neural network strategy:
- Input: the number of detections in a 3x3 region that each face-detecting neural net found
- Output: the decision of whether or not there is a face at the center of the 3x3 region
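The simple logic strategies reduce to combining boolean per-network decisions at a given location. The sketch below assumes distance 0 (exact agreement in location and scale, as in the paper's legend) and encodes the rules from the slides: voting needs a majority, ANDing needs all networks, ORing needs any one.

```python
def arbitrate(outputs, mode="vote"):
    """Combine per-network detections at one location and scale.
    `outputs` is a list of booleans, one per network."""
    votes = sum(outputs)
    if mode == "and":
        return votes == len(outputs)      # e.g. 2 of 2 networks
    if mode == "or":
        return votes >= 1                 # e.g. 1 of 2 networks
    return votes > len(outputs) / 2       # vote: e.g. 2 of 3 networks
```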
16. Results
- Sensitivity analysis: which parts of the face is the detector most sensitive to?
- Testing: two test sets
- Test Set 1 examines the false-positive rate
- Test Set 2 examines the angular sensitivity
17. Sensitivity Analysis
- Goal: find which parts of the face are most important for detection
- Divide the 20x20 pixel input images into 100 2x2 pixel regions
- For every 2x2 region of every image in a positive test set, replace the region with random noise and input the result to the neural network
- The resulting RMS error of the network on the test set indicates how important that portion of the image is for detection
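The occlusion loop above can be sketched directly. The 10x10 grid of 2x2 regions and the RMS-error measure come from the slide; the noise range, the `network` placeholder (returning a value in [-1, 1], with target 1 for a face), and the seeded generator are assumptions made here.

```python
import numpy as np

def sensitivity_map(faces, network, rng=np.random.default_rng(0)):
    """For each of the 100 2x2 regions of the 20x20 input, replace
    that region with random noise in every test face and record the
    network's RMS error against the face target of 1; higher error
    means the region matters more.  `network(img)` is a placeholder."""
    errors = np.zeros((10, 10))
    for gy in range(10):
        for gx in range(10):
            sq = 0.0
            for face in faces:
                img = face.copy()
                img[2*gy:2*gy+2, 2*gx:2*gx+2] = rng.uniform(-1, 1, (2, 2))
                sq += (network(img) - 1.0) ** 2
            errors[gy, gx] = np.sqrt(sq / len(faces))
    return errors
```

A network that only looks at one corner of the window, for example, produces a map with error concentrated in the region covering that corner.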
18. Sensitivity Analysis Results
The networks rely most heavily on the eyes, then
the nose, then the mouth.
19. Testing
- Two test sets
- Set 1: 130 images from CMU
- Sources: web, photographs, newspapers, TV broadcast
- Contains 507 frontal faces
- Wide variety of complex backgrounds
- Useful for measuring false-detection rates
- Set 2: from the FERET database
- One face per image
- Uniform background and good lighting
- Taken from a variety of angles
- Useful for measuring angular sensitivity
20. Detection Threshold Analysis
- Output values range from -1 to 1
- Zero used as the threshold during training
- Changing the threshold varies how conservative the system is
- Tradeoff: false positives vs. missed faces
- Detection and false-positive rates measured while varying the threshold
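The threshold sweep amounts to recomputing the detection rate and false-positive count at each candidate threshold. The [-1, 1] output range and the zero training threshold are from the slide; the function name, data layout (parallel lists of scores and face/non-face labels), and return format are illustrative assumptions.

```python
def sweep_threshold(scores, labels, thresholds):
    """Trade off detection rate against false positives by varying
    the acceptance threshold.  `scores` are network outputs in
    [-1, 1]; `labels` mark which windows are true faces.  Returns
    (threshold, detection_rate, false_positive_count) tuples."""
    curve = []
    n_pos = sum(labels)
    for t in thresholds:
        tp = sum(s > t for s, l in zip(scores, labels) if l)
        fp = sum(s > t for s, l in zip(scores, labels) if not l)
        curve.append((t, tp / n_pos, fp))
    return curve
```

Raising the threshold from the training value of 0 makes the system more conservative: false positives drop, at the cost of missed faces.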
22. Detection and Error Rates for Test Set 1
24. Function Legend
- threshold(distance, threshold): only accept a detection if there are at least threshold detections within a cube (extending along x, y, and scale) in the detection pyramid surrounding the detection. The size of the cube is determined by distance, the number of pixels from the center of the cube to its edge (in either position or scale).
- overlap elimination: a set of detections may erroneously indicate that faces overlap one another. This heuristic examines detections in order (from those having the most votes within a small neighborhood to those having the least), removing conflicting overlaps as it goes.
- voting(distance), AND(distance), OR(distance): these heuristics arbitrate among multiple networks. They take a distance parameter, similar to that of the threshold heuristic, which indicates how close detections from individual networks must be to one another to be counted as occurring at the same location and scale. A distance of zero indicates that the detections must occur at precisely the same location and scale. Voting requires two out of three networks to detect a face, AND requires two out of two, and OR requires one out of two to signal a detection.
- network arbitration(architecture): the results from three detection networks are fed into an arbitration network. The parameter specifies the network architecture used: a simple perceptron, a network with a hidden layer of 5 fully connected hidden units, or a network with two hidden layers of 5 fully connected hidden units each, with additional connections from the first hidden layer to the output.
25. Example Output
26. Improving Speed
- Applying two networks to a 320x240 pixel image (246,766 windows) on a 200 MHz R4400 SGI Indigo 2 takes approximately 383 seconds (the computational cost of arbitration is negligible, less than one second)
- Increasing invariance to translation allows fewer windows to be processed
27. Fast Detection
- When training, allow the face to be offset by as much as 5 pixels in any direction
- Increase the window size to 30x30 pixels to ensure that the entire face falls within the window
- The center of the face will fall within a 10x10 pixel region
- The detector can then be moved in steps of 10 pixels
28. Fast Method
- The algorithm runs much faster
- Many more false positives are produced
- Detections are used as candidates for the original 20x20 pixel method
- The 10x10 pixel regions surrounding all candidates are scanned
- Heuristics: overlap removal, ANDing
- Processing time on the same machine: 7.2 seconds
- A restriction based on skin tones also increases speed
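The two-stage scheme of slides 27 and 28 can be sketched as a coarse-to-fine scan: a tolerant 30x30 network stepped every 10 pixels, with each coarse hit triggering a dense scan of the 10x10 region of candidate face centers. The step sizes and window sizes are from the slides; `coarse_net` and `fine_net` are placeholder callables (taking a window position and returning an output in [-1, 1]) standing in for the real networks.

```python
def coarse_to_fine(image_shape, coarse_net, fine_net, step=10):
    """Step a tolerant 30x30 detector every `step` pixels; for each
    coarse hit, densely scan the 10x10 region of candidate centers
    with the precise 20x20 detector."""
    h, w = image_shape
    hits = []
    for y in range(0, h - 30 + 1, step):
        for x in range(0, w - 30 + 1, step):
            if coarse_net(y, x) > 0:
                # Refine: the face center lies within a 10x10 region.
                for dy in range(10):
                    for dx in range(10):
                        if fine_net(y + dy, x + dx) > 0:
                            hits.append((y + dy, x + dx))
    return hits
```

The speedup comes from the coarse stage: roughly one window per 100 pixels instead of one per pixel, with the expensive dense scan run only around candidates.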
29. Comparison to Other Systems
Tested on a 23-image subset of Test Set 1. Performance is generally comparable to, or somewhat better than, that of the other systems.
30. Conclusion
- Detects between 77.9 and 90.3 percent of faces in a test set of 130 images with unconstrained backgrounds, while maintaining an acceptable rate of false detections
- Can be adjusted to be more or less conservative, depending on the application
- A fast version can process a 320x240 image in two to four seconds on a 200 MHz R4400 SGI Indigo 2