1
Neural Network-Based Face Detection
  • Henry A. Rowley, Shumeet Baluja, and Takeo Kanade
  • January 1998

Presented by Roscoe Cook, UCSD ECE285, Jan 30, 2002
2
Overview
  • Upright frontal face detection using multiple
    neural networks
  • Each neural network examines windows of varying
    size and location, deciding if each contains a
    face or not
  • The system then arbitrates between the outputs of
    the individual networks to improve overall performance

3
Stage 1: Neural Network-Based Filter
  • Receives a preprocessed 20x20 pixel window of the
    image (subsampled if necessary)
  • Outputs a value ranging from -1 to 1
  • Window sizes are incremented by a scale factor of
    1.2
  • Image examined using each window size at every
    pixel position (see the sketch below)
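
The scanning loop described above can be sketched as follows. `classify_window` is a hypothetical stand-in for the trained network (it takes a preprocessed 20x20 window and returns a score in [-1, 1]), and the subsampling here is plain nearest-neighbour decimation, chosen only for brevity.

```python
import numpy as np

SCALE_STEP = 1.2   # each pyramid level shrinks the image by this factor
WINDOW = 20        # the network always sees a 20x20 pixel window

def scan_image(image, classify_window, threshold=0.0):
    """Slide a 20x20 window over every pixel of every pyramid level.

    Detections are reported as (x, y, scale) in original-image coordinates.
    """
    detections = []
    scale = 1.0
    current = image.astype(np.float64)
    while min(current.shape) >= WINDOW:
        h, w = current.shape
        for y in range(h - WINDOW + 1):
            for x in range(w - WINDOW + 1):
                patch = current[y:y + WINDOW, x:x + WINDOW]
                if classify_window(patch) > threshold:
                    detections.append((int(x * scale), int(y * scale), scale))
        # subsample by 1.2 for the next pyramid level (nearest neighbour)
        scale *= SCALE_STEP
        ys = (np.arange(int(h / SCALE_STEP)) * SCALE_STEP).astype(int)
        xs = (np.arange(int(w / SCALE_STEP)) * SCALE_STEP).astype(int)
        current = current[np.ix_(ys, xs)]
    return detections
```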

4
Preprocessing
  • Goal: compensate for differences in camera input
    gains and improve contrast
  • Fit a linear function to an oval region inside
    the window and subtract it from the image
  • Histogram equalization: nonlinearly maps intensity
    values to expand the intensity range
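
A minimal sketch of these two preprocessing steps (not the authors' exact code): the brightness plane is fit by least squares over an oval mask and subtracted, and the equalization is a simple rank-based remapping of the remaining intensities.

```python
import numpy as np

def preprocess_window(win, mask=None):
    """Lighting correction followed by histogram equalization for one window."""
    win = win.astype(np.float64)
    h, w = win.shape
    yy, xx = np.mgrid[0:h, 0:w]
    if mask is None:  # default: an oval roughly inscribed in the window
        mask = ((xx - w / 2) / (w / 2)) ** 2 + ((yy - h / 2) / (h / 2)) ** 2 <= 1.0

    # 1) fit I(x, y) ~ a*x + b*y + c over the oval and subtract the plane
    A = np.column_stack([xx[mask], yy[mask], np.ones(int(mask.sum()))])
    coeffs, *_ = np.linalg.lstsq(A, win[mask], rcond=None)
    corrected = win - (coeffs[0] * xx + coeffs[1] * yy + coeffs[2])

    # 2) histogram equalization: spread intensity ranks over the full range
    ranks = corrected.ravel().argsort().argsort()
    return ranks.reshape(win.shape) / (win.size - 1)   # values in [0, 1]
```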

5
Preprocessing
6
(No Transcript)
7
Neural Network Architecture
  • Three types of hidden units
  • 4 look at 10x10 pixel subregions
  • 16 look at 5x5 pixel subregions
  • 6 look at 20x5 pixel horizontal stripes
  • Horizontal stripes are useful for finding
    features such as a mouth or pair of eyes
  • Square subregions are useful for finding
    individual features, such as an eye or nose
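
As a rough illustration of this connection pattern, the following builds one boolean receptive-field mask per hidden unit over the 20x20 input. The 3-row spacing of the overlapping horizontal stripes is an assumption for illustration, not a detail taken from the paper.

```python
import numpy as np

def build_receptive_fields():
    """Return 4 + 16 + 6 = 26 boolean masks over a 20x20 input window."""
    def block(y0, x0, height, width):
        m = np.zeros((20, 20), dtype=bool)
        m[y0:y0 + height, x0:x0 + width] = True
        return m

    masks = []
    for y in range(0, 20, 10):        # 4 units: 10x10 subregions
        for x in range(0, 20, 10):
            masks.append(block(y, x, 10, 10))
    for y in range(0, 20, 5):         # 16 units: 5x5 subregions
        for x in range(0, 20, 5):
            masks.append(block(y, x, 5, 5))
    for i in range(6):                # 6 units: overlapping 20x5 stripes
        masks.append(block(i * 3, 0, 5, 20))
    return masks
```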

8
(No Transcript)
9
Training
  • Positive training set
  • Normalized face examples
  • Negative training set
  • Generated during training using a bootstrap method

10
Training Face Examples
  • 1,050 face examples gathered from face databases
    at CMU and Harvard and from the World Wide Web
  • Various sizes, orientations, positions, and
    intensities
  • Eyes, tip of nose, and corners and center of
    mouth labeled manually
  • Labeling used to normalize each face to the same
    scale, orientation, and position
  • Normalization maps each face to a 20x20 pixel
    window
  • Fifteen faces for the training set generated from
    each original image by randomly rotating the
    images up to 10°, scaling between 90% and 110%,
    translating up to half a pixel, and mirroring
    (see the sketch below)
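
A sketch of this augmentation under stated assumptions: the input is an already aligned 20x20 face window, and scipy.ndimage is used to apply the random rotation, scale, and sub-pixel shift (the authors' exact warping code is not reproduced here).

```python
import numpy as np
from scipy import ndimage

def jitter_face(face20, rng, n_variants=15):
    """Generate randomly perturbed training variants of one aligned face."""
    variants = []
    c = np.array(face20.shape) / 2.0 - 0.5        # window centre
    for _ in range(n_variants):
        angle = np.deg2rad(rng.uniform(-10, 10))  # rotation up to 10 degrees
        scale = rng.uniform(0.9, 1.1)             # scaling between 90% and 110%
        shift = rng.uniform(-0.5, 0.5, size=2)    # translation up to half a pixel
        rot = np.array([[np.cos(angle), -np.sin(angle)],
                        [np.sin(angle),  np.cos(angle)]]) / scale
        offset = c - rot @ c + shift              # rotate/scale about the centre
        warped = ndimage.affine_transform(face20.astype(float), rot,
                                          offset=offset, order=1, mode='nearest')
        if rng.random() < 0.5:                    # mirror roughly half the variants
            warped = warped[:, ::-1]
        variants.append(warped)
    return variants

# usage: examples = jitter_face(face, np.random.default_rng(0))
```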

11
Training Non-Face Examples
  • Non-face examples collected during training
  • Initial non-face set: 1,000 randomly generated
    (then preprocessed) images
  • Train to output 1 for face and -1 for non-face
    inputs
  • Run the system on an image of scenery which
    contains no faces, collecting subimages which the
    network incorrectly identifies as a face (output
    > 0)
  • Select up to 250 of these images at random, apply
    preprocessing, and add into training set as
    negative examples
  • Go back to the training step and repeat
    (bootstrap loop, sketched below)
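
The bullets above form a loop; a minimal sketch of it follows. `train_step` and `collect_false_positives` are hypothetical stand-ins for the real training and scanning code, and the fixed number of `rounds` is an assumed stopping condition.

```python
import numpy as np

def bootstrap_nonfaces(train_step, collect_false_positives, scenery_images,
                       rng, rounds=10, max_new=250):
    """Grow the negative training set by bootstrapping on face-free images."""
    # step 1: 1,000 randomly generated (then preprocessed) non-face windows
    negatives = [rng.random((20, 20)) for _ in range(1000)]
    for _ in range(rounds):
        train_step(negatives)                      # step 2: train (+1 face, -1 non-face)
        false_pos = []
        for img in scenery_images:                 # step 3: collect windows the
            false_pos.extend(collect_false_positives(img))   # network wrongly accepts
        if not false_pos:
            break
        picks = rng.permutation(len(false_pos))[:max_new]
        negatives.extend(false_pos[i] for i in picks)   # step 4: add up to 250, repeat
    return negatives
```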

12
Stage 2: Merging Overlapping Detections and
Arbitration
  • Most faces are detected at multiple nearby
    positions or scales
  • False detections are less consistent
  • Heuristics eliminate many false detections
  • Spatial thresholding: collapse multiple nearby
    detections into one
  • Overlap elimination: when detections overlap,
    keep only the most dominant (both heuristics are
    sketched below)
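
Assuming detections are (x, y, scale) triples in image coordinates, the two heuristics can be sketched as below; the neighborhood radius and the overlap test are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

def merge_detections(dets, radius=2, window=20):
    """Collapse nearby detections to centroids, then drop overlapping ones."""
    remaining = list(dets)
    merged = []
    while remaining:                                  # spatial thresholding
        x0, y0, s0 = remaining.pop(0)
        cluster, rest = [(x0, y0, s0)], []
        for d in remaining:
            (cluster if abs(d[0] - x0) <= radius and abs(d[1] - y0) <= radius
             else rest).append(d)
        remaining = rest
        cx = float(np.mean([c[0] for c in cluster]))
        cy = float(np.mean([c[1] for c in cluster]))
        merged.append((cx, cy, s0, len(cluster)))     # centroid plus vote count

    merged.sort(key=lambda m: -m[3])                  # most votes first
    kept = []
    for m in merged:                                  # overlap elimination
        overlaps = any(abs(m[0] - k[0]) < window * k[2] and
                       abs(m[1] - k[1]) < window * k[2] for k in kept)
        if not overlaps:
            kept.append(m)
    return kept
```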

13
Initial Detection Results
14
Arbitration Between Multiple Networks
  • Detection and false-positive rates of individual
    networks are quite close
  • Individual networks have different biases and
    make different errors (because of self-selection
    of negative training examples)
  • This allows results to be improved by combining
    the outputs of the individual networks

15
Arbitration Strategies
  • Simple logic strategies
  • ANDing
  • ORing
  • Voting
  • Neural Network strategies
  • Input the number of detections in a 3x3 region
    that each face-detecting neural net found
  • Output the decision of whether or not there is a
    face at the center of the 3x3 region
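
The simple logic strategies can be sketched as follows, assuming each network's detections have already been quantized to comparable (x, y, scale) keys (i.e., a distance parameter of zero). The neural-network strategy, which learns from the per-network detection counts in each 3x3 neighborhood, is not shown.

```python
def arbitrate(det_sets, mode="vote"):
    """Combine detection sets from several networks with simple logic.

    `det_sets` is a list of sets of hashable (x, y, scale) keys, one per network.
    """
    if mode == "and":                     # every network must agree
        return set.intersection(*det_sets)
    if mode == "or":                      # any single network suffices
        return set.union(*det_sets)
    counts = {}                           # "vote": a strict majority must agree
    for dets in det_sets:
        for d in dets:
            counts[d] = counts.get(d, 0) + 1
    return {d for d, c in counts.items() if c * 2 > len(det_sets)}
```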

16
Results
  • Sensitivity Analysis
  • Which parts of the face is the detector most
    sensitive to?
  • Testing: 2 test sets
  • Test Set 1 examines the false-positive rate
  • Test Set 2 examines the angular sensitivity

17
Sensitivity Analysis
  • Goal: find which parts of the face are most
    important for detection
  • Divide the 20x20 pixel input images into 100 2x2
    pixel regions
  • For every 2x2 region of every image in a positive
    test set, replace the region with random noise
    and input it into the neural network
  • The resulting RMS error of the network on the
    test set is an indication of how important that
    portion of the image is for detection
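
A minimal sketch of this procedure; `network` is a hypothetical stand-in that maps a batch of 20x20 windows to scores in [-1, 1], and the target output for every face window is +1.

```python
import numpy as np

def sensitivity_map(network, faces, rng):
    """RMS error of the network when each 2x2 region is replaced by noise.

    `faces` is an (N, 20, 20) array of positive test windows.  Returns a
    10x10 map; higher error means the occluded region mattered more.
    """
    errors = np.zeros((10, 10))
    for i in range(10):
        for j in range(10):
            corrupted = faces.copy()
            noise = rng.uniform(-1, 1, size=(faces.shape[0], 2, 2))
            corrupted[:, 2 * i:2 * i + 2, 2 * j:2 * j + 2] = noise
            out = network(corrupted)
            errors[i, j] = np.sqrt(np.mean((out - 1.0) ** 2))
    return errors
```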

18
Sensitivity Analysis Results
The networks rely most heavily on the eyes, then
the nose, then the mouth.
19
Testing
  • Two test sets
  • Set 1: 130 images from CMU
  • Sources: web, photographs, newspapers, TV
    broadcasts
  • Contains 507 frontal faces.
  • Wide variety of complex backgrounds
  • Useful in measuring false-detection rates
  • Set 2: from the FERET database
  • One face per image
  • Uniform background and good lighting
  • Taken from a variety of angles
  • Useful in measuring angular sensitivity

20
Detection Threshold Analysis
  • Output values range from -1 to 1
  • Zero used as threshold for training
  • Changing the threshold varies how conservative
    the system is
  • Tradeoff: false positives vs. missed faces
  • Detection and false-positive rates measured while
    varying the threshold (a simple sweep is sketched
    below)
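
The tradeoff can be illustrated with a simple threshold sweep. `scores` and `labels` here are assumed per-window outputs and ground truth; the paper reports its rates over whole test images rather than individual windows.

```python
import numpy as np

def roc_points(scores, labels, thresholds=np.linspace(-1, 1, 21)):
    """Detection rate and false-positive rate at each candidate threshold."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    points = []
    for t in thresholds:
        detected = scores > t
        det_rate = detected[labels == 1].mean()   # fraction of face windows found
        fp_rate = detected[labels == 0].mean()    # fraction of non-face windows accepted
        points.append((t, det_rate, fp_rate))
    return points
```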

21
(No Transcript)
22
Detection and Error Rates for Test Set 1
23
(No Transcript)
24
Function Legend
  • threshold(distance, threshold): Only accept a
    detection if there are at least threshold
    detections within a cube (extending along x, y,
    and scale) in the detection pyramid surrounding
    the detection. The size of the cube is determined
    by distance, which is the number of pixels from
    the center of the cube to its edge (in either
    position or scale).
  • overlap elimination: It is possible that a set of
    detections erroneously indicate that faces are
    overlapping with one another. This heuristic
    examines detections in order (from those having
    the most votes within a small neighborhood to
    those having the least), removing
    conflicting overlaps as it goes.
  • voting(distance), AND(distance), OR(distance)
    These heuristics are used for arbitrating among
    multiple networks. They take a distance
    parameter, similar to that used by the threshold
    heuristic, which indicates how close detections
    from individual networks must be to one another
    to be counted as occurring at the same location
    and scale. A distance of zero indicates that the
    detections must occur at precisely the same
    location and scale. Voting requires two out of
    three networks to detect a face, AND requires two
    out of two, and OR requires one out of two to
    signal a detection.
  • network arbitration(architecture): The results
    from three detection networks are fed into an
    arbitration network. The parameter specifies the
    network architecture used: a simple perceptron, a
    network with a hidden layer of 5 fully connected
    hidden units, or a network with two hidden
    layers of 5 fully connected hidden units each,
    with additional connections from the first hidden
    layer to the output.
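
As a sketch of the first legend entry, threshold(distance, threshold), assuming each detection is an (x, y, pyramid level) triple so that the same distance bound applies along position and scale.

```python
import numpy as np

def threshold_heuristic(dets, distance, min_count):
    """Keep a detection only if enough detections fall inside its cube.

    The cube extends `distance` units along x, y, and pyramid level; the
    detection itself counts toward `min_count`.
    """
    dets = np.asarray(dets, dtype=float)
    kept = []
    for d in dets:
        inside = np.all(np.abs(dets - d) <= distance, axis=1)
        if inside.sum() >= min_count:
            kept.append(tuple(d))
    return kept
```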

25
Example Output
26
Improving Speed
  • Applying two networks to a 320x240 pixel image
    (246,766 windows) on a 200 MHz R4400 SGI Indigo 2
    takes approximately 383 seconds (the computational
    cost of arbitration is negligible: less than one
    second)
  • Increasing invariance to translation allows fewer
    windows to be processed

27
Fast Detection
  • When training, allow the face to be offset by as
    much as 5 pixels in any direction
  • Increase window size to 30x30 pixels to ensure
    that entire face falls within window
  • The center of the face will fall within a 10x10
    window
  • The detector can be moved in steps of 10 pixels

28
Fast Method
  • Algorithm runs much faster
  • Many more false-positives produced
  • Detections used as candidates for the original
    20x20 pixel method
  • 10x10 pixel regions surrounding all candidates
    are scanned
  • Heuristics: overlap removal, ANDing
  • Processing time on the same machine: 7.2 sec.
  • Restriction based on skin tones also increases
    speed
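
Combining the two "fast" slides, the two-stage scan might look like the following sketch. `coarse_net` (the translation-tolerant 30x30 candidate network) and `fine_net` (the original 20x20 network) are stand-ins that return a score in [-1, 1] per window, and the exact candidate-to-window offsets are assumptions.

```python
def fast_scan(image_pyramid, coarse_net, fine_net):
    """Coarse candidate pass in 10-pixel steps, then a fine verification pass."""
    candidates = []
    for level, img in enumerate(image_pyramid):
        h, w = img.shape
        for y in range(0, h - 30 + 1, 10):        # coarse pass: 30x30 windows
            for x in range(0, w - 30 + 1, 10):    # moved in steps of 10 pixels
                if coarse_net(img[y:y + 30, x:x + 30]) > 0:
                    candidates.append((level, x, y))

    detections = []
    for level, x, y in candidates:                # fine pass: rescan the 10x10
        img = image_pyramid[level]                # region of possible face centres
        h, w = img.shape                          # with the precise 20x20 network
        for dy in range(10):
            for dx in range(10):
                yy, xx = y + dy, x + dx
                if yy + 20 <= h and xx + 20 <= w:
                    if fine_net(img[yy:yy + 20, xx:xx + 20]) > 0:
                        detections.append((level, xx, yy))
    return detections
```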

29
Comparison to Other Systems
Tested on a 23-image subset of Test Set 1.
Performance is generally comparable to, or somewhat
better than, that of other systems.
30
Conclusion
  • Detects between 77.9% and 90.3% of faces in a
    test set of 130 images with unconstrained
    backgrounds while maintaining an acceptable rate
    of false detections
  • Can be adjusted to be more or less conservative,
    depending on application
  • A fast version can process a 320x240 image in two
    to four seconds on a 200 MHz R4400 SGI Indigo 2