1
A Robust Real-Time Face Detection
2
Outline
  • AdaBoost Learning Algorithm
  • Face Detection in real life
  • Using AdaBoost for Face Detection
  • Improvements
  • Demonstration

3
AdaBoost
  • A Short Introduction to Boosting (Freund & Schapire, 1999)
  • Logistic Regression, AdaBoost and Bregman Distances (Collins, Schapire, Singer, 2002)

4
Boosting
  • The horse-racing gambler problem: derive rules of thumb from a set of races
  • How should we choose the set of races in order to get the best rules of thumb?
  • How should the rules be combined into a single, highly accurate prediction rule?
  • Boosting!

5
AdaBoost - the idea
  • AdaBoost agglomerates many weak classifiers into one strong classifier.
  • Initialize sample weights
  • For each cycle:
    • Find a classifier that performs well on the weighted sample
    • Increase weights of misclassified examples
  • Return a weighted list of classifiers

(Figure: a toy sample plotted on IQ and shoe-size axes, split by successive weak classifiers)
6
AdaBoost - algorithm
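A minimal Python sketch of the loop just described (our illustration, not the deck's original pseudocode); weak_learner(X, y, w) is a hypothetical helper that returns a callable classifier with outputs in {-1, +1} fitted to the weighted sample:

import numpy as np

def adaboost(X, y, weak_learner, n_rounds):
    # y must be a numpy array with labels in {-1, +1};
    # start with uniform sample weights
    m = len(y)
    w = np.full(m, 1.0 / m)
    hypotheses, alphas = [], []
    for _ in range(n_rounds):
        h = weak_learner(X, y, w)        # fit to the weighted sample
        pred = h(X)
        eps = w[pred != y].sum()         # weighted training error
        if eps >= 0.5:                   # no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        w *= np.exp(-alpha * y * pred)   # up-weight the misclassified examples
        w /= w.sum()                     # renormalize to a distribution
        hypotheses.append(h)
        alphas.append(alpha)
    # the strong classifier is the sign of the weighted vote
    return lambda Z: np.sign(sum(a * h(Z) for a, h in zip(alphas, hypotheses)))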
7
AdaBoost training error
  • Freund and Schapire (1997) proved that the training error of the final classifier is at most \prod_t 2\sqrt{\epsilon_t(1-\epsilon_t)} = \prod_t \sqrt{1-4\gamma_t^2} \le \exp\left(-2\sum_t \gamma_t^2\right), where \epsilon_t is the weighted error of the t-th weak hypothesis and \gamma_t = 1/2 - \epsilon_t
  • Thus AdaBoost ADApts to the error rates of the individual weak hypotheses.

8
AdaBoost generalization error
  • Freund and Schapire (1997) showed that, with high probability, the generalization error is at most the training error plus \tilde{O}\left(\sqrt{Td/m}\right), where T is the number of boosting rounds, d the VC-dimension of the weak hypothesis space, and m the training set size

9
AdaBoost generalization error
  • The analysis implies that boosting will overfit if run for too many rounds
  • However, it was observed empirically that AdaBoost often does not overfit, even when run for thousands of rounds
  • Moreover, the generalization error was observed to continue decreasing long after the training error reached zero

10
AdaBoost generalization error
  • An alternative, margin-based analysis that fits the empirical findings was presented by Schapire et al. (1998): with high probability the generalization error is at most \hat{\Pr}\left[\mathrm{margin}(x,y) \le \theta\right] + \tilde{O}\left(\sqrt{d/(m\theta^2)}\right) for any \theta > 0, a bound independent of the number of rounds

11
AdaBoost different point of view
  • We try to approximate the labels y_i using a linear combination of weak hypotheses
  • In other words, we are interested in finding a vector of parameters a such that f(x_i) = \sum_j a_j h_j(x_i) is a good approximation of y_i
  • For classification problems we try to match the sign of f(x_i) to y_i

12
AdaBoost different point of view
  • Sometimes it is advantageous to minimize some other (non-negative) loss function instead of the number of classification errors
  • For AdaBoost the loss function is \sum_i \exp(-y_i f(x_i))
  • This point of view was used by Collins, Schapire and Singer (2002) to demonstrate that AdaBoost converges to optimality

13
Face Detection (not face recognition)
14
Face Detection in Monkeys
  • There are cells that detect faces

15
Face Detection in Humans
  • There are dedicated processes for face detection

16
Faces Are Special
  • We analyze faces in a different way

19
Face Recognition in Humans
  • We analyze faces in a specific brain location

20
Robust Real-Time Face Detection
  • Viola and Jones, 2003

21
Features
  • Picture analysis, Integral Image

22
Features
  • The system classifies images based on the value
    of simple features

Two-rectangle, three-rectangle, and four-rectangle features
Value = Σ(pixels in white area) − Σ(pixels in black area)
23
Contrast Features
(Figure: a source image and the resulting contrast-feature output)
24
Features
  • Notice that each feature is tied to a specific location within the sub-window
  • Why features and not pixels?
    • They encode domain knowledge
    • A feature-based system operates faster
    • Inspiration from human V1

25
Features
  • Later we will see that there are other features
    that can be used to implement an efficient face
    detector
  • The original system of Viola and Jones used only
    rectangle features

26
Computing Features
  • Given a base detection resolution of 24x24 pixels, the exhaustive set of rectangle features numbers about 160,000!
  • We need to find a way to rapidly compute the
    features

27
Integral Image
  • Intermediate representation of the image
  • Computed in one pass over the original image

28
Integral Image
Using the integral image representation one can compute the value of any rectangular sum in constant time. For example, the sum inside rectangle D is ii(4) + ii(1) - ii(2) - ii(3), where ii(n) is the integral image value at corner n of the rectangle.
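A brief sketch of both ideas in Python, assuming a numpy image and inclusive rectangle coordinates (the corner handling mirrors the ii(1)..ii(4) notation above):

import numpy as np

def integral_image(img):
    # ii(x, y) = sum of img over all pixels above and to the left, inclusive;
    # two cumulative sums give the one-pass computation
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    # sum over rows top..bottom and columns left..right:
    # D = ii(4) + ii(1) - ii(2) - ii(3)
    total = ii[bottom, right]                     # ii(4)
    if top > 0:
        total -= ii[top - 1, right]               # ii(2)
    if left > 0:
        total -= ii[bottom, left - 1]             # ii(3)
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]            # ii(1)
    return total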
29
Integral Image
30
Building a Detector
  • Cascading, training a cascade

31
Main Ideas
  • The Features will be used as weak classifiers
  • We will concatenate several detectors serially
    into a cascade
  • We will boost (using a version of AdaBoost) a
    number of features to get good enough detectors

33
Weak Classifiers
  • Weak classifier: the single feature that best separates the examples
  • Given a sub-window x, a feature f, a threshold T, and a polarity p indicating the direction of the inequality, the classifier is h(x) = 1 if p \cdot f(x) < p \cdot T and h(x) = 0 otherwise, as sketched below
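A one-line sketch of this rule (names follow the slide's notation):

def weak_classify(f_x, T, p):
    # h(x) = 1 when p * f(x) < p * T, else 0; polarity p is +1 or -1
    return 1 if p * f_x < p * T else 0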

34
Weak Classifiers
  • A weak classifier is a combination of a feature
    and a threshold
  • We have K features
  • We have N thresholds where N is the number of
    examples
  • Thus there are KN weak classifiers

35
Weak Classifier Selection
  • For each feature, sort the examples by feature value
  • For each element, evaluate the total sums of positive/negative example weights (T+ / T-) and the sums of positive/negative weights below the current example (S+ / S-)
  • The error for a threshold which splits the range between the current and previous example in the sorted list is e = min(S+ + (T- - S-), S- + (T+ - S+))

36
An example

x   y   f  w    T+   T-   S+   S-   A    B    e
X1  -1  2  1/5  3/5  2/5  0    0    2/5  3/5  2/5
X2  -1  3  1/5  3/5  2/5  0    1/5  1/5  4/5  1/5
X3  +1  5  1/5  3/5  2/5  0    2/5  0    5/5  0
X4  +1  7  1/5  3/5  2/5  1/5  2/5  1/5  4/5  1/5
X5  +1  8  1/5  3/5  2/5  2/5  2/5  2/5  3/5  2/5

Here A = S+ + (T- - S-), B = S- + (T+ - S+), and e = min(A, B); the best split (e = 0) falls just below X3.
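A short Python sketch of the scan using the table's data; it prints the same errors e as the rightmost column:

f = [2, 3, 5, 7, 8]            # feature values, already sorted
y = [-1, -1, +1, +1, +1]       # example labels
w = [0.2] * 5                  # example weights (all 1/5 here)

T_pos = sum(wi for wi, yi in zip(w, y) if yi == +1)   # T+ = 3/5
T_neg = sum(wi for wi, yi in zip(w, y) if yi == -1)   # T- = 2/5
S_pos = S_neg = 0.0            # weight sums below the current example

for fi, yi, wi in zip(f, y, w):
    A = S_pos + (T_neg - S_neg)    # error if below-threshold is labeled negative
    B = S_neg + (T_pos - S_pos)    # error if below-threshold is labeled positive
    print(f"threshold below f={fi}: e = {min(A, B):.2f}")
    if yi == +1:
        S_pos += wi
    else:
        S_neg += wi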
37
Main Ideas
  • The Features will be used as weak classifiers
  • We will concatenate several detectors serially
    into a cascade
  • We will boost (using a version of AdaBoost) a
    number of features to get good enough detectors

39
Cascading
  • We start with simple classifiers which reject
    many of the negative sub-windows while detecting
    almost all positive sub-windows
  • A positive result from the first classifier triggers the evaluation of a second (more complex) classifier, and so on
  • A negative outcome at any point leads to the immediate rejection of the sub-window, as sketched below
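A minimal sketch of this control flow, assuming each stage is a callable that returns its weighted vote for a sub-window:

def cascade_classify(sub_window, stages):
    # stages: list of (boosted_classifier, stage_threshold) pairs,
    # ordered from the simplest/cheapest to the most complex
    for classifier, threshold in stages:
        if classifier(sub_window) < threshold:
            return False           # rejected: no further stages are evaluated
    return True                    # accepted by every stage: report a face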

40
Cascading
41
Main Ideas
  • The Features will be used as weak classifiers
  • We will concatenate several detectors serially
    into a cascade
  • We will boost (using a version of AdaBoost) a
    number of features to get good enough detectors

43
Training a cascade
  • User selects values for:
    • Maximum acceptable false positive rate per layer
    • Minimum acceptable detection rate per layer
    • Target overall false positive rate
  • User gives a set of positive and negative examples

44
Training a cascade (cont.)
  • While the overall false positive rate is not met:
    • While the false positive rate of the current layer is more than the per-layer maximum, train a classifier with n features (increasing n) using AdaBoost on the set of positive and negative examples
    • Decrease the threshold for the current classifier until the detection rate of the layer is more than the per-layer minimum
    • Evaluate the current cascaded classifier on a validation set
    • Evaluate the current cascaded detector on a set of non-face images and put any false detections into the negative training set (see the sketch below)
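A rough sketch of this loop; train_stage, measure_fp, and collect_false_positives are hypothetical callables standing in for the steps above:

def train_cascade(pos, neg, non_faces, f_max, d_min, F_target,
                  train_stage, measure_fp, collect_false_positives):
    # train_stage(pos, neg, n): boost n features, then lower the stage
    #   threshold until the layer's detection rate is at least d_min
    # measure_fp(stage, neg): the layer's false positive rate on a validation set
    # collect_false_positives(stages, non_faces): new negative training set
    stages, F_overall = [], 1.0
    while F_overall > F_target:          # overall goal not yet met
        n, f_layer, stage = 0, 1.0, None
        while f_layer > f_max:           # layer still lets too much through
            n += 1
            stage = train_stage(pos, neg, n)
            f_layer = measure_fp(stage, neg)
        stages.append(stage)
        F_overall *= f_layer
        neg = collect_false_positives(stages, non_faces)
    return stages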

45
Results
46
Training Data Set
  • 4916 hand-labeled faces
  • Aligned to base resolution (24x24)
  • Non-faces for the first layer were collected from 9500 non-face images
  • Non-faces for subsequent layers were obtained by scanning the partial cascade across non-face images and collecting false positives (at most 6000 per layer)

47
Structure of the Detector
  • 38-layer cascade
  • 6060 features in total

48
Speed of final Detector
  • On a 700 MHz Pentium III processor, the face detector can process a 384x288-pixel image in about 0.067 seconds

49
Improvements
  • Learning Object Detection from a Small Number of Examples: the Importance of Good Features (Levy & Weiss, 2004)

50
Improvements
  • Performance depends crucially on the features that are used to represent the objects (Levy & Weiss, 2004)
  • Good features imply:
    • Good results from small training databases
    • Better generalization abilities
    • Shorter (faster) classifiers

51
Edge Orientation Histogram
  • Invariant to global illumination changes
  • Captures geometric properties of faces
  • Represents domain knowledge:
    • The inner part of the face includes more horizontal edges than vertical ones
    • The ratio between vertical and horizontal edges is bounded
    • The area of the eyes includes mainly horizontal edges
    • The chin has roughly the same number of oblique edges on both sides

52
Edge Orientation Histogram
  • The EOH can be calculated using the same kind of integral image
  • We find the gradients at the point (x, y) using Sobel masks
  • We calculate the orientation of the edge at (x, y)
  • We divide the edges into K orientation bins
  • The result is stored in K matrices
  • We use the same integral-image idea for each of the K matrices (see the sketch below)
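A sketch under numpy/scipy assumptions; the choice of K and the use of scipy.ndimage.sobel are ours:

import numpy as np
from scipy.ndimage import sobel

def eoh_integral_images(img, K=8):
    gx = sobel(img.astype(float), axis=1)   # horizontal gradient (Sobel mask)
    gy = sobel(img.astype(float), axis=0)   # vertical gradient (Sobel mask)
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)        # edge orientation in (-pi, pi]
    # assign each pixel's edge magnitude to one of K orientation bins
    bins = ((orientation + np.pi) / (2 * np.pi) * K).astype(int) % K
    # one integral image per bin, so any rectangular EOH sum is constant-time
    return [np.where(bins == k, magnitude, 0.0).cumsum(axis=0).cumsum(axis=1)
            for k in range(K)]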

53
EOH Features
  • The ratio between two orientations
  • The dominance of a given orientation
  • Symmetry Features

54
Results
  • Already with only 250 positive examples we see a detection rate above 90%
  • Faster classifier
  • Better performance on profile faces

55
Demo: Implementing the Viola-Jones System (Frank Fritze, 2004)