Title: Robust Real-Time Face Detection
1. Robust Real-Time Face Detection
2. Outline
- AdaBoost Learning Algorithm
- Face Detection in real life
- Using AdaBoost for Face Detection
- Improvements
- Demonstration
3. AdaBoost
- A Short Introduction to Boosting (Freund & Schapire, 1999)
- Logistic Regression, AdaBoost and Bregman Distances (Collins, Schapire, Singer, 2002)
4. Boosting
- The horse-racing gambler problem
- Rules of thumb for a set of races
- How should we choose the set of races in order to get the best rules of thumb?
- How should the rules be combined into a single highly accurate prediction rule?
- Boosting!
5. (figure-only slide)
6. AdaBoost - the idea
- AdaBoost agglomerates many weak classifiers into one strong classifier.
- Initialize sample weights
- For each cycle:
  - Find a classifier that performs well on the weighted sample
  - Increase weights of misclassified examples
- Return a weighted list of classifiers
(Figure: toy example data plotted by IQ against shoe size)
7. AdaBoost - algorithm
Given examples (x_1, y_1), ..., (x_m, y_m) with labels y_i in {-1, +1}:
- Initialize the distribution over examples: D_1(i) = 1/m
- For each step t = 1, ..., T:
  - Train a weak classifier h_t using the distribution D_t
  - Compute its weighted error e_t and set a_t = (1/2) ln((1 - e_t) / e_t)
  - Update D_{t+1}(i) proportional to D_t(i) exp(-a_t y_i h_t(x_i))
- Output the final decision H(x) = sign(sum_t a_t h_t(x))
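To make the loop above concrete, here is a minimal runnable sketch in Python. It is not the original system's code: it assumes decision stumps over the columns of a feature matrix X (shape n-by-k) as the weak learners, and labels y in {-1, +1}.

import numpy as np

def train_stump(X, y, w):
    # Exhaustive search over (feature, threshold, polarity) for the
    # stump with the lowest weighted error under the current weights w.
    best = None
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            for p in (1, -1):
                pred = np.where(p * X[:, j] < p * theta, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, theta, p)
    return best

def adaboost(X, y, rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # initialize sample weights
    ensemble = []
    for _ in range(rounds):
        err, j, theta, p = train_stump(X, y, w)
        err = max(err, 1e-10)                  # guard against log(0)
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = np.where(p * X[:, j] < p * theta, 1, -1)
        w *= np.exp(-alpha * y * pred)         # increase weights of mistakes
        w /= w.sum()                           # renormalize the distribution
        ensemble.append((alpha, j, theta, p))  # weighted list of classifiers
    return ensemble

def predict(ensemble, X):
    f = sum(a * np.where(p * X[:, j] < p * theta, 1, -1)
            for a, j, theta, p in ensemble)
    return np.sign(f)

The exhaustive search in train_stump is the same feature/threshold search the detector slides describe later.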
8. AdaBoost - training error
- Freund and Schapire (1997) proved that the training error of the final classifier is at most prod_t 2 sqrt(e_t (1 - e_t)) <= exp(-2 sum_t gamma_t^2), where gamma_t = 1/2 - e_t
- AdaBoost adapts to the error rates of the individual weak hypotheses.
- Therefore it is called AdaBoost: Adaptive Boosting.
9. AdaBoost - generalization error
- Freund and Schapire (1997) showed that, with high probability, the generalization error is at most the training error plus a term of order sqrt(Td/m), where T is the number of boosting rounds, d the VC-dimension of the weak hypothesis space, and m the number of examples
10. AdaBoost - generalization error
- The analysis implies that boosting will overfit if the algorithm is run for too many rounds
- However, it was observed empirically that AdaBoost does not overfit, even when run for thousands of rounds
- Moreover, it was observed that the generalization error continues to decrease long after the training error has reached zero
11. AdaBoost - generalization error
- An alternative analysis was presented by Schapire et al. (1998) that is consistent with the empirical findings: it bounds the generalization error in terms of the margins of the training examples, independently of the number of rounds, so additional rounds that keep increasing the margins can keep improving generalization
12. AdaBoost - a different point of view
- We try to approximate the y_i's using a linear combination of weak hypotheses
- In other words, we are interested in finding a vector of parameters a such that f(x_i) = sum_t a_t h_t(x_i) is a good approximation of y_i
- For classification problems we try to match the sign of f(x_i) to y_i
13. AdaBoost - a different point of view
- Sometimes it is advantageous to minimize some other (non-negative) loss function instead of the number of classification errors
- For AdaBoost the loss function is the exponential loss, sum_i exp(-y_i f(x_i))
- This point of view was used by Collins, Schapire and Singer (2002) to demonstrate that AdaBoost converges to optimality
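Written out in code, the loss is a one-liner; the names f_values and y are illustrative (f_values holding f(x_i) for each example, y the labels in {-1, +1}):

import numpy as np

def exponential_loss(f_values, y):
    # AdaBoost's loss: sum_i exp(-y_i f(x_i)). It upper-bounds the
    # number of classification errors, because exp(-y f) >= 1 whenever
    # sign(f) disagrees with y.
    return np.exp(-y * f_values).sum()

Each boosting round can then be read as a descent step on this loss, which is the viewpoint behind the convergence result of Collins, Schapire and Singer (2002).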
14. Face Detection (not face recognition)
15. Face Detection in Monkeys
- There are cells that detect faces
16. Face Detection in Humans
- There are dedicated processes for face detection
17-19. Faces Are Special
- We humans analyze faces in a different way (three example slides with illustrations)
20. Face Recognition in Humans
- We analyze faces in a specific brain location
21. Robust Real-Time Face Detection
22. Features
- Picture analysis, Integral Image
23. Features
- The system classifies images based on the value of simple features
- Value = sum(pixels in white area) - sum(pixels in black area)
- Feature types: two-rectangle, three-rectangle, four-rectangle
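As a sketch of what such a feature computes, assuming a grayscale sub-window as a NumPy array and hypothetical rectangle coordinates (the direct sums here are exactly what the integral image, introduced a few slides below, replaces with constant-time lookups):

import numpy as np

def two_rect_feature(window, x, y, w, h):
    # Two horizontally adjacent w-by-h rectangles, top-left at (x, y):
    # value = sum(pixels in white area) - sum(pixels in black area).
    white = window[y:y+h, x:x+w].sum(dtype=np.int64)
    black = window[y:y+h, x+w:x+2*w].sum(dtype=np.int64)
    return int(white - black)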
24. Contrast Features
(Figure: source image and the resulting feature responses)
Notice that each feature is related to a specific location in the sub-window.
25. Features
- Notice that each feature is related to a specific location in the sub-window
- Why features and not pixels?
  - Features encode domain knowledge
  - A feature-based system operates faster
  - Inspiration from human vision
26. Features
- Later we will see that there are other features that can be used to implement an efficient face detector
- The original system of Viola and Jones used only rectangle features
27. Computing Features
- Given a detection resolution of 24x24 and an image size of 200x200, the set of rectangle features numbers 160,000!
- We need to find a way to rapidly compute the features
28. Integral Image
- Intermediate representation of the image
- Computed in one pass over the original image
- ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y
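In NumPy the one-pass computation amounts to two cumulative sums; a minimal sketch:

import numpy as np

def integral_image(img):
    # ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y,
    # built with running sums in a single pass over the image.
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)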
29. Integral Image
Using the integral image representation one can compute the value of any rectangular sum in constant time. For example, the sum inside rectangle D is ii(4) + ii(1) - ii(2) - ii(3), where 1, 2, 3, 4 are the integral-image values at the corners surrounding D (1 above-left, 2 above-right, 3 below-left, 4 below-right).
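A sketch of the four-reference lookup, assuming the integral image has been zero-padded with one leading row and column so rectangles touching the border need no special cases:

import numpy as np

def rect_sum(ii, x, y, w, h):
    # Sum over the rectangle with top-left (x, y), width w, height h:
    # D = ii(4) + ii(1) - ii(2) - ii(3).
    return (ii[y + h, x + w] + ii[y, x]
            - ii[y, x + w] - ii[y + h, x])

# e.g., reusing integral_image from the sketch above:
# padded = np.pad(integral_image(img), ((1, 0), (1, 0)))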
30. Integral Image
(Figure: integral image illustration)
31. Building a Detector
- Cascading, training a cascade
32. Main Ideas
- The features will be used as weak classifiers
- We will concatenate several detectors serially into a cascade
- We will boost (using a version of AdaBoost) a number of features to get good enough detectors
33. Weak Classifiers
- Weak classifier: a feature which best separates the examples
- Given a sub-window (x), a feature (f), a threshold (T), and a polarity (p) indicating the direction of the inequality:
  h(x) = 1 if p f(x) < p T, and 0 otherwise
(Figure: feature-value distributions and the probability for this threshold)
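The decision rule itself is tiny; a direct rendering in Python:

def weak_classifier(f_value, theta, p):
    # h(x) = 1 if p * f(x) < p * theta, else 0; the polarity p
    # (+1 or -1) selects the direction of the inequality.
    return 1 if p * f_value < p * theta else 0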
34. Weak Classifiers
- A weak classifier is a combination of a feature and a threshold
- We have K features
- We have N thresholds, where N is the number of examples
- Thus there are KN weak classifiers
35. Weak Classifier Selection
- For each feature, sort the examples based on feature value
- For each element, evaluate the total sum of positive/negative example weights (T+/T-) and the sum of positive/negative weights below the current example (S+/S-)
- The error for a threshold which splits the range between the current and previous example in the sorted list is e = min(S+ + (T- - S-), S- + (T+ - S+))
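A sketch of this single-pass selection in Python (names are illustrative; it assumes labels in {-1, +1} and, for simplicity, places the threshold at the current example's value rather than at the midpoint of the gap):

import numpy as np

def best_stump(f_values, y, w):
    # One pass over the examples sorted by feature value.
    # T+/T-: total positive/negative weight; S+/S-: weight strictly
    # below the current example. For a threshold placed just before
    # example i: e = min(S+ + (T- - S-), S- + (T+ - S+)).
    order = np.argsort(f_values)
    t_pos, t_neg = w[y == 1].sum(), w[y == -1].sum()
    s_pos = s_neg = 0.0
    best = (np.inf, 0.0, 1)              # (error, threshold, polarity)
    for i in order:
        a = s_pos + (t_neg - s_neg)      # predict negative below threshold
        b = s_neg + (t_pos - s_pos)      # predict positive below threshold
        err = min(a, b)
        if err < best[0]:
            # p = -1: positives above the threshold; p = +1: below it.
            best = (err, f_values[i], -1 if a < b else 1)
        if y[i] == 1:
            s_pos += w[i]
        else:
            s_neg += w[i]
    return best

The example on the next slide traces exactly these quantities (T+/T-, S+/S-, A, B, e) over five examples.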
36. An example
Legend: W = example weight, f = feature value, T+/T- = total positive/negative example weights, S+/S- = the sum of positive/negative weights below the current example, A = S+ + (T- - S-), B = S- + (T+ - S+), error e = min(A, B).

x    y    f    W     T+    T-    S+    S-    A     B     e
X1   -1   2    1/5   3/5   2/5   0     0     2/5   3/5   2/5
X2   -1   3    1/5   3/5   2/5   0     1/5   1/5   4/5   1/5
X3   +1   5    1/5   3/5   2/5   0     2/5   0     5/5   0
X4   +1   7    1/5   3/5   2/5   1/5   2/5   1/5   4/5   1/5
X5   +1   8    1/5   3/5   2/5   2/5   2/5   2/5   3/5   2/5

The examples are sorted by feature value. The minimum error e = 0 is reached at X3, so the decision threshold is placed between f = 3 and f = 5, classifying all five examples correctly.
37. Main Ideas - Cascading
- The features will be used as weak classifiers
- We will concatenate several detectors serially into a cascade
- We will boost (using a version of AdaBoost) a number of features to get good enough detectors
38. Cascading
- We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows
- Positive results from the first classifier trigger the evaluation of a second (more complex) classifier, and so on
- A negative outcome at any point leads to the immediate rejection of the sub-window
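A sketch of that control flow, with each layer represented as a callable (an assumption made here for illustration):

def cascade_classify(window, layers):
    # layers: boosted classifiers ordered cheapest first; each returns
    # True to pass the sub-window on, False to reject it.
    for layer in layers:
        if not layer(window):
            return False    # immediate rejection of the sub-window
    return True             # survived every layer: report a detection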
39. Cascading
(Figure: the cascade of classifiers)
40. Main Ideas - Boosting
- The features will be used as weak classifiers
- We will concatenate several detectors serially into a cascade
- We will boost (using a version of AdaBoost) a number of features to get good enough detectors
41. Training a cascade
- User selects values for:
  - Maximum acceptable false positive rate per layer
  - Minimum acceptable detection rate per layer
  - Target overall false positive rate
- User gives a set of positive and negative examples
42. Training a cascade (cont.)
- While the overall false positive rate is not met:
  - While the false positive rate of the current layer exceeds the maximum allowed per layer:
    - Train a classifier with n features using AdaBoost on the set of positive and negative examples
    - Decrease the classifier's threshold until the layer's detection rate is at least the required minimum
    - Evaluate the current cascaded classifier on a validation set
  - Evaluate the current cascaded detector on a set of non-face images and put any false detections into the negative training set
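The same loop as a runnable Python sketch; the three callables are hypothetical stand-ins for the steps above (train_layer is assumed to lower its threshold internally until the layer meets the minimum detection rate):

def train_cascade(train_layer, eval_fp_rate, refill_negatives,
                  f_max, F_target):
    # train_layer(n): boosted classifier with n features
    # eval_fp_rate(cascade): false positive rate on a validation set
    # refill_negatives(cascade): put the cascade's false detections on
    #   non-face images into the negative training set
    cascade, F = [], 1.0
    while F > F_target:                 # overall target not yet met
        F_prev, n, layer = F, 0, None
        while F > f_max * F_prev:       # layer still too permissive
            n += 1
            layer = train_layer(n)
            F = eval_fp_rate(cascade + [layer])
        cascade.append(layer)
        refill_negatives(cascade)
    return cascade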
43. Results
44. Training Data Set
- 4916 hand-labeled faces
- Aligned to base resolution (24x24)
- Non-faces for the first layer were collected from 9500 non-face images
- Non-faces for subsequent layers were obtained by scanning the partial cascade across non-face images and collecting false positives (max 6000 per layer)
45. Structure of the Detector
- 38-layer cascade
- 6060 features in total
46. Speed of the Final Detector
- On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds
47. Improvements
- Learning Object Detection from a Small Number of Examples: the Importance of Good Features (Levy & Weiss, 2004)
48. Improvements
- Performance depends crucially on the features that are used to represent the objects (Levy & Weiss, 2004)
- Good features imply:
  - Good results from small training databases
  - Better generalization abilities
  - Shorter (faster) classifiers
49. Edge Orientation Histogram
- Invariant to global illumination changes
- Captures geometric properties of faces
- Domain knowledge represented:
  - The inner part of the face includes more horizontal edges than vertical
  - The ratio between vertical and horizontal edges is bounded
  - The area of the eyes includes mainly horizontal edges
  - The chin has more or less the same number of oblique edges on both sides
50. Edge Orientation Histogram
- Called EOH
- The EOH can be calculated using a variant of the integral image (a sketch follows this list)
- We find the gradients at the point (x, y) using Sobel masks
- We calculate the orientation of the edge at (x, y)
- We divide the edges into K bins
- The result is stored in K matrices
- We use the same idea of the integral image for the matrices
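A sketch of this pipeline, assuming SciPy's ndimage.sobel for the gradients and magnitude-weighted bins (both are assumptions; the slide does not fix these details):

import numpy as np
from scipy import ndimage

def eoh_integral_images(img, k=4):
    # Sobel gradients, per-pixel edge orientation quantized into k
    # bins, and one integral image per bin, so any rectangular
    # orientation histogram costs four references per bin.
    gx = ndimage.sobel(img.astype(np.float64), axis=1)
    gy = ndimage.sobel(img.astype(np.float64), axis=0)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % np.pi                 # in [0, pi)
    bins = np.minimum((angle / np.pi * k).astype(int), k - 1)
    return [np.where(bins == b, magnitude, 0.0)
              .cumsum(axis=0).cumsum(axis=1)
            for b in range(k)]

The EOH features on the next slide (ratios, dominance, symmetry) then reduce to four-reference rectangle sums over these K matrices.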
51. EOH Features
- The ratio between two orientations
- The dominance of a given orientation
- Symmetry Features
52. Results
- Already with only 250 positive examples we see a detection rate above 90%
- Faster classifier
- Better performance on profile faces
53. Demo: Implementing the Viola-Jones System (Frank Fritze, 2004)