Title: Robust Real-Time Face Detection
1. Robust Real-Time Face Detection
2. Outline
- AdaBoost Learning Algorithm
- Face Detection in real life
- Using AdaBoost for Face Detection
- Improvements
- Demonstration
3. AdaBoost
- A Short Introduction to Boosting (Freund & Schapire, 1999)
- Logistic Regression, AdaBoost and Bregman Distances (Collins, Schapire, Singer, 2002)
4. Boosting
- The horse-racing gambler problem
- Rules of thumb for a set of races
- How should we choose the set of races in order to get the best rules of thumb?
- How should the rules be combined into a single highly accurate prediction rule?
- Boosting!
5. (figure-only slide)
6. AdaBoost - the idea
- AdaBoost agglomerates many weak classifiers into one strong classifier.
- Initialize sample weights
- For each cycle:
  - Find a classifier that performs well on the weighted sample
  - Increase weights of misclassified examples
- Return a weighted list of classifiers
(Figure: toy example data plotted by IQ against shoe size)
7. AdaBoost - algorithm
Given examples (x_1, y_1), ..., (x_m, y_m) with labels y_i in {-1, +1}:
- Initialize the distribution over examples: D_1(i) = 1/m
- For each step t = 1, ..., T:
  - Train a weak classifier h_t using the distribution D_t
  - Compute its weighted error e_t and set a_t = (1/2) ln((1 - e_t) / e_t)
  - Update D_{t+1}(i) proportional to D_t(i) exp(-a_t y_i h_t(x_i))
- Output the final decision H(x) = sign(sum_t a_t h_t(x))
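To make the loop above concrete, here is a minimal runnable sketch in Python. It is not the original system's code: it assumes decision stumps over the columns of a feature matrix X (shape n-by-k) as the weak learners, and labels y in {-1, +1}.

import numpy as np

def train_stump(X, y, w):
    # Exhaustive search over (feature, threshold, polarity) for the
    # stump with the lowest weighted error under the current weights w.
    best = None
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            for p in (1, -1):
                pred = np.where(p * X[:, j] < p * theta, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, theta, p)
    return best

def adaboost(X, y, rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # initialize sample weights
    ensemble = []
    for _ in range(rounds):
        err, j, theta, p = train_stump(X, y, w)
        err = max(err, 1e-10)                  # guard against log(0)
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = np.where(p * X[:, j] < p * theta, 1, -1)
        w *= np.exp(-alpha * y * pred)         # increase weights of mistakes
        w /= w.sum()                           # renormalize the distribution
        ensemble.append((alpha, j, theta, p))  # weighted list of classifiers
    return ensemble

def predict(ensemble, X):
    f = sum(a * np.where(p * X[:, j] < p * theta, 1, -1)
            for a, j, theta, p in ensemble)
    return np.sign(f)

The exhaustive search in train_stump is the same feature/threshold search the detector slides describe later.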
8. AdaBoost - training error
- Freund and Schapire (1997) proved that the training error of the final classifier is at most prod_t 2 sqrt(e_t (1 - e_t)) <= exp(-2 sum_t gamma_t^2), where gamma_t = 1/2 - e_t
- AdaBoost adapts to the error rates of the individual weak hypotheses.
- Therefore it is called AdaBoost: Adaptive Boosting.
9. AdaBoost - generalization error
- Freund and Schapire (1997) showed that, with high probability, the generalization error is at most the training error plus a term of order sqrt(Td/m), where T is the number of boosting rounds, d the VC-dimension of the weak hypothesis space, and m the number of examples
10. AdaBoost - generalization error
- The analysis implies that boosting will overfit if the algorithm is run for too many rounds
- However, it was observed empirically that AdaBoost does not overfit, even when run for thousands of rounds
- Moreover, it was observed that the generalization error continues to decrease long after the training error has reached zero
11. AdaBoost - generalization error
- An alternative analysis was presented by Schapire et al. (1998) that is consistent with the empirical findings: it bounds the generalization error in terms of the margins of the training examples, independently of the number of rounds, so additional rounds that keep increasing the margins can keep improving generalization
12. AdaBoost - a different point of view
- We try to approximate the y_i's using a linear combination of weak hypotheses
- In other words, we are interested in finding a vector of parameters a such that f(x_i) = sum_t a_t h_t(x_i) is a good approximation of y_i
- For classification problems we try to match the sign of f(x_i) to y_i
13. AdaBoost - a different point of view
- Sometimes it is advantageous to minimize some other (non-negative) loss function instead of the number of classification errors
- For AdaBoost the loss function is the exponential loss, sum_i exp(-y_i f(x_i))
- This point of view was used by Collins, Schapire and Singer (2002) to demonstrate that AdaBoost converges to optimality
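Written out in code, the loss is a one-liner; the names f_values and y are illustrative (f_values holding f(x_i) for each example, y the labels in {-1, +1}):

import numpy as np

def exponential_loss(f_values, y):
    # AdaBoost's loss: sum_i exp(-y_i f(x_i)). It upper-bounds the
    # number of classification errors, because exp(-y f) >= 1 whenever
    # sign(f) disagrees with y.
    return np.exp(-y * f_values).sum()

Each boosting round can then be read as a descent step on this loss, which is the viewpoint behind the convergence result of Collins, Schapire and Singer (2002).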
14. Face Detection (not face recognition)
15. Face Detection in Monkeys
- There are cells that detect faces
16. Face Detection in Humans
- There are dedicated processes for face detection
17-19. Faces Are Special
- We humans analyze faces in a different way (three example slides with illustrations)
20. Face Recognition in Humans
- We analyze faces in a specific brain location
21. Robust Real-Time Face Detection
22. Features
- Picture analysis, Integral Image
23. Features
- The system classifies images based on the value of simple features
- Value = sum(pixels in white area) - sum(pixels in black area)
- Feature types: two-rectangle, three-rectangle, four-rectangle
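As a sketch of what such a feature computes, assuming a grayscale sub-window as a NumPy array and hypothetical rectangle coordinates (the direct sums here are exactly what the integral image, introduced a few slides below, replaces with constant-time lookups):

import numpy as np

def two_rect_feature(window, x, y, w, h):
    # Two horizontally adjacent w-by-h rectangles, top-left at (x, y):
    # value = sum(pixels in white area) - sum(pixels in black area).
    white = window[y:y+h, x:x+w].sum(dtype=np.int64)
    black = window[y:y+h, x+w:x+2*w].sum(dtype=np.int64)
    return int(white - black)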
24. Contrast Features
(Figure: source image and the resulting feature responses)
Notice that each feature is related to a specific location in the sub-window.
25. Features
- Notice that each feature is related to a specific location in the sub-window
- Why features and not pixels?
  - Features encode domain knowledge
  - A feature-based system operates faster
  - Inspiration from human vision
26. Features
- Later we will see that there are other features that can be used to implement an efficient face detector
- The original system of Viola and Jones used only rectangle features
27. Computing Features
- Given a detection resolution of 24x24 and an image size of 200x200, the set of rectangle features numbers 160,000!
- We need to find a way to rapidly compute the features
28. Integral Image
- Intermediate representation of the image
- Computed in one pass over the original image
- ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y
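In NumPy the one-pass computation amounts to two cumulative sums; a minimal sketch:

import numpy as np

def integral_image(img):
    # ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y,
    # built with running sums in a single pass over the image.
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)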
29. Integral Image
Using the integral image representation one can compute the value of any rectangular sum in constant time. For example, the sum inside rectangle D is ii(4) + ii(1) - ii(2) - ii(3), where 1, 2, 3, 4 are the integral-image values at the corners surrounding D (1 above-left, 2 above-right, 3 below-left, 4 below-right).
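A sketch of the four-reference lookup, assuming the integral image has been zero-padded with one leading row and column so rectangles touching the border need no special cases:

import numpy as np

def rect_sum(ii, x, y, w, h):
    # Sum over the rectangle with top-left (x, y), width w, height h:
    # D = ii(4) + ii(1) - ii(2) - ii(3).
    return (ii[y + h, x + w] + ii[y, x]
            - ii[y, x + w] - ii[y + h, x])

# e.g., reusing integral_image from the sketch above:
# padded = np.pad(integral_image(img), ((1, 0), (1, 0)))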
30. Integral Image
(Figure: integral image illustration)
31. Building a Detector
- Cascading, training a cascade
32. Main Ideas
- The features will be used as weak classifiers
- We will concatenate several detectors serially into a cascade
- We will boost (using a version of AdaBoost) a number of features to get good enough detectors
33. Weak Classifiers
- Weak classifier: a feature which best separates the examples
- Given a sub-window (x), a feature (f), a threshold (T), and a polarity (p) indicating the direction of the inequality:
  h(x) = 1 if p f(x) < p T, and 0 otherwise
(Figure: feature-value distributions and the probability for this threshold)
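The decision rule itself is tiny; a direct rendering in Python:

def weak_classifier(f_value, theta, p):
    # h(x) = 1 if p * f(x) < p * theta, else 0; the polarity p
    # (+1 or -1) selects the direction of the inequality.
    return 1 if p * f_value < p * theta else 0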
34. Weak Classifiers
- A weak classifier is a combination of a feature and a threshold
- We have K features
- We have N thresholds, where N is the number of examples
- Thus there are KN weak classifiers
35. Weak Classifier Selection
- For each feature, sort the examples based on feature value
- For each element, evaluate the total sum of positive/negative example weights (T+/T-) and the sum of positive/negative weights below the current example (S+/S-)
- The error for a threshold which splits the range between the current and previous example in the sorted list is e = min(S+ + (T- - S-), S- + (T+ - S+))
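A sketch of this single-pass selection in Python (names are illustrative; it assumes labels in {-1, +1} and, for simplicity, places the threshold at the current example's value rather than at the midpoint of the gap):

import numpy as np

def best_stump(f_values, y, w):
    # One pass over the examples sorted by feature value.
    # T+/T-: total positive/negative weight; S+/S-: weight strictly
    # below the current example. For a threshold placed just before
    # example i: e = min(S+ + (T- - S-), S- + (T+ - S+)).
    order = np.argsort(f_values)
    t_pos, t_neg = w[y == 1].sum(), w[y == -1].sum()
    s_pos = s_neg = 0.0
    best = (np.inf, 0.0, 1)              # (error, threshold, polarity)
    for i in order:
        a = s_pos + (t_neg - s_neg)      # predict negative below threshold
        b = s_neg + (t_pos - s_pos)      # predict positive below threshold
        err = min(a, b)
        if err < best[0]:
            # p = -1: positives above the threshold; p = +1: below it.
            best = (err, f_values[i], -1 if a < b else 1)
        if y[i] == 1:
            s_pos += w[i]
        else:
            s_neg += w[i]
    return best

The example on the next slide traces exactly these quantities (T+/T-, S+/S-, A, B, e) over five examples.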
36. An example
Legend: W = example weight, f = feature value, T+/T- = total positive/negative example weights, S+/S- = the sum of positive/negative weights below the current example, A = S+ + (T- - S-), B = S- + (T+ - S+), error e = min(A, B).

x    y    f    W     T+    T-    S+    S-    A     B     e
X1   -1   2    1/5   3/5   2/5   0     0     2/5   3/5   2/5
X2   -1   3    1/5   3/5   2/5   0     1/5   1/5   4/5   1/5
X3   +1   5    1/5   3/5   2/5   0     2/5   0     5/5   0
X4   +1   7    1/5   3/5   2/5   1/5   2/5   1/5   4/5   1/5
X5   +1   8    1/5   3/5   2/5   2/5   2/5   2/5   3/5   2/5

The examples are sorted by feature value. The minimum error e = 0 is reached at X3, so the decision threshold is placed between f = 3 and f = 5, classifying all five examples correctly.
37. Main Ideas - Cascading
- The features will be used as weak classifiers
- We will concatenate several detectors serially into a cascade
- We will boost (using a version of AdaBoost) a number of features to get good enough detectors
38. Cascading
- We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows
- Positive results from the first classifier trigger the evaluation of a second (more complex) classifier, and so on
- A negative outcome at any point leads to the immediate rejection of the sub-window
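A sketch of that control flow, with each layer represented as a callable (an assumption made here for illustration):

def cascade_classify(window, layers):
    # layers: boosted classifiers ordered cheapest first; each returns
    # True to pass the sub-window on, False to reject it.
    for layer in layers:
        if not layer(window):
            return False    # immediate rejection of the sub-window
    return True             # survived every layer: report a detection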
39. Cascading
(Figure: the cascade of classifiers)
40. Main Ideas - Boosting
- The features will be used as weak classifiers
- We will concatenate several detectors serially into a cascade
- We will boost (using a version of AdaBoost) a number of features to get good enough detectors
41. Training a cascade
- User selects values for:
  - Maximum acceptable false positive rate per layer
  - Minimum acceptable detection rate per layer
  - Target overall false positive rate
- User gives a set of positive and negative examples
42. Training a cascade (cont.)
- While the overall false positive rate is not met:
  - While the false positive rate of the current layer exceeds the maximum allowed per layer:
    - Train a classifier with n features using AdaBoost on the set of positive and negative examples
    - Decrease the classifier's threshold until the layer's detection rate is at least the required minimum
    - Evaluate the current cascaded classifier on a validation set
  - Evaluate the current cascaded detector on a set of non-face images and put any false detections into the negative training set
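The same loop as a runnable Python sketch; the three callables are hypothetical stand-ins for the steps above (train_layer is assumed to lower its threshold internally until the layer meets the minimum detection rate):

def train_cascade(train_layer, eval_fp_rate, refill_negatives,
                  f_max, F_target):
    # train_layer(n): boosted classifier with n features
    # eval_fp_rate(cascade): false positive rate on a validation set
    # refill_negatives(cascade): put the cascade's false detections on
    #   non-face images into the negative training set
    cascade, F = [], 1.0
    while F > F_target:                 # overall target not yet met
        F_prev, n, layer = F, 0, None
        while F > f_max * F_prev:       # layer still too permissive
            n += 1
            layer = train_layer(n)
            F = eval_fp_rate(cascade + [layer])
        cascade.append(layer)
        refill_negatives(cascade)
    return cascade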
43. Results
44. Training Data Set
- 4916 hand-labeled faces
- Aligned to base resolution (24x24)
- Non-faces for the first layer were collected from 9500 non-face images
- Non-faces for subsequent layers were obtained by scanning the partial cascade across non-face images and collecting false positives (max 6000 per layer)
45. Structure of the Detector
- 38-layer cascade
- 6060 features in total
46. Speed of the Final Detector
- On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds
47. Improvements
- Learning Object Detection from a Small Number of Examples: the Importance of Good Features (Levy & Weiss, 2004)
48. Improvements
- Performance depends crucially on the features that are used to represent the objects (Levy & Weiss, 2004)
- Good features imply:
  - Good results from small training databases
  - Better generalization abilities
  - Shorter (faster) classifiers
49. Edge Orientation Histogram
- Invariant to global illumination changes
- Captures geometric properties of faces
- Domain knowledge represented:
  - The inner part of the face includes more horizontal edges than vertical
  - The ratio between vertical and horizontal edges is bounded
  - The area of the eyes includes mainly horizontal edges
  - The chin has more or less the same number of oblique edges on both sides
50. Edge Orientation Histogram
- Called EOH
- The EOH can be calculated using a variant of the integral image (a sketch follows this list)
- We find the gradients at the point (x, y) using Sobel masks
- We calculate the orientation of the edge at (x, y)
- We divide the edges into K bins
- The result is stored in K matrices
- We use the same idea of the integral image for the matrices
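A sketch of this pipeline, assuming SciPy's ndimage.sobel for the gradients and magnitude-weighted bins (both are assumptions; the slide does not fix these details):

import numpy as np
from scipy import ndimage

def eoh_integral_images(img, k=4):
    # Sobel gradients, per-pixel edge orientation quantized into k
    # bins, and one integral image per bin, so any rectangular
    # orientation histogram costs four references per bin.
    gx = ndimage.sobel(img.astype(np.float64), axis=1)
    gy = ndimage.sobel(img.astype(np.float64), axis=0)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % np.pi                 # in [0, pi)
    bins = np.minimum((angle / np.pi * k).astype(int), k - 1)
    return [np.where(bins == b, magnitude, 0.0)
              .cumsum(axis=0).cumsum(axis=1)
            for b in range(k)]

The EOH features on the next slide (ratios, dominance, symmetry) then reduce to four-reference rectangle sums over these K matrices.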
51. EOH Features
- The ratio between two orientations
- The dominance of a given orientation
- Symmetry Features
52. Results
- Already with only 250 positive examples we see a detection rate above 90%
- Faster classifier
- Better performance on profile faces
53. Demo: Implementing the Viola-Jones System (Frank Fritze, 2004)