Title: Learning to Detect Objects in Images via a Sparse, Part-Based Representation

1
Learning to Detect Objects in Images via a
Sparse, Part-Based Representation

S. Agarwal, A. Awan and D. Roth
IEEE Transactions on Pattern Analysis and Machine
Intelligence

Antón Escobedo cse252c
2
Outline

Introduction
Problem Specification
Related Work
Overview of the Approach
Evaluation
Experimental Results and Analysis
Conclusion and Future Scope

3
Introduction

Automatic detection of objects in images
Different objects belonging to the same category
can vary in appearance
Successful object detection system
Proposed solution: a sparse, part-based
representation
Part-based representation is computationally
efficient and has its roots in biological vision

4
Problem Specification

Input: an image
Output: a list of locations at which instances of
the object class are detected in the image
The experiments are performed on images of side
views of cars, but the approach can be applied to any
object class that consists of distinguishable parts
arranged in a relatively fixed spatial configuration
The present problem is a detection problem
rather than a simple classification problem

5
Previous Related Work

Raw Pixel Intensities
Global Image
Local features
Part-based representations using hand-labeled
features

6
Algorithm Overview

Four stages:
Vocabulary Construction: building a vocabulary of
parts that will represent objects
Image Representation: input images are
represented in terms of binary feature vectors
Learning a Classifier: two target classes,
feature vector (object) and feature vector
(nonobject)
Detection Hypothesis Using the Learned
Classifier:
Classifier activation map for the single-scale
case
Classifier activation pyramid for the multi-scale
case

7
Vocabulary Construction

Extraction of interest points using the Förstner
interest operator
Experiments carried out on 50 representative
images of size 100 x 40 pixels. A total of 400
patches, each 13 x 13 pixels in size, were
extracted
To facilitate learning, a bottom-up clustering
procedure was adopted where similarity was
measured by normalized correlation
Similarity between two clusters C1 and C2 is
finally measured by the average similarity
between their respective patches
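The bottom-up clustering described above can be sketched as average-linkage agglomerative clustering. The sketch uses normalized correlation between patches, and it scores a pair of clusters by the mean similarity over all cross-cluster patch pairs. It is not the authors' code. The function names are hypothetical, and the stopping rule is an assumption: merging continues until a target number of clusters remains, because the slides do not say when merging stops.

```python
import numpy as np


def normalized_correlation(p, q):
    """Normalized (zero-mean) correlation between two equal-sized patches, in [-1, 1]."""
    a = p.astype(float).ravel()
    b = q.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def cluster_patches(patches, num_clusters):
    """Bottom-up (agglomerative) clustering of patches with average linkage.

    Returns a list of clusters, each a list of indices into `patches`.
    Stopping at `num_clusters` is an assumption, not taken from the slides.
    """
    n = len(patches)
    # Pairwise patch similarities, computed once up front.
    sim = np.array([[normalized_correlation(patches[i], patches[j])
                     for j in range(n)] for i in range(n)])
    clusters = [[i] for i in range(n)]  # start with one patch per cluster
    while len(clusters) > max(num_clusters, 1):
        best_score, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster similarity = average similarity between their patches.
                score = sim[np.ix_(clusters[a], clusters[b])].mean()
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]  # merge the most similar pair
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

For the setting on this slide, one would call `cluster_patches` on the 400 extracted 13 x 13 patches. The exhaustive pair search is written for readability, not speed.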

8
Vocabulary Construction
[Figures: Förstner operator applied to a sample image; sample patches; clusters from the sample patches]
9
Image Representation

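For each patch q in an image, a similarity-based
indexing is performed into the part vocabulary P
using the similarity between a part (cluster) C and q
$$\mathrm{sim}(C, q) = \frac{1}{|C|} \sum_{p \in C} \rho(p, q)$$
where ρ(p, q) is the normalized correlation between patches p and q
For each highlighted patch q, the most similar
vocabulary part P(q) is given by
$$P(q) = \arg\max_{C \in P} \mathrm{sim}(C, q)$$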
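Indexing a patch into the part vocabulary can be sketched as follows. The sketch assumes that a patch's similarity to a part is its average normalized correlation with the part's patches, mirroring the cluster-to-cluster similarity on the previous slide, and it picks the most similar part. The optional `threshold` for rejecting patches that match no part well is a hypothetical parameter. No such value appears on these slides.

```python
import numpy as np


def normalized_correlation(p, q):
    """Normalized (zero-mean) correlation between two equal-sized patches."""
    a = p.astype(float).ravel()
    b = q.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def part_similarity(cluster, q):
    """Average normalized correlation between patch q and every patch in a part cluster."""
    return float(np.mean([normalized_correlation(p, q) for p in cluster]))


def index_patch(q, vocabulary, threshold=None):
    """Return (part index, similarity) for the vocabulary part most similar to q.

    vocabulary: list of part clusters, each a list of patches.
    threshold: hypothetical cut-off; if the best similarity falls below it,
    the part index is None (the patch is not represented by any part).
    """
    scores = [part_similarity(cluster, q) for cluster in vocabulary]
    best = int(np.argmax(scores))
    if threshold is not None and scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]
```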

10
Image Representation Feature Vector

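Spatial relations among the parts detected in an
image are defined in terms of distance (5 bins)
and direction (8 ranges of 45 degrees each).
Because the two parts of a relation are taken in a
fixed left-to-right order, only 4 of the direction
ranges are needed, giving 5 x 4 = 20 possible
relations between 2 parts.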
2-6 parts per positive window
Each 100 x 40 training image is represented as a
feature vector with 290 elements.
$P_n^{(i)}$: the i-th occurrence of a part of type n in the
image (1 ≤ n ≤ 270; n is a particular part cluster)
$R_m^{(j)}(P_{n_1}, P_{n_2})$: the j-th occurrence of relation $R_m$
between a part of type $n_1$ and a part of type $n_2$
(1 ≤ m ≤ 20; m is a distance-direction combination)
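The Boolean feature representation can be sketched as below. Each active feature is a string naming an occurrence P_n(i) or a relation occurrence R_m(j)(P_n1, P_n2). This is an illustrative sketch, not the authors' implementation. Three things are assumptions made here so that the 5 x 4 = 20 relation types come out:

- the distance bin edges, in pixels;
- the string naming scheme;
- ordering the parts left to right, which leaves only four 45-degree direction ranges.

```python
import math
from collections import Counter

DIST_EDGES = [20.0, 40.0, 60.0, 80.0]  # hypothetical edges (pixels) -> 5 distance bins
NUM_DIRS = 4  # 8 ranges of 45 degrees, halved by the left-to-right ordering (assumption)


def distance_bin(d):
    """Index 0..4 of the distance bin containing d."""
    for k, edge in enumerate(DIST_EDGES):
        if d < edge:
            return k
    return len(DIST_EDGES)


def direction_bin(dx, dy):
    """Index 0..3 of the 45-degree range; dx >= 0, so the angle lies in [-90, 90]."""
    angle = math.degrees(math.atan2(dy, dx))
    return min(int((angle + 90.0) // 45.0), NUM_DIRS - 1)


def extract_features(parts):
    """parts: list of (n, x, y), where n is the part-cluster id of a detected part.

    Returns the set of active Boolean features as strings.
    """
    parts = sorted(parts, key=lambda p: (p[1], p[2]))  # fixed left-to-right order
    features = set()
    part_count = Counter()
    for n, _, _ in parts:
        part_count[n] += 1  # i-th occurrence of a part of type n
        features.add(f"P{n}({part_count[n]})")
    rel_count = Counter()
    for a in range(len(parts)):
        for b in range(a + 1, len(parts)):
            n1, x1, y1 = parts[a]
            n2, x2, y2 = parts[b]
            dx, dy = x2 - x1, y2 - y1
            # m in 1..20 encodes the distance-direction combination.
            m = distance_bin(math.hypot(dx, dy)) * NUM_DIRS + direction_bin(dx, dy) + 1
            rel_count[(m, n1, n2)] += 1  # j-th occurrence of R_m between types n1, n2
            features.add(f"R{m}({rel_count[(m, n1, n2)]})(P{n1},P{n2})")
    return features
```

Representing an image as a set of active feature names, rather than a dense vector, fits the sparse learner introduced on the next slides.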

11
Learning a Classifier

Train the classifier on 1000 labeled images, each
100 x 40 pixels in size
No synthetic training images
+ve examples: various cars with varied
backgrounds
-ve examples: natural scenes such as buildings
and roads
High dimensionality of the feature space: 270 part
types, 20 relations, repeated occurrences
Use of the Sparse Network of Winnows (SNoW) learning
architecture
Winnow: to reduce in number until only the best
are left

12
SNoW: a sparse network of linear units over a
Boolean or real-valued feature space
[Figure: input layer (feature layer) linked to target nodes; edges are allocated dynamically; each example e is represented as a list of active features]
13
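SNoW: predicted target t* for example e
$$t^* = \arg\max_{t \in T} \sigma(\theta_t, \Omega_t(e))$$
Activation calculated by summation for target
node t
$$\Omega_t(e) = \sum_{i \in e} w_{t,i}$$
where T is the set of target nodes and w_{t,i} is the
weight of the edge from active feature i to target t
σ(θ, Ω): a learning-algorithm-specific sigmoid function
whose transition from an output close to 0 to an
output close to 1 centers around θ, for example
$$\sigma(\theta, \Omega) = \frac{1}{1 + e^{\theta - \Omega}}$$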
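SNoW-style prediction can be sketched in a few steps. Each target node sums the weights of the example's active features it is linked to. It passes that sum through a sigmoid centered at its threshold θ, and the target with the largest output wins. The specific sigmoid 1/(1 + e^(θ - Ω)) and the dictionary-based sparse weights are assumptions for illustration. This is not the SNoW package's API.

```python
import math


def sigmoid(theta, omega):
    """Near 0 for omega << theta, near 1 for omega >> theta, exactly 0.5 at omega == theta."""
    z = max(min(theta - omega, 700.0), -700.0)  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(z))


def activation(weights, example):
    """Omega_t(e): sum of weights of active features linked to this target (unlinked count as 0)."""
    return sum(weights.get(f, 0.0) for f in example)


def predict(targets, example):
    """targets: dict mapping target name -> (theta, weights dict). Returns the winning target."""
    return max(targets, key=lambda t: sigmoid(targets[t][0], activation(targets[t][1], example)))


if __name__ == "__main__":
    targets = {"car": (3.5, {1001: 2.0, 1005: 2.0, 1007: 2.0}),
               "noncar": (3.5, {1001: 1.0, 1003: 1.0})}
    print(predict(targets, [1001, 1005, 1007]))  # car (activation 6.0 vs 1.0)
```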
14
SNoW: Basic Learning Rules

Several weight update rules can be used; the update
rules are variations of Winnow and Perceptron
Winnow update rule: the number of examples
required to learn a linear function grows
linearly with the number of relevant features and
only logarithmically with the total number of
features.

15
A Training Example
[Figure: a training example with active features 1001, 1005, 1007 and the corresponding network weights]
Update rule: Winnow, α = 2, β = ½, θ = 3.5
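The Winnow rule with the slide's parameters (α = 2, β = ½, θ = 3.5) can be sketched for a single target node. Three choices here are assumptions for illustration:

- an initial weight of 1.0;
- the mistake test: activation ≤ θ on a positive example, > θ on a negative one;
- linking features to the target only when they appear in a positive example.

```python
def activation(weights, example):
    """Sum of the weights of the active features linked to this target node."""
    return sum(weights.get(f, 0.0) for f in example)


def winnow_update(weights, example, is_positive,
                  alpha=2.0, beta=0.5, theta=3.5, init_weight=1.0):
    """One mistake-driven Winnow update for a single target node (weights changed in place).

    weights: dict feature -> weight; example: iterable of active feature ids.
    alpha, beta, theta: promotion factor, demotion factor and threshold.
    init_weight: hypothetical initial weight for newly linked features.
    """
    example = list(example)
    if is_positive:
        # Assumption: a feature gets an edge to this target the first time it
        # appears in a positive example of the target.
        for f in example:
            weights.setdefault(f, init_weight)
    omega = activation(weights, example)
    if is_positive and omega <= theta:  # missed a positive example: promote
        for f in example:
            weights[f] *= alpha
    elif not is_positive and omega > theta:  # false alarm on a negative: demote
        for f in example:
            if f in weights:
                weights[f] *= beta
    return weights
```

Starting from empty weights, the positive example with active features 1001, 1005 and 1007 has activation 3 ≤ 3.5, so each of its three weights is promoted from 1 to 2. Only the weights of active features ever change. This is why the number of examples Winnow needs grows with the number of relevant features rather than the total number of features.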
16
Detection Hypothesis using Learned Classifier

Classifier activation map for the single-scale case
Neighborhood Suppression: based on non-maximum
suppression
Repeated Part Elimination: a greedy algorithm that
uses windows around the highest-activation points
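Neighborhood Suppression can be sketched as a greedy form of non-maximum suppression over the 2-D activation map. The sketch repeatedly accepts the highest remaining activation above the threshold, then suppresses every window within a fixed neighborhood of it. The neighborhood half-sizes and the strict "above threshold" test are assumptions, since the slides do not give them.

```python
import numpy as np


def neighborhood_suppression(act, threshold, half_h, half_w):
    """Greedy non-maximum suppression on a 2-D classifier activation map.

    act[i, j]: activation of the window whose top-left corner is at (i, j).
    half_h, half_w: hypothetical neighborhood half-sizes, in window positions.
    Returns the accepted (i, j) positions, highest activation first.
    """
    act = act.astype(float).copy()
    detections = []
    while True:
        i, j = np.unravel_index(np.argmax(act), act.shape)
        if act[i, j] <= threshold:  # nothing left above the activation threshold
            break
        detections.append((int(i), int(j)))
        # Suppress the neighborhood around the accepted peak.
        act[max(i - half_h, 0): i + half_h + 1, max(j - half_w, 0): j + half_w + 1] = -np.inf
    return detections
```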

17
Detection: Classifier Activation Pyramid

Scale the input image a number of times to form a
multi-scale image pyramid
Apply the learned classifier to fixed-size
windows in each image in the pyramid
Form a three-dimensional classifier activation
pyramid instead of the earlier two-dimensional
classifier activation map.
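The activation pyramid can be sketched as follows. Several details are assumptions, not taken from the slides:

- nearest-neighbor rescaling, chosen to stay dependency-free;
- the list of scale factors;
- a one-pixel window step;
- `classify` as any callable that maps a 40 x 100 window (height x width) to an activation.

Maps at different scales have different sizes, so the "three-dimensional" pyramid is kept as a list of (scale, 2-D map) pairs.

```python
import numpy as np


def rescale_nearest(img, factor):
    """Nearest-neighbor rescaling of a 2-D grayscale image by `factor`."""
    h, w = img.shape
    nh, nw = max(int(round(h * factor)), 1), max(int(round(w * factor)), 1)
    rows = np.minimum((np.arange(nh) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(nw) / factor).astype(int), w - 1)
    return img[np.ix_(rows, cols)]


def activation_pyramid(img, classify, scales, win_h=40, win_w=100, step=1):
    """Apply a fixed-size window classifier at every position of every pyramid level.

    Returns a list of (scale, 2-D activation map); scales whose image is
    smaller than the window are skipped.
    """
    pyramid = []
    for s in scales:
        scaled = rescale_nearest(img, s)
        H, W = scaled.shape
        if H < win_h or W < win_w:
            continue
        ys = range(0, H - win_h + 1, step)
        xs = range(0, W - win_w + 1, step)
        amap = np.array([[classify(scaled[y:y + win_h, x:x + win_w]) for x in xs] for y in ys])
        pyramid.append((s, amap))
    return pyramid
```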

18
Evaluation Criteria

Test Set I consists of 170 images containing 200
cars of the same size and is used for the single-scale
case. For each car in the test images, the location
of the best 100 x 40 window containing the car is
determined.
Test Set II consists of 108 images containing 139
cars of different sizes and is used for the
multi-scale case. For each car in the test images,
the location and scale of the best 100 x 40 window
containing the car are determined.

19
Performance Measures

Goal is to maximize the number of correct
detections and minimize the number of false
detections.
One method for expressing the trade-off between
correct and false detections is to use the
receiver operating characteristics (ROC) curve.
This curve plots the true positive rate vs. the
false positive rate.
$$\text{True positive rate} = \frac{\text{number of true positives } (TP)}{\text{total number of positives in the data set } (nP)}$$
$$\text{False positive rate} = \frac{\text{number of false positives } (FP)}{\text{total number of negatives in the data set } (nN)}$$
This measures the accuracy of the system as
a classifier rather than a detector.

20
Performance Measures (contd.)

We are really interested in knowing how many of
the objects the system detects (given by recall), and
how often the detections it makes are false (given by
1 - precision). This trade-off is therefore better
captured by the recall vs. (1 - precision) curve,
where
$$\text{Recall} = \frac{TP}{nP}, \qquad 1 - \text{Precision} = 1 - \frac{TP}{TP + FP}$$
The threshold that achieves the best trade-off
between the two quantities is chosen as the point
of highest F-measure, where
$$\text{F-measure} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$$
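Recall, precision and F-measure can be computed from raw counts as below. As a check, the sketch uses the single-scale Neighborhood Suppression results at threshold 0.85 (recall 76.5%, precision 77.66%, F-measure 77.08% on 200 cars). TP = 0.765 x 200 = 153. FP = 44 is back-calculated from the reported precision; the slides do not state it.

```python
def detection_metrics(tp, fp, n_pos):
    """Return (recall, precision, F-measure) for a detector.

    tp: correct detections, fp: false detections, n_pos: objects in the test set.
    """
    recall = tp / n_pos
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    denom = recall + precision
    f_measure = 2 * recall * precision / denom if denom > 0 else 0.0
    return recall, precision, f_measure


if __name__ == "__main__":
    r, p, f = detection_metrics(tp=153, fp=44, n_pos=200)
    print(f"R = {100 * r:.2f}%, P = {100 * p:.2f}%, F = {100 * f:.2f}%")
    # R = 76.50%, P = 77.66%, F = 77.08%
```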

21
Experimental Results
Activation   Recall (R)   Precision (P)    F-measure
threshold    TP/200 (%)   TP/(TP+FP) (%)   2RP/(R+P) (%)
0.40         84.5         54.69            66.40
0.85         76.5         77.66            77.08
0.9995       4.0          100              7.69
Single-scale detection with the Neighborhood
Suppression algorithm

Activation   Recall (R)   Precision (P)    F-measure
threshold    TP/200 (%)   TP/(TP+FP) (%)   2RP/(R+P) (%)
0.20         91.5         24.73            38.94
0.85         72.5         81.46            76.72
0.995        4.0          100              7.69
Single-scale detection with the Repeated Part
Elimination algorithm
22
Experimental Results (contd.)
Activation   Recall (R)   Precision (P)    F-measure
threshold    TP/139 (%)   TP/(TP+FP) (%)   2RP/(R+P) (%)
0.65         50.36        24.56            33.02
0.95         38.85        49.09            43.37
0.9999       2.88         100              5.59
Multi-scale detection with the Neighborhood
Suppression algorithm

Activation   Recall (R)   Precision (P)    F-measure
threshold    TP/139 (%)   TP/(TP+FP) (%)   2RP/(R+P) (%)
0.20         80.58        8.43             15.27
0.95         39.57        49.55            44.0
0.9999       2.88         100              5.59
Multi-scale detection with the Repeated Part
Elimination algorithm
23
Some Graphical Results
24
Analysis A. Performance of Interest Operator
25
Analysis B. Performance of Part Matching Process
26
Analysis C. Performance of Learned Classifier
27
Conclusion

Automatic vocabulary construction from sample
images
Methodologies for object detection
Building a detector from a classifier
Standardizing evaluation criteria
Well suited to objects with distinguishable
parts

28
Questions?
Slides adapted from http://www.cs.uga.edu/ananda/ML_Talk.ppt
and http://l2r.cs.uiuc.edu/cogcomp/tutorial/SNoW.ppt
