Learning to Detect Objects in Images via a Sparse, Part-Based Representation



1
Learning to Detect Objects in Images via a
Sparse, Part-Based Representation
  • S. Agarwal, A. Awan and D. Roth
  • IEEE Transactions on Pattern Analysis and Machine
    Intelligence

Antón Escobedo cse252c
2
Outline
  • Introduction
  • Problem Specification
  • Related Work
  • Overview of the Approach
  • Evaluation
  • Experimental Results and Analysis
  • Conclusion and Future Scope

3
Introduction
  • Automatic detection of objects in images
  • Different objects belonging to the same category
    can vary
  • Successful object detection system
  • Proposed solution: a sparse, part-based
    representation
  • Part-based representation is computationally
    efficient and has its roots in biological vision

4
Problem Specification
  • Input: An image
  • Output: A list of locations at which instances of
    the object class are detected in the image
  • The experiments are performed on images of side
    views of cars but can be applied to any object
    that consists of distinguishable parts arranged
    in a relatively fixed spatial configuration
  • The present problem is a detection problem
    rather than a simple classification problem

5
Previous Related Work
  • Raw Pixel Intensities
  • Global Image
  • Local features
  • Part-based representations using hand-labeled
    features

6
Algorithm Overview
  • Four Stages
  • Vocabulary Construction: building a vocabulary of
    parts that will represent objects
  • Image Representation: input images are
    represented in terms of binary feature vectors
  • Learning a Classifier: two target classes,
    feature vector (object) and feature vector
    (nonobject)
  • Detection Hypothesis Using the Learned
    Classifier:
      • Classifier activation map for the single-scale
        case
      • Classifier activation pyramid for the multiscale
        case

7
Vocabulary Construction
  • Extraction of interest points using the Förstner
    interest operator
  • Experiments carried out on 50 representative
    images of size 100 x 40 pixels. A total of 400
    patches, each of size 13 x 13 pixels were
    extracted
  • To facilitate learning, a bottom-up clustering
    procedure was adopted where similarity was
    measured by normalized correlation
  • Similarity between two clusters C1 and C2 is
    finally measured by the average similarity
    between their respective patches
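
The clustering step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: patches are flat lists of pixel intensities, `normalized_correlation` is the usual zero-mean normalized correlation, and the merge threshold of 0.8 is a hypothetical value that the slides do not give.

```python
import math

def normalized_correlation(p, q):
    """Zero-mean normalized correlation between two equal-length patches."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    dp = [x - mp for x in p]
    dq = [x - mq for x in q]
    num = sum(a * b for a, b in zip(dp, dq))
    den = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dq))
    return num / den if den else 0.0

def cluster_similarity(c1, c2):
    """Average pairwise similarity between the patches of two clusters."""
    total = sum(normalized_correlation(p, q) for p in c1 for q in c2)
    return total / (len(c1) * len(c2))

def bottom_up_cluster(patches, threshold=0.8):
    """Merge the most similar pair of clusters until no pair exceeds threshold.

    The threshold is an assumed stopping criterion, not taken from the slides.
    """
    clusters = [[p] for p in patches]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_similarity(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair
        clusters[i].extend(clusters.pop(j))
    return clusters
```

Each resulting cluster plays the role of one vocabulary part. The pairwise search is quadratic per merge, which is acceptable for a few hundred patches.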

8
Vocabulary Construction
Förstner operator applied to sample image
Sample patches
Clusters from sample patches
9
Image Representation
  • For each patch q in an image, similarity-based
    indexing is performed into the part vocabulary P
  • For each highlighted patch q, the most similar
    vocabulary part P(q) is selected
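
The matching formulas are not preserved in this transcript, so the following is only one plausible reading. It assumes that the similarity between a patch and a vocabulary part is the average normalized correlation with the part's member patches, and that a patch is left unmatched when no part reaches a hypothetical `min_similarity` of 0.75. The names `ncc` and `best_part` are illustrative.

```python
import math

def ncc(p, q):
    """Zero-mean normalized correlation between two equal-length patches."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    dp = [x - mp for x in p]
    dq = [x - mq for x in q]
    num = sum(a * b for a, b in zip(dp, dq))
    den = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dq))
    return num / den if den else 0.0

def best_part(q, vocabulary, min_similarity=0.75):
    """Index of the most similar vocabulary part, or None if none is similar enough.

    vocabulary is a list of parts; each part is a list of member patches.
    min_similarity is an assumed cutoff, not a value from the slides.
    """
    best_idx, best_sim = None, min_similarity
    for idx, part in enumerate(vocabulary):
        sim = sum(ncc(p, q) for p in part) / len(part)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx
```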

10
Image Representation Feature Vector
  • Spatial relations among the parts detected in an
    image are defined in terms of distance (5 bins)
    and directions (8 ranges of 45 degrees each)
    giving 20 possible relations between 2 parts.
  • 2-6 parts per Positive Window
  • Each 100x40 training image is represented as a
    feature vector with 290 elements.
  • $P_n^{(i)}$: the i-th occurrence of a part of type n in the
    image ($1 \le n \le 270$; n is a particular part cluster)
  • $R_m^{(j)}(P_{n_1}, P_{n_2})$: the j-th occurrence of relation $R_m$
    between a part of type $n_1$ and a part of type $n_2$
    ($1 \le m \le 20$; m is a distance-direction combination)
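
A rough sketch of how a pair of part locations could be mapped to a relation index m follows. The slides do not say how 5 distance bins and 8 direction ranges yield 20 relations; this sketch assumes that opposite directions are merged because the relation between two parts is unordered, leaving 4 direction ranges and 5 x 4 = 20 combinations. The bin edges in `DIST_EDGES` are hypothetical.

```python
import math

# Assumed distance bin edges in pixels (4 edges give 5 bins); not from the slides.
DIST_EDGES = [10, 20, 30, 40]

def relation_index(a, b):
    """Map two part centers (x, y) to a relation id m in 1..20."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    dist = math.hypot(dx, dy)
    dist_bin = sum(dist >= e for e in DIST_EDGES)        # 0..4
    # Fold opposite directions together (assumption): angle taken modulo 180 degrees.
    angle = math.degrees(math.atan2(dy, dx)) % 180.0
    dir_bin = int(angle // 45) % 4                        # 0..3
    return dist_bin * 4 + dir_bin + 1
```

Under this assumption the index does not depend on the order in which the two parts are given.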

11
Learning a Classifier
  • Train classifier using 1000 labeled images, each
    100 x 40 pixels in size
  • No synthetic training images
  • +ve examples: various cars with varied
    backgrounds
  • -ve examples: natural scenes such as buildings
    and roads
  • High dimensionality of feature vector: 270 part
    types, 20 relations, repeats.
  • Use of the Sparse Network of Winnows (SNoW)
    learning architecture.
  • Winnow: to reduce in number until only the best
    are left

12
SNoW: Sparse network of linear units over a
Boolean or real-valued feature space
[Diagram: an input layer (feature layer) connected to
target nodes, each producing an activation; edges are
allocated dynamically. The input is a set of examples e,
each represented as a list of active features.]
13
SNoW: Predicted target t for example e
Activation Ω calculated by the summation for target
node t
θ − Ω
Learning-algorithm-specific sigmoid function
whose transition from an output close to 0 to an
output close to 1 centers around θ.
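
As a hedged illustration of the prediction step, the sketch below sums the weights of the active features for each target node and passes the result through a sigmoid centered at the threshold. The exact form `1 / (1 + exp(theta - omega))` is an assumption consistent with the fragment on the slide; the function names and the dictionary layout are illustrative only.

```python
import math

def activation(weights, active_features):
    """Sum the weights of the active features linked to one target node."""
    return sum(weights.get(f, 0.0) for f in active_features)

def sigmoid_output(omega, theta):
    """Sigmoid centered at theta: near 0 well below it, near 1 well above it."""
    return 1.0 / (1.0 + math.exp(theta - omega))

def predict(targets, active_features, theta):
    """Return the target whose sigmoid output is largest.

    targets maps a target name to its {feature_id: weight} dictionary.
    """
    return max(
        targets,
        key=lambda t: sigmoid_output(activation(targets[t], active_features), theta),
    )
```

Features with no edge to a target contribute nothing, which mirrors the sparse, dynamically allocated connections described for the network.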
14
SNoW Basic Learning Rules
  • Several weight update rules can be used; they
    are variations of Winnow and Perceptron
  • Winnow update rule: the number of examples
    required to learn a linear function grows
    linearly with the number of relevant features and
    only logarithmically with the total number of
    features.

15
A Training Example
Update rule: Winnow with α = 2, β = ½, θ = 3.5
Feature IDs shown: 1001, 1005, 1007
[Weight values from the slide animation: 2, 2, 2, 2; 2, 2; 2, 2, 2]
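
A minimal sketch of the standard mistake-driven Winnow update, using the promotion factor 2, demotion factor ½ and threshold 3.5 given above. The initial weight of 1.0 for a newly seen feature is an assumption, and `winnow_update` is an illustrative name rather than part of the SNoW software.

```python
def winnow_update(weights, active, label, alpha=2.0, beta=0.5, theta=3.5, init=1.0):
    """Apply one Winnow step in place; return the prediction made before updating.

    weights: dict mapping feature id -> weight for one target node
    active:  feature ids present in the example
    label:   True if the example belongs to the target class
    init:    assumed starting weight for features seen for the first time
    """
    for f in active:
        weights.setdefault(f, init)
    predicted = sum(weights[f] for f in active) > theta
    if predicted != label:
        # Promote on a missed positive, demote on a false positive.
        factor = alpha if label else beta
        for f in active:
            weights[f] *= factor
    return predicted
```

Starting from empty weights, a positive example with features 1001, 1005 and 1007 has activation 3, which is below 3.5, so each of the three weights is promoted to 2.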
16
Detection Hypothesis using Learned Classifier
  • Classifier Activation Map for single scale
  • Neighborhood Suppression: based on nonmaximum
    suppression.
  • Repeated Part Elimination: greedy algorithm; uses
    windows around the highest activation points.
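
The neighborhood suppression idea can be illustrated with a generic greedy non-maximum suppression over a 2D activation map. This is a sketch, not the authors' algorithm: the neighborhood half-sizes (defaulting to half of a 100 x 40 window) and the function name are assumptions.

```python
def neighborhood_suppression(act_map, threshold, half_w=50, half_h=20):
    """Keep the strongest activations above threshold, one per neighborhood.

    act_map is a 2D list indexed as act_map[y][x]. A candidate is suppressed
    when an already accepted, stronger detection lies within half_w columns
    and half_h rows of it. Returns a list of (activation, x, y) tuples.
    """
    candidates = [
        (act_map[y][x], x, y)
        for y in range(len(act_map))
        for x in range(len(act_map[0]))
        if act_map[y][x] > threshold
    ]
    candidates.sort(reverse=True)  # strongest first
    detections = []
    for score, x, y in candidates:
        if all(abs(x - dx) > half_w or abs(y - dy) > half_h
               for _, dx, dy in detections):
            detections.append((score, x, y))
    return detections
```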

17
Detection: Classifier Activation Pyramid
  • Scale the input image a number of times to form a
    multi-scale image pyramid
  • Apply the learned classifier to fixed-size
    windows in each image in the pyramid
  • Form a three-dimensional classifier activation
    pyramid instead of the earlier two-dimensional
    classifier activation map.
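
The three steps above might be sketched as follows. Everything here is illustrative: `classify` stands in for the learned classifier, the nearest-neighbor `downscale` is a simplification, and the scale factors and window step are hypothetical parameters that the slides do not specify.

```python
def downscale(image, factor):
    """Nearest-neighbor resize of a 2D list of pixels by a factor in (0, 1]."""
    h, w = int(len(image) * factor), int(len(image[0]) * factor)
    return [[image[int(y / factor)][int(x / factor)] for x in range(w)]
            for y in range(h)]

def activation_pyramid(image, classify, scales, win_w=100, win_h=40, step=5):
    """Return {scale: {(x, y): activation}} over all window positions.

    classify is any callable that maps a win_h x win_w window to a score.
    """
    pyramid = {}
    for s in scales:
        scaled = downscale(image, s)
        level = {}
        for y in range(0, len(scaled) - win_h + 1, step):
            for x in range(0, len(scaled[0]) - win_w + 1, step):
                window = [row[x:x + win_w] for row in scaled[y:y + win_h]]
                level[(x, y)] = classify(window)
        pyramid[s] = level
    return pyramid
```

The scale key adds the third dimension: peaks are then sought over (x, y, scale) rather than over (x, y) alone.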

18
Evaluation Criteria
  • Test Set I consists of 170 images containing 200
    cars of the same size and is used for the
    single-scale case. For each car in the test
    images, the location of the best 100 x 40 window
    containing the car is determined.
  • Test Set II consists of 108 images containing 139
    cars of different sizes and is used for the
    multiscale case. For each car in the test
    images, the location and scale of the best 100 x
    40 window containing the car is determined.

19
Performance Measures
  • Goal is to maximize the number of correct
    detections and minimize the number of false
    detections.
  • One method for expressing the trade-off between
    correct and false detections is to use the
    receiver operating characteristics (ROC) curve.
    This curve plots the true positive rate vs. the
    false positive rate.
  • $$\text{True positive rate} = \frac{\#\text{ of true positives (TP)}}{\text{total } \#\text{ of positives in the data set } (n_P)}$$
  • $$\text{False positive rate} = \frac{\#\text{ of false positives (FP)}}{\text{total } \#\text{ of negatives in the data set } (n_N)}$$
  • This measures the accuracy of the system as
    a classifier rather than a detector.

20
Performance Measures (contd.)
  • We are really interested in knowing how many of
    the objects it detects (given by recall), and how
    often the detections it makes are false (given by
    1-precision). This trade-off is thus captured
    very accurately by (recall) vs. (1-precision)
    curve where
  • $$\text{Recall} = \frac{TP}{n_P}, \qquad 1 - \text{Precision} = \frac{FP}{TP + FP}$$
  • The threshold parameter that achieves the
    best trade-off between the two quantities is
    measured by the point of highest F-measure, where
  • $$\text{F-measure} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$$
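
These three quantities are straightforward to compute from detection counts. The helper below is a small sketch; the example counts (TP = 153, FP = 44 with 200 cars) are illustrative values chosen to be consistent with the percentages reported for Test Set I at threshold 0.85 with neighborhood suppression, and are not stated in the slides.

```python
def detection_metrics(tp, fp, n_pos):
    """Return (recall, precision, f_measure) from raw detection counts.

    tp:    number of correct detections
    fp:    number of false detections
    n_pos: total number of objects in the data set
    """
    recall = tp / n_pos
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = recall + precision
    f_measure = 2 * recall * precision / denom if denom else 0.0
    return recall, precision, f_measure

# Illustrative counts (assumed): 153 correct and 44 false detections, 200 cars.
r, p, f = detection_metrics(153, 44, 200)
print(f"recall={r:.3f} precision={p:.4f} F={f:.4f}")
```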

21
Experimental Results
Activation threshold | Recall R = TP/200 (%) | Precision P = TP/(TP+FP) (%) | F-measure = 2RP/(R+P) (%)
0.40                 | 84.5                  | 54.69                        | 66.40
0.85                 | 76.5                  | 77.66                        | 77.08
0.9995               | 4.0                   | 100                          | 7.69
Single-scale detection with Neighborhood
Suppression Algorithm

Activation threshold | Recall R = TP/200 (%) | Precision P = TP/(TP+FP) (%) | F-measure = 2RP/(R+P) (%)
0.20                 | 91.5                  | 24.73                        | 38.94
0.85                 | 72.5                  | 81.46                        | 76.72
0.995                | 4.0                   | 100                          | 7.69
Single-scale detection with Repeated Part
Elimination Algorithm
22
Experimental Results (contd.)
Activation threshold | Recall R = TP/139 (%) | Precision P = TP/(TP+FP) (%) | F-measure = 2RP/(R+P) (%)
0.65                 | 50.36                 | 24.56                        | 33.02
0.95                 | 38.85                 | 49.09                        | 43.37
0.9999               | 2.88                  | 100                          | 5.59
Multi-scale detection with Neighborhood
Suppression Algorithm

Activation threshold | Recall R = TP/139 (%) | Precision P = TP/(TP+FP) (%) | F-measure = 2RP/(R+P) (%)
0.20                 | 80.58                 | 8.43                         | 15.27
0.95                 | 39.57                 | 49.55                        | 44.0
0.9999               | 2.88                  | 100                          | 5.59
Multi-scale detection with Repeated Part
Elimination Algorithm
23
Some Graphical Results
24
Analysis A. Performance of Interest Operator
25
Analysis B. Performance of Part Matching Process
26
Analysis C. Performance of Learned Classifier
27
Conclusion
  • Automatic vocabulary construction from sample
    images
  • Methodologies for object detection
  • Detector from Classifier
  • Standardizing evaluation criterion
  • Good for classification of objects with
    distinguishable parts

28
Questions?
Slides adapted from http://www.cs.uga.edu/ananda/ML_Talk.ppt
and http://l2r.cs.uiuc.edu/cogcomp/tutorial/SNoW.ppt