Title: Semi-Supervised Training for Appearance-Based Statistical Object Detection Methods
1. Semi-Supervised Training for Appearance-Based Statistical Object Detection Methods
- Charles Rosenberg
- Thesis Oral
- May 10, 2004
- Thesis Committee
- Martial Hebert, co-chair
- Sebastian Thrun, co-chair
- Henry Schneiderman
- Avrim Blum
- Tom Minka, Microsoft Research
2. Motivation: Object Detection
Example eye detections from the Schneiderman
detector.
- Modern object detection systems work.
- Lots of manually labeled training data required.
- How can we reduce the cost of training data?
3. Approach: Semi-Supervised Training
- Supervised training: costly fully labeled data
- Semi-supervised training: fully and weakly labeled data
- Goal: develop a semi-supervised approach for the object detection problem and characterize the issues.
4. What is Semi-Supervised Training?
- Supervised Training
  - Standard training approach
  - Training with fully labeled data
- Semi-Supervised Training
  - Training with a combination of fully labeled data and unlabeled or weakly labeled data
- Weakly Labeled Data
  - Certain label values unknown
  - E.g., the object is present, but its location and scale are unknown
  - Labeling is relatively cheap
- Unlabeled Data
  - No label information known
5. Issues for Object Detection
- What semi-supervised approaches are applicable?
  - Ability to handle the unique aspects of the object detection problem
  - Compatibility with existing detector implementations
- What are the practical concerns?
  - Object detector interactions
  - Training data issues
  - Detector parameter settings
- What kind of performance gain is possible?
- How much labeled training data is needed?
6. Contributions
- Devised an approach which achieves substantial performance gains through semi-supervised training.
- Comprehensive evaluation of semi-supervised training applied to object detection.
- Detailed characterization and comparison of the semi-supervised approaches used.
7. Presentation Outline
- Introduction
- Background
- Semi-supervised Training Approach
- Analysis: Filter-Based Detector
- Analysis: Schneiderman Detector
- Conclusions and Future Work
8. What is Unique About Object Detection?
- Complex feature set
  - high dimensional, continuous, with a complex distribution
- Large inherent variation
  - lighting, viewpoint, scale, location, etc.
- Many examples per training image
  - many negative examples and a very small number of positive examples
  - negative examples are free
- Large class overlap
  - the object class is a subset of the clutter class
9. Background
- Graph-Based Approaches
  - A graph is constructed to represent the labeled and unlabeled data relationships; the construction method is important.
  - Edges in the graph are weighted according to a distance measure.
  - Blum, Chawla, ICML 2001. Szummer, Jaakkola, NIPS 2001. Zhu, Ghahramani, Lafferty, ICML 2003.
- Information Regularization
  - Explicit about the information transferred from P(X) to P(Y|X)
  - Szummer, Jaakkola, NIPS 2002. Corduneanu, Jaakkola, UAI 2003.
- Multiple Instance Learning
  - Addresses multiple examples per data element
  - Dietterich, Lathrop, Lozano-Perez, AI 1997. Maron, Lozano-Perez, NIPS 1998. Zhang, Goldman, NIPS 2001.
- Transduction, other methods
10. Presentation Outline
- Introduction
- Background
- Semi-supervised Training Approach
- Analysis: Filter-Based Detector
- Analysis: Schneiderman Detector
- Conclusions and Future Work
11. Semi-Supervised Training Approaches
- Expectation-Maximization (EM)
  - Batch algorithm: all data processed each iteration
  - Soft class assignments: a likelihood distribution over class labels, recomputed each iteration
- Self-Training
  - Incremental algorithm: data added to the active pool at each iteration
  - Hard class assignments: the most likely class is assigned; labels do not change once assigned
12. Semi-Supervised Training with EM
- Dempster, Laird, Rubin, 1977. Nigam, McCallum, Thrun, Mitchell, 1999.
- Train the initial detector model with the initial labeled data set.
- Repeat for a fixed number of iterations, or until convergence:
  - Expectation step: run the detector on the weakly labeled set and compute the most likely detection in each image; compute the expected statistics of the fully labeled examples and of the weakly labeled examples weighted by class likelihoods.
  - Maximization step: update the parameters of the detection model.
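A minimal sketch of this loop, assuming a toy two-class generative model (per-class diagonal Gaussians over precomputed feature vectors) in place of the actual detector; all names here are illustrative, not the thesis implementation:

```python
# Toy semi-supervised EM: per-class diagonal Gaussians, with soft labels
# on the weakly labeled data recomputed every iteration. Illustrative only.
import numpy as np

def fit_gaussians(X, W):
    """M-step: weighted per-class mean/variance. W: (n, 2) soft weights."""
    params = []
    for c in range(W.shape[1]):
        w = W[:, c:c + 1]
        mu = (w * X).sum(0) / w.sum()
        var = (w * (X - mu) ** 2).sum(0) / w.sum() + 1e-6
        params.append((mu, var, w.sum() / len(X)))
    return params

def log_lik(X, params):
    """(n, 2) log P(x, c) under each class's Gaussian and prior."""
    cols = []
    for mu, var, prior in params:
        ll = -0.5 * ((X - mu) ** 2 / var + np.log(2 * np.pi * var)).sum(1)
        cols.append(ll + np.log(prior))
    return np.stack(cols, axis=1)

def em_semisup(X_lab, y_lab, X_weak, n_iter=20):
    Y = np.eye(2)[y_lab]                    # hard labels stay fixed
    W = np.full((len(X_weak), 2), 0.5)      # initial soft labels
    X = np.vstack([X_lab, X_weak])
    for _ in range(n_iter):
        params = fit_gaussians(X, np.vstack([Y, W]))  # M-step
        ll = log_lik(X_weak, params)                  # E-step
        W = np.exp(ll - ll.max(1, keepdims=True))
        W /= W.sum(1, keepdims=True)
    return params
```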
13. Semi-Supervised Training with Self-Training
- Train the detector model with the labeled data set.
- Repeat until the weakly labeled data is exhausted, or until some other stopping criterion is met:
  - Run the detector on the weakly labeled set and compute the most likely detection in each image.
  - Score each detection with the selection metric.
  - Select the m best scoring examples and add them to the labeled training set.
- Nigam, Ghani, 2000. Moreno, Agarwal, ICML 2003.
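The corresponding control flow, as a sketch; `Detector`, its `fit`/`detect` methods, and `selection_score` are hypothetical stand-ins, not the thesis API:

```python
# Self-training: retrain, score the weakly labeled pool, commit the m best.
# Once an example is added, its label is never revisited.
def self_train(detector, labeled, weakly_labeled, selection_score, m=5):
    pool = list(weakly_labeled)
    while pool:
        detector.fit(labeled)
        # most likely detection (window + label) in each weakly labeled image
        candidates = [(detector.detect(img), img) for img in pool]
        candidates.sort(key=lambda c: selection_score(c[0], labeled),
                        reverse=True)            # higher score = better
        for det, img in candidates[:m]:          # m best scoring examples
            labeled.append(det)
            pool.remove(img)
    return detector
```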
14. Self-Training Selection Metrics
- Detector Confidence
  - Score: the detection confidence
  - Intuitively appealing
  - Can prove problematic in practice
- Nearest Neighbor (NN) Distance
  - Score: the minimum distance between the detection and the labeled examples
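The two scores might look as follows, assuming each detection carries a detector log-odds value and a feature vector (both hypothetical attribute names); the NN score is negated so that higher is better for both metrics, matching the loop above:

```python
import numpy as np

def confidence_score(detection, labeled):
    # the detector's own confidence (log-odds ratio)
    return detection.log_odds

def nn_score(detection, labeled):
    # negative minimum distance to any labeled example
    dists = [np.linalg.norm(detection.features - ex.features)
             for ex in labeled]
    return -min(dists)
```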
15. Selection Metric Behavior
[Figure: animated comparison of the confidence metric and the nearest-neighbor (NN) metric on a toy data set; legend: class 1, class 2, unlabeled. Slides 15-20 step through the animation.]
21. Semi-Supervised Training in Computer Vision
- EM Approaches
  - S. Baluja. Probabilistic Modeling for Face Orientation Discrimination: Learning from Labeled and Unlabeled Data. NIPS 1998.
  - R. Fergus, P. Perona, A. Zisserman. Object Class Recognition by Unsupervised Scale-Invariant Learning. CVPR 2003.
- Self-Training
  - A. Selinger. Minimally Supervised Acquisition of 3D Recognition Models from Cluttered Images. CVPR 2001.
- Summary
  - Reasonable performance improvements reported
  - Only one set of experiments each
  - No insight into the issues or into general application
22. Presentation Outline
- Introduction
- Background
- Semi-supervised Training Approach
- Analysis: Filter-Based Detector
- Analysis: Schneiderman Detector
- Conclusions and Future Work
23. Filter-Based Detector
[Figure: filter-based detector architecture. The input image is passed through a filter bank to produce a feature vector f_i at each pixel location x_i; per-pixel likelihoods are computed under the object GMM M_o and the clutter GMM M_c.]
24. Filter-Based Detector Overview
- Input Features and Model
  - Features: the output of 20 filters at each pixel location
  - Generative model: a separate Gaussian Mixture Model for the object and clutter classes
  - A single model is used for all locations on the object
- Detection
  - Compute filter responses and the likelihood under the object and clutter models at each pixel location (see the sketch below)
  - A spatial model is used to aggregate pixel responses into object-level responses
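As an illustration of the per-pixel score, here is a sketch that substitutes scikit-learn's `GaussianMixture` for the thesis's own GMM implementation and assumes the 20-filter responses have already been arranged one row per pixel:

```python
from sklearn.mixture import GaussianMixture

def fit_class_models(object_pixels, clutter_pixels, k=8):
    """Fit one GMM per class from labeled (n, 20) pixel feature arrays."""
    return (GaussianMixture(n_components=k).fit(object_pixels),
            GaussianMixture(n_components=k).fit(clutter_pixels))

def pixel_log_likelihood_ratio(features, gmm_object, gmm_clutter):
    """features: (H*W, 20) filter responses, one row per pixel.
    Returns per-pixel log P(f | object) - log P(f | clutter)."""
    return (gmm_object.score_samples(features)
            - gmm_clutter.score_samples(features))
```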
25. Spatial Model
[Figure: the spatial model is trained from the object masks of the training images and is applied to the per-pixel log-likelihood ratios to produce an example detection.]
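One simple way such aggregation could work, sketched here as an assumption (an average object mask correlated with the log-likelihood-ratio map) rather than the thesis's exact spatial model:

```python
import numpy as np
from scipy.signal import correlate2d

def aggregate(llr_map, spatial_mask):
    """Correlate the per-pixel log-likelihood-ratio map with the mask and
    return the best object location and its object-level score."""
    scores = correlate2d(llr_map, spatial_mask, mode='same')
    loc = np.unravel_index(np.argmax(scores), scores.shape)
    return loc, scores[loc]
```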
26. Typical Example Filter Model Detections
[Figure: sample detection plots and the corresponding log-likelihood ratio plots.]
27. Filter-Based Detector Overview
- Fully Supervised Training
  - fully labeled example: image + pixel mask
  - Gaussian Mixture Model parameters trained
  - Spatial model trained from the pixel masks
- Semi-Supervised Training
  - weakly labeled example: image containing the object
  - The initial model is trained using the fully labeled object and clutter data
  - The spatial model and the clutter class model are fixed once trained with the initial labeled data set
  - EM and self-training variants are evaluated
28. Self-Training Selection Metrics
- Confidence-based selection metric
  - selection score is the detector odds ratio
- Nearest neighbor (NN) selection metric
  - selection score is the distance to the closest labeled example
  - the distance is based on a model of each weakly labeled example
29. Filter-Based Experiment Details
- Training Data
  - 12 images: desktop telephone + clutter, viewpoints +/- 90 degrees
  - roughly constant scale and lighting conditions
  - 96 images: clutter only
- Experimental variations
  - 12 repetitions with different fully/weakly labeled training data splits
- Testing data
  - 12 images, disjoint set, similar imaging conditions
[Figure: examples of a correct detection and an incorrect detection.]
30. Example Filter Model Results
[Figure: example detections for four training conditions: labeled data only, expectation-maximization, self-training with the confidence metric, and self-training with the NN metric.]
31. Single-Image Semi-Supervised Results
- Expectation-Maximization: 19.2%
- Confidence Metric: 34.2%
- 1-NN Selection Metric: 47.5%
32. Two-Image Semi-Supervised Results
[Figure: close, near, and far training image pairs.]
- Labeled Data Only, Near Pair: 52.5%
- 4-NN Metric, Near Pair: 85.8%
33. Presentation Outline
- Introduction
- Background
- Semi-supervised Training Approach
- Analysis: Filter-Based Detector
- Analysis: Schneiderman Detector
- Conclusions and Future Work
34. Example Schneiderman Face Detections
35. Schneiderman Detector Details
- Schneiderman 98, 00, 03, 04.
[Figure: the detection process applies a wavelet transform, feature construction, and a classifier, with a search over location and scale; the training process applies a wavelet transform, feature search, feature selection, and AdaBoost.]
36. Schneiderman Detector Training Data
- Fully Supervised Training
  - fully labeled examples with landmark locations
- Semi-Supervised Training
  - weakly labeled example: image containing the object
  - the initial model is trained using the fully labeled data
  - variants of self-training are evaluated
37. Self-Training Selection Metrics
- Confidence-based selection metric
  - classifier output / odds ratio
- Nearest Neighbor selection metric
  - preprocessing: high-pass filter + variance normalization
  - Mahalanobis distance to the closest labeled example
[Figure: labeled images compared against a candidate image.]
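A sketch of this metric under the stated preprocessing; the Gaussian-based high-pass filter and the pseudo-inverse covariance are stand-in assumptions, not the thesis's exact choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess(window):
    """High-pass filter, then normalize the variance of the window."""
    hp = window - gaussian_filter(window, sigma=2)
    v = hp.ravel()
    return (v - v.mean()) / (v.std() + 1e-8)

def nn_mahalanobis(candidate, labeled_windows):
    """Mahalanobis distance from the candidate window to the closest
    labeled example, with covariance estimated from the labeled set."""
    X = np.stack([preprocess(w) for w in labeled_windows])
    VI = np.linalg.pinv(np.cov(X, rowvar=False))  # handles rank deficiency
    c = preprocess(candidate)
    return min(np.sqrt((c - x) @ VI @ (c - x)) for x in X)
```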
38. Schneiderman Experiment Details
- Training Data
  - 231 images from the FERET data set and the web
  - Multiple eyes per image: 480 training examples
  - 80 synthetic variations: position, scale, orientation
  - Native object resolution: 24x16 pixels
  - 15,000 non-object examples from clutter images
39. Schneiderman Experiment Details
- Evaluation Metric
  - Detections within +/- 0.5 object radius and +/- 1 scale octave are correct
  - Area under the ROC curve (AUC) performance measure
  - ROC curve: Receiver Operating Characteristic curve
  - Detection rate vs. false positive count
[Figure: ROC curves, detection rate in percent vs. number of false positives.]
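For concreteness, a sketch of the AUC computation under the assumptions stated above: detections from all test images are pooled, the ROC curve plots detection rate against false-positive count, and the area is normalized so that a perfect detector scores 1.0 (the normalization choice here is illustrative):

```python
import numpy as np

def roc_auc(detections, n_objects, max_fp):
    """detections: pooled (score, is_correct) pairs from all test images;
    n_objects: total labeled objects; max_fp: right edge of the ROC."""
    dets = sorted(detections, key=lambda d: -d[0])  # highest score first
    tp, fp, xs, ys = 0, 0, [0], [0.0]
    for score, correct in dets:
        tp, fp = tp + correct, fp + (not correct)
        if fp > max_fp:
            break
        xs.append(fp)
        ys.append(tp / n_objects)                   # detection rate
    xs.append(max_fp)                               # extend to right edge
    ys.append(ys[-1])
    return np.trapz(ys, xs) / max_fp
```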
40. Schneiderman Experiment Details
- Experimental Variations
  - 5-10 runs with random data splits per experiment
- Experimental Complexity
  - Training the detector, one iteration: 12 CPU hours on a 2 GHz class machine
  - One run = 10 iterations = 120 CPU hours = 5 CPU days
  - One experiment = 10 runs = 50 CPU days
  - All experiments took approximately 3 CPU years
- Testing Data
  - A separate set of 44 images with 102 examples
41. Example Detection Results
[Figure (slides 41-42): detections using fully labeled data only vs. fully labeled + weakly labeled data.]
43. When Can Weakly Labeled Data Help?
[Figure: full-data-normalized AUC vs. fully labeled training set size on a log scale, with three labeled regimes: saturated, smooth, failure.]
- Three regimes of operation: saturated, smooth, failure
- Weakly labeled data can help in the smooth regime
44. Performance of Confidence Metric Self-Training
[Figure: full-data-normalized AUC for fully labeled training set sizes 24, 30, 34, 40, 48, 60.]
- Improved performance over a range of data set sizes.
- Not all improvements are significant at the 95% level.
45. Performance of NN Metric Self-Training
[Figure: full-data-normalized AUC for fully labeled training set sizes 24, 30, 34, 40, 48, 60.]
- Improved performance over a range of data set sizes.
- All improvements are significant at the 95% level.
46. MSE Metric Changes to Self-Training Behavior
[Figure: base-data-normalized AUC vs. iteration number, for the confidence metric and for the NN metric.]
- The NN metric performance trend is level or upwards
47. Example Training Image Progression
[Figure: the training images selected at each iteration, with the resulting AUC under each selection metric.]
- Initial: Confidence Metric 0.822, NN Metric 0.822
- Iteration 1: Confidence Metric 0.770, NN Metric 0.867
- Iteration 2: Confidence Metric 0.882, NN Metric 0.798
48. Example Training Image Progression
- Iteration 3: Confidence Metric 0.798, NN Metric 0.922
- Iteration 4: Confidence Metric 0.745, NN Metric 0.931
- Iteration 5: Confidence Metric 0.759, NN Metric 0.906
49. How Much Weakly Labeled Data is Used?
[Figure: weakly labeled data set size, and the ratio of weakly to fully labeled data, vs. fully labeled training set sizes 24, 30, 34, 40, 48, 60.]
- It is relatively constant over the initial data set size.
50. Presentation Outline
- Introduction
- Background
- Semi-supervised Training Approach
- Analysis: Filter-Based Detector
- Analysis: Schneiderman Detector
- Conclusions and Future Work
51. Contributions
- Devised an approach which achieves substantial performance gains through semi-supervised training.
- Comprehensive evaluation (3 CPU years) of semi-supervised training applied to object detection.
- Detailed characterization and comparison of the semi-supervised approaches used; much more analysis and many more details in the thesis.
52. Future Work
- Enabling the use of training images with clutter for context
- Context priming
  - A. Torralba, P. Sinha. ICCV 2001. A. Torralba, K. Murphy, W. Freeman, M. Rubin. ICCV 2003.
- Training with weakly labeled data only
- Online robot learning
- Mining the web for object detection
  - K. Barnard, D. Forsyth. ICCV 2001.
  - K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, M. Jordan. JMLR 2003.
53. Conclusions
- Semi-supervised training can be practically applied to object detection to good effect.
- The self-training approach can substantially outperform EM.
- The selection metric is crucial for self-training performance.
56. Filter Model Results
- Key Points
  - Batch EM does not provide a performance increase
  - Self-training provides a performance increase
  - The 1-NN and 4-NN metrics work better than the confidence metric
  - Near Pair accuracy is the highest
57. Weakly Labeled Point Performance
- Does confidence metric self-training improve point performance?
- Yes, over a range of data set sizes.
58. Weakly Labeled Point Performance
- Does MSE metric self-training improve point performance?
- Yes, to a significant level, over a range of data set sizes.
59. Schneiderman Features
60. Schneiderman Detection Process
61. Sample Schneiderman Face Detections
63. Simulation Data
[Figure: simulation data; panels: labeled and unlabeled data, hidden labels.]
64. Simulation Data
[Figure: simulation results; panels: nearest neighbor metric, confidence metric.]
65. Simulation Data
[Figure: simulation results; panels: model based, confidence metric.]
67. Future Work: Mining the Web
[Figure: Clinton colors vs. Not-Clinton colors; green regions are Not-Clinton.]
68. Future Work: Mining the Web
[Figure: Flag colors vs. Not-Flag colors; green regions are Not-Flag.]