1
Shared features and Joint Boosting
Sharing Visual Features for Multiclass and
Multiview Object Detection. A. Torralba, K. P.
Murphy and W. T. Freeman, PAMI, vol. 29, no. 5,
pp. 854-869, May 2007.
  • Yuandong Tian

2
Outline
  • Motivation to choose this paper
  • Motivation of this paper
  • Basic ideas in boosting
  • Joint Boost
  • Features used in this paper
  • My results in face recognition

3
Motivation to choose this paper
  • Axiom
  • Computer vision is hard.
  • Assumption (smart-stationary)
  • Equally smart people are equally distributed
    over time.
  • Conjecture
  • If computer vision cannot be solved in 30 years,
    it won't ever be solved!

4
  • Wrong!

Because we are standing on the shoulders of
giants.
5
Where are the Giants?
  • More computing resources?
  • Lots of data?
  • Advancement of new algorithms?
  • Machine Learning?

What I believe
6
Cruel Reality
  • Why does ML seem not to help much in CV (at least
    for now)?

My answer: CV and ML are weakly coupled
7
A typical question in CV
Q: Why do we use feature A instead of feature B?
A1: Feature A gives better performance.
A2: Feature A has some fancy properties.
A3: The following step requires the feature to
have a certain property that only A has.
A3 is the strongly-coupled answer.
8
Typical CV pipeline
Preprocessing steps (computer vision) → feature/similarity → ML black box
The preprocessing steps have domain-specific structures; the ML black box is designed for generic structures.
9
Contribution of this paper
  • Tune the ML algorithm in a CV context
  • A good attempt to break the black box and
    integrate them together

10
Outline
  • Motivation to choose this paper
  • Motivation of this paper
  • Basic ideas in boosting
  • Joint Boost
  • Features used in this paper
  • My results in face recognition

11
This paper
  • Object recognition problem
  • Many object categories
  • Few images per category
  • Solution: feature sharing
  • Find common features that distinguish a subset of
    classes from the rest.

12
Feature sharing
[Figure] Concept of feature sharing
13
Typical behavior of feature sharing
Template-like features: 100% accuracy for a single
object, but too specific.
Wavelet-like features: weaker discriminative
power, but shared across many classes.
14
Result of feature sharing
15
Why feature sharing?
  • ML: regularization (avoid over-fitting)
  • Essentially more positive samples
  • Reuse the data
  • CV: utilize the intrinsic structure of object
    categories
  • Use a domain-specific prior to bias the machine
    learning algorithm

16
Outline
  • Motivation to choose this paper
  • Motivation of this paper
  • Basic ideas in boosting
  • Joint Boost
  • Features used in this paper
  • My results in face recognition

17
Basic idea in Boosting
  • Concept: binary classification
  • Samples with labels (+1 or -1)
  • Goal: find a function (classifier) H that maps
    positive samples to positive values
  • Optimization: minimize the exponential loss
    w.r.t. the classifier H (written out below)
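For reference, a minimal statement of this objective in standard boosting notation (assuming the usual exponential-loss form, with samples v_i, labels z_i, and classifier H):

    J(H) = \sum_i \exp( -z_i H(v_i) ),   z_i \in \{-1, +1\}

The sign of H(v_i) is the predicted label, so a confidently correct sample (large positive z_i H(v_i)) contributes almost nothing to the loss.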

18
Basic idea in boosting (2)
  • Boosting: assume H is additive, a sum of weak
    learners h_m
  • Each h_m is a weak learner (classifier)
  • Almost random, but uniformly better than random
  • Example: a single-feature classifier that makes
    its decision based on only a single dimension
    (sketched below)
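A minimal sketch of such a single-feature weak learner, written as a regression stump fit by weighted least squares (a Gentle-Boost-style form; the function and variable names are illustrative, not from the paper):

import numpy as np

def fit_stump(X, z, w, f):
    """Fit h(x) = a*[x_f > theta] + b on feature f, minimizing
    the weighted squared error sum_i w_i * (z_i - h(x_i))**2."""
    best = None
    for theta in np.unique(X[:, f]):
        above = X[:, f] > theta
        wa, wb = w[above].sum(), w[~above].sum()
        # Weighted means of z in the two regions are the least-squares fits.
        b = (w[~above] * z[~above]).sum() / (wb + 1e-12)
        a = (w[above] * z[above]).sum() / (wa + 1e-12) - b
        err = (w * (z - (a * above + b)) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, theta, a, b)
    return best  # (weighted error, threshold, a, b)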

19
What a weak learner looks like
Key point:
the addition of weak classifiers gives a strong
classifier!
20
Basic idea in boosting (3)
  • How to minimize?
  • Greedy approach (sketched after this list)
  • Fix H, add one h in each iteration
  • Weighting samples
  • After each iteration, wrongly classified samples
    (difficult samples) get higher weights
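A hedged sketch of this greedy loop (Gentle-Boost-style updates; fit_stump is the hypothetical single-feature weak learner from the earlier sketch):

import numpy as np

def boost(X, z, n_rounds):
    """Fix H, add one weak learner per round; reweight samples so that
    wrongly classified (difficult) samples get higher weights."""
    n, d = X.shape
    H = np.zeros(n)            # current additive classifier scores H(x_i)
    learners = []
    for _ in range(n_rounds):
        w = np.exp(-z * H)     # misclassified samples have large -z*H, hence large weight
        # Greedily pick the feature whose best stump has the lowest weighted error.
        best = None
        for f in range(d):
            err, theta, a, b = fit_stump(X, z, w, f)
            if best is None or err < best[0]:
                best = (err, f, theta, a, b)
        _, f, theta, a, b = best
        H += a * (X[:, f] > theta) + b   # H is additive: H <- H + h
        learners.append((f, theta, a, b))
    return learners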

21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
Technical parts
  • Greedy → second-order Taylor expansion in each
    iteration
  • The weak learner to be optimized in this iteration
    is fit to the labels under the current sample
    weights; the resulting problem is solved by
    (weighted) least squares.
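A sketch of the step being described (the standard Gentle Boost derivation, assuming the exponential loss written out earlier):

    J(H + h) = \sum_i e^{-z_i (H(v_i) + h(v_i))}
             \approx \sum_i w_i ( 1 - z_i h(v_i) + h(v_i)^2 / 2 ),   w_i = e^{-z_i H(v_i)},

using z_i^2 = 1. Up to constants, this has the same minimizer as the weighted squared error \sum_i w_i ( z_i - h(v_i) )^2, so the weak learner for this iteration is found by weighted least squares on the labels z_i with weights w_i.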
28
(No Transcript)
29
Outline
  • Motivation to choose this paper
  • Motivation of this paper
  • Basic ideas in boosting
  • Joint Boost
  • Features used in this paper
  • My results in face recognition

30
Joint Boost: multiclass
  • We can minimize a similar function using a
    one-vs-all strategy
  • This doesn't work very well, since the objective
    is separable in the class index c
  • Put constraints on it → shared features!
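For reference, the multiclass (one-vs-all) cost being discussed; the exact symbols here are an assumption about the paper's notation:

    J = \sum_{c=1}^{C} \sum_i \exp( -z_i^c H(v_i, c) ),   z_i^c = +1 if sample i belongs to class c, else -1.

Without constraints, this decomposes into C independent binary problems (it is separable in c); forcing the weak learners to share features across classes is what couples them.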

31
Joint Boost (2)
  • In each iteration, choose
  • One common feature
  • A subset of classes that use this feature
  • So that the objective decreases the most

32
Sharing Diagram
[Diagram] Columns are boosting iterations I-V and
rows are classes; each cell shows which of the
features 1-7 a class uses at that iteration, so a
feature shared by several classes appears in several
rows of the same column.
33
Key insight
  • Each class may have its own favorite feature
  • A common feature may not be any of them; however,
    it simultaneously decreases the errors of many
    classes.

34
Joint Boost Illustration
35
(No Transcript)
36
(No Transcript)
37
Computational issue
  • Choosing the best subset is prohibitive
  • Use a greedy approach (sketched below)
  • Choose one class and one feature so that the
    objective decreases the most
  • Iteratively add more classes until the objective
    increases again
  • Note the common feature may change
  • From O(2^C) to O(C^2)
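A hedged sketch of that greedy search (pseudocode-level Python; cost_with_shared_feature is a hypothetical helper that fits the best shared weak learner for a given class subset and returns (objective, feature)):

def greedy_subset(classes, cost_with_shared_feature):
    """Grow the sharing subset greedily: O(C^2) subset evaluations
    instead of the O(2^C) needed to try every subset."""
    subset, remaining = [], list(classes)
    best = (float("inf"), None, [])       # (cost, feature, subset)
    while remaining:
        trials = [(cost_with_shared_feature(subset + [c]), c) for c in remaining]
        (cost, feature), c = min(trials, key=lambda t: t[0][0])
        if cost >= best[0]:               # objective would increase: stop
            break
        subset.append(c)
        remaining.remove(c)
        best = (cost, feature, list(subset))  # note: the shared feature may change
    return best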

38
[Plot] With greedy sharing, the number of features
needed grows roughly as O(log(number of classes)).
39
[Plot] 0.95 area under ROC; 29 objects, averaged
over 20 training sets.
40
When to stop?
  • Good question!
  • It never stops.
  • It keeps maximizing the margin.
  • A big margin means
  • potentially good generalization performance
  • but possibly serious over-fitting near the decision
    boundary.

41
Outline
  • Motivation to choose this paper
  • Motivation of this paper
  • Basic ideas in boosting
  • Joint Boost
  • Features used in this paper
  • My results in face recognition

42
Features they used in the paper
  • Dictionary
  • 2000 randomly sampled patches
  • of sizes from 4x4 to 14x14
  • no clustering
  • Each patch is associated with a spatial mask

43
The candidate features
[Figure] Dictionary of 2000 candidate patches
(templates) and position masks, randomly sampled
from the training images.
44
Features
  • Building feature vectors (a sketch follows below)
  • Normalized correlation with each patch to get a
    response
  • Raise the response to some power
  • Large values get even larger and dominate the
    response (approximating a max operation)
  • Use the spatial mask to align the response to the
    object center (voting)
  • Extract the response vector at the object center
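A rough sketch of computing one such feature response (scipy-based; the exact normalization and the parameter names are my assumptions, not the paper's code):

import numpy as np
from scipy.signal import correlate2d

def patch_feature(image, patch, mask, p=10):
    """Normalized correlation with a patch, raised to a power, then
    'voted' toward the object center through a spatial mask."""
    # Crude normalized cross-correlation: zero-mean patch, scale by local energy.
    patch = (patch - patch.mean()) / (patch.std() + 1e-8)
    resp = correlate2d(image, patch, mode="same")
    energy = np.sqrt(correlate2d(image ** 2, np.ones_like(patch), mode="same")) + 1e-8
    resp = np.abs(resp / energy)
    resp = resp ** p                                 # large values dominate (soft max)
    voted = correlate2d(resp, mask, mode="same")     # align responses to the object center
    return voted

The feature value used for classification is read off at the object center, e.g. patch_feature(img, patch, mask)[cy, cx].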

45
(No Transcript)
46
Results
  • Multiclass object recognition
  • Dataset: LabelMe
  • 21 objects, 50 samples per object
  • 500 boosting rounds
  • Multiview car recognition
  • Train on LabelMe, test on PASCAL
  • 12 views, 50 samples per view
  • 300 boosting rounds

47
(No Transcript)
48
(No Transcript)
49
[Figure] 70 rounds, 20 training samples per class, 21 objects
50
(No Transcript)
51
[Figure] 12 views, 50 samples per class, 300 features
52
Outline
  • Motivation to choose this paper
  • Motivation of this paper
  • Basic ideas in boosting
  • Joint Boost
  • Features used in this paper
  • My results in face recognition

53
Simple Experiment
  • Main point of this paper
  • They claimed shared features help in the
    situation of many categories with only a few
    samples in each category.
  • Test it!
  • Dataset: face recognition
  • Faces in the Wild dataset
  • Many famous figures

54
Experiment configuration
  • Use a GIST-like feature, but
  • only Gabor responses
  • use a finer grid to gather the histogram
  • Faces are aligned in the dataset.
  • Feature statistics (a sketch follows below)
  • 8 orientations, 2 scales, 8x8 grid
  • 1024 dimensions
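A hedged sketch of such a descriptor (using skimage's Gabor filter; the frequencies and pooling details are my assumptions, not the talk's exact settings):

import numpy as np
from skimage.filters import gabor

def gist_like(face, n_orient=8, freqs=(0.2, 0.4), grid=8):
    """Gabor responses pooled over a grid:
    8 orientations x 2 scales x 8x8 cells = 1024 dimensions."""
    H, W = face.shape
    feats = []
    for freq in freqs:
        for k in range(n_orient):
            real, imag = gabor(face, frequency=freq, theta=k * np.pi / n_orient)
            mag = np.hypot(real, imag)
            # Average the response magnitude inside each cell of the grid.
            mag = mag[: H // grid * grid, : W // grid * grid]
            cells = mag.reshape(grid, H // grid, grid, W // grid).mean(axis=(1, 3))
            feats.append(cells.ravel())
    return np.concatenate(feats)   # 8 * 2 * 8 * 8 = 1024 values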

55
Experiment
  • Training and testing
  • Find the 50 identities with the most images
  • For each identity, randomly select 3 images for
    training
  • Use the rest for testing

56
[Plot] Nearest neighbor baseline (50 classes, 3 per
class); chance rate 0.02.
57
[Plot] 80% better than NN
58
(No Transcript)
59
Result on more images
  • 50 people, 7 images each
  • Chance rate 2%
  • Nearest neighbor results:

L1: 0.2856 (0.1868 in the 50/3 setting), L2: 0.2022,
Chi-square: 0.2596
60
Joint Boost doubles the accuracy of NN
More features are shared
Single → pairwise, pairwise → joint: 7 percent
61
Result on more identities
  • 100 people, 3 images each
  • Chance rate 1%
  • Nearest neighbor results:

L1: 0.1656 (0.1868 in the 50/3 setting), L2: 0.1235,
Chi-square: 0.1623
62
Joint Boost is still better than NN, yet the
improvement is smaller (60%) compared to the
previous cases.
The performance of single-class boosting is the same
as NN.
63
Conclusion
  • Joint Boosting indeed works
  • Especially when the number of images per class is
    not too small (otherwise use NN)
  • Better performance in the presence of
  • Many classes, each class having only a few samples
  • Introduces regularization that reduces overfitting
  • Disadvantages
  • Trains slowly, O(C^2).

64
Thanks!
  • Any questions?