1
Learning Non-Redundant Codebooks for Classifying
Complex Objects
  • Wei Zhang
    wei.zhang22@hp.com (zhangwe@eecs.oregonstate.edu)
  • Akshat Surve
    survea@eecs.oregonstate.edu
  • Xiaoli Fern
    xfern@eecs.oregonstate.edu
  • Thomas Dietterich
    tgd@eecs.oregonstate.edu

2
Contents
  • Learning codebooks for object classification
  • Learning non-redundant codebooks
  • Framework
  • Boost-Resampling algorithm
  • Boost-Reweighting algorithm
  • Experiments
  • Conclusions and future work

4
Problem 1: Stonefly Recognition
5
Visual Codebook for Object Recognition
(Figure: a training or testing image is passed through an interest region
detector; the resulting region descriptors are mapped onto a visual
codebook, yielding an image attribute vector of term frequencies that is
fed to the classifier.)
6
Problem 2: Document Classification
(Figure: a variable-length document is mapped to a fixed-length
bag-of-words vector.)
  • Through the first half of the 20th century,
    most of the scientific community believed
    dinosaurs to have been slow, unintelligent
    cold-blooded animals. Most research conducted
    since the 1970s, however, has supported the view
    that dinosaurs were active animals with elevated
    metabolisms and numerous adaptations for social
    interaction. The resulting transformation in the
    scientific understanding of dinosaurs has
    gradually filtered

absent: 0, active: 1, animal: 2, believe: 1, dinosaur: 3, social: 1
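The mapping above can be sketched in a few lines of Python. This is a
minimal illustration; the tiny vocabulary and the regex tokenizer are
assumptions, not the paper's actual preprocessing:

```python
import re
from collections import Counter

def bag_of_words(document, vocabulary):
    """Map a variable-length document to a fixed-length term-frequency vector."""
    tokens = re.findall(r"[a-z]+", document.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["absent", "active", "animals", "believed", "dinosaurs", "social"]
doc = ("Most research has supported the view that dinosaurs were active "
       "animals with numerous adaptations for social interaction.")
print(bag_of_words(doc, vocab))  # counts per vocabulary word, in order
```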
7
Codebook for Document Classification
  • Cluster the words to form code-words

(Figure: the words of the training corpus are clustered into a codebook of
K code-words, e.g. cluster 1 = {dog, canine, hound, ...}, cluster 2 =
{car, automobile, vehicle, ...}, ..., cluster K. An input document is then
mapped onto the code-words to form its attribute vector, e.g.
(20, 1, ..., 0, 2), which is fed to the classifier.)
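Once a codebook (a word-to-cluster mapping) is learned, encoding a document
reduces to counting cluster hits. A minimal sketch, assuming the codebook
is given as a plain dict (the cluster contents are illustrative):

```python
def codebook_vector(tokens, codebook, K):
    """Count, per code-word (cluster), how many document tokens fall in it.
    codebook maps word -> cluster id in [0, K); unseen words are skipped."""
    vec = [0] * K
    for w in tokens:
        if w in codebook:
            vec[codebook[w]] += 1
    return vec

codebook = {"dog": 0, "canine": 0, "hound": 0, "car": 1, "automobile": 1}
print(codebook_vector(["dog", "car", "canine", "sky"], codebook, 2))  # [2, 1]
```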
9
Learning Non-Redundant Codebooks
  • Codebook approaches: k-means, Gaussian mixture modeling, Information
    Bottleneck, vocabulary trees, spatial pyramids
  • Motivation: improve the discriminative performance of any codebook and
    classifier learning approach by encouraging non-redundancy in the
    learning process.
  • Approach: learn multiple codebooks and classifiers; wrap the codebook
    and classifier learning process inside a boosting procedure [1].

[1] Freund, Y. and Schapire, R. (1996). Experiments with a new boosting
algorithm. ICML.
10
Non-Redundant Codebook and Classifier Learning
Framework
(Figure: at each boosting iteration t = 1, ..., T, a codebook and its
classifier are trained on the boosting-weighted data W_t(B), producing
predictions L_t; the boosting weights are then updated for the next
iteration. The final prediction L combines L_1, ..., L_T.)
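The loop in this framework can be sketched as an AdaBoost.M1-style wrapper.
Here `learn_codebook` and `learn_classifier` are assumed, user-supplied
weight-aware routines, not code from the paper:

```python
import numpy as np

def boost_codebooks(X, y, T, learn_codebook, learn_classifier):
    """Wrap codebook + classifier learning inside a boosting loop (a sketch)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                          # W_1(B): uniform weights
    ensemble = []
    for t in range(T):
        codebook = learn_codebook(X, y, w)           # weight-aware codebook
        clf = learn_classifier(codebook, X, y, w)    # returns a predict function
        pred = clf(X)                                # predictions L_t
        err = w[pred != y].sum()
        if err == 0 or err >= 0.5:
            break
        beta = err / (1.0 - err)
        ensemble.append((np.log(1.0 / beta), clf))   # vote weight alpha_t
        w = np.where(pred == y, w * beta, w)         # down-weight correct examples
        w /= w.sum()                                 # W_{t+1}(B)
    return ensemble                                  # weighted votes give final L
```

The final prediction L would be a weighted vote over the stored
(alpha_t, classifier) pairs.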
11
Instantiations of the Framework
  • Boost-Reweighting (discrete feature space): supervised clustering of
    the features X based on the joint distribution table Pt(X, Y) (Y
    represents the class labels). This table is updated at each iteration
    based on the new boosting weights.
  • Boost-Resampling (continuous feature space): generate a non-redundant
    clustering set by sampling the training examples according to the
    updated boosting weights. The codebook is constructed by clustering
    the features in this clustering set.
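A minimal sketch of the Boost-Resampling step; the function name and the
tiny Lloyd-style k-means are illustrative assumptions, not the authors'
implementation:

```python
import numpy as np

def resample_and_cluster(features, weights, K, rng):
    """Draw a clustering set according to the boosting weights, then
    build a codebook by clustering the sampled features (plain k-means)."""
    n = len(features)
    idx = rng.choice(n, size=n, replace=True, p=weights)  # weighted resampling
    sample = features[idx]
    centers = sample[rng.choice(len(sample), size=K, replace=False)].copy()
    for _ in range(10):                                   # Lloyd iterations
        dists = np.linalg.norm(sample[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(K):
            members = sample[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers                                        # the code-words
```

Because high-weight (previously misclassified) examples dominate the
sample, each round's codebook focuses on regions the earlier codebooks
handled poorly.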

12
Codebook Learning and Classification Algorithms
  • Documents
    • Codebook learning: Information Bottleneck (IB) [1], which minimizes
      L = I(X̃; X) - β·I(X̃; Y)
    • Classification: Naïve Bayes
  • Objects
    • Codebook learning: k-means
    • Classification: bagged decision trees

[1] Bekkerman, R., El-Yaniv, R., Tishby, N., Winter, Y., Guyon, I. and
Elisseeff, A. (2003). Distributional word clusters vs. words for text
categorization. JMLR.
13
Image Attributes: tf-idf Weights
(Figure: the recognition pipeline with interest regions, region descriptors
and the visual codebook; the image attribute vector now holds tf-idf
weights rather than raw term frequencies before reaching the classifier.)
  • Term frequency-inverse document frequency (tf-idf) weight [1]:
    "document" = image; "term" = instance of a visual word.

[1] Salton, G. and Buckley, C. (1988). Term-weighting approaches in
automatic text retrieval. Information Processing & Management.
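A minimal tf-idf sketch under the analogy above (documents = images, terms
= visual-word instances). The unsmoothed tf · log(N/df) form is an
assumption, since the slide does not give the exact formula:

```python
import math

def tf_idf(docs):
    """docs: list of term-frequency dicts, one per 'document' (image).
    Returns per-document dicts of tf * log(N / df) weights."""
    n = len(docs)
    df = {}                                   # document frequency per term
    for counts in docs:
        for term in counts:
            df[term] = df.get(term, 0) + 1
    return [{t: tf * math.log(n / df[t]) for t, tf in counts.items()}
            for counts in docs]

weights = tf_idf([{"w1": 2, "w2": 1}, {"w1": 1}])
print(weights[0])  # w1 occurs in every image, so its idf (and weight) is 0
```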
15
Experimental Results - Stonefly Recognition
  • 3-fold cross-validation experiments
  • The size of each codebook: K = 100
  • The number of boosting iterations: T = 50

[1] Larios, N., Deng, H., Zhang, W., Sarpola, M., Yuen, J., Paasch, R.,
Moldenke, A., Lytle, D., Ruiz Correa, S., Mortensen, E., Shapiro, L. and
Dietterich, T. (2008). Automated insect identification through concatenated
histograms of local appearance features. Machine Vision and Applications.
[2] Opelt, A., Pinz, A., Fussenegger, M. and Auer, P. (2006). Generic
object recognition with boosting. PAMI.
16
Experimental Results - Stonefly Recognition
(cont.)


  • Single learns only a single codebook of size K·T = 5000.
  • Random: weighted sampling is replaced with uniform random sampling that
    ignores the boosting weights.
  • Boost achieves a 77% error reduction compared with Single on STONEFLY9.
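For reference, relative error reduction is (e_base - e_new) / e_base; the
error rates below are made-up placeholders, not the paper's actual numbers:

```python
def error_reduction(err_baseline, err_new):
    """Relative error reduction; 0.77 corresponds to '77% error reduction'."""
    return (err_baseline - err_new) / err_baseline

print(round(error_reduction(0.10, 0.023), 2))  # 0.77
```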

17
Experimental Results - Stonefly Recognition
(cont.)



18
Experimental Results - Document Classification


  • S1000 learns a single codebook of size 1000.
  • S100 learns a single codebook of size 100.
  • Random: 10 bagged samples of the original training corpus are used to
    estimate the joint distribution table Pt(X, Y).

19
Experimental Results - Document
Classification (cont.)


  • TODO: add Figure 5 in a similar format to Figure 4

21
Conclusions and Future Work
  • Conclusions
    • Non-redundant learning is a simple and general framework that
      effectively improves the performance of codebooks.
  • Future work
    • Explore the underlying reasons for the effectiveness of non-redundant
      codebooks: discriminative analysis, non-redundancy tests.
    • More comparison experiments on well-established datasets.

22
Acknowledgements
  • Supported by the Oregon State University insect ID project:
    http://web.engr.oregonstate.edu/tgd/bugid
  • Supported by NSF under grant number IIS-0705765.
  • Thank you!