1Learning Non-Redundant Codebooks for Classifying Complex Objects
- Wei Zhang, wei.zhang22_at_hp.com (zhangwe_at_eecs.oregonstate.edu)
- Akshat Surve, survea_at_eecs.oregonstate.edu
- Xiaoli Fern, xfern_at_eecs.oregonstate.edu
- Thomas Dietterich, tgd_at_eecs.oregonstate.edu
2Contents
- Learning codebooks for object classification
- Learning non-redundant codebooks
- Framework
- Boost-Resampling algorithm
- Boost-Reweighting algorithm
- Experiments
- Conclusions and future work
4Problem 1: Stonefly Recognition
5Visual Codebook for Object Recognition
[Figure: training image / testing image -> interest region detector -> region descriptors -> visual codebook -> image attribute vector (term frequency; example values 20, 17, 3, 18, 2, 6) -> classifier]
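The pipeline on this slide can be illustrated with a minimal sketch. It assumes the codebook is already given as a list of cluster centers and that each region descriptor is mapped to its nearest center by squared Euclidean distance; the function names and the toy 2-D descriptors are hypothetical and not taken from the slides.

```python
def nearest_codeword(descriptor, codebook):
    """Return the index of the codebook center closest to the descriptor."""
    best_index, best_dist = 0, float("inf")
    for k, center in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, center))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def image_attribute_vector(descriptors, codebook):
    """Count how many region descriptors fall on each code-word (term frequency)."""
    histogram = [0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, codebook)] += 1
    return histogram


# Toy example: two code-words, four region descriptors from one image.
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (9.0, 9.0), (0.0, 1.0), (11.0, 10.0)]
vector = image_attribute_vector(descriptors, codebook)
```

Every image yields a vector of the same length as the codebook, regardless of how many interest regions were detected in it.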
6Problem 2: Document Classification
Variable-length document:
- Through the first half of the 20th century,
most of the scientific community believed
dinosaurs to have been slow, unintelligent
cold-blooded animals. Most research conducted
since the 1970s, however, has supported the view
that dinosaurs were active animals with elevated
metabolisms and numerous adaptations for social
interaction. The resulting transformation in the
scientific understanding of dinosaurs has
gradually filtered
Fixed-length bag-of-words:
- absent 0, active 1, animal 2, believe 1, dinosaur 3, social 1
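As a rough illustration of how the variable-length document above becomes a fixed-length vector, here is a small sketch. The six-word vocabulary is the one shown on the slide; the `NORMALIZE` table is a hypothetical hand-written stand-in for a real stemmer, which the slides do not specify.

```python
import re
from collections import Counter

VOCAB = ["absent", "active", "animal", "believe", "dinosaur", "social"]

# Hypothetical normalization table standing in for a real stemmer.
NORMALIZE = {"animals": "animal", "believed": "believe", "dinosaurs": "dinosaur"}


def bag_of_words(text, vocab=VOCAB):
    """Map a document of any length to a count vector over a fixed vocabulary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(NORMALIZE.get(token, token) for token in tokens)
    return [counts[word] for word in vocab]


doc = (
    "Through the first half of the 20th century, most of the scientific "
    "community believed dinosaurs to have been slow, unintelligent "
    "cold-blooded animals. Most research conducted since the 1970s, however, "
    "has supported the view that dinosaurs were active animals with elevated "
    "metabolisms and numerous adaptations for social interaction. The "
    "resulting transformation in the scientific understanding of dinosaurs "
    "has gradually filtered"
)
vector = bag_of_words(doc)  # counts in the order of VOCAB
```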
7Codebook for Document Classification
- Cluster the words to form code-words
[Figure: the words of the training corpus are clustered into a codebook of K clusters, e.g. cluster 1 = {dog, canine, hound, ...}, cluster 2 = {car, automobile, vehicle, ...}, up to cluster K; an input document is mapped to a vector of code-word counts (example values 20, 1, 0, 2, 7) and passed to the classifier]
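The mapping from words to code-words can be sketched as a simple lookup followed by counting. The two clusters below reuse the example words from the slide; the dictionary layout and the function name are assumptions made for illustration, and words outside the codebook are simply ignored here.

```python
# Hypothetical codebook: each word points to the index of its cluster (code-word).
CODEBOOK = {
    "dog": 0, "canine": 0, "hound": 0,
    "car": 1, "automobile": 1, "vehicle": 1,
}


def codeword_histogram(tokens, codebook, num_clusters):
    """Count code-word occurrences; words outside the codebook are skipped."""
    histogram = [0] * num_clusters
    for token in tokens:
        cluster = codebook.get(token)
        if cluster is not None:
            histogram[cluster] += 1
    return histogram


tokens = ["the", "dog", "chased", "the", "car", "and", "the", "hound"]
histogram = codeword_histogram(tokens, CODEBOOK, num_clusters=2)
```

Compared with a plain bag-of-words, synonyms such as "dog" and "hound" now contribute to the same attribute.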
9Learning Non-Redundant Codebooks
Codebook approaches: k-means, Gaussian mixture modeling, Information Bottleneck, vocabulary trees, spatial pyramid
+ Non-redundant learning
Motivation: improve the discriminative performance of any codebook and classifier learning approach by encouraging non-redundancy in the learning process.
Approach: learn multiple codebooks and classifiers; wrap the codebook and classifier learning process inside a boosting procedure [1].
[1] Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. ICML.
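10Non-Redundant Codebook and Classifier Learning Framework
[Figure: boosting loop. At each iteration t = 1, ..., T, a codebook and a classifier are learned under the boosting weights Wt(B) and produce predictions Lt; the boosting weights are then updated. The predictions L1, ..., LT are combined into the final predictions L.]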
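The loop in the framework figure can be sketched roughly as follows. This is an AdaBoost.M1-style wrapper written under assumptions: `learn_codebook` and `learn_classifier` are hypothetical callables supplied by the user, and the exact weight update and vote weights used by the authors are not given on the slides.

```python
import math


def boost_codebooks(examples, labels, learn_codebook, learn_classifier, T):
    """Learn up to T (codebook, classifier) pairs under changing boosting weights."""
    n = len(examples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(T):
        codebook = learn_codebook(examples, labels, weights)
        classifier = learn_classifier(codebook, examples, labels, weights)
        preds = [classifier(codebook, x) for x in examples]
        # Weighted training error of the current codebook/classifier pair.
        err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
        if err >= 0.5:
            break  # AdaBoost.M1 stops when the weak learner is too weak
        err = max(err, 1e-10)  # avoid log(1/0) for a perfect classifier
        beta = err / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), codebook, classifier))
        # Down-weight correctly classified examples, then renormalize.
        weights = [w * (beta if p == y else 1.0)
                   for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Combine the per-iteration predictions by weighted vote."""
    votes = {}
    for alpha, codebook, classifier in ensemble:
        label = classifier(codebook, x)
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)
```

Because each new codebook is learned under weights that emphasize the examples earlier rounds got wrong, later codebooks are pushed to capture information the earlier ones missed.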
11Instantiations of the Framework
- Boost-Reweighting (discrete feature space)
  - Supervised clustering of the features X based on the joint distribution table Pt(X, Y), where Y represents the class labels. This table is updated at each iteration based on the new boosting weights.
- Boost-Resampling (continuous feature space)
  - Generate a non-redundant clustering set by sampling the training examples according to the updated boosting weights. The codebook is constructed by clustering the features in this clustering set.
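The two instantiations differ in where the boosting weights enter. The sketch below shows one plausible reading of each: for Boost-Reweighting, every word occurrence contributes its document's boosting weight to the joint table; for Boost-Resampling, training examples are drawn with replacement in proportion to their weights. Both functions and their names are assumptions, not the authors' code.

```python
import random
from collections import defaultdict


def weighted_joint_table(docs, labels, weights):
    """Estimate Pt(X, Y): each word occurrence adds its document's boosting weight."""
    table = defaultdict(float)
    for words, y, w in zip(docs, labels, weights):
        for x in words:
            table[(x, y)] += w
    total = sum(table.values())
    return {pair: mass / total for pair, mass in table.items()}


def resample_clustering_set(examples, weights, size, seed=0):
    """Draw a clustering set with replacement, proportional to the boosting weights."""
    rng = random.Random(seed)
    return rng.choices(examples, weights=weights, k=size)


docs = [["a", "a", "b"], ["b"]]
labels = [0, 1]
p_uniform = weighted_joint_table(docs, labels, [0.5, 0.5])
p_boosted = weighted_joint_table(docs, labels, [0.2, 0.8])  # second doc emphasized
clustering_set = resample_clustering_set(["img0", "img1", "img2"],
                                         [0.0, 0.0, 1.0], size=5)
```

In both cases the clustering step that follows sees a view of the data biased toward the currently hard examples, which is what discourages redundant codebooks.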
12Codebook Learning and Classification Algorithms
- Documents
  - Codebook learning: Information Bottleneck (IB) [1]
    - $L = I(X; \tilde{X}) - \beta\, I(\tilde{X}; Y)$, where $\tilde{X}$ denotes the code-words (clusters of the words X)
  - Classification: Naïve Bayes
- Objects
  - Codebook learning: k-means
  - Classification: bagged decision trees
[1] Bekkerman, R., El-Yaniv, R., Tishby, N. and Winter, Y. (2003). Distributional word clusters vs. words for text categorization. JMLR (special issue eds. Guyon, I. and Elisseeff, A.).
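To make the IB objective concrete, the following sketch evaluates L = I(X; X~) - beta * I(X~; Y) for a given hard assignment of words to clusters. It only scores a clustering and does not search for one; the dictionary-based representation of the joint table and the toy numbers are assumptions for illustration.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(A; B) in nats for a joint table given as {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def ib_objective(p_xy, assignment, beta):
    """Score a hard clustering: compression term minus beta times relevance term."""
    p_xc, p_cy = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        c = assignment[x]
        p_xc[(x, c)] += p
        p_cy[(c, y)] += p
    return mutual_information(p_xc) - beta * mutual_information(p_cy)


# Toy joint table: words a, b occur only in class 0; words c, d only in class 1.
p_xy = {("a", 0): 0.25, ("b", 0): 0.25, ("c", 1): 0.25, ("d", 1): 0.25}
good = ib_objective(p_xy, {"a": 0, "b": 0, "c": 1, "d": 1}, beta=10.0)
bad = ib_objective(p_xy, {"a": 0, "c": 0, "b": 1, "d": 1}, beta=10.0)
```

A lower value is better: the clustering that keeps class-specific words together preserves I(X~; Y) and therefore scores lower than the one that mixes the classes.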
13Image Attributes: tf-idf Weights
[Figure: interest regions -> region descriptors -> visual codebook -> term-frequency vector (example values 20, 17, 3, 18, 2, 6) -> tf-idf -> image attribute vector -> classifier]
Term frequency-inverse document frequency (tf-idf) weight [1]
- "Document" = image
- "Term" = instance of a visual word
[1] Salton, G. and Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management.
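A small sketch of the tf-idf reweighting applied to visual-word histograms follows. It assumes one common variant, tf = count / total count in the image and idf = log(N / df); the slides do not state which variant from [1] was used, so treat the formula as an assumption.

```python
import math


def tfidf(histograms):
    """Reweight per-image visual-word counts by tf-idf (one common variant)."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: number of images containing each visual word.
    df = [sum(1 for h in histograms if h[j] > 0) for j in range(n_words)]
    weighted = []
    for h in histograms:
        total = sum(h)
        row = []
        for j, count in enumerate(h):
            if count == 0 or total == 0:
                row.append(0.0)
            else:
                row.append((count / total) * math.log(n_images / df[j]))
        weighted.append(row)
    return weighted


# Two images, two visual words: word 0 occurs in both images, word 1 in one.
weights = tfidf([[2, 0], [1, 1]])
```

A visual word that appears in every image gets weight zero, so the attribute vector emphasizes words that help tell images apart.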
15Experimental Results - Stonefly Recognition
- 3-fold cross-validation experiments
- The size of each codebook: K = 100
- The number of boosting iterations: T = 50
[1] Larios, N., Deng, H., Zhang, W., Sarpola, M., Yuen, J., Paasch, R., Moldenke, A., Lytle, D., Ruiz Correa, S., Mortensen, E., Shapiro, L. and Dietterich, T. (2008). Automated insect identification through concatenated histograms of local appearance features. Machine Vision and Applications.
[2] Opelt, A., Pinz, A., Fussenegger, M. and Auer, P. (2006). Generic object recognition with boosting. PAMI.
16Experimental Results - Stonefly Recognition (cont.)
- Single: learns only a single codebook of size K × T = 5000.
- Random: weighted sampling is replaced with uniform random sampling that neglects the boosting weights.
- Boost achieves a 77% error reduction compared with Single on STONEFLY9.
17Experimental Results - Stonefly Recognition (cont.)
18Experimental Results - Document Classification
- S1000: learns a single codebook of size 1000.
- S100: learns a single codebook of size 100.
- Random: 10 bagged samples of the original training corpus are used to estimate the joint distribution table Pt(X, Y).
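The Random baseline can be pictured with the sketch below, which assumes that a "bagged sample" is a bootstrap sample (documents drawn uniformly with replacement) and that each sample yields one joint table from raw word counts. The function name, the toy corpus, and the seeding scheme are hypothetical.

```python
import random
from collections import defaultdict


def bagged_joint_table(docs, labels, seed=0):
    """Estimate P(X, Y) from one bootstrap sample, ignoring boosting weights."""
    rng = random.Random(seed)
    n = len(docs)
    sample = [rng.randrange(n) for _ in range(n)]  # uniform, with replacement
    table = defaultdict(float)
    for i in sample:
        for word in docs[i]:
            table[(word, labels[i])] += 1.0
    total = sum(table.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in table.items()}


docs = [["a", "a", "b"], ["b", "c"], ["c"]]
labels = [0, 1, 1]
# Ten tables, one per bagged sample, mirroring the ten samples on the slide.
tables = [bagged_joint_table(docs, labels, seed=s) for s in range(10)]
```

The tables differ only through sampling noise, so this baseline separates the effect of having multiple codebooks from the effect of steering them with boosting weights.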
19Experimental Results - Document Classification (cont.)
- TODO: add Figure 5 in a format similar to Figure 4
21Conclusions and Future Work
- Conclusions
  - Non-redundant learning is a simple and general framework to effectively improve the performance of codebooks.
- Future work
  - Explore the underlying reasons for the effectiveness of non-redundant codebooks: discriminative analysis, non-redundancy tests.
  - More comparison experiments on well-established datasets.
22Acknowledgements
- Supported by the Oregon State University insect ID project: http://web.engr.oregonstate.edu/~tgd/bugid
- Supported by NSF under grant number IIS-0705765.
- Thank you!