1
Learning Non-Redundant Codebooks for Classifying
Complex Objects
  • Wei Zhang
    wei.zhang22@hp.com (zhangwe@eecs.oregonstate.edu)
  • Akshat Surve
    survea@eecs.oregonstate.edu
  • Xiaoli Fern
    xfern@eecs.oregonstate.edu
  • Thomas Dietterich
    tgd@eecs.oregonstate.edu

2
Contents
  • Learning codebooks for object classification
  • Learning non-redundant codebooks
  • Framework
  • Boost-Resampling algorithm
  • Boost-Reweighting algorithm
  • Experiments
  • Conclusions and future work

4
Problem 1: Stonefly Recognition
5
Visual Codebook for Object Recognition
(Figure: a training or testing image is passed through an interest region
detector; the resulting region descriptors are mapped onto a visual
codebook, yielding an image attribute vector of term frequencies that is
fed to the classifier.)
6
Problem 2: Document Classification
(Figure: a variable-length document is mapped to a fixed-length
bag-of-words vector.)
  • Through the first half of the 20th century,
    most of the scientific community believed
    dinosaurs to have been slow, unintelligent
    cold-blooded animals. Most research conducted
    since the 1970s, however, has supported the view
    that dinosaurs were active animals with elevated
    metabolisms and numerous adaptations for social
    interaction. The resulting transformation in the
    scientific understanding of dinosaurs has
    gradually filtered

absent: 0, active: 1, animal: 2, believe: 1, dinosaur: 3, social: 1
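The mapping above can be sketched in a few lines of Python. This is a
minimal illustration; the tiny vocabulary and the regex tokenizer are
assumptions, not the paper's actual preprocessing:

```python
import re
from collections import Counter

def bag_of_words(document, vocabulary):
    """Map a variable-length document to a fixed-length term-frequency vector."""
    tokens = re.findall(r"[a-z]+", document.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["absent", "active", "animals", "believed", "dinosaurs", "social"]
doc = ("Most research has supported the view that dinosaurs were active "
       "animals with numerous adaptations for social interaction.")
print(bag_of_words(doc, vocab))  # counts per vocabulary word, in order
```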
7
Codebook for Document Classification
  • Cluster the words to form code-words

(Figure: the words of the training corpus are clustered into a codebook of
K code-words, e.g. cluster 1 = {dog, canine, hound, ...}, cluster 2 =
{car, automobile, vehicle, ...}, ..., cluster K. An input document is then
mapped onto the code-words to form its attribute vector, e.g.
(20, 1, ..., 0, 2), which is fed to the classifier.)
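Once a codebook (a word-to-cluster mapping) is learned, encoding a document
reduces to counting cluster hits. A minimal sketch, assuming the codebook
is given as a plain dict (the cluster contents are illustrative):

```python
def codebook_vector(tokens, codebook, K):
    """Count, per code-word (cluster), how many document tokens fall in it.
    codebook maps word -> cluster id in [0, K); unseen words are skipped."""
    vec = [0] * K
    for w in tokens:
        if w in codebook:
            vec[codebook[w]] += 1
    return vec

codebook = {"dog": 0, "canine": 0, "hound": 0, "car": 1, "automobile": 1}
print(codebook_vector(["dog", "car", "canine", "sky"], codebook, 2))  # [2, 1]
```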
9
Learning Non-Redundant Codebooks
  • Codebook approaches: k-means, Gaussian mixture modeling, Information
    Bottleneck, vocabulary trees, spatial pyramids
  • Motivation: improve the discriminative performance of any codebook and
    classifier learning approach by encouraging non-redundancy in the
    learning process.
  • Approach: learn multiple codebooks and classifiers; wrap the codebook
    and classifier learning process inside a boosting procedure [1].

[1] Freund, Y. and Schapire, R. (1996). Experiments with a new boosting
algorithm. ICML.
10
Non-Redundant Codebook and Classifier Learning
Framework
(Figure: at each boosting iteration t = 1, ..., T, a codebook and its
classifier are trained on the boosting-weighted data W_t(B), producing
predictions L_t; the boosting weights are then updated for the next
iteration. The final prediction L combines L_1, ..., L_T.)
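The loop in this framework can be sketched as an AdaBoost.M1-style wrapper.
Here `learn_codebook` and `learn_classifier` are assumed, user-supplied
weight-aware routines, not code from the paper:

```python
import numpy as np

def boost_codebooks(X, y, T, learn_codebook, learn_classifier):
    """Wrap codebook + classifier learning inside a boosting loop (a sketch)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                          # W_1(B): uniform weights
    ensemble = []
    for t in range(T):
        codebook = learn_codebook(X, y, w)           # weight-aware codebook
        clf = learn_classifier(codebook, X, y, w)    # returns a predict function
        pred = clf(X)                                # predictions L_t
        err = w[pred != y].sum()
        if err == 0 or err >= 0.5:
            break
        beta = err / (1.0 - err)
        ensemble.append((np.log(1.0 / beta), clf))   # vote weight alpha_t
        w = np.where(pred == y, w * beta, w)         # down-weight correct examples
        w /= w.sum()                                 # W_{t+1}(B)
    return ensemble                                  # weighted votes give final L
```

The final prediction L would be a weighted vote over the stored
(alpha_t, classifier) pairs.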
11
Instantiations of the Framework
  • Boost-Reweighting (discrete feature space): supervised clustering of
    the features X based on the joint distribution table Pt(X, Y) (Y
    represents the class labels). This table is updated at each iteration
    based on the new boosting weights.
  • Boost-Resampling (continuous feature space): generate a non-redundant
    clustering set by sampling the training examples according to the
    updated boosting weights. The codebook is constructed by clustering
    the features in this clustering set.
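A minimal sketch of the Boost-Resampling step; the function name and the
tiny Lloyd-style k-means are illustrative assumptions, not the authors'
implementation:

```python
import numpy as np

def resample_and_cluster(features, weights, K, rng):
    """Draw a clustering set according to the boosting weights, then
    build a codebook by clustering the sampled features (plain k-means)."""
    n = len(features)
    idx = rng.choice(n, size=n, replace=True, p=weights)  # weighted resampling
    sample = features[idx]
    centers = sample[rng.choice(len(sample), size=K, replace=False)].copy()
    for _ in range(10):                                   # Lloyd iterations
        dists = np.linalg.norm(sample[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(K):
            members = sample[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers                                        # the code-words
```

Because high-weight (previously misclassified) examples dominate the
sample, each round's codebook focuses on regions the earlier codebooks
handled poorly.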

12
Codebook Learning and Classification Algorithms
  • Documents
    • Codebook learning: Information Bottleneck (IB) [1], which minimizes
      L = I(X̃; X) - β·I(X̃; Y)
    • Classification: Naïve Bayes
  • Objects
    • Codebook learning: k-means
    • Classification: bagged decision trees

[1] Bekkerman, R., El-Yaniv, R., Tishby, N., Winter, Y., Guyon, I. and
Elisseeff, A. (2003). Distributional word clusters vs. words for text
categorization. JMLR.
13
Image Attributes: tf-idf Weights
(Figure: the recognition pipeline with interest regions, region descriptors
and the visual codebook; the image attribute vector now holds tf-idf
weights rather than raw term frequencies before reaching the classifier.)
  • Term frequency-inverse document frequency (tf-idf) weight [1]:
    "document" = image; "term" = instance of a visual word.

[1] Salton, G. and Buckley, C. (1988). Term-weighting approaches in
automatic text retrieval. Information Processing & Management.
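A minimal tf-idf sketch under the analogy above (documents = images, terms
= visual-word instances). The unsmoothed tf · log(N/df) form is an
assumption, since the slide does not give the exact formula:

```python
import math

def tf_idf(docs):
    """docs: list of term-frequency dicts, one per 'document' (image).
    Returns per-document dicts of tf * log(N / df) weights."""
    n = len(docs)
    df = {}                                   # document frequency per term
    for counts in docs:
        for term in counts:
            df[term] = df.get(term, 0) + 1
    return [{t: tf * math.log(n / df[t]) for t, tf in counts.items()}
            for counts in docs]

weights = tf_idf([{"w1": 2, "w2": 1}, {"w1": 1}])
print(weights[0])  # w1 occurs in every image, so its idf (and weight) is 0
```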
15
Experimental Results - Stonefly Recognition
  • 3-fold cross-validation experiments
  • The size of each codebook: K = 100
  • The number of boosting iterations: T = 50

[1] Larios, N., Deng, H., Zhang, W., Sarpola, M., Yuen, J., Paasch, R.,
Moldenke, A., Lytle, D., Ruiz Correa, S., Mortensen, E., Shapiro, L. and
Dietterich, T. (2008). Automated insect identification through concatenated
histograms of local appearance features. Machine Vision and Applications.
[2] Opelt, A., Pinz, A., Fussenegger, M. and Auer, P. (2006). Generic
object recognition with boosting. PAMI.
16
Experimental Results - Stonefly Recognition
(cont.)


  • Single learns only a single codebook of size K·T = 5000.
  • Random: weighted sampling is replaced with uniform random sampling that
    ignores the boosting weights.
  • Boost achieves a 77% error reduction compared with Single on STONEFLY9.
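For reference, relative error reduction is (e_base - e_new) / e_base; the
error rates below are made-up placeholders, not the paper's actual numbers:

```python
def error_reduction(err_baseline, err_new):
    """Relative error reduction; 0.77 corresponds to '77% error reduction'."""
    return (err_baseline - err_new) / err_baseline

print(round(error_reduction(0.10, 0.023), 2))  # 0.77
```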

17
Experimental Results - Stonefly Recognition
(cont.)



18
Experimental Results - Document Classification


  • S1000 learns a single codebook of size 1000.
  • S100 learns a single codebook of size 100.
  • Random: 10 bagged samples of the original training corpus are used to
    estimate the joint distribution table Pt(X, Y).

19
Experimental Results - Document
Classification (cont.)


  • TODO: add Figure 5 in a similar format to Figure 4

21
Conclusions and Future Work
  • Conclusions
    • Non-redundant learning is a simple and general framework that
      effectively improves the performance of codebooks.
  • Future work
    • Explore the underlying reasons for the effectiveness of non-redundant
      codebooks: discriminative analysis, non-redundancy tests.
    • More comparison experiments on well-established datasets.

22
Acknowledgements
  • Supported by the Oregon State University insect ID project:
    http://web.engr.oregonstate.edu/tgd/bugid
  • Supported by NSF under grant number IIS-0705765.
  • Thank you!