Title: Feature Selection for Image Retrieval
1. Feature Selection for Image Retrieval
- By Karina Zapién Arreola
- January 21st, 2005
 
2. Introduction
- Variable and feature selection have become the focus of much research in areas of application for which datasets with many variables are available:
 - Text processing
 - Gene expression
 - Combinatorial chemistry
 
3. Motivation
- The objective of feature selection is three-fold:
 - Improving the prediction performance of the predictors
 - Providing faster and more cost-effective predictors
 - Providing a better understanding of the underlying process that generated the data
4. Why use feature selection in CBIR?
- Different users may need different features for image retrieval
- From each selected sample, a specific feature set can be chosen
5. Boosting
- Method for improving the accuracy of any learning algorithm:
 - Use of weak algorithms for single rules
 - Weighting of the weak algorithms
 - Combination of weak rules into a strong learning algorithm
6. Adaboost Algorithm
- Adaboost is an iterative boosting algorithm
- Notation:
 - Samples (x1, y1), …, (xn, yn), where yi ∈ {-1, 1}
 - There are m positive samples and l negative samples
 - Weak classifiers h_j
 - For iteration t, the error is defined as e_t = min_j (1/2) Σ_i ω_i |h_j(x_i) - y_i|, where ω_i is the weight of x_i
 
7. Adaboost Algorithm
- Given samples (x1, y1), …, (xn, yn), where yi ∈ {-1, 1}
- Initialize ω_{1,i} = 1/(2m) for yi = 1 and ω_{1,i} = 1/(2l) for yi = -1
- For t = 1, …, T:
 - Normalize ω_{t,i} ← ω_{t,i} / Σ_j ω_{t,j}
 - Train a base learner h_j for each feature using the distribution ω_t
 - Choose the h_t that minimizes the weighted error e_t
 - Set β_t = e_t / (1 - e_t) and α_t = log(1/β_t)
 - Update ω_{t+1,i} = ω_{t,i} · β_t^(1-e_i), where e_i = 0 if x_i is classified correctly and e_i = 1 otherwise
- Output the final classifier H(x) = sign(Σ_t α_t h_t(x)) (a Python sketch of this loop follows below)
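
A minimal Python sketch of this training loop (illustrative only; it assumes NumPy, labels in {-1, +1}, and the nearest-class-mean weak classifier described on the weak-classifier slides; the names adaboost_train and adaboost_predict are ours, not part of the original work):

```python
import numpy as np

def adaboost_train(X, y, T=30):
    """Illustrative AdaBoost loop: X is (n_samples, n_features), y holds labels
    in {-1, +1}, T is the number of boosting rounds."""
    n, d = X.shape
    m = np.sum(y == 1)                                    # positive samples
    l = np.sum(y == -1)                                   # negative samples
    w = np.where(y == 1, 1.0 / (2 * m), 1.0 / (2 * l))    # omega_{1,i}

    ensemble = []
    for t in range(T):
        w = w / w.sum()                                   # normalize to a distribution

        best = None
        for j in range(d):                                # one weak learner per feature
            pos_mean = np.average(X[y == 1, j], weights=w[y == 1])
            neg_mean = np.average(X[y == -1, j], weights=w[y == -1])
            pred = np.where(np.abs(X[:, j] - pos_mean) <= np.abs(X[:, j] - neg_mean), 1, -1)
            err = np.sum(w * (pred != y))                 # weighted error of h_j
            if best is None or err < best[0]:
                best = (err, j, pos_mean, neg_mean, pred)

        e_t, j, pos_mean, neg_mean, pred = best
        e_t = np.clip(e_t, 1e-10, 1 - 1e-10)              # guard against division by zero
        beta = e_t / (1 - e_t)
        alpha = np.log(1.0 / beta)
        e_i = (pred != y).astype(float)                   # 0 if correct, 1 otherwise
        w = w * beta ** (1 - e_i)                         # omega_{t+1,i}
        ensemble.append((j, pos_mean, neg_mean, alpha))
    return ensemble

def adaboost_predict(ensemble, X):
    """Strong classifier H(x) = sign(sum_t alpha_t * h_t(x))."""
    score = np.zeros(X.shape[0])
    for j, pos_mean, neg_mean, alpha in ensemble:
        score += alpha * np.where(np.abs(X[:, j] - pos_mean) <= np.abs(X[:, j] - neg_mean), 1, -1)
    return np.sign(score)
```

Because each weak learner looks at exactly one feature, the features chosen across the T rounds are the selected feature subset.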
8. Adaboost Application
- Searching for similar groups:
 - A particular image class is chosen
 - A positive sample from this group is drawn randomly
 - A negative sample from the rest of the images is drawn randomly (see the sampling sketch below)
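
A minimal sketch of this sampling step (illustrative only; the function name and arguments are ours, and the 15/100 defaults echo the "stable solution" parameters given later in the talk):

```python
import numpy as np

def sample_training_set(labels, target_class, n_pos=15, n_neg=100, seed=None):
    """Randomly draw positive samples from the chosen class and negative samples
    from all remaining images; returns their indices and the {-1, +1} labels."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(labels == target_class)
    neg_idx = np.flatnonzero(labels != target_class)
    pos = rng.choice(pos_idx, size=n_pos, replace=False)   # assumes enough positives exist
    neg = rng.choice(neg_idx, size=n_neg, replace=False)
    idx = np.concatenate([pos, neg])
    y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])
    return idx, y
```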
9. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison
- Stable solution
 
10. Domain knowledge
- Features used 
 - colordb_sumRGB_entropy_d1 
 - col_gpd_hsv 
 - col_gpd_lab 
 - col_gpd_rgb 
 - col_hu_hsv2 
 - col_hu_lab2 
 - col_hu_lab 
 - col_hu_rgb2 
 - col_hu_rgb 
 - col_hu_seg2_hsv 
 - col_hu_seg2_lab 
 - col_hu_seg2_rgb
 
- Features used 
 - col_hu_seg_hsv 
 - col_hu_seg_lab 
 - col_hu_seg_rgb 
 - col_hu_yiq 
 - col_ngcm_rgb 
 - col_sm_hsv 
 - col_sm_lab 
 - col_sm_rgb 
 - col_sm_yiq 
 - text_gabor 
 - text_tamura 
 - edgeDB 
 - waveletDB
 
- Features used 
 - hist_phc_hsv 
 - hist_phc_rgb 
 - Hist_Grad_RGB 
 - haar_RGB 
 - haar_HSV 
 - haar_rgb 
 - haar_hmmd
 
11. Check list: Feature Selection
- Domain knowledge
- Commensurate features
 - Normalize features to an appropriate range (a rescaling sketch follows below)
 - Adaboost considers each feature independently, so it is not necessary to normalize them
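
For learners that are scale-sensitive, this is what a typical rescaling to a fixed range would look like (illustrative sketch only; as the slide notes, it is not needed for the Adaboost setup used here):

```python
import numpy as np

def minmax_normalize(X):
    """Rescale every feature column to the [0, 1] range."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero for constant features
    return (X - lo) / span
```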
12. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison
- Stable solution
 
13. Feature construction and space dimensionality reduction
- Clustering
- Correlation coefficient
- Supervised feature selection
- Filters
 
14. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
 - Features with the same value for all samples (variance = 0) were eliminated (see the sketch below)
 - From the linear features, 3583 were selected
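
A minimal sketch of this pruning step (illustrative; it assumes the features are stored in a NumPy matrix with one row per image, and the function name is ours):

```python
import numpy as np

def prune_constant_features(X):
    """Drop features whose value is identical for all samples (variance == 0)."""
    keep = X.var(axis=0) > 0
    return X[:, keep], np.flatnonzero(keep)

# Usage: X_reduced, kept_indices = prune_constant_features(X)
```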
 
15. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
 - When there is no assessment method, use a Variable Ranking method; in Adaboost this is not necessary
16. Variable Ranking
- A preprocessing step
 - Independent of the choice of the predictor
- Correlation criteria (see the ranking sketch below)
 - Can only detect linear dependencies
- Single variable classifiers
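
A minimal sketch of a correlation-based ranking criterion (illustrative only; it assumes NumPy, labels in {-1, +1}, and that constant features have already been pruned so the denominator is never zero):

```python
import numpy as np

def rank_by_correlation(X, y):
    """Rank features by |Pearson correlation| with the labels.
    This criterion only captures linear dependencies, as noted on the slide."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(-np.abs(r))   # indices of the best-ranked features first
```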
 
17. Variable Ranking
- Noise reduction and better classification may be obtained by adding variables that are presumably redundant
- Perfectly correlated variables are truly redundant in the sense that no additional information is gained by adding them; this does not mean there is no variable complementarity
- Two variables that are useless by themselves can be useful together
18. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison
- Stable solution
 
19. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison
- Stable solution
 
20. Adaboost Algorithm
- Given samples (x1, y1), …, (xn, yn), where xi ∈ X and yi ∈ {-1, 1}
- Initialize ω_{1,i} = 1/(2m) for yi = 1 and ω_{1,i} = 1/(2l) for yi = -1
- For t = 1, …, T:
 - Normalize ω_{t,i} ← ω_{t,i} / Σ_j ω_{t,j}
 - Train a base learner h_j for each feature using the distribution ω_t
 - Choose the h_t that minimizes the weighted error e_t
 - Set β_t = e_t / (1 - e_t) and α_t = log(1/β_t)
 - Update ω_{t+1,i} = ω_{t,i} · β_t^(1-e_i), where e_i = 0 if x_i is classified correctly and e_i = 1 otherwise
- Output the final classifier H(x) = sign(Σ_t α_t h_t(x))
21. Weak classifier
- Each weak classifier hi is defined as follows:
 - hi.pos_mean = mean feature value over the positive samples
 - hi.neg_mean = mean feature value over the negative samples
- A sample is classified as (see the sketch below):
 - +1 if it is closer to hi.pos_mean
 - -1 if it is closer to hi.neg_mean
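
A minimal sketch of this decision rule for a single feature (illustrative; the names weak_classify, pos_mean and neg_mean simply mirror the slide's notation):

```python
import numpy as np

def weak_classify(x_j, pos_mean, neg_mean):
    """Classify a feature value (or array of values) as +1 if it is closer to the
    positive-class mean, -1 if it is closer to the negative-class mean."""
    return np.where(np.abs(x_j - pos_mean) <= np.abs(x_j - neg_mean), 1, -1)

# Equivalent to thresholding at the midpoint of the two means, i.e. a linear classifier.
```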
 
22. Weak classifier
- hi.pos_mean = mean feature value over the positive samples
- hi.neg_mean = mean feature value over the negative samples
- A linear classifier was used
 
23. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison
- Stable solution
 
24. Adaboost experiments and results
- 10 positive samples
- 4 positive samples

25. Few positive samples
- Use of 4 positive samples

26. More positive samples
- Use of 10 positive samples
- False positive

27. Training data
- Use of 10 positive samples

28. Changing the number of training iterations
- The number of iterations used ranged from 5 to 50
- Iterations = 30 was chosen

29. Changing the sample size

30. Few negative samples
- Use of 15 negative samples

31. More negative samples
- Use of 75 negative samples
32. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison (ideas, time, computational resources, examples)
- Stable solution
 
33. Stable solution
- For Adaboost it is important to have a representative sample
- Chosen parameters:
 - Positive samples: 15
 - Negative samples: 100
 - Number of iterations: 30
 
34. Stable solution with more samples and iterations
Dinosaurs
Roses
Buses
Horses
Elephants
Buildings
Food
Humans
Mountains
Beaches 
35. Stable solution for Dinosaurs
- Use of:
 - 15 positive samples
 - 100 negative samples
 - 30 iterations

36. Stable solution for Roses
- Use of:
 - 15 positive samples
 - 100 negative samples
 - 30 iterations

37. Stable solution for Buses
- Use of:
 - 15 positive samples
 - 100 negative samples
 - 30 iterations

38. Stable solution for Beaches
- Use of:
 - 15 positive samples
 - 100 negative samples
 - 30 iterations

39. Stable solution for Food
- Use of:
 - 15 positive samples
 - 100 negative samples
 - 30 iterations
 
40. Unstable solution

41. Unstable solution for Roses
- Use of:
 - 5 positive samples
 - 10 negative samples
 - 30 iterations
 
42. Best features for classification
- Humans 
 - Beaches 
 - Buildings 
 - Buses 
 - Dinosaurs 
 - Elephants 
 - Roses 
 - Horses 
 - Mountains 
 - Food
 
44. Feature frequency
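
The feature-frequency chart summarized on this slide could be reproduced from the per-class ensembles along these lines (illustrative sketch only; it assumes the output format of the adaboost_train sketch shown earlier):

```python
from collections import Counter

def feature_frequency(ensembles):
    """Count how often each feature index is selected across all class ensembles.
    `ensembles` maps a class name to the list returned by adaboost_train."""
    counts = Counter()
    for rounds in ensembles.values():
        counts.update(j for (j, _, _, _) in rounds)
    return counts.most_common()   # (feature_index, count) pairs, most frequent first
```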
45. Extensions
- Searching for similar images (see the sketch below):
 - Pairs of images are built
 - The difference for each feature is calculated
 - Each difference vector is classified as:
  - +1 if both images belong to the same class
  - -1 if both images belong to different classes
- Multiclass Adaboost
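
A minimal sketch of how such pair-difference training data could be built (illustrative only; it assumes NumPy, a feature matrix X with one row per image, and per-image class labels in labels):

```python
import numpy as np
from itertools import combinations

def pair_differences(X, labels):
    """Build |feature difference| vectors for all image pairs, labelled +1 when the
    two images share a class and -1 otherwise."""
    diffs, targets = [], []
    for i, j in combinations(range(len(X)), 2):
        diffs.append(np.abs(X[i] - X[j]))
        targets.append(1 if labels[i] == labels[j] else -1)
    return np.array(diffs), np.array(targets)

# The resulting (diffs, targets) could then be fed to the same Adaboost training loop.
```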
 
46. Extensions
- Use of another weak classifier
 - Design weak classifiers that use multiple features → classifier fusion
 - Use a different weak classifier such as an SVM, a NN, a threshold function, etc.
- A different feature selection method: SVM
 
47. Discussion
- It is important to add feature selection to image retrieval
- A good methodology for selecting features should be used
- Adaboost is a learning algorithm → it is data dependent
 - It is important to have representative samples
- Adaboost can help to improve the classification potential of simple algorithms