Title: Feature Selection for Image Retrieval
1. Feature Selection for Image Retrieval
- By Karina Zapién Arreola
- January 21st, 2005
 
2. Introduction
- Variable and feature selection have become the focus of much research in areas of application for which datasets with many variables are available:
 - Text processing
 - Gene expression
 - Combinatorial chemistry
 
3. Motivation
- The objective of feature selection is three-fold:
 - Improving the prediction performance of the predictors
 - Providing faster and more cost-effective predictors
 - Providing a better understanding of the underlying process that generated the data
4. Why use feature selection in CBIR?
- Different users may need different features for image retrieval
- From each selected sample, a specific feature set can be chosen
5. Boosting
- Method for improving the accuracy of any learning algorithm:
 - Use of weak algorithms for single rules
 - Weighting of the weak algorithms
 - Combination of weak rules into a strong learning algorithm
6. Adaboost Algorithm
- Adaboost is an iterative boosting algorithm
- Notation:
 - Samples (x1, y1), …, (xn, yn), where yi ∈ {-1, 1}
 - There are m positive samples and l negative samples
 - Weak classifiers h_j
 - For iteration t, the error is defined as e_t = min_j (1/2) Σ_i ω_i |h_j(x_i) - y_i|, where ω_i is the weight of x_i
 
7. Adaboost Algorithm
- Given samples (x1, y1), …, (xn, yn), where yi ∈ {-1, 1}
- Initialize ω_{1,i} = 1/(2m) for yi = 1 and ω_{1,i} = 1/(2l) for yi = -1
- For t = 1, …, T:
 - Normalize ω_{t,i} ← ω_{t,i} / Σ_j ω_{t,j}
 - Train a base learner h_j for each feature using the distribution ω_t
 - Choose the h_t that minimizes the weighted error e_t
 - Set β_t = e_t / (1 - e_t) and α_t = log(1/β_t)
 - Update ω_{t+1,i} = ω_{t,i} · β_t^(1-e_i), where e_i = 0 if x_i is classified correctly and e_i = 1 otherwise
- Output the final classifier H(x) = sign(Σ_t α_t h_t(x)) (a Python sketch of this loop follows below)
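
A minimal Python sketch of this training loop (illustrative only; it assumes NumPy, labels in {-1, +1}, and the nearest-class-mean weak classifier described on the weak-classifier slides; the names adaboost_train and adaboost_predict are ours, not part of the original work):

```python
import numpy as np

def adaboost_train(X, y, T=30):
    """Illustrative AdaBoost loop: X is (n_samples, n_features), y holds labels
    in {-1, +1}, T is the number of boosting rounds."""
    n, d = X.shape
    m = np.sum(y == 1)                                    # positive samples
    l = np.sum(y == -1)                                   # negative samples
    w = np.where(y == 1, 1.0 / (2 * m), 1.0 / (2 * l))    # omega_{1,i}

    ensemble = []
    for t in range(T):
        w = w / w.sum()                                   # normalize to a distribution

        best = None
        for j in range(d):                                # one weak learner per feature
            pos_mean = np.average(X[y == 1, j], weights=w[y == 1])
            neg_mean = np.average(X[y == -1, j], weights=w[y == -1])
            pred = np.where(np.abs(X[:, j] - pos_mean) <= np.abs(X[:, j] - neg_mean), 1, -1)
            err = np.sum(w * (pred != y))                 # weighted error of h_j
            if best is None or err < best[0]:
                best = (err, j, pos_mean, neg_mean, pred)

        e_t, j, pos_mean, neg_mean, pred = best
        e_t = np.clip(e_t, 1e-10, 1 - 1e-10)              # guard against division by zero
        beta = e_t / (1 - e_t)
        alpha = np.log(1.0 / beta)
        e_i = (pred != y).astype(float)                   # 0 if correct, 1 otherwise
        w = w * beta ** (1 - e_i)                         # omega_{t+1,i}
        ensemble.append((j, pos_mean, neg_mean, alpha))
    return ensemble

def adaboost_predict(ensemble, X):
    """Strong classifier H(x) = sign(sum_t alpha_t * h_t(x))."""
    score = np.zeros(X.shape[0])
    for j, pos_mean, neg_mean, alpha in ensemble:
        score += alpha * np.where(np.abs(X[:, j] - pos_mean) <= np.abs(X[:, j] - neg_mean), 1, -1)
    return np.sign(score)
```

Because each weak learner looks at exactly one feature, the features chosen across the T rounds are the selected feature subset.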
8. Adaboost Application
- Searching for similar groups:
 - A particular image class is chosen
 - A positive sample from this group is drawn randomly
 - A negative sample from the rest of the images is drawn randomly (see the sampling sketch below)
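
A minimal sketch of this sampling step (illustrative only; the function name and arguments are ours, and the 15/100 defaults echo the "stable solution" parameters given later in the talk):

```python
import numpy as np

def sample_training_set(labels, target_class, n_pos=15, n_neg=100, seed=None):
    """Randomly draw positive samples from the chosen class and negative samples
    from all remaining images; returns their indices and the {-1, +1} labels."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(labels == target_class)
    neg_idx = np.flatnonzero(labels != target_class)
    pos = rng.choice(pos_idx, size=n_pos, replace=False)   # assumes enough positives exist
    neg = rng.choice(neg_idx, size=n_neg, replace=False)
    idx = np.concatenate([pos, neg])
    y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])
    return idx, y
```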
9. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison
- Stable solution
 
10. Domain knowledge
- Features used 
 - colordb_sumRGB_entropy_d1 
 - col_gpd_hsv 
 - col_gpd_lab 
 - col_gpd_rgb 
 - col_hu_hsv2 
 - col_hu_lab2 
 - col_hu_lab 
 - col_hu_rgb2 
 - col_hu_rgb 
 - col_hu_seg2_hsv 
 - col_hu_seg2_lab 
 - col_hu_seg2_rgb
 
- Features used 
 - col_hu_seg_hsv 
 - col_hu_seg_lab 
 - col_hu_seg_rgb 
 - col_hu_yiq 
 - col_ngcm_rgb 
 - col_sm_hsv 
 - col_sm_lab 
 - col_sm_rgb 
 - col_sm_yiq 
 - text_gabor 
 - text_tamura 
 - edgeDB 
 - waveletDB
 
- Features used 
 - hist_phc_hsv 
 - hist_phc_rgb 
 - Hist_Grad_RGB 
 - haar_RGB 
 - haar_HSV 
 - haar_rgb 
 - haar_hmmd
 
11. Check list: Feature Selection
- Domain knowledge
- Commensurate features
 - Normalize features to an appropriate range (a rescaling sketch follows below)
 - Adaboost considers each feature independently, so it is not necessary to normalize them
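
For learners that are scale-sensitive, this is what a typical rescaling to a fixed range would look like (illustrative sketch only; as the slide notes, it is not needed for the Adaboost setup used here):

```python
import numpy as np

def minmax_normalize(X):
    """Rescale every feature column to the [0, 1] range."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero for constant features
    return (X - lo) / span
```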
12. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison
- Stable solution
 
13. Feature construction and space dimensionality reduction
- Clustering
- Correlation coefficient
- Supervised feature selection
- Filters
 
14. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
 - Features with the same value for all samples (variance = 0) were eliminated (see the sketch below)
 - From the linear features, 3583 were selected
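
A minimal sketch of this pruning step (illustrative; it assumes the features are stored in a NumPy matrix with one row per image, and the function name is ours):

```python
import numpy as np

def prune_constant_features(X):
    """Drop features whose value is identical for all samples (variance == 0)."""
    keep = X.var(axis=0) > 0
    return X[:, keep], np.flatnonzero(keep)

# Usage: X_reduced, kept_indices = prune_constant_features(X)
```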
 
15. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
 - When there is no assessment method, use a Variable Ranking method; in Adaboost this is not necessary
16. Variable Ranking
- A preprocessing step
 - Independent of the choice of the predictor
- Correlation criteria (see the ranking sketch below)
 - Can only detect linear dependencies
- Single variable classifiers
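
A minimal sketch of a correlation-based ranking criterion (illustrative only; it assumes NumPy, labels in {-1, +1}, and that constant features have already been pruned so the denominator is never zero):

```python
import numpy as np

def rank_by_correlation(X, y):
    """Rank features by |Pearson correlation| with the labels.
    This criterion only captures linear dependencies, as noted on the slide."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(-np.abs(r))   # indices of the best-ranked features first
```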
 
17. Variable Ranking
- Noise reduction and better classification may be obtained by adding variables that are presumably redundant
- Perfectly correlated variables are truly redundant in the sense that no additional information is gained by adding them; this does not mean there is no variable complementarity
- Two variables that are useless by themselves can be useful together
18. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison
- Stable solution
 
19. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison
- Stable solution
 
20. Adaboost Algorithm
- Given samples (x1, y1), …, (xn, yn), where xi ∈ X and yi ∈ {-1, 1}
- Initialize ω_{1,i} = 1/(2m) for yi = 1 and ω_{1,i} = 1/(2l) for yi = -1
- For t = 1, …, T:
 - Normalize ω_{t,i} ← ω_{t,i} / Σ_j ω_{t,j}
 - Train a base learner h_j for each feature using the distribution ω_t
 - Choose the h_t that minimizes the weighted error e_t
 - Set β_t = e_t / (1 - e_t) and α_t = log(1/β_t)
 - Update ω_{t+1,i} = ω_{t,i} · β_t^(1-e_i), where e_i = 0 if x_i is classified correctly and e_i = 1 otherwise
- Output the final classifier H(x) = sign(Σ_t α_t h_t(x))
21. Weak classifier
- Each weak classifier hi is defined as follows:
 - hi.pos_mean = mean feature value over the positive samples
 - hi.neg_mean = mean feature value over the negative samples
- A sample is classified as (see the sketch below):
 - +1 if it is closer to hi.pos_mean
 - -1 if it is closer to hi.neg_mean
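
A minimal sketch of this decision rule for a single feature (illustrative; the names weak_classify, pos_mean and neg_mean simply mirror the slide's notation):

```python
import numpy as np

def weak_classify(x_j, pos_mean, neg_mean):
    """Classify a feature value (or array of values) as +1 if it is closer to the
    positive-class mean, -1 if it is closer to the negative-class mean."""
    return np.where(np.abs(x_j - pos_mean) <= np.abs(x_j - neg_mean), 1, -1)

# Equivalent to thresholding at the midpoint of the two means, i.e. a linear classifier.
```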
 
22. Weak classifier
- hi.pos_mean = mean feature value over the positive samples
- hi.neg_mean = mean feature value over the negative samples
- A linear classifier was used
 
23. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison
- Stable solution
 
24. Adaboost experiments and results
- 10 positive samples
- 4 positive samples

25. Few positive samples
- Use of 4 positive samples

26. More positive samples
- Use of 10 positive samples
- False positive

27. Training data
- Use of 10 positive samples

28. Changing the number of training iterations
- The number of iterations used ranged from 5 to 50
- Iterations = 30 was chosen

29. Changing the sample size

30. Few negative samples
- Use of 15 negative samples

31. More negative samples
- Use of 75 negative samples
32. Check list: Feature Selection
- Domain knowledge
- Commensurate features
- Interdependence of features
- Pruning of input variables
- Assess features individually
- Dirty data
- Predictor: linear predictor
- Comparison (ideas, time, computational resources, examples)
- Stable solution
 
33. Stable solution
- For Adaboost it is important to have a representative sample
- Chosen parameters:
 - Positive samples: 15
 - Negative samples: 100
 - Number of iterations: 30
 
34. Stable solution with more samples and iterations
Dinosaurs
Roses
Buses
Horses
Elephants
Buildings
Food
Humans
Mountains
Beaches 
35. Stable solution for Dinosaurs
- Use of:
 - 15 positive samples
 - 100 negative samples
 - 30 iterations

36. Stable solution for Roses
- Use of:
 - 15 positive samples
 - 100 negative samples
 - 30 iterations

37. Stable solution for Buses
- Use of:
 - 15 positive samples
 - 100 negative samples
 - 30 iterations

38. Stable solution for Beaches
- Use of:
 - 15 positive samples
 - 100 negative samples
 - 30 iterations

39. Stable solution for Food
- Use of:
 - 15 positive samples
 - 100 negative samples
 - 30 iterations
 
40. Unstable solution

41. Unstable solution for Roses
- Use of:
 - 5 positive samples
 - 10 negative samples
 - 30 iterations
 
42. Best features for classification
- Humans 
 - Beaches 
 - Buildings 
 - Buses 
 - Dinosaurs 
 - Elephants 
 - Roses 
 - Horses 
 - Mountains 
 - Food
 
44. Feature frequency
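
The feature-frequency chart summarized on this slide could be reproduced from the per-class ensembles along these lines (illustrative sketch only; it assumes the output format of the adaboost_train sketch shown earlier):

```python
from collections import Counter

def feature_frequency(ensembles):
    """Count how often each feature index is selected across all class ensembles.
    `ensembles` maps a class name to the list returned by adaboost_train."""
    counts = Counter()
    for rounds in ensembles.values():
        counts.update(j for (j, _, _, _) in rounds)
    return counts.most_common()   # (feature_index, count) pairs, most frequent first
```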
45. Extensions
- Searching for similar images (see the sketch below):
 - Pairs of images are built
 - The difference for each feature is calculated
 - Each difference vector is classified as:
  - +1 if both images belong to the same class
  - -1 if both images belong to different classes
- Multiclass Adaboost
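
A minimal sketch of how such pair-difference training data could be built (illustrative only; it assumes NumPy, a feature matrix X with one row per image, and per-image class labels in labels):

```python
import numpy as np
from itertools import combinations

def pair_differences(X, labels):
    """Build |feature difference| vectors for all image pairs, labelled +1 when the
    two images share a class and -1 otherwise."""
    diffs, targets = [], []
    for i, j in combinations(range(len(X)), 2):
        diffs.append(np.abs(X[i] - X[j]))
        targets.append(1 if labels[i] == labels[j] else -1)
    return np.array(diffs), np.array(targets)

# The resulting (diffs, targets) could then be fed to the same Adaboost training loop.
```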
 
46. Extensions
- Use of another weak classifier
 - Design weak classifiers that use multiple features → classifier fusion
 - Use a different weak classifier such as an SVM, a NN, a threshold function, etc.
- A different feature selection method: SVM
 
47. Discussion
- It is important to add feature selection to image retrieval
- A good methodology for selecting features should be used
- Adaboost is a learning algorithm → it is data dependent
 - It is important to have representative samples
- Adaboost can help to improve the classification potential of simple algorithms