Title: Fuzzy Entropy based feature selection for classification of hyperspectral data
1. Fuzzy Entropy based feature selection for classification of hyperspectral data
Mahesh Pal
Department of Civil Engineering, National Institute of Technology, Kurukshetra
2. Hyperspectral data
- Measurement of radiation in the visible to infrared spectral region in many finely spaced spectral wavebands.
- Provides greater detail on the spectral variation of targets than conventional multispectral systems.
- The availability of large amounts of data represents a challenge to classification analyses.
- Each spectral waveband used in the classification process should add an independent set of information. In practice, however, the features are highly correlated, suggesting a degree of redundancy in the available information, which can have a negative impact on classification accuracy.
3. An example
MULTISPECTRAL DATA (e.g. Landsat 7)
- Discrete wavebands, for example Band 1: 0.45-0.515 µm and Band 2: 0.525-0.605 µm.
- A total of six bands between 0.45 and 2.35 µm.
HYPERSPECTRAL DATA (e.g. DAIS)
- Continuous bands at 10-45 nm bandwidth.
- A total of 72 bands between 0.502 and 2.395 µm.
Spectral regions: 0.4-0.7 µm visible; 0.7-1.3 µm NIR; 1.3-3.0 µm MIR; 3-100 µm thermal.
4. Various approaches could be adopted for the appropriate classification of high-dimensional data:
- Adoption of a classifier that is relatively insensitive to the Hughes effect (Vapnik, 1995).
- Use of methods that effectively increase the training set size, e.g. semi-supervised classification (Chi and Bruzzone, 2005) and the use of unlabelled data (Shahshahani and Landgrebe, 1994).
- Use of some form of dimensionality reduction procedure prior to the classification analysis.
5. Feature reduction
- Two broad categories are feature selection and feature extraction.
- Feature reduction may speed up the classification process by reducing data set size.
- It may increase the predictive accuracy.
- It may increase the ability to understand the classification rules.
- Feature selection selects a subset of the original features that maintains the information useful for separating the classes, by removing redundant features.
6. Feature selection
Three approaches to feature selection are:
- Filters: use a search algorithm to search through the space of possible features and evaluate each feature using a filter such as correlation or mutual information.
- Wrappers: use a search algorithm to search through the space of possible feature subsets and evaluate each subset using a classification algorithm.
- Embedded: some classification processes, such as random forest, produce a ranked list of features during classification.
This study aims to explore the usefulness of four filter based feature selection approaches.
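A filter, as described above, scores each band independently against the class labels and ranks them without running a classifier. The sketch below is illustrative only (the data and the choice of mutual information over discretised feature values are assumptions, not the exact filters used in the study):

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """I(F; C) in bits for one discretised feature against the class labels."""
    n = len(labels)
    count_f = Counter(feature)
    count_c = Counter(labels)
    count_fc = Counter(zip(feature, labels))
    # sum over joint cells: p(f,c) * log2( p(f,c) / (p(f) * p(c)) )
    return sum(n_fc / n * math.log2(n_fc * n / (count_f[f] * count_c[c]))
               for (f, c), n_fc in count_fc.items())

def rank_features(X, y):
    """Return feature (band) indices sorted by decreasing mutual information."""
    scores = [(mutual_information([row[j] for row in X], y), j)
              for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, key=lambda t: -t[0])]

# Toy example: feature 0 predicts the class perfectly, feature 1 is constant.
ranking = rank_features([[0, 1], [0, 1], [1, 1], [1, 1]], [0, 0, 1, 1])
```

A wrapper would instead retrain the classifier on each candidate subset, which is far more expensive on 65 bands.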
7. Feature selection approaches
Four filter based feature selection approaches were used:
- Entropy
- Fuzzy entropy
- Signal-to-noise ratio
- RELIEF
8. Entropy and Fuzzy Entropy
For a finite set X, if P is the probability distribution on X, Yager's entropy is defined on P.
For a given fuzzy information system (U, A, V, f) (Hu and Yu, 2005), where U is a finite set of objects and A is the set of features: if Q is a subset of the attribute set A, a fuzzy relation matrix is induced on U by the indiscernibility relation of Q. The significance of an attribute a is then defined with respect to Q; if the significance of a is zero, attribute a is considered redundant. Further details of this algorithm can be found in Hu and Yu (2005).
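The slide's formulas were not transcribed, so the sketch below only illustrates one common formulation of the entropy of a fuzzy indiscernibility relation: the fuzzy equivalence class of an object is taken to have cardinality equal to its row sum in the relation matrix, the relation for a union of attribute subsets is the elementwise minimum, and significance is the entropy increase on adding an attribute. These are assumptions; consult Hu and Yu (2005) for the exact definitions:

```python
import math

def fuzzy_entropy(M):
    """Entropy of an n x n fuzzy indiscernibility relation matrix M, where
    M[i][j] in [0, 1] is the similarity of objects i and j.
    H = -(1/n) * sum_i log2(|[x_i]| / n), with |[x_i]| = sum_j M[i][j]."""
    n = len(M)
    return -sum(math.log2(sum(row) / n) for row in M) / n

def joint_relation(M1, M2):
    """Relation induced by the union of two attribute subsets: elementwise min."""
    return [[min(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(M1, M2)]

def significance(M_Q, M_a):
    """Entropy gained by adding attribute a to subset Q; zero marks a as
    redundant with respect to Q, as stated on the slide."""
    return fuzzy_entropy(joint_relation(M_Q, M_a)) - fuzzy_entropy(M_Q)

# Crisp identity relation (all objects distinct): entropy log2(n).
h = fuzzy_entropy([[1, 0], [0, 1]])
```

Note that the crisp identity relation gives the maximum entropy log2(n), while the all-ones relation (all objects indiscernible) gives zero.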
9. Signal to noise ratio
This approach ranks all features according to how well each feature discriminates between two classes. To apply it to a multiclass classification problem, a one-against-one approach was used in this study.
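A common (Golub-style) form of the signal-to-noise score for one feature and one class pair is |mean_a - mean_b| / (stdev_a + stdev_b). The study does not state how the pairwise scores were combined across the one-against-one class pairs, so taking the maximum in the sketch below is an assumption:

```python
import statistics
from itertools import combinations

def s2n_ratio(values_a, values_b):
    """Signal-to-noise score of one feature for one class pair; larger
    values mean better separation between the two classes."""
    mu_a, mu_b = statistics.mean(values_a), statistics.mean(values_b)
    sd_a, sd_b = statistics.stdev(values_a), statistics.stdev(values_b)
    return abs(mu_a - mu_b) / (sd_a + sd_b)

def rank_by_s2n(X, y):
    """Rank features by their best pairwise S2N score over all
    one-against-one class pairs (max aggregation is a hypothetical choice)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        best = 0.0
        for c1, c2 in combinations(classes, 2):
            a = [v for v, lab in zip(col, y) if lab == c1]
            b = [v for v, lab in zip(col, y) if lab == c2]
            best = max(best, s2n_ratio(a, b))
        scores.append((best, j))
    return [j for _, j in sorted(scores, key=lambda t: -t[0])]

# Toy example: feature 0 separates the classes, feature 1 is pure noise.
ranking = rank_by_s2n([[0.0, 5.0], [0.1, 4.0], [1.0, 5.0], [1.1, 4.0]],
                      [0, 0, 1, 1])
```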
10. RELIEF
- The general idea of RELIEF is to choose the features that best distinguish between classes.
- At each step of an iterative process, an instance is chosen at random from the dataset and the weight of each feature is updated according to the distance of this instance to its near-miss and near-hit (Kira and Rendell, 1992).
- An instance from the dataset is a near-hit of X if it belongs to the close neighbourhood of X and belongs to the same class as X.
- An instance is a near-miss if it belongs to the neighbourhood of X but not to the same class as X.
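The iterative weight update above can be sketched as follows. This is a minimal two-class version with a single nearest hit and miss; the toy data are made up and features are assumed scaled to [0, 1] so distances are comparable:

```python
import random

def relief(X, y, n_iter=20, seed=0):
    """Basic RELIEF (Kira and Rendell, 1992) feature weights for a
    two-class problem; higher weight = more discriminating feature."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    w = [0.0] * m

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    for _ in range(n_iter):
        i = rng.randrange(n)
        xi, yi = X[i], y[i]
        # near-hit: closest instance of the same class;
        # near-miss: closest instance of a different class
        hit = min((X[k] for k in range(n) if k != i and y[k] == yi),
                  key=lambda z: dist(xi, z))
        miss = min((X[k] for k in range(n) if y[k] != yi),
                   key=lambda z: dist(xi, z))
        for j in range(m):
            # reward separation from the near-miss, penalise spread
            # within the class (distance to the near-hit)
            w[j] += abs(xi[j] - miss[j]) - abs(xi[j] - hit[j])
    return w

# Feature 0 discriminates the classes; feature 1 is constant.
weights = relief([[0.0, 0.5], [0.1, 0.5], [1.0, 0.5], [0.9, 0.5]],
                 [0, 0, 1, 1])
```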
11. Data set
- DAIS 7915 sensor of the German Space Agency, flown on 29 June 2000.
- The sensor acquires information in 79 bands at a spatial resolution of 5 m over the wavelength range 0.502-12.278 µm.
- Seven features located in the mid- and thermal-infrared region, and seven features from the spectral region 0.502-2.395 µm affected by striping noise, were removed.
- An area of 512 by 512 pixels and 65 features covering the test site was used.
13. Training and test data
- Random sampling was used to collect training and test data from a ground reference image.
- Eight land cover classes: wheat, water, salt lake, hydrophytic vegetation, vineyards, bare soil, pasture and built-up land.
- A total of 800 training pixels and 3800 test pixels were used.
14. Classification method
- Support vector machines with a one-against-one approach for multiclass data were used.
- A radial basis function kernel was used.
- Regularisation parameter C = 5000 and kernel parameter gamma = 2 were used.
- For each feature selection approach, classification accuracy on the test dataset was obtained.
- A test for non-inferiority based on the McNemar test was applied.
15. Selected features with different feature selection approaches
- Entropy: 32, 51, 63, 35, 8, 49, 42, 27, 48, 64, 6, 50, 65, 11, 53, 39, 22
- Fuzzy entropy: 32, 41, 50, 6, 27, 63, 36, 49, 10, 22, 65, 51, 40, 48
- Relief: 3, 4, 2, 11, 10, 5, 8, 6, 9, 7, 12, 1, 13, 23, 22, 25, 24, 20, 31, 30
- Signal to noise ratio: 5, 7, 8, 9, 6, 10, 11, 4, 12, 3, 32, 31, 33, 30, 24, 23, 25, 29, 13, 26
16. Classification accuracy with the SVM classifier for the different selected feature sets
- No feature selection: 65 features, accuracy 91.76%
- Fuzzy entropy: 14 features, accuracy 91.68%
- Entropy: 17 features, accuracy 91.61%
- Signal to noise ratio: 20 features, accuracy 91.68%
- Relief: 20 features, accuracy 88.61%
17. Difference and non-inferiority test results, based on the 95% confidence interval on the estimated difference between the accuracy achieved with all 65 features and that achieved with the feature sets selected using each approach (at the 0.05 level of significance).
- 65 features (no selection): accuracy 91.76%, difference 0.00%, 95% CI 0.000-0.000
- 14 features (fuzzy entropy): accuracy 91.68%, difference 0.36%, 95% CI 0.071-0.089, non-inferior
- 17 features (entropy): accuracy 91.61%, difference 0.13%, 95% CI 0.142-0.158, non-inferior
- 20 features (signal to noise ratio): accuracy 91.68%, difference 0.26%, 95% CI 0.071-0.089, non-inferior
- 20 features (Relief): accuracy 88.61%, difference 3.00%, 95% CI 3.140-3.160, inferior
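The non-inferiority decision can be sketched as a confidence interval on the paired accuracy difference built from McNemar's discordant counts: the pixels that only one of the two classifiers labelled correctly. The counts and the 1% margin below are hypothetical, not those of the study:

```python
import math

def paired_accuracy_ci(n01, n10, n, z=1.96):
    """95% Wald CI on (acc_A - acc_B) for two classifiers evaluated on the
    same n test pixels. n01 = pixels only classifier A got right,
    n10 = pixels only classifier B got right (McNemar's discordant counts)."""
    diff = (n01 - n10) / n
    se = math.sqrt(n01 + n10 - (n01 - n10) ** 2 / n) / n
    return diff - z * se, diff + z * se

def non_inferior(ci, margin=0.01):
    """A is non-inferior to B if the lower CI bound on (acc_A - acc_B)
    exceeds -margin (a hypothetical 1% tolerance here)."""
    return ci[0] > -margin

# Hypothetical counts on 3800 test pixels: nearly balanced disagreements
ci_close = paired_accuracy_ci(10, 13, 3800)      # close accuracies
ci_worse = paired_accuracy_ci(0, 120, 3800)      # clearly worse classifier
```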
18. Conclusions
- The fuzzy entropy based feature selection approach works well with this dataset, providing comparable performance with a small number of selected features.
- The accuracy achieved by the signal to noise ratio and entropy based approaches is also comparable to that achieved with the full dataset, but these approaches require more selected features than the fuzzy entropy based approach.
- Results with the Relief based approach show a significant decline in classification accuracy in comparison with the full dataset.