Title: HSI Course
1 Introduction to Hyperspectral Imaging (HSI):
Feature Extraction Methods
Dr. Richard B. Gomez, Instructor
2 Outline
- What is Hyperspectral Image Data?
- Interpretation of Digital Image Data
- Pixel Classification
- HSI Data Processing Techniques
- Methods and Algorithms (Continued)
- Principal Component Analysis
- Unmixing Pixel Problem
- Spectral Mixing Analysis
- Other
- Feature Extraction Techniques
- N-dimensional Exploitation
- Cluster Analysis
3 What is Hyperspectral Image Data?
- Hyperspectral image data is image data that is:
  - In digital form, i.e., a picture that a computer can read, manipulate, store, and display
  - Spatially quantized into picture elements (pixels)
  - Radiometrically quantized into discrete brightness levels
- It can be in the form of Radiance, Apparent Reflectance, True Reflectance, or Digital Number
4 Difference Between Radiance and Reflectance
- Radiance is the variable directly measured by remote sensing instruments
- Radiance has units of watts/steradian/square meter
- Reflectance is the ratio of the amount of light leaving a target to the amount of light striking the target
- Reflectance has no units
- Reflectance is a property of the material being observed
- Radiance depends on the illumination (both its intensity and direction), the orientation and position of the target, and the path of the light through the atmosphere
- Atmospheric effects and solar illumination can be compensated for in digital remote sensing data. This yields what is called "apparent reflectance," which differs from true reflectance in that shadows and directional effects on reflectance have not been dealt with
5 Interpretation of Digital Image Data
- Qualitative Approach: photointerpretation by a human analyst/interpreter
  - On a scale large relative to pixel size
  - Limited multispectral analysis
  - Inaccurate area estimates
  - Limited use of brightness levels
- Quantitative Approach: analysis by computer
  - At the individual pixel level
  - Accurate area estimates possible
  - Exploits all brightness levels
  - Can perform true multidimensional analysis
6 Data Space Representations
- Spectral Signatures - Physical Basis for Response
- N-Dimensional Space - For Use in Pattern Analysis
7 Hyperspectral Imaging Barriers
- Scene - The most complex and dynamic part
- Sensor - Also not under the analyst's control
- Processing System - The analyst's choices
8 Finding Optimal Feature Subspaces: HSI Data Analysis Scheme
- Discriminant Analysis Feature Extraction (DAFE)
- Decision Boundary Feature Extraction (DBFE)
Available in MultiSpec via WWW at http://dynamo.ecn.purdue.edu/biehl/MultiSpec/
Additional documentation via WWW at http://dynamo.ecn.purdue.edu/landgreb/publications.html
After David Landgrebe, Purdue University
9 Dimension Space Reduction
10 Pixel Classification
- Labeling the pixels as belonging to particular spectral classes using the spectral data available
- The terms classification, allocation, categorization, and labeling are generally used synonymously
- The two broad classes of classification procedure are supervised classification and unsupervised classification
- Hybrid supervised/unsupervised methods are also available
11-12 Pixel Classification (figures)
13 Classification Techniques
- Unsupervised
- Supervised
- Hybrid
14-21 Classification (figures)
22 Classifier Options
- Other types - nonparametric:
  - Parzen window estimators
  - Fuzzy set-based
  - Neural network implementations
  - K Nearest Neighbor (K-NN)
  - etc.
23 Classification Algorithms
- Linear Spectral Unmixing (LSU)
  - Generates maps of the fraction of each endmember in a pixel
- Orthogonal Subspace Projection (OSP)
  - Suppresses background signatures and generates fraction maps like the LSU algorithm
- Spectral Angle Mapper (SAM)
  - Treats a spectrum like a vector; finds the angle between spectra
- Minimum Distance (MD)
  - A simple Gaussian Maximum Likelihood algorithm that does not use class probabilities
- Binary Encoding (BE) and Spectral Signature Matching (SSM)
  - Bit-compare simple binary codes calculated from spectra
24 Unsupervised Classification
- K-Means
  - Uses statistical techniques to group n-dimensional data into their natural spectral classes
  - The K-Means unsupervised classifier uses a cluster analysis approach that requires the analyst to select the number of clusters to be located in the data; it arbitrarily locates this number of cluster centers, then iteratively repositions them until optimal spectral separability is achieved (see the sketch below)
- ISODATA (Iterative Self-Organizing Data Analysis Technique)
  - ISODATA unsupervised classification calculates class means evenly distributed in the data space and then iteratively clusters the remaining pixels using minimum distance techniques
  - Each iteration recalculates means and reclassifies pixels with respect to the new means
  - This process continues until the number of pixels in each class changes by less than the selected pixel change threshold or the maximum number of iterations is reached
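To make the loop concrete, here is a minimal NumPy sketch of K-Means on pixel spectra. The function name, random-pixel initialization, and fixed iteration cap are illustrative choices for this example, not the MultiSpec or ENVI implementation:

```python
import numpy as np

def kmeans(pixels, n_clusters, n_iter=20, seed=0):
    """Cluster pixel spectra (n_pixels x n_bands) into n_clusters."""
    rng = np.random.default_rng(seed)
    # arbitrarily locate the initial cluster centers at randomly chosen pixels
    centers = pixels[rng.choice(len(pixels), n_clusters, replace=False)]
    for _ in range(n_iter):
        # assign each pixel to its nearest center (Euclidean distance)
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # reposition each center at the mean of the pixels assigned to it
        new = np.array([pixels[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(n_clusters)])
        if np.allclose(new, centers):   # converged: centers stopped moving
            break
        centers = new
    return labels, centers
```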
26 Supervised Classification
- Supervised classification requires that the user select training areas for use as the basis for classification
- Various comparison methods are then used to determine whether a specific pixel qualifies as a class member
- A broad range of classification methods, such as Parallelepiped, Maximum Likelihood, Minimum Distance, Mahalanobis Distance, Binary Encoding, and Spectral Angle Mapper, can be used
27 Parallelepiped
- Parallelepiped classification uses a simple decision rule to classify multidimensional spectral data
- The decision boundaries form an n-dimensional parallelepiped in the image data space
- The dimensions of the parallelepiped are defined based upon a standard deviation threshold from the mean of each selected class
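A minimal sketch of the parallelepiped rule, assuming per-class means and standard deviations estimated from training areas; the tie-breaking for overlapping boxes (first class wins) is a simplification made here:

```python
import numpy as np

def parallelepiped(pixels, class_means, class_stds, k=2.0):
    """pixels: (n_pixels, n_bands); class_means, class_stds: (n_classes, n_bands).
    A pixel joins the first class whose mean +/- k*std box contains it in every
    band; pixels falling inside no box stay unclassified (-1)."""
    labels = np.full(len(pixels), -1)
    for c, (m, s) in enumerate(zip(class_means, class_stds)):
        inside = np.all(np.abs(pixels - m) <= k * s, axis=1)
        labels[inside & (labels == -1)] = c
    return labels
```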
28 Maximum Likelihood
- Maximum likelihood classification assumes that the statistics for each class in each band are normally distributed
- The probability that a given pixel belongs to a specific class is then calculated
- Unless a probability threshold is selected, all pixels are classified
- Each pixel is assigned to the class that has the highest probability (i.e., the "maximum likelihood")
29 Minimum Distance
- The minimum distance classification uses the mean vectors of each region of interest (ROI)
- It calculates the Euclidean distance from each unknown pixel to the mean vector for each class
- All pixels are classified to the closest ROI class unless the user specifies standard deviation or distance thresholds, in which case some pixels may be unclassified if they do not meet the selected criteria
30 Euclidean Distance
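A minimal minimum-distance classifier built on the Euclidean distance of slide 30; the optional distance threshold mirrors the unclassified case described above:

```python
import numpy as np

def min_distance(pixels, class_means, max_dist=None):
    """Assign each pixel to the class with the nearest mean vector.
    If max_dist is given, pixels farther than it stay unclassified (-1)."""
    d = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    if max_dist is not None:
        labels[d.min(axis=1) > max_dist] = -1
    return labels
```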
31 Mahalanobis Distance
- The Mahalanobis Distance classification is a direction-sensitive distance classifier that uses statistics for each class
- It is similar to the Maximum Likelihood classification, but assumes all class covariances are equal and, therefore, is a faster method
- All pixels are classified to the closest ROI class unless the user specifies a distance threshold, in which case some pixels may be unclassified if they do not meet the threshold
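A minimal sketch, assuming a single pooled covariance matrix shared by all classes (the equal-covariance assumption noted above):

```python
import numpy as np

def mahalanobis_classify(pixels, class_means, pooled_cov, max_dist=None):
    """Assign each pixel to the class with the smallest Mahalanobis distance,
    using one covariance matrix shared by all classes."""
    Cinv = np.linalg.inv(pooled_cov)
    d2 = np.stack([np.einsum('ij,jk,ik->i', pixels - m, Cinv, pixels - m)
                   for m in class_means])        # squared distances, per class
    labels = d2.argmin(axis=0)
    if max_dist is not None:                     # optional distance threshold
        labels[np.sqrt(d2.min(axis=0)) > max_dist] = -1
    return labels
```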
32 Bhattacharyya Distance
B = (1/8)(m1 − m2)ᵀ [(C1 + C2)/2]⁻¹ (m1 − m2)   ← mean difference term
  + (1/2) ln[ |(C1 + C2)/2| / √(|C1| |C2|) ]    ← covariance term
(mi and Ci are the mean vector and covariance matrix of class i)
33 Binary Encoding Classification
- The binary encoding classification technique encodes the data and endmember spectra into 0s and 1s based on whether a band falls below or above the spectrum mean
- An exclusive OR function is used to compare each encoded reference spectrum with the encoded data spectra, and a classification image is produced
- All pixels are classified to the endmember with the greatest number of matching bands unless the user specifies a minimum match threshold, in which case some pixels may be unclassified if they do not meet the criteria
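A minimal sketch of binary encoding and XOR matching; the helper names and the optional minimum-match threshold are illustrative:

```python
import numpy as np

def binary_encode(spectra):
    """Encode each spectrum (rows) into 0/1 per band, relative to its own mean."""
    return (spectra > spectra.mean(axis=1, keepdims=True)).astype(np.uint8)

def be_classify(pixels, endmembers, min_match=None):
    """Match each encoded pixel against encoded endmembers; XOR counts mismatches."""
    p, e = binary_encode(pixels), binary_encode(endmembers)
    mismatches = np.array([(p ^ code).sum(axis=1) for code in e])
    labels = mismatches.argmin(axis=0)
    if min_match is not None:   # minimum number of matching bands required
        matches = pixels.shape[1] - mismatches.min(axis=0)
        labels[matches < min_match] = -1
    return labels
```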
35 Spectral Angle Mapper (SAM) Classification
- The Spectral Angle Mapper (SAM) is a physically based spectral classification that uses the n-dimensional angle to match pixels to reference spectra
- The algorithm determines the spectral similarity between two spectra by calculating the angle between them, treating the spectra as vectors in a space with dimensionality equal to the number of bands
- The SAM algorithm assumes that hyperspectral image data have been reduced to "apparent reflectance," with all dark current and path radiance biases removed
36 Spectral Angle Mapper (SAM) Algorithm
The SAM algorithm uses a reference spectrum, r, and the spectrum found at each pixel, t. The basic comparison algorithm finds the angle θ (where nb = number of bands in the image):
θ = cos⁻¹[ Σ ti ri / ( √(Σ ti²) √(Σ ri²) ) ],  sums over i = 1, ..., nb
or, equivalently, θ = cos⁻¹( t·r / (‖t‖ ‖r‖) )
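The formula transcribes directly into NumPy; the clipping guard and the max_angle threshold are additions for numerical safety and for the unclassified case:

```python
import numpy as np

def sam_angle(t, r):
    """Spectral angle (radians) between pixel spectrum t and reference r."""
    cos = np.dot(t, r) / (np.linalg.norm(t) * np.linalg.norm(r))
    return np.arccos(np.clip(cos, -1.0, 1.0))   # clip guards rounding error

def sam_classify(pixels, refs, max_angle=0.1):
    angles = np.array([[sam_angle(t, r) for r in refs] for t in pixels])
    labels = angles.argmin(axis=1)
    labels[angles.min(axis=1) > max_angle] = -1  # too dissimilar: unclassified
    return labels
```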
37 Minimum Noise Fraction (MNF) Transformation
- The minimum noise fraction (MNF) transformation is used to determine the inherent dimensionality of image data, to segregate noise in the data, and to reduce the computational requirements for subsequent processing
- The MNF transformation consists essentially of two cascaded Principal Components transformations
- The first transformation, based on an estimated noise covariance matrix, decorrelates and rescales the noise in the data. This first step results in transformed data in which the noise has unit variance and no band-to-band correlations
- The second step is a standard Principal Components transformation of the noise-whitened data
- For further spectral processing, the inherent dimensionality of the data is determined by examination of the final eigenvalues and the associated images
- The data space can be divided into two parts: one part associated with large eigenvalues and coherent eigenimages, and a complementary part with near-unity eigenvalues and noise-dominated images. By using only the coherent portions, the noise is separated from the data, thus improving spectral processing results
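A compact sketch of the two cascaded transformations, assuming a positive-definite noise covariance estimated elsewhere (e.g., from differences of adjacent pixels):

```python
import numpy as np

def mnf(data, noise_cov):
    """data: (n_pixels, n_bands). Returns MNF-transformed data and eigenvalues."""
    # Step 1: whiten the noise. F maps the noise covariance to the identity.
    w, V = np.linalg.eigh(noise_cov)
    F = V / np.sqrt(w)                   # columns scaled by 1/sqrt(eigenvalue)
    X = (data - data.mean(axis=0)) @ F   # noise now has unit variance, no correlation
    # Step 2: standard principal components of the noise-whitened data.
    evals, E = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(evals)[::-1]      # descending: large eigenvalues = coherent signal
    return X @ E[:, order], evals[order]
```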
38 N-Dimensional Visualization
- Spectra can be thought of as points in an n-dimensional scatterplot, where n is the number of bands
- The coordinates of the points in n-space consist of n values that are simply the spectral radiance or reflectance values in each band for a given pixel
- The distribution of these points in n-space can be used to estimate the number of spectral endmembers and their pure spectral signatures
39 Pixel Purity Index (PPI)
- The Pixel Purity Index (PPI) is a means of finding the most "spectrally pure," or extreme, pixels in multispectral and hyperspectral images
- PPI is computed by repeatedly projecting the n-dimensional scatterplot onto a random unit vector
- The extreme pixels in each projection are recorded, and the total number of times each pixel is marked as extreme is noted
- A PPI image is created in which the DN of each pixel corresponds to the number of times that pixel was recorded as extreme
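A minimal PPI sketch; the number of projections and the choice of recording only the single minimum and maximum pixel per projection are simplifications (thresholded extremes are also common):

```python
import numpy as np

def ppi(pixels, n_projections=1000, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(pixels), dtype=int)
    for _ in range(n_projections):
        u = rng.standard_normal(pixels.shape[1])
        u /= np.linalg.norm(u)       # random unit vector
        proj = pixels @ u
        counts[proj.argmin()] += 1   # extreme pixels of this projection
        counts[proj.argmax()] += 1
    return counts                    # the "PPI image" values
```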
40 Matched Filter Technique
- Matched filtering maximizes the response of a known endmember and suppresses the response of the composite unknown background, thus "matching" the known signature
- Provides a rapid means of detecting specific minerals based on matches to specific library or image endmember spectra
- Produces images similar to the unmixing technique, but with significantly less computation
- Results (values from 0 to 1) provide a means of estimating the relative degree of match to the reference spectrum, where 1 is a perfect match
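A minimal matched-filter sketch using the scene mean and covariance as the background model; it assumes more pixels than bands so the covariance is invertible:

```python
import numpy as np

def matched_filter(pixels, target):
    """pixels: (n_pixels, n_bands) scene data; target: (n_bands,) known spectrum.
    Returns one score per pixel, ~0 for background and ~1 for a perfect match."""
    m = pixels.mean(axis=0)
    Cinv = np.linalg.inv(np.cov(pixels, rowvar=False))
    d = target - m
    w = Cinv @ d / (d @ Cinv @ d)   # filter weights, scaled so the target scores 1
    return (pixels - m) @ w
```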
41 Spectral Mixing
- Natural surfaces are rarely composed of a single uniform material
- Spectral mixing occurs when materials with different spectral properties are represented by a single image pixel
- Researchers who have investigated mixing scales and linearity have found that, if the scale of the mixing is large (macroscopic), mixing occurs in a linear fashion
- For microscopic or intimate mixtures, the mixing is generally nonlinear
42 Mixed Spectra Models
- Mixed spectra effects can be formalized in three ways:
  - A physical model
  - A mathematical model
  - A geometric model
43 Mixed Spectra Physical Model
44 Mixed Spectra Mathematical Model
45 Mixed Spectra Geometric Model
46 Mixture Tuned Matched Filtering (MTMF)
- MTMF constrains the Matched Filtering result by treating pixels as mixtures of the composite unknown background and the known target
- MTMF produces the standard Matched Filter score images plus an additional set of images for each endmember: infeasibility images
- The best match to a target is obtained when the Matched Filter score is high (near 1) and the infeasibility score is low (near 0)
47 Principal Component Analysis (PCA)
- Calculation of new transformed variables (components) by a coordinate rotation
- Components are uncorrelated and ordered by decreasing variance
- The first component axis is aligned in the direction of the highest percentage of the total variance in the data
- Component axes are mutually orthogonal
- Maximum SNR and the largest percentage of total variance fall in the first component
48 Principal Component Analysis (PCA)
49 Principal Component Analysis (PCA) (Cont)
- The mean of the original data is the origin of the transformed system, with the transformed axes of each component mutually orthogonal
- To begin the transformation, the covariance matrix, C, is found. Using the covariance matrix, the eigenvalues, λi, are obtained from
  |C − λi I| = 0
- where i = 1, 2, ..., n (n is the total number of original images and I is an identity matrix)
50 Principal Component Analysis (PCA) (Cont)
- The eigenvalues, λi, are equal to the variance of each corresponding component image
- The eigenvectors, ei, define the axes of the components and are obtained from
  (C − λi I) ei = 0
- The principal components are then given as
  PC = T DN
- where DN is the digital number matrix of the original data and T is the (n x n) transformation matrix with matrix elements given by eij, i, j = 1, 2, 3, ..., n
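The two eigen-equations above translate directly into a short NumPy routine; eigh is used because the covariance matrix is symmetric:

```python
import numpy as np

def pca(data):
    """data: (n_pixels, n_bands). Returns components, eigenvalues, eigenvectors."""
    X = data - data.mean(axis=0)       # the mean becomes the new origin
    C = np.cov(X, rowvar=False)        # covariance matrix C
    evals, evecs = np.linalg.eigh(C)   # solves (C - lambda_i I) e_i = 0
    order = np.argsort(evals)[::-1]    # order by decreasing variance
    T = evecs[:, order]                # transformation matrix of eigenvectors
    return X @ T, evals[order], T
```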
51 A Matrix Equation
Problem: find the value of vector x from measurement of a different vector y, where they are related by the matrix equation
y = A x,  or  yi = Σj aij xj  (sum over j)
Note 1: If both A and x are known, it is trivial to find y
Note 2: In our problem, y is the measurement, A is determined from the physics of the problem, and we want to retrieve the value of x from y
52 Mean and Variance
- Mean: ⟨x⟩ = (1/N) Σ xk
- Variance: var(x) = (1/N) Σ (xk − ⟨x⟩)² = σx², where k = 1, 2, ..., N
53 Covariance
cov(x, y) = (1/N) Σ (xk − ⟨x⟩)(yk − ⟨y⟩) = (1/N) Σ xk yk − ⟨x⟩⟨y⟩
Note 1: cov(x, x) = var(x)
Note 2: If the mean values of x and y are zero, then cov(x, y) = (1/N) Σ xk yk
Note 3: Sums are over k = 1, 2, ..., N
54 Covariance Matrix
- Let x = (x1, x2, ..., xn) be a random vector with n components
- The covariance matrix of x is defined to be C = ⟨(x − μ)(x − μ)ᵀ⟩
- where μ = (μ1, μ2, ..., μn)ᵀ
- and μk = (1/N) Σ xmk
- Summation is over m = 1, 2, ..., N
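The definitions on the last three slides check out numerically; note that np.cov divides by N − 1 by default, so bias=True is needed to match the 1/N convention used here (the sample values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 1.0, 5.0, 8.0])
N = len(x)

mean_x = x.sum() / N                                  # <x>
var_x = ((x - mean_x) ** 2).sum() / N                 # var(x) = sigma_x^2
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / N  # cov(x, y)

assert np.isclose(cov_xy, (x * y).mean() - x.mean() * y.mean())  # <xy> - <x><y>
assert np.isclose(cov_xy, np.cov(x, y, bias=True)[0, 1])         # matches NumPy
```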
55 Gaussian Probability Distributions
- Many physical processes are well represented with Gaussian distributions, given by
  P(x) = (1/(√(2π) σx)) e^(−(x − ⟨x⟩)² / (2σx²))
- Given the mean and variance of a Gaussian random variable, it is possible to evaluate all of the higher moments
- The form of the Gaussian is analytically simple
56 Normal (Gaussian) Distribution
57 Scatterplots
58 Spectral Signatures
Laboratory data: two classes of vegetation
59 Discrete (Feature) Space
60 Hughes Effect
G. F. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Trans. Inform. Theory, vol. IT-14, pp. 55-63, 1968.
61 Higher Dimensional Space Implications
- High dimensional space is mostly empty; data in high dimensional space lie mostly in a lower dimensional structure
- Normally distributed data will have a tendency to concentrate in the tails; uniformly distributed data will concentrate in the corners
62 Higher Dimensional Space Geometry
- The number of labeled samples needed for supervised classification increases rapidly with dimensionality
- In a specific instance, it has been shown that the number of samples required increases linearly with dimensionality for a linear classifier and as the square for a quadratic classifier; it has been estimated that the number increases exponentially for a non-parametric classifier
- For most high dimensional data sets, lower dimensional linear projections tend to be normal or a combination of normals
63 HSI Data Analysis Scheme (after David Landgrebe, Purdue University)
200-Dimensional Data -> Class-Conditional Feature Extraction -> Feature Selection -> Classifier/Analyzer -> Class-Specific Information
64 Define Desired Classes
HSI image of the Washington DC Mall; training areas designated by polygons outlined in white
After David Landgrebe, Purdue University
65 Thematic Map of Washington DC Mall
Legend: Roofs, Streets, Grass, Trees, Paths, Water, Shadows

Operation                   CPU Time (sec.)      Analyst Time
Display Image               18
Define Classes                                   < 20 min.
Feature Extraction          12
Reformat                    67
Initial Classification      34
Inspect and Mod. Training                        5 min.
Final Classification        33
Total                       164 sec = 2.7 min.   25 min.

(No preprocessing involved)
After David Landgrebe, Purdue University
66 Hyperspectral Imaging Barriers
- Scene - varies from hour to hour and from sq. km to sq. km
- Sensor - spatial resolution, spectral bands, S/N
- Processing System
  - Number of samples to define the classes
  - Complexity of the classifier
67 Operating Scenario
- Remote sensing by airborne or spaceborne hyperspectral sensors
- Finite flux reaching the sensor causes a spatial-spectral resolution trade-off
- Hyperspectral data has hundreds of bands of spectral information
- Spectrum characterization allows subpixel analysis and material identification
68 Spectral Mixture Analysis
- Assumes reflectance from each pixel is caused by a linear mixture of subpixel materials
69 Mixed Pixels and Material Maps
(Figure: input image with PURE and MIXED pixels)
70 Traditional Linear Unmixing
x = Σ fi ei + ε,  sum over i = 1, ..., k
(the pixel spectrum x modeled as a fraction-weighted sum of k endmember spectra ei, plus residual ε)
- Unconstrained
- Partially Constrained (fractions sum to 1)
- Fully Constrained (fractions sum to 1 and are non-negative)
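A minimal sketch of the unconstrained and partially constrained cases for a single pixel. The sum-to-one correction is the standard Lagrange-multiplier form and assumes linearly independent endmembers; the fully constrained case (fractions also non-negative) additionally needs a non-negative least-squares solver such as scipy.optimize.nnls and is omitted here:

```python
import numpy as np

def unmix(pixel, endmembers, sum_to_one=False):
    """pixel: (n_bands,); endmembers: (k, n_bands). Returns k fractions."""
    E = endmembers.T                                # bands x endmembers
    f, *_ = np.linalg.lstsq(E, pixel, rcond=None)   # unconstrained fractions
    if sum_to_one:
        # enforce sum(f) = 1 exactly (Lagrange-multiplier correction);
        # assumes E'E is invertible, i.e., linearly independent endmembers
        g = np.linalg.solve(E.T @ E, np.ones(len(endmembers)))
        f -= g * (f.sum() - 1.0) / g.sum()
    return f
```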
71 Hierarchical Linear Unmixing Method
- Unmixes broad material classes first
- Proceeds to a group's constituents only if the unmixed fraction is greater than a given threshold
72 Stepwise Unmixing Method
- Employs linear unmixing to find fractions
- Uses iterative regressions to accept only the endmembers that improve a statistics-based model (a simplified sketch follows below)
- Shown to be superior to the classic linear method
  - Has better accuracy
  - Can handle more endmembers
- Quantitatively tested only on synthetic data
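A hedged sketch of the stepwise idea as greedy forward selection: starting from no endmembers, repeatedly add whichever endmember most reduces the squared error, stopping when the improvement falls below a threshold. The statistical acceptance tests of the published method are replaced here by a simple error-gain criterion:

```python
import numpy as np

def stepwise_unmix(pixel, endmembers, min_gain=1e-4):
    """pixel: (n_bands,); endmembers: (k, n_bands).
    Returns the indices of accepted endmembers and their fractions."""
    chosen, fractions = [], np.array([])
    err = float(pixel @ pixel)              # error of the empty model
    improved = True
    while improved and len(chosen) < len(endmembers):
        improved = False
        for j in set(range(len(endmembers))) - set(chosen):
            E = endmembers[chosen + [j]].T
            f, *_ = np.linalg.lstsq(E, pixel, rcond=None)
            e = float(np.sum((pixel - E @ f) ** 2))
            if err - e > min_gain:          # candidate improves the fit enough
                err, best_j, best_f, improved = e, j, f, True
        if improved:                        # accept the best candidate this round
            chosen.append(best_j)
            fractions = best_f
    return chosen, fractions
```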
73 Performance Evaluation
Error metric:
- Compare squared error from the traditional, stepwise, and hierarchical methods
- Visually assess fraction maps for accuracy
74 Endmember Selection
- Endmembers are simply material types
  - Broad classification: road, grass, trees
  - Fine classification: dry soil, moist soil, ...
- Use image-derived endmembers to produce a spectral library
  - Average reference spectra from pure sample pixels
  - Choose a specific number of distinct endmembers
75 Materials Hierarchy
- Grouped similar materials into 3-level hierarchy
76 Squared Error Results
77 Stepwise Unmixing Comparisons
- Linear unmixing does poorly, forcing fractions for all materials
- The hierarchical approach performs better but requires extensive user involvement
- The stepwise routine succeeds using adaptive endmember selection without extra preparation
78 HSI Image of Washington DC Mall
HYDICE airborne system: 1208 scan lines, 307 pixels/scan line, 210 spectral bands in the 0.4-2.4 µm region, 155 megabytes of data (not yet geometrically corrected)
79 Hyperspectral Imaging Potential
- Assume 10-bit data in a 100-dimensional space
- That is (1024)^100 ≈ 10^301 discrete locations
- Even for a data set of 10^6 pixels, the probability of any two pixels lying in the same discrete location is extremely small (see the check below)
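The arithmetic is easy to check; the collision probability uses the standard birthday-problem bound n(n − 1)/2M, an upper bound introduced here for illustration:

```python
import math

cells = 1024 ** 100              # 10-bit data, 100 bands
print(math.log10(cells))         # ~301: about 10^301 discrete locations

# birthday-problem bound: for n pixels in M cells, P(any collision) <= n(n-1)/(2M)
n = 10 ** 6
p_collision = n * (n - 1) / (2 * cells)
print(p_collision)               # ~5e-290: vanishingly small
```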