Title: Spectral Sensing 2
1 Introduction to Hyperspectral Imaging (HSI): Feature Extraction Methods
Dr. Richard B. Gomez
Center for Earth Observing and Space Research, George Mason University
2 Outline
- What is Hyperspectral Image Data?
- Interpretation of Digital Image Data
- Pixel Classification
- HSI Data Processing Techniques
- Methods and Algorithms (Continued)
- Principal Component Analysis
- Pixel Unmixing Problem
- Spectral Mixing Analysis
- Other
- Feature Extraction Techniques
- N-dimensional Exploitation
- Cluster Analysis
3 What is Hyperspectral Image Data?
- Hyperspectral image data is image data that is:
  - In digital form, i.e., a picture that a computer can read, manipulate, store, and display
  - Spatially quantized into picture elements (pixels)
  - Radiometrically quantized into discrete brightness levels
- It can be in the form of Radiance, Apparent Reflectance, True Reflectance, or Digital Number
4 Difference Between Radiance and Reflectance
- Radiance is the variable directly measured by remote sensing instruments
- Radiance has units of watt/steradian/square meter
- Reflectance is the ratio of the amount of light leaving a target to the amount of light striking the target
- Reflectance has no units
- Reflectance is a property of the material being observed
- Radiance depends on the illumination (both its intensity and direction), the orientation and position of the target, and the path of the light through the atmosphere
- Atmospheric effects and the solar illumination can be compensated for in digital remote sensing data. This yields what is called "apparent reflectance," which differs from true reflectance in that shadows and directional effects on reflectance have not been dealt with
5 Interpretation of Digital Image Data
- Qualitative Approach: photointerpretation by a human analyst/interpreter
  - On a scale large relative to pixel size
  - Limited multispectral analysis
  - Inaccurate area estimates
  - Limited use of brightness levels
- Quantitative Approach: analysis by computer
  - At individual pixel level
  - Accurate area estimates possible
  - Exploits all brightness levels
  - Can perform true multidimensional analysis
6Data Space Representations
- Spectral Signatures - Physical Basis for Response
- Discrete Space - For Use in Pattern Analysis
7 Hyperspectral Imaging Ancillary Input Possibilities
- From the Ground
- Of the Ground
- Previously Gathered Spectra
8 Hyperspectral Imaging Barriers
- Scene - The most complex and dynamic part
- Sensor - Also not under the analyst's control
- Processing System - The analyst's choices
9 Finding Optimal Feature Subspaces: HSI Data Analysis Scheme
- Discriminant Analysis Feature Extraction (DAFE)
- Decision Boundary Feature Extraction (DBFE)
Available in MultiSpec via WWW at http://dynamo.ecn.purdue.edu/biehl/MultiSpec/
Additional documentation via WWW at http://dynamo.ecn.purdue.edu/landgreb/publications.html
After David Landgrebe, Purdue University
10 Dimension Space Reduction
11 Pixel Classification
- Labeling the pixels as belonging to particular spectral classes using the spectral data available
- The terms classification, allocation, categorization, and labeling are generally used synonymously
- The two broad classes of classification procedure are supervised classification and unsupervised classification
- Hybrid supervised/unsupervised methods are available
12-13 Pixel Classification (figure slides)
14 Classification Techniques
- Unsupervised
- Supervised
- Hybrid
15-22 Classification (figure slides)
23 Data Class Representations
24 Classifier Options
- Other types: nonparametric
  - Parzen Window Estimators
  - Fuzzy Set based
  - Neural Network implementations
  - K Nearest Neighbor (K-NN)
  - etc.
25 Classification Algorithms
- Linear Spectral Unmixing (LSU)
  - Generates maps of the fraction of each endmember in a pixel
- Orthogonal Subspace Projection (OSP)
  - Suppresses background signatures and generates fraction maps like the LSU algorithm
- Spectral Angle Mapper (SAM)
  - Treats a spectrum as a vector and finds the angle between spectra
- Minimum Distance (MD)
  - A simple Gaussian Maximum Likelihood algorithm that does not use class probabilities
- Binary Encoding (BE) and Spectral Signature Matching (SSM)
  - Bit-compare simple binary codes calculated from spectra
26 Unsupervised Classification
- K-MEANS
- ISODATA (Iterative Self-Organizing Data
Analysis Technique)
27 K-MEANS
- Uses statistical techniques to group n-dimensional data into their natural spectral classes
- The K-Means unsupervised classifier uses a cluster analysis approach that requires the analyst to select the number of clusters to be located in the data; it arbitrarily locates this number of cluster centers, then iteratively repositions them until optimal spectral separability is achieved
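A minimal NumPy sketch of this procedure (the function name, random initialization, and convergence test are illustrative, not from the slides):

```python
import numpy as np

def k_means(pixels, k, n_iter=100, seed=0):
    """Cluster pixel spectra (n_pixels x n_bands) into k spectral classes."""
    rng = np.random.default_rng(seed)
    # Arbitrarily locate k cluster centers by sampling pixels at random
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest center (Euclidean distance)
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Reposition each center at the mean of its assigned pixels
        new_centers = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return labels, centers
```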
28 ISODATA (Iterative Self-Organizing Data Analysis Technique)
- Unsupervised classification that calculates class means evenly distributed in the data space and then iteratively clusters the remaining pixels using minimum distance techniques
- Each iteration recalculates the means and reclassifies pixels with respect to the new means
- This process continues until the number of pixels in each class changes by less than the selected pixel-change threshold or the maximum number of iterations is reached
29 Supervised Classification
- Supervised classification requires that the user select training areas for use as the basis for classification
- Various comparison methods are then used to determine whether a specific pixel qualifies as a class member
- A broad range of classification methods can be used, such as Parallelepiped, Maximum Likelihood, Minimum Distance, Mahalanobis Distance, Binary Encoding, and Spectral Angle Mapper
30 Parallelepiped
- Parallelepiped classification uses a simple decision rule to classify multidimensional spectral data
- The decision boundaries form an n-dimensional parallelepiped in the image data space
- The dimensions of the parallelepiped are defined based upon a standard deviation threshold from the mean of each selected class
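A sketch of this decision rule with NumPy; the first-match tie-breaking and the n_std parameter are illustrative assumptions:

```python
import numpy as np

def parallelepiped(pixels, class_means, class_stds, n_std=2.0):
    """Label each pixel with the first class whose box (mean +/- n_std * std
    in every band) contains it; -1 marks unclassified pixels."""
    labels = np.full(len(pixels), -1)
    for c, (mu, sigma) in enumerate(zip(class_means, class_stds)):
        lo, hi = mu - n_std * sigma, mu + n_std * sigma
        inside = np.all((pixels >= lo) & (pixels <= hi), axis=1)
        labels[inside & (labels == -1)] = c   # first matching box wins
    return labels
```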
31 Maximum Likelihood
- Maximum likelihood classification assumes that the statistics for each class in each band are normally distributed
- The probability that a given pixel belongs to a specific class is then calculated
- Unless a probability threshold is selected, all pixels are classified
- Each pixel is assigned to the class that has the highest probability (i.e., the "maximum likelihood")
32 Minimum Distance
- The minimum distance classification uses the mean vectors of each region of interest (ROI)
- It calculates the Euclidean distance from each unknown pixel to the mean vector for each class
- All pixels are classified to the closest ROI class unless the user specifies standard deviation or distance thresholds, in which case some pixels may be unclassified if they do not meet the selected criteria
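The same rule as a short NumPy sketch, with an optional distance threshold standing in for the user-specified criteria:

```python
import numpy as np

def min_distance(pixels, class_means, max_dist=None):
    """Classify each pixel to the class with the nearest mean vector;
    pixels farther than max_dist (if given) stay unclassified (-1)."""
    d = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    if max_dist is not None:
        labels[d.min(axis=1) > max_dist] = -1
    return labels
```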
33 Euclidean Distance
d(x, m) = √( Σᵢ (xᵢ − mᵢ)² ), the distance from pixel vector x to class mean vector m, summed over the bands i
34 Mahalanobis Distance
- The Mahalanobis Distance classification is a direction-sensitive distance classifier that uses statistics for each class
- It is similar to the Maximum Likelihood classification, but assumes all class covariances are equal and is therefore a faster method
- All pixels are classified to the closest ROI class unless the user specifies a distance threshold, in which case some pixels may be unclassified if they do not meet the threshold
35 Bhattacharyya Distance
B = (1/8)(m₁ − m₂)ᵀ [(Σ₁ + Σ₂)/2]⁻¹ (m₁ − m₂) + (1/2) ln( |(Σ₁ + Σ₂)/2| / √(|Σ₁| |Σ₂|) )
- The first term is the mean difference term; the second is the covariance term
- mᵢ and Σᵢ are the mean vector and covariance matrix of class i
36 Binary Encoding Classification
- The binary encoding classification technique encodes the data and endmember spectra into 0s and 1s based on whether a band falls below or above the spectrum mean
- An exclusive OR function is used to compare each encoded reference spectrum with the encoded data spectra, and a classification image is produced
- All pixels are classified to the endmember with the greatest number of matching bands, unless the user specifies a minimum match threshold, in which case some pixels may be unclassified if they do not meet the criteria
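A sketch of this technique, assuming spectra are rows of a NumPy array; XOR flags disagreeing bands and matches are counted as its complement:

```python
import numpy as np

def binary_encoding(pixels, endmembers, min_match=None):
    """Encode each spectrum as 1 where a band is above that spectrum's mean,
    0 below; classify pixels to the endmember with the most matching bands
    (-1 if below min_match)."""
    pix_code = pixels > pixels.mean(axis=1, keepdims=True)
    end_code = endmembers > endmembers.mean(axis=1, keepdims=True)
    # Exclusive OR marks the bands where the codes disagree
    disagree = np.logical_xor(pix_code[:, None, :], end_code[None, :, :]).sum(axis=2)
    matches = pixels.shape[1] - disagree
    labels = matches.argmax(axis=1)
    if min_match is not None:
        labels[matches.max(axis=1) < min_match] = -1
    return labels
```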
37-38 Spectral Angle Mapper (SAM) Classification
- The Spectral Angle Mapper (SAM) is a physically based spectral classification that uses the n-dimensional angle to match pixels to reference spectra
- The SAM algorithm determines the spectral similarity between two spectra by calculating the angle between them, treating the spectra as vectors in a space with dimensionality equal to the number of bands
- The SAM algorithm assumes that hyperspectral image data have been reduced to "apparent reflectance," with all dark current and path radiance biases removed
39 Spectral Angle Mapper (SAM) Algorithm
The SAM algorithm uses a reference spectrum r and the spectrum found at each pixel, t. The basic comparison algorithm finds the angle α between them (where nb = number of bands in the image):
α = cos⁻¹( t · r / (‖t‖ ‖r‖) )
or, written out band by band,
α = cos⁻¹( Σᵢ tᵢ rᵢ / ( √(Σᵢ tᵢ²) √(Σᵢ rᵢ²) ) ),  sums over i = 1, ..., nb
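A direct NumPy transcription of this formula, plus a small classifier around it (the max_angle threshold is an illustrative assumption):

```python
import numpy as np

def spectral_angle(pixels, r):
    """Angle (radians) between each pixel spectrum t and reference r."""
    cosang = (pixels @ r) / (np.linalg.norm(pixels, axis=1) * np.linalg.norm(r))
    return np.arccos(np.clip(cosang, -1.0, 1.0))  # clip guards round-off

def sam_classify(pixels, refs, max_angle=0.10):
    """Label pixels by smallest angle to any reference spectrum;
    -1 if no reference is within max_angle radians."""
    angles = np.stack([spectral_angle(pixels, r) for r in refs], axis=1)
    labels = angles.argmin(axis=1)
    labels[angles.min(axis=1) > max_angle] = -1
    return labels
```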
40 Minimum Noise Fraction (MNF) Transformation
- The minimum noise fraction (MNF) transformation is used to determine the inherent dimensionality of image data, to segregate noise in the data, and to reduce the computational requirements for subsequent processing
- The MNF transformation consists essentially of two cascaded Principal Components transformations
- The first transformation, based on an estimated noise covariance matrix, decorrelates and rescales the noise in the data. This first step results in transformed data in which the noise has unit variance and no band-to-band correlations
- The second step is a standard Principal Components transformation of the noise-whitened data
- For further spectral processing, the inherent dimensionality of the data is determined by examination of the final eigenvalues and the associated images
- The data space can be divided into two parts: one part associated with large eigenvalues and coherent eigenimages, and a complementary part with near-unity eigenvalues and noise-dominated images. By using only the coherent portions, the noise is separated from the data, thus improving spectral processing results
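A compact sketch of the two cascaded transforms, assuming an estimated noise covariance is already available (how it is estimated is outside the slide):

```python
import numpy as np

def mnf(data, noise_cov):
    """Two cascaded PC transforms: whiten the noise, then PCA the result.
    data: (n_pixels, n_bands); noise_cov: estimated noise covariance."""
    # Step 1: noise whitening -- transformed noise has unit variance
    # and no band-to-band correlation
    nvals, nvecs = np.linalg.eigh(noise_cov)
    whitener = nvecs / np.sqrt(nvals)             # rescale each eigenvector
    white = (data - data.mean(axis=0)) @ whitener
    # Step 2: standard PCA of the noise-whitened data
    evals, evecs = np.linalg.eigh(np.cov(white, rowvar=False))
    order = np.argsort(evals)[::-1]               # decreasing eigenvalue
    # Components with eigenvalues near 1 are noise-dominated
    return white @ evecs[:, order], evals[order]
```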
41 N-Dimensional Visualization
- Spectra can be thought of as points in an n-dimensional scatterplot, where n is the number of bands
- The coordinates of the points in n-space consist of n values that are simply the spectral radiance or reflectance values in each band for a given pixel
- The distribution of these points in n-space can be used to estimate the number of spectral endmembers and their pure spectral signatures
42 Pixel Purity Index (PPI)
- The Pixel Purity Index (PPI) is a means of finding the most "spectrally pure," or extreme, pixels in multispectral and hyperspectral images
- PPI is computed by repeatedly projecting n-dimensional scatterplots onto a random unit vector
- The extreme pixels in each projection are recorded and the total number of times each pixel is marked as extreme is noted
- A PPI image is created in which the DN of each pixel corresponds to the number of times that pixel was recorded as extreme
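A sketch of this computation (the number of projections and the seed are illustrative):

```python
import numpy as np

def pixel_purity_index(pixels, n_projections=10_000, seed=0):
    """Count how often each pixel is extreme (min or max) when the data
    cloud is projected onto a random unit vector."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(pixels), dtype=int)
    for _ in range(n_projections):
        u = rng.standard_normal(pixels.shape[1])
        u /= np.linalg.norm(u)            # random unit vector
        proj = pixels @ u
        counts[proj.argmin()] += 1        # record both extremes
        counts[proj.argmax()] += 1
    return counts                         # the DN values of the PPI image
```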
43 Matched Filter Technique
- Matched filtering maximizes the response of a known endmember and suppresses the response of the composite unknown background, thus "matching" the known signature
- Provides a rapid means of detecting specific minerals based on matches to specific library or image endmember spectra
- Produces images similar to the unmixing technique, but with significantly less computation
- Results (values from 0 to 1) provide a means of estimating the relative degree of match to the reference spectrum, where 1 is a perfect match
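One common formulation (not spelled out on the slide) models the background with the scene mean and covariance and normalizes the filter so a perfect match scores 1; a sketch:

```python
import numpy as np

def matched_filter(pixels, target):
    """Matched-filter score per pixel: 0 at the background mean, 1 at a
    perfect match to the target spectrum."""
    mu = pixels.mean(axis=0)                      # background mean
    inv_cov = np.linalg.inv(np.cov(pixels, rowvar=False))
    d = target - mu
    w = inv_cov @ d / (d @ inv_cov @ d)           # normalized filter weights
    return (pixels - mu) @ w
```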
44 Classification Error Examples
After Landgrebe, Purdue University
45 Spectral Mixing
- Natural surfaces are rarely composed of a single uniform material
- Spectral mixing occurs when materials with different spectral properties are represented by a single image pixel
- Researchers who have investigated mixing scales and linearity have found that, if the scale of the mixing is large (macroscopic), mixing occurs in a linear fashion
- For microscopic or intimate mixtures, the mixing is generally nonlinear
46 Mixed Spectra Models
- Mixed spectra effects can be formalized in three ways:
  - A physical model
  - A mathematical model
  - A geometric model
47 Mixed Spectra Physical Model
48 Mixed Spectra Mathematical Model
49 Mixed Spectra Geometric Model
50 Mixture Tuned Matched Filtering (MTMF)
- MTMF constrains the Matched Filter result to feasible mixtures of the composite unknown background and the known target
- MTMF produces the standard Matched Filter score images plus an additional set of images, one infeasibility image for each endmember
- The best match to a target is obtained when the Matched Filter score is high (near 1) and the infeasibility score is low (near 0)
51 Principal Component Analysis (PCA)
- Calculation of new transformed variables (components) by a coordinate rotation
- Components are uncorrelated and ordered by decreasing variance
- First component axis aligned in the direction of the highest percentage of the total variance in the data
- Component axes are mutually orthogonal
- Maximum SNR and largest percentage of total variance in the first component
52 Principal Component Analysis (PCA)
53 Principal Component Analysis (PCA) (Cont)
- The mean of the original data is the origin of the transformed system, with the transformed axes of each component mutually orthogonal
- To begin the transformation, the covariance matrix C is found. Using the covariance matrix, the eigenvalues λᵢ are obtained from
  |C − λᵢI| = 0
- where i = 1, 2, ..., n (n is the total number of original images and I is an identity matrix)
54 Principal Component Analysis (PCA) (Cont)
- The eigenvalues λᵢ are equal to the variance of each corresponding component image
- The eigenvectors eᵢ define the axes of the components and are obtained from
  (C − λᵢI) eᵢ = 0
- The principal components are then given as
  PC = T · DN
- where DN is the digital number matrix of the original data and T is the (n × n) transformation matrix with matrix elements given by eᵢⱼ, i, j = 1, 2, 3, ..., n
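A sketch tying these steps together with NumPy's eigensolver; pixels are rows here, so PC = T·DN appears in transposed form:

```python
import numpy as np

def pca(dn):
    """Principal components of band data dn (n_pixels x n bands):
    eigendecompose C, order by decreasing eigenvalue (variance), rotate."""
    x = dn - dn.mean(axis=0)              # data mean becomes the origin
    C = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(C)      # solves (C - lambda_i I) e_i = 0
    order = np.argsort(evals)[::-1]       # decreasing variance
    T = evecs[:, order].T                 # (n x n) transformation matrix
    return x @ T.T, evals[order]          # row-vector form of PC = T . DN
```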
55 A Matrix Equation
Problem: Find the value of vector x from measurement of a different vector y, where they are related by the matrix equation
  y = A x,  or  yᵢ = Σⱼ aᵢⱼ xⱼ  (sum over j)
Note 1: If both A and x are known, it is trivial to find y.
Note 2: In our problem, y is the measurement, A is determined from the physics of the problem, and we want to retrieve the value of x from y.
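A toy retrieval illustrating Note 2, with invented numbers; least squares recovers x exactly here because y is noise-free:

```python
import numpy as np

A = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.3, 0.3]])        # A: known from the physics of the problem
x_true = np.array([2.0, -1.0])    # the quantity we want to retrieve
y = A @ x_true                    # y: the measurement

# Least-squares retrieval of x from y
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x_hat)                      # -> approximately [ 2. -1.]
```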
56 Mean and Variance
- Mean: ⟨x⟩ = (1/N) Σ xₖ
- Variance: var(x) = (1/N) Σ (xₖ − ⟨x⟩)² = σₓ², where k = 1, 2, ..., N
57Covariance
cov(x,y) (1/N) ?(xk ? ?x?)(yk ? ?y?)
(1/N) ? xk yk ? ?x? ?y? Note1 cov(x,x)
var(x) Note2 If the mean values of x and y are
zero, then cov(x,y) (1/N) ? xk yk Note3 Sums
are over k 1,2,., N
58Covariance Matrix
- Let x (x1, x2, ,xn) be a random vector with
n components - The covariance matrix of x is defined to
be C ?(x ? ?)(x ? ?)T? - where ? (?1, ?2, ?k)T
- and ?k (1/N)?xmk
- Summation is over m 1,2,, N
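A quick NumPy check of these definitions on synthetic data (bias=True selects the 1/N normalization used here):

```python
import numpy as np

x = np.random.default_rng(0).normal(size=(500, 4))  # N = 500 samples, n = 4
mu = x.mean(axis=0)                                  # mu_k = (1/N) sum_m x_mk
C = (x - mu).T @ (x - mu) / len(x)                   # C = <(x - mu)(x - mu)^T>
assert np.allclose(C, np.cov(x, rowvar=False, bias=True))  # same 1/N matrix
```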
59 Gaussian Probability Distributions
- Many physical processes are well represented with Gaussian distributions, given by
  P(x) = (1 / (√(2π) σₓ)) exp( −(x − ⟨x⟩)² / (2σₓ²) )
- Given the mean and variance of a Gaussian random variable, it is possible to evaluate all of the higher moments
- The form of the Gaussian is analytically simple
60 Normal (Gaussian) Distribution
61 Scatterplots
62 Scatterplot Properties
- Shape
- Position
- Size
- Density
63 Spectral Signatures
Laboratory data: two classes of vegetation
64 Discrete (Feature) Space
65 Hughes Effect
G. F. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Trans. Inform. Theory, vol. IT-14, pp. 55-63, 1968.
66 Higher Dimensional Space Implications 1
- High dimensional space is mostly empty; data in high dimensional space lie mostly in a lower dimensional structure
- Normally distributed data will have a tendency to concentrate in the tails; uniformly distributed data will concentrate in the corners
67 Higher Dimensional Space Implications 2
68 Higher Dimensional Space Geometry
- The diagonals in high dimensional spaces become nearly orthogonal to all coordinate axes
- Implication: the projection of any cluster onto any diagonal, e.g., by averaging features, could destroy information
69 Higher Dimensional Space Geometry (Cont)
- The number of labeled samples needed for supervised classification increases rapidly with dimensionality
- In a specific instance, it has been shown that the number of samples required increases linearly with dimensionality for a linear classifier and as the square of the dimensionality for a quadratic classifier; it has been estimated that the number increases exponentially for a non-parametric classifier
- For most high dimensional data sets, lower dimensional linear projections tend to be normal or a combination of normals
70 HSI Data Analysis Scheme
200-Dimensional Data → Class-Conditional Feature Extraction → Feature Selection → Classifier/Analyzer → Class-Specific Information
After David Landgrebe, Purdue University
71 Define Desired Classes
HSI image of Washington DC Mall; training areas designated by polygons outlined in white
72 Thematic Map of Washington DC Mall
Operation                   CPU Time (sec)      Analyst Time
Display Image               18
Define Classes                                  < 20 min
Feature Extraction          12
Reformat                    67
Initial Classification      34
Inspect and Mod. Training                       5 min
Final Classification        33
Total                       164 sec (2.7 min)   25 min
Legend: Roofs, Streets, Grass, Trees, Paths, Water, Shadows
(No preprocessing involved)
73 Hyperspectral Imaging Barriers (Cont)
Scene - Varies from hour to hour and from sq. km to sq. km
Sensor - Spatial resolution, spectral bands, S/N
Processing System -
- Number of samples to define the classes
- Complexity of the classifier
74 Operating Scenario
- Remote sensing by airborne or spaceborne hyperspectral sensors
- Finite flux reaching the sensor causes a spatial-spectral resolution trade-off
- Hyperspectral data have hundreds of bands of spectral information
- Spectrum characterization allows subpixel analysis and material identification
75 Spectral Mixture Analysis
- Assumes reflectance from each pixel is caused by a linear mixture of subpixel materials
Mixed Spectra Example (figure)
76 Mixed Pixels and Material Maps
(Figure: input image with pixels labeled PURE and MIXED)
77 Traditional Linear Unmixing
Linear mixing model: x = Σᵢ₌₁ᵏ fᵢ sᵢ + n, where x is the observed pixel spectrum, sᵢ are the k endmember spectra, fᵢ their fractional abundances, and n a noise term
- Unconstrained: the fᵢ are unrestricted
- Partially Constrained: Σᵢ fᵢ = 1
- Fully Constrained: Σᵢ fᵢ = 1 and fᵢ ≥ 0
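A sketch of the fully constrained case using SciPy's nonnegative least squares; the sum-to-one constraint is enforced softly by an appended, heavily weighted equation (a common trick; the weight value is illustrative):

```python
import numpy as np
from scipy.optimize import nnls

def unmix_fully_constrained(pixel, endmembers, w=100.0):
    """Abundances f for one pixel under x = E f + n, with f_i >= 0 enforced
    by NNLS and sum(f) = 1 enforced softly by a weighted extra row."""
    E = np.asarray(endmembers).T              # (n_bands, k) endmember matrix
    A = np.vstack([E, w * np.ones(E.shape[1])])
    b = np.append(pixel, w)                   # augmented system
    f, _ = nnls(A, b)
    return f
```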
78 Hierarchical Linear Unmixing Method
- Unmixes broad material classes first
- Proceeds to a group's constituents only if the unmixed fraction is greater than a given threshold
79 Stepwise Unmixing Method
- Employs linear unmixing to find fractions
- Uses iterative regressions to accept only the endmembers that improve a statistics-based model
- Shown to be superior to the classic linear method
  - Has better accuracy
  - Can handle more endmembers
- Quantitatively tested only on synthetic data
80 Performance Evaluation
Error Metric
- Compare squared error from the traditional, stepwise, and hierarchical methods
- Visually assess fraction maps for accuracy
81 Endmember Selection
- Endmembers are simply material types
  - Broad classification: road, grass, trees
  - Fine classification: dry soil, moist soil, ...
- Use image-derived endmembers to produce a spectral library
  - Average reference spectra from pure sample pixels
  - Choose a specific number of distinct endmembers
82 Endmember Listing
- Strong Road
- Weak Road
- Panel 2k
- Panel 3k
- Panel 5k
- Panel 8k
- Panel 14k
- Panel 17k
- Panel 25k
- Spectral Panel
- Parking Lot
- Trees
- Strong Vegetation
- Medium Vegetation
- Weak Vegetation
- Strong Cut Vegetation
- Medium Cut Vegetation
- Weak Cut Vegetation
False-Color IR
83 Materials Hierarchy
- Grouped similar materials into a 3-level hierarchy
84 Squared Error Results
85 Stepwise Unmixing Comparisons
- Linear unmixing does poorly, forcing fractions for all materials
- The hierarchical approach performs better but requires extensive user involvement
- The stepwise routine succeeds using adaptive endmember selection without extra preparation
86 HSI Image of Washington DC Mall
HYDICE airborne system: 1208 scan lines, 307 pixels/scan line, 210 spectral bands in the 0.4-2.4 µm region, 155 megabytes of data (not yet geometrically corrected)
87 Hyperspectral Imaging Potential
- Assume 10-bit data in a 100-dimensional space
- That is (1024)^100 = 2^1000 ≈ 10^301 discrete locations
- Even for a data set of 10^6 pixels, the probability of any two pixels lying in the same discrete location is extremely small