Information Technology and Systems Center

1
Data Mining in Earth Sciences
Rahul Ramachandran, Sara Graves and Ken Keiser
Mathematical Challenges in Scientific Data Mining,
IPAM, January 14-18, 2002
  • Information Technology and Systems Center
  • University of Alabama in Huntsville
  • http://datamining.itsc.uah.edu

2
Outline
  • Introduction
  • ADaM System
  • Data Mining Taxonomy for Earth Science
  • Event/Relationship based
  • Application Examples
  • Dimensionality Reduction
  • References

3
Reasons for Data Mining of Earth Science Data
  • Greatly increased data volume due to improvements
    in data collection/access/availability/storage
    technology (instruments, computational resources,
    internet)
  • Data volumes from Terra are about 1 terabyte per
    day, more than can be analyzed by conventional means
  • High variability in data formats and content
  • Need for high returns on expensive data
    investments
  • Need for improved access/availability of data,
    information and knowledge
  • Need for higher level products for the
    non-specialist and interdisciplinary/cross-domain
    researchers
  • Questions/queries are getting more complex due,
    in part, to the heterogeneous nature of the data

4
Characteristics of Earth Science Data
  • High variability
    • Type
      • Geostationary
      • Polar Orbiting
    • Structure
      • Raster
      • Vector
    • Resolution
      • Fine: AVHRR, 1 km
      • Coarse: SSM/I, 20 km
    • Multi/Hyper Spectral
    • Processing stage
      • Level 0: Raw data (instrument counts)
      • Level 1: Annotated with geo-reference information
      • Level 2: Transformed by an algorithm into a
        geophysical parameter
      • Level 3: Spatial/temporal resampling
      • Level 4: Includes additional model data

5
Characteristics of Earth Science Data
  • Need to know physical basis (domain knowledge)
    before applying statistical techniques
  • Multiple time scales
  • Wide variety of data formats
  • Includes spatial/temporal information
  • Typically needs domain-specific algorithms

6
ADaM History
  • Algorithm Development and Mining (ADaM) System
  • The system provides knowledge discovery, feature
    detection and content-based searching for data
    values, as well as for metadata.
  • It contains over 120 different operations that can
    be performed on the input data stream.
  • Operations range from specialized, data-set-specific
    atmospheric science algorithms to digital image
    processing techniques and processing modules for
    automatic pattern recognition, machine perception,
    neural networks and genetic algorithms.
  • An Event/Relationship Search System was developed
    for the environment.

7
ADaM Engine Architecture
[Figure: ADaM engine architecture, with Data, Translated
Data, Preprocessing, Preprocessed Data, Processing,
Analysis, Patterns/Models and Results]
Preprocessing
  • Selection and Sampling: Subsetting, Subsampling,
    Select by Value, Coincidence Search
  • Grid Manipulation: Grid Creation, Bin Aggregate,
    Bin Select, Grid Aggregate, Grid Select, Find Holes
  • Image Processing: Cropping, Inversion,
    Thresholding, Others...
Analysis
  • Clustering: K Means, Isodata, Maximum
  • Pattern Recognition: Bayes Classifier,
    Min. Dist. Classifier
  • Image Analysis: Boundary Detection, Cooccurrence
    Matrix, Dilation and Erosion, Histogram Operations,
    Polygon Circumscript, Spatial Filtering, Texture
    Operations
  • Genetic Algorithms
  • Neural Networks
  • Others...
8
ADaM Mining Environment
[Figure: ADaM mining environment, with the Data Mining
Server, Mining Results and the Event/Relationship Search
System]
9
Data Mining Taxonomy
10
Event-based Mining
  • Known events/Known algorithms
    • Tropical Cyclones from AMSU-A data
  • Known events/Learned algorithms
    • Rainfall estimation from SSM/I data
    • Lightning Detection from OLS data
  • Unknown event/Unknown algorithm
    • Target Independent Mining

11
Known Event/Known Algorithm
"I know what phenomena to detect and I have the
algorithm to do so!"
[Figure: Earth Science Data Sets, Add algorithm to
Mining Environment, Results]
  • Relationship analysis
  • Coincidence searches
  • Input for other algorithms

12
Tropical Cyclone Detection: Estimating Maximum
Wind Speed
  • Scientist: Dr. Roy Spencer (GHCC/MSFC NASA)
  • Data used: Advanced Microwave Sounding Unit-A
    (AMSU-A)
  • The radiometer can detect temperatures at different
    levels of the atmosphere
  • Surface winds in tropical cyclones are directly
    related to the warm middle- and upper-atmosphere
    temperatures that exist around the cyclone
    center
  • AMSU-A measures this warmth at several
    frequencies near 55 gigahertz (GHz)
  • Calibrated using aircraft reconnaissance
    measurements in tropical depressions, tropical
    storms, and hurricanes from the 1998 Atlantic
    hurricane season
  • Tropical cyclone detection is based on ice
    scattering, water vapor and wind speed

13
Tropical Cyclone Detection: Estimating Maximum
Wind Speed
Advanced Microwave Sounding Unit (AMSU-A) Data
  • Water cover mask to eliminate land
  • Laplacian filter to compute temperature
    gradients
  • Science algorithm to estimate wind speed
  • Contiguous regions with wind speeds above a
    desired threshold are identified
  • Additional test to eliminate false positives
  • Maximum wind speed and its location are produced

[Figure: AMSU-A data for Hurricane Floyd, flowing from
the Data Archive (calibration, limb correction,
conversion to brightness temperature Tb) through the
Mining Environment to the Result]
Results are placed on the web and made available
to the National Hurricane Center and the Joint Typhoon
Warning Center
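The AMSU-A steps above can be sketched in Python as a minimal illustration, not the operational ADaM code. Assumptions: the Laplacian uses the 4z5 - (z2 + z4 + z6 + z8) form, the science wind-speed algorithm (not given on the slide) is replaced by a hypothetical linear function of the Laplacian (`a`, `b`), and the false-positive test is a hypothetical minimum region size (`min_pixels`). All function names are made up for illustration.

```python
from collections import deque

import numpy as np


def laplacian(field):
    """Discrete Laplacian 4*z5 - (z2 + z4 + z6 + z8); border pixels stay 0."""
    f = np.asarray(field, dtype=float)
    out = np.zeros_like(f)
    out[1:-1, 1:-1] = 4.0 * f[1:-1, 1:-1] - (
        f[:-2, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:] + f[2:, 1:-1]
    )
    return out


def connected_regions(mask):
    """Return the 4-connected regions of True pixels as lists of (row, col)."""
    rows, cols = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def detect_cyclones(tb, water, threshold, min_pixels=2, a=0.0, b=1.0):
    """Return (max_wind, (row, col)) for each region that survives the tests.

    a, b: HYPOTHETICAL linear stand-in for the science wind-speed algorithm.
    min_pixels: HYPOTHETICAL false-positive test (drop tiny regions).
    threshold must be > 0 so that masked-out land pixels (set to 0) never pass.
    """
    wind = np.clip(a + b * laplacian(tb), 0.0, None)
    wind[~np.asarray(water, dtype=bool)] = 0.0  # water cover mask removes land
    detections = []
    for pixels in connected_regions(wind >= threshold):
        if len(pixels) < min_pixels:
            continue
        best = max(pixels, key=lambda p: wind[p])  # location of maximum wind
        detections.append((float(wind[best]), best))
    return detections


# Toy field: a two-pixel warm core plus an isolated one-pixel warm spot.
tb = np.zeros((7, 7))
tb[3, 3], tb[3, 4], tb[1, 1] = 10.0, 6.0, 5.0
print(detect_cyclones(tb, np.ones((7, 7), dtype=bool), threshold=10.0))
# [(34.0, (3, 3))]  (the single-pixel spot at (1, 1) is rejected)
```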
14
Known Event/Learned Algorithm
"I know what phenomena I want to detect, but I do not
know the characteristics of the phenomena."
[Figure: Earth Science Data Sets, Data Mining System,
Refine your algorithm using iteration, Results]
  • Relationship analysis
  • Coincidence searches
  • Input for other algorithms

15
Rainfall Estimation and Identification Study
using SSM/I data
  • Scientist: Dr. Steve Goodman (GHCC/MSFC NASA)
  • Goal: determine whether generic pattern recognition
    techniques could be applied to SSM/I data to
    detect rain
  • Minimum Distance Classifier, Back-propagation
    Neural Network and Discrete Bayes Classifier were
    compared against a science algorithm (WetNet PIP
    Algorithm)
  • The US composite rainfall product was used as
    ground truth

[Figure: Subsetted SSM/I data and NEXRAD Composite data]
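A minimum distance classifier, one of the techniques compared in this study, assigns each sample to the class whose mean feature vector is nearest. The sketch below assumes three brightness-temperature features per pixel; the feature values, class labels and function names are hypothetical and are not the study's data or code.

```python
import numpy as np


def fit_min_distance(X, y):
    """Compute one mean feature vector per class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    return classes, means


def predict_min_distance(model, X):
    """Assign each sample to the class with the nearest mean (Euclidean)."""
    classes, means = model
    X = np.asarray(X, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


# Illustrative three-channel brightness temperatures (K), not real SSM/I data.
X_train = [[250, 230, 200], [245, 228, 190], [270, 262, 272], [268, 258, 268]]
y_train = ["rain", "rain", "no_rain", "no_rain"]
model = fit_min_distance(X_train, y_train)
print(predict_min_distance(model, [[248, 229, 195], [269, 260, 270]]))
```

Because it keeps only class means, this classifier needs very little training data, which makes it a natural baseline next to the neural network and Bayes classifiers.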
16
Rainfall Estimation and Identification Study
using SSM/I data
  • SSM/I and US rain data over the southeastern United
    States for January and July 1995 were compared in
    the study
  • SSM/I and radar data were gridded and registered
    to establish spatial and temporal coincidence
  • Back-propagation neural network (BPNN) performance
    was comparable to that of the WetNet PIP SSM/I rain
    rate algorithm
  • Performance of the Bayes classifier was not as good
    as that of the WetNet PIP SSM/I rain rate
    algorithm, perhaps because of the small sample
    size used to estimate the density functions of the
    two classes (rain and non-rain)
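The small-sample issue is easiest to see in a discrete Bayes classifier, which estimates each class's density function as a histogram over quantized feature values. In this sketch (hypothetical function names, one illustrative feature, hypothetical bin edges), add-one smoothing is an assumption added so that bins never seen in training do not get zero probability; with few samples per class, many bins are driven by that smoothing alone.

```python
import numpy as np


def fit_discrete_bayes(x_bins, y, n_bins, alpha=1.0):
    """Histogram class-conditional densities plus class priors.

    alpha: add-alpha smoothing (an assumption here), so unseen bins keep a
    small non-zero probability instead of exactly zero.
    """
    x_bins = np.asarray(x_bins, dtype=int)
    y = np.asarray(y)
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    likelihoods = []
    for k in classes:
        counts = np.bincount(x_bins[y == k], minlength=n_bins).astype(float)
        likelihoods.append((counts + alpha) / (counts.sum() + alpha * n_bins))
    return classes, priors, np.array(likelihoods)


def predict_discrete_bayes(model, x_bins):
    """Pick the class that maximizes P(bin | class) * P(class)."""
    classes, priors, likelihoods = model
    scores = priors[:, None] * likelihoods[:, np.asarray(x_bins, dtype=int)]
    return classes[np.argmax(scores, axis=0)]


# Quantize one hypothetical feature into 4 bins, then train and classify.
edges = [200.0, 220.0, 240.0]  # hypothetical bin edges (K)
tb = np.array([195.0, 210.0, 215.0, 230.0, 235.0, 260.0])
labels = ["rain", "rain", "rain", "no_rain", "no_rain", "no_rain"]
model = fit_discrete_bayes(np.digitize(tb, edges), labels, n_bins=4)
print(predict_discrete_bayes(model, [0, 1, 2, 3]))
```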

17
Lightning Detection in Operational Linescan
System (OLS) Images
  • Scientist: Dr. Steve Goodman (GHCC/MSFC NASA)
  • Goal: identify lightning streaks in nighttime
    portions of OLS images
  • OLS is carried by DMSP satellites and produces
    visible and thermal images
  • Lightning shows up as bright horizontal streaks,
    as do city lights and moonlight reflected off
    clouds
  • An approach based on morphological filtering and
    gradient detection was selected
  • Both the visible and thermal bands were used

18
Lightning Detection in Operational Linescan
System (OLS) Images
  • Erosion and dilation were used to find areas
    in/near clouds; other areas were removed
  • Gradient detection in the direction of satellite
    propagation was applied to the visible image to
    extract horizontal streaks
  • Texture measures were used to identify areas of
    small, patchy cloud cover that exhibited small
    bright streaks
  • A genetic algorithm was used to tune parameters of
    the classification during training
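The morphological and gradient steps can be sketched with plain NumPy. Assumptions, none of which come from the slide: cloud pixels are those whose thermal value is at or below a hypothetical threshold, rows run along the satellite track, and the streak detector is a row-wise second difference (a substitute chosen because a one-pixel-wide streak has zero central first difference on its own row). Texture measures and genetic-algorithm tuning are not shown, and all names are hypothetical.

```python
import numpy as np


def _neighborhood_stack(mask, pad_value):
    """The 9 shifted copies of a boolean mask covering each 3x3 window."""
    padded = np.pad(mask, 1, constant_values=pad_value)
    h, w = mask.shape
    return [padded[dr:dr + h, dc:dc + w] for dr in range(3) for dc in range(3)]


def dilate(mask):
    """3x3 binary dilation (outside the image counts as False)."""
    return np.logical_or.reduce(_neighborhood_stack(mask, False))


def erode(mask):
    """3x3 binary erosion (outside the image counts as True)."""
    return np.logical_and.reduce(_neighborhood_stack(mask, True))


def lightning_candidates(visible, thermal, cloud_tb_max, streak_min):
    """Flag bright horizontal streaks that lie in or near clouds.

    cloud_tb_max and streak_min are HYPOTHETICAL thresholds.
    """
    visible = np.asarray(visible, dtype=float)
    clouds = np.asarray(thermal, dtype=float) <= cloud_tb_max  # cold = cloud (assumed)
    # Opening (erode, then dilate) removes specks; the extra dilation adds a
    # one-pixel "near cloud" margin.
    near_clouds = dilate(dilate(erode(clouds)))
    # Row-wise second difference: large and positive on a thin bright row.
    streak = np.zeros_like(visible)
    streak[1:-1] = visible[1:-1] - 0.5 * (visible[:-2] + visible[2:])
    return near_clouds & (streak >= streak_min)


# Toy scene: a cloud block, a one-pixel cold speck, and a streak across row 4.
thermal = np.full((9, 9), 280.0)
thermal[2:7, 2:7] = 200.0
thermal[0, 8] = 200.0
visible = np.zeros((9, 9))
visible[4, :] = 50.0
out = lightning_candidates(visible, thermal, cloud_tb_max=230.0, streak_min=20.0)
print(np.argwhere(out))  # only row-4 pixels in columns 1..7 survive
```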

Results (% accuracy)
                    Correctly Detected   False Positives   False Negatives
Training Results    80                   0.7               19.2
Test Results        78.2                 4.3               17.3
19
Unknown Event/Unknown Algorithm
"I want to find anomalies in the data sets!"
[Figure: Earth Science Data Sets, Data Mining System,
Let the miner discover it, Results]
  • Relationship analysis
  • Coincidence searches
  • Input for other algorithms

Example: Target Independent Mining
20
Target Independent Mining of SSM/I Data
  • Mine data in a target-independent manner (no
    specific phenomenon under consideration)
  • Interested in transient phenomena that move
    through an area
  • Transient phenomena are characterized as
    deviations from normal
  • Objective: data reduction with minimum loss of
    information
  • The size of remotely sensed data prevents it from
    being maintained online
  • Data is archived in much slower tertiary storage
  • Need to develop techniques that minimize data
    access from tertiary storage
  • Procedure: overlay the Earth's surface with a
    constant grid of cells
  • For each cell, a maximum and a minimum trend line
    are computed
  • The maximum trend line is computed by forming the
    set of daily maximum values over some period
    (a month)
  • The median over a series of months is used to form
    the maximum trend line
  • The same procedure is used to calculate the
    minimum trend line
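The trend-line procedure can be sketched for a single grid cell. This is one reading of the slide, stated as an assumption: days are aligned by their index within each month, the daily maxima (or minima) form one set per month, and the median across months gives the trend value for each day. The array layout and function names are hypothetical.

```python
import numpy as np


def trend_lines(obs):
    """Max/min trend lines for one grid cell.

    obs: array of shape (n_months, n_days, n_samples) holding every value
    observed in the cell on each day (HYPOTHETICAL layout).
    """
    obs = np.asarray(obs, dtype=float)
    daily_max = obs.max(axis=2)  # (n_months, n_days): daily maxima per month
    daily_min = obs.min(axis=2)
    max_trend = np.median(daily_max, axis=0)  # median across the months
    min_trend = np.median(daily_min, axis=0)
    return max_trend, min_trend


def deviates(values, max_trend, min_trend):
    """True where a new day's value falls outside the 'normal' band."""
    v = np.asarray(values, dtype=float)
    return (v > max_trend) | (v < min_trend)


obs = [
    [[1, 5], [2, 6]],  # month 1: samples on day 1, day 2
    [[0, 4], [3, 7]],  # month 2
    [[2, 9], [1, 5]],  # month 3
]
max_trend, min_trend = trend_lines(obs)
print(max_trend, min_trend)                    # [5. 6.] [1. 2.]
print(deviates([8, 4], max_trend, min_trend))  # [ True False]
```

Only the cells and days flagged by `deviates` would need to be kept as metadata, which is how this approach reduces the need to go back to tertiary storage.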

21
Target Independent Mining of SSM/I Data
Trend Lines Represent What Is Normal
22
Target Independent Mining of SSM/I Data
  • Extracted metadata not oriented toward any
    particular transient phenomena
  • Laboratory tests show 98% data compression while
    preserving 92% of the mesoscale convective systems
    (MCSs) detectable in the raw data
  • MCS events represented only 6.7% of the extracted
    metadata

23
Relationship-based Mining
  • Coincident Association
    • VARGA Algorithm for multispectral data
  • Localized Spatial Association
    • Cumulus Cloud Classification in GOES Imagery
  • Temporal Association

24
Coincident Association Mining
  • Use market basket analysis to mine for
    association rules in vector data
  • A rule has the form X -> Y
  • A rule is characterized by:
    • Support
      • % of vector instances that contain both X and Y
      • How often is the rule applicable?
    • Confidence
      • What % of vector instances that contain X also
        contain Y?
      • An estimate of the conditional probability
        P(Y | X)
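Support and confidence can be computed directly from their definitions. This sketch uses plain market-basket counting, not VARGA's memory-saving variant, and encodes each vector instance as a set of hypothetical (channel, binned value) items; the function name and data are illustrative only.

```python
def support_confidence(instances, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    support    = fraction of instances that contain both X and Y
    confidence = fraction of instances containing X that also contain Y
    """
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    instances = list(instances)
    n_x = sum(1 for t in instances if x <= t)    # instances containing X
    n_xy = sum(1 for t in instances if xy <= t)  # instances containing X and Y
    support = n_xy / len(instances) if instances else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical encoding: each item is a (channel, binned value) pair.
instances = [
    {("19V", 180.0), ("37H", 140.0), ("37V", 200.0)},
    {("19V", 180.0), ("37H", 140.0)},
    {("19V", 180.0), ("37H", 140.0), ("37V", 200.0)},
    {("19V", 170.0), ("37V", 200.0)},
]
rule_x = {("19V", 180.0), ("37H", 140.0)}
rule_y = {("37V", 200.0)}
print(support_confidence(instances, rule_x, rule_y))  # (0.5, 0.6666666666666666)
```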

25
Coincident Association Applied to Multi-spectral
Data Mining
  • Developed and implemented Vector Association Rule
    Generation Algorithm (VARGA) as a modification to
    market-basket association rule mining.
  • Modified to minimize memory usage for large
    multi-spectral satellite data such as SSM/I (90
    megabytes per day uncompressed)
  • Example SSM/I rule:
    (19V, 180.0) (37H, 140.0) -> (37V, 200.0)
    0.117037 0.945986

26
Localized Spatial Association Mining
  • Extract association rules to characterize texture
    (Dissertation of Dr. John Rushing)
  • Each pixel in an nxn neighborhood is
    characterized by the triple (X, Y, I):
    • X and Y: its offsets from the pixel at the
      neighborhood center
    • I: its intensity
  • Association rules can then be characterized by
    relationships between the triples

27
Association Rule Example
  • The rule specified in the figure can be applied to
    this image at 9 of the 16 pixel locations because
    of the pixel offsets in the rule.
  • Of these 9 locations, the antecedent matches at 5
    locations, and both the antecedent and consequent
    match at 3 locations.
  • This yields a support of 3/9 = 33.33% and a
    confidence of 3/5 = 60%.
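The counting in this example can be reproduced with a short function. It assumes the triple's X is a column offset and Y a row offset, evaluates the rule (0,0,2) (1,1,2) -> (1,0,0) that appears on the texture-feature slide, and runs on a hypothetical 4x4 image (not the one in the figure), so its antecedent count differs from the example above. Support is counted over applicable locations and confidence over antecedent matches, as in the example.

```python
import numpy as np


def rule_support_confidence(image, antecedent, consequent):
    """Evaluate a texture association rule on a gray-level image.

    Each item is a triple (dx, dy, intensity): the pixel dx columns right and
    dy rows down from the reference pixel must have that intensity.
    Returns (applicable_locations, support, confidence).
    """
    img = np.asarray(image)
    rows, cols = img.shape
    items = list(antecedent) + list(consequent)
    dxs = [dx for dx, _, _ in items]
    dys = [dy for _, dy, _ in items]
    # Reference pixels for which every offset in the rule stays in the image.
    r_range = range(max(0, -min(dys)), rows - max(0, max(dys)))
    c_range = range(max(0, -min(dxs)), cols - max(0, max(dxs)))

    def matches(r, c, triples):
        return all(img[r + dy, c + dx] == value for dx, dy, value in triples)

    applicable = antecedent_hits = both_hits = 0
    for r in r_range:
        for c in c_range:
            applicable += 1
            if matches(r, c, antecedent):
                antecedent_hits += 1
                if matches(r, c, consequent):
                    both_hits += 1
    support = both_hits / applicable if applicable else 0.0
    confidence = both_hits / antecedent_hits if antecedent_hits else 0.0
    return applicable, support, confidence


img = [[2, 0, 1, 2],
       [0, 2, 0, 2],
       [2, 0, 2, 1],
       [1, 2, 0, 2]]  # hypothetical 4x4 image with gray levels 0..2
print(rule_support_confidence(img, [(0, 0, 2), (1, 1, 2)], [(1, 0, 0)]))
# (9, 0.3333333333333333, 0.75): 9 locations, 4 antecedent matches, 3 full matches
```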

 
28
Association Rule Example
29
GOES Cumulus Cloud Classification: Why Texture
Features?
  • Cumulus cloud fields have a very characteristic
    texture signature in the GOES visible imagery

30
GOES Cumulus Cloud Classification: The Need
  • Cloud systems are important modulators of Earth's
    radiation budget
  • Large uncertainties are associated with cloud
    radiative forcing
  • The radiative energy budget is affected by changes
    in the distribution of clouds
  • Cumulus clouds are a cloud field type that could
    respond strongly to climate change
  • Knowledge of cloud geometry, size and spatial
    distribution is needed for the representation of
    cumulus clouds in radiative transfer models
  • To derive models of cloud field characteristics,
    automated cumulus cloud detection schemes are
    required to analyze large amounts of data

31
GOES Cumulus Cloud Classification: Purpose of
this study
  • Compare different techniques for detecting
    cumulus cloud fields in Geostationary Operational
    Environmental Satellite (GOES) imagery
  • Comparison based on:
    • Accuracy of detection
    • Amount of time required to classify
  • Feature measures used along with the Maximum
    Likelihood Classifier:
    • Texture features
      • Gray Level Co-occurrence Matrix
      • Gray Level Run Length Features
      • Association Rules
    • Edge Detection Features
      • Sobel Filter
      • Laplacian Filter
      • Combination of Sobel and Laplacian Filters

32
GOES Cumulus Cloud Classification: Texture
Features (1)
  • Gray Level Co-occurrence Matrix (GLCM)
    • First texture feature vector to be developed
    • GLCM is used as a benchmark
    • It is based on a positional operator
    • The positional operator defines the relationship
      between pixels as an (x, y) offset or as a
      (distance, angle) offset
    • The co-occurrence matrix is an NxN matrix, where N
      is the number of gray levels; feature functions
      are computed on the matrix
  • Gray Level Run Length (GLRL) Features
    • Gray level statistical features based on
      homogeneous gray level runs
    • A run is a series of consecutive pixels of the
      same intensity
    • Run lengths are computed at orientations in
      increments of 45 degrees, starting at 0 degrees
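Both texture descriptors are simple counting procedures. This sketch builds a GLCM for one positional operator (dx, dy), computes contrast as one example feature function, and builds the 0-degree run-length matrix; the function names are hypothetical, and the other orientations and the full feature sets used in the study are not shown. The test image is the classic 4-level example from the texture literature.

```python
import numpy as np


def glcm(img, dx, dy, levels):
    """Co-occurrence counts for the positional operator (dx, dy).

    m[i, j] counts pixel pairs where a pixel has gray level i and the pixel
    dx columns right and dy rows down has gray level j.
    """
    img = np.asarray(img, dtype=int)
    rows, cols = img.shape
    m = np.zeros((levels, levels), dtype=int)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r, c], img[r2, c2]] += 1
    return m


def contrast(m):
    """One example GLCM feature: sum over i, j of p(i, j) * (i - j)^2."""
    p = m / m.sum()
    i, j = np.indices(m.shape)
    return float((p * (i - j) ** 2).sum())


def run_length_matrix_0deg(img, levels):
    """rl[g, n - 1] = number of horizontal runs of gray level g with length n."""
    img = np.asarray(img, dtype=int)
    rl = np.zeros((levels, img.shape[1]), dtype=int)
    for row in img:
        value, length = row[0], 1
        for v in row[1:]:
            if v == value:
                length += 1
            else:
                rl[value, length - 1] += 1
                value, length = v, 1
        rl[value, length - 1] += 1  # close the last run in the row
    return rl


img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
m = glcm(img, dx=1, dy=0, levels=4)
print(m + m.T)  # symmetric 0-degree GLCM (both pair directions counted)
print(contrast(m))
print(run_length_matrix_0deg(img, levels=4))
```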

33
GOES Cumulus Cloud Classification: Texture
Features (2)
  • Association Rules
    • Often used in business applications to identify
      relationships in databases
    • Adapted to discriminate textures in images
    • Based on frequently occurring local image
      structures

Triples: (Pos X, Pos Y, Pixel Intensity)
Rule: (0,0,2) (1,1,2) -> (1,0,0)
Then calculate the support and confidence of this rule
34
GOES Cumulus Cloud Classification: Edge Detection
Features
  • These techniques are used for detecting
    discontinuities in an image
  • These techniques apply a local derivative
    operator to the image
  • Sobel Filters
    • Calculate the magnitude of the rate of change of
      gray level and the direction of this change
      vector
    • Magnitude: $|\nabla f| \approx |G_x| + |G_y|$
    • Direction: $\alpha = \tan^{-1}(G_y / G_x)$
    • $G_x = (z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3)$
    • $G_y = (z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7)$
  • Laplacian Filters
    • Second-order derivative
    • $\nabla^2 f = 4z_5 - (z_2 + z_4 + z_6 + z_8)$

3x3 neighborhood labels:
z1 z2 z3
z4 z5 z6
z7 z8 z9
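The neighborhood formulas translate directly into array arithmetic. This sketch labels the 3x3 window z1..z9 row by row as in the grid above (z5 is the center), takes the gradient magnitude as |Gx| + |Gy| (an assumed reading of the slide's magnitude), and evaluates interior pixels only; the function name is hypothetical and direction is not computed.

```python
import numpy as np


def edge_features(img):
    """Sobel Gx, Gy, |Gx| + |Gy| and the Laplacian on interior pixels.

    z1..z9 label the 3x3 neighborhood row by row, z5 being the center pixel.
    Every output array has shape (H - 2, W - 2).
    """
    f = np.asarray(img, dtype=float)
    h, w = f.shape
    z = {}
    k = 1
    for dr in (0, 1, 2):      # neighborhood row: top, middle, bottom
        for dc in (0, 1, 2):  # neighborhood column: left, middle, right
            z[k] = f[dr:dr + h - 2, dc:dc + w - 2]
            k += 1
    gx = (z[7] + 2 * z[8] + z[9]) - (z[1] + 2 * z[2] + z[3])
    gy = (z[3] + 2 * z[6] + z[9]) - (z[1] + 2 * z[4] + z[7])
    magnitude = np.abs(gx) + np.abs(gy)  # |Gx| + |Gy| approximation
    lap = 4 * z[5] - (z[2] + z[4] + z[6] + z[8])
    return gx, gy, magnitude, lap


# A horizontal edge: the bottom two rows are bright.
img = np.zeros((4, 4))
img[2:, :] = 10.0
gx, gy, magnitude, lap = edge_features(img)
print(gx)   # 40 everywhere: strong change from top to bottom rows
print(gy)   # 0 everywhere: no change from left to right
print(lap)  # -10 on the dark side of the edge, +10 on the bright side
```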
35
GOES Cumulus Cloud Classification: Experiment
Process
  • Training
    • Samples selected from a 1000x1000 GOES scene
    • Only two classes are used: Cumulus and Others
      (includes background)
    • For validation, samples were labeled by at least
      two experts, and only pixels where the experts
      agreed were used for training
    • A maximum likelihood classifier was trained using
      GLCM, GLRL, Association Rules and Edge detection
      features
    • Window size was varied from 5x5 to 11x11
  • Testing
    • 12 different GOES images (512x512) were used for
      testing
    • Classification results were compared against
      expert-labeled images
    • Confusion matrix, classification accuracy and
      experiment run times were calculated
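A maximum likelihood classifier is commonly implemented by fitting one multivariate Gaussian per class and assigning each feature vector to the class with the highest likelihood. The sketch below makes that Gaussian assumption and assumes equal class priors; the class name, the small covariance ridge and the two-feature training vectors are hypothetical, not the study's configuration.

```python
import numpy as np


class GaussianMLClassifier:
    """One multivariate Gaussian per class; equal class priors assumed."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for k in self.classes_:
            Xk = X[y == k]
            cov = np.atleast_2d(np.cov(Xk, rowvar=False))
            cov = cov + 1e-6 * np.eye(X.shape[1])  # small ridge keeps cov invertible
            _, logdet = np.linalg.slogdet(cov)
            self.params_.append((Xk.mean(axis=0), np.linalg.inv(cov), logdet))
        return self

    def log_likelihoods(self, X):
        X = np.asarray(X, dtype=float)
        cols = []
        for mean, inv_cov, logdet in self.params_:
            d = X - mean
            mahalanobis = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            cols.append(-0.5 * (mahalanobis + logdet))  # constant term dropped
        return np.column_stack(cols)

    def predict(self, X):
        return self.classes_[np.argmax(self.log_likelihoods(X), axis=1)]


# Hypothetical two-feature texture vectors (e.g., two features per window).
others = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
cumulus = [[x + 5, y + 5] for x, y in others]
clf = GaussianMLClassifier().fit(others + cumulus, ["other"] * 5 + ["cumulus"] * 5)
print(clf.predict([[0.2, 0.3], [5.1, 4.9]]))
```

In the experiment described above, each pixel's feature vector would come from one of the texture or edge-detection measures computed over its window.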

36
GOES Cumulus Cloud Classification Sample Result
[Figure: sample results. Panels: Original, GLRL,
Association Rules, GLCM, Expert Labeled, Sobel,
Sobel + Laplacian, Laplacian]
37
GOES Cumulus Cloud Classification: Conclusions
  • Accuracy
    • Best results using texture features:
      • GLRL (78%) with a filter size of 11x11
      • Association Rules (75%) with a filter size of 5x5
      • GLCM gave the worst results (51-55%)
    • Best results using edge detection filters:
      • Sobel Filter (78%) with a filter size of 11x11
      • Laplacian (73%) with a filter size of 9x9
      • Laplacian and Sobel (75%) with a filter size of
        9x9
  • Timing Results
    • Times were measured on a 933 MHz Pentium III
      PC with 512 MB of memory
    • Texture feature techniques in general required an
      order of magnitude more time than edge detection
      filters

38
Dimensionality Reduction: Mesoscale Convective
System (MCS) Detection
Populating the Knowledge Base (reducing data volume)
Scientists:
  • Define the experiment
  • Select algorithm (Devlin)
  • Automatic extraction of MCSs from SSM/I data

[Figure: SSM/I Data, Mining Results (MCSs), Knowledge
Base, Event/Relationship Search System, Scientists]
39
Dimensionality Reduction: Research Analysis
  • Reduced amount of data
  • Allows scientists to pose questions and get
    results
  • Allows easy visualization
  • Maximizes knowledge discovery and minimizes data
    handling
  • Scientists can refine their knowledge repository
  • Answer the science questions
  • Analysis
    • Find MCSs over river basins in the Middle East
  • Data Sets
    • MCSs
    • River basin data set
    • Political boundaries

[Figure: Scientists, SSM/I Data, Mining Results (MCSs),
Knowledge Base, Event/Relationship Search System]
40
Dimensionality Reduction: Knowledge Reuse
  • Climatological Study of MCSs
    • What is the latitudinal distribution of MCSs?
    • Which continent has more MCSs?
    • What is the size distribution of the MCSs for
      JUN-JUL-AUG?
    • What is the relationship between the number of
      MCSs and their intensities?
    • Do results vary for El Niño years?

[Figure: Scientists (knowledge reuse), SSM/I Data,
Mining Results (MCSs), Knowledge Base,
Event/Relationship Search System]
41
Event/Relationship Search System
  • Allows users to conduct coincidence searches and
    relationship tests between mined phenomena and a
    variety of parameters
  • Parameters include geographic regions,
    political boundaries, or other named phenomena
    for a specific time period
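A coincidence search of this kind can be sketched as a filter over mined event records. In this minimal illustration, the record layout and function name are hypothetical, a latitude/longitude box stands in for real region geometry such as river basins or political boundaries, and the coordinates and dates are illustrative rather than mined results.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MinedEvent:
    """A mined phenomenon instance (HYPOTHETICAL record layout)."""
    name: str
    lat: float
    lon: float
    day: date


def coincidence_search(events, lat_range, lon_range, start, end):
    """Events located inside a lat/lon box and dated within [start, end]."""
    (lat_min, lat_max), (lon_min, lon_max) = lat_range, lon_range
    return [
        e for e in events
        if lat_min <= e.lat <= lat_max
        and lon_min <= e.lon <= lon_max
        and start <= e.day <= end
    ]


events = [
    MinedEvent("MCS-1", 33.0, 44.0, date(1995, 7, 3)),
    MinedEvent("MCS-2", 10.0, -20.0, date(1995, 7, 9)),
    MinedEvent("MCS-3", 31.5, 40.2, date(1995, 1, 15)),
]
hits = coincidence_search(events, (29.0, 38.0), (38.0, 48.0),
                          date(1995, 6, 1), date(1995, 8, 31))
print([e.name for e in hits])  # ['MCS-1']
```

A production system would replace the box test with point-in-polygon tests against named regions, but the time-window and location filters combine the same way.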

42
References
  • Graves, Sara J., Thomas Hinke, Shanlini Kansal,
    "Metadata: The Golden Nuggets of Data Mining",
    First IEEE Metadata Conference, Bethesda,
    Maryland, April 16-18, 1996
  • Hinke, Thomas, John Rushing, Shanlini Kansal,
    Sara J. Graves, Heggere S. Ranganath, "For
    Scientific Data Discovery: Why Can't the Archive
    Be More Like the Web?", Proceedings of the Ninth
    International Conference on Scientific and
    Statistical Database Management, Evergreen State
    College, Olympia, Washington, August 11-13, 1997
  • Hinke, Thomas, John Rushing, Heggere S.
    Ranganath, Sara J. Graves, "Techniques and
    Experience in Mining Remotely Sensed Satellite
    Data", Artificial Intelligence Review 14(6),
    Issues on the Application of Data Mining, pp.
    503-531, December 2000
  • Hinke, Thomas, John Rushing, Shanlini Kansal,
    Sara J. Graves, Heggere S. Ranganath, Evans
    Criswell, "Eureka: Phenomena Discovery and
    Phenomena Mining System", AMS 13th International
    Conference on Interactive Information and
    Processing Systems (IIPS) for Meteorology,
    Oceanography, and Hydrology, 1997

43
References
  • Hinke, Thomas, John Rushing, Heggere S.
    Ranganath, Sara J. Graves, "Target-Independent
    Mining for Scientific Data: Capturing Transients
    and Trends for Phenomena Mining", Proceedings of
    the Third International Conference on Knowledge
    Discovery and Data Mining (KDD-97), Newport Beach,
    California, August 14-17, 1997
  • Keiser, Ken, John Rushing, Helen Conover, Sara J.
    Graves, "Data Mining System Toolkit for Earth
    Science Data", Earth Observation (EO)
    Geo-Spatial (GEO) Web and Internet Workshop,
    Washington, D.C., February 1999
  • Rushing, John, Heggere S. Ranganath, Thomas
    Hinke, Sara J. Graves, "Using Association Rules
    as Texture Features", IEEE Transactions on
    Pattern Analysis and Machine Intelligence,
    Vol. 23, No. 8, pp. 845-858, 2001
  • Nair, Udaysankar J., John Rushing, Rahul
    Ramachandran, Kwo-Sen Kuo, Sara J. Graves, Ron
    Welch, "Detection of Cumulus Cloud Fields in
    Satellite Imagery", The International Symposium
    on Optical Science, Engineering and
    Instrumentation, Denver, 1999
  • Nair, U., J. Rushing, R. Ramachandran, R. Welch,
    and S. J. Graves, "Detection of Boundary Layer
    Cumulus Cloud Fields in GOES Satellite Imagery",
    submitted to Journal of Applied Meteorology,
    September 2001