Data Integration for Homeland Security - PowerPoint PPT Presentation

About This Presentation
Title:

Data Integration for Homeland Security

Description:

Physical basis/domain knowledge needed before applying algorithms ... May not utilize domain knowledge. May be difficult to prove validity of the results ...

Slides: 36
Provided by: kenk159

Transcript and Presenter's Notes

1
Creating a Data Mining Environment for Geosciences
Interface 2002, Montreal, April 18, 2002
Sara J. Graves
Director, Information Technology and Systems Center
Professor, Computer Science Department
University of Alabama in Huntsville
Director, Information Technology and Research Center
National Space Science and Technology Center
256-824-6064
sgraves_at_itsc.uah.edu
www.itsc.uah.edu
2
(No Transcript)
3
Characteristics of Science Data
  • Varied kinds of data
  • Raster images
  • With structure and geometry
  • Multispectral
  • Time series and sequence data
  • Numerical model outputs
  • Multiple resolutions/multiple scales
  • Variability of data formats
  • Granularity of data
  • Includes spatial and temporal dimensions
  • Physical basis/domain knowledge needed before
    applying algorithms
  • Typically requires domain-specific algorithms

4
Scientific Analysis
  • Harnesses human analysis capabilities
  • Highly creative
  • Based on theory and hypothesis formulation
  • Physical basis is normally used for algorithms
  • Drawing insights about the underlying phenomena
  • Rapidly widening gap between data collection
    capabilities and the ability to analyze data
  • Potential of vast amounts of data to be unused

5
Data Mining
  • Provides automation of the analysis process
  • Can be used for dimensionality reduction when
    manual examination of data is impossible
  • Can have limitations
  • May not utilize domain knowledge
  • May be difficult to prove validity of the results
  • There may not be a physical basis
  • Should be viewed as complementary tool and not a
    replacement for scientific analysis

6
Reasons for Mining Science Data
  • Powerful tool for research and analysis given
    the volume of science data
  • Necessity when manual examination of data is
    impossible
  • Can allow scientists to refine/add more layers
    to the knowledge bases
  • Can minimize scientists' data handling to allow
    them to maximize research time
  • Can reduce reinventing the wheel
  • Can fully exploit reusable knowledge bases for
    different problems
  • Can be integrated into a Next Generation
    Information System to provide additional
    services such as
  • Custom Order Processing
  • Subsetting/Formatting/Gridding
  • Event/Relationship Searching

7
Similarity between Data Mining and Scientific
Analysis Process
8
Data Challenge
[Diagram labels: Data Sets; Search and Access Data; Data Preparation for Mining/Analysis; Data Integration; Data Transformation; Data Reduction; Data Mining; Science Analysis; Results]
9
Typical Data Preparation Operations
  • Data Cleaning
  • Clean data by filling in missing values,
    smoothing noisy data, identifying or removing
    outliers, and resolving inconsistencies.
  • Fairly well handled
  • Data Integration
  • Integration of multiple data files
  • Data Transformations
  • Normalization and aggregation
  • Data Reduction
  • Obtain a reduced representation of the data set
    that produces the same (or nearly the same)
    analytical results
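
The four operations above can be sketched in a few lines. This is a minimal illustration only, not the tooling used by the presenter: the function names, the mean-fill rule, the min-max scaling, and the block-averaging reduction are all assumptions chosen for brevity, and the sample values are invented.

```python
from statistics import mean


def clean(values):
    """Data cleaning: fill missing values (None) with the mean of the rest."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def integrate(*files):
    """Data integration: concatenate records from several data files."""
    merged = []
    for records in files:
        merged.extend(records)
    return merged


def normalize(values):
    """Data transformation: min-max normalization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def reduce_by_block(values, block):
    """Data reduction: replace each block of samples with its mean."""
    return [mean(values[i:i + block]) for i in range(0, len(values), block)]


# Invented sample "files"; None marks a missing value.
file_a = [10.0, None, 14.0]
file_b = [18.0, 20.0, None]

merged = integrate(clean(file_a), clean(file_b))
scaled = normalize(merged)
reduced = reduce_by_block(scaled, 2)
```

Block averaging only approximates the original series, so whether it preserves the analytical results depends on the analysis being run.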

10
User Perspective and Data Perspective of the Data
Mining Process
[Diagram labels: User Perspective; Data Perspective; Data; Information; Knowledge; Decision; Volume; Value; Data Stores; Dataset; Calibration; Navigation; Preprocessing; Dataset Specific Algorithms; Domain Specific Algorithms; Transformation; Analysis]
11
Scientific Data Mining Environment Stakeholders
End Users
Scientists
12
Scientists' Perspective
  • Define the experiment
  • Reduce data volume
  • Create reusable Knowledge Base
  • Iterate over experiment to refine the knowledge
    base
  • Minimize data handling/Maximize research
  • Add more layers to the knowledge base
  • Allow different levels of knowledge discovery
  • Shallow knowledge
  • Hidden
  • Deep

13
End Users' Perspective
  • End users can be
  • Students
  • Public
  • Decision makers
  • Other Scientists
  • Access to data
  • Access to knowledge base
  • End products

14
NASA Workshop on Issues of Application of Data
Mining to Scientific Data
  • Held October 19-21, 1999, at the University of
    Alabama in Huntsville
  • Domain Focus
  • Global Change
  • Natural Hazard
  • Terrestrial Ecology
  • Key Recommendations
  • Need to create a data mining environment for
    facilitation, scalability and automation of
    scientific analysis for large scale data streams
  • Need to formulate critical partnerships between
    physical scientists, computer scientists and
    statisticians for an effective integration of
    analysis processes, scientific algorithms,
    statistical approaches and enabling computer
    architectures

15
Reasons for Building a Data Mining Environment
  • Provide the capabilities and flexibility of
    creative scientific analysis
  • Provide an infrastructure of mining algorithms
    and knowledge bases for creative analysis to
    reduce reinventing the wheel
  • Provide capabilities to add science algorithms
    to the framework
  • Support a spectrum of heterogeneous participants,
    data sources and technological approaches
  • Provide a framework with components and suitable
    management of the interfaces between them
  • Allow scientists to refine/add domain information
    to the mining environment
  • Minimize scientists' data handling to allow them
    to maximize research time
  • Incorporate a relevance feedback mechanism to learn
    methodologies for multiple domains
  • Integrate data mining functionality into other
    distributed systems

16
Mining Environment: When, Where, Who and Why?
  • WHERE
  • User Workstation
  • Data Mining Center
  • GRID
  • WHEN
  • Real Time
  • On-Ingest
  • On-Demand
  • Repeatedly
  • WHO
  • End Users
  • Domain Experts
  • Mining Experts
  • WHY
  • Event
  • Relationship
  • Association
  • Corroboration
  • Collaboration

Data Mining
17
ADaM History
  • Algorithm Development and Mining (ADaM) System
  • ADaM system developed under NASA HQ research
    grant
  • The system provides knowledge discovery, feature
    detection and content-based searching for data
    values, as well as for metadata.
  • It contains over 120 different operations to be
    performed on the input data stream.
  • Operations vary from specialized atmospheric
    science data-set specific algorithms to different
    digital image processing techniques, processing
    modules for automatic pattern recognition,
    machine perception, neural networks and genetic
    algorithms.
  • Developed an Event/Relationship Search System

18
ADaM Engine Architecture
[Diagram labels: Data; Translated Data; Preprocessed Data; Patterns/Models; Results; Processing]

Preprocessing
  • Selection and Sampling: Subsetting, Subsampling,
    Select by Value, Coincidence Search
  • Grid Manipulation: Grid Creation, Bin Aggregate,
    Bin Select, Grid Aggregate, Grid Select, Find Holes
  • Image Processing: Cropping, Inversion, Thresholding
  • Others...
Analysis
  • Clustering: K Means, Isodata, Maximum
  • Pattern Recognition: Bayes Classifier,
    Min. Dist. Classifier
  • Image Analysis: Boundary Detection, Cooccurrence
    Matrix, Dilation and Erosion, Histogram Operations,
    Polygon Circumscript, Spatial Filtering,
    Texture Operations
  • Genetic Algorithms
  • Neural Networks
  • Others...
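
As one concrete instance of the analysis operations named on this slide, a minimum distance classifier can be sketched as below. This is a generic textbook version, not ADaM's implementation: the function names, class labels, and two-element feature vectors are hypothetical.

```python
import math


def class_means(samples, labels):
    """Compute the mean feature vector of each class from labeled samples."""
    sums, counts = {}, {}
    for vec, lab in zip(samples, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def classify(vec, means):
    """Assign vec to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda lab: math.dist(vec, means[lab]))


# Hypothetical training data: (feature 1, feature 2) per sample.
training = [(200.0, 10.0), (210.0, 12.0), (280.0, 60.0), (290.0, 64.0)]
labels = ["cloud", "cloud", "clear", "clear"]

means = class_means(training, labels)
result = classify((205.0, 15.0), means)
```

The classifier needs only one mean vector per class, which keeps it cheap enough to apply to every pixel of a large image.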
19
Extensibility of ADaM
ADaM Mining Engine
Analysis Modules
Input Modules
Output Modules
20
Data Mining Environment
Data Mining Server
Mining Results
Event/ Relationship Search System
21
Event/Relationship Search System
  • Allows users to conduct coincidence searches and
    relationship tests between mined phenomena and a
    variety of parameters
  • Parameters include geographic regions,
    political boundaries, or other named phenomena
    for a specific time period
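
A coincidence search of this kind can be pictured as a filter over mined events by region and time period. The sketch below is an assumption about the general idea, not the actual Event/Relationship Search System: the `Event` and `Region` types, the bounding-box region, and the sample events are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    name: str
    lat: float
    lon: float
    day: date


@dataclass
class Region:
    name: str
    south: float
    north: float
    west: float
    east: float

    def contains(self, lat, lon):
        """True if the point lies inside this latitude/longitude box."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def coincidence_search(events, region, start, end):
    """Return events located in the region during the period [start, end]."""
    return [e for e in events
            if region.contains(e.lat, e.lon) and start <= e.day <= end]


# Invented mined events and an invented search region.
events = [
    Event("cyclone-A", 25.0, -80.0, date(1999, 9, 14)),
    Event("cyclone-B", 15.0, 130.0, date(1999, 9, 20)),
    Event("cyclone-C", 27.0, -85.0, date(1999, 11, 2)),
]
box = Region("example-box", 18.0, 31.0, -98.0, -78.0)
hits = coincidence_search(events, box, date(1999, 9, 1), date(1999, 9, 30))
```

A simple box does not handle regions that cross the 180° meridian or irregular political boundaries; those would need polygon tests.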

22
An Environment for On-board Processing (EVE)
  • Real-time on-board data mining can provide unique
    capabilities
  • Anomaly detection
  • Autonomous control and decision making
  • Immediate response
  • Direct satellite to Earth delivery of results
  • The Sensor Web is expanding and more processing
    is available on-orbit
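
To make the anomaly detection item concrete, here is a small streaming detector of the sort that could run with limited on-board memory. It is a generic running z-score check, not EVE's method: the class name, the threshold of 3 standard deviations, the warm-up length, and the readings are assumptions.

```python
import math


class StreamingAnomalyDetector:
    """Flags samples far from the running mean, using constant memory."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # z-score above which a sample is flagged
        self.warmup = warmup        # samples to absorb before testing
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Test x against the samples seen so far, then absorb it."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


detector = StreamingAnomalyDetector()
readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0, 10.1]
flags = [detector.update(r) for r in readings]
```

Because flagged samples are still absorbed, a large anomaly widens the running variance; a flight system might prefer to exclude them.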

23
Major EVE Components
EVE Software Architecture
24
Interchange Technology: Earth Science Markup
Language (ESML)
  • Facilitate effective utilization of distributed,
    heterogeneous data products
  • Enable interchangeable tools and services

Data
Applications
Integration
25
Interoperability: Accessing Heterogeneous Data
  • Earth science data comes in
  • Different formats, types and structures
  • Different states of processing (raw, calibrated,
    derived, modeled or interpreted)
  • Enormous volumes
  • One approach: Standard data formats
  • Difficult to implement and enforce
  • Can't anticipate all needs
  • Some data can't be modeled or is lost in
    translation
  • The cost of converting legacy data
  • A better approach: Interchange technologies
  • Earth Science Markup Language

HDF-EOS
HDF
netCDF
ASCII
GRIB
Binary
26
Data Usability
The Problem
[Diagram labels: DATA FORMAT 1; DATA FORMAT 2; DATA FORMAT 3; FORMAT CONVERTER; READER 1; READER 2; APPLICATION]
  • Specialized code for every format
  • Difficult to assimilate new data types
  • Expensive to convert legacy data
The Solution
  • Data interoperability
  • Define Once, Use Anywhere

27
Earth Science Markup Language
  • Specialized markup language for Earth Science
    information based on XML
  • Beyond traditional metadata to include both
    structural and semantic information needed to
    effect a practical runtime interpretation of a
    data set
  • Benefits of ESML
  • Enables independently developed applications and
    services to effectively utilize distributed,
    heterogeneous data products
  • Allows the end-user to integrate data sets of
    differing structures to aid in data fusion and
    analysis without having to write a special reader
    for each data set
  • Is simple enough that end-users can create their
    own ESML for on-hand datasets (new or legacy)

28
Example ESML file for an Image
  • Hurricane Mitch (981027)
  • Binary format (32 bits/pixel)
  • Image dimensions: 512 (X) by 300 (Y)
  • Mitch.esml

    <ESML>
      <SyntacticMetaData>
        <Binary GeoInfo="NoGeoInfo">
          <Array name="Track" occurs="300" DimName="Y">
            <Field name="Channel 1" type="int" size="32"
                   occurs="512" DataType="Data" DimName="X"/>
            <Data/>
          </Array>
        </Binary>
      </SyntacticMetaData>
    </ESML>
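
The "define once, use anywhere" idea can be illustrated by a generic reader that takes its layout from the ESML description instead of hard-coding it. The sketch below is not the ESML library: the attribute syntax is reconstructed from the slide text, the big-endian byte order and row-major layout are assumptions, and the pixel data is synthetic.

```python
import struct
import xml.etree.ElementTree as ET

# Description reconstructed from the slide; the quoting is an assumption.
ESML_TEXT = """
<ESML>
  <SyntacticMetaData>
    <Binary GeoInfo="NoGeoInfo">
      <Array name="Track" occurs="300" DimName="Y">
        <Field name="Channel 1" type="int" size="32"
               occurs="512" DataType="Data" DimName="X"/>
        <Data/>
      </Array>
    </Binary>
  </SyntacticMetaData>
</ESML>
"""


def describe(esml_text):
    """Return (rows, cols, bytes per value) from the description."""
    root = ET.fromstring(esml_text)
    array = root.find("./SyntacticMetaData/Binary/Array")
    field = array.find("Field")
    rows = int(array.get("occurs"))
    cols = int(field.get("occurs"))
    nbytes = int(field.get("size")) // 8
    return rows, cols, nbytes


def read_image(raw, esml_text, byte_order=">"):
    """Decode raw bytes into rows of integers using the description.

    byte_order is an assumption; the slide does not state endianness.
    """
    rows, cols, nbytes = describe(esml_text)
    if nbytes != 4:
        raise ValueError("this sketch only handles 32-bit integers")
    if len(raw) != rows * cols * nbytes:
        raise ValueError("data length does not match the description")
    values = struct.unpack(f"{byte_order}{rows * cols}i", raw)
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


# Synthetic stand-in for the image file contents.
rows, cols, nbytes = describe(ESML_TEXT)
raw = struct.pack(f">{rows * cols}i", *range(rows * cols))
image = read_image(raw, ESML_TEXT)
```

Pointing the same `read_image` at a differently shaped file would only require a different description, not new reader code.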
29
Atmospheric Science Mining Applications
  • Lightning Detection
  • Rainfall Identification and Estimation Study
  • Rainfall Accumulation Study
  • Tropical Cyclone Detection and Wind Speed
    Estimation
  • GOES Cumulus Cloud Classification
  • Mesoscale Convective System Detection
  • Detection of Jet Streams in Numerical Model Data

30
Tropical Cyclone Detection: Estimating Maximum
Wind Speed
Advanced Microwave Sounding Unit (AMSU-A) Data
  • Water cover mask to eliminate land
  • Laplacian filter to compute temperature
    gradients
  • Science Algorithm to estimate wind speed
  • Contiguous regions with wind speeds above a
    desired threshold identified
  • Additional test to eliminate false positives
  • Maximum wind speed and location produced
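
The processing steps above can be sketched end to end on a toy grid. This is only an outline of the sequence: the wind speed formula is a hypothetical placeholder for the science algorithm, the minimum-region-size check stands in for the unspecified false positive test, and the brightness temperature values are invented.

```python
def laplacian(grid):
    """4-neighbour discrete Laplacian; border cells are left at 0."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = (grid[r - 1][c] + grid[r + 1][c] + grid[r][c - 1]
                         + grid[r][c + 1] - 4 * grid[r][c])
    return out


def estimate_wind(lap_value, scale=10.0):
    """Hypothetical placeholder: wind speed proportional to |Laplacian|."""
    return scale * abs(lap_value)


def contiguous_regions(mask):
    """Group 4-connected True cells into regions using flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack, cells = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(cells)
    return regions


def detect(tb, water, threshold, min_cells=2):
    """Return (max wind speed, (row, col)) for each accepted region."""
    rows, cols = len(tb), len(tb[0])
    lap = laplacian(tb)
    # Land cells (water == False) are masked out by forcing wind to 0.
    wind = [[estimate_wind(lap[r][c]) if water[r][c] else 0.0
             for c in range(cols)] for r in range(rows)]
    mask = [[w > threshold for w in row] for row in wind]
    found = []
    for cells in contiguous_regions(mask):
        if len(cells) < min_cells:  # stand-in false positive test
            continue
        r, c = max(cells, key=lambda rc: wind[rc[0]][rc[1]])
        found.append((wind[r][c], (r, c)))
    return found


# Invented 5x5 brightness temperature field with a warm core.
tb = [[250.0] * 5 for _ in range(5)]
tb[2][2] = 256.0
tb[2][3] = 254.0
water = [[True] * 5 for _ in range(5)]
water[0][0] = False  # one land cell
storms = detect(tb, water, threshold=80.0)
```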

[Diagram labels: Data Archive; Calibration/Limb Correction/Converted to Tb; Mining Environment; Result; Hurricane Floyd]
Results are placed on the web and made available
to the National Hurricane Center and the Joint
Typhoon Warning Center
31
Data Fusion and Mining: From Global Information
to Local Knowledge
Emergency Response
Precision Agriculture
Urban Environments
Weather Prediction
32
Mining as a Web Service
33
Mining on Information Power Grid (IPG) using ADaM
IPG Processor
Mining LDAP Server
34
Earth Science Example of Developing a Knowledge
Network: Collaborative Research in Mesoscale
Convective Systems
Data Sets: SSM/I (F13, F14)
ADaM System
  • Add algorithm to detect MCSs
  • Generate end products while mining
Knowledge Base
  • Information about MCSs detected (location, size,
    intensity, etc.)
  • Eureka Spatial Database
  • Visualization, Eureka Interface
Pose questions and get answers from the
Knowledge Repository (such as coincidence search,
relationship testing)
Anyone can access the knowledge base via the web
Scientists/Researchers can ask questions such as
  • What is the latitudinal distribution of MCSs?
  • Which continent has more MCSs?
  • What is the seasonal distribution of MCSs?
  • What is the relationship between the
    number of MCSs and their intensity?
End Users
  • Generate information useful to the general
    public (students, researchers, policy
    makers, etc.)
  • Images
  • Forecast aids
  • General science information
  • Answer the practical side of the problem
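
Questions such as the latitudinal and seasonal distribution of MCSs reduce to simple aggregations once detections are stored in a knowledge base. The sketch below assumes a hypothetical record layout (`lat`, `month`, `intensity`) and invented detections; it does not reflect the actual knowledge base schema.

```python
from collections import Counter

# Invented MCS detections; field names are assumptions.
detections = [
    {"lat": 5.0, "month": 7, "intensity": 3},
    {"lat": -12.0, "month": 1, "intensity": 2},
    {"lat": 8.0, "month": 7, "intensity": 5},
    {"lat": 35.0, "month": 6, "intensity": 4},
]


def latitude_band(lat, width=10):
    """Lower edge of the latitude band (in degrees) containing lat."""
    return int(lat // width) * width


def season(month):
    """Three-month season label: DJF, MAM, JJA or SON."""
    return ("DJF", "MAM", "JJA", "SON")[(month % 12) // 3]


# Latitudinal distribution: number of detections per 10-degree band.
by_band = Counter(latitude_band(d["lat"]) for d in detections)

# Seasonal distribution: number of detections per season.
by_season = Counter(season(d["month"]) for d in detections)
```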

35
Challenges
  • Develop and document common/standard interfaces
    for interoperability of data and services
  • Design new data models for handling
  • real-time/streaming input
  • data fusion/integration
  • Design and develop distributed standardized
    catalog capabilities
  • Develop advanced resource allocation and load
    balancing techniques
  • Exploit distributed infrastructure for enhanced
    data mining functionality (grids, web services,
    etc.)
  • Develop more intelligent and intuitive user
    interfaces
  • Develop ontologies of scientific data, processes
    and data mining techniques for multiple domains
  • Support language- and system-independent
    components
  • Incorporate data mining into scientific curricula