Information Technology - PowerPoint PPT Presentation

Slides: 54
Provided by: Kei756
Transcript and Presenter's Notes

Title: Information Technology


1
Information Technology Systems Center
  • Sara J. Graves
  • Director, Information Technology and Systems
    Center
  • University Professor, Computer Science Department
  • University of Alabama in Huntsville
  • Director, Information Technology Research Center
  • National Space Science and Technology Center
  • 256-824-6064
  • sgraves_at_itsc.uah.edu

http://www.itsc.uah.edu
2
Invasive Species Data Service: IT Vision
  • Networked, distributed system to integrate NASA
    ESE and NBII data sources
  • Customized, easily accessible data products
  • Aggregated, thematic, interdisciplinary
  • Web-based service interoperability
  • User interface tailored to user community

3
IT Components
  • Data catalogs
  • To provide transparent access to ESE, NBII, and
    user-supplied data resources
  • Service catalogs
  • Service metadata, semantics
  • Service Integration Function
  • Loosely coupled, dynamically bound services (read
    routines, georegistration, mining modules,
    subsetting, reprojection, aggregation)
  • Standard service chaining protocols (SOAP, OGC,
    etc.)
  • To support
  • Basic pre-processing steps performed dynamically
    with minimal interventions by the user
  • Creation of aggregated, thematic,
    interdisciplinary data products
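The idea of loosely coupled, dynamically bound services can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry, the service names (`subset`, `aggregate`), and the in-memory grid are hypothetical stand-ins, not the ISDS design, and no SOAP or OGC protocol is involved.

```python
from typing import Callable, Dict, List

Grid = List[List[float]]
Service = Callable[[Grid], Grid]

# Hypothetical service registry: services are looked up by name at run time,
# so a chain is bound dynamically instead of being hard-coded.
REGISTRY: Dict[str, Service] = {}


def register(name: str) -> Callable[[Service], Service]:
    """Decorator that publishes a function in the registry under `name`."""
    def wrap(fn: Service) -> Service:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("subset")
def subset(grid: Grid) -> Grid:
    """Keep the top-left 2x2 window (a stand-in for spatial subsetting)."""
    return [row[:2] for row in grid[:2]]


@register("aggregate")
def aggregate(grid: Grid) -> Grid:
    """Collapse the grid to a single mean value."""
    values = [v for row in grid for v in row]
    return [[sum(values) / len(values)]]


def run_chain(names: List[str], data: Grid) -> Grid:
    """Apply the named services in order, feeding each output to the next."""
    for name in names:
        data = REGISTRY[name](data)
    return data


result = run_chain(
    ["subset", "aggregate"],
    [[1.0, 2.0, 9.0], [3.0, 4.0, 9.0], [9.0, 9.0, 9.0]],
)
```

Because the chain is just a list of names, a user interface could assemble a different product by reordering or swapping entries without touching the services themselves.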

4
ISDS Services for Customized Data Products
[Diagram labels: Scientists and Policy Makers; ISDS User Interface; Data and Service Catalogs (NBII, ECHO, GCMD, ESML registry, local and distributed ontologies); Service Integrator; Custom data processing and delivery service chain (ESML data reader, Subsetter, Re-grid, Miner, Aggregation, Display); Data; Services]
5
ITSC Relevant Skills and Technologies
  • Accessing and using heterogeneous, distributed
    data
  • ESML Interchange Technologies
  • Subsetting spatial data
  • HEW Subsetting Engine
  • ESML-based Subsetting
  • Analysis tools for data mining and image
    processing
  • ADaM
  • Distributed Science Data and Information
    Management
  • EOSDIS node development and operation (GHRC, LIS
    SCF)
  • ESIP Federation (GHRC, PM-ESIP, Interoperability
    and Technology Committee, GIS Cluster, Federation
    web site)
  • Distributed processing and delivery (AMSR-E SIPS)

6
Data Usability Success Builds on the Integration
of Domain Science and Information Technology
  • Collaborations
  • Accelerate research process
  • Maximize knowledge discovery
  • Minimize data handling
  • Contribute to both fields

7
Improving Data Usability
  • Advanced Applications Development
  • Data organization and management for archival and
    analysis
  • Data Mining in real-time and for post run
    analysis
  • Interchange Technologies for improved data
    exploitation
  • Semantics to transform data exploitation via
    intelligent automated processing
  • Infrastructure Development
  • Grid technologies for seamless access to multiple
    computational and data resources into a virtual
    computing environment
  • Cluster technologies for high speed parallel
    computation, for multiple agent computations, and
    other applications
  • High-performance networking for advanced
    applications development and high-speed
    connectivity
  • Next generation technologies in videoconferencing
    and electronic collaboration

8
Heterogeneity Leads to Data Usability Problems
  • Earth Science Data Characteristics
  • Many different formats, types and structures (18
    and counting for atmospheric science alone!)
  • Different states of processing (raw, calibrated,
    derived, modeled or interpreted)
  • Enormous volumes

9
Interoperability: Accessing Heterogeneous Data
[Diagram, "The Problem": Data Formats 1, 2, and 3 reach each application through a format converter or separate readers (Reader 1, Reader 2)]
[Diagram, "The Solution": Data Formats 1, 2, and 3 are each described by an ESML file and reach the application through the ESML Library]
  • One approach: enforce a standard data format,
    but
  • Difficult to implement and enforce
  • Can't anticipate all needs
  • Some data can't be modeled or is lost in
    translation
  • Converting legacy data is costly
  • A better approach: Interchange Technologies
  • Earth Science Markup Language

10
What is ESML?
  • It is a specialized markup language for Earth
    Science metadata based on XML - NOT another data
    format.
  • It is a machine-readable and -interpretable
    representation of the structure, semantics and
    content of any data file, regardless of data
    format
  • ESML description files contain external metadata
    that can be generated by either data producer or
    data consumer (at collection, data set, and/or
    granule level)
  • ESML provides the benefits of a standard,
    self-describing data format (like HDF, HDF-EOS,
    netCDF, geoTIFF, ...) without the cost of data
    conversion
  • ESML is the basis for core Interchange Technology
    that allows data/application interoperability
  • ESML complements and extends data catalogs such
    as FGDC and GCMD by providing the use/access
    information those directories lack.
  • http://esml.itsc.uah.edu
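A rough sketch of the external-metadata idea: the data file stays in its native binary layout, and a separate description tells a generic reader how to decode it. The XML vocabulary below (`description`, `field`, `order`, `type`) is invented for illustration and is not actual ESML syntax; the field names and values are likewise made up.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description syntax, NOT real ESML: each <field> names a value
# and gives its `struct` type code; `order` gives the byte order.
DESCRIPTION = """
<description order="big">
  <field name="lat" type="f"/>
  <field name="lon" type="f"/>
  <field name="skin_temp" type="h"/>
</description>
"""


def read_records(raw: bytes, description_xml: str):
    """Decode fixed-size binary records using only the external description."""
    root = ET.fromstring(description_xml)
    fields = root.findall("field")
    prefix = ">" if root.get("order") == "big" else "<"
    names = [f.get("name") for f in fields]
    fmt = prefix + "".join(f.get("type") for f in fields)
    size = struct.calcsize(fmt)
    return [
        dict(zip(names, struct.unpack_from(fmt, raw, offset)))
        for offset in range(0, len(raw) - size + 1, size)
    ]


# Two made-up records written in the layout the description declares.
raw = struct.pack(">ffh", 34.5, -86.5, 295) + struct.pack(">ffh", 35.0, -86.0, 301)
records = read_records(raw, DESCRIPTION)
```

Supporting a different layout means writing a new description, not a new reader and not a converted copy of the data.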

11
ESML Components
12
ESML v3.0 Library Layered Design
  • The core ESML library provides the basic
    functionality of reading structural metadata from
    the ESML file and returning data to the user
  • Intuitive user API based on the analogy of file
    access in a directory structure
  • Plug-in modules for each individual format allow
    flexible packaging of libraries
  • Simple Plug-in API for easy addition of new
    formats
  • Additional software can be easily added to
    provide other functions
  • Versions Available
  • C for Windows and Linux
  • Python (pyESML)

[Diagram labels, layered design: Level 1 API; Semantic Parser; User Level 0 API; DOM Tree; Plug-in API; format plug-ins (Binary, HDF-EOS)]
13
ESML IN ACTION: Ingest surface skin temperature
data in Numerical Models
  • Purpose
  • Use ESML to incorporate observational data into
    the numerical models for simulation
  • Skin temperatures come in a variety of data
    formats
  • GOES - McIDAS
  • Reanalysis Data - GRIB
  • MM5 Model - Binary
  • AVHRR - HDF
  • MODIS - HDF-EOS

[Diagram labels: Reanalysis GRIB files, MM5, and GOES data, each with an ESML file]
  • Scientists can
  • Select remote files across the network
  • Select different observational data to increase
    the model prediction accuracy

http://vortex.nsstc.uah.edu/sud/web/default.htm
14
ESML IN ACTION: Collocation Algorithm
[Diagram labels: MODIS, CERES, and MISR/other data, each with an ESML file, feeding the ESML Library]
  • Purpose
  • To study the relationship between shortwave flux
    and cloud or aerosol properties
  • Important for climate change studies

[Diagram labels: Collocation Algorithm; Analysis]
http://vortex.nsstc.uah.edu/seth/multiplot.htm
15
ESML-Enabled Generic Subsetter
[Diagram labels: HDF-EOS, Binary/ASCII, and other formats, each with an ESML file; Network; ESML Library; Subsetting Algorithm; Subsetted Data]
For HDF-EOS data not formatted for subsetting
with the HDF-EOS library, an ESML file can be used
to correct the semantic tag required to subset
the data, without the need to recreate the
data file.
16
Smart Applications/Services using ESML and
Ontologies
  • The ESML Schema focuses on providing structural
    interoperability between data and applications
  • However, ESML allows embedding semantic terms for
    data fields in the Description File to provide a
    complete structural and semantic description of
    the data
  • Various science communities can create their own
    ontologies (for example, SWEET) and link them
    with ESML Description Files for their data
  • Application developers can add semantic parsers
    on top of the core ESML Library to build smart
    applications or services

[Diagram labels: ESML Schema (Structural Information); Ontologies (Semantic Information)]
17
Prototype Smart Subsetter
  • To demonstrate a smart application using ESML and
    ontologies, a subsetting prototype is being
    developed
  • Subsetting is a frequent preprocessing step used
    by scientists to reduce the size and complexity
    of the data
  • The subsetting prototype parses the semantic tags
    embedded in the ESML Description File
  • The subsetting prototype then uses the linked
    ontologies to decipher meaning of these tags to
    make useful decisions
  • Components of this Prototype
  • Simple ontologies describing Subsetting and
    Dataset
  • ESML Description Files
  • Reasoning System used as an inference engine

Dataset Ontology
Subsetting Ontology
18
Current Status
  • ESML data formats
  • Currently supported
  • ASCII, Binary, HDF-EOS, netCDF, Grib, HDF5, BUFR,
    NEXRAD Level II
  • ESML Library
  • Currently available
  • C for Windows and Linux, Python plugin, IDL
    plugin
  • ESML DODS Server
  • ESML Editor application
  • ESML Data Browser
  • http://esml.itsc.uah.edu

19
Subsetting
  • Goal: to provide a science data user with only
    the data they request, as quickly as possible
  • Benefits science data users and data centers:
  • Reduces analysis time by reducing the amount of
    data
  • Reduces time for data delivery
  • Reduces resources (network, personnel, media,
    etc.)
  • Steps:
  • Locate spatial / temporal / spectral area of
    interest
  • Extract
  • Re-assemble for distribution
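The locate, extract, and re-assemble steps can be sketched for the spatial case on a small regular grid. This is a minimal illustration with made-up coordinates and values; the function name and the bounding-box order (south, north, west, east) are assumptions, and it is unrelated to the HSE implementation.

```python
def subset_grid(lats, lons, values, bbox):
    """Return the part of a lat/lon grid that falls inside a bounding box."""
    south, north, west, east = bbox
    # Locate: find row and column indices inside the area of interest.
    rows = [i for i, lat in enumerate(lats) if south <= lat <= north]
    cols = [j for j, lon in enumerate(lons) if west <= lon <= east]
    # Extract and re-assemble: build a smaller, self-consistent product.
    return {
        "lats": [lats[i] for i in rows],
        "lons": [lons[j] for j in cols],
        "values": [[values[i][j] for j in cols] for i in rows],
    }


lats = [10.0, 20.0, 30.0]
lons = [-100.0, -90.0, -80.0, -70.0]
values = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
sub = subset_grid(lats, lons, values, (15.0, 35.0, -95.0, -75.0))
```

The output keeps its own coordinate vectors so it can be distributed as a stand-alone product.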

20
ITSC Subsetting Tools
  • HDF-EOS Subsetting Engine (HSE)
  • Dataset-independent subsetting service for
    HDF-EOS data
  • Callable function for integration into other
    applications
  • Available as stand-alone executable
  • Integrated into data ordering and delivery
    components of EOSDIS Core System
  • Planned web service available through ECHO and
    other service brokers
  • Specialized subsetting and data aggregation tools
    for MODIS Land team
  • modland: subsetter for MODIS gridded data
  • stitcher: pieces together 2 or 4 contiguous
    MODIS tiles
  • ESML-based subsetting and related data services

21
HSE: HDF-EOS Subsetting Engine
  • Callable function can be integrated into other
    applications
  • Uses HDF-EOS (and HDF) library
  • Handles Swath and/or Grid objects
  • Unix (SGI and Sun) available (Linux planned)
  • Optionally updates metadata in output files to
    contain
  • StructMetadata (HDF-EOS)
  • ArchiveMetadata
  • ProductMetadata (added by HEW subset request)
  • CoreMetadata (w/ modified bounding box and time
    info)
  • optionally placed in .met file
  • if present in parent file

22
HSE Subsettable datasets
  • EOS DATASETS
  • Terra
  • MODIS
  • MOPITT
  • ASTER
  • Aqua
  • AMSR-E
  • AIRS
  • MODIS
  • Aura
  • HIRDLS
  • OTHERS
  • TRMM
  • TMI
  • NOAA-15, 16, 17
  • AMSU-A
  • Any other HDF-EOS datasets written with HDF-EOS
    calls in mind

23
HDF-EOS Web-based (HEW) Subsetter
[Diagram labels: User's Browser (HTML); User Interface (CGI); Subsetting API (ODL); HEW; HSE; Input file; Output file]
1. The User Interface CGI checks the HDF-EOS
file and presents the attributes to the user.
2. The user interacts with the browser to specify
the subsetting criteria.
3. The User Interface CGI creates the ODL file
with the subsetting criteria.
4. The Subsetter uses the ODL file and the
HDF-EOS file to create the subset HDF-EOS file.
24
SPOT
  • Subsettability checker
  • Displays content/structure of HDF-EOS files
  • Examines files for subsettability by HSE
  • Simple command-line interface
  • Stand-alone operation
  • Available for SGI and Sun at subset.org

25
HEW integration with ECS
[Diagram labels: End user; EDG System (EDG); ECS; HDF-EOS Subsetting Appliance (Subsetting System, HSE). Numbered flows 1 to 7 include: order submission (HTML); data order and reply; subset ODL and reply; input data; output data; output data (reingested)]
26
ECS integration status
  • EDG v3.5.1 has basic subsetting options
  • Operational at NSIDC
  • Testing at LPDAAC (EDC)
  • Testing to begin at GDAAC soon
  • Further enhancements for DAACs

27
Subsetting as a Web Service
[Diagram labels: ECHO; Subsetting Center (HSE); Archive; subset request; URL to data on Archive; subsetted data]
28
Subsetting Web Site: subset.org
  • The subsetting portal is being created for
    everyone involved in subsetting
  • Advertising
  • Forums
  • Data
  • Software
  • Glossary
  • Tutorials
  • Links to specialized subsetters

29
Distributed Data Integration
Merged data product for on-demand visualization
[Figure layers: Countries; Coastlines; AMSU-A Channel 01; Cyclone Events; MCS Events. Sources: AMSU-A, Knowledge Base, ITSC, GLOBE]
AMSU-A data overlaid with MCS and Cyclone events,
merged with world boundaries from GLOBE.
30
Chained Image Processing Services
Service chaining integrates modules or services
developed on distributed platforms and in
different languages into a single processing
solution.
31
Data Mining
  • Automated discovery of patterns, anomalies from
    vast observational data sets
  • Derived knowledge for decision making,
    predictions and disaster response
  • ADaM: Algorithm Development and Mining System
  • http://datamining.itsc.uah.edu

32
Data Mining: Types of Mining
  • Association Rule Mining
  • Initially developed for market basket analysis
  • Goal is to discover relationships between
    attributes
  • Uses include decision support, classification and
    clustering
  • Classification and Prediction (Supervised
    Learning)
  • Classifiers are created using labeled training
    samples
  • Training samples created by ground truth /
    experts
  • Classifier later used to classify unknown samples
  • Clustering (Unsupervised Learning)
  • Grouping objects into classes so that similar
    objects are in the same class and dissimilar
    objects are in different classes
  • Discover overall distribution patterns and
    relationships between attributes
  • Other Types of Mining
  • Outlier Analysis
  • Concept / Class Description
  • Time Series Analysis
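For association rule mining, the two standard measures are support (how often an itemset appears) and confidence (how often the consequent appears given the antecedent). A small market-basket sketch with made-up transactions; the function names are chosen for this example and are not ADaM's API.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Made-up market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

A rule such as "bread implies milk" is usually kept only if both measures exceed user-chosen thresholds.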

33
ADaM System Overview
  • Developed by the Information Technology and
    Systems Center at the University of Alabama in
    Huntsville
  • Consists of over 75 interoperable mining and
    image processing components
  • Each component is provided with a C application
    programming interface (API) and an executable in
    support of scripting tools (e.g. Perl, Python,
    Tcl, Shell)
  • ADaM components are lightweight and autonomous,
    and have been used successfully in a grid
    environment
  • ADaM has several translation components that
    provide data level interoperability with other
    mining systems (such as WEKA and Orange), and
    point tools (such as libSVM and svmLight)
  • Components include Python wrappers; web
    service interfaces are planned

34
ADaM 4.0 Components
35
Data Mining in Action
  • Grid Mining
  • NASA Information Power Grid
  • NSF TeraGrid

Bioinformatics: Genome Patterns
  • Earth Science
  • Mining Model Data (Ames, Goddard, SWA)
  • Satellite Observations
  • Radar Observations

Space Science: Polar Cap Boundary in Auroras
36
Classification of Tabular Data
  • Wisconsin breast cancer data in ARFF format, from
    the University of California Irvine (UCI) Machine
    Learning Database
  • http://www.ics.uci.edu/~mlearn/MLRepository.html
  • The Naïve Bayes classifier will be trained to
    distinguish malignant vs. benign tumors based on
    nine characteristics
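To show what training a Naïve Bayes classifier involves, here is a minimal Gaussian Naïve Bayes in plain Python. It is a sketch under assumptions: the six two-feature samples are invented (not the Wisconsin data, which has nine characteristics), the Gaussian likelihood is one common choice, and none of this reflects ADaM's actual component interface.

```python
import math
from collections import defaultdict


def train(samples, labels):
    """Estimate a prior and per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # A small floor keeps the variance from being exactly zero.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[label] = (len(rows) / len(samples), stats)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Invented training data: two features per sample.
samples = [(1, 1), (2, 1), (1, 2), (8, 7), (9, 8), (7, 9)]
labels = ["benign"] * 3 + ["malignant"] * 3
model = train(samples, labels)
```

Calling `predict(model, (2, 2))` selects the class whose per-feature Gaussians best explain the sample, treating the features as independent given the class.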

37
Cumulus Cloud Classification
  • Science Rationale: Man-made changes to land use
    cause changes in weather patterns, especially
    cumulus clouds
  • ADaM allows comparison of many different
    classification techniques based on accuracy of
    detection and amount of time required to classify
  • Best algorithm can be used to create cloud mask
    product

[Image panels: Original; GLRL; Association Rules; GLCM]
38
Mining on Data Ingest: Tropical Cyclone Detection
Advanced Microwave Sounding Unit (AMSU-A) Data
  • Mining Plan
  • Water cover mask to eliminate land
  • Laplacian filter to compute temperature gradients
  • Science Algorithm to estimate wind speed
  • Contiguous regions with wind speeds above a
    desired threshold identified
  • Additional test to eliminate false positives
  • Maximum wind speed and location produced
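The mining plan above can be sketched end to end on a toy grid. Several parts are assumptions made only so the example runs: the wind estimate is a placeholder scaling of the Laplacian (the slide's science algorithm is not given), the false-positive test is a simple minimum region size, and the grid values are invented.

```python
from collections import deque


def laplacian(grid):
    """4-neighbour discrete Laplacian; border cells are left at 0."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1]
                         + grid[i][j + 1] - 4 * grid[i][j])
    return out


def estimate_wind(lap):
    """Placeholder, NOT the science algorithm: scale the gradient magnitude."""
    return [[10.0 * abs(v) for v in row] for row in lap]


def regions_above(wind, water, threshold):
    """Flood-fill 4-connected water cells whose wind exceeds the threshold."""
    h, w = len(wind), len(wind[0])
    seen, regions = set(), []
    for i in range(h):
        for j in range(w):
            if (i, j) in seen or not water[i][j] or wind[i][j] <= threshold:
                continue
            queue, cells = deque([(i, j)]), []
            seen.add((i, j))
            while queue:
                r, c = queue.popleft()
                cells.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen
                            and water[nr][nc] and wind[nr][nc] > threshold):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            regions.append(cells)
    return regions


def detect(temps, water, threshold, min_cells=1):
    """Run the plan: gradients, wind estimate, regions, then maxima."""
    wind = estimate_wind(laplacian(temps))
    results = []
    for cells in regions_above(wind, water, threshold):
        if len(cells) < min_cells:  # crude false-positive test (an assumption)
            continue
        r, c = max(cells, key=lambda rc: wind[rc[0]][rc[1]])
        results.append({"max_wind": wind[r][c], "location": (r, c)})
    return results


# Invented 5x5 brightness-temperature grid with one warm anomaly.
temps = [[250.0] * 5 for _ in range(5)]
temps[2][2] = 255.0
all_water = [[True] * 5 for _ in range(5)]
storms = detect(temps, all_water, threshold=100.0)
```

Marking the anomaly cell as land in the mask removes it from consideration, which is the role of the water cover mask in the plan.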

[Diagram labels: Data Archive; Calibration / Limb Correction / Converted to Tb; Mining Environment; Knowledge Base; Result (Hurricane Floyd); Further Analysis]
Results are placed on the web, made available to
the National Hurricane Center and Joint Typhoon
Warning Center, and stored for further analysis.
http://pm-esip.msfc.nasa.gov/cyclone
39
Mining Model Data
  • To advance its capacity in information extraction
    from models, the Global Modeling and Assimilation
    Office at GSFC, ITSC and Simpson Weather
    Associates propose to apply data mining
    frameworks for the analysis and information
    extraction of numerical model output data
    generated or archived at the GMAO. This will be
    done by conducting experiments focusing on the
    automated detection and mining of atmospheric
    phenomena relationships within the model data.
  • Tropical Cyclone Identification
  • The heuristic procedure considered all tropical
    ocean pixels and accepted those that
  • Had surface pressure below a certain threshold
    (990)
  • Had vorticity above a certain threshold (15)
  • As an alternative to the heuristic procedure, a
    clustering algorithm was used to derive the
    signature of the cyclones
  • Using pressure, vorticity
  • Using pressure, vorticity, temperature, cloud
    total
  • Using pressure, vorticity, cloud low

Sea Level Pressure Global Map
Wind Vector Overlay - Detail
40
Global Hydrology Resource Center Data Systems
and Services for Earth Science
  • Practical application of information technology
    research through the Global Hydrology Resource
    Center
  • Provide data and advanced information science
    applications to the science community, thereby
    enabling research and discovery
  • LIS SCF (1997-)
  • Passive Microwave ESIP (1998-2004)
  • AMSR-E SIPS (1999-)
  • MSFC DAAC (1992-1997)
  • ESIP Federation Web Site (1999-)
  • CAMEX 3, 4 (Fall 1998, Fall 2001)
  • ACES (Aug/Sept 2002)
  • SERVIR (2003-2008)
  • DISCOVER (2003-2008)

41
Event-driven Data Delivery Example
[Diagram labels: Satellite Data (AMSR-E, SSM/I, TMI, QuikSCAT, others); Calibration / Limb Correction / Converted to Tb; Near-Real-Time Mining for Events; Result; Notification (linked from Event Page); Subset and Packaging Services; Delivery; Science User]
Event detection triggers dynamic packaging of a
related suite of data products, delivered
immediately to subscribers and made available to
other users on the web.
Results are placed on the web, made available to
the National Hurricane Center and Joint Typhoon
Warning Center, and stored for further analysis.
http://pm-esip.nsstc.nasa.gov
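Event-driven delivery follows a publish/subscribe pattern: a detected event is packaged with related products and pushed to every subscriber. The sketch below is illustrative only; the `EventBus` class, the event fields, and the `_subset` product naming are assumptions, not the deployed system.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe dispatcher."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        """Register a callback to receive packages for one event type."""
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, event):
        """Package related products for the event and deliver to subscribers."""
        package = {
            "event": event,
            # Hypothetical naming: one subset product per contributing sensor.
            "products": [f"{sensor}_subset" for sensor in event["sensors"]],
        }
        for callback in self._subscribers[event_type]:
            callback(package)
        return package


bus = EventBus()
inbox = []
bus.subscribe("cyclone", inbox.append)
package = bus.publish("cyclone", {"name": "demo", "sensors": ["AMSR-E", "TMI"]})
```

Returning the package from `publish` also lets the caller post it to a web page for users who are not subscribed.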
42
Chained ED3 Distributed Services
[Diagram labels: Data Streams; Data Archive; Data Files; Event Listener; Trigger User Subscription; Reader (Windows) with ESML Lib and ESML; Reformat (Linux); Subset (Linux); Package (Linux); Knowledge Base; Data Delivery To Science User]
43
Related Projects
  • SERVIR (NASA REASoN CAN)
  • Providing NASA ESE data products and technology
    to produce and distribute accurate and timely
    decision support and environmental monitoring
    data products for Central America and the
    Mesoamerican Biological Corridor
  • ITSC role: Data and information system
    development, generation of decision support
    products, environmental monitoring, 3-D
    visualization of data for national leaders and
    training.
  • http://servir.nsstc.nasa.gov/
  • LEAD (NSF Large ITR)
  • An integrated, scalable framework for
    identifying, accessing, preparing, assimilating,
    predicting, managing, analyzing, mining and
    visualizing a broad array of meteorological data
    and model output, independent of format and
    physical location
  • ITSC role: Mining tools and analyses, ESML
    interchange technologies, data and service
    semantics
  • http://lead.ou.edu/

44
Website Home Page
45
SIAM SERVIR
[Diagram labels: SERVIR; Regional Coordination Standards; Sponsors; Training Capacity Building; Scientific Support; Env. Monitor Decision Sup. Applications; Climate Change Modeling; Data Archive Distribution Visualization. Numbered markers (1, 2, 3) refer to legend entries: NASA/WB/CCAD Project; NASA/USAID Cambio Climatico; NASA Funded Collaborators; Externally Funded Collaborators]
46
  • Partners: Oklahoma - K. Droegemeier, UAH - S.
    Graves, Colorado State - V. Chandrasekar,
    Illinois/NCSA - R. Wilhelmson, Indiana - D. Gannon,
    UCAR/Unidata - M. Ramamurthy, Howard - E. Joseph,
    Millersville - R. Clark
  • ITSC Contributions

[Diagram labels: MyLEAD Portal; MyLEAD Virtual Environment; Personal Data Space; Interchange Technologies; Workflow Orchestration; Semantics for data and services; Application Services (Visualization, Data Mining tools, Models, Others); Middleware (Data Management, Workflow Management, Monitoring); Grid and Web infrastructure (Resource Allocation, Scheduling, Security, Others)]
47
On-Board Real-Time Processing Sensor
Control/Targeting
EVE: Environment for On-board Processing
  • Anomaly detection
  • Data Mining
  • Autonomous Decision Making
  • Immediate response
  • Direct satellite to Earth delivery of results

www.itsc.uah.edu/eve
48
A Reconfigurable Web of Interacting Sensors
[Diagram labels: Satellite Constellations (Communications, Weather, Military); Ground Networks]
49
Example Application of EVE Technology: Lightning
Detection During Tornadic Activity
1) The user creates a mining plan using the EVE
editor
2) The Ground Station uploads the plan to
multiple on-board platforms
3) On-board Platform 1 uses its sensor to watch
for lightning events
4) Platform 1 notifies Platform 2 of the event
5) Platform 2 requests subsetting web services
from an NSSTC server
6) The results are sent back to Platform 1 for
display and further processing
50
Background Slides
51
ISDS in a Nutshell
Success will ultimately require a far more
comprehensive and sophisticated integration of
data from the earth and life sciences than is
currently possible, and will also require that
the multidisciplinary teams who deal with
invasive species issues have far better access to
heterogeneous data resources than is currently
available. That is the key problem that we hope
to address in building the Invasive Species Data
Service.
52
How it fits together
The NASA Office of Earth Science and the US
Geological Survey are developing a National
Invasive Species Forecasting System for the
management and control of invasive species on all
Department of Interior and adjacent lands
(Schnase et al., 2002a,b). The forecasting system
will be the first major client of the Invasive
Species Data Service, and the heterogeneous data
ingest need of the system is the principal design
driver for the ISDS. To clarify the relationship
between these components and the relevance of the
proposed work, we first describe the Invasive
Species Forecasting System then show how its data
ingest requirements define a new class of data
services that we intend to introduce by creating
the Invasive Species Data Service.
53
Relationship to ISFS
Context diagram for the Invasive
Species Forecasting System. The proposed Invasive
Species Data Service would complete the data
resource connection shown in the lower left oval.
ISDS