Data-Driven Discovery through e-Science Technologies

About This Presentation
Title:

Data-Driven Discovery through e-Science Technologies

Description:

Working Definition of a Constellation or a Sensor Web (Sensor Network) ... A Neural Map View of Planetary Spectral Images for Precision Data Mining ...

Slides: 29
Provided by: drkirk

Transcript and Presenter's Notes

Title: Data-Driven Discovery through e-Science Technologies


1
Data-Driven Discovery through e-Science
Technologies
  • Kirk D. Borne

George Mason University and QSS Group Inc.,
NASA-Goddard Space Flight Center
kborne_at_gmu.edu or kirk.borne_at_gsfc.nasa.gov
http://rings.gsfc.nasa.gov/nvo_datamining.html
7/19/2006
2
Looking to NASA's Future: Constellations of
Spacecraft and Sensor Webs
  • Working Definition of a Constellation or a Sensor
    Web (Sensor Network): a spatially distributed
    network of individual vehicles, or assets, acting
    collaboratively as a single collective unit,
    exhibiting a common system-wide capability.
  • Constellation and Sensor Web operations include
    homogeneous sensor clusters, heterogeneous
    distributed sensors, multiple missions (space
    and ground).
  • Constellation missions and Sensor Webs/Networks
    will provide opportunities for:
  • Application of on-board data processing
    functions.
  • Application of parallel data processing
    techniques.
  • Application of autonomous intelligent systems for
    mission operations.
  • Application of client-server architecture for
    inter-sensor communications, for
    constellation/network operations, and for active
    mission participation in a Virtual Observatory.
  • Application of interoperable systems techniques
    within the constellation, and with other sensors
    (space-based, ground-based, and/or virtual).
  • Application of data reduction (filtering) and
    code-shipping techniques.
  • Application of data mining for
    target-of-opportunity, event, and anomaly
    detection.

4
Data Challenges for Space Mission Sciencecraft
of the Future
  • Data and Information Explosion: massive volumes
    from single missions and from spacecraft suites,
    constellations, and armadas (sciencecraft)
  • Data Discovery within distributed data systems
    for science decision support
  • Transparent Access to Data across heterogeneous
    mission environments
  • Interoperability of systems, metadata, data,
    information, and knowledge
  • New Information Technology Infusion across
    multiple distributed systems
  • Data Fusion and Information Integration from
    multiple sources (spacecraft / missions /
    sciencecraft) across various scales and
    modalities
  • Information and Knowledge Sharing among
    cooperating science nodes
  • Intelligence within the sensor and measurement
    and data systems
  • Knowledge Representation and Ontology
    reconciliation across discipline-specific
    instruments
  • Semantic Knowledge Extraction and Retrieval
    from multiple mission science data streams
  • Knowledge Management, Sharing, and Reuse:
    lessons learned are remembered!

5
The New Face of Science (1)
  • Big Data (usually distributed across systems)
  • High-Energy Particle Physics
  • Astronomy, Planetary, and Space Physics
  • Earth Observing System (Remote Sensing)
  • Human Genome and Bioinformatics
  • Numerical Simulations of any kind
  • Digital Libraries (electronic publication
    repositories)
  • e-Science
  • Built on Web Services (e-Gov, e-Biz) paradigm
  • Distributed heterogeneous data are the norm
  • Data integration across projects and institutions
  • One-stop shopping: the right data, right now.

6
The New Face of Science (2)
  • Data and data systems are central to science.
  • Databases enable scientific discovery
  • Data Handling and Archiving (management of
    massive data resources)
  • Data Discovery (finding data wherever they exist)
  • Data Access (HTTP-Database interfaces)
  • Data/Metadata Browsing (serendipity)
  • Data Sharing and Reuse (within project teams and
    by other missions and programs; scientific
    validation)
  • Data Fusion (across multiple modalities and
    domains)
  • Data Integration (from multiple sources)
  • Knowledge Sharing and Reuse (through ontologies
    and semantic representation in knowledgebases)
  • Data Mining (KDD: Knowledge Discovery in
    Databases)

7
e-Science
  • Key Technologies
  • Data Mining
  • KDD: Knowledge Discovery in Databases
  • Machine Learning
  • Distributed Data Discovery and Access
  • VxOs: Virtual (any science) Observatories
  • Web Services
  • Grid Computing
  • The Semantic Web (ontologies)
  • Key Benefits
  • Provides seamless uniform discovery, access,
    mining, and analysis of distributed heterogeneous
    data sources ...
  • ... here, there, and beyond.
  • Find the right data, right now
  • Enables ... Data and Information integration and
    fusion ...
  • ... across multiple distributed heterogeneous
    data collections ...
  • ... to enable scientific knowledge discovery ...
  • ... and decision support ...
  • ... (with minimal human assistance, or
    autonomously).
  • Provides intelligence within the data system.

8
Modeling of a business process, using an
event-driven process chain.
  • Diagram labels (flowchart not transcribed):
    Science data ordered; Intelligent Database
    (event-driven); valid request / invalid request;
    check data availability; data available / data
    not available; Design and implement data
    processing plan; Collect new data from sensors;
    generate data products; product finished;
    Science data shipped; Any more Requests?; Order
    completed; Happy User!
  • Steps in this process chain are business-related,
    but you can easily transform these steps into
    scientific decision support within a space
    mission data system.
  • Reference: EPML (Event-driven Process chain
    Markup Language),
    http://xml.coverpages.org/ni2003-11-21-a.html
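
The event-driven chain on this slide can be sketched as a small function that emits one event per step. This is only an illustration: the transcript does not preserve the arrows of the flowchart, so the ordering of steps below is an assumption, and `process_order` is a hypothetical name.

```python
from typing import List


def process_order(request_valid: bool, data_available: bool) -> List[str]:
    """Return the sequence of events for one science data order.

    The step order is an assumed reading of the slide's flowchart labels.
    """
    events = ["science data ordered"]
    if not request_valid:
        # An invalid request ends the chain immediately.
        events.append("invalid request")
        return events
    events += ["valid request", "check data availability"]
    if data_available:
        events.append("data available")
    else:
        # Data must first be produced before it can be shipped.
        events += [
            "data not available",
            "design and implement data processing plan",
            "collect new data from sensors",
            "generate data products",
            "product finished",
        ]
    events += ["science data shipped", "order completed"]
    return events


if __name__ == "__main__":
    for step in process_order(request_valid=True, data_available=False):
        print(step)
```

In a mission data system, each event string would trigger a handler (for example, scheduling a new observation) rather than simply being recorded.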
9
Data Mining is the killer app for e-Science
  • Data Mining is Knowledge Discovery in Databases
    (KDD).
  • Data Mining is defined as an information
    extraction activity whose goal is to discover
    hidden facts contained in (large) databases.
  • Data Mining is the application of Machine
    Learning to large databases.
  • Machine Learning is defined as the application
    of computer algorithms that improve automatically
    through experience.

10
Data Mining Technique Example: Decision Tree
Rule Induction
  • Decision Tree (DT) Construction is a form of data
    mining called Rule Induction.
  • The DT algorithm learns the rules from the
    database of historical records.
  • It picks predictors and their splitting values on
    the basis of an information gain metric:
  • Information gain is the difference between the
    amount of information needed to make the correct
    prediction before the split and the amount needed
    after the split has been made.
  • If the amount of information required is much
    lower after the split is made, then the split is
    said to have decreased the disorder (entropy) of
    the original data. This is good: the more ordered
    the data, the more certain is our final
    classification.
  • Similar to the game "20 Questions": good
    questions provide good DT splits. For example,
    an adult asks "Is it alive?", but a child asks
    "Is it my daddy?".
  • After the rules are learned (induction), they are
    then applied to new events (new records, or new
    data entries) to make predictions on unseen
    data. This can be applied in real time in deep
    space missions.
  • Some databases have automated rule-learning
    algorithms built-in. These are called Inductive
    Databases, and they are data-driven.
  • Reference: http://www.mli.gmu.edu/projects/idb.html
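
The information gain metric described above can be sketched in a few lines. This is a minimal illustration, assuming a single numeric predictor and Shannon entropy in bits; the function names (`entropy`, `information_gain`, `best_split`) and the sample readings are hypothetical and not taken from the presentation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[str],
                     threshold: float) -> float:
    """Entropy before the split minus the weighted entropy after it."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - after


def best_split(values: Sequence[float],
               labels: Sequence[str]) -> Tuple[float, float]:
    """Return (threshold, gain) for the split with the highest gain."""
    candidates: List[float] = sorted(set(values))[:-1]
    if not candidates:
        raise ValueError("need at least two distinct predictor values")
    return max(((t, information_gain(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # made-up sensor values
    classes = ["quiet"] * 3 + ["flare"] * 3        # made-up class labels
    print(best_split(readings, classes))
```

A full decision tree would apply `best_split` recursively to each resulting subset and across all predictors; only the single-split step is shown here.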

11
Data Mining Technique Example: Bayes Inference
Engine
  • The application of Bayes' Theorem is a form of
    Inference (the act or process of deriving a
    conclusion based solely on what one already
    knows).
  • The Bayes Inference Engine (BIE) learns the
    likelihood of possible hypotheses (models, or
    classifications) being correct, for a given set
    of observational measurements, based upon the
    database of historical records.
  • The historical knowledgebase for a space mission
    is used to populate the data history (including
    known outcomes from prior experience) with
    particular measurement values of the observed
    features that correspond to particular events.
  • As new data arrive, the Bayes Inference Engine
    (BIE) estimates the most likely model (outcome,
    class, hypothesis) to describe what the
    spacecraft sees.
  • Following the interpretation of the new events,
    which are then properly labeled, these events
    (class labels) and their corresponding measured
    data values (evidence, features) add to the
    knowledgebase for the next application of the
    BIE.
  • Bayes Inference: data-driven, learns as it goes,
    incremental learning, self-correcting, can test
    multiple hypotheses, unbiased by prior prejudice,
    model-independent inference, yields
    probabilistically ranked predictions; a popular
    standard machine learning tool for optimal
    knowledge-based decision-making.
  • References:
  • http://www.astro.cornell.edu/staff/loredo/bayes/
  • http://aisrp.nasa.gov/projects/5665da55.html

H or C: hypothesis or class; E or F: evidence or
feature.
P(H | E): probability of hypothesis H being
correct, given the evidence E.
P(C | {Fk}): probability of classification C,
given the set of measured features {Fk}.
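
The learn-as-it-goes loop described on this slide can be sketched with Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E). The sketch below is not the engine from the presentation: it assumes categorical features that are conditionally independent given the class (the "naive Bayes" simplification) and add-one smoothing, and the class name `BayesInferenceSketch` and the sample labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


class BayesInferenceSketch:
    """Incremental naive-Bayes classifier over categorical features."""

    def __init__(self) -> None:
        self.class_counts = defaultdict(int)    # class -> number of records
        self.feature_counts = defaultdict(int)  # (class, k, value) -> count
        self.feature_values = defaultdict(set)  # k -> values seen so far

    def learn(self, features: Sequence[Hashable], label: Hashable) -> None:
        """Add one labeled record to the historical knowledgebase."""
        self.class_counts[label] += 1
        for k, value in enumerate(features):
            self.feature_counts[(label, k, value)] += 1
            self.feature_values[k].add(value)

    def posterior(self, features: Sequence[Hashable]) -> Dict[Hashable, float]:
        """Return P(C | {Fk}) for every known class C."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_c in self.class_counts.items():
            p = n_c / total  # prior P(C)
            for k, value in enumerate(features):
                n_values = len(self.feature_values[k] | {value})
                count = self.feature_counts.get((label, k, value), 0)
                p *= (count + 1) / (n_c + n_values)  # smoothed P(Fk | C)
            scores[label] = p
        evidence = sum(scores.values())  # P({Fk}), the normalizing constant
        return {label: s / evidence for label, s in scores.items()}


if __name__ == "__main__":
    engine = BayesInferenceSketch()
    for _ in range(3):
        engine.learn(("high", "fast"), "CME")
        engine.learn(("low", "slow"), "quiet")
    ranked = engine.posterior(("high", "fast"))
    print(sorted(ranked.items(), key=lambda kv: -kv[1]))
```

Calling `learn` again with each newly labeled event updates the counts, so the next call to `posterior` already reflects the enlarged knowledgebase.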
12
Existing Space Science Data Infrastructure
  • The Recent Past: many independent distributed
    heterogeneous data archives
  • Today: VxOs (Virtual Observatories)
  • Web Services-enabled e-Science paradigm
    (middleware, standards, protocols)
  • Provides seamless uniform access to distributed
    heterogeneous data sources
  • Find the right data, right now
  • One-stop shopping for all of your data needs
  • Emerging environment consists of many VxOs, for
    example:
  • NVO: National Virtual Observatory (precursor to
    VAO, the Virtual Astronomical Observatory)
  • VSO: Virtual Solar Observatory
  • VSPO: Virtual Space Physics Observatory
  • NVAO: National Virtual Aeronomy Observatory
  • VITMO: Virtual Ionospheric, Thermospheric,
    Magnetospheric Observatory
  • VHO: Virtual Heliospheric Observatory
  • VMO: Virtual Magnetospheric Observatory
  • Standards for data formats, data/metadata
    exchange, data models, registries, Web Services,
    VO queries, query results, semantics
  • And of course: the Grid, Web Services,
    Semantic Web, etc.

13
Sun-Earth Space Environment: Rich Source of
Heliophysical Phenomena
14
Multi-point Observations and Models of Space
Plasmas Deliver a Deluge of Physical Measurements
15
Data-Driven Knowledge Discovery
16
Space Weather Example: Early Warning System for
Astronauts in Space
CME: Coronal Mass Ejection; SEP: Solar Energetic
Particle
17
Data Mining in Action
  • Data Mining facilitates Intelligent Data
    Understanding (IDU).
  • Data Mining enables Decision Support and Active
    Control Systems.
  • IDU refers to the application of techniques for
    transforming data into scientific understanding.
  • Web reference: http://is.arc.nasa.gov/IDU/index.html
  • IDU specifically refers to automating the
    following techniques for machine-assisted science
    data analysis:
  • Data Mining (e.g.,
    http://is.arc.nasa.gov/IDU/tasks/NVODDM.html)
  • Knowledge Discovery
  • Machine Learning

18
Case Study - Mars Rovers
19
e-Science Data Mining Applications on Mars Rover
(1)
  • Rove around the surface of Mars and take samples
    of rocks (mass spectroscopy yields a data
    histogram)
  • Supervised Learning (search for rocks with known
    compositions)
  • Unsupervised Learning (discover what types of
    rocks are present, without preconceived biases)
  • Association Mining (find unusual associations)
  • Clustering (find the set of unique classes of
    rocks)
  • Classification (assign rocks to known classes)
  • Deviation/Outlier Detection (one-of-a-kind:
    interesting?)
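
Two of the techniques listed above, clustering and outlier detection, can be sketched together. This is an illustration only: it assumes each rock sample has been reduced to a short numeric feature vector, uses plain k-means with a deterministic start (the first k samples), and flags samples lying more than a chosen multiple of the median distance from their cluster center. All names, the threshold rule, and the sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: Sequence[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def k_means(points: Sequence[Point], k: int,
            iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Plain k-means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every sample to its nearest centroid.
        assignment = [min(range(k), key=lambda j: distance(p, centroids[j]))
                      for p in points]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = mean(members)
    return centroids, assignment


def outliers(points: Sequence[Point], centroids: Sequence[Point],
             assignment: Sequence[int], factor: float = 3.0) -> List[int]:
    """Indices of samples unusually far from their own cluster center."""
    dists = [distance(p, centroids[a]) for p, a in zip(points, assignment)]
    typical = sorted(dists)[len(dists) // 2]  # median distance
    return [i for i, d in enumerate(dists) if d > factor * max(typical, 1e-12)]


if __name__ == "__main__":
    samples = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.1), (10.1, 9.8),
               (-0.1, 0.2), (9.9, 10.2), (0.1, -0.2), (10.2, 10.1),
               (0.0, 4.0)]  # the last sample is deliberately unusual
    centers, labels = k_means(samples, k=2)
    print(labels, outliers(samples, centers, labels))
```

On a rover, a flagged index would mark a one-of-a-kind sample worth a closer look, while the cluster labels summarize the rock types present without a prior catalog.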

20
e-Science Data Mining Applications on Mars Rover
(2)
  • On-board Intelligent Data Understanding and
    Decision Support Systems (Fuzzy Logic, Decision
    Trees, Case-Based Reasoning): Science Goal
    Monitoring
  • "stay here and do more" or else "move on to
    another rock"
  • "send results to Earth immediately" or "send
    results later"
  • Learn as it goes (Machine Learning, Neural Nets)
  • Relate the results to other factors, such as dust
    storms (XML, Information Retrieval, Information
    Fusion with other data from the orbiting
    satellite or mother ship)
  • Predict where to go in order to find interesting
    rocks (Logistic Regression, Case-Based Reasoning)

21
Mars Rover as an e-Science Data System
  • Decisions are based on data mined, prior
    experience, new knowledge, and the set of learned
    rules.
  • Rover acts autonomously, without human
    intervention, in Deep Space environment.
  • Actions are driven by mining actionable data from
    all sensors.

http://www.samsi.info/200506/astro/presentations/tut1loredo-7.pdf
22
Autonomous Mineral Detectors for Mars Rovers and
Landers
  • NASA / AISRP PI: Martha Gilmore, Wesleyan
    University

Objective: Design and develop software to
enable rovers to autonomously analyze spectral
data and identify data indicating geologically
important signatures.
Motivation: Both rover and orbital missions can
collect more data than can be returned due to
downlink restrictions.
Results: Software is designed to allow onboard
processing of Vis/NIR spectra to identify and
select spectra that contain minerals of geologic
interest autonomously.
Figure labels: Non-carbonates; Carbonates
Credit: M. Gilmore
23
A Neural Map View of Planetary Spectral Images
for Precision Data Mining and Rapid Resource
Identification
  • NASA / AISRP PI: Erzsébet Merényi, Rice
    University

Uses advanced variants of the self-organized
machine learning paradigm, the Self-Organizing
Map, applied to spectral imagery. They detected
orthopyroxene- and clinopyroxene-dominated mineral
subclasses within a rare undifferentiated mineral
type nicknamed "black rock" by geologists. SOM
by eye! (SOM: self-organizing map)
Credit: E. Merényi
24
Application of Machine Learning Technology to
Martian Geology
NASA / AISRP PI: Ruye Wang, Harvey Mudd College
  • Machine Learning algorithms have been applied to
    the analysis of THEMIS (Thermal Emission Imaging
    System) image data of Mars, for the purpose of
    studying mountain ranges on Mars (the Thaumasia
    Highlands and Coprates rise).
  • Specifically, various clustering and
    classification algorithms (e.g., K-means,
    competitive neural network, support vector
    machine, Independent Components Analysis) have
    been applied to the THEMIS image data covering
    certain areas in the Thaumasia Highlands.
  • Objectives
  • Develop an intelligent system for robust
    detection and accurate classification in
    multispectral remote sensing image data
  • Demonstrate system in context of Martian geology
    application

Credit: R. Wang
25
Some e-Science Projects
  • International Virtual Observatory Alliance:
    http://www.ivoa.net/
  • The Thinking Telescope Project:
    http://www.thinkingtelescopes.lanl.gov/
  • Open Science Grid (OSG):
    http://www.opensciencegrid.org/
  • Biomedical Informatics Research Network (BIRN):
    http://www.nbirn.net/
  • Network for Earthquake Engineering Simulation
    (NEES): http://www.nees.org/
  • PlanetLab computer science testbed:
    http://www.planet-lab.org/
  • Earth System Grid (ESG):
    https://www.earthsystemgrid.org/
  • Department of Energy's Fusion Collaboratory:
    http://www.fusiongrid.org/
  • UK myGrid project: http://www.mygrid.org.uk/
  • Enabling Grids for eScience in Europe (EGEE):
    http://public.eu-egee.org/
  • e-Framework for Education and Research:
    http://www.e-framework.org/
  • Global Earth Observation System of Systems
    (GEOSS): http://www.epa.gov/geoss/

26
Astronomy Example: The Thinking Telescope
Machine Learning Applications:
  • Automated Feature Extraction: Real-time
    identification of artifacts and transients in
    direct and difference images.
  • Classifiers: Automated classification of
    celestial objects based on temporal and spectral
    properties.
  • Anomaly Detection: Real-time recognition of
    important deviations from normal behavior for
    persistent sources.
Credit: http://www.thinkingtelescopes.lanl.gov/
27
Sample e-Science Data Mining Use Cases
  • Discover data stored in distributed heterogeneous
    systems.
  • Search huge databases for trends and correlations
    in high-dimensional parameter spaces; identify
    new properties or new classes of scientific
    objects.
  • Discover new linkages and associations among data
    parameters.
  • Search for rare, one-of-a-kind, and exotic
    objects in huge databases.
  • Identify repeating patterns of temporal
    variations from millions or billions of
    observations.
  • Identify moving objects in huge image databases.
  • Identify parameter glitches / anomalies /
    deviations either in static databases (e.g.,
    archives) or in dynamic data (e.g., science /
    instrumental / engineering data streams).
  • Find clusters, nearest neighbors, outliers,
    and/or zones of avoidance in the distribution of
    objects or other observables in arbitrary
    parameter spaces.
  • Serendipitously explore huge scientific databases
    through access to distributed, autonomous,
    federated, heterogeneous, multi-experiment,
    multi-mission science data systems.

28
Applications of e-Science Data Mining Techniques
to the Space Mission I.T. Data System Environment
  • Archival research applications
  • Cross-links between archives and active mission
    data will offer improved data analysis,
    calibration, anomaly detection, and scientific
    discovery with active missions.
  • Decision support for selecting interesting
    targets for observation.
  • Identification of interesting events for rapid
    follow-up observation planning.
  • Real-time on-board decision-support functions,
    such as:
  • the rapid analysis of large volumes of
    time-series data (engineering, telemetry, and
    science streams) in order to make decisions
    (about operations, maneuvers, and science
    observations) in deep space without human
    intervention.
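
The rapid analysis of a time-series stream mentioned above can be illustrated with a rolling z-score check. This is a minimal sketch under stated assumptions: a single numeric telemetry channel, a fixed-length window of recent readings as the baseline, and a threshold of three standard deviations. The function name `detect_anomalies` and the sample readings are hypothetical.

```python
import math
from collections import deque
from typing import Iterable, List


def detect_anomalies(stream: Iterable[float], window: int = 5,
                     threshold: float = 3.0) -> List[int]:
    """Return indices of readings that deviate strongly from recent history."""
    history = deque(maxlen=window)  # most recent accepted readings
    flagged: List[int] = []
    for i, value in enumerate(stream):
        if len(history) == window:
            mean = sum(history) / window
            variance = sum((x - mean) ** 2 for x in history) / window
            std = math.sqrt(variance)
            if std > 0 and abs(value - mean) > threshold * std:
                flagged.append(i)
                continue  # keep the anomaly out of the baseline
        history.append(value)
    return flagged


if __name__ == "__main__":
    telemetry = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 25.0, 10.0, 9.8]
    print(detect_anomalies(telemetry))
```

Because only the last `window` readings are stored, the check runs in constant memory, which suits an on-board setting; a flagged index would then feed the decision logic for operations or follow-up observations.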
