Discovery Systems: Accelerating Scientific Discovery at NASA

Provided by: isiEdugi
Learn more at: https://www.isi.edu
1
Discovery Systems: Accelerating Scientific
Discovery at NASA
  • Barney Pell, Ph.D.
  • NASA Ames Research Center
  • Barney.D.Pell _at_ nasa.gov
  • Presentation at IAAI-04 panel on "The Broader Role
    of Artificial Intelligence in Large-Scale
    Scientific Research"

2
Outline of Talk
  • Trends and Challenges affecting Scientific
    Discovery at NASA
  • Distributed Data Search, Access, and Analysis
  • Machine-Assisted Model Discovery and Refinement
  • Exploratory Environments and Collaboration
  • Vision for the future and summary of AI
    technologies
  • Closing remarks

3
Science Discovery Acceleration
  • NASA conducts missions to take measurements that
    produce large amounts of data to support
    ambitious science goals
  • - In-situ observation of deep space to study the
    origin and evolution of life
  • - Earth-orbiting satellites to study global
    cause-and-effect relationships
  • - Biological experiments to support life in space
  • Too much work and expertise is required to perform
    each of the many steps in a discovery cycle to
    understand this data
  • - Detailed knowledge of the heritage of data and
    models is needed
  • - It is hard to invert through a complex
    processing pipeline
  • - Data must be constantly reprocessed and
    reanalyzed as new information becomes available
  • The need for specialized expertise slows the
    process and also restricts the set of users and
    scientists using NASA products

4
Discovery Steps and Architectures
  • Examples of discovery steps
  • - finding and organizing distributed data
  • - assessing, filtering, cleaning and
    post-processing the data
  • - reconciling the differences across diverse data
  • - exploring the data sets to discover
    regularities
  • - using the regularities to formulate and
    evaluate hypotheses
  • - testing the hypotheses and comparing alternate
    hypotheses against each other
  • - integrating the data into models
  • - linking separate models together
  • - running simulations to generate predictive data
    to compare against observations
  • Current technology programs address the
    difficulties of individual steps, typically in
    isolation
  • - E.g., machine-learning algorithms detect
    regularities in the underlying phenomena but also
    artifacts of the data collection/processing
    system
  • - ML algorithms are developed without
    consideration of the deeper processes by which
    the data is generated, distributed, and used
  • - Data systems are put together without
    characterizing the data stream in a way that
    would enable new users to analyze the data in
    unanticipated ways

5
Trends affecting NASA
  • Improvements in sensors, communications, and
    computing
  • - Orders of magnitude more data, in more
    varieties, and at higher rates than ever before
  • NASA's science questions are becoming
    increasingly large-scale and interdisciplinary
  • - Forming and evaluating theories across a wide
    variety of data
  • - Integrating a complex set of models produced by
    diverse communities of scientists
  • - Virtual projects comprising distributed teams
  • Socioeconomic demands require increased quality
  • - E.g., many customers for weather and climate
    model predictions
  • - Need characterization of confidence in data,
    models, and results
  • Faster feedback loops in observing/simulation
    systems
  • - Make it possible to gather more precise data,
    often in real time, if only we could understand
    the existing data quickly enough
  • NASA is required to enable the public to access
    and benefit from the data to the same extent as
    the mission science team

6
Distributed Search, Access and Analysis
  • Objective
  • - Develop and demonstrate technologies that enable
    investigation of interdisciplinary science
    questions by finding, integrating, and composing
    models and data from distributed archives,
    pipelines, running simulations, and running
    instruments
  • - Support interactive and complex query
    formulation, with constraints and goals in the
    queries, and resource-efficient intelligent
    execution of these tasks in a
    resource-constrained environment
  • Milestone: Enable novel what-if and predictive
    question answering
  • - Across NASA's complex and heterogeneous data and
    simulations
  • - By users who are not data specialists
  • - Use world knowledge and metadata
  • - Support query formulation and resource discovery
  • Example query: Within 20, what will be the
    water runoff in the creeks of the Comanche
    National Grassland if we seed the clouds over
    southern Colorado in July and August next year?
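One way to picture a query with goals and constraints is as a small structured record that can be checked against a resource catalogue before any execution is planned. The sketch below is only an illustration: the class WhatIfQuery, its field names, the resource kinds, and the archive names are all hypothetical and do not come from any NASA system. The accuracy text is copied from the example query as it appears on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhatIfQuery:
    """Hypothetical structured form of a what-if science query."""
    goal: str            # quantity the user wants predicted
    region: str          # where the prediction applies
    period: str          # when the prediction applies
    intervention: str    # the "what if" part of the question
    accuracy: str        # requested accuracy, kept as free text
    requires: tuple = () # kinds of models/data assumed to be needed

    def unmet_requirements(self, catalogue):
        """List required resource kinds that the catalogue cannot supply."""
        return [kind for kind in self.requires if kind not in catalogue]


query = WhatIfQuery(
    goal="water runoff in the creeks",
    region="Comanche National Grassland",
    period="July and August next year",
    intervention="seed the clouds over southern Colorado",
    accuracy="within 20",
    requires=("precipitation forecast", "runoff model", "terrain data"),
)

# Illustrative catalogue: resource kind -> where it could be found.
catalogue = {"precipitation forecast": "archive-A", "terrain data": "archive-B"}
unmet = query.unmet_requirements(catalogue)
```

Here unmet names the resource kinds that resource discovery would still have to locate before the query could be answered.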

7
Terrestrial Biogeoscience Involves Many Complex
Processes and Data
[Figure: coupled terrestrial processes grouped by time scale.]
  • Atmosphere
  • - Chemistry: CO2, CH4, N2O, ozone, aerosols
  • - Climate: temperature, precipitation, radiation,
    humidity, wind
  • - Exchanges: heat, moisture, momentum; CO2, CH4,
    N2O, VOCs, dust
  • Minutes-to-hours
  • - Biogeophysics: aerodynamics, water, energy
  • - Biogeochemistry: carbon assimilation,
    decomposition, mineralization
  • - Microclimate, canopy physiology
  • Days-to-weeks
  • - Phenology: bud break, leaf senescence
  • - Hydrology: intercepted water, soil water, snow
  • - Evaporation, transpiration, snow melt,
    infiltration, runoff
  • - Gross primary production, plant respiration,
    microbial respiration, nutrient availability
  • Years-to-centuries
  • - Ecosystems: species composition, ecosystem
    structure
  • - Watersheds: surface water, subsurface water,
    geomorphology
  • - Disturbance: fires, hurricanes, ice storms,
    windthrows
  • - Species composition, ecosystem structure,
    nutrient availability, water
  • - Vegetation dynamics, hydrologic cycle
(Courtesy Tim Killeen and Gordon Bonan, NCAR)
8
Solution Construction via Composing Models
[Figure: models of individual phenomena composed through service interfaces.]
  • Service interface: required inputs, provided
    outputs, data descriptions, events
  • Diagram labels: modeled phenomenon, parameterized
    phenomenon, climate model, rainfall, snow melt
    metadata, binary data streams, surface water
    community, Nat. Weather Service, USGS
  • Each model typically has a community of experts
    that deal with the complexity of the model and
    its environment
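If each model publishes its required inputs and provided outputs, a composition can be derived by ordering the models so that every input is produced before it is consumed. The following is a minimal sketch of that idea. The model names echo the diagram labels, but the specific inputs and outputs assigned to each model are assumptions made for illustration, as is the function name composition_order.

```python
from graphlib import TopologicalSorter

# Hypothetical service interfaces: required inputs and provided outputs.
MODELS = {
    "climate_model": {"inputs": set(), "outputs": {"rainfall", "temperature"}},
    "snow_melt_model": {"inputs": {"temperature"}, "outputs": {"snow_melt"}},
    "surface_water_model": {
        "inputs": {"rainfall", "snow_melt"},
        "outputs": {"runoff"},
    },
}


def composition_order(models):
    """Order models so every required input is produced by an earlier model."""
    # Map each output quantity to the model that provides it.
    producer = {}
    for name, iface in models.items():
        for quantity in iface["outputs"]:
            producer[quantity] = name

    # Build a dependency graph: model -> set of models it depends on.
    graph = {}
    for name, iface in models.items():
        missing = iface["inputs"] - set(producer)
        if missing:
            raise ValueError(f"{name} has unmet inputs: {sorted(missing)}")
        graph[name] = {producer[quantity] for quantity in iface["inputs"]}

    return list(TopologicalSorter(graph).static_order())


order = composition_order(MODELS)
```

The sketch only checks that interfaces line up by name. It says nothing about units, grids, or time steps, which is where the expert communities mentioned on the slide come in.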
9
Virtual Data Grid Example
Application: three data types of interest; ? is
derived from ?, ? is derived from ?, which is
primary data (interactions and operations
proceed left to right).
[Figure: request flow for materializing a derived data product.]
  • Components: Abstract Planner (for materializing
    data), Concrete Planner (generates workflow),
    Metadata Catalogue, Virtual Data Catalogue (how
    to generate ? and ?), Materialized Data
    Catalogue, Grid workflow engine, Grid compute
    resources, Grid storage resources, Data Grid
    replica services
  • Messages: Need ?; Have ?; Request ?; Proceed?;
    How to generate ? (? is at ?LFN); Estimate for
    generating ?; ? is known, contact Materialized
    Data Catalogue; Exact steps to generate ?;
    Resolve ?LFN; Materialize ? with ?PERS; ?PFN;
    ? is materialized at ?LFN; Need to materialize ?;
    Inform that ? is materialized
  • Store an archival copy, if so requested. Record
    existence of cached copies.
  • Legend: LFN = logical file name; PFN = physical
    file name; PERS = prescription for generating
    unmaterialized data
As illustrated, it is easy to deadlock without QoS
and SLAs.
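The core idea of virtual data, that a logical file name either resolves to an existing physical file or to a prescription for generating it, can be sketched as a recursive lookup. Everything named below is hypothetical: the data names (raw, calibrated, mapped), the transform names, and the paths are placeholders, and the two dictionaries merely stand in for the Materialized Data Catalogue and the Virtual Data Catalogue. The sketch runs in a single thread and does not model planners, replicas, or the resource contention behind the deadlock remark.

```python
# Stand-in for the Materialized Data Catalogue: LFN -> PFN.
materialized = {"raw": "/store/raw.dat"}

# Stand-in for the Virtual Data Catalogue: LFN -> (input LFNs, transform).
prescriptions = {
    "calibrated": (["raw"], "calibrate"),
    "mapped": (["calibrated"], "project"),
}


def materialize(lfn, log):
    """Return a PFN for lfn, generating it from its prescription if needed."""
    if lfn in materialized:
        return materialized[lfn]          # already exists somewhere
    if lfn not in prescriptions:
        raise KeyError(f"no prescription for {lfn}")

    inputs, transform = prescriptions[lfn]
    # Make sure every input exists first, generating it if necessary.
    input_pfns = [materialize(dep, log) for dep in inputs]

    # A real system would submit a workflow here; this only records the step.
    log.append(f"{transform}({', '.join(input_pfns)})")

    pfn = f"/cache/{lfn}.dat"
    materialized[lfn] = pfn               # record the new copy
    return pfn


steps = []
pfn = materialize("mapped", steps)
```

After the call, steps holds the generation steps in the order they would have to run, and a second request for the same name is answered from the catalogue without regenerating anything.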
10
Machine-Assisted Model Discovery and Refinement
  • Develop and demonstrate methods to:
  • - assist discovery and fitting of physically
    descriptive models with quantifiable uncertainty
    for estimation and prediction
  • - improve the use of observational or experimental
    data for simulation and assimilation, applied to
    distributed instrument systems (e.g., sensor web)
  • - integrate instrument models with physical domain
    modeling and with other instruments (fusion) to
    quantify error, correct for noise, and improve
    estimates and instrument performance
  • Example metrics
  • - 50 reduction in scientist time forming models
  • - 10 reduction in uncertainty in parameter
    estimates or a 10 reduction in effort to achieve
    current accuracies
  • - 10 reduction in computational costs associated
    with a forward model
  • - ability to process data on the order of 1000s of
    dimensions
  • - ability to estimate parameters from tera-scale
    data
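As a toy illustration of fitting a model "with quantifiable uncertainty", the sketch below fits a straight line by ordinary least squares and reports a standard error for each parameter. The straight-line model, the function name fit_line, and the sample numbers are assumptions chosen for brevity. The methods envisioned on the slide would target far richer physical models.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, se_a, se_b)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points to estimate uncertainty")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept

    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)

    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return a, b, se_a, se_b


# Made-up observations for illustration only.
a, b, se_a, se_b = fit_line([0, 1, 2, 3, 4], [1.1, 2.9, 5.2, 6.8, 9.0])
```

A metric such as "reduction in uncertainty in parameter estimates" could then be read as a reduction in quantities like se_a and se_b for a fixed amount of data.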

11
Prediction of the 97/98 El Niño
[Figure: JFM 1998 predicted precipitation; labels 1997 and 1999.]
A reasonable 15-month prediction of the 97/98 El
Niño is achieved when ocean height, temperature,
and surface wind data are combined to initialize
the model.
12
Observing System of the Future
  • Partners
  • - NASA
  • - DoD
  • - Other Govt
  • - Commercial
  • - International
  • Advanced Sensors
  • Information Synthesis
  • Access to Knowledge
  • Sensor Web
  • User Community
  • Information

13
Exploratory Environments and Collaboration
  • Objective
  • - Develop exploratory environments in which
    interdisciplinary and/or distributed teams
    visualize and interact with intelligently
    combined and presented data from such sources as
    distributed archives, pipelines, simulations, and
    instruments in networked environments
  • - Demonstrate that these environments measurably
    improve scientists' capability to answer
    questions, evaluate models, and formulate
    follow-on questions and predictions

14
Multi-parameter Explorations
16
Vision for future science
Technical area: Distributed Data Search, Access, and Analysis
  • Today: Answering queries requires specialized
    knowledge of content, location, and configuration
    of all relevant data and model resources.
    Solution construction is manual.
  • Tomorrow: Search queries are based on high-level
    requirements. Solution construction is mostly
    automated and accessible to users who aren't
    specialists in all elements.
Technical area: Machine integration of data / QA
  • Today: Publishing a new resource takes 1-3 years.
    Assembling a consistent heterogeneous dataset
    takes 1-3 years. Automated data quality
    assessment by limits and rules.
  • Tomorrow: Publishing a new resource takes 1 week.
    A consistent heterogeneous dataset is assembled
    in real time. Automated data quality assessment
    by world models and cross-validation.
Technical area: Machine-Assisted Model Discovery and Refinement
  • Today: Physical models have hidden assumptions and
    legacy restrictions. Machine learning algorithms
    are separate from simulations, instrument models,
    and data manipulation codes.
  • Tomorrow: Prediction and estimation systems
    integrate models of the data collection
    instruments, simulation models, and observational
    data formatting and conditioning capabilities.
    Predictions and estimates have known certainties.
Technical area: Exploratory Environments and Collaboration
  • Today: Co-located interdisciplinary teams jointly
    visualize multi-dimensional preprocessed data or
    ensembles of running simulations on wall-sized
    matrixed displays.
  • Tomorrow: Distributed teams visualize and interact
    with intelligently combined and presented data
    from such sources as distributed archives,
    pipelines, simulations, and instruments in
    networked environments.
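The contrast between quality assessment "by limits and rules" and "by world models and cross-validation" can be shown with two small checks. This is a sketch under stated assumptions: the function names, the thresholds, and the temperature values are invented for illustration, and the list called model simply stands in for whatever independent prediction or second data source would be available.

```python
def check_limits(readings, low, high):
    """Limits-and-rules QA: flag indices whose value is outside [low, high]."""
    return [i for i, value in enumerate(readings) if not (low <= value <= high)]


def check_against_reference(readings, reference, max_diff):
    """Cross-validation QA: flag indices that disagree with an independent source."""
    return [
        i
        for i, (value, expected) in enumerate(zip(readings, reference))
        if abs(value - expected) > max_diff
    ]


# Made-up sensor readings and an independent model estimate for the same times.
temps = [14.8, 15.1, 21.9, 15.3, -60.0]
model = [15.0, 15.0, 15.2, 15.2, 15.1]

limit_flags = check_limits(temps, -50.0, 50.0)
cross_flags = check_against_reference(temps, model, 3.0)
```

With these numbers the limit check flags only the physically implausible reading, while the comparison against the independent estimate also flags a reading that is within limits but inconsistent with its neighbors' context.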
17
Discovery Systems: AI Technology Elements
  • Distributed data search, access and analysis
  • - Grid-based computing and services
  • - Information retrieval
  • - Databases
  • - Planning, execution, agent architecture,
    multi-agent systems
  • - Knowledge representation and ontologies
  • Machine-assisted model discovery and refinement
  • - Information and data fusion
  • - Data mining and machine learning
  • - Modeling and simulation languages
  • Exploratory environments and collaboration
  • - Visualization
  • - Human-computer interaction
  • - Computer-supported collaborative work
  • - Cognitive models of science

18
Closing remarks
  • NASA science is challenging
  • Need to improve existing capabilities and
    address emerging trends
  • AI technologies have a crucial role in future
    science
  • - Distributed Data Search, Access, and Analysis
  • - Machine-Assisted Model Discovery and Refinement
  • - Exploratory Environments and Collaboration
  • Many of these themes are shared with science (or
    research) at large