Title: GEON IT Advances: Data Integration, GEON Workbench, Scientific Workflows


1
GEON IT Advances: Data Integration, GEON
Workbench, Scientific Workflows
  • Bertram Ludäscher
  • Kai Lin
  • Ilkay Altintas
  • Efrat Jaeger

San Diego Supercomputer Center, University of
California, San Diego
2
The Problem: Scientific Data Integration, or:
from Questions to Queries
3
Information Integration Challenges: S4
Heterogeneities
  • Systems Integration
  • platforms, devices, data service distribution,
    APIs, protocols, ...
  • → Grid middleware technologies
  • e.g. single sign-on, platform independence,
    transparent use of remote resources, ...
  • Syntax & Structure
  • heterogeneous data formats (one for each tool
    ...)
  • heterogeneous data models (RDBs, ORDBs, OODBs,
    XMLDBs, flat files, ...)
  • heterogeneous schemas (one for each DB ...)
  • → Database mediation technologies
  • XML-based data exchange, integrated views,
    transparent query rewriting, ...
  • Semantics
  • fuzzy metadata, terminology, hidden semantics,
    implicit assumptions, ...
  • → Knowledge representation & semantic mediation
    technologies
  • smart data discovery & integration
  • e.g. ask about X (mafic), find data about Y
    (diorite), be happy anyways!

4
Information Integration Challenges: S5
Heterogeneities
  • Synthesis of analysis pipelines, integrated apps
    & data products, ...
  • How to make use of these wonderful things & put
    them together to solve a scientist's problem?
  • Scientific Problem Solving Environments
  • GEON Portal and Workbench (scientist's view)
  • ontology-enhanced data registration, discovery,
    manipulation
  • creation and registration of new data products
    from existing ones, ...
  • GEON Scientific Workflow System (engineer's
    view)
  • for designing, re-engineering, deploying
    analysis pipelines and scientific workflows: a
    tool to make new tools
  • e.g., creation of new datasets from existing
    ones, dataset registration, ...

5
Ontology-Enabled Application Example: Geologic
Map Integration
6
Querying by Geologic Age
7
Querying by Geologic Age: Result
8
Querying by Chemical Composition (GSC)
9
Querying by Chemical Composition: Results
Note the fine differences in shades of gray:
DO know: it's NOT there!
DON'T know! (not registered)
OK, we've got to work on the color coding :-)
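
The distinction drawn on this slide (a match, a known non-match, and "not registered") amounts to a three-valued answer per map polygon. A minimal sketch of that idea follows; the `Answer` enum, the `classify` function, and the convention that an unregistered polygon carries `None` are assumptions made for illustration, not part of the GEON code.

```python
from enum import Enum


class Answer(Enum):
    YES = "matches the query"
    NO = "registered, and known not to match"
    UNKNOWN = "not registered, so no answer is possible"


def classify(registered_concept, wanted_concepts):
    """Classify one map polygon against a set of wanted concepts.

    registered_concept is None when the polygon was never registered
    to the ontology (a convention assumed by this sketch).
    """
    if registered_concept is None:
        return Answer.UNKNOWN
    if registered_concept in wanted_concepts:
        return Answer.YES
    return Answer.NO
```

A renderer could then assign one shade per `Answer` value instead of collapsing `NO` and `UNKNOWN` into the same color.
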
10
Querying w/ British Rock Classification (BRC)
Uses a GSC ↔ BRC inter-ontology articulation
mapping
11
British Rock Classification Query Results
Uses a GSC ↔ BRC inter-ontology articulation
mapping
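
One way to picture an inter-ontology articulation is as a lookup from terms in the query vocabulary to the terms under which the data were registered. The sketch below assumes the query is posed in BRC terms and the data are registered to GSC terms; the term names and the `BRC_TO_GSC` table are invented for illustration and are not taken from either classification.

```python
# Hypothetical articulation: BRC term -> set of GSC terms it corresponds to.
# These entries are made up for the example.
BRC_TO_GSC = {
    "brc:sandstone": {"gsc:arenite", "gsc:wacke"},
    "brc:mudstone": {"gsc:mudrock"},
}


def translate_query(brc_terms, articulation):
    """Translate query terms through the articulation.

    Returns the translated target terms and the query terms that
    have no mapping, so the caller can report them instead of
    silently dropping them.
    """
    translated, unmapped = set(), set()
    for term in brc_terms:
        if term in articulation:
            translated |= articulation[term]
        else:
            unmapped.add(term)
    return translated, unmapped
```

Returning the unmapped terms separately keeps gaps in the articulation visible to the user.
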
12
The Query: Show sedimentary rocks. The Puzzle:
Find the 17 differences in the results
13
Sedimentary Rocks: BGS Ontology
14
Sedimentary Rocks: GSC Ontology
15
Need for Knowledge-enabled Integration
  • A geologist analyzing chemical data from a pluton
    finds no recognizable correlation between
    variables.
  • What possible scenarios can he examine to
    understand this heterogeneity?
  • Measured ages also show a scatter
  • What is the significance of the observed spread
    in measured time?
  • Knowledge Representation
  • Research
  • concept maps ontologies
  • process maps ontologies
  • semantic types
  • to facilitate (even) smarter tools

16
A Prerequisite: Resource Registration
  • (1a) Register ontologies
  • geologic age, rock classifications (GSC, BGS),
    seismology
  • (1b) optionally register inter-ontology
    articulations
  • e.g. GSC ontology ↔ BGS ontology
  • (2a) Item-level dataset registration
  • ADN metadata, other controlled vocabularies,
    ontologies (e.g. geologic age
    timescale (USGS), SWEET (NASA), ...)
  • (2b) Item-detail registration
  • e.g. associate values in a column with a concept
  • (3) Use ontology-based query UI / application
  • e.g. query by geologic age and chemical
    composition
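
The registration steps above can be sketched as a small in-memory registry: ontologies are registered first, then values in a dataset column are associated with concepts, and a concept-based lookup uses those associations. The `Registry` class, its method names, and the data layout are assumptions for illustration; they do not reflect the actual GEON registration services.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    # ontology name -> set of concept names            (step 1a)
    ontologies: dict = field(default_factory=dict)
    # dataset -> {column -> {cell value -> concept}}   (step 2b)
    datasets: dict = field(default_factory=dict)

    def register_ontology(self, name, concepts):
        self.ontologies[name] = set(concepts)

    def register_item_detail(self, dataset, column, value_to_concept, ontology):
        """Associate the values of one column with ontology concepts."""
        unknown = set(value_to_concept.values()) - self.ontologies[ontology]
        if unknown:
            raise ValueError(f"concepts not in {ontology}: {sorted(unknown)}")
        self.datasets.setdefault(dataset, {})[column] = dict(value_to_concept)

    def datasets_about(self, concept):
        """Step 3: find datasets with at least one value mapped to concept."""
        return sorted(
            name
            for name, columns in self.datasets.items()
            if any(concept in mapping.values() for mapping in columns.values())
        )
```

Rejecting concepts that are missing from the registered ontology is one possible design choice; it keeps step (2b) dependent on step (1a), as the ordering on the slide suggests.
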

17
Demonstration Preview
  • NOTE: A technology demonstration, not a content
    demonstration (vocabulary, ontology,
    maps, ...)
  • Ontology Registration (geologicAge.owl)
  • Dataset Registration (myShapeFiles.zip)
  • Item-Level Association (1 ↔ 2)
  • GEONsearch
  • metadata, spatial, temporal, concept-based
  • GEONworkbench
  • use of workspace, e.g. composing new maps from
    existing ones
  • resume with GEON workflow overview

18
Demonstration Preview
User Access (via Portal)
19
Dataset to Ontology Registration (Item-level)
20
GEON Search: Concept-based Querying → Portal
Demonstration
21
Scientific Problem Solving Environments
  • GEON Portal and Workbench (scientist's view)
  • → previous demonstration
  • a workbench for using existing/integrated tools
  • Kepler Workflow System (engineer's view)
  • for (semi-)automating scientific workflows and
    analysis pipelines
  • a tool for making and deploying new tools
  • some features:
  • low-level plumbing to high-level conceptual
    flows
  • connect reusable components (actors, boxes)
    to form apps
  • abstraction via nesting of subworkflows into
    composite actors
  • deploy automated workflows on the Grid and/or
    with custom UIs
  • demonstrations available (Kepler2Go-1.? CD for
    Summer Institute)
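
The two features named above, connecting reusable actors and nesting subworkflows into composite actors, can be illustrated with a toy model. This is not Kepler's API (Kepler is a Java system); the `Actor` and `CompositeActor` classes below are hypothetical, and the sketch only covers a linear chain where each actor passes one token to the next.

```python
class Actor:
    """A reusable component that transforms one input token."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def fire(self, token):
        return self.fn(token)


class CompositeActor(Actor):
    """A subworkflow that looks like a single actor from the outside."""

    def __init__(self, name, actors):
        super().__init__(name, self._run_inner)
        self.actors = list(actors)

    def _run_inner(self, token):
        # Fire the inner actors in order, passing each output onward.
        for actor in self.actors:
            token = actor.fire(token)
        return token


# Usage: a composite nested inside a larger workflow.
clean = CompositeActor("Clean", [Actor("Strip", str.strip),
                                 Actor("Lower", str.lower)])
workflow = CompositeActor("Main", [clean, Actor("Tag", lambda s: "rock:" + s)])
```

Because a composite exposes the same `fire` method as a plain actor, a workflow can be reused as a component of a larger one without the outer level knowing its internals.
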

22
A Kepler Scientific Workflow
inline documentation
canvas for design and execution monitoring
component (actor) libraries
23
GEON Dataset Extraction & Processing
24
GEON Dataset Registration
25
GEON Dataset Registration
Registering
26
Putting it all together
27
GEON Workflows: KEPLER
http://kepler-project.org
28
Using Kepler for Geological Data Integration
Workflows
  • Ilkay Altintas
  • presenting joint GEON work of
  • Efrat Jaeger, Bertram Ludäscher
  • Kai Lin, Ashraf Memon

San Diego Supercomputer Center, University of
California, San Diego
29
Some Requirements for a Scientific Workflow
System (1/2)
  • it should work (No kidding!)
  • USER REQUIREMENTS
  • Design tools-- especially for non-expert users
  • Ease of use-- fairly simple user interface having
    more complex features hidden in the background
  • Reusable generic features
  • Generic enough to serve different communities
    but specific enough to serve one domain (e.g.
    geosciences)
  • Extensibility for the expert user-- almost a
    visual programming interface
  • Registration and publication of data products and
    process products (workflows), provenance

30
Some Requirements for a Scientific Workflow
System (2/2)
  • TECHNICAL REQUIREMENTS
  • Error detection and recovery from failure
  • Logging information for each workflow
  • Allow data-intensive and compute-intensive tasks
  • (Maybe at the same time)
  • HPCX (from Dr. Berman's last GSM talk)
  • Allow status checks and on the fly updates
  • Visualization
  • Semantics and metadata
  • Certification, trust, security

Ask the experts in this room
31
Kepler is
  • a scientific workflow system
  • a cross-project collaboration
  • New contributing partners
  • Cheminformatics: Resurgence (Kim Baldridge et
    al.)
  • Life Sciences: EOL (Mark Miller et al.)
  • Data Mining: SKIDL (Tony Fountain et al.)
  • Neuroinformatics: BIRN (coming)
  • an emerging open source tool for scientific
    discovery workflows

Kepler 1.0 alpha release → Summer Institute
www.geongrid.org
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
32
Some Recent Actor Additions
Browser-based user interface
Queries & Transformations
Generic WS Invocation
SQL Queries
File Transfer
SMTP-based messaging
SRB Access
CommandLine Execution
Globus Job Execution
Real-time data streaming
33
Web Services → Actors (WS Harvester)
  • → "Minute-made" (MM) WS-based application
    integration
  • Similarly, MM workflow design & sharing w/o
    implemented components

34
GEON Contributions to Kepler
  • System demonstration
  • Using Kepler Features
  • GEON workflows in detail
  • Dataset Registration Model
  • Processing Datasets on the Fly and Registering
    with the GEONworkbench

35
Conclusions
  • Evolving system: GEON is a significant
    contributor
  • Plans for new generic and project-specific
    extensions
  • Second alpha release available as CD
  • Installers for Windows, Linux, MacOSX
  • Daily version tests and JWS installer generation
  • User manuals and developer documentation are
    coming soon!
  • More next week during the Summer Institute
  • Kepler project website: http://kepler-project.org
  • Thanks!

36
GEON IT Advances: Data Integration, GEON
Workbench, Scientific Workflows
E N D
  • Bertram Ludäscher
  • Kai Lin
  • Ilkay Altintas
  • Efrat Jaeger

San Diego Supercomputer Center, UC San Diego
37
Related Publications
  • Semantic Data Registration and Integration
  • On Integrating Scientific Resources through
    Semantic Registration, S. Bowers, K. Lin, and B.
    Ludäscher, 16th International Conference on
    Scientific and Statistical Database Management
    (SSDBM'04), 21-23 June 2004, Santorini Island,
    Greece.
  • A System for Semantic Integration of Geologic
    Maps via Ontologies, K. Lin and B. Ludäscher. In
    Semantic Web Technologies for Searching and
    Retrieving Scientific Data (SCISW), Sanibel
    Island, Florida, 2003.
  • Towards a Generic Framework for Semantic
    Registration of Scientific Data, S. Bowers and B.
    Ludäscher. In Semantic Web Technologies for
    Searching and Retrieving Scientific Data (SCISW),
    Sanibel Island, Florida, 2003.
  • The Role of XML in Mediated Data Integration
    Systems with Examples from Geological (Map) Data
    Interoperability, B. Brodaric, B. Ludäscher, and
    K. Lin. In Geological Society of America (GSA)
    Annual Meeting, volume 35(6), November 2003.
  • Semantic Mediation Services in Geologic Data
    Integration: A Case Study from the GEON Grid, K.
    Lin, B. Ludäscher, B. Brodaric, D. Seber, C.
    Baru, and K. A. Sinha. In Geological Society of
    America (GSA) Annual Meeting, volume 35(6),
    November 2003.
  • Query Planning and Rewriting
  • Processing First-Order Queries under Limited
    Access Patterns, Alan Nash and B. Ludäscher,
    Proc. 23rd ACM Symposium on Principles of
    Database Systems (PODS'04) Paris, France, June
    2004.
  • Processing Unions of Conjunctive Queries with
    Negation under Limited Access Patterns, Alan Nash
    and B. Ludäscher, 9th Intl. Conference on
    Extending Database Technology (EDBT'04),
    Heraklion, Crete, Greece, March 2004, LNCS 2992.
  • Web Service Composition Through Declarative
    Queries: The Case of Conjunctive Queries with
    Union and Negation, B. Ludäscher and Alan Nash.
    Research abstract (poster), 20th Intl. Conference
    on Data Engineering (ICDE'04), Boston, IEEE
    Computer Society, April 2004.

38
Related Publications
  • Scientific Workflows
  • Kepler: An Extensible System for Design and
    Execution of Scientific Workflows, I. Altintas,
    C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S.
    Mock, 16th International Conference on Scientific
    and Statistical Database Management (SSDBM'04),
    21-23 June 2004, Santorini Island, Greece.
  • Kepler: Towards a Grid-Enabled System for
    Scientific Workflows, Ilkay Altintas, Chad
    Berkley, Efrat Jaeger, Matthew Jones, Bertram
    Ludäscher, Steve Mock, Workflow in Grid Systems
    (GGF10), Berlin, March 9th, 2004.
  • An Ontology-Driven Framework for Data
    Transformation in Scientific Workflows, S. Bowers
    and B. Ludäscher, Intl. Workshop on Data
    Integration in the Life Sciences (DILS'04), March
    25-26, 2004 Leipzig, Germany, LNCS 2994.
  • A Web Service Composition and Deployment
    Framework for Scientific Workflows, I. Altintas,
    E. Jaeger, K. Lin, B. Ludäscher, A. Memon, In
    the 2nd Intl. Conference on Web Services (ICWS),
    San Diego, California, July 2004.

39
Additional Material (for questions etc)
40
Multi-Hierarchical Rock Classification System
(GSC): a target ontology (after conversion to
OWL) for geologic map registration
Genesis
Fabric
Composition
Texture
41
Inside Ontology-Enabled Map Integration
  • User: "Show formations from Cenozoic!"
  • Age Ontology: Cenozoic, with sub-concepts
    Quaternary and Tertiary
  • Query Rewriting: select FORMATION where
    AGE = Tertiary or AGE = Quaternary
  • Source attributes shown: PERIOD, FORMATION,
    LITHOLOGY, ABBREV (sources: Arizona, Montana West)
  • Map Rendering, Color Definition
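
The rewriting step on this slide expands a query concept into its sub-concepts in the age ontology before the query reaches the sources. A minimal sketch under stated assumptions: the `AGE_ONTOLOGY` table holds only the three concepts named on the slide, and `rewrite` emits a query string in the slide's informal syntax rather than real SQL for any particular source.

```python
# Parent concept -> child concepts; only what the slide shows.
AGE_ONTOLOGY = {
    "Cenozoic": ["Tertiary", "Quaternary"],
    "Tertiary": [],
    "Quaternary": [],
}


def leaf_concepts(concept, ontology):
    """Return the most specific concepts below (or equal to) concept."""
    children = ontology.get(concept, [])
    if not children:
        return [concept]
    leaves = []
    for child in children:
        leaves.extend(leaf_concepts(child, ontology))
    return leaves


def rewrite(concept, ontology, column="AGE"):
    """Rewrite a concept-level question into a disjunction over leaves."""
    conditions = [f"{column} = '{leaf}'" for leaf in leaf_concepts(concept, ontology)]
    return "select FORMATION where " + " or ".join(conditions)
```

Calling `rewrite("Cenozoic", AGE_ONTOLOGY)` yields a disjunction over Tertiary and Quaternary, the same shape as the rewritten query on the slide.
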
42
Data Source Wrapping and Integration
[Diagram: per-state geologic map sources (Arizona,
Idaho, Colorado, Utah, Nevada, Wyoming, Montana West,
New Mexico, Montana East), each with its own schema.
Attribute names shown: ABBREV, PERIOD, FORMATION, AGE,
NAME, LITHOLOGY, TYPE, FMATN, TIME_UNIT. Sample values
shown: Livingston formation, Tertiary-Cretaceous,
andesitic sandstone.]
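
Wrapping, as pictured on this slide, means mapping each source's own column names onto one integrated view. The sketch below reuses column names that appear on the slide (FMATN, TIME_UNIT, NAME, AGE, FORMATION, PERIOD), but which columns belong to which state is an assumption, as are the `WRAPPERS` table, the target column names, and both functions.

```python
# Hypothetical wrappers: source column -> column of the integrated view.
# The assignment of columns to states is assumed for this example.
WRAPPERS = {
    "Nevada": {"FMATN": "formation", "TIME_UNIT": "age"},
    "Idaho": {"NAME": "formation", "AGE": "age"},
    "Arizona": {"FORMATION": "formation", "PERIOD": "age"},
}


def wrap(source, row):
    """Rename one source row into the integrated schema."""
    mapping = WRAPPERS[source]
    record = {target: row[col] for col, target in mapping.items() if col in row}
    record["source"] = source  # keep track of where the row came from
    return record


def integrate(tables):
    """Union the wrapped rows of all sources into one list."""
    return [wrap(source, row) for source, rows in tables.items() for row in rows]
```

Once every source is wrapped this way, a single query over `formation` and `age` can be answered across all of them.
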
43
Gravity Modeling Design Workflow
  • Idea: Comparing observed & synthetic gravity
    models
  • Steps:
  • Extracting and merging gravity depths from
    heterogeneous data sources for a Lat/Lon bounding
    box (databases, web services).
  • Projecting and interpolating data sources into
    the same coordinate systems.
  • Differencing observed and synthetic models.
  • Displaying the differential raster image.
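
The differencing step listed above reduces to a cell-by-cell subtraction once both models sit on the same grid. A minimal sketch, assuming both rasters are lists of rows with identical shape and that a sentinel value marks missing cells; the `difference` function and the `-9999.0` sentinel are illustrative choices, not taken from the GEON workflow.

```python
def difference(observed, synthetic, nodata=-9999.0):
    """Subtract synthetic from observed, cell by cell.

    A cell is nodata in the result if it is nodata in either input.
    """
    if len(observed) != len(synthetic) or any(
        len(o_row) != len(s_row) for o_row, s_row in zip(observed, synthetic)
    ):
        raise ValueError("rasters must have the same shape")
    return [
        [nodata if nodata in (o, s) else o - s for o, s in zip(o_row, s_row)]
        for o_row, s_row in zip(observed, synthetic)
    ]
```

The shape check stands in for the earlier projection and interpolation steps, which are what guarantee that the two grids line up.
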

44
Grid Interpolation
  • Interpolating queried gravity data on the grid
    and displaying it using a color scheme.
  • Currently the IDW interpolation algorithm is
    supported. Future plans: Minimum Curvature, TIN,
    Kriging and Spline.
  • Output: either ASCII x,y,z,p or ESRI ASCII grid
    format.
  • Display using global mapper service.
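
For readers unfamiliar with IDW (inverse distance weighting), the sketch below interpolates scattered samples onto a regular grid and serializes the result with an ESRI ASCII grid header. It is a generic textbook formulation, not the GEON actor: the function names, the power parameter of 2, the use of cell centers, and the three-decimal formatting are all assumptions.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points is a list of (px, py, value) samples.
    """
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value  # exactly on a sample: return it unchanged
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def idw_grid(points, xll, yll, cellsize, ncols, nrows, power=2.0):
    """Interpolate at cell centers; row 0 is the northernmost row."""
    rows = []
    for r in range(nrows):
        y = yll + (nrows - r - 0.5) * cellsize
        rows.append([idw(points, xll + (c + 0.5) * cellsize, y, power)
                     for c in range(ncols)])
    return rows


def to_esri_ascii(rows, xll, yll, cellsize, nodata=-9999):
    """Serialize a grid in ESRI ASCII grid layout (header, then rows)."""
    header = [
        f"ncols {len(rows[0])}",
        f"nrows {len(rows)}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    body = [" ".join(f"{v:.3f}" for v in row) for row in rows]
    return "\n".join(header + body) + "\n"
```

Nearby samples dominate the estimate because each weight falls off as the inverse of distance raised to `power`; a larger power makes the surface follow the nearest samples more closely.
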

45
Gravity Modeling Design Workflow