Title: GEON IT Advances: Data Integration, GEON Workbench, Scientific Workflows
1. GEON IT Advances: Data Integration, GEON Workbench, Scientific Workflows
- Bertram Ludäscher
- Kai Lin
- Ilkay Altintas
- Efrat Jaeger
San Diego Supercomputer Center, University of California, San Diego
2. The Problem: Scientific Data Integration, or from Questions to Queries
3. Information Integration Challenges: S4 Heterogeneities
- Systems Integration
- platforms, devices, data and service distribution, APIs, protocols, ...
- → Grid middleware technologies
- e.g. single sign-on, platform independence, transparent use of remote resources, ...
- Syntax and Structure
- heterogeneous data formats (one for each tool ...)
- heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, ...)
- heterogeneous schemas (one for each DB ...)
- → Database mediation technologies
- XML-based data exchange, integrated views, transparent query rewriting, ...
- Semantics
- fuzzy metadata, terminology, hidden semantics, implicit assumptions, ...
- → Knowledge representation and semantic mediation technologies
- smart data discovery and integration
- e.g. ask about X (mafic), find data about Y (diorite), be happy anyways!
4. Information Integration Challenges: S5 Heterogeneities
- Synthesis of analysis pipelines, integrated apps and data products, ...
- How to make use of these wonderful things and put them together to solve a scientist's problem?
- Scientific Problem Solving Environments
- GEON Portal and Workbench (scientist's view)
- ontology-enhanced data registration, discovery, manipulation
- creation and registration of new data products from existing ones, ...
- GEON Scientific Workflow System (engineer's view)
- for designing, re-engineering, deploying analysis pipelines and scientific workflows: a tool to make new tools
- e.g., creation of new datasets from existing ones, dataset registration, ...
5. Ontology-Enabled Application Example: Geologic Map Integration
6. Querying by Geologic Age
7. Querying by Geologic Age: Result
8. Querying by Chemical Composition (GSC)
9. Querying by Chemical Composition: Results
Note the fine differences in shades of gray:
DO know: it's NOT there!
DON'T know! (not registered)
OK, we've got to work on the color coding :-)
10. Querying w/ British Rock Classification (BRC)
Uses a GSC-BRC inter-ontology articulation mapping
11. British Rock Classification: Query Results
Uses a GSC-BRC inter-ontology articulation mapping
12. The Query: "Show sedimentary rocks"
The Puzzle: Find the 17 differences in the results
13. Sedimentary Rocks: BGS Ontology
14. Sedimentary Rocks: GSC Ontology
15. Need for Knowledge-enabled Integration
- A geologist analyzing chemical data from a pluton finds no recognizable correlation between variables.
- What possible scenarios can he examine to understand this heterogeneity?
- Measured ages also show a scatter.
- What is the significance of the observed spread in measured time?
- Knowledge Representation
- Research
- concept maps ontologies
- process maps ontologies
- semantic types
- to facilitate (even) smarter tools
16. A Prerequisite: Resource Registration
- (1a) Register ontologies
- geologic age, rock classifications (GSC, BGS), seismology
- (1b) optionally register inter-ontology articulations
- e.g. between the GSC ontology and the BGS ontology
- (2a) Item-level dataset registration
- ADN metadata, other controlled vocabularies, ontologies (e.g. geologic age timescale (USGS), SWEET (NASA), ...)
- (2b) Item-detail registration
- e.g. associate values in a column with a concept
- (3) Use ontology-based query UI / application
- e.g. query by geologic age and chemical composition
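The registration steps above can be pictured with a small in-memory sketch. This is an illustration only, not the GEON registry: the `Registry` class, its method names, and the `PERIOD` column with its values are all hypothetical, and a real ontology would be loaded from OWL rather than written as a Python dict.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy registry; every name in this class is hypothetical."""
    # ontology name -> {concept: parent concept, or None for a root}
    ontologies: dict = field(default_factory=dict)
    # (dataset, column, value) -> (ontology name, concept)
    associations: dict = field(default_factory=dict)

    def register_ontology(self, name, parent_of):
        # Step (1a): register an ontology as a concept hierarchy.
        self.ontologies[name] = dict(parent_of)

    def register_item_detail(self, dataset, column, value, ontology, concept):
        # Step (2b): associate a value in a column with a concept.
        if concept not in self.ontologies.get(ontology, {}):
            raise KeyError(f"{concept!r} is not a concept of {ontology!r}")
        self.associations[(dataset, column, value)] = (ontology, concept)

    def is_a(self, ontology, concept, ancestor):
        # Walk up the hierarchy until the ancestor or a root is reached.
        parent_of = self.ontologies[ontology]
        while concept is not None:
            if concept == ancestor:
                return True
            concept = parent_of.get(concept)
        return False

    def find(self, ontology, concept):
        # Step (3): concept-based query over the registered associations.
        return sorted(
            key for key, (onto, c) in self.associations.items()
            if onto == ontology and self.is_a(ontology, c, concept)
        )


registry = Registry()
registry.register_ontology("geologicAge", {
    "Cenozoic": None, "Tertiary": "Cenozoic", "Quaternary": "Cenozoic",
    "Mesozoic": None, "Cretaceous": "Mesozoic",
})
registry.register_item_detail(
    "myShapeFiles.zip", "PERIOD", "Tertiary", "geologicAge", "Tertiary")
registry.register_item_detail(
    "myShapeFiles.zip", "PERIOD", "Cretaceous", "geologicAge", "Cretaceous")
hits = registry.find("geologicAge", "Cenozoic")
```

Querying for the broader concept (Cenozoic) finds the value registered under the narrower one (Tertiary), which is the point of registering item details against an ontology instead of matching strings.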
17. Demonstration Preview
- NOTE: a technology demonstration, not a content demonstration (vocabulary, ontology, maps, ...)
- Ontology Registration (geologicAge.owl)
- Dataset Registration (myShapeFiles.zip)
- Item-Level Association (1?2)
- GEONsearch
- metadata, spatial, temporal, concept-based
- GEONworkbench
- use of workspace, e.g. composing new maps from existing ones
- resume with GEON workflow overview
18. Demonstration Preview
User Access (via Portal)
19. Dataset to Ontology Registration (Item-level)
20. GEON Search: Concept-based Querying → Portal Demonstration
21. Scientific Problem Solving Environments
- GEON Portal and Workbench (scientist's view)
- → previous demonstration
- a workbench for using existing/integrated tools
- Kepler Workflow System (engineer's view)
- for (semi-)automating scientific workflows and analysis pipelines
- a tool for making and deploying new tools
- some features
- low-level plumbing to high-level conceptual flows
- connect reusable components (actors, boxes) to form apps
- abstraction via nesting of subworkflows into composite actors
- deploy automated workflows on the Grid and/or with custom UIs
- demonstrations available (Kepler2Go-1.? CD for Summer Institute)
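The idea of connecting reusable actors and nesting a subworkflow into a composite actor can be sketched in a few lines. This is a conceptual illustration in Python, not Kepler's actual (Java-based) actor API: the `Actor` type, the `composite` helper, and the three example actors are hypothetical.

```python
from typing import Callable, List

# Hypothetical model: an actor maps a list of input tokens to output tokens.
Actor = Callable[[List], List]


def composite(*actors: Actor) -> Actor:
    """Nest a chain of actors into a single reusable composite actor."""
    def run(tokens: List) -> List:
        for actor in actors:
            tokens = actor(tokens)
        return tokens
    return run


def drop_missing(tokens: List) -> List:
    # Filter out missing values before further processing.
    return [t for t in tokens if t is not None]


def scale(factor: float) -> Actor:
    # Parameterized actor: multiply every token by a constant.
    return lambda tokens: [t * factor for t in tokens]


def total(tokens: List) -> List:
    # Reduce all tokens to a single output token.
    return [sum(tokens)]


# A subworkflow packaged as one actor, then reused in a larger workflow.
clean_and_scale = composite(drop_missing, scale(10))
workflow = composite(clean_and_scale, total)
result = workflow([1, None, 2, 3])
```

Because a composite has the same shape as a plain actor, it can be dropped into another workflow without the outer workflow knowing about its internals.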
22. A Kepler Scientific Workflow
inline documentation
canvas for design and execution monitoring
component (actor) libraries
23. GEON Dataset Extraction and Processing
24. GEON Dataset Registration
25. GEON Dataset Registration
Registering
26. Putting it all together
27. GEON Workflows: KEPLER
http://kepler-project.org
28. Using Kepler for Geological Data Integration Workflows
- Ilkay Altintas
- presenting joint GEON work of
- Efrat Jaeger, Bertram Ludäscher
- Kai Lin, Ashraf Memon
San Diego Supercomputer Center, University of California, San Diego
29. Some Requirements for a Scientific Workflow System (1/2)
- it should work (No kidding!)
- USER REQUIREMENTS
- Design tools -- especially for non-expert users
- Ease of use -- fairly simple user interface having more complex features hidden in the background
- Reusable generic features
- Generic enough to serve different communities but specific enough to serve one domain (e.g. geosciences)
- Extensibility for the expert user -- almost a visual programming interface
- Registration and publication of data products and process products (workflows); provenance
30. Some Requirements for a Scientific Workflow System (2/2)
- TECHNICAL REQUIREMENTS
- Error detection and recovery from failure
- Logging information for each workflow
- Allow data-intensive and compute-intensive tasks
- (Maybe at the same time)
- HPCX (from Dr. Berman's last GSM talk)
- Allow status checks and on the fly updates
- Visualization
- Semantics and metadata
- Certification, trust, security
Ask the experts in this room ?
31. Kepler is
- a scientific workflow system
- a cross-project collaboration
- New contributing partners
- Cheminformatics: Resurgence (Kim Baldridge et al.)
- Life Sciences: EOL (Mark Miller et al.)
- Data Mining: SKIDL (Tony Fountain et al.)
- Neuroinformatics: BIRN (coming)
- an emerging open source tool for scientific discovery workflows
Kepler 1.0 alpha release → Summer Institute
www.geongrid.org
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
32. Some Recent Actor Additions
Browser-based user interface
Queries Transformations
Generic WS Invocation
SQL Queries
File Transfer
SMTP-based messaging
SRB Access
CommandLine Execution
Globus Job Execution
Real-time data streaming
33. Web Services → Actors (WS Harvester)
- → "Minute-made" (MM) WS-based application integration
- Similarly: MM workflow design and sharing w/o implemented components
34. GEON Contributions to Kepler
- System demonstration
- Using Kepler Features
- GEON workflows in detail
- Dataset Registration Model
- Processing Datasets on the Fly and Registering
with the GEONworkbench
35. Conclusions
- Evolving system; GEON is a significant contributor
- Plans for new generic and project-specific extensions
- Second alpha release available as CD
- Installers for Windows, Linux, MacOSX
- Daily version tests and JWS installer generation
- User manuals and developer documentation are coming soon!
- More next week during the Summer Institute
- Kepler project website: http://kepler-project.org
- Thanks!
36. GEON IT Advances: Data Integration, GEON Workbench, Scientific Workflows
E N D
- Bertram Ludäscher
- Kai Lin
- Ilkay Altintas
- Efrat Jaeger
San Diego Supercomputer Center, UC San Diego
37. Related Publications
- Semantic Data Registration and Integration
- On Integrating Scientific Resources through Semantic Registration, S. Bowers, K. Lin, and B. Ludäscher, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
- A System for Semantic Integration of Geologic Maps via Ontologies, K. Lin and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
- Towards a Generic Framework for Semantic Registration of Scientific Data, S. Bowers and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
- The Role of XML in Mediated Data Integration Systems with Examples from Geological (Map) Data Interoperability, B. Brodaric, B. Ludäscher, and K. Lin. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003.
- Semantic Mediation Services in Geologic Data Integration: A Case Study from the GEON Grid, K. Lin, B. Ludäscher, B. Brodaric, D. Seber, C. Baru, and K. A. Sinha. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003.
- Query Planning and Rewriting
- Processing First-Order Queries under Limited Access Patterns, Alan Nash and B. Ludäscher, Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04), Paris, France, June 2004.
- Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns, Alan Nash and B. Ludäscher, 9th Intl. Conference on Extending Database Technology (EDBT'04), Heraklion, Crete, Greece, March 2004, LNCS 2992.
- Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation, B. Ludäscher and Alan Nash. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04), Boston, IEEE Computer Society, April 2004.
38. Related Publications
- Scientific Workflows
- Kepler: An Extensible System for Design and Execution of Scientific Workflows, I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
- Kepler: Towards a Grid-Enabled System for Scientific Workflows, Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004.
- An Ontology-Driven Framework for Data Transformation in Scientific Workflows, S. Bowers and B. Ludäscher, Intl. Workshop on Data Integration in the Life Sciences (DILS'04), March 25-26, 2004, Leipzig, Germany, LNCS 2994.
- A Web Service Composition and Deployment Framework for Scientific Workflows, I. Altintas, E. Jaeger, K. Lin, B. Ludäscher, A. Memon. In the 2nd Intl. Conference on Web Services (ICWS), San Diego, California, July 2004.
39. Additional Material (for questions etc.)
40. Multi-Hierarchical Rock Classification System (GSC): a target ontology (after conversion to OWL) for geologic map registration
Genesis
Fabric
Composition
Texture
41. Inside Ontology-Enabled Map Integration
User: Show formations from Cenozoic!
Age Ontology: Cenozoic (Quaternary, Tertiary)
Query Rewriting: select FORMATION where AGE=Tertiary or AGE=Quaternary
PERIOD
FORMATION
LITHOLOGY
PERIOD
ABBREV
Arizona
Montana West
Map Rendering
Color Definition
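The query rewriting step on this slide, where a request for Cenozoic formations is expanded through the age ontology into its sub-periods, can be sketched as follows. The dictionary encoding of the ontology and the function names are assumptions for illustration; the output imitates the slide's pseudo-query syntax rather than any particular query language.

```python
# Hypothetical fragment of an age ontology: concept -> direct subconcepts.
AGE_ONTOLOGY = {"Cenozoic": ["Tertiary", "Quaternary"]}


def leaf_concepts(concept, ontology):
    """Expand a concept into the most specific concepts beneath it."""
    children = ontology.get(concept, [])
    if not children:
        return [concept]
    leaves = []
    for child in children:
        leaves.extend(leaf_concepts(child, ontology))
    return leaves


def rewrite(column, age_column, concept, ontology):
    """Rewrite a concept-level request into a disjunction over sub-periods."""
    conditions = [f"{age_column}={c}" for c in leaf_concepts(concept, ontology)]
    return f"select {column} where " + " or ".join(conditions)


query = rewrite("FORMATION", "AGE", "Cenozoic", AGE_ONTOLOGY)
print(query)
```

A source that only records Tertiary or Quaternary in its age column can then answer a question posed at the Cenozoic level without its data being changed.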
42. Data Source Wrapping and Integration
[Figure: state geologic map sources (Arizona, Idaho, Colorado, Utah, Nevada, Wyoming, Montana West, Montana East, New Mexico), each with its own column names (ABBREV, PERIOD, FORMATION, AGE, NAME, LITHOLOGY, TYPE, FMATN, TIME_UNIT); example values: Livingston formation, Tertiary-Cretaceous, andesitic sandstone]
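A minimal sketch of source wrapping, assuming each source is described by a mapping from its own column names to a common schema. The column names are taken from the slide, but the source names (`SourceA`, `SourceB`, `SourceC`), the assignment of columns to sources, the common field names, and the sample rows are hypothetical: the extracted slide does not preserve which columns belong to which state map.

```python
COMMON_FIELDS = ("formation", "age", "lithology")

# Hypothetical wrappers: source name -> {source column: common field}.
WRAPPERS = {
    "SourceA": {"FORMATION": "formation", "PERIOD": "age", "LITHOLOGY": "lithology"},
    "SourceB": {"FMATN": "formation", "TIME_UNIT": "age"},
    "SourceC": {"NAME": "formation", "AGE": "age", "TYPE": "lithology"},
}


def wrap(source, row):
    """Translate one source row into the common schema."""
    mapping = WRAPPERS[source]
    record = {name: None for name in COMMON_FIELDS}
    record["source"] = source
    for column, value in row.items():
        common = mapping.get(column)
        if common is not None:  # columns without a mapping are ignored
            record[common] = value
    return record


def integrate(rows_by_source):
    """Build one integrated view over all wrapped sources."""
    return [wrap(source, row)
            for source, rows in rows_by_source.items()
            for row in rows]


view = integrate({
    "SourceA": [{"FORMATION": "Livingston formation",
                 "PERIOD": "Tertiary-Cretaceous",
                 "LITHOLOGY": "andesitic sandstone"}],
    "SourceB": [{"FMATN": "Livingston formation",
                 "TIME_UNIT": "Tertiary-Cretaceous"}],
})
```

Fields a source does not provide stay `None` in the integrated view, so downstream queries can distinguish "not recorded" from a real value.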
43. Gravity Modeling Design Workflow
- Idea: Comparing observed and synthetic gravity models
- Steps
- Extracting and merging gravity depths from heterogeneous data sources for a Lat/Lon bounding box (databases, web services).
- Projecting and interpolating data sources into the same coordinate system.
- Differencing observed and synthetic models.
- Displaying differential raster image.
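The differencing step can be sketched as a cell-by-cell subtraction, assuming both models have already been projected and interpolated onto the same grid. The function name, the list-of-rows grid representation, and the use of `None` for missing cells are assumptions for illustration.

```python
def difference_grids(observed, synthetic, nodata=None):
    """Cell-wise observed minus synthetic for two grids of equal shape."""
    same_rows = len(observed) == len(synthetic)
    if not same_rows or any(len(o) != len(s) for o, s in zip(observed, synthetic)):
        raise ValueError("grids must have the same shape")
    return [
        # A cell missing in either grid is missing in the difference.
        [nodata if (o is None or s is None) else o - s
         for o, s in zip(obs_row, syn_row)]
        for obs_row, syn_row in zip(observed, synthetic)
    ]


observed = [[1.0, 2.0], [3.0, None]]
synthetic = [[0.5, 2.0], [1.0, 4.0]]
residual = difference_grids(observed, synthetic)
```

The resulting residual grid is what would be rendered as the differential raster image.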
44. Grid Interpolation
- Interpolating queried gravity data on the grid and displaying it using a color schema.
- Currently IDW interpolation algorithm supported. Future plans: Minimum Curvature, TIN, Kriging and Spline.
- Output either ASCII x,y,z,p or ESRI ASCII grid format.
- Display using global mapper service.
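A minimal sketch of inverse distance weighting (IDW) and ESRI ASCII grid output, for illustration only. The function names, the power parameter default of 2, the use of all samples (no search radius), and the three-decimal formatting are assumptions, not details of the GEON implementation.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, sz) samples."""
    num = den = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return sz  # exactly on a sample point: use its value
        w = 1.0 / d ** power
        num += w * sz
        den += w
    return num / den


def idw_grid(samples, xll, yll, cellsize, ncols, nrows, power=2.0):
    """Interpolate at cell centers; row 0 is the northernmost row."""
    rows = []
    for r in range(nrows):
        y = yll + (nrows - r - 0.5) * cellsize
        rows.append([idw(xll + (c + 0.5) * cellsize, y, samples, power)
                     for c in range(ncols)])
    return rows


def to_esri_ascii(rows, xll, yll, cellsize, nodata=-9999):
    """Serialize a grid as ESRI ASCII grid text (header, then rows top-down)."""
    header = [
        f"ncols {len(rows[0])}",
        f"nrows {len(rows)}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    body = [" ".join(f"{v:.3f}" for v in row) for row in rows]
    return "\n".join(header + body) + "\n"


samples = [(0.5, 0.5, 10.0), (1.5, 0.5, 20.0)]
midpoint = idw(1.0, 0.5, samples)
grid = idw_grid(samples, xll=0.0, yll=0.0, cellsize=1.0, ncols=2, nrows=1)
text = to_esri_ascii(grid, 0.0, 0.0, 1.0)
```

With power 2, a point equidistant from two samples gets their average, and a cell center that coincides with a sample reproduces that sample exactly.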
45. Gravity Modeling Design Workflow