Title: An Ontology-Driven Framework for Data Transformation in Scientific Workflows
1An Ontology-Driven Framework for Data Transformation in Scientific Workflows
- Shawn Bowers
- Bertram Ludäscher
- San Diego Supercomputer Center
- University of California, San Diego
2Outline
- Background (SEEK Project)
- Scientific Workflows
- The Problem: Reusing Structurally Incompatible Services
- The Ontology-Driven Framework
- Future Work
4Science Environment for Ecological Knowledge
(SEEK)
- Domain Science Driver
- Ecology (LTER), biodiversity,
- Analysis and Modeling System
- Design and execution of ecological models and analysis
- End user focus
- application, upper-ware
- Semantic Mediation System
- Data integration of hard-to-relate sources and processes
- Semantic types and ontologies
- upper middleware
- EcoGrid
- Access to ecology data and tools
- middle, under-ware
Architecture (cf. US cyberinfrastructure, UK e-Science)
5Outline
- The SEEK Project
- Scientific Workflows
- Focus: analysis component integration on top of data integration
- The Problem: Reusing Structurally Incompatible Services
- The Ontology-Driven Framework
- Future Work
6Promoter Identification in Kepler (SSDBM'03)
- Problems
- Many components (web services) are NOT designed to fit!
- The problem P that X solves is simple, and X doesn't solve it well
- Semantically meaningful connections are structurally incompatible
- Approach
- Distinguish structural type and semantic type
- Structural type, e.g., XML Schema
- Semantic type, e.g., OWL expressions
- Exploit the (optional!) semantic type as much as possible
10A Very Simple Scientific Workflow
[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1-P5]
Example result: (nymphal, 0.44), the k-value for each period of observation
life stage periods: periods of development in terms of phases
  Period     Phases
  Nymphal    Instar I, Instar II, Instar III, Instar IV
observations: population samples for life stages of the common field grasshopper (Begon et al., 1996)
  Phase         Observed
  Eggs          44,000
  Instar I      3,513
  Instar II     2,529
  Instar III    1,922
  Instar IV     1,461
  Adults        1,300
11Scientific Workflows
- A scientific workflow consists of a network of connected services
- A service can be any software component (including a web service or even a data source)
- Each service (optionally) takes input and (optionally) produces output
12Scientific Workflows
- SEEK adopts a Ptolemy II workflow model
- A service is called an actor
- Each actor has zero or more input and output ports (and possibly parameters)
- Data flows through a workflow based on connections made from output to input ports
- (ignored here: different models of computation, directors, ...)
[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1-P5]
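The actor and port model described on this slide can be sketched in a few lines of Python. This is only an illustration, not Ptolemy II or Kepler code: the class names `Actor` and `Workflow`, the `fire` callback, and the explicit firing order are assumptions made for the example, and the assignment of P1, P4, and P5 to S1 and S2 is a guess (only P2 as the source port and P3 as the target port follow from the slides).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Port = Tuple[str, str]  # (actor name, port name)


@dataclass
class Actor:
    """A service with named input and output ports (hypothetical class)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    fire: Callable[[Dict[str, object]], Dict[str, object]]


@dataclass
class Workflow:
    actors: Dict[str, Actor] = field(default_factory=dict)
    connections: List[Tuple[Port, Port]] = field(default_factory=list)

    def add(self, actor: Actor) -> None:
        self.actors[actor.name] = actor

    def connect(self, src: Port, dst: Port) -> None:
        # Connections always go from an output port to an input port.
        assert src[1] in self.actors[src[0]].outputs, "unknown output port"
        assert dst[1] in self.actors[dst[0]].inputs, "unknown input port"
        self.connections.append((src, dst))

    def run(self, order: List[str], external: Dict[Port, object]) -> Dict[Port, object]:
        """Fire actors in the given order, copying tokens along connections."""
        tokens = dict(external)
        for name in order:
            actor = self.actors[name]
            produced = actor.fire({p: tokens.get((name, p)) for p in actor.inputs})
            for port, value in produced.items():
                tokens[(name, port)] = value
                for src, dst in self.connections:
                    if src == (name, port):
                        tokens[dst] = value
        return tokens


# Assumed port layout: S1 reads P1 and writes P2; S2 reads P3, P4 and writes P5.
wf = Workflow()
wf.add(Actor("S1", ["P1"], ["P2"], lambda i: {"P2": i["P1"]}))
wf.add(Actor("S2", ["P3", "P4"], ["P5"], lambda i: {"P5": len(i["P3"])}))
wf.connect(("S1", "P2"), ("S2", "P3"))
result = wf.run(["S1", "S2"], {("S1", "P1"): [("Eggs", 44000), ("Instar I", 3513)]})
```

The fixed firing order stands in for what a Ptolemy II director would decide; the sketch deliberately ignores models of computation, as the slide does.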
13Outline
- The SEEK Project
- Scientific Workflows
- The Problem: Reusing Structurally Incompatible Services
- The Ontology-Driven Framework
- Future Work
14Service Reusability
- A scientist wishes to connect two (independent) services
[Figure: Source Service with port Ps, Target Service with port Pt, and the desired connection from Ps to Pt]
15Service Reusability
- In Ptolemy II/Kepler (and in web services), input and output ports (message parts) have structural types (XML Schema)
[Figure: Source Service (port Ps) and Target Service (port Pt) with the desired connection; each port carries a structural type, StructuralType Ps and StructuralType Pt]
16Service Reusability
- Unless designed to fit, independent services are structurally incompatible
- Generally, the source output type will not be a subtype of the target input type
[Figure: Source Service (port Ps) and Target Service (port Pt) with the desired connection; StructuralType Ps and StructuralType Pt are marked Incompatible, i.e., StructuralType Ps is not a subtype of StructuralType Pt]
17Service Reusability
- A transformation mapping (δ) is required to connect the services, artificially creating subtype compatibility
- If such a δ exists, the services are structurally feasible
[Figure: Source Service (port Ps) and Target Service (port Pt) with the desired connection; StructuralType Ps and StructuralType Pt are marked Incompatible, and δ maps Ps to δ(Ps), which is compatible with StructuralType Pt]
18Service Reusability
- SEEK annotates services with semantic types for discovery and interoperability of services
[Figure: Source Service (port Ps) and Target Service (port Pt) with the desired connection; SemanticType Ps and SemanticType Pt are drawn from ontologies (OWL) and marked Compatible]
19Service Reusability
- Services can be semantically compatible, but structurally incompatible
[Figure: Source Service (port Ps) and Target Service (port Pt) with the desired connection; SemanticType Ps and SemanticType Pt (ontologies, OWL) are marked Compatible, StructuralType Ps and StructuralType Pt are marked Incompatible, and δ maps Ps to δ(Ps)]
20Example Structural Types (XML)
structType(P2)
  <population>
    <sample>
      <meas>
        <cnt>44,000</cnt>
        <acc>0.95</acc>
      </meas>
      <lsp>Eggs</lsp>
    </sample>
  </population>
structType(P3)
  root cohortTable (measurement)
  elem measurement (phase, obs)
  elem phase xsd:string
  elem obs xsd:integer
  <cohortTable>
    <measurement>
      <phase>Eggs</phase>
      <obs>44,000</obs>
    </measurement>
  </cohortTable>
[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1-P5]
21Example Semantic Types
- Portion of SEEK measurement ontology
[Figure: ontology graph with cardinality annotations. Concepts: Observation, MeasContext, MeasProperty, Entity, EcologicalProperty, AccuracyQualifier, AbundanceCount, LifeStageProperty, SpatialLocation, NumericValue. Properties: hasContext, hasProperty, itemMeasured, appliesTo, hasLocation, hasCount, hasValue]
22Example Semantic Types
- Portion of SEEK measurement ontology
Same in OWL, a description logic standard (here, Sparrow syntax)
  Observation subClassOf
      forall hasContext/MeasContext and
      forall hasProperty/MeasProperty and
      exists itemMeasured/Entity.
  MeasContext subClassOf
      exists appliesTo/Entity and
      atMost 1/appliesTo.
  EcologicalProperty subClassOf Entity.
  LifeStageProperty subClassOf EcologicalProperty.
  AbundanceCount subClassOf EcologicalProperty and
      exists hasLocation/SpatialLocation and
      atMost 1/hasLocation and
      exists hasCount/NumericValue and
      atMost 1/hasCount.
[Figure: ontology graph with cardinality annotations. Concepts: Observation, MeasContext, MeasProperty, Entity, EcologicalProperty, AccuracyQualifier, AbundanceCount, LifeStageProperty, SpatialLocation, NumericValue. Properties: hasContext, hasProperty, itemMeasured, appliesTo, hasLocation, hasCount, hasValue]
23Example Semantic Types
- Semantic types for P2 and P3
[Figure: semType(P3) is an Observation with hasContext to a MeasContext that appliesTo a LifeStageProperty, and itemMeasured to an AbundanceCount with hasCount to a numeric value; semType(P2) has the same structure plus hasProperty to an AccuracyQualifier with hasValue to a numeric value]
[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1-P5]
24Example Semantic Types
- Semantic types for P2 and P3
  semType(P3) subClassOf Observation and
      exists hasContext/(MeasurementContext and
          exists appliesTo/LifeStageProperty and
          atMost 1/appliesTo) and
      exists itemMeasured/AbundanceCount and
      atMost 1/itemMeasured.
  semType(P2) subClassOf Observation and
      exists hasContext/(MeasurementContext and
          exists appliesTo/LifeStageProperty and
          atMost 1/appliesTo) and
      exists itemMeasured/AbundanceCount and
      atMost 1/itemMeasured and
      exists hasProperty/AccuracyQualifier and
      atMost 1/hasProperty.
[Figure: graph view of semType(P2) and semType(P3) over MeasContext, Observation, LifeStageProperty, AbundanceCount, and AccuracyQualifier, with the workflow of services S1 and S2 and ports P1-P5]
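The two definitions differ only in that semType(P2) carries two extra conjuncts, which is why the source type is at least as specific as the target type. A minimal sketch of that observation follows, assuming each semantic type is treated as a plain set of conjuncts; this is a syntactic approximation for illustration, not the description logic subsumption an OWL reasoner would compute, and the names `SEM_P2`, `SEM_P3`, and `is_subtype` are hypothetical.

```python
# Conjuncts copied from the slide; each string is one "and"-ed restriction.
SEM_P3 = frozenset({
    "subClassOf Observation",
    "exists hasContext/(MeasurementContext and exists appliesTo/LifeStageProperty"
    " and atMost 1/appliesTo)",
    "exists itemMeasured/AbundanceCount",
    "atMost 1/itemMeasured",
})

# semType(P2) repeats every conjunct of semType(P3) and adds the accuracy part.
SEM_P2 = SEM_P3 | {
    "exists hasProperty/AccuracyQualifier",
    "atMost 1/hasProperty",
}


def is_subtype(source: frozenset, target: frozenset) -> bool:
    """Conjunction-only approximation: a type with more conjuncts is more specific."""
    return target <= source


p2_fits_p3 = is_subtype(SEM_P2, SEM_P3)
p3_fits_p2 = is_subtype(SEM_P3, SEM_P2)
```

Under this approximation the output of P2 can feed P3 semantically, while the reverse direction fails because P3 makes no statement about accuracy.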
25Outline
- The SEEK Project
- Scientific Workflows
- The Problem: Reusing Structurally Incompatible Services
- The Ontology-Driven Framework
- Future Work
26The Ontology-Driven Framework
- Define semantic registration mappings (semantic views) to connect structural and semantic types
- Use registration mappings to (semi-)automate transformation, based on derived structural correspondences
- Depending on the ontologies and registration mappings, it may not be possible to find an appropriate δ
- (since the correspondence is often under-specified)
27The Ontology-Driven Framework
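[Figure: Source Service (port Ps) and Target Service (port Pt) with the desired connection; SemanticType Ps and SemanticType Pt (ontologies, OWL) are marked Compatible; a Registration Mapping (Output) links StructuralType Ps to SemanticType Ps, and a Registration Mapping (Input) links StructuralType Pt to SemanticType Pt]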
28Registration Example (simple XPaths)
structType(P2)
  <population>
    <sample>
      <meas>
        <cnt>44,000</cnt>
        <acc>0.95</acc>
      </meas>
      <lsp>Eggs</lsp>
    </sample>
  </population>
Registration mapping for P2 (simple XPath -> contextual path)
  /population/sample                  -> semType(P2)
  /population/sample/meas/cnt         -> semType(P2).itemMeasured
  /population/sample/meas/cnt/text()  -> semType(P2).itemMeasured.hasCount
  /population/sample/meas/acc         -> semType(P2).hasProperty
  /population/sample/meas/acc/text()  -> semType(P2).hasProperty.hasValue
  /population/sample/lsp/text()       -> semType(P2).hasContext.appliesTo
29Registration Example (simple XPaths)
structType(P2) and its registration mapping as on the previous slide; highlighted rule:
  /population/sample -> semType(P2)
Each sample is an instance of the semantic type
30Registration Example (simple XPaths)
structType(P2) and its registration mapping as on the previous slide; highlighted rule:
  /population/sample/meas/cnt -> semType(P2).itemMeasured
Each sample's cnt represents the itemMeasured object
31Registration Example (simple XPaths)
structType(P2) and its registration mapping as on the previous slide; highlighted rule:
  /population/sample/meas/cnt/text() -> semType(P2).itemMeasured.hasCount
Each sample's cnt value represents the hasCount value of the corresponding itemMeasured object
32Registration Example (simple XPaths)
structType(P3)
  root cohortTable (measurement)
  elem measurement (phase, obs)
  elem phase xsd:string
  elem obs xsd:integer
  <cohortTable>
    <measurement>
      <phase>Eggs</phase>
      <obs>44,000</obs>
    </measurement>
  </cohortTable>
Registration mapping for P3 (simple XPath -> contextual path)
  /cohortTable/measurement               -> semType(P3)
  /cohortTable/measurement/obs           -> semType(P3).itemMeasured
  /cohortTable/measurement/obs/text()    -> semType(P3).itemMeasured.hasCount
  /cohortTable/measurement/phase/text()  -> semType(P3).hasContext.appliesTo
Similarly for P3 ...
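To make the idea of a registration mapping concrete, the following sketch evaluates the P2 rules against the sample document and collects the selected fragments under their contextual paths. It is an illustration under stated assumptions: `REG_P2` is a plain dictionary, `select` handles only the absolute child-step paths with an optional trailing `text()` used on these slides, and neither function name comes from the framework itself.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<population><sample><meas><cnt>44,000</cnt><acc>0.95</acc></meas>"
    "<lsp>Eggs</lsp></sample></population>"
)

# Registration mapping for P2: substructure selection -> contextual path.
REG_P2 = {
    "/population/sample": "semType(P2)",
    "/population/sample/meas/cnt": "semType(P2).itemMeasured",
    "/population/sample/meas/cnt/text()": "semType(P2).itemMeasured.hasCount",
    "/population/sample/meas/acc": "semType(P2).hasProperty",
    "/population/sample/meas/acc/text()": "semType(P2).hasProperty.hasValue",
    "/population/sample/lsp/text()": "semType(P2).hasContext.appliesTo",
}


def select(root, xpath):
    """Evaluate a simple absolute path; a trailing text() step yields strings."""
    steps = xpath.strip("/").split("/")
    want_text = steps[-1] == "text()"
    if want_text:
        steps = steps[:-1]
    if steps[0] != root.tag:
        return []
    # ElementTree paths are relative to the root element, so drop the first step.
    nodes = root.findall("/".join(steps[1:])) if len(steps) > 1 else [root]
    return [n.text for n in nodes] if want_text else nodes


def annotate(root, registration):
    """Map each contextual path to the fragments its rule selects."""
    return {path: select(root, query) for query, path in registration.items()}


annotated = annotate(DOC, REG_P2)
```

Rules ending in `text()` yield the literal values (for example the count), while the other rules yield the elements that stand for ontology instances.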
33The Ontology-Driven Framework
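[Figure: Source Service (port Ps) and Target Service (port Pt) with the desired connection; SemanticType Ps and SemanticType Pt (ontologies, OWL) are marked Compatible; Registration Mapping (Output) and Registration Mapping (Input) link the structural types to the semantic types; a Correspondence is derived between StructuralType Ps and StructuralType Pt]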
34Correspondence Example
Source-side semantic registration mapping
  /population/sample                  -> semType(P2)
  /population/sample/meas/cnt         -> semType(P2).itemMeasured
  /population/sample/meas/cnt/text()  -> semType(P2).itemMeasured.hasCount
  /population/sample/meas/acc         -> semType(P2).hasProperty
  /population/sample/meas/acc/text()  -> semType(P2).hasProperty.hasValue
  /population/sample/lsp/text()       -> semType(P2).hasContext.appliesTo
Target-side semantic registration mapping
  /cohortTable/measurement               -> semType(P3)
  /cohortTable/measurement/obs           -> semType(P3).itemMeasured
  /cohortTable/measurement/obs/text()    -> semType(P3).itemMeasured.hasCount
  /cohortTable/measurement/phase/text()  -> semType(P3).hasContext.appliesTo
Source structure
  population
    sample
      meas
        cnt: xsd:integer
        acc: xsd:double
      lsp: xsd:string
Target structure
  cohortTable
    measurement
      obs: xsd:integer
      phase: xsd:string
35Correspondence Example
Source and target registration mappings and structure trees as on the previous slide
We want to compose the registrations to obtain structural correspondences
36Correspondence Example
Source and target registration mappings and structure trees as before; highlighted rules:
  Source: /population/sample        -> semType(P2)
  Target: /cohortTable/measurement  -> semType(P3)
These fragments correspond
37Correspondence Example
Source and target registration mappings and structure trees as before; highlighted rules:
  Source: /population/sample/meas/cnt   -> semType(P2).itemMeasured
  Target: /cohortTable/measurement/obs  -> semType(P3).itemMeasured
These fragments correspond
38Correspondence Example
Source and target registration mappings and structure trees as before; highlighted rules:
  Source: /population/sample/meas/cnt/text()   -> semType(P2).itemMeasured.hasCount
  Target: /cohortTable/measurement/obs/text()  -> semType(P3).itemMeasured.hasCount
These fragments correspond
39Correspondence Example
Source and target registration mappings and structure trees as before; highlighted rules:
  Source: /population/sample/lsp/text()          -> semType(P2).hasContext.appliesTo
  Target: /cohortTable/measurement/phase/text()  -> semType(P3).hasContext.appliesTo
These fragments correspond
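The four correspondences walked through on these slides can be reproduced by joining the two registration mappings on their contextual paths. The sketch below is a simplification: it matches paths by exact equality after dropping the port-specific root (semType(P2) or semType(P3)), whereas the framework's composition relies on a semantic subpath (subconcept) operation. The names `strip_root` and `compose` are hypothetical.

```python
REG_P2 = {
    "/population/sample": "semType(P2)",
    "/population/sample/meas/cnt": "semType(P2).itemMeasured",
    "/population/sample/meas/cnt/text()": "semType(P2).itemMeasured.hasCount",
    "/population/sample/meas/acc": "semType(P2).hasProperty",
    "/population/sample/meas/acc/text()": "semType(P2).hasProperty.hasValue",
    "/population/sample/lsp/text()": "semType(P2).hasContext.appliesTo",
}

REG_P3 = {
    "/cohortTable/measurement": "semType(P3)",
    "/cohortTable/measurement/obs": "semType(P3).itemMeasured",
    "/cohortTable/measurement/obs/text()": "semType(P3).itemMeasured.hasCount",
    "/cohortTable/measurement/phase/text()": "semType(P3).hasContext.appliesTo",
}


def strip_root(path):
    """Drop the port-specific root: 'semType(P2).itemMeasured' -> 'itemMeasured'."""
    _, _, rest = path.partition(".")
    return rest


def compose(reg_source, reg_target):
    """Pair source and target selections registered to the same contextual path."""
    by_path = {strip_root(p): q for q, p in reg_source.items()}
    return [
        (by_path[strip_root(p)], q)
        for q, p in reg_target.items()
        if strip_root(p) in by_path
    ]


correspondences = compose(REG_P2, REG_P3)
```

The accuracy rules of P2 have no counterpart in P3, so they produce no correspondence; that matches the slides, where acc is never highlighted.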
40The Ontology-Driven Framework
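[Figure: Source Service (port Ps) and Target Service (port Pt) with the desired connection; SemanticType Ps and SemanticType Pt (ontologies, OWL) are marked Compatible; Registration Mapping (Output) and Registration Mapping (Input) link the structural types to the semantic types; from the Correspondence between StructuralType Ps and StructuralType Pt, a Transformation δ(Ps) is generated]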
41Example Result (XQuery)
- Based on the structural correspondences and certain assumptions, we derive the transformation
XQuery
  <cohortTable>
  { for $s in /population/sample
    return
      <measurement>
      { for $c in $s/meas/cnt return <obs>{ $c/text() }</obs> }
      { for $l in $s/lsp return <phase>{ $l/text() }</phase> }
      </measurement> }
  </cohortTable>
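For readers without an XQuery processor at hand, the same transformation can be sketched with Python's standard `xml.etree.ElementTree`. This is a hand-written equivalent of the query for the sample document, not output of the framework; the function name `transform` is hypothetical.

```python
import xml.etree.ElementTree as ET


def transform(population):
    """Build a cohortTable from a population element, mirroring the XQuery."""
    table = ET.Element("cohortTable")
    for sample in population.findall("sample"):
        measurement = ET.SubElement(table, "measurement")
        # for $c in $s/meas/cnt return <obs>{ $c/text() }</obs>
        for cnt in sample.findall("meas/cnt"):
            ET.SubElement(measurement, "obs").text = cnt.text
        # for $l in $s/lsp return <phase>{ $l/text() }</phase>
        for lsp in sample.findall("lsp"):
            ET.SubElement(measurement, "phase").text = lsp.text
    return table


source = ET.fromstring(
    "<population><sample><meas><cnt>44,000</cnt><acc>0.95</acc></meas>"
    "<lsp>Eggs</lsp></sample></population>"
)
output = ET.tostring(transform(source), encoding="unicode")
```

As in the query, the acc element is dropped because no correspondence maps it to the target, and obs is emitted before phase in the order the query lists them.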
42Assumptions Made (or why this may not work for you)
- Common XPath prefixes refer to the same element
- Elements in correspondences have compatible cardinalities
- source is equivalent or stricter than target (e.g., is stricter than )
- Primitive data types are compatible
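One way to read the cardinality assumption is as interval containment: the occurrence bounds of the source element must lie within those of the target element. The check below is a sketch of that reading only; the (min, max) encoding with `None` for "unbounded" and the name `cardinality_ok` are assumptions, not definitions from the paper.

```python
from typing import Optional, Tuple

Bounds = Tuple[int, Optional[int]]  # (min occurrences, max occurrences or None)


def cardinality_ok(source: Bounds, target: Bounds) -> bool:
    """True if the source bounds are equivalent to or stricter than the target's."""
    s_min, s_max = source
    t_min, t_max = target
    if s_min < t_min:
        return False  # source may supply fewer items than the target requires
    if t_max is None:
        return True  # target accepts any number of items
    return s_max is not None and s_max <= t_max


exactly_one = (1, 1)
zero_or_more = (0, None)
one_fits_many = cardinality_ok(exactly_one, zero_or_more)
many_fits_one = cardinality_ok(zero_or_more, exactly_one)
```

A source that always delivers exactly one element can feed a target that accepts any number, but not the other way around.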
43Framework Operations and Properties
- In the paper, we define
- A semantic registration mapping R as a set of rules q -> p, where q is a substructure selection (query) and p is a contextual path (a path in an ontology)
- A structural correspondence as a rule qs -> qt, where qs and qt are substructure selections over the source and target, resp.
- The semantic composition of registration mappings Rs and Rt, which returns a set of structural correspondence rules
- The semantic subpath operation (subconcept), which is used by the semantic composition to find matching substructure selection rules
44Framework Operations and Properties
- In the paper, we define
- Registration mapping properties (cardinality consistency, and partial and complete registrations) and discuss the impact on determining structural transformations
- The simple XPath and Semantic Path languages for defining registration mappings, and the corresponding semantic join operator to find correspondences
45Outline
- The SEEK Project
- Scientific Workflows
- The Problem: Reusing Structurally Incompatible Services
- The Ontology-Driven Framework
- A Simple Framework Implementation
- Future Work
46Future Work
- Extend the registration mapping language
- XPath is too limited
- try a more general query language (e.g., XPath variables)
- relational/Datalog based substructure selection (query)
- Formalize the properties of registration mappings and their effect on automated transformation
- Introduce conversion routines (e.g., for units) at the ontology level and apply them in transformations
- Extend transformations to different computation models and workflow scheduling algorithms
- Add to the Kepler Scientific Workflow System
47Acknowledgements
- NSF/ITR Science Environment for Ecological Knowledge
- NSF/ITR Geosciences Network
- NIH Biomedical Informatics Research Network
- DOE Scientific Data Management Center
48Questions