An Ontology-Driven Framework for Data Transformation in Scientific Workflows

Learn more at: https://users.sdsc.edu
1
An Ontology-Driven Framework for Data
Transformation in Scientific Workflows
  • Shawn Bowers
  • Bertram Ludäscher
  • San Diego Supercomputer Center
  • University of California, San Diego

2
Outline
  • Background (SEEK Project)
  • Scientific Workflows
  • The Problem: Reusing Structurally Incompatible
    Services
  • The Ontology-Driven Framework
  • Future Work

4
Science Environment for Ecological Knowledge
(SEEK)
  • Domain Science Driver
      • Ecology (LTER), biodiversity, ...
  • Analysis and Modeling System
      • Design and execution of ecological models and
        analysis
      • End user focus
      • application, upper-ware
  • Semantic Mediation System
      • Data integration of hard-to-relate sources and
        processes
      • Semantic types and ontologies
      • upper middleware
  • EcoGrid
      • Access to ecology data and tools
      • middle, under-ware

[Figure: SEEK architecture (cf. US cyberinfrastructure, UK e-Science)]
5
Outline
  • The SEEK Project
  • Scientific Workflows
      • Focus: analysis component integration on top of
        data integration
  • The Problem: Reusing Structurally Incompatible
    Services
  • The Ontology-Driven Framework
  • Future Work

6
Promoter Identification in Kepler [SSDBM'03]
  • Problems
      • Many components (web services) are NOT designed
        to fit!
      • The problem P that X solves is simple, and X
        doesn't solve it well
      • Semantically meaningful connections are
        structurally incompatible
  • Approach
      • Distinguish structural type and semantic type
      • Structural type: e.g., XML Schema
      • Semantic type: e.g., OWL expressions
      • Exploit the (optional!) semantic type as much as
        possible

7
A Very Simple Scientific Workflow
[Figure: workflow connecting services S1 (life stage property) and S2 (mortality rate for period) through ports P1 to P5]
8
A Very Simple Scientific Workflow
[Figure: workflow connecting services S1 (life stage property) and S2 (mortality rate for period) through ports P1 to P5]

observations
  Phase      Eggs     Instar I   Instar II   Instar III   Instar IV   Adults
  Observed   44,000   3,513      2,529       1,922        1,461       1,300
Population samples for life stages of the common
field grasshopper [Begon et al., 1996]
9
A Very Simple Scientific Workflow
[Figure: workflow connecting services S1 (life stage property) and S2 (mortality rate for period) through ports P1 to P5]

observations
  Phase      Eggs     Instar I   Instar II   Instar III   Instar IV   Adults
  Observed   44,000   3,513      2,529       1,922        1,461       1,300
Population samples for life stages of the common
field grasshopper [Begon et al., 1996]

life stage periods
  Period     Phases
  Nymphal    Instar I, Instar II, Instar III, Instar IV
Periods of development in terms of phases
10
A Very Simple Scientific Workflow
[Figure: workflow connecting services S1 (life stage property) and S2 (mortality rate for period) through ports P1 to P5]

observations
  Phase      Eggs     Instar I   Instar II   Instar III   Instar IV   Adults
  Observed   44,000   3,513      2,529       1,922        1,461       1,300
Population samples for life stages of the common
field grasshopper [Begon et al., 1996]

life stage periods
  Period     Phases
  Nymphal    Instar I, Instar II, Instar III, Instar IV
Periods of development in terms of phases

output
  (nymphal, 0.44)
k-value for each period of observation
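
As a rough illustration of what the second service might compute, the sketch below assumes the standard life-table definition of a k-value: the difference between the base-10 logarithms of the counts entering and leaving a period. The function name `k_value`, the choice of end point (the count entering the phase that follows the period), and the optional rounding of the logarithms are assumptions for illustration and are not taken from the slides.

```python
import math

# Observed counts per phase, copied from the observations table.
OBSERVED = {
    "Eggs": 44000,
    "Instar I": 3513,
    "Instar II": 2529,
    "Instar III": 1922,
    "Instar IV": 1461,
    "Adults": 1300,
}
PHASE_ORDER = list(OBSERVED)

# Periods of development in terms of phases, copied from the periods table.
PERIODS = {"Nymphal": ["Instar I", "Instar II", "Instar III", "Instar IV"]}


def k_value(period, log_digits=None):
    """Return log10(count entering the period) - log10(count leaving it).

    Assumption: the count "leaving" a period is the count observed for the
    phase that follows the period's last phase.
    """
    phases = PERIODS[period]
    start = OBSERVED[phases[0]]
    next_phase = PHASE_ORDER[PHASE_ORDER.index(phases[-1]) + 1]
    end = OBSERVED[next_phase]
    log_start, log_end = math.log10(start), math.log10(end)
    if log_digits is not None:
        # Optionally round the logarithms first, as printed life tables often do.
        log_start = round(log_start, log_digits)
        log_end = round(log_end, log_digits)
    return log_start - log_end


print(round(k_value("Nymphal"), 4))
print(round(k_value("Nymphal", log_digits=2), 2))
```

With the logarithms rounded to two decimals, the nymphal value comes out as 0.44, which agrees with the (nymphal, 0.44) pair on the slide; without rounding it is about 0.43.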
11
Scientific Workflows
  • A scientific workflow consists of a network of
    connected services
  • A service can be any software component
    (including a web service or even a data source)
  • Each service (optionally) takes input and
    (optionally) produces output

12
Scientific Workflows
  • SEEK adopts a Ptolemy II workflow model
  • A service is called an actor
  • Each actor has zero or more input and output
    ports (and possibly parameters)
  • Data flows through a workflow based on
    connections made from output to input ports
  • (ignored here: different models of computation,
    directors, ...)

[Figure: workflow connecting services S1 (life stage property) and S2 (mortality rate for period) through ports P1 to P5]
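
The actor and port vocabulary above can be pictured with a small data structure. This is only a sketch and not the Ptolemy II or Kepler API: the classes `Actor` and `Workflow` are hypothetical, and the assignment of ports to services (P1 and P2 on S1; P3, P4, and P5 on S2) is an assumption read off the figure. Only the P2 to P3 connection is confirmed by the later examples.

```python
from dataclasses import dataclass, field


@dataclass
class Actor:
    """A workflow service with named input and output ports."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass
class Workflow:
    """A network of actors joined by output-to-input connections."""
    actors: dict = field(default_factory=dict)
    connections: list = field(default_factory=list)

    def add(self, actor):
        self.actors[actor.name] = actor
        return actor

    def connect(self, src, out_port, dst, in_port):
        # Data flows only from an output port to an input port.
        if out_port not in self.actors[src].outputs:
            raise ValueError(f"{out_port} is not an output port of {src}")
        if in_port not in self.actors[dst].inputs:
            raise ValueError(f"{in_port} is not an input port of {dst}")
        self.connections.append((src, out_port, dst, in_port))


wf = Workflow()
wf.add(Actor("S1", inputs=["P1"], outputs=["P2"]))        # assumed port layout
wf.add(Actor("S2", inputs=["P3", "P4"], outputs=["P5"]))  # assumed port layout
wf.connect("S1", "P2", "S2", "P3")
```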
13
Outline
  • The SEEK Project
  • Scientific Workflows
  • The Problem: Reusing Structurally Incompatible
    Services
  • The Ontology-Driven Framework
  • Future Work

14
Service Reusability
  • A scientist wishes to connect two (independent)
    services

[Figure: the desired connection from output port Ps of the source service to input port Pt of the target service]
15
Service Reusability
  • In Ptolemy II/Kepler (and in web services), input
    and output ports (message parts) have structural
    types (XML Schema)

[Figure: the desired connection from Ps (source service) to Pt (target service), each port labeled with its structural type]
16
Service Reusability
  • Unless designed to fit, independent services
    are structurally incompatible
  • Generally, the source output type will not be a
    subtype of the target input type

[Figure: the desired connection from Ps to Pt, with StructuralType Ps and StructuralType Pt marked as incompatible]
17
Service Reusability
  • A transformation mapping is required to
    connect the services, artificially creating
    subtype compatibility
  • If such a mapping exists, the services are
    structurally feasible

[Figure: the desired connection from Ps to Pt; the transformation mapping is applied to StructuralType Ps to bridge the incompatibility with StructuralType Pt]
18
Service Reusability
  • SEEK annotates services with semantic types for
    discovery and interoperability of services

[Figure: the desired connection from Ps to Pt, with SemanticType Ps and SemanticType Pt drawn from ontologies (OWL) and marked as compatible]
19
Service Reusability
  • Services can be semantically compatible, but
    structurally incompatible

[Figure: the desired connection from Ps to Pt; the semantic types (ontologies, OWL) are marked as compatible, the structural types as incompatible, and a transformation mapping is applied to StructuralType Ps]
20
Example Structural Types (XML)
structType(P2), example instance
  <population>
    <sample>
      <meas>
        <cnt>44,000</cnt>
        <acc>0.95</acc>
      </meas>
      <lsp>Eggs</lsp>
    </sample>
  </population>

structType(P3)
  root cohortTable = (measurement)*
  elem measurement = (phase, obs)
  elem phase = xsd:string
  elem obs = xsd:integer

structType(P3), example instance
  <cohortTable>
    <measurement>
      <phase>Eggs</phase>
      <obs>44,000</obs>
    </measurement>
  </cohortTable>

[Figure: workflow connecting services S1 (life stage property) and S2 (mortality rate for period) through ports P1 to P5]
21
Example Semantic Types
  • Portion of SEEK measurement ontology

[Figure: ontology diagram with the concepts Observation, MeasContext, MeasProperty, Entity, EcologicalProperty, AccuracyQualifier, AbundanceCount, LifeStageProperty, SpatialLocation, and NumericValue, linked by the properties hasContext, hasProperty, itemMeasured, appliesTo, hasLocation, hasCount, and hasValue, with cardinality labels]
22
Example Semantic Types
  • Portion of SEEK measurement ontology

Same in OWL, a description logic standard (here,
Sparrow syntax):

  Observation subClassOf
      forall hasContext/MeasContext and
      forall hasProperty/MeasProperty and
      exists itemMeasured/Entity.
  MeasContext subClassOf
      exists appliesTo/Entity and
      atMost 1/appliesTo.
  EcologicalProperty subClassOf Entity.
  LifeStageProperty subClassOf EcologicalProperty.
  AbundanceCount subClassOf EcologicalProperty and
      exists hasLocation/SpatialLocation and
      atMost 1/hasLocation and
      exists hasCount/NumericValue and
      atMost 1/hasCount.

[Figure: ontology diagram with the concepts Observation, MeasContext, MeasProperty, Entity, EcologicalProperty, AccuracyQualifier, AbundanceCount, LifeStageProperty, SpatialLocation, and NumericValue, linked by the properties hasContext, hasProperty, itemMeasured, appliesTo, hasLocation, hasCount, and hasValue, with cardinality labels]
23
Example Semantic Types
  • Semantic types for P2 and P3

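[Figure: semType(P3) and semType(P2) drawn over the ontology. Both are Observations with a hasContext link to a MeasContext that appliesTo a LifeStageProperty, and an itemMeasured link to an AbundanceCount (hasCount NumberValue); semType(P2) additionally has a hasProperty link to an AccuracyQualifier (hasValue)]

[Figure: workflow connecting services S1 (life stage property) and S2 (mortality rate for period) through ports P1 to P5]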
24
Example Semantic Types
  • Semantic types for P2 and P3

  semType(P3) subClassOf Observation and
      exists hasContext/(MeasContext and
          exists appliesTo/LifeStageProperty and
          atMost 1/appliesTo) and
      exists itemMeasured/AbundanceCount and
      atMost 1/itemMeasured.
  semType(P2) subClassOf Observation and
      exists hasContext/(MeasContext and
          exists appliesTo/LifeStageProperty and
          atMost 1/appliesTo) and
      exists itemMeasured/AbundanceCount and
      atMost 1/itemMeasured and
      exists hasProperty/AccuracyQualifier and
      atMost 1/hasProperty.

[Figure: semType(P3) and semType(P2) drawn over the ontology; semType(P2) adds a hasProperty link to an AccuracyQualifier]

[Figure: workflow connecting services S1 (life stage property) and S2 (mortality rate for period) through ports P1 to P5]
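
To see why the P2 to P3 connection is semantically compatible, note that semType(P2) repeats every conjunct of semType(P3) and adds two more. The sketch below checks exactly that containment. It is a deliberately naive, purely syntactic test (sound for conjunctions like these but far from complete); a real system would use a description logic reasoner. The function `is_subsumed_by` and the string encoding of conjuncts are assumptions made for illustration.

```python
# Each semantic type is encoded as the set of conjuncts in its definition.
CONTEXT = (
    "exists hasContext/(MeasContext and "
    "exists appliesTo/LifeStageProperty and atMost 1/appliesTo)"
)

SEM_TYPE_P3 = frozenset({
    "Observation",
    CONTEXT,
    "exists itemMeasured/AbundanceCount",
    "atMost 1/itemMeasured",
})

SEM_TYPE_P2 = SEM_TYPE_P3 | {
    "exists hasProperty/AccuracyQualifier",
    "atMost 1/hasProperty",
}


def is_subsumed_by(sub, sup):
    """True if every conjunct required by `sup` also appears in `sub`.

    Adding conjuncts can only narrow a class, so containment of conjunct
    sets is a sufficient (not necessary) condition for subsumption.
    """
    return sup <= sub


# The source output (P2) must be a subtype of the target input (P3).
print(is_subsumed_by(SEM_TYPE_P2, SEM_TYPE_P3))
print(is_subsumed_by(SEM_TYPE_P3, SEM_TYPE_P2))
```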
25
Outline
  • The SEEK Project
  • Scientific Workflows
  • The Problem: Reusing Structurally Incompatible
    Services
  • The Ontology-Driven Framework
  • Future Work

26
The Ontology-Driven Framework
  • Define semantic registration mappings (semantic
    views) to connect structural and semantic types
  • Use registration mappings to (semi-)automate
    transformation, based on derived structural
    correspondences
  • Depending on the ontologies and registration
    mappings, it may not be possible to find an
    appropriate transformation
    (since the correspondence is often
    under-specified)

27
The Ontology-Driven Framework
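[Figure: the framework. The semantic types of Ps and Pt (ontologies, OWL) are compatible; a registration mapping (output) links StructuralType Ps to SemanticType Ps, and a registration mapping (input) links StructuralType Pt to SemanticType Pt, around the desired connection from the source service to the target service]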
28
Registration Example (simple XPaths)
structType(P2), example instance
  <population>
    <sample>
      <meas>
        <cnt>44,000</cnt>
        <acc>0.95</acc>
      </meas>
      <lsp>Eggs</lsp>
    </sample>
  </population>

Registration mapping
  Substructure selection (XPath)        Contextual path
  /population/sample                    semType(P2)
  /population/sample/meas/cnt           semType(P2).itemMeasured
  /population/sample/meas/cnt/text()    semType(P2).itemMeasured.hasCount
  /population/sample/meas/acc           semType(P2).hasProperty
  /population/sample/meas/acc/text()    semType(P2).hasProperty.hasValue
  /population/sample/lsp/text()         semType(P2).hasContext.appliesTo
29
Registration Example (simple XPaths)
  /population/sample                    semType(P2)
Each sample is an instance of the semantic type
30
Registration Example (simple XPaths)
  /population/sample/meas/cnt           semType(P2).itemMeasured
Each sample's cnt represents the itemMeasured
object
31
Registration Example (simple XPaths)
  /population/sample/meas/cnt/text()    semType(P2).itemMeasured.hasCount
Each sample's cnt value represents the hasCount
value of the corresponding itemMeasured object
32
Registration Example (simple XPaths)
structType(P3), example instance
  <cohortTable>
    <measurement>
      <phase>Eggs</phase>
      <obs>44,000</obs>
    </measurement>
  </cohortTable>

structType(P3)
  root cohortTable = (measurement)*
  elem measurement = (phase, obs)
  elem phase = xsd:string
  elem obs = xsd:integer

Registration mapping
  Substructure selection (XPath)           Contextual path
  /cohortTable/measurement                 semType(P3)
  /cohortTable/measurement/obs             semType(P3).itemMeasured
  /cohortTable/measurement/obs/text()      semType(P3).itemMeasured.hasCount
  /cohortTable/measurement/phase/text()    semType(P3).hasContext.appliesTo
Similarly for P3 ...
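
A registration mapping can be read as a list of (substructure selection, contextual path) pairs that is evaluated against a concrete document. The sketch below does this for the P2 example. The helper `select` is a hypothetical, minimal evaluator for the simple XPaths used on these slides (child steps plus a trailing `text()`), written by hand because the standard library's `xml.etree.ElementTree` does not evaluate absolute paths or `text()` steps; it is not the authors' implementation.

```python
import xml.etree.ElementTree as ET

DOC = """
<population>
  <sample>
    <meas><cnt>44,000</cnt><acc>0.95</acc></meas>
    <lsp>Eggs</lsp>
  </sample>
</population>
"""

# Registration mapping for P2: (simple XPath, contextual path).
R_P2 = [
    ("/population/sample", "semType(P2)"),
    ("/population/sample/meas/cnt", "semType(P2).itemMeasured"),
    ("/population/sample/meas/cnt/text()", "semType(P2).itemMeasured.hasCount"),
    ("/population/sample/meas/acc", "semType(P2).hasProperty"),
    ("/population/sample/meas/acc/text()", "semType(P2).hasProperty.hasValue"),
    ("/population/sample/lsp/text()", "semType(P2).hasContext.appliesTo"),
]


def select(root, xpath):
    """Evaluate an absolute child-step path, optionally ending in text()."""
    steps = xpath.strip("/").split("/")
    want_text = steps[-1] == "text()"
    if want_text:
        steps = steps[:-1]
    if not steps or steps[0] != root.tag:
        return []
    nodes = [root]
    for step in steps[1:]:
        # Iterating an Element yields its direct children.
        nodes = [child for node in nodes for child in node if child.tag == step]
    return [node.text for node in nodes] if want_text else nodes


root = ET.fromstring(DOC)
for xpath, contextual_path in R_P2:
    if xpath.endswith("text()"):
        print(contextual_path, "=", select(root, xpath))
```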
33
The Ontology-Driven Framework
[Figure: the framework, now with a structural correspondence between StructuralType Ps and StructuralType Pt derived from the two registration mappings and the compatible semantic types]
34
Correspondence Example
Source-side semantic registration mapping
  /population/sample                       semType(P2)
  /population/sample/meas/cnt              semType(P2).itemMeasured
  /population/sample/meas/cnt/text()       semType(P2).itemMeasured.hasCount
  /population/sample/meas/acc              semType(P2).hasProperty
  /population/sample/meas/acc/text()       semType(P2).hasProperty.hasValue
  /population/sample/lsp/text()            semType(P2).hasContext.appliesTo

Target-side semantic registration mapping
  /cohortTable/measurement                 semType(P3)
  /cohortTable/measurement/obs             semType(P3).itemMeasured
  /cohortTable/measurement/obs/text()      semType(P3).itemMeasured.hasCount
  /cohortTable/measurement/phase/text()    semType(P3).hasContext.appliesTo

Source structure
  population
    sample
      meas
        cnt (xsd:integer)
        acc (xsd:double)
      lsp (xsd:string)

Target structure
  cohortTable
    measurement
      obs (xsd:integer)
      phase (xsd:string)
35
Correspondence Example
We want to compose the registrations to
obtain structural correspondences
36
Correspondence Example
  Source: /population/sample                       semType(P2)
  Target: /cohortTable/measurement                 semType(P3)
These fragments correspond
37
Correspondence Example
  Source: /population/sample/meas/cnt              semType(P2).itemMeasured
  Target: /cohortTable/measurement/obs             semType(P3).itemMeasured
These fragments correspond
38
Correspondence Example
  Source: /population/sample/meas/cnt/text()       semType(P2).itemMeasured.hasCount
  Target: /cohortTable/measurement/obs/text()      semType(P3).itemMeasured.hasCount
These fragments correspond
39
Correspondence Example
  Source: /population/sample/lsp/text()            semType(P2).hasContext.appliesTo
  Target: /cohortTable/measurement/phase/text()    semType(P3).hasContext.appliesTo
These fragments correspond
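
The pairing shown on the last few slides can be sketched as a join of the two registration mappings on their contextual paths. The code below is a simplification: it matches paths by exact equality after dropping the leading semantic-type name, whereas the framework matches through a semantic subpath (subconcept) operation over the ontology. The function names `strip_root` and `compose` are hypothetical.

```python
R_SOURCE = [
    ("/population/sample", "semType(P2)"),
    ("/population/sample/meas/cnt", "semType(P2).itemMeasured"),
    ("/population/sample/meas/cnt/text()", "semType(P2).itemMeasured.hasCount"),
    ("/population/sample/meas/acc", "semType(P2).hasProperty"),
    ("/population/sample/meas/acc/text()", "semType(P2).hasProperty.hasValue"),
    ("/population/sample/lsp/text()", "semType(P2).hasContext.appliesTo"),
]

R_TARGET = [
    ("/cohortTable/measurement", "semType(P3)"),
    ("/cohortTable/measurement/obs", "semType(P3).itemMeasured"),
    ("/cohortTable/measurement/obs/text()", "semType(P3).itemMeasured.hasCount"),
    ("/cohortTable/measurement/phase/text()", "semType(P3).hasContext.appliesTo"),
]


def strip_root(contextual_path):
    """Drop the leading semantic-type name, keeping only the property path."""
    _, _, rest = contextual_path.partition(".")
    return rest


def compose(r_source, r_target):
    """Pair source and target selections whose property paths are equal.

    Simplification: exact string equality stands in for the ontology-aware
    subpath test, which is valid here only because semType(P2) is already
    known to be compatible with semType(P3).
    """
    return [
        (qs, qt)
        for qs, ps in r_source
        for qt, pt in r_target
        if strip_root(ps) == strip_root(pt)
    ]


for qs, qt in compose(R_SOURCE, R_TARGET):
    print(qs, "<->", qt)
```

The two `acc` rules on the source side find no partner, which matches the slides: the target type has no accuracy qualifier.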
40
The Ontology-Driven Framework
[Figure: the complete framework. The structural correspondence derived from the registration mappings is used to generate the transformation that is applied to the output of Ps on the desired connection from the source service to the target service]
41
Example Result (XQuery)
  • Based on the structural correspondences and
    certain assumptions, we derive the transformation
    XQuery


  <cohortTable>{
    for $s in /population/sample
    return
      <measurement>
        { for $c in $s/meas/cnt return <obs>{$c/text()}</obs> }
        { for $l in $s/lsp return <phase>{$l/text()}</phase> }
      </measurement>
  }</cohortTable>
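
For readers without an XQuery processor at hand, the same transformation can be mirrored with Python's standard `xml.etree.ElementTree`. This is a hand-written equivalent of the query above for illustration, not generated code; the function name `transform` is hypothetical, and the output keeps the element order of the query.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    "<population><sample>"
    "<meas><cnt>44,000</cnt><acc>0.95</acc></meas>"
    "<lsp>Eggs</lsp>"
    "</sample></population>"
)


def transform(population):
    """Build a cohortTable from a population element."""
    table = ET.Element("cohortTable")
    # for $s in /population/sample
    for sample in population.findall("sample"):
        measurement = ET.SubElement(table, "measurement")
        # for $c in $s/meas/cnt return <obs>{$c/text()}</obs>
        for cnt in sample.findall("meas/cnt"):
            ET.SubElement(measurement, "obs").text = cnt.text
        # for $l in $s/lsp return <phase>{$l/text()}</phase>
        for lsp in sample.findall("lsp"):
            ET.SubElement(measurement, "phase").text = lsp.text
    return table


result = transform(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```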
42
Assumptions Made (or why this may not work for
you)
  • Common XPath prefixes refer to the same element
  • Elements in correspondences have compatible
    cardinalities
  • source is equivalent or stricter than target
    (e.g., is stricter than )
  • Primitive data types are compatible

43
Framework Operations and Properties
  • In the paper, we define:
      • A semantic registration mapping R as a set of
        rules q → p, where q is a substructure selection
        (query) and p is a contextual path (a path in an
        ontology)
      • A structural correspondence as a rule qs → qt,
        where qs and qt are substructure selections over
        the source and target, respectively
      • The semantic composition of registration mappings
        Rs and Rt, which returns a set of structural
        correspondence rules
      • The semantic subpath operation (subconcept),
        which is used by the semantic composition to find
        matching substructure selection rules

44
Framework Operations and Properties
  • In the paper, we define:
      • Registration mapping properties (cardinality
        consistency and partial complete registrations)
        and discuss the impact on determining structural
        transformations
      • The simple XPath and Semantic Path languages for
        defining registration mappings, and the
        corresponding semantic join operator to find
        correspondences

45
Outline
  • The SEEK Project
  • Scientific Workflows
  • The Problem: Reusing Structurally Incompatible
    Services
  • The Ontology-Driven Framework
  • A Simple Framework Implementation
  • Future Work

46
Future Work
  • Extend the registration mapping language
      • XPath is too limited
      • Try a more general query language (e.g., XPath
        variables)
      • Relational/Datalog-based substructure selection
        (query)
  • Formalize the properties of registration mappings
    and their effect on automated transformation
  • Introduce conversion routines (e.g., for units)
    at the ontology level and apply them in
    transformations
  • Extend transformations to different computation
    models and workflow scheduling algorithms
  • Add to the Kepler Scientific Workflow System

47
Acknowledgements
  • NSF/ITR Science Environment for Ecological
    Knowledge
  • NSF/ITR Geosciences Network
  • NIH Biomedical Informatics Research Network
  • DOE Scientific Data Management Center

48
Questions