Title: Ontologies in Data and Application Integration: an Update
1. Ontologies in Data and Application Integration: an Update
Kai Lin, Bertram Ludäscher
Knowledge-Based Information Systems Lab, Data and Knowledge Systems (DAKS)
San Diego Supercomputer Center, University of California San Diego
http://www.geongrid.org
2. Outline
- Motivation
- Ontology Cheat Sheet
- Ontology-enabled Prototypes and Tools
- Data Service Registration (Structural, Semantic)
- Scientific Workflows
4. Ontology Cheat Sheet (1/2)
- What is an ontology? An ontology usually
  - specifies a theory (a set of models) by
  - defining and relating
  - concepts representing features of a domain of interest
- Also an overloaded (sometimes sloppy) term for
  - Controlled vocabularies
  - Database schema (relational, XML, ...)
  - Conceptual schema (ER, UML, ...)
  - Thesauri (synonyms, broader term/narrower term)
  - Taxonomies
  - Informal/semi-formal representations
  - Concept spaces, concept maps
  - Labeled graphs / semantic networks (RDF)
  - Formal ontologies, e.g., in Description Logic (OWL)
  - formalization of a specification, which constrains the possible interpretations of terms
5. A Multi-Hierarchical Rock Classification Ontology (GSC)
[Figure: rock classification hierarchies: Genesis, Fabric, Composition, Texture]
6. Ontology Cheat Sheet (2/2)
- What are ontologies used for?
- Conceptual models of a domain or application (communication means, system design, ...)
- Classification of
  - concepts (taxonomy) and
  - data/object instances through classes
- Analysis of ontologies, e.g.,
  - Graph queries (reachability, path queries, ...)
  - Reasoning (concept subsumption, consistency checking, ...)
- Targets for semantic data registration
- Conceptual indexes and views for
  - searching,
  - browsing,
  - querying, and
  - integration of registered data
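The graph-query and classification uses listed above can be illustrated with a minimal Python sketch. It treats a taxonomy as a directed is-a graph and answers reachability questions over it. The concept names, the `IS_A` edges, and the `REGISTERED` items are hypothetical and not taken from any ontology in this talk. This is plain graph reachability over asserted edges, not Description Logic reasoning.

```python
from collections import deque

# Hypothetical is-a edges (child -> list of parents); illustrative names only.
IS_A = {
    "Granite": ["PlutonicRock"],
    "PlutonicRock": ["IgneousRock"],
    "Basalt": ["VolcanicRock"],
    "VolcanicRock": ["IgneousRock"],
    "IgneousRock": ["Rock"],
}


def ancestors(concept, is_a=IS_A):
    """Return every concept reachable from `concept` via is-a edges (breadth-first)."""
    seen, queue = set(), deque([concept])
    while queue:
        for parent in is_a.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_subsumed_by(sub, sup, is_a=IS_A):
    """True if `sub` equals `sup` or reaches it through asserted is-a edges."""
    return sub == sup or sup in ancestors(sub, is_a)


def instances_of(concept, registered, is_a=IS_A):
    """Concept-based lookup: items registered to `concept` or to any of its sub-concepts."""
    return [item for item, c in registered.items() if is_subsumed_by(c, concept, is_a)]


# Hypothetical data items, each registered to one concept.
REGISTERED = {"sample-1": "Granite", "sample-2": "Basalt", "sample-3": "Rock"}

print(sorted(ancestors("Granite")))
print(instances_of("IgneousRock", REGISTERED))
```

A query for `IgneousRock` finds items registered under the narrower concepts `Granite` and `Basalt`, which is the sense in which an ontology serves as a conceptual index over registered data.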
7. Application Example: Geologic Map Integration
[Figure: geologic map integration (Nevada); labels: domain knowledge; knowledge representation: ontologies!?]
8. Geologic Map Integration in the Portal
- After registering datasets, ontologies (here: classes), and an application (OMI), the datasets can be searched and displayed in an integrated way.
9. Concept-Based Queries and Analysis
- After registering a source with one or more ontologies, concept-based queries and analysis can be launched
- Here: light-weight client-side processing (SVG)
10. Ontologies and Data Management
- Where do ontologies fit within data management architectures?
- Several answers, specifically:
- An ontology is similar to a schema or conceptual model, if one exists, but is
  - Developed independently of a particular application
  - Probably given in a different language
  - Inherently more general
  - Usually not a very good schema (weak structure)
11. Ontologies and Data Management (watch out for Semantic Data Registration later)
[Figure: diagram relating Ontology, Conceptual Model, Schema, Metadata, and Data; labels: "use concepts from (explicitly or implicitly)", "Design Artifact"]
12. Creating and Sharing Concept Maps (here: seismology concept map, Cmap tool)
- Lock up scientists for 2 days
- Add CS/KRDB types
- Create concept maps
- Refine
- Iterate
- From napkin drawings, to concept maps, to ontologies
16. Graph (RDF) Queries on Ontologies
[Screenshot: visualisation; RQL query "Show all products"; query results]
17. Community-Based Ontology Development
- Current concept maps and emerging ontologies
  - Igneous Rocks/Plutons
  - Seismology
  - Geochemistry
- Draft of a geochemistry ontology developed by scientists
18. Protégé (not so ezOWL yet)
19. Sparrow (a poor man's OWL tool)
- Simple ASCII-based RDF and OWL entry and manipulation
20. Semantic Data Registration (joint work w/ Shawn Bowers)
21. What is Data/Ontology/... Registration?
- A mechanism by which data sources, ontologies, services, ...
- are published in a repository/registry
- for the purpose of smart discovery, querying, integration
22. Things to Register
- Data files (individual files)
  - Shapefile as a blob (file type)
- Collections (of files, nested; e.g., satellite data)
- Databases (has schema and can be queried)
  - Shapefile with schema registered
- Ontologies
- Services (web and grid services)
- Other/external applications
23. Connecting Datasets to Ontologies
Ontology (snippet)
How can we register the dataset to concepts in
the Ontology?
Dataset
Date Site Transect SP_Code Count
2000-09-08 CARP 1 CRGI 0
2000-09-08 CARP 4 LOCH 0
2000-09-08 CARP 7 MUCA 1
2000-09-22 NAPL 7 LOCH 1
2000-09-18 NAPL 1 PAPA 5
2000-09-28 BULL 1 CYOS 57
24. Step 1: Selecting Relevant Concepts
Concepts from an Ontology
- DataCollectionEvent
- AbundanceCollectionEvent
- Measurement
- Abundance
- SpeciesAbundance
- Location
- LTERSite
- SBLTERSite
- naples
- MeasurableItem
- SpeciesCount
Dataset
Date Site Transect SP_Code Count
2000-09-08 CARP 1 CRGI 0
2000-09-08 CARP 4 LOCH 0
2000-09-08 CARP 7 MUCA 1
2000-09-22 NAPL 7 LOCH 1
2000-09-18 NAPL 1 PAPA 5
2000-09-28 BULL 1 CYOS 57
26. Step 2: Generate Object Model
Concepts from an Ontology
- DataCollectionEvent
- AbundanceCollectionEvent
- Measurement
- Abundance
- SpeciesAbundance
- Location
- LTERSite
- SBLTERSite
- naples
- MeasurableItem
- SpeciesCount
[Figure: object model. Classes: AbundanceCollectionEvent, SpeciesAbundance, SpeciesCount, Species, RatioUnit, RatioValue, DateTime, SBLTERSite. Properties: contains, measureOf, hasSpecies, hasValue, hasUnit, hasTime, hasLoc]
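A minimal Python sketch of what registering the dataset against such an object model could look like. The exact edges of the object model cannot be recovered from the slide, so the assignment of columns to properties is an assumption: Date is taken to fill hasTime, Site to fill hasLoc, and each row is taken to contribute one SpeciesAbundance (SP_Code as species, Count as value). The `transect` field and the `register` function are hypothetical helpers, not part of the original framework.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SpeciesAbundance:
    species: str  # from SP_Code (assumed to correspond to hasSpecies)
    value: int    # from Count (assumed to correspond to hasValue)


@dataclass
class AbundanceCollectionEvent:
    time: date      # from Date (assumed to correspond to hasTime)
    site: str       # from Site (assumed to correspond to hasLoc)
    transect: int   # hypothetical field; the slide does not say where Transect maps
    contains: list  # SpeciesAbundance objects (assumed to correspond to contains)


# The dataset rows shown on the slide (Date Site Transect SP_Code Count).
ROWS = """2000-09-08 CARP 1 CRGI 0
2000-09-08 CARP 4 LOCH 0
2000-09-08 CARP 7 MUCA 1
2000-09-22 NAPL 7 LOCH 1
2000-09-18 NAPL 1 PAPA 5
2000-09-28 BULL 1 CYOS 57"""


def register(rows):
    """Group rows into collection events keyed by (date, site, transect)."""
    events = {}
    for line in rows.splitlines():
        day, site, transect, sp_code, count = line.split()
        key = (day, site, int(transect))
        if key not in events:
            events[key] = AbundanceCollectionEvent(
                date.fromisoformat(day), site, int(transect), []
            )
        events[key].contains.append(SpeciesAbundance(sp_code, int(count)))
    return list(events.values())


events = register(ROWS)
print(len(events), events[-1])
```

Once rows are expressed as instances of ontology classes, a concept-based query (for example, all abundance events at a given site) no longer depends on the column names of the original table.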
29. Applications of Semantic Registration
- Mentioned before
  - Smart data discovery, integration, etc.
- New application
  - Generating data transformations semi-automatically for chaining together computational services
30. Problem: Service Reusability
- Unless designed to fit, independent services are structurally incompatible
- Generally, the source output type will not be a subtype of the target input type
[Figure: Source Service with output port Ps and Target Service with input port Pt; StructuralType Ps and StructuralType Pt are incompatible, so the desired connection from Ps to Pt cannot be made directly]
31. Service Reusability
- A data transformation mapping is required to connect the services, artificially creating subtype compatibility
- If such a mapping exists, the services are structurally feasible
[Figure: Source Service with output port Ps and Target Service with input port Pt; StructuralType Ps and StructuralType Pt are incompatible, and the mapping applied to Ps bridges the desired connection]
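The idea of structural feasibility can be sketched in a few lines of Python. The sketch makes simplifying assumptions: structural types are modelled as flat records (field name to Python type), subtyping is a simple record check, and the mapping is a field renaming. The field names `cnt`, `lsp`, `obs`, and `phase`, and all function names, are illustrative choices and not the framework's actual type system.

```python
def is_subtype(ps, pt):
    """Record-style check: every field the target expects exists in the source with the same type."""
    return all(ps.get(field) == ftype for field, ftype in pt.items())


# Hypothetical structural types of the source output port and target input port.
PS = {"cnt": int, "lsp": str}
PT = {"obs": int, "phase": str}

# Hypothetical transformation mapping, here just a renaming of fields.
RENAME = {"cnt": "obs", "lsp": "phase"}


def mapped_type(ps, rename):
    """Structural type of the source output after the mapping has been applied."""
    return {rename.get(field, field): ftype for field, ftype in ps.items()}


def transform(record, rename):
    """Apply the same mapping to an actual data record."""
    return {rename.get(field, field): value for field, value in record.items()}


def source_service():
    return {"cnt": 12, "lsp": "larva"}


def target_service(record):
    return f"{record['phase']}: {record['obs']}"


print(is_subtype(PS, PT))                       # direct connection
print(is_subtype(mapped_type(PS, RENAME), PT))  # connection through the mapping
print(target_service(transform(source_service(), RENAME)))
```

The two services stay untouched; only the adapter placed between them changes, which is what makes independently designed services reusable.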
32. Service Reusability
- Idea: annotate services with semantic types (concept expressions), primarily for discovery of services
[Figure: with respect to the ontologies (OWL), SemanticType Ps and SemanticType Pt are compatible; desired connection from Source Service (Ps) to Target Service (Pt)]
33. Service Reusability
- Services can be semantically compatible, but structurally incompatible
[Figure: with respect to the ontologies (OWL), SemanticType Ps and SemanticType Pt are compatible, while StructuralType Ps and StructuralType Pt are incompatible; a mapping applied to Ps is needed for the desired connection from Source Service (Ps) to Target Service (Pt)]
34. The Ontology-Driven Framework (work w/ Shawn Bowers, SEEK)
[Figure: the ontologies (OWL) relate the compatible SemanticType Ps and SemanticType Pt; registration mappings (output and input) link the semantic types to StructuralType Ps and StructuralType Pt; the resulting correspondence is used to generate the transformation applied to Ps, which realizes the desired connection from Source Service (Ps) to Target Service (Pt)]
35. Example: Generated Data Transformation (in XQuery)
- Based on the structural correspondences and certain assumptions, we derive the transformation query
<cohortTable>
{ for $s in /population/sample
  return
    <measurement>
    { for $c in $s/meas/cnt return <obs>{ $c/text() }</obs> }
    { for $l in $s/lsp return <phase>{ $l/text() }</phase> }
    </measurement> }
</cohortTable>
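To make the effect of the transformation query concrete, here is a rough Python equivalent using the standard library's `xml.etree.ElementTree`. The `SAMPLE` input document is invented for illustration; only the element names (`population`, `sample`, `meas`, `cnt`, `lsp`, `cohortTable`, `measurement`, `obs`, `phase`) come from the query. It is a sketch of the same restructuring, not the code the framework generates.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document matching the paths used in the query.
SAMPLE = """<population>
  <sample><meas><cnt>12</cnt><cnt>7</cnt></meas><lsp>larva</lsp></sample>
  <sample><meas><cnt>3</cnt></meas><lsp>adult</lsp></sample>
</population>"""


def to_cohort_table(population):
    """Build one <measurement> per <sample>, copying cnt to obs and lsp to phase."""
    table = ET.Element("cohortTable")
    for sample in population.findall("sample"):
        measurement = ET.SubElement(table, "measurement")
        for cnt in sample.findall("meas/cnt"):
            ET.SubElement(measurement, "obs").text = cnt.text
        for lsp in sample.findall("lsp"):
            ET.SubElement(measurement, "phase").text = lsp.text
    return table


result = to_cohort_table(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```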
36. Scientific Workflows (Efrat Jaeger et al.)
37. Reverse Engineering a Scientific Workflow using the KEPLER Tool (Efrat Jaeger)
38. A Scientific Workflow in Kepler
[Screenshot annotations: "Extract mineral composition for row Id."; "Igneous Rock Diagrams information."; "Rock Name."]
42. Reverse-Engineered the Geological Map Integration in Kepler
43. DataMapper Sub-Workflow
44. Result launched via the BrowserUI actor
45. KEPLER and YOU
- Kepler
  - is a community-based, cross-project, open source collaboration
  - for "minute-made" application integration
  - using web (grid) services as basic building blocks
  - has a joint CVS repository, mailing lists, web site, ...
  - is gaining momentum thanks to contributors and contributions
  - BSD-style license allows commercial spin-offs
  - a pre-packaged, shrink-wrapped version (Kepler-to-GO) coming soon to a place near you
46. F I N. Questions?
47. Additional Material
48. The KEPLER GUI (Vergil from Ptolemy II)
Drag and drop utilities, director and actor libraries.
49. Running the workflow
50. Distributed Workflows in KEPLER
- Web and Grid Service plug-ins
- WSDL
- ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
- SRB
- SSH, SCP
- Web Service Harvester
  - Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
- XSLT and XQuery transformers to link non-fitting services together
- Web Service Deployment (ongoing work)
51. A Generic Web Service Actor
- Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.
52. Set Parameters and Commit
53. WS Actor after Instantiation
54. Web Service Harvester
- Imports the web services in a repository into the actor library.
- Has the capability to search for web services based on a keyword.
55. Composing 3rd-Party WSs
[Figure labels: input of next web service; user interaction; transformations]
56. Providing DB Access through Kepler
- Database connection actor
  - Opening a database connection and passing it to all actors accessing this database.
- Database query actor
  - A generic actor that queries a database and provides its result.
- DBConnection type and DBConnectionToken
  - A new IOPort type and a token to distinguish a database connection from any general type.
57. Database Connection Actor
- OpenDBConnection actor
  - Input: database connection information.
  - Output: a DBConnectionToken, a reference to a database connection instance, through a DBConnection output port.
58. Database Query Actor
- Database Query actor
  - Input: a query string (SQL) and a database connection reference.
  - Parameters: output type (XML, Record, or String); output each row separately or all at once.
  - Process: execute the query; produce results according to the parameters.
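The split between a connection step and a query step can be mimicked outside Kepler with a short Python sketch based on the standard `sqlite3` module. This is not Kepler's actor API: `open_db_connection`, `query_db`, the `counts` table, and the parameter names `output_type` and `all_at_once` are hypothetical stand-ins for the inputs and parameters listed above.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_db_connection(database=":memory:"):
    """Connection step: open once and hand the same reference to every query step."""
    return sqlite3.connect(database)


def _format_row(columns, row, output_type):
    """Render one result row as a record (dict), a string, or an XML fragment."""
    if output_type == "record":
        return dict(zip(columns, row))
    if output_type == "string":
        return ", ".join(str(value) for value in row)
    if output_type == "xml":
        elem = ET.Element("row")
        for name, value in zip(columns, row):
            ET.SubElement(elem, name).text = str(value)
        return ET.tostring(elem, encoding="unicode")
    raise ValueError(f"unknown output type: {output_type}")


def query_db(conn, sql, output_type="record", all_at_once=True):
    """Query step: run SQL on the shared connection; return a list or a lazy row iterator."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    rows = (_format_row(columns, row, output_type) for row in cursor)
    return list(rows) if all_at_once else rows


conn = open_db_connection()
conn.execute("CREATE TABLE counts (site TEXT, sp_code TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [("CARP", "MUCA", 1), ("NAPL", "PAPA", 5)],
)

records = query_db(conn, "SELECT site, sp_code, n FROM counts ORDER BY site")
xml_rows = list(
    query_db(conn, "SELECT site, n FROM counts ORDER BY site",
             output_type="xml", all_at_once=False)
)
print(records)
print(xml_rows)
```

Passing the connection as a value between steps, rather than reopening it in every query step, is the role that the DBConnectionToken plays between actors.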
59. Querying Example
60. Resource Description Framework (RDF)
- Simple data model that consists of
  - Resources (uniquely identified via URIs)
  - Properties
  - Values (resources or character strings)
- Data organized into triples (subject, property, value)
[Figure: SonomaRegion as subject (resource), locatedIn as property (resource), CaliforniaRegion as value (resource)]
locatedIn(SonomaRegion, CaliforniaRegion)
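The triple model is simple enough to sketch directly in Python: a set of (subject, property, value) tuples plus a pattern matcher in which `None` acts as a wildcard. Only the first triple comes from the slide; the other two triples and the `match` helper are hypothetical, and real RDF would identify resources by URIs rather than bare names.

```python
# Triples as (subject, property, value) tuples.
TRIPLES = {
    ("SonomaRegion", "locatedIn", "CaliforniaRegion"),  # from the slide
    ("CaliforniaRegion", "locatedIn", "USA"),           # hypothetical
    ("SonomaRegion", "label", "Sonoma"),                # hypothetical; value is a character string
}


def match(triples, subject=None, prop=None, value=None):
    """Return, in sorted order, all triples that agree with the non-None arguments."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    )


print(match(TRIPLES, prop="locatedIn"))
print(match(TRIPLES, subject="SonomaRegion", prop="locatedIn"))
```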
61. RDF Schema
- Adds a set of pre-defined properties to define classes and properties
- Allows instances to be connected to classes
- Sub-class and sub-property (is-a) relationships
Region is a class; locatedIn is a property; locatedIn connects Regions.
[Figure: Region with a locatedIn edge; SonomaRegion and CaliforniaRegion, connected by locatedIn, each linked to Region via rdf:type]
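One way to read "locatedIn connects Regions" is as a pair of rdfs:domain and rdfs:range statements; that reading is an assumption, since the slide does not spell it out. Under it, the rdf:type edges in the figure follow from the data by the standard RDFS domain and range rules. The Python sketch below uses plain strings for resource names, and `infer_types` is a hypothetical helper that assumes at most one domain and one range per property.

```python
RDF_TYPE, RDFS_DOMAIN, RDFS_RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

# Schema-level triples (assumed reading of "locatedIn connects Regions").
SCHEMA = {
    ("locatedIn", RDFS_DOMAIN, "Region"),
    ("locatedIn", RDFS_RANGE, "Region"),
}

# Instance-level triple from the slide.
DATA = {("SonomaRegion", "locatedIn", "CaliforniaRegion")}


def infer_types(schema, data):
    """Derive rdf:type triples: subjects get the property's domain, values get its range."""
    domains = {p: c for p, kind, c in schema if kind == RDFS_DOMAIN}
    ranges = {p: c for p, kind, c in schema if kind == RDFS_RANGE}
    inferred = set()
    for subject, prop, value in data:
        if prop in domains:
            inferred.add((subject, RDF_TYPE, domains[prop]))
        if prop in ranges:
            inferred.add((value, RDF_TYPE, ranges[prop]))
    return inferred


print(sorted(infer_types(SCHEMA, DATA)))
```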
62. OWL
- Adds additional pre-defined properties to further constrain an ontology
- (See http://www.w3.org/TR/owl-guide/)
- Note: RDF(S) and OWL use XML
- Some graphic tools exist (e.g., Protégé)
A Vintage is a class that is a subclass of an unnamed class whose instances always have one hasVintageYear property.
<owl:Class rdf:ID="Vintage">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasVintageYear"/>
      <owl:cardinality>1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
Note the uglified XML syntax. The good news: it is meant for parsers, not humans!