Semantic Mediation in SEEK/Kepler: Exploiting Semantic Annotation for Discovery, Analysis, and Integration of Scientific Data and Workflows

1
Semantic Mediation in SEEK/Kepler: Exploiting
Semantic Annotation for Discovery, Analysis, and
Integration of Scientific Data and Workflows
  • Shawn Bowers
  • UC Davis Genome Center
  • sbowers _at_ ucdavis.edu
  • Bertram Ludäscher
  • Dept. of Computer Science, UC Davis
  • UC Davis Genome Center
  • ludaesch _at_ ucdavis.edu

seek.ecoinformatics.org kepler-project.org
www.sdsc.edu dbis.ucdavis.edu
genomics.ucdavis.edu
2
Science Environment for Ecological Knowledge
  • SEEK is an NSF-funded, multidisciplinary research
    project to facilitate:
  • Access to distributed ecological, environmental,
    and biodiversity data
  • Enable data sharing and reuse
  • Enhance data discovery at global scales
  • Scalable analysis and synthesis
  • Taxonomic, spatial, temporal, and conceptual
    integration of data, addressing data
    heterogeneity issues
  • Enable communication and collaboration for
    analysis
  • Enable reuse of analytical components
  • Support scientific workflow design and modeling

3
SEEK: data access, analysis, and mediation
  • Data Access (EcoGrid)
  • Distributed data network for environmental,
    ecological, and systematics data
  • Interoperate diverse environmental data systems
  • Workflow Tools (Kepler)
  • Problem-solving environment for scientific data
    analysis and visualization → scientific
    workflows
  • Semantic Mediation (SMS)
  • Leverage ontologies for smart data/component
    discovery and integration

4
Managing Data Heterogeneity
  • Data comes from heterogeneous sources
  • Real-world observations
  • Spatial-temporal contexts
  • Collection/measurement protocols and procedures
  • Many representations for the same information
    (count, area, density)
  • Data, syntax, schema, and semantic heterogeneity
  • Discovery and synthesis (integration) performed
    manually
  • Discovery often based on an intuitive notion of
    what is out there
  • Synthesis of data is very time-consuming, and
    limits use
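To make the "many representations" point concrete: the same abundance information may arrive as a density, as a count plus the sampled area, or as a bare count. The following is a minimal sketch, assuming a hypothetical record schema and square metres as the area unit, that normalizes records to individuals per m².

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AbundanceRecord:
    """Hypothetical schema: each source fills in a different subset of fields."""
    count: Optional[float] = None     # number of individuals observed
    area_m2: Optional[float] = None   # sampled area in square metres
    density: Optional[float] = None   # individuals per square metre

def to_density(rec: AbundanceRecord) -> Optional[float]:
    """Return individuals per m^2, or None if the record lacks enough information."""
    if rec.density is not None:
        return rec.density
    if rec.count is not None and rec.area_m2:  # a zero or missing area cannot be divided by
        return rec.count / rec.area_m2
    return None  # e.g. a bare count without a sampled area

records = [
    AbundanceRecord(density=0.5),
    AbundanceRecord(count=12, area_m2=24.0),
    AbundanceRecord(count=7),
]
print([to_density(r) for r in records])  # [0.5, 0.5, None]
```

The third record shows why manual synthesis is slow: a count without its sampling context cannot be combined with density data at all.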

5
Scientific workflow systems support data analysis
KEPLER
6
A simple Kepler workflow
Composite Component (Sub-workflow)
Loops are often used in scientific workflows (SWFs), e.g., in
genomics and bioinformatics (collections of data, nested data,
statistical regressions, ...)
(T. McPhillips)
7
A simple Kepler workflow
Lists Nexus files to process (project)
Reads text files
Parses Nexus format
Draws phylogenetic trees
PhylipPars infers trees from discrete,
multi-state characters.
Workflow runs PhylipPars iteratively to discover
all of the most parsimonious trees.
UniqueTrees discards redundant trees in each
collection.
(T. McPhillips)
8
A simple Kepler workflow
An example workflow run, executed as a Dataflow
Process Network
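In a process network, each actor runs as its own process and talks to its neighbours only through FIFO channels, blocking when a channel is empty. A minimal sketch of that execution model using Python threads and `queue.Queue`; the stages are hypothetical stand-ins for the "list files", "read", and "parse" steps above, and this is not Kepler's implementation.

```python
import threading
import queue

END = object()  # sentinel token marking the end of a stream

def source(out_q, items):
    """Emit each item, then the end-of-stream marker."""
    for item in items:
        out_q.put(item)
    out_q.put(END)

def transform(in_q, out_q, fn):
    """Apply fn to every token; get() blocks until a token is available."""
    while True:
        token = in_q.get()
        if token is END:
            out_q.put(END)  # propagate end-of-stream downstream
            return
        out_q.put(fn(token))

def sink(in_q, results):
    """Collect tokens until the end-of-stream marker arrives."""
    while True:
        token = in_q.get()
        if token is END:
            return
        results.append(token)

def run_pipeline(items):
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()  # unbounded FIFO channels
    results = []
    threads = [
        threading.Thread(target=source, args=(q1, items)),
        threading.Thread(target=transform, args=(q1, q2, lambda name: f"contents of {name}")),  # "read" stand-in
        threading.Thread(target=transform, args=(q2, q3, str.upper)),  # "parse" stand-in
        threading.Thread(target=sink, args=(q3, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_pipeline(["a.nex", "b.nex"]))
# ['CONTENTS OF A.NEX', 'CONTENTS OF B.NEX']
```

Because every actor runs concurrently and only blocks on its own input, later stages can start on the first file before earlier stages have finished the second.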
9
SMS motivation
  • Scientific Workflow Life-cycle
  • Resource Discovery
  • discover relevant datasets
  • discover relevant actors or workflow templates
  • Workflow Design and Configuration
  • data → actor (data binding)
  • data → data (data integration / merging /
    interlinking)
  • actor → actor (actor / workflow
    composition)
  • Challenge: do all this in the presence of
  • 100s of workflows and templates
  • 1000s of actors (e.g., actors for web services,
    data analytics, ...)
  • 10,000s of datasets
  • 1,000,000s of data items
  • highly complex, heterogeneous data

Price to pay for these resources: lots.
Scientists' time wasted: priceless!
10
Approach: SMS capabilities
  • Ontologies
  • Iterative Development
  • Semantic Annotation
  • Resource Discovery
  • Workflow Validation
  • Resource Integration
  • Workflow Elaboration
11
Approach: SMS capabilities (Ontologies)
  • SEEK KR group is developing OWL-DL ontologies
  • Various workflow-component ontologies (for
    categorizing by function, project, scientific
    discipline, ...)
  • Scientific observation ontology (OBOE), an upper
    ontology for defining and relating observations,
    measurements, and units
  • Domain-specific ontologies that extend OBOE
    (standard and derived units, ecology and
    biodiversity concepts, ...)

12
Approach: SMS capabilities (Semantic Annotation)
  • Annotations connect resources to ontologies
  • Conceptually describe a resource and/or its data
    schema
  • Annotations provide the means for ontology-based
    discovery, integration, ...

13
Hybrid types: semantic and structural typing
14
Semantic Type Annotation in Kepler
  • Component input and output port annotation
  • Each port can be annotated with multiple classes
    from multiple ontologies
  • Annotations are stored within the component
    metadata
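One way to picture this port metadata: each port holds a set of class references, each qualified by the ontology it comes from, so a single port can carry several classes from several ontologies. A minimal sketch; the actor, ontology, and class names are hypothetical, and this is not Kepler's actual metadata format.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ClassRef:
    ontology: str  # hypothetical ontology identifier
    cls: str       # class name within that ontology

@dataclass
class Port:
    name: str
    direction: str                                    # "input" or "output"
    semantic_types: set = field(default_factory=set)  # set of ClassRef

@dataclass
class Component:
    name: str
    ports: list = field(default_factory=list)

    def annotate(self, port_name, ontology, cls):
        """Attach one more ontology class to the named port."""
        for p in self.ports:
            if p.name == port_name:
                p.semantic_types.add(ClassRef(ontology, cls))
                return
        raise KeyError(port_name)

actor = Component("BiomassCalculator", [Port("in", "input"), Port("out", "output")])
actor.annotate("in", "ecology", "SpeciesAbundance")  # two classes from two ontologies
actor.annotate("in", "units", "CountPerArea")
actor.annotate("out", "ecology", "Biomass")
print(sorted(r.cls for r in actor.ports[0].semantic_types))
# ['CountPerArea', 'SpeciesAbundance']
```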

15
Component Annotation and Indexing
  • Component Annotations
  • New components can be annotated and indexed into
    the component library (e.g., specializing generic
    actors)
  • Existing components can also be revised,
    annotated, and indexed (hiding previous versions)
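The "hiding previous versions" behaviour can be sketched as a library that keeps every revision of a component but lets searches see only the latest one. A minimal sketch with hypothetical names; it is not the Kepler component library API, and matching here is exact rather than ontology-aware.

```python
class ComponentLibrary:
    """Hypothetical component index: every revision is kept, only the latest is searchable."""

    def __init__(self):
        self._versions = {}  # component id -> list of annotation sets, oldest first

    def index(self, comp_id, annotations):
        """Add a new component, or a new revision of an existing one; returns the revision number."""
        revisions = self._versions.setdefault(comp_id, [])
        revisions.append(frozenset(annotations))
        return len(revisions)

    def history(self, comp_id):
        """All stored revisions, including the hidden older ones."""
        return list(self._versions[comp_id])

    def search(self, concept):
        """Ids whose current (latest) revision is annotated with the given concept."""
        return sorted(cid for cid, revs in self._versions.items() if concept in revs[-1])

lib = ComponentLibrary()
lib.index("RegressionActor", {"DataAnalysis"})
lib.index("RegressionActor", {"StatisticalAnalysis", "LinearRegression"})  # revised annotation
lib.index("MapActor", {"Visualization"})
print(lib.search("LinearRegression"))  # ['RegressionActor']
print(lib.search("DataAnalysis"))      # [] (only the hidden first revision had it)
```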

16
Approach: SMS capabilities (Resource Discovery)
  • Ontology-based smart search
  • Find components by semantic types
  • Find components by input/output semantic types
  • Ontology-based query rewriting for
    discovery/integration
  • Joint work with the GEON project (see SSDBM-04,
    SWDB-04)
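A simple form of ontology-based query rewriting expands a query for one concept into a union of queries over that concept and all of its subclasses, so data annotated with a more specific class is still found. A minimal sketch over a hypothetical hierarchy and hypothetical dataset annotations; it shows only subclass expansion, not the full technique of the cited papers.

```python
from collections import deque

# Hypothetical subclass hierarchy: class -> direct subclasses
SUBCLASSES = {
    "Measurement": ["Abundance", "Biomass"],
    "Abundance": ["Count", "Density"],
}

def descendants(concept):
    """The concept plus all transitive subclasses, in breadth-first order."""
    seen, todo = [concept], deque([concept])
    while todo:
        for sub in SUBCLASSES.get(todo.popleft(), []):
            if sub not in seen:
                seen.append(sub)
                todo.append(sub)
    return seen

def rewrite(concept):
    """Rewrite a single-concept query into a union of exact-match queries."""
    return [("annotated_with", c) for c in descendants(concept)]

# Hypothetical dataset annotations: dataset name -> annotated class
DATASETS = {"plots2003": "Count", "transects": "Density", "harvest": "Biomass"}

def answer(concept):
    wanted = {c for _, c in rewrite(concept)}
    return sorted(name for name, cls in DATASETS.items() if cls in wanted)

print(rewrite("Abundance"))
# [('annotated_with', 'Abundance'), ('annotated_with', 'Count'), ('annotated_with', 'Density')]
print(answer("Abundance"))  # ['plots2003', 'transects']
```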

17
Smart Search
  • Find a component (here, an actor) in different
    locations (categories)
  • based on the semantic annotation of the
    component (or its ports)
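One way to make the same actor appear in several categories is to treat ontology classes as categories and list each component under every class its annotations belong to, including all superclasses. A minimal sketch, assuming a hypothetical single-parent hierarchy for simplicity (OWL itself allows multiple superclasses) and hypothetical component names.

```python
# Hypothetical category hierarchy: child class -> parent class (None at the root)
PARENT = {
    "LinearRegression": "StatisticalAnalysis",
    "StatisticalAnalysis": "DataAnalysis",
    "Visualization": "DataAnalysis",
    "DataAnalysis": None,
}

def ancestors(cls):
    """The class itself followed by its superclasses up to the root."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT.get(cls)
    return chain

def build_category_index(annotations):
    """Map each category to the set of components that fall under it."""
    index = {}
    for component, classes in annotations.items():
        for c in classes:
            for cat in ancestors(c):
                index.setdefault(cat, set()).add(component)
    return index

# Hypothetical component annotations (several classes per component allowed)
components = {
    "RegressionPlotter": ["LinearRegression", "Visualization"],
    "TreeDrawer": ["Visualization"],
}
idx = build_category_index(components)
print(sorted(idx["StatisticalAnalysis"]))  # ['RegressionPlotter']
print(sorted(idx["Visualization"]))        # ['RegressionPlotter', 'TreeDrawer']
```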

18
Searching in context
  • Search for components with compatible
    input/output semantic types
  • searches over actor library
  • applies subsumption checking on port annotations
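The subsumption check behind such a search can be sketched as follows. The compatibility rule is an assumption for illustration: a candidate actor fits after an output port if every class required by the candidate's input port is equal to, or a superclass of, some class on that output. The hierarchy and actor names are hypothetical.

```python
# Hypothetical hierarchy: class -> set of direct superclasses (multiple inheritance allowed)
SUPER = {
    "SpeciesCount": {"Abundance"},
    "SpeciesDensity": {"Abundance"},
    "Abundance": {"Measurement"},
    "Biomass": {"Measurement"},
    "Measurement": set(),
}

def subsumed_by(sub, sup):
    """True if sub == sup or sup is a transitive superclass of sub (assumes no cycles)."""
    if sub == sup:
        return True
    return any(subsumed_by(parent, sup) for parent in SUPER.get(sub, ()))

def compatible(out_types, in_types):
    """Each required input class must subsume some offered output class.
    An input with no annotations accepts anything (all() of nothing is True)."""
    return all(any(subsumed_by(o, i) for o in out_types) for i in in_types)

# Hypothetical actor library: actor name -> semantic types of its input port
LIBRARY = {
    "AbundanceMapper": {"Abundance"},
    "BiomassEstimator": {"Biomass"},
    "GenericStats": {"Measurement"},
}

def find_consumers(out_types):
    return sorted(name for name, ins in LIBRARY.items() if compatible(out_types, ins))

print(find_consumers({"SpeciesCount"}))  # ['AbundanceMapper', 'GenericStats']
```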

19
Approach: SMS capabilities (Workflow Validation)
  • Workflow validation and analysis
  • Check that workflows are semantically and
    structurally well-typed
  • Infer semantic type annotations of derived data
    (i.e., type inference)
  • An initial approach and prototype based on
    mapping composition (see QLQP-05)
  • User-oriented provenance
  • Collect and query data lineage of WF runs (see
    IPAW-06)

20
Workflow validation in Kepler
  • Statically perform semantic and structural type
    checking
  • Navigate errors and warnings within the workflow
  • Search for and insert adapters to fix
    (structural and semantic) errors
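A static semantic check of this kind can be sketched by walking every connection and comparing the source port's type with the target port's type. In this sketch (all names and the classification rule are assumptions), a subclass match passes, a source type that is more general than the target yields a warning, unrelated types are an error, and for each error the library is searched for an adapter whose input accepts the source type and whose output satisfies the target. Structural type checking is omitted.

```python
# Hypothetical hierarchy: class -> set of direct superclasses
SUPER = {
    "SpeciesCount": {"Abundance"},
    "Abundance": {"Measurement"},
    "Biomass": {"Measurement"},
    "Measurement": set(),
}

def subsumed_by(sub, sup):
    """True if sub == sup or sup is a transitive superclass of sub."""
    if sub == sup:
        return True
    return any(subsumed_by(p, sup) for p in SUPER.get(sub, ()))

def check_link(src_type, dst_type):
    """Classify one connection: 'ok', 'warning' (source is more general), or 'error'."""
    if subsumed_by(src_type, dst_type):
        return "ok"
    if subsumed_by(dst_type, src_type):
        return "warning"  # the data might fit, but it is not guaranteed
    return "error"

# Hypothetical adapter actors: name -> (input type, output type)
ADAPTERS = {"CountToBiomass": ("SpeciesCount", "Biomass")}

def suggest_adapters(src_type, dst_type):
    return sorted(name for name, (a_in, a_out) in ADAPTERS.items()
                  if subsumed_by(src_type, a_in) and subsumed_by(a_out, dst_type))

def validate(links):
    """links: list of (link id, source port type, target port type)."""
    report = []
    for link_id, src, dst in links:
        status = check_link(src, dst)
        fixes = suggest_adapters(src, dst) if status == "error" else []
        report.append((link_id, status, fixes))
    return report

workflow = [
    ("reader->mapper", "SpeciesCount", "Abundance"),
    ("stats->mapper", "Measurement", "Abundance"),
    ("reader->biomass", "SpeciesCount", "Biomass"),
]
for row in validate(workflow):
    print(row)
# ('reader->mapper', 'ok', [])
# ('stats->mapper', 'warning', [])
# ('reader->biomass', 'error', ['CountToBiomass'])
```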

21
Approach: SMS capabilities (Resource Integration)
  • Integrating and transforming data
  • Merge (smart union) datasets
  • Find mappings between data schemas for
    transformation
  • data binding, component connections (see DILS-04)

22
Smart (Data) Integration: Merge
  • discover data of interest
  • connect to merge actor
  • compute merge
  • align attributes via annotations
  • open dialog for user refinement
  • store merge mapping in MoML
  • enjoy your merged dataset!
  • almost: it can be much more complicated

23
Under the hood of Smart Merge
  • Exploits semantic type annotations and ontology
    definitions to find mappings between sources
  • Executing the merge actor results in an
    integrated data product (via outer union)

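[Figure: the Merge actor combines two sources with attributes a1, a3, a4, a6, a8; the output shows combined attributes a1a8 and a3a6.]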
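The outer-union step can be sketched as follows: each source's annotations map its own column names onto shared concepts, and every row of every source is emitted under the union of those concepts, with a null wherever a source has no matching column. A minimal sketch with hypothetical source data and annotations; it does not reproduce the Kepler merge actor or its stored mapping.

```python
def smart_merge(sources):
    """
    sources: list of (rows, annotation) pairs, where rows is a list of dicts keyed
    by the source's own column names and annotation maps column name -> concept.
    Returns (concepts, merged rows) computed as an outer union over concepts.
    """
    concepts = []
    for _, annotation in sources:
        for concept in annotation.values():
            if concept not in concepts:
                concepts.append(concept)  # keep first-seen order

    merged = []
    for rows, annotation in sources:
        for row in rows:
            out = dict.fromkeys(concepts)  # every concept starts as None
            for column, value in row.items():
                if column in annotation:   # unannotated columns are dropped
                    out[annotation[column]] = value
            merged.append(out)
    return concepts, merged

# Two hypothetical sources whose taxon columns have different names
site_a = ([{"spp": "Quercus", "n": 4}], {"spp": "Taxon", "n": "Count"})
site_b = ([{"species": "Pinus", "plot_area": 10.0}], {"species": "Taxon", "plot_area": "Area"})
concepts, rows = smart_merge([site_a, site_b])
print(concepts)  # ['Taxon', 'Count', 'Area']
print(rows)
# [{'Taxon': 'Quercus', 'Count': 4, 'Area': None}, {'Taxon': 'Pinus', 'Count': None, 'Area': 10.0}]
```

Here `spp` and `species` end up in one column because both are annotated with the same concept, which is the kind of attribute alignment the annotations are meant to drive.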
24
Approach: SMS capabilities (Workflow Elaboration)
  • Workflow design support
  • (Semi-) automatically combine resource
    discovery, integration, and validation
  • Abstract → executable WF
  • ongoing work!

Automated SWF Refinement
25
Summary
  • Outlook
  • Ontologies and semantic annotations for WF design
    and reuse
  • Put ontologies to actual use in Kepler
  • Continue to develop Kepler tools for annotation
    (KR observation ontology), discovery,
    integration, design, ...
  • Issues and Challenges
  • Tools/approaches for ontology (OWL) management,
    organization, and reasoning
  • Open source (distributed) ontology (OWL) storage
    and reasoning
  • Tools and techniques for robust ontology
    versioning and extension
  • Acknowledgements
  • Timothy McPhillips, Dave Thau (UC Davis)
  • Mark Schildhauer, Josh Madin, Matt Jones (UCSB)
  • Deana Pennington (UNM)
  • Rich Williams (Microsoft Research)
  • Ferdinando Villa, Sergey Krivov (UVM)