The Role of Concepts in myGrid

1
The Role of Concepts in myGrid  
  • Professor Carole Goble
  • the myGrid consortium
  • http://www.mygrid.org.uk

2
Roadmap
  • Context
  • Workflows, repository, registry and provenance
  • Concept services
  • Using concepts
  • Discovering workflows and services
  • Workflow composition support
  • Discovering and linking experimental components
  • Linking provenance logs
  • Remarks

3
myGrid
  • EPSRC UK e-Science pilot project
  • Open Source Upper Middleware for Bioinformatics
  • Knowledge-driven Middleware for data intensive in
    silico experiments in biology
  • (Web) Service-based architecture -> OGSA Grid
    services
  • Targeted at Tool Developers, Bioinformaticians
    and Service Providers
  • http://www.mygrid.org.uk

Newcastle
Sheffield
Manchester
Nottingham
Hinxton
Southampton
4
Data intensive bioinformatics

5
Graves' Disease: Autoimmune disease of the thyroid

6
Services and toolkit
7
Workflows as in silico experiments
  • Freefluo workflow enactment engine
  • WSFL
  • Scufl
  • Workflow discovery
  • Finding workflows that others have done, and that
    I have done myself
  • Workflow creation
  • Finding classes of services
  • Guiding service composition
  • We don't do automated composition
  • Dynamic workflow enactment: service discovery and
    invocation
  • Choose service instances when running the workflow
  • User involvement

8
FreeFluo and Taverna environments
  • Freefluo workflow enactment engine
  • WSFL
  • Scufl
  • Taverna development environment

9
Investigation = set of experiments + metadata
  • Experimental design components
  • workflow specifications; query specifications;
    notes describing objectives; applications;
    databases; relevant papers; the web pages of
    important workers
  • Experimental instances that are records of
    enacted experiments
  • data; results; a history of services invoked by a
    workflow engine; instances of services invoked;
    parameters set for an application; notes
    commenting on the results
  • Experimental glue that groups and links design
    and instance components
  • a query and its results; a workflow linked with
    its outcome; links between a workflow and its
    previous and subsequent versions; a group of all
    these things linked to a document discussing the
    conclusions of the biologist
  • Life Science IDs (LSIDs) and URIs
  • RDF-based annotations
  • DAML+OIL -> OWL ontologies

10
Bio in silico experiments service types
  • Making in silico experiments
  • workflow
  • distributed database query processing.
  • Managing experimental outcomes
  • information management
  • managing metadata
  • Scientific method
  • provenance management
  • change notification
  • personalisation
  • Sharing experiments
  • semantic services for discovering services and
    workflows, and managing metadata
  • third party service registries and federated
    personalised views over those registries,
  • ontologies and ontology management.
  • The base services and tools that will constitute
    the experiments
  • third party services such as databases,
    computational analyses, simulations.
  • specialised services such as AMBIT text
    extraction.

11
Experiment life cycle
[Life cycle diagram; recoverable labels follow]
  • Phases: Forming experiments; Personalisation;
    Discovering and reusing experiments and resources;
    Executing experiments; Providing services and
    experiments; Managing experiments
  • Personalised registries; Personalised workflows;
    Info repository views; Personalised annotations;
    Personalised metadata; Security
  • Resource and service discovery; Repository
    creation; Workflow creation; Database query
    formation
  • Workflow discovery and refinement; Resource and
    service discovery; Repository creation; Provenance
  • Workflow enactment; Distributed Query processing;
    Job execution; Provenance generation; Single
    sign-on authentication; Event notification
  • Service registration; Workflow deposition;
    Metadata Annotation; Third party registration
  • Information repository; Metadata management;
    Provenance management; Workflow evolution; Event
    notification

12
Sharing info ? Sharing meaning
  • Metadata
  • Data describing the content and meaning of
    resources and services.
  • But everyone must speak the same language
  • Terminologies
  • Shared and common vocabularies
  • For search engines, agents, curators, authors and
    users
  • But everyone must mean the same thing
  • Ontologies
  • Shared and common understanding of a domain
  • Essential for search, exchange and discovery
  • A common vocabulary of terms
  • Some specification of the meaning of the terms
  • A shared understanding for people and machines

13
myGrid Service Stack
[Layer diagram; labels listed top to bottom]
  • Applications: External Applications
  • Core services: e-Science experimental management;
    Semantic Grid metadata capabilities
  • Data management: High level services for data
    intensive integration
  • Web Service Grid communication fabric, OGSA, OGSI
  • External services: External (Web/Grid) Services

14
myGrid Service Stack
[Layer diagram; labels listed top to bottom]
  • Applications: Work bench; Taverna workflow
    environment; Talisman application; Web Portal
  • Gateway
  • Core services: Personalisation; Service and
    Workflow Discovery; Registries; Provenance mgt;
    Event Notification; Ontology Mgt; Ontologies;
    Metadata Mgt
  • myGrid Information Repository; FreeFluo Workflow
    enactment engine; OGSA Distributed Query Processor
  • Web Service Grid communication fabric
  • External services: AMBIT Text Extraction Service;
    Bio Services; Soaplab; SRS; EMBOSS

15
W3C Ontology and Metadata languages
  • OWL (and DAML+OIL)
  • The Web Ontology Language OWL
  • Family of languages: OWL Lite, OWL DL, OWL Full
  • OWL DL is roughly equivalent to DAML+OIL
  • Expressive language for describing concepts,
    relationships, constraints and axioms
  • Sound and complete, and efficient, reasoning over
    expressions to infer relationships between
    concepts rather than assert them (including the
    hierarchy).
  • OWL is a W3C Candidate Recommendation.
  • RDF
  • Resource Description Framework
  • W3C language for describing metadata on the Web
  • Triples (subject, predicate, object) forming
    graphs
  • Associate URIs (LSIDs) with other URIs (LSIDs)
  • Associate URIs with OWL concepts (which are URIs)
  • RDQL
  • Triple store RDF implementations (e.g. Jena)
  • http://www.w3.org/RDF
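
A minimal sketch of the triple model described on this slide, assuming plain Python tuples in place of a real triple store such as Jena. The LSID-style identifiers, the `ex:` predicate names and the `match` helper are hypothetical, and the wildcard lookup only gestures at what an RDQL query would do.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical identifiers and predicates, for illustration only.
TRIPLES: List[Triple] = [
    ("urn:lsid:example.org:workflow:1", "ex:produced", "urn:lsid:example.org:data:7"),
    ("urn:lsid:example.org:data:7", "ex:hasConcept", "onto:EMBL_record"),
    ("urn:lsid:example.org:workflow:1", "ex:hasConcept", "onto:sequence_alignment"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return every triple matching the pattern; None is a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Which resources carry a concept annotation?
annotated = [t[0] for t in match(TRIPLES, p="ex:hasConcept")]
```

Because subjects and objects are both identifiers, the same lookup serves for linking one LSID to another and for linking an LSID to an ontology concept.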

16
Concept services: Ontology Services
  • Ontology server for concept expressions
  • Ontology development environments
  • OilEd
  • FaCT reasoner for inferring over concept
    expressions
  • Imprecise matchmaking for best effort
    substitutability
  • Reasoning over descriptions
  • Generating classification structures
  • Matchmaker and ranking for matching concept
    expressions
  • Instance store for indexing instances of concept
    expressions in registries and databases
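
The idea of imprecise, best-effort matchmaking over a classification can be sketched as follows. This is a toy illustration, not the FaCT-based matchmaker: the concept names and the `rank` function are hypothetical, and the is-a links are asserted by hand here, whereas a description logic reasoner would infer them from concept expressions.

```python
from typing import Dict, List, Optional

# Hypothetical, hand-asserted classification (child -> parent).
PARENTS: Dict[str, Optional[str]] = {
    "blast_service": "alignment_service",
    "alignment_service": "bioinformatics_service",
    "text_extraction_service": "bioinformatics_service",
    "bioinformatics_service": None,
}


def ancestors(concept: str) -> List[str]:
    """Return the strict ancestors of a concept, nearest first."""
    out = []
    parent = PARENTS.get(concept)
    while parent is not None:
        out.append(parent)
        parent = PARENTS.get(parent)
    return out


def subsumes(general: str, specific: str) -> bool:
    """True if `general` is `specific` or one of its ancestors."""
    return general == specific or general in ancestors(specific)


def rank(request: str, adverts: Dict[str, str]) -> List[str]:
    """Order adverts: exact match, then more specific, then more general."""
    scored = []
    for name, concept in adverts.items():
        if concept == request:
            score = 0
        elif subsumes(request, concept):
            score = 1  # advert specialises the request
        elif subsumes(concept, request):
            score = 2  # advert is more general: best-effort substitute
        else:
            continue   # unrelated concept, not offered
        scored.append((score, name))
    return [name for _, name in sorted(scored)]
```

For example, a request for `alignment_service` would rank an exact advert first, a `blast_service` advert second and a generic `bioinformatics_service` advert last, while an unrelated text extraction advert is dropped.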

[Architecture diagram. Components: Match maker;
Ontology Server; Reasoner (FaCT); Indexer (Instance
Store); KAON; mIR; Registry; Annotation Manager; Jena
Toolkit (RDF); Annotation Browser]
17
Concept services: Annotation services
  • RDF repositories
  • Jena Toolkit
  • RDF query languages: RDQL
  • myGrid Information Repository
  • Version 1: Relational (DB2)
  • Version 2: Federated architecture
  • Browsers for annotating objects and viewing
    annotations
  • Automated tools for marking up objects with
    annotations

[Architecture diagram repeated from slide 16]
18
myGrid Information Repository
  • Stores experimental components
  • Workflow specs as XML Scufl docs
  • Data
  • XML notes
  • Types
  • XML docs
  • Relational
  • RDF (like)
  • Every entry has Dublin Core provenance attributes
  • Every entry can have (multiple) OWL concept
    expressions
  • Multiple mIRs
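
One way to picture a repository entry is as a record carrying Dublin Core attributes plus zero or more concept annotations. The sketch below assumes that shape; the `MirEntry` class, its field names, the identifiers and the concept names are all hypothetical and are not the mIR schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MirEntry:
    lsid: str                      # identifier of the entry
    kind: str                      # e.g. "workflow", "data", "note"
    content: str                   # XML document or other payload
    dublin_core: Dict[str, str] = field(default_factory=dict)
    concepts: List[str] = field(default_factory=list)


def entries_with_concept(entries: List[MirEntry], concept: str) -> List[MirEntry]:
    """Use the concept annotations as an index into the repository."""
    return [e for e in entries if concept in e.concepts]


# Hypothetical entries, for illustration only.
ENTRIES = [
    MirEntry("urn:lsid:example.org:workflow:1", "workflow", "<scufl/>",
             {"creator": "a.biologist", "date": "2003-10-01"},
             ["onto:sequence_alignment"]),
    MirEntry("urn:lsid:example.org:data:7", "data", "<result/>",
             {"creator": "a.biologist", "date": "2003-10-02"},
             ["onto:EMBL_record", "onto:lymphocyte"]),
]
```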

19
Registries
  • Publishes experimental components: services,
    workflows and (in the future?) distributed query
    plans
  • Multiple 3rd party registries
  • Multiple 3rd party metadata
  • Luc Moreau will talk more about this.

[Registry architecture diagram. Labels: Views; UDDI;
RDF; Third party description; Service descriptions;
Workflow; Service; Org. registry; Public registry;
ws-Info Docs; publishes; Scufl; WSDL]
20
Provenance logging and reusing
  • FreeFluo provides a detailed provenance record
    stored in the mIR describing what was done, with
    what services and when
  • XML document
  • Every mIR object has (Dublin Core) provenance
    properties

21
Using Concepts
  • Controlled vocabulary for advertisements for
    workflows and services
  • Indexes into registries and mIR
  • Semantic discovery of services and workflows
  • Semantic discovery of repository entries
  • Type management for composition
  • Semantic workflow construction guidance and
    validation
  • Navigation paths between data and knowledge
    holdings
  • Semantic glue between repository entries
  • Semantic annotation and linking of workflow
    provenance logs

22
Tiered specification of single step in an
experiment design
  • Classes of services: Domain semantic;
    Unexecutable; Potentials
  • Instances of services: Business operational;
    Executable; Actuals

23
Stratified metadata
  • Service Type and Class (OWL)
  • Service Instance (RDF)
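
The split between service classes and service instances can be sketched as a late-binding step: a workflow step names a class, and an executable instance is picked only at enactment time. The registry contents, URLs and the `resolve` function below are hypothetical; `choose` stands in for the user's decision.

```python
from typing import Callable, Dict, List

# Hypothetical registry: service class (concept level) -> runnable instances.
INSTANCES: Dict[str, List[str]] = {
    "sequence_alignment_service": [
        "http://example.org/services/blast-a",
        "http://example.org/services/blast-b",
    ],
}


def resolve(service_class: str, choose: Callable[[List[str]], str]) -> str:
    """Pick one executable instance for an abstract service class."""
    candidates = INSTANCES.get(service_class, [])
    if not candidates:
        raise LookupError(f"no instance registered for {service_class!r}")
    # The choice is delegated: in an interactive setting this would be the user.
    return choose(candidates)


# Example policy: take the first registered instance.
endpoint = resolve("sequence_alignment_service", lambda options: options[0])
```

Keeping the choice in a callback reflects the point made elsewhere in the talk that substitution between instances is acceptable, while substitution between classes is a decision for the scientist.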

24
Ontology Suite
[Ontology suite diagram; recoverable labels follow]
  • Inspired by DAML-S
  • Properties: parameters (input, output,
    precondition, effect); performs_task;
    uses-resource; is_function_of
  • Ontologies: Upper level ontology; Publishing
    ontology; Informatics ontology; Molecular biology
    ontology; Organisation ontology; Task ontology;
    Bioinformatics ontology; Web service ontology
25
Semantic discovery: services and workflows
  • Services and workflows in registry have RDF and
    OWL descriptions
  • Selection by the types of inputs they use,
    outputs they produce, the bioinformatics tasks
    they perform
  • Querying using RDQL over RDF UDDI registry for
    operational metadata
  • Matching using FaCT OWL classification for
    concept-based metadata
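
Selection by input, output and task can be sketched as a filter over adverts. This is an illustration under simplifying assumptions: the `ServiceAdvert` fields, the concept names and the `discover` function are hypothetical, matching is by exact equality rather than by classification, and the `provider` field stands in for operational metadata that the slide describes as queried separately.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceAdvert:
    name: str
    input_concept: str
    output_concept: str
    task: str
    provider: str  # operational metadata


# Hypothetical adverts, for illustration only.
ADVERTS = [
    ServiceAdvert("blast-a", "protein_sequence", "alignment_report", "aligning", "site-1"),
    ServiceAdvert("blast-b", "protein_sequence", "alignment_report", "aligning", "site-2"),
    ServiceAdvert("fetch", "accession_number", "protein_sequence", "retrieving", "site-1"),
]


def discover(adverts: List[ServiceAdvert],
             input_concept: Optional[str] = None,
             output_concept: Optional[str] = None,
             task: Optional[str] = None,
             provider: Optional[str] = None) -> List[str]:
    """Return names of adverts that satisfy every criterion that was given."""
    def ok(wanted: Optional[str], actual: str) -> bool:
        return wanted is None or wanted == actual

    return [
        a.name for a in adverts
        if ok(input_concept, a.input_concept)
        and ok(output_concept, a.output_concept)
        and ok(task, a.task)
        and ok(provider, a.provider)
    ]
```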

A registry browser
A workflow wizard
26
Workflow creation, resolution and enactment
27
Find Component
[Service discovery diagram. Labels: Client Service
Browser; Find Service; Word-based discovery; Semantic
discovery; Structural discovery; Syntactic discovery;
Views; UDDI; RDF; Third party description; Service
descriptions; workflow descriptions; Workflow;
Service; mIR; Org. registry; Public registry;
publishes; Scufl; WSDL]
28
Workflow construction
  • Outputs and inputs of chained services are
    compatible
  • OWL Concept
  • XSD Type
  • Data Format
  • Workflows are constructed in collaboration with
    the scientist
  • No automated workflow creation
  • Find Service is being embedded into Taverna by the
    end of October, like the Geodise approach

(http://sourceforge.net/projects/taverna)
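
The three levels at which chained services must agree (OWL concept, XSD type, data format) can be sketched as a check that reports which level fails, so that it guides the scientist rather than composing anything automatically. The `Port` class, the `mismatches` function and the concept names are hypothetical, and equality is used where a real check might allow subsumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Port:
    concept: str      # semantic type, e.g. an OWL concept name
    xsd_type: str     # syntactic type, e.g. "xsd:string"
    data_format: str  # representation, e.g. "FASTA"


def mismatches(output: Port, input_: Port) -> List[str]:
    """List the levels at which an output cannot feed an input."""
    problems = []
    if output.concept != input_.concept:
        problems.append("concept")
    if output.xsd_type != input_.xsd_type:
        problems.append("xsd_type")
    if output.data_format != input_.data_format:
        problems.append("data_format")
    return problems


# Same concept and XSD type, but a different data format.
producer = Port("protein_sequence", "xsd:string", "FASTA")
consumer = Port("protein_sequence", "xsd:string", "EMBL")
report = mismatches(producer, consumer)
```

Reporting the failing level separately is useful because a format mismatch can often be bridged by a conversion service, whereas a concept mismatch usually means the link is scientifically wrong.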
29
Matrix of metadata in workflow lifecycle
30
Linking objects to objects via concepts
Workflows
Provenance record of workflow runs
Notes
People
Data holdings
Services
31
Linking objects to objects via URIs and LSIDs
People to notify of the workflow status
Provenance of the workflow template. Related
workflows.
Ontologies describing workflows
32
Quacks like a graph and waddles like a graph: RDF
Workflows that could use or generate this data
People who have registered an interest in this
data
Related Data holding
Provenance of the data holdings
Ontologies describing data
33
A Semantic Web of Science Hendler02
Workflows they wrote or used
People they collaborate with
34
Conceptual Hypermedia COHSE
http://cohse.semanticweb.org
35
Manual annotation
  • Generic ontology: organizations and users
  • Bioinformatics ontology
  • Biology ontology

36
Annotating and linking provenance logs
Concept
organization
Target links
  • Generic ontology: organizations and users
  • Bioinformatics ontology
  • Biology ontology

37
Linking logs together based on concepts
white blood cell
neutrophil
lymphocyte
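
Linking logs through concepts can be sketched with the three concepts shown on this slide: neutrophils and lymphocytes are both white blood cells, so logs annotated with either can be linked through the shared parent. The log identifiers and the `link_logs` function are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Is-a links between the concepts on the slide (child -> parent).
IS_A: Dict[str, str] = {
    "neutrophil": "white blood cell",
    "lymphocyte": "white blood cell",
}

# Hypothetical provenance logs and their concept annotations.
LOG_ANNOTATIONS: Dict[str, Set[str]] = {
    "log-1": {"neutrophil"},
    "log-2": {"lymphocyte"},
    "log-3": {"white blood cell"},
}


def expand(concepts: Set[str]) -> Set[str]:
    """Return the concepts together with all of their ancestors."""
    out = set()
    for concept in concepts:
        current = concept
        while current is not None:
            out.add(current)
            current = IS_A.get(current)
    return out


def link_logs(annotations: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Link every pair of logs that share a concept, directly or via a parent."""
    links = {}
    for a, b in combinations(sorted(annotations), 2):
        shared = expand(annotations[a]) & expand(annotations[b])
        if shared:
            links[(a, b)] = shared
    return links


LINKS = link_logs(LOG_ANNOTATIONS)
```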
38
Generated hypertext of logs
39
Generating workflow provenance
  • 3 files: XScufl (workflow specification), ws-info
    files, input parameters
  • Provenance: startTime, endTime, service instances
    invoked
  • Output: results, metadata
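
A provenance record of this kind can be sketched as an XML document listing each invoked service instance with its start and end time. The element and attribute layout, the file name and the service URL below are hypothetical and are not the FreeFluo log format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Dict, List


def provenance_xml(workflow_file: str, invocations: List[Dict]) -> str:
    """Serialise one workflow run as a small XML provenance document."""
    root = ET.Element("provenance", {"workflow": workflow_file})
    for inv in invocations:
        ET.SubElement(root, "invocation", {
            "service": inv["service"],
            "startTime": inv["start"].isoformat(),
            "endTime": inv["end"].isoformat(),
        })
    return ET.tostring(root, encoding="unicode")


# Hypothetical run with a single service invocation.
RUN = [{
    "service": "http://example.org/services/blast-a",
    "start": datetime(2003, 10, 1, 9, 0, tzinfo=timezone.utc),
    "end": datetime(2003, 10, 1, 9, 5, tzinfo=timezone.utc),
}]
LOG = provenance_xml("example.xscufl", RUN)
```
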

40
Reflections: semantic Grid services
  • Adverts for services and workflows turn out to
    be jolly tricky
  • Describing different executable objects
  • Workflows and Services
  • Stratification of metadata
  • Classes and Instances of services and workflows
  • Service execution
  • Complex state based invocation models
  • Parametric polymorphism of services
  • Executable process models vs discovery process
    models
  • Multi-dimensions of service composition.
  • Multiple descriptions, multiple interfaces
  • Users' needs vs machine needs
  • The dimensions of Service Class substitution
  • Biologists choose experimentally meaningful
    services and do not want semantically similar
    substitutions, only substituting one instance for
    another
  • Experimentally neutral glue services that can
    be substituted are comparatively few
  • If users are choosing services you don't need
    many kinds of metadata to eliminate 90% of options

41
Human vs machine views
[Diagram contrasting Human and Machine views for the
Service User and the Service provider. Labels: Weak
semantic descriptions; Rewriting views; UDDI style
advertisements; Syntactic descriptions; Interface
descriptions; Invocation descriptions; Semantic
mining; Elaborate Semantic descriptions;
Simplification views]
42
Reflections: annotations
  • Annotation metadata model for myGrid holdings is
    a graph
  • If it waddles like RDF and quacks like RDF, it's
    RDF
  • Experiments in RDF scalability
  • Co-existence of RDF and other data models
    (relational)
  • Acquisition of annotations and adverts
  • Automated by mining WSDL docs, mining ws-info
    docs
  • Deep annotation works OK for bioinformatics
    service concepts (it's an EMBL record), but
  • Annotating with biologically meaningful concepts
    is harder
  • Data in the mIR (it's a lymphocyte)
  • Manual annotation cost is high!
  • Service/workflow publication tools
  • Dealing with change
  • Ontology changes, service changes, annotations
    change.

43
Acknowledgements
Luc Moreau, Simon Miles, Keith Decker, Terry Payne,
Phil Lord, Chris Wroe, Robert Stevens, Kevin Garwood,
Jun Zhao
http://www.mygrid.org.uk/
44
Relevant Publications
  • P Lord, C Wroe, R Stevens, CA Goble, S Miles, L
    Moreau, K Decker, T Payne, J Papay. Semantic and
    Personalised Service Discovery. In Proceedings
    IEEE/WIC International Conference on Web
    Intelligence / Intelligent Agent Technology
    Workshop on "Knowledge Grid and Grid
    Intelligence", October 13, 2003, Halifax, Canada.
  • S Miles, J Papay, V Dialani, M Luck, K Decker, T
    Payne, L Moreau. Personalised Grid Service
    Discovery. 19th Annual UK Performance Engineering
    Workshop (UKPEW'03), Warwick, UK, July 2003,
    pp. 131-140, ISBN 0-9541000-2-6.
  • J Zhao, CA Goble, M Greenwood, C Wroe, R Stevens.
    Annotating, linking and browsing provenance logs
    for e-Science. In 1st Semantic Web Conference
    (ISWC2003) Workshop on Retrieval of Scientific
    Data, Florida, USA, October 2003.
  • C Wroe, RD Stevens, CA Goble, A Roberts, M
    Greenwood. A suite of DAML+OIL ontologies to
    describe bioinformatics web services and data.
    International Journal of Cooperative Information
    Systems, Special issue on Bioinformatics and
    Biological Data Management, 12(2):197-224, 2003.
  • S Bechhofer, L Carr, CA Goble, S Kampa, T
    Miles-Board. The Semantics of Semantic Annotation.
    ODBASE First International Conference on
    Ontologies, Databases, and Applications of
    Semantics for Large Scale Information Systems,
    Irvine, California. Springer-Verlag LNCS Vol.
    2519, pp. 1151-1167, 2002.
  • CA Goble, S Pettifer, R Stevens and C Greenhalgh.
    Knowledge Integration: In silico Experiments in
    Bioinformatics. In The Grid: Blueprint for a New
    Computing Infrastructure, Second Edition, eds. Ian
    Foster and Carl Kesselman, 2003, Morgan Kaufmann,
    in press.
  • C Wroe, CA Goble, M Greenwood, P Lord, S Miles, L
    Moreau, J Papay, T Payne. Experiment automation
    using semantic data on a bioinformatics Grid.
    IEEE Intelligent Systems, in review.