Title: The Role of Concepts in myGrid
1The Role of Concepts in myGrid
- Professor Carole Goble
- the myGrid consortium
- http://www.mygrid.org.uk
2Roadmap
- Context
- Workflows, repository, registry and provenance
- Concept services
- Using concepts
- Discovering workflows and services
- Workflow composition support
- Discovering and linking experimental components
- Linking provenance logs
- Remarks
3myGrid
- EPSRC UK e-Science pilot project
- Open Source Upper Middleware for Bioinformatics
- Knowledge-driven Middleware for data intensive in silico experiments in biology
- (Web) Service-based architecture -> OGSA Grid services
- Targeted at Tool Developers, Bioinformaticians and Service Providers
- http://www.mygrid.org.uk
Newcastle
Sheffield
Manchester
Nottingham
Hinxton
Southampton
4Data intensive bioinformatics
5Graves' Disease: Autoimmune disease of the thyroid
6Services and toolkit
7Workflows as in silico experiments
- Freefluo workflow enactment engine
- WSFL
- Scufl
- Workflow discovery
- Finding workflows that others have done, and that I have done myself
- Workflow creation
- Finding classes of services
- Guiding service composition
- We don't do automated composition
- Dynamic workflow enactment: service discovery and invocation
- Choose service instances when running the workflow
- User involvement
8FreeFluo and Taverna environments
- Freefluo workflow enactment engine
- WSFL
- Scufl
- Taverna development environment
9Investigation = set of experiments + metadata
- Experimental design components
- workflow specifications; query specifications; notes describing objectives; applications; databases; relevant papers; the web pages of important workers, ...
- Experimental instances that are records of enacted experiments
- data results; a history of services invoked by a workflow engine; instances of services invoked; parameters set for an application; notes commenting on the results
- Experimental glue that groups and links design and instance components
- a query and its results; a workflow linked with its outcome; links between a workflow and its previous and subsequent versions; a group of all these things linked to a document discussing the conclusions of the biologist
- Life Science IDs (LSIDs), URIs
- RDF-based annotations
- DAML+OIL -> OWL ontologies
10Bio in silico experiments service types
- Making in silico experiments
- workflow
- distributed database query processing.
- Managing experimental outcomes
- information management
- managing metadata
- Scientific method
- provenance management
- change notification
- personalisation
- Sharing experiments
- semantic services for discovering services and workflows, and managing metadata
- third party service registries and federated personalised views over those registries
- ontologies and ontology management
- The base services and tools that will constitute the experiments
- third party services such as databases, computational analyses, simulations
- specialised services such as AMBIT text extraction
11Experiment life cycle
Diagram: the experiment life cycle, with these activity groups.
- Personalisation: personalised registries; personalised workflows; info repository views; personalised annotations; personalised metadata; security
- Forming experiments: resource and service discovery; repository creation; workflow creation; database query formation
- Discovering and reusing experiments and resources: workflow discovery and refinement; resource and service discovery; repository creation; provenance
- Executing experiments: workflow enactment; distributed query processing; job execution; provenance generation; single sign-on authentication; event notification
- Providing services and experiments: service registration; workflow deposition; metadata annotation; third party registration
- Managing experiments: information repository; metadata management; provenance management; workflow evolution; event notification
12Sharing info ? Sharing meaning
- Metadata
- Data describing the content and meaning of resources and services.
- But everyone must speak the same language
- Terminologies
- Shared and common vocabularies
- For search engines, agents, curators, authors and users
- But everyone must mean the same thing
- Ontologies
- Shared and common understanding of a domain
- Essential for search, exchange and discovery
- A common vocabulary of terms
- Some specification of the meaning of the terms
- A shared understanding for people and machines
13myGrid Service Stack
Diagram labels (layered stack, top to bottom):
- External Applications
- Applications
- e-Science experimental management
- Semantic Grid metadata capabilities
- Core services
- Data management
- High level services for data intensive integration
- Web Service Grid communication fabric, OGSA, OGSI
- External (Web/Grid) Services
- External services
14myGrid Service Stack
Diagram labels (layered stack, top to bottom):
- Applications: Work bench; Taverna workflow environment; Talisman application; Web Portal
- Gateway
- Personalisation; Service and Workflow Discovery; Registries; Provenance mgt; Event Notification; Ontology Mgt; Ontologies; Metadata Mgt
- Core services: myGrid Information Repository; FreeFluo Workflow enactment engine; OGSA Distributed Query Processor
- Web Service Grid communication fabric
- External services: AMBIT Text Extraction Service; Bio Services; Soaplab; SRS; EMBOSS
15W3C Ontology and Metadata languages
- OWL (and DAML+OIL)
- The Web Ontology Language OWL
- Family of languages: OWL Lite, OWL DL, OWL Full
- OWL DL (successor to DAML+OIL)
- Expressive language for describing concepts, relationships, constraints and axioms
- Sound, complete and efficient reasoning over expressions to infer relationships between concepts rather than assert them (including the hierarchy).
- OWL is a W3C Candidate Recommendation.
- RDF
- Resource Description Framework
- W3C language for describing metadata on the Web
- Triples (subject, predicate, object) forming graphs
- Associate URIs (LSIDs) with other URIs (LSIDs)
- Associate URIs with OWL concepts (which are URIs)
- RDQL
- Triple store RDF implementations (e.g. Jena)
- http://www.w3.org/RDF
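The triple model above can be sketched in a few lines of plain Python. This is only an illustration, not Jena or RDQL: the LSIDs, predicate names and concept URIs are made-up placeholders, and `match` is a hypothetical helper that mimics a single triple pattern with `None` as a wildcard.

```python
# Minimal in-memory triple store sketch (illustrative only; not Jena or RDQL).
# Every URI and predicate below is a hypothetical placeholder.
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

TRIPLES: Set[Triple] = {
    # Associate one LSID with another LSID.
    ("urn:lsid:example.org:data:1", "ex:derivedFrom", "urn:lsid:example.org:data:0"),
    # Associate LSIDs with OWL concepts, which are themselves URIs.
    ("urn:lsid:example.org:data:1", "rdf:type", "ex:ProteinSequence"),
    ("urn:lsid:example.org:data:0", "rdf:type", "ex:NucleotideSequence"),
}


def match(
    store: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in store:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


# All statements that attach a concept to a resource.
typed = sorted(match(TRIPLES, p="rdf:type"))
# What was data:1 derived from?
sources = [o for (_, _, o) in match(TRIPLES, s="urn:lsid:example.org:data:1", p="ex:derivedFrom")]
```

A real deployment would hand the same patterns to a triple store and query language rather than scanning a set in memory.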
16Concept services: Ontology Services
- Ontology server for concept expressions
- Ontology development environments
- OilEd
- FaCT reasoner for inferring over concept expressions
- Imprecise matchmaking for best effort substitutability
- Reasoning over descriptions
- Generating classification structures
- Matchmaker and ranking for matching concept expressions
- Instance store for indexing instances of concept expressions in registries and databases
Diagram labels: Match maker; Ontology Server; Reasoner (FaCT); Indexer (Instance Store); KAON; mIR; Registry; Annotation Manager; Jena Toolkit; RDF; Annotation Browser
17Concept services: Annotation services
- RDF repositories
- Jena Toolkit
- RDF query languages: RDQL
- myGrid Information Repository
- Version 1: Relational (DB2)
- Version 2: Federated architecture
- Browsers for annotating objects and viewing annotations
- Automated tools for marking up objects with annotations
Diagram labels: Match maker; Ontology Server; Reasoner (FaCT); Indexer (Instance Store); KAON; mIR; Registry; Annotation Manager; Jena Toolkit; RDF; Annotation Browser
18myGrid Information Repository
- Stores experimental components
- Workflow specs as XML Scufl docs
- Data
- XML notes
- Types
- XML docs
- Relational
- RDF (like)
- Every entry has Dublin Core provenance attributes
- Every entry can have (multiple) OWL concept expressions
- Multiple mIRs
19Registries
- Publishes experimental components: services, workflows and (distributed query plans in the future?)
- Multiple 3rd party registries
- Multiple 3rd party metadata
- Luc Moreau will talk more about this.
Diagram labels: Views; UDDI; RDF; Service descriptions; Third party description; Workflow; Service; Org. registry; Public registry; ws-Info Docs; publishes; Scufl; WSDL
20Provenance logging and reusing
- FreeFluo provides a detailed provenance record stored in the mIR describing what was done, with what services and when
- XML document
- Every mIR object has (Dublin Core) provenance properties
21Using Concepts
- Controlled vocabulary for advertisements for workflows and services
- Indexes into registries and mIR
- Semantic discovery of services and workflows
- Semantic discovery of repository entries
- Type management for composition
- Semantic workflow construction guidance and validation
- Navigation paths between data and knowledge holdings
- Semantic glue between repository entries
- Semantic annotation and linking of workflow provenance logs
22Tiered specification of single step in an experiment design
- Classes of services: Domain semantic; Unexecutable; Potentials
- Instances of services: Business operational; Executable; Actuals
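The two tiers can be sketched as a pair of Python types: a step in the design names a class of service, and an instance is chosen only at enactment time. The names `ServiceClass`, `ServiceInstance` and `resolve`, the fields and the endpoints are all hypothetical, chosen for illustration. The automatic "first available" choice is a simplification of a step where the user may be involved.

```python
# Sketch of a two-tier step specification (all names and URLs are hypothetical).
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceClass:
    """Domain/semantic tier: what kind of step this is. Not executable."""
    task: str
    input_concept: str
    output_concept: str


@dataclass(frozen=True)
class ServiceInstance:
    """Business/operational tier: a concrete endpoint that can be invoked."""
    name: str
    endpoint: str
    implements: ServiceClass
    available: bool = True


def resolve(step: ServiceClass, registry: List[ServiceInstance]) -> Optional[ServiceInstance]:
    """Pick the first available instance of the requested class, if any."""
    for instance in registry:
        if instance.implements == step and instance.available:
            return instance
    return None


ALIGN = ServiceClass("sequence_alignment", "protein_sequence", "alignment_report")
REGISTRY = [
    ServiceInstance("blast_site_a", "http://example.org/a/blast", ALIGN, available=False),
    ServiceInstance("blast_site_b", "http://example.org/b/blast", ALIGN),
]

# The design mentions only ALIGN; the actual service is bound when the workflow runs.
chosen = resolve(ALIGN, REGISTRY)
```

Keeping the design at the class tier means the same workflow can be rerun against a different instance without editing the design.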
23Stratified metadata
- Service Type and Class (OWL)
24Ontology Suite
Properties (inspired by DAML-S): parameters (input, output, precondition, effect); performs_task; uses-resource; is_function_of
Ontologies in the suite:
- Upper level ontology
- Publishing ontology
- Informatics ontology
- Molecular biology ontology
- Organisation ontology
- Task ontology
- Bioinformatics ontology
- Web service ontology
25Semantic discovery: services and workflows
- Services and workflows in registry have RDF and OWL descriptions
- Selection by the types of inputs they use, outputs they produce, the bioinformatics tasks they perform
- Querying using RDQL over RDF UDDI registry for operational metadata
- Matching using FaCT OWL classification for concept-based metadata
A registry browser
A workflow wizard
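Concept-based selection of this kind can be sketched as a subsumption test over a concept hierarchy. The sketch below is an assumption-laden toy: the hierarchy is asserted by hand in a dictionary, whereas a reasoner such as FaCT would infer it from concept expressions, and the concept names, adverts and the `discover` function are all hypothetical.

```python
# Toy concept-based service discovery (hand-asserted hierarchy; hypothetical names).
from typing import Dict, List, NamedTuple, Optional

# child concept -> parent concept
PARENT: Dict[str, Optional[str]] = {
    "thing": None,
    "sequence": "thing",
    "protein_sequence": "sequence",
    "nucleotide_sequence": "sequence",
    "task": "thing",
    "alignment": "task",
    "pairwise_alignment": "alignment",
    "translation": "task",
}


def subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its ancestors."""
    node: Optional[str] = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


class Advert(NamedTuple):
    name: str
    accepts: str   # concept of the input the service uses
    performs: str  # concept of the task the service performs


def discover(adverts: List[Advert], have: str, want_task: str) -> List[str]:
    """Services that accept the data we have and perform a kind of the wanted task."""
    return [
        advert.name
        for advert in adverts
        if subsumes(advert.accepts, have) and subsumes(want_task, advert.performs)
    ]


ADVERTS = [
    Advert("align_any", "sequence", "pairwise_alignment"),
    Advert("align_protein", "protein_sequence", "alignment"),
    Advert("translate", "nucleotide_sequence", "translation"),
]

found = discover(ADVERTS, have="protein_sequence", want_task="alignment")
```

Operational metadata (who runs the service, where, at what cost) would be filtered separately, for example with a query over the registry's RDF.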
26Workflow creation, resolution and enactment
27Find Component
Diagram labels:
- Client Service Browser; Find Service
- Discovery modes: Word-based discovery; Semantic discovery; Structural discovery; Syntactic discovery
- Registry side: Views; UDDI; RDF; Service descriptions; workflow descriptions; Third party description; Org. registry; Public registry
- Publishing side: Workflow (Scufl); Service (WSDL); mIR; publishes
28Workflow construction
- Outputs and inputs of chained services are compatible
- OWL Concept
- XSD Type
- Data Format
- Workflows are constructed in collaboration with the scientist
- No automated workflow creation
- Find service being embedded into Taverna by end October, like the Geodise approach (http://sourceforge.net/projects/taverna)
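The three levels of compatibility can be sketched as a check between an output port and the input port it feeds. The `Port` type, its field names and the sample values are hypothetical. The concept test here is plain equality for brevity, where a subsumption test against the ontology could be used instead.

```python
# Sketch of a link check between chained services (hypothetical types and values).
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Port:
    concept: str      # OWL concept, e.g. what the data means
    xsd_type: str     # XSD type, e.g. how it is typed on the wire
    data_format: str  # data format, e.g. how the payload is laid out


def mismatches(output: Port, input_: Port) -> List[str]:
    """Return a description of each level at which the two ports disagree."""
    problems: List[str] = []
    if output.concept != input_.concept:
        problems.append(f"concept: {output.concept} vs {input_.concept}")
    if output.xsd_type != input_.xsd_type:
        problems.append(f"XSD type: {output.xsd_type} vs {input_.xsd_type}")
    if output.data_format != input_.data_format:
        problems.append(f"format: {output.data_format} vs {input_.data_format}")
    return problems


blast_in = Port("protein_sequence", "xsd:string", "FASTA")
fetch_out = Port("protein_sequence", "xsd:string", "FASTA")
embl_out = Port("nucleotide_sequence", "xsd:string", "EMBL")

ok = mismatches(fetch_out, blast_in)      # compatible at all three levels
broken = mismatches(embl_out, blast_in)   # same XSD type, different concept and format
```

Reporting the mismatches rather than a bare yes/no fits guided construction: the scientist sees why a link is rejected and decides what to do.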
29Matrix of metadata in workflow lifecycle
30Linking objects to objects via concepts
Workflows
Provenance record of workflow runs
Notes
People
Data holdings
Services
31Linking objects to objects via URIs and LSIDs
People to notify of the workflow status
Provenance of the workflow template. Related
workflows.
Ontologies describing workflows
32Quacks like a graph and waddles like a graph RDF
Workflows that could use or generate this data
People who have registered an interest in this data
Related Data holding
Provenance of the data holdings
Ontologies describing data
33A Semantic Web of Science Hendler02
Workflows they wrote or used
People they collaborate with
34Conceptual Hypermedia COHSE
http://cohse.semanticweb.org
35Manual annotation
- Generic ontology organizations and users
- Bioinformatics ontology
- Biology ontology
36Annotating and linking provenance logs
Concept
organization
Target links
- Generic ontology organizations and users
- Bioinformatics ontology
- Biology ontology
37Linking logs together based on concepts
white blood cell
neutrophil
lymphocyte
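Linking logs through concepts can be sketched as an index from each concept, and each of its broader concepts, to the logs annotated with it. The log identifiers and the `BROADER` table are hypothetical; the sketch assumes that neutrophil and lymphocyte are both kinds of white blood cell, as the example terms suggest.

```python
# Sketch: link provenance logs through shared or broader concepts (hypothetical data).
from collections import defaultdict
from typing import Dict, List, Optional, Set

# concept -> broader concept
BROADER: Dict[str, Optional[str]] = {
    "white blood cell": None,
    "neutrophil": "white blood cell",
    "lymphocyte": "white blood cell",
}

# log id -> concepts the log has been annotated with
LOG_ANNOTATIONS: Dict[str, Set[str]] = {
    "log-001": {"neutrophil"},
    "log-002": {"lymphocyte"},
    "log-003": {"white blood cell"},
}


def self_and_broader(concept: str) -> List[str]:
    """The concept followed by each of its broader concepts, nearest first."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = BROADER.get(node)
    return chain


def build_index(annotations: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map every concept to the logs annotated with it or with a narrower concept."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for log_id, concepts in annotations.items():
        for concept in concepts:
            for term in self_and_broader(concept):
                index[term].add(log_id)
    return index


INDEX = build_index(LOG_ANNOTATIONS)
linked = sorted(INDEX["white blood cell"])  # logs reachable from the broad concept
```

Following the broad concept reaches all three logs, while the narrow concept "neutrophil" reaches only the log annotated with it.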
38Generated hypertext of logs
39Generating workflow provenance
- 3 files: Xscufl (workflow specification), ws-info files, input parameters
- Provenance: startTime, endTime, service instances invoked
- Output: results, metadata
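A provenance record with these fields can be sketched as a small data class serialised to XML. The element and attribute names are invented for illustration and are not the FreeFluo log schema. The file name, times and endpoint are placeholders.

```python
# Sketch of a provenance record written as XML (hypothetical schema and values).
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ProvenanceRecord:
    workflow_spec: str                  # name of the workflow specification file
    start_time: datetime
    end_time: datetime
    services_invoked: List[str] = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialise the record; one child element per invoked service instance."""
        root = ET.Element("provenance", workflow=self.workflow_spec)
        ET.SubElement(root, "startTime").text = self.start_time.isoformat()
        ET.SubElement(root, "endTime").text = self.end_time.isoformat()
        for endpoint in self.services_invoked:
            ET.SubElement(root, "serviceInstance").text = endpoint
        return ET.tostring(root, encoding="unicode")


record = ProvenanceRecord(
    workflow_spec="example.xscufl",
    start_time=datetime(2003, 10, 1, 9, 0, 0),
    end_time=datetime(2003, 10, 1, 9, 5, 0),
    services_invoked=["http://example.org/blast"],
)
xml_text = record.to_xml()
```

Because the record is a plain XML document, it can be stored in a repository alongside the workflow specification and later annotated with concepts.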
40Reflections: semantic Grid services
- Adverts for services and workflows turn out to be jolly tricky
- Describing different executable objects
- Workflows and Services
- Stratification of metadata
- Classes and Instances of services and workflows
- Service execution
- Complex state based invocation models
- Parametric polymorphism of services
- Executable process models vs discovery process models
- Multi-dimensions of service composition
- Multiple descriptions, multiple interfaces
- Users' needs vs machine needs
- The dimensions of Service Class substitution
- Biologists choose experimentally meaningful services and do not want semantically similar substitutions, only substituting one instance for another
- Experimentally neutral glue services that can be substituted are comparatively few
- If users are choosing services you don't need many kinds of metadata to eliminate 90% of options
41Human vs machine views
Diagram axes: Human vs Machine; Service User vs Service provider
Diagram labels:
- Weak semantic descriptions; Rewriting views
- UDDI style advertisements
- Syntactic descriptions; Interface descriptions; Invocation descriptions; Semantic mining
- Elaborate Semantic descriptions; Simplification views
42Reflections: annotations
- Annotation metadata model for myGrid holdings is a graph
- If it waddles like RDF and quacks like RDF, it's RDF
- Experiments in RDF scalability
- Co-existence of RDF and other data models (relational)
- Acquisition of annotations and adverts
- Automated by mining WSDL docs, mining ws-info docs
- Deep annotation works ok for bioinformatic service concepts ("it's an EMBL record") but
- Annotating with biologically meaningful concepts is harder
- Data in the mIR ("it's a lymphocyte")
- Manual annotation cost is high!
- Service/workflow publication tools
- Dealing with change
- Ontology changes, service changes, annotations change.
43Acknowledgements
Luc Moreau, Simon Miles, Keith Decker, Terry Payne, Phil Lord, Chris Wroe, Robert Stevens, Kevin Garwood, Jun Zhao
http://www.mygrid.org.uk/
44Relevant Publications
- P Lord, C Wroe, R Stevens, CA Goble, S Miles, L Moreau, K Decker, T Payne, J Papay. Semantic and Personalised Service Discovery. In Proceedings IEEE/WIC International Conference on Web Intelligence / Intelligent Agent Technology Workshop on "Knowledge Grid and Grid Intelligence", October 13, 2003, Halifax, Canada.
- S Miles, J Papay, V Dialani, M Luck, K Decker, T Payne, L Moreau. Personalised Grid Service Discovery. 19th Annual UK Performance Engineering Workshop (UKPEW'03), Warwick, UK, July 2003, pp. 131-140, ISBN 0-9541000-2-6.
- J Zhao, CA Goble, M Greenwood, C Wroe, R Stevens. Annotating, linking and browsing provenance logs for e-Science. In 1st Semantic Web Conference (ISWC2003) Workshop on Retrieval of Scientific Data, Florida, USA, October 2003.
- C Wroe, R.D. Stevens, CA Goble, A Roberts, M Greenwood. A suite of DAML+OIL ontologies to describe bioinformatics web services and data. International Journal of Cooperative Information Systems, Special issue on Bioinformatics and Biological Data Management, 12(2):197-224, 2003.
- S Bechhofer, L Carr, CA Goble, S Kampa, T Miles-Board. The Semantics of Semantic Annotation. ODBASE: First International Conference on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems, Irvine, California. Springer-Verlag LNCS Vol. 2519, pp. 1151-1167, 2002.
- CA Goble, S Pettifer, R Stevens and C Greenhalgh. Knowledge Integration: In silico Experiments in Bioinformatics. In The Grid: Blueprint for a New Computing Infrastructure, Second Edition, eds. Ian Foster and Carl Kesselman, 2003, Morgan Kaufmann, in press.
- C Wroe, CA Goble, M Greenwood, P Lord, S Miles, L Moreau, J Papay, T Payne. Experiment automation using semantic data on a bioinformatics Grid. IEEE Intelligent Systems, in review.