Title: Part 4: Pioneers and Examples Carole Goble
1Part 4: Pioneers and Examples, Carole Goble
2Specific services: application-specific controlled vocabularies (Grid Applications)
Standard services: semantic data integration, service discovery, workflow enactment and composition, provenance, portals (Open Grid Service Architecture "Tupperware": upper services)
Standard services: resource selection, matching and brokering (Open Grid Service Architecture "Underware": plumbing services)
Standard interfaces and behaviours for distributed systems: naming, service state, lifetime management, notification, registry management (Web Service Resource Framework, Web Service-Notification, WS-I)
Standard mechanisms for describing and invoking services: WSDL, SOAP, WS-Security etc. (Web Services)
3Where SW technologies are being used in e-Science
CombeChem, myGrid, CMCS, Hero
Composing, validating and repairing workflows and
service compositions negotiations
Describing and linking provenance records
GriPhyN (Pegasus), GRIP
Matching and provisioning
Pegasus, Geodise, myGrid, Kepler, CAT-S
Resource, service, workflow, data
set, registration, description discovery
Notification topics
myGrid, Geodise, SEEK
Controlled vocabularies for metadata and data
Schema mediation
Knowledge-based guidance and recommendation
MIAKT, AstroGrid
SEEK, GEON, BIRN Artemis
Geodise
4Publication and Discovery
- Promote sharing and (re)use
- Services, resources and workflows require a semantic-driven description
- Semantics is key to negotiation, discovery and workflow composition
- If you can't describe what you want, you can't have it.
- If you can't describe what you've got, no-one can find it or use it.
- To (re)use components we need a way of describing what they do, a place to put descriptions and a way of searching for them
Annotator
Domain User
5Background
- Semantic Web Services
- DARPA Agent Markup Language
- DAML-S http://www.daml.org
- Semantic Web Services Initiative
- OWL-S http://www.swsi.org
- EU FP6 SDK Cluster
- Web Service Modelling Ontology
- http://www.wsmo.org/
- Purpose: automated service discovery and composition suitable for agent-based frameworks.
6http://www.mygrid.org.uk
Intelligent engineering design search and optimisation for fluid dynamics. Matlab based.
http://www.geodise.org
7Publication and Discovery
Ontologies
Classification
Knowledge consuming application Workflow
construction
Raw function/service/workflow
Discovery consumption mechanisms
Function/Service/ Workflow Annotation
Registration
Publishing
Retrieval
Function/Service/Workflow Repository
8In silico biology http://www.mygrid.org.uk
- Construct in silico experiments, find and adapt others, manage the experiment lifecycle
- Tupperware
- Workflows and DQP
- Semantic registries
- Knowledge-based provenance and metadata management
- Event notification
- Taverna workflow workbench
Middleware for data intensive in silico biology by bioinformaticians
9Discovery in Taverna Workflow Workbench
- User chooses services or workflows.
- A common ontology (in OWL) is used to annotate and query any myGrid object, including services.
- Discover workflows and services described in the registry via Taverna.
- Finding workflows that accept an input of semantic type nucleotide sequence requires annotating data as well as service inputs.
10Discovery in Taverna Workflow Workbench
- Drag a workflow entry into the explorer pane and the workflow loads.
- Drag a service/workflow to the scavenger window for inclusion into the workflow
11Information Model
- Components form a loosely coupled system
- An Information Model for e-Science experiments, based on the CCLRC scientific metadata model
- XML messages between services conform to the model
- Life Science Identifiers (URNs) uniquely identify all myGrid experimental objects (workflows, workflow templates, data, data sets etc.)
Domain specific knowledge model
Domain neutral in silico experiment data model
XML
http://cvs.mygrid.org.uk/cgi-bin/viewcvs.cgi/mygrid/MIR/model/
12 Model of services
operation: name, description, input, output, task, method, resource, application
service: name, description, author, organisation
input
parameter: name, description, semantic type, format, transport type, collection type, collection format
output
workflow
WSDL operation
WSDL service
Soaplab service
bioMoby service
13Service Ontology Suite
parameters: input, output, precondition, effect
performs_task, uses-resource, is_function_of
Upper level ontology
Inspired by DAML-S
Publishing ontology
Informatics ontology
Molecular biology ontology
Organisation ontology
Task ontology
Bioinformatics ontology
Web service ontology
Current work: joint development on an Open Biological Ontologies BioService Ontology.
http://obo.sourceforge.net/
14A Blast Description
- Service Name: Blast
- Operation: execute
- task: pairwise_local_aligning
- resource: EMBL
- application: blastn
- Parameter
- Input
- Name: accession
- semantic type: EMBL Nucleotide sequence id
- transport data type: string
- Output
- Name: Result
- semantic type: sequence alignment report
- transport data type: string
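The Blast description above can be read as a small nested record. A minimal sketch in Python follows; the class and field names (ServiceDescription, Operation, Parameter) are hypothetical and simply mirror the labels on the slide, not the actual myGrid schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    semantic_type: str   # ontology concept: what the data means
    transport_type: str  # wire-level type: how the data is shipped


@dataclass
class Operation:
    name: str
    task: str
    resource: str
    application: str
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)


@dataclass
class ServiceDescription:
    name: str
    operations: List[Operation] = field(default_factory=list)


# The values below are the ones listed on the slide.
blast = ServiceDescription(
    name="Blast",
    operations=[
        Operation(
            name="execute",
            task="pairwise_local_aligning",
            resource="EMBL",
            application="blastn",
            inputs=[Parameter("accession", "EMBL Nucleotide sequence id", "string")],
            outputs=[Parameter("Result", "sequence alignment report", "string")],
        )
    ],
)
```

Note that both parameters share the transport type string; only the semantic type distinguishes an accession from an alignment report, which is why discovery needs the semantic annotation.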
15Discovery
Service Providers
Ontology Store
Ontologists
Others
Vocabulary
WSDL
Feta Semantic Discovery
Soaplab
Bioinformaticians
Registry
Taverna Workbench
Registry (Personalised View)
Registry
Registry
Workflow Execution
FreeFluo WfEE
invoking
mIR
Store data metadata
16Feta Example
- Domain dependent query
- Find a workflow or service that performs nucleotide sequence alignment
- performs task: aligning or more specific
- accepts input: nucleotide sequence or more general
Biological data > Bio Sequence data > Nucleotide sequence data, Protein sequence data, ...
Task > Aligning > Local aligning > Pairwise local aligning
Task > Aligning > Global aligning, ...
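The query above ("task aligning or more specific", "input nucleotide sequence or more general") is a subsumption test over two concept hierarchies. A minimal sketch follows, assuming the parent links as read off the slide's diagram and a hypothetical three-entry registry; it is not Feta's implementation, which works over the OWL ontology.

```python
# Child -> parent links, as read off the two hierarchies on the slide (an assumption).
PARENT = {
    "Bio Sequence data": "Biological data",
    "Nucleotide sequence data": "Bio Sequence data",
    "Protein sequence data": "Bio Sequence data",
    "Aligning": "Task",
    "Local aligning": "Aligning",
    "Global aligning": "Aligning",
    "Pairwise local aligning": "Local aligning",
}


def ancestors_or_self(concept):
    """Return the concept followed by all of its more general concepts."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its ancestors."""
    return general in ancestors_or_self(specific)


# Hypothetical registry entries: (name, task performed, input semantic type).
REGISTRY = [
    ("blastn_service", "Pairwise local aligning", "Nucleotide sequence data"),
    ("generic_aligner", "Aligning", "Bio Sequence data"),
    ("protein_global_aligner", "Global aligning", "Protein sequence data"),
]


def discover(task, input_type):
    """Find entries whose task is `task` or more specific and whose
    declared input is `input_type` or more general."""
    return [
        name
        for name, entry_task, entry_input in REGISTRY
        if subsumes(task, entry_task) and subsumes(entry_input, input_type)
    ]


matches = discover("Aligning", "Nucleotide sequence data")
```

The two tests run in opposite directions: the task may be narrower than what was asked for, while the accepted input may be broader, so a service taking any bio sequence still qualifies for nucleotide data.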
17Feta Semantic Discovery
18Publication
Service Providers
Ontologists
Others
Ontology Store
Description extraction
WSDL
Interface Description
Vocabulary
Soaplab
Pedro Annotation tool
Annotation providers
Annotation/ description
Taverna Workbench
Registry (Personalised View)
Registry
Registry plug-in
Registry
19Pedro Data Entry Tool
20Annotating Anything
Ontologists
Ontology Store
Vocabulary
Haystack Provenance Browser
Pedro Annotation tool
Annotation providers
Annotation/ description
Scientists
Taverna Workbench
myGrid Information Repository
Store plug-in
Metadata store RDF Jena
Data Store RDBMS mySQL
21Stratified metadata
- Service Type and Class (OWL)
22Describing workflows
24Application of Semantic Grid for engineers
- VERTICAL: advice on workflow assembly
- Semantic matching
- Contextual advice
- HORIZONTAL: advice on component configuration
- Low level, at the semantic level
- What needs to be filled out (for a valid configuration)?
- High level, at the knowledge level
- Filled out with what? Why? Suggest a suitable value (for best configuration/usage)
- Integration
- GUI mode: Workflow Composer Environment (WCE)
- Text mode: workflow editor (Domain Script Editor)
25Semantic Metadata Management System
26System Deployment
27Knowledge and Application Integration Architecture
Workflow Construction Environment
Semantic driven
Decision-Tree
Workflow Advisor
Workflow Wizard
Function/Workflow Manager
Archive Manager
Ontology Manager
Semantic Queries
Semantic Annotation
Database Archiving
Semantic Archiving
Function Archive
Workflow- Template Archive
Workflow Archive
Geodise Ontologies
28Publication Function Annotator
- Customised for Matlab functions
- Automatic parsing of Matlab function source
- Instantiating concepts defined in ontology
- Semi-automatic filling of the ontology driven
forms
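The "automatic parsing of Matlab function source" step can be illustrated with a small signature parser whose result pre-fills part of an annotation form. This is a sketch in Python under stated assumptions: the regular expression covers only the common forms of a Matlab function line, the dictionary keys are hypothetical form fields, and the example function is invented for illustration. It is not the Geodise annotator.

```python
import re

# Handles: function [o1, o2] = name(i1, i2) | function o = name(i) | function name(i)
SIGNATURE = re.compile(
    r"^\s*function\s+"
    r"(?:(?:\[(?P<outs>[^\]]*)\]|(?P<out>\w+))\s*=\s*)?"
    r"(?P<name>\w+)\s*"
    r"(?:\((?P<ins>[^)]*)\))?"
)


def split_names(text):
    """Split a comma- or space-separated argument list into names."""
    return [token for token in re.split(r"[,\s]+", text.strip()) if token]


def parse_signature(source):
    """Return name, inputs and outputs from the first function line, or None."""
    for line in source.splitlines():
        found = SIGNATURE.match(line)
        if found:
            outs = found.group("outs") or found.group("out") or ""
            ins = found.group("ins") or ""
            return {
                "name": found.group("name"),
                "inputs": split_names(ins),
                "outputs": split_names(outs),
            }
    return None


# Hypothetical Matlab source, for illustration only.
example = """% Build a mesh from a geometry description
function [mesh, quality] = make_mesh(geometry, density)
mesh = [];
"""

form = parse_signature(example)
```

The parser can only recover syntax (names and arity). Choosing the ontology concept for each argument remains the annotator's job, which is why the slide calls the form filling semi-automatic.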
29Advice on Function Assembly (Integrated in Matlab Knowledge Toolbox)
- Goal
- Function assembly
- What can be deployed next and before?
- Mechanism
- Matlab -> Java -> WSDL -> Web service
- Function semantic interface
- Semantic matching
- Pre-requirements
- Function has been annotated
- Semantics available in the instance store
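The question "what can be deployed next and before?" reduces to matching the semantic types on function interfaces. A minimal sketch follows, assuming exact type matching (a real advisor would presumably also use ontology subsumption) and a hypothetical instance store of three annotated functions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FunctionInterface:
    name: str
    inputs: Tuple[str, ...]   # semantic types consumed
    outputs: Tuple[str, ...]  # semantic types produced


# Hypothetical annotated functions; names and types are illustrative only.
STORE = [
    FunctionInterface("create_geometry", (), ("Geometry",)),
    FunctionInterface("generate_mesh", ("Geometry",), ("Mesh",)),
    FunctionInterface("run_solver", ("Mesh",), ("FlowSolution",)),
]


def can_follow(producer, consumer):
    """True if some output type of `producer` is an input type of `consumer`."""
    return any(t in consumer.inputs for t in producer.outputs)


def advise(current, store):
    """Return (functions usable before `current`, functions usable after it)."""
    before: List[str] = [
        f.name for f in store if f is not current and can_follow(f, current)
    ]
    after: List[str] = [
        f.name for f in store if f is not current and can_follow(current, f)
    ]
    return before, after


before, after = advise(STORE[1], STORE)
```

Both pre-requirements on the slide show up here: a function that has not been annotated has no interface record, and without the store there is nothing to match against.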
30Advice on Function Assembly (Integrated in Domain Script Editor)
Domain script editing area
Ontology and semantics
Function configuration advice
Function assembly advice
31Advice on Function Configuration
% get the default beam structure
beam = createBeamStruct
% (4) analyze the OMETH and advise on its additional control parameters (with default values)
beamcontrol = gdk_options(beam)
% check semantics
gdk_semantics(GD_NPOP)
% further configure these control parameters
% run options
s = OptionsMatlab(beamcontrol)
32Advice on Function Assembly (Integrated in WCE workflow advisor)
- Contextual advice
- Workflow composition via interface semantic matching
- Function configuration via semantic annotation decomposition
- Semantics-based function workflow discovery
- Exploring new components in workflows
- Intelligent workflow monitoring based on
provenance data
Select a function and request advice
Function assembly advice
33Non-invisible function discovery
34Towards Service Orientated Paradigm
35When to reason?
Ontologies
Classification
Knowledge consuming application Workflow
construction
Raw function/service/workflow
Discovery consumption mechanisms
Function/Service/ Workflow Annotation
Registration
Publishing
Retrieval
Function/Service/Workflow Repository
38Simplifying interfaces
- Creating and maintaining the ontology
- Generating Concrete nodes (semantic instances)
- Instantiating abstract nodes defined in ontology
- Filling ontology driven forms with semantic
content
39Remarks and Reflections
- Who is doing the discovering? Is it automated or
manual?
User
Human manual
Machine automatic
Provider
UDDI style advertisements
Weak semantic descriptions Rewriting and
expansion Geodise Ontoview, Pedro tool
Human manual
Syntactic descriptions Semantic mining myGrid
Feta load tool Pegasus, Cardiff
Elaborate Semantic descriptions Simplification
views Geodise Ontoview
Machine auto
40Reflections
- Eager vs late reasoning
- If people are selecting then you need just enough semantics for a shortlist
- Ontology invisibility
- Painless publication
- The rise of the specialist annotator
- Describing for reuse is challenging
- Reuse depends on costly semantic descriptions
- Describing for someone else's benefit
- Reuse by multiple stakeholders
- Metadata pays off but it needs a network effect and there is a cost.
41So far, Using Concepts
- Controlled vocabulary for advertisements for workflows and services
- Indexes into registries and mIR
- Semantic discovery of services and workflows
- Semantic discovery of repository entries
- Type management for composition
- Semantic workflow construction guidance and validation
- Navigation paths between data and knowledge holdings
- Semantic glue between repository entries
- Semantic annotation and linking of workflow provenance logs
42Provenance
- Experiments being performed repeatedly, at
different sites, different times, by different
users or groups
A large repository of records about experiments!!
- verification of data
- recipes for experiment designs
- explanation for the impact of changes
- ownership
- performance of services
- data quality
Scientists
In silico experiments
43Provenance forms
- Derivations
- A path like a workflow, script or query.
- Linking items, usually in a directed graph.
- An explanation of when, who, how something was produced.
- Execution: process-centric
- Annotations
- Attached to items or collections of items, in a structured, semi-structured or free text form.
- Annotations on one item or linking items.
- An explanation of why, when, where, who, what, how.
- Data-centric
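The two provenance forms can be contrasted in a few lines: derivations are edges in a directed graph that can be walked back to the original inputs, while annotations are notes attached to individual items. A minimal sketch follows, with hypothetical item names, process names and annotation text.

```python
# Derivation links: derived item -> list of (source item, process that produced it).
DERIVED_FROM = {
    "alignment_report": [("sequence", "blast_run")],
    "sequence": [("accession_id", "fetch_run")],
}

# Annotations: free-text notes attached directly to an item (data-centric).
ANNOTATIONS = {
    "alignment_report": ["weak hit, repeat with a stricter threshold"],
}


def lineage(item):
    """Walk the derivation graph from `item` back to its original inputs.

    Returns (derived item, process, source item) steps, nearest first.
    """
    trail = []
    stack = [item]
    seen = {item}
    while stack:
        current = stack.pop()
        for source, process in DERIVED_FROM.get(current, []):
            trail.append((current, process, source))
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return trail


trail = lineage("alignment_report")
notes = ANNOTATIONS.get("alignment_report", [])
```

The trail answers "how was this produced?" from the execution side; the notes answer "why?" from the data side, and neither can be derived from the other.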
44A digital lab book for chemists.
45COSHH
54Architecture
Viewing Tools
Sem. Web Apps
RDF over SOAP
Semantic Data
Results Data
56getRecord()
57getObservation()
58RDF
- Common model for metadata
- A graph: a set of triples
- Query over
- Link together
- Aggregate
- Integrate
- Avoids pre-commitment
- Self-describing
- Incremental
- Extensible
- RDQL, repositories, integration tools,
presentation tools
Data
Workflow
Experiment
User
Service
Graphic based on Tim Berners-Lee http://www.w3.org/2003/Talks/0521-www-keynote-tbl/slide22-0.html
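The properties listed for RDF (a graph as a set of triples, query, aggregate, no pre-commitment, incremental) can be shown without any RDF toolkit. The sketch below uses plain Python tuples in place of a real store and query language such as the RDQL mentioned on the slide; all node and property names are hypothetical.

```python
def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two independently produced graphs (hypothetical content).
workflow_graph = {
    ("workflow1", "uses", "service1"),
    ("workflow1", "produced", "data1"),
}
experiment_graph = {
    ("experiment1", "ran", "workflow1"),
    ("experiment1", "performedBy", "user1"),
}

# Aggregation is set union: no shared schema has to be agreed beforehand.
merged = workflow_graph | experiment_graph

# Incremental and extensible: a new property is just one more triple.
merged.add(("data1", "quality", "unchecked"))

about_workflow = match(merged, s="workflow1")
performed_by = match(merged, p="performedBy")
```

Because both graphs name the same node (workflow1), the union links the experiment to the data it ultimately produced without either producer knowing about the other.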
59Bridging islands
Service 1
Service 2
Workflow 1
Experimental Investigation 1
Data 1
60Bridging islands Concepts and LSID
Service 1
Service 2
Workflow 1
RDF
Experimental Investigation 1
Data 1
61Provenance Web
62Provenance of data
- Operational execution trail
GeneAC005412.6
SNP000010197
input
output
process: start time, end time
run_for
by_service
urn Claire Jennings
lsidHGVBase_retrieve
63Provenance of knowledge
- Declarative semantic execution trail
contains_single_nucleotide_polymorphism
GeneAC005412.6
SNP000010197
input
output
as stated by
process: start time, end time
run_for
by_service
urn Claire Jennings
lsidHGVBase_retrieve
64Provenance of knowledge
urn Carole Goble
disputed by
contains_single_nucleotide_polymorphism
GeneAC005412.6
SNP000010197
input
output
as stated by
process: start time, end time
run_for
by_service
urn Claire Jennings
lsidHGVBase_retrieve
65Provenance of knowledge
- Aggregation and integration
process: start time, end time
run_for
by_service
urn Bill Jones
lsidBIGDbretrieve
as stated by
contains_single_nucleotide_polymorphism
GeneAC005412.6
SNP000010197
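The last three slides treat a knowledge statement as something that can itself be described: who stated it, who disputes it, and which independent runs agree. A minimal sketch of that idea follows, using the identifiers shown on the slides as plain strings; the record layout and the summarise helper are assumptions for illustration, not the myGrid data model.

```python
# The knowledge statement from the slides, as a (subject, predicate, object) triple.
CLAIM = ("GeneAC005412.6", "contains_single_nucleotide_polymorphism", "SNP000010197")

# Statements about the statement: (claim, role, source).
records = [
    (CLAIM, "as stated by", "HGVBase_retrieve"),
    (CLAIM, "disputed by", "Carole Goble"),
    (CLAIM, "as stated by", "BIGDbretrieve"),
]


def summarise(records):
    """Group the sources behind each claim by the role they play."""
    summary = {}
    for claim, role, source in records:
        summary.setdefault(claim, {}).setdefault(role, []).append(source)
    return summary


summary = summarise(records)
support = summary[CLAIM]["as stated by"]
disputes = summary[CLAIM].get("disputed by", [])
is_contested = bool(disputes)
```

Aggregation falls out of using the claim itself as the key: two processes that reach the same statement end up as two supporting sources for one claim rather than as two unrelated facts.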
66Provenance
Ontology-aided workflow construction
- RDF-based service and data registries
- RDF-based metadata for experimental components
- RDF-based provenance graphs
- OWL-based controlled vocabularies for database content
- OWL-based integration of experiment entities
RDF-based semantic mark-up of results, logs, notes, data entries
http://www.mygrid.org.uk
67Aside: LSIDs
- urn:lsid:AuthorityID:NamespaceID:ObjectID:RevisionID
- urn:lsid:ncbi.nlm.nig.gov:GenBank:T48601:2
- urn:lsid:ebi.ac.uk:SWISS-PROT.accession:P34355:3
- urn:lsid:rcsb.org:PDB:1D4X:22
- LSID Designator: a mandatory preface that notes that the item being identified is a life science-specific resource
- Authority Identifier: an Internet domain owned by the organization that assigns an LSID to a resource
- Namespace Identifier: the name of the resource (e.g., a database) chosen by the assigning organization
- Object Identifier: the unique name of an item (e.g., a gene name or a publication tracking number) as defined within the context of a given database
- Revision Identifier: an optional parameter to keep track of different versions of the same item
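An LSID is a colon-separated URN of the form urn:lsid:authority:namespace:object with an optional trailing revision, so splitting one into the parts described above takes only a few lines. The sketch below is a simplified parser, assuming none of the parts contains a colon; the LSID class and parse_lsid function are hypothetical names, not part of any LSID library.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None  # optional, tracks versions of the same item


def parse_lsid(text):
    """Split an LSID string into its parts, raising ValueError if malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing urn:lsid designator: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"authority, namespace and object are mandatory: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


pdb_entry = parse_lsid("urn:lsid:rcsb.org:PDB:1D4X:22")
unversioned = parse_lsid("urn:lsid:rcsb.org:PDB:1D4X")
```

Keeping the revision separate from the object identifier lets a client ask for "the same item, any version" by comparing only the first three fields.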
68Information Access
LSID aware client
RDF aware client
LSID interface
Query
Publish interface
Metadata Store
Taverna/ Freefluo
MIR Metadata Store RDF
data
Data store
MIR Data store XML
metadata
Query
XML aware client
69Organisation level provenance
Process level provenance
Service
Project
runBy, e.g. BLAST @ NCBI
Experiment design
Process
Workflow design
componentProcess, e.g. web service invocation of BLAST @ NCBI
Event
partOf
instanceOf
componentEvent, e.g. completion of a web service invocation at 12.04pm
Workflow run
Data/knowledge level provenance
knowledge statements, e.g. similar protein sequence to
run for
User can add templates to each workflow process to determine links between data items.
Data item
Person
Organisation
Data item
Data item
data derivation, e.g. output data derived from input data
70Provenance tracking
- Automated generation of this web of links
- Workflow enactor generates
- LSIDs
- Data derivation links
- Knowledge links
- Process links
- Organisation links
Relationship BLAST report has with other items in
the repository
Other classes of information related to BLAST
report
71Haystack (IBM/MIT)
GenBank record
Portion of the Web of provenance
Managing collection of sequences for review
73Provenance metadata
- Outside objects
- RDF store
- Within objects
- LSID metadata.
74Linked Provenance Resources
The subsumed concepts
Link to the log annotated with more general
concept
The subsuming concepts
Link to the log annotated with more specific
concept
75Generating Links
The concept
The generated Link to related provenance
document
The name of the data
76P Afflard et al., The Grid(s)? @ Novartis, presented at PRISM PharmaGrid retreat, July 2003
77William Pike, Ola Ahlqvist, Mark Gahegan, Sachin Oswal, Supporting Collaborative Science through a Knowledge and Data Management Portal, in 1st Semantic Web Conference (ISWC2003) Workshop on Retrieval of Scientific Data, Florida, USA, October 2003
78Two views of a gravity model concept from the Hero CODEX web tool
- a social network reveals which users favour
different instances of the model, with edge
length suggesting the degree of support.
- An ontological description shows how one
geoscientist constructs a model
79Collaboratory for Multi-Scale Chemical Science
CMCS Pedigree Graph portlet showing provenance
relationships between resources (colour coded by
original relationship type).
CMCS Pedigree Browser showing the metadata and
relationships of the selected data set.
80Provenance dimensions connected by concepts and
identifiers
project
Services
Workflow instances
Author
project
workflow template
Based on http://www.w3.org/2003/Talks/0521-www-keynote-tbl/slide22-0.html
81awareness of colleagues' presence
BuddySpace
Access Grid Node
virtual meetings
mapping real time discussions/group sense making
NetMeeting
recovering information from meetings
enacting decisions/coordinating activities
synthesising artefacts
I-X planning tools
http://www.aktors.org/coakting/ Courtesy of David De Roure
82GEON Grid Applications
http://www.geongrid.org/
Courtesy Bertram Ludaescher
83http://www.aktors.org/miakt/
84Reflections
- Relationship between RDF and other data models.
- When should we use RDF?
- Scalability of technologies
- Querying over and aggregation of metadata
- How to do it?
- Real examples
- Users viewing complex models
- Domain specific interfaces
85Knowledge Stakeholders
Knowledge for the Grid Applications
Semantics for the Grid
Sources of Knowledge