Title: myGrid
1myGrid
- Robert Stevens
- University of Manchester, UK
- myGrid project
- http://www.mygrid.org.uk
2- Half way through project
- First prototype
3Graves' disease
- Autoimmune disease of the thyroid in which the
immune system of an individual attacks cells in
the thyroid gland, resulting in hyperthyroidism
- Weight loss, trembling, muscle weakness,
increased pulse rate, increased sweating and heat
intolerance, goitre, exophthalmos
4The Biology
- GD is caused by the stimulation of the thyrotrophin
receptor by thyroid-stimulating autoantibodies
secreted by lymphocytes of the immune system.
- Why is the lymphocyte producing these antibodies
that attack the thyroid cell?
5The Bioinformatics
6Biology Affymetrix microarray studies
- What genes are associated with Graves' disease?
- Wet-lab biology: extract lymphocyte mRNA; 4 patients,
4 controls; U95A Affy chips; 8 datasets
- Affymetrix data mining tool: what genes are expressed
in patient samples but not in controls, and vice versa?
- [Diagram labels: Probe IDs, ESTs, NCBI, Gene ID, Gene,
Candidate gene pool]
7Bioinformatics
Peter Li1, Claire Jennings2, Simon Pearce2 and
Anil Wipat1 (2003). 1School of Computing Science
and 2Institute of Human Genetics, University of
Newcastle-upon-Tyne.
- Candidate gene pool
- Components: Annotation Pipeline, Genotype Assay
Design System, 3D Protein Structure
- Questions: What is known about my candidate gene?
What is the structure of the protein product
encoded by my candidate gene? Is this SNP present
in my samples?
- Data resources and tools: Gene ID, Medline, GO, EMBL,
OMIM, BLAST, SNP, Query, DQP, Talisman
- Primer Design: Emboss Eprimer application in SoapLab
- Selection of restriction enzyme: Emboss Restrict in
SoapLab
- Use primers designed by myGrid to amplify region
flanking SNP on the gene
- Restriction Fragment Length Polymorphism experiment
8Integration
- Databases and applications need to be stitched
together
9Workflows are in silico experiments
- <Picture of a workflow or set of workflows for
the SNPs from Tom.>
10Experiment Workflows Services (meta)Data
- Discovering services to invoke
- Discovering workflows to enact
- Discovering links between experiments
- Some workflows you wrote, some others wrote
- Publishing new ones, adapting old ones.
- Sharing best practice
- Avoid reinventing wheels
- Services come and go
- Services are not necessarily owned by the user
- Service registration and discovery
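The registration and discovery points above can be illustrated with a minimal sketch. Everything in it is hypothetical: `ServiceRegistry`, `ServiceDescription`, the `task` field and the `example.org` endpoints are illustration names, not the myGrid registry interface. It only shows the idea that services the user does not own can be registered, found by what they do, and withdrawn at any time.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical description of a third-party service."""
    name: str
    task: str      # what the service does, e.g. "sequence_alignment"
    endpoint: str  # placeholder address, not a real service


@dataclass
class ServiceRegistry:
    """Toy in-memory registry (an assumption, standing in for a real one)."""
    _services: dict = field(default_factory=dict)

    def register(self, description: ServiceDescription) -> None:
        self._services[description.name] = description

    def withdraw(self, name: str) -> None:
        # Services come and go: withdrawing an unknown name is not an error.
        self._services.pop(name, None)

    def discover(self, task: str) -> list:
        """Return currently registered services for a task, sorted by name."""
        matches = [s for s in self._services.values() if s.task == task]
        return sorted(matches, key=lambda s: s.name)


registry = ServiceRegistry()
registry.register(ServiceDescription("blast_a", "sequence_alignment", "http://example.org/blast-a"))
registry.register(ServiceDescription("blast_b", "sequence_alignment", "http://example.org/blast-b"))
registry.register(ServiceDescription("eprimer", "primer_design", "http://example.org/eprimer"))

found_before = [s.name for s in registry.discover("sequence_alignment")]
registry.withdraw("blast_a")  # the provider removes its service
found_after = [s.name for s in registry.discover("sequence_alignment")]
```

Discovery is repeated after the withdrawal because a workflow cannot assume that a service found earlier still exists.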
11The Experimental process
- Experiment is repeatable, if not reproducible.
- What you did and why explained by provenance
records
- Who, what, where, why, when, (w)how?
- The traceability of knowledge as it evolves and
as it is derived.
- Methods in papers.
- A web of experimental material
- input data, data results, intermediate data,
parameter sets, workflow logs, workflow
templates, people, organisations, personal notes
etc.
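One way to picture a provenance record that answers who, what, where, why, when and how is sketched below. The `ProvenanceRecord` class, its field names, the `lineage` helper and the example values are all assumptions made for illustration; the slides do not specify the record format myGrid uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record describing how one data item was produced."""
    who: str
    what: str
    where: str
    why: str
    when: datetime
    how: str
    derived_from: tuple = ()  # identifiers of the items used as input


def lineage(item_id: str, records: dict) -> list:
    """Return every item that item_id was derived from, nearest first."""
    seen = []
    queue = list(records[item_id].derived_from) if item_id in records else []
    while queue:
        parent = queue.pop(0)
        if parent in seen:
            continue
        seen.append(parent)
        if parent in records:
            queue.extend(records[parent].derived_from)
    return seen


now = datetime.now(timezone.utc)
# Made-up example values; "probe_ids" is raw input and has no record of its own.
records = {
    "gene_ids": ProvenanceRecord(
        who="a scientist", what="map probe IDs to gene IDs", where="workbench",
        why="build candidate gene pool", when=now, how="annotation workflow",
        derived_from=("probe_ids",),
    ),
    "embl_records": ProvenanceRecord(
        who="a scientist", what="retrieve EMBL records", where="workbench",
        why="annotate candidate genes", when=now, how="annotation workflow",
        derived_from=("gene_ids",),
    ),
}

ancestors = lineage("embl_records", records)
```

Walking the `derived_from` links is what makes knowledge traceable: any result can be followed back to the data it came from.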
12A web of info data centric
13An in silico experiment a web of interconnected
information and components
Provenance of the workflow template. Related
workflows.
Ontologies describing workflows
14Data at the centre
Workflows that could use this data
People who have registered an interest in this
data
Related Data
Provenance of the data
Ontologies describing data
15Put the scientist at the centre
Workflows they wrote or used
16This time it's personal
- my services
- my favourite services
- my opinion of those services
- my workflows
- my data
- my notes
- my queries
- my logs of what I did
- The events I care about
17myGrid Services
Work bench
Taverna workflow environment
Talisman application
Portal
Gateway
Personalisation
Service and Workflow Discovery
myGrid Information Repository
Provenance mgt
Ontology Mgt
Event Notification
Metadata Mgt
Workflow enactment engine
Distributed Query Processor
Soaplab
Communication fabric
Bio Services
Text Extraction Service AMBIT
Bio Services
18A work bench for demonstrating services
19Taverna workflow development environment
20The services in an architecture
21Architecture
[Architecture diagram. Component labels:]
Knowledge Services, Semantic registration,
Structural registration, Registry, Registry View,
Ontology Server, Reasoner, Matcher, UDDI, UDDI-M,
Service Discovery, Component Discovery,
Notification Service, JMS, Provenance service,
Workflow enactment engine, Build/Edit Workflow, WSFL,
mIR, mInfo Repository, Test Data,
Information Extraction, PASTA,
Distributed Query Processor, Job Execution,
Workflow templates, Workflow instances, Service,
Metadata, Concepts, Data, Provenance, SoapLab, DB2
22myGrid in a nutshell
- An example of a second generation open
service-based Grid project, specifically a
testbed for the OGSI, OGSA and OGSA-DAI base
services
- myGrid Information Repository that is OGSA-DAI
compliant
- Developing high level services for data intensive
integration, rather than computationally
intensive problems
- Workflow and distributed query processing
- Developing high level services for e-Science
experimental management
- Provenance, change notification and
personalisation
- Developing Semantic Grid capabilities and
knowledge-based technologies, such as
semantic-based resource discovery and matching.
- Metadata descriptions and ontologies for service
discovery, component discovery and linking
components.
23In silico experiment life cycle
24What myGrid uses
- NetBeans
- BioJava
- Soaplab
- LSID implementation
25Finding the services
- The databases and applications required to
integrate
Screen shot of semantic find service listing the
services
26Workflows
- The workflows required to know about
- http://cvs.mygrid.org.uk/scufl/
- Workflow templates
- Workflows dynamically instantiated with services
- Nested, iterative, paths
- Stored in the mIR
- Templates can be anywhere so long as they have a URI
- Can be advertised in registries
- Discovered from registries or the mIR
- Workflow enactment engine
http://www.mygrid.org.uk/myGrid/web/components/Workflow/
- Workflow editor
- Taverna available at
http://prdownloads.sourceforge.net/taverna/taverna-release-0-1-beta-1.tar.gz?download
- Scufl
- http://sourceforge.net/project/showfiles.php?group_id=74874&release_id=159045
- WSFL
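The idea of a workflow template that is dynamically instantiated with services can be sketched as follows. This is not Scufl or WSFL: the template format, the `instantiate` and `enact` functions, and the two toy sequence services are hypothetical names chosen for illustration. The template names abstract tasks only, and concrete services are bound when the workflow is run.

```python
from typing import Callable, Dict, List

Service = Callable[[str], str]


def complement(seq: str) -> str:
    """Toy service: complement each base of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


def reverse(seq: str) -> str:
    """Toy service: reverse a sequence."""
    return seq[::-1]


# The template lists tasks, not services, so it can be stored and shared.
TEMPLATE: List[str] = ["complement", "reverse"]


def instantiate(template: List[str], available: Dict[str, List[Service]]) -> List[Service]:
    """Bind each task in the template to a service that is available right now."""
    bound = []
    for task in template:
        candidates = available.get(task, [])
        if not candidates:
            raise LookupError(f"no service currently offers task {task!r}")
        bound.append(candidates[0])  # simplest policy: take the first match
    return bound


def enact(services: List[Service], data: str) -> str:
    """Run the bound services in order, feeding each output to the next."""
    for service in services:
        data = service(data)
    return data


available_now = {"complement": [complement], "reverse": [reverse]}
result = enact(instantiate(TEMPLATE, available_now), "ATGC")
```

Because binding happens at enactment time, the same template still works when a different provider offers the task, and it fails with a clear error when no service is available.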
27Discovering services and workflows
- Find service and ontologies
- Stuff here.
Screen shot Video of the find service
28Provenance
29Scenario part 1 the annotation pipeline
- Look at workbench <video fragment>
- Discover I have been notified <video fragment>
- Run a workflow over the data I just got (the set
of Affy probe IDs)
- Workflow wizard
- Discover the workflow
- Enact it
- Monitor workflow
- Be notified that results are returned <video
fragment>
- Look at provenance of experiment. <video
fragment>
- Select EMBL IDs and retrieve the record
- Read the flat file.
- Select Medline ID and do some text extraction
using AMBIT <video fragment>
- <end>
30Scenario part 2
- Assume the annotation pipeline has been run
- Why is this candidate gene differentially
expressed in GD patients? Is it possible that it
is caused by the presence of a SNP or SNPs?
- So run a workflow that is about SNP expression with
respect to your candidate gene.
- Look at workbench <video>
- Discover I have been notified <video>
- Or look in the mIR for the candidate gene of interest.
- Run a workflow over the data I just got
- Workflow wizard
- Discover the workflow
- Enact it
- Monitor workflow
- Be notified that results are returned <video>
- Look at provenance of experiment. <video>
- Look at the EMBL record, presented through a
specialist viewer.
- Select Medline IDs from the EMBL record and do
some text extraction using AMBIT. Or select
Medline IDs from the results of the previous
workflow, the annotation pipeline workflow.
- <end>
31Scenario part 3
- Assume the annotation pipeline has been run
- Why is this candidate gene differentially
expressed in GD patients? Is it possible that it
is caused by the presence of a SNP or SNPs?
- So run a workflow that is about SNP expression with
respect to your candidate gene.
- Look at workbench <video>
- Discover I have been notified <video>
- Or look in the mIR for the candidate gene of interest.
- Run a workflow over the data I just got
- Workflow wizard
- Discover the workflow
- Enact it
- Monitor workflow
- Be notified that results are returned <video>
- Look at provenance of experiment. <video>
- Look at the EMBL record, presented through a
specialist viewer.
- <end>
32Where is the Grid?
33Summary
- Service based
- Open
- Free
- Grid-compliant
- Personalised
- Generic
- Bioinformatics
- Semantics
- Available from here.
- 18 months to go.
34Our esteemed scientific colleagues
- Claire and Simon
- Institute of Human Genetics, School of Clinical
Medical Sciences
- University of Newcastle
35Bioinformaticians
- Peter Li
- Neil Wipat
- Robert Stevens
- Phil Lord
- Martin Senger
- Tom Oinn
36Computer Scientists
37- Comparerestrict workflow as core of poster. Point
to services. Icons for services.
38http://www.mygrid.org.uk/
39Spares
40myGrid
- EPSRC UK e-Science pilot project
- Open Source Upper Middleware for Bioinformatics
- Data intensive not compute intensive
- Sharing knowledge and sharing components
IBM
41Open architecture shared components
- Incorporating third party tools and services
- Working in the public domain consuming public
repositories
- SoapLab, a SOAP-based programmatic interface to
command-line applications
- EMBOSS Suite, BLAST, Swiss-Prot, OpenBQS, etc.
300 services
- Incorporation of third party tools and
applications
- Talisman, a rapid application development tool
for annotation pipelines used by the InterPro
programme
- Lab book application to show off myGrid core
components
- Graves' disease (defective immune system cause of
hyperthyroidism)
- Circadian rhythms in Drosophila
42Experiment life cycle
[Life cycle diagram. Phases and supporting services:]
- Forming experiments: resource and service
discovery, repository creation, workflow creation,
database query formation
- Personalisation: personalised registries,
personalised workflows, info repository views,
personalised annotations, personalised metadata,
security
- Discovering and reusing experiments and resources:
workflow discovery and refinement, resource and
service discovery, repository creation, provenance
- Executing experiments: workflow enactment,
distributed query processing, job execution,
provenance generation, single sign-on
authentication, event notification
- Providing services and experiments: service
registration, workflow deposition, metadata
annotation, third party registration
- Managing experiments: information repository,
metadata management, provenance management,
workflow evolution, event notification
43Workflow
- Workflow enactment engine
- IBM's Web Services Flow Language (WSFL)
- Dynamic workflow service invocation and service
discovery
- Choose services when running workflow
- Shared development with Comb-e-Chem
- User interactivity during workflow enactment
- Not a batch script!
- Requires user proxies
- Ontologies for describing and finding workflows
and guiding service composition
- Service A outputs compatible with Service B
inputs
- Blastn compares a nucleotide query sequence
against a nucleotide sequence database (usually
intelligent misuse of services)
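The check that Service A's outputs are compatible with Service B's inputs can be sketched with a tiny type hierarchy. The `IS_A` table, the type names and the service descriptions below are assumptions made for illustration; they are not taken from the myGrid ontology, and a real system would use an ontology reasoner rather than a dictionary lookup.

```python
from typing import Optional

# Hypothetical "is a" hierarchy: child type -> parent type.
IS_A = {
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
}


def is_subtype(actual: Optional[str], expected: str) -> bool:
    """True if actual is expected or a descendant of it in IS_A."""
    while actual is not None:
        if actual == expected:
            return True
        actual = IS_A.get(actual)
    return False


def compatible(producer: dict, consumer: dict) -> bool:
    """Can the producer's output be passed to the consumer's input?"""
    return is_subtype(producer["output"], consumer["input"])


# Made-up service descriptions for illustration.
fetch_nucleotide = {"name": "fetch_nucleotide", "input": "accession", "output": "nucleotide_sequence"}
fetch_protein = {"name": "fetch_protein", "input": "accession", "output": "protein_sequence"}
blastn = {"name": "blastn", "input": "nucleotide_sequence", "output": "alignment_report"}
sequence_length = {"name": "sequence_length", "input": "sequence", "output": "integer"}

ok = compatible(fetch_nucleotide, blastn)   # nucleotide output into blastn
bad = compatible(fetch_protein, blastn)     # protein output into blastn
general = compatible(fetch_protein, sequence_length)  # any sequence is accepted
```

A composition tool could use such a check to suggest only those services whose inputs accept the data produced by the previous step.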
44Notification and Personalisation
- Dynamic creation of personal data sets in mIR
- Personal views over repositories.
- Personalisation of workflows.
- Personal notification
- Annotation of datasets and workflows.
- Personalised service registries: what I think
the service does, which services GSK
employees can use
- Has PDB changed since I last ran this?
- Has the record I derived my record from changed?
- Has the workflow I adapted my workflow from
changed?
- Did the provenance record change?
- Has a service I am using right now gone? Has an
equivalent one sprung up?
- Event notification service.
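Questions such as "has the record I derived my record from changed?" suggest a publish and subscribe pattern. The sketch below is an assumption about how such a service could behave, not the myGrid notification service: the `NotificationService` class, its methods and the item identifier are hypothetical. It compares content digests and calls back only the users who registered an interest.

```python
import hashlib
from collections import defaultdict


class NotificationService:
    """Toy change-notification service (hypothetical interface)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # item id -> callbacks
        self._digests = {}                     # item id -> last seen digest

    def subscribe(self, item_id, callback):
        """Register interest in changes to one item."""
        self._subscribers[item_id].append(callback)

    def publish(self, item_id, content: str) -> bool:
        """Record new content; notify subscribers if it differs from the last."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._digests.get(item_id)
        self._digests[item_id] = digest
        changed = previous is not None and previous != digest
        if changed:
            for callback in self._subscribers[item_id]:
                callback(item_id)
        return changed


service = NotificationService()
inbox = []  # the events I care about
service.subscribe("record:1", inbox.append)

first = service.publish("record:1", "version one")     # first sighting, nothing to compare
same = service.publish("record:1", "version one")      # unchanged content
updated = service.publish("record:1", "version two")   # content differs, subscribers told
```

The first publication only establishes a baseline, so subscribers hear about changes rather than about every record the service has seen.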
45Information Weaving
- Large amounts of data, many applications.
- Highly heterogeneous.
- Different types, algorithms, forms,
implementations, communities, service providers
- Highly complex and inter-related.
- Highly volatile.
- Obstacles everywhere