myGrid - PowerPoint PPT Presentation

Title:

myGrid

Transcript and Presenter's Notes


1
myGrid
  • Robert Stevens
  • University of Manchester, UK
  • myGrid project
  • http://www.mygrid.org.uk

2
  • Half way through project
  • First prototype

3
Graves' disease
  • Autoimmune disease of the thyroid in which the
    immune system of an individual attacks cells in
    the thyroid gland, resulting in hyperthyroidism
  • Weight loss, trembling, muscle weakness,
    increased pulse rate, increased sweating and heat
    intolerance, goitre, exophthalmos

4
The Biology
  • GD is caused by the stimulation of the thyrotrophin
    receptor by thyroid-stimulating autoantibodies
    secreted by lymphocytes of the immune system.
  • Why are the lymphocytes producing these antibodies
    that attack the thyroid cells?

5
The Bioinformatics
6
Biology: Affymetrix microarray studies
  • Question: What genes are associated with Graves' disease?
  • Wet-lab biology: extract lymphocyte mRNA from 4 patients
    and 4 controls
  • U95A Affy chips, 8 datasets
  • Affymetrix data mining tool: What genes are expressed in
    patient samples but not in controls, and vice versa?
  • Probe IDs, ESTs, gene IDs, genes (NCBI)
  • Diagram labels: A, P, I
  • Result: candidate gene pool
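
The question on this slide (which genes are expressed in patient samples but not in controls, and vice versa) can be read as a comparison of sets. The following is a minimal Python sketch of that idea, not the method of the Affymetrix data mining tool. It assumes each sample has already been reduced to a set of probe IDs called present; the probe IDs and the "present in all samples of one group, absent from every sample of the other" rule are hypothetical choices made for illustration.

```python
from typing import Iterable, Set, Tuple


def expressed_in_all(samples: Iterable[Set[str]]) -> Set[str]:
    """Probe IDs called present in every sample of a group."""
    samples = list(samples)
    if not samples:
        return set()
    return set.intersection(*samples)


def expressed_in_any(samples: Iterable[Set[str]]) -> Set[str]:
    """Probe IDs called present in at least one sample of a group."""
    result: Set[str] = set()
    for sample in samples:
        result |= sample
    return result


def candidate_pool(patients, controls) -> Tuple[Set[str], Set[str]]:
    """Return (patient-only, control-only) probe IDs.

    Assumed rule: a probe is patient-only if it is present in all
    patient samples and in no control sample, and vice versa.
    """
    patients = list(patients)
    controls = list(controls)
    patient_only = expressed_in_all(patients) - expressed_in_any(controls)
    control_only = expressed_in_all(controls) - expressed_in_any(patients)
    return patient_only, control_only


# Hypothetical probe IDs, for illustration only.
patients = [{"p1", "p2", "p3"}, {"p1", "p2"}]
controls = [{"p2", "p4"}, {"p4", "p5"}]
patient_only, control_only = candidate_pool(patients, controls)
```

Here `patient_only` contains only `p1` and `control_only` contains only `p4`. A real analysis would work from expression levels and statistics rather than a strict present/absent rule.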
7
Bioinformatics
  • Peter Li (1), Claire Jennings (2), Simon Pearce (2) and
    Anil Wipat (1), (2003). (1) School of Computing Science
    and (2) Institute of Human Genetics, University of
    Newcastle-upon-Tyne.
  • Input: candidate gene pool
  • Annotation Pipeline: What is known about my candidate gene?
  • 3D Protein Structure: What is the structure of the protein
    product encoded by my candidate gene?
  • Genotype Assay Design System: Is this SNP present in my
    samples?
  • Primer design: Emboss Eprimer application in SoapLab. Use
    primers designed by myGrid to amplify the region flanking
    the SNP on the gene
  • Restriction Fragment Length Polymorphism experiment:
    selection of restriction enzyme, Emboss Restrict in SoapLab
  • Resources and tools shown: Gene ID, Medline, GO, EMBL,
    OMIM, BLAST, SNP, Query, DQP, Talisman
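
The restriction enzyme selection step for an RFLP experiment can be illustrated with a small sketch: an enzyme is useful if its recognition site is present in one allele and absent from the other, so the two alleles give different fragment patterns. This is only an illustration of the idea, not how Emboss Restrict works. The two flanking sequences are hypothetical; the three recognition sites (EcoRI, BamHI, HindIII) are the standard ones and are palindromic, so searching one strand is sufficient for them.

```python
# Recognition sites of three well-known restriction enzymes.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def has_site(site: str, sequence: str) -> bool:
    """True if the recognition site occurs anywhere in the sequence."""
    return site in sequence.upper()


def discriminating_enzymes(allele_a: str, allele_b: str, enzymes=ENZYMES):
    """Map enzyme name -> allele ("A" or "B") that carries its site.

    Only enzymes whose site is present in exactly one allele are returned.
    """
    result = {}
    for name, site in enzymes.items():
        in_a = has_site(site, allele_a)
        in_b = has_site(site, allele_b)
        if in_a != in_b:
            result[name] = "A" if in_a else "B"
    return result


# Hypothetical flanking sequences that differ at a single base (the SNP).
allele_a = "ACGTGAATTCTTAG"  # contains the EcoRI site GAATTC
allele_b = "ACGTGAGTTCTTAG"  # the A>G change destroys that site
usable = discriminating_enzymes(allele_a, allele_b)
```

With these inputs `usable` is `{"EcoRI": "A"}`: EcoRI would cut the amplified region from allele A but not from allele B.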
8
Integration
  • Databases and applications need to be stitched
    together

9
Workflows are in silico experiments
  • <Picture of a workflow or set of workflows for
    the SNPs from Tom.>

10
Experiment: Workflows, Services, (meta)Data
  • Discovering services to invoke
  • Discovering workflows to enact
  • Discovering links between experiments
  • Some workflows you wrote, some others wrote
  • Publishing new ones, adapting old ones.
  • Sharing best practice
  • Avoid reinventing wheels
  • Services come and go
  • Services are not necessarily owned by the user
  • Service registration and discovery
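
Service registration and discovery, with services that come and go, can be sketched as a small in-memory registry. This is an assumption-laden illustration rather than the myGrid registry: the class names, the `task` field and the example.org endpoints are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ServiceDescription:
    name: str
    endpoint: str  # hypothetical URL, for illustration only
    task: str      # what the service does, e.g. "sequence_alignment"
    owner: str     # services are not necessarily owned by the user


class Registry:
    """In-memory stand-in for a service registry."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceDescription] = {}

    def register(self, service: ServiceDescription) -> None:
        self._services[service.name] = service

    def deregister(self, name: str) -> None:
        # Services come and go; removing an unknown name is not an error.
        self._services.pop(name, None)

    def discover(self, task: str) -> List[str]:
        """Names of currently registered services that perform `task`."""
        return sorted(s.name for s in self._services.values() if s.task == task)


registry = Registry()
registry.register(ServiceDescription(
    "blast_a", "http://example.org/blast", "sequence_alignment", "third party"))
registry.register(ServiceDescription(
    "blast_b", "http://example.org/blast2", "sequence_alignment", "me"))
registry.register(ServiceDescription(
    "eprimer", "http://example.org/eprimer", "primer_design", "third party"))

before = registry.discover("sequence_alignment")  # both alignment services
registry.deregister("blast_a")                    # a provider withdraws a service
after = registry.discover("sequence_alignment")   # only the remaining one
```

Because discovery happens by task rather than by name, a workflow that asks for "sequence_alignment" still finds `blast_b` after `blast_a` has gone.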

11
The Experimental process
  • Experiment is repeatable, if not reproducible.
  • What you did and why explained by provenance
    records
  • Who, what, where, why, when, (w)how?
  • The traceability of knowledge as it evolves and
    as it is derived.
  • Methods in papers.
  • A web of experimental material
  • input data, data results, intermediate data,
    parameter sets, workflow logs, workflow
    templates, people, organisations, personal notes
    etc.
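
A provenance record that answers who, what, where, why, when and how, and that links derived data back to its sources, might look like the sketch below. The field names and the `lineage` helper are hypothetical and are not taken from the myGrid provenance service.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    record_id: str
    who: str    # person or agent that ran the step
    what: str   # the data item or result produced
    where: str  # service or host that produced it
    why: str    # the purpose, e.g. a hypothesis or question
    how: str    # workflow or method used
    when: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    derived_from: List[str] = field(default_factory=list)


def lineage(record_id: str, records: Dict[str, ProvenanceRecord]) -> List[str]:
    """Follow derived_from links back to the original inputs."""
    seen: List[str] = []
    stack = [record_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(records[current].derived_from)
    return seen


# Hypothetical records for a three-step derivation.
records = {
    "r1": ProvenanceRecord("r1", "claire", "probe ID list", "wet lab",
                           "find GD genes", "microarray study"),
    "r2": ProvenanceRecord("r2", "peter", "EMBL records", "workflow engine",
                           "annotate candidates", "annotation pipeline",
                           derived_from=["r1"]),
    "r3": ProvenanceRecord("r3", "peter", "extracted text", "text service",
                           "read the literature", "text extraction",
                           derived_from=["r2"]),
}
trace = lineage("r3", records)
as_json = json.dumps(asdict(records["r3"]))
```

`trace` lists `r3`, then `r2`, then `r1`, which is the chain a reader would follow to see how a result was derived.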

12
A web of info: data centric
  • Data centric

13
An in silico experiment: a web of interconnected
information and components
Provenance of the workflow template. Related
workflows.
Ontologies describing workflows
14
Data at the centre
Workflows that could use this data
People who have registered an interest in this
data
Related Data
Provenance of the data
Ontologies describing data
15
Put the scientist at the centre
Workflows they wrote or used
16
This time it's personal
  • my services
  • my favourite services
  • my opinion of those services
  • my workflows
  • my data
  • my notes
  • my queries
  • my logs of what I did
  • The events I care about

17
myGrid Services
  • Work bench, Taverna workflow environment, Talisman
    application, Portal
  • Gateway
  • Personalisation, Service and Workflow Discovery, myGrid
    Information Repository, Provenance management, Ontology
    management, Event Notification, Metadata management
  • Workflow enactment engine, Distributed Query Processor,
    Soaplab
  • Communication fabric
  • Bio Services, Text Extraction Service (AMBIT)
18
A work bench for demonstrating services
19
Taverna workflow development environment
20
The services in an architecture
21
Architecture
  • Knowledge services: Ontology Server, Reasoner, Matcher
  • Registration: semantic registration, structural
    registration, Registry, Registry View, UDDI, UDDI-M
  • Discovery: Service Discovery, Component Discovery
  • Notification Service, JMS
  • Workflow: Build/Edit Workflow, Workflow enactment engine,
    WSFL, Job Execution, Provenance service
  • mIR (mInfo Repository): workflow templates, workflow
    instances, test data, metadata, concepts, data,
    provenance, DB2
  • Information Extraction, PASTA, Distributed Query
    Processor, SoapLab, Services
22
myGrid in a nutshell
  • An example of a second generation open
    service-based Grid project, specifically a
    testbed for the OGSI, OGSA and OGSA-DAI base
    services
  • myGrid Information Repository that is OGSA-DAI
    compliant
  • Developing high level services for data intensive
    integration, rather than computationally
    intensive problems
  • Workflow and distributed query processing
  • Developing high level services for e-Science
    experimental management
  • Provenance, change notification and
    personalisation
  • Developing Semantic Grid capabilities and
    knowledge-based technologies, such as
    semantic-based resource discovery and matching.
  • Metadata descriptions and ontologies for service
    discovery, component discovery and linking
    components.

23
In silico experiment life cycle
24
What myGrid uses
  • NetBeans
  • BioJava
  • Soaplab
  • LSID implementation

25
Finding the services
  • The databases and applications required to
    integrate

Screen shot of semantic find service listing the
services
26
Workflows
  • The workflows required to know about
  • http://cvs.mygrid.org.uk/scufl/
  • Workflow templates
  • Workflows dynamically instantiated with services
  • Nested, iterative, paths
  • Stored in the mIR
  • Templates can be anywhere so long as they have a URI
  • Can be advertised in registries
  • Discovered from registries or the mIR
  • Workflow enactment engine:
    http://www.mygrid.org.uk/myGrid/web/components/Workflow/
  • Workflow editor
  • Taverna available at
    http://prdownloads.sourceforge.net/taverna/taverna-release-0-1-beta-1.tar.gz?download
  • Scufl
  • http://sourceforge.net/project/showfiles.php?group_id=74874&release_id=159045
  • WSFL
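
The idea of a workflow template that is dynamically instantiated with services can be sketched as follows: the template names abstract tasks, and at enactment time each task is bound to a concrete service. This is a hedged illustration only; it is not Scufl or WSFL, the "services" are local Python functions standing in for remote ones, and the task names and the sequence ID are hypothetical.

```python
from typing import Any, Callable, Dict, List

Service = Callable[[Any], Any]

# Hypothetical local data standing in for a sequence database.
SEQUENCES = {"demo:1": "ATGCGC"}


def fetch_sequence(seq_id: str) -> str:
    return SEQUENCES[seq_id]


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_content(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def instantiate(template: List[str], bindings: Dict[str, Service]) -> List[Service]:
    """Bind every abstract task in the template to a concrete service."""
    missing = [task for task in template if task not in bindings]
    if missing:
        raise LookupError(f"no service bound for: {missing}")
    return [bindings[task] for task in template]


def enact(workflow: List[Service], data: Any) -> Any:
    """Run the bound services in order, passing each output onward."""
    for step in workflow:
        data = step(data)
    return data


template = ["fetch_sequence", "reverse_complement", "gc_content"]
bindings = {
    "fetch_sequence": fetch_sequence,
    "reverse_complement": reverse_complement,
    "gc_content": gc_content,
}
result = enact(instantiate(template, bindings), "demo:1")
```

Keeping the template separate from the bindings is what lets the same template be stored, advertised and re-enacted with different services.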

27
Discovering services and workflows
  • Find service and ontologies
  • Stuff here.

Screen shot Video of the find service
28
Provenance
29
Scenario part 1: the annotation pipeline
  • Look at workbench <video fragment>
  • Discover I have been notified <video fragment>
  • Run a workflow over the data I just got (the set
    of Affy probe IDs)
  • Workflow wizard
  • Discover the workflow
  • Enact it
  • Monitor workflow
  • Be notified that results are returned <video
    fragment>
  • Look at provenance of experiment. <video
    fragment>
  • Select EMBL IDs and retrieve the record
  • Read the flat file.
  • Select Medline ID and do some text extraction
    using AMBIT <video fragment>
  • <end>

30
Scenario part 2
  • Assume the annotation pipeline has been run
  • Why is this candidate gene differentially
    expressed in GD patients? Is it possible that it
    is caused by the presence of a SNP or SNPs?
  • So run a workflow that is about SNP expression with
    respect to your candidate gene.
  • Look at workbench <video>
  • Discover I have been notified <video>
  • Or look in the mIR for the candidate gene of interest.
  • Run a workflow over the data I just got
  • Workflow wizard
  • Discover the workflow
  • Enact it
  • Monitor workflow
  • Be notified that results are returned <video>
  • Look at provenance of experiment. <video>
  • Look at the EMBL record, presented through a
    specialist viewer.
  • Select Medline IDs from the EMBL record and do
    some text extraction using AMBIT. Or select
    Medline IDs from the results of the previous
    workflow, the annotation pipeline workflow.
  • <end>

31
Scenario part 3
  • Assume the annotation pipeline has been run
  • Why is this candidate gene differentially
    expressed in GD patients? Is it possible that it
    is caused by the presence of a SNP or SNPs?
  • So run a workflow that is about SNP expression with
    respect to your candidate gene.
  • Look at workbench <video>
  • Discover I have been notified <video>
  • Or look in the mIR for the candidate gene of interest.
  • Run a workflow over the data I just got
  • Workflow wizard
  • Discover the workflow
  • Enact it
  • Monitor workflow
  • Be notified that results are returned <video>
  • Look at provenance of experiment. <video>
  • Look at the EMBL record, presented through a
    specialist viewer.
  • <end>

32
Where is the Grid?
  • Some words here

33
Summary
  • Service based
  • Open
  • Free
  • Grid-compliant
  • Personalised
  • Generic
  • Bioinformatics
  • Semantics
  • Available from here.
  • 18 months to go.

34
Our esteemed scientific colleagues
  • Claire and Simon
  • Institute of Human Genetics School of Clinical
    Medical Sciences
  • University of Newcastle

35
Bioinformaticians
  • Peter Li
  • Neil Wipat
  • Robert Stevens
  • Phil Lord
  • Martin Senger
  • Tom Oinn

36
Computer Scientists
  • More people

37
  • Comparerestrict workflow as core of poster. Point
    to services. Icons for services.

38
http://www.mygrid.org.uk/
39
Spares
40
myGrid
  • EPSRC UK e-Science pilot project
  • Open Source Upper Middleware for Bioinformatics
  • Data intensive not compute intensive
  • Sharing knowledge and sharing components

IBM
41
Open architecture shared components
  • Incorporating third party tools and services
  • Working in the public domain consuming public
    repositories
  • SoapLab, a SOAP-based programmatic interface to
    command-line applications
  • EMBOSS Suite, BLAST, Swiss-Prot, OpenBQS, etc.
    300 services
  • Incorporation of third party tools and
    applications
  • Talisman, a rapid application development tool
    for annotation pipelines used by the InterPro
    programme
  • Lab book application to show off myGrid core
    components
  • Graves' disease (defective immune system cause of
    hyperthyroidism)
  • Circadian rhythms in Drosophila

42
Experiment life cycle
  • Forming experiments: resource and service discovery,
    repository creation, workflow creation, database query
    formation
  • Personalisation: personalised registries, personalised
    workflows, info repository views, personalised
    annotations, personalised metadata, security
  • Discovering and reusing experiments and resources:
    workflow discovery and refinement, resource and service
    discovery, repository creation, provenance
  • Executing experiments: workflow enactment, distributed
    query processing, job execution, provenance generation,
    single sign-on authentication, event notification
  • Providing services and experiments: service
    registration, workflow deposition, metadata annotation,
    third party registration
  • Managing experiments: information repository, metadata
    management, provenance management, workflow evolution,
    event notification
43
Workflow
  • Workflow enactment engine
  • IBM's Web Services Flow Language (WSFL)
  • Dynamic workflow service invocation and service
    discovery
  • Choose services when running workflow
  • Shared development with Comb-e-Chem
  • User interactivity during workflow enactment
  • Not a batch script!
  • Requires user proxies
  • Ontologies for describing and finding workflows
    and guiding service composition
  • Service A outputs compatible with Service B
    inputs
  • Blastn compares a nucleotide query sequence
    against a nucleotide sequence database (usually
    intelligent misuse of services)
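
Checking that Service A's outputs are compatible with Service B's inputs can be sketched with a very small is-a hierarchy of data types. The type names, the `Signature` class and the services listed are hypothetical; a real ontology and reasoner would be far richer than this parent-lookup table.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical is-a hierarchy: child type -> parent type.
IS_A = {
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
    "sequence": "data",
}


def is_subtype(data_type: Optional[str], ancestor: str) -> bool:
    """True if data_type equals ancestor or inherits from it."""
    while data_type is not None:
        if data_type == ancestor:
            return True
        data_type = IS_A.get(data_type)
    return False


@dataclass(frozen=True)
class Signature:
    name: str
    input_type: str
    output_type: str


def can_follow(a: Signature, b: Signature) -> bool:
    """Service B can consume A's output if that output is what B expects,
    or a more specific kind of it."""
    return is_subtype(a.output_type, b.input_type)


fetch_dna = Signature("fetch_dna", "identifier", "nucleotide_sequence")
fetch_protein = Signature("fetch_protein", "identifier", "protein_sequence")
blastn = Signature("blastn", "nucleotide_sequence", "alignment_report")
formatter = Signature("formatter", "sequence", "sequence")

dna_to_blastn = can_follow(fetch_dna, blastn)
protein_to_blastn = can_follow(fetch_protein, blastn)
protein_to_formatter = can_follow(fetch_protein, formatter)
```

Here `dna_to_blastn` and `protein_to_formatter` are true, while `protein_to_blastn` is false, which is the kind of check that can guide service composition.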

44
Notification and Personalisation
  • Dynamic creation of personal data sets in mIR
  • Personal views over repositories.
  • Personalisation of workflows.
  • Personal notification
  • Annotation of datasets and workflows.
  • Personalised service registries: what I think
    the service does, which services can GSK
    employees use
  • Has PDB changed since I last ran this?
  • Has the record I derived my record from changed?
  • Has the workflow I adapted my workflow from
    changed?
  • Did the provenance record change?
  • Has a service I am using right now gone? Has an
    equivalent one sprung up?
  • Event notification service.

45
Information Weaving
  • Large amounts of data, many applications.
  • Highly heterogeneous.
  • Different types, algorithms, forms,
    implementations, communities, service providers
  • Highly complex and inter-related.
  • Highly volatile.
  • Obstacles Everywhere