Title: Workflow in myGrid


1
Workflow in myGrid
  • Matthew Addis
  • IT Innovation Centre
  • http://www.mygrid.org.uk

2
myGrid team
myGrid is an EPSRC-funded UK e-Science Programme
pilot project.
Particular thanks to the other members of the
Taverna project, http://taverna.sf.net
3
Application Testbeds
  • Graves' disease
  • Simon Pearce and Claire Jennings, Institute of
    Human Genetics, School of Clinical Medical
    Sciences, University of Newcastle
  • Autoimmune disease of the thyroid
  • Discover all you can about a gene: Affymetrix
    microarray analysis, gene annotation
  • Services from Japan, Hong Kong, various sites in
    UK
  • Williams-Beuren syndrome
  • Hannah Tipney, May Tassabehji, Andy Brass, St
    Mary's Hospital, Manchester, UK
  • Microdeletion of 1.55 Mbases on chromosome 7
  • Characterise an unknown gene: gene alerting
    service, gene and protein annotation
  • Services from USA, Japan, various sites in UK
  • Trypanosomiasis in cattle
  • Steve Kemp, University of Liverpool, UK
  • Annotation pipelines and gene expression analysis
  • Services from USA, Japan, various sites in UK

4
What people do now: point, click, cut, paste
Slide courtesy of GSK
ID   MURA_BACSU  STANDARD;  PRT;  429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE)
DE   (UDP-N-ACETYLGLUCOSAMINE ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE  116  116  BINDS PEP (BY SIMILARITY).
FT   CONFLICT  374  374  S -> A (IN REF. 3).
SQ   SEQUENCE  429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
5
What people would like to do
Forming experiments
Personalisation
Discovering and reusing experiments and resources
Executing and monitoring experiments
Managing lifecycle, provenance and results of
experiments
Sharing services and experiments
Soaplab
6
The problem (1)
WBS workflows
Colour key for the workflow diagram:
  • Pink: outputs/inputs of a service
  • Purple: tailor-made services
  • Green: EMBOSS Soaplab services
  • Yellow: Manchester Soaplab services
  • Grey: unknowns
Data items in the diagram:
  • Query nucleotide sequence
  • GenBank accession no.
  • URL inc. GB identifier
  • GenBank entry
  • Nucleotide seq (FASTA)
  • Sort for appropriate sequences only
Services in the diagram and what they report:
  • ncbiBlastWrapper: BLASTN vs nr, est databases;
    TBLASTN vs nr, est, est_mouse, est_human
    databases; BLASTP vs nr
  • RepeatMasker: repetitive elements
  • GenScan: coding sequence
  • Seqret
  • transeq: amino acid translation
  • sixpack: 6 ORFs
  • prettyseq: translation/sequence file, good for
    records and publications
  • restrict: restriction enzyme map
  • cpgreport: CpG island locations and ...
  • pepstats: MW, length, charge, pI, etc.
  • pepcoil: predicts coiled-coil regions
  • epestfind: identifies PEST seq
  • pscan: identifies FingerPRINTS
  • Pepwindow? Octanol?: hydrophobic regions
  • SignalP, TargetP, PSORTII: predict cellular
    location
  • InterPro, PFAM, Prosite, Smart: identify
    functional and structural domains/motifs
7
The problem (2)
  • Two major steps
  • Extend into the gap: similarity searches with
    RepeatMasker, BLAST
  • Characterise the new sequence: NIX, InterPro,
    etc.
  • Numerous web-based services (e.g. BLAST,
    RepeatMasker)
  • Cutting and pasting between screens
  • Large number of steps
  • Frequently repeated: info is now rapidly added to
    public databases
  • Don't always get results
  • Time consuming
  • Huge amount of interrelated data is produced,
    handled in lab book and files saved to local hard
    drive
  • Mundane
  • Much knowledge remains undocumented
  • Bioinformatician does the analysis

8
The problem (3)
  • Can't assume or rely on anything
  • Built a type-agnostic workflow engine with a
    domain-neutral information model. The semantics
    are in the ontology
  • Services in the wild are rare, awful and
    unreliable
  • Significant time is taken to wrap applications as
    web services (licensing, installation,
    maintenance). Soaplab and Gowlab try to help
  • Enactor fault tolerance and type tolerance
  • SHIM services essential
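
The enactor fault tolerance mentioned above can be sketched roughly as follows. This is a minimal illustration only, not Freefluo's actual mechanism: the function `invoke_with_fallback` and the two stand-in services are hypothetical names, and the policy (retry each service a fixed number of times, then move to the next alternate) is an assumption.

```python
from typing import Callable, List, Sequence


def invoke_with_fallback(
    services: Sequence[Callable[[str], str]],
    payload: str,
    retries: int = 2,
) -> str:
    """Try each service in order, retrying each one before moving on."""
    errors: List[str] = []
    for service in services:
        for attempt in range(1 + retries):
            try:
                return service(payload)
            except Exception as exc:  # a real enactor would be more selective
                errors.append(f"{service.__name__} attempt {attempt + 1}: {exc}")
    raise RuntimeError("all services failed: " + "; ".join(errors))


# Hypothetical stand-ins for an unreliable remote service and its mirror.
calls = {"primary": 0}


def primary_blast(seq: str) -> str:
    calls["primary"] += 1
    raise ConnectionError("service unavailable")


def mirror_blast(seq: str) -> str:
    return f"hits for {seq}"


result = invoke_with_fallback([primary_blast, mirror_blast], "ACGT", retries=2)
```

Here the primary service is tried three times (one call plus two retries) before the mirror answers, so the workflow continues without the user re-running anything by hand.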

9
The myGrid solution (20,000 feet)
Semantic Discovery Registration
Provenance and Data browser Haystack or Portal
Taverna Workbench Portal
View Service
LSID Authority
UDDI
mIR data
Freefluo Workflow Engine
Store Service
mIR metadata
Web services, local tools, user interaction etc.
Event Notification Service
10
Approach
  • Innovative work
  • Service and workflow registration
  • Semantic discovery
  • Provenance management
  • Text mining
  • Core functionality
  • Services: Soaplab and Gowlab
  • Workflow enactment engine: Freefluo
  • Workflow workbench: Taverna
  • Data integration: OGSA-DQP
  • Information model management
  • Mediator
  • In between
  • Event notification

11
Workflow environment
  • Freefluo workflow enactment engine
  • http://freefluo.sourceforge.net
  • Taverna development and execution environment
  • http://taverna.sourceforge.net
  • Simple conceptual unified flow language (XScufl)
    wraps up units of activity
  • Has its own open source development community
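
The idea of a flow language that wraps up units of activity and connects them by data links can be sketched as below. This is not XScufl syntax and not how Freefluo is implemented: the dictionary layout, the function `run_workflow` and the toy processors are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(processors, inputs):
    """Run each processor once all of its upstream values are available.

    processors maps a name to (function, [names of upstream values]).
    inputs maps workflow input names to their values.
    """
    graph = {name: set(deps) for name, (_, deps) in processors.items()}
    values = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in values:  # a workflow input, nothing to compute
            continue
        func, deps = processors[name]
        values[name] = func(*(values[d] for d in deps))
    return values


# Toy processors standing in for real services.
processors = {
    "upper": (str.upper, ["query"]),
    "length": (len, ["upper"]),
    "report": (lambda seq, n: f"{seq} ({n} bases)", ["upper", "length"]),
}

values = run_workflow(processors, {"query": "acgt"})
```

The workflow author only states which outputs feed which inputs; the execution order falls out of the data dependencies.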

12
Taverna/FreeFluo
  • Implicit iteration and data flow
  • Data sets and nested flows
  • Configurable failure handling
  • Life Science Id resolution
  • Provenance and status reporting
  • Permissive best effort type management
  • Plug-in framework
  • Graphical display
  • Event notification
  • Data entry wizard
  • Libraries of SHIM services
  • Libraries of workflows
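
Implicit iteration can be illustrated with a small sketch: when a service that expects single items is handed lists, the enactor calls it once per item instead of failing. The helper `invoke` is hypothetical, and combining several list inputs as a cross product is an assumption of this sketch, not something the slides specify.

```python
from itertools import product


def invoke(service, *args):
    """Call service once, or once per combination if any argument is a list."""
    if not any(isinstance(a, list) for a in args):
        return service(*args)
    axes = [a if isinstance(a, list) else [a] for a in args]
    return [service(*combo) for combo in product(*axes)]


def blast(seq, db):
    """Toy stand-in for a service that takes one sequence and one database."""
    return f"{seq} vs {db}"


single = invoke(blast, "s1", "nr")
many = invoke(blast, ["s1", "s2"], ["nr", "est"])
```

The workflow author wires `blast` up once; whether it runs one time or four depends only on the shape of the data that arrives.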

13
Domain Services
  • Native WSDL Web services
  • DDBJ, NCBI BLAST, PathPort, BioMOBY
  • Some easy to use, some not!
  • Wrapped legacy services
  • SoapLab and GowLab
  • Leveraged the EMBOSS Suite and others
  • 300 services
  • Lots of them including redundant services
  • Firewalls and licensing
  • Semantic annotation for discovery and (re)use

14
SHIM Services
  • Services that enable domain services to fit
    together
  • Outnumber domain services
  • Libraries
  • Candidates for automatic selection, composition
    and substitution

Diagram labels: main bioinformatics applications,
main bioinformatics services, SHIM services
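
As a rough illustration of what a SHIM service does, the sketch below converts between FASTA text and a bare sequence string, so that a service emitting one format can feed a service expecting the other. The function names are hypothetical and this is not code from the myGrid SHIM libraries.

```python
def fasta_to_raw(fasta: str) -> str:
    """Drop '>' header lines and join the remaining sequence lines."""
    return "".join(
        line.strip()
        for line in fasta.splitlines()
        if line.strip() and not line.startswith(">")
    )


def raw_to_fasta(seq: str, identifier: str, width: int = 60) -> str:
    """Wrap a bare sequence as a single FASTA record."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{identifier}"] + lines)


raw = fasta_to_raw(">x example record\nACGT\nTTGA\n")
record = raw_to_fasta(raw, "x", width=4)
```

Shims like these carry no scientific meaning of their own, which is why they are natural candidates for automatic selection and substitution.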
15
Drag and drop WSDL
  • Scavenge!
  • Seek the WSDL
  • Parse it
  • Incorporating a service into Taverna is
    straightforward
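
The "seek the WSDL, parse it" steps can be sketched with the Python standard library. This is an illustration only, not Taverna's scavenger code: `list_operations` is a hypothetical helper, the embedded WSDL document is made up, and only WSDL 1.1 `portType` operations are read.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 documents.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}


def list_operations(wsdl_text):
    """Return {portType name: [operation names]} for a WSDL 1.1 document."""
    root = ET.fromstring(wsdl_text)
    found = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        ops = [
            op.get("name")
            for op in port_type.findall("wsdl:operation", WSDL_NS)
        ]
        found[port_type.get("name")] = ops
    return found


# Made-up service description, for illustration only.
EXAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             name="BlastService" targetNamespace="urn:example:blast">
  <portType name="BlastPort">
    <operation name="runBlast"/>
    <operation name="getStatus"/>
  </portType>
</definitions>"""

operations = list_operations(EXAMPLE_WSDL)
```

Each operation found this way could then be offered to the user as a draggable workflow component.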

17
Running a workflow
18
Intermediate Results
19
Workflow results
20
Supporting the scientist
  • Easy-to-use workflow tools aren't enough!
  • Users need to be able to
  • Change their way of working for experiment design
    and results management
  • Record what they've done, why, how, when and
    where
  • Work effectively in context of the various and
    often changing people, processes, and services in
    their domain

21
Results Amplification
  • Automated workflows produce lots of heterogeneous
    data
  • The workflows changed how our bioinformaticians
    work.
  • Before: analyse results as you go along
  • After: all results, and hence all the analysis,
    in one go
  • So linking intermediate results is important
  • Intermediate results management and associated
    provenance management are essential: LSID/RDF

One input
Many outputs
22
Life Science IDs
  • Used throughout myGrid as an object naming
    device.
  • myGrid Repository acts as an LSID Authority
  • Universal access to results for collaboration, as
    well as for review.
  • RDF + LSID explains the context of results, and
    provides guidance for further investigations.
  • LSID provides a uniform naming scheme.
  • LSID Resolver guarantees to resolve to same data
    object.
  • LSID Authority dishes them out.
  • Returns metadata of object -> RDF

http://www.i3c.org/wgr/ta/resources/lsid/docs/
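
To make the uniform naming scheme concrete, the sketch below splits an identifier of the form `urn:lsid:authority:namespace:object[:revision]` into its parts. The helper `parse_lsid` and the example identifier (authority `example.org`) are made up for illustration. Resolution against an LSID Authority is not shown.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into fields."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid("urn:lsid:example.org:blastreport:12345:2")
```

Because every data item, workflow and result carries a name of this shape, links between them can be recorded without knowing where the objects are stored.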
23
Provenance tracking
  • Automated generation of this web of links
  • Workflow enactor generates
  • LSIDs
  • Data derivation links
  • Knowledge links
  • Process links
  • Organisation links

Relationship BLAST report has with other items in
the repository
Other classes of information related to BLAST
report
24
Organisation level provenance
Process level provenance
Service
Project
runBy, e.g. BLAST @ NCBI
Experiment design
Process
Workflow design
componentProcess, e.g. web service invocation of
BLAST @ NCBI
Event
partOf
instanceOf
componentEvent, e.g. completion of a web service
invocation at 12.04pm
Workflow run
Data/ knowledge level provenance
knowledge statements, e.g. similar protein
sequence to
run for
User can add templates to each workflow process
to determine links between data items.
Data item
Person
Organisation
Data item
Data item
data derivation, e.g. output data derived from
input data
26
Map of Context
Literature relevant to provenance study or data
in this workflow
Provenance record of a workflow run
Interlinking graph of the workflow that generates
the provenance logs
Web pages of people who have interests related to
those of the workflow owner
Experiment Notes
27
3. Virtual organisations
Service Platform Administrators
Bioinformaticians
Service Providers
4. Reuse
Annotation providers
Biologists
Tool middleware developers
28
Semantics
Ontology-aided workflow construction
  • RDF-based service and data registries
  • RDF-based metadata for ALL experimental
    components
  • RDF-based provenance graphs
  • OWL-based controlled vocabularies for database
    content
  • OWL-based integration of experiment entities

RDF-based semantic mark up of results, logs,
notes, data entries
29
Semantic discovery
  • User chooses services
  • A common ontology is used to annotate and query
    any myGrid object including services.
  • Service ontology
  • Discover workflows and services described in the
    registry via Taverna.
  • Look for all workflows that accept an input of
    semantic type nucleotide sequence.
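
The example query above can be sketched with a toy ontology and registry. Everything here is an assumption made for illustration: myGrid used RDF and OWL rather than Python dictionaries, and the concept names, workflow names and helper functions are invented.

```python
# Toy ontology: each concept maps to its more general parent concept.
ONTOLOGY = {
    "dna_sequence": "nucleotide_sequence",
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
}

# Toy registry: each workflow is annotated with the type of its input.
REGISTRY = [
    {"name": "wbs_gap_extension", "input": "nucleotide_sequence"},
    {"name": "protein_characterisation", "input": "protein_sequence"},
    {"name": "generic_sequence_stats", "input": "biological_sequence"},
]


def is_a(concept, ancestor):
    """True if concept equals ancestor or is a descendant of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False


def find_accepting(registry, semantic_type):
    """Names of workflows whose input type covers the given type."""
    return [w["name"] for w in registry if is_a(semantic_type, w["input"])]


matches = find_accepting(REGISTRY, "nucleotide_sequence")
```

Because the match goes through the ontology rather than through string equality, a workflow annotated with the more general type is also found.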

30
Conclusion
  • User-driven scenarios have been essential
  • myGrid provides an end to end solution
  • Lifecycle of the user's experiment
  • Communities of users and service providers
  • Scientific process
  • Workflow
  • Service/workflow tools need to be usable by
    scientists
  • Hide complexity, provide layers of abstraction
  • Accommodate services as is using shims,
    annotations, and plug-ins
  • Record provenance
  • Give bioinformaticians control
  • User specified processes, services, alternatives,
    retries

31
myGrid People
  • Core
  • Matthew Addis, Nedim Alpdemir, Tim Carver, Rich
    Cawley, Neil Davis, Alvaro Fernandes, Justin
    Ferris, Robert Gaizauskas, Kevin Glover, Carole
    Goble, Chris Greenhalgh, Mark Greenwood, Yikun
    Guo, Ananth Krishna, Peter Li, Phillip Lord,
    Darren Marvin, Simon Miles, Luc Moreau, Arijit
    Mukherjee, Tom Oinn, Juri Papay, Savas
    Parastatidis, Norman Paton, Terry Payne, Matthew
    Pocock, Milena Radenkovic, Stefan
    Rennick-Egglestone, Peter Rice, Martin Senger,
    Nick Sharman, Robert Stevens, Victor Tan, Anil
    Wipat, Paul Watson and Chris Wroe.
  • Users
  • Simon Pearce and Claire Jennings, Institute of
    Human Genetics, School of Clinical Medical
    Sciences, University of Newcastle, UK
  • Hannah Tipney, May Tassabehji, Andy Brass, St
    Mary's Hospital, Manchester, UK
  • Steve Kemp, Liverpool, UK
  • Postgraduates
  • Martin Szomszor, Duncan Hull, Jun Zhao, Pinar
    Alper, John Dickman, Keith Flanagan, Antoon
    Goderis, Tracy Craddock, Alastair Hampshire
  • Industrial
  • Dennis Quan, Sean Martin, Michael Niemi, Syd
    Chapman (IBM)
  • Robin McEntire (GSK)
  • Collaborators
  • Keith Decker

32
http://www.mygrid.org.uk
34
Taverna architecture