Title: Web Services, Workflows
1 Web Services, Workflows Taverna
- Superglue for the Semantic Web
- Tom Oinn, EMBL-EBI
- tmo_at_ebi.ac.uk
http://mygrid.org.uk http://taverna.sf.net
2 Who are we?
- myGrid
- An EPSRC funded eScience Pilot Project
- Based across multiple sites in the UK
- Taverna
- A tethered spin-off of the myGrid project
- Aimed at producing powerful tools to complement
the basic research work
EBI Hinxton Campus
3 What is Taverna?
- Allows scientists to graphically construct complex processes in the form of workflows
- What is a workflow?
- Set of activities that make up a process
- Definitions of how data moves between these activities
- The user specifies what to do but not how to do it
- Insulates users from the complexity of distributed computing
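The definition of a workflow above (activities plus data links, with the user saying what to do rather than how) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the activity names, the `FAKE_DB` lookup and the `run_workflow` helper are hypothetical and are not Taverna's own model or API. The point is that the execution order is derived from the data links, not written by the user.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in data and activities; none of these are Taverna APIs.
FAKE_DB = {"X1": "ATGGCCTAA"}

activities = {
    "fetch": lambda inp: {"seq": FAKE_DB[inp["accession"]]},
    "length": lambda inp: {"n": len(inp["seq"])},
    "report": lambda inp: {"text": f"{inp['seq']} has {inp['n']} characters"},
}

# Data links: (source activity, output port) -> (sink activity, input port).
links = [
    (("fetch", "seq"), ("length", "seq")),
    (("fetch", "seq"), ("report", "seq")),
    (("length", "n"), ("report", "n")),
]


def run_workflow(activities, links, workflow_inputs):
    """Run every activity in an order derived from the data links."""
    # Each sink activity depends on the activities that feed it.
    deps = {name: set() for name in activities}
    for (src, _), (dst, _) in links:
        deps[dst].add(src)

    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        # Start from workflow-level inputs, then add values carried by links.
        inputs = dict(workflow_inputs.get(name, {}))
        for (src, out_port), (dst, in_port) in links:
            if dst == name:
                inputs[in_port] = outputs[src][out_port]
        outputs[name] = activities[name](inputs)
    return outputs


result = run_workflow(activities, links, {"fetch": {"accession": "X1"}})
print(result["report"]["text"])
```

The user only declares the three activities and the links between them; the ordering (fetch, then length, then report) falls out of the dependency graph.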
4 Looks a bit like this
5 myGrid, Taverna and WBS
- One of several early adopters of Taverna
- Manchester-based group working on Williams-Beuren Syndrome in the medical genetics department
- Workflows written by life scientists, not computer scientists
- Following slides stolen at the last minute from Hannah Tipney at Manchester!
6 Williams-Beuren Syndrome (WBS)
- Contiguous sporadic gene deletion disorder
- 1/20,000 live births, caused by unequal crossover (homologous recombination) during meiosis
- Haploinsufficiency of the region results in the phenotype
- Multisystem phenotype: muscular, nervous, circulatory systems
- Characteristic facial features
- Unique cognitive profile
- Mental retardation (IQ 40-100, mean 60, normal mean 100)
- Outgoing personality, friendly nature, charming
7 Williams-Beuren Syndrome Microdeletion
[Figure: map of the WBS microdeletion region]
- Block labels: C-cen, A-cen, B-cen, C-mid, B-mid, A-mid, B-tel, A-tel, C-tel
- Gene labels: POM121, NOLR1, FKBP6, FZD9, BAZ1B, BCL7B, TBL2, WBSCR14, WBSCR18, WBSCR22, STX1A, WBSCR21, CLDN3, CLDN4, ELN, LIMK1, WBSCR1/E1f4H, WBSCR5/LAB, RFC2, CYLN2, GTF2IRD1, GTF2I, NCF1, GTF2IRD2
- Eichler E, Clark R, She X. An Assessment of the Sequence Gaps: Unfinished Business in a Finished Human Genome. Nature Reviews Genetics (2004) 5:345-354
- Hillier L et al. The DNA Sequence of Human Chromosome 7. Nature (2003) 424:157-164
8 Experiment
[Figure: diagram of the analysis steps; labels recovered from the slide]
- Tools and services: RepeatMasker, BLASTwrapper, BlastWrapper, ncbiBlastWrapper, prettyseq, epestfind, Seqret, pscan, pepstats, pepcoil, GenScan, restrict, SignalP, TargetP, PSORTII, sixpack, transeq, cpgreport, InterPro, Pepwindow? Octanol?
- GenBank Accession No
- Promoter Prediction
- URL inc GB identifier
- TF binding Prediction
- Translation/sequence file. Good for records and publications
- Regulation Element Prediction
- GenBank Entry
- Amino Acid translation
- Sort for appropriate Sequences only
- Identifies PEST seq
- Identify regulatory elements in genomic sequence
- Identifies FingerPRINTS
- MW, length, charge, pI, etc
- Nucleotide seq (Fasta)
- 6 ORFs
- Predicts Coiled-coil regions
- tblastn Vs nr, est, est_mouse, est_human databases. Blastp Vs nr
- Coding sequence
- Restriction enzyme map
- Predicts cellular location
- CpG Island locations and
- Identifies functional and structural domains/motifs
- Repetitive elements
- ORFs
- Hydrophobic regions
- Blastn Vs nr, est databases.
9 Analysis via Cut and Paste
10 Workflows
- A: Identification of overlapping sequence
- B: Characterisation of nucleotide sequence
- C: Characterisation of protein sequence
11 The Biological Results
Four workflow cycles totalling 10 hours. The gap was correctly closed and all known features identified.
WBSCR14
ELN
CTA-315H11
CTB-51J22
12 Different Kinds of Services
- Pure web services are not always the solution
- Abstraction Level?
- Typing?
- Description?
- Data Volumes?
- Taverna employs a hybrid architecture which
includes web services amongst other components
13 Complex Invocation Patterns
- E.g. Soaplab has a typical factory pattern: create job, set parameter, run task, wait, get results, destroy task.
- Multiple web service calls per conceptual operation
- Handled in Taverna by embedding this invocation pattern within a Soaplab processor.
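The factory pattern listed above (create job, set parameter, run task, wait, get results, destroy task) can be wrapped so that a caller sees a single conceptual operation. The sketch below assumes an in-memory stand-in service; `FakeAnalysisService`, its method names and the `invoke` helper are all hypothetical and are not the real Soaplab interface or Taverna's Soaplab processor.

```python
import itertools


class FakeAnalysisService:
    """In-memory stand-in for a remote factory-style service (hypothetical API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}
        self.calls = []  # record of "remote" calls, kept for illustration

    def create_job(self):
        self.calls.append("create_job")
        job_id = next(self._ids)
        self._jobs[job_id] = {"params": {}, "polls_left": 2, "result": None}
        return job_id

    def set_parameter(self, job_id, name, value):
        self.calls.append("set_parameter")
        self._jobs[job_id]["params"][name] = value

    def run(self, job_id):
        self.calls.append("run")
        job = self._jobs[job_id]
        # Toy "analysis": reverse the input sequence.
        job["result"] = job["params"]["sequence"][::-1]

    def is_finished(self, job_id):
        self.calls.append("is_finished")
        job = self._jobs[job_id]
        job["polls_left"] -= 1  # pretend the job needs two polls to finish
        return job["polls_left"] <= 0

    def get_results(self, job_id):
        self.calls.append("get_results")
        return self._jobs[job_id]["result"]

    def destroy(self, job_id):
        self.calls.append("destroy")
        del self._jobs[job_id]


def invoke(service, **params):
    """One conceptual operation that hides the multi-call pattern."""
    job_id = service.create_job()
    try:
        for name, value in params.items():
            service.set_parameter(job_id, name, value)
        service.run(job_id)
        while not service.is_finished(job_id):
            pass  # a real client would sleep between polls
        return service.get_results(job_id)
    finally:
        service.destroy(job_id)  # always release the server-side job


service = FakeAnalysisService()
print(invoke(service, sequence="ATGC"))
print(service.calls)
```

The workflow author calls `invoke` once; the seven underlying service calls stay hidden inside the wrapper, which is the role the slide assigns to the Soaplab processor.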
14 Large Data Sets
- No explicit limit to message size in WS specs, but:
- Most common toolkits are equally terrible at handling large data.
- WS standards for bulk data transfer are insufficiently mature or lacking interoperability.
- Transfer references across WS calls, transfer actual data out of band
- More info from Jon later; handled in Taverna via a Styx Grid Service plugin.
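The idea of passing references through the service calls and moving the data itself out of band can be sketched as follows. This is an assumption-laden illustration, not the Styx Grid Service mechanism: a local temporary file path stands in for a remote data reference, and `producer_service` and `consumer_service` are hypothetical names.

```python
import os
import tempfile


def producer_service(n_records, store_dir):
    """Write a large result to shared storage and return only a reference."""
    fd, path = tempfile.mkstemp(suffix=".fasta", dir=store_dir)
    with os.fdopen(fd, "w") as out:
        for i in range(n_records):
            out.write(f">seq{i}\nATGC\n")
    return {"ref": path}  # the small message that crosses the service call


def consumer_service(message):
    """Receive a reference and stream the data out of band."""
    count = 0
    with open(message["ref"]) as data:
        for line in data:  # read line by line, never held whole in memory
            if line.startswith(">"):
                count += 1
    return {"records": count}


with tempfile.TemporaryDirectory() as store:
    msg = producer_service(1000, store)
    reply = consumer_service(msg)
    print(len(msg["ref"]), "characters of reference for", reply["records"], "records")
```

The message exchanged between the two services stays the same size however many records the producer writes; only the out-of-band channel has to cope with the bulk data.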
15 Service Description
- WS standards fail to address the description of a service.
- Registries: UDDI is an old standard and predates work on semantic description
- BioMoby and myGrid include Semantic Description and Discovery components.
- Search for services by task, by input or by past involvement in another workflow
- Essential for AI-assisted workflow construction
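Searching by task, by input or by past workflow involvement can be illustrated with a toy registry. Everything here is an assumption made for the example: the `ServiceDescription` fields, the task and input labels, and the assignment of tools to workflows are invented, and plain string matching stands in for the semantic (ontology-based) descriptions that BioMoby and myGrid actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical annotation record for one service."""
    name: str
    task: str
    inputs: frozenset
    used_in: frozenset = frozenset()


# Illustrative entries only; the annotations are not taken from a real registry.
REGISTRY = [
    ServiceDescription("transeq", "translation",
                       frozenset({"nucleotide_sequence"}), frozenset({"workflow_B"})),
    ServiceDescription("pepstats", "protein_statistics",
                       frozenset({"protein_sequence"}), frozenset({"workflow_C"})),
    ServiceDescription("pepcoil", "structure_prediction",
                       frozenset({"protein_sequence"}), frozenset({"workflow_C"})),
]


def find_services(registry, task=None, accepts=None, used_in=None):
    """Return names of services matching every criterion that was given."""
    hits = []
    for service in registry:
        if task is not None and service.task != task:
            continue
        if accepts is not None and accepts not in service.inputs:
            continue
        if used_in is not None and used_in not in service.used_in:
            continue
        hits.append(service.name)
    return hits


print(find_services(REGISTRY, accepts="protein_sequence"))
print(find_services(REGISTRY, task="translation"))
print(find_services(REGISTRY, used_in="workflow_C", task="structure_prediction"))
```

A workflow editor could use a query like the first one to suggest candidate next steps for a protein sequence output, which is the kind of assisted construction the last bullet refers to.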
16 Multiple Service Types
BioMoby (orange), Soaplab (wheat), Workflow
(red), SOAP Service (green), SeqHound (blue),
Local Java operation (purple), String constant
(pale blue)
17 Taverna Demo
- There should be a live demo of the Workflow
Workbench here
18 Obtaining Taverna
- Taverna is available under the LGPL from our project site on Sourceforge.net
- http://taverna.sourceforge.net
- Release 1.0 as of the 20th Jan 2005 (after twelve beta releases)
- Includes online and downloadable user manual, examples etc.
- Support via project mailing lists
19 myGrid and WBS People!
- Core
- Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pockock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
- Users
- Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
- Hannah Tipney, May Tassabehji, Andy Brass, St Mary's Hospital, Manchester, UK
- Postgraduates
- Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
- Industrial
- Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
- Robin McEntire (GSK)
- Collaborators
- Keith Decker
20 Acknowledgements
myGrid is an EPSRC funded UK eScience Program Pilot Project
Particular thanks to the other members of the Taverna project, http://taverna.sf.net