Web Services, Workflows - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Web Services, Workflows


1
Web Services, Workflows & Taverna
  • Superglue for the Semantic Web
  • Tom Oinn, EMBL-EBI
  • tmo_at_ebi.ac.uk

http://mygrid.org.uk http://taverna.sf.net
2
Who are we?
  • myGrid
    • An EPSRC-funded eScience Pilot Project
    • Based across multiple sites in the UK
  • Taverna
    • A tethered spin-off of the myGrid project
    • Aimed at producing powerful tools to complement
      the basic research work

EBI Hinxton Campus
3
What is Taverna?
  • Allows scientists to graphically construct
    complex processes in the form of workflows
  • What is a workflow?
    • A set of activities that make up a process
    • Definitions of how data moves between these
      activities
  • The user specifies what to do but not how to do
    it
  • Insulates users from the complexity of
    distributed computing
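
As a rough illustration of the definition above, a workflow can be modelled as named activities plus links saying which activity's output feeds which input. The sketch below is plain Python, not Taverna's workflow format or API; the names run_workflow, activities, and links, the toy activities, and the accession "X00000" are all invented for the example.

```python
from graphlib import TopologicalSorter


def run_workflow(activities, links, inputs):
    """Run each activity once everything it depends on has finished.

    activities: name -> function taking keyword arguments
    links:      (source activity, target activity, target argument name)
    inputs:     name -> dict of externally supplied arguments
    """
    # Build the dependency graph (node -> predecessors) from the data links.
    graph = {name: set() for name in activities}
    for source, target, _ in links:
        graph[target].add(source)

    results = {}
    # static_order() yields every activity after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        kwargs = dict(inputs.get(name, {}))
        for source, target, arg in links:
            if target == name:
                kwargs[arg] = results[source]
        results[name] = activities[name](**kwargs)
    return results


# Toy activities standing in for real services (hypothetical).
activities = {
    "fetch": lambda accession: "ACGTTG",  # pretend sequence lookup
    "reverse": lambda seq: seq[::-1],
    "length": lambda seq: len(seq),
}
# The user states what is connected to what, not the order of execution.
links = [("fetch", "reverse", "seq"), ("fetch", "length", "seq")]

results = run_workflow(activities, links, {"fetch": {"accession": "X00000"}})
print(results["reverse"], results["length"])
```

The caller only declares activities and links; the execution order is derived from the links, which is the "what, not how" split described in the bullets.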

4
Looks a bit like this
5
myGrid, Taverna and WBS
  • One of several early adopters of Taverna
  • Manchester-based group working on Williams-Beuren
    Syndrome in the medical genetics department
  • Workflows written by life scientists, not computer
    scientists
  • Following slides stolen at the last minute from
    Hannah Tipney at Manchester!

6
Williams-Beuren Syndrome (WBS)
  • Contiguous sporadic gene deletion disorder
  • 1/20,000 live births, caused by unequal crossover
    (homologous recombination) during meiosis
  • Haploinsufficiency of the region results in the
    phenotype
  • Multisystem phenotype: muscular, nervous,
    circulatory systems
  • Characteristic facial features
  • Unique cognitive profile
  • Mental retardation (IQ 40-100, mean 60; normal
    mean 100)
  • Outgoing personality, friendly nature, charming

7
Williams-Beuren Syndrome Microdeletion
[Figure: map of the WBS microdeletion region]
Genes labelled: POM121, NOLR1, FKBP6, FZD9, BAZ1B,
BCL7B, TBL2, WBSCR14, WBSCR18, WBSCR22, STX1A,
WBSCR21, CLDN3, CLDN4, ELN, LIMK1, WBSCR1/E1f4H,
WBSCR5/LAB, RFC2, CYLN2, GTF2IRD1, GTF2I, NCF1,
GTF2IRD2
Block labels: C-cen, A-cen, B-cen, C-mid, B-mid,
A-mid, B-tel, A-tel, C-tel
Eichler E, Clark R, She X. An Assessment of the
Sequence Gaps: Unfinished Business in a Finished
Human Genome. Nature Reviews Genetics (2004)
5:345-354
Hillier L et al. The DNA Sequence of Human
Chromosome 7. Nature (2003) 424:157-164
8
Experiment
[Figure: diagram of the manual analysis steps. Labels
recovered from the diagram:]
  • Inputs and intermediate data: GenBank Accession
    No, URL inc GB identifier, GenBank Entry,
    Seqret, Nucleotide seq (Fasta), Sort for
    appropriate Sequences only, ORFs
  • Identify regulatory elements in genomic sequence:
    Promoter Prediction, TF binding Prediction,
    Regulation Element Prediction
  • RepeatMasker: Repetitive elements
  • ncbiBlastWrapper: Blastn Vs nr, est databases
  • BlastWrapper: tblastn Vs nr, est, est_mouse,
    est_human databases. Blastp Vs nr
  • GenScan: Coding sequence
  • prettyseq: Translation/sequence file. Good for
    records and publications
  • restrict: Restriction enzyme map
  • cpgreport: CpG Island locations and ...
  • sixpack: 6 ORFs
  • transeq: Amino Acid translation
  • epestfind: Identifies PEST seq
  • pscan: Identifies FingerPRINTS
  • pepstats: MW, length, charge, pI, etc
  • pepcoil: Predicts Coiled-coil regions
  • SignalP, TargetP, PSORTII: Predicts cellular
    location
  • InterPro: Identifies functional and structural
    domains/motifs
  • Pepwindow? Octanol?: Hydrophobic regions
9
Analysis via Cut and Paste
10
Workflows
  • A: Identification of overlapping sequence
  • B: Characterisation of nucleotide sequence
  • C: Characterisation of protein sequence
11
The Biological Results
Four workflow cycles totalling 10 hours.
The gap was correctly closed and all known features
were identified.
[Figure labels: WBSCR14, ELN, CTA-315H11, CTB-51J22]
12
Different Kinds of Services
  • Pure web services are not always the solution
  • Abstraction Level?
  • Typing?
  • Description?
  • Data Volumes?
  • Taverna employs a hybrid architecture which
    includes web services amongst other components
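
One way to picture a hybrid architecture is a single processor interface with several kinds of implementation behind it, so the workflow engine does not need to know whether a step is a remote service or local code. This is an illustrative Python sketch only; the class names, the example.org endpoint, and the handler that simulates the remote side are invented and are not Taverna's own classes.

```python
from abc import ABC, abstractmethod


class Processor(ABC):
    """Anything the workflow engine can invoke with named inputs."""

    @abstractmethod
    def invoke(self, inputs: dict) -> dict:
        ...


class StringConstant(Processor):
    """Emits a fixed value and ignores its inputs."""

    def __init__(self, value: str):
        self.value = value

    def invoke(self, inputs: dict) -> dict:
        return {"value": self.value}


class LocalOperation(Processor):
    """Wraps an ordinary in-process function."""

    def __init__(self, func):
        self.func = func

    def invoke(self, inputs: dict) -> dict:
        return {"value": self.func(**inputs)}


class RemoteServiceStub(Processor):
    """Stand-in for a web service; a real one would call the endpoint."""

    def __init__(self, endpoint: str, handler):
        self.endpoint = endpoint
        self.handler = handler  # simulates the remote side (assumption)

    def invoke(self, inputs: dict) -> dict:
        return {"value": self.handler(inputs)}


pipeline = [
    StringConstant("acgt"),
    LocalOperation(lambda value: value.upper()),
    RemoteServiceStub("http://example.org/reverse",
                      lambda inputs: inputs["value"][::-1]),
]

# The engine treats every step the same way, whatever its kind.
data = {}
for step in pipeline:
    data = step.invoke(data)
print(data["value"])
```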

13
Complex Invocation Patterns
  • E.g. Soaplab has a typical factory pattern:
    create job, set parameter, run task,
    wait, get results, destroy task.
  • Multiple web service calls per conceptual
    operation
  • Handled in Taverna by embedding this invocation
    pattern within a Soaplab processor.
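
The idea of hiding a multi-call invocation pattern behind one operation can be sketched as follows. FakeAnalysisService is a hypothetical in-memory stand-in whose method names simply mirror the six steps listed above; it is not the actual Soaplab interface, and invoke is an invented wrapper, not Taverna's Soaplab processor.

```python
import itertools
import time


class FakeAnalysisService:
    """In-memory stand-in for a job-style analysis service (hypothetical)."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def create_job(self):
        job_id = next(self._ids)
        self._jobs[job_id] = {"params": {}, "polls_left": 2, "started": False}
        return job_id

    def set_parameter(self, job_id, name, value):
        self._jobs[job_id]["params"][name] = value

    def run(self, job_id):
        self._jobs[job_id]["started"] = True

    def is_done(self, job_id):
        job = self._jobs[job_id]
        job["polls_left"] -= 1  # pretend the job needs a few polls to finish
        return job["started"] and job["polls_left"] <= 0

    def get_results(self, job_id):
        seq = self._jobs[job_id]["params"]["sequence"]
        return {"length": len(seq)}

    def destroy(self, job_id):
        del self._jobs[job_id]


def invoke(service, params, poll_interval=0.01):
    """Present the six-step call sequence as one conceptual operation."""
    job_id = service.create_job()
    try:
        for name, value in params.items():
            service.set_parameter(job_id, name, value)
        service.run(job_id)
        while not service.is_done(job_id):
            time.sleep(poll_interval)
        return service.get_results(job_id)
    finally:
        # Always release the server-side job, even if a step failed.
        service.destroy(job_id)


service = FakeAnalysisService()
result = invoke(service, {"sequence": "ACGTACGT"})
print(result)
```

The workflow author sees a single call; the create/run/wait/destroy bookkeeping stays inside the wrapper.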

14
Large Data Sets
  • No explicit limit to message size in WS specs,
    but:
    • Most common toolkits are equally terrible at
      handling large data.
    • WS standards for bulk data transfer are
      insufficiently mature or lack interoperability.
  • Transfer references across WS calls, transfer
    actual data out of band
  • More info from Jon later, handled in Taverna via
    a Styx Grid Service plugin.
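
A minimal sketch of the reference-passing idea, assuming a shared store that both sides can reach. Here the store is just a temporary directory and the "service" is a local function; put, open_ref, and count_bases_service are hypothetical names, and none of this reflects the Styx protocol or the Taverna plugin.

```python
import os
import tempfile
import uuid

# Stand-in for an out-of-band data store (assumption: both sides can reach it).
STORE_DIR = tempfile.mkdtemp(prefix="oob_store_")


def put(data: bytes) -> str:
    """Store data out of band and return a small reference to it."""
    ref = uuid.uuid4().hex
    with open(os.path.join(STORE_DIR, ref), "wb") as fh:
        fh.write(data)
    return ref


def open_ref(ref: str):
    """Open the data behind a reference as a binary stream."""
    return open(os.path.join(STORE_DIR, ref), "rb")


def count_bases_service(message: dict) -> dict:
    """A 'service' whose message carries only a reference.

    The data itself is streamed in chunks rather than sent inline.
    """
    total = 0
    with open_ref(message["sequence_ref"]) as fh:
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            total += len(chunk)
    return {"base_count": total}


big_sequence = b"ACGT" * 250_000               # about 1 MB of data
message = {"sequence_ref": put(big_sequence)}  # the message itself stays tiny
reply = count_bases_service(message)
print(len(message["sequence_ref"]), reply["base_count"])
```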

15
Service Description
  • WS standards fail to address the description of a
    service.
  • Registries: UDDI is an old standard and predates
    work on semantic description
  • BioMoby and myGrid include Semantic Description
    and Discovery components.
  • Search for services by task, by input or by past
    involvement in another workflow
  • Essential for AI-assisted workflow construction
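
The three kinds of search mentioned above (by task, by input, by past use in a workflow) can be illustrated with a toy registry. This sketch uses plain string matching, whereas a real semantic registry would reason over ontology terms; the service names, annotations, and find_services function are all invented for the example and do not come from BioMoby or myGrid.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    name: str
    task: str                                   # what the service does
    inputs: set = field(default_factory=set)    # kinds of data it accepts
    used_in: set = field(default_factory=set)   # workflows it has appeared in


# Hypothetical registry entries.
REGISTRY = [
    ServiceDescription("repeat_masking_service", "repeat_masking",
                       {"nucleotide_sequence"}, {"workflow_B"}),
    ServiceDescription("similarity_search_service", "similarity_search",
                       {"nucleotide_sequence", "protein_sequence"},
                       {"workflow_A"}),
    ServiceDescription("protein_statistics_service", "protein_statistics",
                       {"protein_sequence"}, {"workflow_C"}),
]


def find_services(task=None, accepts=None, used_in=None):
    """Return names of services matching every criterion that was given."""
    matches = []
    for svc in REGISTRY:
        if task is not None and svc.task != task:
            continue
        if accepts is not None and accepts not in svc.inputs:
            continue
        if used_in is not None and used_in not in svc.used_in:
            continue
        matches.append(svc.name)
    return matches


by_task = find_services(task="repeat_masking")
by_input = find_services(accepts="protein_sequence")
by_history = find_services(used_in="workflow_A")
print(by_task, by_input, by_history)
```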

16
Multiple Service Types
BioMoby (orange), Soaplab (wheat), Workflow
(red), SOAP Service (green), SeqHound (blue),
Local Java operation (purple), String constant
(pale blue)
17
Taverna Demo
  • There should be a live demo of the Workflow
    Workbench here

18
Obtaining Taverna
  • Taverna is available under the LGPL from our
    project site on Sourceforge.net
  • http://taverna.sourceforge.net
  • Release 1.0 as of the 20th Jan 2005 (after twelve
    beta releases)
  • Includes online and downloadable user manual,
    examples etc.
  • Support via project mailing lists

19
myGrid and WBS People!
  • Core
    • Matthew Addis, Nedim Alpdemir, Tim Carver, Rich
      Cawley, Neil Davis, Alvaro Fernandes, Justin
      Ferris, Robert Gaizauskas, Kevin Glover, Carole
      Goble, Chris Greenhalgh, Mark Greenwood, Yikun
      Guo, Ananth Krishna, Peter Li, Phillip Lord,
      Darren Marvin, Simon Miles, Luc Moreau, Arijit
      Mukherjee, Tom Oinn, Juri Papay, Savas
      Parastatidis, Norman Paton, Terry Payne, Matthew
      Pocock, Milena Radenkovic, Stefan
      Rennick-Egglestone, Peter Rice, Martin Senger,
      Nick Sharman, Robert Stevens, Victor Tan, Anil
      Wipat, Paul Watson and Chris Wroe.
  • Users
    • Simon Pearce and Claire Jennings, Institute of
      Human Genetics, School of Clinical Medical
      Sciences, University of Newcastle, UK
    • Hannah Tipney, May Tassabehji, Andy Brass, St
      Mary's Hospital, Manchester, UK
  • Postgraduates
    • Martin Szomszor, Duncan Hull, Jun Zhao, Pinar
      Alper, John Dickman, Keith Flanagan, Antoon
      Goderis, Tracy Craddock, Alastair Hampshire
  • Industrial
    • Dennis Quan, Sean Martin, Michael Niemi, Syd
      Chapman (IBM)
    • Robin McEntire (GSK)
  • Collaborators
    • Keith Decker

20
Acknowledgements
myGrid is an EPSRC-funded UK eScience Program
Pilot Project.
Particular thanks to the other members of the
Taverna project, http://taverna.sf.net