Title: The Taverna Workflow System
1The Taverna Workflow System
- Katy Wolstencroft
- University of Manchester
2Taverna and myExperiment
- Taverna
- Enables the interoperation between databases and tools by providing a toolkit for composing, executing and managing workflow experiments
- Access to local and remote resources and analysis tools
- Automation of data flow
- Iteration over large data sets
- myExperiment
- Workflow sharing and publishing environment
3[Figure: system component diagram. Labels: Client Applications; Provenance Ontology; myExperiment Web Portal; Taverna Workbench GUI; Workflow Warehouse; Provenance Warehouse; Service / Component Catalogue; Taverna Workflow Enactor; Feta Information Services; LogBook Provenance Management; Default Results; Service Ontology; Custom Datasets; 3rd Party Resources (Web Services, Grid Services); Service Management; Resources]
4myGrid Development
- Started 2001 EPSRC E-Science project
- 2005 OMII-UK
- myExperiment - started 2007
- Open Source
- Funded until after 2010
5myGrid Development
6 What is a Workflow?
- Workflows provide a general technique for describing and enacting a process
- Describes what you want to do, not how you want to do it
- Simple language specifies how bioinformatics processes fit together
- Processes are represented as web services
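As an illustration only, the idea that a workflow states which processes are connected, while the engine handles enactment and iteration over lists, can be sketched in Python. The names `run_processor`, `run_workflow`, `fetch_sequence` and `gc_content` and the toy data are hypothetical; in Taverna the processes would typically be remote web services, and this sketch is not Taverna's workflow language or enactor.

```python
def run_processor(service, data):
    """Apply a service to one item, or to every item of a list (implicit iteration)."""
    if isinstance(data, list):
        return [run_processor(service, item) for item in data]
    return service(data)


def run_workflow(steps, data):
    """Enact a linear workflow: the output of each step feeds the next one."""
    for service in steps:
        data = run_processor(service, data)
    return data


# Hypothetical stand-ins for remote services (assumption: toy in-memory data).
def fetch_sequence(identifier):
    fake_database = {"geneA": "ATGGCC", "geneB": "ATGTTTAAA"}
    return fake_database[identifier]


def gc_content(sequence):
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in sequence) / len(sequence)


# The workflow says what to do (fetch, then measure), not how to loop.
workflow = [fetch_sequence, gc_content]
results = run_workflow(workflow, ["geneA", "geneB"])
print(results)
```

The caller passes a list of identifiers and never writes a loop; the iteration is supplied by the enactment function, which is the point the slide makes about describing what to do rather than how.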
7Taverna
8Tree view of workflow structure
9Who Provides the Services?
- Open domain services and resources.
- Taverna accesses 3500 services
- Third party: we don't own them, we didn't build them
- All the major providers
- NCBI, DDBJ, EBI
- Enforce NO common data model.
- Quality Web Services considered desirable
10What types of service?
- WSDL Web Services
- BioMart
- R-processor
- BioMoby
- Soaplab
- Local Java services
- Beanshell
- Workflows
11Adding your own services
http://www.ebi.ac.uk/soaplab/
12Who uses Taverna?
- Over 51,000 downloads
- Users worldwide
- Systems biology
- Proteomics
- Gene/protein annotation
- Microarray data analysis
- Medical image analysis
- Heart simulations
- High throughput screening
- Genotype/Phenotype studies
- Health Informatics
- Astronomy
- Chemoinformatics
- Data integration
13What do Scientists use Taverna for?
- Data gathering and annotating
- Distributed data and knowledge
- Building models and knowledge management
- Populating SBML or hypothesis generation
- Data analysis
- Distributed analysis tools and high throughput
14Data Gathering
- Collecting evidence from lots of places
- Accessing local and remote databases, extracting
info and displaying a unified view to the user
15Annotation Pipelines
- Genome annotation pipelines
- Bergen Center for Computational Science: Gene Prediction in Algal Viruses, a case study.
- Workflow assembles evidence for predicted genes / potential functions
- Human expert can review this evidence before submission to the genome database
- Data warehouse pipelines
- e-Fungi model organism warehouse
- ISPIDER proteomics warehouse
- Annotating the up/down regulated genes in a
microarray experiment
17Building models and knowledge management
- Populating databases
- Populating models (e.g. SBML)
- Comparing models and experimental data
18Systems Biology Model Construction
Peter Li, Doug Kell
Automatic reconstruction of genome-scale yeast
metabolism from distributed data in the life
sciences to create and manipulate Systems Biology
Markup Models.
19Integration of microarray data onto SBML models
Read enzyme names from SBML
Query maxdLoad2 using enzyme names
Calculate colours based on gene expression level
Create new SBML model with new colour nodes
Peter Li, Doug Kell, University of Manchester
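The colour-calculation step can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the slide does not give the colour scheme, so the green-to-red scale, the `limit` parameter, the function names `expression_to_colour` and `colour_nodes`, and the enzyme names and values are all hypothetical, and the reading and writing of SBML is left out.

```python
def expression_to_colour(log_ratio, limit=2.0):
    """Map a log expression ratio to a hex colour.

    Assumed scheme: -limit or lower is pure green (down-regulated),
    +limit or higher is pure red (up-regulated), linear in between.
    """
    clamped = max(-limit, min(limit, log_ratio))
    fraction = (clamped + limit) / (2 * limit)  # 0.0 = fully down, 1.0 = fully up
    red = round(255 * fraction)
    green = round(255 * (1 - fraction))
    return f"#{red:02x}{green:02x}00"


def colour_nodes(expression_by_enzyme):
    """Return a colour for each enzyme name, keyed the same way as the input."""
    return {
        enzyme: expression_to_colour(level)
        for enzyme, level in expression_by_enzyme.items()
    }


# Hypothetical expression levels, standing in for the result of the database query.
example_levels = {"ADH1": 2.0, "PFK1": -2.0, "TDH3": 0.4}
print(colour_nodes(example_levels))
```

In a real workflow the resulting colours would be written back as annotations on the nodes of the new model; that step depends on the SBML tooling in use and is not shown here.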
20LibSBML Integration
- API consumer used to integrate libSBML directly
into Taverna
Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data. Peter Li, Juan I. Castrillo, Giles Velarde, Ingo Wassink, Stian Soiland-Reyes, Stuart Owen, David Withers, Tom Oinn, Matthew R. Pocock, Carole A. Goble, Stephen G. Oliver, Douglas B. Kell. Submitted to BMC Bioinformatics.
21Data Analysis
- Access to local and remote analysis tools
- You start with your own data / public data of interest
- You need to analyse it to extract biological knowledge
22Trichuris muris
- Mouse whipworm infection: parasite model of the human parasite Trichuris trichiura
- Understanding Phenotype
- Comparing resistant vs susceptible strains: Microarrays
- Understanding Genotype
- Mapping quantitative traits: Classical genetics, QTL
Joanne Pennock, Richard Grencis, University of Manchester
23Trichuris muris
- Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.
- Manual experimentation: Two year study of candidate genes, processes unidentified
Joanne Pennock, Richard Grencis, University of Manchester
24Trichuris muris
- Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.
- Manual experimentation: Two year study of candidate genes, processes unidentified
- JO IS A LAB BIOLOGIST
- JO HAS NEVER BUILT A WORKFLOW
Joanne Pennock, Richard Grencis, University of Manchester
25- Sleeping Sickness in African Cattle
- Caused by infection by parasite (Trypanosoma brucei)
- Some cattle breeds more resistant than others
- Differences between resistant and susceptible cattle?
- Can we breed cattle resistant to infection?
Steve Kemp
Andy Brass
Fisher et al (2007) A systematic strategy for large-scale analysis of genotype-phenotype correlations: identification of candidate genes involved in African trypanosomiasis. Nucleic Acids Res. 35(16):5625-33
Paul Fisher
http://www.genomics.liv.ac.uk/tryps/trypsindex.html
26Why was the Workflow Approach Successful?
- Workflows are protocols: they can be reused or repurposed
- Workflow analysed each piece of data systematically
- Eliminated user bias and premature filtering of datasets and results leading to single sided, expert-driven hypotheses
- The size of the QTL and amount of the microarray data made a manual approach impractical
- Workflows capture exactly where data came from and how it was analysed
- Workflow output produced a manageable amount of data for the biologists to interpret and verify
- make sense of this data -> does this make sense?
27Sharing Experiments
- Taverna supports the in silico experimental process for individual scientists
- How do you share your results/experiments/experiences with your:
- Research group
- Collaborators
- Scientific community
- How do you compare your results with others
produced by e.g. Kepler / Triana?
29Just Enough Sharing.
- myExperiment can provide a central location for workflows from one community/group
- myExperiment allows you to say:
- Who can look at your workflow
- Who can download your workflow
- Who can modify your workflow
- Who can run your workflow
- Share individual workflows or packs (collections of workflows and/or associated data / documentation)
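The four questions above (who can look, download, modify, run) amount to a small per-user permission model. A minimal sketch follows, assuming an in-memory record of grants; the class names `Permission` and `SharedWorkflow` and the user names are hypothetical, and this is not myExperiment's actual API or data model.

```python
from enum import Flag


class Permission(Flag):
    """The four kinds of access named on the slide, as combinable flags."""
    NONE = 0
    VIEW = 1
    DOWNLOAD = 2
    MODIFY = 4
    RUN = 8


class SharedWorkflow:
    """A workflow plus a record of which users were granted which permissions."""

    def __init__(self, title, owner):
        self.title = title
        self.owner = owner
        self._grants = {}  # user name -> combined Permission flags

    def grant(self, user, permissions):
        """Add permissions for a user on top of any already granted."""
        current = self._grants.get(user, Permission.NONE)
        self._grants[user] = current | permissions

    def allows(self, user, permission):
        """The owner may do anything; others need an explicit grant."""
        if user == self.owner:
            return True
        return permission in self._grants.get(user, Permission.NONE)


workflow = SharedWorkflow("QTL candidate gene analysis", owner="alice")
workflow.grant("bob", Permission.VIEW | Permission.RUN)
print(workflow.allows("bob", Permission.RUN))       # bob was granted RUN
print(workflow.allows("bob", Permission.DOWNLOAD))  # bob was not granted DOWNLOAD
```

Keeping the permissions separate is what allows "just enough" sharing: a collaborator can be allowed to run a workflow without being able to download or change it.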
30Google Gadgets
31Running Workflows Through myExperiment
Taverna Remote Execution (T-REX)
32Current Directions
33Speed and Scalability
- Taverna 2 enactor
- Support for long running workflows
- Large scale data industrial bioinformatics
- Data streaming
- Passing data by reference
- Integration with established computing platforms
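To illustrate what passing data by reference means in general terms, here is a minimal sketch in which processing steps exchange short reference strings and the data itself stays in a store. The names `ReferenceStore` and `count_lines` are hypothetical and the store is a plain in-memory dictionary; how the Taverna 2 enactor actually manages references is not described on the slide.

```python
import uuid


class ReferenceStore:
    """Holds data values and hands out opaque references to them."""

    def __init__(self):
        self._items = {}

    def put(self, value):
        """Store a value and return a new reference to it."""
        reference = str(uuid.uuid4())
        self._items[reference] = value
        return reference

    def get(self, reference):
        """Resolve a reference back to its value."""
        return self._items[reference]


def count_lines(store, input_reference):
    """A step that takes a reference in and gives a reference out.

    Only this step resolves the input; anything that merely routes the
    data between steps can pass the short reference along instead.
    """
    text = store.get(input_reference)
    return store.put(len(text.splitlines()))


store = ReferenceStore()
large_input = store.put("line one\nline two\nline three")
result_reference = count_lines(store, large_input)
print(store.get(result_reference))
```

The benefit grows with data size: a large dataset is copied only where a step actually needs its contents, which is one reason references help with long-running, large-scale workflows.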
35Extensibility and ease of use
- Drag and drop workflow building
- More content: greater pool of workflows in myExperiment
- More components: gathering together commonly used groups of services
- Shim libraries for combining incompatible services
- Service and workflow annotation checking
36BioCatalogue: Joint Manchester-EBI
[Figure: curation cycle diagram. Labels: Curation by Developers; Curation by Experts; Automated Curation; Curation by the Community; arrows labelled seed, refine, validate]
37Toolkits Taverna Inside
- Workflows under the hood
- e-Laboratories (portals)
- Systems Biology, e-Health
- Dashboards
- configurable platforms for small biological communities
- Visualisation clients that call workflows in the background
38UTOPIA Pettifer, Kell, University of Manchester
39Toolkits Taverna Inside: Workflow development pipeline
E-Labs, Dashboards and 3rd party clients
Social support to find and reuse workflows and expertise
CONFIGURABLE access to ready made workflows for biologists
Workflows embedded in applications and combined with data management systems
Social support for bioinformaticians to find and reuse workflows and expertise
Access to ready made workflows for biologists
Workflows developed by bioinformaticians
Enacted locally
Workflows enacted locally
Taverna remote execution service (T-Rex)
40myGrid acknowledgements
- Carole Goble, Norman Paton, Robert Stevens, Anil Wipat, David De Roure, Steve Pettifer
- OMII-UK and myExperiment: Tom Oinn, Katy Wolstencroft, Daniele Turi, June Finch, Stuart Owen, David Withers, Stian Soiland, Franck Tanoh, Matthew Gamble, Alan Williams, Ian Dunlop, Alex Nenadic, Jiten Bhagat, Don Cruickshank, Sergejs Aleksejevs
- Research: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Antoon Goderis, Alastair Hampshire, Qiuwei Yu, Wang Kaixuan.
- Current contributors: Matthew Pocock, James Marsh, Khalid Belhajjame, PsyGrid project, Bergen people, EMBRACE people.
- User Advocates and their bosses: Simon Pearce, Claire Jennings, Hannah Tipney, May Tassabehji, Andy Brass, Paul Fisher, Peter Li, Simon Hubbard, Tracy Craddock, Doug Kell, Marco Roos, Matthew Pocock, Mark Wilkinson
- Past Contributors: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Juri Papay, Savas Parastatidis, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Victor Tan, Paul Watson, and Chris Wroe.
- Industrial: Dennis Quan, Sean Martin, Michael Niemi (IBM), Chimatica.
- Funding: EPSRC, Wellcome Trust.
http://www.mygrid.org.uk
http://www.myexperiment.org
41Workflow Environments Summary
- Taverna
- Workflow design and execution environment
- Access to local and remote resources and analysis tools
- Automation of data flow
- Iteration over large data sets
- 50,000 downloads
- Used by 350 organisations
- myExperiment
- Workflow sharing and publishing environment
- Web 2.0
- Social network
- Sharing workflows, expertise, knowledge and/or data
- 964 users
- 301 workflows