The Taverna Workflow System - PowerPoint PPT Presentation

Slides: 42
Provided by: Wol80
Transcript and Presenter's Notes

Title: The Taverna Workflow System


1
The Taverna Workflow System
  • Katy Wolstencroft
  • University of Manchester

2
Taverna and myExperiment
  • Taverna
  • Enables the interoperation between databases and
    tools by providing a toolkit for composing,
    executing and managing workflow experiments
  • Access to local and remote resources and analysis
    tools
  • Automation of data flow
  • Iteration over large data sets
  • myExperiment
  • Workflow sharing and publishing environment

3
Client Applications
Provenance Ontology
myExperiment Web Portal
Taverna Workbench GUI
Workflow Warehouse
Provenance Warehouse
Service / Component Catalogue
Taverna Workflow Enactor
Feta Information Services
LogBook Provenance Management
Default Results
Service Ontology
Custom Datasets
3rd Party Resources (Web Services, Grid Services)
Service Management
Resources
4
myGrid Development
  • Started 2001 EPSRC E-Science project
  • 2005 OMII-UK
  • myExperiment - started 2007
  • Open Source
  • Funded until after 2010

5
myGrid Development
6
What is a Workflow?
  • Workflows provide a general technique for
    describing and enacting a process
  • Describes what you want to do, not how you want
    to do it
  • Simple language specifies how bioinformatics
    processes fit together
  • Processes are represented as web services
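The idea on this slide can be sketched in a few lines of Python. This is not Taverna's own workflow language: the step names and the `run_workflow` helper are hypothetical, and plain functions stand in for web service calls. The workflow only declares the steps and the data links between them; the small enactor works out the execution order.

```python
from graphlib import TopologicalSorter

def run_workflow(steps, links, inputs):
    """Run each step once all of the steps it depends on have produced output.

    steps:  name -> function taking a dict of upstream results
    links:  name -> set of upstream names (the data links)
    inputs: name -> value for the workflow inputs
    """
    results = dict(inputs)
    # static_order() yields every name after all of its upstream names.
    for name in TopologicalSorter(links).static_order():
        if name in results:          # a workflow input, nothing to run
            continue
        upstream = {dep: results[dep] for dep in links.get(name, ())}
        results[name] = steps[name](upstream)
    return results

# Hypothetical three-step pipeline; each function stands in for a service.
steps = {
    "fetch_sequence": lambda up: up["gene_id"].upper() + ":ATGGCC",
    "translate": lambda up: up["fetch_sequence"].split(":")[1].lower(),
    "report": lambda up: f"{up['fetch_sequence']} -> {up['translate']}",
}
links = {
    "fetch_sequence": {"gene_id"},
    "translate": {"fetch_sequence"},
    "report": {"fetch_sequence", "translate"},
}

results = run_workflow(steps, links, {"gene_id": "abc1"})
print(results["report"])
```

The caller never states the order of execution; it follows from the links alone, which is the "what, not how" point made above.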

7
Taverna
8
Tree view of workflow structure
9
Who Provides the Services?
  • Open domain services and resources.
  • Taverna accesses 3500 services
  • Third party: we don't own them, we didn't build
    them
  • All the major providers
  • NCBI, DDBJ, EBI
  • Enforce NO common data model.
  • Quality Web Services considered desirable

10
What types of service?
  • WSDL Web Services
  • BioMart
  • R-processor
  • BioMoby
  • Soaplab
  • Local Java services
  • Beanshell
  • Workflows

11
Adding your own services
  • SoapLab
  • Java API Consumer

http://www.ebi.ac.uk/soaplab/
12
Who uses Taverna?
  • Over 51,000 downloads
  • Users worldwide
  • Systems biology
  • Proteomics
  • Gene/protein annotation
  • Microarray data analysis
  • Medical image analysis
  • Heart simulations
  • High throughput screening
  • Genotype/Phenotype studies
  • Health Informatics
  • Astronomy
  • Chemoinformatics
  • Data integration

13
What do Scientists use Taverna for?
  • Data gathering and annotating
  • Distributed data and knowledge
  • Building models and knowledge management
  • Populating SBML or hypothesis generation
  • Data analysis
  • Distributed analysis tools and high throughput

14
Data Gathering
  • Collecting evidence from lots of places
  • Accessing local and remote databases, extracting
    info and displaying a unified view to the user

15
Annotation Pipelines
  • Genome annotation pipelines
  • Bergen Center for Computational Science: Gene
    Prediction in Algal Viruses, a case study.
  • Workflow assembles evidence for predicted genes /
    potential functions
  • Human expert can review this evidence before
    submission to the genome database
  • Data warehouse pipelines
  • e-Fungi model organism warehouse
  • ISPIDER proteomics warehouse
  • Annotating the up/down regulated genes in a
    microarray experiment

16
(No Transcript)
17
Building models and knowledge management
  • Populating databases
  • Populating models (e.g. SBML)
  • Comparing models and experimental data

18
Systems Biology Model Construction
Peter Li, Doug Kell
Automatic reconstruction of genome-scale yeast
metabolism from distributed data in the life
sciences to create and manipulate Systems Biology
Markup Language (SBML) models.
19
Integration of microarray data onto SBML models
  • Read enzyme names from SBML
  • Query maxdLoad2 using enzyme names
  • Calculate colours based on gene expression level
  • Create new SBML model with new colour nodes

Peter Li, Doug Kell, University of Manchester
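The colour-calculation step on this slide could look roughly like the Python sketch below. It is an illustration only: it does not call libSBML or maxdLoad2, the enzyme names and expression values are made up, and the colour scale (red for up-regulated, green for down-regulated, grey for no measurement) is an assumption rather than the scheme used in the original workflow.

```python
def expression_to_colour(log_ratio, limit=2.0):
    """Map a log expression ratio to an RGB hex colour.

    Assumed scale: red for up-regulated, green for down-regulated,
    black for no change; values beyond +/-limit are clamped.
    """
    clamped = max(-limit, min(limit, log_ratio))
    intensity = int(255 * abs(clamped) / limit)
    if clamped >= 0:
        return f"#{intensity:02x}0000"
    return f"#00{intensity:02x}00"

def colour_model(enzyme_names, expression):
    """Return enzyme name -> colour, grey where no measurement exists."""
    return {
        name: expression_to_colour(expression[name]) if name in expression else "#808080"
        for name in enzyme_names
    }

# Hypothetical stand-ins for the SBML enzyme list and the microarray query result.
enzymes = ["hexokinase", "enolase", "pyruvate kinase"]
expression = {"hexokinase": 2.0, "enolase": -1.0}

coloured = colour_model(enzymes, expression)
for name, colour in coloured.items():
    print(name, colour)
```

In a real workflow the two stand-in values would come from the SBML parsing step and the database query step, and the resulting colours would be written back as node annotations in the new model.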
20
LibSBML Integration
  • API consumer used to integrate libSBML directly
    into Taverna

Performing statistical analyses on quantitative
data in Taverna workflows: an example using R and
maxdBrowse to identify differentially-expressed
genes from microarray data. Peter Li, Juan I.
Castrillo, Giles Velarde, Ingo Wassink, Stian
Soiland-Reyes, Stuart Owen, David Withers, Tom
Oinn, Matthew R. Pocock, Carole A. Goble, Stephen
G. Oliver, Douglas B. Kell. Submitted to BMC
Bioinformatics.
21
Data Analysis
  • Access to local and remote analysis tools
  • You start with your own data / public data of
    interest
  • You need to analyse it to extract biological
    knowledge

22
Trichuris muris
  • Mouse whipworm infection: parasite model of the
    human parasite Trichuris trichiura
  • Understanding Phenotype
  • Comparing resistant vs susceptible strains:
    microarrays
  • Understanding Genotype
  • Mapping quantitative traits: classical genetics,
    QTL

Joanne Pennock, Richard Grencis, University of
Manchester
23
Trichuris muris
  • Identified the biological pathways involved in
    sex dependence in the mouse model, previously
    believed to be involved in the ability of mice to
    expel the parasite.
  • Manual experimentation: two-year study of
    candidate genes, processes unidentified

Joanne Pennock, Richard Grencis, University of
Manchester
24
Trichuris muris
  • Identified the biological pathways involved in
    sex dependence in the mouse model, previously
    believed to be involved in the ability of mice to
    expel the parasite.
  • Manual experimentation: two-year study of
    candidate genes, processes unidentified
  • JO IS A LAB BIOLOGIST
  • JO HAS NEVER BUILT A WORKFLOW

Joanne Pennock, Richard Grencis, University of
Manchester
25
Sleeping Sickness in African Cattle
  • Caused by infection by parasite (Trypanosoma
    brucei)
  • Some cattle breeds more resistant than others
  • Differences between resistant and susceptible
    cattle?
  • Can we breed cattle resistant to infection?

Steve Kemp
Andy Brass
Paul Fisher
Fisher et al. (2007) A systematic strategy for
large-scale analysis of genotype-phenotype
correlations: identification of candidate genes
involved in African trypanosomiasis. Nucleic
Acids Res. 35(16):5625-33
http://www.genomics.liv.ac.uk/tryps/trypsindex.html
26
Why was the Workflow Approach Successful?
  • Workflows are protocols: they can be reused or
    repurposed
  • Workflow analysed each piece of data
    systematically
  • Eliminated user bias and premature filtering of
    datasets and results leading to single sided,
    expert-driven hypotheses
  • The size of the QTL and amount of the microarray
    data made a manual approach impractical
  • Workflows capture exactly where data came from
    and how it was analysed
  • Workflow output produced a manageable amount of
    data for the biologists to interpret and verify
  • make sense of this data -> does this make
    sense?
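The point about capturing where data came from and how it was analysed can be illustrated with a small Python sketch. It is not Taverna's provenance format: the record fields, the `run_with_provenance` helper and the example gene identifiers are all hypothetical. Each step call appends a record with fingerprints of its inputs and output, so a result can later be traced back through the steps that produced it.

```python
import hashlib
import json

def digest(value):
    """Stable short fingerprint of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

def run_with_provenance(name, func, inputs, log):
    """Call func(**inputs) and append a record of what went in and out."""
    output = func(**inputs)
    log.append({
        "step": name,
        "inputs": {key: digest(val) for key, val in inputs.items()},
        "output": digest(output),
    })
    return output

log = []
# Hypothetical steps: list the genes in a region, then intersect with another list.
genes = run_with_provenance(
    "genes_in_qtl", lambda region: sorted(region["genes"]),
    {"region": {"chromosome": "17", "genes": ["g2", "g1"]}}, log)
shared = run_with_provenance(
    "intersect", lambda a, b: sorted(set(a) & set(b)),
    {"a": genes, "b": ["g1", "g9"]}, log)

for record in log:
    print(record["step"], record["inputs"], "->", record["output"])
```

Because the output fingerprint of the first step equals the input fingerprint of the second, the log alone is enough to reconstruct the chain of analysis.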

27
Sharing Experiments
  • Taverna supports the in silico experimental
    process for individual scientists
  • How do you share your results/experiments/experiences
    with your:
  • Research group
  • Collaborators
  • Scientific community
  • How do you compare your results with others
    produced by e.g. Kepler / Triana?

28
(No Transcript)
29
Just Enough Sharing.
  • myExperiment can provide a central location for
    workflows from one community/group
  • myExperiment allows you to say
  • Who can look at your workflow
  • Who can download your workflow
  • Who can modify your workflow
  • Who can run your workflow
  • Share individual workflows or packs: collections
    of workflows and/or associated data /
    documentation
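The four questions on this slide (who can look at, download, modify and run a workflow) amount to a per-action sharing policy. The Python sketch below models that idea only; it is not the myExperiment API, and the `SharingPolicy` class, the `everyone` keyword and the user and group names are hypothetical.

```python
from dataclasses import dataclass, field

ACTIONS = ("view", "download", "modify", "run")

@dataclass
class SharingPolicy:
    owner: str
    grants: dict = field(default_factory=dict)   # action -> set of users or groups

    def allow(self, action, who):
        """Grant one action to a user, a group, or the keyword 'everyone'."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.grants.setdefault(action, set()).add(who)

    def can(self, user, action, groups=()):
        """The owner can do anything; others need an explicit grant."""
        if user == self.owner:
            return True
        allowed = self.grants.get(action, set())
        return ("everyone" in allowed or user in allowed
                or any(group in allowed for group in groups))

policy = SharingPolicy(owner="jo")
policy.allow("view", "everyone")
policy.allow("download", "research-group")
policy.allow("modify", "paul")

print(policy.can("anon", "view"))                                # True
print(policy.can("sam", "download", groups=["research-group"]))  # True
print(policy.can("paul", "run"))                                 # False
```

Keeping each action separate is what makes "just enough sharing" possible: a workflow can be visible to everyone while remaining editable by a single collaborator.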

30
Google Gadgets
31
Running Workflows Through myExperiment: Taverna
Remote Execution (T-REX)
32
Current Directions
33
Speed and Scalability
  • Taverna 2 enactor
  • Support for long running workflows
  • Large scale data: industrial bioinformatics
  • Data streaming
  • Passing data by reference
  • Integration with established computing platforms
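Data streaming and passing data by reference can be illustrated together in a short Python sketch. It shows the general technique only and makes no claim about how the Taverna 2 enactor implements it; the `store`, `stream` and `gc_content` functions and the sample sequences are hypothetical. The upstream step hands over a reference (here a file path) rather than the data itself, and the downstream step consumes records one at a time.

```python
import os
import tempfile

def store(lines):
    """Write data to a temporary file and return a reference (its path)."""
    handle = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
    with handle:
        handle.writelines(line + "\n" for line in lines)
    return handle.name

def stream(reference):
    """Yield records one at a time instead of loading the whole file."""
    with open(reference) as source:
        for line in source:
            yield line.rstrip("\n")

def gc_content(sequences):
    """Downstream step: processes each item as soon as it arrives."""
    for seq in sequences:
        yield round((seq.count("G") + seq.count("C")) / len(seq), 2)

ref = store(["GGCC", "ATAT", "GATC"])
results = list(gc_content(stream(ref)))
os.remove(ref)   # clean up the temporary file behind the reference
print(results)
```

Only the path moves between steps, so the size of the data set does not affect the cost of passing it along, and memory use stays bounded by a single record.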

34
(No Transcript)
35
Extensibility and ease of use
  • Drag and drop workflow building
  • More content: greater pool of workflows in
    myExperiment
  • More components: gathering together commonly
    used groups of services
  • Shim libraries for combining incompatible
    services
  • Service and workflow annotation checking
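A shim is a small adapter placed between two services whose output and input formats do not match. The Python sketch below shows one such adapter; the two "services" are hypothetical stand-in functions, not real endpoints, and the FASTA-to-raw-sequence conversion is chosen only as a familiar example.

```python
def fasta_to_raw(fasta_text):
    """Shim: drop FASTA header lines and join the sequence lines."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))

def fetch_fasta(accession):
    """Stand-in for a service that returns a FASTA record."""
    return f">{accession} example record\nATGGCC\nTTAA\n"

def reverse_complement(raw_sequence):
    """Stand-in for a service that accepts only a bare sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(raw_sequence))

# Without the shim, the header line would reach reverse_complement and fail.
result = reverse_complement(fasta_to_raw(fetch_fasta("X00001")))
print(result)
```

Collecting adapters like `fasta_to_raw` into a library means workflow builders can reuse them instead of rewriting the same conversion for every pair of services.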

36
BioCatalogue: Joint Manchester-EBI
(Diagram labels: Curation by Developers, Curation by Experts, Automated Curation, Curation by the Community; steps: seed, refine, validate)
37
Toolkits: Taverna Inside
  • Workflows under the hood
  • e-Laboratories (portals)
  • Systems Biology, e-Health
  • Dashboards
  • configurable platforms for small biological
    communities
  • Visualisation clients that call workflows in the
    background

38
UTOPIA: Pettifer, Kell, University of Manchester
39
Toolkits: Taverna Inside
Workflow development pipeline
  • E-Labs, Dashboards and 3rd party clients
  • Social support to find and reuse workflows and
    expertise
  • CONFIGURABLE access to ready made workflows for
    biologists
  • Workflows embedded in applications and combined
    with data management systems
  • Social support for bioinformaticians to find and
    reuse workflows and expertise
  • Access to ready made workflows for biologists
  • Workflows developed by bioinformaticians, enacted
    locally
  • Workflows enacted locally
  • Taverna remote execution service (T-Rex)
40
myGrid acknowledgements
  • Carole Goble, Norman Paton, Robert Stevens, Anil
    Wipat, David De Roure, Steve Pettifer
  • OMII-UK and myExperiment: Tom Oinn, Katy
    Wolstencroft, Daniele Turi, June Finch, Stuart
    Owen, David Withers, Stian Soiland, Franck Tanoh,
    Matthew Gamble, Alan Williams, Ian Dunlop, Alex
    Nenadic, Jiten Bhagat, Don Cruickshank, Sergejs
    Aleksejevs
  • Research: Martin Szomszor, Duncan Hull, Jun Zhao,
    Pinar Alper, Antoon Goderis, Alastair Hampshire,
    Qiuwei Yu, Wang Kaixuan.
  • Current contributors: Matthew Pocock, James Marsh,
    Khalid Belhajjame, PsyGrid project, Bergen
    people, EMBRACE people.
  • User Advocates and their bosses: Simon Pearce,
    Claire Jennings, Hannah Tipney, May Tassabehji,
    Andy Brass, Paul Fisher, Peter Li, Simon Hubbard,
    Tracy Craddock, Doug Kell, Marco Roos, Matthew
    Pocock, Mark Wilkinson
  • Past Contributors: Matthew Addis, Nedim Alpdemir,
    Tim Carver, Rich Cawley, Neil Davis, Alvaro
    Fernandes, Justin Ferris, Robert Gaizauskas,
    Kevin Glover, Chris Greenhalgh, Mark Greenwood,
    Yikun Guo, Ananth Krishna, Phillip Lord, Darren
    Marvin, Simon Miles, Luc Moreau, Arijit
    Mukherjee, Juri Papay, Savas Parastatidis, Milena
    Radenkovic, Stefan Rennick-Egglestone, Peter
    Rice, Martin Senger, Nick Sharman, Victor Tan,
    Paul Watson, and Chris Wroe.
  • Industrial: Dennis Quan, Sean Martin, Michael
    Niemi (IBM), Chimatica.
  • Funding: EPSRC, Wellcome Trust.

http://www.mygrid.org.uk
http://www.myexperiment.org
41
Workflow Environments Summary
  • Taverna
  • Workflow design and execution environment
  • Access to local and remote resources and analysis
    tools
  • Automation of data flow
  • Iteration over large data sets
  • 50,000 downloads
  • Used by 350 organisations
  • myExperiment
  • Workflow sharing and publishing environment
  • Web 2.0
  • Social network
  • Sharing workflows, expertise, knowledge and/or
    data
  • 964 users
  • 301 workflows