Title: Taverna in 2006
1Taverna in 2006
- Industry Workshop,
- tmo_at_ebi.ac.uk,
- 8th March 2006
2Taverna 1
- 3 years old, 1300 downloads of the latest release
over two months
- Expanding community covering an increasing
variety of domains
- Originally funded as part of an EPSRC pilot
project, with a research rather than production focus
- A success, but with limitations
3Taverna 1.3.1 Workbench
4Evolving challenges
- Long running data intensive workflows
- Manipulation of confidential or otherwise
protected information
- Use with classical grid systems
- Interaction with users during workflows
- Workflow authoring, service discovery and
composition
- Data comprehension, provenance and visualization
5User Interaction Handling
- Interaction Service and corresponding Taverna
processor allow a workflow to call out to an
expert human user
- Used to embed the Artemis annotation editor
within an otherwise automated genome annotation
pipeline
6Interaction Service Architecture
[Diagram labels: Patterns, Results, Status, Upload, Submit, Interaction Store Proxy, Download, Taverna 1.3]
7DALEC: Linking Taverna and DAS
- DALEC exposes a Taverna workflow as a Distributed
Annotation System (DAS) annotation source.
- Design workflow in Taverna
- Deploy in DALEC
- Access through any DAS client (Spice, Ensembl web
server etc.)
Standard DAS Service
DALEC DAS Service
8Taverna 2
- Funded as part of OMII-UK
- 10 developers
- Dedicated design, implementation, testing and
support team
- First new developers started three weeks ago,
project manager arriving in April
9myGrid Alliance
[Diagram labels: SourceForge community; Ingest; OMII-UK Release; myGrid Release; myGrid Pre-release; Evaluation; Software Engineering Quality Test; OMII Software Engineering Quality Test; Software Engineering XP; Prioritise Plan; Applications Professional Services; Production. Audience rows: Conservatives, Early adopters, Pioneers; Early adopters, Pioneers; Pioneers.]
10Future Direction
- Enhancements to the Workflow Core
- Enhancements to user interface and experience
- Expanded use of semantic web technologies
- Engagement with new user communities:
cheminformatics, humanities, social sciences etc.
- Code remains open source and always will
11Composite Workflow Models
12Enhanced Dataflow Model
- Modular dispatcher mechanism
- Dynamic service binding
- Recursive invocation
- Data filter implementation
- Retry, failover, back-off behaviours
- Transparent third party data transfers
- High throughput stream handling with implicit
iteration semantics
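
To make the retry, failover and back-off behaviours in the list above concrete, here is a minimal Python sketch. It is not Taverna code: `invoke_with_retry`, its parameters and the two toy services are hypothetical names chosen for illustration, and the real dispatcher may be organised quite differently.

```python
import time
from typing import Callable, List, Optional, Sequence


def invoke_with_retry(
    services: Sequence[Callable[[str], str]],
    data: str,
    max_retries: int = 3,
    initial_delay: float = 0.01,
    backoff_factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each service in turn (failover), retrying each one with
    growing delays between attempts (back-off)."""
    last_error: Optional[Exception] = None
    for service in services:
        delay = initial_delay
        for attempt in range(max_retries):
            try:
                return service(data)
            except Exception as error:
                last_error = error
                # Wait before the next attempt, but not after the last one.
                if attempt < max_retries - 1:
                    sleep(delay)
                    delay *= backoff_factor
    raise RuntimeError("all services failed") from last_error


# Toy services (hypothetical): the primary always fails, the mirror works.
calls: List[str] = []


def flaky_primary(data: str) -> str:
    calls.append("primary")
    raise ConnectionError("primary unavailable")


def mirror(data: str) -> str:
    calls.append("mirror")
    return data.upper()


delays: List[float] = []  # record requested delays instead of really sleeping
result = invoke_with_retry(
    [flaky_primary, mirror], "acgt", max_retries=2, sleep=delays.append
)
```

The `sleep` parameter is injected only so the example can be run without waiting; a real dispatcher would presumably make the retry count and delays configurable per operation.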
13Runtime Service Binding
- Service definition consists of an abstract
description
- Resolved at workflow runtime to one or more
concrete resources by a broker
- Allows load balancing or economic model based
service selection over grid environments
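
One way to picture runtime binding is the small Python sketch below. It assumes a round-robin policy as the simplest form of load balancing; `AbstractService`, `RoundRobinBroker` and the example endpoint URLs are hypothetical and do not come from Taverna.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class AbstractService:
    """What the workflow author writes down: a capability, not a location."""
    capability: str


class RoundRobinBroker:
    """Hypothetical broker: maps a capability to concrete endpoints and
    rotates through them as a crude form of load balancing."""

    def __init__(self, registry: Dict[str, List[str]]) -> None:
        self._cycles: Dict[str, Iterator[str]] = {
            capability: itertools.cycle(endpoints)
            for capability, endpoints in registry.items()
        }

    def resolve(self, service: AbstractService) -> str:
        """Pick a concrete endpoint for an abstract service at run time."""
        if service.capability not in self._cycles:
            raise LookupError(f"no provider for {service.capability!r}")
        return next(self._cycles[service.capability])


# Example registry with made-up endpoints.
broker = RoundRobinBroker(
    {"blast": ["http://host-a.example/blast", "http://host-b.example/blast"]}
)
step = AbstractService("blast")
bound = [broker.resolve(step) for _ in range(3)]
```

An economic-model broker would replace the rotation in `resolve` with a cost comparison; the workflow definition itself would not need to change.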
14Recursive Invocation
Receive Input
- Dispatcher allowing recursive invocation to be
plugged into per operation semantics.
Return Result
15Dynamic Dispatch Configuration
16Third-Party Data Transfers
- Allows in-place referencing of data
- Large data sets no longer round-trip between
workflow engine and data provider
- Allows restricted access to sensitive data
- Automatic de-reference when a reference type is
linked to a value type within a workflow.
- Connecting a grid service to a web service
17Service 1
Service 2
Service 3
Logical Workflow Structure defined by user
Client pushes workflow input data value to
workflow enactor, enactor stores the value in a
local cache for future use.
18Service 1
Service 2
Service 3
Logical Workflow Structure defined by user
Workflow enactor sends cached data value to
Service 1.
Service 1
Service 2
Provider A
19Service 1
Service 2
Service 3
Logical Workflow Structure defined by user
Service 1 completes and stores its result value
in a local data store, for example SRB, on the
same host (Provider A). It returns a reference to
that value to the workflow enactor.
20Service 1
Service 2
Service 3
Logical Workflow Structure defined by user
The enactor examines the workflow and determines
that Service 2 understands the reference it has
to the Service 1 result. It sends this reference
to Service 2 which uses it to directly access the
local data store.
21Service 1
Service 2
Service 3
Logical Workflow Structure defined by user
Service 2 completes, stores its result in the
local store and returns a reference to that data
to the enactor.
22Service 1
Service 2
Service 3
Logical Workflow Structure defined by user
The enactor examines Service 3. This service,
located on another provider, cannot consume the
reference returned from Service 2. The enactor
forces a de-reference, requesting and caching the
value of that reference from Provider A.
23Service 1
Service 2
Service 3
Logical Workflow Structure defined by user
As the enactor now has a value rather than a
reference it can invoke Service 3, which is fed
data from the enactor local cache, operates over
that data and returns a result which is in turn
cached by the enactor.
24Service 1
Service 2
Service 3
Logical Workflow Structure defined by user
The workflow is complete, the enactor sends the
final result back to the client.
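
The walkthrough on the preceding slides can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not the Taverna enactor: `Ref`, `Service`, `Enactor` and the `uses_refs` flag are hypothetical names, each provider's data store is modelled as a plain dictionary, and a single flag stands in for "this service both accepts and returns references".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class Ref:
    """Pointer to a value held in a provider's local data store."""
    provider: str
    key: str


@dataclass
class Service:
    name: str
    provider: str
    uses_refs: bool  # assumption: one flag for "accepts and returns references"
    func: Callable[[str], str]


class Enactor:
    def __init__(self, stores: Dict[str, Dict[str, str]]) -> None:
        self.stores = stores      # provider name -> simulated local data store
        self.log: List[str] = []  # what the enactor handed to each service

    def _deref(self, ref: Ref) -> str:
        return self.stores[ref.provider][ref.key]

    def run(self, services: List[Service], value: str) -> str:
        current: Union[str, Ref] = value
        for service in services:
            if isinstance(current, Ref):
                if service.uses_refs and service.provider == current.provider:
                    # The service reads its provider's store directly
                    # (simulated here); the enactor never holds the value.
                    self.log.append(f"{service.name}: reference")
                    data = self._deref(current)
                else:
                    # Consumer cannot use the reference: fetch and cache it.
                    self.log.append(f"{service.name}: forced de-reference")
                    current = self._deref(current)
                    data = current
            else:
                self.log.append(f"{service.name}: value")
                data = current
            result = service.func(data)
            if service.uses_refs:
                key = f"{service.name} result"
                self.stores[service.provider][key] = result
                current = Ref(service.provider, key)
            else:
                current = result
        # The client always receives a value, never a reference.
        return self._deref(current) if isinstance(current, Ref) else current


stores: Dict[str, Dict[str, str]] = {"Provider A": {}, "Provider B": {}}
workflow = [
    Service("Service 1", "Provider A", True, str.upper),
    Service("Service 2", "Provider A", True, lambda s: s[::-1]),
    Service("Service 3", "Provider B", False, lambda s: f"length={len(s)}"),
]
enactor = Enactor(stores)
final = enactor.run(workflow, "acgt")
```

After the run, `enactor.log` records that only Service 3 triggered a de-reference, which mirrors the point of the slides: the intermediate result passed from Service 1 to Service 2 stays on Provider A.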
25Streaming Data
- Allow execution of downstream workflow stages on
partially complete results from upstream.
Service 1
Service 2
Service 3
Non streaming (Taverna 1), entire iteration must
complete at each stage
Streamed data, Service 2 starts operating on
partial results from Service 1
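
The difference between the two modes can be illustrated with Python iterators. This is a single-threaded sketch of the ordering only, not of Taverna's implementation or of real concurrency; `stage`, `run_batch` and `run_streamed` are hypothetical helper names.

```python
from typing import Callable, Iterable, List


def stage(name: str, func: Callable[[str], str], log: List[str]) -> Callable[[str], str]:
    """Wrap a function so that every invocation is recorded in `log`."""
    def apply(item: str) -> str:
        log.append(f"{name}({item})")
        return func(item)
    return apply


def run_batch(items: Iterable[str], stages: List[Callable[[str], str]]) -> List[str]:
    """Non-streaming: each stage finishes the whole iteration first."""
    current = list(items)
    for s in stages:
        current = [s(item) for item in current]
    return current


def run_streamed(items: Iterable[str], stages: List[Callable[[str], str]]) -> List[str]:
    """Streamed: each item moves downstream as soon as it is produced."""
    stream = iter(items)
    for s in stages:
        stream = map(s, stream)  # lazy: nothing runs until items are pulled
    return list(stream)


def shout(s: str) -> str:
    return s + "!"


batch_log: List[str] = []
batch_out = run_batch(
    ["a", "b"],
    [stage("Service 1", str.upper, batch_log), stage("Service 2", shout, batch_log)],
)

stream_log: List[str] = []
stream_out = run_streamed(
    ["a", "b"],
    [stage("Service 1", str.upper, stream_log), stage("Service 2", shout, stream_log)],
)
```

Both runs produce the same results; only the order of invocations differs. In the streamed run, Service 2 handles the first item before Service 1 has seen the second.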
26New UI Development
- Smart graph editing module
- 3D virtual reality style enactment status
display
- Data playground: design workflows by example
- Integrated semantic search
- Knowledge driven visualization for result mining
27KAVE: Data and metadata management
- Life Science Identifiers
- Information Model
- File management
- Support for custom database building
- Provenance metadata capture using RDF
- SRB integration
- OGSA-DAI integration
28Computational Steering
Scientist designs, initiates and steers
simulation from Taverna Workbench
Scientists
Workflow Workbench
Steering of simulations by manipulation of
service state
Steering Control
Process 1
Process 2
Process 3
Enactor
Workflow definition sent to enactor
myGrid Metadata Stores
Process and data provenance captured and stored
by metadata services
29Service Types
- Closer integration with grid systems, e.g. Condor,
EGEE et al., and their associated security and
access control mechanisms.
- R for numerical analysis (microarray informatics
amongst others)
- Continued improvements to SOAP, BioMoby, Biomart,
Soaplab, SGS, local scripting and other components
30Obtaining Taverna
- Taverna is available under the LGPL from our
project site on Sourceforge.net
- http://taverna.sourceforge.net
- Release 1.3.1 as of December 2005
- Win32, Solaris, Linux, OS X
- Includes online and downloadable user manual,
examples etc.
- Support via project mailing lists
31myGrid team Early adopters
- Core
- Matthew Addis, Nedim Alpdemir, Tim Carver, Rich
Cawley, Neil Davis, Alvaro Fernandes, Justin
Ferris, Robert Gaizauskas, Kevin Glover, Carole
Goble, Chris Greenhalgh, Mark Greenwood, Yikun
Guo, Ananth Krishna, Peter Li, Phillip Lord,
Darren Marvin, Simon Miles, Luc Moreau, Arijit
Mukherjee, Tom Oinn, Juri Papay, Savas
Parastatidis, Norman Paton, Terry Payne, Matthew
Pocock, Milena Radenkovic, Stefan
Rennick-Egglestone, Peter Rice, Martin Senger,
Nick Sharman, Robert Stevens, Victor Tan, Anil
Wipat, Paul Watson and Chris Wroe.
- Users
- Simon Pearce and Claire Jennings, Institute of
Human Genetics, School of Clinical Medical
Sciences, University of Newcastle, UK
- Hannah Tipney, May Tassabehji, Andy Brass, St
Mary's Hospital, Manchester, UK
- Postgraduates
- Martin Szomszor, Duncan Hull, Jun Zhao, Pinar
Alper, John Dickman, Keith Flanagan, Antoon
Goderis, Tracy Craddock, Alastair Hampshire
- Industrial
- Dennis Quan, Sean Martin, Michael Niemi, Syd
Chapman (IBM)
- Robin McEntire (GSK)
- Collaborators
- Keith Decker