Title: Taverna Workbench
1Taverna Workbench
- Stuart Owen
- University of Manchester, UK
- stuart.owen_at_manchester.ac.uk
2What is a workflow
- Data workflows
- A task is invoked once its expected data has been
received, and when complete passes any resulting
data downstream.
- B starts when it receives data from A.
- C and D run in parallel when they receive data
from B.
- E starts once it has received data from both C and
D.
- Control workflows
- A task is invoked once the tasks it depends on have
completed.
- B starts when A has completed.
- C and D run in parallel once B has completed.
- E starts once both C and D have completed.
[Diagram: example workflow with tasks A, B, C, D, E and F]
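The data-workflow rule above can be sketched in a few lines of Python. This is only an illustration, not how Taverna is implemented: the `LINKS` wiring, the `run_data_workflow` function and the string-building "work" are all assumptions, and task F from the diagram is left out because the slide does not say how it is connected.

```python
from collections import defaultdict

# Hypothetical wiring of the example: A feeds B, B feeds C and D, both feed E.
LINKS = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]


def run_data_workflow(links, start_value):
    """Invoke each task once data has arrived from all of its upstream tasks."""
    upstream = defaultdict(set)
    downstream = defaultdict(list)
    tasks = set()
    for src, dst in links:
        upstream[dst].add(src)
        downstream[src].append(dst)
        tasks.update((src, dst))

    received = defaultdict(dict)  # task -> {upstream task: data}
    ready = sorted(t for t in tasks if not upstream[t])
    order, outputs = [], {}
    while ready:
        task = ready.pop(0)
        inputs = received[task] or {"input": start_value}
        # Stand-in for real work: record which task touched the data.
        args = ",".join(str(inputs[k]) for k in sorted(inputs))
        outputs[task] = f"{task}({args})"
        order.append(task)
        # Pass the result downstream; a task becomes ready when all data is in.
        for nxt in downstream[task]:
            received[nxt][task] = outputs[task]
            if set(received[nxt]) == upstream[nxt]:
                ready.append(nxt)
    return order, outputs


order, outputs = run_data_workflow(LINKS, "x")
print(order)
print(outputs["E"])
```

The sketch runs tasks one at a time; C and D become ready together after B, which is the point where a real engine could run them in parallel.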
3Advantages of workflows
4Advantages to workflows
- High-level abstraction
- Easier to understand and modify.
- Easier to describe and discuss with others.
- Describes what you want to do, not how to do it.
- Automation
- Systematic
- Sharing and re-use
- Either on its own, or within other workflows!
5Workflows within Taverna
- Predominantly based around the flow of data, but
does allow control constraints as well.
- Service-oriented workflows. Services may or may not
be grid enabled.
- High-level GUI approach separated from lower-level
coding; you don't have to be a coder to
build a workflow.
- Enactment can take place separately from the GUI,
allowing workflows to be executed from the
command line or within other systems.
7Taverna 1.4 Workbench
- Integral part of the myGrid project
- Java based, runs on Windows, Mac OS, Linux,
Solaris
- Open source and user driven development
- Taverna in OMII-UK
- Dedicated team of developers focused on design,
implementation, testing and support leading to
production quality software.
- Development of Taverna 2.0
8Taverna 1.4 workbench
9SCUFL (Simple Conceptual Unified Flow Language)
[Diagram: layered architecture of the Taverna Workbench]
- Application data flow layer: Scufl graph, service
introspection
- Scufl Workflow Object Model
- Execution flow layer: list management, implicit
iteration mechanism, MIME semantic type
decoration, fault management, service alternates
- Workflow execution: Freefluo workflow enactor
- Processor invocation layer: processors for Web Service,
Soaplab, Local App, BioMOBY, ? and Enactor
10Nested workflows
- A processor can be a workflow itself.
- Encourages the reuse of workflows within a more
complex scenario.
- Greater abstraction of an overall process, making
it more manageable.
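One way to picture "a processor can be a workflow itself" is the composite pattern: a workflow exposes the same `run` method as a single processor, so it can be dropped into another workflow as a step. The `Processor` and `Workflow` classes below are hypothetical names for illustration and are not Taverna's object model.

```python
from typing import Callable, List, Union


class Processor:
    """A single step wrapping a plain function (stand-in for a service call)."""

    def __init__(self, name: str, func: Callable[[str], str]):
        self.name = name
        self.func = func

    def run(self, data: str) -> str:
        return self.func(data)


class Workflow:
    """A linear chain of steps; it has run() too, so it can act as a step."""

    def __init__(self, name: str, steps: List[Union[Processor, "Workflow"]]):
        self.name = name
        self.steps = steps

    def run(self, data: str) -> str:
        for step in self.steps:
            data = step.run(data)
        return data


# The inner workflow is reused as the first step of a larger one.
clean = Workflow("clean", [Processor("strip", str.strip), Processor("upper", str.upper)])
pipeline = Workflow("pipeline", [clean, Processor("tag", lambda s: f"<{s}>")])
print(pipeline.run("  acgt "))
```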
12Iterations
- Scufl handles iterations implicitly
- i.e. Taverna handles it automagically; there's no
need for the user to indicate that an
iteration is required.
- Taverna recognises the data mismatch and
repeatedly runs the task over each data element
in the list.
- Iteration strategy with multiple inputs can be
configured.
- Cross product: all against all
- Dot product: first against first, second
against second, etc.
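The two iteration strategies map onto familiar list operations. The sketch below uses Python's `itertools.product` for the cross product and `zip` for the dot product; the `concat` task and the helper names are made up for the example, and Scufl's real mechanism is configured in the workbench rather than written as code.

```python
from itertools import product


def concat(a, b):
    """Hypothetical two-input task that expects single values, not lists."""
    return f"{a}{b}"


def run_implicitly(task, value):
    """Run a one-input task per element if a list arrives where one value is expected."""
    if isinstance(value, list):
        return [task(v) for v in value]
    return task(value)


def cross_product(task, xs, ys):
    """All against all: every element of xs paired with every element of ys."""
    return [task(x, y) for x, y in product(xs, ys)]


def dot_product(task, xs, ys):
    """First against first, second against second, and so on."""
    return [task(x, y) for x, y in zip(xs, ys)]


print(run_implicitly(str.upper, ["a", "b"]))          # ['A', 'B']
print(cross_product(concat, ["a", "b"], ["1", "2"]))  # ['a1', 'a2', 'b1', 'b2']
print(dot_product(concat, ["a", "b"], ["1", "2"]))    # ['a1', 'b2']
```

With two lists of length n, the cross product invokes the task n * n times and the dot product n times, which is why the choice matters for large inputs.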
13What about when a service fails?
- Most services are owned by other people
- No control over service failure
- Some are research level
- Workflows are only as good as the services they
connect!
- To help, Taverna can
- Notify failures
- Instigate retries
- Set criticality
- Substitute alternative services
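The four fault-handling options can be combined in one small wrapper. The following is a rough sketch under assumed names (`invoke_with_fallback`, `CriticalFailure`, the two toy services); it shows the general idea of retries, alternates, criticality and notification, not Taverna's actual configuration options.

```python
class CriticalFailure(Exception):
    """Raised when a critical step fails on every service, stopping the workflow."""


def invoke_with_fallback(services, data, retries=2, critical=True, notify=print):
    """Try each service in order, retrying before moving to the next alternate."""
    for service in services:
        for attempt in range(1, retries + 2):  # one initial call plus the retries
            try:
                return service(data)
            except Exception as exc:
                notify(f"{service.__name__} failed (attempt {attempt}): {exc}")
    if critical:
        raise CriticalFailure("all services failed")
    return None  # non-critical: the workflow carries on without this result


def flaky_primary(data):
    """Toy service that is always down."""
    raise ConnectionError("service unavailable")


def backup(data):
    """Toy alternate service."""
    return data[::-1]


messages = []
result = invoke_with_fallback([flaky_primary, backup], "acgt", notify=messages.append)
print(result, len(messages))
```

Here the primary service is tried three times, each failure is reported through `notify`, and the alternate then supplies the result.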
14Provenance Data?
- Supports scientific method and best practice
- Metadata about the origin of a resource (workflow,
service, data, experiment hypothesis, etc.) and
the process of how a resource was generated.
- The Who?, What?, When?, Where? and Why? about
resources.
- Stored as RDF triples
- Also available as OWL, opening it up to complex
reasoning
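To make "stored as RDF triples" concrete, here is a minimal sketch that keeps provenance as plain (subject, predicate, object) tuples and answers a "who?" question by pattern matching. The property names are the ones that appear in the provenance ontology diagram (launchedBy, executed, runs, belongsTo); the identifiers and the direction of each property are assumptions made for the example, and a real store would use an RDF library and full URIs.

```python
# Each triple is (subject, predicate, object). The identifiers are hypothetical,
# and the direction of each property is an assumption.
TRIPLES = [
    ("run:1", "rdf:type", "WorkflowRun"),
    ("run:1", "launchedBy", "person:alice"),
    ("run:1", "runs", "workflow:blast"),
    ("run:1", "executed", "processRun:1"),
    ("run:1", "executed", "processRun:2"),
    ("person:alice", "belongsTo", "org:lab"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Who launched run:1, and which organization do they belong to?
who = match(TRIPLES, s="run:1", p="launchedBy")[0][2]
org = match(TRIPLES, s=who, p="belongsTo")[0][2]
print(who, org)

# Which process runs were executed as part of run:1?
print([t[2] for t in match(TRIPLES, s="run:1", p="executed")])
```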
15Typed Workflow Run
[Diagram: Provenance Ontology. Classes Experimenter, Organization, Workflow, WorkflowRun and ProcessRun are linked by the properties launchedBy, executed, belongsTo and runs, with example instances urnlsidworkflow6, urnlsidorgHY7, urnlsid..wfInstance8, urnlsidperson4, urnlsidprocessRun84 and urnlsidprocessRun51.]
16Provenance Browser
17New plans for Taverna 2.0
18Evolving challenges
- Long running data intensive workflows
- Manipulation of confidential or otherwise
protected information
- Use with classical grid systems
- Publishing and sharing of workflows
- Better use of provenance
19Runtime Service Binding
- Service definition consists of an abstract
description
- Resolved at workflow runtime to one or more
concrete resources by a broker
- Allows load balancing or economic model based
service selection over grid environments
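Runtime binding can be sketched as a broker that turns an abstract capability name into a concrete endpoint only when the step is about to run. Everything here is hypothetical: the `Endpoint` and `Broker` classes, the example.org URLs and the "least loaded wins" policy are assumptions chosen to illustrate load balancing, not the Taverna 2.0 design.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    url: str       # hypothetical address of a concrete resource
    provides: str  # abstract capability this endpoint implements
    load: int      # current number of running jobs


class Broker:
    """Resolves an abstract service description to a concrete endpoint."""

    def __init__(self, endpoints):
        self.endpoints = endpoints

    def resolve(self, capability):
        """Pick the least-loaded endpoint that provides the capability."""
        candidates = [e for e in self.endpoints if e.provides == capability]
        if not candidates:
            raise LookupError(f"no endpoint provides {capability!r}")
        return min(candidates, key=lambda e: e.load)


broker = Broker([
    Endpoint("http://node1.example.org/align", "sequence-alignment", load=5),
    Endpoint("http://node2.example.org/align", "sequence-alignment", load=1),
])

# The workflow only names the capability; the broker chooses at run time.
chosen = broker.resolve("sequence-alignment")
print(chosen.url)
```

Swapping the `key` function for a cost estimate would give the economic-model variant mentioned in the slide.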
20Processor Dispatch Stack
21Third-party data transfers
- Allows in-place referencing of data
- Large data sets no longer round-trip between
workflow engine and data provider
- Allows restricted access to sensitive data
- Automatic de-reference when a reference type is
linked to a value type within a workflow.
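The reference idea can be illustrated with a small wrapper that is passed between steps in place of the data, and is only resolved when a step actually needs the value. `DataReference`, `link` and the in-memory `provider` dictionary are illustrative assumptions; a real system would hold a URL or similar handle to a remote store.

```python
class DataReference:
    """Points at data held by a provider instead of carrying the data itself."""

    def __init__(self, store, key):
        self.store = store
        self.key = key

    def dereference(self):
        """Fetch the actual value from the provider."""
        return self.store[self.key]


def link(value, expects_value):
    """Pass a reference through untouched unless the consumer needs the value."""
    if expects_value and isinstance(value, DataReference):
        return value.dereference()
    return value


provider = {"genome-1": "ACGT" * 3}  # hypothetical stand-in for a remote data store
ref = DataReference(provider, "genome-1")

passed_on = link(ref, expects_value=False)  # reference-aware step: no data is moved
fetched = link(ref, expects_value=True)     # value-typed step: automatic de-reference
print(type(passed_on).__name__, fetched)
```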
22Streaming Data
- Allow execution of downstream workflow stages on
partially complete results from upstream.
[Diagram: Service 1, Service 2 and Service 3 in sequence]
- Non-streaming (Taverna 1): the entire iteration must
complete at each stage.
- Streamed data: Service 2 starts operating on
partial results from Service 1.
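Python generators give a compact, single-threaded approximation of the difference. In the sketch below the two toy services and the `log` list are assumptions for illustration; the log records the order in which items are handled, so the batch run shows Service 1 finishing the whole list first, while the streamed run interleaves the two services.

```python
def service1(items, log):
    """Toy upstream service: doubles each item as it is requested."""
    for item in items:
        log.append(f"s1:{item}")
        yield item * 2


def service2(items, log):
    """Toy downstream service: adds one to each item it receives."""
    for item in items:
        log.append(f"s2:{item}")
        yield item + 1


def run_batch(items):
    """Non-streaming: the whole list completes at each stage before the next starts."""
    log = []
    stage1 = list(service1(items, log))
    return list(service2(stage1, log)), log


def run_streamed(items):
    """Streaming: service2 consumes each result as soon as service1 produces it."""
    log = []
    return list(service2(service1(items, log), log)), log


print(run_batch([1, 2]))     # ([3, 5], ['s1:1', 's1:2', 's2:2', 's2:4'])
print(run_streamed([1, 2]))  # ([3, 5], ['s1:1', 's2:2', 's1:2', 's2:4'])
```

Both runs produce the same results; only the ordering changes, which is what lets downstream stages start early on long-running inputs.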
23Conclusions
- Taverna and its source code are free to download.
- http://taverna.sourceforge.net
- Taverna is being adopted by a number of different
disciplines outside its bio-science origins,
including chemoinformatics, social science and
astronomy.
- Open architecture and support for plugins to cope
with an open world allow expansion into other
areas
- User driven development
- Taverna users mailing list
- Taverna hackers mailing list
- Production quality software within OMII-UK
24Acknowledgements
- The myGrid group, past and present.
- OMII-UK
- All our users
- Carole Goble
- Katy Wolstencroft
- Daniele Turi
- Matthew Gamble
- Tom Oinn
- Paul Fisher