Workflows within Taverna - PowerPoint PPT Presentation

Slides: 29
Provided by: stuar48
Transcript and Presenter's Notes

Title: Workflows within Taverna


1
Workflows within Taverna
  • Stuart Owen
  • University of Manchester, UK
  • stuart.owen_at_manchester.ac.uk

2
What is a workflow?
  • Originated in the business world in the 1970s.
  • Coordinate units of work and the flow of
    documents according to some procedural rules, to
    describe and carry out a complex process within
    an organisation.
  • Adopted within the scientific world over the past
    decade.
  • Coordinate a series of computational tasks
    according to some procedural rules, to describe
    and execute a complex process within an
    experiment.

3
What is a workflow?
  • Data workflows
  • A task is invoked once its expected data has been
    received, and when complete passes any resulting
    data downstream.
  • B starts when it receives data from A.
  • C and D run in parallel when they receive data
    from B
  • E starts once it has received data from both C and
    D.
  • Control workflows
  • A task is invoked once the tasks it depends on have
    completed.
  • B starts when A has completed.
  • C and D run in parallel once B has completed
  • E starts once both C and D have completed.

(Diagram: workflow graph with tasks A, B, C, D, E and F)
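
The data workflow rule on this slide (a task runs once its expected data has arrived) can be sketched in a few lines of Python. This is only an illustration, not Taverna code: the `UPSTREAM` graph, `run_dataflow` and the toy `label` task are hypothetical names, and the graph covers A to E as described in the bullets.

```python
from typing import Callable, Dict, List

# Hypothetical graph from the slide: A -> B -> (C, D) -> E.
# Each task lists the upstream tasks whose data it waits for.
UPSTREAM: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["C", "D"],
}


def run_dataflow(upstream: Dict[str, List[str]],
                 task: Callable[[str, List[str]], str]) -> Dict[str, str]:
    """Invoke each task once all of its expected inputs have arrived."""
    results: Dict[str, str] = {}
    pending = dict(upstream)
    while pending:
        # A task is ready when every upstream result is available.
        ready = sorted(name for name, deps in pending.items()
                       if all(d in results for d in deps))
        if not ready:
            raise ValueError("cycle or missing task in workflow graph")
        for name in ready:  # tasks in the same batch could run in parallel
            results[name] = task(name, [results[d] for d in pending[name]])
            del pending[name]
    return results


def label(name: str, inputs: List[str]) -> str:
    """Toy task: record which upstream data it consumed."""
    return name + "(" + ",".join(inputs) + ")"


outputs = run_dataflow(UPSTREAM, label)
print(outputs["E"])
```

A control workflow would use the same readiness loop but pass no data between tasks; only the completion of the upstream tasks would matter.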
4
Advantages of workflows
5
Advantages of workflows
  • High-level abstraction
  • Easier to understand and modify.
  • Easier to describe and discuss with others.
  • Describes what you want to do, not how to do it.
  • Automation
  • Sharing and re-use
  • Either on its own, or within other workflows!

6
Workflows within Taverna
  • A hybrid between data and control workflows.
  • Predominantly based around the flow of data.
  • Service oriented workflows. Services may or may not
    be grid enabled.
  • High-level GUI approach separated from lower
    level coding; you don't have to be a coder to
    build a workflow.
  • Enactment can take place separately from the GUI,
    allowing workflows to be executed from the
    command line or within other systems.

7
(No Transcript)
8
Taverna 1.4 Workbench
  • Integral part of the myGrid project
  • Java based, runs on Windows, Mac OS, Linux,
    Solaris.
  • Open source and user driven development
  • 1000 downloads of the current version over the past
    month
  • Over 3000 downloads of version 1.3.1
  • Over 10000 downloads in total
  • http://taverna.sourceforge.net
  • Taverna in OMII-UK
  • Dedicated team of developers focused on design,
    implementation, testing and support leading to
    production quality software.
  • Development of Taverna 2.0

9
Taverna 1.4 workbench
10
(No Transcript)
11
SCUFL (Simple Conceptual Unified Flow Language)
(Diagram: architecture layers, top to bottom)
  • Taverna Workbench
  • Application data flow layer: Scufl graph, service
    introspection
  • Scufl Workflow Object Model
  • Execution flow layer: list management, implicit
    iteration mechanism, MIME semantic type
    decoration, fault management, service alternates
  • Workflow Execution: Freefluo workflow enactor
  • Processor invocation layer: one Processor per
    service type (Bio MOBY, Plain Web Service,
    Soaplab, Local App, Enactor, ?)
12
Taverna Processor
  • Primary component of a Scufl workflow.
  • Represents a unit of work: a task.
  • Data flows between processors.
  • Most are associated with some sort of external
    resource, for example a WSDL based web service.
  • Also includes basic local widgets, most
    commonly used for data format transformations
    (shims).
  • Follow a standard architecture pattern and are
    extensible through plugins: you can create your
    own for your specific needs. (But the plugin needs
    to be shared in order to share your workflow.)

13
Nested workflows
  • A processor can be a workflow itself.
  • Encourages the reuse of workflows within a more
    complex scenario.
  • Greater abstraction of an overall process making
    it more manageable.

14
(No Transcript)
15
Iterations
  • Scufl handles iterations implicitly
  • i.e. Taverna handles it automagically; there's no
    need for the user to indicate that an
    iteration is required.
  • Taverna recognises the data mismatch (a list
    supplied where a single item is expected) and
    repeatedly runs the task over each data element
    in the list.
  • Iteration strategy with multiple inputs can be
    configured.
  • Cross product: all against all
  • Dot product: first against first, second
    against second, etc.
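
The two iteration strategies can be illustrated with Python's standard library. This is a sketch of the idea only, not how Taverna implements it: `align` is a hypothetical single-item task, and the helper names are made up for the example.

```python
from itertools import product
from typing import Callable, List


def align(query: str, target: str) -> str:
    """Hypothetical task that expects one value per input, not a list."""
    return f"{query}~{target}"


def cross_product(task: Callable[[str, str], str],
                  xs: List[str], ys: List[str]) -> List[str]:
    """All against all: every element of xs paired with every element of ys."""
    return [task(x, y) for x, y in product(xs, ys)]


def dot_product(task: Callable[[str, str], str],
                xs: List[str], ys: List[str]) -> List[str]:
    """First against first, second against second, and so on."""
    return [task(x, y) for x, y in zip(xs, ys)]


queries = ["q1", "q2"]
targets = ["t1", "t2"]
cross = cross_product(align, queries, targets)
dot = dot_product(align, queries, targets)
print(cross)
print(dot)
```

With two inputs of length 2, the cross product invokes the task four times and the dot product twice. Note that Python's `zip` silently stops at the shorter list; the slides do not say how Taverna treats lists of unequal length.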

16
What about when a service fails?
  • Most services are owned by other people
  • No control over service failure
  • Some are research level
  • Workflows are only as good as the services they
    connect!
  • To help, Taverna can:
  • Notify failures
  • Instigate retries
  • Set criticality
  • Substitute alternative services
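
The four measures listed above can be combined in one small wrapper. The sketch below is an assumption about how they might fit together, not Taverna's actual fault-handling code: `ServiceFailure`, `invoke_with_fallback` and the two toy services are hypothetical.

```python
import time
from typing import Callable, List, Optional, Sequence


class ServiceFailure(Exception):
    """Raised by a (hypothetical) service call that did not succeed."""


def invoke_with_fallback(services: Sequence[Callable[[str], str]],
                         payload: str,
                         notify: Callable[[str], None],
                         retries: int = 2,
                         delay: float = 0.0,
                         critical: bool = True) -> Optional[str]:
    """Try the primary service, retry on failure, then try the alternates."""
    for service in services:              # primary first, then alternates
        for attempt in range(1 + retries):
            try:
                return service(payload)
            except ServiceFailure as err:
                # Notify: report every failed attempt to the caller.
                notify(f"{service.__name__} attempt {attempt + 1}: {err}")
                time.sleep(delay)
    if critical:
        # A critical task that cannot be completed fails the whole run.
        raise ServiceFailure("all services failed")
    return None                           # non-critical: carry on without a result


def primary(seq: str) -> str:
    raise ServiceFailure("timeout")


def mirror(seq: str) -> str:
    return seq.upper()


notices: List[str] = []
result = invoke_with_fallback([primary, mirror], "acgt", notices.append, retries=1)
print(result, notices)
```

Here the primary service fails twice (one call plus one retry), both failures are reported, and the alternate supplies the result.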

17
Taverna Processor Task State Transition Diagram
18
Provenance Data?
  • Supports scientific method and best practice
  • Metadata about the origin of a resource (workflow,
    service, data, experiment hypothesis, etc.) and
    the process by which a resource was generated.
  • The Who?, What?, When?, Where? and Why? about
    resources.
  • Stored as RDF triples
  • Also available as OWL, opening it up to complex
    reasoning

19
Typed Workflow Run
(Diagram: a workflow run described with the Provenance Ontology)
  • Classes: Experimenter, Organization, Workflow,
    WorkflowRun, ProcessRun
  • Properties: launchedBy, executed, belongsTo, runs
  • Instances: urnlsidworkflow6, urnlsidorgHY7,
    urnlsid..wfInstance8, urnlsidperson4,
    urnlsidprocessRun84, urnlsidprocessRun51
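
Provenance stored as triples can be pictured as a list of (subject, property, object) statements. The sketch below uses plain Python tuples rather than an RDF library. The identifiers and property names are copied as they appear in this transcript, but which instance each property connects is an assumption inferred from the class names, not something the transcript states.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

RUN = "urnlsid..wfInstance8"   # identifier kept exactly as in the transcript

# ASSUMPTION: the subject/object pairings below are guessed from the
# class names (WorkflowRun, Workflow, Experimenter, Organization, ProcessRun).
TRIPLES: List[Triple] = [
    (RUN, "runs", "urnlsidworkflow6"),
    (RUN, "launchedBy", "urnlsidperson4"),
    ("urnlsidperson4", "belongsTo", "urnlsidorgHY7"),
    (RUN, "executed", "urnlsidprocessRun84"),
    (RUN, "executed", "urnlsidprocessRun51"),
]


def objects(triples: List[Triple], subject: str, prop: str) -> List[str]:
    """Return every object linked to `subject` by `prop`."""
    return [o for s, p, o in triples if s == subject and p == prop]


def organisations_behind(triples: List[Triple], run: str) -> List[str]:
    """Follow launchedBy, then belongsTo: who is behind this run?"""
    return [org
            for person in objects(triples, run, "launchedBy")
            for org in objects(triples, person, "belongsTo")]


process_runs = objects(TRIPLES, RUN, "executed")
orgs = organisations_behind(TRIPLES, RUN)
print(process_runs, orgs)
```

Chaining two lookups answers a "Who?" question that no single statement records directly, which is the kind of query a triple store makes straightforward.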
20
Provenance Browser
21
New plans for Taverna 2.0
22
Evolving challenges
  • Long running data intensive workflows
  • Manipulation of confidential or otherwise
    protected information
  • Use with classical grid systems
  • Publishing and sharing of workflows
  • Better use of provenance

23
Runtime Service Binding
  • Service definition consists of an abstract
    description
  • Resolved at workflow runtime to one or more
    concrete resources by a broker
  • Allows load balancing or economic model based
    service selection over grid environments
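
A broker that resolves an abstract service description at run time might look like the following sketch. Everything here is hypothetical (the `Broker` class, the `load` metric, the example.org URLs); it only illustrates load-based selection, one of the options the slide mentions, and is not the Taverna 2.0 design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ConcreteService:
    url: str
    load: float  # hypothetical load metric: 0.0 idle, 1.0 saturated


class Broker:
    """Maps an abstract service name to one concrete resource at run time."""

    def __init__(self, registry: Dict[str, List[ConcreteService]]) -> None:
        self._registry = registry

    def resolve(self, abstract_name: str) -> ConcreteService:
        candidates = self._registry.get(abstract_name, [])
        if not candidates:
            raise LookupError(f"no concrete service for {abstract_name!r}")
        # Load balancing: pick the least loaded candidate.
        return min(candidates, key=lambda s: s.load)


broker = Broker({
    "sequence-alignment": [
        ConcreteService("http://node-a.example.org/align", load=0.9),
        ConcreteService("http://node-b.example.org/align", load=0.2),
    ],
})
chosen = broker.resolve("sequence-alignment")
print(chosen.url)
```

The workflow itself only names "sequence-alignment"; swapping the `key` function would turn the same broker into a cost-based (economic model) selector.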

24
Processor Dispatch Stack
25
3rd party data transfers
  • Allows in place referencing of data
  • Large data sets no longer round-trip between
    workflow engine and data provider
  • Allows restricted access to sensitive data
  • Automatic de-reference when a reference type is
    linked to a value type within a workflow.
  • Connecting a grid service to a web service
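
Automatic de-referencing can be sketched as a check made just before a value-typed service is called. The names below (`DataReference`, `STORE`, `call_value_service`, the `ref://` URI) are hypothetical and the in-memory dictionary stands in for a remote data provider.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class DataReference:
    """Points at data held by a provider instead of carrying the data."""
    uri: str


# Hypothetical data provider: in practice the data would live remotely.
STORE: Dict[str, str] = {"ref://example/genome": "ACGT" * 3}


def dereference(ref: DataReference) -> str:
    """Fetch the value a reference points to."""
    return STORE[ref.uri]


def call_value_service(service: Callable[[str], int],
                       arg: Union[str, DataReference]) -> int:
    """De-reference only when a reference meets a value-typed input."""
    value = dereference(arg) if isinstance(arg, DataReference) else arg
    return service(value)


by_reference = call_value_service(len, DataReference("ref://example/genome"))
by_value = call_value_service(len, "AC")
print(by_reference, by_value)
```

A service that accepts references would be handed the `DataReference` untouched, so the data never passes through the workflow engine.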

26
Streaming Data
  • Allow execution of downstream workflow stages on
    partially complete results from upstream.

(Diagram: pipeline of Service 1, Service 2 and Service 3)
  • Non streaming (Taverna 1): the entire iteration must
    complete at each stage.
  • Streamed data: Service 2 starts operating on
    partial results from Service 1.
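
Python generators give a compact picture of the difference. This is an analogy rather than Taverna code: `service1` and `service2` are toy stages, and the log records the order in which items are processed.

```python
from typing import Iterable, Iterator, List


def service1(items: Iterable[int], log: List[str]) -> Iterator[int]:
    for item in items:
        log.append(f"s1:{item}")
        yield item * 2          # hand each result downstream immediately


def service2(items: Iterable[int], log: List[str]) -> Iterator[int]:
    for item in items:
        log.append(f"s2:{item}")
        yield item + 1


# Non streaming: stage 1 finishes the whole list before stage 2 starts.
batch_log: List[str] = []
stage1 = list(service1([1, 2, 3], batch_log))
batch = list(service2(stage1, batch_log))

# Streaming: stage 2 consumes each partial result as soon as it exists.
stream_log: List[str] = []
streamed = list(service2(service1([1, 2, 3], stream_log), stream_log))

print(batch_log)
print(stream_log)
```

Both runs produce the same results; only the interleaving differs, which is what lets downstream stages start early on long-running workflows.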
27
Conclusions
  • Taverna and its source code are free to download.
  • http://taverna.sourceforge.net
  • Taverna is being adopted by a number of different
    disciplines outside its bio-science origins,
    including chemoinformatics, social science and
    astronomy.
  • Open architecture and support for plugins to cope
    with an open world allow expansion into other
    areas
  • User driven development
  • Taverna users mailing list
  • Taverna hackers mailing list
  • Production quality software within OMII-UK

28
Acknowledgements
  • The myGrid group, past and present.
  • OMII-UK
  • All our users
  • Carole Goble
  • Katy Wolstencroft
  • Daniele Turi
  • Matthew Gamble