Title: e-BioFlow: Different perspectives on scientific workflows
1 e-BioFlow: Different perspectives on scientific workflows
- Ingo Wassink (1), Paul van der Vet, Anton Nijholt
- Human Media Interaction Group, University of Twente
- Han Rauwerda, Timo Breit
- Micro Array Department, University of Amsterdam
- (1) i.wassink_at_ewi.utwente.nl
2 Overview
- Scientific workflow systems
- Problems using workflow systems
- e-BioFlow
- Control flow perspective
- Data flow perspective
- Resource perspective
- Relations between perspectives
- Conclusion
- Current state
- Future work
3 Scientific workflow systems
- A lot of data is stored in online databases
- SwissProt, PDB
- Access to many services in a uniform manner
- BioMoby, WSDL, SoapLab
- Using a graphical workflow tool
- Connect these services and databases
- Share experiments and experimental results with others
- Discuss, improve and reuse experiments
- myExperiment (www.myexperiment.org)
- Automatically store provenance data
4 Problems using workflow systems
- Workflow systems are often difficult to use due to a complex user interface
- It is not possible to model advanced control structures
- Loops
- Choices
- A priori knowledge about services is required
- Function
- Input and output
- Data produced by one service is often incompatible with data consumed by others
- If a service is not available, the workflow needs to be modified
5 e-BioFlow (I)
- Graphical tool: the user interacts with the workflow directly
- Develop templates for experiments
- Abstract from web services until workflow execution
- Enable complex workflow structures
- Sequential, parallel, iteration and choices
- Shows limited information at a time to prevent information overload
- But does not restrict the user in modeling workflows
- Distributes information using a tabbed user interface
- Ultimate goal: improve usability of workflow systems
6 e-BioFlow (II): Screenshots
Control flow perspective
Data flow perspective
Resource perspective
7 Control flow perspective (I)
- Defines the order of task execution
- Uses dependencies: a task can depend on prior tasks
- Advanced control structures to define the order of task execution
- Sequential: a task needs to wait for completion of prior task(s)
- Parallel: tasks can be executed at the same time
- Iterative: a task needs to be repeated until some criterion is met
- Branching: the execution of a task depends on a certain criterion
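As a rough illustration of how dependencies determine sequential and parallel execution, the sketch below groups tasks into rounds with Python's standard `graphlib` module. The task names are hypothetical and this is not e-BioFlow's own scheduling code; iteration and branching are left out.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each task maps to the set of prior tasks it depends on.
dependencies = {
    "fetch_sequence": set(),
    "run_blast": {"fetch_sequence"},
    "fetch_structure": {"fetch_sequence"},
    "report": {"run_blast", "fetch_structure"},
}

def execution_rounds(deps):
    """Group tasks into rounds; tasks within one round may run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    rounds = []
    while sorter.is_active():
        # All tasks whose prior tasks have completed are ready now.
        ready = sorted(sorter.get_ready())
        rounds.append(ready)
        sorter.done(*ready)
    return rounds

rounds = execution_rounds(dependencies)
```

Here `run_blast` and `fetch_structure` end up in the same round because neither depends on the other, while `report` waits for both.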
8 Control flow perspective (II)
- AND: execute next tasks in parallel
- XOR: execute one of the next tasks, depending on conditions
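The difference between the two split types can be sketched in a few lines. The function names, task names and the `context` dictionary are assumptions made for illustration, not part of e-BioFlow or its workflow engine.

```python
def and_split(next_tasks):
    """AND split: every outgoing task is enabled (they may run in parallel)."""
    return list(next_tasks)

def xor_split(branches, context):
    """XOR split: enable exactly one task, the first whose condition holds.

    `branches` is a list of (condition, task) pairs evaluated in order
    against the workflow `context`.
    """
    for condition, task in branches:
        if condition(context):
            return [task]
    raise ValueError("no XOR branch condition matched")

# Hypothetical branches: align the hits if any were found, otherwise report none.
branches = [
    (lambda ctx: ctx["hits"] > 0, "align_hits"),
    (lambda ctx: True, "report_no_hits"),  # default branch
]

enabled_parallel = and_split(["run_blast", "fetch_structure"])
enabled_choice = xor_split(branches, {"hits": 0})
```

Ending the branch list with an always-true condition is one way to make sure an XOR split never leaves the workflow without a next task.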
9 Data flow perspective
- A task can require information from prior tasks
- Tasks have input and output ports to consume and produce data
- Pipes are used to define output of prior tasks being input for next tasks
- Type restrictions on data are tested
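A minimal sketch of ports, pipes and a type check is given below. The class names, the string-based type labels and the exact-match rule are assumptions for illustration; the slides do not say how e-BioFlow represents or compares data types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    task: str       # task the port belongs to
    name: str       # port name
    data_type: str  # hypothetical type label, e.g. "FASTA"

class TypeMismatch(Exception):
    """Raised when a pipe would connect ports of different data types."""

def connect(pipes, source, target):
    """Add a pipe from an output port to an input port if the types agree."""
    if source.data_type != target.data_type:
        raise TypeMismatch(
            f"{source.task}.{source.name} produces {source.data_type}, "
            f"but {target.task}.{target.name} consumes {target.data_type}"
        )
    pipes.append((source, target))
    return pipes

pipes = []
fasta_out = Port("fetch_sequence", "sequence", "FASTA")
blast_in = Port("run_blast", "query", "FASTA")
connect(pipes, fasta_out, blast_in)
```

Checking the types when the pipe is drawn, rather than at execution time, lets an incompatible connection be rejected while the workflow is still being modeled.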
10 Resource perspective
- Defines what type of action (service) needs to be executed instead of which actor (web service, tool, user) to invoke
- Actor is chosen at runtime
- Uses roles to describe constraints on actors
- A role defines the required capabilities of an actor
- Service type
- Input and output it should consume and produce
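Choosing an actor at runtime on the basis of a role could look roughly like the sketch below. The `Role` and `Actor` classes, the matching rule and the registry contents are all hypothetical; they only illustrate the idea of late binding, not e-BioFlow's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    service_type: str
    inputs: frozenset   # data types the task supplies
    outputs: frozenset  # data types the task expects back

@dataclass(frozen=True)
class Actor:
    name: str
    service_type: str
    inputs: frozenset
    outputs: frozenset
    available: bool = True

def fulfils(actor, role):
    """Assumed rule: same service type, needs no extra input, yields all outputs."""
    return (
        actor.available
        and actor.service_type == role.service_type
        and actor.inputs <= role.inputs
        and role.outputs <= actor.outputs
    )

def bind(role, registry):
    """Late binding: pick the first suitable actor when the task is executed."""
    for actor in registry:
        if fulfils(actor, role):
            return actor
    raise LookupError(f"no available actor fulfils role {role.service_type!r}")

# Hypothetical registry: the first BLAST service is currently offline.
registry = [
    Actor("offline_blast", "blast", frozenset({"sequence"}), frozenset({"hits"}), available=False),
    Actor("mirror_blast", "blast", frozenset({"sequence"}), frozenset({"hits", "log"})),
]
blast_role = Role("blast", frozenset({"sequence"}), frozenset({"hits"}))
chosen = bind(blast_role, registry)
```

Because the workflow only names the role, the unavailable service is skipped at runtime and the workflow itself does not have to change.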
11 Relations between perspectives (I)
- A central workflow specification is shared and edited by the different perspectives
- Changes in one perspective are propagated to the other perspectives, wherever applicable
- Visual effects
- To ease switching between perspectives
- Task positions and size
- Zoom level
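One common way to keep several views in step with a shared specification is a listener (observer) mechanism. The sketch below assumes that approach; the slides do not state how e-BioFlow propagates changes, and all class and method names are hypothetical.

```python
class WorkflowSpec:
    """Central specification shared by all perspectives."""

    def __init__(self):
        self.positions = {}
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def move_task(self, task, position):
        # Update the shared model, then notify every perspective.
        self.positions[task] = position
        for listener in self._listeners:
            listener(task, position)

class Perspective:
    """A view that mirrors task positions from the shared specification."""

    def __init__(self, title, spec):
        self.title = title
        self.positions = {}
        spec.subscribe(self.on_task_moved)

    def on_task_moved(self, task, position):
        self.positions[task] = position

spec = WorkflowSpec()
control_flow = Perspective("control flow", spec)
data_flow = Perspective("data flow", spec)
spec.move_task("run_blast", (120, 80))
```

After the call, both perspectives show `run_blast` at the same position, which is the kind of visual continuity that eases switching between tabs.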
12 Relations between perspectives (II)
- If data is transferred between tasks, this implies the existence of a dependency between these tasks
Task requires information from one of the prior tasks
Task requires information from both prior tasks
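The rule that a data transfer implies a dependency can be sketched by deriving control flow dependencies from the pipes. Pipes are simplified here to (source task, target task) pairs and the task names are hypothetical.

```python
def implied_dependencies(pipes):
    """Map each task to the set of prior tasks it receives data from."""
    deps = {}
    for source_task, target_task in pipes:
        deps.setdefault(source_task, set())
        deps.setdefault(target_task, set()).add(source_task)
    return deps

# Hypothetical pipes: the alignment task receives data from both prior tasks.
pipes = [("fetch_a", "align"), ("fetch_b", "align")]
deps = implied_dependencies(pipes)
```

The result lists both `fetch_a` and `fetch_b` as prior tasks of `align`. The derivation runs in this direction only: a dependency can exist without any data being transferred.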
13 Relations between perspectives (III)
- If a task requires input and output, this puts
constraints on the suitability of actors, and
vice versa
An alignment task requires two input sequences
A Blast task requires only one input
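The alignment and Blast examples can be read as a filter on candidate services by the inputs a task provides. The sketch below assumes that inputs are compared as multisets of type labels; the service names and the matching rule are hypothetical.

```python
from collections import Counter

# Hypothetical services and the input types each one consumes.
services = {
    "blast_service": ["sequence"],
    "alignment_service": ["sequence", "sequence"],
}

def suitable_actors(task_inputs, services):
    """Return the services whose inputs match the task's inputs exactly."""
    wanted = Counter(task_inputs)
    return sorted(
        name for name, inputs in services.items() if Counter(inputs) == wanted
    )

for_alignment = suitable_actors(["sequence", "sequence"], services)
for_blast = suitable_actors(["sequence"], services)
```

Using a `Counter` keeps the number of inputs significant, so a task with two sequence inputs is not matched to a service that consumes only one.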
14 Conclusion
- e-BioFlow enables one to create advanced control structures
- By abstracting from services, it is possible to design experiment templates
- The amount of information presented to the user is limited by providing different perspectives to the user
- However, an executable environment is required to test the usability of this approach
15 Current state
- Integration of a workflow engine
- A control flow workflow engine: YAWL (www.yawlfoundation.org)
- Late binding of services
- Support for different types of services
- Support for WSDL/SOAP and BioMOBY services
- Support for scripting tasks (R, Perl, BeanShell)
- User interaction tasks
- New types of services can easily be created using a plugin structure
- Framework for storing provenance data
16 Future work
- Creating workflows ad-hoc
- Directly execute tasks during workflow construction
- Redo steps, take alternative steps
- Store and browse provenance data
- Provide a user interface closely related to the workflow model
- Improve mapping between actors and roles
- Mapping between different ontology structures is required
17 Acknowledgement
- Pieter Neerincx
- Laboratory of Bioinformatics, Wageningen University
- Wim de Leeuw
- MAD, University of Amsterdam
- Matthijs Ooms
- HMI, University of Twente
- This work was part of the BioRange programme of the Netherlands Bioinformatics Centre (NBIC), which is supported by a BSIK grant through the Netherlands Genomics Initiative (NGI).
18 Thanks
Questions
Advice
Remarks
Ideas
More information
e-BioFlow is open source: http://ewi.utwente.nl/biorange/ebioflow