Title: GRADD: Scientific Workflows
1. GRADD Scientific Workflows
2. Scientific Workflow: E. Science laboris
- Workflows are the new rock and roll of eScience
- Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources
- Era of service-oriented applications (SOA)
- Repetitive, mundane tasks made easier (data cleaning...)
- Facilitates sharing of science
3. Trident Scientific Workflow Workbench
- Visually program workflows, through a web browser
- Libraries of activities, workflows and services
- Social annotations and search
- Abstract parallelism, for HPC, many-core (CCR)
- Adaptive workflows, to detect and respond to events
- Automatic provenance capture, Open Provenance Model
- Costing model; resources include time, power, data transfer
- Integrated data storage and access
- Integrated visualization tools
- Fault tolerance, facilitate smart reruns, what-if analysis
- Factory scheduling of workflows
4. Trident Implementation: Built on top of an industrial workflow engine
- Windows Workflow Foundation
- Workflow in a general purpose framework
- Part of Microsoft's .NET Framework 3.5
5. Trident Logical Architecture
- Domain specific custom activities
- Visual Workflow Designer
- Runtime Services
- Provenance
- Fault Tolerance
- HPC Scheduling Service
- Monitoring Service
- Registry
- Runtime Admin Tools
- Community Site
6. Activities: An Extensible Approach
Domain-Specific Workflow Packages
Custom Activity Libraries
Base Activity Library
RosettaNet
Biology
Out-of-Box Activities
CRM
Oceanography
- Create/Extend/Compose activities
- Read from sensors, data pipelines, etc.
- First-class citizens
- OOB activities, workflow types
- General-purpose basic workflow constructs
- Domain-specific activities
- Domain-specific workflow packages, e.g. oceanography
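The layering on this slide (general-purpose constructs at the base, domain-specific activities composed on top) can be sketched roughly as follows. This is an illustration in Python, not Trident's API: Trident builds on Windows Workflow Foundation, and every class name here (`Activity`, `Sequence`, `ReadSensor`, `DropMissing`, `Mean`) is a hypothetical stand-in, as are the sample readings.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class Activity(ABC):
    """Hypothetical base activity: one step that maps a context dict to a new one."""

    @abstractmethod
    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        ...


class Sequence(Activity):
    """General-purpose construct: runs its child activities in order."""

    def __init__(self, steps: List[Activity]) -> None:
        self.steps = steps

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        for step in self.steps:
            context = step.run(context)
        return context


class ReadSensor(Activity):
    """Domain-specific activity (illustrative): supplies canned sensor readings."""

    def __init__(self, readings: List[Optional[float]]) -> None:
        self.readings = readings

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        return {**context, "temperatures": list(self.readings)}


class DropMissing(Activity):
    """Data-cleaning activity: removes missing (None) readings."""

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        clean = [t for t in context["temperatures"] if t is not None]
        return {**context, "temperatures": clean}


class Mean(Activity):
    """Analysis activity: averages the cleaned readings."""

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        temps = context["temperatures"]
        return {**context, "mean_temperature": sum(temps) / len(temps)}


# Compose a small "oceanography" workflow out of reusable activities.
workflow = Sequence([ReadSensor([10.0, None, 14.0]), DropMissing(), Mean()])
result = workflow.run({})
print(result["mean_temperature"])
```

Because `Sequence` is itself an `Activity`, a composed workflow can be nested inside a larger one, which is one way to read "first-class citizens" on the slide.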
7. Trident Workflow Designer: Visually compose, search and archive (share)
8. Workflow Execution Provenance
Scientists routinely record the provenance of bench experiments in lab notebooks; this is essential for computational experiments as well.
- For a workflow management system, provenance identifies what activities were executed, parameters supplied at runtime, data passed between activities, intermediate results generated, etc.
- Explains how a workflow result was created, in enough detail to establish trust
- Provides a replication recipe
- Guides development of future experiments
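The items listed above (activities executed, runtime parameters, data passed between activities, intermediate results) map naturally onto one record per executed step. A minimal sketch, assuming a hypothetical `ProvenanceLog` wrapper around plain Python functions; none of these names come from Trident:

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ProvenanceRecord:
    activity: str                # what was executed
    parameters: Dict[str, Any]   # parameters supplied at runtime
    inputs: Any                  # data passed into the activity
    output: Any                  # intermediate (or final) result
    started_at: float            # wall-clock start time


@dataclass
class ProvenanceLog:
    records: List[ProvenanceRecord] = field(default_factory=list)

    def run(self, name: str, func: Callable[..., Any], data: Any, **parameters: Any) -> Any:
        """Execute one activity and record what happened as a side effect."""
        started = time.time()
        output = func(data, **parameters)
        self.records.append(
            ProvenanceRecord(name, dict(parameters), data, output, started)
        )
        return output

    def recipe(self) -> List[Tuple[str, Dict[str, Any]]]:
        """The 'replication recipe': ordered activities with their parameters."""
        return [(r.activity, r.parameters) for r in self.records]


def scale(values: List[int], factor: int) -> List[int]:
    return [v * factor for v in values]


def total(values: List[int]) -> int:
    return sum(values)


log = ProvenanceLog()
scaled = log.run("scale", scale, [1, 2, 3], factor=2)
answer = log.run("total", total, scaled)
print(answer, log.recipe())
```

The caller never writes a record by hand; capture happens inside `run`, so the recipe stays in step with what was actually executed.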
9. Provenance in Trident
The enactment engine documents all steps linking original inputs with the final result, so execution can be verified, reproduced or rerun; provenance is a first-class data product in Trident.
- Provenance capture is automatic and transparent
- Will persist provenance data for a fixed period of time
- Supports multiple levels of representation
- Storage provided by underlying system
- Interface to query and reason over provenance data
- Efficient storage representation and query performance
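Linking original inputs to a final result amounts to walking a dependency graph backwards. The sketch below shows one way such a query could look; the graph layout, the file names, and the functions `lineage` and `original_inputs` are assumptions made for illustration and say nothing about how Trident stores or queries provenance.

```python
from typing import Any, Dict, List, Set

# Hypothetical provenance graph: each derived data product maps to the
# activity that produced it and the products that activity consumed.
derived_from: Dict[str, Dict[str, Any]] = {
    "cleaned.csv": {"activity": "DropMissing", "inputs": ["raw.csv"]},
    "gridded.nc": {"activity": "Interpolate", "inputs": ["cleaned.csv", "grid.json"]},
    "plot.png": {"activity": "Render", "inputs": ["gridded.nc"]},
}


def lineage(product: str) -> List[str]:
    """Activities needed to reproduce `product`, in execution order."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        # Products absent from the graph are original inputs: nothing to rerun.
        if name in seen or name not in derived_from:
            return
        seen.add(name)
        step = derived_from[name]
        for parent in step["inputs"]:
            visit(parent)
        ordered.append(step["activity"])

    visit(product)
    return ordered


def original_inputs(product: str) -> Set[str]:
    """Products that were supplied from outside rather than derived."""
    step = derived_from.get(product)
    if step is None:
        return {product}
    found: Set[str] = set()
    for parent in step["inputs"]:
        found |= original_inputs(parent)
    return found


print(lineage("plot.png"))
print(sorted(original_inputs("plot.png")))
```

The same traversal supports a rerun: replaying the activities returned by `lineage` in order, starting from the products returned by `original_inputs`, regenerates the result.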
10. Applications and Scientists Need a Curated Registry of Services
Just having a workflow system isn't enough, and it's not just about workflows...
Trident Registry
Note: Registry, not repository. Services are hosted elsewhere.
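The registry/repository distinction can be made concrete: a registry holds descriptions and pointers, while the services themselves run somewhere else. A rough sketch, in which `ServiceEntry`, `Registry`, the service names, and the example.org endpoints are all hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    endpoint: str            # pointer to where the service is hosted
    description: str
    tags: Tuple[str, ...]
    curated: bool = False    # set once a curator has reviewed the entry


class Registry:
    """Stores metadata about services, never the services themselves."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceEntry] = {}

    def register(self, entry: ServiceEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, tag: str, curated_only: bool = True) -> List[str]:
        """Names of services carrying `tag`, optionally limited to curated ones."""
        return sorted(
            e.name
            for e in self._entries.values()
            if tag in e.tags and (e.curated or not curated_only)
        )

    def resolve(self, name: str) -> str:
        """Return the external endpoint; the caller contacts the host directly."""
        return self._entries[name].endpoint


registry = Registry()
registry.register(ServiceEntry(
    "ctd-cleaner", "https://example.org/services/ctd-cleaner",
    "Removes bad samples from CTD casts", ("oceanography", "cleaning"), curated=True,
))
registry.register(ServiceEntry(
    "tide-model", "https://example.org/services/tide-model",
    "Experimental tide prediction", ("oceanography",),
))

print(registry.search("oceanography"))
print(registry.search("oceanography", curated_only=False))
print(registry.resolve("ctd-cleaner"))
```

Filtering on `curated` by default reflects the slide's emphasis on curation: uncurated entries remain discoverable, but only on request.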
11. A Curated Registry of Services
12. (and) Registry of Data Products
13. (and) Registry of Provenance