Title: Workshop on Data Derivation
1. Workshop on Data Derivation Provenance: Workflow Session
- Rick Cavanaugh, Rick Hull, Miron Livny, Mike Wilde
- 17 October 2002
- Chicago
2. Workflow Session Agenda
- 20 min: Overview
  - 5 min: Workflow research (Rick Hull)
  - 5 min: Workflow needs (Rick Cavanaugh)
  - 5 min: Provenance -> workflow (Mike Wilde)
  - 5 min: Workflow and the grid (Miron Livny)
- 70 min: Discussion
  - What can provenance learn from workflow?
3. Rick Hull: Workflow Overview
4. Workflow Essentials
- Terminology
  - Workflow model: the building blocks for workflow schemas
  - Workflow schema: (typically) a graph showing control flow and processing steps
  - Workflow enactment: an instance/occurrence/run through a workflow schema
- Key abstraction: focus on flow, not processing steps
  - Interpret the workflow schema
  - What you see is what you get, unlike query optimization
- Few workflow schemas, many workflow enactments
  - Many workflow enactments are pushed through the same schema
- Main issues
  - Systems: persistence, recovery, logging, audits; pushing work items to human agents; finishing workflow steps on time
  - Formal analysis of schemas, e.g., reachability, termination
  - Unresolved: exceptions, schema evolution
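To make the schema/enactment split concrete, here is a minimal Python sketch, assuming a workflow schema is simply a start step plus control-flow edges (the `WorkflowSchema` and `Enactment` classes and the step names are hypothetical). One schema is reused by many enactments, and the two formal analyses named above reduce to graph checks: reachability is a breadth-first search, and termination is approximated conservatively by requiring the schema to have no loops.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkflowSchema:
    """Hypothetical model: a start step plus control-flow edges between steps."""
    start: str
    edges: dict[str, list[str]] = field(default_factory=dict)

    def successors(self, step: str) -> list[str]:
        return self.edges.get(step, [])

    def reachable(self) -> set[str]:
        """Reachability analysis: steps some enactment could visit (BFS from start)."""
        seen = {self.start}
        queue = deque([self.start])
        while queue:
            for nxt in self.successors(queue.popleft()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def is_acyclic(self) -> bool:
        """Conservative termination check: no control-flow loop among reachable steps."""
        state: dict[str, str] = {}  # "active" while on the DFS stack, then "done"

        def visit(step: str) -> bool:
            state[step] = "active"
            for nxt in self.successors(step):
                if state.get(nxt) == "active":
                    return False  # back edge, i.e. a loop in control flow
                if nxt not in state and not visit(nxt):
                    return False
            state[step] = "done"
            return True

        return visit(self.start)


@dataclass
class Enactment:
    """One run through a schema; many enactments share one schema."""
    schema: WorkflowSchema
    run_id: int
    log: list[str] = field(default_factory=list)

    def run(self, choose=lambda options: options[0], max_steps: int = 1000) -> list[str]:
        """Follow control flow from the start, calling `choose` at branch points."""
        step = self.schema.start
        for _ in range(max_steps):
            self.log.append(step)
            options = self.schema.successors(step)
            if not options:
                return self.log
            step = choose(options)
        raise RuntimeError("enactment did not finish within max_steps")


schema = WorkflowSchema("receive", {
    "receive": ["check"],
    "check": ["approve", "reject"],
    "approve": ["archive"],
    "reject": ["archive"],
    "orphan": ["archive"],  # never reachable from "receive"
})
print(sorted(schema.reachable()))  # ['approve', 'archive', 'check', 'receive', 'reject']
print(schema.is_acyclic())         # True
runs = [Enactment(schema, run_id=i).run() for i in range(3)]
print(runs[0])                     # ['receive', 'check', 'approve', 'archive']
```

The acyclicity test is deliberately conservative: a schema with a loop that always exits would still be flagged, which is one reason termination analysis is listed as a hard problem.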
5. Workflow Models
- Variety of commercial and academic workflow models
  - Process flow: x-flowcharts, Petri nets, WSFL
  - Finite state automata, statecharts
  - Action Workflow: based on a model of conversations
- Wil van der Aalst: patterns in workflow models
  - Focus on process-flow models, covering all major commercial workflow systems
  - About 24 patterns identified (no surprises)
    - Flowchart constructs, parallelism, variants on choice (e.g., do m out of the n choices)
- Things not modeled
  - The data being processed, the internals of the process steps
- Query language for workflow schemas
  - Similarity, composition, evolution
  - Aggregations of multiple enactments
  - There is some recent work
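One reading of "do m out of the n choices" is a partial join: n branches start in parallel and the flow continues once m of them have finished. A minimal Python sketch, assuming each branch has a known simulated duration (the `Branch` type, branch names, and durations are hypothetical), so the join can be computed without real concurrency:

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    name: str
    duration: float  # simulated time until this branch completes (assumption)


def m_out_of_n_join(branches: list[Branch], m: int) -> tuple[float, list[str]]:
    """All n branches start together; the join fires when the m-th one finishes.

    Returns the firing time and the names of the m branches that triggered it.
    Branches that finish later are ignored by this join.
    """
    if not 1 <= m <= len(branches):
        raise ValueError("need 1 <= m <= n")
    first_m = heapq.nsmallest(m, branches, key=lambda b: b.duration)
    return first_m[-1].duration, [b.name for b in first_m]


branches = [Branch("review_a", 3.0), Branch("review_b", 1.5), Branch("review_c", 2.0)]
print(m_out_of_n_join(branches, 2))  # (2.0, ['review_b', 'review_c'])
print(m_out_of_n_join(branches, 1))  # (1.5, ['review_b'])
```

With m = 1 this behaves like van der Aalst's discriminator pattern, and with m = n it becomes an ordinary synchronizing join, so a single parameter covers the family of choice/join variants.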
6. Rick Cavanaugh: HEP Application
7. [Figure: HEP application workflow. Labels: Generator, Formator, Simulator, Digitiser, writeESD, writeAOD, writeTAG, ODBMS, Analysis Scripts, Calib. DB]
8. [Figure: HEP data products. Labels: Generated, Re-formatted, Simulated, Digitised, Ntuple, Combined Ntuple, OODBMS]
9. [Figure: HEP data products. Labels: Generated, Re-formatted, Simulated, Digitised, Combined Ntuples, OODBMS]
10. Mike Wilde: Virtual Data Interface
11. Virtual Data in the Science Process
12. Virtual Data Language XML
(Older version. Latest at http://www.griphyn.org/workspace/VDS/vdl-1.20/vdl-1.20.png)
13. Managing Dependencies
    TR tr1( out a2, in a1 ) {
      profile hints.exec-pfn = "/usr/bin/app1";
      argument stdin = ${a1};
      argument stdout = ${a2};
    }
    TR tr2( out a2, in a1 ) {
      profile hints.exec-pfn = "/usr/bin/app2";
      argument stdin = ${a1};
      argument stdout = ${a2};
    }
    DV x1->tr1( a2=@{out:file2}, a1=@{in:file1} );
    DV x2->tr2( a2=@{out:file3}, a1=@{in:file2} );
[Figure: file1 -> x1 -> file2 -> x2 -> file3]
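The two derivations are linked only through the files they name: x2 reads file2, which x1 writes. A minimal Python sketch of how such dependencies can be inferred and used to plan the materialization of a requested file, assuming each DV is reduced to its input and output logical file names (the `Derivation` class and `plan` function are hypothetical, not part of the VDS tools):

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Derivation:
    """A DV statement reduced to the logical files it reads and writes (hypothetical model)."""
    name: str
    transformation: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def dependency_graph(dvs: list[Derivation]) -> dict[str, set[str]]:
    """Map each derivation to the derivations that produce its input files."""
    producer: dict[str, str] = {}
    for dv in dvs:
        for f in dv.outputs:
            if f in producer:
                raise ValueError(f"{f} is produced by both {producer[f]} and {dv.name}")
            producer[f] = dv.name
    # Files nobody produces (file1 here) are treated as raw inputs, so they add no edge.
    return {dv.name: {producer[f] for f in dv.inputs if f in producer} for dv in dvs}


def plan(target: str, dvs: list[Derivation]) -> list[str]:
    """Derivations needed to materialize `target`, producers before consumers."""
    graph = dependency_graph(dvs)
    producer = {f: dv.name for dv in dvs for f in dv.outputs}
    needed: set[str] = set()
    stack = [producer[target]]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(graph[name])
    # TopologicalSorter expects node -> predecessors, which is what graph stores.
    return list(TopologicalSorter({n: graph[n] for n in needed}).static_order())


dvs = [
    Derivation("x1", "tr1", inputs=("file1",), outputs=("file2",)),
    Derivation("x2", "tr2", inputs=("file2",), outputs=("file3",)),
]
print(dependency_graph(dvs))  # {'x1': set(), 'x2': {'x1'}}
print(plan("file3", dvs))     # ['x1', 'x2']
print(plan("file2", dvs))     # ['x1']
```

Requesting file2 schedules only x1; requesting file3 pulls in x1 first because x2 consumes its output.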
14. Workflow in the Virtual Data Grid
15. Miron Livny: Running Workflow on the Grid
16. Workflow Execution on the Grid
17. Discussion Topics
- What can provenance learn from workflow?
  - There are several workflow models; what is the right data derivation model?
  - Expanding the p-schema to handle the temporal declarations that workflow systems provide
  - More: triggers, error handling/recovery, and propagation recovery
  - Tracking manual workflow (e.g., Excel, SAS, ROOT)
  - Viewpoint: focus on the provenance schema object
18. Derivation Schema as a New Abstract Data Type
- What are the right building blocks to construct derivation schemas (composite transformations, derivation DAGs, ...)?
- Workflow model -> workflow schema; derivation model -> derivation schema
- E.g., iteration?
- Aggregates of derivation schemas seem important
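One way to explore the building-blocks question is to treat a derivation schema as a small algebraic data type. The sketch below is hypothetical (none of these names come from the VDL): a schema is a primitive transformation, a composite (sequential) transformation, or an iteration with an assumed fixed repeat count, and `expand` unfolds it into the flat chain of derivations one enactment would record.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Transform:
    """Primitive building block, e.g. a single TR such as tr1."""
    name: str


@dataclass(frozen=True)
class Seq:
    """Composite transformation: each part's output feeds the next part."""
    parts: tuple["Schema", ...]


@dataclass(frozen=True)
class Repeat:
    """Iteration, with one assumed semantics: apply `body` a fixed number of times."""
    body: "Schema"
    times: int


Schema = Union[Transform, Seq, Repeat]


def expand(schema: Schema) -> list[str]:
    """Unfold a derivation schema into the flat list of primitive transformations."""
    if isinstance(schema, Transform):
        return [schema.name]
    if isinstance(schema, Seq):
        return [step for part in schema.parts for step in expand(part)]
    if isinstance(schema, Repeat):
        return expand(schema.body) * schema.times
    raise TypeError(f"not a derivation schema: {schema!r}")


def derivation_chain(schema: Schema, source: str) -> list[tuple[str, str, str]]:
    """Derivation records (input, transformation, output) for one enactment.

    Intermediate files are named source.1, source.2, ... (a hypothetical convention).
    """
    records, current = [], source
    for i, step in enumerate(expand(schema), start=1):
        output = f"{source}.{i}"
        records.append((current, step, output))
        current = output
    return records


pipeline = Seq((Transform("tr1"), Repeat(Transform("tr2"), times=2)))
print(expand(pipeline))  # ['tr1', 'tr2', 'tr2']
for record in derivation_chain(pipeline, "file1"):
    print(record)
# ('file1', 'tr1', 'file1.1')
# ('file1.1', 'tr2', 'file1.2')
# ('file1.2', 'tr2', 'file1.3')
```

Keeping the schema separate from its expansion mirrors the workflow split between schema and enactment: many derivation chains can be generated from one schema. Data-dependent iteration (loop until a condition holds) is not covered by this sketch.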
19. Key Questions
- What kinds of information should be included?
- What constructs should be included?
- Notions of equivalence, similarity?
- What operators should be included (query, aggregate, compose)?
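As a strawman answer to the operator and similarity questions, here is a minimal Python sketch, assuming a derivation schema is modelled as an immutable set of (source product, transformation, target product) steps. The operator names, the Jaccard measure as a similarity notion, and the example schemas (which loosely borrow stage names from the HEP slides, in an assumed order) are all hypothetical choices.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One edge of a derivation schema: a transformation from one data product to another."""
    source: str
    transformation: str
    target: str


# A derivation schema is modelled here (an assumption) as an immutable set of steps.
Schema = frozenset[Step]


def compose(a: Schema, b: Schema) -> Schema:
    """Compose two schemas; `b` must consume at least one product that `a` produces."""
    produced = {s.target for s in a}
    consumed = {s.source for s in b}
    if not produced & consumed:
        raise ValueError("schemas share no intermediate data product")
    return a | b


def upstream(schema: Schema, product: str) -> set[str]:
    """Query: all transformations that contribute, directly or indirectly, to `product`."""
    found: set[str] = set()
    seen: set[str] = set()
    todo = [product]
    while todo:
        p = todo.pop()
        if p in seen:
            continue  # guards against cycles in the schema
        seen.add(p)
        for s in schema:
            if s.target == p:
                found.add(s.transformation)
                todo.append(s.source)
    return found


def similarity(a: Schema, b: Schema) -> float:
    """Jaccard similarity of the step sets; 1.0 exactly when the schemas are equal."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def aggregate(enactment_schemas: list[str]) -> Counter:
    """Aggregate a log of enactments by the name of the schema each one ran."""
    return Counter(enactment_schemas)


gen = frozenset({
    Step("generated", "formator", "reformatted"),
    Step("reformatted", "simulator", "simulated"),
})
dig = frozenset({Step("simulated", "digitiser", "digitised")})
full = compose(gen, dig)

print(sorted(upstream(full, "digitised")))  # ['digitiser', 'formator', 'simulator']
print(similarity(full, gen))                # 0.6666666666666666
print(aggregate(["full", "full", "gen"]))   # Counter({'full': 2, 'gen': 1})
```

Here equivalence is plain set equality, which is sensitive to product and transformation names; a more useful notion would likely compare schemas up to renaming, which this sketch does not attempt.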