Title: Workshop on Data Derivation


1
Workshop on Data Derivation: Provenance/Workflow Session
  • Rick Cavanaugh, Rick Hull, Miron Livny, Mike Wilde
  • 17 October 2002
  • Chicago

2
Workflow Session Agenda
  • 20 min - Overview
    • 5 min - workflow research - Rick Hull
    • 5 min - workflow needs - Rick Cavanaugh
    • 5 min - provenance -> workflow - Mike Wilde
    • 5 min - workflow and the grid - Miron Livny
  • 70 min - Discussion
    • What can provenance learn from workflow?

3
Rick Hull: Workflow Overview
4
Workflow Essentials
  • Terminology
    • Workflow model: the building blocks for workflow schemas
    • Workflow schema: (typically) a graph showing control flow and processing steps
    • Workflow enactment: an instance/occurrence/run through a workflow schema
  • Key abstraction: focus on flow, not on processing steps
    • Interpret the workflow schema
    • What you see is what you get (unlike query optimization)
  • Few workflow schemas, many workflow enactments
    • Many workflow enactments are pushed through the same schema
  • Main issues
    • Systems: persistence, recovery, logging, audits; pushing work items to human agents; finishing workflow steps on time
    • Formal analysis of schemas, e.g., reachability, termination
    • Unresolved: exceptions, schema evolution
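A minimal Python sketch of the terminology above, assuming a schema is nothing more than a directed graph of named steps. The classes `WorkflowSchema` and `Enactment` and the step names are hypothetical. One schema is shared by several enactments, and the schema supports the two formal analyses listed: reachability from the start step, and a termination check that only tests for cycles (sufficient only if every individual step terminates).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkflowSchema:
    """Hypothetical schema: a directed graph of step names (control flow only)."""
    edges: dict[str, list[str]]
    start: str

    def nodes(self) -> set[str]:
        return set(self.edges) | {n for succ in self.edges.values() for n in succ}

    def reachable(self) -> set[str]:
        """Reachability: every step reachable from the start step (breadth-first search)."""
        seen = {self.start}
        queue = deque([self.start])
        while queue:
            for nxt in self.edges.get(queue.popleft(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def topological_order(self) -> Optional[list[str]]:
        """Kahn's algorithm; returns None when the schema contains a cycle."""
        indegree = {n: 0 for n in self.nodes()}
        for succ in self.edges.values():
            for n in succ:
                indegree[n] += 1
        ready = deque(sorted(n for n, d in indegree.items() if d == 0))
        order = []
        while ready:
            n = ready.popleft()
            order.append(n)
            for m in self.edges.get(n, []):
                indegree[m] -= 1
                if indegree[m] == 0:
                    ready.append(m)
        return order if len(order) == len(indegree) else None


@dataclass
class Enactment:
    """One run through a schema; many enactments share a single schema."""
    schema: WorkflowSchema
    run_id: int
    log: list[str] = field(default_factory=list)

    def run(self) -> list[str]:
        # Interpret the schema directly; every edge is treated as an AND-split,
        # so all steps run, in an order that respects the control flow.
        order = self.schema.topological_order()
        if order is None:
            raise ValueError("cyclic schema: termination not guaranteed")
        for step in order:
            self.log.append(f"run {self.run_id}: {step}")
        return self.log


schema = WorkflowSchema(
    {"fetch": ["validate", "archive"], "validate": ["report"], "archive": ["report"]},
    start="fetch",
)
print(sorted(schema.reachable()))   # ['archive', 'fetch', 'report', 'validate']
runs = [Enactment(schema, i) for i in range(3)]   # few schemas, many enactments
print(runs[0].run()[0])             # run 0: fetch
```

Treating every edge as an AND-split keeps the sketch short; modelling choice or loops would need a richer formalism such as the Petri nets or state charts mentioned on the next slide.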

5
Workflow Models
  • Variety of commercial and academic workflow models
    • Process flow: flowcharts, Petri nets, WSFL, …
    • Finite state automata, state charts
    • Action Workflow: based on a model of conversations
  • Wil van der Aalst: patterns in workflow models
    • Focus on process flow models, covering all major commercial workflow systems
    • About 24 patterns identified (no surprises)
    • Flowchart constructs, parallelism, variants on choice (e.g., 'do m out of the n choices')
  • Things not modeled
    • The data being processed; the internals of the process steps
    • Query language for workflow schemas: similarity, composition, evolution
    • Aggregations of multiple enactments

There is some recent work
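The 'do m out of the n choices' variant can be read in more than one way. The sketch below assumes one reading: evaluate the guard of each of the n branches in order and activate the first m whose guards hold, failing if fewer than m are enabled. The function `m_out_of_n_choice`, the `Branch` type, and the reviewer example are hypothetical illustrations, not taken from van der Aalst's pattern catalogue.

```python
from typing import Callable

# Hypothetical branch representation: (branch name, guard over the workflow data).
Branch = tuple[str, Callable[[dict], bool]]


def m_out_of_n_choice(branches: list[Branch], m: int, data: dict) -> list[str]:
    """Activate m of the n branches whose guards are true (the first m, in order)."""
    if not 0 <= m <= len(branches):
        raise ValueError("m must lie between 0 and n")
    enabled = [name for name, guard in branches if guard(data)]
    if len(enabled) < m:
        raise RuntimeError(f"only {len(enabled)} of {len(branches)} branches enabled; need {m}")
    return enabled[:m]


reviewers: list[Branch] = [
    ("alice", lambda d: d["amount"] > 100),
    ("bob", lambda d: True),
    ("carol", lambda d: d["urgent"]),
]
print(m_out_of_n_choice(reviewers, 2, {"amount": 50, "urgent": True}))  # ['bob', 'carol']
```

Other readings (for example, letting a user pick any m of the enabled branches) would change only the selection line.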
6
Rick Cavanaugh: HEP Application
7
[Figure: HEP data-processing chain. Boxes: Generator, Formator, Simulator, Digitiser, writeESD, writeAOD, writeTAG, ODBMS, Analysis Scripts, Calib. DB]
8
[Figure: data products Generated, Re-formatted, Simulated, Digitised, Ntuple and Combined Ntuple, stored in several OODBMS instances]
9
[Figure: data products Generated, Re-formatted, Simulated, Digitised and Combined Ntuples, with an OODBMS]
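The HEP chain in these figures is a natural provenance example: each data product (generated, re-formatted, simulated, digitised) is derived from the previous one. A minimal sketch, assuming a strictly linear chain in the order Generator, Formator, Simulator, Digitiser (read off the box labels, so an assumption) and stand-in stage logic that only names its output, records one provenance entry per derivation and then traces a product back to its origin. `Derivation`, `run_chain`, and `lineage` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stage list, in the order suggested by the figures: (stage, product).
STAGES = [("Generator", "generated"), ("Formator", "reformatted"),
          ("Simulator", "simulated"), ("Digitiser", "digitised")]


@dataclass(frozen=True)
class Derivation:
    """One provenance record: which stage turned which input into which output."""
    stage: str
    input_id: Optional[str]
    output_id: str


def run_chain(run: int) -> list[Derivation]:
    """Run the stages in sequence, recording a provenance entry for each product."""
    records, previous = [], None
    for stage, product in STAGES:
        output = f"{product}-{run}"   # stand-in for a real data file or database object
        records.append(Derivation(stage, previous, output))
        previous = output
    return records


def lineage(records: list[Derivation], product_id: str) -> list[str]:
    """Walk provenance backwards from a product to the first stage of the chain."""
    by_output = {r.output_id: r for r in records}
    trail = []
    current: Optional[str] = product_id
    while current is not None:
        rec = by_output[current]
        trail.append(f"{rec.stage} -> {rec.output_id}")
        current = rec.input_id
    return trail


records = run_chain(42)
print(lineage(records, "digitised-42"))
# ['Digitiser -> digitised-42', 'Simulator -> simulated-42',
#  'Formator -> reformatted-42', 'Generator -> generated-42']
```

The writeESD/writeAOD/writeTAG fan-out and the calibration database would turn this list into a DAG with several inputs per record; the sketch deliberately omits them.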
10
Mike Wilde: Virtual Data Interface
11
Virtual Data in the Science Process
12
Virtual Data Language XML
(Older version. Latest at http://www.griphyn.org/workspace/VDS/vdl-1.20/vdl-1.20.png)
13
Managing Dependencies
    TR tr1( out a2, in a1 ) {
      profile hints.exec-pfn = "/usr/bin/app1";
      argument stdin = ${a1};
      argument stdout = ${a2}; }
    TR tr2( out a2, in a1 ) {
      profile hints.exec-pfn = "/usr/bin/app2";
      argument stdin = ${a1};
      argument stdout = ${a2}; }
    DV x1->tr1( a2=@{out:file2}, a1=@{in:file1} );
    DV x2->tr2( a2=@{out:file3}, a1=@{in:file2} );

[Figure: dependency chain file1 -> x1 -> file2 -> x2 -> file3]
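Because x1 writes file2 and x2 reads it, a request for file3 implies running x1 before x2. The Python sketch below mirrors the TR/DV definitions as plain dictionaries and computes that order, reusing any file that already exists. The `plan` function and the data layout are hypothetical; this is not the Chimera/VDS planner, and it does no cycle detection.

```python
# Transformations: name -> executable (mirrors the hints.exec-pfn profiles).
TRANSFORMATIONS = {"tr1": "/usr/bin/app1", "tr2": "/usr/bin/app2"}

# Derivations: name -> (transformation, input file a1, output file a2).
DERIVATIONS = {
    "x1": ("tr1", "file1", "file2"),
    "x2": ("tr2", "file2", "file3"),
}


def plan(target: str, existing: set[str]) -> list[str]:
    """Return the derivations to run, in dependency order, to materialize `target`."""
    producer = {out: dv for dv, (_, _, out) in DERIVATIONS.items()}
    steps: list[str] = []

    def need(lfn: str) -> None:
        if lfn in existing:
            return                      # already materialized: reuse it
        if lfn not in producer:
            raise LookupError(f"no derivation produces {lfn}")
        dv = producer[lfn]
        need(DERIVATIONS[dv][1])        # first materialize this derivation's input
        if dv not in steps:
            steps.append(dv)

    need(target)
    return steps


for dv in plan("file3", existing={"file1"}):
    tr, infile, outfile = DERIVATIONS[dv]
    print(f"{dv}: {TRANSFORMATIONS[tr]} < {infile} > {outfile}")
# x1: /usr/bin/app1 < file1 > file2
# x2: /usr/bin/app2 < file2 > file3
```

If file2 were already present, `plan("file3", {"file1", "file2"})` would return only `["x2"]`, which is the "virtual data" idea of re-deriving only what is missing.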
14
Workflow in the Virtual Data Grid
15
Miron Livny: Running Workflow on the Grid
16
Workflow execution on the Grid
17
Discussion Topics
  • What can provenance learn from workflow?
  • There are several workflow models; what is the right data derivation model?
  • Expanding the p-schema to handle the temporal declarations provided by workflow systems
  • More: triggers, error handling/recovery, and propagation recovery; tracking manual workflow (e.g., Excel, SAS, ROOT)
  • Viewpoint: focus on the provenance schema object

18
Derivation Schema as a new Abstract Data Type
  • What are the right building blocks to construct derivation schemas (composite transformations, derivation DAGs, …)?

Workflow Model : Workflow Schema
Derivation Model : Derivation Schema
  • E.g., iteration? Aggregates of derivation schemas seem important

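One way to read "derivation schema as a new abstract data type" is as an immutable value with a small set of operators. The sketch below illustrates that reading only: a hypothetical `DerivationSchema` holds steps that map named inputs to named outputs, and a `compose` operator builds a composite transformation by feeding one schema's outputs into another's inputs. The step and data names reuse the HEP stages purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One transformation application: consumes `inputs`, produces `outputs`."""
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]


@dataclass(frozen=True)
class DerivationSchema:
    """Hypothetical ADT: an immutable collection of steps forming a derivation DAG."""
    steps: tuple[Step, ...]

    def produced(self) -> frozenset[str]:
        """All data items produced by some step."""
        return frozenset().union(*(s.outputs for s in self.steps))

    def inputs(self) -> frozenset[str]:
        """External inputs: consumed by some step but produced by none."""
        consumed = frozenset().union(*(s.inputs for s in self.steps))
        return consumed - self.produced()

    def compose(self, other: "DerivationSchema") -> "DerivationSchema":
        """Composite transformation: `other` consumes data that this schema produces."""
        if not other.inputs() & self.produced():
            raise ValueError("schemas share no data items; nothing to compose")
        # Note: this sketch does not re-check that the combined graph is acyclic.
        return DerivationSchema(self.steps + other.steps)


simulate = DerivationSchema((Step("simulate", frozenset({"generated"}), frozenset({"simulated"})),))
digitise = DerivationSchema((Step("digitise", frozenset({"simulated"}), frozenset({"digitised"})),))
chain = simulate.compose(digitise)
print(sorted(chain.inputs()), sorted(chain.produced()))   # ['generated'] ['digitised', 'simulated']
```

Iteration and aggregates of schemas, raised on the slide as open questions, are not modelled here.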
19
Key Questions
  • What kinds of information should be included?
  • What constructs should be included?
  • Notions of equivalence, similarity?
  • What operators should be included? (query, aggregate, compose)
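To make the operator question concrete, here is a sketch of two candidate operators over a collection of derivation records: a query (which derivations produced a given file?) and an aggregate over many enactments (how many times was each transformation run?). The `Record` layout and function names are hypothetical; the records reuse the x1/x2 names from the Managing Dependencies slide for illustration, and the "x1-rerun" entry is invented.

```python
from collections import Counter
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical provenance record for one executed derivation."""
    derivation: str
    transformation: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def query_producers(records: list[Record], lfn: str) -> list[str]:
    """Query: which derivations produced the file `lfn`?"""
    return [r.derivation for r in records if lfn in r.outputs]


def aggregate_by_transformation(records: list[Record]) -> Counter:
    """Aggregate: number of executions per transformation across many enactments."""
    return Counter(r.transformation for r in records)


history = [
    Record("x1", "tr1", ("file1",), ("file2",)),
    Record("x2", "tr2", ("file2",), ("file3",)),
    Record("x1-rerun", "tr1", ("file1",), ("file2",)),
]
print(query_producers(history, "file2"))        # ['x1', 'x1-rerun']
print(aggregate_by_transformation(history))     # Counter({'tr1': 2, 'tr2': 1})
```

A compose operator would act on schemas rather than on execution records, so it is left out of this sketch.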