Title: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18
1 Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18
- Yogesh L. Simmhan
- Beth Plale, Dennis Gannon, Srinath Perera
- Indiana University
2 Outline
- Architecture of Karma
- Workflow Setup and Collecting Provenance
- Provenance Traces
- Canonical Challenge Queries
- Suggested Variations
3 Provenance Collection: Challenges and Uses
- Linked Environments for Atmospheric Discovery (LEAD) project
  - Weather and severe storm prediction applications
- Provenance on workflows (processes) and data products at fine granularity
  - Dynamic, long-running workflows
- Helps scientists search for workflows and data products, track workflow execution, and analyze and mine data products from runs
4 Karma Provenance Framework
- Lightweight: does not duplicate existing metadata cataloging efforts
  - myLEAD personal metadata catalog
  - ResCat service and data registry
- Glue to integrate metadata on data and services with runtime workflow information
- Scalability [1]: 500 users, 100s of workflows, 10,000s of data products
[1] Performance Evaluation of the Karma Provenance Framework, Simmhan, Y., et al., IPAW, 2006
5 Karma Architecture [2]
[Figure: Karma architecture. A workflow engine orchestrates a workflow instance made up of Services 1, 2, ..., 9, 10; each service consumes (C) and produces (P) 10 data products.]
[2] A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., submitted to ICWS Conference, 2006
6 Provenance Challenge Workflow
- Applications modeled as web services
  - GFac toolkit creates a service for command-line applications
  - Service invokes a shell-script wrapper of the application, passing command-line arguments
  - Created services are automatically instrumented to generate provenance using the Karma client library
- Workflow composed as a GPEL script
  - XBaya workflow composer GUI
  - Central GPEL workflow engine orchestrates execution
GPEL: Grid Process Execution Language, an extension of the Business Process Execution Language (BPEL)
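The instrumentation idea on this slide (a generated service wraps a command-line application and emits provenance around the call) can be sketched roughly as follows. This is a minimal illustration, not the Karma client library: the function `run_with_provenance`, the event dictionaries, the event names other than `ServiceInvoked`, and the file names are all assumptions made for the example.

```python
import subprocess
import sys
import time


def run_with_provenance(service_id, argv, inputs, outputs, publish):
    """Run a command-line application and publish provenance events around it.

    Hypothetical helper: the event shapes are illustrative, not Karma's API.
    """
    publish({"type": "ServiceInvoked", "service": service_id,
             "args": list(argv), "time": time.time()})
    for data_id in inputs:
        publish({"type": "DataConsumed", "service": service_id, "data": data_id})

    # The real service would call the shell-script wrapper of the application.
    completed = subprocess.run(argv, capture_output=True, text=True)

    if completed.returncode == 0:
        for data_id in outputs:
            publish({"type": "DataProduced", "service": service_id, "data": data_id})
    status = "completed" if completed.returncode == 0 else "failed"
    publish({"type": "InvocationStatus", "service": service_id, "status": status})
    return completed


events = []
result = run_with_provenance(
    "AlignWarpService",
    # Stand-in command so the sketch runs anywhere Python does.
    [sys.executable, "-c", "print('align_warp -m 12')"],
    inputs=["anatomy1.img"],     # hypothetical input data ID
    outputs=["warp1.warp"],      # hypothetical output data ID
    publish=events.append,       # a real client would send these to the provenance service
)
event_types = [e["type"] for e in events]
```

Passing `publish` as a callable keeps the wrapper independent of how events are delivered, which mirrors the slide's point that the application itself needs no changes.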
7 Provenance Challenge Workflow
8 Provenance Traces
- Data provenance: getRecursiveDataProvenance
  - What (ID), where (URL), when (timestamp)
  - How (process, inputs)
9 Provenance Traces
- Process provenance: getProcessProvenance
  - What (ID), when (timestamp), who (invoker)
  - State (execution/completion status)
  - Input and output data products
10 Provenance Traces
- Workflow trace: getWorkflowTrace
  - What (ID), when (timestamp), who (invoker)
  - State (execution/completion status)
  - Process provenance of workflow steps
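One way to picture the three trace types is as nested records. The sketch below is only an illustration of the fields listed on these slides; the class names, field names, and sample values are assumptions, and the actual Karma traces are XML documents whose schema is not shown here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataProvenance:
    data_id: str        # what (ID)
    url: str            # where (URL)
    timestamp: str      # when
    produced_by: str    # how: the producing process
    inputs: List[str] = field(default_factory=list)  # how: input data IDs


@dataclass
class ProcessProvenance:
    process_id: str     # what (ID)
    timestamp: str      # when
    invoker: str        # who
    state: str          # execution/completion status
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class WorkflowTrace:
    workflow_id: str    # what (ID)
    timestamp: str      # when
    invoker: str        # who
    state: str          # execution/completion status
    steps: List[ProcessProvenance] = field(default_factory=list)


# Hypothetical sample values, for illustration only.
step = ProcessProvenance("ConvertService", "2006-09-11T10:00:00Z", "workflow-1",
                         "completed", inputs=["atlas-x.pgm"], outputs=["atlas-x.gif"])
trace = WorkflowTrace("workflow-1", "2006-09-11T09:55:00Z", "user-1",
                      "completed", steps=[step])
produced = [data_id for s in trace.steps for data_id in s.outputs]
```

A workflow trace contains the process provenance of each step, so questions such as "which data products did this run produce" reduce to walking `steps`.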
12 Provenance Challenge Queries
- ?! Answered by Karma service API directly
- ? Answered by Karma service API, with post-processing by client
- ? Answered by access to backend DB (SQL)
- ? Not answered
Query   1   2   3   4   5   6   7   8   9
Result  ?!  ?   ?!  ?   ?   ?   ?   ?   ?
13 Provenance Challenge Queries: Q1
- Find everything that caused Atlas X Graphic to be as it is
- ?! Answered by Karma service API directly
- This is the recursive data provenance of the Atlas X Graphic file
- A call to getRecursiveDataProvenance(leaduuid1157946992-atlas-x.gif) returns this [link]
14 Provenance Challenge Queries: Q2
- Find the process that led to Atlas X Graphic, excluding everything prior to softmean
- ? Answered by Karma service API, with post-processing by client
- First call getDataProvenance
- Then recursively get data provenance until SoftmeanService is seen
- Returns this [link]
1. let dataList = ['leaduuid1157946992-atlas-x.gif']
2. while (dataList != empty) do
   // get data provenance for this level
   a. dataProvenance = karma.getDataProvenance(dataList[0])
   // print process information and remove data from list
   b. print dataProvenance; dataList.delete(0)
   c. if (dataProvenance.getProducedBy() == 'SoftmeanService') break  // found Softmean. Stop.
   // get input data used by this data and recurse up the tree
   d. foreach (inputData in dataProvenance.getUsingData()) do
      i. dataList.add(inputData)
3. end
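The client-side post-processing above can be sketched in Python. This is an illustration under stated assumptions: `FakeKarma` is a hypothetical in-memory stand-in for the Karma service, the record fields mirror `getProducedBy()` and `getUsingData()` from the pseudocode, and the data IDs and upstream service names are made up for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataProvenance:
    data_id: str
    produced_by: str                                      # cf. getProducedBy()
    using_data: List[str] = field(default_factory=list)   # cf. getUsingData()


class FakeKarma:
    """Hypothetical in-memory stand-in for the Karma service."""

    def __init__(self, records):
        self._records = {r.data_id: r for r in records}

    def get_data_provenance(self, data_id):
        # Returns None for workflow inputs that no service produced.
        return self._records.get(data_id)


def provenance_back_to(karma, data_id, stop_service):
    """Walk data provenance upstream until stop_service is seen."""
    trace = []
    pending = deque([data_id])
    while pending:
        record = karma.get_data_provenance(pending.popleft())
        if record is None:
            continue
        trace.append(record)
        if record.produced_by == stop_service:
            break  # found the stop service; ignore everything before it
        pending.extend(record.using_data)
    return trace


karma = FakeKarma([
    DataProvenance("atlas-x.gif", "ConvertService", ["atlas-x.pgm"]),
    DataProvenance("atlas-x.pgm", "SlicerService", ["atlas.img", "atlas.hdr"]),
    DataProvenance("atlas.img", "SoftmeanService", ["resliced1.img"]),
    DataProvenance("resliced1.img", "ResliceService", ["warp1.warp"]),
])
services = [r.produced_by
            for r in provenance_back_to(karma, "atlas-x.gif", "SoftmeanService")]
```

As in the pseudocode, the walk stops at the first record produced by the stop service, which is sufficient when the path from the output back to softmean is a single chain.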
15 Provenance Challenge Q4
- Find all invocations of align_warp (with parameter "-m 12") that ran on a Monday
- ? Answered by access to backend DB (SQL)
- Use SQL query to get matching invocations
- Call getProcessProvenance to get description of align_warp
- Returns this [link]
SELECT invokee.workflow_id, invokee.service_id,
       invokee.workflow_node_id, invokee.workflow_timestep,
       invoker.workflow_id, invoker.service_id,
       invoker.workflow_node_id, invoker.workflow_timestep
FROM invocation_state_table invocation,
     entity_table invokee,
     entity_table invoker,
     notification_table notifications
WHERE invokee.entity_id = invocation.invokee_id
  AND invoker.entity_id = invocation.invoker_id
  AND notifications.source_id = invocation.invokee_id
  AND notifications.notification_type = 'ServiceInvoked'
  AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
  AND notifications.notification_xml LIKE '%<ModelMenuNumber>12</ModelMenuNumber>%'
  AND DayOfWeek(invocation.request_receive_time) = 2  -- 1 = Sunday, 2 = Monday, ...
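The filtering logic of this query can be tried out on a small in-memory database. The sketch below uses SQLite through Python's `sqlite3` module and a single simplified table, so the table layout, sample rows, and short service names are assumptions and not the Karma schema. SQLite has no `DayOfWeek`; `strftime('%w', ...)` is used instead, where 0 is Sunday and 1 is Monday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table simplification of the joined Karma tables.
conn.execute(
    "CREATE TABLE invocation ("
    "service_id TEXT, request_receive_time TEXT, notification_xml TEXT)"
)
rows = [
    # Monday, align_warp, model 12: should match.
    ("AlignWarpService", "2006-09-11 10:00:00", "<ModelMenuNumber>12</ModelMenuNumber>"),
    # Tuesday: wrong day.
    ("AlignWarpService", "2006-09-12 10:00:00", "<ModelMenuNumber>12</ModelMenuNumber>"),
    # Monday, but a different parameter value.
    ("AlignWarpService", "2006-09-18 10:00:00", "<ModelMenuNumber>9</ModelMenuNumber>"),
    # Monday, but a different service.
    ("ResliceService", "2006-09-11 11:00:00", "<ModelMenuNumber>12</ModelMenuNumber>"),
]
conn.executemany("INSERT INTO invocation VALUES (?, ?, ?)", rows)

monday_runs = conn.execute(
    """
    SELECT service_id, request_receive_time
    FROM invocation
    WHERE service_id = ?
      AND notification_xml LIKE ?
      AND strftime('%w', request_receive_time) = '1'  -- SQLite: 0 = Sunday, 1 = Monday
    """,
    ("AlignWarpService", "%<ModelMenuNumber>12</ModelMenuNumber>%"),
).fetchall()
conn.close()
```

Matching the parameter with `LIKE` on the stored notification XML follows the approach on the slide; it works for this query but depends on the exact serialization of the notification.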
16 Provenance Challenge Q9
- Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
- ? Not answered
- We do not expect to answer such queries through the provenance system
- We push the provenance information to external metadata management systems such as myLEAD, which can answer such join queries on data product metadata and provenance
17 Variations of Workflow
- Workflows with loops
- Workflows whose structure changes dynamically
  - or, as a simpler case, workflows with conditional branches
- Hierarchical composition of workflows
  - workflows invoking other workflows
18 Variations of Queries
- Find all workflows and processes with a particular execution status: completed, failed, waiting for input
- Show the client view and service view of the provenance and check for differences
19 Acknowledgements
- Alek Slominski (GPEL Engine)
- Satoshi Shirasuna (XBaya Composer)
- LEAD Members
- NSF
- Questions
- www.extreme.indiana.edu/karma
20 Sample Activities Published
21 Karma DB Schema