Title: Workflow Management in Grid Computing
1Workflow Management in Grid Computing
- Abu Zafar Abbasi
- PhD Student
- January 30, 2008
Center for Research in Ubiquitous Computing,
National University of Computer and Emerging Sciences (FAST), Karachi
2Workflow
- The automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules (Workflow Management Coalition)
- A workflow management system (WFMS) is the software that carries out this automation
3Workflow in a Business Process
4Workflow Overview
5Interfaces/Components of a WFMS
6Why use Workflow in Grid?
- Build applications through orchestration of distributed resources
- Utilization of resources that are located in a particular domain to increase throughput or reduce execution costs
- Execution spanning multiple administrative domains to obtain specific processing capabilities
- Integration of multiple teams involved in managing different parts of the experiment workflow, thus promoting inter-organizational collaboration
- Workflow concepts now extend to science, engineering, and industrial process management as well
- Ease of use for e-scientists and business people who lack grid programming and toolkit knowledge
7Grid Workflow
- Grid workflow can be defined as the composition of grid application services which execute on heterogeneous and distributed resources in a well-defined order to accomplish a specific goal. (R. Buyya)
- The automation of the processes, which involves the orchestration of a set of Grid services, agents and actors that must be combined together to solve a problem or to define a new service. (Geoffrey Fox, GGF 10)
8Grid WFMS
9Workflow in OGSA
10Issues
- Lack of central control and ownership: local policy is subject to change without informing the users of workflow systems
- Undedicated resource sharing: resources may become unavailable or change status during workflow execution
- Variation in computation and networking facilities
- Computation- or data-intensive jobs involving a large number of activities
- Less human interaction
- Heterogeneity
- QoS constraints (time, cost, reliability, security)
- A full-ahead plan is not always suitable: it has to fix in advance the exact location of the resources assigned to each task, the physical data files, etc.
11Taxonomy of Grid Workflows (1/2)
12Taxonomy of Grid Workflows (2/2)
13Grid workflow Survey
14Grid Workflow Survey
- DAGMan was developed to schedule jobs on the Condor system in the order represented by a DAG and to process them.
- Pegasus maps and executes complex workflows based on full-ahead planning. In Pegasus, a workflow can be generated from a metadata description of the desired data product using AI-based planning.
- The Taverna project has developed a tool for the composition and enactment of bioinformatics workflows for the life science community. The tool provides a graphical user interface for composing workflows.
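The DAG-ordered scheduling described for DAGMan can be illustrated with a minimal Python sketch. This is not DAGMan's implementation or input format: the job names and the `submit` callback are hypothetical, and the ordering relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
dag = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "merge": {"simulate_a", "simulate_b"},
}

def run_in_dag_order(dag, submit):
    """Submit jobs so that each job runs only after all of its parents."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    order = []
    while sorter.is_active():
        # Jobs whose parents have all finished; sorted for a stable order.
        for job in sorted(sorter.get_ready()):
            submit(job)       # a real system would hand the job to a batch queue
            order.append(job)
            sorter.done(job)  # mark finished so that children become ready
    return order

submitted = run_in_dag_order(dag, submit=lambda job: None)
print(submitted)
```

Jobs that become ready together (here `simulate_a` and `simulate_b`) are independent and could be submitted in parallel; the sketch runs them sequentially for simplicity.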
15Gridbus Workflow Management System
- Just-in-time scheduling allows resource allocation decisions to be made dynamically, at the time each task in the workflow is executed
- Takes advantage of various middleware services: security, grid resource access, file movement and replica management services provided by the Globus middleware; parametric application execution provided by the Gridbus Broker; and the VO directory service provided by the Grid Market Directory (GMD)
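The idea of just-in-time scheduling can be sketched as follows: each task is bound to a resource only when it is dispatched, using the resource status at that moment. This is an illustrative sketch, not Gridbus code; the `Resource` fields, the least-loaded selection rule and the `query_resources` callback are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    available: bool
    queue_length: int

def pick_resource(resources):
    """Choose the least-loaded resource that is available right now."""
    candidates = [r for r in resources if r.available]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.queue_length)

def run_just_in_time(tasks, query_resources):
    """Bind each task to a resource only at the moment it is dispatched."""
    assignment = {}
    for task in tasks:
        resources = query_resources()  # fresh status lookup for every task
        chosen = pick_resource(resources)
        assignment[task] = chosen.name if chosen else None
    return assignment

# Hypothetical status snapshots: node-a goes offline between the two tasks.
snapshots = iter([
    [Resource("node-a", True, 0), Resource("node-b", True, 3)],
    [Resource("node-a", False, 0), Resource("node-b", True, 1)],
])
plan = run_just_in_time(["t1", "t2"], lambda: next(snapshots))
print(plan)  # {'t1': 'node-a', 't2': 'node-b'}
```

A full-ahead plan computed from the first snapshot would have placed both tasks on `node-a`, which is no longer available when `t2` is ready to run.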
16Architecture of Gridbus WFE
17Components of Gridbus WFEE
- Components: workflow submission, workflow language parser, resource discovery, dispatcher, data movement and workflow scheduler
- Workflow submission accepts workflow enactment requests from planner-level applications.
- The workflow language parser converts the workflow description from XML format into Java objects (Task, Parameter and Data Constraint, i.e. workflow dependency) which can be accessed by the workflow scheduler.
- Resource discovery queries grid information services such as Globus MDS, GMD and replica catalogs to locate suitable resources for the tasks.
- Dispatchers: WFEE supports different middleware by creating a dispatcher for each middleware to handle its interaction with resources.
- The data movement system enables data transfer between grid nodes using the HTTP and GridFTP protocols.
- The workflow scheduler is the central component of WFEE. It interacts with resource discovery to find suitable grid resources at run time; it places a task on resources by using the dispatcher component; and it controls input data transfer between task execution nodes through data movement.
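The parser step above (XML description in, Task / Parameter / Data Constraint objects out) can be sketched in Python. The engine itself produces Java objects, and the XML element and attribute names below are invented for illustration; they are not the Gridbus workflow language.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical workflow description; the real schema is not shown in the slides.
WORKFLOW_XML = """
<workflow>
  <task name="A">
    <param name="input" value="raw.dat"/>
  </task>
  <task name="B">
    <param name="threshold" value="0.5"/>
  </task>
  <constraint parent="A" child="B"/>
</workflow>
"""

@dataclass
class Parameter:
    name: str
    value: str

@dataclass
class Task:
    name: str
    parameters: list = field(default_factory=list)

@dataclass
class DataConstraint:
    parent: str  # task that produces the data
    child: str   # task that consumes it

def parse_workflow(xml_text):
    """Turn the XML description into objects a scheduler can traverse."""
    root = ET.fromstring(xml_text)
    tasks = [
        Task(
            t.get("name"),
            [Parameter(p.get("name"), p.get("value")) for p in t.findall("param")],
        )
        for t in root.findall("task")
    ]
    constraints = [
        DataConstraint(c.get("parent"), c.get("child"))
        for c in root.findall("constraint")
    ]
    return tasks, constraints

tasks, constraints = parse_workflow(WORKFLOW_XML)
print([t.name for t in tasks], constraints)
```

Keeping dependencies as separate constraint objects, rather than nesting tasks, lets a scheduler look up the parents of any task directly.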
18GWFEE Workflow Scheduling
- Decentralized just-in-time scheduling
- Every task has its own scheduler, called a task manager (TM)
- A TM handles the processing of its task, including resource selection, resource negotiation, task dispatch and failure processing
- The lifetimes of TMs, as well as the whole workflow execution, are controlled by a workflow coordinator
(Figures: Architecture; Event-driven mechanism)
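The per-task schedulers and the event-driven mechanism can be sketched as below. This is a simplified, single-process illustration under stated assumptions: the class names, the in-memory event queue and the "task finished" event are hypothetical stand-ins, and resource selection, negotiation and failure handling are reduced to a comment.

```python
from collections import deque

class TaskManager:
    """Hypothetical per-task scheduler: waits for parent events, then runs."""

    def __init__(self, name, parents):
        self.name = name
        self.waiting_on = set(parents)
        self.done = False

    def on_event(self, finished_task):
        # A parent finished: one fewer dependency to wait for.
        self.waiting_on.discard(finished_task)

    def ready(self):
        return not self.done and not self.waiting_on

    def run(self):
        # Resource selection, negotiation and dispatch would happen here.
        self.done = True
        return self.name  # emitted as a "task finished" event


class Coordinator:
    """Creates the task managers and drives the workflow by routing events."""

    def __init__(self, dependencies):
        self.managers = {n: TaskManager(n, p) for n, p in dependencies.items()}
        self.events = deque()
        self.log = []

    def _start_ready(self):
        for tm in self.managers.values():
            if tm.ready():
                self.events.append(tm.run())
                self.log.append(tm.name)

    def execute(self):
        self._start_ready()  # tasks without parents start immediately
        while self.events:
            finished = self.events.popleft()
            for tm in self.managers.values():
                tm.on_event(finished)
            self._start_ready()
        return self.log


deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
coordinator = Coordinator(deps)
print(coordinator.execute())  # ['A', 'B', 'C', 'D']
```

No component holds a global schedule: each task manager reacts only to the events of its own parents, while the coordinator owns the managers' lifetimes and the event flow.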
19Thank you
20References
- David Hollingsworth, The Workflow Reference Model, Workflow Management Coalition, Document No. TC00-1003, 1995
- Geoffrey Fox and Dennis Gannon, Workflow in Grid Systems, Concurrency and Computation: Practice and Experience, Volume 18, Issue 10, John Wiley and Sons Ltd., August 2006, pp 1009-1019
- Jia Yu and Rajkumar Buyya, A Taxonomy of Scientific Workflow Systems for Grid Computing, SIGMOD Record, Vol 34, No. 3, pp 44-49, September 2005
- Jia Yu and Rajkumar Buyya, A Novel Architecture for Realizing Grid Workflow using Tuple Spaces, Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing (GRID'04), USA, November 2004
- gridworkflow.org