BPEL4Job: a Fault-handling Design for Job Flow Management


1
BPEL4Job: a Fault-handling Design for Job Flow
Management
  • Wei Tan1, Liana Fong2, Norman Bobroff2
  • 1 Dept. Automation, Tsinghua University, Beijing,
    China
  • 2 IBM T. J. Watson Research Center, Hawthorne,
    USA
  • tanwei_at_mails.tsinghua.edu.cn
  • llfong_at_us.ibm.com, bobroff_at_us.ibm.com

2
Agenda
  • 1 Introduction
  • 2 BPEL4Job: a fault-handling design for job flow
    management
  • 3 Integrating fault-handling policies with job
    flow modeling
  • 4 Fault-handling at the flow execution layer
  • 5 Implementation and sample application
  • 6 Conclusion and ongoing future work

3
1 Introduction: Motivation
  • Job flow is especially relevant in orchestrating
    batch jobs
    • Enforce the job execution sequence
    • Manage the job execution trace
    • Handle run-time faults at the flow level
  • Various languages and systems have been devised
    • DAGMan/Condor, Taverna/myGrid,
      Job Stream/Tivoli-Workload Scheduler,
      JobCommand/Tivoli-LoadLeveler
  • BPEL-based job flow management is attracting more
    attention
    • Resources and applications are becoming
      service-oriented
    • Business processes (including human tasks) need
      to be combined with back-end batch jobs
    • BPEL provides a framework for flow orchestration,
      data manipulation, and fault handling, and can
      be extended or enhanced
    • BPEL is supported by industry and the open source
      community

4
1 Introduction: Challenges
  • The use of BPEL for job flow is not without
    technical challenges
    • Defining a job entity
      • BPEL does not support JSDL or other job
        specification languages
    • Supporting data flow and dependencies
      • Data staging in/out
    • Incorporating asynchronous interaction with
      schedulers
      • A job scheduler usually reports job status
        asynchronously
    • Incorporating fault tolerance and recovery
      strategies in the job flow
      • Job flows have special fault-handling
        requirements, such as re-try and re-submit
    • Supporting dynamic changes of flow instances
      • Needed when the flow execution logic cannot be
        fully anticipated in advance

5
1 Introduction: BPEL4Job
  • The goal of BPEL4Job
    • A BPEL-based job flow system with fault-handling
      capability
  • Challenges addressed
    • How to communicate with job schedulers?
      • A generic job proxy to facilitate asynchronous
        job submission and job status notification
    • How to model a job flow with fault-handling
      capability?
      • A policy-based, two-stage approach
    • How to enforce various fault-handling policies at
      run-time?
      • A set of fundamental fault-handling schemes,
        including instance migration between flow
        engines

6
2 BPEL4Job: a fault-handling design for job flow
management
  • Flow modeling layer
    • Stage 1: define the base flow, the job
      definitions, and the fault-handling policies
    • Stage 2: generate the expanded flow
  • Flow execution layer
    • Flow engine
    • Job proxy
    • Fault-handling service
  • Job scheduling layer
    • Job schedulers

7
3 Integrating fault-handling policies with job
flow modeling
  • BPEL4Job considers three kinds of policies
    • Cleanup
      • Generate a fault report and delete the instance
        data in the flow engine
    • Re-try
      • Re-execute the job in the same engine
    • Re-submit
      • Export the flow instance state
      • Restore the flow instance in a different engine,
        so that the flow can resume from the failed job
  • More policies could be defined and implemented
    based on the three fundamental policies
    • Rollback, alternate job, etc.
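
The three policies can be read as a small dispatch over a failed job. The following is a minimal sketch under stated assumptions, not the BPEL4Job implementation: `Policy`, `FlowInstance`, and `handle_fault` are hypothetical names, and a plain dictionary stands in for the instance data held by a flow engine.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    CLEANUP = "cleanup"
    RETRY = "re-try"
    RESUBMIT = "re-submit"


@dataclass
class FlowInstance:
    """Hypothetical stand-in for a flow instance held by a flow engine."""
    engine: str
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def handle_fault(instance, failed_job, policy, target_engine=None):
    """Apply one policy; return the instance that continues, or None."""
    if policy is Policy.CLEANUP:
        # Generate a fault report, then delete the instance data.
        instance.log.append(f"fault report: {failed_job} failed on {instance.engine}")
        instance.state.clear()
        return None
    if policy is Policy.RETRY:
        # Re-execute the failed job in the same engine.
        instance.log.append(f"re-try {failed_job} on {instance.engine}")
        return instance
    if policy is Policy.RESUBMIT:
        # Export the state and restore it in a different engine,
        # marking the failed job as the resume point (assumed field name).
        exported = dict(instance.state, resume_from=failed_job)
        return FlowInstance(engine=target_engine, state=exported,
                            log=instance.log + [f"re-submit to {target_engine}"])
    raise ValueError(f"unknown policy: {policy}")
```

A derived policy such as rollback or alternate job would, in this sketch, be one more branch that composes the three above.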

8
3 Integrating fault-handling policies with job
flow modeling
[Figure: the base flow with policies embedded; panels
show the re-try policy and the re-submit policy]
9
3 Integrating fault-handling policies with job
flow modeling
[Figure: the transformation from base flow to expanded
flow that implements the re-try policy of Job1]
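
The slide's diagram is not reproduced in this transcript. As a rough illustration of the idea only, the sketch below expands a base flow by wrapping each job that carries a re-try policy in a retry loop. It assumes a flow is a list of (name, callable) pairs; `JobFailed`, `with_retry`, `expand`, and the `max_attempts` field are hypothetical and do not come from the slides.

```python
class JobFailed(Exception):
    """Hypothetical fault raised when a job reports failure."""


def with_retry(job, max_attempts):
    """Wrap a job so it is re-executed until it succeeds or attempts run out."""
    def wrapped():
        for attempt in range(1, max_attempts + 1):
            try:
                return job()
            except JobFailed:
                if attempt == max_attempts:
                    raise  # out of attempts: surface the fault to the flow
    return wrapped


def expand(base_flow, policies):
    """Stage 2: combine the base flow and its policies into an expanded flow."""
    expanded = []
    for name, job in base_flow:
        policy = policies.get(name)
        if policy and policy["type"] == "re-try":
            job = with_retry(job, policy["max_attempts"])
        expanded.append((name, job))
    return expanded


def run(flow):
    """Execute the jobs in sequence and collect their results by name."""
    return {name: job() for name, job in flow}
```

Jobs without a policy pass through unchanged, so the base flow stays readable and the fault-handling logic appears only in the generated flow.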
10
4 Fault-handling at the flow execution layer
  • We leverage
    • The BPEL fault-handling constructs Catch and
      CatchAll
  • We enhance
    • BPEL with specific capabilities to recognize job
      failures and to handle faults according to the
      defined policies
  • Components in this layer
    • The generic job proxy, for job submission and job
      status notification
    • The fault-handling service, to enforce the
      policies defined in the flow modeling layer

11
The generic job proxy
  • Generic job proxy
    • Receives a job submission request.
    • Forwards the request to a scheduler and starts to
      listen for job state notifications from it.
    • When a notification indicates job success or
      failure, forwards it to the flow engine and
      returns; otherwise, continues listening.
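
The proxy behavior above can be sketched as a short loop. This is an illustrative sketch, not the actual proxy: `submit` and `notify_engine` are hypothetical callables standing in for the scheduler and the flow engine, and the state names in `TERMINAL_STATES` are assumptions.

```python
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}  # assumed state names


def job_proxy(job_request, submit, notify_engine):
    """Forward a job to a scheduler and relay only its terminal state."""
    # submit() is assumed to return an iterable of state notifications.
    notifications = submit(job_request)
    for state in notifications:
        if state in TERMINAL_STATES:
            # Success or failure: tell the flow engine and return.
            notify_engine(job_request["name"], state)
            return state
        # Any other state (for example, queued or running): keep listening.
    return None  # the notification stream ended without a terminal state
```

Because intermediate states are filtered out here, the flow engine only has to wait for a single callback per job.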

12
Fault-handling schemes in flow execution
13
Flow re-submission and instance migration
  • Extract all the information related to a BPEL
    instance.
  • Re-shape the instance data and migrate it into
    another WPS engine.
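
The two steps can be sketched as export, re-shape, and import. The slides do not say what re-shaping involves, so the sketch assumes it means rewriting engine-specific fields. Plain dictionaries stand in for the engines' instance stores, no WPS API is used, and every function and field name is hypothetical.

```python
import json


def export_instance(engine_store, instance_id):
    """Extract everything the source engine holds about one instance."""
    return json.dumps(engine_store.pop(instance_id))


def reshape(exported, target_engine):
    """Rewrite engine-specific fields so the target engine can accept the data."""
    data = json.loads(exported)
    data["engine"] = target_engine        # assumed engine-specific field
    data["status"] = "READY_TO_RESUME"    # assumed status value
    return data


def migrate(source_store, target_store, instance_id, target_engine):
    """Move one instance from the source store to the target store."""
    exported = export_instance(source_store, instance_id)
    target_store[instance_id] = reshape(exported, target_engine)
    return target_store[instance_id]
```

For example, `migrate(saba10, weitan, "montage-1", "weitan")` would move a hypothetical instance record from one dictionary to the other while leaving its flow variables untouched.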

14
Implementation
WebSphere Integration Developer (WID)
WebSphere Process Server (WPS)
IBM Tivoli Dynamic Workload Broker (ITDWB)
15
Sample: Montage Job Flow
  • Montage: a toolkit for assembling raw astronomy
    images into custom mosaics.
  • Developed by NASA/California Institute of
    Technology.
  • The assembling process is usually expressed as a
    job flow.

[Figure: Montage job flow over raw images, with steps
labeled Generate Image table, Image projection in
parallel, Generate Image table, Generate Mosaic,
Transform to jpeg]
16
Montage job flow and the re-start policy
[Figure: base flow and expanded flow (partial)]
Policy: re-submit from mImgtbl1 when mAdd1 fails
17
Instance migration from saba10 to weitan

(a) Montage initiated and failed at saba10
(b) Montage migrated to weitan
(c) Montage re-started and completed at weitan
18
Conclusion
  • BPEL4Job: an exploration of using BPEL as a job
    flow language
    • A two-stage approach for job flow modeling with
      fault-handling policies
    • A generic job proxy to handle the asynchronous
      nature of job submission and job status
      notification
    • A set of fundamental fault-handling schemes,
      including instance migration between flow
      engines
  • Future work
    • Support more complicated fault-handling policies
      • Involving human tasks, expressed as business
        rules, etc.
    • Apply the instance migration technique to
      • Load balancing between flow engines
      • Instance migration to a newer version

19
Future work
20
Thank you for your attention.
  • Please contact me at
  • Dept. Automation, Tsinghua Univ, Beijing, China
  • http://twtanwei.googlepages.com
  • twtanwei_at_gmail.com