Condor-G: A Case in Distributed Job Delegation - PowerPoint PPT Presentation

Slides: 25
Provided by: Miron1

Transcript and Presenter's Notes

1
Condor-G: A Case in Distributed Job Delegation
2
Job Delegation
  • Transfer of responsibility to schedule and
    execute a job
  • Multiple delegations can form a chain
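The chain idea can be made concrete with a small model. The Python sketch below treats a job as a record of every scheduler it has been delegated to, and the last hop is the one currently responsible. The `Job` class, its fields, and the hop names are hypothetical illustrations, not Condor-G data structures. The hop names loosely follow the Condor-G, Globus GRAM, batch system and execute machine chain on the next slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """Hypothetical job record that remembers every hop it was delegated through."""
    job_id: str
    delegation_path: List[str] = field(default_factory=list)

    def delegate(self, scheduler: str) -> None:
        # Hand responsibility for scheduling and executing the job to the
        # next scheduler; successive calls build up the delegation chain.
        self.delegation_path.append(scheduler)

    @property
    def responsible(self) -> Optional[str]:
        # Whoever received the most recent delegation is now responsible.
        return self.delegation_path[-1] if self.delegation_path else None


job = Job("job-42")
for hop in ["condor-g", "globus-gram", "batch-frontend", "execute-machine"]:
    job.delegate(hop)

print(" -> ".join(job.delegation_path))
print(job.responsible)
```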

3
Job Delegation in Condor-G Today
[Diagram: delegation chain Condor-G -> Globus GRAM -> Batch System Front-end -> Execute Machine]
4
Expanding the Model
  • What can we do with new forms of job delegation?
  • Some ideas:
    • Mirroring
    • Load-balancing
    • Glide-in schedd
    • Multi-hop grid scheduling

5
Mirroring
  • What it does
  • Jobs mirrored on two Condor-Gs
  • If primary Condor-G crashes, secondary one starts
    running jobs
  • On recovery, primary Condor-G gets job status
    from secondary one
  • Removes Condor-G submit point as single point of
    failure
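The failover behaviour is illustrated in the minimal sketch below. The slide does not say how the secondary detects a crash, so the heartbeat timestamp and timeout used here are assumptions. Three other details are also assumed: the secondary stands down after recovery, and the `MirrorNode`, `check_failover` and `recover` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MirrorNode:
    """Hypothetical Condor-G instance holding a mirrored copy of the job queue."""
    name: str
    jobs: Dict[str, str] = field(default_factory=dict)  # job id -> status
    active: bool = False


def check_failover(primary_last_heartbeat: float, now: float, timeout: float,
                   secondary: MirrorNode) -> None:
    # Assumed detection rule: the secondary starts running jobs only once the
    # primary has been silent for longer than the timeout.
    if now - primary_last_heartbeat > timeout:
        secondary.active = True


def recover(primary: MirrorNode, secondary: MirrorNode) -> None:
    # On recovery the primary pulls the latest job status from the secondary.
    primary.jobs = dict(secondary.jobs)
    primary.active = True
    secondary.active = False  # assumption: the secondary stands down again


primary = MirrorNode("condor-g-1", {"j1": "idle", "j2": "idle"}, active=True)
secondary = MirrorNode("condor-g-2", dict(primary.jobs))

# Primary crashes: no heartbeat for 120 s with a 60 s timeout.
primary.active = False
check_failover(primary_last_heartbeat=0.0, now=120.0, timeout=60.0, secondary=secondary)
secondary.jobs["j1"] = "running"  # secondary makes progress while primary is down

recover(primary, secondary)
print(primary.jobs, primary.active, secondary.active)
```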

6
Mirroring Example
[Diagram: Matchmaker, Condor-G 1, Condor-G 2, Execute Machine]
7
Mirroring Example
[Diagram: same components as the previous slide]
8
Load-Balancing
  • What it does
  • Front-end Condor-G distributes all jobs among
    several back-end Condor-Gs
  • Front-end Condor-G keeps updated job status
  • Improves scalability
  • Maintains single submit point for users
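The sketch below shows one possible distribution policy. The front-end sends each new job to the back-end that currently holds the fewest jobs. It also keeps its own table of job status, so users still query a single submit point. The least-loaded policy, the class and method names, and the back-end names are all assumptions, because the slides do not specify how jobs are divided.

```python
from collections import Counter
from typing import Dict, List


class FrontEnd:
    """Hypothetical front-end Condor-G that spreads jobs over back-ends."""

    def __init__(self, backends: List[str]):
        self.backends = list(backends)
        self.placement: Dict[str, str] = {}  # job id -> back-end holding it
        self.status: Dict[str, str] = {}     # job id -> last known status

    def submit(self, job_id: str) -> str:
        # Assumed policy: pick the back-end with the fewest jobs; min() keeps
        # the first back-end listed when several are tied.
        load = Counter(self.placement.values())
        target = min(self.backends, key=lambda b: load[b])
        self.placement[job_id] = target
        self.status[job_id] = "submitted"
        return target

    def update(self, job_id: str, new_status: str) -> None:
        # Back-ends report status changes so the front-end stays current.
        self.status[job_id] = new_status


fe = FrontEnd(["backend-1", "backend-2", "backend-3"])
targets = [fe.submit(f"job-{i}") for i in range(5)]
fe.update("job-0", "running")
print(targets)
print(fe.status["job-0"])
```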

9
Load-Balancing Example
[Diagram: Condor-G Front-end delegating to Condor-G Back-ends 1, 2 and 3]
10
Glide-In Schedd
  • What it does
  • Drop a Condor-G onto the front-end machine of a
    cluster
  • Delegate jobs to the cluster through the glide-in
    schedd
  • Apply cluster-specific policies to jobs
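To illustrate "cluster-specific policies", the sketch below shows a hypothetical check that a glide-in schedd might run before it passes a job to the local batch system. The check refuses unknown users, clamps wall time to a local limit, and fills in a default queue. The policy keys, limits, and user names are invented for the example.

```python
from typing import Any, Dict, Optional

# Hypothetical local policy for one cluster; keys and limits are invented.
CLUSTER_POLICY: Dict[str, Any] = {
    "max_walltime_min": 720,
    "default_queue": "long",
    "allowed_users": {"alice", "bob"},
}


def apply_policy(job: Dict[str, Any],
                 policy: Dict[str, Any] = CLUSTER_POLICY) -> Optional[Dict[str, Any]]:
    """Return a copy of the job adjusted to local policy, or None to refuse it."""
    if job["owner"] not in policy["allowed_users"]:
        return None  # this cluster does not accept jobs from this user
    adjusted = dict(job)
    # Clamp the requested wall time (default 60 min) to the local limit.
    adjusted["walltime_min"] = min(int(job.get("walltime_min", 60)),
                                   policy["max_walltime_min"])
    # Fill in a queue only if the user did not pick one.
    adjusted.setdefault("queue", policy["default_queue"])
    return adjusted


print(apply_policy({"owner": "alice", "walltime_min": 2000}))
print(apply_policy({"owner": "mallory"}))
```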

11
Glide-In Schedd Example
[Diagram: Condor-G, Batch System]
12
Multi-Hop Grid Scheduling
  • Match a job to a Virtual Organization (VO), then
    to a resource within that VO
  • Easier to schedule jobs across multiple VOs and
    grids
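The following sketch shows how a two-hop match could work: the first hop chooses a VO, and the second chooses a resource inside it. The inventory and the CPU-count requirement are assumptions made for illustration. So are both ranking rules: the VO with the most free CPUs wins, then the smallest resource that fits. A real broker would match on much richer job and resource descriptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical inventory: each VO owns resources with some number of free CPUs.
GRID: Dict[str, Dict[str, int]] = {
    "vo-physics": {"cluster-a": 4, "cluster-b": 64},
    "vo-bio": {"cluster-c": 16},
}


def match_vo(cpus: int, grid: Dict[str, Dict[str, int]] = GRID) -> Optional[str]:
    # Hop 1: among VOs with at least one resource big enough, prefer the one
    # with the most free CPUs in total.
    candidates = [vo for vo, res in grid.items() if max(res.values()) >= cpus]
    return max(candidates, key=lambda vo: sum(grid[vo].values()), default=None)


def match_resource(vo: str, cpus: int,
                   grid: Dict[str, Dict[str, int]] = GRID) -> Optional[str]:
    # Hop 2: inside the chosen VO, pick the smallest resource that still fits.
    fits = [r for r, free in grid[vo].items() if free >= cpus]
    return min(fits, key=lambda r: grid[vo][r], default=None)


def two_hop_match(cpus: int) -> Optional[Tuple[str, str]]:
    vo = match_vo(cpus)
    if vo is None:
        return None  # no VO can run the job at all
    resource = match_resource(vo, cpus)
    return (vo, resource) if resource else None


print(two_hop_match(8))
print(two_hop_match(100))
```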

13
Multi-Hop Grid Scheduling Example
[Diagram: Experiment Resource Broker, VO Resource Broker, Experiment Condor-G, VO Condor-G, Globus GRAM, Batch Scheduler]
14
Endless Possibilities
  • These new models can be combined with each other
    or with other new models
  • Resulting system can be arbitrarily sophisticated

15
Job Delegation Challenges
  • New complexity introduces new issues and
    exacerbates existing ones
  • A few:
    • Transparency
    • Representation
    • Scheduling Control
    • Active Job Control
    • Revocation
    • Error Handling and Debugging

16
Transparency
  • Full information about job should be available to
    user
  • Information from full delegation path
  • No manual tracing across multiple machines
  • Users need to know what's happening with their
    jobs

17
Representation
  • Job state is a vector
  • How best to show this to the user?
    • Summary
      • Current delegation endpoint
      • Job state at endpoint
    • Full information available if desired
  • Series of nested ClassAds?
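Nested dictionaries give a rough approximation of the nested-ClassAd idea, as the sketch below shows. Each level records one hop, the summary follows the chain to its endpoint, and the full path is available on request. Plain Python dicts stand in for ClassAds here. The attribute names (`Scheduler`, `JobStatus`, `Delegated`) and host names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Job state as a chain of nested records, one level per delegation hop.
job_state: Dict[str, Any] = {
    "Scheduler": "condor-g-frontend",
    "JobStatus": "Delegated",
    "Delegated": {
        "Scheduler": "condor-g-backend-2",
        "JobStatus": "Delegated",
        "Delegated": {
            "Scheduler": "pbs.cluster.example",
            "JobStatus": "Running",
        },
    },
}


def summarize(ad: Dict[str, Any]) -> Tuple[str, str]:
    """Summary view: (current delegation endpoint, job state at that endpoint)."""
    while "Delegated" in ad:
        ad = ad["Delegated"]
    return ad["Scheduler"], ad["JobStatus"]


def full_path(ad: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Full view: every (scheduler, state) pair along the delegation path."""
    path = []
    while ad is not None:
        path.append((ad["Scheduler"], ad["JobStatus"]))
        ad = ad.get("Delegated")
    return path


print(summarize(job_state))
print(full_path(job_state))
```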

18
Scheduling Control
  • Avoid loops in delegation path
  • Give user control of scheduling
  • Allow limiting of delegation path length?
  • Allow user to specify part or all of delegation
    path
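The checks listed above could be applied at each delegation step roughly as follows. This sketch rests on three assumptions: a scheduler is identified by a string, and a user-specified path is treated as a required prefix of the delegation path. The names `check_delegation` and `DelegationRefused` are also hypothetical.

```python
from typing import List, Optional


class DelegationRefused(Exception):
    """Raised when a proposed delegation step violates a scheduling rule."""


def check_delegation(path: List[str], next_hop: str,
                     max_hops: Optional[int] = None,
                     required_path: Optional[List[str]] = None) -> List[str]:
    """Validate one more delegation step; return the extended path or raise."""
    if next_hop in path:
        # Delegating to a scheduler already on the path would create a loop.
        raise DelegationRefused(f"loop: {next_hop} already in {path}")
    new_path = path + [next_hop]
    if max_hops is not None and len(new_path) > max_hops:
        raise DelegationRefused(f"path longer than {max_hops} hops")
    if required_path is not None:
        # The user fixed some or all of the leading hops; the actual path
        # must agree with them as far as both go.
        for actual, wanted in zip(new_path, required_path):
            if actual != wanted:
                raise DelegationRefused(f"user required {wanted}, got {actual}")
    return new_path


print(check_delegation(["frontend"], "backend-1", max_hops=3,
                       required_path=["frontend", "backend-1"]))
```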

19
Active Job Control
  • User may request certain actions
    • hold, suspend, vacate, checkpoint
  • Actions cannot be completed synchronously for user
    • Must forward along delegation path
    • User checks completion later
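Because actions complete asynchronously, each request has to be tracked separately from the user's call. In the sketch below, a request returns an id immediately and advances one hop per `step()` along a fixed path. The user polls `status` later. The tracker class, the step-based forwarding, and the status strings are illustrative assumptions, not how Condor-G implements job control.

```python
import itertools
from collections import deque
from typing import Deque, Dict, List, Tuple


class ActionTracker:
    """Hypothetical tracker for user actions forwarded along a delegation path."""

    def __init__(self, path: List[str]):
        self.path = path
        self._ids = itertools.count(1)
        # Each entry: (request id, action, index of the hop currently holding it).
        self.pending: Deque[Tuple[int, str, int]] = deque()
        self.status: Dict[int, str] = {}

    def request(self, action: str) -> int:
        # Return immediately with a request id; nothing has been done yet.
        rid = next(self._ids)
        self.pending.append((rid, action, 0))
        self.status[rid] = "pending"
        return rid

    def step(self) -> None:
        # Move every pending request one hop further; the last hop completes it.
        for _ in range(len(self.pending)):
            rid, action, hop = self.pending.popleft()
            if hop == len(self.path) - 1:
                self.status[rid] = f"{action} done at {self.path[hop]}"
            else:
                self.status[rid] = f"forwarded to {self.path[hop + 1]}"
                self.pending.append((rid, action, hop + 1))


tracker = ActionTracker(["condor-g", "globus-gram", "batch-system"])
rid = tracker.request("hold")
tracker.step()
print(tracker.status[rid])  # user checks: still in progress
tracker.step()
tracker.step()
print(tracker.status[rid])  # user checks again later
```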

20
Active Job Control (cont.)
  • Endpoint systems may not support actions
  • If possible, execute them at furthest point that
    does support them
  • Allow user to apply action in middle of
    delegation path
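Choosing where to run an action could work as in the sketch below. The code walks the delegation path from the far end back toward the user and stops at the first hop that supports the action. It can optionally ignore hops beyond a point the user names. The capability table is invented for the example and does not describe what GRAM or any real batch system supports.

```python
from typing import Dict, List, Optional, Set

# Hypothetical capability table: which actions each hop can carry out itself.
SUPPORTED: Dict[str, Set[str]] = {
    "condor-g": {"hold", "suspend", "vacate", "checkpoint"},
    "globus-gram": {"hold"},
    "batch-system": {"hold", "suspend"},
}


def where_to_execute(path: List[str], action: str,
                     stop_at: Optional[str] = None) -> Optional[str]:
    """Return the furthest hop on the path that supports the action.

    If the user names a hop in the middle of the path (stop_at), hops
    beyond it are not considered."""
    if stop_at is not None:
        path = path[: path.index(stop_at) + 1]
    for hop in reversed(path):
        if action in SUPPORTED.get(hop, set()):
            return hop
    return None  # no hop on the path can perform the action


path = ["condor-g", "globus-gram", "batch-system"]
print(where_to_execute(path, "suspend"))
print(where_to_execute(path, "suspend", stop_at="globus-gram"))
```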

21
Revocation
  • Leases
    • Lease must be renewed periodically for delegation
      to remain valid
    • Allows revocation during long-term failures
  • What are good values for lease lifetime and
    update interval?
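A lease can be modeled as an expiry time that each timely renewal pushes forward, as in the sketch below. The 300 s lifetime and 100 s renewal interval are arbitrary illustrative values, not answers to the question on the slide. Their ratio determines how many consecutive renewals can be missed before the delegation lapses. The `Lease` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Hypothetical delegation lease; times are in seconds."""
    lifetime: float
    expires_at: float

    @classmethod
    def grant(cls, now: float, lifetime: float) -> "Lease":
        return cls(lifetime, now + lifetime)

    def valid(self, now: float) -> bool:
        return now < self.expires_at

    def renew(self, now: float) -> None:
        # A renewal only counts if it arrives before the lease has expired;
        # once expired, the delegation is revoked and cannot be revived.
        if self.valid(now):
            self.expires_at = now + self.lifetime


lease = Lease.grant(now=0, lifetime=300)
for t in range(0, 900, 100):  # delegator renews every 100 s up to t=800
    lease.renew(t)
print(lease.valid(950))   # last renewal at t=800, so valid until t=1100
print(lease.valid(1200))  # renewals stopped (e.g. long failure): lease lapsed
```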

22
Error Handling and Debugging
  • Many more places for things to go horribly wrong
  • Need clear, simple error semantics
  • Logs, logs, logs
  • Have them everywhere
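Logs are easier to use across a delegation path when every line carries the job id and the hop that wrote it, because a job can then be followed from machine to machine. The sketch uses Python's standard `logging.LoggerAdapter`. The `delegation.*` logger names and the `job_id`/`hop` fields are illustrative conventions, not Condor log formats.

```python
import logging

# Assumed convention: every log line carries the hop name and job id so one
# job can be traced across all machines on its delegation path.
FORMAT = "%(asctime)s %(hop)s job=%(job_id)s %(levelname)s %(message)s"

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(FORMAT))
base = logging.getLogger("delegation")  # parent of all per-hop loggers
base.addHandler(handler)
base.setLevel(logging.INFO)


def hop_logger(hop: str, job_id: str) -> logging.LoggerAdapter:
    """Return a logger that stamps every record with this hop and job id."""
    return logging.LoggerAdapter(logging.getLogger(f"delegation.{hop}"),
                                 {"hop": hop, "job_id": job_id})


log = hop_logger("condor-g-backend-2", "job-42")
log.info("delegated to globus-gram")
log.error("remote submit failed: %s", "connection refused")
```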

23
Current Status
  • Done
    • Mirroring
  • In Progress
    • Condor-G -> Condor-G delegation
      • User must specify hops
    • Glide-in schedd
      • Set up by hand

24
Thank You!
  • Questions?