My_Qsub - PowerPoint PPT Presentation

About This Presentation
Title:

My_Qsub

Slides: 16
Provided by: inba2

Transcript and Presenter's Notes

Title: My_Qsub


1
My_Qsub
2
Overriding the system's qsub
  • The system has a qsub command with which we
    submit scripts.
  • Our intention is to extend this command so that
    it can handle ambulance jobs (jobs that must run
    uninterrupted once submitted).
  • To achieve this we created a C++ source file
    named My_Qsub.cpp.

3
  • This file is compiled into an executable named
    qsub, which replaces the system's built-in qsub
    command in SGE_ROOT/bin/glinux.
  • To keep the original qsub, we rename it to
    copy_qsub.
  • copy_qsub remains in SGE_ROOT/bin/glinux.

4
Algorithm
  • Every call to qsub checks for the flag
    "-ambulance".
  • An ambulance job must be submitted in the form
    'qsub -ambulance scriptName' (otherwise an error
    message is printed).
  • If the flag does not appear, we call the
    original qsub (now called "copy_qsub") with the
    given parameters.
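
The dispatch rule on this slide can be sketched roughly as follows. This is only an illustration of the three outcomes described above, not the code in My_Qsub.cpp; the names Dispatch, classify and scriptName are hypothetical, and the sketch assumes "-ambulance" must be the first of exactly two arguments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Possible outcomes of inspecting the wrapper's arguments (hypothetical names).
enum class Dispatch { Ambulance, PassThrough, UsageError };

// args holds everything after the program name, i.e. argv[1..].
Dispatch classify(const std::vector<std::string>& args) {
    bool hasFlag = false;
    for (const std::string& a : args) {
        if (a == "-ambulance") hasFlag = true;
    }
    // No flag: the call is handed unchanged to the original qsub (copy_qsub).
    if (!hasFlag) return Dispatch::PassThrough;
    // Assumed strict form: qsub -ambulance scriptName
    if (args.size() == 2 && args[0] == "-ambulance" && args[1] != "-ambulance") {
        return Dispatch::Ambulance;
    }
    return Dispatch::UsageError;  // flag present but in the wrong form
}

// Returns the script to submit; only valid for a well-formed ambulance call.
std::string scriptName(const std::vector<std::string>& args) {
    assert(classify(args) == Dispatch::Ambulance);
    return args[1];
}
```

In a real wrapper the PassThrough branch would forward the untouched argument list to copy_qsub, so ordinary submissions behave exactly as before.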

5
Algorithm (cont.)
  • If the call was correct ('qsub -ambulance
    scriptName'), it looks for a free host (meaning
    a host whose ambulance Q is free).
  • If one exists - it submits the ambulance job to
    it.
  • Else - it looks for the least busy host and
    submits to it (the host with the fewest jobs
    waiting on its Q and, among hosts with an equal
    number of waiting jobs, the most powerful).
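
A minimal sketch of this selection rule, assuming the hosts are already listed strongest first (a later slide says strength is decided by the order of appearance in QconfiurationFile). The HostStatus struct and pickHost function are hypothetical; how My_Qsub.cpp actually obtains the running and waiting counts is not described in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Snapshot of one host's ambulance Q (hypothetical structure).
struct HostStatus {
    std::string name;
    bool ambulanceRunning;  // an ambulance job occupies the single slot
    std::size_t waiting;    // ambulance jobs queued behind it
};

// hosts must be ordered strongest first. Returns the index of the chosen host.
std::size_t pickHost(const std::vector<HostStatus>& hosts) {
    assert(!hosts.empty());
    // 1. Prefer the first (strongest) host whose ambulance Q is free.
    for (std::size_t i = 0; i < hosts.size(); ++i) {
        if (!hosts[i].ambulanceRunning && hosts[i].waiting == 0) return i;
    }
    // 2. Otherwise take the host with the fewest waiting jobs. The strict '<'
    //    keeps the earliest, i.e. strongest, host when counts are equal.
    std::size_t best = 0;
    for (std::size_t i = 1; i < hosts.size(); ++i) {
        if (hosts[i].waiting < hosts[best].waiting) best = i;
    }
    return best;
}
```

Keeping the list in strength order means the tie-break needs no separate "power" field: the first match always wins.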

6
Preconditions
  • A file named QconfiurationFile resides in the
    directory SGE_ROOT/bin/glinux on the computer
    from which we submit. Each row of
    QconfiurationFile contains (hostName Qname).
  • A file named copy_qsub (the copy of the
    original qsub) is kept in SGE_ROOT/bin/glinux.
  • Each host has one ambulance Q with 1 slot
    (it begins disabled!).
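
Reading the (hostName Qname) rows might look like the sketch below. It assumes the two fields are separated by whitespace and that blank or incomplete rows can be skipped; the slides do not specify either point, and HostQueue and parseConfig are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One row of the configuration file: a host and the Q named next to it.
struct HostQueue {
    std::string host;
    std::string queue;
};

// Reads "hostName Qname" rows in file order, which later serves as strength order.
std::vector<HostQueue> parseConfig(std::istream& in) {
    std::vector<HostQueue> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        HostQueue row;
        // Rows without both fields (including blank lines) are skipped.
        if (fields >> row.host >> row.queue) rows.push_back(row);
    }
    return rows;
}
```

Because the function takes any std::istream, it can be fed a std::ifstream opened on the real file or a std::istringstream in a test.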

7
Preconditions (cont.)
  • Before an ambulance job starts running:
  • It suspends all regular Qs on the target host
    (the host it will run on); by default they are
    unsuspended.
  • It enables the ambulance Q (by default it is
    disabled).
  • NOTE
  • Suspended Q - all jobs in the Q are suspended and
    the Q does not receive new jobs.
  • Disabled Q - jobs in the Q continue to run, but
    the Q does not receive new jobs.
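
The difference between a suspended and a disabled Q, and the preparation step on this slide, can be modelled with two flags. This is a toy in-memory model for illustration only; the real wrapper would change queue state through SGE's administrative commands, and all names here are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Administrative state of one Q, following the NOTE on this slide.
struct QueueState {
    bool suspended = false;
    bool disabled = false;
    // A Q receives new jobs only when it is neither suspended nor disabled.
    bool acceptsNewJobs() const { return !suspended && !disabled; }
    // Jobs already in the Q keep running unless the Q is suspended.
    bool jobsKeepRunning() const { return !suspended; }
};

struct HostQueues {
    std::vector<QueueState> regular;
    QueueState ambulance;
};

// Default state: regular Qs unsuspended, ambulance Q disabled.
HostQueues initialHost(std::size_t regularCount) {
    HostQueues h;
    h.regular.resize(regularCount);
    h.ambulance.disabled = true;
    return h;
}

// Preparation before an ambulance job starts on this host.
void prepareForAmbulance(HostQueues& h) {
    for (QueueState& q : h.regular) q.suspended = true;
    h.ambulance.disabled = false;
}
```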

8
Postconditions
  • When an ambulance job finishes running normally,
    it checks whether more jobs are waiting on the
    ambulance Q.
  • If yes - it enables the ambulance Q.
  • Else - it disables the ambulance Q and unsuspends
    all the host's regular Qs.
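
The clean-up decision above reduces to a single branch on the number of waiting ambulance jobs. The sketch below returns the actions as values instead of executing them; Action and onAmbulanceFinished are hypothetical names, and how the waiting count is obtained is left open.

```cpp
#include <cstddef>
#include <vector>

// Administrative steps the finishing ambulance job may take (hypothetical names).
enum class Action { EnableAmbulanceQ, DisableAmbulanceQ, UnsuspendRegularQs };

// Decides what to do on the host once an ambulance job has exited normally.
std::vector<Action> onAmbulanceFinished(std::size_t waitingAmbulanceJobs) {
    if (waitingAmbulanceJobs > 0) {
        // Let the next waiting ambulance job in; regular Qs stay suspended.
        return {Action::EnableAmbulanceQ};
    }
    // Nothing is waiting: return the host to its default state.
    return {Action::DisableAmbulanceQ, Action::UnsuspendRegularQs};
}
```

Returning the actions as data keeps the decision separate from the commands that would carry it out, which makes the rule easy to check in isolation.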

9
Use-Cases
  • Preliminary: SGE allows priorities but does not
    directly support the so-called "ambulance job",
    which upon entry must run with no other
    disturbance (unless another ambulance job is
    already running, in which case it waits).

10
The Cases
  • Simple-Case(1): No ambulance job is running in
    the system, and we now submit an ambulance job.
  • Simple-Case(2): The number of ambulance jobs
    submitted does not exceed the number of free hosts.
  • Complex-Case: We submit one or more ambulance
    jobs, exceeding the number of free ambulance Qs.

11
Simple-Case(1) (No ambulance job is running in the
system, and we now submit an ambulance job)
  • It will find the strongest host.
  • It will suspend the regular Qs on that host.
  • It will submit itself to that host.
  • Upon entrance to the Q, it disables the Q it runs
    on. (It can now either exit or get deleted.)

12
Simple-Case(1) (cont.)
  • If deleted: our overriding version of the qdel
    command is used.
  • If it exits (normally):
  • If no job is waiting on the ambulance Q, we
    unsuspend all the regular Qs on the host (they
    were suspended when the ambulance job began).
  • If there are ambulance jobs waiting, we enable
    the ambulance Q (because it was disabled when an
    ambulance job started running).

13
Simple-Case(2) (The number of ambulance jobs
submitted does not exceed the number of free hosts)
  • In that case we have multiple occurrences of
    Simple-Case(1).

14
Complex-Case (We submit one or more ambulance
jobs, exceeding the number of free ambulance Qs)
  • We seek the least busy host (the host with the
    fewest waiting ambulance jobs; if there are
    several such hosts we choose the strongest, as
    decided by the order of appearance in
    QconfiurationFile).
  • Deleted waiting or running jobs are handled in
    our version of qdel.

15
Another feature
  • Since a host may crash, we must avoid sending
    ambulance jobs to a crashed host ourselves
    (because we bypass the scheduler).
  • This is also handled in My_Qsub.cpp.