Title: 10 Actions When SuperScheduling: A Grid Scheduling Architecture
Slide 1: 10 Actions When SuperScheduling: A Grid
Scheduling Architecture
- Jennifer Schopf
- Argonne National Laboratory
- jms_at_mcs.anl.gov
- Scheduling Architecture Workshop
- GGF7, Tokyo
- March 5, 2003
Slide 2: 10 Actions History
- 10 Actions for Superscheduling
- June 2000-July 2001
- How does a user schedule on the grid?
- Resulted in GGF document GWD-CI.5
- In late 2002 this was updated and sent to a special issue
- Includes examples from current approaches
- www.mcs.anl.gov/jms/Pubs
Slide 3: Context
- Grid scheduling defined as the process of making
scheduling decisions involving resources over
multiple administrative domains
- User is currently the most common Grid
Scheduler
- The steps defined are currently performed by
other Grid-level schedulers, but no single
approach performs all of them
Slide 4: Context, cont.
- Grid schedulers aren't Local Resource Managers
(LRMS)
- no ownership or control over resources
- jobs get submitted to LRMS as user
- Grid scheduler doesn't have control or often even
info about job submitted at this level
- We did not consider errors
Slide 5: In a nutshell...
Slide 6: Ordering is approximate (from GGF doc)
- We use the word "step" and a numbering system
for easy reference. This does not imply that
these actions are actually performed in this
order, or that they all MUST occur in every
system. In general, don't pay too much attention
to the numbering. Some of the steps may be
interactive, recursive, repeated, or just plain
ignored by a given system.
Slide 7: Phase One - Resource Discovery
Slide 8: 1. Authorization Filtering
- Where do you have an account?
- User
- List in a drawer
- Ideally
- A wallet of credentials, smart enough to
remember my username at different sites as well
- Today's systems
- EDG and GridLab scheduler use MDS for this info
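Authorization filtering can be sketched as a lookup against the "wallet of credentials" described above. This is only an illustration: the `Credential` type, the site names, and the usernames are hypothetical, and a real scheduler would query an information service rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Hypothetical wallet entry: the username the user holds at one site."""
    site: str
    username: str


def authorized_sites(wallet, candidate_sites):
    """Keep only the candidate sites for which the wallet holds a credential."""
    by_site = {cred.site: cred for cred in wallet}
    return [by_site[site] for site in candidate_sites if site in by_site]


# Illustrative data only; these sites and usernames are made up.
wallet = [Credential("site-a", "alice"), Credential("site-c", "asmith")]
usable = authorized_sites(wallet, ["site-a", "site-b", "site-c"])
print([(cred.site, cred.username) for cred in usable])
```

Returning the credential rather than just the site name keeps the per-site username available for the later submission step.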
Slide 9: 2. Application Definition
- User
- Generally user defined
- Often inaccurate, incomplete
- Ideally
- Smart compilers or other tools to automatically
generate information about application
requirements and runtimes
- Today's systems
- User-defined info at the command line, or Condor
ClassAds
Slide 10: 3. Minimum Requirement Filtering
- Use static data to limit the search space
- Used to cut down dynamic queries needed
- Can be combined with dynamic search (4)
- User
- "I know I need Linux, I don't consider others"
- Ideally
- Automatic, part of dynamic search
- Today's systems
- Part of dynamic search (4)
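A minimal sketch of filtering on static data before any dynamic query is made. The attribute names (`os`, `arch`) and the resource records are assumptions chosen for illustration, not fields of any particular information service.

```python
def meets_minimum(resource, requirements):
    """True if every static requirement equals the resource's static attribute."""
    return all(resource.get(key) == wanted for key, wanted in requirements.items())


# Illustrative static descriptions; names and attributes are made up.
resources = [
    {"name": "cluster-1", "os": "Linux", "arch": "x86"},
    {"name": "cluster-2", "os": "Solaris", "arch": "sparc"},
    {"name": "cluster-3", "os": "Linux", "arch": "x86"},
]
requirements = {"os": "Linux"}

# Only the shortlist would go on to the more expensive dynamic search (4).
shortlist = [r["name"] for r in resources if meets_minimum(r, requirements)]
print(shortlist)
```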
Slide 11: Phase One - Resource Discovery
Phase Two - System Selection
Slide 12: 4. Information Gathering
- Dynamic searches to match resources with
application requirements
- User
- Might use the Globus MDS or a portal information
service like HotPage, might just know
- Ideally
- Seamless interface to global monitoring
- Today's systems
- EDG interacts with MDS, PBS has its own
internally
Slide 13: 5. System Selection
- Matching between resources and application
information
- Users
- Best estimate
- Ideally
- Perfect matches based on current information,
using variance information and other predictions
- Today's systems
- Condor - matchmaking
- PBS - heuristic algorithms
- Maui/Silver - submit to local sites, evaluate
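One way to picture the matching step is a requirements predicate plus a rank function, loosely in the spirit of matchmaking. This is a sketch under stated assumptions: it is not Condor's ClassAd language or API, and the fields `free_cpus` and `queue_wait_s` are hypothetical dynamic attributes.

```python
def select_system(job, resources):
    """Return the best-ranked resource that satisfies the job, or None."""
    candidates = [r for r in resources if job["requirements"](r)]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])


# Hypothetical job description: a hard constraint and a preference.
job = {
    "requirements": lambda r: r["os"] == "Linux" and r["free_cpus"] >= 16,
    "rank": lambda r: -r["queue_wait_s"],  # prefer the shortest expected wait
}

# Illustrative snapshot of dynamic information; values are made up.
resources = [
    {"name": "cluster-1", "os": "Linux", "free_cpus": 32, "queue_wait_s": 600},
    {"name": "cluster-2", "os": "Linux", "free_cpus": 8, "queue_wait_s": 10},
    {"name": "cluster-3", "os": "Linux", "free_cpus": 64, "queue_wait_s": 120},
]
best = select_system(job, resources)
print(best["name"])
```

Here cluster-2 has the shortest wait but fails the hard constraint, so the choice falls to the eligible resource with the next-shortest wait.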
Slide 14: Phase One - Resource Discovery
Phase Three - Job Execution
Slide 15: 6. Advance Reservation (Optional)
- Reserve resources in a guaranteed way
- Users
- Call up sys admins and friends (call, on the
phone)
- Ideally
- Automatically done when you submit a job based on
user requirements
- Current systems
- Enabled in PBSPro and Maui
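At its core, a guaranteed reservation needs a check that a requested time slot does not collide with one already granted. The sketch below shows only that check; `ReservationBook` is a hypothetical class, not the PBSPro or Maui interface, and times are plain numbers (for example, hours).

```python
class ReservationBook:
    """Hypothetical per-resource calendar of half-open [start, end) slots."""

    def __init__(self):
        self._slots = []

    def reserve(self, start, end):
        """Book the slot if it overlaps no existing reservation; report success."""
        if start >= end:
            raise ValueError("reservation must have positive length")
        for booked_start, booked_end in self._slots:
            # Two half-open intervals overlap when each starts before the other ends.
            if start < booked_end and booked_start < end:
                return False
        self._slots.append((start, end))
        return True


book = ReservationBook()
first = book.reserve(9, 12)      # granted: calendar is empty
clash = book.reserve(11, 13)     # refused: overlaps 9-12
adjacent = book.reserve(12, 14)  # granted: starts exactly when 9-12 ends
print(first, clash, adjacent)
```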
Slide 16: 7. Job Submission
- Run the job on the resources selected
- User
- qsub
- Ideally
- Make it so
- Current systems
- Each has its own API
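Because each system has its own submission interface, a Grid-level tool typically needs a thin adapter per LRMS. A minimal sketch, assuming the simplest form of two real commands (`qsub` for PBS, `condor_submit` for Condor) with a single file argument. The function name and the `lrms` keys are hypothetical, and the command is only built, not executed.

```python
def build_submit_command(lrms, job_file):
    """Return the argv a wrapper might run to submit job_file to the given LRMS."""
    submitters = {
        "pbs": ["qsub"],
        "condor": ["condor_submit"],
    }
    if lrms not in submitters:
        raise ValueError(f"no submit adapter for {lrms!r}")
    return submitters[lrms] + [job_file]


print(build_submit_command("pbs", "job.sh"))
print(build_submit_command("condor", "job.sub"))
```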
Slide 17: 8. Preparation tasks (11. Clean-up tasks)
- File transfers, directory set ups
- Users
- scp, ftp, mkdir
- Ideally
- Automatically done as part of job submission
- Current systems
- Condor/DagMan can do file staging
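The preparation and clean-up pair can be sketched on a single machine with the standard library. This is an assumption-heavy stand-in: real staging moves files between sites (the scp/ftp case above), while this example only copies within a temporary directory; `stage_in` and `clean_up` are hypothetical helper names.

```python
import shutil
import tempfile
from pathlib import Path


def stage_in(input_files, work_dir):
    """Create work_dir if needed and copy each input file into it."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy(src, work_dir)) for src in input_files]


def clean_up(work_dir):
    """Remove the working directory and everything staged into it."""
    shutil.rmtree(work_dir, ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input.dat"
    source.write_text("example input\n")

    run_dir = Path(tmp) / "run" / "job-1"
    staged = stage_in([source], run_dir)    # step 8: preparation
    staged_names = [p.name for p in staged]

    clean_up(run_dir)                       # step 11: clean-up
    cleaned = not run_dir.exists()

print(staged_names, cleaned)
```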
Slide 18: 9. Monitoring Progress
- How is my job doing?
- Should I move it somewhere else?
- Users
- qstat
- Moving is hard to do, so generally not done
- Ideally
- System takes care of it based on intuitive
knowledge of user requirements, and good
prediction techniques
- Current systems
- Every LRMS has a stat command
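The two questions above ("How is my job doing?" and "Should I move it?") can be sketched as a polling loop with a simple threshold. Everything here is an assumption for illustration: the status strings, the `monitor` function, and the rule that too many consecutive "queued" polls suggests a move. The status source is injected so the example runs without an LRMS; a real loop would call the local stat command and sleep between polls.

```python
def monitor(poll_status, max_polls, max_queued_polls):
    """Poll until the job finishes; suggest a move if it sits queued too long."""
    queued_polls = 0
    for _ in range(max_polls):
        status = poll_status()
        if status == "done":
            return "done"
        if status == "queued":
            queued_polls += 1
            if queued_polls > max_queued_polls:
                return "consider-moving"
        else:
            queued_polls = 0  # the job made progress, so reset the counter
    return "still-running"


# Stand-in for an LRMS status command: replays a fixed sequence of states.
states = iter(["queued", "queued", "running", "running", "done"])
outcome = monitor(lambda: next(states), max_polls=10, max_queued_polls=3)
print(outcome)
```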
Slide 19: Summary
Slide 20: For more information
- Jennifer Schopf
- jms_at_mcs.anl.gov
- Current document
- www.mcs.anl.gov/jms/Pubs/sched.arch.2002.pdf