Title: Grid Scheduling Overview
1. Grid Scheduling Overview
- Jennifer M. Schopf
- Argonne National Lab
- March 10, 2003
2. What is Grid Scheduling?
- Process of making scheduling decisions involving resources over multiple administrative domains
  - May be one machine at one site, but choices are distributed
  - May be multiple machines at multiple sites
- Also called superscheduling, meta-scheduling, scheduling at the Grid level, etc.
3. A Grid Scheduler is Not a Local Resource Management System (LRMS)
- No ownership or control over local resources
- Jobs get submitted to the LRMS as a user
- No control over, and often no information about, other Grid jobs
4. Grid Scheduling Involves
- Acquiring information about jobs and resources
  - Generally inaccurate and out of date
- Matching jobs to resources
- Managing data
- Monitoring progress of the job
- And more!
5. This Talk
- Overview of a few systems
  - Condor, PBS, KB scheduler, AppLeS, Maui/Silver
- Grid Scheduling Architecture from GGF
- Summary
6. Condor
- Condor is a high-throughput computing (HTC) approach
  - Delivers large amounts of processing capacity over long periods of time
- User submits a Condor job, and the system finds an available machine
- When the machine becomes busy, the job is checkpointed and migrated
7. ClassAds and MatchMaking
- ClassAds
  - A way to describe jobs and resources
  - Similar to a newspaper's classified ads
- Machines use a resource offer ad to advertise resource properties, both static and dynamic
  - Available RAM, CPU type, CPU speed, virtual memory size, physical location, and current load average
- User specifies a resource request ad
  - Defines both the required and a desired set of properties of the resource to run the job
- Both ads have ranking functions
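The two-sided matching described on this slide can be sketched in Python. This is only an illustration of the idea, not Condor's ClassAd language or its matchmaker: the dictionary layout, the attribute names (`ram_mb`, `load_avg`, `image_mb`), and the `match` function are all assumptions made for the example.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]  # hypothetical: an ad is a plain dict of attributes


def match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Return the best machine for the job, or None if nothing matches."""
    # Both sides must accept each other: the request ad's requirements are
    # checked against the machine, and the offer ad's against the job.
    candidates = [m for m in machines
                  if job["requirements"](m) and m["requirements"](job)]
    if not candidates:
        return None
    # Prefer the job's ranking of the machine, then the machine's ranking of the job.
    return max(candidates, key=lambda m: (job["rank"](m), m["rank"](job)))


machines = [
    {"name": "a", "os": "LINUX", "ram_mb": 512, "load_avg": 0.9,
     "requirements": lambda j: j["image_mb"] <= 512, "rank": lambda j: 0},
    {"name": "b", "os": "LINUX", "ram_mb": 1024, "load_avg": 0.1,
     "requirements": lambda j: j["image_mb"] <= 1024, "rank": lambda j: 0},
    {"name": "c", "os": "SOLARIS", "ram_mb": 2048, "load_avg": 0.0,
     "requirements": lambda j: True, "rank": lambda j: 0},
]

job = {
    "owner": "alice", "image_mb": 256,
    # Required properties: operating system and minimum memory.
    "requirements": lambda m: m["os"] == "LINUX" and m["ram_mb"] >= 256,
    # Desired property: the lower the load average, the better.
    "rank": lambda m: -m["load_avg"],
}

best = match(job, machines)
```

Here the required properties act as a hard filter and the ranking functions only order the machines that pass it, which is why the idle Solaris machine is never considered.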
8. Portable Batch System (PBS)
- Workload management solution for HPC systems and Linux clusters
- Originally designed for NASA because existing LRMS were inadequate for modern parallel/distributed computers and clusters
- Provides
  - Extraction of scheduling policy into a single separable, completely customizable module
  - Additional controls over initiating or scheduling execution of batch jobs
  - Routing of those jobs between different hosts
- Site can define and implement individual policies
9. PBS Components
- Typical interaction: client-server model
  - Clients make (batch) requests to servers
  - Servers perform work on behalf of the clients
- Server manages a number of different objects
  - Queues or jobs, each object consisting of a number of data items or attributes
- Server provides batch services
  - Creating, routing, executing, modifying, or deleting jobs for batch clients
10. KB Scheduler
- Led by Jarek Nabrzyski, Poznan Supercomputing and Networking Center, part of GridLab
- Makes decisions using an AI knowledge-based (KB) multi-criteria job searching technique
  - Information about time costs, user preferences, load balancing, memory usage, cache usage
- Uses a set of AI expert techniques, each with its own strengths and weaknesses
- Implemented on top of the Globus Toolkit with some added high-level services
  - Advance reservations
  - Extensions to the standard information providers
11. AppLeS
- AppLeS: Application Level Scheduling project
  - Berman, Wolski et al., UCSD
- High-performance scheduler targeted to multi-user distributed heterogeneous environments
- Each Grid app is scheduled by its own AppLeS
  - Determines and actuates a schedule
  - Schedule is customized for the individual app and the target computational Grids at execution time
- Everything in the system is evaluated in terms of its impact on the application
  - Resources in the system are evaluated in terms of predicted capacities at execution time, as well as their potential for satisfying application resource requirements
12. AppLeS Parameter Sweep Template
- Parameter sweep applications
  - Structured as sets of experiments
  - Each executed with a distinct set of parameters
  - Each experiment is independent
- Often structured so that distinct experiments share large input files and produce large output files
- To achieve efficiency, shared data files must be co-located with experiments
- Schedule must adapt to the dynamically fluctuating performance of the shared resources
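The co-location point can be illustrated with a small Python sketch. It is not the AppLeS parameter sweep algorithm: the `colocate` function, the host names, and the file names are hypothetical, and the sketch is static, so it leaves out the adaptation to fluctuating resource performance that the last bullet calls for.

```python
from collections import defaultdict


def colocate(experiments, hosts):
    """Assign experiments to hosts so that experiments sharing an input file run together.

    experiments: list of (experiment_id, input_file) pairs
    hosts: list of host names
    Returns a dict mapping each host to its list of experiment ids.
    """
    # Group experiments by the large input file they share.
    by_file = defaultdict(list)
    for exp_id, input_file in experiments:
        by_file[input_file].append(exp_id)

    schedule = {h: [] for h in hosts}
    # Place the largest groups first, each on the host with the fewest
    # experiments so far, so every shared file is staged to one host only.
    for _, group in sorted(by_file.items(), key=lambda kv: -len(kv[1])):
        target = min(hosts, key=lambda h: len(schedule[h]))
        schedule[target].extend(group)
    return schedule


experiments = [(1, "big_a.dat"), (2, "big_a.dat"), (3, "big_b.dat"),
               (4, "big_a.dat"), (5, "big_b.dat"), (6, "big_c.dat")]
schedule = colocate(experiments, ["host1", "host2"])
```

An adaptive version would replace the experiment count used in `min` with a predicted completion time per host, refreshed as measurements change.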
13. AppLeS Architecture
- Application-specific info from the end user or application developer
  - Via the Heterogeneous Application Template
  - User preferences as well
- Dynamic system information provided by the Network Weather Service
- All data is used by the AppLeS Coordinator to determine a potentially performance-efficient application schedule
- Coordinator then works with the appropriate resource management systems to implement the schedule on the relevant resources
14. Silver/Maui
- Grid-level module, Silver, interacts with the local version of the Maui scheduler on each resource
- User submits a PBS-style job submission locally
- This is translated into a meta-job submission and sent to the Silver module
- Queuing policies of the local Maui installations are determined individually
- Simulation framework as well
15. System Overview
- Each addresses a slightly different context
- There are many more that I don't have time to discuss here
- None of them does everything you want in a Grid environment
- So what is the right approach?
16. This Talk
- Overview of a few systems
  - Condor, PBS, KB scheduler, AppLeS, Maui/Silver
- Grid Scheduling Architecture from GGF
- Summary
17. 10 Actions for Superscheduling
- 10 Actions for Superscheduling
- GGF Scheduling working group
- Initial discussions June 2000-July 2001
- Q: How does a user schedule on the grid?
- Resulted in GGF CI.5
- In late 2002 this was updated and sent to a special issue on Grid computing
- Includes examples from current approaches
- www.mcs.anl.gov/jms/Pubs
18. Context
- User is currently the most common Grid scheduler
- Every action defined is currently performed by some Grid-level scheduler, but no current approach does them all
- Note: We did not consider error conditions
19. In a nutshell...
20. Ordering was approximate (from GGF doc)
- We use the word "step" and a numbering system for easy reference. This does not imply that these actions are actually performed in this order, or that they all MUST occur in every system. In general, don't pay too much attention to the numbering. Some of the steps may be interactive, recursive, repeated, or just plain ignored.
21. Phase One Resource Discovery
Phase One - Resource Discovery
22. 1. Authorization Filtering
- Authentication
  - Establishing identity (who are you?)
- Authorization
  - Establishing permissions (what can you do?)
- Where do you have an account?
- Not a new problem, only one made more complicated by more extensive access to systems
23. Authorization Filtering Cont.
- User
  - List in a drawer
- Ideally
  - A wallet of credentials, smart enough to remember my username at different sites as well
24. Today's Systems
- KB scheduler uses Globus MDS for this info
- PBS allows administrators to set up specific queues for authorization groups by setting up execution lists and attaching this information to a specific queue
- Condor does not require an account (login) on machines where it runs a job
25. 2. Application Definition
- Minimal set of job requirements to further filter the number of available resources
- Can be static data
  - OS type, hardware a binary is available for, an architecture it is best suited for
- Can be dynamic info
  - Amount of RAM, connectivity, space in /tmp
26. Application Definitions
- User
  - Generally user defined
  - Often inaccurate, incomplete
- Ideally
  - Smart compilers or other tools to automatically generate information about application requirements and runtimes
27. Today's Systems
- User defined at the command line
  - E.g., RSL in Globus
- Information in Condor ClassAds, also user defined
28. 3. Minimum Requirement Filtering
- Use static data to limit the search space
- Cuts down the number of dynamic queries needed
- Can be combined with dynamic search (step 4)
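A minimal Python sketch of why the static pass helps: only resources that survive it are queried for live state. The resource records, the `query_free_ram` stand-in for a Grid information system, and both function names are assumptions made for illustration.

```python
def static_filter(resources, requirements):
    """Keep only resources whose static attributes equal every required value."""
    return [r for r in resources
            if all(r.get(key) == value for key, value in requirements.items())]


def dynamic_filter(resources, min_free_ram_mb, query_free_ram):
    """Query live state only for the resources that survived the static pass."""
    return [r for r in resources
            if query_free_ram(r["name"]) >= min_free_ram_mb]


resources = [
    {"name": "n1", "os": "linux", "arch": "x86"},
    {"name": "n2", "os": "solaris", "arch": "sparc"},
    {"name": "n3", "os": "linux", "arch": "x86"},
]

# Stand-in for a Grid information system; it records each dynamic query issued.
free_ram = {"n1": 128, "n2": 4096, "n3": 900}
queries = []


def query_free_ram(name):
    queries.append(name)
    return free_ram[name]


shortlist = static_filter(resources, {"os": "linux", "arch": "x86"})
usable = dynamic_filter(shortlist, 512, query_free_ram)
```

In this example the Solaris node is never queried, so two dynamic lookups are issued instead of three.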
29. Minimum Requirement Filtering
- User
  - I know I need Linux, so I don't consider others
- Ideally
  - Automatic, part of dynamic search
  - No reason to limit search this way
30. Today's Systems
- Most do this as part of dynamic filtering (step 4)
- PBS
  - First pass sorts available jobs according to some administrator-defined policy
  - Second-tier evaluation uses dynamic filters to determine which of those should be run soonest (step 4)
  - High-level filter is then used on the most deserving job to determine which of the available resources can be used, based on static criteria
- Maui/Silver scheduler
  - Static information filtering at the Silver level, and information at the local level using Maui
- Condor
  - Initial matching followed by a feasibility evaluation upon claiming the actual resources
31. Phase One Resource Discovery
Phase One - Resource Discovery
Phase Two - System Selection
32. 4. Information Gathering
- Dynamic searches to match resources with application requirements
- What information is available and how the user can get access to it
- Generally involves using some kind of Grid Information System
- Scalability issues
  - More queries slow down the system
- Consistency concerns
  - No such thing as a global view of the system
33. Information Gathering
- User
  - Might use the Globus MDS or a portal information service like HotPage, or they might just know
- Ideally
  - Seamless interface to global monitoring and prediction
- Today's systems
  - KB interacts with MDS
  - PBS has its own internally
  - Condor uses some internal monitors
34. 5. System Selection
- Matching between resources and application information
- Users
  - Best estimate
- Ideally
  - Perfect matches based on current information, using variance information and other predictions
35. Today's Systems
- Condor Matchmaking
- Silver/Maui
  - Submits a full job description to each local scheduler
  - These return feasible time ranges, including estimated execution times, cost, and resources used
  - Higher-level Silver daemon does range-based calculus to select the best resource
- KB scheduler
  - Multi-objective schedule evaluation based on a set of AI techniques
  - Avoids some of the more common scalability issues with more statistical-based approaches
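One way to picture selection over returned time ranges is the Python sketch below. It is not Silver's actual range calculus: the `Offer` record, its fields, and the rule of choosing the earliest finish with cost as a tie-breaker are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Offer:
    """Hypothetical reply from one local scheduler."""
    resource: str
    windows: List[Tuple[float, float]]  # (start, end) times when the job may run
    est_runtime: float
    cost: float


def earliest_finish(offer: Offer) -> Optional[float]:
    """Earliest completion time over the windows long enough to hold the job."""
    finishes = [start + offer.est_runtime
                for start, end in offer.windows
                if end - start >= offer.est_runtime]
    return min(finishes) if finishes else None


def select(offers: List[Offer]) -> Optional[Offer]:
    """Pick the offer that finishes first, breaking ties on lower cost."""
    scored = [(earliest_finish(o), o.cost, o) for o in offers]
    feasible = [s for s in scored if s[0] is not None]
    if not feasible:
        return None
    return min(feasible, key=lambda s: (s[0], s[1]))[2]


offers = [
    Offer("siteA", [(0, 30), (100, 400)], est_runtime=60, cost=5.0),
    Offer("siteB", [(50, 200)], est_runtime=90, cost=2.0),
    Offer("siteC", [(0, 20)], est_runtime=45, cost=1.0),
]
chosen = select(offers)
```

A site whose windows are all shorter than its estimated runtime drops out before any comparison is made.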
36. Phase One Resource Discovery
Phase One - Resource Discovery
Phase Three - Job Execution
37. 6. Advance Reservation (Optional)
- Reserve resources in a guaranteed way
- Users
  - Call up sys admins and friends (call, like, on the phone)
- Ideally
  - Automatically done when you submit a job, based on user requirements
- Current systems
  - Enabled in PBSPro and Maui
  - Service Level Agreements in the new GRAM-2 protocol
38. 7. Job Submission
- Run the job on the resources selected
- User
  - qsub
- Ideally
  - Make it so
- Current systems
  - Each has its own API
39. 8. Preparation Tasks (11. Clean-up Tasks)
- File transfers, directory setup
- Users
  - scp, ftp, mkdir, GridFTP
- Ideally
  - Automatically done as part of job submission
40. Current Systems
- Condor
  - DAGMan can do file staging as a separate task
- PBS
  - Allows file staging using SCP or GridFTP
41. 9. Monitoring Progress
- How is my job doing?
- Should I move it somewhere else?
- Users
  - qstat
  - Moving is hard to do, so generally not done
- Ideally
  - System takes care of it based on intuitive knowledge of user requirements and good prediction techniques
- Current systems
  - Every LRMS has a stat command
42. Prophesy by Taylor at TAMU
- 3 major components
  - A relational database to record performance data, system features, and application details
  - An application analysis component that automatically instruments applications and generates control flow information
  - A data analysis component that facilitates the development of performance models, predictions, and trends
- Used to develop models based upon significant data and predict the performance on a different system
43. Smith at NASA
- Basic AI matching techniques on previous runtimes
- Matches to most suitable past approaches
- Using these run-time predictors results in lower mean wait times for the workloads with higher offered loads
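The idea of predicting from similar past runs can be sketched in Python as follows. This is an assumption-laden illustration rather than Smith's method: the job attributes, the `templates` list of attribute combinations, and the use of a plain mean are all hypothetical choices.

```python
from statistics import mean


def predict_runtime(job, history, templates):
    """Predict a runtime from the most specific template that has matching past jobs."""
    for template in templates:  # ordered from most to least specific
        similar = [past["runtime"] for past in history
                   if all(past[attr] == job[attr] for attr in template)]
        if similar:
            return mean(similar)
    return None  # no comparable history


history = [
    {"user": "ann", "app": "cfd", "nodes": 16, "runtime": 100.0},
    {"user": "ann", "app": "cfd", "nodes": 16, "runtime": 140.0},
    {"user": "bob", "app": "cfd", "nodes": 32, "runtime": 300.0},
]
# Try an exact (user, app, nodes) match first, then fall back to the app alone.
templates = [("user", "app", "nodes"), ("app",)]

exact = predict_runtime({"user": "ann", "app": "cfd", "nodes": 16}, history, templates)
fallback = predict_runtime({"user": "carl", "app": "cfd", "nodes": 8}, history, templates)
unknown = predict_runtime({"user": "carl", "app": "md", "nodes": 8}, history, templates)
```

A scheduler could feed such estimates into its queue ordering in place of user-supplied runtime limits.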
44. Lee and Schopf, ANL
- Log past runtimes of applications
- Log environmental data
  - CPU load information
  - NWS bandwidth data
- Use regression techniques to predict runtimes without any application models
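A regression over logged data can be sketched with ordinary least squares on a single predictor. The log values below are invented for illustration, and using CPU load alone is a simplification; the slide also lists NWS bandwidth data, which would require a multi-variable fit.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical log entries: (CPU load at submission, observed runtime in seconds)
log = [(0.0, 100.0), (1.0, 150.0), (2.0, 200.0), (3.0, 250.0)]
loads = [load for load, _ in log]
runtimes = [runtime for _, runtime in log]

intercept, slope = fit_line(loads, runtimes)
# Predict the runtime for a new submission at the currently measured load.
predicted = intercept + slope * 1.5
```

No model of the application itself is involved; the prediction comes only from logged runtimes and the environment measurements recorded alongside them.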
45. Summary
46. Places where Grid Scheduling Work is Discussed
- Conferences
  - Job Scheduling Workshop, SuperComputing, HPDC, IPDPS, EuroPar
- Journals
  - JPDC, TPDS, special issues on Grid computing in various journals
- Global Grid Forum Scheduling Area
- Upcoming book edited by Nabrzyski, Schopf and Weglarz
47. For more information
- Jennifer Schopf
  - jms_at_mcs.anl.gov
- Current 10 actions document
  - www.mcs.anl.gov/jms/Pubs
  - All references available in that document