1
Grid Scheduling Overview
  • Jennifer M. Schopf
  • Argonne National Lab
  • March 10, 2003

2
What is Grid Scheduling?
  • Process of making scheduling decisions involving
    resources over multiple administrative domains
  • May be one machine at one site, but choices are
    distributed
  • May be multiple machines at multiple sites
  • Also called superscheduling, meta-scheduling,
    scheduling at the Grid level, etc.

3
A Grid Scheduler is Not a Local Resource Management System (LRMS)
  • No ownership or control over local resources
  • Jobs get submitted to LRMS as user
  • No control or often even information about other
    Grid jobs

4
Grid Scheduling Involves
  • Acquiring information about jobs and resources
    (generally inaccurate and out of date)
  • Matching jobs to resources
  • Managing data
  • Monitoring progress of the job
  • And more!

5
This Talk
  • Overview of a few systems
  • Condor, PBS, KB scheduler, AppLeS, Maui/Silver
  • Grid Scheduling Architecture from GGF
  • Summary

6
Condor
  • Condor is a high-throughput computing (HTC)
    approach
  • delivers large amounts of processing capacity
    over long periods of time
  • User submits a condor job, system finds an
    available machine
  • When machine becomes busy, job is checkpointed
    and migrated

7
ClassAds and MatchMaking
  • ClassAds
  • a way to describe jobs and resources
  • similar to a newspaper's classified ads
  • Machines use a resource offer ad to advertise
    resource properties
  • both static and dynamic
  • available RAM, CPU type, CPU speed, virtual
    memory size, physical location, and current
    load average
  • User specifies a resource request ad
  • Defines both a required and a desired set of
    properties of the resource to run the job
  • Both ads have ranking functions
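
To make the two-sided match concrete, here is a minimal Python sketch. It is not Condor's ClassAd language or API; the attribute names, requirements, and rank functions are invented stand-ins for the real expressions:

```python
# A minimal sketch of ClassAd-style symmetric matchmaking.
# NOT Condor's ClassAd language or API: dicts stand in for ads,
# and plain callables stand in for Requirements/Rank expressions.

# A resource offer ad: static and dynamic machine properties.
machine_ad = {
    "Name": "node01", "Arch": "x86_64", "OpSys": "LINUX",
    "Memory": 4096, "LoadAvg": 0.15,
    # The machine only accepts jobs that fit in its memory.
    "requirements": lambda job: job["ImageSize"] <= 4096,
    # The machine prefers smaller jobs (higher rank = more preferred).
    "rank": lambda job: -job["ImageSize"],
}

# A resource request ad: required and desired properties for the job.
job_ad = {
    "ImageSize": 2048,
    # Required properties of any machine that may run the job.
    "requirements": lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 2048,
    # Desired properties: prefer lightly loaded machines.
    "rank": lambda m: -m["LoadAvg"],
}

def matchmake(job, machines):
    """Two-sided match: both requirements must hold; sort by the job's rank."""
    feasible = [m for m in machines
                if job["requirements"](m) and m["requirements"](job)]
    return sorted(feasible, key=job["rank"], reverse=True)

matches = matchmake(job_ad, [machine_ad])
print(matches[0]["Name"] if matches else "no match")
```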

8
Portable Batch System (PBS)
  • Workload management solution for HPC systems and
    Linux clusters
  • Originally designed for NASA because existing
    LRMS were inadequate for modern
    parallel/distributed computers and clusters
  • Provides
  • Extraction of scheduling policy into a single
    separable, completely customizable module.
  • Additional controls over initiating or scheduling
    execution of batch jobs
  • Routing of those jobs between different hosts
  • Site can define and implement individual policies

9
PBS Components
  • Typical interaction - client-server model
  • clients making (batch) requests to servers
  • servers performing work on behalf of the clients
  • Server manages a number of different objects
  • queues and jobs, each object consisting of a
    number of data items, or attributes
  • Server provides batch services
  • creating, routing, executing, modifying, or
    deleting jobs for batch clients
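
As a rough illustration of this client-server model (a toy sketch, not the real PBS code or protocol), the server can be pictured as an object managing named queues of attribute-bearing jobs, offering create/route/delete batch services:

```python
# A toy sketch of the PBS client-server idea (not the real PBS API):
# a server manages named queues of jobs, each job a bag of attributes.

class BatchServer:
    def __init__(self):
        self.queues = {"default": []}   # queue name -> list of job dicts
        self.next_id = 0

    def create_job(self, queue="default", **attributes):
        """Batch service: create a job (a set of attributes) on a queue."""
        self.next_id += 1
        job = {"id": self.next_id, "state": "queued", **attributes}
        self.queues.setdefault(queue, []).append(job)
        return job["id"]

    def route_job(self, job_id, dest):
        """Batch service: move a job from its current queue to another."""
        for q in list(self.queues.values()):
            for job in q:
                if job["id"] == job_id:
                    q.remove(job)
                    self.queues.setdefault(dest, []).append(job)
                    return True
        return False

    def delete_job(self, job_id):
        """Batch service: delete a job from whatever queue holds it."""
        for q in self.queues.values():
            q[:] = [j for j in q if j["id"] != job_id]

# A "client" makes batch requests to the server on the user's behalf.
server = BatchServer()
jid = server.create_job(walltime="01:00:00", nodes=2)
server.route_job(jid, "execution")
print(server.queues["execution"])
```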

10
KB Scheduler
  • Led by Jarek Nabrzyski, Poznan Supercomputing and
    Networking Center, part of GridLab
  • Decisions using an AI knowledge-based (KB)
    multi-criteria job searching technique
  • Information about time costs, user preferences,
    load balancing, memory usage, cache usage
  • Uses a set of AI expert techniques, each with its
    own strengths and weaknesses
  • Implemented on top of the Globus Toolkit with
    some added high-level services
  • Advanced reservations
  • Extensions to the standard information providers

11
AppLeS
  • AppLeS (Application Level Scheduling) project
  • Berman, Wolski et al., UCSD
  • High-performance scheduler targeted to multi-user
    distributed heterogeneous environments
  • Each Grid app scheduled by its own AppLeS
  • determines and actuates a schedule
  • Schedule customized for the individual app and
    the target computational Grids at execution time.
  • Everything in the system is evaluated in terms of
    its impact on the application
  • Resources in the system are evaluated in terms of
    predicted capacities at execution time, as well
    as their potential for satisfying application
    resource requirements

12
AppLeS Parameter Sweep Template
  • Parameter sweep applications
  • Structured as sets of experiments
  • Each executed with a distinct set of parameters
  • Each experiment is independent
  • Often structured so that distinct experiments
    share large input files, and produce large output
    files
  • To achieve efficiency, shared data files must be
    co-located with experiments
  • Schedule must adapt to the dynamically
    fluctuating performance of the shared resources.
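
The underlying placement trade-off can be sketched in a few lines of Python (all site names and costs are invented for illustration): staging the shared input file is paid once per site, so the best site depends on how many experiments share that file:

```python
# Illustrative sketch of the co-location idea behind parameter sweeps
# (not the actual AppLeS scheduler): prefer sites that already hold
# the large shared input file, to avoid repeated transfers.

sites = {
    "siteA": {"has_input": True,  "speed": 1.0},   # relative compute speed
    "siteB": {"has_input": False, "speed": 2.0},
}
TRANSFER_COST = 30.0   # hypothetical cost (seconds) to stage the shared file
COMPUTE_COST = 10.0    # hypothetical per-experiment cost at speed 1.0

def placement_cost(site, n_experiments):
    """Staging is paid once per site; compute is paid per experiment."""
    staging = 0.0 if site["has_input"] else TRANSFER_COST
    return staging + n_experiments * COMPUTE_COST / site["speed"]

def place(n_experiments):
    return min(sites, key=lambda s: placement_cost(sites[s], n_experiments))

# A few experiments favor the site with the data; many favor the fast site.
print(place(2))    # siteA: the transfer cost dominates
print(place(20))   # siteB: compute speed dominates
```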

13
AppLeS Architecture
  • Application specific info from the end user or
    application developer
  • Via the Heterogeneous Application Template
  • User preferences as well
  • Dynamic system information provided by the
    Network Weather Service
  • All data used by the AppLeS Coordinator to
    determine a potentially performance-efficient
    application schedule
  • Coordinator then works with the appropriate
    resource management systems to implement the
    schedule on the relevant resources

14
Silver/Maui
  • Grid-level module, Silver, interacts with local
    version of the Maui scheduler on each resource
  • User submits a PBS-style job submission locally
  • This is translated into a meta-job submission and
    sent to the Silver module
  • Queuing policies of the local Maui installations
    determined individually
  • Simulation framework as well

15
System Overview
  • Each addresses a slightly different context
  • There are many, many more I don't have time to
    discuss here
  • None of them does everything you want them to do
    in a grid environment
  • So what is the right approach?

16
This Talk
  • Overview of a few systems
  • Condor, PBS, KB scheduler, AppLeS, Maui/Silver
  • Grid Scheduling Architecture from GGF
  • Summary

17
10 Actions for Superscheduling
  • GGF Scheduling working group
  • Initial discussions June 2000-July 2001
  • Q: How does a user schedule on the Grid?
  • Resulted in GGF CI.5
  • In late 2002 this was updated and sent to special
    issue on Grid computing
  • Includes examples from current approaches
  • www.mcs.anl.gov/jms/Pubs

18
Context
  • The user is currently the most common Grid
    scheduler
  • Every action defined is currently performed by
    some Grid-level scheduler, but no current
    approach does them all
  • Note: We did not consider error conditions

19
In a nutshell...
20
Ordering was approximate (from GGF doc)
  • "We use the word 'step' and a numbering system
    for easy reference. This does not imply that
    these actions are actually performed in this
    order, or that they all MUST occur in every
    system. In general, don't pay too much attention
    to the numbering. Some of the steps may be
    interactive, recursive, repeated, or just plain
    ignored."

21
Phase One: Resource Discovery

22
1. Authorization Filtering
  • Authentication
  • Establishing identity (who are you?)
  • Authorization
  • Establishing permissions (what can you do?)
  • Where do you have an account?
  • Not a new problem, only one made more complicated
    by more extensive access to systems

23
Authorization Filtering Cont.
  • User
  • List in a drawer
  • Ideally
  • A wallet of credentials, smart enough to
    remember my username at different sites as well

24
Today's Systems
  • KB scheduler uses Globus MDS for this info
  • PBS allows administrators to set up specific
    queues for authorization groups by setting up
    execution lists and attaching this information to
    a specific queue
  • Condor does not require an account (login) on
    machines where it runs a job

25
2. Application Definition
  • Minimal set of job requirements to further filter
    number of available resources
  • Can be static data
  • OS type, hardware a binary is available for, an
    architecture it is best suited for
  • Can be dynamic info
  • Amount of RAM, connectivity, space in /tmp

26
Application definitions
  • User
  • Generally user defined
  • Often inaccurate, incomplete
  • Ideally
  • Smart compilers or other tools to automatically
    generate information about application
    requirements and runtimes

27
Today's systems
  • User defined at the command line
  • E.g., RSL in Globus
  • Information in Condor ClassAds, also user defined

28
3. Minimum Requirement Filtering
  • Use static data to limit the search space
  • Used to cut down dynamic queries needed
  • Can be combined with dynamic search (step 4);
    see the sketch below
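
A minimal sketch of this two-stage idea, assuming simple resource records rather than a real Grid information service (all names and values are invented): a cheap static pass trims the candidate set, so only survivors incur an expensive dynamic query:

```python
# Two-stage filtering: static attributes first, dynamic queries second.

resources = [
    {"name": "hostA", "os": "Linux",   "arch": "x86_64"},
    {"name": "hostB", "os": "Solaris", "arch": "sparc"},
    {"name": "hostC", "os": "Linux",   "arch": "x86_64"},
]

def static_filter(resources, os, arch):
    """Pass 1: static attributes rarely change, so no live query is needed."""
    return [r for r in resources if r["os"] == os and r["arch"] == arch]

def dynamic_query(resource):
    """Pass 2: stand-in for a (slow) live query of RAM, load, /tmp space."""
    return {"free_ram_mb": 2048, "load": 0.4}      # hypothetical values

candidates = static_filter(resources, os="Linux", arch="x86_64")
usable = [r for r in candidates if dynamic_query(r)["free_ram_mb"] >= 1024]
print([r["name"] for r in usable])   # hostB was never queried dynamically
```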

29
Minimum Requirement Filtering
  • User
  • I know I need Linux, so I don't consider others
  • Ideally
  • Automatic, part of dynamic search
  • No reason to limit search this way

30
Today's Systems
  • Most do this as part of dynamic filtering (step 4)
  • PBS
  • First pass to sort available jobs according to
    some administrator-defined policy
  • Second-tier evaluation using dynamic filters to
    determine which jobs should be run soonest
    (step 4)
  • A high-level filter is then used on the most
    deserving job to determine which of the
    available resources can be used, based on
    static criteria
  • Maui/Silver scheduler
  • Static information filtering at the Silver
    level, and information filtering at the local
    level using Maui
  • Condor
  • Initial matching followed by a feasibility
    evaluation upon claiming the actual resources

31
Phase Two: System Selection

32
4. Information Gathering
  • Dynamic searches to match resources with
    application requirements
  • What information is available and how the user
    can get access to it
  • Generally involves using some kind of Grid
    Information System
  • Scalability issues
  • More queries slow down the system
  • Consistency concerns
  • No such thing as a global view of the system

33
Information Gathering
  • User
  • Might use the Globus MDS or a portal information
    service like HotPage, or they might just know
  • Ideally
  • Seamless interface to global monitoring and
    prediction
  • Today's systems
  • KB interacts with MDS
  • PBS has its own internally
  • Condor uses some internal monitors

34
5. System Selection
  • Matching between resources and application
    information
  • Users
  • Best estimate
  • Ideally
  • Perfect matches based on current information,
    using variance information and other predictions

35
Today's Systems
  • Condor Matchmaking
  • Silver/Maui
  • Submits a full job description to each local
    scheduler
  • Each returns feasible time ranges, including
    estimated execution times, cost, and resources
    used
  • The higher-level Silver daemon does range-based
    calculus to select the best resource (see the
    sketch after this list)
  • KB scheduler
  • Multi-objective schedule evaluation based on a
    set of AI techniques
  • Avoids some of the scalability issues common
    to more statistically based approaches
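
The Silver/Maui style of selection can be sketched as follows; the offers and the earliest-completion scoring rule below are invented for illustration, not Silver's actual calculus:

```python
# Range-based selection in the spirit of Silver/Maui: each local
# scheduler returns a feasible start-time offer with an estimated
# runtime and cost; the grid level picks the best resource.

# Hypothetical responses from three local schedulers:
offers = [
    {"resource": "clusterA", "earliest_start": 120, "est_runtime": 600, "cost": 5},
    {"resource": "clusterB", "earliest_start": 0,   "est_runtime": 900, "cost": 3},
    {"resource": "clusterC", "earliest_start": 300, "est_runtime": 450, "cost": 8},
]

def completion_time(offer):
    return offer["earliest_start"] + offer["est_runtime"]

# Select by earliest estimated completion, breaking ties by cost.
best = min(offers, key=lambda o: (completion_time(o), o["cost"]))
print(best["resource"])   # clusterA: completes at t=720
```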

36
Phase Three: Job Execution

37
6. Advance Reservation (Optional)
  • Reserve resources in a guaranteed way
  • Users
  • Call up sys admins and friends (call, like, on
    the phone)
  • Ideally
  • Automatically done when you submit a job based on
    user requirements
  • Current systems
  • Enabled in PBSPro and Maui
  • Service Level agreements in new GRAM-2 protocol
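
At its core, an advance-reservation mechanism needs an interval-overlap check. A minimal sketch (not PBSPro's or Maui's actual implementation; the reservation table is a plain dict):

```python
# Grant a reservation only if its time interval does not overlap an
# existing reservation on the same resource.

reservations = {}   # resource name -> list of (start, end) tuples

def reserve(resource, start, end):
    """Grant [start, end) if it overlaps nothing already booked."""
    booked = reservations.setdefault(resource, [])
    for (s, e) in booked:
        if start < e and s < end:      # half-open intervals overlap
            return False
    booked.append((start, end))
    return True

print(reserve("clusterA", 100, 200))   # True: slot is free
print(reserve("clusterA", 150, 250))   # False: overlaps the first booking
print(reserve("clusterA", 200, 300))   # True: intervals only touch
```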

38
7. Job Submission
  • Run the job on the resources selected
  • User
  • qsub
  • Ideally
  • Make it so
  • Current systems
  • Each has its own API
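
One way to live with per-system APIs is a thin adapter per LRMS. A sketch, assuming qsub and condor_submit are on the PATH and each accepts a prepared submit file; the class names here are hypothetical:

```python
# Hiding per-system submission interfaces behind one call.
# The command lines are plausible but simplified; real submissions
# need a properly formed job script or submit-description file.

import subprocess

class PBSSubmitter:
    def submit(self, script_path):
        # PBS-style submission: qsub takes a job script.
        return subprocess.run(["qsub", script_path],
                              capture_output=True, text=True)

class CondorSubmitter:
    def submit(self, script_path):
        # Condor-style submission: condor_submit takes a submit description.
        return subprocess.run(["condor_submit", script_path],
                              capture_output=True, text=True)

def submit_anywhere(submitter, script_path):
    """The grid level sees one call; the adapter speaks the local dialect."""
    return submitter.submit(script_path)

# submit_anywhere(PBSSubmitter(), "job.pbs")      # on a PBS site
# submit_anywhere(CondorSubmitter(), "job.sub")   # on a Condor pool
```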

39
8. Preparation Tasks (11. Clean-up Tasks)
  • File transfers, directory set ups
  • Users
  • scp, ftp, mkdir, GridFTP
  • Ideally
  • Automatically done as part of job submission

40
Current systems
  • Condor
  • DAGMan can do file staging as a separate task
  • PBS
  • allows file staging using SCP or GridFTP

41
9. Monitoring Progress
  • How is my job doing?
  • Should I move it somewhere else?
  • Users
  • qstat
  • Moving is hard to do, so generally not done
  • Ideally
  • System takes care of it based on intuitive
    knowledge of user requirements, and good
    prediction techniques
  • Current Systems
  • Every LRMS has a stat command

42
Prophesy by Taylor at TAMU
  • 3 major components
  • a relational database to record performance data,
    system features and application details
  • an application analysis component that
    automatically instruments applications and
    generates control flow information
  • a data analysis component that facilitates the
    development of performance models, predictions,
    and trends
  • Used to develop models based upon significant
    data and predict the performance on a different
    system
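
As a toy illustration of the first component (the schema here is invented; Prophesy's actual database is far richer), logged runs in a relational table support the grouped queries that model building needs:

```python
# A relational store of performance data, sketched with sqlite3.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE runs (
    app TEXT, system TEXT, problem_size INTEGER, runtime_sec REAL)""")
db.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", [
    ("solver", "clusterA", 1000, 42.0),
    ("solver", "clusterA", 2000, 85.1),
    ("solver", "clusterB", 1000, 31.5),
])

# Recorded data supports cross-system model building and prediction:
for row in db.execute(
        "SELECT system, AVG(runtime_sec) FROM runs "
        "WHERE app = ? GROUP BY system", ("solver",)):
    print(row)
```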

43
Smith at NASA
  • Basic AI matching techniques on previous runtimes
  • Matches to most suitable past approaches
  • Using these run-time predictors results in lower
    mean wait times for the workloads with higher
    offered loads

44
Lee and Schopf, ANL
  • Log past runtimes of applications
  • Log environmental data
  • CPU load information
  • NWS bandwidth data
  • Use regression techniques to predict runtimes
    without any application models
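
As a hedged illustration of the approach (invented numbers, and a single predictor where the real work uses several environmental variables), an ordinary least-squares line relating observed CPU load to runtime:

```python
# Fit runtime = intercept + slope * cpu_load from logged history,
# with no application model at all.

# (cpu_load, runtime_sec) pairs from hypothetical past runs:
history = [(0.1, 100.0), (0.4, 130.0), (0.7, 170.0), (0.9, 190.0)]

n = len(history)
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in history)
         / sum((x - mean_x) ** 2 for x, _ in history))
intercept = mean_y - slope * mean_x

def predict_runtime(cpu_load):
    """Predicted runtime under a given load, from the fitted line."""
    return intercept + slope * cpu_load

print(round(predict_runtime(0.5), 1))   # expected runtime at load 0.5
```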

45
Summary
46
Places where Grid Scheduling Work is Discussed
  • Conferences
  • Job Scheduling Workshop, Supercomputing, HPDC,
    IPDPS, EuroPar
  • Journals
  • JPDC, TPDS, special issues on Grid computing in
    various journals
  • Global Grid Forum Scheduling Area
  • Upcoming book edited by Nabrzyski, Schopf and
    Weglarz

47
For more information
  • Jennifer Schopf
  • jms@mcs.anl.gov
  • Current 10 actions document
  • www.mcs.anl.gov/jms/Pubs
  • All references available in that document