Scheduling in HPC Resource Management Systems: Queuing vs. Planning

1
Scheduling in HPC Resource Management Systems
Queuing vs. Planning
  • Matthias Hovestadt, Odej Kao, Alex Keller, and
    Achim Streit
  • 2003 Job Scheduling Strategies for Parallel
    Processing (JSSPP) Workshop
  • Jerry Chou 8/29/2005

2
Outline
  • Background
  • Queuing and Planning Systems
  • Advanced Planning Functions
  • Example Computing Center Software
  • Conclusion
  • Discussion

3
Background
  • HPC systems are operated by resource management
    systems (RMS) based on the queuing approach
      • PBS, SGE, LoadLeveler, etc.
  • Grid middleware emerges between resource
    management systems and applications
      • Globus, vgES, etc.
  • High-level functions (co-allocation) need
    features from the RMS
      • Advance reservation, quality of service
  • It is hard to realize those features with a
    queuing RMS because it only considers present
    resource usage
  • => This paper proposes planning systems to close
    the gap

4
Big Picture
  • Figure (layered stack, top to bottom):
    Application; Grid Middleware (Globus, vgES);
    RMS (PBS, LoadLeveler, SGE, Condor); Resources
  • Labels on the figure: Co-allocation, Advance
    Reservation, QoS

5
Queuing and Planning Systems
  • Queuing Systems
  • Planning Systems
  • Queuing vs. Planning Systems

6
Queuing Systems
  • Queues have different limits on the resource
    requests
      • Number of resources requested
      • Execution time
      • Interactive/batch jobs
  • Jobs in a queue are sorted by the scheduling
    policy
      • The highest-priority request is the queue head
  • If more than one queue head can be started,
    further criteria are needed, such as queue
    priority
  • If no queue head can be started, the idle
    resources may be utilized with backfilling
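The last bullet can be illustrated with a minimal sketch of one scheduling pass over a single queue. It assumes EASY-style backfilling (a later job may start now only if it does not delay the blocked queue head); the slides do not say which backfilling variant is meant, and the names `Job` and `jobs_to_start` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: int
    est_runtime: int  # user-supplied estimate, arbitrary time units


def jobs_to_start(queue, free_cpus, running, now=0):
    """Return the names of the jobs to start right now.

    queue   -- waiting jobs, already sorted by the scheduling policy
    running -- list of (estimated_end_time, cpus) for running jobs
    """
    queue, running, started = list(queue), list(running), []

    # Start queue heads for as long as the current head fits.
    while queue and queue[0].cpus <= free_cpus:
        job = queue.pop(0)
        free_cpus -= job.cpus
        running.append((now + job.est_runtime, job.cpus))
        started.append(job.name)
    if not queue:
        return started

    # The head is blocked: find the "shadow time" at which enough
    # CPUs are expected to be free for it.
    head = queue[0]
    avail, shadow = free_cpus, None
    for end, cpus in sorted(running):
        avail += cpus
        if avail >= head.cpus:
            shadow = end
            break
    spare = avail - head.cpus  # CPUs the head leaves unused at shadow time

    # Backfill: a later job may use idle CPUs if it ends before the
    # shadow time, or if it only uses CPUs the head will not need.
    for job in queue[1:]:
        if job.cpus > free_cpus:
            continue
        if shadow is None or now + job.est_runtime <= shadow:
            free_cpus -= job.cpus
            started.append(job.name)
        elif job.cpus <= spare:
            free_cpus -= job.cpus
            spare -= job.cpus
            started.append(job.name)
    return started
```

With 2 idle CPUs, a running job holding 6 CPUs until time 10, and a blocked head that needs 4 CPUs, a 2-CPU job with an estimated runtime of 8 is backfilled because it finishes before the head could start anyway.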

7
Planning Systems - Replanning
  • Each request specifies
      • Start time
      • Estimated run time
  • When is the schedule replanned?
      • A new request is submitted
      • A running request ends before its estimated
        end time
  • How is the schedule replanned?
      • Delete all non-reservations from the schedule
      • Sort the non-reservations according to the
        scheduling policy
      • Arrange the reservations in the schedule
      • Insert each non-reservation into the schedule
        at the earliest possible start time
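The replanning steps can be sketched in a few lines. This is only an illustration under stated assumptions: a single machine with `total_cpus` identical CPUs, FCFS (by `submit_time`) as the example scheduling policy, and reservations that were already admitted without conflicts. The names `Request` and `replan` are hypothetical and are not taken from the paper or from CCS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    name: str
    cpus: int
    duration: int                      # estimated run time
    submit_time: int = 0
    fixed_start: Optional[int] = None  # set only for reservations


def _usage_at(placed, t):
    """CPUs in use at time t, given placed (start, end, cpus) entries."""
    return sum(cpus for start, end, cpus in placed if start <= t < end)


def _fits(placed, total_cpus, start, duration, cpus):
    end = start + duration
    # Usage can only rise at `start` or where another placed job starts.
    points = [start] + [s for s, _, _ in placed if start < s < end]
    return all(_usage_at(placed, p) + cpus <= total_cpus for p in points)


def replan(requests, total_cpus, now=0):
    """Build a fresh plan: a mapping of request name to start time."""
    # Steps 1 and 2: drop the old placement of all non-reservations
    # (the plan is rebuilt from scratch) and sort them by policy.
    reservations = [r for r in requests if r.fixed_start is not None]
    others = sorted((r for r in requests if r.fixed_start is None),
                    key=lambda r: r.submit_time)  # FCFS, as an example

    # Step 3: reservations keep their fixed start times.
    placed, plan = [], {}
    for r in reservations:
        placed.append((r.fixed_start, r.fixed_start + r.duration, r.cpus))
        plan[r.name] = r.fixed_start

    # Step 4: each non-reservation gets the earliest feasible start.
    # It is enough to try `now` and the end times of placed entries.
    for r in others:
        if r.cpus > total_cpus:
            raise ValueError(f"{r.name} needs more CPUs than the machine has")
        candidates = sorted({now} | {e for _, e, _ in placed if e > now})
        for t in candidates:
            if _fits(placed, total_cpus, t, r.duration, r.cpus):
                placed.append((t, t + r.duration, r.cpus))
                plan[r.name] = t
                break
    return plan
```

Note that a short request submitted late can still be planned into an early gap in front of a reservation, so in this sketch backfilling falls out of the "earliest possible start time" rule without a separate mechanism.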

8
Queuing vs. Planning Systems

9
Advanced Planning Functions
  • Requesting Resources
  • Dynamic Aspects
  • Service Level Agreements

10
Requesting Resources
  • Diffuse requests
      • Give a range: need 32-128 CPUs
      • Let the RMS optimize: need as many nodes as
        possible
  • Negotiation
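A diffuse request can be pictured as a range that the RMS resolves to a concrete number. The sketch below assumes one simple objective, granting as many CPUs as are currently free within the requested range; a real RMS may optimize differently, and `grant_diffuse` is a hypothetical helper.

```python
from typing import Optional


def grant_diffuse(min_cpus: int, max_cpus: int, free_cpus: int) -> Optional[int]:
    """Resolve a request for min_cpus..max_cpus against the free CPUs.

    Returns the number of CPUs granted, or None if even the lower
    bound of the range cannot be met right now.
    """
    if min_cpus > max_cpus:
        raise ValueError("min_cpus must not exceed max_cpus")
    if free_cpus < min_cpus:
        return None
    return min(max_cpus, free_cpus)
```

For a request of 32-128 CPUs this grants 100 CPUs when 100 are free, 128 when 200 are free, and nothing when only 16 are free.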

11
Dynamic Aspects
  • Variable Reservations
      • Make a reservation ASAP
      • Different from reserved jobs: no fixed start
        time
      • Different from non-reserved jobs: never planned
        later than its first planned start time
  • Resource Reclaiming
      • Replace requested resources at run time
  • Automatic Duration Extension
      • Extend the runtime of jobs while they are
        running
      • How long can it be extended?
      • How many times can it be extended?
12
Dynamic Aspects (Cont.)
  • Automatic Restart
      • It can utilize short time slots in the schedule
  • Space-Sharing Cycle Stealing
      • Run as a background job to steal resources in
        a space-sharing system (like Condor)
  • Deployment Servers
      • The RMS plans both the requested resources and
        the time to reconfigure the hardware

13
Service Level Agreements (SLA)
  • SLAs have to be considered not only in the
    scheduling process but also during runtime
  • At runtime the scheduler is not responsible for
    measuring the fulfillment of the SLA, but for
    providing all granted resources

14
Computing Center Software (CCS)
  • Architecture
      • User Interface (UI) provides a single access
        point to one or more systems
      • Access Manager (AM) manages the user interface
        and is responsible for authentication,
        authorization, and accounting
      • Planning Manager (PM) plans the user requests
        onto the machine
      • Machine Manager (MM) provides machine-specific
        features
      • Island Manager (IM) provides CCS-internal
        services and watchdog facilities to keep the
        island in a stable condition

15
Process Flow
  • Users specify the expected duration of their
    requests
  • Requests go to the PM, which re-plans the schedule
      • Fix-time request: reserves resources for a
        given time
      • Var-time request: can move to an earlier time
        slot when replanning
  • The schedule goes to the MM, which maps it to
    machines
      • The MM verifies whether the schedule can be
        realized with the available hardware
      • Yes: done
      • No: the MM finds an alternative time and sends
        a conflict list to the PM
  • The PM decides whether it can accept (Yes / No)
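The interaction between PM and MM in this flow can be sketched as a small negotiation loop. The flowchart does not survive in the transcript, so the branch handling here is an assumption: if the PM accepts the MM's alternative, the flow ends, otherwise the PM re-plans using the conflict list. The callables `plan`, `verify`, and `accept`, and the function `negotiate`, are hypothetical stand-ins, not the CCS interfaces.

```python
def negotiate(requests, plan, verify, accept, max_rounds=10):
    """Iterate between planning (PM) and hardware mapping (MM).

    plan(requests, conflicts) -> schedule                   (PM)
    verify(schedule) -> (ok, alternative, conflicts)        (MM)
    accept(alternative) -> bool                             (PM)
    """
    conflicts = []
    for _ in range(max_rounds):
        schedule = plan(requests, conflicts)
        ok, alternative, conflicts = verify(schedule)
        if ok:
            return schedule      # schedule is realizable as planned
        if accept(alternative):
            return alternative   # PM accepts the MM's alternative
        # Otherwise the PM re-plans with the conflict list in hand.
    raise RuntimeError("no realizable schedule found")
```

The `max_rounds` bound is a safety net added for the sketch so that the loop cannot run forever if PM and MM never agree.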
16
Conclusion
  • Classify and compare queuing systems with
    planning systems
  • Present possible advanced planning functionality
  • The aim of the paper is to show the benefit of
    planning systems for managing HPC machines

17
Discussion
  • Do planning systems solve all the problems?
      • What if most jobs want to run ASAP?
      • What if the runtime is not estimated precisely?
  • How do queuing systems and planning systems
    compare in performance and utilization?
  • If you are a resource provider, will you use it?
  • What features could be provided by vgES?
      • Diffuse requests
      • Resource reclaiming
      • Variable reservation
      • Negotiation