Scheduling in HPC Resource Management System: Queuing vs. Planning


1
Scheduling in HPC Resource Management System
Queuing vs. Planning

Matthias Hovestadt, Odej Kao, Alex Keller, and
Achim Streit
2003 Job Scheduling Strategies for Parallel
Processing (JSSPP) Workshop
Jerry Chou 8/29/2005

2
Outline

Background
Queuing and Planning Systems
Advanced Planning Functions
Example Computing Center Software
Conclusion
Discussion

3
Background

HPC systems are operated by resource management
systems (RMS) based on the queuing approach
PBS, SGE, LoadLeveler, etc.
Grid middleware emerges between resource
management systems and applications
Globus, vgES, etc.
High-level functions (e.g. co-allocation) need
features from the RMS
Advanced reservation, quality of service
These features are hard to realize with a queuing RMS
because it only considers present resource usage
=> This paper proposes planning systems to close
the gap

4
Big Picture
Application
Co-allocation
Grid Middleware (Globus, vgES)
Advanced Reservation, QoS
RMS (PBS, LoadLeveler, SGE, Condor)
Resources
5
Queuing and Planning Systems

Queuing Systems
Planning Systems
Queuing vs. Planning Systems

6
Queuing Systems

Queues have different limits on the resource
requests
Number of resources requested
Execution time
Interactive/batch jobs
Jobs in each queue are sorted by the scheduling policy
The highest-priority request is the queue head
If more than one queue head can be started, further
criteria are needed, such as queue priority
If no queue head can be started, the idle
resources may be utilized with backfilling
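
The queue-head and backfilling behaviour above can be sketched roughly as follows. This is a simplified illustration, not the algorithm of any particular RMS: it assumes a single queue, a CPU count as the only resource, and a backfilling rule in which a later job may start only if it fits now and finishes before the blocked head could start. The names `Job` and `schedule_step` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: int
    runtime: int  # estimated run time, needed only for backfilling


def schedule_step(queue, running, free_cpus, now):
    """Start what can be started at time `now`.

    queue:   list of Job, already sorted by the scheduling policy (head first)
    running: list of (end_time, cpus) tuples for jobs already running
    Returns (names of started jobs, remaining free CPUs); mutates both lists.
    """
    started = []

    def start(job):
        nonlocal free_cpus
        queue.remove(job)
        running.append((now + job.runtime, job.cpus))
        free_cpus -= job.cpus
        started.append(job.name)

    # Start queue heads for as long as the head fits.
    while queue and queue[0].cpus <= free_cpus:
        start(queue[0])
    if not queue:
        return started, free_cpus

    # The head is blocked: find when enough CPUs will be free for it.
    head, avail, head_start = queue[0], free_cpus, None
    for end, cpus in sorted(running):
        avail += cpus
        if avail >= head.cpus:
            head_start = end
            break

    # Backfill: later jobs that fit now and finish before the head could start.
    for job in queue[1:]:
        if (head_start is not None and job.cpus <= free_cpus
                and now + job.runtime <= head_start):
            start(job)
    return started, free_cpus
```

With 2 free CPUs, one running job holding 6 CPUs until time 10, and a queue of A (4 CPUs), B (2 CPUs, run time 8) and C (2 CPUs, run time 20), only B is backfilled: it ends before A could start, while C would delay A.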

7
Planning Systems - Replanning

Requested
Start time
Estimated run time
When
A new request is submitted
A running request ends before its estimated end
time
How
Delete all non-reservations from the schedule
Sort non-reservations according to the scheduling
policy
Arrange reservations into the schedule
Insert non-reservations into the schedule at the
earliest possible start time
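
The four replanning steps can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the only resource is a CPU count, running jobs are ignored (they could be modelled as reservations that started in the past), reservations are assumed not to overbook the machine, and the example policy is first come, first served. `Request`, `min_free` and `replan` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    name: str
    cpus: int
    duration: int                # estimated run time
    start: Optional[int] = None  # fixed for reservations, planned otherwise
    reserved: bool = False
    submitted: int = 0           # used by the example policy (FCFS)


def min_free(plan, total, t0, t1):
    """Smallest number of free CPUs during [t0, t1) under `plan`."""
    # Usage can only rise at t0 or where a planned request starts.
    points = [t0] + [r.start for r in plan if t0 < r.start < t1]
    return min(
        total - sum(r.cpus for r in plan if r.start <= p < r.start + r.duration)
        for p in points
    )


def replan(requests, total, now, policy=lambda r: r.submitted):
    reservations = [r for r in requests if r.reserved]
    others = [r for r in requests if not r.reserved]
    for r in others:                      # 1. delete non-reservations
        r.start = None
    others.sort(key=policy)               # 2. sort by scheduling policy
    plan = list(reservations)             # 3. reservations keep their slots
    for r in others:                      # 4. earliest possible start time
        if r.cpus > total:
            raise ValueError(f"{r.name} does not fit on the machine")
        # The earliest feasible start is `now` or the end of a planned request.
        ends = {p.start + p.duration for p in plan}
        for t in sorted({now} | {e for e in ends if e > now}):
            if min_free(plan, total, t, t + r.duration) >= r.cpus:
                r.start = t
                break
        plan.append(r)
    return plan
```

Because every request receives a start time in the plan, a short request submitted late can still be placed in a gap before a reservation, which gives a backfilling-like effect without a separate mechanism.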

8
Queuing vs. Planning Systems
9
Advanced Planning Functions

Requesting Resources
Dynamic Aspects
Service Level Agreements

10
Requesting Resources

Diffuse requests
Give a range: need 32-128 CPUs
Let the RMS optimize: need as many nodes as
possible
Negotiation
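
A diffuse request with a range could be resolved along these lines. This is only a sketch that grants as many CPUs as are currently free within the requested range; a real planning system would also weigh start times and negotiate with the user. The function name `resolve_diffuse` is hypothetical.

```python
def resolve_diffuse(min_cpus, max_cpus, free_cpus):
    """Return the CPU count to grant for a request of min_cpus..max_cpus,
    or None if even the minimum does not fit."""
    if min_cpus > max_cpus:
        raise ValueError("empty range")
    granted = min(max_cpus, free_cpus)
    return granted if granted >= min_cpus else None
```

For a request of 32-128 CPUs, 100 free CPUs would yield a grant of 100, and 16 free CPUs would yield no grant at this point in time.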

11
Dynamic Aspects

Variable Reservations
Make a reservation ASAP
Different from reserved jobs
No fixed start time
Different from non-reserved jobs
Never planned later than its first planned start
time
Resource Reclaiming
Replace requested resources at run time
Automatic Duration Extension
Extend the run time of jobs while they are running
How long can it be extended?
How many times can it be extended?
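
The two limits on automatic duration extension (total extra time and number of extensions) might be enforced as in the following sketch. The class and its fields are hypothetical, and the optional `next_start` check is an added assumption that an extension must not run into the next planned request on the same resources.

```python
from dataclasses import dataclass


@dataclass
class RunningJob:
    planned_end: int
    max_extra_time: int    # how long the job may be extended in total
    max_extensions: int    # how many times it may be extended
    extra_used: int = 0
    extensions: int = 0

    def try_extend(self, extra, next_start=None):
        """Grant `extra` time units if both limits allow it and, when
        `next_start` is given, the job still ends before the next planned
        request on the same resources starts."""
        if self.extensions >= self.max_extensions:
            return False
        if self.extra_used + extra > self.max_extra_time:
            return False
        if next_start is not None and self.planned_end + extra > next_start:
            return False
        self.planned_end += extra
        self.extra_used += extra
        self.extensions += 1
        return True
```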

12
Dynamic Aspects (Cont.)

Automatic Restart
It can utilize short time slots in the scheduling
Space-Sharing Cycle Stealing
Run as a background job to steal resources in a
space-sharing system (like Condor)
Deployment Servers
RMS plans both the requested resources and the
time to reconfigure the hardware

13
Service Level Agreements (SLA)

SLAs have to be considered not only in the
scheduling process but also during run time
At run time the scheduler is not responsible for
measuring the fulfillment of the SLA, but for
providing all granted resources

14
Computing Center Software (CCS)

Architecture
User Interface (UI) provides a single access point
to one or more systems
Access Manager (AM) manages the user interfaces
and is responsible for authentication,
authorization and accounting
Planning Manager (PM) plans the user requests
onto the machine
Machine Manager (MM) provides machine-specific
features
Island Manager (IM) provides CCS-internal
services and watchdog facilities to keep the
island in a stable condition

15
Process Flow
Users specify the expected duration of their
requests
Requests

PM re-plans the schedule
Fix-time request: reserves resources for a
given time
Var-time request: can move to an earlier time slot
when replanning

Schedule
MM maps the schedule to machines
MM verifies whether the schedule can be realized with the
available hardware
Yes: done
No: MM finds an alternative time and sends a conflict list to the PM
PM decides whether it can accept the conflict list (yes / no)
16
Conclusion

Classify and compare queuing systems with
planning systems
Present possible advanced planning functionality
The aim of the paper is to show the benefit of
planning systems for managing HPC machines

17
Discussion

Do planning systems solve all the problems?
What if most jobs want to run ASAP?
What if the run time is not estimated precisely?
How do queuing systems and planning systems compare
in performance and utilization?
If you are a resource provider, will you use it?
What features could be provided by vgES?
Diffuse requests
Resource reclaiming
Variable reservation
Negotiation