Title: Efficient Hierarchical Self-Scheduling for MPI Applications Executing in Computational Grids
1Efficient Hierarchical Self-Scheduling for MPI Applications Executing in Computational Grids
Aline Nascimento, Alexandre Sena, Cristina Boeres and Vinod Rebello
Instituto de Computação, Universidade Federal Fluminense, Niterói (RJ), Brazil
http://easygrid.ic.uff.br
e-mail: {depaula, asena, boeres, vinod}_at_ic.uff.br
2Talk Outline
- Introduction
- Concepts and Related Work
- The EasyGrid Application Management System
- Hybrid Scheduling
- Experimental Analysis
- Conclusions and Future Work
3Introduction
- Grid computing has become increasingly widespread
around the world - Growth in popularity means a larger number of
applications will compete for limited resources - Efficient utilisation of the grid infrastructure
will be essential to achieve good performance - Grid infrastructure is typically
- Composed of diverse heterogeneous computational
resources interconnected by networks of varying
capacities - Shared, executing both grid and local user
applications - Dynamic, resources may enter and leave without
prior notice
Hard to develop efficient grid management systems
4Introduction
- Much research is being invested in the
development of specialised middleware responsible
for - Resource discovery and controlling access
- The efficient and successful execution of
applications on the available resources - Three implementation philosophies can be defined
- Resource management systems (RMS)
- User management systems (UMS)
- Application management systems (AMS)
5Concepts
- Resource Management System
- Adopts a system-centric viewpoint
- Centralised in order to manage a number of
applications simultaneously
- Aims to maximise system utilisation, considering just the resource requirements of the applications, not their internal characteristics
- Scheduling middleware installed on a single central server, monitoring middleware on grid nodes
- User Management System
- Adopts an application-centric viewpoint
- One per application, collectively decentralised
- Utilises the grid in accordance with resource availability and the application's topology (e.g. Bag of Tasks)
- Installed on every machine from which an application can be launched
6Concepts
- Application Management System
- Also adopts an application-centric viewpoint
- One per application
- Utilises the grid in accordance with both the available resources and the application's characteristics
- Is embedded into the application, offering better portability
- Works in conjunction with a simplified RMS
- Transforms applications into system-aware versions
- Each application can be made aware of its computational requirements and adjust itself according to the grid environment
7Concepts
- The EasyGrid AMS is
- Hierarchically distributed within an application
- Decentralised amongst various applications
- Application specific
- Designed for MPI applications
- Automatically embedded into the application
- EasyGrid simplifies the process of grid enabling
existing MPI applications - The grid resources just need to offer core
middleware services and a MPI communication
library
8Objectives
- This paper focuses specifically on the problem of
scheduling processes within the EasyGrid AMS - The AMS scheduling policies are application
specific - This paper highlights the scheduling features
through the execution of bag of tasks
applications - The main goals are
- To show the viability of the proposed scheduling
strategy in the context of an Application
Management System - To quantify the quality of the results obtainable
9The EasyGrid AMS Middleware
- The EasyGrid framework is an AMS middleware for
MPI implementations with dynamic process creation
10The EasyGrid AMS Architecture
- A three level hierarchical management system
[Diagram: the Computational Grid spans Sites 1-4; a single Global Manager (GM) oversees one Site Manager (SM) per site, and each SM oversees the Host Managers (HMs) on that site's resources]
11The EasyGrid Portal
- The EasyGrid Scheduling Portal is responsible for
- Choosing the appropriate scheduling and fault
tolerance policies for the application - Determining an initial process allocation
- Compiling the system-aware application
- Managing the user's grid proxy
- Creating the MPI environment (including transferring files)
- Providing fault tolerance for the GM process
Acts like a simplified resource management system
12The EasyGrid AMS
- GM creates one SM per site
- Each SM spawns an HM on each remaining resource at its respective site
- The application processes are created dynamically according to the local policy of the HM
- Processes are created with unique communicators (see the sketch below)
- Gives rise to the hierarchical AMS structure
- Communication can only take place between parent and child processes
- HMs, SMs and the GM route messages between application processes
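As an illustration only, the sketch below shows how dynamic process creation with MPI_Comm_spawn yields a separate intercommunicator per child, so communication naturally follows the parent-child hierarchy described above. The program name ./app_task, the task-id argument and the completion message are hypothetical and are not the actual EasyGrid protocol.

```c
/* Illustrative sketch (hypothetical names, not the EasyGrid source):
 * an HM-like manager spawns one application process dynamically. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Comm child;                            /* intercommunicator unique to this child */
    int errcode;
    char *task_argv[] = { "task-0042", NULL }; /* assumed task identifier */

    MPI_Init(&argc, &argv);

    /* Spawn a single application process. The returned intercommunicator
     * connects only this manager and the new process, so messages flow
     * strictly between parent and child, as in the AMS hierarchy. */
    MPI_Comm_spawn("./app_task", task_argv, 1, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &child, &errcode);

    /* Assumed completion report: the child sends back its wall clock time. */
    double wall_time;
    MPI_Recv(&wall_time, 1, MPI_DOUBLE, 0, 0, child, MPI_STATUS_IGNORE);
    printf("task finished: wall clock = %.2f s\n", wall_time);

    MPI_Comm_disconnect(&child);
    MPI_Finalize();
    return 0;
}
```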
13Process Scheduling
- Scheduling is made complex due to the dynamic and
heterogeneous characteristics of grids - Predicting the processing power and communication
bandwidth available to an application is
difficult - Static schedulers
- Estimates assumed a priori may be quite different
at runtime - More sophisticated heuristics can be employed at
compile time
- Dynamic schedulers
- Have access to accurate runtime system information
- Decisions need to be made quickly to minimise runtime intrusion
- Hybrid Schedulers
- Combine the advantages by integrating both static
and dynamic schedulers
14Static Scheduling
- The scheduling heuristics require models that
capture the relevant characteristics of both the - application (typically represented by a DAG) and
- the target system
- To define the parameters of the architectural model, the Portal executes an MPI modelling program with the user's credentials to determine the realistic performance currently available to this user's application
- At start-up, application processes are allocated to the resources in accordance with the static schedule
15Dynamic Scheduling
- Modifying the initial allocation at run time is essential, given the difficulty of
- Extracting precise information with regard to the application's characteristics
- Predicting the performance available from shared grid resources
- The current version only reschedules processes
which have yet to be created - The dynamic schedulers are associated with each
of the management processes distributed in the 3
levels of the AMS hierarchy
16AMS Hierarchical Schedulers
- The Global Dynamic Scheduler (GDS) is associated with the GM
- A Site Dynamic Scheduler (SDS) is associated with each SM
- A Host Dynamic Scheduler (HDS) is associated with each HM
17Dynamic Scheduling
- The appropriate scheduling policies depend on
- The class of the application and the user's objectives
- Different policies may be used in different layers of the hierarchy, and even within the same layer
- The dynamic schedulers collectively
- Estimate the remaining execution time on each
resource - Verify if the allocation needs to be adjusted
- If necessary, activate the rescheduling mechanism
- The activation of the rescheduling mechanism characterises a scheduling event
18Host Dynamic Scheduler
- HDS determines both the order and the instant
that an application process should be created on
the host
- Possible scheduling policies to determine the process sequence include
- The order specified by the static scheduler
- Data flow (select any ready task)
- A second policy is necessary to indicate when the selected process may execute
- The optimal number of application processes that should execute concurrently depends on their I/O characteristics
- Often influenced by local usage restrictions (a possible combination of the two policies is sketched below)
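A minimal sketch of how the two HDS policies might combine, assuming a ready queue already ordered by the static schedule and a fixed limit on concurrently executing processes. The names task_t, select_next, may_execute_now and the value of MAX_CONCURRENT are illustrative, not taken from EasyGrid.

```c
/* Illustrative HDS policy sketch (assumed data structures, not EasyGrid code). */
#define MAX_CONCURRENT 2        /* assumed local usage restriction */

typedef struct task { int id; struct task *next; } task_t;

/* Policy 1: pick the next process to create.  Here: the order produced by
 * the static scheduler (head of the ready queue); a data-flow policy would
 * instead pick any task whose input data is already available. */
static task_t *select_next(task_t **ready_queue)
{
    task_t *t = *ready_queue;
    if (t) *ready_queue = t->next;
    return t;
}

/* Policy 2: decide whether the selected process may execute now, by
 * bounding the number of application processes running concurrently. */
static int may_execute_now(int running)
{
    return running < MAX_CONCURRENT;
}
```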
19Host Dynamic Scheduler - HDS
- When an application process terminates on a resource, the monitor makes available to the HDS the process's
- wall clock time
- CPU execution time
together with the heterogeneity factor
- Both the computational power (cp) and the estimated remaining time (ert) are added to the monitoring message and sent to the Site Manager (a possible message layout is sketched below)
[Diagram: each HDS sends its monitoring message up to the SDS of its site]
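The sketch below suggests one plausible shape for that monitoring message. The struct monitor_msg_t, the helper functions, the tag MONITOR_TAG, and the formulas used for cp and ert are assumptions for illustration; the slides do not give the exact EasyGrid definitions.

```c
/* Assumed monitoring message sent by an HM/HDS to its Site Manager
 * when an application process terminates (illustrative only). */
#include <mpi.h>

#define MONITOR_TAG 77           /* assumed message tag */

typedef struct {
    double wall_clock;   /* elapsed time of the finished process          */
    double cpu_time;     /* CPU time actually consumed by the process     */
    double cp;           /* estimated computational power of this host    */
    double ert;          /* estimated remaining time of this host's queue */
} monitor_msg_t;

/* One possible derivation, assuming a reference task cost ref_cost and
 * n_remaining tasks still queued on this host. */
static monitor_msg_t build_report(double wall, double cpu,
                                  double ref_cost, int n_remaining)
{
    monitor_msg_t m;
    m.wall_clock = wall;
    m.cpu_time   = cpu;
    m.cp         = ref_cost / cpu;     /* heterogeneity-style power estimate */
    m.ert        = n_remaining * wall; /* naive remaining-time estimate      */
    return m;
}

/* Send the report to the Site Manager over the parent intercommunicator. */
static void send_report(const monitor_msg_t *m, MPI_Comm to_site_manager)
{
    MPI_Send((void *)m, sizeof(*m), MPI_BYTE, 0, MONITOR_TAG, to_site_manager);
}
```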
20Site Dynamic Scheduler - SDS
- On receiving the cp and ert values from each resource in the site, the SDS calculates
- The ideal makespan for the remaining processes in the site
- The site imbalance index
- If the site imbalance index is above a predefined threshold, a scheduling event at the SDS is triggered (see the sketch below)
During a scheduling event:
- The SDS requests a percentage of the remaining workload from the most overloaded host
- The HDS receives the request and decides which tasks to release
- The HDS sends the tasks to be rescheduled to the SDS
- The SDS distributes the tasks amongst the underloaded hosts
[Diagram: the SDS exchanging rescheduling messages with the HDSs of its site]
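A sketch of how the SDS check might look, assuming the remaining work on each host is approximated by ert × cp, the ideal makespan is the total remaining work divided by the total power, and the imbalance index is the relative gap between the most loaded host and that ideal. The function name and the threshold value are illustrative; the actual EasyGrid formulas are not given in these slides.

```c
/* Illustrative SDS balance check (assumed formulas, not the EasyGrid ones). */
#define IMBALANCE_THRESHOLD 0.15   /* assumed value */

/* ert[i]: estimated remaining time of host i; cp[i]: its computational power. */
static int needs_scheduling_event(const double *ert, const double *cp, int n)
{
    double work = 0.0, power = 0.0, max_ert = 0.0;
    for (int i = 0; i < n; i++) {
        work  += ert[i] * cp[i];              /* remaining work units on host i */
        power += cp[i];
        if (ert[i] > max_ert) max_ert = ert[i];
    }
    if (power <= 0.0) return 0;
    double ideal = work / power;              /* ideal makespan if perfectly balanced */
    double imbalance = (max_ert - ideal) / ideal;   /* site imbalance index */
    return imbalance > IMBALANCE_THRESHOLD;   /* trigger a scheduling event? */
}
```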
21Site Dynamic Scheduler
- When not executing a scheduling event, SDS
calculates - The average estimated remaining time of the site
- The sum of computational power of site resources
[Diagram: each SDS reports these values to the GDS]
22Global Dynamic Scheduler - GDS
- On receiving the average remaining time and the total computational power from each site, the GDS calculates
- The ideal makespan for the whole application
- The system imbalance index
If the system imbalance index is above its threshold, a scheduling event at the GDS is triggered (a simplified sketch of the site selection step follows below):
- The GDS requests from the most overloaded site a percentage of its remaining workload
- The SDS receives the request and forwards it to its HDSs
- The SDS waits for each HDS to answer and sends the list of released tasks to the GDS
- The GDS distributes the tasks between the underloaded sites
- Each underloaded site reschedules the received tasks amongst its hosts
[Diagram: the GDS exchanging rescheduling messages with the SDSs, which in turn coordinate their HDSs]
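At the global level the same idea applies per site. The fragment below sketches how the GDS might pick the most overloaded site and size its request; the function names and the fraction requested are illustrative parameters, not the EasyGrid policy.

```c
/* Illustrative GDS step (assumed policy): pick the site with the largest
 * estimated remaining time and request a fraction of its pending tasks. */
#define REQUEST_FRACTION 0.5    /* assumed percentage of the remaining workload */

static int pick_overloaded_site(const double *site_ert, int n_sites)
{
    int worst = 0;
    for (int s = 1; s < n_sites; s++)
        if (site_ert[s] > site_ert[worst]) worst = s;
    return worst;
}

static int tasks_to_request(int pending_tasks_on_site)
{
    return (int)(REQUEST_FRACTION * pending_tasks_on_site);
}
```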
23Experimental Analysis
- These scheduling policies were designed for BoT
applications such as parameter sweep (PSwp) - PSwp can be represented by simple fork-join DAGs
- The scheduling policies for this class of DAG can
be seen as load balancing strategies
- Semi-controlled three-site grid environment
- Pentium IV 2.6 GHz processors with 512 MB of RAM
- Running Linux Fedora Core 2, Globus Toolkit 2.4 and LAM/MPI 7.0.6
24Experimental Analysis
25Experimental Analysis
- Same HDS policies for PSwp1 and PSwp2
- The overhead due to the AMS is very low, not exceeding 2.5%
- The cost of rescheduling is also small, less than 1.2%
- The standard deviation is less than the granularity of a single task
26Experimental Analysis
- Different HDS policies for PSwp1 and PSwp2
- PSwp1 processes execute only when resources are
idle
- The interference produced by PSwp1 is less than 0.8%
- The cost of rescheduling does not exceed 1.4%
27Experimental Analysis
- EasyGrid AMS with different scheduling policies
- Round robin (as used by MPI) without dynamic scheduling
- Near-optimal static scheduling without dynamic scheduling
- Round robin (as used by MPI) with dynamic scheduling
- Near-optimal static scheduling with dynamic scheduling
28Conclusion
- In an attempt to improve application performance, a hierarchical hybrid scheduling strategy is employed
- The low cost of the hierarchical scheduling methodology leads to an efficient execution of BoT MPI applications
- Different scheduling policies may be used at different levels of the scheduling hierarchy
- One AMS per application permits application-specific scheduling policies to be used
- System awareness permits various applications to coordinate their scheduling efforts in a decentralised manner to obtain good system utilisation
29Efficient Hierarchical Self-Scheduling for MPI Applications Executing in Computational Grids
Thanks
e-mail: {depaula, asena, boeres, vinod}_at_ic.uff.br
30Calculation of the Optimal Value
31Calculation of the Optimal Value
- Together (PSwp1 uses only the idle resources)
- When PSwp2 finishes, PSwp1
- Has already executed 56 tasks on each idle resource (7 machines)
- 56 tasks × 7 machines = 392 tasks executed
- There remain (1000 - 392) = 608 tasks, not yet executed, for the shared resources
- PSwp1 executes these 608 tasks on 18 machines (Sites 1 and 2)
- tasks/machines = 608/18 ≈ 34
- Optimal value = (56 + 34) × duration of a task (a general form of this expression is written out below)
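Written out in general terms, using only quantities already on this slide (T = 1000 total PSwp1 tasks, k = 56 tasks already completed on each of the 7 idle machines, d = the duration of one task, and 18 machines available afterwards):

```latex
\[
  t_{\mathrm{opt}} \;=\; \Bigl(k + \frac{T - 7k}{18}\Bigr)\, d
                   \;=\; \Bigl(56 + \frac{1000 - 392}{18}\Bigr)\, d
                   \;\approx\; (56 + 34)\, d
\]
```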
32Calculation of the Expected Value
- The expected value when the applications are
executing together is based on the actual value
when the application executed alone
- Example (same HDS policies)
- actual value = 57.20 (executing alone on 18 machines)
- expected value when executing on 7 dedicated machines plus 11 shared machines
- 57.20 corresponds to 18 machines
- the expected value corresponds to 12.5 machines (7 + 5.5, each shared machine contributing roughly half its capacity)
- Expected value = (57.20 × 18) / 12.5 = 82.36 (the general scaling is written out below)
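The scaling used here is a simple proportionality between the 18 dedicated machines used when the application ran alone and the effective capacity when it shares the grid (7 dedicated machines plus 11 shared machines counted at roughly half capacity, an interpretation inferred from the 12.5 figure on this slide):

```latex
\[
  t_{\mathrm{expected}} \;=\; t_{\mathrm{alone}} \times \frac{18}{7 + 11 \times 0.5}
                        \;=\; \frac{57.20 \times 18}{12.5} \;\approx\; 82.36
\]
```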
33Calculation of the Expected Value
- The expected value when the applications are
executing together is based on the actual value
when the application executed alone
- Example (different HDS policies)
- actual value = 57.67 (executing alone on 18 machines)
- at this moment PSwp2 is considered to have finished its execution
- So, PSwp1 has already executed 56 tasks on each idle resource (7 machines): 56 × 7 = 392 tasks executed
- There remain (1000 - 392) = 608 tasks, not yet executed, for the shared resources
- PSwp1 executes these 608 tasks on 18 machines (Sites 1 and 2)
- tasks/machines = 608/18 ≈ 34
- Expected value = 57.67 + 34 = 91.67
36Related Work
- According to Buyya et al., the scheduling subsystem may be
- Centralised
- A single scheduler is responsible for grid-wide decisions
- Not scalable
- Hierarchical
- Scheduling is divided into several levels, permitting different scheduling policies at each level
- Failure of the top-level scheduler results in the failure of the entire system
- Decentralised
- Various decentralised schedulers communicate with each other, offering portability and fault tolerance
- Good individual scheduling may not lead to good overall performance
37Related Work
- OurGrid
- Employs two different schedulers
- A UMS job manager, responsible for allocating application jobs on grid resources
- An RMS site manager, in charge of enforcing particular site-specific policies
- No information about the application is used
- GridFlow
- Is an RMS which focuses on service-level scheduling and workflow
- Management takes place at 3 levels: global, local and resource
38Related Work
- GrADS
- Is an RMS that utilises Autopilot to monitor adherence to a performance contract between the application's demands and the resources' capabilities
- If the contract is violated, the rescheduler takes corrective action either by suspending execution, computing a new schedule, migrating processes and then restarting, or by process swapping
- AppLeS
- Is an AMS where individual scheduling agents embedded into the application perform adaptive scheduling
- Given the user's goals, these centralised agents use the application's characteristics and system information to select viable resources