Title: Efficient Hierarchical Self-Scheduling for MPI Applications Executing in Computational Grids


1
Efficient Hierarchical Self-Scheduling for MPI Applications Executing in Computational Grids
Aline Nascimento, Alexandre Sena, Cristina Boeres and Vinod Rebello
Instituto de Computação, Universidade Federal Fluminense, Niterói (RJ), Brazil
http://easygrid.ic.uff.br
e-mail: {depaula, asena, boeres, vinod}@ic.uff.br
2
Talk Outline
  • Introduction
  • Concepts and Related Work
  • The EasyGrid Application Management System
  • Hybrid Scheduling
  • Experimental Analysis
  • Conclusions and Future Work

3
Introduction
  • Grid computing has become increasingly widespread
    around the world
  • Growth in popularity means a larger number of
    applications will compete for limited resources
  • Efficient utilisation of the grid infrastructure
    will be essential to achieve good performance
  • Grid infrastructure is typically
  • Composed of diverse heterogeneous computational
    resources interconnected by networks of varying
    capacities
  • Shared, executing both grid and local user
    applications
  • Dynamic, resources may enter and leave without
    prior notice

Hard to develop efficient grid management systems
4
Introduction
  • Much research is being invested in the
    development of specialised middleware responsible
    for
  • Resource discovery and controlling access
  • The efficient and successful execution of
    applications on the available resources
  • Three implementation philosophies can be defined
  • Resource management systems (RMS)
  • User management systems (UMS)
  • Application management systems (AMS)

5
Concepts
  • Resource Management System
  • Adopts a system-centric viewpoint
  • Centralised in order to manage a number of
    applications simultaneously
  • Aims to maximise system utilisation, considering
    just the resource requirements of the
    applications, not their internal characteristics
  • Scheduling middleware is installed on a single
    central server, monitoring middleware on the grid
    nodes
  • User Management System
  • Adopts an application-centric viewpoint
  • One per application, collectively decentralised
  • Utilises the grid in accordance with resource
    availability and the application's topology (e.g.
    Bag of Tasks)
  • Installed on every machine from which an
    application can be launched

6
Concepts
  • Application Management System
  • Also adopts an application-centric viewpoint
  • One per application
  • Utilises the grid in accordance with both the
    available resources and the application's
    characteristics
  • Is embedded into the application, offering better
    portability
  • Works in conjunction with a simplified RMS
  • Transforms applications into system-aware
    versions
  • Each application can be made aware of its own
    computational requirements and adjust itself
    according to the grid environment

7
Concepts
  • The EasyGrid AMS is
  • Hierarchically distributed within an application
  • Decentralised amongst various applications
  • Application specific
  • Designed for MPI applications
  • Automatically embedded into the application
  • EasyGrid simplifies the process of grid-enabling
    existing MPI applications
  • The grid resources need only offer core
    middleware services and an MPI communication
    library

8
Objectives
  • This paper focuses specifically on the problem of
    scheduling processes within the EasyGrid AMS
  • The AMS scheduling policies are application
    specific
  • This paper highlights the scheduling features
    through the execution of bag-of-tasks (BoT)
    applications
  • The main goals are
  • To show the viability of the proposed scheduling
    strategy in the context of an Application
    Management System
  • To quantify the quality of the results obtainable

9
The EasyGrid AMS Middleware
  • The EasyGrid framework is an AMS middleware for
    MPI implementations with dynamic process creation

10
The EasyGrid AMS Architecture
  • A three level hierarchical management system

(Diagram: a Computational Grid with four sites; the
GM manages one SM per site, and each SM manages the
HMs on the hosts within its site.)
11
The EasyGrid Portal
  • The EasyGrid Scheduling Portal is responsible for
  • Choosing the appropriate scheduling and fault
    tolerance policies for the application
  • Determining an initial process allocation
  • Compiling the system-aware application
  • Managing the user's grid proxy
  • Creating the MPI environment (including
    transferring files)
  • Providing fault tolerance for the GM process

Acts like a simplified resource management system
12
The EasyGrid AMS
  • GM creates one SM per site
  • Each SM spawns an HM on each remaining resource
    at its respective site (spawning is sketched
    below)
  • The application processes will be created
    dynamically according to the local policy of the
    HM
  • Processes are created with unique communicators
  • Gives rise to the hierarchical AMS structure
  • Communication can only take place between parent
    and child processes
  • HMs, SMs and GM route messages between
    application processes
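
The hierarchy above is built with MPI-2 dynamic process creation. The following is a minimal sketch of how a manager might spawn the level below it over a dedicated inter-communicator; it is not the EasyGrid code, and the child binary name manager_child and the child count are assumptions.

```c
/* Minimal sketch (not the EasyGrid source) of MPI-2 dynamic process
 * creation, as used to build the GM -> SM -> HM hierarchy.  The child
 * binary name "manager_child" and the child count are assumptions. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm children;        /* inter-communicator to the spawned level */
    int nchildren = 4;        /* e.g. one SM per site, or one HM per host */

    MPI_Init(&argc, &argv);

    /* Each spawn creates a unique inter-communicator, so messages flow
     * only between parent and children; HMs, SMs and the GM therefore
     * route messages between application processes.  A real AMS would
     * use the "host" info key (or MPI_Comm_spawn_multiple) to place
     * each child on the resource chosen by the schedule. */
    MPI_Comm_spawn("manager_child", MPI_ARGV_NULL, nchildren,
                   MPI_INFO_NULL, 0, MPI_COMM_SELF,
                   &children, MPI_ERRCODES_IGNORE);

    /* Route a message down to child 0 over the inter-communicator. */
    int task_id = 42;
    MPI_Send(&task_id, 1, MPI_INT, 0, 0, children);

    MPI_Finalize();
    return 0;
}
```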

13
Process Scheduling
  • Scheduling is made complex due to the dynamic and
    heterogeneous characteristics of grids
  • Predicting the processing power and communication
    bandwidth available to an application is
    difficult
  • Static schedulers
  • Estimates assumed a priori may be quite different
    at runtime
  • More sophisticated heuristics can be employed at
    compile time
  • Dynamic schedulers
  • Access to accurate runtime system information
  • Decisions need to be made quickly to minimise
    runtime intrusion
  • Hybrid Schedulers
  • Combine the advantages by integrating both static
    and dynamic schedulers

14
Static Scheduling
  • The scheduling heuristics require models that
    capture the relevant characteristics of both the
  • application (typically represented by a DAG) and
  • the target system
  • To define the parameters of the architectural
    model
  • The Portal executes an MPI modelling program with
    the user's credentials to determine the realistic
    performance currently available to this user's
    application (a minimal sketch is given below)
  • At start-up, application processes will be
    allocated to the resources in accordance with the
    static schedule
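
A minimal sketch of what such a modelling program could look like; this form (a timed reference kernel per resource, gathered at rank 0) is an assumption made for illustration, not the Portal's actual program.

```c
/* Minimal sketch of an MPI modelling program (an assumed form, not the
 * Portal's actual code): each rank times a fixed reference kernel and
 * rank 0 derives the relative computational power of every resource. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 50000000L   /* size of the reference kernel (an assumption) */

int main(int argc, char *argv[])
{
    int rank, size, hostlen;
    char host[MPI_MAX_PROCESSOR_NAME];
    volatile double acc = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &hostlen);

    /* Time a fixed amount of floating-point work on this resource. */
    double t0 = MPI_Wtime();
    for (long i = 0; i < N; i++)
        acc += (double)i * 1.0000001;
    double elapsed = MPI_Wtime() - t0;
    printf("%s: %.3f s for the reference kernel\n", host, elapsed);

    /* Rank 0 gathers the timings; their ratios give the relative
     * computational power of each resource for the architectural model. */
    double *times = (rank == 0) ? malloc(size * sizeof(double)) : NULL;
    MPI_Gather(&elapsed, 1, MPI_DOUBLE, times, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("relative power of rank %d: %.2f\n", i, times[0] / times[i]);
        free(times);
    }
    MPI_Finalize();
    return 0;
}
```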

15
Dynamic Scheduling
  • Modifying the initial allocation at run time is
    essential, given the difficulty of
  • Extracting precise information with regard to the
    application's characteristics
  • Predicting the performance available from shared
    grid resources
  • The current version only reschedules processes
    which have yet to be created
  • The dynamic schedulers are associated with each
    of the management processes distributed in the 3
    levels of the AMS hierarchy

16
AMS Hierarchical Schedulers
  • The Global Dynamic Scheduler (GDS) is associated
    with the GM
  • A Site Dynamic Scheduler (SDS) is associated with
    each SM
  • A Host Dynamic Scheduler (HDS) is associated with
    each HM
17
Dynamic Scheduling
  • The appropriate scheduling policies depend on
  • The class of the application and the user's
    objectives
  • Different policies may be used in different
    layers of the hierarchy and even within the same
    layer
  • The dynamic schedulers collectively
  • Estimate the remaining execution time on each
    resource
  • Verify if the allocation needs to be adjusted
  • If necessary, activate the rescheduling mechanism
  • A rescheduling mechanism characterises a
    scheduling event

18
Host Dynamic Scheduler
  • HDS determines both the order and the instant
    that an application process should be created on
    the host
  • Possible scheduling policies to determine the
    process sequence include
  • The order specified by the static scheduler
  • Data flow: select any ready task
  • A second policy is necessary to indicate when the
    selected process may execute
  • The optimal number of application processes that
    should execute concurrently depends on their I/O
    characteristics
  • Often influenced by local usage restrictions (a
    simple host-level policy is sketched below)
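
A minimal sketch of one such host-level creation policy: launch ready tasks in a given order while keeping at most a fixed number of application processes running. The task structure, the concurrency limit and the plain-C driver are illustrative assumptions, not the EasyGrid HDS.

```c
/* Minimal sketch of an HDS-style creation policy (illustrative, not the
 * EasyGrid implementation): keep at most MAX_CONCURRENT application
 * processes running on the host, launching ready tasks in order. */
#include <stdio.h>

#define MAX_CONCURRENT 2   /* tuned to the tasks' I/O behaviour and local usage limits */

typedef struct {
    int id;
    int ready;     /* data-flow condition: all input messages received */
    int created;   /* process already spawned on this host */
} task_t;

/* Called whenever a process finishes or a task becomes ready. */
static void hds_schedule(task_t tasks[], int ntasks, int *running)
{
    for (int i = 0; i < ntasks && *running < MAX_CONCURRENT; i++) {
        if (tasks[i].ready && !tasks[i].created) {
            /* In the real AMS this would be an MPI_Comm_spawn of the
             * application process; here we only record the decision. */
            printf("HDS: creating process for task %d\n", tasks[i].id);
            tasks[i].created = 1;
            (*running)++;
        }
    }
}

int main(void)
{
    task_t tasks[] = { {0, 1, 0}, {1, 1, 0}, {2, 0, 0}, {3, 1, 0} };
    int running = 0;

    hds_schedule(tasks, 4, &running);    /* creates tasks 0 and 1 */
    running--;                           /* pretend task 0 finished */
    tasks[2].ready = 1;                  /* task 2 becomes ready */
    hds_schedule(tasks, 4, &running);    /* creates task 2; task 3 waits */
    return 0;
}
```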

19
Host Dynamic Scheduler - HDS
  • When an application process terminates on a
    resource, the monitor makes available to the HDS
    the process's
  • wall clock time
  • CPU execution time

together with the heterogeneity factor
  • Both the computational power (cp) and the
    estimated remaining time (ert) are added to the
    monitoring message and sent to the Site Manager
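
One plausible way the monitor's measurements could be turned into cp and ert values is sketched below; the formulas are assumptions made purely for illustration, not the EasyGrid definitions.

```c
/* Illustrative sketch only: one plausible way an HDS could derive cp
 * and ert from the monitor's measurements.  All formulas and figures
 * here are assumptions, not the EasyGrid definitions. */
#include <stdio.h>

int main(void)
{
    /* Reported by the monitor when a process terminates. */
    double wall_time = 12.0;          /* wall clock time (s)               */
    double cpu_time  = 9.0;           /* CPU execution time (s)            */
    double het       = 0.8;           /* heterogeneity factor of the host  */
    int    remaining_tasks = 40;      /* tasks still allocated to the host */

    /* Fraction of the host the application is actually obtaining
     * (shared resources also run local and other grid jobs). */
    double availability = cpu_time / wall_time;

    /* Computational power currently delivered to the application. */
    double cp = het * availability;

    /* Estimated remaining time: each remaining task needs about
     * cpu_time seconds of CPU, delivered at the observed availability. */
    double ert = remaining_tasks * (cpu_time / availability);

    printf("cp = %.2f, ert = %.1f s (sent to the SM)\n", cp, ert);
    return 0;
}
```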

20
Site Dynamic Scheduler - SDS
  • On receiving the cp and ert values from each
    resource in the site, the SDS calculates
  • The ideal makespan for the remaining processes in
    the site
  • The site imbalance index
  • If the site imbalance index is above a predefined
    threshold, a scheduling event at the SDS is
    triggered
  • The SDS requests a percentage of the remaining
    workload from the most overloaded host
  • The HDS receives the request and decides which
    tasks to release
  • The HDS sends the tasks to be rescheduled to the
    SDS
  • The SDS distributes these tasks amongst the
    underloaded hosts (the imbalance test is sketched
    below)
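
A minimal sketch of the site-level imbalance test described above. The data layout, the definition of the ideal makespan as total remaining work over total computational power, and the exact form of the imbalance index are illustrative assumptions, not the EasyGrid formulas.

```c
/* Minimal sketch of the SDS imbalance test (illustrative assumptions
 * throughout; not the EasyGrid implementation). */
#include <stdio.h>

#define NHOSTS    4
#define THRESHOLD 0.10   /* trigger a scheduling event above 10% imbalance */

int main(void)
{
    /* Per-host values reported by each HDS in its monitoring messages. */
    double ert[NHOSTS] = { 120.0, 80.0, 60.0, 200.0 };  /* estimated remaining time (s) */
    double cp[NHOSTS]  = { 1.0, 1.0, 0.8, 0.6 };        /* relative computational power */

    /* Ideal makespan: all remaining work shared in proportion to power. */
    double work = 0.0, power = 0.0, worst = 0.0;
    int overloaded = 0;
    for (int i = 0; i < NHOSTS; i++) {
        work  += ert[i] * cp[i];     /* remaining work normalised to a reference host */
        power += cp[i];
        if (ert[i] > worst) { worst = ert[i]; overloaded = i; }
    }
    double ideal = work / power;

    /* Imbalance index: how far the slowest host is from the ideal. */
    double imbalance = (worst - ideal) / ideal;
    printf("ideal makespan %.1f s, imbalance %.2f\n", ideal, imbalance);

    if (imbalance > THRESHOLD) {
        /* Scheduling event: ask the most overloaded host's HDS for a
         * share of its remaining workload, then redistribute the tasks
         * it releases amongst the underloaded hosts. */
        printf("scheduling event: request workload from host %d\n", overloaded);
    }
    return 0;
}
```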
21
Site Dynamic Scheduler
  • When not executing a scheduling event, SDS
    calculates
  • The average estimated remaining time of the site
  • The sum of computational power of site resources

22
Global Dynamic Scheduler - GDS
  • On receiving the average estimated remaining time
    and the total computational power from each site,
    the GDS calculates

  • The GDS requests a percentage of its remaining
    workload from the most overloaded site
  • That site's SDS receives the request and forwards
    it to its HDSs
  • The SDS waits for each HDS to answer and sends
    the list of released tasks to the GDS
  • The GDS distributes these tasks between the
    underloaded sites
  • Each underloaded site reschedules the received
    tasks amongst its hosts (the corresponding global
    test is sketched below)
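
The same test at the global level, under the same assumptions as the site-level sketch; the per-site figures are invented for illustration.

```c
/* Minimal sketch of the GDS test: each SDS reports its site's average
 * estimated remaining time and total computational power, and the GDS
 * checks the system imbalance index against a threshold.  Illustrative
 * assumptions only, not the EasyGrid implementation. */
#include <stdio.h>

#define NSITES    3
#define THRESHOLD 0.10

int main(void)
{
    double avg_ert[NSITES] = { 150.0, 90.0, 60.0 };  /* per-site averages from the SDSs */
    double cp[NSITES]      = { 4.0, 6.0, 8.0 };      /* per-site total computational power */

    double work = 0.0, power = 0.0, worst = 0.0;
    int overloaded = 0;
    for (int i = 0; i < NSITES; i++) {
        work  += avg_ert[i] * cp[i];
        power += cp[i];
        if (avg_ert[i] > worst) { worst = avg_ert[i]; overloaded = i; }
    }
    double ideal = work / power;                     /* ideal application makespan */
    double imbalance = (worst - ideal) / ideal;      /* system imbalance index */

    if (imbalance > THRESHOLD)
        printf("GDS scheduling event: request workload from site %d\n", overloaded);
    else
        printf("system balanced (imbalance %.2f)\n", imbalance);
    return 0;
}
```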
23
Experimental Analysis
  • These scheduling policies were designed for BoT
    applications such as parameter sweep (PSwp)
  • PSwp can be represented by simple fork-join DAGs
  • The scheduling policies for this class of DAG can
    be seen as load balancing strategies
  • Semi-controlled three-site grid environment
  • Pentium IV 2.6 GHz processors with 512 MB RAM
  • Running Linux Fedora Core 2, Globus Toolkit 2.4
    and LAM/MPI 7.0.6

24
Experimental Analysis
25
Experimental Analysis
  • Same HDS policies for PSwp1 and PSwp2
  • The overhead due to the AMS is very low, not
    exceeding 2.5%
  • The cost of rescheduling is also small, less
    than 1.2%
  • The standard deviation is less than the
    granularity of a single task

26
Experimental Analysis
  • Different HDS policies for PSwp1 and PSwp2
  • PSwp1 processes execute only when resources are
    idle
  • The interference produced by PSwp1 is less than
    0.8%
  • The cost of rescheduling does not exceed 1.4%

27
Experimental Analysis
  • EasyGrid AMS with different scheduling policies
  • Round robin (as used by MPI) without dynamic
    scheduling
  • Near-optimal static scheduling without dynamic
    scheduling
  • Round robin (as used by MPI) with dynamic
    scheduling
  • Near-optimal static scheduling with dynamic
    scheduling

28
Conclusion
  • In an attempt to improve application performance,
    a hierarchical hybrid scheduling strategy is
    employed
  • The low cost of the hierarchical scheduling
    methodology leads to an efficient execution of
    BoT MPI applications
  • Different scheduling policies may be used at
    different levels of the scheduling hierarchy
  • One AMS per application permits
    application-specific scheduling policies to be
    used
  • System awareness permits various applications to
    coordinate their scheduling efforts in a
    decentralised manner to obtain good system
    utilisation

29
Efficient Hierarchical Self-Scheduling for MPI Applications Executing in Computational Grids
Thanks!
e-mail: {depaula, asena, boeres, vinod}@ic.uff.br
30
Calculation of the Optimal Value
31
Calculation of the Optimal Value
  • Together (PSwp1 uses only idle resources)
  • When PSwp2 finishes, PSwp1
  • Has already executed 56 tasks on each idle
    resource (7 machines)
  • 56 tasks × 7 machines = 392 tasks executed
  • (1000 − 392) = 608 tasks have yet to be executed
    on the shared resources
  • PSwp1 executes these 608 tasks on 18 machines
    (Sites 1 and 2)
  • tasks per machine = 608/18 ≈ 34
  • Optimal value = (56 + 34) × duration of a task
    (worked below)
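
The slide's arithmetic, consolidated into one place (T denotes the duration of a single task):

```latex
\begin{align*}
\text{tasks executed on the 7 idle machines} &= 56 \times 7 = 392 \\
\text{tasks remaining for the shared phase}  &= 1000 - 392 = 608 \\
\text{extra tasks per machine}               &= 608 / 18 \approx 34 \\
\text{optimal value}                         &= (56 + 34)\,T = 90\,T
\end{align*}
```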

32
Calculation of the Expected Value
  • The expected value when the applications are
    executing together is based on the actual value
    measured when the application executed alone
  • Example (same HDS policies)
  • actual value = 57.20 (executing alone on 18
    machines)
  • expected value when executing on 7 dedicated
    machines plus 11 shared machines
  • 57.20 corresponds to 18 machines
  • the 11 shared machines count as 5.5, so the
    expected value corresponds to 12.5 machines
    (7 + 5.5)
  • Expected value = (57.20 × 18) / 12.5 = 82.36
    (worked below)
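
The same figures written as the underlying proportion, where \(t_{\text{alone}}\) is the runtime measured alone on 18 machines and 5.5 is the slide's effective weight for the 11 shared machines:

```latex
\[
  \text{Expected value} \;=\; t_{\text{alone}} \times \frac{18}{7 + 5.5}
  \;=\; \frac{57.20 \times 18}{12.5} \;\approx\; 82.36
\]
```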

33
Calculation of the Expected Value
  • The expected value when the applications are
    executing together is based on the actual value
    when the application executed alone
  • Example (different HDS policies)
  • actual value = 57.67 (executing alone on 18
    machines)
  • at this moment it is considered that PSwp2 has
    finished its execution
  • So, PSwp1 has already executed 56 tasks on each
    idle resource (7 machines), i.e. 392 tasks
  • (1000 − 392) = 608 tasks have yet to be executed
    on the shared resources
  • PSwp1 executes these 608 tasks on 18 machines
    (Sites 1 and 2)
  • tasks per machine = 608/18 ≈ 34
  • Expected value = 57.67 + 34 = 91.67

34
(No Transcript)
35
(No Transcript)
36
Related Work
  • According to Buyya et al., the scheduling
    subsystem may be
  • Centralised
  • A single scheduler is responsible for grid-wide
    decisions
  • Not scalable
  • Hierarchical
  • Scheduling is divided into several levels,
    permitting different scheduling policies at each
    level
  • Failure of the top-level scheduler results in
    failure of the entire system
  • Decentralised
  • Various decentralised schedulers communicate with
    each other, offering portability and fault
    tolerance
  • Good individual scheduling may not lead to good
    overall performance

37
Related Work
  • OurGrid
  • Employs two different schedulers
  • A UMS job manager, with the responsibility for
    allocating application jobs on grid resources
  • An RMS site manager in charge of enforcing
    site-specific policies
  • No information about the application is used
  • GridFlow
  • Is an RMS which focuses on service-level
    scheduling and workflow management
  • Management takes place at three levels: global,
    local and resource

38
Related Work
  • GrADS
  • Is an RMS that utilises Autopilot to monitor
    adherence to a performance contract between the
    application's demands and resource capabilities
  • If the contract is violated, the rescheduler
    takes corrective action, either by suspending
    execution, computing a new schedule, migrating
    processes and then restarting, or by process
    swapping
  • AppLeS
  • Is an AMS where individual scheduling agents that
    are embedded into the application perform
    adaptive scheduling
  • Given the user's goals, these centralised agents
    use the application's characteristics and system
    information to select viable resources