SProj 3

1
SProj 3

Libra: An Economy-Driven Cluster Scheduler
Jahanzeb Sherwani
Nosheen Ali
Nausheen Lotia
Zahra Hayat
Project Advisor/Client: Rajkumar Buyya
Faculty Advisor: Dr. Arif Zaman

2
Problem Statement

Implementing a computational-economy-based, user-centric scheduler for clusters

3
What is a cluster?

A collection of workstations interconnected via a network technology, in order to take advantage of their combined computational power and resources
An integrated collection of resources that can provide a single system image spanning all its nodes: a virtual supercomputer
Used for computation-intensive applications such as AI expert systems, nuclear simulations, and scientific calculations

4
Why clusters?

Cost-effectiveness: a lower cost-performance ratio than a specialized supercomputer
Increase in workstation performance
Increase in network bandwidth
Decrease in network latency
Scalability: higher than that of a specialized supercomputer
Easier to integrate into an existing network than specialized supercomputers

5
Computational Economy

Traditional schedulers are system-centric: they optimize metrics such as CPU throughput and mean response time, using policies such as Shortest Job First
Computational economy is the inclusion of user-specified quality-of-service parameters with jobs, so that resource management is user-centric rather than system-centric

6
Computational Economy (contd)

Project focus: to implement a scheduler that aims to maximize user utility
Job parameters most relevant to user-centric scheduling:
Budget allocated to the job by the user
Deadline specified by the user

7
Computational Economy for Grids

What is a grid?
An infrastructure that couples resources such as computers (workstations or clusters), software (for special-purpose applications), and devices (printers, scanners) across the Internet and presents them as a single, unified resource that can be widely used
How a grid differs from a cluster:
Wide geographical area
Non-dedicated resources
No centralized resource management

8
Computational Economy for Grids

Managing resources and scheduling computations in a grid environment is complex because the resources are:
geographically distributed
heterogeneous in nature
owned by different individuals or organizations
governed by different access and cost models
In addition, resources must be discovered, and security must be addressed
Computational economy has been implemented for grids: the Nimrod/G resource broker is a global resource management and scheduling system that supports deadline- and economy-based computations in grid-computing environments

9
Computational Economy for Clusters

Market-based Proportional Resource Sharing for Clusters, Brent Chun and David E. Culler, University of California at Berkeley, Computer Science Division
A market-based approach, built on the notion of a computational economy, that optimizes for user value. It describes an architecture for market-based cluster resource management based on proportional sharing of basic computing resources. Cluster nodes act as independent sellers of computing resources, while user applications act as buyers who purchase resources. Users are allocated credits (tickets): the more tickets they have, the greater their CPU share. Tickets are allocated according to the amount the user is willing to pay, i.e., his valuation of the job
Limitation: deadlines are not incorporated
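The ticket idea can be illustrated with a short Python sketch. It assumes each user's CPU share is simply that user's tickets divided by all outstanding tickets; the function name and the ticket counts are hypothetical and are not taken from Chun and Culler's implementation.

```python
from typing import Dict


def proportional_shares(tickets: Dict[str, int]) -> Dict[str, float]:
    """Split one CPU among users in proportion to the tickets each holds."""
    total = sum(tickets.values())
    if total <= 0:
        raise ValueError("at least one user must hold tickets")
    # A user's share is their fraction of all outstanding tickets.
    return {user: count / total for user, count in tickets.items()}


# Hypothetical example: alice pays for three times as many tickets as bob.
shares = proportional_shares({"alice": 300, "bob": 100})
print(shares)  # {'alice': 0.75, 'bob': 0.25}
```

Note that nothing in this rule refers to when a job must finish, which is the limitation listed above.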

10
Cluster Architecture
11
Cluster Management Software

Cluster Management Software is designed to administer and manage application jobs submitted to workstation clusters
Creates a Single System Image
When a collection of interconnected computers appears to be a unified resource, we say it possesses a Single System Image
The benefit of a Single System Image is that the exact location where a process executes is entirely concealed from the user, who is offered the illusion of a single powerful computer
Maintains centralized information about cluster status and resources

12
Cluster Management Software

Commercial and Open-source Cluster Management Software
Open-source Cluster Management Software:
DQS (Distributed Queuing System)
CONDOR
GNQS (Generalized Network Queuing System)
MOSIX
REXEC (Remote Execution)
SGE (Sun Grid Engine)
PBS (Portable Batch System)

13
Cluster Management Software

Why SGE was rejected:
lack of online support
lack of stability
Final choice of CMS: PBS (Portable Batch System)

14
Pricing the Cluster Resources

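$\mathrm{Cost} = \alpha \cdot E + \beta \cdot \frac{E}{D}$, where $E$ is the job execution time, $D$ is the deadline, and $\alpha$, $\beta$ are pricing coefficients
The cost of using the cluster depends on job length and job deadline: the longer the user is prepared to wait for the results, the lower his cost
The cost formula forces the user to reveal his true deadline: stating a tighter deadline than he needs raises the cost, while stating a looser one risks receiving the results too late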
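A minimal Python sketch of the pricing formula, assuming E and D are expressed in the same time unit. The coefficient values are hypothetical; the example simply shows the effect described above, where doubling the deadline lowers the price of the same job.

```python
def job_cost(exec_time: float, deadline: float, alpha: float, beta: float) -> float:
    """Price a job as alpha * E + beta * (E / D).

    exec_time (E) and deadline (D) must use the same time unit.
    """
    if exec_time < 0 or deadline <= 0:
        raise ValueError("exec_time must be >= 0 and deadline > 0")
    return alpha * exec_time + beta * exec_time / deadline


# Hypothetical coefficients: the same 100-second job priced at two deadlines.
urgent = job_cost(100, 200, alpha=1.0, beta=50.0)   # 100 + 50 * (100 / 200)
relaxed = job_cost(100, 400, alpha=1.0, beta=50.0)  # 100 + 50 * (100 / 400)
print(urgent, relaxed)  # 125.0 112.5
```

With this formula, a budget check reduces to accepting a job only when its cost does not exceed the budget the user supplied.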

15
Scheduling Algorithm

How to meet budget and deadline constraints?
Ensuring a low run time for the algorithm:
Greedy algorithm
Complex solutions infeasible
Test run of the algorithm: 5 jobs, arriving at times t = 0, 5, 7, 9, 9, on a 3-node cluster

16
LIBRA with PBS

Portable Batch System (PBS) as the Cluster Management Software (CMS):
Robust, portable, effective, extensible batch job queuing and resource management system
Supports different schedulers
Job accounting
Technical support

17
Setting up the PBS Cluster

Installation of Linux alongside Windows
Installation of SGE as well as PBS
Setting up a Network File System (NFS)
Configuring GridSim in Java
Configuring PBSWeb
Setting up the Apache web server
PHP scripting for Apache
Setting up PostgreSQL
Setting up SSH

18
PBS Overview

Main components of PBS:
Job Server: pbs_server
Job Scheduler: pbs_sched
Job Executor and Resource Monitor: pbs_mom
The server accepts user commands and communicates with the other daemons. Common commands:
qsub: submit a job
qstat: view queue and job status
qalter: change job attributes
qdel: delete a job

19
Xpbs: GUI for PBS
20
Xpbs: GUI for PBS
21
Job Scheduling in PBS
22
The Libra Scheduler

The default scheduler in PBS (the FIFO scheduler) supports several policies:
FIFO: sort jobs by queuing time, running the earliest job first
Fair share: sort and schedule jobs based on the job owners' past usage of the machine
Round-robin: pick a job from each queue in turn
By key: sort jobs by a set of keys, such as shortest_job_first and smallest_memory_first
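The FIFO and by-key policies amount to sorting the queue with different keys. A minimal Python sketch, assuming a hypothetical job record with queue-time, estimated run-time, and memory fields; these names are illustrative only and are not PBS's own job attributes or its scheduler configuration syntax.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class QueuedJob:
    name: str
    queued_at: int    # time the job entered the queue
    est_runtime: int  # estimated run time, for a shortest_job_first key
    memory_mb: int    # requested memory, for a smallest_memory_first key


def fifo_order(jobs: List[QueuedJob]) -> List[QueuedJob]:
    """FIFO: the earliest-queued job runs first."""
    return sorted(jobs, key=lambda j: j.queued_at)


def by_key_order(jobs: List[QueuedJob], keys: Sequence[str]) -> List[QueuedJob]:
    """By key: sort on several fields in priority order (ties broken by later keys)."""
    return sorted(jobs, key=lambda j: tuple(getattr(j, k) for k in keys))


jobs = [
    QueuedJob("a", queued_at=0, est_runtime=50, memory_mb=512),
    QueuedJob("b", queued_at=5, est_runtime=10, memory_mb=2048),
    QueuedJob("c", queued_at=7, est_runtime=10, memory_mb=256),
]
print([j.name for j in fifo_order(jobs)])  # ['a', 'b', 'c']
print([j.name for j in by_key_order(jobs, ["est_runtime", "memory_mb"])])  # ['c', 'b', 'a']
```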

23
The Libra Scheduler

Job Input Controller
Adds parameters at job submission time: deadline, budget, executionTime
Defines new job attributes for them
Job Acceptance and Assignment Controller
Budget checked through the cost function
Admission control through deadline scheduling
Selects the execution host with the minimum load that can still finish the job on time
Equal Share instead of Minimum Share
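The acceptance steps can be sketched as a greedy function. This is a minimal illustration under stated assumptions, not the Libra source: all jobs are assumed to arrive together, a job's required CPU share is taken to be executionTime / deadline (the smallest fraction of a node that finishes it on time), a node's load is the sum of shares already promised, and the pricing coefficients are hypothetical. It models minimum (proportional) share rather than the equal-share variant noted on the slide, and it never releases shares when jobs finish.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    load: float = 0.0  # sum of CPU shares already promised to accepted jobs


@dataclass
class Job:
    execution_time: float  # estimated run time on a dedicated node
    deadline: float        # time allowed for completion, same unit as execution_time
    budget: float          # most the user is willing to pay


def admit(job: Job, nodes: List[Node], alpha: float, beta: float) -> Optional[Node]:
    """Greedy admission control; returns the chosen node, or None if rejected."""
    if job.deadline <= 0:
        return None
    # Budget check with the cost function Cost = alpha * E + beta * E / D.
    cost = alpha * job.execution_time + beta * job.execution_time / job.deadline
    if cost > job.budget:
        return None
    # Smallest CPU fraction that still completes the job by its deadline.
    share = job.execution_time / job.deadline
    feasible = [n for n in nodes if n.load + share <= 1.0]
    if not feasible:
        return None  # no node can finish the job on time
    best = min(feasible, key=lambda n: n.load)  # least-loaded feasible node
    best.load += share
    return best


nodes = [Node("n1"), Node("n2")]
jobs = [
    Job(execution_time=100, deadline=200, budget=200),  # share 0.5, cost 125
    Job(execution_time=60, deadline=100, budget=200),   # share 0.6, cost 90
    Job(execution_time=100, deadline=200, budget=100),  # cost 125 exceeds budget
    Job(execution_time=80, deadline=100, budget=500),   # share 0.8 fits on no node
]
placed = [admit(job, nodes, alpha=1.0, beta=50.0) for job in jobs]
print([n.name if n else "rejected" for n in placed])
# ['n1', 'n2', 'rejected', 'rejected']
```

Choosing the least-loaded feasible node keeps spare capacity spread across the cluster; a best-fit rule (the most-loaded node that still fits) would be an equally simple greedy alternative.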

24
The Libra Scheduler

Job Execution Controller
Runs the job on the best node according to the algorithm
Updates cluster and node status: runTime, cpuLoad
Job Querying Controller
Server, Scheduler, Exec Host, and Accounting Logs

25
PBS-Libra Web: Front End for the Libra Engine
26
PBS-Libra Web
27
PBS-Libra Web
28
PBS-Libra Web
29
PBS-Libra Web
30
PBS-Libra Web
31
PBS-Libra Web
32
PBS-Libra Web
33
PBS-Libra Web
34
(No Transcript)
35
Simulations

Goal: measure the performance of the Libra scheduler
What counts as performance? Maximizing user satisfaction

36
Simulations

Simulation software: a modified version of GridSim (a grid resource management simulation toolkit)

37
GridSim Class Diagram
38
Simulations

Methodology
Workload: 120 jobs with deadlines and budgets; job lengths from 1000 to 10000
Resources: a homogeneous 10-node cluster of single-processor nodes (MIPS rating 100)
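Assuming the job lengths are in millions of instructions (MI), as GridSim measures Gridlet lengths, a job's run time on one of these nodes is its length divided by the 100 MIPS rating, so individual jobs take roughly 10 to 100 seconds on a dedicated node. The sketch below draws a hypothetical 120-job workload in that range; the uniform distribution and the seed are assumptions, not the project's actual workload.

```python
import random

MIPS = 100  # rating of every node in the homogeneous simulated cluster


def runtime_seconds(length_mi: float, mips: float = MIPS) -> float:
    """Time to run a job of length_mi million instructions on one dedicated node."""
    return length_mi / mips


rng = random.Random(42)  # fixed seed so the hypothetical workload is reproducible
lengths = [rng.uniform(1000, 10000) for _ in range(120)]
runtimes = [runtime_seconds(length) for length in lengths]
print(runtime_seconds(1000), runtime_seconds(10000))  # 10.0 100.0
print(min(runtimes) >= 10.0, max(runtimes) <= 100.0)  # True True
```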

39
Simulations

Assumptions:
Strict deadlines
Processing overhead due to the scheduler and clock interrupts is ignored
Scheduler simulated as a function:
Input: job size, deadline, budget
Output: accept/reject, node, share allocated

40
Simulations

Compared:
Proportional Share
FIFO
Experiments:
120 jobs, 10 nodes
Increasing the workload to 150 and 200 jobs
Increasing the cluster size to 20 nodes
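One way to see how the two compared policies can differ is a single-node sketch. Under FIFO, as assumed here, a job is accepted only if it can finish after everything queued ahead of it and still meet its deadline; under proportional share, jobs run concurrently and fit as long as their required CPU fractions (execution time / deadline) sum to at most 1. The function name and job values are hypothetical, and the project's simulated policies may differ in detail.

```python
from typing import List, Tuple


def fifo_admit(jobs: List[Tuple[float, float, float]]) -> List[bool]:
    """FIFO admission on one node; jobs are (arrival, execution_time, deadline).

    Jobs must be listed in arrival order; each deadline is measured from arrival.
    Returns True for each job that is accepted.
    """
    free_at = 0.0  # time when the node finishes its accepted backlog
    decisions = []
    for arrival, execution_time, deadline in jobs:
        finish = max(free_at, arrival) + execution_time
        if finish <= arrival + deadline:
            free_at = finish  # accepted: the job holds the node until it finishes
            decisions.append(True)
        else:
            decisions.append(False)  # would miss its deadline, so reject it
    return decisions


jobs = [(0, 40, 100), (0, 30, 60)]
print(fifo_admit(jobs))  # [True, False]

# Under proportional share each job needs execution_time / deadline of the CPU.
total_share = sum(e / d for _, e, d in jobs)
print(total_share <= 1.0)  # True: 0.4 + 0.5 fits on one node
```

Here FIFO rejects the second job even though the two jobs together need only 90% of the node when shared proportionally.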

41
Simulation Results

120 jobs submitted: 20 did not meet the budget, leaving 100 jobs to schedule

42
100 Jobs, 10 Nodes: FIFO 23 rejected, Proportional Share 14 rejected
43
Simulation Results

Increase the workload to 200 jobs on the same 10-node cluster

44
200 Jobs, 10 Nodes: FIFO 105 rejected, Proportional Share 93 rejected
45
Simulation Results

Scale the cluster up to 20 nodes

46
200 Jobs, 20 Nodes: FIFO 35 rejected, Proportional Share 23 rejected
47
Simulation Results
48
Simulation Results
49
Simulation Results
50
Simulation Results
51
Conclusion and Future Work

Successfully implemented a Linux-based cluster that schedules jobs using PBS with our economy-driven Libra scheduler, with PBS-Libra Web as the front end
Successfully tested our scheduling policy
In our simulations, Proportional Share rejected fewer jobs than FIFO in every experiment, delivering more value to users
Exploring other pricing mechanisms
Expanding the cluster with more nodes and with support for parallel jobs
