Faucets: Scheduling on Clusters and Across the Grid

Faucets: Scheduling on Clusters and Across the Grid. Presenter: Sameer Kumar. Team: Sanjay Kalé, Sameer Kumar, Sindhura Bandhakavi, Justin Meyer.

Slides: 46
Transcript and Presenter's Notes

1
Faucets: Scheduling on Clusters and Across the Grid
  • Presenter: Sameer Kumar
  • Team: Sanjay Kalé, Sameer Kumar, Sindhura
    Bandhakavi, Justin Meyer
  • Parallel Programming Laboratory
  • Department of Computer Science
  • University of Illinois at Urbana-Champaign
  • http://charm.cs.uiuc.edu/

2
Outline
  • High-level Description
  • Motivation
  • Faucets, Cluster Bartering
  • Adaptive jobs, Adaptive queuing system (AQS)
  • Demo
  • Usage and Installation
  • How to write an adaptive program
  • Installing and Using the AQS
  • Adding your cluster to an existing faucets server
  • Installing a faucets server

3
Motivation
  • Demand for high end compute power, but
  • Dispersed
  • Which machine would give me back my results
    quickest?
  • Hard to use
  • Use ssh to log in, ftp files, decide on a queue,
    create a script, submit
  • Because of the hassle, users just submit the same
    script to the same machine even if a better
    alternative exists
  • Monitor a running job
  • Low operational efficiency of existing computing
    systems

4
Solution 1: Faucets
  • Motivation 1: dispersed, hard to use
  • Central source of compute power
  • Users
  • Providers of compute resources
  • User account not needed on every resource
  • Match users and providers
  • Market economy?
  • Cluster bartering
  • QoS requirements, contracts and bidding systems
  • GUI or web-based interface
  • Submission
  • Monitoring

5
Faucets
Parallel systems need to maximize their efficiency!
[Diagram: job submission and job monitoring across three clusters]
http://charm.cs.uiuc.edu/research/faucets
6
Motivation 2: Inefficient Utilization
[Figure: 16-processor system]
Current job schedulers can have low system utilization!
7
Solution: Adaptive Jobs
  • Jobs that can shrink or expand the number of
    processors they are running on at runtime
  • Improve system utilization and response time
  • Properties
  • Min_pe: related to the memory requirements of
    the job
  • Max_pe: related to speedup
  • Scheduler can take advantage of this adaptivity

8
Two Adaptive Jobs
16 Processor system
9
Adaptive Job Scheduler
  • Maximize system utilization and minimize response
    time
  • Scheduling decisions
  • Shrink existing jobs when a new job arrives
  • Expand jobs to use all processors when a job
    finishes
  • Processor map sent to the job
  • Bit vector specifying which processors a job is
    allowed to use
  • 00011100 (use processors 3, 4 and 5)
  • Handles regular (non-adaptive) jobs
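
The processor map can be decoded in a few lines. This is a minimal Python sketch, not the Faucets implementation: the function names `allowed_processors` and `make_processor_map` are hypothetical, and the left-to-right, zero-based bit order is an assumption chosen to match the 00011100 example on this slide.

```python
def allowed_processors(processor_map: str) -> list[int]:
    """Return the zero-based indices of the processors whose bit is set."""
    if set(processor_map) - {"0", "1"}:
        raise ValueError("processor map must contain only '0' and '1'")
    return [i for i, bit in enumerate(processor_map) if bit == "1"]


def make_processor_map(total: int, allowed: list[int]) -> str:
    """Build a bit-vector string of length `total` with the given bits set."""
    bits = ["0"] * total
    for index in allowed:
        bits[index] = "1"  # raises IndexError if index is out of range
    return "".join(bits)


if __name__ == "__main__":
    print(allowed_processors("00011100"))
    print(make_processor_map(8, [3, 4, 5]))
```

Shrinking a job then amounts to sending it a map with fewer bits set, and expanding it to sending a map with more.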

10
Outline
  • High-level description
  • Motivation
  • Faucets, cluster bartering
  • Adaptive jobs, adaptive queuing system (AQS)
  • Demo
  • Usage and installation
  • How to write an adaptive program
  • Installing and using the AQS
  • Adding your cluster to an existing faucets server
  • Installing a faucets server

11
System Overview
[Diagram: GUI client (or web browser), Faucets server, and clusters; each cluster has a cluster daemon, an adaptive queuing system, and PEs]
12
GUI Client
[Diagram: GUI client (or web browser, or command-line client), Faucets server, and clusters; each cluster has a cluster daemon, an adaptive queuing system, and PEs]
13
Secure Communication
  • SSL communication
  • Certificate for Faucets Server
  • public key distributed on web page, in code
  • One certificate for each CD
  • Future: Globus

14
GUI Client
  • One JAR file
  • Runs on Win32 platform
  • Faucets Server Certificate included in code.
  • GUI client gets cluster daemon (CD) certificates
    from the central server (CS)

17
Perf Monitor
19
Adaptive Jobs
[Diagram: GUI client (or web browser), Faucets server, and clusters; each cluster has a cluster daemon, a local scheduler, and PEs]
20
Adaptive Job Framework
  • Applications written in AMPI or Charm++
  • Scheduler controls the processor map for each job
  • Processor map is used by the job's load balancer

21
Charm++
  • Charm++: object-based virtualization
  • Program written as a large number of objects
    which can migrate
  • Number of objects typically much larger than
    the number of processors
  • Load balancer can remap objects
  • Measurement-based load balancing
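
To make the remapping idea concrete, here is a small Python sketch of how a load balancer could place many migratable objects onto only the processors a processor map allows. It is an illustration under stated assumptions, not the runtime's actual strategy: the function name `remap_objects`, the string form of the processor map, and the greedy "heaviest object to least-loaded processor" rule are all hypothetical choices.

```python
import heapq


def remap_objects(object_loads: dict[str, float], processor_map: str) -> dict[str, int]:
    """Greedily assign each object to the least-loaded allowed processor.

    object_loads: measured load per object (hypothetical units).
    processor_map: bit string; processor i is allowed if character i is '1'.
    """
    allowed = [i for i, bit in enumerate(processor_map) if bit == "1"]
    if not allowed:
        raise ValueError("processor map allows no processors")

    # Min-heap of (accumulated load, processor index).
    heap = [(0.0, pe) for pe in allowed]
    heapq.heapify(heap)

    placement: dict[str, int] = {}
    # Place the heaviest objects first so the totals even out.
    for name, load in sorted(object_loads.items(), key=lambda kv: -kv[1]):
        total, pe = heapq.heappop(heap)
        placement[name] = pe
        heapq.heappush(heap, (total + load, pe))
    return placement


if __name__ == "__main__":
    loads = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
    print(remap_objects(loads, "0110"))
```

Because there are many more objects than processors, the same routine covers both shrink and expand: only the set of allowed processors changes.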

22
Adaptive Charm++ Programs
  • A Charm++ program is automatically adaptive if an
    adaptive load-balancing strategy is used
  • Currently, CommLB and RandcentLB are adaptive
  • Compile with balancer CommLB

23
MPI Jobs
  • How do we make MPI jobs adaptive?
  • AMPI
  • AMPI maps the MPI processes to user level threads
    which can migrate
  • Each thread is embedded in a Charm++ object, thus
    allowing load balancing and shrink-expand

24
Writing Adaptive AMPI Programs
  • Build AMPI with an adaptive load-balancing
    strategy
  • Call MPI_MIGRATE() at regular intervals in each
    MPI process; otherwise the job will not respond
    to changes in the processor map
  • Use specific load balancers

25
Shrink Expand Overhead
Performance for MD program with 10MB migrated
data per processor on NCSA Platinum
26
Adaptive Queuing System
[Diagram: GUI client (or web browser), Faucets server, and clusters; each cluster has a cluster daemon, an adaptive queuing system, and PEs]
27
AQS Features
  • Multithreaded
  • Reliable and robust
  • Tested on Linux clusters at UIUC
  • Supports most features of standard queuing
    systems
  • Has the ability to manage adaptive jobs currently
    implemented in Charm++ and MPI
  • For more details check out
  • http://charm.cs.uiuc.edu/research/faucets/faucets.html

28
Components
  • Database
  • Job scheduler
  • Compute cluster

29
Installing Database
  • Download the latest version of MySQL
  • http://www.mysql.com/
  • Install, then
  • mysql> create database <dbname>;
  • mysql> use <dbname>;
  • mysql> create table jobInfo (id mediumint primary
    key NOT NULL DEFAULT '0' auto_increment, ..);
  • mysql> grant all on *.* to <user> identified by
    <passwd>;

30
Installing Scheduler
  • cd charm/net-linux/pgms/scheduler
  • make scheduler; make client
  • Edit Makefile, put correct path to MySQL
  • Running scheduler as root
  • su
  • chown root scheduler
  • chmod +s scheduler
  • ./startScheduler

31
Installing Scheduler, contd.
  • Edit the startScheduler file
  • Edit Database to match the <dbname> used earlier.
  • Edit PORT to point to port of the scheduler
  • Edit DATABASE_HOST DATABASE_USER and
    DATABASE_PASSWD to point to the database host,
    user and password
  • NODELIST points to the nodelist for the scheduler

32
Configuring The Cluster
  • User must have access to the cluster only through
    the queuing system
  • Each node runs an rsh daemon
  • Access to rsh through a restrictive group
  • Job switches to the rsh group before running the
    job
  • only head node can rsh to the other nodes
  • rsh disabled on the compute nodes
  • All connections through unix sockets

33
Using the AQS locally
  • frun runs a job interactively
  • fsub submits a batch job
  • fkill kills the job
  • fjobs lists the running and queued jobs

34
Scheduling Events
  • When
  • Job arrival
  • Job completion
  • Job requests change of number of processors
  • Job suspension
  • Scheduling Strategy
  • A pluggable component that makes decisions on
    which jobs to schedule
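
A pluggable strategy can be pictured as an object the scheduler calls on every scheduling event. The Python sketch below is an assumption-laden illustration, not the AQS code: the names `Event`, `Job`, `Strategy`, `MinimumFirstStrategy`, and `Scheduler` are hypothetical, and the toy strategy simply grants each job its minimum processor count in arrival order.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Protocol


class Event(Enum):
    JOB_ARRIVAL = auto()
    JOB_COMPLETION = auto()
    RESIZE_REQUEST = auto()   # job requests a change in processor count
    JOB_SUSPENSION = auto()


@dataclass
class Job:
    name: str
    min_pe: int
    max_pe: int


class Strategy(Protocol):
    def schedule(self, event: Event, jobs: list[Job], total_pes: int) -> dict[str, int]:
        """Return the number of processors granted to each scheduled job."""
        ...


class MinimumFirstStrategy:
    """Toy strategy: grant each job its minimum, in arrival order, while it fits."""

    def schedule(self, event: Event, jobs: list[Job], total_pes: int) -> dict[str, int]:
        free = total_pes
        allocation: dict[str, int] = {}
        for job in jobs:
            if job.min_pe <= free:
                allocation[job.name] = job.min_pe
                free -= job.min_pe
        return allocation


class Scheduler:
    """Calls the plugged-in strategy whenever a scheduling event occurs."""

    def __init__(self, strategy: Strategy, total_pes: int) -> None:
        self.strategy = strategy
        self.total_pes = total_pes

    def handle(self, event: Event, jobs: list[Job]) -> dict[str, int]:
        return self.strategy.schedule(event, jobs, self.total_pes)


if __name__ == "__main__":
    scheduler = Scheduler(MinimumFirstStrategy(), total_pes=16)
    jobs = [Job("a", 8, 16), Job("b", 10, 16), Job("c", 4, 8)]
    print(scheduler.handle(Event.JOB_ARRIVAL, jobs))
```

Swapping in a different strategy only requires another class with the same `schedule` method; the scheduler itself does not change.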

35
Scheduling Strategy Studied
  • Similar to equipartitioning (N. Islam et al.)
  • On job arrival and job completion
  • All running jobs and the new one are allocated
    their minimum number of processors
  • Leftover processors are shared equally subject to
    each job's maximum processor usage
  • If it is not possible to allocate the new job its
    minimum number of processors, it is queued
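
The allocation rule on this slide can be sketched in Python as follows. This is a minimal sketch under assumptions, not the scheduler studied in the talk: the function name `equipartition` and the tuple layout `(name, min_pe, max_pe)` are hypothetical, the job list is assumed to hold the running jobs followed by the new one, and leftover processors are handed out in equal shares, repeating until none remain or every job has reached its maximum.

```python
def equipartition(jobs: list[tuple[str, int, int]], total_pes: int):
    """Allocate processors to jobs given as (name, min_pe, max_pe).

    Returns (allocation, queued): processors per admitted job, and the
    names of jobs whose minimum could not be satisfied.
    """
    allocation: dict[str, int] = {}
    limits: dict[str, int] = {}
    queued: list[str] = []
    free = total_pes

    # Step 1: every job that fits receives its minimum; others are queued.
    for name, min_pe, max_pe in jobs:
        if min_pe <= free:
            allocation[name] = min_pe
            limits[name] = max_pe
            free -= min_pe
        else:
            queued.append(name)

    # Step 2: share the leftover equally, capped by each job's maximum.
    growable = [n for n in allocation if allocation[n] < limits[n]]
    while free > 0 and growable:
        share = max(free // len(growable), 1)
        for name in growable:
            if free == 0:
                break
            grant = min(share, limits[name] - allocation[name], free)
            allocation[name] += grant
            free -= grant
        growable = [n for n in allocation if allocation[n] < limits[n]]

    return allocation, queued


if __name__ == "__main__":
    jobs = [("a", 2, 16), ("b", 4, 6), ("c", 20, 32)]
    print(equipartition(jobs, total_pes=16))
```

In the example, job c cannot receive its minimum of 20 processors and is queued, job b is capped at its maximum of 6, and job a absorbs the rest of the 16 processors.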

36
Scheduler Performance
Simulation results on 64 processors with a mean job
execution time of 64.5 sec
λ = Arrival Rate, MRT = Mean Response Time,
Utilization = Processor utilization,
Load Factor (lf) = Execution Time × λ
37
Experimental Results
Experiments on a Linux cluster with 64 processors and
a mean job execution time of 60 sec
38
Adding a Cluster to Faucets
39
[Diagram: GUI client (or web browser), Faucets server, and clusters; each cluster has a cluster daemon, a local scheduler, and PEs]
40
Adding new cluster
  • Prerequisites
  • Install Charm++
  • Install Adaptive Queuing System
  • Then
  • Download the faucets software
  • http://charm.cs.uiuc.edu/
  • Compile the cluster daemon (CD)
  • cd faucets/cd; make
  • Run the cluster daemon (CD)
  • cd ..
  • java cd.ClusterDaemon <central server> <central
    server port> -p <ClusterDaemon port> <working dir>

41
Installing a Faucets Server
42
[Diagram: GUI client (or web browser), Faucets server, and clusters; each cluster has a cluster daemon, a local scheduler, and PEs]
43
Installing a Faucets Server
  • Install MySQL
  • create tables
  • grant permissions
  • Download JDBC driver
  • http://mmmysql.sourceforge.net/
  • Install CS
  • download faucets code and unpack
  • cd faucets/cs; make
  • Edit faucets/cs/db.properties
  • cd faucets
  • java -cp .:/path/to/mm.mysql-2.0.8-bin.jar
    TheServer

44
Installing Appspector
  • Installation is a little involved
  • Each application needs a display module written
    in Java
  • Contact us if you want to install

45
Summary and Future Work
  • Showed you how to use and install the
    Charm++/AMPI adaptive job system
  • Download at http://charm.cs.uiuc.edu/research/faucets
  • Future
  • Extend the system to other parallel machines
  • Eliminate residual processes
  • Integrate the scheduler with Globus
  • More comprehensive QoS contracts being developed
  • Sophisticated bidding schemes for the faucets
    framework