Title: Analysis of Cooperation in Multi-Organization Scheduling
1. Analysis of Cooperation in Multi-Organization Scheduling
- Pierre-François Dutot (Grenoble University)
- Krzysztof Rzadca (Polish-Japanese Institute of Information Technology, Warsaw)
- Fanny Pascual (LIP6, Paris)
- Denis Trystram (Grenoble University and INRIA)
- NCST, CIRM, May 13, 2008
2. Goal
The evolution of high-performance execution
platforms leads to distributed entities
(organizations) which have their own local
rules. Our goal is to investigate the
possibility of cooperation for a better use of a
global computing system made from a collection of
autonomous clusters. Work partially supported by the CoreGRID Network of Excellence of the EC.
3. Main result
We show in this work that it is always possible
to produce an efficient collaborative solution
(i.e. with a theoretical performance guarantee)
that respects the organizations selfish
objectives. We present a new algorithm with a theoretical worst-case analysis, plus experiments for a case study (each cluster has the same objective: the makespan).
4. Target platforms
Our view of computational grids is a collection
of independent clusters belonging to distinct
organizations.
5. Computational model (one cluster)
Independent applications are submitted locally on a cluster. They are represented by a precedence task graph. An application is a parallel rigid job. Let us briefly recall the model (close to what Andrei Tchernykh presented this morning); see Feitelson for more details and a classification.
6. Local queue of submitted jobs (figure: jobs J1, J2, J3 waiting in the local queue)
7. Job
13. Rigid jobs: the number of processors is fixed. (figure: a job's computational area plus overhead)
14. A job i is defined by its runtime p_i and its number of required processors q_i.
15. Jobs are split by their number of required processors q_i into high jobs (those which require more than m/2 processors) and low jobs (the others).
16. Scheduling rigid jobs: packing algorithms
Scheduling independent rigid jobs may be solved as a 2D packing problem (strip packing, with strip width m). List algorithm (off-line).
17. n organizations (figure: organization k with its local queue J1, J2, J3 and its m_k processors; for the sake of presentation, all clusters are identical with m processors)
18-20. n organizations (here, n = 3: red = 1, green = 2, blue = 3)
Cmax(O_k): makespan of organization k; Cmax(M_k): makespan of cluster k. Each organization k aims at minimizing its own makespan max(C_{i,k}). We also want the global makespan to be minimized.
21. Same example (n = 3), with Cmax(O2), Cmax(O3) and Cmax(M1) marked on the figure. Each organization k aims at minimizing its own makespan max(C_{i,k}). We also want the global makespan (the maximum over all local ones) to be minimized.
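To make the two objectives concrete, here is a minimal sketch (the function name and input shape are my own, not from the talk) computing each Cmax(O_k) and the global makespan from per-organization completion times:

```python
def local_and_global_makespans(completions):
    """completions: {organization k: list of completion times C_{i,k} of its jobs}.
    Cmax(O_k) is the maximum over k's own jobs; the global makespan is the
    maximum over all local makespans."""
    cmax = {k: max(times) for k, times in completions.items()}
    return cmax, max(cmax.values())

# Example with n = 3 organizations:
print(local_and_global_makespans({1: [2, 3], 2: [5], 3: [4]}))
# -> ({1: 3, 2: 5, 3: 4}, 5)
```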
22. Problem statement
MOSP: minimization of the global makespan under the constraint that no local makespan is increased. Consequence: taking the restricted instance n = 1 (one organization) and m = 2 with sequential jobs, the problem is the classical two-machine scheduling problem, which is NP-hard. Thus, MOSP is NP-hard.
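The restricted instance (n = 1, m = 2, sequential jobs) is exactly the two-machine makespan problem, equivalent to PARTITION. A brute-force illustration (exponential, for intuition only; the function name is mine):

```python
from itertools import product

def two_machine_makespan(jobs):
    """Exact minimum makespan for sequential jobs on 2 identical machines,
    by trying all 2^n machine assignments (illustration only)."""
    best = sum(jobs)
    for assignment in product((0, 1), repeat=len(jobs)):
        loads = [0, 0]
        for p, machine in zip(jobs, assignment):
            loads[machine] += p
        best = min(best, max(loads))
    return best

print(two_machine_makespan([3, 1, 1, 2, 2, 1]))  # -> 5 (a perfect split of total 10)
```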
23. Multi-organizations
Motivation: a non-cooperative solution is that all the organizations compute only their local jobs (the "my jobs first" policy). However, such a solution is arbitrarily far from the global optimum (the ratio grows to infinity with the number of organizations n). See the example with n = 3 and jobs of unit length (figure: schedules of O1, O2, O3 without cooperation vs. with cooperation, the latter optimal).
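The exact instance behind the figure is not in the transcript; one standard instance with this behaviour (my assumption, not the talk's) is: organization 1 owns n*m unit jobs and the other n-1 organizations own none. Then:

```python
def makespans(n, m):
    """n organizations with m processors each; organization 1 owns n*m unit
    jobs, the others own none (assumed instance, not from the talk).
    Returns (no-cooperation makespan, cooperative/optimal makespan)."""
    jobs = n * m
    no_coop = jobs / m        # O1 runs everything on its own cluster
    coop = jobs / (n * m)     # jobs spread evenly over all n clusters
    return no_coop, coop

print(makespans(3, 2))  # -> (3.0, 1.0): the ratio equals n, unbounded as n grows
```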
24. Preliminary results
Single-organization scheduling (packing). Resource-constrained list algorithm (Garey-Graham 1975).
- List scheduling: (2 - 1/m) approximation ratio.
- HF, Highest First schedules (sort the jobs by decreasing number of required processors): same theoretical guarantee, but better from the practical point of view.
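A runnable sketch of Highest First list scheduling for rigid jobs on one cluster (my own event-driven implementation of the rule stated above, not the authors' code):

```python
def hf_schedule(jobs, m):
    """List-schedule rigid jobs on m processors in Highest First order.
    jobs: list of (p_i, q_i) = (runtime, required processors).
    Returns ({job index: start time}, makespan)."""
    pending = sorted(range(len(jobs)), key=lambda i: -jobs[i][1])  # decreasing q_i
    start, running, free, t = {}, [], m, 0
    while pending or running:
        # start every pending job (scanned in HF order) that fits right now
        for i in list(pending):
            p, q = jobs[i]
            if q <= free:
                start[i] = t
                running.append((t + p, q))
                free -= q
                pending.remove(i)
        # advance time to the next completion and release its processors
        if running:
            running.sort()
            t, q = running.pop(0)
            free += q
            while running and running[0][0] == t:
                free += running.pop(0)[1]
    makespan = max((start[i] + jobs[i][0] for i in start), default=0)
    return start, makespan

# 4 processors; job 0 needs 3 of them, jobs 1 and 2 need 2 each
print(hf_schedule([(2, 3), (1, 2), (2, 2)], 4)[1])  # -> 4
```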
25. Analysis of HF (single cluster)
Proposition: all HF schedules have the same structure, consisting of two consecutive zones: one of high utilization (I), where more than 50% of the processors are busy, followed by one of low utilization (II). Proof (2 steps, by contradiction): no high job appears after zone (II) starts.
26. Collaboration between clusters can substantially improve the results.
Other algorithms, more sophisticated than simple load balancing, are possible: matching certain types of jobs may lead to bilaterally profitable solutions. (figure: schedules of O1 and O2 without and with cooperation)
27-29. If we cannot worsen any local makespan, the global optimum cannot be reached.
(figure: an instance with jobs of lengths 1 and 2 on organizations O1 and O2, shown as local schedules, as the globally optimal schedule, and as the best solution that does not increase Cmax(O1))
- Lower bound on the approximation ratio: greater than 3/2.
30. Multi-Organization Load-Balancing
1. Each cluster runs its local jobs with Highest First; let LB = max(pmax, W/(n*m)).
2. Unschedule all jobs that finish after 3*LB.
3. Divide them into two sets (Ljobs and Hjobs).
4. Sort each set according to the Highest First order.
5. Schedule the jobs of Hjobs backwards from 3*LB on all possible clusters.
6. Then fill the gaps with Ljobs in a greedy manner.
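A minimal sketch of step 5 only (my own simplification, not the paper's implementation: each cluster's remaining local schedule is summarized by its end time, and since every Hjob needs more than m/2 processors, no two of them can overlap, so scheduling them backwards from 3*LB just stacks them end to start):

```python
def place_hjobs_backwards(hjobs, local_ends, horizon):
    """hjobs: list of (p, q) high jobs; local_ends[k]: time at which cluster
    k's remaining local schedule finishes; horizon: 3*LB.
    Each job occupies [front_k - p, front_k], placed greedily on the cluster
    with the most back-fill room.  Returns {cluster: [(start, p, q), ...]}."""
    fronts = [horizon] * len(local_ends)
    placed = {k: [] for k in range(len(local_ends))}
    for p, q in sorted(hjobs, key=lambda j: -j[1]):  # Highest First order
        k = max(range(len(fronts)), key=lambda i: fronts[i] - local_ends[i])
        if fronts[k] - p < local_ends[k]:
            raise ValueError("job does not fit below the 3*LB horizon")
        fronts[k] -= p
        placed[k].append((fronts[k], p, q))
    return placed

# two clusters whose local schedules end at t=2 and t=0, horizon 3*LB = 6
print(place_hjobs_backwards([(3, 3), (2, 4)], [2, 0], 6))
# -> {0: [(3, 3, 3)], 1: [(4, 2, 4)]}
```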
31-36. (animation: consider a cluster whose last job finishes before 3*LB; the Hjobs are scheduled backwards from 3*LB, then the Ljobs fill the remaining gaps)
37. Notice that the centralized scheduling mechanism is not work stealing; moreover, the idea is to change the local schedules as little as possible.
38. Feasibility (insight) (figure: zones (I) and (II) of the local schedules relative to the 3*LB horizon)
39. Sketch of analysis
Proof by contradiction: let us assume the schedule is not feasible, and call x the first job that does not fit in any cluster.
Case 1: x is a low job. Global surface argument.
Case 2: x is a high job. Much more complicated; see the paper for technical details.
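A plausible reconstruction of the surface argument for Case 1 (my reconstruction from the definitions of LB and low jobs above, not a quote from the paper):

```latex
% x is a low job (q_x \le m/2) that cannot start before 3LB.  Then at every
% time in [0, 3LB), every cluster has fewer than q_x \le m/2 free processors,
% i.e. more than m/2 busy processors.  Hence the total work satisfies
W \;>\; n \cdot \frac{m}{2} \cdot 3\,\mathrm{LB}
  \;=\; \frac{3}{2}\, n\, m\, \mathrm{LB}
  \;\ge\; \frac{3}{2}\, n\, m \cdot \frac{W}{n m}
  \;=\; \frac{3}{2}\, W,
% a contradiction, since LB \ge W/(nm).  So every low job fits before 3LB.
```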
40-42. 3-approximation (by construction); this bound is tight.
(figures: the local HF schedules, the optimal global schedule, and the multi-organization load-balancing schedule on a tight instance)
43. Improvement
We add an extra load-balancing procedure. (figure: five organizations O1-O5 shown at four stages: local schedules, multi-org LB, compact, load balance)
44. Some experiments
46. Conclusion
- We proved in this work that cooperation may help to achieve a better global performance (for the makespan).
- We designed a 3-approximation algorithm.
- It can be extended to organizations of any size (with an extra hypothesis).
- Based on this, many interesting open problems remain, including the study of the problem for different local policies or objectives (Daniel Cordeiro).
47. Thanks for your attention. Do you have any questions?
48. Using game theory?
We propose here a standard approach using combinatorial optimization. Cooperative game theory may also be useful, but it assumes that players (organizations) can communicate and form coalitions, whose members split the sum of their payoffs at the end of the game. We assume here a centralized mechanism and no communication between the organizations.