VJM - Virtual Job Model: A Deadlock-Free Resource Co-allocation Model for Cross-Domain Parallel Jobs
Zhaohui Ding, Da Ma, Xiaohui Wei, College of Computer Science and Technology, Jilin University, China, 130012
Wilfred W. Li, San Diego Supercomputer Center, University of California, San Diego, CA, USA, 92093
Abstract
Although more and more scientists are taking advantage of grid technologies to facilitate their research, running parallel jobs across domains in a grid environment is still a challenge. MPICH-G2 can run MPI parallel applications on cross-domain resources, but competition for resources may cause deadlock and other serious problems. In this poster, we introduce a virtual job model (VJM) that achieves synchronized resource co-allocation for cross-domain parallel applications. VJM prevents the resource allocation deadlock caused by multiple parallel jobs competing for resources. Because VJM does not rely on advance reservation, it can work with almost all kinds of local schedulers. We have implemented a prototype of VJM in the meta-scheduler CSF4, and the experimental results are also given in the poster.
When the resources required by all pending jobs submitted at the same time exceed the total available resources, resource deadlock may occur. Consider a grid environment composed of 2 clusters, A and B, each with 3 compute nodes, on which two accounts, user1 and user2, are set up. On cluster A, user1's priority is higher than user2's, while on cluster B, user2's priority is higher than user1's. user1 and user2 then each submit, at the same time, a parallel job that requires 6 machines (we denote the two jobs α and β). With MPICH-G2, neither of the two jobs can start. Moreover, the resource deadlock leaves both clusters unavailable; see Figure 5-(1). With VJM, on the other hand (see Figure 5-(2)), although the virtual jobs of β have obtained the resources of C2, they are terminated and re-assigned by VJmgr because β has not obtained the resources of C1. Finally, the virtual jobs of α obtain all 6 resources, and the real sub-jobs of α start successfully in synchronization.
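The two-cluster scenario above can be reproduced with a small simulation. This is only a minimal sketch under stated assumptions: the function names (`local_allocation`, `is_deadlocked`, `release_and_retry`) and the data layout are hypothetical, and the rule for choosing which job keeps its resources is passed in explicitly because the poster does not describe how VJmgr makes that choice.

```python
def local_allocation(clusters, jobs):
    """Each cluster serves its users in local priority order, with no global view."""
    held = {name: {} for name in jobs}
    for cname, (free, priority) in clusters.items():
        for user in priority:
            for jname, (owner, demand) in jobs.items():
                need = demand.get(cname, 0)
                if owner == user and 0 < need <= free:
                    held[jname][cname] = need
                    free -= need
    return held


def is_deadlocked(jobs, held):
    """True when every job holds some nodes but none holds its full demand."""
    return all(held[j] and held[j] != jobs[j][1] for j in jobs)


def release_and_retry(clusters, jobs, held, winner):
    """Release every other job's partial allocation, then complete the winner.

    Assumption: the caller picks the winner; the source does not state the rule.
    """
    free = {c: total - held[winner].get(c, 0) for c, (total, _) in clusters.items()}
    result = dict(held[winner])
    for cname, need in jobs[winner][1].items():
        if cname not in result and need <= free[cname]:
            result[cname] = need
    return result


# Cluster name -> (number of nodes, users from highest to lowest priority)
clusters = {"A": (3, ["user1", "user2"]), "B": (3, ["user2", "user1"])}
# Job name -> (owner, nodes needed per cluster); both jobs need all 6 nodes
jobs = {"alpha": ("user1", {"A": 3, "B": 3}), "beta": ("user2", {"A": 3, "B": 3})}

held = local_allocation(clusters, jobs)
print(held)
print(is_deadlocked(jobs, held))
print(release_and_retry(clusters, jobs, held, "alpha"))
```

Each job ends up holding one cluster while waiting for the other, which is the hold-and-wait condition described in the text. Releasing the partial allocation of one job is what lets the other acquire all 6 nodes.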
The Order-Based protocol [5] takes the IP address as the unique local identifier. However, a large number of IP addresses may result in inefficiency; moreover, when the IP addresses of a job's resource set are determined by meta-scheduling, the local scheduling policies are not considered. VJM can improve on this deadlock prevention. With VJM, we divide deadlock prevention into two hierarchical phases: a cluster selection phase and a host selection phase. First, the IP address of each cluster's master host and the number of available resources in each cluster are generally knowable. Second, as presented in the previous section, a virtual job can obtain its host's information (IP address) and report it back to the meta-scheduler after startup. See figure below.
Design of Virtual Job Model (VJM)
Synchronized resource allocation is an important issue in supporting cross-domain parallel jobs in different grids. MPICH-G2 [3] is a grid-enabled MPI implementation that enables a user to run MPI programs across multiple sites. However, because competition for resources among multiple parallel jobs may result in deadlock, MPICH-G2 cannot guarantee successful resource co-allocation. Advance reservation [4] may be used at different sites but is not scalable.
VJM is a new meta-scheduling model designed to resolve existing problems in synchronized resource co-allocation. Before starting the actual job, VJM dispatches virtual jobs to the candidate clusters to acquire the resources for the real parallel job. By using virtual jobs, resource selection optimization and deadlock prevention are achieved. In consideration of the heterogeneity and autonomy of the grid environment, the key design principles of VJM are to require only the minimal common features supported by all local resource managers (no resource reservation) and to require no extra components or changes at any local site.
VJM Architecture
VJM consists of three stages: the resource availability check stage, the resource co-allocation stage, and the job startup stage; see Figure 2.
First, the meta-scheduler performs a resource availability check (the pre-check mechanism in CSF4) before sending resource allocation requests to local schedulers. After the pre-check, the meta-scheduler makes a temporary decision about which clusters will execute the parallel job and how to distribute the sub-jobs among them, according to its meta-scheduling policies. Since the meta-scheduler is not the owner of local cluster resources, it cannot allocate resources directly. Hence, a virtual job mechanism is introduced to co-allocate the resources for a parallel job.
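The availability check and the temporary distribution decision can be illustrated as follows. This is a sketch, not the CSF4 pre-check itself: the function `plan_subjobs` is hypothetical, and the "fill the cluster with the most free nodes first" rule is an assumed stand-in for whatever meta-scheduling policy is actually configured.

```python
def plan_subjobs(required, available):
    """Return a tentative {cluster: node_count} plan, or None if the pre-check fails.

    required  -- total number of nodes the parallel job needs
    available -- {cluster: free node count} as reported to the meta-scheduler
    Assumed policy: prefer clusters with more free nodes; ties broken by name.
    """
    # Pre-check: reject early when the grid as a whole cannot satisfy the job.
    if sum(available.values()) < required:
        return None

    plan = {}
    remaining = required
    for cluster, free in sorted(available.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            plan[cluster] = take
            remaining -= take
    return plan


print(plan_subjobs(6, {"A": 3, "B": 3}))
print(plan_subjobs(4, {"A": 3, "B": 2, "C": 5}))
print(plan_subjobs(7, {"A": 3, "B": 3}))
```

The plan is only tentative: the meta-scheduler does not own the nodes, so the counts still have to be secured through the local schedulers.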
At the resource co-allocation stage, virtual jobs are dispatched to the clusters instead of the real jobs. A virtual job's responsibility is to obtain guaranteed resources for the real job and to report the resource allocation status in each cluster to the meta-scheduler. A virtual job manager (VJmgr) is created for all virtual jobs to collect the resource allocation information and manage a virtual resource pool.
A virtual job is scheduled by the local scheduler like a normal serial job. When a virtual job gets its resource and starts up, it sends a READY notification to VJmgr, registers the resource status in the virtual resource pool, and waits for instructions from VJmgr.
During the startup stage, after VJmgr has received sufficient READY notifications from the virtual jobs, a STARTUP instruction is sent to every virtual job registered in the resource pool to start up the corresponding real job.
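The READY/STARTUP exchange described above can be sketched as a small in-memory state machine. The class `VirtualJobManager`, its method `on_ready`, and the tuple format of the instructions are assumptions made for illustration; the poster does not specify the message format or the transport used by the CSF4 prototype.

```python
class VirtualJobManager:
    """Illustrative VJmgr: collects READY notifications, then issues STARTUP."""

    def __init__(self, required):
        self.required = required   # number of resources the real job needs
        self.pool = {}             # virtual resource pool: virtual job id -> host
        self.started = False

    def on_ready(self, vjob_id, host):
        """Register a READY virtual job; return STARTUP instructions once enough arrived."""
        if self.started:
            return []              # the real job was already started
        self.pool[vjob_id] = host
        if len(self.pool) < self.required:
            return []              # keep waiting; the virtual job holds its node
        self.started = True
        # One STARTUP instruction per registered virtual job, in a stable order.
        return [("STARTUP", vid, h) for vid, h in sorted(self.pool.items())]


mgr = VirtualJobManager(required=3)
print(mgr.on_ready("v1", "a-node1"))
print(mgr.on_ready("v2", "b-node1"))
print(mgr.on_ready("v3", "b-node2"))
```

No real sub-job starts until the pool is complete, so all sub-jobs receive STARTUP in the same step.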
Deadlock Prevention
When jobs are forwarded to local resource sites by the meta-scheduler, they do not start up immediately but wait to be scheduled in the local queues. Park [5] proposed the Order-Based deadlock prevention protocol.
Introduction
Emerging grid computing technologies are enabling the creation of virtual organizations and enterprises that share distributed resources to solve large-scale problems in many research fields. More and more scientists in various fields are taking advantage of grid technologies to facilitate their research. However, running parallel applications that require synchronized resource allocation in grids is still a challenge.
Many grid applications have resource co-allocation requirements, which can be satisfied only by simultaneously acquiring multiple resources from distributed locations [1]. Such applications need to co-allocate distributed grid resources that span multiple administrative domains, in each of which specific local policies are enforced. In such an environment, synchronized resource allocation is much more complicated than in a single domain.
First, the availability and capability of grid resources change dynamically. Second, in a grid system, the jobs forwarded to local resource sites by the meta-scheduler do not start up immediately but wait to be scheduled in the local queues. Moreover, the waiting time is unpredictable, as it depends on resource availability, the local scheduling policies, etc. Third, the requirement that parallel applications simultaneously acquire multiple resources creates the potential for resource allocation deadlock. For an example, see the figure below.
In this poster, we introduce a synchronized resource co-allocation model for cross-domain parallel jobs, called the Virtual Job Model (VJM for short). VJM is designed to fulfill resource co-allocation for meta-scheduling; its key idea is to submit virtual jobs to the candidate clusters to acquire the resources for real parallel jobs. After the resources required by the virtual jobs are allocated, the meta-scheduler schedules the real jobs onto these reserved grid resources.
Compared with previous work, VJM has two advantages. First, VJM can guarantee resource availability without requiring local schedulers to support advance resource reservation, so it can work with most local schedulers. Second, VJM respects both the global scheduling policies and the local scheduling policies.
In addition to illustrating the concepts and principles of VJM, we also present a prototype of VJM in the meta-scheduler CSF4 [2].
Experiments
First, to verify VJM's dependability and performance for cross-domain parallel jobs, we deployed a Gfarm data grid environment consisting of 3 clusters; see Table 1 below. The test application is mpiblast-g2, the MPI version of the bioinformatics application BLAST [6], compiled with MPICH-G2. Three test cases simulate the grid status under idle (all nodes are idle), moderate (fewer resources requested than the number of idle nodes), and busy (more resources requested than the number of idle nodes) conditions. The jobs are submitted using either MPICH-G2 or CSF4, with an increasing number of CPUs. Holder jobs that run a simple sleep 30 are submitted periodically to simulate locally submitted jobs, resulting in a queue length of 2 on one or two of the clusters. For the three cases, the resource co-allocation times are shown in the figure.
Conclusion and Future Work
A virtual job model is proposed for efficient and stable execution of cross-domain parallel jobs. The model guarantees synchronous startup of parallel jobs and is free from resource allocation deadlock. Moreover, it is highly compatible, as it does not require local schedulers to support advance resource reservation.
Future research will focus on further optimization of VJM. First, NAT techniques can be used in VJM to support clusters with private IP addresses. Second, we are interested in candidate resource selection and job/resource matching algorithms based on VJM. Finally, we also hope to provide an independent VJM implementation as a library.
Funding Source
The work is supported by Jilin University under Grant No. 419070200053 and Grant No. 420010302338, CNSF under Grant No. 60473099, and NSF of Jilin Province under Grant No. 20060532 and No. 20040119. W.W. Li wishes to acknowledge PRAGMA as supported by NSF Grant No. INT-0216895 and INT-0314015.
References
[1] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, and S. Tuecke. A resource management architecture for metacomputing systems. In Proceedings of the IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing, pages 62-82, 1998.
[2] X. Wei, Z. Ding, S. Yuan, C. Hou, and H. Li. CSF4: A WSRF Compliant Meta-Scheduler. Presented at the International Conference '06 on Grid Computing and Applications, Las Vegas, USA, 2006.
[3] N. T. Karonis, B. Toonen, and I. Foster. MPICH-G2: A Grid-enabled implementation of the Message Passing Interface. Journal of Parallel and Distributed Computing.
[4] K. Czajkowski, I. Foster, and C. Kesselman. Resource Co-Allocation in Computational Grids. Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing (HPDC-8), pp. 219-228, 1999.
[5] J. Park. A Scalable Protocol for Deadlock and Livelock Free Co-Allocation of Resources in Internet Computing. 2003 Symposium on Applications and the Internet (SAINT'03), January 2003.
[6] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, vol. 25, pp. 3389-402, 1997.