Title: Sky Computing on FutureGrid and Grid5000
Pierre Riteau1, Mauricio Tsugawa2, Andrea Matsunaga2, Jose Fortes2, Tim Freeman3, David LaBissoniere4, Kate Keahey3,4
1 Université de Rennes 1, IRISA/INRIA Rennes Bretagne Atlantique
2 University of Florida
3 Argonne National Laboratory
4 University of Chicago Computation Institute
Introduction
- Sky computing is an emerging computing model in which resources from multiple cloud providers are combined to create large-scale distributed infrastructures.
- This work uses resources from two experimental projects, FutureGrid and Grid5000. It showcases not only the capabilities of these experimental platforms but also their emerging collaboration.
- The two platforms are used to create a Sky Computing environment. To validate our approach in a real-world scenario, we run a MapReduce version of a popular bioinformatics application, BLAST. However, any kind of distributed application can run on these infrastructures.
Architecture
- Our Sky Computing deployment makes use of:
  - Xen, to minimize platform (hardware and operating system stack) differences
  - Nimbus, to offer VM provisioning and contextualization services (contextualization automatically assigns roles to VMs and configures them)
  - ViNe, a virtual network based on an IP overlay, to enable all-to-all communication between virtual machines spread across multiple clouds
  - Hadoop, for parallel, fault-tolerant execution and dynamic cluster extension
[Figure: Sky Computing architecture. A distributed application (e.g. MPI BLAST) runs over the ViNe virtual network, which spans Cloud A, Cloud B and Cloud C, each managed by Nimbus.]
VM Image Propagation Mechanisms
- To deploy virtual clusters, each VM requires an independent replica of a common VM image. Nimbus uses SCP to transfer a copy of the required image from a single repository to each VM host, a step called propagation. This scheme does not scale with the number of VMs, because it is limited by the disk or network bandwidth of the repository. To overcome this problem, we developed two new propagation mechanisms.
- The first leverages the TakTuk and Kastafior tools developed at INRIA to create a broadcast chain that transfers the image data from host to host. The second relies on the copy-on-write capabilities of the Xen hypervisor.
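Why single-repository SCP stops scaling, and why a broadcast chain helps, can be illustrated with a back-of-the-envelope model. This is a hypothetical sketch, not Nimbus or TakTuk/Kastafior code: it assumes a single bottleneck bandwidth per link, a pipelined chain with an invented chunk size (`chunk_mb`), and ignores disk I/O, decompression, latency and protocol overheads. The function names `scp_time` and `chain_time` are illustrative only.

```python
"""Back-of-the-envelope model of VM image propagation time (hypothetical sketch)."""


def scp_time(n_hosts: int, image_gb: float, repo_gbps: float) -> float:
    """Seconds to copy the image from a single repository to every host.

    Every copy leaves through the repository's link, so the repository
    must send n_hosts full copies of the image.
    """
    image_gbit = image_gb * 8  # GB -> Gbit (decimal units)
    return n_hosts * image_gbit / repo_gbps


def chain_time(n_hosts: int, image_gb: float, link_gbps: float,
               chunk_mb: float = 64.0) -> float:
    """Seconds for a pipelined broadcast chain: repository -> h1 -> h2 -> ...

    Each host forwards a chunk to the next host as soon as it has received
    it, so once the pipeline is full every link carries data concurrently.
    chunk_mb is an assumed, illustrative value.
    """
    image_gbit = image_gb * 8
    chunk_gbit = chunk_mb * 8 / 1000  # MB -> Gbit
    # Time for the whole image to cross one link, plus the time for the
    # last chunk to ripple down the remaining n_hosts - 1 hops.
    return image_gbit / link_gbps + (n_hosts - 1) * chunk_gbit / link_gbps


if __name__ == "__main__":
    # 2.2 GB compressed image (as in the experiments), hypothetical 1 Gbit/s links.
    for n in (10, 100, 400):
        print(f"{n:4d} hosts: SCP {scp_time(n, 2.2, 1.0):8.0f} s, "
              f"chain {chain_time(n, 2.2, 1.0):6.0f} s")
```

Under these assumptions, 100 hosts take about 1,760 s with SCP but about 68 s with the chain: SCP time grows with the total volume sent (hosts × image size), while the chain's cost is roughly one image transfer plus a small per-hop pipeline delay.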
Experimental Testbeds
- FutureGrid is an experimental testbed for grid and cloud research. It is distributed over 6 sites in the US and offers more than 5,000 cores.
- Grid5000 is an experimental testbed for research in large-scale parallel and distributed systems. It is distributed over 9 sites in France and offers more than 5,500 cores.
Scalability
- The above graph compares instantiation times of virtual clusters using different propagation mechanisms. In the SCP and TakTuk cases, the image is compressed and is 2.2 GB in size (12 GB uncompressed). In the QCOW case, the 12 GB image is pre-propagated to all hypervisors, and propagation consists of creating a new copy-on-write volume and contextualizing the virtual cluster.
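The QCOW case is fast because creating a copy-on-write volume copies no image data: each VM gets a private overlay that records only the blocks it writes, while reads of untouched blocks fall through to the shared, pre-propagated base image. The toy class below illustrates that principle under stated assumptions; it is not the QCOW on-disk format or the Xen implementation, and `CowVolume` and its methods are hypothetical names.

```python
"""Toy copy-on-write volume layered over a shared base image (illustrative sketch)."""


class CowVolume:
    def __init__(self, base: list[bytes]) -> None:
        self._base = base                      # shared, read-only base image
        self._overlay: dict[int, bytes] = {}   # blocks written by this VM only

    def read(self, block: int) -> bytes:
        # Serve the VM's private copy if it exists, else the shared base block.
        if block in self._overlay:
            return self._overlay[block]
        return self._base[block]

    def write(self, block: int, data: bytes) -> None:
        # Writes never touch the base image; they land in the private overlay.
        if not 0 <= block < len(self._base):
            raise IndexError(f"block {block} out of range")
        self._overlay[block] = data

    def private_blocks(self) -> int:
        """Number of blocks in which this VM has diverged from the base image."""
        return len(self._overlay)


if __name__ == "__main__":
    base_image = [b"kernel", b"libs", b"config"]   # pre-propagated once per host
    vm1, vm2 = CowVolume(base_image), CowVolume(base_image)  # creation copies nothing
    vm1.write(2, b"config-vm1")                     # e.g. contextualization data
    print(vm1.read(2), vm2.read(2), vm1.private_blocks())
```

Running it prints `b'config-vm1' b'config' 1`: the second VM and the base image are unaffected by the first VM's write, which is what lets many VMs share one 12 GB image per hypervisor.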
- We deployed a Sky Computing infrastructure consisting of 1,114 CPU cores (457 VMs) distributed over 3 sites in FutureGrid and 3 sites in Grid5000 (OGF-29 demo, Chicago, IL, June 2010).
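[Figure: Map of the OGF-29 deployment, with VMs at San Diego, the University of Florida and the University of Chicago (FutureGrid) and at Rennes, Lille and Sophia (Grid5000). Labels include ViNe routers, a queue ViNe router and the Grid5000 firewall.]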
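ViNe's role in this deployment, giving VMs at all six sites all-to-all connectivity through an IP overlay, can be pictured as a routing table that maps each site's VM subnet to an overlay router. The sketch below is a generic illustration of overlay routing with longest-prefix matching, not ViNe's implementation: the subnets and router names (`router-san-diego` and so on) are hypothetical, and it does not model how ViNe traverses firewalls such as the Grid5000 one.

```python
"""Toy overlay routing table for a multi-site virtual cluster (hypothetical sketch)."""
import ipaddress

# Invented subnets: each site's VMs are reachable through that site's overlay router.
ROUTES = {
    ipaddress.ip_network("10.1.0.0/16"): "router-san-diego",
    ipaddress.ip_network("10.2.0.0/16"): "router-uf",
    ipaddress.ip_network("10.3.0.0/16"): "router-uchicago",
    ipaddress.ip_network("10.4.0.0/16"): "router-rennes",
    ipaddress.ip_network("10.5.0.0/16"): "router-lille",
    ipaddress.ip_network("10.6.0.0/16"): "router-sophia",
}


def next_hop(dst: str) -> str:
    """Return the overlay router that should receive traffic for dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no overlay route to {dst}")
    # Longest-prefix match, so a more specific subnet would win over a broader one.
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    # A VM in Rennes sending to a VM in San Diego hands the packet to its local
    # overlay router, which forwards it to router-san-diego for delivery.
    print(next_hop("10.1.0.42"))
```

In this model, a VM only needs its local router as a gateway; the routers hold the site-to-subnet mapping, which is what lets VMs in different clouds address each other directly.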
Conclusion
- The Sky Computing model allows the creation of large-scale infrastructures using resources from multiple cloud providers. These infrastructures can run embarrassingly parallel computations with high performance. Our work shows how multiple infrastructures can be federated and how virtual cluster creation can be sped up, using experimental testbeds in the US and in France as an example.
Sponsors and Acknowledgments
This work is supported in part by the National Science Foundation under Grants No. OCI-0910812, IIP-0758596 and CNS-0821622 and in part by the MCS Division subprogram of the Office of Advanced Scientific Computing Research, SciDAC Program, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357. The authors also acknowledge the support of the BellSouth Foundation. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the BellSouth Foundation. Experiments were carried out using the Grid'5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several Universities as well as other funding bodies (see https://www.grid5000.fr).