First Steps in the Clouds - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: First Steps in the Clouds


1
First Steps in the Clouds
  • Kate Keahey
  • keahey_at_mcs.anl.gov
  • University of Chicago
  • Argonne National Laboratory

2
Why Clouds?
  • Resource consumers
  • Individual users or Virtual Organizations
  • Requirements
  • Customized environments for their
    services/applications
  • Services/applications can be short-lived
  • New environments/services deployed quickly and
    often
  • Resource providers
  • Own and operate physical resources
  • Requirements
  • Ability to monitor and control their resources
  • Provide resources at reasonable operational cost
  • Protection from activities performed by resource
    consumer
  • Consumers need to be able to lease (potentially
    short-term) platforms that they can customize
    and control

3
Cloud Computing for Grid Communities: The STAR
Application Use Case
4
The STAR Application
  • Complex experimental application codes
  • Developed over more than 10 years, by more than
    100 scientists, comprises 2 M lines of C and
    Fortran code
  • www.star.bnl.gov
  • Require complex, customized environments
  • Rely heavily on the right combination of compiler
    versions and available libraries
  • Dynamically load external libraries depending on
    the task to be performed
  • Environment validation
  • To ensure reproducibility and result uniformity
    across environments
  • Why do we need a cloud?
  • Resources with the right configuration are hard
    to find
  • A VM-based cloud gives us the required control

5
Running STAR in a Cloud
  • First challenge: finding VM-enabled resources
  • Amazon Elastic Compute Cloud (EC2)
  • More challenges
  • Can we use X.509 certs to submit to a cloud? Can
    we use Grid access protocols? How much manual
    configuration do we need to do for a cluster that
    we need for 4 hours? How do we integrate the
    cluster into the Grid infrastructure?
  • Workspace Service
  • X.509 certificates are mapped to a project
    account
  • Grid access protocols
  • Creating a virtual cluster dynamically
  • Contextualization (cluster context): the cluster
    node VMs find out about each other and integrate
    that information at boot time
  • Integrating the cluster into the Grid
  • Contextualization (grid context): the cluster is
    configured with appropriate host certs,
    gridmapfiles, etc.
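
The two contextualization steps above can be sketched roughly as follows. This is only an illustration, not the Workspace Service implementation: the `Node` record, both helper functions, the host names, addresses, certificate subject, and the `star` account are all hypothetical. It assumes the cluster context amounts to every node receiving a hosts-style line for every other node, and the grid context to grid-mapfile lines that map an X.509 subject to a local project account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One VM in the virtual cluster (hypothetical record for this sketch)."""
    hostname: str
    ip: str


def hosts_entries(nodes):
    """Cluster context: one hosts-file line per node, sorted by hostname."""
    ordered = sorted(nodes, key=lambda n: n.hostname)
    return [f"{n.ip}\t{n.hostname}" for n in ordered]


def gridmap_entries(dn_to_account):
    """Grid context: map quoted X.509 subject DNs to local accounts."""
    return [f'"{dn}" {account}' for dn, account in sorted(dn_to_account.items())]


# Made-up cluster and certificate subject, for illustration only.
nodes = [Node("worker1", "10.0.0.11"), Node("head", "10.0.0.10")]
mapping = {"/DC=org/DC=example/CN=Jane Doe": "star"}

hosts = hosts_entries(nodes)
gridmap = gridmap_entries(mapping)
print("\n".join(hosts))
print("\n".join(gridmap))
```

In a real deployment each VM would write this information into its own configuration at boot; the sketch only shows what has to be exchanged.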

6
with thanks to Jerome Lauret and Doug Olson of
the STAR project, presented at CHEP07
[Figure: running-job counts over time per site
(VWS/EC2, BNL, WSU, Fermi, PDSF); chart labels:
Job Completion, File Recovery]
7
with thanks to Jerome Lauret and Doug Olson of
the STAR project, presented at CHEP07
[Figure: accelerated display of workflow job state
at NERSC PDSF, EC2 (via Workspace Service), and WSU;
Y axis: job number, X axis: job state]
8
What Did We Learn?
  • Performance was not an issue
  • The real comparison is having a resource to run
    on vs not having a resource to run on
  • Contextualization is key for dynamic virtual
    cluster deployment
  • Next steps: a more challenging application

9
Cloud Computing for Grid Providers: Building the
Science Cloud at the University of Chicago
10
Challenges
  • Virtualization adoption has been relatively slow
    among Grid Providers
  • Challenge: integrating VMs into current
    provisioning models
  • Integrate into a site without disrupting the
    current operation of resources
  • I.e., be able to run jobs as well as VMs
  • Non-invasive from the perspective of currently
    used tools
  • E.g., no modification to the currently used
    schedulers and resource managers
  • Can be used alongside the current mode of
    operation
  • Batch jobs
  • Represent as small a change as possible
  • Operate within familiar metaphors
  • Avoid error-generating complexity

11
Roll Your Own Cloud
  • The Workspace Pilot
  • Operates on resources that can support jobs as
    well as VMs
  • E.g., have been booted into Xen domain 0
  • Non-invasive extension to batch schedulers (e.g.,
    PBS)
  • Wrappers for submission operation, scheduler
    signals to operate on VMs
  • Glidein approach: submits a pilot program that
    prepares a resource slot for VM deployment
  • E.g., adjusts Xen domain 0 memory
  • Comes with administrator tools
  • E.g., kill-all

12
Workspace Pilot in Action
Level 1: provision raw resources
Level 2: provision VMs
VMs are decommissioned
Raw resources are decommissioned
[Diagram components: Workspace Service, LRM/PBS,
three Xen dom0 nodes]
13
The Pilot Program
  • Uses Xen balloon driver to reduce/restore domain0
    memory so that guest domains (VMs) can be
    deployed
  • Secure VM deployment
  • The pilot requires sudo privilege and thus can be
    used only with the site administrator's approval
  • The workspace service provides fine-grained
    authorization for all requests
  • Signal handling
  • SIGTERM: pilot exceeded its allotted time
  • Notifies VWS, allows it to clean up
  • After a configurable time period, takes things
    into its own hands
  • Default policy: one VM per physical node
  • Available for download
  • Workspace Release 1.3.1
  • http://workspace.globus.org/downloads/index.html
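
The signal-handling policy above (notify the service, wait a configurable grace period, then clean up locally) could be sketched like this. The `Pilot` class, its callbacks, and the outcome strings are hypothetical stand-ins and not the actual pilot code. The example calls the handler directly rather than sending a real signal, so it can run anywhere.

```python
import signal
import threading


class Pilot:
    """Hypothetical stand-in for the pilot's shutdown logic."""

    def __init__(self, notify_service, force_cleanup, grace_seconds):
        self.notify_service = notify_service    # tells the service that time is up
        self.force_cleanup = force_cleanup      # local last-resort cleanup
        self.grace_seconds = grace_seconds      # the configurable time period
        self.term_requested = threading.Event()
        self.service_done = threading.Event()   # set once the service has cleaned up

    def on_sigterm(self, signum, frame):
        """Signal handler: only record the request, do the work elsewhere."""
        self.term_requested.set()

    def install_handler(self):
        """Register for SIGTERM (must be called from the main thread)."""
        signal.signal(signal.SIGTERM, self.on_sigterm)

    def shutdown(self):
        """Block until SIGTERM arrives, then notify, wait, and force if needed."""
        self.term_requested.wait()
        self.notify_service()
        if self.service_done.wait(timeout=self.grace_seconds):
            return "service-cleaned-up"
        self.force_cleanup()
        return "pilot-forced-cleanup"


# Case 1: the service never answers, so the pilot cleans up on its own.
log = []
pilot = Pilot(lambda: log.append("notify"), lambda: log.append("force"), 0.05)
pilot.on_sigterm(signal.SIGTERM, None)  # simulate the scheduler's SIGTERM
outcome = pilot.shutdown()

# Case 2: the service finishes cleanup as soon as it is notified.
polite = Pilot(lambda: polite.service_done.set(), lambda: log.append("force2"), 0.05)
polite.on_sigterm(signal.SIGTERM, None)
polite_outcome = polite.shutdown()
print(outcome, polite_outcome)
```

Keeping the handler to a single flag and doing the notification outside it is a common way to avoid running long operations inside a signal handler.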

14
Nimbus @ UC
  • What is it?
  • The Science Cloud at University of Chicago
  • UC TeraPort cluster configured with the workspace
    pilot
  • Currently 16 nodes
  • What can it do for me?
  • Allow you to lease out a cluster of VMs
  • Who can use it?
  • Members of scientific community
  • Inasmuch as usage policies will allow
  • What do I need to do if I want to use it?
  • Contact us: keahey_at_mcs.anl.gov
  • You will need a VM image (we can help and know
    others who can), a certificate, and a simple
    client

15
Cloud Interoperability
  • Moving an app from a hardware platform to a cloud
    is relatively hard
  • Need to develop a VM image, learn about cloud
    computing, figure out logistics
  • Moving between clouds is very easy
  • E.g., STAR app: EC2 -> Science Cloud and vice
    versa
  • Rough consensus on the interfaces needed to
    provision resources in the cloud
  • OGF gridvirt-wg
  • Chairs: Erol Bozak, Wolfgang Reichert
  • Define the requirements for integration of Grid
    architecture with system virtualization platforms
  • Exploring the impact of virtualization on Grid
    use cases
  • Exploring the relationship with standards (DMTF,
    etc.)