Title: Globus Virtual Workspaces
1. Globus Virtual Workspaces
- HEPiX Fall 2007, St Louis
- Kate Keahey
- Argonne National Laboratory
- University of Chicago
- keahey_at_mcs.anl.gov
2. Why Virtual Workspaces?
- Quality of Service
- We get batch-style provisioning
- One size fits all
- Side-effect of job scheduling
- We need advance reservations, urgent computing, periodic, best-effort, and others
- Separation of job scheduling and resource management
- E.g. workflow-based apps and batch apps have different needs
- Quality of Life
- We have: "I have 100 nodes I cannot use"
- Complex applications
- Hard to install
- Require validation
- Separation of environment preparation and resource leasing
3. What are Virtual Workspaces?
- A dynamically provisioned environment
- Environment definition: we get exactly the (software) environment we need on demand.
- Resource allocation: provision the resources the workspace needs (CPUs, memory, disk, bandwidth, availability), allowing for dynamic renegotiation to reflect changing requirements and conditions.
- Implementation
- Traditional means: publishing, automated configuration, coarse-grained enforcement
- Virtual Machines: encapsulated configuration and fine-grained enforcement
Paper: "Virtual Workspaces: Achieving Quality of Service and Quality of Life in the Grid"
4. Virtual Machines (Xen)
- Open source
- Paravirtualization
- The Good: high-performance
- The Bad: difficult to run proprietary OSs, and to mix 32-bit and 64-bit kernels (VT needed)
- Xen terminology
- Domain0 (the host)
- DomainU (user domain, the guest)
5. Deploying Workspaces Remotely
[Diagram: VWS Service in front of a pool of nodes]
- Workspace
- Workspace metadata
- Pointer to the image
- Logistics information
- Deployment request
- CPU, memory, node count, etc.
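As a rough illustration of the two pieces listed above, the sketch below models workspace metadata and a deployment request as plain Python data. All class names, field names, and the example image location are hypothetical; they are assumptions for illustration and not the workspace service's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceMetadata:
    """What the workspace is (hypothetical fields)."""
    name: str
    image_uri: str      # pointer to the VM image
    logistics: dict     # e.g. networking or partition hints


@dataclass(frozen=True)
class DeploymentRequest:
    """What resources the workspace should get (hypothetical fields)."""
    cpus: int
    memory_mb: int
    node_count: int

    def validate(self) -> None:
        # Reject requests that cannot describe a real allocation.
        if min(self.cpus, self.memory_mb, self.node_count) < 1:
            raise ValueError("cpus, memory_mb and node_count must be >= 1")

    def total_memory_mb(self) -> int:
        # Memory needed across all requested nodes.
        return self.memory_mb * self.node_count


metadata = WorkspaceMetadata(
    name="example-workspace",
    image_uri="file:///images/example.img",  # made-up location
    logistics={"network": "public"},
)
request = DeploymentRequest(cpus=1, memory_mb=512, node_count=4)
request.validate()
print(metadata.name, request.total_memory_mb())
```

Keeping the description of the environment separate from the resource request mirrors the separation of environment preparation and resource leasing argued for earlier in the talk.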
6. Interacting with Workspaces
The workspace service publishes information on each workspace as standard WSRF Resource Properties.
Users can query those properties to find out information about their workspace (e.g. what IP the workspace was bound to).
Users can interact directly with their workspaces the same way they would with a physical machine.
[Diagram: VWS Service and pool nodes; label: Trusted Computing Base (TCB)]
7. Workspace Service Components
Workspace WSRF front-end that allows clients to deploy and manage virtual workspaces
Workspace back-end: resource manager for a pool of physical nodes. Deploys and manages Workspaces on the nodes
Each node must have a VMM (Xen) installed, as well as the workspace control program that manages individual nodes
Contextualization creates a common context for a virtual cluster
[Diagram: VWS Service and pool nodes; label: Trusted Computing Base (TCB)]
8. Workspace Service Components
- GT4 WSRF front-end
- Leverages GT core and services, notifications, security, etc.
- Follows the OGF WS-Agreement provisioning model
- Publishes available lease terms
- Provides lease descriptions
- Workspace Resource Manager (back-end)
- Currently focused on Xen
- Works with multiple Resource Managers
- Workspace Control
- Contextualization
- Put the virtual appliance in its deployment context
- Current release: 1.3, available at
- http://workspace.globus.org
9. Workspace Resource Managers
- Default resource manager (basic slot fitting)
- Commercial datacenter technology would also fit
- Amazon Elastic Compute Cloud (EC2)
- EC2: selling cycles as Xen VMs
- Software similar to Workspace Service
- No virtual clusters, contextualization, fine-grain allocations, etc.
- Grid credential admission
- EC2 charging model
- STAR: 100 node VM run
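The slides do not spell out what "basic slot fitting" does. One simple reading is first-fit placement: walk the pool in a fixed order and put each requested VM on the first node with enough free memory. The sketch below shows that reading only; the function name, the node names, and the choice of memory as the single fitted resource are all assumptions.

```python
from typing import Dict, List, Optional


def fit_slots(free_mb: Dict[str, int], request_mb: int, count: int) -> Optional[List[str]]:
    """First-fit placement: return one node name per VM, or None if it does not fit.

    free_mb maps a (hypothetical) node name to its free memory in MB and is
    only updated when the whole request can be satisfied.
    """
    remaining = dict(free_mb)           # work on a copy so failure has no side effects
    placement: List[str] = []
    for _ in range(count):
        for node in sorted(remaining):  # deterministic order
            if remaining[node] >= request_mb:
                remaining[node] -= request_mb
                placement.append(node)
                break
        else:
            return None                 # some VM could not be placed
    free_mb.update(remaining)           # commit the allocation
    return placement


pool = {"node1": 1024, "node2": 512}
print(fit_slots(pool, 512, 3))   # three 512 MB VMs
print(fit_slots(pool, 512, 1))   # a fourth VM no longer fits
```

Treating a multi-VM request as all-or-nothing matters for virtual clusters, where a partial allocation would be of little use.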
10. Virtual Workspaces for STAR
- STAR image configuration
- A virtual cluster composed of an OSG headnode and STAR worker nodes
- Using the workspace service over EC2 to provision resources
- Allocations of up to 100 nodes
- Dynamically contextualized for out-of-the-box cluster
11. Workspace Resource Managers
- Default resource manager (basic slot fitting)
- Commercial datacenter technology would also fit
- Amazon Elastic Compute Cloud (EC2)
- EC2: selling cycles as Xen VMs
- Software similar to Workspace Service
- No virtual clusters, contextualization, fine-grain allocations, etc.
- Grid credential admission
- EC2 charging model
- STAR: 100 node VM run
- Workspace Pilot
- Integrating VMs into current provisioning models
- Long-term solutions
- Interleaving soft and hard leases
- Providing better articulated leasing models
- Developed in the context of existing schedulers
12. Providing Resources: The Workspace Pilot
- Challenge: find the simplest way to integrate VMs into current provisioning models
- Glide-ins (Condor): poor man's resource leasing
- Best-effort semantics: submit a job "pilot" that claims resources but does not run a job
- The Workspace Pilot
- Resources booted to dom0
- Pilot adjusts memory
- VWS leases slots to VMs
- Kill-all facility
13. Workspace Control
- VM control
- Starting, stopping etc.
- To be replaced by Xen API
- Integrating into the network
- Assigning MAC addresses and IP addresses
- DHCP Delivery tool
- Building up a trusted networking layer
- VM image propagation
- Image management and reconstruction
- Creating blank partitions
- Talks to the workspace service via ssh
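To make the "assigning MAC addresses and IP addresses" step concrete, here is a minimal sketch that hands each VM the next free (MAC, IP) pair from a subnet. It is not the workspace control program's actual logic: the function names, the VM names, and the example subnet are assumptions. The `00:16:3e` prefix is the MAC range registered for Xen guests. In a real deployment such pairs would then be delivered to the VM, for example through DHCP as the slide suggests.

```python
import ipaddress
from typing import Dict, Iterable, Iterator, Tuple

XEN_OUI = "00:16:3e"  # MAC prefix registered for Xen guests


def address_pool(cidr: str) -> Iterator[Tuple[str, str]]:
    """Yield (mac, ip) pairs for every usable host address in cidr."""
    for index, ip in enumerate(ipaddress.ip_network(cidr).hosts()):
        # Encode the counter in the low 24 bits of the MAC address.
        suffix = ":".join(f"{(index >> shift) & 0xFF:02x}" for shift in (16, 8, 0))
        yield f"{XEN_OUI}:{suffix}", str(ip)


def assign(vm_names: Iterable[str], cidr: str) -> Dict[str, Tuple[str, str]]:
    """Give each VM name the next free (mac, ip) pair; fail if the pool runs out."""
    pool = address_pool(cidr)
    leases = {}
    for name in vm_names:
        try:
            leases[name] = next(pool)
        except StopIteration:
            raise RuntimeError(f"no address left for {name}") from None
    return leases


print(assign(["vm-a", "vm-b"], "192.168.0.0/29"))
```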
14. Security Issues
- Secure admission of appliances/workspaces
- The appliance vendor configures the appliance, asserts its properties and signs them to the appliance
- Security and other updates, configuration and versioning assertions, disallowing offsite root access, etc.
- The appliance deployer validates the signature and matches the assertions to policies
- SC05 Poster: "Making your workspace secure: establishing trust with VMs in the Grid"
- Secure networking
- Controlling spoofing
- Isolating networks between different VM groups
- Traffic monitoring
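The admission flow above (vendor signs assertions, deployer validates the signature and matches assertions to policy) can be sketched in a few lines. This is an illustration under stated assumptions, not the scheme from the poster: a shared-secret HMAC stands in for what would more plausibly be a public-key signature, and the assertion names, key, and policy format are made up.

```python
import hashlib
import hmac
import json


def sign_assertions(assertions: dict, key: bytes) -> str:
    """Vendor side: sign a canonical JSON encoding of the assertions."""
    payload = json.dumps(assertions, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def admit(assertions: dict, signature: str, key: bytes, policy: dict) -> bool:
    """Deployer side: check the signature, then match assertions to policy."""
    expected = sign_assertions(assertions, key)
    if not hmac.compare_digest(expected, signature):
        return False   # tampered with, or signed with a different key
    # Every policy entry must be asserted with exactly the required value.
    return all(assertions.get(name) == value for name, value in policy.items())


key = b"demo-key"      # placeholder secret, for illustration only
claims = {"security_updates": "2007-10", "offsite_root_access": False}
sig = sign_assertions(claims, key)
print(admit(claims, sig, key, {"offsite_root_access": False}))
```

Checking the signature before looking at the assertions means the policy match only ever runs on claims the vendor actually made.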
15. So, you've deployed some VMs. Now what?
- Do they have public IP addresses?
- Do they actually represent something useful?
- I need an OSG cluster
- How do the VMs find out about each other?
- Can they share storage?
- Do they have host certificates?
- And gridmapfile?
- And all the other things that will integrate them into my VO?
16. Virtual Clusters
- Challenge: what is a virtual cluster?
- A more complex virtual machine
- Networking, shared storage, etc. that will be portable across sites and implementations
- Available at the same time and sharing a common context
- Example
- A set of worker nodes with some edge services in front and NFS-based shared storage
- Solution: management of ensembles and sharing
- Ensemble deployment, EPR management
- Flexible, configurable cluster deployment
- Networking
- Edge Services have public IPs
- Worker nodes are on a private network shared with the Edge Services
- Exporting and sharing a common context
- Configuring and joining context
Paper: "Virtual Clusters for Grid Communities", CCGrid 2006
17. Contextualization
- Challenge: putting a VM in the deployment context of the Grid, site, and other VMs
- Assigning and sharing IP addresses, name resolution, application-level configuration, etc.
- Solution: management of Common Context
- Configuration-dependent
- provides/requires
- Common understanding between the image vendor and deployer
- Mechanisms for securely delivering the required information to images across different implementations
[Diagram: contextualization agent and the Common Context (IP, hostname, pk)]
Paper: "A Scalable Approach To Deploying And Managing Appliances", TeraGrid conference 2007
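One way to picture the provides/requires idea is as a small matching step: every VM publishes some values into a common context and lists the entries it needs from it. The sketch below is an assumption about how such matching could look, not the mechanism described in the paper; the role names, keys, and addresses are invented for the example.

```python
from typing import Dict


def build_context(roles: Dict[str, dict]) -> Dict[str, dict]:
    """Resolve each role's `requires` against what the roles `provide`.

    roles maps a VM name to {"provides": {...}, "requires": [...]}.
    Returns, per VM, the values it asked for; raises if something is missing.
    """
    common = {}                                    # the shared context
    for role in roles.values():
        common.update(role.get("provides", {}))
    resolved = {}
    for name, role in roles.items():
        wanted = role.get("requires", [])
        missing = [key for key in wanted if key not in common]
        if missing:
            raise KeyError(f"{name} requires unavailable entries: {missing}")
        resolved[name] = {key: common[key] for key in wanted}
    return resolved


cluster = {
    "headnode": {"provides": {"nfs_server": "10.0.0.1"}, "requires": ["worker_hosts"]},
    "worker": {"provides": {"worker_hosts": ["10.0.0.2"]}, "requires": ["nfs_server"]},
}
print(build_context(cluster))
```

Failing loudly on an unmet requirement reflects the point that the image vendor and the deployer need a common understanding of what each image expects.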
18. Where Do VM Images Come From?
- Appliance providers
- Appliance providers configure, manage, attest images
- Contextualization: collaboration between appliance vendors and appliance deployers
- Appliance providers
- rPath
- Recipe-style configuration (create a project, choose packages, cook, build the software appliance)
- Freely available online, many appliances
- http://www.rpath.com/rbuilder/
- Bcfg2
- Incrementally constructed configuration profiles
- Configuration analysis capabilities
- http://trac.mcs.anl.gov/projects/bcfg2
19. Image Management
- Image partitions
- Efficiency
- Security
- Flexibility
- Partition management on deployment
- Partition caching and generation
- Partition sharing
- Mounting
20. Workspace Ecosystem
21. Parting Thoughts
- VMs are the raw materials from which a working system can be built
- But we still have to build it!
- Technical challenges: taking one step at a time
- Social/procedural challenges
- Division of labor
- Resource providers
- Appliance providers
- Can we build trust between these two groups?
- If you have a specific problem, give us a call
- http://workspace.globus.org
- In our copious spare time we also do research
- Migration, fine-grained enforcement, resource management, load balancing, migration in time, lots of one-offs
- VTDC07 (co-located with SC07)
22. Acknowledgements
- Workspace team
- Kate Keahey
- Tim Freeman
- Borja Sotomayor
- Funding
- NSF SDCI Missing Links
- NSF CSR Virtual Playgrounds
- DOE CEDPS Project
- With thanks to many collaborators
- Jerome Lauret (STAR, BNL), Doug Olson (STAR, LBNL), Marty Wesley (rPath), Stu Gott (rPath), Ken Van Dine (rPath), Predrag Buncic (Alice, CERN), Haavard Bjerke (CERN), Rick Bradshaw (Bcfg2, ANL), Narayan Desai (Bcfg2, ANL), Duncan Penfold-Brown (Atlas, UVic), Ian Gable (Atlas, UVic), David Grundy (Atlas, UVic), Ti Leggit (University of Chicago), Greg Cross (University of Chicago), Mike Papka (University of Chicago/ANL)
23. With thanks to Jerome Lauret and Doug Olson of the STAR project
[Figure: "Running jobs" counts per site (VWS/EC2, BNL, WSU, Fermi, PDSF); labels: Job Completion, File Recovery]
24. With thanks to Jerome Lauret and Doug Olson of the STAR project
[Figure: accelerated display of a workflow job state at NERSC PDSF, EC2 (via Workspace Service), and WSU; Y: job number, X: job state]