Title: Edge Services Framework (ESF) in Open Science Grid
1 An Edge Services Framework (ESF) for EGEE, LCG, OSG
The XVth International Conference on Computing in High Energy and Nuclear Physics (CHEP06), February 15, 2006, TIFR, Mumbai
Abhishek Singh Rana, UC San Diego, rana_at_fnal.gov
Frank Würthwein, UC San Diego, fkw_at_fnal.gov
2 Authors (ESF mailing list)
RANA, Abhishek Singh (University of California, San Diego, CA, USA)
WUERTHWEIN, Frank (University of California, San Diego, CA, USA)
GARDNER, Robert (University of Chicago, IL, USA)
KEAHEY, Kate (Argonne National Laboratory, IL, USA)
FREEMAN, Timothy (Argonne National Laboratory, IL, USA)
VANIACHINE, Alexandre (Argonne National Laboratory, IL, USA)
HOLZMAN, Burt (Fermi National Accelerator Laboratory, IL, USA)
MALON, David (Argonne National Laboratory, IL, USA)
MAY, Ed (Argonne National Laboratory, IL, USA)
POPESCU, Razvan (Brookhaven National Laboratory, Upton, NY, USA)
SOTOMAYOR, Borja (University of Chicago, IL, USA)
SHANK, Jim (Boston University, MA, USA)
LAURE, Erwin (CERN (European Organization for Nuclear Research), Geneva, Switzerland)
BIRD, Ian (CERN, Geneva, Switzerland)
SCHULZ, Markus (CERN, Geneva, Switzerland)
FIELD, Laurence (CERN, Geneva, Switzerland)
PORDES, Ruth (Fermi National Accelerator Laboratory, IL, USA)
SKOW, Dane (Fermi National Accelerator Laboratory, IL, USA)
LITMAATH, Maarten (CERN, Geneva, Switzerland)
CAMPANA, Simone (CERN, Geneva, Switzerland)
WENAUS, Torre (Brookhaven National Laboratory, Upton, NY, USA)
SMITH, David (CERN, Geneva, Switzerland)
BLUMENFELD, Barry (Johns Hopkins University, Baltimore, MD, USA)
MARTIN, Stuart (Argonne National Laboratory, IL, USA)
DE, Kaushik (The University of Texas, Arlington, TX, USA)
VRANICAR, Matthew (PIOCON, IL, USA)
WEICHER, John (PIOCON, IL, USA)
SMITH, Preston (Purdue University, IN, USA)
WANG, Shaowen (University of Iowa)
3 Outline
- ESF Activity
- ESF Phase 1
  - Concepts and Design
- ESF future direction
- Xen overview
- Phase 1
  - Status
  - Next Steps
4 Vision
5 Can there be a shared Services Framework that makes site admins happy?
- No login access for strangers.
- Isolation of services.
  - VOs can't affect each other.
  - VOs receive a strictly controlled environment.
- Encapsulation of services.
  - Service instances can receive a security review by the site before they get installed.
- Explore a solution based on virtual machines.
6 OSG-ESF Activity
- Started in September 2005.
- Physicists, Computer Scientists, Engineers, Software Architects.
- Chairs: Kate Keahey and Abhishek Singh Rana.
- Workspace Services Architecture and Design
  - Globus Alliance and UC San Diego.
- Edge Services Implementations
  - USATLAS Teams at U Chicago and ANL.
  - USCMS Teams at UC San Diego and FNAL.
- Mailing List and Discussion Forum
  - osg-edgeservices_at_opensciencegrid.org
- Web collaborative area
  - http://osg.ivdgl.org/twiki/bin/view/EdgeServices
  - http://www.opensciencegrid.org/esf
7 ESF - Phase 1
8 No ESF - Phase 0
[Diagram: a Site with its SE and CE, and no VO services.]
9 No ESF - Phase 0
Static Deployment of VO Services on a Site
[Diagram: CMS, ATLAS, and CDF services deployed statically at the Site alongside its SE and CE.]
10 ESF?
[Diagram: a Site with SE and CE; where does ESF fit in?]
11 ESF - Phase 1
Snapshot of ES Wafers implemented as Virtual Workspaces
[Diagram: an ESF node at the Site, alongside the SE and CE, hosting ES Wafers for ATLAS, CMS, CDF, and a Guest VO.]
12 An attempt at ESF Terminology
- Edge Services Wafer (ES Wafer)
  - A specific instance of a dynamically-created VM (workspace) is called an Edge Services Wafer.
  - An ES Wafer can have several Edge Services running.
  - A VO can have multiple ES Wafers up at a Site.
- Edge Services Slot (ES Slot)
  - An ES Slot has hardware characteristics specified by the Site Admin.
  - An ES Slot can be leased by a VO to host an ES Wafer.
- Edge Service (ES)
  - A VO-specific service instantiated by a VO in a Wafer.
- Workspace Service (WS)
  - A service at a Site that allows VOs to instantiate ES Wafers in ES Slots.
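The terminology above can be sketched as a small data model: Sites publish ES Slots with hardware properties, and a VO leases a Slot to host a Wafer running one or more Edge Services. This is an illustrative sketch only, not the actual GT4 Workspace Service API; all class and field names are assumptions.

```python
# Illustrative data model for the ESF terminology (not the real API).
from dataclasses import dataclass, field


@dataclass
class EdgeService:
    name: str                    # e.g. "FroNTier" or "DASH"
    port: int


@dataclass
class ESWafer:
    vo: str                      # owning VO, e.g. "CMS"
    image: str                   # filesystem image the wafer boots from
    services: list = field(default_factory=list)  # several ES per wafer


@dataclass
class ESSlot:
    cpus: int                    # hardware properties set by the Site Admin
    ram_mb: int
    disk_gb: int
    leased_by: str = None        # VO currently leasing this slot
    wafer: ESWafer = None

    def lease(self, wafer: ESWafer):
        # One wafer per slot: a lease binds the slot to the wafer's VO.
        if self.leased_by is not None:
            raise RuntimeError("slot already leased")
        self.leased_by = wafer.vo
        self.wafer = wafer


slot = ESSlot(cpus=2, ram_mb=4096, disk_gb=40)
wafer = ESWafer(vo="CMS", image="sl3-frontier.img",
                services=[EdgeService("FroNTier", 8000)])
slot.lease(wafer)
```

A VO with multiple Wafers at a Site would simply hold several such leases, one per Slot.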
13 ESF - Phase 1
Snapshot of ES Wafers implemented as Virtual Workspaces
[Diagram: the GT4 Workspace Service and VMM on the ESF node; dynamically deployed ES Wafers for each VO (ATLAS, CMS, CDF, Guest VO); Wafer images stored in the SE; the Site's CE, compute nodes, and storage nodes alongside.]
14 User jobs at Compute nodes using ES Wafers for VO Edge Services
[Diagram: user jobs on the Site's compute nodes contacting the ATLAS, CMS, CDF, and Guest VO Wafers on the ESF node, alongside the SE and CE.]
15 A VO Admin transporting/storing an ES image to a remote Site... deploying the ES using the image stored in the Site's local repository
16-26 ESF - Phase 1 (Role: VO Admin)
[Animation sequence: a CMS VO Admin contacts the Site; the request passes a Policy Enforcement Point (PEP); the Wafer image is staged via the SE; the Workspace Service deploys the CMS ES Wafer on the ESF node, alongside the Site's SE and CE; the sequence ends with a running ES Wafer (multiple VO services at a Site's edge).]
27 A VO User using an ES...
28-35 ESF - Phase 1 (Role: VO User)
[Animation sequence: a CMS VO User's job enters through the Site's CE, passing Policy Enforcement Points (PEPs), and runs in a Resource Slice (the user execution environment at a WN); from there it uses the CMS Edge Services running in the ES Wafer on the ESF node, alongside the SE.]
36 ESF - future direction
37 ESF - future direction
- Same concept.
- Deploy a cluster of ES slots that are fully schedulable by any VO allowed at the grid site.
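The brokering idea behind a fully schedulable cluster of slots can be sketched as matching a Wafer's requirements against the published properties of each Slot. This is a minimal sketch under assumed property names (`cpus`, `ram_mb`, `free`), not the actual ESF broker.

```python
# Minimal sketch of slot brokering: pick the first free ES Slot whose
# published properties satisfy a wafer's minimum requirements.
# Property names are illustrative, not part of the real ESF.
def schedule(slots, required):
    """slots: list of dicts of slot properties; required: dict of minima."""
    for slot in slots:
        if not slot.get("free"):
            continue
        if all(slot.get(key, 0) >= minimum
               for key, minimum in required.items()):
            return slot
    return None


slots = [
    {"name": "slot-a", "cpus": 1, "ram_mb": 2048, "free": True},
    {"name": "slot-b", "cpus": 2, "ram_mb": 4096, "free": True},
]
chosen = schedule(slots, {"cpus": 2, "ram_mb": 4096})   # matches slot-b
```

A real broker would also weigh VO quotas and lease durations; the point here is only that Slots with different properties become a schedulable resource pool.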
38 ESF - future direction
[Diagram: Brokering and Scheduling by the Edge Services Framework; dynamically deployed ES Wafers for many VOs (ATLAS1, ATLAS2, CMS, CDF) on a cluster of ES Slots with different properties at the Site.]
39 Xen overview
[Diagram: the Virtual Machine Monitor (VMM) layered on the hardware, connected to both a public network and a private network.]
40 Phase 1 on OSG
41 Phase 1 on OSG
- ATLAS and CMS procure one piece of hardware at their Sites on OSG that runs ESF (called the ESF node).
  - Dual CPU recommended.
  - 4 GB RAM (Xen2 has no PAE support; Xen3 has).
- Site administrators install:
  - Xen (Xen 2.0.7 or Xen 3.0.0).
  - GT4 Workspace Service.
- VO administrators use ESF to fire up Xen VMs that instantiate VO Services (Edge Services) in an ES Wafer.
- A single ESF node hosts ES Wafers for both ATLAS and CMS.
42 Site Administrator Responsibilities
- Deploy
  - Xen.
    - Custom kernel for domain 0 (Grub bootloader required).
    - Custom kernel for domain U.
    - Prepare a RAMdisk image if needed.
  - GT4.
  - GT4 Workspace Service.
- Provision
  - One public IP, one private IP per VM.
  - Host certificates per VM.
  - Disk space per VM.
- Declare available ES Slots and their properties to ESF.
43 VO Administrator Responsibilities
- Fetch a standard OS filesystem image from a central ESF repository.
- Deploy the desired service on the OS filesystem image; thus, prepare (freeze) the ES Wafer instance.
- Develop portable methods to dynamically configure all networking properties at a remote Site. Package these.
- Prepare this image into a file for transport.
- srmcp the image to the remote Site's SE.
- Use ESF to fire up a Xen VM with the VO Services (ES Wafer) at the remote Site, from the image file in the remote SE, using role-based authorization.
- Advertise the running Edge Services as needed.
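The transfer step above can be illustrated by building the srmcp invocation a VO Admin might run. The image path and SRM endpoint below are hypothetical; only the two-argument form (source URL, destination URL) of srmcp is assumed here.

```python
# Illustrative sketch: assemble the srmcp command that copies a frozen
# ES Wafer image to a remote Site's SE. Paths and endpoint are made up.
def srmcp_command(local_image, remote_srm_url):
    # srmcp takes a source URL and a destination URL.
    return ["srmcp", "file:///" + local_image.lstrip("/"), remote_srm_url]


cmd = srmcp_command(
    "/images/cms-frontier-wafer.img",
    "srm://se.example.org:8443/vo/cms/wafers/frontier.img",
)
# On a host with an SRM client installed, one would then run e.g.
# subprocess.run(cmd, check=True).
```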
44 Status
- New features added to GT4 Workspace Service.
- First prototype of ESF with an Integration testbed (Xen 2.0.7) consisting of sites at ANL, FNAL, UCSD, and U Chicago, and a Production testbed (Xen 3.0.0) with a site at UCSD.
- Pure OS filesystem images: SL3.0.3, SL4, LTS 3, LTS 4, FC4, CentOS4.
- USCMS Edge Service: FroNTier (Squid db).
- USATLAS Edge Service: DASH (MySQL db).
- General Edge Service: a subset of OSG 0.4 CE.
- Stress/throughput testing performed at ANL and UCSD.
- Based on parts of the above results, a publication was submitted for peer review to IEEE HPDC-15.
45 Partial list of features added to GT4 WSS (WSS Release: VM Technology Preview 1.1)
- Support for a new "allocate" networking method that allows the workspace service administrator to specify pools of IP addresses (and DNS information), which are then assigned to virtual machines on deployment.
- The resource properties have been extended to publish deployment information about a workspace, such as its IP address.
- Workspace metadata validation has been extended to support requirement checking for specific architecture, Xen version, and CPU. The workspace factory advertises the supported qualities as a resource property; the requirement section of workspace metadata is checked against the supported set.
- The workspace service can now accept and process VOMS credentials and GridShib SAML attributes.
- Support for Xen3 has been added.
- The workspace client interface has been extended to enable subscribing for notifications and specifying the resource allocation information at the command line.
- Installation has been improved. The client now requires only a minimal installation (as opposed to the full service installation).
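The "allocate" networking method described in the first bullet can be sketched as a simple IP pool: the administrator specifies the addresses and DNS information once, and each VM is handed the next free address on deployment. This is an illustrative sketch, not the Workspace Service implementation; class and method names are assumptions.

```python
# Sketch of pool-based IP allocation for deployed VMs (illustrative only).
class IPPool:
    def __init__(self, addresses, dns):
        self.free = list(addresses)  # admin-specified pool
        self.dns = dns               # admin-specified DNS information
        self.assigned = {}           # VM name -> IP

    def allocate(self, vm_name):
        # Hand the next free address to a VM on deployment.
        if not self.free:
            raise RuntimeError("IP pool exhausted")
        ip = self.free.pop(0)
        self.assigned[vm_name] = ip
        return {"ip": ip, "dns": self.dns}

    def release(self, vm_name):
        # Return the VM's address to the pool when it is torn down.
        self.free.append(self.assigned.pop(vm_name))


pool = IPPool(["192.0.2.10", "192.0.2.11"], dns="198.51.100.1")
net = pool.allocate("cms-wafer-1")   # {'ip': '192.0.2.10', 'dns': '198.51.100.1'}
```

Publishing the assigned IP as a resource property (the second bullet) then amounts to exposing `pool.assigned` per workspace.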
46 Next Steps
- Verify performance, functionality, robustness.
- Gain production use experience.
  - CDF is capable of failover operations between multiple squids, thus allowing production use experience without negative impact on users.
- Example squid use cases:
  - DB cache (FroNTier).
  - Application tarball serving (see glideCAF and OSG-CAF presentations).
  - Parrot-based CDF software mounts.
- Further evolve GT4 Workspace Service design.
- Widen deployment to more USCMS and USATLAS sites, using CMS and ATLAS services as use cases.
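The failover property that makes production use safe can be sketched client-side: try each squid proxy in order and fall through to the next on failure, so one ES Wafer going down does not affect users. This is a generic sketch, not CDF's actual client code; the URLs and function name are illustrative.

```python
# Sketch of client-side failover across multiple squid proxies.
import urllib.request


def fetch_via_squids(url, proxies, timeout=5):
    """Fetch url through the first reachable squid in `proxies`."""
    last_error = None
    for proxy in proxies:
        try:
            opener = urllib.request.build_opener(
                urllib.request.ProxyHandler({"http": proxy}))
            return opener.open(url, timeout=timeout).read()
        except OSError as err:       # this squid is down; try the next one
            last_error = err
    raise RuntimeError(f"all squids failed: {last_error}")


# e.g. fetch_via_squids("http://frontier.example.org/data",
#                       ["http://squid1.example.org:3128",
#                        "http://squid2.example.org:3128"])
```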
47 www.opensciencegrid.org/esf
48 Thank You.