Title: GlobusWorld
1. GlobusWorld: Globus Execution Services
- What's New in 4.0 and 4.2 GRAM, What's Planned for the Future (Stuart Martin, UC/ANL)
- Virtual Machine Management Services (Kate Keahey, UC/ANL)
- Experiences with the use of GRAM in the LEAD portal (Suresh Marru, Indiana University)
- The GridWay Metascheduler (Rubén S. Montero, Universidad Complutense de Madrid)
- Swift and Falkon (Ioan Raicu, UC)
2. What's New in 4.0 and 4.2 GRAM, What's Planned for the Future
- Stuart Martin
- Argonne National Lab
3. Session Overview
- Overview of GRAM
- High-level client examples
- Service Interface Details
- Goals for Service Quality
- Future Plans
4. What is GRAM?
- GRAM is a Globus Toolkit component
- For Grid job management
- GRAM is a unifying remote interface to Resource Managers
- Yet preserves local site security/control
- GRAM is for stateful job control
- Reliable create operation
- Asynchronous monitoring and control
- Remote credential management
- Remote file staging and file cleanup
- GRAM is middleware
- End Users Beware!!
5. Grid Job Management Goals
- Provide a service to securely
- Create an environment for a job
- Stage files to/from environment
- Cause execution of job process(es)
- Via various local resource managers
- Monitor execution
- Signal important state changes to client
- Enable client access to output files
- Streaming access during execution
6. Traditional Interaction
- Satisfies many users and use cases
- TACC's Ranger (62,976 cores!) is the Costco of HTC :-), one-stop shopping; why do we need more?
[Diagram: local jobs go to a scheduler (e.g., PBS) that runs them on the compute nodes of Resource A]
7. GRAM Benefit
- Adds remote execution capability
- Enables clients/devices to manage jobs from off the cluster (e.g., a PDA)
[Diagram: remote GRAM4 jobs arrive via the gramJob API at the GRAM4 service on Resource A, alongside local jobs; the scheduler (e.g., PBS) runs them on the compute nodes]
8. GRAM Benefit
- Provides scheduler abstraction
[Diagram: the same gramJob API submits GRAM4 jobs to GRAM4 services on Resource A (scheduler: PBS) and Resource B (scheduler: LSF), alongside each site's local jobs]
9. GRAM Benefit
- Scalable job management
- Interoperability
[Diagram: many GRAM4 jobs submitted through the gramJob API]
10. [Layered architecture diagram:]
- Users/Applications: job brokers, portals, command-line tools, etc.
- GRAM WSDLs: job description schema (executable, args, env, ...); WSRF standard interfaces for subscription, notification, destruction, EPRs
- GRAM4
- Resource Managers: PBS, Condor, LSF, SGE, LoadLeveler, Fork
11. GT4 and Web Services
[Stack diagram: Custom Web Services and Custom WSRF Services sit atop the GT4 WSRF Web Services (registry and admin), hosted in the GT4 container (e.g., Apache Axis), which builds on WS-A, WSRF, and WS-Notification, which in turn build on WSDL, SOAP, and WS-Security]
12. Higher-level Clients and User Examples
13. caBIG and Globus
- caGrid is built on top of the Globus 4 WSRF Java Core and Security
14. caBIG - TeraGrid Integration
- Leave the caGrid service infrastructure as is, with the exception of the analytical services.
15. caBIG - TeraGrid Gateway
16. caBIG - TeraGrid Gateway
17. Hierarchical Clustering Results
18. Condor-G Architecture
[Diagram: a Personal Condor instance submits to a remote resource]
19. GridWay Components
[Diagram of the GridWay architecture:]
- DRMAA library and CLI: job submission, monitoring, control, and migration
- GridWay core: request manager, job pool, host pool, dispatch manager, scheduler
- Execution manager (job preparation, termination, migration), backed by execution services
- Transfer manager, backed by file transfer services
- Information manager (resource discovery, monitoring), backed by information services
20. GridWay / Condor-G Benefit
- Scalable job management
- Throttling
- Metascheduling
[Diagram: GridWay jobs submitted through the gramJob API]
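The throttling that metaschedulers like GridWay and Condor-G perform can be pictured as a bounded submission pool: no more than N jobs are in flight at the GRAM service at once. The sketch below is illustrative only; `submit_to_gram` is a hypothetical stand-in for a real GRAM client call, not part of any Globus API.

```python
import threading

MAX_ACTIVE = 4  # throttle: at most 4 jobs in flight at the GRAM service

active = threading.BoundedSemaphore(MAX_ACTIVE)
results = []
results_lock = threading.Lock()

def submit_to_gram(job_id):
    # Placeholder for a real GRAM submission (e.g., via globusrun-ws or
    # the gramJob API); here we just record that the job "ran".
    return f"done-{job_id}"

def worker(job_id):
    with active:  # blocks while MAX_ACTIVE jobs are already in flight
        outcome = submit_to_gram(job_id)
    with results_lock:
        results.append(outcome)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # all 20 jobs complete, never more than 4 at once
```

The semaphore is the whole trick: the metascheduler accepts an arbitrary backlog from users but releases work to the service only as slots free up, which is what keeps a single GRAM4 container from being overloaded by bursts.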
21. Falkon Overview
- Falkon's goal: enable the rapid and efficient execution of many independent jobs on large compute clusters; it combines streamlined task dispatching, resource provisioning, and data diffusion
- Dynamic resource provisioning: uses GRAM4 to provision resources on demand based on load (i.e., wait queue length)
22. AstroGrid-D
- GEO600
- eScience group at the Albert Einstein Institute
- Part of LIGO, running the same Einstein@Home app, but on grid resources (not user desktops)
- Currently using D-Grid and OSG resources
- In production since October 2007
- All jobs submitted using WS-GRAM (globusrun-ws)
- Averaging 4000 jobs per day
23. AstroGrid-D Performance
- #1 as reported on the Einstein@Home top users list
- http://einstein.phys.uwm.edu/top_users.php
24. AstroGrid-D
- Scalable job management
- Throttling
- (Metascheduling)
[Diagram: GEO600 jobs submitted via globusrun-ws]
- Uses cron
- Saves state (EPRs) on the file system
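The "save state (EPRs) on the file system" pattern is straightforward: each submission writes its job handle to a spool directory, and a later cron-driven pass reloads the handles to check on the jobs. The sketch below illustrates only the persistence pattern; the EPR contents and job names are hypothetical stand-ins, not real WS-GRAM endpoint references.

```python
import os
import tempfile

def save_epr(spool_dir, job_name, epr_xml):
    # Persist the job's endpoint reference so a later cron run can find it.
    path = os.path.join(spool_dir, job_name + ".epr")
    with open(path, "w") as f:
        f.write(epr_xml)
    return path

def load_eprs(spool_dir):
    # Reload every saved handle; a real monitor would then query each
    # job's state (e.g., with globusrun-ws -status -j <file>).
    jobs = {}
    for name in os.listdir(spool_dir):
        if name.endswith(".epr"):
            with open(os.path.join(spool_dir, name)) as f:
                jobs[name[:-len(".epr")]] = f.read()
    return jobs

spool = tempfile.mkdtemp()
save_epr(spool, "geo600-0001", "<EndpointReference>...</EndpointReference>")
save_epr(spool, "geo600-0002", "<EndpointReference>...</EndpointReference>")
print(sorted(load_eprs(spool)))  # ['geo600-0001', 'geo600-0002']
```

Because the handles outlive the submitting process, a crash or reboot loses nothing: the next cron pass simply picks the EPR files back up, which is exactly why this simple scheme gives the GEO600 workflow its reliability.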
25. Nektar-G2: Large-Scale Flow Simulations on TeraGrid
- The new version of Nektar-G2 features direct coupling between 3D domains.
- The increased volume of data transfer between 3D blocks requires a high level of parallelism.
- MPIg is used for intra-site and inter-site communications on TeraGrid.
[Map: TeraGrid sites UC/ANL, NCSA, IU, RENO, SDSC]
26. MPIg
- Grid-enabled MPI
- Co-scheduling: GUR, GARS
- Rendezvous
[Diagram: mpirun drives GUR and globusrun-ws; GUR creates reservations (R) through GARS at each site, while globusrun-ws submits a multi-job to the GRAM4 services on Resources A and B; each local scheduler then launches the MPI processes (P) on its compute nodes]
27. GRAM4 Usage Statistics
- Averaging 692K jobs per month in 2008
- 40 unique domains reported jobs in March 2008
- e.g., .edu, .net, .gov, .org, .de, .es, .uk, .jp, ...
- Additional details: http://www-unix.mcs.anl.gov/liming/stats/
28. Service Interface Details
29. Job Submission Options
- Optional file staging
- Transfer files in before job execution
- Transfer files out after job execution
- Optional file streaming
- Monitor files during job execution
- Optional credential delegation
- Create, refresh, and terminate delegations
- For use by job process
- For use by GRAM to do optional file staging
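The options above all end up as elements in an XML job description. The snippet below builds a small GRAM4-style description with an optional stage-in section; the element names follow the GRAM4 job description schema in spirit, but treat the exact names, namespaces, and URLs here as illustrative assumptions rather than the precise wire format.

```python
import xml.etree.ElementTree as ET

# Illustrative GRAM4-style job description (element names are assumptions).
job = ET.Element("job")
ET.SubElement(job, "executable").text = "/bin/hostname"
ET.SubElement(job, "stdout").text = "${GLOBUS_USER_HOME}/stdout"
ET.SubElement(job, "stderr").text = "${GLOBUS_USER_HOME}/stderr"

# Optional stage-in: transfer an input file in before job execution.
stage_in = ET.SubElement(job, "fileStageIn")
transfer = ET.SubElement(stage_in, "transfer")
ET.SubElement(transfer, "sourceUrl").text = (
    "gsiftp://host.example.org/~/input.dat")          # hypothetical source
ET.SubElement(transfer, "destinationUrl").text = (
    "file:///${GLOBUS_USER_HOME}/input.dat")          # hypothetical target

xml_text = ET.tostring(job, encoding="unicode")
print(xml_text)
```

A matching `fileStageOut` section would transfer results back after execution, and a delegated credential (created separately against the Delegation service) is what authorizes GRAM to run these transfers on the user's behalf.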
30. Job Submission Monitoring
- Monitor job lifecycle
- GRAM and scheduler states for job
- StageIn, Pending, Active, Suspended, StageOut, Cleanup, Done, Failed
- Job execution status
- Return codes
- Multiple monitoring methods
- Simple query for current state
- Asynchronous notifications to client
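The lifecycle states above form a simple state machine, and the two monitoring methods map onto it naturally: a query reads the current state, while a notification consumer gets a callback on every change. This toy sketch uses the state names from the slide; the `JobMonitor` class itself is illustrative, not a Globus API.

```python
VALID_STATES = ["StageIn", "Pending", "Active", "Suspended",
                "StageOut", "Cleanup", "Done", "Failed"]

class JobMonitor:
    """Tracks a job's GRAM state and notifies subscribers on change."""

    def __init__(self):
        self.state = None
        self.listeners = []

    def subscribe(self, callback):
        # Stand-in for registering a WS-Notification consumer.
        self.listeners.append(callback)

    def update(self, new_state):
        if new_state not in VALID_STATES:
            raise ValueError(f"unknown GRAM state: {new_state}")
        self.state = new_state
        for cb in self.listeners:
            cb(new_state)  # asynchronous notification to the client

seen = []
monitor = JobMonitor()
monitor.subscribe(seen.append)
for s in ["StageIn", "Pending", "Active", "StageOut", "Cleanup", "Done"]:
    monitor.update(s)

print(monitor.state)  # simple query for current state -> Done
print(seen)           # every transition the subscriber observed
```

Polling and subscribing coexist: a client that misses notifications (e.g., after a restart) can always fall back to querying the current state.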
31. Secure Submission Model
- Secure submit protocol
- PKI authentication
- Authorization and mapping
- Based on Grid ID
- Further authorization by scheduler
- Based on local user ID
- Secure control/cancel
- Also PKI authenticated
- Owner has rights to his jobs and not others
32. Secure Execution Model
- After authorization
- Execute job securely
- User account sandboxing of processes
- Initialization of sandbox credentials
- Client-delegated credentials
- Multiple levels of audit possible
- Container
- GRAM Service Auditing
- Sudo
- Local resource manager
33. Secure Staging Model
- Before and after sandboxed execution
- Perform secure file transfers
- Create RFT request
- To local or remote RFT service
- PKI authentication and delegation
- In turn, RFT controls GridFTP
- Using delegated client credentials
- GridFTP
- PKI authentication
- Authorization and mapping by local policy files
- further authorization by FTP/unix perms
34. GRAM4 Architecture
[Diagram: a client delegates a credential to the Delegation service and sends a job request to the GRAM services in the GT4 Java Container on the service host; the GRAM adapter performs local job control through sudo and the local scheduler on the compute element, where the user job runs; the SEG reports job events back to the GRAM services; for staging, GRAM hands a transfer request to the RFT file transfer service, which drives GridFTP (FTP control and data channels) against GridFTP servers on the remote storage element(s)]
35. GRAM4 Architecture
[Same diagram as on the previous slide]
- The same delegated credential can be made available to the user application
36. GRAM4 Architecture
[Same diagram as on the previous slide]
- The same delegated credential can be used to authenticate with RFT
37. GRAM4 Architecture
[Same diagram as on the previous slide]
- The same delegated credential can be used to authenticate with GridFTP
38. GRAM Service Auditing
- As of 4.0.5, GRAM generates a DB record per job
- Submitter's DN, mapped user id, grid job id, local job id, stageIn job id, ...
- Part of TeraGrid's solution to support community credential access from gateways
- Provides the capability for a TG grid user to get TG usage info using a grid job id (generated from the GRAM EPR)
- Audit DB records provide the join between the grid job id and the local TG accounting DB (job records)
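The audit-to-accounting join described above can be illustrated with a throwaway SQLite schema. The table and column names below are hypothetical, not the real TeraGrid or GRAM audit schemas; the point is only that the audit record carries both the grid job id and the local job id, so it bridges the two databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical GRAM audit table: maps a grid job id (derived from the
# job's EPR) to the local scheduler's job id.
cur.execute("CREATE TABLE gram_audit "
            "(grid_job_id TEXT, local_job_id TEXT, submitter_dn TEXT)")
# Hypothetical local accounting table keyed by the local job id.
cur.execute("CREATE TABLE lrm_accounting (local_job_id TEXT, charge REAL)")

cur.execute("INSERT INTO gram_audit VALUES (?, ?, ?)",
            ("grid-job-4b2", "12345.pbs", "/DC=org/CN=alice"))
cur.execute("INSERT INTO lrm_accounting VALUES (?, ?)",
            ("12345.pbs", 16.5))

# The audit record provides the join between grid job id and accounting.
cur.execute("""
    SELECT a.submitter_dn, l.charge
    FROM gram_audit a JOIN lrm_accounting l
      ON a.local_job_id = l.local_job_id
    WHERE a.grid_job_id = ?""", ("grid-job-4b2",))
row = cur.fetchone()
print(row)  # ('/DC=org/CN=alice', 16.5)
```

This is why a gateway user who only knows the grid job id can still recover usage information: the join key (the local job id) lives in the audit record, not with the client.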
39. TG Gateway Job Accounting
- No changes required to AMIE; OGSA-DAI provides virtualization for the audit and accounting DBs
[Diagram, within the TeraGrid Resource Provider (RP): the client/gateway creates a job via MJFS in the GT4 Java Container, gets back an EPR, and controls the job (MEJS) with it; the Core, Delegation, RFT, and GRAM services each write their own audit table; the resource manager, reached through sudo and the RM adapter, runs the user job(s) and writes the RM log that the SEG follows; RM accounting feeds the local AMIE accounting DB, which is uploaded via AMIE to the central TG accounting DB; the gateway locally converts the EPR to a grid JID and queries OGSA-DAI with it, receiving the accounting record in reply]
40. Goals for Service Quality
41. GRAM should...
- Add little to no overhead over the underlying batch system
- Load: CPU, memory, disk
- Be able to manage as many jobs as the underlying batch system
- Max concurrency
- Be able to keep a large cluster filled with jobs
- Throughput
- Be reliable
- Not fall over / crash
- Service-initiated recovery
42. Effect on Load
- UCSD testing results, January 2008, Terrence Martin
- Version: GRAM4 in GT 4.0.5
- Scenario: Condor-G submitting 1000 jobs to the same GRAM4 service
- Service load was comparable to GRAM2
- Client memory use was significantly higher than GRAM2
- This was improved in 4.0.7; saving cloned EPRs reduced memory significantly
- Client load was higher than GRAM2
- Probably mostly due to the inefficient use of memory
- Retesting with 4.0.7 is coming
43. Max Concurrency
- Total jobs a GRAM4 service can manage at one time without failure
- Early 4.0 testing achieved 32,000
- Has not been an issue
44. Throughput
- Number of jobs GRAM4 can process per second
- We are improving and formalizing our throughput testing
- 1000 Condor-G jobs with:
- File staging, unique job dir, and file cleanup, to Condor as the LRM
- Achieved processing 1 job every 2 seconds
- File staging, unique job dir, and file cleanup, to Fork as the LRM
- Achieved processing 1 job every 1 second
- No staging, no delegation, to Fork as the LRM
- Achieved processing 2 jobs every 1 second
45. Throughput Results
- Average seconds (3 runs), fork, 1000 jobs
- Condor-G to GRAM to Condor LRM
46. Job Bursts
- Many simultaneous job submissions
- Recent TeraGrid testing showed a limit around 200 simultaneous jobs (upcoming TG08 paper)
- Most basic job was tested
- No staging, no delegation
- Are the error conditions acceptable?
- A job should be rejected or time out before overloading the service container or service host
- For tests > 200, many client-side timeouts, which is OK
- But also other service/container issues that need further investigation
47. Reliability
- Service/container should not fall over / crash
- Recent TeraGrid testing also showed a limit at 10,000 jobs
- Repeating the 200-simultaneous-job submission tests until 10,000 jobs in total results in the container requiring a restart; needs investigation
- OSG/CMS/ATLAS goal
- Handle 10k jobs from 2-3 users from 2-3 Condor-G instances, reliably
- Basic service recovery tests are successful; more sophisticated tests need to be written
48. GRAM4 Improvements in the 4.2 Series
- Job termination interface
- Asynchronous model replaces the blocking destroy operation
- Prevents a core thread being consumed per destroy
- New final job states
- userCancelDone, userCancelFailed
- Better job lifetime controls for users and admins
- maxJobLifetime, jobTTLAfterProcessing
- Notification interface for the GramJob API
- Now supports a single client notification consumer for many job submissions
49. GRAM4 Improvements in the 4.2 Series
- Improved documentation
- Multi-threaded client example
- Using the GRAM4 throughput-tester program
- Useful for higher-level clients as optimized code examples
- GridWay, Condor-G, Science Gateways, Portals
- CEDPS logging
- Service logging formatted to be parsable / scriptable
- Related service events are linked with IDs
- Core session ID <-> security authorization <-> job submission
- Migration guides
- GRAM2 to GRAM4
- GRAM4 from 4.0 to 4.2
50. GRAM4 Dependencies: Improvements in the 4.2 Series
- Java WS Core
- Upgrade to the final WSRF specifications
- Makes the 4.0 and 4.2 series wire-incompatible
- Multiple service deployments during the transition
- Notification performance improvements
- Also in the 4.0 series since 4.0.6
- HTTP connection caching
- Maybe only useful for submission and notifications
- Must reuse the client stub to benefit
51. GRAM4 Dependencies: Improvements in the 4.2 Series
- Java Authorization Framework
- Pluggable PIPs, PDPs, and combining algorithm
- More sophisticated authorization capabilities for GRAM4
- GridShib
52. GRAM4 Dependencies: Improvements in the 4.2 Series
- RFT
- GridFTP connection caching
- Manages a cache of GridFTP server connections among all RFT requests
- Improves performance and reliability for GRAM4 file-staging jobs (30% in some tests)
- Also in the 4.0 series since 4.0.6
- Exponential backoff and retry improvements
53. Dev.Globus - Open Development
- Globus governance model based on Apache
- Developers (committers) control the direction of software components (projects)
- dev.globus.org
- GRAM project
- dev.globus.org/wiki/GRAM
- Email lists
- gram-user, -dev, -announce, -commit
- GT project email lists
- gt-user, gt-dev
54. Documentation
- 4.0.x GRAM documentation
- Guides: admin, user, developer, overview, public interface
- www.globus.org/toolkit/docs/4.0/execution/wsgram/
- 4.2.x GRAM documentation (coming end of May 2008)
- www.globus.org/toolkit/docs/4.2/execution/wsgram/
- Main 4.0.x documentation
- www.globus.org/toolkit/docs/4.0/
- Download, release notes, links to all GT projects/components
55. Future Plans
56. GRAM Auditing v2
- Strong community feedback
- Feature requests from OSG, SURA, D-Grid, APAC
- Getting a development contribution from Shawn McKee (University of Michigan)
- Adding the actions and timestamps needed for a complete view of service interactions
- Improve security
- Remove the need for update privileges to the DB; inserts only
- Desire for integration with LRM accounting
- GRAM Fork accounting wanted
- Design the complete system, but leave accounting integration to others
- Targeting 4.2, Q4 2008
57. v2 Auditing Tables
[Diagram of the audit tables written from the GT4 Java Container:]
- Core audit table: written after a request is received; includes the client hostname/IP address
- Delegation audit table
- RFT audit table
- Security audit table: written after a request is authorized; includes credential attributes
- GRAM audit table(s), written by MJFS/MEJS: after a job is received, after it is submitted, after it is done, and after job termination starts (client-initiated, lifetime expired, or service processing failed)
58. Improve Reliability
- Improve tests to simulate user scenarios
- Work with TG and OSG users on the current issues and limitations
- Make improvements where necessary
- GRAM4, RFT, Java WS Core, higher-level clients
59. Prologue / Epilogue?
- A recent request for the ability to have GRAM4 define and run a prologue and epilogue
- Maps to the LRM capability for this same functionality
- Along with the LRM job submission, include:
- Prologue: a program run before the main application execution
- Epilogue: a program run after the main application execution
- Seems like valuable functionality; needs investigation for use with various LRMs
- http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=5698
60. Standards Compliance
- JSDL
- Hopefully can resume JSDL work in 2008
- OGSA-BES
- No specific plans yet for support
61. Globus Advance Reservation Service (GARS)
62. Globus Advance Reservation Service (GARS)
- Globus incubator project
- Enables GT4 users to create and manage advance reservations of compute nodes
- Leverages the advance reservation support provided by LRMs via an adapter interface (a la GRAM)
- MOAB, Catalina, MAUI
- Leverages the GT4 authn and authz security model / callouts
- Service middleware for co-schedulers
- Complementary with execution services like GRAM
- SC07 demoed an alpha version of GARS
- GUR co-scheduled 2 GARS instances running on TeraGrid RPs
- Testbed running on TeraGrid
- UC/ANL, SDSC, NCSA
63. GARS Approach
[Diagram, inside the GT WSRF container: the client creates reservations through the ARFS and manages them through the ARS, which talks to the LRM via an LRM adapter; the client creates jobs through MJFS and manages them through the MJS; jobs J1, J2, and J3 run on the compute cluster within the reservation]
- The client creates:
- a reservation for 4 nodes
- a 1st job (J1) with 4 processes
- a 2nd job (J2) with 2 processes
- a 3rd job (J3) with 3 processes
64. Thanks to the GRAM developers!
- Martin Feller - UC
- Joe Bester - ANL
- John Bresnahan - ANL
- Ravi Madduri - ANL
- Dina Sulakhe - UC
- Plus the entire GT dev team