Title: gLite
1. gLite
2. Grid?
- Grid
- Many machines
- Across many locations and administrative domains
- Grid middleware runs on different systems
- High Performance Computer / Cluster
- High capacity Storage
- Meets the needs of scientific computing
- A Virtual Organization (VO) corresponds to a group of people who want to share computing, data or software resources
- Grid trusts VOs
- Users join a VO
- Virtual organisation contributes resources
- negotiates access
- Additional services also enable the grid
- Operation
- Dissemination
Internet
3. Authentication and Authorisation (AA)
Users in many locations and organisations
Access services (user interface), i.e. log on, upload credentials, run middleware commands
Built on the Grid Security Infrastructure: encryption and data integrity, authentication and authorization
Gatekeeping: authenticate users and grant permissions
Resources in many locations and organizations
System software
Operating system
Local scheduler
File system
NFS,
Hardware
Computing clusters,
Network resources
Data storage
dCache, HPSS, CASTOR
PBS, Condor, LSF,
4. Basic Job Management
Users
- Tools for
- Submit jobs to a CE
- Monitor jobs
- Get outputs
- Transfer files to CE
- Transfer files between CE and SE (Data storage)
How do I run a job on a compute element (CE)? (a CE is a batch queue)
Resources
Compute elements
Data storage
Network resources
5. Information Service (IS)
Users
- Information Service (IS)
- Resources such as CEs and SEs report their status to the IS
- Grid services query the IS before running jobs
How do I know which CE could run my job? Which is
free?
Resources
Compute elements
Data storage
Network resources
6. File management
Users
- Storage
- Transfer
- Replication management
We've terabytes of data in files.
My data are in files, and I've terabytes.
Our data are in files, and we've terabytes.
Resources
Compute elements
Data storage
Network resources
7. gLite - main components
User Interface (UI): the place where users access the Grid
Computing Element (CE): a batch queue on a site's computers where the user's job is executed
Storage Element (SE): provides (large-scale) storage for files
8. Current production middleware
9. Interface and Logging & Bookkeeping
- User Interface (UI)
- a specialized computer with gLite client software
- Command Line Interface to gLite
- Provides API for own programs
- Authentication Authorization
- via X509 certificate
- A credential proxy is used to fulfil the AA requirements
- The system is based on the Virtual Organization Membership Service (VOMS) for VO orchestration
- Logging and Bookkeeping (LB)
- holds job states
- billing
10. Network Server (NS)
- Usage
- gLite jobs are written in the Job Description Language (JDL)
- JDL-based jobs are sent to the Resource Broker (RB)
- The JDL can contain information about
- Input Sandbox: files to be sent with the job or registered in a storage element
- Output Sandbox: files containing results (created by the job application)
11. Resource Broker (RB)
- Resource Broker (RB)
- The Network Server (NS) is part of the Resource Broker (RB) system
- The NS is responsible for the coordination of sandboxes
- The NS takes care of proxy propagation
- The RB performs matchmaking: it selects the best CE and associated SE for your job
- gets available CEs and SEs from the information index
- incorporates replica catalogue information
- one data file can have several replicas at different sites
- matchmaking can be modified by the user via
- a rank expression
- a requirements field
12. How does the RB determine a proper CE?
- The RB ranks the available CEs (per user) by querying the Top-BDII
- Only VO-specific sites are evaluated
- The Job Description Language (JDL) offers expression facilities to rank sites
- By default, ranking prefers the soonest execution time
- Rank = - other.GlueCEStateEstimatedResponseTime
- Requirements
- Requirements = RegExp(".grid.uni-dortmund.de", other.GlueCEUniqueId)
- User-specific resource selection at command-line job submission
- Resources with the same rank are selected randomly
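Put together in a JDL file, the two mechanisms above look like this (the CE name pattern is the one from the slide; treat the values as illustrative, not as a recommended configuration):

```
Rank = - other.GlueCEStateEstimatedResponseTime;
Requirements = RegExp(".grid.uni-dortmund.de", other.GlueCEUniqueId);
```

With this Rank, the RB prefers the CE with the shortest estimated response time among those whose CE ID matches the regular expression.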
13. Different flavours: lcg-CE vs. glite-CE
- The lcg-CE has long been in production mode
- High acceptance
- High job throughput
- Used by most sites
- Preferred by the DGI Referenzinstallation
- The glite-CE is still in a development state
- Has to be improved for high job throughput
- Partially based on CondorC
- ready for the upcoming GT4 on CEs
- Uses the Condor glide-in concept
- New concepts and architecture, which we will address
14. Job Execution
- Resource Broker
- Matchmaking selects an appropriate CE (and maybe an associated SE)
- Sends the job description and the additional Input Sandbox to the selected CE
- Computing Element
- The Gatekeeper of the CE ensures AA of the user job
- The CE is responsible for data stage-in (Input Sandbox)
- The CE and the batch system produce output
- The LB gets information from the CE about the job state
15. CLI: glite-job-* commands (NS)
- glite-job-list-match
- Test your newly written JDL
- List information about available CEs and SEs
- Try your filter: rank and requirements expressions
- glite-job-submit
- Submit JDL to Resource Broker (RB)
- glite-job-status
- Fetch information about the job's current state
- Small output compared to glite-job-logging-info
- Contacts LB
- glite-job-output
- For successfully finished jobs, collects the resulting output
16. Job State Machine
Submitted: the job has entered the system (via the UI) but has not yet been transferred to the Network Server for processing.
Waiting: the job has been accepted by the NS and is waiting for Workload Manager processing, or is being processed by WMHelper modules.
Ready: the job has been processed by the WM but not yet transferred to the CE (local batch system queue).
Scheduled: the job is waiting in the queue on the CE.
Running: the job is running.
Done: the job exited or is considered to be in a terminal state by CondorC (e.g., submission to the CE has failed in an unrecoverable way).
Aborted: job processing was aborted by the WMS (waiting in the WM queue or on the CE for too long, expiration of user credentials).
Cancelled: the job has been successfully cancelled on user request.
Cleared: the output sandbox was transferred to the user or removed due to a timeout.
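The transitions above can be summarized as a small table. The following is an illustrative model of the state machine described on this slide, not gLite code; state names are the LB states, and the transition set is read directly from the descriptions:

```python
# Sketch of the gLite job state machine from this slide (illustrative only).
ALLOWED = {
    "Submitted": {"Waiting", "Aborted", "Cancelled"},
    "Waiting":   {"Ready", "Aborted", "Cancelled"},
    "Ready":     {"Scheduled", "Aborted", "Cancelled"},
    "Scheduled": {"Running", "Aborted", "Cancelled"},
    "Running":   {"Done", "Aborted", "Cancelled"},
    "Done":      {"Cleared"},      # output retrieved or timed out
    "Aborted":   set(),            # terminal
    "Cancelled": set(),            # terminal
    "Cleared":   set(),            # terminal
}

def advance(state, target):
    """Return the new state if the transition is legal, else raise."""
    if target not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

# A successful job walks the happy path:
path = ["Submitted", "Waiting", "Ready", "Scheduled", "Running", "Done", "Cleared"]
s = path[0]
for nxt in path[1:]:
    s = advance(s, nxt)
print(s)  # Cleared
```

The happy path matches the stateEnterTimes listing shown later in the glite-job-status -v 3 example.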
17. Example: hostname.jdl
> cat hostname.jdl
Type = "Job";
JobType = "Normal";
Executable = "/bin/sh";
Arguments = "start_hostname.sh";
StdError = "stderr.log";
StdOutput = "stdout.log";
InputSandbox = {"start_hostname.sh"};
OutputSandbox = {"stderr.log", "stdout.log"};
RetryCount = 7;

> cat start_hostname.sh
#!/bin/sh
sleep 5
hostname -f
18. Example: hostname.jdl
- > glite-job-list-match hostname.jdl
- Selected Virtual Organisation name (from proxy certificate extension): gilda
- Connecting to host glite-rb.ct.infn.it, port 7772
- COMPUTING ELEMENT IDs LIST
- The following CE(s) matching your job requirements have been found:
- CEId
- dgt01.ui.savba.sk:2119/jobmanager-lcgpbs-infinite
- dgt01.ui.savba.sk:2119/jobmanager-lcgpbs-long
- dgt01.ui.savba.sk:2119/jobmanager-lcgpbs-short
- egee008.cnaf.infn.it:2119/blah-pbs-infinite
- egee008.cnaf.infn.it:2119/blah-pbs-long
- egee008.cnaf.infn.it:2119/blah-pbs-short
19. Example: hostname.jdl
- > glite-job-submit -o jobid hostname.jdl
- Selected Virtual Organisation name (from proxy certificate extension): gilda
- Connecting to host glite-rb.ct.infn.it, port 7772
- Logging to host glite-rb.ct.infn.it, port 9002
- glite-job-submit Success
- The job has been successfully submitted to the Network Server. Use the glite-job-status command to check the current job status. Your job identifier is:
- https://glite-rb.ct.infn.it:9000/Lb6LIhD93S7VYz1RVbCP8A
- The job identifier has been saved in the following file: /home/fscibi/gLite/Other/jobid
- > cat jobid
- Submitted Job Ids: https://glite-rb.ct.infn.it:9000/Lb6LIhD93S7VYz1RVbCP8A
20. Example: hostname.jdl
- > glite-job-status -i jobid
- BOOKKEEPING INFORMATION
- Status info for the Job: https://glite-rb.ct.infn.it:9000/Lb6LIhD93S7VYz1RVbCP8A
- Current Status: Done (Success)
- Exit code: 0
- Status Reason: Job terminated successfully
- Destination: grid004.iucc.ac.il:2119/jobmanager-lcgpbs-short
- Submitted: Mon Apr 3 12:27:28 2006 CEST
21. Example: hostname.jdl
- > glite-job-output -i jobid
- Retrieving files from host glite-rb.ct.infn.it (for https://glite-rb.ct.infn.it:9000/Lb6LIhD93S7VYz1RVbCP8A)
- JOB GET OUTPUT OUTCOME
- Output sandbox files for the job
- https://glite-rb.ct.infn.it:9000/Lb6LIhD93S7VYz1RVbCP8A
- have been successfully retrieved and stored in the directory
- /tmp/glite/glite-ui/fscibi_Lb6LIhD93S7VYz1RVbCP8A
- > glite-job-output -i jobid --dir <dirname>
22. Example: hostname.jdl
- > glite-job-status -v 3 -i jobid
- BOOKKEEPING INFORMATION
- Status info for the Job: https://glite-rb.ct.infn.it:9000/Lb6LIhD93S7VYz1RVbCP8A
- Current Status: Cleared
- Status Reason: user retrieved output sandbox
- Destination: grid004.iucc.ac.il:2119/jobmanager-lcgpbs-short
- Submitted: Mon Apr 3 12:27:28 2006 CEST
- stateEnterTimes:
- Submitted: Mon Apr 3 12:27:28 2006 CEST
- Waiting: Mon Apr 3 12:27:37 2006 CEST
- Ready: Mon Apr 3 12:27:42 2006 CEST
- Scheduled: Mon Apr 3 12:28:01 2006 CEST
- Running: Mon Apr 3 12:28:55 2006 CEST
- Done: Mon Apr 3 12:30:37 2006 CEST
- Cleared: Mon Apr 3 15:36:39 2006 CEST
- Aborted: ---
- Cancelled: ---
- Unknown: ---
23. Extended JDL example
- > cat sphere.jdl
- // author: giuseppe.larocca_at_ct.infn.it
- Type = "Job";
- JobType = "Normal";
- Executable = "/bin/sh";
- MyProxyServer = "lxshare0207.cern.ch";
- StdOutput = "sphere.out";
- StdError = "sphere.err";
- InputSandbox = {"start_sphere.sh", "sphere1.pov", "sphere1.ini"};
- OutputSandbox = {"sphere.out", "sphere.err", "final_sphere.gif"};
- RetryCount = 7;
- Arguments = "start_sphere.sh";
- Requirements = Member("POVRAY-3.5", other.GlueHostApplicationSoftwareRunTimeEnvironment);
24. Workload Manager Proxy (WMProxy)
- A new service providing access to the gLite Workload Management System (WMS) functionality through a simple Web Services based interface
- Designed to handle a large number of requests for job submission
- Provides additional features such as bulk submission and support for shared and compressed sandboxes for compound jobs
- It is the natural replacement of the NS in the passage to the SOA approach
- Supports DAG submission
- all JDL conversions are performed on the server
- a single submission for several jobs
25. Workload Manager Proxy (WMProxy)
- All new request types can be monitored and controlled through a single handle (the request id)
- each sub-job can, however, be followed up and controlled independently through its own id
- Smarter WMS client commands/API
- allow submission of DAGs, collections and parametric jobs, exploiting the concept of a shared sandbox
- allow automatic generation and submission of collections and DAGs from sets of JDL files located in user-specified directories on the UI
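As a sketch of the parametric-job support mentioned above: the WMProxy JDL replaces the placeholder _PARAM_ with each parameter value, producing one sub-job per value. The attribute names follow the gLite WMS JDL; the script and file names here are made up for illustration:

```
Type = "Job";
JobType = "Parametric";
Executable = "/bin/sh";
Arguments = "run.sh input_PARAM_.dat";
Parameters = 3;
ParameterStart = 0;
ParameterStep = 1;
InputSandbox = {"run.sh", "input_PARAM_.dat"};
StdOutput = "out_PARAM_.log";
StdError = "err_PARAM_.log";
OutputSandbox = {"out_PARAM_.log", "err_PARAM_.log"};
```

Submitted via glite-wms-job-submit, this would expand into three sub-jobs (parameter values 0, 1, 2), all monitorable through the single request id.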
26. CLI: WMProxy commands
- The commands to interact with the WMProxy Service are:
- glite-wms-job-submit <jdl_file>
- glite-wms-job-list-match <jdl_file>
- glite-wms-job-cancel <job_Ids>
- glite-wms-job-output <job_Ids>
27. WMProxy: submitting a collection of jobs
- Place all JDLs to be submitted in a directory
- for example ./Collect
- voms-proxy-init --voms gilda
- glite-wms-job-delegate-proxy -d DelegString
- glite-wms-job-submit -d DelegString -o myJIDs --collection ./Collect
- glite-wms-job-status -i myJIDs
- glite-wms-job-output -i myJIDs
28. gLite Grid Middleware Services
Access
API
CLI
Security Services
Authorization
Information Monitoring
Services
Application Monitoring
Information Monitoring
Auditing
Authentication
Data Management
Workload Mgmt Services
Metadata Catalog
Job Provenance
Package Manager
File & Replica Catalog
Accounting
Storage Element
Data Movement
Computing Element
Workload Management
Connectivity
29. gLite Job Management Services
DATA MANAGEMENT
Job Adapter
WM
Dagman
WM PROXY
TASK QUEUE
MATCHMAKER
Condor
Job Controller
LOGGING & BOOKKEEPING
User Interface
JOB MONITORING SUBMISSION
NETWORK SERVER
Job is done!
Job is running!
ACCESS POLICY MANAGEMENT
INFORMATION SUPERMARKET
INFORMATION UPDATER
Wait, there are not enough resources available for this job!
OK, now the job's resource requirement is fulfilled!
INFORMATION SYSTEM
Computing Element
30. WMS components
- WMS components handling the job during its lifetime and performing the submission
- Network Server (NS)
- is responsible for
- accepting incoming requests from the UI
- authenticating the user
- obtaining a delegated full proxy from the user
- enqueuing the job to the Workload Manager
- Workload Manager (WM)
- is responsible for
- calling the Matchmaker to find the resource which best fits the job requirements
- interacting with the Information System and the File Catalog
- calculating the ranking of all the matched resources
- CondorC
- Information Supermarket (ISM)
- is responsible for
- basically consists of a repository of resource information that is available in read-only mode to the matchmaking engine
31. WMS components
- WMS components handling the job during its lifetime and performing the submission
- Job Adapter
- is responsible for
- making the final touches to the JDL expression for a job before it is passed to CondorC for the actual submission
- creating the job wrapper script that sets up the appropriate execution environment on the CE worker node
- transfer of the input and output sandboxes
- Job Controller (JC)
- is responsible for
- converting the Condor submit file into a ClassAd
- handing the job over to CondorC
- CondorC
- is responsible for
- performing the actual job management operations
- job submission, job removal
- Log Monitor
- is responsible for
- watching the CondorC log file
- intercepting interesting events concerning active jobs
- events affecting the job state machine
32. WMS -> CE glide-in (current gLite release)
Gatekeeper (GT 2.4.x)
CE Monitor
CondorC (glide-in/GAHP)
update IS
fork via Globus
start job
sched fork
Blahpd (GAHP)
CondorC Jobmanager
Gridmanager
CE
WMS
PBS/Torque
LRMS
Condor
LSF
33. Computing Element Components
- Gatekeeper
- grants access to the CE
- authenticates users and maps them to local accounts
- forks the globus-jobmanager
- globus-jobmanager
- forks Condor-C (on the CE) to help submit jobs to batch systems
- BLAHPD (Batch Local ASCII Helper Protocol Daemon)
- offers a unique interface for Condor-C (on the CE) to submit jobs to different batch systems
- BLAHPD commands are used by Condor-C (on the CE) to submit jobs to the batch system
- Batch System
- handles job execution on the available local worker nodes
- the batch system consists of
- the TORQUE (a.k.a. OpenPBS) resource manager
- the MAUI job scheduler
- a cluster MUST be homogeneous
- Worker nodes
- hosts executing the jobs
- responsible for downloading and uploading job data from or to the WMS or SE
34. Overview: Packages & Job Flow
Manager
WMS Core component, instantiates Helper
Helper
selectable by environment variable
Job submission
JDL
Finds job-compatible CE
Broker
Selector
Returns best fitting CE
Queries dCache SEs for file state
dCache EIS
35. The Grid Data Management Challenge
- Heterogeneity
- Data are stored on different storage systems using different access technologies
- Distribution
- Data are stored in different locations; in most cases there is no shared file system or common namespace
- Data need to be moved between different locations
- Need common interface to storage resources
- Storage Resource Manager (SRM)
- Need to keep track where data is stored
- File and Replica Catalogs
- Need scheduled, reliable file transfer
- File transfer and placement services
36. Data Management Services Overview
- Storage Element
- stores data and provides a common interface
- Storage Resource Manager (SRM)
- Castor, dCache, DPM, ...
- Native access protocols
- rfio, dcap, nfs, ...
- Transfer protocols
- gsiftp, ftp, ...
- Catalogs
- keep track of where data is stored
- File Catalog
- Replica Catalog
- File Authorization Service
- Metadata Catalog
- File Transfer
- schedules reliable file transfer
- Data Scheduler
- only designs exist so far
37. SRM in an example
- She is running a job which needs
- Data for physics event reconstruction
- Simulated Data
- Some data analysis files
- She will write files remotely, too
They are at CERN, in dCache
They are at Fermilab, in a disk array (US)
They are at Nikhef, in a classic SE (NL)
38. SRM in an example
- SRM talks to them on your behalf
- SRM will even allocate space for your files
- SRM will use transfer protocols to send your files there
dCache: its own system, its own protocols and parameters
Without SRM, you as a user would need to know all the systems!
classic SE: an independent system, different from dCache or Castor
SRM
Castor: no connection with dCache or the classic SE
39. Storage Resource Management
- Data are stored on disk pool servers or Mass Storage Systems
- Storage resource management needs to take into account
- transparent access to files (migration to/from the disk pool)
- file pinning
- space reservation
- file status notification
- lifetime management
- The SRM (Storage Resource Manager) takes care of all these details
- The SRM is a Grid Service that handles local storage interaction and provides a Grid interface to the outside world
40. Grid Storage Requirements
- Manage local storage and interface to Mass Storage Systems like
- HPSS, CASTOR, DiskeXtender (UNITREE), ...
- Provide an SRM interface
- Support basic file transfer protocols
- GridFTP mandatory
- others if available (https, ftp, etc.)
- Support a native I/O access protocol
- POSIX(-like) I/O client library for direct access of data
41. What is a catalog?
File Catalog
SE
SE
SE
42. Files & replicas: Name Conventions (LFC)
- Symbolic Link: in logical filename space
- Logical File Name (LFN)
- an alias created by a user to refer to some item of data, e.g. lfn:cms/20030203/run2/track1
- Globally Unique Identifier (GUID)
- a non-human-readable unique identifier for an item of data, e.g. guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
- Site URL (SURL) (or Physical File Name (PFN) or Site FN)
- the location of an actual piece of data on a storage system, e.g. srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM) or sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (classic SE)
- Transport URL (TURL)
- temporary locator of a replica plus an access protocol understood by an SE, e.g. rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
[Diagram: an LFN and its symbolic links map to one GUID in the File and Replica Catalog; the GUID maps to physical files (SURL 1..n), each accessed via a TURL through SRM.]
43. The LHC File Catalog (LFC)
- It keeps track of the location of copies (replicas) of Grid files
- The LFN acts as the main key in the database. It has
- symbolic links to it (additional LFNs)
- a Unique Identifier (GUID)
- system metadata
- information on replicas
- one field of user metadata
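A toy model of these catalog relations may help: one GUID ties together the user-visible LFNs and the physical replicas (SURLs). This is illustrative only, not the LFC API; the entries reuse the example names from the naming-conventions slide:

```python
# Toy catalog: GUID -> {LFN aliases, SURLs of physical replicas}.
# Illustrative sketch of the LFC data model, not the real LFC interface.
catalog = {
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6": {
        "lfns": ["lfn:cms/20030203/run2/track1"],   # user aliases / symlinks
        "surls": [                                   # physical replicas
            "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
            "sfn://lxshare0209.cern.ch/data/alice/ntuples.dat",
        ],
    }
}

def replicas_for_lfn(lfn):
    """Resolve an LFN to the SURLs of its replicas via the GUID (sketch)."""
    for guid, record in catalog.items():
        if lfn in record["lfns"]:
            return record["surls"]
    return []

print(len(replicas_for_lfn("lfn:cms/20030203/run2/track1")))  # 2
```

Matchmaking uses exactly this kind of lookup: given the LFNs a job needs, the RB can rank CEs by how close they are to a site holding a replica.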
44. Data Management CLIs & APIs
- lcg_utils: lcg-* commands and lcg_* API calls
- provide (all) the functionality needed by the LCG user
- transparent interaction with file catalogs and storage interfaces when needed
- abstraction from the technology of specific implementations
- Grid File Access Library (GFAL) API
- adds file I/O and explicit catalog interaction functionality
- still provides the abstraction and transparency of lcg_utils
- edg-gridftp tools (CLI)
- complete the lcg_utils with low-level GridFTP operations
- functionality available as an API in GFAL
- generalized as lcg-* commands