Title: EGEE and gLite
1 EGEE and gLite
- Kyung-Lang Park
- (2005. 9. 6)
2 Definitions
- LCG project: LHC Computing Grid Project
- LHC: Large Hadron Collider at CERN
- EGEE project: Enabling Grids for E-Science in Europe
- LCG-2: EGEE middleware based on GT2 (Globus 2)
- gLite: EGEE middleware based on WS (web services)
- GILDA: Training infrastructure
3 Contents
- EGEE Overview
- gLite Overview
- GILDA Practicals
- Conclusion
4 EGEE Overview
5 Motivation
- Science is becoming increasingly digital and needs to deal with increasing amounts of data
- Particle Physics
  - Large amount of data produced
  - Large worldwide organized collaborations
- Large Hadron Collider (LHC) at CERN
  - 40 million collisions per second
  - 100,000 of today's fastest PC processors
- The solution: the Grid
  - HEP: LHC Computing Grid project (LCG)
  - Close integration of LCG and EGEE projects
6 The largest e-Infrastructure: EGEE
- Objectives
  - consistent, robust and secure service grid infrastructure
  - improving and maintaining the middleware
  - attracting new resources and users from industry as well as science
- Structure
  - 71 leading institutions in 27 countries, federated in regional Grids
  - leveraging national and regional grid activities worldwide
  - funded by the EU with 32 M Euros for the first 2 years, starting 1st April 2004
7 Service Usage
- VOs and users on the production service
  - Active VOs
    - HEP: 4 LHC, D0, CDF, Zeus, Babar
    - Biomed
    - ESR (Earth Sciences)
    - Computational chemistry
    - Magic (Astronomy)
    - EGEODE (Geo-Physics)
  - Registered users in these VOs: 600
  - Many local VOs, supported by their ROCs
- Scale of work performed
  - LHC Data challenges 2004
    - >1 M SI2K years of CPU time (1000 CPU years)
    - 400 TB of data generated, moved and stored
    - 1 VO achieved 4000 simultaneous jobs (4 times CERN grid capacity)
Figure: Number of jobs processed per month (April 2004 - April 2005)
8 EGEE infrastructure usage
- Average job duration, January 2005 - June 2005, for the main VOs
9 EGEE pilot applications
- High-Energy Physics (HEP)
  - Provides computing infrastructure (LCG)
  - Challenging
    - thousands of processors world-wide
    - generating petabytes of data
    - chaotic use of grid with individual user analysis (thousands of users interactively operating within experiment VOs)
- Biomedical Applications
  - Similar computing and data storage requirements
  - Major additional challenge
    - security and privacy
10 EGEE pilot applications
- Bioinformatics
- BioMed
- GPS@
- xmipp_Mlrefine
- Drug Discovery
- Medical Imaging
- Generic applications
- Earth sciences applications
11 gLite Overview
12 Grid middleware
- The Grid relies on advanced software, called middleware, which interfaces between resources and the applications
- The Grid middleware
  - Finds convenient places for the application to be run
  - Optimises use of resources
  - Organises efficient access to data
  - Deals with authentication to the different sites that are used
  - Runs the job and monitors progress
  - Recovers from problems
  - Transfers the result back to the scientist
13 EGEE Middleware: gLite
- First release of gLite: end of March 2005
  - Focus on providing users early access to prototype
  - Release 1.3 in Aug 05
- Intended to replace present middleware with production quality services
- Aims to address present shortcomings and advanced needs from applications
- Developed from existing components
- Interoperability and co-existence with deployed infrastructure
- Robust: performance and fault tolerance
- Open source license
- Prototyping: short development cycles for fast user feedback
- Initial web-services based prototypes being tested
Diagram: LCG-1 and LCG-2 (Globus 2 based), followed by gLite-1 and gLite-2 (Web services based)
14 gLite and computation
- Jobs are
  - (as in LCG) run from batch queues, termed computing elements (CEs)
  - Described in Job Description Language
- gLite also supports
  - Interactive jobs
    - Jobs run in batch mode; a listener receives messages from the CE
  - Parallelism using MPI
    - MPI jobs can run on CEs that support MPI, but not across administrative domains (not MPICH-G)
  - Workflow (DAGs, from Condor)
  - Checkpointing
  - Partitioned jobs (soon), e.g. Monte-Carlo
15 gLite and data
- Simple data
  - Files
  - Requires
    - Replica files
    - Move data to computation
    - Virtual filesystems
    - Metadata for files
    - File transfer
  - These services are amongst those provided in gLite
- Structured data
  - RDBMS, XML databases
  - Require extendable middleware tools to support
    - computation near to data
    - easy access, controlled by AA
    - integration and federation
  - Hence OGSA-DAI (DAI: Data Access and Integration)
  - OGSA-DAI is NOT currently being ported to gLite
16 EGEE middlewares face to face
- LCG (non-WS)
  - Security
    - GSI
  - Job Management
    - Condor, Globus
    - CE, WN
    - Logging and Bookkeeping
  - Data Management
    - LCG services
  - Information and Monitoring
    - BDII (evolution of MDS)
  - Grid Access
    - CLI, API
  - Operating system
- gLite (WS and non-WS)
  - Security
    - GSI and VOMS
  - Job Management
    - Condor, Globus, blahp
    - CE, WN
    - Logging and Bookkeeping
    - Job Provenance
    - Package management
  - Data Management
    - LFC
    - gLite-I/O, FiReMan
  - Information and Monitoring
    - BDII
    - R-GMA, Service Discovery
  - Grid Access
    - CLI, API, Web Services
  - Easier installation / configuration
  - Currently Scientific Linux; will be available on others, incl. Windows
17 gLite components overview
Diagram of gLite services (legend: available now / in the near future)
- Access Services: Grid Access Service, API, CLI
- Security Services: Authentication, Authorization, Auditing, Dynamic Connectivity
- Information and Monitoring Services: Information and Monitoring, Job Monitoring, Service Monitoring, Service Discovery
- Data Services: Metadata Catalog, File and Replica Catalog, Storage Element, Data Movement
- Job Management Services: Job Provenance, Package Manager, Accounting, Computing Element, Workload Management, Site Proxy
18 Overview of gLite JMS
- Job Management Services
  - main services related to job management/execution are
- computing element
  - job management (job submission, job control, etc.), but it must also provide information about its characteristics and status
- workload management
  - core component, discussed in detail
- accounting
  - special case, as it will eventually take into account computing, storage and network resources
- job provenance
  - keeps track of the definition of submitted jobs, execution conditions and environment, and important points of the job life cycle for a long period
  - debugging, post-mortem analysis, comparison of job executions
- package manager
  - automates the process of installing, upgrading, configuring, and removing software packages from a shared area on a grid site
  - extension of a traditional package management system to a Grid
19 Architecture Overview
Diagram labels: Resource Broker Node (Workload Manager, WM); Job status; Storage Element
20 WMS Architecture
21 Job State Machine
22 gLite deployment scenario
24 The GILDA t-Infrastructure
- Why t-infrastructure?
  - e-Infrastructure for production
  - t-Infrastructure for training
- Need guaranteed response for tutorials; limit the vulnerability of production systems
  - use training grid
  - have training CA
  - able to change middleware to prepare participants for future releases on production system
- Also
  - need safe resources for installation training
  - easy entry point for new communities
25 The GILDA project (https://gilda.ct.infn.it)
26 The GILDA Test-bed (https://gilda.ct.infn.it/testbed.html)
15 sites in 3 continents!
27 The GILDA Services (https://gilda.ct.infn.it/testbed.html)
Ready for gLite!
28 WMS layout in GILDA
Diagram labels: RB, LCG, GILDA sites
29 GRID Security: the players
30 Digital certificates
- Authentication and authorization of users and resources is done through digital certificates, in X.509 format
- Certification Authority (CA)
  - Issues digital certificates for users and machines
  - Checks the identity and the personal data of the requestor
    - Registration Authorities (RAs) do the actual validation
  - CAs periodically publish a list of compromised certificates
    - Certificate Revocation Lists (CRL) contain all the revoked certificates yet to expire
  - CA certificates are self-signed
- For each player, a CA guarantees its authenticity with a certificate
31 Certificate Use
- Digital certificates are split into public/private keys
- The public key is spread along the net, while the private key stays encrypted on the disk
- Default location for public/private keys is $HOME/.globus (attention to file permissions)
- ls -l $HOME/.globus
- -rw-r--r-- 1 local local 1143 Jun 30 16:01 usercert.pem
- -r-------- 1 local local 963 Jun 30 16:01 userkey.pem
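The warning about file permissions can be checked with a few lines of Python. This is only a minimal sketch: the helper name `key_is_private` is hypothetical, and the path is the default location quoted on the slide.

```python
import os
import stat
from pathlib import Path


def key_is_private(path):
    """Return True if group and others have no access at all to `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # No read/write/execute bit may be set for group or others.
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    # Default key location from the slide; adjust if your setup differs.
    key = Path.home() / ".globus" / "userkey.pem"
    if key.exists():
        print(key, "OK" if key_is_private(key) else "too permissive")
    else:
        print(key, "not found")
```

With the listing above, `userkey.pem` (mode `-r--------`) passes the check, while a key left at `-rw-r--r--` would be reported as too permissive.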
32 Verify your certificate
- To get information on your certificate, run
- > openssl x509 -in .globus/usercert.pem -noout -text
- Certificate:
- Data:
- Version: 3 (0x2)
- Serial Number: 1783 (0x6f7)
- Signature Algorithm: md5WithRSAEncryption
- Issuer: C=IT, O=GILDA, CN=GILDA Certification Authority
- Validity
- Not Before: Jun 30 07:14:13 2005 GMT
- Not After : Jul 30 07:14:13 2005 GMT
- Subject: C=IT, O=GILDA, OU=Personal Certificate, L=SEOUL, CN=SEOUL20/Email=roberto.barbera@ct.infn.it
- ......
33 X.509 proxy certificates
- GSI extension to X.509 Identity Certificates
  - signed by the normal end entity cert (or by another proxy)
- Support some important features
  - Delegation and mutual authentication
- Has a limited lifetime (minimized risk of compromised credentials)
- It is created by the grid-proxy-init command
- > grid-proxy-init
- Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=SEOUL/CN=SEOUL20/Email=roberto.barbera@ct.infn.it
- Enter GRID pass phrase for this identity:
- Creating proxy ................................................................. Done
- Your proxy is valid until: Mon Jul 18 07:14:28 2005
Grid Pass Phrase: SEOUL
34 Inspecting your proxy
- With grid-proxy-info you can inspect info about your proxy
- > grid-proxy-info -all
- subject : /C=IT/O=GILDA/OU=Personal Certificate/L=SEOUL/CN=SEOUL20/Email=roberto.barbera@ct.infn.it/CN=proxy
- issuer : /C=IT/O=GILDA/OU=Personal Certificate/L=SEOUL/CN=SEOUL20/Email=roberto.barbera@ct.infn.it
- identity : /C=IT/O=GILDA/OU=Personal Certificate/L=SEOUL/CN=SEOUL20/Email=roberto.barbera@ct.infn.it
- type : full legacy globus proxy
- strength : 512 bits
- path : /tmp/x509up_u500
- timeleft : 11:57:24
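A script that needs the remaining lifetime can parse this output. The sketch below assumes the `key : value` layout and an `H:MM:SS` timeleft field, as reconstructed from the slide; the function names and the `SAMPLE` text are illustrative, not part of any Globus API.

```python
def parse_proxy_info(text):
    """Parse 'key : value' lines into a dict; lines without ':' are skipped."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so 'timeleft : 11:57:24' stays intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def timeleft_seconds(info):
    """Convert the 'timeleft' field (H:MM:SS) into seconds."""
    hours, minutes, seconds = (int(part) for part in info["timeleft"].split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Assumed sample, modelled on the grid-proxy-info listing in the slide.
SAMPLE = """\
type     : full legacy globus proxy
strength : 512 bits
path     : /tmp/x509up_u500
timeleft : 11:57:24
"""

info = parse_proxy_info(SAMPLE)
print(info["path"], timeleft_seconds(info))
```

In practice the text would come from running `grid-proxy-info -all` on a UI machine rather than from a hard-coded sample.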
35 Long term proxy
- Proxy has limited lifetime (default is 12 h)
  - Bad idea to have a longer proxy
- However, a grid task might need to use a proxy for a much longer time
  - Grid jobs in HEP Data Challenges last up to 2 days
- myproxy server
  - Allows you to create and store a long term proxy certificate
  - -s <host_name> specifies the hostname of the MyProxy server
  - -l <user> defines the user that will own the remote credentials
- myproxy-init -s <host_name> -l <user>
- myproxy-info -s <host_name> -l <user>
  - Get information about the stored long living proxy
- myproxy-get-delegation -s <host_name> -l <user>
  - Get a new proxy from the MyProxy server
- myproxy-destroy -l <user> -s <host_name>
  - Destroy the credential on the server
- Check out the myproxy-xxx --help option
- A dedicated service on the RB can renew the proxy automatically
  - contacts the myproxy server
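The reasoning behind the long term proxy is simple arithmetic: a 2-day job outlives a 12-hour proxy several times over. The sketch below is a simplified model, assuming each renewal happens at expiry and yields a proxy with the full default lifetime; both function names are hypothetical.

```python
DEFAULT_PROXY_HOURS = 12  # default proxy lifetime quoted on the slide


def needs_long_term_proxy(job_hours, proxy_hours=DEFAULT_PROXY_HOURS):
    """True when the job is expected to outlive a proxy created now."""
    return job_hours > proxy_hours


def renewals_needed(job_hours, proxy_hours=DEFAULT_PROXY_HOURS):
    """Renewals needed if every renewed proxy lasts `proxy_hours` (assumption)."""
    if job_hours <= proxy_hours:
        return 0
    # Ceiling division gives the number of proxies; subtract the initial one.
    return -(-job_hours // proxy_hours) - 1


# A 2-day HEP Data Challenge job against the 12 h default.
print(needs_long_term_proxy(48), renewals_needed(48))
```

Under these assumptions the renewal service on the RB would have to contact the MyProxy server a few times during such a job, which is why the credential stored there is created with a much longer lifetime.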
36 Store credentials on MyProxy Server
- > grid-proxy-destroy (removes local credentials)
- > myproxy-init -s grid001.ct.infn.it -l <UniqueUsername>
- Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=SEOUL/CN=<UniqueUsername>/Email=roberto.barbera@ct.infn.it
- Enter GRID pass phrase for this identity:
- Creating proxy ....................... Done
- Proxy Verify OK
- Your proxy is valid until: Sun Jul 24 18:53:44 2005
- Enter MyProxy pass phrase:
- Verifying password - Enter MyProxy pass phrase:
- A proxy valid for 168 hours (7.0 days) for user <UniqueUsername> now exists on grid001.ct.infn.it.
- Now your credentials are stored on the MyProxy server, and are available for delegation or renewal by the RB.
- ATTENTION! <UniqueUsername> MUST BE your PERSONAL username
37 Get delegation
- > myproxy-get-delegation -s grid001.ct.infn.it -l <UniqueUser>
- Enter MyProxy pass phrase:
- A proxy has been received for user <UniqueUser> in /tmp/x509up_u500
- > grid-proxy-info -all
- subject : /C=IT/O=GILDA/OU=Personal Certificate/L=SEOUL/CN=<UniqueUser>/Email=roberto.barbera@ct.infn.it/CN=proxy/CN=proxy/CN=proxy
- issuer : /C=IT/O=GILDA/OU=Personal Certificate/L=SEOUL/CN=<UniqueUser>/Email=roberto.barbera@ct.infn.it/CN=proxy/CN=proxy
- identity : /C=IT/O=GILDA/OU=Personal Certificate/L=SEOUL/CN=<UniqueUser>/Email=roberto.barbera@ct.infn.it
- type : full legacy globus proxy
- strength : 512 bits
- path : /tmp/x509up_u500
- timeleft : 11:56:58
38 Workload Management System
- The user interacts with the Grid via a Workload Management System (WMS)
- The goal of the WMS is distributed scheduling and resource management in a Grid environment.
- What does it allow Grid users to do?
  - To submit their jobs
  - To execute them on the "best" resources
    - The WMS tries to optimize the usage of resources
  - To get information about their status
  - To retrieve their output
39 JDL
- Information to be specified when a job has to be submitted
  - Job characteristics
  - Job requirements and preferences on the computing resources
    - Also including software dependencies
  - Job data requirements
- Information specified using a Job Description Language (JDL)
  - Based upon Condor's CLASSified ADvertisement language (ClassAd)
  - Fully extensible language
- A ClassAd
  - Constructed with the classad construction operator []
  - It is a sequence of attributes separated by semi-colons (;)
- So, the JDL allows the definition of a set of attributes, which the WMS takes into account when making its scheduling decision
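The structure of a ClassAd (attributes inside `[]`, each terminated by `;`) can be illustrated by generating one from a dictionary. This is a minimal sketch, not a gLite tool: `jdl_value` and `render_jdl` are hypothetical helpers, and only strings, booleans, integers and lists of strings are handled (expressions such as Requirements are out of scope).

```python
def jdl_value(value):
    """Render a Python value as JDL text (strings, bools, ints, lists of strings)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ",".join(jdl_value(item) for item in value) + "}"
    # Plain string: wrap in double quotes, escaping embedded double quotes.
    return '"' + str(value).replace('"', '\\"') + '"'


def render_jdl(attributes):
    """Render a dict as a ClassAd: '[' attr = value; ... ']'."""
    lines = ["["]
    for key, value in attributes.items():
        lines.append(f"  {key} = {jdl_value(value)};")
    lines.append("]")
    return "\n".join(lines)


job = {
    "Type": "Job",
    "JobType": "Normal",
    "Executable": "/bin/bash",
    "Arguments": "yourscript.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["yourscript.sh"],
    "OutputSandbox": ["std.err", "std.out"],
}
print(render_jdl(job))
```

Writing the result to a file gives a JDL that can then be handed to the submission commands.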
40 Job Preparation
- An attribute is a pair (key, value), where value can be a Boolean, an Integer, a list of strings, ...
  - <attribute> = <value>;
- In case of literal string for values
  - if a string itself contains double quotes, they must be escaped with a backslash
    - Arguments = " \"Hello\" 10";
  - the character ' cannot be specified in the JDL
  - special characters such as &, |, >, < are only allowed
    - if specified inside a quoted string
    - if preceded by triple \
    - Arguments = "-f file1\\\&file2";
- Comments must be preceded by a sharp character (#) or have to follow the C syntax
- The JDL is sensitive to blank characters and tabs
  - they should not follow the semicolon (;) at the end of a line
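These quoting and whitespace rules are easy to get wrong by hand, so a small pre-submission check can help. The sketch below covers only two of the rules above (escaping double quotes, rejecting `'`) plus the trailing-blank rule; the function names are hypothetical and this is not an official JDL validator.

```python
import re


def quote_jdl_string(raw):
    """Quote `raw` as a JDL string literal, escaping embedded double quotes."""
    if "'" in raw:
        raise ValueError("the character ' cannot be specified in the JDL")
    return '"' + raw.replace('"', '\\"') + '"'


def lines_with_trailing_blanks(jdl_text):
    """Return 1-based numbers of lines where blanks or tabs follow the final ';'."""
    return [
        number
        for number, line in enumerate(jdl_text.splitlines(), start=1)
        if re.search(r";[ \t]+$", line)
    ]


# Reproduces the Arguments example from the slide.
print("Arguments = " + quote_jdl_string(' "Hello" 10') + ";")
```

Running `lines_with_trailing_blanks` over a JDL file before submission points at the lines that would trip the parser.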
41 Job Description Language
- The supported attributes are grouped in two categories
- Job Attributes
  - Define the job itself
- Resources
  - Taken into account by the RB for carrying out the matchmaking algorithm (to choose the "best" resource where to submit the job)
  - Computing Resource
    - Used to build expressions of Requirements and/or Rank attributes by the user
    - Have to be prefixed with "other."
  - Data and Storage resources (see talk "Job Services With Data Requirements")
    - Input data to process, SE where to store output data, protocols spoken by application when accessing SEs
42 JDL Relevant Attributes
- JobType: Normal (simple, sequential job), Interactive, MPICH, Checkpointable, or a combination of them
- Executable (mandatory): the command name
- Arguments (optional): job command line arguments
- StdInput, StdOutput, StdError (optional): standard input/output/error of the job
- InputSandbox (optional): list of files on the UI local disk needed by the job for running; the listed files are automatically staged to the remote resource
- OutputSandbox (optional): list of files, generated by the job, which have to be retrieved
- VirtualOrganisation (optional): a different way to specify the VO of the user
43 Job Submission
- glite-job-submit performs the job submission to the WMS
- Usage: glite-job-submit [options] <jdl file>
- Principal options
  - --vo <vo name>: perform submission with a different VO than the UI default one
  - --output, -o <output file>: save the jobId to a file, instead of printing it to standard output
  - --resource, -r <resource value>: specify the resource for execution (needs the GLUE UniqueId of the queue, obtainable with list-match)
  - --debug: show function calls and parameters
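When submissions are scripted, the options above can be assembled into a command line programmatically. The following is a sketch under stated assumptions: `submit_command` is a hypothetical helper, it uses only the options listed on this slide, and the file name `hostname.jdl` and VO name `gilda` are placeholders. It only builds the argument list and does not contact any WMS.

```python
import shlex


def submit_command(jdl_file, vo=None, output=None, resource=None, debug=False):
    """Build a glite-job-submit argument list from the options on the slide."""
    argv = ["glite-job-submit"]
    if vo:
        argv += ["--vo", vo]
    if output:
        argv += ["--output", output]
    if resource:
        argv += ["--resource", resource]
    if debug:
        argv.append("--debug")
    argv.append(jdl_file)  # the JDL file always comes last
    return argv


# Placeholder names: 'hostname.jdl' and the 'gilda' VO are illustrative.
argv = submit_command("hostname.jdl", vo="gilda", output="myjobId")
print(shlex.join(argv))
```

On a configured UI machine the list could be passed to `subprocess.run`; saving the jobId with `--output` makes it available later to the status and output commands.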
44 Job life cycle check
- glite-job-status <job id>
  - check job execution status
- glite-job-output <job id>
  - if job status is "done", allows output retrieval
- glite-job-cancel <job id>
  - perform job deletion
- All of these commands accept input from a file (with the option -i <file>).
  - glite-job-status -i myjobId
45 JDL -- Example
- [
- Type = "Job";
- JobType = "Normal";
- Executable = "/bin/bash";
- StdOutput = "std.out";
- StdError = "std.err";
- InputSandbox = {"yourscript.sh"};
- OutputSandbox = {"std.err","std.out"};
- Arguments = "yourscript.sh";
- ]
46 Job Requirements
- Requirements
  - Job requirements on the resources
  - Specified using GLUE attributes of resources published in the Information Service
  - Its value is a boolean expression
  - Only one Requirements attribute can be specified
    - if there is more than one, only the last one is taken into account
    - if you need several requirements, combine them through logical operators (&&, ||, !, ...)
  - If not specified, the default value defined in the UI configuration file is considered
    - Default: other.GlueCEStateStatus == "Production" (the resource has to be able to accept jobs and dispatch them on WNs)
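Because only the last Requirements attribute counts, several conditions have to be merged into one expression. A minimal sketch of that merge follows; `combine_requirements` is a hypothetical helper, and falling back to the default expression when no condition is given merely mimics what the slide says the UI configuration does.

```python
DEFAULT_REQUIREMENT = 'other.GlueCEStateStatus == "Production"'


def combine_requirements(*conditions):
    """AND several boolean conditions into a single Requirements attribute."""
    if not conditions:
        # Mimic the UI default described on the slide (assumption).
        conditions = (DEFAULT_REQUIREMENT,)
    expression = " && ".join(f"({condition})" for condition in conditions)
    return f"Requirements = {expression};"


print(
    combine_requirements(
        "other.GlueCEPolicyMaxWallClockTime > 1440",
        'Member("POVRAY-3.5", other.GlueHostApplicationSoftwareRunTimeEnvironment)',
    )
)
```

Parenthesizing each condition keeps operator precedence unambiguous when the individual conditions themselves contain `||` or `!`.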
47 JDL Requirements
- Insert a requirement to parse only the short queues.
  - Requirements = (other.GlueCEPolicyMaxWallClockTime > 720);
- Insert a requirement to parse only the long queues.
  - Requirements = (other.GlueCEPolicyMaxWallClockTime > 1440);
- Insert a requirement to parse only the infinite queues.
  - Requirements = (other.GlueCEPolicyMaxWallClockTime > 2880);
- Insert a requirement to steer the execution to a particular CE queue.
  - Requirements = other.GlueCEUniqueID == "grid010.ct.infn.it:2119/jobmanager-lcgpbs-long";
48 Job Submission
- glite-job-list-match checks which resources are suitable for execution
  - No job submission is performed; only the list-match is performed
- Usage: glite-job-list-match [options] <jdl file>
- Principal options
  - --vo <vo name>: perform list-match with a different VO than the UI default one
  - --rank: show resources in order of ranking
  - --output, -o <output file>: redirect output to a file, instead of standard output
  - --debug: show function calls and parameters
49 JDL -- Requirements
- [
- Type = "Job";
- JobType = "Normal";
- Executable = "/bin/sh";
- StdOutput = "povray_cubo.out";
- StdError = "povray_cubo.err";
- InputSandbox = {"start_povray_cubo.sh","cubo.pov"};
- OutputSandbox = {"povray_cubo.out","povray_cubo.err","cubo.png"};
- RetryCount = 7;
- Arguments = "start_povray_cubo.sh";
- Requirements = Member("POVRAY-3.5", other.GlueHostApplicationSoftwareRunTimeEnvironment);
- ]
50 Start_povray_cubo.sh
- #!/bin/bash
- mv cubo.pov OBJECT.POV   # rename input file
- /usr/bin/povray /usr/share/povray-3.5/ini/res800.ini   # run povray
- mv OBJECT.png cubo.png   # rename output file
51 From Phase I to II
- From 1st EGEE EU Review in February 2005
  - "The reviewers found the overall performance of the project very good."
  - "... remarkable achievement to set up this consortium, to realize appropriate structures to provide the necessary leadership, and to cope with changing requirements."
- EGEE I
  - Large scale deployment of EGEE infrastructure to deliver production level Grid services with selected number of applications
- EGEE II
  - Natural continuation of the project's first phase
  - Emphasis on providing an infrastructure for e-Science
    - increased support for applications
    - increased multidisciplinary Grid infrastructure
    - more involvement from Industry
  - Extending the Grid infrastructure world-wide
    - increased international collaboration
    - (Asia-Pacific is already a partner!)
52 Conclusion
- EGEE is an open project to construct e-infrastructure
- gLite is the production-level grid middleware of EGEE
  - Well-defined architecture and reliable software
  - Towards service-oriented architecture
  - Migrate from LCG to gLite incrementally
- gLite = Condor + GTK 2.0 (?)
  - Focus on data grid
  - Doesn't support multiple site MPI jobs