Title: The Use of Condor in the gLite Grid Middleware
1. The Use of Condor in the gLite Grid Middleware
- Erwin Laure
- Condor Week
- 14-15 March 2005
2. Contents
- Overview on EGEE and gLite
- gLite and Condor
- Future plans
3. The EGEE Project
- EU funded (2 years until March 2006)
- EGEE offers the largest production grid facility in the world open to many applications (HEP, BioMedical, generic)
- Existing production service based on LCG
- Next generation open source web-services middleware being re-engineered taking into account production/deployment/management needs
- Well-defined, distributed support structure to provide eInfrastructure that is available to many application domains
- Middleware Activity
- Re-engineer and harden Grid middleware
- Provide production quality middleware
(Diagram labels: Collaborations; Global Grid; Operations, Support and training; Network infrastructure (GÉANT))
www.eu-egee.org
4. EGEE Activities
- 48% service activities (Grid Operations, Support and Management, Network Resource Provision)
- 24% middleware re-engineering (Quality Assurance, Security, Network Services Development)
- 28% networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)
Emphasis in EGEE is on operating a production grid and supporting the end-users
5. Computing Resources: Feb 2005
(Map legend: Country providing resources; Country anticipating joining)
- In LCG-2
- 113 sites, 30 countries
- >10,000 CPUs
- 5 PB storage
- Includes non-EGEE sites
- 9 countries
- 18 sites
6. gLite Grid Middleware: guiding principles
- Service oriented approach
- Allow for multiple interoperable implementations
- Lightweight (existing) services
- Easily and quickly deployable
- Use existing services where possible
- Condor, EDG, Globus, LCG, ...
- Portable
- Being built on Scientific Linux and Windows
- Security
- Sites and Applications
- Performance/Scalability and Resilience/Fault Tolerance
- Comparable to deployed infrastructure
- Co-existence with deployed infrastructure
- Co-existence with LCG-2 and OSG (US) is essential for the EGEE Grid services
- Site autonomy
- Reduce dependence on global, central services
- Open source license
7. From Development to Product
- Fast prototyping approach
- Small scale testbed (initially CERN and Wisconsin)
- Single out individual components for deployment on pre-production service (originally LCG-2/EGEE0 based)
- These components need to go through integration and testing
- To ensure they are deployable and basically work
8. gLite Services and Responsible Clusters
(Diagram: service groups and their components; responsible clusters labelled JRA3, UK, CERN, IT/CZ)
- Access Services: Grid Access Service, API
- Security Services: Authorization, Auditing, Authentication
- Information & Monitoring Services: Application Monitoring, Information & Monitoring
- Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Management
- Job Management Services: Job Provenance, Package Manager, Accounting, Computing Element, Workload Management, Site Proxy
9. gLite Services for Release 1
(Diagram: the service groups and clusters of the previous slide: Access Services, Security Services, Information & Monitoring Services, Data Services, Job Management Services)
- Focus on key services
- Release date is March 31st 2005
10. Condor and gLite
- Design team including representatives from middleware providers (AliEn, Condor, EDG, Globus, ...), including US partners, produced the middleware architecture and design
- Takes into account input and experiences from applications, operations, and related projects
- DJRA1.1 EGEE Middleware Architecture (June 2004)
- https://edms.cern.ch/document/476451/
- DJRA1.2 EGEE Middleware Design (August 2004)
- https://edms.cern.ch/document/487871/
- Wisconsin is one of the sites of the development prototype
- Using Condor pool as backend
- Using Globus RLS
- Use VDT distribution of Condor and Globus
11. WMS
12. The current gLite CE
- Collaboration of INFN, Univ. of Chicago, Univ. of Wisconsin-Madison, and the EGEE security activity (JRA3)
(Diagram labels: Submit job; CEMon; Notifications; Condor-C; Blahpd; CE; Local batch system: LSF, PBS/Torque, Condor)
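The CE slide names three local batch systems (LSF, PBS/Torque, Condor) behind a single entry point. A minimal sketch of that idea, assuming a thin adapter that picks the submission command for the configured batch system: the `Job` class and `build_submit_command` function are hypothetical names introduced for illustration, and this is not how blahpd is actually implemented. Only the command names `bsub`, `qsub` and `condor_submit` are the standard submission tools of the respective systems; real invocations need system-specific options.

```python
from dataclasses import dataclass
from typing import List

# Standard submission commands of the batch systems named on the slide.
SUBMIT_COMMANDS = {
    "lsf": "bsub",
    "pbs": "qsub",
    "condor": "condor_submit",
}


@dataclass(frozen=True)
class Job:
    """Hypothetical job record used only in this sketch."""
    job_id: str
    description_file: str  # job script or submit file


def build_submit_command(batch_system: str, job: Job) -> List[str]:
    """Return the argv that would hand `job` to the given local batch system."""
    key = batch_system.lower()
    if key not in SUBMIT_COMMANDS:
        raise ValueError(f"unsupported batch system: {batch_system}")
    return [SUBMIT_COMMANDS[key], job.description_file]


if __name__ == "__main__":
    job = Job("job-001", "job.sub")
    for name in ("LSF", "PBS", "Condor"):
        print(name, "->", " ".join(build_submit_command(name, job)))
```

Keeping the batch-system choice in one lookup table means the layer above it can stay the same regardless of which scheduler a site runs.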
13. Data Management Services
- Efficient and reliable data storage, movement, and retrieval on the infrastructure
- Storage Element
- Reliable file storage (SRM based storage systems)
- Posix-like file access (gLite I/O)
- Transfer (GridFTP)
- File and Replica Catalog
- Resolves logical filenames (LFN) to physical location of files (URL understood by SRM) and storage elements
- Hierarchical file system like view in LFN space
- Single catalog or distributed catalog (under development) deployment possibilities
- File Transfer and Placement Service
- Reliable file transfer and transactional interactions with catalogs
- Stork being evaluated
- Data Scheduler
- Scheduled data transfer in the same spirit as jobs are being scheduled, taking into account e.g. network characteristics (collaboration with JRA4)
- Under development
- Metadata Catalog
- Limited metadata can be attached to the File and Replica Catalog
- Interfaces to application specific catalogs have been defined
(Diagram labels: Data Scheduler; VOs; Catalog; Site boundary; FPS; Transfer Agent; SRM; GridFTP; I/O; Storage Element)
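The File and Replica Catalog bullet describes two behaviours: resolving a logical file name (LFN) to the physical locations of its replicas, and offering a hierarchical, file-system-like view of the LFN space. A minimal in-memory sketch of both, assuming a toy `ReplicaCatalog` class; the class, its methods, and the example host names are hypothetical and do not reflect the gLite catalog interface.

```python
from collections import defaultdict
from typing import Dict, List


class ReplicaCatalog:
    """Toy catalog mapping logical file names (LFNs) to replica URLs."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = defaultdict(list)

    def register(self, lfn: str, url: str) -> None:
        """Record that a replica of `lfn` is stored at `url`."""
        if not lfn.startswith("/"):
            raise ValueError("LFNs are treated as absolute paths in this sketch")
        if url not in self._replicas[lfn]:
            self._replicas[lfn].append(url)

    def resolve(self, lfn: str) -> List[str]:
        """Return all known replica URLs for `lfn` (empty if unknown)."""
        return list(self._replicas.get(lfn, []))

    def list_dir(self, directory: str) -> List[str]:
        """Return the immediate children of `directory` in LFN space."""
        prefix = directory.rstrip("/") + "/"
        children = set()
        for lfn in self._replicas:
            if lfn.startswith(prefix):
                children.add(lfn[len(prefix):].split("/", 1)[0])
        return sorted(children)


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    # Example values only: the hosts and paths are made up.
    catalog.register("/grid/vo/run1/a.dat", "srm://se1.example.org/data/a.dat")
    catalog.register("/grid/vo/run1/a.dat", "srm://se2.example.org/store/a.dat")
    catalog.register("/grid/vo/run1/b.dat", "srm://se1.example.org/data/b.dat")
    print(catalog.resolve("/grid/vo/run1/a.dat"))
    print(catalog.list_dir("/grid/vo/run1"))
```

One LFN can map to several replicas, which is what lets a transfer or scheduling service choose among storage elements.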
14. Evolutions foreseen in 2005
- The following is a list of the main topics we still need to address; details and other topics need to be worked out with operations and applications
- WMS
- WS Interface
- Better support for bulk job submission
- CE
- Head node monitoring
- Guard and if necessary pause/resume services running on the head node
- SUDO service
- Currently one Condor-C instance needed per user
- Condor-C should run under a VO user and submit jobs via a sudo service to the local batch system
- This is being done in collaboration with Condor and Globus
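The SUDO service item contrasts one Condor-C instance per user with one instance per VO that switches identity only at submission time. A rough sketch of the difference, assuming purely for illustration that the sudo service behaves like the Unix `sudo -u` command: the function names, the account name, and the example users and VOs are hypothetical, and the actual service design is not described on the slide.

```python
from typing import Dict, Iterable, List, Tuple


def count_instances(submissions: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count Condor-C instances needed for (user, vo) pairs under both models."""
    pairs = list(submissions)
    users = {user for user, _vo in pairs}
    vos = {vo for _user, vo in pairs}
    return {"per_user": len(users), "per_vo": len(vos)}


def sudo_submit_command(local_account: str, submit_argv: List[str]) -> List[str]:
    """Wrap a batch submission so that it runs as the user's local account.

    Assumption: identity switching is modelled with `sudo -u <account> -- ...`.
    """
    if not local_account or not submit_argv:
        raise ValueError("local account and submit command are both required")
    return ["sudo", "-u", local_account, "--"] + list(submit_argv)


if __name__ == "__main__":
    # Example values only.
    submissions = [("alice", "atlas"), ("bob", "atlas"), ("carol", "cms")]
    print(count_instances(submissions))
    print(sudo_submit_command("alice_local", ["qsub", "run.sh"]))
```

With many users per VO, the per-VO model keeps the number of long-running processes on the head node bounded by the number of VOs rather than the number of users.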
15. Evolutions foreseen in 2005
- Catalogs
- Distributed and single deployment options
- FTS/FPS
- Channel management
- Clarify the role of Stork
- Data Scheduler
- Broker for data transfer jobs
- R-GMA Information system
- Web service version
- Package Manager
Second major gLite release foreseen at the end of 2005
16. Summary
- Contributions of Condor to gLite span the whole process
- Design, prototyping, product
- International collaborations in Grid middleware are essential
- Grid middleware cannot be developed within a single corner
- Effective exchange of ideas, requirements, solutions and technologies
- Coordinated development of new capabilities
- Open communication channels
- Early detection of differences and disagreements
- Condor link to OSG is very important to gLite
- A common integration and testing project is being defined
- The gLite process, in particular through the strong interactions between European (EGEE) and US (Condor and Globus) projects, is a first step towards truly international collaborative middleware development
17. More information
- http://www.glite.org
- http://cern.ch/egee-jra1