Title: Middleware Development and Deployment Status
Slide 1: Middleware Development and Deployment Status
Tony Doyle
Slide 2: Contents
- What are the challenges?
- What is the scale?
- How does the Grid work?
- What is the status of (EGEE) middleware development?
- What is the deployment status?
- What is GridPP doing as part of the international effort?
- What was GridPP1?
- Is GridPP a Grid?
- What is planned for GridPP2?
- What lies ahead?
- Summary: Why? What? How? When?
Slide 3: Science generates data and might require a Grid?
- Earth Observation
- Bioinformatics
- Astronomy
- Digital Curation
- Healthcare
- Collaborative Engineering
- ?
Slide 4: What are the challenges?
- Must:
  - share data between thousands of scientists with multiple interests
  - link major (Tier-0 to Tier-1) and minor (Tier-1 to Tier-2) computer centres
  - ensure all data is accessible anywhere, anytime
  - grow rapidly, yet remain reliable for more than a decade
  - cope with the different management policies of different centres
  - ensure data security
  - be up and running routinely by 2007
Slide 5: What are the challenges?
1. Software process
2. Software efficiency
3. Deployment planning
4. Link centres
5. Share data
6. Manage data
7. Install software
8. Analyse data
9. Accounting
10. Policies
Data Management, Security and Sharing
Slide 6: Tier-1 Scale
- Step 1: financial planning
- Step 2: compare to (e.g. Tier-1) experiment requirements
- Step 3: conclude that more than one centre is needed
- Step 4: a Grid?

Ian Foster / Carl Kesselman: "A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."

Currently network performance doubles every year (or so) for unit cost.
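The doubling rule of thumb on this slide compounds quickly; a minimal sketch (the starting bandwidth and time horizon below are illustrative, not figures from the talk):

```python
def bandwidth_per_unit_cost(initial_gbps: float, years: float,
                            doubling_time_years: float = 1.0) -> float:
    """Bandwidth affordable at fixed cost after `years`, if performance
    per unit cost doubles every `doubling_time_years` (the slide's rule)."""
    return initial_gbps * 2 ** (years / doubling_time_years)

# A budget that buys 1 Gb/s today buys ~8 Gb/s in three years.
print(bandwidth_per_unit_cost(1.0, 3))  # 8.0
```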
Slide 7: What is the Grid? The Hourglass
- I. Experiment Layer (e.g. Portals)
- II. Application Middleware (e.g. Metadata)
- III. Grid Middleware (e.g. Information Services)
- IV. Facilities and Fabrics (e.g. Storage Services)
Slide 8: How do I start? http://www.gridpp.ac.uk/start/
- Getting started as a Grid user
- Quick start guide for LCG2: GridPP's guide to starting as a user of the Large Hadron Collider Computing Grid.
- Getting an e-science certificate: in order to use the Grid you need a Grid certificate. This page introduces the UK e-Science Certification Authority, which issues certificates to users. You can get a certificate from here.
- Using the LHC Computing Grid (LCG): CERN's guide on the steps you need to take in order to become a user of the LCG. This includes contact details for support.
- LCG user scenario: describes in a practical way the steps a user has to follow to send and run jobs on LCG and to retrieve and process the output successfully.
- Currently being improved..
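In practice, the "LCG user scenario" above comes down to writing a small JDL (Job Description Language) file and driving it with the edg-job-* commands of the LCG-2 user interface. A minimal sketch (the file names and choice of executable are illustrative):

```
# hello.jdl - minimal LCG-2 job description (illustrative)
Executable    = "/bin/hostname";
StdOutput     = "hello.out";
StdError      = "hello.err";
OutputSandbox = {"hello.out", "hello.err"};
```

Such a job would be submitted with `edg-job-submit hello.jdl`, polled with `edg-job-status <jobid>`, and its output sandbox retrieved with `edg-job-get-output <jobid>`, all assuming a valid Grid certificate and proxy as described above.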
Slide 9: Job Submission (behind the scenes)
- Replica Catalogue
- Information Service
- Resource Broker
- Authorisation / Authentication
- Job Submission Service
- Logging & Book-keeping
- Compute Element
Slide 10: Enabling Grids for E-sciencE (EGEE)
- Deliver a 24/7 Grid service to European science:
  - build a consistent, robust and secure Grid network that will attract additional computing resources
  - continuously improve and maintain the middleware in order to deliver a reliable service to users
  - attract new users from industry as well as science and ensure they receive the high standard of training and support they need
- 100 million euros over 4 years, funded by the EU
- >400 software engineers and service support
- 70 European partners
Slide 11: Prototype Middleware Status & Plans (I)
- Workload Management
  - AliEn TaskQueue
  - EDG WMS (plus new TaskQueue and Information Supermarket)
  - EDG L&B
- Computing Element
  - Globus Gatekeeper + LCAS/LCMAPS
  - Dynamic accounts (from Globus)
  - CondorC
  - Interfaces to LSF/PBS (blahp)
- Pull components
  - AliEn CE
  - gLite CEmon (being configured)
Blue: deployed on development testbed; Red: proposed
Slide 12: Prototype Middleware Status & Plans (II)
- Storage Element
  - Existing SRM implementations: dCache, Castor,
  - FNAL + LCG DPM
  - gLite-I/O (re-factored AliEn-I/O)
- Catalogs
  - AliEn FileCatalog (global catalog)
  - gLite Replica Catalog (local catalog)
  - Catalog update (messaging)
  - FiReMan interface
  - RLS (Globus)
- Data Scheduling
  - File Transfer Service (Stork + GridFTP)
  - File Placement Service
  - Data Scheduler
- Metadata Catalog
  - Simple interface defined (AliEn + BioMed)
- Information & Monitoring
  - R-GMA web service version with multi-VO support
Slide 13: Prototype Middleware Status & Plans (III)
- Security
  - VOMS as Attribute Authority and for VO management
  - myProxy as proxy store
  - GSI security and VOMS attributes as enforcement
    - fine-grained authorization (e.g. ACLs)
    - Globus to provide a set-uid service on the CE
- Accounting
  - EDG DGAS (not used yet)
- User Interface
  - AliEn shell
  - CLIs and APIs
  - GAS
    - Catalogs
    - Integrate remaining services
- Package manager
  - Prototype based on AliEn backend
  - evolve to final architecture agreed with the ARDA team
Slide 14: Project Organisation (diagram)
- CB
- PMB
- Deployment Board: Tier-1/Tier-2, testbeds, rollout; service specification and provision
- User Board: requirements, application development, user feedback
- Middleware areas: Metadata, Storage, Workload, Network, Security, Info. Mon.
Slide 15: Middleware Development
- Network Monitoring
- Configuration Management
- Grid Data Management
- Storage Interfaces
- Information Services
- Security
Slide 16: Application Development
- ATLAS
- LHCb
- CMS
- SAMGrid (FermiLab)
- BaBar (SLAC)
- QCDGrid
- PhenoGrid
Slide 17: GridPP Deployment Status
GridPP deployment is part of LCG (currently the largest Grid in the world). The future Grid in the UK is dependent upon LCG releases.
- Three Grids on a global scale in HEP (similar functionality):

                Sites     CPUs
  LCG (GridPP)  90 (15)   8700 (1500)
  Grid3 (USA)   29        2800
  NorduGrid     30        3200
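The slide's own counts give GridPP's share of LCG directly; a quick check:

```python
# GridPP's fraction of LCG sites and CPUs, using the counts quoted above.
lcg_sites, gridpp_sites = 90, 15
lcg_cpus, gridpp_cpus = 8700, 1500

site_share = gridpp_sites / lcg_sites
cpu_share = gridpp_cpus / lcg_cpus
print(round(site_share, 2), round(cpu_share, 2))  # 0.17 0.17 -> about one sixth
```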
Slide 18: LCG Overview
- By 2007:
  - 100,000 CPUs
  - more than 100 institutes worldwide
- Building on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)
- Prototype went live in September 2003 in 12 countries
- Extensively tested by the LHC experiments during this summer
Slide 19: Deployment Status (26/10/04)
- Incremental releases: significant improvements in reliability, performance and scalability
  - within the limits of the current architecture
  - scalability is much better than expected a year ago
- Many more nodes and processors than anticipated
  - installation problems of last year overcome
  - many small sites have contributed to MC productions
- Full-scale testing as part of this year's data challenges
- "GridPP: The Grid becomes a reality" widely reported: British Embassy (USA), British Embassy (Russia), technology sites
Slide 20: Data Challenges
- Ongoing..
- Grid and non-Grid production; Grid now significant
- ALICE: 35 CPU years (LCG)
  - Phase 1 done
  - Phase 2 ongoing
- CMS: 75 M events and 150 TB; the first of this year's Grid data challenges
Entering Grid Production Phase..
Slide 21: ATLAS Data Challenge
- 7.7 M GEANT4 events and 22 TB
- UK: ~20% of LCG
- Ongoing..
- Grid production (on 3 Grids)
  - 150 CPU years so far
- Largest total computing requirement
- Small fraction of what ATLAS needs..
Entering Grid Production Phase..
Slide 22: LHCb Data Challenge
- 424 CPU years (4,000 kSI2k months), 186 M events
- UK's input significant (>1/4 of the total)
- LCG(UK) resource:
  - Tier-1: 7.7%
  - Tier-2 sites:
    - London: 3.9%
    - South: 2.3%
    - North: 1.4%
- DIRAC:
  - Imperial: 2.0%
  - L'pool: 3.1%
  - Oxford: 0.1%
  - ScotGrid: 5.1%
Entering Grid Production Phase..
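Taking the site figures above as percentages of the total production (consistent with the ">1/4" remark), the UK share can be cross-checked in a few lines:

```python
# UK shares of the LHCb data challenge, read as percent of total (from the slide).
lcg_uk = {"Tier-1": 7.7, "London": 3.9, "South": 2.3, "North": 1.4}
dirac_uk = {"Imperial": 2.0, "L'pool": 3.1, "Oxford": 0.1, "ScotGrid": 5.1}

uk_total = sum(lcg_uk.values()) + sum(dirac_uk.values())
print(round(uk_total, 1))  # 25.6 -> just over a quarter of the total
```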
Slide 23: Paradigm Shift: Transition to Grid
424 CPU years in total; month-by-month split (non-Grid : Grid) and share of DC04:

  May  89:11  (11% of DC04)
  Jun  80:20  (25% of DC04)
  Jul  77:23  (22% of DC04)
  Aug  27:73  (42% of DC04)
Slide 24: More Applications
- ZEUS uses LCG
  - needs the Grid to respond to increasing demand for MC production
  - 5 million Geant events on the Grid since August 2004
- QCDGrid
  - for UKQCD
  - currently a 4-site data grid
  - key technologies used: Globus Toolkit 2.4, European DataGrid, eXist XML database
  - managing a few hundred gigabytes of data
Slide 25: Issues
First large-scale Grid production: problems are being addressed at all levels.
"LCG-2 Middleware Problems and Requirements for LHC Experiment Data Challenges":
https://edms.cern.ch/file/495809/2.2/LCG2-Limitations_and_Requirements.pdf
Slide 26: Is GridPP a Grid?
By Foster's three-point checklist, a Grid:
1. coordinates resources that are not subject to centralized control
   - YES: this is why development and maintenance of LCG is important.
2. uses standard, open, general-purpose protocols and interfaces
   - YES: VDT (Globus/Condor-G) and EDG/EGEE (gLite) meet this requirement.
3. delivers nontrivial qualities of service
   - YES: the LHC experiments' data challenges over the summer of 2004.

http://www-fp.mcs.anl.gov/foster/Articles/WhatIsTheGrid.pdf
http://agenda.cern.ch/fullAgenda.php?ida=a042133
Slide 27: What was GridPP1?
- A team that built a working prototype grid of significant scale:
  - > 1,500 (7,300) CPUs
  - > 500 (6,500) TB of storage
  - > 1,000 (6,000) simultaneous jobs
- A complex project where 82% of the 190 tasks for the first three years were completed
A success: "the achievement of something desired, planned, or attempted"
Slide 28: Aims for GridPP2? From Prototype to Production

(Timeline diagram, 2001 -> 2004 -> 2007)
- Experiments and middleware: BaBar/BaBarGrid, D0/SAMGrid, CDF, ATLAS, LHCb, CMS, ALICE; EDG -> EGEE; ARDA, GANGA, LCG
- CERN: Computer Centre -> Prototype Tier-0 Centre -> Tier-0 Centre
- UK: RAL Computer Centre -> UK Prototype Tier-1/A Centre -> UK Tier-1/A Centre
- 19 UK Institutes -> 4 UK Prototype Tier-2 Centres -> 4 UK Tier-2 Centres
- Separate experiments, resources and multiple accounts (2001) -> prototype Grids (2004) -> 'One' Production Grid (2007)
Slide 29: Planning: GridPP2 ProjectMap
Structures agreed and in place (except LCG phase-2).
Slide 30: What lies ahead? Some mountain climbing..
- Annual data storage: 12-14 PetaBytes per year
  - a CD stack holding 1 year of LHC data would stand ~20 km tall (Concorde flies at ~15 km)
- 100 million SPECint2000 needed: ~100,000 PCs (3 GHz Pentium 4)
- We are here (1 km): in production terms, we've made base camp
- Quantitatively, we're ~9% of the way there in terms of CPU (9,000 of 100,000) and disk (3 of 12-14 x 3 years)
- Importance of step-by-step planning: pre-plan your trip, carry an ice axe and crampons, and arrange for a guide
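The "9%" figure follows directly from the numbers on this slide; for disk, the sketch below assumes the midpoint of the 12-14 PB/year range over three years:

```python
# CPU: 9,000 PCs in hand vs the ~100,000 needed by 2007.
cpu_fraction = 9_000 / 100_000
print(cpu_fraction)  # 0.09

# Disk: ~3 units in hand vs 12-14 per year over 3 years
# (midpoint of 13/year assumed for the denominator).
disk_fraction = 3 / (13 * 3)
print(round(disk_fraction, 2))  # 0.08
```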
Slide 31: 1. Why? 2. What? 3. How? 4. When?
From the Particle Physics perspective, the Grid is:
1. needed to utilise large-scale computing resources efficiently and securely
2. a) a working prototype running today on large testbed(s)
   b) about seamless discovery of computing resources
   c) using evolving standards for interoperation
   d) the basis for computing in the 21st Century
   e) not (yet) as transparent or robust as end-users need
3. see the GridPP getting started pages (two-day EGEE training courses available)
4. a) now, at prototype level, for simple(r) applications (e.g. experiment Monte Carlo production)
   b) September 2007 for more complex applications (e.g. data analysis), ready for LHC