Title: Report from DataGrid Project Review
1. Report from DataGrid Project Review
- Fabrizio Gagliardi
- Project Leader
- Fabrizio.Gagliardi_at_cern.ch
2. Major Review Goals
- Important to get approval for a number of variations from original plans
- refocus on production testbed releases driven by applications (HEPCAL)
- synchronization with LCG timeline and plans
- multiple testbeds (development, application)
- financial status of the project
- M/W development plans
- dissemination activity
- support for future EU projects (EGEE)
3. DataGrid project priorities refocused
After initial middleware development and testbed deployment, effort has been refocused on quality and stability
- Quality Policy Statement published
  - http://eu-datagrid.web.cern.ch/eu-datagrid/WP12/default.htm
- List of priorities defined at a project retreat
  - http://documents.cern.ch/age?a021130
- Followed up at the last project conference
  - http://www.tomiexpress.hu/datagrid/
- Show-stoppers found by users on the application testbed were the highest priority
- Incremental improvements driven by the needs of the applications (HEPCAL)
4. Project Status at the time of the review
- EDG currently provides a set of middleware services
  - Job and Data Management
  - Grid network monitoring
  - Security, authentication and authorization tools
  - Fabric Management
- EDG release 1.4 currently deployed to the EDG testbeds
  - 15 sites in the application testbed actively used by application groups
  - Core sites: CERN (CH), RAL (UK), NIKHEF (NL), CNAF (I), CC-Lyon (F)
  - EDG software also deployed at a total of 40 sites via CrossGrid, DataTAG and national grid projects
- Many applications ported to EDG testbeds and actively being used
- Intense middleware development is continuously ongoing
5. Relationship with LHC Computing Grid project (LCG)
- DataGrid is contributing to LCG
  - LCG release 1 (July 2003) will deploy EDG 2.0, VDT 1.1.7 (iVDGL et al.) and GLUE schema (DataTAG et al.)
- LCG is contributing to DataGrid
  - Testbed support and infrastructure
  - Access to more computing resources in HEP centers
  - Testing and verification
  - Reinforce the testing group and maintain a certification testbed
  - Fabric management and middleware development
- Interaction with US colleagues
  - LCG needs are helping to guide synergy with US projects
LCG: grid deployment for HEP
Advantages for DataGrid: better support for Condor and Globus, synchronization with other grid projects
GLUE: common information schema for interoperability
6. Application Testbed Resources
Site            Country  CPUs  Storage
CC-IN2P3        FR        620    192 GB
CERN            CH        138   1321 GB
CNAF            IT         48   1300 GB
Ecole Poly.     FR          6    220 GB
Imperial Coll.  UK         92    450 GB
Liverpool       UK          2     10 GB
Manchester      UK          9     15 GB
NIKHEF          NL        142    433 GB
Oxford          UK          1     30 GB
Padova          IT         11    666 GB
RAL             UK          6    332 GB
SARA            NL          0  10000 GB
TOTAL           5        1075  14969 GB
also Dev. TB 200 TB including tape
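The TOTAL row can be cross-checked with a few lines of arithmetic. The sketch below simply re-enters the per-site figures from the table and recomputes the number of countries, CPUs and storage; the names `SITES` and `totals` are illustrative and not part of any EDG tool.

```python
# Per-site figures copied from the table above:
# (site, country, CPUs, storage in GB).
SITES = [
    ("CC-IN2P3", "FR", 620, 192),
    ("CERN", "CH", 138, 1321),
    ("CNAF", "IT", 48, 1300),
    ("Ecole Poly.", "FR", 6, 220),
    ("Imperial Coll.", "UK", 92, 450),
    ("Liverpool", "UK", 2, 10),
    ("Manchester", "UK", 9, 15),
    ("NIKHEF", "NL", 142, 433),
    ("Oxford", "UK", 1, 30),
    ("Padova", "IT", 11, 666),
    ("RAL", "UK", 6, 332),
    ("SARA", "NL", 0, 10000),
]


def totals(sites):
    """Return (number of distinct countries, total CPUs, total storage in GB)."""
    countries = {country for _, country, _, _ in sites}
    cpus = sum(cpu for _, _, cpu, _ in sites)
    storage_gb = sum(gb for _, _, _, gb in sites)
    return len(countries), cpus, storage_gb


if __name__ == "__main__":
    n_countries, cpus, storage_gb = totals(SITES)
    print(f"{len(SITES)} sites, {n_countries} countries, "
          f"{cpus} CPUs, {storage_gb} GB")
```

The recomputed values agree with the TOTAL row (5 countries, 1075 CPUs, 14969 GB), which suggests the "5" in that row counts countries rather than sites.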
- Since Last Year
- Improved software (EDG 1.4.3).
- Doubled the number of sites; more are waiting
- Australia, Taiwan, USA (U. Wisc.), UK sites, INFN, French sites, CrossGrid
- Significantly more CPU/Storage.
- Hidden Infrastructure
- MDS Hierarchy, Resource Brokers, User Interfaces,
VO Replica Catalogs, VO Membership Servers,
Certificate Authorities
7. History: relating applications work to TB versions
Version Date
1.1.2 27 Feb 2002
1.1.3 02 Apr 2002
1.1.4 04 Apr 2002
1.2.a1 11 Apr 2002
1.2.b1 31 May 2002
1.2.0 12 Aug 2002
1.2.1 04 Sep 2002
1.2.2 09 Sep 2002
1.2.3 25 Oct 2002
1.3.0 08 Nov 2002
1.3.1 19 Nov 2002
1.3.2 20 Nov 2002
1.3.3 21 Nov 2002
1.3.4 25 Nov 2002
1.4.0 06 Dec 2002
1.4.1 07 Jan 2003
1.4.2 09 Jan 2003
1.4.3 14 Jan 2003
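As a rough illustration of the release cadence in the table above, the following sketch counts releases per minor series and the time between the first and last listed release. The version/date pairs are copied from the table; the helper names (`parse`, `releases_per_series`, `span_days`) are hypothetical and only serve this example.

```python
from datetime import date

# Month abbreviations as they appear in the table (mapped explicitly to
# avoid any dependence on the system locale).
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# (version, release date) pairs copied from the table above.
RELEASES = [
    ("1.1.2", "27 Feb 2002"), ("1.1.3", "02 Apr 2002"), ("1.1.4", "04 Apr 2002"),
    ("1.2.a1", "11 Apr 2002"), ("1.2.b1", "31 May 2002"), ("1.2.0", "12 Aug 2002"),
    ("1.2.1", "04 Sep 2002"), ("1.2.2", "09 Sep 2002"), ("1.2.3", "25 Oct 2002"),
    ("1.3.0", "08 Nov 2002"), ("1.3.1", "19 Nov 2002"), ("1.3.2", "20 Nov 2002"),
    ("1.3.3", "21 Nov 2002"), ("1.3.4", "25 Nov 2002"), ("1.4.0", "06 Dec 2002"),
    ("1.4.1", "07 Jan 2003"), ("1.4.2", "09 Jan 2003"), ("1.4.3", "14 Jan 2003"),
]


def parse(text):
    """Turn a 'DD Mon YYYY' string into a datetime.date."""
    day, month, year = text.split()
    return date(int(year), MONTHS[month], int(day))


def releases_per_series(releases):
    """Count releases per minor series, e.g. '1.2' for 1.2.a1 or 1.2.3."""
    counts = {}
    for version, _ in releases:
        series = ".".join(version.split(".")[:2])
        counts[series] = counts.get(series, 0) + 1
    return counts


def span_days(releases):
    """Days between the earliest and the latest listed release."""
    dates = [parse(d) for _, d in releases]
    return (max(dates) - min(dates)).days


if __name__ == "__main__":
    print(releases_per_series(RELEASES))
    print(span_days(RELEASES), "days between first and last release")
```

Under these inputs the table lists 18 releases over 321 days, with the 1.3 series (five releases in November 2002) showing the tightest cadence.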
- Successes
- Matchmaking/Job Mgt.
- Basic Data Mgt.
- Known Problems
- High Rate Submissions
- Long FTP Transfers
Replica Manager, LCAS/EDG gatekeeper, MyProxy, LCFGng
- Known Problems
- GASS Cache Coherency
- Race Conditions in Gatekeeper
- Unstable MDS
ATLAS commences phase 1 tests
- Problems with long jobs
- Instability in MDS
- Long file transfers unreliable
Mixed Globus 2.0/2.2, RB/JSS upgrade
- Successes
- Improved MDS Stability
- FTP Transfers OK
- Known Problems
- Interactions with RC
CMS stress tests start Nov 30 and continue until Dec 20
RC Changes
- Real Use by Applications!
- Limitations
- Resource Exhaustion
- Size of Logical Collections
BDII
CMS and ATLAS evaluate 1.4.3
8. Applications and outreach
- Major progress with the three application domains
  - WP8 HEP
  - WP9 Earth Observation
  - WP10 Biomedical
- Intense dissemination, outreach and training (WP11-WP12)
  - Tutorials for users wishing to "gridify" their applications
  - 2002: 9 sessions, 200 people trained
  - 2003: 10 sessions foreseen
- DAY1
  - Introduction to Grid computing and overview of the DataGrid project
  - Security
  - Testbed overview
  - Job Submission
  - lunch
  - hands-on exercises: job submission
- DAY2
  - Data Management
  - Fabric mgmt, sw distribution and installation
  - Applications and Use cases
  - Future Directions
  - lunch
  - hands-on exercises: data mgmt
http://hep-proj-grid-tutorials.web.cern.ch/hep-proj-grid-tutorials/
9. WP8 (HEP applications)
- WP8's pioneering work in developing Grid solutions for the HEP community has led to a very large scale, international, HEP-specific Grid project (LHC Computing Grid Project, www.cern.ch/lcg)
- LCG will deploy EDG software for their production testbed in summer 2003
- Joint teams for testing, support and certification
- Synchronisation of timescales and objectives with LCG is important to ensure dissemination and exploitation of the results well beyond the end of the EDG Project
- EDG technology adopted by LCG will have a more general applicability to other sciences, as demonstrated by the HEPCAL (Common Use Cases for a HEP Common Application Layer) exercise
10. WP8 Achievements
- Developed use cases and published the HEPCAL document, being used as a reference by EDG and LCG for future middleware developments
- Continuing validation of middleware with generic testing by the EIPs (loose cannons), funded effort in WP8
- Use of middleware by ATLAS and CMS in Data Challenge activities with joint Experiment/EDG Task Forces
- Has provided vital feedback to EDG for essential developments in data management, information systems and workload management
- For CMS, the work provided 260K events for essential physics studies
- Substantial unfunded effort used here
- All 6 experiments (Babar and D0 have joined WP8) have developed their infrastructure for distributed computing, together with interfaces to EDG middleware
- Active participation in EDG tutorial development and presentations
11. ATLAS (August and Dec/Jan) and CMS (Dec) Evaluations (DETAILED PAPER IN PREPARATION)
CMS
- RESULTS
  - Could distribute and run CMS s/w in EDG environment
  - Generated 250K events for physics with 10,000 jobs in a 3 week period
- OBSERVATIONS
  - Were able to quickly add new sites to provide extra resources
  - Fast turnaround in bug fixing and installing new software
  - Test was labour intensive (since software was developing and the overall system was fragile)
  - EDG 2.0 should fix the major problems, providing a system suitable for full integration in distributed production
ATLAS
- RESULTS
  - ATLAS software was used in the EDG Grid environment
  - Several hundred simulation jobs of length 4-24 hours were executed; data was replicated using grid tools
  - Results of simulation agreed with non-Grid runs
- OBSERVATIONS
  - Good interaction with EDG middleware providers and with WP6/8
  - With a substantial effort it was possible to perform the jobs
  - Showed up bugs and performance limitations (fixed or to be fixed in EDG 2.0)
  - We need the EDG 2.0 release for use in large scale data challenges
12. General Issues
- Due to their dispersed geographical locations, several WPs have limited reach in effectively re-assigning resources to new project goals
- Conflict between releasing new functionality and supporting the production testbed for applications
- In-depth support for the testbed relies on the same human resources that were working on the EDG 2.0 components (which also address performance issues)
- This system-level support for the DataGrid integrated software subtracts significant resources from WP1 and other WPs
- The need for more support at the project level is therefore felt
13. WP12 (Project Management)
- Reinforcement of the Project Office (Deputy Project Manager, Deputy Technical coordinator, second administrator)
- Architecture group (ATF) re-launched
- Globus support contract activated with Argonne (ANL) and being processed with Univ. S. California's Info Sciences Inst. (ISI)
- Software license established
- Co-ordination and collaboration with other projects
- RN Geant, LCG, DataTAG/iVDGL, PPDG/GriPhyN, CrossGrid, GRIDSTART
- Quality group launched and coordinated
- Launched application task forces (ATLAS and CMS) successfully managed by the applications and coordinated by WP8
- Major contribution to dissemination and standards (GGF, conferences, EDG tutorials)
- Deliverables related to the second testbed major release rescheduled
14. Review Conclusions
- Difficulties arise from finding a balance between support of the current s/w and effort devoted towards advanced solutions and migration to new emerging standards
- Important progress made in functionality and performance of software and testbed(s)
- Pioneered Grid technology adopted by many projects including LCG for one of the largest scientific enterprises to date
- Exploring further major Grid deployment activities in FP6
- Fulfilling its role of EU Grid flagship project
15. EU reviewers feedback
- Congratulations for a good review.
- Good presentations and no "Murphy's law" for the demos. An impressive job.
- This success reflects the interest of all the partners involved.
- Congratulates the project management for taking the risk of concentrating on production quality.
- Would like to see the promise fulfilled of no relevant loss of functionality by the end of the project.
16. EU Recommendations
- Establish the cross-application work group to get feedback to middleware, and to get a common application layer and potential synergy. This group needs clear and measurable objectives.
- WP4 (fabric mgmt) - highly appreciated - good results with excellent potential - products need to be promoted outside the project.
- WP11 (dissemination) - significant improvement over last year. Needs extra effort in the last year with measurable objectives. Expand on the industrial forum and dissemination. Good to see publications but more introductory material (e.g. a book on the project) would be welcome.
- WP9 (Earth Observation) - started late but recovery is in progress. Still room for improvements, which they expect to be exploited during the year.
17. EU Recommendations
- Explore branding opportunities in relation to Globus (testing, packaging etc.). Ensure relationships with Globus are better formalised.
- Continue and extend work through GGF (OGSA).
- Formalize scalable and supportable testbed infrastructure to exploit further the testbeds and middleware.
- Security policies to be developed quickly to support industrial exploitation.
- Cost claims - not much diversion from project plan and we expect the project commissioner to follow this up.
- Congratulations again to the project management for an excellent job.
18. 3rd year schedule
- March
  - D6.6, 8.3, 9.3, 10.3 evaluation reports (rescheduled)
  - D7.6 Security design report
- May
  - EDG 2.0 release deployed
  - subsequent improvements based on application feedback
  - Project conference in Barcelona
- June
  - D11.6 Report of the 2nd annual conf. and industry Grid Forum workshop
- July
  - D9.4 EO application platform interface
- September
  - EDG 2.x release deployed
  - D1.6, 2.5, 3.5, 4.5, 5.5, 6.7 sw and doc.
  - Final project conference in Heidelberg
- December
  - D11.7 Report on final project conference
  - D11.9 Report on contributions to international standards
  - D1.7, 2.6, 3.6, 4.6, 5.6, 6.8, 7.7 Final evaluation reports
  - D8.4, 9.5, 10.4 Application demos and final reports
  - D12.19 Third annual report
- Early 2004
  - Final project review
final testbed
19. Conclusions
- Important milestone passed
- Major re-orientation of the project accepted
- EDG M/W being released to LCG for LCG-1 release
- Need to develop further plans with LCG and in view of future project EGEE
- Need to accommodate other applications (in agreement with LCG)
- Plan long term support of EDG developments (after 2003)
- Major opportunity for further EU funding (EGEE)
- EDG was launched by HEPCC; they can be happy and proud
- We hope to repeat the same success with EGEE!