Report from DataGrid Project Review


Title: Report from DataGrid Project Review


1
Report from DataGrid Project Review
  • Fabrizio Gagliardi
  • Project Leader
  • Fabrizio.Gagliardi_at_cern.ch

2
Major Review Goals
  • Important to get approval for a number of
    variations from original plans
  • refocus on production testbed releases driven by
    applications (HEPCAL)
  • synchronization with LCG timeline and plans
  • multiple testbeds (development, application)
  • financial status of the project
  • M/W development plans
  • dissemination activity
  • support for future EU projects (EGEE)

3
DataGRID project priorities refocused
After initial middleware development and testbed
deployment, effort has been refocused on quality
and stability
  • Quality Policy Statement published
  • http://eu-datagrid.web.cern.ch/eu-datagrid/WP12/default.htm
  • List of priorities defined at a project retreat
  • http://documents.cern.ch/age?a021130
  • Followed-up at the last project conference
  • http://www.tomiexpress.hu/datagrid/
  • Show-stoppers found by users on the application
    testbed were the highest priority
  • Incremental improvements driven by the needs of
    the applications (HEPCAL)

4
Project Status at the time of the review
  • EDG currently provides a set of middleware
    services
  • Job and Data Management
  • GRID Network monitoring
  • Security, Authentication and Authorization tools
  • Fabric Management
  • EDG release 1.4 currently deployed to the
    EDG-Testbeds
  • 15 sites in application testbed actively used by
    application groups
  • Core sites: CERN (CH), RAL (UK), NIKHEF (NL),
    CNAF (I), CC-Lyon (F)
  • EDG software also deployed at a total of 40 sites
    via CrossGrid, DataTAG and national grid projects
  • Many applications ported to EDG testbeds and
    actively being used
  • Intense middleware development is ongoing

5
Relationship with LHC ComputingGrid project (LCG)
  • DataGrid is contributing to LCG
  • LCG release 1 (July 2003) will deploy EDG 2.0,
    VDT 1.1.7 (iVDGL et al.) and GLUE schema (DataTAG
    et al.)
  • LCG is contributing to DataGrid
  • Testbed support and infrastructure
  • Access to more computing resources in HEP centers
  • Testing and verification
  • Reinforce the testing group and maintain a
    certification testbed
  • Fabric management and middleware development
  • Interaction with US colleagues
  • LCG needs are helping to guide synergy with US
    projects

  • LCG: grid deployment for HEP
  • Advantages for DataGrid: better support for
    Condor and Globus, synchronization with other
    grid projects
  • GLUE: common information schema for
    interoperability

6
Application Testbed Resources
Site            Country   CPUs   Storage (GB)
CC-IN2P3        FR         620            192
CERN            CH         138           1321
CNAF            IT          48           1300
Ecole Poly.     FR           6            220
Imperial Coll.  UK          92            450
Liverpool       UK           2             10
Manchester      UK           9             15
NIKHEF          NL         142            433
Oxford          UK           1             30
Padova          IT          11            666
RAL             UK           6            332
SARA            NL           0          10000
TOTAL           5         1075          14969
also Dev. TB 200 TB including tape
  • Since Last Year
  • Improved software (EDG 1.4.3).
  • Doubled sites. More waiting
  • Australia, Taiwan, USA (U. Wisc.), UK Sites,
    INFN, French sites, CrossGrid,
  • Significantly more CPU/Storage.
  • Hidden Infrastructure
  • MDS Hierarchy, Resource Brokers, User Interfaces,
    VO Replica Catalogs, VO Membership Servers,
    Certificate Authorities

7
History: relating applications work to TB versions
Version Date
1.1.2 27 Feb 2002
1.1.3 02 Apr 2002
1.1.4 04 Apr 2002
1.2.a1 11 Apr 2002
1.2.b1 31 May 2002
1.2.0 12 Aug 2002
1.2.1 04 Sep 2002
1.2.2 09 Sep 2002
1.2.3 25 Oct 2002
1.3.0 08 Nov 2002
1.3.1 19 Nov 2002
1.3.2 20 Nov 2002
1.3.3 21 Nov 2002
1.3.4 25 Nov 2002
1.4.0 06 Dec 2002
1.4.1 07 Jan 2003
1.4.2 09 Jan 2003
1.4.3 14 Jan 2003
  • Successes
  • Matchmaking/Job Mgt.
  • Basic Data Mgt.
  • Known Problems
  • High Rate Submissions
  • Long FTP Transfers

Replica Manager, LCAS/EDG gatekeeper, MyProxy, LCFGng
  • Known Problems
  • GASS Cache Coherency
  • Race Conditions in Gatekeeper
  • Unstable MDS

ATLAS commences phase 1 tests
  • Problems with long jobs
  • Instability in MDS
  • Long file transfers unreliable

Mixed Globus 2.0/2.2 RB/JSS Upgrade
  • Successes
  • Improved MDS Stability
  • FTP Transfers OK
  • Known Problems
  • Interactions with RC

CMS starts stress tests on Nov 30, which continue
until Dec 20
RC Changes
  • Real Use by Applications!
  • Limitations
  • Resource Exhaustion
  • Size of Logical Collections

BDII
CMS and Atlas evaluate 1.4.3
8
Applications and outreach
  • Major progress with the three application
    domains
  • WP8 HEP
  • WP9 Earth Observation
  • WP10 Biomedical
  • Intense dissemination, outreach and training
    (WP11-WP12)
  • Tutorials for users wishing to "gridify" their
    applications
  • 2002: 9 sessions, 200 people trained
  • 2003: 10 sessions foreseen
  • DAY1
  • Introduction to Grid computing and overview of
    the DataGrid project
  • Security
  • Testbed overview
  • Job Submission
  • lunch
  • hands-on exercises: job submission
  • DAY2
  • Data Management
  • Fabric management, software distribution and
    installation
  • Applications and Use cases
  • Future Directions
  • lunch
  • hands-on exercises: data management

http://hep-proj-grid-tutorials.web.cern.ch/hep-proj-grid-tutorials/
9
WP8 (HEP applications)
  • WP8's pioneering work in developing Grid
    solutions for the HEP community has led to a very
    large-scale international HEP-specific Grid
    project (LHC Computing Grid Project,
    www.cern.ch/lcg)
  • LCG will deploy EDG software for its production
    testbed in Summer 2003
  • Joint teams for testing, support and
    certification
  • Synchronisation of timescales and objectives with
    LCG is important to ensure dissemination and
    exploitation of the results well beyond the end
    of the EDG Project
  • EDG technology adopted by LCG will have a more
    general applicability to other sciences as
    demonstrated by the HEPCAL (Common Use Cases for
    a HEP Common Application Layer) exercise

10
WP8 Achievements
  • Developed use cases and published HEPCAL
    document, being used as a reference by EDG and
    LCG for future middleware developments
  • Continuing validation of middleware with generic
    testing by the EIPs (loose cannons) funded effort
    in WP8
  • Use of middleware by Atlas and CMS in Data
    Challenge activities with joint Experiment/EDG
    Task Forces
  • Has provided vital feedback to EDG for essential
    developments in data management, information
    systems and workload management
  • For CMS, the work provided 260K events for
    essential physics studies
  • Substantial unfunded effort used here
  • All 6 experiments (BaBar and D0 have joined WP8)
    have developed their infrastructure for
    distributed computing, together with interfaces
    to EDG middleware
  • Active participation in EDG tutorial development
    and presentations

11
Atlas (August and Dec/Jan) and CMS (Dec)
Evaluations (detailed paper in preparation)
  • RESULTS
  • Could distribute and run CMS s/w in EDG
    environment
  • Generated 250K events for physics with 10,000
    jobs in 3 week period
  • OBSERVATIONS
  • Were able to quickly add new sites to provide
    extra resources
  • Fast turnaround in bug fixing and installing new
    software
  • Test was labour intensive (since software was
    developing and the overall system was fragile)
  • EDG 2.0 should fix the major problems, providing
    a system suitable for full integration in
    distributed production
  • RESULTS
  • Atlas software was used in the EDG Grid
    environment
  • Several hundred simulation jobs of length 4-24
    hours were executed; data was replicated using
    grid tools
  • Results of simulation agreed with non-Grid
    runs
  • OBSERVATIONS
  • Good interaction with EDG middleware providers
    and with WP6/8
  • With a substantial effort it was possible to
    perform the jobs
  • Showed up bugs and performance limitations (fixed
    or to be fixed in EDG 2.0)
  • We need EDG 2.0 release for use in large scale
    data challenges

12
General Issues
  • Due to their dispersed geographical locations,
    several WPs have limited reach in effectively
    re-assigning resources to new project goals
  • Conflict between releasing new functionality and
    supporting the production testbed for applications
  • In-depth support for the testbed relies on the
    same human resources that were working on the EDG
    2.0 components (which also address performance
    issues)
  • This system-level support for the DataGrid
    integrated software subtracts significant
    resources from WP1 and other WPs
  • The need for more support at the project level is
    therefore felt

13
WP12 (Project Management)
  • Reinforcement of the Project Office (Deputy
    Project Manager, Deputy Technical coordinator,
    second administrator)
  • Architecture group (ATF) re-launched
  • Globus support contract activated with Argonne
    (ANL) and being processed with Univ. S.
    California's Info Sciences Inst. (ISI)
  • Software license established
  • Co-ordination and collaboration with other
    projects
  • RN Geant, LCG, DataTAG/iVDGL, PPDG/GriPhyN,
    CrossGrid, GRIDSTART
  • Quality group launched and coordinated
  • Launched application task forces (Atlas and CMS)
    successfully managed by the applications and
    coordinated by WP8
  • Major contribution to dissemination and standards
    (GGF, conferences, EDG tutorials)
  • Deliverables related to the second testbed major
    release rescheduled

14
Review Conclusions
  • Difficulties arise from finding a balance between
    support of the current s/w and effort devoted
    to advanced solutions and migration to new
    emerging standards
  • Important progress made in functionality and
    performance of software and testbed(s)
  • Pioneered Grid technology adopted by many
    projects including LCG for one of the largest
    scientific enterprises to date
  • Exploring further Grid major deployment
    activities in FP6
  • Fulfilling its role of EU Grid flagship project

15
EU reviewers feedback
  • Congratulations for a good review.
  • Good presentations and no "Murphy's law" for the
    demos. An impressive job.
  • This success reflects the interest of all the
    partners involved.
  • Congratulates the project management for taking
    the risk of concentrating on production quality.
  • Would like to see the promise fulfilled of no
    relevant loss of functionality by the end of the
    project.

16
EU Recommendations
  • Establish the cross-application work-group to
    feed back to middleware and to identify a common
    application layer and potential synergy. This
    group needs clear and measurable objectives.
  • WP4 (fabric mgmt) - highly appreciated - good
    results with excellent potential - products need
    to be promoted outside the project.
  • WP11 (dissemination) - significant improvement
    over last year. Needs extra effort in the last
    year with measurable objectives. Expand on the
    industrial forum and dissemination. Good to see
    publications but more introductory material (e.g.
    a book on the project) would be welcome.
  • WP9 (Earth Observation) - started late but
    recovery is in progress. Still room for
    improvements which they expect to be exploited
    during the year.

17
EU Recommendations
  • Explore branding opportunities in relation to
    Globus (testing, packaging etc.). Ensure
    relationships with Globus are better formalised.
  • Continue and extend work through GGF (OGSA).
  • Formalize scalable and supportable testbed
    infrastructure to exploit further the testbeds
    and middleware.
  • Security policies to be developed quickly to
    support industrial exploitation.
  • Cost claims - not much diversion from project
    plan and we expect the project commissioner to
    follow this up.
  • Congratulations again to the project management
    for an excellent job.

18
3rd year schedule
  • March
  • D6.6, D8.3, D9.3, D10.3 evaluation reports
    (rescheduled)
  • D7.6 Security design report
  • May
  • EDG 2.0 release deployed
  • subsequent improvements based on application
    feedback
  • Project conference in Barcelona
  • June
  • D11.6 Report of the 2nd annual conf. and industry
    Grid Forum workshop
  • July
  • D9.4 EO application platform interface
  • September
  • EDG 2.x release deployed
  • D1.6, D2.5, D3.5, D4.5, D5.5, D6.7 software and
    documentation
  • Final project conference in Heidelberg
  • December
  • D11.7 Report on final project conference
  • D11.9 Report on contributions to international
    standards
  • D1.7, D2.6, D3.6, D4.6, D5.6, D6.8, D7.7 Final
    evaluation reports
  • D8.4, D9.5, D10.4 Application demos and final
    reports
  • D12.19 Third annual report
  • Early 2004
  • Final project review

final testbed
19
Conclusions
  • Important milestone passed
  • Major re-orientation of the project accepted
  • EDG M/W being released to LCG for LCG-1 release
  • Need to develop further plans with LCG and in
    view of future project EGEE
  • Need to accommodate other applications (in
    agreement with LCG)
  • Plan long term support of EDG developments (after
    2003)
  • Major opportunity for further EU funding (EGEE)
  • EDG was launched by HEPCC, they can be happy and
    proud
  • We hope to repeat the same success with EGEE!