The EGEE Project Status - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
The EGEE Project Status
  • Ian Bird
  • EGEE Operations Manager
  • CERN
  • Geneva, Switzerland

ISGC, Taipei, 27th April 2005
2
Contents
  • The EGEE Project
  • Overview and Structure
  • Grid Operations
  • Middleware
  • Networking Activities
  • Applications
  • HEP
  • Biomedical
  • Summary

3
EGEE goals
  • Goal of EGEE: develop a service grid
    infrastructure which is available to scientists
    24 hours a day
  • The project concentrates on:
  • building a consistent, robust and secure Grid
    network that will attract additional computing
    resources
  • continuously improving and maintaining the
    middleware in order to deliver a reliable service
    to users
  • attracting new users from industry as well as
    science and ensuring they receive the high
    standard of training and support they need

4
EGEE
  • EGEE is the largest Grid infrastructure project
    in Europe
  • 70 leading institutions in 27 countries,
    federated in regional Grids
  • Leveraging national and regional grid activities
  • 32 M Euros EU funding for initially 2 years
    starting 1st April 2004
  • EU review, February 2005: successful
  • Preparing 2nd phase of the project: proposal to
    EU Grid call, September 2005
  • Promoting scientific partnership outside EU

5
EGEE Activities
  • 48% service activities (Grid Operations, Support
    and Management, Network Resource Provision)
  • 24% middleware re-engineering (Quality
    Assurance, Security, Network Services
    Development)
  • 28% networking (Management, Dissemination and
    Outreach, User Training and Education,
    Application Identification and Support, Policy
    and International Cooperation)

Emphasis in EGEE is on operating a
production grid and supporting the end-users
7
Computing Resources: April 2005
  • This greatly exceeds the project expectations for
    numbers of sites
  • Shows that the main issue of complexity is the
    number of sites
8
SA1 Operations Structure
  • Operations Management Centre (OMC)
  • At CERN: coordination etc.
  • Core Infrastructure Centres (CIC)
  • Manage daily grid operations: oversight,
    troubleshooting
  • Run essential infrastructure services
  • Provide 2nd level support to ROCs
  • UK/I, Fr, It, CERN, Russia (M12)
  • Taipei will also run a CIC
  • Regional Operations Centres (ROC)
  • Act as front-line support for user and operations
    issues
  • Provide local knowledge and adaptations
  • One in each region; many are distributed
  • User Support Centre (GGUS)
  • At FZK: manage PTS, provide single point of
    contact (service desk)
  • Not foreseen as such in TA, but need is clear

9
Grid Operations
  • The grid is flat, but
  • Hierarchy of responsibility
  • Essential to scale the operation
  • CICs act as a single Operations Centre
  • Operational oversight (grid operator)
    responsibility rotates weekly between CICs
  • Report problems to ROC/RC
  • ROC is responsible for ensuring problem is
    resolved
  • ROC oversees regional RCs
  • ROCs responsible for organising the operations in
    a region
  • Coordinate deployment of middleware, etc
  • CERN coordinates sites not associated with a ROC

RC: Resource Centre; ROC: Regional Operations
Centre; CIC: Core Infrastructure Centre
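
The weekly hand-over of the grid-operator role between CICs can be pictured as a simple round-robin. The sketch below is illustrative only: the CIC names come from the SA1 Operations Structure slide, while their order, the Monday week boundary and the function name `cic_on_duty` are assumptions, not the project's actual rota.

```python
from datetime import date

# CICs listed on the SA1 slide; the ordering here is an assumption.
CICS = ["UK/I", "Fr", "It", "CERN", "Russia"]


def cic_on_duty(day: date, cics=CICS) -> str:
    """Return the CIC holding operational oversight in the week containing `day`."""
    # date.toordinal() counts days from 0001-01-01, which is a Monday,
    # so this index increases by one every Monday and never resets.
    week_index = (day.toordinal() - 1) // 7
    return cics[week_index % len(cics)]


if __name__ == "__main__":
    print(cic_on_duty(date(2005, 4, 27)))
```

Every day of a given Monday-to-Sunday week maps to the same CIC, and with five CICs each one takes the role again after five weeks.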
10
Grid monitoring
  • Operation of Production Service: real-time
    display of grid operations
  • Accounting information
  • Selection of monitoring tools:
  • GIIS Monitor and Monitor Graphs
  • Sites Functional Tests
  • GOC Data Base
  • Scheduled Downtimes
  • Live Job Monitor
  • GridICE: VO and fabric view
  • Certificate Lifetime Monitor

11
Operations focus
  • Main focus of activities now
  • Improving the operational reliability and
    application efficiency
  • Automating monitoring → alarms
  • Ensuring a 24x7 service
  • Removing sites that fail functional tests
  • Operations interoperability with OSG and others
  • Improving user support
  • Demonstrate to users a reliable and trusted
    support infrastructure
  • Deployment of gLite components
  • Testing, certification → pre-production service
  • Migration planning and deployment while
    maintaining/growing interoperability
  • Further developments now have to be driven by
    experience in real use

13
gLite middleware
  • The 1st release of gLite (v1.0) was made at the
    end of March 2005
  • http://glite.web.cern.ch/glite/packages/R1.0/R20050331
  • http://glite.web.cern.ch/glite/documentation
  • Lightweight services
  • Interoperability and co-existence with deployed
    infrastructure
  • Performance and fault tolerance
  • Portable
  • Service oriented approach
  • Site autonomy
  • Open source license

14
gLite Release 1.0
  • Job management Services
  • Workload Management
  • Computing Element
  • Logging and Bookkeeping
  • Data management Services
  • File and Replica catalog
  • File Transfer and Placement Services
  • gLite I/O
  • Information Services
  • R-GMA
  • Service Discovery
  • Security
  • Deployment Modules
  • Distribution available as RPMs, Binary Tarballs,
    Source Tarballs and APT cache

Serious testing and certification is just starting
15
gLite Services for Release 1.0: Components Summary
and Origin
  • Computing Element
  • Gatekeeper, WSS (Globus)
  • Condor-C (Condor)
  • CE Monitor (EGEE)
  • Local batch system (PBS, LSF, Condor)
  • Workload Management
  • WMS (EDG)
  • Logging and bookkeeping (EDG)
  • Condor-C (Condor)
  • Storage Element
  • File Transfer/Placement (EGEE)
  • glite-I/O (AliEn)
  • GridFTP (Globus)
  • SRM: Castor (CERN), dCache (FNAL, DESY), other
    SRMs
  • Catalog
  • File and Replica Catalog (EGEE)
  • Metadata Catalog (EGEE)
  • Information and Monitoring
  • R-GMA (EDG)
  • Service Discovery (EGEE)
  • Security
  • VOMS (DataTAG, EDG)
  • GSI (Globus)
  • Authentication for C and Java based (web)
    services (EDG)

16
Main Differences to LCG-2
  • Workload Management System works in push and pull
    mode
  • Computing Element moving towards a VO based
    scheduler guarding the jobs of the VO (reduces
    load on GRAM)
  • Re-factored file replica catalogs
  • Secure catalogs (based on user DN; VOMS
    certificates being integrated)
  • Scheduled data transfers
  • SRM based storage
  • Information Services: R-GMA with improved API,
    Service Discovery and registry replication
  • Move towards Web Services
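
The difference between push and pull mode can be illustrated with a toy model. This is a minimal sketch, not the gLite Workload Management System or its API: the class names, the "most free slots" placement policy and the job labels are all assumptions made for illustration. In push mode the broker decides where each job goes; in pull mode a computing element asks for work when it has free slots.

```python
from collections import deque


class ToyBroker:
    """Toy stand-in for a workload manager holding a queue of pending jobs."""

    def __init__(self, jobs):
        self.pending = deque(jobs)

    def push(self, elements):
        """Push mode: the broker chooses a computing element for each job."""
        placement = {}
        while self.pending:
            # Assumed policy: send the job to the element with most free slots.
            target = max(elements, key=lambda ce: ce.free_slots)
            if target.free_slots == 0:
                break  # no element can accept more work right now
            job = self.pending.popleft()
            target.accept(job)
            placement[job] = target.name
        return placement

    def pull(self):
        """Pull mode: hand out the next job when an element asks for one."""
        return self.pending.popleft() if self.pending else None


class ToyElement:
    """Toy computing element with a fixed number of job slots."""

    def __init__(self, name, slots):
        self.name = name
        self.free_slots = slots
        self.running = []

    def accept(self, job):
        self.running.append(job)
        self.free_slots -= 1

    def request_work(self, broker):
        """Pull mode: keep asking the broker for jobs while slots are free."""
        while self.free_slots > 0:
            job = broker.pull()
            if job is None:
                break
            self.accept(job)
```

In this model, pull mode means the broker needs no up-to-date view of each element's load, because an element only asks when it can actually run something.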

18
Outreach and Training
  • Public and technical websites constantly evolving
    to expand information available and keep it up to
    date
  • 2 conferences organised
  • 300 @ Cork, 400 @ Den Haag
  • Athens: 3rd project conference, 18-22 April 2005
  • http://public.eu-egee.org/conferences/3rd/
  • Pisa: 4th project conference, 24-28 October 2005
  • More than 70 training events (including the GGF
    grid school) across many countries
  • 1000 people trained
  • induction, application developer, advanced,
    retreats
  • Material archive with more than 100 presentations
  • Strong links with GILDA testbed and GENIUS portal
    developed in EU DataGrid

19
Deployment of applications
  • Pilot applications
  • High Energy Physics
  • Biomed applications
  • http://egee-na4.ct.infn.it/biomed/applications.html
  • Generic applications: deployment under way
  • Computational Chemistry
  • Earth science research
  • EGEODE: first industrial application
  • Astrophysics
  • With interest from
  • Hydrology
  • Seismology
  • Grid search engines
  • Stock market simulators
  • Digital video etc.
  • Industry (provider, user, supplier)
  • Many users
  • broad range of needs
  • different communities with different background
    and internal organization

20
High Energy Physics
  • Very experienced and large international user
    community
  • Involvement in many projects worldwide and users
    of several grids (e.g. all LHC experiments do use
    multiple grids at the same time for their data
    challenges)
  • LG experiments: ZEUS, D0, CDF, H1, BaBar
  • Production infrastructure (LCG/EGEE)
  • Intensive usage during 2004 data challenges
  • LHCb: 3500 concurrent jobs for long periods
  • Many issues of functionality and performance were
    exposed
  • Data challenges were also the first real use of
    LCG-2; only limited testing had been done in
    advance
  • Major issue was reliability: badly configured
    and unstable sites
  • Nevertheless significant work was done
  • >1 M SI2K years of CPU time (1000 CPU years)
  • 400 TB of data generated, moved and stored
  • 4000-5000 simultaneous jobs (4 times CERN grid
    capacity)
  • ARDA role in application development and
    middleware testing
  • Helping the evolution of the experiments
    specific middleware towards analysis usage
  • Large effort on the 4 LHC experiments prototypes
  • CMS prototype migrated to gLite version 1 and
    exposed to several users
  • Improved reliability has been achieved by
    selecting well maintained sites
  • Efficiencies of better than 90% have been
    possible (D0, CMS, ATLAS) in well controlled
    conditions
  • This remains main area of focus for improvement
    due in large part to number of sites in the
    infrastructure

21
Recent ATLAS work
[Plots: 10,000 concurrent jobs in the system; number of jobs/day]
  • ATLAS jobs in EGEE/LCG-2 in 2005
  • In latest period up to 8K jobs/day
  • Used a combination of RB and Condor-G submissions

22
ZEUS on LCG-2
23
LCG Deployment Schedule
  • LHC starts in 2007
  • Ramp-up with series of service challenges to
    ensure key services infrastructure in place
  • Extremely aggressive timescale

24
Introduction: The MAGIC Telescope
  • Ground based Air Cerenkov Telescope
  • Gamma ray: 30 GeV - TeV
  • La Palma, Canary Islands (28° North, 18° West)
  • 17 m diameter
  • operation since autumn 2003 (still in
    commissioning)
  • Collaborators:

IFAE Barcelona, UAB Barcelona, Humboldt U.
Berlin, UC Davis, U. Lodz, UC Madrid, MPI
München, INFN / U. Padova, U. Potchefstroom, INFN
/ U. Siena, Tuorla Observatory, INFN / U. Udine,
U. Würzburg, Yerevan Physics Inst., ETH Zürich

Physics Goals:
  • Origin of VHE gamma rays
  • Active Galactic Nuclei
  • Supernova Remnants
  • Unidentified EGRET sources
  • Gamma Ray Bursts
25
Introduction: ground γ-ray astronomy
26
MAGIC: Hadron rejection
  • Based on extensive Monte Carlo Simulation
  • air shower simulation program CORSIKA
  • Simulation of hadronic background is very CPU
    consuming
  • to simulate the background of one night, 70 CPUs
    (P4 2GHz) need to run 19200 days
  • to simulate the gamma events of one night for a
    Crab like source takes 288 days
  • At higher energies (> 70 GeV) observations are
    already possible with the On-Off method (this
    reduces the On-time by a factor of two)
  • Lowering the threshold of the MAGIC telescope
    requires new methods based on Monte Carlo
    Simulations
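
To put the hadron-background figure in perspective, the arithmetic can be sketched as follows. The sketch assumes the slide means that 70 CPUs would each be busy for 19200 days, and that the simulation splits perfectly across any number of identical CPUs; both are assumptions made for illustration, and the helper name `cpus_needed` and the 30-day target are hypothetical.

```python
import math


def cpus_needed(total_cpu_days: float, wall_days: float) -> int:
    """CPUs required to finish `total_cpu_days` of work within `wall_days`,
    assuming the work parallelises perfectly over identical CPUs."""
    return math.ceil(total_cpu_days / wall_days)


# Figures from the slide; the interpretation (70 CPUs x 19200 days) is assumed.
background_cpu_days = 70 * 19200

if __name__ == "__main__":
    print("CPU-days for one night of background:", background_cpu_days)
    print("CPUs to finish within 30 days:", cpus_needed(background_cpu_days, 30))
```

Under these assumptions the workload is far beyond a single 70-CPU cluster, which is the motivation for running the simulation on the grid.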

27
Experiences
  • Data challenge Grid-1
  • 12M hadron events
  • 12000 jobs needed
  • started March 2005
  • up to now 4000 jobs

170/3780 jobs failed → 4.5% failure
Job successful: output file registered at PIC
  • First tests
  • with manual GUI submission
  • Reasons for failure
  • Network problems
  • RB problems
  • Queue problems
  • Diagnostic
  • no tools found
  • complex and time consuming
  • → use metadata base, log the failure,
    resubmit and don't care
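
The quoted failure rate and the "log the failure, resubmit and don't care" strategy can be sketched as below. This is a minimal illustration, not the MAGIC submission tooling: `submit` is a hypothetical callable that returns True when a job succeeds, the in-memory list stands in for the metadata base, and the retry limit is an assumption.

```python
def failure_rate(failed: int, total: int) -> float:
    """Percentage of jobs that failed."""
    return 100.0 * failed / total


def run_with_resubmission(job_ids, submit, max_attempts=3):
    """Submit each job; on failure, record it and resubmit without diagnosis."""
    failure_log = []  # stand-in for the metadata base mentioned on the slide
    done = set()
    for job_id in job_ids:
        for attempt in range(1, max_attempts + 1):
            if submit(job_id):
                done.add(job_id)
                break
            # Log the failed attempt, then simply try again.
            failure_log.append((job_id, attempt))
    return done, failure_log


if __name__ == "__main__":
    # Figures from the slide: 170 failed jobs out of 3780.
    print(f"{failure_rate(170, 3780):.1f}% failure")
```

With a failure rate of a few percent, blind resubmission is cheaper than diagnosing each failure, which matches the slide's observation that diagnosis is complex and time consuming.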

28
Biomed applications
  • Loosely coupled community
  • Had to go the long way of getting up to speed
  • VO creation and core services installation
  • Setting up a task force of experts
  • Recently joined the user support at application
    level
  • Applications
  • See list and description from web site
  • http://egee-na4.ct.infn.it/biomed/applications.html
  • 12 applications running today
  • New applications emerging
  • medical imaging, bioinformatics, phylogenetics,
    molecule structures and drug discovery...
  • Grown to a significant infrastructure usage
  • 29k CPU hours and 24k jobs reported in January

29
Bioinformatics
  • GPS@: Grid Protein Sequence Analysis
  • NPSA is a web portal offering protein databases
    and sequence analysis algorithms to
    bioinformaticians (3000 hits per day)
  • GPS@ is a gridified version with increased
    computing power
  • Need for large databases and big number of short
    jobs
  • xmipp_MLrefine
  • 3D structure analysis of macromolecules from
    (very noisy) electron microscopy images
  • Maximum likelihood approach for finding the
    optimal model
  • Very compute intensive
  • Drug discovery
  • Health related area with high performance
    computation need
  • An application currently being ported in Germany
    (Fraunhofer institute)

30
Medical imaging
  • GATE
  • Radiotherapy planning
  • Improvement of precision by Monte Carlo
    simulation
  • Processing of DICOM medical images
  • Objective: very short computation time compatible
    with clinical practice
  • Status: development and performance testing
  • CDSS
  • Clinical Decision Support System
  • assembling knowledge databases
  • wide deployment of image classification engines
  • Objective: access to knowledge databases from
    hospitals
  • Status: from development to deployment, some
    medical end users

31
Medical imaging
  • SiMRI3D
  • 3D Magnetic Resonance Image Simulator
  • MRI physics simulation, parallel implementation
  • Very compute intensive
  • Objective: offering an image simulator service to
    the research community
  • Status: parallelized and now running on LCG-2
    resources
  • gPTM3D
  • Interactive tool for medical images segmentation
    and analysis
  • A non gridified version is distributed in several
    hospitals
  • Need for very fast scheduling of interactive
    tasks
  • Objective: shorten computation time using the
    grid
  • Status: development of the gridified version
    being finalized

32
Evolution of biomedical applications
  • Growing interest of the biomedical community
  • Partners involved proposing new applications
  • New application proposals (in various
    health-related areas)
  • Enlargement of the biomedical community (drug
    discovery)
  • Growing scale of the applications
  • Progressive migration from prototypes to
    pre-production services for some applications
  • Increase in scale (volume of data and number of
    CPU hours)

33
EGEE Geographical Extensions
  • EGEE is a truly international undertaking
  • Collaborations with other existing European
    projects, in particular
  • GÉANT, DEISA, SEE-GRID
  • Relations to other projects/proposals
  • OSG: OpenScienceGrid (USA)
  • Asia: Korea, Taiwan, EU-ChinaGrid
  • BalticGrid: Lithuania, Latvia, Estonia
  • EELA: Latin America
  • EUMedGrid: Mediterranean Area
  • Expansion of EGEE infrastructure in these regions
    is a key element for the future of the project
    and international science

34
Summary
  • EGEE is a first attempt to build a worldwide Grid
    infrastructure for data intensive applications
    from many scientific domains
  • A large-scale production grid service is already
    deployed and being used for HEP and BioMed
    applications with new applications being ported
  • Resources and user groups are expanding
  • A process is in place for migrating new
    applications to the EGEE infrastructure
  • A training programme has started with many events
    already held
  • Next-generation middleware (gLite) is being
    tested
  • First project review by the EU successfully
    passed in Feb 2005
  • Plans for a follow-on project are being prepared