Title: EGEE
1EGEE ??????????????? ????? ? ?????????? ????
?????????????? ??? ?????
- ?.?. ????? (????? ???), ?.?. ????????? (????),
?.?. ???????? (??? ??) - ?? ????? ???? ??????????? ?????????? ???? ???
??????????? ???????? ? ??????? - ???? ???, 5 ?????? 2005
2? ??????? EGEE
- EGEE Enabling Grids for E-sciencE
- ???????? ?????????? ?????????????? ????
???? ??? ?????????????? ?????????? ? ???????????
???????? ? ??????? ? ??????? ?????????????
EGEE ?????? EC FP6 ? ???????? 31 M, ??????
2004 ?????? 2006 (? 2009), 70 ?????????
(????????????) ?? 30 ????? (??????, ???,
??????), ????? 30 ??????????????? ?????????
EGEE ???????? ???? ?????????????? (SA1), Grid
vs Networking (SA2), ??????????/???????? MW
(JRA) Globus2CondorEDG, ????????
????-?????????? (NA4), ???????? (NA3),
dissemination (NA2)
50 25 25
EGEE ?????? ????? 1000 ?????? ????????????,
????? 100 ??????, 10 ?????????? ????????, ?
?????????????? ????? 10000 CPU ? ????? 5 Pbyte
??????
3????
- ???? ?????????? ???? ??? ??????????? ???????? ?
??????? - 8 ??????????-?????????? ?????????????????
??????? EGEE - ???? (?.???????), ???? (?.????????), ??? ??
(?.????????), - ??? ??? (?.???????), ???? ??? (?.?????), ????
??? (?.?????), - ????? ??? (?.??????), ???? (?.?????????)
- ???? ???????????? ????????? ? ??????? EGEE
(????? 12 ?????????) - ?.????? ???? EGEE Project Management Board,
- ??????? ?????? 0.5 M ?????????? ??????????
?????????????? (????????????)
???? ???????? ??????????? ???????? ??????????
?????????????? EGEE ??????? ? SA1 (??? 8
??????????), SA2 (??? ??), NA2-NA4 ???? -
??????????????????? ???? ??????????????
?????????? ???? ??? ?????, ????? ? ??? ???????
??? ?????????? ?????????? ? ???? ???????.
???? ?????? ????? 100 ?????? ????????????, ()
10 ??????, 3 (3) ?????????? ????????, ?
?????????????? ????? 300 CPU ? 50 Tbyte ??????
???? ?? ??????? - ????????
4?????????? EGEE http://goc.grid-support.ac.uk/gridsite/gocmain/
5Operations (SA1, SA2) Management
6??????????? ????????
- CIC Core Infrastructure Center
- ??????????? ???????????????? ??????? ????
???????? 24?7 ????? ??? - ?????????? ? ???????????? ?? ?????????????
???????? ???? - ??????? ????? ???????? ? ??????????? ?????
(accounting) ????? ???, ???? - ?????????? ? ????????? ??????????? ???????????
????? ??? - ????????? ???????????? ??????????????, CA
??? ?? - ???????????? (?????????) ???? ??
??? ??? - ROC Regional Operations Center
- ????, ???? (user support), ???? ???, ????
- RC Resource Center (8)
- VO Virtual Organization
- ?????? ??????? ??????? LHC ATLAS, ALICE, CMS,
LHCb PHOTON, - ?????? (???? ??? EGEE)
- ?????????? VOs e-Earth (????????? - ??? ? ??
???), fusion (??? ?? ), ?????????
(????????????, ??? ), - ??????? (????, ?????, ??-? ????????????, ),
-
7???? ??????????????
???? (CIC)
??? (ROC)
??????????? ??
CA
??????????????????????
????????????????????????
????????? ??
??????????
???????????????? ??
?????? ?????????? EGEE
????????? ??????
??????? ???????
...
JINR
SINP
ITEP
IHEP
MyP
...
BDII
RLS
RB
8???? SA1
- Distributed ROC https://edms.cern.ch/file/479460/4/EGEE-SA1-ExecPlan-RU-v1.6.pdf
  - IHEP, plus some functions provided by ITEP (user support), JINR (operational monitoring), IMPB RAS and PNPI
  - serve 8 RCs; 5 new RCs to appear in the next 3 months and 5 more by the end of 2005
  - support RDIG SA1 managers http://mail.ihep.ru/Lists/roc_support/List.html
  - MW repository http://grid-cvs.ihep.su
  - user support http://ussup.itep.ru
  - GridIce server http://lcfgmon.itep.ru/gridice
- Distributed CIC
  - preparing to start in April 2005 (TA)
  - Some core services are now supported 8x5 (preparing for 24x7)
  - RB, IS, RC, MyProxy, (regional) VO management (SINP MSU)
  - Grid monitoring and accounting (JINR)
  - CA (SINP MSU -> RRC KI), today about 300 (active) certificates, http://lcg20.sinp.msu.ru/CA/
  - MW validation and documentation localization (KIAM RAS), http://www.gridclub.ru
9CIC-on-duty
10CIC-on-duty (????.)
- ??????????? ?????? ???????? ????????? ?????????
?? ????????????? ???????????. - ?????????? ?? ??????? ???????? LCG-ROLLOUT ?
???????????? ?? ??????????? ????????. ??????
?????? ??????????? ?????? ???. - ???? ???????? ?????? ???? ??????? ?? ????????
???????? GIIS (GIIS Monitor), ? ???????? ??????
????????? ?? ???????, ??????????? ??
?????????????? ???????. ? ?????? ?????????
????????? ????? ???????? ????? ??????? (history
of published values) ????? ???????? ???????? ??
???????? ????????? ??? ????? ????? ?????????
????????????????? ?????. ??? ????????????? ?????
? ???????????????? ?????. ??? ???????? ?? ????,
??? ?????? 2 ????. - ???? ??? ? ???? ???????? ????????? ???????? GIIS
(GIIS Monitor reports) ? ????? ??????
???????????? ? ????????????? ????????.
http://egee.sinp.msu.ru
11CIC-on-duty (????.)
- ???? ???????? ?????? ???? ??????? ?? ????????
???????? ??????????? ??????? (Live Job Monitor).
??????????? ????????? ??????????????? ?????????
(?????? ????????????? ?????, ??????? ??????????
????? ? ??????? ?? ?????-?? ????????? ????? ?
?.?.). ????????????? ???? ??????? ?????? 30
?????. - ?????????????? ? 1030 ??????????? ?????? ?
?????? ?????? (Site Test Reports) ? ???????? ?? ?
???????? ? ??????? ???????? "???????" (Savannah
tracking system). ? ?????? ????????? ??????????,
??????? ????????? ????? ??? ???????????? ??????. - ????? ???????? ???????? ????????? ????????????
(Certificate Lifetime Monitor) ? ???????? ???,
???? ????? ????? ? ????????????, ?????????? ?
??????? ??????. - ?????????? ?? ????????? ????? ????????
???????????? ?????? (GOC Job Monitor) (????????
????????? ??? ? ????). - ?????? ????????? ?? ?????????? ???????????
??????, ???????? ????????? ????? ??????????
???????? ? ??????????? ?? ??????????
?????????????? (FAQs and Troubleshooting Guides). - ? ?????? ????????? ???????????? ?? ??????? ?????,
? ??????? ?????? ???? ????????? ?????????
?????????. - ???????? ? ?????? ? ????? ????????? ?????? ????
???????? ?? ?????????? ???????, ?????????? ?
????? ???????? ???????????? ??????
(https://cic.in2p3.fr).
12???? (CIC) ??? ???
- ???????????? (?????????) ???? ??
- ??????????? ???????????? ?? ??????? ????.
13?????? ??????? ? ????
Site
RB
UI
CE
WN
SE
UI ????????? ???????????? RB ??????
???????? BDII ?????????????? ???? ?????? ??
???????? RLS ?????? ?????? ?????? CE
???????????? ??????? SE ??????? ????????
?????? WN ??????? ???? FS ????????
?????? MyProxy ?????? ????????? ????????
??????????? ????????????
Site
CE
WN
SE
19Computing Resources Feb 2005
- Country providing resources
- Country anticipating joining
- In LCG-2
  - 113 sites, 30 countries
  - >10,000 cpu
  - 5 PB storage
- Includes non-EGEE sites
  - 9 countries
  - 18 sites
20Infrastructure metrics
Countries, sites, and CPU available in EGEE production service

Region                Countries  Sites  CPU M6 (TA)  CPU M15 (TA)  CPU actual
EGEE partner regions
CERN                          0      1          900          1800         942
UK/Ireland                    2     19          100          2200        2398
France                        1      8          400           895         886
Italy                         1     20          553           679        1777
South East                    5      7          146           322         133
South West                    2     12          250           250         498
Central Europe                5      8          385           730         373
Northern Europe               2      4          200          2000         427
Germany/Switzerland           2     10          100           400        1207
Russia                        1      6           50           152         238
EGEE-total                   21     95         3084          9428        8879
Other collaborating sites
USA                           1      3            -             -         458
Canada                        1      6            -             -         316
Asia-Pacific                  6      8            -             -         394
Hewlett-Packard               1      1            -             -         100
Total other                   9     18            -             -        1268
Grand Total                  30    113            -             -       10147
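The totals in the infrastructure metrics table can be re-derived from the per-region rows. The short Python sketch below does that; the figures are copied from the slide, while the variable and function names are illustrative only.

```python
# Rows copied from the infrastructure metrics table:
# (region, countries, sites, cpu_m6_ta, cpu_m15_ta, cpu_actual)
EGEE_REGIONS = [
    ("CERN", 0, 1, 900, 1800, 942),
    ("UK/Ireland", 2, 19, 100, 2200, 2398),
    ("France", 1, 8, 400, 895, 886),
    ("Italy", 1, 20, 553, 679, 1777),
    ("South East", 5, 7, 146, 322, 133),
    ("South West", 2, 12, 250, 250, 498),
    ("Central Europe", 5, 8, 385, 730, 373),
    ("Northern Europe", 2, 4, 200, 2000, 427),
    ("Germany/Switzerland", 2, 10, 100, 400, 1207),
    ("Russia", 1, 6, 50, 152, 238),
]

# Other collaborating sites: (region, countries, sites, cpu_actual)
OTHER_SITES = [
    ("USA", 1, 3, 458),
    ("Canada", 1, 6, 316),
    ("Asia-Pacific", 6, 8, 394),
    ("Hewlett-Packard", 1, 1, 100),
]


def column_totals(rows):
    """Sum every numeric column of the rows (the region name is skipped)."""
    return tuple(sum(column) for column in list(zip(*rows))[1:])


egee_total = column_totals(EGEE_REGIONS)    # countries, sites, M6, M15, actual
other_total = column_totals(OTHER_SITES)    # countries, sites, actual
grand_total_cpu = egee_total[-1] + other_total[-1]

print("EGEE total:", egee_total)
print("Total other:", other_total)
print("Grand total CPU:", grand_total_cpu)
```

The computed sums agree with the "EGEE-total", "Total other" and "Grand Total" rows of the slide.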
21Service Usage
- VOs and users on the production service
  - Active HEP experiments
    - 4 LHC, D0, CDF, Zeus, Babar
  - Active other VO
    - Biomed, ESR (Earth Sciences), Compchem, Magic (Astronomy), EGEODE (Geo-Physics)
  - 6 disciplines
  - Registered users in these VO: 500
  - In addition to these there are many VO that are local to a region, supported by their ROCs, but not yet visible across EGEE
- Scale of work performed
  - LHC Data challenges 2004
    - >1 M SI2K years of cpu time (1000 cpu years)
    - 400 TB of data generated, moved and stored
    - 1 VO achieved 4000 simultaneous jobs (4 times CERN grid capacity)
Number of jobs processed/month
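The slide equates more than 1 M SI2K years with about 1000 cpu years, which implies roughly 1000 SI2K per CPU. The Python sketch below spells out that arithmetic. The per-CPU rating is an assumption read off the slide's own figures, not a measured benchmark, and the function names are illustrative.

```python
# Assumption implied by the slide: 1 M SI2K years ~ 1000 cpu years,
# i.e. about 1000 SI2K (1 kSI2K) per CPU.
SI2K_PER_CPU = 1000


def si2k_years_to_cpu_years(si2k_years, si2k_per_cpu=SI2K_PER_CPU):
    """Convert an amount of work in SI2K years into cpu years."""
    return si2k_years / si2k_per_cpu


def implied_capacity(simultaneous_jobs, multiple_of_capacity):
    """Job slots implied by 'N simultaneous jobs = k times the capacity'."""
    return simultaneous_jobs / multiple_of_capacity


print(si2k_years_to_cpu_years(1_000_000))  # work done in the 2004 data challenges
print(implied_capacity(4000, 4))           # CERN grid capacity implied by the slide
```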
22Current production software (LCG-2)
- Evolution through 2003/2004
  - Focus has been on making these reliable and robust, rather than additional functionality
  - Respond to needs of users, admins, operators
- The software stack is the following
  - Virtual Data Toolkit
    - Globus (2.4.x), Condor, etc
  - EU DataGrid project developed higher-level components
    - Workload management (RB, LB, etc)
    - Replica Location Service (single central catalog), replica management tools
    - R-GMA as accounting and monitoring framework
    - VOMS being deployed now
  - Operations team re-worked components
    - Information system: MDS GRIS/GIIS -> LCG-BDII
    - edg-rm tools replaced and augmented as lcg-utils
    - Developments on
      - Disk pool managers (dCache, DPM)
      - Not addressed by JRA1
  - Other tools as required
23The deployment process
- Key point: a certification process is essential
  - However, it is expensive (people, resources, time)
  - But, this is the only way to deliver production quality services
- LCG-2 was built from a wide variety of research quality code
  - Lots of good ideas, but little attention to the mundane needs of production
- Building a reliable distributed system is hard
  - Must plan for failure, must provide fail-over of services, etc
- Integrating components from different projects is also difficult
  - Lack of common standards for logging, error recovery, etc
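The point about planning for failure and providing fail-over can be illustrated with a minimal Python sketch. It is a generic client-side fail-over loop, not code from LCG-2 or gLite. The endpoint names and the `call` function are hypothetical.

```python
def call_with_failover(endpoints, request, call):
    """Try each service endpoint in order and return the first success.

    `call(endpoint, request)` is a caller-supplied function (hypothetical here)
    that raises ConnectionError when an endpoint is unreachable.
    Returns a tuple (endpoint_used, result).
    """
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, call(endpoint, request)
        except ConnectionError as exc:
            # Remember the failure and fall through to the next replica.
            errors.append((endpoint, str(exc)))
    raise RuntimeError(f"all endpoints failed: {errors}")


def fake_call(endpoint, request):
    """Stand-in for a real service call: the first replica is 'down'."""
    if endpoint == "rb1.example.org":
        raise ConnectionError("connection refused")
    return f"{request} accepted by {endpoint}"


used, result = call_with_failover(
    ["rb1.example.org", "rb2.example.org"], "job-001", fake_call
)
print(used, "->", result)
```

A production service would also need timeouts, retries with back-off and health checks; the sketch shows only the ordering logic.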
24SA1 Operations Structure
- Operations Management Centre (OMC)
  - At CERN: coordination etc
- Core Infrastructure Centres (CIC)
  - Manage daily grid operations: oversight, troubleshooting
  - Run essential infrastructure services
  - Provide 2nd level support to ROCs
  - UK/I, Fr, It, CERN, Russia (M12)
  - Taipei also runs a CIC
- Regional Operations Centres (ROC)
  - Act as front-line support for user and operations issues
  - Provide local knowledge and adaptations
  - One in each region; many distributed
- User Support Centre (GGUS)
  - In FZK: manage PTS, provide single point of contact (service desk)
  - Not foreseen as such in TA, but need is clear
25Grid Operations
- The grid is flat, but
  - Hierarchy of responsibility
    - Essential to scale the operation
- CICs act as a single Operations Centre
  - Operational oversight (grid operator) responsibility rotates weekly between CICs
  - Report problems to ROC/RC
  - ROC is responsible for ensuring problem is resolved
  - ROC oversees regional RCs
- ROCs responsible for organising the operations in a region
  - Coordinate deployment of middleware, etc
- CERN coordinates sites not associated with a ROC
RC: Resource Centre
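The weekly rotation of the grid-operator role between CICs can be sketched in Python as a simple round-robin over calendar weeks. The list of CICs comes from the SA1 structure slide. The rotation order and the start date are assumptions made for illustration, not the actual duty schedule.

```python
from datetime import date

# CICs named on the SA1 structure slide; the order is an assumption.
CICS = ["CERN", "UK/I", "Fr", "It", "Russia"]


def cic_on_duty(day, start, cics=CICS):
    """Return the CIC holding the grid-operator role on `day`.

    `start` is the first day of the first shift (hypothetical); each
    shift lasts seven days and the list wraps around.
    """
    if day < start:
        raise ValueError("day precedes the start of the rotation")
    weeks_elapsed = (day - start).days // 7
    return cics[weeks_elapsed % len(cics)]


rotation_start = date(2005, 4, 4)  # hypothetical first shift
for d in (date(2005, 4, 4), date(2005, 4, 11), date(2005, 5, 9)):
    print(d.isoformat(), cic_on_duty(d, rotation_start))
```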
26Accounting views
Select date range
Select VOs (Default All)
Web form to apply selection criteria on the data
Aggregate data across an organisation structure
(Default All ROCs)
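The selection-and-aggregation logic of such an accounting view can be sketched in Python as below. The records, field layout and function name are made up for illustration. The real accounting portal's data model is not described on the slide.

```python
from collections import defaultdict
from datetime import date

# Hypothetical accounting records: (ROC, VO, day, CPU hours). Values are made up.
RECORDS = [
    ("Russia", "atlas", date(2005, 2, 1), 120.0),
    ("Russia", "cms", date(2005, 2, 3), 80.0),
    ("Italy", "atlas", date(2005, 2, 2), 300.0),
    ("Italy", "biomed", date(2005, 3, 1), 50.0),
]


def accounting_view(records, start, end, vos=None, rocs=None):
    """Sum CPU hours per ROC for records dated within [start, end].

    vos=None and rocs=None mean "all", mirroring the form's defaults.
    """
    totals = defaultdict(float)
    for roc, vo, day, cpu_hours in records:
        if not (start <= day <= end):
            continue
        if vos is not None and vo not in vos:
            continue
        if rocs is not None and roc not in rocs:
            continue
        totals[roc] += cpu_hours
    return dict(totals)


february = accounting_view(RECORDS, date(2005, 2, 1), date(2005, 2, 28))
atlas_only = accounting_view(
    RECORDS, date(2005, 2, 1), date(2005, 2, 28), vos={"atlas"}
)
print(february)
print(atlas_only)
```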
27Policy Joint Security Group
Incident Response
Certification Authorities
Audit Requirements
Usage Rules
Security Availability Policy
Application Development Network Admin Guide
User Registration
http://cern.ch/proj-lcg-security/documents.html
28gLite Services for Release 1: Software stack and origin (simplified)
- Computing Element
  - Gatekeeper (Globus)
  - Condor-C (Condor)
  - CE Monitor (EGEE)
  - Local batch system (PBS, LSF, Condor)
- Workload Management
  - WMS (EDG)
  - Logging and bookkeeping (EDG)
  - Condor-C (Condor)
- Storage Element
  - File Transfer/Placement (EGEE)
  - glite-I/O (AliEn)
  - GridFTP (Globus)
  - SRM: Castor (CERN), dCache (FNAL, DESY), other SRMs
- Catalog
  - File and Replica Catalog (EGEE)
  - Metadata Catalog (EGEE)
- Information and Monitoring
  - R-GMA (EDG)
- Security
  - VOMS (DataTAG, EDG)
  - GSI (Globus)
  - Authentication for C and Java based (web) services (EDG)
29Main Differences to LCG-2
- Workload Management System works in push and pull mode
- Computing Element moving towards a VO based scheduler guarding the jobs of the VO (reduces load on GRAM)
- Distributed and re-factored file replica catalogs
- Secure catalogs (based on user DN; VOMS certificates being integrated)
- Scheduled data transfers
- SRM based storage
- Information Services: R-GMA with improved API and registry replication
- Prototypes of additional services
  - Grid Access Service (GAS)
  - Package manager
  - DGAS based accounting system
  - Job provenance service
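The difference between push and pull mode can be illustrated with a toy Python model. This is a sketch of the general scheduling idea only and does not reproduce the gLite WMS. The `ToyCE` class, the slot counts and the job names are all hypothetical.

```python
from collections import deque


class ToyCE:
    """Hypothetical computing element with a fixed number of free slots."""

    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots
        self.running = []

    def accept(self, job):
        self.running.append(job)
        self.free_slots -= 1


def push_schedule(jobs, ces):
    """Push mode: the broker chooses a CE for each job and sends it there."""
    unplaced = []
    for job in jobs:
        target = max(ces, key=lambda ce: ce.free_slots)
        if target.free_slots > 0:
            target.accept(job)
        else:
            unplaced.append(job)
    return unplaced


def pull_schedule(jobs, ces):
    """Pull mode: jobs wait centrally; each CE fetches work while it has a free slot."""
    queue = deque(jobs)
    progress = True
    while queue and progress:
        progress = False
        for ce in ces:
            if queue and ce.free_slots > 0:
                ce.accept(queue.popleft())
                progress = True
    return list(queue)


jobs = ["j1", "j2", "j3", "j4"]
push_ces = [ToyCE("a", 2), ToyCE("b", 1)]
pull_ces = [ToyCE("a", 2), ToyCE("b", 1)]
push_left = push_schedule(jobs, push_ces)
pull_left = pull_schedule(jobs, pull_ces)
print([ce.running for ce in push_ces], push_left)
print([ce.running for ce in pull_ces], pull_left)
```

In the push case the decision is made by the broker from its view of the CEs. In the pull case the CE itself decides when it is ready for more work.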
30Standards
- Web Services: fast moving area
  - Follow WSRF and related standards but are not early adopters
  - WS-I compatibility is a target
    - Challenging to write WSDL which is WS-I compatible AND can be processed by all the tools
  - Industry strength tooling not always available
  - Trying to keep back from the bleeding edge
- Work on standards bodies
  - Active contributions to
    - GGF OGSA-WG
      - GMA in OGSA
      - Data Design team
    - GGF INFOD-WG
    - OASIS WS-N
    - GGF GSM-WG (SRM)
  - Co-chairing WG
    - Replica Registration Service
  - And following many, many others
- Adopting mature standards is a goal
31Release Timeline
[Figure: release timeline, May 2004 to April 2005 ("Today").]
- Builds: First automated build (B 1); First public nightly build (B 39); First Integration build (I20041020 B 80); RC1 (I20041217 B 151); RC1 (I20050204 B 206); Release 1.0
- Functionality: VOMS, Site Configuration; AliEn, R-GMA; Prototype available to ARDA users; I/O Client, I/O Server; Data Local Transfer Service, Single Catalog; CE, LB, WMS, WN
32RDIG in MW evaluation and testing
- Testing/adaptation of MW components (SA1): IHEP, PNPI, JINR
- IHEP will participate in the pre-production testing/adaptation of gLite (SA1)
- Testing new MW components (NA4 ARDA)
  - Metadata catalog, Fireman catalog, gridFTP, ... (JINR, SINP MSU)
  - testing gLite for ATLAS and CMS (PNPI, SINP MSU)
- EGEE work plan
  - January-March 2005: evaluation of OMII (JINR, KIAM RAS)
  - April-October 2005: evaluation of GT4 (SINP MSU, JINR, KIAM RAS)
33SINP MSU, INFN (Padua): new MW - improved job flow
CERN-INTAS meeting, 14 March 2005, CERN
34SINP MSU: new MW - monitoring of application jobs
- No LCG MW modification required (wrappers, additional server)
- Access to the intermediate job output via Web-interface
- Authorization is based on the standard GSI certificates and proxy certificates
- Starting Web-page for interested users (with instructions)
  - http://grid.sinp.msu.ru/acgi-bin/welcome.cgi
35New MW (JINR, KIAM RAS, SINP MSU): OGSA/Globus evaluation for data intensive applications
- Based on the experience with OGSA/GT3 evaluation in 2003-2004 (T. Chen et al., OGSA Globus Toolkit Evaluation Activity at CERN, in Proc. of ACAT03, NIMA 534 (2004) 80)
- Release of the Globus Toolkit 4 is currently scheduled for April 29, 2005
  - www-unix.globus.org/toolkit/docs/development/4.0-drafts/GT4Facts
- Therefore testing/evaluation of other OGSA/WS systems potentially interesting for LCG/EGEE
36Testing the OMII basic functionality (KIAM RAS, JINR)
- Applications must be pre-installed on the (Job Service) server: execution of programs prepared on the client side is impossible.
- No core services such as RB, IS, RC
- Management of (grid) accounts does not scale well and is poorly suited to managing large dynamic VOs
- Clients must be installed for each user separately (e.g., not under root)
- Failed to deploy a new custom service into the OMII container
- Report was submitted to JRA1 and OMII Support
- The OMII 1.1.1 Job service was found to be robust in a test with 20 concurrent clients
  - The maximal job submission rate: 6 jobs/min
  - no bulk batch mode for job submission -> problem for submitting a large number of jobs
- The Data Service was found to work stably with up to 5 concurrent clients and a file size of up to 10 MB (no tests beyond these limits yet).
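To see why the lack of a bulk submission mode matters, the measured maximum of 6 jobs/min can be turned into a rough submission time. The Python sketch below does the arithmetic. The batch sizes are hypothetical, and the estimate assumes the rate stays constant, which the tests did not establish.

```python
MAX_RATE_JOBS_PER_MIN = 6  # maximal submission rate reported in the OMII test


def submission_time_minutes(n_jobs, rate_per_min=MAX_RATE_JOBS_PER_MIN):
    """Minimum time to submit n_jobs one by one at a constant rate."""
    if rate_per_min <= 0:
        raise ValueError("rate must be positive")
    return n_jobs / rate_per_min


for n_jobs in (100, 1000, 10000):  # hypothetical batch sizes
    minutes = submission_time_minutes(n_jobs)
    print(f"{n_jobs} jobs: {minutes:.0f} min ({minutes / 60:.1f} h)")
```

At this rate a batch of 10,000 jobs would take more than a day just to submit.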
37New Deployment new CIC/ROC
[Figure: deployment process diagram. Labels: Release(s); YAIM; Update Release Notes; Update User Guides; EIS; GIS; User Guides; Release Notes; Installation Guides. Timing annotations: "Every 3 months on fixed dates!"; "Every Month" (twice); "Certification is run daily"; "at own pace".]
38gLite
- Differences
  - Unit and functional testing already performed by JRA1
  - Releases have to be synchronized between JRA1, SA1 based on NA4's priorities
- New Sequence
  - Certification Testbed (CERN)
    - Installation/config tests
    - Rerun functional tests (to validate configuration)
    - Synthetic stress tests
  - Preproduction Service
    - Sites
      - Krakow, FZK, IN2P3, CNAF, Padua, Bari, NIKHEF, SNIC, Protvino-IHEP, UOM, LIP, PIC, RAL
    - sites test installation and configuration
    - Applications test by using their production software and give feedback on reliability and functionality
- Status
  - Documentation of process is in draft state
  - Certification Testbed
    - gLite pre-release installed
  - Preproduction Service
    - Sites are installing current LCG2 release as a platform for the gLite components
39SA2 (??? ??) ???????? ??????????? ??????????
?????????????? EGEE-NRENs
- ??????????? ????????? ????
- ?????????? ??????? ?????? ????????????
????????????????? ?????? - ??????????? ????? ?????????????? ? ??????????????
- ????????? ??????? ??????????????
- ??????? ?????????? ????????? ???????,
???????????? ? ?????? ????? - ??????? ?????????? ????? ?? ?????? ???????
?????????????? - ????? ??????? ???????
- ?????????? ??????? ?????? ??? ??????? ???????
???????, ????????? ? ????????????? EGEE ENOC
40SA2 (??? ??) ????? ??????? ??????? ???????
https://edms.cern.ch/document/503527
ENOC
Helpdesk
NREN
NREN
RC
RC
41????????? ???????? ?????????? VO
- ????? ?? ???????? ????
- ?.?. ????? - ??????????? ??????? ????
- ?.?. ????????? (NA4) - ????????????? ??
?????????????? ?????????? ???????? ?
??????????????? ???? - ?.?. ??????? - ????????????? ?? ???????????
??????? ??????????? ? ????????? ?????? ?? - ????? ??????? ????-??????? (????, ?.?.??????)
- ????????? ??????? ???????? (????????)
- ????????? ????? ??????????? ????????????? ? ?? ??
(???????) - ???????????? ???????????? ????? (???, ?.?.?????)
- ?????? ? ??????????? ? ????????? UI
- ????????? ????????????? ????? VO
- ???????? ??????????? VO.
- ????????? ?????? ????
42??????? ???????? ? ????? ?????? VO
- ?????????? ??
- rdig-registrar.sinp.msu.ru/newVO.html
- ????? ????? VO.
- ??????????? VO ? ?????? ?? ???????? ????????
???? - ?????????? ? ???????????????
- ?????? ?? ????????? (?????????? ? ????????? ? VO)
- ???????????? ??????????????? ??????? VO
- ??????????? ????????????? - ?????? VO ??
- rdig-registrar.sinp.msu.ru
- ?????????? ????? VO ? ????-?????????????? ????
- ?????????? ? ??????????? ????????? ??????? (??)
? ??????? ????? ??????????? ? ????????????? ??
????????
43?????????? ? ??????????????? ????? ??-???? ? VO
- ????????????? VO
- ??? ???????????? ????? VO
- ??? ??????? ???. ? ??????????? ????????? ? ????
- ? ?????????, ??????? ????????? ?????
????????????? ? ????? ? VO - ?????????? ????? ????????? ????????????
??????????? VO ? ?? (software managers group) - VO ????????? ?????? ????????? ????????
????????????? ????, ?????????????
?????????????? ? ?????????? ??????????? ???????,
?????????????? ?????? ???????????? - ???? ???????????? ?????????????? ???????? ?
?????? ???????????? ? ????????? ???????? ??
44(?????? ?????) VO ? ????
- RGStest ??? ???????????? ????
- eEarth ?????? ????????? ? ???????????
- ????????????,
- ? ?????????, ????????????? ??????????? ?
???????????? ??????? ?????? ? ???????? ??????
???????? ????????????? ?????????? ?? ????????????
???????? ??? ??????, ?????????????? ??????
??????? ? ?????????? ?????, ? ??????????
????????????? ???????????? ???? (???? ?????
????????? ???????? ?????? ??? ?????
??????????????) - ? ?????? ????????????? ????????
- Space Physics Interactive Data Resource (SPIDR) ?
- Integrated Distributed Environmental Archive
System (IDEAS), - ???????? ? ????????? ?????????? ??? ???????????
????? - ?? ???? ?????????????? ?????? ? ????????? ??????
????? ??? - ???????????? VO eEarth - ???. ???. ??? ? ?? ???
?.?. ?????. - ??????????? VO ? ??????? ?????????? ???? 10
45??????????? ????? VO
?? ?????? ?????????? ??????????? ??-????
????????? ? ???????? (????) ? ??????? (???)
????????? ??????? ? ?????? ????????? ???????????
??? ?????????? ??????
46?????????? ????? ??
47?????? ??????????
- ???? (EGEE) ???? ?????????????? ??? ???????
???????????? - ?????????? ????? ?????????? ??????? (VO)
- ???? (EGEE) ??????????????? ??????? ???
- ???????? ????????????? MW
- ????????????/???????? ?????? MW
- ????? ????????????? ? ???????????????? ??????????
- ?????????? ?????????? ????????????? MW