Title: CrossGrid Testbed Status
1 CrossGrid Testbed Status
- Jorge Gomes jorge_at_lip.pt
- LIP Computer Centre / X WP4
2 X testbeds
- In accordance with the plans, the initial monolithic testbed was recently split into two infrastructures.
- Initial testbed: EDG 1.2.2/3
- Production testbed: EDG 1.2.2/3
- Validation testbed: EDG 1.4.3
- Deployed in the context of tasks 4.1 and 4.4.
- Production testbed
- Used to test application prototypes.
- Validation testbed
- Mostly used to test new production middleware.
3 X testbed resources
                    Production testbed  Validation testbed
Computing Elements  15                  3
Worker nodes        69                  4
CPUs                115                 5
Storage Elements    14                  3
Storage capacity    2.7 TB              1.2 TB
- Of the 16 sites foreseen
- 10 are fully available.
- 2 are deployed and being tested.
- 2 are currently in the validation testbed.
- 2 are deployed but not available (not tested).
- The X testbeds already offer considerable
computing and storage resources.
4 X production sites status
5 The X production Resource Broker
1. Job requests are submitted from remote UIs.
2. Jobs are sent to the RB located at LIP.
3. The RB uses site information in the matchmaking.
4. The RB submits the job to a CE using GRAM.
Diagram: User Interface (any remote site), JDL job request, CrossGrid Resource Broker with JSS and II (central site, Lisbon), X resources (any remote site); arrows numbered 1 to 4 as in the steps.
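The matchmaking in step 3 can be pictured with a small Python sketch. It is not the EDG Resource Broker algorithm: the `ComputingElement` record, its `free_cpus` and `authorized_vos` fields, the site names and the ranking by free CPUs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    # Hypothetical, simplified view of what a site could publish about a CE.
    name: str
    free_cpus: int
    authorized_vos: set = field(default_factory=set)


def match(job_vo, cpus_needed, elements):
    """Return the CEs able to run the job, best candidate first.

    Assumed policy: a CE matches if it authorizes the job's VO and has
    enough free CPUs; candidates are ranked by free CPUs, highest first.
    """
    candidates = [
        ce for ce in elements
        if job_vo in ce.authorized_vos and ce.free_cpus >= cpus_needed
    ]
    return sorted(candidates, key=lambda ce: ce.free_cpus, reverse=True)


# Invented example data; the numbers are not taken from the testbed.
sites = [
    ComputingElement("ce-a", free_cpus=4, authorized_vos={"crossgrid"}),
    ComputingElement("ce-b", free_cpus=0, authorized_vos={"crossgrid"}),
    ComputingElement("ce-c", free_cpus=9, authorized_vos={"atlas"}),
    ComputingElement("ce-d", free_cpus=7, authorized_vos={"crossgrid", "atlas"}),
]

ranked = match("crossgrid", cpus_needed=1, elements=sites)
best = ranked[0].name if ranked else None
```

In this sketch the broker would pick `ce-d`; a real broker would take its input from the site information servers rather than from a hard-coded list.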
6 Central X production services (1)
The CrossGrid production central services are located in Lisbon and maintained by LIP.
- MyProxy: certification proxy
- RB: resource broker
- RC: replica catalogue
- VO: virtual organisation server
- UI: user interface
- Monitoring: grid monitoring
7 Central X production services (2)
- Resource Broker
- Matchmaking and load balancing scheduler.
- Performs load sharing across X sites.
- Certificate Proxy server
- Short lived certificates for long lived processes.
- Used by the applications portal and by the RB.
- Virtual Organizations server
- Database for user authentication.
- Is used to build the authorization databases of all X sites.
- Replica Catalogue
- Database for physical replica file location.
- Central service to find the location of files in SEs.
- Network and Grid Monitoring
- Early detection of problems in the X testbed.
8 The X production Replica Catalogue
- The production RC
- This is basically an LDAP server.
- Hosted at lngrid08.lip.pt port 9980.
- Is used by the RB and RM.
VO Collection Description
crossgrid cgtst0 CrossGrid collection
wpsix wpsixtst0 Being used in tests
atlas atlastst0 Being used in tests
cms cmstst0 Not used
URL ldap://lngrid08.lip.pt:9980/rc=CrossGridReplicaCatalogue,dc=lngrid08,dc=lip,dc=pt
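As a minimal sketch of how a client could take such a catalogue URL apart, the Python below splits it into host, port and the components of the LDAP distinguished name. The URL string is an assumed reconstruction of the production RC address with its `:` and `=` separators restored, and `parse_rc_url` is a hypothetical helper, not part of any EDG tool.

```python
from urllib.parse import urlsplit

# Assumed reconstruction of the production RC URL (separators restored).
RC_URL = (
    "ldap://lngrid08.lip.pt:9980/"
    "rc=CrossGridReplicaCatalogue,dc=lngrid08,dc=lip,dc=pt"
)


def parse_rc_url(url):
    """Split an LDAP URL into host, port and (attribute, value) pairs."""
    parts = urlsplit(url)
    dn = parts.path.lstrip("/")
    # Naive DN split: assumes no escaped commas or equals signs in values.
    rdns = [tuple(item.split("=", 1)) for item in dn.split(",")]
    return parts.hostname, parts.port, rdns


host, port, rdns = parse_rc_url(RC_URL)
```

The host and port recovered this way match the "Hosted at lngrid08.lip.pt port 9980" bullet on the slide.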
9 X validation sites status
- The validation testbed was created in the context of task 4.4, testbed quality assurance.
- Currently EDG 1.4.3 is being tested.
- All three sites have been successfully deployed.
- The central services for the validation testbed have been successfully deployed at LIP.
10 The X validation Resource Broker
1. Job requests are submitted from remote UIs.
2. Jobs are sent to the RB located at LIP.
3. The RB uses site information in the matchmaking.
4. The RB submits the job to a CE using GRAM.
Diagram: User Interface (any remote site), JDL job request, CrossGrid Resource Broker with JSS and Information Index (new server) at the central site in Lisbon, remote X sites (Athens, Karlsruhe, any) with self registration; arrows numbered 1 to 4 as in the steps.
11 Central X validation services (1)
The CrossGrid validation central services are located in Lisbon and maintained by LIP.
- MyProxy: certification proxy
- RB: resource broker
- RC: replica catalogue
- VO: virtual organisation server
- UI: user interface
- Monitoring: grid monitoring
- II: Information Index
12 Central X validation services (2)
- Resource Broker
- Matchmaking and load balancing scheduler.
- Performs load sharing across X sites.
- Certificate Proxy server
- Shared with the production testbed.
- Virtual Organizations server
- Shared with the production testbed.
- Replica Catalogue
- Database for physical replica file location.
- Central service to find the location of files in SEs.
- Network and Grid Monitoring
- Shared with the production testbed.
- Information Index
- TOP MDS information server that contains pointers to the site information servers.
13 The X validation Replica Catalogue
- The validation RC
- This is basically an LDAP server.
- Hosted at rc01.lip.pt port 9980.
- Is used by the RB and RM.
VO Collection Description
crossgrid cg CrossGrid collection
URL ldap://rc01.lip.pt:9980/rc=CG Replica Catalog,dc=rc01,dc=lip,dc=pt
14 Production and validation systems hosted at LIP
Diagram: systems hosted at LIP, in three columns (Production, Shared, Test and Validation).
Local resources: LCFG (lngrid01), Gatekeepers (lngrid02, ce01), CA (offline), WNs, SEs (lngrid03, se01)
Central services: UIs (lngrid05, ui01), RBs (lngrid06, rb01), MyProxy (lngrid07), RCs (lngrid08, rc01), VO (lnnet05), II (ii01), Monitoring (lnnet07)
Some X central systems will soon be moved to the FCCN NOC in Lisbon.
15 Central Services Hosting
- LIP and the Portuguese academic network (FCCN) have established a protocol for hosting LIP/CrossGrid systems in the Lisbon NOC.
- The contract allows
- LIP to install servers in the Lisbon NOC.
- Higher bandwidth.
- The systems to be in the same room as the Géant router (only one hop in the middle).
- Continuous power supply (diesel generator).
- The systems will be under full control of LIP.
- This is the result of a collaboration between LIP and FCCN on Grid and network technologies.
16 Virtual Organizations
- CrossGrid has its own VO server
- The VO server is used to build the authorization databases of the X testbed systems.
- Currently it is an LDAP server (VOMS is being tested).
- Hosted at grid-vo.lip.pt port 9990.
- CrossGrid users can send their VO membership requests to vo.admin_at_lip.pt
- 43 users are registered in the crossgrid VO.
VO Group Description
crossgrid testbed1 All CrossGrid users
cgTV alpha Test and validation experts
cgTV beta Test and validation users
gdmpservers apptb All production GDMP servers
gdmpservers tvtb All validation GDMP servers
gdmpservers devtb Not used
17 Certification Authorities
- Five new CAs were created and are now recognized by CrossGrid.
- All CAs are operational, issuing certs and CRLs.
- All CAs are recognized by DataGrid, with one exception that is finishing the acceptance process.
18 Certification Authorities (2)
- However the work is not complete (it will never be).
- Sometimes CRLs expire, causing denial of service.
- A tool to monitor the CRL issuance is being developed.
- Possibly the same will happen with the issued certificates, since they have a lifetime of 1 year.
- A tool to monitor the validity of the host certificates is being developed.
- The new Cyprus CA is not installed everywhere.
- Security policies and procedures to deal with certificate compromise are required.
- A draft was written (to be discussed in the security team).
- Probably the CRL download period must be shorter.
- A manual explaining the theory behind certificates and how they should be used is required.
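The kind of check a CRL monitoring tool could make is sketched below in Python. This is not the tool under development at LIP: the function `crl_status`, the two-day warning window, the CA names and the dates are all assumptions, and a real monitor would read the nextUpdate field from each downloaded CRL instead of a hard-coded table.

```python
from datetime import datetime, timedelta, timezone


def crl_status(next_update, now, warn_before=timedelta(days=2)):
    """Classify a CRL by its nextUpdate time.

    Returns "expired" once nextUpdate has passed, "expiring" inside the
    warning window before it, and "ok" otherwise.
    """
    if now >= next_update:
        return "expired"
    if next_update - now <= warn_before:
        return "expiring"
    return "ok"


# Invented nextUpdate values for three hypothetical CAs.
now = datetime(2003, 2, 1, 12, 0, tzinfo=timezone.utc)
crls = {
    "ca-one": datetime(2003, 1, 31, 0, 0, tzinfo=timezone.utc),
    "ca-two": datetime(2003, 2, 2, 0, 0, tzinfo=timezone.utc),
    "ca-three": datetime(2003, 3, 1, 0, 0, tzinfo=timezone.utc),
}
report = {ca: crl_status(next_update, now) for ca, next_update in crls.items()}
```

The same comparison, applied to the notAfter date of a host certificate, would cover the second monitoring need listed on the slide.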
19 Monitoring and verification
- Grid and network monitoring services have been deployed to monitor the X testbed.
- http://mapcenter.lip.pt
- An installation and verification tool was developed at LIP to verify X testbed sites.
- Interim version can be consulted at
- http://www.lip.pt/computing/cg-services/site_check
20 Testbed support
- The CrossGrid helpdesk application is being tested and is almost ready.
- The current sources of support are still
- crossgrid-wp4-support_at_lists.cesga.es
- http://grid.ifca.unican.es/crossgrid/wp4
- The support for the central services is currently provided by LIP.
- grid.support_at_lip.pt
- http://www.lip.pt/computing/cg-services
- http://www.lip.pt/computing/cg-tv-services
- Installation manual
- http://gridportal.fzk.de/cgi-bin/viewcvs.cgi/crossgrid/crossgrid/wp4/sites/demo/documents/install_guide_v1.0.pdf
21 Testbed monitoring
Mapcenter: grid monitoring framework. Mapcenter was developed by DataGrid and adapted to CrossGrid by LIP. Enhancements are being implemented by LIP in cooperation with DataGrid.
http://mapcenter.lip.pt
22 X host check tool
Host Check: grid host checker. Host Check was developed by LIP to support the CrossGrid testbed deployment. Host Check produces a detailed report for each testbed CE and SE.
http://www.lip.pt/computing/cg-services/site_check
23 Production RB statistics
Total users 33
Jobs submitted 2094
Jobs accepted 1951
Jobs with good match 1836
Jobs submitted by JSS 1817
Jobs run 1651
Jobs done 1101
- The peak usage of the RB was between last November and December.
- Since the current RB doesn't support parallel jobs, MPI job submissions go unnoticed by the RB.
24 Validation RB statistics
Total users 8
Jobs submitted 4173
Jobs accepted 4173
Jobs with good match 4010
Jobs submitted by JSS 4007
Jobs run 3964
Jobs done 3954
163 matching failures, 3 not submitted, 43 didn't run; 219 jobs lost in total (94.8% success).
- The test and validation RB has been established recently.
- The validation RB also doesn't support parallel applications.
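The loss figures quoted for the validation RB follow directly from the stage counts. The short Python sketch below reproduces the arithmetic; the stage labels and variable names are chosen here for illustration, while the numbers are the ones on the slide.

```python
# Stage counts copied from the validation RB statistics.
stages = [
    ("submitted", 4173),
    ("accepted", 4173),
    ("good match", 4010),
    ("submitted by JSS", 4007),
    ("run", 3964),
    ("done", 3954),
]

# Jobs lost between each pair of consecutive stages.
losses = {
    f"{prev_name} -> {name}": prev_count - count
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:])
}

submitted = stages[0][1]
done = stages[-1][1]
lost = submitted - done
success_rate = round(100 * done / submitted, 1)

for step, n in losses.items():
    print(f"{step}: {n}")
print(f"lost: {lost}, success: {success_rate}%")
```

Each per-stage loss is the difference between two neighbouring counts, and the success rate is jobs done divided by jobs submitted.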
25 Production CEs statistics
                                       Failed jobs
Site      Connections  Pings  Jobs OK  LCAS  CRL exp  Jobman  GSS
LIP       6556         462    2836     50    17       92      3099
IFIC      5326         655    2649     100   97       45      1780
Cyfronet  4516         306    2522     0     20       111     1557
II SAS    1404         6      1185     0     15       99      99
FZK       1799         11     1112     118   7        123     428
Demo      9481         5      1111     36    0        51      8278
ICM       705          34     604      8     24       2       33
CESGA     7321         1      544      78    28       13      6657
UAB       600          14     519      0     9        14      44
INS       592          2      517      6     20       20      27
PSNC      582          0      496      15    14       11      46
TCD       145          0      131      0     0        2       12
AUTH      141          0      127      0     3        0       11
TOTAL     39168        1496   14353    411   254      583     22071
26 Validation CEs statistics
                                       Failed jobs
Site      Connections  Pings  Jobs OK  LCAS  CRL exp  Jobman  GSS
LIP       67365        2319   64995    21    0        4       26
FZK       8883         64     8671     38    12       50      48
Demo      10665        0      6170     4     6        2       4483
TOTAL     86913        2383   79836    63    18       56      4557
- The validation testbed has been heavily exercised.
- More than 80,000 jobs have been submitted since the end of November.
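A quick consistency check of the validation CE figures can be written in a few lines of Python. The numbers are copied from the table; the variable names and the "GSS share" metric are introduced here only for illustration.

```python
# Rows copied from the validation CE table:
# site: (connections, pings, jobs_ok, lcas, crl_exp, jobman, gss)
rows = {
    "LIP": (67365, 2319, 64995, 21, 0, 4, 26),
    "FZK": (8883, 64, 8671, 38, 12, 50, 48),
    "Demo": (10665, 0, 6170, 4, 6, 2, 4483),
}
reported_total = (86913, 2383, 79836, 63, 18, 56, 4557)

# Column-wise sums should reproduce the TOTAL row.
computed_total = tuple(sum(col) for col in zip(*rows.values()))

# In this table, connections equal pings + successful jobs + failed jobs.
row_consistent = {site: r[0] == sum(r[1:]) for site, r in rows.items()}

# Share of connections that ended in a GSS failure, per site, in percent.
gss_share = {site: round(100 * r[6] / r[0], 1) for site, r in rows.items()}
```

The column sums match the TOTAL row, and every row satisfies connections = pings + jobs OK + failed jobs. The GSS share shows that almost all validation failures come from one site (Demo).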
27 X in the DataGrid testbed
- CrossGrid sites in the DataGrid testbed as seen by Mapcenter (Europe view).
28 Test of X applications (1)
- The tests of the X HEP example application using MPICH-G2 across sites started in November.
- Tests were performed
- Using dedicated systems (IFCA).
- Using the CrossGrid production testbed (LIP, Demokritos).
- The tests over the testbed have shown that
- It's possible to run MPI jobs in the testbed.
- MPI across sites with MPICH-G2 works.
- However, problems were detected in sites using private IP addresses.
29 Test of X applications (2)
- It was possible to run the application in up to seven sites at the same time.
- The application was compiled statically.
- Both PBS and FORK job managers were used in the tests.
- Issues
- There isn't support for parallel jobs in the RB (yet), so matchmaking must be performed by the user.
- Check that the user is authorized at the testbed sites.
- Check that there are free CPUs available.
- PBS jobs may end up waiting in a queue.
- Sometimes processes stay hung in the queues.
- Sometimes the execution hangs at start.
- Problems with private IP addresses.
- Possible problems with firewalls.
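The private IP address issue can be screened for before a cross-site MPI run. The sketch below assumes the user already has a list of worker node addresses and simply flags the private ones, which presumably cannot be reached directly from other sites. The helper `private_hosts` and the addresses are invented for illustration.

```python
import ipaddress


def private_hosts(addresses):
    """Return the addresses that fall in private (non-routable) ranges."""
    return [a for a in addresses if ipaddress.ip_address(a).is_private]


# Invented worker node addresses; none are taken from the testbed.
nodes = ["192.168.10.5", "10.0.3.17", "8.8.8.8"]
unreachable = private_hosts(nodes)
```

A site whose worker nodes all show up in `unreachable` would be a candidate to leave out of a multi-site MPICH-G2 job, or to run only single-site jobs on.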
30 IST Demonstration
- World grid demonstration involving European and US sites from CrossGrid, DataGrid, GriPhyN and PPDG.
- Took place in November 2002.
- It was the largest grid testbed in the world.
- Applications from the CERN/LHC experiments CMS and ATLAS were used.
- CrossGrid participated with 3 sites
- LIP - Lisbon
- FZK - Karlsruhe
- IFIC - Valencia
31 Future
- Test and deploy the first release of CrossGrid middleware.
- Initiate the security group activities.
- Policies, guidelines, tracking of problems, patches.
- Support the extension of the testbed to new sites.
- More sites internal to the project.
- Possible external sites and users (policy needed).
- Support clusters already running other Linux flavours.
- Light installation.
- Establish a development testbed.
- Prepare the test and possible migration to EDG 2.x and Linux 7.x.
- Study the usage of QoS in CrossGrid.
- Create a QoS test infrastructure.
32 END