Title: Computing Resource Review Board Project Status
Slide 1: CERN-RRB-2007-102
Computing Resource Review Board Project Status
CERN, 23 October 2007
Slide 2: Grid Activity
- Continuing increase in usage of the EGEE and OSG grids
- All sites reporting accounting data (CERN, Tier-1, -2, -3)
- Increase over the past 17 months: 5x the number of jobs, 3.5x the CPU usage (a back-of-envelope conversion to monthly growth rates follows below)
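As a rough consistency check on the factors above, the sketch below converts them into equivalent compound monthly growth rates (roughly 10%/month for jobs, 8%/month for CPU). This is illustrative arithmetic added here, not a calculation from the original report.

```c
/* Back-of-envelope conversion of the growth factors quoted above
 * (5x jobs, 3.5x CPU over 17 months) into equivalent compound
 * monthly growth rates. Illustrative arithmetic only. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double months = 17.0;
    double job_factor = 5.0, cpu_factor = 3.5;

    /* factor = (1 + r)^months  =>  r = factor^(1/months) - 1 */
    printf("jobs: %.1f%%/month\n", (pow(job_factor, 1.0 / months) - 1.0) * 100.0);
    printf("cpu:  %.1f%%/month\n", (pow(cpu_factor, 1.0 / months) - 1.0) * 100.0);
    return 0;
}
```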
Slide 3: Tier-2 Sites, September 2007
- Of the 45 federations reporting, 10 account for 50% of the CPU usage and 24 for 90% (see the sketch after this list)
- Total usage equivalent to 48% of the commitment of the 53 federations in the WLCG MoU
- Only 16 federations have usage exceeding 70% of their commitment
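As an aside on how the concentration figures above are derived from per-federation accounting data: sort the per-federation CPU totals in descending order and count how many federations are needed to cover a given share of the total. The sketch below illustrates this; the usage array is hypothetical sample data, not the September 2007 accounting figures.

```c
/* Illustrative only: deriving "N federations account for X% of the
 * CPU usage" from per-federation totals. Sample data is made up. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_desc(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);  /* sort largest first */
}

/* Smallest number of federations whose summed usage reaches
 * `share` (0..1) of the total. */
static int federations_for_share(double *usage, int n, double share)
{
    double total = 0.0, running = 0.0;
    for (int i = 0; i < n; i++)
        total += usage[i];
    qsort(usage, n, sizeof *usage, cmp_desc);
    for (int i = 0; i < n; i++) {
        running += usage[i];
        if (running >= share * total)
            return i + 1;
    }
    return n;
}

int main(void)
{
    /* Hypothetical per-federation CPU totals (arbitrary units). */
    double usage[] = { 900, 700, 500, 300, 200, 120, 80, 50, 30, 20 };
    int n = sizeof usage / sizeof usage[0];
    printf("%d federations cover 50%% of usage\n",
           federations_for_share(usage, n, 0.5));
    return 0;
}
```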
Slide 4: September 2007 CPU Usage (CERN, Tier-1s, Tier-2s)
- More than 80% of CPU usage is external to CERN
Slide 5: Baseline Services
The basic baseline services from the TDR (2005):
- Storage Element
  - Castor, dCache, DPM (with SRM 1.1)
  - StoRM added in 2007
  - SRM 2.2 spec. agreed May 2006; being deployed now
- Basic transfer tools (GridFTP, ...)
- File Transfer Service (FTS)
- LCG File Catalog (LFC)
- LCG data management tools (lcg-utils)
- POSIX I/O
  - Grid File Access Library (GFAL); a minimal usage sketch follows this list
- Synchronised databases T0 ↔ T1s
  - 3D project
- Information System
- Compute Elements
  - Globus/Condor-C
  - web services (CREAM)
- gLite Workload Management
  - in production at CERN
- VO Management System (VOMS)
- VO Boxes
- Application software installation
- Job Monitoring Tools
... with continuing evolution in reliability, performance, functionality, and requirements
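To make the POSIX I/O entry concrete: the sketch below shows a minimal read of a grid file through GFAL's POSIX-style C API, assuming the GFAL 1.x library of this period with its gfal_open/gfal_read/gfal_close calls. The logical file name is a hypothetical placeholder, not a real catalogue entry.

```c
/* Minimal sketch: POSIX-style read of a grid file through GFAL's
 * C API (GFAL 1.x era). GFAL resolves the logical file name via
 * the LFC and negotiates access with the storage element (SRM)
 * behind the scenes. Build (assumption): gcc read_example.c -lgfal */
#include <fcntl.h>
#include <stdio.h>
#include "gfal_api.h"

int main(void)
{
    char buf[4096];

    /* Hypothetical LFN, for illustration only. */
    int fd = gfal_open("lfn:/grid/myvo/data/example.root", O_RDONLY, 0);
    if (fd < 0) {
        perror("gfal_open");
        return 1;
    }

    int n = (int) gfal_read(fd, buf, sizeof buf);
    if (n >= 0)
        printf("read %d bytes\n", n);

    gfal_close(fd);
    return 0;
}
```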
Slide 6: CERN Data Export 2007
- Data distribution from CERN to Tier-1 sites
- The target rate was achieved last year under test conditions
- This year, under more realistic experiment testing, reaching 70% of the target peak rate
Slide 7: CERN Data Export 2007
Slide 10: Data Storage Services
- Signalled as a major concern at the last meeting
- Good progress with experiment testing (see previous slides)
- dCache (DESY, FNAL)
  - New version being deployed now with all functionality needed for startup (including SRM 2.2)
- CASTOR (CERN)
  - Performance problems at CERN resolved; full performance demonstrated with ATLAS
  - New version (SRM 2.2-ready) deployed at all Castor sites over the past few months
  - Upgrades with all functionality needed for startup being deployed now
- DPM (CERN), StoRM (INFN)
  - simpler disk-only systems
  - being introduced in production
Slide 11: Castor during CMS Export Tests
CMS T0 export pool (330 TB across 60 servers)
- Red: data into the pool
  - 100 MB/s from tape
  - occasionally up to 100 MB/s data import
  - the rest is data written by the CSA07 (preparation) application
- Green: data out of the pool
  - 280 MB/s to tape
  - occasionally up to 100 MB/s data export
  - the rest is data read by the CSA07 (preparation) application
CSA07 starts:
- Several concurrent activities with aggregate I/O corresponding to nominal CMS speed at 100% efficiency
- Up to 900,000 file operations/day (about 10/s; checked in the sketch below)
- Good stability
Tony Cass, CERN/IT, 5 October 2007
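A quick back-of-envelope check of the figures quoted on this slide, using only the rates given above (the code itself is illustrative, not from the original talk):

```c
/* Back-of-envelope check of the CSA07 figures quoted above:
 * 900,000 file operations/day, 100 MB/s steady into the pool
 * from tape, 280 MB/s out to tape. Illustrative arithmetic only. */
#include <stdio.h>

int main(void)
{
    double secs_per_day = 24.0 * 3600.0;

    /* 900,000 operations/day is indeed about 10/s. */
    printf("file ops: %.1f/s\n", 900000.0 / secs_per_day);

    /* Sustained daily tape traffic implied by the quoted rates. */
    printf("tape in:  %.1f TB/day\n", 100.0 * secs_per_day / 1e6);
    printf("tape out: %.1f TB/day\n", 280.0 * secs_per_day / 1e6);
    return 0;
}
```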
Slide 12: SRM 2.2 Current Schedule
- Schedule has slipped again
  - New implementations installed at test sites, but the test programme stalled due to the limited availability of experts
  - Subject of a workshop at the beginning of September, where a more realistic schedule was agreed
- Beta testing
  - September: ATLAS testing (BNL, FZK, IN2P3, NDGF)
  - October: LHCb testing (CERN, CNAF, FZK, IN2P3, NIKHEF)
  - End of October (after CSA07): CMS testing
- November
  - dCache 1.8 in production at FZK
  - SRM 2.2 production services at Castor sites
- February 2008: SRM 2.2 in production at all key sites
- DPM, StoRM already available for production use
Slide 13: Site Reliability
Slide 14: Combined Computing Readiness Challenge (CCRC)
- A combined challenge by all experiments and sites
  - to validate the readiness of the WLCG computing infrastructure before the start of data taking
  - at a scale comparable to that needed for data taking in 2008
- Should be done well in advance of the start of data taking
  - to identify flaws and bottlenecks, and allow time to fix them
- Wide battery of tests run simultaneously by all experiments
  - Driven from the DAQ with full Tier-0 processing
  - Site-to-site data transfers, storage system to storage system
  - Required functionality and performance
  - Data access patterns similar to 2008 processing
  - CPU and data loads simulated as required to reach the 2008 scale
- Coordination team in place
- Two test periods: February and May
Slide 15: Ramp-up Needed for Startup
Slide 16: Summary
- Applications support in good shape
- WLCG service
  - Baseline services in production, with the exception of SRM 2.2
  - Continuously increasing capacity and workload
  - General site reliability is improving
  - Data and storage remain the weak points
- Experiment testing progressing
  - involving most sites, approaching full dress rehearsals
- Sites and experiments working well together to tackle the problems
- Major Combined Computing Readiness Challenge next year, before the machine starts
- Steep ramp-up ahead to deliver the capacity needed for the 2008 run