LCG Service Challenge Report - PowerPoint PPT Presentation

About This Presentation
Title:

LCG Service Challenge Report

Description:

Dominique Boutigny, Rainer Mankel, Davide Salomoni, Junji Haba, Rik Yoshida ... SC1 - Nov04-Jan05 - data transfer between CERN and three Tier-1s (FNAL, NIKHEF, FZK) ...

Slides: 17
Provided by: sijbran

Transcript and Presenter's Notes

1
LCG Service Challenge Report
  • LHCC closed session, 17/11/2005
  • Sijbrand de Jong, also on behalf of Dominique Boutigny, Rainer Mankel, Davide Salomoni, Junji Haba, and Rik Yoshida
  • SC time path and goals
  • SC3 results and plans
  • SC4 plans
  • Conclusion

2
Service Challenges (SC): time path & goals
SC1 - Nov04-Jan05 - data transfer between CERN and three Tier-1s (FNAL, NIKHEF, FZK)
SC2 - Apr05 - data distribution from CERN to 7 Tier-1s; 600 MB/s sustained for 10 days (one third of the final nominal rate)
Timeline: 2005 - service challenges; 2006 - cosmics; 2007 - first beams, first physics; 2008 - full physics run
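The SC2 figures can be sanity-checked with a little arithmetic; a minimal sketch, assuming decimal units (1 MB = 10^6 bytes), as is customary for network rates:

```python
# Back-of-the-envelope check of the SC2 target quoted above.
# Assumes decimal megabytes (1 MB = 10**6 bytes).
MB = 10**6

sustained = 600 * MB               # 600 MB/s sustained
seconds = 10 * 24 * 3600           # 10 days
volume_tb = sustained * seconds / 10**12

nominal = 3 * 600                  # "one third of nominal" -> nominal rate in MB/s

print(f"SC2 volume: {volume_tb:.1f} TB")   # 518.4 TB over the 10 days
print(f"Nominal rate: {nominal} MB/s")     # 1800 MB/s
```

So a successful SC2 run moved roughly half a petabyte, against an eventual nominal rate of 1.8 GB/s out of CERN.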
3
SC3 results: data throughput
Throughput target missed by a factor of 2; the test needs to be repeated
4
SC3 results: data throughput
  • Sources of problems
  • Service instability
  • CASTOR
  • dCache (lots of help from DESY)
  • Storage Resource Management (SRM) in general
  • gLite File Transfer Service (FTS)
  • Much debugging and fixing of various components

5
SC3 results: data throughput
Rerun scheduled for January 2006, with throughput targets
6
SC3 results: LCG team view
Jamie Shiers
  • Considered much more than just a throughput test
  • SRM required on all sites
  • LCG File Catalogue (LFC) deployed as required
  • Global catalogue for LHCb, site-local for ALICE & ATLAS
  • CMS will be essentially as LHCb
  • Many of the baseline services (BSWG) deployed
  • (Nearly) all T1s took part - better than foreseen!
  • >20 T2s included - Good!
  • All experiments, clear goals, metrics established
  • Underlined the complexity of the enterprise
  • Excellent collaboration between LCG, sites & experiments
  • Many problems resolved; continue to improve
  • Need re-run of throughput test to confirm all fixed

7
SC3 results: what the experiments tested
Nick Brook
Database service needed, but not yet tested
8
SC3 results: where the experiments tested
9
SC3 results: experiments' findings (1)
Gained lots of experience, did much debugging. Many file transfers did not work on the first attempt.
Metrics: Quality = successful transfers vs. those started; Hours = number of hours with successful transfers; Rate = Volume / Hours.
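The three metrics defined on this slide can be sketched in code; the per-transfer log structure and field names below are assumptions for illustration, not part of the original report:

```python
# Sketch of the SC3 transfer metrics: Quality, Hours, Rate.
# The Transfer record and its fields are hypothetical.
from dataclasses import dataclass

@dataclass
class Transfer:
    hour: int         # hour bucket in which the transfer ran
    ok: bool          # did the transfer succeed?
    volume_gb: float  # data actually moved

def metrics(log: list[Transfer]) -> tuple[float, int, float]:
    started = len(log)
    succeeded = [t for t in log if t.ok]
    quality = len(succeeded) / started         # successful vs. started
    hours = len({t.hour for t in succeeded})   # hours with >=1 successful transfer
    volume = sum(t.volume_gb for t in succeeded)
    rate = volume / hours                      # Rate = Volume / Hours
    return quality, hours, rate

log = [Transfer(0, True, 10.0), Transfer(0, False, 0.0),
       Transfer(1, True, 20.0), Transfer(2, False, 0.0)]
print(metrics(log))  # (0.5, 2, 15.0)
```

Note that Rate divides by hours with at least one success, not wall-clock hours, so long dead periods lower Hours rather than Rate.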
10
SC3 results: experiments' findings (2)
CASTOR-2 has not always been a delight (stronger wording was used). Still need to establish many of the original goals. Stability is key: when the service was stable, LHCb's SC3 needs were surpassed.
11
SC3 results: experiments' findings (3)
Much data has been transferred:
ALICE: 200,000 files, 20 TB
CMS: 145 TB, sustained rate 20-90 MB/s
LHCb: 75,000 files, 10 TB, sustained rate 10-55 MB/s
ATLAS: 20 TB, sustained rate 20-30 MB/s
Many CPU resources have been used.
Reviewers' remarks: the experiments' results are not cast in the same metrics; much quantitative information is lacking in the presentations.
12
SC3 results: Tier-1 & 2 experience (1)
M. Mazzucato (CNAF), I. Fisk (FNAL), G. Stewart (Glasgow)
Remember that HEP is not the only user; others have different requirements (biology, astrophysics, ...).
Operating system issues (Red Hat vs. Scientific Linux).
CNAF: WAN→disk 175 MB/s; WAN→tape 50 MB/s; 1200-1550 kSI2k of CPU power; active T2s in test: Torino, Milano, Pisa, Legnaro, Bari, Catania.
13
SC3 results: Tier-1 & 2 experience (2)
FNAL:
  • no major hiccups; much functionality tested earlier
  • had their share of small problems
  • helped T2 sites
  • supports both OSG-0.2 and LCG-2 concurrently, running CDF & DØ
(Glasgow) T2:
  • much British CPU will be T2 (many sites too)
  • software installation: Python versus bash
  • SFT/GGUS messages need to be clearer
  • quality of release
  • availability: operator availability; 95% uptime is ambitious
  • real support from T1 people
14
SC4 plans
  • Full throughput test to start: disk → disk / disk → tape
  • Full baseline services, including database
    service, VOMS
  • T0 recording to tape and T1 reprocessing
  • Site Functionality Test (SFT) as performance
    metric
  • Many SFT sub-tests still have to be
    defined/implemented
  • T1 sites all seem to be ready to start
  • Many T2 sites interested, no problem to reach
    20-40 sites
  • Deploy COOL, 3D, AA services (PROOF, xrootd)
  • Next generation tape drives

15
SC4: preparation for data taking?
YES, I think so, but we still have to pass SC3, and we must decide how to measure success.
16
Conclusion
  • Service Challenges are a sensible approach to prepare for the data-taking phase
  • SC3 spawned a lot of effort, also outside LCG
  • Many T1 and T2 sites enthusiastically
    participate
  • Major improvements in many baseline services
  • Still to demonstrate required SC3 throughput
  • Need database services for SC4
  • Quantitative information needed for tractability