Title: DØ Computing Status and Budget Requirements
1 DØ Computing Status and Budget Requirements
- Amber Boehnlein
- DØ International Finance Committee
- April 20, 2005
2 Computing Model
[Diagram: data flow among Remote Farms, Central Farms, User Desktops, User Data, Data Handling Services, Central Storage, Central Analysis Systems, and Remote Analysis Systems]
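A minimal sketch of the computing model as a component graph, assuming plausible data flows among the boxes above; the specific edges are illustrative guesses, not taken from the diagram.

```python
# Hypothetical sketch of the DØ computing model as a data-flow graph.
# Components come from the slide; the edges are illustrative assumptions.
data_flows = {
    "Remote Farms":             ["Data Handling Services"],   # MC/reprocessing output shipped in
    "Central Farms":            ["Data Handling Services"],   # primary reconstruction output
    "Data Handling Services":   ["Central Storage",           # files moved to/from tape and caches
                                 "Central Analysis Systems",
                                 "Remote Analysis Systems"],
    "Central Analysis Systems": ["User Data"],
    "Remote Analysis Systems":  ["User Data"],
    "User Data":                ["User Desktops"],
}

def reachable(start, flows):
    """Components reachable from `start` by following data flows."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in flows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(sorted(reachable("Remote Farms", data_flows)))
```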
3 Recent Achievements
- Operations are smooth for DØ
  - Joint operations department formed from the CDF and DØ CD departments
  - Combining pager rotations, expanding use of automated tools
- Second-generation deployments
  - Completion of calibration DB access in RECO for DØ
  - TMB and common analysis format
  - Reduction in skim sizes
  - 30% speed-up of reco
- Monte Carlo production for DØ using automated submission tools
- Global reprocessing for DØ has started
  - 2003: 100M events reprocessed offsite
  - Goal of 800M events reprocessed offsite for 2005
- Data handling developments for improved functionality, operations, and product longevity
- Hardware: replacing aging infrastructure components such as D0mino
4 Computing Contributions
- Use the FNAL equipment budget to provide a very basic level of functionality
  - Databases, networking, and other infrastructure
  - Primary reconstruction
  - Robotic storage and tape drives
  - Disk cache and basic analysis computing
  - Support for data access to enable offsite computing
  - Estimate costs based on experience or need for replacements
- Remote contributions
  - Monte Carlo production takes place at remote centers
  - Reprocessing (or primary processing)
  - Analysis at home institutions
  - Contributions at FNAL to project disk and to CLuED0
  - Collaboration-wide analysis
5 Virtual Center
- For the value basis, determine the cost of the full computing system at FNAL costs, purchased in the yearly currency
  - Disk, servers, and CPU for FNAL analysis
  - Production activities such as MC generation, processing, and reprocessing
  - Mass storage, cache machines, and drives to support extensive data export
- Assign fractional value for remote contributions (a toy example follows this list)
  - Merit-based assignment of value
  - Assigning equipment purchase cost as value (BaBar model) doesn't take into account the life cycle of equipment, nor system efficiency or use
  - While shown as a predictor, most useful after the fact
- Computing planning board includes strong remote participation and representation
- Not yet included as part of the value estimate:
  - Wide area networking, infrastructure, desktop computing, analysis
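A toy illustration of the virtual-center accounting described above: price the full system at FNAL costs in the year's currency, then credit remote contributions with merit-based fractions of that value. All dollar amounts and fractions below are hypothetical placeholders, not DØ's actual figures.

```python
# Hypothetical virtual-center accounting; all numbers are illustrative placeholders.
fnal_costs = {                     # cost of the full system at FNAL prices ($)
    "analysis disk/servers/CPU": 600_000,
    "production (MC, processing, reprocessing)": 500_000,
    "mass storage, cache, drives for data export": 400_000,
}
full_system_value = sum(fnal_costs.values())

# Merit-based fractions credited to remote contributions (assumed values).
remote_fractions = {
    "remote MC production": 0.20,
    "remote reprocessing": 0.15,
    "remote analysis": 0.10,
}

for activity, fraction in remote_fractions.items():
    print(f"{activity}: ${fraction * full_system_value:,.0f} credited")

total_credit = sum(remote_fractions.values()) * full_system_value
print(f"Total remote credit: ${total_credit:,.0f} of ${full_system_value:,.0f} full-system value")
```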
6 Data Handling/Production
- 15M-25M events logged per week
- Production capacity sized to keep up with data logging
- MC production at remote sites: 1M events/week
- Tape writes/reads
  - 7 TB/week average writes
  - 30 TB/week reads
- Analysis requests at FNAL
  - 750-1100M events
  - DØ: 50 TB/week in 1000 requests (see the arithmetic sketch below)
[Plot: files delivered per 30 minutes to the CAB analysis stations; red shows errors]
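A back-of-the-envelope reading of the weekly rates above; this is straight division with no further assumptions, and the real traffic mixes raw data, TMB, and MC.

```python
# Naive arithmetic on the weekly rates quoted above.
TB = 1e12

events_per_week = (15e6, 25e6)     # events logged per week
tape_writes = 7 * TB               # average tape writes per week (bytes)
tape_reads = 30 * TB               # tape reads per week (bytes)
analysis_volume = 50 * TB          # DØ analysis traffic per week (bytes)
analysis_requests = 1000           # analysis requests per week

print(f"average analysis request: {analysis_volume / analysis_requests / 1e9:.0f} GB")
print(f"tape read/write ratio: {tape_reads / tape_writes:.1f}x")
for n in events_per_week:
    print(f"tape writes per logged event (naive): {tape_writes / n / 1e6:.2f} MB")
```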
7 Central Analysis
- Support peak load of 200 users
  - TMB, Ntuple-based analysis, some user MC generation
  - Supports post-processing fixing as a common activity (moving to production platform)
  - B physics tends to be the most CPU- and event-intensive
- DØ Central Analysis Backend (CAB)
  - 2 THz
  - Past year: short of cache, over-reliance on tape access
  - Deployed 100 TB as SAM cache on CABSRV
  - 70 TB of user-controlled space, primarily on CLuED0
  - ASA nodes still not in production
[Chart: Remedy tickets per hardware type per year; tracking in this way helps us understand which operational issues arise and how to mitigate them (see the sketch below)]
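A small sketch of the tally implied by the Remedy tracking above: grouping tickets by hardware type and year to see which components drive operational load. The ticket records are invented placeholders, not actual Remedy data.

```python
# Hypothetical tickets-per-hardware-per-year tally; records are made up
# purely to illustrate the bookkeeping.
from collections import Counter

tickets = [
    {"year": 2004, "hardware": "worker node"},
    {"year": 2004, "hardware": "fileserver"},
    {"year": 2004, "hardware": "worker node"},
    {"year": 2005, "hardware": "tape drive"},
    {"year": 2005, "hardware": "worker node"},
]

counts = Counter((t["year"], t["hardware"]) for t in tickets)
for (year, hardware), n in sorted(counts.items()):
    print(f"{year}  {hardware:<12} {n} tickets")
```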
8 Central Robotics
[Plot: daily Enstore traffic for CDF, DØ, and other users; 30 TB at peak, 5000 mounts/day at peak]
- DØ tape holdings by media type:
    9940     638 TB
    LTO I    175 TB
    LTO II   159 TB
    Total    971 TB
- Diversity of robotics/drives maintains flexibility
- Known data loss due to Robotics/Enstore for DØ: >10 GB
9 Wide Area Networking
- OC-12 to ESnet, filling production link, anticipate upgrade
- R&D fiber link to StarLight, used to support reprocessing for WestGrid
[Plot: outbound traffic at the border router since Dec 2004 (CDF green, DØ blue); peaks stressing the OC-12]
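For scale, a quick estimate of what a saturated OC-12 can carry per week, assuming the nominal 622 Mbit/s line rate and ignoring protocol overhead; this is why tens of TB/week of data export stress the production link.

```python
# Rough capacity of an OC-12 link (nominal 622 Mbit/s, overhead ignored).
line_rate_bits = 622e6             # bits per second
seconds_per_week = 7 * 24 * 3600

tb_per_week = line_rate_bits / 8 * seconds_per_week / 1e12
print(f"Saturated OC-12: ~{tb_per_week:.0f} TB/week")   # roughly 47 TB/week
```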
10 Cost Estimate - Sept 2004
The guidance in 2002 was $2M, cut to $1.5M. In 2003 it was $1.5M, cut to $1.35M ($0.05M off the top, $0.1M for the Wideband tax). Relative to the document, we added replacing mover nodes to the infrastructure. We did not add a tax cost to the price of the nodes, and probably should consider doing so ($535/node in FY2004). (Reco farm sized to keep up with 25 Hz weekly.)
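A quick check of what "sized to keep up with 25 Hz weekly" implies, assuming a 100% duty cycle; the per-event reconstruction cost used here is a hypothetical placeholder, not a DØ measurement.

```python
# What a 25 Hz average logging rate implies for the reco farm.
# The seconds-per-event figure is a hypothetical placeholder.
rate_hz = 25.0
seconds_per_week = 7 * 24 * 3600

events_per_week = rate_hz * seconds_per_week
print(f"Events/week at 25 Hz: {events_per_week / 1e6:.1f}M")   # ~15.1M

ghz_sec_per_event = 40.0   # assumed reconstruction cost (GHz-seconds/event)
farm_ghz_needed = rate_hz * ghz_sec_per_event
print(f"Farm capacity to keep up: ~{farm_ghz_needed:.0f} GHz of CPU")
```

The implied ~15M events/week at a sustained 25 Hz is consistent with the lower end of the 15M-25M events/week logging rate quoted earlier.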
11 FY 2005
- Bottom-up estimate for FNAL budget: $1.8M for equipment, $250K for tapes
- Actual budget: $1.25M in equipment funds, $125K for tapes
- Possible mitigations and trade-offs (a toy media-cost comparison follows this list):
  - 30% speed-up of Reco
  - Go to a 4-year retirement cycle on farm and analysis nodes
  - Rely more on remote computing, particularly in the out-years
  - Postpone 10Gb uplink to FY2006
  - Reduce skim size and assume only one skimming pass
  - Rely more heavily on LTO II media, which costs half as much as STK media for the same density
  - Drastically reduce the amount of MC DSTs stored
  - Recycle STK tapes
- 2006 bottom-up estimate in progress
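A toy comparison of the LTO II mitigation: combine the 7 TB/week average write rate from the data-handling slide with the factor-of-two media-cost claim above. The absolute cost per GB is a hypothetical placeholder; only the factor of two comes from the slide.

```python
# Toy estimate of the tape-media saving from favoring LTO II over STK 9940.
GB_PER_TB = 1000.0
writes_per_week_tb = 7.0                    # average tape writes (TB/week)
weeks = 52

stk_cost_per_gb = 0.40                      # assumed media cost, $/GB (placeholder)
lto2_cost_per_gb = stk_cost_per_gb / 2      # "costs half of STK media for the same density"

annual_gb = writes_per_week_tb * GB_PER_TB * weeks
saving = annual_gb * (stk_cost_per_gb - lto2_cost_per_gb)
print(f"Annual tape writes: {annual_gb / GB_PER_TB:.0f} TB")
print(f"Media saving if all writes go to LTO II: ${saving:,.0f}")
```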
12 Conclusions
- The DØ computing model is successful
  - We have developed tools that enable us to target computing spending at FNAL
  - We use metrics from SAM and system monitoring to provide estimators
  - We use the Virtual Center concept to calculate the value that remote computing gives the collaboration
- DØ continues to pursue a global vision for the best use of resources by moving towards interoperability with LCG and OSG
- DØ computing remains effort limited: a few more dedicated people could make a huge difference
- Short budgets, the need for continued construction projects, and aging computing infrastructure are a serious cause for concern