DØ Computing Status and Budget Requirements

Transcript and Presenter's Notes


1
DØ Computing Status and Budget Requirements
  • Amber Boehnlein
  • DØ International Finance Committee
  • April 20, 2005

2
Computing Model
(Diagram: Remote Farms, Central Farms, User Data, Data Handling Services,
Central Storage, User Desktops, Central Analysis Systems, Remote Analysis
Systems; a rough sketch of the implied layout follows.)
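The diagram itself does not survive in this transcript. As a rough sketch of the layout the labels suggest, assuming (this is not stated on the slide) that the data handling services mediate between central storage and the other systems:

```python
# Minimal sketch of the computing model implied by the slide's labels.
# The hub-and-spoke topology is an assumption based on the label list,
# not something shown explicitly in this transcript.
computing_model = {
    "Data Handling Services": [
        "Central Storage",           # tape robotics at FNAL
        "Central Farms",             # primary reconstruction
        "Remote Farms",              # MC production, reprocessing
        "Central Analysis Systems",  # CAB
        "Remote Analysis Systems",
        "User Desktops",             # user data
    ],
}

for hub, systems in computing_model.items():
    print(f"{hub}: {', '.join(systems)}")
```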
3
Recent Achievements
  • Operations are smooth for DØ
  • Joint operations department formed from CDF and
    DØ CD departments
  • Combining pager rotations, expanding use of
    automated tools.
  • Second generation deployments
  • Completion of calibration DB access in RECO for
    DØ
  • TMB and Common Analysis Format
  • Reduction in skim sizes
  • 30% speed-up of reco
  • Monte Carlo production for DØ using automated
    submission tools
  • Global reprocessing for DØ has started
  • 2003: 100M events reprocessed offsite
  • Goal of 800M events reprocessed offsite for 2005
  • Data handling developments for improved
    functionality, operations, and product longevity
  • Hardware: replacing aging infrastructure
    components such as D0mino

4
Computing Contributions
  • Use the FNAL equipment budget to provide a very
    basic level of functionality
  • Databases, networking and other infrastructure
  • Primary Reconstruction
  • Robotic storage and tape drives
  • Disk cache and basic analysis computing
  • Support for data access to enable offsite
    computing
  • Estimate costs based on experience or need for
    replacements
  • Remote Contributions
  • Monte Carlo production takes place at remote
    centers
  • Reprocessing (or primary processing)
  • Analysis at home institutions
  • Contributions at FNAL to project disk and to
    CLuED0
  • Collaboration-wide analysis

5
Virtual Center
  • For the value basis, determine the cost of the
    full computing system at FNAL prices, purchased in
    that year's currency (a toy tally is sketched
    after this list)
  • Disk, servers, and CPU for FNAL analysis
  • Production activities such as MC generation,
    processing and reprocessing.
  • Mass storage, cache machines and drives to
    support extensive data export
  • Assign fractional value for remote contributions
  • Merit-based assignment of value
  • Assigning equipment purchase cost as value (the
    BaBar model) doesn't take into account the life
    cycle of the equipment, nor system efficiency or
    use.
  • While shown as a predictor, most useful after the
    fact
  • Computing planning board includes strong remote
    participation and representation
  • Not yet included as part of the value estimate:
    wide area networking, infrastructure, desktop
    computing, analysis
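A toy tally of the virtual-center bookkeeping described above, using hypothetical reference costs and credit fractions (none of these numbers are DØ's actual figures):

```python
# Toy virtual-center tally: price each activity as if purchased at FNAL in
# the year's currency, then credit remote contributions with a merit-based
# fraction of that reference cost.  All numbers are hypothetical.
fnal_reference_cost_k = {        # k$ if bought at FNAL (placeholders)
    "MC production": 300,
    "reprocessing": 400,
    "analysis CPU and disk": 800,
}

remote_credit_fraction = {       # merit-based fractions (placeholders)
    "MC production": 1.0,        # done entirely at remote centres
    "reprocessing": 0.8,         # mostly offsite in 2005
}

remote_value_k = sum(
    fnal_reference_cost_k[act] * frac
    for act, frac in remote_credit_fraction.items()
)
print(f"Value credited to remote contributions: {remote_value_k:.0f} k$")
# -> 620 k$ with these placeholder numbers
```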

6
Data Handling/Production
(Plot: CAB analysis stations.)
  • 15M-25M events logged per week
  • Production capacity sized to keep up with data
    logging.
  • MC production at remote sites: 1M events/week
  • Tape writes/reads
  • 7 TB/week average writes
  • 30 TB/week reads
  • Analysis requests at FNAL
  • 750-1100M events
  • DØ: 50 TB/week in 1000 requests (see the
    arithmetic sketch below)

(Plot: files per 30 minutes; red shows errors.)
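A quick reading of the rates quoted above (decimal units assumed; the per-request figure simply divides the weekly analysis volume by the request count):

```python
# Back-of-the-envelope averages implied by the quoted data-handling rates.
writes_tb_per_week = 7            # average tape writes
reads_tb_per_week = 30            # tape reads
analysis_tb_per_week = 50         # DØ analysis traffic at FNAL
requests_per_week = 1000

print(f"tape read/write ratio: {reads_tb_per_week / writes_tb_per_week:.1f}")
print(f"avg data per analysis request: "
      f"{analysis_tb_per_week / requests_per_week * 1000:.0f} GB")
# -> roughly a 4:1 read/write ratio and ~50 GB per request
```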
7
Central Analysis
  • Support peak load of 200 users (rough per-user
    shares are sketched after this list)
  • TMB- and ntuple-based analysis, some user MC
    generation
  • Supports post-processing fixing as a common
    activity (moving to production platform)
  • B physics tends to be the most CPU- and
    event-intensive
  • DØ Central Analysis Backend (CAB)
  • ~2 THz
  • Past year, short of cache, over-reliance on tape
    access.
  • Deployed 100 TB as SAM cache on CABSRV
  • 70 TB user-controlled space, primarily on CLuED0
  • ASA nodes still not in production
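For scale, the rough per-user shares at peak load implied by the figures above (illustrative arithmetic only, assuming an even split):

```python
# Rough per-user share of the CAB resources at the quoted peak load.
cab_cpu_thz = 2.0       # ~2 THz aggregate on the Central Analysis Backend
peak_users = 200
sam_cache_tb = 100      # SAM cache deployed on CABSRV
user_space_tb = 70      # user-controlled space, primarily on CLuED0

print(f"CPU per user at peak: {cab_cpu_thz * 1000 / peak_users:.0f} GHz")
print(f"cache per user:       {sam_cache_tb * 1000 / peak_users:.0f} GB")
print(f"user space per user:  {user_space_tb * 1000 / peak_users:.0f} GB")
# -> about 10 GHz, 500 GB of cache, and 350 GB of user space per user
```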

(Chart: tickets per hardware type per year, tracked using the Remedy
system.) Tracking in this way helps us to understand which operational
issues arise and how to mitigate them.
8
Central Robotics
(Chart: daily Enstore traffic for CDF, DØ, and other users; 30 TB and
5000 mounts per day at peak.)
  • DØ tape inventory: 9940: 638 TB; LTO-I: 175 TB; LTO-II: 159 TB;
    971 TB total
  • Diversity of robotics/drives maintains flexibility
  • Known data loss due to robotics/Enstore for DØ: >10 GB
9
Wide Area Networking
  • OC-12 to ESnet; filling the production link,
    anticipate an upgrade (its weekly capacity is
    worked out below)
  • R&D fiber link to StarLight, used to support
    reprocessing for WestGrid

(Plot: outbound traffic at the border router since Dec 2004, with peaks
stressing the OC-12; CDF in green, DØ in blue.)
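For context on "filling the production link": an OC-12 runs at 622.08 Mbit/s, so even at 100% utilisation (ignoring protocol overhead) it carries only about 47 TB per week, which is the ceiling the export traffic above is pushing against.

```python
# Data volume a fully utilised OC-12 can carry in a week (no overhead).
oc12_bits_per_s = 622.08e6          # OC-12 line rate
seconds_per_week = 7 * 24 * 3600

tb_per_week = oc12_bits_per_s / 8 * seconds_per_week / 1e12
print(f"OC-12 ceiling: ~{tb_per_week:.0f} TB/week")   # ~47 TB/week
```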
10
Cost Estimate-Sept 2004
The guidance in 2002 was $2M, cut to $1.5M. In 2003, $1.5M, cut to $1.35M
($0.05M off the top, $0.1M for the Wideband tax). Relative to the
document, we added replacing mover nodes to infrastructure. We did not
add a tax cost to the price of the nodes, and probably should consider
doing so ($535/node in FY2004). (Reco farm sized to keep up with 25 Hz
weekly; the arithmetic is checked below.)
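A quick check of the numbers in this paragraph: the 2003 cuts, and what a sustained 25 Hz logging rate means per week (it lines up with the 15M-25M events/week on the data handling slide).

```python
# Arithmetic behind the quoted 2003 budget cut and the 25 Hz sizing figure.
guidance_2003_m = 1.50
off_the_top_m = 0.05
wideband_tax_m = 0.10
print(f"2003 after cuts: ${guidance_2003_m - off_the_top_m - wideband_tax_m:.2f}M")
# -> $1.35M

rate_hz = 25                              # reco farm sized to this rate
events_per_week = rate_hz * 7 * 24 * 3600
print(f"25 Hz sustained = {events_per_week / 1e6:.1f}M events/week")
# -> ~15.1M events/week
```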
11
FY 2005
  • Bottom-up estimate for the FNAL budget: $1.8M for
    equipment, $250K for tapes
  • Actual budget: $1.25M in equipment funds, $125K
    for tapes (the gap is tallied after this list)
  • Possible mitigations and trade-offs:
  • 30% speed-up of reco
  • Go to a 4-year retirement cycle on farm and
    analysis nodes
  • Rely more on remote computing, particularly in
    the out-years
  • Postpone 10Gb uplink to FY2006
  • Reduce skim size and assume only one skimming
    pass
  • Rely more heavily on LTO-II media, which costs ½
    as much as STK media for the same density
  • Drastically reduce the amount of MC DSTs stored
  • Recycle STK tapes
  • 2006 bottom-up estimate in progress
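The mitigations respond to a concrete gap; tallying the figures above (in $M):

```python
# FY2005 gap between the bottom-up estimate and the actual budget, in $M.
estimate = {"equipment": 1.800, "tapes": 0.250}
actual   = {"equipment": 1.250, "tapes": 0.125}

shortfall = sum(estimate.values()) - sum(actual.values())
print(f"requested: ${sum(estimate.values()):.3f}M")
print(f"received:  ${sum(actual.values()):.3f}M")
print(f"shortfall: ${shortfall:.3f}M "
      f"({shortfall / sum(estimate.values()):.0%} of the request)")
# -> a shortfall of about $0.675M, roughly a third of the request
```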

12
Conclusions
  • The DØ computing model is successful
  • We have developed tools to enable us to target
    computing spending at FNAL
  • We use metrics from SAM and system monitoring to
    provide estimators.
  • Use the Virtual Center concept to calculate the
    value that remote computing gives the
    collaboration.
  • DØ continues to pursue a global vision for the
    best use of resources by moving towards
    interoperability with LCG and OSG
  • DØ computing remains effort-limited; a few more
    dedicated people could make a huge difference.
  • Short budgets, the need for continued construction
    projects, and aging computing infrastructure are a
    serious cause for concern