D0 Production Status and Plans

1
D0 Production Status and Plans
  • Run II Computing Review 2004
  • Michael Diesburg

2
Reconstruction Farm Status
  • Current Configuration
  • 8-processor SGI Origin, 4GB memory, 2GB disk
  • Used as NFS server, output staging area
  • 19 dual-processor PIII nodes
  • Used as input stagers for production nodes
  • 368 dual-processor worker nodes (PIII, Athlon,
    Xeon)
  • 1225 GHz total capacity (PIII equivalent)
  • Software components
  • FBSNG: Fermilab farm batch system
  • Dfarm: distributed disk cache (used for code
    distribution)
  • FCP: farm copy (controls network copy load)
  • SAM: used for all data I/O
  • Local control scripts

3
Reconstruction Farm Status
  • Current Operating Performance
  • Have demonstrated operating efficiency > 80%
  • i.e. can utilize > 80% of rated capacity for
    periods of months
  • Current demand on resources (since Jan 04) has
    been 60% of capacity.
  • This is a nearly ideal match of demand and
    available resources
  • Reconstructed data available in SAM in 48 hours
  • Special requests can be satisfied in < 24 hours
  • Capacity available for new tasks (TMB fixing)
  • Can catch up from down periods in a few days
  • Can reconstruct 3.5M events/day if average
    execution time is 25 GHz-sec/event.
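
The throughput figure above can be cross-checked with a short calculation. This is only a sketch: it assumes the 1225 GHz (PIII equivalent) capacity quoted for the current configuration is the relevant capacity, and that throughput scales linearly with capacity and efficiency. The function names are illustrative and do not come from the presentation.

```python
SECONDS_PER_DAY = 86_400


def events_per_day(capacity_ghz, cost_ghz_sec_per_event, efficiency=1.0):
    """Events/day a farm can reconstruct, assuming linear scaling (assumption)."""
    return capacity_ghz * efficiency * SECONDS_PER_DAY / cost_ghz_sec_per_event


def required_efficiency(target_events_per_day, capacity_ghz, cost_ghz_sec_per_event):
    """Fraction of rated capacity needed to reach a target daily throughput."""
    return target_events_per_day * cost_ghz_sec_per_event / (capacity_ghz * SECONDS_PER_DAY)


# Figures taken from the slides: 1225 GHz capacity, 25 GHz-sec/event, 3.5M events/day.
full_capacity_rate = events_per_day(1225, 25)
needed = required_efficiency(3.5e6, 1225, 25)

print(f"At 100% efficiency: {full_capacity_rate / 1e6:.2f}M events/day")
print(f"Efficiency needed for 3.5M events/day: {needed:.1%}")
```

Under these assumptions, 3.5M events/day corresponds to roughly 83% of rated capacity, which is in line with the demonstrated efficiency of more than 80%.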

4
Reconstruction Farm Status
  • Operating Performance Projections
  • Production capacity is strongly dependent on
    luminosity of the data being processed
  • Average reconstruction time is 25 secs/event at
    initial luminosity of 35E30, 65 secs/event at
    70E30 (110 secs/event at 90E30)
  • To produce 3.5M events/day at L > 35E30 we will
    be dependent on duty cycle mismatch between farm
    and accelerator
  • With current configuration we can keep up with
    data collection to 75E30 assuming accelerator
    duty cycle of 1/3.
  • But, will lose advantages of excess capacity
    noted above.
  • Operational efficiency will also suffer
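
The luminosity dependence above can be turned into a rough estimate of what the farm can absorb. The sketch below rests on several assumptions that are not stated in the slides: the quoted "secs/event" figures are read as GHz-sec/event on the PIII-equivalent scale, the cost is interpolated linearly between the three quoted points, and the farm runs at 100% efficiency around the clock. The function names are illustrative.

```python
from bisect import bisect_right

# (luminosity in units of E30, reconstruction cost per event) as quoted in the slides.
# Reading the cost as GHz-sec/event is an assumption.
COST_POINTS = [(35.0, 25.0), (70.0, 65.0), (90.0, 110.0)]


def reco_cost(lumi_e30):
    """Linearly interpolated cost per event (the interpolation is an assumption)."""
    xs = [x for x, _ in COST_POINTS]
    if not xs[0] <= lumi_e30 <= xs[-1]:
        raise ValueError("luminosity outside the range quoted in the slides")
    i = min(bisect_right(xs, lumi_e30), len(xs) - 1)
    (x0, y0), (x1, y1) = COST_POINTS[i - 1], COST_POINTS[i]
    return y0 + (y1 - y0) * (lumi_e30 - x0) / (x1 - x0)


def sustainable_logging_rate_hz(capacity_ghz, lumi_e30, duty_cycle):
    """Event rate during beam time that the farm can keep up with on average.

    The farm processes continuously, while data arrives only for the
    duty_cycle fraction of the time.
    """
    farm_rate_hz = capacity_ghz / reco_cost(lumi_e30)
    return farm_rate_hz / duty_cycle


cost_75 = reco_cost(75.0)
rate_75 = sustainable_logging_rate_hz(1225, 75.0, 1 / 3)
print(f"Cost at 75E30: {cost_75:.2f} GHz-sec/event")
print(f"Sustainable logging rate at 75E30, duty cycle 1/3: {rate_75:.1f} events/s")
```

With these assumptions, the current 1225 GHz farm could keep up with a logging rate of about 48 events/s during beam time at 75E30. This is a derived figure, not one given in the presentation.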

5
Reconstruction Farm Status
  • Farm Expansion Plans
  • Will add 80 3GHz nodes to farm this year
  • Will bring total capacity to 1500 GHz (PIII)
  • With this addition we should be able to keep up
    with data collection to 85E30
  • Need HDCF ready by October in order to have
    these nodes available at end of current shutdown
  • Will also move nodes currently in NML to HDCF
  • Will upgrade input staging node connections to gigabit
  • Will move some functionality off IRIX to Linux:
    FBS, Dfarm server, SAM station

6
Reconstruction Farm Status
  • Farm Expansion Plans
  • Operational software needs updating
  • Current software is very robust and allows high
    efficiency operation
  • Not very flexible (tailored to specific
    executables)
  • Not portable (very specific to FNAL installation)
  • Will shift to non-process-specific,
    non-site-specific system
  • Plan on moving to samGRID as soon as possible
  • Need to maintain efficiency of operation as the
    cutover is made
  • Will allow us to more easily shift operation
    between reconstruction, Monte Carlo,
    post-processing
  • Also will make us interoperable with remote
    sites.

7
Monte Carlo Production Status
  • MC Production
  • All MC production has been done off-site
  • Utilized 13 sites in 8 countries over the last
    year: Brazil, Czech Republic, Denmark, France,
    Germany, India, UK, US
  • Expect additional sites to come online in the
    next year in Canada, China, Korea, US
  • Difficult to characterize the potential capacity
    since resources are not all dedicated
  • Generated 37M events last year (Max 1.5M/week)
  • Usually very little delay between request
    submission and start
  • Generation time 1-2 weeks (depends on site)
  • Demand varies significantly during year
  • Currently have idle capacity

8
Reprocessing Status
  • Reprocessing, Round I
  • Planning for reprocessing run began June 2003
  • Required 5½ months of preparation until startup
  • Most time spent on making p14 version of reco
    sufficiently robust to proceed
  • Significant changes in reco capabilities led to
    rethinking entire processing chain and priorities
  • Began reprocessing 15 Nov 2003, finished 5 Jan
    2004
  • 100M events processed at remote sites
  • 25TB data transferred
  • Final merging, storage of TMBs done at FNAL to
    reduce load on remote sites
  • Processing done from DSTs (no DB access)
  • ½ FTE required at each remote site for duration
    of processing phase

9
Reprocessing Status
  • Reprocessing, Round II
  • Scheduled to begin Dec 2004
  • Scale of task: 10x first round of reprocessing
  • 1000M events (100M first round)
  • 250 TB data (25 TB first round)
  • Remote sites will do merging and final storage to
    SAM
  • Processing from raw data rather than DST (DB
    access required, data unfiltered)
  • Expect to require 6-8 months for completion
  • Expect to use samGRID at all sites for Round II
  • Biggest problem likely to be manpower
  • ~1 FTE/site for 6-8 months is a significant burden
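
To put the two reprocessing rounds side by side, the quoted figures can be converted into average daily rates. This is a rough sketch under stated assumptions: Round I is taken to span exactly the quoted start and finish dates, a month is approximated as 30 days, and TB is read as 10^12 bytes. None of the variable names come from the presentation.

```python
from datetime import date

# Round I figures from the slides: 100M events, 25 TB, 15 Nov 2003 to 5 Jan 2004.
round1_days = (date(2004, 1, 5) - date(2003, 11, 15)).days
round1_events = 100e6
round1_rate = round1_events / round1_days        # average events/day
bytes_per_event = 25e12 / round1_events          # assumes 1 TB = 1e12 bytes

# Round II figures from the slides: 1000M events over 6-8 months.
DAYS_PER_MONTH = 30                              # approximation (assumption)
round2_events = 1000e6
round2_rates = {
    months: round2_events / (months * DAYS_PER_MONTH) for months in (6, 8)
}

print(f"Round I: {round1_days} days, {round1_rate / 1e6:.2f}M events/day, "
      f"{bytes_per_event / 1e3:.0f} kB/event")
for months, rate in round2_rates.items():
    print(f"Round II over {months} months: {rate / 1e6:.2f}M events/day "
          f"({rate / round1_rate:.1f}x the Round I rate)")
```

On these assumptions, Round II needs an average rate of roughly two to three times that of Round I, sustained for several times as long, and from raw data rather than DSTs.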

10
Production Status
  • Conclusions
  • Local production going well, but will have
    difficulties next year as luminosity increases
  • Monte Carlo production in good shape but subject
    to non-uniform demand
  • First round of remote reprocessing was a major
    success; second round to start soon. Manpower
    demands will be a significant issue