BaBar and the GRID

Roger Barlow for Fergus Wilson. GridPP 13. 5th July 2005, Durham.

1
BaBar and the GRID
  • Roger Barlow for Fergus Wilson
  • GridPP 13
  • 5th July 2005, Durham

2
Outline
  • Personnel.
  • Current BaBar Computing Model
      • Monte Carlo
      • Data Reconstruction
      • User Analyses
  • Projections of required resources.
  • BaBar GRID effort and planning.
      • Monte Carlo
      • User Analysis

3
BaBar GRID Personnel (2.5 FTEs)
  • Roger Barlow, Manchester, BaBar GRID PI
  • Fergus Wilson, RAL
  • James Werner, Manchester, GridPP funded
  • Giuliano Castelli, RAL, GridPP funded
  • Chris Brew, RAL, 50% GRID
We do not have an infinite number of monkeys; our goals are therefore constrained.

4
BaBar Computing Model - Monte Carlo
  • Monte Carlo is generated at 25 sites around the
    world.
  • Database driven production.
  • 20KBytes per event.
  • 10 seconds per event.
  • 2.8 billion events generated last year.
  • 99.5% efficient.
  • Need 100-150 million events per week.
  • MC datasets (ROOT files) are merged and sent to
    SLAC.
  • MC datasets are distributed from SLAC to any Tier
    1/2/3 that wants them.
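
As a rough cross-check of the figures on this slide, the sketch below turns the per-event size and time into yearly totals. It is only a back-of-envelope estimate: it assumes 1 KByte = 1000 bytes, a 365-day year, and that every CPU takes the quoted 10 seconds per event. The function and constant names are illustrative and not part of any BaBar software.

```python
# Back-of-envelope totals from the slide's per-event figures.
# Assumptions (not from the slide): 1 KByte = 1000 bytes, 365-day year.
EVENT_SIZE_BYTES = 20_000          # 20 KBytes per event
SECONDS_PER_EVENT = 10             # 10 seconds per event
EVENTS_PER_YEAR = 2_800_000_000    # 2.8 billion events generated last year
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_storage_tb(events: int, event_size: int = EVENT_SIZE_BYTES) -> float:
    """Total Monte Carlo volume in terabytes (10**12 bytes)."""
    return events * event_size / 1e12


def cpus_busy_all_year(events: int, secs: int = SECONDS_PER_EVENT) -> float:
    """Number of CPUs that would be fully occupied for a whole year."""
    return events * secs / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"storage: {yearly_storage_tb(EVENTS_PER_YEAR):.0f} TB")
    print(f"CPUs:    {cpus_busy_all_year(EVENTS_PER_YEAR):.0f}")
```

Under these assumptions last year's sample comes to about 56 TB and the equivalent of roughly 890 CPUs running continuously.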

5
BaBar Computing Model - Data
  • 10 Mbytes/sec to tape at SLAC.
  • Reconstructed at Padova (1.5 fb-1/day).
  • Skimmed into datasets at Karlsruhe.
  • Skimmed datasets (ROOT files) sent to SLAC.
  • Datasets are distributed from SLAC to any Tier
    1/2/3 that wants them.
  • An analysis can be run on a laptop.

6
BaBar Computing Model - User Analysis
  • Location of datasets provided by mySQL/Oracle
    database.
  • Data/Monte Carlo datasets accessed via Xrootd
    file server (load-balancing, fault-tolerant, disk
    or tape interface).
  • Conditions accessed from proprietary Objectivity
    database.

[Diagram labels: User Code, mySQL, Tier 1/2/3, Xrootd, Objectivity, Files]

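
The lookup step described on this slide (ask the database where a dataset lives, then read the files through Xrootd) might look roughly like the sketch below. The table layout, host name and paths are hypothetical, since the slides do not describe the real bookkeeping schema, and Python's built-in sqlite3 stands in for the mySQL/Oracle database. Only the `root://host//path` URL form is standard Xrootd usage.

```python
import sqlite3
from typing import List

# Hypothetical schema and contents: the real BaBar bookkeeping tables are
# not described in the slides. sqlite3 stands in for mySQL/Oracle.


def make_demo_catalogue() -> sqlite3.Connection:
    """Build a tiny in-memory catalogue mapping datasets to files."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset_files (dataset TEXT, host TEXT, path TEXT)")
    conn.executemany(
        "INSERT INTO dataset_files VALUES (?, ?, ?)",
        [
            ("demo-skim", "xrootd.example.org", "/store/demo-skim/file1.root"),
            ("demo-skim", "xrootd.example.org", "/store/demo-skim/file2.root"),
            ("other-skim", "xrootd.example.org", "/store/other-skim/file1.root"),
        ],
    )
    return conn


def xrootd_urls(conn: sqlite3.Connection, dataset: str) -> List[str]:
    """Return Xrootd URLs for every file of the requested dataset."""
    rows = conn.execute(
        "SELECT host, path FROM dataset_files WHERE dataset = ? ORDER BY path",
        (dataset,),
    )
    # Paths are absolute, so host + "/" + path gives root://host//abs/path.
    return [f"root://{host}/{path}" for host, path in rows]


if __name__ == "__main__":
    for url in xrootd_urls(make_demo_catalogue(), "demo-skim"):
        print(url)
```

User code would then open each URL through an Xrootd client; that step is left out here because it depends on the analysis framework in use.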
7
Current Status at RAL Tier 1
  • RAL imports data and Monte Carlo every night.
  • RAL has the full data and Monte Carlo for 4 out
    of 15 of the Analysis Working Groups.
  • All disk and tape are full.
  • Importing has stopped.
  • We will have to delete our backups of the data.
  • Moving to a disk/tape staging system but unlikely
    to keep up with demand.
  • CPU underused at the moment.

8
BaBar Projections
  • Bottom-up planning driven by luminosity
  • Double dataset by 2006 (500 fb-1)
  • Quadruple dataset by 2008 (1000 fb-1)

9
BaBar Monte Carlo on the GRID
  • We have already produced 30 million Monte Carlo
    events on the GRID at Bristol/RAL/Manchester/RHUL
    (2004, using Globus).
  • Now using LCG at RAL.
  • Software is installed via an RPM at sites
    (provided by BaBar Italian GRID groups).
  • Job submission/control from RAL.
  • 1.2 million events per week during June 2005.
  • This is 7.5% of BaBar weekly production (during a
    slow period).
  • Will aim to soak up 25% of our Tier 1 allocation
    with SP (Simulation Production) as requested by
    GridPP. Should do 3-6 million per week at RAL.
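
The production figures on this slide can be related to each other with a little arithmetic. The sketch below reads the slide's 7.5 as a percentage and reuses the 10 seconds per event quoted in the Monte Carlo computing model; whether GRID worker nodes actually achieve that rate is an assumption. Function names are illustrative only.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600
SECONDS_PER_EVENT = 10  # from the Monte Carlo computing model slide


def implied_total_per_week(grid_events: float, grid_fraction: float) -> float:
    """Total weekly production implied by the GRID share."""
    return grid_events / grid_fraction


def cpus_needed(events_per_week: float,
                secs_per_event: float = SECONDS_PER_EVENT) -> float:
    """CPUs that must run non-stop to deliver the weekly event count."""
    return events_per_week * secs_per_event / SECONDS_PER_WEEK


if __name__ == "__main__":
    total = implied_total_per_week(1.2e6, 0.075)
    print(f"implied total production: {total / 1e6:.0f} million events/week")
    for target in (3e6, 6e6):
        print(f"{target / 1e6:.0f} million/week needs about "
              f"{cpus_needed(target):.0f} CPUs")
```

On these assumptions 1.2 million events at 7.5% implies about 16 million events per week overall during the slow period, and the 3-6 million per week target corresponds to roughly 50 to 100 CPUs kept continuously busy.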

10
BaBar Monte Carlo on the GRID Tier 2
  • We are merging the QMUL, Birmingham and Bristol
    BaBar farms.
  • 240 slow (866 MHz) CPUs.
  • We will set up regional Objectivity servers that
    can be accessed over WAN. This means Objectivity
    is not needed at every Tier site.
  • We need a large stable Tier 2 if we are to roll
    this out beyond RAL. We don't have the manpower
    to develop the MC and manage lots of small sites.

11
BaBar GRID Data Analysis
  • We now have a standard generic initialisation
    script for all GRID sites.
      • Sets up BaBar environment.
      • Sets up xrootd/Objectivity.
      • Identifies what software releases are available.
      • Identifies what conditions are available.
      • Identifies what collections of datasets are
        available.
      • Identifies if site is set up and/or validated for
        Monte Carlo production.

12
BaBar GRID Data Analysis
  • Prototype Job Submission System (EasyGrid)
      • Interfaces to mySQL database to identify
        required datasets and allocates them to jobs.
      • Submits jobs.
      • Resubmits jobs when they fail.
      • Resubmits jobs when they fail again.
      • Monitors progress.
      • Retrieves output (usually ROOT files).
  • Have analysed 60 million events this way with
    jobs submitted from Manchester to RAL.
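
The allocate, submit and resubmit cycle listed above can be sketched in a few lines. This is not EasyGrid's code, which the slides do not show; it is a minimal illustration in which `run_job` is a hypothetical callable standing in for real GRID submission and returning True on success.

```python
from typing import Callable, Dict, List


def allocate(datasets: List[str], per_job: int) -> List[List[str]]:
    """Split the dataset list into per-job chunks."""
    return [datasets[i:i + per_job] for i in range(0, len(datasets), per_job)]


def run_with_resubmission(
    jobs: List[List[str]],
    run_job: Callable[[List[str]], bool],
    max_attempts: int = 3,
) -> Dict[int, int]:
    """Run every job, resubmitting failures up to max_attempts times.

    Returns {job index: attempts used}. A job that never succeeds is
    recorded with a negative value (-max_attempts).
    """
    attempts: Dict[int, int] = {}
    for index, job in enumerate(jobs):
        for attempt in range(1, max_attempts + 1):
            if run_job(job):          # hypothetical submit-and-wait call
                attempts[index] = attempt
                break
        else:
            # Loop finished without a success: mark the job as failed.
            attempts[index] = -max_attempts
    return attempts
```

A real system would submit jobs in parallel and poll for status rather than block on each one; the sequential loop is kept only to make the retry logic easy to follow.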

13
BaBar GRID Data Analysis
  • The Data Analysis works if you know that the data
    exists at a particular site.
  • Datasets are not static.
      • MC always being generated.
      • Billions of events.
      • Millions of files.
      • Thousands (currently 36000) of collections of
        datasets (arranged by processing release and
        physics process).
  • The challenge will be to:
      • Interrogate sites about their available data.
      • Allocate jobs according to available data and
        site resources.
      • Monitor it all.
  • First Step
      • Shortly the local mySQL database that identifies
        the locally available datasets will also know
        about the availability of datasets at every other
        site. Can then form the backend of an RLS
        (Replica Location Service).
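
Once a catalogue knows which sites hold which datasets, allocating jobs "according to available data and site resources" could start from something as simple as the sketch below. The catalogue contents, the free-CPU counts and the greedy rule (send each job to the site holding the data that has the most free CPUs) are all assumptions made for illustration; the slides do not specify an allocation policy.

```python
from typing import Dict, List, Optional, Set


def choose_site(dataset: str,
                catalogue: Dict[str, Set[str]],
                free_cpus: Dict[str, int]) -> Optional[str]:
    """Pick a site that holds the dataset and has a free CPU."""
    candidates = [site for site in catalogue.get(dataset, set())
                  if free_cpus.get(site, 0) > 0]
    if not candidates:
        return None
    # Most free CPUs first; ties broken alphabetically for determinism.
    return min(candidates, key=lambda site: (-free_cpus[site], site))


def allocate_jobs(datasets: List[str],
                  catalogue: Dict[str, Set[str]],
                  free_cpus: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Assign one job per dataset, consuming one CPU per assignment."""
    remaining = dict(free_cpus)  # work on a copy, leave the input untouched
    plan: Dict[str, Optional[str]] = {}
    for dataset in datasets:
        site = choose_site(dataset, catalogue, remaining)
        plan[dataset] = site
        if site is not None:
            remaining[site] -= 1
    return plan


if __name__ == "__main__":
    # Hypothetical catalogue: dataset names and CPU counts are made up.
    catalogue = {"A": {"RAL", "Manchester"}, "B": {"RAL"}, "C": set()}
    print(allocate_jobs(["A", "B", "C"], catalogue,
                        {"RAL": 1, "Manchester": 2}))
```

A dataset with no available site maps to None, which is where monitoring and later resubmission would take over.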

14
Conclusion
  • We are already doing Monte Carlo production on
    the GRID.
      • We have met all our deliverables.
      • We will start major production at RAL.
      • We need some large Tier 2 sites if this is to go
        anywhere in the UK.
  • We are already doing Data Analysis on the GRID.
      • We have met all our deliverables.
      • Concentrate on sites with BaBar infrastructure
        and local datasets.
      • Provide WAN-accessible servers.
      • We have a prototype data analysis GRID interface.
      • Still many GRID issues to be tackled before
        allowing normal people near it.
  • BUT the GRID still has to prove it can provide a
    production quality service on the time scale of
    running experiments.