1
Distributed Analysis in the BaBar Experiment
  • Tim Adye
  • Particle Physics Department
  • Rutherford Appleton Laboratory
  • University of Oxford
  • 11th November 2002

2
Talk Plan
  • Physics motivation
  • The BaBar Experiment
  • Distributed analysis and the Grid

3
Where did all the Antimatter Go?
  • Nature treats matter and antimatter almost
    identically
  • but the Universe is made up of just matter
  • How did this asymmetry arise?
  • The Standard Model of Particle Physics allows
    for a small matter-antimatter asymmetry in the
    laws of physics
  • Seen in some K0-meson decays
  • e.g. a 0.3% asymmetry
  • This CP Violation in the Standard Model is not
    large enough to explain the cosmological
    matter-antimatter asymmetry on its own
  • Until recently, CP Violation had only been
    observed in K-decays
  • To understand more, we need examples from other
    systems

4
What BaBar is looking for
  • The Standard Model also predicts that we should
    be able to see the effect in B-meson decays
  • B-mesons can decay in 100s of different modes
  • In the decays
  • B0 → J/ψ K0S and
  • B̄0 → J/ψ K0S
  • we look for differences in the time-dependent
    decay rate between B0 and anti-B0 (B̄0).

[Plot: measured time-dependent CP asymmetry]
5
First Results: Summary of the Summary
  • First results from BaBar (and rival experiment,
    Belle) confirm the Standard Model of Particle
    Physics
  • The observed CP Violation is too small to explain
    the cosmological matter-antimatter asymmetry
  • but there are many, many more decay modes to
    examine
  • We are making more than 80 measurements with
    different B-meson, charm, and τ-lepton
    decays.

6
Experimental Challenge
  • Individual decays of interest are only 1 in 10⁴
    to 10⁶ B-meson decays
  • We are looking for a subtle effect in rare (and
    often difficult to identify) decays, so need to
    record the results of a large number of events.

7
The BaBar Collaboration
9 countries, 74 institutions, 566 physicists
8
PEP-II e⁺e⁻ Ring at SLAC
Low Energy Ring (e⁺, 3.1 GeV)
Linear Accelerator
High Energy Ring (e⁻, 9.0 GeV)
BaBar
PEP-II ring circumference 2.2 km
9
The BaBar Detector
10⁸ B0B̄0 decays recorded
26th May 1999: first events recorded by BaBar
10
  • To analyse this enormous dataset effectively, we
    need large computing facilities, more than can
    be provided at SLAC alone
  • Distributing the analysis to other sites raises
    many additional research questions
  • Computing facilities
  • Efficient data selection and processing
  • Data distribution
  • Running analysis jobs at many sites
  • Most of this development either has benefited, or
    will benefit, from Grid technologies

11
Distributed computing infrastructure
1. Facilities
  • Distributed model originally partly motivated by
    slow networks
  • Now use fast networks to make full use of
    hardware (especially CPU and disk) at many sites
  • Currently specialisation at different sites
    concentrates expertise
  • e.g. RAL is the primary repository of analysis
    data in the ROOT format

Tier A: Lyon, Padua, RAL
Tier C: 20 universities, 9 in the UK
12
1. Facilities
RAL Tier A: Disk and CPU
13
RAL Tier A
1. Facilities
  • RAL has now relieved SLAC of most analysis
  • The BaBar analysis environment tries to mimic
    SLAC's, so that external users feel at home
  • Grid job submission should greatly simplify this
    requirement
  • Impressive take-up from UK and non-UK users

14
BaBar RAL Batch Users (running at least one
non-trivial job each week)
1. Facilities
A total of 153 new BaBar users registered since
December
15
BaBar RAL Batch CPU Use
1. Facilities
16
Data Processing
2. Data Processing
  • Full data sample (real and simulated data) in all
    formats is currently 700 TB.
  • Fortunately, the processed analysis data is only
    20 TB.
  • Still too much to store at most smaller sites
  • Many separate analyses looking at different
    particle decay modes
  • Most analyses only require access to a sub-sample
    of the data
  • Typically 1-10% of the total
  • Cannot afford for all the people to access all
    the data all the time
  • Overload the CPU or disk servers
  • Currently specify 104 standard selections
    (skims) with more efficient access

17
Strategies for Accessing Skims
2. Data Processing
  • Store an Event tag with each event to allow fast
    selection based on standard criteria
  • Still have to read past events that aren't
    selected
  • Cannot distribute selected sub-samples to Tier C
    sites
  • Index files provide direct access to selected
    events in the full dataset
  • File, disk, and network buffering still leaves
    significant overhead
  • Data distribution possible, but complicated
  • therefore only just starting to use this
  • Copy some selected events into separate files
  • Fastest access and easy distribution, but uses
    more disk space (a critical trade-off)
  • Currently this gives us a factor of 4 overhead in
    disk space
  • We will reduce this when index files are deployed
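The trade-off between the first two strategies above can be sketched as follows. This is a minimal illustration, not BaBar's actual code: the event records, tag names, and index structure are all hypothetical.

```python
# Hypothetical event store: every 50th event carries the "jpsi_ks" tag.
EVENTS = [
    {"id": i, "tags": {"jpsi_ks"} if i % 50 == 0 else set()}
    for i in range(1000)
]

def select_by_tag(events, tag):
    """Event-tag strategy: fast selection, but must still read past
    every event that is not selected."""
    reads, selected = 0, []
    for ev in events:
        reads += 1                     # each event is read, wanted or not
        if tag in ev["tags"]:
            selected.append(ev["id"])
    return selected, reads

def select_by_index(events, index):
    """Index-file strategy: direct access to pre-selected positions
    in the full dataset, so only selected events are read."""
    return [events[i]["id"] for i in index], len(index)

# The index file would be produced once, when the skim is defined.
index = [i for i, ev in enumerate(EVENTS) if "jpsi_ks" in ev["tags"]]

by_tag, tag_reads = select_by_tag(EVENTS, "jpsi_ks")
by_idx, idx_reads = select_by_index(EVENTS, index)
assert by_tag == by_idx                # same events either way
print(tag_reads, idx_reads)            # 1000 20
```

The third strategy (copying selected events into separate files) would make `select_by_index` unnecessary at the cost of duplicating the 20 events on disk, which is the factor-of-overhead mentioned above.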

18
Physics Data Selection (metadata)
2. Data Processing
  • Currently have about a million ROOT files in a
    deep directory tree
  • Need a catalogue to facilitate data distribution
    and allow analysis datasets to be defined.
  • SQL database
  • Locates ROOT files associated with each dataset
  • File selection based on decay mode, beam energy,
    etc.
  • Each site has its own database
  • Includes a copy of the SLAC database plus local
    information (e.g. files on local disk, files to
    import, local tape backups)
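A minimal sketch of such a catalogue, here using SQLite. The table layout, column names, and file paths are illustrative assumptions, not BaBar's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE files (
        path       TEXT,    -- location in the deep directory tree
        dataset    TEXT,    -- analysis dataset the file belongs to
        decay_mode TEXT,    -- e.g. skim name
        on_disk    INTEGER  -- local information: 1 if on local disk
    )""")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [("/data/run1/a.root", "run1", "JpsiKs", 1),
     ("/data/run1/b.root", "run1", "JpsiKs", 0),
     ("/data/run2/c.root", "run2", "TauLepton", 1)])

# File selection based on decay mode, restricted to local disk
rows = conn.execute(
    "SELECT path FROM files WHERE decay_mode = ? AND on_disk = 1",
    ("JpsiKs",)).fetchall()
print([r[0] for r in rows])   # ['/data/run1/a.root']
```

Each site would run queries like this against its own copy of the database to define an analysis dataset from the million-file tree.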

19
Data Distribution
3. Data Distribution
  • Tier A analysis sites currently take all the data
  • Requires large disks, fast networks, and
    specialised transfer tools
  • FTP does not make good use of fast wide-area
    networks
  • Data imports fully automated
  • Tier C sites only take some decay modes
  • We have developed a sophisticated scheme to
    import data to Tier A and C sites based on SQL
    database selections
  • Can involve skimming data files to extract events
    from a single decay mode. This is done
    automatically as an integral part of the import
    procedure
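The database-driven import decision above amounts to a set difference between the central catalogue and the local one, restricted to the decay modes a site subscribes to. A hedged sketch, with purely illustrative file and mode names:

```python
# Central (SLAC) catalogue: file -> decay mode it contains.
central = {
    "a.root": "JpsiKs",
    "b.root": "JpsiKs",
    "c.root": "TauLepton",
}
local = {"a.root"}               # files already on local disk
wanted_modes = {"JpsiKs"}        # a Tier C site takes only some modes

# Files this site still needs: wanted mode, not yet held locally.
to_import = sorted(
    f for f, mode in central.items()
    if mode in wanted_modes and f not in local
)
print(to_import)   # ['b.root']
```

In the real system the skimming step would then extract only the single decay mode's events from each imported file as part of the transfer.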

20
Remote Job Submission: Why?
4. Job Submission
  • The traditional model of distributed computing
    relies on people logging into each computing
    centre, building, and submitting jobs from there.
  • Each user has to have an account at each site and
    write or copy their analysis code to that
    facility
  • Fine for one site, maybe two. Any more is a
    nightmare for site managers (user registration
    and support) and users (set everything up from
    scratch)

21
Remote Job Submission
4. Job Submission
  • A better model would be to allow everyone to
    submit jobs to different Tier A sites directly
    from their home university, or even laptop
  • Simplifies local analysis code development and
    debugging, while providing access to full dataset
    and large CPU farms
  • This is a classic Grid application
  • This requires significant infrastructure
  • Authentication and authorisation
  • Standardise job submission environment
  • Grid software versions, batch submission
    interfaces
  • The program and configuration for each job has to
    be sent to the executing site and results
    returned at the end.
  • We are just now starting to use this for real
    analysis jobs
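The "send program and configuration to the executing site" step above can be sketched as a tarball round trip; the file names and helper functions are hypothetical, not BaBar's submission tools:

```python
import io
import tarfile

def package_job(files):
    """Submitting side: bundle analysis code and configuration into
    an in-memory gzipped tarball for shipping to the executing site."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            data = content.encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unpack_job(blob):
    """Executing side: restore the files before running the job."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            out[member.name] = tar.extractfile(member).read().decode()
    return out

job = package_job({"analysis.cc": "// user code",
                   "job.cfg": "mode=JpsiKs"})
assert unpack_job(job)["job.cfg"] == "mode=JpsiKs"
```

Results would travel back the same way; in a Grid setting the authentication and batch-interface layers listed above wrap around exactly this transfer.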

22
The Wider Grid
  • We are already using many of the systems being
    developed for the European and US DataGrids.
  • Globus, EDG job submission, CAs (certificate
    authorities), VOs (virtual organisations), RBs
    (resource brokers), high-throughput FTP, SRB
  • Investigating the use of many more
  • RLS, Spitfire, R-GMA, VOMS, …
  • We are collaborating with other experiments
  • BaBar is a member of EDG WP8 and PPDG (European
    and US particle physics Grid applications groups)
  • We are providing some of the first Grid
    technology use-cases

23
Summary
  • BaBar is using B decays to measure
    matter-antimatter asymmetries and perhaps explain
    why the universe is matter dominated.
  • Without distributing the data and computing, we
    could not meet the computing requirements of this
    high-luminosity machine.
  • Our initial ad-hoc architecture is evolving
    towards a more automated system, borrowing
    ideas, technologies, and resources from, and
    providing ideas and experience for, the Grid.