1. BaBar and DØ Experiment Reports
- DOE Review of PPDG
- January 28-29, 2003
- Lee Lueking
- Fermilab Computing Division
- DØ liaison to PPDG
2. BaBar and DØ Introduction
- BaBar's PPDG effort concentrates on
- Data distribution on the Grid (SRB, BdbServer).
- Job submission on the Grid (EDG, LCG).
- People involved
- Tim Adye (RAL)
- Andy Hanushevsky (SLAC)
- Adil Hasan (SLAC)
- Wilko Kroeger (SLAC)
- Interactions with other Grid efforts that are part of BaBar: GridPP (UK), EDG (Europe, through Dominique Boutigny), GridKa, Italian Grid groups, etc.
- BaBar Grid applications are being designed to be data-format neutral, so BaBar's new computing model should have little impact on the applications.
- DØ's PPDG effort concentrates on
- Data distribution on the Grid (SAM).
- Job submission on the Grid (JIM with Condor-G and Globus).
- People involved
- Igor Terekhov (FNAL, JIM team lead)
- Gabriele Garzoglio (FNAL)
- Andrew Baranovski (FNAL)
- Parag Mhashilkar and Vijay Murthi (via contract with UTA CSE)
- Lee Lueking (FNAL, DØ liaison to PPDG)
- Interactions with other Grid efforts that are part of DØ: GridPP (UK), GridKa (DE), NIKHEF (NL), CCIN2P3 (FR)
- Working very closely with the Condor team to achieve
- A Grid job/resource matchmaking service
- Other robustness and usability features
3. Overview of BaBar and DØ Data Handling
- Both experiments have extensive distributed computing and data handling systems.
- Significant amounts of data are processed at remote sites in the US and Europe.
Figures (data-handling statistics and deployment maps):
- DØ integrated data consumed, Mar '02 to Mar '03: 1.2 PB
- DØ integrated files consumed, Mar '02 to Mar '03: 4.0 M files
- BaBar analysis jobs (SLAC), Apr '02 to Mar '03: 140k jobs
- BaBar database growth, Jan '02 to Dec '02: 730 TB
- Maps: BaBar deployment; DØ SAM deployment
4. BaBar Bulk Data Distribution: SRB
- The Storage Resource Broker (SRB) from SDSC is being used to test data distribution from Tier A to Tier A, with a view to production this summer.
- So far there have been two successful demos at Super Computing: 2001 (SLAC to SLAC) and 2002 (SLAC to ccin2p3).
- Have been testing SRB V2 (released Feb 2003); new features include bulk registering in the RDBMS and parallel-stream file replication.
- Busy incorporating the newly designed BaBar metadata tables into SRB's RDBMS tables. Looking to improve file replication performance (experimenting with number of streams, etc.).
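The parallel-stream replication mentioned above moves one file over several concurrent transfer streams. A minimal sketch of the per-stream byte-range arithmetic (illustrative only; this is not SRB's actual API):

```python
def stream_ranges(file_size: int, n_streams: int) -> list:
    """Split a file of file_size bytes into contiguous (offset, length)
    ranges, one per parallel stream; earlier streams absorb the remainder."""
    if n_streams < 1 or file_size < 0:
        raise ValueError("need n_streams >= 1 and file_size >= 0")
    base, extra = divmod(file_size, n_streams)
    ranges, offset = [], 0
    for i in range(n_streams):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges

# Example: a 10 MB file replicated over 4 parallel streams.
print(stream_ranges(10_000_000, 4))
# → [(0, 2500000), (2500000, 2500000), (5000000, 2500000), (7500000, 2500000)]
```

Each stream then copies its own range independently, which is what lets transfers over high-latency WAN links (e.g. SLAC to ccin2p3) fill the available bandwidth.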
5. BaBar User-Driven Data Distribution: BdbServer
- Attempts to address the use case where a user wants to copy a collection of sparse events with little space overhead (mainly Tier A to Tier C).
- BdbServer is essentially a set of scripts that
- Submit a job to the Grid to make a deep copy of the sparse collection (i.e., copy objects only for the events of interest).
- Then copy the files back to the user's institution through the Grid (can use globus-url-copy).
- Poster at CHEP 2003.
- Have tested deep copy through the Grid using EDG and pure Globus. Just completed a test of extracting data using globus-url-copy (pure Globus request).
- To do: incorporate with BaBar bookkeeping; robustness and reliability tests; production-level scripts for submission and copying.
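The deep-copy step above can be sketched as follows. This is a toy model, not BaBar code: the real BdbServer operates on BaBar's object database, while here a plain dict stands in for the event store.

```python
import copy

def deep_copy_collection(event_store: dict, wanted_ids: set) -> dict:
    """Build a deep copy holding only the events of interest, so the
    extracted sparse collection carries no references back to the full
    store and can be shipped on its own with little space overhead."""
    missing = wanted_ids - event_store.keys()
    if missing:
        raise KeyError(f"events not in store: {sorted(missing)}")
    return {eid: copy.deepcopy(event_store[eid]) for eid in wanted_ids}

store = {1: {"tracks": [3, 4]}, 2: {"tracks": [5]}, 3: {"tracks": []}}
sparse = deep_copy_collection(store, {1, 3})
print(sorted(sparse))  # → [1, 3]
```

The point of the deep copy (versus a shallow one) is exactly the use case in the slide: the output files contain only the selected events' objects, so only those bytes travel over the Grid.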
6. BaBar Job Submission on the Grid
- Many production-like activities could take advantage of using compute resources at more than one site.
- Analysis production: ccin2p3 (France), UK, SLAC, using EDG installations.
- Simulation production: Ferrara (Italy) Grid group, Ohio, using EDG and VDT installations.
- Also very useful for data distribution (BdbServer): ccin2p3 (France), SLAC.
Proposed BaBar Grid Architecture
7. BaBar Job Submission on the Grid
- There was a CHEP 2003 talk and poster, a Grid demo set up in the UK (running BaBar jobs on the UK Grid), and we have managed to run simulation production and data distribution tests on the Grid.
- Plan to test new EDG2/LCG installations and increase the number of users as releases stabilize.
- BbgUtils.pl: a Perl script to allow easier client-side installation of Globus CAs (currently works for Sun, Linux).
- The script copies all tar files, signing policies, etc. necessary for client installation for that experiment.
- Can be readily extended to include SRB client-side installation, EDG/LCG client-side installation, etc.
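A sketch of the bookkeeping behind a BbgUtils.pl-style CA installer. Hedged: the real tool is a Perl script whose internals are not shown here; only the file-name convention is standard Globus (each trusted CA is installed as `<hash>.0` plus `<hash>.signing_policy` in the trusted-certificates directory), and the hash value below is made up for illustration.

```python
import tempfile
from pathlib import Path

def install_ca(certs_dir: str, ca_hash: str, cert_pem: str, policy: str) -> list:
    """Write the two files Globus expects for one trusted CA and return
    the file names created."""
    d = Path(certs_dir)
    d.mkdir(parents=True, exist_ok=True)
    (d / f"{ca_hash}.0").write_text(cert_pem)              # CA certificate
    (d / f"{ca_hash}.signing_policy").write_text(policy)   # signing policy
    return [f"{ca_hash}.0", f"{ca_hash}.signing_policy"]

with tempfile.TemporaryDirectory() as td:
    # "d1b603c3" is a hypothetical hash value, for illustration only.
    print(install_ca(td, "d1b603c3", "<PEM data>", "<policy text>"))
    # → ['d1b603c3.0', 'd1b603c3.signing_policy']
```

Extending to SRB or EDG/LCG client installs, as the slide suggests, amounts to copying each product's extra tarballs into place alongside these trust files.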
8. DØ: Objectives of SAMGrid
- Bring standard Grid technologies (including Globus and Condor) to the Run II experiments.
- Enable globally distributed computing for DØ and CDF.
- JIM (Job and Information Management) complements SAM by adding job management and monitoring to data handling.
- Together, JIM + SAM = SAMGrid.
9. JIM Job Management
(Architecture diagram.) A user interface feeds a submission client, which contacts the matchmaking service (broker). The broker matches the job against execution sites 1 through n using data gathered by an information collector, then hands the job to the queuing system at the chosen site. Each execution site provides computing elements and storage elements instrumented with Grid sensors, and each is connected to the data handling system.
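The broker's matchmaking step can be sketched as a data-aware site ranking. Illustrative only: the real JIM service is built on Condor matchmaking, and the site names and cached-fraction metric below are hypothetical stand-ins.

```python
def match_site(sites: dict, job_files: set) -> str:
    """Pick the execution site whose local cache already holds the largest
    fraction of the job's input files (ties broken alphabetically)."""
    if not job_files:
        raise ValueError("job declares no input files")
    def cached_fraction(site: str) -> float:
        return len(job_files & sites[site]) / len(job_files)
    return max(sorted(sites), key=cached_fraction)

# Hypothetical site caches, keyed by site name.
sites = {
    "ccin2p3":  {"f1", "f2"},
    "gridka":   {"f1", "f2", "f3"},
    "fnal-cab": {"f4"},
}
print(match_site(sites, {"f1", "f2", "f3"}))  # → gridka (holds all three)
```

Ranking sites by data locality is the key design choice: shipping the job to the data is far cheaper than re-staging hundreds of gigabytes of input through the data handling system.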
10. DØ JIM Deployment
- A site can join SAMGrid with any combination of services:
- Monitoring, and/or
- Execution, and/or
- Submission
- May 2003: expect 5 initial execution sites for SAMGrid deployment, and 20 submission sites.
- GridKa (Karlsruhe): analysis site
- Imperial College and Lancaster: MC sites
- U. Michigan (NPACI): reconstruction center
- FNAL: ClueD0 as a submission site
- Summer 2003: continue to add execution and submission sites. The second round of execution-site deployments includes Lyon (ccin2p3), Manchester, MSU, Princeton, UTA, and the FNAL CAB system.
- Hope to grow to dozens of execution sites and hundreds of submission sites over the next year(s).
- Use Grid middleware for job submission within a site too!
- Administrators will have general ways of managing resources.
- Users will use common tools for submitting and monitoring jobs everywhere.
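The "any combination of services" joining rule above can be made concrete with a small validation sketch (assumed semantics: a site advertises a non-empty subset of the three roles; the function name and record format are invented for illustration):

```python
ALLOWED_SERVICES = {"monitoring", "execution", "submission"}

def register_site(name: str, services: set) -> dict:
    """Validate and record a site joining SAMGrid with any non-empty
    combination of the three service roles."""
    unknown = services - ALLOWED_SERVICES
    if unknown:
        raise ValueError(f"unknown services: {sorted(unknown)}")
    if not services:
        raise ValueError("a site must offer at least one service")
    return {"site": name, "services": sorted(services)}

print(register_site("gridka", {"execution", "monitoring"}))
# → {'site': 'gridka', 'services': ['execution', 'monitoring']}
```

This mirrors the deployment plan: most of the 20 initial sites would register with only the "submission" role, while the 5 execution sites advertise "execution" (and usually "monitoring") as well.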
11. What's Next for SAMGrid? (After JIM Version 1)
- Improve job scheduling and decision making.
- Improved monitoring: more comprehensive, easier to navigate.
- Execution of structured jobs.
- Simplify packaging and deployment. Extend the configuration and advertising features of the uniform framework built for JIM that employs XML.
- CDF is adopting SAM and SAMGrid for their data handling and job submission. CDF has also asked to join PPDG.
- Interoperability, interoperability, interoperability:
- Working with EDG and LCG to move in common directions.
- Moving to Web services, Globus V3, and all the good things OGSA will provide. In particular, interoperability by expressing SAM and JIM as a collection of services, and mixing and matching with other Grids.
12. Challenges
- In meeting the challenges of real data handling and job submission, BaBar and DØ have confronted real-life issues, including:
- Troubleshooting is an important and time-consuming activity in distributed computing environments, and many tools are needed to do this effectively.
- Operating these distributed systems on a 24/7 basis involves coordination, training, and worldwide effort.
- Standard middleware is still hard to use, and requires significant integration, testing, and debugging.
13. (Graphics-only slide; no transcript.)
14. PPDG Benefits to BaBar and DØ
- PPDG has provided very useful collaboration with, and feedback to, other Grid and computer science groups.
- Development of tools and middleware that should be of general interest to the Grid community, e.g.:
- BbgUtils.pl
- Condor-G enhancements
- Deploying and testing Grid middleware under the battlefield conditions of operational experiments hardens the software and helps CS learn what is needed.
- The CS groups enable the experiments to examine problems in new, innovative ways, and provide important new technologies for solving them.
15. The End