Title: Spring 2002 CMS Monte Carlo Production: What ? How ? What Next ?
1. Spring 2002 CMS Monte Carlo Production: What? How? What Next?
- Véronique Lefébure (CERN-HIP)
- CERN-IT Seminar, 25 September 2002
2. Outline
- What?
  - Physics Applications: production steps, data products
  - Resource Constraints: CPU, RAM, persistency
  - How Much Data: number of events, TB of data, delivery deadline
- How?
  - World-Wide Distributed Production: where, who, coordination
  - Production Tools: RefDB, IMPALA, DAR, BOSS
  - Data Transfer, Data Storage, Data Validation
  - Success and Difficulties
- What Next?
  - Possible Improvements
  - Coming Major Production: 2004 Data Challenge
3. Introduction: the CMS On-line System
- Multi-level trigger: filter out background, reduce data volume
- Collisions: 40 MHz (1000 TB/sec)
- Level 1 Trigger: 75 kHz (50 GB/sec)
- Level 2 Trigger: 5 kHz (5 GB/sec)
- Level 3 Trigger: 100 Hz (100 MB/sec)
- Data Recording and Offline Analysis
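As a sanity check, the rate reduction performed by the trigger chain can be sketched in a few lines of Python, using only the rates quoted on the slide:

```python
# Rates quoted on the slide, per trigger stage (events/sec).
trigger_rates = {
    "collision": 40_000_000,   # 40 MHz beam crossings
    "level_1": 75_000,         # 75 kHz after Level-1
    "level_2": 5_000,          # 5 kHz after Level-2
    "level_3": 100,            # 100 Hz written to storage
}

def reduction_factor(rates, src, dst):
    """Rate-reduction factor between two stages of the chain."""
    return rates[src] / rates[dst]

# Overall reduction from beam crossings to recorded events:
overall = reduction_factor(trigger_rates, "collision", "level_3")
```

The overall factor is 400 000: only one beam crossing in 400 000 survives to Data Recording.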
4. Data Simulation Needs
- Spring 2002 Production for the CMS Physics Community:
  - a large amount of simulated data is needed to prepare the CMS DAQ TDR (Data Acquisition Technical Design Report), due for the end of 2002
  - the most up-to-date physics software must be used
  - the data are needed before the June 2002 CMS week
5. Monte Carlo Production Steps
- The full Production Chain consists of 4 steps:
  - 3 logical Monte Carlo simulation steps: Generation, Simulation, Digitisation
  - 1 Reconstruction and Analysis step
- Production was performed step by step for many different p-p physics channels
- The output plays the role of the RAW data as produced by the real detector; stored in Objectivity/DB
6. Monte Carlo Production Steps: 1) Generation
Primary interactions, in the vacuum of the beam-pipe
- Generation of one p-p interaction at a time, for a selected physics channel
- In reality there are 4 or 20 interactions per beam-crossing, depending on the beam luminosity (2×10^33 or 10^34 cm^-2 s^-1), i.e. interactions are superimposed: pile-up
7. Monte Carlo Production Steps: 2) Simulation
Secondary interactions, in the detector material and magnetic field
- Individual hits: crossing points, energy deposition, time of flight
- In reality there is one beam-crossing every 25 ns, << time of flight and electric-signal development, i.e. signals from particles from different beam-crossings are superimposed: pile-up
8. Monte Carlo Production Steps: 3) Digitisation
Response of the sensitive detector elements, taking into account the two sources of pile-up:
- 4 or 20 interactions per beam-crossing
- beam-crossings -5 to +3 around the triggered one
- For 1 signal p-p event of 1 MB, we have 70 MB of pile-up events @ 10^34 cm^-2 s^-1
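The pile-up multiplicity behind these figures can be sketched as follows (numbers taken from the slides; the exact crossing window used in production may differ slightly):

```python
# Pile-up interactions per crossing at the two luminosities, and
# sensitivity to crossings -5..+3 around the triggered one.
PU_PER_CROSSING = {2e33: 4, 1e34: 20}
SENSITIVE_CROSSINGS = 3 - (-5) + 1   # crossings -5..+3 -> 9

def pileup_events(luminosity):
    """Pile-up interactions overlaid on one triggered event."""
    return PU_PER_CROSSING[luminosity] * SENSITIVE_CROSSINGS

n_high = pileup_events(1e34)   # 180, i.e. the O(200) pile-up events
                               # quoted later for digitisation
```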
9. Monte Carlo Production Steps: 4) Reconstruction and Analysis
Higher-level physics reconstruction and histogramming:
- Level-1 trigger filtering
- Track, cluster and vertex reconstruction
- First-pass physics analysis
- Histogramming
10. Physics Applications (figures for jobs of 500 events)
- Generation: CMKIN/PYTHIA (also ISAJET, COMPHEP). Fortran77; very fast, ~5 sec/event. Input: ASCII parameter file. Output: PAW ntuple (size ~30 MB).
- Simulation: CMSIM/GEANT3. Fortran77; very slow, 1 to >10 min/event. Input: ASCII file, geometry and magnetic-field ZEBRA file (size ~14 MB), PAW ntuple. Output: ZEBRA file (size ~0.5 GB).
- ooHit formatting: ORCA/COBRA. C++, object-oriented; very fast (I/O-bound). Input: ASCII file, geometry and magnetic-field ZEBRA file, ZEBRA file. Output: Objectivity/DB data plus metadata, ooHit files (size ~0.5 GB).
- Digitisation: ORCA/COBRA. At 10^34 pile-up (200 PU events) ~1 min/event; executable size <200 MB; multi-threaded. Input: ASCII file, Objectivity/DB ooHit files for signal and pile-up events. Output: Objectivity/DB data plus metadata, Digis files (size ~2 GB).
- Reconstruction and Analysis: ORCA/COBRA. Input: ASCII file, Objectivity/DB ooHit and Digi files for signal and pile-up events. Output: Objectivity/DB data plus metadata files, or PAW ntuple or ROOT files.
Processing chain: Generation → Simulation → ooHit formatting → Digitisation → Reconstruction and Analysis.
11. More Production Steps
- Filtering (Level-1 trigger, ...)
- Adding digits incrementally (e.g. first calorimeter digits, then Tracker digits after filtering)
- Cloning of ooHits and/or Digis (smaller collection of data to handle, less staging at analysis time)
- Re-digitisation with different algorithms or parameters
12. Resource Constraints
- Long CMSIM jobs can take 2 days and more
- RAM: >512 MB for dual processors (ORCA)
- RedHat 6.1(.1) required by the Objectivity/DB license
- Data server:
  - 80 GB of pile-up events (re-used; otherwise 300 TB!)
  - typically 1 server per 12 CPUs
- Disk space: one typical dataset @ 10^34 is 50K events × (1 MB fz + 1 MB ooHits + 4 MB digis)/event ≈ 300 GB
- Lockserver, AMS server: the number of file handles may reach 3000
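The dataset-size arithmetic above is simple enough to write down explicitly (a sketch; the 1 GB = 1000 MB convention is an assumption on my part):

```python
# Per-event sizes from the slide, in MB.
EVENT_SIZES_MB = {"fz": 1, "oohits": 1, "digis": 4}

def dataset_size_gb(n_events, sizes_mb=EVENT_SIZES_MB):
    """Total dataset size in GB (taking 1 GB as 1000 MB)."""
    return n_events * sum(sizes_mb.values()) / 1000

typical = dataset_size_gb(50_000)   # the ~300 GB quoted on the slide
```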
13. Job Complexity
- Generation and Simulation jobs: the easy part
- ORCA-COBRA jobs: more tricky
  - Closely-coupled jobs: shared federation/lockserver, output server, AMS
  - 5 jobs write in parallel to 1 DB; 1 job may populate many DBs (~10)
  - One stale lock can bring everything to a halt
- Massive I/O system @ 10^34:
  - 100 jobs in parallel
  - Input: 70 MB of pile-up events per 1 MB signal event, at 1 event/minute ≈ 1 MB/sec/job
  - Output: 4 MB/event, i.e. 4 MB/minute/job
- Physics software not yet fully robust: need to recover from crashes and to spot infinite loops
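The aggregate farm I/O implied by these per-job figures can be estimated in a few lines (a back-of-the-envelope sketch using only the numbers above):

```python
# Per-job figures for digitisation at 10^34, from the slide.
N_JOBS = 100
PILEUP_MB_PER_EVENT = 70
OUT_MB_PER_EVENT = 4
EVENTS_PER_MIN = 1

in_mb_s_per_job = PILEUP_MB_PER_EVENT * EVENTS_PER_MIN / 60   # ~1.2 MB/s
farm_in_mb_s = N_JOBS * in_mb_s_per_job                       # ~117 MB/s
farm_out_mb_s = N_JOBS * OUT_MB_PER_EVENT * EVENTS_PER_MIN / 60
```

So the data servers must sustain on the order of 100 MB/sec of pile-up reads across the farm, which is why one server per 12 CPUs and re-used pile-up samples matter.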
14. How Much Data?
- Generation/Simulation:
  - 4 months
  - 6 M events, 150 physics channels
- ORCA production:
  - 2 months
  - 19 000 files, 500 collections, 20 TB
  - No PU: 2.5 M events; 2×10^33 PU: 4.4 M; 10^34 PU: 3.8 M; filtered: 2.9 M
  - 300 TB of pile-up moved over the LAN
- 100 000 jobs, 45 years of CPU (wall-clock)
- More than 10 TB traveled over the WAN
- Production completed just on time: successful production at a regular global rate!
15. CMSIM
6 million events, one event every 1.2 seconds, sustained for 4 months (Feb. 8th to June 6th)
16. 2×10^33 PU
4 million events, one event every 1.2 seconds, 2 months (April 12th to June 6th)
17. 10^34 PU
3.5 million events, one event every 1.4 seconds, 2 months (April 10th to June 6th)
18. Physics Results
The data are used for physics studies, not only for computing-performance studies.
19. How?
- Production
- Distribution
- Coordination
- Production Tool Suite
- Success and Difficulties
20. World-wide Distributed Production
(Map of the CMS Production Regional Centres)
21. World-wide Distributed Production
- 11 Regional Centres (RC), >20 sites in the USA, Europe and Russia, ~1000 CPUs: Bristol/RAL (UK), Caltech, CERN, Fermilab, Imperial College (UK), IN2P3-Lyon, INFN (Bari, Catania, Bologna, Firenze, Legnaro, Padova, Perugia, Pisa, Roma, Torino), Moscow (ITEP, JINR, SINP MSU, IHEP), UCSD (San Diego), UFL (Florida), Wisconsin. Note: still more sites joining (RICE, Korea, Karlsruhe, Pakistan, Spain, Greece, ...)
- >30 Production Operators: Maria Damato, Alessandra Fanfani, Daniele Bonacorsi, Catherine MacKay, Dave Newbold, Suresh Singh, Vladimir Litvine, Salvatore Costa, Julia Andreeva, Tony Wildish, Veronique Lefebure, Greg Graham, Shafqat Aziz, Nicolo Magini, Olga Kodolova, David Colling, Philip Lewis, Claude Charlot, Philippe Mine, Giovanni Organtini, Nicola Amapane, Victor Kolosov, Elena Tikhonenko, Massimo Biasotto, Stefano Lacaprara, Alexander Kryukov, Nikolai Kruglov, Leonello Servoli, Livio Fano, Simone Gennai, Ian Fisk, Dimitri Bourilkov, Jorge Rodriguez, Pamela Chimney, Shridara Dasu, Iyer Radhakrishna, Wesley Smith, plus probably many more persons in the shadow!
- >20 Physicists as Production Requestors
22. Coordination Issues
- Physicists' side:
  - Handle four physics groups
  - Check uniqueness of requests
  - Check that the number of requested events is reasonable
  - Take care of request priorities
- Producers' side:
  - Deploy and support production tools
  - Distribute physics executables
  - Distribute requests adequately to the RCs
  - Ensure uniqueness of produced data
  - Track progress of data production and transfer
23. Coordination Means
- Physicists' side:
  - 1 coordinator per physics group
  - 1 coordinator for the 4 physics groups
  - Meetings
  - Use of the MySQL CMS DB (RefDB) for recording and managing the production requests
- Producers' side:
  - 1 Production Manager
  - 1 Production Coordinator, in contact with the Physics Coordinators
  - 1 or 2 contact persons per Regional Centre
  - Meetings and mailing list
  - Use of the MySQL CMS DB (RefDB) for assigning production requests to Regional Centres and for progress tracking
  - Pre-allocation of run numbers, random seeds and DB IDs
  - Automatic file naming provided by RefDB
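The pre-allocation idea can be illustrated with a small sketch: each assignment receives a disjoint block of run numbers, and seeds are derived deterministically from the run number, so no two Regional Centres can produce overlapping runs. This is only an illustration of the principle; the actual RefDB allocation and seed scheme are not shown here.

```python
def allocate_runs(first_free_run, n_jobs):
    """Reserve a contiguous block of run numbers for one assignment;
    return the block and the new high-water mark to store back."""
    block = list(range(first_free_run, first_free_run + n_jobs))
    return block, first_free_run + n_jobs

def seed_for_run(run):
    # Illustrative only: any odd multiplier maps distinct run numbers
    # to distinct seeds mod 2**31.
    return (run * 2_654_435_761) % 2**31

runs, next_free = allocate_runs(1000, 100)
seeds = [seed_for_run(r) for r in runs]
```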
24. RefDB: Central Reference Database
- Production Requests:
  - Submission forms for each production step
  - List of recorded requests
  - Modification/correction of submitted requests
- Production Assignments:
  - Selection of a set of requests for assignment to an RC
  - Re-assignment of a request to another RC or production site
  - List and status of assignments
25. RefDB: Central Reference Database
- Metadata catalogue:
  - Browse datasets according to physics channel, software version, ...
  - Get production status
  - Get data location
  - Get input parameters
26. How?
- Production
- Distribution
- Coordination
- Production Tool Suite
- Success and Difficulties
27. Production Tools: Spring02 Components
- RefDB: central input-parameters DB and central output-metadata DB
- IMPALA: job-scripts generator
- BOSS: local job-monitoring DB, monitoring schema and scripts
- Job scheduler
28. DAR: Distribution After Release
- CMS software distribution tool
- Allows creating and installing the binaries
- Distribution tar files published at FNAL and at CERN
- Local installation: dar -i Distribution_Tar_File Installation_Directory
- Used for distribution of ALL physics executables and the geometry file
29. BOSS: Batch Object Submission System
- Tool for job monitoring and book-keeping, developed by CMS
- Not a job scheduler, but can be interfaced with any scheduler:
  - LSF (CERN, INFN)
  - PBS (Bristol, Caltech, UFL, Imperial College, INFN)
  - FBSNG (Fermilab)
  - Condor (INFN, Wisconsin)
- Uses a database (MySQL)
30. BOSS
- User registers a scheduler:
  - scripts for job submission, deletion and query (DB blobs)
- User registers a job type:
  - schema for the information to be monitored (new DB table)
  - algorithms to retrieve the information from the job (DB blobs)
- User submits jobs of a defined type:
  - a new entry is created for the job in the BOSS database tables
  - the running job fetches the user monitoring programs and updates the BOSS database
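The runtime-monitoring step can be sketched as follows. This is a minimal stand-in for the real BOSS mechanism: a dict plays the role of the MySQL tables, and a regexp-based filter plays the role of a registered job type's extraction algorithm (the field name and pattern below are hypothetical).

```python
import re

# Stand-in for the BOSS MySQL tables: job_id -> monitored values.
boss_db = {}

def runtime_process(job_id, line, schema):
    """Apply each registered extraction filter to one line of the
    job's output and record any match in the monitoring DB."""
    for field, pattern in schema.items():
        m = re.search(pattern, line)
        if m:
            boss_db.setdefault(job_id, {})[field] = m.group(1)

# Hypothetical job-type schema: field name -> extraction regexp.
cmkin_schema = {"events_done": r"processed (\d+) events"}
runtime_process(42, "processed 500 events", cmkin_schema)
```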
31. BOSS (architecture diagram)
32. BOSS for Spring02 Production
BOSS job-type registration components (job-type table):
- KIN (Generation): cmkin.schema, preprocess, runtimeprocess, postprocess
- SIM (Simulation): cmsim.schema, preprocess, runtimeprocess, postprocess
- OOHit (ooHit formatting): oohit.schema, preprocess, runtimeprocess, postprocess
- OODigi (Digitisation): oodigi.schema, preprocess, runtimeprocess, postprocess
33. From BOSS to RefDB: Summary Scripts
- Update RefDB with the current status of assignment progress
- Book-keeping of the monitored values
- Check uniqueness of generation and simulation run numbers and random seeds:
  - warning for duplicate runs
  - warning for missing or incomplete runs
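The duplicate-run check amounts to collecting the (run number, seed) pairs reported by all sites and flagging any pair seen more than once; a minimal sketch:

```python
from collections import Counter

def duplicate_runs(reported):
    """Return the run/seed pairs reported more than once."""
    return sorted(pair for pair, n in Counter(reported).items() if n > 1)

reported = [(1001, 7), (1002, 8), (1001, 7)]   # illustrative input
dups = duplicate_runs(reported)
```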
34. Data Validation Scripts
- After storage of the data: final validation at the metadata level
- Basically, checks that the warnings given by the summary scripts have been corrected:
  - correct number of events
  - no duplicates
  - closure of DB files (in the COBRA sense: no more data will be written to that DB file)
  - all DB files of a collection are attached to the federation
35. IMPALA
36. IMPALA: Intelligent Monte Carlo Production Local Actuator
- Automated script-generation tool developed by CMS for MC production
- Job splitting: 50 000 events → 100 jobs of 500 events
- Interfaces defined for:
  - parameter handling
  - input-source discovery and enumeration
  - tracking (declared, created, submitted, running, done, problems, logs)
  - job submission
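The job-splitting step is the simplest part of IMPALA and can be sketched directly (a sketch of the idea, not the IMPALA code itself):

```python
def split_request(total_events, events_per_job=500):
    """Split a request into jobs; a shorter last job takes the rest."""
    n_full, rest = divmod(total_events, events_per_job)
    return [events_per_job] * n_full + ([rest] if rest else [])

jobs = split_request(50_000)   # 100 jobs of 500 events, as on the slide
```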
37. IMPALA (screenshots: tracking of production files and of batch files)
38. IMPALA Configuration
- Executable location (DAR file)
- Output data location (boot file for the Objectivity/DB federation, output disk, ...)
- BOSS (or scheduler) installation location
- Local functions (CopyLogFiles, StageIn, StageOut, ...)
39. Data Transfer (Tony's scripts)
- Transfer tool developed by CMS ("Tony's scripts"), used for CERN/Europe
- Many US sites use GDMP (Grid) and globus-url-copy
- A simple HTTP server publishes the list of files:
  - files on disk (find) or on tape (flat list)
- The client searches the list for new files:
  - compares it to the list of files already retrieved, selects by pattern-matching (to select datasets)
- The client asks the server to push n files:
  - the server pushes the files in m parallel streams, using a designated copy agent: scp, bbcp, rfcp
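The client-side selection logic described above can be sketched in a few lines (an illustration of the pull model, not the actual scripts; file names are made up):

```python
import fnmatch

def files_to_pull(published, already_have, pattern, n):
    """Select up to n not-yet-retrieved files matching a dataset
    pattern; the client then asks the server to push these."""
    have = set(already_have)
    return [f for f in published
            if f not in have and fnmatch.fnmatch(f, pattern)][:n]

server_list = ["jets.0001.fz", "jets.0002.fz", "muons.0001.fz"]
picked = files_to_pull(server_list, ["jets.0001.fz"], "jets.*", 10)
```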
40. Spring02 Transferred Data
- To CERN: 3 or 4 exporters in parallel, 7 TB in total
- To FNAL: 5 TB
- Sustained network-to-disk rate higher than sustained disk-to-tape rate
- Transfer rates:
  - Bristol, RAL, IC, IN2P3, INFN, Caltech, FNAL, UFL, Wisconsin → CERN: 200 GB/day (150 GB disk limit); Moscow → CERN: slow
  - Caltech, UFL, Wisconsin, UCSD, Moscow → FNAL: 1 TB/day
  - Bristol → RAL: 1 TB/day
  - INFN → INFN: 300 GB/day
41. Data Storage
- CASTOR (CERN)
- ENSTORE (Fermilab)
- Basic tape system (RAL)
42. Success and Difficulties
- Coordination
- Farm Setup
- Running Jobs
- Data Transfer
- Data Storage and Publication
43. Success and Difficulties: Coordination
- Use of a Central Reference DB, RefDB:
  - uniform format of input parameter files (NEW)
  - storage and indexing of parameter files (NEW)
  - automatic retrieval of the parameters by IMPALA (NEW)
  - tracking of the global CMS production rate (NEW)
  - test-assignments for validation of software installation (NEW)
- Where GRID tools can help us:
  - assignment of requests to RCs is still done by hand
  - need for a CMS-wide resource monitoring system
  - updates of RefDB have to be done by hand; should be automated and incorporated in the job monitoring system
44. Success and Difficulties: Farm Setup
- We have a Production Tool Suite (NEW)
- But a lot to learn the first time:
  - at system level (MySQL, disk-server configuration for pile-up, AMS, lockserver, ...)
  - at the software level (test-assignments to play with)
- Heavy support task: rapidly evolving production software, new releases, bug fixes (but excellent team spirit)
- Different farm configurations: not possible to test the tools for all of them (different job schedulers, MSS or not, distributed or central disks, shared or dedicated CPUs, firewalls or not, data servers on CPU nodes or not, ...)
- Where GRID tools can help us:
  - one-command installation toolkit
45. Success and Difficulties: Running Jobs
- ORCA Digitisation Job Resume System (NEW):
  - highly helpful (~10% of jobs fail; they can now be easily resumed)
- Still need more robustness in the user-analysis part of ORCA
- Invalidation of bad runs to be automated
- Objectivity/DB read-only option (NEW):
  - far fewer locking problems than before
- System problem recovery:
  - cleaning of stale Objectivity/DB locks
  - 2 GB file-size limit to be controlled on Solaris disks (CERN)
  - network failures (no more disk failures)
  - disk space
- Scaling problems in the way we use BOSS
- Where GRID tools can help us:
  - farm monitoring system, with discovery of the crash reason and action for recovery
46. Success and Difficulties: Data Transfer
- We have transfer tools: Tony's scripts and GDMP
- Much more data movement than before: over half the data has traveled on the WAN
- Still problems to be handled by hand:
  - transfers interrupted (time limit)
  - data corruption
  - disk space limitation
  - missing files: datasets are spread over up to 500 files for one collection (typically 100 files), but we must have every file before analysis can start safely
- Where GRID tools can help us:
  - Replica Manager
47. Success and Difficulties: Storage and Publication
- Validation scripts for dataset-integrity checks (NEW):
  - should be part of the data-transfer tool
- Tape failures (RAL)
- Archive failures in CASTOR: rare but difficult to spot
- Stage-in time from CASTOR can be very long for a few files (>1 hour)
- Interaction between CASTOR and (multiple) analyses not well understood; needs studying
48. Success and Difficulties: Summary
- Major improvements in the physics code and in the production machinery with respect to previous years:
  - ORCA Resume System
  - use of RefDB and BOSS made better automation and book-keeping possible
- Our CMS production tools can be improved: more automation
- GRID tools may help to make it even better:
  - tool for installation/configuration of the production tools
  - resource monitoring system
  - Replica Manager
  - anything that can help reduce the manpower needs
- Data access for user analysis has to be improved
- Problems have been addressed by the Production team and the Production Tools Review team
49. More and Faster
- 1999: 1 TB, 1 month, 1 person
- 2000-2001: 27 TB, 12 months, 30 persons
- 2002: 20 TB, 2 months, 30 persons
- 2003: 175 TB, 6 months, <30 persons
50. Coming Data Challenge
- 2004 Data Challenge (DC04):
  - analysis of data corresponding to 25% of the LHC startup luminosity (2×10^33 cm^-2 s^-1), at a data-taking rate of 25 Hz during 1 month: 5×10^7 events
  - i.e. 5% of the LHC final luminosity (10^34 cm^-2 s^-1)
- To validate the software baseline:
  - new LCG persistency framework (POOL, ROOT)
  - new simulation software (OSCAR/Geant4)
  - new GRID tools and resources
- 2003: pre-challenge production of the 5×10^7 events at 2×10^33 cm^-2 s^-1
51. Two Phases
- Pre-Challenge (2003 Q3, Q4) (must be successful):
  - large-scale simulation and digitisation
  - will prepare the samples for the Challenge
  - will prepare the samples for the Physics TDR
  - progressive shakedown of tools and centres
  - all centres taking part in the Challenge should participate in the pre-challenge
  - the Physics TDR and the Challenge depend on its successful completion
- Challenge (2004 Q1, Q2) (may fail, i.e. not be completed on schedule):
  - reconstruction at T0 (CERN)
  - distribution to the T1s
  - subsequent distribution to the T2s
52. Pre-challenge Resource Needs
- Simulation: 100 TB, 5 months, 1000 CPUs
- Digitisation: 75 TB, 2 months, 150 CPUs
- An 800 MHz P3 is 33 SI95
- Working assumption: most farms will be at 50 SI95/CPU in late 2003

Challenge Resource Needs
- Reconstruction: 25 TB, 1 month, 460 CPUs at CERN @ 50 SI95/CPU
- World-wide distributed analysis
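The SI95 ratings make the CPU counts convertible between hardware generations; a sketch of that arithmetic (the 697-box figure below is my own ceiling-division result, not a number from the slides):

```python
SI95_P3_800MHZ = 33       # 800 MHz PIII, from the slide
SI95_PER_CPU_2003 = 50    # working assumption for late-2003 farms

def cpus_needed(total_si95, si95_per_cpu):
    """Whole CPUs needed to deliver a capacity given in SI95."""
    return -(-total_si95 // si95_per_cpu)   # ceiling division

# The 460-CPU reconstruction farm at 50 SI95/CPU is 23 000 SI95;
# delivered on 800 MHz PIIIs it would take ~697 boxes instead.
reco_si95 = 460 * SI95_PER_CPU_2003
on_p3 = cpus_needed(reco_si95, SI95_P3_800MHZ)
```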
53. Summary and Conclusions
- Very successful MC production:
  - 20 TB of data delivered on time to the physicists
  - smooth production over 4 months
  - 20 production sites, 30 persons
- More automation for the next Data Challenge:
  - improvements of our CMS tools
  - expecting help from GRID tools
54. More Information
- GRID/production Workshop (June 2002): http://documents.cern.ch/age?a02826
- The Spring02 DAQ TDR Production, CMS Note CMS-IN 2002/034
- CMS MC Production web page (RefDB, BOSS, IMPALA, DAR): http://cmsdoc.cern.ch/cms/production/www/html/general/index.html
55. Acknowledgements
- Thanks to the CERN-IT division for the invitation to give this talk
- Thanks to David Stickland and Tony Wildish for letting me present it
- Thanks to the whole CMS Production Team for achieving these nice results, and to everyone on the CERN CASTOR, Tape, Objectivity, LSF, AFS and CMS support lists!