Title: An ODBMS approach to persistency in CMS
1. An ODBMS approach to persistency in CMS
- Lucia Silvestris
- INFN Bari - CERN/EP
- CHEP, 7-11 February 2000
- Padova, Italy
2. CMS - Software Components

[Dataflow diagram: the CMS Detector (Muon, Tracker, Calo) feeds the Filter Unit/Event Filter and the Objectivity Formatter; Slow Control and Online Monitoring supply environment data and request asynchronous data. The Persistent Object Store Manager (Object Database Management System) stores raw and reconstructed objects, plus calibrations, and serves parts of events on request to Quasi-online Reconstruction, Data Quality, Calibrations Group, Physics Analysis, Simulation (G3 and/or G4), and user analysis on demand.]
3. CARF Components
- CARF Architecture: on-demand reconstruction
  - (see V. Innocente's talk on the CARF Architecture - session A)
- Framework Main Services
  - Define the events to be dispatched (events and geometry from simulations or test beams)
  - Manage the not-yet-removed sequential components (coming from Geant3)
  - Run-time dynamic loading is used to configure and build CARF applications
- Framework Persistency Services
- Framework Ancillary Services
  - User interface, error reporting, logging facilities, ...
  - Timing facility, utility library
4. CMS Persistency History
- Prototype 1997-98
  - Test-beam DAQ and analysis using Objectivity/DB in different CMS test-beam areas (H2, T9 and X5b)
  - The system was successfully tested
- Production 1999
  - Test-beam DAQ (from April 99)
  - Monte Carlo (GEANT3) reconstruction (from October 99)
  - Persistent digis for Calorimeter, Muon and Trigger
  - Physics generator information (vertices, tracks) made persistent
  - (see D. Stickland's talk on ORCA - session A)
5. Persistent Service for High Energy Physics Data
- Environmental data
  - Detector and accelerator status
  - Calibrations, alignments
- Event-collection meta-data
  - (luminosity, selection criteria, ...)
- Event data, user data
6. Does a user need a DBMS?
- Do I encode meta-data (run number, version id) in file names?
- How many files and logbooks must I consult to determine the luminosity corresponding to a histogram?
- How easily can I determine whether two events have been reconstructed with the same version of a program and using the same calibrations?
- How many lines of code must I write, and what fraction of the data must I read, to select all events with two muons with pT > 11.5 GeV and |η| < 2.7?
- And the same at generator level?

If the answers scare you, you need a DBMS!
7. Can CMS do without a DBMS?
- An experiment lasting 20 years cannot rely on just ASCII files and file systems for its production bookkeeping, condition database, etc.
- Even today at LEP, the management of all real and simulated data sets (from raw data to n-tuples) is a major enterprise.

A DBMS is the modern answer to such a problem and, given the choice of OO technology for the CMS software, an ODBMS (or a DBMS with an OO interface) is the natural solution.
8. A BLOB Model

[Diagram: Event objects in the database point to a RawEvent and a RecEvent, each stored as a BLOB.]

A BLOB is a sequence of bytes; decoding it is a user responsibility.

Why should BLOBs not be stored in the DBMS?
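The BLOB model above can be illustrated with a minimal C++ sketch: the database stores only an opaque byte sequence, and any structure must be recovered by user code that knows the encoding. All names here (`EventBlob`, `readWord`) are illustrative, not CMS code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch of the BLOB model: the DBMS sees only raw bytes.
struct EventBlob {
    std::vector<uint8_t> bytes;  // what the database actually stores
};

// Decoding is the user's responsibility: the user must know
// the offset and layout to get anything meaningful back out.
uint32_t readWord(const EventBlob& blob, std::size_t offset) {
    uint32_t w = 0;
    std::memcpy(&w, blob.bytes.data() + offset, sizeof(w));
    return w;
}
```

Because the DBMS cannot see inside the byte sequence, it can offer no queries, no schema evolution and no selective reads on BLOB contents, which is the point of the question above.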
9. CMS Raw Event

RawData are identified by the corresponding ReadOutUnit.

[Diagram: a RawEvent contains ReadOutUnits; each ReadOutUnit points to its RawData, stored as a vector of Digis.]

The ReadOutUnit object can identify a complete detector or a detector component.

In the test beam, the vector of Digis contains the ADC or TDC values.
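The containment described on this slide can be sketched in C++ as a RawEvent owning ReadOutUnits, each holding its RawData as a vector of Digis. The class and member names are illustrative only; the actual CMS persistent classes differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One Digi: in the test beam this is an ADC or TDC value.
struct Digi {
    uint16_t value;
};

// A ReadOutUnit identifies a complete detector or a detector
// component, and owns the RawData read out from it.
struct ReadOutUnit {
    std::string detector;       // e.g. "Tracker", "Muon" (illustrative)
    std::vector<Digi> digis;    // the RawData payload
};

// The RawEvent is a collection of ReadOutUnits.
struct RawEvent {
    std::vector<ReadOutUnit> units;
};
```

Unlike the BLOB model, this structure is visible to the ODBMS, so individual ReadOutUnits can be clustered and fetched independently.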
10. Persistent Object Management
- Persistent object management is a major responsibility of the CMS Analysis and Reconstruction Framework (CARF)
- CARF manages
  - multi-threaded transactions
  - creation of databases and containers
  - meta-data and event collections
  - physical clustering of event objects
  - the persistent event structure and its relations with the transient one
- Use of the database is transparent to detector developers
  - users access persistent objects through C++ pointers
  - CARF takes care of memory pinning
11. CMS Event Structure

The Run object contains the event-collection conditions, such as beam energy, particle type, magnetic field, etc.

[Diagram: a transient Run object points to persistent Event Collections; each Event in a collection points to its RawEvent and RecEvents.]

In case of re-reconstruction the original structure is kept: Event objects are cloned and new collections are created.

The event header object contains the event number, the spill number, and the event number within the spill.
12. CMS Reconstructed Objects

Reconstructed objects produced by a given algorithm are managed by a Reconstructor.

A reconstructed object (e.g. a Track) is split into several independent persistent objects to allow their clustering according to their access patterns (physics analysis, reconstruction, detailed detector studies, etc.). The top-level object acts as a proxy. Intermediate reconstructed objects (RHits) are cached by value into the final objects.

[Diagram: the S-Track Reconstructor writes, into the RecEvent, the S-Track proxy (aod), the Track Constituents with the vector of RHits (rec), and the Track SecInfo (esd).]
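The split-object pattern above can be sketched in C++: the track seen by physics analysis is a small proxy, while constituents with different access patterns live in separately clustered objects reached through references. All names are illustrative placeholders, not the CMS classes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Intermediate reconstructed object, cached by value into
// the constituents of the final track.
struct RHit {
    double position;
};

// Clustered for reconstruction-level access ("rec" tier on the slide).
struct TrackConstituents {
    std::vector<RHit> hits;
};

// Clustered for detailed detector studies ("esd" tier on the slide).
struct TrackSecInfo {
    double chi2;
};

// The top-level proxy seen by physics analysis ("aod" tier):
// small summary data by value, heavier pieces reached on demand.
struct STrack {
    double pt;                                        // summary quantity
    std::shared_ptr<TrackConstituents> constituents;  // loaded on demand
    std::shared_ptr<TrackSecInfo> secInfo;            // loaded on demand
};
```

The design choice is that a physics-analysis job touching only `pt` never has to read the hit-level objects from the database.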
13. Test Beam Production in 1999
- Detector performance studies have been the real users of the test-beams project
- From April 99 to October 99 the test-beam software was in production for the Tracker and the Muon detectors, reading data from VME/FastBus modules and filling one federated database for each beam line (H2b, X5b, T9) and for each data-taking period
- Some system databases:
  - Beam configuration: Read-Out Unit list
  - LogBook: logbook information for each run
  - ListRuns: run list
  - Run databases: event collections with the same data-taking conditions
- The DAQ-system Objectivity formatter ran on Solaris
- More than 800 GB of data stored in Objectivity/DB
- Ran without major problems
14. Test Beam Production in 1999

[Diagram: the online production federation (Prod Boot, Prod FD, with BConfDB, RunDB, LogDB and run databases Run1..RunN) is cloned to the offline machine cmsc01 (Prod Boot, Clone FD, with the same databases).]
15. Test Beam Data Analysis
- Online (prompt-data) monitoring
  - runs on the online machine
  - gives fast feedback on detector performance
- Offline analysis
  - runs locally on the data server or remotely through the AMS server
  - During the August Tracker (X5b) test beam, up to 25 concurrent users were accessing data on the offline system without any observable degradation

During 1999: the test-beam analysis package read the persistent data and produced HBook histograms and n-tuples.
During 2000: it moves from HBook histograms and n-tuples to HTL and Tags (see I. Gaponenko's talk on IGUANA - session F).
16. Tracker Silicon Detector Performance Studies
- Muon beams, 50 GeV
- Non-irradiated silicon detector
- APV6 chip in deconvolution mode
- FED VME modules
- Detector characteristics:
  - active area 62.5 mm x 61.5 mm
  - thickness 300 μm
  - high resistivity
  - strip pitch 61 μm
  - strip width 14 μm
  - 1024 implanted strips
- Cluster signal S_cl = 31.8, cluster noise N_cl = 2.9
- S_cl/N_cl = 10.9
17. Muon Drift Tube Detector Performance Studies
- DTBX format:
  - bits (0-15): drift time (in 1.04 ns units), 0-65535
  - bit (16): signal edge, 1 = falling
  - bits (17-22): cell number, 1..63
  - bits (23-25): layer number, 1..4
  - bits (26-27): superlayer number, 1..3
[Plots: beam profile vs. cell number, and drift-time distribution in ns.]
18. Muon Trigger (BTI) Test Beam Analysis
- The muon test-beam analysis is fully integrated with the muon and first-level-trigger reconstruction.
- For the Bunch and Track Identifier (BTI), a comparison between real data and simulation is performed.
- (see C. Grandi's talk on the CMS Muon Trigger - session B)
19. High Level Trigger Production with ORCA in 1999

[Diagram: database population chain of the 1999 ORCA production.]
20. ORCA High Level Trigger 2000 Production
- The first ORCA production in October 99 was very successful (>700 GB in Objectivity/DB), but the ORCA 2000 production must have much more functionality
- All data will be in the database
  - Every CMSIM run will have its objects in many database files
  - A single Db file contains the concatenation from many CMSIM runs (64k-file Objectivity limit)
  - Many layers of apparently autonomous federations, actually synchronized by enforcing a common schema and unique DbIDs
21. High Level Trigger Processing 2000

[Diagram: G3 Hits and Tracks (FZ/user input) for Minimum Bias, JetMet, Muon, ... feed ORCA Xings/Digis and then ORCA RecObjs; each box is an independent production running in parallel.]
22. Selective Tracker Digitization

[Diagram: Trigger, Calorimetry, Muon and Tracker data are available for each event; Trigger, Calorimetry and Muon are digitized first, a selection is applied, and Tracker digitization is performed only for the selected events.]
23. ORCA HLT Production 2000

[Diagram: CMSIM MC production writes Zebra files with HITS and HEPEVT n-tuples for signal and minimum-bias (MB) events; the ORCA ooHit formatter loads them into an Objectivity database (with catalog import); ORCA digitization merges signal and minimum bias into another Objectivity database; the HLT algorithms produce new reconstructed objects stored in the HLT group databases, which are mirrored (US, Russia, Italy, ...).]
24. ORCA 2000 Db Structure

One CMSIM job is oo-formatted into multiple Dbs. For example, one FZ file yields an MC Info container (few kB/ev) and ooHit Dbs for Calo/Muon hits (100 kB/ev) and Tracker hits (200 kB/ev), about 300 kB/ev in total.

Multiple sets of ooHits are concatenated into a single Db file (2 GB/file). For example, the MC Info from N runs (Run1, Run2, Run3, ...) is concatenated into one file.

The physical and logical Db structures diverge...
25. Conclusions
- Persistent object management is a major responsibility of the CMS Analysis and Reconstruction Framework
- A DBMS is required to manage the large data sets of CMS (including user data)
- An ODBMS is the natural choice if OO is used in all software
- Once an ODBMS is used to manage the experiment data, it is very natural to use it to manage any kind of data related to detector studies and physics analysis
- Objectivity/DB has been evaluated in different prototypes which successfully stored and retrieved data (test-beam, simulated, reconstructed, and statistical, i.e. histograms)

Since 1999 we have been in production with Objectivity/DB, both for the test beams and for High Level Trigger studies.