Title: A Uniform and Coherent Approach to Object Persistency
1. A Uniform and Coherent Approach to Object Persistency
2. HEP Data
- Environmental data
- Detector and Accelerator status
- Calibrations, Alignments
- Event-Collection Data
  - (luminosity, selection criteria, ...)
- Event Data, User Data
Navigation is essential for an effective physics analysis. Complexity requires coherent access mechanisms.
3. [Diagram with annotations: "Later selected DAQ", "Not in original design", "Later more filters to DVNs and Ntuple"]
4. CMS Experiment-Data Analysis
[Data-flow diagram: Detector Control, Online Monitoring, Quasi-online Reconstruction, the Event Filter Object Formatter, Simulation (G3 or G4), Data Quality, Calibrations, Group Analysis and on-demand User Analysis all store data (environmental data, raw data, rec-Obj and calibrations) into, and request parts of events from, the Persistent Object Store Manager, which sits on top of the Object Database Management System.]
5. Uniform approach
- Coherent data access model
  - same mechanisms, same language, same transaction model
- Save effort
  - A single team of experts
  - A single team of administrators
- Leverage experience
  - developers can easily move from one application to another (from event-data to calibration-data applications)
- Reuse design and code
  - Basic requirements are often the same
  - We can use the same code to manage event data, calibrations, n-tuples
- Main road to producing better and higher quality software
6. Reconstruction Sources
7. CMS Reconstruction Model
[Diagram relating Geometry, Conditions, Sim Hits and Raw Data to the Detector Element, the Event, Digis, Rec Hits and the reconstruction Algorithm.]
9. Raw Event
RawData are identified by the corresponding ReadOut. RawData belonging to different detectors are clustered into different containers. The granularity will be adjusted to optimize I/O performance. An index at RawEvent level is used to avoid accessing all containers when searching for a given RawData. A range index at RawData level could be used for fast random access in complex detectors.
[Diagram: a RawEvent holds ReadOut objects, each with its RawData; the index is implemented as an ordered vector of pairs.]
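A minimal sketch of such an index, assuming hypothetical ReadOutId and RawDataContainer types (the real CARF/CMS classes differ): an ordered vector of (id, container) pairs searched with binary search, so only the container holding the requested RawData is touched.

    #include <algorithm>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Hypothetical types, for illustration only; the real CARF/CMS classes differ.
    using ReadOutId = std::uint32_t;
    struct RawDataContainer { /* would hold the RawData of one detector unit */ };

    // RawEvent-level index sketched as an ordered vector of (id, container)
    // pairs, searched with binary search so that only the container holding
    // the requested RawData needs to be opened.
    class RawEventIndex {
    public:
      void add(ReadOutId id, RawDataContainer* c) { entries_.emplace_back(id, c); }

      // Keep the vector ordered by ReadOutId (call once after filling).
      void finalize() {
        std::sort(entries_.begin(), entries_.end(),
                  [](const auto& a, const auto& b) { return a.first < b.first; });
      }

      // O(log N) lookup; returns nullptr if no RawData exists for this ReadOut.
      RawDataContainer* find(ReadOutId id) const {
        auto it = std::lower_bound(
            entries_.begin(), entries_.end(), id,
            [](const std::pair<ReadOutId, RawDataContainer*>& e, ReadOutId v) {
              return e.first < v;
            });
        return (it != entries_.end() && it->first == id) ? it->second : nullptr;
      }

    private:
      std::vector<std::pair<ReadOutId, RawDataContainer*>> entries_;
    };

    int main() {
      RawDataContainer ecal, hcal;
      RawEventIndex index;
      index.add(42, &ecal);
      index.add(7, &hcal);
      index.finalize();
      return index.find(42) == &ecal ? 0 : 1;
    }

A range index at RawData level would follow the same pattern, storing (first, last) ranges instead of single ids.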
10. Reconstruction Object Model
All persistent objects are managed by CARF. Physics Modules access them through standard C++ pointers.
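A hedged illustration of that access pattern, with invented class and method names (RecEvent, Track, Vertex and their accessors are assumptions, not the real CARF interfaces): a physics module manipulates only ordinary C++ pointers and references handed to it by the framework, which is assumed to resolve them to the persistent objects behind the scenes.

    #include <vector>

    // Hypothetical classes, for illustration only; the real CARF/CMS interfaces differ.
    struct Vertex {
      double zPos;
      double z() const { return zPos; }
    };

    struct Track {
      double ptVal;
      const Vertex* vtx;  // plain C++ pointer to another (persistent) object
      double pt() const { return ptVal; }
      const Vertex* vertex() const { return vtx; }
    };

    struct RecEvent {
      std::vector<const Track*> trackList;
      const std::vector<const Track*>& tracks() const { return trackList; }
    };

    // A physics module sees only standard C++ pointers and references; it never
    // opens the database or includes persistency headers. In the real system the
    // framework (CARF) is assumed to resolve such pointers to persistent objects.
    double leadingTrackVertexZ(const RecEvent* event) {
      const Track* leading = nullptr;
      for (const Track* t : event->tracks()) {
        if (!leading || t->pt() > leading->pt()) leading = t;
      }
      return (leading && leading->vertex()) ? leading->vertex()->z() : 0.0;
    }

    int main() {
      Vertex v{1.5};
      Track t1{12.0, &v}, t2{30.0, &v};
      RecEvent ev{{&t1, &t2}};
      return leadingTrackVertexZ(&ev) == 1.5 ? 0 : 1;
    }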
11. CMS Reconstructed Objects
Reconstructed Objects produced by a given algorithm are managed by a Reconstructor.
A Reconstructed Object (Track) is split into several independent persistent objects to allow their clustering according to their access patterns (physics analysis, reconstruction, detailed detector studies, etc.). The top-level object acts as a proxy. Intermediate reconstructed objects (RHits) are cached by value into the final objects.
[Diagram: the S-Track Reconstructor, attached to the RecEvent, manages the S Track proxies; each S Track refers to its Track SecInfo (esd), its Track Constituents and Vector of RHits (aod), and its rec-level objects.]
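A rough sketch of this splitting, with invented names (TrackProxy, TrackConstituents, TrackSecInfo) and plain pointers standing in for the persistent references the real store would use: the proxy is the small object most analyses touch, the bulkier pieces are separate objects that can be clustered by access pattern, and the RHits are copied by value into the constituents.

    #include <vector>

    // Hypothetical classes, for illustration only; the real CMS classes differ.
    struct RHit {                 // intermediate reconstructed object
      double x, y, z;
    };

    struct TrackConstituents {    // clustered for reconstruction-level access
      std::vector<RHit> hits;     // RHits cached by value into the final object
    };

    struct TrackSecInfo {         // secondary information, clustered separately
      double chi2;
      int nDof;
    };

    // Top-level object acting as a proxy for the full reconstructed Track.
    // In the real store the two members below would be persistent references,
    // so each piece can be clustered according to its access pattern; plain
    // pointers are used here only to sketch the idea.
    struct TrackProxy {
      double pt, eta, phi;        // small summary, enough for most analyses
      const TrackSecInfo* secInfo;
      const TrackConstituents* constituents;
    };

    int main() {
      TrackConstituents c{{{0.1, 0.2, 0.3}, {1.1, 1.2, 1.3}}};
      TrackSecInfo s{12.7, 10};
      TrackProxy track{25.0, 0.8, 1.57, &s, &c};
      // Analysis touching only the proxy never forces the other pieces to be read.
      return track.pt > 10.0 ? 0 : 1;
    }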
12. CARF2000 Event Structure
13. CMS Event Structure
[Diagram: persistent Event Collections and a transient Run view; the RecEvents in each collection refer back to the same RawEvent.]
In case of re-reconstruction the original structure is kept: event objects are cloned and new collections are created.
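An illustrative sketch of that cloning, again with invented names (RawEvent, RecEvent, EventCollection rather than the real CARF classes): re-reconstruction leaves the original collection untouched, clones the event objects into a new collection, and both versions keep referring to the same RawEvent.

    #include <memory>
    #include <vector>

    // Hypothetical classes, for illustration only.
    struct RawEvent { /* raw data containers */ };

    struct RecEvent {
      const RawEvent* raw;        // shared, never duplicated
      int reconstructionPass;
    };

    struct EventCollection {
      std::vector<std::shared_ptr<RecEvent>> events;
    };

    // Re-reconstruction: the original collection is kept as is; each event
    // object is cloned with a new pass number into a new collection that
    // still refers to the same RawEvent.
    EventCollection reReconstruct(const EventCollection& original, int newPass) {
      EventCollection updated;
      for (const auto& ev : original.events) {
        updated.events.push_back(
            std::make_shared<RecEvent>(RecEvent{ev->raw, newPass}));
      }
      return updated;
    }

    int main() {
      RawEvent raw;
      EventCollection pass1;
      pass1.events.push_back(std::make_shared<RecEvent>(RecEvent{&raw, 1}));
      EventCollection pass2 = reReconstruct(pass1, 2);
      // Both passes coexist and share the same RawEvent.
      return (pass1.events.size() == 1 && pass2.events[0]->raw == &raw) ? 0 : 1;
    }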
14. Physical clustering
15. CMS needs a real DBMS
- An experiment lasting 20 years cannot rely just on ASCII files and file systems for its production bookkeeping, condition database, etc.
- Even today at LEP, the management of all real and simulated data-sets (from raw data to n-tuples) is a major enterprise
  - Multiple models used (DST, N-tuple, HEPDB, FATMAN, ASCII)
- A DBMS is the modern answer to such a problem
- An ODBMS provides a coherent and scalable solution for managing all kinds of data
  - seamless integration with OO languages
  - internal navigation capability
16. CMS Experience
- CMS has used Objectivity/DB for the current prototype activity, in close contact with IT in the context of the RD45 project
- Database Developers (just OO and C++)
  - Designing and implementing persistent classes is not harder than for native C++ classes
- Physics Software Developers (do not see Objectivity)
  - Persistent objects are accessed using standard C++
  - The same code can access either persistent or transient objects (see the sketch after this list)
- Framework (easy to manage the DB)
  - Flexible and transparent distinction between logical associations and physical clustering
  - Fully transparent I/O, with performance essentially limited by disk speed (random access)
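A minimal sketch of what "the same code for persistent or transient objects" means in practice, using a hypothetical Track type rather than the real CMS classes: the analysis function only sees plain C++ objects and references, so it compiles and runs unchanged whether the collection was read back from the object store by the framework or filled transiently in memory.

    #include <vector>

    // Hypothetical reconstructed object; the real CMS classes differ.
    struct Track {
      double pt;
      double eta;
    };

    // Analysis code written against plain C++ objects and references.
    // It compiles and runs unchanged whether 'tracks' was read back from
    // the persistent store by the framework or filled transiently in memory.
    template <typename TrackCollection>
    int countHighPtTracks(const TrackCollection& tracks, double ptCut) {
      int n = 0;
      for (const Track& t : tracks) {
        if (t.pt > ptCut) ++n;
      }
      return n;
    }

    int main() {
      // Transient case: objects built in memory.
      std::vector<Track> transientTracks{{12.5, 0.3}, {3.2, -1.1}, {45.0, 2.0}};
      return countHighPtTracks(transientTracks, 10.0) == 2 ? 0 : 1;
    }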
17. CMS Experience
- Administration (essentially file management)
  - Very flexible file-level management (localization, archival, replication) using AMS features
  - Several tools available to monitor activities and performance
  - File size overhead (5% for realistic CMS object sizes) not larger than for other products
- Physicists (easy to use)
  - Personal Databases are invaluable and in common use
  - Analysis performance and flexibility improved by shallow (link) or deep (data) local copies of selected event samples (see the sketch after this list)
    - use the same type of event-catalog as production
  - Framework and CMS tools hide all details
- All our tests show that Objectivity/DB can satisfy CMS requirements in terms of performance, scalability and flexibility for all kinds of data
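A sketch of the shallow versus deep local-copy idea, with invented Event and sample types (not the actual CMS tools): a shallow copy keeps only links to events that remain in the central store, while a deep copy duplicates the selected event data into a personal database.

    #include <memory>
    #include <vector>

    // Hypothetical classes, for illustration only.
    struct Event { int id; /* event data */ };

    // Shallow copy: a personal collection of links to events that stay in the
    // central store; cheap to make, but needs access to the central data.
    struct ShallowSample {
      std::vector<std::shared_ptr<const Event>> links;
    };

    // Deep copy: the selected event data are copied into a personal database,
    // so the sample can be analysed locally without the central store.
    struct DeepSample {
      std::vector<Event> events;
    };

    ShallowSample selectShallow(const std::vector<std::shared_ptr<const Event>>& store,
                                bool (*pass)(const Event&)) {
      ShallowSample s;
      for (const auto& e : store)
        if (pass(*e)) s.links.push_back(e);    // copy only the reference
      return s;
    }

    DeepSample selectDeep(const std::vector<std::shared_ptr<const Event>>& store,
                          bool (*pass)(const Event&)) {
      DeepSample s;
      for (const auto& e : store)
        if (pass(*e)) s.events.push_back(*e);  // copy the event data itself
      return s;
    }

    int main() {
      std::vector<std::shared_ptr<const Event>> store{
          std::make_shared<const Event>(Event{1}),
          std::make_shared<const Event>(Event{2})};
      auto even = [](const Event& e) { return e.id % 2 == 0; };
      // '+even' converts the capture-less lambda to a plain function pointer.
      return selectShallow(store, +even).links.size() ==
                     selectDeep(store, +even).events.size()
                 ? 0 : 1;
    }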
18. Alternatives: other ODBMS
- Versant is a viable commercial alternative to Objectivity
  - do we have time to build an effective partnership (e.g. MSS interface)?
- Espresso (by IT/DB) should be able to produce a fully fledged ODBMS in a couple of years, once the proof-of-concept prototype is ready
- Migrating CARF from Objectivity to another ODBMS
  - We expect that it would take about one year
  - Will not affect the basic principles of the CMS software architecture and data model
  - Will involve only the core CARF development team
  - Will not disrupt production and physics analysis
19. Alternatives: ORDBMS
- ORDBMS (relational DBs with an OO interface) are appearing on the market
  - Up to now they have looked targeted at those who already have a relational system and wish to make a transition to OO
- A new ORACLE product has all the appearances of a fully fledged ODBMS
  - IT/DB is in the process of evaluating this new product as an event store
  - If it looks promising, CMS will join this evaluation next year
- We will consider the impact of ORDBMS on the CMS Data Model and on the migration effort before the end of 2001
20. Fallback Solution: Hybrid Models
- We believe that this solution could seriously compromise our ability to carry out our physics program competitively
- (R)DBMS for Event Catalog, Calibration, etc.
- Object-Stream files for event data
- Ad-hoc networked data-server and MSS interface
- Less flexible
  - Rigid split between DBMS and event data
  - One-way navigation from DBMS to event data
- More complex
  - Two different I/O systems
  - More effort to learn
  - More resources for developing and maintaining our application software
- This approach will be used by several experiments at BNL and FermiLab (RDBMS not directly accessible from user applications)
  - CMS is following these experiences closely.
21. Conclusion
- CMS has chosen to follow a uniform and coherent approach for the development of its Experiment-Data Analysis Software
- Today a Functional Prototype exists and includes
  - A modular Object Oriented Framework
  - A Service and Utility Toolkit
  - A Persistent Object Service based on Objectivity/DB
  - Specialized applications for DAQ, Simulation, Reconstruction and Visualization
  - A set of plug-in modules for detector and physics simulation, reconstruction and analysis
- CMS is currently reviewing the present architecture, the software design and the technical choices to prepare for the next software development cycle