Title: Navigation Requirements
1 Navigation Requirements
CMS View of OIDs and Refs
- Vincenzo Innocente
- Lassi Tuura
- CMS
2 Driving Requirements
- Ability to locate a persistent object even if relocated or multiply located
  - Scattering: write in one file, relocate to many
  - Gathering: collect interesting objects in one file
- Solution should possibly address all kinds of data
  - Event Data
  - Calibrations, Conditions, Geometry
  - MetaData themselves
3 Scenarios
- Production (simplify it)
  - Write all data from a single process in a single file
  - Later, split the data according to a clustering strategy
- Access (transfer as little as possible)
  - Select and reprocess events from a large sample
  - Get a local persistent copy of just what is needed
  - Ensure full event navigability at every stage of analysis
- Typical query
  - Give me the closest (actually, the one reachable in the fastest way) collection of tracks compatible with this configuration, belonging to the events satisfying these criteria
4 Event Model
- (Event) Data Product (100-1000 per event)
  - Chunk of (event) data managed as a single unit
  - Collection of Digis belonging to a part of a detector
  - Collection of RecObj (track, calo-clusters, jets) produced by a given algorithm
  - Currently identified by (its objy OID and federation)
  - Event
  - ASCII string
  - Metadata describing how it was produced
  - Its transient type
  - Physical location
- Inter Data Product dependency tracked
- Consistency ensured
5 Production 2002, Complexity
Number of Regional Centers: 11
Number of Computing Centers: 21
Number of CPUs: 1000
Largest Local Center: 176 CPUs
Number of Production Passes for each Dataset (including analysis group processing done by production): 6-8
Number of Files: 11,000
Data Size (not including fz files from Simulation): 17 TB
File Transfer by GDMP and by Perl scripts over scp/bbcp: 7 TB toward T1, 4 TB toward T2
6 HEP Data
- Event-Collection Meta-Data
- Environmental data
- Detector and Accelerator status
- Calibrations, Alignments
- (luminosity, selection criteria, ...)
- ...
- Event Data, User Data
Navigation is essential for an effective physics analysis.
Complexity requires coherent access mechanisms.
7 Re-Reconstruction Clones
[Diagram; labels: Production, User, Run and Config, Id-2, Tracker, Ecal, Hcal, Local Replica]
8 CMS Reconstructed Objects
Reconstructed Objects produced by a given algorithm are managed by a Reconstructor.
A Reconstructed Object (Track) is split into several independent persistent objects to allow their clustering according to their access patterns (physics analysis, reconstruction, detailed detector studies, etc.).
The top-level object acts as a proxy.
Intermediate reconstructed objects (RHits) are cached by value in the final objects.
It is possible to recalibrate the aod (and generate a new version without modifying or copying the esd and rec).
[Diagram; labels: RecEvent, S-Track Reconstructor, S Track, Track SecInfo, Track Constituents, Vector of RHits, aod, esd, rec, calibration dependent, CPU intensive]
9 Raw Event
RawData are identified by the corresponding ReadOut.
RawData belonging to different detectors are clustered into different containers.
The granularity will be adjusted to optimize I/O performance.
An index at RawEvent level is used to avoid accessing all containers in search of a given RawData.
A range index at RawData level could be used for fast random access in complex detectors.
[Diagram; labels: RawEvent, ReadOut, RawData]
Index implemented as an ordered vector of pairs
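A minimal sketch of such an index, assuming each pair holds a ReadOut identifier and the name of the container that stores its RawData. The class name `RawEventIndex`, the integer identifiers and the container names are hypothetical and chosen only for illustration; the slides do not specify them. The pairs are kept sorted by identifier so that a lookup is a binary search rather than a scan of all containers.

```python
from bisect import bisect_left


class RawEventIndex:
    """Ordered vector of (readout_id, container) pairs, searched by bisection."""

    def __init__(self, pairs):
        # Keep the pairs sorted by ReadOut identifier (hypothetical integer ids).
        self._pairs = sorted(pairs)
        # Parallel list of keys so bisect can compare plain identifiers.
        self._keys = [readout_id for readout_id, _ in self._pairs]

    def find(self, readout_id):
        """Return the container holding the RawData, or None if absent."""
        i = bisect_left(self._keys, readout_id)
        if i < len(self._keys) and self._keys[i] == readout_id:
            return self._pairs[i][1]
        return None


# Hypothetical ReadOut ids and container names.
index = RawEventIndex([(30, "hcal/0"), (10, "tracker/0"), (20, "ecal/0")])
found = index.find(20)     # container for ReadOut 20
missing = index.find(25)   # no such ReadOut in this event
```

A sorted vector is compact and cheap to store with the event; the cost is that insertion is linear, which is acceptable if the index is written once per event.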
10 An OID proposal
- We propose to use an object identifier composed of three fields
- Navigation-Scope (Sea??) identifier
  - Always implicit; explicit use limited to cross-references among disjoint stores (for instance, event toward calibration)
  - Nothing prevents using a context for a dataset or even an event
  - Concrete implementation of the Sea is a file catalog
- Data Product Id (dp-id)
  - Unique and immutable identifier (in a given sea) of a data product
  - To simplify lookup in case of scattering-gathering, we suggest it include a field identifying the logical-file (lf-id)
    - In writing, one can easily stream all logical-files into the same physical file
    - For a given sea, a physical file can map to multiple logical-files
    - In small seas (lakes), such as a local replica of selected events, even an m-to-n mapping could be affordable
- Object index
  - Used to identify single objects in the data product
  - If the Data Product is WORM (write once, read many), indexing will work whatever data structure is used below a Data Product
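The three-field identifier can be sketched as a small immutable value type. This is only an illustration under assumptions: the names `Oid`, `DataProductId`, `local_id` and `in_sea`, the integer field types, and the use of a string for the sea are all hypothetical; the proposal does not fix field widths or encodings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataProductId:
    """dp-id: unique and immutable within a given sea."""
    lf_id: int     # field identifying the logical-file
    local_id: int  # rest of the dp-id (hypothetical name)


@dataclass(frozen=True)
class Oid:
    """Object identifier: navigation scope, data product id, object index."""
    dp_id: DataProductId
    object_index: int          # identifies a single object in the data product
    sea: Optional[str] = None  # None means the scope is implicit

    def in_sea(self, current_sea: str) -> str:
        """Scope used for navigation: explicit if given, else the current one."""
        return self.sea if self.sea is not None else current_sea


# Reference inside the same store: the sea stays implicit.
track = Oid(DataProductId(lf_id=7, local_id=42), object_index=3)
# Cross-reference to a disjoint store (event toward calibration): explicit sea.
calib = Oid(DataProductId(lf_id=1, local_id=5), object_index=0, sea="calibration")
```

Making the types frozen mirrors the requirement that a dp-id is immutable, and lets identifiers be used as dictionary keys in lookup tables.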
11 Data Product id resolution
- A possible implementation
  - Sea is responsible for mapping a lf-id to a given strategy to resolve a data-product-id
  - The same dp-id can be resolved differently depending on which sea we are navigating in
- Physical resolution strategy
  - Lf-id identifies a file; the rest of the dp-id is a physical location (objy-like)
- Local mapping
  - Lf-id identifies a file (not necessarily the original one); the rest of the dp-id is used to look up in a table contained in the file itself
- Global mapping
  - Lf-id identifies a table; the rest of the dp-id is used to look up in the table the physical location of the data-product
- ...
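The three strategies can be sketched as interchangeable resolver functions selected per lf-id by the sea. Everything named here is an assumption made for illustration: the `Sea` class, the functions `physical`, `local_mapping` and `global_mapping`, the file names, and the representation of a physical location as a (file, slot) pair. In-memory dictionaries stand in for the tables that would live in a file or in a catalog.

```python
def physical(file_name):
    """Physical strategy: the rest of the dp-id is itself the location."""
    def resolve(rest):
        return (file_name, rest)
    return resolve


def local_mapping(file_name, table_in_file):
    """Local mapping: look the rest of the dp-id up in a table stored in the file."""
    def resolve(rest):
        return (file_name, table_in_file[rest])
    return resolve


def global_mapping(table):
    """Global mapping: the lf-id names a table that gives the full location."""
    def resolve(rest):
        return table[rest]
    return resolve


class Sea:
    """Maps each lf-id to the strategy used to resolve a dp-id."""

    def __init__(self, strategies):
        self._strategies = dict(strategies)

    def resolve(self, dp_id):
        lf_id, rest = dp_id
        return self._strategies[lf_id](rest)


# Hypothetical production sea: lf-id 7 resolved physically, lf-id 8 globally.
production = Sea({
    7: physical("prod_007.db"),
    8: global_mapping({1: ("prod_012.db", 900)}),
})
# Hypothetical lake (local replica): the same lf-id 7 uses a table in the replica.
lake = Sea({7: local_mapping("replica.db", {42: 3})})

dp_id = (7, 42)
in_production = production.resolve(dp_id)
in_lake = lake.resolve(dp_id)
```

The same dp-id `(7, 42)` yields a different physical location in each sea, which is the property that lets objects be scattered or gathered without rewriting the references that point to them.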