Navigation Requirements

About This Presentation

Slides: 12

Provided by: ygap

Transcript and Presenter's Notes

1
Navigation Requirements
CMS View of OIDs and Refs

Vincenzo Innocente
Lassi Tuura
CMS

2
Driving Requirements

Ability to locate a persistent object even if relocated or multiply located
Scattering: write in one file, relocate to many
Gathering: collect interesting objects in one file
The solution should, if possible, address all kinds of data:
Event Data
Calibrations, Conditions, Geometry
The MetaData themselves

3
Scenarios

Production (simplify it)
Write all data from a single process in a single file
Later, split the data according to a clustering strategy
Access (transfer as little as possible)
Select and reprocess events from a large sample
Get a local persistent copy of just what is needed
Ensure full event navigability at every stage of analysis
Typical query
"Give me the closest (actually, the fastest to reach) collection of tracks compatible with this configuration, belonging to the events satisfying these criteria"

4
Event Model

(Event) Data Product (100-1000 per event)
A chunk of (event) data managed as a single unit
Collection of Digis belonging to a part of a detector
Collection of RecObj (tracks, calo-clusters, jets) produced by a given algorithm
Currently identified by (its objy OID and federation)
Event
ascii string
metadata describing how it was produced
Its transient type
Physical location
Inter Data Product dependency tracked
Consistency ensured

5
Production 2002, Complexity

Number of Regional Centers: 11
Number of Computing Centers: 21
Number of CPUs: 1000
Largest Local Center: 176 CPUs
Number of Production Passes for each Dataset (including analysis group processing done by production): 6-8
Number of Files: 11,000
Data Size (not including fz files from Simulation): 17 TB
File Transfer by GDMP and by perl scripts over scp/bbcp: 7 TB toward T1, 4 TB toward T2

6
HEP Data

Event-Collection Meta-Data
Environmental data
Detector and Accelerator status
Calibrations, Alignments
(luminosity, selection criteria, )
Event Data, User Data

Navigation is essential for an effective physics analysis. Complexity requires coherent access mechanisms.
7
Re-Reconstruction Clones
[Diagram; surviving labels: Production, User, Run and Config, Id-2, Tracker, Local Replica, Ecal, Hcal]
8
CMS Reconstructed Objects
Reconstructed Objects produced by a given algorithm are managed by a Reconstructor.
A Reconstructed Object (Track) is split into several independent persistent objects so that they can be clustered according to their access patterns (physics analysis, reconstruction, detailed detector studies, etc.). The top-level object acts as a proxy. Intermediate reconstructed objects (RHits) are cached by value in the final objects. It is possible to recalibrate the aod (and generate a new version without modifying or copying the esd and rec).
[Diagram; surviving labels: RecEvent, S-Track Reconstructor, S Track, Track SecInfo, Track Constituents, Vector of RHits, esd, rec, aod, "calibration dependent", "CPU intensive"]
9
Raw Event
RawData are identified by the corresponding ReadOut. RawData belonging to different detectors are clustered into different containers. The granularity will be adjusted to optimize I/O performance. An index at RawEvent level is used to avoid accessing all containers in search of a given RawData. A range index at RawData level could be used for fast random access in complex detectors.
[Diagram; surviving labels: RawEvent, ReadOut (several), RawData (several)]
Index implemented as an ordered vector of pairs
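
As an illustration of an index kept as an ordered vector of pairs, here is a minimal Python sketch. The slide does not say what the pairs contain; the (readout id, container slot) layout, the class name RawEventIndex and the sample values are all assumptions made for the example, not the CMS implementation.

```python
from bisect import bisect_left


class RawEventIndex:
    """Ordered vector of (readout_id, slot) pairs, searched by bisection.

    Hypothetical layout: readout_id is an integer key, slot is whatever
    locates the RawData inside its container.
    """

    def __init__(self, pairs):
        # Sort once; the index is assumed to be written once and then only read.
        self._pairs = sorted(pairs)
        self._keys = [key for key, _ in self._pairs]

    def find(self, readout_id):
        """Return the slot for readout_id, or None if the event has no such RawData."""
        i = bisect_left(self._keys, readout_id)
        if i < len(self._keys) and self._keys[i] == readout_id:
            return self._pairs[i][1]
        return None


index = RawEventIndex([(42, "ecal/7"), (7, "tracker/0"), (19, "hcal/3")])
print(index.find(19))  # slot of ReadOut 19
print(index.find(8))   # no RawData for ReadOut 8 in this event
```

A sorted vector keeps the index compact and contiguous on disk, and a lookup costs a binary search instead of a scan over every container.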
10
An OID proposal

We propose to use an object identifier composed of three fields:
Navigation-Scope (Sea??) identifier
Always implicit; explicit use is limited to cross-references among disjoint stores (for instance, event toward calibration)
Nothing prevents using a context for a dataset or even for an event
The concrete implementation of the Sea is a file catalog
Data Product Id (dp-id)
Unique and immutable identifier (in a given sea) of a data product
To simplify lookup in case of scattering-gathering, we suggest it include a field identifying the logical file (lf-id)
When writing, one can easily stream all logical files into the same physical file
For a given sea, a physical file can map to multiple logical files
In small seas (lakes), such as a local replica of selected events, even an m-to-n mapping could be affordable
Object index
Used to identify single objects in the data product
If the Data Product is WORM, indexing will work whatever data structure is used below a Data Product
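
The three-field identifier can be sketched as a small immutable record. This is only an illustration: the slides fix neither field widths nor types, so the integer fields, the name local_id for "the rest of the dp-id", and the use of None for the implicit sea are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataProductId:
    lf_id: int     # logical-file identifier (lf-id)
    local_id: int  # hypothetical name for the rest of the dp-id


@dataclass(frozen=True)
class Oid:
    dp_id: DataProductId
    object_index: int          # selects one object inside the data product
    sea: Optional[str] = None  # None means "the sea we are navigating in"

    def in_sea(self, current_sea: str) -> str:
        """Return the sea in which this reference must be resolved."""
        return self.sea if self.sea is not None else current_sea


# A reference inside the event store leaves the sea implicit ...
track = Oid(DataProductId(lf_id=3, local_id=17), object_index=5)
# ... while a cross-reference toward a disjoint store names it explicitly.
calib = Oid(DataProductId(lf_id=1, local_id=2), object_index=0, sea="calibration")

print(track.in_sea("events"))  # events
print(calib.in_sea("events"))  # calibration
```

Freezing the record mirrors the requirement that a dp-id be unique and immutable within its sea, and makes identifiers usable as dictionary keys.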

11
Data Product id resolution

A possible implementation
The Sea is responsible for mapping an lf-id to a given strategy to resolve a data-product-id
The same dp-id can be resolved differently depending on which sea we are navigating in
Physical resolution strategy
The lf-id identifies a file; the rest of the dp-id is a physical location (objy-like)
Local mapping
The lf-id identifies a file (not necessarily the original one); the rest of the dp-id is used to look up in a table contained in the file itself
Global mapping
The lf-id identifies a table; the rest of the dp-id is used to look up in the table the physical location of the data product
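
The three strategies can be sketched as interchangeable resolver functions selected by the sea. Everything concrete here is an assumption made for illustration: the dp-id is modelled as an (lf_id, rest) tuple, a physical location as a (file name, position) tuple, and the file names and table contents are invented.

```python
from typing import Callable, Dict, Tuple

DpId = Tuple[int, int]      # (lf_id, rest of the dp-id): hypothetical layout
Location = Tuple[str, int]  # (physical file, position in file): hypothetical
Resolver = Callable[[DpId], Location]


def physical(files: Dict[int, str]) -> Resolver:
    """lf-id names a file; the rest of the dp-id is already the position."""
    def resolve(dp_id: DpId) -> Location:
        lf_id, rest = dp_id
        return files[lf_id], rest
    return resolve


def local_mapping(files: Dict[int, str],
                  tables_in_file: Dict[str, Dict[int, int]]) -> Resolver:
    """lf-id names a file; the rest is looked up in a table stored in that file."""
    def resolve(dp_id: DpId) -> Location:
        lf_id, rest = dp_id
        path = files[lf_id]
        return path, tables_in_file[path][rest]
    return resolve


def global_mapping(tables: Dict[int, Dict[int, Location]]) -> Resolver:
    """lf-id names a table; the rest is looked up there for the full location."""
    def resolve(dp_id: DpId) -> Location:
        lf_id, rest = dp_id
        return tables[lf_id][rest]
    return resolve


class Sea:
    """Maps each lf-id to the strategy used to resolve dp-ids carrying it."""

    def __init__(self, strategies: Dict[int, Resolver]):
        self._strategies = strategies

    def resolve(self, dp_id: DpId) -> Location:
        return self._strategies[dp_id[0]](dp_id)


dp = (3, 17)
production = Sea({3: physical({3: "prod.db"})})
replica = Sea({3: local_mapping({3: "replica.db"}, {"replica.db": {17: 0}})})
catalog = Sea({3: global_mapping({3: {17: ("tier1.db", 512)}})})

print(production.resolve(dp))  # ('prod.db', 17)
print(replica.resolve(dp))     # ('replica.db', 0)
print(catalog.resolve(dp))     # ('tier1.db', 512)
```

The same dp-id yields a different physical location in each sea, which is the property the slide asks for: references stay valid after data are scattered or gathered, and only the sea's mapping changes.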
