Title: Meta Data (reloaded)
1Meta Data (reloaded)
An Introduction to non-Event Data in CARF
- Vincenzo Innocente
- CERN/EP/CMC
2Meta-data as attributes
- Traditionally, metadata are data that describe other data
  - schema, protocols, type-dictionaries
- Its meaning has been extended to
  - Attributes given to an object in a given context (including self)
  - Proxy/cache of some object attributes for fast retrieval
  - Neutral protocol among domains not sharing a common data model
  - Work-around for mistakes in the data model
  - Everything that looks like a super-structure in your domain
- In OO they can be considered as attributes of the relation between two objects
- They suffer from many problems related to object identity
  - Copy semantics (shallow vs. deep)
  - Delete semantics (roll-back)
  - Play-back semantics (restore from backup, regeneration)
- Examples
  - ls -l, phone innocent -a, the formatting of this presentation
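The identity problems listed on this slide can be illustrated with a minimal Python sketch. It assumes a hypothetical side table, `tags`, that keeps metadata outside the object and keys it on object identity; the `Track` class and the helper names are invented for the illustration and are not CARF classes.

```python
import copy

class Track:
    """Hypothetical data object; not a CARF class."""
    def __init__(self, hits):
        self.hits = hits

# Metadata kept outside the object, keyed by identity.
# Note: id() is only unique while the object is alive.
tags = {}

def tag(obj, **attrs):
    """Attach attributes to obj in the external side table."""
    tags.setdefault(id(obj), {}).update(attrs)

def tags_of(obj):
    """Return the attributes recorded for obj (empty if none)."""
    return tags.get(id(obj), {})

original = Track(hits=[1, 2, 3])
tag(original, quality="good")

shallow = copy.copy(original)      # new Track, same hits list
deep = copy.deepcopy(original)     # new Track, new hits list

# Both copies have a new identity, so the side table knows nothing about them.
print(tags_of(original))  # {'quality': 'good'}
print(tags_of(shallow))   # {}
print(tags_of(deep))      # {}

# The payload itself is shared by the shallow copy only.
print(shallow.hits is original.hits)  # True
print(deep.hits is original.hits)     # False
```

Whether a copy should inherit, share, or drop such attributes is exactly the policy question the slide raises; the sketch only shows that the default answer is "drop".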
3Reconstruction Sources
4HEP Data
- Event-Collection Meta-Data
- Environmental data
- Detector and Accelerator status
- Calibrations, Alignments
- (luminosity, selection criteria, ...)
- Event Data, User Data
Event Collection
Collection Meta-Data
Navigation is essential for an effective physics analysis.
Complexity requires coherent access mechanisms.
5Top Level Event Structure (ORCA4)
The Run object is the entry point for provenance
and configuration information
Run
Crossing
Trigger
Pile-up
Run
SimEvent
6Re-Reconstruction Clones
Run
Run
Id-1
Local Replica
Crossing
Trigger
Pile-up
7Dataset Collection
MetaData User Tag
Run Collection
An example of Meta-Mess: due to the transition to
winter mode, a collection ended up with attributes
in three contexts: self, dataset and the master
collection
Rec Event
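The Meta-Mess described on this slide can be sketched as an attribute lookup across three contexts. This is an illustration only: the attribute names and values are invented, and `collections.ChainMap` stands in for whatever lookup order the framework actually applies.

```python
from collections import ChainMap

# Hypothetical attributes recorded in three different contexts.
self_attrs = {"name": "jets_v2"}
dataset_attrs = {"name": "jets", "owner": "prod"}
master_attrs = {"owner": "physics", "geometry": "winter"}

# First match wins: self, then dataset, then master collection.
resolved = ChainMap(self_attrs, dataset_attrs, master_attrs)

def conflicts(*contexts):
    """Return the keys defined with different values in more than one context."""
    seen = {}
    for ctx in contexts:
        for key, value in ctx.items():
            seen.setdefault(key, set()).add(value)
    return sorted(key for key, values in seen.items() if len(values) > 1)

print(resolved["name"])   # jets_v2
print(resolved["owner"])  # prod
print(conflicts(self_attrs, dataset_attrs, master_attrs))  # ['name', 'owner']
```

A fixed lookup order hides the disagreement; reporting the conflicting keys makes it visible.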
8Top Level MetaData Structure (spring)
System Collection
RunList
Owner (Transformation?)
Specific to DS type
Run
DataSet
SetUp
EVDFilePool
Event Collection
Persistent Algorithms
EVDFile
Configuration
Specific to DS type
Container
Specific to DS type
Location of event data
9Top Level MetaData Structure (winter)
System Collection
RunList
Owner (Transformation?)
Specific to DS type
Run
DataSet
PoolCatalog
SetUp
Event Collection
Persistent Algorithms
EVDFile
Configuration
Specific to DS type
Container
Specific to DS type
Location of event data
10Interface
11Interface
12Publication, Distribution and Replica
- Sharing of data samples produced in a private environment must be supported
  - Local data sources
  - Local data products
  - Work in isolation without prior registration to a larger scope
- Make it available from a local scope
  - Requires a change of scope to access it (ssh, cd, change file/db-server)
- Make it accessible from other scopes
  - Usually implies publication in a global scope (/afs/, http://, RLS, DSN)
- Make it available from a different scope
  - Implies a physical replica
- Data sharing cannot be supported just with a centralized DBMS or replica service
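The difference between working locally, publishing and replicating can be sketched as follows. This is a toy model under stated assumptions: `Scope`, `publish` and `replicate` are hypothetical names, and a scope is reduced to a dictionary of samples plus a catalog of where each logical name is stored.

```python
class Scope:
    """Hypothetical named scope holding data samples by logical name."""
    def __init__(self, name):
        self.name = name
        self.samples = {}   # logical name -> data physically stored here
        self.catalog = {}   # logical name -> scope that stores the data

    def store(self, logical_name, data):
        """Store data locally; no registration in any other scope."""
        self.samples[logical_name] = data
        self.catalog[logical_name] = self

    def lookup(self, logical_name):
        """Resolve a logical name through this scope's catalog."""
        owner = self.catalog[logical_name]  # KeyError if unknown in this scope
        return owner.samples[logical_name]

def publish(logical_name, source, target):
    """Make a sample accessible from target by reference (no copy)."""
    target.catalog[logical_name] = source

def replicate(logical_name, source, target):
    """Make a sample available in target through a physical copy."""
    target.store(logical_name, list(source.samples[logical_name]))

local = Scope("laptop")
global_scope = Scope("global")
remote = Scope("remote-site")

local.store("my_sample", [1, 2, 3])        # work in isolation
publish("my_sample", local, global_scope)  # accessible from another scope
replicate("my_sample", local, remote)      # available in a different scope
```

After `publish`, the global scope still depends on the local one being reachable; after `replicate`, the remote scope holds its own copy, which then has to be kept consistent.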
13Obj id in a distributed environment
- Unique object id
  - Easy to obtain, not human friendly
  - Related to physical location (pool oid, /afs/)
    - Fast direct access, ensures consistency, makes replica management difficult
  - Location independent (pool file id)
    - Supports replicas at object level, access requires an additional scope, makes update (and delete) difficult
  - Difficult to turn into a logical identity
    - Add checksum?
- Attribute-based object id
  - Human's best friend
  - Does not guarantee uniqueness, supports relational algebra and fuzziness
  - Global vs local scope (namespace)
  - Performance requires turning it into a unique id in a restricted scope (index)
- Hybrid-Store should be understood as supporting various navigation and access paradigms, rather than as being implemented using different technologies
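The two kinds of identifier can be contrasted in a short Python sketch. Everything here is an assumption made for illustration: the `Store` class, the attribute names, and the use of a SHA-256 digest as one possible answer to "Add checksum?".

```python
import hashlib
import uuid

class Store:
    """Hypothetical object store for one restricted scope."""
    def __init__(self):
        self.objects = {}   # unique id -> (attributes, payload)
        self.index = {}     # (attribute, value) -> set of unique ids

    def add(self, payload, **attributes):
        """Store a payload and index it by its attributes."""
        oid = uuid.uuid4().hex   # easy to obtain, not human friendly
        self.objects[oid] = (attributes, payload)
        for item in attributes.items():
            self.index.setdefault(item, set()).add(oid)
        return oid

    def find(self, **attributes):
        """Return the ids of all objects matching every given attribute."""
        sets = [self.index.get(item, set()) for item in attributes.items()]
        return set.intersection(*sets) if sets else set(self.objects)

def checksum(payload: bytes) -> str:
    """Content digest: a location-independent candidate for logical identity."""
    return hashlib.sha256(payload).hexdigest()

store = Store()
a = store.add(b"calib-A", detector="tracker", run=1)
b = store.add(b"calib-B", detector="tracker", run=2)

# An attribute query is readable but may match several objects.
matches_all = store.find(detector="tracker")
# Adding attributes narrows it to a single unique id within this scope.
matches_one = store.find(detector="tracker", run=2)
```

The index is what turns a human-friendly attribute query into unique ids, and it is only valid inside the scope of one `Store`.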
14Open issues
- Configuration should be moved into its own database
  - Is it the same as the Condition DB?
  - How do we identify versions and variants?
  - Should we refer to configuration items by a unique id or through their attributes?
- Owner, Originator, Transformation, Dataset
  - Do we need to distinguish between these concepts?
  - What is the relationship among them and w.r.t. the configuration?
- Naming policy
  - Can we afford multiple naming policies?
  - At which level should naming policies be enforced?
  - Can we really implement a unique, consistent naming policy in a fully distributed environment?