Title: POOL File Catalog and Collection
1 POOL File Catalog and Collection
- Zhen Xie
- On behalf of the POOL team
http://lcgapp.cern.ch/project/persist
2 Thanks to the following developers for providing data and comments for this talk:
- M. Girone, J. Wojcieszuk (File Catalog)
- I. Papadopoulos, H. Schmuecker (Collection)
3 Overview
- File Catalog component
  - keeps track of files (physical location and description)
  - resolves a logical file reference (FileID) into a physical file
- Collection component
  - keeps track of a large set of objects and their description
  - entry point to access data via logical grouping
4 Static Relationships
[Diagram relating Production manager, File Catalog, Storage components, Collections and POOL; labels: ref, run, evt]
5 Dynamic relationships - read
[Diagram: user application, collection, file catalog, storage components; labels: PFN, object, read, read via collection]
6 Dynamic relationships - write
[Diagram: user application, collection, file catalog, storage components; labels: write, register/lookup (PFN), fileID, define collection (attributespec), create collection, fill collection (ref, attribute)]
7 File Catalog - schema
[Diagram: Logical Naming and Object Lookup; Logical Filenames up to n and Physical Filenames up to m]
- FileID: unique, immutable identifier of the file
- Physical File Name (PFN): identifies the physical location of the file
  - e.g. lx01.cern.ch/data/tt.root
- Logical filenames are supported but not required
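The schema above can be sketched as a small in-memory mapping. This is only an illustration under stated assumptions, not the POOL interface: the class and method names (`ToyFileCatalog`, `register_pfn`, `register_lfn`, `lookup_pfns`, `lookup_by_lfn`) and the logical name `tt-sample` are hypothetical, and generating the FileID with `uuid4` is an assumption made here to obtain a unique identifier.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One file: a FileID plus its physical and logical names."""
    file_id: str
    pfns: list = field(default_factory=list)   # one or more physical copies
    lfns: list = field(default_factory=list)   # optional logical names


class ToyFileCatalog:
    """Hypothetical in-memory catalog; not the POOL API."""

    def __init__(self):
        self._entries = {}   # FileID -> CatalogEntry (object lookup)
        self._by_lfn = {}    # LFN -> FileID (logical naming)

    def register_pfn(self, pfn):
        """Register a new file and return its generated FileID."""
        file_id = str(uuid.uuid4())   # assumption: a UUID string as FileID
        self._entries[file_id] = CatalogEntry(file_id, pfns=[pfn])
        return file_id

    def register_lfn(self, file_id, lfn):
        """Attach an optional logical name to an existing FileID."""
        self._entries[file_id].lfns.append(lfn)
        self._by_lfn[lfn] = file_id

    def lookup_pfns(self, file_id):
        """Resolve a FileID into its physical file names."""
        return list(self._entries[file_id].pfns)

    def lookup_by_lfn(self, lfn):
        """Resolve a logical name via its FileID."""
        return self.lookup_pfns(self._by_lfn[lfn])


catalog = ToyFileCatalog()
fid = catalog.register_pfn("lx01.cern.ch/data/tt.root")
catalog.register_lfn(fid, "tt-sample")   # hypothetical logical name
```

The file stays reachable through its FileID alone; the logical name is an extra index on top, which matches "supported but not required".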
8 File Catalog - implementation
- XML catalog
  - disconnected
  - 20K entries
- MySQL catalog
  - local cluster
  - 1M - 10M entries
- EDG-RLS based catalog
  - on the grid
  - large
9 File Catalog - features
- Transaction control
- C++ API and command-line tools
- File registration, lookup
- File administration operations
  - e.g. add replica, renamePFN
- Iteration over catalog
- Cross catalog operations
  - e.g. append XML to MySQL catalog
- Import, extract catalog fragment
- Query user defined file meta data
- Graphic user interface
- Performance and scalability improvement
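A few of the features listed above (transaction control, add replica, renamePFN, iteration, cross catalog append) can be illustrated with a toy catalog. This is a minimal sketch, not the POOL API: all class and method names are hypothetical, the transaction is modelled as a simple working copy, and the hosts `lx02` and `lx03` are invented for the example.

```python
class ToyCatalog:
    """Hypothetical catalog with start/commit/rollback; not the POOL API."""

    def __init__(self):
        self._committed = {}   # FileID -> list of PFNs
        self._pending = None   # working copy while a transaction is open

    def start(self):
        # Work on a copy so that a rollback leaves committed data untouched.
        self._pending = {fid: list(p) for fid, p in self._committed.items()}

    def commit(self):
        self._committed, self._pending = self._pending, None

    def rollback(self):
        self._pending = None

    def register_pfn(self, file_id, pfn):
        self._pending[file_id] = [pfn]

    def add_replica(self, file_id, pfn):
        self._pending[file_id].append(pfn)

    def rename_pfn(self, old, new):
        for pfns in self._pending.values():
            for i, pfn in enumerate(pfns):
                if pfn == old:
                    pfns[i] = new

    def entries(self):
        """Iterate over committed (FileID, PFNs) pairs."""
        return iter(sorted(self._committed.items()))

    def append_from(self, other):
        """Cross catalog operation: copy committed entries of another catalog."""
        for file_id, pfns in other.entries():
            self._pending.setdefault(file_id, []).extend(pfns)


local = ToyCatalog()
local.start()
local.register_pfn("id-1", "lx01.cern.ch/data/tt.root")
local.add_replica("id-1", "lx02.cern.ch/data/tt.root")   # hypothetical host
local.commit()

local.start()
local.rename_pfn("lx02.cern.ch/data/tt.root", "lx03.cern.ch/data/tt.root")
local.rollback()   # the rename is discarded

central = ToyCatalog()
central.start()
central.append_from(local)
central.commit()
```

Because both catalogs expose the same operations, the append does not need to know what kind of catalog it is reading from, which is the idea behind appending an XML catalog to a MySQL one.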
10 File Catalog - use case
[Diagram: Grid catalog]
11 File Catalog - performance (preliminary)
- XML: tested up to 50K entries
  - start time
    - new catalog: 10 ms
    - catalog with 20K entries: 6 s
  - registerPFN: < 0.3 ms/entry
- MySQL: tested up to 1M entries
  - up to 300 concurrent clients, commit every 100 entries or less frequently
  - registerPFN: < 1.5 ms/entry
- EDG-RLS based catalog (see J. Casey's talk)
  - registerPFN: 6 ms/entry (autocommit)
Test setup: Pentium III 1.2 GHz, 220 MB free memory, PFN of 200 characters, FileID of 36 characters
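The "commit every 100 entries" pattern from the MySQL test can be sketched as follows. The example uses Python's built-in `sqlite3` purely as a stand-in for MySQL so that it runs anywhere; the table layout, the function name `register_many` and the generated file names are assumptions, and no timing from the slide is reproduced here.

```python
import sqlite3
import uuid


def register_many(conn, pfns, commit_every=100):
    """Insert (FileID, PFN) rows, committing once per batch of rows."""
    commits = 0
    for n, pfn in enumerate(pfns, start=1):
        conn.execute(
            "INSERT INTO catalog (file_id, pfn) VALUES (?, ?)",
            (str(uuid.uuid4()), pfn),
        )
        if n % commit_every == 0:
            conn.commit()
            commits += 1
    if len(pfns) % commit_every:
        conn.commit()   # flush the final partial batch
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (file_id TEXT PRIMARY KEY, pfn TEXT)")

# Hypothetical file names, used only to have something to register.
pfns = [f"lx01.cern.ch/data/file{i}.root" for i in range(250)]
n_commits = register_many(conn, pfns)
n_rows = conn.execute("SELECT COUNT(*) FROM catalog").fetchone()[0]
```

Grouping registrations into one commit per batch spreads the per-commit cost over many entries, which is presumably why the autocommit figure quoted for the grid catalog is the least favourable case.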
12 Explicit Collection
- Explicit list of object references
- Allow iteration over large set of objects
- Queryable if associated with object meta data
- Like ntuple with object reference
- Allow accessing data via logical association
13 Explicit Collection - features
- Transaction control
- Meta data definition
- Collection creation, population
- Iteration
- Query
- MySQL implementation
- Root implementation
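Meta data definition, population, iteration and query can be pictured with a small in-memory collection. This is a sketch under stated assumptions, not the POOL collection interface: `Ref`, `ExplicitCollection`, the attribute names `run` and `n_electrons`, and the container name `Events` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical object reference: file, container and entry number."""
    file_id: str
    container: str
    entry: int


class ExplicitCollection:
    """Explicit list of (reference, attributes) rows; illustration only."""

    def __init__(self, name, attribute_spec):
        self.name = name
        self.attribute_spec = dict(attribute_spec)   # attribute name -> type
        self._rows = []

    def add(self, ref, **attributes):
        # Check each attribute against the declared specification.
        for key, value in attributes.items():
            expected = self.attribute_spec[key]      # KeyError if undeclared
            if not isinstance(value, expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        self._rows.append((ref, attributes))

    def __iter__(self):
        """Iterate over all object references in insertion order."""
        return (ref for ref, _ in self._rows)

    def query(self, predicate):
        """Return the references whose attributes satisfy the predicate."""
        return [ref for ref, attrs in self._rows if predicate(attrs)]


events = ExplicitCollection("selected_events", {"run": int, "n_electrons": int})
for i, n_el in enumerate([2, 5, 7]):
    events.add(Ref("file-1", "Events", i), run=1, n_electrons=n_el)

many_electrons = events.query(lambda a: a["n_electrons"] > 4)
```

The query only touches the attributes; the referenced objects themselves would be read later, and only for the references that passed the selection.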
14 Hierarchical Collection
- Collection of collections
- Leaf collections are explicit object collections
- Allow complex user selection on collection and object meta data
  - all events with > 4 selected electrons from runs with working ECAL and selected calibration, alignment setup
- Currently in prototyping phase
[Diagram: collection, collection metadata, ref, object metadata]
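The selection example above combines a cut on collection meta data with a cut on object meta data. A minimal sketch of that idea follows; since the component is described as a prototype, none of this reflects an actual design, and the names `Leaf`, `Node`, `select`, `ecal_ok` and `n_electrons` are hypothetical.

```python
class Leaf:
    """Explicit collection: collection metadata plus (ref, object metadata) rows."""

    def __init__(self, metadata, rows):
        self.metadata = metadata
        self.rows = rows


class Node:
    """Collection of collections."""

    def __init__(self, children):
        self.children = children


def select(collection, collection_cut, object_cut):
    """Yield refs from leaves passing collection_cut whose rows pass object_cut."""
    if isinstance(collection, Node):
        for child in collection.children:
            yield from select(child, collection_cut, object_cut)
    elif collection_cut(collection.metadata):
        for ref, attrs in collection.rows:
            if object_cut(attrs):
                yield ref


run1 = Leaf({"run": 1, "ecal_ok": True},
            [("evt-1", {"n_electrons": 5}), ("evt-2", {"n_electrons": 3})])
run2 = Leaf({"run": 2, "ecal_ok": False},
            [("evt-3", {"n_electrons": 6})])
all_runs = Node([run1, run2])

selected = list(select(all_runs,
                       lambda m: m["ecal_ok"],
                       lambda a: a["n_electrons"] > 4))
```

Applying the collection-level cut first means that a run with a faulty ECAL is skipped as a whole, without looking at any of its events.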
15 Implicit Collection
- Collections defined by physical containment of the objects
  - e.g. all objects in container A of database B
- Expose physical clustering of objects through the collection interface by storage components
- To be implemented in POOL
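Since this component is still to be implemented, the following is only a sketch of the idea: an implicit collection needs no stored list of references, because iterating it simply walks one container of one database. The names `ToyDatabase` and `ImplicitCollection` and the stored objects are hypothetical.

```python
class ToyDatabase:
    """Hypothetical stand-in for a file holding named containers of objects."""

    def __init__(self, name, containers):
        self.name = name
        self.containers = containers   # container name -> list of objects


class ImplicitCollection:
    """All objects in one container of one database, in stored order."""

    def __init__(self, database, container):
        self._database = database
        self._container = container

    def __iter__(self):
        # No explicit reference list: membership follows from containment.
        return iter(self._database.containers[self._container])


db_b = ToyDatabase("B", {"A": ["track-0", "track-1"], "C": ["hit-0"]})
in_a = list(ImplicitCollection(db_b, "A"))
```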
16 Summary
- File Catalog
  - achieved cross-file navigation
  - grid-aware (via EDG-RLS), but also preserving grid-decoupled modes
  - on-going integration with LHC experiment software and production for LCG-1
- Collection
  - achieved accessing objects by explicit MySQL collection
  - there are pending design and implementation issues