POOL File Catalog and Collection

1
POOL File Catalog and Collection
  • Zhen Xie
  • On behalf of the POOL team

http://lcgapp.cern.ch/project/persist
2
Thanks to the following developers for providing
data and comments for this talk: M. Girone,
J. Wojcieszuk (File Catalog); I. Papadopoulos,
H. Schmuecker (Collection)
3
Overview
  • File Catalog component
      • keeps track of files (physical location and
        description)
      • resolves a logical file reference (FileID) into a
        physical file
  • Collection component
      • keeps track of a large set of objects and their
        description
      • entry point to access data via logical grouping

4
Static Relationships
[Diagram labels: Production manager, File Catalog, Storage components, Collections (ref, run, evt), POOL]
5
Dynamic relationships - read
[Diagram labels: user application, collection, file catalog, storage components; arrows: read, read via collection, PFN, object]
6
Dynamic relationships - write
[Diagram labels: user application, collection, file catalog, storage components; arrows: register/lookup (PFN), write, fileID, define collection (attributespec), create collection, register/lookup (PFN), fileID, fill collection (ref, attribute)]
7
File Catalog-schema
[Diagram labels: Logical Naming (Logical Filename 2 ... Logical Filename n), Object Lookup (Physical Filename 2 ... Physical Filename m)]
  • FileID: unique, immutable identifier of the file
  • Physical File Name (PFN): identifies the physical
    location of the file, e.g. lx01.cern.ch/data/tt.root
  • Logical filenames are supported but not required
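
The schema above can be sketched as a small in-memory structure: one FileID per file, one or more PFNs per FileID, and optional logical names pointing at the FileID. This is an illustration only, not POOL's API; the class, its method names, and the choice of a UUID string as FileID are assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FileCatalog:
    """Toy in-memory catalog; all names here are illustrative, not POOL's API."""

    pfns_by_id: dict = field(default_factory=dict)  # FileID -> list of PFNs
    id_by_pfn: dict = field(default_factory=dict)   # PFN -> FileID
    id_by_lfn: dict = field(default_factory=dict)   # LFN -> FileID (optional)

    def register_pfn(self, pfn):
        """Register a new file under a fresh FileID and return that FileID."""
        if pfn in self.id_by_pfn:
            raise ValueError(f"PFN already registered: {pfn}")
        file_id = str(uuid.uuid4())  # assumption: a UUID string as the FileID
        self.pfns_by_id[file_id] = [pfn]
        self.id_by_pfn[pfn] = file_id
        return file_id

    def add_replica(self, file_id, pfn):
        """Attach another physical copy to an existing FileID."""
        if pfn in self.id_by_pfn:
            raise ValueError(f"PFN already registered: {pfn}")
        self.pfns_by_id[file_id].append(pfn)  # KeyError if the FileID is unknown
        self.id_by_pfn[pfn] = file_id

    def register_lfn(self, file_id, lfn):
        """Give an existing file an optional logical name."""
        if file_id not in self.pfns_by_id:
            raise KeyError(file_id)
        self.id_by_lfn[lfn] = file_id

    def rename_pfn(self, old_pfn, new_pfn):
        """Change a physical location; the FileID itself never changes."""
        file_id = self.id_by_pfn.pop(old_pfn)
        replicas = self.pfns_by_id[file_id]
        replicas[replicas.index(old_pfn)] = new_pfn
        self.id_by_pfn[new_pfn] = file_id

    def lookup_pfns(self, file_id):
        """Resolve a FileID into all known physical locations."""
        return list(self.pfns_by_id[file_id])

    def lookup_by_lfn(self, lfn):
        """Resolve a logical name via its FileID."""
        return self.lookup_pfns(self.id_by_lfn[lfn])
```

Because references are stored against the FileID rather than the PFN, moving or replicating a file only touches the catalog; anything holding the FileID keeps working.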

8
File Catalog-implementation
  • XML catalog: disconnected use, 20K entries
  • MySQL catalog: local cluster, 1M - 10M entries
  • EDG-RLS based catalog: on the grid, large scale

9
File Catalog - features
  • Transaction control
  • C++ API, command-line tools
  • File registration, lookup
  • File administration operations, e.g. add replica,
    renamePFN
  • Iteration over catalog
  • Cross-catalog operations, e.g. append XML to
    MySQL catalog
  • Import, extract catalog fragment
  • Query user-defined file meta data
  • Graphical user interface
  • Performance and scalability improvement


10
File Catalog-use case
Grid catalog
11
File Catalog-performance (preliminary)
  • XML: tested up to 50K entries
      • start time: 10 ms for a new catalog, 6 s for a
        catalog with 20K entries
      • registerPFN: < 0.3 ms/entry
  • MySQL: tested up to 1M entries
      • up to 300 concurrent clients, committing every 100
        entries or less frequently
      • registerPFN: < 1.5 ms/entry
  • EDG-RLS based catalog (see J. Casey's talk)
      • registerPFN: 6 ms/entry (autocommit)

Test setup: Pentium III 1.2 GHz, 220 MB free memory, PFN length 200 characters, FileID length 36 characters
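
The MySQL figures above are quoted for commits issued every 100 entries or less often, while the EDG-RLS figure is for autocommit. The batching pattern itself can be sketched as follows. Python's built-in sqlite3 stands in for MySQL so that the example is self-contained; the table layout and the function names are assumptions, not the POOL schema.

```python
import sqlite3


def make_catalog():
    """Create an in-memory stand-in catalog with a single PFN table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pfn (file_id TEXT NOT NULL, pfn TEXT PRIMARY KEY)")
    return conn


def register_pfns(conn, entries, commit_every=100):
    """Insert (file_id, pfn) pairs, committing once per `commit_every` rows.

    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    for file_id, pfn in entries:
        conn.execute("INSERT INTO pfn (file_id, pfn) VALUES (?, ?)", (file_id, pfn))
        pending += 1
        if pending == commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # flush the final partial batch
        conn.commit()
        commits += 1
    return commits
```

Setting `commit_every=1` approximates autocommit, which gives a simple way to compare the two modes on a given backend.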
12
Explicit Collection
  • Explicit list of object references
  • Allows iteration over a large set of objects
  • Queryable if associated with object meta data
  • Like an ntuple with object references
  • Allows access to data via logical association
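
As a rough illustration of the "ntuple with object reference" idea, the sketch below stores one row per object: a reference plus a fixed set of metadata attributes that can be queried. The `Ref` fields, the class name, and the attribute names in the usage are hypothetical and not taken from POOL.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ref:
    """Stand-in for an object reference: which file, and where inside it."""
    file_id: str
    container: str
    entry: int


@dataclass
class ExplicitCollection:
    attribute_spec: tuple                      # names of the metadata columns
    rows: list = field(default_factory=list)   # list of (Ref, attributes dict)

    def add(self, ref, **attributes):
        """Append one object reference together with its metadata."""
        if set(attributes) != set(self.attribute_spec):
            raise ValueError(f"expected attributes {self.attribute_spec}")
        self.rows.append((ref, attributes))

    def __iter__(self):
        """Iterate over all references in insertion order."""
        return (ref for ref, _ in self.rows)

    def query(self, predicate):
        """Yield only the references whose metadata satisfies `predicate`."""
        return (ref for ref, attrs in self.rows if predicate(attrs))
```

A selection then reads like a cut on an ntuple, for example `coll.query(lambda a: a["n_electrons"] > 4)`, and only the selected references need to be resolved through the file catalog and read.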

13
Explicit Collection-features
  • Transaction control
  • Meta data definition
  • Collection creation, population
  • Iteration
  • Query
  • MySQL implementation
  • ROOT implementation

14
Hierarchical Collection
  • Collection of collections
  • Leaf collections are explicit object collections
  • Allow complex user selection on collection and
    object meta data
  • e.g. all events with > 4 selected electrons from runs
    with working ECAL and selected calibration and
    alignment setup
  • Currently in prototyping phase
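
Since this component is described as a prototype, the following is only a guess at the shape of such a selection, not the POOL design. It combines a cut on collection meta data (for instance per-run conditions) with a cut on object meta data (per-event quantities). All class, field, and attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LeafCollection:
    """Explicit collection: collection-level metadata plus (ref, attrs) rows."""
    metadata: dict
    rows: list = field(default_factory=list)


@dataclass
class CollectionOfCollections:
    """Inner node whose children are leaves or further inner nodes."""
    children: list = field(default_factory=list)

    def select(self, collection_cut, object_cut):
        """Yield refs from leaves passing `collection_cut` whose objects pass `object_cut`."""
        for child in self.children:
            if isinstance(child, CollectionOfCollections):
                yield from child.select(collection_cut, object_cut)
            elif collection_cut(child.metadata):
                for ref, attrs in child.rows:
                    if object_cut(attrs):
                        yield ref


# Hypothetical data: two runs, one of them with a non-working ECAL.
good_run = LeafCollection({"run": 1, "ecal_ok": True},
                          [("evt-a", {"n_electrons": 5}), ("evt-b", {"n_electrons": 1})])
bad_run = LeafCollection({"run": 2, "ecal_ok": False},
                         [("evt-c", {"n_electrons": 9})])
tree = CollectionOfCollections([CollectionOfCollections([good_run]), bad_run])

selected = list(tree.select(lambda m: m["ecal_ok"],
                            lambda a: a["n_electrons"] > 4))
```

Applying the collection-level cut first means that leaves from rejected runs are never iterated at all.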

[Diagram labels: ref, collection, object metadata, collection metadata]
15
Implicit Collection
  • Collections defined by physical containment of
    the objects
  • e.g. all objects in container A of database B
  • Expose physical clustering of objects through the
    collection interface by storage components
  • To be implemented in POOL

16
Summary
  • File Catalog
      • achieved cross-file navigation
      • grid-aware (via EDG-RLS), while also preserving
        grid-decoupled modes
      • ongoing integration with LHC experiment software
        and production for LCG-1
  • Collection
      • achieved object access through explicit MySQL
        collections
      • design and implementation issues are still
        pending