Persistency at LHC - PowerPoint PPT Presentation

Slides: 64
Provided by: monica69

1
Persistency at LHC
  • Vincenzo Innocente
  • CERN/EP/CMC

2
Sources and Contributions
  • Presentations at last RD45 workshop
  • Presentations at the Architecture Working Group
  • Experiments Web pages
  • Focus on
  • LHC experiments prototypes
  • New generation experiments (BaBar, STAR, RunII)
    experience and plans

3
  • Persistency
  • in
  • General

4
Persistency: what for?
  • A process saves its state to be later re-used by
  • the same process
  • a different process running the same executable
  • a different process running a different
    executable
  • Ideal persistency
  • Core Dump!

[Figure: Process 1, Process 2, Process 3; Volatile Memory; Permanent Storage]
5
Use Cases
  • Extended (in space and time) virtual memory
  • proprietary format optimized for computational
    and storage performance of a single application
  • Import/Export in a heterogeneous environment
  • standard application-independent format
  • conversion to/from internal application format
  • Management of different versions (identification,
    query mechanism) and of concurrency (locking)
  • proprietary internal mechanism
  • rely on the file system or DBMS

7
Object Persistency
  • Objects are atomic entities
  • have a state (data members
    including relationships)
  • provide services (methods)
  • Persistent objects survive process boundaries
  • when retrieved
  • have the same state
  • provide the same services
  • as they were stored
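
A minimal sketch of the idea on this slide, assuming a hypothetical Event class and using JSON text as a stand-in for permanent storage. Only the state (data members) and the class name are stored; the methods come back because the reading process has the class code available. All names here are illustrative and are not taken from any of the products discussed.

```python
import json


class Event:
    """Hypothetical event class: state (data members) plus one service (method)."""

    def __init__(self, run, number, energies):
        self.run = run
        self.number = number
        self.energies = list(energies)

    def total_energy(self):
        return sum(self.energies)


# The reading process must know the classes it may encounter.
CLASS_REGISTRY = {"Event": Event}


def store(obj):
    """Write the class name and the state of all data members as text."""
    record = {"class": type(obj).__name__, "state": vars(obj)}
    return json.dumps(record)


def retrieve(text):
    """Rebuild an object of the stored class and restore its state."""
    record = json.loads(text)
    cls = CLASS_REGISTRY[record["class"]]
    obj = cls.__new__(cls)  # bypass __init__, the state is restored directly
    obj.__dict__.update(record["state"])
    return obj


original = Event(run=1, number=42, energies=[1.5, 2.5])
restored = retrieve(store(original))
```

The restored object is a different instance with the same state and the same services as the one that was stored.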

[Figure: Event objects in Volatile Memory and in Permanent Storage]
8
Object Persistency
  • Persistency
  • Objects retain their state between two program
    contexts
  • Storage entity is a complete object
  • State of all data members
  • Object class
  • OO Language Support
  • Abstraction
  • Inheritance
  • Polymorphism
  • Parameterised Types (Templates)

9
OO Language Binding
  • User had to deal with copying between program and
    I/O representations of the same data
  • User had to traverse the in-memory structure
  • User had to write and maintain specialised code
    for I/O of each new class/structure type
  • Tight Language Binding
  • ODBMSs allow persistent objects to be used directly
    as variables of the OO language
  • C++, Java and Smalltalk (heterogeneity)
  • I/O on demand: no explicit store/retrieve calls
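
I/O on demand can be sketched with a reference that reads its target only when it is first used. This is an illustration under stated assumptions, not the mechanism of any particular ODBMS: LazyRef and load_tracks are hypothetical names, and the loader stands in for a real read from storage.

```python
class LazyRef:
    """Hypothetical smart reference: loads its target on first access only."""

    def __init__(self, loader):
        self._loader = loader
        self._target = None
        self._loaded = False

    def get(self):
        if not self._loaded:
            # The first access triggers the read; user code never calls it.
            self._target = self._loader()
            self._loaded = True
        return self._target


reads = []  # records every simulated read from storage


def load_tracks():
    """Stand-in for a read from permanent storage."""
    reads.append("tracks")
    return [1.0, 2.0]


tracks_ref = LazyRef(load_tracks)
reads_before_access = len(reads)
first = tracks_ref.get()
second = tracks_ref.get()
```

Creating the reference costs no I/O, and repeated accesses reuse the object already in memory.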

10
Problems with Naïve OP
  • Storing services (methods ready to run) is non
    trivial
  • persistency services store data only
  • configuration management takes care of code
  • frameworks can use dynamic loading to match data
    and code
  • Clean and performant object design is difficult
  • Different (partial) representations of the state
    of an object may be required to cope with
    computational, storage and I/O efficiencies (and
    code development efficiency)
  • Object design and implementation evolve,
    persistent objects stay the same
  • Old persistent objects need to be converted

11
More Problems with Naïve OP
  • Object granularity does not match raw I/O
    granularity (which in turn is device dependent)
  • small objects should be physically clustered
    according to users' access patterns
  • Object logical relationships do not necessarily
    reflect access patterns (old rows vs columns
    dilemma)
  • How objects become persistent
  • At construction time (user can control
    clustering)
  • By reachability: an object becomes persistent
    when attached to an already persistent object
    (clustering control difficult)
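
Persistence by reachability amounts to a graph traversal from a persistent root. The sketch below assumes a hypothetical Node class with a list of references; it only illustrates the rule stated above and says nothing about where the objects end up on disk, which is exactly why clustering is hard to control in this scheme.

```python
class Node:
    """Hypothetical object holding references to other objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []


def reachable_from(root):
    """Return every object reachable from a persistent root, in discovery order."""
    seen = {id(root)}
    order = [root]
    stack = [root]
    while stack:
        current = stack.pop()
        for child in current.refs:
            if id(child) not in seen:  # guards against cycles
                seen.add(id(child))
                order.append(child)
                stack.append(child)
    return order


run = Node("run")
event = Node("event")
track = Node("track")
scratch = Node("scratch")  # never attached, so it stays transient

run.refs.append(event)
event.refs.append(track)
track.refs.append(event)  # back-reference, forms a cycle

persistent_names = [node.name for node in reachable_from(run)]
```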

12
Physical Model and Logical Model
  • Physical model may be changed to optimise
    performance
  • Existing applications continue to work

13
Realistic Object Persistency
[Figure: objects, pages and file, with two conversion steps: from/to the computationally optimal format (compression?) and from/to the machine-dependent format (new shape)]
14
Components of a POM
  • Storage manager
  • manage the physical structure on disk
  • Transaction/concurrency manager
  • client transaction, journaling, locking
    mechanisms
  • (or rely on OS and file system protections)
  • RTTI system
  • identifies the concrete type of object to
    retrieve/store
  • Converters
  • from storage format to user format and
    vice versa
  • machine-dependencies, schema-evolutions,
    user-hooks
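
A converter that handles schema evolution can be sketched as a chain of per-version upgrade functions applied when an object is read. The record layout, the version numbers and the added `pz` member are all assumptions made for the example.

```python
CURRENT_VERSION = 2


def v1_to_v2(state):
    """Hypothetical evolution: version 1 stored px and py, version 2 adds pz."""
    new_state = dict(state)  # never modify the stored record
    new_state.setdefault("pz", 0.0)
    return new_state


# One upgrade function per stored version that is older than the current one.
CONVERTERS = {1: v1_to_v2}


def to_user_format(record):
    """Convert a stored record to the current in-memory layout."""
    version = record["version"]
    state = record["state"]
    while version < CURRENT_VERSION:
        state = CONVERTERS[version](state)
        version += 1
    return state


old_record = {"version": 1, "state": {"px": 1.0, "py": 2.0}}
new_record = {"version": 2, "state": {"px": 1.0, "py": 2.0, "pz": 3.0}}

upgraded = to_user_format(old_record)
unchanged = to_user_format(new_record)
```

Old persistent objects stay as they were written; the conversion happens on the way into memory.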

15
Components of a POM
  • Application Cache manager
  • dynamic memory management with garbage collection
  • Tools and (G)UI
  • naming, indexing, query mechanisms
  • interactive browsing and query
  • development tools
  • administration tools

16
Objectivity/DB
  • ODBMS close to ODMG standard (library not
    framework)
  • Storage Manager based on fixed physical hierarchy
  • slot-page-container-database(file)-federation
  • Lock-server and journals to manage transactions
  • Proprietary parsing of an extension of C++ (ooddlx)
  • Objects are converted when opened
  • schema-evolution effects automatic or user
    defined
  • Basic naming, indexing and query mechanisms
  • Crude Browsing and administration tools
  • but Objy is integrated with some third-party
    frameworks

17
ROOT
  • Application Framework with embedded I/O
  • Storage Manager based on
  • logical hierarchy: TBasket-branch-tree
  • physical: logical records in files
  • No transactions, no concurrency management
  • Proprietary parsing of C++ subset (CINT)
  • Objects are converted when retrieved (Streamer)
  • Automatically or by user (schema-evolution only
    by user)
  • No naming, indexing or query mechanisms
  • but CINT scripting
  • Powerful interactive environment

18
(Wrapped O)RDBMS
  • Powerful, reliable and efficient storage managers
    with full concurrency and transaction management
  • SQL query mechanisms with transparent (hidden)
    indexing and naming
  • User friendly, fully integrated browsers and
    tools
  • (for relational tables)
  • Poor object integration
  • (developers should be both OO and ER experts at
    the same time)

19
  • Persistency
  • in
  • HEP

20
HEP Data
  • Environmental data
  • Detector and Accelerator status
  • Calibrations, Alignments
  • Event-Collection Meta-Data
  • (luminosity, selection criteria, ...)
  • Event Data, User Data

21
Environmental Data
[Figure: Geometry (Versions A, B, C), Alignment (Versions A, B, C) and Calibration (Versions A, B) parameters along a time axis]
Snapshot of the environmental data items valid for
the currently processed event.
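
The snapshot can be sketched as a lookup of, for each item, the version whose validity started most recently before the event time. This is a minimal illustration: VersionedItem, the start times and the version labels are invented for the example and do not come from any experiment's conditions database.

```python
from bisect import bisect_right


class VersionedItem:
    """Hypothetical item whose versions are each valid from a start time onward."""

    def __init__(self):
        self._starts = []
        self._versions = []

    def add(self, valid_from, version):
        # Keep both lists sorted by start time.
        index = bisect_right(self._starts, valid_from)
        self._starts.insert(index, valid_from)
        self._versions.insert(index, version)

    def valid_at(self, time):
        index = bisect_right(self._starts, time) - 1
        if index < 0:
            raise LookupError("no version valid at this time")
        return self._versions[index]


calibration = VersionedItem()
calibration.add(0, "A")
calibration.add(100, "B")

alignment = VersionedItem()
alignment.add(0, "A")
alignment.add(50, "B")
alignment.add(150, "C")

ITEMS = {"calibration": calibration, "alignment": alignment}


def snapshot(items, event_time):
    """Collect the version of every item valid for the given event time."""
    return {name: item.valid_at(event_time) for name, item in items.items()}


early = snapshot(ITEMS, 120)
late = snapshot(ITEMS, 150)
```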
22
Event Structure & Placement (BaBar)
[Figure: Event Header with Tag and the Sim, Raw, Emc, Trk, Pid and Beta headers (Evs, Hdr); the data are placed in separate databases: Sim (Sim Data), Raw (Raw Data), Rec (Emc, Trk, Pid Data), Esd (Emc, Trk, Pid, Beta Data), Aod (Trk, Pid, Beta Data)]
23
BaBar Event Structure
  • Decoupling of placement and navigation
  • Hierarchical Placement Regions
  • Sim (Simulated Data) 100kBytes/event
  • Tru (Simulated Truth Data) 40kBytes/event
  • Raw (Raw Data) 30kBytes/event
  • Rec (Reconstructed Data) 100kBytes/event
  • Esd (Event Summary Data) 20kBytes/event
  • Aod (Analysis Object Data) 2kBytes/event
  • Tag (Event Selection Tag) 200Bytes/event
  • Navigation Trees
  • Minimize size of navigation headers
  • Allow for expansion of data without schema
    evolution
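
The per-event sizes above translate directly into storage volumes per placement region. The short calculation below assumes decimal units (1 kByte = 1000 bytes, 1 TB = 10^12 bytes) and a sample of 10^9 events; the event count is an assumption chosen for illustration, not a BaBar figure.

```python
# Bytes per event for each placement region, as quoted on the slide.
BYTES_PER_EVENT = {
    "Sim": 100_000,
    "Tru": 40_000,
    "Raw": 30_000,
    "Rec": 100_000,
    "Esd": 20_000,
    "Aod": 2_000,
    "Tag": 200,
}

N_EVENTS = 10**9  # assumed sample size, for illustration only


def volume_tb(region, n_events):
    """Total volume of one region in terabytes (10**12 bytes)."""
    return BYTES_PER_EVENT[region] * n_events / 10**12


volumes = {region: volume_tb(region, N_EVENTS) for region in BYTES_PER_EVENT}
raw_to_tag_ratio = BYTES_PER_EVENT["Raw"] // BYTES_PER_EVENT["Tag"]
```

With these assumptions the raw data occupy 30 TB while the tags fit in 0.2 TB, a factor of 150, which is why selection on tags is so much cheaper than a scan of the bulk data.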

24
Root Physical Clustering
25
ODBMS-MSS Integration
  • SLAC-Objy Plan
  • Extensible AMS
  • Allows use of any type of filesystem via oofs
    layer
  • Generic Authentication Protocol
  • Allows proper client identification
  • Opaque Information Protocol
  • Allows passing of hints to improve filesystem
    performance
  • Defer Request Protocol
  • Accommodates hierarchical filesystems
  • Redirection Protocol
  • Accommodates terabyte filesystems
  • Provides for dynamic load balancing

26
Dynamic Load Balancing Hierarchical Secure AMS
[Figure: a client, Dynamic Selection among three ams servers, each shown with Redwood, and hpss]
27
One Technology for All ?
  • Event catalogues
  • Update (add and remove) items of a catalogue
  • Searchable SQL or equivalent
  • Event data
  • Write once-read many (WORM)
  • Often on tertiary (sequential) storage
  • Bulk data used by the entire collaboration (Raw,
    Rec, ...)
  • User extracted data (N-tuples)

28
One Technology for All ?
  • Detector data
  • Updates of data items
  • Versioning of data items
  • Version configuration
  • Statistical data
  • Understandable by interactive tools
  • A single coherent solution (non optimal for all
    purposes)
  • or
  • Ad-hoc optimal product for each given type?

29
LHCb Event Persistency
[Figure: AppManager, Algorithms and OutputStreams around the Transient Event Store; Event Data Service and Persistency Service; SicbCnvSvc with its Converters reading Sicb data files through Sicb/Zebra; RootCnvSvc with its Converters using Root I/O on Root data files]
30
LHCb Generic Persistent Model
[Figure: 12-byte OID (Technology, <number>), Lookup table, Converter; steps (1) to (4)]
31
LHCb Link Tables
  • One Link table per Storage technology per DB
  • Link to Objy object
  • no link table
  • 8 Bytes are enough to hold ooRef directly
  • Link to ROOT object
  • Link table entry must contain all navigation info
  • File name
  • Tree/Branch name
  • Link to ZEBRA (SICB) object
  • Link Table contains file name and ZEBRA bank name
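
The link-table idea can be sketched as follows: a generic reference carries a technology tag plus a number, and for file-based technologies the number indexes a table holding the full navigation information. Everything below (GenericRef, LinkTable, the file, tree and bank names, the reference value) is hypothetical and only mirrors the bullets above; it is not the LHCb implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericRef:
    """Hypothetical technology-neutral reference."""
    technology: str  # "OBJY", "ROOT" or "ZEBRA"
    payload: int     # direct reference value, or index into a link table


class LinkTable:
    """Stores each distinct navigation entry once and hands out its index."""

    def __init__(self):
        self._entries = []
        self._index = {}

    def add(self, entry):
        if entry not in self._index:
            self._index[entry] = len(self._entries)
            self._entries.append(entry)
        return self._index[entry]

    def lookup(self, number):
        return self._entries[number]


# One link table per storage technology; none is needed for the direct case.
LINK_TABLES = {"ROOT": LinkTable(), "ZEBRA": LinkTable()}


def make_ref(technology, target):
    if technology == "OBJY":
        return GenericRef(technology, target)  # value held directly
    return GenericRef(technology, LINK_TABLES[technology].add(target))


def resolve(ref):
    if ref.technology == "OBJY":
        return ref.payload
    return LINK_TABLES[ref.technology].lookup(ref.payload)


root_ref = make_ref("ROOT", ("run1.root", "Events/Tracks"))
same_ref = make_ref("ROOT", ("run1.root", "Events/Tracks"))
zebra_ref = make_ref("ZEBRA", ("run1.sicb", "BANK"))
objy_ref = make_ref("OBJY", 12345)
```

Many references to the same file and branch share one table entry, so the reference itself stays small.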

32
Hybrid Event Store in STAR
  • Adoption of ROOT I/O for the event store leaves
    Objectivity with one role left to cover: the true
    database functions of the event store
  • Navigation among event collections, runs/events,
    event components
  • Data locality (now translates basically to file
    lookup)
  • Management of dynamic, asynchronous updating of
    the event store from one end of the processing
    chain to the other
  • From initiation of an event collection (run) in
    online through addition of components in
    reconstruction, analysis and their iterations
  • But with the concerns and weight of Objectivity
    it is overkill for this role.
  • So we went shopping
  • looking to leverage the world around us, as
    always
  • and eyeing particularly the rising wave of
    Internet-driven tools and open software
  • and came up with MySQL in May.

33
Requirements: STAR 8/99 View
34
RHIC Data Management Factors For Evaluation
  • Changes in the STAR view from '97 to now are
    shown
  • Objy   Root+MySQL   Factor
  • ? ? Cost
  • ? ? Performance and capability as data access
    solution
  • ? ? Quality of technical support
  • ? ? Ease of use, quality of doc
  • ? ? Ease of integration with analysis
  • ? ? Ease of maintenance, risk
  • ? ? Commonality among experiments
  • ? ? Extent, leverage of outside usage
  • ? ? Affordable/manageable outside RCF
  • ? ? Quality of data distribution mechanisms
  • ? ? Integrity of replica copies
  • ? ? Availability of browser tools
  • ? ? Flexibility in controlling permanent
    storage location
  • ? ? Level of relevant standards compliance,
    eg. ODMG
  • ? ? Java access
  • ? ? Partitioning DB and resources among groups

35
  • Experiments
  • Status and Plans

36
CMS
  • Use Objy in production
  • Test Beam DAQ
  • Monte Carlo (GEANT3) reconstruction
  • Objectivity fully integrated in Application
    Framework (CARF)
  • CARF manages transactions, physical clustering
    and the whole persistent object structure and its
    relations with the transient structure
  • users access persistent objects through C++
    pointers
  • CARF takes care of pinning
  • leaf inheritance from ooObj often used

37
CMS
  • Limited use of Objectivity extensions
  • associations, indexes, maps, query predicates,
    etc.
  • object copy, move, versions
  • Schema evolution routinely used
  • No complex object conversion attempted so far
  • Multi-federation environment to decouple
  • production
  • analysis
  • development

38
CMS Production Federations
[Figure: Online FD (Online Boot), Offline FD (Offline Boot) and Clone FD, with databases Run1, Run2, Run3, ... RunN, RunCat, Conf, Us1, Us2. Annotations: "Empty user dbs, system dbs, last run-data db" and "Copy of: empty user dbs, system dbs, all run-data db"]
39
CMS User Federations
[Figure: User FD (User Boot), Offline FD (Offline Boot) and Clone FD, with databases Run1, Run2, Run3, ... RunN, RunCat, Conf, Us1, Us2. Annotations: "populates user dbs, links system dbs, copies or links run-data db" and "Empty user dbs, system dbs, all run-data db"]
40
Atlas
  • Used Objectivity in several test-bed applications
  • HCAL test-beam
  • ATLFAST
  • 1TB Milestone (HPSS used as MSS)
  • Plan to use Objectivity in future test-beams
    and Monte Carlo reconstruction
  • The application framework will provide a
    database independent interface

41
ALICE
  • Simulation and reconstruction framework fully
    integrated in ROOT
  • Used in TestBeams
  • (actually a real Heavy Ion experiment)
  • Mock Data Challenge: 7 TB in seven days
  • MonteCarlo simulation and reconstruction
  • Use HPSS and/or CASTOR for file management
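
The data-challenge figure implies a sustained rate that is easy to check. The calculation assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and continuous running over the seven days.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(total_bytes, days):
    """Average rate in MB/s needed to move total_bytes in the given number of days."""
    return total_bytes / (days * SECONDS_PER_DAY) / 1e6


rate = sustained_rate_mb_per_s(7 * 10**12, 7)
```

Under these assumptions the result is roughly 11.6 MB/s, averaged over the whole week.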

42
ALICE DC II
[Figure: ALICE DC II data flow. Data sources: NA57 and the ALICE DAQ (DATE, GDC, LDC). Elements shown: LDCs (9 PowerPC/AIX; Intel/Linux PC cluster of 10/15 nodes), switches, GB Ethernet, GDC/Event Builder, pipe, ROOT Objectifier (Intel/PC Linux, PowerPC/AIX, Sun), HPSS and CASTOR in the Computer Centre. Labelled rates: 5 MB/s and 10 MB/s]
43
LHCb
  • Do not want to limit to one persistency
    technology
  • Speed, when you need speed
  • Functionality, when you need functionality
  • Ease migration to upcoming (superior)
    technologies
  • Independence
  • Well defined interface to persistency
    technologies
  • Interface: abstract, technology-independent API
  • Example: ODBC for relational DBs

44
LHCb
  • LHCb application framework (GAUDI) is independent
    from persistent technology
  • Manage its own application caches (data services)
    specialized in
  • event data, detector data, statistical data
  • Provides abstract interface for user provided
    converters
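
An abstract converter interface of this kind can be sketched with an abstract base class and two interchangeable back-ends. The class and method names are assumptions made for the example and are not the GAUDI API; the two "technologies" are trivial in-memory stand-ins.

```python
from abc import ABC, abstractmethod


class Track:
    """Hypothetical transient object seen by algorithms."""

    def __init__(self, pt):
        self.pt = pt


class Converter(ABC):
    """Technology-independent interface between transient and stored forms."""

    @abstractmethod
    def to_persistent(self, obj):
        ...

    @abstractmethod
    def to_transient(self, record):
        ...


class DictTrackConverter(Converter):
    """Stand-in back-end that stores tracks as dictionaries."""

    def to_persistent(self, obj):
        return {"pt": obj.pt}

    def to_transient(self, record):
        return Track(record["pt"])


class TextTrackConverter(Converter):
    """Stand-in back-end that stores tracks as text."""

    def to_persistent(self, obj):
        return f"{obj.pt}"

    def to_transient(self, record):
        return Track(float(record))


class PersistencyService:
    """Uses whichever converter it is given; user code does not change."""

    def __init__(self, converter):
        self._converter = converter
        self._store = []

    def write(self, obj):
        self._store.append(self._converter.to_persistent(obj))
        return len(self._store) - 1

    def read(self, index):
        return self._converter.to_transient(self._store[index])


results = []
for converter in (DictTrackConverter(), TextTrackConverter()):
    service = PersistencyService(converter)
    key = service.write(Track(2.5))
    results.append(service.read(key).pt)
```

Swapping the back-end changes only the converter passed to the service, which is the point of keeping the framework independent of the persistency technology.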

45
BaBar
  • Taking data since May
  • Use Objectivity for all kinds of data
  • many home made tools to manage the database
  • Complete decoupling between transient objects
    (seen by end user) and their persistent
    representations
  • No schema evolution (explicit renaming of
    classes)
  • Starting to use multiple federations to decouple
    running environments

46
STAR
  • Hybrid solution
  • ROOT for event file
  • MySQL for event catalog and environmental data
  • MySQL under test for event tags as well
  • HPSS (through Grand Challenge) for tertiary
    storage management

48
Fermilab Run II (CDF, DØ)
  • Sequential access model based on Run I experience
  • focus on efficient data access from hierarchical
    storage
  • clustering optimized to largest data volume
    access pattern
  • Use
  • ROOT (CDF), EVpack (modified DSPACK) (DØ) for
    event files (MSQL and Oracle8 evaluated by DØ)
  • just I/O back-ends to EDM and DØOM
  • SAM for event catalog and file management
  • Oracle8 supporting database

49
Data Organization
[Figure: data organization showing user and physics group (derived) data, metadata, event information tiers, warm cache and physical clustering. From Oct 1997 Review, Lee Lueking]
50
Data Access
[Figure: data flow from mass storage (tape storage, disk storage) through named pipelines (Thumbnail, Freight Train, Pick Event, User File) to consumers (single user, group of users), with file, event and pipeline metadata. Lee Lueking, October 1997]
51
Season IV - aggregate bandwidths, summed from
spreadsheet
52
  • (non-technical)
  • Risk
  • Analysis

53
Toward 2001 Milestone
  • If the ODBMS industry flourishes it is very
    likely that by 2005 CMS will be able to obtain
    products, embodying thousands of man-years of
    work, that are well matched to its worldwide data
    management and access needs. The cost of such
    products to CMS will be equivalent to at most a
    few man-years. We believe that the ODBMS industry
    and the corresponding market are likely to
    flourish. However, if this is not the case, a
    decision will have to be made in approximately
    the year 2000 to devote some tens of man-years of
    effort to the development of a less satisfactory
    data management system for the LHC experiments.
  • (CMS Computing Technical Proposal, section 3.2,
    page 22)

54
Commercial vs GPL
  • Commercial
  • Robust, tested, maintained, well documented (is
    stable)
  • Response to upgrade requests is slow
  • They cannot jeopardize deployed applications
  • priority given to short term profit
  • difficult to understand internal details (no
    source)
  • but in principle documentation should be enough
  • can go out of business
  • GPL
  • Good enough for physicists
  • Require internal certification
  • Response to upgrade requests even too fast
  • old users usually ready to jump on new features
  • priority given to challenging requests...
  • Open source
  • often you need it.
  • Author could get bored

55
ODBMS
  • Objectivity seems to satisfy HEP technical
    requirements
  • Needs upgrade for
  • VLDB support
  • Mass storage interface
  • remote access and data distribution
  • More than a DBMS, it is a DB access layer
  • needs to be integrated (or interfaced) with
    application frameworks and administration
    tools
  • It is the only real ODBMS survivor on the market
  • how long will it last?

56
ROOT
  • A physics analysis framework with I/O support
  • Classified also as a rapid-prototyping tool
    (B.Meyer)
  • Not sufficient for the management of large data
    volumes (LHC major requirement)
  • an external DBMS is required to manage Meta-Data
  • Limited experience so far (as POM in production)
  • Many motivated users actively supported by the
    authors
  • Requires major architectural changes to make it
    modular
  • for those who do not want to use it as a framework

57
Yet Another POM
  • Prototype required to understand problems and
    estimate effort
  • Usable as test-bed before asking upgrades to a
    commercial partner
  • Usable as light-pom?
  • no transaction, no journaling, no schema, just
    data
  • (Objectivity can be used in this mode, just make
    sure to write-protect input files!!!)

58
Personal Comments
  • Event Data
  • object modeling and direct navigation OK
  • DBMS tools (query processor, smart-association,
    index, names, versions) more a burden than a
    help
  • Event Catalog, Environmental data, Detector
    description
  • fit better standard (O)DBMS practices and tools
  • Statistical data
  • simple I/O is not enough, need direct relations
    with event catalog and event data
  • Relational models do not suit HEP applications

59
Personal Comments
  • My personal LEP experience brought me to the
    conclusion that a multitude of persistency
    solutions is difficult to manage and integrate
    properly.
  • In particular a file-based event store (with
    filenames encoding metadata) does not scale.
  • My current (limited) experience tends to convince
    me more and more that a coherent approach to
    persistency is the only solution for LHC given
    the resource constraints we have

60
Personal Comments
  • Applications need to be independent of
    underlying technologies
  • Migration to a new technology should imply a
    finite effort
  • Market survey 0.5PY
  • Learning 1 year
  • Implementation 1PY
  • User Migration 0?
  • (P stands for Person not Peta!)

61
Questions
  • One size fits all?
  • One coherent solution
  • several tools optimized for each problem
  • how integration goes (transaction synchronization
    for instance)?
  • How integrated should a POM/DBMS be in the
    application framework?
  • Is hierarchical storage incompatible with
    transparent object navigation?
  • Optimization of distributed resources needs
    preemptive localization of data to be accessed

62
Questions
  • Do HEP applications require 4GL query processors?
  • Is not a (multiple) OO language binding enough?
  • Is Objectivity a possible choice?
  • From technical, political, managerial sides
  • Does ROOT + RDBMS scale to LHC data volumes?
  • RD45 was initiated with the idea that a
    file-based event store + metadata catalog (LEP
    like) would not be sufficient...
  • What should be the objective of an alternative
    POM prototype?

63
Questions
  • Did we forget something?