Transcript and Presenter's Notes

Title: HEP Needs - Current and Future, John Harvey (CERN/PH-SFT)


1
HEP Needs - Current and Future
John Harvey (CERN/PH-SFT)
  • GGF10 / PNPA RG Workshop
  • Humboldt University and Zuse Institute Berlin
  • March 13th, 2004

Very much from a Large Hadron Collider (LHC)
perspective
2
Acknowledgements
  • I am grateful for discussions with
  • BaBar: Jacek Becla, Igor Gaponenko
  • EDG: Frank Harris, Bob Jones, Erwin Laure
  • LHC: Federico Carminati, Vincenzo Innocente, Pere
    Mato, David Quarrie, Ian Willers

3
Talk Outline
  • Physics signatures and rates
  • Data handling processes and datasets
  • Event data, catalogues and collections,
    conditions data
  • Common Persistency Framework (POOL)
  • Computing Model
  • Data Challenges
  • Concluding Remarks

4
The ATLAS Detector
  • The ATLAS collaboration is
  • 2000 physicists from
  • 150 universities and labs
  • in 34 countries
  • distributed resources
  • remote development
  • The ATLAS detector is
  • 26 m long,
  • stands 20 m high,
  • weighs 7000 tons
  • has 200 million read-out channels

5
ATLAS Physics Signatures and Event Rates
  • LHC pp collisions at √s = 14 TeV
  • Bunches cross at 40 MHz
  • σ(inelastic) ≈ 80 mb
  • at high luminosity, >> 1 pp collision per crossing
  • ~10^9 collisions per second
  • Study different physics channels, each with its
    own signature, e.g.
  • Higgs
  • Supersymmetry
  • B physics
  • Interesting physics events are buried in
    backgrounds of uninteresting physics events (1
    in 10^5 - 10^9); a rate estimate is sketched below
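As a sanity check on these numbers, here is a minimal back-of-the-envelope sketch. The design luminosity of 10^34 cm^-2 s^-1 is an assumption (it is not stated on the slide); the 80 mb inelastic cross-section and the 40 MHz crossing rate come from the bullets above.

```python
# Back-of-the-envelope collision rate at nominal LHC luminosity.
# Assumed: design luminosity L = 1e34 cm^-2 s^-1 (not stated on the slide);
# the ~80 mb inelastic cross-section and 40 MHz crossing rate are from the slide.

MB_TO_CM2 = 1e-27                 # 1 millibarn = 1e-27 cm^2

luminosity = 1e34                 # cm^-2 s^-1 (assumed nominal design value)
sigma_inel = 80 * MB_TO_CM2       # inelastic pp cross-section, cm^2
crossing_rate = 40e6              # bunch-crossing rate, Hz

collision_rate = luminosity * sigma_inel     # pp collisions per second
pileup = collision_rate / crossing_rate      # mean collisions per bunch crossing

print(f"collision rate ~ {collision_rate:.1e} /s")              # ~8e8, i.e. ~1e9 /s
print(f"pile-up        ~ {pileup:.0f} pp collisions/crossing")  # ~20
```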

6
HEP Processing stages and datasets
[Data-flow diagram: detector → event filter (selection and reconstruction)
→ raw data → event reconstruction → processed data / Event Summary Data
(ESD) → batch physics analysis → Analysis Object Data (AOD, extracted by
physics topic) → individual physics analysis; event simulation also feeds
the chain]
7
Data Hierarchy
RAW - 2 MB/event (~10^9 events/yr, ~2 PB/yr)
  Triggered events recorded by the DAQ
  Detector digitisation
ESD - 100 kB/event
  Reconstructed information
  Pattern recognition information: clusters, track candidates
AOD - 10 kB/event
  Analysis information
  Physical information: transverse momentum, association of particles,
  jets, (best) identification of particles, ...
TAG - 1 kB/event
  Relevant information for fast event selection
  Classification information
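The per-tier yearly volumes follow directly from these per-event sizes. The short sketch below just multiplies them out, assuming the ~10^9 events/year quoted for RAW applies to every tier.

```python
# Yearly data volume per tier, from the per-event sizes in the hierarchy above
# and the ~1e9 recorded events/year quoted for RAW.

events_per_year = 1e9

size_per_event = {       # bytes/event, from the slide
    "RAW": 2e6,          # 2 MB
    "ESD": 100e3,        # 100 kB
    "AOD": 10e3,         # 10 kB
    "TAG": 1e3,          # 1 kB
}

for tier, size in size_per_event.items():
    volume_tb = events_per_year * size / 1e12   # terabytes/year
    print(f"{tier}: {volume_tb:,.0f} TB/year")
# RAW comes out at ~2,000 TB/year, i.e. the ~2 PB/year quoted above.
```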
8
CERN Centre Capacity Requirements for all
experiments (estimate made July 2003)
9
Event Data
  • Complex data models
  • 500 structure types
  • References to describe relationships between
    event objects
  • unidirectional
  • Need to support transparent navigation
  • Need ultimate resolution on selected events
  • need to run specialised algorithms
  • work interactively
  • Not affordable if navigation is uncontrolled (see
    the lazy-reference sketch below)
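To illustrate what controlled, transparent navigation can look like, here is a minimal sketch of a lazy reference that touches the underlying storage only when its target is actually used. The class and method names are invented for illustration; this is not POOL's or any experiment framework's API.

```python
# Illustrative sketch of controlled navigation between event objects:
# a reference is stored as a lightweight token and resolved (touching the
# underlying storage) only when the target object is really needed.

class Ref:
    """Lazy reference to an object living in some storage service."""
    def __init__(self, storage, object_id):
        self._storage = storage
        self._object_id = object_id
        self._cached = None

    def get(self):
        if self._cached is None:                 # resolve on first use only
            self._cached = self._storage.load(self._object_id)
        return self._cached

class DictStorage:
    """Stand-in for a persistency service keyed by object id."""
    def __init__(self, objects):
        self._objects = objects

    def load(self, object_id):
        print(f"loading {object_id}")            # shows when I/O really happens
        return self._objects[object_id]

store = DictStorage({"track_42": {"pt": 12.3}})
ref = Ref(store, "track_42")       # creating the reference costs nothing
print(ref.get()["pt"])             # the storage is touched only here
```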

10
HEP Metadata - Event Collections
Bookkeeping
11
Detector Conditions Data
  • Reflects changes in state of the detector with
    time
  • Event Data cannot be reconstructed or analyzed
    without it
  • Versioning
  • Tagging
  • Ability to extract the slice of data required to
    run a job
  • Long life-time

[Diagram: condition versions plotted against time, with a tag (Tag1)
selecting one version per interval of validity; a sketch of this
bookkeeping follows below]
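A minimal sketch of how intervals of validity, versions and tags might be organised is shown below. The layout and names are illustrative assumptions, not the schema of any real conditions database.

```python
# Minimal sketch of conditions bookkeeping: each value has an interval of
# validity (IOV) in time, and a "tag" names one consistent set of IOVs.

from bisect import bisect_right

class ConditionsFolder:
    def __init__(self):
        self._iovs = {}      # tag -> sorted list of (since, until, payload)

    def add(self, tag, since, until, payload):
        self._iovs.setdefault(tag, []).append((since, until, payload))
        self._iovs[tag].sort()

    def lookup(self, tag, time):
        """Return the payload valid at 'time' for the given tag."""
        iovs = self._iovs.get(tag, [])
        starts = [since for since, _, _ in iovs]
        i = bisect_right(starts, time) - 1
        if i >= 0 and iovs[i][1] > time:
            return iovs[i][2]
        raise KeyError(f"no conditions for tag={tag!r} at t={time}")

folder = ConditionsFolder()
folder.add("Tag1", since=0,   until=100, payload={"pedestal": 1.02})
folder.add("Tag1", since=100, until=200, payload={"pedestal": 1.05})
print(folder.lookup("Tag1", time=150))   # -> {'pedestal': 1.05}
```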
12
Conditions Database
[Diagram: the master copy of the conditions DB is fed from the online DB;
replicas are kept at remote sites, and slices are extracted for
calibration and reconstruction/analysis applications (and their
algorithms) running on worker nodes, all handled through Cond DB
management tools]
  • World-wide access: use the closest available copy
  • Need to accept updates and synchronise loosely
    coupled copies
  • Need tools for data management, data browsing,
    replication, slicing, ... (slicing is sketched below)
  • Use supported database features (transactions,
    replication, querying, ...)
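The slicing mentioned above can be sketched as a simple interval-overlap extraction; the data structures are again illustrative placeholders, not real management tools.

```python
# Sketch of "slicing": extract only the conditions a job needs (one tag and a
# time window) from the master copy, so a small extract can travel with the
# job to the worker node.

def extract_slice(iovs, t_start, t_end):
    """Keep the (since, until, payload) intervals overlapping [t_start, t_end)."""
    return [
        (since, until, payload)
        for since, until, payload in iovs
        if since < t_end and until > t_start      # interval-overlap test
    ]

# Master copy for one tag: a list of (since, until, payload).
master = [
    (0,   100, {"pedestal": 1.02}),
    (100, 200, {"pedestal": 1.05}),
    (200, 300, {"pedestal": 1.07}),
]

job_slice = extract_slice(master, t_start=90, t_end=120)
print(job_slice)   # the first two IOVs overlap the job's time window
```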

13
LHC Data Management Requirements
  • Increasing focus on maintainability and change
    management for core software due to the long LHC
    lifetime
  • anticipate changes in technology
  • adapt quickly to changes in the environment and
    physics focus
  • Common solutions will considerably simplify the
    deployment and operation of data management in
    centres distributed worldwide
  • Common persistency framework (POOL project)
  • Strong involvement of the experiments from the
    beginning is required to provide requirements
  • some experimentalists participate directly in
    POOL
  • some work with software providers on integration
    into experiment frameworks

14
Common Persistency Framework (POOL)
  • Provides persistency for C++ transient objects
  • Supports transparent navigation between objects
    across file and technology boundaries
  • without requiring the user to explicitly open files
    or database connections
  • Follows a technology-neutral approach
  • Abstract component C++ interfaces
  • Insulates experiment software from concrete
    implementations and technologies
  • Hybrid technology approach combining
  • Streaming technology for complex C++ objects
    (event data)
  • event data - typically write once, read many
    (concurrent access simple)
  • Transaction-safe Relational Database (RDBMS)
    services
  • for catalogs, collections and other metadata
  • Allows data to be stored in a distributed and
    grid-enabled fashion
  • Integrated with an external File Catalog to keep
    track of the files' physical locations, allowing
    files to be moved or replicated (a catalog lookup
    is sketched below)
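The file-catalog idea can be illustrated with a small sketch mapping a logical file name to one of several physical replicas. The catalogue layout, site names and method names are assumptions for illustration, not POOL's actual interface.

```python
# Sketch of a file catalog: applications refer to data by a logical name;
# the catalog maps it to a physical replica, so files can be moved or
# replicated without touching user code.

class FileCatalog:
    def __init__(self):
        self._replicas = {}   # logical file name -> list of physical locations

    def register(self, lfn, pfn):
        self._replicas.setdefault(lfn, []).append(pfn)

    def resolve(self, lfn, preferred_site=None):
        """Pick a physical replica, preferring one at the local site if present."""
        pfns = self._replicas.get(lfn, [])
        if not pfns:
            raise KeyError(f"no replica registered for {lfn}")
        for pfn in pfns:
            if preferred_site and preferred_site in pfn:
                return pfn
        return pfns[0]

catalog = FileCatalog()
catalog.register("lfn:run1234/events.root", "castor://cern.ch/data/run1234.root")
catalog.register("lfn:run1234/events.root", "dcache://site-b.example/run1234.root")
print(catalog.resolve("lfn:run1234/events.root", preferred_site="site-b.example"))
```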

15
POOL Component Breakdown
16
A Multi-Tier Computing Model
[Diagram: Tier 0 (experiment host lab) → Tier 1 (main regional centres) →
Tier 2 → Tier 3 → desktop, shown from both the user's view and the
manager's view]
17
CMS Data Challenge DC04
[Diagram, three linked challenges:]
  • DC04 T0 challenge: fake DAQ at CERN delivering 25 Hz at 1.5 MB/event
    (raw + ESD), i.e. 50 MB/s and 4 TB/day, into a 40 TB CERN disk pool;
    reconstruction produces event streams (e.g. Higgs DST, SUSY background
    DST) that go to the CERN tape archive; 50M events, 75 TB in total
  • DC04 calibration challenge: calibration samples and calibration jobs at
    T1 sites feeding the master conditions DB at T0
  • DC04 analysis challenge: TAG/AOD (10 kB/event) replicated to T2 sites
(A throughput cross-check is sketched below.)
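The throughput figures in the diagram are easy to cross-check. The sketch below reproduces them from the quoted numbers; the gap between 25 Hz × 1.5 MB ≈ 37.5 MB/s and the quoted 50 MB/s presumably covers additional streams and overheads.

```python
# Cross-check of the DC04 throughput and volume figures quoted above.
rate_hz = 25                # events/s into the T0 challenge
event_size_mb = 1.5         # raw + ESD per event
seconds_per_day = 86_400

mb_per_s = rate_hz * event_size_mb          # ~37.5 MB/s of raw + ESD
tb_per_day = 50 * seconds_per_day / 1e6     # 50 MB/s sustained -> ~4.3 TB/day
total_tb = 50e6 * event_size_mb / 1e6       # 50M events -> ~75 TB

print(mb_per_s, round(tb_per_day, 1), total_tb)   # 37.5 4.3 75.0
```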
18
Experience in Data Challenges
  • All LHC experiments have well-developed
    distributed batch production systems, each
    running on 30-40 sites, with typically > 70% of
    production outside CERN
  • In the past 2 years considerable experience has
    been gained with middleware coming from US and
    European grid projects, and from home-grown
    developments in the experiments
  • Consequently we have a much better grasp of the
    problems
  • Grid sites were responsible for 5-10% of the total
    production
  • Productions on grid sites ran with typically
    80-90% efficiency in 2003

19
(No Transcript)
20
Experience in Data Challenges
  • Importance of interoperability of different Grids
  • essential for HEP applications to run worldwide
  • standards will help, e.g. the Web Services
    Resource Framework
  • Site configuration and software installation were
    the source of some of the biggest problems
  • sites must adhere to standards and work as
    expected
  • Site policies need to be flexible
  • anticipate a paradigm shift from well-organised
    productions to interactive physics analysis
  • a physicist may need to run his software
    environment on worker nodes anywhere on the grid
  • HEP is providing a pilot grid application for
    grid development worldwide

21
Distributed Analysis - the real challenge
  • Analysis will be performed with a mix of
    official experiment software and private user
    code
  • How can we make sure that the user code can
    execute and provide a correct result wherever it
    lands?
  • Input datasets are not necessarily known a priori
  • Possibly very sparse data access patterns when
    only very few events match the query
  • Large numbers of people submitting jobs
    concurrently and in an uncoordinated fashion,
    resulting in a chaotic workload
  • Wide range of user expertise
  • Need for interactivity - requirements on system
    response time rather than throughput
  • Ability to suspend an interactive session and
    resume it later, in a different location
  • Need a continuous dialogue between developers and
    users

22
Concluding Remarks
  • HEP applications are characterised by
  • the amounts and complexity of the data
  • the large size and geographically dispersed nature
    of the collaborations
  • Grids offer many advantages
  • to providers: centres supplying resources in a
    multi-science environment in a flexible, secure
    and managed way
  • to consumers hungry for those resources
    (potential of resource discovery)
  • consistent configurations (system, applications)
  • transparent access to remote data
  • distributed analysis is where the potential of the
    grid will be best exploited
  • HEP is providing a pilot grid application for
    grid development worldwide
  • technologists and experimentalists working
    closely together
  • an incremental and iterative development model
    (frequent releases with rapid feedback) is a good
    way to ensure requirements and priorities are met

23
backup slides
24
LHC Computing Grid Project Organisation
[Organisation chart:]
  • LCG/SC2 - Software and Computing Committee:
    requirements, monitoring
  • LCG/PEB - Project Execution Board: management of
    the project
  • Areas: CERN fabric; applications (Grid
    Applications Group, GAG); grid technology /
    middleware; grid deployment (operate a single
    service); links to EGEE
  • Phases
  • Phase 1: R&D, 2002-2005
  • Phase 2: installation and commissioning of the
    initial LHC service
  • Participants
  • experiments
  • computer centres
  • core developers for applications and grid
25
LCG Service Time-line
[Timeline 2003-2007, experiments vs. computing service: testing with
simulated event productions; LCG-1 service; LCG-2 with upgraded
middleware; data challenges and prototype tests; second-generation
middleware prototyping and development; LCG-3 with 2nd-generation
middleware; validation of computing models; TDR (technical design
report); LCG Phase 2 service acquisition, installation and commissioning;
experiment setup and preparation; Phase 2 service in production; first
data]
26
The LHC Detectors
27
LHCb B Physics Signatures and Event Rates
  • Choose to run at 2 × 10^32 cm^-2 s^-1, dominated
    by single interactions
  • Makes it simpler to identify B decays
  • Enormous production rate at LHCb: ~10^12 b b-bar
    pairs per year (an estimate is sketched below)
  • At high energy there are more primary tracks, so
    tagging is more difficult
  • Expect 200,000 reconstructed B0 → J/ψ KS
    events/year
  • Expect 26,000 reconstructed B0 → π+π-
    events/year
  • Rare decays: BR(Bs → μ+μ-) ≈ 4 × 10^-9, expect 16
    events/year
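The 10^12 b b-bar pairs per year can be reproduced from the running luminosity quoted above, assuming a b b-bar production cross-section of roughly 500 μb (a commonly quoted value for LHC energies, not stated on the slide) and about 10^7 seconds of physics running per year.

```python
# Back-of-the-envelope check of the ~1e12 b-bbar pairs/year figure.
# Assumed (not on the slide): sigma(pp -> b bbar X) ~ 500 microbarn and
# ~1e7 seconds of effective physics running per year.

UB_TO_CM2 = 1e-30                 # 1 microbarn = 1e-30 cm^2

luminosity = 2e32                 # cm^-2 s^-1, LHCb running point from the slide
sigma_bb = 500 * UB_TO_CM2        # assumed b-bbar production cross-section, cm^2
seconds_per_year = 1e7            # assumed effective running time per year

pairs_per_year = luminosity * sigma_bb * seconds_per_year
print(f"~{pairs_per_year:.0e} b-bbar pairs/year")   # ~1e12, as on the slide
```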

28
Data Organisation
29
LHCb Production Tools
[Workflow diagram: a production manager interacts with central services
(workflow editor, application packager, production database, job
monitoring, bookkeeping and file/replica catalog services, exchanging XML
production requests) and with a GANGA interface for job submission, data
selection and application configuration; production agents at each site
(Site A, Site B, ..., Site n) instantiate production jobs from workflows
and packaged applications (tar files), report status to the monitoring
service, register datasets and replicas with the bookkeeping service and
replica manager, and store output on local storage elements and in
central storage (Castor MSS at CERN)]
30
Distributed MC Production in LHCb Today
Transfer data to mass-store at CERN (CASTOR)
Submit jobs remotely via Web
Execute on farm
Update bookkeeping database (Oracle at CERN)
Data Quality Check on data stored at CERN
Monitor performance of farm via Web
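Purely as an illustration of how these steps chain together, here is a schematic sketch of one production cycle. Every function body is a placeholder standing in for a real tool (batch system, CASTOR transfer, Oracle bookkeeping), not actual LHCb production code.

```python
# Schematic sketch of one production cycle; all endpoints are placeholders.

def execute_on_farm(job_id):
    # placeholder: run simulation/reconstruction on a worker node
    return f"/tmp/output_{job_id}.dst"

def copy_to_mass_store(local_path):
    # placeholder: transfer the file to mass storage (CASTOR) at CERN
    return local_path.replace("/tmp/", "castor://cern.ch/lhcb/")

def update_bookkeeping(job_id, stored_path):
    # placeholder: record the produced dataset in the bookkeeping DB (Oracle at CERN)
    print(f"bookkeeping: job {job_id} -> {stored_path}")

def run_production_job(job_id):
    output = execute_on_farm(job_id)
    stored = copy_to_mass_store(output)
    update_bookkeeping(job_id, stored)   # monitoring and quality checks would follow

run_production_job(42)
```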