Title: HEP Needs - Current and Future (John Harvey, CERN PH-SFT)
1 HEP Needs - Current and Future
John Harvey (CERN / PH-SFT)
- GGF10 / PNPA RG Workshop
- Humboldt University and Zuse Institute Berlin
- March 13th, 2004
Very much from a Large Hadron Collider (LHC) perspective
2 Acknowledgements
- I am grateful for discussions with
- BaBar: Jacek Becla, Igor Gaponenko
- EDG: Frank Harris, Bob Jones, Erwin Laure
- LHC: Federico Carminati, Vincenzo Innocente, Pere Mato, David Quarrie, Ian Willers
3 Talk Outline
- Physics signatures and rates
- Data handling processes and datasets
- Event data, catalogues and collections, conditions data
- Common Persistency Framework (POOL)
- Computing Model
- Data Challenges
- Concluding Remarks
4 The ATLAS Detector
- The ATLAS collaboration is
- 2000 physicists from ..
- 150 universities and labs
- from 34 countries
- distributed resources
- remote development
- The ATLAS detector is
- 26m long,
- stands 20m high,
- weighs 7000 tons
- has 200 million read-out channels
5 ATLAS Physics Signatures and Event Rates
- LHC pp collisions at √s = 14 TeV
- Bunches cross at 40 MHz
- σ(inelastic) ≈ 80 mb
- at high luminosity: >> 1 pp collision per crossing
- ~10^9 collisions per second (see the estimate below)
- Study different physics channels, each with its own signature, e.g.
- Higgs
- Supersymmetry
- B physics
- Interesting physics events are buried in backgrounds of uninteresting physics events (1 in 10^5 - 10^9)
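A quick consistency check on the quoted interaction rate, assuming the LHC design luminosity of 10^34 cm^-2 s^-1 (an assumption, not stated on the slide):

```latex
R_{\mathrm{inel}} = \sigma_{\mathrm{inel}}\,\mathcal{L}
  \approx 80\,\mathrm{mb} \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}
  = 8 \times 10^{-26}\,\mathrm{cm^{2}} \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}
  \approx 10^{9}\,\mathrm{s^{-1}}
```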
6 HEP Processing stages and datasets
[Data flow diagram]
detector → raw data → event filter (selection, reconstruction) → event reconstruction → processed data: Event Summary Data (ESD) → batch physics analysis → Analysis Object Data (AOD), extracted by physics topic → individual physics analysis
event simulation feeds the same processing chain
7 Data Hierarchy
- RAW: triggered events recorded by DAQ; detector digitisation; ~2 MB/event; 10^9 events/yr → ~2 PB/yr
- ESD: reconstructed information; pattern recognition output (clusters, track candidates); ~100 kB/event
- AOD: analysis information; physical information (transverse momentum, association of particles, jets, (best) id of particles); ~10 kB/event
- TAG: classification information; relevant information for fast event selection; ~1 kB/event
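For orientation, the implied annual volumes, taking the per-event sizes above at face value and assuming all ~10^9 events per year are kept at every tier (an assumption for the smaller tiers):

```latex
\begin{aligned}
V_{\mathrm{RAW}} &\approx 10^{9} \times 2\,\mathrm{MB} = 2\,\mathrm{PB/yr} \\
V_{\mathrm{ESD}} &\approx 10^{9} \times 100\,\mathrm{kB} = 100\,\mathrm{TB/yr} \\
V_{\mathrm{AOD}} &\approx 10^{9} \times 10\,\mathrm{kB} = 10\,\mathrm{TB/yr} \\
V_{\mathrm{TAG}} &\approx 10^{9} \times 1\,\mathrm{kB} = 1\,\mathrm{TB/yr}
\end{aligned}
```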
8 CERN Centre Capacity Requirements for all experiments (made July 2003)
9 Event Data
- Complex data models
- 500 structure types
- References to describe relationships between event objects (unidirectional)
- Need to support transparent navigation (see the sketch below)
- Need ultimate resolution on selected events
- need to run specialised algorithms
- need to work interactively
- Not affordable if uncontrolled
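To make the navigation idea concrete, here is a minimal C++ sketch of a unidirectional, lazily resolved reference between event objects. The `EventStore` interface and the token format are hypothetical stand-ins, not the actual experiment or POOL API.

```cpp
#include <memory>
#include <string>

// Hypothetical interface to whatever storage backend holds the event data.
struct EventStore {
    virtual ~EventStore() = default;
    // Load the object identified by a persistent token (e.g. "file:container:entry").
    virtual std::shared_ptr<void> load(const std::string& token) = 0;
};

// Minimal unidirectional reference: keeps only a token and resolves the target
// object on first dereference, so the user never opens files explicitly.
template <typename T>
class Ref {
public:
    Ref(EventStore* store, std::string token)
        : store_(store), token_(std::move(token)) {}

    const T* operator->() const {
        if (!cached_) {
            cached_ = std::static_pointer_cast<T>(store_->load(token_));
        }
        return cached_.get();
    }

private:
    EventStore* store_;
    std::string token_;
    mutable std::shared_ptr<T> cached_;   // resolved lazily, cached afterwards
};
```

A cluster holding a `Ref<Track>` would then pull the track into memory only when an algorithm actually dereferences it, which is what keeps full-resolution access to selected events affordable.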
10 HEP Metadata - Event Collections, Bookkeeping
11 Detector Conditions Data
- Reflects changes in the state of the detector with time
- Event data cannot be reconstructed or analyzed without it
- Versioning
- Tagging
- Ability to extract slices of data required to run with a job
- Long life-time
[Diagram: conditions data versions laid out against a time axis; a tag (e.g. "Tag1 definition") selects one version for each validity interval]
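The versioning / tagging / time structure above is essentially an interval-of-validity lookup. A minimal sketch in C++, with hypothetical type names (this is not the actual conditions database interface):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Illustrative interval-of-validity (IOV) store for one conditions folder,
// e.g. a set of calibration constants.
using TimeStamp = std::uint64_t;           // event time

struct ConditionsPayload {
    std::vector<double> calibration;       // the actual constants
};

struct IOV {
    TimeStamp since;                       // start of validity (inclusive)
    TimeStamp until;                       // end of validity (exclusive)
    ConditionsPayload payload;
};

class ConditionsFolder {
public:
    // Register a payload for a given validity interval under a named tag.
    void store(const std::string& tag, IOV iov) {
        const TimeStamp key = iov.since;
        tags_[tag].emplace(key, std::move(iov));
    }

    // Find the payload valid for 'tag' at event time 't', or nullptr if none.
    const ConditionsPayload* find(const std::string& tag, TimeStamp t) const {
        auto tagIt = tags_.find(tag);
        if (tagIt == tags_.end()) return nullptr;
        const auto& iovs = tagIt->second;
        auto it = iovs.upper_bound(t);     // first IOV starting after t
        if (it == iovs.begin()) return nullptr;
        --it;                              // IOV starting at or before t
        return (t < it->second.until) ? &it->second.payload : nullptr;
    }

private:
    // tag -> (since -> IOV); each tag is one consistent version history
    std::map<std::string, std::map<TimeStamp, IOV>> tags_;
};
```

Extracting a "slice" for a job then amounts to copying, for one tag, only the IOVs that overlap the job's time range.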
12 Conditions Database
[Diagram: the Online DB feeds a master copy of the Conditions DB; replicas and extracted slices are distributed to remote sites, where calibration and reconstruction/analysis applications (algorithms running on worker nodes) read them through Cond DB management tools]
- World-wide access: use closest available copy
- Need to accept updates and synchronise loosely coupled copies
- Need tools for data management, data browsing, replication, slicing, ...
- Use supported database features (transactions, replication, querying, ...)
13 LHC Data Management Requirements
- Increasing focus on maintainability and change management for core software due to the long LHC lifetime
- anticipate changes in technology
- adapt quickly to changes in environment and physics focus
- Common solutions will considerably simplify the deployment and operation of data management in centres distributed worldwide
- Common persistency framework (POOL project)
- Strong involvement of the experiments from the beginning is required to provide requirements
- some experimentalists participate directly in POOL
- some work with software providers on integration in experiment frameworks
14 Common Persistency Framework (POOL)
- Provides persistency for C++ transient objects
- Supports transparent navigation between objects across file and technology boundaries
- without requiring the user to explicitly open files or database connections
- Follows a technology-neutral approach
- Abstract component C++ interfaces
- Insulates experiment software from concrete implementations and technologies
- Hybrid technology approach combining
- Streaming technology for complex C++ objects (event data)
- event data are typically write once, read many (concurrent access is simple)
- Transaction-safe Relational Database (RDBMS) services
- for catalogs, collections and other metadata
- Allows data to be stored in a distributed and grid-enabled fashion
- Integrated with an external File Catalog to keep track of the files' physical locations, allowing files to be moved or replicated (see the sketch below)
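A rough sketch of the catalogue side of this hybrid approach: object references carry a logical file identifier, and a file catalogue maps it to the current physical location, so files can be moved or replicated without invalidating stored references. The class and member names below are illustrative, not the real POOL interfaces.

```cpp
#include <map>
#include <optional>
#include <string>

// Persistent token: enough to locate an object again, independent of where
// the file currently lives (illustrative layout, not POOL's actual token).
struct Token {
    std::string logicalFileId;   // stable identifier registered in the catalogue
    std::string container;       // container inside the file
    long        entry;           // entry number inside the container
};

// File catalogue: logical file id -> physical file name. Database- or
// file-backed in the real system; a plain map is enough for the sketch.
class FileCatalog {
public:
    void registerReplica(const std::string& lfid, const std::string& pfn) {
        replicas_[lfid] = pfn;
    }
    std::optional<std::string> lookup(const std::string& lfid) const {
        auto it = replicas_.find(lfid);
        if (it == replicas_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, std::string> replicas_;
};

// Resolving a Token: look up the physical file, open it with the streaming
// layer, and read (container, entry) -- the user never handles files directly.
```

The key design point is that stored references never contain physical file names, only the logical identifier, which is what allows files to be moved or replicated freely.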
15 POOL Component Breakdown
16 A Multi-Tier Computing Model
[Diagram: the tier hierarchy - Tier 0 (experiment host lab), Tier 1 (main regional centres), Tier 2, Tier 3, desktop - shown from both a user view and a manager view]
17 CMS Data Challenge DC04
[Diagram: DC04 T0 challenge - a fake DAQ at CERN delivers raw + ESD at 25 Hz (1.5 MB/evt, ~50 MB/s, ~4 TB/day) into a ~40 TB CERN disk pool and the CERN tape archive; reconstruction at T0 produces event streams (e.g. Higgs DST, SUSY background DST) plus TAG/AOD (~10 kB/evt), for a total of ~50M events, ~75 TB. DC04 calibration challenge - calibration samples and calibration jobs at T1 sites, with a master conditions DB at T0. DC04 analysis challenge - TAG/AOD replicas distributed to T2 sites.]
18 Experience in Data Challenges
- All LHC experiments have well-developed distributed batch production systems, each running on 30-40 sites, with typically > 70% of production outside CERN
- In the past 2 years considerable experience has been gained with middleware coming from US and European grid projects, and from home-grown developments in the experiments
- Consequently we have a much better grasp of the problems
- Grid sites were responsible for 5-10% of the total production
- Productions on grid sites ran with typically 80-90% efficiency in 2003
20 Experience in Data Challenges
- Importance of interoperability of different Grids
- essential for HEP applications to run worldwide
- standards will help (e.g. the Web Services Resource Framework)
- Site configuration and software installation were the source of some of the biggest problems
- sites must adhere to standards and work as expected
- Site policies need to be flexible
- anticipate a paradigm shift from well-organised productions to interactive physics analysis
- a physicist may need to run his software environment on worker nodes anywhere on the grid
- HEP is providing a pilot grid application for grid development world-wide
21 Distributed Analysis - the real challenge
- Analysis will be performed with a mix of official experiment software and private user code
- How can we make sure that the user code can execute and provide a correct result wherever it lands?
- Input datasets are not necessarily known a priori
- Possibly very sparse data access pattern, when only a very few events match the query (illustrated below)
- Large number of people submitting jobs concurrently and in an uncoordinated fashion, resulting in a chaotic workload
- Wide range of user expertise
- Need for interactivity - requirements on system response time rather than throughput
- Ability to suspend an interactive session and resume it later, in a different location
- Need a continuous dialogue between developers and users
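A small C++ sketch of what "sparse access" means in practice: cuts are evaluated on the compact TAG records first, and the full event data are fetched only for the handful of entries that pass. All type names and the example cuts are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// ~1 kB summary record per event (TAG level), cheap to scan in bulk.
struct TagRecord {
    std::uint64_t eventId;
    double        missingEt;    // example selection variables
    int           nLeptons;
};

// Stand-in for retrieving the full (ESD/AOD) event; expensive, possibly remote.
struct FullEvent { std::uint64_t eventId; /* MBs of data in reality */ };
using EventFetcher = std::function<FullEvent(std::uint64_t)>;

// Scan the TAG collection, fetch full data only for the few matches.
std::vector<FullEvent> selectSparse(const std::vector<TagRecord>& tags,
                                    const EventFetcher& fetch) {
    std::vector<FullEvent> selected;
    for (const auto& t : tags) {
        if (t.nLeptons >= 2 && t.missingEt > 100.0) {   // illustrative cuts
            selected.push_back(fetch(t.eventId));        // only here do we pay the I/O cost
        }
    }
    return selected;
}
```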
22 Concluding Remarks
- HEP applications are characterised by
- the amounts and complexity of the data
- the large size and geographically dispersed nature of the collaborations
- Grids offer many advantages
- to providers: centres supplying resources in a multi-science environment in a flexible, secure and managed way
- to consumers hungry for those resources (potential of resource discovery)
- consistent configurations (system, applications)
- transparent access to remote data
- distributed analysis is where the potential of the grid will be best exploited
- HEP is providing a pilot grid application for grid development world-wide
- technologists and experimentalists working closely together
- an incremental and iterative development model (frequent releases with rapid feedback) is a good way to ensure requirements and priorities are met
23 Backup slides
24 LHC Computing Grid Project Organisation
[Organisation chart]
- LCG/SC2 (Software and Computing Committee): requirements, monitoring
- LCG/PEB (Project Execution Board): management of the project
- Work areas: CERN Fabric, Applications, Grid Applications Group (GAG), Grid Technology (middleware), Grid Deployment (operate a single service); link to EGEE
- Phases
- Phase 1: R&D, 2002-2005
- Phase 2: installation and commissioning of the initial LHC service
- Participants
- experiments
- computer centres
- core developers for applications and grid
25 LCG Service Time-line
[Timeline 2003-2007, experiments vs computing service]
- Experiments: testing with simulated event productions; Data Challenges and prototype tests; validation of computing models; TDR (technical design report); experiment setup and preparation; first data
- Computing service: LCG-1 service; LCG-2 (upgraded middleware); second-generation middleware prototyping and development; LCG-3 (2nd-generation middleware); LCG Phase 2 service acquisition, installation and commissioning; Phase 2 service in production
26 The LHC Detectors
27 LHCb B Physics Signatures and Event Rates
- Choose to run at 2 × 10^32 cm^-2 s^-1 → dominated by single interactions
- Makes it simpler to identify B decays
- Enormous production rate at LHCb: ~10^12 bb̄ pairs per year
- At high energy → more primary tracks, tagging more difficult
- Expect 200,000 reconstructed B0 → J/ψ KS events/year
- Expect 26,000 reconstructed B0 → π+π− events/year
- Rare decays: B(Bs → μ+μ−) ≈ 4 × 10^-9, expect 16 events/year
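As a rough cross-check of the 10^12 bb̄ pairs per year, assuming a bb̄ production cross section of about 500 μb and a canonical 10^7 s of running per year (both assumptions, not given on the slide):

```latex
N_{b\bar{b}} = \sigma_{b\bar{b}}\,\mathcal{L}\,t
  \approx (5 \times 10^{-28}\,\mathrm{cm^{2}}) \times (2 \times 10^{32}\,\mathrm{cm^{-2}\,s^{-1}}) \times (10^{7}\,\mathrm{s})
  = 10^{12}
```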
28 Data Organisation
29 LHCb Production Tools
[Diagram: LHCb production system]
- Production preparation through the GANGA interface: workflow editor, application configuration/packager (application tar files) and production editor, with definitions stored in a production DB
- Central services: production service, job monitoring service and bookkeeping (metadata DB) service, exchanging job requests, status and results as XML
- Job submission, data selection, application configuration and retrieval of results all go through these central services
- Production resources: agents at each site (Site A, Site B, ..., Site n, CERN) pick up production requests, run the jobs and report status and monitoring information
- Data management: dataset replicas are registered in the bookkeeping metadata DB and with the replica manager / file catalog; output is written to storage elements and to central storage (Castor MSS at CERN)
30 Distributed MC Production in LHCb Today
- Submit jobs remotely via Web
- Execute on farm
- Transfer data to mass store at CERN (CASTOR)
- Update bookkeeping database (Oracle at CERN)
- Data quality check on data stored at CERN
- Monitor performance of farm via Web