Title: HEP Needs - Current and Future (John Harvey, CERN PH-SFT)
1 HEP Needs - Current and Future
John Harvey (CERN / PH-SFT)
- GGF10 / PNPA RG Workshop
- Humboldt University and Zuse Institute Berlin
- March 13th, 2004
Very much from a Large Hadron Collider (LHC) perspective
2 Acknowledgements
- I am grateful for discussions with
- BaBar: Jacek Becla, Igor Gaponenko
- EDG: Frank Harris, Bob Jones, Erwin Laure
- LHC: Federico Carminati, Vincenzo Innocente, Pere Mato, David Quarrie, Ian Willers
3 Talk Outline
- Physics signatures and rates
- Data handling processes and datasets
- Event data, catalogues and collections, conditions data
- Common Persistency Framework (POOL)
- Computing Model
- Data Challenges
- Concluding Remarks
4 The ATLAS Detector
- The ATLAS collaboration is
- 2000 physicists from ..
- 150 universities and labs
- from 34 countries
- distributed resources
- remote development
- The ATLAS detector is
- 26m long,
- stands 20m high,
- weighs 7000 tons
- has 200 million read-out channels
5 ATLAS Physics Signatures and Event Rates
- LHC pp collisions at √s = 14 TeV
- Bunches cross at 40 MHz
- σ(inelastic) ≈ 80 mb
- at high luminosity: >> 1 pp collision per crossing
- ~10^9 collisions per second (see the estimate below)
- Study different physics channels, each with its own signature, e.g.
- Higgs
- Supersymmetry
- B physics
- Interesting physics events are buried in backgrounds of uninteresting physics events (1 in 10^5 - 10^9)
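A quick consistency check on the quoted interaction rate, assuming the LHC design luminosity of 10^34 cm^-2 s^-1 (an assumption, not stated on the slide):

```latex
R_{\mathrm{inel}} = \sigma_{\mathrm{inel}}\,\mathcal{L}
  \approx 80\,\mathrm{mb} \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}
  = 8 \times 10^{-26}\,\mathrm{cm^{2}} \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}
  \approx 10^{9}\,\mathrm{s^{-1}}
```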
6 HEP Processing stages and datasets
[Data flow diagram]
detector → raw data → event filter (selection, reconstruction) → event reconstruction → processed data: Event Summary Data (ESD) → batch physics analysis → Analysis Object Data (AOD), extracted by physics topic → individual physics analysis
event simulation feeds the same processing chain
7 Data Hierarchy
- RAW: triggered events recorded by DAQ; detector digitisation; ~2 MB/event; 10^9 events/yr → ~2 PB/yr
- ESD: reconstructed information; pattern recognition output (clusters, track candidates); ~100 kB/event
- AOD: analysis information; physical information (transverse momentum, association of particles, jets, (best) id of particles); ~10 kB/event
- TAG: classification information; relevant information for fast event selection; ~1 kB/event
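For orientation, the implied annual volumes, taking the per-event sizes above at face value and assuming all ~10^9 events per year are kept at every tier (an assumption for the smaller tiers):

```latex
\begin{aligned}
V_{\mathrm{RAW}} &\approx 10^{9} \times 2\,\mathrm{MB} = 2\,\mathrm{PB/yr} \\
V_{\mathrm{ESD}} &\approx 10^{9} \times 100\,\mathrm{kB} = 100\,\mathrm{TB/yr} \\
V_{\mathrm{AOD}} &\approx 10^{9} \times 10\,\mathrm{kB} = 10\,\mathrm{TB/yr} \\
V_{\mathrm{TAG}} &\approx 10^{9} \times 1\,\mathrm{kB} = 1\,\mathrm{TB/yr}
\end{aligned}
```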
8 CERN Centre Capacity Requirements for all experiments (made July 2003)
9 Event Data
- Complex data models
- 500 structure types
- References to describe relationships between event objects (unidirectional)
- Need to support transparent navigation (see the sketch below)
- Need ultimate resolution on selected events
- need to run specialised algorithms
- need to work interactively
- Not affordable if uncontrolled
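To make the navigation idea concrete, here is a minimal C++ sketch of a unidirectional, lazily resolved reference between event objects. The `EventStore` interface and the token format are hypothetical stand-ins, not the actual experiment or POOL API.

```cpp
#include <memory>
#include <string>

// Hypothetical interface to whatever storage backend holds the event data.
struct EventStore {
    virtual ~EventStore() = default;
    // Load the object identified by a persistent token (e.g. "file:container:entry").
    virtual std::shared_ptr<void> load(const std::string& token) = 0;
};

// Minimal unidirectional reference: keeps only a token and resolves the target
// object on first dereference, so the user never opens files explicitly.
template <typename T>
class Ref {
public:
    Ref(EventStore* store, std::string token)
        : store_(store), token_(std::move(token)) {}

    const T* operator->() const {
        if (!cached_) {
            cached_ = std::static_pointer_cast<T>(store_->load(token_));
        }
        return cached_.get();
    }

private:
    EventStore* store_;
    std::string token_;
    mutable std::shared_ptr<T> cached_;   // resolved lazily, cached afterwards
};
```

A cluster holding a `Ref<Track>` would then pull the track into memory only when an algorithm actually dereferences it, which is what keeps full-resolution access to selected events affordable.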
10 HEP Metadata - Event Collections, Bookkeeping
11 Detector Conditions Data
- Reflects changes in the state of the detector with time
- Event data cannot be reconstructed or analyzed without it
- Versioning
- Tagging
- Ability to extract slices of data required to run with a job
- Long life-time
[Diagram: conditions data versions laid out against a time axis; a tag (e.g. "Tag1 definition") selects one version for each validity interval]
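The versioning / tagging / time structure above is essentially an interval-of-validity lookup. A minimal sketch in C++, with hypothetical type names (this is not the actual conditions database interface):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Illustrative interval-of-validity (IOV) store for one conditions folder,
// e.g. a set of calibration constants.
using TimeStamp = std::uint64_t;           // event time

struct ConditionsPayload {
    std::vector<double> calibration;       // the actual constants
};

struct IOV {
    TimeStamp since;                       // start of validity (inclusive)
    TimeStamp until;                       // end of validity (exclusive)
    ConditionsPayload payload;
};

class ConditionsFolder {
public:
    // Register a payload for a given validity interval under a named tag.
    void store(const std::string& tag, IOV iov) {
        const TimeStamp key = iov.since;
        tags_[tag].emplace(key, std::move(iov));
    }

    // Find the payload valid for 'tag' at event time 't', or nullptr if none.
    const ConditionsPayload* find(const std::string& tag, TimeStamp t) const {
        auto tagIt = tags_.find(tag);
        if (tagIt == tags_.end()) return nullptr;
        const auto& iovs = tagIt->second;
        auto it = iovs.upper_bound(t);     // first IOV starting after t
        if (it == iovs.begin()) return nullptr;
        --it;                              // IOV starting at or before t
        return (t < it->second.until) ? &it->second.payload : nullptr;
    }

private:
    // tag -> (since -> IOV); each tag is one consistent version history
    std::map<std::string, std::map<TimeStamp, IOV>> tags_;
};
```

Extracting a "slice" for a job then amounts to copying, for one tag, only the IOVs that overlap the job's time range.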
12 Conditions Database
[Diagram: the Online DB feeds a master copy of the Conditions DB; replicas and extracted slices are distributed to remote sites, where calibration and reconstruction/analysis applications (algorithms running on worker nodes) read them through Cond DB management tools]
- World-wide access: use closest available copy
- Need to accept updates and synchronise loosely coupled copies
- Need tools for data management, data browsing, replication, slicing, ...
- Use supported database features (transactions, replication, querying, ...)
13 LHC Data Management Requirements
- Increasing focus on maintainability and change management for core software due to the long LHC lifetime
- anticipate changes in technology
- adapt quickly to changes in environment and physics focus
- Common solutions will considerably simplify the deployment and operation of data management in centres distributed worldwide
- Common persistency framework (POOL project)
- Strong involvement of the experiments from the beginning is required to provide requirements
- some experimentalists participate directly in POOL
- some work with software providers on integration in experiment frameworks
14 Common Persistency Framework (POOL)
- Provides persistency for C++ transient objects
- Supports transparent navigation between objects across file and technology boundaries
- without requiring the user to explicitly open files or database connections
- Follows a technology-neutral approach
- Abstract component C++ interfaces
- Insulates experiment software from concrete implementations and technologies
- Hybrid technology approach combining
- Streaming technology for complex C++ objects (event data)
- event data are typically write once, read many (concurrent access is simple)
- Transaction-safe Relational Database (RDBMS) services
- for catalogs, collections and other metadata
- Allows data to be stored in a distributed and grid-enabled fashion
- Integrated with an external File Catalog to keep track of the files' physical locations, allowing files to be moved or replicated (see the sketch below)
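A rough sketch of the catalogue side of this hybrid approach: object references carry a logical file identifier, and a file catalogue maps it to the current physical location, so files can be moved or replicated without invalidating stored references. The class and member names below are illustrative, not the real POOL interfaces.

```cpp
#include <map>
#include <optional>
#include <string>

// Persistent token: enough to locate an object again, independent of where
// the file currently lives (illustrative layout, not POOL's actual token).
struct Token {
    std::string logicalFileId;   // stable identifier registered in the catalogue
    std::string container;       // container inside the file
    long        entry;           // entry number inside the container
};

// File catalogue: logical file id -> physical file name. Database- or
// file-backed in the real system; a plain map is enough for the sketch.
class FileCatalog {
public:
    void registerReplica(const std::string& lfid, const std::string& pfn) {
        replicas_[lfid] = pfn;
    }
    std::optional<std::string> lookup(const std::string& lfid) const {
        auto it = replicas_.find(lfid);
        if (it == replicas_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, std::string> replicas_;
};

// Resolving a Token: look up the physical file, open it with the streaming
// layer, and read (container, entry) -- the user never handles files directly.
```

The key design point is that stored references never contain physical file names, only the logical identifier, which is what allows files to be moved or replicated freely.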
15 POOL Component Breakdown
16 A Multi-Tier Computing Model
[Diagram: the tier hierarchy - Tier 0 (experiment host lab), Tier 1 (main regional centres), Tier 2, Tier 3, desktop - shown from both a user view and a manager view]
17 CMS Data Challenge DC04
[Diagram: DC04 T0 challenge - a fake DAQ at CERN delivers raw + ESD at 25 Hz (1.5 MB/evt, ~50 MB/s, ~4 TB/day) into a ~40 TB CERN disk pool and the CERN tape archive; reconstruction at T0 produces event streams (e.g. Higgs DST, SUSY background DST) plus TAG/AOD (~10 kB/evt), for a total of ~50M events, ~75 TB. DC04 calibration challenge - calibration samples and calibration jobs at T1 sites, with a master conditions DB at T0. DC04 analysis challenge - TAG/AOD replicas distributed to T2 sites.]
18 Experience in Data Challenges
- All LHC experiments have well-developed distributed batch production systems, each running on 30-40 sites, with typically > 70% of production outside CERN
- In the past 2 years considerable experience has been gained with middleware coming from US and European grid projects, and from home-grown developments in the experiments
- Consequently we have a much better grasp of the problems
- Grid sites were responsible for 5-10% of the total production
- Productions on grid sites ran with typically 80-90% efficiency in 2003
20 Experience in Data Challenges
- Importance of interoperability of different Grids
- essential for HEP applications to run worldwide
- standards will help (e.g. the Web Services Resource Framework)
- Site configuration and software installation were the source of some of the biggest problems
- sites must adhere to standards and work as expected
- Site policies need to be flexible
- anticipate a paradigm shift from well-organised productions to interactive physics analysis
- a physicist may need to run his software environment on worker nodes anywhere on the grid
- HEP is providing a pilot grid application for grid development world-wide
21 Distributed Analysis - the real challenge
- Analysis will be performed with a mix of official experiment software and private user code
- How can we make sure that the user code can execute and provide a correct result wherever it lands?
- Input datasets are not necessarily known a priori
- Possibly very sparse data access pattern, when only a very few events match the query (illustrated below)
- Large number of people submitting jobs concurrently and in an uncoordinated fashion, resulting in a chaotic workload
- Wide range of user expertise
- Need for interactivity - requirements on system response time rather than throughput
- Ability to suspend an interactive session and resume it later, in a different location
- Need a continuous dialogue between developers and users
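A small C++ sketch of what "sparse access" means in practice: cuts are evaluated on the compact TAG records first, and the full event data are fetched only for the handful of entries that pass. All type names and the example cuts are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// ~1 kB summary record per event (TAG level), cheap to scan in bulk.
struct TagRecord {
    std::uint64_t eventId;
    double        missingEt;    // example selection variables
    int           nLeptons;
};

// Stand-in for retrieving the full (ESD/AOD) event; expensive, possibly remote.
struct FullEvent { std::uint64_t eventId; /* MBs of data in reality */ };
using EventFetcher = std::function<FullEvent(std::uint64_t)>;

// Scan the TAG collection, fetch full data only for the few matches.
std::vector<FullEvent> selectSparse(const std::vector<TagRecord>& tags,
                                    const EventFetcher& fetch) {
    std::vector<FullEvent> selected;
    for (const auto& t : tags) {
        if (t.nLeptons >= 2 && t.missingEt > 100.0) {   // illustrative cuts
            selected.push_back(fetch(t.eventId));        // only here do we pay the I/O cost
        }
    }
    return selected;
}
```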
22 Concluding Remarks
- HEP applications are characterised by
- the amounts and complexity of the data
- the large size and geographically dispersed nature of the collaborations
- Grids offer many advantages
- to providers: centres supplying resources in a multi-science environment in a flexible, secure and managed way
- to consumers hungry for those resources (potential of resource discovery)
- consistent configurations (system, applications)
- transparent access to remote data
- distributed analysis is where the potential of the grid will be best exploited
- HEP is providing a pilot grid application for grid development world-wide
- technologists and experimentalists working closely together
- an incremental and iterative development model (frequent releases with rapid feedback) is a good way to ensure requirements and priorities are met
23 Backup slides
24 LHC Computing Grid Project Organisation
[Organisation chart]
- LCG/SC2 (Software and Computing Committee): requirements, monitoring
- LCG/PEB (Project Execution Board): management of the project
- Work areas: CERN Fabric, Applications, Grid Applications Group (GAG), Grid Technology (middleware), Grid Deployment (operate a single service); link to EGEE
- Phases
- Phase 1: R&D, 2002-2005
- Phase 2: installation and commissioning of the initial LHC service
- Participants
- experiments
- computer centres
- core developers for applications and grid
25 LCG Service Time-line
[Timeline 2003-2007, experiments vs computing service]
- Experiments: testing with simulated event productions; Data Challenges and prototype tests; validation of computing models; TDR (technical design report); experiment setup and preparation; first data
- Computing service: LCG-1 service; LCG-2 (upgraded middleware); second-generation middleware prototyping and development; LCG-3 (2nd-generation middleware); LCG Phase 2 service acquisition, installation and commissioning; Phase 2 service in production
26 The LHC Detectors
27 LHCb B Physics Signatures and Event Rates
- Choose to run at 2 × 10^32 cm^-2 s^-1 → dominated by single interactions
- Makes it simpler to identify B decays
- Enormous production rate at LHCb: ~10^12 bb̄ pairs per year
- At high energy → more primary tracks, tagging more difficult
- Expect 200,000 reconstructed B0 → J/ψ KS events/year
- Expect 26,000 reconstructed B0 → π+π− events/year
- Rare decays: B(Bs → μ+μ−) ≈ 4 × 10^-9, expect 16 events/year
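As a rough cross-check of the 10^12 bb̄ pairs per year, assuming a bb̄ production cross section of about 500 μb and a canonical 10^7 s of running per year (both assumptions, not given on the slide):

```latex
N_{b\bar{b}} = \sigma_{b\bar{b}}\,\mathcal{L}\,t
  \approx (5 \times 10^{-28}\,\mathrm{cm^{2}}) \times (2 \times 10^{32}\,\mathrm{cm^{-2}\,s^{-1}}) \times (10^{7}\,\mathrm{s})
  = 10^{12}
```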
28 Data Organisation
29 LHCb Production Tools
[Diagram: LHCb production system]
- Production preparation through the GANGA interface: workflow editor, application configuration/packager (application tar files) and production editor, with definitions stored in a production DB
- Central services: production service, job monitoring service and bookkeeping (metadata DB) service, exchanging job requests, status and results as XML
- Job submission, data selection, application configuration and retrieval of results all go through these central services
- Production resources: agents at each site (Site A, Site B, ..., Site n, CERN) pick up production requests, run the jobs and report status and monitoring information
- Data management: dataset replicas are registered in the bookkeeping metadata DB and with the replica manager / file catalog; output is written to storage elements and to central storage (Castor MSS at CERN)
30 Distributed MC Production in LHCb Today
- Submit jobs remotely via Web
- Execute on farm
- Transfer data to mass store at CERN (CASTOR)
- Update bookkeeping database (Oracle at CERN)
- Data quality check on data stored at CERN
- Monitor performance of farm via Web