Title: Computing
1Computing Software Challenges @ LHC
or how to convert 100 TB/s into a Nobel prize
Vincenzo Innocente (CMS Experiment, CERN/PH-SFT)
- ESAC
- Madrid
- May 14th, 2008
2Outline
- Physics
- signals, signatures and rates
- Detectors
- volumes and rates of raw data
- Analysis Model (Data Processing)
- Data representation, classification, metadata
- Computing Model
- Software architecture
- Interactive Data Analysis
- Not Covered (backup slides available for discussion)
- Organization
- Software development process
- Grid
- Object streaming and persistency (in files and RDBMS)
- ROOT
- Experience during commissioning and possible future directions
3The Large Hadron Collider at CERN
First Data expected in Summer
4Collisions at the LHC summary
5pp collisions at 14 TeV at 10^34 cm^-2 s^-1
A very difficult environment
- 20 proton-proton collisions overlap
- H → ZZ with Z → 2 muons
- H → 4 muons is the cleanest (golden) signature
- And this (not the H though) repeats every 25 ns
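The 25 ns bunch spacing fixes the scale of the raw data stream before any trigger selection. A back-of-the-envelope check (using the ~2 MB/event RAW size quoted later in the Data Hierarchy slide) recovers the order of the 100 TB/s figure in the title:

```python
# Rough estimate of the pre-trigger data rate at the LHC (illustrative numbers).
bunch_spacing_ns = 25                       # one bunch crossing every 25 ns
crossing_rate_hz = 1e9 / bunch_spacing_ns   # 40 MHz
event_size_bytes = 2e6                      # ~2 MB/event RAW (see Data Hierarchy)

rate_bytes_per_s = crossing_rate_hz * event_size_bytes
print(rate_bytes_per_s / 1e12, "TB/s")      # 80.0 TB/s, i.e. order 100 TB/s
```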
6Impact on detector design
- LHC detectors must have fast response
- Otherwise they will integrate over many bunch crossings → large pile-up
- Typical response time 20-50 ns → integrate over 1-2 bunch crossings → pile-up of 25-50 minimum-bias events
- → very challenging readout electronics
- LHC detectors must be highly granular
- Minimize the probability that pile-up particles land in the same detector element as an interesting object (e.g. γ from H → γγ decays)
- → large number of electronic channels → high cost
- LHC detectors must be radiation resistant
- High flux of particles from pp collisions → high radiation environment, e.g. in forward calorimeters
- up to 10^17 n/cm^2 in 10 years of LHC operation
- up to 10^7 Gy (1 Gy = 1 Joule/kg of absorbed energy)
7A Generic Multipurpose LHC Detector
Three detector layers in a magnetic field:
- Inner tracker: vertex and charged particles
- Calorimeters: energy of electrons, photons, hadrons
- External tracker: identifies muons
8The Atlas Detector
- The ATLAS experiment is 26 m long, stands 20 m high, weighs 7000 tons
- It has 200 million read-out channels
- orders of magnitude increase in complexity
- The ATLAS collaboration is 2000 physicists from 150 universities and labs in 34 countries
- distributed resources
- remote development
9DATA ORGANIZATION
10Data and Algorithms
- HEP main data are organized in Events (particle collisions)
- Simulation, Reconstruction and Analysis programs process one Event at a time
- Events are fairly independent of each other
- Trivially parallel processing
- Event processing programs are composed of a number of Algorithms selecting and transforming raw Event data into processed (reconstructed) Event data and statistics
- Algorithms are mainly developed by Physicists
- Algorithms may require additional detector conditions data (e.g. calibrations, geometry, environmental parameters, etc.)
- Statistical data (histograms, distributions, etc.) are typically the final data processing results
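Because events are independent, an event-processing program reduces to a loop over events, each passed through a sequence of algorithms that may consult conditions data. A minimal sketch of that pattern (all names are invented for illustration, not any experiment's real API):

```python
# Minimal sketch of a HEP event loop: independent events, pluggable algorithms.

def calibrate(event, conditions):
    # Algorithms may need conditions data (calibrations, geometry, ...).
    event["energy"] = [adc * conditions["gain"] for adc in event["raw_adc"]]
    return event

def select(event, conditions):
    event["accepted"] = sum(event["energy"]) > conditions["threshold"]
    return event

def process(events, algorithms, conditions):
    histogram = []                      # statistics are the typical final output
    for event in events:                # one Event at a time; trivially parallel
        for algorithm in algorithms:
            event = algorithm(event, conditions)
        if event["accepted"]:
            histogram.append(sum(event["energy"]))
    return histogram

conditions = {"gain": 0.5, "threshold": 10.0}
events = [{"raw_adc": [10, 20]}, {"raw_adc": [30, 40]}]
result = process(events, [calibrate, select], conditions)
print(result)   # [15.0, 35.0]
```

Since no algorithm looks at more than one event, the same loop can be split across as many worker processes as there are events.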
11High Energy Analysis Model
Reconstruction goes back in time from digital
signals to the original particles produced in the
collision
Monte Carlo Simulation follows the evolution of physics processes from collision to digital signals
Analysis compares (at statistical level)
reconstructed events from real data with those
from simulation
12Data Hierarchy
RAW, ESD, AOD, TAG
- RAW: triggered events recorded by DAQ; detector digitisation; ~2 MB/event
- ESD/RECO: reconstructed information; pseudo-physical information (clusters, track candidates); ~100 kB/event
- AOD: analysis information; physical information (transverse momentum, association of particles, jets, (best) id of particles, ...); ~10 kB/event
- TAG: relevant information for fast event selection; classification information; ~1 kB/event
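The roughly order-of-magnitude reduction at each step is what makes repeated analysis passes affordable. Taking an assumed 2×10^9 recorded events per year (an illustrative number, not from this talk), the per-event sizes above translate into yearly volumes per tier:

```python
# Yearly data volume per tier, for an ASSUMED 2e9 recorded events/year.
events_per_year = 2e9   # illustrative assumption (~200 Hz over ~10^7 s)
sizes = {"RAW": 2e6, "ESD/RECO": 1e5, "AOD": 1e4, "TAG": 1e3}  # bytes/event

volumes_pb = {tier: events_per_year * s / 1e15 for tier, s in sizes.items()}
for tier, vol in volumes_pb.items():
    print(f"{tier:8s} {vol:8.3f} PB/year")
# RAW comes out at 4 PB/year, TAG at only 0.002 PB/year
```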
13Event Data
- Complex data models
- 500 structure types
- References to describe relationships between event objects (unidirectional)
- Need to support transparent navigation
- Need ultimate resolution on selected events
- need to run specialised algorithms
- work interactively
- Not affordable if uncontrolled
14DataSets - Event Collections: HEP Metadata Bookkeeping
15Selection/Skimming
16Data Bookkeeping in CMS
- Events are stored in Files
- For MC there are O(1000) events/file
- Files are grouped in Blocks
- We aim to make blocks > size of a tape
- Blocks are grouped into Datasets
- We need to track
- which Block a File is in
- which Dataset a Block is in
- File/Block/Dataset metadata
- To do all this CMS has developed DBS, the CMS Dataset Bookkeeping System
- Oracle backend
- Web interface
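The File → Block → Dataset containment is just two many-to-one mappings, so the basic queries a bookkeeping system must answer can be sketched with a toy model (all names hypothetical; the real DBS is an Oracle-backed service with a web interface):

```python
# Toy model of the bookkeeping hierarchy: files -> blocks -> datasets.
file_to_block = {"evt_001.root": "blockA", "evt_002.root": "blockA",
                 "evt_003.root": "blockB"}
block_to_dataset = {"blockA": "/Higgs/MC-2008/AOD",
                    "blockB": "/Higgs/MC-2008/AOD"}

def dataset_of(filename):
    # "Which dataset is this file in?" -- resolved via its block.
    return block_to_dataset[file_to_block[filename]]

def files_of(dataset):
    # Inverse query: all files belonging to a dataset.
    return sorted(f for f, b in file_to_block.items()
                  if block_to_dataset[b] == dataset)

print(dataset_of("evt_003.root"))      # /Higgs/MC-2008/AOD
print(files_of("/Higgs/MC-2008/AOD"))  # ['evt_001.root', 'evt_002.root', 'evt_003.root']
```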
17Examples of queries
18Detector Conditions Data
- Reflects changes in the state of the detector with time
- Event Data cannot be reconstructed/analyzed without it
- Versioning
- Tagging
- Ability to extract slices of data
- World-wide access (replica, closest available copy, etc.)
- Long life-time
[Diagram: conditions data versions vs. time, with a tag definition selecting one version per time interval]
19Conditions Database
- Need tools for data management, data browsing, replication, slicing, import/export
- Database implementations (more than one)
- Use supported features (transactions, replication, ...)
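Conditions are typically resolved by an interval-of-validity lookup: given a tag (a consistent set of versions) and an event time, find the payload valid at that time. A minimal sketch of the lookup with `bisect` (the data layout here is invented for illustration):

```python
import bisect

# A "tag" maps each interval-of-validity start time to a payload version.
# Payload i is valid from start_times[i] until the next start time.
start_times = [0, 100, 250]                  # e.g. run numbers or timestamps
payloads = ["calib_v1", "calib_v2", "calib_v3"]

def conditions_for(event_time):
    # Find the last interval starting at or before event_time.
    i = bisect.bisect_right(start_times, event_time) - 1
    if i < 0:
        raise LookupError("no conditions valid at this time")
    return payloads[i]

print(conditions_for(42))    # calib_v1
print(conditions_for(100))   # calib_v2
print(conditions_for(999))   # calib_v3
```

A different tag would simply carry a different `payloads` list over the same time axis, which is what makes reprocessing with improved calibrations cheap.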
20COMPUTING ARCHITECTURE
21Physics Selection at LHC
22Online selection
HLT: same hardware, same software as offline.
Different situation: cannot repeat, cannot afford even a rare crash or a small memory leak!
23Trigger/DAQ Summary for LHC

Experiment | No. Trigger Levels | Level-1 Rate (Hz) | Event Size (Byte) | Readout Bandw. (GB/s) | Filter Out MB/s (Event/s)
ATLAS      | 3 (LV-2: 10^3)     | 10^5              | 10^6              | 10                    | 100 (10^2)
CMS        | 2                  | 10^5              | 10^6              | 100                   | 100 (10^2)
LHCb       | 3 (LV-0, LV-1: 4x10^4) | LV-0: 10^6    | 2x10^5            | 4                     | 40 (2x10^2)
ALICE      | 4                  | Pb-Pb: 500; p-p: 10^3 | Pb-Pb: 5x10^7; p-p: 2x10^6 | 5    | 1250 (10^2); 200 (10^2)
24HEP Data Handling and Computation
Data flow: detector → event filter (selection & reconstruction) → raw data → event reprocessing → event summary data → batch physics analysis → analysis objects (extracted by physics topic) → interactive physics analysis.
Event simulation feeds the same chain.
Production steps are locally or centrally managed; end-user interactive analysis is chaotic.
25A Multi-Tier Computing Model
Tier 0 (Experiment Host lab) → Tier 1 (Main Regional Centres) → Tier 2 → Tier 3 → Desktop
- Exploit Grid infrastructures
- Homogeneous environment
- Optimal share of resources
- Hierarchy of computing centers
- Matches computing tasks to local interests, expertise and resources
26CMS Computing Model
- Site activities and functionality largely predictable
- Activities are driven by data location
- Organized mass processing and custodial storage at Tier-1s
- Chaotic computing essentially restricted to data analysis at Tier-2s
- Resource evolution: 10K boxes of CPU, 15 PB disk, 20 PB tape
- (one box: 8-core Clovertown 2.3 GHz ≈ 10K SI2K)
27Tiered Architecture
- CAF (CMS Analysis Facility at CERN)
- Access to full raw dataset
- Focused on latency-critical detector, trigger, calibration and analysis activities
- Provides some CMS central services (e.g. store conditions and calibrations)
- Tier-0
- Accepts data from DAQ
- Prompt reconstruction
- Data archive and distribution to Tier-1s
- Tier-1s
- Real data archiving
- Re-processing
- Skimming and other data-intensive analysis tasks
- Calibration
- MC data archiving
- Tier-2s
- User data analysis
- MC production
- Import skimmed datasets from Tier-1 and export MC data
- Calibration/alignment
28CMS Data Workflow (under test as I speak)
- Prompt Precise Calibration
- Analysis
- Prompt Reconstruction
29CMS DQM (First Look)
- Central Operation: live (secs) — operation established
- Detector Operation Efficiency: hours — to be devised
- Physics Data Certification: 1 day — being deployed
DQM project scope has been extended to include Tier-0 and CAF (Tier-1/2 soon to come)
30DQM @ Tier-0
Validation / Physics Data Certification
- DQM objects (histograms) written in the same file as data
- Harvesting step extracts histograms
- Data Certification (input to Good-Runs-Lists) will be done during Harvesting
- Certification data stored and managed by DBS
- Histograms managed by a specific DQM GUI web server (user access)
31APPLICATION SOFTWARE
32Software Structure
- Applications: built on top of the frameworks, implementing the required algorithms
- Experiment Framework: every experiment has a framework for basic services and various specialized frameworks (event model, detector description, calibration, visualization, persistency, interactivity, simulation, etc.)
- Specialized domains common among the experiments: Simulation, Data Management, Distributed Analysis
- Core Libraries: core libraries and services that are widely used and provide basic functionality
- Non-HEP specific software packages: general purpose non-HEP libraries
33Software Components
- Foundation Libraries
- Basic types
- Utility libraries
- System isolation libraries
- Mathematical Libraries
- Special functions
- Minimization, Random Numbers
- Data Organization
- Event Data
- Event Metadata (Event collections)
- Detector Conditions Data
- Data Management Tools
- Object Persistency
- Data Distribution and Replication
- Simulation Toolkits
- Event generators
- Detector simulation
- Statistical Analysis Tools
- Histograms, N-tuples
- Fitting
- Interactivity and User Interfaces
- GUI
- Scripting
- Interactive analysis
- Data Visualization and Graphics
- Event and Geometry displays
- Distributed Applications
- Parallel processing
- Grid computing
34Programming Languages
- Object-Oriented (O-O) programming languages have become the norm for developing the software for HEP experiments
- C++ is in use by (almost) all Experiments
- Pioneered by BaBar and Run II (D0 and CDF)
- LHC experiments with an initial FORTRAN code base have basically completed the migration to C++
- Large common software projects in C++ have been in production for many years already (ROOT, Geant4, ...)
- FORTRAN still in use mainly by the MC generators
- Large development efforts are going into the migration to C++ (Pythia8, Herwig++, Sherpa, ...)
35Scripting Languages
- Scripting has been an essential component of HEP analysis software for the last decades
- PAW macros (kumac) in the FORTRAN era
- C++ interpreter (CINT) in the C++ era
- Python recently introduced and gaining momentum
- Most of the statistical data analysis and final presentation is done with scripts
- Interactive analysis
- Rapid prototyping to test new ideas
- Driving complex procedures
- Scripts are also used to configure the complex C++ programs developed and used by the LHC experiments
- Simulation and Reconstruction programs with hundreds or thousands of options to configure
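A configuration script typically just builds a nested structure of parameters that the compiled framework reads at startup, plus helpers to override any one of the thousands of options. A schematic example (the keys and helper below are invented, not a real experiment's configuration API):

```python
# Schematic of script-driven configuration of a compiled reconstruction job.
config = {
    "process": "Reconstruction",
    "input": {"files": ["raw_run001.dat"], "max_events": 1000},
    "tracking": {"min_pt_gev": 0.9, "max_chi2": 30.0},
    "calorimetry": {"noise_threshold_gev": 0.2},
    "output": {"file": "reco_run001.root", "content": "AOD"},
}

def override(cfg, dotted_key, value):
    # Change a single option by path, e.g. from a command-line flag.
    keys = dotted_key.split(".")
    node = cfg
    for k in keys[:-1]:
        node = node[k]
    node[keys[-1]] = value

override(config, "tracking.min_pt_gev", 0.5)
print(config["tracking"]["min_pt_gev"])   # 0.5
```

Keeping the configuration in an interpreted language means a physicist can change cuts or input files without recompiling anything.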
36Object-Orientation
- The object-oriented paradigm has been adopted for HEP software development
- Basically all the code for the new generation experiments is O-O
- O-O has enabled us to handle the higher complexity reasonably well
- The migration to O-O was not easy and took longer than expected
- The process was quite long and painful (between 4-8 years)
- The community had to be re-educated to new languages and tools
- C++ is not a simple language
- Only specialists master it completely
- Mixing interpreted and compiled languages (e.g. Python and C++) is a workable compromise
37Software Frameworks
- Experiments develop Software Frameworks
- General Architecture of the Event processing applications
- To achieve coherency and to facilitate software re-use
- Hide technical details from the end-user Physicists (providers of the Algorithms)
- Applications are developed by customizing the Framework
- By composing elemental Algorithms into complete applications
- Using third-party components wherever possible and configuring them
Example: the Gaudi framework (C++) in use by LHCb and ATLAS
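The composition idea can be sketched in a few lines: the framework owns the event loop, and physicists only supply Algorithm subclasses that get registered into a sequence. This is a toy illustration loosely inspired by Gaudi-style frameworks, not their actual API:

```python
# Framework-style composition: the framework runs the loop, users add Algorithms.

class Algorithm:
    def execute(self, event):
        raise NotImplementedError

class ClusterFinder(Algorithm):
    def execute(self, event):
        event["clusters"] = [hit for hit in event["hits"] if hit > 1.0]

class TrackCounter(Algorithm):
    def execute(self, event):
        event["ntracks"] = len(event["clusters"])

class Framework:
    def __init__(self):
        self.sequence = []
    def add(self, algorithm):
        self.sequence.append(algorithm)
    def run(self, events):
        for event in events:            # technical details hidden from the user
            for algorithm in self.sequence:
                algorithm.execute(event)
        return events

app = Framework()
app.add(ClusterFinder())
app.add(TrackCounter())
out = app.run([{"hits": [0.5, 2.0, 3.1]}, {"hits": [0.2]}])
print([e["ntracks"] for e in out])      # [2, 0]
```

The key property is that a new Algorithm can be dropped into the sequence without touching the loop, the I/O, or any other Algorithm.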
38Non-HEP Packages widely used in HEP
- Non-HEP specific functionality required by HEP programs can be implemented using existing packages
- Favoring free and open-source software
- About 30 packages are currently in use by the LHC experiments
- Some examples:
- Boost: portable and free C++ source libraries intended to be widely useful and usable across a broad spectrum of applications
- GSL: the GNU Scientific Library
- Coin3D: high-level 3D graphics toolkit for developing cross-platform real-time 3D visualization
- Xerces-C++: XML parser written in a portable subset of C++
39HEP Generic Packages (1)
- Core Libraries
- Library of basic types (e.g. 3-vector, 4-vector, points, particle, etc.)
- Extensions to the C++ Standard Library
- Mathematical libraries
- Statistics libraries
- Utility Libraries
- Operating system isolation libraries
- Component model
- Plugin management
- C++ Reflection
- Examples: ROOT, CLHEP, etc.
40HEP Generic Packages (2)
- MC Generators
- This is the best example of common code used by all the experiments
- Well defined functionality and fairly simple interfaces
- Detector Simulation
- Presented in the form of toolkits/frameworks (Geant4, FLUKA)
- The user needs to input the geometry description, primary particles, user actions, etc.
- Data Persistency and Management
- To store and manage the data produced by experiments
- Data Visualization
- GUI, 2D and 3D graphics
- Distributed and Grid Analysis
- To support end-users using the distributed computing resources (PROOF, Ganga, ...)
41Persistency Framework
- FILES: based on ROOT I/O
- Targeted for complex data structures: event data, analysis data
- Management of object relationships: file catalogues
- Interface to Grid file catalogs and Grid file access
- Relational Databases: Oracle, MySQL, SQLite
- Suitable for conditions, calibration, alignment, detector description data - possibly produced by online systems
- Complex use cases and requirements, multiple environments: difficult to satisfy with a single vendor solution
- Isolating applications from the database implementations with a standardized relational database interface
- facilitates the life of the application developers
- no change in the application to run in different environments
- encodes good practices once for all
42ROOT I/O
- ROOT provides support for object input/output from/to platform-independent files
- The system is designed to be particularly efficient for the objects frequently manipulated by physicists: histograms, ntuples, trees and events
- I/O is possible for any user class. Non-intrusive: only the class dictionary needs to be defined
- Extensive support for schema evolution. Class definitions are not immutable over the life-time of the experiment
- The ROOT I/O area is still moving after 10 years
- Recent additions: full STL support, data compression, tree I/O from ASCII, tree indices, etc.
- All new HEP experiments rely on ROOT I/O to store their data
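Schema evolution means files written with an old class definition stay readable after the class gains, loses, or changes members; ROOT handles this via per-class version numbers carried in the dictionary. A toy illustration of the idea in plain Python (this mimics the concept only, not ROOT's actual mechanism):

```python
# Toy schema evolution: read "old" persisted objects with a newer class layout.

def read_track(record):
    # `record` is an on-file object carrying the class version it was written with.
    version = record.get("class_version", 1)
    track = {"pt": record["pt"], "charge": record["charge"]}
    if version >= 2:
        track["chi2"] = record["chi2"]   # member added in version 2
    else:
        track["chi2"] = -1.0             # default for data written before v2
    return track

old = {"class_version": 1, "pt": 12.5, "charge": -1}
new = {"class_version": 2, "pt": 7.0, "charge": 1, "chi2": 1.3}
print(read_track(old))   # {'pt': 12.5, 'charge': -1, 'chi2': -1.0}
print(read_track(new))   # {'pt': 7.0, 'charge': 1, 'chi2': 1.3}
```

Over an experiment lifetime of a decade or more, this back-compatibility is what allows old data to be re-read by ever-newer software.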
43Simulation
- Event Generators
- Programs to generate high-energy physics events following the theory and models for a number of physics aspects
- Specialized Particle Decay Packages
- Simulation of particle decays using the latest experimental data
- Detector Simulation
- Simulation of the passage of particles through matter and electromagnetic fields
- Detailed geometry and material descriptions
- Extensive list of physics processes based on theory, data or parameterization
- Detector responses
- Simulation of the detecting devices and corresponding electronics
44MC Generators
- Many MC generators and tools are available to the experiments, provided by a solid community (mostly theorists)
- Each experiment chooses the tools most adequate for its physics
- Example: ATLAS alone currently uses
- Generators
- AcerMC: Zbb, tt, single top, ttbb, Wbb
- Alpgen (+ MLM matching): W+jets, Z+jets, QCD multijets
- Charybdis: black holes
- HERWIG: QCD multijets, Drell-Yan, SUSY...
- Hijing: Heavy Ions, beam-gas...
- MC@NLO: tt, Drell-Yan, boson pair production
- Pythia: QCD multijets, B-physics, Higgs production...
- Decay packages
- TAUOLA: interfaced to work with Pythia, Herwig and Sherpa
- PHOTOS: interfaced to work with Pythia, Herwig and Sherpa
- EvtGen: used in B-physics channels
45Detector Simulation - Geant4
- Geant4 has become an established tool, in production for the majority of LHC experiments during the past two years, and in use in many other HEP experiments and for applications in medical, space and other fields
- Ongoing work in physics validation
- Good example of common software
- LHCb: 18 million volumes; ALICE: 3 million volumes
46Distributed Analysis
- Analysis will be performed with a mix of official experiment software and private user code
- How can we make sure that the user code can execute and provide a correct result wherever it lands?
- Input datasets not necessarily known a priori
- Possibly very sparse data access pattern when only a very few events match the query
- Large number of people submitting jobs concurrently and in an uncoordinated fashion, resulting in a chaotic workload
- Wide range of user expertise
- Need for interactivity: requirements on system response time rather than throughput
- Ability to suspend an interactive session and resume it later, in a different location
47Data Analysis: The Spectrum
- From Batch Physics Analysis
- Run on the complete data set (from TB to PB)
- Reconstruction of non-visible particles from decay products
- Classification of events based on physical properties
- Several non-exclusive data streams with summary information (event tags) (from GB to TB)
- Costly operation
- To Interactive Physics Analysis
- Final event selection and refinements (few GB)
- Histograms, N-tuples, fitting models
- Data Visualization
- Scripting and GUI
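The interactive end of the spectrum is mostly histogramming and fitting on a small selected sample. A self-contained sketch of filling a histogram from a toy mass distribution and estimating the peak position (pure Python standing in for ROOT/PAW-style tools; the numbers are invented):

```python
import random

# Fill a histogram with a toy "mass" distribution and estimate the peak.
random.seed(1)
masses = [random.gauss(91.2, 2.5) for _ in range(10000)]   # toy Z-like peak, GeV

lo, hi, nbins = 80.0, 100.0, 40
width = (hi - lo) / nbins
bins = [0] * nbins
for m in masses:
    if lo <= m < hi:
        bins[int((m - lo) / width)] += 1

# Crude peak estimate: centre of the most populated bin.
peak_bin = max(range(nbins), key=bins.__getitem__)
peak = lo + (peak_bin + 0.5) * width
print(f"estimated peak: {peak:.2f} GeV")   # close to the generated 91.2 GeV
```

A real analysis would fit a model (e.g. a Gaussian plus background) to the bin contents instead of picking the maximum bin, but the fill-then-summarize flow is the same.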
48Experiment Specific Analysis Frameworks
- Development of Event models and high level analysis tools specific to the experiment physics goals
- Example: DaVinci (LHCb)
49ROOT
- The ROOT system is an Object-Oriented framework for large scale data analysis written in C++
- It includes among others
- Efficient object persistency facilities
- C++ interpreter
- Advanced statistical analysis (multi-dimensional histogramming, fitting, minimization, cluster finding algorithms) and visualization tools
- The user interacts with ROOT via a graphical user interface, the command line or batch scripts
- The project started in 1995 and is now a very mature system used by many physicists worldwide
50Data Visualization
- Experiments develop interactive Event and Geometry display programs
- Help to develop detector geometries
- Help to develop pattern recognition algorithms
- Interactive data analysis and data presentation
- Ingredients: GUI, 3-D graphics, scripting
Example: visualization in CMS (IGUANA)
51SUMMARY
52Three Challenges
- Detector
- 2 orders of magnitude more channels than before
- Triggers must choose correctly only 1 event in every 500,000
- Level-2/3 triggers are software-based
- Geographical spread
- Communication and collaboration at a distance
- Distributed computing resources
- Remote software development and physics analysis
- Physics
- Precise and specialized algorithms for reconstruction and calibration
- Allow remote physicists to access detailed event information
- Migrate reconstruction and selection algorithms effectively to the High Level Trigger
- Main Challenge is in Managing the Complexity
53Analysis, Data and Computing Models
- Analysis (data processing) Model with several organized steps of data reduction
- Hierarchical Data Model that matches the various steps in the Analysis Model
- Multi-Tier Computing Model that
- matches the geographical location of physics analysis groups
- defines the policies for replicating data to those sites with appropriate resources (and interest!) to run jobs
- Grids are important to
- providers: centres supplying resources in a multi-science environment in a secure and managed way
- consumers: hungry for those resources (potential of resource discovery)
54Software Solutions
- A variety of different software domains and expertise
- Data Management, Simulation, Interactive Visualization, Distributed Computing, etc.
- Application Software Frameworks
- Ensure coherency in the Event data processing applications
- Make the life of Physicists easier by hiding most of the technicalities
- Allow running the same software in different environments and with different configurations
- Withstand technology changes
- Set of HEP specific core software components
- ROOT, Geant4, CLHEP, POOL
- Extensive use of third-party generic software
- Open-source products favored