
1
Computing & Software Challenges @ LHC, or how to
convert 100 TB/s into a Nobel prize
Vincenzo Innocente (CMS Experiment, CERN/PH-SFT)
  • ESAC
  • Madrid
  • May 14th, 2008

2
Outline
  • Physics
  • signals, signatures and rates
  • Detectors
  • volumes and rates of raw data
  • Analysis Model (Data Processing)
  • Data representation, classification, metadata
  • Computing Model
  • Software architecture
  • Interactive Data Analysis
  • Not Covered (backup slides available for
    discussion)
  • Organization
  • Software development process
  • Grid
  • Object streaming and persistency (in files and
    RDBMS)
  • Root
  • Experience during commissioning and possible
    future directions

3
The Large Hadron Collider at CERN
First Data expected in Summer
4
Collisions at the LHC: summary
5
pp collisions at 14 TeV at 10^34 cm^-2 s^-1
A very difficult environment
  • 20 proton-proton collisions overlap
  • H → ZZ with Z → 2 muons
  • H → 4 muons: the cleanest ("golden") signature
  • And this (not the H though) repeats every 25 ns

6
Impact on detector design
  • LHC detectors must have fast response
  • Otherwise they will integrate over many bunch crossings → large pile-up
  • Typical response time 20-50 ns
  • → integrate over 1-2 bunch crossings → pile-up of 25-50 min-bias events
  • → very challenging readout electronics
  • LHC detectors must be highly granular
  • Minimize the probability that pile-up particles be in the same detector element as an interesting object (e.g. γ from H → γγ decays)
  • → large number of electronic channels
  • → high cost
  • LHC detectors must be radiation resistant
  • High flux of particles from pp collisions → high radiation environment, e.g. in forward calorimeters:
  • up to 10^17 n/cm^2 in 10 years of LHC operation
  • up to 10^7 Gy (1 Gy = unit of absorbed energy = 1 Joule/kg)

7
A Generic Multipurpose LHC Detector
Three detector layers in a magnetic field. Inner tracker: vertex and charged particles. Calorimeters: energy of electrons, photons, hadrons. External tracker: identify muons.
8
The ATLAS Detector
  • The ATLAS experiment is
  • 26m long,
  • stands 20m high,
  • weighs 7000 tons
  • has 200 million read-out channels
  • orders of magnitude increase in complexity
  • The ATLAS collaboration is
  • 2000 physicists from ..
  • 150 universities and labs in..
  • 34 countries
  • distributed resources
  • remote development

9
DATA ORGANIZATION
10
Data and Algorithms
  • HEP main data are organized in Events (particle
    collisions)
  • Simulation, Reconstruction and Analysis programs process one Event at a time
  • Events are fairly independent of each other
  • → trivial parallel processing (see the sketch after this list)
  • Event processing programs are composed of a
    number of Algorithms selecting and transforming
    raw Event data into processed (reconstructed)
    Event data and statistics
  • Algorithms are mainly developed by Physicists
  • Algorithms may require additional detector
    conditions data (e.g. calibrations, geometry,
    environmental parameters, etc. )
  • Statistical data (histograms, distributions,
    etc.) are typically the final data processing
    results
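A minimal sketch of this processing pattern, with hypothetical Event and Algorithm types (not any experiment's actual framework API): the framework owns the event loop, physicists supply the algorithms, and since events are independent the loop parallelizes trivially.

    #include <memory>
    #include <vector>

    // Hypothetical minimal types, for illustration only.
    struct Event { /* raw and reconstructed data products */ };

    // Physicists implement process(); the framework handles
    // I/O, scheduling and configuration around it.
    struct Algorithm {
      virtual ~Algorithm() = default;
      virtual void process(Event& ev) = 0;
    };

    // The framework owns the loop: events are independent, so N
    // such loops can run in parallel on disjoint event ranges.
    void run(std::vector<std::unique_ptr<Algorithm>>& algos,
             std::vector<Event>& events) {
      for (auto& ev : events)
        for (auto& alg : algos)
          alg->process(ev);
    }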

11
High Energy Analysis Model
Reconstruction goes back in time from digital
signals to the original particles produced in the
collision
Monte Carlo Simulation follows the evolution of
physics processes from collision to digital
signals
Analysis compares (at statistical level)
reconstructed events from real data with those
from simulation
12
Data Hierarchy
RAW, ESD, AOD, TAG

RAW (~2 MB/event)
Triggered events recorded by DAQ; detector digitisation

ESD/RECO (~100 kB/event)
Reconstructed information; pseudo-physical information: clusters, track candidates

AOD (~10 kB/event)
Analysis information; physical information: transverse momentum, association of particles, jets, (best) id of particles

TAG (~1 kB/event)
Classification information; relevant information for fast event selection
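As a rough picture of that reduction chain, here is a toy C++ sketch (type and field names are made up; the sizes are the nominal ones above): each step shrinks the per-event record by one to two orders of magnitude.

    #include <cstdint>
    #include <vector>

    // Toy per-event records mirroring the nominal sizes above.
    struct RawEvent  { std::vector<uint8_t> frontEndData; };  // ~2 MB: DAQ digitisation
    struct RecoEvent { /* clusters, track candidates */ };    // ~100 kB: ESD/RECO
    struct AODEvent  { /* momenta, jets, particle ids */ };   // ~10 kB: analysis objects
    struct TagEvent  {                                        // ~1 kB: fast selection
      uint32_t run, event;
      uint64_t selectionBits;  // classification flags for skimming
    };

    // One reduction step per tier (bodies omitted in this sketch).
    RecoEvent reconstruct(const RawEvent&);  // RAW  -> ESD/RECO
    AODEvent  summarize(const RecoEvent&);   // RECO -> AOD
    TagEvent  tag(const AODEvent&);          // AOD  -> TAG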
13
Event Data
  • Complex data models
  • 500 structure types
  • References to describe relationships between
    event objects
  • unidirectional
  • Need to support transparent navigation
  • Need ultimate resolution on selected events
  • need to run specialised algorithms
  • work interactively
  • Not affordable if uncontrolled

14
DataSets - Event Collections: HEP Metadata and Bookkeeping
15
Selection/Skimming
16
Data Bookkeeping in CMS
  • Events are stored in Files
  • For MC there are O(1000) events/file
  • Files are grouped in Blocks
  • We aim to make blocks > the size of a tape
  • Blocks are grouped into Datasets
  • We need to track
  • Which block a file is in
  • Which dataset a block is in
  • File/Block/Dataset metadata
  • To do all this CMS has developed DBS, the CMS Dataset Bookkeeping System (a toy sketch follows this list)
  • Oracle Backend
  • Web interface
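A toy in-memory sketch of those file → block → dataset relations (all names hypothetical; the real DBS keeps them in Oracle behind a web interface):

    #include <map>
    #include <string>
    #include <vector>

    // Toy model of the bookkeeping relations described above.
    struct Bookkeeping {
      std::map<std::string, std::string> blockOfFile;     // file  -> block
      std::map<std::string, std::string> datasetOfBlock;  // block -> dataset
      std::map<std::string, std::vector<std::string>> filesInBlock;

      // "Which dataset is this file in?" resolves via its block
      // (throws std::out_of_range for unknown files).
      std::string datasetOfFile(const std::string& file) const {
        return datasetOfBlock.at(blockOfFile.at(file));
      }
    };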

17
Examples of queries
18
Detector Conditions Data
  • Reflects changes in state of the detector with
    time
  • Event Data cannot be reconstructed/analyzed
    without it
  • Versioning
  • Tagging
  • Ability to extract slices of data
  • World-wide access (replica, closest available
    copy, etc.)
  • Long life-time

(diagram: conditions payload versions laid out along the time axis, with a tag selecting one version per interval of validity; a minimal lookup sketch follows)
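A minimal sketch of time-based conditions lookup under a given tag, assuming hypothetical Payload/Tag types (real implementations add caching, slicing and replication on top):

    #include <iterator>
    #include <map>

    struct Payload { /* calibration constants, alignment, ... */ };

    using Time = unsigned long long;
    // One tag = a map from start-of-validity time to the payload
    // version valid from that time until the next key.
    using Tag = std::map<Time, Payload>;

    // Return the payload whose interval of validity covers 'time',
    // or nullptr if 'time' precedes the first interval.
    const Payload* lookup(const Tag& tag, Time time) {
      auto it = tag.upper_bound(time);  // first interval starting after 'time'
      if (it == tag.begin()) return nullptr;
      return &std::prev(it)->second;    // the interval containing 'time'
    }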
19
Conditions Database
  • Need tools for data management, data browsing,
    replication, slicing, import/export
  • Database implementations (more than one)
  • Use supported features (transactions, replication, ...)

20
COMPUTING ARCHITECTURE
21
Physics Selection at LHC
22
Online selection
HLT: same hardware, same software as offline.
Different situation: cannot repeat, cannot afford even a rare crash or a small memory leak!
23
Trigger/DAQ Summary for LHC
Experiment  Levels  Trigger Rate (Hz)         Event Size (Byte)  Readout Bandw. (GB/s)  Filter Out MB/s (Event/s)
ATLAS       3       LV-1: 10^5, LV-2: 10^3    10^6               10                     100 (10^2)
CMS         2       LV-1: 10^5                10^6               100                    100 (10^2)
LHCb        3       LV-0: 10^6, LV-1: 4x10^4  2x10^5             4                      40 (2x10^2)
ALICE       4       Pb-Pb: 500, p-p: 10^3     5x10^7 / 2x10^6    5                      1250 (10^2) / 200 (10^2)
24
HEP Data Handling and Computation
(data-flow diagram: detector → event filter (selection & reconstruction) → raw data → event summary data; event reprocessing and event simulation feed the same chain; batch physics analysis extracts analysis objects by physics topic, which feed interactive physics analysis. Production steps are centrally managed; end-user analysis is chaotic and locally managed.)
25
A Multi-Tier Computing Model
Tier 0 (Experiment Host lab)
Tier 1 (Main Regional Centres)
Tier 2
Tier 3
Desktop
  • Exploit Grid infrastructures
  • Homogeneous environment
  • Optimal share of resources
  • Hierarchy of computing centers
  • Matches computing tasks to local interests,
    expertise and resources

26
CMS Computing Model
  • Site activities and functionality largely
    predictable
  • Activities are driven by data location
  • Organized mass processing and custodial storage
    at Tier-1s
  • chaotic computing essentially restricted to
    data analysis at T2s
  • Resource evolution: CPU ~10K boxes (one 8-core Clovertown 2.3 GHz box ≈ 10K SI2K), disk ~15 PB, tape ~20 PB
27
Tiered Architecture
  • CAF: CMS Analysis Facility at CERN
  • Access to full raw dataset
  • Focused on latency-critical detector trigger
    calibration and analysis activities
  • Provide some CMS central services (e.g. store
    conditions and calibrations)
  • Tier-0
  • Accepts data from DAQ
  • Prompt reconstruction
  • Data archive and distribution to Tier-1s
  • Tier-1s
  • Real data archiving
  • Re-processing
  • Skimming and other data-intensive analysis tasks
  • Calibration
  • MC data archiving
  • Tier-2s
  • User data Analysis
  • MC production
  • Import skimmed datasets from Tier-1 and export MC
    data
  • Calibration/alignment

28
CMS Data Work Flow
(under test as I speak)
(workflow diagram: prompt reconstruction, prompt precise calibration, analysis)
29
CMS DQM (First Look)
(status table: Central Operation — live (secs); Detector Operation & Efficiency — hours; Physics Data Certification — 1 day; statuses range from "operation established" through "being deployed" to "to be devised")
DQM project scope has been extended to include Tier-0 and CAF (Tier-1/2 soon to come)
30
DQM @ Tier-0
Validation / Physics Data Certification
  • DQM objects (histograms) written in same file as
    data
  • Harvesting step extracts histos
  • Data Certification (input to Good-Runs-Lists)
    will be done during Harvesting
  • Certification data stored and managed by DBS
  • Histos managed by a specific DQM GUI web server

31
APPLICATION SOFTWARE
32
Software Structure
Applications are built on top of frameworks and implement the required algorithms. The layers, top to bottom:

Applications

Experiment Framework (Event, DetDesc., Calib.)
Every experiment has a framework for basic services and various specialized frameworks: event model, detector description, visualization, persistency, interactivity, simulation, calibration, etc.

Specialized Frameworks (Simulation, Data Mngmt., Distrib. Analysis)
Specialized domains that are common among the experiments.

Core Libraries
Core libraries and services that are widely used and provide basic functionality.

Non-HEP specific software packages
General purpose non-HEP libraries.
33
Software Components
  • Foundation Libraries
  • Basic types
  • Utility libraries
  • System isolation libraries
  • Mathematical Libraries
  • Special functions
  • Minimization, Random Numbers
  • Data Organization
  • Event Data
  • Event Metadata (Event collections)
  • Detector Conditions Data
  • Data Management Tools
  • Object Persistency
  • Data Distribution and Replication
  • Simulation Toolkits
  • Event generators
  • Detector simulation
  • Statistical Analysis Tools
  • Histograms, N-tuples
  • Fitting
  • Interactivity and User Interfaces
  • GUI
  • Scripting
  • Interactive analysis
  • Data Visualization and Graphics
  • Event and Geometry displays
  • Distributed Applications
  • Parallel processing
  • Grid computing

34
Programming Languages
  • Object-Oriented (O-O) programming languages have become the norm for developing the software for HEP experiments
  • C++ is in use by (almost) all experiments
  • Pioneered by BaBar and Run II (D0 and CDF)
  • LHC experiments with an initial FORTRAN code base have basically completed the migration to C++
  • Large common software projects in C++ have been in production for many years already
  • ROOT, Geant4, ...
  • FORTRAN still in use mainly by the MC generators
  • Large development efforts are being put into the migration to C++ (Pythia8, Herwig++, Sherpa, ...)

35
Scripting Languages
  • Scripting has been an essential component of HEP analysis software for the last decades
  • PAW macros (kumac) in the FORTRAN era
  • C++ interpreter (CINT) in the C++ era
  • Python recently introduced and gaining momentum
  • Most of the statistical data analysis and final presentation is done with scripts
  • Interactive analysis
  • Rapid prototyping to test new ideas
  • Driving complex procedures
  • Scripts are also used to configure the complex C++ programs developed and used by the LHC experiments (see the macro sketch after this list)
  • Simulation and Reconstruction programs with hundreds or thousands of options to configure
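For instance, a few-line ROOT macro of the kind CINT runs interactively; a sketch with made-up histogram ranges and values, illustrating the final-presentation work usually left to scripts:

    // fit_mass.C -- run as: root fit_mass.C
    #include "TH1F.h"
    #include "TRandom.h"

    void fit_mass() {
      // Toy Z-like mass peak; binning and parameters are made up.
      TH1F h("h", "toy mass peak;m [GeV];events", 100, 80., 100.);
      for (int i = 0; i < 10000; ++i)
        h.Fill(gRandom->Gaus(91.2, 2.5));
      h.Fit("gaus");   // built-in Gaussian fit
      h.DrawCopy();    // draw a copy that survives the macro
    }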

36
Object-Orientation
  • The object-oriented paradigm has been adopted for
    HEP software development
  • Basically all the code for the new generation
    experiments is O-O
  • O-O has enabled us to handle reasonably well
    higher complexity
  • The migration to O-O was not easy and took longer
    than expected
  • The process was quite long and painful (between
    4-8 years)
  • The community had to be re-educated to new
    languages and tools
  • C++ is not a simple language
  • Only specialists master it completely
  • Mixing interpreted and compiled languages (e.g. Python and C++) is a workable compromise

37
Software Frameworks
  • Experiments develop Software Frameworks
  • General Architecture of the Event processing
    applications
  • To achieve coherency and to facilitate software
    re-use
  • Hide technical details from the end-user Physicists (providers of the Algorithms)
  • Applications are developed by customizing the
    Framework
  • By the composition of elemental Algorithms to
    form complete applications
  • Using third-party components wherever possible and configuring them

Example: the Gaudi framework (C++), in use by LHCb and ATLAS (schematic sketch below)
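A schematic of the contract such frameworks impose on an Algorithm (a simplified sketch in the spirit of Gaudi, not its full API): the framework calls the hooks, the physicist only fills them in.

    // Simplified status type standing in for the framework's own.
    struct StatusCode { bool success; };

    // The framework drives these hooks; user code never owns the loop.
    struct IAlgorithm {
      virtual ~IAlgorithm() = default;
      virtual StatusCode initialize() = 0;  // once, before the event loop
      virtual StatusCode execute()    = 0;  // once per event
      virtual StatusCode finalize()   = 0;  // once, after the loop
    };

An application is then a configured sequence of such algorithms, with event data, conditions and services supplied by the framework.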
38
Non-HEP Packages widely used in HEP
  • Non-HEP specific functionality required by HEP
    programs can be implemented using existing
    packages
  • Favoring free and open-source software
  • About 30 packages are currently in use by the LHC
    experiments
  • Here are some examples
  • Boost
  • Portable and free C++ source libraries intended to be widely useful and usable across a broad spectrum of applications
  • GSL
  • GNU Scientific Library
  • Coin3D
  • High-level 3D graphics toolkit for developing
    cross-platform real-time 3D visualization
  • XercesC
  • XML parser written in a portable subset of C++

39
HEP Generic Packages (1)
  • Core Libraries
  • Library of basic types (e.g. 3-vector, 4-vector,
    points, particle, etc.)
  • Extensions to the C++ Standard Library
  • Mathematical libraries
  • Statistics libraries
  • Utility Libraries
  • Operating system isolation libraries
  • Component model
  • Plugin management
  • C++ Reflection
  • Examples: ROOT, CLHEP, etc.

40
HEP Generic Packages (2)
  • MC Generators
  • This is the best example of common code used by
    all the experiments
  • Well defined functionality and fairly simple
    interfaces
  • Detector Simulation
  • Presented in the form of toolkits/frameworks (Geant4, FLUKA)
  • The user needs to input the geometry description,
    primary particles, user actions, etc.
  • Data Persistency and Management
  • To store and manage the data produced by
    experiments
  • Data Visualization
  • GUI, 2D and 3D graphics
  • Distributed and Grid Analysis
  • To support end-users using the distributed computing resources (PROOF, Ganga, ...)

41
Persistency Framework
  • FILES - based on ROOT I/O
  • Targeted for complex data structures: event data, analysis data
  • Management of object relationships: file catalogues
  • Interface to Grid file catalogs and Grid file access
  • Relational Databases: Oracle, MySQL, SQLite
  • Suitable for conditions, calibration, alignment, detector description data - possibly produced by online systems
  • Complex use cases and requirements in multiple environments: difficult to satisfy with a single vendor solution
  • Isolating applications from the database implementations with a standardized relational database interface (schematic sketch after this list)
  • facilitate the life of the application developers
  • no change in the application to run in different
    environments
  • encode good practices once for all
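Schematically, the isolation layer means applications code against an abstract session interface rather than a vendor API; a hypothetical sketch (interface and factory names made up):

    #include <memory>
    #include <string>

    // Hypothetical vendor-neutral interface: switching Oracle,
    // MySQL or SQLite means swapping the concrete backend,
    // not touching the application code.
    struct IRelationalSession {
      virtual ~IRelationalSession() = default;
      virtual void connect(const std::string& contactString) = 0;
      virtual void startTransaction(bool readOnly) = 0;
      virtual void commit() = 0;
    };

    // A factory picks the backend from the connection string,
    // e.g. "oracle://cms_conditions" vs "sqlite_file:cond.db".
    std::unique_ptr<IRelationalSession>
    makeSession(const std::string& contactString);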

42
ROOT I/O
  • ROOT provides support for object input/output
    from/to platform independent files
  • The system is designed to be particularly efficient for the objects frequently manipulated by physicists: histograms, ntuples, trees and events
  • I/O is possible for any user class.
    Non-intrusive, only the class dictionary needs
    to be defined
  • Extensive support for schema evolution. Class
    definitions are not immutable over the life-time
    of the experiment
  • The ROOT I/O area is still moving after 10 years
  • Recent additions: full STL support, data compression, tree I/O from ASCII, tree indices, etc.
  • All new HEP experiments rely on ROOT I/O to store their data (a minimal writing sketch follows)
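A minimal sketch of writing a tree of toy events to a platform-independent ROOT file (branch names and values made up):

    // write_tree.C -- sketch of ROOT object I/O with a TTree.
    #include "TFile.h"
    #include "TTree.h"
    #include "TRandom.h"

    void write_tree() {
      TFile f("events.root", "RECREATE");       // platform-independent file
      TTree tree("Events", "toy event data");
      float pt = 0; int nHits = 0;
      tree.Branch("pt", &pt, "pt/F");           // leaflist syntax: name/type
      tree.Branch("nHits", &nHits, "nHits/I");
      for (int i = 0; i < 1000; ++i) {
        pt = gRandom->Exp(20.);                 // toy transverse momentum
        nHits = gRandom->Poisson(15.);          // toy hit count
        tree.Fill();                            // append one entry
      }
      tree.Write();                             // serialize the tree
      f.Close();
    }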

43
Simulation
  • Event Generators
  • Programs to generate high-energy physics events
    following the theory and models for a number of
    physics aspects
  • Specialized Particle Decay Packages
  • Simulation of particle decays using latest
    experimental data
  • Detector Simulation
  • Simulation of the passage of particles through
    matter and electromagnetic fields
  • Detailed geometry and material descriptions
  • Extensive list of physics processes based on
    theory, data or parameterization
  • Detector responses
  • Simulation of the detecting devices and
    corresponding electronics

44
MC Generators
  • Many MC generators and tools are available to the experiments, provided by a solid community (mostly theorists)
  • Each experiment chooses the tools most adequate for its physics
  • Example: ATLAS alone currently uses
  • Generators
  • AcerMC: Zbb, tt, single top, ttbb, Wbb
  • Alpgen (+ MLM matching): W+jets, Z+jets, QCD multijets
  • Charybdis: black holes
  • HERWIG: QCD multijets, Drell-Yan, SUSY...
  • Hijing: Heavy Ions, Beam-gas...
  • MC@NLO: tt, Drell-Yan, boson pair production
  • Pythia: QCD multijets, B-physics, Higgs production...
  • Decay packages
  • TAUOLA: interfaced to work with Pythia, Herwig and Sherpa
  • PHOTOS: interfaced to work with Pythia, Herwig and Sherpa
  • EvtGen: used in B-physics channels

45
Detector Simulation - Geant4
  • Geant4 has become an established tool, in
    production for the majority of LHC experiments
    during the past two years, and in use in many
    other HEP experiments and for applications in
    medical, space and other fields
  • Ongoing work on physics validation
  • Good example of common software

LHCb: 18 million volumes. ALICE: 3 million volumes.
(a minimal application skeleton is sketched below)
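Wiring a Geant4 application follows a fixed pattern: the user hands geometry, physics and primary-particle classes to the run manager. A sketch; the My* classes are hypothetical user code whose definitions are omitted here.

    #include "G4RunManager.hh"

    int main() {
      auto* runManager = new G4RunManager;
      // Hypothetical user classes: geometry & materials, physics
      // processes, and primary particles for each event.
      runManager->SetUserInitialization(new MyDetectorConstruction);
      runManager->SetUserInitialization(new MyPhysicsList);
      runManager->SetUserAction(new MyPrimaryGeneratorAction);
      runManager->Initialize();
      runManager->BeamOn(100);  // simulate 100 events
      delete runManager;
      return 0;
    }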
46
Distributed Analysis
  • Analysis will be performed with a mix of
    official experiment software and private user
    code
  • How can we make sure that the user code can
    execute and provide a correct result wherever it
    lands?
  • Input datasets not necessarily known a priori
  • Possibly very sparse data access pattern when
    only a very few events match the query
  • Large number of people submitting jobs concurrently and in an uncoordinated fashion, resulting in a chaotic workload
  • Wide range of user expertise
  • Need for interactivity - requirements on system
    response time rather than throughput
  • Ability to suspend an interactive session and
    resume it later, in a different location

47
Data Analysis The Spectrum
  • From Batch Physics Analysis
  • Run on the complete data set (from TB to PB)
  • Reconstruction of non-visible particles from
    decay products
  • Classification of events based on physical properties
  • Several non-exclusive data streams with summary
    information (event tags) (from GB to TB)
  • Costly operation
  • To Interactive Physics Analysis
  • Final event selection and refinements (few GB)
  • Histograms, N-tuples, Fitting models
  • Data Visualization
  • Scripting and GUI

48
Experiment Specific Analysis Frameworks
  • Development of Event models and high level
    analysis tools specific to the experiment physics
    goals
  • Example: DaVinci (LHCb)

49
ROOT
  • The ROOT system is an Object-Oriented framework for large scale data analysis written in C++
  • It includes among others
  • Efficient object persistency facilities
  • C++ interpreter
  • Advanced statistical analysis (multi-dimensional histogramming, fitting, minimization, cluster finding algorithms) and visualization tools
  • The user interacts with ROOT via a graphical user interface, the command line or batch scripts
  • The project started in 1995 and is now a very mature system used by many physicists worldwide

50
Data Visualization
  • Experiments develop interactive Event and
    Geometry display programs
  • Help to develop detector geometries
  • Help to develop pattern recognition algorithms
  • Interactive data analysis and data presentation
  • Ingredients
  • GUI
  • 3-D graphics
  • Scripting

Example: visualization in CMS (IGUANA)
51
SUMMARY
52
Three Challenges
  • Detector
  • 2 orders of magnitude more channels than before
  • Triggers must choose correctly only 1 event in
    every 500,000
  • Level-2/3 triggers are software-based
  • Geographical spread
  • Communication and collaboration at a distance
  • Distributed computing resources
  • Remote software development and physics analysis
  • Physics
  • Precise and specialized algorithms for
    reconstruction and calibration
  • Allow remote physicists to access detailed
    event-information
  • Effectively migrate reconstruction and selection algorithms to the High Level Trigger
  • Main Challenge is in Managing the Complexity

53
Analysis, Data and Computing Models
  • Analysis (data processing) Model with several
    organized steps of data reduction
  • Hierarchical Data Model that matches the various
    steps in the Analysis Model
  • Multi-Tier Computing Model that
  • matches the geographical location of physics
    analysis groups
  • defines the policies for replicating data to
    those sites with appropriate resources (and
    interest!) to run jobs
  • Grids are important to
  • providers: centres supplying resources in a multi-science environment in a secure and managed way
  • consumers: hungry for those resources (potential of resource discovery)

54
Software Solutions
  • A variety of different software domains and
    expertise
  • Data Management, Simulation, Interactive
    Visualization, Distributed Computing, etc.
  • Application Software Frameworks
  • Ensure coherency in the Event data processing
    applications
  • Make the life of Physicists easier by hiding most
    of the technicalities
  • Allow running the same software in different environments and with different configurations
  • Withstand technology changes
  • Set of HEP-specific core software components
  • ROOT, Geant4, CLHEP, POOL
  • Extensive use of third-party generic software
  • Open-source products favored