1
Introduction to High Energy Physics Data Analysis Software
Pere Mato (CERN/PH-SFT)
  • November 17th, 2004

Very much from a Large Hadron Collider (LHC)
perspective
2
Talk Outline
  • Physics signatures and rates
  • Data processing and datasets
  • Software structure and frameworks
  • Software components and domains
  • Usage of third-party software
  • Summary

3
Example: The ATLAS Detector
  • The ATLAS collaboration is
  • 2000 physicists from ..
  • 150 universities and labs
  • from 34 countries
  • distributed resources
  • remote development
  • The ATLAS detector is
  • 26m long,
  • stands 20m high,
  • weighs 7000 tons
  • has 200 million read-out channels

4
ATLAS Physics Signatures and Event Rates
  • LHC pp collisions at √s = 14 TeV
  • Bunches cross at 40 MHz
  • σ(inelastic) ≈ 80 mb
  • at high luminosity, ≫ 1 pp collision/crossing
  • ~10⁹ collisions per second
  • Study different physics channels each with their
    own signature e.g.
  • Higgs
  • Supersymmetry
  • B physics
  • Interesting physics events are buried in
    backgrounds of uninteresting physics events (1
    in 10⁵ to 10⁹ of recorded events)

5
HEP Processing Stages and Datasets
  • Flow diagram: detector → event filter (selection and reconstruction)
    → raw data → event reconstruction → processed data / Event Summary
    Data (ESD) → batch physics analysis → Analysis Object Data (AOD,
    extracted by physics topic) → individual physics analysis
  • Event simulation feeds the same chain with simulated data
6
Data and Algorithms
  • HEP main data is organized in Events (particle
    collisions)
  • Simulation, Reconstruction and Analysis programs
    process one Event at a time
  • Events are fairly independent of each other
  • Trivial parallel processing
  • Event processing programs are composed of a
    number of Algorithms selecting and transforming
    raw Event data into new processed Event data
    and statistics (a minimal sketch follows this list)
  • Algorithms are mainly developed by Physicists
  • Algorithms may require additional detector
    conditions data (e.g. calibrations, geometry,
    environmental parameters, etc. )
  • Statistical data (histograms, distributions,
    etc.) are typically the final data processing
    results
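
A minimal sketch of the composition idea above, assuming nothing about any
particular experiment framework: the Event, Algorithm and TrackFinder names
below are illustrative only.

    #include <iostream>
    #include <memory>
    #include <vector>

    // Illustrative event record; real frameworks use a rich object store.
    struct Event {
      long id = 0;
      std::vector<double> rawHits;   // "raw" data read from the detector
      std::vector<double> tracks;    // "processed" data produced by algorithms
    };

    // Illustrative Algorithm interface: physicists implement execute() only.
    struct Algorithm {
      virtual ~Algorithm() = default;
      virtual void execute(Event& evt) = 0;
    };

    // Example algorithm: selects raw hits and turns them into (toy) tracks.
    struct TrackFinder : Algorithm {
      void execute(Event& evt) override {
        for (double hit : evt.rawHits)
          if (hit > 0.5) evt.tracks.push_back(hit);   // trivial "selection"
      }
    };

    int main() {
      std::vector<std::unique_ptr<Algorithm>> algorithms;
      algorithms.push_back(std::make_unique<TrackFinder>());

      // The event loop: events are independent, so this is trivially parallel.
      for (long i = 0; i < 3; ++i) {
        Event evt;
        evt.id = i;
        evt.rawHits = {0.2, 0.7, 0.9};
        for (auto& alg : algorithms) alg->execute(evt);
        std::cout << "event " << evt.id << ": " << evt.tracks.size() << " tracks\n";
      }
    }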

7
Data Hierarchy
  • RAW - 2 MB/event
  • Triggered events recorded by DAQ
  • Detector digitisation
  • 10⁹ events/yr × 2 MB/event ≈ 2 PB/yr (checked in the sketch below)
  • ESD - 100 kB/event
  • Reconstructed information
  • Pattern recognition information: clusters, track candidates
  • AOD - 10 kB/event
  • Analysis information
  • Physical information: transverse momentum, association of
    particles, jets, (best) id of particles
  • TAG - 1 kB/event
  • Relevant information for fast event selection
  • Classification information
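
A back-of-the-envelope check of the volumes above (only the arithmetic is
added; the event count and per-event sizes are the slide's own figures):

    #include <cstdio>

    int main() {
      const double eventsPerYear = 1e9;                       // triggered events per year
      const double bytesPerEvent[] = {2e6, 100e3, 10e3, 1e3}; // RAW, ESD, AOD, TAG
      const char*  tier[] = {"RAW", "ESD", "AOD", "TAG"};
      for (int i = 0; i < 4; ++i)
        std::printf("%s: %.3g TB/yr\n", tier[i],
                    eventsPerYear * bytesPerEvent[i] / 1e12);
      // RAW: 1e9 events * 2 MB/event = 2e15 bytes, i.e. about 2 PB per year
    }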
8
Software Organization
  • Applications (Reconstruction, Simulation, High Level Triggers,
    Analysis): built on top of frameworks, implementing the required
    algorithms
  • Frameworks / Toolkits: one framework for basic services plus various
    specialized frameworks (detector description, visualization,
    persistency, interactivity, simulation, etc.)
  • Foundation Libraries: a series of widely used basic libraries (STL,
    CLHEP, GSL, etc.)
9
Software Frameworks
  • Experiments develop Software Frameworks
  • General Architecture of the Event processing
    applications
  • To achieve coherency and to facilitate software
    re-use
  • Hide technical details from the end-user Physicists
    (the providers of the Algorithms)
  • Applications are developed by customizing the
    Framework
  • By the composition of elemental Algorithms to
    form complete applications
  • Using third-party components wherever possible
    and configuring them

Example: the Gaudi framework (C++), in use by LHCb
and ATLAS
10
Software Components
  • Foundation Libraries
  • Basic types
  • Utility libraries
  • System isolation libraries
  • Mathematical Libraries
  • Special functions
  • Minimization, Random Numbers
  • Data Organization
  • Event Data
  • Event Metadata (Event collections)
  • Detector Conditions Data
  • Data Management Tools
  • Object Persistency
  • Data Distribution and Replication
  • Simulation Toolkits
  • Event generators
  • Detector simulation
  • Statistical Analysis Tools
  • Histograms, N-tuples
  • Fitting
  • Interactivity and User Interfaces
  • GUI
  • Scripting
  • Interactive analysis
  • Data Visualization and Graphics
  • Event and Geometry displays
  • Distributed Applications
  • Parallel processing
  • Grid computing

11
Components and Domains
12
Event Data
  • Complex data models
  • 500 structure types
  • References to describe relationships between
    event objects
  • unidirectional
  • Need to support transparent navigation (see the
    sketch after this list)
  • Need ultimate resolution on selected events
  • need to run specialised algorithms
  • work interactively
  • Not affordable if uncontrolled
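
A sketch of the kind of lightweight, unidirectional reference such event
models rely on, assuming nothing about any experiment's actual classes
(Ref, Cluster and Track below are illustrative):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Illustrative "smart reference": stores a container plus an index and
    // resolves on demand, so links can survive I/O without raw pointers.
    template <class T>
    struct Ref {
      const std::vector<T>* container;
      std::size_t index;
      const T& operator*() const { return (*container)[index]; }
    };

    struct Cluster { double energy; };
    struct Track   { Ref<Cluster> cluster; };   // unidirectional link Track -> Cluster

    int main() {
      std::vector<Cluster> clusters = {{10.5}, {42.0}};
      Track t;
      t.cluster = {&clusters, 1};               // point to the second cluster
      std::cout << "track -> cluster energy = " << (*t.cluster).energy << "\n";
    }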

13
HEP Metadata - Event Collections
Bookkeeping
  • Event tag collection (diagram): entries Tag 1 ... Tag M, each carrying
    a few summary attribute values used for fast event selection
14
Detector Conditions Data
  • Reflects changes in state of the detector with
    time
  • Event Data cannot be reconstructed or analyzed
    without it
  • Versioning
  • Tagging
  • Ability to extract the slice of data required to
    run a job (a minimal lookup sketch follows below)
  • Long life-time

Diagram: condition data versions plotted against time intervals, with a
tag (e.g. the Tag1 definition) selecting one version per time interval
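
A minimal sketch of the interval-of-validity lookup implied above, keyed by
tag and time; ConditionsDB and its layout are assumptions for illustration,
not any experiment's conditions database API:

    #include <iostream>
    #include <iterator>
    #include <map>
    #include <string>

    // Illustrative conditions store: for each tag, a map from the start time
    // of an interval of validity to the payload valid from that time on.
    struct ConditionsDB {
      std::map<std::string, std::map<long, double>> data;   // tag -> (since -> value)

      // Assumes 'time' is not earlier than the first interval of the tag.
      double get(const std::string& tag, long time) const {
        const auto& intervals = data.at(tag);
        auto it = intervals.upper_bound(time);   // first interval starting after 'time'
        return std::prev(it)->second;            // the interval covering 'time'
      }
    };

    int main() {
      ConditionsDB db;
      db.data["calibration-v1"] = {{0, 1.00}, {1000, 1.02}, {5000, 0.98}};
      std::cout << db.get("calibration-v1", 1200) << "\n";   // prints 1.02
    }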
15
LHC Data Management Requirements
  • Increasing focus on maintainability and change
    management for core software due to long LHC
    lifetime
  • anticipate changes in technology
  • adapt quickly to changes in environment and
    physics focus
  • Common solutions will considerably simplify the
    deployment and operation of data management in
    centres distributed worldwide
  • Common persistency framework (POOL project)
  • Interactive data analysis framework (ROOT
    project)
  • Strong involvement of experiments from the
    beginning required to provide requirements
  • some experimentalists participate directly in
    POOL
  • some work with software providers on integration
    in experiment frameworks

16
Common Persistency Framework (POOL)
  • Provides persistency for C++ transient objects
  • Supports transparent navigation between objects
    across file and technology boundaries
  • without requiring user to explicitly open files
    or database connections
  • Follows a technology neutral approach
  • Abstract component C++ interfaces
  • Insulates experiment software from concrete
    implementations and technologies (a schematic
    sketch follows this list)
  • Hybrid technology approach combining
  • Streaming technology for complex C++ objects
    (event data)
  • event data - typically write once, read many
    (concurrent access simple)
  • Transaction-safe Relational Database (RDBMS)
    services,
  • for catalogs, collections and other metadata
  • Allows data to be stored in a distributed and
    grid-enabled fashion
  • Integrated with an external File Catalog to keep
    track of the file physical location, allowing
    files to be moved or replicated
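
A schematic illustration of the technology-neutral idea described in this
list: experiment code programs against an abstract interface, and concrete
storage backends are plugged in behind it. The names below (IStorageSvc,
StreamingBackend, RelationalBackend) are invented for illustration and are
not the actual POOL API.

    #include <iostream>
    #include <memory>
    #include <string>

    // Illustrative abstract persistency interface (not the real POOL classes).
    struct IStorageSvc {
      virtual ~IStorageSvc() = default;
      virtual void write(const std::string& token, const std::string& payload) = 0;
    };

    // One backend: object streaming to files (as used for event data).
    struct StreamingBackend : IStorageSvc {
      void write(const std::string& token, const std::string& payload) override {
        std::cout << "[stream] " << token << " <- " << payload << "\n";
      }
    };

    // Another backend: a relational database for catalogs and metadata.
    struct RelationalBackend : IStorageSvc {
      void write(const std::string& token, const std::string& payload) override {
        std::cout << "[rdbms]  " << token << " <- " << payload << "\n";
      }
    };

    int main() {
      // Client code sees only IStorageSvc; the technology can be swapped.
      std::unique_ptr<IStorageSvc> svc = std::make_unique<StreamingBackend>();
      svc->write("event/000042", "serialized event object");
      svc = std::make_unique<RelationalBackend>();
      svc->write("collection/candidates", "event tag metadata");
    }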

17
Simulation
  • Event Generators
  • Programs to generate high-energy physics events
    following the theory and models for a number of
    physics aspects
  • Specialized Particle Decay Packages
  • Simulation of particle decays using latest
    experimental data
  • Detector Simulation
  • Simulation of the passage of particles through
    matter and electromagnetic fields (a toy stepping
    sketch follows this list)
  • Detailed geometry and material descriptions
  • Extensive list of physics processes based on
    theory, data or parameterization
  • Detector responses
  • Simulation of the detecting devices and
    corresponding electronics
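
A toy illustration of the particle-transport idea (not Geant4 or any real
toolkit): a particle is stepped through material slabs with a constant
energy loss per unit length standing in for the full physics processes.

    #include <cstdio>

    int main() {
      double energy = 1000.0;          // MeV, initial kinetic energy (toy value)
      const double dEdx = 2.0;         // MeV per mm, illustrative constant loss rate
      const double stepLength = 5.0;   // mm per simulation step
      const int nSteps = 20;
      for (int i = 0; i < nSteps && energy > 0; ++i) {
        energy -= dEdx * stepLength;   // energy deposited in this slab
        std::printf("step %2d: E = %7.1f MeV\n", i, energy);
      }
    }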

18
Distributed Analysis
  • Analysis will be performed with a mix of
    official experiment software and private user
    code
  • How can we make sure that the user code can
    execute and provide a correct result wherever it
    lands?
  • Input datasets not necessarily known a priori
  • Possibly very sparse data access pattern when
    only very few events match the query
  • Large number of people submitting jobs
    concurrently and in an uncoordinated fashion,
    resulting in a chaotic workload
  • Wide range of user expertise
  • Need for interactivity - requirements on system
    response time rather than throughput
  • Ability to suspend an interactive session and
    resume it later, in a different location

19
Data Analysis: The Spectrum
  • From Batch Physics Analysis
  • Run on the complete data set (from TB to PB)
  • Reconstruction of non-visible particles from
    decay products
  • Classification of events based on physical
    properties
  • Several non-exclusive data streams with summary
    information (event tags) (from GB to TB)
  • Costly operation
  • To Interactive Physics Analysis
  • Final event selection and refinements (few GB)
  • Histograms, N-tuples, Fitting models
  • Data Visualization
  • Scripting and GUI

20
Experiment Specific Analysis Frameworks
  • Development of Event models and high level
    analysis tools specific to the experiment physics
    goals
  • Example: DaVinci (LHCb)

21
ROOT
  • The ROOT system is an Object Oriented framework
    for large-scale data analysis written in C++
  • It includes among others
  • Efficient object persistency facilities
  • C++ interpreter
  • Advanced statistical analysis (multi dimensional
    histogramming, fitting, minimization, cluster
    finding algorithms) and visualization tools
  • The user interacts with ROOT via a graphical user
    interface, the command line, or batch scripts (a
    short macro example follows this list)
  • The project started in 1995 and is now a very
    mature system used by many physicists worldwide
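
A short example of the kind of use described above: a ROOT macro (standard
ROOT classes) that fills a histogram with toy data, fits it, and writes it
to a file. The file and macro names are arbitrary.

    // hist.C -- run with: root -l -b -q hist.C
    #include "TF1.h"
    #include "TFile.h"
    #include "TH1F.h"
    #include "TRandom3.h"

    void hist() {
      TH1F h("h", "Gaussian sample;x;entries", 100, -5, 5);
      TRandom3 rng(0);
      for (int i = 0; i < 10000; ++i) h.Fill(rng.Gaus(0, 1));  // toy data
      h.Fit("gaus");                                           // built-in Gaussian fit
      TFile f("hist.root", "RECREATE");                        // persist the histogram
      h.Write();
      f.Close();
    }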

22
ROOT packages
23
ROOT Graphics
24
ROOT GUI
25
ROOT Self-describing files
  • Dictionary for persistent classes written to the
    file when closing the file.
  • ROOT files can be read by foreign readers (e.g.
    JavaRoot)
  • Support for Backward and Forward compatibility
  • Automatic schema evolution
  • Files created in 2003 must be readable in 2015
  • Classes (data objects) for all objects in a file
    can be regenerated

26
ROOT Basic data types
  • Histograms
  • 1D, 2D, 3D and functions
  • Ntuples
  • support PAW-like ntuples
  • PAW ntuples/histograms can be imported
  • Trees
  • Extension of Ntuples for Objects
  • Collection of branches (branch has its own
    buffer)
  • Can read in a partial Event (only selected branches)
  • Can have several Trees in parallel
  • Chains
  • Collections of Trees (a short usage sketch
    follows this list)
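
A small sketch of the Tree and Chain usage just listed (standard ROOT
classes; the file, tree, and branch names are examples):

    // tree.C -- run with: root -l -b -q tree.C
    #include "TChain.h"
    #include "TFile.h"
    #include "TRandom3.h"
    #include "TTree.h"

    void tree() {
      {
        TFile f("events.root", "RECREATE");
        TTree t("T", "toy events");
        double pt = 0; int ntrk = 0;
        t.Branch("pt", &pt, "pt/D");        // each branch has its own buffer
        t.Branch("ntrk", &ntrk, "ntrk/I");
        TRandom3 rng(0);
        for (int i = 0; i < 1000; ++i) {
          pt = rng.Exp(10.0);
          ntrk = rng.Poisson(5);
          t.Fill();                         // append the current values as one entry
        }
        t.Write();
      }                                     // file closed when f goes out of scope

      // A chain treats several files holding the same Tree as one big Tree.
      TChain chain("T");
      chain.Add("events.root");             // further files could be added here
      chain.Draw("pt", "ntrk > 3");         // reads only the branches it needs
    }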

27
ROOT Memory <--> Tree
  • Diagram: a Tree T on disk holds a sequence of entries (0, 1, 2, ..., 18);
    T.Fill() appends the objects currently in memory as a new entry, and
    T.GetEntry(6) reads entry 6 from the Tree back into memory (a read-back
    sketch follows)
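
The read-back side of the diagram, sketched in code: set branch addresses
and pull a single entry from the Tree into memory with GetEntry. This
assumes the events.root file and branches from the previous sketch.

    // readback.C -- run with: root -l -b -q readback.C
    #include <iostream>
    #include "TFile.h"
    #include "TTree.h"

    void readback() {
      TFile f("events.root");
      TTree* T = nullptr;
      f.GetObject("T", T);                 // fetch the Tree written earlier
      if (!T) return;
      double pt = 0; int ntrk = 0;
      T->SetBranchAddress("pt", &pt);
      T->SetBranchAddress("ntrk", &ntrk);
      T->GetEntry(6);                      // load entry 6 from the Tree into memory
      std::cout << "entry 6: pt=" << pt << " ntrk=" << ntrk << "\n";
    }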
28
Data Visualization
  • Experiments develop interactive Event and
    Geometry display programs
  • Help to develop detector geometries
  • Help to develop pattern recognition algorithms
  • Interactive data analysis and data presentation
  • Ingredients
  • GUI
  • 3-D graphics
  • Scripting

Example: ORCA visualization in CMS (IGUANA)
29
Used External Products
  • Statistical Analysis Tools
  • ROOT, GSL,
  • Interactivity and User Interfaces
  • Qt, Python, ROOT,
  • Data Visualization and Graphics
  • Coin, OpenGL,
  • Distributed Applications
  • PROOF, Globus, EDG,
  • Foundation Libraries
  • STL, Boost, CLHEP, Zlib,
  • Mathematical Libraries
  • NagC, GSL, CLHEP,
  • Data Organization
  • Oracle, MySQL, XercesC,
  • Data Management Tools
  • ROOT, Oracle, MySQL, EDG,
  • Simulation Toolkits
  • Pythia, Herwig, Geant4, Fluka,

30
Summary
  • HEP applications are characterized by
  • The amount and complexity of the data
  • Large size and geographically dispersed nature of
    the collaborations
  • Most of the algorithmic software written by
    Physicists
  • Expected long lifetimes
  • Development of Software Frameworks
  • Ensure coherency in the Event data processing
    applications
  • Make the life of Physicists easier by hiding most
    of the technicalities
  • Withstand technology changes
  • A variety of different software domains and
    expertise required
  • Data Management, Simulation, Interactive
    Visualization, Distributed Computing, etc.
  • Extensive use of third-party generic software
  • Open-source products favored

31
The LHC Detectors