Computing in HEP
  • An Introduction to Data Analysis in High Energy Physics
  • Max Sang
  • Applications for Physics Infrastructure Group
  • IT Division, CERN, Geneva

Introduction to HEP
  • Accelerators produce high intensity, high energy
    beams of particles like protons or electrons.
  • Detectors are huge, multi-layered electronic
    devices constructed around the points where the
    beams collide with targets or other beams.
  • Planned and constructed by multinational
    collaborations of hundreds of people over several
  • Once operational, they run for years (e.g. LEP
    program 1989-2000).

The Large Hadron Collider
  • Eight underground caverns for detectors
  • 27 km circumference, 100 m below the surface
  • First beam
  • Under construction now - ready 2006
  • 21 m long, 15 m diameter
  • 12500 tons
  • As much iron as the Eiffel Tower
  • 1900 physicists from 31 countries

Introduction to HEP (II)
  • Events are like photographs of individual
    subatomic interactions taken by the detectors.
  • Events produced at high rates (kHz-MHz) for
    months at a time with minimal human intervention.
    Analysis continues for years.
  • Fundamental physics processes are quantum
    (probabilistic). They are uncorrelated
    (consecutive events unconnected) but occur at a
    wide range of frequencies - some very rare. Some
    are more interesting than others...

Introduction to HEP (III)
  • Data are grouped into runs, periods, years.
    Calibrations, detector faults, beam conditions,
    etc. are associated with certain time periods,
    e.g. "The calorimeter was off during run 1234".
  • Event generators simulate the collisions and
    produce the final-state particles.
  • These are processed by simulated detectors to
    produce Monte Carlo data for comparison with
    what we see in the real thing. Iterative process
    of comparison, tuning, model verification.
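
The association of conditions with run ranges can be sketched as a small lookup table. This is only an illustration: the `Condition` class, the table contents and the "good run" rule are hypothetical, and the run numbers are made up around the slide's "run 1234" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    first_run: int
    last_run: int  # inclusive
    note: str


# Hypothetical conditions table; run numbers are illustrative only.
CONDITIONS = [
    Condition(1200, 1299, "nominal beam conditions"),
    Condition(1234, 1234, "calorimeter off"),
]


def conditions_for_run(run):
    """Return the notes of all conditions whose run range contains `run`."""
    return [c.note for c in CONDITIONS if c.first_run <= run <= c.last_run]


def is_good_run(run):
    """Assumed rule: a run is usable unless some detector is flagged as off."""
    return not any("off" in note for note in conditions_for_run(run))


print(conditions_for_run(1234))
print(is_good_run(1233), is_good_run(1234))
```

An analysis job would typically consult such a table before using the events of a given run.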

Extracting the Data
  • Passage of particles through detector components
    produces ionisation which is amplified to a
    detectable level.
  • Front-end electronics turn pulses into digits.
  • Hardware processing turns digits into hits.
  • Software turns hits into tracks, clusters
  • Multi-level trigger/filter decides what events to
    keep (sometimes only one event in 10^7).
  • Online reconstruction → storage.
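
A multi-level trigger can be pictured as a chain of filters, cheapest first. The sketch below is a toy model only: the event fields (`energy`, `n_tracks`), the thresholds and the function names are all assumptions made for illustration, not the logic of any real experiment.

```python
def level1(event):
    """Fast, coarse cut (stands in for a hardware-level decision)."""
    return event["energy"] > 50.0


def level2(event):
    """Slower, more refined cut that only sees events passing level 1."""
    return event["n_tracks"] >= 2


def run_trigger(events, levels):
    """Keep only the events accepted by every level, in order."""
    kept = []
    for event in events:
        # all() stops at the first level that rejects the event,
        # so later (more expensive) levels run on fewer events.
        if all(level(event) for level in levels):
            kept.append(event)
    return kept


events = [
    {"id": 1, "energy": 10.0, "n_tracks": 5},
    {"id": 2, "energy": 80.0, "n_tracks": 1},
    {"id": 3, "energy": 95.0, "n_tracks": 4},
]
kept = run_trigger(events, [level1, level2])
print([event["id"] for event in kept])  # prints [3]
```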

The LEP Era (Started 1989)
  • Four detectors (300 people each) producing
  • 50 kHz collision rate → 5 Hz storage rate.
  • Event size 100kB, reconstructed by small farm of
    O(10) very high-end workstations.
  • < 500 GB/year/experiment
  • Stored on tape (with disk caching) at CERN.
  • Analysed on mainframes by remote batch jobs.
  • Ntuples (? 100MB) returned to user for more
    (interactive) analysis and calculation. Plots
    produced for presentations and papers.

The LHC Era (Starts 2006)
  • 4 detectors (6k people in total)
  • 50 MHz collision rate → 100 Hz storage rate.
  • 500 GB/s raw data rate after triggering.
  • Event size 1-2 MB, reconstructed by farm of 1k
  • 1 PB/year/experiment in 2007, increasing rapidly.
    Total by 2015 for all detectors 100 PB.
  • Searches may look for single events in 10^7. Every
    user (in 30 countries) will want to eat millions
    of events at a single sitting, with reasonably
    democratic data access.
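
The collision and storage rates quoted for LEP and the LHC imply a trigger rejection factor, and the storage rate times the event size gives a rough yearly data volume. The short calculation below uses the figures from the slides. The 10^7 seconds of data taking per year is an assumption (a common rule of thumb), not a number from the talk.

```python
SECONDS_PER_YEAR = 1e7  # assumption: about 1e7 s of data taking per year


def rejection_factor(collision_hz, storage_hz):
    """How many collisions occur for each event written to storage."""
    return collision_hz / storage_hz


def yearly_volume_bytes(storage_hz, event_bytes, seconds=SECONDS_PER_YEAR):
    """Stored data per year = storage rate x event size x live time."""
    return storage_hz * event_bytes * seconds


lep_rejection = rejection_factor(50e3, 5)    # 50 kHz -> 5 Hz
lhc_rejection = rejection_factor(50e6, 100)  # 50 MHz -> 100 Hz
lhc_volume = yearly_volume_bytes(100, 1e6)   # 100 Hz, 1 MB events

print(f"LEP rejection factor: {lep_rejection:.0e}")
print(f"LHC rejection factor: {lhc_rejection:.0e}")
print(f"LHC volume per year:  {lhc_volume / 1e15:.1f} PB")
```

With 1 MB events this gives about 1 PB per year, in line with the figure quoted for a single experiment.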

Physicists are also Programmers
  • All data analysis done using computers
  • The physicists are all programmers, but almost
    none of them have any formal CS training
  • Some will be very experienced (usually F77). Will
    write lots of code for reconstruction, triggering
  • Others write more modest programs for their own
    data analysis.
  • Some will be fresh graduate students who've never
    written a line of code.
  • Our job is to help them do physics.

What Software do they Need?
  • Experiment-specific code
  • Triggering, data acquisition, slow controls,
    reconstruction, new physics code
  • Mostly written by the experimentalists without
  • Event generators
  • Highly technical, constantly in flux
  • Written by phenomenologists
  • We don't help with these!

What Software do they Need? (II)
  • Specialised HEP tools
  • Detector simulation tools, relativistic
    kinematics, ...
  • General purpose scientific tools with a HEP slant
  • Data visualisation, histogramming, ...
  • General purpose technical libraries
  • Random numbers, matrices, geometry, analytical
    statistics, 2D and 3D graphics, ...
  • We do help with these!

The Situation in 1995
  • Millions of lines of F77, some of it very
  • Thousands of man-years of debugging
  • Users know and love/hate the software, and they
    don't want to change
  • Serious and unavoidable maintenance commitment
    for old code - F77 is here to stay!
  • Shrinking manpower in IT division
  • Not long until the start of the LHC programme.
    Change now or wait until 2020!

The Old Software
  • Largely home-grown in 70s and 80s
  • Persistent storage and memory management: ZEBRA
  • Code management: PATCHY
  • Scripting: KUIP/COMIS
  • Histograms and Ntuples: HBOOK
  • Detector simulation: GEANT 3
  • Fitting and minimisation: MINUIT
  • Mathematics, random numbers, kinematics: MATHLIB
  • Graphics: HIGZ/HPLOT
  • Visualisation and interactive analysis: PAW

The Anaphe Project
  • Provide a modern, object-oriented, more flexible,
    more powerful replacement for CERNLIB with fewer
    people in less time.
  • Identify areas where commercial and/or Open
    Source products can (or must) be used instead of
    home-grown solutions
  • Concentrate efforts on HEP-specific tasks
  • Use object-oriented techniques and plan for very
    long term maintenance and evolution
  • Detector simulation is a separate project (v. big)

Commodity Solutions
  • Luckily, computing has also evolved.
  • What can we get off-the-shelf?
  • Open Source tools
  • Code management (CVS)
  • Graphics (Qt, OpenGL)
  • Scripting (Python, Perl)
  • Commercial products
  • Persistency (Objectivity OODB)
  • Mathematics (Nag library CERN edition)

HEP Community Developments
  • Not everything is being done solely at CERN!
  • CLHEP - C++ class libraries for HEP
  • Random numbers
  • 3D geometry, vectors, matrices, kinematics
  • Units and dimensions
  • Generic HEP classes (particles, decay chains etc.)
  • Generators being moved (slowly) to C++
  • The competition (JAS, Open Scientist, Root)

Anaphe C++ Libraries (I)
  • Fitting: FML (fitting and minimisation library)
  • Flexible, extensible library based on Gemini
  • Gemini - core fitting engine based on Nag or
  • Histograms: HTL (histogram template library)
  • Histograms are statistical distributions of
    measured quantities - the workhorse of HEP
    analysis. Must be flexible, extensible and very

Anaphe C++ Libraries (II)
  • QPlotter: graphics package
  • For drawing histograms and more
  • Based on Qt (superset of Motif)
  • NtupleTag
  • Extends concept of ntuple ( static table of
  • Can add new columns as you work
  • Can navigate back to original events
  • Smart clustering of data
  • See Zsolt's presentation...

Interactive Analysis
  • Analysis in HEP is data mining
  • Extract parameters from large multi-dimensional
  • Typical tasks
  • Plot one or more variables with cuts on yet
    others - exploring the variable space.
  • Perform statistical tests on distributions
    (fitting, moments etc.)
  • Produce histograms etc. for papers or talks.
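
The typical tasks above (histogram one variable with a cut on another, then compute moments) can be sketched in a few lines. This is a minimal illustration only: `Histogram1D` and `mean_and_rms` are hypothetical names, not the HTL or any other Anaphe API, and the events are made-up numbers.

```python
import math


class Histogram1D:
    """Fixed-width 1D histogram with underflow and overflow counters."""

    def __init__(self, nbins, low, high):
        self.nbins, self.low, self.high = nbins, low, high
        self.counts = [0] * nbins
        self.underflow = 0
        self.overflow = 0

    def fill(self, x):
        if x < self.low:
            self.underflow += 1
        elif x >= self.high:
            self.overflow += 1
        else:
            width = (self.high - self.low) / self.nbins
            # min() guards against rounding at the upper edge.
            index = min(int((x - self.low) / width), self.nbins - 1)
            self.counts[index] += 1


def mean_and_rms(values):
    """First moment and spread (population RMS about the mean)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


# Made-up events: histogram x for events passing the cut y > 5.
events = [
    {"x": 1.0, "y": 6.0},
    {"x": 3.0, "y": 7.0},
    {"x": 9.0, "y": 1.0},
    {"x": 12.0, "y": 8.0},
]
selected = [ev["x"] for ev in events if ev["y"] > 5]

hist = Histogram1D(nbins=5, low=0.0, high=10.0)
for x in selected:
    hist.fill(x)

mean, rms = mean_and_rms(selected)
print(hist.counts, hist.overflow)  # prints [1, 1, 0, 0, 0] 1
```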

Interactive Analysis (II)
  • Almost all analyses begin as interactive
    playing with the data and progress organically
    to large, complex, CPU intensive procedures.
  • Step 1: single commands to a script interpreter,
    e.g. plot x for all events with y > 5
  • Step 2: multi-command scripts/macros
  • Step 3: procedures can be translated into C++
    functions and called interactively
  • Step 4: user can build new libraries and interact
    with them through the command line (etc...)
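
The first three steps of this progression can be mimicked in plain Python. The sketch is illustrative only: the event list and the names `macro` and `summarise` are invented, and an ordinary Python function stands in for the compiled function of step 3.

```python
# Made-up events for illustration.
events = [
    {"x": 1.0, "y": 6.0},
    {"x": 2.0, "y": 3.0},
    {"x": 4.0, "y": 9.0},
]

# Step 1: a single interactive command (select x where y > 5).
xs = [ev["x"] for ev in events if ev["y"] > 5]


# Step 2: the same commands collected into a macro.
def macro(events):
    xs = [ev["x"] for ev in events if ev["y"] > 5]
    mean = sum(xs) / len(xs) if xs else None
    return {"n": len(xs), "mean": mean}


# Step 3: a reusable, parametrised procedure that can be called
# interactively with any variable name and any cut.
def summarise(events, var, cut):
    values = [ev[var] for ev in events if cut(ev)]
    mean = sum(values) / len(values) if values else None
    return {"n": len(values), "mean": mean}


print(xs)             # prints [1.0, 4.0]
print(macro(events))  # prints {'n': 2, 'mean': 2.5}
print(summarise(events, "y", lambda ev: ev["x"] < 3))
```

Each step reuses the previous one's logic, which is the kind of smooth path from command line to library that the slide describes.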

Interactive Analysis (III)
  • The progression from command line, to macro, to
    compiled library, should be smooth and simple.
  • Doing the easy things should be easy to allow
    rapid development and prototyping of algorithms.
  • Doing complex things then becomes significantly
    easier than starting from scratch in C++
  • Distributed analysis must also be possible (see
    Kuba's talk)

Lizard (I)
  • Interactive environment for data analysis using
    the other Anaphe components
  • First prototype (with limited functionality)
    available since CHEP 2000
  • Re-design started in April 2000
  • Beta version October 2000
  • Full version out since June 2001
  • Much more work and testing to do, but already
    approaching (and surpassing) PAW functionality
  • Embedded in Python

Lizard (II)
  • Architecture
  • Everything interacts with everything else through
    their abstract interfaces so the implementation
    is hidden.
  • Commander C++ classes load the implementation
    classes at run time and become proxies for them.
  • Use SWIG to generate shadow classes from the
    Commander header files. These are compiled into
    the Python library and become accessible as new
    Python objects.
  • Swapping components at run time becomes trivial.
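
The proxy idea can be sketched in Python. This is not Lizard's code: the class names, the `IMPLEMENTATIONS` registry and the `load` method are hypothetical, and a dictionary lookup stands in for loading a compiled implementation at run time.

```python
from abc import ABC, abstractmethod


class Plotter(ABC):
    """Abstract interface: callers only ever see this."""

    @abstractmethod
    def draw(self, values):
        ...


class TextPlotter(Plotter):
    def draw(self, values):
        return "\n".join("#" * v for v in values)


class SummaryPlotter(Plotter):
    def draw(self, values):
        return f"{len(values)} bins, {sum(values)} entries"


# Hypothetical registry standing in for run-time library loading.
IMPLEMENTATIONS = {"text": TextPlotter, "summary": SummaryPlotter}


class PlotterCommander:
    """Proxy that forwards calls to an implementation chosen by name."""

    def __init__(self, name):
        self.load(name)

    def load(self, name):
        # Swap the hidden implementation without changing the caller.
        self._impl = IMPLEMENTATIONS[name]()

    def draw(self, values):
        return self._impl.draw(values)


commander = PlotterCommander("text")
print(commander.draw([1, 2]))
commander.load("summary")
print(commander.draw([1, 2]))  # prints 2 bins, 3 entries
```

Because callers hold only the proxy, replacing the implementation is a single `load` call.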

Lizard Screenshot
[Screenshot not reproduced in this transcript]

Behind the Scenes
[Diagram labels: Automatically generated by SWIG; AIDA Interfaces; Controller Shadow classes; C++ interfaces; C++ implementations; Anaphe implementations]
  • Use of abstract interfaces promotes weak coupling
    between components.
  • AIDA (Abstract Interfaces for Data Analysis)
    project is extending this to community-wide
    standard interfaces which will allow use of C++
    components in Java and vice versa.
  • Developers only need to learn one way of
    interacting with a histogram, which works with
    all compliant implementations.

  • HEP has (and has always had) serious computing
  • The old model (F77 monoliths) is no longer
    workable in the LHC era
  • New software in C++ and Java uses modern software
    design to plan for the long term
  • Anaphe is CERN IT Division's contribution
  • Flexible, extensible, modular, efficient
  • The LHC is coming and we must be ready!

Further information
  • More information about the detectors and HEP in
  • http//
  • http//
  • CERN IT Division
  • http//
  • The Anaphe project
  • http//