1
2001 Summer Student Lectures
Computing at CERN
Lecture 3: Looking Forwards
Tony Cass
Tony.Cass@cern.ch
2
Data and Computation for Physics Analysis
[Diagram: data flow for physics analysis. The detector feeds the event filter (selection and reconstruction), which writes raw data; event reconstruction turns raw data into event summary data (processed data); batch physics analysis extracts analysis objects by physics topic, which feed interactive physics analysis; event simulation produces data entering the same chain.]
LEP and LHC Parameters Compared
4
Evolution of CERN Computing Needs: CPU Capacity 1997-2002
5
Evolution of CERN Computing Needs: Tape Storage 1995-2000
6
Evolution of CERN Computing Needs: CPU Capacity 1997-2006
[Chart: CPU capacity in CERN Units (scale 0 to 10'000'000) by year, 1997-2006, broken down into Infrastructure, Engineering, Others, LEP, NA48, NA45, COMPASS and LHC.]
7
Evolution of CERN's CPU Requirements: A Different View
[Chart: estimated CPU capacity required at CERN, in K SI95 (scale 0 to 5,000), for 1998-2010, split between LHC and the other experiments, annotated with "Jan 2000: 3.5K SI95" and with a Moore's law curve, i.e. some measure of the capacity that technology advances provide for a constant number of processors or a constant investment.]
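As a rough illustration of the Moore's law curve in the chart, capacity for a constant investment can be projected with a simple doubling model. This is a minimal sketch: the 18-month doubling period is an assumed textbook value, and the 3.5K SI95 starting point in January 2000 is taken from the chart annotation.

    // Minimal sketch: project CPU capacity assuming a Moore's-law style
    // doubling for a constant investment. The 18-month doubling period is
    // an assumption; 3.5 K SI95 in Jan 2000 is the chart's annotation.
    #include <cmath>
    #include <cstdio>

    int main() {
        const double start_kSI95 = 3.5;           // capacity in Jan 2000 (K SI95)
        const double doubling_period_years = 1.5; // assumed Moore's-law doubling

        for (int year = 2000; year <= 2010; ++year) {
            double t = year - 2000;
            double capacity = start_kSI95 * std::pow(2.0, t / doubling_period_years);
            std::printf("%d: %.1f K SI95\n", year, capacity);
        }
        return 0;
    }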
8
Evolution of CERN Computing Needs: Tape Storage 1995-2006
9
HELP!
  • The previous slides show that we have to cope
    with a dramatic increase in computing capacity
    before the start of LHC.
  • Can we afford it?
  • How many boxes are needed (i.e. can we manage the
    equipment)?
  • Fortunately, the price of computing equipment
    falls each year. Will this help us?

10
CPU and Disk Cost Predictions
[Charts: CPU cost predictions and disk cost predictions.]
11
CPU and Disk Cost Predictions
[Charts: CPU cost predictions and disk cost predictions, by year.]
12
CPU and Disk Cost Predictions
[Charts: CPU cost predictions and disk cost predictions.]
13
CPU and Disk Cost Predictions
[Charts: CPU cost predictions and disk cost predictions.]
14
Disk Storage: The Bad News

                   1996             2000
  Disk             4GB, 10MB/s      50GB, 20MB/s
  Disks for 1TB    250              20
  Aggregate I/O    250 x 10MB/s     20 x 20MB/s
                   = 2,500MB/s      = 400MB/s

Disk capacity is growing much faster than disk bandwidth, so the
aggregate I/O available per terabyte of storage is falling (see the
worked arithmetic below).
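A minimal sketch of the arithmetic behind the table, using the disk sizes and transfer rates quoted above and a 1TB storage target.

    // Sketch of the slide's arithmetic: how many disks are needed for 1TB,
    // and what aggregate I/O bandwidth those disks provide.
    #include <cstdio>

    int main() {
        const double target_gb = 1000.0;  // 1TB target from the slide

        struct Disk { const char* year; double size_gb; double rate_mb_s; };
        const Disk disks[] = { {"1996", 4.0, 10.0}, {"2000", 50.0, 20.0} };

        for (const Disk& d : disks) {
            double n = target_gb / d.size_gb;   // disks needed for 1TB
            double aggregate = n * d.rate_mb_s; // total streaming bandwidth
            std::printf("%s: %.0f disks, %.0f MB/s aggregate I/O\n",
                        d.year, n, aggregate);
        }
        return 0;
    }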
15
Tape Storage Estimates
  • Although CPU and disk costs are expected to
    decrease dramatically, current estimates are that
    the cost of tape storage and tape devices will
    fall by less than a factor of 2 over the next 8
    years.
  • These are not commodity items!
  • Most tape use is for archive storage (write once,
    read never), not HEP-like usage.
  • Who backs up their home PC?
  • Tape storage and tape devices are expected to
    represent a significant fraction of the cost of
    computing for the LHC experiments, particularly
    for ALICE.

16
Which system architecture for LHC?
Which of the different system architectures (SMP,
Scalable, Distributed) is appropriate for the LHC
experiments?
17
Networks and CPU load
  • High bandwidth commodity networks carry the
    baggage of their low speed commodity origins.
  • The MTU¹ for Gigabit Ethernet is still the 1.5KB
    of Ethernet,
  • cf. 64KB for HiPPI.
  • Processing packets takes time, and with Gigabit
    Ethernet the packets come thick and fast (see the
    worked rate below).

¹ MTU = Maximum Transmission Unit
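A back-of-the-envelope sketch of why the small MTU matters: the frame arrival rate is roughly the line rate divided by the frame size. Treating the full MTU as the frame size (and ignoring headers and inter-frame gaps) is a simplification.

    // Rough packet-rate estimate: line rate divided by frame size.
    // Header overhead and inter-frame gaps are ignored for simplicity.
    #include <cstdio>

    int main() {
        const double line_rate_bps = 1e9;        // Gigabit Ethernet
        const double mtu_eth   = 1500.0;         // standard Ethernet MTU (bytes)
        const double mtu_hippi = 64.0 * 1024.0;  // HiPPI frame size (bytes)

        double rate_eth   = line_rate_bps / 8.0 / mtu_eth;   // frames per second
        double rate_hippi = line_rate_bps / 8.0 / mtu_hippi;

        std::printf("1.5KB frames: ~%.0f packets/s to process\n", rate_eth);
        std::printf("64KB frames:  ~%.0f packets/s to process\n", rate_hippi);
        return 0;
    }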
18
Can we build LHC Computing Farms?
  • Probably or almost certainly, depending on your
    level of optimism.
  • On the positive side
  • CPU and disk price/performance trends suggest
    that the raw processing and disk storage
    capacities will be affordable, and
  • raw data rates and volumes look manageable
  • perhaps not today for ALICE.
  • But this does not mean it will be easy.
  • Many, many boxes will be needed compared to
    today's systems.
  • Building and managing coherent systems from such
    large numbers of boxes will be a challenge.

1999: CDR @ 45MB/s for NA48!
2000: CDR @ 90MB/s for ALICE!
19
LHC Computing Worldwide
This picture, from the CMS CTP, shows how a
regional centre, here Fermilab, fits into the
computing environment between CERN and
universities. It is assumed here that high
bandwidth networks are available between CERN and
this US-based regional centre. However, the
possibility of an Air Freight link for data
transfers is also indicated.
Although regional centres, in the US and
elsewhere, will certainly exist, we do not yet
know how best to make use of the facilities they
will offer. Can we link CERN and all the regional
centres into one global facility, usable from
everywhere? Or do the regional centres just
provide resources for their local clients?
20
LHC Computing Worldwide - MONARC
  • The MONARC Project has been set up to study these
    issues.
  • Models Of Networked Analysis at Regional Centres
  • More input on the practicalities of global
    analysis is needed for
  • the Computing Progress Reports to be produced
    this year by ATLAS and CMS and maybe other
    experiments
  • Funding Agencies, especially in the US, and
  • Planning!

21
The Grid
  • Over the past year, the Grid metaphor for
    providing access to remote computing resources
    has become popular.
  • Will the Grid bind Regional centres together?
  • Studies are underway in Europe and the US.

22
The Globus Toolkit
  • Providing transparent access to different
    computing resources requires an interface layer
    which hides details of
  • batch systems (LSF, LoadLeveler, Condor),
  • security and authentication
  • The Globus Toolkit has been developed as just
    such an interface layer and is being tested at
    CERN and other HEP labs.
  • There's still a long way to go, though!
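To illustrate what such an interface layer means in code, here is a minimal, hypothetical sketch of an abstraction over different batch systems. The class and method names are invented for illustration; this is not the Globus Toolkit API.

    // Hypothetical sketch of an interface layer hiding different batch
    // systems. Names are illustrative, not the Globus Toolkit API.
    #include <memory>
    #include <string>
    #include <cstdio>

    class BatchSystem {                        // common interface
    public:
        virtual ~BatchSystem() = default;
        virtual void submit(const std::string& jobScript) = 0;
    };

    class LsfBackend : public BatchSystem {    // one concrete backend (e.g. LSF)
    public:
        void submit(const std::string& jobScript) override {
            std::printf("LSF: bsub %s\n", jobScript.c_str());
        }
    };

    class CondorBackend : public BatchSystem { // another backend (e.g. Condor)
    public:
        void submit(const std::string& jobScript) override {
            std::printf("Condor: condor_submit %s\n", jobScript.c_str());
        }
    };

    int main() {
        // User code talks only to the interface; the backend can be swapped.
        std::unique_ptr<BatchSystem> batch = std::make_unique<LsfBackend>();
        batch->submit("reconstruct_run_1234.sh");
        return 0;
    }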

23
LHC Computing Grid
The LHC Computing Centre
24
Authentication Kerberos vs PKI
  • Kerberos is a popular authentication and access
    control system. I prove I know something (my
    password) and a central server gives me a ticket
    to access resources.
  • I have a ticket, so I just need to type my
    password once,
  • But a central server is needed at each site.
  • In a Public Key system, I have a certificate
    signed by some trusted body which I need to show
    to prove who I am.
  • My certificate will be accepted by anybody who
    trusts the organisation that signed my
    certificate,
    but I must protect it so you don't steal it and
    use it instead! So I have to type a password or
    passphrase whenever I need to use the
    certificate. (Both flows are sketched below.)
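A highly simplified, hypothetical sketch of the two models described above. None of the types or calls correspond to a real Kerberos or PKI library; they only mirror the flow of the bullet points.

    // Highly simplified, hypothetical sketch of the two authentication
    // models. Not a real Kerberos or PKI library.
    #include <string>
    #include <cstdio>

    // Kerberos-style: prove knowledge of a password to a central server
    // once, then reuse the ticket it hands back.
    struct Ticket { std::string user; };

    Ticket kerberosLogin(const std::string& user, const std::string& password) {
        (void)password;            // the central server checks the password
        return Ticket{user};       // and issues a ticket
    }

    // PKI-style: present a certificate signed by a trusted body; the
    // private key protecting it is unlocked with a passphrase at each use.
    struct Certificate { std::string subject; std::string signedBy; };

    bool pkiVerify(const Certificate& cert, const std::string& trustedCA) {
        return cert.signedBy == trustedCA;   // accepted if we trust the signer
    }

    int main() {
        Ticket t = kerberosLogin("tony", "secret");   // password typed once
        std::printf("Kerberos ticket issued for %s\n", t.user.c_str());

        Certificate c{"tony", "CERN CA"};
        std::printf("Certificate %s\n",
                    pkiVerify(c, "CERN CA") ? "accepted" : "rejected");
        return 0;
    }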

25
Software Concerns for LHC
  • Software will throw LHC data away.
  • software (human!) errors will lose data forever.
  • Would you take this responsibility? Can you write
    bug free code?
  • What would you do if you were managing the
    worldwide effort?
  • Object Oriented techniques are today's industry
    standard and LHC experiments must impose best
    practice.
  • There are also secondary considerations
  • widespread use of OO techniques outside HEP
    implies widespread availability of support tools
    and software, and
  • OO trained (ex) physicists will find more
    employment opportunities.

26
Software Concerns for LHC II
  • Everything will change between now and 2005.
  • The computing environment
  • Unix vs NT.
  • The programming language
C++ vs Java.
The "in" things
OO vs ?, Java vs ?, ... what will computers look
like in 2005?
  • These all changed for LEP and those planning for
    LHC must take this into account.
  • But maybe we're being too worried. LEP was
    planned at a time when IBM mainframes and DEC
    minis looked invincible. The LEP experiments
    still coped with change.

27
Data and Computation for Physics Analysis
[Diagram: the same data flow as slide 2, annotated with the software used at each stage. Storage solutions: Zebra, Objectivity/DB, ROOT. Simulation packages: GEANT3, GEANT4, FLUKA. Analysis and visualisation packages: HBOOK, PAW, ROOT, Lizard, Iguana, JAS. Experiment frameworks provide interfaces to storage and common services; HEP toolkits and packages are provided to meet common needs. Everything is built using language standards, e.g. STL.]
28
OO Techniques and Data Storage/Management
  • HEP has added many Data Storage and Management
    systems on top of Fortran
  • e.g. Zebra for data structures, FATMEN for
    event/file management
  • With the move to OO, can HEP use OO databases for
    event storage and management?
  • Can it be done?
  • Is it efficient?
  • It seems the answer is yes. How do we really
    switch to this model?
  • LHC software designers have to embrace this model
    of working now and work to provide optimised
    storage/processing environments.

29
Why use an Object Database?
Raw data is reconstructed to produce ESD/AOD and
then interesting events are selected for further
study.
In the traditional scheme this produces different
data sets (raw data, DSTs/ntuples, a bookkeeping
database) and going back from a high to a low
level is difficult.
With an object model and an object database it is
much easier to navigate between the different
levels of description of an event.
[Diagram: the levels of event description (raw data, ESD, AOD, event tags), each reached through the event header, with objects such as tracker hits, tracks and particles attached at the appropriate level.]
30
Why use an Object Database?
  • Hiding the details of the file storage is done by
    the database manager. An RDBMS (e.g. Oracle) also
    hides details of file storage, so why use an
    ODBMS?
  • With an ODBMS, the underlying details of the I/O
    are hidden. The program variables are the storage
    variables; there is no need for explicit copying
    by the programmer.
  • Physicists don't set out to select all tracks of
    a given event. They might want to access some
    tracks of an event, though. This sort of access
    maps better onto an object database (a sketch of
    this navigational access follows below).
  • An ODBMS allows applications to suggest that
    parts of an event should be stored close to each
    other, rather than storing all tracks close
    together.
  • But these don't seem to be general requirements:
    the ODBMS market has not taken off.
    We need to be careful!
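A hypothetical C++ sketch of the navigational access mentioned above: objects at one level hold references to the level below, so an analysis can walk from a selected AOD object back to ESD and raw data without opening separate data sets by hand. The class names are invented, not any experiment's actual event model; in a true ODBMS the pointers would be persistent references resolved on demand.

    // Hypothetical sketch of navigation between levels of event description.
    #include <vector>
    #include <cstdio>

    struct RawData  { std::vector<int> adcCounts; };
    struct Track    { double momentum; const RawData* raw; };  // back-reference
    struct EventESD { std::vector<Track> tracks; };
    struct EventAOD { double missingEt; const EventESD* esd; };

    int main() {
        RawData  raw{{101, 87, 230}};
        EventESD esd{{{12.5, &raw}, {48.2, &raw}}};
        EventAOD aod{37.0, &esd};

        // Select an interesting event at the AOD level, then navigate back
        // to the detail needed: some tracks, and from a track its raw data.
        if (aod.missingEt > 30.0) {
            for (const Track& t : aod.esd->tracks) {
                if (t.momentum > 40.0) {
                    std::printf("track p=%.1f GeV, %zu raw ADC counts\n",
                                t.momentum, t.raw->adcCounts.size());
                }
            }
        }
        return 0;
    }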

31
Data Databases in 2001
  • RDBMS vendors have been moving towards the ODBMS
    market for some time, introducing
    Object-Relational DBMSs.
  • Oracle 9i, with the recently announced C++
    interface, provides all the ODBMS features of the
    previous slide.
  • You can now navigate between objects in the
    database.
  • We are now actively testing the use of Oracle 9i
    for physics data. Particular aspects being
    investigated are
  • Scalability
  • Storage overhead
  • Mass Storage System integration
  • Data import/export
  • Initial results are promising.

32
Toolkits versus Frameworks
Toolkits: Sets of generic procedures¹ that can be invoked to perform
related tasks. They do not constrain users (apart from parameter
lists!) and can be provided by experiments but also by others, e.g.
IT or 3rd parties.
Frameworks: Systems that decide the order of execution and invoke the
procedures needed to do the necessary work, including user
procedures, in the determined order. They constrain users to work
within the overall architecture and are experiment specific.
¹ Note that the word "procedure" is used here in a general sense. In
terms of procedural languages, procedures are subroutines and
functions. For an Object Oriented language, a procedure is a class.
33
Toolkit Design
  • A toolkit should be
  • generic, so it can be used in more than one
    framework,
  • independent, i.e. not forcing the use of other
    toolkits, and
  • well defined, with clear interfaces, so it can be
    replaced. (The sketch below contrasts a toolkit
    call with framework-driven invocation.)
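A minimal, hypothetical illustration of the toolkit/framework distinction from the last two slides: a toolkit routine is something the user calls, while a framework calls the user's code in an order the framework decides (inversion of control). All names are invented for illustration.

    // Hypothetical sketch: toolkit vs framework (inversion of control).
    #include <vector>
    #include <cmath>
    #include <cstdio>

    // Toolkit style: a generic, self-contained routine the user calls directly.
    double invariantMassSquared(double e1, double e2, double openingAngle) {
        return 2.0 * e1 * e2 * (1.0 - std::cos(openingAngle));
    }

    // Framework style: the user supplies an algorithm; the framework decides
    // when, and in which order, the algorithms are executed.
    class Algorithm {
    public:
        virtual ~Algorithm() = default;
        virtual void execute(int eventNumber) = 0;
    };

    class MyAnalysis : public Algorithm {        // user code plugged in
    public:
        void execute(int eventNumber) override {
            std::printf("analysing event %d, m^2 = %.2f\n",
                        eventNumber, invariantMassSquared(45.0, 42.0, 2.8));
        }
    };

    void runFramework(std::vector<Algorithm*>& algs, int nEvents) {
        for (int i = 0; i < nEvents; ++i)        // the framework owns the loop
            for (Algorithm* a : algs) a->execute(i);
    }

    int main() {
        MyAnalysis analysis;
        std::vector<Algorithm*> algs{&analysis};
        runFramework(algs, 3);
        return 0;
    }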

34
Data Analysis Toolkits for LHC
  • Just as for the data storage/management, HEP has
    developed a specialised, Fortran based, analysis
    environment: HBOOK, PAW and CERNLIB as a whole.
  • These needed to be rewritten/reinvented as HEP
    moved to OO techniques.
  • Can we instead profit from commercial data
    analysis tools?
  • OO based simulation packages are needed now, and
    GEANT4 is becoming a reality.
  • The GEANT4 project, launched in 1994, is also a
    demonstration of effective worldwide
    collaboration on a major software project,
  • and there is much interest in GEANT4 beyond HEP.

35
(LHC) Framework Design Choices
  • From the user point of view, an experiment
    computing framework ensures that they can write
    code for a specific purpose (e.g. analysis or
    detector reconstruction) without having to worry
    about anything else.
  • The framework ensures that
  • objects and services they need are made
    available, and
  • any objects they create will be stored if
    required.
  • The three LHC frameworks are best distinguished
    by the choices they have made in two areas.
  • Exposure of the persistency model for storage. Do
    users work with transient or persistent objects?
    Do users see the inheritance from the base
    persistence class?
  • Procedure invocation. Do users themselves decide
    the order of invocation of a set of procedures to
    produce a given object? Or do they demand the
    object and leave the framework to decide which
    procedures must be invoked to produce it? (A
    sketch of this demand-driven style follows below.)
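A hypothetical sketch of the demand-driven style of procedure invocation: the user asks a data store for an object, and the framework runs the registered producer if the object does not yet exist. The store and producer names are invented; this is not the actual API of GAUDI, CARF or AliROOT.

    // Hypothetical sketch of demand-driven object production: the user asks
    // for an object by key; the framework invokes the producer if needed.
    #include <functional>
    #include <map>
    #include <string>
    #include <cstdio>

    class EventStore {
        std::map<std::string, double> objects_;                     // produced objects
        std::map<std::string, std::function<double()>> producers_;  // how to make them
    public:
        void registerProducer(const std::string& key, std::function<double()> p) {
            producers_[key] = std::move(p);
        }
        double get(const std::string& key) {
            auto it = objects_.find(key);
            if (it == objects_.end()) {               // not yet produced:
                double value = producers_.at(key)();  // the framework runs the producer
                it = objects_.emplace(key, value).first;
            }
            return it->second;
        }
    };

    int main() {
        EventStore store;
        store.registerProducer("Tracks",
            [] { std::puts("running tracking"); return 42.0; });

        // The user simply demands the object; the framework works out what to run.
        std::printf("got %d tracks\n", static_cast<int>(store.get("Tracks")));
        std::printf("got %d tracks (cached)\n", static_cast<int>(store.get("Tracks")));
        return 0;
    }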

36
GAUDI (after the Catalan architect)
[Diagram: the GAUDI architecture. The Application Manager coordinates Algorithms and services. The Event Data Service, Detector Data Service and Histogram Service each manage a transient store (event, detector and histogram), backed by a Persistency Service whose Converters read and write data files. Further services include the Message Service, JobOptions Service, Particle Property Service and other services.]
37
CARF (CMS Analysis and Reconstruction Framework)
[Diagram: the CARF layering. Physics modules (reconstruction algorithms, event filter, physics analysis, data monitoring) sit on the Application Framework, which manages calibration objects, event objects, visualization objects and a utility toolkit; beneath lie the ODBMS, the C++ standard library and extension toolkits, Geant4, CLHEP and a PAW successor.]
38
AliROOT (Alice and ROOT)
[Diagram: AliROOT. A Virtual MC layer lets the transport engine (Geant3.21, Geant4, FLUKA or a fast MC) be selected at run time, with event generators and a geometry database feeding the simulation.]
39
LHC Frameworks Another Comparison
  • In Object Solutions, Booch says that there are
    three basic types of object oriented
    applications.
  • If they focus on ..., they are ...
  • direct visualization and manipulation of the
    objects that define a certain domain: user-centric
  • preserving the integrity of the persistent
    objects in a system: data-centric
  • the transformation of objects that are
    interesting to the system: computation-centric
  • Using this categorisation, we could say that
  • AliROOT is user-centric
  • CARF is data-centric
  • GAUDI is computation-centric

40
Non-event data
  • To make sense of an event, the raw detector data
    is not enough. Non-event data is needed
  • for the overall geometry and structure of the
    detector, including information about magnetic
    fields, and
  • as they are not perfectly still, to understand
    the real positions of the subdetectors at the
    moment of the collision
  • to have the correct detector calibration at the
    time of the collision as detector response also
    changes (e.g. with temperature) and
  • about the run conditions of the accelerator, e.g.
    beam energy, at the time of the collision.
  • All of these non-event data must also be stored
    and managed (a sketch of one way to organise them
    follows below).
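A hypothetical sketch of one common way to organise such non-event (conditions) data: each value carries an interval of validity, and lookups are keyed by the time of the collision. The structure is illustrative only, not any experiment's actual conditions database.

    // Hypothetical sketch of a conditions store: calibration/alignment
    // values carry an interval of validity and are looked up by event time.
    #include <vector>
    #include <stdexcept>
    #include <cstdio>

    struct Condition {
        long long validFrom;   // start of validity (e.g. seconds since epoch)
        long long validUntil;  // end of validity
        double value;          // e.g. a calibration constant or beam energy
    };

    class ConditionsStore {
        std::vector<Condition> entries_;
    public:
        void add(const Condition& c) { entries_.push_back(c); }
        double lookup(long long eventTime) const {
            for (const Condition& c : entries_)
                if (eventTime >= c.validFrom && eventTime < c.validUntil)
                    return c.value;
            throw std::runtime_error("no condition valid at this time");
        }
    };

    int main() {
        ConditionsStore beamEnergy;
        beamEnergy.add({0, 1000, 7000.0});     // one running period
        beamEnergy.add({1000, 2000, 6500.0});  // conditions changed for the next

        std::printf("beam energy at t=1500: %.0f GeV\n", beamEnergy.lookup(1500));
        return 0;
    }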

41
The Overall Picture
Globally, then, tags point to a collection of
events which are in a collection of runs, each of
which has certain properties such as energy or
calibration constants.
How do these different collections fit together?
Alternatively, event data can be kept in one
database with non-event data kept in a different
database, either object or relational.
42
When are objects created?
  • Once:
  • as part of some standard processing step (e.g.
    reconstruction) run
  • for all (interesting) events in batch mode, or
  • when needed for any individual event.
  • Many times:
  • once at least! See above.
  • But also as necessary, if recomputing using local
    data is faster than fetching the existing objects
    from some remote system (a sketch of this
    trade-off follows below).
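A hypothetical sketch of that trade-off: recomputing locally wins when the CPU time needed is less than the time to fetch the stored object over the network. All the numbers are invented placeholders.

    // Hypothetical sketch of the recompute-vs-fetch trade-off. All numbers
    // are invented placeholders, not measurements from the talk.
    #include <cstdio>

    int main() {
        const double recompute_s  = 2.0;   // CPU time to rebuild the object locally
        const double object_mb    = 20.0;  // size of the stored object
        const double wan_mb_per_s = 1.0;   // available wide-area bandwidth
        const double latency_s    = 0.5;   // request/response overhead

        double fetch_s = latency_s + object_mb / wan_mb_per_s;

        std::printf("fetch: %.1f s, recompute: %.1f s -> %s\n",
                    fetch_s, recompute_s,
                    recompute_s < fetch_s ? "recompute locally" : "fetch the object");
        return 0;
    }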

43
Computer Supported Collaborative Working
  • As we have seen, LHC collaborations are huge,
    with people distributed around the globe. A
    notable change from previous CERN experiments is
    the significant contribution expected from US
    institutes. Computers aid widespread
    collaboration in a number of ways.

44
Video Conferencing
  • There are two varieties of Video Conferencing.
  • CODEC based video conferencing works well and is
    much used commercially, but
  • it is expensive, and conferences with 3 or more
    sites require special equipment.
  • IP based video conferencing is cheap (connections
    already exist) and many people can participate,
    but
  • network links, especially those to the US, are
    already overloaded, and we can't yet reserve
    bandwidth for video conferencing.
  • An LHC project is trying to make the two systems
    interoperate.
  • Use of Video Conferencing is growing and the LHC
    collaborations will benefit more if network
    bandwidth increases
  • or if we can manage to reserve bandwidth solely
    for conferences.

45
Looking Forwards: Summary
  • LHC demands for CPU and I/O capacity
    significantly exceed those of the LEP
    experiments.
  • Fortunately, experiments such as COMPASS have
    intermediate requirements and allow us to study
    the problems before LHC startup.
  • CPU cost trends suggest we can afford distributed
    computing farms which provide adequate resources
  • but we have to start installing these in
    2003/2004.
  • Software quality is a major concern for the LHC
    experiments.
  • Object Oriented techniques are being adopted.
  • This allows us to consider the use of Object
    Oriented Databases for data management and other
    commercial packages for analysis work.

46
Computing at CERN: Conclusions
  • Computing at CERN is interesting! Computing at
    CERN is about Data!
  • Computing facilities at CERN are essential for
    designing, building and operating both
    accelerators and detectors.
  • Computers, of course, play a key role in the
    reconstruction and analysis of the raw data
    collected by experiments.
  • There are many interesting challenges as we look
    forward to high data rate experiments in the next
    couple of years and beyond to the LHC.