Title: Rene Brun (CERN)
1. Computing in ALICE
- Rene Brun/CERN
- ACAT 2002
- Moscow, 27 June
2. ALICE Collaboration
The ALICE collaboration includes 1223 collaborators from 85 different
institutes in 27 countries.
ALESSANDRIA ALIGARH AMSTERDAM
BARI Politecnico BARI University
BEIJING BERGEN College BERGEN
University BHUBANESWAR BIRMINGHAM
BOLOGNA BRATISLAVA BUCHAREST IPNE
BUDAPEST CAGLIARI CATANIA
CERN CHANDIGARH CLERMONT-FERRAND
COLUMBUS State University COLUMBUS
Supercomputer Center COPENHAGEN
DARMSTADT GSI DARMSTADT IKF DUBNA JINR
DUBNA RCANP FRANKFURT GATCHINA
HEIDELBERG-PHYSIKAL HEIDELBERG
KIRCHHOFF JAIPUR JAMMU
JYVASKYLA KANGNUNG KHARKOV IPT
KHARKOV SRTIIM KIEV KOLKATA SAHA
KOLKATA VECC KOSICE IEP KRAKOW
KURCHATOV LAUSANNE LEGNARO
LISBON LUND LYON MEXICO
MOSCOW INR MOSCOW ITEP MOSCOW MEPHI
MUENSTER NANTES NOVOSIBIRSK
- OAK RIDGE
- ORSAY
- OSLO
- PADOVA
- POHANG
- PRAGUE
- PROTVINO
- REZ
- ROMA LA SAPIENZA
- RONDEBOSCH
- SACLAY
- SALERNO
- SAROV VNIIEF
- ST PETERSBURG SU
- STRASBOURG
- TBILISI GA
- TBILISI SU
- TORINO
- TRIESTE
3. (No transcript)
4. What a Pb-Pb event looks like (/100)?
5. Strategic Decision in 1998
[Diagram: move from the Fortran world (Geant3, PAW, hits in Zebra/RZ/FZ) to
the ROOT framework, with kinematics and geometry in C++, a Virtual MC layer
in front of Geant3 (Alice) and Geant4, and hits and digits stored in ROOT
trees; one path is marked "NIET".]
6. Framework
- AliRoot framework
  - C++: 400 kLOC + 225 kLOC (generated) + 77 kLOC of macros
  - FORTRAN: 13 kLOC (ALICE) + 914 kLOC (external packages)
  - Maintained on Linux (any version!), HP-UX, DEC Unix, Solaris
  - Works also with the new Intel icc compiler
  - Two packages to install (ROOT + AliRoot)
  - Less than 1 second to link (thanks to 37 shared libs)
  - 1-click-away install: download and make (non-recursive makefile)
- AliEn
  - 25 kLOC of Perl5 (ALICE)
  - 0.5 MLOC of Perl5 (external packages) + 1 MLOC (open-source components)
  - Installed on more than 30 sites by physicists
- > 50 active users from the different detector groups participate in the
  development of AliRoot
- 70% of the code developed outside, 30% by the core Offline team
7. AliRoot layout
[Diagram: AliRoot is built on ROOT. The STEER module ties together the
Virtual MC (interfacing G3, G4 and FLUKA), the generators and generator
interfaces (EVGEN, PYTHIA6, HIJING, ISAJET, MEVSIM, PDF, STRUCT), the
analysis packages (HBTAN, HBTP, RALICE) and the detector modules (ITS, TPC,
TRD, TOF, PHOS, EMCAL, RICH, MUON, PMD, ZDC, FMD, START, CASTOR).]
8. Whiteboard Data Communication
9. Simulation
10. Simulation tools
- Geant3: created in 1981, still used by the majority of experiments
- Geant4: a huge investment, slow penetration in HEP experiments
- Fluka: state of the art for hadronic and neutron physics
11. The Virtual MonteCarlo
[Diagram: AliRoot and the DAQ/Online chain; a common Kinematics and Geometry
input feeds the TVirtualMC interface, which can drive Geant3, Geant4 or
Fluka, and produces the same Hits and Digits output in all cases.]
This strategy facilitates migration to, or comparisons between, transport
engines with a common input and a common output (a configuration sketch
follows this slide).
Geant3.tar.gz includes an upgraded Geant3 with a C++ interface.
Geant4_mc.tar.gz includes the TVirtualMC <--> Geant4 interface classes.
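To make the common-interface idea concrete, here is a minimal Config.C-style sketch of how a user might select the transport engine through TVirtualMC; the constructor arguments, the TFluka line and the specific process/cut settings are illustrative assumptions, not the exact AliRoot configuration of the time.

```cpp
// Minimal sketch of a Config.C-style macro choosing the transport engine
// through the TVirtualMC interface (values and arguments are illustrative).
void Config()
{
   // Pick ONE concrete Monte Carlo; the rest of the simulation code only
   // talks to the abstract TVirtualMC interface through the gMC pointer.
   new TGeant3("C++ interface to Geant3");     // from geant3.tar.gz
   // new TGeant4("C++ interface to Geant4");  // from geant4_mc.tar.gz
   // new TFluka("C++ interface to Fluka");    // planned at the time

   // Kinematics, geometry, hits and digits are defined once, in C++, and
   // written to ROOT trees whatever engine was selected above.
   gMC->SetProcess("DCAY", 1);    // physics settings via the common interface
   gMC->SetCut("CUTGAM", 1.e-3);  // example cut in GeV, identical for all engines
}
```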
12. Detector Geometry (view 1)
[Diagram: a Geant4-based simulation program (C++ classes) owns the Geant4
geometry; visualisation, XML files, MySQL and the reconstruction program all
have to interface to it.]
13. Detector Geometry (view 2)
[Diagram: a common geometry package (C++ classes, backed by MySQL) serves
Geant3-, Geant4- and Fluka-based simulation programs, the reconstruction
program and visualisation (a minimal sketch follows this slide).]
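As an illustration of "view 2", here is a minimal sketch of a transport-independent geometry built with the ROOT geometry classes (TGeoManager and friends); the materials, shapes and dimensions are invented for the example and do not describe any ALICE detector.

```cpp
// Minimal sketch of a transport-independent geometry description using the
// ROOT geometry classes. Materials and dimensions are invented.
#include "TGeoManager.h"
#include "TGeoMaterial.h"
#include "TGeoMedium.h"
#include "TGeoVolume.h"

void MiniGeometry()
{
   TGeoManager *geom = new TGeoManager("mini", "transport-independent toy geometry");

   TGeoMaterial *matVac = new TGeoMaterial("Vacuum", 0, 0, 0);
   TGeoMaterial *matSi  = new TGeoMaterial("Si", 28.09, 14, 2.33);
   TGeoMedium   *vac    = new TGeoMedium("Vacuum", 1, matVac);
   TGeoMedium   *si     = new TGeoMedium("Si", 2, matSi);

   // Top volume: a box with 5 x 5 x 10 m half-lengths (cm units)
   TGeoVolume *top = geom->MakeBox("TOP", vac, 500., 500., 1000.);
   geom->SetTopVolume(top);

   // One toy "detector layer": a thin silicon tube placed in the mother volume
   TGeoVolume *layer = geom->MakeTube("LAYER", si, 3.9, 4.1, 50.);
   top->AddNode(layer, 1);

   geom->CloseGeometry();
   // The same in-memory description can then be handed to simulation (via
   // the Virtual MC), to reconstruction and to visualisation.
}
```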
14. ALICE: 3 million geometry nodes
15. BRAHMS: 2649 geometry nodes
16. ATLAS: 29 million geometry nodes
17. Reconstruction
18. Tracking efficiencies
- TPC+ITS tracking efficiency at dN/dy = 8000
- The standard definition of good vs. fake tracks requires all 6 ITS hits to
  be correct
- Most of the incorrect tracks have just one bad point
- When this definition is relaxed to 5 out of 6 hits, the efficiency improves
  by 7-10% (fake-track probability correspondingly lower)
19. Secondary vertices
20. TPC particle identification
- At dN/dy = 8000: σ(dE/dx) ≈ 10%
- At dN/dy = 4000: σ(dE/dx) ≈ 7%
21. ITS dE/dx
4 ITS layers (silicon drift and strip) capable of measuring dE/dx, with
σ(dE/dx) ≈ 10-11%.
[Plot: particle separation]
22. Reconstruction Status
- Reconstruction development proceeds well
  - lower than expected efficiency for secondary tracks
  - outstanding issues:
    - improvement of the seed finder
    - kink finder for charged-kaon decays in the TPC
    - V0 finding in a larger fiducial volume
    - recovery of electron bremsstrahlung in the Kalman filter (started; a
      minimal sketch of the filter update follows this list)
    - need an effort in calibration and alignment software
- Charged particle identification
  - we have studies of performance at detector level (ITS, TPC, TOF, HMPID)
  - outstanding issues:
    - dependence on particle density
    - correct connection with the tracking detectors
    - global particle-identification performance
- Photon identification
  - study in the real environment
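The Kalman-filter bullet above can be made concrete with a minimal one-dimensional update step; this is a generic textbook sketch of the technique, not the AliRoot tracker, and all numbers (process noise, measurement error, toy hit positions) are invented.

```cpp
// Minimal one-dimensional Kalman-filter fit, as a generic illustration of
// the track-fitting technique; it is not the AliRoot tracker code.
#include <cmath>
#include <cstdio>

struct State {
   double x;  // estimated track parameter (e.g. local position, in mm)
   double C;  // its variance
};

// Predict: propagate the estimate and inflate the variance by process
// noise Q (multiple scattering, energy-loss fluctuations, ...).
State predict(State s, double Q) { s.C += Q; return s; }

// Update: combine the prediction with a measurement m of variance V.
State update(State s, double m, double V)
{
   const double K = s.C / (s.C + V);  // Kalman gain
   s.x += K * (m - s.x);              // corrected estimate
   s.C *= (1.0 - K);                  // reduced variance
   return s;
}

int main()
{
   State s{0.0, 1.0};                           // vague initial estimate
   const double hits[] = {0.9, 1.1, 1.0, 1.2};  // toy cluster positions (mm)
   for (double m : hits) {
      s = predict(s, 0.01);   // small process noise per layer
      s = update(s, m, 0.04); // 0.2 mm measurement error -> V = 0.04 mm^2
   }
   std::printf("fitted x = %.3f +- %.3f\n", s.x, std::sqrt(s.C));
}
```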
23. Physics Performance Report
24. ALICE Physics Performance Report
- Detailed evaluation of the ALICE performance based on the present design of
  the detector, using the latest simulation tools
- Acceptance, efficiency, resolution for signals
- Need O(10^7) equivalent central HI events
  - 24 h @ 600 MHz per central event: 300,000 PCs for one year!
  - Need an alternative:
    - Step 1: generate parametrised background summable digits
    - Step 2: generate the signals on the fly and merge them with the
      background (a sketch follows this list)
    - Step 3: event analysis
  - O(10^4) backgrounds reused O(10^3) times
- Pilot HI production
  - Step 1: 10,000 central Pb-Pb events (30 TB, done)
  - Step 2: 10^5 signals (10 TB x n_signals, just started)
  - Step 3: Analysis Object Data, 1 TB x n_signals (to be done)
- For pp the situation is simpler; the pilot production is already done
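A minimal sketch of the merging idea in Step 2: one expensive set of background summable digits is reused under many cheaply generated signals. The CellId/SDigits types and the simple per-cell addition are illustrative assumptions, not the AliRoot digitizer classes.

```cpp
// Minimal sketch of the "reuse the background" idea: one set of background
// summable digits is merged with many independently generated signal events.
// The types and the per-cell addition are simplified illustrations.
#include <map>

using CellId  = long;                    // detector cell / pad identifier
using SDigits = std::map<CellId, float>; // summable digits: cell -> charge

// Add the signal charge on top of the (expensive, precomputed) background.
SDigits merge(const SDigits& background, const SDigits& signal)
{
   SDigits out = background;
   for (const auto& [cell, q] : signal) out[cell] += q;
   return out;
}

int main()
{
   SDigits background = /* read once from the pilot production */ {};
   for (int i = 0; i < 1000; ++i) {              // O(10^3) reuses per background
      SDigits signal = /* generated on the fly */ {};
      SDigits event  = merge(background, signal);
      // ... digitize (thresholds, noise), reconstruct, analyse ...
      (void)event;
   }
}
```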
25. Example
26. Analysis: pp production
27. GRID activities
28. ALICE GRID resources
http://www.to.infn.it/activities/experiments/alice-grid
37 people, 21 institutions
29. AliEn framework
- AliEn framework in brief
  - Lightweight, simplified but fully functional GRID implementation
  - Distributed file catalogue built on top of an RDBMS, with a user
    interface that mimics a file system
  - Authentication module which supports various authentication methods
    (including strong, certificate-based ones)
  - Task queue which holds the commands to be executed in the system
    (commands, inputs and outputs are all registered in the catalogue)
  - Metadata catalogue
  - Services that support the above components
  - C/C++/Perl API
- It provides a coherent interface and shields users from rapid changes in
  the environment
30. Components
- AliEn components (available as separate RPMs)
  - AliEn-Base: contains the external modules (more than 100, including Globus)
  - AliEn-Client: basic client functionality, needed to access LFNs
  - AliEn-Server: the AliEn server, one per Virtual Organization
  - AliEn-SE: Storage Element, must be installed on sites which provide MSS
    functionality
  - AliEn-CE: Computing Element, must be installed if a site wants to
    participate in production
31. Components
- Optional components
  - AliEn-GUI: graphical user interface
  - AliEn-Monitor: required by the Server to enable advanced RB features
  - AliEn-Portal: the AliEn web site
  - AliEn-Alice (and now AliEn-Atlas): description of the ALICE Virtual
    Organization
    - Packages
    - Commands
- The RPMs can be found at http://alicedb.cern.ch/GRID/current
32. File catalogue
[Catalogue tree (ALICE / Tier1 / LOCAL):]
./
  cern.ch/
    user/
      a/  admin/  aliprod/
      f/  fca/
      p/  psaiz/  as/  dos/
      b/  barbera/
    local/
  simulation/
    2001-01/
      V3.05/
        Config.C
        grun.C
  36/  stderr  stdin  stdout
  37/  stderr  stdin  stdout
  38/  stderr  stdin  stdout
Files, commands (job specifications) as well as job inputs, outputs and tags
are stored in the catalogue (a toy lookup illustrating the idea follows this
slide).
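A toy C++ illustration of the catalogue idea: a hierarchical logical file name resolves to one or more physical replicas. The std::map stand-in and the replica URLs are purely illustrative; the real AliEn catalogue is an RDBMS accessed from Perl.

```cpp
// Toy illustration of the catalogue idea: a hierarchical logical name maps
// to one or more physical replicas. A sketch of the concept only, not the
// AliEn (Perl + RDBMS) implementation.
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
   // logical file name (what the user sees)  ->  physical replicas
   std::map<std::string, std::vector<std::string>> catalogue = {
      {"/alice/simulation/2001-01/V3.05/Config.C",
       {"castor://cern.ch/...", "file://ccin2p3.fr/..."}},  // placeholder paths
      {"/alice/cern.ch/user/p/psaiz/36/stdout",
       {"file://cern.ch/..."}},
   };

   // "ls"-like lookup: the interface mimics a file system, while the data
   // actually lives in an RDBMS plus the sites' mass storage.
   for (const auto& replica : catalogue["/alice/simulation/2001-01/V3.05/Config.C"])
      std::cout << replica << '\n';
}
```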
33. Production Summary
- 10^5 CPU hours, 13 clusters, 9 sites
- 5682 events validated, 118 failed (2%)
- Up to 300 concurrently running jobs worldwide (over 5 weeks)
- 5 TB of data generated and stored at the sites with mass-storage capability
  (CERN 73%, CCIN2P3 14%, LBL 14%, OSC 1%)
- GSI, Karlsruhe, Dubna, Nantes, Budapest, Bari, Zagreb, Birmingham(?),
  Calcutta in addition ready by now
34. Other productions
- EMCAL production
  - Design and optimisation of the proposed EM calorimeter
  - Entirely AliEn based
  - Decided, implemented and realised in two months
  - 2000 jobs, 4000 h CPU
- pp production, August 2001
  - 10,000 events generated at CERN, stored in CASTOR
  - Transport + TPC digitisation
- pp production, October 2001
  - Vertex sampling in the diamond
  - > 10,000 events (80 GB) produced in 3 days in Torino and transferred to
    CERN with AliEn in 20 hours (bandwidth saturated)
  - Test and tune tracking
35. AliEn: a lightweight working GRID
- The AliEn framework is a lightweight, simplified but functionally
  equivalent alternative to a full-blown GRID, based on standard components
  (SOAP, Web services)
- AliEn has already been used in production for the ALICE PPR
- It will be continuously developed, with the aim of providing a long-term
  stable interface to the GRID(s) for ALICE users
- AliEn will be used to provide the GRID component for MammoGRID, a 3-year,
  2 MEuro project funded by the EC, starting in September
36. GRID and PROOF
[Diagram: local client, remote clusters]
Bring the KB to the PB and not the PB to the KB.
37. DataGrid service integration
[Diagram: selection parameters are taken from the ROOT RDB, the Grid RB
spawns PROOF tasks on the sites holding the data, and the results update the
ROOT RDB (a minimal PROOF sketch follows this slide).]
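A minimal sketch of the PROOF side of this flow, where the selector is shipped to the data rather than the data to the client. The cluster URL, file locations and selector name are placeholders, and TProof::Open / TChain::SetProof are the calls of later ROOT versions; the 2002 API differed in detail.

```cpp
// Minimal sketch of the "bring the KB to the PB" idea with PROOF: the
// selector code is shipped to the cluster and runs where the data is.
// Cluster name, file paths and selector are placeholders.
#include "TChain.h"
#include "TProof.h"

void RunProofAnalysis()
{
   // Data set: a chain of trees registered in the catalogue
   TChain chain("esdTree");
   chain.Add("root://some.site/...");   // placeholder locations

   // Open a PROOF session on the remote cluster and attach the chain to it
   TProof::Open("proof://cluster.example.org");
   chain.SetProof();

   // The TSelector is compiled and executed on the workers; only the small
   // result objects (histograms) travel back to the client.
   chain.Process("MySelector.C+");
}
```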
38. ALICE Data Challenges
[Diagram: simulated data (ROOT) and raw data (DAQ) pass through ROOT I/O
into CASTOR at CERN (Tier 0 / Tier 1) and, via the GRID, to the regional
Tier 1 / Tier 2 centres.]
39. ALICE Data Challenge III (2001)
- Need to run yearly Data Challenges of increasing complexity and size to
  reach 1.25 GB/s
- ADC III gave excellent system stability during 3 months
- DATE throughput: 550 MB/s (max), 350 MB/s (ALICE-like)
- DATE + ROOT + CASTOR throughput: 120 MB/s (max), <85> MB/s (average)
- 2200 runs, 2x10^7 events, 86-hour run, 54 TB in DATE
- 500 TB through DAQ, 200 TB through DAQ + ROOT I/O, 110 TB in CASTOR
- 10^5 files > 1 GB in CASTOR and in the MetaData DB
- HP SMPs: cost-effective alternative to inexpensive disk servers
- Online monitoring tools developed
40. ALICE Data Challenge IV (2002)
- The ALICE DAQ, ALICE Offline, CASTOR and ROOT projects
- and the IT/ADC, IT/CS and IT/DS groups
41. DAQ Functional Goals
- DAQ
  - The complete data-acquisition chain
  - ALICE optical links (DDLs) as data sources
    - Requires installing PCI boards in some PCs
    - Fast DDL/PCI interfaces able to saturate 32-bit PCI (PCI RORC)
  - DATE V4
    - New data format
    - LDC software: scattered data transfer
    - GDC software: event builder (flexible policy and load balancing; a toy
      policy sketch follows this list)
    - New run control (finite-state machines)
    - HLT software interface
  - AFFAIR V1: improved performance-reporting tools
  - More realistic data traffic (different trigger types, different readout)
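To illustrate what a "flexible policy" for the event builder can mean, here is a toy least-loaded assignment of events to GDCs; the structures and the policy are hypothetical and far simpler than DATE's real GDC software.

```cpp
// Toy illustration of a flexible event-building policy: each event number is
// assigned to a GDC by a simple "least-loaded" rule. Names and structures
// are hypothetical, not DATE code.
#include <algorithm>
#include <cstdio>
#include <vector>

int pickGdc(const std::vector<int>& queuedEvents)
{
   // least-loaded policy: the GDC with the shortest queue builds this event
   return int(std::min_element(queuedEvents.begin(), queuedEvents.end())
              - queuedEvents.begin());
}

int main()
{
   std::vector<int> load(4, 0);          // 4 GDCs, empty queues
   for (int event = 0; event < 10; ++event) {
      int gdc = pickGdc(load);
      ++load[gdc];                       // all LDC sub-events of 'event' go to this GDC
      std::printf("event %d -> GDC %d\n", event, gdc);
   }
}
```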
42. Technology Goals
- CPU servers
  - Dual-CPU machines (LXSHARE)
  - SMP machines (HP Netservers)
- Network
  - 100 nodes on the Gigabit Ethernet network
  - 100 nodes on Fast Ethernet
  - 10 Gbit Ethernet backbone
- Storage
  - Disk servers
    - IDE-based disk servers
    - SAN and/or NAS-based solutions
  - Tapes
    - Test the present generation of tapes
    - Production with the new generation of tapes (~30 MB/s, 200 GB/volume)
43. ADC IV Hardware Setup
[Diagram of the test-bed network: a 4 Gbps Gigabit Ethernet backbone with
Gigabit and Fast Ethernet switches connecting the TBED and LXSHARE CPU
nodes, the disk nodes and 10 distributed tape servers.]
Total: 192 CPU servers (96 on Gigabit Ethernet, 96 on Fast Ethernet),
36 disk servers, 10 tape servers.
44. Performance: DATE
Data generation in the LDCs, event building, no data recording.
45. Performance: DATE + CASTOR
Data generation in the LDCs, event building, data recording to disk.
46. Longest DATE run
- Recording to /dev/null
  - 26 LDCs, 51 GDCs
  - 0.5 day non-stop
  - 575 MB/s sustained
- Recording to CASTOR (no objectification)
  - 26 LDCs, 51 GDCs
  - 4.5 days non-stop
  - 140 TB to disk
  - 350 MB/s sustained
47. Still to be done
- Addition of a few ALICE DDLs (optical links) with electronics data
  generators
- Data generation in the LDCs, event building and data recording to tape;
  the goal is to reach 200 MB/s sustained
- Use of direct memory access instead of a pipe for the data transfer from
  DATE to ALIMDC
- Tests with ALICE simulated data generated with AliRoot
- Tests with SAN equipment connected to Gigabit Ethernet (see the
  architecture on the next slide)
  - Fibre Channel disk arrays from Dot Hill
  - Fibre Channel tape robot from Qualstar
- Tests with the new tape generation from STK (~30 MB/s, 200 GB/tape volume)
- Transfer of a fraction of the data to regional centres as a Grid prototype
48. Offline Organisation
49. CORE Off-line staffing
- ALICE opted for a light core CERN offline team
  - Concentrate on framework, software distribution and maintenance; all
    detector software is written outside
- 17 FTEs are needed; for the moment 2 are missing (-10%)
  - But only 3 permanent scientific staff (17%)
  - Plus 10-15 people from the collaboration
    - GRID coordination (Torino), ALICE World Computing Model (Nantes),
      detector database (Warsaw)
- Efforts are made to alleviate the shortage
- The LCG project should provide support for common solutions
50. Software Development Process
- Users and developers are distributed but contribute to the code
- A development cycle adapted to ALICE has been elaborated
  - Developers work on the most important feature at any moment
  - A stable production version exists
  - Collective ownership of the code
  - Flexible release cycle
  - Simple packaging and installation
- Freely inspired by modern Software Engineering
  - Kent Beck, Extreme Programming Explained, Addison-Wesley
  - Martin Fowler, The New Methodology,
    http://www.martinfowler.com/articles/newMethodology.html
  - Linux, GNU, ROOT, KDE, GSL
- Design micro-cycles happen continuously
- 2-3 macro-cycles per year
  - Discussed and implemented at Off-line meetings and code reviews
  - Corresponding to major code releases
51. Planning
- We have high-level milestones for technology and physics
  - Data challenges test the basic technology and the functional integration
    of DAQ and Off-line
  - Physics challenges test the functionality of the Off-line from the user's
    viewpoint; the first physics challenge is providing the results for the
    Physics Performance Report
- AliRoot is now in its fourth year of life
- We have a clear view of the high-level requirements for the current system;
  these are expressed in the Computing Technical Report (in preparation,
  3Q2002)
- A fine-grained planning has been defined
52. Summary
- The ALICE Off-line model has a good record of achievements
  - Users are involved and supportive in collaborative software design
  - Technical Design Reports, PPR and Data Challenges done with AliRoot
  - Large code redesigns were made without disruption of work
  - Large distributed productions have been made with AliEn
- AliRoot has the potential to follow the technology evolution and meet the
  needs of ALICE at the LHC
- The ALICE Software Process is both lightweight and effective
- Support for basic tools (ROOT, FLUKA, CASTOR) is needed
  - Current planning estimates for ALICE Offline are based on it
- ALICE has high hopes in the common projects to be defined in the LCG
  - ALICE choices are now being considered for common projects