1
Computing in ALICE
  • Rene Brun/CERN
  • ACAT 2002
  • Moscow, 27 June

2
ALICE Collaboration
The ALICE collaboration includes 1223 collaborators from 85 institutes in 27 countries:
ALESSANDRIA, ALIGARH, AMSTERDAM, BARI Politecnico, BARI University, BEIJING, BERGEN College, BERGEN University, BHUBANESWAR, BIRMINGHAM, BOLOGNA, BRATISLAVA, BUCHAREST IPNE, BUDAPEST, CAGLIARI, CATANIA, CERN, CHANDIGARH, CLERMONT-FERRAND, COLUMBUS State University, COLUMBUS Supercomputer Center, COPENHAGEN, DARMSTADT GSI, DARMSTADT IKF, DUBNA JINR, DUBNA RCANP, FRANKFURT, GATCHINA, HEIDELBERG-PHYSIKAL, HEIDELBERG KIRCHHOFF, JAIPUR, JAMMU, JYVASKYLA, KANGNUNG, KHARKOV IPT, KHARKOV SRTIIM, KIEV, KOLKATA SAHA, KOLKATA VECC, KOSICE IEP, KRAKOW, KURCHATOV, LAUSANNE, LEGNARO, LISBON, LUND, LYON, MEXICO, MOSCOW INR, MOSCOW ITEP, MOSCOW MEPHI, MUENSTER, NANTES, NOVOSIBIRSK, OAK RIDGE, ORSAY, OSLO, PADOVA, POHANG, PRAGUE, PROTVINO, REZ, ROMA LA SAPIENZA, RONDEBOSCH, SACLAY, SALERNO, SAROV VNIIEF, ST PETERSBURG SU, STRASBOURG, TBILISI GA, TBILISI SU, TORINO, TRIESTE

3
(No Transcript)
4
What does a Pb-Pb event look like? (/100)
5
Strategic Decision in 1998
[Diagram] From the Fortran world of Geant3 and PAW, with hits in Zebra/RZ/FZ, to the ROOT framework: kinematics in C++, geometry in C++, hits and digits in ROOT trees, and a Virtual MC layer between the Geant3-based Alice code and Geant4. One path on the slide is marked NIET.
6
Framework
  • AliRoot framework
  • C++: 400 kLOC + 225 kLOC (generated); macros:
    77 kLOC
  • FORTRAN: 13 kLOC (ALICE) + 914 kLOC (external
    packages)
  • Maintained on Linux (any version!), HP-UX, DEC
    Unix, Solaris
  • Works also with the new Intel icc compiler
  • Two packages to install (ROOT + AliRoot)
  • Less than 1 second to link (thanks to 37 shared
    libs; see the loading sketch below)
  • 1-click-away install: download and make
    (non-recursive makefile)
  • AliEn
  • 25 kLOC of Perl5 (ALICE)
  • 0.5 MLOC of Perl5 (external packages) + 1 MLOC
    (open-source components)
  • Installed on more than 30 sites by physicists
  • > 50 active users from the different detector
    groups participate in the development of AliRoot
  • 70% of the code developed outside, 30% by the
    core Offline team
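To make the fast-link point concrete: almost all code lives in shared libraries that a ROOT session loads on demand, so the executable itself stays tiny. A minimal sketch using the standard ROOT call gSystem->Load(); the library names are illustrative examples, not the actual list of 37:

    // Load AliRoot shared libraries on demand in a ROOT session;
    // only stubs are linked into the binary, hence the sub-second link.
    gSystem->Load("libSTEER");   // framework core (illustrative name)
    gSystem->Load("libTPC");     // one detector module (illustrative name)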

7
AliRoot layout
[Diagram] AliRoot sits on top of ROOT. The Virtual MC couples it to the transport engines G3, G4 and FLUKA; event generators (ISAJET, HIJING, PYTHIA6, MEVSIM) enter through EVGEN, together with PDF and STRUCT; STEER coordinates the detector modules (EMCAL, ZDC, ITS, PHOS, PMD, TRD, TOF, RICH, CASTOR, FMD, MUON, TPC, START) and the analysis packages HBTAN, HBTP and RALICE.
8
Whiteboard Data Communication
9
Simulation
10
Simulation tools
Geant3: created in 1981, still used by the majority of experiments.
Geant4: a huge investment, with slow penetration in HEP experiments.
Fluka: state of the art for hadronic and neutron physics.
11
The Virtual MonteCarlo
[Diagram] AliRoot and the DAQ Online system share common Kinematics and Geometry input and common Hits/Digits output; in between, the TVirtualMC interface drives Geant3, Geant4 or Fluka. This strategy facilitates migration and comparisons, since all engines see the same input and produce the same output.
geant3.tar.gz includes an upgraded Geant3 with a C++ interface; geant4_mc.tar.gz includes the TVirtualMC <--> Geant4 interface classes.
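A minimal sketch of how the concrete engine is selected behind TVirtualMC, assuming the TGeant3 class shipped in geant3.tar.gz (the macro name Config.C matches the file catalogue example later in the talk):

    // Config.C -- minimal sketch, not the full ALICE configuration.
    // Instantiating a concrete engine sets the global gMC pointer that
    // all user code talks to; moving to Geant4 or Fluka means replacing
    // this single line.
    void Config()
    {
       new TGeant3("C++ Interface to Geant3");
    }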
12
Detector Geometry (view 1)
[Diagram] View 1: the Geant4-based simulation program owns the geometry, built from C++ classes as a Geant4 geometry; it reaches the visualisation and the reconstruction program through XML files and MySQL.
13
Detector Geometry (view 2)
[Diagram] View 2: a common geometry package, fed by C++ classes and backed by MySQL, serves the simulation program (Geant3-, Geant4- or Fluka-based), the reconstruction program and the visualisation.
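The "geometry in C++" idea can be sketched with the Geant3-style calls that TVirtualMC exposes; the volume names, medium id and dimensions below are illustrative:

    // Define a tube volume once in C++ and place it in the mother volume;
    // the same code drives Geant3, Geant4 or Fluka through gMC.
    Int_t    idtmed  = 1;                   // tracking medium id (illustrative)
    Double_t tube[3] = {250., 300., 500.};  // rmin, rmax, half-length (cm)
    gMC->Gsvolu("BARL", "TUBE", idtmed, tube, 3);          // create the volume
    gMC->Gspos("BARL", 1, "ALIC", 0., 0., 0., 0, "ONLY");  // place it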
14
Alice: 3 million nodes
15
Brahms: 2649 nodes
16
Atlas: 29 million nodes
17
Reconstruction
18
Tracking efficiencies
  • TPC+ITS tracking efficiency at dN/dy = 8000
  • the standard definition of good vs. fake tracks
    requires all 6 ITS hits to be correct
  • most of the incorrect tracks have just one bad
    point
  • when this definition is relaxed to 5 out of 6
    hits, the efficiency is better by 7-10% (the fake
    track probability is correspondingly lower)

19
Secondary vertices
  • K0s, Λ

20
TPC particle identification
  • At dN/dy = 8000: σ(dE/dx) ≈ 10%
  • At dN/dy = 4000: σ(dE/dx) ≈ 7%

21
ITS dE/dx
4 ITS layers (silicon drift and strip) capable of measuring dE/dx: σ(dE/dx) ≈ 10-11%.
[Plot] Separation
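For reference, the separation between two particle species A and B is conventionally quoted in units of the resolution (a standard definition, not spelled out on the slide):

    n_sigma = |<dE/dx>_A - <dE/dx>_B| / sigma(dE/dx)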
22
Reconstruction Status
  • Reconstruction development proceeds well
  • lower than expected efficiency for secondary
    tracks
  • outstanding issues
  • improvement in seed finder
  • kink finder for charged K decays in TPC
  • V0 finding in larger fiducial volume
  • recovery for electron bremsstrahlung in the Kalman
    filter (started)
  • need an effort in calibration and alignment
    software
  • Charged particle identification
  • we have performance studies at the level of single
    detectors (ITS, TPC, TOF, HMPID)
  • outstanding issues
  • dependence on particle density
  • correct connection with tracking detectors
  • global particle identification performance
  • Photon identification
  • study in the real environment

23
Physics Performance Report
24
ALICE Physics Performance Report
  • Detailed evaluation of ALICE performance based on
    the present design of the detector, using the latest
    simulation tools
  • Acceptance, efficiency, resolution for signals
  • Need O(10^7) equivalent central HI events
  • At 24 h @ 600 MHz per central event, that is
    300,000 PCs for one year!
  • Need an alternative
  • Step 1: generate parametrised background summable
    digits
  • Step 2: generate the signals on the fly and merge
    (see the sketch after this list)
  • Step 3: event analysis
  • O(10^4) background events reused O(10^3) times
  • Pilot HI production
  • Step 1: 10,000 central Pb-Pb events (30 TB, done)
  • Step 2: 10^5 signals (10 TB x nsignals, just
    started)
  • Step 3: Analysis Object Data, 1 TB x nsignals (to
    be done)
  • For pp the situation is simpler; pilot production
    already done
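A sketch of the event loop implied by Steps 1-3; every function name here is hypothetical, purely to show how O(10^4) stored backgrounds can serve O(10^7) signal events:

    // Hypothetical sketch (not the actual AliRoot API): reuse each of
    // nBkg ~ 10^4 pre-generated background events O(10^3) times.
    const Int_t nBkg = 10000, nSignals = 10000000;
    for (Int_t i = 0; i < nSignals; i++) {
       Int_t ibkg = i % nBkg;         // recycle the background pool
       LoadBackgroundSDigits(ibkg);   // Step 1: read summable digits
       GenerateSignal(i);             // Step 2: signal generated on the fly...
       MergeAndDigitize();            // ...and merged with the background
       Analyse(i);                    // Step 3: event analysis
    }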

25
Example
26
Analysis pp production
27
GRID activities
28
ALICE GRID resources
http://www.to.infn.it/activities/experiments/alice-grid
37 people, 21 institutions
29
AliEn framework
  • AliEn framework in brief
  • Lightweight, simplified but fully functional GRID
    implementation
  • Distributed file catalog built on top of RDBMS
    with user interface that mimics the file system
  • Authentication module which supports various
    authentication methods (including strong
    certificate based)
  • Task queue which holds commands to be executed in
    the system (commands, inputs and outputs are all
    registered in the catalogue)
  • Metadata catalogue
  • Services that support the above components
  • C/C++/Perl API
  • It provides a coherent interface and shields
    users from rapid changes in the environment

30
Components
  • AliEn Components (available as separate RPMs)
  • AliEn-Base
  • Contains external modules (more than 100,
    including Globus)
  • AliEn-Client
  • Basic client functionality, needed to access LFNs
  • AliEn-Server
  • AliEn Server, one per Virtual Organization
  • AliEn-SE
  • Storage Element, must be installed on sites which
    provide MSS functionality
  • AliEn-CE
  • Computing Element, must be installed if a site
    wants to participate in production

31
Components
  • Optional components
  • AliEn-GUI
  • Graphical User interface, optional component
  • AliEn-Monitor
  • Monitor is required by Server to enable advanced
    RB features
  • AliEn-Portal
  • This is the AliEn Web site
  • AliEn-Alice (and now AliEn-Atlas)
  • Description of Alice Virtual Organization
  • Packages
  • Commands
  • The RPMs can be found at
    http://alicedb.cern.ch/GRID/current

32
File catalogue
[Diagram] Example catalogue tree:
ALICE: ./cern.ch/user/ with per-letter directories, e.g. a/admin/, a/aliprod/, f/fca/, p/psaiz/as/, p/psaiz/dos/, p/psaiz/local/, b/barbera/
ALICE: ./simulation/2001-01/V3.05/ holding Config.C and grun.C
Tier1 (LOCAL): job directories 36/, 37/, 38/, each with stderr, stdin and stdout
Files, commands (job specifications) as well as job inputs, outputs and tags are stored in the catalogue, as sketched below.
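A toy illustration of the catalogue idea (this is neither AliEn code nor its schema): a logical file name that looks like a file-system path resolves, through the RDBMS-backed catalogue, to a physical location at some site:

    // Toy stand-in for the catalogue: an LFN -> PFN map. In the real
    // system the mapping lives in database tables, one level per directory.
    #include <iostream>
    #include <map>
    #include <string>
    int main() {
       std::map<std::string, std::string> catalogue;
       catalogue["/simulation/2001-01/V3.05/Config.C"] =
          "castor://cern.ch/alice/prod/Config.C";       // illustrative PFN
       std::cout << catalogue["/simulation/2001-01/V3.05/Config.C"] << "\n";
       return 0;
    }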
33
Production Summary
10^5 CPU hours
13 clusters, 9 sites
  • 5682 events validated, 118 failed (2%)
  • Up to 300 concurrently running jobs worldwide (5
    weeks)
  • 5 TB of data generated and stored at the sites
    with mass storage capability (CERN 73%, CCIN2P3
    14%, LBL 14%, OSC 1%)
  • GSI, Karlsruhe, Dubna, Nantes, Budapest, Bari,
    Zagreb, Birmingham(?), Calcutta in addition ready
    by now

34
Other productions
  • EMCAL production
  • Design and optimisation of the proposed EM
    calorimeter
  • Entirely AliEn based
  • Decided, implemented and realised in two months
  • 2000 jobs, 4000h CPU
  • pp production, August 2001
  • 10,000 events generated at CERN in CASTOR
  • Transport + TPC digitization
  • pp production, October 2001
  • Vertex sampling in the diamond
  • > 10,000 events (80 GB) in 3 days in Torino,
    transferred to CERN with AliEn in 20 hours
    (bandwidth saturated)
  • Test and tune tracking

35
AliEn: a lightweight working GRID
  • The AliEn framework is a lightweight, simplified but
    functionally equivalent alternative to a full-blown
    GRID, based on standard components (SOAP, Web
    services)
  • AliEn has already been used in production for the
    Alice PPR
  • It will be continuously developed, with the aim of
    providing a long-term stable interface to GRID(s)
    for Alice users
  • AliEn will be used to provide the GRID component for
    MammoGRID, a 3-year, 2M Euro project funded by the
    EC, starting in September

36
Grid PROOF
[Diagram] A local session drives remote PROOF workers where the data resides.
Bring the KB to the PB and not the PB to the KB: move the (kilobyte-sized) selection code and results, not the (petabyte-sized) data. A usage sketch follows.
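In today's ROOT terms the idea looks roughly like this (cluster and file names are illustrative, and the 2002 PROOF API differed in detail):

    // Process a remote dataset where it lives; only the selector source
    // and the (KB-sized) results cross the network.
    TChain chain("esdTree");
    chain.Add("root://site.example.org//alice/run1/*.root");
    TProof::Open("proofmaster.example.org");  // connect to the cluster
    chain.SetProof();                         // route Process() through PROOF
    chain.Process("MySelector.C+");           // TSelector runs on the workers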
37
DataGrid service integration
[Diagram] Selection parameters go from the ROOT RDB to the Grid Resource Broker, which spawns PROOF tasks and updates the ROOT RDB.
38
ALICE Data challenges
[Diagram] Simulated data and raw data from the DAQ pass through ROOT I/O into CASTOR at the CERN Tier 0/Tier 1 and, via the GRID, to the regional Tier 1/Tier 2 centres.
39
ALICE Data Challenge III (2001)
  • Need to run yearly DCs of increasing complexity
    and size to reach 1.25 GB/s
  • ADC III showed excellent system stability during 3
    months
  • DATE throughput: 550 MB/s (max), 350 MB/s
    (ALICE-like)
  • DATE+ROOT+CASTOR throughput: 120 MB/s (max), 85
    MB/s on average
  • 2200 runs, 2x10^7 events, 86 hours, 54 TB DATE
    run
  • 500 TB in DAQ, 200 TB in DAQ+ROOT I/O, 110 TB in
    CASTOR
  • 10^5 files > 1 GB in CASTOR and in the MetaData DB
  • HP SMPs: cost-effective alternative to
    inexpensive disk servers
  • Online monitoring tools developed

40
ALICE Data Challenge IV
  • 2002
  • The ALICE DAQ, ALICE Offline, CASTOR, ROOT
    projects
  • and
  • The IT/ADC, IT/CS, IT/DS groups

41
DAQ Functional Goals
  • DAQ
  • The complete data acquisition chain
  • ALICE optical links (DDLs) as data sources
  • Requires installing PCI boards in some PCs
  • Fast DDL/PCI interfaces able to saturate PCI 32
    (PCI RORC)
  • DATE V4
  • New data format
  • LDC software: scattered data transfer
  • GDC software: event builder (flexible policy and
    load balancing)
  • New run control (finite-state machines; see the
    sketch after this list)
  • HLT software interface
  • AFFAIR V1: improved performance-reporting tools
  • More realistic data traffic (different trigger
    types, different readout)
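To make the finite-state-machine bullet concrete, a minimal transition-table sketch; the states and commands are illustrative, not the actual DATE V4 design:

    #include <string>
    // One explicit transition per (state, command) pair; everything else
    // is ignored, which is what makes FSM-based run control predictable.
    enum class RunState { Idle, Configured, Running, Paused };
    RunState OnCommand(RunState s, const std::string& cmd) {
       if (s == RunState::Idle       && cmd == "configure") return RunState::Configured;
       if (s == RunState::Configured && cmd == "start")     return RunState::Running;
       if (s == RunState::Running    && cmd == "pause")     return RunState::Paused;
       if (s == RunState::Paused     && cmd == "resume")    return RunState::Running;
       if (s == RunState::Running    && cmd == "stop")      return RunState::Configured;
       return s;  // illegal transition: state unchanged
    }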

42
Technology Goals
  • CPU servers
  • Dual CPUs (LXSHARE)
  • SMP machines (HP Netservers)
  • Network
  • 100 nodes on Gigabit Ethernet network
  • 100 nodes on Fast Ethernet
  • 10 Gbit Eth. backbone
  • Storage
  • Disk servers
  • IDE-based disk servers
  • SAN and/or NAS-based solution
  • Tapes
  • Test the present generation of tapes
  • Production: new generation of tapes (~30 MB/s,
    200 GB/vol.)

43
ADC IV Hardware Setup
[Diagram] Test-bed layout: TBED and LXSHARE CPU-server groups (nodes 01-112 and disk nodes 01D-36D) attached through 4+3+4 Gigabit switches and 2 Fast Ethernet switches; 4 Gbps backbone; CPU servers on Fast Ethernet; 10 distributed tape servers.
Total: 192 CPU servers (96 on GbE, 96 on FE), 36 disk servers, 10 tape servers
44
Performance: DATE
[Plot] Data generation in the LDC, event building, no data recording.
45
Performance: DATE + CASTOR
[Plot] Data generation in the LDC, event building, data recording to disk.
46
Longest DATE run
  • Recording to /dev/null
  • 26 LDCs, 51 GDCs
  • 0.5 day non-stop
  • 575 MB/s sustained
  • Recording to CASTOR (no objectification)
  • 26 LDCs, 51 GDCs
  • 4.5 days non-stop to disk
  • 140 TB
  • 350 MB/s sustained

47
Still to be done
  • Addition of a few ALICE DDLs (optical links) with
    electronics data generators
  • Data generation in the LDC, event building and data
    recording to tape; the goal is to reach 200 MB/s
    sustained
  • Use of direct memory access instead of a pipe for
    data transfer from DATE to ALIMDC
  • Tests with ALICE simulated data generated with
    AliRoot
  • Test with SAN equipment connected to Gigabit
    Ethernet (see architecture next slide)
  • Fibre Channel Disk arrays from Dot Hill
  • Fibre Channel Tape robot from Qualstar
  • Test with new tape generation from STK
  • (30 MB/s, 200 GB/tape volume)
  • Transfer of a fraction of the data to regional
    centers as a Grid prototype

48
Offline Organisation
49
CORE Off-line staffing
  • ALICE opted for a light core CERN offline team
  • Concentrate on framework, software distribution
    and maintenance; all detector software is written
    outside
  • 17 FTEs are needed; at the moment 2 are missing
    (-10%)
  • But only 3 permanent scientific staff (17%)
  • plus 10-15 people from the collaboration
  • GRID coordination (Torino), ALICE World Computing
    Model (Nantes), Detector database (Warsaw)
  • Efforts are made to alleviate the shortage
  • LCG project should provide support for common
    solutions

50
Software Development Process
  • Users and developers are distributed but contribute
    to the code
  • A development cycle adapted to ALICE has been
    elaborated
  • Developers work on the most important feature at
    any moment
  • A stable production version exists
  • Collective ownership of the code
  • Flexible release cycle
  • Simple packaging and installation
  • Freely inspired by modern Software Engineering
  • Kent Beck, Extreme Programming Explained, Addison
    Wesley
  • Martin Fowler, The New Methodology,
    http://www.martinfowler.com/articles/newMethodology.html
  • Linux, gnu, ROOT, KDE, GSL
  • Design micro-cycles happen continuously
  • 2-3 macro-cycles per year
  • Discussed and implemented at Off-line meetings and
    Code Reviews
  • Corresponding to major code releases

51
Planning
  • We have high-level milestones for technology and
    physics
  • Data challenges test the basic technology and the
    functional integration of DAQ and Off-line
  • Physics challenges test the functionality of
    Off-line from the user viewpoint. The first
    physics challenge is providing the results for
    the Physics Performance Report
  • AliRoot is now in its fourth year of life
  • We have a clear view of the high-level
    requirements for the current system. These are
    expressed in the Computing Technical Report (in
    preparation, 3Q2002)
  • A fine-grained plan has been defined

52
Summary
  • The ALICE Off-line model has a good record of
    achievements
  • Users are involved and supportive in
    collaborative software design
  • Technical Design Reports, PPR and Data Challenges
    done with AliRoot
  • Large code redesigns were made without disruption
    of work
  • Large distributed productions have been made with
    AliEn
  • AliRoot has the potential to follow the
    technology evolution to meet the needs of ALICE
    at LHC
  • The ALICE Software Process is both lightweight
    and effective
  • Support for basic tools (ROOT, FLUKA, CASTOR) is
    needed
  • Current planning estimates for ALICE Offline are
    based on it
  • ALICE has high hopes for the common projects to be
    defined in the LCG
  • ALICE choices are now considered for common
    projects