Title: ALICE Physics Analysis Using the GRID
1. ALICE Physics Analysis Using the GRID
- I. Belikov, for the ALICE Collaboration
- ICHEP06
- July 26-August 2, 2006
- Moscow, Russia
2. ALICE Collaboration
- 1/2 of ATLAS or CMS, 2x LHCb
- 1000 people, 30 countries, 80 Institutes
- Total weight: 10,000 t
- Overall diameter: 16 m
- Overall length: 25 m
- Magnetic field: 0.5 T
- 8 kHz (160 GB/s): level 0, 1 - special hardware
- 200 Hz (4 GB/s): level 2 - embedded processors
- 30 Hz (2.5 GB/s): level 3 (HLT) - PCs
- 30 Hz (1.25 GB/s): data recording and offline analysis
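Dividing each quoted bandwidth by the event rate on the same line gives the implied average data volume per event at that stage. The following is a minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and assuming that each bandwidth belongs to the rate it is quoted with; the function name is introduced here for illustration only.

```python
# Rates (Hz) and bandwidths (GB/s) as quoted on the slide, in slide order.
# Assumption: each bandwidth is paired with the rate quoted next to it.
STAGES = [
    (8000.0, 160.0),
    (200.0, 4.0),
    (30.0, 2.5),
    (30.0, 1.25),
]


def mb_per_event(rate_hz, bandwidth_gb_s):
    """Average data volume per event in MB (1 GB taken as 1000 MB)."""
    return bandwidth_gb_s * 1000.0 / rate_hz


if __name__ == "__main__":
    for rate, bandwidth in STAGES:
        size = mb_per_event(rate, bandwidth)
        print(f"{rate:7.0f} Hz at {bandwidth:6.2f} GB/s: {size:6.1f} MB per event")
```

Under these assumptions the first two stages both correspond to 20 MB per event, so the reduction from 160 GB/s to 4 GB/s between them comes from event selection rather than from a smaller event size.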
3. CERN computing power
- High throughput computing based on reliable commercial components
- More than 1500 double CPU PCs
  - 5000 in 2007
- More than 3 PB of data on disks and tapes
  - > 15 PB in 2007
- Far from being enough!
4. ALICE computing model
- Three kinds of data analysis
  - Fast pilot analysis of the data just collected, to tune the first reconstruction, at the CERN Analysis Facility (CAF)
  - Scheduled batch analysis using the GRID (Event Summary Data, ESD, and Analysis Object Data, AOD)
  - End-user interactive analysis using PROOF and the GRID (AOD and ESD)
- CERN
  - Does first-pass reconstruction
  - Stores one copy of RAW, calibration data and first-pass ESDs
- T1
  - Does reconstructions and scheduled batch analysis
  - Stores second collective copy of RAW, one copy of all data to be kept, disk replicas of ESDs and AODs
- T2
  - Does simulation and end-user interactive analysis
  - Stores disk replicas of AODs and ESDs
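The division of work between CERN, T1 and T2 can be written down as a small lookup table. This is only an illustrative sketch of the roles listed above: the `Tier` class, the `COMPUTING_MODEL` tuple and the `tiers_for` helper are hypothetical names, not part of any ALICE software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One level of the computing model: what it does and what it stores."""
    name: str
    tasks: tuple
    stores: tuple


# Roles copied from the slide text; the data structure itself is an assumption.
COMPUTING_MODEL = (
    Tier("CERN",
         ("first-pass reconstruction",),
         ("one copy of RAW", "calibration data", "first-pass ESDs")),
    Tier("T1",
         ("reconstruction", "scheduled batch analysis"),
         ("second collective copy of RAW",
          "one copy of all data to be kept",
          "disk replicas of ESDs and AODs")),
    Tier("T2",
         ("simulation", "end-user interactive analysis"),
         ("disk replicas of AODs and ESDs",)),
)


def tiers_for(task_keyword):
    """Return the names of the tiers whose task list mentions the keyword."""
    return [tier.name for tier in COMPUTING_MODEL
            if any(task_keyword in task for task in tier.tasks)]


if __name__ == "__main__":
    for keyword in ("reconstruction", "analysis", "simulation"):
        print(keyword, "->", tiers_for(keyword))
```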
5. AliRoot framework
[Diagram: the AliRoot framework, built on ROOT. Labels shown: AliRoot, STEER, AliSimulation, AliReconstruction, AliAnalysis, ESD, Virtual MC, G3, G4, FLUKA, EVGEN, HIJING, ISAJET, MEVSIM, PYTHIA6, PDF, HBTP, HBTAN, JETAN, RALICE, STRUCT, ITS, TPC, TRD, TOF, PHOS, EMCAL, RICH, PMD, ZDC, FMD, MUON, CRT, START, AliEn + LCG.]
6. ALICE Environment (AliEn), since 2001
- Based on Open Source components (95% imported code)
- Offers ALICE users a single interface into the heterogeneous and fast-evolving GRID reality
- More than 130 registered AliEn users
- Whenever possible, uses common services
  - LCG/gLite CE and RB, gLite FTS for scheduled file transfers
  - ALICE is taking an active part in the definition and testing of these components
- The services provided by AliEn are
  - ALICE job database and related distributed tools and services
  - ALICE file catalogue and related distributed tools and services
  - ALICE-specific job reporting services
  - Their (high-level) functionality is ALICE-specific and not found elsewhere
7. Batch Analysis (1)
- The jobs are described by AliEn JDL files
  - Executable = startana
  - Packages = ROOT::5.11.02, ...
  - Split = se
  - InputFile = LF:/alice/.../MyBatchAnalysis.C
  - InputData = LF:/alice/.../AliESDs.root,nodownload
  - OutputFile = esdAna.root@Alice::CERN::se01,noarchive
- Submitted to the AliEn Task Queue (TQ) from the AliEn command line
  - submit <jobname>.jdl
- Scheduled, optimized, split (based on the InputData)
- Can be monitored and re-prioritised
  - ps trace
- The results are registered in the AliEn distributed file catalogue
- The job runs on many machines in parallel, as close to the InputData as possible
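A job description of this kind is a short list of key/value fields, so it can be generated from a dictionary. The sketch below is hypothetical: the field names and values come from the slide, but the punctuation (equals signs, quotes, braces, semicolons, and the `LF:`, `::` and `@` separators) is an assumption, because it is lost in the slide text. The `...` in the paths marks parts elided on the slide and is not a real path, and `render_jdl` is a helper invented here, not an AliEn tool.

```python
def render_jdl(fields):
    """Render a dict as 'Key = "value";' lines (list values go in braces).

    The exact JDL punctuation is an assumption, not taken from the source.
    """
    lines = []
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            body = "{" + ", ".join(f'"{item}"' for item in value) + "}"
        else:
            body = f'"{value}"'
        lines.append(f"{key} = {body};")
    return "\n".join(lines)


# Field values as shown on the slide; "..." marks text elided on the slide.
# The slide's Packages list continues after ROOT; the rest is not shown there.
JOB = {
    "Executable": "startana",
    "Packages": ["ROOT::5.11.02"],
    "Split": "se",
    "InputFile": ["LF:/alice/.../MyBatchAnalysis.C"],
    "InputData": ["LF:/alice/.../AliESDs.root,nodownload"],
    "OutputFile": ["esdAna.root@Alice::CERN::se01,noarchive"],
}

if __name__ == "__main__":
    print(render_jdl(JOB))
```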
8. Batch analysis (2)
[Diagram: batch analysis workflow. A File Catalogue query defines the data set (ESDs, AODs) for a user job (many events). The Job Optimizer groups the input files by SE location and splits the job into sub-jobs 1, 2, ..., n. The Job Broker submits each sub-job to the CE with the closest SE. Each CE and SE pair does the processing and produces an output file (2, ..., n). A file merging job combines the output files into the job output.]
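The split, process and merge pattern in this workflow can be sketched in a few lines. This is a toy illustration under stated assumptions, not the AliEn Job Optimizer: the file names, the SE names and the functions `split_by_se`, `run_subjob` and `merge` are all hypothetical, and the "analysis" simply counts files.

```python
from collections import defaultdict


def split_by_se(file_locations):
    """Group input files by the storage element (SE) holding them.

    Each group plays the role of one sub-job's input.
    """
    groups = defaultdict(list)
    for lfn, se in file_locations:
        groups[se].append(lfn)
    return dict(groups)


def run_subjob(files):
    """Stand-in for the user's analysis: just count the files processed."""
    return {"files_processed": len(files)}


def merge(outputs):
    """Stand-in for the file merging job: sum the per-sub-job results."""
    return {"files_processed": sum(o["files_processed"] for o in outputs)}


# Hypothetical data set: (logical file name, SE where a copy is stored).
DATASET = [
    ("file_001", "SE_A"),
    ("file_002", "SE_B"),
    ("file_003", "SE_A"),
]

if __name__ == "__main__":
    subjobs = split_by_se(DATASET)
    outputs = [run_subjob(files) for files in subjobs.values()]
    print(subjobs)
    print(merge(outputs))
```

Grouping by SE before submitting is what allows each sub-job to run next to its data, so only the small output files have to move for the merge.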
9. Interactive Analysis (1)
- A user starts a ROOT session on a laptop
- The analysis macros are started from the ROOT command line
- The data files on the GRID are accessed using the ROOT (AliEn) UI (via xrootd)
- The results are stored locally or can be registered on the GRID (AliEn file catalogue)
- If the data files are stored on a cluster, the interactive analysis is done in parallel using PROOF
10. Interactive Analysis (2): File Access from ROOT
- All files are accessible via LFNs!
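Access by logical file name (LFN) means the user names a file once and a catalogue maps that name to a physical copy. The sketch below is a toy catalogue that illustrates only this idea. It does not reproduce the AliEn file catalogue or any ROOT or xrootd API, and every name, host and URL in it is hypothetical.

```python
# Hypothetical catalogue: LFN -> list of (storage element, physical file name).
CATALOGUE = {
    "/toy/sim/run1/events.root": [
        ("SE_A", "root://host-a.example//data/0001/events.root"),
        ("SE_B", "root://host-b.example//pool/77/events.root"),
    ],
}


def resolve(lfn, preferred_se, catalogue=CATALOGUE):
    """Return a physical file name for the LFN.

    Prefers the replica on preferred_se and otherwise falls back to the
    first replica listed. Raises KeyError for an unknown LFN.
    """
    replicas = catalogue[lfn]
    for se, pfn in replicas:
        if se == preferred_se:
            return pfn
    return replicas[0][1]


if __name__ == "__main__":
    print(resolve("/toy/sim/run1/events.root", "SE_B"))
    print(resolve("/toy/sim/run1/events.root", "SE_elsewhere"))
```

The user code only ever mentions the LFN, so replicas can be added or moved without changing the analysis macro.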
11. Validation of the Computing Model in ALICE DC
- Physics Data Challenge (PDC06): running since 25 April 2006
- 29 sites participating (6 T1s, 23 T2s)
- More than 100K jobs done; 500K pp and 90K PbPb events; 40 TB of data stored at CASTOR@CERN
12. Example of analysis
- PDC06 is the last opportunity to exercise the simulation and reconstruction. And the analysis!
13. Conclusions
- Parallelism provided by the GRID offers a new opportunity for the analysis of extremely large sets of data
- ALICE accesses the GRID using its own environment, AliEn
- AliEn, together with ROOT/PROOF, is a solid foundation on which to build the final system
- We've been permanently testing our GRID infrastructure in ALICE Data Challenges
- Wish us good luck!