Title: CMS Computing: Results and Prospects
1 CMS Computing: results and prospects
- Outline
- Schedule
- Pre Data Challenge 04 Production
- Data Challenge 04
- Design and goals
- SW and MW components
- Results
- Lessons learned
- Prospects and upcoming activities
- Conclusions
Note: little on the pre-Challenge production (PCP), but an update of what was presented in September at Lecce
2 CMS Computing schedule
- 2004
  - Mar/Apr: DC04, to study T0 reconstruction, data distribution and real-time analysis at 25% of the startup scale
  - May/Jul: data available and usable by the PRS groups
  - Sep: PRS analysis feedback
  - Sep: draft CMS Computing Model in CHEP papers
  - Nov: ARDA prototypes
  - Nov: milestone on interoperability
  - Dec: Computing TDR in initial draft form (NEW milestone date)
- 2005
  - July: LCG TDR and CMS Computing TDR (NEW milestone date)
  - Post July?: DC05, at 50% of the startup scale (NEW milestone date)
  - Dec: Physics TDR, based on post-DC04 activities
- 2006
  - DC06: final readiness tests
  - Fall: computing systems in place for LHC startup
- Continuous testing and preparations for data
3 CMS permanent production
T. Wildish
The system is evolving into a permanent production effort.
Strong contribution of INFN and the CNAF Tier-1 to CMS past/future productions: 252 assignments (assids) in PCP-DC04, covering all production steps, both local and (when possible) Grid.
4 PCP @ INFN statistics (4/4)
CMS production steps: Generation, Simulation, ooHitformatting, Digitisation (the digitisation continued through the DC!)
[Plots: 2×10^33 digitisation step, all CMS and INFN only; 24 Mevents in 6 weeks, Feb 04 - May 04; DC04 period marked]
Note the strong contribution to all steps by the CNAF T1, but only outside DC04 (during the DC it was too hard for the CNAF T1 to also act as an RC!)
43 Mevts in CMS; 7.8 Mevts (18%) done by INFN
D. Bonacorsi
5 PCP grid-based prototypes
Constant integration work in CMS between:
- the CMS software and production tools
- the evolving EDG-X → LCG-Y middleware, in several phases:
  - CMS Stress Test, stressing EDG < 1.4, then
  - PCP on the CMS/LCG-0 testbed
  - PCP on LCG-1, towards DC04 with LCG-2
EU-CMS: submission to the LCG scheduler, a CMS-LCG virtual Regional Center:
- 0.5 Mevts of heavy-pythia Generation (2000 jobs, 8 hours each, 10 KSI2000 months)
- 2.1 Mevts of CMSIM+OSCAR Simulation (8500 jobs of 10 hours each on a PIII 1 GHz, 130 KSI2000 months), 2 TB of data
  - OSCAR: 0.6 Mevts on LCG-1
  - CMSIM: 1.5 Mevts on CMS/LCG-0
D. Bonacorsi
6 Goals of Data Challenge 04
- Aim of DC04:
  - reach a sustained 25 Hz reconstruction rate in the Tier-0 farm (25% of the target conditions for LHC startup)
  - register data and metadata in a catalogue
  - transfer the reconstructed data to all Tier-1 centers
  - analyze the reconstructed data at the Tier-1s as they arrive
  - publicize to the community the data produced at the Tier-1s
  - monitor and archive performance criteria of the ensemble of activities, for debugging and post-mortem analysis
- Not a CPU challenge, but a full-chain demonstration!
- Pre-challenge production in 2003/04:
  - 70M Monte Carlo events produced (30M with Geant4)
  - Classic and grid (CMS/LCG-0, LCG-1, Grid3) productions
It was a challenge, and every time a scalability limit of a component was found, that was a success!
7 Data Challenge 04 layout
By C. Grandi
[Layout diagram: several INFN sites; the only Tier-2 in DC04 is LNL]
The full chain (except the Tier-0 reconstruction) was done in LCG-2, but only for INFN and PIC. Not without pain.
8 Data Challenge 04 numbers
- Pre-Challenge Production (PCP04), Jul 03 - Feb 04:
  - Simulated events: 75 M (of which 30 M with Geant4); 750k jobs, 800k files, 5000 KSI2000 months, 100 TB of data
  - Digitised (raw) events: 35 M; 35k jobs, 105k files
  - Where: INFN, USA, CERN, ...
  - In Italy: 10-15 M events (20%)
  - For what (Physics and Reconstruction Software groups): Muons, b-tau, e-gamma, Higgs
- Data Challenge 04, Mar 04 - Apr 04:
  - Events reconstructed (DST) at the CERN Tier-0: 25 M; 25k jobs, 400k files, 150 KSI2000 months, 6 TB of data
  - Events distributed to the Tier1-CNAF and Tier2-LNL: the same 25 M events and files
  - Events analyzed at the Tier1-CNAF and Tier2-LNL: > 10 M; 15k jobs, each of 30 min CPU
- Post Data Challenge 04, May 04 onwards:
  - Events to reprocess (DST): 25 M
  - Events to analyze in Italy: 50% of 75 M events
  - Events to produce and distribute: 50 M
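The averages implied by these numbers are easy to cross-check; a back-of-envelope sketch in Python (all inputs taken from the bullets above):

    # Back-of-envelope cross-check of the PCP04 and DC04 numbers quoted above.
    pcp_events, pcp_jobs, pcp_files, pcp_data_tb = 75e6, 750e3, 800e3, 100.0
    dc_events, dc_jobs, dc_files, dc_data_tb = 25e6, 25e3, 400e3, 6.0

    print(pcp_events / pcp_jobs)          # ~100 simulated events per PCP job
    print(pcp_data_tb * 1e6 / pcp_files)  # ~125 MB per PCP file
    print(dc_events / dc_jobs)            # ~1000 reconstructed events per T0 job
    print(dc_data_tb * 1e6 / dc_files)    # ~15 MB per DST file: small files,
                                          # which matters for transfer overheads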
9 Data Challenge 04: MW and SW components
- CMS specific:
  - Transfer Agents, to transfer the DST files (at CERN and the Tier-1s)
  - Mass Storage Systems on tape (Castor, Enstore, etc.) (at CERN and the Tier-1s)
  - RefDB, the database of dataset requests and assignments (at CERN)
  - Cobra, the CMS software framework (CMS wide)
  - ORCA and OSCAR (Geant4), the CMS reconstruction and simulation (CMS wide)
  - McRunJob, the job-preparation system (CMS wide)
  - BOSS, the job-tracking system (CMS wide)
  - SRB, a file replication and catalogue system (at CERN, RAL, Lyon and FZK)
  - MySQL-POOL, the POOL backend on a MySQL database (at FNAL)
  - ORACLE database (at CERN and the Tier1-INFN)
- LCG common:
  - User Interfaces, including the Replica Manager (at CNAF, Padova, LNL, Bari, PIC)
  - Storage Elements (at CNAF, LNL, PIC)
  - Computing Elements (at CNAF, LNL and PIC)
  - Replica Location Service (at CERN and the Tier1-CNAF)
  - Resource Broker (at CERN and at CNAF-Tier1-Grid-it)
  - Storage Replica Manager (at CERN and FNAL)
  - Berkeley Database Information Index (at CERN)
  - Virtual Organization Management System (at CERN)
  - GridICE, the monitoring system (on the CEs, SEs, WNs, ...)
  - POOL, the persistency catalogue (in the CERN RLS)
- US specific:
  - Monte Carlo distributed production system (MOP) (at FNAL, Wisconsin, Florida, ...)
  - MonaLisa, a monitoring system (CMS wide)
  - Custom McRunJob, a job-preparation system (at FNAL and perhaps Florida)
10 Data Challenge 04 processing rate
- Processed about 30M events
  - But DST errors make this pass not useful for analysis
- Generally kept up at the T1s at CNAF, FNAL and PIC
- Got above 25 Hz on many short occasions
  - But only one full day above 25 Hz with the full system
- Working now to document the many different problems
11 Data Challenge 04: data transfer from CERN to INFN
- A total of >500k files and 6 TB of data transferred CERN T0 → CNAF T1
- Max number of files per day: 45,000, on March 31st
- Max size per day: 400 GB, on March 13th (>700 GB considering the Zips)
GARR network use: 340 Mbps (>42 MB/s) sustained for 5 hours (the maximum was 383.8 Mbps)
D. Bonacorsi
12 DC04 real-time (fake) analysis
- CMS software installation:
  - The CMS Software Manager (M. Corvo) installs the software via a grid job provided by LCG
  - RPM distribution based on CMSI, or DAR distribution
  - Used at CNAF, PIC, Legnaro, Ciemat and Taiwan with RPMs
  - Site manager installs the RPMs via LCFGng
    - Used at Imperial College
  - Still inadequate for general CMS users
- Real-time analysis at the Tier-1s:
  - The main difficulty is identifying complete file sets (i.e. runs)
  - The information is today in the TMDB or via findColls
  - A job processes a single run, at the site close to the data files
  - File access via rfio
  - Output data registered in the RLS
A. Fanfani C. Grandi
13 DC04 fake analysis architecture
[Diagram: Data Transfer → Drop Files → Drop agent → Fake Analysis agent → LCG Resource Broker → LCG Worker Node]
- The Drop agent triggers job preparation/submission when all files are available
- The Fake Analysis agent prepares the XML catalogue, the orcarc and the JDL script, and submits the job
- Jobs record their start/end timestamps in a MySQL DB
J. Hernandez
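A minimal sketch of this agent pattern (Python; the database schema, table names and the prepare_and_submit.sh helper are hypothetical illustrations, only the drop → prepare → submit → timestamp flow follows the slide):

    import subprocess
    import time

    import MySQLdb  # the agents on this slide coordinate via a local MySQL DB

    db = MySQLdb.connect(host="localhost", db="dc04_agents")  # hypothetical DB

    def complete_drops(cur):
        """Drops for which every expected file has arrived (hypothetical schema)."""
        cur.execute("SELECT drop_id FROM drops "
                    "WHERE status = 'waiting' AND files_arrived = files_expected")
        return [row[0] for row in cur.fetchall()]

    # Drop agent main loop: poll for complete drops and hand each to the Fake
    # Analysis step, which builds the XML catalogue, orcarc and JDL and submits.
    while True:
        cur = db.cursor()
        for drop_id in complete_drops(cur):
            subprocess.run(["./prepare_and_submit.sh", str(drop_id)])  # hypothetical
            cur.execute("UPDATE drops SET status = 'submitted', t_submit = NOW() "
                        "WHERE drop_id = %s", (drop_id,))
        db.commit()
        cur.close()
        time.sleep(60)  # jobs themselves record start/end timestamps in the DB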
14 Real-time DC04 analysis: turn-around time from the T0
- The minimum time from T0 reconstruction to T1 analysis was 10 minutes
- Different problems contributed to the spread in times:
  - the dataset-oriented analysis made the results dependent on which datasets were sent in real time from CERN
  - tuning of the Tier-1 Replica Agent
  - Replica Agent operation affected by a CASTOR problem
  - the Analysis Agents were not always up, due to debugging
  - for 1 dataset the zipped metadata were late with respect to the data
  - a few problems with submission
(Preliminary)
N. De Filippis, A. Fanfani, F. Fanzago
15 DC04 real-time analysis
- Maximum rate of analysis jobs: 194 jobs/hour
- Maximum rate of analysed events: 26 Hz
- Total of ~15,000 analysis jobs via Grid tools in 2 weeks (95-99% efficiency)
- Dataset examples:
  - B0s → J/ψ φ
    - Bkg: mu03_tt2mu, mu03_DY2mu
  - ttH, with H → bbbar, t → Wb (W → lν), tbar → Wb (W → hadrons)
    - Bkg: bt03_ttbb_tth, bt03_qcd170_tth, mu03_W1mu
  - H → WW → 2μ2ν
    - Bkg: mu03_tt2mu, mu03_DY2mu
N. De Filippis, A. Fanfani, F. Fanzago
16 Reconstruction software and DST
Without the PRS activity (b-tau, muon, e-gamma) on the reconstruction software there would be no analysis, nor a Data Challenge (04). INFN is the major contributor: Ba, Bo, Fi, Pi, Pd, Pg, Rm1, To.
- Last CMS week: today a prototype DST is in place
  - Huge effort by a large number of people, especially S. Wynhoff, N. Neumeister, T. Todorov, V. Innocente for the base. Also from Emilio Meschi, David Futyan, George Daskalakis, Pascal Vanlaer, Stefano Lacaprara, Christian Weiser, Arno Heister, Wolfgang Adam, Marcin Konecki, Andre Holzner, Olivier van der Aa, Christophe Delaere, Paolo Meridiani, Nicola Amapane, Susanna Cucciarelli, Haifeng Pi
- The DST constitutes the first CMS summary
  - Examples of doing physics with it are in place, but it is not complete
P. Sphicas
17 PRS analysis contributions
- ttH, H → bb and related backgrounds
  - S. Cucciarelli, F. Ambroglini, C. Weiser, S. Kappler, A. Bocci, R. Ranieri, A. Heister ...
- Bs → J/ψ φ and related backgrounds
  - V. Ciulli, N. Magini, Dubna group...
- A/H(SUSY) → ττ, the established channel for the SUSY H; HLT
  - People/channels: A/H → 2τ → τ-jet τ-jet: S. Gennai, S. Lehti, L. Wendland
- Reconstruction: full track reconstruction starting from the raw data, several algorithms already implemented
  - Studies of RecHits, sensor positions, B field, material distribution
  - W. Adam, M. Konecki, S. Cucciarelli, A. Frey, T. Todorov
- H → γγ
  - People: G. Anagnostou, G. Daskalakis, A. Kyriakis, K. Lassila, N. Marinelli, J. Nysten, K. Armour, S. Bhattacharya, J. Branson, J. Letts, T. Lee, V. Litvin, H. Newman, S. Shevchenko
- H → ZZ(*) → 4e
  - People: David Futyan, Paolo Meridiani, Kate Mackay, Emilio Meschi, Ivica Puljak, Claude Charlot, Nikola Godinovic, Federico Ferri, Stephane Bimbot
- H → WW → 2μ2ν
  - Zanetti, Lacaprara
And many others!!!!
- Calibrations and alignments
- Higgs studies
18 Data Challenge 04: lessons learned (1/2)
- Many of the components used do not scale (both CMS and non-CMS ones):
  - RLS
  - Castor
  - dCache
  - Metadata
  - SRB
  - Catalogues of various types and kinds
  - The job submission system at the Tier-0
  - Etc.
- Many functions/components were missing:
  - Data Transfer Management
  - Global data location for (at least) all the Tier-1s
  - Nothing wrong with that: it was a challenge, made precisely for this!
- But the real lesson was (surprise?) that:
  - There was (and is) NO organization, neither for LCG nor for CMS nor for Grid3
  - There was (and is) NO consistent design, neither of the Data Model nor of the Computing Model
  - Except, partially, in Italy and in the USA!
19 Data Challenge 04: lessons learned (2/2)
Indeed, for example...
D. Bonacorsi
20 INFN prospects
- Short term:
  - Re-reconstruct the DSTs with a version of ORCA (the CMS sw)
    - validated by the analyses while the production takes place
    - wherever possible (Tier-0, Tier-1s and Tier-2s)
  - Distribute the DSTs, the other data formats (Digi, SimHits) and the metadata
    - to the Tier-1s and, from there, to the Tier-2s
  - Enable locally distributed analysis
    - in a way that is consistent for data access (few tools allow it)
- Medium term:
  - Build a Data Model
  - Build a Computing Model
  - Build a consistent, distributed architecture
  - Build controlled (and semi-transparent) access to the data
  - Using the components that exist and that have a prospect of scalability (to be measured again, in an organic way)
21 Post Data Challenge 04 activities
- June 04 - July 04:
  - Re-creation of the DSTs
  - Distribution of the files (data and metadata) needed for the analyses
  - First results for the PRS groups and for the Physics TDR
- July 04 - July 05:
  - Production of new (or old) datasets (including DSTs)
  - Target: a steady 10 M events/month, for the Physics TDR
  - Continuous analysis of the produced data
- Sep 04 - Oct 04:
  - Data Challenge 04 results for CHEP04
  - First definition of the Data and Computing Model
  - Definition of the MoUs
- Jul 05 onwards:
  - CMS Computing TDR (and LCG TDR)
  - Data Challenge 05, to verify the Computing Model
- Resources needed (2005):
  - Storage for analysis and production at the Tier-1, Tier-2s and Tier-3s
  - CPUs for production and analysis at the Tier-1 and Tier-2s
22 Possible evolution of CCS tasks (Core Computing and Software)
- CCS will reorganize to match the new requirements and the move from R&D to implementation for physics:
  - Meet the PRS production requirements (Physics TDR analysis)
  - Build the Data Management and Distributed Analysis infrastructures
- Production Operations group (NEW)
  - Outside of CERN. Must find ways to reduce the manpower requirements.
  - Using predominantly (only?) Grid resources.
- Data Management task (NEW)
  - Project to respond to the DM RTAG
  - Physicists/computing to define the CMS blueprint and the relationships with suppliers (LCG/EGEE); the CMS DM task sits in the Computing group
  - Expect to make major use of manpower and experience from CDF/D0 Run II
- Workload Management task (NEW)
  - Make the Grid usable to CMS users
  - Make major use of manpower with EDG/LCG/EGEE experience
- Distributed Analysis cross project (DAPROM) (NEW)
  - Coordinate and harmonize analysis activities between CCS and PRS
  - Work closely with the Data and Workload Management tasks
- Establish a high-level Physics/Computing panel between the T1 countries to ensure collaboration ownership of the Computing Model for the MoU and RRB discussions
23 Conclusions
- The CMS Data Challenge 04 was a success:
  - Many functionalities were measured in a scientific way
  - Many failures and bottlenecks were discovered (but 25 Hz was reached!)
  - Many things were understood (??)
  - The Italian (INFN) contribution was decisive
- The CMS Data Challenge 04 was not a success:
  - It was not planned sufficiently
  - It required the continuous (two months) presence and intervention of willing people (20 hours per day, week-ends included) for on-the-fly solutions: ~30 people, world-wide
  - There is still NO objective evaluation of the results
  - Everything that worked (for better or for worse) gets criticized a priori, without realistic alternative proposals
- Nevertheless CMS, having got over the stress of DC04, is recovering
The CMS system is evolving into a permanent
Production and Analysis effort
24 Specific 2004 milestones (1/2)
- Participation of at least three sites in DC04 (March):
  - Import into Italy (Tier1-CNAF) all the events reconstructed at the T0
  - Distribute the selected streams over at least three sites (~6 streams, 20 M events, 5 TB of AOD)
  - The selection covers the analysis of at least 4 signal channels and their backgrounds, plus the calibration studies
  - Deliverable: the Italian contribution to the DC04 report, as input to the C-TDR and to the preparation of the P-TDR; results of the analysis of the channels assigned to Italy (at least 3 streams and 4 signal channels)
- Integration of the CMS Italy computing system into LCG (June):
  - The Tier-1, half of the Tier-2s (LNL, Ba, Bo, Pd, Pi, Rm1) and a third of the Tier-3s (Ct, Fi, Mi, Na, Pg, To) have the LCG software installed and are able to work in the LCG environment
  - This involves installing the software packages coming from LCG AA and LCG GDA (from POOL to RLS, etc.)
  - Completion of the analyses using the LCG infrastructure, and further productions of about 2 M events
  - Deliverable: CMS Italy is integrated into LCG for more than half of its resources
Status: the end of DC04 slipped to April. Sites: Ba, Bo, Fi, LNL, Pd, Pi, CNAF-Tier1; 2 streams, but 4 analysis channels. DONE, 90%
Sites integrated into LCG: CNAF-Tier1, LNL, Ba, Pd, Bo, Pi. The prolonged analysis of the DC04 results delays this by at least 3 months. In progress, 30%
25 Specific 2004 milestones (2/2)
- Participation in the C-TDR (October):
  - Includes the definition of the Italian participation in the C-TDR in terms of:
    - resources and sites (possibly all of them)
    - manpower
    - funding and an intervention plan
  - Deliverable: drafts of the C-TDR with the Italian contribution
- Participation in the PCP of DC05 by at least the Tier-1 and the Tier-2s (December):
  - The Tier-1 is CNAF; the Tier-2s are LNL, Ba, Bo, Pd, Pi, Rm1
  - Production of 20 M events for the P-TDR studies, or equivalent (the studies may require fast-MC or special programs)
  - Contribution to the definition of the LCG TDR
  - Deliverable: production of the events needed to validate the fast-simulation tools and for the P-TDR studies (20 M events on the Tier-1 and the Tier-2/3s)
The Computing TDR is now due in July 2005, and the milestone slips accordingly. Stand-by/in progress, 10%
Data Challenge 05 slips to July 2005, and the milestone slips accordingly. Stand-by, 0%
26 Back-up slides
27 CMS Computing Model
- Computing Model design
- Data location and access Model
- Analysis (user) Model
- CMS Software and Tools
- Infrastructure Organization (Tiers and LCG)
28 (No transcript)
29 CPU power ramp-up
[Plot: CPU capacity vs time, with an average slope of ×2.5/year; milestones marked from the DAQ TDR through DC04 (C-TDR), DC05 (P-TDR, LCG TDR) and DC06 (readiness) up to LHC at 2×10^33 and 10^34; actual PCP and DC04 levels shown; time-shared resources distinguished from dedicated CMS resources]
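In formula form, the quoted average slope corresponds to an exponential capacity ramp (a minimal sketch; $t_0$ is an arbitrary reference year and $C_0$ the capacity installed then):

    C(t) = C_0 \cdot 2.5^{\,t - t_0}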
30 NO HEAVY IONS INCLUDED YET!
Estimates prepared as input to the MoU Task Force; the computing models are under active development
31 Tier-1 centers are crucial to CMS
- CMS expects to have (external) T1 centers at:
  - CNAF, FNAL, Lyon, Karlsruhe, PIC, RAL
  - plus a Tier-1 center at CERN (the role of the CERN T1 is still under discussion)
- The current Computing Model gives the total external T1 requirements
  - Assumed to be spread over 6 centers, but not necessarily 6 equal centers
- Tier-1 centers will be crucial for:
  - Calibration, reprocessing, data serving
  - Servicing the requirements of the Tier-2 centers
    - both from their region and via explicit relationships with external T2 centers
  - Servicing the analysis requirements of their regions
- The next step is to iterate with the T1 centers and the CMS country managements, to understand what they can realistically hope to propose and possibly succeed in obtaining
32 Possible sizing of regional T1s
- Assume 1 T1 at CERN plus the sum of 6 external T1s
- Take the truncated sum of the collaboration in the T1 countries and calculate the fractions in those countries
- Share the 6+1 T1s according to this algorithm to get an opening scenario for discussions (a sketch of the arithmetic follows the list):
  - CERN: 1 T1 for CMS (by definition)
  - France: 0.5 T1 for CMS
  - Germany: 0.4 T1
  - Italy: 1.7 T1
  - Spain: 0.2 T1
  - UK: 0.4 T1
  - USA: 2.6 T1
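A sketch of the sharing arithmetic (Python; the collaboration fractions are illustrative placeholders chosen to reproduce the shares above, not official CMS author counts):

    # Share ~6 external T1s among the T1 countries in proportion to their
    # (truncated) share of the collaboration; CERN hosts one full T1 by definition.
    fractions = {"France": 0.09, "Germany": 0.07, "Italy": 0.29,
                 "Spain": 0.03, "UK": 0.07, "USA": 0.43}   # placeholder values
    external_t1s = 6

    shares = {country: round(external_t1s * f, 1) for country, f in fractions.items()}
    print(shares)  # {'France': 0.5, 'Germany': 0.4, 'Italy': 1.7,
                   #  'Spain': 0.2, 'UK': 0.4, 'USA': 2.6}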
33 Tier-2
- Ask now for intentions from all CMS agencies
  - I have an old list; I request that you contact me with your intentions so I can bring it up to date.
- T1 countries are making a very heavy commitment
  - They may need to demonstrate sharing of costs with the dependent T2s
- T2s need to start defining with which T1 they will enter into service agreements, and negotiating with them how the costs will be distributed.
34 RLS performance
[Plot, April 2nd, 18:00: time to register the output of a single job (16 files), left axis; load on the client machine at registration time, right axis. Reference rates: 0.16 files/s ↔ 10 Hz, 0.4 files/s ↔ 25 Hz]
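The two reference rates follow from the DC04 file granularity (25 M events in ~400k DST files, i.e. ~62 events per file, slide 8); a quick check in Python:

    events_per_file = 25e6 / 400e3          # ~62.5 reconstructed events per DST file
    for event_rate_hz in (10, 25):
        print(event_rate_hz / events_per_file)  # required registration rate:
                                                # 0.16 and 0.4 files/s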
35 RLS issues
- Total number of files registered in the RLS during DC04:
  - ~570K LFNs, each with ~5-10 PFNs and 9 metadata attributes
- Inserting information into the RLS:
  - Inserting PFNs (file catalogue) was fast enough when using the appropriate tools, produced in the course of the challenge:
    - LRC C++ API programs (~0.1-0.2 s/file); POOL CLI with GUID (seconds/file)
  - Inserting files together with their attributes (file and metadata catalogue) was slow
  - We more or less survived; higher data rates would be troublesome
- Sometimes the load on the RLS increases and requires intervention on the server (e.g. log partition full, switch of the server node, un-optimized queries) → able to keep up in optimal conditions, so-so otherwise
[Plot: time to register the output of a Tier-0 job (16 files)]
36 PCP set-up: a hybrid model
by C. Grandi
[Diagram: a physics group asks for a new dataset; the Production Manager defines assignments in RefDB; a Site Manager starts an assignment; McRunJob with the CMSProd plug-in generates shell scripts for the Local Batch Manager; data-level queries go to RefDB, job-level queries to the BOSS DB]
37 PCP @ INFN statistics (1/4)
CMS production steps: Generation, Simulation, ooHitformatting, Digitisation
[Plots: Generation step, all CMS and INFN only, Jun - mid-Aug 03; LNL contributes strongly to this slope]
79 Mevts in CMS; 9.9 Mevts (13%) done by INFN (strong contribution by LNL)
38 PCP @ INFN statistics (2/4)
CMS production steps: Generation, Simulation, ooHitformatting, Digitisation
[Plots: CMSIM+OSCAR Simulation step, all CMS and INFN only, Jul - Sep 03]
75 Mevts in CMS; 10.4 Mevts (14%) done by INFN (strong contribution by CNAF T1 + LNL)
39 PCP @ INFN statistics (3/4)
CMS production steps: Generation, Simulation, ooHitformatting, Digitisation
[Plots: ooHitformatting step, all CMS and INFN only, Dec 03 - end-Feb 04]
37 Mevts in CMS; 7.8 Mevts (21%) done by INFN
D. Bonacorsi
40 OSCAR
41 Evolution of transfer requirements
42 From GDB to analysis at T1
[Diagram: transfer → replication → job preparation → job submission]
43 Real-time (fake) analysis
- Goals:
  - Demonstrate that data can be analyzed in real time at the T1
    - fast feedback to reconstruction (e.g. calibration, alignment, checks of the reconstruction code, etc.)
  - Establish automatic data replication to the T2s
    - make the data available for offline analysis
  - Measure the time elapsed between reconstruction at the T0 and analysis at the T1
- Architecture:
  - A set of software agents communicating via a local MySQL DB
    - replication, data-set completeness, job preparation and submission
  - Use LCG to run the jobs
    - private Grid Information System for CMS DC04
    - private Resource Broker
J. Hernandez
44 From GDB to analysis at T1
[Diagram: the EB agent feeds the export buffer (EB) from reconstruction; transfer and replication agents move the data from the GDB to the T1 and on to the T2; publisher and configuration agents, then drop and Fake Analysis agents, drive the analysis]
J. Hernandez
45 Real-time DC04 analysis: summary
- Real-time analysis: two weeks of quasi-continuous running!
- Total number of analysis jobs submitted: ~15,000
- Overall Grid efficiency: 95-99%
- Problems:
  - The RLS query to prepare a POOL XML catalogue must be done using the file GUIDs, otherwise it is much slower
  - The Resource Broker disk filling up made the RB unavailable for several hours. The problem was related to large input/output sandboxes. Possible solutions:
    - set quotas on the RB space for sandboxes
    - configure RBs to be used in cascade
  - A network problem at CERN prevented connections to the RLS and to the CERN RB
  - The Legnaro CE/SE disappeared from the Information System during one night
  - Failures in updating the BOSS database due to overload of the MySQL server (~30%). The BOSS recovery procedure was used
N. De Filippis, A. Fanfani, F. Fanzago
46 Description of RLS usage in DC04
[Diagram: 1. the XML publication agent registers files in the RLS (mirrored to a CNAF RLS replica via ORACLE mirroring); 2. the configuration agent finds the Tier-1 location, based on metadata; 3. the RM/SRM/SRB EB agents copy/delete files to/from the export buffers, driven by the TMDB; 4. the Tier-1 transfer agents (Replica Manager, SRB GMCAT) copy files to the Tier-1s; 5. the Resource Broker submits the analysis job; 6. the LCG ORCA analysis job processes the DST, using a local POOL catalogue, and registers its private data]
Specific client tools: POOL CLI, Replica Manager CLI, programs based on the C++ LRC API, LRC Java API tools (SRB/GMCAT), Resource Broker
47 Context for the agent system
[Diagram: global system management/steering sits above the replica managers, the configuration agent and (possibly) the resource brokers; the agents (and the TMDB) connect the file catalogue, the metadata and the grid transfer tools. Analysis: a separate world?]
48 [Diagram: DST files of the b/τ and muon datasets. 1. The Replica Agent replicates the data to disk SEs at the T1/T2; 2. it notifies that new files are available for analysis; ORCA 8.0.1 is available on the UI to compile the analysis code]
The Real-time Analysis Agent:
- checks whether a file-set (run) is ready to be analyzed (the "greenlight")
- prepares the job to analyze the run
- submits the job via BOSS to the RB
The CMS software (ORCA 8.0.1) is installed by the CMS software manager using a Grid job based on the xcmsi tool
49 Muon and neutrino information
- ν transverse energy
- Muon Pt
- Isolated-muon Pt
- Isolation efficiency
  - Single muon: 88% (98% wrt the selection)
50 Jet information
- Total number of jets
- Number of b jets
- Et of non-b jets
- Et of b jets
51 Reconstructed masses
[Plots: hadronic top, hadronic W, leptonic top]
52 Data transfer and job preparation
[Diagram: DST files of the b/tau and muon datasets are replicated; the agent notifies that new files are available for analysis; ORCA_8_0_1 is available on the UI to compile the analysis code; submission via BOSS]
The CMS software is installed by the CMS Software Manager using a Grid job based on the xcmsi tool
Only if the collection file has the greenlight does the agent prepare and submit a job to analyse one run
53 (No transcript)
54 An example: replicas to disk-SEs
[Monitoring plots for a single day, Apr 19th: the CNAF T1 Castor SE (eth I/O, input from the SE-EB; TCP connections; RAM memory), the CNAF T1 disk-SE (eth I/O, input from the Castor SE, in green) and the Legnaro T2 disk-SE (eth I/O, input from the Castor SE)]
D. Bonacorsi
55 Data transfer
[Diagram: the CERN EB (3 disk SEs) feeds the Tier-1s at CNAF and PIC (Castor plus disk SEs), which in turn feed the Tier-2 disk SEs at Legnaro and CIEMAT]
- Transfer tools:
  - The Replica Manager CLI was used for EB → CNAF and CNAF → Legnaro
    - the Java-based CLI introduces a non-negligible overhead at start-up
  - globus-url-copy + the LRC C++ API were used for EB → PIC and PIC → Ciemat
    - faster
  - Performance has been good with both tools
- Total network throughput limited by the small file size (see the estimate below)
- Some transfer problems caused by the performance of the underlying MSS
  - Always use a disk SE in front of an MSS in the future?
A. Fanfani
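A rough model of why small files hurt: with the DC04 average of ~12 MB per file (>500k files for 6 TB, slide 11), a fixed per-file start-up cost quickly dominates. The start-up and wire-rate numbers below are illustrative assumptions, not measurements:

    avg_file_mb = 6e6 / 500e3        # ~12 MB average file size in DC04
    wire_rate_mb_s = 40.0            # assumed per-stream transfer rate
    for startup_s in (0.1, 2.0):     # assumed start-up cost: C++ API vs Java CLI
        effective = avg_file_mb / (avg_file_mb / wire_rate_mb_s + startup_s)
        print(startup_s, round(effective, 1))   # ~30 MB/s vs ~5.2 MB/s effective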
56 Real-time DC04 analysis: job time statistics
Dataset bt03_ttbb_ttH analysed with the executable ttHWmu:
- Total execution time: 28 minutes
- ORCA execution time: 25 minutes
- Time for staging input and output files: 170 s
- Job waiting time before starting: 120 s (Grid overhead: waiting time in the queue)
N. De Filippis, A. Fanfani, F. Fanzago
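These figures pin down the per-job Grid overhead directly (a quick check in Python):

    total_s, orca_s = 28 * 60, 25 * 60   # wall time vs ORCA time, from above
    staging_s, queue_wait_s = 170, 120
    print(total_s - orca_s)              # 180 s spent inside the job but outside
                                         # ORCA, mostly the 170 s of staging
    print(round((total_s - orca_s) / total_s, 2))  # ~11% of the job wall time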