1
CMS Computing: Results and Prospects
  • Outline
  • Schedule
  • Pre-Data Challenge 04 Production
  • Data Challenge 04
  • Design and purpose
  • SW and MW components
  • Results
  • Lessons learned
  • Prospects and upcoming activities
  • Conclusions

Note: little on the pre-Challenge Production (PCP), but an update of
what was presented in September at Lecce
2
CMS Computing schedule
  • 2004
  • Mar/Apr. DC04 to study T0 Reconstruction, Data Distribution,
    Real-time analysis; 25% of startup scale
  • May/Jul. Data available and usable by the PRS groups
  • Sep. PRS analysis feedback
  • Sep. Draft CMS Computing Model in CHEP papers
  • Nov. ARDA prototypes
  • Nov. Milestone on Interoperability
  • Dec. Computing TDR in initial draft form. NEW
    milestone date
  • 2005
  • July. LCG TDR and CMS Computing TDR NEW
    milestone date
  • Post July?... DC05, 50% of startup scale. NEW milestone date
  • Dec. Physics TDR Based on Post-DC04
    activities
  • 2006
  • DC06 Final readiness tests
  • Fall. Computing Systems in place for LHC
    startup
  • Continuous testing and preparations for data

3
CMS permanent production
T. Wildish
The system is evolving into a permanent production effort.
Strong contribution of INFN and the CNAF Tier-1 to CMS past and future
productions: 252 assids in PCP-DC04, for all production steps, both
local and (when possible) Grid
4
PCP @ INFN statistics (4/4)
CMS production steps: Generation, Simulation, ooHitformatting,
Digitisation; continued through the DC!
[Plots: 2x10^33 digitisation step, all CMS and INFN only, Feb 04 to
May 04; 24 Mevents in 6 weeks, overlapping with DC04]
Note the strong contribution to all steps by the CNAF T1, but only
outside DC04 (during the DC it was too hard for the CNAF T1 to also be
a RC!!)
43 Mevts in CMS, 7.8 Mevts (18%) done by INFN
D. Bonacorsi
5
PCP grid-based prototypes
Constant integration work in CMS between
  • CMS software and production tools
  • evolving EDG-X → LCG-Y middleware
in several phases:
  • CMS Stress Test, stressing EDG < 1.4, then
  • PCP on the CMS/LCG-0 testbed
  • PCP on LCG-1, towards DC04 with LCG-2
EU-CMS submit to the LCG scheduler → CMS-LCG virtual Regional Center
  • 0.5 Mevts heavy-pythia Generation (2000 jobs, 8 hours each, 10
    KSI2000 months)
  • 2.1 Mevts CMSIM+OSCAR Simulation (8500 jobs, 10 hours each, 130
    KSI2000 months), 2 TB of data
[Plots: OSCAR, 0.6 Mevts on LCG-1; CMSIM, 1.5 Mevts on CMS/LCG-0;
PIII 1 GHz]
D. Bonacorsi
6
Purpose of Data Challenge 04
  • Aim of DC04:
  • reach a sustained 25 Hz reconstruction rate in the Tier-0 farm
    (25% of the target conditions for LHC startup)
  • register data and metadata to a catalogue
  • transfer the reconstructed data to all Tier-1 centers
  • analyze the reconstructed data at the Tier-1s as they arrive
  • publicize to the community the data produced at the Tier-1s
  • monitor and archive the performance criteria of the ensemble of
    activities for debugging and post-mortem analysis
  • Not a CPU challenge, but a full-chain demonstration!
  • Pre-challenge production in 2003/04
  • 70M Monte Carlo events (30M with Geant-4) produced
  • Classic and grid (CMS/LCG-0, LCG-1, Grid3) productions

It was a challenge, and every time a scalability limit of some
component was found, that was a Success!
7
Data Challenge 04 layout
By C. Grandi
[Layout diagram of the participating sites; the only Tier-2 in DC04
was LNL (INFN)]
Full chain (except the Tier-0 reconstruction) done in LCG-2, but only
for INFN and PIC. Not without pain.
8
Data Challenge 04 numbers
  • Pre-Challenge Production (PCP04), Jul03-Feb04
  • Simulated events: 75 M events; 750k jobs, 800k files, 5000 KSI2000
    months, 100 TB of data (30 M with Geant4)
  • Digitised events (raw): 35 M events; 35k jobs, 105k files
  • Where: INFN, USA, CERN, ...
  • In Italy: 10-15 M events (20%)
  • For whom (Physics and Reconstruction Software groups): Muons,
    B-tau, e-gamma, Higgs
  • Data Challenge 04, Mar04-Apr04
  • Events reconstructed (DST) at the CERN Tier-0: 25 M events; 25k
    jobs, 400k files, 150 KSI2000 months, 6 TB of data
  • Events distributed to the Tier1-CNAF and Tier2-LNL: the same 25 M
    events and files
  • Events analysed at the Tier1-CNAF and Tier2-LNL: > 10 M events;
    15k jobs, each taking 30 min of CPU
  • Post Data Challenge 04, May04-
  • Events to reprocess (DST): 25 M events
  • Events to analyse in Italy: 50% of 75 M events
  • Events to produce and distribute: 50 M

9
Data Challenge 04: MW and SW components
  • CMS specific
  • Transfer Agents to transfer the DST files (at CERN and at the
    Tier-1s)
  • Mass Storage Systems on tape (Castor, Enstore, etc.) (at CERN and
    at the Tier-1s)
  • RefDB, database of dataset requests and assignments (at CERN)
  • Cobra, the CMS software framework (CMS wide)
  • ORCA, OSCAR (Geant4), CMS reconstruction and simulation (CMS wide)
  • McRunJob, job preparation system (CMS wide)
  • BOSS, job tracking system (CMS wide)
  • SRB, file replication and catalogue system (at CERN, RAL, Lyon and
    FZK)
  • MySQL-POOL, POOL backend on a MySQL database (at FNAL)
  • ORACLE database (at CERN and at the Tier1-INFN)
  • LCG common
  • User Interfaces including the Replica Manager (at CNAF, Padova,
    LNL, Bari, PIC)
  • Storage Elements (at CNAF, LNL, PIC)
  • Computing Elements (at CNAF, LNL and PIC)
  • Replica Location Service (at CERN and at the Tier1-CNAF)
  • Resource Broker (at CERN and at the CNAF-Tier1-Grid-it)
  • Storage Replica Manager (at CERN and at FNAL)
  • Berkeley Database Information Index (at CERN)
  • Virtual Organization Management System (at CERN)
  • GridICE, monitoring system (on the CEs, SEs, WNs, ...)
  • POOL, persistency catalogue (in the CERN RLS)
  • US specific
  • Monte Carlo distributed production system (MOP) (at FNAL,
    Wisconsin, Florida, ...)
  • MonaLisa, monitoring system (CMS wide)
  • Custom McRunJob, job preparation system (at FNAL and perhaps
    Florida)

10
Data Challenge 04 Processing Rate
  • Processed about 30M events
  • But DST errors make this pass not useful for
    analysis
  • Generally kept up at T1s in CNAF, FNAL, PIC
  • Got above 25Hz on many short occasions
  • But only one full day above 25Hz with full system
  • Working now to document the many different
    problems

11
Data Challenge 04 data transfer from CERN to
INFN
  • A total of > 500k files and 6 TB of data transferred from the CERN
    T0 to the CNAF T1
  • max number of files per day: 45,000, on March 31st
  • max size per day: 400 GB, on March 13th (> 700 GB considering the
    Zips)

GARR network use: 340 Mbps (> 42 MB/s) sustained for 5 hours (the
maximum was 383.8 Mbps)
D. Bonacorsi
12
DC04 Real-Time (fake) Analysis
  • CMS software installation
  • CMS Software Manager (M. Corvo) installs software
    via a grid job provided by LCG
  • RPM distribution based on CMSI or DAR
    distribution
  • Used at CNAF, PIC, Legnaro, Ciemat and Taiwan
    with RPMs
  • Site manager installs RPMs via LCFGng
  • Used at Imperial College
  • Still inadequate for general CMS users
  • Real-time analysis at Tier-1
  • Main difficulty is to identify complete file sets
    (i.e. runs)
  • Information today in TMDB or via findColls
  • Job processes single runs at the site close to
    the data files
  • File access via rfio
  • Output data registered in RLS

A. Fanfani C. Grandi
13
DC04 Fake Analysis Architecture


[Architecture diagram: the data transfer drops files; a Drop agent and
a Fake Analysis agent then hand jobs to the LCG Resource Broker, which
runs them on LCG Worker Nodes]
  • The Drop agent triggers job preparation/submission when all files
    are available
  • The Fake Analysis agent prepares the XML catalog, orcarc and JDL
    script and submits the job
  • Jobs record start/end timestamps in a MySQL DB (a minimal sketch
    of this agent pair is given below)

J. Hernandez
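To make the agent mechanics above concrete, here is a minimal Python
sketch of a Drop/Fake Analysis agent loop, under stated assumptions:
the `drop_files` table, the file names and the direct `edg-job-submit`
call are illustrative stand-ins, not the actual DC04 agents (which
worked through BOSS and the TMDB schema).

```python
# Minimal sketch of a Drop / Fake Analysis agent pair. The schema
# (drop_files), file names and submission command are assumptions.
import subprocess
import time

import MySQLdb  # MySQL client library, as the agents talk to a local MySQL DB


def complete_runs(db):
    """Return runs whose files have all arrived and are not yet analysed."""
    cur = db.cursor()
    cur.execute(
        "SELECT run FROM drop_files GROUP BY run "
        "HAVING SUM(arrived) = COUNT(*) AND MAX(analysed) = 0"
    )
    return [row[0] for row in cur.fetchall()]


def prepare_job(run):
    """Write a JDL for one run (the XML catalog and orcarc are prepared alike)."""
    jdl = (
        'Executable    = "run_orca.sh";\n'
        f'Arguments     = "{run}";\n'
        f'InputSandbox  = {{"run_orca.sh", "catalog_{run}.xml", "orcarc_{run}"}};\n'
        'OutputSandbox = {"stdout.log", "stderr.log"};\n'
    )
    path = f"analysis_{run}.jdl"
    with open(path, "w") as f:
        f.write(jdl)
    return path


def main_loop():
    db = MySQLdb.connect(host="localhost", db="dc04_agents")
    while True:
        for run in complete_runs(db):
            jdl = prepare_job(run)
            # DC04 submission went via BOSS and the LCG Resource Broker;
            # a plain edg-job-submit call stands in for that here.
            subprocess.run(["edg-job-submit", "-o", "jobids.txt", jdl])
            db.cursor().execute(
                "UPDATE drop_files SET analysed = 1 WHERE run = %s", (run,)
            )
            db.commit()
        time.sleep(60)  # poll the drop area every minute
```

The point is only the control flow: poll the drop area, act once a
run's file set is complete, then mark the run so it is not resubmitted.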
14
Real-time DC04 analysis: turn-around time from T0
  • The minimum time from T0 to T1 analysis was 10 minutes
  • Different problems contributed to the spread in times:
  • the dataset-oriented analysis made the results dependent on which
    datasets were sent in real time from CERN
  • tuning of the Tier-1 Replica Agent
  • Replica Agent operation affected by a CASTOR problem
  • Analysis Agents were not always up, due to debugging
  • for 1 dataset the zipped metadata arrived late with respect to the
    data
  • a few problems with submission

Preliminary
N. De Filippis, A. Fanfani, F. Fanzago
15
DC04 Real-time Analysis
  • Maximum rate of analysis jobs: 194 jobs/hour
  • Maximum rate of analysed events: 26 Hz
  • Total of 15,000 analysis jobs via Grid tools in 2 weeks (95-99%
    efficiency)
  • Dataset examples:
  • B0s → J/ψ φ
  • Bkg: mu03_tt2mu, mu03_DY2mu
  • ttH, H → bbbar; t → Wb, W → lν; t → Wb, W → had.
  • Bkg: bt03_ttbb_tth
  • Bkg: bt03_qcd170_tth
  • Bkg: mu03_W1mu
  • H → WW → 2μ2ν
  • Bkg: mu03_tt2mu, mu03_DY2mu

N. De Filippis, A. Fanfani, F. Fanzago
16
Reconstruction software and DST
Without the PRS (b-tau, muon, e-gamma) work on the reconstruction
software there would be neither analysis nor a Data Challenge (04).
INFN is the major contributor: Ba, Bo, Fi, Pi, Pd, Pg, Rm1, To.
  • Last CMS week → today: prototype DST in place
  • Huge effort by large number of people, especially
    S. Wynhoff, N. Neumeister, T. Todorov, V.
    Innocente for base. Also from
  • Emilio Meschi, David Futyan, George Daskalakis,
    Pascal Vanlaer, Stefano Lacaprara, Christian
    Weiser, Arno Heister, Wolfgang Adam, Marcin
    Konecki, Andre Holzner, Olivier van der Aa,
    Christophe Delaere, Paolo Meridiani, Nicola
    Amapane, Susanna Cucciarelli, Haifeng Pi
  • The DST constitutes the first CMS summary data
  • Examples of doing physics with it are in place, but it is not
    complete

P. Sphicas
17
PRS analysis contributions
  • ttH, H → bb and related backgrounds
  • S. Cucciarelli, F. Ambroglini, C. Weiser, S. Kappler, A. Bocci, R.
    Ranieri, A. Heister ...
  • Bs → J/ψ φ and related backgrounds
  • V. Ciulli, N. Magini, Dubna group...
  • A/H(SUSY) → ττ, established channel for the SUSY H HLT
  • People/channels:
  • A/H → 2τ → τ-jet τ-jet: S. Gennai, S. Lehti, L. Wendland
  • Reconstruction: full track reconstruction starting from raw data,
    several algorithms already implemented
  • Studies of RecHits, sensor positions, B field, material
    distribution
  • W. Adam, M. Konecki, S. Cucciarelli, A. Frey, T. Todorov
  • H → γγ
  • People: G. Anagnostou, G. Daskalakis, A. Kyriakis, K. Lassila, N.
    Marinelli, J. Nysten, K. Armour, S. Bhattacharya, J. Branson, J.
    Letts, T. Lee, V. Litvin, H. Newman, S. Shevchenko
  • H → ZZ(*) → 4e
  • People: David Futyan, Paolo Meridiani, Kate Mackay, Emilio Meschi,
    Ivica Puljak, Claude Charlot, Nikola Godinovic, Federico Ferri,
    Stephane Bimbot
  • H → WW → 2μ2ν
  • Zanetti, Lacaprara

And many others!!!!
  • Calibrations and alignments
  • Higgs studies

18
Data Challenge 04: lessons (1/2)
  • Many of the components used do not scale (both CMS and non-CMS
    ones)
  • RLS
  • Castor
  • dCache
  • Metadata
  • SRB
  • catalogues of various types and kinds
  • the job submission system at the Tier-0
  • etc.
  • Many functions/components were missing
  • Data Transfer Management
  • global data location across (at least) all the Tier-1s
  • Nothing wrong with that: it was a challenge, made exactly for this!
  • But the real lesson was (surprise?) that
  • there was (and is) NO organisation, neither for LCG nor for CMS
    nor for Grid3
  • there was (and is) NO consistent design of either a Data Model or
    a Computing Model
  • except, partially, in Italy and in the USA!

19
Data Challenge 04: lessons (2/2)
Indeed, for example...
D. Bonacorsi
20
INFN prospects
  • Short term
  • Re-create the DSTs with a version of ORCA (CMS sw)
  • validated by the analyses while production is ongoing
  • wherever possible (Tier-0, Tier-1s and Tier-2s)
  • Distribute the DSTs, the other data formats (Digi, SimHits) and
    the metadata
  • to the Tier-1s and, from there, to the Tier-2s
  • Enable locally distributed analysis
  • in a way that is consistent for data access (few tools allow it)
  • Medium term
  • Build a Data Model
  • Build a Computing Model
  • Build a consistent, distributed architecture
  • Build controlled (and semi-transparent) access to the data
  • with the components that exist and that have a prospect of
    scalability (to be measured again, in an organic way)

21
Post Data Challenge 04 activities
  • June 04 - July 04
  • Re-creation of the DSTs
  • Distribution of the files (data and metadata) needed for analysis
  • First results for the PRS groups and for the Physics TDR
  • July 04 - July 05
  • Production of new (or old) datasets (including DSTs)
  • Target: 10 M events/month, steady, for the Physics TDR
  • Continuous analysis of the produced data
  • Sep 04 - Oct 04
  • Data Challenge 04 results for CHEP04
  • First definition of the Data/Computing Model
  • Definition of the MoUs
  • Jul 05 -
  • CMS Computing TDR (and LCG TDR)
  • Data Challenge 05, to verify the Computing Model
  • Resources needed (2005):
  • storage for analysis and production at the Tier-1, Tier-2 and
    Tier-3 sites
  • CPUs for production and analysis at the Tier-1 and Tier-2 sites

22
Possible evolution of CCS tasks (Core Computing and Software)
  • CCS will reorganize to match the new requirements and the move
    from R&D to implementation for physics
  • Meet the PRS Production Requirements (Physics TDR
    Analysis)
  • Build the Data Management and Distributed
    Analysis infrastructures
  • Production Operations group NEW
  • Outside of CERN. Must find ways to reduce
    manpower requirements.
  • Using predominantly (only?) GRID resources.
  • Data Management Task NEW
  • Project to respond to DM RTAG
  • Physicists/ Computing to define CMS Blueprint,
    relationships with suppliers (LCG/EGEE), CMS DM
    task in Computing group
  • Expect to make major use of manpower and
    experience from CDF/D0 Run II
  • Workload Management Task NEW
  • Make the Grid useable to CMS users
  • Make major use of manpower with EDG/LCG/EGEE
    experience
  • Distributed Analysis Cross Project (DAPROM) NEW
  • Coordinate and harmonize analysis activities
    between CCS and PRS
  • Work closely with Data and Workload Management
    tasks
  • Establish high-level Physics/Computing panel
    between T1 countries to ensure Collaboration
    Ownership of Computing Model for MoU and RRB
    discussions

23
Conclusions
  • The CMS Data Challenge 04 was a success
  • Many functionalities measured in a scientific way
  • Many failures and bottlenecks discovered (but the 25 Hz were
    reached!)
  • Many things understood (??)
  • The Italian (INFN) contribution was decisive
  • The CMS Data Challenge 04 was not a success
  • It was not planned sufficiently in advance
  • It required the continuous (two months) presence and intervention
    of willing people (20 hours per day, week-ends included) for
    on-the-fly solutions: about 30 people, world-wide
  • There is NOT yet an objective evaluation of the results
  • Everything that worked (for better or for worse) gets criticised a
    priori, without realistic alternative proposals
  • Nevertheless CMS, having got over the stress of DC04, is
    recovering

The CMS system is evolving into a permanent Production and Analysis
effort
24
Specific 2004 milestones (1/2)
  • Participation of at least three sites in DC04 (March)
  • Import into Italy (Tier1-CNAF) all the events reconstructed at the
    T0
  • Distribute the selected streams to at least three sites (about 6
    streams, 20 M events, 5 TB of AOD)
  • The selection covers the analysis of at least 4 signal channels
    and their backgrounds, plus the calibration studies
  • Deliverable: Italian contribution to the DC04 report, as input to
    the C-TDR and to the preparation of the P-TDR. Results of the
    analysis of the channels assigned to Italy (at least 3 streams and
    4 signal channels)
  • Integration of the CMS Italy computing system into LCG (June)
  • The Tier-1, half of the Tier-2s (LNL, Ba, Bo, Pd, Pi, Rm1) and a
    third of the Tier-3s (Ct, Fi, Mi, Na, Pg, To) have the LCG
    software installed and are able to work in the LCG environment
  • This entails installing the software packages coming from LCG AA
    and LCG GDA (from POOL to RLS etc.)
  • Completion of the analysis using the LCG infrastructure and
    further productions of about 2 M events
  • Deliverable: CMS Italy is integrated into LCG for more than half
    of its resources

Status: the end of DC04 slipped to April. Sites: Ba, Bo, Fi, LNL, Pd,
Pi, CNAF-Tier1. 2 streams, but 4 analysis channels. DONE, 90%.
Sites integrated into LCG: CNAF-Tier1, LNL, Ba, Pd, Bo, Pi. The longer
than expected analysis of the DC04 results delays this by at least 3
months. In progress, 30%.
25
Specific 2004 milestones (2/2)
  • Participation in the C-TDR (October)
  • Includes the definition of the Italian participation in the C-TDR
    in terms of:
  • resources and sites (possibly all of them)
  • man-power
  • funding and intervention plan
  • Deliverable: drafts of the C-TDR with the Italian contribution
  • Participation in the PCP of DC05 by at least the Tier-1 and the
    Tier-2s (December)
  • The Tier-1 is CNAF and the Tier-2s are LNL, Ba, Bo, Pd, Pi, Rm1
  • Production of 20 M events for the P-TDR studies, or the equivalent
    (the studies might require fast-MC or special programs)
  • Contribution to the definition of the LCG-TDR
  • Deliverable: production of the events needed to validate the
    fast-simulation tools and for the P-TDR studies (20 M events on
    the Tier-1 and the Tier-2/3s)

Status: the Computing TDR is now due in July 2005, so the milestone
slips accordingly. Stand-by/progress, 10%.
Data Challenge 05 slips to July 2005, so the milestone slips
accordingly. Stand-by, 0%.
26
Back-up Slides
27
CMS Computing Model
  • Computing Model design
  • Data location and access Model
  • Analysis (user) Model
  • CMS Software and Tools
  • Infrastructure Organization (Tiers and LCG)

28
(No Transcript)
29
CPU Power Ramp Up
[Chart: CPU power ramp-up, average slope x2.5/year, from the DAQ TDR
through DC04/C-TDR, DC05/P-TDR/LCG-TDR and DC06 readiness, up to LHC
running at 2E33 and 1E34; the actual PCP and DC04 levels are marked;
time-shared resources evolve into dedicated CMS resources]
30
NO HEAVY IONS INCLUDED YET!
Estimates prepared as input to the MoU Task Force; computing models
are under active development
31
Tier-1 Centers are Crucial to CMS
  • CMS expects to have (External) T1 centers at
  • CNAF, FNAL, Lyon, Karlsruhe, PIC, RAL
  • And a Tier-1 center at CERN (Still discussing
    role of CERN T1)
  • Current Computing model gives total External T1
    requirements
  • Assumed over 6 centers, but not necessarily 6
    equal centers
  • Tier-1 centers will be crucial for
  • Calibration, Reprocessing, Data-Serving
  • To service the requirements of the Tier-2 centers
  • Both from the region and via explicit
    relationships with external T2 centers.
  • Servicing the analysis requirements of their
    regions
  • Next step is to iterate with the T1 centers/CMS
    Country managements to understand what they can
    realistically hope to propose and to possibly
    succeed in obtaining

32
Possible Sizing of Regional T1s
  • Assume 1 T1 at CERN and the sum of 6 external T1s
  • Take the truncated sum of the collaboration in the T1 countries
    and calculate the fraction in each of those countries
  • Share the 6+1 T1s according to this algorithm to get an opening
    scenario for discussions (see the sketch below)
  • CERN: 1 T1 for CMS (by definition)
  • France: 0.5 T1 for CMS
  • Germany: 0.4 T1
  • Italy: 1.7 T1
  • Spain: 0.2 T1
  • UK: 0.4 T1
  • USA: 2.6 T1
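A minimal sketch of the sharing algorithm just described: CERN is
fixed at one T1 and the six external T1s are split in proportion to
each T1 country's share of the collaboration. The author counts below
are placeholders for illustration, not the real CMS numbers.

```python
# Sketch of the sharing algorithm: 1 T1 at CERN by definition, and 6
# external T1s split in proportion to each T1 country's share of the
# collaboration. Author counts are hypothetical placeholders.
authors = {"France": 80, "Germany": 65, "Italy": 270,
           "Spain": 30, "UK": 65, "USA": 420}      # hypothetical counts
total = sum(authors.values())
shares = {country: 6 * n / total for country, n in authors.items()}
shares["CERN"] = 1.0                               # fixed by definition
for country, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{country:8s} {share:.1f} T1")
```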

33
Tier-2
  • Ask Now for intentions from all CMS Agencies
  • I have an old list, I request that you contact
    me with your intentions so I can bring this up to
    date.
  • T1 countries are making a very heavy commitment
  • They may need to demonstrate sharing of costs
    with the dependent T2s
  • T2s need to start defining with which T1 they will enter into
    service agreements, and negotiating with them how costs will be
    distributed.

34
RLS performance
0.16 files/s ↔ 10 Hz
0.4 files/s ↔ 25 Hz
April 2nd, 18:00
  • Time to register the output of a single job (16 files): left axis
  • Load on the client machine at the time of registration: right axis

35
RLS issues
  • Total number of files registered in the RLS during DC04:
  • ~570K LFNs, each with ~5-10 PFNs and 9 metadata attributes
  • Inserting information into RLS:
  • inserting a PFN (file catalogue) was fast enough when using the
    appropriate tools, produced along the way
  • LRC C API programs (~0.1-0.2 sec/file), POOL CLI with GUID
    (seconds/file)
  • inserting files together with their attributes (file and metadata
    catalogue) was slow
  • We more or less survived; higher data rates would be troublesome
    (see the back-of-envelope check below)

Sometimes the load on the RLS increases and requires an intervention
on the server (e.g. log partition full, switch of server node,
un-optimized queries) → able to keep up in optimal conditions, only
just otherwise
Time to register the output of a Tier-0 job (16
files)
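A back-of-envelope check of these numbers; the 2.0 s/file figure below
is an assumed value for the "seconds per file" quoted above for the
POOL CLI.

```python
# Registration budget at the DC04 goal: 0.4 files/s (25 Hz) means each
# file must be fully registered in at most 2.5 s on average.
target_rate = 0.4                       # files/s, from the previous slide
budget = 1.0 / target_rate              # 2.5 s per file
for tool, t_per_file in [("LRC C API", 0.2), ("POOL CLI with GUID", 2.0)]:
    print(f"{tool}: {t_per_file} s/file, headroom x{budget / t_per_file:.1f}")
```

With ~0.1-0.2 s/file there is an order of magnitude of headroom; at
seconds per file the margin essentially disappears, which matches the
"we more or less survived" assessment above.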
36
PCP set-up: a hybrid model
by C. Grandi
[Workflow diagram: a physics group asks for a new dataset; the
Production Manager defines assignments in RefDB; a Site Manager starts
an assignment; McRunJob with the CMSProd plug-in makes a data-level
query to RefDB and prepares the jobs as shell scripts for the Local
Batch Manager; job tracking and job-level queries go through the BOSS
DB. A toy job-tracking wrapper in the spirit of BOSS is sketched
below.]
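Since BOSS appears in the diagram as the job-tracking piece, here is a
toy wrapper in that spirit: run the real executable, record start/end
timestamps, the exit code and a tail of the output in a MySQL table.
The table schema and connection parameters are hypothetical; this is
not the actual BOSS interface.

```python
# Toy job wrapper in the spirit of BOSS: run the real executable and
# record start/end timestamps, exit code and an output tail in MySQL.
# The schema and connection parameters are hypothetical.
import subprocess
import sys
import time

import MySQLdb


def run_tracked(job_id, cmd):
    db = MySQLdb.connect(host="localhost", db="job_tracking")
    cur = db.cursor()
    cur.execute("UPDATE jobs SET t_start = %s WHERE id = %s",
                (int(time.time()), job_id))
    db.commit()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    cur.execute(
        "UPDATE jobs SET t_end = %s, exit_code = %s, stdout_tail = %s "
        "WHERE id = %s",
        (int(time.time()), proc.returncode, proc.stdout[-500:], job_id),
    )
    db.commit()
    return proc.returncode


if __name__ == "__main__":
    # usage: wrapper.py <job_id> <executable> [args...]
    sys.exit(run_tracked(int(sys.argv[1]), sys.argv[2:]))
```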
37
PCP @ INFN statistics (1/4)
CMS production steps: Generation, Simulation, ooHitformatting,
Digitisation
[Plots: Generation step, all CMS and INFN only, Jun to mid-Aug 03;
annotation: "contribute to this slope"]
79 Mevts in CMS, 9.9 Mevts (13%) done by INFN (strong contribution by
LNL)
38
PCP @ INFN statistics (2/4)
CMS production steps: Generation, Simulation, ooHitformatting,
Digitisation
[Plots: Simulation step (CMSIM+OSCAR), all CMS and INFN only, Jul to
Sep 03]
75 Mevts in CMS, 10.4 Mevts (14%) done by INFN (strong contribution by
CNAF T1 and LNL)
39
PCP @ INFN statistics (3/4)
CMS production steps: Generation, Simulation, ooHitformatting,
Digitisation
[Plots: ooHitformatting step, all CMS and INFN only, Dec 03 to end-Feb
04]
37 Mevts in CMS, 7.8 Mevts (21%) done by INFN
D. Bonacorsi
40
OSCAR
41
Evolution of Transfer Requirements
42
From GDB to analysis at T1
Stages: Transfer → Replication → Job preparation → Job submission
43
Real-Time (Fake) Analysis
  • Goals
  • Demonstrate that data can be analyzed in real time at the T1
  • Fast feedback to reconstruction (e.g. calibration, alignment,
    check of the reconstruction code, etc.)
  • Establish automatic data replication to T2s
  • Make data available for offline analysis
  • Measure the time elapsed between reconstruction at T0 and analysis
    at T1 (a measurement sketch follows below)
  • Architecture
  • Set of software agents communicating via a local MySQL DB
  • Replication, data-set completeness, job preparation and submission
  • Use LCG to run the jobs
  • Private Grid Information System for CMS DC04
  • Private Resource Broker

J. Hernandez
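For the "time elapsed between reconstruction at T0 and analysis at T1"
goal, a measurement sketch assuming the agents' MySQL DB keeps one row
per run with the two timestamps as epoch seconds; the table and column
names are hypothetical.

```python
# Sketch of the T0 -> T1 turn-around measurement, assuming one row per
# run with two epoch-second timestamps (hypothetical table/columns).
import MySQLdb


def turnaround_minutes():
    db = MySQLdb.connect(host="localhost", db="dc04_agents")
    cur = db.cursor()
    cur.execute(
        "SELECT t1_analysis_start - t0_reco_done FROM run_times "
        "WHERE t1_analysis_start IS NOT NULL"
    )
    deltas = sorted(float(row[0]) / 60.0 for row in cur.fetchall())
    if not deltas:
        return {}
    return {
        "runs": len(deltas),
        "min": deltas[0],                   # DC04 saw a ~10 minute best case
        "median": deltas[len(deltas) // 2],
        "max": deltas[-1],
    }


print(turnaround_minutes())
```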
44
From GDB to analysis at T1
[Diagram: the path from reconstruction and the EB/GDB at the T0 to
analysis at the T1 and T2, driven by the EB agent, transfer and
replication agents, Drop and Fake Analysis agents, and publisher and
configuration agents]
J. Hernandez
45
Real-time DC04 analysis: summary
  • Real-time analysis: two weeks of quasi-continuous running!
  • Total number of analysis jobs submitted: 15,000
  • Overall Grid efficiency: 95-99%
  • Problems
  • the RLS query used to prepare a POOL XML catalog had to be done
    via the file GUIDs, otherwise it was much slower
  • the Resource Broker disk filling up made the RB unavailable for
    several hours; this was related to large input/output sandboxes.
    Possible solutions:
  • set quotas on the RB space for sandboxes
  • configure the use of RBs in cascade
  • a network problem at CERN prevented connections to the RLS and the
    CERN RB
  • the Legnaro CE/SE disappeared from the Information System during
    one night
  • failures in updating the BOSS database due to overload of the
    MySQL server (30%); the BOSS recovery procedure was used

N. De Filippis, A. Fanfani, F. Fanzago
46
Description of RLS usage in DC04
[Diagram of RLS usage in DC04, roughly:
1. the XML Publication Agent registers files in the RLS (local POOL
   catalogue at the Tier-0);
2. the Configuration agent finds the Tier-1 location (based on
   metadata);
3. the RM/SRM/SRB EB agents copy/delete files to/from the export
   buffers;
4. the Tier-1 Transfer agents (Replica Manager, SRB GMCAT) copy the
   files to the Tier-1s and update the TMDB;
5. the analysis job is submitted through the Resource Broker;
6. the LCG ORCA Analysis Job processes the DST and registers its
   private data.
A CNAF RLS replica is kept via ORACLE mirroring.]
Specific client tools: POOL CLI, Replica Manager CLI, LRC C API based
programs, LRC Java API tools (SRB/GMCAT), Resource Broker
47
Context for the agent system
Global system management/ steering
Replica managers
Configuration agent
Resource brokers?
Agents (and TMDB)
File catalogue
Metadata
Analysis: a separate world?
Grid transfer tools
48
DST files: b/tau datasets and muon datasets
Replica Agent
  1. Replicate the data to disk SEs at the T1/T2
  2. Notify that new files are available for analysis

ORCA 8.0.1 on the UI to compile the analysis code
Real-time Analysis Agent
  1. Check whether a file-set (run) is ready to be analyzed
     (greenlight)
  2. Prepare the job to analyze the run
  3. Submit the job via BOSS to the RB

The CMS software (ORCA 8.0.1) is installed by the CMS software manager
using a GRID job based on the xcmsi tool
49
Muon and Neutrino Information
  • Missing transverse energy
  • Muon pT
  • Isolated muon pT
  • Isolation efficiency
  • Single muon: 88% (98% with respect to the selection)

50
Jet Information
  • Total number of jets
  • Number of b jets
  • ET of non-b jets
  • ET of b jets

51
[Plots of reconstructed masses: hadronic top, hadronic W, leptonic
top]
52
Data transfer and job preparation
[Diagram: DST files from the b/tau and muon datasets are replicated to
the site; a notification is sent when new files are available for
analysis; ORCA_8_0_1 is available on the UI to compile the analysis
code; submission goes via BOSS. The CMS software is installed by the
CMS Software Manager using a GRID job based on the xcmsi tool. Only if
the collection file has the greenlight does the agent prepare and
submit a job to analyse one run.]
53
(No Transcript)
54
An example: replicas to disk-SEs
[Monitoring plots for a single day (April 19th): CNAF T1 Castor SE
(eth I/O input from the SE-EB, TCP connections, RAM memory), CNAF T1
disk-SE (eth I/O input from the Castor SE, in green) and Legnaro T2
disk-SE (eth I/O input from the Castor SE)]
D. Bonacorsi
55
Data Transfer
[Diagram: Castor at CERN behind the CERN EB (3 disk SEs), feeding the
Tier-1s (CNAF and PIC, each with a Castor back-end, an SE and a disk
SE) and, from there, the Tier-2s (Legnaro and CIEMAT disk SEs)]
  • Transfer tools
  • Replica Manager CLI used for EB → CNAF and CNAF → Legnaro
  • the Java-based CLI introduces a non-negligible overhead at
    start-up
  • globus-url-copy + LRC C API used for EB → PIC and PIC → CIEMAT
  • faster
  • Performance was good with both tools
  • Total network throughput limited by the small file size (see the
    back-of-envelope estimate below)
  • Some transfer problems were caused by the performance of the
    underlying MSS
  • Always use a disk SE in front of an MSS in the future?

A. Fanfani
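A back-of-envelope estimate of why the small file size caps the
throughput: with roughly 6 TB spread over more than 500k files (about
12 MB per file, from the transfer numbers earlier), any fixed per-file
start-up cost starts to dominate. The 40 MB/s link speed and the two
overhead values below are assumptions for illustration only.

```python
# Effective per-stream rate with a fixed per-file start-up overhead:
#   rate = size / (size / link + overhead)
# Average file size from the DC04 transfer numbers (~6 TB over >500k
# files); the link speed and overhead values are assumed.
avg_file_mb = 6e6 / 500e3                  # ~12 MB per file
link_mb_s = 40.0
for label, overhead_s in [("C-API-like start-up", 0.5),
                          ("Java-CLI-like start-up", 5.0)]:
    eff = avg_file_mb / (avg_file_mb / link_mb_s + overhead_s)
    print(f"{label}: {eff:.1f} MB/s per stream")
```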
56
Real-time DC04 analysis: job time statistics
Dataset bt03_ttbb_ttH analysed with the executable ttHWmu
Total execution time: 28 minutes
ORCA execution time: 25 minutes
Time for staging input and output files: 170 s
Job waiting time before starting: 120 s
(Overhead of the GRID: waiting time in the queue)
N. De Filippis, A. Fanfani, F. Fanzago