Title: Summary Session I
1. Summary Session I
27 May 2005
ACAT05
2. Outline
19 presentations:
- Data Analysis, Data Acquisition and Tools (6)
- GRID Deployment (4)
- Applications on the GRID (5)
- High Speed Computing (4)
3. Data Analysis, Acquisition, Tools
- Evolution of the BaBar configuration database design
- DAQ software for the SND detector
- Interactive analysis environment of Unified Accelerator Libraries
- DaqProVis, a toolkit for acquisition, analysis, visualisation
- The Graphics Editor in ROOT
- Parallel interactive and batch HEP data analysis with PROOF
4. Evolution of the Configuration Database Design
- Andrei Salnikov, SLAC
- For the BaBar Computing Group, ACAT05, DESY, Zeuthen
5. BaBar database migration
- BaBar was using the Objectivity/DB ODBMS for many of its databases
- About two years ago it started migrating the event store from Objectivity to ROOT, which was a success and an improvement
- No reason to keep pricey Objectivity only because of the secondary databases
- Migration effort started in 2004 for the conditions, configuration, prompt reconstruction, and ambient databases
6. Configuration database API
- Main problem of the old database API: it exposed too much of the implementation technology
  - persistent objects, handles, class names, etc.
- The API has to change, but we don't want to make the same mistakes again (new mistakes are more interesting)
- Pure transient-level abstract API, independent of any specific implementation technology (a minimal sketch follows this list)
- Always make abstract APIs to avoid problems in the future (this may be hard and need a few iterations)
- Client code should be free from any specific database implementation details
- Early prototyping could answer a lot of questions, but five years of experience count too
- Use different implementations for clients with different requirements
- The implementation would benefit from features currently missing in C++: reflection, introspection (or from a completely new language)
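As an illustration of such a technology-neutral layer, here is a minimal sketch of what a transient-level abstract API could look like; the class and method names are hypothetical, not the actual BaBar interface.

    #include <map>
    #include <string>
    #include <vector>

    // Transient value object handed to clients: no persistent handles and no
    // storage-technology class names leak through this layer.
    struct ConfigEntry {
        std::string key;
        std::string value;
    };

    // Abstract, technology-neutral interface; concrete backends (ROOT files,
    // a relational database, ...) implement it out of the clients' sight.
    class ConfigDatabase {
    public:
        virtual ~ConfigDatabase() = default;
        virtual std::vector<ConfigEntry> read(const std::string& configName) = 0;
        virtual void write(const std::string& configName,
                           const std::vector<ConfigEntry>& entries) = 0;
    };

    // Trivial in-memory backend, useful for tests and prototyping; a production
    // backend would talk to the real store behind the same interface.
    class InMemoryConfigDatabase : public ConfigDatabase {
    public:
        std::vector<ConfigEntry> read(const std::string& configName) override {
            return store_[configName];
        }
        void write(const std::string& configName,
                   const std::vector<ConfigEntry>& entries) override {
            store_[configName] = entries;
        }
    private:
        std::map<std::string, std::vector<ConfigEntry>> store_;
    };

Client code holds only the abstract ConfigDatabase type, so swapping the persistence technology underneath (as in the Objectivity-to-ROOT migration) leaves the clients untouched.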
7. DAQ software for the SND detector
- Budker Institute of Nuclear Physics, Novosibirsk
- M. Achasov, A. Bogdanchikov, A. Kim, A. Korol
8. Main data flow
Data-flow diagram: readout and event building (1 kHz x 4 KB), event packing (1 kHz x 1 KB), event filtering (100 Hz x 1 KB).
- Expected rates:
  - event fragments: 4 MB/s, read from the I/O processors over Ethernet
  - event building: 4 MB/s
  - event packing: 1 MB/s
  - event filtering (90% screening): 100 KB/s
9. DAQ architecture
Block diagram of the DAQ: detector front-end electronics (x12, x16) feed readout and event building; events pass through a buffer, are filtered, and the filtered events go to backup and off-line storage. Supporting components: visualization, database, calibration process, system support.
11. Interactive Analysis Environment of Unified Accelerator Libraries
- V. Fine, N. Malitsky, R. Talman
12. Abstract
- The Unified Accelerator Libraries (UAL, http://www.ual.bnl.gov) software is an open accelerator simulation environment addressing a broad spectrum of accelerator tasks, ranging from online-oriented efficient models to full-scale realistic beam dynamics studies. The paper introduces a new package integrating UAL simulation algorithms with a Qt-based Graphical User Interface and an open collection of analysis and visualization components. The primary user application is implemented as an interactive and configurable Accelerator Physics Player whose extensibility is provided by a plug-in architecture. Its interface to data analysis and visualization modules is based on the Qt layer (http://root.bnl.gov) developed and supported by the STAR experiment. The present version embodies the ROOT (http://root.cern.ch) data analysis framework and the Coin3D (http://www.coin3d.org) graphics library.
13. Accelerator Physics Player

    UAL::USPAS::BasicPlayer* player = new UAL::USPAS::BasicPlayer();
    player->setShell(shell);        // attach the UAL shell driving the simulation
    qApp.setMainWidget(player);     // make the player the main Qt widget
    player->show();
    qApp.exec();                    // enter the Qt event loop

An open collection of viewers
An open collection of algorithms
14. Examples of the Accelerator-Specific Viewers
- Turn-by-turn BPM data (based on ROOT TH2F or TGraph)
- Twiss plots (based on ROOT TGraph)
- Bunch 3D distributions (based on Coin3D)
15. Parallel Interactive and Batch HEP Data Analysis with PROOF
- Maarten Ballintijn, Marek Biskup, Rene Brun, Philippe Canal, Derek Feichtinger, Gerardo Ganis, Guenter Kickinger, Andreas Peters, Fons Rademakers
- MIT, CERN, FNAL, PSI
16. ROOT Analysis Model
Standard model:
- Files analyzed on a local computer
- Remote data accessed via a remote file server (rootd/xrootd)
Diagram: the client reads a local file directly, or a remote file (dCache, Castor, RFIO, Chirp) through a rootd/xrootd server.
17. PROOF Basic Architecture
Single-cluster mode:
- The master divides the work among the slaves
- After the processing finishes, it merges the results (histograms, scatter plots)
- and returns the result to the client
Diagram: the client sends commands and scripts to the master; the master distributes the work to slaves reading the files, and histograms and plots come back to the client. (A client-side usage sketch follows below.)
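To make the client side of this picture concrete, here is a minimal sketch of submitting a query to a PROOF cluster from a ROOT session; the master host, file URLs, tree name and selector are placeholders, and the calls follow the later ROOT 5/6 API rather than the exact 2005 one.

    {
       TProof::Open("proofmaster.example.org");           // connect to the master

       TChain chain("Events");                            // placeholder tree name
       chain.Add("root://se.example.org//data/run*.root");

       chain.SetProof();                                  // route processing through PROOF
       chain.Process("MySelector.C+");                    // selector source is shipped to the slaves
    }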
18. PROOF and Selectors
- The analysis code is shipped to each slave, where SlaveBegin(), Init(), Process() and SlaveTerminate() are executed
- SlaveBegin() initializes each slave
- Many trees are processed
- No user control of the entry loop!
- The same code also works without PROOF (see the selector skeleton below).
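A skeleton of such a selector, written against the later ROOT 6 TSelector interface (class name, histogram and branch handling are illustrative only; real selectors are usually generated with TTree::MakeSelector()):

    #include "TSelector.h"
    #include "TTree.h"
    #include "TH1F.h"

    class MySelector : public TSelector {
    public:
       TH1F *fHist = nullptr;                    // booked on each slave, merged by the master

       void SlaveBegin(TTree *) override {       // runs once per slave
          fHist = new TH1F("h", "example", 100, 0., 10.);
          fOutput->Add(fHist);                   // objects in fOutput are merged and returned
       }
       void Init(TTree *tree) override {         // called for each new tree of the chain
          // set branch addresses on 'tree' here
       }
       Bool_t Process(Long64_t entry) override { // called for every entry, on whichever slave owns it
          // read the entry from the chain, apply cuts, fill fHist ...
          return kTRUE;
       }
       void SlaveTerminate() override { }        // per-slave cleanup
       void Terminate() override { }             // runs on the client with the merged output

       ClassDefOverride(MySelector, 0);
    };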
19. Analysis session snapshot
What we are implementing:
- Monday at 10h15, ROOT session on my laptop:
  - AQ1: 1 s query, produces a local histogram
  - AQ2: a 10 min query submitted to PROOF1
  - AQ3→AQ7: short queries
  - AQ8: a 10 h query submitted to PROOF2
- Monday at 16h25, ROOT session on my laptop:
  - BQ1: browse results of AQ2
  - BQ2: browse temporary results of AQ8
  - BQ3→BQ6: submit four 10 min queries to PROOF1
- Wednesday at 8h40, session on any web browser:
  - CQ1: browse results of AQ8 and BQ3→BQ6
20. ROOT Graphics Editor, by Ilka Antcheva
- The ROOT graphics editor can be (a small usage sketch follows below):
  - Embedded: connected only with the canvas in the application window
  - Global: has its own application window and can be connected to any canvas created in a ROOT session
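A small sketch of reaching the embedded editor from a ROOT session (ROOT 5/6-style calls; the histogram and canvas are just examples):

    {
       TCanvas *c = new TCanvas("c", "editor demo", 600, 400);
       TH1F *h = new TH1F("h", "demo;x;entries", 50, -3., 3.);
       h->FillRandom("gaus", 1000);
       h->Draw();

       c->ToggleEditor();   // embedded editor, docked inside this canvas window
       // The global editor has its own window and can be attached to any
       // canvas of the session.
    }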
21. Focus on Users
- Novices (for a short time)
  - theoretical understanding, no practical experience with ROOT
  - impatient with learning concepts, patient with performing tasks
- Advanced beginners (many people remain at this level)
  - focus on a few tasks and learn more on a need-to-do basis
  - perform several given tasks well
- Competent performers (fewer than the previous class)
  - know and perform complex tasks that require coordinated actions
  - interested in solving problems and tracking down errors
- Experts (identified by others)
  - able to find solutions in complex functionality
  - interested in the theories behind the design
  - interested in interacting with other expert systems
22. DaqProVis (M. Morhac)
- DaqProVis: a toolkit for acquisition, interactive analysis, processing and visualization of multidimensional data
- Basic features:
  - DaqProVis is well suited for interactive analysis of multiparameter data from small and medium sized experiments in nuclear physics.
  - The data acquisition part of the system allows one to acquire multiparameter events either directly from the experiment or from a list file, i.e., the system can work in either on-line or off-line acquisition mode.
  - In on-line acquisition mode, events can be taken directly from CAMAC crates or from a VME system that cooperates with DaqProVis in client-server mode.
  - In off-line acquisition mode the system can analyze event data even from big experiments, e.g. from Gammasphere.
  - The event data can also be read from another DaqProVis system. The capability of DaqProVis to work simultaneously in both client and server mode makes it possible to build remote as well as distributed nuclear data acquisition, processing and visualization systems, and thus to create multilevel configurations.
23. DaqProVis (Visualisation)
24. DaqProVis (continued)
- The DaqProVis and ROOT teams are already cooperating.
- Agreement during the workshop to extend this cooperation.
25. GRID deployment
- Towards the operation of the Italian Tier-1 for CMS: lessons learned from the CMS Data Challenge
- GRID technology in production at DESY
- Grid middleware configuration at the KIPT CMS Linux cluster
- Storage resources management and access at the Tier-1 CNAF
26. Towards the Operation of the Italian Tier-1 for CMS: Lessons Learned from the CMS Data Challenge
- D. Bonacorsi (on behalf of the INFN-CNAF Tier-1 staff and the CMS experiment)
- ACAT 2005, X International Workshop on Advanced Computing and Analysis Techniques in Physics Research, May 22nd-27th, 2005, DESY, Zeuthen, Germany
27. DC04 outcome (grand summary, focus on INFN T1)
- Reconstruction/data-transfer/analysis may run at 25 Hz
- Automatic registration and distribution of data, key role of the TMDB
  - it was the embryonic PhEDEx!
- Support a (reasonable) variety of different data transfer tools and set-ups
  - Tier-1s showed different performances, related to operational choices
  - SRB, LCG Replica Manager and SRM investigated (see the CHEP04 talk)
  - INFN T1: good performance of the LCG-2 chain (PIC T1 also)
- Register all data and metadata (POOL) in a world-readable catalogue
  - RLS good as a global file catalogue, bad as a global metadata catalogue
- Analyze the reconstructed data at the Tier-1s as the data arrive
  - LCG components: dedicated BDII+RB, UIs, CEs+WNs at CNAF and PIC
  - real-time analysis at Tier-2s was demonstrated to be possible
  - 15k jobs submitted
  - the time window between reco data availability and the start of analysis jobs can be reasonably low (i.e. 20 mins)
- Reduce the number of files (i.e. increase <events>/<file>)
  - more efficient use of bandwidth
  - reduced overhead of commands
- Address scalability of MSS systems (!)
28. Learn from DC04 lessons
- Some general considerations may apply:
  - although a DC is experiment-specific, maybe its conclusions are not
  - an experiment-specific problem is better addressed if conceived as a shared one in a shared Tier-1
  - an experiment DC just provides hints, real work gives insight
  - → crucial role of the experiments at the Tier-1
- Find weaknesses of the CASTOR MSS system in particular operating conditions
- Stress-test the new LSF farm with official CMS production jobs
- Test DNS-based load-balancing by serving data for production and/or analysis from CMS disk-servers
- Test new components, newly installed/upgraded Grid tools, etc.
- Find bottleneck and scalability problems in DB services
- Give feedback on monitoring and accounting activities
29. PhEDEx at INFN
- INFN-CNAF is a T1 node in PhEDEx
  - the CMS DC04 experience was crucial to start up PhEDEx in INFN
  - the CNAF node has been operational since the beginning
- First phase (Q3/4 2004)
  - agent code development, focus on operations of T0→T1 transfers
  - >1 TB/day T0→T1 demonstrated feasible
  - but the aim is not to achieve peaks, it is to sustain such rates in normal operations
- Second phase (Q1 2005)
  - PhEDEx deployment in INFN to Tier-n, n>1
  - distributed topology scenario
  - Tier-n agents run at the remote sites, not at the T1 (know-how required, T1 support)
  - already operational at Legnaro, Pisa, Bari, Bologna
  - an example data flow to T2s in daily operations (here a test with 2000 files, 90 GB, with no optimization): 450 Mbps CNAF T1 → LNL T2, 205 Mbps CNAF T1 → Pisa T2
- Third phase (Q>1 2005)
  - many issues, e.g. stability of the service, dynamic routing, coupling PhEDEx to the CMS official production system, PhEDEx involvement in SC3 phase II, etc.
30. Storage Resources Management and Access at the TIER1 CNAF
- Pier Paolo Ricci, Giuseppe Lore, Vincenzo Vagnoni, on behalf of the INFN TIER1 staff (pierpaolo.ricci@cnaf.infn.it)
- ACAT 2005, May 22-27 2005, DESY Zeuthen, Germany
31. TIER1 INFN CNAF Storage
Diagram of the storage infrastructure, accessed from the WAN or TIER1 LAN via NFS, RFIO, GridFTP and other protocols by Linux SL 3.0 clients (100-1000 nodes):
- NAS (20 TB): NAS1/NAS4 3ware IDE SAS (1800 and 3200 GByte), Procom 3600 FC NAS2 (9000 GByte) and NAS3 (4700 GByte)
- HSM (400 TB): CASTOR HSM servers (H.A.); STK180 library with 100 LTO-1 tapes (10 TByte native); STK L5500 robot (5500 slots) with 6 IBM LTO-2 and 2 (4) STK 9940B drives; W2003 server with LEGATO Networker for backup
- SAN 1 (200 TB) and SAN 2 (40 TB): diskservers with Qlogic FC HBA 2340; Infortrend 4 x 3200 GByte SATA A16F-R1A2-M1; Infortrend 5 x 6400 GByte SATA A16F-R1211-M2 JBOD; IBM FastT900 (DS 4500) 3/4 x 50000 GByte with 4 FC interfaces; STK BladeStore (about 25000 GByte, 4 FC interfaces); AXUS BROWIE (about 2200 GByte, 2 FC interfaces); 2 Brocade Silkworm 3900 32-port FC switches; 2 Gadzoox Slingshot 4218 18-port FC switches
32. CASTOR HSM
Diagram of the CASTOR setup (point-to-point FC 2 Gb/s connections; fully redundant FC 2 Gb/s connections to SAN 1 and SAN 2 with dual-controller HW and Qlogic SANsurfer path-failover SW; access from the WAN or TIER1 LAN):
- STK L5500 library, 2000 + 3500 mixed slots; 6 LTO-2 drives (20-30 MB/s) and 2 9940B drives (25-30 MB/s); 1300 LTO-2 cartridges (200 GB native) and 650 9940B cartridges (200 GB native)
- 8 tapeservers, Linux RH AS 3.0, HBA Qlogic 2300
- Sun Blade v100 with 2 internal IDE disks in software RAID-0, running ACSLS 7.0 on Solaris 9.0
- 1 CASTOR (CERN) central services server, RH AS 3.0
- 1 ORACLE 9i rel. 2 DB server, RH AS 3.0
- 6 stagers with diskserver, RH AS 3.0, 15 TB local staging area
- 8 or more RFIO diskservers, RH AS 3.0, min. 20 TB staging area

Experiment           Staging area (TB)   Tape pool (TB native)
ALICE                        8                   12
ATLAS                        6                   20
CMS                          2                   15
LHCb                        18                   30
BaBar, AMS, others           2                    4
33. DISK access (2)
- We have different protocols in production for accessing the disk storage. In our diskservers and Grid SE front-ends we currently have (see the access sketch after this list):
  - NFS on a local filesystem. ADV: easy client implementation, good compatibility, and possibility of failover (RH 3.0). DIS: bad performance scalability for a high number of accesses (1 client: 30 MB/s, 100 clients: 15 MB/s throughput).
  - RFIO on a local filesystem. ADV: good performance, compatibility with Grid tools, and possibility of failover. DIS: no scalability of front-ends for a single filesystem, no possibility of load-balancing.
  - Grid SE: GridFTP/RFIO over GPFS (CMS, CDF). ADV: separation between the GPFS servers (accessing the disks) and the SE GPFS clients; load balancing and HA on the GPFS servers, and the possibility to implement the same on the Grid SE services (see next slide). DIS: GPFS layer requirements on OS and certified hardware for support.
  - Xrootd (BaBar). ADV: good performance. DIS: no possibility of load-balancing for the single filesystem backends, not Grid compliant (at present...).
- NOTE: IBM GPFS 2.2 is a CLUSTERED FILESYSTEM, so it is possible for many front-ends (i.e. gridftp or rfio servers) to access the SAME filesystem simultaneously. It also allows bigger filesystem sizes (we use 8-12 TB).
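Seen from a ROOT client, the choice among these protocols is largely the URL scheme passed to TFile::Open() (assuming the corresponding plugins are installed); the paths below are made-up examples, not actual CNAF paths.

    {
       // NFS or GPFS mount: a plain POSIX path on the worker node
       TFile *f1 = TFile::Open("/gpfs/cms/data/run1.root");
       // RFIO through an rfio diskserver front-end
       TFile *f2 = TFile::Open("rfio:/castor/example.it/cms/run1.root");
       // xrootd served by a diskserver (as used by BaBar)
       TFile *f3 = TFile::Open("root://diskserver.example.it//babar/run1.root");
    }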
34. Generic Benchmark (here shown for 1 GB files)

                       WRITE (MB/s)                  READ (MB/s)
Simultaneous clients   1    5   10   50  120     1    5   10   50  120
GPFS 2.3.0-1 native  114  160  151  147  147    85  301  301  305  305
GPFS 2.3.0-1 NFS     102  171  171  159  158   114  320  366  322  292
GPFS 2.3.0-1 RFIO     79  171  158  166  166    79  320  301  320  321
Lustre 1.4.1 native  102  512  512  488  478    73  366  640  453  403
Lustre 1.4.1 RFIO     93  301  320  284  281    68  269  269  314  349

- Numbers are reproducible with small fluctuations
- Lustre tests with NFS export not yet performed
35. Grid Technology in Production at DESY
- Andreas Gellrich, DESY, ACAT 2005, 24 May 2005, http://www.desy.de/gellrich/
36. Grid @ DESY
- With the HERA-II luminosity upgrade, the demand for MC production rapidly increased while the outside collaborators moved their computing resources towards LCG
- The ILC group plans to use Grids for its computing needs
- The LQCD group develops a Data Grid to exchange data
- DESY considers participation in LHC experiments
- EGEE and D-GRID
- dCache is a DESY / FNAL development
- An LCG-2 Grid infrastructure has been in operation since spring 2004
37. Grid Infrastructure @ DESY
- DESY installed (SL3.04, Quattor, yaim) and operates a complete, independent Grid infrastructure which provides generic (non-experiment-specific) Grid services to all experiments and groups
- The DESY Production Grid is based on LCG-2_4_0 and includes:
  - Resource Broker (RB), Information Index (BDII), Proxy (PXY)
  - Replica Location Services (RLS)
  - in total 24 + 17 WNs (48 + 34 = 82 CPUs)
  - dCache-based SE with access to the entire DESY data space
  - VO management for the HERA experiments (hone, herab, hermes, zeus), LQCD (ildg), ILC (ilc, calice), astroparticle physics (baikal, icecube)
  - certification services for DESY users in cooperation with GridKa
39. Grid Middleware Configuration at the KIPT CMS Linux Cluster
- S. Zub, L. Levchuk, P. Sorokin, D. Soroka
- Kharkov Institute of Physics and Technology, 61108 Kharkov, Ukraine
- http://www.kipt.kharkov.ua/cms, stah@kipt.kharkov.ua
40. What is our specificity?
- Small PC farm (KCC)
- Small scientific group of 4 physicists, combining their work with system administration
- Orientation towards CMS tasks
- No commercial software installed
- Security handled in-house
- Narrow-bandwidth communication channel
- Limited traffic
41. Summary
- The enormous data flow expected in the LHC experiments forces the HEP community to resort to Grid technology
- The KCC is a specialized PC farm constructed at the NSC KIPT for computer simulations within the CMS physics program and for preparation of the CMS data analysis
- Further development of the KCC is planned, with a considerable increase of its capacities and deeper integration into the LHC Grid (LCG) structures
- Configuration of the LCG middleware can be troublesome (especially at small farms with poor internet connections), since this software is neither universal nor complete, and one has to resort to special tips
- Scripts have been developed that facilitate the installation procedure at a small PC farm with narrow internet bandwidth
42. Applications on the Grid
- The CMS analysis chain in a distributed environment
- Monte Carlo mass production for ZEUS on the Grid
- Metadata services on the Grid
- Performance comparison of the LCG2 and gLite file catalogues
- Data Grids for Lattice QCD
43. The CMS Analysis Chain in a Distributed Environment
- Nicola De Filippis, on behalf of the CMS collaboration
- ACAT 2005, DESY, Zeuthen, Germany, 22nd-27th May 2005
44. The CMS analysis tools
- Overview
- Data management:
  - data transfer service: PhEDEx
  - data validation: ValidationTools
  - data publication service: RefDB/PubDB
- Analysis strategy:
  - distributed software installation: XCMSI
  - analysis job submission tool: CRAB
- Job monitoring:
  - system monitoring: BOSS
  - application job monitoring: JAM
45. The end-user analysis workflow
- The user provides:
  - the dataset (runs, events, ...)
  - private code
- CRAB (the job submission tool) discovers the data and the sites hosting them by querying the dataset catalogues RefDB/PubDB
- CRAB prepares, splits and submits the jobs to the Resource Broker (Workload Management System)
- The RB sends the jobs to sites hosting the data, provided the CMS software has been installed there (via XCMSI); the jobs run on a Computing Element / Worker Node and read from the Storage Element
- CRAB automatically retrieves the output files of the job
46.
- The first CMS working prototype for Distributed User Analysis is available and used by real users
- PhEDEx, PubDB, ValidationTools, XCMSI, CRAB, BOSS and JAM are under development and deployment, and in production at many sites
- CMS is using the Grid infrastructure for physics analyses and Monte Carlo production
  - tens of users, 10 million analysed events, 10000 jobs submitted
- CMS is designing a new architecture for the analysis workflow
49. Metadata Services on the GRID
- Nuno Santos
- ACAT05, May 25th, 2005
50. Metadata on the GRID
- Metadata is data about data
- Metadata on the GRID:
  - mainly information about files
  - other information necessary for running jobs
  - usually stored in databases
- Need a simple interface for metadata access (see the hypothetical sketch after this list)
- Advantages:
  - easier to use by clients: no SQL, only metadata concepts
  - common interface: clients don't have to reinvent the wheel
- Must be integrated with the File Catalogue
- Also suitable for storing information about other resources
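A toy sketch of what "metadata concepts instead of SQL" means for a client; the catalogue class and its methods are hypothetical illustrations, not the ARDA interface.

    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // Toy metadata catalogue: entries (e.g. logical file names) carry free-form
    // attributes; clients never see tables or SQL.
    class MetadataCatalogue {
    public:
        void setAttribute(const std::string& entry,
                          const std::string& attr, const std::string& value) {
            entries_[entry][attr] = value;
        }
        // Select entries whose attribute value satisfies a predicate.
        std::vector<std::string> query(const std::string& attr,
                                       const std::function<bool(const std::string&)>& pred) const {
            std::vector<std::string> result;
            for (const auto& [name, attrs] : entries_) {
                auto it = attrs.find(attr);
                if (it != attrs.end() && pred(it->second)) result.push_back(name);
            }
            return result;
        }
    private:
        std::map<std::string, std::map<std::string, std::string>> entries_;
    };

    int main() {
        MetadataCatalogue cat;
        cat.setAttribute("/grid/cms/run100.root", "energy", "7000");
        cat.setAttribute("/grid/cms/run101.root", "energy", "900");
        // "Find all files at energy 7000" expressed as metadata, not as SQL.
        for (const auto& lfn : cat.query("energy",
                                         [](const std::string& v) { return v == "7000"; }))
            std::cout << lfn << '\n';
        return 0;
    }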
51. ARDA Implementation
- Backends:
  - currently Oracle, PostgreSQL, SQLite
- Two frontends:
  - TCP streaming: chosen for performance
  - SOAP: formal requirement of EGEE; compare SOAP with TCP streaming
- Also implemented as a standalone Python library
- Data stored on the filesystem
52. SOAP toolkit performance
- Test of the communication performance (1000 pings; see the timing sketch below)
  - no work done on the backend
  - switched 100 Mbit/s LAN
- Language comparison:
  - TCP streaming shows similar performance in all languages
  - SOAP performance varies strongly with the toolkit
- Protocol comparison:
  - keepalive improves performance significantly
  - in Java and Python, SOAP is several times slower than TCP streaming
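For orientation, the kind of number behind the "1000 pings" test can be obtained with a tiny round-trip timer over a plain TCP socket; the server address, port and message format are placeholders, and the real ARDA benchmark clients are more elaborate.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <chrono>
    #include <cstdio>

    int main() {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        sockaddr_in srv{};
        srv.sin_family = AF_INET;
        srv.sin_port   = htons(8822);                       // placeholder server port
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);
        if (connect(fd, reinterpret_cast<sockaddr*>(&srv), sizeof(srv)) < 0) {
            perror("connect");
            return 1;
        }

        const int N = 1000;
        char buf[16];
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < N; ++i) {                       // 1000 request/response round trips
            send(fd, "ping", 4, 0);
            recv(fd, buf, sizeof(buf), 0);
        }
        auto t1 = std::chrono::steady_clock::now();
        double us = std::chrono::duration<double, std::micro>(t1 - t0).count() / N;
        std::printf("average round trip: %.1f us\n", us);
        close(fd);
        return 0;
    }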
59. High speed Computing
- InfiniBand
- Analysis of SCTP- and TCP-based communication in a high-speed cluster
- The apeNEXT project
- Optimisation of Lattice QCD codes for the Opteron processor
60. InfiniBand Experiences at Forschungszentrum Karlsruhe
- A. Heiss, U. Schwickerath (Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft)
- InfiniBand overview
- Hardware setup at IWR
- HPC applications:
  - MPI performance
  - lattice QCD
  - LM
- HTC applications:
  - RFIO
  - xrootd
- Credits: Inge Bischoff-Gauss, Marc García Martí, Bruno Hoeft, Carsten Urbach
61. Lattice QCD Benchmark: GE vs. InfiniBand
- Memory- and communication-intensive application
- Benchmark by C. Urbach
- See also the CHEP04 talk given by A. Heiss
- Significant speedup by using InfiniBand
- Thanks to Carsten Urbach (FU Berlin and DESY Zeuthen)
62. RFIO/IB point-to-point file transfers (64 bit)
- PCI-X and PCI-Express throughput
- Notes:
  - best results with PCI-Express: >800 MB/s raw transfer speed, >400 MB/s file transfer speed
  - RFIO/IB: see ACAT03, NIM A 534 (2004) 130-134
  - disclaimer on PPC64: not an official IBM product, technology prototype (see also slides 5 and 6)
  - plot: solid lines are file transfers (cache → /dev/null), dashed lines are the network protocol only
63. Xrootd and InfiniBand
- Notes: first preliminary results
- IPoIB notes:
  - dual Opteron V20z
  - Mellanox Gold drivers
  - SM on InfiniCon 9100
  - same nodes as for GE
- Native IB notes:
  - proof-of-concept version
  - based on Mellanox VAPI
  - using IB_SEND
  - dedicated send/recv buffers
  - same nodes as above
- 10GE notes:
  - IBM xSeries 345 nodes
  - Xeon 32 bit, single CPU
  - 1 and 2 GB RAM
  - 2.66 GHz clock speed
  - Intel PRO/10GbE LR cards
  - used for long-distance tests
64. TCP vs. SCTP in a High-Speed Cluster Environment
- Miklos Kozlovszky
- Budapest University of Technology and Economics (BUTE)
65. TCP vs. SCTP
- Both:
  - IPv4 and IPv6 compatible
  - reliable
  - connection oriented
  - offer acknowledged, error-free, non-duplicated transfer
  - almost the same flow and congestion control
- Differences (see also the socket sketch below):

TCP                                    SCTP
Byte-stream oriented                   Message oriented
3-way handshake connection init       4-way handshake connection init (cookie)
Old (more than 20 years)               Quite new (2000-)
                                       Multihoming
                                       Path-MTU discovery
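A minimal sketch of what SCTP looks like at the sockets API level on a Linux host with SCTP support (lksctp); apart from the protocol argument, the calls are the familiar TCP ones.

    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <cstdio>

    int main() {
        // Same Berkeley sockets API as TCP; only the protocol argument differs.
        int tcp  = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        int sctp = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP); // one-to-one style association
        if (sctp < 0)
            std::perror("SCTP not available in this kernel");
        else
            std::printf("TCP fd=%d, SCTP fd=%d\n", tcp, sctp);
        return 0;
    }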
66. Summary
- SCTP inherited all the good features of TCP
- SCTP wants to behave like a next-generation TCP
- It is more secure than TCP, and has many attractive features (e.g. multihoming)
- Theoretically it can work better than TCP, but TCP is faster at present (SCTP implementations are still immature)
- Well standardized, and can be useful for clusters
75. My Impressions
76. Concerns
- Only a small fraction of the Session I talks correspond to the original spirit of the AIHEP/ACAT Session I talks.
- In particular, many of the GRID talks about deployment and infrastructure should be given at CHEP, not here.
- The large LHC collaborations have their own ACAT a few times per year.
- The huge experiment software frameworks do not encourage cross-experiment discussions or tools.
- For the next ACAT, the key people involved in the big experiments should work together to encourage more talks or reviews.
77. Positive aspects
- ACAT continues to be a good opportunity to meet other cultures. Innovation may come from small groups or from non-HENP fields.
- Contacts (even sporadic) with Session III or plenary talks are very beneficial, in particular to young people.
78. The Captain of Köpenick
- Question to the audience:
- Is Friedrich Wilhelm Voigt (the Captain of Köpenick) an ancestor of Voigt, the father of the Voigt function?