Title: Summary Session I
1. Summary Session I
27 May 2005
ACAT05
2. Outline
19 presentations:
- Data Analysis, Data Acquisition and Tools (6)
- GRID Deployment (4)
- Applications on the GRID (5)
- High Speed Computing (4)
3. Data Analysis, Acquisition, Tools
- Evolution of the BaBar configuration database design
- DAQ software for the SND detector
- Interactive analysis environment of Unified Accelerator Libraries
- DaqProVis, a toolkit for acquisition, analysis, visualisation
- The Graphics Editor in ROOT
- Parallel interactive and batch HEP data analysis with PROOF
4. Evolution of the Configuration Database Design
- Andrei Salnikov, SLAC
- For the BaBar Computing Group, ACAT05, DESY, Zeuthen
5. BaBar database migration
- BaBar was using the Objectivity/DB ODBMS for many of its databases
- About two years ago it started migrating the event store from Objectivity to ROOT, which was a success and an improvement
- No reason to keep pricey Objectivity only because of the secondary databases
- Migration effort started in 2004 for the conditions, configuration, prompt reconstruction, and ambient databases
6. Configuration database API
- Main problem of the old database API: it exposed too much of the implementation technology
  - persistent objects, handles, class names, etc.
- The API has to change, but we don't want to make the same mistakes again (new mistakes are more interesting)
- Pure transient-level abstract API, independent of any specific implementation technology (a minimal sketch follows this list)
- Always make abstract APIs to avoid problems in the future (this may be hard and need a few iterations)
- Client code should be free from any specific database implementation details
- Early prototyping could answer a lot of questions, but five years of experience count too
- Use different implementations for clients with different requirements
- The implementation would benefit from features currently missing in C++: reflection, introspection (or from a completely new language)
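As an illustration of such a technology-neutral layer, here is a minimal sketch of what a transient-level abstract API could look like; the class and method names are hypothetical, not the actual BaBar interface.

    #include <map>
    #include <string>
    #include <vector>

    // Transient value object handed to clients: no persistent handles and no
    // storage-technology class names leak through this layer.
    struct ConfigEntry {
        std::string key;
        std::string value;
    };

    // Abstract, technology-neutral interface; concrete backends (ROOT files,
    // a relational database, ...) implement it out of the clients' sight.
    class ConfigDatabase {
    public:
        virtual ~ConfigDatabase() = default;
        virtual std::vector<ConfigEntry> read(const std::string& configName) = 0;
        virtual void write(const std::string& configName,
                           const std::vector<ConfigEntry>& entries) = 0;
    };

    // Trivial in-memory backend, useful for tests and prototyping; a production
    // backend would talk to the real store behind the same interface.
    class InMemoryConfigDatabase : public ConfigDatabase {
    public:
        std::vector<ConfigEntry> read(const std::string& configName) override {
            return store_[configName];
        }
        void write(const std::string& configName,
                   const std::vector<ConfigEntry>& entries) override {
            store_[configName] = entries;
        }
    private:
        std::map<std::string, std::vector<ConfigEntry>> store_;
    };

Client code holds only the abstract ConfigDatabase type, so swapping the persistence technology underneath (as in the Objectivity-to-ROOT migration) leaves the clients untouched.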
7. DAQ software for the SND detector
- Budker Institute of Nuclear Physics, Novosibirsk
- M. Achasov, A. Bogdanchikov, A. Kim, A. Korol
8. Main data flow
Data-flow diagram: readout and event building (1 kHz x 4 KB), event packing (1 kHz x 1 KB), event filtering (100 Hz x 1 KB).
- Expected rates:
  - event fragments: 4 MB/s, read from the I/O processors over Ethernet
  - event building: 4 MB/s
  - event packing: 1 MB/s
  - event filtering (90% screening): 100 KB/s
9. DAQ architecture
Block diagram of the DAQ: detector front-end electronics (x12, x16) feed readout and event building; events pass through a buffer, are filtered, and the filtered events go to backup and off-line storage. Supporting components: visualization, database, calibration process, system support.
11. Interactive Analysis Environment of Unified Accelerator Libraries
- V. Fine, N. Malitsky, R. Talman
12. Abstract
- The Unified Accelerator Libraries (UAL, http://www.ual.bnl.gov) software is an open accelerator simulation environment addressing a broad spectrum of accelerator tasks, ranging from online-oriented efficient models to full-scale realistic beam dynamics studies. The paper introduces a new package integrating UAL simulation algorithms with a Qt-based Graphical User Interface and an open collection of analysis and visualization components. The primary user application is implemented as an interactive and configurable Accelerator Physics Player whose extensibility is provided by a plug-in architecture. Its interface to data analysis and visualization modules is based on the Qt layer (http://root.bnl.gov) developed and supported by the STAR experiment. The present version embodies the ROOT (http://root.cern.ch) data analysis framework and the Coin3D (http://www.coin3d.org) graphics library.
13. Accelerator Physics Player

    UAL::USPAS::BasicPlayer* player = new UAL::USPAS::BasicPlayer();
    player->setShell(shell);        // attach the UAL shell driving the simulation
    qApp.setMainWidget(player);     // make the player the main Qt widget
    player->show();
    qApp.exec();                    // enter the Qt event loop

An open collection of viewers
An open collection of algorithms
14. Examples of the Accelerator-Specific Viewers
- Turn-by-turn BPM data (based on ROOT TH2F or TGraph)
- Twiss plots (based on ROOT TGraph)
- Bunch 3D distributions (based on Coin3D)
15. Parallel Interactive and Batch HEP Data Analysis with PROOF
- Maarten Ballintijn, Marek Biskup, Rene Brun, Philippe Canal, Derek Feichtinger, Gerardo Ganis, Guenter Kickinger, Andreas Peters, Fons Rademakers
- MIT, CERN, FNAL, PSI
16. ROOT Analysis Model
Standard model:
- Files analyzed on a local computer
- Remote data accessed via a remote file server (rootd/xrootd)
Diagram: the client reads a local file directly, or a remote file (dCache, Castor, RFIO, Chirp) through a rootd/xrootd server.
17. PROOF Basic Architecture
Single-cluster mode:
- The master divides the work among the slaves
- After the processing finishes, it merges the results (histograms, scatter plots)
- and returns the result to the client
Diagram: the client sends commands and scripts to the master; the master distributes the work to slaves reading the files, and histograms and plots come back to the client. (A client-side usage sketch follows below.)
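To make the client side of this picture concrete, here is a minimal sketch of submitting a query to a PROOF cluster from a ROOT session; the master host, file URLs, tree name and selector are placeholders, and the calls follow the later ROOT 5/6 API rather than the exact 2005 one.

    {
       TProof::Open("proofmaster.example.org");           // connect to the master

       TChain chain("Events");                            // placeholder tree name
       chain.Add("root://se.example.org//data/run*.root");

       chain.SetProof();                                  // route processing through PROOF
       chain.Process("MySelector.C+");                    // selector source is shipped to the slaves
    }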
18. PROOF and Selectors
- The analysis code is shipped to each slave, where SlaveBegin(), Init(), Process() and SlaveTerminate() are executed
- SlaveBegin() initializes each slave
- Many trees are processed
- No user control of the entry loop!
- The same code also works without PROOF (see the selector skeleton below).
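A skeleton of such a selector, written against the later ROOT 6 TSelector interface (class name, histogram and branch handling are illustrative only; real selectors are usually generated with TTree::MakeSelector()):

    #include "TSelector.h"
    #include "TTree.h"
    #include "TH1F.h"

    class MySelector : public TSelector {
    public:
       TH1F *fHist = nullptr;                    // booked on each slave, merged by the master

       void SlaveBegin(TTree *) override {       // runs once per slave
          fHist = new TH1F("h", "example", 100, 0., 10.);
          fOutput->Add(fHist);                   // objects in fOutput are merged and returned
       }
       void Init(TTree *tree) override {         // called for each new tree of the chain
          // set branch addresses on 'tree' here
       }
       Bool_t Process(Long64_t entry) override { // called for every entry, on whichever slave owns it
          // read the entry from the chain, apply cuts, fill fHist ...
          return kTRUE;
       }
       void SlaveTerminate() override { }        // per-slave cleanup
       void Terminate() override { }             // runs on the client with the merged output

       ClassDefOverride(MySelector, 0);
    };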
19. Analysis session snapshot
What we are implementing:
- Monday at 10h15, ROOT session on my laptop:
  - AQ1: 1 s query, produces a local histogram
  - AQ2: a 10 min query submitted to PROOF1
  - AQ3→AQ7: short queries
  - AQ8: a 10 h query submitted to PROOF2
- Monday at 16h25, ROOT session on my laptop:
  - BQ1: browse results of AQ2
  - BQ2: browse temporary results of AQ8
  - BQ3→BQ6: submit four 10 min queries to PROOF1
- Wednesday at 8h40, session on any web browser:
  - CQ1: browse results of AQ8 and BQ3→BQ6
20. ROOT Graphics Editor, by Ilka Antcheva
- The ROOT graphics editor can be (a small usage sketch follows below):
  - Embedded: connected only with the canvas in the application window
  - Global: has its own application window and can be connected to any canvas created in a ROOT session
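A small sketch of reaching the embedded editor from a ROOT session (ROOT 5/6-style calls; the histogram and canvas are just examples):

    {
       TCanvas *c = new TCanvas("c", "editor demo", 600, 400);
       TH1F *h = new TH1F("h", "demo;x;entries", 50, -3., 3.);
       h->FillRandom("gaus", 1000);
       h->Draw();

       c->ToggleEditor();   // embedded editor, docked inside this canvas window
       // The global editor has its own window and can be attached to any
       // canvas of the session.
    }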
21. Focus on Users
- Novices (for a short time)
  - theoretical understanding, no practical experience with ROOT
  - impatient with learning concepts, patient with performing tasks
- Advanced beginners (many people remain at this level)
  - focus on a few tasks and learn more on a need-to-do basis
  - perform several given tasks well
- Competent performers (fewer than the previous class)
  - know and perform complex tasks that require coordinated actions
  - interested in solving problems and tracking down errors
- Experts (identified by others)
  - able to find solutions in complex functionality
  - interested in the theories behind the design
  - interested in interacting with other expert systems
22. DaqProVis (M. Morhac)
- DaqProVis: a toolkit for acquisition, interactive analysis, processing and visualization of multidimensional data
- Basic features:
  - DaqProVis is well suited for interactive analysis of multiparameter data from small and medium sized experiments in nuclear physics.
  - The data acquisition part of the system allows one to acquire multiparameter events either directly from the experiment or from a list file, i.e., the system can work in either on-line or off-line acquisition mode.
  - In on-line acquisition mode, events can be taken directly from CAMAC crates or from a VME system that cooperates with DaqProVis in client-server mode.
  - In off-line acquisition mode the system can analyze event data even from big experiments, e.g. from Gammasphere.
  - The event data can also be read from another DaqProVis system. The capability of DaqProVis to work simultaneously in both client and server mode makes it possible to build remote as well as distributed nuclear data acquisition, processing and visualization systems, and thus to create multilevel configurations.
23. DaqProVis (Visualisation)
24. DaqProVis (continued)
- The DaqProVis and ROOT teams are already cooperating.
- Agreement during the workshop to extend this cooperation.
25. GRID deployment
- Towards the operation of the Italian Tier-1 for CMS: lessons learned from the CMS Data Challenge
- GRID technology in production at DESY
- Grid middleware configuration at the KIPT CMS Linux cluster
- Storage resources management and access at the Tier-1 CNAF
26. Towards the Operation of the Italian Tier-1 for CMS: Lessons Learned from the CMS Data Challenge
- D. Bonacorsi (on behalf of the INFN-CNAF Tier-1 staff and the CMS experiment)
- ACAT 2005, X International Workshop on Advanced Computing and Analysis Techniques in Physics Research, May 22nd-27th, 2005, DESY, Zeuthen, Germany
27. DC04 outcome (grand summary, focus on INFN T1)
- Reconstruction/data-transfer/analysis may run at 25 Hz
- Automatic registration and distribution of data, key role of the TMDB
  - it was the embryonic PhEDEx!
- Support a (reasonable) variety of different data transfer tools and set-ups
  - Tier-1s showed different performances, related to operational choices
  - SRB, LCG Replica Manager and SRM investigated (see the CHEP04 talk)
  - INFN T1: good performance of the LCG-2 chain (PIC T1 also)
- Register all data and metadata (POOL) in a world-readable catalogue
  - RLS good as a global file catalogue, bad as a global metadata catalogue
- Analyze the reconstructed data at the Tier-1s as the data arrive
  - LCG components: dedicated BDII+RB, UIs, CEs+WNs at CNAF and PIC
  - real-time analysis at Tier-2s was demonstrated to be possible
  - 15k jobs submitted
  - the time window between reco data availability and the start of analysis jobs can be reasonably low (i.e. 20 mins)
- Reduce the number of files (i.e. increase <events>/<file>)
  - more efficient use of bandwidth
  - reduced overhead of commands
- Address scalability of MSS systems (!)
28. Learn from DC04 lessons
- Some general considerations may apply:
  - although a DC is experiment-specific, maybe its conclusions are not
  - an experiment-specific problem is better addressed if conceived as a shared one in a shared Tier-1
  - an experiment DC just provides hints, real work gives insight
  - → crucial role of the experiments at the Tier-1
- Find weaknesses of the CASTOR MSS system in particular operating conditions
- Stress-test the new LSF farm with official CMS production jobs
- Test DNS-based load-balancing by serving data for production and/or analysis from CMS disk-servers
- Test new components, newly installed/upgraded Grid tools, etc.
- Find bottleneck and scalability problems in DB services
- Give feedback on monitoring and accounting activities
29. PhEDEx at INFN
- INFN-CNAF is a T1 node in PhEDEx
  - the CMS DC04 experience was crucial to start up PhEDEx in INFN
  - the CNAF node has been operational since the beginning
- First phase (Q3/4 2004)
  - agent code development, focus on operations of T0→T1 transfers
  - >1 TB/day T0→T1 demonstrated feasible
  - but the aim is not to achieve peaks, it is to sustain such rates in normal operations
- Second phase (Q1 2005)
  - PhEDEx deployment in INFN to Tier-n, n>1
  - distributed topology scenario
  - Tier-n agents run at the remote sites, not at the T1 (know-how required, T1 support)
  - already operational at Legnaro, Pisa, Bari, Bologna
  - an example data flow to T2s in daily operations (here a test with 2000 files, 90 GB, with no optimization): 450 Mbps CNAF T1 → LNL T2, 205 Mbps CNAF T1 → Pisa T2
- Third phase (Q>1 2005)
  - many issues, e.g. stability of the service, dynamic routing, coupling PhEDEx to the CMS official production system, PhEDEx involvement in SC3 phase II, etc.
30. Storage Resources Management and Access at the TIER1 CNAF
- Pier Paolo Ricci, Giuseppe Lore, Vincenzo Vagnoni, on behalf of the INFN TIER1 staff (pierpaolo.ricci@cnaf.infn.it)
- ACAT 2005, May 22-27 2005, DESY Zeuthen, Germany
31. TIER1 INFN CNAF Storage
Diagram of the storage infrastructure, accessed from the WAN or TIER1 LAN via NFS, RFIO, GridFTP and other protocols by Linux SL 3.0 clients (100-1000 nodes):
- NAS (20 TB): NAS1/NAS4 3ware IDE SAS (1800 and 3200 GByte), Procom 3600 FC NAS2 (9000 GByte) and NAS3 (4700 GByte)
- HSM (400 TB): CASTOR HSM servers (H.A.); STK180 library with 100 LTO-1 tapes (10 TByte native); STK L5500 robot (5500 slots) with 6 IBM LTO-2 and 2 (4) STK 9940B drives; W2003 server with LEGATO Networker for backup
- SAN 1 (200 TB) and SAN 2 (40 TB): diskservers with Qlogic FC HBA 2340; Infortrend 4 x 3200 GByte SATA A16F-R1A2-M1; Infortrend 5 x 6400 GByte SATA A16F-R1211-M2 JBOD; IBM FastT900 (DS 4500) 3/4 x 50000 GByte with 4 FC interfaces; STK BladeStore (about 25000 GByte, 4 FC interfaces); AXUS BROWIE (about 2200 GByte, 2 FC interfaces); 2 Brocade Silkworm 3900 32-port FC switches; 2 Gadzoox Slingshot 4218 18-port FC switches
32. CASTOR HSM
Diagram of the CASTOR setup (point-to-point FC 2 Gb/s connections; fully redundant FC 2 Gb/s connections to SAN 1 and SAN 2 with dual-controller HW and Qlogic SANsurfer path-failover SW; access from the WAN or TIER1 LAN):
- STK L5500 library, 2000 + 3500 mixed slots; 6 LTO-2 drives (20-30 MB/s) and 2 9940B drives (25-30 MB/s); 1300 LTO-2 cartridges (200 GB native) and 650 9940B cartridges (200 GB native)
- 8 tapeservers, Linux RH AS 3.0, HBA Qlogic 2300
- Sun Blade v100 with 2 internal IDE disks in software RAID-0, running ACSLS 7.0 on Solaris 9.0
- 1 CASTOR (CERN) central services server, RH AS 3.0
- 1 ORACLE 9i rel. 2 DB server, RH AS 3.0
- 6 stagers with diskserver, RH AS 3.0, 15 TB local staging area
- 8 or more RFIO diskservers, RH AS 3.0, min. 20 TB staging area

Experiment           Staging area (TB)   Tape pool (TB native)
ALICE                        8                   12
ATLAS                        6                   20
CMS                          2                   15
LHCb                        18                   30
BaBar, AMS, others           2                    4
33. DISK access (2)
- We have different protocols in production for accessing the disk storage. In our diskservers and Grid SE front-ends we currently have (see the access sketch after this list):
  - NFS on a local filesystem. ADV: easy client implementation, good compatibility, and possibility of failover (RH 3.0). DIS: bad performance scalability for a high number of accesses (1 client: 30 MB/s, 100 clients: 15 MB/s throughput).
  - RFIO on a local filesystem. ADV: good performance, compatibility with Grid tools, and possibility of failover. DIS: no scalability of front-ends for a single filesystem, no possibility of load-balancing.
  - Grid SE: GridFTP/RFIO over GPFS (CMS, CDF). ADV: separation between the GPFS servers (accessing the disks) and the SE GPFS clients; load balancing and HA on the GPFS servers, and the possibility to implement the same on the Grid SE services (see next slide). DIS: GPFS layer requirements on OS and certified hardware for support.
  - Xrootd (BaBar). ADV: good performance. DIS: no possibility of load-balancing for the single filesystem backends, not Grid compliant (at present...).
- NOTE: IBM GPFS 2.2 is a CLUSTERED FILESYSTEM, so it is possible for many front-ends (i.e. gridftp or rfio servers) to access the SAME filesystem simultaneously. It also allows bigger filesystem sizes (we use 8-12 TB).
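Seen from a ROOT client, the choice among these protocols is largely the URL scheme passed to TFile::Open() (assuming the corresponding plugins are installed); the paths below are made-up examples, not actual CNAF paths.

    {
       // NFS or GPFS mount: a plain POSIX path on the worker node
       TFile *f1 = TFile::Open("/gpfs/cms/data/run1.root");
       // RFIO through an rfio diskserver front-end
       TFile *f2 = TFile::Open("rfio:/castor/example.it/cms/run1.root");
       // xrootd served by a diskserver (as used by BaBar)
       TFile *f3 = TFile::Open("root://diskserver.example.it//babar/run1.root");
    }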
34. Generic Benchmark (here shown for 1 GB files)

                       WRITE (MB/s)                  READ (MB/s)
Simultaneous clients   1    5   10   50  120     1    5   10   50  120
GPFS 2.3.0-1 native  114  160  151  147  147    85  301  301  305  305
GPFS 2.3.0-1 NFS     102  171  171  159  158   114  320  366  322  292
GPFS 2.3.0-1 RFIO     79  171  158  166  166    79  320  301  320  321
Lustre 1.4.1 native  102  512  512  488  478    73  366  640  453  403
Lustre 1.4.1 RFIO     93  301  320  284  281    68  269  269  314  349

- Numbers are reproducible with small fluctuations
- Lustre tests with NFS export not yet performed
35. Grid Technology in Production at DESY
- Andreas Gellrich, DESY, ACAT 2005, 24 May 2005, http://www.desy.de/gellrich/
36. Grid @ DESY
- With the HERA-II luminosity upgrade, the demand for MC production rapidly increased while the outside collaborators moved their computing resources towards LCG
- The ILC group plans to use Grids for its computing needs
- The LQCD group develops a Data Grid to exchange data
- DESY considers participation in LHC experiments
- EGEE and D-GRID
- dCache is a DESY / FNAL development
- An LCG-2 Grid infrastructure has been in operation since spring 2004
37. Grid Infrastructure @ DESY
- DESY installed (SL3.04, Quattor, yaim) and operates a complete, independent Grid infrastructure which provides generic (non-experiment-specific) Grid services to all experiments and groups
- The DESY Production Grid is based on LCG-2_4_0 and includes:
  - Resource Broker (RB), Information Index (BDII), Proxy (PXY)
  - Replica Location Services (RLS)
  - in total 24 + 17 WNs (48 + 34 = 82 CPUs)
  - dCache-based SE with access to the entire DESY data space
  - VO management for the HERA experiments (hone, herab, hermes, zeus), LQCD (ildg), ILC (ilc, calice), astroparticle physics (baikal, icecube)
  - certification services for DESY users in cooperation with GridKa
39. Grid Middleware Configuration at the KIPT CMS Linux Cluster
- S. Zub, L. Levchuk, P. Sorokin, D. Soroka
- Kharkov Institute of Physics and Technology, 61108 Kharkov, Ukraine
- http://www.kipt.kharkov.ua/cms, stah@kipt.kharkov.ua
40. What is our specificity?
- Small PC farm (KCC)
- Small scientific group of 4 physicists, combining their work with system administration
- Orientation towards CMS tasks
- No commercial software installed
- Security handled in-house
- Narrow-bandwidth communication channel
- Limited traffic
41. Summary
- The enormous data flow expected in the LHC experiments forces the HEP community to resort to Grid technology
- The KCC is a specialized PC farm constructed at the NSC KIPT for computer simulations within the CMS physics program and for preparation of the CMS data analysis
- Further development of the KCC is planned, with a considerable increase of its capacities and deeper integration into the LHC Grid (LCG) structures
- Configuration of the LCG middleware can be troublesome (especially at small farms with poor internet connections), since this software is neither universal nor complete, and one has to resort to special tips
- Scripts have been developed that facilitate the installation procedure at a small PC farm with narrow internet bandwidth
42. Applications on the Grid
- The CMS analysis chain in a distributed environment
- Monte Carlo mass production for ZEUS on the Grid
- Metadata services on the Grid
- Performance comparison of the LCG2 and gLite file catalogues
- Data Grids for Lattice QCD
43. The CMS Analysis Chain in a Distributed Environment
- Nicola De Filippis, on behalf of the CMS collaboration
- ACAT 2005, DESY, Zeuthen, Germany, 22nd-27th May 2005
44. The CMS analysis tools
- Overview
- Data management:
  - data transfer service: PhEDEx
  - data validation: ValidationTools
  - data publication service: RefDB/PubDB
- Analysis strategy:
  - distributed software installation: XCMSI
  - analysis job submission tool: CRAB
- Job monitoring:
  - system monitoring: BOSS
  - application job monitoring: JAM
45. The end-user analysis workflow
- The user provides:
  - the dataset (runs, events, ...)
  - private code
- CRAB (the job submission tool) discovers the data and the sites hosting them by querying the dataset catalogues RefDB/PubDB
- CRAB prepares, splits and submits the jobs to the Resource Broker (Workload Management System)
- The RB sends the jobs to sites hosting the data, provided the CMS software has been installed there (via XCMSI); the jobs run on a Computing Element / Worker Node and read from the Storage Element
- CRAB automatically retrieves the output files of the job
46.
- The first CMS working prototype for Distributed User Analysis is available and used by real users
- PhEDEx, PubDB, ValidationTools, XCMSI, CRAB, BOSS and JAM are under development and deployment, and in production at many sites
- CMS is using the Grid infrastructure for physics analyses and Monte Carlo production
  - tens of users, 10 million analysed events, 10000 jobs submitted
- CMS is designing a new architecture for the analysis workflow
49. Metadata Services on the GRID
- Nuno Santos
- ACAT05, May 25th, 2005
50. Metadata on the GRID
- Metadata is data about data
- Metadata on the GRID:
  - mainly information about files
  - other information necessary for running jobs
  - usually stored in databases
- Need a simple interface for metadata access (see the hypothetical sketch after this list)
- Advantages:
  - easier to use by clients: no SQL, only metadata concepts
  - common interface: clients don't have to reinvent the wheel
- Must be integrated with the File Catalogue
- Also suitable for storing information about other resources
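A toy sketch of what "metadata concepts instead of SQL" means for a client; the catalogue class and its methods are hypothetical illustrations, not the ARDA interface.

    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // Toy metadata catalogue: entries (e.g. logical file names) carry free-form
    // attributes; clients never see tables or SQL.
    class MetadataCatalogue {
    public:
        void setAttribute(const std::string& entry,
                          const std::string& attr, const std::string& value) {
            entries_[entry][attr] = value;
        }
        // Select entries whose attribute value satisfies a predicate.
        std::vector<std::string> query(const std::string& attr,
                                       const std::function<bool(const std::string&)>& pred) const {
            std::vector<std::string> result;
            for (const auto& [name, attrs] : entries_) {
                auto it = attrs.find(attr);
                if (it != attrs.end() && pred(it->second)) result.push_back(name);
            }
            return result;
        }
    private:
        std::map<std::string, std::map<std::string, std::string>> entries_;
    };

    int main() {
        MetadataCatalogue cat;
        cat.setAttribute("/grid/cms/run100.root", "energy", "7000");
        cat.setAttribute("/grid/cms/run101.root", "energy", "900");
        // "Find all files at energy 7000" expressed as metadata, not as SQL.
        for (const auto& lfn : cat.query("energy",
                                         [](const std::string& v) { return v == "7000"; }))
            std::cout << lfn << '\n';
        return 0;
    }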
51. ARDA Implementation
- Backends:
  - currently Oracle, PostgreSQL, SQLite
- Two frontends:
  - TCP streaming: chosen for performance
  - SOAP: formal requirement of EGEE; compare SOAP with TCP streaming
- Also implemented as a standalone Python library
- Data stored on the filesystem
52. SOAP toolkit performance
- Test of the communication performance (1000 pings; see the timing sketch below)
  - no work done on the backend
  - switched 100 Mbit/s LAN
- Language comparison:
  - TCP streaming shows similar performance in all languages
  - SOAP performance varies strongly with the toolkit
- Protocol comparison:
  - keepalive improves performance significantly
  - in Java and Python, SOAP is several times slower than TCP streaming
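For orientation, the kind of number behind the "1000 pings" test can be obtained with a tiny round-trip timer over a plain TCP socket; the server address, port and message format are placeholders, and the real ARDA benchmark clients are more elaborate.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <chrono>
    #include <cstdio>

    int main() {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        sockaddr_in srv{};
        srv.sin_family = AF_INET;
        srv.sin_port   = htons(8822);                       // placeholder server port
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);
        if (connect(fd, reinterpret_cast<sockaddr*>(&srv), sizeof(srv)) < 0) {
            perror("connect");
            return 1;
        }

        const int N = 1000;
        char buf[16];
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < N; ++i) {                       // 1000 request/response round trips
            send(fd, "ping", 4, 0);
            recv(fd, buf, sizeof(buf), 0);
        }
        auto t1 = std::chrono::steady_clock::now();
        double us = std::chrono::duration<double, std::micro>(t1 - t0).count() / N;
        std::printf("average round trip: %.1f us\n", us);
        close(fd);
        return 0;
    }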
59. High speed Computing
- InfiniBand
- Analysis of SCTP- and TCP-based communication in a high-speed cluster
- The apeNEXT project
- Optimisation of Lattice QCD codes for the Opteron processor
60. InfiniBand Experiences at Forschungszentrum Karlsruhe
- A. Heiss, U. Schwickerath (Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft)
- InfiniBand overview
- Hardware setup at IWR
- HPC applications:
  - MPI performance
  - lattice QCD
  - LM
- HTC applications:
  - RFIO
  - xrootd
- Credits: Inge Bischoff-Gauss, Marc García Martí, Bruno Hoeft, Carsten Urbach
61. Lattice QCD Benchmark: GE vs. InfiniBand
- Memory- and communication-intensive application
- Benchmark by C. Urbach
- See also the CHEP04 talk given by A. Heiss
- Significant speedup by using InfiniBand
- Thanks to Carsten Urbach (FU Berlin and DESY Zeuthen)
62. RFIO/IB point-to-point file transfers (64 bit)
- PCI-X and PCI-Express throughput
- Notes:
  - best results with PCI-Express: >800 MB/s raw transfer speed, >400 MB/s file transfer speed
  - RFIO/IB: see ACAT03, NIM A 534 (2004) 130-134
  - disclaimer on PPC64: not an official IBM product, technology prototype (see also slides 5 and 6)
  - plot: solid lines are file transfers (cache → /dev/null), dashed lines are the network protocol only
63. Xrootd and InfiniBand
- Notes: first preliminary results
- IPoIB notes:
  - dual Opteron V20z
  - Mellanox Gold drivers
  - SM on InfiniCon 9100
  - same nodes as for GE
- Native IB notes:
  - proof-of-concept version
  - based on Mellanox VAPI
  - using IB_SEND
  - dedicated send/recv buffers
  - same nodes as above
- 10GE notes:
  - IBM xSeries 345 nodes
  - Xeon 32 bit, single CPU
  - 1 and 2 GB RAM
  - 2.66 GHz clock speed
  - Intel PRO/10GbE LR cards
  - used for long-distance tests
64. TCP vs. SCTP in a High-Speed Cluster Environment
- Miklos Kozlovszky
- Budapest University of Technology and Economics (BUTE)
65. TCP vs. SCTP
- Both:
  - IPv4 and IPv6 compatible
  - reliable
  - connection oriented
  - offer acknowledged, error-free, non-duplicated transfer
  - almost the same flow and congestion control
- Differences (see also the socket sketch below):

TCP                                    SCTP
Byte-stream oriented                   Message oriented
3-way handshake connection init       4-way handshake connection init (cookie)
Old (more than 20 years)               Quite new (2000-)
                                       Multihoming
                                       Path-MTU discovery
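A minimal sketch of what SCTP looks like at the sockets API level on a Linux host with SCTP support (lksctp); apart from the protocol argument, the calls are the familiar TCP ones.

    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <cstdio>

    int main() {
        // Same Berkeley sockets API as TCP; only the protocol argument differs.
        int tcp  = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        int sctp = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP); // one-to-one style association
        if (sctp < 0)
            std::perror("SCTP not available in this kernel");
        else
            std::printf("TCP fd=%d, SCTP fd=%d\n", tcp, sctp);
        return 0;
    }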
66. Summary
- SCTP inherited all the good features of TCP
- SCTP wants to behave like a next-generation TCP
- It is more secure than TCP, and has many attractive features (e.g. multihoming)
- Theoretically it can work better than TCP, but TCP is faster at present (SCTP implementations are still immature)
- Well standardized, and can be useful for clusters
75. My Impressions
76. Concerns
- Only a small fraction of the Session I talks correspond to the original spirit of the AIHEP/ACAT Session I talks.
- In particular, many of the GRID talks about deployment and infrastructure should be given at CHEP, not here.
- The large LHC collaborations have their own ACAT a few times per year.
- The huge experiment software frameworks do not encourage cross-experiment discussions or tools.
- For the next ACAT, the key people involved in the big experiments should work together to encourage more talks or reviews.
77. Positive aspects
- ACAT continues to be a good opportunity to meet other cultures. Innovation may come from small groups or from non-HENP fields.
- Contacts (even sporadic) with Session III or plenary talks are very beneficial, in particular to young people.
78. The Captain of Köpenick
- Question to the audience:
- Is Friedrich Wilhelm Voigt (the Captain of Köpenick) an ancestor of Voigt, the father of the Voigt function?