Title: ATLAS computing in Geneva
1. ATLAS computing in Geneva
- description of the hardware
- the functionality we need
- the current status
- list of projects
2. The cluster at Uni Dufour (1)
3. The cluster at Uni Dufour (2)
- 12 worker nodes in 2005
- 21 in 2006
- and 20 in 2007!
4. The cluster at Uni Dufour (3)
- three nodes for services (grid, batch, storage abstraction)
- direct line from CERN
- power and network cabling of worker nodes
5. The cluster in numbers
- 61 computers to manage
- 53 workers, 5 file servers, 3 service nodes
- 188 CPU cores in the workers
- 75 TB of disk storage
- can burn up to 30 kW (power supply specs)
6. The functionality we need
- our local cluster computing
  - log in and have an environment to work with ATLAS software, both offline and trigger
  - develop code, compile, interact with the ATLAS software repository at CERN
  - work with nightly releases of ATLAS software, normally not distributed off-site but visible on /afs
  - disk space
  - use of final analysis tools, in particular ROOT
  - a convenient way to run batch jobs (a minimal submission sketch follows this list)
- grid computing
  - tools to transfer data from CERN as well as from and to other Grid sites worldwide
  - ways to submit our jobs to other grid sites
  - a way for ATLAS colleagues to submit jobs to us
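To make the batch-job item above concrete, here is a minimal sketch of a submission to a Torque/PBS batch system (Torque is the batch software considered later in these slides). The job name, resource limits and the ATLAS setup step are placeholders, not the real cluster configuration.

    #!/usr/bin/env python
    # Minimal sketch: submit a job to a Torque/PBS batch system by piping
    # a job script into qsub.  Job name, resource limits and the ATLAS
    # setup step are placeholders for whatever the cluster really uses.
    import subprocess

    job_script = """#!/bin/bash
    #PBS -N atlas_test
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=02:00:00
    cd $PBS_O_WORKDIR
    echo "running on $(hostname)"
    # ... set up an ATLAS release here and run athena, ROOT, etc. ...
    """

    # qsub accepts the job script on stdin and prints the job identifier.
    proc = subprocess.run(["qsub"], input=job_script,
                          capture_output=True, text=True)
    if proc.returncode == 0:
        print("submitted job:", proc.stdout.strip())
    else:
        print("qsub failed:", proc.stderr.strip())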
7. The system in production
- 1 file server (plus 1 if needed), 3 login machines and 18 batch worker nodes
- 30 ATLAS people have accounts
  - ATLAS Geneva, friends and relations
  - people rely on the service
- maintenance of the system (0.3 FTE, top priority)
  - creation of user accounts,
  - web-based documentation for users,
  - installation of ATLAS releases,
  - maintenance of worker nodes, file servers and the batch system,
  - assistance to users executing data transfers to the cluster,
  - help with problems related to running ATLAS software off-CERN-site, e.g. access to databases at CERN, firewall issues etc.,
  - RAID recovery from hardware failures.
- Description at https://twiki.cern.ch/twiki/bin/view/Atlas/GenevaATLASClusterDescription
8. Our system in the Grid
- Geneva has been in NorduGrid since 2005
- in the company of Bern and Manno (our Tier 2)
9. One recent setback
- We used to have a system of up to 35 machines in production.
- Problems with power to our racks since last August.
- A major blackout in the Plainpalais area on August 2nd: the UPS gave up after about 10 minutes and in the machine room all University services went down. A major disaster.
- When recovering, we lost power again the next day. No explanation from the DINF.
- Slightly smaller system in use since then. Power lost again on Friday, Sept 7th.
- Right now only a minimal service. Need to work together with the DINF and measure the power consumption of our machines under full load. Also need to understand the limits of the infrastructure.
- Another power line is being laid for our 20 new worker nodes, the blades. The power cut has nothing to do with that.
10. Things to do (and to research)
- Configuration of worker nodes
  - configuration of the CERN Scientific Linux system,
  - Torque batch system software,
  - other added software, as requested by the users.
- General cluster management issues
  - security,
  - a way to install the system on multiple machines (three types of worker nodes),
  - automatic shutdown when the UPS turns on,
  - monitoring of temperature, CPU use and network use (a small monitoring sketch follows this list).
- Storage management
  - operating system for the SunFire X4500 file servers (Sun Solaris or CERN Scientific Linux),
  - a solution for storage management (e.g. dCache or DPM).
- Grid nodes and grid software
  - configuration of the CERN Scientific Linux for the grid interface nodes,
  - choice and installation of a batch system,
  - choice and installation of grid middleware.
- Tools for interactive use of multiple machines (e.g. PROOF, Ganga).
- Grid job submission interfaces (e.g. Ganga, GridPilot); a minimal Ganga example follows this list.
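As a pointer for the Ganga items above, the lines below show the kind of "hello world" job used in Ganga's own tutorial material. They are meant to be typed inside the Ganga interactive (Python-based) shell, where Job, Executable and the backend classes are predefined; the Local backend and the job name are only illustrative, and a grid backend would take its place for real grid submission.

    # Minimal Ganga sketch, along the lines of Ganga's "hello world" tutorial.
    # Type inside the Ganga interactive shell, where Job, Executable, Local,
    # etc. are already defined (no imports needed).
    j = Job()
    j.name = "hello_geneva"              # illustrative job name
    j.application = Executable(exe="/bin/echo",
                               args=["Hello from the Geneva cluster"])
    j.backend = Local()                  # run on the login machine; a grid
                                         # backend would replace this for
                                         # real grid submission
    j.submit()                           # hand the job to the chosen backend

    # "jobs" lists all jobs and their status; j.peek() inspects the output
    # once the job has completed.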
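For the monitoring item in the list above, something as small as the script below could serve as a starting point. It only reads standard Linux /proc and /sys files; the network interface name and the thermal-zone path are assumptions that differ from machine to machine.

    #!/usr/bin/env python
    # Monitoring sketch: sample CPU load, network traffic and, where the
    # kernel exposes it, a temperature sensor.  Interface name and
    # thermal-zone path are assumptions that vary between machines.
    import time

    def read_loadavg():
        """Return the 1-, 5- and 15-minute load averages."""
        with open("/proc/loadavg") as f:
            return [float(x) for x in f.read().split()[:3]]

    def read_net_bytes(interface="eth0"):
        """Return (rx_bytes, tx_bytes) for one interface from /proc/net/dev."""
        with open("/proc/net/dev") as f:
            for line in f:
                if line.strip().startswith(interface + ":"):
                    fields = line.split(":", 1)[1].split()
                    return int(fields[0]), int(fields[8])
        return None

    def read_temperature(path="/sys/class/thermal/thermal_zone0/temp"):
        """Return a CPU temperature in degrees C, or None if unavailable."""
        try:
            with open(path) as f:
                return int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            return None

    if __name__ == "__main__":
        while True:                      # sample once per minute
            print("load:", read_loadavg(),
                  "| net rx/tx bytes:", read_net_bytes(),
                  "| temp C:", read_temperature())
            time.sleep(60)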
11. Diagram of the system for 1st data
- all the hardware is in place (not all powered up)
- some open questions
- the biggest new issue is storage management with multiple servers
12. Summary
- The ATLAS cluster in Geneva is a large Tier 3
  - now 188 worker CPU cores and 75 TB
  - not all hardware is integrated yet
- A part of the system is in production
  - a Grid site since 2005; it runs ATLAS simulation like a Tier 2, and we plan to continue that.
  - in constant interactive use by the Geneva group since Spring; we plan to continue and to develop it further. The group needs local computing.
- Busy program for several months to have all the hardware integrated. With a larger scale come new issues to deal with.
13. Comments about future evolution
- Interactive work is vital.
  - Everyone needs to log in somewhere.
  - The more we can do interactively, the better for our efficiency.
  - A larger fraction of the cluster will be available for login.
- Plan to remain a Grid site.
  - Bern and Geneva have been playing the role of a Tier 2 in ATLAS. We plan to continue that.
- Data transfers are too unreliable in ATLAS.
  - Need to find ways to make them work much better (a small retry sketch follows this list).
  - Data placement from FZK directly to Geneva would be welcome. There is no way to do that (LCG → NorduGrid) at the moment.
- Be careful with extrapolations from present experience. The real data volume will be 200x larger than a large Monte Carlo production.
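Most of the transfer-reliability problem sits in the middleware and in operations, but even a thin wrapper that retries failed copies can help day to day. The sketch below assumes the Globus globus-url-copy client is installed and that a valid grid proxy already exists; the source and destination URLs are placeholders.

    #!/usr/bin/env python
    # Sketch: retry a GridFTP copy a few times before giving up.  Assumes
    # globus-url-copy is installed and a grid proxy has been created; the
    # URLs are placeholders, not real endpoints.
    import subprocess
    import time

    SOURCE = "gsiftp://some.gridftp.server.example/path/to/file.root"
    DEST = "file:///atlas/data/file.root"

    def copy_with_retries(src, dst, tries=3, wait_seconds=60):
        """Run globus-url-copy, retrying on failure with a fixed pause."""
        for attempt in range(1, tries + 1):
            result = subprocess.run(["globus-url-copy", src, dst])
            if result.returncode == 0:
                print("transfer succeeded on attempt", attempt)
                return True
            print("attempt", attempt, "failed, exit code", result.returncode)
            if attempt < tries:
                time.sleep(wait_seconds)
        return False

    if __name__ == "__main__":
        if not copy_with_retries(SOURCE, DEST):
            raise SystemExit("transfer failed after all retries")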