1
ATLAS computing in Geneva
  • Szymon Gadomski
  • description of the hardware
  • the functionality we need
  • the current status
  • list of projects

2
The cluster at Uni Dufour (1)
3
The cluster at Uni Dufour (2)
12 worker nodes in 2005,
21 in 2006,
and 20 in 2007!
4
The cluster at Uni Dufour (3)
three nodes for services (grid,
batch, storage abstraction)
direct line from CERN
power and network cabling of worker nodes
5
The cluster in numbers
  • 61 computers to manage
  • 53 workers, 5 file servers, 3 service nodes
  • 188 CPU cores in the workers
  • 75 ± 5 TB of disk storage
  • can draw up to 30 kW (power supply specs)

6
The functionality we need
  • our local cluster computing
    • log in and have an environment to work with
      ATLAS software, both offline and trigger
    • develop code and compile it
    • interact with the ATLAS software repository at
      CERN
    • work with nightly releases of ATLAS software,
      normally not distributed off-site but visible on
      /afs
    • disk space
    • use of final analysis tools, in particular ROOT
      (see the sketch after this list)
    • a convenient way to run batch jobs
  • grid computing
    • tools to transfer data from CERN as well as from
      and to other Grid sites worldwide
    • ways to submit our jobs to other grid sites
    • a way for ATLAS colleagues to submit jobs to us
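To make the "final analysis tools" item concrete, here is a
minimal PyROOT sketch of the kind of interactive work the login
nodes should support. It only assumes that ROOT with its Python
bindings is available on the login machines; the histogram and
file names are purely illustrative.

    # Fill a toy histogram, store it in a ROOT file and save a plot.
    import ROOT

    ROOT.gROOT.SetBatch(True)   # no graphics window on a login node

    h = ROOT.TH1F("h_example", "toy distribution", 100, -5.0, 5.0)
    h.FillRandom("gaus", 10000)              # toy data for the example

    out = ROOT.TFile("example.root", "RECREATE")
    h.Write()                                # keep the histogram on disk
    out.Close()

    c = ROOT.TCanvas("c", "c", 800, 600)
    h.Draw()
    c.SaveAs("example.png")                  # plot to inspect remotely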

7
The system in production
  • 1 file server (+1 if needed), 3 login machines
    and 18 batch worker nodes
  • 30 ATLAS people have accounts
  • ATLAS GE, friends and relations
  • people rely on the service
  • maintenance of the system (0.3 FTE, top priority)
    • creation of user accounts,
    • web-based documentation for users,
    • installation of ATLAS releases,
    • maintenance of worker nodes, file servers and
      the batch system,
    • assistance to users executing data transfers to
      the cluster,
    • help with problems related to running ATLAS
      software off the CERN site, e.g. access to
      databases at CERN, firewall issues etc.,
    • RAID recovery after hardware failures.

Description at https://twiki.cern.ch/twiki/bin/view/Atlas/GenevaATLASClusterDescription
8
Our system in the Grid
  • Geneva has been in NorduGrid since 2005
  • In the company of Berne and Manno (our Tier 2)

9
One recent setback
  • We used to have a system of up to 35 machines in
    production.
  • Problems with power to our racks since last
    August.
  • A major blackout in the Plainpalais area on
    August 2nd; the UPS in the machine room gave up
    after 10 minutes and all University services went
    down. A major disaster.
  • When recovering, we lost power again the next
    day. No explanation from the DINF.
  • Slightly smaller system in use since then. Power
    lost again on Friday Sept 7th.
  • Right now only a minimal service. Need to work
    together with the DINF, measure power consumption
    of our machines under full load. Also need to
    understand the limits of the infrastructure.
  • Another power line is being laid for our 20 new
    worker nodes, the 'blades'. The power cut has
    nothing to do with that.

10
Things to do (and to research)
  • Configuration of worker nodes
    • configuration of the CERN Scientific Linux
      system,
    • Torque batch system software,
    • other added software, as requested by the users.
  • General cluster management issues
    • security,
    • a way to install the system on multiple machines
      (three types of worker nodes),
    • automatic shutdown when the UPS turns on,
    • monitoring of temperature, CPU use and network
      use (a minimal sketch follows this list).
  • Storage management
    • operating system for the SunFire X4500 file
      servers (Sun Solaris or CERN Scientific Linux),
    • a solution for storage management (e.g. dCache
      or DPM).
  • Grid nodes and grid software
    • configuration of CERN Scientific Linux for the
      grid interface nodes,
    • choice and installation of a batch system,
    • choice and installation of grid middleware.
  • Tools for interactive use of multiple machines
    (e.g. PROOF, Ganga).
  • Grid job submission interfaces (e.g. Ganga,
    GridPilot).
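As a starting point for the monitoring item above, a minimal
sketch that samples the CPU load and network counters of a Linux
node from /proc. This is not the production monitoring solution;
the interface name "eth0" and the one-minute interval are
assumptions, and temperature readout (e.g. via lm_sensors) would
still have to be added.

    # Sample load average and network byte counters once per minute.
    import time

    def read_loadavg():
        # 1-, 5- and 15-minute load averages from /proc/loadavg
        with open("/proc/loadavg") as f:
            return [float(x) for x in f.read().split()[:3]]

    def read_net_bytes(iface="eth0"):
        # received/transmitted bytes for one interface, from /proc/net/dev
        with open("/proc/net/dev") as f:
            for line in f:
                if line.strip().startswith(iface + ":"):
                    fields = line.split(":", 1)[1].split()
                    return int(fields[0]), int(fields[8])  # rx_bytes, tx_bytes
        return None

    if __name__ == "__main__":
        while True:
            print(time.strftime("%H:%M:%S"),
                  "load:", read_loadavg(),
                  "rx/tx bytes:", read_net_bytes())
            time.sleep(60)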

11
Diagram of the system for 1st data
  • all the hardware is in place (not all powered up)
  • some open questions
  • biggest new issue is storage management with
    multiple servers

12
Summary
  • The ATLAS cluster in Geneva is a large Tier 3
  • now 188 worker CPU cores and 75 TB of disk
  • not all hardware is integrated yet
  • A part of the system is in production
  • a Grid site since 2005, runs ATLAS simulation
    like a Tier 2, plan to continue that.
  • since Spring in constant interactive use by the
    Geneva group, plan to continue and to develop
    further. The group needs local computing.
  • Busy program for several months to have all
    hardware integrated. With a larger scale come new
    issues to deal with.

13
Comments about future evolution
  • Interactive work is vital.
  • Everyone needs to log in somewhere.
  • The more we can do interactively, the better for
    our efficiency.
  • A larger fraction of the cluster will be
    available for login.
  • Plan to remain a Grid site.
  • Bern and Geneva have been playing the role of a
    Tier 2 in ATLAS. We plan to continue that.
  • Data transfers are too unreliable in ATLAS.
  • Need to find ways to make them work much better.
  • Data placement from FZK directly to Geneva would
    be welcome. No way to do that (LCG → NorduGrid) at
    the moment.
  • Be careful with extrapolations from present
    experience. Real data volume will be 200x larger
    than a large Monte Carlo production.