Running the multi-platform, multi-experiment cluster at CCIN2P3

1
Running the multi-platform, multi-experiment
cluster at CCIN2P3
  • Wojciech A. Wojcik

IN2P3 Computing Center
e-mail: wojcik@in2p3.fr
URL: http://webcc.in2p3.fr
2
IN2P3 Computer Center
  • Provides the computing and data services for the
    French high energy and nuclear physicists
  • IN2P3: 18 physics labs (in all major cities in France)
  • CEA/DAPNIA
  • French groups are involved in 35 experiments at CERN, SLAC, FNAL, BNL, DESY and other sites (also in astrophysics).
  • Specific situation: our CC is not directly attached to an experimental facility, unlike CERN, FNAL, SLAC, DESY and BNL.

3
General rules
  • All groups/experiments share the same interactive and batch (BQS) clusters and the other services (disk servers, tapes, HPSS and networking); some exceptions are noted later.
  • /usr/bin and /usr/lib (OS and compilers) are local
  • /usr/local/ is on AFS, specific to each platform
  • /scratch: local temporary disk space
  • System, group and user profiles define the proper environment

4
General rules
  • Each user has an AFS account with access to the following AFS disk spaces:
  • HOME - backed up by CC
  • THRONG_DIR (up to 2 GB) - backed up by CC
  • GROUP_DIR (n x 2 GB), no backup
  • Data reside on disks (GROUP_DIR, Objectivity), on tapes (xtage system) or in HPSS
  • Data exchange on the following media:
  • DLT, 9840
  • Network (bbftp)
  • ssh/ssf - recommended for access to/from external domains

5
Supported platforms
  • Linux (RedHat 6.1, kernel 2.2.17-14smp) with different egcs/gcc compilers (gcc 2.91.66, gcc 2.91.66 with a patch for Objy 5.2, gcc 2.95.2 installed in /usr/local), as requested by different experiments
  • Solaris 2.6 (2.7 soon)
  • AIX 4.3.2
  • HP-UX 10.20 (end of this service already announced)

6
Support for experiments
  • About 35 different High Energy, Astrophysics and Nuclear Physics experiments.
  • LHC experiments: CMS, ATLAS, ALICE and LHCb.
  • Big non-CERN experiments: BaBar, D0, STAR, PHENIX, AUGER, EROS II.

9
Disk space
  • Need to make the disk storage independent of the
    operating system.
  • Disk servers are based on:
  • A3500 from Sun with 3.4 TB
  • VSS from IBM with 2.2 TB
  • ESS from IBM with 7.2 TB
  • 9960 from Hitachi with 21.0 TB

10
Mass storage
  • Supported media (all in the STK robots):
  • 3490
  • DLT4000/7000
  • 9840 (Eagles)
  • Limited support for Redwood
  • HPSS: local developments
  • Interface with RFIO (a C sketch follows this list)
  • API: C, Fortran (via cfio from CERNLIB)
  • API: C++ (iostream)
  • bbftp: secure parallel FTP using the RFIO interface
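A minimal sketch of what reading a file through RFIO's POSIX-like C API can look like. The header name and the HPSS file path below are assumptions for illustration, not taken from the slides:

    /* Read a file through RFIO; the rfio_* calls mirror their POSIX
       counterparts.  Header name and file path are assumptions. */
    #include <stdio.h>
    #include <fcntl.h>
    #include "rfio.h"                /* assumed RFIO header */

    int main(void)
    {
        char buf[4096];
        int  n;
        /* hypothetical file in the HPSS namespace */
        int fd = rfio_open("/hpss/in2p3.fr/demo/run01.dat", O_RDONLY, 0);
        if (fd < 0) {
            perror("rfio_open");
            return 1;
        }
        while ((n = rfio_read(fd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, n, stdout);   /* stream the data to stdout */
        rfio_close(fd);
        return 0;
    }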

11
Mass storage
  • HPSS test and production services:
  • HPSS_TEST_SERVER: /hpsstest/in2p3.fr/
  • HPSS_SERVER: /hpss/in2p3.fr/
  • HPSS usage:
  • BaBar - usage via ams/oofs and RFIO
  • EROS II: already 1.6 TB in HPSS
  • AUGER, D0, ATLAS, LHCb
  • Other experiments in tests: SNovae, DELPHI, ALICE, PHENIX, CMS

12
Networking - LAN
  • Fast Ethernet (100 Mb full duplex) --> to interactive and batch services
  • Gigabit Ethernet (1 Gb full duplex) --> to disk servers and the Objectivity/DB server

13
Networking - WAN
  • Academic public network: Renater 2, based on virtual networking (ATM) with guaranteed bandwidth (VPN on ATM)
  • Lyon <-> CERN at 34 Mb (155 Mb in June 2001)
  • Lyon <-> US traffic goes through CERN
  • Lyon <-> ESnet (via STAR TAP), 30-40 Mb, reserved for traffic to/from ESnet, except FNAL.

14
BAHIA - interactive front-end
  • Based on multi-processor machines:
  • Linux (RedHat 6.1) -> 10 PentiumII 450 MHz + 12 PentiumIII 1 GHz (2 processors each)
  • Solaris 2.6 -> 4 Ultra-4/E450
  • Solaris 2.7 -> 2 Ultra-4/E450
  • AIX 4.3.2 -> 6 F40
  • HP-UX 10.20 -> 7 HP9000/780/J282

15
Batch system - BQS
  • Batch is based on BQS (a CCIN2P3 product)
  • In constant development, in use for 7 years
  • POSIX compliant, platform independent (portable)
  • Resources can be defined for each job; the job class is calculated by the scheduler as a function of (a C sketch follows this list):
  • CPU time, memory
  • CPU bound or I/O bound
  • Platform(s)
  • System resources: local scratch disk, stdin/stdout size
  • User resources (switches, counters)
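As an illustration only, the class computation could look like the sketch below; the thresholds and class names are invented for the example and are not BQS's real ones:

    /* Invented example of mapping declared job resources onto a class,
       in the spirit of the slide above; not actual BQS code. */
    #include <stdio.h>

    struct job_req {
        long cpu_seconds;   /* declared CPU time            */
        long memory_mb;     /* declared memory              */
        int  io_bound;      /* 1 = I/O bound, 0 = CPU bound */
    };

    static const char *job_class(const struct job_req *j)
    {
        if (j->cpu_seconds < 600)
            return j->io_bound ? "short_io" : "short";
        if (j->cpu_seconds < 36000 && j->memory_mb < 512)
            return j->io_bound ? "medium_io" : "medium";
        return "long";      /* big production jobs */
    }

    int main(void)
    {
        struct job_req j = { 7200, 256, 1 };   /* hypothetical job */
        printf("class = %s\n", job_class(&j));
        return 0;
    }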

16
Batch system - BQS
  • The scheduler takes into account (a toy sketch follows this list):
  • Targets for groups (declared twice a year for the big production runs)
  • CPU time consumed in recent periods (month, week, day), per user and per group
  • Proper aging and interleaving in the class queues
  • A worker can be opened for any combination of classes.
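A toy sketch of such a fair-share priority: groups below their declared target rank higher, and waiting jobs slowly age upward. The weights and the aging term are invented for the example, not BQS's real policy:

    /* Toy fair-share priority in the spirit of the slide above. */
    #include <stdio.h>

    static double priority(double target_share,  /* declared group target, 0..1   */
                           double used_share,    /* CPU share used recently, 0..1 */
                           double hours_waiting) /* job age in the queue          */
    {
        double fair  = target_share - used_share; /* > 0 for under-served groups */
        double aging = 0.01 * hours_waiting;      /* avoid starvation            */
        return fair + aging;
    }

    int main(void)
    {
        /* a group entitled to 20% that used 5% recently, job waiting 3 h */
        printf("priority = %.3f\n", priority(0.20, 0.05, 3.0));
        return 0;
    }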

17
Batch system - configuration
  • Linux (RedHat 6.1) -> 96 dual PIII 750 MHz + 110 dual PIII 1 GHz
  • Solaris 2.6 -> 25 Ultra60
  • Solaris 2.7 -> 2 Ultra60 (test service)
  • AIX 4.3.2 -> 29 RS390 + 20 43P-B50
  • HP-UX 10.20 -> 52 HP9000/780

18
Batch system - CPU usage
19
Batch system - Linux cluster
20
Regional Center for
  • EROS II (Expérience de Recherches d'Objets Sombres par effet de lentilles gravitationnelles)
  • BaBar
  • Auger (PAO)
  • D0

21
EROS II
  • Raw data (from the ESO site in Chile) on DLTs (tar format).
  • Restructuring of the data from DLT to 3490 or 9840; creation of metadata in an Oracle DB.
  • Data server (under development): currently 7 TB of data, 20 TB by the end of the experiment, using HPSS and a WEB server.

22
BaBar
  • AIX and HP-UX are not supported by BaBar; Solaris 2.6 with Workshop 4.2 and Linux (RedHat 6.1) are. Solaris 2.7 is in preparation.
  • Data are stored in Objectivity/DB; import/export of data is done using bbftp (see the sketch after this slide). Import/export on tapes has been abandoned.
  • Objectivity (ams/oofs) servers dedicated only to BaBar have been installed (10 servers).
  • HPSS is used for staging the Objectivity/DB files.
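A hedged sketch of driving such a transfer from C via system(). The option spelling (-u for the user, -e for the command string) follows common bbftp usage but should be checked against the installed release; the user, server and file names are hypothetical:

    /* Launch a bbftp transfer; flags and names are assumptions. */
    #include <stdlib.h>

    int main(void)
    {
        return system("bbftp -u babar "
                      "-e \"put run01.dat /hpss/in2p3.fr/babar/run01.dat\" "
                      "bbftp.example.org");   /* hypothetical server */
    }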

23
Experiment PAO
24
PAO - sites
25
PAO - AUGER
  • CCIN2P3 is acting as the AECC (AUGER European CC).
  • Access is granted to all AUGER users (AFS accounts provided).
  • A CVS repository for the AUGER software has been installed at CCIN2P3, accessible from AFS (from the local and non-local cells) and from non-AFS environments using ssh (see the sketch after this list).
  • Linux is the preferred platform.
  • Simulation software is based on Fortran programs.
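A sketch of the non-AFS access path from C: point CVS_RSH at ssh and check out over a standard :ext: connection. The host, repository path and module name are hypothetical:

    /* CVS checkout over ssh; host, path and module are assumptions. */
    #include <stdlib.h>

    int main(void)
    {
        setenv("CVS_RSH", "ssh", 1);   /* make cvs tunnel through ssh */
        return system("cvs -d :ext:user@cvs.example.in2p3.fr:/auger/cvsroot "
                      "checkout auger-sim");   /* hypothetical module */
    }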

26
D0
  • Linux is one of the D0 supported platforms and is available at CCIN2P3.
  • D0 software uses the KAI C++ compiler.
  • Import/export of D0 data (using the internal Enstore format) is complicated work. We will try to use bbftp as the file transfer program.

27
Import/export
(Diagram) CCIN2P3's HPSS exchanges data with:
  • CERN (CASTOR, HPSS)
  • SLAC (HPSS)
  • FNAL (ENSTORE, SAM)
  • BNL (HPSS)
28
Problems
  • Adding new Objy servers (for other experiments) is very complicated: it requires new, separate machines with modified port numbers in /etc/services. Under development for CMS.
  • OS versions and patch levels differ between experiments.
  • Compiler versions differ (mainly for Objy, across experiments).
  • Solutions?

29
Conclusions
  • Data exchange should be done using standards (e.g. files or tapes) and common access interfaces (bbftp and RFIO are good examples).
  • Better coordination and more similar requirements on supported system and compiler levels between experiments are needed.
  • The choice of CASE technology is outside the control of our CC, acting as a Regional Computer Center.
  • GRID will require a more uniform configuration of the distributed elements.
  • Who can help? HEPCCC? HEPiX? GRID?