1
Site Report
  • Roberto Gomezel
  • INFN

2
Outline of Presentation
  • Computing Environment
  • Security
  • Services
  • Network
  • AFS
  • BBS
  • INFN Farms
  • Tier 1 at CNAF

3
Computing Environment and Security
  • 95% of boxes are PCs running Linux or Windows
  • Mac OS boxes keep on living
  • Just a few commercial Unix boxes, used only for
    specific tasks or needs
  • VPNs available at many sites
  • Cisco boxes using IPsec
  • NetScreen boxes using IPsec
  • SSL VPNs are under evaluation
  • Using SSL eliminates the need to install
    client software
  • It gives users instant access through a plain
    Web browser
  • Network Security
  • Dedicated firewall machines at just a few sites
  • Elsewhere implemented with access lists on the
    router connected to the WAN

INFN Site Report R.Gomezel
4
Desktop
  • PCs running Linux and Windows
  • Automatic installation using Kickstart for Linux
    and RIS for Windows (a minimal Kickstart example
    is sketched after this list)
  • Citrix MetaFrame or VMware used to reduce the
    need to install Windows on all PCs for desktop
    applications
  • A few sites chose to outsource support for
    desktop environment due to lack of personnel
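The unattended Linux installs mentioned above are driven by a Kickstart file. The fragment below is a hypothetical, minimal ks.cfg written out with Python purely for illustration; the install-tree URL, partitioning, password hash and package list are invented, not INFN's actual configuration.

# Minimal sketch: write a hypothetical Kickstart file for unattended Linux installs.
# All values (install URL, partitioning, packages) are illustrative, not INFN's real config.
KICKSTART = """\
install
url --url=http://mirror.example.org/linux/os/i386
lang en_US
keyboard us
rootpw --iscrypted $1$example$placeholderhash
timezone Europe/Rome
bootloader --location=mbr
clearpart --all --initlabel
part /     --fstype ext3 --size 8192
part swap  --size 1024
%packages
@ base
openafs-client
"""

def write_kickstart(path: str = "ks.cfg") -> None:
    """Drop the Kickstart file where the install server expects it."""
    with open(path, "w") as fh:
        fh.write(KICKSTART)

if __name__ == "__main__":
    write_kickstart()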

INFN Site Report R.Gomezel
5
Backup
  • Tape libraries used
  • AIT-2: a few sites
  • IBM Magstar: only at LNF
  • DLT, LTO: widespread
  • Backup tools
  • IBM Tivoli: quite used
  • HP OmniBack: quite used
  • Atempo Time Navigator: just a few sites
  • Home-grown tools: widespread

INFN Site Report R.Gomezel
6
Wireless LAN
  • Access points running standard 802.11b/g
  • All sites use wireless connectivity while
    meetings or conferences are running
  • Most of them use it to provide connectivity to
    laptop computers
  • Security issues
  • Access permission based on secure port filtering
    (MAC address): poor security
  • No encryption used
  • Some sites are using 802.1X

INFN Site Report R.Gomezel
7
E-mail
  • Mail Transfer Agent
  • Sendmail: widespread and most used (86%)
  • Postfix: a few sites (14%)
  • But there is an increasing number of sites
    planning to move from sendmail to postfix
  • Hardware and OS

INFN Site Report R.Gomezel
8
E-mail user agent
  • All INFN sites provide an HTTP mail user agent
  • One-third use IMP
  • One-third use SquirrelMail
  • Others
  • IMHO, Open WebMail, Cyrus/Roxen
  • Other mail user agents
  • Pine, Internet Explorer, Mozilla

INFN Site Report R.Gomezel
9
E-mail antivirus
INFN Site Report R.Gomezel
10
E-mail antispam
  • 75% of INFN sites are using SpamAssassin as the
    tool to reduce junk e-mail (header handling is
    sketched after this list)
  • Some sites use RAV or Sophos
  • Just a few sites (5%) are using nothing
  • An ACL filter was set on port 25 in order to
    prevent unauthorized hosts from acting as mail
    relays
  • Only authorized mail relays are allowed to send
    and receive mail for a specific site
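SpamAssassin tags scanned messages with an X-Spam-Status header that downstream delivery scripts can act on. Below is a minimal Python sketch of that last step; the score threshold and the sample message are assumptions for illustration only.

# Minimal sketch: decide whether a message already scanned by SpamAssassin
# should be filed as junk, based on its X-Spam-Status header.
# The threshold and the sample message are illustrative assumptions.
import email
from email.message import Message

def is_spam(msg: Message, required: float = 5.0) -> bool:
    """Return True if SpamAssassin flagged the message or its score passes the threshold."""
    status = msg.get("X-Spam-Status", "")
    if status.lower().startswith("yes"):
        return True
    for token in status.split():
        if token.startswith("score="):
            try:
                return float(token[len("score="):]) >= required
            except ValueError:
                pass
    return False

if __name__ == "__main__":
    raw = (b"X-Spam-Status: Yes, score=7.3 required=5.0 tests=BAYES_99\r\n"
           b"Subject: cheap watches\r\n\r\nbody\r\n")
    msg = email.message_from_bytes(raw)
    print("junk" if is_spam(msg) else "inbox")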

INFN Site Report R.Gomezel
11
Security issues
(Chart: incidents originating from INFN hosts, as a percentage, monitored by GARR-CERT)
  • Goal by the end of 2004
  • Define a new policy for ACL settings (a sketch of
    this policy follows)
  • Input filter: default deny
  • Services only on hosts that are checked very
    strictly
  • Output filter
  • Port 25
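The policy above (input filter with default deny, outbound port 25 restricted to the authorized mail relays) can be expressed as a small decision function. The Python sketch below illustrates the policy only; the addresses and the service table are invented examples, not INFN's real ACLs.

# Minimal sketch of the ACL policy described above: default-deny input filter
# plus an output filter that lets only authorized relays speak SMTP (port 25).
# All addresses and the service table are invented examples.
from ipaddress import ip_address, ip_network

SITE_NET = ip_network("192.0.2.0/24")            # example site prefix
ALLOWED_SERVICES = {                             # host -> set of open TCP ports
    ip_address("192.0.2.10"): {25, 80, 443},     # strictly checked mail/web host
    ip_address("192.0.2.11"): {22},              # managed login host
}
AUTHORIZED_RELAYS = {ip_address("192.0.2.10")}

def allow_inbound(dst: str, dport: int) -> bool:
    """Input filter: deny by default, permit only registered services."""
    return dport in ALLOWED_SERVICES.get(ip_address(dst), set())

def allow_outbound(src: str, dport: int) -> bool:
    """Output filter: block SMTP from everything except the authorized relays."""
    addr = ip_address(src)
    if dport == 25:
        return addr in AUTHORIZED_RELAYS
    return addr in SITE_NET

if __name__ == "__main__":
    print(allow_inbound("192.0.2.10", 25))   # True
    print(allow_inbound("192.0.2.50", 80))   # False: default deny
    print(allow_outbound("192.0.2.99", 25))  # False: not an authorized relay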

INFN Site Report R.Gomezel
12
INFN network
  • LAN backbone network mainly based on Gigabit
    Ethernet
  • Layer 2 and 3 switching
  • No layer 4 switching
  • The INFN WAN is completely integrated into GARR,
    the nation-wide research network infrastructure,
    whose backbone provides 2.5 Gbps connectivity
  • Typical POP access bandwidth for INFN sites:
    34 Mbps, 155 Mbps, or Gigabit Ethernet
  • The trend is towards Gigabit Ethernet access at
    every site, with bandwidth managed through a
    rate-limiting mechanism (CAR) according to the
    needs of the specific site (a token-bucket sketch
    follows this list)
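Committed Access Rate on the access routers is essentially token-bucket policing. The sketch below is a generic Python illustration of that mechanism, not the actual router configuration; the rate and burst values are arbitrary.

# Generic token-bucket policer, the mechanism behind rate limiting such as CAR.
# Rate and burst values are arbitrary illustrations.
import time

class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0        # refill rate in bytes per second
        self.burst = burst_bytes          # bucket depth
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def conform(self, packet_bytes: int) -> bool:
        """Return True if the packet conforms (transmit), False if it exceeds (drop)."""
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False

if __name__ == "__main__":
    # Police a site to ~100 Mbps with a 1 MB burst allowance.
    policer = TokenBucket(rate_bps=100e6, burst_bytes=1e6)
    print(sum(policer.conform(1500) for _ in range(2000)), "of 2000 packets conform")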

INFN Site Report R.Gomezel
13
AFS
  • INFN sites keep on using AFS services to share
    data and software throughout sites
  • Most local cells have completely moved server
    functionality to Linux boxes running OpenAFS
  • Authentication and file-server functionality of
    the nation-wide cell INFN.IT runs on Linux boxes
    with OpenAFS
  • The migration of the INFN.IT authentication
    servers from Kerberos IV to Kerberos V is expected
    to be accomplished by the end of the year (the
    resulting login sequence is sketched after this
    list)
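Once the cell authenticates against Kerberos V, a user session typically obtains a ticket with kinit and converts it into an AFS token with aklog. The Python sketch below just wraps those standard commands to illustrate the workflow; the principal name is a placeholder, not a real account.

# Minimal sketch of the Kerberos V + OpenAFS login sequence:
# obtain a ticket with kinit, turn it into an AFS token with aklog, list tokens.
# The principal name is a placeholder.
import subprocess

def afs_login(principal: str = "user@INFN.IT") -> None:
    subprocess.run(["kinit", principal], check=True)  # prompts for the password
    subprocess.run(["aklog"], check=True)             # token for the local cell
    subprocess.run(["tokens"], check=True)            # show the AFS tokens obtained

if __name__ == "__main__":
    afs_login()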

INFN Site Report R.Gomezel
14
BBS - Bologna Batch System
  • The Bologna Batch System (BBS) is a software tool
    that allows INFN Bologna users to submit batch
    jobs to a set of well-defined machines, from any
    INFN Bologna machine with Condor installed.
  • A collaboration between the Computer Sciences
    Dept. of the Univ. of Wisconsin-Madison and INFN
    Bologna.
  • Main features of BBS
  • Any executable can be submitted to the system
    (scripts, compiled and linked programs, etc.).
  • Two different 'queues', short and long. Short and
    long jobs have a different priority (nice) when
    running on the same machine (a submit sketch
    follows this list).
  • Short jobs may run for no longer than an hour,
    but run at a higher priority.
  • BBS tries to balance the load of the BBS CPUs. 
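Since BBS sits on top of Condor, a job ultimately becomes an ordinary Condor submit description. The Python sketch below writes a hypothetical one for the short queue and hands it to condor_submit; the +BBSQueue attribute and all file names are invented for illustration and are not the real BBS interface, which hides these details from the user.

# Sketch: submit a "short" job through Condor, as BBS does under the hood.
# The +BBSQueue attribute and all file names are invented for illustration.
import subprocess

SUBMIT = """\
universe   = vanilla
executable = my_analysis.sh
arguments  = run42
output     = job.out
error      = job.err
log        = job.log
+BBSQueue  = "short"
queue
"""

def submit_short_job(path: str = "job.sub") -> None:
    with open(path, "w") as fh:
        fh.write(SUBMIT)
    subprocess.run(["condor_submit", path], check=True)

if __name__ == "__main__":
    submit_short_job()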

P.Mazzanti
15
BBS
Presently the system consists of 16 two-CPU servers
running Linux Red Hat 9, plus a single-CPU machine.
7 machines come from the ALICE experiment. BBS
machines belong to the large INFN WAN pool: they
may be accessed from outside when no BBS job is
running, but become IMMEDIATELY available when a
BBS job asks to be run. Only short jobs are
accepted by the 7 ALICE machines when submitted by
a non-ALICE group user.
P.Mazzanti
16
(Charts: aggregate jobs, daily and weekly)
P.Mazzanti
17
(Charts: boi1.bo.infn.it load, daily and weekly)
P.Mazzanti
18
INFN Site Farm: a new challenge
  • Some sites are planning to reconfigure and
    integrate the general computing facilities and the
    local experiment-specific farms into a single
    computing farm
  • Reason: to avoid the growing deployment of many
    small private farms, one per experiment, in
    addition to the general-purpose computing facility
  • Introduction of a SAN infrastructure to connect
    storage systems and computing units
  • The GFS file system is under evaluation as an
    efficient way of providing a cluster file system
    and volume manager
  • Interesting because it is part of the SL3
    distribution
  • A lot of work is needed to design a mechanism that
    provides computing resources to the different
    experiments dynamically, according to their needs
  • We can learn from the experience of the CNAF
    Tier1 and other labs

INFN Site Report R.Gomezel
19
Hardware solutions for the Tier1 at CNAF
  • Luca dell'Agnello
  • Stefano Zani
  • (INFN CNAF, Italy)

Luca dell'Agnello - Stefano Zani
20
Tier1
  • INFN computing facility for HEP community
  • Ending prototype phase last year, now fully
    operational
  • Location: INFN-CNAF, Bologna (Italy)
  • One of the main nodes on the GARR network
  • Personnel: 10 FTEs
  • 3 FTEs dedicated to experiments
  • Multi-experiment
  • LHC experiments (ALICE, ATLAS, CMS, LHCb), Virgo,
    CDF, BaBar, AMS, MAGIC, ...
  • Resources dynamically assigned to experiments
    according to their needs
  • 50% of the Italian resources for LCG
  • Participation in the experiments' data challenges
  • Integrated with the Italian Grid
  • Resources also accessible in the traditional way

Luca dell'Agnello - Stefano Zani
21
Logistics
  • Moved to a new location (last January)
  • Hall in the basement (-2nd floor)
  • 1000 m2 of total space
  • Computing Nodes
  • Storage Devices
  • Electric Power System (UPS)
  • Cooling and Air conditioning system
  • GARR GPoP
  • Easily accessible with lorries from the road
  • Not suitable for office use (remote control
    needed)

Luca dell'Agnello - Stefano Zani
22
Electric Power
  • Electric Power Generator
  • 1250 kVA (~1000 kW)
  • → up to 160 racks
  • Uninterruptible Power Supply (UPS)
  • Located in a separate room (conditioned and
    ventilated)
  • 800 kVA (~640 kW)
  • 380 V three-phase distributed to all racks
    (Blindo busbar)
  • Rack power controls output 3 independent 220 V
    lines for computers
  • Rack power controls sustain a load of up to 16 or
    32 A
  • 32 A power controls are needed for racks of 36
    bi-processor Xeon nodes (a worked power budget
    follows this list)
  • 3 APC power distribution modules (24 outlets
    each)
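The rack current figures can be cross-checked with a little arithmetic. The Python sketch below shows why a rack of 36 dual-Xeon 1U nodes needs the 32 A controls and roughly how the ~1000 kW generator budget maps onto 150-160 racks; the per-node consumption is an assumed ballpark figure, not a measured value.

# Back-of-the-envelope check of the electrical figures above.
# The per-node draw is an assumed ballpark value, not a measurement.
NODES_PER_RACK = 36
WATTS_PER_NODE = 190.0          # assumed draw of a 1U dual-Xeon node, W
LINE_VOLTAGE = 220.0            # V, rack power-control output lines
GENERATOR_KW = 1000.0           # ~1250 kVA generator

rack_watts = NODES_PER_RACK * WATTS_PER_NODE
rack_amps = rack_watts / LINE_VOLTAGE            # total current at 220 V
racks_supported = GENERATOR_KW * 1000.0 / rack_watts

print(f"Per rack: {rack_watts / 1000:.1f} kW, {rack_amps:.0f} A total at 220 V "
      f"-> needs the 32 A controls (16 A would not be enough)")
print(f"Generator budget: ~{racks_supported:.0f} racks of this kind")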

Luca dell'Agnello - Stefano Zani
23
Cooling and Air Conditioning
  • RLS units (Airwell) on the roof
  • 700 kW
  • Water cooling
  • Needs a booster pump (20 m from the Tier1 hall up
    to the roof)
  • Noise insulation
  • 1 air conditioning unit (uses 20% of the RLS
    cooling power and controls humidity)
  • 12 Local Cooling Systems (Hiross) in the
    computing room

Luca dell'Agnello - Stefano Zani
24
WN typical Rack Composition
  • Power Controls (3U)
  • 1 network switch (1-2U)
  • 48 FE copper interfaces
  • 2 GE fiber uplinks
  • 34-36 1U WNs
  • Connected to network switch via FE
  • Connected to KVM system

Luca dell'Agnello - Stefano Zani
25
Remote console control
  • Paragon UTM8 (Raritan)
  • 8 Analog (UTP/Fiber) output connections
  • Supports up to 32 daisy chains of 40 nodes
    (UKVMSPD modules needed)
  • Costs: 6 kEuro + 125 Euro/server (UKVMSPD module)
  • IP-Reach (expansion to support IP transport)
    evaluated but not used
  • Autoview 2000R (Avocent)
  • 1 analog + 2 digital (IP transport) output
    connections
  • Supports connections to up to 16 nodes
  • Optional expansion to 16x8 nodes
  • Compatible with Paragon (gateway to IP)

Luca dell'Agnello - Stefano Zani
26
Networking (1)
  • Main network infrastructure based on optical
    fibres (~20 km)
  • To ease adoption of new (high-performance)
    transmission technologies
  • To ensure better electrical insulation over long
    distances
  • Local (rack-wide) links with UTP (copper) cables
  • LAN has a classical star topology
  • GE core switch (Enterasys ER16)
  • NEW core switch (Black Diamond 10808) is in
    pre-production
  • 120 Gigabit fiber ports (scales up to 480 ports)
  • 12 × 10 Gigabit Ethernet ports (scales up to a
    max of 48 ports)
  • Farms up-link via GE trunk (Channel) to core
    switch
  • Disk Servers directly connected to GE switch
    (mainly fibre)

Luca dell'Agnello - Stefano Zani
27
Networking (2)
  • WNs connected via FE to the rack switch (1 switch
    per rack)
  • Not a single brand for switches (as for WNs)
  • 3 Extreme Summit: 48 FE + 2 GE ports
  • 3 Cisco 3550: 48 FE + 2 GE ports
  • 8 Enterasys: 48 FE + 2 GE ports
  • 10 Summit 400 switches: 48 GE copper + 2 GE ports
    (2 × 10 Gb ready)
  • Homogeneous characteristics
  • 48 Copper Ethernet ports
  • Support of main standards (e.g. 802.1q)
  • 2 Gigabit up-links (optical fibers) to core
    switch
  • CNAF interconnected to GARR-G backbone at 1 Gbps.

Luca dell'Agnello - Stefano Zani
28
Network Configuration
(Network diagram: SSR8600 core switch on the 1st floor, 1 Gb/s links,
internal services, disk servers connected via Fibre Channel, farm switch
FarmSWG2, host 131.154.99.121, T1)
S.Zani
29
L2 Configuration
  • Each Experiment has its own VLAN
  • Solution adopted for complete granularity
  • Port based VLAN
  • VLAN identifiers are propagated across switches
    (802.1q; the tag format is sketched after this
    list)
  • Avoids recabling (or physically moving) machines
    to change the farm topology
  • Level 2 isolation of farms
  • Possibility to define multi-tag (Trunk) ports
    (for servers)
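An 802.1Q tag is just 4 extra bytes in the Ethernet header: the 16-bit TPID 0x8100 followed by a 3-bit priority, a DEI bit and a 12-bit VLAN ID. The Python sketch below builds one to show what the switches propagate on trunk links; the VLAN number is an arbitrary example, not a real experiment VLAN.

# Build the 4-byte 802.1Q tag carried on trunk links in front of the EtherType:
# 16-bit TPID (0x8100) followed by 3-bit priority, 1-bit DEI and a 12-bit VLAN ID.
# The VLAN number here is an arbitrary example.
import struct

def dot1q_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    if not 0 <= vlan_id < 4096:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (priority & 0x7) << 13 | (dei & 0x1) << 12 | vlan_id
    return struct.pack("!HH", 0x8100, tci)

if __name__ == "__main__":
    tag = dot1q_tag(vlan_id=101)   # e.g. the VLAN of one experiment's farm
    print(tag.hex())               # -> 81000065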

Luca dell'Agnello - Stefano Zani
30
Power Switches
  • 2 models used at Tier1
  • Old: APC MasterSwitch Control Unit AP9224
    controlling 3 × 8-outlet 9222 PDUs from one
    Ethernet interface
  • New: APC PDU Control Unit AP7951 controlling 24
    outlets from one Ethernet interface
  • Zero rack units (vertical mount)
  • Access to the configuration/control menu via
    serial/telnet/web/SNMP
  • 1 dedicated machine running the APC InfraStruXure
    Manager software (in progress)

Luca dell'Agnello - Stefano Zani
31
Remote Power Distribution Unit
(Screenshot of the APC InfraStruXure Manager software showing the status
of all Tier1 PDUs)
Luca dell'Agnello - Stefano Zani
32
Computing units
  • 800 1U rack-mountable Intel dual-processor
    servers
  • From 800 MHz to 3.06 GHz
  • 700 WNs (~1400 CPUs) available for LCG
  • Tendering
  • HPC farm with MPI
  • Servers interconnected via Infiniband
  • Opteron farm (near future)

Luca dell'Agnello - Stefano Zani
33
Storage Resources
  • 200 TB raw disk space online
  • NAS
  • NAS1 + NAS4 (3ware, low cost): tot. 4.2 TB
  • NAS2 + NAS3 (Procom): tot. 13.2 TB
  • SAN
  • Dell PowerVault 660F: tot. 7 TB
  • Axus (Brownie): tot. 2 TB
  • STK BladeStore: tot. 9 TB
  • Infortrend ES A16F-R: tot. 12 TB
  • IBM FAStT900: tot. 150 TB

Luca dell'Agnello - Stefano Zani
34
Storage resources (diagram)
(Diagram of the storage infrastructure, from the client side to the back end:)
  • WAN or Tier1 LAN on the client side
  • STK180 library with 100 LTO tapes (10 TByte native)
  • STK L5500 robot (max 5000 slots), 6 LTO-2 drives
  • CASTOR server + staging
  • Gadzoox Slingshot FC switch, 18 ports
  • RAIDTEC, 1800 GByte, 2 SCSI interfaces
  • IDE NAS1 + NAS4 (nas4.cnaf.infn.it), 1800 + 2000 GByte: CDF, LHCb
  • PROCOM NAS2 (nas2.cnaf.infn.it), 8100 GByte: Virgo, ATLAS
  • PROCOM NAS3 (nas3.cnaf.infn.it), 4700 GByte: ALICE, ATLAS
  • AXUS Brownie, circa 2200 GByte, 2 FC interfaces
  • DELL PowerVault, 7100 GByte, 2 FC interfaces
  • STK BladeStore, circa 10000 GByte, 4 FC interfaces
  • Infortrend ES A16F-R, 12 TB
  • File servers: diskserv-cms-1 (CMS), Fcds2 (alias diskserv-ams-1), diskserv-atlas-1
  • FAIL-OVER support
Luca dell'Agnello - Stefano Zani
35
Storage management and access (1)
  • Tier1 storage resources accessible as classical
    storage or via grid
  • Non-grid disk storage accessible via NFS
  • Generic WNs also have an AFS client
  • NFS volumes are mounted via autofs, with maps
    configured in LDAP (a map sketch follows this
    list)
  • A unique configuration repository eases
    maintenance
  • Integration of the LDAP configuration with the
    Tier1 db data is in progress
  • Scalability issues with NFS
  • Experienced stalled mount points
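With autofs, each NFS volume becomes one line of an indirect map ("key  options  server:/export"). The Python sketch below renders such a map from a small in-memory dictionary standing in for the LDAP-backed configuration repository mentioned above; the export paths, mount options and the mapping of servers to volumes are invented, with server names borrowed from the storage diagram only as examples.

# Sketch: render an autofs indirect map from a configuration repository,
# standing in for the LDAP-backed repository described above.
# Export paths, options and the server-to-volume mapping are invented examples.
VOLUMES = {
    # key (mount name under /data) -> (NFS server, export path)
    "cms-data":   ("diskserv-cms-1.cnaf.infn.it", "/export/cms"),
    "virgo-data": ("nas2.cnaf.infn.it", "/export/virgo"),
}
NFS_OPTIONS = "-rw,hard,intr"

def render_auto_data() -> str:
    """Produce the contents of an indirect map such as /etc/auto.data."""
    lines = [f"{key}\t{NFS_OPTIONS}\t{server}:{export}"
             for key, (server, export) in sorted(VOLUMES.items())]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # The master map would reference it with a line like:  /data  /etc/auto.data
    print(render_auto_data(), end="")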

Luca dell'Agnello - Stefano Zani
36
Storage management and access (2)
  • Part of disk storage used as front-end to CASTOR
  • Balance between disk and CASTOR according to the
    experiments' needs
  • 1 stager for each experiment (installation in
    progress)
  • CASTOR accessible both directly and via grid
  • CASTOR SE available
  • ALICE Data Challenge used CASTOR architecture
  • Feedback to CASTOR team
  • Need optimization for file restaging

Luca dell'Agnello - Stefano Zani
37
Tier1 Database
  • Resource database and management interface
  • Postgres database as back end
  • Web interface (Apache + mod_ssl + PHP)
  • Hardware characteristics of the servers
  • Software configuration of the servers
  • Server allocation
  • Direct access to the db is possible for some
    applications (a query sketch follows this list)
  • Monitoring system
  • Nagios
  • Interface to configure switches and interoperate
    with installation system.
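Direct access to the Postgres back end from an application could look like the sketch below. psycopg2 is assumed as the client library, and the host, table and column names are invented, since the real Tier1 db schema is not described in the slides.

# Sketch of direct access to the Tier1 resource database (Postgres back end).
# psycopg2 is an assumed client library; host, table and column names are
# invented, as the real schema is not described in the slides.
import psycopg2

def servers_allocated_to(experiment: str):
    conn = psycopg2.connect(host="tier1-db.example", dbname="tier1", user="reader")
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT hostname, cpu_model, rack FROM servers WHERE experiment = %s",
                (experiment,),
            )
            return cur.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for host, cpu, rack in servers_allocated_to("CMS"):
        print(host, cpu, rack)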

Luca dell'Agnello - Stefano Zani
38
Installation issues
  • Centralized installation system
  • LCFG (EDG WP4)
  • Integration with a central Tier1 db
  • Moving a node from one farm to another implies
    only a change of IP address (not of name)
  • A single DHCP server for all VLANs
  • Support for DDNS (cr.cnaf.infn.it)
  • Investigating Quattor for future needs

Luca dell'Agnello - Stefano Zani
39
Our Desired Solution for Resource Access
  • SHARED RESOURCES among all experiments
  • Priorities and reservations managed by the
    scheduler
  • Most of Tier1 computing machines installed as LCG
    Worker Nodes, with light modifications to support
    more VOs
  • Application software not installed directly on
    the WNs but accessed from outside (NFS, AFS, ...)
  • One or more Resource Managers to manage all the
    WNs in a centralized way
  • Standard way to access Storage for each
    application

Luca dell'Agnello - Stefano Zani