1
Tier 1
  • Luca dell'Agnello
  • INFN CNAF, Bologna
  • Workshop CCR
  • Paestum, 9-12 June 2003

2
INFN Tier1
  • INFN computing facility for the HENP community
  • Location: INFN-CNAF, Bologna (Italy)
  • One of the main nodes of the GARR network
  • Ending the prototype phase this year
  • Fully operational next year
  • Personnel: 10 FTEs
  • Multi-experiment
  • LHC experiments, Virgo, CDF
  • BABAR (3rd quarter 2003)
  • Resources dynamically assigned to experiments
    according to their needs
  • Main (~50%) Italian resource for LCG
  • Coordination with Tier0 and the other Tier1s
    (management, security, etc.)
  • Coordination with the Italian Tier2s and Tier3s
  • Participation in grid test-beds (EDG, EDT, GLUE)
  • Participation in the CMS, ATLAS, LHCb and ALICE
    data challenges
  • GOC (deployment in progress)

3
Networking
  • CNAF interconnected to GARR-B backbone at 1 Gbps.
  • Giga-PoP co-located
  • GARR-B backbone at 2.5 Gbps.
  • LAN: star topology
  • Computing elements connected via FE to rack
    switches
  • 3 Extreme Summit switches (48 FE + 2 GE ports)
  • 3 Cisco 3550 switches (48 FE + 2 GE ports)
  • 8 Enterasys switches (48 FE + 2 GE ports)
  • Servers connected to GE switch
  • 1 3Com L2 switch with 24 GE ports
  • Uplink via GE to core switch
  • Extreme 7i with 32 GE ports
  • Enterasys ER16 Gigabit switch router
  • Disk servers connected via GE to core switch.

4
[Network diagram: GARR connected to the CNAF LAN at 1 Gbps; the Tier1 LAN
hangs off the CNAF LAN switch (Switch-lanCNAF), with the LHCBSW1 and FarmSWG1
switches (VLAN tagging enabled) and the file servers Fcds1, Fcds2, Fcds3;
1 Gbps links]
5
Vlan Tagging
  • Define VLANs across switches
  • Independent of switch brand (standard IEEE 802.1Q)
  • Solution adopted for complete granularity
  • Each switch port is associated with one VLAN
    identifier
  • Each rack switch uplink propagates VLAN
    information
  • VLAN identifiers are propagated across switches
  • Each farm has its own VLAN
  • Avoids recabling (or physically moving) hardware to
    change the topology
  • Level 2 isolation of farms
  • Aid for enforcement of security measures
  • Possible to define multi-tag (trunk) ports for
    servers (see the sketch below)
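
As a minimal sketch of this per-port VLAN scheme, the fragment below models access ports carrying one untagged VLAN and trunk (multi-tag) ports carrying several tagged VLANs; switch names, port numbers and VLAN IDs are hypothetical, not the actual CNAF configuration.

```python
# Minimal sketch of the per-port VLAN assignment described on this slide.
# Switch names, port numbers and VLAN IDs are hypothetical examples.

FARM_VLANS = {"cms": 101, "atlas": 102, "lhcb": 103, "virgo": 104}

# Access ports: one untagged VLAN per computing node.
access_ports = {
    ("rack01-sw", 1): FARM_VLANS["cms"],
    ("rack01-sw", 2): FARM_VLANS["cms"],
    ("rack02-sw", 1): FARM_VLANS["atlas"],
}

# Uplinks and server ports are 802.1Q trunks carrying several tagged VLANs.
trunk_ports = {
    ("rack01-sw", "uplink1"): sorted(set(access_ports.values())),
    ("core-sw", "diskserver1"): [FARM_VLANS["cms"], FARM_VLANS["lhcb"]],
}

def move_node(port, new_farm):
    """Re-assign a node to another farm by changing its port's VLAN:
    no recabling or physical move is involved."""
    access_ports[port] = FARM_VLANS[new_farm]

move_node(("rack01-sw", 2), "atlas")
print(access_ports[("rack01-sw", 2)])   # -> 102
```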

6
Computing units (1)
  • 160 1U rack-mountable Intel dual processor
    servers
  • 800 MHz - 2.2 GHz
  • 160 1U dual-processor Pentium IV 2.4 GHz servers to
    be shipped this month
  • 1 switch per rack
  • 48 FastEthernet ports
  • 2 Gigabit uplinks
  • Interconnected to core switch via 2 pairs of
    optical fibers
  • Also 4 UTP cables available
  • 1 network power controller per rack
  • 380 V three-phase power as input
  • Outputs 3 independent 220 V lines
  • Completely programmable (permits switching the
    servers on gradually)
  • Remotely manageable via web

7
Computing units (2)
  • OS: Linux RedHat (6.2, 7.2, 7.3, 7.3.2)
  • Experiment-specific library software
  • Goal: have generic computing units
  • Experiment-specific library software in a standard
    location (e.g. /opt/cms)
  • Centralized installation system
  • LCFG (EDG WP4)
  • Integration with central Tier1 db (see below)
  • Each farm on a distinct VLAN
  • Moving a server from one farm to another changes
    its IP address but not its name (see the sketch
    after this list)
  • Single DHCP server for all VLANs
  • Support for DDNS (cr.cnaf.infn.it) in progress
  • Queue manager: PBS
  • Not possible to use the Pro version (free only for
    edu)
  • Free version not flexible enough
  • Tests of integration with MAUI in progress
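
A sketch of the "same name, new IP" behaviour mentioned above: when a node moves between farms (i.e. VLANs), only the address side of its DHCP reservation changes. The subnets, MAC address and hostname below are invented for illustration.

```python
# Sketch of the "same name, new IP" behaviour: a node keeps its hostname when
# it moves between farms, but receives an address from the new farm's VLAN
# subnet. Subnets, MAC and hostname are invented examples.
import ipaddress

FARM_SUBNETS = {
    "cms":   ipaddress.ip_network("192.168.101.0/24"),
    "atlas": ipaddress.ip_network("192.168.102.0/24"),
}

def dhcp_reservation(hostname, mac, farm, host_index):
    """Return a (hostname, mac, ip) reservation for the farm's subnet,
    ready to be fed to the central DHCP server configuration."""
    ip = FARM_SUBNETS[farm][host_index]
    return hostname, mac, str(ip)

# The same node before and after moving from the CMS to the ATLAS farm:
print(dhcp_reservation("wn042.cr.cnaf.infn.it", "00:30:48:aa:bb:cc", "cms", 42))
print(dhcp_reservation("wn042.cr.cnaf.infn.it", "00:30:48:aa:bb:cc", "atlas", 42))
```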

8
Tier1 Database
  • Resource database and management interface
  • Hardware characteristics of the servers
  • Software configuration of the servers
  • Allocation of the servers
  • Postgres database as back end
  • Web interface (Apache + mod_ssl + PHP)
  • Direct access to the db possible for some
    applications (see the sketch below)
  • Monitoring system
  • nagios
  • Interface to configure switches and interoperate
    with LCFG
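
Since some applications may query the database directly, here is a minimal sketch of such a query against the Postgres back end; the host, credentials, table and column names are hypothetical, not the real Tier1 schema.

```python
# Sketch of direct database access from an application, against the Postgres
# back end mentioned above. Host, credentials, table and column names are
# hypothetical, not the real Tier1 schema.
import psycopg2

conn = psycopg2.connect(host="tier1db.cnaf.infn.it",   # hypothetical host
                        dbname="tier1", user="reader")
cur = conn.cursor()
cur.execute("""
    SELECT hostname, rack, vlan, experiment
    FROM servers
    WHERE experiment = %s
    ORDER BY rack, hostname
""", ("cms",))
for hostname, rack, vlan, experiment in cur.fetchall():
    print(f"{hostname:30s} rack {rack}  VLAN {vlan}  ({experiment})")
cur.close()
conn.close()
```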

9
Monitoring/Alarms
  • Monitoring system developed at CNAF
  • Socket server on each computer
  • Centralized collector
  • 100 variables collected every 5 minutes (see the
    collector sketch below)
  • Data archived on flat file
  • In progress: XML structure for the data archives
  • User interface: http://tier1.cnaf.infn.it/monitor/
  • Next release: Java interface (collaboration with
    D. Galli, LHCb)
  • Critical parameters periodically checked by
    nagios
  • Connectivity (i.e. ping), system load, bandwidth
    use, ssh daemon, PBS, etc.
  • User interface: http://tier1.cnaf.infn.it/nagios/
  • In progress: configuration interface
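
A rough sketch of the collector side of this scheme, assuming each node runs a small text-based socket server; the port number, request string, node list and archive path are illustrative assumptions.

```python
# Rough sketch of the collector side of the monitoring system: it polls a
# small socket server on each node and appends the reply to a flat archive
# file. Port, request string, node list and archive path are assumptions.
import socket
import time

NODES = ["wn001.cr.cnaf.infn.it", "wn002.cr.cnaf.infn.it"]   # hypothetical
PORT = 9099                                                   # hypothetical
ARCHIVE = "/var/monitor/archive.txt"

def poll_node(host):
    """Ask a node's socket server for its current metrics (one text line)."""
    with socket.create_connection((host, PORT), timeout=5) as s:
        s.sendall(b"GET ALL\n")
        return s.recv(65536).decode().strip()

while True:
    stamp = int(time.time())
    with open(ARCHIVE, "a") as archive:
        for node in NODES:
            try:
                archive.write(f"{stamp} {node} {poll_node(node)}\n")
            except OSError as err:
                archive.write(f"{stamp} {node} ERROR {err}\n")
    time.sleep(300)   # one collection round every 5 minutes, as on the slide
```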

10
Remote control
  • KVM switches permit remote control of the servers'
    consoles
  • 2 models under test
  • Paragon UTM8 (Raritan)
  • 8 Analog (UTP/Fiber) output connections
  • Supports up to 32 daisy chains of 40 servers
    (UKVMSPD modules needed)
  • Costs ~6 kEuro + 125 Euro/server (UKVMSPD module)
  • IP-Reach (expansion to support IP transport): ~8
    kEuro
  • Autoview 2000R (Avocent)
  • 1 analog + 2 digital (IP transport) output
    connections
  • Supports connections to up to 16 servers
  • 3 switches needed for a standard rack
  • Costs ~4.5 kEuro
  • NPCs (Network Power Controllers) permit remote and
    scheduled power cycling via SNMP calls or web (see
    the sketch below)
  • Bid under evaluation
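
To illustrate the SNMP-driven power control mentioned above, the sketch below sets a hypothetical outlet-control OID with the pysnmp library and staggers the switch-on of a rack; the OID, community string, host name and outlet numbering are assumptions, not those of the devices under evaluation.

```python
# Sketch of SNMP-driven power control in the spirit of the NPCs above, plus a
# staggered switch-on of a full rack. The pysnmp library is assumed to be
# available; the OID, community string, host and outlet numbering are
# hypothetical, not those of the devices under evaluation.
import time
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity,
                          Integer, setCmd)

OUTLET_OID = "1.3.6.1.4.1.99999.1.1"   # hypothetical outlet-control OID

def set_outlet(npc_host, outlet, state):
    """Switch one outlet of a network power controller on (1) or off (0)."""
    error_indication, error_status, _, _ = next(
        setCmd(SnmpEngine(),
               CommunityData("private"),
               UdpTransportTarget((npc_host, 161)),
               ContextData(),
               ObjectType(ObjectIdentity(f"{OUTLET_OID}.{outlet}"),
                          Integer(state))))
    if error_indication or error_status:
        raise RuntimeError(f"SNMP set failed: {error_indication or error_status}")

# Gradual switch-on of a 40-server rack, one outlet every 10 seconds:
for outlet in range(1, 41):
    set_outlet("npc-rack01.cr.cnaf.infn.it", outlet, 1)   # hypothetical host
    time.sleep(10)
```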

11
Raritan
12
Avocent
13
Storage
  • Access to on-line data: DAS, NAS, SAN
  • 32 TB (> 70 TB this month)
  • Data served via NFS v3
  • Test of several hw technologies (EIDE, SCSI, FC)
  • Bid for FC switch
  • Study of large file system solutions (> 2 TB) and
    load balancing/failover architectures (see the
    sketch after this list)
  • GFS (load balancing)
  • Problems with the lock server (better in hardware?)
  • GPFS (load balancing, large file systems)
  • Not that easy to install and configure.
  • HA (failover)
  • SAN on WAN tests (collaboration with CASPUR)
  • Tests with PVFS (LHCb, Alice)
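
As a toy illustration of the load-balancing idea under study, the sketch below simply picks the NFS-mounted volume with the most free space; the mount points are hypothetical, and the real candidates (GFS, GPFS, HA failover) work at the file-system level rather than in application code.

```python
# Toy illustration of a load-balancing policy across NFS-mounted volumes:
# pick the mount point with the most free space before writing new data.
# Mount points are hypothetical; the solutions under study (GFS, GPFS, HA
# failover) do this at the file-system level rather than in application code.
import os

NFS_MOUNTS = ["/storage/fcds1", "/storage/fcds2", "/storage/fcds3"]   # hypothetical

def free_bytes(path):
    """Free space on the file system holding 'path', in bytes."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize

def least_loaded(mounts):
    """Return the mount point with the largest amount of free space."""
    return max(mounts, key=free_bytes)

print("write new files under:", least_loaded(NFS_MOUNTS))
```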

14
STORAGE CONFIGURATION
[Storage configuration diagram - client side: the gateway or all farm nodes
must access the storage over the WAN or the TIER1 LAN; CASTOR server with
staging area]
  • RAIDTEC: 1800 GByte, 2 SCSI interfaces
  • IDE NAS4 (Nas4.cnaf.infn.it): 1800 GByte - CDF, LHCb
  • STK180 with 100 LTO tapes (10 TByte native)
  • CMS fileserver (or more, in cluster or HA):
    diskserv-cms-1.cnaf.infn.it
  • Fileserver Fcds3.cnaf.infn.it - fail-over support
  • FC switch: on order
  • PROCOM NAS2 (Nas2.cnaf.infn.it): 8100 GByte - Virgo, ATLAS
  • PROCOM NAS3 (Nas3.cnaf.infn.it): 4700 GByte - ALICE, ATLAS
  • AXUS BROWIE: circa 2200 GByte, 2 FC interfaces
  • DELL POWERVAULT: 7100 GByte, 2 FC interfaces
15
Mass Storage Resources
  • StorageTek library with 9840 and LTO drives
  • 180 tapes (100/200 GB each)
  • StorageTek L5500 with 2000-5000 slots on order
  • LTOv2 (200/400 GB each)
  • 6 I/O drives
  • 500 tapes ordered
  • CASTOR as front-end software for archiving
  • Direct access for end-users
  • Oracle as back-end

16
TAPE HARDWARE
17
CASTOR
  • Developed and maintained at CERN
  • Chosen as front-end for archiving
  • Features
  • Needs a staging area on disk (~20% of the tape
    capacity)
  • ORACLE database as back-end for full capability
    (a MySQL interface is also included)
  • The ORACLE database is backed up daily
  • Every client needs to install the CASTOR client
    package (works on almost all major OSs, including
    Windows)
  • Access via RFIO commands (see the sketch after this
    list)
  • CNAF setup
  • Experiment access from TIER1 farms via rfio with
    UID/GID protection from single server
  • National Archive support via rfio with UID/GID
    protection from single server
  • Grid SE tested and working
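
A minimal sketch of client-side access through RFIO, wrapping the rfcp copy command shipped with the CASTOR client package; the stager environment variables and the CASTOR name-space path shown are illustrative assumptions, not the actual CNAF settings.

```python
# Minimal sketch of client-side access through RFIO, wrapping the rfcp copy
# command shipped with the CASTOR client package. The stager environment
# variables and the CASTOR name-space path are illustrative assumptions,
# not the actual CNAF settings.
import os
import subprocess

env = dict(os.environ,
           STAGE_HOST="castor.cnaf.infn.it",    # hypothetical stager host
           STAGE_POOL="lhcb")                   # hypothetical staging pool

def archive_file(local_path, castor_path):
    """Copy a local file into the CASTOR name space via rfcp."""
    subprocess.run(["rfcp", local_path, castor_path], env=env, check=True)

archive_file("run01234.raw",
             "/castor/cnaf.infn.it/lhcb/raw/run01234.raw")   # hypothetical path
```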

18
CASTOR at CNAF
[Diagram: STK L180 library with 2 9840 drives and 4 LTO Ultrium drives;
robot access via SCSI, controlled by ACSLS; LEGATO NSR (backup) server and
CASTOR server with a 2 TB staging disk, attached via SCSI and the LAN]
19
New Location
  • The present location (at CNAF office level) is
    not suitable, mainly due to
  • Insufficient space.
  • Weight (~700 kg / 0.5 m² for a standard rack with
    40 1U servers).
  • Moving to the final location (early) this summer.
  • New hall in the basement (-2nd floor) almost
    ready.
  • 1000 m2 of total space
  • Computers
  • Electric Power System (UPS, MPU)
  • Air conditioning system
  • Easily accessible with lorries from the road
  • Not suitable for office use (remote control)

20
Electric Power
  • 220 V single-phase needed for computers.
  • 4-8 kW per standard rack (with 40 dual-processor
    servers) → 16-32 A.
  • 380 V three-phase for other devices (tape
    libraries, air conditioning, etc.).
  • To avoid black-outs, Tier1 has standard
    protection systems.
  • Installed in the new location
  • UPS (Uninterruptible Power Supply).
  • Located in a separate room (conditioned and
    ventilated).
  • 800 kVA (~640 kW).
  • Electric generator.
  • 1250 kVA (~1000 kW).
  • → up to 80-160 racks (see the check below).
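
A quick arithmetic check of the rack-count estimate above: dividing the ~640 kW available from the UPS by the 4-8 kW drawn per standard rack reproduces the quoted 80-160 rack range.

```python
# Quick check of the rack-count estimate above: dividing the ~640 kW
# available from the UPS by the 4-8 kW drawn per standard rack reproduces
# the quoted 80-160 rack range.
UPS_KW = 640
RACK_KW_MIN, RACK_KW_MAX = 4, 8

print(UPS_KW // RACK_KW_MAX, "racks at full load (8 kW each)")    # -> 80
print(UPS_KW // RACK_KW_MIN, "racks at light load (4 kW each)")   # -> 160
```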

21
Summary & conclusions
  • INFN-TIER1 is closing the prototype phase
  • But still testing new technological solutions
  • Going to move the resources to the final location
  • Interoperation with grid projects (EDG, EDT, LCG)
  • Starting integration with LCG
  • Participating in CMS DC04
  • 70 computing servers
  • 4M events (~40% of the Italian commitment),
    15 + 60 (Tier0) TB of data (July to December 03)
  • Analysis of simulated events (January to February
    04)
  • Interoperation with Tier0 (CERN) and Tier2 (LNL)