Title: HW Solutions for the Tier 1 at CNAF
1 HW Solutions for the Tier 1 at CNAF
- Luca dell'Agnello
- Stefano Zani
- (INFN CNAF, Italy)
- III CCR Workshop
- May 24-27 2004
2 Tier1
- INFN computing facility for the HEP community
- Ended the prototype phase last year, now fully operational
- Location: INFN-CNAF, Bologna (Italy)
- One of the main nodes of the GARR network
- Personnel: 10 FTEs
- 3 FTEs dedicated to experiments
- Multi-experiment
- LHC experiments (Alice, Atlas, CMS, LHCb), Virgo, CDF, BABAR, AMS, MAGIC, ...
- Resources dynamically assigned to experiments according to their needs
- 50% of the Italian resources for LCG
- Participation in the experiments' data challenges
- Integrated with the Italian Grid
- Resources also accessible in the traditional (non-grid) way
3 Logistics
- Recently moved to a new location (last January)
- Hall in the basement (-2nd floor)
- 1000 m² of total space
- Computing nodes
- Storage devices
- Electric power system (UPS)
- Cooling and air conditioning system
- GARR Giga-PoP
- Easily accessible by lorries from the road
- Not suitable for office use (remote control needed)
4 Electric Power
- Electric Power Generator
- 1250 kVA (~1000 kW; see the conversion note below)
- Enough for up to 160 racks
- Uninterruptible Power Supply (UPS)
- Located in a separate room (conditioned and ventilated)
- 800 kVA (~640 kW)
- 380 V three-phase distributed to all racks (busbar)
- Rack power controls output 3 independent 220 V lines for the computers
- Rack power controls sustain loads up to 16 or 32 A
- 32 A power controls needed for racks of 36 Xeon bi-processors
- 3 APC power distribution modules (24 outlets each)
- Completely programmable (allows switching servers on gradually)
- Remotely manageable via web
- 380 V three-phase for the other devices (tape libraries, air conditioning, etc.)
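
The kW figures quoted above follow from the kVA ratings under the usual apparent-to-real power conversion; the power factor of 0.8 is an assumption consistent with the numbers on the slide, not something stated explicitly:

```latex
P_{\mathrm{kW}} = S_{\mathrm{kVA}} \cdot \cos\varphi, \qquad \cos\varphi \approx 0.8
\quad\Rightarrow\quad
1250\ \mathrm{kVA} \times 0.8 = 1000\ \mathrm{kW}, \qquad
 800\ \mathrm{kVA} \times 0.8 = 640\ \mathrm{kW}
```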
5 Cooling and Air Conditioning
- RLS (Airwell) on the roof
- 700 kW
- Water cooling
- Booster pump needed (20 m from the Tier1 hall up to the roof)
- Noise insulation
- 1 Air Conditioning Unit (uses 20% of the RLS cooling power and controls humidity)
- 12 Local Cooling Systems (Hiross) in the computing room
6 WN typical Rack Composition
- Power Controls (3U)
- 1 network switch (1-2U)
- 48 FE copper interfaces
- 2 GE fiber uplinks
- 34-36 1U WNs
- Connected to network switch via FE
- Connected to KVM system
7 Remote console control
- Paragon UTM8 (Raritan)
- 8 analog (UTP/Fiber) output connections
- Supports up to 32 daisy chains of 40 nodes (UKVMSPD modules needed)
- Costs: 6 kEuro + 125 Euro/server (UKVMSPD module)
- IP-Reach (expansion to support IP transport) evaluated but not used
- Autoview 2000R (Avocent)
- 1 analog + 2 digital (IP transport) output connections
- Supports connections to up to 16 nodes
- Optional expansion to 16x8 nodes
- Compatible with Paragon (gateway to IP)
- Evaluating Cyclades AlterPath KVM via serial line (cheaper)
8 Networking (1)
- Main network infrastructure based on optical fibres (~20 km)
- To ease the adoption of new (high-performance) transmission technologies
- To ensure better electrical insulation over long distances
- Local (rack-wide) links with UTP (copper) cables
- LAN has a classical star topology
- GE core switch (Enterasys ER16)
- NEW core switch is going to be shipped (next July)
- 120 Gigabit fiber ports (scales up to 480 ports)
- 12 x 10 Gigabit Ethernet ports (scales up to max 48 ports)
- Farm up-links via GE trunks (channels) to the core switch
- Disk servers directly connected to the GE switch (mainly fibre)
9 Networking (2)
- WNs connected via FE to the rack switch (1 switch per rack)
- Not a single brand for switches (as for the WNs)
- 3 Extreme Summit, 48 FE + 2 GE ports
- 3 Cisco 3550, 48 FE + 2 GE ports
- 8 Enterasys, 48 FE + 2 GE ports
- 7 Summit 400 switches, 48 GE copper + 2 GE ports (2x10Gb ready)
- Homogeneous characteristics
- 48 copper Ethernet ports
- Support for the main standards (e.g. 802.1q)
- 2 Gigabit up-links (optical fibers) to the core switch
- CNAF interconnected to the GARR-G backbone at 1 Gbps
- Giga-PoP co-located
- 2 x 1 Gbps test links to CERN and Karlsruhe
10 Network Configuration
[Diagram: Tier1 network configuration - internal services, SSR8600 router, 1st floor network, farm switches (e.g. FarmSWG2) and disk servers with Fibre Channel (F.C.) attached storage; example disk server address 131.154.99.121]
11 L2 Configuration
- Each experiment has its own VLAN (see the mapping sketch below)
- Solution adopted for complete granularity
- Port-based VLANs
- VLAN identifiers are propagated across switches (802.1q)
- Avoids recabling (or physically moving) machines to change the farm topology
- Level 2 isolation of farms
- Possibility to define multi-tag (trunk) ports (for servers)
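
To make the port-based VLAN scheme concrete, here is a minimal sketch of how an experiment-to-VLAN map could drive the per-port assignment of a 48-port rack switch. All tags, hostnames and the port layout are hypothetical; only the idea of one 802.1q VLAN per experiment comes from the slide.

```python
# Hypothetical sketch: derive per-port VLAN assignments for a 48-port rack
# switch from an experiment -> 802.1q tag map. Tags and host placement are
# invented for illustration.

EXPERIMENT_VLAN = {          # hypothetical 802.1q tags, one per experiment
    "cms": 101,
    "atlas": 102,
    "alice": 103,
    "lhcb": 104,
    "virgo": 110,
}

# Hypothetical allocation: which WN (by hostname) sits on which switch port.
PORT_TO_HOST = {port: f"wn-{port:03d}" for port in range(1, 49)}

# Hypothetical host -> experiment assignment (would come from the Tier1 db).
HOST_TO_EXPERIMENT = {host: "cms" if port <= 24 else "atlas"
                      for port, host in PORT_TO_HOST.items()}

def port_vlan_table():
    """Return (port, host, vlan_tag) triples for an untagged access-port setup."""
    table = []
    for port, host in sorted(PORT_TO_HOST.items()):
        experiment = HOST_TO_EXPERIMENT[host]
        table.append((port, host, EXPERIMENT_VLAN[experiment]))
    return table

if __name__ == "__main__":
    for port, host, tag in port_vlan_table():
        # Moving a WN to another experiment only changes this table (and its IP),
        # not the cabling: the switch port is simply re-tagged.
        print(f"port {port:2d}  {host:10s}  untagged VLAN {tag}")
```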
12 Power Switches
- 2 models used at Tier1
- Old: APC MasterSwitch Control Unit AP9224 controlling 3x8 outlets (9222 PDUs) from 1 Ethernet interface
- New: APC PDU Control Unit AP7951 controlling 24 outlets from 1 Ethernet interface
- Zero rack units (vertical mount)
- Access to the configuration/control menu via serial/telnet/web/snmp (see the polling sketch below)
- 1 dedicated machine running the APC InfraStruXure Manager software (in progress)
- See also http://www.cnaf.infn.it/cnafdoc/CD0044.doc
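
As a rough illustration of the snmp access path mentioned above, the sketch below polls a PDU with the net-snmp command-line tools from Python. The hostname and community string are placeholders; the query shown reads the standard sysDescr object, since the outlet-level OIDs come from APC's PowerNet MIB and are not given on the slide.

```python
# Minimal sketch: read a value from an APC PDU over SNMP by shelling out to
# net-snmp's snmpget. Host and community are placeholders, not values from the
# slides; outlet-status OIDs would have to be taken from APC's PowerNet MIB.
import subprocess

PDU_HOST = "pdu-rack01.example.cnaf.infn.it"   # hypothetical PDU hostname
COMMUNITY = "public"                           # placeholder community string
SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"            # standard MIB-II sysDescr.0

def snmp_get(host: str, oid: str, community: str = COMMUNITY) -> str:
    """Run snmpget against the PDU and return its raw output line."""
    result = subprocess.run(
        ["snmpget", "-v1", "-c", community, host, oid],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # Prints something like the PDU's firmware/description string.
    print(snmp_get(PDU_HOST, SYS_DESCR_OID))
```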
13 Remote Power Distribution Unit
[Screenshot: APC InfraStruXure Manager software showing the status of all Tier1 PDUs]
14 Computing units
- 400 1U rack-mountable Intel dual-processor servers
- 800 MHz - 2.4 GHz
- 240 WNs (480 CPUs) available for LCG
- To be shipped June 2004:
- 32 1U bi-processor Pentium 2.4 GHz
- 350 1U bi-processor Pentium IV 3.06 GHz
- 2 x 120 GB HDs
- 4 GB RAM
- 2159 euro each
- Tendering:
- HPC farm with MPI
- Servers interconnected via Infiniband
- Opteron farm (near future)
- To allow experiments to test their software on the AMD architecture
15 Storage Resources
- 50 TB raw disk space ON LINE
- NAS
- NAS1 + NAS4 (3Ware, low cost): tot. 4.2 TB
- NAS2 + NAS3 (Procom): tot. 13.2 TB
- SAN
- Dell PowerVault 660F: tot. 7 TB
- Axus (Brownie): tot. 2 TB
- STK BladeStore: tot. 9 TB
- Infortrend ES A16F-R: tot. 12 TB
- IBM FAStT 900 (in a few weeks): tot. 150 TB
- See also http://www.lnf.infn.it/sis/preprint/pdf/INFN-TC-03-19.pdf
16 STORAGE resources
[Diagram: Tier1 storage layout, with clients reaching the resources over the WAN or the Tier1 LAN. Elements shown:]
- STK180 tape library with 100 LTO tapes (10 TByte native)
- STK L5500 robot (max 5000 slots), 6 LTO-2 drives
- CASTOR server / staging
- RAIDTEC, 1800 GByte, 2 SCSI interfaces
- IDE NAS1 and NAS4 (nas4.cnaf.infn.it), 1800 + 2000 GByte - CDF, LHCb
- PROCOM NAS2 (nas2.cnaf.infn.it), 8100 GByte - Virgo, Atlas
- PROCOM NAS3 (nas3.cnaf.infn.it), 4700 GByte - Alice, Atlas
- AXUS Brownie, circa 2200 GByte, 2 FC interfaces
- DELL PowerVault, 7100 GByte, 2 FC interfaces
- STK BladeStore, circa 10000 GByte, 4 FC interfaces
- Infortrend ES A16F-R, 12 TB
- Gadzoox Slingshot FC switch, 18 ports
- Fileservers: diskserv-cms-1 (CMS), fcds2 (alias diskserv-ams-1), diskserv-atlas-1
- FAIL-OVER support
17 Storage management and access (1)
- Tier1 storage resources accessible as classical storage or via grid
- Non-grid disk storage accessible via NFS
- Generic WNs also have an AFS client
- NFS-mounted volumes configured via autofs and LDAP (see the lookup sketch below)
- A unique configuration repository eases maintenance
- Integration of the LDAP configuration with the Tier1 db data is in progress
- Scalability issues with NFS
- Experienced stalled mount points
- Recent NFS versions export synchronously by default: needed to revert to async and to use reduced rsize and wsize to avoid a huge amount of retransmissions
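
To illustrate the autofs + LDAP mechanism, here is a minimal sketch that pulls automount entries from a directory server with python-ldap, the way a WN's automounter would look them up. The server URI, base DN and the use of the common automountKey/automountInformation schema are assumptions; the slide only states that autofs is configured from LDAP.

```python
# Minimal sketch: list autofs map entries stored in LDAP. URI, base DN and the
# schema attributes are assumptions; the slide only says "NFS mount volumes
# configured via autofs and ldap".
import ldap  # python-ldap

LDAP_URI = "ldap://ldap.example.cnaf.infn.it"                     # hypothetical server
AUTOFS_BASE = "ou=auto.storage,ou=autofs,dc=cnaf,dc=infn,dc=it"   # hypothetical DIT

def autofs_entries(uri: str = LDAP_URI, base: str = AUTOFS_BASE):
    """Yield (mount key, mount information) pairs from the autofs map."""
    conn = ldap.initialize(uri)
    conn.simple_bind_s()  # anonymous bind, as a read-only client would do
    results = conn.search_s(
        base, ldap.SCOPE_ONELEVEL, "(objectClass=automount)",
        ["automountKey", "automountInformation"],
    )
    for _dn, attrs in results:
        key = attrs["automountKey"][0].decode()
        info = attrs["automountInformation"][0].decode()
        yield key, info

if __name__ == "__main__":
    # Each entry maps a directory under the autofs mount point to an NFS export,
    # e.g. "cms-data  -rw,rsize=8192,wsize=8192  diskserv-cms-1:/data".
    for key, info in autofs_entries():
        print(f"{key:15s} {info}")
```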
18 Storage management and access (2)
- Part of the disk storage used as a front-end to CASTOR
- Balance between disk and CASTOR according to the experiments' needs
- 1 stager for each experiment (installation in progress; see the access sketch below)
- CASTOR accessible both directly and via grid
- CASTOR SE available
- ALICE Data Challenge used the CASTOR architecture
- Feedback given to the CASTOR team
- Optimization needed for file restaging
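
As a rough sketch of direct (non-grid) CASTOR access with a per-experiment stager, the snippet below copies a local file into a CASTOR directory with the RFIO copy command, selecting the stager through an environment variable. The stager hostname and CASTOR path are hypothetical, and the exact variables honoured by the CASTOR release deployed at Tier1 should be checked against its documentation.

```python
# Rough sketch of direct CASTOR access: copy a local file to CASTOR with rfcp,
# pointing the client at a per-experiment stager. Hostname and path are
# hypothetical; verify the environment variables against the deployed release.
import os
import subprocess

env = dict(os.environ)
env["STAGE_HOST"] = "castor-stager-cms.cnaf.infn.it"    # hypothetical CMS stager

subprocess.run(
    ["rfcp", "/data/run123/file.root",
     "/castor/cnaf.infn.it/cms/run123/file.root"],      # hypothetical CASTOR path
    env=env, check=True,
)
```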
19 Tier1 Database
- Resource database and management interface
- Postgres database as back end
- Web interface (apache + mod_ssl + php)
- HW characteristics of the servers
- SW configuration of the servers
- Allocation of the servers
- Possible direct access to the db for some applications (see the query sketch below)
- Monitoring system
- Nagios
- Interface to configure switches and interoperate with the installation system:
- VLAN tags
- dns
- dhcp
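
A minimal sketch of the "direct access to the db for some applications" path: reading the server allocation from the Postgres back end. The connection parameters, table and column names are hypothetical; only the Postgres back end itself is stated on the slide.

```python
# Minimal sketch: query the Tier1 resource db (Postgres back end) for the
# machines allocated to one experiment. Schema and credentials are invented.
import psycopg2

def servers_by_experiment(experiment: str):
    """Return (hostname, rack, vlan_tag) rows for the machines assigned to an experiment."""
    conn = psycopg2.connect(
        host="tier1db.example.cnaf.infn.it",   # hypothetical db host
        dbname="tier1", user="reader",         # hypothetical credentials
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT hostname, rack, vlan_tag      -- hypothetical schema
                FROM servers
                WHERE experiment = %s
                ORDER BY rack, hostname
                """,
                (experiment,),
            )
            return cur.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for hostname, rack, vlan in servers_by_experiment("cms"):
        print(f"{hostname:25s} rack {rack}  vlan {vlan}")
```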
20 Installation issues
- Centralized installation system
- LCFG (EDG WP4)
- Integration with a central Tier1 db
- Moving a node from one farm to another implies just a change of IP address (not of name)
- Unique dhcp server for all VLANs (see the db-driven sketch below)
- Support for DDNS (cr.cnaf.infn.it)
- Investigating Quattor for future needs
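
A minimal sketch of how the central db could drive the single dhcp server: each node keeps its name and MAC, and only its fixed address changes when it moves to another experiment's VLAN. The records and addresses below are invented for illustration; only the db-driven, single-dhcp-server setup comes from the slide.

```python
# Minimal sketch: emit ISC dhcpd host declarations from central-db records so
# that moving a WN to another farm only changes its fixed IP, never its name.
# The records and subnets below are invented.
NODES = [
    # (hostname, MAC address, assigned IP on the experiment's VLAN) - hypothetical
    ("wn-001.cr.cnaf.infn.it", "00:11:22:33:44:01", "192.168.101.11"),
    ("wn-002.cr.cnaf.infn.it", "00:11:22:33:44:02", "192.168.102.11"),
]

def dhcpd_host_block(hostname: str, mac: str, ip: str) -> str:
    """Render one 'host' stanza in dhcpd.conf syntax."""
    short = hostname.split(".")[0]
    return (
        f"host {short} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {ip};\n"
        f"    option host-name \"{hostname}\";\n"
        f"}}\n"
    )

if __name__ == "__main__":
    # Re-running this after the db changes a node's VLAN/IP regenerates the
    # config; the hostname (and hence its DDNS entry) stays the same.
    print("".join(dhcpd_host_block(*node) for node in NODES))
```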
21 Our Desired Solution for Resource Access
- SHARED RESOURCES among all experiments
- Priorities and reservations managed by the scheduler
- Most of the Tier1 computing machines installed as LCG Worker Nodes, with light modifications to support more VOs
- Application software not installed directly on the WNs, but accessed from outside (NFS, AFS, ...)
- One or more Resource Managers to manage all the WNs in a centralized way
- A standard way to access storage for each application