1
Infrastructure and Provisioning at the Fermilab
High Density Computing Facility
  • Steven C. Timm
  • Fermilab
  • HEPiX conference
  • May 24-26, 2004

2
Outline
  • Current Fermilab facilities
  • Expected need for future Fermilab facilities
  • Construction activity at High Density Computing
    Facility
  • Networking and power infrastructure
  • Provisioning and management at remote location

3
A cast of thousands.
  • HDCF design done by Fermilab Facilities
    Engineering
  • Construction by outside contractor
  • Managed by CD Operations (G. Bellendir et al.)
  • Requirements planning by taskforce of Computing
    Division personnel including system
    administrators, department heads, networking,
    facilities people.
  • Rocks development work by S. Timm, M. Greaney, J.
    Kaiser

4
Current Fermilab facilities
  • Feynman Computing Center built in 1988 (to house
    a large IBM-compatible mainframe)
  • 18000 square feet of computer rooms
  • 200 tons of cooling
  • Maximum input current 1800A
  • Computer rooms backed up with UPS
  • Full building backed up with generator
  • 1850 dual-CPU compute servers, 200 multi-TB IDE
    RAID servers in FCC right now
  • Many other general-purpose servers, file servers,
    tape robots.

5
Current facilities continued
  • Satellite computing facility in the former
    experimental hall New Muon Lab
  • Historically for Lattice QCD clusters (208–512
    nodes)
  • Now contains >320 other nodes waiting for
    construction of the new facility

6
The long hot summer
  • In summer it takes considerably more energy to
    run the air conditioning.
  • Dependent on a shallow pond for cooling water
  • In May the building already came within 25A (out
    of 1800A) of having to shut down equipment to
    shed power load and avoid a brownout
  • Current equipment exhausts the cooling capacity
    of Feynman Computing Center as well as its
    electrical capacity
  • No way to increase either in the existing building
    without long service outages

7
Computers just keep getting hotter
  • Anticipate that in fall 2004 we can buy dual Intel
    3.6 GHz Nocona chips, 105W apiece
  • Expect at least 2.5A current draw per node, maybe
    more: 12-13 kVA per rack of 40 nodes (see the
    back-of-the-envelope check after this list)
  • In FCC we have 32 computers per rack, 8-9 kVA
  • Have problems cooling the top nodes even now
  • New facility will have 5x more cooling: 270 tons
    for 2000 square feet
  • New facility will have up to 3000A of electrical
    current available.
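As a back-of-the-envelope check of the rack figures above, here is a minimal sketch; the 120 VAC line voltage and the 30A/10-node circuit limit are taken from the PM-10 power-strip slide later in this deck, and everything else uses the numbers quoted above.

```python
# Rough rack-power estimate from the HDCF planning numbers.
# Assumes 120 VAC single-phase circuits, as used by the PM-10 power strips.

VOLTS = 120          # line voltage (VAC)
AMPS_PER_NODE = 2.5  # expected draw per dual-Nocona 1U node (slide estimate)

def rack_kva(nodes_per_rack):
    """Apparent power of one rack in kVA at the assumed voltage and current."""
    return nodes_per_rack * AMPS_PER_NODE * VOLTS / 1000.0

print(f"40-node rack: {rack_kva(40):.1f} kVA")  # ~12 kVA, matching the 12-13 kVA estimate

# Circuit loading on a 30A strip with 10 nodes per circuit:
load_amps = 10 * AMPS_PER_NODE
print(f"10 nodes/circuit: {load_amps:.0f}A of 30A ({load_amps / 30:.0%} of breaker rating)")
```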

8
We keep needing more computers
  • Moore's law doubling time isn't holding true in
    the commodity market
  • Computing needs are growing faster than Moore's
    law and must be met with more computers
  • Five-year projections are based on plans from
    the experiments

9
Fermi Cycles as a function of time
Y = R · 2^(X/F). Moore's law says F = 1.5 years;
we measure F = 2.02 years and growing. 1000 Fermi
Cycles ≈ PIII 1 GHz.
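A hedged reading of the formula above: Y is performance in Fermi Cycles at time X, R is the performance of the reference machine, and F is the doubling time in years. Given two benchmark points, F falls out as shown below; the quadrupling-over-4-years numbers are illustrative only, not values read from the plot.

```latex
% Exponential growth with doubling time F (years):
\[
  Y(X) = R \cdot 2^{X/F}
\]
% Solving for F from two benchmark points (X_1, Y_1) and (X_2, Y_2):
\[
  F = \frac{X_2 - X_1}{\log_2\!\left(Y_2 / Y_1\right)}
\]
% Illustrative example: performance quadrupling over 4 years gives
% F = 4 / log2(4) = 2 years, close to the measured F = 2.02 years
% (versus the Moore's-law value of F = 1.5 years).
```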
10
Fermi Cycles per ampere as function of time
11
Fermi cycles per dollar as function of time
12
Strategy
  • Feynman Center will be the UPS- and generator-backed
    facility for important servers
  • New HDCF will have UPS for graceful shutdown but
    no generator backup. Designed for high-density
    compute nodes (plus a few tape robots).
  • 10-20 racks of existing 1U servers will be moved
    to the new facility and re-racked
  • Anticipate 10-15 racks of new purchases this fall,
    also in the new building

13
Location of HDCF
1.5 miles away from FCC. No administrators will be
housed there; it will be managed lights-out.
14
Floor plan of HDCF
Room for 72 racks in each of 2 computer rooms.
15
Cabling plan
Network infrastructure will use bundles of
individual Cat-6 cables.
16
Current status
  • Construction began in early May
  • Occupancy Nov/Dec 2004 (est.)
  • Phase III: space for 56 racks at that time
  • Expected cost US$2.8M

17
Power/console infrastructure
  • Cyclades AlterPath series
  • Includes console servers, network-based KVM
    adapters, and power strips
  • AlterPath ACS48 runs PPC Linux
  • Supports Kerberos 5 authentication
  • Access control can be set per port
  • Any number of power strip outlets can be
    associated with each machine on each console
    port.
  • All configurable via command line or Java-based
    GUI

18
Power/console infrastructure
PM-10 power strip: 120VAC, 30A; 10 nodes/circuit;
four units/rack.
19
Installation with NPACI-Rocks
  • NPACI (National Partnership for Advanced
    Computational Infrastructure); lead institution
    is the San Diego Supercomputer Center
  • Rocks: the ultimate cluster-in-a-box tool. It
    combines a Linux distribution, a database, a
    highly modified installer, and a large number of
    parallel computing applications such as PBS,
    Maui, SGE, MPICH, Atlas, PVFS.
  • Rocks 3.0 based on Red Hat Linux 7.3
  • Rocks 3.1 and greater based on SRPMS of Red Hat
    Enterprise Linux 3.0.

20
Rocks vs. Fermi Linux comparison
Both Fermi Linux and Rocks 3.0 are based on Red Hat 7.3.
Fermi Linux adds: Workgroups, Yum, OpenAFS, Fermi
Kerberos/OpenSSH.
Rocks 3.0 adds: extended kickstart, HPC applications,
MySQL database.
21
Rocks architecture vs. Fermi application
Rocks architecture:
  • Expects all compute nodes on a private net behind
    a firewall
  • Reinstall a node if anything changes
  • All network services (DHCP, DNS, NIS) supplied by
    the frontend
Fermi application:
  • Nodes on the public net
  • Users won't allow downtime for frequent reinstalls
  • Use yum and other Fermi Linux tools for security
    updates
  • Configure Rocks to use our external network
    services

22
Fermi extensions to Rocks
  • Fermi production farms currently have 752 nodes,
    all installed with Rocks
  • This Rocks cluster has the most CPUs registered
    of any cluster at rocksclusters.org
  • Added extra tables to the database for customizing
    kickstart configuration; we have 14 different
    disk configurations (see the sketch after this
    list)
  • Added Fermi Linux comps files so that all Fermi
    workgroups are available in installs, plus all
    added Fermi RPMS
  • Made slave frontends act as install servers during
    mass reinstall phases; during normal operation one
    install server is enough
  • Added logic to recreate Kerberos keytabs
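The slides don't show the schema of those extra kickstart tables, so the following is only an illustrative sketch of the idea; the table contents, node names, and partition layouts are hypothetical, with a plain dict standing in for the MySQL tables Rocks actually uses.

```python
# Hypothetical sketch: pick one of several named disk configurations per node
# and emit the matching kickstart partitioning stanza at install time.

# disk-configuration name -> kickstart partitioning directives
DISK_CONFIGS = {
    "single-ide": [
        "part /boot --size 100 --ondisk hda",
        "part swap --size 2048 --ondisk hda",
        "part / --size 1 --grow --ondisk hda",
    ],
    "dual-ide-raid1": [
        "part raid.01 --size 1 --grow --ondisk hda",
        "part raid.02 --size 1 --grow --ondisk hdc",
        "raid / --level 1 --device md0 raid.01 raid.02",
    ],
}

# node name -> disk-configuration name (in production, an extra database table)
NODE_DISK_CONFIG = {
    "fnpc101": "single-ide",
    "fnpc201": "dual-ide-raid1",
}

def partition_stanza(node):
    """Return the partitioning lines to splice into this node's kickstart file."""
    config = NODE_DISK_CONFIG.get(node, "single-ide")  # fall back to a default layout
    return "\n".join(DISK_CONFIGS[config])

if __name__ == "__main__":
    print(partition_stanza("fnpc201"))
```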

23
S.M.A.R.T. monitoring
  • smartd daemon from smartmontools package gives
    early warning of disk failures
  • Disk failures are 70% of all hardware failures
    in our farms over the last 5 years
  • Run a short self-test on all disks every day
    (sketched below)
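On the farms this is handled by smartd itself via smartmontools configuration; the sketch below is only a cron-style illustration of the same checks, with the device names and pass/fail parsing as assumptions.

```python
# Illustrative wrapper around smartctl (smartmontools): start the daily short
# self-test on each disk and flag disks whose overall health check fails.
# In production smartd schedules the tests and sends the warnings itself.

import subprocess

DISKS = ["/dev/hda", "/dev/hdc"]  # assumed IDE devices on a farm node

def start_short_test(disk):
    """Kick off the (asynchronous) SMART short self-test on one disk."""
    subprocess.run(["smartctl", "-t", "short", disk], check=False)

def health_ok(disk):
    """Return True if smartctl's overall health assessment reports PASSED."""
    result = subprocess.run(["smartctl", "-H", disk],
                            capture_output=True, text=True)
    return "PASSED" in result.stdout

if __name__ == "__main__":
    for disk in DISKS:
        start_short_test(disk)
        if not health_ok(disk):
            print(f"WARNING: {disk} is failing its SMART health check")
```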

24
Temperature/power monitoring
  • Wrappers for lm_sensors feed NGOP and Ganglia.
  • Measure average temperature of nodes over a month
  • Alarm when 5°C or 10°C above the average
  • Page when 50% of any group is 10°C above average
  • Automated shutdown script activates when any
    single node is over emergency temperature.
  • Building-wide signal will provide notice that we
    are on UPS power and have 5 minutes to shut down.
  • Automated OS shutdown and SNMP poweroff scripts
    (alarm logic sketched below)
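A minimal sketch of the alarm rules in this list; the readings would come from the lm_sensors wrappers that feed NGOP and Ganglia, and the group structure, emergency threshold, and paging/shutdown hooks are placeholders, not the production scripts.

```python
# Sketch of the temperature alarm rules described above:
#  - warn a node at 5C over its monthly average, critical at 10C over,
#  - page when at least 50% of a group is 10C over average,
#  - flag nodes over an absolute emergency temperature for automated
#    OS shutdown and SNMP poweroff.

EMERGENCY_C = 60.0  # assumed absolute emergency temperature, not from the slides

def node_alarm(current, monthly_avg):
    """Return 'critical', 'warning', or None for a single node."""
    if current >= monthly_avg + 10:
        return "critical"
    if current >= monthly_avg + 5:
        return "warning"
    return None

def group_should_page(currents, averages):
    """Page when at least half of the group's nodes are 10C above average."""
    hot = [n for n in currents
           if node_alarm(currents[n], averages[n]) == "critical"]
    return len(hot) >= 0.5 * len(currents)

def emergency_nodes(currents):
    """Nodes hot enough to trigger automated OS shutdown / SNMP poweroff."""
    return [n for n in currents if currents[n] >= EMERGENCY_C]

# Example: currents and averages are dicts of node name -> temperature (C).
currents = {"fnpc101": 48.0, "fnpc102": 57.5}
averages = {"fnpc101": 45.0, "fnpc102": 44.0}
print(group_should_page(currents, averages))  # True: 1 of 2 nodes is critical
print(emergency_nodes(currents))              # []
```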

25
Reliability is key
  • Can only successfully manage remote clusters if
    hardware is reliable
  • All new contracts are written with the vendor
    providing a 3-year warranty on parts and labor;
    they only make money if they build good hardware
  • 30-day acceptance test is critical to identify
    hardware problems and fix them before production
    begins.
  • With 750 nodes and 99% reliability, roughly 8
    nodes (1% of 750) would still be down on any
    given day
  • Historically reliability is closer to 96%, but new
    Intel Xeon-based nodes are much better