1
Production Linux Capacity Computing at Los Alamos
  • Steven R. Shaw, CCN-7
  • High Performance Computing Systems
  • Computing, Communications, and Networking Division

2
Topics
  • Vision and goals
  • Methodology
  • Clustermatic and other components
  • Current systems
  • Lightning
  • Flash
  • Configuration management
  • Operational opportunities
  • Lessons learned
  • Current and future work
  • Questions

3
Our Capacity Vision
  • Our capacity vision, to meet the requirements of
    programs within reasonable resources, is to
    consolidate architectures, leverage commodity
    computing, Linux open-source software, and
    standardized deployment over capacity systems.
  • Cheryl Wampler, ASC PI Meeting, March 1-4, 2004.

4
Goals for Production Linux Capacity Computing
  • Respond to the need for additional capacity
    computing
  • Provide stability and continuity for user
    community
  • Lower integration and operational costs by
    leveraging internal resources and open source
    software
  • Use repeatable processes and automation to
    deploy new capacity quickly and to efficiently
    operate and maintain existing systems

5
Goals (continued)
  • Provide more compute cycles to users by making
    systems easier to build and manage; do more with
    available resources
  • Move toward a separate common file system, not
    tied to specific platforms
  • Also move toward a separate, standards-based,
    scalable I/O network to file systems, archival
    storage, and other services.

6
Methodology
  • CCN Division took on the role of system
    integrator
  • Successful collaborative relationships were
    established with CCS-1 (LA-MPI, Science
    Appliance), CCN-8 (Panasas FS, compilers, and
    tools), CCN-5 (network integration), and
    third-party software suppliers
  • Built upon our Linux cluster experience from Pink
    and other systems

7
Pink Configuration
  • 64 dual-processor I/O nodes
  • 958 dual-processor production computing nodes
  • GigE network
  • Myrinet
  • 1 dual-processor BProc master node
  • 2 dual-processor front-end nodes
  • Panasas Global FS
  • LANL Yellow (soon Turquoise) network
  • Open NFS servers
8
Science Appliance
  • The key software in a Science Appliance is a
    suite that LANL developed called "Clustermatic"
  • Clustermatic can completely control a cluster,
    from the BIOS up to a high-level programming
    environment.
  • It features the Beowulf Distributed Process Space
    (BProc), LinuxBIOS, and a variety of other
    open-source kernel modifications, utilities, and
    libraries.
  • Very quick node boot times
  • Cluster boot and upgrade in minutes
  • Manageable nodes from power-on
  • Single system image for the entire cluster
  • Quick process migration

9
Clustermatic Awards
  • Research and Development Magazine's 2004 R&D 100
    Award.
  • Clustermatic is a revolutionary software
    suite for managing, monitoring, administering and
    operating clusters on network-connected computers
    running as a high-performance system.
    Clustermatic increases reliability and
    efficiency, decreases node autonomy, simplifies
    computer programming, reduces administration
    costs, and minimizes a user's reliance on
    unpredictable software, enabling commodity-based
    cluster networks to compete with the higher-cost
    supercomputers.
  • The Clustermatic system was awarded the
    Excellence in Cluster Technology Award for Open
    Source Cluster Solutions at the ClusterWorld
    Conference & Exposition in April 2004.

10
Clustermatic Components
  • A traditional cluster is built by replicating a
    complete system software environment on every
    node.
  • In a Science Appliance (Clustermatic system), we
    have master nodes and slave nodes, but only the
    master nodes have a fully-configured system.
  • The slave nodes run a minimal software stack
    consisting of LinuxBIOS, Linux, and BProc.
  • Culture change for users: not every tool and
    library exists on the slave nodes.

11
Clustermatic Components
  • Most importantly, BProc enables a distributed
    process space across nodes within the cluster:
    all user processes running on the slave nodes
    appear as processes running on the master node.
  • Users create processes on the master node and
    the system migrates them (the processes) to the
    slave nodes.
  • Standard input, output, and error streams are
    redirected to the master node.

  • Processes remain visible and controllable on the
    master node (a minimal launch sketch follows
    below).
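  • Below is a minimal, hypothetical sketch (not from
    the original slides) of launching work on slave
    nodes from the master by shelling out to bpsh,
    Clustermatic's remote-execution utility; the node
    numbers, the command, and the exact bpsh options
    are illustrative assumptions.

    # Hypothetical sketch: start a command on BProc slave nodes from the
    # master node via bpsh; stdout/stderr come back to the master node.
    import subprocess

    def run_on_nodes(nodes, command):
        procs = []
        for node in nodes:
            # "bpsh <node> <command...>" is the usual invocation; options
            # may differ between Clustermatic versions.
            procs.append(subprocess.Popen(
                ["bpsh", str(node), *command],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True))
        return procs

    if __name__ == "__main__":
        for p in run_on_nodes(range(4), ["uname", "-n"]):
            print(p.communicate()[0].strip())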

12
Other Key Components
  • Panasas file system
  • LA-MPI
  • User environment similar to other Los Alamos
    systems
  • HPSS
  • LSF
  • TotalView
  • HPC toolkit

13
Science Appliance Systems at LANL
  • Lightning, Pink, Grendels, Flash, TLC
  • MPI and LSF are BProc-integrated.
  • Result: LANL Science Appliance systems are easy
    to use but different from other LANL systems

14
Los Alamos Platforms
15
Lightning Capacity System Overview (last week)
  • System Hardware
  • 1408 dual-processor LNXI AMD Opteron nodes
    (11.26 TeraOps peak, 5.6 TB memory)
  • One Arima Rio Works HDAMA system board with AMD
    8111 and 8131 chipsets
  • Two 2.0 GHz 64-bit processors with 1 MB L2
    cache/node
  • Four GB of memory/node
  • One 120-GB disk drive/node
  • One ICEBOX controller/node for hardware
    monitoring
  • Scalable to 2048 nodes (scalable design plans for
    interconnect)
  • Myrinet Interconnect (latency 7 usec, bandwidth
    250 MB/sec)
  • Gigabit copper network to network services such
    as NFS, Panasas
  • A copper-based 10/100 network for system
    monitoring, system reboot, etc.
  • System Software
  • Linux
  • Clustermatic software
  • Beoboot, LinuxBIOS, BProc, Supermon
  • Compilers
  • Message Passing
  • LA-MPI
  • Debugging - TotalView
  • Archival storage - HPSS
  • Resource management - Load Sharing Facility (LSF)
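  • As a quick back-of-the-envelope check (not on the
    original slide), the peak and memory figures above
    follow from the node counts; the 2 floating-point
    operations per clock per processor is an assumed
    value for these Opterons.

    # Sanity-check the Lightning figures quoted above.
    NODES = 1408
    PROCS_PER_NODE = 2
    CLOCK_HZ = 2.0e9           # 2.0 GHz Opteron processors
    FLOPS_PER_CYCLE = 2        # assumption: 2 FP ops per clock per processor
    MEM_PER_NODE_GB = 4

    peak_teraops = NODES * PROCS_PER_NODE * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12
    total_mem_tb = NODES * MEM_PER_NODE_GB / 1000

    print(f"peak:   {peak_teraops:.2f} TeraOps")  # ~11.26, as on the slide
    print(f"memory: {total_mem_tb:.1f} TB")       # ~5.6 TB, as on the slide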

16
Lightning Integration and Deployment
  • (Gantt-style timeline spanning roughly August
    2003 through November 2004; milestones below)
  • Contract signed mid-July 2003
  • Level 2 (SCS) Lightning User Environment
  • Level 2 (PC) November 25, 2005
  • System delivered
  • Beta mode
  • Integration/acceptance test
  • Limited availability
  • General availability
  • Laboratory standdown
  • Secure environment
  • DP Award of Excellence for the integration effort
  • Linpack run: 8.051 TF (64-bit Linpack), #6 on the
    Top500
17
Lightning last week
18
  • Linux production and development environment
    model: production segments, development
    environments, support and system functions

19
Flash Timeline
  • Assemble hardware 11/17-11/19/04
  • Stabilize hardware 11/20-11/24
  • Acceptance testing complete 12/1
  • Software install 12/2-12/17
  • 88 Person-hours
  • First I/O node system on Opteron
  • Panasas and network setup in parallel
  • Friendly users on 12/19

20
Configuration Management
  • Philosophy
  • All maintenance and installation is done within
    the configuration management system
  • Motivation
  • Do more with available resources
  • Automation is key
  • Expertise is encoded
  • Automated systems are consistent and tireless
  • Prevent errors and mitigate consequences
  • Avoid creating error-likely situations
  • Correlate effect with cause
  • Manual actions reduce the capacity to respond

21
Configuration Management
  • A framework for automating the configuration of
    a product to the fullest extent possible, in a
    cross-platform and common fashion.
  • Differentiate products at major boundaries that
    make sense (O/S, Linux version, Bproc or not,
    chip architecture, unique service, etc.)
  • Databases become the documentation

22
Configuration Management Culture Change
  • The database is pointless if the system diverges
    from its description due to actions taken outside
    the database
  • All changes, even temporary and debugging in
    nature, must be done using our configuration
    management tools

23
Configuration Management Tools
  • rsync - high-confidence mirroring of files
  • systemimager - installation, replication, and
    disaster recovery of the core system
  • cfengine - rule-based files for installation and
    configuration actions
  • systemimager provides the body, cfengine creates
    the soul (see the sketch below)
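  • A minimal sketch of how these tools might be
    driven together; the paths, host name, and
    cfagent options are illustrative assumptions, not
    the actual LANL configuration.

    # Illustrative only: mirror a master configuration tree to a node's
    # staging area with rsync, then let cfengine's agent apply its
    # rule-based actions.
    import subprocess

    MASTER_TREE = "/cm/masterfiles/"          # hypothetical source of truth
    TARGET = "root@node001:/var/cm/staging/"  # hypothetical destination

    def mirror(dry_run=True):
        cmd = ["rsync", "-a", "--delete"]
        if dry_run:
            cmd.append("-n")                  # preview changes before committing
        subprocess.run(cmd + [MASTER_TREE, TARGET], check=True)

    def converge():
        # cfengine 2's agent; exact flags may vary by version and site policy
        subprocess.run(["cfagent", "-q"], check=True)

    if __name__ == "__main__":
        mirror(dry_run=True)   # inspect, then rerun with dry_run=False
        converge()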

24
More Configuration Management Tools
  • Revision Control System (RCS) - track origin and
    history
  • Annotated history within the cfengine database
  • RPM (Red Hat Package Manager)
  • Deterministic, verifiable, removable
  • Culture change for some of our suppliers
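  • A small sketch of the "verifiable" property: rpm
    can re-check installed packages against its
    database; the package names here are placeholders,
    not the actual cluster package set.

    # Illustrative only: report packages whose installed files have drifted.
    # "rpm -V" prints a line per file whose size, checksum, mode, etc. changed.
    import subprocess

    def verify(packages):
        drifted = {}
        for pkg in packages:
            out = subprocess.run(["rpm", "-V", pkg],
                                 capture_output=True, text=True)
            if out.stdout.strip():   # empty output means the package is clean
                drifted[pkg] = out.stdout.strip().splitlines()
        return drifted

    if __name__ == "__main__":
        print(verify(["rsync", "openssh-server"]))   # placeholder names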

25
Configuration Management Automation and Discipline
  • Leads to systems that
  • Are more predictable - behavior can be
    ascertained from the database
  • More scalable - copies are easier
  • Better documented
  • Easier to debug
  • Easier to repair
  • Enables us to accomplish more with our available
    resources

26
Operational Opportunities
  • Hardware maintenance
  • Field replaceable unit is the node
  • Rapid boot time dramatically shortens the time to
    repair
  • Use operations staff for hands-on maintenance,
    vendor becomes a parts supplier and second tier
    support
  • Repair the node during prime time and burn-in,
    maintaining a supply of tested spares.
  • Increased job content and satisfaction for
    operators

27
Operational Opportunities
  • Automated interrupt reporting
  • When a node becomes interrupted, the HPC
    operators are notified by email and a GUI
    display.
  • Event-driven notification.
  • A record for the interrupt is generated
    automatically in the Remedy database and its
    status is left open awaiting the problem
    resolution.
  • When a node is returned to service, the Remedy
    ticket is automatically updated with the time.
  • In many cases the cause of the interrupt and
    associated error message are captured in the
    ticket.
  • Results in more complete and accurate information.
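  • A minimal sketch of this event-driven flow; the
    Remedy interface is a hypothetical placeholder
    (the real integration is not described on the
    slide), and the addresses are made up.

    # Illustrative only: on a node-interrupt event, open a tracking record
    # and notify the HPC operators by email.
    import smtplib
    from email.message import EmailMessage

    OPERATORS = "hpc-operators@example.gov"    # made-up address

    def open_remedy_ticket(node, reason):
        # Placeholder for the real Remedy integration; returns a ticket id.
        print(f"[remedy] opened ticket for node {node}: {reason}")
        return "INC000000"

    def handle_interrupt(node, reason):
        ticket = open_remedy_ticket(node, reason)
        msg = EmailMessage()
        msg["Subject"] = f"Node {node} interrupted ({ticket})"
        msg["From"] = "cluster-monitor@example.gov"
        msg["To"] = OPERATORS
        msg.set_content(f"Node {node} went down: {reason}\nTicket: {ticket}")
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

    if __name__ == "__main__":
        handle_interrupt(17, "hardware event reported by the ICEBOX controller")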

28
Lessons Learned
  • Integration issues
  • Be sure your suppliers understand your production
    support needs and are committed
  • Remember you own the complete support chain
  • Culture change issues
  • Users shift from "every tool everywhere" to a
    more deterministic model
  • Be willing to negotiate the "right-weight" system
  • Administrators - configuration management
    discipline
  • Software suppliers - conform to the configuration
    management requirements
  • BProc master node loading

29
Current and Future Work
  • Lightning
  • Integrate 256 additional nodes
  • Reconfigure GigE and implement I/O nodes
  • Increase Panasas to 200 TB
  • 8Gb on all nodes
  • Lightning and Flash
  • 64-bit Linux 2.6, BProc v4
  • POSIX threads
  • Open MPI
  • PScalBB (Scalable and available I/O network
    design)

30
Thanks and Questions
  • My thanks to the following people for providing
    and helping with content
  • Harvey Wasserman, CCN-7
  • Dave Neal, Jerry DeLapp and Daryl Grunau, CCN-9
  • Ron Minnich, CCS-1
  • Cheryl Wampler, PADNWP
  • Thank you for your attention, and now for your
    questions.