SOS8
Transcript and Presenter's Notes

Title: SOS8

1
Big and Not so Big Iron at SNL
2
SNL CS R&D Accomplishment: Pathfinder for MPP
Supercomputing
  • Sandia successfully led the DOE/DP revolution
    into MPP supercomputing through CS R&D
  • nCUBE-10
  • nCUBE-2
  • Intel iPSC/860
  • Intel Paragon
  • ASCI Red
  • Cplant
  • and gave DOE a strong, scalable parallel
    platforms effort

Computing at SNL is an Applications success (i.e.,
uniquely high scalability & reliability among
FFRDCs) because CS R&D paved the way
Cplant
Note: There was considerable skepticism in the
community that MPP computing would be a success
3
Our Approach
  • Large systems with a few processors per node
  • Message passing paradigm (see the MPI sketch
    after this list)
  • Balanced architecture
  • Efficient systems software
  • Critical advances in parallel algorithms
  • Real engineering applications
  • Vertically integrated technology base
  • Emphasis on scalability & reliability in all
    aspects
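
The message passing point above is the heart of this approach.
Below is a minimal MPI sketch (illustrative, not taken from the
presentation) of the kind of explicit neighbor exchange these
large distributed-memory systems are built around.

/* Minimal sketch of the message passing paradigm: each rank
 * exchanges a value with its ring neighbors using plain MPI
 * point-to-point calls. Illustrative only; not from the slides. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* neighbor to send to   */
    int left  = (rank - 1 + size) % size;   /* neighbor to recv from */
    double send = (double)rank, recv = -1.0;

    /* Combined send/receive avoids deadlock regardless of ordering. */
    MPI_Sendrecv(&send, 1, MPI_DOUBLE, right, 0,
                 &recv, 1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d of %d received %g from rank %d\n",
           rank, size, recv, left);
    MPI_Finalize();
    return 0;
}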

4
A Scalable Computing Architecture
5
ASCI Red
  • 4,576 compute nodes
  • 9,472 Pentium II processors
  • 800 MB/sec bi-directional interconnect
  • 3.21 Peak TFlops
  • 2.34 TFlops on Linpack
  • 74% of peak (see the arithmetic check after this
    list)
  • 9,632 processors
  • TOS on Service Nodes
  • Cougar LWK on Compute Nodes
  • 1.0 GB/sec Parallel File System
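
As a rough consistency check (assuming the 333 MHz Pentium II Xeon
parts used in the upgraded machine, a clock rate not shown on this
slide): 9,632 processors x 333 MHz x 1 floating-point result per
clock ≈ 3.21 TFlops, matching the peak figure, and 2.34 TFlops /
3.21 TFlops ≈ 73%, in line with the quoted fraction of peak.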

6
Computational Plant
  • Antarctica - 2,376 Nodes
  • Antarctica has 4 heads with a switchable
    center section
  • Unclassified Restricted Network
  • Unclassified Open Network
  • Classified Network
  • Compaq (HP) DS10L Slates
  • 466MHz EV6, 1GB RAM
  • 600MHz EV67, 1GB RAM
  • Re-deployed Siberia XP1000 Nodes
  • 500MHz EV6, 256MB RAM
  • Myrinet
  • 3D Mesh Topology
  • 33MHz, 64-bit
  • A mix of 1,280 and 2,000 Mbit/sec technology
  • LANai 7.x and 9.x
  • Runtime Software
  • Yod - Application loader
  • Pct - Compute node process control
  • Bebopd - Allocation
  • OpenPBS - Batch scheduling
  • Portals Message Passing API
  • Red Hat Linux 7.2 w/2.4.x Kernel
  • Compaq (HP) Fortran, C, C++
  • MPICH over Portals (see the ping-pong sketch
    after this list)
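
Applications reach Portals through MPICH, as noted in the last
bullet. A common way to exercise that path is a ping-pong
microbenchmark; the sketch below is illustrative only and is not
part of the Cplant software listed above.

/* Minimal ping-pong sketch: ranks 0 and 1 bounce a small message
 * back and forth and report the average one-way latency.
 * Message size and repetition count are arbitrary. */
#include <mpi.h>
#include <stdio.h>

#define REPS  1000
#define BYTES 8

int main(int argc, char **argv)
{
    int rank;
    char buf[BYTES] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("average one-way latency: %g microseconds\n",
               (t1 - t0) / (2.0 * REPS) * 1e6);

    MPI_Finalize();
    return 0;
}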

7
Institutional Computing Clusters
  • Two 256-node clusters (classified/unclassified)
    in NM
  • 236 compute nodes
  • Dual 3.06GHz Xeon processors, 2GB memory
  • Myricom Myrinet PCI NIC (XP, REV D, 2MB)
  • 2 Admin nodes
  • 4 Login nodes
  • 2 MetaData Server (MDS) nodes
  • 12 Object Store Target (OST) nodes
  • 256 port Myrinet Switch
  • A 128-node (unclassified) and a 64-node
    (classified) cluster in CA
  • Compute nodes
  • RedHat Linux 7.3
  • Application Directory
  • MKL math library (see the CBLAS sketch after
    this list)
  • TotalView client
  • VampirTrace client
  • MPICH-GM
  • OpenPBS client
  • PVFS client
  • Myrinet GM
  • Login nodes
  • RedHat Linux 7.3
  • Kerberos
  • Intel Compilers
  • C, C++
  • Fortran
  • Open Source Compilers
  • Gcc
  • Java
  • TotalView
  • VampirTrace
  • Myrinet GM
  • Administrative Nodes
  • Red Hat Linux 7.3
  • OpenPBS
  • Myrinet GM w/Mapper
  • SystemImager
  • Ganglia
  • Mon
  • CAP
  • Tripwire
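
The application directory above includes the MKL math library. A
typical use on these clusters is dense linear algebra through the
standard CBLAS interface, as in this small illustrative sketch (not
taken from the presentation); the header name assumes an MKL
installation.

/* Small DGEMM sketch using the CBLAS interface that MKL provides:
 * C = alpha*A*B + beta*C for 2x2 matrices. Illustrative only;
 * link against MKL (or any CBLAS implementation) to build. */
#include <stdio.h>
#include <mkl_cblas.h>   /* with a generic BLAS, use <cblas.h> instead */

int main(void)
{
    double A[] = {1.0, 2.0,
                  3.0, 4.0};
    double B[] = {5.0, 6.0,
                  7.0, 8.0};
    double C[] = {0.0, 0.0,
                  0.0, 0.0};

    /* Row-major 2x2 multiply: C = 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}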

8
Usage
9
Red Squall Development Cluster
  • Hewlett Packard Collaboration
  • Integration, Testing, System SW support
  • Lustre and Quadrics Expertise
  • RackSaver BladeRack Nodes
  • High Density Compute Server Architecture
  • 66 Nodes (132 processors) per Rack
  • 2.0GHz AMD Opteron
  • Same as Red Storm but w/commercial Tyan
    motherboards
  • 2 Gbytes of main memory per node (same as RS)
  • Quadrics QsNetII (Elan4) Interconnect
  • Best in Class (commercial cluster interconnect)
    Performance
  • I/O subsystem uses DDN S2A8500 Couplets with
    Fiber Channel Disk Drives (same as Red Storm)
  • Best in Class Performance
  • Located in the new JCEL facility

10
(No Transcript)
11
Red Storm Goals
  • Balanced System Performance - CPU, Memory,
    Interconnect, and I/O.
  • Usability - Functionality of hardware and
    software meets needs of users for Massively
    Parallel Computing.
  • Scalability - System Hardware and Software scale,
    single cabinet system to 20,000 processor system.
  • Reliability - Machine stays up long enough
    between interrupts to make real progress on
    completing application run (at least 50 hours
    MTBI), requires full system RAS capability.
  • Upgradability - System can be upgraded with a
    processor swap and additional cabinets to 100T
    or greater.
  • Red/Black Switching - Capability to switch major
    portions of the machine between classified and
    unclassified computing environments.
  • Space, Power, Cooling - High density, low power
    system.
  • Price/Performance - Excellent performance per
    dollar, use high volume commodity parts where
    feasible.

12
Red Storm Architecture
  • True MPP, designed to be a single system.
  • Distributed memory MIMD parallel supercomputer.
  • Fully connected 3-D mesh interconnect. Each
    compute node and service and I/O node processor
    has a high bandwidth, bi-directional connection
    to the primary communication network.
  • 108 compute node cabinets and 10,368 compute node
    processors (AMD Opteron @ 2.0 GHz).
  • 10 TB of DDR memory @ 333 MHz.
  • Red/Black switching - 1/4, 1/2, 1/4.
  • 8 Service and I/O cabinets on each end (256
    processors for each color).
  • 240 TB of disk storage (120 TB per color).
  • Functional hardware partitioning - service and
    I/O nodes, compute nodes, and RAS nodes.
  • Partitioned Operating System (OS) - LINUX on
    service and I/O nodes, LWK (Catamount) on compute
    nodes, stripped down LINUX on RAS nodes.
  • Separate RAS and system management network
    (Ethernet).
  • Router table based routing in the interconnect.
  • Less than 2 MW total power and cooling.
  • Less than 3,000 square feet of floor space.

13
Red Storm Layout
  • Less than 2 MW total power and cooling.
  • Less than 3,000 square feet of floor space.
  • Separate RAS and system management network
    (Ethernet).
  • 3D Mesh 27 x 16 x 24 (x, y, z) (see the
    node-count sketch after this list)
  • Red/Black split 2,688 / 4,992 / 2,688
  • Service & I/O 2 x 8 x 16
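
Assuming a simple x-fastest linear numbering of the compute mesh
(the machine's actual numbering scheme is not given here), the
sketch below shows the index-to-coordinate mapping and confirms the
totals quoted above; note that the 2,688 / 4,992 / 2,688 split
corresponds to 7 + 13 + 7 planes of 16 x 24 = 384 nodes along X.

/* Sketch checking the mesh arithmetic on this slide. The 27 x 16 x 24
 * compute mesh gives 10,368 processors; the red/black split of
 * 2,688 / 4,992 / 2,688 is 7 + 13 + 7 X-planes of 384 nodes each.
 * The x-fastest node numbering here is an assumption for
 * illustration, not the machine's actual scheme. */
#include <stdio.h>

#define NX 27
#define NY 16
#define NZ 24

static void node_coords(int id, int *x, int *y, int *z)
{
    *x = id % NX;
    *y = (id / NX) % NY;
    *z = id / (NX * NY);
}

int main(void)
{
    const int plane = NY * NZ;                     /* 384 nodes per X-plane */
    int x, y, z;

    printf("compute processors: %d\n", NX * NY * NZ);        /* 10368 */
    printf("red/black split: %d / %d / %d\n",
           7 * plane, 13 * plane, 7 * plane);                 /* 2688/4992/2688 */
    printf("service & I/O section: %d\n", 2 * 8 * 16);        /* 256 per color */

    node_coords(NX * NY * NZ - 1, &x, &y, &z);
    printf("last node -> (%d, %d, %d)\n", x, y, z);           /* (26, 15, 23) */
    return 0;
}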

14
Red Storm Cabinet Layout
  • Compute Node Cabinet (counts checked after this
    list)
  • 3 Card Cages per Cabinet
  • 8 Boards per Card Cage
  • 4 Processors per Board
  • 4 NIC/Router Chips per Board
  • N+1 Power Supplies
  • Passive Backplane
  • Service and I/O Node Cabinet
  • 2 Card Cages per Cabinet
  • 8 Boards per Card Cage
  • 2 Processors per Board
  • 2 NIC/Router Chips per Board
  • PCI-X for each Processor
  • N+1 Power Supplies
  • Passive Backplane
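
These per-cabinet counts are consistent with the system totals on
the architecture slide: 3 card cages x 8 boards x 4 processors = 96
compute processors per cabinet, and 108 compute cabinets x 96 =
10,368 processors; on the service and I/O side, 2 card cages x 8
boards x 2 processors = 32 processors per cabinet, and 8 cabinets
per end x 32 = 256 processors per color.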

15
Red Storm Software
  • Operating Systems
  • LINUX on service and I/O nodes
  • LWK (Catamount) on compute nodes
  • LINUX on RAS nodes
  • File Systems
  • Parallel File System - Lustre (PVFS)
  • Unix File System - Lustre (NFS)
  • Run-Time System
  • Logarithmic loader
  • Node allocator
  • Batch system - PBS
  • Libraries - MPI, I/O, Math
  • Programming Model
  • Message Passing
  • Support for Heterogeneous Applications
  • Tools
  • ANSI Standard Compilers - Fortran, C, C++
  • Debugger - TotalView
  • Performance Monitor
  • System Management and Administration
  • Accounting
  • RAS GUI Interface
  • Single System View

16
Red Storm Performance
  • Based on application code testing on production
    AMD Opteron processors, we now expect that Red
    Storm will deliver around a 10X performance
    improvement over ASCI Red on Sandia's suite of
    application codes.
  • Expected MP-Linpack performance - 30 TF.
  • Processors
  • 2.0 GHz AMD Opteron (Sledgehammer)
  • Integrated dual DDR memory controllers @ 333 MHz
  • Page miss latency to local processor memory is
    80 nanoseconds.
  • Peak bandwidth of 5.3 GB/s for each processor.
  • Three integrated HyperTransport interfaces @ 3.2
    GB/s each direction
  • Interconnect performance
  • Latency: <2 µs (neighbor), <5 µs (full machine)
  • Peak Link bandwidth 3.84 GB/s each direction
  • Bisection bandwidth: 2.95 TB/s (Y-Z), 4.98 TB/s
    (X-Z), 6.64 TB/s (X-Y); see the check after this
    list
  • I/O system performance
  • Sustained file system bandwidth of 50 GB/s for
    each color.
  • Sustained external network bandwidth of 25 GB/s
    for each color.
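
A consistency check on the bisection numbers, using the 27 x 16 x 24
mesh from the layout slide and the 3.84 GB/s per-direction link rate
above: a Y-Z cut severs 16 x 24 = 384 X-direction links, and 384 x
3.84 GB/s x 2 directions ≈ 2.95 TB/s; an X-Z cut severs 27 x 24 =
648 Y links, giving ≈ 4.98 TB/s. The X-Y figure of 6.64 TB/s is
twice the plain-mesh value of 27 x 16 x 3.84 x 2 ≈ 3.32 TB/s, which
would be consistent with wrap-around (torus) links in the Z
dimension, a detail the slides do not state.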

17
HPC R&D Efforts at SNL
  • Advanced Architectures
  • Next Generation Processor & Interconnect
    Technologies
  • Simulation and Modeling of Algorithm Performance
  • Message Passing
  • Portals
  • Application characterization of message passing
    patterns (see the PMPI sketch after this list)
  • Light Weight Kernels
  • Project to design a next-generation lightweight
    kernel (LWK) for compute nodes of a distributed
    memory massively parallel system
  • Assess the performance, scalability, and
    reliability of a lightweight kernel versus a
    traditional monolithic kernel
  • Investigate efficient methods of supporting
    dynamic operating system services
  • Light Weight File System
  • only critical I/O functionality (storage,
    metadata mgmt, security)
  • special functionality implemented in I/O
    libraries (above LWFS)
  • Light Weight OS
  • Linux configuration to eliminate the need for a
    remote /root
  • Trimming the kernel to eliminate unwanted and
    unnecessary daemons
  • Cluster Management Tools
  • Diskless Cluster Strategies and Techniques
  • Operating Systems Distribution and Initialization
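
The message-passing characterization work noted above is commonly
done through the standard MPI profiling (PMPI) interface. The sketch
below is illustrative only, not Sandia's actual tooling; it
intercepts MPI_Send and tallies point-to-point messages per
destination rank.

/* Sketch of message-pattern characterization via the MPI profiling
 * interface: wrap MPI_Send, count messages per destination, and
 * print the tallies at MPI_Finalize. A real tool would also cover
 * Isend, collectives, communicators other than MPI_COMM_WORLD, etc.
 * Uses the MPI-3 style const signature for MPI_Send. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static long *send_count;     /* messages sent to each world rank */
static int   world_size;

int MPI_Init(int *argc, char ***argv)
{
    int rc = PMPI_Init(argc, argv);
    PMPI_Comm_size(MPI_COMM_WORLD, &world_size);
    send_count = calloc(world_size, sizeof(long));
    return rc;
}

int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    /* Simplification: treats dest as a MPI_COMM_WORLD rank. */
    if (dest >= 0 && dest < world_size)
        send_count[dest]++;
    return PMPI_Send(buf, count, type, dest, tag, comm);
}

int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int d = 0; d < world_size; d++)
        if (send_count[d] > 0)
            printf("rank %d -> rank %d: %ld messages\n",
                   rank, d, send_count[d]);
    free(send_count);
    return PMPI_Finalize();
}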

18
More Information
  • Computation, Computers, Information and
    Mathematics Center
  • http://www.cs.sandia.gov