Transcript and Presenter's Notes

Title: The Blue Gene/L Supercomputer (George Chiu)


1
The Blue Gene/L Supercomputer (George Chiu)
2
June 2005: Top 10 of the 25th TOP500 list
IBM is the clear leader in the TOP500 list with
51.8% of systems and 57.9% of installed
performance.
3
BG/L, 32,768 nodes (IBM Rochester): Linpack 136.8
TF/s sustained, 183.5 TF/s peak (1 TF =
1,000,000,000,000 flops)
4
Blue Gene/L Sales
  • Advanced Industrial Science and Technology, Japan
    (Yutaka Akiyama) 4 racks 2/05
  • Argonne National Laboratory Consortium (William
    Gropp, Rick Stevens) 1 rack 12/04
  • ASTRON Lofar - Holland (Harvey Butcher) 6
    racks 3/05
  • Boston University (Claudio Rebbi) 1 rack
    12/04
  • Ecole Polytechnique Federale de Lausanne (Henry
    Markram) 4 racks 06/05
  • IBM Yorktown Research Center 22 racks 06/05
  • IBM Almaden Research Center 2 racks 03/05
  • Juelich (Thomas Lippert) 1 rack 7/05
  • Lawrence Livermore National Laboratory (Mark
    Seager, Don Dossa) 65 racks (32 by 3/05)
  • National Center for Atmospheric Research (Richard
    Loft) 1 rack 3/05
  • NIWS (Suesada) 1 rack 1/05
  • San Diego Supercomputing Center (Wayne Pfeiffer)
    - Intimidata 1 rack 12/17/04
  • University of Edinburgh (Anthony Kennedy, Richard
    Kenway) 1 rack 12/04

5
BlueGene/L System Buildup
  • System: 64 racks, 64x32x32; 180/360 TF/s; 32 TB
    (peak-rate arithmetic is sketched after this list)
  • Rack: 32 node cards; 2.8/5.6 TF/s; 512 GB
  • Node Card: 32 chips (4x4x2), 16 compute cards,
    0-2 I/O cards; 90/180 GF/s; 16 GB
  • Compute Card: 2 chips, 1x2x1; 5.6/11.2 GF/s; 1.0 GB
  • Chip: 2 processors; 2.8/5.6 GF/s; 4 MB
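As a rough cross-check of these peak figures (assuming the BG/L clock of 700 MHz and one dual-pipe fused multiply-add, i.e. 4 flops, per processor per cycle; the 512 MB per node follows from the 1.0 GB compute card holding 2 chips):

    700 MHz x 4 flops/cycle   = 2.8 GF/s per processor
    x 2 processors            = 5.6 GF/s per chip (node)
    x 1,024 nodes per rack    = roughly 2.8/5.6 TF/s per rack (one CPU / both CPUs counted)
    x 64 racks (65,536 nodes) = roughly 180/360 TF/s for the full system
    65,536 nodes x 512 MB     = 32 TB of memory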
6
Two Computation Modes for the BG/L Node
  • Mode 1 (Co-processor mode)
  • CPU0 does all the computations
  • CPU1 does all the communications (including
    MPI, etc.)
  • Communications can overlap with computations
    (see the non-blocking MPI sketch after this list)
  • Peak compute performance is 5.6/2 = 2.8
    GFlops
  • Mode 2 (Virtual node mode)
  • CPU0, CPU1 act as independent virtual nodes
  • Each one does both computations and
    communications
  • The two CPUs communicate via common memory
    buffers
  • Computations and communications can not
    overlap.
  • Peak compute performance is 5.6 GFlops

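The overlap available in co-processor mode is what standard non-blocking MPI exposes to the application. A minimal, generic C/MPI sketch of that pattern (not BG/L-specific code; the buffer sizes, ring neighbours, and message tag are illustrative assumptions):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Illustrative buffers: exchange a block with ring neighbours. */
    double sendbuf[1024], recvbuf[1024], local[4096];
    for (int i = 0; i < 1024; i++) sendbuf[i] = (double)rank;
    for (int i = 0; i < 4096; i++) local[i] = 1.0;

    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;

    /* Post the communication first ... */
    MPI_Request reqs[2];
    MPI_Irecv(recvbuf, 1024, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, 1024, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... then compute on data that does not depend on the messages.
       In co-processor mode CPU1 can progress the transfers while this
       loop runs on CPU0; in virtual node mode they cannot overlap. */
    double sum = 0.0;
    for (int i = 0; i < 4096; i++) sum += local[i] * local[i];

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    if (rank == 0) printf("sum = %.1f, first received value = %.1f\n", sum, recvbuf[0]);

    MPI_Finalize();
    return 0;
}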
7
(No Transcript)
8
Supercomputer Power Efficiencies
9
Microprocessor Power Density Growth
10
(No Transcript)
11
HPC Challenge Global-Random Access (Gup/s)
12
Summary of performance results
  • DGEMM
  • 92.3% of dual-core peak on 1 node
  • LINPACK
  • 73.73% of peak on 32,768 nodes (136.8 TFlop/s on
    3/23/05)
  • SPECint2000: 316
  • SPECfp2000: 436
  • G-FFTE: 48.993 GFlop/s
  • STREAM (the triad kernel is sketched after this list)
  • Tuned: Copy 3.8 GB/s, Scale 3.3 GB/s, Add
    2.8 GB/s, Triad 3.0 GB/s
  • Standard: Copy 1.8 GB/s, Scale 1.3 GB/s, Add
    1.5 GB/s, Triad 1.5 GB/s
  • Competitive with STREAM numbers for most high-end
    microprocessors
  • G-Random Access
  • Ranked #1 in HPCC at 0.134994 GUP/s
  • MPI
  • Latency 3.3 µs at 700 MHz
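For reference, the STREAM triad quoted above is just the kernel below. This is a generic sketch rather than the tuned BG/L code; the array size and scalar are illustrative assumptions, and timing is omitted:

#include <stdio.h>

#define N (2 * 1024 * 1024)   /* illustrative size, large enough to exceed cache */

static double a[N], b[N], c[N];

int main(void)
{
    const double scalar = 3.0;

    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    /* STREAM "Triad": a[i] = b[i] + scalar * c[i].
       Each iteration moves 24 bytes (two 8-byte reads, one 8-byte write),
       so bandwidth = 24 * N / elapsed_time. */
    for (long i = 0; i < N; i++)
        a[i] = b[i] + scalar * c[i];

    printf("a[0] = %.1f\n", a[0]);
    return 0;
}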

13
Comparing Systems
14
BlueGene/L Compute ASIC
  • IBM CU-11, 0.13 µm
  • 11 x 11 mm die size
  • 25 x 32 mm CBGA
  • 474 pins, 328 signal
  • 1.5/2.5 Volt

15
Dual FPU Architecture
  • Two 64 bit floating point units
  • Designed with input from compiler and library
    developers
  • SIMD instructions over both register files
  • FMA operations over double precision data
  • More general operations available with cross and
    replicated operands
  • Useful for complex arithmetic, matrix multiply,
    FFT
  • Parallel (quadword) loads/stores
  • Fastest way to transfer data between processors
    and memory
  • Data needs to be 16-byte aligned (see the
    sketch after this list)
  • Load/store with swap order available
  • Useful for matrix transpose
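The quadword loads/stores and SIMD fused multiply-adds are normally generated by the compiler from ordinary loops over 16-byte-aligned data (IBM XL intrinsics also exist but are not shown here). A schematic C loop of the kind a dual-FPU-aware compiler can vectorize; the array names and sizes are illustrative assumptions:

#include <stdalign.h>
#include <stdio.h>

#define N 1024

/* 16-byte alignment is what allows the parallel (quadword) loads/stores. */
static alignas(16) double a[N], b[N], c[N], d[N];

int main(void)
{
    for (int i = 0; i < N; i++) { b[i] = (double)i; c[i] = 2.0; d[i] = 1.0; }

    /* Each iteration is one fused multiply-add; over aligned pairs of
       elements the compiler can emit a single SIMD FMA and quadword
       load/store pair for every two iterations. */
    for (int i = 0; i < N; i++)
        a[i] = b[i] * c[i] + d[i];

    printf("a[10] = %.1f\n", a[10]);
    return 0;
}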

16
BlueGene/L Interconnection Networks
  • 3-Dimensional Torus
  • Interconnects all compute nodes (65,536)
  • Virtual cut-through hardware routing
  • 1.4 Gb/s on each of the 12 node links (2.1 GB/s per node)
  • 1 µs latency between nearest neighbors, 5 µs to
    the farthest
  • MPI: 3.3 µs latency for one hop, 10 µs to the
    farthest (a torus-mapping sketch in MPI follows this list)
  • Communications backbone for computations
  • 0.7/1.4 TB/s bisection bandwidth, 68 TB/s total
    bandwidth
  • Collective Network
  • One-to-all broadcast functionality
  • Reduction operations functionality
  • 2.8 Gb/s of bandwidth per link
  • Latency of one-way tree traversal 2.5 µs, MPI 6
    µs
  • 23 TB/s total binary-tree bandwidth (64k machine)
  • Interconnects all compute and I/O nodes (1,024)
  • Ethernet
  • Incorporated into every node ASIC
  • Active in the I/O nodes (1:64)
  • All external comm. (file I/O, control, user
    interaction, etc.)
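On the application side the torus is typically addressed through MPI's Cartesian topology interface. A minimal sketch in generic MPI (nothing BG/L-specific; the 4x4x4 dimensions are an illustrative assumption and require at least 64 ranks):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Periodic in all three dimensions = a torus.  The 4x4x4 shape is an
       example; a real job would size dims from the partition. */
    int dims[3] = {4, 4, 4};
    int periods[3] = {1, 1, 1};
    MPI_Comm torus;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &torus);

    if (torus != MPI_COMM_NULL) {
        int me, coords[3], xminus, xplus;
        MPI_Comm_rank(torus, &me);
        MPI_Cart_coords(torus, me, 3, coords);        /* my (x,y,z) position */
        MPI_Cart_shift(torus, 0, 1, &xminus, &xplus); /* nearest neighbours along x */
        printf("rank %d at (%d,%d,%d): -x neighbour %d, +x neighbour %d\n",
               me, coords[0], coords[1], coords[2], xminus, xplus);
        MPI_Comm_free(&torus);
    }

    MPI_Finalize();
    return 0;
}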

17
BlueGene/L System
18
Complete BlueGene/L System at LLNL
(System diagram)
  • BG/L compute nodes: 65,536
  • BG/L I/O nodes: 1,024 (1,024 links into the switch)
  • Federated Gigabit Ethernet switch: 2,048 ports
  • External connections: WAN (506), visualization (128),
    archive (128), CWFS (226)
  • Front-end nodes (28), service node (8), control network (8)
19
Applications
  • N-body simulation
  • Classical molecular dynamics: AMBER8, Blue
    Matter, ddcMD, DL_POLY, GRASP, LAMMPS, LJ,
    MDCASK, NAMD, PRIME (Schrodinger), SPaSM
  • Quantum chemistry: CHARMm, CPMD, FEQMD,
    GAMESS-UK, GAMESS-US, Gaussian, NWChem, Qbox,
    QMC
  • Plasma Physics: TBLE
  • Stellar dynamics of galaxies: Enzo
  • Complex multiphysics code
  • Climatology: CCSM, HOMME
  • Computational Fluid Dynamics: FLUENT, Miranda,
    Overflow, POP (Ocean), Raptor, SAGE, sPPM,
    STAR-CD
  • Astronomy: Accretion, planetary formation and
    evolution, stellar evolution, FLASH (supernova),
    Radiotelescope (Astron)
  • Electromagnetics: FDTD code
  • Finite element analysis, Car Crash: LS-Dyna,
    NASTRAN, PAM-Crash (ESI), HPCMW (RIST)
  • Radiative transport: 2-D SPHOT, 3-D UMT2000
  • Neutron transport: Sweep3D
  • Weather: MM5, IFS (ECMWF)
  • Life Sciences and Biotechnology: mpiBLAST,
    Smith-Waterman
  • CAD/CAE: AVBP
  • Crystallography with X-Ray Diffraction: Shake-and-Bake
  • Drug Screening: OpenEye Scientific Software,
    Tripos, MOE (CCG Chemical Computing Group)
  • Finance: NIWS (Nissei)

20
BlueGene/L will allow overlapping evaluation of
models for the first time
(Diagram: modeling approaches spanning length scales from nm to mm
and time scales from ps to s)
  • Continuum / Finite Element: plasticity of complex shapes
  • Mesoscale / Aggregate Materials: aggregate grain response,
    poly-crystal plasticity
  • Microscale / Dislocation Dynamics: collective behavior of
    defects, single-crystal plasticity
  • Atomic Scale / Molecular Dynamics: unit mechanisms of defect
    mobility and interaction
BlueGene/L simulations bring qualitative change
to material and physics modeling efforts
21
Closing Points
  • Blue Gene represents an innovative way to scale
    to multi-teraflops capability
  • Massive scalability
  • Efficient packaging for low power and floor space
    consumption
  • Unique in the market for its balance between
    massive scale-out capacity and preservation of
    familiar user/administrator environments
  • Better than COTS clusters by virtue of density,
    scalability and innovative interconnect design
  • Better than vector-based supercomputers by virtue
    of adherence to Linux and MPI standards
  • Blue Gene is applicable to a wide range of Deep
    Computing workloads
  • Programs are underway to ensure Blue Gene
    technology is accessible to a broad range of
    researchers and technologists
  • Based on PowerPC, Blue Gene leverages and
    advances core IBM technology
  • Blue Gene R&D continues to ensure the
    program stays vital