Title: Supercomputer Platforms and Their Applications
Supercomputer Platforms and Their Applications
Dr. George Chiu, IBM T.J. Watson Research Center
Plasma Science International Challenges
- Microturbulence Transport
  - What causes plasma transport?
- Macroscopic Stability
  - What limits the pressure in plasmas?
- Wave-particle Interactions
  - How do particles and plasma waves interact?
- Plasma-wall Interactions
  - How can high-temperature plasma and material surfaces co-exist?
2007-2008 Deep Computing Roadmap Summary
(Roadmap chart spanning 1H07 through 2H08. Tracks shown: System p servers and software (p6 IH and blade IB solutions, AIX 5.3/6.1, SLES 10, CSM 1.6-1.7, RSCT 2.4.7, GPFS 3.1-3.3, LoadLeveler 3.4-3.5, PE 4.3-5.1, ESSL 4.4, PESSL 3.3, PERCS system design analysis); System x servers and software (x3455 DC/QC Barcelona, x3550/x3850/x3755 QC, Harpertown/Greencreek refresh, iDPX Stoakley and Thurley planars, HS21/LS21/LS41 blades, GPFS 3.2-3.3, CSM 1.6-1.7 and RHEL 5 support for System x/1350); workstations (M50/M60, Z30/Z40); Blue Gene (BG/L end of life, BG/P first petaflop, dependent on BG client demand; GPFS 3.2, CSM 1.7, LoadLeveler 3.4.2, ESSL 4.3.1); Cell BE (QS20, QS21, QS22 blades; SDK 2.1 through 5.0); System Storage (DDN OEM agreement, DCS9550, DS4800 and follow-on, DS4700 for HPC, EXP100 attach).)
Source: IBM Deep Computing Strategy, 7/18/07
IBM HPC roadmap
(Roadmap graphic: Power 5, Power 6, Power 7, Clusters and Blades)
IBM HPC conceptual roadmap: POWER
- The POWER series is IBM's mainstream computing offering
  - Market is about 60% commercial and 40% technical
- Product line value proposition
  - General-purpose computing engine
  - Robustness, security, reliability fitting mission-critical requirements
  - Standard programming model and interfaces
  - Performance leadership with competitive performance/price value
  - Robust integration with industry standards (hardware and software)
- Current status
  - POWER6 announced
  - POWER7 is underway
(Roadmap graphic: Power 5, Power 6, Power 7)
ASC Purple
- 100 TF machine based on POWER5 (a rough arithmetic check follows below)
- 1,500 8-way POWER5 nodes
- Federation (HPS) interconnect, 12K CPUs
  - (1,500 nodes, 2 multi-plane fat-tree topology, 2x2 GB/s links)
- Communication libraries: < 5 µs latency, 1.8 GB/s unidirectional
- GPFS: 122 GB/s
- Supports NIF
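A rough sanity check of the 100 TF headline from the node count, as a minimal Python sketch; the 1.9 GHz POWER5 clock and the 4 flops/cycle per core are assumptions, not values stated on the slide:

    # Rough peak-performance check for ASC Purple (assumed values marked).
    nodes = 1500                # 8-way POWER5 nodes (from the slide)
    cores_per_node = 8
    clock_ghz = 1.9             # assumption: POWER5 clock in Purple
    flops_per_cycle = 4         # assumption: 2 FPUs x fused multiply-add

    cpus = nodes * cores_per_node
    peak_tf = cpus * clock_ghz * flops_per_cycle / 1000.0
    print(cpus, round(peak_tf))   # 12000 CPUs, ~91 TF peak, the order of the slide's 100 TF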
POWER Server Roadmap
- 2001: POWER4 (180 nm)
  - Chip multiprocessing, distributed switch, shared L2, dynamic LPARs (16)
- 2002-03: POWER4+ (130 nm)
  - Reduced size, lower power, larger L2, more LPARs (32)
- 2004: POWER5 (130 nm)
  - Simultaneous multi-threading, sub-processor partitioning, dynamic firmware updates, enhanced scalability and parallelism, high throughput performance, enhanced memory subsystem, distributed switch
- 2005-06: POWER5+ (90 nm)
  - Autonomic computing enhancements
- 2007: POWER6 (65 nm)
  - Ultra-high frequency, very large L2, robust error recovery, high single-thread and HPC performance, high throughput performance, more LPARs (1024), enhanced memory subsystem
Planned to be offered by IBM. All statements about IBM's future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.
MareNostrum at a Glance
IBM e1350 capability Linux cluster platform comprising 42 IBM eServer p615 servers, 2,560 IBM eServer BladeCenter JS21 servers, and IBM TotalStorage hardware
Challenge
- Deliver world-class deep-computing and e-Science services with an attractive cost/performance ratio
- Enable collaboration among leading scientific teams in the areas of biology, chemistry, medicine, earth sciences and physics
Innovation
- Efficient integration of commercially available commodity components
- Modular and scalable open cluster architecture: computing, storage, networking, software, management, applications
- Diskless capability: improves node reliability, reducing installation and maintenance costs
- Record cluster density and power efficiency
- Leading price/performance and TCO in High Performance Computing
Key figures: 94 TF DP (64-bit), 186 TF SP (32-bit), 376 Tops (8-bit); 20 TB RAM, 370 TB disk; Linux 2.6; 120 m², 750 kW; #1 in Europe, #9 in TOP500. (A rough peak check follows below.)
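The 94 TF double-precision figure can be roughly reproduced from the blade count alone; a minimal sketch, assuming 2.3 GHz dual-socket, dual-core PowerPC 970MP blades with 4 flops/cycle per core (the clock and flops-per-cycle values are assumptions, not stated on the slide):

    # Rough double-precision peak check for MareNostrum (assumed values marked).
    js21_blades = 2560           # from the slide
    cores_per_blade = 4          # assumption: 2 x dual-core PowerPC 970MP per JS21
    clock_ghz = 2.3              # assumption
    flops_per_cycle = 4          # assumption: FPU fused multiply-add x 2

    peak_tf = js21_blades * cores_per_blade * clock_ghz * flops_per_cycle / 1000.0
    print(round(peak_tf))        # ~94 TF, consistent with the slide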
IBM HPC conceptual roadmap: Blue Gene
- Blue Gene focuses on ultra-scalability
- Blue Gene works best for applications that
  - scale naturally to 100s, 1,000s or 100,000s of processors,
  - tolerate a relatively small amount of memory per processor.
- For these applications, Blue Gene offers
  - Best-of-breed performance/price value
  - Lowest operating costs through a small footprint and low power/performance
(Roadmap graphic: Power 5, Power 6, Power 7)
BlueGene/P packaging hierarchy
- Chip: 4 processors; 13.6 GF/s; 8 MB EDRAM
- Compute Card: 1 chip, 20 DRAMs; 13.6 GF/s; 2.0 GB DDR2 (4.0 GB is an option)
- Node Card: 32 Compute Cards; 435 GF/s; 64 GB
- Rack: 32 Node Cards; 13.9 TF/s; 2 TB
- System: 72 Racks, 72x32x32, cabled 8x8x16; 1 PF/s; 144 TB
(The level-by-level arithmetic is checked below.)
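The numbers at each packaging level are consistent multiples of the 13.6 GF/s chip; a minimal arithmetic sketch (the 32 compute cards per node card is inferred from 435 / 13.6, since the node-card count is not spelled out on the slide):

    # BlueGene/P packaging arithmetic, level by level.
    chip_gf = 13.6                        # 4 processors per chip
    compute_card_gf = chip_gf * 1         # 1 chip per compute card
    node_card_gf = compute_card_gf * 32   # inferred: 32 compute cards per node card
    rack_tf = node_card_gf * 32 / 1000.0  # 32 node cards per rack
    system_pf = rack_tf * 72 / 1000.0     # 72 racks

    print(round(node_card_gf))   # 435 GF/s, as on the slide
    print(round(rack_tf, 1))     # 13.9 TF/s
    print(round(system_pf, 2))   # ~1.0 PF/s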
HPC Challenge Benchmarks
System Power Efficiency
(Chart: Gflops/Watt)
Failures per Month per 100 TFlops (20 BG/L racks): unparalleled reliability
Results of a survey conducted by Argonne National Lab on 10 clusters ranging from 1.2 to 365 TFlops (peak), excluding the storage subsystem, management nodes, SAN network equipment, and software outages. (A sketch of the normalization follows below.)
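Comparing machines of very different sizes requires normalizing observed failures to a common scale, which is what the "per 100 TFlops" in the title implies; a minimal sketch of that normalization (the failure counts and peak values below are made up for illustration, not survey data):

    # Normalize observed monthly failures to a common 100 TFlops scale.
    # Sample numbers are illustrative only, not taken from the Argonne survey.
    def failures_per_100tf(failures_per_month, peak_tflops):
        return failures_per_month * (100.0 / peak_tflops)

    print(failures_per_100tf(failures_per_month=4, peak_tflops=50))    # 8.0
    print(failures_per_100tf(failures_per_month=2, peak_tflops=365))   # ~0.55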
Classical MD: ddcMD (2005 Gordon Bell Prize winner!)
- Scalable, general-purpose code for performing classical molecular dynamics (MD) simulations using highly accurate MGPT potentials
- MGPT semi-empirical potentials, based on a rigorous expansion of many-body terms in the total energy, are needed to quantitatively investigate the dynamic behavior of d-shell and f-shell metals.
524-million-atom simulations on 64K nodes achieved 101.5 TF/s sustained. Superb strong and weak scaling for the full machine ("very impressive machine," says the PI). A rough per-node and percent-of-peak reading follows below.
Visualization of important scientific findings already achieved on BG/L: molten Ta at 5000 K demonstrates solidification during isothermal compression to 250 GPa.
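A rough reading of what those numbers imply per node, assuming the roughly 367 TF/s peak of a 64-rack BG/L system (an assumption, since the peak is not stated on this slide):

    # Back-of-the-envelope numbers for the ddcMD run.
    atoms = 524e6                  # from the slide
    nodes = 64 * 1024              # 64K BG/L nodes (from the slide)
    sustained_tf = 101.5           # from the slide
    peak_tf = 367.0                # assumption: 64-rack BG/L peak

    print(round(atoms / nodes))                    # ~8000 atoms per node
    print(round(100 * sustained_tf / peak_tf, 1))  # ~27.7 percent of peak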
Qbox: First-Principles Molecular Dynamics
Francois Gygi (UC Davis); Erik Draeger, Martin Schulz, Bronis de Supinski (LLNL); Franz Franchetti (Carnegie Mellon); John Gunnels, Vernon Austel, Jim Sexton (IBM)
- Treats electrons quantum mechanically
- Treats nuclei classically
- Developed at LLNL
- BG support provided by IBM
- Simulated 1,000 Mo atoms with 12,000 electrons
- Achieves 207.3 Teraflops sustained (56.8% of peak); a rough implied-peak check follows below.
Qbox simulation of the transition from a molecular solid (top) to a quantum liquid (bottom) that is expected to occur in hydrogen under high pressure.
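A minimal check of the machine size these two figures imply, using only the slide's numbers:

    # Implied peak for the Qbox run, from the sustained rate and fraction of peak.
    sustained_tf = 207.3
    fraction_of_peak = 0.568

    implied_peak_tf = sustained_tf / fraction_of_peak
    print(round(implied_peak_tf))   # ~365 TF, roughly a full 64-rack BG/L system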
Compute Power of the Gyrokinetic Toroidal Code
Number of particles (in millions) moved 1 step in 1 second
(Chart comparing BG/L at Livermore, Cray XT3/XT4, BG/L Optimal, and BG/L)
Compute Power of the Gyrokinetic Toroidal Code
Number of particles (in millions) moved 1 step in 1 second
BlueGene can reach 150 billion particles in 2008, >1 trillion in 2011. POWER6 can reach 1 billion particles in 2008, >0.3 trillion in 2011.
(Chart comparing BG/P at 3.5 PF, P6 at 300 TF, BG/L at Livermore, IBM Power, BG/L Optimal, Cray XT3/XT4, and BG/L)
Rechenzentrum Garching: GENE at BG Watson
Strong scaling of GENE v11 for a problem size of 300-500 GB, with measurement points for 1k, 2k, 4k, 8k and 16k processors, normalized to 1k processors. Quasi-linear scaling has been observed, with a parallel efficiency of 95% on 8k processors and of 89% on 16k processors. (A sketch of the efficiency definition follows below.)
By Hermann Lederer, Reinhard Tisma and Frank Jenko, RZG and IPP, March 21-22, 2007
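For reference, parallel efficiency relative to the 1k-processor baseline is measured speedup divided by ideal speedup; a minimal sketch (the timing values are placeholders chosen to reproduce the quoted 95%, not the measured GENE data):

    # Parallel efficiency relative to a baseline processor count.
    def parallel_efficiency(t_base, base_procs, t_n, n_procs):
        speedup = t_base / t_n
        ideal = n_procs / base_procs
        return speedup / ideal

    # Placeholder timings: 800 s on 1k procs, 105.3 s on 8k procs.
    print(round(parallel_efficiency(800.0, 1024, 105.3, 8192), 2))   # 0.95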
Current HPC Systems Characteristics
Summary
- IBM is deeply involved in ITER applications through its collaborations with
  - Princeton Plasma Physics Laboratory
  - Max-Planck-Institut für Plasmaphysik / Rechenzentrum Garching
  - Barcelona Supercomputing Center
  - Oak Ridge National Laboratory
- IBM is also involved in laser-plasma fusion through its collaborations with
  - Lawrence Livermore National Laboratory
  - Forschungszentrum Jülich
- IBM offers multiple platforms to address ITER needs
  - POWER: high memory capacity per node, moderate interprocessor bandwidth, moderate scalability; capability and capacity machine
  - Blue Gene: low power, low memory capacity per node, high interprocessor bandwidth, highest scalability; capability and capacity applications
  - X Series and white box: moderate memory capacity per node, low interprocessor bandwidth, limited-to-moderate scalability; mostly capacity machine
What BG brings to Core Turbulence Transport
- Benchmark case CYCLONE
  - GENE: < 1 day on 64 procs; a few hours on 1,024 procs (BG/L)
  - GYSELA: 2.5 days on 64 procs
  - ORB5: < 1 day on 64 procs; a few hours on 1,024 procs (BG/L)
- Similar ITER-size benchmark (a rough processor-time comparison follows below)
  - GENE: ½ day on 6K procs (BG/L)
  - GYSELA: 10 days on 1,024 procs
  - ORB5: ½ day on 16K procs (BG/L); 1 week on 256 procs (PC cluster)
Courtesy of José Mª Cela, Director of Applications, BSC
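One way to read the ITER-size timings is as total processor-time versus time to solution; a minimal sketch for the ORB5 case, interpreting "½ day" as 12 hours and "1 week" as 168 hours (an interpretation, not stated on the slide):

    # Processor-hours and time to solution for the ITER-size ORB5 benchmark.
    bgl_proc_hours = 16 * 1024 * 12   # 16K procs x 12 h on BG/L
    pc_proc_hours = 256 * 168         # 256 procs x 168 h on the PC cluster

    print(bgl_proc_hours)     # 196608 processor-hours on BG/L
    print(pc_proc_hours)      # 43008 processor-hours on the PC cluster
    print(168 / 12)           # 14.0x shorter time to solution on BG/L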
The Gyrokinetic Toroidal Code (GTC)
- Description
  - Particle-in-cell (PIC) code
  - Developed by Zhihong Lin (now at UC Irvine)
  - Non-linear gyrokinetic simulation of microturbulence [Lee, 1983]
  - Particle-electric field interaction treated self-consistently
  - Uses magnetic field-line-following coordinates (ψ, θ, ζ)
  - Guiding-center Hamiltonian [White and Chance, 1984]
  - Non-spectral Poisson solver [Lin and Lee, 1995]
  - Low numerical noise algorithm (δf method)
  - Full torus (global) simulation
(A schematic particle-in-cell time step is sketched below.)
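For readers unfamiliar with the method, the sketch below shows the generic structure of one particle-in-cell time step (scatter charge, solve the field, gather, push) in a deliberately simplified 1D electrostatic setting; it is not GTC's global gyrokinetic δf formulation, only an illustration of the cycle the bullets refer to:

    import numpy as np

    # Minimal 1D electrostatic particle-in-cell step (illustrative only).
    def pic_step(x, v, qm, grid_n, length, dt):
        dx = length / grid_n

        # 1) Scatter: deposit particle charge on the grid (nearest grid point).
        cells = np.floor(x / dx).astype(int) % grid_n
        rho = np.bincount(cells, minlength=grid_n).astype(float)
        rho -= rho.mean()                       # neutralizing background

        # 2) Solve: periodic Poisson equation d2(phi)/dx2 = -rho via FFT.
        k = 2 * np.pi * np.fft.fftfreq(grid_n, d=dx)
        k[0] = 1.0                              # avoid division by zero
        phi_hat = np.fft.fft(rho) / k**2
        phi_hat[0] = 0.0                        # zero-mean potential
        E = np.real(np.fft.ifft(-1j * k * phi_hat))   # E = -d(phi)/dx

        # 3) Gather: interpolate the field back to the particles.
        E_part = E[cells]

        # 4) Push: advance velocities and positions with periodic wrap-around.
        v = v + qm * E_part * dt
        x = (x + v * dt) % length
        return x, v

    # Tiny usage example with random particles.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 2 * np.pi, 10000)
    v = rng.normal(0.0, 1.0, 10000)
    for _ in range(10):
        x, v = pic_step(x, v, qm=-1.0, grid_n=64, length=2 * np.pi, dt=0.05)
    print(x.min(), x.max())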
BlueGene Key Applications: Major Scientific Advances
- Qbox (DFT), LLNL: 56.5% of peak, 2006 Gordon Bell Award, 64 racks
- CPMD, IBM: 30% of peak, highest scaling 64 racks
- ddcMD (classical MD), LLNL: 27.6% of peak, 2005 Gordon Bell Award, 64 racks
- MDCASK, LLNL: highest scaling 64 racks
- SPaSM, LANL: highest scaling 64 racks
- LAMMPS, SNL: highest scaling 16 racks
- Blue Matter, IBM: highest scaling 16 racks
- Rosetta, UW: highest scaling 20 racks
- AMBER: 8 racks
- Quantum Chromodynamics, IBM: 30% of peak, 2006 Gordon Bell Special Award, 64 racks
- QCD at KEK: 10 racks
- sPPM (CFD), LLNL: 18% of peak, highest scaling 64 racks
- Miranda, LLNL: highest scaling 64 racks
- Raptor, LLNL: highest scaling 64 racks
- DNS: highest scaling 16 racks
- PETSc FUN3D, ANL: 14.2% of peak
- NEK5 (thermal hydraulics), ANL: 22% of peak
- ParaDis (dislocation dynamics), LLNL: highest scaling 64 racks
- GFMC (nuclear physics), ANL: 16% of peak
- WRF (weather), NCAR: 14% of peak, highest scaling 64 racks
- POP (oceanography): highest scaling 16 racks
- HOMME (climate), NCAR: 12% of peak, highest scaling 32 racks
- GTC (plasma physics), PPPL: highest scaling 16 racks
- ORB5, RZG: highest scaling 8 racks
- GENE, RZG: 12.5% of peak, highest scaling 16 racks
- Flash (Supernova Ia): highest scaling 32 racks
- Cactus (general relativity): highest scaling 16 racks
- AWM (earthquake): highest scaling 20 racks
Science
Theory
Experiment
Simulation