Partitioning the CubedSphere for BGL

About This Presentation

Slides: 22

Provided by: JohnD9

Learn more at: https://www.cisl.ucar.edu

Transcript and Presenter's Notes

1
Partitioning the Cubed-Sphere for BG/L

John M. Dennis, Henry M. Tufo, Richard D. Loft
{dennis,tufo,loft}@ucar.edu
National Center for Atmospheric Research
Computational Science Section
Boulder, Colorado USA

2
Overview

Dynamical core of high res. Atmospheric Global
Circulation Model (AGCM) on BG/L
High Order Multi-scale Modeling Environment
(HOMME)
spectral element based
9.5 km mesh at equator (6.7M grid points)
Explicit time-stepping (dt = 5 sec)
Project performance to 54K BG/L processors -> 23 to 39 Tflops
Similar floating-point rate for 10 km mesh AGCM
on the Earth Simulator (ES) (26.6 Tflops)

3
Outline

BG/L Hardware (final and prototype)
Computational Mesh
Cubed-sphere
Performance Model
Description
IBM P690 results
Prototype BG/L results
Prediction for 9.5 km simulation

4
BG/L Hardware

64k node supercomputer
Two PowerPC 440 cores per node (750 MHz)
180/360 Tflops Peak
Networks
3D torus network (64x32x32)
(1.5 µs, 175 Mbyte/sec/link)
Fat-Tree reduction network
(5.0 µs, 350 Mbyte/sec)

5
Prototype BG/L Hardware

512 nodes
32 nodes per board
1/2 rack
Two 500 MHz PowerPC per node
Network
3D mesh (8x8x8)

6
Computational Mesh: Cubed-Sphere

Spectral Elements
A quadrilateral patch of gridpoints Np x Np
Cube face
A square collection of spectral elements Ne x Ne
Cube
Total number of spectral elements 6 x Ne x Ne
Partitioning strategy
Place one or more spectral elements on each
processor
Use space-filling curves
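The element counts quoted on later slides follow directly from the 6 x Ne x Ne formula. A minimal sketch, assuming a simple even split of a one-dimensional element ordering across processors (the function names are illustrative and are not taken from HOMME):

```python
def cubed_sphere_elements(ne: int) -> int:
    """Total number of spectral elements K for ne x ne elements on each of 6 cube faces."""
    return 6 * ne * ne


def elements_per_processor(k_total: int, nproc: int) -> list[int]:
    """Split k_total elements over nproc processors so counts differ by at most one.

    Assumption: elements are already placed in some 1-D order (for example
    along a space-filling curve) and each processor takes a contiguous run.
    """
    base, extra = divmod(k_total, nproc)
    return [base + 1 if p < extra else base for p in range(nproc)]


K_low = cubed_sphere_elements(16)    # low-resolution validation mesh (Ne = 16)
K_high = cubed_sphere_elements(96)   # 9.5 km mesh (Ne = 96)
counts = elements_per_processor(K_low, 512)
print(K_low, K_high, min(counts), max(counts))
```

With Ne = 16 this gives the K = 1536 used in the validation runs, and with Ne = 96 the K = 55296 used in the BG/L projection.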

7
Performance Model
$$T_k = D \cdot \mathrm{nelemd}_k + \sum_{l} \left( t_s + \frac{s(k,l)}{B_w} \right)$$
$$T_{\mathrm{serial}} = D \cdot K$$
$$\mathrm{Speedup} = \frac{T_{\mathrm{serial}}}{\max_k (T_k)}$$
Parameters
K := Total number of spectral elements
T_k := Time to execute on processor k
D := Serial time to execute a single spectral element
nelemd_k := Number of spectral elements on processor k
t_s := Network latency
B_w := Network bandwidth (including contention)
s(k,l) := Message volume between the kth and lth processor
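The model can be evaluated in a few lines. The sketch below assumes the communication term is a sum over the neighbours l of processor k of latency plus message volume divided by bandwidth, which is one reading of the slide. The latency and bandwidth are the torus figures from the hardware slide; the per-element time D and the message sizes are made-up illustrative values, not measurements.

```python
def processor_time(d, nelemd_k, volumes_k, ts, bw):
    """T_k = D * nelemd_k + sum over l of (ts + s(k,l) / bw)."""
    comm = sum(ts + s_kl / bw for s_kl in volumes_k)
    return d * nelemd_k + comm


def speedup(d, nelemd, volumes, ts, bw):
    """Speedup = T_serial / max_k(T_k), with T_serial = D * K."""
    k_total = sum(nelemd)
    t_serial = d * k_total
    t_max = max(
        processor_time(d, n_k, v_k, ts, bw) for n_k, v_k in zip(nelemd, volumes)
    )
    return t_serial / t_max


D = 1.0e-3            # hypothetical serial time per element, seconds
TS = 1.5e-6           # torus latency from the hardware slide, seconds
BW = 175.0e6          # torus link bandwidth from the hardware slide, bytes/sec

nelemd = [2, 2, 2, 2]                # four processors, two elements each
volumes = [[8000.0, 8000.0]] * 4     # hypothetical: two 8000-byte messages each

s_with_comm = speedup(D, nelemd, volumes, TS, BW)
s_no_comm = speedup(D, nelemd, [[]] * 4, TS, BW)
print(s_with_comm, s_no_comm)
```

Because the speedup divides by the slowest processor, both load imbalance (a large nelemd_k) and a heavy communication term on any single processor reduce it.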
8
Performance Model (cont)

Explicit time stepping
Very cache friendly
Serial performance NOT dependent on problem size
Semi-implicit time stepping
Preconditioned conjugate gradient solver
Preconditioner is not cache friendly
Serial performance dependent on problem size
Cache size
Memory bandwidth per processor
Memory bandwidth per SMP node

9
HOMME on IBM P690 Cluster

Validate performance model with lower resolution
Spectral elements (Np = 6)
K = 1536 elements (Ne = 16)
16 vertical levels
Perf. Model accurate to 10%
Model less accurate at 768 processors
> 50% communication time

10
HOMME on prototype BG/L Hardware

Machine
512 nodes
8x8x8 mesh
Single processor per node (500 MHz)
HOMME configuration
Spectral element (Np = 8)
K = 1536 elements (Ne = 16)
16 vertical levels
Impact of contention?
9 messages per link (experimental)
Perf. Model accurate to 1%
23% communication time at 512 processors

11
HOMME on BG/L

Computational Mesh
K = 55296 spectral elements (Np = 10, Ne = 96)
96 vertical levels
9.5 km mesh (6.7M velocity grid points)
Give bounds for BG/L performance predictions
Single Processor: 450 to 750 Mflops
Network
50% to 100% of projected
Contention: 4.5 per link

12
Possible BG/L configurations
13
Projected HOMME Performance for K = 55296
14
Conclusions

Perf. Model accurately predicts execution time of
explicit time-stepping on O(1000) processors (IBM
P690, prototype BG/L)
Perf. Model should accurately predict BG/L
execution time
Communication time 7% on 54K processors
HOMME should achieve 23-39 Tflops on 54K
processors
9.5 km mesh should achieve similar performance
levels versus 10 km mesh AGCM on ES
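The 23-39 Tflops range can be roughly reproduced from the bounds given earlier. The slides do not state how the range was derived, so this is only a plausibility check under three assumptions: "54K" means 54 x 1024 = 55296 processors (one spectral element each), the communication figure is 7% of run time, and no floating-point work is done during communication.

```python
def sustained_tflops(nproc: int, mflops_per_proc: float, comm_fraction: float) -> float:
    """Aggregate rate when a fraction of run time is spent communicating.

    Assumption: no floating-point work overlaps with communication.
    """
    return nproc * mflops_per_proc * (1.0 - comm_fraction) / 1.0e6


NPROC = 54 * 1024        # assumed meaning of "54K"; equals K = 55296
COMM_FRACTION = 0.07     # assumed reading of "Communication time 7"

low = sustained_tflops(NPROC, 450.0, COMM_FRACTION)    # lower single-processor bound
high = sustained_tflops(NPROC, 750.0, COMM_FRACTION)   # upper single-processor bound
print(f"{low:.1f} to {high:.1f} Tflops")
```

Under these assumptions the two bounds round to 23 and 39 Tflops, consistent with the range quoted in the conclusions.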

15
Questions?

Thanks
IBM BlueGene/L development team
Funding
Department of Energy Climate Change Prediction
Program
Contact
John Dennis dennis@ucar.edu

16
Partitioning a cubed-sphere on 8 processors
17
Partitioning a cubed-sphere on 8 processors
18
Peano curve construction (P = 3^m)
19
Hilbert curve construction (P = 2^n)
20
Partitioning with Space-Filling Curves
Meandering Peano: M = 3^m
Hilbert curve: M = 2^n
Hilbert-Peano curve: M = 2^n 3^m
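The size restrictions and the partitioning idea can be illustrated on a single cube face. The sketch below uses the standard textbook index-to-coordinate Hilbert mapping and assigns contiguous runs of the curve to processors. It is not the authors' implementation: it does not connect the six faces into one curve and does not cover the Peano or Hilbert-Peano cases. All function names are illustrative.

```python
def factor_2_3(m: int):
    """Return (n, k) with m == 2**n * 3**k, or None if m has other prime factors."""
    if m < 1:
        return None
    n2 = n3 = 0
    while m % 2 == 0:
        m //= 2
        n2 += 1
    while m % 3 == 0:
        m //= 3
        n3 += 1
    return (n2, n3) if m == 1 else None


def hilbert_d2xy(n: int, d: int):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid, n a power of two."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Reflect the sub-square before transposing it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def partition_face(n: int, nproc: int):
    """Assign each element of an n x n face to a processor by its curve position."""
    total = n * n
    return {hilbert_d2xy(n, d): d * nproc // total for d in range(total)}


curve = [hilbert_d2xy(8, d) for d in range(64)]
owners = partition_face(8, 8)
print(factor_2_3(16), factor_2_3(96), owners[(0, 0)])
```

Consecutive points on the curve are always neighbouring elements, so each processor receives a connected patch of the face, which keeps the message volume s(k,l) between processors small. Both Ne = 16 and Ne = 96 factor as 2^n 3^m, so they fit the Hilbert and Hilbert-Peano cases respectively.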
21
Application of SFC to cubed-sphere (cont)
