Partitioning the Cubed-Sphere for BG/L

1
Partitioning the Cubed-Sphere for BG/L
  • John M. Dennis, Henry M. Tufo, Richard D. Loft
  • {dennis,tufo,loft}@ucar.edu
  • National Center for Atmospheric Research
  • Computational Science Section
  • Boulder, Colorado USA

2
Overview
  • Dynamical core of a high-resolution Atmospheric Global Circulation Model (AGCM) on BG/L
  • High Order Multi-scale Modeling Environment (HOMME)
      • Spectral element based
      • 9.5 km mesh at equator (6.7M grid points)
      • Explicit time-stepping (dt = 5 s)
  • Projected performance on 54K BG/L processors: 23-39 Tflops
  • Similar floating-point rate to the 10 km mesh AGCM on the Earth Simulator (ES), 26.6 Tflops

3
Outline
  • BG/L hardware (final and prototype)
  • Computational mesh
      • Cubed-sphere
  • Performance model
      • Description
      • IBM P690 results
      • Prototype BG/L results
      • Prediction for 9.5 km simulation

4
BG/L Hardware
  • 64K-node supercomputer
  • Two PowerPC 440 cores per node (750 MHz)
  • 180/360 Tflops peak
  • Networks
      • 3D torus network (64x32x32)
          • 1.5 µs latency, 175 MB/s per link
      • Fat-tree reduction network
          • 5.0 µs latency, 350 MB/s

5
Prototype BG/L Hardware
  • 512 nodes
  • 32 nodes per board
  • 1/2 rack
  • Two 500 MHz PowerPC processors per node
  • Network
      • 3D mesh (8x8x8)

6
Computational Mesh: Cubed-Sphere
  • Spectral elements
      • A quadrilateral patch of Np × Np grid points
  • Cube face
      • A square collection of Ne × Ne spectral elements
  • Cube
      • Total number of spectral elements: 6 · Ne · Ne
  • Partitioning strategy
      • Place one or more spectral elements on each processor
      • Use space-filling curves

7
Performance Model
$$T_k = D \cdot \mathrm{nelemd}_k + \sum_{l} \left( t_s + \frac{s(k,l)}{B_w} \right)$$
$$T_{\mathrm{serial}} = D \cdot K \qquad \mathrm{Speedup} = \frac{T_{\mathrm{serial}}}{\max_k T_k}$$
Parameters
  • K: total number of spectral elements
  • T_k: time to execute on processor k
  • D: serial time to execute a single spectral element
  • nelemd_k: number of spectral elements on processor k
  • t_s: network latency
  • B_w: network bandwidth (including contention)
  • s(k,l): message volume between the kth and lth processors
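The model translates almost line for line into code. This is a sketch, not the authors' tool: `Processor`, `proc_time`, and `speedup` are hypothetical names, the sum runs over the messages processor k sends in one step, and units only need to be consistent (e.g. seconds and bytes).

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    nelemd: int  # number of spectral elements on this processor
    # s(k,l): message volume (bytes) exchanged with each communicating processor l
    msg_volumes: list = field(default_factory=list)


def proc_time(p, D, ts, bw):
    """T_k = D * nelemd_k + sum_l (t_s + s(k,l) / B_w)."""
    return D * p.nelemd + sum(ts + s / bw for s in p.msg_volumes)


def speedup(procs, D, ts, bw):
    """T_serial / max_k T_k, with T_serial = D * K."""
    K = sum(p.nelemd for p in procs)
    return (D * K) / max(proc_time(p, D, ts, bw) for p in procs)


# Hypothetical inputs: 4 processors, 1 element each, 8 messages of 8 KB each.
procs = [Processor(nelemd=1, msg_volumes=[8000.0] * 8) for _ in range(4)]
print(speedup(procs, D=1e-3, ts=1.5e-6, bw=175e6))
```

Since B_w is meant to include contention, one simple assumption is to divide the per-link bandwidth by the number of messages sharing a link, e.g. `bw=175e6 / 9`; the slides do not say exactly how contention was folded in.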
8
Performance Model (cont)
  • Explicit time stepping
      • Very cache friendly
      • Serial performance NOT dependent on problem size
  • Semi-implicit time stepping
      • Preconditioned conjugate gradient solver
      • Preconditioner is not cache friendly
      • Serial performance dependent on problem size
          • Cache size
          • Memory bandwidth per processor
          • Memory bandwidth per SMP node
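For reference, a generic preconditioned conjugate gradient loop is sketched below. The Jacobi (diagonal) preconditioner is an illustrative assumption, since the slide does not say which preconditioner HOMME uses; `pcg`, `dot`, and `matvec` are hypothetical helper names. Each iteration sweeps over whole vectors, so the working set grows with the problem size, unlike the per-element work of explicit stepping.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def pcg(A, b, m_inv_diag, tol=1e-10, maxiter=1000):
    """Preconditioned CG for a symmetric positive-definite A.

    m_inv_diag holds the inverse diagonal of the preconditioner M
    (Jacobi: 1 / A[i][i]); this choice is for illustration only.
    """
    x = [0.0] * len(b)
    r = list(b)                                   # r = b - A x, with x = 0
    z = [m * ri for m, ri in zip(m_inv_diag, r)]  # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for _ in range(maxiter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [m * ri for m, ri in zip(m_inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = pcg(A, b, [1.0 / A[i][i] for i in range(2)])  # exact answer: [1/11, 7/11]
```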

9
HOMME on IBM P690 Cluster
  • Validate performance model at lower resolution
      • Spectral elements (Np = 6)
      • K = 1536 elements (Ne = 16)
      • 16 vertical levels
  • Perf. model accurate to 10%
  • Model less accurate at 768 processors
      • > 50% communication time

10
HOMME on prototype BG/L Hardware
  • Machine
      • 512 nodes
      • 8x8x8 mesh
      • Single processor per node (500 MHz)
  • HOMME configuration
      • Spectral elements (Np = 8)
      • K = 1536 elements (Ne = 16)
      • 16 vertical levels
  • Impact of contention?
      • 9 messages per link (experimental)
  • Perf. model accurate to 1%
  • 23% communication time at 512 processors

11
HOMME on BG/L
  • Computational mesh
      • K = 55296 spectral elements (Np = 10, Ne = 96)
      • 96 vertical levels
      • 9.5 km mesh (6.7M velocity grid points)
  • Bounds for BG/L performance predictions
      • Single processor: 450 to 750 Mflops
      • Network: 50 to 100% of projected performance
      • Contention: 4.5 messages per link
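A quick consistency check of the element counts used in these runs, from K = 6 · Ne · Ne. Reading "54K processors" as one element per processor is an inference, not something the slides state; `num_elements` is a hypothetical helper.

```python
def num_elements(ne):
    """Total spectral elements on the cubed sphere: 6 faces of Ne x Ne."""
    return 6 * ne * ne


print(num_elements(16))         # 1536  (P690 and prototype BG/L runs)
print(num_elements(96))         # 55296 (9.5 km BG/L configuration)
print(num_elements(96) / 1024)  # 54.0 -> "54K" processors at one element each
```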

12
Possible BG/L configurations
13
Projected HOMME Performance for K = 55296
14
Conclusions
  • Perf. model accurately predicts execution time of explicit time-stepping on O(1000) processors (IBM P690, prototype BG/L)
  • Perf. model should accurately predict BG/L execution time
      • Communication time 7% on 54K processors
  • HOMME should achieve 23-39 Tflops on 54K processors
  • 9.5 km mesh should achieve performance levels similar to the 10 km mesh AGCM on the ES
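A back-of-the-envelope check, not necessarily how the authors derived their bounds: assume each of the 55,296 processors sustains the single-processor bounds (450 to 750 Mflops) while computing, and 7% of the run goes to communication. `sustained_tflops` is a hypothetical helper.

```python
def sustained_tflops(nproc, mflops_per_proc, comm_fraction):
    """Aggregate rate if each processor runs at mflops_per_proc for the
    non-communication fraction of the time (a simplifying assumption)."""
    return nproc * mflops_per_proc * (1.0 - comm_fraction) / 1e6


nproc = 55296  # one spectral element per processor (assumed)
low = sustained_tflops(nproc, 450, 0.07)
high = sustained_tflops(nproc, 750, 0.07)
print(f"{low:.1f}-{high:.1f} Tflops")  # 23.1-38.6 Tflops
```

Under these assumptions the estimate falls inside the quoted 23-39 Tflops range.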

15
Questions?
  • Thanks
      • IBM BlueGene/L development team
  • Funding
      • Department of Energy Climate Change Prediction Program
  • Contact
      • John Dennis, dennis@ucar.edu

16
Partitioning a cubed-sphere on 8 processors
17
Partitioning a cubed-sphere on 8 processors
18
Peano curve construction (P = 3^m)
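As an illustration, the classic Peano curve can be generated digit by digit: write the index in base 3, alternate its digits between x and y, and reflect a digit (t -> 2 - t) when the relevant running digit sum is odd. This is Peano's original curve, not the meandering Peano variant listed later in the slides; `peano_d2xy` is a hypothetical name.

```python
def peano_d2xy(m, d):
    """Map index d in [0, 9**m) to (x, y) on a 3**m x 3**m grid along the
    classic Peano curve."""
    digits = []
    for _ in range(2 * m):
        digits.append(d % 3)
        d //= 3
    digits.reverse()                     # most significant ternary digit first
    x = y = 0
    sum_x = sum_y = 0                    # running sums of raw x- and y-digits
    for i in range(m):
        a, b = digits[2 * i], digits[2 * i + 1]
        xa = 2 - a if sum_y % 2 else a   # reflect by parity of earlier y-digits
        sum_x += a
        yb = 2 - b if sum_x % 2 else b   # reflect by parity of x-digits so far
        sum_y += b
        x = 3 * x + xa
        y = 3 * y + yb
    return x, y


path = [peano_d2xy(2, d) for d in range(81)]  # visits every cell of a 9 x 9 face
```

Consecutive indices always land on edge-adjacent cells, which is what makes contiguous runs of the curve good partitions.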
19
Hilbert curve construction (P = 2^n)
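The Hilbert curve has a well-known iterative index-to-coordinate mapping: at each scale s, two bits of the index pick a quadrant, and the partial coordinates are rotated or reflected so that sub-curves join end to end. A sketch, with `hilbert_d2xy` as a hypothetical name:

```python
def hilbert_d2xy(n, d):
    """Map index d in [0, n*n) to (x, y) on an n x n grid (n a power of 2)
    along the Hilbert curve."""
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:                      # rotate/reflect the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y


path = [hilbert_d2xy(8, d) for d in range(64)]  # visits every cell of an 8 x 8 face
```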
20
Partitioning with Space-Filling Curves
  • Meandering Peano curve: M = 3^m
  • Hilbert curve: M = 2^n
  • Hilbert-Peano curve: M = 2^n · 3^m
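Choosing a curve family then reduces to factoring M. The sketch below assumes M is the number of elements along a cube-face edge (i.e. Ne), which the slide does not state explicitly; `choose_curve` is a hypothetical helper.

```python
def choose_curve(M):
    """Pick a curve family for an M x M face from the curve/M pairs on the
    slide: Hilbert for 2**n, meandering Peano for 3**m, and
    Hilbert-Peano for 2**n * 3**m."""
    if M < 1:
        raise ValueError("M must be a positive integer")
    n = m = 0
    while M % 2 == 0:
        M //= 2
        n += 1
    while M % 3 == 0:
        M //= 3
        m += 1
    if M != 1:
        return None                      # M has a prime factor other than 2 or 3
    if m == 0:
        return "Hilbert"
    if n == 0:
        return "meandering Peano"
    return "Hilbert-Peano"


print(choose_curve(96))  # Hilbert-Peano, since 96 = 2**5 * 3
```

Under this reading, the 9.5 km configuration (Ne = 96) needs the mixed Hilbert-Peano construction.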
21
Application of SFC to cubed-sphere (cont)