Transcript and Presenter's Notes

Title: Recent Developments for Parallel CMAQ


1
Recent Developments for Parallel CMAQ
  • Jeff Young AMDB/ASMD/ARL/NOAA
  • David Wong SAIC NESCC/EPA

2
AQF-CMAQ
Running in quasi-operational mode at NCEP
26 minutes for a 48-hour forecast (on 33 processors)
3
Code modifications to improve data locality
Some vdiff optimization
Jerry Gipson's MEBI/EBI chem solver
Tried CGRID( SPC, LAYER, COLUMN, ROW ); tests indicated not good for MPI communication of data
Some background: MPI data communication ghost (halo) regions
4
Ghost (Halo) Regions
DO J = 1, N
  DO I = 1, M
    ! ( I,J ) depends on neighbors that may lie in another processor's ghost (halo) region
    DATA( I,J ) = A( I+2,J ) + A( I,J-1 )
  END DO
END DO
5
Horizontal Advection and Diffusion Data Requirements
[Figure: subdomain grid for processors 0-5, showing the ghost (halo) region around a subdomain and the exterior boundary of the full domain]
Stencil Exchange Data Communication Function
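The stencil exchange fills each subdomain's ghost (halo) region with the neighboring processors' edge cells before the advection/diffusion step. Below is a minimal sketch of such a ghost-row exchange, assuming a 1-D row-wise decomposition over MPI_COMM_WORLD and hypothetical names (a, ncols, nrows); the CMAQ stencil-exchange routines generalize this to the 2-D column X row decomposition.

  program halo_sketch
     use mpi
     implicit none
     integer, parameter :: ncols = 8, nrows = 4     ! local subdomain size (hypothetical)
     real    :: a( ncols, 0:nrows+1 )               ! one ghost row below and one above
     integer :: mype, nprocs, ierr, north, south
     integer :: stat( mpi_status_size )

     call mpi_init( ierr )
     call mpi_comm_rank( mpi_comm_world, mype, ierr )
     call mpi_comm_size( mpi_comm_world, nprocs, ierr )

     a = real( mype )                               ! stand-in for the science data

     ! neighbors in the row-wise decomposition; MPI_PROC_NULL at the exterior boundary
     south = mype - 1
     north = mype + 1
     if ( south .lt. 0 )      south = mpi_proc_null
     if ( north .ge. nprocs ) north = mpi_proc_null

     ! send the top interior row north, receive the south neighbor's row into the bottom ghost row
     call mpi_sendrecv( a( :, nrows ), ncols, mpi_real, north, 1,  &
                        a( :, 0 ),     ncols, mpi_real, south, 1,  &
                        mpi_comm_world, stat, ierr )
     ! send the bottom interior row south, receive the north neighbor's row into the top ghost row
     call mpi_sendrecv( a( :, 1 ),       ncols, mpi_real, south, 2,  &
                        a( :, nrows+1 ), ncols, mpi_real, north, 2,  &
                        mpi_comm_world, stat, ierr )

     call mpi_finalize( ierr )
  end program halo_sketch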
6
Architectural Changes for Parallel I/O
Tests confirmed that the latest releases (May, Sep 2003) are not very scalable
Background: Original Version, Latest Release, AQF Version
7
Parallel I/O
2002 Release: read, write and computation
2003 Release: read and computation; write and computation
AQF: computation only; write only (asynchronous); data transfer by message passing
8
Modifications to the I/O API for Parallel I/O
For 3D data, e.g.

time interpolation:
  INTERP3 ( FileName, VarName, ProgName,
            Date, Time, Ncols*Nrows*Nlays, Data_Buffer )

spatial subset:
  XTRACT3 ( FileName, VarName,
            StartLay, EndLay, StartRow, EndRow, StartCol, EndCol,
            Date, Time, Data_Buffer )

time interpolation of a spatial subset:
  INTERPX ( FileName, VarName, ProgName,
            StartCol, EndCol, StartRow, EndRow, StartLay, EndLay,
            Date, Time, Data_Buffer )

writing a data patch:
  WRPATCH ( FileID, VarID, TimeStamp, Record_No )
  Called from PWRITE3 (pario)
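As a usage illustration, a worker can read and time-interpolate just its own patch of a 3-D variable by passing its subdomain bounds to INTERPX. A minimal sketch, assuming the argument order shown above; the file name ('MET_CRO_3D'), variable name ('TA'), caller name, and the bounds are hypothetical placeholders that in CMAQ would come from the domain decomposition.

  subroutine read_patch( StartCol, EndCol, StartRow, EndRow,    &
                         StartLay, EndLay, Date, Time, Data_Buffer )
     implicit none
     integer, intent( in )  :: StartCol, EndCol, StartRow, EndRow
     integer, intent( in )  :: StartLay, EndLay, Date, Time
     real,    intent( out ) :: Data_Buffer( EndCol-StartCol+1,   &
                                            EndRow-StartRow+1,   &
                                            EndLay-StartLay+1 )
     logical, external      :: interpx   ! I/O API time-interpolating window read

     ! read only this processor's patch of the variable at the requested date/time
     if ( .not. interpx( 'MET_CRO_3D', 'TA', 'READ_PATCH',       &
                         StartCol, EndCol, StartRow, EndRow,     &
                         StartLay, EndLay, Date, Time,           &
                         Data_Buffer ) ) then
        write( *, * ) 'READ_PATCH: INTERPX failed'
        stop
     end if
  end subroutine read_patch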
9
Standard (May 2003 Release)

DRIVER
  read ICs into CGRID
  begin output timestep loop
    advstep (determine sync timestep)
    couple
    begin sync timestep loop
      SCIPROC
        X-Y-Z advect
        adjadv
        hdiff
        decouple
        vdiff  (DRYDEP)
        cloud  (WETDEP)
        gas chem  (aero, VIS)
        couple
    end sync timestep loop
    decouple
    write conc and avg conc  (CONC, ACONC)
  end output timestep loop
10
AQF CMAQ

DRIVER
  set WORKERs, WRITER
  if WORKER
    read ICs into CGRID
    begin output timestep loop
      advstep (determine sync timestep)
      couple
      begin sync timestep loop
        SCIPROC
          X-Y-Z advect
          hdiff
          decouple
          vdiff
          cloud
          gas chem  (aero)
          couple
      end sync timestep loop
      decouple
      MPI send conc, aconc, drydep, wetdep, (vis)
  if WRITER
    completion-wait for conc, write conc  (CONC)
    completion-wait for aconc, write aconc, etc.
  end output timestep loop
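A minimal MPI sketch of the worker/writer split above, assuming rank 0 is the writer and hypothetical names (ncells, conc): each worker posts a non-blocking send of its concentration patch and completion-waits before reusing the buffer, while the writer receives each patch and writes it. This illustrates the pattern only, not the AQF-CMAQ code itself.

  program worker_writer_sketch
     use mpi
     implicit none
     integer, parameter :: ncells = 1000           ! size of one worker's patch (hypothetical)
     real    :: conc( ncells )
     integer :: mype, nprocs, ierr, req, src
     integer :: stat( mpi_status_size )

     call mpi_init( ierr )
     call mpi_comm_rank( mpi_comm_world, mype, ierr )
     call mpi_comm_size( mpi_comm_world, nprocs, ierr )

     if ( mype .gt. 0 ) then                       ! WORKER: compute, then ship the patch
        conc = real( mype )                        ! stand-in for the science computation
        call mpi_isend( conc, ncells, mpi_real, 0, 100, mpi_comm_world, req, ierr )
        ! ... continue with the next sync timestep while the send proceeds ...
        call mpi_wait( req, stat, ierr )           ! completion-wait before reusing conc
     else                                          ! WRITER: collect each worker's patch and write
        do src = 1, nprocs - 1
           call mpi_recv( conc, ncells, mpi_real, src, 100, mpi_comm_world, stat, ierr )
           ! write the received patch to the CONC file (e.g. via WRPATCH / PWRITE3)
        end do
     end if

     call mpi_finalize( ierr )
  end program worker_writer_sketch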

11
Power3 Cluster (NESCC's LPAR)
[Figure: three nodes of 16 cpus (0-15) each, with shared memory per node, connected by a switch; about 2X slower than NCEP's]
The platform (cypress00, cypress01, cypress02) consists of 3 SP-Nighthawk nodes
All cpus share user applications with file servers, interactive use, etc.
12
Power4 p690 Servers (NCEP's LPAR)
[Figure: four 4-cpu LPAR nodes (cpus 0-3), each with its own memory, connected by a switch; 2 Colony SW connectors per node; 2X the performance of the cypress nodes]
Each platform (snow, frost) is composed of 22 p690 (Regatta) servers
Each server has 32 cpus, LPAR-ed into 8 nodes per server (4 cpus per node)
Some nodes are dedicated to file servers, interactive use, etc.
There are effectively 20 servers for general use (160 nodes, 640 cpus)
13
ice Beowulf Cluster, Pentium 3, 1.4 GHz
[Figure: six 2-cpu nodes, each with its own memory, connected by an internal network]
Isolated from outside network traffic
14
global MPICH Cluster, Pentium 4 XEON, 2.4 GHz
[Figure: six 2-cpu nodes (global, global1-global5), each with its own memory, connected by a network]
15
RESULTS
5 hour (12Z-17Z) and 24 hour (12Z-12Z) runs
20 Sept 2002 test data set used for developing the AQF-CMAQ
Input Met from ETA, processed thru PRDGEN and PREMAQ
166 columns X 142 rows X 22 layers at 12 km resolution
Domain seen on following slides
CB4 mechanism, no aerosols
Pleim's Yamartino advection for AQF-CMAQ; PPM advection for May 2003 Release
16
The Matrix

Worker processors and decompositions:
  4  =  2 X 2
  8  =  4 X 2
  16 =  4 X 4
  32 =  8 X 4
  64 =  8 X 8
  64 = 16 X 4

run  = comparison of run times
wall = comparison of relative wall times for the main science processes
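The aspect-ratio slides at the end show the subdomain sizes these decompositions produce on the 166 X 142 domain. Below is a minimal sketch of the underlying arithmetic, assuming a simple block split in which the first MOD( n, np ) processors get one extra cell (block_size is a hypothetical helper name; the actual CMAQ decomposition routine may differ in detail).

  ! cells owned by processor pe (0-based) when a dimension of size n is
  ! split over np processors; the first mod( n, np ) processors get one extra
  integer function block_size( n, np, pe )
     implicit none
     integer, intent( in ) :: n, np, pe
     block_size = n / np
     if ( pe .lt. mod( n, np ) ) block_size = block_size + 1
  end function block_size

For example, 166 columns over 8 processors give 21- and 20-column subdomains, and 142 rows over 4 give 36- and 35-row subdomains, matching the aspect-ratio figures.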
17
24-Hour AQF vs. 2003 Release
[Figure: side-by-side panels for the 2003 Release and AQF-CMAQ; data shown at peak hour]
18
24-Hour AQF vs. 2003 Release
Less than 0.2 ppb diff between AQF on cypress and snow
Almost 9 ppb max diff between Yamartino and PPM
19
AQF vs. 2003 Release
Absolute Run Times (sec), 24 Hours, 8 Worker Processors
20
AQF vs. 2003 Release on cypress, and AQF on snow
Absolute Run Times (sec), 24 Hours, 32 Worker Processors
21
AQF-CMAQ on Various Platforms
Relative Run Times (% of slowest), 5 Hours, 8 Worker Processors
22
AQF-CMAQ, cypress vs. snow
Relative Run Times (% of slowest), 5 Hours
23
AQF-CMAQ on Various Platforms
Relative Run Times (% of slowest) vs. Number of Worker Processors, 5 Hours
24
AQF-CMAQ on cypress
Relative Wall Times 5 Hours

25
AQF-CMAQ on snow
Relative Wall Times 5 Hours

26
AQF vs. 2003 Release on cypress
Relative Wall Times 24 hr, 8 Worker Processors
Legend: PPM (2003 Release), Yamo (AQF)
27
AQF vs. 2003 Release on cypress
Relative Wall Times 24 hr, 8 Worker Processors

Add snow for 8 and 32 processors
28
AQF Horizontal Advection
Relative Wall Times 24 hr, 8 Worker Processors

Legend: x-r-l = x-row-loop, y-c-l = y-column-loop; -hppm = kernel solver in each loop
29
AQF Horizontal Advection
Relative Wall Times 24 hr, 32 Worker Processors

30
AQF Release Horizontal Advection
Relative Wall Times 24 hr, 32 Worker Processors

Legend: r- = release
31
Future Work
Add aerosols back in
TKE vdiff
Improve horizontal advection/diffusion scalability
Some I/O improvements
Layer-variable horizontal advection time steps
32
Aspect Ratios
[Figure: subdomain column X row dimensions for the processor decompositions of the 166 X 142 domain, e.g. 83 X 71 (2 X 2), 42/41 X 71 (4 X 2), 42/41 X 36/35 (4 X 4), 21/20 X 36/35 (8 X 4)]
33
Aspect Ratios
[Figure: subdomain column X row dimensions for the 64-processor decompositions, e.g. 21/20 X 18/17 (8 X 8) and 11/10 X 36/35 (16 X 4)]