1
High Performance Computing Code for Pyroclastic Flow Simulations
  • Carlo Cavazzoni, Giovanni Erbacci, Silvia Baseggio (High Performance Computing Group, CINECA)
  • Tomaso Esposti Ongaro, Augusto Neri (INGV)

2
Outline
  • Pyroclastic Dispersion Analysis Code (pdac)
  • Fortran90 implementation
  • Parallel implementation
  • 3D simulations
  • Performance and optimizations
  • Conclusions

3
EXPLORIS
Explosive Eruption Risk and Decision Support for
EU Populations Threatened by Volcanoes
http://exploris.pi.ingv.it/
4
Pyroclastic Flows
Montserrat
Mount St. Helens
Vesuvius
5
Fluid-dynamic simulation of a multiphase, multicomponent compressible fluid.
Rectangular non-uniform grid, topography, gas and particle fields.
6
Time evolution
s(t+Δt) = F( s(t), s(t-Δt), ... )
The system status at time t+Δt is a function of the system at previous times t, t-Δt, etc.
Usually we use a 1st-order scheme, with dependence only on s(t).
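For example (a generic illustration, not the PDAC discretization), a first-order explicit update for a single quantity q reads q(t+Δt) = q(t) + Δt · R(q(t)), where R collects the fluxes and source terms evaluated at time t only.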
7
PDAC - Simplified Code Flow Chart
[Flow chart: start → read input → setup → time loop — compute physical quantities at time t, compute explicit fluxes, iterative solver, compute enthalpy, print output, t = t + Δt — repeated until t > tstop, then stop.]
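A minimal Fortran 90 skeleton of one plausible ordering of the boxes above. The stage routines are empty placeholders named after the flow-chart boxes, not the actual PDAC entry points, and the numerical values are arbitrary:

      PROGRAM pdac_flow_sketch
        IMPLICIT NONE
        REAL*8 :: t, dt, tstop
        CALL read_input()
        CALL setup()
        t = 0.D0 ;  dt = 1.D-2 ;  tstop = 1.D0   ! would really come from the input file
        DO
          CALL compute_physical_quantities()     ! physical quantities at time t
          CALL compute_explicit_fluxes()
          CALL iterative_solver()
          CALL compute_enthalpy()
          CALL print_output()
          t = t + dt                             ! t = t + Dt
          IF ( t >= tstop ) EXIT                 ! t > tstop ?  yes: stop, no: loop again
        END DO
      CONTAINS
        SUBROUTINE read_input()
        END SUBROUTINE read_input
        SUBROUTINE setup()
        END SUBROUTINE setup
        SUBROUTINE compute_physical_quantities()
        END SUBROUTINE compute_physical_quantities
        SUBROUTINE compute_explicit_fluxes()
        END SUBROUTINE compute_explicit_fluxes
        SUBROUTINE iterative_solver()
        END SUBROUTINE iterative_solver
        SUBROUTINE compute_enthalpy()
        END SUBROUTINE compute_enthalpy
        SUBROUTINE print_output()
        END SUBROUTINE print_output
      END PROGRAM pdac_flow_sketch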
8
Code Restructuring
  • code rewritten using F90 with a modular approach
  • drawback: lower performance
  • get rid of the complexity: more general and readable code
  • code parallelization: use many CPUs
  • code ported to many architectures
  • optimizations: recover F77 efficiency
  • from 2D to 3D
9
Use of F90 / Modular Code
  • Better control over data structures (see the module sketch below)
  • Easier to add new features / extensions
  • More readable
  • Easier to maintain
  • Lower performance with respect to F77
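As a minimal illustration of this approach (a simplified sketch, not the actual PDAC modules; the names grid_demo and use_grid_demo are made up), module data replaces the INCLUDE'd COMMON blocks and every routine declares exactly what it uses:

      ! F77 style: shared storage through an INCLUDE'd COMMON block, e.g.
      !     COMMON /GRID/ DR(NXMAX), DZ(NZMAX), IB2
      !
      ! F90 style: a module owns the data; dependencies are explicit and checked
      MODULE grid_demo
        IMPLICIT NONE
        REAL*8, ALLOCATABLE :: dr(:), dz(:)   ! cell sizes
        INTEGER :: ib2                        ! first grid dimension (row stride)
      END MODULE grid_demo

      SUBROUTINE use_grid_demo
        USE grid_demo, ONLY: dr, dz, ib2      ! only what this routine actually needs
        IMPLICIT NONE
        ! ... work on dr, dz, ib2 ...
      END SUBROUTINE use_grid_demo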

10
Typical F77 Subroutine

      SUBROUTINE BETAS
C
      INCLUDE 'pdac2d.com'
      REAL*8 LEFT
C
      PARAMETER (DELG=1.D-8)
      DO 10 J=2,JB1
      DO 10 I=2,IB1
        IJ=I+(J-1)*IB2
        IF(FL(IJ).NE.1) GOTO 10
        CALL SUBSC
        DRP=DR(I)+DR(I+1)
        DRM=DR(I)+DR(I-1)
        DZP=DZ(J)+DZ(J+1)
        DZM=DZ(J)+DZ(J-1)
        RAGS=ROG(IJ)/P(IJ)
        RIGHT=1.D0
        LEFT=1.D0
        TOP=1.D0
        BOT=1.D0
        NFLR=FL(IPJ)
        NFLL=FL(IMJ)
        NFLT=FL(IJP)
        NFLB=FL(IJM)
        GOTO (2,1,1,2,1),NFLR
    1   RIGHT=0.D0
    2   GOTO (4,3,3,4,3),NFLL
    3   LEFT=0.D0
    4   GOTO (6,5,5,6,5),NFLT
    5   TOP=0.D0
    6   GOTO (8,7,7,8,7),NFLB
    7   BOT=0.D0
    8   CONTINUE
        CONV(IJ)=DELG*RGP(IJ)
        RBETA=EP(IJ)*RAGS+DT*DT/DZ(J)*
     &   ((DZ(J+1)*EP(IJ)+DZ(J)*EP(IJT))/DZP/DZP*2.D0*TOP+
     &    (DZ(J-1)*EP(IJ)+DZ(J)*EP(IJB))/DZM/DZM*2.D0*BOT)+
     &   DT*DT/R(I)/DR(I)*
     &   (RB(I)*(DR(I+1)*EP(IJ)+DR(I)*EP(IJR))/DRP/DRP*2.D0*RIGHT+
     &    RB(I-1)*(DR(I-1)*EP(IJ)+DR(I)*EP(IJL))/DRM/DRM*2.D0*LEFT)
        ABETA(IJ)=1.D0/RBETA
   10 CONTINUE
C
      RETURN
      END
C

Note the unreadable computed GOTOs! It's hard to understand what the code does.
11
Typical F90 Subroutine
INCLUDE'd commons replaced by USE statements

      SUBROUTINE betas
!
      USE grid, ONLY: fl
      USE dimensions
      USE eosg_module, ONLY: rags
      USE gas_solid_density, ONLY: rog, rgp
      USE grid, ONLY: r, rb, dr, dz, ib2, ib1, jb1
      USE pressure_epsilon, ONLY: p, ep
      USE subsc_module, ONLY: subsc, ipj, imj, ijp, ijm, ijr, ijl, ijt, ijb
      USE time_parameters, ONLY: dt, time
      IMPLICIT NONE
!
      REAL*8 rbtop, rbleft, rbright, rbbot, rbeta
      REAL*8 dt2r, dt2z, dzp, drm, drp, dzm
      REAL*8 indrp, indrm, indzp, indzm
      INTEGER nflb, nflt, nfll, nflr
      INTEGER i, j, ij
!
      REAL*8, PARAMETER :: delg = 1.D-8

      DO j = 2, jb1
        DO i = 2, ib1
          ij = i + (j-1)*ib2
          IF ( fl(ij) .EQ. 1 ) THEN
            CALL subsc(ij)
            drp = dr(i) + dr(i+1)
            drm = dr(i) + dr(i-1)
            dzp = dz(j) + dz(j+1)
            dzm = dz(j) + dz(j-1)
!
            indrp = 1.D0/drp
            indrm = 1.D0/drm
            indzp = 1.D0/dzp
            indzm = 1.D0/dzm
            nflr = fl(ipj)
            nfll = fl(imj)
            nflt = fl(ijp)
            nflb = fl(ijm)
            IF ( (nflr .NE. 1) .AND. (nflr .NE. 4) ) THEN
              rbright = 0.D0
            ELSE
              rbright = ( dr(i+1)*ep(ij) + dr(i)*ep(ijr) ) * indrp*indrp*2.D0
            END IF
            IF ( (nfll .NE. 1) .AND. (nfll .NE. 4) ) THEN
              rbleft = 0.D0
            ELSE
              rbleft = ( dr(i-1)*ep(ij) + dr(i)*ep(ijl) ) * indrm*indrm*2.D0
            END IF
            IF ( (nflt .NE. 1) .AND. (nflt .NE. 4) ) THEN
              rbtop = 0.0D0
            ELSE
              rbtop = ( dz(j+1)*ep(ij) + dz(j)*ep(ijt) ) * indzp*indzp*2.D0
            END IF
            IF ( (nflb .NE. 1) .AND. (nflb .NE. 4) ) THEN
              rbbot = 0.D0
            ELSE
              rbbot = ( dz(j-1)*ep(ij) + dz(j)*ep(ijb) ) * indzm*indzm*2.D0
            END IF
            conv(ij) = delg * rgp(ij)
            dt2z = dt * dt * indz(j)
            dt2r = dt * dt * inr(i) * indr(i)
            rags = rog(ij) / p(ij)
            rbeta = ep(ij) * rags +                           &
                    dt2z * ( rbtop + rbbot ) +                &
                    dt2r * ( rb(i)*rbright + rb(i-1)*rbleft )
            abeta(ij) = 1.D0 / rbeta
          END IF
        END DO
      END DO

      RETURN
      END SUBROUTINE

GOTOs replaced by structured IF statements.
The subroutine is longer, but by far more readable.
12
Parallel Code
  • Domain decomposition / MPI (see the sketch below)
  • Control / mapping data structures
  • Hiding of communication controls
  • Code I/O redesigned
  • Communication overhead and some performance loss
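A minimal sketch of how the domain decomposition could be set up with MPI (illustrative only, not the PDAC implementation): a Cartesian process grid is created, and the dims passed to MPI_DIMS_CREATE select a block-like or a layer-like layout:

      PROGRAM decomp_sketch
        USE mpi
        IMPLICIT NONE
        INTEGER :: ierr, nproc, myrank, comm_cart
        INTEGER :: dims(3), coords(3)
        LOGICAL :: periods(3)

        CALL MPI_INIT(ierr)
        CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)

        dims    = (/ 0, 0, 0 /)        ! let MPI factor nproc -> block-like layout
        ! dims  = (/ 1, 1, 0 /)        ! presetting two entries to 1 forces a layer-like (1D) layout
        periods = .FALSE.              ! non-periodic domain
        CALL MPI_DIMS_CREATE(nproc, 3, dims, ierr)
        CALL MPI_CART_CREATE(MPI_COMM_WORLD, 3, dims, periods, .TRUE., comm_cart, ierr)
        CALL MPI_COMM_RANK(comm_cart, myrank, ierr)
        CALL MPI_CART_COORDS(comm_cart, myrank, 3, coords, ierr)

        ! each process now owns the sub-grid identified by coords(1:3)
        IF (myrank == 0) PRINT *, 'process grid:', dims

        CALL MPI_FINALIZE(ierr)
      END PROGRAM decomp_sketch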

13
Parallelization Strategies
The main strategy is to define sub-grids and distribute them to the available processors; intermediate results on local data need to be exchanged across domain boundaries to solve the equations defined on the global domain.
Block-like distribution: less overall communication.
Layer-like distribution: communication only with nearest neighbours.
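The boundary exchange for a layer-like distribution can then be done with nearest-neighbour messages. A minimal sketch, assuming a hypothetical field array with nz_loc owned x-y layers plus one ghost layer on each side (not the PDAC data structures):

      SUBROUTINE exchange_layers(field, nx, ny, nz_loc, comm_cart)
        USE mpi
        IMPLICIT NONE
        INTEGER, INTENT(IN)    :: nx, ny, nz_loc, comm_cart
        REAL*8,  INTENT(INOUT) :: field(nx, ny, 0:nz_loc+1)   ! layers 0 and nz_loc+1 are ghosts
        INTEGER :: ierr, below, above, status(MPI_STATUS_SIZE)

        ! neighbours along the distributed (third) direction of the Cartesian grid;
        ! at the domain edges MPI returns MPI_PROC_NULL and the exchange becomes a no-op
        CALL MPI_CART_SHIFT(comm_cart, 2, 1, below, above, ierr)

        ! send my top owned layer up, receive my bottom ghost layer from below
        CALL MPI_SENDRECV(field(:,:,nz_loc), nx*ny, MPI_DOUBLE_PRECISION, above, 1, &
                          field(:,:,0),      nx*ny, MPI_DOUBLE_PRECISION, below, 1, &
                          comm_cart, status, ierr)
        ! send my bottom owned layer down, receive my top ghost layer from above
        CALL MPI_SENDRECV(field(:,:,1),        nx*ny, MPI_DOUBLE_PRECISION, below, 2, &
                          field(:,:,nz_loc+1), nx*ny, MPI_DOUBLE_PRECISION, above, 2, &
                          comm_cart, status, ierr)
      END SUBROUTINE exchange_layers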
14
Efficiency of the 2D Parallel Code
coarse grid: 45 x 35
fine grid: 225 x 150
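Parallel efficiency here is presumably the usual speed-up divided by the number of processors, E(N) = T(1) / (N · T(N)); on the finer 225 x 150 grid each processor has more interior cells per boundary cell, so the communication overhead weighs less and the efficiency is typically higher than on the coarse 45 x 35 grid.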
15
From 2D to 3D
  • Extension (2D -> 3D) of data structures and indexes
  • Subroutines and modules generalized
  • 2D simulations maintained in the 3D code

16
Complexity increases (i.e. cell neighbour indexes) moving from the 2D serial to the 2D parallel to the 3D parallel code.

2D code (ij indexing: direct arithmetic plus the inds map):

      ipj  = ij + 1
      ijp  = ij + ib2
      imj  = ij - 1
      ijm  = ij - ib2
      imjm = ijm - 1
      ipjm = ipj - ib2
      ipjp = ijp + 1
      imjp = imj + ib2
!
      ijr  = inds(ij,1)
      ijl  = inds(ij,2)
      ijt  = inds(ij,3)
      ijb  = inds(ij,4)
      ijtr = inds(ij,5)
      ijtl = inds(ij,6)
      ijbr = inds(ij,7)
      ijrr = inds(ij,8)
      ijtt = inds(ij,9)

3D parallel code, i-k (2D) subset (indirect myijk / myinds maps):

      ijkm  = myijk( ip0_jp0_km1_, ijk )
      imjk  = myijk( im1_jp0_kp0_, ijk )
      ipjk  = myijk( ip1_jp0_kp0_, ijk )
      ijkp  = myijk( ip0_jp0_kp1_, ijk )
      ipjkm = myijk( ip1_jp0_km1_, ijk )
      ipjkp = myijk( ip1_jp0_kp1_, ijk )
      imjkm = myijk( im1_jp0_km1_, ijk )
      imjkp = myijk( im1_jp0_kp1_, ijk )
      ijkpp = myijk( ip0_jp0_kp2_, ijk )
      ippjk = myijk( ip2_jp0_kp0_, ijk )
      immjk = myijk( im2_jp0_kp0_, ijk )
      ijkmm = myijk( ip0_jp0_km2_, ijk )
      ijke  = myinds( ip1_jp0_kp0_, ijk )
      ijkt  = myinds( ip0_jp0_kp1_, ijk )
      ijkw  = myinds( im1_jp0_kp0_, ijk )
      ijkb  = myinds( ip0_jp0_km1_, ijk )
      ijket = myinds( ip1_jp0_kp1_, ijk )
      ijkwt = myinds( im1_jp0_kp1_, ijk )
      ijkeb = myinds( ip1_jp0_km1_, ijk )
      ijkwb = myinds( im1_jp0_km1_, ijk )
      ijkee = myinds( ip2_jp0_kp0_, ijk )
      ijktt = myinds( ip0_jp0_kp2_, ijk )
      ijkww = myinds( im2_jp0_kp0_, ijk )
      ijkbb = myinds( ip0_jp0_km2_, ijk )

3D parallel code, full neighbour set:

      ipjk  = myijk( ip1_jp0_kp0_, ijk )
      imjk  = myijk( im1_jp0_kp0_, ijk )
      ippjk = myijk( ip2_jp0_kp0_, ijk )
      immjk = myijk( im2_jp0_kp0_, ijk )
      ijpk  = myijk( ip0_jp1_kp0_, ijk )
      ipjpk = myijk( ip1_jp1_kp0_, ijk )
      imjpk = myijk( im1_jp1_kp0_, ijk )
      ijmk  = myijk( ip0_jm1_kp0_, ijk )
      ipjmk = myijk( ip1_jm1_kp0_, ijk )
      imjmk = myijk( im1_jm1_kp0_, ijk )
      ijppk = myijk( ip0_jp2_kp0_, ijk )
      ijmmk = myijk( ip0_jm2_kp0_, ijk )
      ijkp  = myijk( ip0_jp0_kp1_, ijk )
      ipjkp = myijk( ip1_jp0_kp1_, ijk )
      imjkp = myijk( im1_jp0_kp1_, ijk )
      ijpkp = myijk( ip0_jp1_kp1_, ijk )
      ijmkp = myijk( ip0_jm1_kp1_, ijk )
      ijkm  = myijk( ip0_jp0_km1_, ijk )
      ipjkm = myijk( ip1_jp0_km1_, ijk )
      imjkm = myijk( im1_jp0_km1_, ijk )
      ijpkm = myijk( ip0_jp1_km1_, ijk )
      ijmkm = myijk( ip0_jm1_km1_, ijk )
      ijkpp = myijk( ip0_jp0_kp2_, ijk )
      ijkmm = myijk( ip0_jp0_km2_, ijk )
      ijke  = myinds( ip1_jp0_kp0_, ijk )
      ijkw  = myinds( im1_jp0_kp0_, ijk )
      ijkee = myinds( ip2_jp0_kp0_, ijk )
      ijkww = myinds( im2_jp0_kp0_, ijk )
      ijkn  = myinds( ip0_jp1_kp0_, ijk )
      ijken = myinds( ip1_jp1_kp0_, ijk )
      ijkwn = myinds( im1_jp1_kp0_, ijk )
      ijks  = myinds( ip0_jm1_kp0_, ijk )
      ijkes = myinds( ip1_jm1_kp0_, ijk )
      ijkws = myinds( im1_jm1_kp0_, ijk )
      ijknn = myinds( ip0_jp2_kp0_, ijk )
      ijkss = myinds( ip0_jm2_kp0_, ijk )
      ijkt  = myinds( ip0_jp0_kp1_, ijk )
      ijket = myinds( ip1_jp0_kp1_, ijk )
      ijkwt = myinds( im1_jp0_kp1_, ijk )
      ijknt = myinds( ip0_jp1_kp1_, ijk )
      ijkst = myinds( ip0_jm1_kp1_, ijk )
      ijkb  = myinds( ip0_jp0_km1_, ijk )
      ijkeb = myinds( ip1_jp0_km1_, ijk )
      ijkwb = myinds( im1_jp0_km1_, ijk )
      ijknb = myinds( ip0_jp1_km1_, ijk )
      ijksb = myinds( ip0_jm1_km1_, ijk )
      ijktt = myinds( ip0_jp0_kp2_, ijk )
      ijkbb = myinds( ip0_jp0_km2_, ijk )

[Figure: computational grid stencil showing cell ij and its neighbours ipj, imj, ijp, ijm]
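A minimal sketch of how such an indirect neighbour map could be pre-computed (serial numbering only, no ghost cells or boundary flags; the _demo names are made up and this is not the actual PDAC myijk/myinds implementation):

      MODULE neighbour_map_demo
        IMPLICIT NONE
        ! stencil slot names (same spirit as ip1_jp0_kp0_ etc. above)
        INTEGER, PARAMETER :: ip1_ = 1, im1_ = 2, jp1_ = 3, jm1_ = 4, kp1_ = 5, km1_ = 6
        INTEGER, ALLOCATABLE :: myijk_demo(:,:)        ! (stencil slot, linear cell index)
      CONTAINS
        SUBROUTINE build_map(nx, ny, nz)
          INTEGER, INTENT(IN) :: nx, ny, nz
          INTEGER :: i, j, k, ijk
          ALLOCATE( myijk_demo(6, nx*ny*nz) )
          myijk_demo = 0                               ! boundary cells stay 0 in this sketch
          DO k = 2, nz-1
            DO j = 2, ny-1
              DO i = 2, nx-1
                ijk = i + (j-1)*nx + (k-1)*nx*ny       ! serial linearization
                myijk_demo(ip1_, ijk) = ijk + 1
                myijk_demo(im1_, ijk) = ijk - 1
                myijk_demo(jp1_, ijk) = ijk + nx
                myijk_demo(jm1_, ijk) = ijk - nx
                myijk_demo(kp1_, ijk) = ijk + nx*ny
                myijk_demo(km1_, ijk) = ijk - nx*ny
              END DO
            END DO
          END DO
        END SUBROUTINE build_map
      END MODULE neighbour_map_demo

Once the map is filled, the physics routines read neighbours through it instead of doing index arithmetic, so the serial, 2D-parallel and 3D-parallel cases all look the same to them.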
17
Code optimizations
Goal: recover the performance of the original F77 code.

[Chart: execution time relative to the F77 code (scale 0-2, lower is faster) along the code evolution: F77 serial, F90 serial, 2D parallel, 3D parallel, unrolling, memory access, parameter-dependent subroutines.]
18
Example of optimization

Generic, parameter-dependent loop (any number of phases):

      DO l = 1, nphase
        IF ( au1(l,l) /= 0.D0 ) THEN
          lp1 = l + 1
          div = 1.D0 / au1(l,l)
          DO lj = lp1, nphase
            au1(l,lj) = au1(l,lj) * div
          END DO
          bu1(l) = bu1(l) * div
          au1(l,l) = 0.D0
          DO li = 1, nphase
            amul = au1(li,l)
            DO lj = lp1, nphase
              au1(li,lj) = au1(li,lj) - amul * au1(l,lj)
            END DO
            bu1(li) = bu1(li) - amul * bu1(l)
          END DO
        END IF
      END DO

Unrolled version (here for 3 phases):

      IF ( a(1,1) /= 0.D0 ) THEN
        div    = 1.D0 / a(1,1)
        a(1,2) = a(1,2) * div
        a(1,3) = a(1,3) * div
        b(1)   = b(1) * div
        a(1,1) = 0.D0
        ! li = 2
        amul   = a(2,1)
        a(2,2) = a(2,2) - amul * a(1,2)
        a(2,3) = a(2,3) - amul * a(1,3)
        b(2)   = b(2) - amul * b(1)
        ! li = 3
        amul   = a(3,1)
        a(3,2) = a(3,2) - amul * a(1,2)
        a(3,3) = a(3,3) - amul * a(1,3)
        b(3)   = b(3) - amul * b(1)
      END IF
      IF ( a(2,2) /= 0.D0 ) THEN
        div    = 1.D0 / a(2,2)
        a(2,3) = a(2,3) * div
        b(2)   = b(2) * div
        a(2,2) = 0.D0
        ! li = 1
        amul   = a(1,2)
        a(1,3) = a(1,3) - amul * a(2,3)
        b(1)   = b(1) - amul * b(2)
        ! li = 3
        amul   = a(3,2)
        a(3,3) = a(3,3) - amul * a(2,3)
        b(3)   = b(3) - amul * b(2)
      END IF
      IF ( a(3,3) /= 0.D0 ) THEN
        div    = 1.D0 / a(3,3)
        b(3)   = b(3) * div
        a(3,3) = 0.D0
        ! li = 1
        amul = a(1,3)
        b(1) = b(1) - amul * b(3)
        ! li = 2
        amul = a(2,3)
        b(2) = b(2) - amul * b(3)
      END IF

Loop unrolling, parameter-dependent code.
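Because the number of phases is fixed and small at run time, the elimination loops can be unrolled for that specific value: the loop and branch overhead disappears and the compiler can schedule the independent updates, which contributes to the overall factor-of-2 gain reported on the next slide.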
19
we gain a factor of 2!
20
Parallel Performance
Computational grid: 90 x 90 x 70

[Charts: speed-up vs. number of processors (1-16) for PDAC before and after the optimizations, compared with the ideal speed-up.]

No parallel performance loss due to the optimizations.
21
3D Simulation Run
Fine grid: 256³ cells.
1 simulation step ≈ 30 sec on 64 CPUs; 30 minutes of simulated real time ≈ 180,000 steps.
Full 3D simulation of one eruption scenario: 60 days on 64 CPUs.
After code optimization and tuning: 30 days on 64 CPUs.
On the next computer: 7 days on 64 CPUs (computer power doubles every 18 months, Moore's law).
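As a consistency check: 180,000 steps × 30 s/step ≈ 5.4 × 10^6 s ≈ 62 days on 64 CPUs, matching the quoted 60 days before optimization.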
22
Conclusion
I'm really excited about this project: before the end of this year we will run the first 3D simulation of an eruption scenario.
Real and synthetic Vesuvius