Title: EFFECTIVE PARALLELIZATION OF A TURBULENT FLOW SIMULATION
1. EFFECTIVE PARALLELIZATION OF A TURBULENT FLOW SIMULATION
- Masami Takata, Yoshinobu Yamamoto, Hayaru Shouno, Tomoaki Kunugi, Kazuki Joe
- Graduate School of Human Culture, Nara Women's Univ.
- Department of Nuclear Engineering, Kyoto Univ.
2. Contents
- Background
- Direct Numerical Simulation (DNS) of free-surface turbulent flow
- Parallelization methods
- Evaluation of the parallelization methods
- Conclusion
3. Background
- Free-surface turbulent flow
  - Occurs in industrial devices (e.g., a nuclear fusion reactor or a chemical plant)
- Analysis method for turbulent flow
  - Direct Numerical Simulations (DNSs)
  - High grid density leads to huge calculation time
- A transformation into parallelized form
  - For distributed memory parallel computers: MPI (Message Passing Interface)
4. DNS of the turbulent flow (1)
- Calculation conditions
  - Reynolds number: 2270
  - Prandtl number: 1
  - Grid (x, y, z): (64, 82, 64)
5. DNS of the turbulent flow (2)
- The incompressible Navier-Stokes equations
- Integration of the governing equations
  - A fractional step method
- Time integration
  - A second-order Adams-Bashforth scheme
  - A Crank-Nicolson scheme
- Spatial discretization
  - A second-order central differencing scheme (one time step is sketched below)
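As a reading aid, here is a sketch of one time step, assuming the common pairing in fractional step DNS codes (an assumption, not stated on the slide): the second-order Adams-Bashforth scheme for the explicit convective term N(u) and the Crank-Nicolson scheme for the viscous term L(u):

    \frac{u^{*} - u^{n}}{\Delta t} = \frac{3}{2} N(u^{n}) - \frac{1}{2} N(u^{n-1}) + \frac{1}{2}\left[ L(u^{*}) + L(u^{n}) \right]

    \nabla^{2} p^{n+1} = \frac{1}{\Delta t}\,\nabla\cdot u^{*}, \qquad u^{n+1} = u^{*} - \Delta t\,\nabla p^{n+1}

The Poisson equation for the pressure in the second line is what subroutine press (slides 20-23) solves with the conjugate residual method.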
6. DNS of the turbulent flow (3): The arrays for the DNS
- x, y, z: grid intervals and coordinates in the three dimensions
- dist x, dist y, dist z: the temperatures at the water surface or wall
- u, v, w: flow velocities in the x, y, and z directions
- fu, fuo, fv, fvo, fw, fwo: (convective term) + (viscous term) for u, v, and w
- t: the temperature
- ft, fto: (convective term) + (viscous term) for t
- p: (pressure)/(density)
7. DNS of the turbulent flow (4): The program flow of the DNS
[Flow diagram: initialize dist x, dist y, dist z, x, y, z, u, v, w, p, t and fu, fuo, fv, fvo, fw, fwo, ft, fto; file input of (u, v, w, p); then an iteration loop that computes fu, fv, fw, t, updates u, v, w from fuo, fvo, fwo, outputs u, v, w, solves p, outputs u, v, w, p, and updates ft, fto, t; file output at the end.]
8. Parallelization method
- Parallelized program
  - Data distribution
  - Minimum data communication
- MPI synchronous protocol
- MPI asynchronous protocol
9. Parallelization method 1: Data communication protocols (1)
- Synchronous protocol
  - Processors performing receive operations are suspended until the communications complete.
  - A processor can use only one communication protocol exclusively.
[Figure: the receive operation blocks until the completion of the communication; a sketch follows.]
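A minimal sketch of a synchronous (blocking) boundary exchange in MPI Fortran; the routine name, the array layout, and the neighbor ranks left and right are illustrative assumptions, not the authors' code:

c     Hypothetical sketch: blocking exchange of one z-boundary plane
c     of u between neighboring processors.
      subroutine sync_exchange(u, nx, ny, nz, left, right)
      include 'mpif.h'
      integer nx, ny, nz, left, right, ierr
      integer status(MPI_STATUS_SIZE)
      double precision u(nx, ny, 0:nz+1)
c     send the last interior plane to the right neighbor
c     (in a real code the send/receive order must be arranged,
c     e.g. by even/odd rank, to avoid deadlock)
      call MPI_SEND(u(1,1,nz), nx*ny, MPI_DOUBLE_PRECISION,
     &              right, 0, MPI_COMM_WORLD, ierr)
c     the processor is suspended here until the left neighbor's
c     plane has arrived: this is the synchronous protocol
      call MPI_RECV(u(1,1,0), nx*ny, MPI_DOUBLE_PRECISION,
     &              left, 0, MPI_COMM_WORLD, status, ierr)
      return
      end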
10. Parallelization method 1: Data communication protocols (2)
- Asynchronous protocol
  - Send and receive operations can be executed independently.
  - A processor can use several communication protocols simultaneously.
[Figure: the receive operation returns immediately; completion of the communication is checked later. A sketch follows.]
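The same exchange with the asynchronous (nonblocking) protocol, again an illustrative sketch rather than the authors' code: MPI_ISEND and MPI_IRECV return immediately, several transfers can be in flight at once, and MPI_WAIT is called only where the data is actually needed:

c     Hypothetical sketch: nonblocking version of the same exchange.
      subroutine async_exchange(u, nx, ny, nz, left, right)
      include 'mpif.h'
      integer nx, ny, nz, left, right, ierr
      integer req(2), status(MPI_STATUS_SIZE)
      double precision u(nx, ny, 0:nz+1)
c     post both operations; neither suspends the processor
      call MPI_IRECV(u(1,1,0), nx*ny, MPI_DOUBLE_PRECISION,
     &               left, 0, MPI_COMM_WORLD, req(1), ierr)
      call MPI_ISEND(u(1,1,nz), nx*ny, MPI_DOUBLE_PRECISION,
     &               right, 0, MPI_COMM_WORLD, req(2), ierr)
c     ... independent computation can overlap the transfers here ...
c     block only at the point where the boundary plane is required
      call MPI_WAIT(req(1), status, ierr)
      call MPI_WAIT(req(2), status, ierr)
      return
      end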
11. Parallelization method 1: Data communication protocols (3)
- Collective functions, available only with the synchronous protocol (see slide 14)
  - MPI_ALLREDUCE: a function that returns the results of reduce operations to all processors in a communication group
  - MPI_BCAST: a broadcast function
- Example calls are sketched below.
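A small self-contained example of the two collective calls (the variable names are illustrative): every rank contributes local_max and all ranks receive the global maximum; rank 0 then broadcasts the array p to the whole group:

      program collectives
      include 'mpif.h'
      integer ierr, rank
      double precision local_max, global_max, p(100)
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      local_max = dble(rank)
c     reduce operation (here: maximum) whose result is returned
c     to all processors in the communication group
      call MPI_ALLREDUCE(local_max, global_max, 1,
     &     MPI_DOUBLE_PRECISION, MPI_MAX, MPI_COMM_WORLD, ierr)
      if (rank .eq. 0) p(1) = global_max
c     broadcast the array p from rank 0 to all other ranks
      call MPI_BCAST(p, 100, MPI_DOUBLE_PRECISION, 0,
     &     MPI_COMM_WORLD, ierr)
      call MPI_FINALIZE(ierr)
      end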
12. Parallelization method 2 (1)
The example loop nest used to explain the method:

      do 10 i = 1, 100000
 10      x(i) = i
      min = y(1)
      do 20 j = 2, 100000
         if (min .gt. y(j)) then
            min = y(j)
         endif
 20   continue
      do 30 k = 1, 100000
 30      z(k) = z(k) * k
      do 40 i = 1, 100000
 40      w(i) = x(I(i)) * min
13. Parallelization method 2 (2)
- The number of synchronous communications is two for each processor.
14. Parallelization method 2 (3)
- The number of asynchronous communications is eight for each processor.
- There is no asynchronous communication for global reduce operations.
15. Parallelization method 2 (4)
- With partial strip mining, the number of asynchronous communications is at most three (a strip-mining sketch follows).
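What strip mining looks like on the first loop of slide 12, as a hypothetical sketch (the block decomposition and variable names are assumptions): each processor iterates only over its own contiguous strip of the index space, so only reductions and boundary values have to be communicated:

      program strip
      include 'mpif.h'
      integer n
      parameter (n = 100000)
      integer ierr, rank, nprocs, i, lo, hi
      double precision x(n)
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
c     bounds of this processor's contiguous strip
      lo = rank * (n / nprocs) + 1
      hi = (rank + 1) * (n / nprocs)
      if (rank .eq. nprocs - 1) hi = n
c     strip-mined form of "do 10 i = 1, 100000"
      do 10 i = lo, hi
 10      x(i) = i
      call MPI_FINALIZE(ierr)
      end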
16. Parallelization method 2 (5)
17. Parallelization method 2: The arrays for the DNS
- (The same array definitions as on slide 6.)
18. Parallelization method 2: The initialization part of the DNS
[Flow diagram: initialize fu, fuo, fv, fvo, fw, fwo, t, ft, fto and dist x, dist y, dist z, x, y, z, u, v, w, p, t; file input of (u, v, w, p); file output.]
19. Parallelization method 2: The calculation part of the DNS
[Flow diagram of the iteration loop: compute fu, fv, fw, t; update u, v, w; save fuo, fvo, fwo; output u, v, w; solve p; output u, v, w, p; update ft, fto, t.]
20. Parallelization method 3: The conjugate residual method for the Poisson equation (1) (subroutine press)
- Usage: the array p and some local arrays
- Dependencies exist in the decomposed arrays at the boundaries.
- If parallelization method (2) is adopted, MPI_WAIT (which suspends a processor until the completion of the asynchronous communication) causes a large waiting time.
- Two partitioning methods for subroutine press are considered.
21. Parallelization method 3: subroutine press (2)
- In method A (target: eight processors)
  - One processor runs subroutine norm (which returns the maximum element of a given array).
  - The remaining seven processors perform the calculation for the update.
- The disadvantages:
  - Each array must be assigned to seven processors, although the number of array elements is defined as a multiple of two.
  - The programmer's development effort increases (the number of communications increases).
22. Parallelization method 3: subroutine press (3)
- In method B
  - Strip mining over all the processors
  - Required synchronous communications:
    - MPI_ALLREDUCE (with the synchronous protocol): the function that returns the results of reduce operations to all processors in a communication group; used by
      - subroutine norm (which returns the maximum element of a given array)
      - subroutine inprod (which returns the sum of the elements of a given array)
    - MPI_BCAST (with the synchronous protocol): the array p calculated by each processor must be broadcast after the execution of subroutine press.
23. Parallelization method 3: subroutine press (4)
- The characteristics:
  - It mainly consists of a series of doall statements.
  - Strip mining uses all available processors.
  - The synchronous communications do not cause much overhead (each doall statement requires about the same calculation time).
- Method B is therefore adopted for subroutine press (a sketch of its reductions follows).
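A hypothetical sketch of the two reductions in method B (the routine names norm and inprod come from the slides, but the bodies and the strip bounds lo, hi are assumptions): each rank reduces over its own strip, then MPI_ALLREDUCE combines the partial results on all ranks:

c     Sketch: each rank passes only its local strip p(lo:hi).
      double precision function norm(p, lo, hi)
      include 'mpif.h'
      integer lo, hi, i, ierr
      double precision p(lo:hi), local, global
      local = p(lo)
      do 10 i = lo + 1, hi
 10      if (p(i) .gt. local) local = p(i)
c     synchronous collective: every rank receives the global maximum
      call MPI_ALLREDUCE(local, global, 1, MPI_DOUBLE_PRECISION,
     &     MPI_MAX, MPI_COMM_WORLD, ierr)
      norm = global
      return
      end

      double precision function inprod(p, lo, hi)
      include 'mpif.h'
      integer lo, hi, i, ierr
      double precision p(lo:hi), local, global
      local = 0.0d0
      do 20 i = lo, hi
 20      local = local + p(i)
c     synchronous collective: every rank receives the global sum
      call MPI_ALLREDUCE(local, global, 1, MPI_DOUBLE_PRECISION,
     &     MPI_SUM, MPI_COMM_WORLD, ierr)
      inprod = global
      return
      end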
24. Evaluation of the parallelization methods (1)
- Using MPI
- Parallelized programs for four and eight processors
- Experimental environment:
  - Sun SPARC Ultra-2 workstations (SunOS 5.6)
  - Memory capacity of 512 MB
  - LAN using 100BASE-TX as the communication medium
25. Evaluation of the parallelization methods (2)
26. The effectiveness
- Little idle (waiting) time
  - It corresponds mainly to the file I/O time for the calculation results.
- Reduction of the execution time
  - Theoretically: (the execution time of the sequential program) / (number of processors)
  - In practice, waiting and system time increase because of the communication overhead.
  - 4 processors: 28.7%
  - 8 processors: 15.79%
- The parallelization is extremely effective.
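Reading the two figures as percentages of the sequential execution time (an assumption; the slide gives only the bare numbers), the implied speedups work out to

    S_4 = \frac{100}{28.7} \approx 3.48 \quad (\text{ideal: } 4), \qquad S_8 = \frac{100}{15.79} \approx 6.33 \quad (\text{ideal: } 8),

which is consistent with the claim that communication overhead keeps the measured times slightly above the theoretical (sequential time)/(number of processors).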
27. Conclusion
- We proposed parallelization methods for a direct numerical simulation using MPI.
- The parallelization methods are more effective when the number of processors is a multiple of four.
- The execution time of the parallelized programs for four and eight processors decreased to roughly 1/4 and 1/8 of that of the original sequential program, respectively.
28. Future work
- Are the parallelization methods still effective with more processors?
- Parallelization methods for the case where the physical memory capacity is smaller than the total data set size.