EFFECTIVE PARALLELIZATION OF A TURBULENT FLOW SIMULATION - PowerPoint PPT Presentation

1
EFFECTIVE PARALLELIZATION OF A TURBULENT FLOW
SIMULATION
  • Masami Takata, Yoshinobu Yamamoto, Hayaru Shouno,
    Tomoaki Kunugi, Kazuki Joe
  • Graduate School of Human Culture, Nara Women's
    Univ.
  • Department of Nuclear Engineering, Kyoto Univ.

2
Contents
  • Background
  • Direct Numerical Simulation (DNS) of
    free-surface turbulent flow
  • Parallelization methods
  • Evaluation of the parallelization methods
  • Conclusion

3
Background
  • Free-surface turbulent flow
  • Used in industrial devices
    (nuclear fusion reactors and chemical plants)
  • Analysis method for turbulent flow:
    Direct Numerical Simulations (DNSs)
  • Transformation into a parallelized form

High grid density → huge calculation time
<For distributed-memory parallel computers>
MPI (Message Passing Interface)
4
DNS of the turbulent flow (1)
  • Calculation conditions
  • Reynolds number: 2270
  • Prandtl number: 1
  • Grid (x, y, z): (64, 82, 64)

5
DNS of the turbulent flow (2)
  • The incompressible Navier-Stokes equations
  • Integration of the governing equations:
    a fractional step method
  • Time integration:
    a second-order Adams-Bashforth scheme and
    a Crank-Nicolson scheme
  • Spatial discretization:
    a second-order central differencing scheme
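A minimal sketch (in Python, not the authors' Fortran code) of the second-order Adams-Bashforth scheme named above, applied to the toy ODE du/dt = -u with exact solution u(t) = exp(-t):

```python
# Sketch: 2nd-order Adams-Bashforth time integration for du/dt = f(u).
# u_{n+1} = u_n + dt * (3/2 * f(u_n) - 1/2 * f(u_{n-1}))
import math

def adams_bashforth2(f, u0, dt, nsteps):
    # Bootstrap the first step with forward Euler (the scheme needs
    # two previous time levels).
    u_prev = u0
    u = u0 + dt * f(u0)
    f_prev = f(u_prev)
    for _ in range(nsteps - 1):
        f_cur = f(u)
        u, u_prev = u + dt * (1.5 * f_cur - 0.5 * f_prev), u
        f_prev = f_cur
    return u

# Integrate to t = 1.0; the result should be close to exp(-1).
u_end = adams_bashforth2(lambda u: -u, 1.0, 0.01, 100)
print(abs(u_end - math.exp(-1.0)))  # small O(dt^2) discretization error
```

The scheme is explicit, which is why the slides pair it with a Crank-Nicolson (implicit) scheme for the stiff viscous terms.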

6
DNS of the turbulent flow (3): The arrays for the
DNS
  • x, y, z: Grid intervals and coordinates in
    the three spatial directions
  • dist x, dist y, dist z: The temperatures
    at the water surface or wall
  • u, v, w: Flow velocities in the x, y, and z
    directions
  • fu, fuo, fv, fvo, fw, fwo: (convective
    term) + (viscous term)
  • t: The temperature
  • ft, fto: (convective term) + (viscous term)
  • p: (pressure) / (density)

7
DNS of the turbulent flow (4): The program flow
of the DNS
[Flow diagram: file input of (u, v, w, p); initialization of
dist x, dist y, dist z, x, y, z, u, v, w, p, t and
fu, fuo, fv, fvo, fw, fwo, ft, fto; an iteration loop that
updates fu, fv, fw, ft, then u, v, w from fuo, fvo, fwo,
then p and t from ft, fto; file output of u, v, w, p.]
8
Parallelization method
  • Parallelized program
  • Data distribution
  • The minimum data communication
  • MPI synchronous and asynchronous protocols

9
Parallelization method 1: Data communication
protocols (1)
  • Synchronous protocol
  • Processors issuing receive operations are suspended
    until the communications complete.
  • A processor can use only one communication
    protocol at a time.

[Figure: timeline from receive operation to
completion of the communication]
10
Parallelization method 1: Data communication
protocols (2)
  • Asynchronous protocol
  • Send and receive operations can be executed
    independently.
  • A processor can use several communication
    protocols simultaneously.

[Figure: timeline from receive operation to
completion of the communication]
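The difference can be illustrated conceptually: the pattern below mimics a nonblocking receive (post, compute, then wait) using a Python thread and a queue as a stand-in for MPI. Nothing here calls MPI itself; the MPI analogues would be MPI_IRECV and MPI_WAIT.

```python
# Conceptual sketch only (not MPI): post a receive, overlap it with
# computation, and wait for completion only when the data is needed.
import threading
import queue
import time

channel = queue.Queue()

def sender():
    time.sleep(0.05)          # the message arrives "later"
    channel.put([1, 2, 3])    # analogous to a send operation

def irecv():
    """Post a receive and return a 'wait' handle immediately
    (analogous to a nonblocking MPI_IRECV returning a request)."""
    result = {}
    def waiter():
        result["data"] = channel.get()   # blocks in the background
    t = threading.Thread(target=waiter)
    t.start()
    def wait():                          # analogous to MPI_WAIT
        t.join()
        return result["data"]
    return wait

threading.Thread(target=sender).start()
wait = irecv()                  # returns at once: asynchronous protocol
local_work = sum(range(1000))   # computation overlaps the communication
data = wait()                   # suspend only now, as in the sync case
print(data)                     # [1, 2, 3]
```

With the synchronous protocol, the receive call itself would block, so the `local_work` line could not overlap the communication.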
11
Parallelization method 1: Data communication
protocols (3)
  • Asynchronous protocol
  • MPI_ALLREDUCE
  • A function that returns the result of a
    reduce operation to all processors in a
    communication group
  • MPI_BCAST
  • A broadcast function
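The semantics of these two functions can be sketched without an MPI runtime; in the illustration below each list element stands in for one processor's local value (a hedged toy model, not the MPI implementation):

```python
# Toy model of MPI_ALLREDUCE and MPI_BCAST semantics: one list element
# per "processor"; no real communication takes place.
def allreduce(local_values, op):
    """Every 'processor' receives the reduction of all local values."""
    result = local_values[0]
    for v in local_values[1:]:
        result = op(result, v)
    return [result] * len(local_values)   # same answer on every rank

def bcast(values, root):
    """Every 'processor' receives the root processor's value."""
    return [values[root]] * len(values)

print(allreduce([3, 1, 4, 1], min))      # [1, 1, 1, 1]
print(bcast([10, 20, 30, 40], root=2))   # [30, 30, 30, 30]
```

This is why the slides treat a global minimum (subroutine norm) and a global sum (subroutine inprod) as MPI_ALLREDUCE operations.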

12
Parallelization method 2 (1)
      do 10 i = 1, 100000
 10   x(i) = i
      min = y(1)
      do 20 j = 2, 100000
        if (min .gt. y(j)) then
          min = y(j)
        endif
 20   continue
      do 30 k = 1, 100000
 30   z(k) = z(k) * k
      do 40 i = 1, 100000
 40   w(i) = x(I(i)) * min
13
Parallelization method 2 (2)
  • The number of synchronous communications is two
    for each processor.

14
Parallelization method 2 (3)
  • The number of asynchronous communications is
    eight for each processor.

No asynchronous communication is available for global
reduce operations
15
Parallelization method 2 (4)
  • The number of asynchronous communications is at
    most three.

With partial strip mining
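Strip mining as used above can be sketched as follows. This is a Python illustration under the assumption of contiguous, near-equal strips; `strips` is a hypothetical helper, not part of the authors' code. It also shows why the global minimum from the example loop needs one reduce step across strips:

```python
# Strip mining: split a loop range of n iterations into contiguous
# strips, one per processor; each processor works on its own strip.
def strips(n, nprocs):
    """Contiguous (start, stop) ranges covering range(n)."""
    size, extra = divmod(n, nprocs)
    out, start = [], 0
    for p in range(nprocs):
        stop = start + size + (1 if p < extra else 0)
        out.append((start, stop))
        start = stop
    return out

n, nprocs = 100000, 8
y = [(i * 7919) % 1000 for i in range(n)]   # stand-in data

# Each "processor" computes the minimum over its own strip...
local_mins = [min(y[a:b]) for a, b in strips(n, nprocs)]
# ...and one reduce combines them (the MPI_ALLREDUCE-style step).
global_min = min(local_mins)
assert global_min == min(y)

print(strips(10, 4))   # [(0, 3), (3, 6), (6, 8), (8, 10)]
```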
16
Parallelization method 2 (5)
17
Parallelization method 2: The arrays for the
DNS
  • x, y, z: Grid intervals and coordinates in
    the three spatial directions
  • dist x, dist y, dist z: The temperatures
    at the water surface or wall
  • u, v, w: Flow velocities in the x, y, and z
    directions
  • fu, fuo, fv, fvo, fw, fwo: (convective
    term) + (viscous term)
  • t: The temperature
  • ft, fto: (convective term) + (viscous term)
  • p: (pressure) / (density)

18
Parallelization method 2: The initialization
part for the DNS
[Flow diagram: file input of (u, v, w, p); distributed
initialization of dist x, dist y, dist z, x, y, z,
u, v, w, p, t and fu, fuo, fv, fvo, fw, fwo, ft, fto;
file output.]
19
Parallelization method 2: The calculation
part for the DNS
[Flow diagram of the iteration loop: update fu, fv, fw, ft;
update u, v, w from fuo, fvo, fwo; output u, v, w, p;
update p and t from ft, fto.]
20
Parallelization method 3: The conjugate
residual method for the Poisson equation (1)
(subroutine press)
  • Usage: an array p and some local arrays
  • Dependencies exist in the decomposed arrays at
    the boundaries.
  • If parallelization method (2) is
    adopted, MPI_WAIT (which suspends a processor
    until the completion of the asynchronous
    communication) causes a large waiting
    time.
  • Two partitioning methods for subroutine press
21
Parallelization method 3: subroutine press
(2)
  • In method A (targeting eight processors):
  • One processor runs subroutine norm (which returns
    the maximum element of a given array).
  • The remaining seven processors run the update
    calculation.
  • The disadvantages:
  • Each array must be assigned to seven processors
    (but the number of array elements is defined as a
    multiple of two).
  • Programmers' development efforts increase
    (the number of communications increases).

22
Parallelization method 3: subroutine press
(3)
  • In method B:
  • Strip mining across all the processors
  • Required synchronous communications:
  • MPI_ALLREDUCE (with the synchronous protocol)
    (a function that returns the result of a
    reduce operation to all processors in a
    communication group), used by
  • subroutine norm (which returns the maximum element
    of a given array)
  • subroutine inprod (which returns the sum of the
    elements of a given array)
  • MPI_BCAST (with the synchronous protocol)
  • The array p calculated by each processor must be
    broadcast after the execution of subroutine press.

23
Parallelization method 3: subroutine press
(4)
  • The characteristic:
  • It mainly consists of a series of doall
    statements.
  • Strip mining across all available processors
  • The synchronous communications do not cause too
    much overhead
    (each doall statement requires about the same
    calculation time).
  • Method B is adopted for subroutine press.

24
Evaluation of the parallelization methods 1
  • Using MPI
  • Parallelized programs for four and eight
    processors
  • Experimental environment:
  • Sun SPARC Ultra-2 workstations (SunOS 5.6)
  • Memory capacity of 512 MB
  • LAN using 100BASE-TX as the communication medium

25
Evaluation of the parallelization methods 2
26
The Effectiveness
  • Little idle (waiting) time
  • It corresponds mainly to the file I/O time for
    the calculation results.
  • In reducing execution time:
  • Theoretically,
    (the execution time of the sequential
    program) / (number of processors)
  • In practice, waiting and system time increase
    because of the communication overhead.
  • 4 processors: <28.7>
  • 8 processors: <15.79>

Extremely effective
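The theoretical figure above is simple arithmetic; the sketch below illustrates it with a hypothetical sequential time T1 (the slides give no units for the measured values, so none are assumed here):

```python
# Ideal parallel execution time and parallel efficiency.
def ideal_time(t_seq, nprocs):
    """Theoretical execution time with perfect parallelization."""
    return t_seq / nprocs

def efficiency(t_seq, t_par, nprocs):
    """Speedup divided by processor count (1.0 = perfect scaling)."""
    return (t_seq / t_par) / nprocs

T1 = 120.0                       # hypothetical sequential time
print(ideal_time(T1, 4))         # 30.0
print(ideal_time(T1, 8))         # 15.0
# A measured time above the ideal reflects communication overhead:
print(efficiency(T1, 32.0, 4))   # 0.9375
```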
27
Conclusion
  • For a direct numerical simulation,
    using MPI,
    we proposed parallelization methods.
  • The parallelization methods are more effective
    when the number of processors is a multiple
    of four.
  • The execution time of the parallelized programs
    for four and eight processors decreased to
    1/4 and 1/8 of the original sequential program,
    respectively.

28
Future work
  • Investigate whether the parallelization methods
    remain effective with more processors.
  • Parallelization methods for when the physical
    memory capacity is smaller than the total data
    set size.