Title: Fast Parallel Blood Flow Simulator
1 Fast Parallel Blood Flow Simulator
Modeling and Computational Science Lab
Bilel Hadri, University of Houston, Computer Science Department
Advisor: Marc Garbey
2 Contents
- Introduction and motivation
- Formulation of Navier Stokes
- Method
- Discretization
- Design of the Elliptic Solver
- Domain Decomposition
- Aitken Schwarz
- Interface solver
- Performance Analysis
- Parallel Results
- NS Applications
- Conclusions
3 Introduction and Motivations
4 Facts
- Cardiovascular disease is the number one cause of death and disability in the US and Europe (37.3% of all deaths in the US).
- In the United States, one person dies every 35 seconds from heart disease. Aneurysms and stenoses are the main cardiovascular problems.
- Early diagnosis is needed.
- These diseases are linked to the hemodynamic properties of the blood flow.
- Hence the development of image-based computational simulation of flow dynamics.
5 Goal
From an angiogram, obtain the image segmentation and the flow simulation.
=> Need to design a real-time simulation
6 Goal
- Goal:
  - provide a fast hemodynamic simulator
  - assist endovascular surgeons in their decision process
  - build a large database of medical data
7 Methodology
- Fact: the most time-consuming part of the code is the solution of linear systems.
- Focus: a fast elliptic solver for an incompressible Navier-Stokes flow code.
- Context:
  - finite volume discretization
  - mesh topologically equivalent to a Cartesian mesh
  - distributed computing with a high-latency network
- Method:
  - L2 penalty method for fast prototyping of the NS flow
  - level set method
  - efficient subdomain solver (LU, Krylov, multigrid)
  - Aitken Schwarz, a domain decomposition technique designed for distributed computing with a slow network
8 Formulation of Navier Stokes
9 Navier Stokes Formulation
- Incompressible NS flow in large vessels
- Ω_w: solid wall
- L2 penalty method with a penalty parameter << 1. Refs: Caltagirone 84, Bruneau et al. 99, Schneider et al. 2005.
- The mask function is provided by a level set method used in the image segmentation of the blood vessel.
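For reference, a common way to write an L2-penalized incompressible Navier-Stokes system is sketched below. The slide's own equation did not survive extraction, so the notation is an assumption: U is the velocity, p the pressure, ν the kinematic viscosity, η the small penalty parameter, and Λ_{Ω_w} the mask function, equal to 1 inside the solid wall Ω_w and 0 in the fluid.

```latex
\begin{aligned}
\partial_t U + (U \cdot \nabla) U + \nabla p - \nu \Delta U &= -\frac{1}{\eta}\, \Lambda_{\Omega_w} U, \\
\nabla \cdot U &= 0, \qquad 0 < \eta \ll 1 .
\end{aligned}
```

In this form the right-hand side vanishes in the fluid, while inside the wall the large factor 1/η drives the velocity toward zero, so no body-fitted mesh is needed.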
10 Navier Stokes Numerical Approximation
- Time stepping: projection method (Chorin)
- Step 1: prediction of the velocity û^(k+1) by solving either [equation images not recovered]
- Step 2: projection of the predicted velocity onto the space of divergence-free functions
Momentum equation
Pressure equation
89.9
=> Focus: design of the optimum solver
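The two steps above can be sketched in a standard non-incremental form of Chorin's projection method. This is an illustration, not a reconstruction of the slide's missing equations: the explicit convection term, the implicit diffusion term, unit density, and the symbols Δt and ν are assumptions, and the penalty term is left out for brevity.

```latex
\begin{aligned}
\text{Step 1 (momentum):}\quad & \frac{\hat{u}^{k+1} - u^{k}}{\Delta t} + (u^{k} \cdot \nabla) u^{k} - \nu \Delta \hat{u}^{k+1} = 0, \\
\text{Step 2 (pressure):}\quad & \Delta p^{k+1} = \frac{1}{\Delta t}\, \nabla \cdot \hat{u}^{k+1}, \\
& u^{k+1} = \hat{u}^{k+1} - \Delta t\, \nabla p^{k+1}.
\end{aligned}
```

Taking the divergence of the last line and using the pressure equation gives ∇·u^{k+1} = 0. The pressure equation is an elliptic problem solved at every time step, which is why the elliptic solver is the focus.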
11 Design of the Elliptic Solver
12 Additive Schwarz
Γ1 Γ2
Additive Schwarz algorithm:
1/ Solve in each subdomain.
2/ Update the solution on the interfaces.
3/ Repeat until convergence.
Ω1 Ω2
Ω
If the discretized operator satisfies the maximum principle, additive Schwarz converges and is a robust solver; however, the convergence is slow. An Aitken-like acceleration method speeds up this convergence.
Ref: Garbey and Tromeur-Dervout, 2002.
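The three steps of additive Schwarz can be sketched on a 1D model problem. This is a minimal illustration under stated assumptions, not the talk's solver: the problem -u'' = 1 on (0, 1) with homogeneous Dirichlet conditions, the grid size, the overlap indices a and b, and the helper names are all chosen here for the example.

```python
def solve_subdomain(f, h, left, right):
    """Solve -u'' = f with Dirichlet values `left` and `right` (Thomas algorithm).

    `f` holds the right-hand side at the interior nodes of the subdomain;
    the interior values of u are returned.
    """
    m = len(f)
    rhs = [h * h * fi for fi in f]
    rhs[0] += left
    rhs[-1] += right
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -1.0 / 2.0
    d[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    u = [0.0] * m
    u[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def additive_schwarz(n=20, a=8, b=12, iters=60):
    """Two overlapping subdomains: nodes 0..b and nodes a..n, with a < b."""
    h = 1.0 / n
    f = [1.0] * (n + 1)
    u = [0.0] * (n + 1)  # zero initial guess, also used as interface data
    for _ in range(iters):
        # 1/ Both solves use interface values from the previous iterate,
        #    so they are independent and could run in parallel.
        u1 = solve_subdomain(f[1:b], h, 0.0, u[b])      # nodes 1..b-1
        u2 = solve_subdomain(f[a + 1:n], h, u[a], 0.0)  # nodes a+1..n-1
        # 2/ Update: node a takes its value from the left solve,
        #    node b from the right solve.
        new = u[:]
        new[1:b] = u1
        new[a + 1:n] = u2
        u = new  # 3/ repeat
    return u, h


u, h = additive_schwarz()
exact = [i * h * (1.0 - i * h) / 2.0 for i in range(len(u))]
error = max(abs(ui - ei) for ui, ei in zip(u, exact))
```

With this overlap the interface error shrinks by a constant factor of 2/3 per iteration, which illustrates both the robustness and the slow, purely linear convergence that motivates an acceleration step.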
13 Aitken Schwarz Algorithm
M. Garbey and D. Tromeur-Dervout, "On some Aitken-like acceleration of the Schwarz method", International Journal for Numerical Methods in Fluids, Vol. 40(12), pp. 1493-1513, 2002.
Aitken Schwarz is a domain decomposition method that uses the framework of additive Schwarz and is based on an approximate reconstruction of the dominant eigenvectors of the trace transfer operator. Thanks to the immersed boundary method (IBM), a regular Cartesian grid can be used.
=> We get a direct solver, since the eigenvectors are known analytically.
Algorithm
Step 1: apply additive Schwarz with a subdomain solver.
Step 2:
- compute the sine (or cosine) expansion of the traces on the artificial interface for the initial boundary condition u0|Γ and for the solution given by one Schwarz iterate u1|Γ
- apply generalized Aitken acceleration to get u∞|Γ
- recompose the trace in physical space
Step 3: compute in parallel the solution in each subdomain, with the new inner BCs u∞|Γ.
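The acceleration step can be illustrated on the simplest possible case. The sketch below is an assumption-laden toy, not the talk's implementation: it uses the 1D problem u'' = 0 with u(0) = 0 and u(1) = 1, two overlapping subdomains (0, x_b) and (x_a, 1), closed-form subdomain solves, and an alternating (multiplicative) Schwarz sweep instead of the additive variant. In 1D the interface trace is a single number, so the generalized Aitken formula reduces to the scalar one; in 2D the same formula would be applied to each sine or cosine coefficient of the trace.

```python
def schwarz_sweep(g_b, x_a=0.4, x_b=0.6):
    """One alternating Schwarz sweep for u'' = 0, u(0) = 0, u(1) = 1.

    Takes the trace g_b at x_b and returns the updated trace at x_b.
    Subdomain solutions are straight lines, so the solves are closed-form.
    """
    # Left solve on (0, x_b) with u(x_b) = g_b, evaluated at x_a.
    g_a = g_b * x_a / x_b
    # Right solve on (x_a, 1) with u(x_a) = g_a, evaluated at x_b.
    return g_a + (1.0 - g_a) * (x_b - x_a) / (1.0 - x_a)


def aitken(g0, g1, g2):
    """Aitken extrapolation: the exact limit of a purely linearly converging sequence."""
    return g0 - (g1 - g0) ** 2 / (g2 - 2.0 * g1 + g0)


g0 = 0.0                    # initial interface value
g1 = schwarz_sweep(g0)      # first Schwarz iterate
g2 = schwarz_sweep(g1)      # second Schwarz iterate
g_inf = aitken(g0, g1, g2)  # accelerated trace; the exact solution u = x gives 0.6
```

After two sweeps the plain Schwarz trace is still far from the exact value 0.6, while the extrapolated trace matches it to rounding error. One final solve per subdomain with this trace as boundary condition then yields the solution, which is the sense in which the method behaves like a direct solver.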
14 Aitken Schwarz Algorithm
- Main advantages of the Aitken Schwarz method:
  - Arithmetic complexity of the order of the arithmetic complexity of the fast subdomain solver times the number of subdomains.
  - Scales on a parallel computer for a moderate number of subdomains.
  - Good speedup on Beowulf clusters with a Gigabit Ethernet switch.
  - Easy to implement.
15 How to solve a linear system?
- Many approaches exist to solve a linear system Ax = b:
  - direct solvers
  - Krylov solvers
  - multigrid
=> Need an interface to help the user
16 Subdomain interface
- This interface gathers Lapack, Sparskit and Hypre.
call interface_solver(A, RHS, u, choice_solveur, nx, ny, options)
- A: pentadiagonal matrix
- RHS: right-hand side
- u: solution
- choice_solveur: choice of the solver: 1/ Lapack (LU), 2/ Sparskit (Krylov), 3/ Hypre (multigrid)
- nx, ny: size of the problem
- options: flag for preconditioning and more
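The idea behind the call above, one entry point that dispatches to interchangeable solvers, can be sketched in a few lines. This is a hypothetical Python analogue, not the Fortran interface itself: the pure-Python Gaussian elimination and conjugate gradient routines only stand in for Lapack and Sparskit, there is no multigrid option, the matrix is stored densely for simplicity, and the names solve_lu, solve_cg, SOLVERS and poisson_matrix are invented for the example.

```python
def poisson_matrix(nx, ny):
    """Dense storage of the pentadiagonal 5-point Laplacian on an nx-by-ny grid."""
    n = nx * ny
    A = [[0.0] * n for _ in range(n)]
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            A[k][k] = 4.0
            if i > 0:
                A[k][k - 1] = -1.0
            if i < nx - 1:
                A[k][k + 1] = -1.0
            if j > 0:
                A[k][k - nx] = -1.0
            if j < ny - 1:
                A[k][k + nx] = -1.0
    return A


def solve_lu(A, b):
    """Direct solve: Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def solve_cg(A, b, tol=1e-12, max_iter=1000):
    """Krylov solve: conjugate gradient, valid for symmetric positive definite A."""
    n = len(b)
    x = [0.0] * n
    r = b[:]
    p = r[:]
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rr_new = sum(v * v for v in r)
        p = [r[i] + (rr_new / rr) * p[i] for i in range(n)]
        rr = rr_new
    return x


SOLVERS = {1: solve_lu, 2: solve_cg}  # numbering mirrors choice_solveur


def interface_solver(A, rhs, choice):
    """Single entry point: the caller only picks a solver number."""
    if choice not in SOLVERS:
        raise ValueError("unknown solver choice: %r" % (choice,))
    return SOLVERS[choice](A, rhs)


nx, ny = 4, 4
A = poisson_matrix(nx, ny)
rhs = [1.0] * (nx * ny)
u_direct = interface_solver(A, rhs, 1)
u_krylov = interface_solver(A, rhs, 2)
```

Keeping the solver behind a table lookup means the surrounding code never changes when a different backend turns out to be faster for a given subdomain.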
=> How to choose the fastest solver?
17 Performance Analysis (single processor)
=> for choosing the fastest solver
18 Surface Response
- Build a prediction model from a least-squares quadratic polynomial approximation based on a few runs.
- Goal:
  - predict the behavior for various subdomain sizes
  - provide an indicator of the reliability of the model
- Elapsed time T, depending on the size
Axes of the response surface: Nx, Ny
19 Performance of subdomain solvers with an incompressible flow in a curved pipe
- BL1 and BL2 fit the wall and have orthogonal meshes to approximate the boundary layer.
- The domain denoted RD, the central part of the pipe, is polygonal and overlaps the boundary subdomains by a few mesh cells.
- This is basically a Chimera approach, which is convenient for computing fluid-structure interaction.
Composite Mesh for the curved pipe flow problem.
20 Performance
Comparison of the elapsed time for each subdomain with preconditioning (left graphic) and with a precomputed preconditioner (right graphic) for the curved pipe flow problem.
- The optimum choice of the solver for each subdomain depends on:
  - the type of subdomain,
  - whether or not the same preconditioner or decomposition is reused,
  - the architecture of the processor,
  - the size of the problem.
- Choosing the wrong solver for a specific subdomain can slow down the computation.
21 Systematic performance computation with Lapack,
Sparskit and Hypre as a function of the grid size
Figure panel: Hypre
The surface is very smooth for Lapack and Sparskit, while for Hypre there is a lot of variation, due to the high sensitivity of algebraic multigrid to grid sizes.
22 Predictions
- From the performance evaluation, the elapsed time depends on:
  - the size
  - the boundary condition
  - the architecture of the machine
- How can we choose the best solver?
- Regression over 9 points to get a model, using a least-squares quadratic polynomial approximation
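The regression over 9 points can be sketched as follows. Everything specific here is an assumption made for illustration: the six-term quadratic model in (Nx, Ny), the 3-by-3 grid of sample sizes, the scaling of the sizes by 100, and above all the timings, which come from a synthetic function (fake_time) rather than from measured runs.

```python
def features(nx, ny, scale=100.0):
    """Monomials of the quadratic model, with sizes rescaled for conditioning."""
    x, y = nx / scale, ny / scale
    return [1.0, x, y, x * x, x * y, y * y]


def gauss_solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_quadratic(samples):
    """Least-squares fit via the normal equations; samples are (nx, ny, time)."""
    rows = [features(nx, ny) for nx, ny, _ in samples]
    times = [t for _, _, t in samples]
    m = len(rows[0])
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    rhs = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(m)]
    return gauss_solve(normal, rhs)


def predict(coeffs, nx, ny):
    return sum(c * f for c, f in zip(coeffs, features(nx, ny)))


def fake_time(nx, ny):
    """Synthetic stand-in for a measured elapsed time (not real data)."""
    x, y = nx / 100.0, ny / 100.0
    return 0.1 + 0.02 * x + 0.03 * y + 0.5 * x * y


sizes = (50, 100, 150)
samples = [(nx, ny, fake_time(nx, ny)) for nx in sizes for ny in sizes]  # 9 runs
coeffs = fit_quadratic(samples)

# A simple reliability indicator: the largest misfit on the sampled runs.
max_residual = max(abs(predict(coeffs, nx, ny) - t) for nx, ny, t in samples)
predicted = predict(coeffs, 125, 75)
```

Fitting one such surface per solver and comparing the predicted times at a given (Nx, Ny) gives a cheap way to pick the solver before running it. A large residual on the sampled runs signals that the quadratic model should not be trusted.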
23 Comparison between different solvers
For small problems, it is better to solve the linear system with the LU decomposition, because it is faster than BICGSTAB and AMG-GMRES for the Porous Environment problem.
24 Comparison between different architectures
Dual-processor AMD 1800 Athlon with 2 GB of RAM
Dual-processor 900 MHz Itanium2 with 3 GB of RAM
For the same problem, the elapsed time differs between the two architectures. The region where BICGSTAB is faster is not the same either: it depends on the architecture of the computer.
25 Parallel Efficiency Results
26 Performance of Aitken Schwarz
Speedup of the AS solved with LU
Speedup of the AS solved with GMRES
Aitken Schwarz performs very well on small problems. Furthermore, the Krylov method seems to be more sensitive to cache effects, since we observe a superlinear speedup.
=> Does the prediction model apply to the parallel runs?
27 Parallel subdomain solver performance
Let us consider the Poisson problem and study the scalability.
Elapsed time depending on the solver
- This prediction is correct for the 2-processor computation. However, as the number of processors grows, the prediction becomes slightly incorrect, and one should favor the Krylov solver.
Prediction of the best subdomain solver: LU decomposition is faster in the light blue region, while Sparskit with GMRES is faster in the dark blue region.
=> Surface response modeling may require a 3rd dimension (the number of processors).
28 Surface response depending on the number of processors
=> For the same number of unknowns, the Krylov solver seems to be faster for small sizes when the number of processors is increased.
29 Comparison of AS with PETSc
Figure panels: speedup, time
- PETSc is faster than AS with 2 and 3 processors.
- As the number of processors increases:
  - the PETSc multigrid solver does not speed up well, while AS performs better;
  - AS gives a better elapsed time than the multigrid solver.
=> For simple problems and with a high-latency network, the AS algorithm is very efficient.
=> Best compromise: use PETSc to solve each subdomain.
30 Comparison of AS and SuperLU
Comparison of the elapsed time between SuperLU and AS with either LU or GMRES, depending on the number of processors, for small, medium and large problems.
31 NS Applications
32 NS benchmark
Benchmark scheme
Contour of u
Contour of v
=> Results checked with ADINA
33 Performance on the Itanium2
Scalability Performance
Speedup Performance
34 NS 3D applications
Geometry of a Bent Artery
Velocity and Pressure
35 NS 3D applications: carotid
Geometry of the carotid
Pressure in the carotid
Velocity field
36 NS3D sequential results (Fortran)
Elapsed time in seconds for one time step, depending on the number of subdomains and the grid size
37 NS3D sequential results: analysis
- 250 000 unknowns: 0.5 s per time step
- 2 000 000 unknowns: 5 s per time step
- 16 000 000 unknowns: 50 s per time step
When the size is multiplied by 8, the elapsed time is multiplied by 10.
=> A nearly linear behavior.
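The claim of near-linear behavior can be checked with a short calculation on the three timings quoted above. The power-law model T ~ N^p and the helper name scaling_exponent are assumptions made for this check.

```python
import math

# (number of unknowns, elapsed seconds per time step), as quoted on the slide
runs = [(250_000, 0.5), (2_000_000, 5.0), (16_000_000, 50.0)]


def scaling_exponent(run_a, run_b):
    """Exponent p of an assumed power law T ~ N**p, estimated from two runs."""
    (n_a, t_a), (n_b, t_b) = run_a, run_b
    return math.log(t_b / t_a) / math.log(n_b / n_a)


# One exponent per pair of consecutive runs.
exponents = [scaling_exponent(a, b) for a, b in zip(runs, runs[1:])]

# Cost per unknown and per time step, in seconds.
cost_per_unknown = [t / n for n, t in runs]
```

Both pairs give p = log 10 / log 8, about 1.11, so the cost grows slightly faster than linearly: the time per unknown rises from 2 microseconds to about 3.1 microseconds over a 64-fold increase in problem size.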
38 Elapsed Time and Speedup for the Parallel Solver
Elapsed Time in seconds
Speedup based upon the performance on 2 processors
39 Speedup compared to a sequential code
Speedup relative to the best performance of a sequential code with the optimum number of subdomains (8, 16 or 20, depending on the grid size). The speedup follows Amdahl's law, since 95% of the code is parallelized.
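Amdahl's law with a 95% parallel fraction can be evaluated directly. The sketch below only applies the textbook formula S(p) = 1 / ((1 - f) + f / p); the processor counts are picked for illustration and are not the ones used in the talk.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup when only `parallel_fraction` of the run time is parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


f = 0.95  # fraction of the code that is parallelized, as stated above
speedups = {p: amdahl_speedup(f, p) for p in (2, 4, 8, 16, 32)}

# The serial 5% caps the speedup at 1 / (1 - f) no matter how many processors are used.
upper_bound = 1.0 / (1.0 - f)
```

With f = 0.95 the bound is 20, and 16 processors already give only about 9.14, which explains why measured speedups flatten well before the processor count does.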
40 Conclusion
- Optimum tuning of the solver provides us with the fastest subdomain solver.
- Parallel processing is one way to deal with the complexity of computational medicine.
- Aitken Schwarz, a domain decomposition framework for elliptic solvers, is efficient and robust for distributed computing.
- This method provides good scalability and improves performance for small problems with a large number of processors.
- The penalty method allows a fast solution of incompressible Navier-Stokes flow simulations.
- Bring blood flow simulation close to the level of efficiency of image processing.
41 Thank you!