Title: A package of parallel algebraic two-level preconditioners based on PSBLAS
1. A package of parallel algebraic two-level preconditioners based on PSBLAS
3rd IASC World Conference on Computational Statistics & Data Analysis (CSDA 2005)
ERCIM session on QR and other factorizations
- P. D'Ambra, ICAR-CNR, Naples Branch, Italy
- D. di Serafino, Second University of Naples, Italy
- S. Filippone, University of Rome Tor Vergata, Italy
- daniela.diserafino_at_unina2.it
Limassol, Cyprus, October 28-31, 2005
2. Outline
- Motivations for the package
- Parallel 2-level algebraic Schwarz preconditioners
- Algebraic formulation
- Computational kernels
- PSBLAS-based software architecture of the package
- Performance results and comparisons
- Future work
3. Motivations for the package
- Multilevel Schwarz preconditioners
- Natural parallelism
- Good convergence properties
- Suitable for large-scale scientific and engineering applications
- PSBLAS: a library of basic linear algebra operators on sparse matrices for distributed-memory parallel computers
- (Filippone et al., ACM TOMS 26, 2000)
- Follows the standardization effort of the BLAS Technical Forum
- Infrastructure for portability and performance
- Auxiliary routines for ease of use and extensibility
- Modern (OO) Fortran 95 features
- Smooth upgrade path for integration into legacy applications
4. (1-level) Additive Schwarz: basic ingredients
Adjacency graph of A
[Figure: 9-node example, labels 1-9]
0-overlap partition of W
d-overlap partition of W
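The step from a 0-overlap to a d-overlap partition can be illustrated with a small sketch. It assumes the usual construction in which each subset of W is grown by d layers of neighbours in the adjacency graph of A. The function name `overlap_partition` and the 9-vertex chain are hypothetical illustrations, not part of the package.

```python
def overlap_partition(adj, parts0, d):
    """Grow every 0-overlap subset by d layers of graph neighbours.

    adj    : dict mapping each vertex to the list of its neighbours
    parts0 : list of disjoint vertex lists (the 0-overlap partition)
    d      : number of overlap layers (assumed construction)
    """
    result = []
    for part in parts0:
        current = set(part)
        for _ in range(d):
            layer = set()
            for v in current:
                layer.update(adj[v])
            current |= layer
        result.append(sorted(current))
    return result


# Hypothetical example: a chain 1-2-...-9, i.e. the graph of a tridiagonal A.
adj = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 9] for i in range(1, 10)}
parts0 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
parts1 = overlap_partition(adj, parts0, 1)
print(parts1)  # [[1, 2, 3, 4], [3, 4, 5, 6, 7], [6, 7, 8, 9]]
```

With d = 0 the function returns the original partition unchanged.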
5. AS: basic ingredients (cont'd)
Restriction/prolongation operators
[Figure: 9-node example, labels 1-9]
Restriction of A
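The operators named on this slide can be sketched in a few lines. The sketch assumes the standard definitions: the restriction R_i picks the entries of a vector indexed by subset i, the prolongation is its transpose (zero padding), and the restriction of A is A_i = R_i A R_i^T. The function names and the 0-based index set are illustrative choices, since the slide's own formulas were not extracted.

```python
def restrict(v, idx):
    """R_i v: keep only the entries of v listed in idx."""
    return [v[k] for k in idx]


def prolong(w, idx, n):
    """R_i^T w: scatter w back into a length-n vector, zero elsewhere."""
    out = [0.0] * n
    for k, val in zip(idx, w):
        out[k] = val
    return out


def restrict_matrix(A, idx):
    """A_i = R_i A R_i^T: the principal submatrix of A on idx."""
    return [[A[r][c] for c in idx] for r in idx]


# Hypothetical example: 9x9 tridiagonal matrix, one overlapping subset.
n = 9
A = [[2.0 if r == c else -1.0 if abs(r - c) == 1 else 0.0 for c in range(n)]
     for r in range(n)]
idx = [2, 3, 4, 5, 6]          # 0-based indices of one subset with overlap
A_i = restrict_matrix(A, idx)  # 5x5 tridiagonal block
v = [float(k) for k in range(n)]
print(prolong(restrict(v, idx), idx, n))
```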
6. AS preconditioners: computational kernels
[Diagram: build and apply formulas for each variant, with each kernel tagged "global" or "local"; the formulas were not extracted and the tags cannot be reliably reattached.]
- Classical AS
- Restricted AS (RAS)
- AS with Harmonic Extension (ASH)
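The three variants differ only in where the overlap is used when the preconditioner is applied. The sketch below assumes the standard definitions (Cai and Sarkis): classical AS restricts and prolongs on the overlapped subsets, RAS prolongs only onto the 0-overlap subsets, and ASH restricts only from the 0-overlap subsets. The dense `solve` is a stand-in for whatever local solver is used, and all names are hypothetical. In a distributed setting the steps that touch overlap entries are the ones that need communication, which is presumably what the global/local tags on the slide refer to.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (local solver stand-in)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = s / M[k][k]
    return x


def schwarz_apply(A, v, parts_ov, parts0, variant="AS"):
    """z = M^{-1} v for variant in {"AS", "RAS", "ASH"} (assumed definitions)."""
    z = [0.0] * len(v)
    for idx, own in zip(parts_ov, parts0):
        own = set(own)
        A_i = [[A[r][c] for c in idx] for r in idx]
        # ASH restricts from the 0-overlap subset only; AS and RAS use the overlap.
        rhs = [v[k] if (variant != "ASH" or k in own) else 0.0 for k in idx]
        w = solve(A_i, rhs)
        # RAS prolongs onto the 0-overlap subset only; AS and ASH use the overlap.
        for k, val in zip(idx, w):
            if variant != "RAS" or k in own:
                z[k] += val
    return z


# Hypothetical example: 9x9 tridiagonal matrix, three subsets, overlap 1.
n = 9
A = [[2.0 if r == c else -1.0 if abs(r - c) == 1 else 0.0 for c in range(n)]
     for r in range(n)]
parts0 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
parts1 = [[0, 1, 2, 3], [2, 3, 4, 5, 6], [5, 6, 7, 8]]
v = [1.0] * n
z_as = schwarz_apply(A, v, parts1, parts0, "AS")
z_ras = schwarz_apply(A, v, parts1, parts0, "RAS")
z_ash = schwarz_apply(A, v, parts1, parts0, "ASH")
```

With zero overlap (`parts_ov == parts0`) all three variants reduce to the same block-Jacobi preconditioner.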
7. Coarse-level correction: basic ingredients
- Algebraic coarsening: uncoupled aggregation (Vanek et al., 1996; Tuminaro et al., 2000)
- Tentative prolongation operator
- Smoothed prolongation/restriction operators
- Coarse-level matrix
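The four ingredients can be sketched on a small dense example. The sketch assumes the standard smoothed-aggregation formulas: a piecewise-constant tentative prolongator built from the aggregates, a damped-Jacobi smoothed prolongator P = (I - omega D^{-1} A) P_tent, restriction R = P^T, and Galerkin coarse matrix A_C = R A P. The greedy `aggregate` routine is a simplified serial stand-in for the cited aggregation algorithms, and the value of omega is an illustrative assumption.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def aggregate(adj):
    """Greedy aggregation: a root plus its neighbours, then attach leftovers."""
    n = len(adj)
    agg, nagg = [-1] * n, 0
    for i in range(n):
        # Pass 1: i becomes a root if neither it nor its neighbours are taken.
        if agg[i] == -1 and all(agg[j] == -1 for j in adj[i]):
            for j in [i] + adj[i]:
                agg[j] = nagg
            nagg += 1
    for i in range(n):
        # Pass 2: attach each leftover vertex to a neighbouring aggregate.
        if agg[i] == -1:
            agg[i] = next(agg[j] for j in adj[i] if agg[j] != -1)
    return agg, nagg


def coarse_operators(A, agg, nagg, omega):
    n = len(A)
    P_tent = [[1.0 if agg[i] == k else 0.0 for k in range(nagg)] for i in range(n)]
    AP = matmul(A, P_tent)
    # Smoothed prolongator: P = (I - omega * D^{-1} A) P_tent
    P = [[P_tent[i][k] - omega * AP[i][k] / A[i][i] for k in range(nagg)]
         for i in range(n)]
    R = transpose(P)
    A_c = matmul(R, matmul(A, P))  # Galerkin coarse-level matrix
    return P_tent, P, A_c


# Hypothetical example: 9x9 tridiagonal matrix.
n = 9
A = [[2.0 if r == c else -1.0 if abs(r - c) == 1 else 0.0 for c in range(n)]
     for r in range(n)]
adj = [[j for j in range(n) if j != i and A[i][j] != 0.0] for i in range(n)]
agg, nagg = aggregate(adj)
P_tent, P, A_c = coarse_operators(A, agg, nagg, omega=2.0 / 3.0)
```

In the uncoupled setting named on the slide, each process would aggregate only its own rows, so the aggregation step needs no communication; the serial routine above does not model that.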
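8. Coarse-level correction: computational kernels
[Diagram: build and apply formulas, with each kernel tagged "global", "local", or "global/local"; the formulas were not extracted and the tags cannot be reliably reattached.]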
9. 2-level Schwarz preconditioners: computational kernels
[Diagram: build and apply formulas for each variant, with each kernel tagged "global", "local", or "glob/loc"; the formulas were not extracted and the tags cannot be reliably reattached.]
- Additive
- Hybrid
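10. 2-level Schwarz preconditioners: computational kernels (cont'd)
[Diagram: build and apply formulas, with each kernel tagged "global" or "glob/loc"; the formulas were not extracted and the tags cannot be reliably reattached.]
- Symmetrized Hybrid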
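The three two-level combinations can be sketched generically, given a one-level Schwarz step `smooth` and a coarse-level correction `coarse`, both taken as functions of a vector. The slide formulas were not extracted, so the forms below are the usual textbook ones and should be read as assumptions: additive sums the two corrections, hybrid applies the coarse correction and then the Schwarz step on the updated residual, and symmetrized hybrid wraps the coarse correction between two Schwarz steps. The order of the steps in the hybrid variant, in particular, is an assumption.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def additive(v, A, smooth, coarse):
    """z = smooth(v) + coarse(v); A is unused, kept for a uniform signature."""
    return [s + c for s, c in zip(smooth(v), coarse(v))]


def hybrid(v, A, smooth, coarse):
    """Coarse correction first, then a Schwarz step on the new residual."""
    w = coarse(v)
    r = [vi - ai for vi, ai in zip(v, matvec(A, w))]
    return [wi + si for wi, si in zip(w, smooth(r))]


def symmetrized_hybrid(v, A, smooth, coarse):
    """Schwarz step, coarse correction, Schwarz step."""
    w = smooth(v)
    r = [vi - ai for vi, ai in zip(v, matvec(A, w))]
    w = [wi + ci for wi, ci in zip(w, coarse(r))]
    r = [vi - ai for vi, ai in zip(v, matvec(A, w))]
    return [wi + si for wi, si in zip(w, smooth(r))]


# Hypothetical example: 4x4 tridiagonal A, Jacobi as the one-level step,
# two aggregates {0,1} and {2,3} with piecewise-constant prolongation.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]


def jacobi(r):
    return [ri / 2.0 for ri in r]


def coarse(r):
    rc0, rc1 = r[0] + r[1], r[2] + r[3]                       # restriction
    e0, e1 = (2.0 * rc0 + rc1) / 3.0, (rc0 + 2.0 * rc1) / 3.0  # A_C = [[2,-1],[-1,2]]
    return [e0, e0, e1, e1]                                   # prolongation


v = [1.0, 1.0, 1.0, 1.0]
z_add = additive(v, A, jacobi, coarse)
z_hyb = hybrid(v, A, jacobi, coarse)
z_sym = symmetrized_hybrid(v, A, jacobi, coarse)
```

A quick sanity check on the hybrid forms: if `smooth` were an exact solve, both would return the exact solution regardless of `coarse`.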
11. Software architecture of the preconditioner package
iterative solver
psb_prcbld
psb_prcaply
psb_asaply
psb_asbld
psb_2lbld
psb_2laply
2-level DD preconditioner: computational and auxiliary routines
psb_ovrlbld
UMFPACK
psb_decaggr
psb_smthbld
psb_coarsebld
psb_bjac
psb_sphalo
psb_axpy
psb_spsm
psb_spnmri
parallel computational and auxiliary routines
psb_halo
psb_ovrl
psb_spmm
BLACS MPI
PSBLAS
spilu
sp2mm
spgtdiag
cssm
spscal
sptrans
csmm
serial kernels
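The top of this layering, an iterative solver that builds the preconditioner once and applies it at every iteration, can be sketched as follows. This is not the PSBLAS interface: the class, its `build`/`apply` methods, and the Richardson solver are hypothetical stand-ins that only mirror the build/apply split suggested by routine names such as psb_prcbld and psb_prcaply.

```python
class DiagPreconditioner:
    """Hypothetical preconditioner object (not the PSBLAS API)."""

    def build(self, A):
        # Setup phase: done once, before the iterations start.
        self.inv_diag = [1.0 / A[i][i] for i in range(len(A))]
        return self

    def apply(self, r):
        # Application phase: called once per solver iteration.
        return [d * ri for d, ri in zip(self.inv_diag, r)]


def richardson(A, b, prec, iters=50):
    """Preconditioned Richardson iteration: x <- x + M^{-1} (b - A x)."""
    x = [0.0] * len(b)
    for _ in range(iters):
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        x = [xi + zi for xi, zi in zip(x, prec.apply(r))]
    return x


# Hypothetical example: small diagonally dominant system with solution [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]
x = richardson(A, b, DiagPreconditioner().build(A))
```

Keeping the two phases separate lets the setup cost be amortized over all iterations, and lets the solver stay unaware of which preconditioner it is calling.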
12. Test matrices
- Thermo: steady-state 2D thermal diffusion problem
- 5-point finite difference discretization on a 1000 x 600 mesh (n = 600000, nnz = 2996800)
- Kivap: simulation of a commercial automotive engine with KIVA3V
- Pressure correction in a semi-implicit algorithm to solve the Navier-Stokes equations for unsteady compressible flows
- Discretization with 100000 control volumes, mesh varying during the simulation
- Kivap1: n = 86304, nnz = 1575568; Kivap2: n = 42204, nnz = 755416
[Figure: Kivap1]
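The Thermo sizes quoted above can be checked with a short count. The sketch assumes one unknown per mesh point and a 5-point stencil whose couplings to points outside the mesh are simply dropped; under that assumption the count reproduces both reported figures. The function names are illustrative.

```python
def five_point_counts(nx, ny):
    """Rows and nonzeros of a 5-point stencil matrix on an nx x ny mesh.

    Every point contributes 5 entries, minus one for each neighbour
    that falls outside the mesh (2*nx + 2*ny missing neighbours in total).
    """
    n = nx * ny
    nnz = 5 * n - 2 * nx - 2 * ny
    return n, nnz


def five_point_counts_bruteforce(nx, ny):
    """Same count obtained by visiting every mesh point."""
    nnz = 0
    for i in range(nx):
        for j in range(ny):
            nnz += 1 + (i > 0) + (i < nx - 1) + (j > 0) + (j < ny - 1)
    return nx * ny, nnz


print(five_point_counts(1000, 600))  # (600000, 2996800)
```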
13. Computational environment and test details
- Beowulf-class cluster
- 16 (1500 MHz) Pentium IV processors with 512 MB RAM and 256 KB L2 cache
- Fast Ethernet (100 Mbit/sec)
- ATLAS 3.6.0 BLAS, BLACS 1.1, mpich 1.2.5, UMFPACK 4.4
- Linux Red Hat 7.3, Intel ifc 7.1, gcc 2.96
14. Thermo: number of iterations

OV 0
np   2LDI   2LDU   2LRI   2LRU
 1    194      5    194      5
 2    185     19    227      6
 4    177     31    232      6
 8    200     44    224      6
16    178     61    204      6

OV 1
np   2LDI   2LDU   2LRI   2LRU
 1    194      5    194      5
 2    177     18    204      5
 4    175     26    213      5
 8    189     35    216      5
16    179     55    187      5

OV 2
np   2LDI   2LDU   2LRI   2LRU
 1    194      5    194      5
 2    193     18    210      5
 4    170     30    216      5
 8    176     38    218      5
16    175     56    196      5

2LDI, 2LDU: 4 block-Jacobi sweeps
BiCGSTAB with RAS does not converge in 500 iterations
15. Thermo: execution time
16. Kivap1: number of iterations

OV 1
np    RAS   2LDI   2LDU   2LRI   2LRU
 1     12      6      7      6      7
 2     13      6      7      6      6
 4     13      6      6      6      6
 8     14      7      6      6      6
16     15      7      6      6      6

OV 0
np    RAS   2LDI   2LDU   2LRI   2LRU
 1     12      6      7      6      7
 2     16      8      8      7      8
 4     16      9      8      8      8
 8     20      9      9      9      9
16     22     10      9     10     10

OV 2
np    RAS   2LDI   2LDU   2LRI   2LRU
 1     12      6      7      6      7
 2     11      6      6      6      7
 4     12      6      6      6      5
 8     14      6      6      6      5
16     14      6      6      6      5

2LDI, 2LDU: 4 block-Jacobi sweeps
17. Kivap1: execution times
18. Kivap2: number of iterations

OV 1
np    RAS   2LDI   2LDU   2LRI   2LRU
 1     38     19     12     19     12
 2     44     20     16     20     12
 4     45     20     16     19     12
 8     63     25     19     23     13
16     65     26     24     21     16

OV 0
np    RAS   2LDI   2LDU   2LRI   2LRU
 1     38     19     12     19     12
 2     58     22     16     21     14
 4     54     22     18     22     14
 8     72     25     22     22     16
16     90     29     27     25     21

OV 2
np    RAS   2LDI   2LDU   2LRI   2LRU
 1     38     19     12     19     12
 2     40     19     15     21     13
 4     46     19     15     18     12
 8     55     22     19     21     13
16     59     25     21     20     15

2LDI, 2LDU: 4 block-Jacobi sweeps
19. Kivap2: execution times
20. Trilinos-ML
- OO approach
- C language, MPI for message-passing
- Trilinos C++ interface
- Different multilevel schemes (smoothed aggregation, classical AMG, special scheme for Maxwell's equations)
- Gauss-Seidel and polynomial smoothers
- Allows using fine- and coarse-level solvers from other packages (AztecOO, UMFPACK, ...)
Comparison with 2-level hybrid preconditioner / RAS-ILU / UMFPACK
21. PSBLAS vs Trilinos-ML: Thermo
22. PSBLAS vs Trilinos-ML: Kivap1 and Kivap2
23. Future work
- More coarse-level and fine-level solvers
- More sophisticated aggregation algorithms
- Multi-level preconditioners
- Integration and testing of preconditioners in fluid-dynamics applications
24. Kivap3: number of iterations

OV 1
np    RAS   2LDI   2LDU   2LRI   2LRU
 1     33     13      8     13      8
 2     34     13     10     13      9
 4     36     12     11     13      8
 8     37     13     11     13      8
16     37     14     13     13     10

OV 0
np    RAS   2LDI   2LDU   2LRI   2LRU
 1     33     13      8     13      8
 2     50     14     11     15     11
 4     47     13     13     14     11
 8     50     15     14     16     11
16     58     19     15     15     13

OV 2
np    RAS   2LDI   2LDU   2LRI   2LRU
 1     33     13      8     13      8
 2     33     12     10     14     10
 4     34     12     10     13      8
 8     37     12     11     14      8
16     38     13     12     14      8

Kivap3: n = 56904, nnz = 1028800
25. Kivap3: execution times