A package of parallel algebraic two-level preconditioners based on PSBLAS

1
A package of parallel algebraic two-level
preconditioners based on PSBLAS
3rd IASC World Conference on Computational Statistics & Data Analysis (CSDA 2005)
ERCIM session on QR and other factorizations
  • P. D'Ambra, ICAR-CNR, Naples Branch, Italy
  • D. di Serafino, Second University of Naples,
    Italy
  • S. Filippone, University of Rome Tor Vergata,
    Italy
  • daniela.diserafino_at_unina2.it

Limassol, Cyprus, October 28-31, 2005
2
Outline
  • Motivations for the package
  • Parallel 2-level algebraic Schwarz
    preconditioners
  • Algebraic formulation
  • Computational kernels
  • PSBLAS-based software architecture of the package
  • Performance results and comparisons
  • Future work

3
Motivations for the package
  • Multilevel Schwarz preconditioners
  • Natural parallelism
  • Good convergence properties
  • Suitable for large-scale scientific and engineering
    applications
  • PSBLAS: a library of basic linear algebra
    operators on sparse matrices for
    distributed-memory parallel computers
  • (Filippone et al., ACM TOMS 26, 2000)
  • Follows the standardization effort of the BLAS
    Technical Forum
  • Infrastructure for portability and performance
  • Auxiliary routines for ease of use and
    extensibility
  • Modern (OO) Fortran 95 features
  • Smooth upgrade path for integration into legacy
    applications

4
(1-lev) Additive Schwarz: basic ingredients
Adjacency graph of A
0-overlap partition of W
δ-overlap partition of W
[Figure: example with labels 1 to 9]
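A partition with overlap is usually obtained from a 0-overlap (disjoint) partition by repeatedly adding the vertices that are adjacent, in the graph of A, to each subset. The following is a minimal serial sketch of that idea, not the package's code: the function name `overlap_partition`, the path-graph example and the starting partition are all illustrative assumptions.

```python
def overlap_partition(adj, parts, delta):
    """Grow each subset of a 0-overlap partition by `delta` layers of neighbours."""
    grown = [set(p) for p in parts]
    for _ in range(delta):
        # One more layer of overlap: add every vertex adjacent to the subset.
        grown = [s | {w for v in s for w in adj[v]} for s in grown]
    return grown


# Hypothetical example: path graph 1-2-...-9 (adjacency graph of a tridiagonal matrix).
adj = {v: {w for w in (v - 1, v + 1) if 1 <= w <= 9} for v in range(1, 10)}
parts0 = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]   # 0-overlap (disjoint) partition
parts1 = overlap_partition(adj, parts0, 1)   # 1-overlap partition
print([sorted(s) for s in parts1])
```

With overlap 1, each subset gains exactly the vertices one edge away from it, so neighbouring subsets now share vertices.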
5
AS: basic ingredients (contd)
Restriction/prolongation operators
Restriction of A
[Figure: example with labels 1 to 9]
6
AS preconditioners: computational kernels
[Diagram: build and apply steps of each variant, each step tagged as global or local]
Classical AS
Restricted AS (RAS)
AS with Harmonic Extension (ASH)
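To make the difference between the classical and restricted variants concrete, here is a small serial sketch of the apply step using the textbook definitions: restrict the residual to each overlapping subdomain, solve with the restricted matrix, then prolong either all local entries (AS) or only those of the non-overlapping subset (RAS). The helper names, the dense Gaussian elimination and the 1D Laplacian example are assumptions for illustration; the package itself works on distributed sparse matrices.

```python
def solve(A, b):
    """Solve a small dense system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def schwarz_apply(A, r, overlapped, owned, restricted):
    """Apply one-level AS (restricted=False) or RAS (restricted=True) to r.

    overlapped[i]: index list of overlapping subdomain i
    owned[i]:      index set of the matching non-overlapping subdomain
    """
    z = [0.0] * len(r)
    for idx, own in zip(overlapped, owned):
        A_i = [[A[p][q] for q in idx] for p in idx]   # restriction of A
        y = solve(A_i, [r[p] for p in idx])           # local solve
        for p, y_p in zip(idx, y):
            if not restricted or p in own:            # RAS prolongs owned entries only
                z[p] += y_p
    return z


# Hypothetical test problem: 1D Laplacian of order 6, two subdomains with overlap 1.
n = 6
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
r = [1.0] * n
overlapped = [[0, 1, 2, 3], [2, 3, 4, 5]]
owned = [{0, 1, 2}, {3, 4, 5}]
z_as = schwarz_apply(A, r, overlapped, owned, restricted=False)
z_ras = schwarz_apply(A, r, overlapped, owned, restricted=True)
print(z_as, z_ras)
```

In the overlap region AS sums the contributions of both subdomains, while RAS keeps a single one, which is why its prolongation needs no exchange of overlap data. ASH is the mirror image: it restricts to the non-overlapping subset and prolongs to the overlapping one.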
7
Coarse-level correction: basic ingredients
Algebraic coarsening: uncoupled aggregation
(Vaněk et al., 1996; Tuminaro et al., 2000)
Tentative prolongation operator
Smoothed prolongation/restriction operators
Coarse-level matrix
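The chain from aggregates to coarse-level matrix can be sketched with small dense lists. This follows the standard smoothed-aggregation construction and makes several assumptions that the slide does not state: the smoother is damped Jacobi, the restriction is the transpose of the prolongation, the coarse matrix is the Galerkin product, and the aggregates and the value of `omega` are made up for the example.

```python
def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def coarse_matrix(A, aggregates, omega):
    """Return the smoothed prolongator P and the coarse matrix P^T A P."""
    n = len(A)
    # Tentative prolongator: column j is the indicator vector of aggregate j.
    P_tent = [[1.0 if i in agg else 0.0 for agg in aggregates] for i in range(n)]
    # Damped-Jacobi smoother S = I - omega * D^{-1} A (assumed choice).
    S = [[(1.0 if i == j else 0.0) - omega * A[i][j] / A[i][i] for j in range(n)]
         for i in range(n)]
    P = matmul(S, P_tent)        # smoothed prolongator
    R = transpose(P)             # restriction taken as the transpose
    return P, matmul(R, matmul(A, P))


# Hypothetical example: 1D Laplacian of order 6 with two aggregates of three vertices.
n = 6
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
aggregates = [{0, 1, 2}, {3, 4, 5}]
P, A_c = coarse_matrix(A, aggregates, omega=2.0 / 3.0)
print(A_c)
```

Setting `omega=0` switches the smoothing off, so the coarse matrix reduces to the plain aggregation of A over the two groups.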
8
Coarse-level correction: computational kernels
[Diagram: build and apply steps, each tagged as global, local, or global/local]
9
2-lev Schwarz preconditioners: computational kernels
[Diagram: build and apply steps of each variant, each step tagged as global, local, or global/local]
Additive
Hybrid
10
2-lev Schwarz preconditioners: computational kernels (contd)
[Diagram: build and apply steps, each tagged as global, local, or global/local]
Symmetrized Hybrid
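The formulas for the three two-level variants did not survive in the transcript, so the following is only a generic sketch of how a one-level preconditioner `M1` and a coarse-level correction `Mc` are commonly combined. The ordering of the steps in the hybrid variants is an assumption, and every name here is hypothetical.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def add(x, y):
    return [a + b for a, b in zip(x, y)]


def sub(x, y):
    return [a - b for a, b in zip(x, y)]


def two_level_additive(M1, Mc, A, r):
    """Both corrections act on the same residual and are summed."""
    return add(M1(r), Mc(r))


def two_level_hybrid(M1, Mc, A, r):
    """One-level step first, then a coarse correction of the updated residual
    (assumed ordering; the reverse ordering is equally possible)."""
    z = M1(r)
    return add(z, Mc(sub(r, matvec(A, z))))


def two_level_sym_hybrid(M1, Mc, A, r):
    """Hybrid step followed by a second one-level step (assumed symmetric form)."""
    z = two_level_hybrid(M1, Mc, A, r)
    return add(z, M1(sub(r, matvec(A, z))))


# Hypothetical example: diagonal A, a crude one-level M1 and an exact coarse solve Mc.
A = [[2.0, 0.0], [0.0, 4.0]]
M1 = lambda v: [0.25 * vi for vi in v]
Mc = lambda v: [v[0] / 2.0, v[1] / 4.0]
r = [2.0, 4.0]
z_add = two_level_additive(M1, Mc, A, r)
z_hyb = two_level_hybrid(M1, Mc, A, r)
z_sym = two_level_sym_hybrid(M1, Mc, A, r)
print(z_add, z_hyb, z_sym)
```

The additive form needs no extra matrix-vector product, whereas each multiplicative step in the hybrid forms costs one product with A to update the residual.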
11
Software architecture of the preconditioner package
iterative solver
psb_prcbld
psb_prcaply
psb_asaply
psb_asbld
psb_2lbld
psb_2laply
2-lev DD preconditioners: computational and auxiliary routines
psb_ovrlbld
UMFPACK
psb_decaggr
psb_smthbld
psb_coarsebld
psb_bjac
psb_sphalo
psb_axpy
psb_spsm
psb_spnmri
parallel computational and auxiliary routines
psb_halo
psb_ovrl
psb_spmm
BLACS MPI
PSBLAS
spilu
sp2mm
spgtdiag
cssm
spscal
sptrans
csmm
serial kernels
12
Test matrices
  • Thermo: steady-state 2D thermal diffusion problem
  • 5-point finite difference discretization on a
    1000 x 600 mesh (n = 600000, nnz = 2996800)
  • Kivap: simulation of a commercial automotive engine
    with KIVA3V
  • Pressure correction in a semi-implicit algorithm
    to solve the Navier-Stokes equations for unsteady
    compressible flows
  • Discretization with 100000 control volumes, mesh
    varying during the simulation
  • Kivap1: n = 86304, nnz = 1575568; Kivap2: n =
    42204, nnz = 755416

[Figure: Kivap1]
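The size figures quoted for Thermo can be checked by counting stencil entries. The sketch below assumes one unknown per mesh point and no coupling across the boundary; the function name `five_point_nnz` is hypothetical.

```python
def five_point_nnz(nx, ny):
    """Order and nonzero count of the 5-point stencil matrix on an nx-by-ny mesh."""
    n = nx * ny
    # Each point has a diagonal entry; each link between neighbouring points
    # contributes two off-diagonal entries (one per direction).
    horizontal_links = (nx - 1) * ny
    vertical_links = nx * (ny - 1)
    return n, n + 2 * (horizontal_links + vertical_links)


n, nnz = five_point_nnz(1000, 600)
print(n, nnz)
```

Under these assumptions the count reproduces the values of n and nnz listed for Thermo.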
13
Computational environment and test details
  • Beowulf-class cluster
  • 16 (1500 MHz) Pentium IV processors with 512 MB
    RAM and 256 KB L2 cache
  • Fast Ethernet (100 Mbit/sec)
  • ATLAS 3.6.0 BLAS, BLACS 1.1, mpich 1.2.5, UMFPACK
    4.4
  • Linux Red Hat 7.3, Intel ifc 7.1, gcc 2.96

14
Thermo: number of iterations

OV 0
np  2LDI  2LDU  2LRI  2LRU
 1   194     5   194     5
 2   185    19   227     6
 4   177    31   232     6
 8   200    44   224     6
16   178    61   204     6

OV 1
np  2LDI  2LDU  2LRI  2LRU
 1   194     5   194     5
 2   177    18   204     5
 4   175    26   213     5
 8   189    35   216     5
16   179    55   187     5

OV 2
np  2LDI  2LDU  2LRI  2LRU
 1   194     5   194     5
 2   193    18   210     5
 4   170    30   216     5
 8   176    38   218     5
16   175    56   196     5

2LDI, 2LDU: 4 block-Jacobi sweeps
BiCGSTAB+RAS does not converge within 500 iterations
15
Thermo: execution time
16
Kivap1: number of iterations

OV 1
np   RAS  2LDI  2LDU  2LRI  2LRU
 1    12     6     7     6     7
 2    13     6     7     6     6
 4    13     6     6     6     6
 8    14     7     6     6     6
16    15     7     6     6     6

OV 0
np   RAS  2LDI  2LDU  2LRI  2LRU
 1    12     6     7     6     7
 2    16     8     8     7     8
 4    16     9     8     8     8
 8    20     9     9     9     9
16    22    10     9    10    10

OV 2
np   RAS  2LDI  2LDU  2LRI  2LRU
 1    12     6     7     6     7
 2    11     6     6     6     7
 4    12     6     6     6     5
 8    14     6     6     6     5
16    14     6     6     6     5

2LDI, 2LDU: 4 block-Jacobi sweeps
17
Kivap1: execution times
18
Kivap2: number of iterations

OV 1
np   RAS  2LDI  2LDU  2LRI  2LRU
 1    38    19    12    19    12
 2    44    20    16    20    12
 4    45    20    16    19    12
 8    63    25    19    23    13
16    65    26    24    21    16

OV 0
np   RAS  2LDI  2LDU  2LRI  2LRU
 1    38    19    12    19    12
 2    58    22    16    21    14
 4    54    22    18    22    14
 8    72    25    22    22    16
16    90    29    27    25    21

OV 2
np   RAS  2LDI  2LDU  2LRI  2LRU
 1    38    19    12    19    12
 2    40    19    15    21    13
 4    46    19    15    18    12
 8    55    22    19    21    13
16    59    25    21    20    15

2LDI, 2LDU: 4 block-Jacobi sweeps
19
Kivap2: execution times
20
Trilinos-ML
  • OO approach
  • C language, MPI for message-passing
  • Trilinos C++ interface
  • Different multilevel schemes (smoothed
    aggregation, classical AMG, special scheme for
    Maxwell's equations)
  • Gauss-Seidel and polynomial smoothers
  • Allows using fine- and coarse-level solvers from
    other packages (AztecOO, UMFPACK, ...)

Comparison with 2-lev hybrid preconditioner / RAS-ILU / UMFPACK
21
PSBLAS vs Trilinos-ML: Thermo
22
PSBLAS vs Trilinos-ML: Kivap1 and Kivap2
23
Future work
  • More coarse-level and fine-level solvers
  • More sophisticated aggregation algorithms
  • Multi-level preconditioners
  • Integration and testing of preconditioners in
    fluid-dynamics applications

24
Kivap3: number of iterations

OV 1
np   RAS  2LDI  2LDU  2LRI  2LRU
 1    33    13     8    13     8
 2    34    13    10    13     9
 4    36    12    11    13     8
 8    37    13    11    13     8
16    37    14    13    13    10

OV 0
np   RAS  2LDI  2LDU  2LRI  2LRU
 1    33    13     8    13     8
 2    50    14    11    15    11
 4    47    13    13    14    11
 8    50    15    14    16    11
16    58    19    15    15    13

OV 2
np   RAS  2LDI  2LDU  2LRI  2LRU
 1    33    13     8    13     8
 2    33    12    10    14    10
 4    34    12    10    13     8
 8    37    12    11    14     8
16    38    13    12    14     8

Kivap3: n = 56904, nnz = 1028800
25
Kivap3: execution times